# Natural Language Processing

# Retrieval-Augmented Generation (RAG)

RAG is a technique for augmenting LLM knowledge with additional, often private or real-time, data.

LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data they were trained on, up to a specific cutoff date. To build AI applications that can reason about private data or data introduced after a model’s cutoff date, you need to augment the model's knowledge with the specific information it needs.

<img src="../figures/RAG-process.png" >

This notebook introduces `ChakyBot`, a chatbot designed to help Chaky (the instructor) and Gun (the TA) explain NLP course lessons to students. Built with LangChain, ChakyBot retrieves information from documents to answer students' questions about the NLP curriculum. The notebook covers four LangChain components:

1. Prompt
2. Retrieval
3. Memory
4. Chain

In [2]:
# #langchain library
# !pip install langchain==0.0.350
# #LLM
# !pip install accelerate==0.25.0
# !pip install transformers==4.36.2
# !pip install bitsandbytes==0.41.2
# #Text Embedding
# !pip install sentence-transformers==2.2.2
# !pip install InstructorEmbedding==1.0.1
# #vectorstore
# !pip install pymupdf==1.23.8
# !pip install faiss-gpu==1.7.2
# !pip install faiss-cpu==1.7.4

In [3]:
# #langchain library
# !pip install langchain==0.1.13
# !pip install langchain-community==0.0.38
# #LLM
# !pip install accelerate==0.26.0
# !pip install transformers==4.45.0
# !pip install bitsandbytes==0.41.3
# #Text Embedding
# !pip install sentence-transformers==2.2.2
# !pip install InstructorEmbedding==1.0.1
# #vectorstore
# !pip install pymupdf==1.23.8
# !pip install faiss-gpu
# !pip install faiss-cpu==1.7.4
# # Hugging Face Hub (Compatible with InstructorEmbedding)
# !pip install huggingface_hub==0.23.3
# # Other dependencies
# !pip install torch==2.2.0 
# !pip install torchvision 
# !pip install nltk 
# !pip install scikit-learn
# !pip install tiktoken

In [4]:
import os
import torch
# Set GPU device
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

os.environ['http_proxy']  = 'http://192.41.170.23:3128'
os.environ['https_proxy'] = 'http://192.41.170.23:3128'

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## 1. Prompt

A prompt is a set of instructions or input provided by a user to guide the model's response. It helps the model understand the context and generate relevant, coherent output, such as answering questions, completing sentences, or engaging in a conversation.

In [5]:
from langchain import PromptTemplate

# prompt_template = """
#     I'm your friendly NLP chatbot named ChakyBot, here to assist Chaky and Gun with any questions they have about Natural Language Processing (NLP). 
#     If you're curious about how probability works in the context of NLP, feel free to ask any questions you may have. 
#     Whether it's about probabilistic models, language models, or any other related topic, 
#     I'm here to help break down complex concepts into easy-to-understand explanations.
#     Just let me know what you're wondering about, and I'll do my best to guide you through it!
#     {context}
#     Question: {question}
#     Answer:
#     """.strip()

prompt_template = """
    Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, 
    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,
    and research interests while maintaining privacy and professionalism.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    {context}
    Question: {question}
    Answer:
    """.strip()

PROMPT = PromptTemplate.from_template(
    template = prompt_template
)

PROMPT
# PromptTemplate fills its placeholders with str.format-style syntax:
# each placeholder is written in curly brackets, e.g. {context} and {question}

PromptTemplate(input_variables=['context', 'question'], template="Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, \n    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,\n    and research interests while maintaining privacy and professionalism.\n    Just let me know what you're wondering about, and I'll do my best to guide you through it!\n    {context}\n    Question: {question}\n    Answer:")

In [6]:
PROMPT.format(
    context = "Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can effectively generalize and thus perform tasks without explicit instructions.",
    question = "What is Machine Learning"
)

"Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, \n    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,\n    and research interests while maintaining privacy and professionalism.\n    Just let me know what you're wondering about, and I'll do my best to guide you through it!\n    Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can effectively generalize and thus perform tasks without explicit instructions.\n    Question: What is Machine Learning\n    Answer:"

Note : [How to improve prompting (Zero-shot, Few-shot, Chain-of-Thought, etc.)](https://github.com/chaklam-silpasuwanchai/Natural-Language-Processing/blob/main/Code/05%20-%20RAG/advance/cot-tot-prompting.ipynb)
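
As a rough illustration of the placeholder mechanism described above, the sketch below fills a template with plain `str.format`. The helper `format_prompt` and the shortened template text are assumptions made for this example; they are not part of LangChain.

```python
# A minimal sketch of prompt templating with str.format.
# `format_prompt` and this template are illustrative, not LangChain code.
template = (
    "Answer the question using only the context below.\n"
    "{context}\n"
    "Question: {question}\n"
    "Answer:"
)


def format_prompt(template: str, **values: str) -> str:
    """Fill the curly-bracket placeholders; a missing key raises KeyError."""
    return template.format(**values)


prompt = format_prompt(
    template,
    context="Machine learning is a field of study in artificial intelligence.",
    question="What is Machine Learning",
)
print(prompt)
```

Leaving out one of the named values raises a `KeyError`, which is a quick way to notice that a chain is not supplying every variable the prompt expects.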

## 2. Retrieval

1. `Document loaders` : Load documents from many different sources (HTML, PDF, code).
2. `Document transformers` : Break large documents into smaller, relevant chunks, an essential step that improves retrieval.
3. `Text embedding models` : Capture the semantic meaning of text as vectors, so you can quickly and efficiently find other pieces of text that are similar.
4. `Vector stores` : Databases that support efficient storage and searching of these embeddings.
5. `Retrievers` : Once the data is in the database, you still need to retrieve it.

### 2.1 Document Loaders 
Use document loaders to load data from a source as `Document` objects. A `Document` is a piece of text and associated metadata. For example, there are document loaders for loading a simple .txt file, for loading the text contents of any web page, or even for loading a transcript of a YouTube video.

[PDF Loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf)

[Download Document](https://web.stanford.edu/~jurafsky/slp3/)

In [7]:
# from langchain.document_loaders import PyMuPDFLoader

# # nlp_docs = '../docs/pdf/SpeechandLanguageProcessing_3rd_07jan2023.pdf'
# nlp_docs = 'Personal Profile.pdf'

# loader = PyMuPDFLoader(nlp_docs)
# documents = loader.load()

In [8]:
# Load every PDF in a folder

import os
from langchain.document_loaders import PyMuPDFLoader

# Path to the folder containing PDF documents
folder_path = 'datasets'

# Initialize an empty list to store loaded documents
documents = []

# Iterate over each file in the folder
for file_name in os.listdir(folder_path):
    # Check if the file is a PDF
    if file_name.endswith('.pdf'):
        # Construct the full path to the PDF file
        pdf_path = os.path.join(folder_path, file_name)

        # Load the PDF document
        loader = PyMuPDFLoader(pdf_path)
        document = loader.load()

        # Add the loaded documents to the list
        documents.extend(document)

# Now, `documents` contains the loaded pages from all PDFs in the "datasets" folder

In [9]:
# documents

In [10]:
len(documents)

5

In [11]:
documents[1]

Document(page_content='Personal Profile: Pete \nFull Name: Pete  \nAge: 26 years old \nCurrent Status: Master’s Student in Data Science & AI \nUniversity: Asian Institute of Technology (AIT) \nResearch Interests: Natural Language Processing (NLP), Artificial Intelligence (AI), and Data \nScience \nPrevious Education: Bachelor’s Degree in Electrical Engineering (Chiang Mai University, 2021) \nWork Experience: Former Electrical Engineer (Power Plant, Rayong) – 2 years \nIndustry Expertise: Energy Sector, Natural Gas Power Generation \nHobbies & Interests: Playing Badminton, Technology & AI Research \n \nI am Pete, I am 26 years old. Born in 1998 , I have always had a deep interest in technology, leading \nme to pursue studies and a career in engineering and AI. Over the years, my academic and \nprofessional journey has shaped my passion for problem-solving and innovation. \n \nI am currently pursuing my Master’s degree in Data Science and AI at the Asian Institute of \nTechnology (AIT). 

### 2.2 Document Transformers

`RecursiveCharacterTextSplitter` is the recommended splitter for generic text. It is parameterized by a list of characters and tries to split on them in order until the chunks are small enough.

In [12]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 700,
    chunk_overlap = 100
)

doc = text_splitter.split_documents(documents)

In [13]:
doc[1]

Document(page_content='Ekkarat Techanawakarnkul\nWORK EXPERIENCE\nEDUCATION\nAug 2024 - Present\nCHIANG MAI UNIVERSITY\nBachelor Degree, Majors: Electrical Engineering           GPA 3.55 (HONOR)\nAug 2017 - Mar 2021\nProject: Battery Management System for Electric Vehicle\nMachine Learning Project: CO₂ Emission Prediction for Power Generation\n    - Developed a machine learning model to predict CO₂ emissions from various types of power plants,\nhelping policymakers optimize the energy mix for a sustainable, zero-carbon future.\nComputer Programing Project: Customer Segmentation \n    -  Implemented customer segmentation by applying K-means clustering on RFM (Recency, Frequency,', metadata={'source': 'datasets/Resume-Ekkarat Techanawakarnkul.pdf', 'file_path': 'datasets/Resume-Ekkarat Techanawakarnkul.pdf', 'page': 0, 'total_pages': 1, 'format': 'PDF 1.4', 'title': 'Ekkarat Techanawakarnkul', 'author': 'Ekkarat Techanawakarnkul', 'subject': '', 'keywords': 'DAGhZZCxPQY,BADrDxbS5HE,0', '

In [14]:
len(doc)

19
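
The recursive strategy described above can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: the function name `recursive_split` and the separator list are assumptions, and chunk overlap is left out to keep the logic short.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters (no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces together
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "aaa bbb ccc\n\nddd eee fff ggg"
chunks = recursive_split(sample, chunk_size=10)
print(chunks)
```

Paragraph breaks are tried first, then line breaks, then spaces, so chunks tend to end at natural boundaries and only fall back to a hard character cut when nothing else fits.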

### 2.3 Text Embedding Models
Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.

*Note* Instructor Model : [Huggingface](https://huggingface.co/hkunlp/instructor-base) | [Paper](https://arxiv.org/abs/2212.09741)

In [15]:
import torch
from langchain.embeddings import HuggingFaceInstructEmbeddings

model_name = 'hkunlp/instructor-base'

embedding_model = HuggingFaceInstructEmbeddings(
    model_name = model_name,
    model_kwargs = {"device" : device}
)

load INSTRUCTOR_Transformer
max_seq_length  512




### 2.4 Vector Stores

One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.
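
To make the store-then-search idea concrete, here is a minimal in-memory sketch. The class `TinyVectorStore` and the hand-written three-dimensional vectors are hypothetical stand-ins for real embeddings, and cosine similarity is used only for illustration; an index such as FAISS may rank by a different metric (for example L2 distance).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Keeps (vector, text) pairs and returns the texts closest to a query."""

    def __init__(self):
        self._items = []

    def add(self, vector, text):
        self._items.append((vector, text))

    def search(self, query_vector, k=1):
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(query_vector, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]


store = TinyVectorStore()
store.add([1.0, 0.0, 0.0], "cats")          # toy vectors, not real embeddings
store.add([0.0, 1.0, 0.0], "power plants")
store.add([0.9, 0.1, 0.0], "kittens")

top_two = store.search([1.0, 0.0, 0.0], k=2)
print(top_two)
```

This brute-force scan compares the query against every stored vector; dedicated vector stores exist largely to avoid that cost on large collections.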

In [16]:
#locate vectorstore
vector_path = '../a6/vector-store'
if not os.path.exists(vector_path):
    os.makedirs(vector_path)
    print('create path done')

In [17]:
# pip install faiss-cpu

In [18]:
#save vector locally
from langchain.vectorstores import FAISS

vectordb = FAISS.from_documents(
    documents = doc,
    embedding = embedding_model
)

db_file_name = 'nlp_stanford'

vectordb.save_local(
    folder_path = os.path.join(vector_path, db_file_name),
    index_name = 'nlp'  # custom index name (the library default is 'index')
)

### 2.5 Retrievers
A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.
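
Since a retriever only has to return documents for a query, it does not need embeddings at all. The sketch below is a hypothetical keyword-overlap retriever written for illustration (the function name and sample texts are assumptions, not LangChain code); it ranks documents by how many query words they share.

```python
def keyword_retriever(documents, query, k=2):
    """Return up to k documents that share the most words with the query."""
    query_words = set(query.lower().split())
    scored = []
    for doc in documents:
        overlap = len(query_words & set(doc.lower().split()))
        if overlap:
            scored.append((overlap, doc))
    # sort() is stable, so documents with equal overlap keep their input order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


sample_docs = [
    "Pete studies data science at AIT",
    "Pete worked at a power plant",
    "Badminton is a hobby",
]
hits = keyword_retriever(sample_docs, "power plant")
print(hits)
```

A vector-store retriever follows the same contract (query in, documents out) but ranks by embedding similarity instead of shared words.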

In [19]:
#calling vector from local
vector_path = '../a6/vector-store'
db_file_name = 'nlp_stanford'

from langchain.vectorstores import FAISS

vectordb = FAISS.load_local(
    folder_path = os.path.join(vector_path, db_file_name),
    embeddings = embedding_model,
    index_name = 'nlp',  # must match the index name used in save_local
    allow_dangerous_deserialization=True
)   

In [20]:
#ready to use
retriever = vectordb.as_retriever()

In [21]:
retriever.get_relevant_documents("What is Dependency Parsing")

[Document(page_content='updated with new models, techniques, and technologies is challenging but exciting. \n Technical complexity – Some areas, such as deep learning, NLP, and big data processing, \ninvolve complex mathematical and programming concepts that require deep understanding and \npractice. \nDespite these challenges, I enjoy learning and exploring new AI techniques that can be applied in \nreal-world scenarios. \nDuring my Master’s program, my primary research focus is on Natural Language Processing (NLP) \nand its applications. \n Enhancing AI models to improve text understanding and generation \n Applying NLP techniques in real-world industries, such as energy management, automation,', metadata={'source': 'datasets/Personal Profile.pdf', 'file_path': 'datasets/Personal Profile.pdf', 'page': 2, 'total_pages': 4, 'format': 'PDF 1.7', 'title': '', 'author': 'ekkarat techanawakarnkul', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Word for Microsoft 365', 'producer': '

In [22]:
retriever.get_relevant_documents("What is Transformers")

[Document(page_content='systems, ensuring power efficiency, and overseeing the safety and operational stability of \npower generation units. \nMy experience in the energy sector provided me with hands-on exposure to large-scale industrial \noperations, energy management, and problem-solving in high-stakes environments. \nNow, as I transition into the field of data science and AI, I am looking forward to combining my \nengineering background with AI-driven solutions to optimize power systems and automation \ntechnologies. \n \nI have primarily been involved in the power generation and energy industry. My work as an \nElectrical Engineer was focused on the operation and maintenance of a natural gas power plant', metadata={'source': 'datasets/Personal Profile.pdf', 'file_path': 'datasets/Personal Profile.pdf', 'page': 1, 'total_pages': 4, 'format': 'PDF 1.7', 'title': '', 'author': 'ekkarat techanawakarnkul', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Word for Microsoft 365', '

## 3. Memory

One of the core utility classes underpinning most (if not all) memory modules is the ChatMessageHistory class. This is a super lightweight wrapper that provides convenience methods for saving HumanMessages, AIMessages, and then fetching them all.

You may want to use this class directly if you are managing memory outside of a chain.


In [23]:
from langchain.memory import ChatMessageHistory

history = ChatMessageHistory()
history

InMemoryChatMessageHistory(messages=[])

In [24]:
history.add_user_message('hi')
history.add_ai_message('Whats up?')
history.add_user_message('How are you')
history.add_ai_message('I\'m quite good. How about you?')

In [25]:
history

InMemoryChatMessageHistory(messages=[HumanMessage(content='hi'), AIMessage(content='Whats up?'), HumanMessage(content='How are you'), AIMessage(content="I'm quite good. How about you?")])

### 3.1 Memory types

There are many different types of memory. Each has its own parameters and return types, and each is useful in different scenarios. This section covers two:
- Conversation Buffer
- Conversation Buffer Window

#### What variables get returned from memory

Before going into the chain, various variables are read from memory. These have specific names, which need to align with the variables the chain expects. You can see what these variables are by calling `memory.load_memory_variables({})`. The empty dictionary is just a placeholder for real variables; if the memory type you are using depends on the input variables, you may need to pass some in.

With `ConversationBufferMemory`, `load_memory_variables` returns a single key, `history`. This means that your chain (and likely your prompt) should expect an input named `history`. You can usually control this variable through parameters on the memory class. For example, to have the memory variables returned under the key `chat_history`, pass `memory_key = "chat_history"` when constructing the memory.

#### Conversation Buffer
This memory stores messages and then extracts them into a variable.

In [26]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: hi\nAI: What's up?\nHuman: How are you?\nAI: I'm quite good. How about you?"}

In [27]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages = True)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': [HumanMessage(content='hi'),
  AIMessage(content="What's up?"),
  HumanMessage(content='How are you?'),
  AIMessage(content="I'm quite good. How about you?")]}

#### Conversation Buffer Window
- It keeps a list of the interactions of the conversation over time.
- It only uses the last `k` interactions.
- It is useful for keeping a sliding window of the most recent interactions, so the buffer does not get too large.

In [28]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=1)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: How are you?\nAI: I'm quite good. How about you?"}
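
The sliding-window behaviour shown above can be approximated with a bounded `deque`. The class `WindowMemory` below is a hypothetical sketch for illustration, not LangChain's implementation; its `memory_key` parameter mirrors the idea of choosing the key under which the history is returned.

```python
from collections import deque


class WindowMemory:
    """Keeps only the last k (human, ai) exchanges."""

    def __init__(self, k=1, memory_key="history"):
        self.memory_key = memory_key
        # A deque with maxlen drops the oldest exchange automatically.
        self._turns = deque(maxlen=k)

    def save_context(self, user_input, ai_output):
        self._turns.append((user_input, ai_output))

    def load_memory_variables(self):
        lines = []
        for human, ai in self._turns:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        return {self.memory_key: "\n".join(lines)}


window = WindowMemory(k=1)
window.save_context("hi", "What's up?")
window.save_context("How are you?", "I'm quite good. How about you?")
print(window.load_memory_variables())
```

With `k=1` only the most recent exchange survives, matching the output of the cell above; a larger `k` trades prompt length for more conversational context.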

## 4. Chain

Using an LLM in isolation is fine for simple applications, but more complex applications require chaining LLMs - either with each other or with other components.

An `LLMChain` is a simple chain that adds some functionality around language models.
- It consists of a `PromptTemplate` and a language model (either an LLM or a chat model).
- It formats the prompt template using the input key values provided (and also memory key values, if available).
- It passes the formatted string to the LLM and returns the LLM output.
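
The three steps above can be sketched without any model at all. In the example below, `TinyLLMChain` and `fake_llm` are hypothetical names introduced for illustration; the stub simply echoes the prompt where a real LLM would generate text.

```python
from typing import Callable, Dict


class TinyLLMChain:
    """Formats a prompt template, calls the model, and returns its output."""

    def __init__(self, llm: Callable[[str], str], template: str):
        self.llm = llm
        self.template = template

    def run(self, **inputs: str) -> Dict[str, str]:
        prompt = self.template.format(**inputs)   # step 1: fill the template
        output = self.llm(prompt)                 # step 2: call the model
        return {**inputs, "text": output}         # step 3: inputs plus output


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: echoes the prompt it received."""
    return f"[stub answer to: {prompt!r}]"


tiny_chain = TinyLLMChain(fake_llm, "Question: {question}\nAnswer:")
result = tiny_chain.run(question="What is NLP?")
print(result)
```

Returning the inputs alongside the generated `text` follows the shape of the `LLMChain` outputs shown later in this notebook, which makes it easy to pass results on to another component.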

Note : [Download Fastchat Model Here](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0)

In [29]:
# %cd ./models
# !git clone https://huggingface.co/lmsys/fastchat-t5-3b-v1.0

In [30]:
# from transformers import AutoTokenizer, pipeline, AutoModelForSeq2SeqLM
# from transformers import BitsAndBytesConfig
# from langchain import HuggingFacePipeline
# import torch

# model_id = "lmsys/fastchat-t5-3b-v1.0"

# tokenizer = AutoTokenizer.from_pretrained(model_id)
# tokenizer.save_pretrained("../models/fastchat-t5-3b-v1.0/")

# tokenizer.pad_token_id = tokenizer.eos_token_id

# bitsandbyte_config = BitsAndBytesConfig(
#     load_in_4bit = True,
#     bnb_4bit_quant_type = "nf4",
#     bnb_4bit_compute_dtype = torch.float16,
#     bnb_4bit_use_double_quant = True
# )

# # model = AutoModelForSeq2SeqLM.from_pretrained(
# #     model_id,
# #     quantization_config = bitsandbyte_config, #caution Nvidia
# #     device_map = 'auto',
# #     load_in_4bit = True
# # )

# # model = AutoModelForSeq2SeqLM.from_pretrained(model_id).to("cuda")

# pipe = pipeline(
#     task="text2text-generation",
#     model=model,
#     tokenizer=tokenizer,
#     max_new_tokens = 128,
#     model_kwargs = {
#         "temperature" : 0,
#         "repetition_penalty": 1.5
#     }
# )

# llm = HuggingFacePipeline(pipeline = pipe)

In [31]:
from transformers import AutoTokenizer, pipeline, AutoModelForSeq2SeqLM
from transformers import BitsAndBytesConfig
from langchain import HuggingFacePipeline
import torch

model_id = "lmsys/fastchat-t5-3b-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
tokenizer.save_pretrained("../a6/models/fastchat-t5-3b-v1.0/")

tokenizer.pad_token_id = tokenizer.eos_token_id

# bitsandbyte_config = BitsAndBytesConfig(
#     load_in_4bit = True,
#     bnb_4bit_quant_type = "nf4",
#     bnb_4bit_compute_dtype = torch.float16,
#     bnb_4bit_use_double_quant = True
# )

# model = AutoModelForSeq2SeqLM.from_pretrained(
#     model_id,
#     quantization_config = bitsandbyte_config, #caution Nvidia
#     device_map = 'auto',
#     load_in_4bit = True
# )

# model = AutoModelForSeq2SeqLM.from_pretrained(model_id).to("cuda")

# bitsandbyte_config = BitsAndBytesConfig(
#     load_in_8bit=True,  # Instead of 4-bit
# )

# model = AutoModelForSeq2SeqLM.from_pretrained(
#     model_id,
#     quantization_config=bitsandbyte_config,
#     device_map="auto"
# )

model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    device_map="cpu" 
)

pipe = pipeline(
    task="text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens = 256,
    model_kwargs = {
        "temperature" : 0,
        "repetition_penalty": 1.5
    }
)

llm = HuggingFacePipeline(pipeline = pipe)


### [Class ConversationalRetrievalChain](https://api.python.langchain.com/en/latest/_modules/langchain/chains/conversational_retrieval/base.html#ConversationalRetrievalChain)

- `retriever` : Retriever to use to fetch documents.

- `combine_docs_chain` : The chain used to combine any retrieved documents.

- `question_generator`: The chain used to generate a new question for the sake of retrieval. This chain will take in the current question (with variable question) and any chat history (with variable chat_history) and will produce a new standalone question to be used later on.

- `return_source_documents` : Return the retrieved source documents as part of the final result.

- `get_chat_history` : An optional function to get a string of the chat history. If None is provided, will use a default.

- `return_generated_question` : Return the generated question as part of the final result.

- `response_if_no_docs_found` : If specified, the chain will return a fixed response if no docs are found for the question.
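
How these parameters fit together can be sketched as a plain function. Everything here is a simplified assumption for illustration: `conversational_retrieval`, the stub components, and the tiny `CORPUS` are hypothetical, and the real chain has more options than shown.

```python
def conversational_retrieval(question, chat_history, condense, retrieve, combine,
                             response_if_no_docs_found=None):
    """Condense the question, fetch documents, then answer from them."""
    # In this sketch the question is only rewritten when there is history.
    standalone = condense(chat_history, question) if chat_history else question
    docs = retrieve(standalone)
    if not docs and response_if_no_docs_found is not None:
        answer = response_if_no_docs_found
    else:
        answer = combine(docs, standalone)
    return {
        "question": question,
        "generated_question": standalone,
        "source_documents": docs,
        "answer": answer,
    }


# Stub components standing in for the LLM-backed chains and the retriever.
CORPUS = ["Pete is 26 years old.", "Pete studies at AIT."]


def condense(chat_history, question):
    last_user_turn = chat_history[-1][0]
    return f"{question} (follow-up to: {last_user_turn})"


def retrieve(query):
    return list(CORPUS)  # stub: a real retriever would rank by similarity


def combine(docs, question):
    return f"Context: {' '.join(docs)} | Question: {question}"


first = conversational_retrieval("How old is Pete?", [], condense, retrieve, combine)
second = conversational_retrieval(
    "Where does he study?",
    [("How old is Pete?", first["answer"])],
    condense, retrieve, combine,
)
print(second["generated_question"])
```

The second call shows why the question generator matters: "Where does he study?" is ambiguous on its own, so it is rewritten using the history before anything is retrieved.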


`question_generator`

In [32]:
from langchain.chains import LLMChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains.question_answering import load_qa_chain
from langchain.chains import ConversationalRetrievalChain

In [33]:
CONDENSE_QUESTION_PROMPT

PromptTemplate(input_variables=['chat_history', 'question'], template='Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\n{chat_history}\nFollow Up Input: {question}\nStandalone question:')

In [34]:
question_generator = LLMChain(
    llm = llm,
    prompt = CONDENSE_QUESTION_PROMPT,
    verbose = True
)

In [35]:
query = 'Comparing both of them'
chat_history = "Human:What is Machine Learning\nAI:\nHuman:What is Deep Learning\nAI:"

question_generator({'chat_history' : chat_history, "question" : query})

> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
Human:What is Machine Learning
AI:
Human:What is Deep Learning
AI:
Follow Up Input: Comparing both of them
Standalone question:

> Finished chain.


{'chat_history': 'Human:What is Machine Learning\nAI:\nHuman:What is Deep Learning\nAI:',
 'question': 'Comparing both of them',
 'text': 'What are the main differences between Machine Learning and Deep Learning AI?'}

`combine_docs_chain`

In [36]:
doc_chain = load_qa_chain(
    llm = llm,
    chain_type = 'stuff',
    prompt = PROMPT,
    verbose = True
)
doc_chain

StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['context', 'question'], template="Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, \n    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,\n    and research interests while maintaining privacy and professionalism.\n    Just let me know what you're wondering about, and I'll do my best to guide you through it!\n    {context}\n    Question: {question}\n    Answer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x7544f9bd88c0>)), document_variable_name='context')

In [37]:
query = "What is Transformers?"
input_document = retriever.get_relevant_documents(query)

doc_chain({'input_documents':input_document, 'question':query})

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, 
    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,
    and research interests while maintaining privacy and professionalism.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    systems, ensuring power efficiency, and overseeing the safety and operational stability of 
power generation units. 
My experience in the energy sector provided me with hands-on exposure to large-scale industrial 
operations, energy management, and problem-solving in high-stakes environments. 
Now, as I transition into the field of data science and AI, I am looking forward to combining my 
engineering background with AI-driven solutions to optimize power systems a

{'input_documents': [Document(page_content='systems, ensuring power efficiency, and overseeing the safety and operational stability of \npower generation units. \nMy experience in the energy sector provided me with hands-on exposure to large-scale industrial \noperations, energy management, and problem-solving in high-stakes environments. \nNow, as I transition into the field of data science and AI, I am looking forward to combining my \nengineering background with AI-driven solutions to optimize power systems and automation \ntechnologies. \n \nI have primarily been involved in the power generation and energy industry. My work as an \nElectrical Engineer was focused on the operation and maintenance of a natural gas power plant', metadata={'source': 'datasets/Personal Profile.pdf', 'file_path': 'datasets/Personal Profile.pdf', 'page': 1, 'total_pages': 4, 'format': 'PDF 1.7', 'title': '', 'author': 'ekkarat techanawakarnkul', 'subject': '', 'keywords': '', 'creator': 'Microsoft® Word f

In [38]:
memory = ConversationBufferWindowMemory(
    k=3, 
    memory_key = "chat_history",
    return_messages = True,
    output_key = 'answer'
)

# chain = ConversationalRetrievalChain(
#     retriever=retriever,
#     question_generator=question_generator,
#     combine_docs_chain=doc_chain,
#     return_source_documents=True,
#     memory=memory,
#     verbose=True,
#     get_chat_history=lambda h : h
# )

chain = ConversationalRetrievalChain(
    retriever=retriever,
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
    # return_source_documents=True,
    memory=memory,
    verbose=True,
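    # Returning None leaves the chat-history string empty, so the chain skips
    # question_generator and sends the raw question straight to the retriever.
    get_chat_history=lambda h : None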
)

chain

ConversationalRetrievalChain(memory=ConversationBufferWindowMemory(output_key='answer', return_messages=True, memory_key='chat_history', k=3), verbose=True, combine_docs_chain=StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['context', 'question'], template="Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, \n    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,\n    and research interests while maintaining privacy and professionalism.\n    Just let me know what you're wondering about, and I'll do my best to guide you through it!\n    {context}\n    Question: {question}\n    Answer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x7544f9bd88c0>)), document_variable_name='context'), question_generator=LLMChain(verbose=True, prompt=PromptTempla

## 5. Chatbot

In [39]:
prompt_question = "Who are you by the way?"
answer = chain({"question":prompt_question})
answer

> Entering new ConversationalRetrievalChain chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, 
    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,
    and research interests while maintaining privacy and professionalism.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    Personal Profile: Pete 
Full Name: Pete  
Age: 26 years old 
Current Status: Master’s Student in Data Science & AI 
University: Asian Institute of Technology (AIT) 
Research Interests: Natural Language Processing (NLP), Artificial Intelligence (AI), and Data 
Science 
Previous Education: Bachelor’s Degree in Electrical Engineering (Chiang Mai University, 2021) 
Work Experience: Former Electrical Engine

{'question': 'Who are you by the way?',
 'chat_history': [],
 'answer': '  I   am   PeteBot,   your   AI   assistant.   I   am   here   to   answer   your   questions   about   Pete,   my   personal   profile,   and   my   academic   and   professional   journey.                                                                                                                                                                                                                                                                                                                                                                                                                    '}

In [40]:
# prompt_question = "What is the Transformers?"
# answer = chain({"question":prompt_question})
# answer

In [41]:
# prompt_question = "Is it a statistical model?"
# answer = chain({"question":prompt_question})
# answer

In [42]:
prompt_question = "How old are you?"
answer = chain({"question":prompt_question})
answer

> Entering new ConversationalRetrievalChain chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, 
    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,
    and research interests while maintaining privacy and professionalism.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    Personal Profile: Pete 
Full Name: Pete  
Age: 26 years old 
Current Status: Master’s Student in Data Science & AI 
University: Asian Institute of Technology (AIT) 
Research Interests: Natural Language Processing (NLP), Artificial Intelligence (AI), and Data 
Science 
Previous Education: Bachelor’s Degree in Electrical Engineering (Chiang Mai University, 2021) 
Work Experience: Former Electrical Engine

{'question': 'How old are you?',
 'chat_history': [HumanMessage(content='Who are you by the way?'),
  AIMessage(content='  I   am   PeteBot,   your   AI   assistant.   I   am   here   to   answer   your   questions   about   Pete,   my   personal   profile,   and   my   academic   and   professional   journey.                                                                                                                                                                                                                                                                                                                                                                                                                    ')],
 'answer': '  I   am   26   years   old. \n'}

In [43]:
prompt_question = "What is your highest level of education?"
answer = chain({"question":prompt_question})
answer

> Entering new ConversationalRetrievalChain chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, 
    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,
    and research interests while maintaining privacy and professionalism.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    generation, circuit design, control systems, and energy efficiency. 
For my Master’s degree education, I pursue major into the field of Data Science and AI at AIT, 
where I am currently deepening my expertise in areas such as machine learning, deep learning, 
and NLP. My interest in AI has grown significantly, and I am particularly fascinated by how artificial 
intelligence can enhance automation, pred

{'question': 'What is your highest level of education?',
 'chat_history': [HumanMessage(content='Who are you by the way?'),
  AIMessage(content='  I   am   PeteBot,   your   AI   assistant.   I   am   here   to   answer   your   questions   about   Pete,   my   personal   profile,   and   my   academic   and   professional   journey.                                                                                                                                                                                                                                                                                                                                                                                                                    '),
  HumanMessage(content='How old are you?'),
  AIMessage(content='  I   am   26   years   old. \n')],
 'answer': '  I   currently   hold   a   Master’s   degree   in   Data   Science   and   AI   from   the   Asian   Institute   of   Technology   (AIT). \n'}

In [44]:
prompt_question = "What major or field of study did you pursue during your education?"
answer = chain({"question": prompt_question})
# Keep only the question and the answer so the printed result omits chat_history.
filtered_answer = {key: answer[key] for key in ["question", "answer"] if key in answer}
print(filtered_answer)

> Entering new ConversationalRetrievalChain chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, 
    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,
    and research interests while maintaining privacy and professionalism.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    me to pursue studies and a career in engineering and AI. Over the years, my academic and 
professional journey has shaped my passion for problem-solving and innovation. 
 
I am currently pursuing my Master’s degree in Data Science and AI at the Asian Institute of 
Technology (AIT). My academic journey started with my Bachelor’s degree in Electrical 
Engineering from Chiang Mai University, which I succe
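
The filtering step in the cell above is repeated in several later cells, so it can be wrapped once. A minimal sketch, assuming the chain is callable with a `{"question": ...}` dict and returns a dict as in this notebook; `ask` and `fake_chain` are hypothetical names, and `fake_chain` only stands in for the real chain so the example runs without a model:

```python
from typing import Any, Callable, Dict


def ask(chain: Callable[[Dict[str, str]], Dict[str, Any]], question: str) -> Dict[str, Any]:
    """Call the chain and keep only the 'question' and 'answer' keys."""
    result = chain({"question": question})
    return {key: result[key] for key in ("question", "answer") if key in result}


def fake_chain(inputs: Dict[str, str]) -> Dict[str, Any]:
    """Stand-in for the real chain (assumption: same input and output keys)."""
    return {"question": inputs["question"], "chat_history": [], "answer": "stub answer"}


print(ask(fake_chain, "How old are you?"))
```

With the real chain, the call would be `ask(chain, prompt_question)`.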

In [45]:
prompt_question = "How many years of work experience do you have?"
answer = chain({"question":prompt_question})
answer

> Entering new ConversationalRetrievalChain chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, 
    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,
    and research interests while maintaining privacy and professionalism.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    After graduating with my Bachelor’s degree in 2021, I worked at a natural gas power plant in 
Rayong, Thailand. During my tenure, I was responsible for monitoring and maintaining electrical

generation, circuit design, control systems, and energy efficiency. 
For my Master’s degree education, I pursue major into the field of Data Science and AI at AIT, 
where I am currently deepening my expertise in ar

{'question': 'How many years of work experience do you have?',
 'chat_history': [HumanMessage(content='How old are you?'),
  AIMessage(content='  I   am   26   years   old. \n'),
  HumanMessage(content='What is your highest level of education?'),
  AIMessage(content='  I   currently   hold   a   Master’s   degree   in   Data   Science   and   AI   from   the   Asian   Institute   of   Technology   (AIT). \n'),
  HumanMessage(content='What major or field of study did you pursue during your education?'),
  AIMessage(content='  I   pursued   a   Bachelor’s   degree   in   Electrical   Engineering   during   my   education. \n')],
 'answer': 'I   have   two   years   of   work   experience   as   an   Electrical   Engineer   in   the   power   generation   industry. \n'}
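
The `chat_history` above no longer contains the first exchange ("Who are you by the way?"), which is consistent with the `k=3` window configured on `ConversationBufferWindowMemory`. A minimal sketch of that sliding-window behaviour using a `deque`; it illustrates the idea only and is not LangChain's implementation, and `WindowMemory` is a hypothetical name:

```python
from collections import deque


class WindowMemory:
    """Keep only the last k (human, ai) exchanges."""

    def __init__(self, k: int = 3):
        # A deque with maxlen drops the oldest exchange automatically.
        self.turns = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def history(self) -> list:
        # Flatten the stored exchanges into human, ai, human, ai, ...
        return [message for turn in self.turns for message in turn]


memory = WindowMemory(k=3)
for i in range(1, 5):
    memory.save(f"question {i}", f"answer {i}")

print(memory.history())
```

After four exchanges, only exchanges 2 to 4 remain, mirroring how the fourth question in this notebook sees a history that starts at "How old are you?".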

In [46]:
prompt_question = "What type of work or industry have you been involved in?"
answer = chain({"question":prompt_question})
answer

> Entering new ConversationalRetrievalChain chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, 
    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,
    and research interests while maintaining privacy and professionalism.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    After graduating with my Bachelor’s degree in 2021, I worked at a natural gas power plant in 
Rayong, Thailand. During my tenure, I was responsible for monitoring and maintaining electrical

gain hands-on experience in data analysis, machine learning, and AI. Excited to apply my coursework and
programming skills to real-world projects while learning from industry professionals.

generation, circuit des

{'question': 'What type of work or industry have you been involved in?',
 'chat_history': [HumanMessage(content='What is your highest level of education?'),
  AIMessage(content='  I   currently   hold   a   Master’s   degree   in   Data   Science   and   AI   from   the   Asian   Institute   of   Technology   (AIT). \n'),
  HumanMessage(content='What major or field of study did you pursue during your education?'),
  AIMessage(content='  I   pursued   a   Bachelor’s   degree   in   Electrical   Engineering   during   my   education. \n'),
  HumanMessage(content='How many years of work experience do you have?'),
  AIMessage(content='I   have   two   years   of   work   experience   as   an   Electrical   Engineer   in   the   power   generation   industry. \n')],
 'answer': '  I   have   been   involved   in   the   power   generation   industry   as   an   Electrical   Engineer.   I   have   two   years   of   professional   work   experience   as   an   Electrical   Engineer   in   the
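
The formatted prompts in the verbose logs show the retrieved chunks placed one after another where `{context}` appears in the template. A minimal sketch of that "stuff" step in plain Python, assuming the chunks are joined with a blank line (as the logged prompts suggest); `stuff_prompt` and the shortened `TEMPLATE` are hypothetical and only approximate what `StuffDocumentsChain` does internally:

```python
# Shortened version of the notebook's prompt template (assumption: same placeholders).
TEMPLATE = (
    "Hello! I am PeteBot, your AI assistant.\n"
    "{context}\n"
    "Question: {question}\n"
    "Answer:"
)


def stuff_prompt(chunks, question, separator="\n\n"):
    """Join all retrieved chunks into one context string and fill the template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)


prompt = stuff_prompt(
    ["Age: 26 years old", "University: Asian Institute of Technology (AIT)"],
    "How old are you?",
)
print(prompt)
```

Because every chunk goes into a single prompt, the number and size of retrieved chunks must fit within the model's input limit.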

In [47]:
prompt_question = "Can you describe your current role or job responsibilities?"
answer = chain({"question":prompt_question})
answer

> Entering new ConversationalRetrievalChain chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, 
    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,
    and research interests while maintaining privacy and professionalism.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    Electrical Engineer was focused on the operation and maintenance of a natural gas power plant 
in Rayong, Thailand. 
Key responsibilities included: 
Monitoring power plant operations to ensure efficient electricity production 
Maintaining electrical systems and equipment to prevent outages or failures 
Ensuring energy efficiency and optimizing power generation processes 
Collaborating with cross-functi

{'question': 'Can you describe your current role or job responsibilities?',
 'chat_history': [HumanMessage(content='What major or field of study did you pursue during your education?'),
  AIMessage(content='  I   pursued   a   Bachelor’s   degree   in   Electrical   Engineering   during   my   education. \n'),
  HumanMessage(content='How many years of work experience do you have?'),
  AIMessage(content='I   have   two   years   of   work   experience   as   an   Electrical   Engineer   in   the   power   generation   industry. \n'),
  HumanMessage(content='What type of work or industry have you been involved in?'),
  AIMessage(content='  I   have   been   involved   in   the   power   generation   industry   as   an   Electrical   Engineer.   I   have   two   years   of   professional   work   experience   as   an   Electrical   Engineer   in   the   power   generation   industry.   I   have   also   gained   hands-on   experience   in   data   analysis,   machine   learning,   and   A

In [48]:
prompt_question = "What are your core beliefs regarding the role of technology in shaping society?"
answer = chain({"question":prompt_question})
answer

> Entering new ConversationalRetrievalChain chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, 
    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,
    and research interests while maintaining privacy and professionalism.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    Cultural values play a crucial role in shaping technological advancements. Since technology 
affects all aspects of human life, it should respect and reflect diverse cultures, traditions, and 
ethical considerations. 
 Inclusivity & Diversity – AI systems should be designed to understand and respect cultural 
differences to avoid biases in language models, facial recognition, and decision-making 
algor

{'question': 'What are your core beliefs regarding the role of technology in shaping society?',
 'chat_history': [HumanMessage(content='How many years of work experience do you have?'),
  AIMessage(content='I   have   two   years   of   work   experience   as   an   Electrical   Engineer   in   the   power   generation   industry. \n'),
  HumanMessage(content='What type of work or industry have you been involved in?'),
  AIMessage(content='  I   have   been   involved   in   the   power   generation   industry   as   an   Electrical   Engineer.   I   have   two   years   of   professional   work   experience   as   an   Electrical   Engineer   in   the   power   generation   industry.   I   have   also   gained   hands-on   experience   in   data   analysis,   machine   learning,   and   AI   through   my   work   at   a   natural   gas   power   plant   in   Rayong,   Thailand.   I   am   currently   pursuing   my   Master’s   degree   in   Data   Science   and   AI   at   AIT,   wher

In [49]:
prompt_question = "How do you think cultural values should influence technological advancements?"
answer = chain({"question":prompt_question})
answer

> Entering new ConversationalRetrievalChain chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, 
    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,
    and research interests while maintaining privacy and professionalism.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    Cultural values play a crucial role in shaping technological advancements. Since technology 
affects all aspects of human life, it should respect and reflect diverse cultures, traditions, and 
ethical considerations. 
 Inclusivity & Diversity – AI systems should be designed to understand and respect cultural 
differences to avoid biases in language models, facial recognition, and decision-making 
algor

{'question': 'How do you think cultural values should influence technological advancements?',
 'chat_history': [HumanMessage(content='What type of work or industry have you been involved in?'),
  AIMessage(content='  I   have   been   involved   in   the   power   generation   industry   as   an   Electrical   Engineer.   I   have   two   years   of   professional   work   experience   as   an   Electrical   Engineer   in   the   power   generation   industry.   I   have   also   gained   hands-on   experience   in   data   analysis,   machine   learning,   and   AI   through   my   work   at   a   natural   gas   power   plant   in   Rayong,   Thailand.   I   am   currently   pursuing   my   Master’s   degree   in   Data   Science   and   AI   at   AIT,   where   I   am   deepening   my   expertise   in   areas   such   as   machine   learning,   deep   learning,   and   NLP.   My   passion   for   engineering   and   AI   has   led   me   to   pursue   studies   and   a   career   in

In [50]:
prompt_question = "As a master’s student, what is the most challenging aspect of your studies so far?"
answer = chain({"question":prompt_question})
filtered_answer = {key: answer[key] for key in ["question", "answer"] if key in answer}
print(filtered_answer)

> Entering new ConversationalRetrievalChain chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, 
    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,
    and research interests while maintaining privacy and professionalism.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    updated with new models, techniques, and technologies is challenging but exciting. 
 Technical complexity – Some areas, such as deep learning, NLP, and big data processing, 
involve complex mathematical and programming concepts that require deep understanding and 
practice. 
Despite these challenges, I enjoy learning and exploring new AI techniques that can be applied in 
real-world scenarios. 
During 

In [51]:
prompt_question = "What specific research interests or academic goals do you hope to achieve during your time as a master’s student?"
answer = chain({"question":prompt_question})
filtered_answer = {key: answer[key] for key in ["question", "answer"] if key in answer}
print(filtered_answer)

> Entering new ConversationalRetrievalChain chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Hello! I am PeteBot, your AI assistant, here to answer questions about Pete in a polite, 
    informative, and structured manner. My goal is to provide accurate responses about Pete’s background, education, work experience,
    and research interests while maintaining privacy and professionalism.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    updated with new models, techniques, and technologies is challenging but exciting. 
 Technical complexity – Some areas, such as deep learning, NLP, and big data processing, 
involve complex mathematical and programming concepts that require deep understanding and 
practice. 
Despite these challenges, I enjoy learning and exploring new AI techniques that can be applied in 
real-world scenarios. 
During 