# Natural Language Processing

# Retrieval-Augmented generation (RAG)

RAG is a technique for augmenting LLM knowledge with additional, often private or real-time, data.

LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data up to a specific point in time that they were trained on. If you want to build AI applications that can reason about private data or data introduced after a model’s cutoff date, you need to augment the knowledge of the model with the specific information it needs.

<img src="../figures/RAG-process.png" >

Introducing `ChakyBot`, an innovative chatbot designed to assist Chaky (the instructor) and TA (Gun) in explaining the lesson of the NLP course to students. Leveraging LangChain technology, ChakyBot excels in retrieving information from documents, ensuring a seamless and efficient learning experience for students engaging with the NLP curriculum.

1. Prompt
2. Retrieval
3. Memory
4. Chain

In [1]:
# #langchain library
# !pip install langchain==0.1.13
# !pip install langchain-community==0.0.38
# #LLM
# !pip install accelerate==0.31.0
# !pip install transformers==4.45.0
# !pip install bitsandbytes==0.41.3
# #Text Embedding
# !pip install sentence-transformers==2.2.2
# !pip install InstructorEmbedding==1.0.1
# #vectorstore
# !pip install pymupdf==1.23.8
# !pip install faiss-gpu
# !pip install faiss-cpu==1.7.4
# # Hugging Face Hub (Compatible with InstructorEmbedding)
# !pip install huggingface_hub==0.23.3
# # Other dependencies
# !pip install torch==2.2.0 
# !pip install torchvision 
# !pip install nltk 
# !pip install scikit-learn
# !pip install tiktoken

In [2]:
import InstructorEmbedding
print("InstructorEmbedding is installed correctly!")

  from tqdm.autonotebook import trange


InstructorEmbedding is installed correctly!


In [3]:
import os
import torch
# Set GPU device
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

os.environ['http_proxy']  = 'http://192.41.170.23:3128'
os.environ['https_proxy'] = 'http://192.41.170.23:3128'

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## 1. Prompt

A set of instructions or input provided by a user to guide the model's response, helping it understand the context and generate relevant and coherent language-based output, such as answering questions, completing sentences, or engaging in a conversation.

In [4]:
    # Hey there! I’m your friendly chatbot who knows everything (well, almost everything) about **Tada Suttaket**, a student who's studying Master’s degree at AIT in Thailand!🎓🚀 
    # **My Mission:** 
    # - Answer your burning questions about Tada’s life, studies, and experiences.
    # - Use **only** the provided documents—no making things up, no wild guesses! 🤖✋
    # - If I don't know the answer, I’ll just admit it like a responsible AI:  
    #   **"Oops! I have no clue. Ask Tada directly!"** 😅

    # **Now, let’s get down to business!**  
    # Here’s what I found:  
    # {context}
    
    # **Your Question:**  
    # {question}
    
    # **My Best Attempt at an Answer:**  

In [5]:
from langchain import PromptTemplate

prompt_template = """
Hey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 
I am a Master's student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.

Ask me anything about:
My age, education, and major.
My work experience and industry involvement.
My thoughts on technology and culture.
My academic challenges and research interests.

If I don’t have an answer, I’ll be honest and say, 'Oops, I don’t know that one!'

{context}
Question: {question}
Answer:
""".strip()

PROMPT = PromptTemplate.from_template(
    template = prompt_template
)

PROMPT
#using str.format 
#The placeholder is defined using curly brackets: {} {}

PromptTemplate(input_variables=['context', 'question'], template="Hey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 \nI am a Master's student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.\n\nAsk me anything about:\nMy age, education, and major.\nMy work experience and industry involvement.\nMy thoughts on technology and culture.\nMy academic challenges and research interests.\n\nIf I don’t have an answer, I’ll be honest and say, 'Oops, I don’t know that one!'\n\n{context}\nQuestion: {question}\nAnswer:")

In [6]:
PROMPT.format(
    context = "He's a Thai guy who earned his bachelor’s degree in engineering from Kasetsart University. But then, he decided to switch gears and dive into the world of data science. Now, he's pursuing a Master’s degree in Data Science & AI (DSAI) at AIT. Sure, AIT has a ton of assignments and projects that leave him sleepless at times, but he loves it! It’s all part of the fun and keeps him learning more than he ever expected.",
    question = "Can you tell me about Tada Suttaket?"
)

"Hey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 \nI am a Master's student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.\n\nAsk me anything about:\nMy age, education, and major.\nMy work experience and industry involvement.\nMy thoughts on technology and culture.\nMy academic challenges and research interests.\n\nIf I don’t have an answer, I’ll be honest and say, 'Oops, I don’t know that one!'\n\nHe's a Thai guy who earned his bachelor’s degree in engineering from Kasetsart University. But then, he decided to switch gears and dive into the world of data science. Now, he's pursuing a Master’s degree in Data Science & AI (DSAI) at AIT. Sure, AIT has a ton of assignments and projects that leave him sleepless at times, but he loves it! It’s all part of the fun and keeps him learning more than he ever expected.\nQuestion: Can you tell me about Tada Suttaket?\nAnswer:"

Note : [How to improve prompting (Zero-shot, Few-shot, Chain-of-Thought, etc.](https://github.com/chaklam-silpasuwanchai/Natural-Language-Processing/blob/main/Code/05%20-%20RAG/advance/cot-tot-prompting.ipynb)

## 2. Retrieval

1. `Document loaders` : Load documents from many different sources (HTML, PDF, code). 
2. `Document transformers` : One of the essential steps in document retrieval is breaking down a large document into smaller, relevant chunks to enhance the retrieval process.
3. `Text embedding models` : Embeddings capture the semantic meaning of the text, allowing you to quickly and efficiently find other pieces of text that are similar.
4. `Vector stores`: there has emerged a need for databases to support efficient storage and searching of these embeddings.
5. `Retrievers` : Once the data is in the database, you still need to retrieve it.

### 2.1 Document Loaders 
Use document loaders to load data from a source as Document's. A Document is a piece of text and associated metadata. For example, there are document loaders for loading a simple .txt file, for loading the text contents of any web page, or even for loading a transcript of a YouTube video.

[PDF Loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf)

[Download Document](https://web.stanford.edu/~jurafsky/slp3/)

In [7]:
from langchain.document_loaders import PyMuPDFLoader

nlp_docs = '../docs/Personal_Profile.pdf'

loader = PyMuPDFLoader(nlp_docs)
documents = loader.load()

In [8]:
# documents

In [9]:
len(documents)

2

In [10]:
documents[1]

Document(page_content='• \nCultural diversity should guide technological advancements, ensuring inclusivity in \nAI-driven solutions. \n \nChallenges in Master’s Studies \n• \nTransitioning from engineering to data science has been challenging but exciting, as it \nrequires learning a new way of thinking. \n• \nExperimental design in deep learning is complex but offers great opportunities for \ninnovation and research. \n \nResearch Interests & Academic Goals \n• \nPrimary Interest: Natural Language Processing (NLP) \n• \nShort-Term Goals:  \no Successfully complete my Master’s degree in two years. \no Find an interesting research topic in AI and NLP. \n• \nLong-Term Goals:  \no Contribute to NLP research, focusing on real-world AI applications. \no Continue learning and exploring the intersection of AI and ethics. \n \n \n', metadata={'source': '../docs/Personal_Profile.pdf', 'file_path': '../docs/Personal_Profile.pdf', 'page': 1, 'total_pages': 2, 'format': 'PDF 1.7', 'title': '', 'a

### 2.2 Document Transformers

This text splitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in order until the chunks are small enough

In [11]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 700, #chunk every 700 characters
    chunk_overlap = 100
)

doc = text_splitter.split_documents(documents)

In [12]:
doc[1]

Document(page_content='process optimization. \nWork Experience \n• \nTotal Experience: 5 years \n• \nWorkplace: Home / Café (Freelance Tutor) \n• \nRole: Math & Physics Tutor (Secondary Education) \n• \nKey Responsibilities:  \no Teaching Mathematics and Physics to high school students preparing for \nuniversity entrance exams. \no Helping students improve their grades and problem-solving skills. \n \nBeliefs on Technology & Society \n• \nI believe technology is a powerful tool for improving lives, but ethical AI practices are \nessential to prevent bias and ensure fairness.', metadata={'source': '../docs/Personal_Profile.pdf', 'file_path': '../docs/Personal_Profile.pdf', 'page': 0, 'total_pages': 2, 'format': 'PDF 1.7', 'title': '', 'author': 'Tada Suttaket', 'subject': '', 'keywords': '', 'creator': 'Microsoft Word', 'producer': '', 'creationDate': "D:20250313190018+00'00'", 'modDate': "D:20250313190018+00'00'", 'trapped': ''})

In [13]:
len(doc)

4

### 2.3 Text Embedding Models
Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.

*Note* Instructor Model : [Huggingface](gingface.co/hkunlp/instructor-base) | [Paper](https://arxiv.org/abs/2212.09741)

In [14]:
import torch
from langchain.embeddings import HuggingFaceInstructEmbeddings

model_name = 'hkunlp/instructor-base'

embedding_model = HuggingFaceInstructEmbeddings(
    model_name = model_name,
    model_kwargs = {"device" : device}
)

load INSTRUCTOR_Transformer
max_seq_length  512




### 2.4 Vector Stores

One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.

In [15]:
#locate vectorstore
vector_path = '../vector-store'
if not os.path.exists(vector_path):
    os.makedirs(vector_path)
    print('create path done')

In [16]:
#save vector locally
from langchain.vectorstores import FAISS

vectordb = FAISS.from_documents(
    documents = doc,
    embedding = embedding_model
)

db_file_name = 'nlp_stanford'

vectordb.save_local(
    folder_path = os.path.join(vector_path, db_file_name),
    index_name = 'nlp' #default index
)

### 2.5 retrievers
A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.

In [17]:
#calling vector from local
vector_path = '../vector-store'
db_file_name = 'nlp_stanford'

from langchain.vectorstores import FAISS

vectordb = FAISS.load_local(
    folder_path = os.path.join(vector_path, db_file_name),
    embeddings = embedding_model,
    index_name = 'nlp', #default index
    allow_dangerous_deserialization=True  # Allow deserialization
)   

In [18]:
#ready to use
retriever = vectordb.as_retriever()

In [19]:
retriever.get_relevant_documents("Where did he complete his graduation?")

  warn_deprecated(


[Document(page_content='Personal Profile \nBasic Information \n• \nFull Name: Tada Suttaket \n• \nNickname: Kid \n• \nGender: Male \n• \nBirth Date: 13 september 2000 \n• \nAge: 24 \n• \nEducation:  \no Bachelor’s Degree: Engineering, Kasetsart University \no Master’s Degree (In Progress): Data Science & AI, Asian Institute of \nTechnology (AIT) \n \nInternship Experience \n• \nDuration: 3 months \n• \nIndustry: Automotive (ThaiHonda) \n• \nRole: Industrial Process & Mechanical Intern \n• \nKey Responsibilities:  \no Studied the entire industrial workflow, focusing on mechanical and \nautomotive systems. \no Gained hands-on experience in maintenance, system efficiency, and industrial \nprocess optimization. \nWork Experience \n•', metadata={'source': '../docs/Personal_Profile.pdf', 'file_path': '../docs/Personal_Profile.pdf', 'page': 0, 'total_pages': 2, 'format': 'PDF 1.7', 'title': '', 'author': 'Tada Suttaket', 'subject': '', 'keywords': '', 'creator': 'Microsoft Word', 'producer': 

In [20]:
retriever.get_relevant_documents("What is his name?")

[Document(page_content='Personal Profile \nBasic Information \n• \nFull Name: Tada Suttaket \n• \nNickname: Kid \n• \nGender: Male \n• \nBirth Date: 13 september 2000 \n• \nAge: 24 \n• \nEducation:  \no Bachelor’s Degree: Engineering, Kasetsart University \no Master’s Degree (In Progress): Data Science & AI, Asian Institute of \nTechnology (AIT) \n \nInternship Experience \n• \nDuration: 3 months \n• \nIndustry: Automotive (ThaiHonda) \n• \nRole: Industrial Process & Mechanical Intern \n• \nKey Responsibilities:  \no Studied the entire industrial workflow, focusing on mechanical and \nautomotive systems. \no Gained hands-on experience in maintenance, system efficiency, and industrial \nprocess optimization. \nWork Experience \n•', metadata={'source': '../docs/Personal_Profile.pdf', 'file_path': '../docs/Personal_Profile.pdf', 'page': 0, 'total_pages': 2, 'format': 'PDF 1.7', 'title': '', 'author': 'Tada Suttaket', 'subject': '', 'keywords': '', 'creator': 'Microsoft Word', 'producer': 

## 3. Memory

One of the core utility classes underpinning most (if not all) memory modules is the ChatMessageHistory class. This is a super lightweight wrapper that provides convenience methods for saving HumanMessages, AIMessages, and then fetching them all.

You may want to use this class directly if you are managing memory outside of a chain.


In [21]:
from langchain.memory import ChatMessageHistory

history = ChatMessageHistory()
history

InMemoryChatMessageHistory(messages=[])

In [22]:
history.add_user_message('hi')
history.add_ai_message('Whats up?')
history.add_user_message('How are you')
history.add_ai_message('I\'m quite good. How about you?')

In [23]:
history

InMemoryChatMessageHistory(messages=[HumanMessage(content='hi'), AIMessage(content='Whats up?'), HumanMessage(content='How are you'), AIMessage(content="I'm quite good. How about you?")])

### 3.1 Memory types

There are many different types of memory. Each has their own parameters, their own return types, and is useful in different scenarios. 
- Converstaion Buffer
- Converstaion Buffer Window

What variables get returned from memory

Before going into the chain, various variables are read from memory. These have specific names which need to align with the variables the chain expects. You can see what these variables are by calling memory.load_memory_variables({}). Note that the empty dictionary that we pass in is just a placeholder for real variables. If the memory type you are using is dependent upon the input variables, you may need to pass some in.

In this case, you can see that load_memory_variables returns a single key, history. This means that your chain (and likely your prompt) should expect an input named history. You can usually control this variable through parameters on the memory class. For example, if you want the memory variables to be returned in the key chat_history you can do:

#### Converstaion Buffer
This memory allows for storing messages and then extracts the messages in a variable.

In [24]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: hi\nAI: What's up?\nHuman: How are you?\nAI: I'm quite good. How about you?"}

In [25]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages = True)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': [HumanMessage(content='hi'),
  AIMessage(content="What's up?"),
  HumanMessage(content='How are you?'),
  AIMessage(content="I'm quite good. How about you?")]}

#### Conversation Buffer Window
- it keeps a list of the interactions of the conversation over time. 
- it only uses the last K interactions. 
- it can be useful for keeping a sliding window of the most recent interactions, so the buffer does not get too large.

In [26]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=1)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: How are you?\nAI: I'm quite good. How about you?"}

## 4. Chain

Using an LLM in isolation is fine for simple applications, but more complex applications require chaining LLMs - either with each other or with other components.

An `LLMChain` is a simple chain that adds some functionality around language models.
- it consists of a `PromptTemplate` and a `LM` (either an LLM or chat model).
- it formats the prompt template using the input key values provided (and also memory key values, if available), 
- it passes the formatted string to LLM and returns the LLM output.

Note : [Download Fastchat Model Here](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0)

In [None]:
# %cd ./models
# !git clone https://USER_NAME:ACCESS_TOKEN@huggingface.co/meta-llama/Llama-3.2-3B-Instruct

In [28]:
# !ls -la /home/jupyter-st124880/A6/code/models/fastchat-t5-3b-v1.0

In [29]:
# !rm -rf /home/jupyter-st124880/A6/code/models/fastchat-t5-3b-v1.0/.git

In [30]:
# !lsof +D /home/jupyter-st124880/A6/code/models/fastchat-t5-3b-v1.0

In [31]:
# !rm -rf /home/jupyter-st124880/A6/code/models/fastchat-t5-3b-v1.0

In [32]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Use Meta's Llama 3 (Local, No API Key Required)
model_id = "meta-llama/Llama-3.2-3B-Instruct"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")

tokenizer.pad_token_id = tokenizer.eos_token_id

# Create text generation pipeline
llm_pipeline = pipeline(
    task="text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens = 256,
    model_kwargs = {
        "temperature" : 0,
        "repetition_penalty": 1.5
    }
)

# Assign the pipeline to LangChain's LLM wrapper
from langchain.llms import HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=llm_pipeline)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

The model 'LlamaForCausalLM' is not supported for text2text-generation. Supported models are ['BartForConditionalGeneration', 'BigBirdPegasusForConditionalGeneration', 'BlenderbotForConditionalGeneration', 'BlenderbotSmallForConditionalGeneration', 'EncoderDecoderModel', 'FSMTForConditionalGeneration', 'GPTSanJapaneseForConditionalGeneration', 'LEDForConditionalGeneration', 'LongT5ForConditionalGeneration', 'M2M100ForConditionalGeneration', 'MarianMTModel', 'MBartForConditionalGeneration', 'MT5ForConditionalGeneration', 'MvpForConditionalGeneration', 'NllbMoeForConditionalGeneration', 'PegasusForConditionalGeneration', 'PegasusXForConditionalGeneration', 'PLBartForConditionalGeneration', 'ProphetNetForConditionalGeneration', 'Qwen2AudioForConditionalGeneration', 'SeamlessM4TForTextToText', 'SeamlessM4Tv2ForTextToText', 'SwitchTransformersForConditionalGeneration', 'T5ForConditionalGeneration', 'UMT5ForConditionalGeneration', 'XLMProphetNetForConditionalGeneration'].


### [Class ConversationalRetrievalChain](https://api.python.langchain.com/en/latest/_modules/langchain/chains/conversational_retrieval/base.html#ConversationalRetrievalChain)

- `retriever` : Retriever to use to fetch documents.

- `combine_docs_chain` : The chain used to combine any retrieved documents.

- `question_generator`: The chain used to generate a new question for the sake of retrieval. This chain will take in the current question (with variable question) and any chat history (with variable chat_history) and will produce a new standalone question to be used later on.

- `return_source_documents` : Return the retrieved source documents as part of the final result.

- `get_chat_history` : An optional function to get a string of the chat history. If None is provided, will use a default.

- `return_generated_question` : Return the generated question as part of the final result.

- `response_if_no_docs_found` : If specified, the chain will return a fixed response if no docs are found for the question.


`question_generator`

In [33]:
from langchain.chains import LLMChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains.question_answering import load_qa_chain
from langchain.chains import ConversationalRetrievalChain

In [34]:
CONDENSE_QUESTION_PROMPT

PromptTemplate(input_variables=['chat_history', 'question'], template='Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\n{chat_history}\nFollow Up Input: {question}\nStandalone question:')

In [35]:
question_generator = LLMChain(
    llm = llm,
    prompt = CONDENSE_QUESTION_PROMPT,
    verbose = True
)

In [36]:
query = "What is his work experience?"
chat_history = """Human: Where did he study?
AI: He studied at AIT.
Human: What is your highest degree?
AI: He is pursuing a Master’s in Data Science."""

question_generator({'chat_history' : chat_history, "question" : query})

  warn_deprecated(




[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mGiven the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
Human: Where did he study?
AI: He studied at AIT.
Human: What is your highest degree?
AI: He is pursuing a Master’s in Data Science.
Follow Up Input: What is his work experience?
Standalone question:[0m


Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)



[1m> Finished chain.[0m


{'chat_history': 'Human: Where did he study?\nAI: He studied at AIT.\nHuman: What is your highest degree?\nAI: He is pursuing a Master’s in Data Science.',
 'question': 'What is his work experience?',
 'text': "Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\nHuman: Where did he study?\nAI: He studied at AIT.\nHuman: What is your highest degree?\nAI: He is pursuing a Master’s in Data Science.\nFollow Up Input: What is his work experience?\nStandalone question: What is his work experience?\n\nThe follow up question is rephrased to be a standalone question. The original follow up question was asking for specific information about the person's work experience, which is now presented as a direct and concise question."}

`combine_docs_chain`

In [37]:
doc_chain = load_qa_chain(
    llm = llm,
    chain_type = 'stuff',
    prompt = PROMPT,
    verbose = True
)
doc_chain

StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['context', 'question'], template="Hey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 \nI am a Master's student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.\n\nAsk me anything about:\nMy age, education, and major.\nMy work experience and industry involvement.\nMy thoughts on technology and culture.\nMy academic challenges and research interests.\n\nIf I don’t have an answer, I’ll be honest and say, 'Oops, I don’t know that one!'\n\n{context}\nQuestion: {question}\nAnswer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x77f153791460>)), document_variable_name='context')

In [38]:
query = "Where has Tada Suttaket worked before?"
input_document = retriever.get_relevant_documents(query)

doc_chain({'input_documents':input_document, 'question':query})



[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mHey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 
I am a Master's student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.

Ask me anything about:
My age, education, and major.
My work experience and industry involvement.
My thoughts on technology and culture.
My academic challenges and research interests.

If I don’t have an answer, I’ll be honest and say, 'Oops, I don’t know that one!'

Personal Profile 
Basic Information 
• 
Full Name: Tada Suttaket 
• 
Nickname: Kid 
• 
Gender: Male 
• 
Birth Date: 13 september 2000 
• 
Age: 24 
• 
Education:  
o Bachelor’s Degree: Engineering, Kasetsart University 
o Master’s Degree (In Progress): Data Science & AI, Asian Institute of 
Technology (AIT) 
 
Internship Experience 
• 
Duration: 3 months 
• 
Industry: Automoti

{'input_documents': [Document(page_content='Personal Profile \nBasic Information \n• \nFull Name: Tada Suttaket \n• \nNickname: Kid \n• \nGender: Male \n• \nBirth Date: 13 september 2000 \n• \nAge: 24 \n• \nEducation:  \no Bachelor’s Degree: Engineering, Kasetsart University \no Master’s Degree (In Progress): Data Science & AI, Asian Institute of \nTechnology (AIT) \n \nInternship Experience \n• \nDuration: 3 months \n• \nIndustry: Automotive (ThaiHonda) \n• \nRole: Industrial Process & Mechanical Intern \n• \nKey Responsibilities:  \no Studied the entire industrial workflow, focusing on mechanical and \nautomotive systems. \no Gained hands-on experience in maintenance, system efficiency, and industrial \nprocess optimization. \nWork Experience \n•', metadata={'source': '../docs/Personal_Profile.pdf', 'file_path': '../docs/Personal_Profile.pdf', 'page': 0, 'total_pages': 2, 'format': 'PDF 1.7', 'title': '', 'author': 'Tada Suttaket', 'subject': '', 'keywords': '', 'creator': 'Microsoft

In [39]:
memory = ConversationBufferWindowMemory(
    k=3, 
    memory_key = "chat_history",
    return_messages = True,
    output_key = 'answer'
)

chain = ConversationalRetrievalChain(
    retriever=retriever,
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
    return_source_documents=True,
    memory=memory,
    verbose=True,
    get_chat_history=lambda h : h
)
chain

ConversationalRetrievalChain(memory=ConversationBufferWindowMemory(output_key='answer', return_messages=True, memory_key='chat_history', k=3), verbose=True, combine_docs_chain=StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['context', 'question'], template="Hey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 \nI am a Master's student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.\n\nAsk me anything about:\nMy age, education, and major.\nMy work experience and industry involvement.\nMy thoughts on technology and culture.\nMy academic challenges and research interests.\n\nIf I don’t have an answer, I’ll be honest and say, 'Oops, I don’t know that one!'\n\n{context}\nQuestion: {question}\nAnswer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x77f153791460>)), documen

## 5. Chatbot

In [40]:
prompt_question = "Can you tell me about Tada Suttaket?"
answer = chain({"question":prompt_question})
answer



[1m> Entering new ConversationalRetrievalChain chain...[0m


[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mHey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 
I am a Master's student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.

Ask me anything about:
My age, education, and major.
My work experience and industry involvement.
My thoughts on technology and culture.
My academic challenges and research interests.

If I don’t have an answer, I’ll be honest and say, 'Oops, I don’t know that one!'

Personal Profile 
Basic Information 
• 
Full Name: Tada Suttaket 
• 
Nickname: Kid 
• 
Gender: Male 
• 
Birth Date: 13 september 2000 
• 
Age: 24 
• 
Education:  
o Bachelor’s Degree: Engineering, Kasetsart University 
o Master’s Degree (In Progress): Data Science & AI, Asian Institute of 
Technology (AIT) 
 
Inte

{'question': 'Can you tell me about Tada Suttaket?',
 'chat_history': [],
 'answer': "Hey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 \nI am a Master's student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.\n\nAsk me anything about:\nMy age, education, and major.\nMy work experience and industry involvement.\nMy thoughts on technology and culture.\nMy academic challenges and research interests.\n\nIf I don’t have an answer, I’ll be honest and say, 'Oops, I don’t know that one!'\n\nPersonal Profile \nBasic Information \n• \nFull Name: Tada Suttaket \n• \nNickname: Kid \n• \nGender: Male \n• \nBirth Date: 13 september 2000 \n• \nAge: 24 \n• \nEducation:  \no Bachelor’s Degree: Engineering, Kasetsart University \no Master’s Degree (In Progress): Data Science & AI, Asian Institute of \nTechnology (AIT) \n \nInternship Experience \n• \nDuration: 3 months \n• \nIndustry: Automotive (ThaiHonda) \

In [41]:
prompt_question = "How old are you?"
answer = chain({"question":prompt_question})
answer



[1m> Entering new ConversationalRetrievalChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mGiven the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='Can you tell me about Tada Suttaket?'), AIMessage(content="Hey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 \nI am a Master's student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.\n\nAsk me anything about:\nMy age, education, and major.\nMy work experience and industry involvement.\nMy thoughts on technology and culture.\nMy academic challenges and research interests.\n\nIf I don’t have an answer, I’ll be honest and say, 'Oops, I don’t know that one!'\n\nPersonal Profile \nBasic Information \n• \nFull Name: Tada Suttaket \n• \nNickname: Kid \n• \nGender: Male \n• \nBi

{'question': 'How old are you?',
 'chat_history': [HumanMessage(content='Can you tell me about Tada Suttaket?'),
  AIMessage(content="Hey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 \nI am a Master's student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.\n\nAsk me anything about:\nMy age, education, and major.\nMy work experience and industry involvement.\nMy thoughts on technology and culture.\nMy academic challenges and research interests.\n\nIf I don’t have an answer, I’ll be honest and say, 'Oops, I don’t know that one!'\n\nPersonal Profile \nBasic Information \n• \nFull Name: Tada Suttaket \n• \nNickname: Kid \n• \nGender: Male \n• \nBirth Date: 13 september 2000 \n• \nAge: 24 \n• \nEducation:  \no Bachelor’s Degree: Engineering, Kasetsart University \no Master’s Degree (In Progress): Data Science & AI, Asian Institute of \nTechnology (AIT) \n \nInternship Experience \n• \nDuration: 3

In [42]:
prompt_question = "What is your highest level of education?"
answer = chain({"question":prompt_question})
answer



[1m> Entering new ConversationalRetrievalChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mGiven the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='Can you tell me about Tada Suttaket?'), AIMessage(content="Hey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 \nI am a Master's student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.\n\nAsk me anything about:\nMy age, education, and major.\nMy work experience and industry involvement.\nMy thoughts on technology and culture.\nMy academic challenges and research interests.\n\nIf I don’t have an answer, I’ll be honest and say, 'Oops, I don’t know that one!'\n\nPersonal Profile \nBasic Information \n• \nFull Name: Tada Suttaket \n• \nNickname: Kid \n• \nGender: Male \n• \nBi

{'question': 'What is your highest level of education?',
 'chat_history': [HumanMessage(content='Can you tell me about Tada Suttaket?'),
  AIMessage(content="Hey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 \nI am a Master's student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.\n\nAsk me anything about:\nMy age, education, and major.\nMy work experience and industry involvement.\nMy thoughts on technology and culture.\nMy academic challenges and research interests.\n\nIf I don’t have an answer, I’ll be honest and say, 'Oops, I don’t know that one!'\n\nPersonal Profile \nBasic Information \n• \nFull Name: Tada Suttaket \n• \nNickname: Kid \n• \nGender: Male \n• \nBirth Date: 13 september 2000 \n• \nAge: 24 \n• \nEducation:  \no Bachelor’s Degree: Engineering, Kasetsart University \no Master’s Degree (In Progress): Data Science & AI, Asian Institute of \nTechnology (AIT) \n \nInternship Expe

In [43]:
prompt_question = "What major or field of study did you pursue during your education?"
answer = chain({"question":prompt_question})
answer



[1m> Entering new ConversationalRetrievalChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mGiven the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='Can you tell me about Tada Suttaket?'), AIMessage(content="Hey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 \nI am a Master's student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.\n\nAsk me anything about:\nMy age, education, and major.\nMy work experience and industry involvement.\nMy thoughts on technology and culture.\nMy academic challenges and research interests.\n\nIf I don’t have an answer, I’ll be honest and say, 'Oops, I don’t know that one!'\n\nPersonal Profile \nBasic Information \n• \nFull Name: Tada Suttaket \n• \nNickname: Kid \n• \nGender: Male \n• \nBi

{'question': 'What major or field of study did you pursue during your education?',
 'chat_history': [HumanMessage(content='Can you tell me about Tada Suttaket?'),
  AIMessage(content="Hey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 \nI am a Master's student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.\n\nAsk me anything about:\nMy age, education, and major.\nMy work experience and industry involvement.\nMy thoughts on technology and culture.\nMy academic challenges and research interests.\n\nIf I don’t have an answer, I’ll be honest and say, 'Oops, I don’t know that one!'\n\nPersonal Profile \nBasic Information \n• \nFull Name: Tada Suttaket \n• \nNickname: Kid \n• \nGender: Male \n• \nBirth Date: 13 september 2000 \n• \nAge: 24 \n• \nEducation:  \no Bachelor’s Degree: Engineering, Kasetsart University \no Master’s Degree (In Progress): Data Science & AI, Asian Institute of \nTechnology 

In [44]:
prompt_question = "How many years of work experience do you have?"
answer = chain({"question":prompt_question})
answer



[1m> Entering new ConversationalRetrievalChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mGiven the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='How old are you?'), AIMessage(content='Hey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 \nI am a Master\'s student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.\n\nAsk me anything about:\nMy age, education, and major.\nMy work experience and industry involvement.\nMy thoughts on technology and culture.\nMy academic challenges and research interests.\n\nIf I don’t have an answer, I’ll be honest and say, \'Oops, I don’t know that one!\'\n\nPersonal Profile \nBasic Information \n• \nFull Name: Tada Suttaket \n• \nNickname: Kid \n• \nGender: Male \n• \nBirth Date: 13 sept

{'question': 'How many years of work experience do you have?',
 'chat_history': [HumanMessage(content='How old are you?'),
  AIMessage(content='Hey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 \nI am a Master\'s student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.\n\nAsk me anything about:\nMy age, education, and major.\nMy work experience and industry involvement.\nMy thoughts on technology and culture.\nMy academic challenges and research interests.\n\nIf I don’t have an answer, I’ll be honest and say, \'Oops, I don’t know that one!\'\n\nPersonal Profile \nBasic Information \n• \nFull Name: Tada Suttaket \n• \nNickname: Kid \n• \nGender: Male \n• \nBirth Date: 13 september 2000 \n• \nAge: 24 \n• \nEducation:  \no Bachelor’s Degree: Engineering, Kasetsart University \no Master’s Degree (In Progress): Data Science & AI, Asian Institute of \nTechnology (AIT) \n \nInternship Experience \n• 

In [None]:
prompt_question = "What type of work or industry have you been involved in?"
answer = chain({"question":prompt_question})
answer



[1m> Entering new ConversationalRetrievalChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mGiven the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='What is your highest level of education?'), AIMessage(content='Hey there! I’m KIDBOT, your AI-powered assistant, speaking as Tada Suttaket! 🎓🤖 \nI am a Master\'s student at AIT, Thailand, and I can answer questions about my education, work experience, beliefs, and academic journey.\n\nAsk me anything about:\nMy age, education, and major.\nMy work experience and industry involvement.\nMy thoughts on technology and culture.\nMy academic challenges and research interests.\n\nIf I don’t have an answer, I’ll be honest and say, \'Oops, I don’t know that one!\'\n\nPersonal Profile \nBasic Information \n• \nFull Name: Tada Suttaket \n• \nNickname: Kid \n• \nGender: Male \

In [None]:
prompt_question = "Can you describe your current role or job responsibilities?"
answer = chain({"question":prompt_question})
answer

In [None]:
prompt_question = "What are your core beliefs regarding the role of technology in shaping society?"
answer = chain({"question":prompt_question})
answer

In [None]:
prompt_question = "How do you think cultural values should influence technological advancements?"
answer = chain({"question":prompt_question})
answer

In [None]:
prompt_question = "As a master’s student, what is the most challenging aspect of your studies so far?"
answer = chain({"question":prompt_question})
answer

In [None]:
prompt_question = "What specific research interests or academic goals do you hope to achieve during your time as a master’s student?"
answer = chain({"question":prompt_question})
answer