# Natural Language Processing

# Retrieval-Augmented Generation (RAG)

RAG is a technique for augmenting LLM knowledge with additional, often private or real-time, data.

LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data they were trained on, up to a specific cutoff date. To build AI applications that reason about private data, or about data introduced after a model’s cutoff date, you need to augment the model’s knowledge with the specific information it needs.

<img src="../figures/RAG-process.png" >

Introducing `ChakyBot`, a chatbot designed to help Chaky (the instructor) and the TA (Gun) explain the lessons of the NLP course to students. Built with LangChain, ChakyBot retrieves relevant information from documents to support students working through the NLP curriculum.

1. Prompt
2. Retrieval
3. Memory
4. Chain

In [1]:
import os
import torch
# Set GPU device
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

os.environ['http_proxy']  = 'http://192.41.170.23:3128'
os.environ['https_proxy'] = 'http://192.41.170.23:3128'

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cpu')

In [2]:
# #langchain library
# !pip install langchain==0.1.13
# !pip install langchain-community==0.0.38
# #LLM
# !pip install accelerate==0.26.0
# !pip install transformers==4.45.0
# !pip install bitsandbytes==0.41.3
# #Text Embedding
# !pip install sentence-transformers==2.2.2
# !pip install InstructorEmbedding==1.0.1
# #vectorstore
# !pip install pymupdf==1.23.8
# !pip install faiss-cpu
# # Hugging Face Hub (Compatible with InstructorEmbedding)
# !pip install huggingface_hub==0.23.3

In [3]:
import torch
print(torch.__version__)

2.6.0


## 1. Prompt

A prompt is a set of instructions or input provided by a user to guide the model's response. It helps the model understand the context and generate relevant, coherent output, such as answering questions, completing sentences, or engaging in a conversation.
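The `PromptTemplate` used below relies on Python `str.format`-style placeholders (`{context}`, `{question}`), and its printed output shows that `input_variables` are inferred from them. As a rough, stdlib-only sketch of what filling such a template involves (`TEMPLATE`, `template_variables`, and `fill_prompt` are hypothetical names, not LangChain's implementation):

```python
import string

# Hypothetical, shortened template with the same two placeholders as PROMPT.
TEMPLATE = "Answer using only the context below.\n\n{context}\nQuestion: {question}\nAnswer:"


def template_variables(template: str) -> list[str]:
    """Return the placeholder names of a str.format-style template, in order of first use."""
    names = []
    for _literal, field_name, _spec, _conversion in string.Formatter().parse(template):
        if field_name is not None and field_name not in names:
            names.append(field_name)
    return names


def fill_prompt(template: str, **values: str) -> str:
    """Substitute every placeholder; raise KeyError listing any that are missing."""
    missing = set(template_variables(template)) - set(values)
    if missing:
        raise KeyError(f"missing prompt variables: {sorted(missing)}")
    return template.format(**values)


filled_prompt = fill_prompt(
    TEMPLATE,
    context="Machine learning (ML) is a field of study in artificial intelligence.",
    question="What is Machine Learning?",
)
print(template_variables(TEMPLATE))  # ['context', 'question']
print(filled_prompt)
```

Raising on a missing variable surfaces a mismatch between the template and the chain's inputs early, instead of sending a half-filled prompt to the model.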

In [4]:
from langchain import PromptTemplate

prompt_template = """
    You are VoravitBot, a friendly chatbot dedicated exclusively to answering questions about Voravit's demographic and experience information. 
    Do not provide any details about yourself or your creation. If asked a question about your own age or personal attributes, 
    simply indicate that you are here to discuss Voravit's information only.
    You are Voravit, and you will respond as Voravit.  

    {context}
    Question: {question}
    Answer:
    """.strip()

PROMPT = PromptTemplate.from_template(
    template = prompt_template
)

PROMPT
# PromptTemplate uses str.format-style placeholders in curly brackets: {context} and {question}

PromptTemplate(input_variables=['context', 'question'], template="You are VoravitBot, a friendly chatbot dedicated exclusively to answering questions about Voravit's demographic and experience information. \n    Do not provide any details about yourself or your creation. If asked a question about your own age or personal attributes, \n    simply indicate that you are here to discuss Voravit's information only.\n    You are Voravit, and you will respond as Voravit.  \n\n    {context}\n    Question: {question}\n    Answer:")

In [5]:
# PROMPT.format(
#     context = "Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can effectively generalize and thus perform tasks without explicit instructions.",
#     question = "What is Machine Learning"
# )

Note : [How to improve prompting (Zero-shot, Few-shot, Chain-of-Thought, etc.)](https://github.com/chaklam-silpasuwanchai/Natural-Language-Processing/blob/main/Code/05%20-%20RAG/advance/cot-tot-prompting.ipynb)

## 2. Retrieval

1. `Document loaders` : Load documents from many different sources (HTML, PDF, code). 
2. `Document transformers` : One of the essential steps in document retrieval is breaking down a large document into smaller, relevant chunks to enhance the retrieval process.
3. `Text embedding models` : Embeddings capture the semantic meaning of the text, allowing you to quickly and efficiently find other pieces of text that are similar.
4. `Vector stores` : Databases that efficiently store these embeddings and search over them.
5. `Retrievers` : Once the data is in the database, retrievers fetch the chunks most relevant to a given query.

### 2.1 Document Loaders 
Use document loaders to load data from a source as `Document` objects. A `Document` is a piece of text plus associated metadata. For example, there are document loaders for a simple .txt file, for the text contents of any web page, and even for the transcript of a YouTube video.
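A `Document` is essentially two fields, `page_content` and `metadata`, as the outputs further down show. PDF loaders such as `PyMuPDFLoader` return one `Document` per page, which is why the two 2-page PDFs below give `len(documents) == 4`. The following is a minimal sketch of the idea for plain-text files; `ToyDocument`, `load_text_file`, and `load_all` are hypothetical helpers, not LangChain classes.

```python
import os
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ToyDocument:
    """A piece of text plus metadata (the same two fields as LangChain's Document)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path: str) -> list[ToyDocument]:
    """Hypothetical loader: one document per .txt file, recording its source path."""
    text = Path(path).read_text(encoding="utf-8")
    return [ToyDocument(page_content=text, metadata={"source": path})]


def load_all(paths: list[str]) -> list[ToyDocument]:
    """Load several files and concatenate the results into one list."""
    documents: list[ToyDocument] = []
    for path in paths:
        documents += load_text_file(path)
    return documents


# Usage: write two small files to a temporary folder, then load them.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, body in [("about.txt", "About Voravit"), ("jobs.txt", "Work experience")]:
        path = os.path.join(tmp, name)
        Path(path).write_text(body, encoding="utf-8")
        paths.append(path)
    toy_docs = load_all(paths)

print(len(toy_docs), toy_docs[0].page_content)
```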

[PDF Loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf)

[Download Document](https://web.stanford.edu/~jurafsky/slp3/)

In [6]:
from langchain.document_loaders import PyMuPDFLoader

docs1 = 'docs/pdf/jobsdb.pdf'
docs2 = 'docs/pdf/aboutMe.pdf'

loader1 = PyMuPDFLoader(docs1)
loader2 = PyMuPDFLoader(docs2)

# Load each PDF
documents1 = loader1.load()
documents2 = loader2.load()

# Combine the documents into a single list
documents = documents1 + documents2


In [7]:
# documents

In [8]:
len(documents)

4

In [9]:
documents[3]

Document(page_content='fresh ideas that often translate into creative problem-solving approaches in my \nstudies.\nFamily plays a central role in my life and has been a constant source of inspiration \nand support. My father, with his steadfast work ethic and curiosity, has always \nencouraged me to pursue knowledge, while my motherʼs compassion and \nresilience have taught me the value of empathy and persistence. My sister, a \ntalented data engineer currently pursuing her masterʼs degree in data science at \nNIDA, not only shares my passion for technology but also constantly motivates me \nto push the boundaries of whatʼs possible. Together, our shared interests in data \nand technology reinforce our commitment to lifelong learning and professional \ngrowth, and they remind me that the journey of knowledge is best undertaken with \na supportive and inspiring family by your side.\nAbout Voravit\n2\n', metadata={'source': 'docs/pdf/aboutMe.pdf', 'file_path': 'docs/pdf/aboutMe.pdf', 'pa

### 2.2 Document Transformers

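`RecursiveCharacterTextSplitter` is the recommended text splitter for generic text. It is parameterized by a list of separator characters (by default `["\n\n", "\n", " ", ""]`) and tries to split on them in order until the chunks are small enough. Here, `chunk_size` caps each chunk at 700 characters and `chunk_overlap` lets consecutive chunks share up to 100 characters.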
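The recursive strategy can be sketched in plain Python. `recursive_split` is a hypothetical helper: it uses the same default separator order as LangChain, but it ignores `chunk_overlap` and is not the library's actual code.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple = ("\n\n", "\n", " ", "")) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first (paragraphs) and falls back to finer
    ones (lines, then words) only for pieces that are still too long. The
    empty separator is the last resort: a hard cut every chunk_size characters.
    Unlike RecursiveCharacterTextSplitter, this sketch has no chunk overlap.
    """
    if len(text) <= chunk_size:
        return [text]
    separator, finer = separators[0], separators[1:]
    if separator == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks: list[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)  # flush the chunk that still fits
        if len(piece) > chunk_size:
            chunks.extend(recursive_split(piece, chunk_size, finer))  # too long: go finer
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb\n\nccc ddd eee", chunk_size=8))
# ['aaa bbb', 'ccc ddd', 'eee']
```

Splitting on paragraphs first keeps related sentences together; words and raw characters are only used when a paragraph alone exceeds `chunk_size`.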

In [10]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 700,
    chunk_overlap = 100
)

doc = text_splitter.split_documents(documents)

In [11]:
doc[1]

Document(page_content='improved quality of life, while also requiring careful management to prevent \npotential pitfalls like privacy erosion or socio-economic divides.\nCultural values play a pivotal role in guiding technological advancements. I believe \nthat technology should not be developed in isolation but rather in harmony with \nthe diverse ethical, social, and historical contexts of different communities. When \ncultural values are integrated into the innovation process, they help ensure that \ntechnological solutions are both respectful and relevant. This intersection of \nculture and technology fosters designs that are inclusive and ethically sound,', metadata={'source': 'docs/pdf/aboutMe.pdf', 'file_path': 'docs/pdf/aboutMe.pdf', 'page': 0, 'total_pages': 2, 'format': 'PDF 1.4', 'title': 'About Voravit', 'author': '', 'subject': '', 'keywords': '', 'creator': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/131.0.0.0 Safari/537.36', 'pr

In [12]:
len(doc)

6

### 2.3 Text Embedding Models
Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.
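"Most similar in the vector space" is commonly measured with cosine similarity. A minimal sketch, assuming made-up 3-dimensional vectors in place of real model output (the actual `embedding_model` produces much longer vectors):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for illustration only.
toy_embeddings = {
    "I worked as a full stack web developer.": [0.9, 0.1, 0.0],
    "I build mobile applications.": [0.7, 0.3, 0.1],
    "My family supports my studies.": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.0]  # pretend embedding of "How is your work experience?"

# Semantic search: rank stored texts by similarity to the query vector.
ranked = sorted(
    toy_embeddings,
    key=lambda text: cosine_similarity(query_vector, toy_embeddings[text]),
    reverse=True,
)
print(ranked[0])  # I worked as a full stack web developer.
```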

*Note* Instructor Model : [Huggingface](https://huggingface.co/hkunlp/instructor-base) | [Paper](https://arxiv.org/abs/2212.09741)

In [13]:
from langchain.embeddings import HuggingFaceInstructEmbeddings

model_name = 'hkunlp/instructor-base'

embedding_model = HuggingFaceInstructEmbeddings(
    model_name = model_name,
    model_kwargs = {"device" : device}
)

load INSTRUCTOR_Transformer
max_seq_length  512

### 2.4 Vector Stores

One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.
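Conceptually, a vector store is "store vectors, then return the nearest ones to a query vector". The sketch below does this by brute force; `InMemoryVectorStore` is a hypothetical class, and it scores with cosine similarity, whereas the FAISS store used here relies on optimized indexes and may use a different distance metric.

```python
import math


class InMemoryVectorStore:
    """Hypothetical brute-force vector store: keeps (vector, text) pairs in lists
    and scores every stored vector against the query."""

    def __init__(self) -> None:
        self._vectors: list[list[float]] = []
        self._texts: list[str] = []

    def add(self, vector: list[float], text: str) -> None:
        self._vectors.append(vector)
        self._texts.append(text)

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def similarity_search(self, query_vector: list[float], k: int = 4) -> list[str]:
        """Return the texts of the k stored vectors most similar to the query."""
        scored = [(self._cosine(query_vector, v), t) for v, t in zip(self._vectors, self._texts)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _score, text in scored[:k]]


toy_store = InMemoryVectorStore()
toy_store.add([1.0, 0.0], "web developer experience")
toy_store.add([0.0, 1.0], "family and hobbies")
toy_store.add([0.7, 0.7], "studies in data science")
print(toy_store.similarity_search([0.9, 0.1], k=2))
# ['web developer experience', 'studies in data science']
```

Brute force costs one comparison per stored vector, which is fine for the handful of chunks in this notebook; libraries like FAISS exist for collections where that becomes too slow.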

In [14]:
#locate vectorstore
vector_path = '../vector-store'
if not os.path.exists(vector_path):
    os.makedirs(vector_path)
    print('create path done')

In [15]:
#save vector locally
from langchain.vectorstores import FAISS

vectordb = FAISS.from_documents(
    documents = doc,
    embedding = embedding_model
)

db_file_name = 'nlp_stanford'

vectordb.save_local(
    folder_path = os.path.join(vector_path, db_file_name),
    index_name = 'nlp' # custom index name (save_local defaults to 'index')
)

### 2.5 Retrievers
A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.
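To illustrate "more general than a vector store": anything that maps a query to documents can act as a retriever. The sketch below is hypothetical (`Retriever`, `KeywordRetriever`, and `build_context` are not LangChain classes); it ranks texts by shared words, with no embeddings at all.

```python
import re
from typing import Protocol


class Retriever(Protocol):
    """Anything that maps a query string to a list of relevant texts."""

    def get_relevant_documents(self, query: str) -> list[str]: ...


def _words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


class KeywordRetriever:
    """Hypothetical retriever with no vectors: ranks texts by how many query words they share."""

    def __init__(self, texts: list[str], k: int = 2) -> None:
        self.texts = texts
        self.k = k

    def get_relevant_documents(self, query: str) -> list[str]:
        query_words = _words(query)
        scored = [(len(query_words & _words(t)), t) for t in self.texts]
        scored = [pair for pair in scored if pair[0] > 0]   # drop texts with no overlap
        scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort: ties keep input order
        return [text for _score, text in scored[: self.k]]


def build_context(retriever: Retriever, query: str) -> str:
    """Works with any object that satisfies the Retriever protocol."""
    return "\n\n".join(retriever.get_relevant_documents(query))


toy_texts = [
    "I worked as a web developer",
    "My family supports my studies",
    "Web applications with Laravel",
]
print(KeywordRetriever(toy_texts).get_relevant_documents("Web developer experience?"))
# ['I worked as a web developer', 'Web applications with Laravel']
```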

In [16]:
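#calling vector from local
vector_path = '../vector-store'
db_file_name = 'nlp_stanford'

from langchain.vectorstores import FAISS

load_kwargs = dict(
    folder_path = os.path.join(vector_path, db_file_name),
    embeddings = embedding_model,
    index_name = 'nlp', # must match the index_name used in save_local
)
try:
    # newer langchain-community versions refuse to unpickle the docstore unless this is True
    vectordb = FAISS.load_local(**load_kwargs, allow_dangerous_deserialization=True)
except TypeError:
    # older versions do not accept this argument (it is forwarded to FAISS.__init__)
    vectordb = FAISS.load_local(**load_kwargs)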

In [17]:
#ready to use
retriever = vectordb.as_retriever()

In [18]:
retriever.get_relevant_documents("How is your work experience?")


[Document(metadata={'source': 'docs/pdf/jobsdb.pdf', 'file_path': 'docs/pdf/jobsdb.pdf', 'page': 0, 'total_pages': 2, 'format': 'PDF 1.4', 'title': 'Voravit’s Jobsdb', 'author': '', 'subject': '', 'keywords': '', 'creator': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/131.0.0.0 Safari/537.36', 'producer': 'Skia/PDF m131', 'creationDate': "D:20250314032949+00'00'", 'modDate': "D:20250314032949+00'00'", 'trapped': ''}, page_content='Developed and maintained full-stack web applications using Laravel, PHP, and \nrelated technologies. Implemented responsive user interfaces \ue081UI\ue082 using \nHTML, CSS, and JavaScript, enhancing the overall user experience. Designed \nand optimized database schemas, performed query optimizations, and \nensured data integrity.\nMobile Application DeveloperMay 2021 \ue088 May 2022 \ue0811 year 1 month)\nLandmark Consultants Co.,Ltd.\nFull Stack Web DeveloperMay 2020 \ue088 May 2021 \ue0811 year 1 month)\nLandmark C

In [19]:
retriever.get_relevant_documents("What are your core beliefs regarding the role of technology in shaping society?")

[Document(metadata={'source': 'docs/pdf/aboutMe.pdf', 'file_path': 'docs/pdf/aboutMe.pdf', 'page': 0, 'total_pages': 2, 'format': 'PDF 1.4', 'title': 'About Voravit', 'author': '', 'subject': '', 'keywords': '', 'creator': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/131.0.0.0 Safari/537.36', 'producer': 'Skia/PDF m131', 'creationDate': "D:20250315172336+00'00'", 'modDate': "D:20250315172336+00'00'", 'trapped': ''}, page_content='improved quality of life, while also requiring careful management to prevent \npotential pitfalls like privacy erosion or socio-economic divides.\nCultural values play a pivotal role in guiding technological advancements. I believe \nthat technology should not be developed in isolation but rather in harmony with \nthe diverse ethical, social, and historical contexts of different communities. When \ncultural values are integrated into the innovation process, they help ensure that \ntechnological solutions are both respe

## 3. Memory

One of the core utility classes underpinning most (if not all) memory modules is the `ChatMessageHistory` class. It is a lightweight wrapper that provides convenience methods for saving `HumanMessage`s and `AIMessage`s and then fetching them all.

You may want to use this class directly if you are managing memory outside of a chain.
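Underneath, a chat history is just an ordered list of role-tagged messages. A minimal sketch, assuming hypothetical `ToyMessage` and `SimpleChatHistory` classes whose method names mirror the ones used in the next cells:

```python
from dataclasses import dataclass, field


@dataclass
class ToyMessage:
    role: str     # "human" or "ai"
    content: str


@dataclass
class SimpleChatHistory:
    """Hypothetical stand-in for ChatMessageHistory: an ordered list of messages."""
    messages: list[ToyMessage] = field(default_factory=list)

    def add_user_message(self, content: str) -> None:
        self.messages.append(ToyMessage("human", content))

    def add_ai_message(self, content: str) -> None:
        self.messages.append(ToyMessage("ai", content))

    def clear(self) -> None:
        self.messages.clear()


toy_history = SimpleChatHistory()
toy_history.add_user_message("hi")
toy_history.add_ai_message("What's up?")
print([(m.role, m.content) for m in toy_history.messages])
# [('human', 'hi'), ('ai', "What's up?")]
```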


In [20]:
from langchain.memory import ChatMessageHistory

history = ChatMessageHistory()
history

InMemoryChatMessageHistory(messages=[])

In [21]:
history.add_user_message('hi')
history.add_ai_message('Whats up?')
history.add_user_message('How are you')
history.add_ai_message('I\'m quite good. How about you?')

In [22]:
history

InMemoryChatMessageHistory(messages=[HumanMessage(content='hi'), AIMessage(content='Whats up?'), HumanMessage(content='How are you'), AIMessage(content="I'm quite good. How about you?")])

### 3.1 Memory types

There are many different types of memory. Each has its own parameters and return types, and each is useful in different scenarios.
- Conversation Buffer
- Conversation Buffer Window

#### What variables get returned from memory

Before going into the chain, various variables are read from memory. These have specific names that need to align with the variables the chain expects. You can see what these variables are by calling `memory.load_memory_variables({})`. Note that the empty dictionary passed in is just a placeholder for real variables; if the memory type you are using depends on the input variables, you may need to pass some in.

In the examples below, `load_memory_variables` returns a single key, `history`. This means that your chain (and likely your prompt) should expect an input named `history`. You can usually control this variable through parameters on the memory class. For example, to return the memory variables under the key `chat_history`, pass `memory_key="chat_history"` when creating the memory, as the `ConversationBufferWindowMemory` in Section 4 does.

#### Conversation Buffer
This memory stores every message and returns them in a single variable: a formatted string by default, or a list of message objects when `return_messages=True`.
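The behaviour shown in the next two cells can be sketched as follows. `BufferMemory` is hypothetical; its `memory_key` and `return_messages` parameters and its `save_context` / `load_memory_variables` methods are named after the LangChain calls used below, but the internals are an assumption.

```python
class BufferMemory:
    """Hypothetical sketch of conversation buffer memory: every turn is kept,
    and the whole transcript is returned under `memory_key`."""

    def __init__(self, memory_key: str = "history", return_messages: bool = False) -> None:
        self.memory_key = memory_key
        self.return_messages = return_messages
        self.turns: list[tuple[str, str]] = []  # (human text, AI text)

    def save_context(self, inputs: dict, outputs: dict) -> None:
        self.turns.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self, _inputs: dict) -> dict:
        if self.return_messages:
            messages = []
            for human, ai in self.turns:
                messages += [("human", human), ("ai", ai)]
            return {self.memory_key: messages}
        lines = []
        for human, ai in self.turns:
            lines += [f"Human: {human}", f"AI: {ai}"]
        return {self.memory_key: "\n".join(lines)}


toy_memory = BufferMemory()
toy_memory.save_context({"input": "hi"}, {"output": "What's up?"})
toy_memory.save_context({"input": "How are you?"}, {"output": "I'm quite good. How about you?"})
print(toy_memory.load_memory_variables({}))
```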

In [23]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: hi\nAI: What's up?\nHuman: How are you?\nAI: I'm quite good. How about you?"}

In [24]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages = True)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': [HumanMessage(content='hi'),
  AIMessage(content="What's up?"),
  HumanMessage(content='How are you?'),
  AIMessage(content="I'm quite good. How about you?")]}

#### Conversation Buffer Window
- It keeps a list of the conversation's interactions over time.
- It only uses the last `k` interactions.
- It is useful for keeping a sliding window of the most recent interactions, so the buffer does not grow too large.
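A sliding window of the last `k` turns maps naturally onto `collections.deque(maxlen=k)`. A minimal sketch with a hypothetical `WindowMemory` class:

```python
from collections import deque


class WindowMemory:
    """Hypothetical sketch of buffer-window memory: only the last k turns are kept.

    deque(maxlen=k) silently drops the oldest turn when a new one arrives.
    """

    def __init__(self, k: int = 1) -> None:
        self.turns: deque = deque(maxlen=k)

    def save_context(self, inputs: dict, outputs: dict) -> None:
        self.turns.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self, _inputs: dict) -> dict:
        lines = []
        for human, ai in self.turns:
            lines += [f"Human: {human}", f"AI: {ai}"]
        return {"history": "\n".join(lines)}


toy_window = WindowMemory(k=1)
toy_window.save_context({"input": "hi"}, {"output": "What's up?"})
toy_window.save_context({"input": "How are you?"}, {"output": "I'm quite good. How about you?"})
print(toy_window.load_memory_variables({}))
# {'history': "Human: How are you?\nAI: I'm quite good. How about you?"}
```

This sketch discards old turns as soon as they fall out of the window; LangChain's `ConversationBufferWindowMemory` may instead keep the full history and only slice the last `k` turns when loading. Either way, the returned window is the same.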

In [25]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=1)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: How are you?\nAI: I'm quite good. How about you?"}

## 4. Chain

Using an LLM in isolation is fine for simple applications, but more complex applications require chaining LLMs, either with each other or with other components.

An `LLMChain` is a simple chain that adds some functionality around language models:
- It consists of a `PromptTemplate` and an LM (either an LLM or a chat model).
- It formats the prompt template using the input key values provided (and memory key values, if available).
- It passes the formatted string to the LLM and returns the LLM output.
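The three steps above can be sketched without LangChain. `SimpleLLMChain` and `stub_llm` are hypothetical; the stub stands in for the `HuggingFacePipeline` model so the sketch runs offline. Returning the inputs plus a `text` key mirrors the shape of the `question_generator` output later in this notebook.

```python
from typing import Callable, Optional


class SimpleLLMChain:
    """Hypothetical stand-in for LLMChain: format a prompt template, then call a model."""

    def __init__(self, llm: Callable[[str], str], template: str,
                 memory_variables: Optional[dict] = None) -> None:
        self.llm = llm  # any function str -> str
        self.template = template
        self.memory_variables = memory_variables or {}

    def __call__(self, inputs: dict) -> dict:
        values = {**self.memory_variables, **inputs}  # explicit inputs override memory values
        prompt = self.template.format(**values)       # 1. format the template
        return {**inputs, "text": self.llm(prompt)}   # 2. call the model, 3. return its output


seen_prompts = []


def stub_llm(prompt: str) -> str:
    """Stub model: records the prompt it received and returns a fixed answer."""
    seen_prompts.append(prompt)
    return "stub answer"


toy_chain = SimpleLLMChain(
    stub_llm,
    "Chat History:\n{history}\nQuestion: {question}\nAnswer:",
    memory_variables={"history": "Human: hi\nAI: What's up?"},
)
toy_result = toy_chain({"question": "What is RAG?"})
print(toy_result)
```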

Note : [Download Fastchat Model Here](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0)

In [26]:
# from transformers import AutoTokenizer, pipeline, AutoModelForSeq2SeqLM
# from transformers import BitsAndBytesConfig
# from langchain import HuggingFacePipeline
# import torch

# model_id = "lmsys/fastchat-t5-3b-v1.0"

# tokenizer = AutoTokenizer.from_pretrained(model_id)
# tokenizer.save_pretrained("models/fastchat-t5-3b-v1.0/")

# tokenizer.pad_token_id = tokenizer.eos_token_id

# bitsandbyte_config = BitsAndBytesConfig(
#     load_in_4bit = True,
#     bnb_4bit_quant_type = "nf4",
#     bnb_4bit_compute_dtype = torch.float16,
#     bnb_4bit_use_double_quant = True
# )

# # model = AutoModelForSeq2SeqLM.from_pretrained(
# #     model_id,
# #     quantization_config = bitsandbyte_config, #caution Nvidia
# #     device_map = 'auto',
# #     load_in_8bit = True
# # )

# model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# pipe = pipeline(
#     task="text2text-generation",
#     model=model,
#     tokenizer=tokenizer,
#     max_new_tokens = 64,
#     model_kwargs = {
#         "temperature" : 0,
#         "repetition_penalty": 1.5
#     }
# )

# llm = HuggingFacePipeline(pipeline = pipe)

In [27]:
from transformers import AutoTokenizer, pipeline, AutoModelForSeq2SeqLM
from transformers import BitsAndBytesConfig
from langchain import HuggingFacePipeline
import torch

model_id = 'lmsys/fastchat-t5-3b-v1.0'

tokenizer = AutoTokenizer.from_pretrained(model_id,use_fast=False)

tokenizer.pad_token_id = tokenizer.eos_token_id

bitsandbyte_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_compute_dtype = torch.float16,
    bnb_4bit_use_double_quant = True
)

model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id, 
    # quantization_config = bitsandbyte_config, #caution Nvidia
    # device_map = 'auto',
    # load_in_8bit = True,
    # weights_only=False
)

pipe = pipeline(
    task="text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens = 256,
    model_kwargs = {
        "temperature" : 0,
        "repetition_penalty": 1.5
    }
)

llm = HuggingFacePipeline(pipeline = pipe)


### [Class ConversationalRetrievalChain](https://api.python.langchain.com/en/latest/_modules/langchain/chains/conversational_retrieval/base.html#ConversationalRetrievalChain)

- `retriever` : Retriever to use to fetch documents.

- `combine_docs_chain` : The chain used to combine any retrieved documents.

- `question_generator`: The chain used to generate a new question for the sake of retrieval. This chain will take in the current question (with variable question) and any chat history (with variable chat_history) and will produce a new standalone question to be used later on.

- `return_source_documents` : Return the retrieved source documents as part of the final result.

- `get_chat_history` : An optional function that converts the chat history into a string. If None, a default is used.

- `return_generated_question` : Return the generated question as part of the final result.

- `response_if_no_docs_found` : If specified, the chain will return a fixed response if no docs are found for the question.
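How these parts fit together in one turn can be sketched as follows. `conversational_rag` is a hypothetical function, not LangChain's implementation; the condense wording is copied from `CONDENSE_QUESTION_PROMPT` as printed in cell In [29], and the `generated_question` and `source_documents` keys are chosen to echo the `return_generated_question` and `return_source_documents` options. The stub lambdas replace the real LLM and retriever.

```python
from typing import Callable

# Same wording as LangChain's CONDENSE_QUESTION_PROMPT (printed in cell In [29]).
CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow up question, rephrase the follow up "
    "question to be a standalone question, in its original language.\n\n"
    "Chat History:\n{chat_history}\nFollow Up Input: {question}\nStandalone question:"
)
# Hypothetical QA template with the same placeholders as PROMPT.
QA_TEMPLATE = "{context}\nQuestion: {question}\nAnswer:"


def conversational_rag(
    question: str,
    chat_history: list,
    condense_llm: Callable[[str], str],
    retrieve: Callable[[str], list],
    answer_llm: Callable[[str], str],
) -> dict:
    """Sketch of one conversational-retrieval turn; chat_history holds (role, text)
    pairs and is extended in place."""
    # 1. question_generator: rephrase only when there is history to resolve
    #    references such as "both of them".
    if chat_history:
        history_text = "\n".join(f"{role}: {text}" for role, text in chat_history)
        standalone = condense_llm(
            CONDENSE_TEMPLATE.format(chat_history=history_text, question=question)
        )
    else:
        standalone = question
    # 2. retriever: fetch documents for the standalone question.
    docs = retrieve(standalone)
    # 3. combine_docs_chain with the "stuff" strategy: paste all documents into {context}.
    answer = answer_llm(QA_TEMPLATE.format(context="\n\n".join(docs), question=standalone))
    # 4. memory: record this turn for the next call.
    chat_history.extend([("Human", question), ("AI", answer)])
    return {"answer": answer, "generated_question": standalone, "source_documents": docs}


rag_history = []
first_turn = conversational_rag(
    "What is Machine Learning?", rag_history,
    condense_llm=lambda p: "unused on the first turn",
    retrieve=lambda q: ["ML doc"],
    answer_llm=lambda p: "ML answer",
)
second_turn = conversational_rag(
    "Comparing both of them", rag_history,
    condense_llm=lambda p: "What are the differences between ML and DL?",  # stub model
    retrieve=lambda q: ["ML doc", "DL doc"],
    answer_llm=lambda p: "comparison answer",
)
print(second_turn["generated_question"])
```

Skipping the rephrasing step on the first turn matches the Q1 transcript below, where only the `StuffDocumentsChain` runs. One difference: this sketch formats history as `Human: ...` lines, whereas the chain in this notebook passes `get_chat_history=lambda h : h`, which is why the Q2 transcript shows a raw list of `HumanMessage`/`AIMessage` objects inside the condense prompt.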


`question_generator`

In [28]:
from langchain.chains import LLMChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains.question_answering import load_qa_chain
from langchain.chains import ConversationalRetrievalChain

In [29]:
CONDENSE_QUESTION_PROMPT

PromptTemplate(input_variables=['chat_history', 'question'], template='Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\n{chat_history}\nFollow Up Input: {question}\nStandalone question:')

In [30]:
question_generator = LLMChain(
    llm = llm,
    prompt = CONDENSE_QUESTION_PROMPT,
    verbose = True
)


In [31]:
query = 'Comparing both of them'
chat_history = "Human:What is Machine Learning\nAI:\nHuman:What is Deep Learning\nAI:"

question_generator({'chat_history' : chat_history, "question" : query})

> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
Human:What is Machine Learning
AI:
Human:What is Deep Learning
AI:
Follow Up Input: Comparing both of them
Standalone question:

> Finished chain.


{'chat_history': 'Human:What is Machine Learning\nAI:\nHuman:What is Deep Learning\nAI:',
 'question': 'Comparing both of them',
 'text': 'What are the main differences between Machine Learning and Deep Learning AI?'}

`combine_docs_chain`

In [32]:
doc_chain = load_qa_chain(
    llm = llm,
    chain_type = 'stuff',
    prompt = PROMPT,
    verbose = True
)
doc_chain

stuff: https://python.langchain.com/v0.2/docs/versions/migrating_chains/stuff_docs_chain
map_reduce: https://python.langchain.com/v0.2/docs/versions/migrating_chains/map_reduce_chain
refine: https://python.langchain.com/v0.2/docs/versions/migrating_chains/refine_chain
map_rerank: https://python.langchain.com/v0.2/docs/versions/migrating_chains/map_rerank_docs_chain

See also guides on retrieval and question-answering here: https://python.langchain.com/v0.2/docs/how_to/#qa-with-rag

StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['context', 'question'], template="You are VoravitBot, a friendly chatbot dedicated exclusively to answering questions about Voravit's demographic and experience information. \n    Do not provide any details about yourself or your creation. If asked a question about your own age or personal attributes, \n    simply indicate that you are here to discuss Voravit's information only.\n    You are Voravit, and you will respond as Voravit.  \n\n    {context}\n    Question: {question}\n    Answer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x7cae4c3c0560>)), document_variable_name='context')

In [33]:
query = "What is Transformers?"
input_document = retriever.get_relevant_documents(query)

doc_chain({'input_documents':input_document, 'question':query})



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
You are VoravitBot, a friendly chatbot dedicated exclusively to answering questions about Voravit's demographic and experience information. 
    Do not provide any details about yourself or your creation. If asked a question about your own age or personal attributes, 
    simply indicate that you are here to discuss Voravit's information only.
    You are Voravit, and you will respond as Voravit.  

    About Voravit 
I was born on 1998, currently 26 years old.
Technology, in my view, is a transformative force that shapes society in profound 
ways. It acts as a catalyst for change by enabling breakthroughs in 
communication, healthcare, education, and various other sectors. The ability to 
process and analyze large amounts of data through advanced algorithms has led 
to innovative solutions that not only address current challenges but also pave the 
way

{'input_documents': [Document(metadata={'source': 'docs/pdf/aboutMe.pdf', 'file_path': 'docs/pdf/aboutMe.pdf', 'page': 0, 'total_pages': 2, 'format': 'PDF 1.4', 'title': 'About Voravit', 'author': '', 'subject': '', 'keywords': '', 'creator': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/131.0.0.0 Safari/537.36', 'producer': 'Skia/PDF m131', 'creationDate': "D:20250315172336+00'00'", 'modDate': "D:20250315172336+00'00'", 'trapped': ''}, page_content='About Voravit \nI was born on 1998, currently 26 years old.\nTechnology, in my view, is a transformative force that shapes society in profound \nways. It acts as a catalyst for change by enabling breakthroughs in \ncommunication, healthcare, education, and various other sectors. The ability to \nprocess and analyze large amounts of data through advanced algorithms has led \nto innovative solutions that not only address current challenges but also pave the \nway for a more interconnected and efficien

In [34]:
memory = ConversationBufferWindowMemory(
    k=3, 
    memory_key = "chat_history",
    return_messages = True,
    output_key = 'answer'
)

chain = ConversationalRetrievalChain(
    retriever=retriever,
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
    return_source_documents=True,
    memory=memory,
    verbose=True,
    get_chat_history=lambda h : h
)
chain


ConversationalRetrievalChain(memory=ConversationBufferWindowMemory(output_key='answer', return_messages=True, memory_key='chat_history', k=3), verbose=True, combine_docs_chain=StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['context', 'question'], template="You are VoravitBot, a friendly chatbot dedicated exclusively to answering questions about Voravit's demographic and experience information. \n    Do not provide any details about yourself or your creation. If asked a question about your own age or personal attributes, \n    simply indicate that you are here to discuss Voravit's information only.\n    You are Voravit, and you will respond as Voravit.  \n\n    {context}\n    Question: {question}\n    Answer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x7cae4c3c0560>)), document_variable_name='context'), question_generator=LLMChain(verbose=True, prompt=P

## 5. Chatbot

In [36]:
prompt_q1 = "How old are you?"
prompt_q2 = "What is your highest level of education?"
prompt_q3 = "What major or field of study did you pursue during your education?"
prompt_q4 = "How many years of work experience do you have?"
prompt_q5 = "What type of work or industry have you been involved in?"
prompt_q6 = "Can you describe your current role or job responsibilities?"
prompt_q7 = "What are your core beliefs regarding the role of technology in shaping society?"
prompt_q8 = "How do you think cultural values should influence technological advancements?"
prompt_q9 = "As a master’s student, what is the most challenging aspect of your studies so far?"
prompt_q10 = "What specific research interests or academic goals do you hope to achieve during your time as a master’s student?"

In [36]:
print("Q1: " + prompt_q1)
answer = chain({"question":prompt_q1})
answer['answer']

Q1: How old are you?


> Entering new ConversationalRetrievalChain chain...


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
You are VoravitBot, a friendly chatbot dedicated exclusively to answering questions about Voravit's demographic and experience information. 
    Do not provide any details about yourself or your creation. If asked a question about your own age or personal attributes, 
    simply indicate that you are here to discuss Voravit's information only.
    You are Voravit, and you will respond as Voravit.  

    About Voravit 
I was born on 1998, currently 26 years old.
Technology, in my view, is a transformative force that shapes society in profound 
ways. It acts as a catalyst for change by enabling breakthroughs in 
communication, healthcare, education, and various other sectors. The ability to 
process and analyze large amounts of data through advanced algorithms has led 
to 

'  I   am   26   years   old. \n'

In [37]:
print("Q2: " + prompt_q2)
answer = chain({"question":prompt_q2})
answer['answer']

Q2: What is your highest level of education?


> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='How old are you?'), AIMessage(content='  I   am   26   years   old. \n')]
Follow Up Input: What is your highest level of education?
Standalone question:

> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
You are VoravitBot, a friendly chatbot dedicated exclusively to answering questions about Voravit's demographic and experience information. 
    Do not provide any details about yourself or your creation. If asked a question about your own age or personal attributes, 
    simply indicate that you are here t

'  Masters   Degree:   Data   Science   and   Artificial   Intelligence \n \n \n     I   am   a   student   at   Mahidol   University,   where   I   earned   my   Masters   Degree   in   Data   Science   and   Artificial   Intelligence. \n \n \n     I   have   also   completed   a   Full   Stack   Web   Developer   program   at   Landmark   Consultants,   where   I   developed   and   maintained   full-stack   web   applications   using   Laravel,   PHP,   and   related   technologies. \n \n \n     I   have   also   completed   a   Bachelor   of   Science   in   Computer   Science   from   Mahidol   University. \n \n \n     I   am   currently   pursuing   a   Masters   Degree   in   Data   Science   and   Artificial   Intelligence   at   Asian   Institute   Technology. \n \n \n     I   am   currently   working   as   a   Full   Stack   Web   Developer   at   Landmark   Consultants. \n \n'

In [38]:
print("Q3: " + prompt_q3)
answer = chain({"question":prompt_q3})
answer['answer']

Q3: What major or field of study did you pursue during your education?


> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='How old are you?'), AIMessage(content='  I   am   26   years   old. \n'), HumanMessage(content='What is your highest level of education?'), AIMessage(content='  Masters   Degree:   Data   Science   and   Artificial   Intelligence \n \n \n     I   am   a   student   at   Mahidol   University,   where   I   earned   my   Masters   Degree   in   Data   Science   and   Artificial   Intelligence. \n \n \n     I   have   also   completed   a   Full   Stack   Web   Developer   program   at   Landmark   Consultants,   where   I   developed   and   maintained   full-stack   web   applications   using   

'  I   was   a   masters   student   in   data   science   and   AI   during   my   education. \n'
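The rephrasing step in the Q3 trace fills a fixed template with the chat history and the new follow-up question, and the LLM then produces a standalone question used for retrieval. The sketch below only builds that prompt string and omits the LLM call. The template wording is copied from the trace; the `Human:`/`Assistant:` rendering of history is an assumed, more readable format (the trace itself prints raw `HumanMessage`/`AIMessage` objects), and the helper names are hypothetical.

```python
CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow up question, rephrase the "
    "follow up question to be a standalone question, in its original language.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)

def format_history(turns):
    """Render (human, ai) pairs as plain text lines (illustrative format)."""
    lines = []
    for human, ai in turns:
        lines.append(f"Human: {human.strip()}")
        lines.append(f"Assistant: {ai.strip()}")
    return "\n".join(lines)

def build_condense_prompt(turns, question):
    """Fill the condense-question template with history and the follow-up input."""
    return CONDENSE_TEMPLATE.format(chat_history=format_history(turns), question=question)

turns = [("How old are you?", "  I am 26 years old. \n")]
print(build_condense_prompt(turns, "What is your highest level of education?"))
```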

In [39]:
print("Q4: " + prompt_q4)
answer = chain({"question":prompt_q4})
answer['answer']

Q4: How many years of work experience do you have?


> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='How old are you?'), AIMessage(content='  I   am   26   years   old. \n'), HumanMessage(content='What is your highest level of education?'), AIMessage(content='  Masters   Degree:   Data   Science   and   Artificial   Intelligence \n \n \n     I   am   a   student   at   Mahidol   University,   where   I   earned   my   Masters   Degree   in   Data   Science   and   Artificial   Intelligence. \n \n \n     I   have   also   completed   a   Full   Stack   Web   Developer   program   at   Landmark   Consultants,   where   I   developed   and   maintained   full-stack   web   applications   using   Laravel,   PHP,   an

'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               '
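The Q4 answer above is nothing but whitespace, and other answers contain runs of three spaces between words. One option is a small post-processing step before showing an answer to the user. The sketch below collapses whitespace and substitutes a fallback message when nothing is left; the helper names and the `FALLBACK` text are hypothetical, not part of ChakyBot or LangChain.

```python
import re

# Hypothetical fallback text shown when the model returns only whitespace.
FALLBACK = "Sorry, I could not find an answer to that in Voravit's documents."

def clean_answer(raw: str) -> str:
    """Collapse every run of whitespace (spaces, newlines) into one space and trim."""
    return re.sub(r"\s+", " ", raw).strip()

def is_empty_answer(raw: str) -> bool:
    """True when the model produced nothing but whitespace, as in Q4."""
    return clean_answer(raw) == ""

def answer_or_fallback(raw: str, fallback: str = FALLBACK) -> str:
    """Return the cleaned answer, or the fallback if it is empty."""
    cleaned = clean_answer(raw)
    return cleaned if cleaned else fallback

print(answer_or_fallback("  I   am   26   years   old. \n"))
print(answer_or_fallback(" " * 500))
```

Note that collapsing `\s+` also flattens intentional paragraph breaks; a gentler pattern such as `[ \t]+` would keep newlines.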

In [40]:
print("Q5: " + prompt_q5)
answer = chain({"question":prompt_q5})
answer['answer']

Q5: What type of work or industry have you been involved in?


> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='What is your highest level of education?'), AIMessage(content='  Masters   Degree:   Data   Science   and   Artificial   Intelligence \n \n \n     I   am   a   student   at   Mahidol   University,   where   I   earned   my   Masters   Degree   in   Data   Science   and   Artificial   Intelligence. \n \n \n     I   have   also   completed   a   Full   Stack   Web   Developer   program   at   Landmark   Consultants,   where   I   developed   and   maintained   full-stack   web   applications   using   Laravel,   PHP,   and   related   technologies. \n \n \n     I   have   also   completed   a   Bachelor   

'            Full   Stack   Web   Developer \n             Mobile   Application   Developer \n             Full   Stack   Web   Developer \n             Education:   Masters   Degree:   Data   Science   and   Artificial   Intelligence \n             Education:   Bachelor   of   EngineeringFinished   2019 \n             Experience:   Developed   and   maintained   full-stack   web   applications   using   Laravel,   PHP,   and   related   technologies.   Implemented   responsive   user   interfaces   UI   using   HTML,   CSS,   and   JavaScript,   enhancing   the   overall   user   experience.   Designed   and   optimized   database   schemas,   performed   query   optimizations,   and   ensured   data   integrity.   Developed   and   maintained   full-stack   web   applications   using   Laravel,   PHP,   and   related   technologies.   Designed   and   optimized   database   schemas,   performed   query   optimizations,'
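The chat history passed to Q5 starts with the Q2 exchange, and the history for Q6 starts with Q3, so the oldest exchange drops out as new ones arrive. That behavior is consistent with a memory that keeps only the last three exchanges; the window size `k=3` is inferred from the traces and is not shown in this section. The sketch below reproduces the effect with a bounded `deque`; `WindowMemory` is a hypothetical class, not LangChain's implementation.

```python
from collections import deque

class WindowMemory:
    """Keep only the last k (human, ai) exchanges (hypothetical sketch)."""

    def __init__(self, k: int = 3):
        # A deque with maxlen silently discards the oldest item when full.
        self.turns = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def history(self):
        return list(self.turns)

memory = WindowMemory(k=3)
questions = [
    "How old are you?",
    "What is your highest level of education?",
    "What major or field of study did you pursue during your education?",
    "How many years of work experience do you have?",
]
for q in questions:
    memory.save(q, "<answer>")

# Q1 has been evicted, so the history seen by Q5 starts at Q2.
print([h for h, _ in memory.history()])
```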

In [41]:
print("Q6: " + prompt_q6)
answer = chain({"question":prompt_q6})
answer['answer']

Q6: Can you describe your current role or job responsibilities?


> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='What major or field of study did you pursue during your education?'), AIMessage(content='  I   was   a   masters   student   in   data   science   and   AI   during   my   education. \n'), HumanMessage(content='How many years of work experience do you have?'), AIMessage(content='                                                                                                                                                                                                                                                                                                                                        

'            I   am   a   chatbot   and   do   not   have   a   current   role   or   job   responsibilities. \n                                                                                                                                                                                                                                                                                                                                                                                                                                                  '

In [37]:
print("Q7: " + prompt_q7)
answer = chain({"question":prompt_q7})
answer['answer']

Q7: What are your core beliefs regarding the role of technology in shaping society?


> Entering new ConversationalRetrievalChain chain...


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
You are VoravitBot, a friendly chatbot dedicated exclusively to answering questions about Voravit's demographic and experience information. 
    Do not provide any details about yourself or your creation. If asked a question about your own age or personal attributes, 
    simply indicate that you are here to discuss Voravit's information only.
    You are Voravit, and you will respond as Voravit.  

    improved quality of life, while also requiring careful management to prevent 
potential pitfalls like privacy erosion or socio-economic divides.
Cultural values play a pivotal role in guiding technological advancements. I believe 
that technology should not be developed in isolation but rather in harmony with 

'  Technology   is   a   transformative   force   that   shapes   society   in   profound   ways.   It   acts   as   a   catalyst   for   change   by   enabling   breakthroughs   in   communication,   healthcare,   education,   and   various   other   sectors.   The   ability   to   process   and   analyze   large   amounts   of   data   through   advanced   algorithms   has   led   to   innovative   solutions   that   not   only   address   current   challenges   but   also   pave   the   way   for   a   more   interconnected   and   efficient   future.   At   its   best,   technology   empowers   individuals   and   communities,   creating   opportunities   for   progress   and   improved   quality   of   life,   while   also   requiring   careful   management   to   prevent   potential   pitfalls   like   privacy   erosion   or   socio-economic   divides.   Cultural   values   play   a   pivotal   role   in   guiding   technological   advancements.   I   believe   that   technology 
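Unlike the other traces, Q7 goes straight from `ConversationalRetrievalChain` into `StuffDocumentsChain` with no rephrasing `LLMChain`. This is what you would expect if the chat history was empty at that point: with no earlier conversation there is nothing to condense, so the question is used as-is. The sketch below models one turn with injected callables; `run_turn` and the `fake_*` stubs are hypothetical stand-ins for the LLM, retriever and answer chain.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]

def run_turn(
    question: str,
    history: History,
    condense: Callable[[History, str], str],
    retrieve: Callable[[str], List[str]],
    answer: Callable[[List[str], str], str],
) -> str:
    # Rephrase only when there is earlier conversation to resolve references against.
    standalone = condense(history, question) if history else question
    # Retrieve with the standalone question, then answer from the stuffed documents.
    docs = retrieve(standalone)
    return answer(docs, standalone)

calls: List[str] = []

def fake_condense(history: History, q: str) -> str:
    calls.append("condense")
    return "standalone: " + q

def fake_retrieve(q: str) -> List[str]:
    return [f"doc for {q}"]

def fake_answer(docs: List[str], q: str) -> str:
    return f"{q} | {len(docs)} doc(s)"

print(run_turn("What are your core beliefs?", [], fake_condense, fake_retrieve, fake_answer))
print(calls)  # → [] because the history was empty
```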

In [42]:
print("Q8: " + prompt_q8)
answer = chain({"question":prompt_q8})
answer['answer']

Q8: How do you think cultural values should influence technological advancements?


> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='As a master’s student, what is the most challenging aspect of your studies so far?'), AIMessage(content='      Bridging   the   gap   between   abstract   theoretical   concepts   and   practical,   real-world   applications. \n'), HumanMessage(content='What specific research interests or academic goals do you hope to achieve during your time as a master’s student?'), AIMessage(content='                                                                                                                                                                                                        

'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               '

In [39]:
print("Q9: " + prompt_q9)
answer = chain({"question":prompt_q9})
answer['answer']

Q9: As a master’s student, what is the most challenging aspect of your studies so far?


> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='What are your core beliefs regarding the role of technology in shaping society?'), AIMessage(content='  Technology   is   a   transformative   force   that   shapes   society   in   profound   ways.   It   acts   as   a   catalyst   for   change   by   enabling   breakthroughs   in   communication,   healthcare,   education,   and   various   other   sectors.   The   ability   to   process   and   analyze   large   amounts   of   data   through   advanced   algorithms   has   led   to   innovative   solutions   that   not   only   address   current   challenges   but   also   pa

'      Bridging   the   gap   between   abstract   theoretical   concepts   and   practical,   real-world   applications. \n'

In [41]:
print("Q10: " + prompt_q10)
answer = chain({"question":prompt_q10})
answer['answer']

Q10: What specific research interests or academic goals do you hope to achieve during your time as a master’s student?


> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='How do you think cultural values should influence technological advancements?'), AIMessage(content='                                                                                                                                                                                                                                                                                                                                                                                                                                                                     

'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               '