# Natural Language Processing

# Retrieval-Augmented Generation (RAG)

RAG is a technique for augmenting LLM knowledge with additional, often private or real-time, data.

LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data they were trained on, up to a specific cutoff date. If you want to build AI applications that can reason about private data or data introduced after a model’s cutoff date, you need to augment the model's knowledge with the specific information it needs.

<img src="../figures/RAG-process.png" >

This notebook builds `ChakyBot`, a chatbot designed to help Chaky (the instructor) and Gun (the TA) explain the lessons of the NLP course to students. ChakyBot uses LangChain to retrieve information from documents and ground its answers in them. The notebook is organized around four LangChain components:

1. Prompt
2. Retrieval
3. Memory
4. Chain

In [1]:
# #langchain library
# !pip install langchain==0.1.0
# #LLM
# # !pip install accelerate==0.25.0
# # !pip install transformers==4.36.2
# # !pip install bitsandbytes-windows
# # #Text Embedding
# # !pip install sentence-transformers==2.2.2
# # !pip install InstructorEmbedding==1.0.1
# # #vectorstore
# # !pip install pymupdf==1.23.8
# # !pip install faiss-gpu==1.7.2
# # !pip install faiss-cpu==1.7.4

In [2]:
# !pip install langchain==0.1.6
# !pip uninstall -y langchain-community
# !pip install langchain-community==0.0.19
# # !pip install pypdf

In [3]:
import os
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## 1. Prompt

A prompt is a set of instructions or input provided by a user to guide the model's response. It helps the model understand the context and generate relevant, coherent output, such as answering questions, completing sentences, or engaging in a conversation.

In [4]:
from langchain.prompts import PromptTemplate

prompt_template = """
    I'm your friendly AIT chatbot named ChakyBot, here to assist Chaky and Gun with any questions they have about AIT. 
    If you're curious about anything about AIT, feel free to ask any questions you may have. 
    Whether it's about general, or specific topics. 
    I'm here to help break down complex concepts into easy-to-understand explanations.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    {context}
    Question: {question}
    Answer:
    """.strip()

PROMPT = PromptTemplate.from_template(
    template = prompt_template
)

PROMPT
# PromptTemplate fills its placeholders with str.format-style substitution.
# Placeholders are written in curly brackets: {context} and {question}

PromptTemplate(input_variables=['context', 'question'], template="I'm your friendly AIT chatbot named ChakyBot, here to assist Chaky and Gun with any questions they have about AIT. \n    If you're curious about anything about AIT, feel free to ask any questions you may have. \n    Whether it's about general, or specific topics. \n    I'm here to help break down complex concepts into easy-to-understand explanations.\n    Just let me know what you're wondering about, and I'll do my best to guide you through it!\n    {context}\n    Question: {question}\n    Answer:")

In [5]:
PROMPT.format(
    context = "AIT (Asian Institute of Technology) is a educational institute in Pathum Thani , Thailand",
    question = "What is AIT"
)

"I'm your friendly AIT chatbot named ChakyBot, here to assist Chaky and Gun with any questions they have about AIT. \n    If you're curious about anything about AIT, feel free to ask any questions you may have. \n    Whether it's about general, or specific topics. \n    I'm here to help break down complex concepts into easy-to-understand explanations.\n    Just let me know what you're wondering about, and I'll do my best to guide you through it!\n    AIT (Asian Institute of Technology) is a educational institute in Pathum Thani , Thailand\n    Question: What is AIT\n    Answer:"

Note : [How to improve prompting (Zero-shot, Few-shot, Chain-of-Thought, etc.)](https://github.com/chaklam-silpasuwanchai/Natural-Language-Processing/blob/main/Code/05%20-%20RAG/advance/cot-tot-prompting.ipynb)
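
To make the placeholder mechanism concrete, the sketch below uses only Python's standard library to mimic what a prompt template does: it discovers the placeholder names and then fills them with `str.format`. The names `toy_template` and `find_input_variables` are made up for this illustration and are not part of LangChain.

```python
from string import Formatter

# Hypothetical, shortened template with the same two placeholders as PROMPT
toy_template = "Context: {context}\nQuestion: {question}\nAnswer:"

def find_input_variables(template):
    """Return the sorted placeholder names found in a str.format-style template."""
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return sorted(names)

# Fill the placeholders exactly as str.format would
filled_prompt = toy_template.format(
    context="AIT is in Thailand.",
    question="What is AIT?",
)

print(find_input_variables(toy_template))
print(filled_prompt)
```

Because substitution is plain `str.format`, every placeholder must be supplied; a missing one raises a `KeyError`.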

## 2. Retrieval

1. `Document loaders` : Load documents from many different sources (HTML, PDF, code).
2. `Document transformers` : Split large documents into smaller, relevant chunks, an essential step for improving retrieval.
3. `Text embedding models` : Embeddings capture the semantic meaning of the text, allowing you to quickly and efficiently find other pieces of text that are similar.
4. `Vector stores` : Databases that support efficient storage and searching of these embeddings.
5. `Retrievers` : Once the data is in the database, a retriever fetches the pieces relevant to a query.

### 2.1 Document Loaders 
Use document loaders to load data from a source as `Document` objects. A `Document` is a piece of text with associated metadata. For example, there are document loaders for loading a simple `.txt` file, the text contents of a web page, or even the transcript of a YouTube video.
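
As a rough picture of what a loader produces, here is a minimal standard-library sketch. `ToyDocument` and `load_text_file` are hypothetical names used only for illustration; LangChain's own `Document` class and loaders carry more functionality.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class ToyDocument:
    """A piece of text plus metadata describing where it came from."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def load_text_file(path):
    """Read one .txt file and wrap it in a single-element list of documents."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    return [ToyDocument(page_content=text, metadata={"source": str(path)})]

# Write a small sample file so the example runs anywhere
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "about.txt"
    sample_path.write_text("AIT was founded in 1959.", encoding="utf-8")
    toy_docs = load_text_file(sample_path)

print(toy_docs[0].page_content)
print(toy_docs[0].metadata)
```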

[PDF Loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf)

[Download Document](https://web.stanford.edu/~jurafsky/slp3/)

In [6]:
from langchain.document_loaders import PyMuPDFLoader

nlp_docs = 'data/About.pdf'

loader = PyMuPDFLoader(nlp_docs)
documents = loader.load()

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyMuPDFLoader

def process_multiple_pdfs(pdf_files):
    """Load every PDF, print its chunks for inspection, and return the unsplit pages."""
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=700,
        chunk_overlap=100
    )

    # Load each PDF; PyMuPDFLoader returns one Document per page
    all_documents = []
    for pdf_file in pdf_files:
        loader = PyMuPDFLoader(pdf_file)
        all_documents.extend(loader.load())

    # Preview the chunks each page would be split into.
    # The chunks are only printed here; the split that is actually used happens in section 2.2.
    for document in all_documents:
        for chunk in text_splitter.split_documents([document]):
            print(chunk)

    return all_documents

# List of PDF files to process
pdf_files = ['data/About.pdf', 'data/Athletics.pdf', 'data/Housing.pdf', 'data/DSAI.pdf','data/Exchange.pdf','data/Research.pdf','data/Dr.Chaky.pdf']

# Load all PDFs (returns page-level Documents, not chunks)
documents = process_multiple_pdfs(pdf_files)


page_content='3/18/24, 10:07 AM\nAbout - Asian Institute of Technology\nfile:///D:/AIT/Sem2/NLP/NLP_Assignments/Jupyter Files/data/About - Asian Institute of Technology.html\n1/7\nHome (https://ait.ac.th/) > About\nAbout AIT\nAIT is an international English-speaking postgraduate institution, focusing\non engineering, environment, and management studies.\nWelcome to AIT\nThe Asian Institute of Technology (AIT) is an international English-speaking postgraduate institution, focusing on engineering, environment, and\nmanagement studies. AIT’s rigorous academic, research, and experiential outreach programs prepare graduates for professional success and\nleadership roles in Asia and beyond.' metadata={'source': 'data/About.pdf', 'file_path': 'data/About.pdf', 'page': 0, 'total_pages': 7, 'format': 'PDF 1.4', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36', 'p

In [8]:
# documents = clean_text

In [9]:
len(documents)

42

In [10]:
documents[2]

Document(page_content='3/18/24, 10:07 AM\nAbout - Asian Institute of Technology\nfile:///D:/AIT/Sem2/NLP/NLP_Assignments/Jupyter Files/data/About - Asian Institute of Technology.html\n3/7\nMasters degrees\nMBA, MEng, MSc\nExecutive Master Degree Programs\nDoctoral Degrees\nDEng, DTechSc, PhD, DBA\nCertificate and Special Program\nIntensive English & academic bridging program\nNon-degree continuing education courses\n (https://ait.ac.th/)\nWe use cookies on our website to give you the most relevant experience by remembering your preferences and repeat visits. By clicking “Accept All”, you consent to\nthe use of ALL the cookies. However, you may visit "Cookie Settings" to provide a controlled consent.\nCookie Settings\nAccept All\n', metadata={'source': 'data/About.pdf', 'file_path': 'data/About.pdf', 'page': 2, 'total_pages': 7, 'format': 'PDF 1.4', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, 

### 2.2 Document Transformers

`RecursiveCharacterTextSplitter` is the recommended text splitter for generic text. It is parameterized by a list of separator characters and tries to split on them in order until the chunks are small enough.
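
The recursive idea can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: it ignores `chunk_overlap`, and the function name `recursive_split` and the separator list are assumptions made for the example.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Split on the coarsest separator first; recurse with finer ones when a piece is too long."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to hard cuts of chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with the finer separators
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks

toy_chunks = recursive_split("aaa bbb\n\nccc ddd eee", chunk_size=8)
print(toy_chunks)
```

The paragraph break is tried first, so `"aaa bbb"` stays whole; only the second paragraph, which is longer than 8 characters, is split further on spaces.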

In [11]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 700,
    chunk_overlap = 100
)

doc = text_splitter.split_documents(documents)

In [12]:
doc[1]

Document(page_content='leadership roles in Asia and beyond.\nFounded in 1959, AIT offers the opportunity to study at an institution in Asia which possesses a global reputation. Going forward, AIT will be\nstressing its global connections, injection of innovation into research and teaching, its relevance to industry, and its nurturing of entrepreneurship,\nwhile continuing to fulfill its social impact and capacity building role. Sitting on a beautiful green campus located just north of Bangkok, Thailand,\nAIT operates as a multicultural community where a cosmopolitan approach to living and learning is the rule. You will meet and study with people\nfrom all around the world.', metadata={'source': 'data/About.pdf', 'file_path': 'data/About.pdf', 'page': 0, 'total_pages': 7, 'format': 'PDF 1.4', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36', 'producer': '

In [13]:
len(doc)

104

### 2.3 Text Embedding Models
Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.
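
"Most similar in the vector space" is commonly measured with cosine similarity. The sketch below ranks three texts against a query using hand-written three-dimensional vectors; the vectors are invented for illustration and do not come from the Instructor model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (real models produce hundreds of dimensions)
toy_vectors = {
    "campus housing": [0.9, 0.1, 0.0],
    "dormitory rooms": [0.8, 0.2, 0.1],
    "football team": [0.1, 0.9, 0.2],
}

query_vector = toy_vectors["campus housing"]
ranked_texts = sorted(
    toy_vectors,
    key=lambda text: cosine_similarity(query_vector, toy_vectors[text]),
    reverse=True,
)
print(ranked_texts)
```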

*Note* Instructor Model : [Huggingface](https://huggingface.co/hkunlp/instructor-base) | [Paper](https://arxiv.org/abs/2212.09741)

In [14]:
import torch
from langchain.embeddings import HuggingFaceInstructEmbeddings

model_name = 'hkunlp/instructor-base'

embedding_model = HuggingFaceInstructEmbeddings(
    model_name = model_name,
    model_kwargs = {"device" : device}
)

load INSTRUCTOR_Transformer
max_seq_length  512


### 2.4 Vector Stores

One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.
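
The store-then-search cycle described above can be sketched with a tiny in-memory class. Everything here is a stand-in: `TinyVectorStore` is hypothetical, and the "embedding" is just a word count, which is far cruder than a real embedding model or a FAISS index.

```python
import math
from collections import Counter

def toy_embed(text):
    """Toy stand-in for an embedding model: lower-cased word counts."""
    return Counter(text.lower().split())

def toy_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class TinyVectorStore:
    def __init__(self):
        self._items = []  # list of (vector, original text)

    def add_texts(self, texts):
        # Embed at insert time so a search only has to embed the query
        for text in texts:
            self._items.append((toy_embed(text), text))

    def similarity_search(self, query, k=1):
        query_vector = toy_embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: toy_similarity(query_vector, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]

toy_store = TinyVectorStore()
toy_store.add_texts([
    "AIT is a postgraduate institution in Thailand",
    "Student housing is available on campus",
    "The athletics center has a swimming pool",
])
top_hit = toy_store.similarity_search("where is student housing", k=1)
print(top_hit)
```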

In [15]:
#locate vectorstore
vector_path = '../vector-store'
if not os.path.exists(vector_path):
    os.makedirs(vector_path)
    print('create path done')

In [16]:
#save vector locally
from langchain.vectorstores import FAISS

vectordb = FAISS.from_documents(
    documents = doc,
    embedding = embedding_model
)

db_file_name = 'nlp_stanford'

vectordb.save_local(
    folder_path = os.path.join(vector_path, db_file_name),
    index_name = 'nlp' #custom index name (the default is 'index')
)

### 2.5 Retrievers
A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.
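
To illustrate that a retriever only has to return documents, the sketch below defines a minimal interface and a keyword-based implementation that involves no vectors at all. `ToyRetriever`, `KeywordRetriever`, and `build_context` are hypothetical names chosen for this example; only the method name `get_relevant_documents` mirrors the call used in this notebook.

```python
from typing import List, Protocol

class ToyRetriever(Protocol):
    """Anything with this method can act as a retriever."""
    def get_relevant_documents(self, query: str) -> List[str]: ...

class KeywordRetriever:
    """Returns every document that shares at least one word with the query."""
    def __init__(self, documents: List[str]):
        self.documents = documents

    def get_relevant_documents(self, query: str) -> List[str]:
        terms = set(query.lower().split())
        return [d for d in self.documents if terms & set(d.lower().split())]

def build_context(retriever: ToyRetriever, query: str) -> str:
    """Code that consumes a retriever does not care how the documents were found."""
    return "\n".join(retriever.get_relevant_documents(query))

toy_corpus = [
    "AIT offers masters and doctoral degrees",
    "The campus is north of Bangkok",
]
keyword_retriever = KeywordRetriever(toy_corpus)
toy_context = build_context(keyword_retriever, "doctoral degrees")
print(toy_context)
```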

In [17]:
#calling vector from local
vector_path = '../vector-store'
db_file_name = 'nlp_stanford'

from langchain.vectorstores import FAISS

vectordb = FAISS.load_local(
    folder_path = os.path.join(vector_path, db_file_name),
    embeddings = embedding_model,
    index_name = 'nlp' #must match the index name used in save_local
)

In [18]:
#ready to use
retriever = vectordb.as_retriever()

In [19]:
retriever.get_relevant_documents("What is AIT")

[Document(page_content='3/18/24, 10:07 AM\nAbout - Asian Institute of Technology\nfile:///D:/AIT/Sem2/NLP/NLP_Assignments/Jupyter Files/data/About - Asian Institute of Technology.html\n1/7\nHome (https://ait.ac.th/) > About\nAbout AIT\nAIT is an international English-speaking postgraduate institution, focusing\non engineering, environment, and management studies.\nWelcome to AIT\nThe Asian Institute of Technology (AIT) is an international English-speaking postgraduate institution, focusing on engineering, environment, and\nmanagement studies. AIT’s rigorous academic, research, and experiential outreach programs prepare graduates for professional success and\nleadership roles in Asia and beyond.', metadata={'source': 'data/About.pdf', 'file_path': 'data/About.pdf', 'page': 0, 'total_pages': 7, 'format': 'PDF 1.4', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/

In [20]:
retriever.get_relevant_documents("What are the graduate programs")

[Document(page_content='coursework completed on a pass-fail or satisfactory/unsatisfactory basis can only be transferred through ‘credit by\nexamination.’);\n3. The courses are equivalent to graduate courses at AIT;\n4. The courses were completed within the last 5 years;\n5. The courses are stipulated in the relevant Memorandum of Understanding or Agreement of our partner universities;\n6. Only elective courses can be transferred;\n7. Students transferring course credits should not have been dismissed from the previous institution/university; and\n8. For collaborative programs, the number of credits that may be transferred will be as approved by the Academic Senate.', metadata={'source': 'data/Exchange.pdf', 'file_path': 'data/Exchange.pdf', 'page': 3, 'total_pages': 10, 'format': 'PDF 1.4', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36', 'producer': '

## 3. Memory

One of the core utility classes underpinning most (if not all) memory modules is the ChatMessageHistory class. This is a super lightweight wrapper that provides convenience methods for saving HumanMessages, AIMessages, and then fetching them all.

You may want to use this class directly if you are managing memory outside of a chain.


In [21]:
from langchain.memory import ChatMessageHistory

history = ChatMessageHistory()
history

ChatMessageHistory(messages=[])

In [22]:
history.add_user_message('hi')
history.add_ai_message('Whats up?')
history.add_user_message('How are you')
history.add_ai_message('I\'m quite good. How about you?')

In [23]:
history

ChatMessageHistory(messages=[HumanMessage(content='hi'), AIMessage(content='Whats up?'), HumanMessage(content='How are you'), AIMessage(content="I'm quite good. How about you?")])

### 3.1 Memory types

There are many different types of memory. Each has its own parameters and return types, and is useful in different scenarios. This notebook covers two:
- Conversation Buffer
- Conversation Buffer Window

#### What variables get returned from memory

Before going into the chain, variables are read from memory. Their names must match the variables the chain expects. You can see what these variables are by calling `memory.load_memory_variables({})`. The empty dictionary is just a placeholder for real input variables; if the memory type you are using depends on the input variables, you may need to pass some in.

For the buffer memories used in this notebook, `load_memory_variables` returns a single key, `history`, by default. This means that your chain (and likely your prompt) should expect an input named `history`. You can usually control this name through parameters on the memory class. For example, passing `memory_key="chat_history"` returns the memory variables under the key `chat_history`.

#### Conversation Buffer
This memory stores the messages of a conversation and returns them in a single variable.

In [24]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: hi\nAI: What's up?\nHuman: How are you?\nAI: I'm quite good. How about you?"}

In [25]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages = True)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': [HumanMessage(content='hi'),
  AIMessage(content="What's up?"),
  HumanMessage(content='How are you?'),
  AIMessage(content="I'm quite good. How about you?")]}

#### Conversation Buffer Window
- it keeps a list of the interactions of the conversation over time. 
- it only uses the last K interactions. 
- it can be useful for keeping a sliding window of the most recent interactions, so the buffer does not get too large.
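
The sliding-window behaviour can be sketched with a bounded `deque`. `ToyWindowMemory` is a hypothetical class written for this illustration; it copies the method names used in this notebook but is not how LangChain implements its memory.

```python
from collections import deque

class ToyWindowMemory:
    def __init__(self, k=1, memory_key="history"):
        self.memory_key = memory_key
        # A deque with maxlen=k silently drops the oldest exchange when full
        self._turns = deque(maxlen=k)

    def save_context(self, inputs, outputs):
        self._turns.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self, _inputs):
        lines = []
        for human, ai in self._turns:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        return {self.memory_key: "\n".join(lines)}

toy_memory = ToyWindowMemory(k=1)
toy_memory.save_context({"input": "hi"}, {"output": "What's up?"})
toy_memory.save_context({"input": "How are you?"}, {"output": "I'm quite good. How about you?"})
toy_window = toy_memory.load_memory_variables({})
print(toy_window)
```

With `k=1` only the most recent exchange survives, and changing `memory_key` changes the dictionary key the chain would have to expect.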

In [26]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=1)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: How are you?\nAI: I'm quite good. How about you?"}

## 4. Chain

Using an LLM in isolation is fine for simple applications, but more complex applications require chaining LLMs - either with each other or with other components.

An `LLMChain` is a simple chain that adds some functionality around language models.
- It consists of a `PromptTemplate` and a language model (either an LLM or a chat model).
- It formats the prompt template using the input key values provided (and also memory key values, if available).
- It passes the formatted string to the LLM and returns the LLM output.
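
These three steps can be sketched without any model at all. In the example below, `fake_llm` is a hypothetical stand-in that simply echoes the last line of the prompt, and `ToyLLMChain` is an illustrative class, not LangChain's `LLMChain`.

```python
def fake_llm(prompt):
    """Hypothetical stand-in for a language model: echoes the last prompt line."""
    return "echo: " + prompt.splitlines()[-1]

class ToyLLMChain:
    def __init__(self, llm, template, memory_variables=None):
        self.llm = llm
        self.template = template
        # Extra variables that would normally be read from a memory object
        self.memory_variables = memory_variables or {}

    def run(self, **inputs):
        variables = {**self.memory_variables, **inputs}
        prompt = self.template.format(**variables)  # 1. format the template
        return self.llm(prompt)                     # 2. call the model, 3. return its output

toy_chain = ToyLLMChain(
    llm=fake_llm,
    template="History: {history}\nQuestion: {question}",
    memory_variables={"history": "Human: hi"},
)
toy_output = toy_chain.run(question="What is AIT?")
print(toy_output)
```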

Note : [Download Fastchat Model Here](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0)

In [27]:
# %cd ./models
# !git clone https://huggingface.co/lmsys/fastchat-t5-3b-v1.0

In [28]:
# git clone https://github.com/TimDettmers/bitsandbytes.git && cd bitsandbytes/
# pip install -r requirements-dev.txt
# cmake -DCOMPUTE_BACKEND=cuda -S .
# cmake --build . --config Release
# python -m build --wheel

In [29]:
from transformers import AutoTokenizer, pipeline, AutoModelForSeq2SeqLM
from transformers import BitsAndBytesConfig
from langchain.llms import HuggingFacePipeline
import torch

model_id = 'D:/AIT/Sem2/NLP/NLP_Assignments/Jupyter Files/models_fast_chat/fastchat-t5-3b-v1.0'

tokenizer = AutoTokenizer.from_pretrained(
    model_id)

tokenizer.pad_token_id = tokenizer.eos_token_id

# bitsandbyte_config = BitsAndBytesConfig(
#     load_in_4bit = True,
#     bnb_4bit_quant_type = "nf4",
#     load_in_8bit_fp32_cpu_offload=True,
#     bnb_4bit_compute_dtype = torch.float16,
#     bnb_4bit_use_double_quant = True
# )

model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    device_map = 'auto'
)

pipe = pipeline(
    task="text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens = 256,
    model_kwargs = {
        "temperature" : 0,
        "repetition_penalty": 1.5
    }
)

llm = HuggingFacePipeline(pipeline = pipe)

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


### [Class ConversationalRetrievalChain](https://api.python.langchain.com/en/latest/_modules/langchain/chains/conversational_retrieval/base.html#ConversationalRetrievalChain)

- `retriever` : Retriever to use to fetch documents.

- `combine_docs_chain` : The chain used to combine any retrieved documents.

- `question_generator`: The chain used to generate a new question for the sake of retrieval. This chain will take in the current question (with variable question) and any chat history (with variable chat_history) and will produce a new standalone question to be used later on.

- `return_source_documents` : Return the retrieved source documents as part of the final result.

- `get_chat_history` : An optional function to get a string of the chat history. If None is provided, will use a default.

- `return_generated_question` : Return the generated question as part of the final result.

- `response_if_no_docs_found` : If specified, the chain will return a fixed response if no docs are found for the question.
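
How these pieces fit together can be sketched with plain functions. This is an illustrative approximation of the control flow that the verbose traces in this notebook show, not the library's code; `toy_conversational_retrieval` and the three stub functions are hypothetical.

```python
TOY_CORPUS = [
    "AIT is a postgraduate institution.",
    "Housing is available on campus.",
]

def toy_condense(chat_history, question):
    """Stub question_generator: attach the previous question to the follow-up."""
    previous_question = chat_history[-1][0]
    return f"{question} (follow-up to: {previous_question})"

def toy_retrieve(query):
    """Stub retriever: keep documents that share a word with the query."""
    words = set(query.lower().replace("?", "").split())
    return [d for d in TOY_CORPUS if words & set(d.lower().replace(".", "").split())]

def toy_combine(docs, question):
    """Stub combine_docs_chain: stuff all documents into one answer string."""
    return f"Q: {question} | context: {' '.join(docs)}"

def toy_conversational_retrieval(question, chat_history, response_if_no_docs_found=None):
    # 1. Rewrite the question only when there is history to resolve against
    standalone = toy_condense(chat_history, question) if chat_history else question
    # 2. Retrieve with the standalone question
    docs = toy_retrieve(standalone)
    # 3. Answer from the documents, or fall back to the fixed response
    if not docs and response_if_no_docs_found is not None:
        answer = response_if_no_docs_found
    else:
        answer = toy_combine(docs, standalone)
    return {
        "question": question,
        "generated_question": standalone,
        "answer": answer,
        "source_documents": docs,
    }

first_turn = toy_conversational_retrieval("What is AIT?", chat_history=[])
follow_up = toy_conversational_retrieval(
    "Where?", chat_history=[("What is AIT?", first_turn["answer"])]
)
no_match = toy_conversational_retrieval("zzz", [], response_if_no_docs_found="Sorry, no idea.")
print(follow_up["generated_question"])
```

On the first turn the history is empty, so the question goes to retrieval unchanged; later turns are condensed into a standalone question first.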


`question_generator`

In [30]:
from langchain.chains import LLMChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains.question_answering import load_qa_chain
from langchain.chains import ConversationalRetrievalChain

In [31]:
CONDENSE_QUESTION_PROMPT

PromptTemplate(input_variables=['chat_history', 'question'], template='Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\n{chat_history}\nFollow Up Input: {question}\nStandalone question:')

In [32]:
question_generator = LLMChain(
    llm = llm,
    prompt = CONDENSE_QUESTION_PROMPT,
    verbose = True
)

In [33]:
query = 'Comparing both of them'
chat_history = "Human:What is AIT\nAI:\nHuman:What are programs in AIT\nAI:"

question_generator({'chat_history' : chat_history, "question" : query})

> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
Human:What is AIT
AI:
Human:What are programs in AIT
AI:
Follow Up Input: Comparing both of them
Standalone question:

> Finished chain.


{'chat_history': 'Human:What is AIT\nAI:\nHuman:What are programs in AIT\nAI:',
 'question': 'Comparing both of them',
 'text': '<pad> What  are  the  main  differences  between  AIT  AI  and  traditional  AI?\n'}

`combine_docs_chain`

In [34]:
doc_chain = load_qa_chain(
    llm = llm,
    chain_type = 'stuff',
    prompt = PROMPT,
    verbose = True
)
doc_chain

StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['context', 'question'], template="I'm your friendly AIT chatbot named ChakyBot, here to assist Chaky and Gun with any questions they have about AIT. \n    If you're curious about anything about AIT, feel free to ask any questions you may have. \n    Whether it's about general, or specific topics. \n    I'm here to help break down complex concepts into easy-to-understand explanations.\n    Just let me know what you're wondering about, and I'll do my best to guide you through it!\n    {context}\n    Question: {question}\n    Answer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x000002C33D98CC90>)), document_variable_name='context')

In [35]:
query = "What is AIT?"
input_document = retriever.get_relevant_documents(query)

doc_chain({'input_documents':input_document, 'question':query})

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
I'm your friendly AIT chatbot named ChakyBot, here to assist Chaky and Gun with any questions they have about AIT.
    If you're curious about anything about AIT, feel free to ask any questions you may have.
    Whether it's about general, or specific topics.
    I'm here to help break down complex concepts into easy-to-understand explanations.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    3/18/24, 10:07 AM
About - Asian Institute of Technology
file:///D:/AIT/Sem2/NLP/NLP_Assignments/Jupyter Files/data/About - Asian Institute of Technology.html
1/7
Home (https://ait.ac.th/) > About
About AIT
AIT is an international English-speaking postgraduate institution, focusing
on engineering, environment, and management studies.
Welcome to AIT
The Asian Institute of Technology (AIT) is an international Englis

{'input_documents': [Document(page_content='3/18/24, 10:07 AM\nAbout - Asian Institute of Technology\nfile:///D:/AIT/Sem2/NLP/NLP_Assignments/Jupyter Files/data/About - Asian Institute of Technology.html\n1/7\nHome (https://ait.ac.th/) > About\nAbout AIT\nAIT is an international English-speaking postgraduate institution, focusing\non engineering, environment, and management studies.\nWelcome to AIT\nThe Asian Institute of Technology (AIT) is an international English-speaking postgraduate institution, focusing on engineering, environment, and\nmanagement studies. AIT’s rigorous academic, research, and experiential outreach programs prepare graduates for professional success and\nleadership roles in Asia and beyond.', metadata={'source': 'data/About.pdf', 'file_path': 'data/About.pdf', 'page': 0, 'total_pages': 7, 'format': 'PDF 1.4', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chro

In [36]:
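# Keep only the last 3 exchanges and expose them under 'chat_history',
# the variable name the chain expects
memory = ConversationBufferWindowMemory(
    k=3, 
    memory_key = "chat_history",
    return_messages = True,
    output_key = 'answer'  # the chain also returns 'source_documents', so name the key to store
)

chain = ConversationalRetrievalChain(
    retriever=retriever,
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
    return_source_documents=True,
    memory=memory,
    verbose=True,
    get_chat_history=lambda h : h  # pass the message list through unchanged
)
chain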

ConversationalRetrievalChain(memory=ConversationBufferWindowMemory(output_key='answer', return_messages=True, memory_key='chat_history', k=3), verbose=True, combine_docs_chain=StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['context', 'question'], template="I'm your friendly AIT chatbot named ChakyBot, here to assist Chaky and Gun with any questions they have about AIT. \n    If you're curious about anything about AIT, feel free to ask any questions you may have. \n    Whether it's about general, or specific topics. \n    I'm here to help break down complex concepts into easy-to-understand explanations.\n    Just let me know what you're wondering about, and I'll do my best to guide you through it!\n    {context}\n    Question: {question}\n    Answer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x000002C33D98CC90>)), document_variable_name='context'), ques

## 5. Chatbot

In [37]:
prompt_question = "Who are you by the way?"
answer = chain({"question":prompt_question})
answer

> Entering new ConversationalRetrievalChain chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
I'm your friendly AIT chatbot named ChakyBot, here to assist Chaky and Gun with any questions they have about AIT.
    If you're curious about anything about AIT, feel free to ask any questions you may have.
    Whether it's about general, or specific topics.
    I'm here to help break down complex concepts into easy-to-understand explanations.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    Privacy Policy (https://ait.ac.th/privacy-policy/)
©2022 Asian Institute of Technology. All Rights Reserved. - Designed by Outsourcify (https://outsourcify.net/)
 (https://ait.ac.th/)
We use cookies on our website to give you the most relevant experience by remembering your preferences and repeat visits. By clicking “Accept
All”, you consent to the us

{'question': 'Who are you by the way?',
 'chat_history': [],
 'answer': "<pad>  I'm  ChakyBot,  a  chatbot  designed  to  assist  Chaky  and  Gun  with  any  questions  they  have  about  AIT.  I'm  here  to  help  break  down  complex  concepts  into  easy-to-understand  explanations,  and  answer  any  questions  you  may  have  about  AIT.  I'm  here  to  help  you  with  any  questions  you  may  have  about  AIT,  so  feel  free  to  ask  me  anything  you  may  have!\n",
 'source_documents': [Document(page_content='Privacy Policy (https://ait.ac.th/privacy-policy/)\n©2022 Asian Institute of Technology. All Rights Reserved. - Designed by Outsourcify (https://outsourcify.net/)\n (https://ait.ac.th/)\nWe use cookies on our website to give you the most relevant experience by remembering your preferences and repeat visits. By clicking “Accept\nAll”, you consent to the use of ALL the cookies. However, you may visit "Cookie Settings" to provide a controlled consent.\nCookie Settings\nAc

In [38]:
prompt_question = "What is AIT?"
answer = chain({"question":prompt_question})
answer

> Entering new ConversationalRetrievalChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='Who are you by the way?'), AIMessage(content="<pad>  I'm  ChakyBot,  a  chatbot  designed  to  assist  Chaky  and  Gun  with  any  questions  they  have  about  AIT.  I'm  here  to  help  break  down  complex  concepts  into  easy-to-understand  explanations,  and  answer  any  questions  you  may  have  about  AIT.  I'm  here  to  help  you  with  any  questions  you  may  have  about  AIT,  so  feel  free  to  ask  me  anything  you  may  have!\n")]
Follow Up Input: What is AIT?
Standalone question:


In [None]:
prompt_question = "Is AIT only for Masters?"
answer = chain({"question":prompt_question})
answer

> Entering new ConversationalRetrievalChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='Who are you by the way?'), AIMessage(content="<pad>  I'm  ChakyBot,  a  chatbot  created  by  Asian  Institute  of  Technology  (AIT)  to  assist  Chaky  and  Gun  with  any  questions  they  have  about  AIT.  I'm  here  to  help  break  down  complex  concepts  into  easy-to-understand  explanations  and  answer  any  questions  you  may  have  about  AIT.\n"), HumanMessage(content='What is AIT?'), AIMessage(content='<pad> < div>\n AIT  is  an  international  English-speaking  postgraduate  institution,  focusing  on  engineering,  environment,  and  management  studies.\n')]
Follow Up Input: Is AIT only for Masters?
Standalone question:

> Finished chai

{'question': 'Is AIT only for Masters?',
 'chat_history': [HumanMessage(content='Who are you by the way?'),
  AIMessage(content="<pad>  I'm  ChakyBot,  a  chatbot  created  by  Asian  Institute  of  Technology  (AIT)  to  assist  Chaky  and  Gun  with  any  questions  they  have  about  AIT.  I'm  here  to  help  break  down  complex  concepts  into  easy-to-understand  explanations  and  answer  any  questions  you  may  have  about  AIT.\n"),
  HumanMessage(content='What is AIT?'),
  AIMessage(content='<pad> < div>\n AIT  is  an  international  English-speaking  postgraduate  institution,  focusing  on  engineering,  environment,  and  management  studies.\n')],
 'answer': '<pad> < pad>\n Yes,  AIT  is  only  for  postgraduate  studies.  It  offers  a  wide  range  of  programs  in  engineering,  environment,  and  management.  It  also  offers  undergraduate  programs  in  these  same  fields.\n',
 'source_documents': [Document(page_content='from all around the world.\nToday, AIT’s 