# Natural Language Processing

# Retrieval-Augmented Generation (RAG)

RAG is a technique for augmenting LLM knowledge with additional, often private or real-time, data.

LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data they were trained on, which ends at a specific cutoff date. If you want to build AI applications that can reason about private data or data introduced after a model’s cutoff date, you need to augment the model's knowledge with the specific information it needs.

<img src="../figures/RAG-process.png" >

This notebook builds `ChakyBot`, a chatbot designed to help Chaky (the instructor) and Gun (the TA) explain NLP course lessons to students. ChakyBot uses LangChain to retrieve information from documents and answer questions about them. The walkthrough covers four components:

1. Prompt
2. Retrieval
3. Memory
4. Chain

In [5]:
#langchain library
!pip install langchain==0.0.350
#LLM
!pip install accelerate==0.25.0
!pip install transformers==4.36.2
!pip install bitsandbytes==0.41.2
#Text Embedding
!pip install sentence-transformers==2.2.2
!pip install InstructorEmbedding==1.0.1
#vectorstore
!pip install pymupdf==1.23.8
!pip install faiss-gpu==1.7.2
!pip install faiss-cpu==1.7.4

In [5]:
import os
import torch
# Set GPU device
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

os.environ['http_proxy']  = 'http://192.41.170.23:3128'
os.environ['https_proxy'] = 'http://192.41.170.23:3128'

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## 1. Prompt

A prompt is a set of instructions or input provided by a user to guide the model's response. It helps the model understand the context and generate relevant, coherent output, such as answering questions, completing sentences, or engaging in a conversation.

In [6]:
import torch
print(torch.cuda.is_available())  # Should return True
print(torch.version.cuda)  # Check CUDA version


True
12.4


In [7]:
from langchain import PromptTemplate

prompt_template = """
    I'm your friendly NLP chatbot named ToobiBot, here to assist anyone with any questions they have about me. 
    If you're curious about how probability works in the context of NLP, feel free to ask any questions you may have. 
    Whether it's about my eductional background or work experience, 
    I'm here to assist you.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    {context}
    Question: {question}
    Answer:
    """.strip()

PROMPT = PromptTemplate.from_template(
    template = prompt_template
)

PROMPT
#using str.format 
#The placeholder is defined using curly brackets: {} {}

PromptTemplate(input_variables=['context', 'question'], template="I'm your friendly NLP chatbot named ToobiBot, here to assist anyone with any questions they have about me. \n    If you're curious about how probability works in the context of NLP, feel free to ask any questions you may have. \n    Whether it's about my eductional background or work experience, \n    I'm here to assist you.\n    Just let me know what you're wondering about, and I'll do my best to guide you through it!\n    {context}\n    Question: {question}\n    Answer:")

In [8]:
PROMPT.format(
    context = "My age is 23 years",
    question = "What is your age?"
)

"I'm your friendly NLP chatbot named ToobiBot, here to assist anyone with any questions they have about me. \n    If you're curious about how probability works in the context of NLP, feel free to ask any questions you may have. \n    Whether it's about my eductional background or work experience, \n    I'm here to assist you.\n    Just let me know what you're wondering about, and I'll do my best to guide you through it!\n    My age is 23 years\n    Question: What is your age?\n    Answer:"

Note: [How to improve prompting (Zero-shot, Few-shot, Chain-of-Thought, etc.)](https://github.com/chaklam-silpasuwanchai/Natural-Language-Processing/blob/main/Code/05%20-%20RAG/advance/cot-tot-prompting.ipynb)
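The placeholder mechanics can be illustrated without LangChain. The sketch below assumes only the Python standard library: it uses `string.Formatter` to list the `{...}` fields in a template (analogous to the `input_variables` shown in the `PromptTemplate` output) and `str.format` to fill them. The helper name `find_placeholders` and the shortened template are illustrative and not part of LangChain.

```python
from string import Formatter

# A shortened, illustrative template with the same two placeholders as PROMPT.
template = "{context}\nQuestion: {question}\nAnswer:"

def find_placeholders(template: str) -> list[str]:
    """Return the sorted, de-duplicated field names written as {name} in the template."""
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return sorted(names)

placeholders = find_placeholders(template)
filled = template.format(context="My age is 23 years", question="What is your age?")

print(placeholders)
print(filled)
```

Every placeholder must receive a value when formatting; a missing one raises a `KeyError` with plain `str.format`.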

## 2. Retrieval

1. `Document loaders`: Load documents from many different sources (HTML, PDF, code).
2. `Document transformers`: Break large documents into smaller, relevant chunks, an essential step for improving retrieval.
3. `Text embedding models`: Embeddings capture the semantic meaning of text, allowing you to quickly and efficiently find other pieces of text that are similar.
4. `Vector stores`: Databases that support efficient storage and search of these embeddings.
5. `Retrievers`: Once the data is in the database, you still need to retrieve it.

### 2.1 Document Loaders 
Use document loaders to load data from a source as `Document` objects. A `Document` is a piece of text and its associated metadata. For example, there are document loaders for loading a simple .txt file, for loading the text contents of any web page, or even for loading a transcript of a YouTube video.

[PDF Loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf)

[Download Document](https://web.stanford.edu/~jurafsky/slp3/)

In [9]:
from langchain.document_loaders import PyMuPDFLoader

nlp_docs = 'CV_Tooba_Mehboob.pdf'

loader = PyMuPDFLoader(nlp_docs)
documents = loader.load()

In [10]:
# documents

In [11]:
# pip install pymupdf

In [12]:
len(documents)

4

In [13]:
documents[1]

Document(page_content='Other language(s):\n  \nUNDERSTANDING\nSPEAKING\nWRITING\nListening\nReading\nSpoken production Spoken interaction\nENGLISH \nC2\nC2\nC2\nC2\nC2\nLevels: A1 and A2: Basic user; B1 and B2: Independent user; C1 and C2: Proﬁcient user \nWeb Technologies Fundamentals - HTML, CSS\n Node.Js, React.Js\n Microsoft Oﬃce\n Zustand\n React Hooks, React\nRedux\n JavaScript\n Git\n Machine learning\n Model training\n Dataset Building\n JQuery\n Email Template\nMarkup\n Tailwind\n Material Tailwind\n Python \nMaize Seeds Species Classiﬁcation using Machine Leanring \n• Collected the samples of diﬀerent varieties of maize seeds and captured image of each seed.\n• Creating datasets by discarding blurred and duplicate images, resizing images and applying data augmentation on\nthe selected images.\n• On these dataset trained ﬁve machine learning models i.e., LeNet, MobileNet, SmallVGG, Random Forest and\nConvolutional Neural Networks.\n• Developed a mobile application by selecting

### 2.2 Document Transformers

`RecursiveCharacterTextSplitter` is the recommended splitter for generic text. It is parameterized by a list of separator characters and tries to split on them in order until the chunks are small enough.

In [14]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 700,
    chunk_overlap = 100
)

doc = text_splitter.split_documents(documents)

In [15]:
doc[1]

Document(page_content='project on Maize Seeds Species Classiﬁcation using Machine Learning. During my internship at software company, I\ncontributed to web development projects, gaining practical expertise in ReactJS, NodeJS, NextJS, Express and\nMongoDB. Noteworthy achievements include development of HR software, SMS Tool Console, Data Pulse and SMS API\nTester. I am adept at problem-solving, collaborating in team environments, and eﬀectively communicating complex\ntechnical concepts. I am excited about the prospect of contributing my skills and further advancing my expertise in\ncomputer technologies.\n01/06/2023 – 10/07/2024 Peshawar, Pakistan \nTRAINEE WEB ENGINEER VEEVO TECH', metadata={'source': 'CV_Tooba_Mehboob.pdf', 'file_path': 'CV_Tooba_Mehboob.pdf', 'page': 0, 'total_pages': 4, 'format': 'PDF 1.5', 'title': 'Europass', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': 'cairo 1.15.12 (http://cairographics.org)', 'creationDate': "D:20240714105949+02'00",

In [16]:
len(doc)

14
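The idea behind recursive splitting can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: the function name `recursive_split` and the separator list are assumptions, and it omits `chunk_overlap` entirely. It tries the coarsest separator first and falls back to finer ones only for pieces that are still too long.

```python
def recursive_split(text: str, chunk_size: int,
                    separators=("\n\n", "\n", " ", "")) -> list[str]:
    """Split text into pieces of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        if not piece:
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: retry it with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks

sample = ("First paragraph here.\n\n"
          "Second paragraph is a little longer than the first.\n\n"
          "Third.")
chunks = recursive_split(sample, chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

Paragraphs that fit are kept whole, and only the oversized middle paragraph is broken at word boundaries.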

In [17]:
# pip install --upgrade transformers sentence-transformers huggingface_hub langchain langchain-community InstructorEmbedding protobuf

### 2.3 Text Embedding Models
Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.

*Note* Instructor Model : [Huggingface](https://huggingface.co/hkunlp/instructor-base) | [Paper](https://arxiv.org/abs/2212.09741)
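To make "most similar in the vector space" concrete, here is a minimal sketch of cosine similarity on toy vectors. The three-dimensional embeddings are made up for illustration; a real embedding model returns vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example.
toy_embeddings = {
    "education history": [0.9, 0.1, 0.0],
    "university degree": [0.8, 0.2, 0.1],
    "favourite recipes": [0.0, 0.1, 0.9],
}

query_vec = toy_embeddings["education history"]
scores = {text: cosine_similarity(query_vec, vec)
          for text, vec in toy_embeddings.items()}

# The closest text other than the query itself.
best_other = max((t for t in scores if t != "education history"), key=scores.get)
print(best_other)
```

Texts with related meaning point in similar directions, so their cosine similarity is close to 1, while unrelated texts score near 0.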

In [18]:
import torch
from langchain_community.embeddings import HuggingFaceInstructEmbeddings

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = 'hkunlp/instructor-base'

embedding_model = HuggingFaceInstructEmbeddings(
    model_name=model_name,
    model_kwargs={"device": device}  # Ensure no 'token' argument is passed
)

print("Model loaded successfully!")


load INSTRUCTOR_Transformer
max_seq_length  512
Model loaded successfully!


### 2.4 Vector Stores

One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.
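As a rough mental model, a vector store can be pictured as a list of (text, vector) pairs with a nearest-neighbour search on top. The class below is a hypothetical brute-force sketch, not how FAISS works internally; the two-dimensional vectors are invented, and ranking by Euclidean distance is an assumption made for illustration.

```python
import math

class TinyVectorStore:
    """Brute-force in-memory store: keeps (text, vector) pairs and scans all of them."""

    def __init__(self):
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self._items.append((text, vector))

    def search(self, query_vector: list[float], k: int = 2) -> list[str]:
        """Return the k stored texts whose vectors are closest to the query."""
        ranked = sorted(self._items,
                        key=lambda item: math.dist(item[1], query_vector))
        return [text for text, _ in ranked[:k]]

store = TinyVectorStore()
store.add("CGPA: 3.57 out of 4.00", [0.9, 0.1])
store.add("Trainee web engineer", [0.1, 0.9])
store.add("Thesis on maize seed classification", [0.7, 0.3])

top = store.search([1.0, 0.0], k=2)
print(top)
```

A dedicated library replaces the linear scan with index structures so that search stays fast as the collection grows.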

In [19]:
#locate vectorstore
vector_path = '../vector-store'
if not os.path.exists(vector_path):
    os.makedirs(vector_path)
    print('create path done')

In [20]:
#save vector locally
from langchain.vectorstores import FAISS

vectordb = FAISS.from_documents(
    documents = doc,
    embedding = embedding_model
)

db_file_name = 'nlp_stanford'

vectordb.save_local(
    folder_path = os.path.join(vector_path, db_file_name),
    index_name = 'nlp' #default index
)

### 2.5 Retrievers
A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.
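To illustrate that a retriever does not have to be backed by a vector store, here is a hypothetical keyword-based retriever. The class name and its scoring rule are assumptions for this sketch; it only mirrors the method name `get_relevant_documents` used in this notebook and works on plain strings rather than `Document` objects.

```python
class KeywordRetriever:
    """Hypothetical retriever that ranks texts by how many query words they contain."""

    def __init__(self, texts: list[str]):
        self.texts = texts

    def get_relevant_documents(self, query: str, k: int = 1) -> list[str]:
        query_words = set(query.lower().split())

        def overlap(text: str) -> int:
            return len(query_words & set(text.lower().split()))

        ranked = sorted(self.texts, key=overlap, reverse=True)
        # Drop texts that share no words with the query at all.
        return [t for t in ranked[:k] if overlap(t) > 0]

retriever_demo = KeywordRetriever([
    "Nationality: Pakistani",
    "Mother tongue(s): URDU",
    "Final grade A1",
])
hits = retriever_demo.get_relevant_documents("final grade")
print(hits)
```

Anything that maps a query string to a list of documents fits the interface, whether it uses embeddings, keywords, or an external search API.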

In [21]:
#calling vector from local
vector_path = '../vector-store'
db_file_name = 'nlp_stanford'

from langchain.vectorstores import FAISS

vectordb = FAISS.load_local(
    folder_path = os.path.join(vector_path, db_file_name),
    embeddings = embedding_model,
    index_name = 'nlp', #default index
    allow_dangerous_deserialization=True
)   

In [22]:
#ready to use
retriever = vectordb.as_retriever()

In [23]:
retriever.get_relevant_documents("What is my name")

[Document(page_content="Tooba Mehboob \nDate of birth: 26/02/2002\n Nationality: Pakistani\n Gender: Female \n Phone number: (+92) 3465547722 (Mobile) \n \nEmail address: toobamehboob36@gmail.com \n Address: Islamia Colony, Palosi Piran, 25130, Pakistan (Home) \n \nI am a highly motivated computer science graduate with a solid foundation in web technologies which includes HTML,\nCSS, Scss, TailwindCSS, Material Tailwind, JQuery, JavaScript, ReactJS, NodeJS, NextJS, Express, MongoDB, Zustand,\nRedux and API Integrations. My academic journey at University of Engineering and Technology, Peshawar, equipped\nme with a robust skill set, reinforced by hands-on experiences and coursework. During my bachelor's I have done", metadata={'source': 'CV_Tooba_Mehboob.pdf', 'file_path': 'CV_Tooba_Mehboob.pdf', 'page': 0, 'total_pages': 4, 'format': 'PDF 1.5', 'title': 'Europass', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': 'cairo 1.15.12 (http://cairographics.org)', 'creati

In [24]:
retriever.get_relevant_documents("What is my age?")

[Document(page_content="Tooba Mehboob \nDate of birth: 26/02/2002\n Nationality: Pakistani\n Gender: Female \n Phone number: (+92) 3465547722 (Mobile) \n \nEmail address: toobamehboob36@gmail.com \n Address: Islamia Colony, Palosi Piran, 25130, Pakistan (Home) \n \nI am a highly motivated computer science graduate with a solid foundation in web technologies which includes HTML,\nCSS, Scss, TailwindCSS, Material Tailwind, JQuery, JavaScript, ReactJS, NodeJS, NextJS, Express, MongoDB, Zustand,\nRedux and API Integrations. My academic journey at University of Engineering and Technology, Peshawar, equipped\nme with a robust skill set, reinforced by hands-on experiences and coursework. During my bachelor's I have done", metadata={'source': 'CV_Tooba_Mehboob.pdf', 'file_path': 'CV_Tooba_Mehboob.pdf', 'page': 0, 'total_pages': 4, 'format': 'PDF 1.5', 'title': 'Europass', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': 'cairo 1.15.12 (http://cairographics.org)', 'creati

## 3. Memory

One of the core utility classes underpinning most (if not all) memory modules is the ChatMessageHistory class. This is a super lightweight wrapper that provides convenience methods for saving HumanMessages, AIMessages, and then fetching them all.

You may want to use this class directly if you are managing memory outside of a chain.


In [25]:
from langchain.memory import ChatMessageHistory

history = ChatMessageHistory()
history

InMemoryChatMessageHistory(messages=[])

In [26]:
history.add_user_message('hi')
history.add_ai_message('Whats up?')
history.add_user_message('How are you')
history.add_ai_message('I\'m quite good. How about you?')

In [27]:
history

InMemoryChatMessageHistory(messages=[HumanMessage(content='hi'), AIMessage(content='Whats up?'), HumanMessage(content='How are you'), AIMessage(content="I'm quite good. How about you?")])

### 3.1 Memory types

There are many different types of memory. Each has its own parameters and return types, and each is useful in different scenarios. This section covers two:
- Conversation Buffer
- Conversation Buffer Window

What variables get returned from memory

Before going into the chain, various variables are read from memory. These have specific names, which need to align with the variables the chain expects. You can see what these variables are by calling `memory.load_memory_variables({})`. The empty dictionary is just a placeholder for real variables; if the memory type you are using depends on the input variables, you may need to pass some in.

For the buffer memories shown below, `load_memory_variables` returns a single key, `history`, so your chain (and likely your prompt) should expect an input named `history`. You can usually control this name through parameters on the memory class. For example, to have the memory variables returned under the key `chat_history`, pass `memory_key="chat_history"` when constructing the memory, as in `ConversationBufferMemory(memory_key="chat_history")`.

#### Conversation Buffer
This memory stores messages and then extracts them into a variable.

In [28]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: hi\nAI: What's up?\nHuman: How are you?\nAI: I'm quite good. How about you?"}

In [29]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages = True)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': [HumanMessage(content='hi'),
  AIMessage(content="What's up?"),
  HumanMessage(content='How are you?'),
  AIMessage(content="I'm quite good. How about you?")]}

#### Conversation Buffer Window
- it keeps a list of the interactions of the conversation over time. 
- it only uses the last K interactions. 
- it can be useful for keeping a sliding window of the most recent interactions, so the buffer does not get too large.

In [30]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=1)
memory.save_context({'input':'hi'}, {'output':'What\'s up?'})
memory.save_context({"input":'How are you?'},{'output': 'I\'m quite good. How about you?'})
memory.load_memory_variables({})

{'history': "Human: How are you?\nAI: I'm quite good. How about you?"}
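The sliding-window behaviour can be sketched with a bounded queue. The class below is a hypothetical stand-in, not LangChain's implementation; it assumes one "interaction" means one human/AI exchange.

```python
from collections import deque

class WindowMemory:
    """Hypothetical sliding-window memory that keeps only the last k exchanges."""

    def __init__(self, k: int):
        # A deque with maxlen drops the oldest exchange automatically.
        self.turns = deque(maxlen=k)

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.turns.append((user_input, ai_output))

    def load_history(self) -> str:
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)

window = WindowMemory(k=1)
window.save_context("hi", "What's up?")
window.save_context("How are you?", "I'm quite good. How about you?")
history_text = window.load_history()
print(history_text)
```

With `k=1` only the second exchange survives, matching the recorded output above.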

## 4. Chain

Using an LLM in isolation is fine for simple applications, but more complex applications require chaining LLMs - either with each other or with other components.

An `LLMChain` is a simple chain that adds some functionality around language models.
- it consists of a `PromptTemplate` and a language model (either an LLM or a chat model).
- it formats the prompt template using the input key values provided (and also memory key values, if available).
- it passes the formatted string to the LLM and returns the LLM output.
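The three steps above can be sketched as a small class. `SimpleChain` and `fake_llm` are hypothetical names used only for illustration; the "model" here is a placeholder function so the example runs without downloading any weights.

```python
from typing import Callable

class SimpleChain:
    """Hypothetical stand-in for the prompt -> model flow described above."""

    def __init__(self, template: str, llm: Callable[[str], str]):
        self.template = template
        self.llm = llm

    def run(self, **inputs: str) -> str:
        prompt = self.template.format(**inputs)  # fill in the template
        return self.llm(prompt)                  # pass the string to the model

def fake_llm(prompt: str) -> str:
    """Placeholder model: reports the prompt length instead of generating text."""
    return f"(model received {len(prompt)} characters)"

chain_demo = SimpleChain("Question: {question}\nAnswer:", fake_llm)
result = chain_demo.run(question="What is RAG?")
print(result)
```

Swapping `fake_llm` for any callable that maps a prompt string to a completion leaves the chain logic unchanged, which is the main appeal of the abstraction.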

Note : [Download Fastchat Model Here](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0)

In [31]:
# %cd ./models
# !git clone https://huggingface.co/lmsys/fastchat-t5-3b-v1.0

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch
from langchain.llms import HuggingFacePipeline


model_id = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # Force 32-bit precision to prevent overflow
    device_map="auto"  # Auto-detect GPU or CPU
)

# pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
# llm = HuggingFacePipeline(pipeline=pipe)


In [32]:
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

# Wrap HuggingFace pipeline properly
llm_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)
llm = HuggingFacePipeline(pipeline=llm_pipeline)

# Define RetrievalQA with correct LLM
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,  
    retriever=retriever
)


In [33]:
from transformers import AutoModel
AutoModel.from_pretrained("lmsys/fastchat-t5-3b-v1.0", cache_dir="../models/")


T5Model(
  (shared): Embedding(32110, 2048)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32110, 2048)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=2048, out_features=2048, bias=False)
              (k): Linear(in_features=2048, out_features=2048, bias=False)
              (v): Linear(in_features=2048, out_features=2048, bias=False)
              (o): Linear(in_features=2048, out_features=2048, bias=False)
              (relative_attention_bias): Embedding(32, 32)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseGatedActDense(
              (wi_0): Linear(in_features=2048, out_features=5120, bias=False)
              (wi_1): Linear(in_features=2048, out_features=5120, bias=False)
              (wo): Linear

### [Class ConversationalRetrievalChain](https://api.python.langchain.com/en/latest/_modules/langchain/chains/conversational_retrieval/base.html#ConversationalRetrievalChain)

- `retriever` : Retriever to use to fetch documents.

- `combine_docs_chain` : The chain used to combine any retrieved documents.

- `question_generator`: The chain used to generate a new question for the sake of retrieval. This chain will take in the current question (with variable question) and any chat history (with variable chat_history) and will produce a new standalone question to be used later on.

- `return_source_documents` : Return the retrieved source documents as part of the final result.

- `get_chat_history` : An optional function to get a string of the chat history. If None is provided, will use a default.

- `return_generated_question` : Return the generated question as part of the final result.

- `response_if_no_docs_found` : If specified, the chain will return a fixed response if no docs are found for the question.
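How these pieces fit together in one conversational turn can be sketched as follows. This is a simplified reading of the flow, not the library's code: `conversational_answer` and the three stub functions are hypothetical, and the rule that condensing is skipped when the history is empty is inferred from the logs in this notebook, where the first call goes straight to the document chain.

```python
import re

def conversational_answer(question, chat_history, condense, retrieve, combine):
    """Run one turn: condense the question, fetch documents, then answer from them."""
    # With no history there is nothing to condense, so the question is used as-is.
    standalone = condense(chat_history, question) if chat_history else question
    docs = retrieve(standalone)
    answer = combine(docs, standalone)
    return {"question": question, "generated_question": standalone,
            "source_documents": docs, "answer": answer}

def words(text):
    return set(re.findall(r"\w+", text.lower()))

# Stubs standing in for question_generator, retriever and combine_docs_chain.
def condense(history, question):
    return f"{question} (asked after: {history[-1]})"

def retrieve(query):
    corpus = ["CGPA: 3.57 out of 4.00", "Nationality: Pakistani"]
    return [doc for doc in corpus if words(doc) & words(query)]

def combine(docs, question):
    return f"Based on {len(docs)} document(s): {docs[0]}" if docs else "No documents found."

first = conversational_answer("What is the CGPA?", [], condense, retrieve, combine)
second = conversational_answer("Out of how much?", ["What is the CGPA?"],
                               condense, retrieve, combine)
print(first["answer"])
print(second["generated_question"])
```

The follow-up question only retrieves the right document because it is first rewritten into a standalone question that carries the earlier context.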


`question_generator`

In [34]:
from langchain.chains import LLMChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains.question_answering import load_qa_chain
from langchain.chains import ConversationalRetrievalChain

In [35]:
CONDENSE_QUESTION_PROMPT

PromptTemplate(input_variables=['chat_history', 'question'], template='Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\n{chat_history}\nFollow Up Input: {question}\nStandalone question:')

In [36]:
question_generator = LLMChain(
    llm = llm,
    prompt = CONDENSE_QUESTION_PROMPT,
    verbose = True
)

In [38]:
query = 'Comparing both of them'
chat_history = "Human:What is your name\nAI:\nHuman:What is your age?\nAI:"

question_generator({'chat_history' : chat_history, "question" : query})

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.

> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
Human:What is your name
AI:
Human:What is your age?
AI:
Follow Up Input: Comparing both of them
Standalone question:

> Finished chain.


{'chat_history': 'Human:What is your name\nAI:\nHuman:What is your age?\nAI:',
 'question': 'Comparing both of them',
 'text': 'Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\nHuman:What is your name\nAI:\nHuman:What is your age?\nAI:\nFollow Up Input: Comparing both of them\nStandalone question: What'}

`combine_docs_chain`

In [39]:
doc_chain = load_qa_chain(
    llm = llm,
    chain_type = 'stuff',
    prompt = PROMPT,
    verbose = True
)
doc_chain

StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['context', 'question'], template="I'm your friendly NLP chatbot named ToobiBot, here to assist anyone with any questions they have about me. \n    If you're curious about how probability works in the context of NLP, feel free to ask any questions you may have. \n    Whether it's about my eductional background or work experience, \n    I'm here to assist you.\n    Just let me know what you're wondering about, and I'll do my best to guide you through it!\n    {context}\n    Question: {question}\n    Answer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x79b3b95be330>)), document_variable_name='context')

In [40]:
query = "Study?"
input_document = retriever.get_relevant_documents(query)

doc_chain({'input_documents':input_document, 'question':query})

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
I'm your friendly NLP chatbot named ToobiBot, here to assist anyone with any questions they have about me.
    If you're curious about how probability works in the context of NLP, feel free to ask any questions you may have.
    Whether it's about my eductional background or work experience,
    I'm here to assist you.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    CGPA: 3.57 out of 4.00
Final grade A
Thesis Maize seeds species classiﬁcation using Machine Learning
02/09/2017 – 29/07/2019 Peshawar, Pakistan
HSSC (PRE-ENGINEERING) Agricultural University Public School and College for Girls
Marks obtained: 897 out of 1100
Final grade A1
01/04/2015 – 07/07/2017 Peshawar, Pakistan
SSC Al-Amanah Youth Academy
Marks obtained: 920 out of 1100
Final grade A1
Mother tongue(s):  URDU
ABOUT ME
WORK EX

> Finished chain.

> Finished chain.


{'input_documents': [Document(page_content='CGPA: 3.57 out of 4.00\nFinal grade A \nThesis Maize seeds species classiﬁcation using Machine Learning \n02/09/2017 – 29/07/2019 Peshawar, Pakistan \nHSSC (PRE-ENGINEERING) Agricultural University Public School and College for Girls \nMarks obtained: 897 out of 1100\nFinal grade A1 \n01/04/2015 – 07/07/2017 Peshawar, Pakistan \nSSC Al-Amanah Youth Academy \nMarks obtained: 920 out of 1100\nFinal grade A1 \nMother tongue(s):  URDU \nABOUT ME \nWORK EXPERIENCE\nEDUCATION AND TRAINING\nLANGUAGE SKILLS', metadata={'source': 'CV_Tooba_Mehboob.pdf', 'file_path': 'CV_Tooba_Mehboob.pdf', 'page': 0, 'total_pages': 4, 'format': 'PDF 1.5', 'title': 'Europass', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': 'cairo 1.15.12 (http://cairographics.org)', 'creationDate': "D:20240714105949+02'00", 'modDate': '', 'trapped': ''}),
  Document(page_content='computer technologies.\n01/06/2023 – 10/07/2024 Peshawar, Pakistan \nTRAINEE WEB E

In [41]:
memory = ConversationBufferWindowMemory(
    k=3, 
    memory_key = "chat_history",
    return_messages = True,
    output_key = 'answer'
)

chain = ConversationalRetrievalChain(
    retriever=retriever,
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
    return_source_documents=True,
    memory=memory,
    verbose=True,
    get_chat_history=lambda h : h
)
chain

ConversationalRetrievalChain(memory=ConversationBufferWindowMemory(output_key='answer', return_messages=True, memory_key='chat_history', k=3), verbose=True, combine_docs_chain=StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['context', 'question'], template="I'm your friendly NLP chatbot named ToobiBot, here to assist anyone with any questions they have about me. \n    If you're curious about how probability works in the context of NLP, feel free to ask any questions you may have. \n    Whether it's about my eductional background or work experience, \n    I'm here to assist you.\n    Just let me know what you're wondering about, and I'll do my best to guide you through it!\n    {context}\n    Question: {question}\n    Answer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x79b3b95be330>)), document_variable_name='context'), question_generator=LLMChain(verbose=True, pr

## 5. Chatbot

In [42]:
prompt_question = "Who are you by the way?"
answer = chain({"question":prompt_question})
answer

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.

> Entering new ConversationalRetrievalChain chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
I'm your friendly NLP chatbot named ToobiBot, here to assist anyone with any questions they have about me.
    If you're curious about how probability works in the context of NLP, feel free to ask any questions you may have.
    Whether it's about my eductional background or work experience,
    I'm here to assist you.
    Just let me know what you're wondering about, and I'll do my best to guide you through it!
    Tooba Mehboob
Date of birth: 26/02/2002
 Nationality: Pakistani
 Gender: Female
 Phone number: (+92) 3465547722 (Mobile)

Email address: toobamehboob36@gmail.com
 Address: Islamia Colony, Palosi Piran, 25130, Pakistan (Home)

I am a highly motivated computer science graduate with a solid foundation in web technologies which includes HTML,
CSS, Scss, TailwindCSS, Materia

> Finished chain.

> Finished chain.

> Finished chain.


{'question': 'Who are you by the way?',
 'chat_history': [],
 'answer': 'I\'m your friendly NLP chatbot named ToobiBot, here to assist anyone with any questions they have about me. \n    If you\'re curious about how probability works in the context of NLP, feel free to ask any questions you may have. \n    Whether it\'s about my eductional background or work experience, \n    I\'m here to assist you.\n    Just let me know what you\'re wondering about, and I\'ll do my best to guide you through it!\n    Tooba Mehboob \nDate of birth: 26/02/2002\n Nationality: Pakistani\n Gender: Female \n Phone number: (+92) 3465547722 (Mobile) \n \nEmail address: toobamehboob36@gmail.com \n Address: Islamia Colony, Palosi Piran, 25130, Pakistan (Home) \n \nI am a highly motivated computer science graduate with a solid foundation in web technologies which includes HTML,\nCSS, Scss, TailwindCSS, Material Tailwind, JQuery, JavaScript, ReactJS, NodeJS, NextJS, Express, MongoDB, Zustand,\nRedux and API Integ

In [44]:
prompt_question = "What is your work experience?"
answer = chain({"question":prompt_question})
answer

Token indices sequence length is longer than the specified maximum sequence length for this model (2845 > 2048). Running this sequence through the model will result in indexing errors
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.

> Entering new ConversationalRetrievalChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
[HumanMessage(content='Who are you by the way?'), AIMessage(content='I\'m your friendly NLP chatbot named ToobiBot, here to assist anyone with any questions they have about me. \n    If you\'re curious about how probability works in the context of NLP, feel free to ask any questions you may have. \n    Whether it\'s about my eductional background or work experience, \n    I\'m here to assist you.\n    Just let me know what you\'re wondering about, and I\'ll do my best to guide you through it!\n    Tooba Mehboob \nDate of birth: 26/02/2002\n Nationality: Pakistani\n Gender: Female \n Phone number: (+92) 3465547722 (Mobile) \n \nEmail address: toobamehboob36@gmail.com \n Address: Islamia 



OutOfMemoryError: CUDA out of memory. Tried to allocate 2.14 GiB. GPU 0 has a total capacity of 10.75 GiB of which 1.01 GiB is free. Including non-PyTorch memory, this process has 9.73 GiB memory in use. Of the allocated memory 9.35 GiB is allocated by PyTorch, and 197.84 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
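The recorded `answer` from the first question begins with the formatted prompt itself, and the second call then reports a 2845-token input against a 2048-token limit before running out of GPU memory. One plausible reading (an assumption, not confirmed by the notebook) is that the generation pipeline returns the prompt together with the completion, so each stored answer inflates the chat history that is fed back into the next prompt. The `transformers` text-generation pipeline accepts `return_full_text=False` and `max_new_tokens` to limit this at the source. The library-free sketch below shows the same idea with a hypothetical helper, `strip_prompt_echo`, applied after generation.

```python
def strip_prompt_echo(prompt: str, generated: str) -> str:
    """Return only the newly generated text when the output starts with the prompt."""
    if generated.startswith(prompt):
        return generated[len(prompt):].strip()
    return generated.strip()

# Illustrative strings; a real prompt would also contain the retrieved context.
prompt = "Question: Who are you by the way?\nAnswer:"
raw_output = prompt + " I am ToobiBot."

clean_answer = strip_prompt_echo(prompt, raw_output)
print(clean_answer)
```

Storing only the cleaned answer keeps each history entry short, which in turn keeps the condensed-question prompt within the model's context window.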