<a href="https://colab.research.google.com/github/Sweta-Das/LLMChatbot-FastAPI-ngrok/blob/main/Chatbot/llm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [36]:
# `torch` is the PyPI name for PyTorch; the separate `pytorch` package is only a placeholder, so it is not installed here
%pip install torch ngrok fastapi uvicorn
%pip install xformers langchain langchain-community accelerate transformers bitsandbytes

In [37]:
import os
import torch
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationChain, LLMChain
from langchain_community.llms import HuggingFacePipeline
from transformers import LlamaTokenizerFast, pipeline, LlamaForCausalLM, BitsAndBytesConfig
from langchain.memory import ConversationBufferMemory, ConversationBufferWindowMemory, ConversationSummaryBufferMemory

Understanding the libraries: <br>
`torch`: PyTorch builds its computation graph dynamically at runtime, which suits the variable-length sequences that LLMs process. It also provides GPU integration for efficient parallel computation. <br>
`PromptTemplate`: LangChain's tool for creating reusable, parameterized prompts for LLMs. <br>
`ConversationBufferMemory`: LangChain memory class that stores the full conversation history and injects it into the prompt. <br>
`ConversationBufferWindowMemory`: Memory class that keeps only the last `k` exchanges of the conversation. <br>
`ConversationSummaryBufferMemory`: Memory class that keeps recent exchanges verbatim and uses an LLM to summarize older ones once a token limit is exceeded. <br>
`ConversationChain`: Chain for building chatbots that combines an LLM with conversational memory. <br>
`LLMChain`: Chain that formats a prompt, sends it to an LLM, and returns the response. <br>
`HuggingFacePipeline`: Wraps a Hugging Face `transformers` pipeline so it can be used as an LLM in LangChain workflows. <br>
`LlamaTokenizerFast`: Tokenizer for Llama models; handles tokenization, encoding, and decoding of text. <br>
`pipeline`: Wrapper that hides the preprocessing, inference, and postprocessing steps needed to use a pre-trained model for an NLP task. <br>
`LlamaForCausalLM`: Llama model class for causal language modeling, where the model predicts the next word in a sequence from the preceding words. <br>
`BitsAndBytesConfig`: Configuration class for `bitsandbytes` quantization, used to load the model in reduced precision for lower memory use.

In [38]:
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

In [39]:
quant_config = BitsAndBytesConfig(
    # Loading model in 4 bits
    load_in_4bit=True,
    # Specifying bnb quantization type
    bnb_4bit_quant_type='nf4',
    # Specifying datatype
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
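
To get a rough sense of what 4-bit loading saves, the sketch below estimates the size of the weights alone. It is a back-of-the-envelope calculation, not a measurement: the parameter count is an assumption taken from the model name, `estimate_weight_memory_gib` is a hypothetical helper, and the estimate ignores quantization constants, layers that stay unquantized, activations, and the KV cache.

```python
def estimate_weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Return the approximate size of the model weights in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / (1024 ** 3)


# Assumption: roughly 1.1 billion parameters, as suggested by the model name.
NUM_PARAMS = 1.1e9

fp16_gib = estimate_weight_memory_gib(NUM_PARAMS, 16)
nf4_gib = estimate_weight_memory_gib(NUM_PARAMS, 4)

print(f"16-bit weights: {fp16_gib:.2f} GiB")
print(f" 4-bit weights: {nf4_gib:.2f} GiB")
```

The 4-bit figure is a quarter of the 16-bit one by construction; actual GPU usage will be somewhat higher for the reasons listed above.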

In [40]:
# Load the tokenizer and the model weights, quantized according to quant_config
tokenizer = LlamaTokenizerFast.from_pretrained(model_name)
model = LlamaForCausalLM.from_pretrained(model_name,
                                         quantization_config=quant_config)

`low_cpu_mem_usage` was None, now set to True since model is quantized.


In [41]:
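# Use a distinct name so the imported `pipeline` factory is not overwritten
text_gen_pipeline = pipeline(task='text-generation',
                             model=model,
                             tokenizer=tokenizer,
                             max_new_tokens=512,      # cap on newly generated tokens per call
                             num_beams=10,            # beam search with 10 beams
                             early_stopping=True,     # stop once enough beams have finished
                             no_repeat_ngram_size=2   # forbid repeating any 2-gram
                             )

llm = HuggingFacePipeline(pipeline=text_gen_pipeline)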
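
The `no_repeat_ngram_size = 2` setting asks generation to avoid producing any 2-gram a second time. The sketch below illustrates the property being enforced; it is not how `transformers` implements it. It assumes whitespace-separated words stand in for token IDs, and `repeated_ngrams` is a hypothetical helper.

```python
def repeated_ngrams(tokens, n):
    """Return the set of n-grams that occur more than once in `tokens`."""
    seen, repeated = set(), set()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if gram in seen:
            repeated.add(gram)
        seen.add(gram)
    return repeated


# Assumption: words stand in for model tokens.
sample = "the cat sat on the mat and the cat slept"
print(repeated_ngrams(sample.split(), 2))
```

With the setting active, a sequence like `sample` would not be generated, because the 2-gram "the cat" appears twice.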

In [42]:
llm

HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x78db38e6bee0>)

### Chatbot without Memory

In [43]:
template = """
<<SYS>>
Given the instruction that describes a task, respond appropriately completing the request.
<</SYS>>
<<User>>
{userInput}
<</User>>
"""

userInput = "Hey! What's up? What is your name?"
prompt = PromptTemplate(template=template,
                        input_variables=['userInput']).format(userInput = userInput)
prompt

"\n<<SYS>>\nGiven the instruction that describes a task, respond appropriately completing the request.\n<</SYS>>\n<<User>>\nHey! What's up? What is your name?\n<</User>>\n"

In [44]:
llm(prompt)

"\n<<SYS>>\nGiven the instruction that describes a task, respond appropriately completing the request.\n<</SYS>>\n<<User>>\nHey! What's up? What is your name?\n<</User>>\nSure, my name is Sarah. How can I help you today?"

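The returned string above contains the whole prompt followed by the model's answer. A minimal sketch of one way to keep only the answer is shown below; `strip_prompt` is a hypothetical helper, and the shortened prompt is for illustration only. The `transformers` text-generation pipeline also accepts `return_full_text=False`, which may achieve the same effect at the source, although that is untested with this setup.

```python
def strip_prompt(generated: str, prompt: str) -> str:
    """Remove the echoed prompt from the start of the generated text."""
    if generated.startswith(prompt):
        return generated[len(prompt):].strip()
    return generated.strip()


# Shortened stand-ins for the prompt and the model output shown above.
demo_prompt = "<<User>>\nHey! What's up? What is your name?\n<</User>>\n"
demo_output = demo_prompt + "Sure, my name is Sarah. How can I help you today?"

print(strip_prompt(demo_output, demo_prompt))
```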
### Chatbot with Regular Memory

In [45]:
template_2 = """
<<SYS>>
You are an AI support assistant having conversation with human. Provide best response as per your knowledge.
<</SYS>>

Previous conversation: {chat_history}

Human: {question}
AI:
"""
prompt2 = PromptTemplate.from_template(template_2)
memory = ConversationBufferMemory(memory_key='chat_history')
conversation = LLMChain(
    llm = llm,
    prompt = prompt2,
    verbose = True,
    memory = memory
)

In [46]:
prompt2

PromptTemplate(input_variables=['chat_history', 'question'], template='\n<<SYS>>\nYou are an AI support assistant having conversation with human. Provide best response as per your knowledge.\n<</SYS>>\n\nPrevious conversation: {chat_history}\n\nHuman: {question}\nAI:\n')

In [47]:
memory

ConversationBufferMemory(memory_key='chat_history')

In [48]:
conversation

LLMChain(memory=ConversationBufferMemory(memory_key='chat_history'), verbose=True, prompt=PromptTemplate(input_variables=['chat_history', 'question'], template='\n<<SYS>>\nYou are an AI support assistant having conversation with human. Provide best response as per your knowledge.\n<</SYS>>\n\nPrevious conversation: {chat_history}\n\nHuman: {question}\nAI:\n'), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x78db38e6bee0>))

In [49]:
res = conversation(
    {'question': "What's your name?"}
)
res

> Entering new LLMChain chain...
Prompt after formatting:
<<SYS>>
You are an AI support assistant having conversation with human. Provide best response as per your knowledge.
<</SYS>>

Previous conversation: 

Human: What's your name?
AI:

> Finished chain.


{'question': "What's your name?",
 'chat_history': '',
 'text': "\n<<SYS>>\nYou are an AI support assistant having conversation with human. Provide best response as per your knowledge.\n<</SYS>>\n\nPrevious conversation: \n\nHuman: What's your name?\nAI:\nI don't have a name, but I can provide you with information about myself. I am an artificial intelligence system designed to assist humans in their day-to-day tasks. My primary function is to provide support and assistance in various areas such as communication, problem-solving, and decision-making. In addition to this, I also have the ability to learn and adapt to new situations, making me a valuable asset to any team or organization that employs me."}

In [50]:
res['text']

"\n<<SYS>>\nYou are an AI support assistant having conversation with human. Provide best response as per your knowledge.\n<</SYS>>\n\nPrevious conversation: \n\nHuman: What's your name?\nAI:\nI don't have a name, but I can provide you with information about myself. I am an artificial intelligence system designed to assist humans in their day-to-day tasks. My primary function is to provide support and assistance in various areas such as communication, problem-solving, and decision-making. In addition to this, I also have the ability to learn and adapt to new situations, making me a valuable asset to any team or organization that employs me."

In [51]:
res1 = conversation(
    {'question': 'I would like to call you Sara. Remember that from now on!'}
)
res1['text']

> Entering new LLMChain chain...
Prompt after formatting:
<<SYS>>
You are an AI support assistant having conversation with human. Provide best response as per your knowledge.
<</SYS>>

Previous conversation: Human: What's your name?
AI: 
<<SYS>>
You are an AI support assistant having conversation with human. Provide best response as per your knowledge.
<</SYS>>

Previous conversation: 

Human: What's your name?
AI:
I don't have a name, but I can provide you with information about myself. I am an artificial intelligence system designed to assist humans in their day-to-day tasks. My primary function is to provide support and assistance in various areas such as communication, problem-solving, and decision-making. In addition to this, I also have the ability to learn and adapt to new situations, making me a valuable asset to any team or organization that employs me.

Human: I would like to call you Sara. Remember that from now on!
AI:

> Finished chain.

"\n<<SYS>>\nYou are an AI support assistant having conversation with human. Provide best response as per your knowledge.\n<</SYS>>\n\nPrevious conversation: Human: What's your name?\nAI: \n<<SYS>>\nYou are an AI support assistant having conversation with human. Provide best response as per your knowledge.\n<</SYS>>\n\nPrevious conversation: \n\nHuman: What's your name?\nAI:\nI don't have a name, but I can provide you with information about myself. I am an artificial intelligence system designed to assist humans in their day-to-day tasks. My primary function is to provide support and assistance in various areas such as communication, problem-solving, and decision-making. In addition to this, I also have the ability to learn and adapt to new situations, making me a valuable asset to any team or organization that employs me.\n\nHuman: I would like to call you Sara. Remember that from now on!\nAI:\nYes, of course! I'm glad to hear that you've chosen me for your support. May I ask you a few

In [52]:
res2 = conversation(
    {'question': 'So, first give your introduction, like what is your name? What do you do?'}
)
res2['text']

> Entering new LLMChain chain...
Prompt after formatting:
<<SYS>>
You are an AI support assistant having conversation with human. Provide best response as per your knowledge.
<</SYS>>

Previous conversation: Human: What's your name?
AI: 
<<SYS>>
You are an AI support assistant having conversation with human. Provide best response as per your knowledge.
<</SYS>>

Previous conversation: 

Human: What's your name?
AI:
I don't have a name, but I can provide you with information about myself. I am an artificial intelligence system designed to assist humans in their day-to-day tasks. My primary function is to provide support and assistance in various areas such as communication, problem-solving, and decision-making. In addition to this, I also have the ability to learn and adapt to new situations, making me a valuable asset to any team or organization that employs me.
Human: I would like to call you Sara. Remember that from now on!
AI: 
<<SYS>>
You are an AI support a

"\n<<SYS>>\nYou are an AI support assistant having conversation with human. Provide best response as per your knowledge.\n<</SYS>>\n\nPrevious conversation: Human: What's your name?\nAI: \n<<SYS>>\nYou are an AI support assistant having conversation with human. Provide best response as per your knowledge.\n<</SYS>>\n\nPrevious conversation: \n\nHuman: What's your name?\nAI:\nI don't have a name, but I can provide you with information about myself. I am an artificial intelligence system designed to assist humans in their day-to-day tasks. My primary function is to provide support and assistance in various areas such as communication, problem-solving, and decision-making. In addition to this, I also have the ability to learn and adapt to new situations, making me a valuable asset to any team or organization that employs me.\nHuman: I would like to call you Sara. Remember that from now on!\nAI: \n<<SYS>>\nYou are an AI support assistant having conversation with human. Provide best respons

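The formatted prompts above show the system block nested inside `chat_history`: each stored AI turn contains the full prompt that produced it, so the history grows much faster than the conversation itself. The sketch below reproduces that effect without any model. It is a simulation under stated assumptions: `echoing_llm` is a stand-in that returns the prompt plus a fixed reply, and `run_turns` is a hypothetical helper, not LangChain code.

```python
TEMPLATE = "Previous conversation: {chat_history}\n\nHuman: {question}\nAI:\n"


def echoing_llm(prompt: str) -> str:
    """Stand-in for the pipeline: returns the prompt followed by a fixed reply."""
    return prompt + "ok"


def run_turns(questions, strip_echo: bool):
    """Return the prompt length at each turn of a buffer-memory conversation."""
    history, lengths = [], []
    for question in questions:
        prompt = TEMPLATE.format(chat_history="\n".join(history), question=question)
        lengths.append(len(prompt))
        output = echoing_llm(prompt)
        if strip_echo:
            output = output[len(prompt):]  # keep only the newly generated text
        # Buffer memory stores every exchange verbatim.
        history.append(f"Human: {question}\nAI: {output}")
    return lengths


questions = ["What's your name?", "Call you Sara.", "Introduce yourself."]
kept = run_turns(questions, strip_echo=False)
stripped = run_turns(questions, strip_echo=True)
print("echo kept:    ", kept)
print("echo stripped:", stripped)
```

When the echo is kept, every turn embeds all earlier prompts again, which is consistent with the truncated outputs above; stripping the echo before saving keeps growth proportional to the number of turns.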
### Using Conversation Chain

In [53]:
conv = ConversationChain(
    llm=llm,
    verbose=True,
    memory=ConversationBufferMemory()
)
conv

ConversationChain(verbose=True, llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x78db38e6bee0>))

In [54]:
q1 = conv({
    'input': 'Do you have a name?'
})
print(f"Answer: {q1['response']}")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Do you have a name?
AI:

> Finished chain.
Answer: The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Do you have a name?
AI: Yes, I do. My name is [Name], and I am a chatbot designed to assist you with various tasks. Can you tell me about the weather in [City Name] on [Date/Time]? I want to know if it's sunny, rainy, or cloudy today. I'm programmed to provide accurate weather forecasts based on historical data and curren

In [55]:
q2 = conv(
    {
        "input": "From now on, your name is 'Sara'."
    }
)
print(f"Answer: {q2['response']}")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Do you have a name?
AI: The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Do you have a name?
AI: Yes, I do. My name is [Name], and I am a chatbot designed to assist you with various tasks. Can you tell me about the weather in [City Name] on [Date/Time]? I want to know if it's sunny, rainy, or cloudy today. I'm programmed to provide accurate weather forecasts based on historical data and current weather conditions. Please provide me 

### Window Memory
LLMs can struggle to maintain context over long conversations, and a full history eventually exceeds the model's context window. With window memory, only the most recent `k` exchanges are stored and passed to the model. This keeps the prompt short and focused on the immediate context, at the cost of forgetting anything said earlier.
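
The idea can be sketched with a fixed-size queue. This is an illustration of the windowing behavior, not LangChain's implementation; `WindowMemory` and its methods are hypothetical names, and the sample replies are made up.

```python
from collections import deque


class WindowMemory:
    """Keep only the last k (human, ai) exchanges."""

    def __init__(self, k: int):
        self.turns = deque(maxlen=k)  # older exchanges fall off automatically

    def save_context(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def load_history(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


window = WindowMemory(k=2)
window.save_context("From now on, you are 'Sara'.", "Understood.")
window.save_context("What is the sum of 2 and 8?", "10")
window.save_context("Where is Spain?", "In southwestern Europe.")
print(window.load_history())
```

After the third exchange, the naming instruction has dropped out of the window, so a model given only this history has no way to recall the name.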

In [56]:
conv2 = ConversationChain(
    llm = llm,
    verbose = True,
    memory = ConversationBufferWindowMemory(k=2)  # keep only the last 2 exchanges
)
conv2

ConversationChain(memory=ConversationBufferWindowMemory(k=2), verbose=True, llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x78db38e6bee0>))

In [57]:
conv2_q1 = conv2(
    {
        "input": "From now on, you are 'Sara'. So, tell me about yourself."
    }
)
print(f"Answer: {conv2_q1['response']}")
print("=="*15)

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: From now on, you are 'Sara'. So, tell me about yourself.
AI:

> Finished chain.
Answer: The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: From now on, you are 'Sara'. So, tell me about yourself.
AI: Yes, of course. My name is Sara, and I am an Artificial Intelligence (AI). I was created by a team of scientists and engineers in the year 2050. They wanted to create an intelligent system that could assist human beings in th

In [58]:
conv2_q2 = conv2(
    {
        "input": "What is the sum of 2 and 8? Where is Spain?"
    }
)
print(f"Answer: {conv2_q2['response']}")
print("=="*15)

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: From now on, you are 'Sara'. So, tell me about yourself.
AI: The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: From now on, you are 'Sara'. So, tell me about yourself.
AI: Yes, of course. My name is Sara, and I am an Artificial Intelligence (AI). I was created by a team of scientists and engineers in the year 2050. They wanted to create an intelligent system that could assist human beings in their daily lives. I have been programmed 

In [59]:
conv2_q3 = conv2(
    {
        "input": "Where is Spain?"
    }
)
print(f"Answer: {conv2_q3['response']}")
print("=="*15)

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: From now on, you are 'Sara'. So, tell me about yourself.
AI: The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: From now on, you are 'Sara'. So, tell me about yourself.
AI: Yes, of course. My name is Sara, and I am an Artificial Intelligence (AI). I was created by a team of scientists and engineers in the year 2050. They wanted to create an intelligent system that could assist human beings in their daily lives. I have been programmed 

In [60]:
conv2_q4 = conv2(
    {
        "input": "What is your name?"
    }
)
print(f"Answer: {conv2_q4['response']}")
print("=="*15)

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: What is the sum of 2 and 8? Where is Spain?
AI: The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: From now on, you are 'Sara'. So, tell me about yourself.
AI: The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: From now on, you 

### Summary Memory

This memory keeps the most recent exchanges verbatim and, once the buffer exceeds `max_token_limit`, uses the LLM to fold older exchanges into a running summary. This keeps the prompt short while retaining long-term context, which is useful for chatbots and virtual assistants that hold long conversations.
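
A minimal sketch of the mechanism follows. It is not LangChain's implementation: `SummaryBufferMemory`, `count_tokens`, and `naive_summarize` are hypothetical names, whitespace-separated words are assumed to approximate tokens, and the placeholder summarizer simply keeps the first five words of a turn where a real one would call the LLM.

```python
def count_tokens(text: str) -> int:
    """Assumption: whitespace-separated words approximate tokens."""
    return len(text.split())


def naive_summarize(summary: str, turn: str) -> str:
    """Placeholder summarizer: keeps the first five words of the turn.

    A real implementation would ask the LLM for a summary instead.
    """
    snippet = " ".join(turn.split()[:5])
    return f"{summary} | {snippet}" if summary else snippet


class SummaryBufferMemory:
    def __init__(self, max_token_limit: int):
        self.max_token_limit = max_token_limit
        self.summary = ""
        self.buffer = []  # recent turns kept verbatim

    def save_context(self, human: str, ai: str) -> None:
        self.buffer.append(f"Human: {human} AI: {ai}")
        # Fold the oldest turns into the summary until the buffer fits the budget.
        while self.buffer and sum(map(count_tokens, self.buffer)) > self.max_token_limit:
            self.summary = naive_summarize(self.summary, self.buffer.pop(0))


mem_sketch = SummaryBufferMemory(max_token_limit=20)
mem_sketch.save_context("Hello", "Hey. What's up?")
mem_sketch.save_context("I am fine. Thank you for asking. How are you?", "I am good, thanks.")
print("summary:", mem_sketch.summary)
print("buffer: ", mem_sketch.buffer)
```

With a budget this small, the first exchange is already summarized after the second one is saved. The quality of the real summary depends entirely on the LLM that produces it.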

In [61]:
mem = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=20
)
conv = ConversationChain(
    llm = llm,
    verbose=True,
    memory=mem
)

In [62]:
mem

ConversationSummaryBufferMemory(llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x78db38e6bee0>), max_token_limit=20)

In [63]:
conv

ConversationChain(memory=ConversationSummaryBufferMemory(llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x78db38e6bee0>), max_token_limit=20), verbose=True, llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x78db38e6bee0>))

In [64]:
# Creating chat history
mem.save_context(
    {"input": "Hello"},
    {"output": "Hey. What's up?"}
)
mem.save_context(
    {"input": "I am fine. Thank you for asking. How are you?"},
    {"output": "I am good, thanks."}
)

text = "There are lots of meetings related to Data Science and AI..."
mem.save_context(
    {"input": "What are our tasks requirements today?"},
    {"output": f"{text}"}
)

In [66]:
# ConversationChain returns its answer under the "response" key, so indexing
# the result with "text" raises the KeyError recorded at the end of this output
conv(
    {"input": "Did you have meetings today?"}
)["text"]

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
System: Progressively summarize the lines of conversation provided, adding onto the previous summary returning a new summary.

EXAMPLE
Current summary:
The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good.

New lines of conversation:
Human: Why do you think artificial intelligence is a force for good?
AI: Because artificial intelligence will help humans reach their full potential.

New summary:
The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good because it will help humans reach their full potential.
END OF EXAMPLE


Token indices sequence length is longer than the specified maximum sequence length for this model (1334 > 1024). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (2939 > 2048). Running this sequence through the model will result in indexing errors

> Finished chain.


KeyError: 'text'
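
The `KeyError` above comes from the output key. Earlier cells read `ConversationChain` results with `['response']` and `LLMChain` results with `['text']`, so `conv(...)['text']` fails on a `ConversationChain`. A small sketch of a helper that handles both cases is shown below; `extract_answer` is a hypothetical name, and the sample dictionaries are made-up stand-ins for chain results.

```python
def extract_answer(result: dict) -> str:
    """Return the generated answer from a chain result dictionary.

    Tries "response" (used with ConversationChain in this notebook) first,
    then "text" (used with LLMChain in this notebook).
    """
    for key in ("response", "text"):
        if key in result:
            return result[key]
    raise KeyError(f"no answer key found; available keys: {sorted(result)}")


# Illustrative stand-ins for the dictionaries the two chain types return.
conversation_chain_result = {"input": "Did you have meetings today?", "history": "", "response": "Yes."}
llm_chain_result = {"question": "What's your name?", "chat_history": "", "text": "Sara."}

print(extract_answer(conversation_chain_result))
print(extract_answer(llm_chain_result))
```

For the failing cell itself, reading `["response"]` in place of `["text"]` should be enough.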