## Setup Environment

In [1]:
%%capture
!pip install langchain
!pip install langchain-hub
!pip install langchain-community langchain-huggingface
!pip install huggingface_hub transformers
!pip install sentence_transformers==2.2.2
!pip install chromadb faiss-cpu accelerate
!pip install -U bitsandbytes
!pip install tiktoken python-dotenv

## Import Modules

In [2]:
# llm modules
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline
)
from transformers import GenerationConfig
from langchain.llms import HuggingFacePipeline

# chain
from langchain.chains import ConversationChain
from langchain.chains import LLMChain

# memory
from langchain.memory import ConversationBufferMemory

# prompt modules
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import ChatPromptTemplate

# output parser
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

# base modules
import os
import torch
import warnings

warnings.filterwarnings("ignore")

## Create LLM and Chain

In [3]:
model_name = "minkhantycc/Llama-2-7b-chat-finetune-quantized"
device_map = {"": 0}

# bnb config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=False,
)


# base model
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
base_model.config.use_cache = False
base_model.config.pretraining_tp = 1

# load the tokenizer that matches the model
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

base_model.generation_config = GenerationConfig(
    max_new_tokens=256,
    repetition_penalty=1.15,
    do_sample=False,  # greedy decoding; temperature only applies when sampling
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)

# pipeline
pipe = pipeline(
    task="text-generation",
    model=base_model,
    tokenizer=tokenizer,
    device_map=device_map,
    return_full_text=False
)

# llm
llm = HuggingFacePipeline(pipeline=pipe)

## Memory

Now we will use memory to store the chat messages so that the conversation can continue across turns.
Before we continue, we need to format the prompt template in the Llama 2 chat format (`<s>[INST] ... [/INST]`) described in the original repo, since we are using Llama 2 with the HF pipeline. Here, a new input variable, `history`, is added to the prompt to carry the previous chat messages.

In [4]:
# create a Chat Prompt template
template_string = """<s>[INST] \
The following is a friendly conversation between a human and an AI. \
The AI is talkative and provides lots of specific details from its context. \
If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

{history}
Human: {input}
AI: [/INST]"""

prompt_template = PromptTemplate.from_template(template_string)
print(prompt_template)

input_variables=['history', 'input'] template='<s>[INST] The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n\nCurrent conversation:\n\n{history}\nHuman: {input}\nAI: [/INST]'
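
To make the role of `history` concrete, here is a minimal sketch of how the two variables are substituted on each turn. It uses plain `str.format` and a shortened, hypothetical copy of the template (`TEMPLATE`, `render_prompt`) rather than LangChain's `PromptTemplate`, so it runs without any dependencies; the real chain performs the equivalent substitution for us.

```python
# Shortened stand-in for the notebook's template (an assumption for illustration).
TEMPLATE = (
    "<s>[INST] The following is a friendly conversation between a human and an AI.\n\n"
    "Current conversation:\n\n"
    "{history}\n"
    "Human: {input}\n"
    "AI: [/INST]"
)


def render_prompt(history: str, user_input: str) -> str:
    """Fill the template with the stored history and the new user message."""
    return TEMPLATE.format(history=history, input=user_input)


# Turn 1: nothing has been said yet, so the history is empty.
first = render_prompt("", "Hi, my name is Andrew")

# Turn 2: the previous exchange is passed back in through `history`.
second = render_prompt(
    "Human: Hi, my name is Andrew\nAI: Hello Andrew!",
    "What is my name?",
)
print(second)
```

The model itself is stateless: it can only answer the second question because the first exchange is replayed inside the prompt.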


In [5]:
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm,
    memory = memory,
    verbose=True,
    prompt=prompt_template
)

In [6]:
conversation.predict(input="Hi, my name is Andrew")

> Entering new ConversationChain chain...
Prompt after formatting:
<s>[INST] The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:


Human: Hi, my name is Andrew
AI: [/INST]

> Finished chain.


" Hello Andrew! My name is ChatGenie, I'm here to help you with any questions or problems you might have. How can I assist you today? 😊 "

In [7]:
conversation.predict(input="What is my name?")

> Entering new ConversationChain chain...
Prompt after formatting:
<s>[INST] The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi, my name is Andrew
AI:  Hello Andrew! My name is ChatGenie, I'm here to help you with any questions or problems you might have. How can I assist you today? 😊 
Human: What is my name?
AI: [/INST]

> Finished chain.


' Your name is Andrew. '

In [8]:
print(memory.buffer)
print(memory.load_memory_variables({}))

Human: Hi, my name is Andrew
AI:  Hello Andrew! My name is ChatGenie, I'm here to help you with any questions or problems you might have. How can I assist you today? 😊 
Human: What is my name?
AI:  Your name is Andrew. 
{'history': "Human: Hi, my name is Andrew\nAI:  Hello Andrew! My name is ChatGenie, I'm here to help you with any questions or problems you might have. How can I assist you today? 😊 \nHuman: What is my name?\nAI:  Your name is Andrew. "}


`ConversationBufferMemory` stores every message exchanged between the user and the AI. As the conversation grows, so does the number of tokens in the prompt. However, our model can only handle up to 4096 tokens, so a long enough history will eventually be truncated and information will be lost. Therefore, we will try other memory types that keep the history bounded.
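
As a rough illustration of the problem, the sketch below tracks how the history grows turn by turn and checks whether the next prompt would still fit. The names `count_tokens` and `fits_in_context` are hypothetical, and counting whitespace-separated words is only a crude stand-in for a real tokenizer; in the notebook, something like `len(tokenizer(memory.buffer)["input_ids"])` would give the actual count. The reserve of 256 tokens mirrors the `max_new_tokens` value used above.

```python
def count_tokens(text: str) -> int:
    """Crude assumption: one token per whitespace-separated word."""
    return len(text.split())


def fits_in_context(
    history: str,
    new_input: str,
    context_limit: int = 4096,
    reserved_for_reply: int = 256,
) -> bool:
    """Return True if history + new input still leave room for the reply."""
    used = count_tokens(history) + count_tokens(new_input)
    return used + reserved_for_reply <= context_limit


# The buffer only ever grows: every turn appends two more messages.
history = ""
for turn in range(1, 4):
    history += f"Human: question {turn}\nAI: a fairly long answer for turn {turn}\n"
    print(turn, count_tokens(history), fits_in_context(history, "next question"))
```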

## Conversation Buffer Window Memory

In [9]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=2)

memory.save_context({"input": "Hi"},
                    {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What about you?"},
                    {"output": "Reading a book."})
memory.save_context({"input": "Great. How is it?"},
                    {"output": "It's interesting."})

print(memory.buffer)
print(memory.load_memory_variables({}))

Human: What about you?
AI: Reading a book.
Human: Great. How is it?
AI: It's interesting.
{'history': "Human: What about you?\nAI: Reading a book.\nHuman: Great. How is it?\nAI: It's interesting."}
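
With `k=2`, only the two most recent exchanges survive; the first two were dropped. The idea can be sketched in a few lines with `collections.deque`. `WindowMemory` below is a hypothetical stand-in written for illustration, not LangChain's implementation.

```python
from collections import deque


class WindowMemory:
    """Keeps only the last k (human, AI) exchanges."""

    def __init__(self, k: int):
        # A deque with maxlen discards the oldest item automatically.
        self.exchanges = deque(maxlen=k)

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.exchanges.append((user_input, ai_output))

    @property
    def buffer(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.exchanges)


memory_sketch = WindowMemory(k=2)
memory_sketch.save_context("Hi", "What's up")
memory_sketch.save_context("Not much, just hanging", "Cool")
memory_sketch.save_context("What about you?", "Reading a book.")
memory_sketch.save_context("Great. How is it?", "It's interesting.")
print(memory_sketch.buffer)
```

A window bounds the number of exchanges but not their length, so a few very long messages can still overflow the context.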


## Conversation Token Buffer Memory

In [10]:
from langchain.memory import ConversationTokenBufferMemory

memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=30)
memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"},
                    {"output": "Charming!"})

memory.load_memory_variables({})

{'history': 'AI: Amazing!\nHuman: Backpropagation is what?\nAI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!'}
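
Notice that the history above starts with an AI message: the memory appears to drop the oldest messages one at a time, not in human/AI pairs, until the total is within `max_token_limit`. The sketch below illustrates that pruning loop. `prune_to_limit` and the word-based `count_tokens` are assumptions for illustration; LangChain counts tokens through the LLM wrapper, which may not use the Llama tokenizer, so real counts will differ.

```python
def count_tokens(text: str) -> int:
    """Crude assumption: one token per whitespace-separated word."""
    return len(text.split())


def prune_to_limit(messages, max_token_limit, counter=count_tokens):
    """Drop the oldest messages until the total token count fits the limit."""
    kept = list(messages)
    while kept and sum(counter(m) for m in kept) > max_token_limit:
        kept.pop(0)  # discard the oldest message first
    return kept


messages = [
    "Human: AI is what?!",
    "AI: Amazing!",
    "Human: Backpropagation is what?",
    "AI: Beautiful!",
    "Human: Chatbots are what?",
    "AI: Charming!",
]

# With this word-based counter the six messages total 18 "tokens".
pruned = prune_to_limit(messages, max_token_limit=14)
print("\n".join(pruned))
```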

## Conversation Summary Buffer Memory

In [20]:
from langchain.memory import ConversationSummaryBufferMemory

# create a long string
schedule = "There is a meeting at 8am with your product team. \
You will need your powerpoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because Langchain is such a powerful tool. \
At Noon, lunch at the italian resturant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"},
                    {"output": f"{schedule}"})

print(memory.buffer)

System: 
The human greets the AI and asks about its plans for the day. The AI responds with some information about its schedule.  END OF EXAMPLE
AI: There is a meeting at 8am with your product team. You will need your powerpoint presentation prepared. 9am-12pm have time to work on your LangChain project which will go quickly because Langchain is such a powerful tool. At Noon, lunch at the italian resturant with a customer who is driving from over an hour away to meet you to understand the latest in AI. Be sure to bring your laptop to show the latest LLM demo.
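
The buffer now holds a `System:` summary of the older messages followed by the most recent message verbatim. A minimal sketch of that behaviour, under stated assumptions: `SummaryBufferSketch` is a hypothetical class, `stub_summarize` stands in for the LLM call that writes the summary, and tokens are counted as whitespace-separated words.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude assumption: one token per whitespace-separated word."""
    return len(text.split())


def stub_summarize(existing_summary: str, dropped: List[str]) -> str:
    """Placeholder for an LLM call that folds dropped messages into the summary."""
    prefix = existing_summary + " " if existing_summary else ""
    return prefix + f"[summary of {len(dropped)} message(s)]"


class SummaryBufferSketch:
    def __init__(self, max_token_limit: int, summarize: Callable[[str, List[str]], str]):
        self.max_token_limit = max_token_limit
        self.summarize = summarize
        self.summary = ""
        self.messages: List[str] = []

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.messages += [f"Human: {user_input}", f"AI: {ai_output}"]
        self._prune()

    def _prune(self) -> None:
        # Move the oldest messages out of the buffer until it fits the limit,
        # then fold everything that was removed into the running summary.
        dropped: List[str] = []
        while self.messages and sum(map(count_tokens, self.messages)) > self.max_token_limit:
            dropped.append(self.messages.pop(0))
        if dropped:
            self.summary = self.summarize(self.summary, dropped)

    @property
    def buffer(self) -> str:
        head = [f"System: {self.summary}"] if self.summary else []
        return "\n".join(head + self.messages)


summary_memory = SummaryBufferSketch(max_token_limit=8, summarize=stub_summarize)
summary_memory.save_context("Hello", "What's up")
summary_memory.save_context("Not much, just hanging", "Cool")
print(summary_memory.buffer)
```

Compared with plain truncation, older turns are compressed instead of discarded, at the cost of an extra LLM call whenever the limit is exceeded.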


In [21]:
conversation = ConversationChain(
    llm=llm,
    memory = memory,
    verbose=True,
    prompt=prompt_template,
)
conversation.predict(input="What would be a good demo to show?")

> Entering new ConversationChain chain...
Prompt after formatting:
<s>[INST] The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

System: 
The human greets the AI and asks about its plans for the day. The AI responds with some information about its schedule.  END OF EXAMPLE
AI: There is a meeting at 8am with your product team. You will need your powerpoint presentation prepared. 9am-12pm have time to work on your LangChain project which will go quickly because Langchain is such a powerful tool. At Noon, lunch at the italian resturant with a customer who is driving from over an hour away to meet you to understand the latest in AI. Be sure to bring your laptop to show the latest LLM demo.
Human: What would be a good demo to show?
AI: [/INST]

> Finished chain.

" I'm glad you asked! Here are some ideas for demos that could impress your customers:\n\n1. Natural Language Processing (NLP) - Show how well our AI can understand and generate text based on user input. This could include tasks like language translation or sentiment analysis.\n2. Image Recognition - Demonstrate how well our AI can recognize objects within images. This could involve showing examples of different types of images and asking the AI to identify them.\n3. Predictive Modeling - Use data from past projects to demonstrate how accurate our predictions are when using machine learning algorithms. For example, we could use historical sales data to predict future trends.\n4. Reinforcement Learning - Show how well our AI can learn through trial and error by demonstrating a simple game or task where the AI must make decisions based on rewards or penalties.\n5. Generative Adversarial Networks (GANs) - Demonstrate how well our GANs can create realistic images or videos by feeding them 

In [22]:
conversation.predict(input="Customer once said that they are interested in chatbot development.")

> Entering new ConversationChain chain...
Prompt after formatting:
<s>[INST] The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

System: 
The human greets the AI and asks about its plans for the day. The AI responds with some information about its schedule, including a meeting at 8am with the product team, working on the LangChain project until noon, and having lunch with a customer who is driving from over an hour away. The AI also provides suggestions for demos to showcase its abilities during the meeting.
END OF EXAMPLE
Human: Customer once said that they are interested in chatbot development.
AI: [/INST]

> Finished chain.


" I'm glad you asked! As an AI language model, I have been trained on a wide range of topics related to chatbots, including their history, design principles, and applications. I would be happy to provide more detailed information or examples if needed. Please let me know how I can help further."