<a href="https://colab.research.google.com/github/Alex112525/LangChain-with-LLMs/blob/main/LangChain_BufferWindowMemory.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install transformers einops accelerate langchain bitsandbytes xformers

In [2]:
!nvidia-smi

Wed Jul 19 02:59:47 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   60C    P8    12W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [3]:
from langchain import HuggingFacePipeline
from transformers import AutoTokenizer, pipeline
from langchain import PromptTemplate, LLMChain
from langchain.chains import ConversationChain

import torch
import langchain
from IPython.display import display, Markdown

In [4]:
def printmd(string):
    display(Markdown(string))

**Falcon-7B-Instruct** is a 7B-parameter causal decoder-only model built by TII. It is based on Falcon-7B and fine-tuned on a mixture of chat and instruct datasets. It was trained with a distributed training setup that uses a 3D parallelism approach combined with ZeRO and high-performance Triton kernels (FlashAttention, etc.).

**Falcon-7B** is a 7B-parameter causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license.

In [None]:
model = "tiiuae/falcon-7b-instruct"  # Hugging Face Hub ID of the model to load.

tokenizer = AutoTokenizer.from_pretrained(model)  # Tokenizer used to encode and decode text.

# Note: this assignment rebinds the name `pipeline`, shadowing the function imported from transformers,
# so re-running this cell without first re-running the imports will not work as intended.
pipeline = pipeline("text-generation",            # Task the pipeline performs.
                    model=model,                  # Model to use for text generation.
                    tokenizer=tokenizer,          # Tokenizer to use to encode and decode text.
                    torch_dtype=torch.bfloat16,   # Data type of the model weights and computations.
                    trust_remote_code=True,       # Allow custom model code from the Hub; only enable this for sources you trust.
                    device_map="auto",            # Let Accelerate place the model on the available device(s).
                    max_length=512,               # Maximum total length in tokens (prompt + generated text).
                    do_sample=True,               # Sample the next token instead of always taking the most likely one.
                    top_k=2,                      # Sample only from the 2 most likely tokens at each step.
                    num_return_sequences=1,       # Number of sequences to generate.
                    eos_token_id=tokenizer.eos_token_id  # ID of the end-of-sequence token; generation stops when it is produced.
                    )
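The `top_k=2` argument restricts sampling to the two most likely next tokens at each step. As a rough illustration (a simplified sketch over made-up logits, not how `transformers` implements it internally), top-k sampling can be written as follows; `top_k_sample` is a hypothetical helper name:

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    """Sample a token index from `logits`, keeping only the k highest-scoring candidates."""
    # Indices of the k largest logits; every other token gets probability zero.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the surviving candidates only (subtract the max for numerical stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # random.choices draws one index with probability proportional to its weight.
    return rng.choices(top, weights=weights, k=1)[0]

# Hypothetical logits for a 5-token vocabulary.
logits = [2.0, 1.5, 0.3, -1.0, 0.9]
rng = random.Random(0)
samples = [top_k_sample(logits, k=2, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))  # only indices 0 and 1 can ever appear
```

With `k=1` this reduces to greedy decoding, so `top_k=2` leaves the model very little room for variety regardless of the temperature.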

LLM **temperature** is a parameter that controls the randomness of word selection in a language model. The logits are divided by the temperature before the softmax function is applied (equivalently, the temperature is the inverse of the factor that scales the logits). A low temperature makes the model more deterministic, concentrating probability on the most likely words. A high temperature flattens the distribution, so the model becomes more diverse and picks lower-probability words more often.
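A minimal sketch of that scaling in plain Python, using hypothetical logits for three candidate words and including the 0.2 value used in this notebook; `softmax_with_temperature` is an illustrative helper, not a library function:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Dividing by T: T < 1 sharpens the distribution, T > 1 flattens it.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next words.
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

The ranking of the words never changes; only how much probability mass the top word receives does.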

In [6]:
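# Note: in some LangChain versions, model_kwargs passed alongside an existing pipeline are not forwarded
# to generation, so this temperature may have no effect; it can also be passed directly to pipeline(...).
llm = HuggingFacePipeline(pipeline=pipeline, model_kwargs={"temperature": 0.2})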

## Conversation buffer window memory

In LangChain, `ConversationBufferWindowMemory` stores the interactions of a conversation over time but only uses the last K of them (set with the `k` parameter). This keeps a sliding window of the most recent interactions, so the buffer, and therefore the prompt, does not grow too large.
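The windowing idea can be pictured with a small pure-Python sketch. `WindowMemory` is a hypothetical toy class, not LangChain's implementation: it only mimics keeping the last `k` interactions and rendering them with `Human:`/`AI:` labels, the labels LangChain's memory uses by default (as the logged prompts below show).

```python
from collections import deque

class WindowMemory:
    """Toy stand-in for a windowed conversation buffer (hypothetical, not LangChain's class)."""

    def __init__(self, k, human_prefix="Human", ai_prefix="AI"):
        # A deque with maxlen=k silently drops the oldest interaction when a new one is added.
        self.turns = deque(maxlen=k)
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix

    def save(self, user_input, ai_output):
        self.turns.append((user_input, ai_output))

    def history(self):
        # Render the kept interactions as alternating "Human: ..." / "AI: ..." lines.
        lines = []
        for user_input, ai_output in self.turns:
            lines.append(f"{self.human_prefix}: {user_input}")
            lines.append(f"{self.ai_prefix}: {ai_output}")
        return "\n".join(lines)

toy_memory = WindowMemory(k=2)
toy_memory.save("Hello, my name is Alex", "Hi Alex!")
toy_memory.save("How does memory work?", "It involves encoding, storage and retrieval.")
toy_memory.save("Short-term vs long-term?", "One is temporary, the other lasting.")
print(toy_memory.history())  # the first interaction (the greeting) has been dropped
```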

In [7]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=2)  # Keep only the last 2 interactions (user input + model reply) in {history}.
template = """
Instructions:
You are a assistant chatbot designed to engage in conversations with users.
When asked a question, respond in a coherent and natural manner, avoiding providing any additional information beyond the answer.

Conversation History:
Chatbot: Hello, I am your intelligent personal chatbot, please ask me any question and I will try to help you with the best answer. /end
User: What you can do?/end
Chatbot: I can answer all your questions and talk to you about any subject./end
{history}

User: {input}
Chatbot:"""
PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)
chatbot = ConversationChain(
    prompt=PROMPT,
    llm=llm,
    verbose=True,
    memory=memory,
)
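On each call, `ConversationChain` fills `{history}` with the memory's buffer and `{input}` with the new message. Since `PromptTemplate` uses f-string style placeholders by default, `str.format` is a rough plain-Python equivalent. The template below is a shortened, hypothetical version of the one above. It also makes visible that the history lines are labelled "Human:"/"AI:" while the template's own turns use "User:"/"Chatbot:", a mismatch that also appears in the logged prompts below.

```python
# Hypothetical, shortened version of the prompt template used in this notebook.
toy_template = """Conversation History:
{history}

User: {input}
Chatbot:"""

# What a windowed memory with the default "Human"/"AI" prefixes would contribute to {history}.
history = "Human: Hello, my name is Alex\nAI: Hi Alex, I am doing great."

# Fill both placeholders, much as the prompt template does before the text is sent to the LLM.
prompt = toy_template.format(history=history, input="Can you summarize our chat?")
print(prompt)
```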

In [8]:
ans = chatbot.predict(input="Hello mi name is Alex, How are you? ")
printmd(ans)

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.

> Entering new ConversationChain chain...
Prompt after formatting:

Instructions:
You are a assistant chatbot designed to engage in conversations with users. 
When asked a question, respond in a coherent and natural manner, avoiding providing any additional information beyond the answer.

Conversation History:
Chatbot: Hello, I am your intelligent personal chatbot, please ask me any question and I will try to help you with the best answer. /end
User: What you can do?/end
Chatbot: I can answer all your questions and talk to you about any subject./end


User: Hello mi name is Alex, How are you? 
Chatbot:[0m

[1m> Finished chain.[0m


 Hi Alex, I am doing great. How about you? How are you doing? 
User 

In [9]:
ans = chatbot.predict(input="Good, thanks for asking, can you tell me how the Memory works in humans")
printmd(ans)

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.

> Entering new ConversationChain chain...
Prompt after formatting:

Instructions:
You are a assistant chatbot designed to engage in conversations with users. 
When asked a question, respond in a coherent and natural manner, avoiding providing any additional information beyond the answer.

Conversation History:
Chatbot: Hello, I am your intelligent personal chatbot, please ask me any question and I will try to help you with the best answer. /end
User: What you can do?/end
Chatbot: I can answer all your questions and talk to you about any subject./end
Human: Hello mi name is Alex, How are you? 
AI:  Hi Alex, I am doing great. How about you? How are you doing? 
User 

User: Good, thanks for asking, can you tell me how the Memory works in humans
Chatbot:[0m

[1m> Finished chain.[0m


 Sure, memory is a complex process that involves the encoding, storage, and retrieval of information in the human brain. It is a vital cognitive process that allows us to retain and recall past experiences and events.
User 

In [10]:
ans = chatbot.predict(input="Can you explain to me short-term and long-term memory")
printmd(ans)

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.

> Entering new ConversationChain chain...
Prompt after formatting:

Instructions:
You are a assistant chatbot designed to engage in conversations with users. 
When asked a question, respond in a coherent and natural manner, avoiding providing any additional information beyond the answer.

Conversation History:
Chatbot: Hello, I am your intelligent personal chatbot, please ask me any question and I will try to help you with the best answer. /end
User: What you can do?/end
Chatbot: I can answer all your questions and talk to you about any subject./end
Human: Hello mi name is Alex, How are you? 
AI:  Hi Alex, I am doing great. How about you? How are you doing? 
User 
Human: Good, thanks for asking, can you tell me how the Memory works in humans
AI:  Sure, memory is a complex process that involves the encoding, storage, and retrieval of information in the human brain. It is a vital cognitive process that allows us to retain and recall past experiences and events.
Use

 Sure, short-term memory is the temporary storage of information in the brain, while long-term memory is the permanent storage of information in the brain.
User 
Human: 

In [11]:
ans = chatbot.predict(input="Could you give me a summary of what we talked about? ")
printmd(ans)

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.

> Entering new ConversationChain chain...
Prompt after formatting:

Instructions:
You are a assistant chatbot designed to engage in conversations with users. 
When asked a question, respond in a coherent and natural manner, avoiding providing any additional information beyond the answer.

Conversation History:
Chatbot: Hello, I am your intelligent personal chatbot, please ask me any question and I will try to help you with the best answer. /end
User: What you can do?/end
Chatbot: I can answer all your questions and talk to you about any subject./end
Human: Good, thanks for asking, can you tell me how the Memory works in humans
AI:  Sure, memory is a complex process that involves the encoding, storage, and retrieval of information in the human brain. It is a vital cognitive process that allows us to retain and recall past experiences and events.
User 
Human: Can you explain to me short-term and long-term memory
AI:  Sure, short-term memory is the temporary storage

 Sure, we discussed memory, short-term and long-term memory, and the process of how memory works in humans. Is there anything else you would like to know? /end
User 

In the last cell the model answers well and keeps the context of the conversation, even though the window holds only the last 2 interactions: the first exchange (the greeting) has already been dropped from the history.

At the end of each response the model tries to continue the dialogue with a "User" or "Human:" line. I tried to prevent this by adding "/end" at the end of each line of the example conversation in the prompt, but it did not help.
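One workaround, sketched here as an assumption rather than something tested in this notebook, is to post-process the output and cut it at the first line that looks like a new dialogue turn. Separately, `ConversationBufferWindowMemory` accepts `human_prefix` and `ai_prefix` arguments, which could be set to "User" and "Chatbot" so that the stored history uses the same labels as the template. The helper name `trim_response` and the list of labels in the pattern are hypothetical choices based on the outputs above.

```python
import re

# Hypothetical pattern: a line that is only a role label ("User", "Human:") or starts a new
# labelled turn ("Human: ..."). The label list is an assumption based on the outputs above.
ROLE_LINE = re.compile(r"^\s*(User|Human|AI|Chatbot)\s*(:|$)")

def trim_response(text):
    """Keep the answer up to the first line that looks like a new dialogue turn; drop '/end'."""
    kept = []
    for line in text.splitlines():
        if ROLE_LINE.match(line):
            break  # the model has started writing the next turn itself
        kept.append(line)
    return "\n".join(kept).replace("/end", "").strip()

raw = " Sure, we discussed memory. Is there anything else you would like to know? /end\nUser "
print(trim_response(raw))
```

In the notebook this would be applied as `printmd(trim_response(ans))`. Because the pattern requires a colon or the end of the line right after the label, ordinary sentences such as "Users often..." are left untouched.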

**In conclusion, the model behaves well; it may just need a better prompt.**