# FLAN-T5
# Question and Answer System

In [None]:
!pip install langchain
!pip install langchain_community
!pip install -U langchain-huggingface


In [None]:
!pip install transformers faiss-cpu sentence-transformers



In [None]:
!pip freeze > requirements.txt

## Loading Data

In [None]:
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("http://jalammar.github.io/illustrated-transformer/")
data = loader.load()



## Splitting Data
  Split the document into chunks for embedding and vector storage.

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
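
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size character chunking. It is a simplification, not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line breaks). The helper name `split_fixed` is hypothetical and used only for illustration.

```python
def split_fixed(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "x" * 1200
chunks = split_fixed(sample, chunk_size=500, chunk_overlap=0)
print([len(c) for c in chunks])  # [500, 500, 200]

# With overlap, neighbouring chunks share characters at their boundary.
overlapped = split_fixed("abcdefghij", chunk_size=4, chunk_overlap=2)
print(overlapped)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With `chunk_overlap=0`, as in the cell above, a sentence that straddles a chunk boundary is cut in two; a small overlap is one common way to soften that.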

## Storing
  Embed the contents of each document, then store the embedding and the document in a vector store, with the embedding serving as the index for the document.

In [None]:
# Import the FAISS vector store wrapper from langchain_community
from langchain_community.vectorstores import FAISS

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings
from transformers import AutoTokenizer

# Sentence-embedding model used to embed each chunk
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {'device': 'cpu'}  # run the model on CPU
encode_kwargs = {'normalize_embeddings': False}  # keep raw (unnormalized) vectors

# Not required by HuggingFaceEmbeddings, which loads its own tokenizer internally
tokenizer = AutoTokenizer.from_pretrained(model_name, clean_up_tokenization_spaces=True)

hf = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [None]:
# Create a FAISS vector store from the chunks; hf is the Hugging Face embedding model
vectorstore = FAISS.from_documents(documents=all_splits, embedding=hf)
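
As a rough mental model of what a vector store does, the sketch below embeds each text, keeps the (embedding, text) pairs, and answers a query by brute-force nearest-neighbour search. Everything in it is an assumption made for illustration: `VOCAB`, `toy_embed`, and `ToyVectorStore` are hypothetical names, the word-count embedding stands in for a real sentence-embedding model, and Euclidean distance is simply the metric chosen here. FAISS itself uses optimized index structures rather than a Python loop.

```python
import math

# Hypothetical fixed vocabulary; a real embedding model learns dense vectors instead.
VOCAB = ["attention", "decoder", "logits", "vocabulary", "score", "output"]


def toy_embed(text: str) -> list[float]:
    """Count how often each VOCAB word occurs in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


class ToyVectorStore:
    """Stores (embedding, text) pairs and searches them by brute force."""

    def __init__(self, embed):
        self.embed = embed
        self.rows = []  # list of (vector, text) pairs

    def add_texts(self, texts):
        for text in texts:
            self.rows.append((self.embed(text), text))

    def similarity_search(self, query: str, k: int = 4):
        q = self.embed(query)
        # Smaller Euclidean distance means a closer match.
        ranked = sorted(self.rows, key=lambda row: math.dist(q, row[0]))
        return [text for _, text in ranked[:k]]


store = ToyVectorStore(toy_embed)
store.add_texts([
    "self attention assigns a score to each word",
    "the decoder produces the output sequence",
    "the logits vector has one cell per vocabulary word",
])
top = store.similarity_search("how wide is the logits vector", k=1)
print(top)  # ['the logits vector has one cell per vocabulary word']
```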

## Retrieval

In [None]:
question = "What are transformers?"
docs = vectorstore.similarity_search(question)
docs

[Document(metadata={'source': 'http://jalammar.github.io/illustrated-transformer/', 'title': 'The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time.', 'description': 'Discussions:\nHacker News (65 points, 4 comments), Reddit r/MachineLearning (29 points, 3 comments)\n\n\nTranslations: Arabic, Chinese (Simplified) 1, Chinese (Simplified) 2, French 1, French 2, Italian, Japanese, Korean, Persian, Russian, Spanish 1, Spanish 2, Vietnamese\n\nWatch: MIT’s Deep Learning State of the Art lecture referencing this post\n\nFeatured in courses at Stanford, Harvard, MIT, Princeton, CMU and others\n\nIn the previous post, we looked at Attention – a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained. The Transformer outperform

## Generation
  Using the FLAN-T5 language model

In [None]:
# For downloading the models from HF Hub
!pip install huggingface_hub==0.23.5 -q


In [33]:
# Set the environment variable
from google.colab import userdata
import os

# Ensure the token is correctly retrieved and assigned to the environment variable
token = userdata.get('HF_TOKEN')
if token:
  os.environ['HUGGINGFACEHUB_API_TOKEN'] = token
else:
  print("HuggingFace token not found in userdata.")

In [34]:
from langchain_community.llms import HuggingFaceHub

llm = HuggingFaceHub(
    repo_id='google/flan-t5-base',
    model_kwargs={'temperature': 0, 'max_length': 1000, 'max_new_tokens': 250, 'top_k': 10, 'top_p': 0.95, 'repetition_penalty': 1.03}
)
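
For readers unfamiliar with the `top_k` and `top_p` entries in `model_kwargs`, the sketch below shows how these two filters are commonly defined on a toy next-token distribution: keep the `top_k` most likely tokens, renormalize, then keep the smallest prefix whose cumulative probability reaches `top_p`. The function name `filter_top_k_top_p` and the toy probabilities are hypothetical, and the exact order of operations and tie handling inside a given inference backend may differ.

```python
def filter_top_k_top_p(probs: dict[str, float], top_k: int, top_p: float) -> dict[str, float]:
    """Apply top-k, then nucleus (top-p) filtering, and renormalize what is left."""
    # Top-k: keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    k_total = sum(p for _, p in ranked)

    # Top-p: walk down the ranking until the renormalized mass reaches top_p.
    kept, mass = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        mass += p / k_total
        if mass >= top_p:
            break

    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


toy = {"the": 0.5, "a": 0.25, "an": 0.125, "this": 0.125}
filtered = filter_top_k_top_p(toy, top_k=3, top_p=0.7)
print(sorted(filtered))  # ['a', 'the']
```

A sampler would then draw the next token from the filtered distribution; these filters only matter when the backend actually samples rather than decoding greedily.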

In [None]:
#from huggingface_hub import notebook_login
#notebook_login()

In [38]:

# Define the question
question = "What is attention mechanism?"
# using the model
output = llm.invoke(question)
print(output)


a reversal of the previous state


## Chat
  - Summary Memory creation

In [39]:
from langchain.memory import ConversationSummaryMemory

In [40]:
memory = ConversationSummaryMemory(
    llm=llm,
    memory_key="chat_history",
    return_messages=True
)
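
The idea behind summary memory can be sketched without any library: instead of storing the full transcript, keep one running summary and ask a summarizer to fold each new exchange into it. The names `ToySummaryMemory` and `fake_summarizer` are hypothetical, and the summarizer here is a trivial stand-in for the LLM call that `ConversationSummaryMemory` would make; this is an illustration of the concept, not the library's implementation.

```python
from typing import Callable


class ToySummaryMemory:
    """Keeps one running summary instead of the full transcript."""

    def __init__(self, summarize: Callable[[str, str], str]):
        self.summarize = summarize  # (old_summary, new_lines) -> new_summary
        self.summary = ""           # empty until the first turn is saved

    def save_turn(self, question: str, answer: str) -> None:
        new_lines = f"Human: {question}\nAI: {answer}"
        self.summary = self.summarize(self.summary, new_lines)

    def load(self) -> dict:
        return {"chat_history": self.summary}


def fake_summarizer(old_summary: str, new_lines: str) -> str:
    """Stand-in for an LLM call: keeps only the latest AI line."""
    return new_lines.splitlines()[-1].removeprefix("AI: ")


memory_sketch = ToySummaryMemory(fake_summarizer)
before = memory_sketch.load()
memory_sketch.save_turn("Explain self-attention", "It scores each word against the others.")
after = memory_sketch.load()
print(before)  # {'chat_history': ''}
print(after)   # {'chat_history': 'It scores each word against the others.'}
```

Because the summary is itself generated by the model, its quality depends on the model; a small model may produce summaries that drift from what was actually said.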


## Conversational Retrieval Chain
- chat history + new question

In [41]:
from langchain.chains import ConversationalRetrievalChain

retriever = vectorstore.as_retriever()
chat = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory,
    verbose=True
)
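
The "chat history + new question" flow can be sketched as three steps: condense the follow-up into a standalone question (only when there is history), retrieve chunks for that question, and stuff the chunks into an answer prompt. The sketch below uses stub functions in place of the real model and retriever. `answer_with_history`, `stub_llm`, and `stub_retrieve` are hypothetical names; the prompt wording follows the verbose chain logs, and the `Question:` / `Helpful Answer:` tail of the answer template is an assumption because the logs are truncated before that point.

```python
CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow up question, rephrase the "
    "follow up question to be a standalone question, in its original language.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)

ANSWER_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"   # assumed tail, not visible in the truncated logs
    "Helpful Answer:"
)


def answer_with_history(question, chat_history, llm, retrieve):
    # Step 1: with history, rewrite the follow-up as a standalone question.
    if chat_history:
        standalone = llm(CONDENSE_TEMPLATE.format(chat_history=chat_history, question=question))
    else:
        standalone = question
    # Step 2: fetch the chunks most similar to the standalone question.
    docs = retrieve(standalone)
    # Step 3: stuff the chunks into one prompt and ask the model.
    prompt = ANSWER_TEMPLATE.format(context="\n\n".join(docs), question=standalone)
    return {"question": question, "answer": llm(prompt)}


calls = []  # records every prompt the stub model receives


def stub_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"response {len(calls)}"


def stub_retrieve(query: str) -> list[str]:
    return [f"chunk about: {query}"]


first = answer_with_history("Explain self-attention", "", stub_llm, stub_retrieve)
second = answer_with_history(
    "Where was it proposed?", "system: The Transformer uses attention.", stub_llm, stub_retrieve
)
print(len(calls))  # 3: one model call for the first turn, two for the follow-up
```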

## Demonstration

In [42]:
chat("Explain self-attention")

> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

If you’re familiar with RNNs, think of how maintaining a hidden state allows an RNN to incorporate its representation of previous words/vectors it has processed with the current one it’s processing. Self-attention is the method the Transformer uses to bake the “understanding” of other relevant words into the one we’re currently processing.

The second step in calculating self-attention is to calculate a score. Say we’re calculating the self-attention for the first word in this example, “Thinking”. We need to score each word of the input sentence against this word. The score determines how much focus to place on other parts of the input sentence as we encode a word at a certain position.

Self

{'question': 'Explain self-attention',
 'chat_history': [SystemMessage(content='', additional_kwargs={}, response_metadata={})],
 'answer': 'The Transformer uses to bake the “understanding” of other relevant words into the one we’re currently processing. The second step in calculating self-attention is to calculate a score.'}

In [43]:
chat("What is a gentler approach to transformers?")



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

system: The Transformer uses the “understanding” of other relevant words to calculate a score.
Follow Up Input: What is a gentler approach to transformers?
Standalone question:

> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Go Forth And Transform
I hope you’ve found this a useful place to start to break the ice with the major concepts of the Transformer. If you want to go deeper, I’d suggest these next steps:

2020 Update: I’ve created a “Narrated Transformer” video whi

{'question': 'What is a gentler approach to transformers?',
 'chat_history': [SystemMessage(content='The Transformer uses the “understanding” of other relevant words to calculate a score.', additional_kwargs={}, response_metadata={})],
 'answer': 'Narrated Transformer'}

In [44]:
chat("Where were transformers proposed?")



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

system: The human is a little confused about what the transformer is doing.
Follow Up Input: Where were transformers proposed?
Standalone question:

> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

2020 Update: I’ve created a “Narrated Transformer” video which is a gentler approach to the topic:

Go Forth And Transform
I hope you’ve found this a useful place to start to break the ice with the major concepts of the Transformer. If you want to go deeper, I’d suggest these ne

{'question': 'Where were transformers proposed?',
 'chat_history': [SystemMessage(content='The human is a little confused about what the transformer is doing.', additional_kwargs={}, response_metadata={})],
 'answer': 'in the paper Attention is All You Need'}

In [45]:
chat("What are the different layers in a typical Transformer model?")



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

system: The human is looking for a transformer.
Follow Up Input: What are the different layers in a typical Transformer model?
Standalone question:

> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Go Forth And Transform
I hope you’ve found this a useful place to start to break the ice with the major concepts of the Transformer. If you want to go deeper, I’d suggest these next steps:

This goes for the sub-layers of the decoder as well. If we’re to think of a Transformer o

{'question': 'What are the different layers in a typical Transformer model?',
 'chat_history': [SystemMessage(content='The human is looking for a transformer.', additional_kwargs={}, response_metadata={})],
 'answer': 'attention layer'}

In [46]:
chat("If the vocabulary is 10,000 words, what would the width of the logits vector?")



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

system: The human is looking for a transformer.
Follow Up Input: If the vocabulary is 10,000 words, what would the width of the logits vector?
Standalone question:

> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Let’s assume that our model knows 10,000 unique English words (our model’s “output vocabulary”) that it’s learned from its training dataset. This would make the logits vector 10,000 cells wide – each cell corresponding to the score of a unique word. That is how w

{'question': 'If the vocabulary is 10,000 words, what would the width of the logits vector?',
 'chat_history': [SystemMessage(content='The human is looking for a transformer.', additional_kwargs={}, response_metadata={})],
 'answer': '10,000 cells wide'}

In [47]:
chat("Explain the training process of a Transformer network in detail")



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

system: The logits vector is 10,000 cells wide.
Follow Up Input: Explain the training process of a Transformer network in detail
Standalone question:

> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Recap Of Training
Now that we’ve covered the entire forward-pass process through a trained Transformer, it would be useful to glance at the intuition of training the model.
During training, an untrained model would go through the exact same forward pass. But since we are train

{'question': 'Explain the training process of a Transformer network in detail',
 'chat_history': [SystemMessage(content='The logits vector is 10,000 cells wide.', additional_kwargs={}, response_metadata={})],
 'answer': 'Forward-pass'}