<a href="https://colab.research.google.com/github/bvm2129/RAG/blob/main/Assignment5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Retrieval Augmanted Generation (RAG) Mechanism

In [1]:
!pip install transformers faiss-cpu
# transformers is a Python Library used for performing NLP Tasks and other operations on Hugging Face models
# FAISS (Facebook AI Similarity Search)

Collecting faiss-cpu
  Downloading faiss_cpu-1.11.0.post1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (5.0 kB)
Downloading faiss_cpu-1.11.0.post1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (31.3 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m31.3/31.3 MB[0m [31m37.5 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: faiss-cpu
Successfully installed faiss-cpu-1.11.0.post1


In [2]:
from transformers import AutoTokenizer, AutoModel
# It's like saying "automatically take suitable tokenizer and model for the process below from the module"
import torch
# works with the tensors (encoding, decoding) while retrieval process
import numpy as np
# works with the multi-dimentional arrays


# initializing the tokenizer and model by assigning the model name
tokenizer=AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model=AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

In [3]:
#INGESTION ##1
sample_text="""
Tokyo,[a] officially the Tokyo Metropolis,[b] is the capital and most populous city in Japan.
With a population of over 14 million in the city proper in 2023, it is one of the most populous urban areas in the world.
The Greater Tokyo Area, which includes Tokyo and parts of six neighboring prefectures,
is the most populous metropolitan area in the world, with 41 million residents as of 2024.
"""

In [4]:
#EMBEDDING ##3
def get_embedding(text):
  tokens=tokenizer(text, return_tensors="pt", truncation=True, padding=True)
  # it converts raw text into tokens so that the model can understand
  with torch.no_grad():  # it's like saying, "don't do lengthy calculations, keep it simple"
    model_output=model(**tokens)  # Unpacks the token dictionary into keyword arguments
  return model_output.last_hidden_state.mean(dim=1).squeeze().numpy()

# mean(dim=1): Averages over all tokens (word vectors), giving you a single vector per sentence.
# squeeze(): Removes any extra dimensions
# numpy(): Converts it from a PyTorch tensor to a NumPy array, so it's easier to store or use

In [5]:
import faiss
# FAISS is used in Python for efficient similarity search and clustering of dense vectors in high-dimensional spaces.
from transformers import pipeline
# pipeline function in transformers is used to connect or passage the gap between code and model to run the program
qa_pipe=pipeline("text2text-generation", model="google/flan-t5-base")

config.json: 0.00B [00:00, ?B/s]

model.safetensors:   0%|          | 0.00/990M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json: 0.00B [00:00, ?B/s]

Device set to use cpu


In [6]:
#CHUNKING ##2
chunks=[sample_text]  # list with single element, .i.e a big para of info

embeddings=[get_embedding(chunk) for chunk in chunks] # each word in the element is passed in form of list comprehension
dime=len(embeddings[0])  # length of the list/ the dimension of the future array
index=faiss.IndexFlatL2(dime)  # faiss functions= storing, indexing, searching the related info for generation
index.add(np.array(embeddings).astype("float32"))
# it takes text embeddings, formats them correctly, and then adds them to a searchable FAISS index

In [7]:
#INDEXING ##4
def retrieve_and_answer(query, top_k=1):
  query_embedding=get_embedding(query).reshape(1, -1)
  # it prepares the user's query to be used for searching the FAISS index to find the most similar text chunks
  _, indices=index.search(query_embedding, top_k)
  retrieved_texts=[chunks[i] for i in indices[0]]
  context="".join(retrieved_texts) # combines all the derived converted chunks into a single string

  prompt=f"Context:{context} \n\n Question:{query}\n Answer:"
  result=qa_pipe(prompt, max_length=80) # required modifications before generating the response
  return result[0]['generated_text']


In [8]:
#RETRIEVAL ##5
query="What is the population in Tokyo?"
answer=retrieve_and_answer(query) # response is retrieved and generated
print("Q: ", query)
print("A: ", answer)

Both `max_new_tokens` (=256) and `max_length`(=80) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Q:  What is the population in Tokyo?
A:  14 million
