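RAG (Retrieval-Augmented Generation):
Retrieval -> Search a knowledge source for documents relevant to the user's query.
Augmentation -> Add the retrieved documents to the model's input as context.
Generation -> Use the retrieved documents as context to generate an accurate, relevant response.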
LLM hallucination = fluent, confident output that isn’t grounded in truth or provided data.
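
As a rough illustration of the three stages, here is a toy sketch that uses word overlap in place of embeddings and a stub in place of the model. The names `retrieve`, `augment`, and `generate`, and the two sample documents, are illustrative assumptions and not part of the implementation that follows.

```python
import re


def words(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, documents, k=1):
    """Rank documents by how many words they share with the query (toy retrieval)."""
    query_words = words(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & words(doc)),
        reverse=True,
    )
    return ranked[:k]


def augment(query, retrieved):
    """Combine the retrieved text and the question into one prompt."""
    context = "\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}"


def generate(prompt):
    """Stand-in for a model call; a real system would send the prompt to a model."""
    return f"[model answer based on a {len(prompt)}-character prompt]"


documents = [
    "FAISS is a library for similarity search over vectors.",
    "A fruit is the seed-bearing structure in flowering plants.",
]
query = "What is a fruit?"
prompt = augment(query, retrieve(query, documents))
print(prompt)
print(generate(prompt))
```

Grounding the prompt in retrieved text is what gives the model something to stay faithful to, which is how RAG aims to reduce hallucination.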

In our implementation, we will:

Use Wikipedia as our external knowledge source.
Employ Sentence Transformers for embedding text and FAISS for efficient similarity search.
Utilize Hugging Face’s question-answering pipeline to extract answers from retrieved documents.

FAISS stands for Facebook AI Similarity Search. It’s a library used to do fast similarity search and clustering over large sets of vectors (embeddings).

It is mostly used to retrieve the relevant context for an LLM.
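
Conceptually, an exact ("flat") index compares the query with every stored vector and keeps the closest ones. The NumPy sketch below illustrates that idea only; `flat_l2_search` is a hypothetical helper, not a FAISS function, and FAISS itself is far more optimized.

```python
import numpy as np


def flat_l2_search(vectors, queries, k):
    """Return (distances, indices) of the k nearest vectors for each query.

    Distances are squared Euclidean distances, sorted in ascending order.
    """
    # Pairwise differences via broadcasting: shape (n_queries, n_vectors, dim).
    diffs = queries[:, None, :] - vectors[None, :, :]
    squared = (diffs ** 2).sum(axis=2)
    # Sort each row and keep the k smallest distances.
    indices = np.argsort(squared, axis=1)[:, :k]
    distances = np.take_along_axis(squared, indices, axis=1)
    return distances, indices


vectors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], dtype=np.float32)
queries = np.array([[0.9, 0.1]], dtype=np.float32)
distances, indices = flat_l2_search(vectors, queries, k=2)
print(indices)    # nearest stored vector first
print(distances)  # matching squared distances
```

The brute-force version costs time proportional to the number of stored vectors per query, which is fine for a few dozen chunks.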

In [5]:
pip install transformers



In [6]:
pip install wikipedia

Collecting wikipedia
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11678 sha256=96938ebd24d3b46e28ee512188e1d6261ced8085608e799382c716995c0cabb6
  Stored in directory: /root/.cache/pip/wheels/63/47/7c/a9688349aa74d228ce0a9023229c6c0ac52ca2a40fe87679b8
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


In [7]:
pip install sentence_transformers



In [8]:
!pip install faiss-cpu


Collecting faiss-cpu
  Downloading faiss_cpu-1.13.2-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (7.6 kB)
Downloading faiss_cpu-1.13.2-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (23.8 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.13.2


In [9]:
import wikipedia
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

Step 1: Retrieving Knowledge

In [10]:
def get_wikipedia_content(topic):
  try:
    page = wikipedia.page(topic)
    return page.content
  except wikipedia.exceptions.PageError:
    return None
  except wikipedia.exceptions.DisambiguationError as e:
    print(f"Ambiguous topic. Please be more specific. Options: {e.options}")
    return None

In [26]:
# user input
topic = input("Enter a topic to learn about: ")
document = get_wikipedia_content(topic)

if not document:
  print("Could not retrieve information for this topic.")
  exit()

Enter a topic to learn about: Fruits


In [27]:
document

'In botany, a fruit is the seed-bearing structure in flowering plants (angiosperms) that is formed from the ovary after flowering.\nFruits are the means by which angiosperms disseminate their seeds. Edible fruits in particular have long propagated using the movements of humans and other animals in a symbiotic relationship that is the means for seed dispersal for the one group and nutrition for the other; humans, and many other animals, have become dependent on fruits as a source of food. Consequently, fruits account for a substantial fraction of the world\'s agricultural output, and some (such as the apple and the pomegranate) have acquired extensive cultural and symbolic meanings.\nIn common language and culinary usage, fruit normally means the seed-associated fleshy structures (or produce) of plants that typically are sweet (or sour) and edible in the raw state, such as apples, bananas, grapes, lemons, oranges, and strawberries. In botanical usage, the term fruit also includes many s

In [28]:
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-mpnet-base-v2")

We will split the text into overlapping chunks for better retrieval.
The tokenizer loaded above is the Hugging Face tokenizer that ships with the Sentence Transformers model all-mpnet-base-v2; it splits the text into the tokens that model expects.

Here, we are tokenizing the retrieved Wikipedia content and splitting it into smaller overlapping chunks for efficient retrieval. We used a pre-trained tokenizer (all-mpnet-base-v2) to break the text into tokens, then divided it into fixed-size segments (256 tokens each) with an overlap of 20 tokens to maintain context between chunks.

In [30]:
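def split_text(text, chunk_size=256, chunk_overlap=20):
    """Split text into chunks of chunk_size tokens that overlap by chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    tokens = tokenizer.tokenize(text)
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        # Convert the token slice back into a readable string.
        chunks.append(tokenizer.convert_tokens_to_string(tokens[start:end]))
        if end == len(tokens):
            break
        # Step back by chunk_overlap tokens so consecutive chunks share context.
        start = end - chunk_overlap
    return chunks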

chunks = split_text(document)
print(f"Number of chunks: {len(chunks)}")

Number of chunks: 24
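
To see how the chunk count follows from the token count, the sketch below reproduces the same windowing on token offsets alone, so no tokenizer is needed. `chunk_spans` and `expected_chunk_count` are hypothetical helper names, and both assume `chunk_overlap` is smaller than `chunk_size`.

```python
import math


def chunk_spans(n_tokens, chunk_size=256, chunk_overlap=20):
    """Return the (start, end) token offsets of each chunk."""
    spans = []
    start = 0
    while start < n_tokens:
        end = min(start + chunk_size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start = end - chunk_overlap
    return spans


def expected_chunk_count(n_tokens, chunk_size=256, chunk_overlap=20):
    """Closed-form count: one full chunk, plus one per stride of new tokens."""
    if n_tokens <= 0:
        return 0
    if n_tokens <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap  # new tokens contributed by each later chunk
    return math.ceil((n_tokens - chunk_size) / stride) + 1


print(chunk_spans(600))
print(expected_chunk_count(600))
```

Each chunk after the first contributes 256 - 20 = 236 new tokens. Under this formula, a count of 24 chunks corresponds to a document of between 5,449 and 5,684 tokens.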


Step 2: Storing and Retrieving Knowledge
To efficiently search for relevant chunks, we will use Sentence Transformers to convert text into embeddings and store them in a FAISS index:

Here, we convert the text chunks into numerical embeddings using the Sentence Transformer model (all-mpnet-base-v2), which captures their semantic meaning. We then create a FAISS index with an L2 (Euclidean) distance metric and store the embeddings in it. IndexFlatL2 performs an exact search and reports squared L2 distances, so smaller values mean closer matches. This allows us to efficiently retrieve the most relevant chunks based on a user’s query.

In [31]:
embedding_model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
embeddings = embedding_model.encode(chunks)

MPNetModel LOAD REPORT from: sentence-transformers/all-mpnet-base-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


In [32]:
embeddings

array([[-0.01298279, -0.04379065,  0.00120637, ..., -0.01785989,
         0.05311805, -0.01185055],
       [-0.02446445, -0.03221103, -0.00502624, ...,  0.00493083,
         0.05100277,  0.00806903],
       [-0.0113179 , -0.06049761, -0.00984603, ..., -0.02301597,
         0.03133214, -0.01998937],
       ...,
       [ 0.00465859,  0.00315281,  0.00013881, ..., -0.03739901,
        -0.03622653, -0.03263571],
       [ 0.03770689, -0.00896055, -0.01711988, ..., -0.0309832 ,
        -0.00723123, -0.0136378 ],
       [ 0.00689307,  0.05240628, -0.01444541, ...,  0.02029368,
         0.01356044, -0.01571896]], dtype=float32)

In [34]:
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(np.array(embeddings))
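
For unit-length vectors, squared L2 distance and cosine similarity are related by ||a - b||^2 = 2 - 2 cos(a, b), so ranking by one is equivalent to ranking by the other. The sketch below checks that identity on two hand-made vectors. It assumes the embeddings are normalized to unit length and that the distances are squared L2 values; if either assumption does not hold in your setup, the conversion does not apply. All function names here are illustrative.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity of two unit-length vectors is their dot product."""
    return sum(x * y for x, y in zip(a, b))


def cosine_from_squared_l2(distance):
    """Convert a squared L2 distance between unit vectors to cosine similarity."""
    return 1.0 - distance / 2.0


a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])
print(squared_l2(a, b), cosine(a, b))
print(cosine_from_squared_l2(squared_l2(a, b)))
```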

Step 3: Querying the RAG Pipeline
Now, we will take user input for the RAG pipeline. When a user asks a question, we will:

Convert the query into an embedding.
Retrieve the top-k most relevant chunks using FAISS.
Use a pre-trained question-answering model to extract the answer from the retrieved chunks.

In [36]:
query = input("Ask a question about the topic: ")
query_embedding = embedding_model.encode([query])

k=3
distance, indices = index.search(np.array(query_embedding), k)

Ask a question about the topic: What is difference between Apple and Banana?


In [37]:
distance, indices

(array([[0.858746  , 0.9416712 , 0.99291474]], dtype=float32),
 array([[ 0, 15,  1]]))
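
`index.search` returns one row per query, so with a single query the results sit in row 0 of both arrays. The following sketch pairs each index with its distance and chunk text. `pair_hits` is a hypothetical helper; it is written against plain nested lists so it runs without FAISS, and it skips an index of -1, which FAISS uses when fewer than k results are available.

```python
def pair_hits(distances, indices, chunks):
    """Return (chunk_index, distance, chunk_text) tuples for the first query."""
    hits = []
    for chunk_index, distance in zip(indices[0], distances[0]):
        if chunk_index < 0:  # -1 marks a missing result
            continue
        hits.append((int(chunk_index), float(distance), chunks[chunk_index]))
    return hits


# Values shaped like the output above; the chunk texts are placeholders.
sample_chunks = [f"chunk {i}" for i in range(24)]
sample_distances = [[0.858746, 0.9416712, 0.99291474]]
sample_indices = [[0, 15, 1]]
for chunk_index, distance, text in pair_hits(sample_distances, sample_indices, sample_chunks):
    print(f"{chunk_index:>2}  {distance:.3f}  {text}")
```

Results come back sorted from nearest to farthest, so the first tuple is the best match.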

In [38]:
retrived_chunks = [chunks[i] for i in indices[0]]
print("Retrieved chunks:")
for chunk in retrived_chunks:
  print("-" + chunk)

Retrieved chunks:
-in botany, a fruit is the seed - bearing structure in flowering plants ( angiosperms ) that is formed from the ovary after flowering. fruits are the means by which angiosperms disseminate their seeds. edible fruits in particular have long propagated using the movements of humans and other animals in a symbiotic relationship that is the means for seed dispersal for the one group and nutrition for the other ; humans, and many other animals, have become dependent on fruits as a source of food. consequently, fruits account for a substantial fraction of the world ' s agricultural output, and some ( such as the apple and the pomegranate ) have acquired extensive cultural and symbolic meanings. in common language and culinary usage, fruit normally means the seed - associated fleshy structures ( or produce ) of plants that typically are sweet ( or sour ) and edible in the raw state, such as apples, bananas, grapes, lemons, oranges, and strawberries. in botanical usage, the te

Step 4: Answering the Question with an LLM
Now, we will use a pre-trained question-answering model to extract the final answer from the retrieved context:

In [39]:
qa_model_name = "deepset/roberta-base-squad2"
qa_tokenizer = AutoTokenizer.from_pretrained(qa_model_name)
qa_model = AutoModelForQuestionAnswering.from_pretrained(qa_model_name)
qa_pipeline = pipeline("question-answering",  model=qa_model, tokenizer=qa_tokenizer)

context = " ".join(retrived_chunks)
answer = qa_pipeline(question=query, context=context)
print(f"Answer: {answer['answer']}")

RobertaForQuestionAnswering LOAD REPORT from: deepset/roberta-base-squad2
Key                             | Status     |  | 
--------------------------------+------------+--+-
roberta.embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


Answer: seedlessness
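
The question-answering pipeline returns a dictionary that includes a confidence `score` alongside `answer`, `start`, and `end`. An extractive model can only return a span copied from the context, so a comparison question such as the one above may yield a span that does not really answer it. A minimal sketch of guarding against that follows; `select_answer`, the 0.3 threshold, and the sample dictionaries (including their scores) are illustrative assumptions, not values from this run.

```python
def select_answer(result, min_score=0.3, fallback="No confident answer found."):
    """Return the extracted answer only if its score clears the threshold."""
    if result["score"] >= min_score and result["answer"].strip():
        return result["answer"]
    return fallback


# Dictionaries shaped like the pipeline output; the scores and offsets are made up.
confident = {"score": 0.91, "start": 10, "end": 21, "answer": "angiosperms"}
uncertain = {"score": 0.04, "start": 57, "end": 69, "answer": "seedlessness"}
print(select_answer(confident))
print(select_answer(uncertain))
```

A suitable threshold depends on the model and the data, so it is worth inspecting `answer["score"]` on a few known questions before choosing one.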

