# Retrival Augmentated Generation

First, we define a list of facts about Nigeria that we will use as our knowledge base for the RAG system.

In [1]:
# Fifty facts about Nigeria
document_list = [
    "Nigeria is located in West Africa, bordered by Benin, Chad, Cameroon, and Niger.",
    "It is the most populous country in Africa, with over 220 million people (as of 2024).",
    "Nigeria is the 7th most populous country in the world.",
    "The country has over 250 ethnic groups, making it one of the most culturally diverse nations on Earth.",
    "The official capital is Abuja, which became the capital in 1991, replacing Lagos.",
    "Nigeria’s terrain includes rainforests, savannas, mountains, and a long coastline along the Gulf of Guinea (853 km).",
    "The Niger River and Benue River are the two largest rivers, converging in the center of the country.",
    "Nigeria has over 1,000 species of butterflies and more than 1,000 bird species.",
    "The highest point in Nigeria is Chappal Waddi (also called Gangirwal), standing at 2,419 meters (7,936 ft) in the Adamawa Mountains.",
    "Nigeria has several national parks, including Yankari National Park and Cross River National Park.",
    "Nigeria gained independence from British colonial rule on October 1, 1960.",
    "It became a republic in 1963, with Nnamdi Azikiwe as its first president.",
    "Nigeria experienced a brutal civil war (the Biafran War) from 1967 to 1970 after the southeastern region attempted to secede.",
    "The war resulted in an estimated 1–3 million deaths, mostly from starvation.",
    "Nigeria was under military rule for much of the 1970s–1990s, with brief democratic transitions.",
    "The return to democracy in 1999 marked the beginning of Nigeria’s current Fourth Republic.",
    "Nigeria has had 8 heads of state since independence, including both civilian and military leaders.",
    "Olusegun Obasanjo is the only person to have served as both military head of state (1976–1979) and elected president (1999–2007).",
    "Nigeria is a federal republic composed of 36 states and the Federal Capital Territory (Abuja).",
    "The Nigerian Constitution is based on the British parliamentary system but incorporates elements of U.S. federalism.",
    "Nigeria has the largest economy in Africa by nominal GDP (as of 2024).",
    "It is the largest oil producer in Africa and a founding member of OPEC.",
    "Oil accounts for about 90% of Nigeria’s export earnings and roughly 50% of government revenue.",
    "Despite oil wealth, Nigeria struggles with widespread poverty and economic inequality.",
    "Nigeria is a major global producer of cocoa, rubber, palm oil, and yams.",
    "The country is home to Africa’s largest stock exchange: the Nigerian Stock Exchange (now NGX Group).",
    "Nigeria has one of the fastest-growing tech ecosystems in Africa, nicknamed “Silicon Lagoon” in Lagos.",
    "Remittances from the Nigerian diaspora are among the highest in Africa.",
    "Nigeria has vast untapped mineral resources, including coal, limestone, tin, and iron ore.",
    "The country is working to diversify its economy away from oil dependency through initiatives like the “Economic Recovery and Growth Plan.”",
    "Nigeria is known as the “Giant of Africa” due to its large population, economy, and cultural influence.",
    "English is the official language, used for government, education, and media, but over 500 indigenous languages are spoken.",
    "The three largest ethnic groups are the Hausa-Fulani (North), Yoruba (Southwest), and Igbo (Southeast).",
    "Nigeria produces the second-largest film industry in the world by volume — Nollywood — after India’s Bollywood.",
    "Nollywood releases over 2,500 films annually, many produced on low budgets and distributed via DVDs and streaming platforms.",
    "Nigerian music genres like Afrobeat (pioneered by Fela Kuti), Afrobeats, Highlife, and Fuji are globally popular.",
    "Fela Kuti, the father of Afrobeat, was a political activist whose music criticized corruption and military rule.",
    "Nigeria hosted the 2019 African Games and will host the 2027 Africa Cup of Nations.",
    "The annual Eyo Festival in Lagos and the New Yam Festival among the Igbo are major cultural events.",
    "Traditional Nigerian clothing includes the agbada (men’s flowing robe), iro and buba (women’s wrap and blouse), and gele (head tie).",
    "Nigeria has over 150 universities, including the University of Ibadan (founded in 1948), the first university in Nigeria.",
    "Nigeria has produced Nobel laureates — Wole Soyinka (Literature, 1986) — the first African to win the prize in Literature.",
    "Nigeria has one of the highest rates of mobile phone usage in Africa, with over 200 million active lines.",
    "Nigerian engineers designed the first indigenous satellite, NigComSat-1, launched in 2007.",
    "Nigeria has made strides in renewable energy, particularly solar power, despite unreliable grid infrastructure.",
    "The Nigerian Institute of Medical Research (NIMR) is a leading center for tropical disease research.",
    "Nigeria has produced renowned scientists like Professor Adebayo Adesina (former President of the African Development Bank) and Dr. Ifeyinwa Osunkwo (global HIV researcher).",
    "The country has one of the youngest populations in the world — over 60% of Nigerians are under age 25.",
    "Nigeria leads Africa in digital innovation startups, with fintech companies like Flutterwave and Paystack gaining international recognition.",
    "Nigerian athletes have won Olympic medals in track and field, boxing, and weightlifting — notably sprinter Blessing Okagbare and boxer David Izonritei."
]

We need to install the `faiss-cpu` library, which is a library for efficient similarity search and clustering of dense vectors. This will be used to create an index for our embeddings.

In [2]:
# Install faiss-cpu library
!pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.12.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (5.1 kB)
Downloading faiss_cpu-1.12.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (31.4 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m31.4/31.4 MB[0m [31m16.2 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: faiss-cpu
Successfully installed faiss-cpu-1.12.0


Import the necessary libraries: `faiss` for vector indexing, `torch` for tensor operations (used by the transformer model), and `transformers` for loading the pre-trained tokenizer and model.

In [3]:
#  import libraries
import faiss
import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM
import numpy as np
import torch.nn.functional as F




---





Now we need to tokenized and generate embeddings for each sentense in the document array. These embeddings will represent the semantic meaning of each sentense in a vector space, allowing us to find similar sentense based on their meaning.

Initialize a tokenizer and a pre-trained model from the `sentence-transformers` library. The tokenizer will convert text into numerical tokens, and the model will be used to generate embeddings from these tokens.

In [4]:
# Initialize the tokenizer and model for generating embeddings
doc_tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
doc_model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

Define a function `generate_embeddings` that  tokenized text and, returns the embeddings generated by the model after applying mean pooling.

In [5]:
# Function to generate embeddings from tokenized text
def generate_embeddings(text: str):
  tokenized_text = doc_tokenizer(
      text,
      return_tensors="pt",
      padding=True,
      truncation=True
      )

  with torch.no_grad():
      embeddings = doc_model(**tokenized_text)
  # Mean pooling to get a single vector representation for the entire sentence
  embeddings = mean_pooling(embeddings, tokenized_text['attention_mask'])
  # Normalize embeddings
  embeddings = F.normalize(embeddings, p=2, dim=1)
  return embeddings

Define a helper function `mean_pooling` to perform mean pooling on the model's output. This is a common technique to get a single vector representation for a sentence from the token embeddings.

In [6]:
# Function to perform mean pooling
def mean_pooling(model_output, attention_mask):
    # First element of model_output contains all token embeddings (It can also be accessed with (.last_hidden_state))
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).float()
    return (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

In [7]:
document_embeddings = []

for doc in document_list:
  doc_embedding = generate_embeddings(doc)
  document_embeddings.append(doc_embedding)

document_embeddings = torch.stack(document_embeddings).cpu().numpy()

In [8]:
# Print document embeddings array size and shape (batch_size, sequence_length, hidden_size)
print(document_embeddings.size)
print(document_embeddings.shape)

19200
(50, 1, 384)


# Build the retrieval system

In [9]:
# Remove the extra dimension (size) before adding to the index
document_embeddings = document_embeddings.squeeze()
print(document_embeddings.shape) # print the new shape

index = faiss.IndexFlatL2(document_embeddings.shape[1])
index.add(document_embeddings)

(50, 384)


Create function to retrieve

In [17]:
distance_threshold = 18

def retrieve(query, k=3):
  query_embedding = generate_embeddings(query)
  query_embedding = query_embedding.cpu().detach().numpy()
  distances, indices = index.search(query_embedding, k)
  # Get the actual documents based on the indices
  indecies, distances = indices[0], distances[0]
  retrieved_docs = [document_list[idx] for idx, dist in zip(indecies, distances) if dist <= distance_threshold]
  return "\n".join(retrieved_docs)

In [11]:
query = "What is the capital of Nigeria?"
# retrieved_docs = retrieve(query)
# print("Retrieved Documents:", retrieved_docs)

# Build the generation system

In [12]:
# Login to huggingface
from huggingface_hub import login
login(new_session=False)

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [18]:
# Instantiate generator model
gen_tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m")
gen_model = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m")

# Set pad_token to eos_token (required for Gemma)
if gen_tokenizer.pad_token is None:
    gen_tokenizer.pad_token = gen_tokenizer.eos_token

In [21]:
# Create the generator function
def generate_text(prompt: str, max_length: int =100) -> str:
  # Step 1: Retrieve relevant context
  knowledge = retrieve(prompt)

  print(knowledge)
  # Check if there is relevant context
  if not knowledge:
    return "No relevant context found."

  # Step 2: Format prompt clearly
  formatted_prompt = f"""
    Use the context to answer the question. If the context does not contain the answer, say "I don’t know."
    Context: {knowledge}
    Question: {prompt}
    Answer:
    """

  # Step 3: Tokenize input
  inputs = gen_tokenizer(
      formatted_prompt,
      return_tensors="pt",
      padding=False,           # No padding needed for single inference
      truncation=True,         # Always truncate if too long
      max_length=512           # Respect model's context window
  )

  input_ids = inputs["input_ids"]
  attention_mask = inputs["attention_mask"]  # Already computed correctly by tokenizer

  # Step 4: Generate response
  output = gen_model.generate(
      input_ids=input_ids,
      attention_mask=attention_mask,
      max_length=max_length,
      repetition_penalty=1.2,
      temperature=0.9,
      top_k=50,
      top_p=0.95,
      pad_token_id=gen_tokenizer.pad_token_id,  # Use explicitly set pad_token
      do_sample=True,
      eos_token_id=gen_tokenizer.eos_token_id,  # Explicit stop condition
  )

  # Step 5: Decode and return clean output
  generated_text = gen_tokenizer.decode(output[0], skip_special_tokens=True)

  # Optional: Remove the prompt part from output if desired
  # (Useful if generation repeats context)
  if generated_text.startswith(formatted_prompt):
    return ""
      # generated_text = generated_text[len(formatted_prompt):].strip()

  return generated_text

# The model used will not generate a good result. This is only for testing
print(generate_text(query, 120))

Nigeria is a federal republic composed of 36 states and the Federal Capital Territory (Abuja).
The official capital is Abuja, which became the capital in 1991, replacing Lagos.
Nigeria is located in West Africa, bordered by Benin, Chad, Cameroon, and Niger.

