
# 🧠 Model 3 Section 1: Homework
> ## Building a Full RAG Pipeline

> **🎯 Today’s Goal**: Combine all the parts from Section 1. We will use the **Retriever** (MiniLM) and the **Generator** (DistilBERT) to build a complete, end-to-end Retrieval-Augmented Generation (RAG) system.

---

### Recap: The Two Parts of Our RAG System

1.  **The Retriever (Part 2)** 🔎
    * **Model:** `all-MiniLM-L6-v2`
    * **Job:** To turn a text query into a vector and use **semantic search** to find the *most relevant* piece of context from our knowledge base.

2.  **The Generator (Part 3)** ✍️
    * **Model:** `distilbert-base-cased-distilled-squad`
    * **Job:** To take a `question` and a `context` and **extract** the specific answer from within the context.

Today, we connect them. The output of the Retriever becomes the input for the Generator.

---


### 🧠 Step 1: Load Models and Knowledge

Now, let's load both of our specialized models and define our knowledge base. We have one model for retrieving and one for generating.

---

In [1]:
# Code Cell 2: Load Models (The "CPU/GPU Split" Fix)
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline
import torch

# --- 1. Load the Retriever (MiniLM) on the CPU ---
# We explicitly tell it to use the 'cpu'.
# This is fast enough for a retriever and saves all our VRAM.
retriever_model = SentenceTransformer(
    'all-MiniLM-L6-v2',
    device='cpu'  # Force to CPU
)
print("✅ Retriever model (MiniLM) loaded. Using device: cpu")


# --- 2. Load the Generator (DistilBERT) on the GPU ---
# We check if a GPU is available and set the device index
# 0 = first GPU, -1 = CPU
pipeline_device = 0 if torch.cuda.is_available() else -1

generator_model = pipeline("question-answering",
                           model="distilbert-base-cased-distilled-squad",
                           device=pipeline_device) # Use GPU if available

print("✅ Generator model (DistilBERT) loaded.")
if pipeline_device == 0:
    print("   -> Running on GPU (Good!)")
else:
    print("   -> WARNING: Running on CPU (Will be slow, but should work)")

✅ Retriever model (MiniLM) loaded. Using device: cpu


Device set to use cuda:0


✅ Generator model (DistilBERT) loaded.
   -> Running on GPU (Good!)


---

### 📚 Step 2: Define and Encode Knowledge Base

Here is our simple knowledge base. We will give this "long-term memory" to our AI agent.

---

In [2]:
# Code Cell 3: Define Knowledge Base
# This is the "long-term memory" of our agent
knowledge_base = [
    "Buddy is a 3-year-old Golden Retriever who loves to play fetch.",
    "The capital of France is Paris, which is known for the Eiffel Tower.",
    "Python is an interpreted, high-level, general-purpose programming language.",
    "The first person to walk on the Moon was Neil Armstrong in 1969.",
    "Climate change is the long-term alteration of temperature and typical weather patterns."
]

print(f"📚 Knowledge base created with {len(knowledge_base)} documents.")

📚 Knowledge base created with 5 documents.


---

### ✏️ Task 1: Encode Your Knowledge

Your first task is to use the **Retriever model** (`retriever_model`) to encode all the documents in your `knowledge_base`.

**Your Goal:** Create a variable called `knowledge_embeddings` that holds the vector representations of all your documents.

---

In [4]:
# Code Cell 4: Task 1 - Encode Knowledge
print("--- Task 1: Encoding Knowledge Base ---")

# TODO: Use the retriever_model to encode the 'knowledge_base' list
# The .encode() method takes a list of strings and returns one embedding per string
# Set convert_to_tensor=True to get them back as a single PyTorch tensor (one row per document)
knowledge_embeddings = retriever_model.encode(knowledge_base, convert_to_tensor=True)

# --- Verification ---
if 'knowledge_embeddings' in locals() and knowledge_embeddings.shape[0] == len(knowledge_base):
    print("✅ Success! Knowledge base has been encoded.")
    print(f"   -> Embedding shape: {knowledge_embeddings.shape}")
else:
    print("⚠️ Task 1 not complete. 'knowledge_embeddings' not found or has wrong shape.")

--- Task 1: Encoding Knowledge Base ---
✅ Success! Knowledge base has been encoded.
   -> Embedding shape: torch.Size([5, 384])


---

### ✏️ Task 2: Build the Retriever Function

Now, let's build a function that performs the "R" (Retrieval) step. This function will take a user's `query` and find the most relevant document from our `knowledge_base`.

**Your Goal:** Complete the `retrieve_context` function.
1.  Encode the incoming `query` using the `retriever_model`.
2.  Use `util.pytorch_cos_sim` to compare the `query_embedding` to all `knowledge_embeddings`.
3.  Find the index of the highest-scoring document (use `torch.argmax`).
4.  Return the *text* of that document from the `knowledge_base`.
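
To see what steps 2 and 3 compute, here is a minimal sketch in plain Python. It assumes tiny hand-made 3-dimensional vectors in place of the real 384-dimensional MiniLM embeddings; the names `cosine_similarity`, `toy_docs`, and `toy_query` are illustrative and not part of the notebook.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings": one vector per document.
toy_docs = {
    "dog document": [1.0, 0.0, 0.0],
    "city document": [0.0, 1.0, 0.0],
    "code document": [0.0, 0.6, 0.8],
}
toy_query = [0.0, 0.1, 0.9]

# Score the query against every document, then keep the highest score.
scores = {name: cosine_similarity(toy_query, vec) for name, vec in toy_docs.items()}
best_match = max(scores, key=scores.get)  # plays the role of torch.argmax
print(best_match)
```

The query points almost in the same direction as the "code document" vector, so that document wins even though no component matches exactly. This is the same idea the real retriever applies to sentence embeddings.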

---

In [6]:
# Code Cell 5: Task 2 - Build the Retriever
print("--- Task 2: Building the Retriever ---")

def retrieve_context(query):
    # 1. Encode the query
    # TODO: Encode the 'query' using the 'retriever_model'.
    # Don't forget convert_to_tensor=True
    query_embedding = retriever_model.encode(query, convert_to_tensor=True)

    # 2. Compute cosine similarity
    # TODO: Use 'util.pytorch_cos_sim' to compare the 'query_embedding'
    # with all 'knowledge_embeddings'
    # This returns a tensor of scores, one for each document
    cos_scores = util.pytorch_cos_sim(query_embedding, knowledge_embeddings)

    # 3. Find the best match
    # TODO: Use 'torch.argmax' to find the index of the highest score
    # The highest score corresponds to the most similar document
    # .item() converts the 0-dim tensor into a plain Python int for list indexing
    top_result_index = torch.argmax(cos_scores).item()

    # 4. Return the matching document text
    # TODO: Return the text from 'knowledge_base' at 'top_result_index'
    return knowledge_base[top_result_index]


# --- Verification ---
print("Testing retrieve_context('What is Python?')...")
retrieved = retrieve_context("What is Python?")
print(f"   -> Retrieved: '{retrieved}'")
if retrieved and "Python" in retrieved:
    print("✅ Success! Retriever function works.")
else:
    print("⚠️ Retriever function failed to find the right document.")

--- Task 2: Building the Retriever ---
Testing retrieve_context('What is Python?')...
   -> Retrieved: 'Python is an interpreted, high-level, general-purpose programming language.'
✅ Success! Retriever function works.


---

### ✏️ Task 3: Build the Generator Function

Great! We have a function to get context. Now let's build a function for the "G" (Generation) step. This function will take a `question` and the `context` we just retrieved.

**Your Goal:** Complete the `generate_answer` function.
1.  Call the `generator_model` (which is a `pipeline` object).
2.  Pass the `question` and `context` to it.
3.  Return *only the answer* from the resulting dictionary (e.g., `result['answer']`).
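
As a rough illustration of step 3, the sketch below builds a dictionary shaped like the one a Hugging Face question-answering pipeline returns (`score`, `start`, `end`, `answer`). The values are made up for illustration and are not real model output; `example_result` is a hypothetical name.

```python
context = "Python is a popular programming language."
span = "a popular programming language"

# Hypothetical result, shaped like a question-answering pipeline output.
start = context.index(span)
example_result = {
    "score": 0.5,              # invented confidence value, not a real score
    "start": start,            # character offset where the answer span begins
    "end": start + len(span),  # character offset just past the answer span
    "answer": span,
}

# An extractive model copies a span out of the context rather than writing new text,
# so slicing the context with start/end reproduces the answer.
sliced = context[example_result["start"]:example_result["end"]]
answer_only = example_result["answer"]
print(answer_only, sliced == answer_only)
```

This is why the function only needs `result['answer']`: the other keys describe where the span sits in the context and how confident the model is.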

---

In [8]:
# Code Cell 6: Task 3 - Build the Generator
print("\n--- Task 3: Building the Generator ---")

def generate_answer(question, context):
    # 1. Call the pipeline
    # TODO: Call the 'generator_model' pipeline, passing in the
    # 'question' and 'context'
    result = generator_model(question=question, context=context)

    # 2. Return the answer
    # TODO: Return the 'answer' part of the 'result' dictionary
    return result['answer']


# --- Verification ---
print("Testing generate_answer('What is Python?', '...')...")
test_context = "Python is a popular programming language."
test_question = "What is Python?"
answer = generate_answer(test_question, test_context)
print(f"   -> Question: '{test_question}'")
print(f"   -> Context: '{test_context}'")
print(f"   -> Answer: '{answer}'")

if answer and "popular programming language" in answer:
    print("✅ Success! Generator function works.")
else:
    print("⚠️ Generator function failed to extract the answer.")


--- Task 3: Building the Generator ---
Testing generate_answer('What is Python?', '...')...
   -> Question: 'What is Python?'
   -> Context: 'Python is a popular programming language.'
   -> Answer: 'a popular programming language'
✅ Success! Generator function works.



---

### 🚀 Task 4: Build the Full RAG Pipeline!

This is the final step. Let's combine our two functions into a single, end-to-end RAG pipeline. This function will orchestrate the entire process.

**Your Goal:** Complete the `ask_rag_pipeline` function.
1.  Call your `retrieve_context` function to get the `best_context` for the `query`.
2.  Call your `generate_answer` function, passing in the *original* `query` and the `best_context` you just found.
3.  Return the final answer together with the `best_context`, so you can inspect which document was retrieved.

After you write the function, we'll test it with a query!
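
The control flow can be sketched without any models. The example below assumes two deliberately naive stand-ins, `fake_retrieve` (word overlap) and `fake_generate` (last word of the context); both are hypothetical and only show how the output of one step feeds the next.

```python
def fake_retrieve(query):
    """Stand-in retriever: pick the document sharing the most words with the query."""
    docs = [
        "The capital of France is Paris.",
        "Buddy is a Golden Retriever.",
    ]
    query_words = set(query.lower().rstrip("?").split())
    return max(docs, key=lambda d: len(query_words & set(d.lower().rstrip(".").split())))

def fake_generate(question, context):
    """Stand-in generator: return the last word of the context."""
    return context.rstrip(".").split()[-1]

def toy_pipeline(query):
    context = fake_retrieve(query)          # step 1: retrieve
    answer = fake_generate(query, context)  # step 2: original query + retrieved context
    return answer, context                  # step 3: context is returned for inspection

answer, context = toy_pipeline("What is the capital of France?")
print(answer, "|", context)
```

Returning the context alongside the answer makes it easy to tell a retrieval mistake from an extraction mistake when a test fails.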

---

In [11]:
# Code Cell 7: Task 4 - Build the Full RAG Pipeline
print("\n--- Task 4: Building the Full RAG Pipeline ---")

def ask_rag_pipeline(query):
    # 1. Retrieve
    # TODO: Call your 'retrieve_context' function
    # This finds the most relevant document for the query.
    best_context = retrieve_context(query)

    # 2. Generate
    # TODO: Call your 'generate_answer' function
    # This uses the retrieved document to extract the specific answer.
    final_answer = generate_answer(question=query, context=best_context)

    # 3. Return both the answer and the context for inspection
    return final_answer, best_context


# --- Verification ---
print("Testing the full RAG pipeline...")
query = "What is the capital of France?"
print(f"Query: '{query}'")

# We expect two return values, so we unpack them like this
final_answer, retrieved_context = ask_rag_pipeline(query)

print(f"   -> Retrieved Context: '{retrieved_context}'")
print(f"   -> Final Answer: '{final_answer}'")

if final_answer and final_answer.lower() == "paris":
    print("✅ Success! Your RAG pipeline is working!")
else:
    print("⚠️ RAG pipeline failed. Expected 'Paris'.")

# --- Let's try another one! ---
print("\n--- Another Test ---")
query_2 = "Who was the first person on the moon?"
print(f"Query: '{query_2}'")
answer_2, context_2 = ask_rag_pipeline(query_2)
print(f"   -> Retrieved Context: '{context_2}'")
print(f"   -> Final Answer: '{answer_2}'")
if answer_2 and "neil armstrong" in answer_2.lower():
    print("✅ Correctly answered the second question!")
else:
    print("⚠️ Failed on the second question.")


--- Task 4: Building the Full RAG Pipeline ---
Testing the full RAG pipeline...
Query: 'What is the capital of France?'
   -> Retrieved Context: 'The capital of France is Paris, which is known for the Eiffel Tower.'
   -> Final Answer: 'Paris'
✅ Success! Your RAG pipeline is working!

--- Another Test ---
Query: 'Who was the first person on the moon?'
   -> Retrieved Context: 'The first person to walk on the Moon was Neil Armstrong in 1969.'
   -> Final Answer: 'Neil Armstrong'
✅ Correctly answered the second question!


---

### 🧪 Self-Assessment

Run this final cell to test your complete RAG pipeline. This assessment will:
1.  Add new information to the agent's knowledge base.
2.  Ask questions that *require* the RAG pipeline to work.
3.  Check whether your `retrieve_context` function finds the right document and whether your `generate_answer` function extracts the correct answer.
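
The assessment awards one point for retrieval and one for generation per test case, using keyword checks. A minimal sketch of that grading logic, run on fixed strings instead of model output (`grade` and `case` are hypothetical names):

```python
def grade(test_case, answer, context):
    """Return (retrieval_ok, generation_ok) using simple keyword checks."""
    # Retrieval check is case-sensitive; the answer check ignores case.
    retrieval_ok = test_case["expected_context_keyword"] in context
    generation_ok = test_case["expected_answer_keyword"].lower() in answer.lower()
    return retrieval_ok, generation_ok

case = {
    "query": "What is the currency of Japan?",
    "expected_context_keyword": "Japan",
    "expected_answer_keyword": "Yen",
}

good = grade(case, answer="the yen", context="The currency of Japan is the Yen.")
bad = grade(case, answer="Paris", context="The capital of France is Paris.")
points = sum(good) + sum(bad)  # True counts as 1, False as 0
print(good, bad, points)
```

Keyword matching is lenient: an answer passes as long as it contains the expected keyword, so extra words around it do not cost points.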

Good luck!

---

In [12]:
#@title # Code Cell 8: Self-Assessment
print("\n--- 🧪 Self-Assessment ---")

# We will add new documents to the knowledge base and test the agent.
# This simulates expanding the agent's memory.

try:
    # --- Setup for Test ---
    new_knowledge = [
        "The currency of Japan is the Yen.",
        "Maverick is a clever Border Collie who knows many tricks.",
        "The highest mountain in the world is Mount Everest."
    ]
    # Update all the pieces:
    # 1. Update the text list
    knowledge_base.extend(new_knowledge)
    # 2. Update the embeddings (re-encode everything)
    knowledge_embeddings = retriever_model.encode(knowledge_base, convert_to_tensor=True)

    print(f"📚 Agent memory updated. Total documents: {len(knowledge_base)}")

    # --- Test Cases ---
    test_cases = [
        {
            "query": "What is the currency of Japan?",
            "expected_context_keyword": "Japan",
            "expected_answer_keyword": "Yen"
        },
        {
            "query": "What kind of dog is Maverick?",
            "expected_context_keyword": "Maverick",
            "expected_answer_keyword": "Border Collie"
        },
        {
            "query": "Who was the first person on the Moon?",
            "expected_context_keyword": "Moon",
            "expected_answer_keyword": "Neil Armstrong"
        }
    ]

    score = 0
    total = len(test_cases) * 2 # Each test has a retrieval and generation part

    for i, test in enumerate(test_cases):
        print(f"\n--- Test Case {i+1} ---")
        query = test["query"]
        print(f"Query: \"{query}\"")

        # Run the full pipeline
        answer, context = ask_rag_pipeline(query)

        # Check Retrieval
        print(f"   -> 🔎 Retrieved: '{context}'")
        if test["expected_context_keyword"] in context:
            print("   -> ✅ Retrieval Correct!")
            score += 1
        else:
            print(f"   -> ❌ Retrieval Failed. Expected context with: '{test['expected_context_keyword']}'")

        # Check Generation
        print(f"   -> ✍️ Answer: '{answer}'")
        if test["expected_answer_keyword"].lower() in answer.lower():
            print("   -> ✅ Generation Correct!")
            score += 1
        else:
            print(f"   -> ❌ Generation Failed. Expected answer with: '{test['expected_answer_keyword']}'")

    # Final Score
    print("\n--- 🏁 Assessment Complete ---")
    print(f"🎯 Your Final Score: {score} / {total}")
    if score == total:
        print("🎉🎉🎉 Perfect! You have successfully built and tested a full RAG pipeline!")
    elif score >= total // 2:
        print("👍 Great job! Your pipeline is working. Review any failed tests to see what happened.")
    else:
        print("🔧 Keep trying! Check your functions in Tasks 2, 3, and 4.")

except Exception as e:
    print("\n--- ⚠️ Assessment Failed ---")
    print(f"An error occurred: {e}")
    print("Please check your code in all tasks and try again.")


--- 🧪 Self-Assessment ---
📚 Agent memory updated. Total documents: 11

--- Test Case 1 ---
Query: "What is the currency of Japan?"
   -> 🔎 Retrieved: 'The currency of Japan is the Yen.'
   -> ✅ Retrieval Correct!
   -> ✍️ Answer: 'Yen'
   -> ✅ Generation Correct!

--- Test Case 2 ---
Query: "What kind of dog is Maverick?"
   -> 🔎 Retrieved: 'Maverick is a clever Border Collie who knows many tricks.'
   -> ✅ Retrieval Correct!
   -> ✍️ Answer: 'Border Collie'
   -> ✅ Generation Correct!

--- Test Case 3 ---
Query: "Who was the first person on the Moon?"
   -> 🔎 Retrieved: 'The first person to walk on the Moon was Neil Armstrong in 1969.'
   -> ✅ Retrieval Correct!
   -> ✍️ Answer: 'Neil Armstrong'
   -> ✅ Generation Correct!

--- 🏁 Assessment Complete ---
🎯 Your Final Score: 6 / 6
🎉🎉🎉 Perfect! You have successfully built and tested a full RAG pipeline!
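
The recorded run reports 11 documents, although 5 original plus 3 new documents would give 8. That count is what you would see if the cell had been executed twice, because `extend` appends the same three documents on every run (5 + 3 + 3). Here is a minimal sketch of one way to make such an update safe to re-run, assuming an exact string match is an acceptable duplicate test; `add_documents` is a hypothetical helper, not part of the notebook.

```python
def add_documents(knowledge_base, new_docs):
    """Append only documents not already present; return how many were added."""
    added = 0
    for doc in new_docs:
        if doc not in knowledge_base:
            knowledge_base.append(doc)
            added += 1
    return added

kb = ["doc A", "doc B"]
first = add_documents(kb, ["doc C", "doc A"])   # "doc A" is already there, so it is skipped
second = add_documents(kb, ["doc C", "doc A"])  # running the same update again adds nothing
print(first, second, len(kb))
```

After any such update, the embeddings still need to be re-encoded so that they stay aligned, row for row, with the text list.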
