# 🧠 Exploring RAG with a Chatbot That Knows Everything (Almost)!

Welcome to the world of **Retrieval-Augmented Generation (RAG)**! Today, we’ll learn how LLMs can get **even smarter** by looking things up. 📚💡  
Let’s build a chatbot that can pull in knowledge from outside its brain — just like us using Google during a test (but, uh, legally).

---

## 1. What is RAG?

RAG stands for **Retrieval-Augmented Generation**. It’s like giving your chatbot a **superpower** — the ability to look things up before answering! 🦸‍♂️✨

Here’s how it works:
1. You ask a question 🤔  
2. The bot searches a library or document 🗃️  
3. It finds the most relevant info 🔍  
4. Then it gives you a smart answer based on that info 🧠💬
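
The four steps above can be sketched in plain Python before we bring in any real tools. This is a toy illustration only: `LIBRARY`, `tokenize`, `retrieve`, and `answer` are made-up names, retrieval is simple word overlap instead of embeddings, and no language model is called.

```python
import re

# Toy "library": in a real system these would be chunks of your documents.
LIBRARY = [
    "The dining hall opens at 7 am and closes at 8 pm on weekdays.",
    "The campus library is open 24 hours during finals week.",
    "The Doppler effect is a change in wave frequency caused by motion.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, library: list[str]) -> str:
    """Steps 2-3: return the passage sharing the most words with the question."""
    question_words = tokenize(question)
    return max(library, key=lambda passage: len(question_words & tokenize(passage)))

def answer(question: str, library: list[str]) -> str:
    """Step 4: build the text a real LLM would receive (no model is called here)."""
    context = retrieve(question, library)
    return f"Context: {context}\nQuestion: {question}"

print(answer("When does the dining hall close?", LIBRARY))
```

The dining-hall passage wins because it shares the most words with the question. Real RAG systems swap the word-overlap step for embeddings, which can match on meaning and not just exact words.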

---

## 2. Why Use RAG?

🔒 **LLMs have limits**: They only know what was in their training data, can’t always access the latest info, and sometimes just make things up (weird, right?).

✅ **RAG fixes that**: It lets your bot pull in **real documents**, notes, or web pages — whatever you want.

**Example:**  
Without RAG: “What are MSU’s dining hours?” → *“Uh… not sure.”*  
With RAG: → Searches MSU’s dining website → Gives actual hours!

---

## 3. What You Need 🧰

- 🧠 A Language Model (like Gemini, GPT, etc.)
- 📄 A bunch of documents or web content
- 🧲 A way to search those docs (we use **embeddings** + a **vector database**)
- 🧪 A little Python magic (and curiosity)

---

## 4. Let’s Build It! ⚙️

You may need to clear your variables during testing!

In [None]:
%reset -f


### Step 1: Install Your Tools

In [None]:
!pip install --quiet --upgrade langchain-text-splitters langchain-community langgraph
!pip install -qU langchain-huggingface
!pip install -qU langchain-core
!pip install --quiet wikipedia
!pip install --quiet google-generativeai

### Step 2: Set Up the Chatbot
Now, let's configure our chatbot to use the Google Gemini API.

In [None]:
import google.generativeai as genai

# Configure the Gemini API client with your API key
# --> Finish the code here
genai.configure(api_key="INSERT YOUR API KEY HERE")

# Define the system instruction to guide the model's response
# --> Finish the code here
# Each piece ends with a space so the sentences don't run together when Python joins them
response_system_instruction = (
    "You are an intelligent assistant designed to answer questions using both your general knowledge and additional context provided to you. "
    "TELL THE MODEL THAT YOU WILL BE PROVIDING IT ADDITIONAL CONTEXT "
    "Use the provided context as your primary source of truth when answering. If the context does not contain the necessary information, respond based on your general knowledge, and clearly indicate when you are doing so. Do not fabricate facts. "
    "Be concise, accurate, and helpful."
)

# Initialize the Gemini model
llm = # --> Finish the code here
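
A quick note on the system instruction: when several string literals sit next to each other inside parentheses, Python joins them with nothing in between. The tiny sketch below (with made-up example sentences) shows why each piece needs its own trailing space.

```python
# Adjacent string literals are joined with nothing in between.
without_spaces = (
    "Use the provided context."
    "Be concise."
)

# Ending each piece with a space keeps the sentences apart.
with_spaces = (
    "Use the provided context. "
    "Be concise."
)

print(without_spaces)  # Use the provided context.Be concise.
print(with_spaces)     # Use the provided context. Be concise.
```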

### Step 3: Set Up the Embedding Model

We will make use of a free embedding model from Hugging Face.

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings

# Load the embedding model
embedding_model = # --> Finish the code here
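
Curious what an embedding model is doing for us? It turns text into a list of numbers so that similar texts end up with similar numbers. Here is a rough sketch of the idea, assuming a made-up `toy_embed` function that just counts words. A real embedding model captures meaning far better, but the comparison step (cosine similarity) works the same way.

```python
import math
import re
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: word counts as a sparse vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

query = toy_embed("dining hall hours")
related = toy_embed("The dining hall posts its hours online")
unrelated = toy_embed("Tesla worked on alternating current")

print(cosine_similarity(query, related))    # higher: the texts share words
print(cosine_similarity(query, unrelated))  # 0.0: no words in common
```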

### Step 4: Load the Relevant Documents

We will create a list of topics we want our chatbot to know about. For each topic, we fetch the Wikipedia summary and store it as a document. Later steps search these documents to find context for answering questions.

In [None]:
import wikipedia
from langchain_core.documents import Document

# Topics to load from Wikipedia
TOPICS = [] # --> Finish the code here (A list of strings of topics you want the AI to know)

# Load Wikipedia Articles
wiki_docs = []
for topic in TOPICS:
    try:
        # Alternative: load the full article instead of a summary
        # content = wikipedia.page(topic).content
        content = wikipedia.summary(topic, sentences=50)
        doc = Document(page_content=content, metadata={"source": f"Wikipedia: {topic}"})
        wiki_docs.append(doc)
        print(f"✅ Loaded Wikipedia article: {topic} ({len(content)} characters)")
    except wikipedia.exceptions.DisambiguationError as e:
        print(f"⚠️ Disambiguation error for '{topic}': {e.options[:3]}")
    except wikipedia.exceptions.PageError:
        print(f"❌ Page not found for '{topic}'.")

all_docs = wiki_docs
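
The loading loop above follows a simple pattern: try each topic, keep what works, and skip what fails without crashing. Here is an offline sketch of that pattern. Everything in it is hypothetical: `ToyDocument` stands in for the real document class, `FAKE_WIKI` is a pretend encyclopedia, and `load_topics` is a made-up helper.

```python
from dataclasses import dataclass, field

@dataclass
class ToyDocument:
    """Hypothetical stand-in for a document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)

# Pretend encyclopedia so the sketch runs offline.
FAKE_WIKI = {
    "Doppler effect": "The Doppler effect is a change in wave frequency caused by motion.",
}

def load_topics(topics, fetch):
    """Fetch each topic; collect documents and remember topics that failed."""
    docs, skipped = [], []
    for topic in topics:
        try:
            text = fetch(topic)
        except LookupError:  # a failed dict lookup raises KeyError, a LookupError subclass
            skipped.append(topic)
            continue
        docs.append(ToyDocument(text, {"source": f"Wikipedia: {topic}"}))
    return docs, skipped

docs, skipped = load_topics(["Doppler effect", "Not a real page"], FAKE_WIKI.__getitem__)
print(len(docs), "loaded;", "skipped:", skipped)
```

Keeping the `source` in the metadata is what lets the chatbot tell you where an answer came from later on.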

### Step 5: Set Up the Text Splitter

We need to slice our documents into smaller chunks. Smaller chunks use fewer tokens per request, and they make it more likely that only the relevant passages are fetched.

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = # --> Finish the code here

all_splits = # --> Finish the code here
print(f"🧩 Split into {len(all_splits)} chunks.")
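
To see what "chunks with overlap" means, here is a simplified sketch. It cuts text into fixed-size windows, which is not how `RecursiveCharacterTextSplitter` works internally (that one tries to break on paragraphs and sentences first). The `split_text` helper and its parameter names are assumptions for illustration.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

chunks = split_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Notice how each chunk starts with the last three letters of the one before it. That overlap helps keep a sentence that straddles a boundary from being lost.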

### Step 6: Set Up the Memory Store

We will use an in-memory vector store to keep things simple. It lives only as long as the notebook session, so in practice you would want a persistent vector database (I prefer Chroma DB for this).

In [None]:
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = # --> Finish the code here
print(f"✅ Vector store created with {len(all_splits)} chunks.")
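
What does a vector store actually do? Roughly: it saves each chunk next to its embedding, then ranks chunks by similarity when you search. The sketch below is a toy version under loud assumptions: `ToyVectorStore`, `toy_embed`, and `cosine` are made-up names, and word counts stand in for real embeddings.

```python
import math
import re
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norms = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norms if norms else 0.0

class ToyVectorStore:
    """Keeps (text, vector) pairs in a list and searches them by similarity."""

    def __init__(self):
        self._rows = []

    def add_texts(self, texts):
        for text in texts:
            self._rows.append((text, toy_embed(text)))

    def search_with_score(self, query, k=2):
        """Return the k best (text, score) pairs, highest score first."""
        query_vec = toy_embed(query)
        scored = [(text, cosine(query_vec, vec)) for text, vec in self._rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

store = ToyVectorStore()
store.add_texts([
    "The dining hall closes at 8 pm.",
    "Seasons are caused by the tilt of Earth's axis.",
    "Nikola Tesla was an inventor and engineer.",
])
results = store.search_with_score("What causes seasons?", k=2)
print(results[0][0])
```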

### Step 7: Set Up the Retrieval and the Generation

These functions handle retrieving context from our vector store and using it to generate a relevant answer. We only want context that is actually relevant, so `retrieval` throws out any chunk whose similarity score falls below `threshold`.

In [None]:
def retrieval(user_input: str, threshold: float = 0.5) -> tuple[str, list]:
    try:
        # Try to get documents with similarity scores (if supported)
        results = # --> Finish the code here

        # Filter by relevance threshold
        filtered_docs = [doc for doc, score in results if score >= threshold]

    except AttributeError:
        # If .similarity_search_with_score() is not supported, fall back to a
        # plain search; these results are not filtered by the threshold
        filtered_docs = vector_store.similarity_search(user_input, k=4)

    if not filtered_docs:
        return "No relevant documents found.", []

    # Format context with source metadata
    context_chunks = []
    for i, doc in enumerate(filtered_docs, start=1):
        source = doc.metadata.get("source", f"Document {i}")
        context_chunks.append(f"[{source}]\n{doc.page_content.strip()}")

    context = "\n\n".join(context_chunks)
    return context, filtered_docs


def generation(user_input: str, context: str):
    # Fill in the RAG prompt (loaded from LangChain Hub in Step 8) with the context + question
    prompt_value = # --> Finish the code here
    full_prompt = prompt_value.to_string()

    # Generate response using Gemini; the caller reads the answer from response.text
    response = llm.generate_content(full_prompt)
    return response
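
The two ideas in this step, filtering by score and filling a prompt with context, can be tried out without any libraries. This is a sketch under assumptions: `PROMPT_TEMPLATE` is a made-up template (not the one from LangChain Hub), and `filter_by_score` and `build_prompt` are hypothetical helpers working on hand-written scores.

```python
# Made-up template; the real prompt comes from LangChain Hub.
PROMPT_TEMPLATE = (
    "Answer the question using the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)

def filter_by_score(results, threshold=0.5):
    """Keep only passages whose similarity score reaches the threshold."""
    return [text for text, score in results if score >= threshold]

def build_prompt(question, passages):
    """Join the kept passages into one context block and fill the template."""
    context = "\n\n".join(passages) if passages else "No relevant documents found."
    return PROMPT_TEMPLATE.format(context=context, question=question)

# Hand-written (text, score) pairs standing in for vector store results.
scored = [
    ("Seasons come from Earth's axial tilt.", 0.82),
    ("Tesla was an inventor.", 0.11),
]
kept = filter_by_score(scored)
print(build_prompt("What causes seasons?", kept))
```

Try raising or lowering `threshold` to see how it changes what the model gets to read. Too high and the bot gets no context; too low and it gets noise.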

### Step 8: Test Out Your RAG Chatbot

Ask questions about the topics you loaded, and check which sources each answer used!

In [None]:
from langchain import hub

# Load RAG prompt from LangChain Hub
prompt = # --> Finish the code here

print("💬 Ask me anything! (type 'exit' to quit)")

while True:
    user_input = input("\nYou: ")
    if user_input.lower() in {"exit", "quit"}:
        print("👋 Goodbye!")
        break

    context, docs = # --> Finish the code here
    response = # --> Finish the code here

    # Print response
    print(f"\n🤖: {response.text.strip()}\n")


    # Print source list
    print("\n📚 Sources Used:")
    for i, doc in enumerate(docs, start=1):
        print(f"{i}. {doc.metadata.get('source', 'Unknown')}")


## 5. Fun Activity: RAG Roleplay! 🎭🧠

Let’s play RAG in real life! You and a partner will act out the two roles in a Retrieval-Augmented Generation system:

### 👤 Person A: The Retriever
You’re the vector store. You hold the documents (Wikipedia, notes, etc.).  
When Person B asks a question, you:
1. Search your “mental documents”
2. Choose 2–3 facts that best answer the question
3. Share those facts as “context”

### 🤖 Person B: The Generator (a chatbot)
You get only what the retriever gives you — no peeking at outside info!
Use the context to answer the question clearly in 2–3 sentences.

---

**Example:**

- **Question:** What is the Doppler Effect?
- **Context from A:**  
  1. The Doppler Effect explains the change in frequency of waves based on motion.  
  2. It happens with sound, light, or other waves when the source or observer moves.  
  3. It’s why ambulance sirens change pitch as they pass by.

- **Answer from B:**  
  The Doppler Effect is the change in frequency of waves when the source or observer moves. That’s why a siren sounds different as it gets closer and then farther away.

---

Now switch roles and try with new questions! ✨  
Try: *What causes seasons?* or *Who was Nikola Tesla?*

## 6. Wrapping Up: You Built a Smarter Bot! 🎉

Amazing work! Today, you explored how to give a chatbot superpowers using Retrieval-Augmented Generation (RAG).  
Let’s reflect on what you’ve accomplished:

✅ You learned how LLMs generate answers  
✅ You explored their limitations and how RAG helps  
✅ You loaded real documents into a vector store  
✅ You retrieved relevant context and generated responses  
✅ You even roleplayed a mini RAG system yourself!

---

Now you’re ready to:

- Build a bot that reads your study notes 🧠  
- Create a travel helper that reads blogs ✈️  
- Build a chatbot for your school or club 🏫  

The possibilities are endless when you give your chatbot access to the right info.

🛠️ Try customizing:
- What documents it uses
- How it responds (serious, funny, formal)
- What it’s an expert in

💡 Bonus idea: Combine RAG with a voice assistant or GUI using Streamlit or Gradio!

🚀 Go build something cool — and don't forget to share it!
