# 🧠 Week 5-6 · Notebook 07 · Retrieval-Augmented Generation (RAG) Foundations

**Module:** LLMs, Prompt Engineering & RAG  
**Project:** Build the Knowledge Core for the Manufacturing Copilot

---

Large Language Models (LLMs) have a knowledge cut-off and know nothing about your private, domain-specific data. **Retrieval-Augmented Generation (RAG)** addresses both limitations. It connects an LLM to an external knowledge base, so the model answers a question by first *retrieving* relevant documents and then *generating* a response grounded in them. RAG is the core technology behind our Manufacturing Copilot's knowledge capabilities.

## 🎯 Learning Objectives

By the end of this notebook, you will be able to:
1. ✅ **Explain the RAG Architecture:** Describe the key components of a RAG pipeline (Ingest, Embed, Retrieve, Generate).
2. ✅ **Compare RAG vs. Fine-Tuning:** Understand the trade-offs between RAG and fine-tuning for different manufacturing scenarios.
3. ✅ **Build a Simple RAG System:** Implement a basic RAG pipeline from scratch using Hugging Face Transformers and Sentence Transformers.
4. ✅ **Identify RAG's Risks and Mitigations:** Recognize common failure modes and how to address them.

## 🏗️ The RAG Pipeline: A High-Level View

A RAG system has two main stages:

1.  **Offline Indexing (The "Library")**: This is where you prepare your knowledge base. It runs once up front and again whenever the source documents change.
    - **Load & Chunk:** Documents (PDFs, text files, etc.) are loaded and split into smaller, manageable chunks.
    - **Embed & Store:** Each chunk is converted into a numerical vector (embedding) and stored in a specialized database called a **Vector Store**.

2.  **Online Retrieval & Generation (The "Librarian")**: This happens in real-time when a user asks a question.
    - **Retrieve:** The user's question is embedded with the same model, and the Vector Store returns the chunks whose embeddings are closest to the question's embedding (typically measured by cosine similarity).
    - **Augment & Generate:** The retrieved chunks are added to the user's question in a prompt, and the LLM generates an answer based on this augmented context.
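
The *Load & Chunk* step can be sketched in a few lines. The helper below is a minimal illustration, not a production splitter: `chunk_text`, `chunk_size`, and `overlap` are hypothetical names chosen for this sketch, and it splits on whitespace-separated words rather than model tokens.

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word-based chunks (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


# Toy input: twelve placeholder "words"
sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_text(sample, chunk_size=5, overlap=2):
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.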

## ⚙️ Building a Mini-RAG System

Let's build a small-scale RAG system to see these concepts in action. We'll create a tiny knowledge base of three Standard Operating Procedures (SOPs) for our factory.

In [None]:
# Ensure necessary libraries are installed
# !pip install -q sentence-transformers pandas transformers

import pandas as pd
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline, AutoModelForSeq2SeqLM, AutoTokenizer
import torch

### Step 1: Prepare the Knowledge Base (Indexing)

First, we define our documents and create embeddings for them.

In [None]:
# Our knowledge base: A few snippets from SOPs
knowledge_base = pd.DataFrame([
    {
        "doc_id": "SOP-HYD-001",
        "text": "For hydraulic press maintenance, first perform lockout-tagout. Then, inspect all seals for leaks, verify pressure gauges read within the 500-550 PSI range, and top off hydraulic fluid if necessary."
    },
    {
        "doc_id": "SOP-CONV-003",
        "text": "To troubleshoot a conveyor belt stoppage, first check for physical obstructions. If clear, verify the motor's thermal overload has not tripped. Finally, inspect belt tension and sensor alignment."
    },
    {
        "doc_id": "SOP-ROBO-002",
        "text": "Preventive maintenance for a robotic arm involves greasing all major joints monthly, recalibrating torque sensors quarterly, and validating vision system alignment weekly."
    },
])

# Load a model to create embeddings
retriever_model = SentenceTransformer("all-MiniLM-L6-v2")

# Create one embedding per document (bracket access avoids clashes with DataFrame attributes)
corpus_embeddings = retriever_model.encode(knowledge_base["text"].tolist(), convert_to_tensor=True)

print("Knowledge base indexed successfully!")
print(f"Shape of corpus embeddings: {corpus_embeddings.shape}")

### Step 2: Ask a Question (Retrieval & Generation)

Now, a user asks a question. We'll retrieve relevant documents and use an LLM to generate an answer.

In [None]:
# User's question
question = "What are the steps to fix a hydraulic press?"

# --- Retrieval --- #
# Embed the user's question
query_embedding = retriever_model.encode(question, convert_to_tensor=True)

# Find the top-k most similar documents (we'll take k=1 for simplicity)
top_k = 1
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=top_k)[0]

# Get the retrieved document text
retrieved_doc = knowledge_base.iloc[hits[0]['corpus_id']]
retrieved_context = retrieved_doc['text']
retrieved_doc_id = retrieved_doc['doc_id']

print(f"--- Retrieved Document (ID: {retrieved_doc_id}) ---")
print(retrieved_context)
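
By default, `util.semantic_search` ranks chunks by cosine similarity between the query embedding and each corpus embedding. The sketch below reproduces that ranking in plain Python on tiny hand-made vectors. The vectors and the names `cosine_similarity` and `top_k_hits` are invented for illustration; they are not real model outputs or library functions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention used in this sketch for zero vectors
    return dot / (norm_a * norm_b)


def top_k_hits(query: list[float], corpus: list[list[float]], k: int = 1) -> list[dict]:
    """Return the k best matches as {'corpus_id', 'score'} dicts, best first."""
    scored = [
        {"corpus_id": i, "score": cosine_similarity(query, vec)}
        for i, vec in enumerate(corpus)
    ]
    scored.sort(key=lambda hit: hit["score"], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (made up for this example)
toy_corpus = [
    [1.0, 0.0, 0.0],  # stands in for the hydraulic press SOP
    [0.0, 1.0, 0.0],  # stands in for the conveyor SOP
    [0.0, 0.0, 1.0],  # stands in for the robotic arm SOP
]
toy_query = [0.9, 0.1, 0.0]

toy_hits = top_k_hits(toy_query, toy_corpus, k=2)
print(toy_hits)
```

A real vector store replaces the exhaustive loop with an approximate nearest-neighbor index, but the scoring idea is the same.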

In [None]:
# --- Augment & Generate --- #

# Create the prompt, augmenting it with the retrieved context
rag_prompt = f"""You are a helpful manufacturing assistant. Answer the user's question based ONLY on the provided context. Cite the document ID.

Context from {retrieved_doc_id}:
"{retrieved_context}"

Question: {question}

Answer:"""

# Load a sequence-to-sequence (text2text) generation model
# device=0 selects the first GPU; -1 runs on the CPU
device = 0 if torch.cuda.is_available() else -1
model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

generator = pipeline(
    'text2text-generation',
    model=model,
    tokenizer=tokenizer,
    device=device,
    max_length=150
)

# Generate the final answer
final_answer = generator(rag_prompt)

print("\n--- Final Answer from RAG System ---")
print(final_answer[0]['generated_text'])
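
With `top_k` greater than 1, the prompt has to carry several chunks, each labeled with its document ID so the model can cite it. One possible way to assemble such a prompt is sketched below. `build_rag_prompt` is a hypothetical helper, the chunk texts are shortened stand-ins, and the prompt wording simply mirrors the template used above.

```python
def build_rag_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a grounded prompt from retrieved chunks.

    Each chunk is a dict with 'doc_id' and 'text' keys (an assumed shape
    that matches the columns of the knowledge base in this notebook).
    """
    context_blocks = [
        f'Context from {chunk["doc_id"]}:\n"{chunk["text"]}"'
        for chunk in chunks
    ]
    context = "\n\n".join(context_blocks)
    return (
        "You are a helpful manufacturing assistant. Answer the user's question "
        "based ONLY on the provided context. Cite the document ID.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n\n"
        "Answer:"
    )


# Shortened stand-in chunks, for illustration only
example_chunks = [
    {"doc_id": "SOP-HYD-001", "text": "First perform lockout-tagout."},
    {"doc_id": "SOP-CONV-003", "text": "First check for physical obstructions."},
]
print(build_rag_prompt("How do I start hydraulic press maintenance?", example_chunks))
```

Keeping the document ID next to each chunk is what makes the "Cite the document ID" instruction answerable when more than one chunk is present.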

## 🤔 RAG vs. Fine-Tuning

| Aspect | RAG | Fine-Tuning |
| --- | --- | --- |
| **Knowledge Source** | External (Vector DB) | Internal (Model Weights) |
| **Updating Knowledge** | Easy & Fast (re-index documents) | Hard & Slow (re-train model) |
| **Hallucination Risk** | Lower (answers are grounded in retrieved text) | Higher (can invent facts) |
| **Best For...** | Fact-based Q&A, knowledge-intensive tasks | Learning a new style, tone, or task format |
| **Manufacturing Example** | Answering questions from SOPs | Creating a chatbot that speaks like a senior engineer |

## ⚠️ Common Risks & How to Mitigate Them

- **Risk: Poor Retrieval:** The wrong documents are retrieved, leading to irrelevant answers.
  - **Mitigation:** Use better embedding models, improve document chunking strategy, and clean your source data.
- **Risk: Stale Information:** The knowledge base is out of date.
  - **Mitigation:** Implement a regular pipeline to re-index documents from their source of truth.
- **Risk: Hallucination:** The LLM ignores the retrieved context and makes things up.
  - **Mitigation:** Use strong prompts that strictly instruct the model to only use the provided context. Add a check to verify the answer against the source document.
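
Two of these mitigations can be approximated with simple guards. The sketch below is deliberately crude and rests on assumptions: `MIN_SCORE` is an arbitrary threshold that would need tuning on real queries, and `numbers_are_grounded` only checks that every number in the answer also appears in the source text, which catches invented figures but not other kinds of hallucination. Both function names are hypothetical.

```python
import re

MIN_SCORE = 0.4  # assumed cut-off; tune on your own data


def is_retrieval_confident(hits: list[dict], min_score: float = MIN_SCORE) -> bool:
    """True if there is a best hit and its similarity score clears the threshold."""
    return bool(hits) and hits[0]["score"] >= min_score


def numbers_are_grounded(answer: str, source_text: str) -> bool:
    """True if every number in the answer also occurs in the source text."""
    number_pattern = r"\d+(?:\.\d+)?"
    answer_numbers = set(re.findall(number_pattern, answer))
    source_numbers = set(re.findall(number_pattern, source_text))
    return answer_numbers <= source_numbers


source_text = "Verify pressure gauges read within the 500-550 PSI range."

# The hit below uses a made-up score in the same format as semantic_search results
print(is_retrieval_confident([{"corpus_id": 0, "score": 0.72}]))         # True
print(numbers_are_grounded("Gauges should read 500-550 PSI.", source_text))  # True
print(numbers_are_grounded("Gauges should read 600 PSI.", source_text))      # False
```

When either guard fails, a reasonable fallback is to return "I could not find this in the SOPs" instead of the generated text.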

## ✅ Next Steps

You've now seen the fundamental mechanics of a RAG system. Our example was small, but it covers the full loop: index the documents, retrieve the closest chunk for a question, and generate an answer grounded in that chunk.

In the next notebook, **`08_rag_implementation.ipynb`**, we will build a more robust RAG pipeline using the LangChain library, which provides helpful abstractions for chunking, retrieval, and prompt management, making it much easier to build production-ready RAG applications.