# 🛠️ Week 5-6 · Notebook 08 · RAG Implementation with LangChain

**Module:** LLMs, Prompt Engineering & RAG  
**Project:** Build the Knowledge Core for the Manufacturing Copilot

---

In the previous notebook, we built a RAG pipeline from scratch to understand the core concepts. Now we'll rebuild it with **LangChain**, a library that provides ready-to-use components for each step of the RAG process: loading, splitting, embedding, retrieval, and generation. Using these components gives us a more robust and maintainable system for our Manufacturing Copilot.

## 🎯 Learning Objectives

By the end of this notebook, you will be able to:
1. ✅ **Use LangChain Components:** Implement a RAG pipeline using LangChain's document loader, text splitter, embeddings, and vector store components (here `TextLoader`, `RecursiveCharacterTextSplitter`, `HuggingFaceEmbeddings`, and `Chroma`).
2. ✅ **Build a `RetrievalQA` Chain:** Create and run a question-answering chain that automatically handles retrieval and generation.
3. ✅ **Customize Prompts in LangChain:** Modify the prompt template within a chain to control the LLM's output.
4. ✅ **Return Source Documents:** Configure a RAG chain to cite its sources, a critical feature for trustworthy AI.

## ⚙️ Setup: Installing and Importing Libraries

LangChain integrates with many other tools, so we'll need to install a few packages.

In [None]:
# !pip install -q langchain langchain-community sentence-transformers chromadb transformers torch

import os
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_community.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate

## Step 1: Load and Prepare Documents

First, we need some documents for our knowledge base. We'll create a few dummy SOP text files.

In [None]:
# Create a directory for our documents
os.makedirs("sops", exist_ok=True)

sop_docs = {
    "SOP-HYD-001.txt": "For hydraulic press maintenance, first perform lockout-tagout. Then, inspect all seals for leaks, verify pressure gauges read within the 500-550 PSI range, and top off hydraulic fluid if necessary. Check fluid levels weekly.",
    "SOP-CONV-003.txt": "To troubleshoot a conveyor belt stoppage, first check for physical obstructions. If clear, verify the motor's thermal overload has not tripped. Finally, inspect belt tension and sensor alignment. Belt tension should be checked monthly.",
    "SOP-ROBO-002.txt": "Preventive maintenance for a robotic arm involves greasing all major joints monthly, recalibrating torque sensors quarterly, and validating vision system alignment weekly."
}

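for filename, content in sop_docs.items():
    # Write with an explicit encoding so the files read back the same on every OS
    with open(os.path.join("sops", filename), "w", encoding="utf-8") as f:
        f.write(content)

# Load the documents using LangChain's TextLoader.
# os.listdir() returns names in arbitrary order, so sort them for reproducible runs.
loaders = [TextLoader(os.path.join("sops", name), encoding="utf-8") for name in sorted(os.listdir("sops"))]
documents = []
for loader in loaders:
    documents.extend(loader.load())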

print(f"Loaded {len(documents)} documents.")
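
To see why loading matters for citations later, here is a minimal standard-library sketch of what a text loader hands back: the file's text plus a metadata dictionary that records where it came from. `SimpleDocument` and `load_text_file` are hypothetical names used only for illustration; they are not LangChain classes, and the real `TextLoader` may do more than this.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path: str) -> SimpleDocument:
    """Read one UTF-8 text file and record its path under the 'source' key."""
    with open(path, encoding="utf-8") as f:
        return SimpleDocument(page_content=f.read(), metadata={"source": path})


# Demo on a throwaway file so the sketch does not touch the sops/ directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "SOP-DEMO-000.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("Check fluid levels weekly.")
    demo_doc = load_text_file(demo_path)

print(demo_doc.page_content)
print(os.path.basename(demo_doc.metadata["source"]))
```

The `source` entry is the piece that travels with each chunk through splitting and retrieval, which is what makes source citation possible in Step 4.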

### Chunk the Documents

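Next, we split the loaded documents into smaller chunks. Smaller chunks let the retriever return only the passages relevant to a question instead of an entire document. By default, `chunk_size` and `chunk_overlap` are measured in characters; the overlap repeats a little text across adjacent chunks so that a sentence cut at a boundary still appears with some of its context.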

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=150, chunk_overlap=20)
chunked_docs = text_splitter.split_documents(documents)

print(f"Split {len(documents)} documents into {len(chunked_docs)} chunks.")
print("\n--- Example Chunk ---")
print(chunked_docs[0].page_content)
print(chunked_docs[0].metadata)
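
The effect of `chunk_size` and `chunk_overlap` is easier to see on a simplified splitter. The sketch below is a plain fixed-width sliding window, not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to break on separators such as paragraphs and spaces). The function name `sliding_window_chunks` and the sample sizes are assumptions chosen for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-width character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "Check fluid levels weekly. Belt tension should be checked monthly."
chunks = sliding_window_chunks(sample, chunk_size=30, chunk_overlap=10)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

Each window starts 20 characters after the previous one, so the last 10 characters of one chunk reappear at the start of the next. A fixed window happily cuts words in half, which is one reason a separator-aware splitter is the better choice in practice.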

## Step 2: Create Embeddings and a Vector Store

Now we convert the chunks into embeddings and store them in a `Chroma` vector store. This creates our searchable knowledge index.

In [None]:
# Use a standard Hugging Face model for embeddings
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

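# Create the Chroma vector store from the chunked documents.
# Note: re-running this cell against an existing ./chroma_db adds the same
# chunks again, so delete the directory first if you want a clean index.
vector_store = Chroma.from_documents(
    documents=chunked_docs,
    embedding=embedding_model,
    persist_directory="./chroma_db"  # Persist the DB to disk
)

# _collection is a private attribute of the Chroma wrapper; it is used here only as a quick sanity check.
print(f"Vector store created with {vector_store._collection.count()} vectors.")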
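
Conceptually, searching the index means comparing the query's embedding against every stored vector and keeping the closest ones. The sketch below shows that idea with cosine similarity over made-up 3-dimensional vectors. The vectors, labels, and the `top_k` helper are all invented for illustration; real embeddings have far more dimensions, and the distance metric Chroma applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the labels of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), label) for label, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored[:k]]


# Toy "embeddings": invented numbers, one vector per SOP topic.
toy_index = {
    "hydraulic": [0.9, 0.1, 0.0],
    "conveyor": [0.1, 0.9, 0.1],
    "robot": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]  # pretend embedding of a hydraulics question

print(top_k(toy_query, toy_index, k=2))
```

A vector store does the same job at scale, typically with an approximate index so it does not have to score every vector on each query.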

## Step 3: Set Up the LLM and RetrievalQA Chain

This is where LangChain shines. We'll wrap our LLM in a `HuggingFacePipeline` and then create a `RetrievalQA` chain that connects all the pieces: the LLM, the vector store (as a retriever), and a custom prompt.

In [None]:
# Set up the LLM pipeline
device = 0 if torch.cuda.is_available() else -1
model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

llm_pipeline = pipeline(
    'text2text-generation',
    model=model,
    tokenizer=tokenizer,
    device=device,
    max_length=200
)

llm = HuggingFacePipeline(pipeline=llm_pipeline)

# Create a retriever from the vector store
retriever = vector_store.as_retriever(search_kwargs={"k": 2}) # Retrieve top 2 chunks

# Define a custom prompt template
prompt_template = """You are a manufacturing assistant. Use the following context to answer the question. If you don't know the answer, say so.

Context: {context}

Question: {question}

Answer:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])

# Create the RetrievalQA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff", # "stuff" means all retrieved chunks are stuffed into the prompt
    retriever=retriever,
    return_source_documents=True, # This is key to getting citations
    chain_type_kwargs={"prompt": PROMPT}
)

print("RetrievalQA chain created successfully.")
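
To make the "stuff" strategy concrete, here is a minimal sketch of the prompt assembly it implies: join the retrieved chunks into one context string and fill the template. It uses plain `str.format` rather than LangChain, the helper name `stuff_prompt` is hypothetical, and the blank-line separator is an assumption about how chunks are joined.

```python
PROMPT_TEMPLATE = """You are a manufacturing assistant. Use the following context to answer the question. If you don't know the answer, say so.

Context: {context}

Question: {question}

Answer:"""


def stuff_prompt(chunks: list[str], question: str, separator: str = "\n\n") -> str:
    """Concatenate all retrieved chunks and substitute them into the template."""
    context = separator.join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Two chunks standing in for what a retriever with k=2 might return.
retrieved = [
    "Check fluid levels weekly.",
    "top off hydraulic fluid if necessary.",
]
final_prompt = stuff_prompt(retrieved, "How often should I check hydraulic press fluid levels?")
print(final_prompt)
```

Because every retrieved chunk lands in a single prompt, `k` and `chunk_size` together determine how much of the model's input budget the context consumes.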

## Step 4: Run Queries and Get Answers with Sources

Now we can ask our chain questions. It will perform the retrieval, prompt augmentation, and generation all in one step, and it will return the source documents it used!

In [None]:
question = "How often should I check hydraulic press fluid levels?"

result = qa_chain.invoke({"query": question})

print("--- Question ---")
print(question)
print("\n--- Answer ---")
print(result['result'])
print("\n--- Source Documents ---")
for doc in result['source_documents']:
    print(f"- Source: {doc.metadata['source']}, Content: '{doc.page_content}'")

In [None]:
question_2 = "What is the first step for robot maintenance?"

result_2 = qa_chain.invoke({"query": question_2})

print("--- Question ---")
print(question_2)
print("\n--- Answer ---")
print(result_2['result'])
print("\n--- Source Documents ---")
for doc in result_2['source_documents']:
    print(f"- Source: {doc.metadata['source']}, Content: '{doc.page_content}'")
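
Both cells above repeat the same printing logic, and several chunks often come from the same file. One possible way to turn a result into a compact citation is sketched below. It assumes the result dictionary has the `result` and `source_documents` keys used above; `FakeDocument`, `fake_result`, and `format_answer_with_citations` are hypothetical names so the sketch runs without the chain.

```python
import os
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a retrieved chunk: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_answer_with_citations(result: dict) -> str:
    """Append a de-duplicated list of source file names to the answer text."""
    seen = []
    for doc in result["source_documents"]:
        name = os.path.basename(doc.metadata.get("source", "unknown"))
        if name not in seen:  # keep first-seen order, drop repeats
            seen.append(name)
    return f"{result['result']} [sources: {', '.join(seen)}]"


# Invented result with two chunks that come from the same SOP file.
fake_result = {
    "result": "weekly",
    "source_documents": [
        FakeDocument("Check fluid levels weekly.", {"source": "sops/SOP-HYD-001.txt"}),
        FakeDocument("inspect all seals for leaks", {"source": "sops/SOP-HYD-001.txt"}),
    ],
}
print(format_answer_with_citations(fake_result))
```

The same function could be called on `result` and `result_2` from the cells above, since it only reads the two keys and the `metadata` attribute.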

## ✅ Next Steps

This notebook showed how LangChain's high-level abstractions, such as `RetrievalQA`, let us build a source-citing question-answering system in just a few lines of code.

Key takeaways:
- **Modularity:** LangChain lets you easily swap out components (different LLMs, vector stores, etc.).
- **Simplicity:** High-level chains like `RetrievalQA` abstract away the boilerplate code for retrieval, prompting, and generation.
- **Traceability:** The ability to return source documents is crucial for building trust and allowing users to verify answers.

In the final notebook of this module, **`09_vector_embeddings.ipynb`**, we will dive deeper into the heart of our RAG system: the vector embeddings themselves. We'll explore what they are, how they are created, and why choosing the right embedding model is so important.