# 🛠️ Week 5-6 · Notebook 08 · Production-Grade RAG with LangChain

**Module:** LLMs, Prompt Engineering & RAG  
**Project:** Build the Knowledge Core for the Manufacturing Copilot

---

### From Scratch to Framework: Scaling our RAG System

In the previous notebook, we built a complete Retrieval-Augmented Generation (RAG) pipeline from scratch. This was essential for understanding the fundamental mechanics: indexing, retrieval, and generation.

However, building for production requires more than just understanding. It demands tools that are robust, scalable, and maintainable. This is where frameworks like **LangChain** come in.

LangChain provides a comprehensive toolkit of pre-built, modular components that streamline the development of complex LLM applications. Instead of writing boilerplate code for loading documents, managing vector stores, or chaining prompts, we can use LangChain's high-level APIs to assemble a production-ready RAG pipeline quickly and efficiently.

In this notebook, we will rebuild our manufacturing knowledge base using LangChain, transforming our simple proof-of-concept into a powerful, framework-driven application.

## 🎯 Learning Objectives

By the end of this notebook, you will be able to build a robust, production-style RAG pipeline for our Manufacturing Copilot. You will learn to:

1.  **Master the LangChain RAG Toolkit:**
    -   **`DocumentLoader`**: Automatically load and parse knowledge from various sources (like our SOP text files).
    -   **`TextSplitter`**: Intelligently chunk large documents into smaller, semantically meaningful pieces for effective retrieval.
    -   **`HuggingFaceEmbeddings`**: Convert text chunks into vector embeddings using powerful open-source models.
    -   **`Chroma` (VectorStore)**: Create a persistent, searchable vector database to serve as the long-term memory for our copilot.

2.  **Construct a `RetrievalQA` Chain:**
    -   Assemble a complete, end-to-end question-answering system that seamlessly connects the user's query to the vector store and the LLM.

3.  **Engineer Prompts within a Chain:**
    -   Inject custom instructions into the `RetrievalQA` chain to precisely control the LLM's behavior, ensuring it acts as a helpful manufacturing assistant.

4.  **Implement Source Citation for Trustworthiness:**
    -   Configure the chain to return the exact source documents it used to generate an answer, a critical feature for building user trust and enabling fact-checking.

## ⚙️ Step 1: Environment Setup

First, we need to install the necessary libraries. LangChain is a modular framework, so we install the core `langchain` package along with the integrations and supporting libraries we'll be using:

-   `langchain-community`: Provides integrations with community-maintained tools like Hugging Face models and ChromaDB.
-   `sentence-transformers`: The library that powers our embedding model.
-   `chromadb`: The high-performance vector store we'll use as our knowledge base.
-   `transformers` and `torch`: The foundational libraries for running our local LLM.

This modular approach allows us to keep our environment lean by only installing what we need.
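Because LangChain's package layout has changed between releases, it can help to record which versions are actually installed before going further. The sketch below is an optional addition rather than part of the original setup; the helper name `report_versions` is hypothetical, and it relies only on the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Return {package_name: installed_version or None} for each requested package."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


# The package names mirror the pip install line used in this notebook.
installed = report_versions(
    ["langchain", "langchain-community", "sentence-transformers",
     "chromadb", "transformers", "torch"]
)
for package, package_version in installed.items():
    print(f"{package}: {package_version or 'not installed'}")
```

Saving this output alongside the notebook makes it easier to reproduce results later if an import path moves.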

In [None]:
# Install the required packages
!pip install -q langchain langchain-community sentence-transformers chromadb transformers torch

# --- Core LangChain Imports ---
# Document processing
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Vector store and embeddings
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings

# LLM and pipeline
from langchain_community.llms import HuggingFacePipeline
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
import torch

# Chains and prompts
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# --- Standard Library Imports ---
import os

print("Libraries installed and imported successfully.")

## 📚 Step 2: Build the Knowledge Base

The first phase in any RAG system is building the knowledge base. With LangChain, this process is broken down into three clean, modular steps: **Load**, **Split**, and **Store**.

### 2.1. Create Dummy Data: Standard Operating Procedures (SOPs)

First, let's create the raw material for our knowledge base. We'll simulate a set of Standard Operating Procedure (SOP) documents that our Manufacturing Copilot will need to understand. These plain text files represent the "knowledge" we want to inject into our LLM.
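The cells further down read their input from a `sops` directory, so something has to create it first. The following is a minimal sketch under stated assumptions: the file names and the SOP wording are hypothetical placeholders invented for illustration (chosen so that the example questions later in the notebook have something to retrieve), and `write_sample_sops` is not a LangChain function.

```python
from pathlib import Path

# Hypothetical placeholder SOPs, invented for illustration only.
# Replace them with your plant's real procedures.
SAMPLE_SOPS = {
    "sop_hydraulic_press.txt": (
        "SOP: Hydraulic Press Operation\n\n"
        "Check the hydraulic fluid level at the start of every shift.\n"
        "Report any visible leaks to the shift supervisor before operating the press.\n"
    ),
    "sop_robot_maintenance.txt": (
        "SOP: Robot Arm Maintenance\n\n"
        "Step 1: Power down the robot and apply lockout/tagout.\n"
        "Step 2: Inspect joints and cabling for wear.\n"
    ),
}


def write_sample_sops(directory="sops"):
    """Write the placeholder SOP files and return the paths that were created."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, body in SAMPLE_SOPS.items():
        path = target / filename
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written


written_files = write_sample_sops()
print(f"Wrote {len(written_files)} placeholder SOP files to '{written_files[0].parent}'.")
```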

### 2.3. Split Documents with `RecursiveCharacterTextSplitter`

LLMs have a limited context window, so they can only process a certain amount of text at once. The cell below therefore loads each SOP file with `TextLoader` and then splits the resulting documents into smaller, more manageable chunks.

We should also avoid splitting text in the middle of a sentence, which can destroy its meaning. LangChain's `RecursiveCharacterTextSplitter` addresses this by trying a prioritized list of separators (`\n\n` for paragraphs, then `\n` for lines, then spaces) and moving to the next separator only when a piece is still too large. This keeps semantically related text together as much as possible.
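To make the separator priority concrete, here is a simplified sketch of the idea. It is not LangChain's implementation: the function name `recursive_split` is hypothetical, it ignores chunk overlap, and it only mirrors the general strategy of trying coarse separators before fine ones.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ")) -> list[str]:
    """Split text into pieces of at most chunk_size characters, preferring coarse separators."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    separator, remaining = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            # The piece still fits: keep merging so related text stays together.
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too large: retry it with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, remaining))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample_text = "Step one.\n\nStep two is longer."
for chunk in recursive_split(sample_text, chunk_size=20):
    print(repr(chunk))
```

In this sample the paragraph break is enough to produce chunks that fit, so the finer separators are never used.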

-   `chunk_size`: Defines the maximum size of each chunk (in characters).
-   `chunk_overlap`: Creates a small overlap between chunks. This helps preserve context for sentences that might otherwise be split across two chunks.
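The interaction between the two parameters is easiest to see on a plain character window. The sketch below assumes a naive fixed-size splitter (the hypothetical `sliding_window_chunks`), which is deliberately simpler than `RecursiveCharacterTextSplitter`: it ignores separators and only demonstrates how each chunk repeats the tail of the previous one.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows; each window repeats the last chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


for chunk in sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2):
    print(chunk)
```

A larger overlap lowers the chance that a sentence is cut in two, at the cost of storing and embedding more duplicated text.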

In [None]:
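# 1. Load the documents
# Sort the listing for a reproducible order and skip anything that is not a .txt file.
loader_paths = [
    os.path.join("sops", name)
    for name in sorted(os.listdir("sops"))
    if name.endswith(".txt")
]
loaders = [TextLoader(path, encoding="utf-8") for path in loader_paths]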
documents = []
for loader in loaders:
    documents.extend(loader.load())
print(f"Loaded {len(documents)} documents.")

# 2. Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=50)
chunked_docs = text_splitter.split_documents(documents)

print(f"Split {len(documents)} documents into {len(chunked_docs)} chunks.")
print("\n--- Example Chunk ---")
print("Content:")
print(chunked_docs[0].page_content)
print("\nMetadata:")
print(chunked_docs[0].metadata)

### 2.4. Store Chunks in a `Chroma` Vector Store

Once our knowledge is loaded and chunked, the final step is to **index** it for fast retrieval. This involves two parts:

1.  **Embedding Model (`HuggingFaceEmbeddings`)**: We select an embedding model to convert our text chunks into numerical vectors (embeddings). We'll use `all-MiniLM-L6-v2`, a lightweight model that maps each chunk to a 384-dimensional vector and runs comfortably on a CPU. LangChain's `HuggingFaceEmbeddings` wrapper lets us swap in other Sentence Transformers-compatible models from the Hugging Face Hub by changing `model_name`.

2.  **Vector Store (`Chroma`)**: We choose a vector store to house our embeddings. `Chroma` is a popular open-source vector database that runs locally, making it perfect for development. It stores the embeddings and provides a powerful `similarity_search` function.

LangChain's `Chroma.from_documents` function handles this entire process in a single call: it takes our chunked documents, passes them through the embedding model, and stores the resulting vectors in the Chroma database. We also set `persist_directory` to save the database to disk, so we don't have to rebuild it every time we run the notebook.
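Conceptually, a vector store keeps (text, vector) pairs and ranks them against a query vector. The toy class below sketches that idea in plain Python. It is not how Chroma is implemented: `ToyVectorStore` is a hypothetical name, the three-dimensional vectors are hand-made rather than produced by an embedding model, and cosine similarity is an assumption (the metric a real store uses depends on its configuration).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """In-memory stand-in for a vector database, for illustration only."""

    def __init__(self):
        self._rows = []  # list of (text, vector) pairs

    def add(self, text, vector):
        self._rows.append((text, vector))

    def similarity_search(self, query_vector, k=2):
        # Rank every stored row by similarity to the query and keep the top k texts.
        ranked = sorted(
            self._rows,
            key=lambda row: cosine_similarity(query_vector, row[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add("press chunk", [1.0, 0.0, 0.0])    # hand-made vectors, not real embeddings
store.add("robot chunk", [0.0, 1.0, 0.0])
store.add("safety chunk", [0.7, 0.7, 0.0])

top_matches = store.similarity_search([0.9, 0.1, 0.0], k=2)
print(top_matches)
```

A real vector database adds persistence, metadata storage, and approximate nearest-neighbour indexing so that search stays fast as the collection grows.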

In [None]:
# Define the embedding model we'll use
# "all-MiniLM-L6-v2" is a popular and efficient model for generating sentence embeddings.
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Define the directory to persist the vector store
# Persisting the DB saves computation time, as we don't need to re-create embeddings every time.
persist_directory = "./chroma_db_langchain"

# Create the Chroma vector store
# This single command does the following:
# 1. Takes the `chunked_docs`.
# 2. Uses `embedding_model` to convert each chunk's content into a vector.
# 3. Stores the vector and the original document chunk (including metadata) in the Chroma database.
vector_store = Chroma.from_documents(
    documents=chunked_docs,
    embedding=embedding_model,
    persist_directory=persist_directory
)

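# Note: `_collection` is a private attribute of the LangChain wrapper and may change
# between versions; it is used here only as a quick sanity check.
print(f"Successfully created and persisted vector store with {vector_store._collection.count()} vectors.")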
print(f"Database is stored in: {os.path.abspath(persist_directory)}")

## ⚙️ Step 3: Build the RAG Chain

With our knowledge base indexed and ready, we can now build the "live" part of our RAG system: the question-answering chain. This chain will orchestrate the entire process of receiving a user query, retrieving relevant information, and generating an answer.

LangChain's `RetrievalQA` chain is the perfect tool for this. It elegantly connects three key components:
1.  **The LLM:** The "brain" that will generate the final answer.
2.  **The Retriever:** The "librarian" that searches the vector store for relevant documents.
3.  **The Prompt Template:** The "instructions" that guide the LLM on how to behave and how to use the retrieved context.
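Before wiring these components together, it may help to see what the prompt template contributes on its own. The sketch below assumes the simplest possible strategy: join the retrieved chunks with blank lines and substitute them into a template. The helper `build_stuffed_prompt`, the shortened template, and the sample chunks are all hypothetical and written in plain Python rather than with LangChain's `PromptTemplate`.

```python
# A shortened, hypothetical template with the same two placeholders used in this notebook.
TEMPLATE = (
    "Use the following retrieved context to answer the question.\n\n"
    "CONTEXT:\n{context}\n\n"
    "QUESTION:\n{question}\n\n"
    "ACCURATE ANSWER:\n"
)


def build_stuffed_prompt(chunks, question, separator="\n\n"):
    """Join all retrieved chunks into one context string and fill in the template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)


# Placeholder chunks standing in for what a retriever would return.
example_chunks = [
    "Placeholder chunk A about the hydraulic press.",
    "Placeholder chunk B about the hydraulic press.",
]
example_prompt = build_stuffed_prompt(example_chunks, "How often should I check fluid levels?")
print(example_prompt)
```

Because every retrieved chunk lands in a single prompt, the number of chunks and their size must together fit inside the LLM's context window.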

In [None]:
# --- 1. Set up the LLM ---
# We'll use the same local Hugging Face pipeline as in the previous notebook.
# LangChain's `HuggingFacePipeline` wrapper makes it easy to integrate.
print("Setting up the LLM...")
device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "google/flan-t5-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

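# Create a text2text-generation pipeline (FLAN-T5 is an encoder-decoder model)
llm_pipeline = pipeline(
    'text2text-generation',
    model=model,
    tokenizer=tokenizer,
    device=device,
    max_length=256,
    do_sample=True,   # temperature and top_p only take effect when sampling is enabled
    temperature=0.1,  # a low temperature keeps answers close to deterministic
    top_p=0.95
)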

# Wrap the pipeline in LangChain's utility class
llm = HuggingFacePipeline(pipeline=llm_pipeline)
print("LLM setup complete.")

# --- 2. Set up the Retriever ---
# The retriever's job is to fetch relevant documents from the vector store.
# We can configure it to return the top 'k' most similar documents.
print("\nSetting up the Retriever...")
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 2} # Retrieve the top 2 most relevant chunks
)
print("Retriever setup complete.")

# --- 3. Set up the Prompt Template ---
# This is the instruction set for our LLM. It defines how the LLM should use
# the retrieved context to answer the user's question.
print("\nSetting up the Prompt Template...")
prompt_template = """
You are a helpful and precise manufacturing assistant. Your role is to answer questions about Standard Operating Procedures (SOPs).
Use the following retrieved context to answer the question.
If you don't know the answer from the context provided, just say that you don't know. Do not make up an answer.

CONTEXT:
{context}

QUESTION:
{question}

ACCURATE ANSWER:
"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
print("Prompt Template setup complete.")


# --- 4. Create the RetrievalQA Chain ---
# This is the final step where we assemble all our components.
print("\nCreating the RetrievalQA chain...")
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # "stuff" is the simplest method: it "stuffs" all retrieved chunks into the prompt.
    retriever=retriever,
    return_source_documents=True, # This is crucial for citing sources.
    chain_type_kwargs={"prompt": PROMPT}
)

print("RetrievalQA chain created successfully!")

## ✅ Step 4: Run Queries and Cite Sources

Our RAG system is now fully assembled and ready to use.

When we call `qa_chain.invoke()`, LangChain orchestrates the entire RAG flow behind the scenes:
1.  The user's `question` is sent to the `retriever`.
2.  The `retriever` converts the question into an embedding and searches the `Chroma` vector store for the most similar document chunks.
3.  The retrieved chunks (the `context`) and the original `question` are inserted into our `PROMPT` template.
4.  The completed prompt is sent to the `llm`.
5.  The `llm` generates an answer. The prompt instructs it to rely *only* on the provided context, although a small model will not always follow that instruction.
6.  Because we set `return_source_documents=True`, the final result includes both the generated `result` (the answer) and the `source_documents` used to create it.

This ability to cite sources is a cornerstone of building trustworthy and transparent AI systems. It allows users to verify the information and builds confidence in the copilot's responses.
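The six steps above can be sketched end to end with stand-in components. This is an illustration of the data flow, not LangChain's internals: `Doc`, `keyword_retriever`, `echo_llm`, and `run_rag` are hypothetical names, the retriever ranks by shared words instead of embeddings, and the "LLM" is a stub that only reports the prompt length. The returned dictionary uses the same `result` and `source_documents` keys that this notebook reads from the chain.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a document with content and metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def keyword_retriever(question, docs, k=2):
    """Stand-in retriever: ranks documents by the number of words shared with the question."""
    question_words = set(question.lower().split())
    ranked = sorted(
        docs,
        key=lambda doc: len(question_words & set(doc.page_content.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def echo_llm(prompt):
    """Stand-in LLM: returns a fixed-format string instead of generating text."""
    return f"(stub answer generated from a {len(prompt)}-character prompt)"


def run_rag(question, docs, retriever=keyword_retriever, llm=echo_llm):
    sources = retriever(question, docs)                               # steps 1-2: retrieve
    context = "\n\n".join(doc.page_content for doc in sources)
    prompt = f"CONTEXT:\n{context}\n\nQUESTION:\n{question}\n\nACCURATE ANSWER:\n"  # step 3
    answer = llm(prompt)                                              # steps 4-5: generate
    return {"query": question, "result": answer, "source_documents": sources}      # step 6


# Placeholder documents; the file names are invented for this sketch.
toy_docs = [
    Doc("hydraulic press fluid levels check", {"source": "a.txt"}),
    Doc("robot maintenance power down", {"source": "b.txt"}),
    Doc("forklift battery charging", {"source": "c.txt"}),
]
toy_output = run_rag("hydraulic press fluid", toy_docs)
print(toy_output["result"])
print([doc.metadata["source"] for doc in toy_output["source_documents"]])
```

Swapping the stubs for a real retriever and model leaves the shape of `run_rag` unchanged, which is the orchestration a chain takes care of.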

In [None]:
# Define a helper function to run queries and print the results nicely
def ask(question: str):
    """
    Invokes the RAG chain with a question and prints the answer and sources.
    """
    # The `invoke` method runs the entire chain
    result = qa_chain.invoke({"query": question})

    print(f"❓ Question: {question}\n")
    print(f"✅ Answer: {result['result']}\n")
    print("--- 📜 Sources ---")
    
    # Print the source documents
    for doc in result['source_documents']:
        source_file = doc.metadata['source']
        content = doc.page_content
        print(f"📄 Source: {source_file}\n   Content: '{content}'\n")
    print("--------------------")

# --- Let's ask our first question! ---
question_1 = "How often should I check hydraulic press fluid levels?"
ask(question_1)

In [None]:
# --- Let's try another, more specific question ---
question_2 = "What is the first step for robot maintenance?"
ask(question_2)

## 🎉 Congratulations and Next Steps!

You have successfully built a production-grade, source-citing RAG pipeline using LangChain!

This notebook demonstrated the immense power and efficiency of using a framework. By leveraging LangChain's high-level abstractions, we were able to construct a sophisticated, modular, and trustworthy question-answering system with remarkably little code.

### Key Takeaways:

-   **Modularity is Power:** LangChain's component-based architecture (Loaders, Splitters, Embedders, Vector Stores, Chains) allows you to easily swap parts of your pipeline. Want to use a different LLM? Just change the `llm` object. Need to load PDFs instead of TXT files? Swap `TextLoader` for `PyPDFLoader`. The rest of your code remains largely the same.
-   **Abstraction Simplifies Complexity:** High-level chains like `RetrievalQA` handle all the complex orchestration of retrieval, prompt formatting, and generation, letting you focus on the overall application logic rather than the boilerplate.
-   **Traceability Builds Trust:** The ability to return source documents is not just a feature; it's a fundamental requirement for building responsible AI. It provides transparency and allows users to verify the system's answers, which is critical in a manufacturing or any other high-stakes environment.

### Looking Ahead:

In the next notebook, **`09_vector_embeddings.ipynb`**, we will peel back one final layer of abstraction and dive deep into the heart of our RAG system: the **vector embeddings**. We will explore:
- What are embeddings, intuitively?
- How does an embedding model *really* work?
- Why is choosing the right embedding model so critical for retrieval quality?

Understanding embeddings is the key to mastering RAG and unlocking the next level of performance for your AI applications. Let's dive in!