# Lab 2 — Retrieval-Augmented Generation (RAG) with LangChain & Gemini

Welcome to the second part of our workshop! In Lab 1, we learned how to generate creative text. Now, we'll solve a key limitation: how to make an LLM answer questions about specific, private, or very recent data it wasn't trained on.

We'll build the **EU AI News Navigator**, a Q&A application that uses RAG to answer questions about recent EU AI Act news.

**What you'll learn**
1.  **The "Why":** Understand the limitations of standard LLMs and why RAG is necessary.
2.  **The Core Components of RAG:** Learn about Document Loading, Text Splitting (Chunking), Embeddings, and Vector Stores.
3.  **Building a RAG Chain with LangChain:** Use LangChain to quickly assemble a Q&A pipeline.
4.  **The App:** Build a Gradio app that answers questions based on our custom, up-to-date documents.


**References**
- LangChain: [Question Answering / Retrieval](https://python.langchain.com/docs/use_cases/question_answering/)
- Google Gemini API: [Quickstart](https://ai.google.dev/gemini-api/docs/quickstart)



## ⚙️ 1) Setup & API Key

First, we'll install the necessary libraries and configure your Gemini API key.


In [None]:
!pip install -q langchain langchain_community langchain-google-genai faiss-cpu sentence-transformers unstructured gradio 2>/dev/null

In [None]:
import os
import getpass
import textwrap
from IPython.display import display, Markdown

# Configure your Gemini API Key.
if 'GEMINI_API_KEY' not in os.environ:
    os.environ['GEMINI_API_KEY'] = getpass.getpass("Enter your GEMINI_API_KEY: ")

# Some LangChain integrations historically look for GOOGLE_API_KEY.
# This alias ensures compatibility by setting it if GEMINI_API_KEY is present.
if 'GOOGLE_API_KEY' not in os.environ and 'GEMINI_API_KEY' in os.environ:
    os.environ['GOOGLE_API_KEY'] = os.environ['GEMINI_API_KEY']

print("GEMINI_API_KEY detected:", "Yes" if os.environ.get("GEMINI_API_KEY") else "No")


## 2) The Problem: An LLM's Knowledge is Limited

Let's ask a standard Gemini model a specific question about the brand-new initiatives and tools the European Commission launched just last week (October 8, 2025) to help businesses and researchers comply with the EU AI Act. This information was published just a few days ago, so the model has no pre-trained knowledge of it.

**Source -** European Commission News: [Commission launches AI Act Service Desk and Single Information Platform](https://digital-strategy.ec.europa.eu/en/news/commission-launches-ai-act-service-desk-and-single-information-platform-support-ai-act)


In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)
question = "I'm a startup owner in Dublin. What new resources did the EU launch in October 2025 to help me understand the AI Act?"

print("--- Asking the standard LLM (without RAG) ---")
response = llm.invoke(question)
display(Markdown(response.content))


As expected. When an LLM doesn't know the answer to a question, it typically does one of two things: it either invents an answer or states that it cannot answer. The response here is generic rather than fact-based or specific. It doesn't mention the AI Act Service Desk or the Compliance Checker because those are brand-new.

This is the exact problem RAG solves.



## 3) Load the Corpus (EU AI Act News mini‑dataset)
In this section, we’ll work with a small knowledge base built from recent EU policy updates on AI regulation.
The corpus simulates a real-world dataset that our Retrieval-Augmented Generation (RAG) system will later use to answer questions.

### About the Dataset

This mini-dataset contains text excerpts from official news sources, summarizing the European Commission’s latest AI initiatives (October 2025):

- “New Tools for AI Act Compliance” - Covers the launch of tools like the AI Act Service Desk, Single Information Platform, Compliance Checker, and AI Act Explorer.

  Source: European Commission News: [Commission launches AI Act Service Desk and Single Information Platform](https://digital-strategy.ec.europa.eu/en/news/commission-launches-ai-act-service-desk-and-single-information-platform-support-ai-act) (Published Oct 8, 2025)

- “New EU AI Strategies for Industry and Science” - Describes the Apply AI Strategy and the AI in Science Strategy, including the creation of RAISE (Resource for AI Science in Europe).

  Source: Development Aid Report: [EU launches €1bn AI strategies for industry and science](https://www.developmentaid.org/news-stream/post/200924/eu-ai-strategies-apply-ai-science-strategy-european-commission) (Published Oct 12, 2025)

### Why This Dataset?

- **Ultra-Recent:** Published in October 2025, outside most models’ training windows.
- **Jargon-Rich:** Includes official program names and technical language ideal for retrieval.
- **Realistic Scenario:** Mimics how teams might use RAG to explore policy documents, compliance guidance, or legal updates.

Next step: Let’s load these text files into our environment and preview their contents before embedding them.


In [None]:
# Grant notebook access to Google Drive files
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import os

# Define the folder containing your corpus
DATA_DIR = "/content/drive/MyDrive/WAIWorkshop - Chatbots/EU_AI_Act_News_Corpus"

# Check if folder exists
if not os.path.exists(DATA_DIR):
    print("Folder not found:", DATA_DIR)
    print("An empty folder will be created there. Please upload the corpus .txt files to it and re-run this cell.")
    os.makedirs(DATA_DIR, exist_ok=True)
else:
    print(f"Found corpus folder: {DATA_DIR}")
    files = [f for f in os.listdir(DATA_DIR) if f.endswith('.txt')]
    if not files:
        print("No .txt files found! Please upload your corpus files.")
    else:
        print(f"Found {len(files)} text files:")
        for f in files:
            print("   -", f)


## 4) Build the RAG Pipeline

Now we'll use LangChain to build a basic RAG pipeline.

Ref: https://python.langchain.com/docs/tutorials/rag/


A typical RAG application has two main components:

1. Indexing: a pipeline for ingesting data from a source and indexing it. This usually happens offline.
   - Load: First we need to load our data. This is done with Document Loaders.
   - Split: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and for passing it into a model, as large chunks are harder to search over and won't fit in a model's finite context window.
   - Store: We need somewhere to store and index our splits so that they can be searched later. This is often done using a VectorStore and an Embeddings model.

2. Retrieval and generation: the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.
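
To make the two phases concrete before we bring in LangChain, here is a minimal pure-Python sketch. It is not how LangChain works internally: word overlap stands in for real embeddings, the sample texts are made-up placeholders rather than the workshop corpus, and the helper names (`tokenize`, `build_index`, `retrieve`) are hypothetical.

```python
# Toy stand-in for the two RAG phases. Word overlap replaces real embeddings.

def tokenize(text):
    """Lowercase a string and return its set of whitespace-separated words."""
    return set(text.lower().split())

# --- Phase 1: indexing (done once, offline) ---
def build_index(documents):
    """Split each document into paragraphs and store (chunk, word set) pairs."""
    index = []
    for doc in documents:                      # "Load" is assumed already done
        for chunk in doc.split("\n\n"):        # "Split" on blank lines
            chunk = chunk.strip()
            if chunk:
                index.append((chunk, tokenize(chunk)))  # "Store"
    return index

# --- Phase 2: retrieval (done for every query) ---
def retrieve(index, query, k=1):
    """Return the k chunks that share the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(index, key=lambda item: len(item[1] & query_words), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Made-up placeholder documents, not the workshop corpus
documents = [
    "The service desk answers questions.\n\nThe checker tests compliance.",
    "The strategy funds science.",
]
index = build_index(documents)
top_chunks = retrieve(index, "who funds science", k=1)
print(top_chunks)
```

In a real pipeline the retrieved chunks would then be passed to the model together with the question, which is the generation half of phase 2.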


In [None]:
# import libraries
from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
# from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain

### Load Documents
LangChain's community integrations list over 200 document loaders for a wide range of sources. Specific loaders are available for common formats such as CSV, JSON, and PDF, as well as integrations for services like Notion, Google Drive, and YouTube.

See the full list of document loaders [here](https://python.langchain.com/docs/integrations/document_loaders/).

We will use the [DirectoryLoader](https://python.langchain.com/docs/how_to/document_loader_directory/), which loads files from a folder. It supports formats like PDF, HTML, Markdown, and plain text.

In [None]:
# 1. Load Documents
loader = DirectoryLoader(DATA_DIR, glob="*.txt", show_progress=True)
docs = loader.load()
print(f"\n Loaded {len(docs)} documents.")

In [None]:
len(docs[1].page_content)

In [None]:
print(docs[0].page_content[:500])

### Split Documents into Chunks

Now that we have loaded our documents, we need to process them for our model. LLMs have a limited context window, meaning they can only process a certain amount of text at once. To work around this, we must break our documents into smaller pieces. This process is called chunking. The goal is to create chunks that are small enough to fit in the model's context window while retaining their semantic meaning.

There are several ways to split text, including Character Splitting, Token Splitting, and Semantic Chunking, each with its own advantages. For more information, see the [LangChain docs](https://python.langchain.com/docs/concepts/text_splitters/).

We'll use the RecursiveCharacterTextSplitter, which is a recommended starting point. It tries to keep related text together by splitting on paragraphs, then lines, and so on.
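
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch using a plain fixed-size sliding window. This is not the recursive algorithm that RecursiveCharacterTextSplitter uses (which prefers to cut at paragraph and line boundaries); the function name `split_with_overlap` is hypothetical and only illustrates how consecutive chunks share text.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into character windows; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap          # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):    # this window reached the end
            break
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

The last four characters of each chunk reappear at the start of the next one, so a sentence cut at a boundary is still seen whole in at least one chunk.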

In [None]:
# 2. Split Documents into Chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
print(f"Split into {len(splits)} chunks.")

### Create Embeddings and Vector Store (using FAISS)

To perform retrieval, we need to compare a user's query with our document chunks based on their semantic meaning (semantic search).

To enable semantic search, we must convert our text chunks into numerical representations called embeddings. These are vectors that capture the meaning of the text. We then load these vectors into a vector store, a specialized database designed for efficient similarity searching.

See the links below for more information on embedding models and vector stores:

[Embedding Models](https://python.langchain.com/docs/concepts/embedding_models/)

[Vector Stores](https://python.langchain.com/docs/concepts/vectorstores/)
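
As a rough illustration of similarity search, the sketch below compares a query vector with two stored vectors using cosine similarity. The three-dimensional vectors and their labels are hand-written assumptions, not output from a real embedding model, and a real vector store may rank with a different measure, such as Euclidean distance.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-written toy vectors standing in for real embeddings (assumption)
toy_vectors = {
    "compliance tools": [0.9, 0.1, 0.0],
    "science funding": [0.1, 0.9, 0.2],
}
query_vector = [0.8, 0.2, 0.1]

# Score every stored vector against the query and keep the closest one
scores = {label: cosine_similarity(query_vector, vec) for label, vec in toy_vectors.items()}
best_match = max(scores, key=scores.get)
print(best_match)
```

The query vector points in nearly the same direction as the first stored vector, so that entry is returned as the closest match.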


### **Implementation**
We will use a Hugging Face model to create the embeddings and FAISS (Facebook AI Similarity Search) as our in-memory vector store.


*   [all-MiniLM-L6-v2 model](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) - maps sentences and paragraphs to a 384-dimensional dense vector space

In [None]:
# 3. Create Embeddings and Vector Store (using FAISS)

from langchain_community.embeddings import HuggingFaceEmbeddings

# Load a compact, high-quality embedding model
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Create FAISS vector store
from langchain_community.vectorstores import FAISS  # same import path as in the imports cell
vectorstore = FAISS.from_documents(splits, embedding=embeddings)

print("Vector store created successfully (using Hugging Face embeddings).")


### Create the RAG Chain
Now, we'll assemble our components into a runnable chain. This chain will automatically handle the two-step RAG process: retrieving relevant documents and then generating an answer based on them.

### **Implementation**

* Define a Prompt: We create a prompt template that instructs the LLM to answer a question (`input`) using the retrieved documents (`context`) and to say so when it doesn't know the answer. This helps discourage the model from making things up.

* Create a Retriever: We convert our vector store into a retriever, which is an object designed to fetch relevant documents.

* Build the Chain: `create_stuff_documents_chain` combines the prompt with the LLM (`llm`) into a document chain, and `create_retrieval_chain` links the retriever to that document chain. The resulting `retrieval_chain` object encapsulates the entire RAG workflow.
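
The "stuff" step is conceptually simple: the retrieved chunks are joined into one string and placed into the prompt's `{context}` slot. The sketch below imitates that idea in plain Python, assuming a blank-line separator between chunks. The names `stuff_documents` and `build_prompt` are hypothetical helpers and not part of LangChain, and the chunk texts are placeholders.

```python
# Shortened version of a RAG prompt with the same two placeholders
PROMPT_TEMPLATE = (
    "Use the following context to answer the question.\n"
    "<context>\n{context}\n</context>\n"
    "Question: {input}"
)

def stuff_documents(chunks):
    """Join retrieved chunks into one context string, separated by blank lines."""
    return "\n\n".join(chunks)

def build_prompt(chunks, question):
    """Fill the template with the stuffed context and the user's question."""
    return PROMPT_TEMPLATE.format(context=stuff_documents(chunks), input=question)

retrieved_chunks = ["Chunk A text.", "Chunk B text."]  # placeholder chunks
final_prompt = build_prompt(retrieved_chunks, "What does chunk A say?")
print(final_prompt)
```

Because every retrieved chunk lands in a single prompt, the combined text has to fit in the model's context window, which is one reason chunk size and the number of retrieved chunks matter.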

In [None]:
# 4. Create the RAG Chain
prompt = ChatPromptTemplate.from_template("""You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.

<context>
{context}
</context>

Question: {input}""")

document_chain = create_stuff_documents_chain(llm, prompt)
retriever = vectorstore.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

print("RAG chain created successfully!")

## 5) Test the RAG Chain & Inspect Retrieval

Let's ask our question again. We'll also inspect the documents the retriever fetched to understand how it's working.


In [None]:

question = input("Enter your question (or press Enter to use default): ").strip()
if not question:
    question = "I'm a startup owner in Dublin. What new resources did the EU launch in October 2025 to help me understand the AI Act?"
print("--- Asking the RAG-Powered LLM ---")
response = retrieval_chain.invoke({"input": question})
display(Markdown(response["answer"]))


### Tune Retrieval (k) & Inspect Context
The `k` parameter in the retriever determines how many chunks of text are fetched from the vector store. Let's inspect what the retriever is finding.
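
Before running the real retriever, here is a toy sketch of what `k` controls. The chunk descriptions and scores are invented for illustration, and `top_k` is a hypothetical helper rather than a LangChain function.

```python
# Invented (description, relevance score) pairs; higher means more relevant
scored_chunks = [
    ("chunk about the service desk", 0.91),
    ("chunk about the compliance checker", 0.84),
    ("chunk about science funding", 0.40),
    ("chunk about an unrelated topic", 0.12),
]

def top_k(scored, k):
    """Return the text of the k highest-scoring chunks, best first."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:k]]

for k in (1, 3):
    print(k, top_k(scored_chunks, k))
```

A small `k` keeps the prompt short but may leave out a chunk the answer needs, while a large `k` sends more text to the model and can include weakly related chunks.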


In [None]:
# Set k to 3 to see the top 3 most relevant chunks
retriever_with_k = vectorstore.as_retriever(search_kwargs={"k": 3})

print(f"Inspecting retrieved documents for question: '{question}'\n")
retrieved_docs = retriever_with_k.invoke(question)

for i, doc in enumerate(retrieved_docs):
    print(f"--- Document {i+1} ---")
    print(f"Source: {doc.metadata.get('source')}")
    display(Markdown(textwrap.shorten(doc.page_content, width=400, placeholder="...")))


## 6) Build the Gradio App: "EU AI News Navigator"

Finally, let's wrap our chain in a Gradio UI. A key feature is displaying the sources used for each answer, which is crucial for building trust and transparency.


In [None]:
import gradio as gr

def get_rag_response(user_question):
    """Invokes the RAG chain and returns the answer and formatted sources."""
    response = retrieval_chain.invoke({"input": user_question})

    answer = response.get("answer", "Sorry, I couldn't find an answer.")
    sources_text = "\n\n---\n**Sources Used:**\n"

    # Get unique sources from the context
    unique_sources = {doc.metadata.get('source', 'Unknown') for doc in response.get("context", [])}

    for source in sorted(unique_sources):
        sources_text += f"- {os.path.basename(source)}\n"

    return answer + sources_text

with gr.Blocks(theme=gr.themes.Soft()) as demo:
    gr.Markdown("# 🇪🇺 EU AI News Navigator")
    gr.Markdown("Ask me any question about the new EU AI Act tools and strategies launched in October 2025. My knowledge is based on the documents provided in this workshop.")

    with gr.Row():
        inp = gr.Textbox(label="Your Question", lines=2, placeholder="e.g., What is the 'Apply AI Alliance'?")
        out = gr.Markdown(label="Answer from Documents")

    btn = gr.Button("Ask Navigator", variant="primary")
    btn.click(get_rag_response, inp, out)

    gr.Examples(
        examples=[
            ["What is RAISE and how much funding is it getting?"],
            ["I run a small business. What new tools can I use to understand the AI Act?"],
            ["When do the rules for high-risk AI systems fully apply?"]
        ],
        inputs=inp
    )

print("Gradio app ready. Launching...")
demo.launch(debug=True)


In [None]:
demo.close()

## Wrap‑Up

Congratulations on building a complete Retrieval-Augmented Generation (RAG) pipeline! You've successfully implemented the core workflow that powers question-answering systems. This forms a powerful baseline for building applications that can reason about private or up-to-date information.

### **Next Steps:**
- Experiment with different document loaders (e.g., `PyPDFLoader`).
- Persist your vector store so you don't have to rebuild it every time.

### **Beyond the Basics: Exploring Advanced RAG**

The RAG pipeline you built is fantastic, but the field is evolving rapidly. Here are several exciting concepts and techniques you can explore to make your RAG systems even more powerful and intelligent.

1. Optimize the Core Pipeline
Before moving to complex architectures, you can significantly improve the performance of the basic RAG flow:

    * Smarter Chunking: Instead of splitting by a fixed character count, try Semantic Chunking. This method splits text based on semantic similarity, keeping related ideas together in the same chunk, which can greatly improve context.

    * Hybrid Search: Combine vector search (semantic) with traditional keyword search (like BM25). This is powerful for queries that depend on specific keywords or acronyms.

    * Re-ranking: Use a two-stage process. First, retrieve a larger number of documents (e.g., 20). Then, use a more powerful, slower model (a cross-encoder) to re-rank the initial results and select the top few (e.g., 3-5) to send to the LLM.
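
One common way to combine a keyword ranking with a vector ranking is Reciprocal Rank Fusion (RRF). The sketch below is only one possible fusion method, named here as an illustration rather than something this lab uses. The document ids and both rankings are made up, and `c=60` is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, c=60):
    """Merge ranked lists of ids; each id earns 1 / (c + rank) from every list."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (c + rank)
    # Highest fused score first
    return sorted(fused, key=fused.get, reverse=True)

# Made-up results from two independent searches over the same corpus
keyword_ranking = ["doc_b", "doc_a", "doc_c"]   # e.g. from a keyword search
vector_ranking = ["doc_a", "doc_d", "doc_b"]    # e.g. from a vector search

fused_ranking = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused_ranking)
```

Documents that rank well in both lists rise to the top, and the fused list could then be handed to a re-ranker or sent directly to the LLM.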


2. Agentic RAG
In a simple RAG chain, the process is static: retrieve once, then generate. In Agentic RAG, you use an LLM-powered agent that can reason and use tools. The agent can make decisions in a loop:

    * Analyze the Query: The agent first looks at the user's question and decides on a plan.

    * Decide to Retrieve (or Not): It might decide that the question is simple and doesn't require retrieval at all.

    * Perform Complex Searches: It can run more than one search, for example one per sub-question.

    * Reflect and Refine: The agent can look at the search results and decide they aren't good enough, then reformulate the query and search again.
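
The loop above can be sketched with plain functions. In a real agent, an LLM would make each decision; here every step is a hard-coded stub, and all names (`needs_retrieval`, `search`, `reformulate`, `agentic_answer`) as well as the tiny knowledge dictionary are hypothetical placeholders.

```python
def needs_retrieval(question):
    """Stub decision: retrieve only when the question mentions the corpus topic."""
    return "ai act" in question.lower()

def search(query):
    """Stub search: returns a hit only when the query is specific enough."""
    knowledge = {"ai act service desk": "placeholder text about the service desk"}
    return [text for key, text in knowledge.items() if key in query.lower()]

def reformulate(query):
    """Stub rewrite: make the query more specific before searching again."""
    return query + " service desk"

def agentic_answer(question, max_attempts=3):
    """Decide whether to retrieve, then search and refine until results appear."""
    if not needs_retrieval(question):
        return "answered directly", []
    query = question
    for _ in range(max_attempts):
        results = search(query)
        if results:                      # reflect: are the results good enough?
            return "answered from retrieval", results
        query = reformulate(query)       # refine the query and try again
    return "gave up", []

print(agentic_answer("What is 2 + 2?"))
print(agentic_answer("Who helps with the AI Act"))
```

The second call finds nothing on its first search, rewrites the query, and succeeds on the second attempt, which is the reflect-and-refine behaviour in miniature.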

### **Further Reading**
* [LangChain Docs - Retrieval](https://python.langchain.com/docs/concepts/retrieval/)
* [DeepLearning.AI - Retrieval Augmented Generation (RAG) Course](https://www.deeplearning.ai/courses/retrieval-augmented-generation-rag/)
