
# Final Project: Build a RAG AI Assistant using LangChain & IBM watsonx.ai

**Author:** Alejandro Galindo Valencia  
**Program:** IBM AI Engineer Professional Certificate  
**Date:** October 07, 2025

---

This notebook implements a Retrieval-Augmented Generation (RAG) **Question-Answering Bot** using **LangChain**, **IBM watsonx.ai**, and a simple **Gradio** web interface.

**High-level pipeline:**

1. Load a PDF document.
2. Split it into manageable text chunks.
3. Create document embeddings with **Watsonx Embeddings**.
4. Store and search vectors with **ChromaDB**.
5. Answer user questions through a **RetrievalQA** chain powered by a **watsonx LLM**.
6. Serve an interactive **web app** with **Gradio**.



## Table of Contents

1. [Environment Setup](#environment-setup)  
2. [LLM Setup (IBM watsonx.ai)](#llm-setup)  
3. [Document Loading](#document-loading)  
4. [Text Splitting](#text-splitting)  
5. [Embedding Model (Watsonx Embeddings)](#embedding-model)  
6. [Vector Database (ChromaDB)](#vector-database)  
7. [Retriever and QA Chain](#retriever-and-qa-chain)  
8. [Gradio Web Application](#gradio-web-application)  
9. [How to Run Locally](#how-to-run-locally)  
10. [Notes and Tips](#notes-and-tips)



## 1) Environment Setup <a id="environment-setup"></a>

Install the dependencies if they are not already present in your environment:
```bash
pip install ibm-watsonx-ai langchain langchain-ibm langchain-community chromadb pypdf gradio
```


In [1]:
pip install ibm-watsonx-ai langchain langchain-ibm langchain-community chromadb pypdf gradio


In [2]:

# Suppress non-critical warnings for cleaner output
import warnings


def _suppress_warn(*args, **kwargs):
    """No-op replacement for warnings.warn."""
    pass


warnings.warn = _suppress_warn
warnings.filterwarnings('ignore')

# Core imports
from ibm_watsonx_ai.foundation_models import ModelInference
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames
from ibm_watsonx_ai import Credentials

from langchain_ibm import WatsonxLLM, WatsonxEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain.chains import RetrievalQA

import gradio as gr



## 2) LLM Setup (IBM watsonx.ai) <a id="llm-setup"></a>

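The function below initializes a **watsonx LLM**.  
You can swap in other models (e.g., *Granite*, *Llama*, *Mixtral*) if your watsonx.ai account supports them. The `"skills-network"` project ID used here is intended for IBM Skills Network lab environments; outside of one you will most likely need to supply your own project ID and API credentials.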
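
The way `decoding_method` and `temperature` interact is easy to overlook. The sketch below uses toy logits and a hypothetical `pick_token` helper (it is not watsonx.ai's implementation) to show the general idea: greedy decoding takes the most likely token and never consults the temperature, while sampling divides the logits by the temperature before drawing a token.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after scaling them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, decoding_method="greedy", temperature=1.0, rng=None):
    """Hypothetical helper: choose a token index from toy logits."""
    if decoding_method == "greedy":
        # Greedy decoding always takes the argmax; temperature is unused.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.1]
greedy_choice = pick_token(toy_logits, "greedy", temperature=0.5)
sharp = softmax(toy_logits, temperature=0.5)  # low temperature: peaked distribution
flat = softmax(toy_logits, temperature=2.0)   # high temperature: flatter distribution
```

In this toy setup a lower temperature concentrates probability on the top token, which is why it only matters once tokens are actually sampled.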


In [3]:

def get_llm():
    """Create and return a WatsonxLLM instance.

    Model:
        - Default: ibm/granite-3-2-8b-instruct

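    Generation parameters (set inside the function):
        - temperature (float): controls randomness; typically only used
          when decoding_method is 'sample'
        - max_new_tokens (int): maximum number of tokens to generate
        - decoding_method (str): 'greedy' or 'sample'

    Returns:
        WatsonxLLM: Configured LLM client.
    """
    model_id = "ibm/granite-3-2-8b-instruct"
    parameters = {
        # With greedy decoding the most likely token is always picked, so
        # temperature has no effect unless decoding_method is "sample".
        "temperature": 0.5,
        "max_new_tokens": 256,
        "decoding_method": "greedy"
    }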
    project_id = "skills-network"
    url = "https://us-south.ml.cloud.ibm.com"

    watsonx_llm = WatsonxLLM(
        model_id=model_id,
        url=url,
        project_id=project_id,
        params=parameters
    )
    return watsonx_llm



## 3) Document Loading <a id="document-loading"></a>

We use **PyPDFLoader** from `langchain_community` to parse PDF files into `Document` objects.


In [4]:

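def document_loader(file):
    """Load a PDF file into LangChain Document objects.

    Args:
        file: Path to the uploaded PDF (a str, as passed by
            `gr.File(type="filepath")`), or a file-like object
            exposing the path as `.name`.

    Returns:
        List[Document]: One Document per PDF page.
    """
    # With type="filepath" Gradio passes a plain path string, which has no .name attribute.
    file_path = file if isinstance(file, str) else file.name
    loader = PyPDFLoader(file_path)
    loaded_document = loader.load()
    return loaded_document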



## 4) Text Splitting <a id="text-splitting"></a>

Long documents are split into overlapping chunks to improve retrieval quality.
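
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch using plain fixed-size character windows. `RecursiveCharacterTextSplitter` is more elaborate (it prefers to break on separators such as paragraphs and sentences), so treat the hypothetical `chunk_text` helper below only as an illustration of how overlap works.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=100):
    """Hypothetical helper: split text into fixed-size windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample, chunk_size=1000, chunk_overlap=100)
```

Each chunk starts 900 characters after the previous one, so the last 100 characters of one chunk reappear at the start of the next. That repetition is what keeps a sentence that straddles a boundary retrievable from at least one chunk.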


In [5]:

def text_splitter(data):
    """Split documents into manageable text chunks.

    Args:
        data: List of LangChain Document objects.

    Returns:
        List[Document]: Chunked documents.
    """
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=100,
        length_function=len
    )
    chunks = splitter.split_documents(data)
    return chunks



## 5) Embedding Model (Watsonx Embeddings) <a id="embedding-model"></a>

We create embeddings using **Watsonx Embeddings** to support vector similarity search.
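
As a rough illustration of what an embedding model provides, the sketch below defines a toy bag-of-words embedder with the same two methods LangChain embedding classes expose, `embed_documents` and `embed_query`. The class name `ToyBagOfWordsEmbeddings`, its fixed vocabulary, and the sample sentences are assumptions made for this example; a real model such as the one used in this notebook produces dense learned vectors instead of word counts.

```python
import math


class ToyBagOfWordsEmbeddings:
    """Hypothetical stand-in for a real embedding model (not Watsonx)."""

    def __init__(self, vocabulary):
        self.index = {word: i for i, word in enumerate(vocabulary)}

    def _embed(self, text):
        vector = [0.0] * len(self.index)
        for word in text.lower().split():
            if word in self.index:
                vector[self.index[word]] += 1.0
        norm = math.sqrt(sum(v * v for v in vector)) or 1.0
        return [v / norm for v in vector]  # unit-length vector

    def embed_documents(self, texts):
        return [self._embed(t) for t in texts]

    def embed_query(self, text):
        return self._embed(text)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vocabulary = ["granite", "ibm", "language", "model",
              "chroma", "stores", "embedding", "vectors"]
embedder = ToyBagOfWordsEmbeddings(vocabulary)
doc_vectors = embedder.embed_documents([
    "granite is an ibm language model",
    "chroma stores embedding vectors",
])
query_vector = embedder.embed_query("which ibm language model")
scores = [cosine_similarity(query_vector, v) for v in doc_vectors]
```

The first document shares three words with the query and scores highest, while the second shares none and scores zero. Similarity search relies on the same principle, with learned vectors capturing meaning rather than exact word overlap.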


In [6]:

def watsonx_embedding():
    """Instantiate the Watsonx Embeddings model for document retrieval."""
    embed_params = {
        "model_type": "embedding",
        "task_type": "retrieval_document",
    }
    embedding = WatsonxEmbeddings(
        model_id="ibm/slate-125m-english-rtrvr",
        url="https://us-south.ml.cloud.ibm.com",
        project_id="skills-network",
        params=embed_params,
    )
    return embedding



## 6) Vector Database (ChromaDB) <a id="vector-database"></a>

We store the chunk embeddings in **Chroma** to enable fast semantic search.
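
Conceptually, a vector store keeps (vector, text) pairs and returns the texts whose vectors lie closest to a query vector. The hypothetical `TinyVectorStore` below is a brute-force sketch of that idea using hand-made 2-D vectors; it is not how Chroma is implemented, and real vector databases typically add indexing and persistence on top.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store that scans every item on each search."""

    def __init__(self):
        self._items = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self._items.append((vector, text))

    def similarity_search(self, query_vector, k=2):
        scored = [(cosine(query_vector, vector), text) for vector, text in self._items]
        # Return the k best (score, text) pairs, highest score first.
        return heapq.nlargest(k, scored, key=lambda pair: pair[0])


store = TinyVectorStore()
store.add([1.0, 0.0], "page 1: installation")
store.add([0.0, 1.0], "page 2: pricing")
store.add([0.9, 0.1], "page 3: setup guide")

results = store.similarity_search([1.0, 0.2], k=2)
top_texts = [text for _, text in results]
```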


In [7]:

def vector_database(chunks):
    """Build a Chroma vector store from chunked documents."""
    embedding_model = watsonx_embedding()
    vectordb = Chroma.from_documents(chunks, embedding_model)
    return vectordb



## 7) Retriever and QA Chain <a id="retriever-and-qa-chain"></a>

We convert the vector store into a retriever and connect it with the LLM using `RetrievalQA`.
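
The `"stuff"` chain type places all retrieved chunks into a single prompt together with the question. The sketch below shows the shape of that step; the `build_stuff_prompt` helper and its prompt wording are assumptions for illustration and do not reproduce LangChain's actual template.

```python
def build_stuff_prompt(question, chunks):
    """Hypothetical helper: join retrieved chunks and the question into one prompt."""
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


retrieved = ["Chroma stores embeddings.", "  Granite is an IBM model.  "]
prompt = build_stuff_prompt("Who makes Granite?", retrieved)
```

Because every retrieved chunk ends up in one prompt, the chunk size and the number of retrieved chunks together have to fit within the model's context window.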


In [8]:

def retriever(file):
    """Create a retriever from an uploaded file."""
    documents = document_loader(file)
    chunks = text_splitter(documents)
    vectordb = vector_database(chunks)
    return vectordb.as_retriever()


def retriever_qa(file, query):
    """Run the RetrievalQA chain over the uploaded file for a given query.

    Args:
        file: Uploaded PDF file (from Gradio).
        query (str): The user question.

    Returns:
        str: The generated answer text.
    """
    llm = get_llm()
    retriever_obj = retriever(file)
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever_obj,
        return_source_documents=False
    )
    response = qa.invoke({"query": query})
    return response["result"]



## 8) Gradio Web Application <a id="gradio-web-application"></a>

The UI lets users upload a PDF and ask questions about it.  
Each submission runs the full RAG pipeline defined above, so the PDF is re-loaded, re-split, and re-embedded on every query.


In [9]:

rag_application = gr.Interface(
    fn=retriever_qa,
    allow_flagging="never",
    inputs=[
        gr.File(
            label="Upload PDF File",
            file_count="single",
            file_types=[".pdf"],
            type="filepath"
        ),
        gr.Textbox(
            label="Input Query",
            lines=2,
            placeholder="Type your question here..."
        )
    ],
    outputs=gr.Textbox(label="Answer"),
    title="QA Bot using LangChain and Watsonx",
    description="Upload a PDF document and ask any question. The chatbot will try to answer using the provided document."
)

# Uncomment the line below to launch locally from the notebook environment
# rag_application.launch(server_name="0.0.0.0", server_port=7860)



## 9) How to Run Locally <a id="how-to-run-locally"></a>

1. Create and activate a virtual environment (optional but recommended).
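2. Install dependencies (this assumes you have created a `requirements.txt` listing the packages from the Environment Setup step):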
   ```bash
   pip install -r requirements.txt
   ```
3. Run directly from the notebook (uncomment the `launch` line) **or** export this logic into `app.py` and run:
   ```bash
   python app.py
   ```
4. Open your browser at **http://localhost:7860**.



## 10) Notes and Tips <a id="notes-and-tips"></a>

- For production, consider **persisting** your Chroma vector store to disk so the index does not have to be rebuilt after every restart.
- Read configuration (API endpoint, project ID, model ID) from **environment variables** instead of hard-coding it.
- Set `return_source_documents=True` to get the retrieved chunks back alongside the answer, so the UI can display **citations**.
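
For the configuration tip, one possible approach is sketched below. The helper `load_settings` and the variable names `WATSONX_URL`, `WATSONX_PROJECT_ID`, and `WATSONX_MODEL_ID` are naming assumptions chosen for this example; the defaults are the values hard-coded in this notebook, and `my-org/other-model` is only a placeholder.

```python
import os


def load_settings(env=None):
    """Hypothetical helper: read settings from the environment, falling back to notebook defaults."""
    env = os.environ if env is None else env
    return {
        "url": env.get("WATSONX_URL", "https://us-south.ml.cloud.ibm.com"),
        "project_id": env.get("WATSONX_PROJECT_ID", "skills-network"),
        "model_id": env.get("WATSONX_MODEL_ID", "ibm/granite-3-2-8b-instruct"),
    }


# Passing a dict instead of os.environ makes the override easy to see.
settings = load_settings({"WATSONX_MODEL_ID": "my-org/other-model"})
```

The returned values could then be passed to `WatsonxLLM` and `WatsonxEmbeddings` in place of the literals used in `get_llm()` and `watsonx_embedding()`.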

---

*Happy building!*
