Skip to content

Advanced-RAG-LangGraph is a Streamlit-based web application that implements an advanced Retrieval-Augmented Generation (RAG) pipeline using LangGraph, ChromaDB, and Tavily to enable interactive document-based Q&A with enhanced retrieval and error-handling capabilities.

License

Notifications You must be signed in to change notification settings

chitralputhran/Advanced-RAG-LangGraph

Repository files navigation

πŸ”Ž Advanced RAG with LangGraph

Python License: MIT Streamlit LangGraph LangChain ChromaDB Code style: Python

This is a web application that allows you to upload documents and ask questions about them. It is built with LangGraph, Streamlit, and ChromaDB. If your document does not contain the answer, it automatically searches online to help you out.

This project demonstrates how to build a RAG (Retrieval-Augmented Generation) system using LangGraph for workflow management. LangGraph helps orchestrate the different steps in the RAG pipeline, from document processing to answer generation, with built-in error handling and state management.

If you're looking to understand how LangGraph works with RAG systems, this implementation shows practical patterns for building document question-answering applications with proper workflow orchestration.

How It Works

Here is what happens when you use this application and the process is actually quite straightforward:

Workflow

This diagram shows how everything works together behind the scenes.


What You Can Do

  • Upload Different Types of Files:
    • Just drag and drop your PDFs, Word docs, Excel files, or text files
    • The app figures out what type of file it is and reads the content automatically
    • Everything gets converted into a searchable format and stored so you can ask questions about it later

Document Upload Interface

Simple drag-and-drop file upload that works with multiple file types

  • Ask Questions and Get Answers:
    • Type any question about your uploaded documents
    • The application tells you exactly where it found the answer in your files
    • Cannot find it in your documents? No problem - it will search online for you
    • You will always know if the answer came from your files or from the internet

Q&A Interface

Ask questions and get answers

  • View Evaluation Results:
    • See detailed system information about how your answer was generated
    • Review evaluation scores for document relevance, question matching, and answer grounding
    • Understand which documents were used and their relevance scores
    • View confidence levels and reasoning behind each evaluation
    • Check if online search was used or if answers came from your documents

Evaluation Results

Comprehensive evaluation metrics and system transparency

  • Smart Answer Generation:

    • Uses LangGraph to make sure answers are relevant and accurate
    • Has built-in checks to catch when something might be wrong
    • Falls back to different sources if needed
    • Evaluates document relevance so you get better answers
  • See What's Happening Behind the Scenes:

    • Connects with LangSmith so you can see how it's working
    • Great for debugging or just understanding what's going on

LangSmith Tracing

Optional: See exactly how your questions are being processed


LangGraph RAG Implementation

This project shows how to implement RAG with LangGraph in a practical way. LangGraph handles the workflow orchestration, managing the different steps like document retrieval, relevance checking, and answer generation.

Why LangGraph for RAG?

  • State Management: LangGraph manages the application state as it moves through different processing steps
  • Conditional Logic: The workflow can decide whether to search documents or go online based on what it finds
  • Error Handling: Built-in mechanisms to handle failures and try alternative approaches
  • Extensibility: Easy to add new steps or modify the workflow as needed

Key LangGraph RAG Patterns Used:

  • Document evaluation before answer generation
  • Conditional routing between different search methods
  • Multi-step validation and quality checks
  • State transitions with proper error recovery

If you want to learn LangGraph RAG implementation, this codebase provides a complete working example with real-world patterns.


File Types We Support

You can upload these types of files:

  • Text Files: .txt
  • PDF Documents: .pdf
  • Microsoft Word: .docx
  • Excel Files: .csv, .xlsx

How It Works

The application works in a few simple steps, but there is complex processing happening behind the scenes:

  1. When You Upload a Document:

    • You upload your files (PDFs, Word docs, Excel, or text files)
    • The app reads the content and breaks it into smaller chunks
    • These chunks get converted into a special searchable format
    • Everything gets saved in a database called ChromaDB so it can find things quickly
  2. When You Ask a Question:

    • You type your question in the text box
    • The app checks if your question makes sense
    • It searches through your uploaded documents to find relevant information
    • Multiple checks happen to make sure the answer will be good
  3. Getting Your Answer:

    • The application looks at what it found in your documents
    • If there is relevant information, it writes an answer based on that
    • If your documents do not contain what you need, it searches online instead
    • It performs quality checks to ensure the answer is not fabricated
  4. Ensuring Everything Works Properly:

    • The application has several checkpoints to catch problems
    • It can identify when an answer might be incorrect or fabricated
    • If one method does not work, it tries another approach
    • You always know where your answer originated
  5. Evaluation and Transparency:

    • After generating an answer, the system provides detailed evaluation metrics
    • Shows document relevance scores and which documents were most helpful
    • Displays question-answer matching quality and completeness ratings
    • Provides confidence levels and reasoning for each evaluation
    • Indicates whether online search was used or if answers came from your documents
    • All evaluation data is presented in easy-to-read tables for full transparency
  6. Seeing What's Happening (Optional):

    • If you set up LangSmith, you can see exactly what the app is doing
    • Great for understanding the process or fixing issues
    • Shows you timing and performance info

LangGraph RAG Architecture

The workflow uses LangGraph to manage the entire RAG pipeline:

  • State Management: All data flows through a defined GraphState that tracks questions, documents, and evaluation results
  • Conditional Routing: The system decides whether to use document search or online search based on what it finds
  • Error Recovery: If document search fails, the workflow automatically tries online search
  • Multi-Step Validation: Each step includes quality checks before moving to the next stage
  • Extensible Design: Easy to add new evaluation steps or modify the workflow logic

This LangGraph RAG implementation provides a good foundation for building more complex document processing systems.


What You'll Need

Before you start, make sure you have these things:

  • Python 3.11 or newer - Get it here
  • Git - Download here
  • OpenAI API Key - You need this to make the LLM work
  • Tavily API Key - This is for online search (optional, but really useful)
  • LangSmith API Key - Only if you want to see the workflow details (optional)

Getting Started

Step 1: Download the Code

git clone https://github.com/chitralputhran/Advanced-RAG-LangGraph.git
cd Advanced-RAG-LangGraph

Step 2: Set Up a Virtual Environment

This keeps everything organized and will not interfere with your other Python projects.

If you are on Mac or Linux:

python3 -m venv rag_env
source rag_env/bin/activate

If you are on Windows:

python -m venv rag_env
rag_env\Scripts\activate

Step 3: Install Everything You Need

This installs all the required packages:

pip install --upgrade pip
pip install -r requirements.txt

Step 4: Add Your API Keys

Create a file called .env in the main folder and add your API keys:

OPENAI_API_KEY=your_openai_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here
LANGCHAIN_API_KEY=your_langsmith_api_key_here
LANGCHAIN_TRACING_V2=true
LANGCHAIN_PROJECT=Advanced-RAG-LangGraph

Step 5: Start the App

streamlit run app.py

Your browser will open automatically at http://localhost:8501


Quick Start Guide

  1. Upload a File: Click the file uploader and pick a PDF, Word doc, Excel file, or text file
  2. Wait a Moment: The app will read your document and get it ready for questions
  3. Ask Away: Type your question and hit "Ask"
  4. Get Your Answer: You will see the answer

How to Use It

The Basics

  1. Start the App:

    streamlit run app.py

    Then go to http://localhost:8501 in your browser

  2. Upload Your Files:

    • Look for the file uploader on the page
    • Pick your files (PDFs, Word docs, Excel files, or text files)
    • Wait for the little progress bar to finish
  3. Ask Questions:

    • Type your question in the text box
    • Click "Ask" or just press Enter
    • Check your answer

When Things Go Wrong

Common Issues

Application Will Not Start

  • Ensure you have installed everything: pip install -r requirements.txt
  • Check your Python version: python --version (must be 3.11 or newer)
  • Make sure your virtual environment is active

API Key Issues

  • Double-check your .env file contains the correct API keys
  • Ensure your OpenAI account has sufficient credits
  • Verify that your Tavily API key works (if you are using online search)

Cannot Upload Files

  • Ensure your file type is supported (PDF, Word, Excel, or text)
  • Large files take longer - please be patient!

Running Slowly

  • Large documents can be slow - this is normal

What's Inside

Here is how the code is organized:

Advanced-RAG-LangGraph/
β”œβ”€β”€ app.py                # The main app file
β”œβ”€β”€ config.py             # All the settings
β”œβ”€β”€ utils.py              # Helper functions
β”œβ”€β”€ ui_components.py      # What you see on screen
β”œβ”€β”€ document_processor.py # How documents get processed
β”œβ”€β”€ rag_workflow.py       # RAG workflow
β”œβ”€β”€ document_loader.py    # Reads different file types
β”œβ”€β”€ state.py              # Keeps track
β”œβ”€β”€ requirements.txt      # List of needed packages
β”œβ”€β”€ chains/               # LangGraph pieces
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ document_relevance.py
β”‚   β”œβ”€β”€ evaluate.py
β”‚   β”œβ”€β”€ generate_answer.py
β”‚   └── question_relevance.py
└── screenshots/*.png     # Pictures for this README

---

## Settings

### Your API Keys

Make a file called `.env` in the main folder with your keys:

```env
# You definitely need this one
OPENAI_API_KEY=your_openai_api_key_here

# This one is really useful too
TAVILY_API_KEY=your_tavily_api_key_here

# Only if you want to see what is happening behind the scenes
LANGCHAIN_API_KEY=your_langsmith_api_key_here
LANGCHAIN_TRACING_V2=true
LANGCHAIN_PROJECT=Advanced-RAG-LangGraph

License

This project uses the MIT License - check out the LICENSE file for the details.

About

Advanced-RAG-LangGraph is a Streamlit-based web application that implements an advanced Retrieval-Augmented Generation (RAG) pipeline using LangGraph, ChromaDB, and Tavily to enable interactive document-based Q&A with enhanced retrieval and error-handling capabilities.

Topics

Resources

License

Stars

Watchers

Forks

Languages