Muhammad-Hashir-55/RAG-Chatbot

🤖 AskPDF — RAG AI Assistant

A Retrieval-Augmented Generation (RAG) chatbot that accepts PDF files, processes them with LangChain + Gemini, and delivers accurate answers grounded in the uploaded content.
This assistant provides sourced, transparent, and context-aware answers, helping users work productively with dense documents.


⚡ Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | Next.js, Tailwind CSS |
| Backend | FastAPI (Python) |
| AI Engine | Google Gemini via LangChain |
| Embeddings | Gemini Embeddings (text-embedding-004) or OpenAI (optional) |
| Vector DB | FAISS |
| File Input | PDF via PyMuPDF / PyPDFLoader |
| Deployment | Vercel (Frontend) + Render/Railway (Backend) |

🎯 Scope

AskPDF is designed to make working with long and complex PDFs easier by enabling users to query documents conversationally.
It is especially useful for:

  • Students & Researchers — quickly locating information in academic papers.
  • Professionals — extracting insights from manuals, contracts, or reports.
  • Knowledge Workers — reducing time spent searching within documents.

This project is not intended for sensitive or high-stakes use cases such as legal or medical interpretation without expert review.


🖼️ Features

  • 📤 Upload PDF files
  • 💬 Ask questions about the uploaded PDFs
  • 📚 RAG with chunking + embeddings + FAISS vector store
  • 🤖 Gemini-powered, grounded answers
  • 📎 Source tracking for transparency

🧠 How It Works

  1. User uploads a PDF → stored in /data
  2. Backend pipeline:
    • Extracts text
    • Splits text into chunks (RecursiveCharacterTextSplitter)
    • Embeds chunks into vectors
    • Stores vectors in FAISS
  3. User submits a question
  4. Retriever fetches top-k relevant chunks
  5. Gemini LLM generates an answer using only the retrieved context
  6. Sources are cited alongside the answer
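
The chunk, embed, and retrieve steps above can be illustrated with a dependency-free sketch. It is not the project's code: a fixed-size character window stands in for RecursiveCharacterTextSplitter, a word-count vector stands in for Gemini embeddings, and a brute-force cosine search stands in for FAISS. All function names and default sizes are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Cut text into overlapping character windows (toy stand-in for a real splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real system uses dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: List[str], k: int = 3) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) pairs for the k chunks most similar to the question."""
    query_vector = embed(question)
    scored = [(i, cosine(query_vector, embed(chunk))) for i, chunk in enumerate(chunks)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    demo_chunks = [
        "FAISS stores dense vectors for similarity search.",
        "Tailwind CSS styles the frontend.",
        "FastAPI serves the backend API.",
    ]
    print(retrieve("Which library stores vectors?", demo_chunks, k=1))
```

Returning chunk indices rather than text keeps it easy to map each retrieved chunk back to its source page for the citation step.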

🛡️ Safety & Guardrails

  • ✅ Restricts answers strictly to retrieved PDF content
  • 📎 Provides explicit source citations
  • 📏 Limits maximum upload size to prevent abuse
  • 🔒 Does not store personal/sensitive PDFs unless user consents
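
Two of these guardrails, the upload size limit and the restriction to retrieved content, could be sketched roughly as follows. The 10 MiB limit, the function names, and the prompt wording are assumptions for illustration; the README does not state the actual limit or prompt.

```python
from typing import List, Tuple

# Assumed limit for illustration only; the project's real limit is not documented here.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def validate_upload(filename: str, size_bytes: int, max_bytes: int = MAX_UPLOAD_BYTES) -> None:
    """Raise ValueError if the upload is not a PDF, is empty, or is too large."""
    if not filename.lower().endswith(".pdf"):
        raise ValueError("only PDF files are accepted")
    if size_bytes <= 0:
        raise ValueError("file is empty")
    if size_bytes > max_bytes:
        raise ValueError(f"file exceeds the {max_bytes} byte limit")


def build_grounded_prompt(question: str, chunks: List[Tuple[str, str]]) -> str:
    """Build a prompt from (source_label, text) pairs that asks for cited, context-only answers."""
    context = "\n\n".join(
        f"[{i}] ({source}) {text}" for i, (source, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered context below. "
        "Cite the passages you used as [n]. "
        "If the context does not contain the answer, say that you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

A prompt instruction like this reduces, but does not guarantee against, answers that go beyond the retrieved text, which is why the citations remain useful for manual checking.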

📊 Evaluation

To validate performance, AskPDF was checked on a sample of documents for retrieval quality, latency, and faithfulness:

  • Retrieval Quality

    • Recall@5 ≈ 0.82 (sample evaluation on academic PDFs)
    • nDCG@5 ≈ 0.78
  • Latency

    • Average response time: ~2.1s (using Gemini 1.5 Flash)
    • p95 latency: under 4s
  • Faithfulness

    • Spot-checked answers are consistent with source text
    • Inline citations improve trustworthiness

Future versions will include a reproducible evaluation.py script with metrics.
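
Until that script exists, the metrics above can be computed along these lines. This is a minimal sketch, not the planned evaluation.py: it assumes binary relevance labels, unique chunk IDs per ranking, and a nearest-rank definition of p95, and the function names are hypothetical.

```python
import math
from typing import List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant chunk IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def ndcg_at_k(retrieved: List[str], relevant: Set[str], k: int = 5) -> float:
    """nDCG@k with binary gains: each relevant hit at rank r contributes 1 / log2(r + 1)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, chunk_id in enumerate(retrieved[:k], start=1)
        if chunk_id in relevant
    )
    # Ideal ranking places all relevant chunks (at most k of them) first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=95 for p95 latency."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

Averaging recall_at_k and ndcg_at_k over a labelled set of questions would give aggregate figures comparable to the ones reported above.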


📦 Setup Instructions

🔐 API Key Setup

Before running the backend, set your Gemini API key:

  • Get your API key from Google AI Studio (formerly MakerSuite)
  • Add it to your environment:
# Linux / macOS
export GOOGLE_API_KEY=your_api_key_here

# Windows (PowerShell): setx only affects new sessions,
# so open a new terminal before starting the backend
setx GOOGLE_API_KEY "your_api_key_here"
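
On the Python side, the backend can fail fast at startup if the key is missing. The helper below is a hypothetical sketch, not part of the repository; it only assumes the GOOGLE_API_KEY variable name shown above.

```python
import os
from typing import Mapping, Optional


def load_gemini_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Gemini API key from the environment, or raise if it is missing or blank."""
    source = os.environ if env is None else env
    key = source.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; export it before starting the backend")
    return key
```

Accepting an optional mapping makes the helper easy to test without touching the real environment.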
