Stochastic Inc. Assignment – Document Q&A RAGBot

This repository contains my submission for the Stochastic Inc. Competency Assessment (Junior AI Engineer). I built a Retrieval-Augmented Generation (RAG) document Q&A agent inside a Django web application, backed by the OpenAI and Gemini LLM APIs.

🎯 Project Overview

RAGBot combines Retrieval-Augmented Generation with a web interface: users upload PDF documents, ask questions about them in natural language, and get answers generated from the relevant passages of those documents.

Key Capabilities

📂 Multi-PDF Document Processing - Upload and manage multiple PDF documents simultaneously
🔍 Contextual Q&A - Ask specific questions about document content with accurate, context-aware responses
📝 Intelligent Summarization - Generate concise summaries of sections, methodologies, or entire documents
📊 Data Extraction - Extract structured information like accuracy scores, F1-scores, and other metrics
🔑 Secure Authentication - User management and authentication via Django's built-in system
☁️ Hybrid Cloud Deployment - The Django app runs on Render and the RAG pipeline on Hugging Face Spaces
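In this project, metric questions are answered by the LLM from retrieved text. As a complementary, deterministic check (not something this README says the project does), a regular expression can pull metric mentions out of a retrieved chunk so the LLM's answer can be cross-checked. The pattern and `extract_metrics` are hypothetical and only cover a few common phrasings.

```python
from __future__ import annotations

import re

# Hypothetical pattern: matches phrasings like "accuracy of 92.4%", "F1-score: 0.87", "F1 = 88.1".
# Crude by design; real papers phrase metrics in many more ways.
METRIC_RE = re.compile(
    r"\b(?P<metric>accuracy|precision|recall|f1(?:[- ]score)?)\b"
    r"[^0-9]{0,20}?"                      # short non-numeric gap, e.g. " of " or ": "
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<pct>%)?",
    re.IGNORECASE,
)


def extract_metrics(text: str) -> list[tuple[str, float]]:
    """Return (metric_name, value) pairs; percentages are scaled to the 0-1 range."""
    results = []
    for m in METRIC_RE.finditer(text):
        name = m.group("metric").lower().replace(" ", "-")  # "F1 score" -> "f1-score"
        value = float(m.group("value"))
        if m.group("pct"):
            value /= 100.0
        results.append((name, value))
    return results
```

For example, `extract_metrics("We report an accuracy of 92.4% and an F1-score: 0.87.")` yields an `accuracy` entry of about 0.924 and an `f1-score` entry of 0.87.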

🚀 Features

Core Functionality

Document Ingestion: Upload multiple PDFs; their text is extracted and split into chunks for retrieval
RAG Pipeline: Retrieves the chunks most relevant to each question and passes them to the LLM as context
Natural Language Interface: Chat-style interface for document queries
Content Analysis: Answers questions about document structure and content (sections, methodology, results)
Result Extraction: Pulls key metrics and findings out of the documents
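The retrieve-then-generate loop behind the RAG pipeline can be sketched in plain Python. This is a minimal, dependency-free illustration, not the code in rag_pipeline.py: a bag-of-words vector and brute-force cosine similarity stand in for Sentence Transformers embeddings and a FAISS index, and `embed`, `retrieve` and `build_prompt` are hypothetical names.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Tiny stopword list so filler words like "the" do not dominate the toy scores.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "what", "who"}


def embed(text: str) -> Counter:
    """Term-frequency vector; a stand-in for a Sentence Transformers embedding."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Brute-force top-k search; a vector index such as FAISS replaces this scan at scale."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Number the retrieved chunks and tell the LLM to answer only from them."""
    joined = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    chunks = [
        "In conclusion, the proposed model outperforms the baseline.",
        "The dataset contains 10,000 labelled images.",
        "Authors: A. Smith and B. Jones.",
    ]
    question = "What is the conclusion?"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

In a full pipeline the string from `build_prompt` would be sent to OpenAI or Gemini. Swapping in real embeddings and a FAISS index changes retrieval quality and speed, not the shape of the loop.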

Example Use Cases

🔍 Direct Q&A:
"What is the conclusion of Paper X?"
"Who are the authors of this research?"

📝 Summarization:
"Summarize the methodology of Paper C."
"Give me an overview of the experimental setup."

📊 Data Extraction:
"What accuracy scores are reported in this paper?"
"Extract all F1-scores from the results section."

⚙️ Installation & Setup

Prerequisites

Python 3.8 or higher
pip package manager
Git

1️⃣ Clone the Repository

git clone https://github.com/VivanRajath/Stochastic-assignment.git
cd Stochastic-assignment

2️⃣ Create Virtual Environment

python -m venv env

3️⃣ Activate the Environment

Windows (PowerShell):

.\env\Scripts\Activate

Linux/MacOS:

source env/bin/activate

4️⃣ Install Dependencies

pip install -r requirements.txt

5️⃣ Environment Configuration

Create a .env file in the project root and add your API keys:

OPENAI_API_KEY=your_openai_key_here
GEMINI_API_KEY=your_gemini_key_here
DJANGO_SECRET_KEY=your_django_secret_key
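A missing key otherwise tends to surface later as an opaque API error. A small startup check can fail fast instead; the sketch below assumes it sits near the top of settings.py, and the helper names are hypothetical. It calls python-dotenv's `load_dotenv()` when that package is installed and otherwise relies on variables already exported in the shell.

```python
from __future__ import annotations

import os
from typing import Mapping

REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY", "DJANGO_SECRET_KEY")

try:
    # python-dotenv copies KEY=value lines from a .env file into os.environ
    # (by default, variables that are already set are not overridden).
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    pass  # fall back to whatever is already exported in the environment


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Names of required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]


def check_env(env: Mapping[str, str] = os.environ) -> None:
    """Raise at startup instead of failing on the first LLM call."""
    missing = missing_keys(env)
    if missing:
        raise RuntimeError("Missing required settings in .env: " + ", ".join(missing))
```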

6️⃣ Database Setup

cd docbot
python manage.py makemigrations
python manage.py migrate

7️⃣ Create Superuser (Optional)

python manage.py createsuperuser

8️⃣ Run the Application

python manage.py runserver

The application will be available at: http://127.0.0.1:8000/

🌐 Live Demo

Due to computational and storage limitations, the deployment is split across two platforms:

🤗 Hugging Face Spaces: Hosts the RAG pipeline and ML inference
🚀 Render: Hosts the Django web application with user authentication

This split keeps each service within its platform's resource limits.

Visit Live Demo → The app runs on Render's free tier, so the first request after a period of inactivity may take a while to load.
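With this split, the Django app has to call the Hugging Face Space over HTTP. The sketch below shows one way to do that with only the standard library. The `RAG_API_URL` variable, the endpoint path and the JSON shape (`{"question": ...}` in, `{"answer": ...}` out) are assumptions for illustration, not the Space's documented API.

```python
from __future__ import annotations

import json
import os
import urllib.request

# Hypothetical endpoint; the real Space URL and request format may differ.
RAG_API_URL = os.environ.get("RAG_API_URL", "https://example.hf.space/ask")


def build_request(question: str, url: str = RAG_API_URL) -> urllib.request.Request:
    """JSON POST request carrying the user's question (assumed contract)."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def ask_rag_service(question: str, url: str = RAG_API_URL, timeout: float = 60.0) -> str:
    """Send the question to the remote RAG service and return its answer text."""
    # Generous timeout: a sleeping free-tier instance can take a while to wake up.
    with urllib.request.urlopen(build_request(question, url), timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return payload["answer"]
```

A Django view would call `ask_rag_service(question)` and render the returned text; keeping the HTTP call in one helper makes it easy to swap the endpoint or add retries later.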

🛠️ Technology Stack

Backend

Django 4.0+: Web framework and user authentication
Python 3.8+: Core programming language
SQLite/PostgreSQL: Database management

AI/ML Components

OpenAI GPT: Large Language Model for text generation
Gemini API: Alternative LLM provider for text generation
Hugging Face Transformers: Model inference and deployment
LangChain: RAG pipeline orchestration

Document Processing

PyPDF2: PDF parsing and text extraction
Sentence Transformers: Text embeddings for retrieval
FAISS: Vector database for similarity search

Deployment

Render: Web application hosting
Hugging Face Spaces: ML model hosting
Docker: Containerization (optional)

📁 Project Structure

Stochastic-assignment/
├── docbot/                 # Django project root
│   ├── manage.py
│   ├── docbot/            # Main Django app
│   │   ├── settings.py
│   │   ├── urls.py
│   │   └── wsgi.py
│   ├── ragbot/            # RAG functionality
│   │   ├── models.py
│   │   ├── views.py
│   │   ├── rag_pipeline.py
│   │   └── utils.py
│   └── templates/         # HTML templates
├── requirements.txt       # Python dependencies
├── .env.example          # Environment variables template
└── README.md            # This file

🎥 Demo Presentation

A comprehensive video demonstration (.mp4) has been submitted as part of the assessment, showcasing:

Live document upload and processing
Real-time Q&A interactions
Summarization capabilities
Data extraction features
User authentication flow

✅ Assignment Objectives Fulfilled

Requirement	Status	Implementation
AI Document Q&A Agent	✔️	RAG-based system with multi-PDF support
Structured Information Extraction	✔️	Automated parsing of sections, results, metrics
Enterprise Features	✔️	Context handling, optimization, error management
User Authentication	✔️	Django's built-in authentication system
Live Deployment	✔️	Hybrid deployment on Render + Hugging Face

🔧 Configuration Options

Environment Variables

# Required
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
DJANGO_SECRET_KEY=...

# Optional
DEBUG=True
ALLOWED_HOSTS=localhost,127.0.0.1
DATABASE_URL=sqlite:///db.sqlite3

Note: the Gemini API key (GEMINI_API_KEY) must be set in .env.
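The optional variables arrive as strings, so settings.py has to convert them. Below is a minimal sketch with hypothetical helper names (`env_bool`, `env_list`, `sqlite_path`). The sqlite-only URL handling is a deliberate simplification; a library such as dj-database-url also covers PostgreSQL URLs.

```python
from __future__ import annotations

import os
from typing import Mapping


def env_bool(name: str, default: bool = False, env: Mapping[str, str] = os.environ) -> bool:
    """Interpret "1", "true", "yes", "on" (any case) as True; unset -> default."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name: str, default: str = "", env: Mapping[str, str] = os.environ) -> list[str]:
    """Split a comma-separated value such as ALLOWED_HOSTS, dropping blanks."""
    raw = env.get(name, default)
    return [item.strip() for item in raw.split(",") if item.strip()]


def sqlite_path(url: str) -> str:
    """'sqlite:///db.sqlite3' -> 'db.sqlite3'; 'sqlite:////tmp/x.db' -> '/tmp/x.db'."""
    prefix = "sqlite:///"
    if not url.startswith(prefix):
        raise ValueError(f"only sqlite URLs are handled in this sketch: {url!r}")
    return url[len(prefix):]


# Possible use in settings.py:
# DEBUG = env_bool("DEBUG")
# ALLOWED_HOSTS = env_list("ALLOWED_HOSTS", "localhost,127.0.0.1")
# DATABASES = {"default": {
#     "ENGINE": "django.db.backends.sqlite3",
#     "NAME": sqlite_path(os.environ.get("DATABASE_URL", "sqlite:///db.sqlite3")),
# }}
```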

Customization

Model Selection: Switch between OpenAI and Gemini models
Chunk Size: Adjust document chunking for optimal retrieval
Temperature: Control response creativity vs accuracy
Max Tokens: Set response length limits
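Chunk size and overlap trade retrieval precision against context: smaller chunks match questions more precisely, and overlap repeats the tail of each chunk at the start of the next, so short passages cut at a boundary still appear whole in one of them. The sketch groups the options above into a hypothetical `RagConfig` (field names and defaults are illustrative, not the project's actual settings) and implements fixed-size character chunking with overlap. LangChain's text splitters, such as `RecursiveCharacterTextSplitter`, expose the same `chunk_size` and `chunk_overlap` knobs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RagConfig:
    """Hypothetical bundle of the customization options listed above."""
    provider: str = "gemini"   # "openai" or "gemini"
    chunk_size: int = 1000     # characters per chunk
    chunk_overlap: int = 200   # characters shared by consecutive chunks
    temperature: float = 0.2   # lower = more deterministic answers
    max_tokens: int = 512      # cap on generated answer length

    def __post_init__(self) -> None:
        if self.provider not in {"openai", "gemini"}:
            raise ValueError(f"unknown provider: {self.provider!r}")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")


def chunk_text(text: str, cfg: RagConfig) -> list[str]:
    """Fixed-size windows; each starts chunk_size - chunk_overlap characters after the last."""
    step = cfg.chunk_size - cfg.chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + cfg.chunk_size])
        if start + cfg.chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With `chunk_size=4` and `chunk_overlap=1`, the string `"abcdefghij"` splits into `["abcd", "defg", "ghij"]`.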

🚀 Future Enhancements

Support for additional document formats (DOCX, TXT, HTML)
Multi-language document processing
Advanced visualization of extracted data
API endpoints for programmatic access
Batch processing capabilities
Integration with cloud storage services

🤝 Contributing

This project was developed as part of the Stochastic Inc. Competency Assessment. While primarily for evaluation purposes, suggestions and improvements are welcome.

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgements

Stochastic Inc. for providing this challenging and engaging assessment opportunity
OpenAI and Google for their powerful language model APIs
Hugging Face for their excellent model hosting platform
Render for reliable web application deployment
The open-source community for the amazing libraries and tools used in this project

📞 Contact

Vivan Rajath

GitHub: @VivanRajath
Email: vivanrajath999@gmail.com
LinkedIn: https://www.linkedin.com/in/vivan-rajath-178a6a348/

Built with ❤️ for Stochastic Inc. Competency Assessment
