Stochastic Inc. Assignment – Document Q&A RAGBot

This repository contains my submission for the Stochastic Inc. Competency Assessment (Junior AI Engineer). I built a Document Q&A agent based on Retrieval-Augmented Generation (RAG) inside a Django web application. The agent calls external LLM APIs (OpenAI and Gemini) to generate its answers.

🎯 Project Overview

RAGBot is a document analysis system that pairs a Retrieval-Augmented Generation pipeline with a web interface. Users upload PDF documents and query them in natural language to get answers, summaries, and extracted metrics.

Key Capabilities

  • 📂 Multi-PDF Document Processing - Upload and manage multiple PDF documents simultaneously
  • 🔍 Contextual Q&A - Ask specific questions about document content with accurate, context-aware responses
  • 📝 Intelligent Summarization - Generate concise summaries of sections, methodologies, or entire documents
  • 📊 Data Extraction - Extract structured information like accuracy scores, F1-scores, and other metrics
  • 🔑 Secure Authentication - User management and authentication via Django's built-in system
  • ☁️ Hybrid Cloud Deployment - Scalable architecture split across multiple platforms

🚀 Features

Core Functionality

  • Document Ingestion: Support for multiple PDF formats with robust parsing
  • RAG Pipeline: Advanced retrieval system for contextually relevant responses
  • Natural Language Interface: Intuitive chat-like interface for document queries
  • Content Analysis: Deep understanding of document structure and content
  • Result Extraction: Automated extraction of key metrics and findings
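The retrieval step of the RAG pipeline can be sketched as follows. This is a minimal illustration, not the code in `rag_pipeline.py`: the technology stack names Sentence Transformers and FAISS, whereas this sketch swaps in a toy bag-of-words embedding and a brute-force cosine search so that it runs with the standard library alone. The function names `embed`, `cosine`, and `retrieve` are assumptions made for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, top_k=2):
    """Return up to top_k chunks most similar to the question, best first."""
    q_vec = embed(question)
    scored = [(cosine(q_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop chunks that share no words with the question at all.
    return [chunk for score, chunk in scored[:top_k] if score > 0]


chunks = [
    "The model reaches an accuracy of 91.2 percent on the test set.",
    "We thank the reviewers for their helpful comments.",
    "The methodology uses a transformer encoder with retrieval.",
]
top = retrieve("What accuracy does the model reach?", chunks, top_k=1)
print(top)
```

In a real deployment the word-count vectors would be replaced by dense embeddings and the linear scan by a vector index, but the ranking idea stays the same: score every chunk against the question and pass the best ones to the LLM.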

Example Use Cases

🔍 Direct Q&A:
"What is the conclusion of Paper X?"
"Who are the authors of this research?"

📝 Summarization:
"Summarize the methodology of Paper C."
"Give me an overview of the experimental setup."

📊 Data Extraction:
"What accuracy scores are reported in this paper?"
"Extract all F1-scores from the results section."
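To make the data extraction use case concrete, here is a small sketch that pulls accuracy and F1 values out of plain text. It is only an illustration of the kind of structured output involved: the project answers such queries through the LLM, while this example uses a regular expression as a stand-in. The pattern and the function name `extract_metrics` are hypothetical.

```python
import re

# Matches a metric name followed (within a few characters) by a number,
# e.g. "accuracy of 91.2%" or "F1-score: 0.87". Illustrative pattern only.
METRIC_PATTERN = re.compile(
    r"(?P<name>accuracy|f1[- ]?score)\D{0,20}?(?P<value>\d+(?:\.\d+)?)\s*(?P<pct>%?)",
    re.IGNORECASE,
)


def extract_metrics(text):
    """Return a list of (metric_name, value_string) pairs found in text."""
    results = []
    for match in METRIC_PATTERN.finditer(text):
        name = match.group("name").lower().replace(" ", "-")
        results.append((name, match.group("value") + match.group("pct")))
    return results


sample = "We report an accuracy of 91.2% and an F1-score: 0.87 on the test set."
metrics = extract_metrics(sample)
print(metrics)
```

A regex like this breaks down on tables and unusual phrasing, which is one reason to route extraction through retrieved context and an LLM instead.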

⚙️ Installation & Setup

Prerequisites

  • Python 3.8 or higher
  • pip package manager
  • Git

1️⃣ Clone the Repository

git clone https://github.com/VivanRajath/Stochastic-assignment.git
cd Stochastic-assignment

2️⃣ Create Virtual Environment

python -m venv env

3️⃣ Activate the Environment

Windows (PowerShell):

.\env\Scripts\Activate

Linux/MacOS:

source env/bin/activate

4️⃣ Install Dependencies

pip install -r requirements.txt

5️⃣ Environment Configuration

Create a .env file in the project root and add your API keys:

OPENAI_API_KEY=your_openai_key_here
GEMINI_API_KEY=your_gemini_key_here
DJANGO_SECRET_KEY=your_django_secret_key
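For reference, the sketch below shows one way such a file can be read into the process environment using only the standard library. How this project actually loads `.env` is not shown in this README (the `python-dotenv` package is a common choice), so treat `parse_env` and `load_env` as hypothetical helpers.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path=".env"):
    """Copy values from a .env file into os.environ without overriding existing ones."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


example = parse_env(
    "# keys\nOPENAI_API_KEY=your_openai_key_here\nDJANGO_SECRET_KEY='abc=123'\n"
)
print(example)
```

Keep the real `.env` file out of version control, since it holds the API keys and the Django secret key.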

6️⃣ Database Setup

cd docbot
python manage.py makemigrations
python manage.py migrate

7️⃣ Create Superuser (Optional)

python manage.py createsuperuser

8️⃣ Run the Application

python manage.py runserver

The application will be available at: http://127.0.0.1:8000/

🌐 Live Demo

Due to computational and storage limitations, the deployment is strategically split across platforms:

  • 🤗 Hugging Face Spaces: Hosts the RAG pipeline and ML inference
  • 🚀 Render: Hosts the Django web application with user authentication

This hybrid architecture ensures optimal performance while working within platform constraints.
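With this split, the Django application has to forward each question to the inference service over HTTP. The sketch below shows what that call could look like. Everything specific in it is an assumption: the URL, the `/ask` route, and the JSON field names are placeholders, because the README does not document the actual interface of the Space.

```python
import json
import urllib.request

# Hypothetical endpoint: the real Space URL and route are not given in this README.
INFERENCE_URL = "https://example.invalid/ask"


def build_inference_request(question, document_ids, url=INFERENCE_URL):
    """Build (but do not send) a JSON POST request for the inference service."""
    payload = json.dumps({"question": question, "document_ids": document_ids})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, document_ids, timeout=60):
    """Send the request; the generous timeout allows for a slow hosted service."""
    request = build_inference_request(question, document_ids)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_inference_request("What is the conclusion of Paper X?", [1, 2])
print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the payload format testable without any network access.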

Visit Live Demo → The demo runs on Render's free tier, so it may take some time to load.

🛠️ Technology Stack

Backend

  • Django 4.0+: Web framework and user authentication
  • Python 3.8+: Core programming language
  • SQLite/PostgreSQL: Database management

AI/ML Components

  • OpenAI GPT: Large Language Model for text generation
  • Gemini API: Alternative LLM for enhanced performance
  • Hugging Face Transformers: Model inference and deployment
  • LangChain: RAG pipeline orchestration
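Whichever LLM is used, the generation half of RAG comes down to placing the retrieved passages in the prompt next to the question. The sketch below shows that assembly step in plain Python. The prompt wording, the character budget, and the name `build_prompt` are assumptions for illustration; the project may build its prompts through LangChain instead.

```python
def build_prompt(question, chunks, max_chars=2000):
    """Assemble a grounded prompt from retrieved chunks, within a character budget."""
    context_parts = []
    used = 0
    for index, chunk in enumerate(chunks, start=1):
        entry = f"[{index}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break  # stop adding context once the budget is exhausted
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "What accuracy is reported?",
    ["The model reaches 91.2% accuracy.", "Training took 3 hours."],
)
print(prompt)
```

Numbering the passages makes it possible to ask the model to cite which passage supports its answer.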

Document Processing

  • PyPDF2: PDF parsing and text extraction
  • Sentence Transformers: Text embeddings for retrieval
  • FAISS: Vector index library for similarity search

Deployment

  • Render: Web application hosting
  • Hugging Face Spaces: ML model hosting
  • Docker: Containerization (optional)

📁 Project Structure

Stochastic-assignment/
├── docbot/                 # Django project root
│   ├── manage.py
│   ├── docbot/            # Main Django app
│   │   ├── settings.py
│   │   ├── urls.py
│   │   └── wsgi.py
│   ├── ragbot/            # RAG functionality
│   │   ├── models.py
│   │   ├── views.py
│   │   ├── rag_pipeline.py
│   │   └── utils.py
│   └── templates/         # HTML templates
├── requirements.txt       # Python dependencies
├── .env.example          # Environment variables template
└── README.md            # This file

🎥 Demo Presentation

A comprehensive video demonstration (.mp4) has been submitted as part of the assessment, showcasing:

  • Live document upload and processing
  • Real-time Q&A interactions
  • Summarization capabilities
  • Data extraction features
  • User authentication flow

✅ Assignment Objectives Fulfilled

| Requirement | Status | Implementation |
| --- | --- | --- |
| AI Document Q&A Agent | ✔️ | RAG-based system with multi-PDF support |
| Structured Information Extraction | ✔️ | Automated parsing of sections, results, metrics |
| Enterprise Features | ✔️ | Context handling, optimization, error management |
| User Authentication | ✔️ | Django's built-in authentication system |
| Live Deployment | ✔️ | Hybrid deployment on Render + Hugging Face |

🔧 Configuration Options

Environment Variables

# Required
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
DJANGO_SECRET_KEY=...

# Optional
DEBUG=True
ALLOWED_HOSTS=localhost,127.0.0.1
DATABASE_URL=sqlite:///db.sqlite3

Add your Gemini API key to the .env file.
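Environment variables always arrive as strings, so a value such as `DEBUG=False` is truthy if it is used directly. The sketch below shows one way a settings module might convert the optional variables above into Python types. The helper names `env_bool` and `env_list` are hypothetical, and the two assignments to `os.environ` exist only so the example runs on its own.

```python
import os


def env_bool(name, default=False):
    """Interpret common truthy strings ("1", "true", "yes", "on") as True."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name, default=""):
    """Split a comma-separated variable into a list of non-empty items."""
    raw = os.environ.get(name, default)
    return [item.strip() for item in raw.split(",") if item.strip()]


# Set here only so the example is self-contained; normally these come from .env.
os.environ["DEBUG"] = "True"
os.environ["ALLOWED_HOSTS"] = "localhost,127.0.0.1"

DEBUG = env_bool("DEBUG")
ALLOWED_HOSTS = env_list("ALLOWED_HOSTS")
print(DEBUG, ALLOWED_HOSTS)
```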

Customization

  • Model Selection: Switch between OpenAI and Gemini models
  • Chunk Size: Adjust document chunking for optimal retrieval
  • Temperature: Control response creativity vs accuracy
  • Max Tokens: Set response length limits
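The sketch below shows how these options might be grouped and how the chunk size setting affects document splitting. It is a minimal illustration under stated assumptions: the `GenerationSettings` class, its default values, and the character-based `chunk_text` function are hypothetical and are not taken from the project code, which may chunk by tokens or sentences instead.

```python
from dataclasses import dataclass


@dataclass
class GenerationSettings:
    """Hypothetical container for the customization options listed above."""
    provider: str = "openai"   # or "gemini"
    chunk_size: int = 500      # characters per chunk
    chunk_overlap: int = 50    # characters shared by neighbouring chunks
    temperature: float = 0.2   # lower values give more deterministic output
    max_tokens: int = 512      # upper bound on response length


def chunk_text(text, chunk_size, overlap):
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


settings = GenerationSettings(chunk_size=10, chunk_overlap=4)
pieces = chunk_text(
    "abcdefghijklmnopqrstuvwxyz", settings.chunk_size, settings.chunk_overlap
)
print(pieces)
```

Smaller chunks make retrieval more precise but give the model less surrounding context per passage; the overlap reduces the chance that a sentence relevant to the question is cut in half at a chunk boundary.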

🚀 Future Enhancements

  • Support for additional document formats (DOCX, TXT, HTML)
  • Multi-language document processing
  • Advanced visualization of extracted data
  • API endpoints for programmatic access
  • Batch processing capabilities
  • Integration with cloud storage services

🀝 Contributing

This project was developed as part of the Stochastic Inc. Competency Assessment. While primarily for evaluation purposes, suggestions and improvements are welcome.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgements

  • Stochastic Inc. for providing this challenging and engaging assessment opportunity
  • OpenAI and Google for their powerful language model APIs
  • Hugging Face for their excellent model hosting platform
  • Render for reliable web application deployment
  • The open-source community for the amazing libraries and tools used in this project

📞 Contact

Vivan Rajath


Built with ❤️ for Stochastic Inc. Competency Assessment
