This repository contains my submission for the Stochastic Inc. Competency Assessment (Junior AI Engineer). I built an AI Document Q&A Agent based on Retrieval-Augmented Generation (RAG) inside a Django web application and integrated it with LLM APIs, with enterprise use in mind.
The RAGBot is an intelligent document analysis system that combines the power of Retrieval-Augmented Generation with a user-friendly web interface. It enables users to upload PDF documents and interact with them through natural language queries, making document analysis and information extraction seamless and efficient.
- Multi-PDF Document Processing - Upload and manage multiple PDF documents simultaneously
- Contextual Q&A - Ask specific questions about document content with accurate, context-aware responses
- Intelligent Summarization - Generate concise summaries of sections, methodologies, or entire documents
- Data Extraction - Extract structured information like accuracy scores, F1-scores, and other metrics
- Secure Authentication - User management and authentication via Django's built-in system
- Hybrid Cloud Deployment - Scalable architecture split across multiple platforms
- Document Ingestion: Support for multiple PDF formats with robust parsing
- RAG Pipeline: Advanced retrieval system for contextually relevant responses
- Natural Language Interface: Intuitive chat-like interface for document queries
- Content Analysis: Deep understanding of document structure and content
- Result Extraction: Automated extraction of key metrics and findings
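The retrieval step at the heart of a RAG pipeline can be illustrated with a minimal sketch. This is not the project's actual implementation: the README lists Sentence Transformers and FAISS for retrieval, while the sketch below swaps in plain word-count vectors and cosine similarity so that it runs with the standard library alone. The function names `tokenize`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, top_k=2):
    """Return up to top_k chunks that are most similar to the question."""
    query_vec = Counter(tokenize(question))
    scored = [(cosine_similarity(query_vec, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Chunks that share no words with the question are dropped.
    return [chunk for score, chunk in scored[:top_k] if score > 0]


def build_prompt(question, context_chunks):
    """Assemble the retrieved context and the question into one LLM prompt."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


chunks = [
    "The model reaches an accuracy of 91.2 percent on the test set.",
    "We thank the reviewers for their helpful comments.",
    "Training used the Adam optimizer with a learning rate of 0.001.",
]
top = retrieve("What accuracy does the model reach?", chunks, top_k=1)
print(build_prompt("What accuracy does the model reach?", top))
```

In a real pipeline the word-count vectors would be replaced by dense embeddings and the linear scan by a vector index, but the flow stays the same: embed the question, rank the chunks, and pass the best ones to the LLM as context.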
Direct Q&A:
"What is the conclusion of Paper X?"
"Who are the authors of this research?"
Summarization:
"Summarize the methodology of Paper C."
"Give me an overview of the experimental setup."
Data Extraction:
"What accuracy scores are reported in this paper?"
"Extract all F1-scores from the results section."
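To make the data extraction queries concrete, here is a minimal sketch of pulling metric values out of plain text with a regular expression. It is only an illustration under stated assumptions: this README does not say how extraction is implemented (it may well be delegated to the LLM), and the pattern `METRIC_PATTERN` and the function `extract_metrics` are hypothetical.

```python
import re

# Hypothetical pattern: a metric name, an optional linking word, then a number.
METRIC_PATTERN = re.compile(
    r"(?P<name>accuracy|f1[- ]score|precision|recall)"
    r"\s*(?:of|is|was|=|:)?\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<percent>%)?",
    re.IGNORECASE,
)


def extract_metrics(text):
    """Return a list of {"metric", "value"} dicts found in the text."""
    results = []
    for match in METRIC_PATTERN.finditer(text):
        value = float(match.group("value"))
        if match.group("percent"):
            value /= 100  # normalize percentages to the 0-1 range
        name = match.group("name").lower().replace(" ", "-")
        results.append({"metric": name, "value": value})
    return results


sample = "Our model achieves an accuracy of 92.5% and an F1-score of 0.89."
print(extract_metrics(sample))
```

A regex like this only catches simple phrasings, which is one reason an LLM-backed extractor is attractive for research papers where metrics often appear in tables or varied wording.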
- Python 3.8 or higher
- pip package manager
- Git
git clone https://github.com/VivanRajath/Stochastic-assignment.git
cd Stochastic-assignment
python -m venv env
Windows (PowerShell):
.\env\Scripts\Activate
Linux/MacOS:
source env/bin/activate
pip install -r requirements.txt
Create a .env file in the project root and add your API keys:
OPENAI_API_KEY=your_openai_key_here
GEMINI_API_KEY=your_gemini_key_here
DJANGO_SECRET_KEY=your_django_secret_key
cd docbot
python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser
python manage.py runserver
The application will be available at: http://127.0.0.1:8000/
Due to computational and storage limitations, the deployment is strategically split across platforms:
- Hugging Face Spaces: Hosts the RAG pipeline and ML inference
- Render: Hosts the Django web application with user authentication
This hybrid architecture ensures optimal performance while working within platform constraints.
Visit Live Demo. Note: the demo runs on Render's free tier, so it might take some time to load.
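With the web application and the RAG pipeline on separate platforms, the Django side presumably forwards each question to the pipeline over HTTP. The sketch below shows one way such a call could look using only the standard library. The endpoint URL, the JSON field names, and the function names are all assumptions made for illustration; the README does not document the actual interface.

```python
import json
import urllib.request

# Hypothetical endpoint; the real Space URL and route are not given in this README.
SPACE_URL = "https://example.invalid/api/query"


def build_query_request(question, document_ids, url=SPACE_URL):
    """Build a JSON POST request that forwards a user question to the RAG service."""
    payload = json.dumps({"question": question, "document_ids": document_ids})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_rag_service(question, document_ids, timeout=60):
    """Send the request and return the decoded JSON response."""
    request = build_query_request(question, document_ids)
    # A generous timeout allows for a cold start on a free hosting tier.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_query_request("Summarize the methodology.", [1, 2])
print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the payload easy to test without network access.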
- Django 4.0+: Web framework and user authentication
- Python 3.8+: Core programming language
- SQLite/PostgreSQL: Database management
- OpenAI GPT: Large Language Model for text generation
- Gemini API: Alternative LLM for enhanced performance
- Hugging Face Transformers: Model inference and deployment
- LangChain: RAG pipeline orchestration
- PyPDF2: PDF parsing and text extraction
- Sentence Transformers: Text embeddings for retrieval
- FAISS: Vector index library for similarity search
- Render: Web application hosting
- Hugging Face Spaces: ML model hosting
- Docker: Containerization (optional)
Stochastic-assignment/
├── docbot/                  # Django project root
│   ├── manage.py
│   ├── docbot/              # Main Django app
│   │   ├── settings.py
│   │   ├── urls.py
│   │   └── wsgi.py
│   ├── ragbot/              # RAG functionality
│   │   ├── models.py
│   │   ├── views.py
│   │   ├── rag_pipeline.py
│   │   └── utils.py
│   └── templates/           # HTML templates
├── requirements.txt         # Python dependencies
├── .env.example             # Environment variables template
└── README.md                # This file
A comprehensive video demonstration (.mp4) has been submitted as part of the assessment, showcasing:
- Live document upload and processing
- Real-time Q&A interactions
- Summarization capabilities
- Data extraction features
- User authentication flow
| Requirement | Status | Implementation |
|---|---|---|
| AI Document Q&A Agent | ✔️ | RAG-based system with multi-PDF support |
| Structured Information Extraction | ✔️ | Automated parsing of sections, results, metrics |
| Enterprise Features | ✔️ | Context handling, optimization, error management |
| User Authentication | ✔️ | Django's built-in authentication system |
| Live Deployment | ✔️ | Hybrid deployment on Render + Hugging Face |
# Required
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
DJANGO_SECRET_KEY=...
# Optional
DEBUG=True
ALLOWED_HOSTS=localhost,127.0.0.1
DATABASE_URL=sqlite:///db.sqlite3
Add your Gemini API key to the .env file.
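As an illustration of how these variables might be consumed, the sketch below reads them from the process environment with small helper functions. It assumes the values in .env have already been loaded into the environment (for example by the hosting platform or a dotenv loader); the README does not say which mechanism the project uses, and the helper names `env_bool`, `env_list`, and `require_env` are hypothetical.

```python
import os


def env_bool(name, default=False):
    """Read a boolean flag such as DEBUG=True from the environment."""
    return os.environ.get(name, str(default)).strip().lower() in {"1", "true", "yes"}


def env_list(name, default=""):
    """Read a comma-separated value such as ALLOWED_HOSTS into a list."""
    raw = os.environ.get(name, default)
    return [item.strip() for item in raw.split(",") if item.strip()]


def require_env(name):
    """Return a required variable or fail early with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Demo values only; a real deployment would set these outside the code.
os.environ.setdefault("DEBUG", "True")
os.environ.setdefault("ALLOWED_HOSTS", "localhost,127.0.0.1")

DEBUG = env_bool("DEBUG")
ALLOWED_HOSTS = env_list("ALLOWED_HOSTS")
print(DEBUG, ALLOWED_HOSTS)
```

Failing early on a missing required key, as `require_env` does, tends to give a clearer error than a failed API call later in a request.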
- Model Selection: Switch between OpenAI and Gemini models
- Chunk Size: Adjust document chunking for optimal retrieval
- Temperature: Control response creativity vs accuracy
- Max Tokens: Set response length limits
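The options above could be grouped in a single validated configuration object. The following is a minimal sketch, not the project's actual code: the class `RagConfig`, its field names and defaults, the overlap setting, and the choice to measure chunk size in characters are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagConfig:
    """Hypothetical container for the tunable options listed above."""
    model: str = "openai"      # "openai" or "gemini"
    chunk_size: int = 500      # characters per chunk (assumed unit)
    chunk_overlap: int = 50    # characters shared by neighboring chunks
    temperature: float = 0.2   # lower values favor deterministic answers
    max_tokens: int = 512      # upper bound on response length

    def __post_init__(self):
        if self.model not in {"openai", "gemini"}:
            raise ValueError(f"Unknown model: {self.model}")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")


def split_into_chunks(text, config):
    """Split text into overlapping character windows of config.chunk_size."""
    step = config.chunk_size - config.chunk_overlap
    return [text[i:i + config.chunk_size] for i in range(0, len(text), step)]


config = RagConfig(model="gemini", chunk_size=10, chunk_overlap=2)
print(split_into_chunks("abcdefghijklmnopqrstuvwxyz", config))
```

Smaller chunks make retrieval more precise but give the LLM less surrounding context per hit, so chunk size is usually tuned together with the number of chunks retrieved.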
- Support for additional document formats (DOCX, TXT, HTML)
- Multi-language document processing
- Advanced visualization of extracted data
- API endpoints for programmatic access
- Batch processing capabilities
- Integration with cloud storage services
This project was developed as part of the Stochastic Inc. Competency Assessment. While primarily for evaluation purposes, suggestions and improvements are welcome.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Stochastic Inc. for providing this challenging and engaging assessment opportunity
- OpenAI and Google for their powerful language model APIs
- Hugging Face for their excellent model hosting platform
- Render for reliable web application deployment
- The open-source community for the amazing libraries and tools used in this project
Vivan Rajath
- GitHub: @VivanRajath
- Email: vivanrajath999@gmail.com
- LinkedIn: https://www.linkedin.com/in/vivan-rajath-178a6a348/
Built with ❤️ for the Stochastic Inc. Competency Assessment