A comprehensive Retrieval-Augmented Generation (RAG) study assistant with a modern web interface and user authentication. Upload PDFs, create personal accounts, and get intelligent responses from your course materials through an intuitive dark-themed chat interface.
- User Registration & Login: Secure account creation with JWT authentication
- Password Security: Bcrypt hashing for secure password storage
- Email-based Login: User identification via email addresses
- User Profiles: Personal user accounts with profile management
- PDF Processing: Extract and process content from lecture notes, textbooks, and study materials
- Intelligent Search: Vector-based similarity search through your documents
- Conversational Memory: Maintains context across multiple questions
- Source Citations: Shows which documents were used to generate answers
- Fast API: Optimized for quick responses
- Incremental Updates: Add new PDFs without rebuilding the entire knowledge base
- Dark Theme Interface: Professional dark-themed chat interface
- Responsive Design: Mobile-friendly layout with Tailwind CSS
- Intuitive Chat UI: Real-time messaging interface with chat history
- Interactive Sidebar: User options and session management
- Smooth UX: Loading states, error handling, and user feedback
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ React Frontend  │    │     FastAPI     │    │     MongoDB     │
│  (TypeScript)   │◄──►│     Backend     │◄──►│    User Data    │
│   Auth + Chat   │    │   Auth + RAG    │    │  (Atlas/Local)  │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │
         │             ┌─────────────────┐
         │             │    JWT Auth     │
         │             │   Middleware    │
         │             └─────────────────┘
         │                      │
         │             ┌─────────────────┐    ┌─────────────────┐
         │             │   RAG Service   │◄──►│    ChromaDB     │
         └────────────►│    (Business    │    │  Vector Store   │
                       │     Logic)      │    │  (Embeddings)   │
                       └─────────────────┘    └─────────────────┘
                                │                      │
                                ▼                      │
                       ┌─────────────────┐             │
                       │   Gemini LLM    │             │
                       │   (Google AI)   │◄────────────┘
                       └─────────────────┘
Frontend:
- React 18 with TypeScript
- Tailwind CSS for styling
- Context API for state management
- Vite for build tooling
Backend:
- FastAPI with Python 3.12+
- MongoDB with Motor (async driver)
- JWT authentication
- Pydantic for data validation
AI & Data:
- LangChain for RAG pipeline
- ChromaDB for vector storage
- Google Gemini for LLM
- SentenceTransformers for embeddings
- Python 3.12+
- Node.js 18+ and npm
- MongoDB (Atlas or local instance)
- Google AI API Key (Gemini)
# Clone the repository
git clone <repository-url>
cd Study-Assistant/backkend
# Create virtual environment
python -m venv env
source env/bin/activate # On Windows: env\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env
# Edit .env and add your API keys
# Navigate to frontend directory
cd ../frontend
# Install dependencies
npm install
# Set up environment variables
cp .env.example .env
# Edit .env and add your backend URL
Backend (.env):
GEMINI_API_KEY=your_actual_gemini_api_key_here
MONGODB_URL=mongodb://localhost:27017/study_assistant
JWT_SECRET_KEY=your_super_secure_jwt_secret_key_here
Frontend (.env):
VITE_API_URL=http://localhost:8000
Google AI API Key:
- Go to Google AI Studio
- Create a new API key
- Add it to your backend .env file
MongoDB Setup:
- Local: Install MongoDB locally or use Docker
- Cloud: Set up MongoDB Atlas and get connection string
cd backkend
source env/bin/activate
uvicorn main:app --reload --host 0.0.0.0 --port 8000
The backend server will start at: http://localhost:8000
# In a new terminal
cd frontend
npm run dev
The frontend will start at: http://localhost:5173
- Open http://localhost:5173 in your browser
- Create a new account or log in with existing credentials
- Start chatting with your AI study assistant!
First time only - Upload your PDFs to create the knowledge base:
curl -X POST "http://localhost:8000/api/rag/init-db" \
-H "Content-Type: application/json" \
-d '{
"pdf_paths": [
"assets/Sample.pdf",
"assets/GreedyAlgorithms.pdf"
]
}'
# Register new user
curl -X POST "http://localhost:8000/api/auth/signup" \
-H "Content-Type: application/json" \
-d '{
"email": "test@example.com",
"password": "securepassword123",
"full_name": "Test User"
}'
# Login
curl -X POST "http://localhost:8000/api/auth/login" \
-H "Content-Type: application/json" \
-d '{
"email": "test@example.com",
"password": "securepassword123"
}'
Register a new user account (POST /api/auth/signup).
Request:
{
"email": "user@example.com",
"password": "securepassword123",
"full_name": "John Doe"
}
Response:
{
"message": "User created successfully",
"access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"token_type": "bearer",
"user": {
"id": "user_id_here",
"email": "user@example.com",
"full_name": "John Doe"
}
}
Log in with existing credentials (POST /api/auth/login).
Request:
{
"email": "user@example.com",
"password": "securepassword123"
}
Response:
{
"message": "Login successful",
"access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"token_type": "bearer",
"user": {
"id": "user_id_here",
"email": "user@example.com",
"full_name": "John Doe"
}
}
Get the current user's profile (requires authentication).
Headers:
Authorization: Bearer your_jwt_token_here
Response:
{
"id": "user_id_here",
"email": "user@example.com",
"full_name": "John Doe",
"is_active": true,
"created_at": "2025-01-15T10:30:00",
"updated_at": "2025-01-15T10:30:00"
}
Initialize the knowledge base with PDFs (POST /api/rag/init-db, first time only).
Request:
{
"pdf_paths": ["path/to/file1.pdf", "path/to/file2.pdf"]
}
Response:
{
"status": "success",
"message": "ChromaDB initialized successfully with PDFs",
"documents_processed": 2,
"chunks_created": 157
}
Add more PDFs to an existing knowledge base.
Request:
{
"pdf_paths": ["path/to/new_file.pdf"]
}
Response:
{
"status": "success",
"message": "PDFs added to existing ChromaDB successfully",
"new_documents_added": 45,
"total_documents": 202
}
Ask questions using the RAG system.
Request:
{
"question": "Explain the time complexity of binary search"
}
Response:
{
"answer": "Binary search has O(log n) time complexity because...",
"sources": [
{
"source": "Algorithms.pdf",
"chunk_id": "23",
"content_preview": "Binary search is a divide and conquer algorithm..."
}
],
"conversation_length": 3
}
Study-Assistant/
├── README.md                     # Project documentation
│
├── backkend/                     # FastAPI Backend
│   ├── main.py                   # FastAPI application setup
│   ├── auth.py                   # JWT authentication utilities
│   ├── database.py               # MongoDB connection and operations
│   ├── user_models.py            # Pydantic models for auth
│   ├── user_routes.py            # Authentication API endpoints
│   ├── routes.py                 # RAG API route handlers
│   ├── service.py                # RAG business logic
│   ├── models.py                 # Pydantic request/response models
│   ├── requirements.txt          # Python dependencies
│   ├── .env                      # Environment variables (API keys)
│   ├── assets/                   # PDF files for knowledge base
│   │   ├── Sample.pdf
│   │   ├── GreedyAlgorithms.pdf
│   │   └── ...
│   ├── chroma_db/                # Persistent vector database
│   └── models_cache/             # Cached embedding models
│
├── frontend/                     # React Frontend
│   ├── public/                   # Static assets
│   ├── src/
│   │   ├── components/           # React components
│   │   │   ├── Auth.tsx          # Authentication wrapper
│   │   │   ├── Login.tsx         # Login form component
│   │   │   ├── Signup.tsx        # Registration form component
│   │   │   ├── Chat.tsx          # Main chat interface
│   │   │   └── Sidebar.tsx       # Navigation sidebar
│   │   ├── contexts/             # React contexts
│   │   │   └── AuthContext.tsx   # Authentication state management
│   │   ├── hooks/                # Custom React hooks
│   │   │   └── useAuth.ts        # Authentication hook
│   │   ├── App.tsx               # Main application component
│   │   ├── main.tsx              # Application entry point
│   │   └── index.css             # Global styles
│   ├── package.json              # Node.js dependencies
│   ├── tailwind.config.js        # Tailwind CSS configuration
│   ├── tsconfig.json             # TypeScript configuration
│   ├── vite.config.ts            # Vite build configuration
│   └── .env                      # Frontend environment variables
└── .gitignore                    # Git ignore rules
Backend (.env):
- GEMINI_API_KEY: Your Google AI API key (required)
- MONGODB_URL: MongoDB connection string (required)
- JWT_SECRET_KEY: Secret key for JWT token signing (required)
Frontend (.env):
- VITE_API_URL: Backend API URL (default: http://localhost:8000)
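Since all three backend variables are required, a startup check can fail fast with a clear message instead of surfacing later as a "GEMINI_API_KEY not found" or MongoDB connection error. The following is a minimal sketch, not the project's actual startup code: the variable names come from this README, while the helper name `missing_backend_vars` is hypothetical.

```python
import os

# Variable names are taken from this README; the helper itself is hypothetical.
REQUIRED_BACKEND_VARS = ("GEMINI_API_KEY", "MONGODB_URL", "JWT_SECRET_KEY")


def missing_backend_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_BACKEND_VARS if not env.get(name, "").strip()]
```

A backend entry point could call `missing_backend_vars()` before creating the app and exit with the list of missing names if it is non-empty.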
Current optimizations for fast responses:
- Chunk size: 800 tokens (vs 1000)
- Retrieval: Top 3 documents (vs 5)
- Memory: 3 conversation turns (vs 5)
- Output: 512 tokens max (vs 1024)
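The four values above could be grouped in one settings object, and the three-turn memory limit maps naturally onto a bounded queue. This is a sketch under assumptions: the class and field names (`RagSettings`, `ConversationMemory`) are hypothetical and do not describe how service.py stores these values.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RagSettings:
    """Tuning values listed in this README; names are hypothetical."""
    chunk_size_tokens: int = 800
    top_k: int = 3
    memory_turns: int = 3
    max_output_tokens: int = 512


class ConversationMemory:
    """Keeps only the most recent question/answer turns."""

    def __init__(self, max_turns):
        # A deque with maxlen silently drops the oldest turn when full.
        self._turns = deque(maxlen=max_turns)

    def add_turn(self, question, answer):
        self._turns.append((question, answer))

    def turns(self):
        return list(self._turns)
```

With `ConversationMemory(RagSettings().memory_turns)`, a fourth turn pushes out the first, which keeps the prompt short at the cost of forgetting older context.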
- JWT token-based authentication
- Password hashing with bcrypt
- Protected API routes
- CORS configuration for frontend communication
- Start backend server
- Start frontend development server
- Open the web application
- Create a user account or login
- Admin: Call /init-db with your PDFs
- Start asking questions through the chat interface!
- Start both servers (auto-loads existing database)
- Login to your account
- Chat with your study assistant immediately
- Admin: Add more PDFs anytime with /add-pdf
- Registration/Login: Dark-themed authentication forms
- Chat Interface: Real-time messaging with AI responses
- Sidebar: User options and session management
- Responsive: Works on desktop and mobile devices
- Registration: Users create accounts with email and password
- Authentication: JWT tokens provide secure session management
- Profile Management: User data stored securely in MongoDB
- Protected Routes: API endpoints secured with JWT middleware
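To make the token step concrete, here is a standard-library sketch of how an HS256 JWT is signed and checked. It is an illustration only: the backend uses PyJWT, and the function names `create_token` and `verify_token` are hypothetical. HS256 is assumed because the sample `access_token` header in the API responses decodes to `{"alg":"HS256","typ":"JWT"}`.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    """Base64url-encode bytes without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input, secret):
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_token(payload, secret):
    """Build header.payload.signature for an HS256 token."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token, secret, now=None):
    """Return the payload if the signature and expiry are valid, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError:
        return None
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    if not isinstance(payload, dict):
        return None
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, signature):
        return None
    exp = payload.get("exp")
    current = time.time() if now is None else now
    if exp is not None and (not isinstance(exp, (int, float)) or current >= exp):
        return None
    return payload
```

The signature comparison uses `hmac.compare_digest` to avoid timing leaks. In the real backend, PyJWT's encode and decode functions would replace both helpers.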
- PDF Processing: Extracts text from PDFs using PyMuPDF
- Chunking: Splits documents into overlapping 800-token chunks
- Embedding: Converts chunks to vectors using SentenceTransformers
- Storage: Stores embeddings in ChromaDB for fast retrieval
- Retrieval: Finds relevant chunks using similarity search
- Generation: Uses Gemini LLM to generate answers from context
- Memory: Maintains conversation history for follow-up questions
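The chunking, embedding, and retrieval steps above can be sketched in plain Python. This is a toy illustration, not the LangChain/ChromaDB pipeline the project uses: whitespace-separated words stand in for tokens, a bag-of-words counter stands in for a SentenceTransformers embedding, and the overlap of 100 is an assumed value because this README does not state the overlap size. All function names are hypothetical.

```python
import math
from collections import Counter


def chunk_words(text, chunk_size=800, overlap=100):
    """Split text into overlapping chunks, treating words as tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real sentence embedding."""
    return Counter(word.strip(".,;:!?()").lower() for word in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, top_k=3):
    """Return the top_k chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vector, embed(chunk)), reverse=True)
    return ranked[:top_k]
```

The chunks returned by `retrieve` are what would be placed in the prompt as context for the Gemini model. A real vector store precomputes and indexes the chunk embeddings instead of re-embedding every chunk per query.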
- React Context: Manages authentication state globally
- Custom Hooks: Encapsulates auth logic and API calls
- Component Structure: Modular design with reusable components
- Responsive UI: Tailwind CSS for consistent styling
- Real-time Chat: Interactive messaging interface
- FastAPI: Modern web framework for the API
- MongoDB Motor: Async MongoDB driver
- PyJWT: JWT token handling
- Bcrypt: Password hashing
- Pydantic: Data validation and serialization
- LangChain: RAG pipeline orchestration
- ChromaDB: Vector database for embeddings
- PyMuPDF: PDF text extraction
- SentenceTransformers: Text embeddings
- Google Generative AI: LLM for answer generation
- React 18: Modern React with hooks
- TypeScript: Type-safe JavaScript
- Tailwind CSS: Utility-first CSS framework
- Lucide React: Modern icon library
- Vite: Fast build tool and dev server
# Backend
cd backkend
pip install -r requirements.txt
# Frontend
cd ../frontend  # from backkend/; use "cd frontend" if starting at the repo root
npm install
- Backend Startup: ~5-10 seconds for complete initialization
- Frontend Load: ~1-2 seconds for React application
- Authentication: <500ms for login/signup operations
- RAG Initialization: ~30 seconds for 4 PDFs (first time only)
- Query Response: ~2-5 seconds per question
- Memory Usage: ~300MB for backend with knowledge base
- Storage: ~50MB ChromaDB + MongoDB user data
- Users: Supports multiple concurrent users
- Documents: Handles large document collections efficiently
- Responses: Optimized for quick AI responses
- Database: MongoDB scales horizontally for user growth
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is open source and available under the MIT License.
"GEMINI_API_KEY not found"
- Make sure your .env file exists in the backkend directory
- Verify the API key is valid and properly formatted
"Failed to connect to MongoDB"
- Check your MONGODB_URL in the .env file
- Ensure MongoDB service is running (local) or accessible (Atlas)
- Verify network connectivity and credentials
"JWT token invalid"
- Check that JWT_SECRET_KEY is set in backend .env
- Frontend tokens expire after a certain time - re-login if needed
- Verify the Authorization header format: Bearer <token>
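When debugging an invalid-token error, it can help to check the header shape in isolation. The sketch below shows one way to parse an `Authorization: Bearer <token>` value. The function name `extract_bearer_token` is hypothetical, and in the actual backend FastAPI's security utilities would normally do this parsing.

```python
def extract_bearer_token(authorization_header):
    """Return the token from a 'Bearer <token>' header value, or None."""
    if not authorization_header:
        return None
    parts = authorization_header.split()
    # Expect exactly two parts: the scheme and the token.
    if len(parts) != 2 or parts[0].lower() != "bearer":
        return None
    return parts[1]
```

A missing scheme, a different scheme such as `Basic`, or extra whitespace-separated parts all yield `None`, which matches the most common causes of a malformed header.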
"No existing ChromaDB found"
- Run /init-db endpoint first to create the knowledge base
- Ensure PDF files exist in the specified paths
"Cannot connect to backend"
- Verify backend server is running on port 8000
- Check VITE_API_URL in frontend .env file
- Ensure CORS is properly configured in FastAPI
"Login/Signup not working"
- Check browser network tab for API errors
- Verify form validation is passing
- Check backend logs for authentication errors
"UI styling issues"
- Ensure Tailwind CSS is properly installed
- Check for CSS conflicts or missing imports
- Verify responsive breakpoints are working
Slow responses
- First query is slower due to model loading
- Subsequent queries should be faster
- Check network latency to Google AI API
Memory issues
- Monitor ChromaDB size with large document collections
- Consider pagination for large chat histories
- Check MongoDB memory usage with many users
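One way to approach the pagination suggestion above is to return chat history one page at a time, newest page first. This is a minimal sketch with a hypothetical helper name and an assumed page size of 20. It operates on an in-memory list, whereas a real implementation would more likely paginate in the database query.

```python
def paginate_history(messages, page, page_size=20):
    """Return one page of a chronological message list.

    Page 1 holds the most recent messages; each page stays in
    chronological order. Pages past the start of history are empty.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    end = len(messages) - (page - 1) * page_size
    if end <= 0:
        return []
    start = max(0, end - page_size)
    return messages[start:end]
```

The chat UI could then load page 1 on open and request older pages only when the user scrolls up.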
- Check terminal output for detailed error messages
- Verify all environment variables are properly set
- Test API endpoints individually using curl or Postman
- Check browser developer tools for frontend errors
- Ensure all dependencies are installed correctly
- Use hot reload for both frontend and backend development
- Check API documentation at http://localhost:8000/docs
- Use MongoDB Compass for database inspection
- Enable debug logging for detailed error tracking
Main chat interface showing the dark-themed design with real-time AI conversation, sidebar navigation, and responsive layout
- Dark Theme: Professional gray color scheme with blue accents for optimal study sessions
- Real-time Chat: Interactive messaging with the AI assistant showing questions and detailed responses
- Sidebar Navigation: Clean sidebar with user options and session management
- Responsive Design: Mobile-friendly layout that works across all devices
- Fast Responses: Quick AI-powered answers with source citations from your study materials
- Login Page: Clean dark-themed login form with email and password fields
- Signup Page: User registration with full name, email, and password
- Responsive Design: Mobile-friendly authentication flows
- PDF Management: API endpoints for document administration
- User Management: MongoDB-based user profiles and authentication
- System Monitoring: Comprehensive logging and error tracking
Happy studying with your AI-powered learning assistant!
