A production-ready RAG (Retrieval-Augmented Generation) application for analyzing financial documents using React, FastAPI, LangChain, ChromaDB, and Google's Gemini AI.
- Document Ingestion: Upload and process financial PDFs (10-Ks, earnings calls, etc.)
- RAG Q&A: Ask questions about documents with source citations
- KPI Extraction: Automatically extract key financial metrics
- Sentiment Analysis: Analyze document sentiment (positive/neutral/negative)
- Multi-Document Comparison: Compare KPIs and trends across documents
- PDF Report Generation: Export professional analytical reports
- Framework: FastAPI (Python 3.11+)
- AI/RAG: LangChain with Google Gemini (gemini-1.5-pro)
- Vector Store: ChromaDB
- PDF Processing: pypdf, pdfplumber
- Report Generation: ReportLab
- Database: SQLite (for document registry)
- Framework: React 18 with Vite
- State Management: TanStack Query (React Query)
- Routing: React Router v6
- Styling: Tailwind CSS
- Charts: Recharts
- Icons: Lucide React
The backend follows clean/hexagonal architecture:
backend/
├── src/
│   ├── domain/          # Core entities (Document, KPI, Query, etc.)
│   ├── application/     # Use cases/services
│   ├── infrastructure/  # External integrations
│   │   ├── rag/         # LangChain chains (QA, KPI, sentiment, comparison)
│   │   ├── pdf/         # PDF processing and report generation
│   │   └── database/    # SQLAlchemy models
│   └── interfaces/      # FastAPI routes and schemas
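
As a rough illustration of how these layers might fit together, the sketch below shows a domain entity, a port (interface) owned by the application layer, and an adapter that satisfies it. The names `DocumentRepository`, `InMemoryDocumentRepository`, and `DocumentService.register` are hypothetical and are not taken from the repository; the in-memory adapter stands in for the SQLAlchemy-backed registry.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Document:
    """Domain entity: has no knowledge of FastAPI, ChromaDB, or SQLite."""
    id: str
    filename: str


class DocumentRepository(Protocol):
    """Port (hypothetical): the only thing the application layer depends on."""

    def save(self, document: Document) -> None: ...

    def get(self, document_id: str) -> Document | None: ...


class InMemoryDocumentRepository:
    """Adapter (hypothetical): stand-in for a database-backed registry."""

    def __init__(self) -> None:
        self._items: dict[str, Document] = {}

    def save(self, document: Document) -> None:
        self._items[document.id] = document

    def get(self, document_id: str) -> Document | None:
        return self._items.get(document_id)


class DocumentService:
    """Use case: receives its repository through the constructor."""

    def __init__(self, repository: DocumentRepository) -> None:
        self._repository = repository

    def register(self, document_id: str, filename: str) -> Document:
        document = Document(id=document_id, filename=filename)
        self._repository.save(document)
        return document
```

Because the service only sees the port, the in-memory adapter can be swapped for a real one in production and kept for tests.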
- Python 3.11 or higher
- Node.js 18 or higher
- Google API Key (for Gemini)
git clone <repository-url>
cd finsight-ai
cd backend
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Create .env file
cp .env.example .env
Edit .env and add your Google API key:
GOOGLE_API_KEY=your_google_api_key_here
Get your API key from: https://makersuite.google.com/app/apikey
cd ../frontend
# Install dependencies
npm install
Terminal 1 - Backend:
cd backend
source venv/bin/activate # On Windows: venv\Scripts\activate
python src/main.py
Backend will run on: http://localhost:8000
API docs available at: http://localhost:8000/docs
Terminal 2 - Frontend:
cd frontend
npm run dev
Frontend will run on: http://localhost:3000
- Go to the Documents page
- Drag & drop or select a financial PDF
- Wait for processing (text extraction, chunking, embedding)
- Document appears in the list
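
The chunking stage of that pipeline can be pictured with a minimal sliding-window splitter. This is a simplified, character-based stand-in, not the project's actual splitter (which is not shown in this README); the function name `chunk_text` is hypothetical, and the defaults mirror the CHUNK_SIZE and CHUNK_OVERLAP values from the configuration section.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.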
- Go to the Query page
- Optionally filter by specific documents
- Ask questions like:
  - "What was the revenue in Q3?"
  - "What are the main risk factors?"
  - "How did operating margins change?"
- View answers with source citations
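
To make the retrieval and citation steps concrete, here is a toy retriever that ranks chunks against a question and keeps each chunk's source file and page so an answer can cite them. It uses bag-of-words cosine similarity purely as a stand-in for Gemini embeddings and ChromaDB; the `Chunk` fields and the `retrieve` signature are assumptions, not the project's API.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str  # file name used in the citation (hypothetical field)
    page: int    # page number used in the citation (hypothetical field)


def _vectorize(text: str) -> Counter:
    """Lowercased word counts; a crude substitute for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[Chunk], k: int = 2,
             allowed_sources: set[str] | None = None) -> list[Chunk]:
    """Return the k most similar chunks, optionally limited to selected documents."""
    query_vec = _vectorize(question)
    candidates = [c for c in chunks if allowed_sources is None or c.source in allowed_sources]
    ranked = sorted(candidates, key=lambda c: _cosine(query_vec, _vectorize(c.text)), reverse=True)
    return ranked[:k]
```

The returned chunks would then be passed to the LLM as context, and their `source` and `page` values shown next to the answer.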
- Go to the Analytics page
- Select a document
- View extracted KPIs (Revenue, Net Income, EPS, etc.)
- See sentiment analysis results
- Download PDF report
- Go to the Compare page
- Select 2+ documents to compare
- View KPI comparison table
- Read AI-generated comparison summary
- Download comparison report
- POST /documents/upload - Upload PDF
- GET /documents - List all documents
- GET /documents/{id} - Get document info
- DELETE /documents/{id} - Delete document
- POST /query - RAG Q&A
- GET /documents/{id}/kpis - Extract KPIs
- GET /documents/{id}/sentiment - Analyze sentiment
- GET /documents/{id}/report - Generate PDF report
- POST /compare - Compare documents
- POST /compare/report - Generate comparison report
- GET /health - Health check
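
A small client for the query endpoint might look like the following sketch, which uses only the standard library. The JSON field names `question` and `document_ids` are assumptions; the real request schema is defined in the backend and can be checked in the interactive API docs at /docs.

```python
from __future__ import annotations

import json
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address from the run instructions


def build_query_request(question: str, document_ids: list[str] | None = None,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /query request. Payload field names are hypothetical."""
    payload: dict[str, object] = {"question": question}
    if document_ids:
        payload["document_ids"] = document_ids
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (requires a running backend)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `send(build_query_request("What are the main risk factors?"))` would return the decoded response body.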
# Gemini API
GOOGLE_API_KEY=your_key_here
# LangChain
EMBEDDING_MODEL=models/embedding-001
LLM_MODEL=gemini-1.5-pro
LLM_TEMPERATURE=0.1
# ChromaDB
CHROMA_PERSIST_DIRECTORY=./data/chroma
CHROMA_COLLECTION_NAME=financial_documents
# Storage
UPLOAD_DIRECTORY=./data/uploads
# Database
DATABASE_URL=sqlite:///./data/finsight.db
# Chunking
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
# API
API_HOST=0.0.0.0
API_PORT=8000

finsight-ai/
├── backend/
│   ├── src/
│   │   ├── domain/
│   │   │   └── entities.py             # Core domain models
│   │   ├── application/
│   │   │   ├── document_service.py     # Document ingestion
│   │   │   ├── query_service.py        # RAG queries
│   │   │   ├── analytics_service.py    # KPI & sentiment
│   │   │   └── comparison_service.py   # Multi-doc comparison
│   │   ├── infrastructure/
│   │   │   ├── config.py               # Settings
│   │   │   ├── rag/
│   │   │   │   ├── embedding.py        # Gemini embeddings
│   │   │   │   ├── vector_store.py     # ChromaDB
│   │   │   │   ├── qa_chain.py         # RAG QA chain
│   │   │   │   ├── kpi_chain.py        # KPI extraction
│   │   │   │   ├── sentiment_chain.py
│   │   │   │   └── comparison_chain.py
│   │   │   ├── pdf/
│   │   │   │   ├── processor.py        # PDF parsing
│   │   │   │   └── report_generator.py
│   │   │   └── database/
│   │   │       └── models.py           # SQLAlchemy
│   │   ├── interfaces/
│   │   │   ├── schemas.py              # Pydantic models
│   │   │   ├── document_routes.py      # Document API
│   │   │   ├── analysis_routes.py      # Analysis API
│   │   │   └── dependencies.py         # DI container
│   │   └── main.py                     # FastAPI app
│   ├── requirements.txt
│   └── .env.example
└── frontend/
    ├── src/
    │   ├── components/
    │   │   ├── Layout.jsx
    │   │   ├── DocumentUpload.jsx
    │   │   └── DocumentList.jsx
    │   ├── pages/
    │   │   ├── DocumentsPage.jsx
    │   │   ├── QueryPage.jsx
    │   │   ├── AnalyticsPage.jsx
    │   │   └── ComparePage.jsx
    │   ├── api/
    │   │   └── client.js               # Axios API client
    │   ├── App.jsx
    │   ├── main.jsx
    │   └── index.css
    ├── package.json
    ├── vite.config.js
    └── tailwind.config.js
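
The tree lists config.py as the settings module. Its contents are not shown here, so the following is only a standard-library sketch of how the environment variables from the configuration section could be loaded and validated; the `Settings` fields and `load_settings` function are hypothetical, and the real module may well use a settings library instead.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    llm_model: str
    llm_temperature: float
    chunk_size: int
    chunk_overlap: int
    api_host: str
    api_port: int


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Read settings from the environment, using the README values as defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY", "")
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is not set")

    settings = Settings(
        google_api_key=api_key,
        llm_model=env.get("LLM_MODEL", "gemini-1.5-pro"),
        llm_temperature=float(env.get("LLM_TEMPERATURE", "0.1")),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        api_host=env.get("API_HOST", "0.0.0.0"),
        api_port=int(env.get("API_PORT", "8000")),
    )
    # An overlap at least as large as the chunk would never advance the window.
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings
```

Failing fast on a missing API key at startup gives a clearer error than a rejected Gemini call on the first upload.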
# Backend tests
cd backend
pytest
# Frontend (if tests are added)
cd frontend
npm test
# Dockerfile example for backend
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src/ ./src/
CMD ["python", "src/main.py"]
- Security:
  - Use environment variables for secrets
  - Implement authentication/authorization
  - Add rate limiting
  - Enable HTTPS
- Performance:
  - Cache embeddings
  - Use connection pooling
  - Implement request queuing for heavy operations
- Monitoring:
  - Add logging (structlog, loguru)
  - Implement health checks
  - Track API metrics
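
One way to approach the "cache embeddings" item is a thin wrapper that keys results by a hash of the input text, so identical chunks are embedded only once. This is a minimal in-process sketch; `CachedEmbedder` is a hypothetical name, and `embed_fn` stands for whatever function actually calls the embedding model.

```python
from __future__ import annotations

import hashlib
from typing import Callable, Sequence


class CachedEmbedder:
    """Wrap an embedding function and memoize results by content hash."""

    def __init__(self, embed_fn: Callable[[str], Sequence[float]]) -> None:
        self._embed_fn = embed_fn
        self._cache: dict[str, tuple[float, ...]] = {}
        self.misses = 0  # number of times the underlying function was called

    def embed(self, text: str) -> tuple[float, ...]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = tuple(self._embed_fn(text))
        return self._cache[key]
```

The cache here is unbounded and lost on restart; a production setup would likely add an eviction policy or persist entries to disk.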
ChromaDB errors:
- Delete ./data/chroma and restart
- Check disk space
Gemini API errors:
- Verify API key is correct
- Check API quotas/limits
- Ensure billing is enabled
PDF extraction failures:
- Try different PDF files
- Check file is not encrypted
- Verify file is not corrupted
API connection errors:
- Ensure backend is running
- Check proxy configuration in vite.config.js
- Verify CORS settings
npm install errors:
- Delete node_modules and package-lock.json
- Run npm install again
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
MIT License
For issues and questions:
- Open a GitHub issue
- Check existing documentation
- Review API docs at /docs
- LangChain: RAG framework
- Google Gemini: LLM and embeddings
- ChromaDB: Vector database
- FastAPI: Backend framework
- React: Frontend framework