AI-powered financial document analysis with intelligent section-based summarization using OpenAI's GPT models.
- Project Overview
- Architecture
- Features
- Project Structure
- Get Started
- Usage Guide
- Environment Variables
- Technology Stack
- Troubleshooting
- License
FinSights is an intelligent financial document analysis platform that processes text and financial documents (PDF, DOCX) to generate comprehensive summaries organized into six key financial sections:
- Financial Performance - Narrative overview with key financial numbers
- Key Metrics - Essential KPIs and financial indicators
- Risks - Identified risks and challenges
- Opportunities - Growth and business opportunities
- Outlook / Guidance - Forward-looking statements and guidance
- Other Important Highlights - Notable items, dividends, balance sheet items, and auditor notes
Users can paste text directly or upload documents, and the system intelligently extracts and summarizes content using OpenAI's GPT-4o-mini model. The backend caches extracted documents, allowing users to explore different sections without re-uploading.
The application follows a modular microservices architecture with specialized components for document processing and AI summarization:
graph LR
%% ====== FRONTEND ======
subgraph FE[Frontend]
A[React Web UI<br/>Port 5173]
end
%% ====== BACKEND ======
subgraph BE[Backend - FastAPI<br/>Port 8000]
B[API Router]
E[PDF Service]
D[LLM Service]
K[PDF Generator]
G[In-Memory Cache<br/>TTL 1 hour]
H[Session Summary History]
end
%% ====== EXTERNAL ======
subgraph EXT[External]
F[OpenAI API<br/>gpt-4o-mini]
end
%% ====== CONNECTIONS (ARCHITECTURE) ======
A -->|HTTP| B
B --> E
B --> D
B --> K
E -->|Extracted Text| G
D -->|Read Cached Text| G
D -->|Write Summary| H
K -->|Read History| H
D -->|API Call| F
F -->|Response| D
B -->|JSON| A
K -->|PDF File| A
%% ====== STYLES ======
style A fill:#e1f5ff
style B fill:#fff4e1
style D fill:#ffe1f5
style E fill:#ffe1f5
style K fill:#ffe1f5
style F fill:#fff3cd
style G fill:#e8f5e9
style H fill:#e8f5e9
Backend
- Multiple input format support (text, PDF, DOCX)
- PDF text extraction with OCR support for image-based PDFs using pytesseract
- DOCX document processing with python-docx
- AI-powered summarization using OpenAI's GPT-4o-mini model
- Intelligent section-based summarization with context-aware analysis
- Smart document caching system (1-hour TTL, up to 25 documents) to avoid reprocessing
- File validation and size limits (PDF/DOCX: 50 MB)
- Page limit protection (max 100 pages per PDF) to prevent timeouts
- Streaming response support for optimal performance
- CORS enabled for web integration
- Comprehensive error handling and logging
- Health check endpoints
- Modular architecture (routes + services + LLM service + PDF service)
Frontend
- Clean, intuitive interface with tab-based input selection (Text / File)
- Drag-and-drop file upload capability
- Real-time summary display with clickable financial section chips
- Chat-like history view of all summaries
- PDF export functionality for generated summaries
- Mobile-responsive design with Tailwind CSS
- Built with Vite for fast development and hot module replacement
FinSights/
├── backend/
│ ├── api/
│ │ └── routes.py # API endpoints
│ ├── services/
│ │ ├── llm_service.py # OpenAI integration
│ │ └── pdf_service.py # Document processing
│ ├── server.py # FastAPI app
│ ├── config.py # Configuration
│ ├── requirements.txt # Python dependencies
│ └── Dockerfile # Backend container
├── frontend/
│ ├── src/
│ │ ├── pages/ # React pages
│ │ ├── components/ # React components
│ │ ├── services/ # API client
│ │ └── App.jsx # Main app
│ ├── package.json # npm dependencies
│ └── Dockerfile # Frontend container
├── docker-compose.yml # Service orchestration
└── README.md # This file
Before you begin, ensure you have the following installed and configured:
- Docker and Docker Compose (v20.10+)
- OpenAI API Key (for GPT-4o-mini access)
# Check Docker
docker --version
docker compose version
# Verify Docker is running
docker ps# If cloning:
git clone <your-repo-url>
cd FinSightsCreate backend/.env with your OpenAI credentials:
cat > backend/.env << EOF
# OpenAI Configuration (REQUIRED)
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4o-mini
# LLM Configuration
LLM_TEMPERATURE=0.2
LLM_MAX_TOKENS=900
# Caching Configuration
CACHE_MAX_DOCS=25
CACHE_TTL_SECONDS=3600
# Service Configuration
SERVICE_PORT=8000
LOG_LEVEL=INFO
# CORS Settings
CORS_ORIGINS=*
EOFReplace your_openai_api_key_here with your actual OpenAI API key.
Option A: Standard Deployment
# Build and start all services
docker compose up --build
# Or run in detached mode (background)
docker compose up -d --buildOption B: View Logs While Running
# All services
docker compose up --build
# In another terminal, view specific logs
docker compose logs -f backend
docker compose logs -f frontendOnce containers are running, access:
- Frontend UI: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- API Redoc: http://localhost:8000/redoc
# Check health status
curl http://localhost:8000/health
# View running containers
docker compose psdocker compose down-
Open the Application
- Navigate to
http://localhost:5173
- Navigate to
-
Choose Input Method
- Paste Text Tab: Copy/paste financial document text directly
- Upload File Tab: Upload PDF or DOCX files (max 50MB)
-
Generate Summary
- Click "Summarize" button
- Wait for AI processing
- View comprehensive financial summary
-
Explore Financial Sections
- Click any section chip to view detailed analysis:
- Financial Performance
- Key Metrics
- Risks
- Opportunities
- Outlook / Guidance
- Other Important Highlights
- Switching sections is instant (cached document)
- Click any section chip to view detailed analysis:
-
Export Results
- Click "Export as PDF" button
- Save formatted summary to your computer
-
View History
- All previous summaries in chat-like history
- Scroll through past analyses
- Re-explore or export any summary
- Large PDFs: For PDFs > 100 pages, only first 100 pages are processed
- Best Results: Clearly formatted financial documents with structured text
- Caching: First analysis processes document, subsequent sections are instant
- Temperature Setting: Default 0.2 ensures consistent, focused summaries
Configure the application behavior using environment variables in backend/.env:
| Variable | Description | Default | Type |
|---|---|---|---|
OPENAI_API_KEY |
OpenAI API key for GPT access (REQUIRED) | - | string |
OPENAI_MODEL |
GPT model version to use | gpt-4o-mini |
string |
LLM_TEMPERATURE |
Model creativity level (0.0-2.0, lower = deterministic) | 0.2 |
float |
LLM_MAX_TOKENS |
Maximum tokens per summary response | 900 |
integer |
CACHE_MAX_DOCS |
Maximum documents in memory cache | 25 |
integer |
CACHE_TTL_SECONDS |
Cache time-to-live in seconds | 3600 |
integer |
SERVICE_PORT |
Backend API port | 8000 |
integer |
LOG_LEVEL |
Logging level (DEBUG, INFO, WARNING, ERROR) | INFO |
string |
CORS_ORIGINS |
Allowed CORS origins (comma-separated or *) |
* |
string |
MAX_PDF_PAGES |
Maximum PDF pages to process | 100 |
integer |
MAX_PDF_SIZE |
Maximum PDF file size in bytes | 52428800 |
integer |
Production Setup
OPENAI_API_KEY=sk-your-production-key
OPENAI_MODEL=gpt-4o-mini
LLM_TEMPERATURE=0.1
LOG_LEVEL=WARNING
CACHE_MAX_DOCS=50
CORS_ORIGINS=https://yourdomain.com,https://app.yourdomain.comDevelopment Setup
OPENAI_API_KEY=sk-your-dev-key
OPENAI_MODEL=gpt-4o-mini
LLM_TEMPERATURE=0.5
LOG_LEVEL=DEBUG
CACHE_MAX_DOCS=10- Framework: FastAPI (Python web framework)
- AI/LLM: OpenAI GPT-4o-mini API
- Document Processing:
- pypdf (PDF text extraction)
- python-docx (DOCX processing)
- pdf2image + pytesseract (OCR for image-based PDFs)
- Async: Uvicorn ASGI server
- Config: Python-dotenv for environment management
- Framework: React 18 with React Router
- Build Tool: Vite (fast bundler)
- Styling: Tailwind CSS + PostCSS
- UI Components: Lucide React icons
- Export: jsPDF for PDF generation
- Notifications: react-hot-toast
- Containerization: Docker + Docker Compose
- Architecture: Microservices with isolated containers
- Networking: Docker bridge network
Encountering issues? Check the following:
Issue: API not responding
# Check service health
curl http://localhost:8000/health
# View backend logs
docker compose logs backendIssue: OpenAI API errors
- Verify
OPENAI_API_KEYis correct and has credits - Check API key permissions in OpenAI dashboard
- Ensure model
gpt-4o-miniis available in your account
Issue: PDF upload fails
- Max file size: 50MB
- Max pages: 100 pages
- Supported formats: PDF, DOCX
- Ensure file is not corrupted
Issue: Frontend can't connect to API
- Verify backend is running:
docker compose ps - Check CORS settings in
.env - Ensure both services are on same network
Enable debug logging:
# Update .env
LOG_LEVEL=DEBUG
# Restart services
docker compose restart backend
docker compose logs -f backendThis project is licensed under the MIT License - see LICENSE file for details.
For third-party licenses, see LICENSE-3rd-party.txt
FinSights is provided as-is for analysis and informational purposes. While we strive for accuracy:
- Always verify AI-generated summaries against original documents
- Do not rely solely on AI summaries for investment decisions
- Consult financial advisors for investment guidance
- Test thoroughly before using in production environments
Have suggestions or encountered an issue?