An AI-powered tool that reads long financial documents (PDFs), pulls out key numbers and risks, and produces clear, structured, editable first-draft summaries for credit analysts.

Reads documents → Produces a clear first draft
- Python 3.9 or higher
- OpenAI API key
- Clone the repository

  ```bash
  git clone <repository-url>
  cd Memo_Test
  ```
- Install dependencies

  ```bash
  cd backend
  pip install -r requirements.txt
  ```
- ⚠️ IMPORTANT: Set up environment variables

  Create a `.env` file inside the `backend` folder (not in the root directory):

  ```bash
  # Navigate to the backend folder
  cd backend

  # Create the .env file
  # Windows (PowerShell)
  New-Item .env
  # macOS/Linux
  touch .env
  ```

  Add your OpenAI API key to the `.env` file:

  ```
  OPENAI_API_KEY=your_openai_api_key_here
  ```

  Note: The `.env` file must be at `Memo_Test/backend/.env` for the application to work correctly.
- Run the application

  ```bash
  # Make sure you're in the backend folder
  cd backend
  streamlit run frontend.py
  ```
- Access the app

  Open your browser and navigate to the local URL shown in the shell (Streamlit defaults to http://localhost:8501).
Sample test files are available in the backend/uploads/ folder:
| File | Description |
|---|---|
| final_testcase.pdf | Main test document (standard financial PDF) |
| test3.pdf | Password-protected document (for testing the password feature) |

Note: When testing test3.pdf, check the "🔒 PDF is password protected" checkbox and enter the password.
Credit analysts and financial professionals spend significant time manually reviewing financial documents (annual reports, balance sheets, income statements) to:
- Extract key metrics and trends
- Identify potential risks
- Generate executive summaries for stakeholders
This process is:
- Time-consuming: Manual review of lengthy documents
- Error-prone: Human oversight can miss critical details
- Inconsistent: Different analysts may interpret data differently
- Expensive: Requires skilled professionals for thorough analysis
This tool automates the financial document analysis workflow:
| Input | Process | Output |
|---|---|---|
| Upload a PDF | AI extracts & analyzes | Structured, editable draft |
Example builds:
- ✅ Upload a PDF and generate a one-page executive summary
- ✅ Highlight key numbers and show the source page
- ✅ Export the draft to Markdown or Word for editing
Upload a PDF with multiple financial statements → tool outputs:
| Component | Description |
|---|---|
| 5-bullet executive summary | Key highlights from the document |
| Key metrics table | Financial metrics with trends (🟢↑ / 🔴↓) |
| "Top 3 risks" section | Identified risks with severity ratings |
| Source tracing | Highlight a sentence → see which PDF page it came from |
| Confidence tags | ✅ Strong data |
Upload PDF → Click "Generate Memo" → Review → Edit → Download
- Upload PDF - Drag & drop or select financial document
- Generate Memo - AI extracts data and generates summary
- Review - Check executive summary, risks, and metrics
- Validate - Secondary AI validates accuracy with confidence scores
- Edit - Refine using chat Q&A or regenerate with feedback
- Download - Export to Markdown or Text for further editing
- Table Extraction: Automatically detects and extracts financial tables with headers and data
- Text Extraction: Extracts paragraphs with section headings and page references
- Password Support: Handles password-protected PDF documents
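pdfplumber's `extract_tables()` returns each table as a list of rows, where each row is a list of cell strings (or `None` for empty cells). A minimal helper, hypothetical and not taken from this repo, showing how such raw rows can be normalized into header-keyed records before analysis:

```python
def rows_to_records(table):
    """Convert a pdfplumber-style table (list of rows, first row = headers)
    into a list of {header: value} dicts, skipping columns with blank headers."""
    if not table:
        return []
    headers = [(h or "").strip() for h in table[0]]
    records = []
    for row in table[1:]:
        # Pair each cell with its header; None cells become empty strings
        record = {h: (cell or "").strip() for h, cell in zip(headers, row) if h}
        records.append(record)
    return records

# Example: a table shaped like extract_tables() output
raw = [
    ["Metric", "2022", "2023"],
    ["Revenue", "1,200", "1,450"],
    ["Net Income", None, "210"],
]
print(rows_to_records(raw))  # two records keyed by "Metric", "2022", "2023"
```

Keeping values as strings at this stage preserves formatting like thousands separators; numeric parsing can happen later, once a cell is confirmed to hold a number.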
- Automatically identifies key financial metrics from extracted tables
- Shows trends (🟢 Increase / 🔴 Decrease) for each metric
- Provides explanations with source references
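The trend markers above can be derived from two consecutive values of a metric. A small illustrative helper (not the project's actual code) for picking the indicator:

```python
def trend_indicator(previous, current):
    """Return a trend marker for a metric: 🟢 for an increase,
    🔴 for a decrease, ➖ when unchanged."""
    if current > previous:
        return "🟢 Increase"
    if current < previous:
        return "🔴 Decrease"
    return "➖ No change"

print(trend_indicator(1200, 1450))  # 🟢 Increase
```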
- Generates configurable number of summary bullet points (3-10)
- Categories: Financial Performance, Revenue, Profitability, Cash Flow, etc.
- Confidence indicators (High/Medium/Low) for each point
- Page references for traceability
- Identifies and categorizes financial risks
- Severity ratings (High/Medium/Low) with strict criteria
- Evidence-based risk descriptions
- Categories: Liquidity, Credit, Operational, Compliance, etc.
- Optional data anonymization before sending to LLM
- Replaces company names, people, and products with tokens
- Automatically restores original names in final output
- Configurable entity mappings
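The token replace-and-restore cycle can be sketched as follows. This is an illustrative simplification (plain string replacement driven by an `entity_map.json`-style mapping), not the repo's `pseudonymizer.py` itself:

```python
import json

def pseudonymize(text, entity_map):
    """Replace real names with tokens before sending text to the LLM."""
    mapping = {}  # token -> original, kept for later restoration
    for group in entity_map.values():          # e.g. "companies", "people"
        for original, token in group.items():
            if original in text:
                text = text.replace(original, token)
                mapping[token] = original
    return text, mapping

def restore(text, mapping):
    """Put the original names back into the LLM output."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

# Mapping in the same shape as backend/config/entity_map.json
entity_map = json.loads("""
{
  "companies": {"Acme Corp": "COMPANY_001"},
  "people": {"John Smith": "PERSON_001"}
}
""")

masked, mapping = pseudonymize("Acme Corp was audited by John Smith.", entity_map)
print(masked)                     # COMPANY_001 was audited by PERSON_001.
print(restore(masked, mapping))   # Acme Corp was audited by John Smith.
```

A production version would also need to handle case variants and partial-name mentions, which plain `str.replace` does not.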
- Conversational interface for querying document data
- Context-aware responses based on extracted content
- Suggested questions for quick exploration
- Conversation history for multi-turn interactions
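Context-aware, multi-turn behavior typically comes down to how the message list sent to the LLM is assembled. A hypothetical sketch of that assembly (the real `chat_service.py` may differ): the extracted document content goes in a system prompt, prior turns follow, and the new question comes last.

```python
def build_chat_messages(document_context, history, question):
    """Assemble an OpenAI-style messages list: a system prompt carrying the
    extracted document content, prior turns, then the new user question."""
    messages = [{
        "role": "system",
        "content": "Answer questions using only this document content:\n"
                   + document_context,
    }]
    messages.extend(history)  # prior {"role": ..., "content": ...} turns
    messages.append({"role": "user", "content": question})
    return messages

msgs = build_chat_messages(
    "Revenue grew 12% in 2023.",
    [{"role": "user", "content": "What year is covered?"},
     {"role": "assistant", "content": "2023."}],
    "How much did revenue grow?",
)
print(len(msgs))  # 4
```

The resulting list is what would be passed as `messages` to a chat-completions call.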
- Uses a different AI model (GPT-4o) to validate primary outputs
- Scores each summary point and risk on:
- Factual Grounding (1-5)
- Numeric Accuracy (1-5)
- Coherence (1-5)
- Identifies critical issues: Hallucinations, Numeric Errors, Unsupported Claims
- Provides correction suggestions
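One way such validator output can be consumed downstream is to flag any point whose scores fall below a threshold or that carries critical issues. The record shape and threshold here are illustrative assumptions, not the repo's actual schema:

```python
def summarize_validation(results, threshold=3):
    """Collect memo points whose 1-5 scores (factual grounding, numeric
    accuracy, coherence) fall below the threshold, or that carry issues."""
    flagged = []
    for item in results:
        low = {k: v for k, v in item["scores"].items() if v < threshold}
        if low or item.get("issues"):
            flagged.append({"point": item["point"],
                            "low_scores": low,
                            "issues": item.get("issues", [])})
    return flagged

results = [
    {"point": "Revenue rose 12%.",
     "scores": {"factual_grounding": 5, "numeric_accuracy": 5, "coherence": 5},
     "issues": []},
    {"point": "Debt fell 80%.",
     "scores": {"factual_grounding": 2, "numeric_accuracy": 1, "coherence": 4},
     "issues": ["Numeric Error"]},
]
print(summarize_validation(results))  # only the second point is flagged
```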
- Regenerate memos incorporating validation feedback
- LLM uses identified issues to produce improved output
- Iterative refinement workflow
- Real-time tracking of API token consumption
- Cost estimation based on model pricing
- Breakdown by model and service
- Session history and downloadable logs
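Assuming the prices in `token_monitor.py` are USD per 1,000 tokens (consistent with current OpenAI list pricing), cost estimation for a single call is a straightforward weighted sum. A minimal sketch:

```python
MODEL_PRICING = {  # USD per 1K tokens (values as configured in token_monitor.py)
    "gpt-4o": {"input": 0.005, "output": 0.015},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
}

def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one API call from its token counts."""
    price = MODEL_PRICING[model]
    return (input_tokens / 1000) * price["input"] \
         + (output_tokens / 1000) * price["output"]

# e.g. an 8,000-token prompt with a 1,200-token completion on gpt-4o-mini
print(round(estimate_cost("gpt-4o-mini", 8000, 1200), 6))  # 0.00192
```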
- Generate comprehensive Markdown reports
- Includes executive summary, risks, and key metrics
- Download as `.md` or `.txt` files
- Professional formatting with trend indicators
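Assembling the exported Markdown from the memo components can be sketched like this. The section names and record shapes are assumptions for illustration; the repo's actual export code may structure the report differently:

```python
def build_report(summary_points, risks, metrics):
    """Assemble a Markdown report from memo components."""
    lines = ["# Credit Memo", "", "## Executive Summary"]
    lines += [f"- {p}" for p in summary_points]
    lines += ["", "## Top Risks"]
    lines += [f"- **{r['severity']}**: {r['text']}" for r in risks]
    lines += ["", "## Key Metrics",
              "| Metric | Value | Trend |", "|---|---|---|"]
    lines += [f"| {m['name']} | {m['value']} | {m['trend']} |" for m in metrics]
    return "\n".join(lines)

report = build_report(
    ["Revenue grew 12% year over year."],
    [{"severity": "High", "text": "Rising short-term debt."}],
    [{"name": "Revenue", "value": "1,450", "trend": "🟢"}],
)
print(report.splitlines()[0])  # # Credit Memo
```

Because the output is plain Markdown, saving it as `.txt` instead of `.md` is just a filename change.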
- Alternative branch with Ollama integration
- Runs inference locally without cloud API calls
- Same functionality as cloud-based analysis
| Component | Technology |
|---|---|
| Frontend | Streamlit |
| PDF Processing | pdfplumber |
| Data Handling | pandas, numpy |
| LLM Provider | OpenAI (GPT-4o-mini, GPT-4o) |
| Environment | python-dotenv |
```
Memo_Test/
├── README.md
└── backend/
    ├── frontend.py                      # Streamlit UI application
    ├── requirements.txt                 # Python dependencies
    ├── .env                             # Environment variables (create this)
    ├── app/
    │   ├── __init__.py
    │   └── services/
    │       ├── __init__.py
    │       ├── pdf_table_extractor.py   # Table extraction service
    │       ├── pdf_text_extractor.py    # Text extraction service
    │       ├── pseudonymizer.py         # Data anonymization service
    │       ├── llm_insights.py          # Key metrics analysis
    │       ├── generate_memo.py         # Memo generation service
    │       ├── chat_service.py          # Document Q&A chat
    │       ├── secondary_validator.py   # Output validation service
    │       └── token_monitor.py         # Token usage tracking
    ├── config/
    │   ├── entity_map.json              # Pseudonymization mappings
    │   └── mapping_audit.json           # Audit trail for mappings
    ├── uploads/                         # Uploaded PDF storage
    └── logs/                            # Token usage logs
```
1. Upload a PDF: Select a financial document (supports password-protected files)
2. Generate Summary: Click "Generate Summary" to extract tables and text
3. View Extracted Data:
   - Tables Tab: View extracted financial tables
   - Text Tab: View extracted text paragraphs
4. Generate Memo:
   - Go to the "AI Insights" tab
   - Configure the number of summary points and risks
   - Optionally enable pseudonymization
   - Click "Generate Memo"
5. Validate Output:
   - Go to the "Validation" tab
   - Click "Validate Memo" to check accuracy
   - Review scores and critical issues
6. Iterate if Needed:
   - Use "Regenerate Memo" to improve based on feedback
7. Export Report:
   - Go to the "Export" tab
   - Download as Markdown or Text
- Chat Q&A: Use the Chat tab to ask specific questions about the document
- Key Metrics: View extracted metrics with trend analysis in the Key Metrics tab
- Token Monitoring: Track API usage and costs in the Token Usage tab
Edit `backend/config/entity_map.json` to customize entity replacements:

```json
{
  "companies": {
    "Acme Corp": "COMPANY_001"
  },
  "people": {
    "John Smith": "PERSON_001"
  }
}
```

Edit `backend/app/services/token_monitor.py` to update pricing:
```python
MODEL_PRICING = {
    "gpt-4o": {"input": 0.005, "output": 0.015},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
    # Add more models...
}
```

- Batch processing for multiple documents
- Custom LLM model selection
- Integration with document management systems
- Advanced visualization dashboards
- API endpoints for programmatic access
- Multi-language support
- Document analysis and extraction pipeline
- LLM integration and prompt engineering
- Frontend development and UX
- Local LLM support (Ollama integration - separate branch)
This project is for educational and demonstration purposes.
- OpenAI for GPT models
- Streamlit for the rapid UI framework
- pdfplumber for PDF processing capabilities