An AI-powered tool that reads long financial documents (PDFs), pulls out key numbers and risks, and produces clear, structured, editable first-draft summaries for credit analysts.

Reads documents → Produces a clear first draft
- Python 3.9 or higher
- OpenAI API key
- Clone the repository

  ```bash
  git clone <repository-url>
  cd Memo_Test
  ```
- Install dependencies

  ```bash
  cd backend
  pip install -r requirements.txt
  ```
- ⚠️ IMPORTANT: Set up environment variables

  Create a `.env` file inside the `backend` folder (not in the root directory):

  ```bash
  # Navigate to the backend folder
  cd backend

  # Create the .env file
  # Windows (PowerShell)
  New-Item .env
  # macOS/Linux
  touch .env
  ```

  Add your OpenAI API key to the `.env` file:

  ```
  OPENAI_API_KEY=your_openai_api_key_here
  ```

  Note: The `.env` file must be at `Memo_Test/backend/.env` for the application to work correctly.
- Run the application

  ```bash
  # Make sure you're in the backend folder
  cd backend
  streamlit run frontend.py
  ```
- Access the app

  Open your browser and navigate to the local URL shown in the shell (Streamlit defaults to http://localhost:8501).
Sample test files are available in the backend/uploads/ folder:
| File | Description |
|---|---|
| final_testcase.pdf | Main test document (standard financial PDF) |
| test3.pdf | Password-protected document (for testing the password feature) |

Note: When testing test3.pdf, check the "🔒 PDF is password protected" checkbox and enter the password.
Credit analysts and financial professionals spend significant time manually reviewing financial documents (annual reports, balance sheets, income statements) to:
- Extract key metrics and trends
- Identify potential risks
- Generate executive summaries for stakeholders
This process is:
- Time-consuming: Manual review of lengthy documents
- Error-prone: Human oversight can miss critical details
- Inconsistent: Different analysts may interpret data differently
- Expensive: Requires skilled professionals for thorough analysis
This tool automates the financial document analysis workflow:
| Input | Process | Output |
|---|---|---|
| Upload a PDF | AI extracts & analyzes | Structured, editable draft |
Example builds:
- ✅ Upload a PDF and generate a one-page executive summary
- ✅ Highlight key numbers and show the source page
- ✅ Export the draft to Markdown or Word for editing
Upload a PDF with multiple financial statements → tool outputs:
| Component | Description |
|---|---|
| 5-bullet executive summary | Key highlights from the document |
| Key metrics table | Financial metrics with trends (🟢↑ / 🔴↓) |
| "Top 3 risks" section | Identified risks with severity ratings |
| Source tracing | Highlight a sentence → see which PDF page it came from |
| Confidence tags | ✅ Strong data |
Upload PDF → Click "Generate Memo" → Review → Edit → Download
- Upload PDF - Drag & drop or select financial document
- Generate Memo - AI extracts data and generates summary
- Review - Check executive summary, risks, and metrics
- Validate - Secondary AI validates accuracy with confidence scores
- Edit - Refine using chat Q&A or regenerate with feedback
- Download - Export to Markdown or Text for further editing
- Table Extraction: Automatically detects and extracts financial tables with headers and data
- Text Extraction: Extracts paragraphs with section headings and page references
- Password Support: Handles password-protected PDF documents
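pdfplumber's `extract_tables()` returns each table as a list of rows, where each row is a list of cell strings (or `None` for empty cells). A minimal helper, hypothetical and not taken from this repo, showing how such raw rows can be normalized into header-keyed records before analysis:

```python
def rows_to_records(table):
    """Convert a pdfplumber-style table (list of rows, first row = headers)
    into a list of {header: value} dicts, skipping columns with blank headers."""
    if not table:
        return []
    headers = [(h or "").strip() for h in table[0]]
    records = []
    for row in table[1:]:
        # Pair each cell with its header; None cells become empty strings
        record = {h: (cell or "").strip() for h, cell in zip(headers, row) if h}
        records.append(record)
    return records

# Example: a table shaped like extract_tables() output
raw = [
    ["Metric", "2022", "2023"],
    ["Revenue", "1,200", "1,450"],
    ["Net Income", None, "210"],
]
print(rows_to_records(raw))  # two records keyed by "Metric", "2022", "2023"
```

Keeping values as strings at this stage preserves formatting like thousands separators; numeric parsing can happen later, once a cell is confirmed to hold a number.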
- Automatically identifies key financial metrics from extracted tables
- Shows trends (🟢 Increase / 🔴 Decrease) for each metric
- Provides explanations with source references
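The trend markers above can be derived from two consecutive values of a metric. A small illustrative helper (not the project's actual code) for picking the indicator:

```python
def trend_indicator(previous, current):
    """Return a trend marker for a metric: 🟢 for an increase,
    🔴 for a decrease, ➖ when unchanged."""
    if current > previous:
        return "🟢 Increase"
    if current < previous:
        return "🔴 Decrease"
    return "➖ No change"

print(trend_indicator(1200, 1450))  # 🟢 Increase
```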
- Generates configurable number of summary bullet points (3-10)
- Categories: Financial Performance, Revenue, Profitability, Cash Flow, etc.
- Confidence indicators (High/Medium/Low) for each point
- Page references for traceability
- Identifies and categorizes financial risks
- Severity ratings (High/Medium/Low) with strict criteria
- Evidence-based risk descriptions
- Categories: Liquidity, Credit, Operational, Compliance, etc.
- Optional data anonymization before sending to LLM
- Replaces company names, people, and products with tokens
- Automatically restores original names in final output
- Configurable entity mappings
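The token replace-and-restore cycle can be sketched as follows. This is an illustrative simplification (plain string replacement driven by an `entity_map.json`-style mapping), not the repo's `pseudonymizer.py` itself:

```python
import json

def pseudonymize(text, entity_map):
    """Replace real names with tokens before sending text to the LLM."""
    mapping = {}  # token -> original, kept for later restoration
    for group in entity_map.values():          # e.g. "companies", "people"
        for original, token in group.items():
            if original in text:
                text = text.replace(original, token)
                mapping[token] = original
    return text, mapping

def restore(text, mapping):
    """Put the original names back into the LLM output."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

# Mapping in the same shape as backend/config/entity_map.json
entity_map = json.loads("""
{
  "companies": {"Acme Corp": "COMPANY_001"},
  "people": {"John Smith": "PERSON_001"}
}
""")

masked, mapping = pseudonymize("Acme Corp was audited by John Smith.", entity_map)
print(masked)                     # COMPANY_001 was audited by PERSON_001.
print(restore(masked, mapping))   # Acme Corp was audited by John Smith.
```

A production version would also need to handle case variants and partial-name mentions, which plain `str.replace` does not.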
- Conversational interface for querying document data
- Context-aware responses based on extracted content
- Suggested questions for quick exploration
- Conversation history for multi-turn interactions
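Context-aware, multi-turn behavior typically comes down to how the message list sent to the LLM is assembled. A hypothetical sketch of that assembly (the real `chat_service.py` may differ): the extracted document content goes in a system prompt, prior turns follow, and the new question comes last.

```python
def build_chat_messages(document_context, history, question):
    """Assemble an OpenAI-style messages list: a system prompt carrying the
    extracted document content, prior turns, then the new user question."""
    messages = [{
        "role": "system",
        "content": "Answer questions using only this document content:\n"
                   + document_context,
    }]
    messages.extend(history)  # prior {"role": ..., "content": ...} turns
    messages.append({"role": "user", "content": question})
    return messages

msgs = build_chat_messages(
    "Revenue grew 12% in 2023.",
    [{"role": "user", "content": "What year is covered?"},
     {"role": "assistant", "content": "2023."}],
    "How much did revenue grow?",
)
print(len(msgs))  # 4
```

The resulting list is what would be passed as `messages` to a chat-completions call.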
- Uses a different AI model (GPT-4o) to validate primary outputs
- Scores each summary point and risk on:
- Factual Grounding (1-5)
- Numeric Accuracy (1-5)
- Coherence (1-5)
- Identifies critical issues: Hallucinations, Numeric Errors, Unsupported Claims
- Provides correction suggestions
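One way such validator output can be consumed downstream is to flag any point whose scores fall below a threshold or that carries critical issues. The record shape and threshold here are illustrative assumptions, not the repo's actual schema:

```python
def summarize_validation(results, threshold=3):
    """Collect memo points whose 1-5 scores (factual grounding, numeric
    accuracy, coherence) fall below the threshold, or that carry issues."""
    flagged = []
    for item in results:
        low = {k: v for k, v in item["scores"].items() if v < threshold}
        if low or item.get("issues"):
            flagged.append({"point": item["point"],
                            "low_scores": low,
                            "issues": item.get("issues", [])})
    return flagged

results = [
    {"point": "Revenue rose 12%.",
     "scores": {"factual_grounding": 5, "numeric_accuracy": 5, "coherence": 5},
     "issues": []},
    {"point": "Debt fell 80%.",
     "scores": {"factual_grounding": 2, "numeric_accuracy": 1, "coherence": 4},
     "issues": ["Numeric Error"]},
]
print(summarize_validation(results))  # only the second point is flagged
```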
- Regenerate memos incorporating validation feedback
- LLM uses identified issues to produce improved output
- Iterative refinement workflow
- Real-time tracking of API token consumption
- Cost estimation based on model pricing
- Breakdown by model and service
- Session history and downloadable logs
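Assuming the prices in `token_monitor.py` are USD per 1,000 tokens (consistent with current OpenAI list pricing), cost estimation for a single call is a straightforward weighted sum. A minimal sketch:

```python
MODEL_PRICING = {  # USD per 1K tokens (values as configured in token_monitor.py)
    "gpt-4o": {"input": 0.005, "output": 0.015},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
}

def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one API call from its token counts."""
    price = MODEL_PRICING[model]
    return (input_tokens / 1000) * price["input"] \
         + (output_tokens / 1000) * price["output"]

# e.g. an 8,000-token prompt with a 1,200-token completion on gpt-4o-mini
print(round(estimate_cost("gpt-4o-mini", 8000, 1200), 6))  # 0.00192
```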
- Generate comprehensive Markdown reports
- Includes executive summary, risks, and key metrics
- Download as `.md` or `.txt` files
- Professional formatting with trend indicators
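Assembling the exported Markdown from the memo components can be sketched like this. The section names and record shapes are assumptions for illustration; the repo's actual export code may structure the report differently:

```python
def build_report(summary_points, risks, metrics):
    """Assemble a Markdown report from memo components."""
    lines = ["# Credit Memo", "", "## Executive Summary"]
    lines += [f"- {p}" for p in summary_points]
    lines += ["", "## Top Risks"]
    lines += [f"- **{r['severity']}**: {r['text']}" for r in risks]
    lines += ["", "## Key Metrics",
              "| Metric | Value | Trend |", "|---|---|---|"]
    lines += [f"| {m['name']} | {m['value']} | {m['trend']} |" for m in metrics]
    return "\n".join(lines)

report = build_report(
    ["Revenue grew 12% year over year."],
    [{"severity": "High", "text": "Rising short-term debt."}],
    [{"name": "Revenue", "value": "1,450", "trend": "🟢"}],
)
print(report.splitlines()[0])  # # Credit Memo
```

Because the output is plain Markdown, saving it as `.txt` instead of `.md` is just a filename change.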
- Alternative branch with Ollama integration
- Runs inference locally without cloud API calls
- Same functionality as cloud-based analysis
| Component | Technology |
|---|---|
| Frontend | Streamlit |
| PDF Processing | pdfplumber |
| Data Handling | pandas, numpy |
| LLM Provider | OpenAI (GPT-4o-mini, GPT-4o) |
| Environment | python-dotenv |
```
Memo_Test/
├── README.md
└── backend/
    ├── frontend.py                      # Streamlit UI application
    ├── requirements.txt                 # Python dependencies
    ├── .env                             # Environment variables (create this)
    ├── app/
    │   ├── __init__.py
    │   └── services/
    │       ├── __init__.py
    │       ├── pdf_table_extractor.py   # Table extraction service
    │       ├── pdf_text_extractor.py    # Text extraction service
    │       ├── pseudonymizer.py         # Data anonymization service
    │       ├── llm_insights.py          # Key metrics analysis
    │       ├── generate_memo.py         # Memo generation service
    │       ├── chat_service.py          # Document Q&A chat
    │       ├── secondary_validator.py   # Output validation service
    │       └── token_monitor.py         # Token usage tracking
    ├── config/
    │   ├── entity_map.json              # Pseudonymization mappings
    │   └── mapping_audit.json           # Audit trail for mappings
    ├── uploads/                         # Uploaded PDF storage
    └── logs/                            # Token usage logs
```
1. Upload a PDF: Select a financial document (supports password-protected files)
2. Generate Summary: Click "Generate Summary" to extract tables and text
3. View Extracted Data:
   - Tables Tab: View extracted financial tables
   - Text Tab: View extracted text paragraphs
4. Generate Memo:
   - Go to the "AI Insights" tab
   - Configure the number of summary points and risks
   - Optionally enable pseudonymization
   - Click "Generate Memo"
5. Validate Output:
   - Go to the "Validation" tab
   - Click "Validate Memo" to check accuracy
   - Review scores and critical issues
6. Iterate if Needed:
   - Use "Regenerate Memo" to improve based on feedback
7. Export Report:
   - Go to the "Export" tab
   - Download as Markdown or Text
- Chat Q&A: Use the Chat tab to ask specific questions about the document
- Key Metrics: View extracted metrics with trend analysis in the Key Metrics tab
- Token Monitoring: Track API usage and costs in the Token Usage tab
Edit `backend/config/entity_map.json` to customize entity replacements:

```json
{
  "companies": {
    "Acme Corp": "COMPANY_001"
  },
  "people": {
    "John Smith": "PERSON_001"
  }
}
```

Edit `backend/app/services/token_monitor.py` to update pricing:
```python
MODEL_PRICING = {
    "gpt-4o": {"input": 0.005, "output": 0.015},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
    # Add more models...
}
```

- Batch processing for multiple documents
- Custom LLM model selection
- Integration with document management systems
- Advanced visualization dashboards
- API endpoints for programmatic access
- Multi-language support
- Document analysis and extraction pipeline
- LLM integration and prompt engineering
- Frontend development and UX
- Local LLM support (Ollama integration - separate branch)
This project is for educational and demonstration purposes.
- OpenAI for GPT models
- Streamlit for the rapid UI framework
- pdfplumber for PDF processing capabilities