Skip to content

adarshnaik1/FinScribe

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

20 Commits
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

๐Ÿ“„ Credit Memo Auto-Generator

An AI-powered tool that reads financial documents and produces clear, structured first-draft summaries for credit analysts.


๐ŸŽฏ Problem Statement

Credit Memo Auto-Generator

Reads documents โ†’ Produces a clear first draft

What it is

A tool that reads long financial documents and writes a clear first-draft summary for humans.

In simple terms

AI reads PDFs, pulls key numbers and risks, and produces a structured, editable draft.

Think of it like

ChatGPT that reads PDFs and writes drafts.


๏ฟฝ Quick Setup

Prerequisites

  • Python 3.9 or higher
  • OpenAI API key

Installation Steps

  1. Clone the repository

    git clone <repository-url>
    cd Memo_Test
  2. Install dependencies

    cd backend
    pip install -r requirements.txt
  3. โš ๏ธ IMPORTANT: Set up environment variables

    Create a .env file inside the backend folder (not in the root directory):

    # Navigate to backend folder
    cd backend
    
    # Create .env file
    # Windows (PowerShell)
    New-Item .env
    
    # macOS/Linux
    touch .env

    Add your OpenAI API key to the .env file:

    OPENAI_API_KEY=your_openai_api_key_here

    Note: The .env file must be in Memo_Test/backend/.env for the application to work correctly.

  4. Run the application

    # Make sure you're in the backend folder
    cd backend
    streamlit run frontend.py
  5. Access the app

    Open your browser and navigate to the port where it is running ( will be shown in the shell)


๐Ÿงช Test Files

Sample test files are available in the backend/uploads/ folder:

File Description
final_testcase.pdf Main test document - standard financial PDF
test3.pdf Password-protected document (for testing password feature)

Note: When testing test3.pdf, check the "๐Ÿ”’ PDF is password protected" checkbox and enter the password.


๐Ÿ’ก The Challenge

Credit analysts and financial professionals spend significant time manually reviewing financial documents (annual reports, balance sheets, income statements) to:

  • Extract key metrics and trends
  • Identify potential risks
  • Generate executive summaries for stakeholders

This process is:

  • Time-consuming: Manual review of lengthy documents
  • Error-prone: Human oversight can miss critical details
  • Inconsistent: Different analysts may interpret data differently
  • Expensive: Requires skilled professionals for thorough analysis

Our Solution

This tool automates the financial document analysis workflow:

Input Process Output
Upload a PDF AI extracts & analyzes Structured, editable draft

Example builds:

  • โœ… Upload a PDF and generate a one-page executive summary
  • โœ… Highlight key numbers and show the source page
  • โœ… Export the draft to Markdown or Word for editing

๐Ÿ“‹ Example Output

Upload a PDF with multiple financial statements โ†’ tool outputs:

Component Description
5-bullet executive summary Key highlights from the document
Key metrics table Financial metrics with trends (๐ŸŸขโ†‘ / ๐Ÿ”ดโ†“)
"Top 3 risks" section Identified risks with severity ratings
Source tracing Highlight a sentence โ†’ see which PDF page it came from
Confidence tags โœ… Strong data | โš ๏ธ Incomplete data

๐Ÿ”„ User Flow

Upload PDF โ†’ Click "Generate Memo" โ†’ Review โ†’ Edit โ†’ Download
  1. Upload PDF - Drag & drop or select financial document
  2. Generate Memo - AI extracts data and generates summary
  3. Review - Check executive summary, risks, and metrics
  4. Validate - Secondary AI validates accuracy with confidence scores
  5. Edit - Refine using chat Q&A or regenerate with feedback
  6. Download - Export to Markdown or Text for further editing

โœจ Features

๐Ÿ“Š PDF Data Extraction

  • Table Extraction: Automatically detects and extracts financial tables with headers and data
  • Text Extraction: Extracts paragraphs with section headings and page references
  • Password Support: Handles password-protected PDF documents

๐Ÿค– AI-Powered Analysis

Key Metrics Identification

  • Automatically identifies key financial metrics from extracted tables
  • Shows trends (๐ŸŸข Increase / ๐Ÿ”ด Decrease) for each metric
  • Provides explanations with source references

Executive Summary Generation

  • Generates configurable number of summary bullet points (3-10)
  • Categories: Financial Performance, Revenue, Profitability, Cash Flow, etc.
  • Confidence indicators (High/Medium/Low) for each point
  • Page references for traceability

Key Risks Assessment

  • Identifies and categorizes financial risks
  • Severity ratings (High/Medium/Low) with strict criteria
  • Evidence-based risk descriptions
  • Categories: Liquidity, Credit, Operational, Compliance, etc.

๐Ÿ”’ Privacy & Security

Pseudonymization

  • Optional data anonymization before sending to LLM
  • Replaces company names, people, and products with tokens
  • Automatically restores original names in final output
  • Configurable entity mappings

๐Ÿ’ฌ Document Q&A Chat

  • Conversational interface for querying document data
  • Context-aware responses based on extracted content
  • Suggested questions for quick exploration
  • Conversation history for multi-turn interactions

โœ… Secondary Validation

  • Uses a different AI model (GPT-4o) to validate primary outputs
  • Scores each summary point and risk on:
    • Factual Grounding (1-5)
    • Numeric Accuracy (1-5)
    • Coherence (1-5)
  • Identifies critical issues: Hallucinations, Numeric Errors, Unsupported Claims
  • Provides correction suggestions

๐Ÿ”„ Feedback-Driven Regeneration

  • Regenerate memos incorporating validation feedback
  • LLM uses identified issues to produce improved output
  • Iterative refinement workflow

๐Ÿ’ฐ Token Usage Monitoring

  • Real-time tracking of API token consumption
  • Cost estimation based on model pricing
  • Breakdown by model and service
  • Session history and downloadable logs

๐Ÿ“ฅ Export Reports

  • Generate comprehensive Markdown reports
  • Includes executive summary, risks, and key metrics
  • Download as .md or .txt files
  • Professional formatting with trend indicators

๐Ÿ–ฅ๏ธ Local LLM Support

  • Alternative branch with Ollama integration
  • Runs inference locally without cloud API calls
  • Same functionality as cloud-based analysis

๐Ÿ› ๏ธ Tech Stack

Component Technology
Frontend Streamlit
PDF Processing pdfplumber
Data Handling pandas, numpy
LLM Provider OpenAI (GPT-4o-mini, GPT-4o)
Environment python-dotenv

Project Structure

Memo_Test/
โ”œโ”€โ”€ README.md
โ””โ”€โ”€ backend/
    โ”œโ”€โ”€ frontend.py              # Streamlit UI application
    โ”œโ”€โ”€ requirements.txt         # Python dependencies
    โ”œโ”€โ”€ .env                     # Environment variables (create this)
    โ”œโ”€โ”€ app/
    โ”‚   โ”œโ”€โ”€ __init__.py
    โ”‚   โ””โ”€โ”€ services/
    โ”‚       โ”œโ”€โ”€ __init__.py
    โ”‚       โ”œโ”€โ”€ pdf_table_extractor.py   # Table extraction service
    โ”‚       โ”œโ”€โ”€ pdf_text_extractor.py    # Text extraction service
    โ”‚       โ”œโ”€โ”€ pseudonymizer.py         # Data anonymization service
    โ”‚       โ”œโ”€โ”€ llm_insights.py          # Key metrics analysis
    โ”‚       โ”œโ”€โ”€ generate_memo.py         # Memo generation service
    โ”‚       โ”œโ”€โ”€ chat_service.py          # Document Q&A chat
    โ”‚       โ”œโ”€โ”€ secondary_validator.py   # Output validation service
    โ”‚       โ””โ”€โ”€ token_monitor.py         # Token usage tracking
    โ”œโ”€โ”€ config/
    โ”‚   โ”œโ”€โ”€ entity_map.json      # Pseudonymization mappings
    โ”‚   โ””โ”€โ”€ mapping_audit.json   # Audit trail for mappings
    โ”œโ”€โ”€ uploads/                 # Uploaded PDF storage
    โ””โ”€โ”€ logs/                    # Token usage logs

๐Ÿš€ Usage Guide

Basic Workflow

  1. Upload a PDF: Select a financial document (supports password-protected files)

  2. Generate Summary: Click "Generate Summary" to extract tables and text

  3. View Extracted Data:

    • Tables Tab: View extracted financial tables
    • Text Tab: View extracted text paragraphs
  4. Generate Memo:

    • Go to "AI Insights" tab
    • Configure number of summary points and risks
    • Optionally enable pseudonymization
    • Click "Generate Memo"
  5. Validate Output:

    • Go to "Validation" tab
    • Click "Validate Memo" to check accuracy
    • Review scores and critical issues
  6. Iterate if Needed:

    • Use "Regenerate Memo" to improve based on feedback
  7. Export Report:

    • Go to "Export" tab
    • Download as Markdown or Text

Advanced Features

  • Chat Q&A: Use the Chat tab to ask specific questions about the document
  • Key Metrics: View extracted metrics with trend analysis in the Key Metrics tab
  • Token Monitoring: Track API usage and costs in the Token Usage tab

โš™๏ธ Configuration

Entity Mappings (Pseudonymization)

Edit backend/config/entity_map.json to customize entity replacements:

{
  "companies": {
    "Acme Corp": "COMPANY_001"
  },
  "people": {
    "John Smith": "PERSON_001"
  }
}

Model Pricing (Token Monitor)

Edit backend/app/services/token_monitor.py to update pricing:

MODEL_PRICING = {
    "gpt-4o": {"input": 0.005, "output": 0.015},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
    # Add more models...
}

๐Ÿ”ฎ Future Enhancements

  • Batch processing for multiple documents
  • Custom LLM model selection
  • Integration with document management systems
  • Advanced visualization dashboards
  • API endpoints for programmatic access
  • Multi-language support

๐Ÿ‘ฅ Team

  • Document analysis and extraction pipeline
  • LLM integration and prompt engineering
  • Frontend development and UX
  • Local LLM support (Ollama integration - separate branch)

๐Ÿ“„ License

This project is for educational and demonstration purposes.


๐Ÿ™ Acknowledgments

  • OpenAI for GPT models
  • Streamlit for the rapid UI framework
  • pdfplumber for PDF processing capabilities

About

This application reads your financial reports and generates a credit memo highlighting key risks ,metrics and other important information leveraging the power of cloud based and Local LLMs

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages