A Python-based web application for automatically extracting key information and all transaction details from credit card statements using Streamlit and pdfplumber.
This project provides an automated solution for parsing PDF credit card statements from multiple issuers. It extracts:
Statement Summary (5 key data points):
- Card Variant/Type - The name of the credit card (e.g., Chase Sapphire Preferred)
- Card Number (Last 4) - Last four digits of the card number
- Billing Cycle - Statement period date range
- Payment Due Date - When payment is due
- Total Balance - Amount due on the statement
 
Transaction Details (NEW!):
- ✅ All transactions with date, description, and amount
- ✅ Transaction analytics (count, total spent, average)
- ✅ Export to CSV for further analysis
- ✅ Complete PDF text viewer
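
A minimal sketch of how the transaction analytics and CSV export could work, assuming each transaction carries a date string, a description, and a float amount. The `Transaction` class and the `summarize` and `to_csv` helpers are hypothetical, not the project's code; the project lists pandas for CSV export, while this sketch uses the standard-library `csv` module to stay dependency-free.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Transaction:
    """One statement line item (hypothetical shape for illustration)."""

    date: str
    description: str
    amount: float  # assumed: charges positive, payments/credits negative


def summarize(transactions: list[Transaction]) -> dict[str, float]:
    """Return the transaction count, total amount, and average per transaction."""
    count = len(transactions)
    total = round(sum(t.amount for t in transactions), 2)
    average = round(total / count, 2) if count else 0.0
    return {"count": count, "total_spent": total, "average": average}


def to_csv(transactions: list[Transaction]) -> str:
    """Serialize transactions to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["date", "description", "amount"])
    for t in transactions:
        writer.writerow([t.date, t.description, f"{t.amount:.2f}"])
    return buffer.getvalue()


txns = [
    Transaction("01/05", "COFFEE SHOP", 4.50),
    Transaction("01/07", "GROCERY STORE", 62.10),
]
print(summarize(txns))  # {'count': 2, 'total_spent': 66.6, 'average': 33.3}
```

In Streamlit, the string returned by `to_csv` could be handed to `st.download_button` with `mime="text/csv"`.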
 
Supported Issuers:
- American Express
- Chase
- Citibank
- Bank of America
- Discover
 
Key Features:
- Multi-Issuer Support: Modular parser design handles different statement formats
- Auto-Detection: Automatically identifies the credit card issuer from the statement text
- Transaction Extraction: Extracts all transactions with date, description, and amount
- Transaction Analytics: Shows count, total spent, and average per transaction
- Web Interface: User-friendly Streamlit interface for uploading files and viewing results
- Export Functionality: Download extracted data and transactions as CSV
- Raw Text Viewer: See the complete PDF text for verification
- Robust Parsing: Uses regex patterns and table extraction for accuracy
 
The project follows a modular design pattern:
credit_card_parser/
├── parsers/                    # PDF parsing logic package
│   ├── __init__.py            # Package initialization and factory functions
│   ├── base_parser.py         # Base parser class with common utilities
│   ├── amex_parser.py         # American Express specific parser
│   ├── chase_parser.py        # Chase specific parser
│   ├── citi_parser.py         # Citibank specific parser
│   ├── boa_parser.py          # Bank of America specific parser
│   ├── discover_parser.py     # Discover specific parser
│   └── utils.py               # Helper functions for text extraction
├── app.py                      # Streamlit web application
├── requirements.txt            # Python dependencies
└── README.md                   # This file

Design Principles:
- Strategy Pattern: Each issuer has a dedicated parser class implementing a common interface
- Separation of Concerns: UI logic (Streamlit) is kept separate from parsing logic
- Extensibility: Support for a new issuer is added by creating a new parser class
- Reusability: Common extraction utilities are shared across all parsers
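
The strategy and factory ideas above can be sketched in a few lines. `IssuerParser`, `ToyChaseParser`, `PARSERS`, and `get_parser` are hypothetical simplifications: the project's `StatementParser` appears (from the Wells Fargo example) to be constructed without arguments and defines more extraction methods than the single one shown here.

```python
from abc import ABC, abstractmethod
from typing import Optional


class IssuerParser(ABC):
    """Simplified, hypothetical stand-in for the project's StatementParser."""

    issuer_name = "Unknown"

    def __init__(self, text: str) -> None:
        self.text = text  # raw statement text shared by every extraction method

    @abstractmethod
    def extract_card_variant(self) -> Optional[str]:
        """Each issuer subclass supplies its own extraction logic."""


class ToyChaseParser(IssuerParser):
    issuer_name = "Chase"

    def extract_card_variant(self) -> Optional[str]:
        # Toy rule for illustration: look for one known product name.
        return "Chase Sapphire Preferred" if "Sapphire Preferred" in self.text else None


# Registry consulted by the factory (plays the role of PARSER_MAP).
PARSERS: dict[str, type[IssuerParser]] = {"Chase": ToyChaseParser}


def get_parser(issuer: str, text: str) -> IssuerParser:
    """Factory: look up the issuer's parser class and instantiate it."""
    try:
        parser_cls = PARSERS[issuer]
    except KeyError:
        raise ValueError(f"Unsupported issuer: {issuer}") from None
    return parser_cls(text)
```

Because callers only go through `get_parser`, supporting another issuer in this sketch means one new subclass plus one registry entry.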
 
Technology Stack:
- Streamlit: Web application framework for the user interface
- pdfplumber: PDF text and table extraction (handles complex layouts well)
- pandas: Data manipulation and CSV export
- pypdf: Additional PDF handling capabilities
- python-dateutil: Date parsing utilities
 
Prerequisites:
- Python 3.10 or higher
- uv - Fast Python package installer
 
Installation:
- Install uv (if not already installed):

# On macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or with Homebrew
brew install uv
# Or with pip
pip install uv

- Clone or download the project.
- Navigate to the project directory:

cd /path/to/creditprj

- Create a virtual environment and install dependencies:

# uv will automatically create a venv and install all dependencies
uv sync

This will:
- Create a virtual environment in .venv/
- Install all required packages from pyproject.toml
- Set up the project for development
 
Alternatively, if you prefer using pip:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

- Activate the virtual environment (if using uv sync):

source .venv/bin/activate  # On Windows: .venv\Scripts\activate

- Start the Streamlit app:
 
# With uv (automatically uses the project's virtual environment)
uv run streamlit run app.py
# Or if the venv is activated
streamlit run app.py

- Open your browser to the URL displayed (typically http://localhost:8501).
- Upload a PDF statement using the file uploader in the sidebar.
- Choose a detection method:
  - Auto-detect (recommended): Automatically identifies the issuer
  - Manual selection: Choose your issuer from the dropdown
- Click "Parse Statement" to extract the information.
- View the results displayed in the main panel.
- Download CSV (optional) to export the extracted data.
 
PDF Requirements:
- Digital PDFs only: Text-based statements (not scanned images)
- No password protection: PDFs must not be encrypted
- Standard formats: Works best with official bank-issued statements
- English language: Designed for English-language statements
 
How It Works:

PDF Text Extraction:
- Uses pdfplumber to extract text from all pages
- Handles multi-column layouts and tables effectively
- Cleans and normalizes the text for parsing
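
A sketch of the extraction step, assuming "cleaned and normalized" means collapsing repeated whitespace and dropping blank lines; the project's actual rules in `parsers/utils.py` are not shown, and both function names here are hypothetical. `pdfplumber.open`, `pdf.pages`, and `page.extract_text()` are real pdfplumber calls.

```python
import re


def normalize_text(raw: str) -> str:
    """Collapse runs of spaces/tabs and drop blank lines, keeping line breaks."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def extract_pdf_text(path: str) -> str:
    """Concatenate the text of every page (requires pdfplumber to be installed)."""
    import pdfplumber  # imported lazily so normalize_text works without it

    with pdfplumber.open(path) as pdf:
        # extract_text() may yield None or "" for pages without a text layer
        pages = [page.extract_text() or "" for page in pdf.pages]
    return normalize_text("\n".join(pages))
```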
 
Issuer Detection:
- Searches for issuer-specific keywords and patterns
- Matches against known issuer identifiers
- Falls back to manual selection if auto-detection fails
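
One way keyword-based detection could look, written as a hypothetical scoring function rather than the project's `detect_issuer`. The keyword lists are assumptions to be tuned against real statements; Discover uses multi-word phrases because the bare word "discover" can appear in any issuer's marketing text.

```python
from typing import Optional

# Hypothetical keyword lists; tune them against real statements.
ISSUER_KEYWORDS: dict[str, tuple[str, ...]] = {
    "American Express": ("american express", "amex"),
    "Chase": ("chase",),
    "Citibank": ("citibank", "citi cards"),
    "Bank of America": ("bank of america",),
    "Discover": ("discover card", "discover.com"),
}


def detect_issuer_by_keywords(text: str) -> Optional[str]:
    """Score each issuer by keyword hits; return the best, or None if nothing matches."""
    lowered = text.lower()
    scores = {
        issuer: sum(lowered.count(keyword) for keyword in keywords)
        for issuer, keywords in ISSUER_KEYWORDS.items()
    }
    best_issuer, best_score = max(scores.items(), key=lambda item: item[1])
    return best_issuer if best_score > 0 else None
```

The manual fallback then reduces to `detect_issuer_by_keywords(text) or selected_issuer`, where `selected_issuer` stands for the dropdown value.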
 
Each parser implements issuer-specific logic:
- Regex patterns for structured data (dates, amounts, card numbers)
- Keyword matching for field labels (which vary by issuer)
- Flexible search to handle format variations
- Fallback mechanisms when primary patterns don't match
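
The regex-plus-fallback idea can be sketched as an ordered list of patterns per field, tried until one matches. The label variants below ("New Balance", "Statement Balance", masked card numbers) are illustrative assumptions, not patterns taken from the project's parsers.

```python
import re
from typing import Optional

# Hypothetical label variants, tried in order from most to least specific.
BALANCE_PATTERNS = [
    r"New Balance[:\s]+\$?([\d,]+\.\d{2})",
    r"Statement Balance[:\s]+\$?([\d,]+\.\d{2})",
    r"Total Balance[:\s]+\$?([\d,]+\.\d{2})",
]
CARD_LAST4_PATTERNS = [
    r"(?:[X*]{4}[\s-]?){3}(\d{4})",  # masked number, e.g. XXXX XXXX XXXX 1234
    r"ending in\s+(\d{4})",  # prose, e.g. "Account ending in 1234"
]


def first_match(text: str, patterns: list[str]) -> Optional[str]:
    """Return the first captured group from the first pattern that matches."""
    for pattern in patterns:
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            return match.group(1)
    return None


def parse_amount(value: Optional[str]) -> Optional[float]:
    """Convert '1,234.56' to 1234.56; pass None through unchanged."""
    return float(value.replace(",", "")) if value is not None else None
```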
 
Output:
- A structured data object (StatementData)
- Clean display with metrics and summaries
- Export functionality for further analysis
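
The project's `StatementData` class is not shown, so the following is a hypothetical equivalent (`ParsedStatement`) holding the five summary fields and the transactions, with a helper that lists any summary fields left empty.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ParsedStatement:
    """Hypothetical result container: five summary fields plus transactions."""

    issuer: str
    card_variant: Optional[str] = None
    card_last4: Optional[str] = None
    billing_cycle: Optional[str] = None
    payment_due_date: Optional[str] = None
    total_balance: Optional[float] = None
    transactions: list[dict] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of summary fields the parser could not fill in."""
        summary = asdict(self)
        summary.pop("transactions")
        return [name for name, value in summary.items() if value is None]
```

A list of missing fields is a natural thing to surface in the UI, since extracted data should always be checked against the original statement.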
 
Adding a New Issuer:
- Create a new parser file in parsers/ (e.g., wells_fargo_parser.py)
- Inherit from StatementParser:

from .base_parser import StatementParser


class WellsFargoParser(StatementParser):
    def __init__(self):
        super().__init__()
        self.issuer_name = "Wells Fargo"

    # Implement required methods
    def extract_card_variant(self):
        # Your extraction logic
        pass

    # ... implement other methods

- Update parsers/__init__.py:

from .wells_fargo_parser import WellsFargoParser

PARSER_MAP = {
    # ... existing parsers
    'Wells Fargo': WellsFargoParser,
}

- Add detection patterns in the detect_issuer function
- Test with sample statements from the new issuer
 
Limitations:
- Digital PDFs only: Does not support scanned/image-based PDFs (no OCR)
- Encrypted PDFs: Password-protected files must be unlocked first
- Format variations: Accuracy depends on statement format consistency
- Language: Currently supports English statements only
- Local execution: Designed for local, single-user use (not cloud-deployed)
 
Potential improvements for future versions:
- OCR Support: Add Tesseract integration for scanned statements
- Transaction Categorization: Categorize the extracted transactions
- Password Handling: Support for encrypted PDFs with password input
- Batch Processing: Process multiple statements at once
- Data Visualization: Charts and graphs for spending analysis
- Database Storage: Save parsed data for historical tracking
- API Endpoint: RESTful API for programmatic access
- Additional Issuers: Expand support to more credit card companies
 
To test the parser:
- Obtain sample PDF statements from each supported issuer
- Upload them through the Streamlit interface
- Verify that all five summary fields and the transaction list are extracted correctly
- Check edge cases (different date formats, special characters, etc.)
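
For the date-format edge cases, a small normalizer that tries several formats in turn can be tested in isolation. The format list and the `parse_statement_date` name are assumptions; `python-dateutil` (already a dependency) offers `dateutil.parser.parse` as a more permissive alternative, while this sketch sticks to `datetime.strptime`. Month names (`%b`, `%B`) are matched using the current locale.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical formats seen across issuers; extend as new layouts appear.
DATE_FORMATS = ("%m/%d/%Y", "%m/%d/%y", "%b %d, %Y", "%B %d, %Y", "%Y-%m-%d")


def parse_statement_date(raw: str) -> Optional[date]:
    """Try each known format in turn; return None if none of them fit."""
    cleaned = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None
```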
 
Troubleshooting:

PDF won't parse or no text is extracted:
- Ensure the PDF is not password-protected
- Verify the PDF is digitally generated (not a scan)
- Try opening the PDF in a reader to confirm it contains selectable text
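
The checks above can also be scripted. This sketch assumes `pypdf` and `pdfplumber` are installed (both are listed dependencies); `diagnose_pdf`, `has_selectable_text`, and the 20-character threshold are hypothetical choices, not part of the project. `PdfReader(...).is_encrypted` is a real pypdf property.

```python
from typing import Optional


def has_selectable_text(page_texts: list[Optional[str]], min_chars: int = 20) -> bool:
    """Heuristic: treat the PDF as text-based if any page yields enough characters."""
    return any(len((text or "").strip()) >= min_chars for text in page_texts)


def diagnose_pdf(path: str) -> str:
    """Report whether a PDF is encrypted or appears to lack a text layer."""
    # Imported lazily so has_selectable_text works without these packages.
    import pdfplumber
    from pypdf import PdfReader

    if PdfReader(path).is_encrypted:
        return "encrypted: unlock the PDF before uploading"
    with pdfplumber.open(path) as pdf:
        texts = [page.extract_text() for page in pdf.pages]
    if not has_selectable_text(texts):
        return "no text layer: likely a scanned statement (OCR is not supported)"
    return "ok: text-based PDF"
```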
 
Issuer not detected:
- Use manual selection from the dropdown
- Check that the statement is from a supported issuer
- Verify the PDF contains the issuer's name as text (a logo image alone is not searchable)
 
Incorrect or missing data:
- Some statements may use non-standard formats
- Try updating the regex patterns in the relevant parser
- Report the issue with statement details so the patterns can be improved
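
When adjusting a pattern, it helps to see exactly how a label appears in the extracted text. The hypothetical helper below returns the surrounding characters for each case-insensitive occurrence of a label, with newlines shown as ` | `.

```python
import re


def label_context(text: str, label: str, width: int = 40) -> list[str]:
    """Return a snippet around each case-insensitive occurrence of `label`."""
    snippets = []
    for match in re.finditer(re.escape(label), text, flags=re.IGNORECASE):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        snippets.append(text[start:end].replace("\n", " | "))
    return snippets


sample = "Previous Balance $10.00\nNew Balance $25.50\nMinimum Payment $5.00"
for snippet in label_context(sample, "new balance", width=10):
    print(snippet)
```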
 
Development:

Install with dev dependencies:
uv sync --all-extras

Run tests (when available):
uv run pytest

Format and lint code:
uv run black .
uv run ruff check --fix .

Manage dependencies:
# Add a new dependency
uv add package-name
# Add a dev dependency
uv add --dev package-name

To contribute to this project:
- Fork the repository
- Create a feature branch
- Install with dev dependencies: uv sync --all-extras
- Add your enhancements or fixes
- Test thoroughly with sample statements
- Format code: uv run black . && uv run ruff check --fix .
- Submit a pull request with a detailed description
 
This project is provided as-is for educational and personal use.

Acknowledgments:
- Built with Streamlit
- PDF parsing powered by pdfplumber
- Inspired by real-world needs for automating financial document processing
 
Note: This tool is for personal use only. Always verify extracted data against original statements before making financial decisions.