Credit Card Statement Parser (India)

Objective

This project extracts key financial details from Indian credit card statements (PDFs) issued by five major banks (HDFC, ICICI, SBI, Axis, and Kotak) using text-based PDF parsing.


Features

  • Supports real-world, text-based PDF statement formats (scanned, image-only PDFs are not yet supported).
  • Detects issuer automatically using fuzzy matching.
  • Extracts key data points:
    • Card last 4 digits
    • Card variant (type)
    • Billing cycle (start and end dates)
    • Payment due date
    • Total amount due
  • Returns structured JSON output.
  • Easy to extend to new banks and formats.

Supported Banks

Bank                  Example Card Type
HDFC Bank             Regalia Gold
ICICI Bank            Coral Credit Card
SBI Card              SimplySAVE
Axis Bank             Flipkart Axis
Kotak Mahindra Bank   Royale Signature

Project Structure

CreditParser/
│
├── india_creditcard_parser_realworld.py   # Main parser script
├── HDFC_Statement_Test.pdf                # Sample test statement
├── ICICI_Statement_Test.pdf
├── SBI_Statement_Test.pdf
├── Axis_Statement_Test.pdf
├── Kotak_Statement_Test.pdf
├── requirements.txt                       # Python dependencies
└── README.md                              # Documentation

Setup Instructions

Prerequisites

  • Python 3.10+
  • Recommended: VS Code or any Python IDE

Install

  1. Install Tesseract for Windows:

    • Download and run an installer (for example, the UB Mannheim build) and note the installation path (commonly C:\Program Files\Tesseract-OCR\tesseract.exe).

  2. Create a virtual environment and install the Python packages:

python -m venv .venv
.\.venv\Scripts\Activate
pip install -r requirements.txt

Run the Parser

Run on a single PDF:

python india_creditcard_parser_realworld.py HDFC_Statement_Test.pdf

Example output:

{
  "issuer": "HDFC",
  "card_last4": "5678",
  "card_variant": "Regalia Gold",
  "billing_cycle_start": "01 Sep 2025",
  "billing_cycle_end": "30 Sep 2025",
  "payment_due_date": "20 Oct 2025",
  "total_amount_due": "23,542.67"
}
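
The parser returns every value as a string, so downstream code will usually want to convert amounts and dates before doing arithmetic. The snippet below is a minimal sketch of that step, assuming the JSON has exactly the shape shown above; the variable names are illustrative and not part of the project. Note that `%b` relies on English month abbreviations, which holds in the default C locale.

```python
import json
from datetime import datetime
from decimal import Decimal

# Example output copied from this README.
raw = """
{
  "issuer": "HDFC",
  "card_last4": "5678",
  "card_variant": "Regalia Gold",
  "billing_cycle_start": "01 Sep 2025",
  "billing_cycle_end": "30 Sep 2025",
  "payment_due_date": "20 Oct 2025",
  "total_amount_due": "23,542.67"
}
"""

DATE_FORMAT = "%d %b %Y"  # e.g. "20 Oct 2025"

record = json.loads(raw)

# Strip thousands separators before converting; Decimal avoids float rounding.
amount_due = Decimal(record["total_amount_due"].replace(",", ""))

due_date = datetime.strptime(record["payment_due_date"], DATE_FORMAT).date()
cycle_start = datetime.strptime(record["billing_cycle_start"], DATE_FORMAT).date()
cycle_end = datetime.strptime(record["billing_cycle_end"], DATE_FORMAT).date()

# Inclusive length of the billing cycle in days.
cycle_days = (cycle_end - cycle_start).days + 1

print(amount_due, due_date.isoformat(), cycle_days)
```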

Save Output to File

python india_creditcard_parser_realworld.py HDFC_Statement_Test.pdf > HDFC_output.json
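
To process several statements in one go, the same command can be driven from a small Python script. This is a sketch only: `output_name` and `run_batch` are hypothetical helpers, and it assumes the parser prints its JSON to standard output, as the redirect above implies.

```python
import subprocess
import sys
from pathlib import Path

PARSER_SCRIPT = "india_creditcard_parser_realworld.py"


def output_name(pdf_path):
    """Map e.g. HDFC_Statement_Test.pdf to HDFC_output.json."""
    issuer_prefix = Path(pdf_path).stem.split("_")[0]
    return f"{issuer_prefix}_output.json"


def run_batch(pdf_paths, script=PARSER_SCRIPT):
    """Run the parser once per PDF and save its stdout next to the script."""
    for pdf in pdf_paths:
        result = subprocess.run(
            [sys.executable, script, str(pdf)],
            capture_output=True,
            text=True,
            check=True,  # raise if the parser exits with an error
        )
        Path(output_name(pdf)).write_text(result.stdout, encoding="utf-8")


# Usage (run from the project folder):
# run_batch(sorted(Path(".").glob("*_Statement_Test.pdf")))
```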

Sample Output Summary

Issuer   Card Type           Total Amount Due   Payment Due Date
HDFC     Regalia Gold        ₹23,542.67         20 Oct 2025
ICICI    Coral Credit Card   ₹15,870.45         18 Oct 2025
SBI      SimplySAVE          ₹9,230.00          22 Oct 2025
Axis     Flipkart Axis       ₹18,520.70         25 Oct 2025
Kotak    Royale Signature    ₹12,670.50         17 Oct 2025

How It Works

  1. PDF Extraction: Uses pdfplumber to extract text.
  2. Issuer Detection: Matches bank names using fuzzy string matching.
  3. Regex Matching: Extracts important values using flexible patterns.
  4. JSON Output: Returns structured results for easy use in apps or APIs.
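
The four steps above can be sketched roughly as follows. This is not the project's actual code: the issuer phrases, regex patterns, function names, and the use of `difflib` for fuzzy matching are all assumptions made for illustration, and only three of the output fields are covered. `pdfplumber` is imported inside `extract_text` so the rest of the sketch runs without it.

```python
import difflib
import json
import re

# Hypothetical phrases to look for; the real script may use different ones.
ISSUER_PHRASES = {
    "HDFC": "hdfc bank",
    "ICICI": "icici bank",
    "SBI": "sbi card",
    "Axis": "axis bank",
    "Kotak": "kotak mahindra bank",
}

# Hypothetical patterns; real statements vary and need per-bank tuning.
FIELD_PATTERNS = {
    "card_last4": r"[Xx*]{4}[\s-]?(\d{4})\b",
    "payment_due_date": r"payment\s+due\s+date\s*:?\s*(\d{1,2}\s+[A-Za-z]{3}\s+\d{4})",
    "total_amount_due": r"total\s+amount\s+due\s*:?\s*(?:₹|rs\.?|inr)?\s*([\d,]+\.\d{2})",
}


def extract_text(pdf_path):
    """Step 1: pull text from every page of a text-based PDF."""
    import pdfplumber  # third-party; only needed when reading real PDFs

    with pdfplumber.open(pdf_path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def detect_issuer(text, cutoff=0.8):
    """Step 2: compare each issuer phrase against sliding word windows."""
    words = re.findall(r"[a-z]+", text.lower())
    best_issuer, best_score = None, 0.0
    for issuer, phrase in ISSUER_PHRASES.items():
        size = len(phrase.split())
        for i in range(len(words) - size + 1):
            window = " ".join(words[i:i + size])
            score = difflib.SequenceMatcher(None, phrase, window).ratio()
            if score > best_score:
                best_issuer, best_score = issuer, score
    return best_issuer if best_score >= cutoff else None


def parse_statement_text(text):
    """Steps 3 and 4: apply the regexes and collect a JSON-ready dict."""
    result = {"issuer": detect_issuer(text)}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        result[field] = match.group(1) if match else None
    return result


# Made-up statement text with a deliberate typo ("Bnk") in the bank name.
sample_text = (
    "HDFC Bnk Credit Card Statement\n"
    "Card No: XXXX XXXX XXXX 5678\n"
    "Payment Due Date: 20 Oct 2025\n"
    "Total Amount Due: Rs. 23,542.67\n"
)
parsed = parse_statement_text(sample_text)
print(json.dumps(parsed, indent=2, ensure_ascii=False))
```

For a real file, the text would come from `extract_text("HDFC_Statement_Test.pdf")` instead of the inline sample. The cutoff of 0.8 is an arbitrary starting point; a lower value tolerates more extraction noise but raises the chance of picking the wrong bank.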

Future Enhancements

  • Add OCR support for scanned PDFs using pytesseract.
  • Train an ML model to auto-detect the bank format.
  • Add CSV/Excel export of parsed data.
  • Integrate with financial dashboards.

Author

Shreya Patil
Project: Credit Card Statement Parser (India)
