This project extracts key financial details from Indian credit card statement PDFs issued by five major banks (HDFC, ICICI, SBI, Axis, and Kotak) using text-based PDF parsing.
- Supports real-world PDF statement formats (text-based).
- Detects issuer automatically using fuzzy matching.
- Extracts key data points:
  - Card last 4 digits
  - Card variant (type)
  - Billing cycle (start and end dates)
  - Payment due date
  - Total amount due
- Returns structured JSON output.
- Easy to extend to new banks and formats.
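As a rough illustration of the issuer-detection feature, here is a minimal sketch that uses the standard library's `difflib` for fuzzy matching. The alias table, the function names, and the 0.8 threshold are assumptions made for this example; the actual script may use a different library and keyword list.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical alias table; the project's real keyword list may differ.
ISSUER_ALIASES = {
    "HDFC": ["hdfc bank"],
    "ICICI": ["icici bank"],
    "SBI": ["sbi card"],
    "Axis": ["axis bank"],
    "Kotak": ["kotak mahindra bank"],
}


def best_window_ratio(alias: str, text: str) -> float:
    """Highest similarity between the alias and any word window of equal length."""
    words = text.lower().split()
    n = len(alias.split())
    best = 0.0
    for i in range(max(len(words) - n + 1, 0)):
        window = " ".join(words[i:i + n])
        best = max(best, SequenceMatcher(None, alias, window).ratio())
    return best


def detect_issuer(text: str, threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching issuer, or None if no alias reaches the threshold."""
    scores = {
        issuer: max(best_window_ratio(alias, text) for alias in aliases)
        for issuer, aliases in ISSUER_ALIASES.items()
    }
    issuer, score = max(scores.items(), key=lambda kv: kv[1])
    return issuer if score >= threshold else None


# A small typo ("Bamk") still resolves to an issuer because matching is fuzzy.
print(detect_issuer("HDFC Bamk Credit Card Statement"))
```

Under this sketch, supporting a new bank would mean adding one entry to the alias table, which is one way the "easy to extend" goal could be met.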
| Bank | Example Card Type |
|---|---|
| HDFC Bank | Regalia Gold |
| ICICI Bank | Coral Credit Card |
| SBI Card | SimplySAVE |
| Axis Bank | Flipkart Axis |
| Kotak Mahindra Bank | Royale Signature |
CreditParser/
│
├── india_creditcard_parser_realworld.py # Main parser script
├── HDFC_Statement_Test.pdf # Sample test statement
├── ICICI_Statement_Test.pdf
├── SBI_Statement_Test.pdf
├── Axis_Statement_Test.pdf
├── Kotak_Statement_Test.pdf
├── requirements.txt
└── README.md # Documentation
- Python 3.10+
- Recommended: VS Code or any Python IDE
- Install Tesseract for Windows:
  - Download and run an installer (for example the UB Mannheim build) and keep note of the installation path (commonly C:\Program Files\Tesseract-OCR\tesseract.exe).
- Create a virtual environment and install Python packages:
python -m venv .venv
.\.venv\Scripts\Activate
pip install -r requirements.txt
Run on a single PDF:
python india_creditcard_parser_realworld.py HDFC_Statement_Test.pdf
Example output:
{
"issuer": "HDFC",
"card_last4": "5678",
"card_variant": "Regalia Gold",
"billing_cycle_start": "01 Sep 2025",
"billing_cycle_end": "30 Sep 2025",
"payment_due_date": "20 Oct 2025",
"total_amount_due": "23,542.67"
}
To save the output to a file:
python india_creditcard_parser_realworld.py HDFC_Statement_Test.pdf > HDFC_output.json

| Issuer | Card Type | Total Amount Due | Payment Due Date |
|---|---|---|---|
| HDFC | Regalia Gold | ₹23,542.67 | 20 Oct 2025 |
| ICICI | Coral Credit Card | ₹15,870.45 | 18 Oct 2025 |
| SBI | SimplySAVE | ₹9,230.00 | 22 Oct 2025 |
| Axis | Flipkart Axis | ₹18,520.70 | 25 Oct 2025 |
| Kotak | Royale Signature | ₹12,670.50 | 17 Oct 2025 |
- PDF Extraction: Uses `pdfplumber` to extract text.
- Issuer Detection: Matches bank names using fuzzy logic.
- Regex Matching: Extracts important values using flexible patterns.
- JSON Output: Returns structured results for easy use in apps or APIs.
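The extraction, regex, and JSON steps above can be sketched roughly as follows. The patterns, the sample text, and the function names are hypothetical and written for illustration only; real statements differ by bank and would need per-issuer tuning. Issuer and card variant are left out to keep the sketch short.

```python
import json
import re

# Hypothetical patterns; labels and layouts vary between banks.
DATE = r"\d{1,2}[ /-][A-Za-z]{3}[ /-]\d{4}"
CARD_RE = re.compile(r"(?:[Xx*]{4}[\s-]*)+(\d{4})\b")
CYCLE_RE = re.compile(
    rf"(?:Statement Period|Billing Cycle)\s*:?\s*({DATE})\s*(?:to|-)\s*({DATE})",
    re.IGNORECASE,
)
DUE_RE = re.compile(rf"Payment Due Date\s*:?\s*({DATE})", re.IGNORECASE)
TOTAL_RE = re.compile(
    r"Total Amount Due\s*:?\s*(?:Rs\.?|INR|₹)?\s*([\d,]+\.\d{2})", re.IGNORECASE
)


def extract_text(pdf_path: str) -> str:
    """Concatenate the text of every page (requires pdfplumber to be installed)."""
    import pdfplumber  # imported lazily so the regex demo below runs without it

    with pdfplumber.open(pdf_path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def parse_fields(text: str) -> dict:
    """Apply each pattern; fields that are not found are reported as None."""
    card = CARD_RE.search(text)
    cycle = CYCLE_RE.search(text)
    due = DUE_RE.search(text)
    total = TOTAL_RE.search(text)
    return {
        "card_last4": card.group(1) if card else None,
        "billing_cycle_start": cycle.group(1) if cycle else None,
        "billing_cycle_end": cycle.group(2) if cycle else None,
        "payment_due_date": due.group(1) if due else None,
        "total_amount_due": total.group(1) if total else None,
    }


# Invented sample text standing in for what extract_text() might return.
SAMPLE_TEXT = """
HDFC Bank Credit Card Statement
Card No: 4386 XXXX XXXX 5678
Statement Period: 01 Sep 2025 to 30 Sep 2025
Payment Due Date: 20 Oct 2025
Total Amount Due: Rs. 23,542.67
"""

print(json.dumps(parse_fields(SAMPLE_TEXT), indent=2))
```

Returning `None` for a missing field, rather than raising, keeps the JSON shape stable across banks, which tends to make downstream use in apps or APIs simpler.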
- Add OCR support for scanned PDFs using `pytesseract`.
- Train ML model to auto-detect bank format.
- Add CSV/Excel export of parsed data.
- Integrate with financial dashboards.
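For the CSV export idea, a minimal sketch using only the standard library might look like the following. The `write_csv` helper is hypothetical and not part of the current script; the column names mirror the keys in the example JSON output.

```python
import csv
import io

# Column order mirrors the keys shown in the example JSON output.
FIELDS = [
    "issuer",
    "card_last4",
    "card_variant",
    "billing_cycle_start",
    "billing_cycle_end",
    "payment_due_date",
    "total_amount_due",
]


def write_csv(records, stream) -> None:
    """Write parsed statement dicts as CSV rows; missing keys become empty cells."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)


# Demo with the record from the example output, written to an in-memory buffer.
records = [
    {
        "issuer": "HDFC",
        "card_last4": "5678",
        "card_variant": "Regalia Gold",
        "billing_cycle_start": "01 Sep 2025",
        "billing_cycle_end": "30 Sep 2025",
        "payment_due_date": "20 Oct 2025",
        "total_amount_due": "23,542.67",
    }
]
buffer = io.StringIO()
write_csv(records, buffer)
print(buffer.getvalue())
```

Amounts such as `23,542.67` contain a comma, so the `csv` module quotes them automatically. When writing to a real file on Windows, opening it with `newline=""` avoids blank lines between rows.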
Shreya Patil
Project: Credit Card Statement Parser (India)