This project is a PDF parser that extracts key data from credit card statements across multiple banks. It automatically detects high balances and potential fraud, calculates a risk score, and exports the results for review.
## Features
- Extracts key data:
  - Card Member
  - Card Number (last 4 digits)
  - Card Variant
  - Statement Period
  - Payment Due Date
  - Total Amount Due / Balance
- Flags high balances (> ₹20,000)
- Detects potential fraud if Customer Identification Number (CIN) is missing
- Computes risk score and severity (Low / Medium / High)
- Exports results to CSV for easy review
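
As a rough illustration of how a few of the fields listed above might be pulled out of statement text with regex, here is a minimal sketch. The labels, patterns, and the names `FIELD_PATTERNS`, `extract_fields`, and `parse_amount` are all hypothetical: real statements differ from bank to bank, so each pattern would need tuning against actual extracted text.

```python
import re

# Hypothetical patterns: the labels below are assumptions about how a
# statement might word each field, not the layout of any specific bank.
FIELD_PATTERNS = {
    "card_member": re.compile(r"Card Member\s*:\s*(.+)"),
    "card_last4": re.compile(r"Card Number\s*:.*?(\d{4})[ \t]*$", re.MULTILINE),
    "payment_due_date": re.compile(r"Payment Due Date\s*:\s*(\d{2}/\d{2}/\d{4})"),
    "balance": re.compile(r"Total Amount Due\s*:\s*(?:₹|Rs\.?)?\s*([\d,]+(?:\.\d+)?)"),
    "customer_id": re.compile(r"CIN\s*:\s*(\w+)"),
}


def extract_fields(text):
    """Return a dict of field name -> captured string, or None when absent."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1).strip() if match else None
    return fields


def parse_amount(raw):
    """Convert a string such as '24,680.00' to a float; None stays None."""
    return float(raw.replace(",", "")) if raw is not None else None


# Made-up sample text standing in for one extracted statement page.
sample = """Card Member: ABC PQR
Card Number: XXXX XXXX XXXX 1234
Payment Due Date: 05/11/2024
Total Amount Due: ₹24,680.00
"""

fields = extract_fields(sample)
print(fields["card_last4"], parse_amount(fields["balance"]), fields["customer_id"])
```

Returning `None` for a missing field, rather than raising, keeps the missing-CIN case easy to detect in a later step.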
## How It Works
- Reads PDF statements from a specified folder using pdfplumber.
- Uses regex to extract relevant fields.
- Flags high balances and missing Customer IDs.
- Calculates a numeric risk score:
  - High Balance = 1 point
  - Missing Customer ID = 2 points
- Maps total points to a risk severity:
  - 0 → Low
  - 1 → Medium
  - ≥2 → High
- Outputs all parsed data and flags into a CSV file.
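
The scoring rules above can be sketched in a few lines of plain Python. This is one possible reading of those rules, not necessarily how `main.py` implements them; the function name `score_statement` and the constant `HIGH_BALANCE_THRESHOLD` are illustrative, and the dictionary keys are assumed to mirror the column names in the example output.

```python
HIGH_BALANCE_THRESHOLD = 20_000  # rupees; a balance must exceed this to be flagged


def score_statement(balance, customer_id):
    """Apply the flag, point, and severity rules to one parsed statement."""
    high_balance = balance is not None and balance > HIGH_BALANCE_THRESHOLD
    fraud = not customer_id  # a missing or empty Customer ID counts as a fraud flag

    # High Balance = 1 point, Missing Customer ID = 2 points
    points = (1 if high_balance else 0) + (2 if fraud else 0)

    if points >= 2:
        severity = "High"
    elif points == 1:
        severity = "Medium"
    else:
        severity = "Low"

    return {
        "High Balance Flag": high_balance,
        "Fraud Flag": fraud,
        "Risk Points": points,
        "Risk Severity": severity,
    }


print(score_statement(24680, "C123"))  # high balance only
print(score_statement(8000, None))     # missing Customer ID only
```

Because a missing Customer ID alone is worth 2 points, any statement without one is rated High regardless of its balance.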
## Tech Stack
- Python 3.x – Core programming language
- pdfplumber – Extract text from PDFs
- re (regex) – Parse key fields from text
- pandas – Data manipulation and CSV export
- Optional visualization: matplotlib / seaborn / Plotly
## Setup and Usage
1. Clone the repository and change into it: `git clone https://github.com/anugavstudy/pdf-parser.git`, then `cd pdf-parser`.
2. Install dependencies:
```bash
pip install pdfplumber pandas
```
3. Place your credit card PDF statements in the `statements/` folder.
4. Run the parser:
```bash
python main.py
```
5. Check the output CSV:
```
parsed_statements_with_risk.csv
```
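
To pick out the highest-priority rows once the CSV exists, something like the following sketch could be used. It assumes the CSV headers match the column names shown in the Example Output table, and it uses the standard-library `csv` module rather than pandas to stay dependency-free. The helper name `high_risk_rows` is hypothetical, and the in-memory sample stands in for the real file.

```python
import csv
import io


def high_risk_rows(handle):
    """Return the rows whose 'Risk Severity' column reads 'High'."""
    return [row for row in csv.DictReader(handle) if row.get("Risk Severity") == "High"]


# Stand-in for parsed_statements_with_risk.csv, built from the two example rows.
sample_csv = io.StringIO(
    "Card Member,Card Last 4,Balance,High Balance Flag,Fraud Flag,Risk Points,Risk Severity\n"
    "ABC PQR,1234,24680,True,False,1,Medium\n"
    "XYZ LMN,5678,8000,False,True,2,High\n"
)

flagged = high_risk_rows(sample_csv)
for row in flagged:
    print(row["Card Member"], row["Card Last 4"], row["Risk Points"])
```

With the real output, the same helper can be given a file opened via `open("parsed_statements_with_risk.csv", newline="", encoding="utf-8")`. Note that `csv.DictReader` returns every value as a string, so `Risk Points` needs an `int()` conversion before any numeric comparison.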
---
## Example Output
| Card Member | Card Last 4 | Balance | High Balance Flag | Fraud Flag | Risk Points | Risk Severity |
| ----------- | ----------- | ------- | ----------------- | ---------- | ----------- | ------------- |
| ABC PQR | 1234 | 24680 | True | False | 1 | Medium |
| XYZ LMN | 5678 | 8000 | False | True | 2 | High |
---
## Why It Matters
* Automates tedious manual extraction of statement data
* Flags potential fraud quickly by checking for a missing Customer Identification Number
* Provides risk scoring to prioritize high-risk statements
---
## Future Enhancements
* Extract detailed transactions for fraud analysis
* Build dashboards for visualizing high-risk statements
* Implement automated email alerts for flagged statements
---