anugavstudy/pdf-parser

PDF Credit Card Statement Parser

Overview

This project parses PDF credit card statements from multiple banks and extracts their key fields. It flags high balances (above ₹20,000), treats a missing Customer Identification Number (CIN) as a sign of potential fraud, calculates a risk score, and exports the results to CSV for review.


Features

  • Extracts key data:
    • Card Member
    • Card Number (last 4 digits)
    • Card Variant
    • Statement Period
    • Payment Due Date
    • Total Amount Due / Balance
  • Flags high balances (> ₹20,000)
  • Flags potential fraud when the Customer Identification Number (CIN) is missing
  • Computes risk score and severity (Low / Medium / High)
  • Exports results to CSV for easy review
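
As a rough illustration of how the fields listed above might be pulled out of statement text, here is a minimal sketch using Python's `re` module. The sample text, the field labels, the `PATTERNS` table, and the `extract_fields` helper are all assumptions made for this example; they are not taken from the project's code, and real bank layouts would need their own patterns.

```python
import re

# Hypothetical statement text; real bank layouts differ.
SAMPLE_TEXT = """\
Card Member: ABC PQR
Card Number: XXXX XXXX XXXX 1234
Card Variant: Platinum
Statement Period: 01/01/2024 - 31/01/2024
Payment Due Date: 20/02/2024
Total Amount Due: ₹24,680.00
"""

# Illustrative patterns only (the labels are assumed, not the project's).
PATTERNS = {
    "card_member": r"Card Member:\s*(.+)",
    "card_last4": r"Card Number:.*?(\d{4})\s*$",
    "card_variant": r"Card Variant:\s*(.+)",
    "statement_period": r"Statement Period:\s*(.+)",
    "payment_due_date": r"Payment Due Date:\s*(.+)",
    "balance": r"Total Amount Due:\s*₹?\s*([\d,]+(?:\.\d+)?)",
}


def extract_fields(text):
    """Return a dict of field name -> extracted value (None if not found)."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        fields[name] = match.group(1).strip() if match else None
    # Convert the balance to a number so it can be compared with a threshold.
    if fields["balance"] is not None:
        fields["balance"] = float(fields["balance"].replace(",", ""))
    return fields


if __name__ == "__main__":
    for key, value in extract_fields(SAMPLE_TEXT).items():
        print(f"{key}: {value}")
```

Keeping the patterns in a single dictionary is one way to support several banks: each bank could get its own table while the extraction loop stays the same.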

How It Works

  1. Reads PDF statements from a specified folder using pdfplumber.
  2. Uses regex to extract relevant fields.
  3. Flags high balances (> ₹20,000) and missing Customer Identification Numbers (CIN).
  4. Calculates a numeric risk score:
    • High Balance = 1 point
    • Missing Customer ID = 2 points
  5. Maps total points to a risk severity:
    • 0 → Low
    • 1 → Medium
    • ≥2 → High
  6. Outputs all parsed data and flags into a CSV file.
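
The scoring rules in steps 3 to 5 can be sketched as a small function. This is a hedged illustration, not the project's actual code: the `score_statement` name, its arguments, and the returned keys are hypothetical, while the ₹20,000 threshold, the point values, and the severity mapping come from the description above.

```python
HIGH_BALANCE_THRESHOLD = 20_000  # rupees, per the feature list


def score_statement(balance, cin):
    """Return flags, risk points, and severity for one parsed statement.

    `balance` is a number (or None if it could not be parsed);
    `cin` is the Customer Identification Number string, or None/"" if missing.
    """
    high_balance = balance is not None and balance > HIGH_BALANCE_THRESHOLD
    fraud_flag = not cin  # a missing or empty CIN counts as potential fraud

    # High balance = 1 point, missing customer ID = 2 points.
    points = (1 if high_balance else 0) + (2 if fraud_flag else 0)

    # 0 -> Low, 1 -> Medium, 2 or more -> High.
    if points == 0:
        severity = "Low"
    elif points == 1:
        severity = "Medium"
    else:
        severity = "High"

    return {
        "high_balance_flag": high_balance,
        "fraud_flag": fraud_flag,
        "risk_points": points,
        "risk_severity": severity,
    }


if __name__ == "__main__":
    print(score_statement(24680, "CIN123"))
    print(score_statement(8000, None))
```

Under these rules a statement with both a high balance and a missing CIN scores 3 points, which also maps to High.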

Tech Stack / Libraries

  • Python 3.x – Core programming language
  • pdfplumber – Extract text from PDFs
  • re (regex) – Parse key fields from text
  • pandas – Data manipulation and CSV export
  • Optional visualization: matplotlib / seaborn / Plotly

Setup Instructions

1. Clone the repository:

   ```bash
   git clone https://github.com/anugavstudy/pdf-parser.git
   cd pdf-parser
   ```

2. Install dependencies:

   ```bash
   pip install pdfplumber pandas
   ```
3. Place your credit card PDF statements in the `statements/` folder.
4. Run the parser:

   ```bash
   python main.py
   ```
5. Check the output CSV:

   ```
   parsed_statements_with_risk.csv
   ```

---

## Example Output

| Card Member | Card Last 4 | Balance | High Balance Flag | Fraud Flag | Risk Points | Risk Severity |
| ----------- | ----------- | ------- | ----------------- | ---------- | ----------- | ------------- |
| ABC PQR     | 1234        | 24680   | True              | False      | 1           | Medium        |
| XYZ LMN     | 5678        | 8000    | False             | True       | 2           | High          |

---

## Why It Matters

* Automates the tedious manual extraction of data from statements
* Surfaces statements with a missing CIN as potential fraud so they can be reviewed quickly
* Provides risk scoring to prioritize high-risk statements

---

## Future Enhancements

* Extract detailed transactions for fraud analysis
* Build dashboards for visualizing high-risk statements
* Implement automated email alerts for flagged statements

---
