# Personal Finance Tracker - Bank Statements Cleaning

This project uses an automated cleaning script for bank statements to simplify monthly budgeting. This notebook runs the cleaning pipeline, displays samples of the cleaned data, and shows monthly summaries and category breakdowns.

## Run the automated cleaning script

The next cell runs the cleaning script at ../scripts/cleanup.py using the notebook's own Python interpreter (sys.executable). The script writes cleaned CSV files to ../data/cleaned/.

In [1]:
import os
import subprocess
import sys

# Run the cleaning script (prints a short summary and creates cleaned files)
script_path = os.path.join('..', 'scripts', 'cleanup.py')
print('Executing:', script_path)
subprocess.run([sys.executable, script_path], check=True)

Executing: ..\scripts\cleanup.py


CompletedProcess(args=['c:\\Users\\TADS\\TADS Proj\\Cleanups\\Bank-statements-cleanup\\venv\\Scripts\\python.exe', '..\\scripts\\cleanup.py'], returncode=0)
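
The output above shows the `Executing:` line and the `CompletedProcess` value, but not the script's own summary. If that summary is wanted in the notebook, one option is to capture the child's stdout and print it explicitly. This is a minimal sketch; `run_cleanup` is a hypothetical helper name and is not part of the project.

```python
import subprocess
import sys


def run_cleanup(script_path):
    """Run a Python script with the current interpreter and return its stdout.

    Illustrative helper only; it is not defined in cleanup.py.
    """
    result = subprocess.run(
        [sys.executable, script_path],
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode the captured bytes to str
    )
    return result.stdout


# Usage in the notebook (path taken from the cell above):
# print(run_cleanup(os.path.join('..', 'scripts', 'cleanup.py')))
```

With `check=True`, a failing script still raises, and the captured stderr is available on the exception's `stderr` attribute.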

## Load cleaned outputs

After the script runs, load the cleaned dataset and monthly summary for inspection.

In [2]:
import os

import pandas as pd

# Project root, relative to the notebook's directory
base = os.path.normpath('..')
clean_dir = os.path.join(base, 'data', 'cleaned')
cleaned_csv = os.path.join(clean_dir, 'cleaned_bank_statements.csv')
monthly_csv = os.path.join(clean_dir, 'monthly_summary.csv')

df = pd.read_csv(cleaned_csv, parse_dates=['Date'])
monthly = pd.read_csv(monthly_csv)

print('Cleaned rows:', len(df))
df.head(10)

Cleaned rows: 300


Unnamed: 0,Date,Description,Amount,Category,Balance,Anomaly
0,2025-04-09,Gas station refill,993.19,Entertainment,993.19,False
1,2025-04-10,Grocery shopping at Walmart,735.26,Entertainment,1728.45,False
2,2025-04-10,Utility bill - Electric,565.05,Rent,2293.5,False
3,2025-04-10,Utility bill - Electric,-161.4,Miscellaneous,2132.1,False
4,2025-04-10,Movie tickets,138.55,Salary,2270.65,False
5,2025-04-10,Miscellaneous expense,-420.86,Utilities,1849.79,False
6,2025-04-11,Rent payment,833.56,Unspecified,2683.35,False
7,2025-04-11,Miscellaneous expense,531.05,Rent,3214.4,False
8,2025-04-11,Miscellaneous expense,28.19,Unspecified,3242.59,False
9,2025-04-13,Gas station refill,292.91,Rent,3535.5,False
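
In the ten rows shown above, each Balance equals the running total of Amount (for example, 993.19 + 735.26 = 1728.45). Whether this holds for all 300 rows is an assumption, not something the output confirms. A quick way to check it is sketched below, using the first four rows as a stand-in for `df`.

```python
import pandas as pd

# First four rows from the output above; in the notebook, use `df` directly.
sample = pd.DataFrame({
    'Amount': [993.19, 735.26, 565.05, -161.40],
    'Balance': [993.19, 1728.45, 2293.50, 2132.10],
})

# Running total of Amount, compared with Balance to within half a cent
# so that floating-point rounding does not trigger false mismatches.
running_total = sample['Amount'].cumsum()
mismatch = (running_total - sample['Balance']).abs() > 0.005

print('Rows where Balance differs from the running total:', int(mismatch.sum()))
```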


## Data snapshot and quick checks

Inspect a few cleaned rows and confirm that the Date, Description, Amount, Category, Balance, and Anomaly columns are present.

In [3]:
# Show the first cleaned rows, restricted to the expected columns (includes the Anomaly flag)
display_cols = ['Date','Description','Amount','Category','Balance','Anomaly']
df[display_cols].head(15)

Unnamed: 0,Date,Description,Amount,Category,Balance,Anomaly
0,2025-04-09,Gas station refill,993.19,Entertainment,993.19,False
1,2025-04-10,Grocery shopping at Walmart,735.26,Entertainment,1728.45,False
2,2025-04-10,Utility bill - Electric,565.05,Rent,2293.5,False
3,2025-04-10,Utility bill - Electric,-161.4,Miscellaneous,2132.1,False
4,2025-04-10,Movie tickets,138.55,Salary,2270.65,False
5,2025-04-10,Miscellaneous expense,-420.86,Utilities,1849.79,False
6,2025-04-11,Rent payment,833.56,Unspecified,2683.35,False
7,2025-04-11,Miscellaneous expense,531.05,Rent,3214.4,False
8,2025-04-11,Miscellaneous expense,28.19,Unspecified,3242.59,False
9,2025-04-13,Gas station refill,292.91,Rent,3535.5,False
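
Selecting `display_cols` raises a `KeyError` if a column is missing, but it does not say which columns are absent in a readable way. A small explicit check could look like the sketch below; `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced here, and the sample frame stands in for `df`.

```python
import pandas as pd

EXPECTED_COLUMNS = ['Date', 'Description', 'Amount', 'Category', 'Balance', 'Anomaly']


def missing_columns(frame, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from `frame`."""
    return [col for col in expected if col not in frame.columns]


# Stand-in frame built from the first two rows shown above;
# in the notebook, pass `df` instead.
sample = pd.DataFrame({
    'Date': pd.to_datetime(['2025-04-09', '2025-04-10']),
    'Description': ['Gas station refill', 'Grocery shopping at Walmart'],
    'Amount': [993.19, 735.26],
    'Category': ['Entertainment', 'Entertainment'],
    'Balance': [993.19, 1728.45],
    'Anomaly': [False, False],
})

missing = missing_columns(sample)
assert not missing, f'Missing columns: {missing}'
```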


## Monthly summary

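The script also generates per-month totals: transaction count, income, expenses, and net. In this output, expenses are stored as negative totals, so net equals total_income plus total_expense.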

In [4]:
monthly.head(20)

Unnamed: 0,Month,transactions,total_income,total_expense,net
0,2025-04,32,11304.56,-1462.28,9842.28
1,2025-05,58,20559.65,-4048.89,16510.76
2,2025-06,56,24858.46,-2756.06,22102.4
3,2025-07,44,16891.31,-3434.98,13456.33
4,2025-08,43,17235.05,-2657.52,14577.53
5,2025-09,58,19754.86,-5700.61,14054.25
6,2025-10,9,2220.45,-1183.18,1037.27
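
The monthly figures can be cross-checked against the transaction-level data. The sketch below assumes that income is the sum of positive amounts and expenses the sum of negative amounts; this matches the signs in the table above but is not confirmed by cleanup.py. `summarize_by_month` is a hypothetical helper, and the sample frame reuses the first four rows of the cleaned output.

```python
import pandas as pd


def summarize_by_month(frame):
    """Aggregate transactions into one row per calendar month (YYYY-MM)."""
    month = frame['Date'].dt.to_period('M').astype(str)
    amount = frame['Amount']
    summary = pd.DataFrame({
        'transactions': amount.groupby(month).size(),
        # clip(lower=0) zeroes out expenses, clip(upper=0) zeroes out income
        'total_income': amount.clip(lower=0).groupby(month).sum(),
        'total_expense': amount.clip(upper=0).groupby(month).sum(),
    })
    summary['net'] = summary['total_income'] + summary['total_expense']
    return summary.round(2).rename_axis('Month').reset_index()


# First four rows from the cleaned output; in the notebook, pass `df` instead.
sample = pd.DataFrame({
    'Date': pd.to_datetime(['2025-04-09', '2025-04-10', '2025-04-10', '2025-04-10']),
    'Amount': [993.19, 735.26, 565.05, -161.40],
})

print(summarize_by_month(sample))
```

Comparing this result with `monthly` (for example, after rounding both to two decimals) would show whether the script uses the same definition of income and expense.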


## Next steps (suggestions)

- Use the cleaned CSV in a budgeting/dashboard tool (e.g., Power BI, Excel, or Plotly Dash).
- Extend category synonyms to capture more merchant-specific patterns.
- Integrate with personal income/expense forecasts.
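
As a starting point for the category-synonym suggestion, one possible approach is a keyword lookup on the transaction description. Everything in this sketch is hypothetical: the `CATEGORY_KEYWORDS` table, the `categorize` helper, and the keyword choices are illustrative and do not reflect how cleanup.py assigns categories.

```python
# Hypothetical keyword table: each category maps to lowercase substrings
# that, when found in a description, select that category.
CATEGORY_KEYWORDS = {
    'Groceries': ['grocery', 'walmart'],
    'Transport': ['gas station', 'fuel'],
    'Utilities': ['utility bill', 'electric'],
    'Rent': ['rent payment'],
    'Entertainment': ['movie'],
}


def categorize(description, default='Unspecified'):
    """Return the first category whose keyword appears in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return default


print(categorize('Grocery shopping at Walmart'))  # Groceries
print(categorize('Miscellaneous expense'))        # Unspecified
```

Matching on whole phrases such as 'rent payment' rather than 'rent' avoids accidental hits inside unrelated words (for example, 'current'). Merchant-specific patterns can be added as extra keywords per category.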