# Bank Transaction Analysis

This notebook provides an interactive way to analyze bank transaction data.

## Overview
- Load transaction data from CSV files
- Clean and process the data
- Perform analysis and categorization
- Create visualizations
- Generate insights and summaries

## 1. Setup and Imports

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import sys
import warnings

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Add src directory to path
sys.path.append('../src')

# Import our custom modules
from data_processor import BankDataProcessor
from analyzer import TransactionAnalyzer
from visualizer import DataVisualizer

# Set up plotting
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
%matplotlib inline

print("Setup complete!")

## 2. Load and Process Data

In [None]:
# Define data paths
data_dir = Path("../../Transacties")
excel_file = Path("../../Bankzaken.xlsx")

# Check if data files exist
print(f"Data directory exists: {data_dir.exists()}")
print(f"Excel file exists: {excel_file.exists()}")

if data_dir.exists():
    csv_files = list(data_dir.glob("*.csv"))
    print(f"Found {len(csv_files)} CSV files:")
    for file in csv_files:
        print(f"  - {file.name}")
else:
    print(f"⚠️ Data directory not found. Please ensure the CSV files are in '{data_dir}'")

In [None]:
# Initialize processor and load data
if data_dir.exists():
    processor = BankDataProcessor(data_dir)
    
    # Load CSV files
    dataframes = processor.load_csv_files()
    
    if dataframes:
        # Combine all data
        combined_data = processor.combine_dataframes(dataframes)
        
        # Clean the data
        clean_data = processor.clean_data(combined_data)
        
        print("Successfully loaded and cleaned data:")
        print(f"  - Total transactions: {len(clean_data)}")
        print(f"  - Columns: {list(clean_data.columns)}")
        print(f"  - Date range: {clean_data['date'].min()} to {clean_data['date'].max()}")
    else:
        print("No data loaded")
        clean_data = pd.DataFrame()
else:
    print("Skipping data loading - data directory not found")
    clean_data = pd.DataFrame()
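
The internals of `BankDataProcessor` are not shown in this notebook. As a rough sketch of what the load, combine, and clean steps could look like with plain pandas, the example below assumes each CSV has `date`, `amount`, and `description` columns. The `combine_and_clean` helper and the in-memory sample data are hypothetical and only illustrate the idea.

```python
import io

import pandas as pd


def combine_and_clean(frames):
    """Concatenate raw frames, parse types, and drop unusable or duplicate rows.

    Hypothetical helper: the column names 'date' and 'amount' are assumptions.
    """
    combined = pd.concat(frames, ignore_index=True)
    # Unparseable values become NaT/NaN so they can be dropped in one step.
    combined["date"] = pd.to_datetime(combined["date"], errors="coerce")
    combined["amount"] = pd.to_numeric(combined["amount"], errors="coerce")
    cleaned = combined.dropna(subset=["date", "amount"]).drop_duplicates()
    return cleaned.sort_values("date").reset_index(drop=True)


# Two small in-memory "files" stand in for the CSV exports.
csv_a = "date,amount,description\n2024-01-05,-12.50,Groceries\n2024-01-03,2000.00,Salary\n"
csv_b = "date,amount,description\n2024-01-05,-12.50,Groceries\nnot-a-date,5.00,Broken row\n"

frames = [pd.read_csv(io.StringIO(text)) for text in (csv_a, csv_b)]
sample_clean = combine_and_clean(frames)
print(sample_clean)
```

Dropping exact duplicates is a judgment call: two genuinely identical transactions on the same day would be merged, so a real processor may prefer to deduplicate on a transaction ID if the export provides one.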

## 3. Data Exploration

In [None]:
# Display first few rows
if not clean_data.empty:
    print("First 5 transactions:")
    display(clean_data.head())
    
    print("\nData info:")
    clean_data.info()  # info() prints its report directly and returns None
    
    print("\nBasic statistics:")
    display(clean_data.describe())
else:
    print("No data to explore")

In [None]:
# Check for missing values
if not clean_data.empty:
    missing = clean_data.isnull().sum()
    
    if missing.sum() == 0:
        print("✅ No missing values found")
    else:
        print("Missing values:")
        print(missing[missing > 0])
else:
    print("No data to check")

## 4. Transaction Analysis

In [None]:
# Initialize analyzer
if not clean_data.empty:
    analyzer = TransactionAnalyzer(clean_data)
    
    # Get summary statistics
    summary_stats = analyzer.get_summary_stats()
    
    print("📊 Summary Statistics:")
    print(f"  Total Transactions: {summary_stats.get('total_transactions', 'N/A')}")
    print(f"  Total Income: €{summary_stats.get('total_income', 0):.2f}")
    print(f"  Total Expenses: €{abs(summary_stats.get('total_expenses', 0)):.2f}")
    print(f"  Net Amount: €{summary_stats.get('net_amount', 0):.2f}")
    print(f"  Average Transaction: €{summary_stats.get('avg_transaction', 0):.2f}")
    print(f"  Analysis Period: {summary_stats.get('analysis_period_days', 'N/A')} days")
else:
    print("No data for analysis")
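
For readers without access to `TransactionAnalyzer`, the following is a minimal sketch of how the summary keys printed above might be computed. It assumes `date` and `amount` columns and that expenses are stored as negative amounts (which is why the cell above applies `abs()`). The `summarize` function is hypothetical, not the analyzer's actual implementation.

```python
import pandas as pd


def summarize(transactions):
    """Return summary figures for a frame with 'date' and 'amount' columns (assumed names)."""
    amounts = transactions["amount"]
    dates = transactions["date"]
    return {
        "total_transactions": len(transactions),
        "total_income": float(amounts[amounts > 0].sum()),
        "total_expenses": float(amounts[amounts < 0].sum()),  # stays negative
        "net_amount": float(amounts.sum()),
        "avg_transaction": float(amounts.mean()),
        "analysis_period_days": int((dates.max() - dates.min()).days),
    }


sample = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-01-31"]),
    "amount": [2000.0, -500.0, -300.0],
})
stats = summarize(sample)
print(stats)
```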

In [None]:
# Monthly summary
if not clean_data.empty:
    monthly_summary = analyzer.monthly_summary()
    
    print("📅 Monthly Summary:")
    display(monthly_summary)
else:
    print("No data for monthly summary")
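
One plausible way to build such a monthly summary is to group by calendar month and sum income and expenses separately. The sketch below assumes `date` and `amount` columns; the `monthly_totals` helper and its output columns (`income`, `expenses`, `net`) are illustrative and may differ from what `analyzer.monthly_summary()` returns.

```python
import pandas as pd


def monthly_totals(transactions):
    """Sum income, expenses, and net amount per calendar month (hypothetical helper)."""
    frame = transactions.assign(
        year_month=transactions["date"].dt.to_period("M"),
        income=transactions["amount"].clip(lower=0),    # negative amounts become 0
        expenses=transactions["amount"].clip(upper=0),  # positive amounts become 0
    )
    summary = frame.groupby("year_month")[["income", "expenses"]].sum()
    summary["net"] = summary["income"] + summary["expenses"]
    return summary


sample = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-15"]),
    "amount": [2000.0, -750.0, -120.0, 2000.0],
})
monthly = monthly_totals(sample)
print(monthly)
```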

In [None]:
# Transaction categorization
if not clean_data.empty:
    category_summary = analyzer.categorize_transactions()
    
    print("🏷️ Transaction Categories:")
    display(category_summary)
else:
    print("No data for categorization")
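
Categorization of this kind is often done with simple keyword rules on the description text. The sketch below assumes that approach; the `CATEGORY_KEYWORDS` mapping, the `categorize` function, and the sample rows are all hypothetical. Only the `category` and `total_abs_amount` column names are taken from how `category_summary` is used later in this notebook.

```python
import pandas as pd

# Hypothetical keyword rules; first matching category wins.
CATEGORY_KEYWORDS = {
    "groceries": ["supermarket", "albert heijn"],
    "transport": ["fuel", "train"],
    "income": ["salary"],
}


def categorize(description):
    """Return the first category whose keyword occurs in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


sample = pd.DataFrame({
    "description": ["Albert Heijn 1234", "Salary March", "Fuel station", "Gift"],
    "amount": [-45.20, 2100.00, -60.00, -25.00],
})
sample["category"] = sample["description"].map(categorize)

# Aggregate absolute amounts per category, largest first.
category_totals = (
    sample.assign(abs_amount=sample["amount"].abs())
    .groupby("category", as_index=False)
    .agg(total_abs_amount=("abs_amount", "sum"),
         transaction_count=("abs_amount", "count"))
    .sort_values("total_abs_amount", ascending=False)
    .reset_index(drop=True)
)
print(category_totals)
```

Substring matching is order-dependent and easy to fool, so rules like these usually need tuning against real descriptions.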

## 5. Visualizations

In [None]:
# Create visualizer
if not clean_data.empty:
    visualizer = DataVisualizer(clean_data)
    
    # Monthly spending chart
    print("Creating monthly spending chart...")
    visualizer.create_monthly_spending_chart()
    
    # Display the chart in notebook
    fig, ax = plt.subplots(figsize=(12, 6))
    
    # Prepare monthly data for inline display
    monthly_data = clean_data.copy()
    monthly_data['year_month'] = monthly_data['date'].dt.to_period('M')
    
    # Income and expenses
    income_data = monthly_data[monthly_data['amount'] > 0].groupby('year_month')['amount'].sum()
    expense_data = monthly_data[monthly_data['amount'] < 0].groupby('year_month')['amount'].sum().abs()
    
    # Plot (align both series on the union of months)
    all_periods = income_data.index.union(expense_data.index)
    x_labels = [str(period) for period in all_periods]
    income_values = [income_data.get(period, 0) for period in all_periods]
    expense_values = [expense_data.get(period, 0) for period in all_periods]
    
    ax.bar(x_labels, income_values, alpha=0.7, label='Income', color='green')
    ax.bar(x_labels, [-x for x in expense_values], alpha=0.7, label='Expenses', color='red')
    
    ax.set_title('Monthly Income vs Expenses')
    ax.set_ylabel('Amount (€)')
    ax.legend()
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
else:
    print("No data for visualizations")

In [None]:
# Category pie chart
if not clean_data.empty and 'category_summary' in locals():
    print("Creating category pie chart...")
    
    # Filter significant categories (at least 1% of total)
    total_amount = category_summary['total_abs_amount'].sum()
    significant_categories = category_summary[category_summary['total_abs_amount'] >= total_amount * 0.01]
    
    if not significant_categories.empty:
        fig, ax = plt.subplots(figsize=(10, 8))
        
        wedges, texts, autotexts = ax.pie(
            significant_categories['total_abs_amount'], 
            labels=significant_categories['category'],
            autopct='%1.1f%%',
            startangle=90
        )
        
        ax.set_title('Spending by Category')
        plt.show()
    else:
        print("No significant categories to display")
else:
    print("No category data for pie chart")

## 6. Advanced Analysis

In [None]:
# Find unusual transactions
if not clean_data.empty:
    unusual_transactions = analyzer.find_unusual_transactions(threshold_multiplier=2.5)
    
    print(f"🔍 Found {len(unusual_transactions)} unusual transactions:")
    if not unusual_transactions.empty:
        display(unusual_transactions[['date', 'amount', 'description']].head(10))
    else:
        print("No unusual transactions found")
else:
    print("No data for unusual transaction analysis")
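
How `threshold_multiplier` is interpreted depends on the analyzer's implementation, which is not shown here. A common reading, assumed in the sketch below, is to flag transactions whose absolute amount exceeds the mean by more than that many standard deviations. The `find_unusual` function and the sample amounts are hypothetical.

```python
import pandas as pd


def find_unusual(transactions, threshold_multiplier=2.5):
    """Flag rows whose |amount| exceeds mean + multiplier * std (assumed rule)."""
    magnitudes = transactions["amount"].abs()
    cutoff = magnitudes.mean() + threshold_multiplier * magnitudes.std()
    return transactions[magnitudes > cutoff]


# Eleven ordinary expenses and one large one.
sample = pd.DataFrame({
    "amount": [-20.0, -25.0, -18.0, -22.0, -30.0, -19.0,
               -21.0, -24.0, -23.0, -20.0, -26.0, -950.0],
})
flagged = find_unusual(sample, threshold_multiplier=2.5)
print(flagged)
```

With very few transactions a single outlier inflates the standard deviation enough that nothing crosses the cutoff, so a lower multiplier or a median-based rule may work better on short histories.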

In [None]:
# Spending trends
if not clean_data.empty:
    spending_trends = analyzer.spending_trends()
    
    print("📈 Spending Trends (last 10 weeks):")
    if not spending_trends.empty:
        display(spending_trends.tail(10))
        
        # Plot trends
        fig, ax = plt.subplots(figsize=(12, 6))
        ax.plot(spending_trends.index, spending_trends['weekly_total'], 
               marker='o', alpha=0.6, label='Weekly Total')
        ax.plot(spending_trends.index, spending_trends['rolling_4w_avg'], 
               linewidth=2, label='4-Week Average')
        
        ax.set_title('Weekly Spending Trends')
        ax.set_ylabel('Amount (€)')
        ax.legend()
        ax.grid(True, alpha=0.3)
        plt.show()
    else:
        print("No trend data available")
else:
    print("No data for spending trends")
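
The `weekly_total` and `rolling_4w_avg` columns plotted above could be produced by resampling expenses to weekly sums and smoothing with a 4-week rolling mean. The sketch below assumes that approach and `date` and `amount` columns; `weekly_spending` is a hypothetical stand-in for `analyzer.spending_trends()`, and the choice of `min_periods=1` is an assumption.

```python
import pandas as pd


def weekly_spending(transactions):
    """Weekly expense totals plus a 4-week rolling average (hypothetical helper)."""
    expenses = transactions[transactions["amount"] < 0]
    # 'W' bins end on Sunday; the resulting index holds each week's end date.
    weekly_total = expenses.set_index("date")["amount"].abs().resample("W").sum()
    trends = weekly_total.to_frame("weekly_total")
    # min_periods=1 averages over however many weeks exist so far.
    trends["rolling_4w_avg"] = (
        trends["weekly_total"].rolling(window=4, min_periods=1).mean()
    )
    return trends


sample = pd.DataFrame({
    "date": pd.to_datetime([
        "2024-01-01", "2024-01-08", "2024-01-10",
        "2024-01-15", "2024-01-22", "2024-01-29",
    ]),
    "amount": [-100.0, -200.0, 1000.0, -300.0, -400.0, -500.0],
})
trends = weekly_spending(sample)
print(trends)
```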

## 7. Export Results

In [None]:
# Export analysis results
if not clean_data.empty:
    output_dir = Path("../output")
    output_dir.mkdir(parents=True, exist_ok=True)
    
    # Save summary data
    if 'monthly_summary' in locals():
        monthly_summary.to_csv(output_dir / "monthly_summary.csv", index=False)
        print(f"✅ Monthly summary saved to {output_dir / 'monthly_summary.csv'}")
    
    if 'category_summary' in locals():
        category_summary.to_csv(output_dir / "category_summary.csv", index=False)
        print(f"✅ Category summary saved to {output_dir / 'category_summary.csv'}")
    
    if 'spending_trends' in locals():
        # Keep the index: it is the time axis used when plotting the trends
        spending_trends.to_csv(output_dir / "spending_trends.csv")
        print(f"✅ Spending trends saved to {output_dir / 'spending_trends.csv'}")
    
    # Save processed data
    clean_data.to_csv(output_dir / "processed_transactions.csv", index=False)
    print(f"✅ Processed data saved to {output_dir / 'processed_transactions.csv'}")
    
    print(f"\n📁 All results saved to: {output_dir.absolute()}")
else:
    print("No data to export")

## 8. Summary and Next Steps

This notebook walked through an analysis of your bank transaction data in five stages:

1. **Data Loading**: Imported and cleaned transaction data from CSV files
2. **Analysis**: Generated monthly summaries, categorized transactions, and identified trends
3. **Visualizations**: Created charts to visualize spending patterns
4. **Insights**: Found unusual transactions and spending trends
5. **Export**: Saved all results for further use

### Next Steps:
- Review the generated charts and summaries
- Adjust category rules in `config/settings.py` to better match your data
- Run the main analysis script: `python src/main.py`
- Explore additional analysis options in the source code

### Files Generated:
- `output/monthly_summary.csv` - Monthly transaction summaries
- `output/category_summary.csv` - Spending by category
- `output/spending_trends.csv` - Weekly spending trends
- `output/processed_transactions.csv` - Clean transaction data
- Various chart files in PNG format