# Personal Finance Tracker 💰

## Introduction

Welcome to this Personal Finance Tracker project! This notebook will teach you how to:
- Store and manage expense data using **pandas DataFrames**
- Perform data analysis and grouping operations
- Create professional visualizations
- Save and load data to/from CSV files

### What You'll Learn:
1. **Data Structures**: How to work with pandas DataFrames
2. **Data Manipulation**: Adding, filtering, and grouping data
3. **Data Analysis**: Computing statistics and aggregations
4. **Data Visualization**: Creating charts to understand spending patterns
5. **File I/O**: Saving and loading data for persistence

### Prerequisites:
- Basic Python knowledge
- Understanding of lists and dictionaries
- Curiosity to learn!

Let's get started! 🚀

## 1. Setup and Imports

First, we need to import the libraries we'll use:

- **pandas**: For data manipulation and analysis (think Excel but in Python)
- **matplotlib**: For creating visualizations
- **seaborn**: Makes matplotlib charts prettier and easier
- **datetime**: For working with dates and times
- **os**: For checking if files exist

**Tip**: If you get an import error, install the library using: `!pip install library-name`

In [None]:
# Import required libraries
import pandas as pd  # Data manipulation library
import matplotlib.pyplot as plt  # Plotting library
import seaborn as sns  # Statistical visualization library
from datetime import datetime, timedelta  # For working with dates
import os  # For file operations

# Set the visual style for our plots
# 'whitegrid' gives us a clean look with grid lines
sns.set_style("whitegrid")

# Make plots appear in the notebook
%matplotlib inline

# Set default figure size for all plots (width, height in inches)
plt.rcParams["figure.figsize"] = (10, 6)

print("✅ All libraries imported successfully!")
print(f"📅 Today's date: {datetime.now().strftime('%Y-%m-%d')}")

## 2. Creating Sample Data

To start learning immediately, let's create some sample expense data. In real life, you'd enter this manually or import from your bank.

### Understanding the Data Structure:
Each expense has 4 attributes:
1. **Date**: When the expense occurred (format: YYYY-MM-DD)
2. **Category**: Type of expense (e.g., Food, Transport, Entertainment)
3. **Amount**: How much you spent (in dollars)
4. **Description**: Brief note about the expense

**Why pandas DataFrame?**
- Like an Excel spreadsheet in Python
- Easy to filter, sort, and analyze
- Powerful built-in functions for data analysis

In [None]:
# Create sample expense data for learning
# This is a list of dictionaries - each dictionary is one expense
sample_expenses = [
    # January expenses
    {
        "date": "2025-01-05",
        "category": "Food",
        "amount": 45.50,
        "description": "Grocery shopping at Whole Foods",
    },
    {
        "date": "2025-01-07",
        "category": "Transport",
        "amount": 25.00,
        "description": "Uber to downtown",
    },
    {
        "date": "2025-01-10",
        "category": "Entertainment",
        "amount": 15.99,
        "description": "Netflix subscription",
    },
    {
        "date": "2025-01-12",
        "category": "Food",
        "amount": 32.75,
        "description": "Dinner at Italian restaurant",
    },
    {
        "date": "2025-01-15",
        "category": "Shopping",
        "amount": 89.99,
        "description": "New running shoes",
    },
    {"date": "2025-01-18", "category": "Food", "amount": 52.30, "description": "Weekly groceries"},
    {
        "date": "2025-01-20",
        "category": "Utilities",
        "amount": 120.00,
        "description": "Electric bill",
    },
    {
        "date": "2025-01-22",
        "category": "Entertainment",
        "amount": 45.00,
        "description": "Concert tickets",
    },
    # February expenses
    {"date": "2025-02-02", "category": "Food", "amount": 48.20, "description": "Grocery shopping"},
    {"date": "2025-02-05", "category": "Transport", "amount": 30.00, "description": "Gas for car"},
    {"date": "2025-02-08", "category": "Food", "amount": 28.50, "description": "Lunch at cafe"},
    {
        "date": "2025-02-10",
        "category": "Shopping",
        "amount": 125.00,
        "description": "Winter jacket",
    },
    {
        "date": "2025-02-14",
        "category": "Entertainment",
        "amount": 75.00,
        "description": "Valentine's Day dinner",
    },
    {"date": "2025-02-18", "category": "Food", "amount": 55.80, "description": "Weekly groceries"},
    {
        "date": "2025-02-20",
        "category": "Utilities",
        "amount": 115.00,
        "description": "Electric bill",
    },
    # March expenses
    {"date": "2025-03-03", "category": "Food", "amount": 42.90, "description": "Grocery shopping"},
    {
        "date": "2025-03-06",
        "category": "Transport",
        "amount": 35.00,
        "description": "Gas and car wash",
    },
    {
        "date": "2025-03-09",
        "category": "Entertainment",
        "amount": 12.99,
        "description": "Spotify subscription",
    },
    {"date": "2025-03-12", "category": "Food", "amount": 38.75, "description": "Sushi dinner"},
]

# Convert the list of dictionaries into a pandas DataFrame
# Think of this as creating an Excel table from your data
df = pd.DataFrame(sample_expenses)

# Convert the 'date' column from text strings to actual date objects
# This allows us to do date-based operations later
df["date"] = pd.to_datetime(df["date"])

# Sort by date (oldest first) - good practice for time-series data
df = df.sort_values("date").reset_index(drop=True)

print(f"✅ Created sample data with {len(df)} expenses")
print(f"💵 Total spending: ${df['amount'].sum():.2f}")
print(f"📊 Categories: {', '.join(df['category'].unique())}")

### 💡 Common Mistake to Avoid:

**Problem**: Forgetting to convert date strings to datetime objects

**Why it matters**: If dates are stored as text, you can't do date arithmetic or extract month/year

**Solution**: Always use `pd.to_datetime()` when loading date data
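
To make the difference concrete, here is a minimal sketch using a hypothetical two-row sample (the `raw` and `converted` names are illustrative, not part of the tracker). It shows that the `.dt` accessor is unavailable on text dates and works once the column is converted:

```python
import pandas as pd

# Hypothetical two-row sample, used only for this illustration
raw = pd.DataFrame({"date": ["2025-01-05", "2025-02-10"], "amount": [45.50, 125.00]})

# As created, the column holds plain strings, so the .dt accessor is unavailable
try:
    raw["date"].dt.month
    string_dates_work = True
except AttributeError:
    string_dates_work = False

# After conversion, month extraction and date arithmetic both work
converted = raw.assign(date=pd.to_datetime(raw["date"]))
months = converted["date"].dt.month.tolist()
days_between = (converted["date"].max() - converted["date"].min()).days

print(string_dates_work)  # False
print(months)             # [1, 2]
print(days_between)       # 36
```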

## 3. Viewing Your Expenses

Now let's look at our data! We'll use several pandas methods:

- **`.head(n)`**: Shows first n rows (default 5)
- **`.tail(n)`**: Shows last n rows
- **`.info()`**: Gives overview of data types and missing values
- **`.describe()`**: Statistical summary of numerical columns

In [None]:
# Display the first 10 expenses
print("📋 First 10 Expenses:")
print("=" * 80)
df.head(10)

In [None]:
# Get information about our DataFrame
print("📊 DataFrame Information:")
print("=" * 80)
df.info()

print("\n" + "=" * 80)
print("\n💡 Understanding the output:")
print("- RangeIndex: Number of rows (0 to n-1)")
print("- Data columns: Number and names of columns")
print("- Non-Null Count: How many values exist (no missing data = good!)")
print("- Dtype: Data type (object=text, float64=decimal, datetime64=date)")

In [None]:
# Statistical summary of numerical columns
print("📈 Statistical Summary:")
print("=" * 80)
df.describe()

# Note: describe() summarizes numerical columns by default
# (recent pandas versions also include datetime columns such as 'date')
# count: number of entries
# mean: average
# std: standard deviation (how spread out the data is)
# min/max: smallest and largest values
# 25%, 50%, 75%: quartiles (50% is the median)

## 4. Adding New Expenses

Let's create a function to add new expenses. This teaches you:
- How to create reusable functions
- How to add rows to a DataFrame
- Input validation (checking if data is correct)

**Function Parameters**:
- `dataframe`: The DataFrame to add to
- `date`: Date of expense (string or datetime object)
- `category`: Category name (string)
- `amount`: Amount spent (number)
- `description`: What you bought (string)

In [None]:
def add_expense(dataframe, date, category, amount, description):
    """
    Add a new expense to the DataFrame.

    Parameters:
    -----------
    dataframe : pd.DataFrame
        The expenses DataFrame to add to
    date : str or datetime
        Date of the expense (e.g., '2025-03-15')
    category : str
        Category of expense (e.g., 'Food', 'Transport')
    amount : float
        Amount spent in dollars
    description : str
        Brief description of the expense

    Returns:
    --------
    pd.DataFrame
        Updated DataFrame with new expense
    """

    # Input validation: Check if amount is positive
    if amount <= 0:
        print("❌ Error: Amount must be positive!")
        return dataframe

    # Create a new expense as a dictionary
    new_expense = {
        "date": pd.to_datetime(date),  # Convert to datetime
        "category": category,
        "amount": float(amount),  # Ensure it's a number
        "description": description,
    }

    # Add the new expense to the DataFrame
    # pd.concat() combines DataFrames (like appending)
    # ignore_index=True renumbers the rows
    updated_df = pd.concat([dataframe, pd.DataFrame([new_expense])], ignore_index=True)

    # Sort by date to keep chronological order
    updated_df = updated_df.sort_values("date").reset_index(drop=True)

    print(f"✅ Added: ${amount:.2f} for {category} on {date}")
    return updated_df


print("✅ Function 'add_expense()' created!")
print("\n💡 Usage example:")
print("   df = add_expense(df, '2025-03-15', 'Food', 25.50, 'Lunch')")

In [None]:
# Let's test our function by adding a new expense!
print("🧪 Testing add_expense() function:\n")

# Add a new expense
df = add_expense(df, "2025-03-15", "Food", 22.50, "Pizza delivery")

# Add another one
df = add_expense(df, "2025-03-18", "Entertainment", 18.99, "Movie tickets")

# Try adding an invalid expense (negative amount)
print("\n🧪 Testing with invalid data:")
df = add_expense(df, "2025-03-20", "Food", -10.00, "This should fail")

print(f"\n📊 Total expenses now: {len(df)}")

## 5. Saving and Loading Data

To make our tracker useful, we need to save expenses to a file and load them later.

**CSV (Comma-Separated Values)**:
- Simple text format that's human-readable
- Works with Excel, Google Sheets, and pandas
- Perfect for tabular data like our expenses

**Why use functions?**
- Reusable code (write once, use many times)
- Easier to debug
- More organized and professional
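
One detail of CSV worth seeing in isolation: the file stores everything as text, so dates come back as plain strings unless pandas is told to parse them. The sketch below uses an in-memory buffer and a hypothetical one-row sample, so it does not touch any file on disk:

```python
import io

import pandas as pd

# Hypothetical one-row sample, used only for this illustration
expenses = pd.DataFrame(
    {"date": pd.to_datetime(["2025-01-05"]), "category": ["Food"], "amount": [45.50]}
)

# Write CSV text to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
expenses.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Without parse_dates, the date column comes back as plain text
plain = pd.read_csv(io.StringIO(csv_text))
# With parse_dates, it is restored as datetime64
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(pd.api.types.is_datetime64_any_dtype(plain["date"]))   # False
print(pd.api.types.is_datetime64_any_dtype(parsed["date"]))  # True
```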

In [None]:
def save_expenses(dataframe, filename="expenses.csv"):
    """
    Save expenses DataFrame to a CSV file.

    Parameters:
    -----------
    dataframe : pd.DataFrame
        The expenses DataFrame to save
    filename : str
        Name of the CSV file (default: 'expenses.csv')
    """
    try:
        # Save to CSV
        # index=False means don't save row numbers
        dataframe.to_csv(filename, index=False)
        print(f"✅ Expenses saved to '{filename}'")
        print(f"📊 Saved {len(dataframe)} expenses totaling ${dataframe['amount'].sum():.2f}")
    except Exception as e:
        print(f"❌ Error saving file: {e}")


def load_expenses(filename="expenses.csv"):
    """
    Load expenses from a CSV file.

    Parameters:
    -----------
    filename : str
        Name of the CSV file to load (default: 'expenses.csv')

    Returns:
    --------
    pd.DataFrame
        DataFrame containing expenses, or empty DataFrame if file doesn't exist
    """
    # Check if file exists
    if not os.path.exists(filename):
        print(f"⚠️  File '{filename}' not found. Starting with empty expenses.")
        # Return empty DataFrame with correct columns
        return pd.DataFrame(columns=["date", "category", "amount", "description"])

    try:
        # Load CSV file
        # parse_dates tells pandas which columns are dates
        df = pd.read_csv(filename, parse_dates=["date"])
        print(f"✅ Loaded {len(df)} expenses from '{filename}'")
        print(f"💵 Total: ${df['amount'].sum():.2f}")
        return df
    except Exception as e:
        print(f"❌ Error loading file: {e}")
        return pd.DataFrame(columns=["date", "category", "amount", "description"])


print("✅ Functions 'save_expenses()' and 'load_expenses()' created!")

In [None]:
# Let's test saving and loading
print("🧪 Testing save and load functions:\n")

# Save current expenses
save_expenses(df)

print("\n" + "=" * 80 + "\n")

# Load them back
df_loaded = load_expenses()

# Verify they match
print("\n🔍 Verification:")
print(f"Original DataFrame rows: {len(df)}")
print(f"Loaded DataFrame rows: {len(df_loaded)}")
print(f"Match: {'✅ Yes!' if len(df) == len(df_loaded) else '❌ No'}")

## 6. Data Analysis - Spending by Category

Now for the fun part - analyzing our spending! We'll learn about:

**`.groupby()`**: Groups data by a column and performs calculations
- Think: "for each category, calculate the total"
- Very powerful for aggregating data

**Aggregation functions**:
- `sum()`: Add up all values
- `mean()`: Calculate average
- `count()`: Count how many
- `min()`/`max()`: Find smallest/largest

In [None]:
def analyze_by_category(dataframe):
    """
    Analyze spending by category.

    Returns summary statistics for each category:
    - Total spent
    - Number of transactions
    - Average transaction amount
    - Percentage of total spending
    """
    print("📊 SPENDING ANALYSIS BY CATEGORY")
    print("=" * 80)

    # Group by category and calculate statistics
    # .agg() allows multiple aggregations at once;
    # with named aggregation, each keyword becomes an output column
    category_stats = dataframe.groupby("category")["amount"].agg(
        Total="sum",  # Total spent per category
        Count="count",  # Number of transactions
        Average="mean",  # Average transaction amount
    )

    # Calculate percentage of total spending
    total_spending = dataframe["amount"].sum()
    category_stats["Percentage"] = category_stats["Total"] / total_spending * 100

    # Sort by total spending (highest first)
    category_stats = category_stats.sort_values("Total", ascending=False)

    # Format the output nicely
    category_stats["Total"] = category_stats["Total"].apply(lambda x: f"${x:.2f}")
    category_stats["Average"] = category_stats["Average"].apply(lambda x: f"${x:.2f}")
    category_stats["Percentage"] = category_stats["Percentage"].apply(lambda x: f"{x:.1f}%")

    print(category_stats)
    print("\n" + "=" * 80)
    print(f"💰 TOTAL SPENDING: ${total_spending:.2f}")

    return category_stats


# Run the analysis
category_analysis = analyze_by_category(df)

### 💡 Understanding GroupBy:

```python
df.groupby('category')['amount'].sum()
```

**Step by step**:
1. `groupby('category')` - Separate data into groups by category
2. `['amount']` - Focus on the amount column
3. `.sum()` - Add up amounts in each group

**Result**: Total spending for each category!

**Common Mistake**: Forgetting to specify which column to aggregate
- ❌ `df.groupby('category').sum()` (tries to sum EVERY column, including text and dates, which can raise an error in recent pandas versions)
- ✅ `df.groupby('category')['amount'].sum()` (sums only amount)
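
As a small worked example of the steps above, the sketch below groups a hypothetical four-row sample and computes several statistics in one call. The column names `total`, `count`, and `average` are chosen for illustration:

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
sample = pd.DataFrame(
    {
        "category": ["Food", "Transport", "Food", "Transport"],
        "amount": [40.0, 25.0, 60.0, 35.0],
    }
)

# Named aggregation: each keyword becomes an output column
summary = sample.groupby("category")["amount"].agg(total="sum", count="count", average="mean")

print(summary)
```

Food ends up with a total of 100.0 across 2 transactions (average 50.0), and Transport with 60.0 across 2 (average 30.0).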

## 7. Data Analysis - Monthly Trends

Let's analyze spending over time. This introduces:

**DateTime operations**:
- Extract month/year from dates
- Group by time periods
- Analyze trends

**Why this matters**: Track if spending increases/decreases over time
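
One simple way to see whether spending went up or down is to compare each month's total with the previous month. This is a minimal sketch on hypothetical sample data, separate from the analysis function in this section:

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
sample = pd.DataFrame(
    {
        "date": pd.to_datetime(["2025-01-10", "2025-01-20", "2025-02-05", "2025-03-15"]),
        "amount": [100.0, 50.0, 120.0, 180.0],
    }
)

# Total per calendar month
monthly = sample.groupby(sample["date"].dt.to_period("M"))["amount"].sum()

# Difference from the previous month (the first month has nothing to compare with)
change = monthly.diff()

print(change)
```

Here January totals 150.0, so February (120.0) shows a change of -30.0 and March (180.0) a change of +60.0.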

In [None]:
def analyze_monthly_spending(dataframe):
    """
    Analyze spending by month.

    Shows total spending for each month and identifies trends.
    """
    print("📅 MONTHLY SPENDING ANALYSIS")
    print("=" * 80)

    # Create a copy to avoid modifying original
    df_copy = dataframe.copy()

    # Extract year and month from date
    # dt.to_period('M') converts dates to month periods (e.g., '2025-01')
    df_copy["month"] = df_copy["date"].dt.to_period("M")

    # Group by month and calculate statistics
    # Named aggregation: each keyword becomes an output column
    monthly_stats = df_copy.groupby("month")["amount"].agg(
        Total="sum", Transactions="count", Average="mean"
    )

    # Convert Period to string for better display
    monthly_stats.index = monthly_stats.index.astype(str)

    # Format numbers
    for col in ["Total", "Average"]:
        monthly_stats[col] = monthly_stats[col].apply(lambda x: f"${x:.2f}")

    print(monthly_stats)
    print("\n" + "=" * 80)

    # Calculate overall statistics
    total = dataframe["amount"].sum()
    num_months = len(monthly_stats)
    avg_monthly = total / num_months if num_months > 0 else 0

    print(f"💰 Total Spending: ${total:.2f}")
    print(f"📊 Average Monthly Spending: ${avg_monthly:.2f}")
    print(f"🗓️  Number of Months: {num_months}")

    return monthly_stats


# Run the analysis
monthly_analysis = analyze_monthly_spending(df)

### 💡 Working with Dates in Pandas:

Once you have a datetime column, you can extract parts using `.dt`:

```python
df['date'].dt.year       # Extract year (2025)
df['date'].dt.month      # Extract month (1-12)
df['date'].dt.day        # Extract day (1-31)
df['date'].dt.dayofweek  # Day of week (0=Monday, 6=Sunday)
df['date'].dt.to_period('M')  # Convert to month period
```

This is incredibly useful for time-based analysis!
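
For instance, `.dt.dayofweek` can answer the question "do I spend more on weekends?". The sketch below uses four hypothetical expenses from a Friday through a Monday. The `day_type` column is introduced only for this illustration:

```python
import pandas as pd

# Hypothetical sample data: Friday, Saturday, Sunday, Monday
sample = pd.DataFrame(
    {
        "date": pd.to_datetime(["2025-03-14", "2025-03-15", "2025-03-16", "2025-03-17"]),
        "amount": [20.0, 60.0, 40.0, 10.0],
    }
)

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 mark the weekend
sample["day_type"] = sample["date"].dt.dayofweek.map(
    lambda d: "weekend" if d >= 5 else "weekday"
)

# Average spend per expense on weekdays versus weekend days
avg_by_day_type = sample.groupby("day_type")["amount"].mean()
print(avg_by_day_type)
```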

## 8. Data Visualizations 📊

Visualizations help us understand data at a glance. We'll create:

1. **Pie Chart**: Shows proportion of spending by category
2. **Line Chart**: Shows spending trends over time
3. **Bar Chart**: Compares monthly spending

**Why visualize?**
- Humans process visuals faster than numbers
- Patterns and trends become obvious
- Great for presentations and reports

### 8.1 Pie Chart - Spending by Category

**When to use**: Show parts of a whole (percentages)

**Best for**: Comparing proportions of 5-7 categories

In [None]:
# Calculate total spending per category
category_totals = df.groupby("category")["amount"].sum().sort_values(ascending=False)

# Create figure and axis
# figsize sets the size (width, height) in inches
plt.figure(figsize=(10, 8))

# Create pie chart
# autopct shows percentages on the chart
# startangle rotates the chart (90 means start at top)
# colors uses a nice color palette from seaborn
plt.pie(
    category_totals,
    labels=category_totals.index,
    autopct="%1.1f%%",  # Format: show 1 decimal place
    startangle=90,
    colors=sns.color_palette("pastel"),
)

# Add title
plt.title("Spending by Category", fontsize=16, fontweight="bold", pad=20)

# Equal aspect ratio ensures pie is circular
plt.axis("equal")

# Display the plot
plt.tight_layout()  # Adjust spacing
plt.show()

print("\n💡 Interpretation: Larger slices = more spending in that category")

### 8.2 Line Chart - Spending Over Time

**When to use**: Show trends and changes over time

**Best for**: Identifying patterns, seasonality, and trends

We'll plot daily spending and add a dashed reference line at the overall daily average, which makes above-average days easy to spot.

In [None]:
# Group by date and sum expenses (in case multiple expenses per day)
daily_spending = df.groupby("date")["amount"].sum().reset_index()

# Create the plot
plt.figure(figsize=(12, 6))

# Plot daily spending as points connected by lines
plt.plot(
    daily_spending["date"],
    daily_spending["amount"],
    marker="o",  # Add circular markers at each point
    linestyle="-",  # Solid line
    linewidth=2,
    markersize=6,
    color="steelblue",
    label="Daily Spending",
)

# Add a horizontal line showing the average
avg_spending = daily_spending["amount"].mean()
plt.axhline(
    y=avg_spending,
    color="red",
    linestyle="--",  # Dashed line
    linewidth=2,
    label=f"Average: ${avg_spending:.2f}",
)

# Formatting
plt.title("Daily Spending Over Time", fontsize=16, fontweight="bold", pad=20)
plt.xlabel("Date", fontsize=12)
plt.ylabel("Amount ($)", fontsize=12)
plt.legend(loc="upper right", fontsize=10)  # Show legend
plt.grid(True, alpha=0.3)  # Add subtle grid lines
plt.xticks(rotation=45)  # Rotate x-axis labels for readability

# Adjust layout to prevent label cutoff
plt.tight_layout()
plt.show()

print("\n💡 Interpretation:")
print("   - Points above red line = above-average spending days")
print("   - Look for patterns: Do you spend more on weekends?")
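
To smooth out day-to-day fluctuations, a 7-day moving average is a common next step. The sketch below is one possible approach on hypothetical sample data. It assumes days without expenses should count as $0, which is a modeling choice rather than the only option:

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
sample = pd.DataFrame(
    {
        "date": pd.to_datetime(["2025-01-01", "2025-01-03", "2025-01-10"]),
        "amount": [70.0, 140.0, 35.0],
    }
)

# Build a daily series with 0 on days without expenses,
# so each rolling window spans 7 calendar days
daily = sample.groupby("date")["amount"].sum().asfreq("D", fill_value=0)

# Trailing 7-day mean; min_periods=1 lets the first few days use a shorter window
rolling_avg = daily.rolling(window=7, min_periods=1).mean()

print(rolling_avg.round(2))
```

The resulting series can be drawn on the same axes as the daily points, for example with `plt.plot(rolling_avg.index, rolling_avg.values)`.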

### 8.3 Bar Chart - Monthly Spending Comparison

**When to use**: Compare values across categories or time periods

**Best for**: Side-by-side comparisons

In [None]:
# Prepare data: group by month
df_copy = df.copy()
df_copy["month"] = df_copy["date"].dt.to_period("M").astype(str)
monthly_spending = df_copy.groupby("month")["amount"].sum()

# Create bar chart
plt.figure(figsize=(10, 6))

# Create bars
# Use seaborn's color palette for nice colors
bars = plt.bar(
    monthly_spending.index,
    monthly_spending.values,
    color=sns.color_palette("viridis", len(monthly_spending)),
    edgecolor="black",  # Black border around bars
    linewidth=1.5,
)

# Add value labels on top of each bar
for bar in bars:
    height = bar.get_height()
    plt.text(
        bar.get_x() + bar.get_width() / 2.0,
        height,
        f"${height:.2f}",
        ha="center",
        va="bottom",
        fontsize=10,
        fontweight="bold",
    )

# Formatting
plt.title("Monthly Spending Comparison", fontsize=16, fontweight="bold", pad=20)
plt.xlabel("Month", fontsize=12)
plt.ylabel("Total Spending ($)", fontsize=12)
plt.xticks(rotation=45)
plt.grid(axis="y", alpha=0.3)  # Only horizontal grid lines

plt.tight_layout()
plt.show()

print("\n💡 Interpretation:")
print("   - Taller bars = higher spending months")
print("   - Compare heights to spot spending increases/decreases")

### 8.4 Bonus: Category Spending by Month (Stacked Bar Chart)

This advanced visualization shows how spending in each category changes month by month.

**Introduces**: Pivot tables and stacked bar charts

In [None]:
# Prepare data using pivot table
df_copy = df.copy()
df_copy["month"] = df_copy["date"].dt.to_period("M").astype(str)

# Pivot table: rows=months, columns=categories, values=sum of amounts
# This reshapes data from "long" to "wide" format
pivot_data = df_copy.pivot_table(
    values="amount",
    index="month",
    columns="category",
    aggfunc="sum",
    fill_value=0,  # Replace missing values with 0
)

# Create stacked bar chart
plt.figure(figsize=(12, 7))

# Plot stacked bars
pivot_data.plot(
    kind="bar",
    stacked=True,
    ax=plt.gca(),  # Use current axis
    colormap="Set3",  # Color scheme
    edgecolor="black",
    linewidth=0.5,
)

# Formatting
plt.title("Monthly Spending by Category (Stacked)", fontsize=16, fontweight="bold", pad=20)
plt.xlabel("Month", fontsize=12)
plt.ylabel("Total Spending ($)", fontsize=12)
plt.legend(title="Category", bbox_to_anchor=(1.05, 1), loc="upper left")  # Legend outside plot
plt.xticks(rotation=45)
plt.grid(axis="y", alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 Interpretation:")
print("   - Each color = one category")
print("   - Stack height = total monthly spending")
print("   - Section sizes show how much each category contributes")

### 💡 Understanding Pivot Tables:

A pivot table reorganizes data:

**Before** (long format):
```
date       | category | amount
2025-01-01 | Food     | 50
2025-01-01 | Gas      | 30
2025-02-01 | Food     | 45
```

**After pivot** (wide format):
```
month    | Food | Gas
2025-01  | 50   | 30
2025-02  | 45   | 0
```

**Use cases**: Creating summary tables, preparing data for certain visualizations
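
The before/after tables above can be reproduced with a few lines. This sketch rebuilds the three example rows and pivots them. The `long_df` and `wide_df` names are illustrative:

```python
import pandas as pd

# The three rows from the "before" table above
long_df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2025-01-01", "2025-01-01", "2025-02-01"]),
        "category": ["Food", "Gas", "Food"],
        "amount": [50, 30, 45],
    }
)
long_df["month"] = long_df["date"].dt.to_period("M").astype(str)

# Rows become months, columns become categories, missing combinations become 0
wide_df = long_df.pivot_table(
    values="amount", index="month", columns="category", aggfunc="sum", fill_value=0
)
print(wide_df)
```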

## 9. Advanced Analysis - Insights and Statistics

Let's extract some interesting insights from our data!

In [None]:
def get_spending_insights(dataframe):
    """
    Generate interesting insights from spending data.
    """
    print("🔍 SPENDING INSIGHTS")
    print("=" * 80)

    # Overall statistics
    total = dataframe["amount"].sum()
    count = len(dataframe)
    avg_transaction = dataframe["amount"].mean()
    median_transaction = dataframe["amount"].median()

    print(f"💰 Total Spending: ${total:.2f}")
    print(f"🧾 Total Transactions: {count}")
    print(f"📊 Average Transaction: ${avg_transaction:.2f}")
    print(f"📊 Median Transaction: ${median_transaction:.2f}")

    # Find most expensive purchase
    max_expense = dataframe.loc[dataframe["amount"].idxmax()]
    print(f"\n💸 Largest Expense:")
    print(f"   ${max_expense['amount']:.2f} - {max_expense['description']}")
    print(
        f"   Category: {max_expense['category']} | Date: {max_expense['date'].strftime('%Y-%m-%d')}"
    )

    # Find smallest purchase
    min_expense = dataframe.loc[dataframe["amount"].idxmin()]
    print(f"\n💵 Smallest Expense:")
    print(f"   ${min_expense['amount']:.2f} - {min_expense['description']}")
    print(
        f"   Category: {min_expense['category']} | Date: {min_expense['date'].strftime('%Y-%m-%d')}"
    )

    # Category insights
    top_category = dataframe.groupby("category")["amount"].sum().idxmax()
    top_category_amount = dataframe.groupby("category")["amount"].sum().max()
    top_category_pct = (top_category_amount / total) * 100

    print(f"\n🏆 Top Spending Category: {top_category}")
    print(f"   Total: ${top_category_amount:.2f} ({top_category_pct:.1f}% of total)")

    # Time analysis
    # Count both the first and the last day, so a single day of data spans 1 day
    date_range = (dataframe["date"].max() - dataframe["date"].min()).days + 1
    daily_avg = total / date_range if date_range > 0 else 0

    print(f"\n📅 Date Range: {date_range} days")
    print(f"📊 Average Daily Spending: ${daily_avg:.2f}")

    print("\n" + "=" * 80)


# Run insights analysis
get_spending_insights(df)

### 💡 Finding Max/Min in DataFrames:

```python
# Find maximum value
max_value = df['amount'].max()  # Returns the number

# Find index of maximum value
max_idx = df['amount'].idxmax()  # Returns row index

# Get entire row with maximum value
max_row = df.loc[df['amount'].idxmax()]  # Returns full row
```

The same pattern works for `.min()` and `.idxmin()`!
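
The same idea extends to groups. As a sketch on hypothetical sample data, combining `groupby()` with `idxmax()` finds the largest expense within each category:

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
sample = pd.DataFrame(
    {
        "category": ["Food", "Food", "Transport", "Transport"],
        "amount": [45.5, 32.75, 25.0, 35.0],
        "description": ["Groceries", "Dinner", "Taxi", "Gas"],
    }
)

# idxmax() per group returns the row label of each category's largest amount
largest_idx = sample.groupby("category")["amount"].idxmax()

# .loc with those labels pulls out the full rows
largest_rows = sample.loc[largest_idx]
print(largest_rows)
```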

## 10. Summary and Next Steps 🚀

Congratulations! You've built a complete Personal Finance Tracker and learned:

### ✅ Skills Acquired:

1. **Data Structures**
   - Created and manipulated pandas DataFrames
   - Understood rows, columns, and indices

2. **Data Manipulation**
   - Added new data with `pd.concat()`
   - Sorted and filtered data
   - Worked with datetime objects

3. **Data Analysis**
   - Used `.groupby()` for aggregations
   - Created pivot tables
   - Calculated statistics (mean, median, sum)

4. **Data Visualization**
   - Created pie charts (proportions)
   - Created line charts (trends)
   - Created bar charts (comparisons)

5. **File Operations**
   - Saved data to CSV
   - Loaded data from CSV

### 🎯 Next Steps to Expand This Project:

1. **Add Income Tracking**
   - Track income sources
   - Calculate net savings (income - expenses)

2. **Budget Features**
   - Set monthly budgets per category
   - Alert when approaching budget limits
   - Calculate budget vs. actual spending

3. **More Visualizations**
   - Heatmap of spending by day of week
   - Box plots to identify outliers
   - Scatter plots for correlation analysis

4. **Advanced Analysis**
   - Predict future spending using linear regression
   - Identify spending patterns and anomalies
   - Compare spending to previous periods

5. **User Interface**
   - Create a web interface with Streamlit
   - Add interactive dashboards with Plotly
   - Build a mobile app

6. **Data Import**
   - Import bank statements (CSV/Excel)
   - Connect to bank APIs
   - Parse email receipts

7. **Export Features**
   - Generate PDF reports
   - Export charts as images
   - Create Excel summaries

### 📚 Learning Resources:

- **Pandas**: [pandas.pydata.org](https://pandas.pydata.org/docs/)
- **Matplotlib**: [matplotlib.org](https://matplotlib.org/)
- **Seaborn**: [seaborn.pydata.org](https://seaborn.pydata.org/)
- **DataCamp**: Interactive Python courses
- **Kaggle**: Practice with real datasets

### 💪 Practice Exercises:

1. Add 10 new expenses from your own life
2. Create a new category and add expenses to it
3. Find your highest spending day
4. Calculate spending for a specific date range
5. Create a function to delete an expense
6. Add data validation (e.g., check category spelling)
7. Create a weekly spending analysis

---

**Remember**: The best way to learn is by doing! Modify this code, break things, fix them, and experiment.

Happy coding! 🎉

## 📝 Quick Reference - Useful Code Snippets

Save these for quick access:

```python
# Add an expense
df = add_expense(df, '2025-03-20', 'Food', 25.50, 'Pizza')

# Save to file
save_expenses(df, 'expenses.csv')

# Load from file
df = load_expenses('expenses.csv')

# View recent expenses
df.tail(10)

# Filter by category
food_expenses = df[df['category'] == 'Food']

# Filter by date range
march_expenses = df[(df['date'] >= '2025-03-01') & (df['date'] < '2025-04-01')]

# Total for a category
df[df['category'] == 'Food']['amount'].sum()

# Count transactions by category
df['category'].value_counts()
```

## 18. Web Interface with Streamlit 🌐

Transform your Jupyter notebook into an interactive web application!

**Streamlit** makes it easy to create beautiful web apps from Python scripts.

### Features of our Streamlit app:
- Dashboard with key metrics
- Interactive forms for adding transactions
- Budget management interface
- Visual analytics with charts
- Data import/export tools

### To run the Streamlit app:

1. Make sure you have all required packages:
   ```bash
   pip install -r requirements.txt
   ```

2. Run the app from terminal:
   ```bash
   streamlit run app.py
   ```

3. Your browser will open at `http://localhost:8501`

**Note**: The `app.py` file is created in the same directory as this notebook. Check it out to see how the code is structured for a web application!

In [None]:
# Predictive Analytics (requires scikit-learn)
# Uncomment the next line to install it:
# !pip install scikit-learn

try:
    from sklearn.linear_model import LinearRegression
    import numpy as np

    def forecast_spending(expenses_df, months_ahead=3):
        """
        Forecast future spending using linear regression.

        Parameters:
        -----------
        expenses_df : pd.DataFrame
            Historical expenses
        months_ahead : int
            Number of months to forecast

        Returns:
        --------
        pd.DataFrame
            Forecasted monthly spending
        """
        # Prepare data
        df_monthly = expenses_df.copy()
        df_monthly["month"] = df_monthly["date"].dt.to_period("M")
        monthly_totals = df_monthly.groupby("month")["amount"].sum().reset_index()
        monthly_totals["month_num"] = range(len(monthly_totals))

        if len(monthly_totals) < 3:
            print("⚠️  Need at least 3 months of data for forecasting")
            return None

        # Train model
        X = monthly_totals[["month_num"]].values
        y = monthly_totals["amount"].values

        model = LinearRegression()
        model.fit(X, y)

        # Make predictions
        last_month_num = monthly_totals["month_num"].max()
        future_months = np.array([[last_month_num + i] for i in range(1, months_ahead + 1)])
        predictions = model.predict(future_months)

        # Label the forecast with the months that follow the last observed month.
        # Using periods avoids skipping a month when the last expense falls mid-month.
        last_month = monthly_totals["month"].max()
        future_dates = pd.period_range(start=last_month + 1, periods=months_ahead, freq="M")

        forecast_df = pd.DataFrame(
            {"Month": future_dates.strftime("%Y-%m"), "Predicted Spending": predictions}
        )

        print("📈 SPENDING FORECAST")
        print("=" * 60)
        print(forecast_df.to_string(index=False))
        print("\n💡 Based on linear regression of historical spending trends")
        print(f"   Average historical spending: ${y.mean():.2f}/month")
        print(
            f"   Trend: {'Increasing' if model.coef_[0] > 0 else 'Decreasing'} by ${abs(model.coef_[0]):.2f}/month"
        )

        return forecast_df

    def detect_anomalies(expenses_df, std_threshold=2):
        """
        Detect unusual expenses using statistical methods.

        Parameters:
        -----------
        expenses_df : pd.DataFrame
            Expenses to analyze
        std_threshold : float
            Number of standard deviations for anomaly threshold

        Returns:
        --------
        pd.DataFrame
            Anomalous transactions
        """
        mean_amount = expenses_df["amount"].mean()
        std_amount = expenses_df["amount"].std()
        threshold = mean_amount + (std_threshold * std_amount)

        anomalies = expenses_df[expenses_df["amount"] > threshold].copy()

        if len(anomalies) == 0:
            print("✅ No anomalies detected! All expenses within normal range.")
            return anomalies

        print("⚠️  ANOMALY DETECTION")
        print("=" * 80)
        print(f"Found {len(anomalies)} unusual expense(s)")
        print(f"Threshold: ${threshold:.2f} (mean + {std_threshold} std dev)")
        print("\nAnomalous transactions:")

        for _, row in anomalies.iterrows():
            print(f"\n  • ${row['amount']:.2f} - {row['description']}")
            print(f"    Category: {row['category']} | Date: {row['date'].strftime('%Y-%m-%d')}")
            print(
                f"    This is {((row['amount'] - mean_amount) / std_amount):.1f} standard deviations above average"
            )

        return anomalies

    def recommend_budget(expenses_df, savings_target_pct=20):
        """
        Recommend monthly budget based on historical spending and savings goals.

        Parameters:
        -----------
        expenses_df : pd.DataFrame
            Historical expenses
        savings_target_pct : float
            Target savings rate (default: 20%)

        Returns:
        --------
        dict
            Budget recommendations by category
        """
        # Calculate average monthly spending by category
        df_monthly = expenses_df.copy()
        df_monthly["month"] = df_monthly["date"].dt.to_period("M")

        num_months = df_monthly["month"].nunique()
        category_totals = expenses_df.groupby("category")["amount"].sum()
        category_averages = category_totals / num_months

        total_avg_monthly = category_averages.sum()

        print("💡 BUDGET RECOMMENDATIONS")
        print("=" * 80)
        print(f"Based on {num_months} months of historical data")
        print(f"Average monthly spending: ${total_avg_monthly:.2f}\n")

        print("Recommended monthly budgets by category:")
        print("-" * 80)

        recommendations = {}
        for category, avg_amount in category_averages.sort_values(ascending=False).items():
            # Add 10% buffer for flexibility
            recommended = avg_amount * 1.1
            percentage = avg_amount / total_avg_monthly * 100

            recommendations[category] = recommended
            print(f"  {category:20} ${recommended:8.2f}  ({percentage:5.1f}% of spending)")

        print("-" * 80)
        print(f"  {'TOTAL':20} ${sum(recommendations.values()):8.2f}")

        print(f"\n📊 To achieve {savings_target_pct}% savings rate:")
        print(
            f"   Keep total spending under ${total_avg_monthly * (1 - savings_target_pct/100):.2f}/month"
        )

        return recommendations

    print("✅ Predictive analytics functions created!")
    print("\n💡 Usage:")
    print("   forecast = forecast_spending(df, months_ahead=6)")
    print("   anomalies = detect_anomalies(df)")
    print("   budget_rec = recommend_budget(df, savings_target_pct=20)")

except ImportError:
    print("⚠️  scikit-learn not installed. Install with: !pip install scikit-learn")
    print("   Then restart the kernel and run this cell again.")

## 17. Predictive Analytics & Machine Learning 🤖

Use machine learning to forecast future spending and get personalized recommendations!

**Features:**
- Spending forecast (next 3-6 months)
- Anomaly detection (unusual expenses)
- Budget recommendations based on historical data
- Savings goal timeline predictions
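
To make the anomaly-detection idea concrete, here is a minimal sketch of the "mean plus a few standard deviations" rule using only Python's standard library. The `flag_outliers` helper and the sample amounts are made up for illustration; the notebook's own `detect_anomalies` function applies the same rule to a DataFrame.

```python
from statistics import mean, stdev


def flag_outliers(amounts, std_threshold=2):
    """Return amounts more than std_threshold sample standard deviations above the mean."""
    avg = mean(amounts)
    spread = stdev(amounts)  # sample standard deviation, the same default as pandas' .std()
    cutoff = avg + std_threshold * spread
    return [amount for amount in amounts if amount > cutoff]


# Hypothetical sample: nine everyday purchases and one large one
sample_amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 500]
outliers = flag_outliers(sample_amounts)
print(outliers)
```

One caveat: a single large value also inflates the standard deviation, so with only a handful of transactions nothing may ever cross the cutoff.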

**ML Concepts:**
- Linear Regression for trend forecasting
- Statistical analysis for anomaly detection
- Moving averages for smoothing
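
A moving average smooths month-to-month noise by averaging each month with the months just before it. The sketch below uses pandas' `rolling`; the monthly totals are invented sample values, and the 3-month window is an assumption you would tune for your own data.

```python
import pandas as pd

# Hypothetical monthly spending totals
monthly_totals = pd.Series(
    [1200.0, 1500.0, 900.0, 1800.0, 1300.0, 1100.0],
    index=["2025-01", "2025-02", "2025-03", "2025-04", "2025-05", "2025-06"],
)

# Each value is the mean of the current month and the two before it.
# The first two months have no full window, so they come out as NaN.
smoothed = monthly_totals.rolling(window=3).mean()
print(smoothed)
```

A wider window gives a smoother line but reacts more slowly to real changes in spending.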

Note: Install with `!pip install scikit-learn`

In [None]:
# Plotly visualizations (install with: !pip install plotly)
# Uncomment the line below to install plotly if needed:
# !pip install plotly

try:
    import plotly.express as px
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots

    def create_waterfall_chart(income_df, expenses_df):
        """Create waterfall chart showing income/expense flow."""
        total_income = income_df["amount"].sum()
        total_expenses = expenses_df["amount"].sum()
        net_savings = total_income - total_expenses

        # Create waterfall data
        fig = go.Figure(
            go.Waterfall(
                orientation="v",
                measure=["relative", "relative", "total"],
                x=["Total Income", "Total Expenses", "Net Savings"],
                textposition="outside",
                text=[f"${total_income:,.2f}", f"-${total_expenses:,.2f}", f"${net_savings:,.2f}"],
                y=[total_income, -total_expenses, net_savings],
                connector={"line": {"color": "rgb(63, 63, 63)"}},
            )
        )

        fig.update_layout(title="💰 Cash Flow Analysis", showlegend=False, height=500)

        fig.show()
        print("💡 Tip: Hover over bars to see exact values!")

    def create_spending_heatmap(expenses_df):
        """Create a color-scaled bar chart of total spending by day of week."""
        df_heat = expenses_df.copy()
        df_heat["day_of_week"] = df_heat["date"].dt.day_name()

        # Aggregate by day of week
        heatmap_data = (
            df_heat.groupby("day_of_week")["amount"]
            .sum()
            .reindex(["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"])
        )

        fig = px.bar(
            x=heatmap_data.index,
            y=heatmap_data.values,
            title="📅 Spending by Day of Week",
            labels={"x": "Day", "y": "Total Spending ($)"},
            color=heatmap_data.values,
            color_continuous_scale="Reds",
        )

        fig.update_layout(height=400)
        fig.show()

    def create_interactive_category_pie(expenses_df):
        """Interactive pie chart with plotly."""
        category_totals = expenses_df.groupby("category")["amount"].sum()

        fig = px.pie(
            values=category_totals.values,
            names=category_totals.index,
            title="📊 Spending by Category (Interactive)",
            hole=0.3,  # Creates a donut chart
        )

        fig.update_traces(textposition="inside", textinfo="percent+label")
        fig.show()
        print("💡 Tip: Click legend items to hide/show categories!")

    def create_trend_chart(expenses_df, income_df):
        """Create line chart showing income vs expenses over time."""
        # Prepare data
        expenses_monthly = expenses_df.copy()
        expenses_monthly["month"] = expenses_monthly["date"].dt.to_period("M").astype(str)
        expenses_by_month = expenses_monthly.groupby("month")["amount"].sum()

        income_monthly = income_df.copy()
        income_monthly["month"] = income_monthly["date"].dt.to_period("M").astype(str)
        income_by_month = income_monthly.groupby("month")["amount"].sum()

        # Create figure
        fig = go.Figure()

        fig.add_trace(
            go.Scatter(
                x=income_by_month.index,
                y=income_by_month.values,
                mode="lines+markers",
                name="Income",
                line=dict(color="green", width=3),
                marker=dict(size=8),
            )
        )

        fig.add_trace(
            go.Scatter(
                x=expenses_by_month.index,
                y=expenses_by_month.values,
                mode="lines+markers",
                name="Expenses",
                line=dict(color="red", width=3),
                marker=dict(size=8),
            )
        )

        fig.update_layout(
            title="📈 Income vs Expenses Trend",
            xaxis_title="Month",
            yaxis_title="Amount ($)",
            hovermode="x unified",
            height=500,
        )

        fig.show()

    print("✅ Plotly visualization functions created!")
    print("\n💡 Usage:")
    print("   create_waterfall_chart(df_income, df)")
    print("   create_spending_heatmap(df)")
    print("   create_interactive_category_pie(df)")
    print("   create_trend_chart(df, df_income)")

except ImportError:
    print("⚠️  Plotly not installed. Install with: !pip install plotly")
    print("   Then restart the kernel and run this cell again.")

## 16. Interactive Visualizations with Plotly 📊

Plotly creates interactive charts with hover effects, zoom, and pan capabilities!

**Features:**
- Hover to see exact values
- Click legend to show/hide series
- Zoom and pan for detailed analysis
- Export charts as images

**New Chart Types:**
- Waterfall chart (income/expense flow)
- Heatmap (spending patterns)
- Sunburst chart (hierarchical spending)
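
The sunburst chart in the list above could be sketched with `plotly.express.sunburst`. This is an illustration, not one of the notebook's functions: the helper names `sunburst_frame` and `create_sunburst_chart` are hypothetical, and using `description` as the inner ring is an assumption about how you want to break categories down.

```python
import pandas as pd


def sunburst_frame(expenses_df):
    """Total spending for each (category, description) pair."""
    return expenses_df.groupby(["category", "description"], as_index=False)["amount"].sum()


def create_sunburst_chart(expenses_df):
    """Return a plotly sunburst figure, or None if plotly is not installed."""
    try:
        import plotly.express as px
    except ImportError:
        return None

    return px.sunburst(
        sunburst_frame(expenses_df),
        path=["category", "description"],  # outer ring to inner ring hierarchy
        values="amount",
        title="Spending by Category and Description",
    )


# Hypothetical sample data
sample_expenses = pd.DataFrame(
    {
        "category": ["Food", "Food", "Food", "Transport"],
        "description": ["Groceries", "Groceries", "Restaurant", "Bus"],
        "amount": [50.0, 30.0, 20.0, 10.0],
    }
)

frame = sunburst_frame(sample_expenses)
fig = create_sunburst_chart(sample_expenses)  # call fig.show() in a notebook to render it
print(frame)
```

`px.sunburst` sums `values` along `path` on its own, so the explicit grouping step is optional; it is kept here so you can inspect the numbers behind the chart.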

Note: Install with `!pip install plotly`

In [None]:
import json


def import_from_csv(filename, column_mapping=None, transaction_type="expense"):
    """
    Import transactions from a CSV file with flexible column mapping.

    Parameters:
    -----------
    filename : str
        Path to CSV file
    column_mapping : dict
        Maps CSV columns to our format:
        {'date_col': 'Date', 'amount_col': 'Amount', 'desc_col': 'Description', 'category_col': 'Category'}
    transaction_type : str
        'expense' or 'income'

    Returns:
    --------
    pd.DataFrame
        Imported transactions
    """
    if not os.path.exists(filename):
        print(f"❌ File not found: {filename}")
        return pd.DataFrame(columns=["date", "category", "amount", "description"])

    try:
        # Read CSV
        df_import = pd.read_csv(filename)

        # Default mapping
        if column_mapping is None:
            column_mapping = {
                "date_col": "date",
                "amount_col": "amount",
                "desc_col": "description",
                "category_col": "category",
            }

        # Create new DataFrame with our standard columns
        df_new = pd.DataFrame()
        df_new["date"] = pd.to_datetime(df_import[column_mapping["date_col"]])
        df_new["amount"] = df_import[column_mapping["amount_col"]].abs()  # Convert to positive
        df_new["description"] = df_import[column_mapping["desc_col"]]

        # Category: use the mapped column if it exists, otherwise default to 'Uncategorized'
        category_col = column_mapping.get("category_col")
        if category_col in df_import.columns:
            df_new["category"] = df_import[category_col]
        else:
            df_new["category"] = "Uncategorized"

        print(f"✅ Imported {len(df_new)} {transaction_type} transaction(s) from {filename}")
        return df_new

    except Exception as e:
        print(f"❌ Error importing: {e}")
        return pd.DataFrame(columns=["date", "category", "amount", "description"])


def export_to_excel(expenses_df, income_df, filename="financial_report.xlsx"):
    """
    Export expenses and income to Excel with multiple sheets.

    Requires: pip install openpyxl
    """
    try:
        with pd.ExcelWriter(filename, engine="openpyxl") as writer:
            expenses_df.to_excel(writer, sheet_name="Expenses", index=False)
            income_df.to_excel(writer, sheet_name="Income", index=False)

            # Add summary sheet
            summary_data = {
                "Metric": [
                    "Total Expenses",
                    "Total Income",
                    "Net Savings",
                    "Expense Count",
                    "Income Count",
                ],
                "Value": [
                    f"${expenses_df['amount'].sum():.2f}",
                    f"${income_df['amount'].sum():.2f}",
                    f"${income_df['amount'].sum() - expenses_df['amount'].sum():.2f}",
                    len(expenses_df),
                    len(income_df),
                ],
            }
            pd.DataFrame(summary_data).to_excel(writer, sheet_name="Summary", index=False)

        print(f"✅ Exported to Excel: {filename}")
        print("   Sheets: Expenses, Income, Summary")
    except ImportError:
        print("❌ Error: openpyxl not installed. Run: pip install openpyxl")
    except Exception as e:
        print(f"❌ Error exporting to Excel: {e}")


def export_to_json(expenses_df, income_df, budgets_dict, filename="financial_data.json"):
    """
    Export all financial data to JSON for backup/portability.
    """
    try:
        data = {
            # to_dict() has no date_format option, so serialize with to_json (ISO dates)
            # and parse the result back into plain Python records
            "expenses": json.loads(expenses_df.to_json(orient="records", date_format="iso")),
            "income": json.loads(income_df.to_json(orient="records", date_format="iso")),
            "budgets": budgets_dict,
            "recurring": recurring_transactions,
            "export_date": datetime.now().isoformat(),
        }

        with open(filename, "w") as f:
            json.dump(data, f, indent=2, default=str)

        print(f"✅ Exported to JSON: {filename}")
        print(
            f"   Expenses: {len(expenses_df)}, Income: {len(income_df)}, Budgets: {len(budgets_dict)}"
        )
    except Exception as e:
        print(f"❌ Error exporting to JSON: {e}")


def import_from_json(filename="financial_data.json"):
    """
    Import financial data from JSON backup.

    Returns:
    --------
    dict
        Dictionary containing expenses_df, income_df, budgets, recurring
    """
    if not os.path.exists(filename):
        print(f"❌ File not found: {filename}")
        return None

    try:
        with open(filename, "r") as f:
            data = json.load(f)

        # Convert back to DataFrames
        df_expenses = pd.DataFrame(data["expenses"])
        df_expenses["date"] = pd.to_datetime(df_expenses["date"])

        df_income = pd.DataFrame(data["income"])
        df_income["date"] = pd.to_datetime(df_income["date"])

        print(f"✅ Imported from JSON: {filename}")
        print(f"   Expenses: {len(df_expenses)}, Income: {len(df_income)}")

        return {
            "expenses": df_expenses,
            "income": df_income,
            "budgets": data.get("budgets", {}),
            "recurring": data.get("recurring", []),
        }
    except Exception as e:
        print(f"❌ Error importing from JSON: {e}")
        return None


print("✅ Import/Export functions created!")
print("\n💡 Usage:")
print("   export_to_excel(df, df_income, 'my_report.xlsx')")
print("   export_to_json(df, df_income, budgets, 'backup.json')")
print("   data = import_from_json('backup.json')")

## 15. Import/Export Features 📁

Import transactions from bank statements and export reports in multiple formats.

**Import**: CSV files from banks (with column mapping)
**Export**: Excel (XLSX), JSON backup, PDF reports

This makes it easy to integrate with other financial tools!
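
As a quick illustration of column mapping, the sketch below reads a small made-up bank export and renames its columns to the tracker's standard `date`, `category`, `amount`, `description` layout. The bank's column names (`Posted Date`, `Memo`, `Debit`) and the sample rows are assumptions for the example; real exports vary from bank to bank.

```python
import io

import pandas as pd

# Hypothetical bank export: different column names, expenses stored as negatives
bank_csv = io.StringIO(
    "Posted Date,Memo,Debit\n"
    "2025-01-03,COFFEE SHOP,-4.50\n"
    "2025-01-05,GROCERY MART,-62.10\n"
)

# Same shape as the column_mapping argument of import_from_csv
column_mapping = {"date_col": "Posted Date", "amount_col": "Debit", "desc_col": "Memo"}

raw = pd.read_csv(bank_csv)
standard = raw.rename(
    columns={
        column_mapping["date_col"]: "date",
        column_mapping["amount_col"]: "amount",
        column_mapping["desc_col"]: "description",
    }
)
standard["date"] = pd.to_datetime(standard["date"])
standard["amount"] = standard["amount"].abs()  # store expenses as positive numbers
standard["category"] = "Uncategorized"  # the export has no category column
standard = standard[["date", "category", "amount", "description"]]
print(standard)
```

With a real file you would pass the same mapping to `import_from_csv` instead of renaming by hand.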

In [None]:
# Storage for recurring transactions
recurring_transactions = []


def add_recurring_transaction(
    transaction_type, category, amount, description, frequency="monthly", start_date=None
):
    """
    Add a recurring transaction template.

    Parameters:
    -----------
    transaction_type : str
        'expense' or 'income'
    category : str
        Transaction category
    amount : float
        Transaction amount
    description : str
        Transaction description
    frequency : str
        'daily', 'weekly', 'biweekly', 'monthly', 'quarterly', 'yearly'
    start_date : str or datetime
        When to start generating transactions (default: today)
    """
    if start_date is None:
        start_date = datetime.now()
    else:
        start_date = pd.to_datetime(start_date)

    recurring = {
        "type": transaction_type,
        "category": category,
        "amount": float(amount),
        "description": description,
        "frequency": frequency,
        "start_date": start_date,
        "active": True,
    }

    recurring_transactions.append(recurring)
    print(f"✅ Added {frequency} recurring {transaction_type}: {description} (${amount:.2f})")


def generate_recurring_transactions(end_date=None, preview_only=True):
    """
    Generate transactions from recurring templates.

    Parameters:
    -----------
    end_date : str or datetime
        Generate transactions up to this date (default: 3 months from now)
    preview_only : bool
        Currently unused: generated transactions are always returned, never added automatically

    Returns:
    --------
    tuple
        (generated_expenses, generated_income)
    """
    if end_date is None:
        end_date = datetime.now() + timedelta(days=90)  # 3 months
    else:
        end_date = pd.to_datetime(end_date)

    generated_expenses = []
    generated_income = []

    # Calendar-aware step sizes (a month is not always 30 days, a year not always 365)
    frequency_offsets = {
        "daily": pd.DateOffset(days=1),
        "weekly": pd.DateOffset(weeks=1),
        "biweekly": pd.DateOffset(weeks=2),
        "monthly": pd.DateOffset(months=1),
        "quarterly": pd.DateOffset(months=3),
        "yearly": pd.DateOffset(years=1),
    }

    for recurring in recurring_transactions:
        if not recurring["active"]:
            continue

        # Unknown frequencies fall back to monthly
        step = frequency_offsets.get(recurring["frequency"], frequency_offsets["monthly"])
        current_date = recurring["start_date"]

        while current_date <= end_date:
            transaction = {
                "date": current_date,
                "category": recurring["category"],
                "amount": recurring["amount"],
                "description": f"{recurring['description']} (recurring)",
            }

            if recurring["type"] == "expense":
                generated_expenses.append(transaction)
            else:
                generated_income.append(transaction)

            # Step to the next occurrence
            current_date = current_date + step

    # Convert to DataFrames
    df_gen_expenses = (
        pd.DataFrame(generated_expenses)
        if generated_expenses
        else pd.DataFrame(columns=["date", "category", "amount", "description"])
    )
    df_gen_income = (
        pd.DataFrame(generated_income)
        if generated_income
        else pd.DataFrame(columns=["date", "category", "amount", "description"])
    )

    print(
        f"📅 Generated {len(df_gen_expenses)} recurring expenses and {len(df_gen_income)} recurring income entries"
    )
    print(f"   From {datetime.now().strftime('%Y-%m-%d')} to {end_date.strftime('%Y-%m-%d')}")

    return df_gen_expenses, df_gen_income


def list_recurring_transactions():
    """Display all recurring transactions."""
    print("🔄 RECURRING TRANSACTIONS")
    print("=" * 80)

    if not recurring_transactions:
        print("No recurring transactions configured.")
        return

    for i, recurring in enumerate(recurring_transactions, 1):
        status = "✅ Active" if recurring["active"] else "❌ Inactive"
        print(f"\n{i}. {status}")
        print(f"   Type: {recurring['type'].capitalize()}")
        print(f"   Category: {recurring['category']}")
        print(f"   Amount: ${recurring['amount']:.2f}")
        print(f"   Description: {recurring['description']}")
        print(f"   Frequency: {recurring['frequency'].capitalize()}")
        print(f"   Start Date: {recurring['start_date'].strftime('%Y-%m-%d')}")


# Example recurring transactions
add_recurring_transaction("expense", "Entertainment", 15.99, "Netflix subscription", "monthly")
add_recurring_transaction("expense", "Entertainment", 12.99, "Spotify subscription", "monthly")
add_recurring_transaction("income", "Salary", 4500.00, "Monthly salary", "monthly")

print("\n")
list_recurring_transactions()

## 14. Recurring Transactions 🔄

Track subscriptions and regular bills automatically!

**Common Recurring Transactions:**
- Netflix, Spotify, gym memberships (monthly)
- Salary (monthly or biweekly)
- Electric/water bills (monthly)
- Rent/mortgage (monthly)
- Insurance premiums (monthly, quarterly, annually)

**Benefits:**
- Never forget to record regular expenses
- Generate future transactions automatically
- Better forecasting and budgeting
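
Generating future dates is where recurring transactions get tricky, because months differ in length. The sketch below compares stepping by a fixed 30 days with stepping by calendar month using `pd.DateOffset`. The start dates are arbitrary examples.

```python
from datetime import timedelta

import pandas as pd

start = pd.Timestamp("2025-01-15")

# Calendar-month steps always land on the 15th
by_offset = [start + pd.DateOffset(months=i) for i in range(4)]

# Fixed 30-day steps drift around the 15th
by_days = [start + timedelta(days=30 * i) for i in range(4)]

print([d.strftime("%Y-%m-%d") for d in by_offset])
print([d.strftime("%Y-%m-%d") for d in by_days])

# When the target month is shorter, DateOffset clamps to its last day
month_end = pd.Timestamp("2025-01-31") + pd.DateOffset(months=1)
print(month_end.strftime("%Y-%m-%d"))
```

For a subscription billed on the same day each month, the calendar-month version is usually the one you want.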

In [None]:
# Test transaction management

# Add IDs to our expenses DataFrame
df = add_transaction_ids(df)

print("\n📋 Expenses with IDs:")
print(df.head(10))

print("\n\n🔍 Search Test: Find Food expenses over $40")
food_results = search_transactions(df, category="Food", min_amount=40)
print(food_results[["id", "date", "category", "amount", "description"]])

print("\n\n👁️  View specific transaction:")
view_transaction(df, transaction_id=1)

# You can uncomment these to test edit and delete:
# print("\n\n✏️  Edit Test:")
# df = edit_transaction(df, transaction_id=1, amount=50.00, description='Updated grocery shopping')
#
# print("\n\n🗑️  Delete Test:")
# df = delete_transaction(df, transaction_id=20)

In [None]:
def add_transaction_ids(dataframe):
    """
    Add unique IDs to transactions if not already present.

    Parameters:
    -----------
    dataframe : pd.DataFrame
        Transaction DataFrame (expenses or income)

    Returns:
    --------
    pd.DataFrame
        DataFrame with 'id' column
    """
    if "id" not in dataframe.columns:
        dataframe.insert(0, "id", range(1, len(dataframe) + 1))
        print(f"✅ Added IDs to {len(dataframe)} transactions")
    else:
        print("ℹ️  IDs already exist")

    return dataframe


def search_transactions(dataframe, **kwargs):
    """
    Search for transactions based on various criteria.

    Parameters:
    -----------
    dataframe : pd.DataFrame
        Transaction DataFrame to search
    **kwargs : keyword arguments
        category : str - Filter by category
        min_amount : float - Minimum amount
        max_amount : float - Maximum amount
        start_date : str - Start date (YYYY-MM-DD)
        end_date : str - End date (YYYY-MM-DD)
        description : str - Search in description (case-insensitive)

    Returns:
    --------
    pd.DataFrame
        Filtered DataFrame matching criteria
    """
    result = dataframe.copy()

    # Filter by category
    if "category" in kwargs:
        result = result[result["category"].str.lower() == kwargs["category"].lower()]

    # Filter by amount range
    if "min_amount" in kwargs:
        result = result[result["amount"] >= kwargs["min_amount"]]
    if "max_amount" in kwargs:
        result = result[result["amount"] <= kwargs["max_amount"]]

    # Filter by date range
    if "start_date" in kwargs:
        result = result[result["date"] >= pd.to_datetime(kwargs["start_date"])]
    if "end_date" in kwargs:
        result = result[result["date"] <= pd.to_datetime(kwargs["end_date"])]

    # Filter by description
    if "description" in kwargs:
        result = result[
            result["description"].str.contains(kwargs["description"], case=False, na=False)
        ]

    print(f"🔍 Found {len(result)} matching transactions")
    return result


def edit_transaction(dataframe, transaction_id, **kwargs):
    """
    Edit an existing transaction.

    Parameters:
    -----------
    dataframe : pd.DataFrame
        Transaction DataFrame
    transaction_id : int
        ID of transaction to edit
    **kwargs : keyword arguments
        date : str - New date
        category : str - New category
        amount : float - New amount
        description : str - New description

    Returns:
    --------
    pd.DataFrame
        Updated DataFrame
    """
    if "id" not in dataframe.columns:
        print("❌ Error: DataFrame doesn't have ID column. Use add_transaction_ids() first.")
        return dataframe

    # Find transaction by ID
    mask = dataframe["id"] == transaction_id

    if not mask.any():
        print(f"❌ Error: No transaction found with ID {transaction_id}")
        return dataframe

    # Get current values
    idx = dataframe[mask].index[0]
    old_values = dataframe.loc[idx].to_dict()

    # Update fields, recording only the changes that are actually applied
    applied = {}
    for field, value in kwargs.items():
        if field in ["date", "category", "amount", "description"]:
            if field == "date":
                value = pd.to_datetime(value)
            elif field == "amount":
                value = float(value)
                if value <= 0:
                    print("❌ Error: Amount must be positive!")
                    continue

            dataframe.loc[idx, field] = value
            applied[field] = value

    if not applied:
        print(f"ℹ️  No changes applied to transaction ID {transaction_id}")
        return dataframe

    # Re-sort by date
    dataframe = dataframe.sort_values("date").reset_index(drop=True)

    # Print the applied changes
    print(f"✅ Updated transaction ID {transaction_id}:")
    for field, value in applied.items():
        print(f"   {field}: {old_values[field]} → {value}")

    return dataframe


def delete_transaction(dataframe, transaction_id):
    """
    Delete a transaction by ID.

    Parameters:
    -----------
    dataframe : pd.DataFrame
        Transaction DataFrame
    transaction_id : int
        ID of transaction to delete

    Returns:
    --------
    pd.DataFrame
        Updated DataFrame with transaction removed
    """
    if "id" not in dataframe.columns:
        print("❌ Error: DataFrame doesn't have ID column. Use add_transaction_ids() first.")
        return dataframe

    # Find transaction
    mask = dataframe["id"] == transaction_id

    if not mask.any():
        print(f"❌ Error: No transaction found with ID {transaction_id}")
        return dataframe

    # Get transaction details for confirmation message
    transaction = dataframe[mask].iloc[0]

    # Delete transaction
    dataframe = dataframe[~mask].reset_index(drop=True)

    print(f"✅ Deleted transaction ID {transaction_id}:")
    print(f"   Date: {transaction['date'].strftime('%Y-%m-%d')}")
    print(f"   Category: {transaction['category']}")
    print(f"   Amount: ${transaction['amount']:.2f}")
    print(f"   Description: {transaction['description']}")

    return dataframe


def view_transaction(dataframe, transaction_id):
    """
    View details of a specific transaction.

    Parameters:
    -----------
    dataframe : pd.DataFrame
        Transaction DataFrame
    transaction_id : int
        ID of transaction to view
    """
    if "id" not in dataframe.columns:
        print("❌ Error: DataFrame doesn't have ID column.")
        return

    mask = dataframe["id"] == transaction_id

    if not mask.any():
        print(f"❌ Error: No transaction found with ID {transaction_id}")
        return

    transaction = dataframe[mask].iloc[0]

    print(f"📋 Transaction ID: {transaction_id}")
    print("=" * 60)
    print(f"Date:        {transaction['date'].strftime('%Y-%m-%d')}")
    print(f"Category:    {transaction['category']}")
    print(f"Amount:      ${transaction['amount']:.2f}")
    print(f"Description: {transaction['description']}")
    print("=" * 60)


print("✅ Transaction management functions created!")
print("\n💡 Usage:")
print("   df = add_transaction_ids(df)")
print("   results = search_transactions(df, category='Food', min_amount=20)")
print("   df = edit_transaction(df, transaction_id=5, amount=25.99)")
print("   df = delete_transaction(df, transaction_id=10)")
print("   view_transaction(df, transaction_id=1)")

## 13. Transaction Management 🔧

Sometimes you need to edit or delete transactions (e.g., fixing typos, removing duplicates).

### Why Transaction IDs?
- **Unique identifier** for each transaction
- **Easy reference** when editing or deleting
- **Prevents accidental** modification of wrong records
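
A small sketch shows why an explicit ID is safer than relying on row position. The three transactions are invented sample data; the point is that sorting reshuffles positions while the `id` column keeps pointing at the same record.

```python
import pandas as pd

# Hypothetical transactions, entered out of date order
df_demo = pd.DataFrame(
    {
        "date": pd.to_datetime(["2025-01-10", "2025-01-02", "2025-01-05"]),
        "amount": [30.0, 12.5, 8.0],
        "description": ["Dinner", "Bus pass", "Coffee"],
    }
)
df_demo.insert(0, "id", range(1, len(df_demo) + 1))

# Sorting and resetting the index changes which row sits at position 0...
df_sorted = df_demo.sort_values("date").reset_index(drop=True)
first_row_description = df_sorted.loc[0, "description"]

# ...but looking up by ID still finds the original transaction
id_1_description = df_sorted.loc[df_sorted["id"] == 1, "description"].iloc[0]

print(first_row_description, "|", id_1_description)
```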

### Operations:
1. **Add ID**: Assign unique IDs to existing transactions
2. **Search**: Find transactions by various criteria
3. **Edit**: Modify existing transactions
4. **Delete**: Remove transactions
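
Under the hood, search, edit, and delete all come down to pandas boolean masks. Here is a stripped-down sketch on a made-up three-row table; the notebook's functions add validation and friendly messages on top of the same idea.

```python
import pandas as pd

# Hypothetical transactions that already have IDs
tx = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "category": ["Food", "Transport", "Food"],
        "amount": [45.0, 12.0, 18.5],
    }
)

# Search: combine conditions with & (each condition in parentheses)
food_over_20 = tx[(tx["category"] == "Food") & (tx["amount"] >= 20)]

# Edit: select the row by ID, then assign to one column
tx.loc[tx["id"] == 2, "amount"] = 15.0

# Delete: keep every row except the one with the matching ID
tx = tx[tx["id"] != 3].reset_index(drop=True)

print(food_over_20)
print(tx)
```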

### Production Feature:
This is crucial for real-world applications where data corrections are common.

In [None]:
# Test budget management functions

# Check budget status for January
print("Testing budget status for January 2025:\n")
budget_status = check_budget_status(df, year=2025, month=1)

print("\n\n" + "=" * 80)
print("Checking individual budget alerts:\n")

# Check alerts for each category
for category in budgets.keys():
    alert = budget_alert(df, category)
    print(alert["message"])

In [None]:
# Budget storage - dictionary mapping categories to budget amounts
budgets = {
    "Food": 350.00,
    "Transport": 100.00,
    "Entertainment": 150.00,
    "Shopping": 200.00,
    "Utilities": 250.00,
}


def set_budget(category, amount):
    """
    Set or update budget for a category.

    Parameters:
    -----------
    category : str
        Expense category
    amount : float
        Monthly budget amount
    """
    if amount < 0:
        print("❌ Error: Budget amount cannot be negative!")
        return

    budgets[category] = float(amount)
    print(f"✅ Budget set: {category} = ${amount:.2f}/month")


def get_budget(category=None):
    """
    Get budget for specific category or all budgets.

    Parameters:
    -----------
    category : str, optional
        Specific category to get budget for

    Returns:
    --------
    float or dict
        Budget amount for category, or all budgets if category not specified
    """
    if category:
        return budgets.get(category, 0)
    return budgets.copy()


def check_budget_status(expenses_df, period="month", year=None, month=None):
    """
    Check spending against budgets for a specific period.

    Parameters:
    -----------
    expenses_df : pd.DataFrame
        Expenses DataFrame
    period : str
        'month' or 'year'
    year : int, optional
        Specific year to analyze
    month : int, optional
        Specific month (1-12) to analyze

    Returns:
    --------
    pd.DataFrame
        Budget analysis DataFrame
    """
    df_filtered = expenses_df.copy()

    # Filter by date
    if year:
        df_filtered = df_filtered[df_filtered["date"].dt.year == year]
    if month:
        df_filtered = df_filtered[df_filtered["date"].dt.month == month]

    # If period is month and neither year nor month was given, use the current month
    if period == "month" and not year and not month:
        current_date = datetime.now()
        df_filtered = df_filtered[
            (df_filtered["date"].dt.year == current_date.year)
            & (df_filtered["date"].dt.month == current_date.month)
        ]

    # Calculate actual spending per category
    actual_spending = df_filtered.groupby("category")["amount"].sum()

    # Create comparison DataFrame
    budget_analysis = []

    for category, budget in budgets.items():
        actual = actual_spending.get(category, 0)
        remaining = budget - actual
        percentage_used = (actual / budget * 100) if budget > 0 else 0
        status = "✅" if remaining >= 0 else "⚠️"

        budget_analysis.append(
            {
                "Category": category,
                "Budget": f"${budget:.2f}",
                "Spent": f"${actual:.2f}",
                "Remaining": f"${remaining:.2f}",
                "Used %": f"{percentage_used:.1f}%",
                "Status": status,
            }
        )

    # Add categories with spending but no budget
    for category in actual_spending.index:
        if category not in budgets:
            actual = actual_spending[category]
            budget_analysis.append(
                {
                    "Category": category,
                    "Budget": "Not set",
                    "Spent": f"${actual:.2f}",
                    "Remaining": "N/A",
                    "Used %": "N/A",
                    "Status": "⚠️",
                }
            )

    df_analysis = pd.DataFrame(budget_analysis)

    print("📊 BUDGET STATUS")
    print("=" * 80)

    if year and month:
        print(f"📅 Period: {year}-{month:02d}")
    elif year:
        print(f"📅 Period: Year {year}")
    else:
        print("📅 Period: Current month")

    print("\n")
    print(df_analysis.to_string(index=False))
    print("\n" + "=" * 80)

    # Summary warnings
    total_budget = sum(budgets.values())
    total_spent = actual_spending.sum()

    print(f"💰 Total Budget: ${total_budget:.2f}")
    print(f"💸 Total Spent: ${total_spent:.2f}")

    if total_spent > total_budget:
        print(f"⚠️  OVER BUDGET by ${total_spent - total_budget:.2f}")
    else:
        print(f"✅ Under budget by ${total_budget - total_spent:.2f}")

    return df_analysis


def budget_alert(expenses_df, category, threshold=0.8):
    """
    Check if spending in a category is approaching budget limit.

    Parameters:
    -----------
    expenses_df : pd.DataFrame
        Expenses DataFrame
    category : str
        Category to check
    threshold : float
        Alert threshold (default: 0.8 = 80%)

    Returns:
    --------
    dict
        Alert information
    """
    if category not in budgets:
        return {"alert": False, "message": f"No budget set for {category}"}

    # Filter to current month
    current_date = datetime.now()
    current_month_expenses = expenses_df[
        (expenses_df["date"].dt.year == current_date.year)
        & (expenses_df["date"].dt.month == current_date.month)
        & (expenses_df["category"] == category)
    ]

    spent = current_month_expenses["amount"].sum()
    budget = budgets[category]
    percentage = spent / budget if budget > 0 else 0

    alert_info = {
        "category": category,
        "spent": spent,
        "budget": budget,
        "percentage": percentage,
        "alert": percentage >= threshold,
    }

    if percentage >= 1.0:
        alert_info["message"] = f"🚨 {category}: OVER BUDGET! (${spent:.2f} / ${budget:.2f})"
    elif percentage >= threshold:
        alert_info["message"] = (
            f"⚠️  {category}: {percentage*100:.0f}% of budget used (${spent:.2f} / ${budget:.2f})"
        )
    else:
        alert_info["message"] = (
            f"✅ {category}: {percentage*100:.0f}% of budget used (${spent:.2f} / ${budget:.2f})"
        )

    return alert_info


print("✅ Budget management functions created!")
print(f"\n📊 Current budgets: {len(budgets)} categories")
print("\n💡 Usage:")
print("   set_budget('Food', 400.00)")
print("   check_budget_status(df)")
print("   alert = budget_alert(df, 'Food')")

## 12. Budget Management 📊

Budgets help you control spending by setting limits for each category.

### Why Budget?
- **Prevent overspending** in specific categories
- **Allocate money** intentionally
- **Reach financial goals** faster
- **Reduce financial stress** by having a plan

### Budget Types:
1. **Category Budget**: Set limits per expense category (e.g., $500/month for Food)
2. **Overall Budget**: Set a total spending limit
3. **Percentage-based**: Allocate a percentage of income to each category

### Best Practices:
- **50/30/20 Rule**: 50% needs, 30% wants, 20% savings
- Review and adjust budgets monthly
- Be realistic: overly strict budgets tend to fail
- Track actual vs. budget regularly
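
As a rough illustration of the 50/30/20 rule above, the sketch below splits a monthly income into the three buckets. The helper name `split_50_30_20` and the income figure are assumptions made for this example; they are not part of the tracker's own functions.

```python
# Hypothetical helper (not defined elsewhere in this notebook):
# split a monthly income according to the 50/30/20 rule.
def split_50_30_20(monthly_income):
    """Return suggested Needs/Wants/Savings amounts for a monthly income."""
    if monthly_income < 0:
        raise ValueError("monthly_income must be non-negative")
    shares = {"Needs": 0.50, "Wants": 0.30, "Savings": 0.20}
    # Round to cents so the amounts display cleanly
    return {name: round(monthly_income * share, 2) for name, share in shares.items()}


# Income figure chosen for illustration only
allocation = split_50_30_20(4500.00)
for name, amount in allocation.items():
    print(f"{name}: ${amount:,.2f}")
```

The resulting amounts could serve as starting points for per-category budgets, which you would then adjust to your own circumstances.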

Let's create a budget tracking system!

In [None]:
def calculate_net_savings(income_df, expenses_df, start_date=None, end_date=None):
    """
    Calculate net savings (income - expenses) for a given period.

    Parameters:
    -----------
    income_df : pd.DataFrame
        Income DataFrame
    expenses_df : pd.DataFrame
        Expenses DataFrame
    start_date : str or datetime, optional
        Start date for analysis period
    end_date : str or datetime, optional
        End date for analysis period

    Returns:
    --------
    dict
        Dictionary containing financial summary
    """
    # Filter by date range if specified
    if start_date:
        income_df = income_df[income_df["date"] >= pd.to_datetime(start_date)]
        expenses_df = expenses_df[expenses_df["date"] >= pd.to_datetime(start_date)]
    if end_date:
        income_df = income_df[income_df["date"] <= pd.to_datetime(end_date)]
        expenses_df = expenses_df[expenses_df["date"] <= pd.to_datetime(end_date)]

    # Calculate totals
    total_income = income_df["amount"].sum() if len(income_df) > 0 else 0
    total_expenses = expenses_df["amount"].sum() if len(expenses_df) > 0 else 0
    net_savings = total_income - total_expenses
    savings_rate = (net_savings / total_income * 100) if total_income > 0 else 0

    # Print report
    print("💰 FINANCIAL SUMMARY")
    print("=" * 80)

    if start_date and end_date:
        print(f"📅 Period: {start_date} to {end_date}")
    elif start_date:
        print(f"📅 Period: From {start_date}")
    elif end_date:
        print(f"📅 Period: Until {end_date}")
    else:
        print("📅 Period: All time")

    print("\n💵 Income:")
    print(f"   Total Income: ${total_income:,.2f}")
    print(f"   Income Entries: {len(income_df)}")

    print("\n💸 Expenses:")
    print(f"   Total Expenses: ${total_expenses:,.2f}")
    print(f"   Expense Entries: {len(expenses_df)}")

    print("\n" + "=" * 80)

    if net_savings >= 0:
        print(f"✅ Net Savings: ${net_savings:,.2f}")
        print(f"📊 Savings Rate: {savings_rate:.1f}%")

        if savings_rate >= 20:
            print("🎉 Excellent! You're saving over 20% of your income!")
        elif savings_rate >= 10:
            print("👍 Good job! Consider increasing savings to 20% or more.")
        else:
            print("⚠️  Savings rate is low. Try to increase to at least 10-20%.")
    else:
        print(f"⚠️  Net Deficit: ${abs(net_savings):,.2f}")
        print("   You're spending more than you're earning!")
        print("   Consider:")
        print("   - Reducing discretionary expenses")
        print("   - Finding additional income sources")
        print("   - Creating a budget")

    print("=" * 80)

    return {
        "total_income": total_income,
        "total_expenses": total_expenses,
        "net_savings": net_savings,
        "savings_rate": savings_rate,
        "income_count": len(income_df),
        "expense_count": len(expenses_df),
    }


# Test the function with our data
financial_summary = calculate_net_savings(df_income, df)

# Calculate monthly net savings
print("\n\n📊 MONTHLY BREAKDOWN")
print("=" * 80)

# Prepare data
df_all_income = df_income.copy()
df_all_income["month"] = df_all_income["date"].dt.to_period("M")
df_all_expenses = df.copy()
df_all_expenses["month"] = df_all_expenses["date"].dt.to_period("M")

# Group by month
monthly_income = df_all_income.groupby("month")["amount"].sum()
monthly_expenses = df_all_expenses.groupby("month")["amount"].sum()

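# Combine into summary; months missing from one side would otherwise become NaN,
# so treat a missing month as 0 income or 0 expenses
monthly_summary = pd.DataFrame({"Income": monthly_income, "Expenses": monthly_expenses}).fillna(0)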

# Calculate net savings per month
monthly_summary["Net Savings"] = monthly_summary["Income"] - monthly_summary["Expenses"]
monthly_summary["Savings Rate (%)"] = (
    monthly_summary["Net Savings"] / monthly_summary["Income"] * 100
).round(1)

# Format for display
monthly_summary["Income"] = monthly_summary["Income"].apply(lambda x: f"${x:,.2f}")
monthly_summary["Expenses"] = monthly_summary["Expenses"].apply(lambda x: f"${x:,.2f}")
monthly_summary["Net Savings"] = monthly_summary["Net Savings"].apply(lambda x: f"${x:,.2f}")

monthly_summary.index = monthly_summary.index.astype(str)
print(monthly_summary)

### Net Savings Analysis

Now let's calculate your **net savings** - the difference between income and expenses.

**Key Metrics**:
- **Total Income**: All money coming in
- **Total Expenses**: All money going out
- **Net Savings**: Income - Expenses
- **Savings Rate**: (Net Savings / Income) × 100%

A healthy savings rate is typically 20% or higher, but it varies based on personal circumstances.
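
To make the metrics above concrete, here is a small worked example. The income and expense figures are invented for illustration and do not come from the notebook's data.

```python
# Illustrative figures only; not taken from the notebook's data
total_income = 5000.00
total_expenses = 4000.00

# Net Savings = Income - Expenses
net_savings = total_income - total_expenses

# Savings Rate = (Net Savings / Income) x 100; guard against zero income
savings_rate = (net_savings / total_income * 100) if total_income > 0 else 0.0

print(f"Net savings: ${net_savings:,.2f}")
print(f"Savings rate: {savings_rate:.1f}%")
```

With these numbers the savings rate lands exactly on the 20% guideline mentioned above.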

In [None]:
def add_income(dataframe, date, category, amount, description):
    """
    Add a new income entry to the DataFrame.

    Parameters:
    -----------
    dataframe : pd.DataFrame
        The income DataFrame to add to
    date : str or datetime
        Date of the income (e.g., '2025-03-15')
    category : str
        Category of income (e.g., 'Salary', 'Freelance', 'Investment')
    amount : float
        Amount received in dollars
    description : str
        Brief description of the income source

    Returns:
    --------
    pd.DataFrame
        Updated DataFrame with new income entry
    """

    # Input validation
    if amount <= 0:
        print("❌ Error: Amount must be positive!")
        return dataframe

    # Valid income categories
    valid_categories = ["Salary", "Freelance", "Investment", "Business", "Gift", "Other"]
    if category not in valid_categories:
        print(f"⚠️  Warning: '{category}' is not a standard category.")
        print(f"   Valid categories: {', '.join(valid_categories)}")
        print("   Proceeding anyway...")

    # Create new income entry
    new_income = {
        "date": pd.to_datetime(date),
        "category": category,
        "amount": float(amount),
        "description": description,
    }

    # Add to DataFrame
    updated_df = pd.concat([dataframe, pd.DataFrame([new_income])], ignore_index=True)
    updated_df = updated_df.sort_values("date").reset_index(drop=True)

    print(f"✅ Added income: ${amount:.2f} from {category} on {date}")
    return updated_df


def save_income(dataframe, filename="income.csv"):
    """Save income DataFrame to a CSV file."""
    try:
        dataframe.to_csv(filename, index=False)
        print(f"✅ Income saved to '{filename}'")
        print(f"💰 Saved {len(dataframe)} income entries totaling ${dataframe['amount'].sum():.2f}")
    except Exception as e:
        print(f"❌ Error saving file: {e}")


def load_income(filename="income.csv"):
    """Load income from a CSV file."""
    if not os.path.exists(filename):
        print(f"⚠️  File '{filename}' not found. Starting with empty income records.")
        return pd.DataFrame(columns=["date", "category", "amount", "description"])

    try:
        df = pd.read_csv(filename, parse_dates=["date"])
        print(f"✅ Loaded {len(df)} income entries from '{filename}'")
        print(f"💰 Total: ${df['amount'].sum():.2f}")
        return df
    except Exception as e:
        print(f"❌ Error loading file: {e}")
        return pd.DataFrame(columns=["date", "category", "amount", "description"])


print("✅ Income tracking functions created!")
print("\n💡 Usage:")
print("   df_income = add_income(df_income, '2025-03-20', 'Freelance', 500.00, 'Consulting')")
print("   save_income(df_income)")
print("   df_income = load_income()")

In [None]:
# Create sample income data
sample_income = [
    {
        "date": "2025-01-01",
        "category": "Salary",
        "amount": 4500.00,
        "description": "Monthly salary",
    },
    {
        "date": "2025-01-15",
        "category": "Freelance",
        "amount": 800.00,
        "description": "Web design project",
    },
    {
        "date": "2025-02-01",
        "category": "Salary",
        "amount": 4500.00,
        "description": "Monthly salary",
    },
    {
        "date": "2025-02-10",
        "category": "Investment",
        "amount": 125.50,
        "description": "Dividend payment",
    },
    {
        "date": "2025-03-01",
        "category": "Salary",
        "amount": 4500.00,
        "description": "Monthly salary",
    },
    {
        "date": "2025-03-05",
        "category": "Freelance",
        "amount": 1200.00,
        "description": "Logo design project",
    },
]

# Create income DataFrame
df_income = pd.DataFrame(sample_income)
df_income["date"] = pd.to_datetime(df_income["date"])
df_income = df_income.sort_values("date").reset_index(drop=True)

print(f"✅ Created sample income data with {len(df_income)} entries")
print(f"💰 Total income: ${df_income['amount'].sum():.2f}")
print(f"📊 Income categories: {', '.join(df_income['category'].unique())}")
print("\n📋 Income entries:")
df_income

## 11. Income Tracking 💵

To understand your complete financial picture, you need to track both **income and expenses**.

### Why Track Income?
- Calculate **net savings** (income - expenses)
- Understand your savings rate
- Track multiple income sources
- Plan for financial goals

### Income Categories:
- **Salary**: Regular employment income
- **Freelance**: Contract or gig work
- **Investment**: Dividends, interest, capital gains
- **Business**: Self-employment income
- **Gift**: Money received as gifts
- **Other**: Any other income sources

### Data Structure:
We'll use a separate DataFrame for income with the same structure as expenses to keep our code modular and maintainable.
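
A minimal sketch of that structure, assuming the four columns used for the income data in this notebook (`date`, `category`, `amount`, `description`). The names `INCOME_COLUMNS` and `income_example` are made up for this illustration.

```python
import pandas as pd

# Assumed column layout, mirroring the expenses DataFrame
INCOME_COLUMNS = ["date", "category", "amount", "description"]

# A single illustrative record
income_example = pd.DataFrame(
    [
        {
            "date": "2025-01-01",
            "category": "Salary",
            "amount": 4500.00,
            "description": "Monthly salary",
        }
    ],
    columns=INCOME_COLUMNS,
)

# Store dates as datetimes so that .dt accessors and date filters work
income_example["date"] = pd.to_datetime(income_example["date"])

print(income_example.dtypes)
```

Keeping the columns identical to the expenses table means the same grouping and filtering code can usually be reused for both DataFrames.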

---

# Part II: Advanced Features 🚀

Welcome to the enhanced version of the Personal Finance Tracker! In this section, we'll add production-ready features while maintaining our educational approach.

## What's New in This Section:

1. **Income Tracking** - Track income sources and calculate net savings
2. **Budget Management** - Set budgets and get alerts when approaching limits
3. **Transaction Management** - Edit, delete, and search transactions
4. **Recurring Transactions** - Automate tracking of subscriptions and bills
5. **Enhanced Data Validation** - Better data quality and error prevention
6. **Import/Export Features** - Import bank statements, export to Excel/PDF/JSON
7. **Predictive Analytics** - Forecast spending and get recommendations
8. **Pattern Recognition** - Detect anomalies and analyze trends
9. **Interactive Visualizations** - Plotly charts with hover effects
10. **Web Interface** - Streamlit app for easy access

Let's enhance your finance tracker! 💪