# Income Prediction Model Training

This notebook trains a machine learning model to predict the next month's income from the incomes of preceding months.

## 1. Setup and Data Loading

In [1]:
import pandas as pd
import numpy as np
import joblib
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

# Load the dataset
filename = 'monthly_spending_dataset_2020_2025.csv'
df = pd.read_csv(filename)
print("Dataset loaded successfully.")
df.head()

Dataset loaded successfully.


| | Month | Groceries (₹) | Rent (₹) | Transportation (₹) | Gym (₹) | Utilities (₹) | Healthcare (₹) | Investments (₹) | Savings (₹) | EMI/Loans (₹) | Dining & Entertainment (₹) | Shopping & Wants (₹) | Total Expenditure (₹) | Income (₹) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-01 | 4860 | 10000 | 2595 | 888 | 1520 | 1930 | 4311 | 4232 | 0 | 3138 | 1121 | 30363 | 40000 |
| 1 | 2020-02-01 | 6135 | 10000 | 2371 | 851 | 1630 | 1923 | 5939 | 7329 | 0 | 3185 | 2332 | 34366 | 40000 |
| 2 | 2020-03-01 | 6853 | 10000 | 2715 | 1143 | 1776 | 1185 | 4700 | 3625 | 0 | 2684 | 1459 | 32515 | 36000 |
| 3 | 2020-04-01 | 6904 | 10000 | 2582 | 869 | 1975 | 1274 | 4420 | 6426 | 0 | 2475 | 2806 | 33305 | 36000 |
| 4 | 2020-05-01 | 4562 | 10000 | 3028 | 830 | 1984 | 1631 | 4410 | 3647 | 0 | 2146 | 1020 | 29611 | 36000 |


## 2. Preprocessing & Currency Conversion
We convert income from Indian rupees (INR) to Sri Lankan rupees (LKR) at a fixed rate of 1 INR = 3.44 LKR, then build lag features from the converted series.

In [2]:
EXCHANGE_RATE = 3.44  # LKR per 1 INR; a single fixed rate applied to every month
income_col_inr = 'Income (₹)'

# Convert to LKR
df['Income_LKR'] = df[income_col_inr] * EXCHANGE_RATE

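# Feature engineering: the features are the incomes 1, 2, and 3 months before
# the current row; the target is the income one month after the current row.
# Note: the current row's own income is not used as a feature.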
df['Month_1_Ago'] = df['Income_LKR'].shift(1)
df['Month_2_Ago'] = df['Income_LKR'].shift(2)
df['Month_3_Ago'] = df['Income_LKR'].shift(3)
df['Target'] = df['Income_LKR'].shift(-1)

# Drop rows with NaN values created by shifting
df_final = df.dropna(subset=['Month_1_Ago', 'Month_2_Ago', 'Month_3_Ago', 'Target'])
print(f"Processed {len(df_final)} months of data for training.")
df_final[['Month_3_Ago', 'Month_2_Ago', 'Month_1_Ago', 'Target']].head()

Processed 65 months of data for training.


| | Month_3_Ago | Month_2_Ago | Month_1_Ago | Target |
|---|---|---|---|---|
| 3 | 137600.0 | 137600.0 | 123840.0 | 123840.0 |
| 4 | 137600.0 | 123840.0 | 123840.0 | 123840.0 |
| 5 | 123840.0 | 123840.0 | 123840.0 | 123840.0 |
| 6 | 123840.0 | 123840.0 | 123840.0 | 123840.0 |
| 7 | 123840.0 | 123840.0 | 123840.0 | 123840.0 |
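
To make the row alignment easier to see, here is a minimal sketch that applies the same four shifts to a toy series. The six income values are hypothetical and chosen only so that every month is distinguishable; they are not taken from the dataset.

```python
import pandas as pd

# Hypothetical incomes for six consecutive months (not from the dataset).
toy = pd.DataFrame({'Income_LKR': [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]})

# Same shifts as in the preprocessing cell.
toy['Month_1_Ago'] = toy['Income_LKR'].shift(1)
toy['Month_2_Ago'] = toy['Income_LKR'].shift(2)
toy['Month_3_Ago'] = toy['Income_LKR'].shift(3)
toy['Target'] = toy['Income_LKR'].shift(-1)

# The first three rows lack lag values and the last row lacks a target.
toy_final = toy.dropna()
print(toy_final)
```

Only rows 3 and 4 survive. In row 3 the features are 120, 110, and 100, the target is 140, and the row's own income of 130 appears in neither, so each target lies two months after the most recent feature. If the intent is to predict the month immediately following the three lagged months, one option (an assumption about intent, not something the notebook states) would be to use the current row's `Income_LKR` as the target instead of `shift(-1)`.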


## 3. Train Model
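We fit a Random Forest Regressor with 100 trees on the first 80% of the rows and evaluate it on the last 20%, keeping the rows in chronological order (`shuffle=False`). The "accuracy" reported below is defined in this notebook as (1 - MAPE) × 100, where MAPE is the mean absolute percentage error; it is not a classification accuracy.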

In [3]:
# Features and Target
features = ['Month_1_Ago', 'Month_2_Ago', 'Month_3_Ago']
X = df_final[features]
y = df_final['Target']

# Chronological split: the last 20% of rows form the test set (no shuffling)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Initialize and Train
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
mape = mean_absolute_percentage_error(y_test, y_pred)
accuracy_pct = (1 - mape) * 100  # "accuracy" here means 100% minus MAPE

print("--- Model Performance ---")
print(f"Mean Absolute Error: LKR {mae:,.2f}")
print(f"Model Accuracy Score: {accuracy_pct:.2f}%")

--- Model Performance ---
Mean Absolute Error: LKR 602.82
Model Accuracy Score: 99.73%
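
Because income in this kind of data can stay flat for long stretches, a high score is easier to interpret next to a naive baseline. The sketch below computes the same (1 - MAPE) × 100 score for a persistence forecast, which simply repeats the last known income. The helper `mape_accuracy` and all numbers are hypothetical and only illustrate the calculation.

```python
import numpy as np


def mape_accuracy(y_true, y_pred):
    """Return (MAPE, accuracy %) with accuracy defined as (1 - MAPE) * 100."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mape = float(np.mean(np.abs((y_true - y_pred) / y_true)))
    return mape, (1.0 - mape) * 100.0


# Hypothetical test-period values in LKR (not from the dataset).
actual_next = [100000.0, 100000.0, 125000.0, 125000.0]
# Persistence forecast: repeat the most recent known income.
last_known = [100000.0, 100000.0, 100000.0, 125000.0]

baseline_mape, baseline_accuracy = mape_accuracy(actual_next, last_known)
print(f"Persistence baseline accuracy: {baseline_accuracy:.2f}%")
```

With these toy numbers the baseline misses only the one month where income changes and still scores 95%. In this notebook, the analogous check would be to pass `y_test` and `X_test['Month_1_Ago']` to the same calculation and compare the result with the model's score.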


## 4. Save Model
Save the trained model for use in the Laravel application.

In [4]:
joblib.dump(model, 'lkr_income_model.pkl')
print("Model saved as 'lkr_income_model.pkl'")

Model saved as 'lkr_income_model.pkl'
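
The notebook does not show how the saved file is consumed, so the following is only a sketch of the Python side of loading it with `joblib.load` and predicting from three lagged incomes. The helper `predict_next_income` is hypothetical, and a small stand-in model trained on made-up values is used so that the example runs without `lkr_income_model.pkl`. How the Laravel application would invoke Python is outside what this notebook specifies.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Same column names and order as the training features.
FEATURES = ['Month_1_Ago', 'Month_2_Ago', 'Month_3_Ago']


def predict_next_income(model_path, month_1_ago, month_2_ago, month_3_ago):
    """Load a saved model and predict from three lagged incomes (in LKR)."""
    model = joblib.load(model_path)
    row = pd.DataFrame([[month_1_ago, month_2_ago, month_3_ago]], columns=FEATURES)
    return float(model.predict(row)[0])


# Stand-in model trained on made-up values (not the notebook's model).
toy_X = pd.DataFrame(
    [[100.0, 100.0, 100.0], [200.0, 200.0, 200.0]] * 5, columns=FEATURES
)
toy_y = toy_X['Month_1_Ago']
toy_model = RandomForestRegressor(n_estimators=10, random_state=0).fit(toy_X, toy_y)

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, 'toy_income_model.pkl')
    joblib.dump(toy_model, toy_path)
    toy_prediction = predict_next_income(toy_path, 200.0, 200.0, 200.0)

print(f"Toy prediction: {toy_prediction:.2f}")
```

Passing a DataFrame with the training column names keeps the input consistent with how the model was fitted. The inputs must be in LKR, since the model was trained on converted values. A file written by `joblib.dump` is generally best loaded with the same scikit-learn version that produced it.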
