# PowerPulse: Household Energy Usage Forecast

This project trains two regression models, Linear Regression and Random Forest, to predict household active power (`Global_active_power`) from historical electrical measurements and calendar features. The goal is to surface power consumption trends and deliver accurate forecasts.

**Tech Stack:** Python | Pandas | Scikit-learn | Matplotlib | Seaborn  
**Models Used:** Linear Regression, Random Forest

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset. The file is semicolon-separated and marks missing readings with '?';
# without sep=';' pandas sees a single column and cannot find 'Date' or 'Time'.
df = pd.read_csv(r'L:\Guvi\Power\household_power_consumption.txt',
                 sep=';', na_values=['?'], low_memory=False)

# Combine Date and Time into one timestamp column (dates are day-first: dd/mm/yyyy)
df['Date_Time'] = pd.to_datetime(df['Date'] + ' ' + df['Time'],
                                 format='%d/%m/%Y %H:%M:%S')
df.drop(columns=['Date', 'Time'], inplace=True)

# Drop rows with missing values, then make sure the measurements are floats
df.dropna(inplace=True)
df = df.astype({'Global_active_power': 'float64',
                'Global_reactive_power': 'float64',
                'Voltage': 'float64',
                'Global_intensity': 'float64',
                'Sub_metering_1': 'float64',
                'Sub_metering_2': 'float64',
                'Sub_metering_3': 'float64'})

# Feature engineering
df['hour'] = df['Date_Time'].dt.hour
df['day'] = df['Date_Time'].dt.day
df['month'] = df['Date_Time'].dt.month
df['weekday'] = df['Date_Time'].dt.weekday

# Target & features
features = ['Global_reactive_power', 'Voltage', 'Global_intensity',
            'Sub_metering_1', 'Sub_metering_2', 'Sub_metering_3',
            'hour', 'day', 'month', 'weekday']
X = df[features]
y = df['Global_active_power']

In [None]:
plt.figure(figsize=(10, 6))
sns.heatmap(df[features + ['Global_active_power']].corr(), annot=True, cmap='coolwarm')
plt.title("Feature Correlation Heatmap")
plt.show()

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
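
A random split shuffles rows, so readings from later dates can land in the training set while earlier ones are held out. For forecasting, a chronological hold-out is a common alternative. The following is a minimal sketch, assuming the rows are already sorted by timestamp; `chronological_split` is a hypothetical helper and the data is synthetic, so neither comes from the original notebook.

```python
import numpy as np
import pandas as pd


def chronological_split(X, y, test_size=0.2):
    """Hold out the last `test_size` fraction of rows (hypothetical helper)."""
    n_test = int(round(len(X) * test_size))
    cut = len(X) - n_test
    return X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]


# Synthetic stand-in for the household data, already sorted by time
idx = pd.date_range("2007-01-01", periods=100, freq="min")
X_demo = pd.DataFrame({"Voltage": np.linspace(230, 240, 100)}, index=idx)
y_demo = pd.Series(np.linspace(1.0, 2.0, 100), index=idx)

X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo)
print(len(X_tr), len(X_te))
```

With this kind of split, every test row is later than every training row, which is closer to how a forecast would be used in practice.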

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
y_pred_lr = lr_model.predict(X_test)

mae_lr = mean_absolute_error(y_test, y_pred_lr)
rmse_lr = np.sqrt(mean_squared_error(y_test, y_pred_lr))
r2_lr = r2_score(y_test, y_pred_lr)

In [None]:
from sklearn.ensemble import RandomForestRegressor

rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)

mae_rf = mean_absolute_error(y_test, y_pred_rf)
rmse_rf = np.sqrt(mean_squared_error(y_test, y_pred_rf))
r2_rf = r2_score(y_test, y_pred_rf)

# Feature importance
importances = rf_model.feature_importances_
plt.figure(figsize=(10, 6))
sns.barplot(x=importances, y=features)
plt.title("Feature Importance - Random Forest")
plt.show()
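
The importance plot above is easier to read when the values are labeled and sorted. The following sketch ranks importances with a pandas Series. It runs on synthetic data in which the target depends only on `Global_intensity` (an assumption made purely for illustration), so the numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

# Synthetic features; "noise" is a hypothetical column with no signal
demo = pd.DataFrame({
    "Global_intensity": rng.uniform(0, 20, 500),
    "hour": rng.integers(0, 24, 500),
    "noise": rng.normal(size=500),
})
# Assumed relationship for the demo: power is roughly proportional to current
target = 0.23 * demo["Global_intensity"] + 0.01 * rng.normal(size=500)

model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(demo, target)

# Pair each importance with its feature name and sort, highest first
ranking = pd.Series(model.feature_importances_, index=demo.columns)
ranking = ranking.sort_values(ascending=False)
print(ranking)
```

In the notebook itself, the same pattern would be `pd.Series(rf_model.feature_importances_, index=features).sort_values(ascending=False)`.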

In [None]:
results = pd.DataFrame({
    'Model': ['Linear Regression', 'Random Forest'],
    'MAE': [mae_lr, mae_rf],
    'RMSE': [rmse_lr, rmse_rf],
    'R²': [r2_lr, r2_rf]
})
results
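
For reference, the three metrics in the table can be written out directly with NumPy. The following sketch uses the standard definitions of MAE, RMSE, and R²; `regression_report` is a hypothetical helper name and the inputs are made-up values, not results from this project.

```python
import numpy as np


def regression_report(y_true, y_pred):
    """Return MAE, RMSE and R² for two equal-length sequences (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mae = np.mean(np.abs(err))                      # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))               # root mean squared error
    ss_res = np.sum(err ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    return {"MAE": mae, "RMSE": rmse, "R²": r2}


# Made-up values purely to exercise the function
report = regression_report([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(report)
```

Lower MAE and RMSE indicate smaller errors, while an R² closer to 1 means the model explains more of the variance in the target.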

In [None]:
plt.figure(figsize=(14, 4))
plt.plot(y_test.values[:200], label='Actual')
plt.plot(y_pred_rf[:200], label='Predicted')
plt.title('Random Forest Predictions vs Actual')
plt.xlabel('Test Sample Index')  # test rows are shuffled by train_test_split, so this is not a time axis
plt.ylabel('Global Active Power (kilowatts)')
plt.legend()
plt.show()

## Conclusion

- Random Forest outperformed Linear Regression on all three metrics (MAE, RMSE, and R²).
- `Global_intensity` and time-related features such as `hour` were the most influential in the Random Forest feature importances.
- The model can be used for energy monitoring, demand forecasting, and smart grid integration.

### Business Impact
✅ Helps households monitor and reduce energy consumption.  
✅ Assists utility providers in forecasting and load balancing.  
✅ Lays the groundwork for real-time predictive energy systems.