# Problem Statement:
Predict appliance energy consumption in a smart home from environmental and indoor measurements. The goal is to build a regression model that estimates the energy consumption of appliances from factors such as temperature, humidity, wind speed, and visibility.

# Dataset Overview:
The dataset contains time-series data related to environmental conditions, electrical appliances, and energy consumption. Here are the key features:

- Appliances: Energy consumption in Wh (Target variable).
- Lights: Energy consumption by lights in the house.
- Temperature and Humidity (T1, RH_1, ...): Various temperature and humidity readings from different locations in the house (e.g., kitchen, living room, etc.).
- Outdoor Environment: External temperature, wind speed, pressure, and visibility.
- Other Variables: Random variables rv1, rv2 included in the dataset.

The target variable is the energy consumption of appliances (Appliances), and the goal is to predict this based on the other variables.

# Steps to be covered:
- Data Preprocessing: Handle the date column, check for missing values to confirm the data is clean, and split the dataset into training and test sets.
- Model Training: Train a model using Gradient Boosting Regressor.
- Model Evaluation: Use evaluation metrics like Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (R²) to assess the model's performance.
- Save the Model: Save the trained model.
- Load and Predict: Use the saved model for predictions.

# Import Libraries and Load Dataset

In [None]:
# Import required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import joblib

# Load the dataset
file_path = 'energydata_complete.csv'
data = pd.read_csv(file_path)

# Display the first few rows of the dataset to understand its structure
data.head()


# Data Preprocessing

In [2]:
# Parse the 'date' column as datetime to confirm the timestamps are valid
data['date'] = pd.to_datetime(data['date'])

# Drop the 'date' column: this model uses no time-derived features
data = data.drop(columns=['date'])

# Check for missing values
print(data.isnull().sum())  # Ensure there are no missing values

# Split the data into features (X) and target (y)
X = data.drop(columns=['Appliances'])  # Features are all columns except 'Appliances'
y = data['Appliances']  # Target variable is 'Appliances'

# Split the dataset into training (80%) and test (20%) sets.
# train_test_split shuffles rows by default, so temporal order is not preserved.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shape of training and test data
X_train.shape, X_test.shape, y_train.shape, y_test.shape


Appliances     0
lights         0
T1             0
RH_1           0
T2             0
RH_2           0
T3             0
RH_3           0
T4             0
RH_4           0
T5             0
RH_5           0
T6             0
RH_6           0
T7             0
RH_7           0
T8             0
RH_8           0
T9             0
RH_9           0
T_out          0
Press_mm_hg    0
RH_out         0
Windspeed      0
Visibility     0
Tdewpoint      0
rv1            0
rv2            0
dtype: int64


((15788, 27), (3947, 27), (15788,), (3947,))
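
The preprocessing cell above discards the timestamp entirely. As a possible alternative, calendar features could be derived from `date` before it is dropped. The following is a minimal sketch, not part of the original pipeline: the three-row frame and the column names `hour`, `day_of_week`, and `is_weekend` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the raw CSV; values are illustrative only.
sample = pd.DataFrame({
    "date": ["2016-01-11 17:00:00", "2016-01-11 17:10:00", "2016-01-16 08:30:00"],
    "Appliances": [60, 60, 50],
})
sample["date"] = pd.to_datetime(sample["date"])

# Derive numeric calendar features before discarding the raw timestamp.
sample["hour"] = sample["date"].dt.hour
sample["day_of_week"] = sample["date"].dt.dayofweek  # Monday = 0, Sunday = 6
sample["is_weekend"] = (sample["day_of_week"] >= 5).astype(int)

# The model still receives only numeric columns.
features = sample.drop(columns=["date"])
print(features)
```

Whether such features improve this model is untested here; the metrics reported in this notebook were obtained without them.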

# Model Training (Gradient Boosting Regressor)

In [13]:
# Initialize the Gradient Boosting Regressor
gb_regressor = GradientBoostingRegressor(random_state=42)

# Train the model on the training data
gb_regressor.fit(X_train, y_train)

# Predict on the test data
y_pred = gb_regressor.predict(X_test)

# root_mean_squared_error (scikit-learn >= 1.4) replaces the deprecated
# mean_squared_error(..., squared=False)
from sklearn.metrics import root_mean_squared_error

# Calculate RMSE, MAE, and R²
rmse = root_mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Output evaluation metrics
print(f"Root Mean Squared Error (RMSE): {rmse}")
print(f"Mean Absolute Error (MAE): {mae}")
print(f"R-squared (R²): {r2}")

Root Mean Squared Error (RMSE): 86.04039404289088
Mean Absolute Error (MAE): 47.21612484004351
R-squared (R²): 0.2602296374169759


These metrics summarize the model's performance in terms of error magnitude (RMSE, MAE) and overall explanatory power (R²).

### 1. Root Mean Squared Error (RMSE):
- Value: 86.04
- Explanation: RMSE is the square root of the mean of the squared prediction errors. Because errors are squared before averaging, large misses count more heavily than small ones. The result is in the same units as the target variable (Appliances energy consumption in Wh).
- In Simple Terms: A typical prediction error is about 86 Wh, with the figure pulled upward by the largest misses.
### 2. Mean Absolute Error (MAE):
- Value: 47.22
- Explanation: MAE gives the average magnitude of the errors between predictions and actual values, without considering the direction of the errors (whether the prediction is too high or too low).
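- In Simple Terms: On average, the model’s predictions are about 47 Wh off from the true appliance energy consumption. The gap between RMSE (86) and MAE (47) indicates that a minority of predictions miss by a large margin.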
### 3. R-squared (R²):
- Value: 0.26
- Explanation: R² measures how much of the variability in the target variable the model explains. An R² of 1 means the model perfectly explains the data, while 0 means it does no better than always predicting the mean of the target.
- In Simple Terms: The model explains only about 26% of the variation in appliance energy consumption. This suggests that the model could be improved, or that energy consumption depends on factors the available features do not capture.
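
To make the three definitions concrete, the sketch below computes them by hand in plain Python. The values in `y_true` and `y_hat` are toy numbers invented for illustration and are not taken from the dataset. One prediction misses badly on purpose, which shows why RMSE ends up well above MAE.

```python
import math

# Toy values chosen for illustration; they are not taken from the dataset.
y_true = [50.0, 60.0, 70.0, 400.0]
y_hat = [55.0, 55.0, 75.0, 200.0]  # the last prediction misses by 200 Wh

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_hat)]

# MAE: mean of the absolute errors.
mae = sum(abs(e) for e in errors) / n

# RMSE: square root of the mean of the squared errors.
rmse = math.sqrt(sum(e * e for e in errors) / n)

# R²: 1 minus the ratio of residual to total sum of squares.
mean_true = sum(y_true) / n
ss_res = sum(e * e for e in errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

# Always predicting the mean of y_true yields R² = 0 by construction.
baseline_r2 = 1 - sum((t - mean_true) ** 2 for t in y_true) / ss_tot

print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}, R²: {r2:.3f}, baseline R²: {baseline_r2:.1f}")
```

Three of the four errors are only 5 Wh, yet the single 200 Wh miss dominates RMSE. The same pattern, at a smaller scale, would explain the gap between the two error metrics reported above.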


# Save the Model

In [14]:
# Save the trained model to a file
model_filename = 'appliance_energy_gb_model.pkl'
joblib.dump(gb_regressor, model_filename)

print(f"Model saved to {model_filename}")


Model saved to appliance_energy_gb_model.pkl


# Load and Predict using the Saved Model

In [16]:
# Load the saved model
loaded_model = joblib.load(model_filename)

# Make predictions on the test set with the loaded model
loaded_model_predictions = loaded_model.predict(X_test)

# Display a few predictions with explanations
print("Sample Appliance Energy Consumption Predictions:")
for i, pred in enumerate(loaded_model_predictions[:10], 1):
    print(f"Prediction {i}: The predicted energy consumption is approximately {pred:.2f} Wh.")


Sample Appliance Energy Consumption Predictions:
Prediction 1: The predicted energy consumption is approximately 60.31 Wh.
Prediction 2: The predicted energy consumption is approximately 189.48 Wh.
Prediction 3: The predicted energy consumption is approximately 49.41 Wh.
Prediction 4: The predicted energy consumption is approximately 113.20 Wh.
Prediction 5: The predicted energy consumption is approximately 70.73 Wh.
Prediction 6: The predicted energy consumption is approximately 174.69 Wh.
Prediction 7: The predicted energy consumption is approximately 135.46 Wh.
Prediction 8: The predicted energy consumption is approximately 192.87 Wh.
Prediction 9: The predicted energy consumption is approximately 83.42 Wh.
Prediction 10: The predicted energy consumption is approximately 90.55 Wh.
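
A quick way to gain confidence in the save and load steps is to check that the restored model reproduces the original model's predictions. The sketch below does this on synthetic data so that it runs without the CSV. The names `X_demo`, `y_demo`, and `demo_gb_model.pkl` are hypothetical, and `n_estimators=20` is chosen only to keep the example fast.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data, used only so the sketch runs without the CSV.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=200)

model = GradientBoostingRegressor(n_estimators=20, random_state=42).fit(X_demo, y_demo)

# Save to a temporary directory and load the model back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_gb_model.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should give the same predictions as the original.
same_predictions = np.allclose(model.predict(X_demo), restored.predict(X_demo))
print(f"Round trip preserved predictions: {same_predictions}")
```

In this notebook the equivalent check would compare `gb_regressor.predict(X_test)` with `loaded_model.predict(X_test)`.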
