<a href="https://colab.research.google.com/github/AasthathecoderX/Edunet_Energy/blob/main/Edunet_ElectricityPred.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Load the dataset from "/content/drive/MyDrive/Colab Notebooks/Datasets/archive(1)/electricity_bill_dataset.csv", explore the data, prepare it for training, choose and train a suitable model for time series prediction, evaluate the model, and make predictions.

## Load the dataset

### Subtask:
Load the `electricity_bill_dataset.csv` file into a pandas DataFrame.


**Reasoning**:
Import pandas and load the dataset into a DataFrame, then display the first few rows.



In [None]:
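import pandas as pd

# The path assumes Google Drive is already mounted at /content/drive in Colab.
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Datasets/archive(1)/electricity_bill_dataset.csv")
display(df.head())  # preview the first five rows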

## Explore the data

### Subtask:
Explore the data by displaying the first few rows, checking for missing values, and understanding the data types.


**Reasoning**:
Preview the first rows, count the missing values in each column, and use `df.info()` to list each column's data type and non-null count.



In [None]:
display(df.head())          # first five rows
display(df.isnull().sum())  # missing-value count per column
df.info()                   # data types and non-null counts

## Prepare the data

### Subtask:
Encode categorical features and split the data into training and testing sets. No missing-value handling is applied in this step, because the dataset contains no missing values.


**Reasoning**:
Identify categorical columns, apply one-hot encoding, separate features and target, and split the data into training and testing sets.



In [None]:
from sklearn.model_selection import train_test_split

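# One-hot encode every object-dtype (text) column; drop_first removes one
# dummy per feature so the encoded columns are not redundant.
categorical_cols = df.select_dtypes(include=['object']).columns
df_encoded = pd.get_dummies(df, columns=categorical_cols, drop_first=True)

# Separate the features from the target column.
X = df_encoded.drop('ElectricityBill', axis=1)
y = df_encoded['ElectricityBill']

# Hold out 20% of the rows for testing. The split is shuffled, not chronological.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

display(X_train.head())
display(X_test.head())
display(y_train.head())
display(y_test.head())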
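
To make the effect of `drop_first=True` concrete, here is a minimal sketch on a toy DataFrame. The column names and values are hypothetical and are not taken from the electricity bill dataset.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
toy = pd.DataFrame({
    "City": ["Pune", "Delhi", "Pune", "Mumbai"],
    "Units": [120, 300, 150, 210],
})

# Categories are sorted (Delhi, Mumbai, Pune) and the first one is dropped,
# so a row with all dummy columns equal to 0 represents Delhi.
encoded = pd.get_dummies(toy, columns=["City"], drop_first=True)
print(encoded)
```

The dropped category becomes the implicit baseline, which keeps the dummy columns from being perfectly collinear.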

## Choose a model

### Subtask:
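Select a suitable model for the prediction task. Because the data is split at random rather than by time, the problem is treated here as tabular regression.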


## Train the model

### Subtask:
Train the selected model on the training data.


**Reasoning**:
Import the XGBoost Regressor model and train it using the training data.



In [None]:
from xgboost import XGBRegressor

# Default hyperparameters; the fixed seed is set only for reproducibility.
model = XGBRegressor(random_state=42)
model.fit(X_train, y_train)

## Evaluate the model

### Subtask:
Evaluate the model's performance on the testing data.


**Reasoning**:
Import necessary metrics, make predictions, calculate MSE and R-squared, and print the results.



In [None]:
from sklearn.metrics import mean_squared_error, r2_score

y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')
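
MSE is expressed in squared units of the target, which makes it hard to read on its own. The sketch below shows how RMSE and MAE, both in the same units as the bill, could be computed alongside it. The arrays are hypothetical values chosen for easy arithmetic, not model output.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual and predicted bills, for illustration only.
y_true = np.array([1000.0, 1500.0, 2000.0, 2500.0])
y_hat = np.array([1010.0, 1480.0, 2030.0, 2500.0])

mse_demo = mean_squared_error(y_true, y_hat)    # mean of squared errors
rmse_demo = float(np.sqrt(mse_demo))            # back in the units of the target
mae_demo = mean_absolute_error(y_true, y_hat)   # mean of absolute errors

print(f"MSE: {mse_demo}, RMSE: {rmse_demo:.3f}, MAE: {mae_demo}")

# The same conversion applied to the MSE reported in the summary (384.44):
reported_rmse = round(float(np.sqrt(384.44)), 2)
print(reported_rmse)  # 19.61
```

Comparing the RMSE with the typical size of a bill gives a better sense of whether the error is small than the raw MSE does.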

## Make predictions

### Subtask:
Use the trained model to make predictions. No separate new data is available here, so the held-out test set stands in for it.


**Reasoning**:
Make predictions on the test set using the trained model and display the first few predictions.



In [None]:
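# Recompute the predictions so this cell runs independently of the evaluation cell.
y_pred = model.predict(X_test)
# Place actual and predicted bills side by side; the index follows y_test.
predictions_df = pd.DataFrame({'Original': y_test, 'Predicted': y_pred})
display(predictions_df.head())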
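
Predicting on genuinely new rows requires the same columns, in the same order, that the model saw during training. One possible approach is sketched below: encode the new rows and reindex them to the training columns. The frames and column names are hypothetical, and this is an assumption about how new data might be handled, not a step from this notebook.

```python
import pandas as pd

# Hypothetical training frame and a single new row; names are illustrative.
train = pd.DataFrame({"City": ["Delhi", "Mumbai", "Pune"], "Units": [300, 210, 120]})
new = pd.DataFrame({"City": ["Pune"], "Units": [180]})

train_encoded = pd.get_dummies(train, columns=["City"], drop_first=True)
# No drop_first here: the new data may contain only one category.
new_encoded = pd.get_dummies(new, columns=["City"])

# Align to the training layout: missing dummy columns are filled with 0
# and columns unseen during training are dropped.
new_aligned = new_encoded.reindex(columns=train_encoded.columns, fill_value=0)
print(new_aligned)
```

Note that a category never seen in training ends up with all dummy columns at 0, which the model reads as the dropped baseline category, so unseen values are worth checking for explicitly.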

## Summary:

### Data Analysis Key Findings

*   The dataset was loaded successfully and contains no missing values.
*   Categorical features were successfully one-hot encoded.
*   The data was split into training (80%) and testing (20%) sets.
*   An XGBoost Regressor (`XGBRegressor`) was trained on the randomly split data, so the task was handled as tabular regression rather than as time-ordered forecasting.
*   The model achieved a Mean Squared Error (MSE) of approximately 384.44 and an R-squared ($R^2$) score of approximately 0.9997 on the test set.

### Insights or Next Steps

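*   The trained XGBoost model fits the test data very closely, as the R-squared score of approximately 0.9997 shows. Whether an MSE of 384.44 counts as low depends on the scale of the bills, and an R-squared this close to 1 is worth checking for a target that is nearly a direct function of the input features.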
*   Further analysis could involve exploring the feature importances from the XGBoost model to understand which factors most significantly influence electricity bills.
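
The feature-importance follow-up could look roughly like the sketch below. It uses scikit-learn's `GradientBoostingRegressor` as a stand-in so the example is self-contained, and the data and column names (`hours`, `noise`) are synthetic and hypothetical. A fitted `XGBRegressor` exposes the same `feature_importances_` attribute.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data: the target depends on "hours" only; "noise" is irrelevant.
rng = np.random.default_rng(42)
X_demo = pd.DataFrame({
    "hours": rng.uniform(100, 700, size=200),
    "noise": rng.uniform(0, 1, size=200),
})
y_demo = 8.0 * X_demo["hours"] + rng.normal(0, 5, size=200)

# Stand-in model (an assumption for this sketch, not the notebook's model).
stand_in = GradientBoostingRegressor(random_state=42).fit(X_demo, y_demo)

# Pair each importance with its column name and rank from most to least important.
importances = pd.Series(
    stand_in.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(importances)
```

In the notebook itself, replacing `stand_in` with `model` and `X_demo.columns` with `X_train.columns` would give the ranking for the electricity bill features.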
