# Model Definition and Evaluation
## Table of Contents
1. [Model Selection](#model-selection)
2. [Feature Engineering](#feature-engineering)
3. [Hyperparameter Tuning](#hyperparameter-tuning)
4. [Implementation](#implementation)
5. [Evaluation Metrics](#evaluation-metrics)
6. [Comparative Analysis](#comparative-analysis)


In [13]:
# Import necessary libraries
import lightgbm
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split, GridSearchCV, TimeSeriesSplit
from sklearn.metrics import accuracy_score, mean_squared_error, classification_report

# Project-local helpers: data loading and error-metric reporting
from Data.load_data import get_energy_data
from helper_functions import create_error_metrics

## Model Selection

[Discuss the type(s) of models you consider for this task, and justify the selection.]

TODO

## Feature Engineering

[Describe any additional feature engineering you've performed beyond what was done for the baseline model.]


In [6]:
df = get_energy_data()

# Calendar features derived from the datetime index
df['hour'] = df.index.hour
df['day_of_week'] = df.index.dayofweek
df['day'] = df.index.day
df['month'] = df.index.month
df['day_of_year'] = df.index.dayofyear

# Feature and target variable selection
X = df[['hour', 'day_of_week', 'day', 'month', 'day_of_year', 'Temperature']]
y = df['Load']
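
# Split the dataset into training and test sets (80/20).
# Note: train_test_split shuffles rows by default, so the test set is not a
# chronological hold-out; pass shuffle=False to keep the time order.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)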
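
Because the calendar features above come from a datetime index, the rows are ordered in time. The following is a minimal sketch, not the notebook's actual pipeline, of how a chronological hold-out could be produced with `shuffle=False`. It uses a synthetic hourly frame in place of `get_energy_data()`; the `Temperature` and `Load` values are random placeholders, not real data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the energy data: 1000 hourly rows (illustrative only)
index = pd.date_range("2020-01-01", periods=1000, freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "Temperature": rng.normal(10, 5, len(index)),
        "Load": rng.normal(230_000, 20_000, len(index)),
    },
    index=index,
)
df["hour"] = df.index.hour
df["day_of_week"] = df.index.dayofweek

X = df[["hour", "day_of_week", "Temperature"]]
y = df["Load"]

# shuffle=False keeps the row order, so the last 20% of rows form the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# Every training timestamp precedes every test timestamp
print(X_train.index.max(), "<", X_test.index.min())
```

With this layout the model is always evaluated on timestamps later than anything it was trained on, which is also the assumption `TimeSeriesSplit` makes about row order.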

## Hyperparameter Tuning

[Discuss any hyperparameter tuning methods you've applied, such as Grid Search or Random Search, and the rationale behind them.]


In [12]:
cv_split = TimeSeriesSplit(n_splits=4, test_size=100)
model = lightgbm.LGBMRegressor()
parameters = {
    # Full search space (commented out; only the single-point grid below is searched):
    # "max_depth": [3, 4, 6, 5, 10],
    # "num_leaves": [10, 20, 30, 40, 100, 120],
    # "learning_rate": [0.01, 0.05, 0.1, 0.2, 0.3],
    # "n_estimators": [50, 100, 300, 500, 700, 900, 1000],
    # "colsample_bytree": [0.3, 0.5, 0.7, 1]
    # Single-point grid currently in use:
    "max_depth": [10],
    "num_leaves": [10],
    "learning_rate": [0.3],
    "n_estimators": [50],
    "colsample_bytree": [0.3]
}


grid_search = GridSearchCV(estimator=model, cv=cv_split, param_grid=parameters)
grid_search.fit(X_train, y_train)


[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000604 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 578
[LightGBM] [Info] Number of data points in the train set: 20624, number of used features: 6
[LightGBM] [Info] Start training from score 230017.997881
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001867 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 579
[LightGBM] [Info] Number of data points in the train set: 20724, number of used features: 6
[LightGBM] [Info] Start training from score 230034.079829
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000081 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 578
[LightGBM] [Info] Number of data points in the trai
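
The log above shows the training set growing by 100 rows from one fold to the next (20624, then 20724), which is what `TimeSeriesSplit(n_splits=4, test_size=100)` is expected to produce. A small sketch on a synthetic array of 1000 ordered samples (an assumption made only to keep the numbers easy to read) shows the fold layout:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_demo = np.arange(1000).reshape(-1, 1)  # 1000 ordered samples (synthetic)
cv_split = TimeSeriesSplit(n_splits=4, test_size=100)

fold_sizes = []
for train_idx, test_idx in cv_split.split(X_demo):
    # Each validation block starts right after the end of its training window
    assert train_idx.max() + 1 == test_idx.min()
    fold_sizes.append((len(train_idx), len(test_idx)))

print(fold_sizes)  # [(600, 100), (700, 100), (800, 100), (900, 100)]
```

The training window expands by `test_size` rows per fold while each validation block stays at 100 rows. This only reflects time order if the rows passed to `fit` are themselves in chronological order.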

## Implementation

[Implement the final model(s) you've selected based on the above steps.]


In [None]:
# Implement the final model(s)
# Example: model = YourChosenModel(best_hyperparameters)
# model.fit(X_train, y_train)
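
One possible way to obtain the final model is to reuse the fitted search object: with the default `refit=True`, `GridSearchCV` refits the best parameter combination on the full training data and exposes it as `best_estimator_`. The sketch below illustrates this on synthetic data and swaps in scikit-learn's `GradientBoostingRegressor` for `LGBMRegressor` so that it runs without LightGBM installed; the grid values and the names `X_demo`, `y_demo`, and `search` are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = X_demo[:, 0] * 2.0 + rng.normal(scale=0.1, size=300)

search = GridSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),  # stand-in for LGBMRegressor
    param_grid={"max_depth": [2, 3], "n_estimators": [20, 40]},
    cv=TimeSeriesSplit(n_splits=3, test_size=50),
)
search.fit(X_demo, y_demo)

# With refit=True (the default), best_estimator_ is already fitted on all of X_demo
final_model = search.best_estimator_
print(search.best_params_)

predictions = final_model.predict(X_demo[:5])
```

Calling `grid_search.predict(...)` directly, as the evaluation cell does, delegates to the same refitted estimator.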


## Evaluation Metrics

[Clearly specify which metrics you'll use to evaluate the model performance, and why you've chosen these metrics.]


In [16]:
# Evaluate the tuned model on the held-out test set (regression task).
# grid_search.predict uses the best estimator found during the search.
prediction = grid_search.predict(X_test)

# Report the error metrics (MAE, MSE, RMSE, MAPE, R2) via the project helper
create_error_metrics(y_test, prediction)

|   | MAE | MSE | RMSE | MAPE % | R2 % |
|---|-----|-----|------|--------|------|
| 0 | 8638.22 | 133586400.0 | 11557.96 | 4.0 | 91.79 |
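
The implementation of `create_error_metrics` is not shown in this notebook. As a rough sketch of how the five reported columns could be computed, the hypothetical function `regression_error_metrics` below uses only NumPy and pandas. It is an assumption about what such a helper might look like, not the project's actual code, and the input values are toy numbers.

```python
import numpy as np
import pandas as pd


def regression_error_metrics(y_true, y_pred):
    """Hypothetical stand-in for create_error_metrics: returns a one-row table."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    mse = np.mean(errors ** 2)
    ss_res = np.sum(errors ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares

    return pd.DataFrame(
        {
            "MAE": [np.mean(np.abs(errors))],
            "MSE": [mse],
            "RMSE": [np.sqrt(mse)],
            # Assumes no zero targets, otherwise the ratio is undefined
            "MAPE %": [np.mean(np.abs(errors / y_true)) * 100],
            "R2 %": [(1 - ss_res / ss_tot) * 100],
        }
    ).round(2)


# Toy values for illustration only
metrics = regression_error_metrics([100, 200, 300, 400], [110, 190, 310, 390])
print(metrics)
```

RMSE is the square root of MSE, so it is expressed in the same units as `Load`, which makes it easier to read next to MAE than the squared error.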


## Comparative Analysis

[Compare the performance of your model(s) against the baseline model. Discuss any improvements or setbacks and the reasons behind them.]


In [None]:
# Comparative Analysis code (if applicable)
# Example: comparing accuracy of the baseline model and the new model
# print(f"Baseline Model Accuracy: {baseline_accuracy}, New Model Accuracy: {new_model_accuracy}")
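
Since this is a regression task, a comparison on error metrics is likely more informative than accuracy. The sketch below shows one way to place two models side by side and compute the relative change. The baseline here is a constant-mean predictor chosen purely as a placeholder, and all values are toy numbers rather than results from this project; `mae_rmse` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def mae_rmse(y_true, y_pred):
    """Return MAE and RMSE for one set of predictions (hypothetical helper)."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return {"MAE": np.mean(np.abs(errors)), "RMSE": np.sqrt(np.mean(errors ** 2))}


y_true = np.array([100.0, 200.0, 300.0, 400.0])
baseline_pred = np.full_like(y_true, y_true.mean())      # placeholder baseline
candidate_pred = np.array([110.0, 190.0, 310.0, 390.0])  # placeholder new model

# One row per model, one column per metric
comparison = pd.DataFrame(
    {
        "baseline": mae_rmse(y_true, baseline_pred),
        "new_model": mae_rmse(y_true, candidate_pred),
    }
).T

# Relative change versus the baseline; negative values mean a lower error
comparison_pct = (comparison.loc["new_model"] / comparison.loc["baseline"] - 1) * 100

print(comparison)
print(comparison_pct.round(1))
```

In the notebook itself, the two prediction arrays would come from the baseline model and from `grid_search.predict(X_test)`, evaluated on the same test set.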
