<a href="https://colab.research.google.com/github/AryanKothari/SolarPredictor/blob/main/P1_Solar.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# By: Sammy Korol and Aryan Kothari

# **Reading solar datasets**

In [None]:
import pandas as pd

ava = pd.read_pickle('ava_st1_ns4_23.pkl')
comp = pd.read_pickle('comp_st1_ns4_23.pkl')
seed = 100521223


In [None]:

# splitting input features (x) and response variable (y)
target_column = 'energy'
X = ava.drop(target_column, axis=1)  # all columns except the target
y = ava[target_column]               # the target column only


# **Exploratory Data Analysis (EDA)**

In [None]:
# displaying the first few rows of the dataset
print(ava.head())

In [None]:
# print shape of X (input variables)
print(X.shape) # 4380 instances and 300 input features

# print shape of y (response variable)
print(y.shape) # 4380 values of response variable

In [None]:
# Calculate mean, mode, median, and range of response variable
mean_value = ava["energy"].mean()
mode_value = ava["energy"].mode().iloc[0]
median_value = ava["energy"].median()
range_value = ava["energy"].max() - ava["energy"].min()

# Display the results
print(f"Mean: {mean_value} kJ")
print(f"Mode: {mode_value} kJ")
print(f"Median: {median_value} kJ")
print(f"Range: {range_value} kJ")

In [None]:
# check for missing values
missing_values = ava.isnull().sum().sum()
print(f"missing values: {missing_values}")

In [None]:
# identify constant columns
constant_columns = ava.columns[ava.nunique() == 1]
constant_value = ava[constant_columns[0]].iloc[0]
print(f"constant columns: {len(constant_columns)}")
print(f"constant column name: {constant_columns[0]}")
print(f"constant column value: {constant_value}")

#constant column is dropped
ava.drop(columns=constant_columns, inplace=True)
print(f"Columns {constant_columns} have been dropped.")

#update X to reflect dropped column
X = ava.drop("energy", axis=1)

In [None]:
# detect the presence of categorical variables
categorical_columns = ava.select_dtypes(include=['category']).columns
print(f"categorical columns: {len(categorical_columns)}")
# detect the presence of numerical variables
numerical_columns = ava.select_dtypes(include=['number']).columns
print(f"numerical columns: {len(numerical_columns)}")

categorical columns: 19
numerical columns: 282


## **EDA Analysis**

**Categorical Variables:** 18 categorical variables were detected; they are handled as required by each modeling technique below.

**Missing Values:** there are no missing values, hence no further action is required.

**Constant Columns:** there is one constant column ("apcp_sf3_3") with value 1. The column is dropped to reduce redundancy, since a constant feature cannot help the models' predictions.



# **Method 1: Decision Tree**

## Training and evaluating a decision tree *without* hyperparameter tuning (holdout)

First, we use holdout (train/test) evaluation without hyperparameter tuning. MSE and R^2 are the metrics used to evaluate the performance of all models from here on.
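
As a reminder of what the two metrics measure, the sketch below computes MSE and R^2 by hand and compares them with scikit-learn's functions. The arrays `y_true` and `y_hat` are made-up values for illustration only and are not taken from the solar dataset.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy values for illustration only (assumption: not from the solar dataset)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 6.0, 9.5])

# MSE: mean of the squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)

# R^2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))
```

MSE is expressed in squared units of the target, so its magnitude depends on the scale of `energy`, whereas R^2 is unitless and easier to compare across models.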

In [None]:
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn import tree
from sklearn.tree import DecisionTreeRegressor
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import r2_score

# Splitting data into training (9 years) and testing (3 years of data);
# note that train_test_split shuffles rows by default, so the split is not chronological
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=3/12, random_state=100521223)

# Training a decision tree (for regression)
tree_reg = DecisionTreeRegressor(random_state=100521223)
tree_reg.fit(X_train,y_train)

# Obtain predictions on the test set
y_pred_tree = tree_reg.predict(X_test)

# Calculate MSE for the Decision Tree Regressor
mse_tree = metrics.mean_squared_error(y_test, y_pred_tree)
print(f'MSE: {mse_tree}')

# Calculate R^2 for the Decision Tree Regressor
r2_tree = r2_score(y_test, y_pred_tree)
print(f'R^2 Score: {r2_tree}')

## Hyperparameter tuning with CV
Without hyperparameter tuning, the decision tree may overfit the training data, and its performance on unseen data may not be optimal. Therefore, we use grid search and randomized search to find good values for *max_depth* and *min_samples_split*.
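
Both searches below use `TimeSeriesSplit` for cross-validation. The following minimal sketch, on a dummy array `X_demo` that stands in for chronologically ordered rows (an assumption made for illustration), shows how its folds are laid out: each validation block comes after all of the rows used for training in that fold.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 12 dummy rows standing in for chronologically ordered observations
X_demo = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(X_demo))

for fold, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"fold {fold}: train={train_idx.tolist()} validate={val_idx.tolist()}")
    # every training index precedes every validation index
    assert train_idx.max() < val_idx.min()
```

This ordering only corresponds to time if the rows passed to the search are themselves in chronological order.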

In [None]:
#Grid Search Hyperparameter Tuning

from sklearn.model_selection import GridSearchCV
from sklearn import metrics
from sklearn.model_selection import TimeSeriesSplit

# Define hyperparameters grid
param_grid = {
    'max_depth': list(range(2,16,2)),
    'min_samples_split': list(range(2,16,2)),
    }

# Create decision tree regressor
reg_tree_tuned = DecisionTreeRegressor(random_state=100521223)

# Set up the GridSearchCV
grid_search = GridSearchCV(reg_tree_tuned, param_grid, cv=TimeSeriesSplit(n_splits=3), scoring='neg_mean_squared_error', n_jobs=-1)

# Fit the model to the training data
grid_search.fit(X_train, y_train)

# Get the best hyperparameters
best_params = grid_search.best_params_
print(f'Best Hyperparameters: {best_params}')

# Make predictions on the test set using the tuned model
y_pred_tuned = grid_search.predict(X_test)

# Evaluate the tuned model
mse_tuned = metrics.mean_squared_error(y_test, y_pred_tuned)
print(f'Mean Squared Error (With Hyperparameter Tuning): {mse_tuned}')

# Calculate R^2 for the Decision Tree Regressor
r2_tree_tuned = r2_score(y_test, y_pred_tuned)
print(f'R^2 Score (With Hyperparameter Tuning): {r2_tree_tuned}')

Best Hyperparameters: {'max_depth': 4, 'min_samples_split': 2}
Mean Squared Error (With Hyperparameter Tuning): 13324816988333.227
R^2 Score (With Hyperparameter Tuning): 0.7982468028318466


In [None]:
#Randomized Search Hyperparameter Tuning

from sklearn.model_selection import RandomizedSearchCV, KFold
from sklearn import metrics


from scipy.stats import uniform, expon
from scipy.stats import randint as sp_randint

# Search space with integer uniform distributions
param_grid = {'max_depth': sp_randint(2,16),
              'min_samples_split': sp_randint(2,16)}

budget = 20
regr = RandomizedSearchCV(DecisionTreeRegressor(),
                         param_grid,
                         scoring='neg_mean_squared_error',
                         cv=TimeSeriesSplit(n_splits=3),
                         n_jobs=1, verbose=1,
                         n_iter=budget
                        )

np.random.seed(100521223)
regr.fit(X=X_train, y=y_train)

best_params = regr.best_params_
print(f'Best Hyperparameters: {best_params}')

# Make predictions on the test set using the tuned model
y_pred_tuned = regr.predict(X_test)

# Evaluate the tuned model
mse_tuned = metrics.mean_squared_error(y_test, y_pred_tuned)
print(f'Mean Squared Error (With Hyperparameter Tuning): {mse_tuned}')

# Calculate R^2 for the Decision Tree Regressor
r2_tree_tuned = r2_score(y_test, y_pred_tuned)
print(f'R^2 Score (With Hyperparameter Tuning): {r2_tree_tuned}')

Fitting 3 folds for each of 20 candidates, totalling 60 fits
Best Hyperparameters: {'max_depth': 5, 'min_samples_split': 10}
Mean Squared Error (With Hyperparameter Tuning): 12681659041934.867
R^2 Score (With Hyperparameter Tuning): 0.8079849607430273


**Summary**

As can be seen, the model tuned with randomized search (max_depth = 5, min_samples_split = 10) performs better than the model trained and evaluated with default parameters. This is made clear by the R^2 of each model, with the tuned model showing an improvement of 0.08.

In summary, the model tuned with randomized search is the best-performing regression tree model.

# **Method 2: KNN**


Analysis of Outliers in the Data:

In [None]:
#Outlier Analysis:
z_scores = np.abs((ava[numerical_columns] - ava[numerical_columns].mean()) / ava[numerical_columns].std())

# For each row, count the features whose Z-score is at or above the threshold of 7
outliers_count = (z_scores >= 7).sum(axis=1)

# Number of rows with at least one Z-score at or above the threshold
total_outliers = (outliers_count > 0).sum()
print(f"Total number of data points with Z-scores above or equal to 7: {total_outliers}")

Total number of data points with Z-scores above or equal to 7: 217
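
To make the row-wise filtering explicit, here is a small sketch on a made-up DataFrame (`demo` and its columns `a` and `b` are hypothetical). A row is flagged as soon as any one of its features reaches the threshold, which is the same rule the KNN cell below applies with `(z_scores < 7).all(axis=1)`.

```python
import numpy as np
import pandas as pd

# Made-up frame: column "a" has one extreme value in its last row
demo = pd.DataFrame({
    "a": [1.0] * 99 + [1000.0],
    "b": np.linspace(0.0, 1.0, 100),
})

# Absolute z-score of every cell, computed column by column
z = np.abs((demo - demo.mean()) / demo.std())

threshold = 7
rows_flagged = (z >= threshold).any(axis=1)  # True where any feature is extreme
demo_clean = demo[~rows_flagged]             # keep rows with all z-scores below 7

print(rows_flagged.sum(), demo_clean.shape)
```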


KNN Model with standard k value (no hyper-parameter tuning)


In [None]:
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score

#Identify extreme outliers: keep only rows whose z-scores are all below 7 (we chose 7 as the threshold because 1/4 of the data points already had a z-score of 3 or more)
z_scores = np.abs((ava[numerical_columns] - ava[numerical_columns].mean()) / ava[numerical_columns].std())
ava_no_outliers = ava[(z_scores < 7).all(axis=1)]

# Features and target variable
X = ava_no_outliers.drop('energy', axis=1)
y = ava_no_outliers['energy']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=seed)

# Separate numerical and categorical columns
numerical_cols = X.select_dtypes(include=['number']).columns
categorical_cols = X.select_dtypes(include=['category']).columns

# Create transformers for numerical and categorical columns
numeric_transformer = Pipeline(steps=[
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(drop='first'))  # Use drop='first' to avoid dummy variable trap
])

# Combine transformers using ColumnTransformer
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numerical_cols),
        ('cat', categorical_transformer, categorical_cols)
    ])

# Build the KNN model with preprocessing
knn_model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', KNeighborsRegressor(n_neighbors=5))  # Not using Hyperparameter tuning
])

# Train the model
knn_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn_model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error (MSE): {mse}')
print(f'R-squared (R^2): {r2}')


Mean Squared Error (MSE): 12381654019261.28
R-squared (R^2): 0.7914587389793833


GridSearchCV tuning method (detecting lowest MSE):

In [None]:
from sklearn.model_selection import GridSearchCV

# Define the parameter grid for KNN
param_grid = {'regressor__n_neighbors': [5, 10, 15, 20, 25]}  # Adjust the range as needed

# Create a pipeline with preprocessing and KNN model
knn_model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', KNeighborsRegressor())
])

# Use GridSearchCV for hyperparameter tuning
grid_search = GridSearchCV(knn_model, param_grid, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)
grid_search.fit(X_train, y_train)

# Get the best hyperparameters
best_n_neighbors_mse = grid_search.best_params_['regressor__n_neighbors']

# Build the final KNN model with the best hyperparameters for MSE
best_knn_model_mse = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', KNeighborsRegressor(n_neighbors=best_n_neighbors_mse))
])

# Train the model for MSE
best_knn_model_mse.fit(X_train, y_train)

# Make predictions on the test set for MSE
y_pred_mse = best_knn_model_mse.predict(X_test)

# Evaluate the model for MSE
mse_mse = mean_squared_error(y_test, y_pred_mse)
r2_mse = r2_score(y_test, y_pred_mse)

print(f'Best Number of Neighbors for MSE: {best_n_neighbors_mse}')
print(f'Mean Squared Error (MSE) for MSE model: {mse_mse}')
print(f'R-squared (R^2) for MSE model: {r2_mse}')


Best Number of Neighbors for MSE: 10
Mean Squared Error (MSE) for MSE model: 11484582098779.92
R-squared (R^2) for MSE model: 0.8065679085000587


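For-loop tuning method (detecting lowest MSE). Note that this loop selects n_neighbors using the test set itself, so the reported test score is an optimistic estimate.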

In [None]:

#Adding Hyper-parameter tuning based on lowest MSE
from sklearn.model_selection import GridSearchCV

# Define the parameter grid for KNN
param_grid = {'regressor__n_neighbors': [5, 10, 15, 20, 25]}  # Adjust the range as needed

# Initialize variables to track the best hyperparameters
best_mse = float('inf')  # Start with a very high value
best_n_neighbors_mse = None

# Iterate over each value of n_neighbors
for n_neighbors in param_grid['regressor__n_neighbors']:
    # Build the KNN model with the current hyperparameters
    knn_model_temp = Pipeline(steps=[
        ('preprocessor', preprocessor),
        ('regressor', KNeighborsRegressor(n_neighbors=n_neighbors))
    ])

    # Train the model
    knn_model_temp.fit(X_train, y_train)

    # Make predictions on the test set
    y_pred_temp = knn_model_temp.predict(X_test)

    # Evaluate the model for MSE
    mse_temp = mean_squared_error(y_test, y_pred_temp)

    # Update the best hyperparameters if the current MSE is lower
    if mse_temp < best_mse:
        best_mse = mse_temp
        best_n_neighbors_mse = n_neighbors

# Build the final KNN model with the best hyperparameters for MSE
best_knn_model_mse = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', KNeighborsRegressor(n_neighbors=best_n_neighbors_mse))
])

# Train the model for MSE
best_knn_model_mse.fit(X_train, y_train)

# Make predictions on the test set for MSE
y_pred_mse = best_knn_model_mse.predict(X_test)

# Evaluate the model for MSE
mse_mse = mean_squared_error(y_test, y_pred_mse)
r2_mse = r2_score(y_test, y_pred_mse)

print(f'Best Number of Neighbors for MSE: {best_n_neighbors_mse}')
print(f'Mean Squared Error (MSE) for MSE model: {mse_mse}')
print(f'R-squared (R^2) for MSE model: {r2_mse}')


Best Number of Neighbors for MSE: 20
Mean Squared Error (MSE) for MSE model: 11207266494203.098
R-squared (R^2) for MSE model: 0.8112386694330632


GridSearchCV tuning method (detecting highest R^2)



In [None]:
from sklearn.metrics import r2_score

# Define the parameter grid for KNN
param_grid = {'regressor__n_neighbors': [5, 10, 15, 20, 25]}  # Adjust the range as needed

# Create a pipeline with preprocessing and KNN model
knn_model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', KNeighborsRegressor())
])

# Use GridSearchCV for hyperparameter tuning
grid_search_r2 = GridSearchCV(knn_model, param_grid, cv=5, scoring='r2', n_jobs=-1)
grid_search_r2.fit(X_train, y_train)

# Get the best hyperparameters
best_n_neighbors_r2 = grid_search_r2.best_params_['regressor__n_neighbors']

# Build the final KNN model with the best hyperparameters for R-squared
best_knn_model_r2 = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', KNeighborsRegressor(n_neighbors=best_n_neighbors_r2))
])

# Train the model for R-squared
best_knn_model_r2.fit(X_train, y_train)

# Make predictions on the test set for R-squared
y_pred_r2 = best_knn_model_r2.predict(X_test)

# Evaluate the model for R-squared
mse_r2 = mean_squared_error(y_test, y_pred_r2)
r2_r2 = r2_score(y_test, y_pred_r2)

print(f'Best Number of Neighbors for R-squared: {best_n_neighbors_r2}')
print(f'Mean Squared Error (MSE) for R-squared model: {mse_r2}')
print(f'R-squared (R^2) for R-squared model: {r2_r2}')

Best Number of Neighbors for R-squared: 10
Mean Squared Error (MSE) for R-squared model: 11484582098779.92
R-squared (R^2) for R-squared model: 0.8065679085000587


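For-loop tuning method (detecting highest R^2). As with the MSE loop, n_neighbors is selected using the test set itself, so the reported test score is an optimistic estimate.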

In [None]:
#Adding Hyper-parameter tuning based on highest R^2

# Initialize variables to track the best hyperparameters
best_r2 = -float('inf')  # Start with a very low value
best_n_neighbors_r2 = None

# Iterate over each value of n_neighbors
for n_neighbors in param_grid['regressor__n_neighbors']:
    # Build the KNN model with the current hyperparameters
    knn_model_temp = Pipeline(steps=[
        ('preprocessor', preprocessor),
        ('regressor', KNeighborsRegressor(n_neighbors=n_neighbors))
    ])

    # Train the model
    knn_model_temp.fit(X_train, y_train)

    # Make predictions on the test set
    y_pred_temp = knn_model_temp.predict(X_test)

    # Evaluate the model for R-squared
    r2_temp = r2_score(y_test, y_pred_temp)

    # Update the best hyperparameters if the current R-squared is higher
    if r2_temp > best_r2:
        best_r2 = r2_temp
        best_n_neighbors_r2 = n_neighbors

# Build the final KNN model with the best hyperparameters for R-squared
best_knn_model_r2 = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', KNeighborsRegressor(n_neighbors=best_n_neighbors_r2))
])

# Train the model for R-squared
best_knn_model_r2.fit(X_train, y_train)

# Make predictions on the test set for R-squared
y_pred_r2 = best_knn_model_r2.predict(X_test)

# Evaluate the model for R-squared
mse_r2 = mean_squared_error(y_test, y_pred_r2)
r2_r2 = r2_score(y_test, y_pred_r2)

print(f'Best Number of Neighbors for R-squared: {best_n_neighbors_r2}')
print(f'Mean Squared Error (MSE) for R-squared model: {mse_r2}')
print(f'R-squared (R^2) for R-squared model: {r2_r2}')


Best Number of Neighbors for R-squared: 20
Mean Squared Error (MSE) for R-squared model: 11207266494203.098
R-squared (R^2) for R-squared model: 0.8112386694330632


**Summary**

# **Method 3: SVMs**


In [None]:
#SVM Model:


# **Method 4: Random Forest & Extra Trees**




In [None]:
#Random Forest Model (no Hyper-parameter tuning):
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.preprocessing import StandardScaler

X = ava.drop('energy', axis=1)
y = ava['energy']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=seed)

# Create a pipeline with preprocessing and Random Forests model
rf_model = Pipeline(steps=[
    ('scaler', StandardScaler()),  # You can choose whether or not to scale your features
    ('regressor', RandomForestRegressor(n_estimators=100, random_state=seed))  # This is without hyper-parameter tuning
])

# Train the model
rf_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred_rf = rf_model.predict(X_test)

# Evaluate the Random Forests model
mse_rf = mean_squared_error(y_test, y_pred_rf)
r2_rf = r2_score(y_test, y_pred_rf)

print('Random Forests Model:')
print(f'Mean Squared Error (MSE): {mse_rf}')
print(f'R-squared (R^2): {r2_rf}')

Random Forests Model:
Mean Squared Error (MSE): 9326973023157.86
R-squared (R^2): 0.8587788013170603


Using RandomizedSearchCV for tuning on Random Forest (by lowest MSE)

In [None]:
from sklearn.model_selection import RandomizedSearchCV

param_grid = {
    'regressor__n_estimators': [int(x) for x in np.linspace(start=10, stop=300, num=10)],
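    'regressor__max_features': [1.0, 'sqrt'],  # 1.0 uses all features, like the former 'auto' option for regressors, which newer scikit-learn releases no longer accept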
    'regressor__max_depth': [int(x) for x in np.linspace(10, 110, num=11)] + [None],
    'regressor__min_samples_split': [2, 5, 10],
    'regressor__min_samples_leaf': [1, 2, 4],
}

# Use RandomizedSearchCV for hyperparameter tuning
random_search_rf = RandomizedSearchCV(rf_model, param_distributions=param_grid, n_iter=2, cv=5, scoring='neg_mean_squared_error', n_jobs=-1, random_state=seed)
random_search_rf.fit(X_train, y_train)

# Get the best hyperparameters
best_params_rf = random_search_rf.best_params_

# Remove the 'regressor__' prefix from the parameter names
best_params_rf = {key.replace('regressor__', ''): value for key, value in best_params_rf.items()}

# Build the Random Forests model with the best hyperparameters
best_rf_model = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('regressor', RandomForestRegressor(random_state=seed, **best_params_rf))
])

# Train the model
best_rf_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred_rf_best = best_rf_model.predict(X_test)

# Evaluate the Random Forests model with hyperparameter tuning
mse_rf_best = mean_squared_error(y_test, y_pred_rf_best)
r2_rf_best = r2_score(y_test, y_pred_rf_best)

print('Random Forests Model with Hyperparameter Tuning (RandomizedSearchCV):')
print(f'Best Hyperparameters: {best_params_rf}')
print(f'Mean Squared Error (MSE): {mse_rf_best}')
print(f'R-squared (R^2): {r2_rf_best}')

Random Forests Model with Hyperparameter Tuning (RandomizedSearchCV):
Best Hyperparameters: {'n_estimators': 267, 'min_samples_split': 5, 'min_samples_leaf': 4, 'max_features': 'sqrt', 'max_depth': 110}
Mean Squared Error (MSE): 9101328771949.059
R-squared (R^2): 0.8621953172169672
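
With the default `refit=True`, a fitted `RandomizedSearchCV` already holds a model retrained on the full training set with the best parameters, exposed as `best_estimator_`. Rebuilding the pipeline by hand, as done above, is therefore optional. The sketch below shows this on synthetic data; the data and the small parameter lists are assumptions chosen to keep the example fast.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in data (assumption: not the solar dataset)
X_demo, y_demo = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={
        "n_estimators": [10, 20, 30],
        "max_depth": [3, 5, None],
        "min_samples_leaf": [1, 2, 4],
    },
    n_iter=3,
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X_tr, y_tr)

# best_estimator_ was refit on all of X_tr using the best parameters found
tuned_model = search.best_estimator_

# Predicting through the search object delegates to best_estimator_
mse_via_search = mean_squared_error(y_te, search.predict(X_te))
mse_via_best = mean_squared_error(y_te, tuned_model.predict(X_te))
print(search.best_params_, mse_via_search, mse_via_best)
```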


Extra Trees Model (without tuning)

In [None]:
#Extra Trees Model (no Hyper-Parameter tuning):
et_model = Pipeline(steps=[
    ('scaler', StandardScaler()),  # You can choose whether or not to scale your features
    ('regressor', ExtraTreesRegressor(n_estimators=100, random_state=seed))  # This is without hyper-parameter tuning
])

# Train the model
et_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred_et = et_model.predict(X_test)

# Evaluate the Extra Trees model
mse_et = mean_squared_error(y_test, y_pred_et)
r2_et = r2_score(y_test, y_pred_et)

print('\nExtra Trees Model:')
print(f'Mean Squared Error (MSE): {mse_et}')
print(f'R-squared (R^2): {r2_et}')


Extra Trees Model:
Mean Squared Error (MSE): 9276587062641.662
R-squared (R^2): 0.8595417032492544


Using RandomizedSearchCV for tuning on Extra Trees (by lowest MSE)

In [None]:
# Define the parameter grid for Extra Trees
param_grid_et = {
    'regressor__n_estimators': [int(x) for x in np.linspace(start=10, stop=200, num=10)],
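    'regressor__max_features': [1.0, 'sqrt'],  # 1.0 uses all features, like the former 'auto' option for regressors, which newer scikit-learn releases no longer accept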
    'regressor__max_depth': [int(x) for x in np.linspace(10, 110, num=11)] + [None],
    'regressor__min_samples_split': [2, 5, 10],
    'regressor__min_samples_leaf': [1, 2, 4],
}

# Use RandomizedSearchCV for hyperparameter tuning
random_search_et = RandomizedSearchCV(et_model, param_distributions=param_grid_et, n_iter=2, cv=5, scoring='neg_mean_squared_error', n_jobs=-1, random_state=seed)
random_search_et.fit(X_train, y_train)

# Get the best hyperparameters
best_params_et = random_search_et.best_params_

# Remove the 'regressor__' prefix from the parameter names
best_params_et = {key.replace('regressor__', ''): value for key, value in best_params_et.items()}

# Build the Extra Trees model with the best hyperparameters
best_et_model = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('regressor', ExtraTreesRegressor(random_state=seed, **best_params_et))
])

# Train the model
best_et_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred_et_best = best_et_model.predict(X_test)

# Evaluate the Extra Trees model with hyperparameter tuning
mse_et_best = mean_squared_error(y_test, y_pred_et_best)
r2_et_best = r2_score(y_test, y_pred_et_best)

print('Extra Trees Model with Hyperparameter Tuning (RandomizedSearchCV):')
print(f'Best Hyperparameters: {best_params_et}')
print(f'Mean Squared Error (MSE): {mse_et_best}')
print(f'R-squared (R^2): {r2_et_best}')

Extra Trees Model with Hyperparameter Tuning (RandomizedSearchCV):
Best Hyperparameters: {'n_estimators': 115, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_features': 'sqrt', 'max_depth': None}
Mean Squared Error (MSE): 9311107021114.018
R-squared (R^2): 0.8590190310058752


**Summary**

# **Final Model: Estimation of Future Performance**

# **Saving Final Model**

Summary:
