# Regression Predict Student Solution

© Explore Data Science Academy

---
### Honour Code

I {JOSEPH, MHLOMI**}, confirm - by submitting this document - that the solutions in this notebook are a result of my own work and that I abide by the [EDSA honour code](https://drive.google.com/file/d/1QDCjGZJ8-FmJE3bZdIQNwnJyQKPhHZBn/view?usp=sharing).

Non-compliance with the honour code constitutes a material breach of contract.

### Predict Overview: Spain Electricity Shortfall Challenge

The government of Spain is considering an expansion of its renewable energy resource infrastructure investments. As such, they require information on the trends and patterns of the country's renewable sources and fossil fuel energy generation. Your company has been awarded the contract to:

- 1. analyse the supplied data;
- 2. identify potential errors in the data and clean the existing data set;
- 3. determine if additional features can be added to enrich the data set;
- 4. build a model that is capable of forecasting the three hourly demand shortfalls;
- 5. evaluate the accuracy of the best machine learning model;
- 6. determine what features were most important in the model’s prediction decision, and
- 7. explain the inner workings of the model to a non-technical audience.

Formally, the problem statement given to you, the senior data scientist, by your manager via email reads as follows:

> In this project you are tasked to model the shortfall between the energy generated by means of fossil fuels and various renewable sources - for the country of Spain. The daily shortfall, which will be referred to as the target variable, will be modelled as a function of various city-specific weather features such as `pressure`, `wind speed`, `humidity`, etc. As with all data science projects, the provided features are rarely adequate predictors of the target variable. As such, you are required to perform feature engineering to ensure that you will be able to accurately model Spain's three hourly shortfalls.
 
On top of this, she has provided you with a starter notebook containing vague explanations of what the main outcomes are. 

<a id="cont"></a>

## Table of Contents

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Loading the Data</a>

<a href=#three>3. Exploratory Data Analysis (EDA)</a>

<a href=#four>4. Data Engineering</a>

<a href=#five>5. Modelling</a>

<a href=#six>6. Model Performance</a>

<a href=#seven>7. Model Explanations</a>

 <a id="one"></a>
## 1. Importing Packages
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Importing Packages ⚡ |
| :--------------------------- |
| In this section you are required to import, and briefly discuss, the libraries that will be used throughout your analysis and modelling. |

---

In [1]:
# Libraries for data loading, data manipulation and data visualisation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


# Libraries for data preparation and model building
from sklearn import model_selection
from sklearn import metrics
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn import feature_selection
from sklearn import preprocessing
from sklearn import decomposition
from sklearn import cluster
from sklearn import impute
from sklearn import pipeline
from sklearn import compose
from sklearn import calibration

# Model Evaluation and Hyperparameter Tuning
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV

# Notebook Environment
from IPython.display import HTML


# Setting global constants to ensure notebook results are reproducible
PARAMETER_CONSTANT = 50

<a id="two"></a>
## 2. Loading the Data
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Loading the data ⚡ |
| :--------------------------- |
| In this section you are required to load the data from the `df_train` file into a DataFrame. |

---

In [2]:
train_df = pd.read_csv('df_train.csv')  # load train data csv into pandas DataFrame
test_df = pd.read_csv('df_test.csv') # Load test data csv into pandas DataFrame
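
The `Unnamed: 0` column that appears in the summaries further down is typically a row index that was written into the CSV file. The following is a minimal sketch, using a tiny in-memory CSV with made-up values instead of the real files, of how `index_col` and `parse_dates` could handle both the index column and the `time` column at load time. The name `sample_df` and the sample values are assumptions for illustration only.

```python
import io

import pandas as pd

# Tiny stand-in for df_train.csv; the values are made up for illustration.
csv_text = (
    ",time,Madrid_wind_speed,load_shortfall_3h\n"
    "0,2015-01-01 03:00:00,0.67,6715.67\n"
    "1,2015-01-01 06:00:00,0.33,4171.67\n"
)

# index_col=0 treats the unnamed first column as the row index, not a feature;
# parse_dates converts 'time' from text to datetime64 while loading.
sample_df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=["time"])

print(sample_df.dtypes)
```

Whether to load the real files this way is a design choice; the rest of the notebook assumes the plain `pd.read_csv` calls above.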

<a id="three"></a>
## 3. Exploratory Data Analysis (EDA)
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Exploratory data analysis ⚡ |
| :--------------------------- |
| In this section, you are required to perform an in-depth analysis of all the variables in the DataFrame. |

---


In [3]:
# inspect column names, non-null counts and dtypes of the train data
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8763 entries, 0 to 8762
Data columns (total 49 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Unnamed: 0            8763 non-null   int64  
 1   time                  8763 non-null   object 
 2   Madrid_wind_speed     8763 non-null   float64
 3   Valencia_wind_deg     8763 non-null   object 
 4   Bilbao_rain_1h        8763 non-null   float64
 5   Valencia_wind_speed   8763 non-null   float64
 6   Seville_humidity      8763 non-null   float64
 7   Madrid_humidity       8763 non-null   float64
 8   Bilbao_clouds_all     8763 non-null   float64
 9   Bilbao_wind_speed     8763 non-null   float64
 10  Seville_clouds_all    8763 non-null   float64
 11  Bilbao_wind_deg       8763 non-null   float64
 12  Barcelona_wind_speed  8763 non-null   float64
 13  Barcelona_wind_deg    8763 non-null   float64
 14  Madrid_clouds_all     8763 non-null   float64
 15  Seville_wind_speed   

In [5]:
# inspect column names, non-null counts and dtypes of the test data
test_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2920 entries, 0 to 2919
Data columns (total 48 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Unnamed: 0            2920 non-null   int64  
 1   time                  2920 non-null   object 
 2   Madrid_wind_speed     2920 non-null   float64
 3   Valencia_wind_deg     2920 non-null   object 
 4   Bilbao_rain_1h        2920 non-null   float64
 5   Valencia_wind_speed   2920 non-null   float64
 6   Seville_humidity      2920 non-null   float64
 7   Madrid_humidity       2920 non-null   float64
 8   Bilbao_clouds_all     2920 non-null   float64
 9   Bilbao_wind_speed     2920 non-null   float64
 10  Seville_clouds_all    2920 non-null   float64
 11  Bilbao_wind_deg       2920 non-null   float64
 12  Barcelona_wind_speed  2920 non-null   float64
 13  Barcelona_wind_deg    2920 non-null   float64
 14  Madrid_clouds_all     2920 non-null   float64
 15  Seville_wind_speed   

In [6]:
# check which columns of the train data contain missing values
train_missing_values_df = train_df.isnull().any()
train_missing_values_df

Unnamed: 0              False
time                    False
Madrid_wind_speed       False
Valencia_wind_deg       False
Bilbao_rain_1h          False
Valencia_wind_speed     False
Seville_humidity        False
Madrid_humidity         False
Bilbao_clouds_all       False
Bilbao_wind_speed       False
Seville_clouds_all      False
Bilbao_wind_deg         False
Barcelona_wind_speed    False
Barcelona_wind_deg      False
Madrid_clouds_all       False
Seville_wind_speed      False
Barcelona_rain_1h       False
Seville_pressure        False
Seville_rain_1h         False
Bilbao_snow_3h          False
Barcelona_pressure      False
Seville_rain_3h         False
Madrid_rain_1h          False
Barcelona_rain_3h       False
Valencia_snow_3h        False
Madrid_weather_id       False
Barcelona_weather_id    False
Bilbao_pressure         False
Seville_weather_id      False
Valencia_pressure        True
Seville_temp_max        False
Madrid_pressure         False
Valencia_temp_max       False
Valencia_t

In [7]:
# check which columns of the test data contain missing values
test_missing_values_df = test_df.isnull().any()
test_missing_values_df


Unnamed: 0              False
time                    False
Madrid_wind_speed       False
Valencia_wind_deg       False
Bilbao_rain_1h          False
Valencia_wind_speed     False
Seville_humidity        False
Madrid_humidity         False
Bilbao_clouds_all       False
Bilbao_wind_speed       False
Seville_clouds_all      False
Bilbao_wind_deg         False
Barcelona_wind_speed    False
Barcelona_wind_deg      False
Madrid_clouds_all       False
Seville_wind_speed      False
Barcelona_rain_1h       False
Seville_pressure        False
Seville_rain_1h         False
Bilbao_snow_3h          False
Barcelona_pressure      False
Seville_rain_3h         False
Madrid_rain_1h          False
Barcelona_rain_3h       False
Valencia_snow_3h        False
Madrid_weather_id       False
Barcelona_weather_id    False
Bilbao_pressure         False
Seville_weather_id      False
Valencia_pressure        True
Seville_temp_max        False
Madrid_pressure         False
Valencia_temp_max       False
Valencia_t

In [8]:
# look at the train data statistics
train_df.describe()

Unnamed: 0.1,Unnamed: 0,Madrid_wind_speed,Bilbao_rain_1h,Valencia_wind_speed,Seville_humidity,Madrid_humidity,Bilbao_clouds_all,Bilbao_wind_speed,Seville_clouds_all,Bilbao_wind_deg,...,Madrid_temp_max,Barcelona_temp,Bilbao_temp_min,Bilbao_temp,Barcelona_temp_min,Bilbao_temp_max,Seville_temp_min,Madrid_temp,Madrid_temp_min,load_shortfall_3h
count,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,...,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0
mean,4381.0,2.425729,0.135753,2.586272,62.658793,57.414717,43.469132,1.850356,13.714748,158.957511,...,289.540309,289.855459,285.017973,286.422929,288.447422,287.966027,291.633356,288.419439,287.202203,10673.857612
std,2529.804538,1.850371,0.374901,2.41119,22.621226,24.335396,32.551044,1.695888,24.272482,102.056299,...,9.752047,6.528111,6.705672,6.818682,6.102593,7.10559,8.17822,9.346796,9.206237,5218.046404
min,0.0,0.0,0.0,0.0,8.333333,6.333333,0.0,0.0,0.0,0.0,...,264.983333,270.816667,264.483333,267.483333,269.483333,269.063,270.15,264.983333,264.983333,-6618.0
25%,2190.5,1.0,0.0,1.0,44.333333,36.333333,10.0,0.666667,0.0,73.333333,...,282.15,284.973443,280.085167,281.374167,284.15,282.836776,285.816667,281.404281,280.299167,7390.333333
50%,4381.0,2.0,0.0,1.666667,65.666667,58.0,45.0,1.0,0.0,147.0,...,288.116177,289.416667,284.816667,286.158333,288.15,287.63,290.816667,287.053333,286.083333,11114.666667
75%,6571.5,3.333333,0.1,3.666667,82.0,78.666667,75.0,2.666667,20.0,234.0,...,296.816667,294.909,289.816667,291.034167,292.966667,292.483333,297.15,295.154667,293.8845,14498.166667
max,8762.0,13.0,3.0,52.0,100.0,100.0,100.0,12.666667,97.333333,359.333333,...,314.483333,307.316667,309.816667,310.71,304.816667,317.966667,314.816667,313.133333,310.383333,31904.0


In [9]:
# look at the test data statistics
test_df.describe()

Unnamed: 0.1,Unnamed: 0,Madrid_wind_speed,Bilbao_rain_1h,Valencia_wind_speed,Seville_humidity,Madrid_humidity,Bilbao_clouds_all,Bilbao_wind_speed,Seville_clouds_all,Bilbao_wind_deg,...,Barcelona_temp_max,Madrid_temp_max,Barcelona_temp,Bilbao_temp_min,Bilbao_temp,Barcelona_temp_min,Bilbao_temp_max,Seville_temp_min,Madrid_temp,Madrid_temp_min
count,2920.0,2920.0,2920.0,2920.0,2920.0,2920.0,2920.0,2920.0,2920.0,2920.0,...,2920.0,2920.0,2920.0,2920.0,2920.0,2920.0,2920.0,2920.0,2920.0,2920.0
mean,10222.5,2.45782,0.067517,3.012785,67.123516,62.644463,43.355422,2.283562,15.477283,162.643836,...,290.695462,288.888393,289.911289,284.920684,286.522375,289.124971,288.483641,290.152431,287.869763,286.61813
std,843.075718,1.774838,0.153381,1.99634,20.611292,24.138393,30.486298,1.654787,25.289197,97.749873,...,7.113599,9.089699,7.119411,6.803424,6.492355,7.168049,6.221324,7.906915,8.977511,8.733163
min,8763.0,0.0,0.0,0.0,11.666667,8.0,0.0,0.0,0.0,0.0,...,273.816667,269.816667,272.65,266.483333,268.12,271.483333,270.138667,271.15,268.713333,267.816667
25%,9492.75,1.333333,0.0,1.666667,52.0,43.0,13.333333,1.0,0.0,86.666667,...,284.816667,281.483333,284.3075,280.15,281.778333,283.483333,284.15,284.483333,280.816667,279.816667
50%,10222.5,2.0,0.0,2.333333,70.333333,63.0,45.0,1.666667,0.0,140.0,...,290.15,287.483333,289.483333,284.483333,286.265,288.816667,288.483333,289.15,286.396667,285.483333
75%,10952.25,3.333333,0.0,4.0,85.0,84.0,75.0,3.333333,20.0,233.333333,...,296.483333,295.483333,295.816667,289.816667,291.119167,295.15,292.816667,295.15,294.4525,293.15
max,11682.0,13.333333,1.6,14.333333,100.0,100.0,97.333333,10.666667,93.333333,360.0,...,309.483333,313.483333,308.15,307.483333,308.966667,306.816667,310.816667,314.483333,312.223333,310.15


In [10]:
# Missing entries in the test data and their percentage
total = test_df.isnull().sum().sort_values(ascending=False)
percent = (test_df.isnull().sum()/test_df.isnull().count()).sort_values(ascending=False)
missing_test_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
missing_test_data.head()

In [None]:
# Missing entries in the train data and their percentage
total = train_df.isnull().sum().sort_values(ascending=False)
percent = (train_df.isnull().sum()/train_df.isnull().count()).sort_values(ascending=False)
missing_training_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
missing_training_data.head()

# Merging 2 Datasets


In [None]:
# Combining the two datasets in the same table
df = pd.concat([train_df, test_df])
df.head().T

In [None]:
# Check the last 5 rows (transposed so that each column is one row)
df.tail().T

In [None]:
# Check the rows and columns
df.shape


In [None]:
# Look at the statistics of the joined dataset
df.describe().T


# Replace missing values/ features with the mode

In [None]:
# Display null values


In [None]:
df.isnull().sum()

In [None]:
# Missing entries in the combined data and their percentage
total = df.isnull().sum().sort_values(ascending=False)
percent = (df.isnull().sum()/df.isnull().count()).sort_values(ascending=False)
missing_training_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
missing_training_data.head()

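`Valencia_pressure` is the only feature with missing entries (21.5%). `load_shortfall_3h` also shows missing values in the combined data, but only because the test rows carry no target; since it is our target variable, we will ignore it here.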
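
As a cross-check on the missing-value table, the share of missing rows per column can also be computed with `isnull().mean()`. The sketch below uses a small made-up frame (`demo` is a hypothetical stand-in for the combined data, not the real values).

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for the combined data.
demo = pd.DataFrame({
    "Valencia_pressure": [1012.0, np.nan, 1015.0, np.nan, 1010.0],
    "Madrid_pressure": [1000.0, 1001.0, 1002.0, 1003.0, 1004.0],
})

# isnull().sum() counts the gaps; isnull().mean() gives the fraction of
# missing rows per column, which is scaled to a percentage here.
missing_summary = pd.DataFrame({
    "Total": demo.isnull().sum(),
    "Percent": demo.isnull().mean() * 100,
}).sort_values("Total", ascending=False)

print(missing_summary)
```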

In [None]:
# Check for the Mode, Mean and Median for the Valencia_pressure
print('Mode')
print(df['Valencia_pressure'].mode())
print('Mean')
print(df['Valencia_pressure'].mean())
print('Median')
print(df['Valencia_pressure'].median())

Replace the null values in `Valencia_pressure` with the mode.

In [None]:
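# Work on a copy so that the combined DataFrame `df` keeps its original values
df_clean = df.copy()
# Fill the missing Valencia_pressure entries with the most frequent value
df_clean['Valencia_pressure'] = df_clean['Valencia_pressure'].fillna(df_clean['Valencia_pressure'].mode()[0])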

In [None]:
# Confirm that Valencia_pressure no longer contains missing values
df_clean.isnull().sum()
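
The cell above takes the mode from the combined train and test rows. One possible alternative, sketched here on made-up numbers, is to learn the fill value from the training rows only and then reuse it for the test rows, for example with scikit-learn's `SimpleImputer`. The frames `train_demo` and `test_demo` are hypothetical stand-ins, not the real data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up stand-ins for the train and test frames.
train_demo = pd.DataFrame({"Valencia_pressure": [1012.0, 1012.0, np.nan, 1015.0]})
test_demo = pd.DataFrame({"Valencia_pressure": [np.nan, 1009.0]})

# strategy="most_frequent" fills gaps with the mode, as in the cell above.
imputer = SimpleImputer(strategy="most_frequent")

# fit_transform learns the mode from the training rows only ...
train_filled = imputer.fit_transform(train_demo)
# ... and transform reuses that same value for the test rows.
test_filled = imputer.transform(test_demo)

print(train_filled.ravel())
print(test_filled.ravel())
```

Fitting on the training rows alone keeps information from the test set out of the preprocessing step; whether that matters for a single pressure column is a judgment call.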

In [None]:
# Create plots to visualize the distribution and check for outliers
plt.figure(figsize=(10, 5))

# Histogram plot (right panel)
plt.subplot(1, 2, 2)
sns.histplot(df_clean['Valencia_pressure'], kde=True)
plt.title('Valencia Pressure Histogram Plot')

# Boxplot (left panel); passing the column as y draws a vertical box
plt.subplot(1, 2, 1)
sns.boxplot(y=df_clean['Valencia_pressure'], width=0.2)
plt.title('Valencia Pressure Boxplot')

plt.tight_layout()
plt.show()

Univariate Analysis

In [None]:
# Distribution of the target variable
# histplot draws on the current figure, so the figsize set here is respected
plt.figure(figsize=(15, 6), dpi=80)
sns.histplot(df['load_shortfall_3h'], kde=True)
plt.title('load shortfall 3h')
plt.show()

In [None]:
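# Distribution of the Valencia wind speed (histplot replaces the deprecated distplot)
sns.histplot(df['Valencia_wind_speed'], kde=True, stat='density')
plt.title('Wind_Speed of Valencia City')
plt.show()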

The distribution plot of `load_shortfall_3h` shows that the target variable is fairly symmetrical.
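
A visual impression of symmetry can be backed by a number. The sketch below uses two synthetic series (not the real target) to show how `Series.skew()` separates a roughly symmetric distribution from a right-skewed one. Values near 0 suggest symmetry, while clearly positive values suggest a long right tail.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic examples only: a bell-shaped series and a long-tailed one.
symmetric = pd.Series(rng.normal(loc=10000, scale=5000, size=5000))
right_skewed = pd.Series(rng.exponential(scale=5000, size=5000))

# skew() returns the sample skewness of each series.
print("symmetric skew:", round(symmetric.skew(), 2))
print("right-skewed skew:", round(right_skewed.skew(), 2))
```

Applied to the notebook's data, the same call would be `train_df['load_shortfall_3h'].skew()`.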

In [None]:
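# Distribution of the Bilbao cloud cover (histplot replaces the deprecated distplot)
sns.histplot(df['Bilbao_clouds_all'], kde=True, stat='density')
plt.title('Bilbao City Cloud Cover')
plt.show()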

Multivariate Analysis

Evaluate the correlation

In [None]:
# Correlation heat map showing the relationship among the numerical variables
# numeric_only=True skips text columns such as 'time', which cannot be correlated
sns.heatmap(df.corr(numeric_only=True))

In [None]:
from statsmodels.graphics.correlation import plot_corr

# Compute the correlation matrix once, on the numerical columns only
corr = df.corr(numeric_only=True)

fig = plt.figure(figsize=(15, 15))
ax = fig.add_subplot(111)
plot_corr(corr, xnames=corr.columns, ax=ax);
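
With close to fifty columns, a heat map is hard to read pair by pair. A possible complement, sketched here on synthetic columns whose names merely mimic the dataset, is to list the feature pairs whose absolute correlation exceeds a threshold. The 0.9 cut-off is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
temp = rng.normal(288, 8, size=500)

# Synthetic columns: two near-duplicates and one unrelated feature.
demo = pd.DataFrame({
    "Madrid_temp": temp,
    "Madrid_temp_max": temp + rng.normal(0, 0.5, size=500),
    "Bilbao_wind_speed": rng.normal(2, 1, size=500),
})

corr = demo.corr(numeric_only=True).abs()

# Keep only the upper triangle so that each pair is listed once
# and the diagonal (self-correlation) is excluded.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack().sort_values(ascending=False)

# Pairs above the (assumed) threshold are candidates for dropping or merging.
high_pairs = pairs[pairs > 0.9]
print(high_pairs)
```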

<a id="four"></a>
## 4. Data Engineering
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Data engineering ⚡ |
| :--------------------------- |
| In this section you are required to: clean the dataset, and possibly create new features - as identified in the EDA phase. |

---

In [None]:
# remove missing values/ features


In [None]:
# create new features

In [None]:
# engineer existing features
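
The cells above are still empty. As a rough sketch of what they might contain, the example below splits a `time` column into calendar features and turns a text-coded column into a number. The frame is made up, and the `level_<n>` label format for `Valencia_wind_deg` is an assumption; the notebook only shows that this column has dtype `object`.

```python
import pandas as pd

# Made-up rows; the 'level_<n>' label format is an assumption.
demo = pd.DataFrame({
    "time": ["2015-01-01 03:00:00", "2015-06-15 18:00:00"],
    "Valencia_wind_deg": ["level_5", "level_10"],
})

# Create new features: split the timestamp into numeric parts.
demo["time"] = pd.to_datetime(demo["time"])
demo["year"] = demo["time"].dt.year
demo["month"] = demo["time"].dt.month
demo["day"] = demo["time"].dt.day
demo["hour"] = demo["time"].dt.hour

# Engineer an existing feature: keep only the digits of the text label.
demo["Valencia_wind_deg"] = (
    demo["Valencia_wind_deg"].str.extract(r"(\d+)", expand=False).astype(int)
)

print(demo.dtypes)
```

After steps like these, every remaining column except `time` is numeric, which is what the regression models in the next section expect.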

<a id="five"></a>
## 5. Modelling
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Modelling ⚡ |
| :--------------------------- |
| In this section, you are required to create one or more regression models that are able to accurately predict the three hour load shortfall. |

---

In [22]:
from sklearn.model_selection import train_test_split

# Define the features (X) and target variable (y) from the training data
X = train_df.drop(columns=['load_shortfall_3h'])  # Exclude the target column
y = train_df['load_shortfall_3h']  # Target variable

# Hold out 20% of the labelled rows so the models can be checked on unseen data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# X_train and y_train now hold the training split; X_test and y_test hold the held-out split.

In [23]:
# create targets and features dataset
# Define your target variable (y)
y = train_df['load_shortfall_3h']

# Define your feature variables (X)
X = train_df.drop(columns=['load_shortfall_3h'])


In [24]:
# create one or more ML models
from sklearn.metrics import mean_absolute_error

# Work on a copy of the training data so that train_df stays unchanged
data = train_df.copy()

# Convert the 'time' column to datetime and extract features
data['time'] = pd.to_datetime(data['time'])
data['year'] = data['time'].dt.year
data['month'] = data['time'].dt.month
data['day'] = data['time'].dt.day
data['hour'] = data['time'].dt.hour

# Remove non-numeric columns only after the time features have been extracted
data = data.select_dtypes(include='number')

# Define your target variable (y)
y = data['load_shortfall_3h']

# Define your feature variables (X)
# Extend this list with other relevant numeric columns; columns with missing
# values (such as Valencia_pressure) must be imputed before they are added
feature_columns = ['year', 'month', 'day', 'hour']
X = data[feature_columns]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Linear Regression model
linear_regression = LinearRegression()
linear_regression.fit(X_train, y_train)

# Initialize and train the Random Forest Regressor model
random_forest_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
random_forest_regressor.fit(X_train, y_train)

# Make predictions using the trained models
y_pred_linear = linear_regression.predict(X_test)
y_pred_random_forest = random_forest_regressor.predict(X_test)

# Evaluate model performance
mae_linear = mean_absolute_error(y_test, y_pred_linear)
mse_linear = mean_squared_error(y_test, y_pred_linear)
r2_linear = r2_score(y_test, y_pred_linear)

mae_random_forest = mean_absolute_error(y_test, y_pred_random_forest)
mse_random_forest = mean_squared_error(y_test, y_pred_random_forest)
r2_random_forest = r2_score(y_test, y_pred_random_forest)

# Print or log the evaluation results
print("Linear Regression - MAE:", mae_linear)
print("Linear Regression - MSE:", mse_linear)
print("Linear Regression - R-squared:", r2_linear)

print("Random Forest Regressor - MAE:", mae_random_forest)
print("Random Forest Regressor - MSE:", mse_random_forest)
print("Random Forest Regressor - R-squared:", r2_random_forest)

In [21]:
# evaluate one or more ML models
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

# Keep numeric features only: the models cannot parse raw strings such as the 'time' column
X_train_num = X_train.select_dtypes(include='number')
X_test_num = X_test[X_train_num.columns]

# Fill any remaining gaps with the per-column mode of the training split
fill_values = X_train_num.mode().iloc[0]
X_train_num = X_train_num.fillna(fill_values)
X_test_num = X_test_num.fillna(fill_values)

# Create and train the Linear Regression model
linear_model = LinearRegression()
linear_model.fit(X_train_num, y_train)

# Create and train the Random Forest Regression model
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train_num, y_train)

# Make predictions
linear_predictions = linear_model.predict(X_test_num)
rf_predictions = rf_model.predict(X_test_num)

# Evaluate the models
linear_mae = mean_absolute_error(y_test, linear_predictions)
rf_mae = mean_absolute_error(y_test, rf_predictions)

linear_mse = mean_squared_error(y_test, linear_predictions)
rf_mse = mean_squared_error(y_test, rf_predictions)

linear_r2 = r2_score(y_test, linear_predictions)
rf_r2 = r2_score(y_test, rf_predictions)

# Print the evaluation metrics
print("Linear Regression Metrics:")
print(f"Mean Absolute Error: {linear_mae}")
print(f"Mean Squared Error: {linear_mse}")
print(f"R-squared (R2) Score: {linear_r2}")

print("\nRandom Forest Regression Metrics:")
print(f"Mean Absolute Error: {rf_mae}")
print(f"Mean Squared Error: {rf_mse}")
print(f"R-squared (R2) Score: {rf_r2}")

<a id="six"></a>
## 6. Model Performance
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Model performance ⚡ |
| :--------------------------- |
| In this section you are required to compare the relative performance of the various trained ML models on a holdout dataset and comment on what model is the best and why. |

---

In [None]:
# Compare model performance

In [None]:
# Choose best model and motivate why it is the best choice
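
The comparison itself is not filled in yet. One way it could be laid out, shown here on synthetic regression data from `make_regression` rather than the real features, is a small table of holdout metrics with one row per model. The numbers it prints say nothing about the Spain data; only the layout carries over.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the engineered features and target.
X_demo, y_demo = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=42)
X_tr, X_ho, y_tr, y_ho = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

rows = []
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_ho)
    rows.append({
        "model": name,
        # RMSE is in the same units as the target, which eases interpretation.
        "RMSE": np.sqrt(mean_squared_error(y_ho, pred)),
        "R2": r2_score(y_ho, pred),
    })

# Lowest holdout RMSE first.
comparison = pd.DataFrame(rows).set_index("model").sort_values("RMSE")
print(comparison)
```

Sorting by RMSE puts the candidate with the smallest typical error on top, which gives a concrete starting point for motivating the final choice.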

<a id="seven"></a>
## 7. Model Explanations
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Model explanation ⚡ |
| :--------------------------- |
| In this section, you are required to discuss how the best performing model works in a simple way so that both technical and non-technical stakeholders can grasp the intuition behind the model's inner workings. |

---

In [None]:
# discuss the chosen method's logic
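
If a random forest turns out to be the chosen model, its `feature_importances_` attribute offers one simple way to show stakeholders which inputs drive the predictions. The sketch below uses synthetic features with hypothetical names (`strong_feature`, `weak_feature`, `noise_feature`) so that the expected ranking is known in advance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic features: the target depends strongly on one, weakly on another.
X_demo = pd.DataFrame({
    "strong_feature": rng.normal(size=400),
    "weak_feature": rng.normal(size=400),
    "noise_feature": rng.normal(size=400),
})
y_demo = (
    10 * X_demo["strong_feature"]
    + 1 * X_demo["weak_feature"]
    + rng.normal(scale=0.5, size=400)
)

model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_demo, y_demo)

# Impurity-based importances sum to 1; larger means the trees relied on it more.
importances = pd.Series(
    model.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can overstate features with many distinct values, so they are best read as a rough ranking rather than an exact measure of influence.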