# Regression Predict Student Solution

© Explore Data Science Academy

---
### Honour Code

I {**YOUR NAME, YOUR SURNAME**}, confirm - by submitting this document - that the solutions in this notebook are a result of my own work and that I abide by the [EDSA honour code](https://drive.google.com/file/d/1QDCjGZJ8-FmJE3bZdIQNwnJyQKPhHZBn/view?usp=sharing).

Non-compliance with the honour code constitutes a material breach of contract.

### Predict Overview: Spain Electricity Shortfall Challenge

The government of Spain is considering an expansion of its renewable energy resource infrastructure investments. As such, it requires information on the trends and patterns of the country's renewable and fossil fuel energy generation. Your company has been awarded the contract to:

- 1. analyse the supplied data;
- 2. identify potential errors in the data and clean the existing data set;
- 3. determine if additional features can be added to enrich the data set;
- 4. build a model that is capable of forecasting the three hourly demand shortfalls;
- 5. evaluate the accuracy of the best machine learning model;
- 6. determine what features were most important in the model’s prediction decision, and
- 7. explain the inner working of the model to a non-technical audience.

Formally, the problem statement given to you, the senior data scientist, by your manager via email reads as follows:

> In this project you are tasked to model the shortfall between the energy generated by means of fossil fuels and various renewable sources - for the country of Spain. The daily shortfall, which will be referred to as the target variable, will be modelled as a function of various city-specific weather features such as `pressure`, `wind speed`, `humidity`, etc. As with all data science projects, the provided features are rarely adequate predictors of the target variable. As such, you are required to perform feature engineering to ensure that you will be able to accurately model Spain's three hourly shortfalls.
 
On top of this, she has provided you with a starter notebook containing vague explanations of what the main outcomes are. 

## Problem statement

We are tasked with modelling the shortfall between the energy generated by means of fossil fuels and various renewable sources for the country of Spain.

<a id="cont"></a>

## Table of Contents

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Loading Data</a>

<a href=#three>3. Exploratory Data Analysis (EDA)</a>

<a href=#four>4. Data Engineering</a>

<a href=#five>5. Modelling</a>

<a href=#six>6. Model Performance</a>

<a href=#seven>7. Model Explanations</a>

## Assumptions on variables

**Null hypothesis:** The independent variables have no effect on `load_shortfall_3h`.

**Alternative hypothesis:** At least one of the independent variables has an effect on `load_shortfall_3h`.
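
One common way to test hypotheses of this kind is the overall F-test of a linear regression. The block below is a minimal sketch on synthetic data, not the notebook's own analysis: the data, the coefficients, and the names `design`, `f_stat` and `p_value` are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumption): two predictors, only the first affects y.
rng = np.random.default_rng(0)
n, k = 100, 2
X = rng.normal(size=(n, k))
y = 1.5 * X[:, 0] + rng.normal(size=n)

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
residuals = y - design @ coef

# R^2 from the residual and total sums of squares.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Overall F-statistic: tests whether all slopes are zero at once.
f_stat = (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))
p_value = stats.f.sf(f_stat, k, n - k - 1)
print(f"F = {f_stat:.2f}, p = {p_value:.3g}")
```

A small p-value would lead us to reject the null hypothesis in favour of the alternative, namely that at least one predictor has an effect.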

<a id="one"></a>
## 1. Importing Packages
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Importing Packages ⚡ |
| :--------------------------- |
| In this section you are required to import, and briefly discuss, the libraries that will be used throughout your analysis and modelling. |

---

In [1]:
# Libraries for data loading, data manipulation and data visualisation
import os  # for working with file paths when loading data (CSV files, etc.)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # for data visualisation
import seaborn as sns  # for data visualisation

# Libraries for data preparation and model building

from sklearn.linear_model import LinearRegression

# Setting global constants to ensure notebook results are reproducible
##PARAMETER_CONSTANT = ###
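
A minimal sketch of what such a global constant might look like. The name `RANDOM_STATE` and the value 42 are assumptions, not part of the starter notebook. It shows that two generators seeded with the same constant produce identical draws.

```python
import numpy as np

# Hypothetical constant name; any fixed integer would do.
RANDOM_STATE = 42

# Two generators seeded with the same constant give identical samples.
rng_a = np.random.default_rng(RANDOM_STATE)
rng_b = np.random.default_rng(RANDOM_STATE)
sample_a = rng_a.normal(size=3)
sample_b = rng_b.normal(size=3)
print(np.allclose(sample_a, sample_b))
```

The same constant can then be passed as `random_state` to scikit-learn estimators and splitters.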

<a id="two"></a>
## 2. Loading the Data
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Loading the data ⚡ |
| :--------------------------- |
| In this section you are required to load the data from the `df_train` file into a DataFrame. |

---

In [2]:
# Here the data is loaded 

In [3]:
train = pd.read_csv("df_train.csv")
test = pd.read_csv("df_test.csv")

In [4]:
train = train.copy() # Creation of a copy of the original data before processing it

In [5]:
test = test.copy()

In [6]:
train.head()

Unnamed: 0.1,Unnamed: 0,time,Madrid_wind_speed,Valencia_wind_deg,Bilbao_rain_1h,Valencia_wind_speed,Seville_humidity,Madrid_humidity,Bilbao_clouds_all,Bilbao_wind_speed,...,Madrid_temp_max,Barcelona_temp,Bilbao_temp_min,Bilbao_temp,Barcelona_temp_min,Bilbao_temp_max,Seville_temp_min,Madrid_temp,Madrid_temp_min,load_shortfall_3h
0,0,2015-01-01 03:00:00,0.666667,level_5,0.0,0.666667,74.333333,64.0,0.0,1.0,...,265.938,281.013,269.338615,269.338615,281.013,269.338615,274.254667,265.938,265.938,6715.666667
1,1,2015-01-01 06:00:00,0.333333,level_10,0.0,1.666667,78.333333,64.666667,0.0,1.0,...,266.386667,280.561667,270.376,270.376,280.561667,270.376,274.945,266.386667,266.386667,4171.666667
2,2,2015-01-01 09:00:00,1.0,level_9,0.0,1.0,71.333333,64.333333,0.0,1.0,...,272.708667,281.583667,275.027229,275.027229,281.583667,275.027229,278.792,272.708667,272.708667,4274.666667
3,3,2015-01-01 12:00:00,1.0,level_8,0.0,1.0,65.333333,56.333333,0.0,1.0,...,281.895219,283.434104,281.135063,281.135063,283.434104,281.135063,285.394,281.895219,281.895219,5075.666667
4,4,2015-01-01 15:00:00,1.0,level_7,0.0,1.0,59.0,57.0,2.0,0.333333,...,280.678437,284.213167,282.252063,282.252063,284.213167,282.252063,285.513719,280.678437,280.678437,6620.666667


In [7]:
train.tail() # Calling the last 5 rows of train data

Unnamed: 0.1,Unnamed: 0,time,Madrid_wind_speed,Valencia_wind_deg,Bilbao_rain_1h,Valencia_wind_speed,Seville_humidity,Madrid_humidity,Bilbao_clouds_all,Bilbao_wind_speed,...,Madrid_temp_max,Barcelona_temp,Bilbao_temp_min,Bilbao_temp,Barcelona_temp_min,Bilbao_temp_max,Seville_temp_min,Madrid_temp,Madrid_temp_min,load_shortfall_3h
8758,8758,2017-12-31 09:00:00,1.0,level_6,0.0,2.666667,89.0,95.666667,56.666667,4.333333,...,280.816667,281.276667,285.15,287.573333,280.483333,290.15,284.816667,279.686667,278.483333,-28.333333
8759,8759,2017-12-31 12:00:00,5.0,level_6,0.0,2.0,82.0,85.0,26.666667,8.0,...,283.483333,287.483333,286.483333,288.616667,287.15,291.15,287.15,282.4,280.15,2266.666667
8760,8760,2017-12-31 15:00:00,6.333333,level_9,0.4,7.333333,67.666667,71.0,63.333333,8.333333,...,285.15,289.816667,283.816667,285.33,289.15,286.816667,289.15,283.956667,281.15,822.0
8761,8761,2017-12-31 18:00:00,7.333333,level_8,0.2,7.333333,67.666667,79.0,63.333333,2.666667,...,283.483333,287.523333,278.816667,281.41,286.816667,284.15,289.15,282.666667,280.816667,-760.0
8762,8762,2017-12-31 21:00:00,4.333333,level_9,0.0,7.0,78.666667,68.666667,20.0,1.666667,...,282.15,287.483333,276.816667,281.02,287.15,285.15,287.483333,281.396667,280.483333,2780.666667


In [8]:
test.head()

Unnamed: 0.1,Unnamed: 0,time,Madrid_wind_speed,Valencia_wind_deg,Bilbao_rain_1h,Valencia_wind_speed,Seville_humidity,Madrid_humidity,Bilbao_clouds_all,Bilbao_wind_speed,...,Barcelona_temp_max,Madrid_temp_max,Barcelona_temp,Bilbao_temp_min,Bilbao_temp,Barcelona_temp_min,Bilbao_temp_max,Seville_temp_min,Madrid_temp,Madrid_temp_min
0,8763,1/1/2018 0:00,5.0,level_8,0.0,5.0,87.0,71.333333,20.0,3.0,...,287.816667,280.816667,287.356667,276.15,280.38,286.816667,285.15,283.15,279.866667,279.15
1,8764,1/1/2018 3:00,4.666667,level_8,0.0,5.333333,89.0,78.0,0.0,3.666667,...,284.816667,280.483333,284.19,277.816667,281.01,283.483333,284.15,281.15,279.193333,278.15
2,8765,1/1/2018 6:00,2.333333,level_7,0.0,5.0,89.0,89.666667,0.0,2.333333,...,284.483333,276.483333,283.15,276.816667,279.196667,281.816667,282.15,280.483333,276.34,276.15
3,8766,1/1/2018 9:00,2.666667,level_7,0.0,5.333333,93.333333,82.666667,26.666667,5.666667,...,284.15,277.15,283.19,279.15,281.74,282.15,284.483333,279.15,275.953333,274.483333
4,8767,1/1/2018 12:00,4.0,level_7,0.0,8.666667,65.333333,64.0,26.666667,10.666667,...,287.483333,281.15,286.816667,281.816667,284.116667,286.15,286.816667,284.483333,280.686667,280.15


In [9]:
test.tail()

Unnamed: 0.1,Unnamed: 0,time,Madrid_wind_speed,Valencia_wind_deg,Bilbao_rain_1h,Valencia_wind_speed,Seville_humidity,Madrid_humidity,Bilbao_clouds_all,Bilbao_wind_speed,...,Barcelona_temp_max,Madrid_temp_max,Barcelona_temp,Bilbao_temp_min,Bilbao_temp,Barcelona_temp_min,Bilbao_temp_max,Seville_temp_min,Madrid_temp,Madrid_temp_min
2915,11678,12/31/2018 9:00,0.333333,level_9,0.0,2.0,81.666667,49.666667,80.0,1.333333,...,279.816667,281.483333,278.14,270.816667,273.21,276.483333,276.15,279.816667,274.91,271.15
2916,11679,12/31/2018 12:00,0.333333,level_8,0.0,1.333333,61.0,28.333333,56.666667,1.0,...,286.483333,287.816667,286.15,278.15,278.443333,285.816667,278.816667,287.15,283.156667,280.483333
2917,11680,12/31/2018 15:00,1.0,level_6,0.0,3.0,47.0,26.333333,0.0,0.666667,...,289.483333,288.816667,288.82,284.15,285.073333,288.15,285.816667,290.816667,287.733333,286.483333
2918,11681,12/31/2018 18:00,1.0,level_6,0.0,2.0,52.666667,56.666667,0.0,0.666667,...,285.816667,285.15,284.473333,280.15,281.626667,283.15,282.816667,287.483333,283.813333,282.15
2919,11682,12/31/2018 21:00,1.333333,level_10,0.0,2.333333,61.666667,69.333333,0.0,1.333333,...,283.816667,276.816667,281.133333,276.15,276.45,278.483333,276.816667,283.816667,276.623333,276.483333


In [10]:
train.shape

(8763, 49)

In [11]:
test.shape

(2920, 48)

In [12]:
df=pd.concat([train,test])
df.tail()

Unnamed: 0.1,Unnamed: 0,time,Madrid_wind_speed,Valencia_wind_deg,Bilbao_rain_1h,Valencia_wind_speed,Seville_humidity,Madrid_humidity,Bilbao_clouds_all,Bilbao_wind_speed,...,Madrid_temp_max,Barcelona_temp,Bilbao_temp_min,Bilbao_temp,Barcelona_temp_min,Bilbao_temp_max,Seville_temp_min,Madrid_temp,Madrid_temp_min,load_shortfall_3h
2915,11678,12/31/2018 9:00,0.333333,level_9,0.0,2.0,81.666667,49.666667,80.0,1.333333,...,281.483333,278.14,270.816667,273.21,276.483333,276.15,279.816667,274.91,271.15,
2916,11679,12/31/2018 12:00,0.333333,level_8,0.0,1.333333,61.0,28.333333,56.666667,1.0,...,287.816667,286.15,278.15,278.443333,285.816667,278.816667,287.15,283.156667,280.483333,
2917,11680,12/31/2018 15:00,1.0,level_6,0.0,3.0,47.0,26.333333,0.0,0.666667,...,288.816667,288.82,284.15,285.073333,288.15,285.816667,290.816667,287.733333,286.483333,
2918,11681,12/31/2018 18:00,1.0,level_6,0.0,2.0,52.666667,56.666667,0.0,0.666667,...,285.15,284.473333,280.15,281.626667,283.15,282.816667,287.483333,283.813333,282.15,
2919,11682,12/31/2018 21:00,1.333333,level_10,0.0,2.333333,61.666667,69.333333,0.0,1.333333,...,276.816667,281.133333,276.15,276.45,278.483333,276.816667,283.816667,276.623333,276.483333,
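
Because the two frames are stacked, the test rows carry no target and the original row labels repeat. The block below is a minimal sketch on toy frames (all names and values are hypothetical) of how the training and test parts can be recovered by position after concatenation.

```python
import pandas as pd

# Toy stand-ins for the train and test frames (values are made up).
train_toy = pd.DataFrame({"feature": [1.0, 2.0, 3.0], "target": [10.0, 20.0, 30.0]})
test_toy = pd.DataFrame({"feature": [4.0, 5.0]})

# ignore_index=True gives the combined frame a fresh, unique index.
combined = pd.concat([train_toy, test_toy], ignore_index=True)

# The first len(train_toy) rows are the training rows, the rest are test rows.
n_train = len(train_toy)
train_part = combined.iloc[:n_train]
test_part = combined.iloc[n_train:]

# Only the test rows are missing the target.
print(combined["target"].isna().sum())
```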


## The output below shows that both Valencia_pressure and load_shortfall_3h have null values. The nulls in load_shortfall_3h come from the test rows, which have no target values.

In [13]:
df.isnull().sum()

Unnamed: 0                 0
time                       0
Madrid_wind_speed          0
Valencia_wind_deg          0
Bilbao_rain_1h             0
Valencia_wind_speed        0
Seville_humidity           0
Madrid_humidity            0
Bilbao_clouds_all          0
Bilbao_wind_speed          0
Seville_clouds_all         0
Bilbao_wind_deg            0
Barcelona_wind_speed       0
Barcelona_wind_deg         0
Madrid_clouds_all          0
Seville_wind_speed         0
Barcelona_rain_1h          0
Seville_pressure           0
Seville_rain_1h            0
Bilbao_snow_3h             0
Barcelona_pressure         0
Seville_rain_3h            0
Madrid_rain_1h             0
Barcelona_rain_3h          0
Valencia_snow_3h           0
Madrid_weather_id          0
Barcelona_weather_id       0
Bilbao_pressure            0
Seville_weather_id         0
Valencia_pressure       2522
Seville_temp_max           0
Madrid_pressure            0
Valencia_temp_max          0
Valencia_temp              0
Bilbao_weather

## Below we copy df into df_new, then replace the null values in the Valencia_pressure column with that column's mode.

In [14]:
df_new = df.copy()  # work on a copy so that df itself stays unchanged
df_new['Valencia_pressure'] = df_new['Valencia_pressure'].fillna(df_new['Valencia_pressure'].mode()[0])
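
The mode is only one possible fill value. The block below is a minimal sketch on made-up numbers that compares mode and median imputation, with the statistic computed from the training rows only so that no test information leaks into the fill value. The series, `n_train` and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up pressure readings with two gaps.
pressure = pd.Series([1012.0, 1018.0, 1018.0, np.nan, 1015.0, np.nan, 1021.0])
n_train = 5  # assumption: the first five rows are training rows

# Compute the fill values from the training rows only.
train_values = pressure.iloc[:n_train]
mode_value = train_values.mode()[0]
median_value = train_values.median()

# Apply each fill value to the full series.
filled_mode = pressure.fillna(mode_value)
filled_median = pressure.fillna(median_value)
print(mode_value, median_value)
```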

In [15]:
df_new.isnull().sum()

Unnamed: 0                 0
time                       0
Madrid_wind_speed          0
Valencia_wind_deg          0
Bilbao_rain_1h             0
Valencia_wind_speed        0
Seville_humidity           0
Madrid_humidity            0
Bilbao_clouds_all          0
Bilbao_wind_speed          0
Seville_clouds_all         0
Bilbao_wind_deg            0
Barcelona_wind_speed       0
Barcelona_wind_deg         0
Madrid_clouds_all          0
Seville_wind_speed         0
Barcelona_rain_1h          0
Seville_pressure           0
Seville_rain_1h            0
Bilbao_snow_3h             0
Barcelona_pressure         0
Seville_rain_3h            0
Madrid_rain_1h             0
Barcelona_rain_3h          0
Valencia_snow_3h           0
Madrid_weather_id          0
Barcelona_weather_id       0
Bilbao_pressure            0
Seville_weather_id         0
Valencia_pressure          0
Seville_temp_max           0
Madrid_pressure            0
Valencia_temp_max          0
Valencia_temp              0
Bilbao_weather

In [16]:
df_new.dtypes

Unnamed: 0                int64
time                     object
Madrid_wind_speed       float64
Valencia_wind_deg        object
Bilbao_rain_1h          float64
Valencia_wind_speed     float64
Seville_humidity        float64
Madrid_humidity         float64
Bilbao_clouds_all       float64
Bilbao_wind_speed       float64
Seville_clouds_all      float64
Bilbao_wind_deg         float64
Barcelona_wind_speed    float64
Barcelona_wind_deg      float64
Madrid_clouds_all       float64
Seville_wind_speed      float64
Barcelona_rain_1h       float64
Seville_pressure         object
Seville_rain_1h         float64
Bilbao_snow_3h          float64
Barcelona_pressure      float64
Seville_rain_3h         float64
Madrid_rain_1h          float64
Barcelona_rain_3h       float64
Valencia_snow_3h        float64
Madrid_weather_id       float64
Barcelona_weather_id    float64
Bilbao_pressure         float64
Seville_weather_id      float64
Valencia_pressure       float64
Seville_temp_max        float64
Madrid_p

## From the above we see that Seville_pressure, Valencia_wind_deg and time are of type object (text), so we need to convert them to numeric or datetime types before they can be used for modelling.

In [17]:
# Note: train and test use different timestamp formats; pandas >= 2.0 may need format='mixed' here
df_new['time'] = pd.to_datetime(df_new['time'])
# Split the timestamp into separate numeric features
df_new['hour'] = df_new['time'].dt.hour
df_new['minute'] = df_new['time'].dt.minute
df_new['second'] = df_new['time'].dt.second
df_new['year'] = df_new['time'].dt.year
df_new['month'] = df_new['time'].dt.month
df_new['day'] = df_new['time'].dt.day  # day of the month (1-31)
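
Beyond the plain calendar fields, the day of the week and a cyclical encoding of the hour are sometimes useful for load data. This is a minimal sketch on three sample timestamps, not part of the original notebook. The column names `day_of_week`, `hour_sin` and `hour_cos` are assumptions.

```python
import numpy as np
import pandas as pd

# Three sample timestamps in the training data's format.
times = pd.to_datetime(
    pd.Series(["2015-01-01 03:00:00", "2015-01-01 06:00:00", "2017-12-31 21:00:00"])
)

features = pd.DataFrame({
    "hour": times.dt.hour,
    "day_of_week": times.dt.dayofweek,  # Monday is 0 and Sunday is 6
    "month": times.dt.month,
})

# Cyclical encoding: hour 23 and hour 0 end up close together on the circle.
features["hour_sin"] = np.sin(2 * np.pi * features["hour"] / 24)
features["hour_cos"] = np.cos(2 * np.pi * features["hour"] / 24)
print(features)
```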


In [18]:
# Keep only the digits, e.g. 'level_5' -> 5
df_new['Valencia_wind_deg'] = df_new['Valencia_wind_deg'].str.extract(r'(\d+)', expand=False)
df_new['Valencia_wind_deg'] = pd.to_numeric(df_new['Valencia_wind_deg'])

In [19]:
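# Same approach for Seville_pressure: keep the digits and convert them to a number
df_new['Seville_pressure'] = df_new['Seville_pressure'].str.extract(r'(\d+)', expand=False)
df_new['Seville_pressure'] = pd.to_numeric(df_new['Seville_pressure'])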
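
To make the conversion concrete, here is a small sketch of the same digit-extraction step on a few `level_N` labels like those in Valencia_wind_deg. The variable names are illustrative only.

```python
import pandas as pd

# Labels in the same style as the Valencia_wind_deg column.
labels = pd.Series(["level_5", "level_10", "level_9"])

# expand=False returns a Series (not a one-column DataFrame) for a single group.
digits = labels.str.extract(r"(\d+)", expand=False)

# The extracted digits are still strings until converted.
numeric = pd.to_numeric(digits)
print(numeric.tolist())
```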

In [20]:
df_new.dtypes

Unnamed: 0                       int64
time                    datetime64[ns]
Madrid_wind_speed              float64
Valencia_wind_deg                int64
Bilbao_rain_1h                 float64
Valencia_wind_speed            float64
Seville_humidity               float64
Madrid_humidity                float64
Bilbao_clouds_all              float64
Bilbao_wind_speed              float64
Seville_clouds_all             float64
Bilbao_wind_deg                float64
Barcelona_wind_speed           float64
Barcelona_wind_deg             float64
Madrid_clouds_all              float64
Seville_wind_speed             float64
Barcelona_rain_1h              float64
Seville_pressure                 int64
Seville_rain_1h                float64
Bilbao_snow_3h                 float64
Barcelona_pressure             float64
Seville_rain_3h                float64
Madrid_rain_1h                 float64
Barcelona_rain_3h              float64
Valencia_snow_3h               float64
Madrid_weather_id        

## We now have all features as numerical data types, apart from time, which is a datetime.

<a id="three"></a>
## 3. Exploratory Data Analysis (EDA)
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Exploratory data analysis ⚡ |
| :--------------------------- |
| In this section, you are required to perform an in-depth analysis of all the variables in the DataFrame. |

---


In [21]:
# look at data statistics
train.describe()

Unnamed: 0.1,Unnamed: 0,Madrid_wind_speed,Bilbao_rain_1h,Valencia_wind_speed,Seville_humidity,Madrid_humidity,Bilbao_clouds_all,Bilbao_wind_speed,Seville_clouds_all,Bilbao_wind_deg,...,Madrid_temp_max,Barcelona_temp,Bilbao_temp_min,Bilbao_temp,Barcelona_temp_min,Bilbao_temp_max,Seville_temp_min,Madrid_temp,Madrid_temp_min,load_shortfall_3h
count,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,...,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0,8763.0
mean,4381.0,2.425729,0.135753,2.586272,62.658793,57.414717,43.469132,1.850356,13.714748,158.957511,...,289.540309,289.855459,285.017973,286.422929,288.447422,287.966027,291.633356,288.419439,287.202203,10673.857612
std,2529.804538,1.850371,0.374901,2.41119,22.621226,24.335396,32.551044,1.695888,24.272482,102.056299,...,9.752047,6.528111,6.705672,6.818682,6.102593,7.10559,8.17822,9.346796,9.206237,5218.046404
min,0.0,0.0,0.0,0.0,8.333333,6.333333,0.0,0.0,0.0,0.0,...,264.983333,270.816667,264.483333,267.483333,269.483333,269.063,270.15,264.983333,264.983333,-6618.0
25%,2190.5,1.0,0.0,1.0,44.333333,36.333333,10.0,0.666667,0.0,73.333333,...,282.15,284.973443,280.085167,281.374167,284.15,282.836776,285.816667,281.404281,280.299167,7390.333333
50%,4381.0,2.0,0.0,1.666667,65.666667,58.0,45.0,1.0,0.0,147.0,...,288.116177,289.416667,284.816667,286.158333,288.15,287.63,290.816667,287.053333,286.083333,11114.666667
75%,6571.5,3.333333,0.1,3.666667,82.0,78.666667,75.0,2.666667,20.0,234.0,...,296.816667,294.909,289.816667,291.034167,292.966667,292.483333,297.15,295.154667,293.8845,14498.166667
max,8762.0,13.0,3.0,52.0,100.0,100.0,100.0,12.666667,97.333333,359.333333,...,314.483333,307.316667,309.816667,310.71,304.816667,317.966667,314.816667,313.133333,310.383333,31904.0


In [22]:
# plot relevant feature interactions


In [23]:
plt.figure(figsize=(25,15))

<Figure size 1800x1080 with 0 Axes>


In [24]:
correlation = train.select_dtypes(include='number').corr()  # correlations between the numeric columns only

In [25]:
plt.figure(figsize=(25,15))  # create the figure first, then draw the heatmap on it
sns.heatmap(correlation, xticklabels=correlation.columns, yticklabels=correlation.columns, annot=True)

In [None]:
sns.pairplot(train)

In [None]:
# evaluate correlation
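
One possible way to evaluate correlation is to rank the features by the absolute value of their correlation with the target. The block below is a sketch on synthetic data; the column names and the relationship between them are assumptions, not findings about the Spain dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): the target depends on wind_speed only.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "wind_speed": rng.normal(size=n),
    "humidity": rng.normal(size=n),
})
toy["shortfall"] = 3.0 * toy["wind_speed"] + rng.normal(scale=0.5, size=n)

# Absolute correlation of each feature with the target, strongest first.
corr_with_target = (
    toy.corr()["shortfall"].drop("shortfall").abs().sort_values(ascending=False)
)
print(corr_with_target)
```

On the real data the same pattern would use the numeric training columns and `load_shortfall_3h` as the target.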

In [None]:
# have a look at feature distributions
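
Skewness is one quick numeric summary of a feature's distribution. The `describe()` output above hints at right-skewed columns (for example, Bilbao_rain_1h has a median of 0 but a maximum of 3). The block below is a sketch on synthetic columns, with hypothetical names, showing how skewness can be measured and how a log transform reduces it.

```python
import numpy as np
import pandas as pd

# Synthetic columns (assumption): one symmetric, one right-skewed.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "symmetric": rng.normal(size=1000),
    "right_skewed": rng.exponential(size=1000),
})

# Skewness near 0 means roughly symmetric; large positive means a long right tail.
skewness = toy.skew()

# log1p compresses the right tail of a non-negative column.
log_skew = np.log1p(toy["right_skewed"]).skew()
print(skewness)
print(log_skew)
```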

<a id="four"></a>
## 4. Data Engineering
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Data engineering ⚡ |
| :--------------------------- |
| In this section you are required to: clean the dataset, and possibly create new features - as identified in the EDA phase. |

---

In [None]:
# remove missing values/ features

In [2]:
df_new=df_new.drop(['Unnamed: 0','time'],axis=1)


In [None]:
# create new features

In [None]:
# engineer existing features
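
One common way to engineer existing numeric features is standardisation. This is a minimal sketch, assuming scikit-learn's `StandardScaler` and made-up values: the scaler is fitted on the training rows only and then applied to the test rows, so the test data does not influence the scaling.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature values: a pressure-like column and a wind-speed-like column.
train_features = np.array([[1000.0, 0.5], [1010.0, 1.5], [1020.0, 2.5]])
test_features = np.array([[1015.0, 1.0]])

# Fit on the training rows only, then reuse the same mean and scale for test rows.
scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_features)
test_scaled = scaler.transform(test_features)
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```

Scaling matters mostly for models that are sensitive to feature magnitude; tree-based models are largely unaffected by it.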

<a id="five"></a>
## 5. Modelling
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Modelling ⚡ |
| :--------------------------- |
| In this section, you are required to create one or more regression models that are able to accurately predict the three-hour load shortfall. |

---

In [None]:
# split data

In [3]:
y=df_new[:len(train)][['load_shortfall_3h']]
x=df_new[:len(train)].drop('load_shortfall_3h',axis=1)



In [None]:
x_submission = df_new[len(train):].drop('load_shortfall_3h',axis=1)

In [None]:
# Note: rf is the random forest fitted in a later cell; run that cell before this one
sub_pred = rf.predict(x_submission)

In [None]:
df_RF=pd.DataFrame({'load_shortfall_3h': sub_pred})
df_RF.head()

In [None]:
output = pd.DataFrame({'time': test['time']}).reset_index(drop=True)
submission = output.join(df_RF)

In [None]:
submission.to_csv('Team12_submission.csv', index=False)

In [None]:
y.describe()

In [None]:
# create targets and features dataset

In [None]:
Ir= LinearRegression()

In [None]:
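from sklearn.model_selection import train_test_split
# Hold out 10% of the training rows for evaluation; the fixed seed makes the split reproducible
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.1,random_state=42)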
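
Because the observations are ordered in time, a chronological hold-out is a possible alternative to a random split. The block below is a sketch on toy arrays (hypothetical names) using `shuffle=False`, which keeps the last rows as the evaluation set so that the model is never evaluated on rows older than those it was trained on.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data whose row order stands in for time order.
x_toy = np.arange(20).reshape(-1, 1)
y_toy = np.arange(20)

# shuffle=False keeps the original order: the final rows become the hold-out set.
x_tr, x_te, y_tr, y_te = train_test_split(x_toy, y_toy, test_size=0.1, shuffle=False)
print(y_tr.max(), y_te.min())
```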

In [None]:
Ir.fit(x_train,y_train)
preds=Ir.predict(x_test)

In [None]:
# create one or more ML models

In [None]:
# evaluate one or more ML models

In [None]:

from sklearn.metrics import mean_squared_error
import math
from sklearn.tree import DecisionTreeRegressor

from sklearn.ensemble import RandomForestRegressor

In [None]:
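dt = DecisionTreeRegressor(random_state=42)
dt.fit(x_train, y_train)

# y_train is a single-column DataFrame; ravel() gives the 1-D target array that the forest expects
rf = RandomForestRegressor(random_state=42)
rf.fit(x_train, y_train.values.ravel())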

In [None]:
dt_pred = dt.predict(x_test) 

rf_pred = rf.predict(x_test)

In [None]:
def rmse(y_test,y_predict):
    """Return the root mean squared error between actual and predicted values."""
    return np.sqrt(mean_squared_error(y_test,y_predict))

In [None]:
print(f"root mean squared error for dt is: {rmse(y_test, dt_pred)}")

print(f"root mean squared error for rf is: {rmse(y_test, rf_pred)}")

In [None]:
rmse(y_test,preds)

In [None]:
from sklearn.metrics import r2_score

In [None]:
r2_score(y_test,preds)
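
To make the two metrics concrete, here is a small worked example on made-up numbers. RMSE is the square root of the mean squared error, and R² is one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

# Errors are 1, 0, -1, 0, so the mean squared error is 2 / 4 = 0.5.
rmse_value = np.sqrt(mean_squared_error(y_true, y_pred))

# Total sum of squares around the mean (6.0) is 20, so R^2 = 1 - 2 / 20 = 0.9.
r2_value = r2_score(y_true, y_pred)
print(rmse_value, r2_value)
```

RMSE is expressed in the units of the target, while R² is unitless, so the two are usually reported together.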

In [None]:
# Refit on all training rows and predict on the test (submission) rows
x_train=df_new[:len(train)].drop('load_shortfall_3h',axis=1)
x_test=df_new[len(train):].drop('load_shortfall_3h',axis=1)

In [None]:
Ir.fit(x_train,y)
preds=Ir.predict(x_test)

In [None]:
daf=pd.DataFrame(preds,columns=['load_shortfall_3h'])
daf.head()

In [None]:
output=pd.DataFrame({"time":test['time']})
submission=output.join(daf)
submission.to_csv("submission.csv",index=False)

<a id="six"></a>
## 6. Model Performance
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Model performance ⚡ |
| :--------------------------- |
| In this section you are required to compare the relative performance of the various trained ML models on a holdout dataset and comment on what model is the best and why. |

---

In [None]:
# Compare model performance
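
One way to compare models is to fit each on the same training split and collect the hold-out RMSE in a table. The block below is a sketch on synthetic data. Because the synthetic target is linear by construction, linear regression is favoured here, and the ranking says nothing about the Spain dataset. All variable names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumption): a linear target with a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=42),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

# Fit each model on the same split and record its hold-out RMSE.
rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    rmse_value = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    rows.append({"model": name, "rmse": rmse_value})

# Lowest RMSE first.
results = pd.DataFrame(rows).sort_values("rmse").reset_index(drop=True)
print(results)
```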

In [None]:
# Choose best model and motivate why it is the best choice

<a id="seven"></a>
## 7. Model Explanations
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Model explanation ⚡ |
| :--------------------------- |
| In this section, you are required to discuss how the best performing model works in a simple way so that both technical and non-technical stakeholders can grasp the intuition behind the model's inner workings. |

---

In [None]:
# discuss chosen methods logic
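
If a random forest turns out to be the chosen model, its feature importances give a simple starting point for the explanation. This is a sketch on synthetic data with hypothetical feature names, where the target is built so that one feature dominates.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption): one strong, one weak and one irrelevant feature.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "strong_feature": rng.normal(size=300),
    "weak_feature": rng.normal(size=300),
    "noise_feature": rng.normal(size=300),
})
y = 5.0 * X["strong_feature"] + 0.5 * X["weak_feature"] + rng.normal(scale=0.1, size=300)

model = RandomForestRegressor(n_estimators=50, random_state=42).fit(X, y)

# Impurity-based importances sum to 1; larger values mean the feature drove more splits.
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(
    ascending=False
)
print(importances)
```

Impurity-based importances can overstate high-cardinality or correlated features, so they are best read as a rough ranking rather than a precise measure.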