# Predict Bike Sharing Demand with AutoGluon Template

## Project: Predict Bike Sharing Demand with AutoGluon
This notebook is a template with each step that you need to complete for the project.

Please fill in your code where there are explicit `?` markers in the notebook. You are welcome to add more cells and code as you see fit.

Once you have completed all the code implementations, please export your notebook as a HTML file so the reviews can view your code. Make sure you have all outputs correctly outputted.

`File-> Export Notebook As... -> Export Notebook as HTML`

There is a writeup to complete as well after all code implememtation is done. Please answer all questions and attach the necessary tables and charts. You can complete the writeup in either markdown or PDF.

Completing the code template and writeup template will cover all of the rubric points for this project.

The rubric contains "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. The stand out suggestions are optional. If you decide to pursue the "stand out suggestions", you can include the code in this notebook and also discuss the results in the writeup file.

## Step 1: Create an account with Kaggle

### Create Kaggle Account and download API key
Below is example of steps to get the API username and key. Each student will have their own username and key.

1. Open account settings.
![kaggle1.png](attachment:kaggle1.png)
![kaggle2.png](attachment:kaggle2.png)
2. Scroll down to API and click Create New API Token.
![kaggle3.png](attachment:kaggle3.png)
![kaggle4.png](attachment:kaggle4.png)
3. Open up `kaggle.json` and use the username and key.
![kaggle5.png](attachment:kaggle5.png)

## Step 2: Download the Kaggle dataset using the kaggle python library

### Open up Sagemaker Studio and use starter template

1. Notebook should be using a `ml.t3.medium` instance (2 vCPU + 4 GiB)
2. Notebook should be using kernal: `Python 3 (MXNet 1.8 Python 3.7 CPU Optimized)`

### Install packages

In [None]:
!pip install install pip==21.3.1
!pip install install setuptools==60.0.0 wheel==0.37.0
!pip install -U "mxnet<2.0.0" bokeh==2.0.1
# Without --no-cache-dir, smaller aws instances may have trouble installing
!pip install autogluon --no-cache-dir
!pip install autogluon.tabular

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting autogluon
  Downloading autogluon-0.6.1-py3-none-any.whl (9.8 kB)
Collecting autogluon.features==0.6.1
  Downloading autogluon.features-0.6.1-py3-none-any.whl (59 kB)
     |████████████████████████████████| 59 kB 5.2 MB/s             
[?25hCollecting autogluon.multimodal==0.6.1
  Downloading autogluon.multimodal-0.6.1-py3-none-any.whl (289 kB)
     |████████████████████████████████| 289 kB 9.5 MB/s            
[?25hCollecting autogluon.vision==0.6.1
  Downloading autogluon.vision-0.6.1-py3-none-any.whl (49 kB)
     |████████████████████████████████| 49 kB 50.3 MB/s            
[?25hC

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


### Setup Kaggle API Key

In [None]:
# create the .kaggle directory and an empty kaggle.json file
!mkdir -p /root/.kaggle
!touch /root/.kaggle/kaggle.json
!chmod 600 /root/.kaggle/kaggle.json

In [None]:
# Fill in your user name and key from creating the kaggle account and API token file
import json
kaggle_username = "marquithia"
kaggle_key = "49c5fb254ef897109dc30d657b22b12d"

# Save API token the kaggle.json file
with open("/root/.kaggle/kaggle.json", "w") as f:
    f.write(json.dumps({"username":"marquithia","key":"49c5fb254ef897109dc30d657b22b12d"}))

### Download and explore dataset

### Go to the bike sharing demand competition and agree to the terms
![kaggle6.png](attachment:kaggle6.png)

In [None]:
# Download the dataset, it will be in a .zip file so you'll need to unzip it as well.
!kaggle competitions download -c bike-sharing-demand
# If you already downloaded it you can use the -o command to overwrite the file
!unzip -o bike-sharing-demand.zip

bike-sharing-demand.zip: Skipping, found more recently modified local copy (use --force to force download)
Archive:  bike-sharing-demand.zip
  inflating: sampleSubmission.csv    
  inflating: test.csv                
  inflating: train.csv               


In [None]:
#from google.colab import drive
#drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import pandas as pd
import numpy as np

from sklearn import datasets
from sklearn.model_selection import train_test_split
from autogluon.tabular import TabularDataset, TabularPredictor


In [None]:
# Create the train dataset in pandas by reading the csv
# Set the parsing of the datetime column so you can use some of the `dt` features in pandas later
train = pd.read_csv('train.csv', parse_dates=['datetime'])
train['year']=train['datetime'].dt.year
train['month']=train['datetime'].dt.month
train['day']=train['datetime'].dt.day
train['hour']=train['datetime'].dt.hour
train['weekday']=train['datetime'].dt.weekday

train=train.set_index('datetime')
train.head()

Unnamed: 0_level_0,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count,year,month,day,hour,weekday
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
2011-01-01 00:00:00,1,0,0,1,9.84,14.395,81,0.0,3,13,16,2011,1,1,0,5
2011-01-01 01:00:00,1,0,0,1,9.02,13.635,80,0.0,8,32,40,2011,1,1,1,5
2011-01-01 02:00:00,1,0,0,1,9.02,13.635,80,0.0,5,27,32,2011,1,1,2,5
2011-01-01 03:00:00,1,0,0,1,9.84,14.395,75,0.0,3,10,13,2011,1,1,3,5
2011-01-01 04:00:00,1,0,0,1,9.84,14.395,75,0.0,0,1,1,2011,1,1,4,5


In [None]:
# Create the test pandas dataframe in pandas by reading the csv, remember to parse the datetime!
test = pd.read_csv('test.csv', parse_dates=['datetime'])
test['year']=test['datetime'].dt.year
test['month']=test['datetime'].dt.month
test['day']=test['datetime'].dt.day
test['hour']=test['datetime'].dt.hour
test['weekday']=test['datetime'].dt.weekday

test=test.set_index('datetime')
test.head()

Unnamed: 0_level_0,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,year,month,day,hour,weekday
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
2011-01-20 00:00:00,1,0,1,1,10.66,11.365,56,26.0027,2011,1,20,0,3
2011-01-20 01:00:00,1,0,1,1,10.66,13.635,56,0.0,2011,1,20,1,3
2011-01-20 02:00:00,1,0,1,1,10.66,13.635,56,0.0,2011,1,20,2,3
2011-01-20 03:00:00,1,0,1,1,10.66,12.88,56,11.0014,2011,1,20,3,3
2011-01-20 04:00:00,1,0,1,1,10.66,12.88,56,11.0014,2011,1,20,4,3


In [None]:
# Simple output of the train dataset to view some of the min/max/varition of the dataset features.
x=train.drop(['casual', 'registered'],axis=1)
y=train['count']

In [None]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2, random_state=0)
x_train,x_val,y_train,y_val=train_test_split(x,y,test_size=0.25,random_state=0)
x_train.head()

Unnamed: 0_level_0,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,count,year,month,day,hour,weekday
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
2012-12-09 17:00:00,4,0,0,3,14.76,17.425,93,8.9981,229,2012,12,9,17,6
2011-06-19 00:00:00,2,0,0,1,28.7,32.575,65,0.0,89,2011,6,19,0,6
2012-05-10 19:00:00,2,0,1,1,22.14,25.76,37,23.9994,553,2012,5,10,19,3
2011-12-06 08:00:00,4,0,1,2,18.86,22.725,94,12.998,414,2011,12,6,8,1
2011-04-17 08:00:00,2,0,0,1,15.58,19.695,46,26.0027,43,2011,4,17,8,6


In [None]:
train.shape

(10886, 16)

In [None]:
train.info

<bound method DataFrame.info of                      season  holiday  workingday  weather   temp   atemp  \
datetime                                                                   
2011-01-01 00:00:00       1        0           0        1   9.84  14.395   
2011-01-01 01:00:00       1        0           0        1   9.02  13.635   
2011-01-01 02:00:00       1        0           0        1   9.02  13.635   
2011-01-01 03:00:00       1        0           0        1   9.84  14.395   
2011-01-01 04:00:00       1        0           0        1   9.84  14.395   
...                     ...      ...         ...      ...    ...     ...   
2012-12-19 19:00:00       4        0           1        1  15.58  19.695   
2012-12-19 20:00:00       4        0           1        1  14.76  17.425   
2012-12-19 21:00:00       4        0           1        1  13.94  15.910   
2012-12-19 22:00:00       4        0           1        1  13.94  17.425   
2012-12-19 23:00:00       4        0           1        

In [None]:
test.shape

(6493, 13)

In [None]:
test.describe

<bound method DataFrame.info of                      season  holiday  workingday  weather   temp   atemp  \
datetime                                                                   
2011-01-20 00:00:00       1        0           1        1  10.66  11.365   
2011-01-20 01:00:00       1        0           1        1  10.66  13.635   
2011-01-20 02:00:00       1        0           1        1  10.66  13.635   
2011-01-20 03:00:00       1        0           1        1  10.66  12.880   
2011-01-20 04:00:00       1        0           1        1  10.66  12.880   
...                     ...      ...         ...      ...    ...     ...   
2012-12-31 19:00:00       1        0           1        2  10.66  12.880   
2012-12-31 20:00:00       1        0           1        2  10.66  12.880   
2012-12-31 21:00:00       1        0           1        1  10.66  12.880   
2012-12-31 22:00:00       1        0           1        1  10.66  13.635   
2012-12-31 23:00:00       1        0           1        

In [None]:
# Same thing as train and test dataset
submission = pd.read_csv('train.csv', parse_dates=['datetime'])
submission['year']=submission['datetime'].dt.year
submission['month']=submission['datetime'].dt.month
submission['day']=submission['datetime'].dt.day
submission['hour']=submission['datetime'].dt.hour
submission['weekday']=submission['datetime'].dt.weekday
submission.head()

Unnamed: 0,datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count,year,month,day,hour,weekday
0,2011-01-01 00:00:00,1,0,0,1,9.84,14.395,81,0.0,3,13,16,2011,1,1,0,5
1,2011-01-01 01:00:00,1,0,0,1,9.02,13.635,80,0.0,8,32,40,2011,1,1,1,5
2,2011-01-01 02:00:00,1,0,0,1,9.02,13.635,80,0.0,5,27,32,2011,1,1,2,5
3,2011-01-01 03:00:00,1,0,0,1,9.84,14.395,75,0.0,3,10,13,2011,1,1,3,5
4,2011-01-01 04:00:00,1,0,0,1,9.84,14.395,75,0.0,0,1,1,2011,1,1,4,5


## Step 3: Train a model using AutoGluon’s Tabular Prediction

Requirements:
* We are prediting `count`, so it is the label we are setting.
* Ignore `casual` and `registered` columns as they are also not present in the test dataset. 
* Use the `root_mean_squared_error` as the metric to use for evaluation.
* Set a time limit of 10 minutes (600 seconds).
* Use the preset `best_quality` to focus on creating the best model.

In [None]:
from autogluon.tabular import TabularPredictor, TabularDataset
train_data = TabularDataset('train.csv')
predictor = TabularPredictor(label='count').fit(train_data, time_limit=600)
best_quality={'auto_stack':True}

Loaded data from: train.csv | Columns = 12 / 12 | Rows = 10886 -> 10886
No path specified. Models will be saved in: "AutogluonModels/ag-20230111_171546/"
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20230111_171546/"
AutoGluon Version:  0.6.1
Python Version:     3.8.16
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Sat Dec 10 16:00:40 UTC 2022
Train Data Rows:    10886
Train Data Columns: 11
Label Column: count
Preprocessing data ...
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == int and many unique label-values observed).
	Label info (max, min, mean, stddev): (977, 1, 191.57413, 181.14445)
	If 'regression' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Using Feature Generators to preprocess the data ...
Fitting Auto

[1000]	valid_set's rmse: 5.01587
[2000]	valid_set's rmse: 4.47265
[3000]	valid_set's rmse: 4.30083
[4000]	valid_set's rmse: 4.21377
[5000]	valid_set's rmse: 4.15973
[6000]	valid_set's rmse: 4.11879
[7000]	valid_set's rmse: 4.08715
[8000]	valid_set's rmse: 4.06524
[9000]	valid_set's rmse: 4.05637
[10000]	valid_set's rmse: 4.04667


	-4.0462	 = Validation score   (-root_mean_squared_error)
	20.11s	 = Training   runtime
	2.42s	 = Validation runtime
Fitting model: LightGBM ... Training model for up to 574.86s of the 574.86s of remaining time.
	-2.7689	 = Validation score   (-root_mean_squared_error)
	1.6s	 = Training   runtime
	0.03s	 = Validation runtime
Fitting model: RandomForestMSE ... Training model for up to 573.09s of the 573.09s of remaining time.
	-2.6562	 = Validation score   (-root_mean_squared_error)
	11.78s	 = Training   runtime
	0.15s	 = Validation runtime
Fitting model: CatBoost ... Training model for up to 560.35s of the 560.35s of remaining time.
	-2.3869	 = Validation score   (-root_mean_squared_error)
	28.88s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: ExtraTreesMSE ... Training model for up to 531.43s of the 531.43s of remaining time.
	-2.4554	 = Validation score   (-root_mean_squared_error)
	5.76s	 = Training   runtime
	0.19s	 = Validation runtime
Fitting model: NeuralNetFa

### Review AutoGluon's training run with ranking of models that did the best.

In [None]:
predictor.fit_summary()

*** Summary of fit() ***
Estimated performance of each model:
                  model   score_val  pred_time_val   fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0   WeightedEnsemble_L2   -1.356320       2.839426  82.764951                0.000769           0.467354            2       True         12
1              CatBoost   -2.386901       0.007976  28.882586                0.007976          28.882586            1       True          6
2         ExtraTreesMSE   -2.455353       0.188448   5.763455                0.188448           5.763455            1       True          7
3       RandomForestMSE   -2.656229       0.152189  11.777817                0.152189          11.777817            1       True          5
4       NeuralNetFastAI   -2.704858       0.034058  14.170557                0.034058          14.170557            1       True          8
5              LightGBM   -2.768913       0.033062   1.595466                0.033062           1.

{'model_types': {'KNeighborsUnif': 'KNNModel',
  'KNeighborsDist': 'KNNModel',
  'LightGBMXT': 'LGBModel',
  'LightGBM': 'LGBModel',
  'RandomForestMSE': 'RFModel',
  'CatBoost': 'CatBoostModel',
  'ExtraTreesMSE': 'XTModel',
  'NeuralNetFastAI': 'NNFastAiTabularModel',
  'XGBoost': 'XGBoostModel',
  'NeuralNetTorch': 'TabularNeuralNetTorchModel',
  'LightGBMLarge': 'LGBModel',
  'WeightedEnsemble_L2': 'WeightedEnsembleModel'},
 'model_performance': {'KNeighborsUnif': -109.73942190555698,
  'KNeighborsDist': -92.44208479870365,
  'LightGBMXT': -4.0461778114790325,
  'LightGBM': -2.7689127077362037,
  'RandomForestMSE': -2.6562292088161206,
  'CatBoost': -2.386900508971571,
  'ExtraTreesMSE': -2.4553533866581576,
  'NeuralNetFastAI': -2.7048576699480877,
  'XGBoost': -3.760492207376374,
  'NeuralNetTorch': -9.193542733677647,
  'LightGBMLarge': -3.0279609622637578,
  'WeightedEnsemble_L2': -1.356320492791273},
 'model_best': 'WeightedEnsemble_L2',
 'model_paths': {'KNeighborsUnif': 'Aut

In [None]:
predictor.get_model_best()

'WeightedEnsemble_L2'

### Create predictions from test dataset

In [None]:
train_data = TabularDataset('train.csv')
subsample_size = 1000  # subsample subset of data for faster demo
train_data = train_data.sample(n=subsample_size, random_state=0)
print(train_data.head())

label = 'count'
print("Summary of count column: \n", train_data['count'].describe())

new_data = TabularDataset('test.csv')
test_data = new_data[10000:].copy()
val_data = new_data[:10000].copy()

Loaded data from: train.csv | Columns = 12 / 12 | Rows = 10886 -> 10886
Loaded data from: test.csv | Columns = 9 / 9 | Rows = 6493 -> 6493


                 datetime  season  holiday  workingday  weather   temp  \
6638  2012-03-13 21:00:00       1        0           1        1  23.78   
7975  2012-06-12 16:00:00       2        0           1        2  27.06   
5915  2012-02-02 16:00:00       1        0           1        1  18.86   
8050  2012-06-15 19:00:00       2        0           1        1  28.70   
5894  2012-02-01 19:00:00       1        0           1        1  22.14   

       atemp  humidity  windspeed  casual  registered  count  
6638  27.275        56     7.0015      44         200    244  
7975  29.545        89    19.0012      30         209    239  
5915  22.725        55    19.0012      18         211    229  
8050  31.820        42    11.0014      98         369    467  
5894  25.760        52    19.0012      20         315    335  
Summary of count column: 
 count    1000.000000
mean      191.085000
std       178.408019
min         1.000000
25%        40.000000
50%       150.000000
75%       283.250000
max

In [None]:
predictor.predict(test_data, model='WeightedEnsemble_L2')

Series([], Name: count, dtype: float64)

In [None]:
predictor.features()

['datetime',
 'season',
 'holiday',
 'workingday',
 'weather',
 'temp',
 'atemp',
 'humidity',
 'windspeed',
 'casual',
 'registered']

In [None]:
time_limit = 60  # for quick demonstration only (in seconds)
metric = 'root_mean_squared_error'  # specifing for RMSE here as the evaluation metric
predictor = TabularPredictor(label, eval_metric=metric).fit(train_data, time_limit=time_limit, presets='best_quality')
predictor.leaderboard(test_data, silent=True)

No path specified. Models will be saved in: "AutogluonModels/ag-20230111_194721/"
Presets specified: ['best_quality']
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 60s
AutoGluon will save models to "AutogluonModels/ag-20230111_194721/"
AutoGluon Version:  0.6.1
Python Version:     3.8.16
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Sat Dec 10 16:00:40 UTC 2022
Train Data Rows:    1000
Train Data Columns: 11
Label Column: count
Preprocessing data ...
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == int and many unique label-values observed).
	Label info (max, min, mean, stddev): (900, 1, 191.085, 178.40802)
	If 'regression' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Using Featur

ValueError: ignored

#### NOTE: Kaggle will reject the submission if we don't set everything to be > 0.

In [None]:
# Describe the `predictions` series to see if there are any negative values


In [None]:
# How many negative values do we have?


In [None]:
# Set them to zero


### Set predictions to submission dataframe, save, and submit

In [None]:
submission["count"] = pd.read_csv('sampleSubmission.csv', index_col=0)
submission.to_csv("submission.csv", index=False)

In [None]:
!kaggle competitions submit -c bike-sharing-demand -f submission.csv -m "first raw submission"

  0% 0.00/763k [00:00<?, ?B/s]100% 763k/763k [00:00<00:00, 3.34MB/s]
Successfully submitted to Bike Sharing Demand

#### View submission via the command line or in the web browser under the competition's page - `My Submissions`

In [None]:
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6

fileName        date                 description           status  publicScore  privateScore  
--------------  -------------------  --------------------  ------  -----------  ------------  
submission.csv  2023-01-11 19:52:50  first raw submission  error                              
submission.csv  2023-01-11 19:14:44  first raw submission  error                              
submission.csv  2023-01-11 19:13:40  first raw submission  error                              


#### Initial score of `?`

## Step 4: Exploratory Data Analysis and Creating an additional feature
* Any additional feature will do, but a great suggestion would be to separate out the datetime into hour, day, or month parts.

In [None]:
# Create a histogram of all features to show the distribution of each one relative to the data. This is part of the exploritory data analysis
train=train.histogram()

In [None]:
# create a new feature
train[?] = ?
test[?] = ?

## Make category types for these so models know they are not just numbers
* AutoGluon originally sees these as ints, but in reality they are int representations of a category.
* Setting the dtype to category will classify these as categories in AutoGluon.

In [None]:
train["season"] = ?
train["weather"] = ?
test["season"] = ?
test["weather"] = ?

In [None]:
# View are new feature
train.head()

In [None]:
# View histogram of all features again now with the hour feature
train.?

## Step 5: Rerun the model with the same settings as before, just with more features

In [None]:
predictor_new_features = TabularPredictor(?).fit(?)

In [None]:
predictor_new_features.fit_summary()

In [None]:
# Remember to set all negative values to zero


In [None]:
# Same submitting predictions
submission_new_features["count"] = ?
submission_new_features.to_csv("submission_new_features.csv", index=False)

In [None]:
!kaggle competitions submit -c bike-sharing-demand -f submission_new_features.csv -m "new features"

In [None]:
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6

#### New Score of `?`

## Step 6: Hyper parameter optimization
* There are many options for hyper parameter optimization.
* Options are to change the AutoGluon higher level parameters or the individual model hyperparameters.
* The hyperparameters of the models themselves that are in AutoGluon. Those need the `hyperparameter` and `hyperparameter_tune_kwargs` arguments.

In [None]:
predictor_new_hpo = TabularPredictor(?).fit(?)

In [None]:
predictor_new_hpo.fit_summary()

In [None]:
# Remember to set all negative values to zero
?

In [None]:
# Same submitting predictions
submission_new_hpo["count"] = ?
submission_new_hpo.to_csv("submission_new_hpo.csv", index=False)

In [None]:
!kaggle competitions submit -c bike-sharing-demand -f submission_new_hpo.csv -m "new features with hyperparameters"

In [None]:
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6

#### New Score of `?`

## Step 7: Write a Report
### Refer to the markdown file for the full report
### Creating plots and table for report

In [None]:
# Taking the top model score from each training run and creating a line plot to show improvement
# You can create these in the notebook and save them to PNG or use some other tool (e.g. google sheets, excel)
fig = pd.DataFrame(
    {
        "model": ["initial", "add_features", "hpo"],
        "score": [?, ?, ?]
    }
).plot(x="model", y="score", figsize=(8, 6)).get_figure()
fig.savefig('model_train_score.png')

In [None]:
# Take the 3 kaggle scores and creating a line plot to show improvement
fig = pd.DataFrame(
    {
        "test_eval": ["initial", "add_features", "hpo"],
        "score": [?, ?, ?]
    }
).plot(x="test_eval", y="score", figsize=(8, 6)).get_figure()
fig.savefig('model_test_score.png')

### Hyperparameter table

In [None]:
# The 3 hyperparameters we tuned with the kaggle score as the result
pd.DataFrame({
    "model": ["initial", "add_features", "hpo"],
    "hpo1": [?, ?, ?],
    "hpo2": [?, ?, ?],
    "hpo3": [?, ?, ?],
    "score": [?, ?, ?]
})