# Regression on traffic volume
The aim is to use details about a day to predict the volume of traffic in a given hour.
Use what you've learnt so far to come up with your own solution.

Run the cell below to download the data.

In [1]:
!mkdir -p ./data
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00492/Metro_Interstate_Traffic_Volume.csv.gz -O ./data/Metro_Interstate_Traffic_Volume.csv.gz

--2019-07-15 09:44:25--  https://archive.ics.uci.edu/ml/machine-learning-databases/00492/Metro_Interstate_Traffic_Volume.csv.gz
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252, ::128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 405373 (396K) [application/x-httpd-php]
Saving to: ‘./data/Metro_Interstate_Traffic_Volume.csv.gz’


2019-07-15 09:44:27 (543 KB/s) - ‘./data/Metro_Interstate_Traffic_Volume.csv.gz’ saved [405373/405373]



In [6]:
%matplotlib inline
%reload_ext autoreload
%autoreload 2
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from typing import List, Union, Optional, Tuple
from collections import OrderedDict, defaultdict
import os
import re

import sklearn
from sklearn.model_selection import train_test_split 
from sklearn.ensemble import RandomForestClassifier
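try:
    from sklearn.ensemble._forest import ForestRegressor  # scikit-learn >= 0.22
except ImportError:
    from sklearn.ensemble.forest import ForestRegressor  # older releases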
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

from keras.models import Model, Sequential
from keras.layers import Dense, Activation, Dropout, BatchNormalization, Input, Embedding, Reshape, Concatenate
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ModelCheckpoint, History

# Data Importing
The data is in compressed *Comma Separated Value* (CSV) format. To load it up, we'll use Pandas.

In [3]:
df = pd.read_csv('./data/Metro_Interstate_Traffic_Volume.csv.gz')
print(len(df))
df.head()

48204


Unnamed: 0,holiday,temp,rain_1h,snow_1h,clouds_all,weather_main,weather_description,date_time,traffic_volume
0,,288.28,0.0,0.0,40,Clouds,scattered clouds,2012-10-02 09:00:00,5545
1,,289.36,0.0,0.0,75,Clouds,broken clouds,2012-10-02 10:00:00,4516
2,,289.58,0.0,0.0,90,Clouds,overcast clouds,2012-10-02 11:00:00,4767
3,,290.13,0.0,0.0,90,Clouds,overcast clouds,2012-10-02 12:00:00,5026
4,,291.14,0.0,0.0,75,Clouds,broken clouds,2012-10-02 13:00:00,4918


The regression target is the `traffic_volume` column. Note its scale: the values run into the thousands. Would it be worth pre-processing the target, or perhaps rescaling the output of the model automatically?
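One possible way to handle the target's scale, sketched here on synthetic data rather than the real dataset, is to standardize the target before fitting and invert the transform on the predictions. scikit-learn's `TransformedTargetRegressor` wraps both steps. The `LinearRegression` model and the synthetic numbers below are placeholder assumptions, not this notebook's solution.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: one feature, target in the thousands.
rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(200, 1))
y = 3000 + 2000 * X[:, 0] + rng.normal(0, 50, size=200)

# The wrapper standardizes y before fitting the regressor and
# applies the inverse transform to every prediction.
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=StandardScaler(),
)
model.fit(X, y)

preds = model.predict(X[:5])  # already back on the original scale
scaled_y = model.transformer_.transform(y.reshape(-1, 1))  # what the regressor saw
print(preds.round(0), scaled_y.mean().round(3), scaled_y.std().round(3))
```

With a Keras model, the same idea could be applied by hand: fit a `StandardScaler` on the training target, train on the scaled values, and call `inverse_transform` on the predictions.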

Also note that the `date_time` column can't be read directly by the model. We need to extract various features from the date and time to help the model interpret the data.

In [10]:
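def add_datepart(df, fldname, drop=True, time=False):
    "Helper function that adds columns relevant to a date. Courtesy of Fast.AI"
    fld = df[fldname]
    fld_dtype = fld.dtype
    # Treat timezone-aware datetimes like plain datetime64 for the check below.
    if isinstance(fld_dtype, pd.core.dtypes.dtypes.DatetimeTZDtype):
        fld_dtype = np.datetime64

    # Parse the column if it is not already a datetime.
    if not np.issubdtype(fld_dtype, np.datetime64):
        df[fldname] = fld = pd.to_datetime(fld)
    # Strip a trailing 'date'/'Date' from the column name to form the prefix.
    targ_pre = re.sub('[Dd]ate$', '', fldname)
    attr = ['Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear',
            'Is_month_end', 'Is_month_start', 'Is_quarter_end', 'Is_quarter_start', 'Is_year_end', 'Is_year_start']
    if time: attr = attr + ['Hour', 'Minute', 'Second']
    for n in attr:
        # `Series.dt.week` was removed in pandas 2.0; use the ISO calendar week where available.
        # The int64 cast assumes the column has no missing dates.
        if n == 'Week' and hasattr(fld.dt, 'isocalendar'):
            df[targ_pre + n] = fld.dt.isocalendar().week.astype(np.int64)
        else:
            df[targ_pre + n] = getattr(fld.dt, n.lower())
    # Seconds since the Unix epoch (assumes nanosecond-resolution datetimes).
    df[targ_pre + 'Elapsed'] = fld.astype(np.int64) // 10 ** 9
    if drop: df.drop(fldname, axis=1, inplace=True)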

In [7]:
add_datepart(df, 'date_time')

In [9]:
df.head()

Unnamed: 0,holiday,temp,rain_1h,snow_1h,clouds_all,weather_main,weather_description,traffic_volume,date_timeYear,date_timeMonth,...,date_timeDay,date_timeDayofweek,date_timeDayofyear,date_timeIs_month_end,date_timeIs_month_start,date_timeIs_quarter_end,date_timeIs_quarter_start,date_timeIs_year_end,date_timeIs_year_start,date_timeElapsed
0,,288.28,0.0,0.0,40,Clouds,scattered clouds,5545,2012,10,...,2,1,276,False,False,False,False,False,False,1349168400
1,,289.36,0.0,0.0,75,Clouds,broken clouds,4516,2012,10,...,2,1,276,False,False,False,False,False,False,1349172000
2,,289.58,0.0,0.0,90,Clouds,overcast clouds,4767,2012,10,...,2,1,276,False,False,False,False,False,False,1349175600
3,,290.13,0.0,0.0,90,Clouds,overcast clouds,5026,2012,10,...,2,1,276,False,False,False,False,False,False,1349179200
4,,291.14,0.0,0.0,75,Clouds,broken clouds,4918,2012,10,...,2,1,276,False,False,False,False,False,False,1349182800
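
The call `add_datepart(df, 'date_time')` leaves `time` at its default of `False`, so the frame above has no hour-of-day column even though the target is hourly. Passing `time=True` should add `Hour`, `Minute` and `Second` columns, which, going by the naming above, would carry the `date_time` prefix. As a rough sketch, the same extraction can be done directly with pandas' `.dt` accessor. The toy frame below reuses the first three rows shown earlier, and the `hour` and `dayofweek` column names are illustrative choices.

```python
import pandas as pd

# Toy frame reusing the first three timestamps from the dataset preview.
toy = pd.DataFrame({
    "date_time": ["2012-10-02 09:00:00", "2012-10-02 10:00:00", "2012-10-02 11:00:00"],
    "traffic_volume": [5545, 4516, 4767],
})

# Parse the strings into datetimes, then pull out hour and day of week.
toy["date_time"] = pd.to_datetime(toy["date_time"])
toy["hour"] = toy["date_time"].dt.hour
toy["dayofweek"] = toy["date_time"].dt.dayofweek  # Monday=0, so a Tuesday gives 1

print(toy[["hour", "dayofweek", "traffic_volume"]])
```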
