### You are asked to build regularized linear models to predict game attendance from the month, day_of_week, temp, skies, and bobblehead input attributes.

You need to perform the following data preprocessing and modeling operations:
* Encode the categorical variables using one-hot encoding.
* Make sure that, after encoding, the training and testing datasets have the same data columns (input attributes).
* Standardize the new features by removing the mean and scaling to unit variance.
* Using the training dataset, train 100 L2-regularized linear models corresponding to 100 regularization coefficients evenly spaced between 0.1 and 1000. Use leave-one-out cross-validation.
* Perform the same operation using L1 regularization.
* Also train a linear model without regularization.

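The modeling steps in the list above could be sketched roughly as follows. This is a minimal sketch, not the notebook's actual solution: it assumes scikit-learn is available and that the assignment's "regularization coefficient" corresponds to scikit-learn's `alpha` parameter. The arrays `X_train`, `y_train`, and `X_test` are synthetic, hypothetical stand-ins for the encoded Dodgers features and the `attend` column. Note that scikit-learn scales the L1 and L2 objectives differently, so the same `alpha` value is not directly comparable between the two model families.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.model_selection import LeaveOneOut
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: replace with the encoded Dodgers features).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 5))
y_train = X_train @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=30)
X_test = rng.normal(size=(10, 5))

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# 100 regularization coefficients evenly spaced between 0.1 and 1000.
alphas = np.linspace(0.1, 1000, 100)

# L2: RidgeCV with the default cv=None uses an efficient leave-one-out scheme.
ridge = RidgeCV(alphas=alphas).fit(X_train_std, y_train)

# L1: LassoCV needs an explicit leave-one-out splitter.
lasso = LassoCV(alphas=alphas, cv=LeaveOneOut(), max_iter=10_000).fit(X_train_std, y_train)

# Baseline without regularization.
ols = LinearRegression().fit(X_train_std, y_train)

print("Selected ridge alpha:", ridge.alpha_)
print("Selected lasso alpha:", lasso.alpha_)
print("OLS coefficients:", ols.coef_)
```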
In [52]:
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', 15)
pd.set_option('display.max_columns', 50)
pd.set_option('display.width', 1000)

In [53]:
# Load the training and testing CSVs into DataFrames using pandas
dodgersTraining_df = pd.read_csv("/Users/maxrogers/Documents/laura/dodgers_training.csv")
dodgersTesting_df = pd.read_csv("/Users/maxrogers/Documents/laura/dodgers_testing.csv")

# Display the training DataFrame
dodgersTraining_df

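| | month | day | attend | day_of_week | opponent | temp | skies | day_night | cap | shirt | fireworks | bobblehead |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | APR | 10 | 56000 | Tuesday | Pirates | 67 | Clear | Day | NO | NO | NO | NO |
| 1 | APR | 11 | 29729 | Wednesday | Pirates | 58 | Cloudy | Night | NO | NO | NO | NO |
| 2 | APR | 12 | 28328 | Thursday | Pirates | 57 | Cloudy | Night | NO | NO | NO | NO |
| 3 | APR | 13 | 31601 | Friday | Padres | 54 | Cloudy | Night | NO | NO | YES | NO |
| 4 | APR | 15 | 38359 | Sunday | Padres | 65 | Clear | Day | NO | NO | NO | NO |
| 5 | APR | 23 | 26376 | Monday | Braves | 60 | Cloudy | Night | NO | NO | NO | NO |
| 6 | APR | 25 | 26345 | Wednesday | Braves | 64 | Cloudy | Night | NO | NO | NO | NO |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 49 | SEP | 14 | 40167 | Friday | Cardinals | 85 | Clear | Night | NO | NO | YES | NO |
| 50 | SEP | 15 | 42449 | Saturday | Cardinals | 95 | Clear | Night | NO | NO | NO | NO |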


In [54]:
# Use .dtypes to find the object-typed columns; these are the likely candidates for one-hot encoding
dodgersTraining_df.dtypes

month          object
day             int64
attend          int64
day_of_week    object
opponent       object
temp            int64
skies          object
day_night      object
cap            object
shirt          object
fireworks      object
bobblehead     object
dtype: object

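Rather than reading the categorical columns off the `.dtypes` listing by eye, they can also be picked out programmatically. The snippet below is a small sketch on a hypothetical three-row frame (`sample_df`, with values copied from the rows shown above), not on the real CSV. It selects by "not numeric" rather than by `object`, on the assumption that this is a little more robust across pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for dodgersTraining_df, using values from the first three rows above.
sample_df = pd.DataFrame({
    "month": ["APR", "APR", "APR"],
    "temp": [67, 58, 57],
    "skies": ["Clear", "Cloudy", "Cloudy"],
    "attend": [56000, 29729, 28328],
})

# Non-numeric columns are the candidates for one-hot encoding.
categorical_cols = sample_df.select_dtypes(exclude="number").columns.tolist()

# Numeric columns can be used as they are (and standardized later).
numeric_cols = sample_df.select_dtypes(include="number").columns.tolist()

print("categorical:", categorical_cols)
print("numeric:", numeric_cols)
```

On the real data this list would also contain columns such as `opponent`, `cap`, and `shirt`, which the assignment does not ask for, so the result still needs to be filtered against the required attributes.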
## One-Hot Encoded dodgers_training.csv:

In [60]:
# day and temp are already numeric (int64 in the .dtypes output), so they do not need one-hot encoding.
# pd.get_dummies() one-hot encodes the columns of the DataFrame listed in `columns`:
oneHotDodgersTraining_df = pd.get_dummies(dodgersTraining_df, columns=['month','day_of_week','skies','day_night','bobblehead'])
# Display the new DataFrame. Note the new one-hot encoded columns to the right of 'fireworks'
oneHotDodgersTraining_df

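| | day | attend | opponent | temp | cap | shirt | fireworks | month_APR | month_AUG | month_JUL | month_JUN | month_MAY | month_OCT | month_SEP | day_of_week_Friday | day_of_week_Monday | day_of_week_Saturday | day_of_week_Sunday | day_of_week_Thursday | day_of_week_Tuesday | day_of_week_Wednesday | skies_Clear | skies_Cloudy | day_night_Day | day_night_Night | bobblehead_NO | bobblehead_YES |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | 56000 | Pirates | 67 | NO | NO | NO | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| 1 | 11 | 29729 | Pirates | 58 | NO | NO | NO | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 2 | 12 | 28328 | Pirates | 57 | NO | NO | NO | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| 3 | 13 | 31601 | Padres | 54 | NO | NO | YES | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| 4 | 15 | 38359 | Padres | 65 | NO | NO | NO | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| 5 | 23 | 26376 | Braves | 60 | NO | NO | NO | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| 6 | 25 | 26345 | Braves | 64 | NO | NO | NO | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 49 | 14 | 40167 | Cardinals | 85 | NO | NO | YES | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 |
| 50 | 15 | 42449 | Cardinals | 95 | NO | NO | NO | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 |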

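The encoded frame above still carries columns the assignment does not list as inputs (`day`, `opponent`, `cap`, `shirt`, `fireworks`, and the `day_night` dummies). If the models should use only month, day_of_week, temp, skies, and bobblehead, one option is to select those columns before encoding. The sketch below illustrates this on a hypothetical three-row frame (`sample_df`); the names `feature_cols`, `categorical_cols`, `X`, and `y` are introduced here for illustration and do not come from the notebook.

```python
import pandas as pd

# Hypothetical stand-in rows with values taken from the training table above.
sample_df = pd.DataFrame({
    "month": ["APR", "APR", "SEP"],
    "day_of_week": ["Tuesday", "Wednesday", "Friday"],
    "temp": [67, 58, 85],
    "skies": ["Clear", "Cloudy", "Clear"],
    "bobblehead": ["NO", "NO", "NO"],
    "opponent": ["Pirates", "Pirates", "Cardinals"],
    "attend": [56000, 29729, 40167],
})

# The input attributes named in the assignment, and the categorical subset of them.
feature_cols = ["month", "day_of_week", "temp", "skies", "bobblehead"]
categorical_cols = ["month", "day_of_week", "skies", "bobblehead"]

# Keep only the required inputs, then one-hot encode the categorical ones.
# dtype=int makes the indicator columns 0/1 integers.
X = pd.get_dummies(sample_df[feature_cols], columns=categorical_cols, dtype=int)

# The target is kept separately so it is never standardized or used as a feature.
y = sample_df["attend"]

print(X.columns.tolist())
```

Keeping `attend` out of the feature matrix also avoids accidentally leaking the target into the models.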

## One-Hot Encoded dodgers_testing.csv:

In [61]:
# day and temp are already numeric (int64 in the .dtypes output), so they do not need one-hot encoding.
# pd.get_dummies() one-hot encodes the columns of the DataFrame listed in `columns`:
oneHotDodgersTesting_df = pd.get_dummies(dodgersTesting_df, columns=['month','day_of_week','skies','day_night','bobblehead'])
# Display the new DataFrame. Note the new one-hot encoded columns to the right of 'fireworks'
oneHotDodgersTesting_df

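| | day | attend | opponent | temp | cap | shirt | fireworks | month_APR | month_AUG | month_JUL | month_JUN | month_MAY | month_OCT | month_SEP | day_of_week_Friday | day_of_week_Monday | day_of_week_Saturday | day_of_week_Sunday | day_of_week_Thursday | day_of_week_Tuesday | day_of_week_Wednesday | skies_Clear | skies_Cloudy | day_night_Day | day_night_Night | bobblehead_NO | bobblehead_YES |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | 56000 | Pirates | 67 | NO | NO | NO | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| 1 | 11 | 29729 | Pirates | 58 | NO | NO | NO | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 2 | 12 | 28328 | Pirates | 57 | NO | NO | NO | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| 3 | 13 | 31601 | Padres | 54 | NO | NO | YES | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| 4 | 14 | 46549 | Padres | 57 | NO | NO | NO | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| 5 | 15 | 38359 | Padres | 65 | NO | NO | NO | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| 6 | 23 | 26376 | Braves | 60 | NO | NO | NO | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 74 | 16 | 35754 | Cardinals | 86 | NO | NO | NO | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| 75 | 28 | 37133 | Rockies | 77 | NO | NO | YES | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 |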

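In the outputs above the training and testing frames happen to end up with identical columns, but encoding each file separately does not guarantee that: a category that appears in only one file produces a dummy column in only that frame. One way to enforce matching columns is to reindex the testing frame against the training columns. The sketch below uses two tiny hypothetical frames (`train_raw`, `test_raw`) to show the effect; it is an illustration under those assumptions, not the notebook's own code.

```python
import pandas as pd

# Hypothetical frames: OCT and MAY appear only in training, JUN only in testing.
train_raw = pd.DataFrame({"month": ["APR", "MAY", "OCT"], "temp": [67, 70, 75]})
test_raw = pd.DataFrame({"month": ["APR", "JUN"], "temp": [58, 80]})

# Encode each frame separately, as the notebook does.
train_X = pd.get_dummies(train_raw, columns=["month"], dtype=int)
test_X = pd.get_dummies(test_raw, columns=["month"], dtype=int)

# Force the testing frame to use exactly the training columns, in the same order.
# Columns missing from the test set are filled with 0; columns the training set
# never saw (month_JUN here) are dropped.
test_X = test_X.reindex(columns=train_X.columns, fill_value=0)

print(train_X.columns.tolist())
print(test_X)
```

Dropping unseen test categories is a design choice: a model trained without a `month_JUN` column has no coefficient for it, so the column could not be used at prediction time anyway.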