# Assignment:

Your task is to engineer some new features to try to improve a model's ability to predict the total number of bike share rentals during a given hour of the day.

1. Import the data, then drop the 'casual' and 'registered' columns. These are redundant with your target, 'count'.
2. Transform the 'datetime' column into a datetime type and use it to create 3 new columns in the data frame containing the:
    1. Name of the Month
    2. Name of the Day of the Week
    3. Hour of the Day

    Make sure all 3 new columns are 'object' datatype so they can be one-hot encoded later. Then drop the 'datetime' and 'season' columns. These are now redundant.
3. The temperatures in the 'temp' and 'atemp' columns are in Celsius. Use `.apply()` and a Lambda function to convert them to Fahrenheit.
4. Create a new column, 'temp_variance', which shows how much warmer or colder the current temperature ('temp') is than the average temperature for that day of the year ('atemp'). If the current temperature is warmer than average ('atemp'), the value in 'temp_variance' should be positive.
5. Drop the 'atemp' column.

### Optional:

Use a predictive model of your choice and try to predict the 'count' of hourly bike-share users with both the original features and the engineered feature set you created.

Remember to drop the 'casual' and 'registered' columns from both versions before modeling.

Did these feature engineering choices improve your ability to predict the 'count'?
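
One possible way to run this comparison is sketched below. It is only an illustration: the helper name `holdout_r2` is hypothetical, plain least-squares linear regression stands in for "a predictive model of your choice", and the sketch assumes every non-numeric column is a low-cardinality categorical that can be one-hot encoded.

```python
import numpy as np
import pandas as pd


def holdout_r2(frame, target="count", test_frac=0.25, seed=42):
    """Fit ordinary least squares on one-hot encoded features and
    return R^2 on a random held-out split (hypothetical helper)."""
    # One-hot encode object/categorical columns; numeric columns pass through.
    features = pd.get_dummies(frame.drop(columns=[target]), dtype=float)
    X = np.column_stack([np.ones(len(features)), features.to_numpy(dtype=float)])
    y = frame[target].to_numpy(dtype=float)

    # Shuffle the row positions and hold out the first test_frac of them.
    order = np.random.default_rng(seed).permutation(len(frame))
    n_test = int(len(frame) * test_frac)
    test_idx, train_idx = order[:n_test], order[n_test:]

    # Least-squares fit on the training rows only.
    coef, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)

    # R^2 = 1 - SS_residual / SS_total, evaluated on the held-out rows.
    residual = y[test_idx] - X[test_idx] @ coef
    ss_total = ((y[test_idx] - y[test_idx].mean()) ** 2).sum()
    return 1 - (residual ** 2).sum() / ss_total
```

Calling the helper once on the original feature set and once on the engineered one (both with 'casual' and 'registered' already dropped) gives two scores to compare. The raw 'datetime' strings would need to be dropped or converted first, since one-hot encoding them would create one column per timestamp.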



# Imports

In [1]:
import pandas as pd
import numpy as np
import datetime as dt

In [2]:
fpath = "Data/bikeshare_train - bikeshare_train.csv"

df=pd.read_csv(fpath)
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10886 entries, 0 to 10885
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   datetime    10886 non-null  object 
 1   season      10886 non-null  int64  
 2   holiday     10886 non-null  int64  
 3   workingday  10886 non-null  int64  
 4   weather     10886 non-null  int64  
 5   temp        10886 non-null  float64
 6   atemp       10886 non-null  float64
 7   humidity    10886 non-null  int64  
 8   windspeed   10886 non-null  float64
 9   casual      10886 non-null  int64  
 10  registered  10886 non-null  int64  
 11  count       10886 non-null  int64  
dtypes: float64(3), int64(8), object(1)
memory usage: 1020.7+ KB


Unnamed: 0,datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count
0,2011-01-01 0:00:00,1,0,0,1,9.84,14.395,81,0.0,3,13,16
1,2011-01-01 1:00:00,1,0,0,1,9.02,13.635,80,0.0,8,32,40
2,2011-01-01 2:00:00,1,0,0,1,9.02,13.635,80,0.0,5,27,32
3,2011-01-01 3:00:00,1,0,0,1,9.84,14.395,75,0.0,3,10,13
4,2011-01-01 4:00:00,1,0,0,1,9.84,14.395,75,0.0,0,1,1


In [3]:
df=df.drop(columns=['casual', 'registered'])
df.head()

Unnamed: 0,datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,count
0,2011-01-01 0:00:00,1,0,0,1,9.84,14.395,81,0.0,16
1,2011-01-01 1:00:00,1,0,0,1,9.02,13.635,80,0.0,40
2,2011-01-01 2:00:00,1,0,0,1,9.02,13.635,80,0.0,32
3,2011-01-01 3:00:00,1,0,0,1,9.84,14.395,75,0.0,13
4,2011-01-01 4:00:00,1,0,0,1,9.84,14.395,75,0.0,1


In [4]:
df.sample(10)

Unnamed: 0,datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,count
7156,2012-04-16 13:00:00,2,1,0,1,30.34,33.335,45,19.9995,263
4834,2011-11-14 12:00:00,4,0,1,1,22.96,26.515,56,19.9995,202
10493,2012-12-03 15:00:00,4,0,1,1,24.6,31.06,56,0.0,268
796,2011-02-16 15:00:00,1,0,1,1,18.86,22.725,28,27.9993,117
1157,2011-03-12 21:00:00,1,0,0,1,15.58,19.695,62,0.0,82
6309,2012-02-19 3:00:00,1,0,0,2,12.3,14.395,36,15.0013,15
4672,2011-11-07 18:00:00,4,0,1,1,18.86,22.725,59,7.0015,425
7920,2012-06-10 9:00:00,2,0,0,1,28.7,32.575,58,0.0,266
10530,2012-12-05 4:00:00,4,0,1,1,20.5,24.24,63,30.0026,10
5089,2011-12-06 3:00:00,4,0,1,2,18.86,22.725,87,12.998,3


In [5]:
# Parse the 'datetime' strings into datetime64 values so the .dt accessor can be used
df['datetime']=pd.to_datetime(df['datetime'], format='%Y-%m-%d %H:%M:%S')
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10886 entries, 0 to 10885
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   datetime    10886 non-null  datetime64[ns]
 1   season      10886 non-null  int64         
 2   holiday     10886 non-null  int64         
 3   workingday  10886 non-null  int64         
 4   weather     10886 non-null  int64         
 5   temp        10886 non-null  float64       
 6   atemp       10886 non-null  float64       
 7   humidity    10886 non-null  int64         
 8   windspeed   10886 non-null  float64       
 9   count       10886 non-null  int64         
dtypes: datetime64[ns](1), float64(3), int64(6)
memory usage: 850.6 KB


Unnamed: 0,datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,count
0,2011-01-01 00:00:00,1,0,0,1,9.84,14.395,81,0.0,16
1,2011-01-01 01:00:00,1,0,0,1,9.02,13.635,80,0.0,40
2,2011-01-01 02:00:00,1,0,0,1,9.02,13.635,80,0.0,32
3,2011-01-01 03:00:00,1,0,0,1,9.84,14.395,75,0.0,13
4,2011-01-01 04:00:00,1,0,0,1,9.84,14.395,75,0.0,1


In [6]:
df['year']=df['datetime'].dt.year
df.head()

Unnamed: 0,datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,count,year
0,2011-01-01 00:00:00,1,0,0,1,9.84,14.395,81,0.0,16,2011
1,2011-01-01 01:00:00,1,0,0,1,9.02,13.635,80,0.0,40,2011
2,2011-01-01 02:00:00,1,0,0,1,9.02,13.635,80,0.0,32,2011
3,2011-01-01 03:00:00,1,0,0,1,9.84,14.395,75,0.0,13,2011
4,2011-01-01 04:00:00,1,0,0,1,9.84,14.395,75,0.0,1,2011


In [7]:
df['day_name']=df['datetime'].dt.day_name()

In [8]:
df['hour']=df['datetime'].dt.hour
df.head()

Unnamed: 0,datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,count,year,day_name,hour
0,2011-01-01 00:00:00,1,0,0,1,9.84,14.395,81,0.0,16,2011,Saturday,0
1,2011-01-01 01:00:00,1,0,0,1,9.02,13.635,80,0.0,40,2011,Saturday,1
2,2011-01-01 02:00:00,1,0,0,1,9.02,13.635,80,0.0,32,2011,Saturday,2
3,2011-01-01 03:00:00,1,0,0,1,9.84,14.395,75,0.0,13,2011,Saturday,3
4,2011-01-01 04:00:00,1,0,0,1,9.84,14.395,75,0.0,1,2011,Saturday,4
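
The cells above extract the year, the day name, and the hour, and leave 'hour' as an integer. The assignment asks for the month name and for all three new columns to be 'object' dtype. A minimal sketch of that variant follows, using a few timestamps copied from the outputs above. The frame name `sample` and the column name `month_name` are assumptions made for illustration.

```python
import pandas as pd

# Three timestamps taken from the rows displayed earlier in this notebook.
sample = pd.DataFrame(
    {"datetime": ["2011-01-01 0:00:00", "2011-01-01 1:00:00", "2012-06-10 9:00:00"]}
)
sample["datetime"] = pd.to_datetime(sample["datetime"], format="%Y-%m-%d %H:%M:%S")

# Name of the month and name of the day of the week.
sample["month_name"] = sample["datetime"].dt.month_name().astype(object)
sample["day_name"] = sample["datetime"].dt.day_name().astype(object)

# .dt.hour is an integer; cast it to strings stored as 'object' so it is
# treated as a category when one-hot encoded later.
sample["hour"] = sample["datetime"].dt.hour.astype(str).astype(object)

print(sample.dtypes)
```

Casting 'hour' to strings is a design choice: it makes each hour its own category rather than a number on a linear scale.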


In [9]:
df=df.drop(columns=['datetime','season'])
df.head()

Unnamed: 0,holiday,workingday,weather,temp,atemp,humidity,windspeed,count,year,day_name,hour
0,0,0,1,9.84,14.395,81,0.0,16,2011,Saturday,0
1,0,0,1,9.02,13.635,80,0.0,40,2011,Saturday,1
2,0,0,1,9.02,13.635,80,0.0,32,2011,Saturday,2
3,0,0,1,9.84,14.395,75,0.0,13,2011,Saturday,3
4,0,0,1,9.84,14.395,75,0.0,1,2011,Saturday,4


In [12]:
# Convert Celsius to Fahrenheit: F = C * 9/5 + 32
df['temp']=df['temp'].apply(lambda x: x*9/5 + 32)

In [13]:
# Same conversion for 'atemp'
df['atemp']=df['atemp'].apply(lambda x: x*9/5 + 32)
df.head()

Unnamed: 0,holiday,workingday,weather,temp,atemp,humidity,windspeed,count,year,day_name,hour
0,0,0,1,49.712,57.911,81,0.0,16,2011,Saturday,0
1,0,0,1,48.236,56.543,80,0.0,40,2011,Saturday,1
2,0,0,1,48.236,56.543,80,0.0,32,2011,Saturday,2
3,0,0,1,49.712,57.911,75,0.0,13,2011,Saturday,3
4,0,0,1,49.712,57.911,75,0.0,1,2011,Saturday,4
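
As a quick sanity check on the conversion, the sketch below applies the same lambda to the first two Celsius rows shown earlier and compares the result with a vectorized version of the formula. The names `sample`, `celsius`, and `to_fahrenheit` are illustrative only.

```python
import pandas as pd

# First two rows of 'temp' and 'atemp' in Celsius, copied from the outputs above.
celsius = pd.DataFrame({"temp": [9.84, 9.02], "atemp": [14.395, 13.635]})
sample = celsius.copy()

# Same formula as the notebook cells: F = C * 9/5 + 32.
to_fahrenheit = lambda c: c * 9 / 5 + 32
for col in ["temp", "atemp"]:
    sample[col] = sample[col].apply(to_fahrenheit)

# Vectorized equivalent, computed on whole columns at once.
vectorized = celsius * 9 / 5 + 32

print(sample.round(3))
```

The assignment asks for `.apply()` with a lambda. The vectorized form is shown only to confirm that both routes give the same numbers.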


In [15]:
# Positive when 'temp' is warmer than 'atemp', negative when it is colder
df['temp_variance']=df['temp'] - df['atemp']

In [16]:
df=df.drop(columns=['atemp'])
df.head()

Unnamed: 0,holiday,workingday,weather,temp,humidity,windspeed,count,year,day_name,hour,temp_variance
0,0,0,1,49.712,81,0.0,16,2011,Saturday,0,-8.199
1,0,0,1,48.236,80,0.0,40,2011,Saturday,1,-8.307
2,0,0,1,48.236,80,0.0,32,2011,Saturday,2,-8.307
3,0,0,1,49.712,75,0.0,13,2011,Saturday,3,-8.199
4,0,0,1,49.712,75,0.0,1,2011,Saturday,4,-8.199
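
The sign and size of these values can be checked by hand. The +32 offset cancels in the subtraction, so the Fahrenheit difference is 9/5 times the Celsius difference. A small sketch using the first two rows from the original data (variable names are illustrative):

```python
import pandas as pd

# First two rows in Celsius, copied from the original data shown earlier.
celsius = pd.DataFrame({"temp": [9.84, 9.02], "atemp": [14.395, 13.635]})
fahrenheit = celsius * 9 / 5 + 32

# 'temp' minus 'atemp': positive means warmer than 'atemp', negative means colder.
variance_f = fahrenheit["temp"] - fahrenheit["atemp"]
variance_c = celsius["temp"] - celsius["atemp"]

# The +32 cancels, so variance_f equals 9/5 * variance_c.
print(variance_f.round(3).tolist())
print((variance_c * 9 / 5).round(3).tolist())
```

Both rows come out negative here, meaning 'temp' was below 'atemp' for those hours.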


In [17]:
# Move 'temp_variance' so it sits right after 'temp' (column position 4)
column="temp_variance"
move = df.pop(column)
df.insert(4,column,move)
df.head()

Unnamed: 0,holiday,workingday,weather,temp,temp_variance,humidity,windspeed,count,year,day_name,hour
0,0,0,1,49.712,-8.199,81,0.0,16,2011,Saturday,0
1,0,0,1,48.236,-8.307,80,0.0,40,2011,Saturday,1
2,0,0,1,48.236,-8.307,80,0.0,32,2011,Saturday,2
3,0,0,1,49.712,-8.199,75,0.0,13,2011,Saturday,3
4,0,0,1,49.712,-8.199,75,0.0,1,2011,Saturday,4


In [19]:
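# Move 'year' to the front of the data frame
column='year'
move=df.pop(column)
df.insert(0,column,move)

In [20]:
# Move 'day_name' to the second position
column = 'day_name'
move=df.pop(column)
df.insert(1,column,move)

In [22]:
# Move 'hour' to the third position
column = 'hour'
move = df.pop(column)
df.insert(2,column,move)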
df.head()

Unnamed: 0,year,day_name,hour,holiday,workingday,weather,temp,temp_variance,humidity,windspeed,count
0,2011,Saturday,0,0,0,1,49.712,-8.199,81,0.0,16
1,2011,Saturday,1,0,0,1,48.236,-8.307,80,0.0,40
2,2011,Saturday,2,0,0,1,48.236,-8.307,80,0.0,32
3,2011,Saturday,3,0,0,1,49.712,-8.199,75,0.0,13
4,2011,Saturday,4,0,0,1,49.712,-8.199,75,0.0,1
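
The repeated `pop`/`insert` steps above could also be written as a single reindexing with an explicit column list. The sketch below rebuilds one row with the column order the frame had after 'atemp' was dropped. The names `demo` and `new_order` are illustrative.

```python
import pandas as pd

# One row with the column order the frame had right after 'atemp' was dropped.
demo = pd.DataFrame({
    "holiday": [0], "workingday": [0], "weather": [1], "temp": [49.712],
    "humidity": [81], "windspeed": [0.0], "count": [16], "year": [2011],
    "day_name": ["Saturday"], "hour": [0], "temp_variance": [-8.199],
})

# Target order: date parts first, 'temp_variance' right after 'temp'.
new_order = [
    "year", "day_name", "hour", "holiday", "workingday", "weather",
    "temp", "temp_variance", "humidity", "windspeed", "count",
]

# Selecting with a list of column names returns them in that order.
demo = demo[new_order]
print(demo.columns.tolist())
```

Listing the columns explicitly makes the final layout visible in one place, at the cost of having to update the list if columns are added or removed.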
