# Feature Engineering - Core
**Student:** Matthew Malueg

**Complete the following tasks:**
- Import the data and drop the redundant 'casual' and 'registered' columns.
- Transform the 'datetime' column into a more useful datatype, and use it to create 3 new columns with date information.
- Convert the temperature columns from Celsius to Fahrenheit.
- Create a column showing the difference between the 'temp' and 'atemp' columns.

### Loading, Imports, Custom Functions

In [1]:
# Imports
import datetime
import pandas as pd

In [2]:
# Load the dataset
df = pd.read_csv("Data/bikeshare_train - bikeshare_train.csv")
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10886 entries, 0 to 10885
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   datetime    10886 non-null  object 
 1   season      10886 non-null  int64  
 2   holiday     10886 non-null  int64  
 3   workingday  10886 non-null  int64  
 4   weather     10886 non-null  int64  
 5   temp        10886 non-null  float64
 6   atemp       10886 non-null  float64
 7   humidity    10886 non-null  int64  
 8   windspeed   10886 non-null  float64
 9   casual      10886 non-null  int64  
 10  registered  10886 non-null  int64  
 11  count       10886 non-null  int64  
dtypes: float64(3), int64(8), object(1)
memory usage: 1020.7+ KB


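| | datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | casual | registered | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-01 0:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 3 | 13 | 16 |
| 1 | 2011-01-01 1:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 8 | 32 | 40 |
| 2 | 2011-01-01 2:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 5 | 27 | 32 |
| 3 | 2011-01-01 3:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 3 | 10 | 13 |
| 4 | 2011-01-01 4:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 0 | 1 | 1 |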


### Process the data

#### 1. Drop the 'casual' and 'registered' columns, as 'count' already gives us this target info.

In [3]:
# 'columns=' already identifies the axis, so 'axis=1' is not needed
df.drop(columns=['casual', 'registered'], inplace=True)
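
The claim that 'count' already carries the information in 'casual' and 'registered' can be checked before the columns are dropped. The following is a minimal sketch on a small hand-built frame whose values are copied from the `df.head()` output above; the name `sample` is illustrative and not part of the notebook, and the notebook itself does not show this check on the full dataset.

```python
import pandas as pd

# First five rows of the three rider columns, copied from the df.head() output
sample = pd.DataFrame({
    "casual": [3, 8, 5, 3, 0],
    "registered": [13, 32, 27, 10, 1],
    "count": [16, 40, 32, 13, 1],
})

# 'count' should equal 'casual' + 'registered' on every row
is_sum = (sample["casual"] + sample["registered"]).eq(sample["count"]).all()
print(is_sum)

# If the check passes, the two columns add nothing beyond 'count'
reduced = sample.drop(columns=["casual", "registered"])
print(reduced.columns.tolist())
```

Running the same comparison on `df` before the drop would confirm whether the relationship holds for every row rather than only the first five.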

#### 2. Transform the 'datetime' column to a datetime type, and create 3 new columns:
- Name of the month
- Name of the day of the week
- Hour of the day

- Make sure all 3 columns are 'object' type so they can be one-hot-encoded later.
- Drop the now redundant 'datetime' and 'season' columns.

In [4]:
# Change to datetime dtype
df['datetime'] = pd.to_datetime(df['datetime'])

In [5]:
# Create three new cols
df['Month Name'] = df['datetime'].dt.month_name()
df['Day Name'] = df['datetime'].dt.day_name()
# .dt.hour returns integers, so cast to 'object' to match the other two columns
df['Hour'] = df['datetime'].dt.hour.astype(object)

In [6]:
# Drop redundant cols
df.drop(columns=['datetime', 'season'], inplace=True)
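
To illustrate why the three new columns should be 'object' type, here is a small sketch that extracts the same date parts from a toy frame and one-hot encodes them with `pd.get_dummies`. The frame `toy`, its three timestamps, and the choice of `pd.get_dummies` as the encoder are assumptions made for illustration; the notebook does not say which encoder will be used later.

```python
import pandas as pd

# Toy frame with three made-up timestamps (not taken from the dataset)
toy = pd.DataFrame({
    "datetime": ["2011-01-01 00:00:00", "2011-01-01 13:00:00", "2011-07-04 08:00:00"]
})
toy["datetime"] = pd.to_datetime(toy["datetime"])

toy["Month Name"] = toy["datetime"].dt.month_name()
toy["Day Name"] = toy["datetime"].dt.day_name()
# .dt.hour yields integers; casting to object marks the hour as a category
toy["Hour"] = toy["datetime"].dt.hour.astype(object)
toy = toy.drop(columns="datetime")

# One indicator column is created per distinct value in each listed column
encoded = pd.get_dummies(toy, columns=["Month Name", "Day Name", "Hour"])
print(encoded.columns.tolist())
```

If 'Hour' were left as an integer column, an encoder that selects columns by dtype would treat it as a number rather than as 24 separate categories.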

#### 3. Change the temperatures in the 'temp' and 'atemp' columns from Celsius to Fahrenheit using a lambda function and .apply()

In [7]:
# Check temp values before transforming
df.head()

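| | holiday | workingday | weather | temp | atemp | humidity | windspeed | count | Month Name | Day Name | Hour |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 16 | January | Saturday | 0 |
| 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 40 | January | Saturday | 1 |
| 2 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 32 | January | Saturday | 2 |
| 3 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 13 | January | Saturday | 3 |
| 4 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 1 | January | Saturday | 4 |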


In [8]:
# Change Celsius to Fahrenheit: F = C * 9/5 + 32
df['temp'] = df['temp'].apply(lambda x: x*(9/5) + 32)
df['atemp'] = df['atemp'].apply(lambda x: x*(9/5) + 32)

In [9]:
# Verify new temperature values after transforming
df.head()

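| | holiday | workingday | weather | temp | atemp | humidity | windspeed | count | Month Name | Day Name | Hour |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 49.712 | 57.911 | 81 | 0.0 | 16 | January | Saturday | 0 |
| 1 | 0 | 0 | 1 | 48.236 | 56.543 | 80 | 0.0 | 40 | January | Saturday | 1 |
| 2 | 0 | 0 | 1 | 48.236 | 56.543 | 80 | 0.0 | 32 | January | Saturday | 2 |
| 3 | 0 | 0 | 1 | 49.712 | 57.911 | 75 | 0.0 | 13 | January | Saturday | 3 |
| 4 | 0 | 0 | 1 | 49.712 | 57.911 | 75 | 0.0 | 1 | January | Saturday | 4 |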
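
The assignment asks for a lambda with `.apply()`, which is what cell In [8] uses. For comparison, the same formula can be applied to whole columns at once, since pandas arithmetic is vectorized. This is a sketch of that alternative, not the notebook's method; the helper name `celsius_to_fahrenheit` and the frame `temps` (two rows copied from the pre-conversion output) are illustrative.

```python
import pandas as pd

def celsius_to_fahrenheit(celsius):
    """Convert Celsius to Fahrenheit; works on scalars, Series, and DataFrames."""
    return celsius * 9 / 5 + 32

# Two rows of Celsius values copied from the df.head() output before conversion
temps = pd.DataFrame({"temp": [9.84, 9.02], "atemp": [14.395, 13.635]})

# Convert both columns in one expression, with no per-element function call
converted = celsius_to_fahrenheit(temps[["temp", "atemp"]])
print(converted.round(3))
```

The rounded results should agree with the converted values shown in the table above. On a frame of this size the difference in speed is negligible; the vectorized form mainly avoids repeating the lambda for each column.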


#### 4. Create a 'temp_variance' column showing the difference between 'temp' and 'atemp', then drop 'atemp'

In [10]:
# Create column and verify accuracy of value
df['temp_variance'] = df['temp'] - df['atemp']
df.head()

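| | holiday | workingday | weather | temp | atemp | humidity | windspeed | count | Month Name | Day Name | Hour | temp_variance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 49.712 | 57.911 | 81 | 0.0 | 16 | January | Saturday | 0 | -8.199 |
| 1 | 0 | 0 | 1 | 48.236 | 56.543 | 80 | 0.0 | 40 | January | Saturday | 1 | -8.307 |
| 2 | 0 | 0 | 1 | 48.236 | 56.543 | 80 | 0.0 | 32 | January | Saturday | 2 | -8.307 |
| 3 | 0 | 0 | 1 | 49.712 | 57.911 | 75 | 0.0 | 13 | January | Saturday | 3 | -8.199 |
| 4 | 0 | 0 | 1 | 49.712 | 57.911 | 75 | 0.0 | 1 | January | Saturday | 4 | -8.199 |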


In [11]:
# Drop 'atemp'
df.drop(columns='atemp', inplace=True)
df.head()

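| | holiday | workingday | weather | temp | humidity | windspeed | count | Month Name | Day Name | Hour | temp_variance |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 49.712 | 81 | 0.0 | 16 | January | Saturday | 0 | -8.199 |
| 1 | 0 | 0 | 1 | 48.236 | 80 | 0.0 | 40 | January | Saturday | 1 | -8.307 |
| 2 | 0 | 0 | 1 | 48.236 | 80 | 0.0 | 32 | January | Saturday | 2 | -8.307 |
| 3 | 0 | 0 | 1 | 49.712 | 75 | 0.0 | 13 | January | Saturday | 3 | -8.199 |
| 4 | 0 | 0 | 1 | 49.712 | 75 | 0.0 | 1 | January | Saturday | 4 | -8.199 |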
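
The four processing steps could also be collected into one function so they can be re-run on new data without mutating the input. This is a sketch only: the function name `engineer_features` and the two-row frame `raw` (values copied from the first `df.head()` output, with zero-padded hours) are illustrative, and the cast of 'Hour' to object is included on the assumption that step 2's dtype requirement applies to it.

```python
import pandas as pd

def engineer_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the notebook's four processing steps and return a new frame."""
    # Step 1: drop the columns that 'count' already summarizes (returns a copy)
    out = raw.drop(columns=["casual", "registered"])

    # Step 2: derive date parts, then drop the redundant source columns
    stamps = pd.to_datetime(out["datetime"])
    out["Month Name"] = stamps.dt.month_name()
    out["Day Name"] = stamps.dt.day_name()
    out["Hour"] = stamps.dt.hour.astype(object)
    out = out.drop(columns=["datetime", "season"])

    # Step 3: Celsius to Fahrenheit with a lambda and .apply()
    for col in ["temp", "atemp"]:
        out[col] = out[col].apply(lambda c: c * (9 / 5) + 32)

    # Step 4: difference between 'temp' and 'atemp', then drop 'atemp'
    out["temp_variance"] = out["temp"] - out["atemp"]
    return out.drop(columns="atemp")

# Two rows copied from the raw data preview
raw = pd.DataFrame({
    "datetime": ["2011-01-01 00:00:00", "2011-01-01 01:00:00"],
    "season": [1, 1], "holiday": [0, 0], "workingday": [0, 0], "weather": [1, 1],
    "temp": [9.84, 9.02], "atemp": [14.395, 13.635],
    "humidity": [81, 80], "windspeed": [0.0, 0.0],
    "casual": [3, 8], "registered": [13, 32], "count": [16, 40],
})

result = engineer_features(raw)
print(result.round(3))
```

Because every step returns a new object instead of using `inplace=True`, `raw` is left unchanged and the function can be called repeatedly during experimentation.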
