### Model Describer Meetup Tutorial

In this notebook, we will do a brief EDA (exploratory data analysis) of the [bicycle trip dataset](https://www.kaggle.com/pronto/cycle-share-dataset/home). The data come from the Pronto Cycle Share system, which consists of 500 bikes and 54 stations located in Seattle.

The key question for this tutorial is whether there are noticeable differences in trip duration by gender and by rider age. We will not control for location information, which is a known limitation of this tutorial.

In this tutorial, we will be covering the following:

* [Data prep](#prep)
* [Exploratory data analysis](#eda)
* [Model Describer Regression Evaluation](#mdesc_regression)
* [Model Describer Classification Evaluation](#mdesc_classification)
* [Model Describer Regression Sensitivity](#mdesc_sensitivity_regression)
* [Model Describer Classification Sensitivity](#mdesc_sensitivity_classification)
* [Additional thoughts](#thoughts)

In [315]:
import os
from datetime import datetime

from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
import keras
import pandas as pd

In [316]:
# initialize plotly notebook_mode
init_notebook_mode(connected=True)

#### Data Prep <a id='prep' />

Read in data and perform basic data manipulations

In [305]:
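# Read in the trip and weather data. The station data is not used here, although it offers a
# number of ways to take advantage of the location information.

# adjust base_path to wherever the Kaggle CSV files were saved
base_path = r'C:\Users\jlewris\Desktop\BikeData'
# on_bad_lines='warn' skips malformed rows and reports each one (pandas >= 1.3;
# older versions used error_bad_lines=False, which was removed in pandas 2.0)
trip = pd.read_csv(os.path.join(base_path, 'trip.csv'), on_bad_lines='warn')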
weather = pd.read_csv(os.path.join(base_path, 'weather.csv'))
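Older code passes `error_bad_lines=False` to `pd.read_csv`; that flag was deprecated in pandas 1.3 and removed in 2.0 in favour of `on_bad_lines`. A minimal sketch on a made-up in-memory CSV (not the Pronto data) shows how a row with too many fields, like line 50794 of `trip.csv`, is dropped:

```python
import io

import pandas as pd

# Hypothetical 2-column CSV; the second data row has an extra field
raw = "trip_id,gender\n1,Male\n2,Female,extra\n3,Other\n"

# on_bad_lines='skip' drops malformed rows silently; 'warn' drops them and reports each one
df = pd.read_csv(io.StringIO(raw), on_bad_lines='skip')
print(df)
```

Only the rows with `trip_id` 1 and 3 survive, so it is worth comparing row counts before and after loading to see how much data the skip discards.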

b'Skipping line 50794: expected 12 fields, saw 20\n'


In [306]:
def convert_date(dte):
    """
    Convert a date string into a datetime object.

    Parameters
    ----------
    dte : str
        date string in '%m/%d/%Y %H:%M' or '%m/%d/%Y' format

    Returns
    -------
    datetime
        the input string parsed into a datetime object
    """
    try:
        dte = datetime.strptime(dte, '%m/%d/%Y %H:%M')
    except ValueError:
        # fall back to date-only strings, such as the weather 'Date' column
        dte = datetime.strptime(dte, '%m/%d/%Y')
    return dte

def return_part_date(dte, part_of_date='month'):
    """
    Pull part_of_date from a date string.

    Parameters
    ----------
    dte : str
        date string; parsed with convert_date

    part_of_date : str, one of ['month', 'day', 'year', 'hour', 'minute']
        datetime attribute to return

    Returns
    -------
    int
        the requested part of the date, e.g. year or hour
    """
    dte = convert_date(dte)
    return getattr(dte, part_of_date)

def is_weekday(dte):
    """
    Return whether a date string falls on a weekday.

    Parameters
    ----------
    dte : str
        date string; parsed with convert_date

    Returns
    -------
    int
        1 if the date is Monday through Friday, else 0
    """
    dte = convert_date(dte)
    # datetime.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
    if dte.weekday() not in [5, 6]:
        return 1
    else:
        return 0

In [307]:
# pull relevant parts of date
trip['start_day'] = trip['starttime'].apply(lambda x: return_part_date(x, part_of_date='day'))
trip['start_year'] = trip['starttime'].apply(lambda x: return_part_date(x, part_of_date='year'))
trip['start_month'] = trip['starttime'].apply(lambda x: return_part_date(x, part_of_date='month'))
trip['start_hour'] = trip['starttime'].apply(lambda x: return_part_date(x, part_of_date='hour'))
weather['start_day'] = weather['Date'].apply(lambda x: return_part_date(x, part_of_date='day'))
weather['start_year'] = weather['Date'].apply(lambda x: return_part_date(x, part_of_date='year'))
weather['start_month'] = weather['Date'].apply(lambda x: return_part_date(x, part_of_date='month'))

# test if date is weekday or not
trip['weekday'] = trip['starttime'].apply(lambda x: is_weekday(x))
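Each `apply` call above re-parses every timestamp string. A vectorized alternative, sketched here on made-up sample strings in the same `%m/%d/%Y %H:%M` format, parses once with `pd.to_datetime` and reads the parts through the `.dt` accessor. It assumes every `starttime` value includes a time; the date-only fallback in `convert_date` would need a second pass or `format='mixed'` (pandas 2.0+).

```python
import pandas as pd

# Made-up sample strings in the trip.csv starttime format
sample = pd.DataFrame({'starttime': ['10/13/2014 10:31', '8/12/2018 10:54', '1/5/2015 23:02']})

# Parse once; an explicit format avoids per-row format inference
ts = pd.to_datetime(sample['starttime'], format='%m/%d/%Y %H:%M')

sample['start_day'] = ts.dt.day
sample['start_year'] = ts.dt.year
sample['start_month'] = ts.dt.month
sample['start_hour'] = ts.dt.hour
# Monday=0 ... Sunday=6, so 0-4 are weekdays (the same rule as is_weekday)
sample['weekday'] = (ts.dt.weekday < 5).astype(int)
print(sample)
```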

In [308]:
# pull out just the mean values for weather
weather_sub = weather[['Mean_Temperature_F', 'MeanDew_Point_F', 'Mean_Humidity', 
                      'Mean_Visibility_Miles', 'Mean_Wind_Speed_MPH', 'Precipitation_In', 
                      'start_day', 'start_year', 'start_month']]

In [309]:
trip = pd.merge(trip, weather_sub, on=['start_day', 'start_year', 'start_month'], how='left')
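Because this is a left merge on a composite day key, any trip whose date has no matching weather row silently receives `NaN` weather values. A sketch on toy frames (hypothetical values, not the real data) shows how `indicator=True` and `validate='many_to_one'` can surface unmatched days and duplicate weather rows:

```python
import pandas as pd

trips = pd.DataFrame({'trip_id': [1, 2, 3],
                      'start_year': [2014, 2014, 2016],
                      'start_month': [10, 10, 9],
                      'start_day': [13, 13, 1]})
weather = pd.DataFrame({'start_year': [2014], 'start_month': [10], 'start_day': [13],
                        'Mean_Temperature_F': [62]})

keys = ['start_day', 'start_year', 'start_month']
# validate='many_to_one' raises MergeError if weather has more than one row per day
merged = pd.merge(trips, weather, on=keys, how='left',
                  indicator=True, validate='many_to_one')

# Rows marked 'left_only' found no weather record for their date
unmatched = merged.loc[merged['_merge'] == 'left_only', 'trip_id'].tolist()
print(unmatched)
```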

In [310]:
# drop end-station information and the start-station name (from_station_id is kept)
trip = trip.drop(['to_station_name', 'to_station_id', 'from_station_name'], axis=1)

In [313]:
# get age of rider
trip['rider_age'] = trip['start_year'] - trip['birthyear']
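`birthyear` is stored as a float (1960.0 in the sample row shown later), which usually means some values are missing; for those riders the subtraction yields a `NaN` age. A toy sketch (made-up rows) of counting and excluding them before any age-based comparison:

```python
import numpy as np
import pandas as pd

# Toy rows: the second rider has no recorded birth year
riders = pd.DataFrame({'start_year': [2014, 2015], 'birthyear': [1960.0, np.nan]})
riders['rider_age'] = riders['start_year'] - riders['birthyear']

# Subtraction propagates NaN, so count unknown ages, then drop them for age analysis
n_missing = riders['rider_age'].isna().sum()
with_age = riders.dropna(subset=['rider_age'])
print(n_missing, with_age['rider_age'].tolist())
```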

#### Basic Exploratory Data Analysis <a id='eda' />

Perform basic exploratory analysis of the bicycle data

In [321]:
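# count unique trips per (weekday flag, start hour)
numtrips_weekday = trip.groupby(['weekday', 'start_hour'])['trip_id'].nunique().reset_index(name='numTrip')

# total trips on weekdays and on weekends, used to turn hourly counts into shares
total_weekday_trips = trip.loc[trip['weekday'] == 1]['trip_id'].nunique()
total_weekend_trips = trip.loc[trip['weekday'] == 0]['trip_id'].nunique()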

numtrips_weekday['percentTrips'] = numtrips_weekday.apply(lambda x: x['numTrip']/total_weekday_trips if x['weekday'] == 1 else x['numTrip']/total_weekend_trips, axis=1)
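The row-wise `apply` above divides each hourly count by its weekday or weekend total. The same normalization can be written with `groupby(...).transform('sum')`, which avoids the two separate totals. This sketch uses toy counts and assumes each `trip_id` appears once, so the hourly counts add up to the group total:

```python
import pandas as pd

# Toy hourly counts: weekday flag, hour of day, number of unique trips
counts = pd.DataFrame({'weekday': [1, 1, 1, 0, 0],
                       'start_hour': [8, 12, 17, 12, 14],
                       'numTrip': [30, 10, 60, 25, 75]})

# Share of each group's trips that start in each hour
group_total = counts.groupby('weekday')['numTrip'].transform('sum')
counts['percentTrips'] = counts['numTrip'] / group_total
print(counts)
```

Within each `weekday` value the shares sum to 1, which makes weekday and weekend hourly profiles directly comparable despite their different volumes.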

In [318]:
# get number of trips by hour of day

numTrips = trip.groupby('start_hour')['trip_id'].nunique().reset_index(name='numTrip')

data = [go.Bar(
    x = numTrips['start_hour'].tolist(),
    y = numTrips['numTrip'].tolist()
)]

iplot(data)

In [301]:
# sanity check of the weekday numbering: 12 Aug 2018 was a Sunday, so this should be 6
datetime.strptime('8/12/2018 10:54', '%m/%d/%Y %H:%M').weekday()

6
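The check above relies on `datetime.weekday()` numbering Monday as 0 and Sunday as 6, which is why `is_weekday` treats 5 and 6 as the weekend. `isoweekday()` numbers Monday as 1 and Sunday as 7, so mixing the two would shift the weekend back by a day. A small standard-library check, using the `calendar` module's named constants instead of bare numbers:

```python
from calendar import SATURDAY, SUNDAY  # 5 and 6, matching datetime.weekday()
from datetime import datetime

dte = datetime.strptime('8/12/2018 10:54', '%m/%d/%Y %H:%M')

print(dte.strftime('%A'))   # day name
print(dte.weekday())        # Monday=0 ... Sunday=6
print(dte.isoweekday())     # Monday=1 ... Sunday=7

# Same weekend rule as is_weekday, without magic numbers
is_weekday_flag = int(dte.weekday() not in (SATURDAY, SUNDAY))
```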

In [216]:
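# This cell assumes `bike_trip` is the raw trip table (including the from/to station names)
# loaded in an earlier session; it is not defined above. station.csv ships with the same Kaggle dataset.
bike_station = pd.read_csv(os.path.join(base_path, 'station.csv'))

# keep the start-station attributes and prefix them with 'from_'
bike_station_start = (bike_station[['station_id', 'install_dockcount', 'current_dockcount', 'install_date']]
                      .rename(columns={'station_id': 'from_station_id',
                                       'install_dockcount': 'from_install_dockcount',
                                       'current_dockcount': 'from_current_dockcount',
                                       'install_date': 'from_install_date'})
                     )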

bike_trip = pd.merge(bike_trip, bike_station_start, on='from_station_id', how='left')
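Attaching station attributes works by renaming the station table's key to `from_station_id` before the left merge. A toy sketch (the dock counts for `PS-04` are made up) adds `validate='many_to_one'`, so pandas raises a `MergeError` if a station ID is duplicated in the lookup table, which would otherwise silently duplicate trips:

```python
import pandas as pd

stations = pd.DataFrame({'station_id': ['CBD-06', 'PS-04'],
                         'install_dockcount': [20, 16],
                         'current_dockcount': [18, 16]})
trips = pd.DataFrame({'trip_id': [431, 432, 433],
                      'from_station_id': ['CBD-06', 'CBD-06', 'PS-04']})

# Prefix the station columns so they read as start-station attributes
start_info = stations.rename(columns={'station_id': 'from_station_id',
                                      'install_dockcount': 'from_install_dockcount',
                                      'current_dockcount': 'from_current_dockcount'})

# Many trips per station, exactly one row per station in the lookup
trips = pd.merge(trips, start_info, on='from_station_id', how='left', validate='many_to_one')
print(trips)
```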

In [217]:
bike_trip.head(1)

| Unnamed: 0 | trip_id | starttime | stoptime | bikeid | tripduration | from_station_name | to_station_name | from_station_id | to_station_id | usertype | gender | birthyear | from_install_dockcount | from_current_dockcount | from_install_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 431 | 2014-10-13 10:31:00 | 2014-10-13 10:48:00 | SEA00298 | 985.935 | 2nd Ave & Spring St | Occidental Park / Occidental Ave S & S Washing... | CBD-06 | PS-04 | Member | Male | 1960.0 | 20.0 | 18.0 | 2014-10-13 |


In [221]:
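# pandas has no pd.Datetime; parse the column with pd.to_datetime and inspect the first value
pd.to_datetime(bike_trip['starttime']).values[0]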

numpy.datetime64('2014-10-13T10:31:00.000000000')
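`pd.to_datetime` on a column returns a datetime Series. Indexing it with `.iloc` yields a pandas `Timestamp`, while going through `.values` yields a raw `numpy.datetime64`, which is the type shown in the output above. A short sketch on made-up values, including the lossless conversion back:

```python
import numpy as np
import pandas as pd

starts = pd.to_datetime(pd.Series(['2014-10-13 10:31:00', '2014-10-13 10:32:00']))

first_ts = starts.iloc[0]      # pandas Timestamp: has .hour, .day_name(), etc.
first_np = starts.values[0]    # numpy.datetime64: no pandas convenience methods

print(type(first_ts).__name__, first_ts.day_name())
print(repr(first_np))

# Wrapping the numpy value in pd.Timestamp restores the pandas type
roundtrip = pd.Timestamp(first_np)
```

Staying with `.iloc` (or the `.dt` accessor on the whole Series) keeps the pandas date helpers available, which is usually what feature extraction like `start_hour` needs.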