# Ford GoBike System Data: Presentation
## by Vallela Kavya

## Investigation Overview

The project is mainly written in Python 3.6. I used pandas and NumPy for data analysis, and seaborn and matplotlib for data visualization.
In addition, some plots are made with Bokeh, and some with R's ggplot2 through IPython.

## Dataset Overview

This dataset includes information about individual rides made in a bike-sharing system covering the greater San Francisco Bay Area.
<br>
Note that this dataset requires some data wrangling to make it tidy for analysis.
<br>
The system covers multiple cities, and multiple data files need to be joined together if a full year's coverage is desired.
<br>
In addition, Ford GoBike announced on April 24th 2018 that it would do a pilot launch of its electric bike service, so I would like to find out who might be especially interested in this news.

```python
# import all packages and set plots to be embedded inline
%matplotlib inline

import calendar
import datetime
import math
import warnings

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
import pandas as pd
import seaborn as sns

# Bokeh imports (including Google Maps plotting via gmap) for the
# interactive plots made elsewhere in the notebook
from bokeh.io import output_file, show, output_notebook
from bokeh.models import ColumnDataSource, GMapOptions
from bokeh.plotting import gmap, figure

# default seaborn theme with a large default figure size
sns.set(color_codes=True, rc={'figure.figsize': (12.5, 9.7)})

# suppress warnings from final output
warnings.simplefilter("ignore")
```

```python
%%time
# load the 2017 data and the January-April 2018 monthly files into DataFrames
# (%%time must be the first line of the cell so that the whole cell is timed)
df = pd.read_csv('./2017-fordgobike-tripdata.csv')
df1 = pd.read_csv('./201801-fordgobike-tripdata.csv')
df2 = pd.read_csv('./201802-fordgobike-tripdata.csv')
df3 = pd.read_csv('./201803-fordgobike-tripdata.csv')
df4 = pd.read_csv('./201804-fordgobike-tripdata.csv')
```
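The loading cell reads the files separately but does not join them, although the overview notes that a full year's coverage requires combining them. A minimal sketch of that step, plus the derivation of the `member_age` column the plots below rely on, is shown here. It assumes each file has `start_time` and `member_birth_year` columns and computes age as the ride's start year minus the birth year; the helpers `combine_trip_files` and `add_member_age` are hypothetical, not the author's code.

```python
import io

import pandas as pd


def combine_trip_files(sources):
    """Read each trip file (path or file-like object) and stack them.

    Columns present in only some files (e.g. 2018-only columns) become NaN
    for rows from the other files.
    """
    frames = [pd.read_csv(src) for src in sources]
    return pd.concat(frames, ignore_index=True, sort=False)


def add_member_age(trips):
    """Hypothetical helper: age = year of the ride - assumed `member_birth_year`."""
    trips = trips.copy()
    ride_year = pd.to_datetime(trips["start_time"]).dt.year
    trips["member_age"] = ride_year - trips["member_birth_year"]
    return trips


# Tiny in-memory stand-ins for two monthly files
demo = add_member_age(combine_trip_files([
    io.StringIO("start_time,member_birth_year\n2017-12-31 10:00:00,1990\n"),
    io.StringIO("start_time,member_birth_year\n2018-01-02 08:00:00,1980\n"),
]))

# With the real files (paths as in the loading cell):
# all_trips = add_member_age(combine_trip_files([
#     './2017-fordgobike-tripdata.csv', './201801-fordgobike-tripdata.csv',
#     './201802-fordgobike-tripdata.csv', './201803-fordgobike-tripdata.csv',
#     './201804-fordgobike-tripdata.csv']))
```

`pd.concat` with `sort=False` keeps the column order of the first file and simply appends any extra columns from later files.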


## Visualization 1: The age distribution of Ford GoBike riders

```python
# box plot of rider age; `member_age` is derived during data wrangling
plt.figure(figsize=(10, 6))
base_color = sns.color_palette()[0]
ax1 = sns.boxplot(data=df, x='member_age', color=base_color)
plt.title("The age distribution of Ford GoBike riders", fontsize=20, y=1.03)
plt.xlabel("Age [years]", fontsize=18, labelpad=10)
plt.grid(False)
plt.xlim(0, 120)
plt.savefig('image01.png');
```

<img src='image01.png'>

The plot shows that most bike riders are between about 20 and 60 years old.<br>
The interquartile range (from Q1 to Q3) spans roughly 25 to 42 years, and ages above about 60 are drawn as outliers.<br>
The lower whisker starts at about 19 and the upper whisker ends at about 60; these are the whisker ends, not Q1 and Q3.
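To make the box-plot terms concrete: the box spans Q1 to Q3, and by default matplotlib and seaborn extend the whiskers to the most extreme data points within 1.5 × IQR of the box, drawing anything beyond as an outlier. The sketch below reproduces those quantities with NumPy; `box_stats` is a hypothetical helper, and `np.percentile` with its default linear interpolation is assumed to match the convention the plotting library uses.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Quartiles, IQR, whisker ends and outliers under the whis * IQR rule."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # drop missing ages
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= low_fence) & (x <= high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # whiskers stop at the last data points inside the fences
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": np.sort(x[(x < low_fence) | (x > high_fence)]),
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
# On the notebook data: box_stats(df['member_age'])
```

This is why the minimum and maximum of the whiskers (about 19 and 60 here) differ from Q1 and Q3 (about 25 and 42).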


## Visualization 2: How do subscribers and customers behave differently?

Subscribers' rides cluster around both the morning and the evening commute. Customers, by contrast, ride the most on weekends, which suggests that the two user types ride for different purposes.
<br>Subscribers ride for convenience around commute time, while customers are more likely to ride for leisure.

```python
# side-by-side heatmaps ranking weekday/hour slots by usage;
# the pivot tables are assumed to hold integer ranks, since fmt='d' requires integers
plt.figure(figsize=(15, 10))
plt.suptitle('Most frequent usage times among riders aged 20-40', fontsize=22)

plt.subplot(1, 2, 1)
sns.heatmap(subscriber_hour_df_pivoted, fmt='d', annot=True, cmap='YlGnBu_r', annot_kws={"size": 12})
plt.title("Rank of subscribers' most frequent usage times", y=1.015)
plt.xlabel('Weekday', labelpad=16)
plt.yticks(rotation=0)

plt.subplot(1, 2, 2)
sns.heatmap(customer_hour_df_pivoted, fmt='d', annot=True, cmap='YlGnBu_r', annot_kws={"size": 12},
            cbar_kws={'label': 'Rank of frequently used timing'})
plt.title("Rank of customers' most frequent usage times", y=1.015)
plt.xlabel('Weekday', labelpad=16)
plt.yticks(rotation=0)
plt.savefig('image11.png');
```

<img src='image11.png'>

Subscribers ride most often around 7 to 9 am and 4 to 6 pm, i.e. commute hours.<br>
Customers, by contrast, ride most often on weekends from 12 pm to 4 pm and on weekdays from 5 pm to 6 pm.<br>
This suggests customers use the service for weekend leisure and after work.
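The heatmaps read `subscriber_hour_df_pivoted` and `customer_hour_df_pivoted`, which are built elsewhere in the notebook. One plausible way to build such a table is sketched below, assuming a `start_time` column and a `user_type` column with values 'Subscriber' and 'Customer': count rides per (hour, weekday) slot, rank the slots so that 1 is the busiest, and pivot weekdays into columns. The function name, the Monday-first column labels, and the `min` tie-breaking rule are assumptions.

```python
import pandas as pd

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def hourly_rank_pivot(trips, user_type):
    """Hour x weekday table ranking time slots by ride count (1 = busiest).

    Ties share the lowest rank (method="min").
    """
    start = pd.to_datetime(trips.loc[trips["user_type"] == user_type, "start_time"])
    slots = pd.DataFrame({"Hour": start.dt.hour.astype(int),
                          "Weekday": start.dt.dayofweek.astype(int)})  # Monday = 0
    counts = slots.groupby(["Hour", "Weekday"]).size()
    # include slots with no rides so every cell gets an integer rank
    full_grid = pd.MultiIndex.from_product([range(24), range(7)], names=["Hour", "Weekday"])
    counts = counts.reindex(full_grid, fill_value=0)
    ranks = counts.rank(method="min", ascending=False).astype(int)
    pivot = ranks.unstack("Weekday")
    pivot.columns = WEEKDAYS
    return pivot


demo = pd.DataFrame({
    "start_time": ["2018-01-01 08:15", "2018-01-01 08:45",   # Monday
                   "2018-01-02 17:00", "2018-01-06 13:00"],  # Tuesday, Saturday
    "user_type": ["Subscriber", "Subscriber", "Subscriber", "Customer"],
})
subscriber_ranks = hourly_rank_pivot(demo, "Subscriber")
```

Filling empty slots with zero keeps the table integer-valued, which `sns.heatmap(..., fmt='d')` needs for its annotations.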

## Visualization 3: Which age group favors e-bikes more?

Surprisingly, riders aged 10 to 20 seem to show the most interest in the e-bike: around 10% of all rides by this group are e-bike rides.<br>
More generally, the younger the rider, the more likely they are to choose an electric bike.

```python
# distribution of ride counts per day for electric vs non-electric bikes
plt.figure(figsize=(12, 6))
my_palette = {"electric": "lightblue", "non-electric": "deepskyblue"}
ax = sns.boxplot(x='start_time_date', y='count', hue='bike_type', linewidth=1.5,
                 palette=my_palette, data=electric_bike_verification_df)
ax.grid(False)
plt.title('Avg count of electric and non-electric bike rides from 24-April-2018 to 3-May-2018', y=1.015)
plt.xlabel('Month-Day', labelpad=16)
plt.ylabel('Avg count [rides]', labelpad=16)
leg = ax.legend()
leg.set_title('Bike Type', prop={'size': 14})
plt.savefig('image16.png');
```

<img src="image16.png">

There is a large gap between the two bike types: the average count of electric bike rides stays below 5, while that of non-electric bike rides is above 5.<br>
Following the announcement of the new electric bike service, demand for electric bikes may have been high at the time.
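`electric_bike_verification_df` is also prepared outside this section. Because the plot is a box plot of `count` per day, one reading is that each row holds the number of rides a single bike made on a given day. The sketch below builds a table of that shape under two loud assumptions: a hypothetical `bike_type` column labelling each trip 'electric' or 'non-electric', and per-bike daily counts as the plotted quantity. The function name is also hypothetical.

```python
import pandas as pd


def rides_per_bike_per_day(trips, start="2018-04-24", end="2018-05-03"):
    """Rides per bike per day, by (hypothetical) `bike_type`, within [start, end]."""
    started = pd.to_datetime(trips["start_time"])
    day = started.dt.normalize()
    in_window = (day >= pd.Timestamp(start)) & (day <= pd.Timestamp(end))
    window = trips.loc[in_window].assign(
        start_time_date=started[in_window].dt.strftime("%m-%d"))  # 'Month-Day' axis
    return (window.groupby(["start_time_date", "bike_type", "bike_id"])
                  .size()
                  .reset_index(name="count"))


demo = pd.DataFrame({
    "start_time": ["2018-04-24 08:00", "2018-04-24 17:30",
                   "2018-04-24 09:10", "2018-04-20 12:00"],  # last ride is outside the window
    "bike_id": [1, 1, 2, 3],
    "bike_type": ["electric", "electric", "non-electric", "non-electric"],
})
counts = rides_per_bike_per_day(demo)
```

The result has the `start_time_date`, `count` and `bike_type` columns that the box-plot cell reads.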

## Visualization 4: Trend of 'bike share for all' members' bike rides by age group
The 'Bike share for all' program is a subscription model for low-income Bay Area residents. While the most popular docks for all members are gathered around the Financial District and Market Street, the popular docks for 'Bike share for all' members are spread out, especially around BART and Caltrain stations, even those quite far from Market Street, such as the 24th Street BART station.
<br>
My guess is that low-income riders tend to travel farther, from the South Bay or East Bay, and might take public transportation to work in the San Francisco area.

```python
# monthly ride counts of 'bike share for all' members per age group;
# the `bike_id` column is assumed to already hold aggregated ride counts
plt.figure(figsize=(10, 7))
ax = sns.pointplot(x='start_time_year_month_renamed', y='bike_id', hue='member_age_bins',
                   scale=.7, data=bike_share_for_all_trip_age_df)
plt.title("The monthly trend of 'bike share for all' members' bike rides per age group", fontsize=20, y=1.015)
plt.xlabel('year-month', labelpad=16)
plt.ylabel('count [rides]', labelpad=16)
leg = ax.legend()
leg.set_title('Member age group', prop={'size': 12})
plt.grid(False)
plt.savefig('image14.png');
```

<img src='image14.png'>

The 'Bike share for all' program appears to have launched around January 2018, and rides have trended upward since then. The 10 to 20 age group may not meet the eligibility criteria.<br>
Rides by 40 to 50 year-old members increased sharply after March 2018, whereas usage by the 20 to 40 groups seems to have stagnated recently.
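The point plot uses `y='bike_id'`, which only makes sense if `bike_id` already holds ride counts per month and age group; seaborn's point plot would otherwise average bike IDs. A sketch of that aggregation follows, assuming a 'Yes'/'No' `bike_share_for_all_trip` column (present in the 2018 files), a derived `member_age` column, and hypothetical left-closed 10-year age bins. The original plots a renamed month column, `start_time_year_month_renamed`, whose exact format is not shown, so this sketch uses a plain `YYYY-MM` string instead.

```python
import pandas as pd

# Hypothetical 10-year age bins (left-closed, so age 20 falls in "20-30")
AGE_EDGES = [10, 20, 30, 40, 50, 60, 70]
AGE_LABELS = ["10-20", "20-30", "30-40", "40-50", "50-60", "60-70"]


def monthly_bsfa_rides(trips):
    """Monthly ride counts of 'bike share for all' trips per age group.

    The count lands in a column named `bike_id`, matching the point plot's y.
    """
    bsfa = trips[trips["bike_share_for_all_trip"] == "Yes"].copy()
    bsfa["start_time_year_month"] = pd.to_datetime(bsfa["start_time"]).dt.strftime("%Y-%m")
    bsfa["member_age_bins"] = pd.cut(bsfa["member_age"], bins=AGE_EDGES,
                                     labels=AGE_LABELS, right=False)
    return (bsfa.groupby(["start_time_year_month", "member_age_bins"], observed=True)["bike_id"]
                .count()
                .reset_index())


demo = pd.DataFrame({
    "start_time": ["2018-01-05 08:00", "2018-01-09 18:00",
                   "2018-02-01 07:30", "2018-02-02 07:30"],
    "member_age": [25, 27, 45, 33],
    "bike_id": [10, 11, 12, 13],
    "bike_share_for_all_trip": ["Yes", "Yes", "Yes", "No"],
})
monthly = monthly_bsfa_rides(demo)
```

`observed=True` keeps only the age bins that actually occur in each month, so empty groups do not appear as zero-count points.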

## Visualization 5: Which member age group uses electric bikes relatively more than the others?

```python
# highlight the 10-20 group (first bar) in navy and the other groups in light blue
new_color = ['navy', 'lightblue', 'lightblue', 'lightblue', 'lightblue', 'lightblue']
bike_rides_per_age_merged_df[['member_age_bins', 'perc']].plot(
    kind='bar', x='member_age_bins', y='perc', color=new_color, figsize=(8, 6), legend=False)
plt.title('Electric bike rides as a percentage of all bike rides', fontsize=22, y=1.015)
plt.xlabel('Age Group', labelpad=16)
plt.ylabel('Electric bike rides [%]', labelpad=16)
plt.xticks(rotation=0)
plt.grid(False)
plt.savefig('image18.png');
```

<img src='image18.png'>

Riders aged 10 to 20 (teenagers) take the fewest bike rides overall.<br>
However, electric bike rides account for about 10% of this group's rides, the highest share of any age group.<br>
It seems that the younger the rider, the more likely they are to choose electric bikes.
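The bar heights are the share of each age group's rides that were made on electric bikes. A sketch of that calculation follows, assuming a hypothetical `bike_type` column ('electric' or another value) and the same hypothetical left-closed 10-year bins; only the `member_age_bins` and `perc` column names come from the plotting cell, everything else is illustrative.

```python
import pandas as pd

AGE_EDGES = [10, 20, 30, 40, 50, 60, 70]
AGE_LABELS = ["10-20", "20-30", "30-40", "40-50", "50-60", "60-70"]


def electric_share_by_age(trips):
    """Percentage of each age group's rides made on electric bikes."""
    bins = pd.cut(trips["member_age"], bins=AGE_EDGES, labels=AGE_LABELS, right=False)
    is_electric = trips["bike_type"].eq("electric")
    # the mean of a boolean column is the fraction of True values
    share = is_electric.groupby(bins, observed=True).mean() * 100
    return share.rename("perc").rename_axis("member_age_bins").reset_index()


demo = pd.DataFrame({
    "member_age": [15] * 10 + [35] * 4,
    "bike_type": ["electric"] + ["non-electric"] * 13,
})
shares = electric_share_by_age(demo)
```

In this toy input, 1 of the 10 rides by the 10-20 group is electric, giving a `perc` of 10 for that bar.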