# Part II - Communicate Data Findings Project - Ford GoBike Slides
## by Chinomnso Chinedum


## Investigation Overview


> In this presentation, I focus on the relationship between distance traveled and the riders' age and gender. I also show some insights about the ambitious bikers (those who traveled 5.05km - 15km). Because duration and distance are closely related, I also show how duration relates to features such as bike ID, age, start hour, and distance.




## Dataset Overview

> The dataset includes information about 183412 individual rides made in a bike-sharing system covering the greater San Francisco Bay area. The exploration focuses on the distance covered on each ride and how it relates to other features such as age, gender, bike ID, user type, and day of the week.

In [None]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

%matplotlib inline

# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")

In [None]:
# load the full dataset and the ambitious-bikers subset into pandas dataframes
ford_df = pd.read_csv('fordgobike.csv')
distance_one_percent = pd.read_csv('ambitious_fordgobike.csv')

In [None]:
# Necessary colours
base_color = sb.color_palette()[7]

## Trip Duration and other quantitative variables

> Younger people (20 - 40 years) are more likely to ride for longer durations than older people. People who start a ride in the early hours of the morning (0hrs - 5hrs) are more likely to ride for a shorter time than those who start after 10:00am. Those who rode for longer were more likely to be using a more modern bike version. The bikers moved at different speeds: a 1000-second ride could cover 5km or just 1km.
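
One way to check the age claim above numerically is to compare the median trip duration across age bands. The sketch below is only an illustration: the `toy` frame and the band edges are assumptions made for the example, and the real check would run on `ford_df`.

```python
import pandas as pd

# Hypothetical toy rows; the real notebook would use ford_df instead.
toy = pd.DataFrame({
    'age': [22, 27, 35, 38, 45, 52, 61, 68],
    'duration_sec': [900, 1100, 800, 1000, 600, 650, 500, 450],
})

# Bin riders into age bands (band edges are an assumption, not from the source).
bands = pd.cut(toy['age'], bins=[20, 40, 60, 80], right=False,
               labels=['20-39', '40-59', '60-79'])

# Median trip duration (in seconds) per age band.
median_duration = toy.groupby(bands, observed=True)['duration_sec'].median()
print(median_duration)
```

The median is used rather than the mean because trip durations are usually right-skewed, so a few very long rides would otherwise dominate each band.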

In [None]:
# Create scatter plots of each quantitative variable against trip duration

quantitative_columns = ['age', 'distance_km', 'start_hour', 'bike_id']

nrows, ncols = 2, 2
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(20,10))

# one panel per variable, with trip duration on the x-axis
for col, axis in zip(quantitative_columns, axes.flat):
    sb.scatterplot(data=ford_df, x='duration_sec', y=col, color=base_color, ax=axis, alpha=0.5)

# set the figure title once, after all panels are drawn
fig.suptitle('Trip Duration, Age, Distance, Start Hour and Bike ID', fontsize=20, va='bottom')

## Bike Usage for the Ambitious Bikers

> Interestingly, the ambitious bikers are mostly between their mid 20s and 40s. The people who rode the farthest are in their 30s.
> People who start a ride in the early hours of the morning (0hrs - 5hrs) are more likely to ride for a shorter time than those who start after 10:00am. Those who rode for longer were more likely to be using a more modern bike version (bike_id between 4200 and 7000). Most of the bikers who traveled far also traveled fast.
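
The notebook loads the ambitious-bikers subset from `ambitious_fordgobike.csv` and does not show how it was built. A minimal sketch of one plausible approach, assuming the subset is the top 1% of rides by `distance_km`, is a 99th-percentile cut. The `toy` frame below is hypothetical and stands in for `ford_df`.

```python
import pandas as pd

# Hypothetical toy distances in km; the real notebook would use ford_df['distance_km'].
toy = pd.DataFrame({'distance_km': [0.5 + 0.05 * i for i in range(200)]})

# 99th-percentile cutoff: rides at or above it form the top 1% by distance.
# (Assumption: the source does not state how its 1% subset was selected.)
cutoff = toy['distance_km'].quantile(0.99)
ambitious = toy[toy['distance_km'] >= cutoff]

print(f"cutoff = {cutoff:.2f} km, rides kept = {len(ambitious)}")
```

On the real data, the cutoff obtained this way could be compared with the 5.05km lower bound quoted for the ambitious bikers.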

In [None]:
# Let's look at the bike usage of the 1% of riders who traveled farthest (distance_km)

quantitative_columns = ['age', 'duration_sec', 'start_hour', 'bike_id']

nrows, ncols = 2, 2
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(20,10))

# one panel per variable, with trip distance on the x-axis
for col, axis in zip(quantitative_columns, axes.flat):
    sb.scatterplot(data=distance_one_percent, x='distance_km', y=col, color=base_color, ax=axis, alpha=0.5)

# set the figure title once, after all panels are drawn
fig.suptitle('Ambitious Bikers, their Age, Duration, Start Hour and Bike ID', fontsize=20, va='bottom')

## Trip Distance, Gender, and Age

> Across all genders, older people tend to ride shorter distances. Riders in the `Other` gender category are more likely to cycle shorter distances.
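
The age and distance trend can also be summarised with one number per gender. The sketch below computes the Pearson correlation between age and distance within each gender group. The `toy` frame is hypothetical and only mirrors the column names used in this notebook, so the real figures would come from `ford_df`.

```python
import pandas as pd

# Hypothetical toy rows; the real notebook would use ford_df instead.
toy = pd.DataFrame({
    'member_gender': ['Male'] * 4 + ['Female'] * 4 + ['Other'] * 4,
    'age': [25, 35, 45, 55] * 3,
    'distance_km': [3.0, 2.5, 2.0, 1.0,
                    2.8, 2.6, 1.9, 1.2,
                    2.0, 1.8, 1.1, 0.9],
})

# Pearson correlation between age and distance within each gender group.
# A negative value means distance tends to fall as age rises.
corr_by_gender = {
    name: group['age'].corr(group['distance_km'])
    for name, group in toy.groupby('member_gender')
}
print(corr_by_gender)
```

A correlation only captures a linear trend, so it complements the faceted scatter plots rather than replacing them.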

In [None]:
# Facet the age vs. distance scatter plot by gender
# `height` sets each facet's height in inches (older seaborn versions called this `size`)
gender = sb.FacetGrid(data=ford_df, col='member_gender', col_wrap=3, height=5,
                      xlim=[10, 100], ylim=[0, 6])
gender.map(plt.scatter, 'age', 'distance_km', alpha=1)

gender.set_xlabels('Age')
gender.set_ylabels('Distance (km)')
gender.fig.suptitle('Trip Distance, Gender, and Age', fontsize=20, va='bottom')
plt.show()

### Summary
Different patterns of bike usage were observed across age, gender, and bike IDs. Longer ride durations were associated with more modern bikes and younger riders. Across all genders, older people tend to ride shorter distances. Riders in the `Other` gender category are more likely to cycle shorter distances. However, among the ambitious bikers (the 1% who traveled between 5.05km and 15km), the older riders were female.


In [None]:
!jupyter nbconvert slide_deck.ipynb --to slides --post serve --no-input --no-prompt