# Part I - Ford GoBike System Data
## by Kenechukwu Nwankwo




## Investigation Overview


> The overall goal of this presentation is to share the key insights discovered in the Ford GoBike system data set.


## Dataset Overview

> This data set includes information about individual rides made in a bike-sharing system covering the greater San Francisco Bay area. It has 183412 rows and 16 columns. Before analysing the data, I wrangled it: I deleted rows with null values, changed the data types of some columns and added new columns to the dataframe.
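
The wrangling steps described above (dropping rows with nulls, changing data types, adding columns) could look roughly like the sketch below. It is not the original wrangling code: the `wrangle` helper, the choice of which columns to convert, and the tiny made-up `sample` frame are assumptions for illustration. Only the column names come from this notebook.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample standing in for the real CSV (values are illustrative only)
sample = pd.DataFrame({
    "duration_sec": [600, 5400, 1200, 900],
    "user_type": ["Subscriber", "Customer", "Subscriber", "Customer"],
    "member_birth_year": [1985.0, np.nan, 1992.0, 1970.0],
    "member_gender": ["Male", None, "Female", "Other"],
    "bike_share_for_all_trip": ["No", "No", "Yes", "No"],
})


def wrangle(raw):
    """Hypothetical helper: drop null rows, fix dtypes, add a derived column."""
    # Remove every row that has at least one missing value
    clean = raw.dropna().copy()
    # Birth years are whole numbers, so store them as integers
    clean["member_birth_year"] = clean["member_birth_year"].astype(int)
    # Low-cardinality text columns work well as categoricals (assumed choice)
    for col in ["user_type", "member_gender", "bike_share_for_all_trip"]:
        clean[col] = clean[col].astype("category")
    # New column: trip duration in hours
    clean["duration_hr"] = clean["duration_sec"] / 3600
    return clean


clean_sample = wrangle(sample)
print(clean_sample.dtypes)
```

On the real data the same helper would be called as `wrangle(bike_df)`, assuming the columns above are present.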


In [None]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")

In [None]:
# load in the dataset into a pandas dataframe
# read in the data
bike_df = pd.read_csv('201902-fordgobike-tripdata.csv')

# make a copy of the data
df = bike_df.copy()

# display the first five rows of the dataset
df.head()

> Note that the above cells have been set as "Skip"-type slides. That means
that when the notebook is rendered as HTML slides, those cells won't show up.

## Gender Distribution

> Male users made up a higher percentage than female users and users of other genders.


In [None]:
# Function for univariate plots
def univariate_plot(variable, plot_title, plot_type=sns.histplot):
    """Draw a univariate plot of one column of the global dataframe `df`.
    Args:
        variable: name of the column to be plotted
        plot_title: title of the plot
        plot_type: the seaborn plotting function to use, e.g. sns.histplot, sns.countplot
    """
    plt.figure(figsize=(10, 5))
    plot_type(data=df, x=variable)
    plt.xticks(rotation=45)
    plt.title(plot_title)

univariate_plot('member_gender', 'Gender Distribution');
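
The plot above shows counts per gender. To put percentages behind the claim, something like the following sketch could be used. The `genders` series is made-up sample data standing in for `df['member_gender']`, so the printed shares are illustrative and not results from the real data set.

```python
import pandas as pd

# Made-up gender labels standing in for df['member_gender'] (illustrative only)
genders = pd.Series(["Male", "Male", "Female", "Male", "Other", "Female", None])

# value_counts drops missing values by default; normalize=True returns
# fractions of the non-missing total, which are converted to percentages here
gender_share = (genders.value_counts(normalize=True) * 100).round(1)
print(gender_share)
```

With the real dataframe, replacing `genders` with `df['member_gender']` should give the share of each gender among rides that report one.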

## Duration against User type

> Customers ride longer on average than subscribers.

In [None]:
# Function for bivariate plots
def bivariate_plot(variable_x, plot_title, variable_y='duration_sec', plot_type=sns.barplot):
    """Draw a bivariate plot of two columns of the global dataframe `df`.
    Args:
        variable_x: name of the column to be plotted on the x-axis
        plot_title: title of the plot
        variable_y: name of the column to be plotted on the y-axis
        plot_type: the seaborn plotting function to use, e.g. sns.barplot, sns.boxplot
    """
    plt.figure(figsize=(10, 5))
    plot_type(data=df, x=variable_x, y=variable_y)
    plt.xticks(rotation=45)
    plt.title(plot_title)

bivariate_plot('user_type', 'Duration By User Type');
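
By default `sns.barplot` draws the mean of the y variable for each category, so the bars above compare mean trip durations. The sketch below shows how the same comparison could be tabulated, with the median added as a check against very long outlier trips. The `trips` frame is made-up sample data, and the `mean_min` column is a name introduced here for illustration.

```python
import pandas as pd

# Made-up trips standing in for df (illustrative values only)
trips = pd.DataFrame({
    "user_type": ["Customer", "Customer", "Subscriber", "Subscriber", "Subscriber"],
    "duration_sec": [1800, 3000, 600, 720, 480],
})

# Count, mean and median trip duration per user type
summary = trips.groupby("user_type")["duration_sec"].agg(["count", "mean", "median"])
# Mean duration in minutes is easier to read than seconds
summary["mean_min"] = summary["mean"] / 60
print(summary)
```

If the mean and the median point the same way on the real data, the difference between user types is unlikely to be driven by a handful of extreme trips.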

## Duration by Bike sharing conditioned by Age

> Most of the long trips were not shared trips, and most of them were taken by younger riders.

In [None]:
# Scatter plots of trip duration (hours) against rider age, one panel per
# bike_share_for_all_trip value

df['duration_hr'] = df['duration_sec'] / 3600
# Age is computed relative to 2022; the file name suggests the trips are from February 2019
df['member_age'] = 2022 - df['member_birth_year']
Dist_type = sns.FacetGrid(data=df, col='bike_share_for_all_trip', col_wrap=3, height=7,
                          xlim=[20, 85], ylim=[-0.1, 5])
Dist_type.map(plt.scatter, 'member_age', 'duration_hr', alpha=0.5)
Dist_type.set_xlabels('User Age')
Dist_type.set_ylabels('Duration in hrs')
plt.suptitle("Bike Sharing Distribution against Trip Duration and age".title(), y=1.1, fontsize=14, weight="bold");
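
The scatter plots give a visual impression. A numeric summary of the long trips could complement them, as in the sketch below. The one-hour cutoff `LONG_TRIP_HR` is an assumed threshold (the notebook does not define what counts as a long trip), and the `trips` frame is made-up sample data. Age uses the same 2022 reference year as the cell above.

```python
import pandas as pd

# Made-up trips standing in for df (illustrative values only)
trips = pd.DataFrame({
    "duration_sec": [7200, 5400, 900, 600, 4000, 1200],
    "member_birth_year": [1995, 1990, 1960, 1985, 1992, 1975],
    "bike_share_for_all_trip": ["No", "No", "No", "Yes", "Yes", "No"],
})
trips["duration_hr"] = trips["duration_sec"] / 3600
trips["member_age"] = 2022 - trips["member_birth_year"]  # same reference year as the notebook

LONG_TRIP_HR = 1.0  # assumed cutoff for a "long" trip, not from the notebook

# Keep only trips at or above the cutoff
long_trips = trips[trips["duration_hr"] >= LONG_TRIP_HR]

# Number of long trips and median rider age per bike_share_for_all_trip value
long_summary = (
    long_trips.groupby("bike_share_for_all_trip")["member_age"]
    .agg(["count", "median"])
)
print(long_summary)
```

On the real data, the `count` column would show how the long trips split between the two groups and the `median` column the typical rider age in each.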

>**Generate Slideshow**: Once you're ready to generate your slideshow, use the `jupyter nbconvert` command to generate the HTML slide show. From the terminal or command line, use the following command.

In [None]:
!jupyter nbconvert Part_II_slide_deck_kene.ipynb --to slides --post serve --no-input --no-prompt

> This should open a tab in your web browser where you can scroll through your presentation. Sub-slides can be accessed by pressing 'down' when viewing their parent slide. Make sure you remove all of the quote-formatted guide notes like this one before you finish your presentation! Finally, you can stop the kernel.