# Ford GoBike System Data
## by Manuel Quijada

## Investigation Overview

> This data set includes information about individual rides made in a bike-sharing system covering the greater San Francisco Bay area. Bay Area Bike Share is a company that provides on-demand bike rentals for customers in San Francisco, Redwood City, Palo Alto, Mountain View, and San Jose. Users can unlock bikes from a variety of stations throughout each city, and return them to any station within the same city. Users pay for the service either through a yearly subscription or by purchasing 3-day or 24-hour passes. Users can make an unlimited number of trips, with trips under thirty minutes in length having no additional charge; longer trips will incur overtime fees.

The source data (Ford GoBike System Data) is available at https://www.fordgobike.com/system-data.

## Dataset Overview

> The data covers 3.27 billion bike rides and has 16 variables, including age, gender, weekday, time, and station. It is available from the Ford GoBike website (https://s3.amazonaws.com/fordgobike-data/index.html). Users aged 18 to 56 account for 95% of the users in the dataset, yet some records list users more than 100 years old, so we remove users older than 60 as part of cleaning and tidying. We also generated new fields, such as age group, so that the data can be grouped and analyzed by group. Ford GoBike extended the service to San Francisco, Oakland, and San Jose; however, because of the complexity of the traffic, we decided to focus on the San Francisco area.

In [6]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

%matplotlib inline

# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")

In [7]:
# additional imports for downloading, date handling, and plotting
# (numpy, pandas, pyplot, and the warning filter are already set up in the cell above)
from requests import get
from os import path, getcwd, makedirs, listdir
from io import BytesIO
from zipfile import ZipFile
import matplotlib
import matplotlib.ticker as tick
import seaborn as sns
import datetime
import math
import calendar
import cufflinks as cf

from IPython.display import Image

from bokeh.io import output_file, show, output_notebook
from bokeh.models import ColumnDataSource, GMapOptions
from bokeh.plotting import gmap, figure
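
The plotting cells in this notebook read a DataFrame named `df` with the columns `member_age`, `member_age_bins`, and `start_time_year_month_renamed`, but no cell here builds it. The following is a minimal sketch of how such a frame could be derived, not the author's actual cleaning code. It assumes the raw trips have `start_time` and `member_birth_year` columns, that age is the ride year minus the birth year, that age groups are ten-year bins closed on the right, and that the month label uses a `yy-mm` format. The helper name `prepare_trips` and the sample rows are hypothetical.

```python
import pandas as pd


def prepare_trips(trips: pd.DataFrame, max_age: int = 60) -> pd.DataFrame:
    """Derive the age, age-group, and year-month columns used by the plots.

    Hypothetical helper: the column names and binning rules are assumptions.
    """
    out = trips.copy()
    out["start_time"] = pd.to_datetime(out["start_time"])

    # Assumption: age is the year of the ride minus the birth year.
    out["member_age"] = out["start_time"].dt.year - out["member_birth_year"]

    # Drop rides with no birth year and riders older than max_age.
    keep = out["member_age"].notna() & (out["member_age"] <= max_age)
    out = out[keep].copy()
    out["member_age"] = out["member_age"].astype(int)

    # Ten-year groups labelled like the palette keys in the plotting cells.
    # With pandas' default right-closed bins, age 30 falls in "20 - 30".
    edges = [10, 20, 30, 40, 50, 60]
    labels = ["10 - 20", "20 - 30", "30 - 40", "40 - 50", "50 - 60"]
    out["member_age_bins"] = pd.cut(
        out["member_age"], bins=edges, labels=labels
    ).astype(str)

    # A "yy-mm" label sorts chronologically as plain text.
    out["start_time_year_month_renamed"] = out["start_time"].dt.strftime("%y-%m")
    return out


# Made-up rows for illustration only; they are not taken from the dataset.
sample = pd.DataFrame({
    "start_time": ["2018-01-15 08:00:00", "2018-02-01 09:30:00",
                   "2018-02-03 10:00:00", "2018-03-01 10:00:00"],
    "member_birth_year": [1990.0, 1900.0, None, 1975.0],
})
prepared = prepare_trips(sample)
print(prepared[["member_age", "member_age_bins", "start_time_year_month_renamed"]])
```

Converting the age group to plain strings keeps unused categories out of the plot legend, which matters when the palette dictionary names only some of the groups.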


## The age distribution of Ford GoBike users

> 95% of users are between 18 and 56 years old. Accordingly, we remove users older than 60.

In [8]:
plt.figure(figsize=(14,6))
sns.boxplot(x='member_age', data=df, palette='Blues', orient='h')
plt.title("The age distribution of Ford GoBike users", fontsize=20, y=1.03)
plt.xlabel("Age [bike riders]", fontsize=18, labelpad=10)
plt.savefig('image01.png');

NameError: name 'df' is not defined

<Figure size 1008x432 with 0 Axes>
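
A share such as the 95% quoted above can be checked directly from the age column. The sketch below shows one way to do it. The helper name `share_in_range` is hypothetical and the ages are made-up values for illustration, not rows from the dataset.

```python
import pandas as pd


def share_in_range(ages: pd.Series, low: int, high: int) -> float:
    """Return the fraction of non-missing ages within [low, high], inclusive."""
    ages = ages.dropna()
    return float(ages.between(low, high).mean())


# Made-up ages for illustration only.
ages = pd.Series([18, 25, 31, 40, 56, 62, 119, None])
share = share_in_range(ages, 18, 56)
print(f"{share:.1%} of riders are aged 18 to 56")
```

On the real data, the same call would be `share_in_range(df['member_age'], 18, 56)`, assuming `df` holds the trips before the age cut-off is applied.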

## The monthly trend of bike rides


In [9]:
plt.figure(figsize=(14,8))
sns.countplot(x='start_time_year_month_renamed', palette="Blues", data=df.sort_values(by='start_time_year_month_renamed'))
plt.title('The monthly trend of bike rides', fontsize=22, y=1.015)
plt.xlabel('year-month', labelpad=16)
plt.ylabel('count [rides]', labelpad=16)
ax = plt.gca()
ax.yaxis.set_major_formatter(tick.FuncFormatter(transform_axis_fmt))
plt.savefig('image03.png');

NameError: name 'df' is not defined

<Figure size 1008x576 with 0 Axes>
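
The cell above passes `transform_axis_fmt` to `tick.FuncFormatter`, but this notebook never defines that function. The following is a hypothetical stand-in, assuming the goal is to shorten large ride counts on the y-axis. It follows the `(value, position)` signature that `FuncFormatter` calls; the suffixes `k` and `M` are a choice made here, not something the source specifies.

```python
def transform_axis_fmt(tick_val, pos):
    """Format a tick value compactly, e.g. thousands with a 'k' suffix.

    Hypothetical stand-in for the undefined helper used by the plots.
    `pos` is the tick position supplied by matplotlib and is unused here.
    """
    if tick_val >= 1_000_000:
        return f"{tick_val / 1_000_000:g}M"
    if tick_val >= 1_000:
        return f"{tick_val / 1_000:g}k"
    return f"{tick_val:g}"


# Quick look at the labels produced for a few tick values.
for value in (0, 500, 2500, 150000, 1000000):
    print(value, "->", transform_axis_fmt(value, None))
```

Defining this function in a cell before the plots lets the existing `ax.yaxis.set_major_formatter(tick.FuncFormatter(transform_axis_fmt))` calls run unchanged.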

## The monthly trend of bike rides for 20- to 50-year-olds



In [10]:
plt.figure(figsize=(14,8))
my_palette = {'20 - 30': 'deepskyblue', '30 - 40': 'navy', '40 - 50': 'lightgrey'}
# keep only the three age groups of interest, sorted so months and groups plot in order
age_groups = ['20 - 30', '30 - 40', '40 - 50']
df_20_50 = df[df['member_age_bins'].isin(age_groups)].sort_values(by=['start_time_year_month_renamed', 'member_age_bins'])
ax = sns.countplot(x='start_time_year_month_renamed', hue='member_age_bins', palette=my_palette, data=df_20_50)
plt.title('The monthly trend of bike rides for 20- to 50-year-olds', fontsize=22, y=1.015)
plt.xlabel('year-month', labelpad=16)
plt.ylabel('count [rides]', labelpad=16)
leg = ax.legend()
leg.set_title('Member age group', prop={'size':16})
ax.yaxis.set_major_formatter(tick.FuncFormatter(transform_axis_fmt))
plt.savefig('image04.png');

NameError: name 'df' is not defined

<Figure size 1008x576 with 0 Axes>


In [11]:
# shell commands need a leading "!" in a notebook cell; replace <file_name> with this notebook's name
!jupyter nbconvert <file_name>.ipynb --to slides --post serve --template output_toggle