# BikeShare Data Analysis

In this project, we use Python to explore data provided by [Motivate](https://www.motivateco.com/), a bike share system provider for many major cities in the United States, and uncover bike share usage patterns in Chicago.

Here, we use a CSV file of randomly selected trips from the first six months of 2017 for Motivate's Chicago system.

# Import tools

In [144]:
import pandas as pd

# Get the data

In [110]:
data = pd.read_csv('chicago.csv')

In [111]:
# read_csv already returns a DataFrame, so this wrapper is redundant but harmless
df = pd.DataFrame(data)
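
A more compact variant of the loading step is sketched below. It reads from a tiny inline sample instead of `chicago.csv`, so the sample text and the `load_trips` helper name are illustrative assumptions rather than part of the project. It parses the time columns while reading and uses the unnamed first column as the index.

```python
import io
import pandas as pd

# Tiny stand-in for chicago.csv (two rows copied from the dataset preview)
SAMPLE_CSV = """,Start Time,End Time,Trip Duration
1423854,2017-06-23 15:09:32,2017-06-23 15:14:53,321
955915,2017-05-25 18:19:03,2017-05-25 18:45:53,1610
"""

def load_trips(source):
    """Read trip data, parsing the time columns and using the first column as the index."""
    return pd.read_csv(source, index_col=0, parse_dates=['Start Time', 'End Time'])

trips = load_trips(io.StringIO(SAMPLE_CSV))
print(trips.dtypes)
```

With the real file, the call would be `load_trips('chicago.csv')`. Using `index_col=0` means the `Unnamed: 0` column would no longer appear as a regular column.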

# Prepare DataFrame

In [112]:
df.shape

(300000, 9)

In [113]:
df.head()

,Unnamed: 0,Start Time,End Time,Trip Duration,Start Station,End Station,User Type,Gender,Birth Year
0,1423854,2017-06-23 15:09:32,2017-06-23 15:14:53,321,Wood St & Hubbard St,Damen Ave & Chicago Ave,Subscriber,Male,1992.0
1,955915,2017-05-25 18:19:03,2017-05-25 18:45:53,1610,Theater on the Lake,Sheffield Ave & Waveland Ave,Subscriber,Female,1992.0
2,9031,2017-01-04 08:27:49,2017-01-04 08:34:45,416,May St & Taylor St,Wood St & Taylor St,Subscriber,Male,1981.0
3,304487,2017-03-06 13:49:38,2017-03-06 13:55:28,350,Christiana Ave & Lawrence Ave,St. Louis Ave & Balmoral Ave,Subscriber,Male,1986.0
4,45207,2017-01-17 14:53:07,2017-01-17 15:02:01,534,Clark St & Randolph St,Desplaines St & Jackson Blvd,Subscriber,Male,1975.0


In [114]:
# convert the Start Time column to datetime
df['Start Time'] = pd.to_datetime(df['Start Time'])

df.head()

,Unnamed: 0,Start Time,End Time,Trip Duration,Start Station,End Station,User Type,Gender,Birth Year
0,1423854,2017-06-23 15:09:32,2017-06-23 15:14:53,321,Wood St & Hubbard St,Damen Ave & Chicago Ave,Subscriber,Male,1992.0
1,955915,2017-05-25 18:19:03,2017-05-25 18:45:53,1610,Theater on the Lake,Sheffield Ave & Waveland Ave,Subscriber,Female,1992.0
2,9031,2017-01-04 08:27:49,2017-01-04 08:34:45,416,May St & Taylor St,Wood St & Taylor St,Subscriber,Male,1981.0
3,304487,2017-03-06 13:49:38,2017-03-06 13:55:28,350,Christiana Ave & Lawrence Ave,St. Louis Ave & Balmoral Ave,Subscriber,Male,1986.0
4,45207,2017-01-17 14:53:07,2017-01-17 15:02:01,534,Clark St & Randolph St,Desplaines St & Jackson Blvd,Subscriber,Male,1975.0


In [115]:
# extract hour from the Start Time column to create an hour column
df['Hour'] = df['Start Time'].dt.hour

In [116]:
df.shape

(300000, 10)

In [117]:
# extract month from the Start Time column to create a month column
df['Month'] = df['Start Time'].dt.month

In [118]:
df.shape

(300000, 11)

In [119]:
# create a 'Journey' column that concatenates Start Station and End Station
df['Journey'] = df['Start Station'].str.cat(df['End Station'], sep=' to ')

In [120]:
# replace spaces in column names with underscores
df.columns = [x.strip().replace(' ', '_') for x in df.columns]
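
The same renaming can also be written with the string methods on the column index. This is a minimal sketch on a small made-up frame (`demo` is a hypothetical name), not a change to the notebook's own code.

```python
import pandas as pd

# Small illustrative frame with the same style of column names as the dataset
demo = pd.DataFrame({'Start Time': ['2017-06-23 15:09:32'], 'Trip Duration': [321]})

# Vectorized equivalent of the list comprehension:
# strip surrounding whitespace, then swap spaces for underscores
demo.columns = demo.columns.str.strip().str.replace(' ', '_', regex=False)
print(list(demo.columns))
```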

# Data Analysis

In [121]:
df.head()

,Unnamed:_0,Start_Time,End_Time,Trip_Duration,Start_Station,End_Station,User_Type,Gender,Birth_Year,Hour,Month,Journey
0,1423854,2017-06-23 15:09:32,2017-06-23 15:14:53,321,Wood St & Hubbard St,Damen Ave & Chicago Ave,Subscriber,Male,1992.0,15,6,Wood St & Hubbard St to Damen Ave & Chicago Ave
1,955915,2017-05-25 18:19:03,2017-05-25 18:45:53,1610,Theater on the Lake,Sheffield Ave & Waveland Ave,Subscriber,Female,1992.0,18,5,Theater on the Lake to Sheffield Ave & Wavelan...
2,9031,2017-01-04 08:27:49,2017-01-04 08:34:45,416,May St & Taylor St,Wood St & Taylor St,Subscriber,Male,1981.0,8,1,May St & Taylor St to Wood St & Taylor St
3,304487,2017-03-06 13:49:38,2017-03-06 13:55:28,350,Christiana Ave & Lawrence Ave,St. Louis Ave & Balmoral Ave,Subscriber,Male,1986.0,13,3,Christiana Ave & Lawrence Ave to St. Louis Ave...
4,45207,2017-01-17 14:53:07,2017-01-17 15:02:01,534,Clark St & Randolph St,Desplaines St & Jackson Blvd,Subscriber,Male,1975.0,14,1,Clark St & Randolph St to Desplaines St & Jack...


In [126]:
print('Total trips: ', (df['Start_Time'].count()))

Total trips:  300000


In [127]:
# find the most common hour (from 0 to 23)
popular_hour = df['Hour'].mode()[0]

In [128]:
print('Popular hour: ', popular_hour)

Popular hour:  17


In [129]:
popular_month = df['Month'].mode()[0]

In [130]:
print('Popular month: ', popular_month)

Popular month:  6


In [131]:
# map the most common weekday number (Monday=0) to its name
days_of_week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
                'Saturday', 'Sunday']
index = int(df['Start_Time'].dt.dayofweek.mode()[0])
popular_day = days_of_week[index]

In [132]:
print('Popular day: ', popular_day)

Popular day:  Tuesday
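
As an alternative to the manual weekday list, pandas can return day names directly through `dt.day_name()`. The sketch below uses three timestamps taken from rows shown in this notebook; `start_times` and `popular_day_demo` are hypothetical names.

```python
import pandas as pd

# Illustrative timestamps: two Tuesdays and one Friday
start_times = pd.Series(pd.to_datetime([
    '2017-01-17 14:53:07',  # Tuesday
    '2017-06-13 17:06:21',  # Tuesday
    '2017-06-23 15:09:32',  # Friday
]))

# day_name() gives the weekday as text, so no lookup list is needed
popular_day_demo = start_times.dt.day_name().mode()[0]
print('Popular day: ', popular_day_demo)
```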


In [133]:
popular_start_station = df['Start_Station'].mode()[0]
popular_end_station = df['End_Station'].mode()[0]

In [134]:
print('Popular Start Station: ', popular_start_station)
print('Popular End Station: ', popular_end_station)

Popular Start Station:  Streeter Dr & Grand Ave
Popular End Station:  Streeter Dr & Grand Ave


In [135]:
popular_journey = df['Journey'].mode().to_string(index = False)

In [136]:
print('Popular Journey: ', popular_journey)

Popular Journey:  Lake Shore Dr & Monroe St to Streeter Dr & Gra...
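
The journey above is cut off with `...`, which is probably `to_string` applying pandas' display width limit (an assumption about the pandas version in use). One way to avoid this is to count station pairs directly and read the winner as plain Python strings. The sketch below uses a few made-up trips built from station names in the preview; `demo` and `pair_counts` are hypothetical names.

```python
import pandas as pd

# Illustrative trips; station names are taken from the dataset preview
demo = pd.DataFrame({
    'Start_Station': ['May St & Taylor St', 'May St & Taylor St', 'Clark St & Randolph St'],
    'End_Station': ['Wood St & Taylor St', 'Wood St & Taylor St', 'Desplaines St & Jackson Blvd'],
})

# Count each (start, end) pair and keep the most frequent one
pair_counts = demo.groupby(['Start_Station', 'End_Station']).size()
top_start, top_end = pair_counts.idxmax()
print('Popular Journey: {} to {} ({} trips)'.format(top_start, top_end, pair_counts.max()))
```

Grouping on the two station columns also returns the trip count for the winning pair, which the `mode()` approach does not.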


In [137]:
user_types=df['User_Type'].value_counts()

In [138]:
print(user_types)

Subscriber    238889
Customer       61110
Dependent          1
Name: User_Type, dtype: int64


In [139]:
gender_count=df['Gender'].value_counts()

In [140]:
print(gender_count)

Male      181190
Female     57758
Name: Gender, dtype: int64
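
The two counts above add up to 238,948, which is less than the 300,000 total trips, so some rows have no gender recorded. `value_counts()` leaves out missing values by default. A minimal sketch of how to make them visible, using a small made-up column (`gender` is a hypothetical name):

```python
import numpy as np
import pandas as pd

# Illustrative column with one missing value
gender = pd.Series(['Male', 'Female', np.nan, 'Male'])

# dropna=False keeps the missing entries as their own row in the counts
counts_with_missing = gender.value_counts(dropna=False)
missing = int(gender.isna().sum())
print(counts_with_missing)
print('Rows without a recorded gender:', missing)
```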


In [141]:
earliest = int(df['Birth_Year'].min())
recent = int(df['Birth_Year'].max())
# mode() returns a Series, so take its first entry
mode = int(df['Birth_Year'].mode()[0])
print('The oldest birth year in the dataset is listed as {}.\nThe most recent birth year in the dataset is {}.'
      '\nThe most common birth year in the dataset is {}.'.format(earliest, recent, mode))

The oldest birth year in the dataset is listed as 1899.
The most recent birth year in the dataset is 2016.
The most common birth year in the dataset is 1989.


In [142]:
# display total travel time (in seconds)
total_travel_time = df['Trip_Duration'].sum()
print('Total Travel Time:', total_travel_time)
# display mean travel time (in seconds)
mean_travel_time = df['Trip_Duration'].mean()
print('Mean Travel Time:', mean_travel_time)

Total Travel Time: 280871787
Mean Travel Time: 936.23929


TO DO: Convert seconds to hours, minutes, seconds

import datetime
str(datetime.timedelta(seconds=666))
#https://stackoverflow.com/questions/775049/how-to-convert-seconds-to-hours-minutes-and-seconds
#https://docs.python.org/3/library/datetime.html

# timedelta takes no format argument; round the seconds before converting
str(datetime.timedelta(seconds=round(936.23929)))

The code above works if both columns are `datetime` columns, but you may need a few extra steps to get them that way. For example, if they are strings, you may have to reformat them first with `datetime.strptime` before converting them to `np.datetime64` with `apply()`. I think the conversion using `np.datetime64` expects them to be in "%Y-%m-%d" format.

TO DO: Do not report hours if hours is 0 

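# split the mean trip duration (in seconds) into hours, minutes, seconds
# (the output wording below is an assumption)
m, s = divmod(int(round(mean_travel_time)), 60)
h, m = divmod(m, 60)
if h == 0:
    average_trip = '{} min {} s'.format(m, s)
else:
    average_trip = '{} h {} min {} s'.format(h, m, s)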

In [146]:
print(df.iloc[9220])

Unnamed:_0                                                 1265709
Start_Time                                     2017-06-13 17:06:21
End_Time                                       2017-06-13 17:15:54
Trip_Duration                                                  573
Start_Station                             LaSalle St & Illinois St
End_Station                              Columbus Dr & Randolph St
User_Type                                               Subscriber
Gender                                                      Female
Birth_Year                                                    1992
Hour                                                            17
Month                                                            6
Journey          LaSalle St & Illinois St to Columbus Dr & Rand...
Name: 9220, dtype: object


# Data Visualizations

In [149]:
import matplotlib.pyplot as plt

#https://matplotlib.org/gallery/lines_bars_and_markers/barh.html
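
A horizontal bar chart like the one in the linked gallery example could be used to show trips per start hour. The sketch below is one possible starting point under stated assumptions: the hours are made-up values standing in for `df['Hour']`, and the `Agg` backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative start hours; in the notebook this would be df['Hour']
hours = pd.Series([8, 13, 14, 15, 17, 17, 18])

# Number of trips per start hour, ordered by hour
trips_per_hour = hours.value_counts().sort_index()

# One horizontal bar per hour, with the hour shown as a category label
fig, ax = plt.subplots()
ax.barh(trips_per_hour.index.astype(str), trips_per_hour.values)
ax.set_xlabel('Number of trips')
ax.set_ylabel('Start hour')
ax.set_title('Trips by start hour (illustrative data)')
fig.tight_layout()
```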