# Case Study 1: How Does a Bike-Share Navigate Speedy Success?

## Scenario
I am a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of
marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, the
team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, the team will
design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve the team's
recommendations, so they must be backed up with compelling data insights and professional data visualizations.

## Characters and teams
- **Cyclistic**: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart
by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities
and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use
the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each
day.

- **Lily Moreno**: The director of marketing and my manager. Moreno is responsible for the development of campaigns and
initiatives to promote the bike-share program. These may include email, social media, and other channels.
- **Cyclistic marketing analytics team**: A team of data analysts who are responsible for collecting, analyzing, and reporting
data that helps guide Cyclistic marketing strategy. I joined this team six months ago and have been busy learning about
Cyclistic’s mission and business goals — as well as how I, as a junior data analyst, can help Cyclistic achieve them.
- **Cyclistic executive team**: The notoriously detail-oriented executive team will decide whether to approve the
recommended marketing program.

## About the company
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are
geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to
any other station in the system anytime.

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One
approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and
annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who
purchase annual memberships are Cyclistic members.

Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing
flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to
future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good
chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have
chosen Cyclistic for their mobility needs.

Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do
that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual
riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in
analyzing the Cyclistic historical bike trip data to identify trends.

## Ask
How do annual members and casual riders use Cyclistic bikes differently?

With this case study, I will produce a report with the following deliverables:
1. A clear statement of the business task
2. A description of all data sources used
3. Documentation of any cleaning or manipulation of data
4. A summary of the analysis
5. Supporting visualizations and key findings
6. Top three recommendations based on my analysis

## Prepare

I will use Cyclistic's historical trip data to analyze and identify trends. Although Cyclistic is a fictional company, the datasets are appropriate for the purposes of this case study and make it possible to answer the business question. The data has been made available by Motivate International Inc. under this [license](https://www.divvybikes.com/data-license-agreement).

The analysis covers Q2 2019 through Q1 2020. The data is public and can be used to explore how different customer types use Cyclistic bikes. However, data-privacy rules prohibit the use of riders' personally identifiable information.

## Process

I'll be using Python with the NumPy, pandas, Matplotlib, and Seaborn libraries.

I'll download the [data](https://divvy-tripdata.s3.amazonaws.com/index.html) and store it in a local folder. As mentioned above, the data covers Q2 2019 through Q1 2020.

## Analyze



In [2]:
#loading libraries needed
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
#reading the data (one CSV file per quarter)
q2_2019 = pd.read_csv('rawdata/Divvy_trips_2019_q2.csv')
q3_2019 = pd.read_csv('rawdata/Divvy_Trips_2019_Q3.csv')
q4_2019 = pd.read_csv('RawData/Divvy_Trips_2019_Q4.csv')
q1_2020 = pd.read_csv('RawData/Divvy_Trips_2020_Q1.csv')

In [4]:
#checking q2_2019 column name
q2_2019.columns

Index(['01 - Rental Details Rental ID', '01 - Rental Details Local Start Time',
       '01 - Rental Details Local End Time', '01 - Rental Details Bike ID',
       '01 - Rental Details Duration In Seconds Uncapped',
       '03 - Rental Start Station ID', '03 - Rental Start Station Name',
       '02 - Rental End Station ID', '02 - Rental End Station Name',
       'User Type', 'Member Gender',
       '05 - Member Details Member Birthday Year'],
      dtype='object')

In [5]:
#checking q3_2019 column name
q3_2019.columns

Index(['trip_id', 'start_time', 'end_time', 'bikeid', 'tripduration',
       'from_station_id', 'from_station_name', 'to_station_id',
       'to_station_name', 'usertype', 'gender', 'birthyear'],
      dtype='object')

In [6]:
#checking q4_2019 column name
q4_2019.columns

Index(['trip_id', 'start_time', 'end_time', 'bikeid', 'tripduration',
       'from_station_id', 'from_station_name', 'to_station_id',
       'to_station_name', 'usertype', 'gender', 'birthyear'],
      dtype='object')

In [7]:
#checking q1_2020 column name
q1_2020.columns

Index(['ride_id', 'rideable_type', 'started_at', 'ended_at',
       'start_station_name', 'start_station_id', 'end_station_name',
       'end_station_id', 'start_lat', 'start_lng', 'end_lat', 'end_lng',
       'member_casual'],
      dtype='object')

In [8]:
#checking q2_2019 head to check which data is which
q2_2019.head()

Unnamed: 0,01 - Rental Details Rental ID,01 - Rental Details Local Start Time,01 - Rental Details Local End Time,01 - Rental Details Bike ID,01 - Rental Details Duration In Seconds Uncapped,03 - Rental Start Station ID,03 - Rental Start Station Name,02 - Rental End Station ID,02 - Rental End Station Name,User Type,Member Gender,05 - Member Details Member Birthday Year
0,22178529,2019-04-01 00:02:22,2019-04-01 00:09:48,6251,446.0,81,Daley Center Plaza,56,Desplaines St & Kinzie St,Subscriber,Male,1975.0
1,22178530,2019-04-01 00:03:02,2019-04-01 00:20:30,6226,1048.0,317,Wood St & Taylor St,59,Wabash Ave & Roosevelt Rd,Subscriber,Female,1984.0
2,22178531,2019-04-01 00:11:07,2019-04-01 00:15:19,5649,252.0,283,LaSalle St & Jackson Blvd,174,Canal St & Madison St,Subscriber,Male,1990.0
3,22178532,2019-04-01 00:13:01,2019-04-01 00:18:58,4151,357.0,26,McClurg Ct & Illinois St,133,Kingsbury St & Kinzie St,Subscriber,Male,1993.0
4,22178533,2019-04-01 00:19:26,2019-04-01 00:36:13,3270,1007.0,202,Halsted St & 18th St,129,Blue Island Ave & 18th St,Subscriber,Male,1992.0


In [9]:
q3_2019.head()

Unnamed: 0,trip_id,start_time,end_time,bikeid,tripduration,from_station_id,from_station_name,to_station_id,to_station_name,usertype,gender,birthyear
0,23479388,2019-07-01 00:00:27,2019-07-01 00:20:41,3591,1214.0,117,Wilton Ave & Belmont Ave,497,Kimball Ave & Belmont Ave,Subscriber,Male,1992.0
1,23479389,2019-07-01 00:01:16,2019-07-01 00:18:44,5353,1048.0,381,Western Ave & Monroe St,203,Western Ave & 21st St,Customer,,
2,23479390,2019-07-01 00:01:48,2019-07-01 00:27:42,6180,1554.0,313,Lakeview Ave & Fullerton Pkwy,144,Larrabee St & Webster Ave,Customer,,
3,23479391,2019-07-01 00:02:07,2019-07-01 00:27:10,5540,1503.0,313,Lakeview Ave & Fullerton Pkwy,144,Larrabee St & Webster Ave,Customer,,
4,23479392,2019-07-01 00:02:13,2019-07-01 00:22:26,6014,1213.0,168,Michigan Ave & 14th St,62,McCormick Place,Customer,,


In [10]:
q4_2019.head()

Unnamed: 0,trip_id,start_time,end_time,bikeid,tripduration,from_station_id,from_station_name,to_station_id,to_station_name,usertype,gender,birthyear
0,25223640,2019-10-01 00:01:39,2019-10-01 00:17:20,2215,940.0,20,Sheffield Ave & Kingsbury St,309,Leavitt St & Armitage Ave,Subscriber,Male,1987.0
1,25223641,2019-10-01 00:02:16,2019-10-01 00:06:34,6328,258.0,19,Throop (Loomis) St & Taylor St,241,Morgan St & Polk St,Subscriber,Male,1998.0
2,25223642,2019-10-01 00:04:32,2019-10-01 00:18:43,3003,850.0,84,Milwaukee Ave & Grand Ave,199,Wabash Ave & Grand Ave,Subscriber,Female,1991.0
3,25223643,2019-10-01 00:04:32,2019-10-01 00:43:43,3275,2350.0,313,Lakeview Ave & Fullerton Pkwy,290,Kedzie Ave & Palmer Ct,Subscriber,Male,1990.0
4,25223644,2019-10-01 00:04:34,2019-10-01 00:35:42,5294,1867.0,210,Ashland Ave & Division St,382,Western Ave & Congress Pkwy,Subscriber,Male,1987.0


In [11]:
q1_2020.head()

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
0,EACB19130B0CDA4A,docked_bike,2020-01-21 20:06:59,2020-01-21 20:14:30,Western Ave & Leland Ave,239,Clark St & Leland Ave,326.0,41.9665,-87.6884,41.9671,-87.6674,member
1,8FED874C809DC021,docked_bike,2020-01-30 14:22:39,2020-01-30 14:26:22,Clark St & Montrose Ave,234,Southport Ave & Irving Park Rd,318.0,41.9616,-87.666,41.9542,-87.6644,member
2,789F3C21E472CA96,docked_bike,2020-01-09 19:29:26,2020-01-09 19:32:17,Broadway & Belmont Ave,296,Wilton Ave & Belmont Ave,117.0,41.9401,-87.6455,41.9402,-87.653,member
3,C9A388DAC6ABF313,docked_bike,2020-01-06 16:17:07,2020-01-06 16:25:56,Clark St & Randolph St,51,Fairbanks Ct & Grand Ave,24.0,41.8846,-87.6319,41.8918,-87.6206,member
4,943BC3CBECCFD662,docked_bike,2020-01-30 08:37:16,2020-01-30 08:42:48,Clinton St & Lake St,66,Wells St & Hubbard St,212.0,41.8856,-87.6418,41.8899,-87.6343,member


The column names change between ```q2_2019``` and ```q3_2019```, and change again between ```q4_2019``` and ```q1_2020``` (```q3_2019``` and ```q4_2019``` share the same names).
We'll rename the columns to match ```q1_2020```, since it's the latest data and presumably the table design Cyclistic will use going forward.
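
One way to see exactly which names differ before writing the rename dictionaries is to compare column sets. The sketch below uses tiny stand-in frames with a handful of the column names shown above rather than the real files, and `column_diff` is a hypothetical helper, not a pandas function.

```python
import pandas as pd


def column_diff(reference: pd.DataFrame, other: pd.DataFrame) -> dict:
    """Return the column names that appear in only one of the two frames."""
    ref_cols, other_cols = set(reference.columns), set(other.columns)
    return {
        "missing_from_other": sorted(ref_cols - other_cols),
        "extra_in_other": sorted(other_cols - ref_cols),
    }


# Stand-in frames: only column names matter here, so they hold no rows.
target = pd.DataFrame(columns=["ride_id", "started_at", "ended_at", "member_casual"])
older = pd.DataFrame(columns=["trip_id", "start_time", "end_time", "usertype"])

diff = column_diff(target, older)
print(diff)
```

Every name listed under `extra_in_other` either needs an entry in the rename dictionary or is a column that will be dropped later.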

In [12]:
#changing format of q2_2019 to that of q1_2020

#create a dictionary
dictQ2_2019 = {'01 - Rental Details Rental ID' : 'ride_id', 
        '01 - Rental Details Local Start Time' : 'started_at',
        '01 - Rental Details Local End Time' : 'ended_at', 
        '01 - Rental Details Bike ID' : 'rideable_type',
        #'01 - Rental Details Duration In Seconds Uncapped', 
        '03 - Rental Start Station ID' : 'start_station_id', 
        '03 - Rental Start Station Name' : 'start_station_name', 
        '02 - Rental End Station ID' : 'end_station_id', 
        '02 - Rental End Station Name' : 'end_station_name', 
        'User Type' : 'member_casual', 
        #'Member Gender', 
        #'05 - Member Details Member Birthday Year'
        }

q2_2019.rename(columns = dictQ2_2019,
               inplace= True)

In [13]:
#changing format of q3_2019 and q4_2019 to that of q1_2020
#since the format of q3_2019 and q4_2019 is the same, we'll be using the same dictionary

dictQ3Q4 = {
    'trip_id' : 'ride_id', 
    'start_time' : 'started_at', 
    'end_time' : 'ended_at', 
    'bikeid' : 'rideable_type', 
    #'tripduration', 
    'from_station_id' : 'start_station_id', 
    'from_station_name' : 'start_station_name', 
    'to_station_id' : 'end_station_id', 
    'to_station_name' : 'end_station_name', 
    'usertype' : 'member_casual', 
    #'gender', 
    #'birthyear'
    }

q3_2019.rename(columns = dictQ3Q4, 
               inplace = True)

q4_2019.rename(columns = dictQ3Q4, 
               inplace = True)

In [14]:
q2_2019.head()

Unnamed: 0,ride_id,started_at,ended_at,rideable_type,01 - Rental Details Duration In Seconds Uncapped,start_station_id,start_station_name,end_station_id,end_station_name,member_casual,Member Gender,05 - Member Details Member Birthday Year
0,22178529,2019-04-01 00:02:22,2019-04-01 00:09:48,6251,446.0,81,Daley Center Plaza,56,Desplaines St & Kinzie St,Subscriber,Male,1975.0
1,22178530,2019-04-01 00:03:02,2019-04-01 00:20:30,6226,1048.0,317,Wood St & Taylor St,59,Wabash Ave & Roosevelt Rd,Subscriber,Female,1984.0
2,22178531,2019-04-01 00:11:07,2019-04-01 00:15:19,5649,252.0,283,LaSalle St & Jackson Blvd,174,Canal St & Madison St,Subscriber,Male,1990.0
3,22178532,2019-04-01 00:13:01,2019-04-01 00:18:58,4151,357.0,26,McClurg Ct & Illinois St,133,Kingsbury St & Kinzie St,Subscriber,Male,1993.0
4,22178533,2019-04-01 00:19:26,2019-04-01 00:36:13,3270,1007.0,202,Halsted St & 18th St,129,Blue Island Ave & 18th St,Subscriber,Male,1992.0


In [15]:
q3_2019.head()

Unnamed: 0,ride_id,started_at,ended_at,rideable_type,tripduration,start_station_id,start_station_name,end_station_id,end_station_name,member_casual,gender,birthyear
0,23479388,2019-07-01 00:00:27,2019-07-01 00:20:41,3591,1214.0,117,Wilton Ave & Belmont Ave,497,Kimball Ave & Belmont Ave,Subscriber,Male,1992.0
1,23479389,2019-07-01 00:01:16,2019-07-01 00:18:44,5353,1048.0,381,Western Ave & Monroe St,203,Western Ave & 21st St,Customer,,
2,23479390,2019-07-01 00:01:48,2019-07-01 00:27:42,6180,1554.0,313,Lakeview Ave & Fullerton Pkwy,144,Larrabee St & Webster Ave,Customer,,
3,23479391,2019-07-01 00:02:07,2019-07-01 00:27:10,5540,1503.0,313,Lakeview Ave & Fullerton Pkwy,144,Larrabee St & Webster Ave,Customer,,
4,23479392,2019-07-01 00:02:13,2019-07-01 00:22:26,6014,1213.0,168,Michigan Ave & 14th St,62,McCormick Place,Customer,,


In [16]:
q4_2019.head()

Unnamed: 0,ride_id,started_at,ended_at,rideable_type,tripduration,start_station_id,start_station_name,end_station_id,end_station_name,member_casual,gender,birthyear
0,25223640,2019-10-01 00:01:39,2019-10-01 00:17:20,2215,940.0,20,Sheffield Ave & Kingsbury St,309,Leavitt St & Armitage Ave,Subscriber,Male,1987.0
1,25223641,2019-10-01 00:02:16,2019-10-01 00:06:34,6328,258.0,19,Throop (Loomis) St & Taylor St,241,Morgan St & Polk St,Subscriber,Male,1998.0
2,25223642,2019-10-01 00:04:32,2019-10-01 00:18:43,3003,850.0,84,Milwaukee Ave & Grand Ave,199,Wabash Ave & Grand Ave,Subscriber,Female,1991.0
3,25223643,2019-10-01 00:04:32,2019-10-01 00:43:43,3275,2350.0,313,Lakeview Ave & Fullerton Pkwy,290,Kedzie Ave & Palmer Ct,Subscriber,Male,1990.0
4,25223644,2019-10-01 00:04:34,2019-10-01 00:35:42,5294,1867.0,210,Ashland Ave & Division St,382,Western Ave & Congress Pkwy,Subscriber,Male,1987.0


In [17]:
q1_2020.head()

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
0,EACB19130B0CDA4A,docked_bike,2020-01-21 20:06:59,2020-01-21 20:14:30,Western Ave & Leland Ave,239,Clark St & Leland Ave,326.0,41.9665,-87.6884,41.9671,-87.6674,member
1,8FED874C809DC021,docked_bike,2020-01-30 14:22:39,2020-01-30 14:26:22,Clark St & Montrose Ave,234,Southport Ave & Irving Park Rd,318.0,41.9616,-87.666,41.9542,-87.6644,member
2,789F3C21E472CA96,docked_bike,2020-01-09 19:29:26,2020-01-09 19:32:17,Broadway & Belmont Ave,296,Wilton Ave & Belmont Ave,117.0,41.9401,-87.6455,41.9402,-87.653,member
3,C9A388DAC6ABF313,docked_bike,2020-01-06 16:17:07,2020-01-06 16:25:56,Clark St & Randolph St,51,Fairbanks Ct & Grand Ave,24.0,41.8846,-87.6319,41.8918,-87.6206,member
4,943BC3CBECCFD662,docked_bike,2020-01-30 08:37:16,2020-01-30 08:42:48,Clinton St & Lake St,66,Wells St & Hubbard St,212.0,41.8856,-87.6418,41.8899,-87.6343,member


In [18]:
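#stack the four quarters; ignore_index=True gives the combined frame a fresh 0..n-1 index
working_df = pd.concat([q2_2019, q3_2019, q4_2019, q1_2020], ignore_index=True)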

working_df.head()
working_df.columns

Index(['ride_id', 'started_at', 'ended_at', 'rideable_type',
       '01 - Rental Details Duration In Seconds Uncapped', 'start_station_id',
       'start_station_name', 'end_station_id', 'end_station_name',
       'member_casual', 'Member Gender',
       '05 - Member Details Member Birthday Year', 'tripduration', 'gender',
       'birthyear', 'start_lat', 'start_lng', 'end_lat', 'end_lng'],
      dtype='object')
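
The combined frame has more columns than any single quarter because `pd.concat` takes the union of the input columns and fills the gaps with NaN. A minimal sketch with two made-up frames (the values are placeholders, not real trips):

```python
import pandas as pd

# Two toy "quarters" that share ride_id but each carry one extra column.
a = pd.DataFrame({"ride_id": [1, 2], "tripduration": [446.0, 1048.0]})
b = pd.DataFrame({"ride_id": ["EACB", "8FED"], "start_lat": [41.9665, 41.9616]})

# Columns are unioned; cells with no source value become NaN.
stacked = pd.concat([a, b], ignore_index=True)

print(stacked.columns.tolist())
print(stacked.isna().sum())
```

This is why columns such as `tripduration` and `start_lat` are only partly populated after the concat.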

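We'll keep only the columns that all four quarters share and drop the rest (trip duration, gender, birth year, and the latitude/longitude columns), since those are not available for every quarter.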

In [19]:
working_df = working_df[['ride_id', 
                        'started_at', 
                        'ended_at', 
                        'rideable_type',
                        #'01 - Rental Details Duration In Seconds Uncapped', 
                        'start_station_id',
                        'start_station_name', 
                        'end_station_id', 
                        'end_station_name',
                        'member_casual', 
                        #'Member Gender',
                        #'05 - Member Details Member Birthday Year', 
                        #'tripduration', 
                        #'gender',
                        #'birthyear', 
                        #'start_lat', 'start_lng', 'end_lat', 'end_lng'
                        ]].copy()


In [20]:
working_df.head()

Unnamed: 0,ride_id,started_at,ended_at,rideable_type,start_station_id,start_station_name,end_station_id,end_station_name,member_casual
0,22178529,2019-04-01 00:02:22,2019-04-01 00:09:48,6251,81,Daley Center Plaza,56.0,Desplaines St & Kinzie St,Subscriber
1,22178530,2019-04-01 00:03:02,2019-04-01 00:20:30,6226,317,Wood St & Taylor St,59.0,Wabash Ave & Roosevelt Rd,Subscriber
2,22178531,2019-04-01 00:11:07,2019-04-01 00:15:19,5649,283,LaSalle St & Jackson Blvd,174.0,Canal St & Madison St,Subscriber
3,22178532,2019-04-01 00:13:01,2019-04-01 00:18:58,4151,26,McClurg Ct & Illinois St,133.0,Kingsbury St & Kinzie St,Subscriber
4,22178533,2019-04-01 00:19:26,2019-04-01 00:36:13,3270,202,Halsted St & 18th St,129.0,Blue Island Ave & 18th St,Subscriber


In [21]:
working_df.shape

(3879822, 9)

We have 3,879,822 rows in total. Let's check whether there are any NaN values!

In [22]:
working_df.isna().sum()

ride_id               0
started_at            0
ended_at              0
rideable_type         0
start_station_id      0
start_station_name    0
end_station_id        1
end_station_name      1
member_casual         0
dtype: int64
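
The counts show how many values are missing but not which rows they belong to. A minimal sketch of how the affected rows could be pulled out for a closer look, using a made-up three-row frame (the notebook itself leaves the row in place):

```python
import pandas as pd

# Toy frame with one ride that has no end station recorded.
df = pd.DataFrame({
    "ride_id": ["A", "B", "C"],
    "end_station_id": [56.0, None, 129.0],
    "end_station_name": ["Desplaines St & Kinzie St", None, "Blue Island Ave & 18th St"],
})

# Boolean mask: True for rows with at least one missing value.
has_missing = df.isna().any(axis=1)
incomplete = df[has_missing]
print(incomplete)
```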

In [23]:
working_df.dtypes

ride_id                object
started_at             object
ended_at               object
rideable_type          object
start_station_id        int64
start_station_name     object
end_station_id        float64
end_station_name       object
member_casual          object
dtype: object
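
`end_station_id` is `float64` while `start_station_id` is `int64`, most likely because of the missing end station: a NumPy integer column cannot hold NaN, so pandas stores the column as float. The sketch below reproduces the effect on a toy Series and shows pandas' nullable `Int64` dtype as one possible alternative; the notebook does not perform this conversion.

```python
import pandas as pd

# A single missing value is enough to turn an integer column into float64.
ids = pd.Series([56, 59, None])
print(ids.dtype)

# The nullable integer dtype keeps whole numbers and marks the gap as <NA>.
nullable = ids.astype("Int64")
print(nullable)
```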

In [24]:
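#numeric_only=True keeps the sum to numeric columns, which newer pandas versions no longer do by default
working_df.groupby('member_casual').sum(numeric_only=True)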

Unnamed: 0_level_0,start_station_id,end_station_id
member_casual,Unnamed: 1_level_1,Unnamed: 2_level_1
Customer,162628054,163747392.0
Subscriber,535016961,537422927.0
casual,11807155,11785737.0
member,77752798,77577702.0


There are two names for members ("member" and "Subscriber") and two names for casual riders ("Customer" and "casual"). We'll need to consolidate the four labels into two.

We'll also add columns such as day, month, and year, which provide additional opportunities to aggregate the data.

Since ```q1_2020``` has no ```tripduration``` column, we'll add a ```ride_length``` column to the entire dataframe for consistency.

In some rides ```tripduration``` shows up as negative, including several hundred rides where the bikes were taken out for quality-control reasons. We'll delete these rides.

In [25]:
#renaming Subscriber to member and Customer to casual

working_df['member_casual'] = working_df['member_casual'].replace({'Subscriber' : 'member', 'Customer' : 'casual'})

#checking if the value has changed


In [28]:
working_df['member_casual'].value_counts()

member    2973868
casual     905954
Name: member_casual, dtype: int64
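
Beyond eyeballing the counts, the relabeling can be checked programmatically. The sketch below runs the same `replace` mapping on a toy Series and fails loudly if any label other than the two expected ones survives; the `expected` set is an assumption about what the cleaned column should contain.

```python
import pandas as pd

# Toy column mixing the old and new label conventions.
labels = pd.Series(["Subscriber", "Customer", "member", "casual", "Subscriber"])

mapping = {"Subscriber": "member", "Customer": "casual"}
cleaned = labels.replace(mapping)

# Anything outside the expected set points to a label the mapping missed.
expected = {"member", "casual"}
unexpected = set(cleaned.unique()) - expected
assert not unexpected, f"unexpected rider labels: {unexpected}"

print(cleaned.value_counts())
```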

In [38]:
#as shown in the earlier dtypes output, started_at and ended_at are object columns, so we'll convert them to datetime using pandas
#adding columns that list the date, month, day, and year of each ride
working_df['started_at'] = pd.to_datetime(working_df['started_at'])

#we'll also change ended_at to datetime while we're at it
working_df['ended_at'] = pd.to_datetime(working_df['ended_at'])

#let's add new columns for date, month, day, and year
working_df['date'] = working_df['started_at'].dt.date
working_df['month'] = working_df['started_at'].dt.month
working_df['day'] = working_df['started_at'].dt.day
working_df['year'] = working_df['started_at'].dt.year

In [39]:
working_df.dtypes

ride_id                       object
started_at            datetime64[ns]
ended_at              datetime64[ns]
rideable_type                 object
start_station_id               int64
start_station_name            object
end_station_id               float64
end_station_name              object
member_casual                 object
date                          object
month                          int64
day                            int64
year                           int64
dtype: object
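
The date-part columns above use the day of the month. If weekday patterns are of interest, the same `.dt` accessor can also provide the weekday name. This is only a sketch of a possible extension: `day_of_week` and `is_weekend` are hypothetical column names that the notebook does not create.

```python
import pandas as pd

# Two timestamps taken from the sample rows shown earlier.
rides = pd.DataFrame({"started_at": ["2019-04-01 00:02:22", "2020-01-21 20:06:59"]})
rides["started_at"] = pd.to_datetime(rides["started_at"])

# day_name() returns the weekday name; dayofweek counts Monday=0 through Sunday=6.
rides["day_of_week"] = rides["started_at"].dt.day_name()
rides["is_weekend"] = rides["started_at"].dt.dayofweek >= 5

print(rides)
```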

In [47]:
#creating the ride_length column (ride duration in seconds)
working_df['ride_length'] = (working_df['ended_at'] - working_df['started_at']).dt.total_seconds()

In [48]:
working_df.head()

Unnamed: 0,ride_id,started_at,ended_at,rideable_type,start_station_id,start_station_name,end_station_id,end_station_name,member_casual,date,month,day,year,ride_length
0,22178529,2019-04-01 00:02:22,2019-04-01 00:09:48,6251,81,Daley Center Plaza,56.0,Desplaines St & Kinzie St,member,2019-04-01,4,1,2019,446.0
1,22178530,2019-04-01 00:03:02,2019-04-01 00:20:30,6226,317,Wood St & Taylor St,59.0,Wabash Ave & Roosevelt Rd,member,2019-04-01,4,1,2019,1048.0
2,22178531,2019-04-01 00:11:07,2019-04-01 00:15:19,5649,283,LaSalle St & Jackson Blvd,174.0,Canal St & Madison St,member,2019-04-01,4,1,2019,252.0
3,22178532,2019-04-01 00:13:01,2019-04-01 00:18:58,4151,26,McClurg Ct & Illinois St,133.0,Kingsbury St & Kinzie St,member,2019-04-01,4,1,2019,357.0
4,22178533,2019-04-01 00:19:26,2019-04-01 00:36:13,3270,202,Halsted St & 18th St,129.0,Blue Island Ave & 18th St,member,2019-04-01,4,1,2019,1007.0


In [70]:
#removing rides that start at HQ QR or that have a negative ride_length

working_df = working_df.query('~(start_station_name == "HQ QR" or ride_length < 0)')

Since the HQ QR rides were for maintenance (quality control), we drop these rows along with any ride that has a negative ```ride_length```.
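
To see how many rows each condition accounts for, the same filter can be written with explicit boolean masks. The frame below is a made-up four-row stand-in, so the counts it prints say nothing about the real data.

```python
import pandas as pd

# Toy rides: two from HQ QR, two with a negative length (one row is both).
rides = pd.DataFrame({
    "start_station_name": ["HQ QR", "Daley Center Plaza", "HQ QR", "Wood St & Taylor St"],
    "ride_length": [120.0, 446.0, -30.0, -5.0],
})

is_hq = rides["start_station_name"] == "HQ QR"
is_negative = rides["ride_length"] < 0

print("HQ QR rides:", int(is_hq.sum()))
print("negative-length rides:", int(is_negative.sum()))
print("rows removed:", int((is_hq | is_negative).sum()))

# Same result as the query() expression: keep rows matching neither condition.
cleaned = rides[~(is_hq | is_negative)]
```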

Now that we have clean data, let's conduct a descriptive analysis!
We'll look at the mean, median, max, and min of ```ride_length```.

In [71]:
pd.options.display.float_format = '{:.0f}'.format
working_df['ride_length'].describe()

count   3876042
mean       1479
std       30924
min           1
25%         412
50%         712
75%        1289
max     9387024
Name: ride_length, dtype: float64
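
These figures are in seconds, which makes the scale hard to judge at a glance. A small sketch using only the standard library converts the median, mean, and max from the output above into hours, minutes, and seconds:

```python
from datetime import timedelta

# Figures copied from the describe() output above (seconds).
median_s, mean_s, max_s = 712, 1479, 9387024

for label, seconds in [("median", median_s), ("mean", mean_s), ("max", max_s)]:
    print(f"{label}: {timedelta(seconds=seconds)}")

longest = timedelta(seconds=max_s)
```

The longest ride works out to more than 108 days, while the median is just under 12 minutes, so a few extreme rides pull the mean well above the typical ride.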

In [72]:
pd.options.display.float_format = None

In [69]:
working_df.groupby('member_casual')['ride_length'].mean()

member_casual
casual   3553
member    850
Name: ride_length, dtype: float64
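
The text above mentions the mean, median, max, and min, while the last cell reports only the mean per rider type. A minimal sketch of how all four could be computed in one call with `agg`, using invented numbers that are not Cyclistic results:

```python
import pandas as pd

# Toy ride lengths in seconds; one very long casual ride acts as an outlier.
rides = pd.DataFrame({
    "member_casual": ["casual", "casual", "casual", "member", "member", "member"],
    "ride_length": [600.0, 1800.0, 90000.0, 300.0, 700.0, 1100.0],
})

summary = rides.groupby("member_casual")["ride_length"].agg(["mean", "median", "max", "min"])
print(summary)
```

Putting the median next to the mean shows how much a single long ride can inflate the average for a group.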