## Dependencies and Read CSV

* Most recent CSV taken from https://data.austintexas.gov/Transportation-and-Mobility/Austin-B-Cycle-Trips/tyfh-5r8s 

* Data Available
    * Trip ID
    * Membership Type
    * Bicycle ID
    * Checkout Date
    * Checkout Time
    * Checkout Kiosk ID
    * Checkout Kiosk
    * Return Kiosk ID
    * Return Kiosk
    * Trip Duration Minutes
    * Month
    * Year

In [1]:
# Import Dependencies
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt

# Ignore warnings, since we are rewriting values in place
import warnings
warnings.filterwarnings('ignore')

In [2]:
# Read the CSV file with pandas into a dataframe
df_bike = pd.read_csv("resources/austin_B-Cycle_Trips.csv")


In [3]:
# Display the top rows of the dataframe 
df_bike.head()


Unnamed: 0,Trip ID,Membership Type,Bicycle ID,Checkout Date,Checkout Time,Checkout Kiosk ID,Checkout Kiosk,Return Kiosk ID,Return Kiosk,Trip Duration Minutes,Month,Year
0,9900285854,Annual (San Antonio B-cycle),207.0,10/26/2014,13:12:00,2537.0,West & 6th St.,2707.0,Rainey St @ Cummings,76,10.0,2014.0
1,9900285855,24-Hour Kiosk (Austin B-cycle),969.0,10/26/2014,13:12:00,2498.0,Convention Center / 4th St. @ MetroRail,2566.0,Pfluger Bridge @ W 2nd Street,58,10.0,2014.0
2,9900285856,Annual Membership (Austin B-cycle),214.0,10/26/2014,13:12:00,2537.0,West & 6th St.,2496.0,8th & Congress,8,10.0,2014.0
3,9900285857,24-Hour Kiosk (Austin B-cycle),745.0,10/26/2014,13:12:00,,Zilker Park at Barton Springs & William Barton...,,Zilker Park at Barton Springs & William Barton...,28,10.0,2014.0
4,9900285858,24-Hour Kiosk (Austin B-cycle),164.0,10/26/2014,13:12:00,2538.0,Bullock Museum @ Congress & MLK,,Convention Center/ 3rd & Trinity,15,10.0,2014.0


## Initial Data Exploration 

An initial look at the data shows missing values in the Month, Year, Membership Type, Bicycle ID, Return Kiosk ID, and Checkout Kiosk ID columns. That does not necessarily mean the information is unavailable: Month and Year, for example, can be extracted from Checkout Date and reformatted.
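
A compact way to see the missing count for every column at once is to call `isnull().sum()` on the whole frame. The sketch below uses a small made-up frame; the name `toy` and its values are illustrative and not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame standing in for df_bike
toy = pd.DataFrame({
    "Trip ID": [1, 2, 3],
    "Checkout Kiosk ID": [2537.0, np.nan, 2498.0],
    "Month": [10.0, np.nan, np.nan],
})

# Number of missing values per column
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```

On the real data this should give the same numbers as the per-column checks in the cells that follow.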

In [4]:
# Count rows and columns 
df_bike.shape

(991271, 12)

In [5]:
# Check for missing values 
df_bike.count()

Trip ID                  991271
Membership Type          984960
Bicycle ID               990548
Checkout Date            991271
Checkout Time            991271
Checkout Kiosk ID        968117
Checkout Kiosk           991271
Return Kiosk ID          966858
Return Kiosk             991271
Trip Duration Minutes    991271
Month                    618479
Year                     618479
dtype: int64

In [6]:
# We see that the last values of our dataframe are missing the month and year
df_bike.tail()

Unnamed: 0,Trip ID,Membership Type,Bicycle ID,Checkout Date,Checkout Time,Checkout Kiosk ID,Checkout Kiosk,Return Kiosk ID,Return Kiosk,Trip Duration Minutes,Month,Year
991266,18013687,Local30,2828.0,7/20/2018,20:24:42,2566.0,Pfluger Bridge @ W 2nd Street,3685.0,Henderson & 9th,6,,
991267,18048048,Local30,447.0,7/24/2018,15:12:54,2707.0,Rainey St @ Cummings,2552.0,3rd & West,7,,
991268,17988798,U.T. Student Membership,472.0,7/18/2018,9:59:11,3838.0,Nueces & 26th,3798.0,21st & Speedway @PCL,6,,
991269,17902063,U.T. Student Membership,220.0,7/9/2018,11:50:26,2547.0,Guadalupe & 21st,3792.0,22nd & Pearl,2,,
991270,18007218,Local30,666.0,7/20/2018,8:38:40,3621.0,Nueces & 3rd,2495.0,4th & Congress,5,,


In [7]:
## Check how many rows are missing the month and year
df_bike["Year"].isnull().sum()

372792

In [8]:
df_bike["Month"].isnull().sum()

372792

In [9]:
# Check missing membership type count
df_bike["Membership Type"].isnull().sum()

6311

In [10]:
# Check missing bicycle id count
df_bike["Bicycle ID"].isnull().sum()

723

In [11]:
# Check the data types 
df_bike.dtypes

Trip ID                    int64
Membership Type           object
Bicycle ID               float64
Checkout Date             object
Checkout Time             object
Checkout Kiosk ID        float64
Checkout Kiosk            object
Return Kiosk ID          float64
Return Kiosk              object
Trip Duration Minutes      int64
Month                    float64
Year                     float64
dtype: object

## Initial Data Clean Up

In this section we create a new data frame with the needed values.
1.	Kiosk IDs are missing for some rows, but the kiosk names are always present, so we can keep those rows instead of dropping them.

2.	We extract the year, month, and day from Checkout Date to fill in the missing values in the Month and Year columns.

3.	We extract the hour from Checkout Time for future analysis.
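
Steps 2 and 3 can be tried out on a couple of rows before running them on the full data. This is a minimal sketch on made-up rows that mirror the raw date and time formats; the frame `toy` is hypothetical, and the explicit `format` string is an assumption about how the dates are written.

```python
import pandas as pd

# Hypothetical sample rows mirroring the raw date and time formats
toy = pd.DataFrame({
    "Checkout Date": ["10/26/2014", "7/9/2018"],
    "Checkout Time": ["13:12:00", "9:59:11"],
})

# Parse the date once, then read the parts off the datetime accessor
dates = pd.to_datetime(toy["Checkout Date"], format="%m/%d/%Y")
toy["Year"] = dates.dt.year
toy["Month"] = dates.dt.month
toy["Trip Day of Week"] = dates.dt.day_name()

# Only the hour is needed, so take it from the time string directly
toy["Trip Hour"] = toy["Checkout Time"].str.split(":").str[0].astype(int)
print(toy)
```

Splitting the string is one option for the hour; parsing the time with `pd.to_datetime`, as the notebook does, gives the same result.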



In [98]:
# Work on a copy so the raw dataframe stays unchanged
df_bike_clean = df_bike.copy()

In [40]:
# Checkout Date is populated for every trip, so it can be used to fill in Month and Year
df_bike_clean.count()

Trip ID                  991271
Membership Type          984960
Bicycle ID               990548
Checkout Date            991271
Checkout Time            991271
Checkout Kiosk ID        968117
Checkout Kiosk           991271
Return Kiosk ID          966858
Return Kiosk             991271
Trip Duration Minutes    991271
Month                    618479
Year                     618479
dtype: int64

In [41]:
# Take the Checkout Date and extract the Month, Year, Day, and Day of the Week

df_bike_clean['Checkout Date'] = pd.to_datetime(df_bike_clean['Checkout Date'])
df_bike_clean['Year'] = df_bike_clean['Checkout Date'].dt.year
df_bike_clean['Month'] = df_bike_clean['Checkout Date'].dt.month
df_bike_clean['Trip Date'] = df_bike_clean['Checkout Date'].dt.day
# dt.day_name() replaces dt.weekday_name, which was removed in pandas 1.0
df_bike_clean['Trip Day of Week'] = df_bike_clean['Checkout Date'].dt.day_name()



In [42]:
# Inspect the filled in values 
df_bike_clean.tail()

Unnamed: 0,Trip ID,Membership Type,Bicycle ID,Checkout Date,Checkout Time,Checkout Kiosk ID,Checkout Kiosk,Return Kiosk ID,Return Kiosk,Trip Duration Minutes,Month,Year,Trip Date,Trip Day of Week
991266,18013687,Local30,2828.0,2018-07-20,20:24:42,2566.0,Pfluger Bridge @ W 2nd Street,3685.0,Henderson & 9th,6,7,2018,20,Friday
991267,18048048,Local30,447.0,2018-07-24,15:12:54,2707.0,Rainey St @ Cummings,2552.0,3rd & West,7,7,2018,24,Tuesday
991268,17988798,U.T. Student Membership,472.0,2018-07-18,9:59:11,3838.0,Nueces & 26th,3798.0,21st & Speedway @PCL,6,7,2018,18,Wednesday
991269,17902063,U.T. Student Membership,220.0,2018-07-09,11:50:26,2547.0,Guadalupe & 21st,3792.0,22nd & Pearl,2,7,2018,9,Monday
991270,18007218,Local30,666.0,2018-07-20,8:38:40,3621.0,Nueces & 3rd,2495.0,4th & Congress,5,7,2018,20,Friday


In [43]:
# Extract the hour from the checkout time
# (infer_datetime_format is deprecated as of pandas 2.0, so it is omitted here)
checkout_time = pd.to_datetime(df_bike_clean['Checkout Time'])
df_bike_clean['Trip Hour'] = checkout_time.dt.hour
df_bike_clean.head(1)


Unnamed: 0,Trip ID,Membership Type,Bicycle ID,Checkout Date,Checkout Time,Checkout Kiosk ID,Checkout Kiosk,Return Kiosk ID,Return Kiosk,Trip Duration Minutes,Month,Year,Trip Date,Trip Day of Week,Trip Hour
0,9900285854,Annual (San Antonio B-cycle),207.0,2014-10-26,13:12:00,2537.0,West & 6th St.,2707.0,Rainey St @ Cummings,76,10,2014,26,Sunday,13


In [44]:
# Inspect the filled in values 
df_bike_clean.head()

Unnamed: 0,Trip ID,Membership Type,Bicycle ID,Checkout Date,Checkout Time,Checkout Kiosk ID,Checkout Kiosk,Return Kiosk ID,Return Kiosk,Trip Duration Minutes,Month,Year,Trip Date,Trip Day of Week,Trip Hour
0,9900285854,Annual (San Antonio B-cycle),207.0,2014-10-26,13:12:00,2537.0,West & 6th St.,2707.0,Rainey St @ Cummings,76,10,2014,26,Sunday,13
1,9900285855,24-Hour Kiosk (Austin B-cycle),969.0,2014-10-26,13:12:00,2498.0,Convention Center / 4th St. @ MetroRail,2566.0,Pfluger Bridge @ W 2nd Street,58,10,2014,26,Sunday,13
2,9900285856,Annual Membership (Austin B-cycle),214.0,2014-10-26,13:12:00,2537.0,West & 6th St.,2496.0,8th & Congress,8,10,2014,26,Sunday,13
3,9900285857,24-Hour Kiosk (Austin B-cycle),745.0,2014-10-26,13:12:00,,Zilker Park at Barton Springs & William Barton...,,Zilker Park at Barton Springs & William Barton...,28,10,2014,26,Sunday,13
4,9900285858,24-Hour Kiosk (Austin B-cycle),164.0,2014-10-26,13:12:00,2538.0,Bullock Museum @ Congress & MLK,,Convention Center/ 3rd & Trinity,15,10,2014,26,Sunday,13


In [45]:
# Check for missing values 
df_bike_clean.count()

Trip ID                  991271
Membership Type          984960
Bicycle ID               990548
Checkout Date            991271
Checkout Time            991271
Checkout Kiosk ID        968117
Checkout Kiosk           991271
Return Kiosk ID          966858
Return Kiosk             991271
Trip Duration Minutes    991271
Month                    991271
Year                     991271
Trip Date                991271
Trip Day of Week         991271
Trip Hour                991271
dtype: int64

In [46]:
# Verify Data types
df_bike_clean.dtypes

Trip ID                           int64
Membership Type                  object
Bicycle ID                      float64
Checkout Date            datetime64[ns]
Checkout Time                    object
Checkout Kiosk ID               float64
Checkout Kiosk                   object
Return Kiosk ID                 float64
Return Kiosk                     object
Trip Duration Minutes             int64
Month                             int64
Year                              int64
Trip Date                         int64
Trip Day of Week                 object
Trip Hour                         int64
dtype: object

In [47]:
## Rename the columns
df_bike_clean = df_bike_clean.rename(columns = {"Checkout Kiosk ID":"Checkout Station ID","Checkout Kiosk":"Checkout Station",
                                          "Return Kiosk ID":"Return Station ID","Return Kiosk":"Return Station",
                                          "Month":"Trip Month","Year":"Trip Year"})
df_bike_clean.head(1) 

Unnamed: 0,Trip ID,Membership Type,Bicycle ID,Checkout Date,Checkout Time,Checkout Station ID,Checkout Station,Return Station ID,Return Station,Trip Duration Minutes,Trip Month,Trip Year,Trip Date,Trip Day of Week,Trip Hour
0,9900285854,Annual (San Antonio B-cycle),207.0,2014-10-26,13:12:00,2537.0,West & 6th St.,2707.0,Rainey St @ Cummings,76,10,2014,26,Sunday,13


## Cleaning the column Trip Duration 

Some rows have unreliable trip durations: bikes recorded as stolen or missing, and zero-minute trips. We do not drop these rows from the final clean CSV file, because their Trip IDs are still valid and can be used to count rides when analyzing other columns.

In [48]:
## Check how many bikes were stolen
## These bikes have unusually large trip durations
df_bike_stolen = df_bike_clean.loc[df_bike_clean["Return Station"] == "Stolen"]
number_bike_stolen = df_bike_stolen["Return Station"].count()
number_bike_stolen

23

In [49]:
## Check how many bikes were missing
## These bikes have unusually large trip durations
df_bike_missing = df_bike_clean.loc[df_bike_clean["Return Station"] == "Missing"]
number_bike_missing = df_bike_missing["Return Station"].count()
number_bike_missing

25

In [50]:
## Check how many trips have a duration of zero minutes
df_bike_trip_minutes_zero = df_bike_clean.loc[df_bike_clean["Trip Duration Minutes"] == 0]
number_bike_trip_minutes_zero = df_bike_trip_minutes_zero["Trip ID"].count()
number_bike_trip_minutes_zero

19033
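
The rows identified above (return station recorded as `Stolen` or `Missing`, or a zero-minute duration) can stay in the data for ride counts while being excluded from duration statistics. One possible approach is a boolean flag column. The sketch below runs on made-up rows, and the column name `Duration Suspect` is hypothetical, not something the notebook defines.

```python
import pandas as pd

# Hypothetical rows: one normal trip, one stolen, one missing, one zero-minute
toy = pd.DataFrame({
    "Trip ID": [1, 2, 3, 4],
    "Return Station": ["8th & Congress", "Stolen", "Missing", "3rd & West"],
    "Trip Duration Minutes": [8, 21000, 15000, 0],
})

# Flag rows whose duration should not be trusted, instead of dropping them
toy["Duration Suspect"] = (
    toy["Return Station"].isin(["Stolen", "Missing"])
    | (toy["Trip Duration Minutes"] == 0)
)

total_rides = len(toy)  # every Trip ID still counts as a ride
mean_duration = toy.loc[~toy["Duration Suspect"], "Trip Duration Minutes"].mean()
print(total_rides, mean_duration)
```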

## Cleaning and data exploration of the Checkout Station ID and Checkout Station Column

Some rows have a blank Checkout Station ID. We do not drop these rows from the final clean CSV file, because their Trip IDs are still valid and can be used to count rides when analyzing other columns.

In [51]:
## Check how many Checkout Station IDs are blank
number_df_bike_checkout_id_blank  = df_bike_clean["Checkout Station ID"].isnull().sum()
number_df_bike_checkout_id_blank

23154

In [74]:
# Fill the NaN values with zero for exploration
df_bike_na = df_bike_clean.fillna(0)

In [75]:
## Find which checkout stations have blank checkout IDs
df_bike_checkout_id_blank = df_bike_na.loc[df_bike_na["Checkout Station ID"] == 0]
df_bike_checkout_id_blank["Checkout Station"].value_counts()

Zilker Park at Barton Springs & William Barton Drive    11534
Dean Keeton & Speedway                                   3825
ACC - West & 12th                                        2462
Convention Center/ 3rd & Trinity                         1292
Mobile Station                                           1183
East 11th Street at Victory Grill                        1030
Red River @ LBJ Library                                   584
Mobile Station @ Bike Fest                                516
Main Office                                               300
Bullock Museum @ Congress & MLK                           172
State Capitol @ 14th & Colorado                           111
MapJam at Pan Am Park                                      32
MapJam at French Legation                                  27
MapJam at Hops & Grain Brewery                             19
Repair Shop                                                15
MapJam at Scoot Inn                                        11
Shop    

In [70]:
df_bike_clean["Checkout Station"].value_counts()

21st & Speedway @PCL                           42167
Riverside @ S. Lamar                           36988
City Hall / Lavaca & 2nd                       34604
2nd & Congress                                 33407
5th & Bowie                                    32405
Rainey St @ Cummings                           31680
4th & Congress                                 30160
Convention Center / 4th St. @ MetroRail        30001
Davis at Rainey Street                         29763
Capitol Station / Congress & 11th              25226
Pfluger Bridge @ W 2nd Street                  24699
3rd & West                                     21404
UT West Mall @ Guadalupe                       20858
Long Center @ South 1st & Riverside            20554
Palmer Auditorium                              19481
Zilker Park                                    19274
Barton Springs @ Kinney Ave                    19204
Barton Springs & Riverside                     19118
South Congress & James                        

In [76]:
## Number of stations that have rows with no checkout ID
len(df_bike_checkout_id_blank["Checkout Station"].value_counts())

24

In [77]:
## Number of Unique Checkout Station
df_bike_clean["Checkout Station"].unique().size

104

In [67]:
## Number of stations with checkout station IDs other than zero
104-24

80

In [65]:
## Number of Unique Checkout Station ID
df_bike_clean["Checkout Station ID"].unique().size

84
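
Note that `unique()` treats NaN as a distinct value, so the 84 above counts the blank IDs as one entry, while `nunique()` excludes NaN by default. A minimal illustration on a made-up series (the values are chosen only for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical ID column with two distinct IDs and two blanks
ids = pd.Series([2537.0, 2498.0, np.nan, 2537.0, np.nan])

with_nan = ids.unique().size   # NaN counts as one distinct value
without_nan = ids.nunique()    # NaN is excluded by default
print(with_nan, without_nan)
```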

In [78]:
## list of checkout station id
checkout_station_id_list =  df_bike_na["Checkout Station ID"].value_counts().index
checkout_station_id_list

## There are 83 unique checkout station IDs, excluding zero

Float64Index([3798.0, 2575.0, 2499.0, 2494.0, 2501.0, 2707.0, 2495.0, 2498.0,
              2563.0, 2497.0, 2566.0,    0.0, 2552.0, 2548.0, 2549.0, 2567.0,
              2574.0, 2711.0, 2502.0, 2503.0, 2547.0, 2570.0, 2539.0, 2572.0,
              2496.0, 2504.0, 3841.0, 2537.0, 3792.0, 2542.0, 3377.0, 2565.0,
              3390.0, 2571.0, 2538.0, 3793.0, 3838.0, 2550.0, 2569.0, 3794.0,
              2562.0, 3795.0, 3513.0, 2540.0, 3797.0, 2822.0, 2564.0, 3619.0,
              3621.0, 2561.0, 3799.0, 2536.0, 3455.0, 2544.0, 3292.0, 2568.0,
              2541.0, 3687.0, 1007.0, 1008.0, 3291.0, 3684.0, 3293.0, 3686.0,
              3660.0, 2712.0, 2823.0, 2576.0, 3294.0, 3685.0, 2546.0, 2545.0,
              3635.0, 1006.0, 1002.0, 3464.0, 3790.0, 1003.0, 3381.0, 2500.0,
              3791.0, 3456.0, 1005.0, 1001.0],
             dtype='float64')

In [79]:
## There are more unique non-zero checkout station IDs (83) than stations with a
## non-zero ID (104 - 24 = 80), which implies a few stations have more than one ID

# Build a dictionary keyed by checkout station, holding the set of IDs seen for it,
# to check which stations have more than one checkout ID
Checkout_station_id = dict()
for index, row in df_bike_na.iterrows():
    if row['Checkout Station'] not in Checkout_station_id:
        Checkout_station_id[row['Checkout Station']] = set()
    # Add the ID on every row, including the first row seen for a station
    Checkout_station_id[row['Checkout Station']].add(row['Checkout Station ID'])
Checkout_station_id

{'West & 6th St.': {2537.0},
 'Convention Center / 4th St. @ MetroRail': {2498.0},
 'Zilker Park at Barton Springs & William Barton Drive': {0.0},
 'Bullock Museum @ Congress & MLK': {0.0, 2538.0},
 '8th & Congress': {2496.0},
 'East 11th St. & San Marcos': {2569.0},
 'South Congress & Elizabeth': {2504.0},
 'Pfluger Bridge @ W 2nd Street': {2566.0},
 'Riverside @ S. Lamar': {2575.0},
 '2nd & Congress': {2494.0},
 'Convention Center/ 3rd & Trinity': {0.0},
 'East 6th at Robert Martinez': {2822.0},
 'East 6th & Pedernales St.': {2544.0},
 'Davis at Rainey Street': {2563.0},
 'UT West Mall @ Guadalupe': {2548.0},
 'East 11th Street at Victory Grill': {0.0},
 'Palmer Auditorium': {2567.0},
 'State Capitol Visitors Garage @ San Jacinto & 12th': {2561.0},
 'Rainey St @ Cummings': {2707.0},
 '5th & Bowie': {2501.0},
 'Long Center @ South 1st & Riverside': {2549.0},
 '17th & Guadalupe': {2540.0},
 'Red River & 8th Street': {2571.0},
 'Barton Springs Pool': {2572.0},
 'State Capitol @ 14th & C
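
Looping over roughly a million rows with `iterrows()` is slow. As an alternative sketch, the same station-to-IDs mapping can be built with a `groupby`. The example below uses a few made-up rows (the frame `toy` is hypothetical) and also filters for stations that carry more than one ID.

```python
import pandas as pd

# Hypothetical rows; zero stands for a blank ID, as in df_bike_na
toy = pd.DataFrame({
    "Checkout Station": [
        "West & 6th St.",
        "Bullock Museum @ Congress & MLK",
        "Bullock Museum @ Congress & MLK",
        "West & 6th St.",
    ],
    "Checkout Station ID": [2537.0, 0.0, 2538.0, 2537.0],
})

# Distinct IDs per station, computed without a Python-level row loop
ids_per_station = (
    toy.groupby("Checkout Station")["Checkout Station ID"].apply(set).to_dict()
)

# Stations that map to more than one ID
multi_id_stations = {
    name: ids for name, ids in ids_per_station.items() if len(ids) > 1
}
print(multi_id_stations)
```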

## Cleaning membership data 

While there are missing values in the membership data, we want to take a closer look to understand the types of memberships. On closer inspection, some memberships have near-identical names and should be grouped together, for example U.T. Student Membership and UT Student Membership. It is also more useful to group the data into day, weekend, week, month, year, 3 year, and student memberships.

Rows with a missing membership type are not dropped, because their Trip IDs are still valid and can be used to count rides when analyzing other columns.
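
The grouping can be expressed as a single mapping from raw label to category, which also makes it easy to list labels the mapping has not covered yet. The sketch below uses only a handful of the raw labels; the label `Some New Plan` is made up to show what an unmapped value looks like.

```python
import pandas as pd

# A few of the raw labels mapped to their target category
membership_map = {
    "Walk Up": "day",
    "Weekender": "weekend",
    "7-Day": "week",
    "Local30": "month",
    "Local365": "year",
    "U.T. Student Membership": "student",
    "UT Student Membership": "student",
    "Founding Member": "3 year",
}

raw = pd.Series([
    "Walk Up",
    "UT Student Membership",
    "U.T. Student Membership",
    "Local365",
    "Some New Plan",  # hypothetical label that the mapping does not cover
])
cleaned = raw.replace(membership_map)

# Labels left unchanged by the mapping can be listed for review
categories = set(membership_map.values())
unmapped = sorted(set(cleaned) - categories)
print(cleaned.tolist(), unmapped)
```

Checking `unmapped` after each pass is one way to confirm that no spelling variant has been overlooked.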

In [80]:
df_bike_clean["Membership Type"].value_counts()

Walk Up                                          368322
Local365                                         167363
U.T. Student Membership                          158480
24-Hour Kiosk (Austin B-cycle)                   108672
Local30                                           54774
Weekender                                         43880
Annual Membership (Austin B-cycle)                30306
Explorer                                          14860
Local365+Guest Pass                               10331
Local365 ($80 plus tax)                            4005
Founding Member                                    3550
7-Day                                              3137
Founding Member (Austin B-cycle)                   2764
7-Day Membership (Austin B-cycle)                  2760
Semester Membership (Austin B-cycle)               2426
Annual                                             1087
Semester Membership                                 900
Local30 ($11 plus tax)                          

In [81]:
# Examine Prohibited and Restricted
# (separate names, so the second filter does not overwrite the first)
restricted = df_bike_clean.loc[df_bike_clean["Membership Type"] == "RESTRICTED", :]
prohibited = df_bike_clean.loc[df_bike_clean["Membership Type"] == "PROHIBITED", :]
prohibited.head()

Unnamed: 0,Trip ID,Membership Type,Bicycle ID,Checkout Date,Checkout Time,Checkout Station ID,Checkout Station,Return Station ID,Return Station,Trip Duration Minutes,Trip Month,Trip Year,Trip Date,Trip Day of Week,Trip Hour
326220,8482437,PROHIBITED,511.0,2016-01-20,15:34:07,2497.0,Capitol Station / Congress & 11th,2497.0,Capitol Station / Congress & 11th,0,1,2016,20,Wednesday,15
329487,8370751,PROHIBITED,391.0,2016-01-11,14:14:10,2497.0,Capitol Station / Congress & 11th,2497.0,Capitol Station / Congress & 11th,0,1,2016,11,Monday,14
329488,8370756,PROHIBITED,391.0,2016-01-11,14:14:46,2497.0,Capitol Station / Congress & 11th,2497.0,Capitol Station / Congress & 11th,1,1,2016,11,Monday,14
329489,8370766,PROHIBITED,391.0,2016-01-11,14:15:09,2497.0,Capitol Station / Congress & 11th,2497.0,Capitol Station / Congress & 11th,0,1,2016,11,Monday,14
347380,9900014475,PROHIBITED,391.0,2016-01-11,14:12:10,2497.0,Capitol Station / Congress & 11th,2497.0,Capitol Station / Congress & 11th,0,1,2016,11,Monday,14


In [82]:
# Replace all 24-hour and other single-day memberships == day
df_bike_clean["Membership Type"] = df_bike_clean["Membership Type"].replace(
    {"24-Hour Kiosk (Austin B-cycle)": "day",
     "24-Hour-Online (Austin B-cycle)": "day",
     "24-Hour Membership (Austin B-cycle)": "day",
    "Explorer": "day", 
    "Explorer ($8 plus tax)":"day",
    "Walk Up": "day",
    "Try Before You Buy Special": "day",
    "RideScout Single Ride": "day", 
    "Aluminum Access":"day"})

# Replace all weekend membership == weekend 
df_bike_clean["Membership Type"] = df_bike_clean["Membership Type"].replace(
    {
        "Weekender": "weekend", 
        "Weekender ($15 plus tax)": "weekend", 
        "ACL Weekend Pass Special (Austin B-cycle)": "weekend", 
        "FunFunFun Fest 3 Day Pass": "weekend"
    })

# Replace all 7-day memberships == week
df_bike_clean["Membership Type"] = df_bike_clean["Membership Type"].replace(
    {
        "7-Day": "week", 
        "7-Day Membership (Austin B-cycle)": "week", 
    })


# Replace all monthly memberships == month
df_bike_clean["Membership Type"] = df_bike_clean["Membership Type"].replace(
    {
        "Local30": "month", 
        "Local30 ($11 plus tax)": "month",
        "Madtown Monthly":"month", 
    })


# Combine all student memberships
df_bike_clean["Membership Type"] = df_bike_clean["Membership Type"].replace(
    {
        "U.T. Student Membership": "student",
        "UT Student Membership": "student", 
        "Semester Membership (Austin B-cycle)":"student", 
        "Semester Membership": "student"
    })

# Replace all annual membership == year
df_bike_clean["Membership Type"] = df_bike_clean["Membership Type"].replace(
    {
        "Annual Membership (Austin B-cycle)": "year",
         "Annual Member": "year",
         "Annual Membership":"year",
         "Annual (San Antonio B-cycle)": "year",
         "Annual Member (Houston B-cycle)":"year",
         "Annual Membership (Fort Worth Bike Sharing)":"year",
         "Annual (Denver B-cycle)":"year",
         "Republic Rider (Annual)":"year",
         "Republic Rider": "year",
         "Annual Plus":"year",
         "Annual (Madison B-cycle)":"year",
         "Annual (Broward B-cycle)":"year",
         "Annual (Denver Bike Sharing)":"year",
         "Annual (Boulder B-cycle)":"year",
         "Annual Membership (GREENbike)":"year",
         "Annual Pass":"year",
         "Annual (Kansas City B-cycle)":"year",
         "Annual (Cincy Red Bike)":"year",
         "Annual (Nashville B-cycle)":"year",
         "Annual Plus Membership":"year",
         "Annual Membership (Charlotte B-cycle)":"year",
         "Annual Membership (Indy - Pacers Bikeshare )":"year",
         "Annual (Omaha B-cycle)":"year",
         "Annual":"year",
         "Annual ": "year",
         "Local365": "year", 
         "Local365+Guest Pass":"year",
         "Local365 ($80 plus tax)": "year",
         "Local365 Youth with helmet (age 13-17 riders)": "year", 
         "Local365 Youth (age 13-17 riders)":"year",
         "Membership: pay once  one-year commitment":"year"
        
    })

# Replace all founding membership == 3 year
df_bike_clean["Membership Type"] = df_bike_clean["Membership Type"].replace(
    {
        "Founding Member": "3 year",
        "Founding Member (Austin B-cycle)": "3 year",
        "Denver B-cycle Founder": "3 year"
    })


In [83]:
# Create a new data frame that does not include restricted and prohibited
bike_trips= df_bike_clean.loc[(df_bike_clean["Membership Type"] != "RESTRICTED") & (df_bike_clean["Membership Type"] != "PROHIBITED"), :]

In [84]:
# Verify clean up 
bike_trips["Membership Type"].value_counts()

day        493728
year       216726
student    161815
month       55650
weekend     44802
3 year       6324
week         5897
Name: Membership Type, dtype: int64

In [85]:
# Fill the remaining NaN values with 0 (this also sets a missing Membership Type to 0)
df_bike_clean = df_bike_clean.fillna(0)
df_bike_clean.count()

Trip ID                  991271
Membership Type          991271
Bicycle ID               991271
Checkout Date            991271
Checkout Time            991271
Checkout Station ID      991271
Checkout Station         991271
Return Station ID        991271
Return Station           991271
Trip Duration Minutes    991271
Trip Month               991271
Trip Year                991271
Trip Date                991271
Trip Day of Week         991271
Trip Hour                991271
dtype: int64

In [97]:
# Export to csv (a forward slash avoids the invalid "\o" escape and works across platforms)
df_bike_clean.to_csv("Clean_Data/out.csv", index=False)
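
The export fails if the `Clean_Data` folder does not exist yet. A minimal sketch of a more defensive export, assuming `pathlib` is acceptable here; the temporary directory and the frame `toy` are stand-ins for the project folder and the cleaned data.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical two-row frame standing in for df_bike_clean
toy = pd.DataFrame({"Trip ID": [1, 2], "Membership Type": ["day", "year"]})

# A temporary directory stands in for the project root in this sketch
project_root = Path(tempfile.mkdtemp())
out_path = project_root / "Clean_Data" / "out.csv"

# Create the output folder if needed, then write without the index column
out_path.parent.mkdir(parents=True, exist_ok=True)
toy.to_csv(out_path, index=False)

# Read the file back to confirm the columns survived the round trip
round_trip = pd.read_csv(out_path)
print(round_trip.columns.tolist())
```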