<a name="top"></a>
# Indego City Bike Trip Data Analysis with Google Maps APIs

Jump to (these links will not work when viewing on GitHub):
- [Data load and API initialization](#dataload)
- [Add Neighborhood from Google Maps](#addn)
- [Station data analysis](#stationdata)
- [Trip data analysis](#tripdata)

[View source data from Indego](https://www.rideindego.com/about/data/)

## <a name="dataload"></a> Data load and API initialization
[Return to Top](#top)

Import the necessary libraries:
- python-dotenv to load the API key from an env file
- googlemaps to access the Maps APIs (the original plan was to call the APIs directly, but Google has developed Python libraries that make its APIs easier to work with)
- pandas and numpy for data analysis

In [407]:
# Install if not already
# pip install -U googlemaps
# pip install python-dotenv 

import googlemaps
from datetime import datetime

import os
from dotenv import load_dotenv
import pandas as pd
import numpy as np

import json

Load the values stored in the env file. It is primarily used to hold the Google API key, so that the key itself does not appear in this notebook.

Load the CSV files that contain the trip data and the basic station data.

In [408]:
# Load env 
load_dotenv("indego.env")

# Load data from trip and station csv files
trip_data = pd.read_csv("indego-trips-2021-q3.csv", low_memory=False)
#trip_data.index.name = None
station_data = pd.read_csv("indego-stations-2021-10-01.csv")

Initialize Google Maps

In [409]:
api_key = os.getenv('API_KEY')
gmaps = googlemaps.Client(key=api_key)
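
If `API_KEY` is missing from `indego.env`, `os.getenv` returns `None`, and the cause can be harder to trace once the value is passed along. A minimal sketch of a guard, assuming a hypothetical helper named `require_env` (it is not part of python-dotenv or googlemaps):

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; check indego.env")
    return value


# Intended usage, after load_dotenv("indego.env") has run:
# api_key = require_env("API_KEY")
# gmaps = googlemaps.Client(key=api_key)
```

Failing at this point keeps the error next to the configuration step rather than in the first geocode request.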

## <a name="addn"></a> Add Neighborhood to station data, using Geocode API
[Return to Top](#top)

- Get an overview of station data
- Add a Neighborhood attribute to station data, based on Google Maps geocode data

Print some basic info about the station data

In [410]:
print("Rows and columns:")
print(station_data.shape)
print()
station_data.head(5)

Rows and columns:
(179, 4)



Unnamed: 0,Station_ID,Station_Name,Day of Go_live_date,Status
0,3000,Virtual Station,4/23/2015,Active
1,3004,Municipal Services Building Plaza,4/23/2015,Active
2,3005,"Welcome Park, NPS",4/23/2015,Active
3,3006,40th & Spruce,4/23/2015,Active
4,3007,"11th & Pine, Kahn Park",4/23/2015,Active


Add a Full_Name column that combines Station_Name with ", Philadelphia, PA". This value is sent as the address in the geocode request.

In [411]:
station_data["Full_Name"] = station_data["Station_Name"] + ", Philadelphia, PA"
station_data.head(2)

Unnamed: 0,Station_ID,Station_Name,Day of Go_live_date,Status,Full_Name
0,3000,Virtual Station,4/23/2015,Active,"Virtual Station, Philadelphia, PA"
1,3004,Municipal Services Building Plaza,4/23/2015,Active,"Municipal Services Building Plaza, Philadelphi..."


Use the Google Maps geocode API to assign a Neighborhood to each station
- For each station, call the API and pass the full station name
- Parse the response to find the neighborhood value and assign it back to the station_data df

In [419]:
for index, row in station_data.iterrows():
    # Geocode the station's full name (one API request per station)
    geocode = gmaps.geocode(row["Full_Name"])
    # An empty result list means the address could not be geocoded
    gr = geocode[0]["address_components"] if geocode else []
    found_neighborhood = False
    for r in gr:
        if "neighborhood" in r["types"]:
            station_data.loc[index, "Neighborhood"] = r["long_name"]
            found_neighborhood = True
    if not found_neighborhood:
        print("Neighborhood not found")
        # np.nan rather than np.NaN: the np.NaN alias was removed in NumPy 2.0
        station_data.loc[index, "Neighborhood"] = np.nan

Neighborhood not found
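
The parsing step can be isolated into a small function so it can be checked without calling the API. This is a sketch only: `extract_neighborhood` is a hypothetical helper name, the sample below is hand-written in the shape of a Geocoding API response rather than real API output, and unlike the loop in the notebook it returns the first matching component.

```python
def extract_neighborhood(geocode_results):
    """Return the neighborhood name from a geocode result list, or None if absent."""
    if not geocode_results:
        return None
    # Only the first (best) result is inspected, as in the notebook cell
    for component in geocode_results[0].get("address_components", []):
        if "neighborhood" in component.get("types", []):
            return component["long_name"]
    return None


# Hand-written sample shaped like a geocode response (illustrative values)
sample = [{
    "address_components": [
        {"long_name": "Philadelphia", "short_name": "Philadelphia",
         "types": ["locality", "political"]},
        {"long_name": "Example Neighborhood", "short_name": "Example",
         "types": ["neighborhood", "political"]},
    ]
}]

print(extract_neighborhood(sample))
print(extract_neighborhood([]))
```

With a helper like this, the loop body reduces to one call to `gmaps.geocode` followed by one call to the helper.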


Review Neighborhood data for any issues

In [420]:
missing_neighborhood = station_data.loc[station_data["Neighborhood"].isna()]
print(missing_neighborhood.to_string())

     Station_ID  Station_Name Day of Go_live_date  Status                       Full_Name Neighborhood
145        3204  17th & Green          11/14/2019  Active  17th & Green, Philadelphia, PA          NaN


Assign a value manually for the row with a missing neighborhood

In [421]:
station_data.loc[145, "Neighborhood"] = "North Philadelphia"
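
The row label 145 depends on the order of the CSV file. A sketch of the same manual fix keyed by `Station_ID` instead, using a tiny stand-in frame; the `manual_neighborhoods` dictionary is a hypothetical name and the first row's neighborhood is a placeholder:

```python
import numpy as np
import pandas as pd

# Tiny stand-in for station_data (placeholder neighborhood in the first row)
stations = pd.DataFrame({
    "Station_ID": [3004, 3204],
    "Station_Name": ["Municipal Services Building Plaza", "17th & Green"],
    "Neighborhood": ["Example Neighborhood", np.nan],
})

# Manual overrides keyed by Station_ID rather than by row label
manual_neighborhoods = {3204: "North Philadelphia"}

for station_id, neighborhood in manual_neighborhoods.items():
    stations.loc[stations["Station_ID"] == station_id, "Neighborhood"] = neighborhood

print(stations)
```

Keying on `Station_ID` keeps the override valid if rows are added, dropped, or reordered.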

## <a name="stationdata"></a> Station data analysis
[Return to Top](#top)

- View station data stats by neighborhood
- View active / inactive station stats
- View top neighborhoods with active Indego stations

Print the number of Indego stations in each neighborhood, from most to fewest

In [398]:
grouped = station_data.groupby("Neighborhood")
grouped.size().sort_values(ascending=False)

Neighborhood
North Philadelphia         35
University City            25
Center City                20
Center City East           12
Center City West           11
West Philadelphia           8
Rittenhouse Square          6
Graduate Hospital           6
Point Breeze                6
Washington Square West      4
South Philadelphia East     4
South Philadelphia          4
Bella Vista                 3
Queen Village               3
West Poplar                 3
Grays Ferry                 3
Society Hill                2
Old City                    2
Olde Kensington             2
Chinatown                   2
Mantua                      2
Gayborhood                  2
West Parkside               2
East Passyunk Crossing      2
Devil's Pocket              2
South Philadelphia West     2
Dickinson Narrows           1
North Philadelphia West     1
East Parkside               1
Melrose                     1
Pennsport                   1
Northern Liberties          1
dtype: int64

A visualization of how stations are distributed across the city would be more helpful here (to be added).

Are any stations inactive?

In [394]:
grouped = station_data.groupby("Status")
print(grouped.size().sort_values(ascending=False))

Status
Active      166
Inactive     13
dtype: int64


Which stations are inactive?

In [395]:
grouped.get_group("Inactive")

Unnamed: 0,Station_ID,Station_Name,Day of Go_live_date,Status,Neighborhood,Full_Name
20,3023,Rittenhouse Square,4/23/2015,Inactive,Rittenhouse Square,"Rittenhouse Square, Philadelphia, PA"
24,3027,"40th Street Station, MFL",4/23/2015,Inactive,University City,"40th Street Station, MFL, Philadelphia, PA"
33,3036,2nd & Germantown,4/23/2015,Inactive,North Philadelphia,"2nd & Germantown, Philadelphia, PA"
35,3038,The Children's Hospital of Philadelphia (CHOP),4/23/2015,Inactive,University City,The Children's Hospital of Philadelphia (CHOP)...
43,3048,Broad & Fitzwater,4/23/2015,Inactive,South Philadelphia,"Broad & Fitzwater, Philadelphia, PA"
76,3095,29th & Diamond,4/28/2016,Inactive,North Philadelphia,"29th & Diamond, Philadelphia, PA"
84,3103,"27th & Master, Athletic Recreation Center",5/3/2016,Inactive,North Philadelphia,"27th & Master, Athletic Recreation Center, Phi..."
86,3105,Penn Treaty Park,5/3/2016,Inactive,North Philadelphia,"Penn Treaty Park, Philadelphia, PA"
90,3109,Parkside & Girard,5/6/2016,Inactive,East Parkside,"Parkside & Girard, Philadelphia, PA"
103,3122,"24th & Cecil B. Moore, Cecil B. Moore Library",4/27/2016,Inactive,North Philadelphia,"24th & Cecil B. Moore, Cecil B. Moore Library,..."


Determine the top ten neighborhoods with active stations

In [396]:
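# Filter to active stations directly, so this cell does not depend on the earlier `grouped` variable
active_stations = station_data.loc[station_data["Status"] == "Active"]
active_stations.groupby("Neighborhood").size().sort_values(ascending=False).head(10)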

Neighborhood
North Philadelphia        30
University City           22
Center City               19
Center City East          12
Center City West          11
West Philadelphia          8
Point Breeze               6
Graduate Hospital          5
Rittenhouse Square         5
Washington Square West     4
dtype: int64
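
The total and active counts per neighborhood can also be viewed side by side in one table. A minimal sketch with `pd.crosstab`, using made-up rows in place of `station_data`:

```python
import pandas as pd

# Made-up rows standing in for station_data
stations = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Status": ["Active", "Active", "Inactive", "Active", "Inactive"],
})

# Rows: neighborhoods; columns: one count per Status value
status_by_neighborhood = pd.crosstab(stations["Neighborhood"], stations["Status"])
status_by_neighborhood["Total"] = status_by_neighborhood.sum(axis=1)

print(status_by_neighborhood.sort_values("Active", ascending=False))
```

This makes it easy to see which neighborhoods lost stations without comparing two separate outputs.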

## <a name="tripdata"></a> Trip data analysis
[Return to Top](#top)

Print some basic information on trip data

In [220]:
print("Rows and columns:")
print(trip_data.shape)
print()
trip_data.info()
trip_data.head(5)

Rows and columns:
(300432, 15)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300432 entries, 0 to 300431
Data columns (total 15 columns):
 #   Column               Non-Null Count   Dtype  
---  ------               --------------   -----  
 0   trip_id              300432 non-null  int64  
 1   duration             300432 non-null  int64  
 2   start_time           300432 non-null  object 
 3   end_time             300432 non-null  object 
 4   start_station        300432 non-null  int64  
 5   start_lat            300412 non-null  float64
 6   start_lon            300412 non-null  float64
 7   end_station          300432 non-null  int64  
 8   end_lat              296273 non-null  float64
 9   end_lon              296273 non-null  float64
 10  bike_id              300432 non-null  object 
 11  plan_duration        300432 non-null  int64  
 12  trip_route_category  300432 non-null  object 
 13  passholder_type      300432 non-null  object 
 14  bike_type            300432 non-null 

Unnamed: 0,trip_id,duration,start_time,end_time,start_station,start_lat,start_lon,end_station,end_lat,end_lon,bike_id,plan_duration,trip_route_category,passholder_type,bike_type
0,398698761,11,7/1/2021 0:00,7/1/2021 0:11,3045,39.947922,-75.162369,3030,39.93935,-75.157158,3360,30,One Way,Indego30,standard
1,398698759,4,7/1/2021 0:02,7/1/2021 0:06,3052,39.947319,-75.156952,3238,39.946281,-75.151382,5420,30,One Way,Indego30,standard
2,398698757,56,7/1/2021 0:03,7/1/2021 0:59,3192,39.96207,-75.141113,3161,39.954861,-75.180908,18450,30,One Way,Indego30,electric
3,398698755,55,7/1/2021 0:04,7/1/2021 0:59,3192,39.96207,-75.141113,3161,39.954861,-75.180908,16508,30,One Way,Indego30,electric
4,398698753,5,7/1/2021 0:08,7/1/2021 0:13,3052,39.947319,-75.156952,3046,39.950119,-75.144722,3475,365,One Way,Indego365,standard
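
The `info()` output lists `start_time` and `end_time` as `object` columns, that is, plain strings. A sketch of converting them to datetimes, using the first two rows shown above; the `elapsed_min` column name is an assumption added for illustration:

```python
import pandas as pd

# First two rows of the sample output above
trips = pd.DataFrame({
    "duration": [11, 4],
    "start_time": ["7/1/2021 0:00", "7/1/2021 0:02"],
    "end_time": ["7/1/2021 0:11", "7/1/2021 0:06"],
})

for column in ["start_time", "end_time"]:
    trips[column] = pd.to_datetime(trips[column], format="%m/%d/%Y %H:%M")

# Elapsed minutes between start and end, for comparison with `duration`
trips["elapsed_min"] = (trips["end_time"] - trips["start_time"]).dt.total_seconds() / 60

print(trips.dtypes)
print(trips[["duration", "elapsed_min"]])
```

Once the columns are datetimes, grouping by hour or weekday becomes straightforward with the `.dt` accessor.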


Join station data with trip data

In [397]:
# Join start station data
ts_data = pd.merge(left=trip_data, right=station_data[["Station_ID", "Full_Name", "Neighborhood"]], how="left", left_on="start_station", right_on="Station_ID").drop(columns=["Station_ID"])

# Join end station data
ts_data = pd.merge(left=ts_data, right=station_data[["Station_ID", "Full_Name", "Neighborhood"]], how="left", left_on="end_station", right_on="Station_ID").drop(columns=["Station_ID"])

# Rename merged columns 
ts_data = ts_data.rename(columns={"Full_Name_x":"start_full_name", "Neighborhood_x":"start_neighborhood"})
ts_data = ts_data.rename(columns={"Full_Name_y":"end_full_name", "Neighborhood_y":"end_neighborhood"})
ts_data

Unnamed: 0,trip_id,duration,start_time,end_time,start_station,start_lat,start_lon,end_station,end_lat,end_lon,bike_id,plan_duration,trip_route_category,passholder_type,bike_type,start_full_name,start_neighborhood,end_full_name,end_neighborhood
0,398698761,11,7/1/2021 0:00,7/1/2021 0:11,3045,39.947922,-75.162369,3030,39.939350,-75.157158,3360,30,One Way,Indego30,standard,"13th & Locust, Philadelphia, PA",Washington Square West,"Darien & Catharine, Philadelphia, PA",Bella Vista
1,398698759,4,7/1/2021 0:02,7/1/2021 0:06,3052,39.947319,-75.156952,3238,39.946281,-75.151382,5420,30,One Way,Indego30,standard,"9th & Locust, Philadelphia, PA",Washington Square West,"6th & S Washington Square, Philadelphia, PA",Society Hill
2,398698757,56,7/1/2021 0:03,7/1/2021 0:59,3192,39.962070,-75.141113,3161,39.954861,-75.180908,18450,30,One Way,Indego30,electric,"2nd & Fairmount, Philadelphia, PA",North Philadelphia,"30th Street Station East, Philadelphia, PA",University City
3,398698755,55,7/1/2021 0:04,7/1/2021 0:59,3192,39.962070,-75.141113,3161,39.954861,-75.180908,16508,30,One Way,Indego30,electric,"2nd & Fairmount, Philadelphia, PA",North Philadelphia,"30th Street Station East, Philadelphia, PA",University City
4,398698753,5,7/1/2021 0:08,7/1/2021 0:13,3052,39.947319,-75.156952,3046,39.950119,-75.144722,3475,365,One Way,Indego365,standard,"9th & Locust, Philadelphia, PA",Washington Square West,"2nd & Market, Philadelphia, PA",Center City East
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
300427,428365176,10,9/30/2021 23:57,10/1/2021 0:07,3009,39.955761,-75.189819,3035,39.962711,-75.194191,18791,30,One Way,Indego30,electric,"33rd & Market, Philadelphia, PA",University City,"Dornsife Center, Philadelphia, PA",Mantua
300428,428365174,7,9/30/2021 23:57,10/1/2021 0:04,3047,39.950729,-75.149467,3028,39.940609,-75.149582,5263,30,One Way,Indego30,standard,"Independence Mall, NPS, Philadelphia, PA",Center City East,"4th & Bainbridge, Philadelphia, PA",Queen Village
300429,428365172,7,9/30/2021 23:58,10/1/2021 0:05,3046,39.950119,-75.144722,3050,39.953388,-75.154259,18675,30,One Way,Indego30,electric,"2nd & Market, Philadelphia, PA",Center City East,"9th & Arch, Philadelphia, PA",Center City East
300430,428365170,3,9/30/2021 23:58,10/1/2021 0:01,3115,39.972630,-75.167572,3075,39.967178,-75.161247,21618,30,One Way,Indego30,electric,"19th & Girard, PTTI, Philadelphia, PA",North Philadelphia,"Fairmount & Ridge, Philadelphia, PA",North Philadelphia
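
The two merges and two renames above follow the same pattern, so they could be folded into one function. This is a sketch under assumptions: `attach_station` is a hypothetical helper, and the small frames stand in for `trip_data` and `station_data`.

```python
import pandas as pd


def attach_station(trips, stations, side):
    """Left-join station name and neighborhood onto trips for side 'start' or 'end'."""
    lookup = stations[["Station_ID", "Full_Name", "Neighborhood"]].rename(
        columns={
            "Station_ID": f"{side}_station",
            "Full_Name": f"{side}_full_name",
            "Neighborhood": f"{side}_neighborhood",
        }
    )
    # validate raises an error if a station ID appears more than once in the lookup
    return trips.merge(lookup, how="left", on=f"{side}_station", validate="many_to_one")


# Small stand-in frames (illustrative values)
stations = pd.DataFrame({
    "Station_ID": [1, 2],
    "Full_Name": ["One, Philadelphia, PA", "Two, Philadelphia, PA"],
    "Neighborhood": ["N1", "N2"],
})
trips = pd.DataFrame({"trip_id": [10, 11], "start_station": [1, 2], "end_station": [2, 2]})

ts = attach_station(attach_station(trips, stations, "start"), stations, "end")
print(ts)
```

Renaming the lookup columns before the merge avoids the `_x` / `_y` suffixes, and `validate="many_to_one"` guards against duplicated station IDs silently multiplying trip rows.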


In [270]:
ts_data2.head()

Unnamed: 0,trip_id,duration,start_time,end_time,start_station,start_lat,start_lon,end_station,end_lat,end_lon,bike_id,plan_duration,trip_route_category,passholder_type,bike_type,Full_Name_x,Neighborhood_x,Full_Name_y,Neighborhood_y
0,398698761,11,7/1/2021 0:00,7/1/2021 0:11,3045,39.947922,-75.162369,3030,39.93935,-75.157158,3360,30,One Way,Indego30,standard,"13th & Locust, Philadelphia, PA",Washington Square West,"Darien & Catharine, Philadelphia, PA",Bella Vista
1,398698759,4,7/1/2021 0:02,7/1/2021 0:06,3052,39.947319,-75.156952,3238,39.946281,-75.151382,5420,30,One Way,Indego30,standard,"9th & Locust, Philadelphia, PA",Washington Square West,"6th & S Washington Square, Philadelphia, PA",Society Hill
2,398698757,56,7/1/2021 0:03,7/1/2021 0:59,3192,39.96207,-75.141113,3161,39.954861,-75.180908,18450,30,One Way,Indego30,electric,"2nd & Fairmount, Philadelphia, PA",North Philadelphia,"30th Street Station East, Philadelphia, PA",University City
3,398698755,55,7/1/2021 0:04,7/1/2021 0:59,3192,39.96207,-75.141113,3161,39.954861,-75.180908,16508,30,One Way,Indego30,electric,"2nd & Fairmount, Philadelphia, PA",North Philadelphia,"30th Street Station East, Philadelphia, PA",University City
4,398698753,5,7/1/2021 0:08,7/1/2021 0:13,3052,39.947319,-75.156952,3046,39.950119,-75.144722,3475,365,One Way,Indego365,standard,"9th & Locust, Philadelphia, PA",Washington Square West,"2nd & Market, Philadelphia, PA",Center City East


Check how many trips start or end at the Virtual Station (station ID 3000). These trips will be removed from the dataset.

In [43]:
trip_data.loc[(trip_data['start_station'] == 3000) | (trip_data['end_station'] == 3000)]

trip_id,duration,start_time,end_time,start_station,start_lat,start_lon,end_station,end_lat,end_lon,bike_id,plan_duration,trip_route_category,passholder_type,bike_type
398758095,15,7/1/2021 5:29,7/1/2021 5:44,3125,39.943909,-75.167351,3000,,,3638,30,One Way,Indego30,standard
398768712,15,7/1/2021 8:09,7/1/2021 8:24,3049,39.945091,-75.142502,3000,,,19813,365,One Way,Indego365,electric
399033738,10,7/1/2021 8:20,7/1/2021 8:30,3170,39.944260,-75.181343,3000,,,18160,30,One Way,Indego30,electric
398777836,6,7/1/2021 8:34,7/1/2021 8:40,3032,39.945271,-75.179710,3000,,,17200,30,One Way,Indego30,electric
398786572,14,7/1/2021 9:44,7/1/2021 9:58,3008,39.979439,-75.151138,3000,,,19793,30,One Way,Indego30,electric
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
428447431,16,9/30/2021 20:11,9/30/2021 20:27,3052,39.947319,-75.156952,3000,,,11757,365,One Way,Indego365,standard
428340686,1,9/30/2021 20:23,9/30/2021 20:24,3185,39.951691,-75.158882,3000,,,17727,30,One Way,Indego30,electric
428477694,15,9/30/2021 20:38,9/30/2021 20:53,3114,39.937752,-75.180122,3000,,,18797,30,One Way,Indego30,electric
428462247,10,9/30/2021 21:41,9/30/2021 21:51,3046,39.950119,-75.144722,3000,,,19845,30,One Way,Indego30,electric


Drop trips that include a Virtual location

In [52]:
trip_data = trip_data.loc[(trip_data['start_station'] != 3000) & (trip_data['end_station'] != 3000)]
print("Updated row and column count:")
print(trip_data.shape)

Updated row and column count:
(296268, 14)


Check that original row count - virtual row count = current row count

In [53]:
300432-4164 == len(trip_data)

True
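
The check above relies on two numbers typed in by hand. A sketch of the same reconciliation computed from the data itself, on a few made-up rows; `VIRTUAL_STATION_ID` and the other variable names are assumptions for illustration:

```python
import pandas as pd

VIRTUAL_STATION_ID = 3000  # "Virtual Station" in the station data

# Made-up rows standing in for trip_data
trips = pd.DataFrame({
    "start_station": [3045, 3000, 3052, 3125],
    "end_station": [3030, 3046, 3000, 3004],
})

is_virtual = (trips["start_station"] == VIRTUAL_STATION_ID) | (
    trips["end_station"] == VIRTUAL_STATION_ID
)
original_count = len(trips)
virtual_count = int(is_virtual.sum())

trips = trips.loc[~is_virtual]

# The three counts must reconcile, whatever the input file contains
assert original_count - virtual_count == len(trips)
print(original_count, virtual_count, len(trips))
```

Computing the counts keeps the check valid when a different quarter's file is loaded.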

In [66]:
trip_data.head(5)

trip_id,duration,start_time,end_time,start_station,start_lat,start_lon,end_station,end_lat,end_lon,bike_id,plan_duration,trip_route_category,passholder_type,bike_type
398698761,11,7/1/2021 0:00,7/1/2021 0:11,3045,39.947922,-75.162369,3030,39.93935,-75.157158,3360,30,One Way,Indego30,standard
398698759,4,7/1/2021 0:02,7/1/2021 0:06,3052,39.947319,-75.156952,3238,39.946281,-75.151382,5420,30,One Way,Indego30,standard
398698757,56,7/1/2021 0:03,7/1/2021 0:59,3192,39.96207,-75.141113,3161,39.954861,-75.180908,18450,30,One Way,Indego30,electric
398698755,55,7/1/2021 0:04,7/1/2021 0:59,3192,39.96207,-75.141113,3161,39.954861,-75.180908,16508,30,One Way,Indego30,electric
398698753,5,7/1/2021 0:08,7/1/2021 0:13,3052,39.947319,-75.156952,3046,39.950119,-75.144722,3475,365,One Way,Indego365,standard


In [88]:
# Count the trips for each (start_station, end_station) pair
trip_data.groupby(['start_station','end_station']).size().reset_index()

Unnamed: 0,start_station,end_station,0
0,3004,3004,102
1,3004,3005,5
2,3004,3006,3
3,3004,3007,4
4,3004,3008,10
...,...,...,...
19101,3256,3212,1
19102,3256,3248,1
19103,3256,3249,2
19104,3256,3255,1
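
The count column in the output above is named `0`, which is awkward to reference later. A sketch that names the column and sorts the pairs by popularity, on made-up rows; `trip_count` is an assumed column name:

```python
import pandas as pd

# Made-up rows standing in for trip_data
trips = pd.DataFrame({
    "start_station": [3004, 3004, 3004, 3005, 3005],
    "end_station": [3004, 3004, 3005, 3004, 3004],
})

route_counts = (
    trips.groupby(["start_station", "end_station"])
    .size()
    .reset_index(name="trip_count")  # name the count column instead of the default 0
    .sort_values("trip_count", ascending=False)
)

print(route_counts.head(10))
```

The same frame can then be filtered to a single start station instead of re-running the groupby per station.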


For a single start station (3004 here), count the trips to each end station that has at least one trip

In [90]:
trip_data.loc[trip_data['start_station'] == 3004].groupby('end_station').size().reset_index()

Unnamed: 0,end_station,0
0,3004,102
1,3005,5
2,3006,3
3,3007,4
4,3008,10
...,...,...
130,3243,1
131,3244,4
132,3245,4
133,3249,1


In [55]:
# Trips that start and end at the same station (round trips)
trip_data.loc[(trip_data['start_station'] == trip_data['end_station'])]

trip_id,duration,start_time,end_time,start_station,start_lat,start_lon,end_station,end_lat,end_lon,bike_id,plan_duration,trip_route_category,passholder_type,bike_type
398698734,88,7/1/2021 0:20,7/1/2021 1:48,3049,39.945091,-75.142502,3049,39.945091,-75.142502,16719,1,Round Trip,Day Pass,electric
398698716,9,7/1/2021 0:41,7/1/2021 0:50,3168,39.951340,-75.173943,3168,39.951340,-75.173943,5191,30,Round Trip,Indego30,standard
398698711,8,7/1/2021 0:55,7/1/2021 1:03,3168,39.951340,-75.173943,3168,39.951340,-75.173943,2543,30,Round Trip,Indego30,standard
398831586,851,7/1/2021 1:06,7/1/2021 15:17,3068,39.935490,-75.167107,3068,39.935490,-75.167107,2552,30,Round Trip,Indego30,standard
398698701,26,7/1/2021 1:30,7/1/2021 1:56,3057,39.964390,-75.179871,3057,39.964390,-75.179871,14566,1,Round Trip,Day Pass,standard
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
428352835,7,9/30/2021 23:14,9/30/2021 23:21,3046,39.950119,-75.144722,3046,39.950119,-75.144722,5272,30,Round Trip,Indego30,standard
428352834,6,9/30/2021 23:15,9/30/2021 23:21,3046,39.950119,-75.144722,3046,39.950119,-75.144722,2559,30,Round Trip,Indego30,standard
428352833,5,9/30/2021 23:15,9/30/2021 23:20,3046,39.950119,-75.144722,3046,39.950119,-75.144722,3456,30,Round Trip,Indego30,standard
428352831,6,9/30/2021 23:15,9/30/2021 23:21,3046,39.950119,-75.144722,3046,39.950119,-75.144722,3582,30,Round Trip,Indego30,standard


Other ideas:
- Average trip distance
- Most used bikes, by electric vs standard
- Least popular station
- Electric bike revenue
- Overage revenue
- Duration with standard vs electric
  - Caveat: traffic, stops
  - Could use datetime with Google API

Are day passes used for more standard or electric bikes?

Electric bike usage vs standard bike usage (does overall trend match station-level stat?)

Does the station correlate to this somehow?

Mean, median distance of trips
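
For the distance ideas, a first approximation is the straight-line (great-circle) distance between the start and end coordinates already in the trip data. The sketch below uses the haversine formula; `haversine_km` is a hypothetical helper, and the result is not the distance actually ridden (round trips, for example, come out as zero).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Coordinates of the first trip in the sample output (station 3045 to station 3030)
print(round(haversine_km(39.947922, -75.162369, 39.93935, -75.157158), 2))
```

Applied row by row to the `start_lat`, `start_lon`, `end_lat`, and `end_lon` columns, this would give a distance column from which a mean and median can be taken.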