# Toronto Bike Share Analysis
This notebook is part of a broader analysis of Toronto Bike Share usage and what it suggests about the impact on traffic if certain bicycle infrastructure is removed in Toronto.
The final story can be found here:
This notebook calculates some of the descriptive statistics used in the story. The story also includes analysis performed in QGIS.


In [11]:
import os
from dotenv import load_dotenv
import pandas as pd
from sqlalchemy import create_engine
import psycopg
import plotly.express as px

# Load Environment Variables
load_dotenv()
# Use .get() so a missing variable reaches the else branch instead of raising KeyError
if os.environ.get('ENV_FLG'):
    print('Environment variables loaded properly.')
else:
    print('ERROR: Environment variables failed to load.')

Environment variables loaded properly.
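
The check above only looks at `ENV_FLG`, while `query_db` later reads five more variables. A minimal sketch of validating all of them up front is shown here; the variable names come from this notebook, but the helper `missing_env_vars` is hypothetical and not part of the original analysis.

```python
import os

# Variable names are the ones this notebook reads; the helper itself is an assumption.
REQUIRED_VARS = ("ENV_FLG", "DB_NAME", "DB_USER", "DB_PASS", "DB_HOST", "DB_PORT")

def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]

missing = missing_env_vars()
if missing:
    print(f"ERROR: missing environment variables: {', '.join(missing)}")
else:
    print("Environment variables loaded properly.")
```

Reporting every missing name at once avoids discovering them one at a time when the database query runs.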


## Import Data

In [2]:
def query_db(query):
    """Execute a query on the PostgreSQL database and return a pandas DataFrame."""
    db_name = os.environ['DB_NAME']
    db_user = os.environ['DB_USER']
    db_pass = os.environ['DB_PASS']
    db_host = os.environ['DB_HOST']
    db_port = os.environ['DB_PORT']

    engine = create_engine(f'postgresql://{db_user}:{db_pass}@{db_host}:{db_port}/{db_name}')
    data_frame = pd.read_sql_query(query, engine)
    return data_frame

data = query_db('select * from bike_trips;')
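
`query_db` interpolates the credentials directly into the connection URL. A user name or password containing characters such as `@` or `/` would be misparsed in that form, so they need to be percent-encoded first. The sketch below shows one way to do this with the standard library; `build_db_url` is a hypothetical helper and the credentials are made-up placeholders.

```python
from urllib.parse import quote_plus

def build_db_url(user, password, host, port, name):
    """Build a PostgreSQL URL, percent-encoding the user name and password."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{name}"
    )

# Placeholder credentials for illustration only
url = build_db_url("analyst", "p@ss/word", "localhost", "5432", "bikeshare")
print(url)  # postgresql://analyst:p%40ss%2Fword@localhost:5432/bikeshare
```

The resulting string could be passed to `create_engine` in place of the f-string used above.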

In [12]:
# Filter columns
columns = ['trip_id','trip_duration','start_station_id','start_time','end_station_id','end_time','user_type']
X = data[columns]
X.head()

| | trip_id | trip_duration | start_station_id | start_time | end_station_id | end_time | user_type |
|---|---|---|---|---|---|---|---|
| 0 | 14805109 | 4335 | 7334 | 2022-01-01 00:02:00 | 7269.0 | 2022-01-01 01:15:00 | Casual Member |
| 1 | 14805110 | 126 | 7443 | 2022-01-01 00:02:00 | 7270.0 | 2022-01-01 00:05:00 | Annual Member |
| 2 | 14805112 | 942 | 7399 | 2022-01-01 00:04:00 | 7686.0 | 2022-01-01 00:19:00 | Annual Member |
| 3 | 14805113 | 4256 | 7334 | 2022-01-01 00:04:00 | 7269.0 | 2022-01-01 01:15:00 | Casual Member |
| 4 | 14805114 | 4353 | 7334 | 2022-01-01 00:05:00 | 7038.0 | 2022-01-01 01:17:00 | Casual Member |


In [13]:
# Check missing values
missing_val_count_by_column = (X.isnull().sum())
print(missing_val_count_by_column[missing_val_count_by_column > 0])

end_station_id    4318
dtype: int64
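
Only `end_station_id` has missing values, which is also why it displays as a float (`7269.0`) while `start_station_id` is an integer. The statistics in this notebook rely on `start_time` and `trip_id`, so these rows do not need to be dropped here. For analyses that do need a destination, the sketch below shows two possible treatments on a toy frame; the toy data and variable names are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for X; the station IDs are taken from the head() output above.
toy = pd.DataFrame({"end_station_id": [7269.0, None, 7686.0]})

# Share of rows without an end station
missing_share = toy["end_station_id"].isna().mean()
print(f"{missing_share:.1%} of toy rows have no end station")

# Option 1: keep every row and store IDs as nullable integers (missing stays <NA>)
toy["end_station_id"] = toy["end_station_id"].astype("Int64")

# Option 2: drop rows without an end station when a destination is required
complete = toy.dropna(subset=["end_station_id"])
```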


## Descriptive Analysis
### Statistics
- Total trips (2022-2024)
- Average trips per day
- Average trips per day during weekday rush hours
### Plots
1. Trips per month over 2022-2024 (to see growth in usage)
2. Average trips per month (to compare winter usage)
3. Average trips by day of week (to determine commuter usage)
4. Trips by hour of day (to determine commuter usage)

In [None]:
# Total Trips (2022-2024)
print(f'Total Trips (01-2022 to 09-2024):\t{len(X):,}')

# Trips per Day 
trips_per_day = X.groupby(X.start_time.dt.date)['trip_id'].count()
mean_trips_per_day = trips_per_day.mean()
print(f'Avg trips per day:\t\t\t{mean_trips_per_day:,.0f}')

# Trips during 'rush hours' on weekdays
mask_weekday = X.start_time.dt.day_of_week < 5   # Monday = 0 ... Friday = 4
mask_morn_rush_hour = ((X.start_time.dt.time >= pd.Timestamp('07:00:00').time()) & (X.start_time.dt.time <= pd.Timestamp('09:00:00').time()))   # Morning rush hour: 07:00 to 09:00
mask_aftn_rush_hour = ((X.start_time.dt.time >= pd.Timestamp('16:00:00').time()) & (X.start_time.dt.time <= pd.Timestamp('18:00:00').time()))   # Afternoon rush hour: 16:00 to 18:00
X_rush_hour_trips = X.loc[mask_weekday & (mask_morn_rush_hour | mask_aftn_rush_hour)]

# Group the filtered frame by its own dates rather than relying on index alignment with X
rush_hour_trips_per_day = X_rush_hour_trips.groupby(X_rush_hour_trips.start_time.dt.date)['trip_id'].count()
mean_rush_hour_trips_per_day = rush_hour_trips_per_day.mean()
print(f'Average trips during \'rush hour\' per day: \t{mean_rush_hour_trips_per_day:,.0f}')

Total Trips (01-2022 to 09-2024):	15,675,998
Avg trips per day:			15,614
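
Because `groupby` only produces a row for dates that appear in the data, the daily mean equals the total trip count divided by the number of distinct dates with at least one trip. A quick sanity check of the two printed figures is sketched below, assuming the data run from 2022-01-01 through 2024-09-30 inclusive and that every calendar day has at least one trip.

```python
from datetime import date

total_trips = 15_675_998      # from the output above
reported_daily_mean = 15_614  # from the output above

# Assumption: coverage is 2022-01-01 through 2024-09-30, both ends included
n_days = (date(2024, 9, 30) - date(2022, 1, 1)).days + 1
implied_daily_mean = round(total_trips / n_days)

print(n_days)              # 1004
print(implied_daily_mean)  # 15614
```

The implied mean matches the reported one, which is consistent with the data having no days without trips over that range.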


In [None]:
# Plot: Trips by Month, 2022-2024
plt_trips_histogram = px.histogram(X, x="start_time", y="trip_id", histfunc="count")
plt_trips_histogram.update_traces(xbins_size="M1")
plt_trips_histogram.show()

In [None]:
# Plot: Average Trips per Month
trips_by_year_month = X.groupby([X.start_time.dt.year, X.start_time.dt.month])['trip_id'].count().rename_axis(["year","month"])
mean_trips_month = trips_by_year_month.reset_index().groupby('month')['trip_id'].mean()

plt_trips_by_month = px.bar(mean_trips_month)
plt_trips_by_month.show()

In [None]:
# Plot: Total Trips by Day of the Week (0 = Monday, 6 = Sunday)
trips_by_day = X.groupby(X.start_time.dt.day_of_week)['trip_id'].count()
plt_trips_by_day = px.bar(trips_by_day)
plt_trips_by_day.show()
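
The chart above counts all trips per weekday, whereas the plan lists average trips by day of week. One way to get a per-day average is to count trips per calendar date first and then average those counts by weekday name. The sketch below does this on a toy frame; the toy data and variable names are assumptions, not the notebook's data.

```python
import pandas as pd

toy = pd.DataFrame({
    "trip_id": range(6),
    "start_time": pd.to_datetime([
        "2022-01-03 08:00", "2022-01-03 17:30",                      # Monday
        "2022-01-10 08:15",                                          # following Monday
        "2022-01-08 11:00", "2022-01-08 12:00", "2022-01-08 13:00",  # Saturday
    ]),
})

# Step 1: trips per calendar date (normalize() keeps a datetime index)
daily = toy.groupby(toy.start_time.dt.normalize())["trip_id"].count()

# Step 2: average the daily counts within each weekday
mean_by_weekday = daily.groupby(daily.index.day_name()).mean()
print(mean_by_weekday)
```

With nearly the same number of each weekday in the period, totals and averages have a similar shape, but the averaged version is easier to compare with the daily mean reported earlier.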

In [None]:
# Plot: Trip Starts by Hour of Day
trips_by_hour = X.groupby(X.start_time.dt.hour)['trip_id'].count()
plt_trips_by_hour = px.bar(trips_by_hour)
plt_trips_by_hour.show()
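
Since this plot is meant to indicate commuter usage, it may help to separate weekdays from weekends: a commuter pattern would show morning and late-afternoon peaks on weekdays only. A minimal sketch on toy data follows; the `day_type` labels and the toy frame are assumptions for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "trip_id": range(5),
    "start_time": pd.to_datetime([
        "2022-01-03 08:05", "2022-01-03 08:40", "2022-01-03 17:10",  # Monday
        "2022-01-08 13:00", "2022-01-08 14:30",                      # Saturday
    ]),
})

# Label each trip as weekday (Monday=0 ... Friday=4) or weekend
day_type = toy.start_time.dt.day_of_week.map(
    lambda d: "weekday" if d < 5 else "weekend"
).rename("day_type")
hour = toy.start_time.dt.hour.rename("hour")

# Count trips per (day_type, hour), then pivot day_type into columns
hourly = (
    toy.groupby([day_type, hour])["trip_id"]
    .count()
    .unstack("day_type", fill_value=0)
)
print(hourly)
```

The resulting wide table has one column per day type and could be passed to `px.bar` with `barmode="group"` to draw the two profiles side by side.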