### Maven Commuter Challenge

Create an interactive dashboard illustrating post-pandemic ridership recovery trends across the MTA's services.

#### About The Data Set:

The daily ridership dataset provides systemwide ridership and traffic estimates for each of the Metropolitan Transportation Authority's (MTA) services, beginning March 1, 2020. For every service it also reports that day's figure as a percentage of a comparable pre-pandemic day.
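Each service is reported both as a raw count and as a percentage of a comparable pre-pandemic day, so the implied pre-pandemic baseline can be recovered approximately as count / (percent / 100). The sketch below assumes the simplified column names adopted later in this notebook and uses the first three subway rows of the dataset. `implied_baseline` is a hypothetical helper, and because the source percentages are whole numbers the result is only an estimate.

```python
import pandas as pd


def implied_baseline(count: pd.Series, pct: pd.Series) -> pd.Series:
    """Approximate pre-pandemic baseline: count divided by its share of a comparable pre-pandemic day.

    The source percentages are rounded to whole numbers, so this is an estimate.
    """
    return count / (pct / 100)


# First three subway rows of the dataset (2020-03-01 to 2020-03-03)
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"]),
    "Subways_Ridership": [2212965, 5329915, 5481103],
    "Subways_Pre_Pandemic_Percent": [97, 96, 98],
})

# Round to whole riders for readability
sample["Subways_Implied_Baseline"] = implied_baseline(
    sample["Subways_Ridership"], sample["Subways_Pre_Pandemic_Percent"]
).round()
print(sample)
```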

#### Challenge Objective:

This challenge is a collaboration between Maven Analytics and Plotly!

For the Maven Commuter Challenge, you work as a Data Visualization Specialist for the Data & Analytics Team at the Metropolitan Transportation Authority (MTA), North America's largest transportation network.

Your task is to create an interactive visual or dashboard that illustrates post-pandemic ridership recovery trends across the MTA's services.
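One way to fit every service into a single interactive chart is to reshape the percentage columns from wide to long format, so that Plotly Express can draw one line per service through its `color` argument. A minimal sketch, assuming the simplified column names used later in this notebook and a few values from the first rows of the data. It illustrates one option and is not the approach the challenge prescribes.

```python
import pandas as pd

# Wide frame with the simplified column names from this notebook
# (values from the first two rows of the dataset)
wide = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02"]),
    "Subways_Pre_Pandemic_Percent": [97, 96],
    "Buses_Pre_Pandemic_Percent": [99, 99],
    "LIRR_Pre_Pandemic_Percent": [100, 103],
})

pct_cols = [c for c in wide.columns if c.endswith("_Pre_Pandemic_Percent")]

# One row per (Date, Service) pair, the long shape that px.line(..., color=...) expects
long_df = wide.melt(
    id_vars="Date",
    value_vars=pct_cols,
    var_name="Service",
    value_name="Pct_of_Pre_Pandemic",
)
# Strip the suffix so legend entries read "Subways", "Buses", "LIRR"
long_df["Service"] = long_df["Service"].str.replace("_Pre_Pandemic_Percent", "", regex=False)
print(long_df)

# A single interactive chart could then be drawn with, for example:
# px.line(long_df, x="Date", y="Pct_of_Pre_Pandemic", color="Service")
```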

In [1]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

In [2]:
# load the dataset
mta_data = pd.read_csv("Dataset/MTA_Daily_Ridership.csv")
mta_data.shape

(1706, 15)

In [3]:
# take a look at the data
mta_data.head()

| | Date | Subways: Total Estimated Ridership | Subways: % of Comparable Pre-Pandemic Day | Buses: Total Estimated Ridership | Buses: % of Comparable Pre-Pandemic Day | LIRR: Total Estimated Ridership | LIRR: % of Comparable Pre-Pandemic Day | Metro-North: Total Estimated Ridership | Metro-North: % of Comparable Pre-Pandemic Day | Access-A-Ride: Total Scheduled Trips | Access-A-Ride: % of Comparable Pre-Pandemic Day | Bridges and Tunnels: Total Traffic | Bridges and Tunnels: % of Comparable Pre-Pandemic Day | Staten Island Railway: Total Estimated Ridership | Staten Island Railway: % of Comparable Pre-Pandemic Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-03-01 | 2212965 | 97 | 984908 | 99 | 86790 | 100 | 55825 | 59 | 19922 | 113 | 786960 | 98 | 1636 | 52 |
| 1 | 2020-03-02 | 5329915 | 96 | 2209066 | 99 | 321569 | 103 | 180701 | 66 | 30338 | 102 | 874619 | 95 | 17140 | 107 |
| 2 | 2020-03-03 | 5481103 | 98 | 2228608 | 99 | 319727 | 102 | 190648 | 69 | 32767 | 110 | 882175 | 96 | 17453 | 109 |
| 3 | 2020-03-04 | 5498809 | 99 | 2177165 | 97 | 311662 | 99 | 192689 | 70 | 34297 | 115 | 905558 | 98 | 17136 | 107 |
| 4 | 2020-03-05 | 5496453 | 99 | 2244515 | 100 | 307597 | 98 | 194386 | 70 | 33209 | 112 | 929298 | 101 | 17203 | 108 |


In [4]:
# Datatypes
mta_data.dtypes

Date                                                       object
Subways: Total Estimated Ridership                          int64
Subways: % of Comparable Pre-Pandemic Day                   int64
Buses: Total Estimated Ridership                            int64
Buses: % of Comparable Pre-Pandemic Day                     int64
LIRR: Total Estimated Ridership                             int64
LIRR: % of Comparable Pre-Pandemic Day                      int64
Metro-North: Total Estimated Ridership                      int64
Metro-North: % of Comparable Pre-Pandemic Day               int64
Access-A-Ride: Total Scheduled Trips                        int64
Access-A-Ride: % of Comparable Pre-Pandemic Day             int64
Bridges and Tunnels: Total Traffic                          int64
Bridges and Tunnels: % of Comparable Pre-Pandemic Day       int64
Staten Island Railway: Total Estimated Ridership            int64
Staten Island Railway: % of Comparable Pre-Pandemic Day     int64
dtype: object
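`Date` is read as `object` (plain strings). Instead of converting it after loading, `pandas.read_csv` can parse it at load time through its `parse_dates` parameter. A small sketch using an in-memory two-row excerpt of the CSV (only two columns kept for brevity). With the real file this would be `pd.read_csv("Dataset/MTA_Daily_Ridership.csv", parse_dates=["Date"])`.

```python
import io

import pandas as pd

# Two-row excerpt in the same layout as the source CSV
csv_text = """Date,Subways: Total Estimated Ridership
2020-03-01,2212965
2020-03-02,5329915
"""

# parse_dates converts the listed columns to datetime64 while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(df.dtypes)
```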

In [5]:
# Descriptive Statistics
mta_data.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Subways: Total Estimated Ridership | 1706.0 | 2509055.0 | 1062184.0 | 198399.0 | 1715396.0 | 2459607.0 | 3440053.25 | 5498809.0 |
| Subways: % of Comparable Pre-Pandemic Day | 1706.0 | 55.46131 | 19.8196 | 7.0 | 40.0 | 61.0 | 69.0 | 143.0 |
| Buses: Total Estimated Ridership | 1706.0 | 1006868.0 | 440379.9 | 5498.0 | 715249.5 | 1140776.5 | 1347619.5 | 2244515.0 |
| Buses: % of Comparable Pre-Pandemic Day | 1706.0 | 54.69285 | 19.29331 | 1.0 | 53.0 | 60.0 | 65.0 | 126.0 |
| LIRR: Total Estimated Ridership | 1706.0 | 135960.1 | 71298.78 | 1903.0 | 78689.75 | 124274.0 | 197928.0 | 321569.0 |
| LIRR: % of Comparable Pre-Pandemic Day | 1706.0 | 59.12837 | 29.29799 | 2.0 | 37.0 | 60.0 | 76.0 | 237.0 |
| Metro-North: Total Estimated Ridership | 1706.0 | 114888.3 | 66500.21 | 3281.0 | 51271.25 | 108237.0 | 176789.75 | 249585.0 |
| Metro-North: % of Comparable Pre-Pandemic Day | 1706.0 | 51.08324 | 26.13731 | 3.0 | 29.0 | 56.0 | 71.0 | 193.0 |
| Access-A-Ride: Total Scheduled Trips | 1706.0 | 21941.53 | 7990.635 | 2506.0 | 15869.5 | 22301.5 | 27506.75 | 40468.0 |
| Access-A-Ride: % of Comparable Pre-Pandemic Day | 1706.0 | 86.1653 | 24.64506 | 13.0 | 72.0 | 84.0 | 104.0 | 144.0 |
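Several percentage columns have maxima well above 100 (for example 237 for LIRR and 193 for Metro-North), meaning some days exceeded their comparable pre-pandemic day. Before plotting, it may be worth listing those days to judge whether they reflect real demand or quirks of the day matching. A sketch with a hypothetical helper `days_above` and illustrative values; it works with either the original or the simplified column names.

```python
import pandas as pd


def days_above(df: pd.DataFrame, pct_col: str, threshold: float = 100) -> pd.DataFrame:
    """Return Date and pct_col for rows where pct_col exceeds threshold, largest first."""
    mask = df[pct_col] > threshold
    return df.loc[mask, ["Date", pct_col]].sort_values(pct_col, ascending=False)


# Illustrative values only; on the real data pass mta_data and a percentage column
demo = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=4, freq="D"),
    "LIRR_Pre_Pandemic_Percent": [90, 130, 60, 110],
})
above = days_above(demo, "LIRR_Pre_Pandemic_Percent")
print(above)
```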


In [6]:
# Concise Information
mta_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1706 entries, 0 to 1705
Data columns (total 15 columns):
 #   Column                                                   Non-Null Count  Dtype 
---  ------                                                   --------------  ----- 
 0   Date                                                     1706 non-null   object
 1   Subways: Total Estimated Ridership                       1706 non-null   int64 
 2   Subways: % of Comparable Pre-Pandemic Day                1706 non-null   int64 
 3   Buses: Total Estimated Ridership                         1706 non-null   int64 
 4   Buses: % of Comparable Pre-Pandemic Day                  1706 non-null   int64 
 5   LIRR: Total Estimated Ridership                          1706 non-null   int64 
 6   LIRR: % of Comparable Pre-Pandemic Day                   1706 non-null   int64 
 7   Metro-North: Total Estimated Ridership                   1706 non-null   int64 
 8   Metro-North: % of Comparable Pre-Pande

In [7]:
# check for null/missing values
mta_data.isnull().sum()

Date                                                       0
Subways: Total Estimated Ridership                         0
Subways: % of Comparable Pre-Pandemic Day                  0
Buses: Total Estimated Ridership                           0
Buses: % of Comparable Pre-Pandemic Day                    0
LIRR: Total Estimated Ridership                            0
LIRR: % of Comparable Pre-Pandemic Day                     0
Metro-North: Total Estimated Ridership                     0
Metro-North: % of Comparable Pre-Pandemic Day              0
Access-A-Ride: Total Scheduled Trips                       0
Access-A-Ride: % of Comparable Pre-Pandemic Day            0
Bridges and Tunnels: Total Traffic                         0
Bridges and Tunnels: % of Comparable Pre-Pandemic Day      0
Staten Island Railway: Total Estimated Ridership           0
Staten Island Railway: % of Comparable Pre-Pandemic Day    0
dtype: int64

In [8]:
# check for duplicate records
mta_data.duplicated().sum()

0

In [9]:
# Convert 'Date' column to datetime format
mta_data['Date'] = pd.to_datetime(mta_data['Date'])
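`duplicated()` compares entire rows, so two rows sharing a date but holding different values would pass that check, and it cannot detect missing days. Once `Date` is a datetime column, both can be checked directly. A sketch, where `check_daily_coverage` is a hypothetical helper and the input dates are illustrative. On the real data it would be called as `check_daily_coverage(mta_data["Date"])`.

```python
import pandas as pd


def check_daily_coverage(dates: pd.Series) -> dict:
    """Count repeated dates and list calendar days missing between the first and last date."""
    dates = pd.to_datetime(dates)
    full_range = pd.date_range(dates.min(), dates.max(), freq="D")
    return {
        "duplicate_dates": int(dates.duplicated().sum()),
        # Days in the full calendar range that never appear in the data
        "missing_days": full_range.difference(pd.DatetimeIndex(dates)).tolist(),
    }


# Illustrative input: 2020-03-03 is missing and 2020-03-04 appears twice
demo_dates = pd.Series(pd.to_datetime(
    ["2020-03-01", "2020-03-02", "2020-03-04", "2020-03-04", "2020-03-05"]
))
report = check_daily_coverage(demo_dates)
print(report)
```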

In [10]:
# Simplify column names for readability in visuals
mta_data.columns = [
    'Date', 'Subways_Ridership', 'Subways_Pre_Pandemic_Percent', 
    'Buses_Ridership', 'Buses_Pre_Pandemic_Percent', 
    'LIRR_Ridership', 'LIRR_Pre_Pandemic_Percent', 
    'MetroNorth_Ridership', 'MetroNorth_Pre_Pandemic_Percent', 
    'AccessARide_Trips', 'AccessARide_Pre_Pandemic_Percent', 
    'BridgesTunnels_Traffic', 'BridgesTunnels_Pre_Pandemic_Percent', 
    'StatenIsland_Ridership', 'StatenIsland_Pre_Pandemic_Percent'
]
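Assigning a list to `.columns` matches names by position, so a change in the CSV's column order would silently mislabel series. Renaming through a dictionary with `errors="raise"` matches by name and raises a `KeyError` when an expected header is absent. A sketch using a subset of this notebook's mapping; `COLUMN_MAP` is an illustrative name, and the header in the second call is deliberately fake.

```python
import pandas as pd

# Illustrative subset of this notebook's header mapping; extend it with the remaining columns
COLUMN_MAP = {
    "Subways: Total Estimated Ridership": "Subways_Ridership",
    "Subways: % of Comparable Pre-Pandemic Day": "Subways_Pre_Pandemic_Percent",
    "Buses: Total Estimated Ridership": "Buses_Ridership",
}

# Empty frame whose columns are deliberately in a different order than the CSV
raw = pd.DataFrame(columns=[
    "Date",
    "Buses: Total Estimated Ridership",
    "Subways: Total Estimated Ridership",
    "Subways: % of Comparable Pre-Pandemic Day",
])

# Matching is by name, so the reordered columns are still labelled correctly
renamed = raw.rename(columns=COLUMN_MAP, errors="raise")
print(list(renamed.columns))

# A key that is not an existing column raises KeyError instead of being silently ignored
missing_detected = False
try:
    raw.rename(columns={"Hypothetical: Missing Column": "Missing"}, errors="raise")
except KeyError as exc:
    missing_detected = True
    print("Missing column:", exc)
```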

In [11]:
mta_data.head()

| | Date | Subways_Ridership | Subways_Pre_Pandemic_Percent | Buses_Ridership | Buses_Pre_Pandemic_Percent | LIRR_Ridership | LIRR_Pre_Pandemic_Percent | MetroNorth_Ridership | MetroNorth_Pre_Pandemic_Percent | AccessARide_Trips | AccessARide_Pre_Pandemic_Percent | BridgesTunnels_Traffic | BridgesTunnels_Pre_Pandemic_Percent | StatenIsland_Ridership | StatenIsland_Pre_Pandemic_Percent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-03-01 | 2212965 | 97 | 984908 | 99 | 86790 | 100 | 55825 | 59 | 19922 | 113 | 786960 | 98 | 1636 | 52 |
| 1 | 2020-03-02 | 5329915 | 96 | 2209066 | 99 | 321569 | 103 | 180701 | 66 | 30338 | 102 | 874619 | 95 | 17140 | 107 |
| 2 | 2020-03-03 | 5481103 | 98 | 2228608 | 99 | 319727 | 102 | 190648 | 69 | 32767 | 110 | 882175 | 96 | 17453 | 109 |
| 3 | 2020-03-04 | 5498809 | 99 | 2177165 | 97 | 311662 | 99 | 192689 | 70 | 34297 | 115 | 905558 | 98 | 17136 | 107 |
| 4 | 2020-03-05 | 5496453 | 99 | 2244515 | 100 | 307597 | 98 | 194386 | 70 | 33209 | 112 | 929298 | 101 | 17203 | 108 |


### Exploratory Data Analysis (EDA):

In [12]:
# Visualizing Subway Ridership and Recovery Percentage Over Time

# Line chart for Subway Ridership
fig = px.line(mta_data, x='Date', y='Subways_Ridership', title='Daily Subway Ridership Over Time')
fig.show()

# Line chart for Subway Ridership as Percentage of Pre-Pandemic Level
fig = px.line(mta_data, x='Date', y='Subways_Pre_Pandemic_Percent', title='Subway Recovery Percentage Over Time')
fig.show()
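Daily figures mix weekdays and weekends, which gives the lines a jagged weekly pattern. A trailing 7-day mean smooths that out and makes the recovery trend easier to read. A sketch with a hypothetical helper `add_rolling_mean` and an illustrative 8-day series; the first `window - 1` rows are `NaN` because `min_periods` equals the window.

```python
import pandas as pd


def add_rolling_mean(df: pd.DataFrame, col: str, window: int = 7) -> pd.DataFrame:
    """Return a Date-sorted copy of df with a trailing `window`-day mean of `col`."""
    out = df.sort_values("Date").copy()
    out[f"{col}_{window}d_avg"] = out[col].rolling(window=window, min_periods=window).mean()
    return out


# Illustrative 8-day series; the real input would be mta_data
demo = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=8, freq="D"),
    "Subways_Pre_Pandemic_Percent": [50, 60, 70, 70, 70, 70, 60, 50],
})
smoothed = add_rolling_mean(demo, "Subways_Pre_Pandemic_Percent")
print(smoothed)
```

With the real data, the smoothed column could be plotted next to the raw one, for example with `px.line(smoothed, x="Date", y=["Subways_Pre_Pandemic_Percent", "Subways_Pre_Pandemic_Percent_7d_avg"])` after calling `add_rolling_mean(mta_data, "Subways_Pre_Pandemic_Percent")`.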

In [13]:
# Visualizing Bus Ridership and Recovery Percentage Over Time

# Line chart for Bus Ridership
fig = px.line(mta_data, x='Date', y='Buses_Ridership', title='Daily Bus Ridership Over Time')
fig.show()

# Line chart for Bus Ridership as Percentage of Pre-Pandemic Level
fig = px.line(mta_data, x='Date', y='Buses_Pre_Pandemic_Percent', title='Bus Recovery Percentage Over Time')
fig.show()
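Monthly averages are another way to compare services without the daily noise, and they give a compact series for a dashboard summary. A sketch using `DataFrame.resample` with month-start bins (`"MS"`); `monthly_mean` is a hypothetical helper and the values are illustrative.

```python
import pandas as pd


def monthly_mean(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Average the given columns per calendar month, labelled by the first day of each month."""
    return df.set_index("Date")[cols].resample("MS").mean()


# Illustrative values; on the real data pass mta_data and any percentage columns
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-10", "2023-01-20", "2023-02-05", "2023-02-15"]),
    "Buses_Pre_Pandemic_Percent": [60, 70, 62, 64],
    "Subways_Pre_Pandemic_Percent": [65, 75, 68, 70],
})
monthly = monthly_mean(demo, ["Buses_Pre_Pandemic_Percent", "Subways_Pre_Pandemic_Percent"])
print(monthly)
```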

In [None]:
# Visualizing LIRR Ridership and Recovery Percentage Over Time

# Line chart for LIRR Ridership
fig = px.line(mta_data, x='Date', y='LIRR_Ridership', title='Daily LIRR Ridership Over Time')
fig.show()

# Line chart for LIRR Ridership as Percentage of Pre-Pandemic Level
fig = px.line(mta_data, x='Date', y='LIRR_Pre_Pandemic_Percent', title='LIRR Recovery Percentage Over Time')
fig.show()
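A simple recovery milestone is the first date on which a service's smoothed percentage reaches a chosen threshold. A sketch, assuming a trailing rolling mean as the smoother; `first_date_reaching`, the threshold, and the window are illustrative choices, and the function reports only the first crossing even if the series later falls back below the threshold.

```python
import pandas as pd


def first_date_reaching(df: pd.DataFrame, col: str, threshold: float, window: int = 7):
    """Return the first Date whose trailing `window`-day mean of `col` is >= threshold, or None."""
    avg = df.sort_values("Date").set_index("Date")[col].rolling(window=window).mean()
    hits = avg[avg >= threshold]  # NaN rows at the start compare as False
    return hits.index[0] if not hits.empty else None


# Illustrative series climbing from 40 to 85 in steps of 5
demo = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=10, freq="D"),
    "LIRR_Pre_Pandemic_Percent": list(range(40, 90, 5)),
})
milestone = first_date_reaching(demo, "LIRR_Pre_Pandemic_Percent", threshold=60, window=3)
print(milestone)
```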

In [None]:
# Visualizing Metro-North Ridership and Recovery Percentage Over Time

# Line chart for Metro-North Ridership
fig = px.line(mta_data, x='Date', y='MetroNorth_Ridership', title='Daily Metro-North Ridership Over Time')
fig.show()

# Line chart for Metro-North Ridership as Percentage of Pre-Pandemic Level
fig = px.line(mta_data, x='Date', y='MetroNorth_Pre_Pandemic_Percent', title='Metro-North Recovery Percentage Over Time')
fig.show()
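Because each percentage compares a day with a comparable pre-pandemic day, weekday and weekend recovery can be compared directly, which is especially relevant for commuter railroads such as Metro-North and the LIRR. A sketch with a hypothetical helper `weekday_weekend_mean` and illustrative values (2023-01-02 is a Monday, so the last two rows fall on the weekend).

```python
import pandas as pd


def weekday_weekend_mean(df: pd.DataFrame, col: str) -> pd.Series:
    """Mean of `col` for weekdays (Mon-Fri) versus weekends (Sat-Sun)."""
    is_weekend = df["Date"].dt.dayofweek >= 5  # Monday=0 ... Sunday=6
    day_type = is_weekend.map({False: "Weekday", True: "Weekend"})
    return df.groupby(day_type)[col].mean()


# Illustrative week, Monday 2023-01-02 to Sunday 2023-01-08
demo = pd.DataFrame({
    "Date": pd.date_range("2023-01-02", periods=7, freq="D"),
    "MetroNorth_Pre_Pandemic_Percent": [50, 52, 54, 56, 58, 80, 90],
})
split = weekday_weekend_mean(demo, "MetroNorth_Pre_Pandemic_Percent")
print(split)
```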

In [16]:
# Visualizing Access-A-Ride Trips and Recovery Percentage Over Time

# Line chart for Access-A-Ride Trips
fig = px.line(mta_data, x='Date', y='AccessARide_Trips', title='Daily Access-A-Ride Trips Over Time')
fig.show()

# Line chart for Access-A-Ride Trips as Percentage of Pre-Pandemic Level
fig = px.line(mta_data, x='Date', y='AccessARide_Pre_Pandemic_Percent', title='Access-A-Ride Trips Recovery Percentage Over Time')
fig.show()
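To summarize where each service stands at the end of the series, the percentage columns can be averaged over the most recent N days and ranked. A sketch with a hypothetical helper `recent_recovery_ranking` and illustrative values; the 30-day default is an arbitrary choice, and the column-name suffix assumes the simplified names used in this notebook.

```python
import pandas as pd


def recent_recovery_ranking(df: pd.DataFrame, days: int = 30) -> pd.Series:
    """Mean % of pre-pandemic level per service over the last `days` calendar days, highest first."""
    pct_cols = [c for c in df.columns if c.endswith("_Pre_Pandemic_Percent")]
    cutoff = df["Date"].max() - pd.Timedelta(days=days - 1)
    recent = df.loc[df["Date"] >= cutoff, pct_cols]
    means = recent.mean().sort_values(ascending=False)
    # Shorten labels to the service name, e.g. "LIRR"
    means.index = means.index.str.replace("_Pre_Pandemic_Percent", "", regex=False)
    return means


# Illustrative values; on the real data call recent_recovery_ranking(mta_data)
demo = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=3, freq="D"),
    "Subways_Pre_Pandemic_Percent": [60, 70, 80],
    "Buses_Pre_Pandemic_Percent": [50, 60, 60],
    "LIRR_Pre_Pandemic_Percent": [90, 80, 100],
})
ranking = recent_recovery_ranking(demo, days=2)
print(ranking)
```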