## MTA Daily Ridership

On March 8, 2021, The New York Times published an article titled "[How Coronavirus Has Changed New York City Transit in One Chart](https://www.nytimes.com/interactive/2021/03/08/climate/nyc-transit-covid.html)". The chart looks like the following:

<div>
<img src="https://static01.nyt.com/images/2021/03/07/us/nyc-transit-covid-promo-1615150889393/nyc-transit-covid-promo-1615150889393-superJumbo.png" width="800"/>
</div>

This chart shows the percentage decline in ridership for bridges/tunnels, subways, buses, LIRR and Metro-North. It visualizes how profoundly the pandemic disrupted New York City's large public transit system. It also shows that although daily ridership had partially bounced back by March 2021, it had not recovered to the pre-pandemic level. It would be interesting to extend this chart with more recent data to see whether ridership has since recovered from the pandemic disruption.

In this assignment, your task is to reproduce and extend this chart to December 2023. The following dataset is used:

* [MTA Daily Ridership Data: Beginning 2020](https://data.ny.gov/Transportation/MTA-Daily-Ridership-Data-Beginning-2020/vxuj-8kew/about_data)

The chart should be a line chart just like the one in the New York Times. Here are the requirements:

* The X axis should go from March 2020 all the way to Dec 31, 2023.
* The Y axis should show the percentage decline from the pre-pandemic ridership level.
* There should be 5 curves corresponding to bridges/tunnels, subways, buses, LIRR and Metro-North, just like in the original NYT chart.
* The data encoded on the Y axis should be the 3-day moving average of the daily ridership data, i.e. the value used for Jan 3, 2023 should be the average of the values for Jan 1, Jan 2 and Jan 3, 2023.
* Each curve should be labeled at the end of the curve (i.e. with a dot and text at the location of the last data point).
* There should be a vertical line marking the date of the New York lockdown on March 22, 2020.
* There should be a horizontal line marking the 100% level.
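
The moving-average requirement can be illustrated with a minimal sketch using pandas' `rolling`. The sample values and the column names `pct_of_baseline`, `avg_3d` and `change_from_baseline` are made up for illustration and are not part of the dataset; the sketch also assumes one row per day, sorted by date.

```python
import pandas as pd

# Hypothetical sample values; the real dataset stores each mode's share of
# pre-pandemic ridership as a fraction (e.g. 0.94 means 94%).
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"]),
    "pct_of_baseline": [0.60, 0.90, 0.75, 0.45],
})

# Trailing 3-day window: each row averages itself and the two preceding rows.
# min_periods=3 leaves the first two rows as NaN because their windows are incomplete.
sample["avg_3d"] = sample["pct_of_baseline"].rolling(window=3, min_periods=3).mean()

# Express the average as a change from the pre-pandemic level (0 means fully recovered).
sample["change_from_baseline"] = sample["avg_3d"] - 1

print(sample)
```

With these sample values, the Jan 3 entry averages Jan 1 through Jan 3, matching the rule stated in the requirements. A window that contains a missing value also yields NaN here, so such rows would need to be dropped before plotting.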

Please submit the complete notebook and the resulting visualization in .png, .svg or .html format.

In [28]:
import altair as alt
import pandas as pd

url = "https://github.com/qnzhou/practical_data_visualization_in_python/files/14484180/MTA_Daily_Ridership_Data__Beginning_2020_20240304.csv"
data = pd.read_csv(url)

In [29]:
data['Date'] = pd.to_datetime(data['Date'])
# Cutoff 12/31/2023 (This impacts the point label text positions on LIRR and Metro-North)
data = data[data['Date'] <= pd.Timestamp('12/31/2023')]
data.tail()

Unnamed: 0,Date,Subways: Total Estimated Ridership,Subways: % of Comparable Pre-Pandemic Day,Buses: Total Estimated Ridership,Buses: % of Comparable Pre-Pandemic Day,LIRR: Total Estimated Ridership,LIRR: % of Comparable Pre-Pandemic Day,Metro-North: Total Estimated Ridership,Metro-North: % of Comparable Pre-Pandemic Day,Access-A-Ride: Total Scheduled Trips,Access-A-Ride: % of Comparable Pre-Pandemic Day,Bridges and Tunnels: Total Traffic,Bridges and Tunnels: % of Comparable Pre-Pandemic Day,Staten Island Railway: Total Estimated Ridership,Staten Island Railway: % of Comparable Pre-Pandemic Day
1396,2023-12-27,2912007,0.55,955890,0.48,220010.0,0.7,186541,0.67,25945,0.89,881140,0.99,5373.0,0.34
1397,2023-12-28,3064841,0.57,969231,0.48,218546.0,0.69,193235,0.69,26462,0.91,887521,1.0,5395.0,0.35
1398,2023-12-29,3198885,0.6,1007697,0.5,237781.0,0.75,210361,0.75,26892,0.92,909062,1.03,5784.0,0.37
1399,2023-12-30,2440211,0.74,720564,0.57,121688.0,0.95,121647,0.77,16983,0.99,833538,0.94,2780.0,0.56
1400,2023-12-31,1934651,0.76,548344,0.56,99817.0,0.94,89554,0.83,20005,1.12,671873,0.84,2029.0,0.6


In [30]:
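def moving_average(df, column_name, k=3):
    """Return the trailing k-row average of `column_name`, shifted by -1.

    Windows are taken over consecutive rows, so `df` is expected to be sorted
    by date with one row per day. The result keeps the column name
    '% of Comparable Pre-Pandemic Day', but the stored value is (average - 1):
    0 means the pre-pandemic level and -0.25 means 25% below it.
    """
    assert k > 1, 'k must be at least 2'
    avg_df = pd.DataFrame(data={'Date': [], '% of Comparable Pre-Pandemic Day': []}).astype({'Date': 'object', '% of Comparable Pre-Pandemic Day': 'float64'})
    # Start at the first row that has k - 1 rows before it.
    for index in range(k - 1, len(df)):
        sum_values = 0
        is_valid = True
        # Sum the current row and the k - 1 rows preceding it.
        for i in range(k):
            curr = df[column_name].iloc[index - i]
            if pd.isna(curr):  # Skip any window that contains a missing value
                is_valid = False
                break
            sum_values += curr
        if not is_valid:
            continue
        date = df['Date'].iloc[index]
        avg = sum_values / k
        # Shift so that 0 corresponds to 100% of the pre-pandemic level.
        avg_adj = (avg - 1)
        avg_df.loc[len(avg_df.index)] = [date, avg_adj]
    avg_df['Date'] = pd.to_datetime(avg_df['Date'])
    return avg_df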

bridges_and_tunnels = moving_average(data, 'Bridges and Tunnels: % of Comparable Pre-Pandemic Day')
buses = moving_average(data, 'Buses: % of Comparable Pre-Pandemic Day')
subways = moving_average(data, 'Subways: % of Comparable Pre-Pandemic Day')
lirr = moving_average(data, 'LIRR: % of Comparable Pre-Pandemic Day')
metro_north = moving_average(data, 'Metro-North: % of Comparable Pre-Pandemic Day')

In [32]:
def create_last_point_mark(df, color, name, column_name, dx=0, dy=0):
    point_df = pd.DataFrame([df.sort_values('Date').iloc[-1]])
    print(point_df)
    # Create the mark for the last point
    last_point = alt.Chart(point_df).mark_point(color=color, size=100, fill=color).encode(
        x=alt.X('Date:T', title=''),
        y=alt.Y(f"{column_name}:Q", title='')
    )
    # Create the text label for the last point
    last_label = alt.Chart(point_df).mark_text(
        text=name, dx=10+dx, dy=dy, color=color, align='left'
    ).encode(
        x='Date:T',
        y=f"{column_name}:Q"
    )
    # Combine the point and label
    return last_point + last_label

last_point_bridges_and_tunnels = create_last_point_mark(bridges_and_tunnels, 'blue', 'Bridges and Tunnels', '% of Comparable Pre-Pandemic Day')
last_point_buses = create_last_point_mark(buses, 'orange', 'Bus', '% of Comparable Pre-Pandemic Day')
last_point_subways = create_last_point_mark(subways, 'green', 'Subway', '% of Comparable Pre-Pandemic Day')
last_point_lirr = create_last_point_mark(lirr, 'red', 'LIRR', '% of Comparable Pre-Pandemic Day')
last_point_metro_north = create_last_point_mark(metro_north, 'lightblue', 'Metro-North', '% of Comparable Pre-Pandemic Day')

           Date  % of Comparable Pre-Pandemic Day
1398 2023-12-31                         -0.063333
           Date  % of Comparable Pre-Pandemic Day
1398 2023-12-31                         -0.456667
           Date  % of Comparable Pre-Pandemic Day
1398 2023-12-31                              -0.3
           Date  % of Comparable Pre-Pandemic Day
1397 2023-12-31                             -0.12
           Date  % of Comparable Pre-Pandemic Day
1398 2023-12-31                         -0.216667


In [33]:
# Create individual line charts for each dataset
chart_bridges_and_tunnels = alt.Chart(bridges_and_tunnels).mark_line().encode(
    x=alt.X('Date:T', 
            axis=alt.Axis(labelExpr="month(datum.value) === 0 ? timeFormat(datum.value, '%Y') : (month(datum.value) === 3 || month(datum.value) === 6 || month(datum.value) === 9 ? timeFormat(datum.value, '%B') : '')",
                          tickCount={'interval': 'month', 'step': 3}),
            title=''),
    y=alt.Y('% of Comparable Pre-Pandemic Day:Q',
            axis=alt.Axis(format='%', tickCount=8),
            title=''),
    color=alt.value('blue'),
    opacity=alt.value(0.75)
)

chart_buses = alt.Chart(buses).mark_line().encode(
    x='Date:T',
    y=alt.Y('% of Comparable Pre-Pandemic Day:Q'),
    color=alt.value('orange'),
    opacity=alt.value(0.75)
)

chart_subways = alt.Chart(subways).mark_line().encode(
    x='Date:T',
    y=alt.Y('% of Comparable Pre-Pandemic Day:Q'),
    color=alt.value('green'),
    opacity=alt.value(0.75)
)

chart_lirr = alt.Chart(lirr).mark_line().encode(
    x='Date:T',
    y=alt.Y('% of Comparable Pre-Pandemic Day:Q'),
    color=alt.value('red'),
    opacity=alt.value(0.75)
)

chart_metro_north = alt.Chart(metro_north).mark_line().encode(
    x='Date:T',
    y=alt.Y('% of Comparable Pre-Pandemic Day:Q'),
    color=alt.value('lightblue'),
    opacity=alt.value(0.75)
)

# y = 0 marks the 100% (pre-pandemic) level because the averages are stored as (value - 1)
horizontal_rule = alt.Chart(pd.DataFrame({'Value': [0]})).mark_rule(color='black', strokeWidth=2, opacity=0.4).encode(
    y='Value:Q'
)

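# Use a real datetime so the lockdown date is serialized the same way as the 'Date' columns
vertical_rule = alt.Chart(pd.DataFrame({'Value': pd.to_datetime(['2020-03-22'])})).mark_rule(color='black', strokeWidth=2, opacity=0.4, strokeDash=[5, 5]).encode(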
    x='Value:T'
)

rule_title = alt.TitleParams(
    text='New York Lockdown',
    fontSize=12,
    color='black',
    fontWeight=500,
    align='left',
    anchor='end',   
    subtitle='March 22',
    subtitleColor='black',
    subtitleFontSize=12,
    subtitleFontWeight=500,
    dx=-1290,
    dy=290
)

chart_title_mark = alt.Chart().mark_text(
    text='Percent decline from 2019 MTA ridership (Rolling three-day average)',
    align='left',
    baseline='top',
    dx=-35,
    dy=-30,
    fontSize=24,
    fontWeight=500,
    color='black'
).encode(
    x=alt.value(0),
    y=alt.value(0)
)

# Layering the individual charts
combined_chart = alt.layer(chart_bridges_and_tunnels, chart_buses, chart_subways, chart_lirr, chart_metro_north, 
                           horizontal_rule, vertical_rule, last_point_bridges_and_tunnels, last_point_buses, 
                           last_point_subways, last_point_lirr, last_point_metro_north, chart_title_mark).properties(title=rule_title, height=600, width=1200).resolve_scale(y='shared')

combined_chart


In [None]:
combined_chart.save('chart.png')