All raw data gathered for this project is available [in the project repository](https://github.com/anly501/dsan-5000-project-jsweren1/tree/main/dsan-website/5000-website/data).

### Quarterly and Annual Ridership Totals by Mode of Transportation [^1]

The first data set comes from the American Public Transportation Association (APTA) and serves as an introductory synopsis of public transit ridership over time. It gives a broad view of quarterly ridership across the entire country from 1990 onward, which sets the stage for the problem we intend to explore.

The raw data and methodology for how it was obtained can be found using this link: https://www.apta.com/research-technical-resources/transit-statistics/ridership-report/

The data itself can be downloaded using this link: https://www.apta.com/wp-content/uploads/APTA-Ridership-by-Mode-and-Quarter-1990-Present.xlsx

To download this data, I used R's `httr` package to request the file and save it to disk in Excel format. Below is the code for this action, followed by a screenshot of the raw data to illustrate its form upon download:

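In [None]:
library(readxl)
library(httr)
# Direct link to the APTA ridership workbook
url1 <- 'https://www.apta.com/wp-content/uploads/APTA-Ridership-by-Mode-and-Quarter-1990-Present.xlsx'
# Save the download in ../data (tempfile() adds a random suffix to the file name)
tf <- tempfile(pattern = "APTA-Ridership-by-Mode-and-Quarter-1990-Present", fileext = ".xlsx", tmpdir = "../data")
GET(url1, write_disk(tf))
# Read the second sheet of the workbook and inspect its structure
df <- read_excel(tf, sheet = 2L)
str(df)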

![Quarterly and Annual Ridership Totals by Mode of Transportation](../images/apta_raw_data.png)

### News API Data [^2]

An essential part of understanding public perception of a topic is assessing how it is covered in the news. Coverage often informs general opinion and can introduce conversations that had not previously been in the zeitgeist. Thus, this project will analyze text data from https://newsapi.org/ to study news coverage of two distinct public transit systems.

For this project, I will be looking at data regarding the Washington Metropolitan Area Transit Authority (WMATA) and Bay Area Rapid Transit (BART). Both of these transit systems have several advantages for academic study: they are large networks with rich histories and connections to their respective cities, robust data sources exist that allow us to analyze them from several angles, and comparing them offers perspective on differences between cities on opposite coasts.

The following shows how I accessed this News API via Python code. This outputs a JSON file as raw data, the start of which is included below each code block to show the nature of the data prior to cleaning.

In [None]:
import json
import os

import requests

baseURL = "https://newsapi.org/v2/everything"

# Read the key from the environment instead of hardcoding it in the notebook
# (NEWS_API_KEY is an assumed variable name)
API_KEY = os.environ['NEWS_API_KEY']
TOPIC = 'wmata'

# Query parameters: the leading '+' marks the search term as required
URLpost = {'apiKey': API_KEY,
           'q': '+' + TOPIC,
           'sortBy': 'relevancy'}

response = requests.get(baseURL, params=URLpost)
response = response.json()

# Save the unmodified response as the raw data file
with open('WMATA-newapi-raw-data.json', 'w') as outfile:
    json.dump(response, outfile, indent=4)

![Raw JSON data from News API; Topic: WMATA](../images/wmata_news_raw_data.png)

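In [None]:
import json
import os

import requests

baseURL = "https://newsapi.org/v2/everything"

# Read the key from the environment instead of hardcoding it in the notebook
# (NEWS_API_KEY is an assumed variable name)
API_KEY = os.environ['NEWS_API_KEY']
TOPIC = 'Bay Area Rapid Transit'

# Same request as for WMATA, with the BART search phrase
URLpost = {'apiKey': API_KEY,
           'q': '+' + TOPIC,
           'sortBy': 'relevancy'}

response = requests.get(baseURL, params=URLpost)
response = response.json()

# Save the unmodified response as the raw data file
with open('BART-newapi-raw-data.json', 'w') as outfile:
    json.dump(response, outfile, indent=4)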

![Raw JSON data from News API; Topic: Bay Area Rapid Transit](../images/bart_news_raw_data.png)
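As a preview of how these raw files might be used later, the sketch below flattens the `articles` list of a News API response into a table. It is a minimal illustration rather than the cleaning code used in this project: the function name `articles_to_frame` and the two-article `sample` payload are hypothetical, and the sample keeps only a few of the fields a real response carries.

```python
import pandas as pd


def articles_to_frame(payload: dict) -> pd.DataFrame:
    """Flatten the 'articles' list of a News API response into a table."""
    # A failed request reports a status other than "ok" and has no article list.
    if payload.get("status") != "ok":
        raise ValueError(f"News API returned: {payload.get('message', 'unknown error')}")
    # json_normalize expands the nested 'source' dict into source.id / source.name.
    frame = pd.json_normalize(payload.get("articles", []))
    if "publishedAt" in frame.columns:
        frame["publishedAt"] = pd.to_datetime(frame["publishedAt"], utc=True)
    return frame


# Hypothetical placeholder payload standing in for one of the saved raw files.
sample = {
    "status": "ok",
    "totalResults": 2,
    "articles": [
        {"source": {"id": None, "name": "Example Times"},
         "title": "Placeholder headline A",
         "description": "Placeholder description.",
         "publishedAt": "2023-10-01T12:00:00Z"},
        {"source": {"id": None, "name": "Example Post"},
         "title": "Placeholder headline B",
         "description": "Placeholder description.",
         "publishedAt": "2023-10-02T08:30:00Z"},
    ],
}

articles = articles_to_frame(sample)
print(articles[["source.name", "title", "publishedAt"]])
```

To apply this to a saved file, the payload would instead be read with `json.load` from `WMATA-newapi-raw-data.json` or `BART-newapi-raw-data.json`.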

### Remote Work Trends [^3]

It is reasonable to hypothesize that one of the main drivers of public transit usage is commuting to and from work. The term "rush hour" is in everyday use for the morning and evening periods when most people travel to or from their jobs. Thus, when COVID-19 struck and many workers were no longer expected to work in person, the need for public transportation decreased drastically.

In the years since, remote work has been a topic of controversy. Many workers enjoy the benefits of privacy and the time saved by not commuting, while employers often cite the advantages of being on-site, even in office jobs. While in-person work has rebounded recently, much like public transit usage, it has not come close to its pre-pandemic prevalence. Therefore, understanding trends in remote work can provide insight into how to analyze public transportation trends.

WFH Research maintains extensive data sets on remote work. For the purposes of this project, we will use three of them. To better understand the controversial aspects of remote work, the first two data sets contain survey information from *(a)* employers and *(b)* workers on how many remote work days per week, on average, they desire. The third data set provides time series information on the amount of working from home (percent of full paid days) for large cities, including Washington, D.C. and the San Francisco Bay Area. Screenshots of the raw data are shown below:

![Remote Work Desires of Employers](../images/wfh_employer.png)

![Remote Work Desires of Workers](../images/wfh_worker.png)

![Remote Work Percentages by City](../images/wfh_city.png)

### Ridership Trends for each City [^4] [^5]

Now that we have background information regarding both cities of concern, the next piece of information to gather is public transit ridership. This will give us comprehensive monthly data from 2018 to 2023 to provide insights on the nature of the decline in public transit, as well as the current recovery.

For WMATA, the data comes from https://www.wmata.com/initiatives/ridership-portal/, and gives simple average daily entries per month. Meanwhile, BART data comes from https://www.bart.gov/about/reports/ridership, which provides reports each month on entries and exits by station. The raw data for WMATA entries, as well as the most recent BART report, are shown below:

![WMATA Average Daily Entries by Month](../images/wmata_monthly_boarding.png)

![September 2023 Report for BART Entries/Exits by Station](../images/bart_sample_boarding.png)
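Because WMATA publishes one system-wide figure per month while BART reports by station, the two sources need to be brought to the same grain before they can be compared. The sketch below shows one way this could be done, under stated assumptions: the column names (`month`, `station`, `entries`), the function `monthly_system_totals`, and all numbers are hypothetical placeholders, not values from either source.

```python
import pandas as pd


def monthly_system_totals(station_rows: pd.DataFrame) -> pd.Series:
    """Sum station-level entries into one system-wide value per month."""
    return station_rows.groupby("month")["entries"].sum().rename("bart_entries")


# Placeholder station-level rows standing in for parsed BART monthly reports.
bart = pd.DataFrame({
    "month": pd.PeriodIndex(["2023-08", "2023-08", "2023-09", "2023-09"], freq="M"),
    "station": ["A", "B", "A", "B"],
    "entries": [10, 15, 12, 18],
})

# Placeholder system-level series standing in for the WMATA monthly data.
wmata = pd.Series(
    [100, 110],
    index=pd.PeriodIndex(["2023-08", "2023-09"], freq="M"),
    name="wmata_entries",
)

# Align both systems on the shared monthly index.
combined = pd.concat([monthly_system_totals(bart), wmata], axis=1)
print(combined)
```

Using a monthly `Period` index keeps the join independent of which day of the month each source happens to stamp on its records.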

### Ridership by Hour

In addition to the volume of public transit usage, we can glean information on the purpose of that usage by analyzing ridership by hour of the day. High peaks during "rush hour" likely indicate that work commuting strongly influences the data. Because of this, I downloaded both pre-pandemic and post-pandemic data sets on WMATA ridership by hour to examine this relationship and whether it has changed under the new circumstances. In this case, March 17, 2020 is chosen as the demarcation date, as that was the day on which the first social distancing precautions were announced in Washington, D.C. The data is shown below:

![Hourly Ridership from 1/1/2018 to 3/17/2020](../images/hourly_pre-covid.png)

![Hourly Ridership from 3/18/2020 to 10/5/2023](../images/hourly_post-covid.png)
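The pre/post comparison described above can be sketched as follows. This is an illustration under assumptions, not the analysis code for this project: the column names (`date`, `hour`, `entries`), the function `hourly_share_by_period`, and the sample rows are hypothetical. Only the March 17, 2020 cutoff comes from the text, with that day itself counted as pre-pandemic to match the date ranges of the two files.

```python
import pandas as pd

CUTOFF = pd.Timestamp("2020-03-17")  # demarcation date; this day counts as "pre"


def hourly_share_by_period(df: pd.DataFrame) -> pd.DataFrame:
    """Return each hour's share of total entries, before and after the cutoff."""
    labeled = df.assign(
        period=df["date"].le(CUTOFF).map({True: "pre", False: "post"})
    )
    # Rows are hours, columns are the two periods.
    totals = labeled.groupby(["period", "hour"])["entries"].sum().unstack("period")
    # Divide each column by its own total so that every period sums to 1.
    return totals / totals.sum()


# Hypothetical placeholder rows, not actual WMATA figures.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2019-05-01", "2019-05-01", "2021-05-01", "2021-05-01"]),
    "hour": [8, 13, 8, 13],
    "entries": [300, 100, 50, 50],
})

shares = hourly_share_by_period(sample)
print(shares)
```

Comparing shares rather than raw counts separates the shape of the daily profile from the overall drop in ridership, so a flatter post-pandemic rush-hour peak would show up even if total volume were much lower.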

### Ridership by Demographic [^6]

In answering the question of whether public transit's role as a public service should be the paramount consideration in judging its efficacy, it is important to understand that it disproportionately serves underprivileged groups. By analyzing demographic data, we can gather insights on who benefits most from robust public transit systems. To address this, I use U.S. Census Bureau data that provides 2021 5-year estimates of means of transportation to work by selected characteristics. The raw data is shown below:

![2021 5-Year Estimate of Transportation Means by Demographic](../images/ridership_demographic.png)

[^1]: “Ridership Report.” American Public Transportation Association, 21 Sept. 2023, www.apta.com/research-technical-resources/transit-statistics/ridership-report/. 

[^2]: “News API – Search News and Blog Articles on the Web.” News API – Search News and Blog Articles on the Web, newsapi.org/. Accessed 12 Oct. 2023.

[^3]: Barrero, Jose Maria, et al. Why Working from Home Will Stick, 2021, https://doi.org/10.3386/w28731.

[^4]: “Washington Metropolitan Area Transit Authority.” WMATA, www.wmata.com/initiatives/ridership-portal/. Accessed 12 Oct. 2023. 

[^5]: “Ridership Reports.” Ridership Reports | Bay Area Rapid Transit, www.bart.gov/about/reports/ridership. Accessed 13 Oct. 2023. 

[^6]: U.S. Census Bureau. “Means of Transportation to Work by Selected Characteristics.” American Community Survey, ACS 5-Year Estimates Subject Tables, Table S0802, 2021, https://data.census.gov/table/ACSST5Y2021.S0802?t=Commuting&g=860XX00US20020,20032. Accessed 12 Oct. 2023.