# Public Transit Reliability vs. Weather

![Banner](./assets/banner.jpeg)

## Topic
*What problem are you (or your stakeholder) trying to address?*


Riders of public transportation and urban planners rely on consistent and trustworthy schedules. However, delays tend to increase during severe weather such as rain, snow, or extreme temperatures. This project examines how weather affects bus and train reliability in major cities. Understanding this relationship can help transit authorities improve scheduling, allocate resources, and communicate delays more effectively.

## Project Question
*What specific question are you seeking to answer with this project?*
*This is not the same as the questions you ask to limit the scope of the project.*
📝 <!-- Answer Below -->

How do different weather conditions (rain, snow, temperature extremes) affect average transit delays and reliability in a major city?

## What would an answer look like?
*What is your hypothesized answer to your question?*
📝 <!-- Answer Below -->

- A line chart showing average bus/train delays by temperature range (e.g., <32°F, 32–70°F, 70–90°F, >90°F).

- A bar chart comparing % of late arrivals on rainy vs. sunny vs. snowy days.

- A heatmap of time-of-day delays under different weather conditions.


Hypothesis: Delays will be significantly longer on snowy and rainy days, with extreme cold also contributing to service interruptions.
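
The summaries behind these charts could be computed along the following lines. This is a minimal sketch, not the project's final method: the column names (`delay_min`, `temp_f`, `condition`), the sample rows, the temperature bin edges, and the 5-minute lateness threshold are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample; column names and values are illustrative only.
trips = pd.DataFrame({
    "delay_min": [8.0, 12.0, 7.0, 0.0, 9.0, 3.0],
    "temp_f":    [25, 28, 55, 68, 95, 72],
    "condition": ["snow", "snow", "rain", "clear", "clear", "rain"],
})

LATE_THRESHOLD_MIN = 5  # assumed definition of a "late" arrival

# Bin temperatures into left-closed ranges: [-inf, 32), [32, 70), [70, 90), [90, inf).
temp_bins = [-float("inf"), 32, 70, 90, float("inf")]
temp_labels = ["<32°F", "32-70°F", "70-90°F", ">=90°F"]
trips["temp_range"] = pd.cut(
    trips["temp_f"], bins=temp_bins, labels=temp_labels, right=False
)

# Average delay per temperature range (input for the line chart).
avg_delay_by_temp = trips.groupby("temp_range", observed=False)["delay_min"].mean()

# Percentage of late arrivals per weather condition (input for the bar chart).
pct_late = (
    trips.assign(late=trips["delay_min"] > LATE_THRESHOLD_MIN)
    .groupby("condition")["late"]
    .mean()
    * 100
)

print(avg_delay_by_temp)
print(pct_late)
```

Both results are ordinary pandas Series, so they can be passed to whichever plotting library the project ends up using.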

## Data Sources
*What 3 data sources have you identified for this project?*
*How are you going to relate these datasets?*
📝 <!-- Answer Below -->

1. Transit real-time API (GTFS Realtime feed) – Examples: New York MTA, Chicago CTA, or Cincinnati Metro. Provides live bus/train arrival and departure times. (API)

2. NOAA National Weather Service API – Daily and hourly weather data including precipitation, temperature, and conditions. (API)

3. City Open Data Portal – Historical transit delay/performance reports (CSV/Excel). Many cities (NYC, Chicago, DC) publish these reports. (File dataset)

How to relate datasets:

- Join transit delay records with NOAA weather data using date and time.

- Use city/stop ID or zip code for location-based matching if needed.
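
One way to perform the date/time join is a nearest-earlier match with `pandas.merge_asof`, which attaches the most recent weather observation to each delay record. The sketch below assumes a single weather station for the whole city; the column names (`stop_id`, `actual_arrival`, `observed_at`, `temp_f`, `precip_in`) and sample values are hypothetical.

```python
import pandas as pd

# Hypothetical delay records (one row per observed arrival).
delays = pd.DataFrame({
    "stop_id": ["A1", "A1", "B2"],
    "actual_arrival": pd.to_datetime(
        ["2024-01-15 08:05", "2024-01-15 09:40", "2024-01-15 09:10"], utc=True
    ),
    "delay_min": [4.0, 11.0, 6.0],
})

# Hypothetical hourly weather observations from one station.
weather = pd.DataFrame({
    "observed_at": pd.to_datetime(
        ["2024-01-15 08:00", "2024-01-15 09:00", "2024-01-15 10:00"], utc=True
    ),
    "temp_f": [30.0, 31.0, 33.0],
    "precip_in": [0.0, 0.2, 0.1],
})

# merge_asof requires both frames to be sorted by their time keys.
delays = delays.sort_values("actual_arrival")
weather = weather.sort_values("observed_at")

# Attach the latest observation at or before each arrival, at most 1 hour old.
joined = pd.merge_asof(
    delays,
    weather,
    left_on="actual_arrival",
    right_on="observed_at",
    direction="backward",
    tolerance=pd.Timedelta("1h"),
)

print(joined[["stop_id", "actual_arrival", "delay_min", "temp_f", "precip_in"]])
```

If several weather stations are used, adding a shared location column to both frames and passing it through the `by=` parameter would restrict each match to the same area.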

## Approach and Analysis  
1. **Data Import & Cleaning**  
   - Pull historical transit data (delays, scheduled vs. actual times) via API or CSV.  
   - Gather daily/hourly weather data for the same timeframe.  
   - Clean and normalize timestamps to ensure alignment.  

2. **Data Integration**  
   - Merge datasets on **date/time** and **location**.  
   - Create new variables such as “delay length in minutes” and “weather condition category.”  

3. **Analysis**  
   - Compute average delay times by weather category (clear, rain, snow, extreme heat/cold).  
   - Visualize trends across time of day, day of week, and severity of weather.  
   - Run simple correlation/regression analysis to see if precipitation levels or temperature extremes predict delay severity.  

4. **Expected Outcome**  
   - Quantified evidence of whether, and by how much, adverse weather increases delay times.  
   - Visualizations showing the most impactful conditions.  
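
The cleaning, variable-creation, and analysis steps above can be sketched as follows. Everything specific here is an assumption for illustration: the column names, the sample rows, the `America/New_York` time zone, and the category thresholds in the hypothetical helper `weather_category` (for example, treating precipitation at or below 32°F as snow).

```python
import numpy as np
import pandas as pd

# Hypothetical already-merged records; all names and values are illustrative.
records = pd.DataFrame({
    "scheduled": ["2024-01-15 08:00", "2024-01-15 08:30",
                  "2024-07-20 17:00", "2024-04-02 12:00"],
    "actual":    ["2024-01-15 08:12", "2024-01-15 08:39",
                  "2024-07-20 17:06", "2024-04-02 12:01"],
    "temp_f":    [20.0, 45.0, 96.0, 60.0],
    "precip_in": [0.3, 0.1, 0.0, 0.0],
})

# Cleaning: parse local timestamps and normalize them to UTC.
for col in ("scheduled", "actual"):
    records[col] = (
        pd.to_datetime(records[col])
        .dt.tz_localize("America/New_York")  # assumed local time zone
        .dt.tz_convert("UTC")
    )

# New variable: delay length in minutes.
records["delay_min"] = (
    (records["actual"] - records["scheduled"]).dt.total_seconds() / 60
)


def weather_category(temp_f: float, precip_in: float) -> str:
    """Assign a coarse weather label; thresholds are assumptions."""
    if precip_in > 0:
        return "snow" if temp_f <= 32 else "rain"
    if temp_f >= 90:
        return "extreme heat"
    if temp_f <= 20:
        return "extreme cold"
    return "clear"


# New variable: weather condition category.
records["weather_cat"] = [
    weather_category(t, p)
    for t, p in zip(records["temp_f"], records["precip_in"])
]

# Analysis: average delay per category, then a simple linear relationship.
avg_delay = records.groupby("weather_cat")["delay_min"].mean()
corr = records["precip_in"].corr(records["delay_min"])  # Pearson correlation
slope, intercept = np.polyfit(records["precip_in"], records["delay_min"], 1)

print(avg_delay)
print(f"correlation={corr:.2f}, slope={slope:.1f} min per inch of precipitation")
```

A four-row sample says nothing about real transit behavior; it only shows the shape of the computation. With real data, a regression that also controls for time of day and day of week would likely be more informative than a single-variable fit.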

## Resources and References
*What resources and references have you used for this project?*
📝 <!-- Answer Below -->
- [NOAA National Weather Service API](https://www.weather.gov/documentation/services-web-api)  
- [GTFS Realtime Transit Data](https://developers.google.com/transit/gtfs-realtime)  
- [NYC MTA Open Data](https://data.ny.gov/)
- [Chicago CTA Performance Reports](https://www.transitchicago.com/performance/)  
- Course Resources: Warm-Cool-Hard Feedback Protocol, Clean Code Practices 

```python
# ⚠️ Make sure you run this cell at the end of your notebook before every submission!
!jupyter nbconvert --to python source.ipynb
```

```
[NbConvertApp] Converting notebook source.ipynb to python
[NbConvertApp] Writing 1271 bytes to source.py
```
