Introduction: I was talking to my India last month and the topic of air quality got mentioned. Thats when he mentioned that breathing in the air in Delhi daily is the equivalence of smoking 50 cigarettes a day. This seemed like an exaggeration or outright lie which to my horror wasn't. Air pollution in urban areas aren't a joke and are often overlooked. Many people might see a major modern city such as New York City and overlook air pollution such as smog. However, the more I research into this topic, the more I realize that pollution in major cities are too overlooked. I was reading the news earlier this year and also saw this headline of a Stanford professor. He had kids and didn't even smoke, however was diagnosed with stage 4 lung cancer. With his non-smoking lifestyle and while even teaching medicine, he would be the last person you'd expect to be diagnosed with lung cancer. This was a story that really woke me up to the affects of air pollution. If a non-smoking man can get lung cancer from air pollution, the air is only going to be more harmful to smokers and people with incrased liklihood of cancers. I aim to use data science to see how we can observe trends in air pollution. These questions answered can be used to help advocate and show others how dangerous and important air quality actually is.

Question 1: How does time of day affect air pollution?

Question 2: Can we use previous data to predict when a spike will occur in air pollution?



Sources:
https://www.ehn.org/like-smoking-50-cigarrettes-daily-delhis-air-pollution-reaches-hazardous-levels
https://www.youtube.com/watch?v=2O-k6TuMm6k

In [15]:
!pip install openmeteo-requests
!pip install requests-cache retry-requests numpy pandas

import openmeteo_requests

import pandas as pd
import requests_cache
from retry_requests import retry

# Setup the Open-Meteo API client with cache and retry on error
#GIVEN BY API
cache_session = requests_cache.CachedSession('.cache', expire_after = 3600)
retry_session = retry(cache_session, retries = 5, backoff_factor = 0.2)
openmeteo = openmeteo_requests.Client(session = retry_session)

# Set location to Northeastern
#For that data, I'm gathering ozone, and particular matter since they are
#one of the most dangerous types of air pollution.
url = "https://air-quality-api.open-meteo.com/v1/air-quality"
params = {
	"latitude": 42.3398,
	"longitude": 71.0892,
	"hourly": ["ozone", "pm10", "pm2_5"],
	"timezone": "America/New_York",
}
responses = openmeteo.weather_api(url, params=params)
response = responses[0]

hourly = response.Hourly()
hourly_ozone = hourly.Variables(0).ValuesAsNumpy()
hourly_pm10 = hourly.Variables(1).ValuesAsNumpy()
hourly_pm2_5 = hourly.Variables(2).ValuesAsNumpy()

hourly_data = {"date": pd.date_range(
	start = pd.to_datetime(hourly.Time(), unit = "s", utc = True),
	end = pd.to_datetime(hourly.TimeEnd(), unit = "s", utc = True),
	freq = pd.Timedelta(seconds = hourly.Interval()),
	inclusive = "left"
)}

hourly_data["ozone"] = hourly_ozone
hourly_data["pm10"] = hourly_pm10
hourly_data["pm2_5"] = hourly_pm2_5

hourly_dataframe = pd.DataFrame(data = hourly_data)
print("\nHourly data\n", hourly_dataframe)


Hourly data
                          date  ozone       pm10  pm2_5
0   2025-10-05 04:00:00+00:00   94.0  18.799999   14.3
1   2025-10-05 05:00:00+00:00   99.0  14.700000   11.3
2   2025-10-05 06:00:00+00:00  102.0  12.300000    9.7
3   2025-10-05 07:00:00+00:00  100.0  10.900000    8.8
4   2025-10-05 08:00:00+00:00   96.0   9.900000    8.2
..                        ...    ...        ...    ...
115 2025-10-09 23:00:00+00:00   79.0   6.200000    5.9
116 2025-10-10 00:00:00+00:00   80.0   6.200000    5.9
117 2025-10-10 01:00:00+00:00   76.0   6.200000    5.8
118 2025-10-10 02:00:00+00:00   69.0   6.300000    5.9
119 2025-10-10 03:00:00+00:00   68.0   6.000000    5.7

[120 rows x 4 columns]


The data given can be used to solve the first question or at least provide insights. We can see the different hours of days for a 5 day period. We can take the info given and use it to see if there are any observable trends in the air pollution as each day cycles through. For instance, we can see if there is a lower air pollution after sunset. We can also used the option to get extended times for air pollution and find where there are spikes of air pollution. We can observe the air pollution prior to the spike and see if there are any signs that would signal the spike. Such as a growing ammount of pollution for extended periods of time.