**1.	Data Visualization Project Problem**

This project addresses the need for an interactive visualization tool to analyze and interpret air quality trends in the state of Texas, focusing on three key pollutants: PM2.5 (Fine Particulate Matter), O3 (Ozone), and NO2 (Nitrogen Dioxide). The project will provide insights into pollution patterns over time and across locations, helping to identify seasonal or geographic hotspots and trends.

**2.	Project Introduction and Motivation**

Understanding air quality is essential for public health, particularly in urban areas where pollution levels change based on industrial activity, weather, and traffic. By visualizing historical air quality data for the state of Texas, this project aims to reveal pollution patterns and support decisions that can mitigate harmful health impacts. These insights benefit policymakers, health professionals, and the public by raising awareness and helping to create cleaner, healthier environments.

**3.	Project Goals and Expected Outcomes**

*•	Goals:* Develop a visualization tool that allows users to explore Texas air quality data by pollutant, location, and time.

*•	Expected Outcomes:* Visualizations that highlight pollution trends across cities and pollutants over a selected period, identifying pollution spikes, seasonal changes, and comparisons between urban areas. This tool will support informed decision-making and public awareness regarding air quality concerns.

**4.	Dataset Used and Description**

*•	Source:* U.S. Environmental Protection Agency (EPA) Air Quality System (AQS) Data Mart.

*•	Description:* The dataset includes daily records for PM2.5, O3, and NO2 levels across the state of Texas. Each record details at least the date, city/location, pollutant concentration, AQI value, and geographic information. Separate files for each pollutant and year will be combined to enable comprehensive trend analysis and visualization.
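The combining step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual preprocessing: the helper name `combine_daily_files` is hypothetical, and the two in-memory CSV snippets are tiny stand-ins for downloaded files (the ozone value is made up).

```python
import io

import pandas as pd


def combine_daily_files(sources):
    """Read each CSV source (path or file-like) and stack the rows into one DataFrame."""
    frames = [pd.read_csv(src) for src in sources]
    # ignore_index=True rebuilds a clean 0..n-1 index across all files.
    return pd.concat(frames, ignore_index=True)


# In-memory stand-ins for two per-pollutant files (illustrative rows only).
no2_2019 = io.StringIO(
    "Date,Daily AQI Value,AQS Parameter Description\n1/1/2019,5,NO2\n"
)
o3_2019 = io.StringIO(
    "Date,Daily AQI Value,AQS Parameter Description\n1/1/2019,30,Ozone\n"
)

combined = combine_daily_files([no2_2019, o3_2019])
print(combined.shape)
```

With real downloads, the same helper could be given a list of file paths. If the per-pollutant files do not share identical column names, `pd.concat` keeps the union of columns and fills the gaps with NaN, so selecting the shared columns first may be worth considering.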



Datasets: https://www.epa.gov/outdoor-air-quality-data/download-daily-data

Information: https://www.epa.gov/outdoor-air-quality-data/air-data-basic-information

**PM2.5 (Fine Particulate Matter)**

Due to its small size and ability to penetrate deeply into the lungs and bloodstream, PM2.5 is the most concerning for public health. It’s linked to respiratory and cardiovascular diseases and has widespread sources in urban areas, including vehicles and industrial emissions.

**Ozone (O3)**

Ozone, especially ground-level ozone, forms as a secondary pollutant from reactions between VOCs and NOx in sunlight. It aggravates asthma and respiratory conditions and is a key component of smog, affecting urban populations significantly.

**Nitrogen Dioxide (NO2)**

NO2 is a major contributor to smog and respiratory problems. It’s highly prevalent in cities due to traffic and industrial emissions and poses direct health risks, especially for children and those with pre-existing conditions.

**The AQI (Air Quality Index) Explanation**

AirData uses the Air Quality Index (AQI) in some of its reports and tables and to display data using the visualization tools. The AQI is an index for reporting daily air quality. It tells how clean or polluted the air is, and what associated health effects might be a concern, especially for ground-level ozone and particle pollution.

Think of the AQI as a yardstick that runs from 0 to 500. The higher the AQI value, the greater the level of air pollution and the greater the health concern. For example, an AQI value of 50 represents good air quality with little potential to affect public health, while an AQI value over 300 represents hazardous air quality.

An AQI value of 100 generally corresponds to the national air quality standard for the pollutant, which is the level EPA has set to protect public health. AQI values below 100 are generally thought of as satisfactory. When AQI values are above 100, air quality is considered to be unhealthy: at first for certain sensitive groups of people, then for everyone as AQI values get higher.

Each category corresponds to a different level of health concern. The six levels of health concern and what they mean are:

**"Good" AQI is 0 - 50.** Air quality is considered satisfactory, and air pollution poses little or no risk.

**"Moderate" AQI is 51 - 100.** Air quality is acceptable; however, for some pollutants there may be a moderate health concern for a very small number of people. For example, people who are unusually sensitive to ozone may experience respiratory symptoms.

**"Unhealthy for Sensitive Groups" AQI is 101 - 150.** Although the general public is not likely to be affected at this AQI range, people with lung disease, older adults, and children are at greater risk from exposure to ozone, whereas people with heart and lung disease, older adults, and children are at greater risk from the presence of particles in the air.

**"Unhealthy" AQI is 151 - 200.** Everyone may begin to experience some adverse health effects, and members of the sensitive groups may experience more serious effects.

**"Very Unhealthy" AQI is 201 - 300.** This would trigger a health alert signifying that everyone may experience more serious health effects.

**"Hazardous" AQI is greater than 300.** This would trigger health warnings of emergency conditions. The entire population is more likely to be affected.
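The six categories above can be expressed as a small lookup. The sketch below assumes integer AQI values such as those in the `Daily AQI Value` column; the function name `aqi_category` and the two constants are hypothetical helpers, not part of any EPA or pandas API.

```python
from bisect import bisect_left

# Upper bounds of the first five categories, taken from the list above.
AQI_UPPER_BOUNDS = [50, 100, 150, 200, 300]
AQI_LABELS = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
    "Very Unhealthy",
    "Hazardous",
]


def aqi_category(aqi):
    """Return the health-concern label for a non-negative AQI value."""
    if aqi < 0:
        raise ValueError("AQI values cannot be negative")
    # bisect_left finds the first upper bound that is >= aqi;
    # anything above 300 falls through to the last label.
    return AQI_LABELS[bisect_left(AQI_UPPER_BOUNDS, aqi)]


print(aqi_category(35))
print(aqi_category(309))
```

Applied to a DataFrame, something like `df["Daily AQI Value"].map(aqi_category)` would produce a category column that could drive color coding in a chart.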

In [None]:
# Importing required libraries
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# Loading the combined CSV file
df = pd.read_csv('Texas_Air_Quality_2019_2024.csv')

# Displaying basic information and a sample of the data
df.head()


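|   | Date | Daily AQI Value | Daily Obs Count | AQS Parameter Description | CBSA Name | State | County | Site Latitude | Site Longitude |
|---|------|-----------------|-----------------|---------------------------|-----------|-------|--------|---------------|----------------|
| 0 | 1/1/2019 | 5 | 24 | NO2 | Killeen-Temple | Texas | Bell | 31.088002 | -97.679734 |
| 1 | 1/2/2019 | 4 | 24 | NO2 | Killeen-Temple | Texas | Bell | 31.088002 | -97.679734 |
| 2 | 1/3/2019 | 13 | 24 | NO2 | Killeen-Temple | Texas | Bell | 31.088002 | -97.679734 |
| 3 | 1/4/2019 | 35 | 24 | NO2 | Killeen-Temple | Texas | Bell | 31.088002 | -97.679734 |
| 4 | 1/5/2019 | 21 | 24 | NO2 | Killeen-Temple | Texas | Bell | 31.088002 | -97.679734 |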


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 317098 entries, 0 to 317097
Data columns (total 9 columns):
 #   Column                     Non-Null Count   Dtype  
---  ------                     --------------   -----  
 0   Date                       317098 non-null  object 
 1   Daily AQI Value            317098 non-null  int64  
 2   Daily Obs Count            317098 non-null  int64  
 3   AQS Parameter Description  317098 non-null  object 
 4   CBSA Name                  317071 non-null  object 
 5   State                      317098 non-null  object 
 6   County                     317098 non-null  object 
 7   Site Latitude              317098 non-null  float64
 8   Site Longitude             317098 non-null  float64
dtypes: float64(2), int64(2), object(5)
memory usage: 21.8+ MB


In [None]:
# Dropping every column that contains any NaN values (this removes 'CBSA Name')
df = df.dropna(axis=1, how='any')

# Converting 'Date' column to datetime
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

# Displaying the result to verify
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 317098 entries, 0 to 317097
Data columns (total 8 columns):
 #   Column                     Non-Null Count   Dtype         
---  ------                     --------------   -----         
 0   Date                       317098 non-null  datetime64[ns]
 1   Daily AQI Value            317098 non-null  int64         
 2   Daily Obs Count            317098 non-null  int64         
 3   AQS Parameter Description  317098 non-null  object        
 4   State                      317098 non-null  object        
 5   County                     317098 non-null  object        
 6   Site Latitude              317098 non-null  float64       
 7   Site Longitude             317098 non-null  float64       
dtypes: datetime64[ns](1), float64(2), int64(2), object(3)
memory usage: 19.4+ MB
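Because `dropna(axis=1, how='any')` removes a whole column when even a few of its rows are missing, `CBSA Name` disappears from the frame above. If metro-area comparisons are wanted later, one alternative is to keep the column and label the gaps. The sketch below contrasts the two options on a made-up two-row frame; the `"Unknown"` placeholder and the sample rows are assumptions for illustration only.

```python
import pandas as pd

# Made-up stand-in: the second row has no CBSA recorded.
sample = pd.DataFrame(
    {
        "County": ["Bell", "Example County"],
        "CBSA Name": ["Killeen-Temple", None],
        "Daily AQI Value": [5, 12],
    }
)

# Option A (as in the cell above): drop every column that contains a NaN.
dropped = sample.dropna(axis=1, how="any")

# Option B: keep the column and fill the missing entries with a placeholder.
filled = sample.fillna({"CBSA Name": "Unknown"})

print(list(dropped.columns))
print(filled["CBSA Name"].tolist())
```

Option A keeps the frame free of missing values at the cost of losing the metro-area label; Option B preserves it for grouping by city while still making the gaps explicit.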


In [5]:
df.describe()

|   | Date | Daily AQI Value | Daily Obs Count | Site Latitude | Site Longitude |
|---|------|-----------------|-----------------|---------------|----------------|
| count | 317098 | 317098.0 | 317098.0 | 317098.0 | 317098.0 |
| mean | 2021-11-21 14:36:22.265419264 | 33.661754 | 14.904846 | 30.72958 | -97.265295 |
| min | 2019-01-01 00:00:00 | 0.0 | 1.0 | 25.892518 | -106.5473 |
| 25% | 2020-06-16 00:00:00 | 20.0 | 1.0 | 29.686389 | -97.69166 |
| 50% | 2021-11-20 00:00:00 | 32.0 | 17.0 | 30.039524 | -96.860117 |
| 75% | 2023-04-30 00:00:00 | 45.0 | 24.0 | 32.482083 | -95.294722 |
| max | 2024-11-09 00:00:00 | 309.0 | 136.0 | 35.201592 | -93.761341 |
| std |  | 20.139319 | 12.208176 | 1.797124 | 3.068719 |
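With `Date` parsed as datetime, the seasonal patterns named in the project goals could be explored by averaging AQI per pollutant and month. The following is a minimal sketch on a made-up four-row frame that reuses the notebook's column names; it is one possible aggregation, not necessarily the one used later in the project.

```python
import pandas as pd

# Made-up sample rows using the notebook's column names.
sample = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2019-01-01", "2019-01-02", "2019-02-01", "2019-01-01"]
        ),
        "AQS Parameter Description": ["NO2", "NO2", "NO2", "Ozone"],
        "Daily AQI Value": [5, 4, 13, 30],
    }
)

# Group by pollutant and calendar month, then average the daily AQI values.
monthly = (
    sample.groupby(
        ["AQS Parameter Description", sample["Date"].dt.to_period("M")]
    )["Daily AQI Value"]
    .mean()
    .reset_index()
)

print(monthly)
```

The resulting frame has one row per pollutant and month, which is a convenient shape for a line chart such as `plotly.express.line` with the pollutant as the color dimension (the `Date` periods would need converting to strings or timestamps first).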
