# Title: Data Visualization using matplotlib 

<b>Problem Statement:</b> Analyzing Air Quality Index (AQI) Trends in a City  
<b>Dataset:</b> "City_Air_Quality.csv"<br>
<b>Description:</b> The dataset contains information about air quality measurements in a specific 
city over a period of time. It includes attributes such as date, time, pollutant levels (e.g., PM2.5, 
PM10, CO), and the Air Quality Index (AQI) values. The goal is to use the matplotlib library 
to create visualizations that effectively represent the AQI trends and patterns for different 
pollutants in the city.<br>
<b>Tasks to Perform:</b>
1.  Import the "City_Air_Quality.csv" dataset. 
2.  Explore the dataset to understand its structure and content. 
3. Identify the relevant variables for visualizing AQI trends, such as date, pollutant levels, 
and AQI values. 
4. Create line plots or time series plots to visualize the overall AQI trend over time. 
5. Plot individual pollutant levels (e.g., PM2.5, PM10, CO) on separate line plots to 
visualize their trends over time. 
6. Use bar plots or stacked bar plots to compare the AQI values across different dates or 
time periods. 
7. Create box plots or violin plots to analyze the distribution of AQI values for different 
pollutant categories. 
8. Use scatter plots or bubble charts to explore the relationship between AQI values and 
pollutant levels. 
9. Customize the visualizations by adding labels, titles, legends, and appropriate color 
schemes. 

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

## Task 1. Import the "City_Air_Quality.csv" dataset.

In [None]:
# Load the uploaded CSV file
file_path = 'datasets/city_day.csv'
data = pd.read_csv(file_path)

# Print column types and non-null counts, then show the first few rows.
# data.info() prints its report and returns None, so call it on its own line.
data.info()
data.head()

## Tasks 2 and 3. Explore the dataset and identify the relevant variables

In [None]:
# Convert 'Date' column to datetime format
data['Date'] = pd.to_datetime(data['Date'], errors='coerce')

# Check the percentage of missing values in each column to decide on handling strategy
missing_percentage = data.isnull().mean() * 100
missing_percentage.sort_values(ascending=False)
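
The missing-value percentages computed above are easier to compare as a chart. The following is a minimal sketch on a small, made-up frame; `sample` and its values are hypothetical stand-ins, and only the column names (`City`, `Date`, `PM2.5`, `AQI`) are taken from this notebook.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the real dataset (values are invented)
sample = pd.DataFrame({
    "City": ["A", "A", "B", "B"],
    "Date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02"]),
    "PM2.5": [40.0, np.nan, 55.0, 60.0],
    "AQI": [90.0, np.nan, np.nan, 130.0],
})

# Percentage of missing values per column, largest first
missing_pct = (sample.isnull().mean() * 100).sort_values(ascending=False)

# Bar chart of the missing-value share for each column
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(missing_pct.index, missing_pct.values, color="steelblue")
ax.set_xlabel("Column")
ax.set_ylabel("Missing values (%)")
ax.set_title("Share of Missing Values per Column")
fig.tight_layout()
```

Replacing `sample` with `data` should give the same chart for the full dataset, assuming it has been loaded as in Task 1.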

## Remaining Tasks. Data Visualization

In [None]:
# Drop rows with missing 'AQI' values first: AQI is critical for the analysis,
# so it should not be imputed
data = data.dropna(subset=['AQI'])

# Forward fill then backward fill to handle the remaining missing values
# (DataFrame.ffill()/bfill() replace the deprecated fillna(method=...) form)
data = data.ffill().bfill()

# Verify that there are no missing values left
print(data.isnull().sum())
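
The forward/backward fill above runs over the whole frame. If the file holds several cities stacked one after another, a gap at the start of one city's rows can be filled with the last value of the previous city. One possible alternative, sketched here on invented data, is to fill within each city only. The frame `sample` and its values are hypothetical; the `City`, `Date`, and `PM2.5` column names come from this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical two-city frame with gaps (values are invented)
sample = pd.DataFrame({
    "City": ["A", "A", "A", "B", "B", "B"],
    "Date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"] * 2),
    "PM2.5": [10.0, np.nan, 30.0, np.nan, 200.0, np.nan],
})
sample = sample.sort_values(["City", "Date"])

# Global fill: city B's first gap inherits city A's last value
global_fill = sample["PM2.5"].ffill().bfill()

# Per-city fill: each gap is filled only from the same city's readings
filled = sample.copy()
filled["PM2.5"] = sample.groupby("City")["PM2.5"].transform(
    lambda s: s.ffill().bfill()
)

print(global_fill.tolist())
print(filled["PM2.5"].tolist())
```

Whether per-city filling is appropriate depends on how the real file is organized, so treat this as an option to check rather than a required step.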


In [None]:
# Calculate average AQI by date
aqi_trend = data.groupby('Date')['AQI'].mean()

# Plot AQI trend over time
plt.figure(figsize=(14, 6))
plt.plot(aqi_trend.index, aqi_trend.values, label='Average AQI', color='blue')
plt.xlabel('Date')
plt.ylabel('AQI')
plt.title('AQI Trend Over Time')
plt.legend()
plt.show()
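
Daily averages can be noisy, which may hide the longer-term trend. A rolling mean is one common way to smooth the line. The sketch below uses a synthetic series in place of `aqi_trend`; the 7-day window and the generated values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic daily AQI series standing in for aqi_trend (values are invented)
dates = pd.date_range("2020-01-01", periods=60, freq="D")
rng = np.random.default_rng(0)
daily_aqi = pd.Series(150 + rng.normal(0, 20, size=60), index=dates)

# 7-day rolling mean; min_periods=1 keeps the first days instead of NaN
smoothed = daily_aqi.rolling(window=7, min_periods=1).mean()

# Raw series in the background, smoothed series on top
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(daily_aqi.index, daily_aqi.values, color="lightgray", label="Daily average AQI")
ax.plot(smoothed.index, smoothed.values, color="blue", label="7-day rolling mean")
ax.set_xlabel("Date")
ax.set_ylabel("AQI")
ax.set_title("AQI Trend Over Time (Smoothed)")
ax.legend()
fig.tight_layout()
```

A longer window (for example 30 days) smooths more aggressively at the cost of hiding short pollution episodes.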

In [None]:
# Plot the daily average of each pollutant on its own figure
pollutants = ['PM2.5', 'PM10', 'CO']
for pollutant in pollutants:
    plt.figure(figsize=(14, 6))
    pollutant_trend = data.groupby('Date')[pollutant].mean()
    plt.plot(pollutant_trend.index, pollutant_trend.values, label=f'{pollutant} Levels', color='orange')
    plt.xlabel('Date')
    plt.ylabel(pollutant)
    plt.title(f'{pollutant} Trend Over Time')
    plt.legend()
    plt.show()
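
Task 6 asks for bar plots that compare AQI across dates or time periods. A minimal sketch of one way to do this is to average AQI per calendar year and draw one bar per year. The frame `sample` and its values are hypothetical; only the `Date` and `AQI` column names come from this notebook, and grouping by year is an assumption (months or seasons would work the same way).

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical readings spread over two years (values are invented)
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2019-03-01", "2019-09-01", "2020-03-01", "2020-09-01"]),
    "AQI": [200.0, 100.0, 120.0, 80.0],
})

# Mean AQI for each calendar year
yearly_aqi = sample.groupby(sample["Date"].dt.year)["AQI"].mean()

# One bar per year; years are converted to strings so they act as categories
fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(yearly_aqi.index.astype(str), yearly_aqi.values, color="teal")
ax.set_xlabel("Year")
ax.set_ylabel("Average AQI")
ax.set_title("Average AQI by Year")
fig.tight_layout()
```

With the real data, `sample` would be replaced by `data` after the `Date` column has been converted with `pd.to_datetime`.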

In [None]:
# Box plot of AQI values within each AQI category
plt.figure(figsize=(14, 6))
sns.boxplot(x='AQI_Bucket', y='AQI', data=data, palette='Set3')
plt.xlabel('AQI Category')
plt.ylabel('AQI')
plt.title('AQI Distribution by AQI Category')
plt.show()
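
Task 7 also mentions violin plots, which show the shape of each distribution rather than only its quartiles. The sketch below uses matplotlib's `Axes.violinplot` on invented data. The category labels and values in `sample` are hypothetical; the `AQI_Bucket` and `AQI` column names come from this notebook.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical AQI readings in three categories (labels and values are invented)
sample = pd.DataFrame({
    "AQI_Bucket": ["Good"] * 4 + ["Moderate"] * 4 + ["Poor"] * 4,
    "AQI": [30, 40, 45, 50, 110, 130, 150, 190, 210, 240, 260, 290],
})

# Collect the AQI values of each category (groupby sorts the keys alphabetically)
groups = {bucket: grp["AQI"].to_numpy() for bucket, grp in sample.groupby("AQI_Bucket")}
labels = list(groups)
positions = list(range(1, len(labels) + 1))

# One violin per category, with the median marked
fig, ax = plt.subplots(figsize=(14, 6))
ax.violinplot(list(groups.values()), positions=positions, showmedians=True)
ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.set_xlabel("AQI Category")
ax.set_ylabel("AQI")
ax.set_title("AQI Distribution by AQI Category (Violin Plot)")
fig.tight_layout()
```

Each category needs at least two distinct values for the density estimate behind the violin to be computed.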

In [None]:
# Scatter plot of PM2.5 against AQI, colored by city
plt.figure(figsize=(10, 6))
sns.scatterplot(x='PM2.5', y='AQI', data=data, hue='City', alpha=0.5, palette='viridis')
plt.xlabel('PM2.5 Levels')
plt.ylabel('AQI')
plt.title('Relationship Between PM2.5 and AQI')
plt.legend(bbox_to_anchor=(1, 1))
plt.show()
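
Task 8 mentions bubble charts as an alternative to plain scatter plots. A bubble chart can encode two extra variables through marker size and color. In this sketch, marker size follows PM10 and color follows CO; that mapping is an assumption made for illustration, and the values in `sample` are invented. The column names come from this notebook.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical readings (values are invented)
sample = pd.DataFrame({
    "PM2.5": [20.0, 45.0, 80.0, 120.0, 160.0],
    "PM10": [50.0, 80.0, 120.0, 200.0, 260.0],
    "CO": [0.4, 0.8, 1.1, 1.9, 2.5],
    "AQI": [60.0, 95.0, 150.0, 240.0, 310.0],
})

# Bubble chart: x = PM2.5, y = AQI, marker size = PM10, marker color = CO
fig, ax = plt.subplots(figsize=(10, 6))
sc = ax.scatter(
    sample["PM2.5"], sample["AQI"],
    s=sample["PM10"], c=sample["CO"],
    cmap="viridis", alpha=0.6,
)
fig.colorbar(sc, ax=ax, label="CO")
ax.set_xlabel("PM2.5 Levels")
ax.set_ylabel("AQI")
ax.set_title("PM2.5 vs. AQI (bubble size: PM10, color: CO)")
fig.tight_layout()
```

On the full dataset, the marker sizes may need scaling (for example dividing PM10 by a constant) so that large values do not cover the plot.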