# **Air Pollution in China**

## **Air Pollution**

**Air pollution** refers to the presence of harmful substances in the air, which can be detrimental to human health and the environment. These pollutants include **particulate matter (PM), nitrogen dioxide (NO₂), sulfur dioxide (SO₂), carbon monoxide (CO), ozone (O₃)**, and **lead (Pb)**, commonly originating from sources like *vehicle emissions, industrial activities, and household combustion devices*.

Air pollutants are categorized as:

* **Primary pollutants**: Emitted directly into the atmosphere, such as carbon monoxide from vehicles.

* **Secondary pollutants**: Formed in the atmosphere through reactions between primary pollutants, like ground-level ozone resulting from the reaction of nitrogen oxides and volatile organic compounds under sunlight.

The health impacts of air pollution are significant. Exposure can lead to respiratory diseases (like asthma), cardiovascular problems, and even premature death. According to the **World Health Organization**, air pollution is responsible for approximately *7 million* premature deaths annually worldwide.

Environmental consequences include **acid rain**, which harms ecosystems; **smog formation**, reducing visibility; and contribution to climate change through greenhouse gas emissions. Additionally, air pollution can damage buildings and monuments, leading to economic costs.

## **China's Case**

Air pollution in China is a critical public health and environmental issue, contributing to approximately **2 million deaths annually**. Major sources include **coal-fired power plants, industrial emissions, vehicular exhaust, and household use of solid fuels**. These pollutants lead to severe health problems such as respiratory infections, heart disease, stroke, and lung cancer.

Rapid industrialization and urbanization over the past decades have intensified air quality challenges. Despite significant investments in renewable energy, coal remains a dominant energy source, exacerbating pollution levels. Additionally, natural phenomena like dust storms further deteriorate air quality.

Recognizing the severity, China has implemented measures to combat air pollution, including setting stricter emission standards, promoting electric vehicles, and enhancing air quality monitoring systems.

However, balancing economic growth with environmental sustainability remains a complex challenge. This project therefore analyses the **Air Quality Index (AQI)** in five major Chinese cities: **Shenzhen, Guangzhou, Chengdu, Shanghai and Beijing**.

In [None]:
import ee
import geemap

# Initialize Earth Engine
ee.Authenticate()
ee.Initialize(project='-----Insert ProjectID-----')

# Define the China boundary
china_boundary = ee.FeatureCollection("USDOS/LSIB_SIMPLE/2017").filter(ee.Filter.eq('country_na', 'China'))

# Create a map centered on China
Map = geemap.Map(center=[35, 105], zoom=4, height=500)
Map.add_basemap('Esri.WorldStreetMap')

# Add the China boundary to the map
Map.addLayer(china_boundary, {}, 'China Boundary')


# Define city coordinates and names
cities = {
    'Shenzhen': [22.5431, 114.0579],
    'Beijing': [39.9042, 116.4074],
    'Shanghai': [31.2304, 121.4737],
    'Chengdu': [30.6667, 104.0667],
    'Guangzhou': [23.1291, 113.2644]
}

# Add cities as points to the map
for city, coords in cities.items():
    point = ee.Geometry.Point(coords[1], coords[0]) # Note: Longitude, Latitude order
    Map.addLayer(point, {'color': 'red'}, city)

# Display the map
Map


# **Analysis**

## Mount To Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import pandas as pd

# Path to the dataset on the mounted Google Drive
file_path = '/content/drive/My Drive/Data Analytics/Data/air_pollution_china.csv'
df = pd.read_csv(file_path)

## Viewing Sample Data from the Generated Data Frame

In [None]:
df.head()

## Preliminary Statistical Summary

In [None]:
df.describe()

# **Breakdown in Air Pollutant Trends**

## Particulate Matter

**Particulate Matter (PM)** refers to a mixture of tiny solid particles and liquid droplets found in the air. These particles can vary in size, composition, and origin, and are classified based on their diameter:

1. **PM10**: Particles with diameters of 10 micrometers or less (e.g., dust, pollen).

2. **PM2.5**: Fine particles with diameters of 2.5 micrometers or less — about 30 times smaller than the width of a human hair. These are especially harmful as they can penetrate deep into the lungs and even enter the bloodstream.

**Sources of PM include**:

* **Natural**: Dust storms, sea spray, wildfires.

* **Human-made**: Vehicle exhaust, industrial emissions, construction activities, and burning of fossil fuels.

In [None]:
# A horizontal bar chart for the Average PM2.5 (µg/m³) per City.

import matplotlib.pyplot as plt

# Calculate the average PM2.5 per city
average_pm25_by_city = df.groupby('City')['PM2.5 (µg/m³)'].mean().sort_values()

# Create the horizontal bar chart
plt.figure(figsize=(15, 6))  # Adjust figure size as needed
plt.barh(average_pm25_by_city.index, average_pm25_by_city.values, color = "skyblue")
plt.xlabel('Average PM2.5 (µg/m³)')
plt.ylabel('')
plt.title('PM2.5 per City')
plt.tight_layout()  # Adjust layout to prevent labels from overlapping
plt.savefig('Average PM2.5 In China.png')
plt.show()


In [None]:
# A horizontal bar chart for the Average PM10 (µg/m³) per City.

import matplotlib.pyplot as plt

# Calculate the average PM10 per city
average_pm10_by_city = df.groupby('City')['PM10 (µg/m³)'].mean().sort_values()

# Create the horizontal bar chart
plt.figure(figsize=(15, 6))  # Adjust figure size as needed
plt.barh(average_pm10_by_city.index, average_pm10_by_city.values, color="darkblue")
plt.xlabel('Average PM10 (µg/m³)')
plt.ylabel('')
plt.title('PM10 per City')
plt.tight_layout()  # Adjust layout to prevent labels from overlapping
plt.savefig('Average PM10 In China.png')
plt.show()

## Nitrogen Dioxide

**Nitrogen Dioxide (NO₂)** is a reddish-brown gas with a sharp, biting odor, and it is one of the most significant air pollutants in urban environments. It belongs to a group of gases known as **nitrogen oxides (NOₓ)**, which are primarily produced from the burning of fossil fuels.

**Sources**:
* Vehicle emissions (especially diesel engines)

* Power plants

* Industrial processes

* Combustion of wood and coal

In [None]:
# Calculate the average NO2 per year
average_no2_by_year = df.groupby('Year')['NO2 (µg/m³)'].mean()

# Create the combined column and line chart
plt.figure(figsize=(15, 7))

# Column chart for NO2
plt.bar(average_no2_by_year.index, average_no2_by_year.values, color='olive', label='Average NO2')

# Line chart for NO2 trend
plt.plot(average_no2_by_year.index, average_no2_by_year.values, marker='o', color='darkred', linestyle='--', linewidth=1, label='NO2 Trend')


plt.xlabel('Year')
plt.ylabel('Average NO2 (µg/m³)')
plt.title('Nitrogen Dioxide Trends in China')
plt.legend(loc="upper left", frameon=False)
plt.grid(False)
plt.tight_layout()
plt.savefig('Nitrogen (IV) Oxide Trends In China.png')
plt.show()


## Sulphur Dioxide

**Sulphur Dioxide (SO₂)** is a colorless gas with a sharp, irritating smell. It is a significant air pollutant, mainly produced by human activities involving the burning of fossil fuels that contain sulfur.

**Sources**:
* Coal and oil combustion in power plants and industrial facilities

* Oil refineries

* Volcanic eruptions (natural source)

* Burning of high-sulfur fuels in vehicles and ships

In [None]:
# Calculate the average SO2 per year
average_so2_by_year = df.groupby('Year')['SO2 (µg/m³)'].mean()

# Create the combined column and line chart
plt.figure(figsize=(15, 7))

# Column chart for SO2
plt.bar(average_so2_by_year.index, average_so2_by_year.values, color='gold', label='Average SO2')

# Line chart for SO2 trend
plt.plot(average_so2_by_year.index, average_so2_by_year.values, marker='o', color='darkred', linestyle='--', linewidth=1, label='SO2 Trend')

plt.xlabel('Year')
plt.ylabel('Average SO2 (µg/m³)')
plt.title('Sulfur Dioxide Trends in China')
plt.legend(loc="upper left", frameon=False)
plt.grid(False)
plt.tight_layout()
plt.savefig('Sulfur (IV) Oxide Trends In China.png')
plt.show()


## Carbon Monoxide

**Carbon Monoxide (CO)** is a colorless, odorless, and tasteless gas that is toxic to humans and animals. It forms when carbon-containing fuels such as gasoline, natural gas, wood, or coal burn incompletely.

**Sources**:
* Vehicle emissions (especially in traffic-congested areas)

* Gas-powered appliances and generators

* Industrial processes

* Burning of biomass and wildfires

* Poorly ventilated stoves and heaters in homes

In [None]:
# Calculate the average CO per year
average_co_by_year = df.groupby('Year')['CO (mg/m³)'].mean()

# Create the combined column and line chart
plt.figure(figsize=(15, 7))

# Column chart for CO
plt.bar(average_co_by_year.index, average_co_by_year.values, color='lightgrey', label='Average CO')

# Line chart for CO trend
plt.plot(average_co_by_year.index, average_co_by_year.values, marker='o', color='darkred', linestyle='--', linewidth=1, label='CO Trend')

plt.xlabel('Year')
plt.ylabel('Average CO (mg/m³)')
plt.title('Carbon Monoxide Trends in China')
plt.legend(loc="upper left", frameon=False)
plt.grid(False)
plt.tight_layout()
plt.savefig('Carbon Monoxide Trends In China.png')
plt.show()


## Ozone

**Ozone (O₃)** at ground level is a harmful air pollutant, despite being beneficial in the upper atmosphere where it protects us from ultraviolet (UV) radiation.

Ground-level ozone is not emitted directly; it forms when sunlight triggers chemical reactions between nitrogen oxides (NOₓ) and volatile organic compounds (VOCs).

**Sources (Indirect):**

* Vehicle and industrial emissions (NOₓ and VOCs)

* Gasoline vapors

* Chemical solvents

* Power plants

**Health Effects:**

Ground-level ozone is a powerful respiratory irritant and can:

* Cause coughing, throat irritation, and chest pain

* Worsen asthma and other lung diseases

* Reduce lung function, especially in children and the elderly

* Increase the risk of respiratory infections

**Environmental Impact:**

* Damages crops and vegetation, reducing agricultural productivity

* Harms forests by affecting leaf structure and photosynthesis

* Contributes to the formation of photochemical smog, especially in urban areas

In [None]:
# Calculate the average O3 per year
average_o3_by_year = df.groupby('Year')['O3 (µg/m³)'].mean()

# Create the combined column and line chart
plt.figure(figsize=(15, 7))

# Column chart for O3
plt.bar(average_o3_by_year.index, average_o3_by_year.values, color='lightblue', label='Average O3')

# Line chart for O3 trend
plt.plot(average_o3_by_year.index, average_o3_by_year.values, marker='o', color='darkred', linestyle='--', linewidth=1, label='O3 Trend')

plt.xlabel('Year')
plt.ylabel('Average O3 (µg/m³)')
plt.title('Ozone Trends in China')
plt.legend(loc="upper left", frameon=False)
plt.grid(False)
plt.tight_layout()
plt.savefig('Ozone Trends In China.png')
plt.show()


# **Breakdown in Meteorological Factors**

## Temperature





In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output

# Function to plot temperature with trend line based on city and season
def plot_temperature(city, season):
    clear_output(wait=True)  # Clear previous plot

    df_filtered = df.copy()

    if city != 'All':
        df_filtered = df_filtered[df_filtered['City'] == city]
    if season != 'All':
        df_filtered = df_filtered[df_filtered['Season'] == season]

    average_temperature_by_year = df_filtered.groupby('Year')['Temperature (°C)'].mean()

    plt.figure(figsize=(18, 7))
    plt.bar(average_temperature_by_year.index, average_temperature_by_year.values, color='#197278', label='Average Temperature')
    plt.plot(average_temperature_by_year.index, average_temperature_by_year.values, marker='o', color='darkred', linestyle='--', linewidth=2, label='Temperature Trend')

    plt.xlabel('Year')
    plt.ylabel('Average Temperature (°C)')
    plt.title(f'Temperature Trends in China ({city}, {season})')
    plt.legend(loc="best", frameon=False)
    plt.grid(False)
    plt.savefig('Temperature Trends In China.png')
    plt.show()


# Create dropdown widgets
city_dropdown = widgets.Dropdown(
    options=['All'] + sorted(df['City'].unique().tolist()),
    value='All',
    description='City:'
)

season_dropdown = widgets.Dropdown(
    options=['All'] + sorted(df['Season'].unique().tolist()),
    value='All',
    description='Season:'
)

# Create an output widget
output = widgets.Output()

# Define the interactive function
def on_change(change):
    with output:
        plot_temperature(city_dropdown.value, season_dropdown.value)

# Observe the dropdown widgets for changes
city_dropdown.observe(on_change, names='value')
season_dropdown.observe(on_change, names='value')

# Display the widgets and output
display(city_dropdown, season_dropdown, output)

# Initial plot (all cities, all seasons)
with output:
    plot_temperature('All', 'All')


## Humidity

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output

# Function to plot humidity with trend line based on city and season
def plot_humidity(city, season):
    clear_output(wait=True)  # Clear previous plot

    df_filtered = df.copy()

    if city != 'All':
        df_filtered = df_filtered[df_filtered['City'] == city]
    if season != 'All':
        df_filtered = df_filtered[df_filtered['Season'] == season]

    average_humidity_by_year = df_filtered.groupby('Year')['Humidity (%)'].mean()

    plt.figure(figsize=(18, 7))
    plt.bar(average_humidity_by_year.index, average_humidity_by_year.values, color='skyblue', label='Average Humidity')
    plt.plot(average_humidity_by_year.index, average_humidity_by_year.values, marker='o', color='darkred', linestyle='--', linewidth=2, label='Humidity Trend')

    plt.xlabel('Year')
    plt.ylabel('Average Humidity (%)')
    plt.title(f'Humidity Trends in China ({city}, {season})')
    plt.legend(loc="best", frameon=False)
    plt.grid(False)
    plt.savefig('Humidity Trends In China.png')
    plt.show()


# Create dropdown widgets
city_dropdown = widgets.Dropdown(
    options=['All'] + sorted(df['City'].unique().tolist()),
    value='All',
    description='City:'
)

season_dropdown = widgets.Dropdown(
    options=['All'] + sorted(df['Season'].unique().tolist()),
    value='All',
    description='Season:'
)

# Create an output widget
output = widgets.Output()

# Define the interactive function
def on_change(change):
    with output:
        plot_humidity(city_dropdown.value, season_dropdown.value)

# Observe the dropdown widgets for changes
city_dropdown.observe(on_change, names='value')
season_dropdown.observe(on_change, names='value')

# Display the widgets and output
display(city_dropdown, season_dropdown, output)

# Initial plot (all cities, all seasons)
with output:
    plot_humidity('All', 'All')


## Wind Speed

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output

# Function to plot wind speed (m/s) with a trend line based on city and season
def plot_ws(city, season):
    clear_output(wait=True)  # Clear previous plot

    df_filtered = df.copy()

    if city != 'All':
        df_filtered = df_filtered[df_filtered['City'] == city]
    if season != 'All':
        df_filtered = df_filtered[df_filtered['Season'] == season]

    average_ws_by_year = df_filtered.groupby('Year')['Wind Speed (m/s)'].mean()

    plt.figure(figsize=(18, 7))
    plt.bar(average_ws_by_year.index, average_ws_by_year.values, color='#6a0dad', label='Average Wind Speed')
    plt.plot(average_ws_by_year.index, average_ws_by_year.values, marker='o', color='darkred', linestyle='--', linewidth=2, label='Wind Speed Trend')

    plt.xlabel('Year')
    plt.ylabel('Average Wind Speed (m/s)')
    plt.title(f'Wind Speed Trends in China ({city}, {season})')
    plt.legend(loc="best", frameon=False)
    plt.grid(False)
    plt.savefig('Wind Speed Trends In China.png')
    plt.show()


# Create dropdown widgets
city_dropdown = widgets.Dropdown(
    options=['All'] + sorted(df['City'].unique().tolist()),
    value='All',
    description='City:'
)

season_dropdown = widgets.Dropdown(
    options=['All'] + sorted(df['Season'].unique().tolist()),
    value='All',
    description='Season:'
)

# Create an output widget
output = widgets.Output()

# Define the interactive function
def on_change(change):
    with output:
        plot_ws(city_dropdown.value, season_dropdown.value)

# Observe the dropdown widgets for changes
city_dropdown.observe(on_change, names='value')
season_dropdown.observe(on_change, names='value')

# Display the widgets and output
display(city_dropdown, season_dropdown, output)

# Initial plot (all cities, all seasons)
with output:
    plot_ws('All', 'All')


## Wind Direction

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output

# Function to plot Wind Direction with a trend line based on city and season
def plot_wd(city, season):
    clear_output(wait=True)  # Clear previous plot

    df_filtered = df.copy()

    if city != 'All':
        df_filtered = df_filtered[df_filtered['City'] == city]
    if season != 'All':
        df_filtered = df_filtered[df_filtered['Season'] == season]

    average_wd_by_year = df_filtered.groupby('Year')['Wind Direction (°)'].mean()

    plt.figure(figsize=(18, 7))
    plt.bar(average_wd_by_year.index, average_wd_by_year.values, color='#a4ac4c', label='Average Wind Direction')
    plt.plot(average_wd_by_year.index, average_wd_by_year.values, marker='o', color='darkred', linestyle='--', linewidth=2, label='Wind Direction Trend')

    # Add labels inside bars
    for year, value in zip(average_wd_by_year.index, average_wd_by_year.values):
        plt.text(year, value - 5,  # position text slightly inside the bar
                 f'{value:.1f}°',  # formatted label
                 ha='center', va='top', color='white', fontsize=10)

    plt.xlabel('Year')
    plt.ylabel('Average Wind Direction (°)')
    plt.title(f'Wind Direction Trends in China ({city}, {season})')
    plt.legend(loc="best", frameon=False)
    plt.grid(False)
    plt.savefig('Wind Direction Trends In China.png')
    plt.show()


# Create dropdown widgets
city_dropdown = widgets.Dropdown(
    options=['All'] + sorted(df['City'].unique().tolist()),
    value='All',
    description='City:'
)

season_dropdown = widgets.Dropdown(
    options=['All'] + sorted(df['Season'].unique().tolist()),
    value='All',
    description='Season:'
)

# Create an output widget
output = widgets.Output()

# Define the interactive function
def on_change(change):
    with output:
        plot_wd(city_dropdown.value, season_dropdown.value)

# Observe the dropdown widgets for changes
city_dropdown.observe(on_change, names='value')
season_dropdown.observe(on_change, names='value')

# Display the widgets and output
display(city_dropdown, season_dropdown, output)

# Initial plot (all cities, all seasons)
with output:
    plot_wd('All', 'All')
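
Wind direction is an angle, so the arithmetic mean used in `plot_wd` can mislead: readings of 350° and 10° both point roughly north, yet their arithmetic mean is 180° (south). One common alternative is a circular mean, which averages the unit vectors of each direction. The sketch below is illustrative only; `circular_mean_deg`, `angular_distance_deg` and the `readings` values are hypothetical and not part of the notebook or its dataset.

```python
import math


def circular_mean_deg(angles_deg):
    """Mean direction in degrees, computed by averaging unit vectors."""
    angles = list(angles_deg)
    if not angles:
        raise ValueError("need at least one angle")
    sin_sum = sum(math.sin(math.radians(a)) for a in angles)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles)
    # atan2 returns a value in (-180, 180]; wrap it onto the compass range.
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


def angular_distance_deg(a, b):
    """Smallest absolute difference between two directions, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


readings = [350.0, 10.0]  # hypothetical readings either side of north

# Arithmetic mean points due south, which is wrong for these readings.
print(sum(readings) / len(readings))

# The circular mean lies (numerically) at north.
print(angular_distance_deg(circular_mean_deg(readings), 0.0) < 1e-6)
```

If this approach suits the data, a function like this could be passed to `.apply(...)` on the grouped `'Wind Direction (°)'` column instead of calling `.mean()`. Note that the circular mean is not meaningful when directions cancel out (for example 0° and 180°).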


## Pressure

In [None]:
# Calculate the average pressure per city
average_pressure_by_city = df.groupby('City')['Pressure (hPa)'].mean()

# Print each city with its average pressure, rounded to 2 decimal places and with units
for city, pressure in average_pressure_by_city.items():
    print(f"{city}: {pressure:.2f} hPa")


In [None]:
# Calculate the average pressure per season
average_pressure_by_season = df.groupby('Season')['Pressure (hPa)'].mean()

# Print each season with its average pressure, rounded to 2 decimal places and with units
for season, pressure in average_pressure_by_season.items():
    print(f"{season}: {pressure:.2f} hPa")


## Precipitation

In [None]:
# Function to plot precipitation with trend line based on city and season
def plot_precipitation(city, season):
    clear_output(wait=True)  # Clear previous plot

    df_filtered = df.copy()

    if city != 'All':
        df_filtered = df_filtered[df_filtered['City'] == city]
    if season != 'All':
        df_filtered = df_filtered[df_filtered['Season'] == season]

    average_precipitation_by_year = df_filtered.groupby('Year')['Precipitation (mm)'].mean()

    plt.figure(figsize=(18, 7))
    bars = plt.bar(average_precipitation_by_year.index, average_precipitation_by_year.values, color='#002650', label='Average Precipitation')
    plt.plot(average_precipitation_by_year.index, average_precipitation_by_year.values, marker='o', color='darkred', linestyle='--', linewidth=1, label='Precipitation Trend')

    plt.xlabel('Year')
    plt.ylabel('Average Precipitation (mm)')
    plt.title(f'Precipitation Trends in China ({city}, {season})')
    plt.legend(loc="best", frameon=False)
    plt.grid(False)
    plt.savefig('Precipitation Trends In China.png')
    plt.show()


# Create dropdown widgets
city_dropdown = widgets.Dropdown(
    options=['All'] + sorted(df['City'].unique().tolist()),
    value='All',
    description='City:'
)

season_dropdown = widgets.Dropdown(
    options=['All'] + sorted(df['Season'].unique().tolist()),
    value='All',
    description='Season:'
)

# Create an output widget
output = widgets.Output()

# Define the interactive function
def on_change(change):
    with output:
        plot_precipitation(city_dropdown.value, season_dropdown.value)

# Observe the dropdown widgets for changes
city_dropdown.observe(on_change, names='value')
season_dropdown.observe(on_change, names='value')

# Display the widgets and output
display(city_dropdown, season_dropdown, output)

# Initial plot (all cities, all seasons)
with output:
    plot_precipitation('All', 'All')


## Visibility

In [None]:
# Function to plot visibility with trend line based on city and season
def plot_visibility(city, season):
    clear_output(wait=True)  # Clear previous plot

    df_filtered = df.copy()

    if city != 'All':
        df_filtered = df_filtered[df_filtered['City'] == city]
    if season != 'All':
        df_filtered = df_filtered[df_filtered['Season'] == season]

    average_visibility_by_year = df_filtered.groupby('Year')['Visibility (km)'].mean()

    plt.figure(figsize=(18, 7))
    plt.bar(average_visibility_by_year.index, average_visibility_by_year.values, color='lightgreen', label='Average Visibility')
    plt.plot(average_visibility_by_year.index, average_visibility_by_year.values, marker='o', color='darkred', linestyle='--', linewidth=2, label='Visibility Trend')

    plt.xlabel('Year')
    plt.ylabel('Average Visibility (km)')
    plt.title(f'Visibility Trends in China ({city}, {season})')
    plt.legend(loc="best", frameon=False)
    plt.grid(False)
    plt.savefig('Visibility Trends In China.png')
    plt.show()


# Create dropdown widgets
city_dropdown = widgets.Dropdown(
    options=['All'] + sorted(df['City'].unique().tolist()),
    value='All',
    description='City:'
)

season_dropdown = widgets.Dropdown(
    options=['All'] + sorted(df['Season'].unique().tolist()),
    value='All',
    description='Season:'
)

# Create an output widget
output = widgets.Output()

# Define the interactive function
def on_change(change):
    with output:
        plot_visibility(city_dropdown.value, season_dropdown.value)

# Observe the dropdown widgets for changes
city_dropdown.observe(on_change, names='value')
season_dropdown.observe(on_change, names='value')

# Display the widgets and output
display(city_dropdown, season_dropdown, output)

# Initial plot (all cities, all seasons)
with output:
    plot_visibility('All', 'All')
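
The meteorological cells above repeat the same filter-then-group step and change only the column name. As a possible refactor, that step could live in a single helper. This is a sketch rather than the notebook's own code: `yearly_mean` is a hypothetical name, and `sample` is made-up data that only mirrors the `City`, `Season` and `Year` column names used above.

```python
import pandas as pd


def yearly_mean(data, column, city="All", season="All"):
    """Average `column` per year, optionally restricted to one city and/or season."""
    mask = pd.Series(True, index=data.index)
    if city != "All":
        mask &= data["City"] == city
    if season != "All":
        mask &= data["Season"] == season
    return data.loc[mask].groupby("Year")[column].mean()


# Made-up rows for illustration only; not taken from the real dataset.
sample = pd.DataFrame({
    "City": ["Beijing", "Beijing", "Shanghai", "Shanghai"],
    "Season": ["Winter", "Summer", "Winter", "Summer"],
    "Year": [2023, 2024, 2023, 2024],
    "Temperature (°C)": [0.0, 28.0, 6.0, 30.0],
})

print(yearly_mean(sample, "Temperature (°C)"))
print(yearly_mean(sample, "Temperature (°C)", city="Beijing"))
```

Each plotting function could then call the helper with its own column name, so the dropdown handling would not need to be copied for every variable.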


# **Air Quality Index in China's Big 5 Cities**

**Air Quality Index (AQI)** is a standardized system used to measure and report daily air quality. It helps people understand how clean or polluted the air is and what health effects may be a concern for them.

**What AQI Measures:**

AQI is calculated based on the levels of key air pollutants, including:

* Particulate Matter (PM2.5 and PM10)

* Nitrogen Dioxide (NO₂)

* Sulphur Dioxide (SO₂)

* Carbon Monoxide (CO)

* Ozone (O₃)
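
A common way to turn pollutant concentrations into an index (used, with differing breakpoint tables, by schemes such as the US EPA AQI) is to map each concentration to a sub-index by piecewise linear interpolation and then report the largest sub-index. The sketch below illustrates that idea only. The breakpoint table `EXAMPLE_PM25_BREAKPOINTS` and the helper names are hypothetical placeholders, not values from this dataset or from any official standard, and the `AQI` column in `df` may have been derived differently.

```python
# Hypothetical breakpoint rows: (conc_low, conc_high, index_low, index_high).
# Placeholder values for illustration; take real ones from the standard you follow.
EXAMPLE_PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
]


def sub_index(concentration, breakpoints):
    """Linearly interpolate a pollutant concentration onto the index scale."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if concentration <= c_hi:
            slope = (i_hi - i_lo) / (c_hi - c_lo)
            return round(slope * (concentration - c_lo) + i_lo)
    raise ValueError("concentration is above the last breakpoint")


def overall_index(sub_indices):
    """Report the worst (largest) sub-index as the overall index."""
    return max(sub_indices.values())


pm25_index = sub_index(6.0, EXAMPLE_PM25_BREAKPOINTS)
print(pm25_index)  # 25: halfway through the first example band

# The NO2 sub-index of 40 is another made-up value.
print(overall_index({"PM2.5": pm25_index, "NO2": 40}))  # 40
```

Taking the maximum means a single badly elevated pollutant determines the reported index, even when the others are low.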

**AQI Scale:**

The AQI scale typically ranges from **0 to 500**, with higher values indicating more severe pollution and greater health risks. Here's a general breakdown:





| AQI Range | Air Quality | Health Implications |
|---|---|---|
| 0–50 | **Good** | Air quality is satisfactory. |
| 51–100 | **Moderate** | Acceptable; some pollutants may affect sensitive individuals. |
| 101–150 | **Unhealthy for Sensitive Groups** | May cause health effects for children, elderly, and people with respiratory issues. |
| 151–200 | **Unhealthy** | Everyone may start to experience health effects. |
| 201–300 | **Very Unhealthy** | Health warnings of emergency conditions. |
| 301–500 | **Hazardous** | Serious health effects; everyone should avoid outdoor exposure. |
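
The bands above translate directly into a small lookup. The sketch below is a minimal illustration; `AQI_BANDS` and `aqi_category` are hypothetical names, and assigning fractional averages that fall between two bands (for example 200.5) to the higher band is an assumption, not part of any standard.

```python
# Upper bound of each band and its label, in ascending order (from the scale above).
AQI_BANDS = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def aqi_category(aqi):
    """Return the air quality label for an AQI value between 0 and 500."""
    if aqi < 0 or aqi > 500:
        raise ValueError("AQI must be between 0 and 500")
    for upper_bound, label in AQI_BANDS:
        if aqi <= upper_bound:
            return label


print(aqi_category(42))      # Good
print(aqi_category(280.15))  # Very Unhealthy
```

Applied to a per-city average AQI, a function like this gives the band label for each city.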



In [None]:
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import display, clear_output

# Assuming 'df' is your DataFrame and it contains 'City', 'Year', 'Season', and 'AQI' columns.

# Function to plot AQI
def plot_aqi(year, season):
    clear_output(wait=True)

    df_filtered = df.copy()
    if year != 'All':
        df_filtered = df_filtered[df_filtered['Year'] == int(year)]
    if season != 'All':
        df_filtered = df_filtered[df_filtered['Season'] == season]

    average_aqi = df_filtered.groupby('City')['AQI'].mean().sort_values()

    plt.figure(figsize=(15, 7))
    plt.barh(average_aqi.index, average_aqi.values, color='#006633')
    plt.xlabel('Average AQI')
    plt.title(f'Air Quality Index per City ({year}, {season})')

    # Add labels to the bars
    for index, value in enumerate(average_aqi):
        plt.text(value, index, f'{value:.2f}', va='center')
    plt.savefig('Air Quality Index In China Big 5 Cities.png')
    plt.show()

# Create dropdown widgets
year_dropdown = widgets.Dropdown(
    options=['All'] + sorted([str(x) for x in df['Year'].unique()]),
    value='All',
    description='Year:'
)

season_dropdown = widgets.Dropdown(
    options=['All'] + sorted(df['Season'].unique().tolist()),
    value='All',
    description='Season:'
)

# Create an output widget
output = widgets.Output()

# Define the interactive function
def on_change(change):
    with output:
        plot_aqi(year_dropdown.value, season_dropdown.value)

# Observe the dropdown widgets for changes
year_dropdown.observe(on_change, names='value')
season_dropdown.observe(on_change, names='value')

# Display the widgets and output
display(year_dropdown, season_dropdown, output)

# Initial plot
with output:
    plot_aqi('All', 'All')


# **Insights**

1. All five cities fall within the **201–300 AQI** range, which is classed as **Very Unhealthy** on the AQI scale above.
2. In 2024, Shanghai had the highest average AQI of the five cities (**280.15**) and Beijing the lowest (**227.21**).
3. Although Beijing's AQI still places it in the Very Unhealthy band, it had the cleanest air of the five cities by this measure; Shanghai, Shenzhen, Chengdu and Guangzhou all recorded higher AQI scores.