**AIR QUALITY MONITORING**

**Background**

The objective of this project is to monitor and analyze air quality trends in our environment, specifically the Southeast Calgary area. Air quality readings for the last 10 years (2014 to 2024) were sourced from the Calgary Region Airshed Zone (CRAZ), using its Calgary SE and Inglewood monitoring stations.

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [6]:
%cd /content/drive/MyDrive/engg680_2024_fall/Group_18_Project

/content/drive/MyDrive/engg680_2024_fall/Group_18_Project


In [7]:
import pandas as pd
import glob
import os

# Define the folder that holds the CRAZ CSV files (absolute path on the mounted Drive)
file_path = "/content/drive/MyDrive/engg680_2024_fall/Group_18_Project/craz air data"  # Folder from which all CSV files are read
file_names = glob.glob(os.path.join(file_path, "*.csv"))  # Get all CSV files in the folder

if not file_names:
    print("No files found in the specified directory. Check the folder and try again.")
else:
    all_data = []
    for file in file_names:
        try:
            # Read each file, using the 2nd and 3rd rows as a two-level column header.
            # This keeps the station prefix (e.g. CalIng, CalSE) attached to each parameter.
            df = pd.read_csv(file, header=[1, 2])  # Multi-level header (rows 2 and 3)

            # Extract the year from the file name (e.g., "2014 hourly.csv")
            year = os.path.basename(file).split()[0]  # Text before the first space
            df["Year"] = int(year)  # Add the year as a column

            # Flatten the multi-level column headers into single strings
            df.columns = ["_".join(filter(None, col)).strip() for col in df.columns]

            # Append the processed DataFrame to the list
            all_data.append(df)
        except Exception as e:
            print(f"Error processing file {file}: {e}")
            continue

    # Combine all files into a single DataFrame
    if all_data:
        merged_data = pd.concat(all_data, ignore_index=True)

        # Display the DataFrame in the notebook
        print("Data merged successfully. Displaying first 5 rows:")
        display(merged_data.head())  # Use display() to show the DataFrame in the notebook

        # Save to a CSV file so the full set of rows and columns can be checked outside the notebook.
        merged_data.to_csv("Merged_Air_Quality_Data.csv", index=False)
        print("Merged data saved as 'Merged_Air_Quality_Data.csv'.")
    else:
        print("No data could be processed.")

Data merged successfully. Displaying first 5 rows:


Unnamed: 0,Unnamed: 0_level_0_Unnamed: 0_level_1,Unnamed: 1_level_0_Unnamed: 1_level_1,CalIng_ NO,CalIng_ NO2,CalIng_ NOX,CalIng_ CO,CalIng_ O3,CalIng_ PM2.5,CalIng_ PM2.5.1,CalIng_ PM2.5S,...,CalVar_ NO2,CalVar_ NOX,CalVar_ CO,CalVar_ O3,CalVar_ PM2.5,CalVar_ PM2.5S,CalVar_ RH,CalVar_ WS,CalVar_ WD,Year
0,,,Ave,Ave,Ave,Ave,Ave,Ave,Ave,Ave,...,Ave,Ave,Ave,Ave,Ave,Ave,Ave,Ave,Ave,2014
1,,,,,,,,,,,...,,,,,,,,,,2014
2,,,,,,,,,,,...,,,,,,,,,,2014
3,,,BC,BC,BC,BC,BC,BC,BC,BC,...,BC,BC,BC,BC,BC,BC,BC,BC,BC,2014
4,1.0,01/01/2014 0:00,0.01426,0.03088,0.04524,0.326,0.00316,16.267,MS,MS,...,15.214,15.841,0.244,15.354,11.945,MS,96.434,8.809,301.803,2014


Merged data saved as 'Merged_Air_Quality_Data.csv'.
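
The header-flattening step in the cell above can be tried in isolation. The following is a minimal sketch on made-up values, not the project's data: it builds a small two-level header by hand, adds a `Year` column, and joins the levels into single strings. The `flatten` helper is hypothetical, and unlike the cell above it also trims whitespace from each level, so the result is `CalIng_NO` rather than `CalIng_ NO`.

```python
import pandas as pd

# Made-up two-level header mimicking the (station, parameter) layout;
# the leading space in " NO" mirrors the column names seen in the merged data.
columns = pd.MultiIndex.from_tuples([("CalIng", " NO"), ("CalSE", " NO")])
df = pd.DataFrame([[0.014, 0.020], [0.012, 0.018]], columns=columns)

# Assigning a plain string key on MultiIndex columns pads the lower level with "".
df["Year"] = 2014


def flatten(col):
    """Join the non-empty levels of one column tuple, trimming whitespace."""
    return "_".join(part.strip() for part in col if part.strip())


flat_columns = [flatten(col) for col in df.columns]
print(flat_columns)
```

Skipping empty levels is what keeps the `Year` column from ending up as `Year_`, which is the same job `filter(None, col)` does in the original cell.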


**Data Cleaning**

Step 2: Perform EDA, then rename columns and drop those not required for the analysis.

In [8]:
# Basic Analysis of the Merged Dataset

# Total number of rows and columns
total_rows, total_columns = merged_data.shape
print(f"Total Rows: {total_rows}, Total Columns: {total_columns}")


Total Rows: 95012, Total Columns: 36


In [10]:
# Column names in the dataset
print("\nColumn Names:")
print(merged_data.columns.tolist())


Column Names:
['Unnamed: 0_level_0_Unnamed: 0_level_1', 'Unnamed: 1_level_0_Unnamed: 1_level_1', 'CalIng_ NO', 'CalIng_ NO2', 'CalIng_ NOX', 'CalIng_ CO', 'CalIng_ O3', 'CalIng_ PM2.5', 'CalIng_ PM2.5.1', 'CalIng_ PM2.5S', 'CalIng_ ET', 'CalIng_ RH', 'CalIng_ WS', 'CalIng_ WD', 'CalSE_ SO2', 'CalSE_ NO', 'CalSE_ NO2', 'CalSE_ NOX', 'CalSE_ CO', 'CalSE_ O3', 'CalSE_ PM2.5', 'CalSE_ ET', 'CalSE_ RH', 'CalSE_ WS', 'CalSE_ WD', 'CalVar_ NO', 'CalVar_ NO2', 'CalVar_ NOX', 'CalVar_ CO', 'CalVar_ O3', 'CalVar_ PM2.5', 'CalVar_ PM2.5S', 'CalVar_ RH', 'CalVar_ WS', 'CalVar_ WD', 'Year']
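
Given the column names listed above and the leaked `Ave`/`BC` rows visible in the first five rows, one possible cleaning pass could look like the sketch below. It runs on a small made-up sample rather than `merged_data`. The new names `RowNumber` and `Timestamp` are hypothetical, and treating `MS` as a non-numeric marker to be replaced with NaN is an assumption about the export, not something the source confirms.

```python
import numpy as np
import pandas as pd

# Made-up rows shaped like the merged output: leaked "Ave"/"BC" rows with
# no timestamp, plus an "MS" marker in a measurement column.
merged_sample = pd.DataFrame({
    "Unnamed: 0_level_0_Unnamed: 0_level_1": [np.nan, np.nan, 1.0, 2.0],
    "Unnamed: 1_level_0_Unnamed: 1_level_1": [
        np.nan, np.nan, "01/01/2014 0:00", "01/01/2014 1:00",
    ],
    "CalSE_ NO2": ["Ave", "BC", "0.021", "MS"],
    "Year": [2014, 2014, 2014, 2014],
})

# Hypothetical names for the two unnamed leading columns.
renamed = merged_sample.rename(columns={
    "Unnamed: 0_level_0_Unnamed: 0_level_1": "RowNumber",
    "Unnamed: 1_level_0_Unnamed: 1_level_1": "Timestamp",
})

# Remove the stray space after the station prefix: "CalSE_ NO2" -> "CalSE_NO2".
renamed.columns = [name.replace("_ ", "_") for name in renamed.columns]

# Keep only rows that carry a timestamp, which drops the leaked header rows.
cleaned = renamed.dropna(subset=["Timestamp"]).reset_index(drop=True)

# Coerce readings to numbers; non-numeric markers such as "MS" become NaN.
cleaned["CalSE_NO2"] = pd.to_numeric(cleaned["CalSE_NO2"], errors="coerce")
print(cleaned)
```

Filtering on the timestamp column avoids hard-coding row positions, since the leaked rows would otherwise repeat once per merged file.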
