<a href="https://colab.research.google.com/github/Iamcuriousity/Alokita-Jha/blob/main/Assignment_7_3_Alokita_Jha.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

About the Dataset
Spatial AQ is a pan-India air quality dataset derived from satellite data, CPCB monitors and machine learning models. Because satellites cover large areas and can track the movement of pollutants over time, it offers a comprehensive view of air quality over the Indian subcontinent. Spatial AQ is India's most extensive air quality dataset, with 3.3 million points estimated daily by our model, compared with about 300 ground sensors.

The attributes of the dataset are as follows –
•  Latitude: The latitude of the location.
•  Longitude: The longitude of the location.
•  Datetime: The date for which the data is reported, ranging from 2022-01-01 to 2023-08-01.
•  PM2.5: Estimated PM2.5 concentration for the given coordinates.



This assignment is divided into the following parts:
1.   Get an API token to access data from the Climate Data Hub, i.e. https://data.blueskyhq.io/ . (5)
2.   Search for spatial air quality from fires dataset and download data for any region. (10)
3.   The data has been available from January 2021 to present.
4.   Perform initial exploratory data analysis: cleaning, checking for missing dates, etc. (5)
5.   Analyze and compare trends for the last 2 years of PM2.5 concentration across different locations. (20)
6.   Study and analyze the causes for poor air quality. (20)
7.   Compare PM2.5 across major cities of your choice (pick at least 5 cities). (20)
8.   Calculate the number of good air quality days (PM2.5 < 60 µg/m³) for different locations. (20)
9.   Visually check your findings on the dashboard https://spacetime.blueskyhq.io/ .
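
The missing-dates check in part 4 can be sketched as follows. This is a minimal example, assuming a DataFrame with a `datetime` column like the one built in the cells further down; the function name `find_missing_dates` and the sample values are illustrative only and not part of the assignment.

```python
import pandas as pd


def find_missing_dates(df, date_col="datetime"):
    """Return the calendar days between the first and last record that have no row."""
    # Normalise timestamps to midnight so each record maps to one calendar day
    days = pd.to_datetime(df[date_col]).dt.normalize()
    # Full daily calendar spanning the observed range
    expected = pd.date_range(days.min(), days.max(), freq="D")
    # Days in the calendar that never appear in the data
    return expected.difference(pd.DatetimeIndex(days.unique()))


# Illustrative data only: 2022-01-03 is deliberately left out
sample = pd.DataFrame({
    "datetime": ["2022-01-01", "2022-01-02", "2022-01-04"],
    "PM25": [110.0, 95.5, 130.2],
})
missing = find_missing_dates(sample)
print(list(missing.strftime("%Y-%m-%d")))
```

Listing the gaps first makes it easier to decide whether to interpolate, drop or simply report them before comparing locations.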



The five locations are:
1) **Anand Vihar, Delhi**
2) **Alipur, Delhi**
3) **Adarsh Nagar, Jaipur**
4) **Andheri, Mumbai**
5) **Ludhiana**

The first 5 steps are done below. **I have used Plotly to make the graphs interactive.** The analysis is at the end.

# Spatial Air Quality

## Import Libraries

In [None]:
import pandas as pd
import numpy as np
import requests
import urllib.parse
from pprint import pprint
import matplotlib.pyplot as plt
plt.style.use("fivethirtyeight")
plt.rcParams["font.family"] = "monospace"
%matplotlib inline

## Get estimated PM2.5 for a given location using its coordinates

**Step 1: Import Required Libraries**

- After importing the necessary libraries, you will be able to fetch the PM2.5 of the given region.

**Step 2: Obtain an API Key**
   - To access our APIs, you must obtain an API key.
   - Sign up at `www.data.blueskyhq.io` to receive an API key with the required permissions attached.

**Step 3: Identify the Region of Interest**
   - Determine which region you want to retrieve data for.
   - You will need the latitude and longitude for the corresponding region.

**Step 4: Replace the Coordinates**
   - Once you have the latitude and longitude, replace the default coordinates in the code with the one you've obtained.

**Step 5: Set the Date Range**
   - Define the date range for the measurements you need. Specify the start and end dates accordingly.

**Step 6: Run the Code**
   - Execute the code provided, and it will fetch the PM2.5 concentration data for the specified region within the defined date range.

By following these steps, you can effectively access the PM2.5 data for your chosen region using the API key you obtained.
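
Step 5 (setting the date range) can be sketched as below. The timestamp layout and the `start_date`, `end_date` and `time_bucket` keys are copied from the `PARAMS` dictionaries used in the request cells of this notebook; the helper name `iso_range` is hypothetical and not part of any BlueSky client.

```python
from datetime import date, datetime, time, timezone


def iso_range(start: date, end: date) -> dict:
    """Build start/end query parameters covering whole days, inclusive of `end`."""
    fmt = "%Y-%m-%dT%H:%M:%S.%f"
    first = datetime.combine(start, time.min, tzinfo=timezone.utc)
    last = datetime.combine(end, time.max, tzinfo=timezone.utc)
    return {
        # strftime gives microseconds (6 digits); trim to milliseconds and add Z
        "start_date": first.strftime(fmt)[:-3] + "Z",
        "end_date": last.strftime(fmt)[:-3] + "Z",
        "time_bucket": "1d",  # daily buckets, as in the notebook's PARAMS
    }


params = iso_range(date(2021, 1, 1), date(2022, 12, 31))
print(params)
```

Building the strings from `date` objects avoids typos in hand-written timestamps when the same range is reused for several locations.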

## Get the actual PM2.5 from ground sensor of the same location

**Step 1: Import Required Libraries**

- After importing the necessary libraries, you will be able to fetch the PM2.5 concentration of the given ground monitor.

**Step 2: Obtain an API Key**
   - To access our APIs, you must obtain an API key.
   - Sign up at `www.data.blueskyhq.io` to receive an API key with the required permissions attached.

**Step 3: Identify the Ground Monitor**
   - Determine which ground monitor you want to retrieve data for.
   - You will need the **asset ID** for the corresponding ground monitor.
   - To find the asset ID, search for the monitor's name on `www.spacetime.blueskyhq.io/asset-explorer`.
   - An asset ID is a unique identifier for the specific asset, such as a lake or a power plant.

**Step 4: Replace the Asset ID**
   - Once you have the asset ID, replace the default asset ID in the code with the one you've obtained.

**Step 5: Set the Date Range**
   - Define the date range for the measurements you need. Specify the start and end dates accordingly.

**Step 6: Run the Code**
   - Execute the code provided, and it will fetch the air quality data for the specified ground monitor within the defined date range.

By following these steps, you can effectively access the air quality data for your chosen ground monitor using the API key you obtained.
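
Steps 3 to 6 can be sketched as two small helpers. The base URL and the `data`, `datetime` and `pm25` keys are taken from the request cells in this notebook; the helper names `build_url` and `to_frame` and the sample payload are made up for illustration, and the actual network call is left out so the sketch runs without an API key.

```python
import pandas as pd

# Base URL taken from the request cells in this notebook
BASE_URL = "https://gateway.blueskyhq.io/api/bam-air-quality"


def build_url(asset_id: str) -> str:
    """Endpoint for one ground monitor."""
    return f"{BASE_URL}/{asset_id}"


def to_frame(payload: dict) -> pd.DataFrame:
    """Turn a JSON payload of the assumed shape {"data": [...]} into a tidy frame."""
    df = pd.DataFrame(payload["data"])
    df["datetime"] = pd.to_datetime(df["datetime"])
    # errors="coerce" turns unparseable or missing readings into NaN instead of raising
    df["PM25"] = pd.to_numeric(df["pm25"], errors="coerce")
    return df.sort_values("datetime").reset_index(drop=True)


# Made-up payload for illustration; a real one would come from requests.get(...).json()
fake_payload = {"data": [
    {"datetime": "2021-01-02T00:00:00.000Z", "pm25": "181.4"},
    {"datetime": "2021-01-01T00:00:00.000Z", "pm25": None},
]}
df = to_frame(fake_payload)
print(build_url("0ffec645-06ff-4c52-a2cd-8bf8f72b6a59"))
print(df[["datetime", "PM25"]])
```

Keeping the parsing separate from the request means the same function can be reused for every monitor instead of repeating the conversion in each cell.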

## Plot the response as a time series graph

In [7]:
pip install plotly




In [8]:
import plotly.express as px
import pandas as pd
import requests
import urllib.parse

# NOTE: HEADERS (the request headers carrying your API key) must be defined in an earlier cell.

# Modify the asset ID for Anand Vihar, Delhi
ASSET_ID_ANAND_VIHAR = "0ffec645-06ff-4c52-a2cd-8bf8f72b6a59"
API_ANAND_VIHAR = f"https://gateway.blueskyhq.io/api/bam-air-quality/{ASSET_ID_ANAND_VIHAR}"

# Modify other parameters if needed...
# For example, you might want to adjust the title
TITLE_ANAND_VIHAR = "Anand Vihar, Delhi"

# Modify start and end dates for a 2-year period
START_DATE = "2021-01-01T00:00:00.000Z"
END_DATE = "2022-12-31T23:59:59.999Z"

PARAMS = {
    "start_date": START_DATE,
    "end_date": END_DATE,
    "time_bucket": "1d"
}

# api request for Anand Vihar
response_anand_vihar = requests.get(API_ANAND_VIHAR, headers=HEADERS, params=urllib.parse.urlencode(PARAMS))
if response_anand_vihar.status_code == 200:
    # convert response json to dataframe
    df_anand_vihar = pd.DataFrame(response_anand_vihar.json()["data"])
    # datetime string to datetime object
    df_anand_vihar["datetime"] = pd.to_datetime(df_anand_vihar["datetime"])
    df_anand_vihar['PM25'] = df_anand_vihar['pm25'].astype(float)

    # Plotting with Plotly
    fig = px.line(df_anand_vihar, x='datetime', y='PM25', title=TITLE_ANAND_VIHAR,
                  labels={'datetime': 'Date', 'PM25': 'PM2.5 (µg/m³)'})

    # Show the plot
    fig.show()

else:
    print(response_anand_vihar.text)



- **[Tutorial on how to calculate Air Quality Index (AQI) from concentration data](https://www.kaggle.com/code/rohanrao/calculating-aqi-air-quality-index-tutorial)**

In [11]:
import plotly.express as px
import pandas as pd
import requests
import urllib.parse

# NOTE: HEADERS (the request headers carrying your API key) must be defined in an earlier cell.

# Modify the asset ID for Alipur, Delhi
ASSET_ID_ALIPUR_DELHI = "b3871b19-3b54-4a3d-b509-85dadeba1c6a"
API_ALIPUR_DELHI = f"https://gateway.blueskyhq.io/api/bam-air-quality/{ASSET_ID_ALIPUR_DELHI}"

# Modify other parameters if needed...
# For example, you might want to adjust the title
TITLE_ALIPUR_DELHI = "Alipur, Delhi"

# Modify start and end dates for the specified period (January 2021 to January 2023)
START_DATE = "2021-01-01T00:00:00.000Z"
END_DATE = "2023-01-31T23:59:59.999Z"

PARAMS = {
    "start_date": START_DATE,
    "end_date": END_DATE,
    "time_bucket": "1d"
}

# api request for Alipur, Delhi
response_alipur_delhi = requests.get(API_ALIPUR_DELHI, headers=HEADERS, params=urllib.parse.urlencode(PARAMS))
if response_alipur_delhi.status_code == 200:
    # convert response json to dataframe
    df_alipur_delhi = pd.DataFrame(response_alipur_delhi.json()["data"])
    # datetime string to datetime object
    df_alipur_delhi["datetime"] = pd.to_datetime(df_alipur_delhi["datetime"])
    df_alipur_delhi['PM25'] = df_alipur_delhi['pm25'].astype(float)

    # Plotting with Plotly
    fig = px.line(df_alipur_delhi, x='datetime', y='PM25', title=TITLE_ALIPUR_DELHI,
                  labels={'datetime': 'Date', 'PM25': 'PM2.5 (µg/m³)'})

    # Show the plot
    fig.show()

else:
    print(response_alipur_delhi.text)



In [23]:
import plotly.express as px
import pandas as pd
import requests
import urllib.parse

# NOTE: HEADERS and PARAMS are reused from earlier cells.

# Modify the asset ID for Adarsh Nagar, Jaipur
ASSET_ID_ADARSH_NAGAR_JAIPUR = "95613137-3967-43da-a7a0-0f14d9bf8ed6"
API_ADARSH_NAGAR_JAIPUR = f"https://gateway.blueskyhq.io/api/bam-air-quality/{ASSET_ID_ADARSH_NAGAR_JAIPUR}"

# Modify other parameters if needed...
# For example, you might want to adjust the title
TITLE_ADARSH_NAGAR_JAIPUR = "Adarsh Nagar, Jaipur"

# api request for Adarsh Nagar, Jaipur
response_adarsh_nagar_jaipur = requests.get(API_ADARSH_NAGAR_JAIPUR, headers=HEADERS, params=urllib.parse.urlencode(PARAMS))
if response_adarsh_nagar_jaipur.status_code == 200:
    # convert response json to dataframe
    df_adarsh_nagar_jaipur = pd.DataFrame(response_adarsh_nagar_jaipur.json()["data"])
    # datetime string to datetime object
    df_adarsh_nagar_jaipur["datetime"] = pd.to_datetime(df_adarsh_nagar_jaipur["datetime"])
    df_adarsh_nagar_jaipur['PM25'] = df_adarsh_nagar_jaipur['pm25'].astype(float)

    # Plotting with Plotly
    fig = px.line(df_adarsh_nagar_jaipur, x='datetime', y='PM25', title=TITLE_ADARSH_NAGAR_JAIPUR,
                  labels={'datetime': 'Date', 'PM25': 'PM2.5 (µg/m³)'})

    # Show the plot
    fig.show()

else:
    print(response_adarsh_nagar_jaipur.text)




In [24]:
import plotly.express as px
import pandas as pd
import requests
import urllib.parse

# NOTE: HEADERS and PARAMS are reused from earlier cells.

# Modify the asset ID for Andheri, Mumbai
ASSET_ID_ANDHERI_MUMBAI = "86ef652e-5b8e-4576-83cc-019b219ce840"
API_ANDHERI_MUMBAI = f"https://gateway.blueskyhq.io/api/bam-air-quality/{ASSET_ID_ANDHERI_MUMBAI}"

# Modify other parameters if needed...
# For example, you might want to adjust the title
TITLE_ANDHERI_MUMBAI = "Andheri, Mumbai"

# api request for Andheri, Mumbai
response_andheri_mumbai = requests.get(API_ANDHERI_MUMBAI, headers=HEADERS, params=urllib.parse.urlencode(PARAMS))
if response_andheri_mumbai.status_code == 200:
    # convert response json to dataframe
    df_andheri_mumbai = pd.DataFrame(response_andheri_mumbai.json()["data"])
    # datetime string to datetime object
    df_andheri_mumbai["datetime"] = pd.to_datetime(df_andheri_mumbai["datetime"])
    df_andheri_mumbai['PM25'] = df_andheri_mumbai['pm25'].astype(float)

    # Plotting with Plotly
    fig = px.line(df_andheri_mumbai, x='datetime', y='PM25', title=TITLE_ANDHERI_MUMBAI,
                  labels={'datetime': 'Date', 'PM25': 'PM2.5 (µg/m³)'})

    # Show the plot
    fig.show()

else:
    print(response_andheri_mumbai.text)


In [25]:
import plotly.express as px
import pandas as pd
import requests
import urllib.parse

# Modify the asset ID for Punjab Agricultural University, Ludhiana
ASSET_ID_PUNJAB_AGRICULTURAL_UNIVERSITY_LUDHIANA = "c1efbf9f-bc19-4fee-b6b7-56c44627a879"
API_PUNJAB_AGRICULTURAL_UNIVERSITY_LUDHIANA = f"https://gateway.blueskyhq.io/api/bam-air-quality/{ASSET_ID_PUNJAB_AGRICULTURAL_UNIVERSITY_LUDHIANA}"

# Modify other parameters if needed...
# For example, you might want to adjust the title
TITLE_PUNJAB_AGRICULTURAL_UNIVERSITY_LUDHIANA = "Punjab Agricultural University, Ludhiana"

# api request for Punjab Agricultural University, Ludhiana
response_punjab_agricultural_university_ludhiana = requests.get(API_PUNJAB_AGRICULTURAL_UNIVERSITY_LUDHIANA, headers=HEADERS, params=urllib.parse.urlencode(PARAMS))
if response_punjab_agricultural_university_ludhiana.status_code == 200:
    # convert response json to dataframe
    df_punjab_agricultural_university_ludhiana = pd.DataFrame(response_punjab_agricultural_university_ludhiana.json()["data"])
    # datetime string to datetime object
    df_punjab_agricultural_university_ludhiana["datetime"] = pd.to_datetime(df_punjab_agricultural_university_ludhiana["datetime"])
    df_punjab_agricultural_university_ludhiana['PM25'] = df_punjab_agricultural_university_ludhiana['pm25'].astype(float)

    # Plotting with Plotly
    fig = px.line(df_punjab_agricultural_university_ludhiana, x='datetime', y='PM25', title=TITLE_PUNJAB_AGRICULTURAL_UNIVERSITY_LUDHIANA,
                  labels={'datetime': 'Date', 'PM25': 'PM2.5 (µg/m³)'})

    # Show the plot
    fig.show()

else:
    print(response_punjab_agricultural_university_ludhiana.text)


In [27]:
import plotly.express as px
import pandas as pd
import requests
import urllib.parse

# Modify the asset ID for Chennai
ASSET_ID_CHENNAI = "fde4e673-691e-4936-b16a-d6172c1b75f7"
API_CHENNAI = f"https://gateway.blueskyhq.io/api/bam-air-quality/{ASSET_ID_CHENNAI}"

# Modify other parameters if needed...
# For example, you might want to adjust the title
TITLE_CHENNAI = "Chennai"

# api request for Chennai
response_chennai = requests.get(API_CHENNAI, headers=HEADERS, params=urllib.parse.urlencode(PARAMS))
if response_chennai.status_code == 200:
    # convert response json to dataframe
    df_chennai = pd.DataFrame(response_chennai.json()["data"])
    # datetime string to datetime object
    df_chennai["datetime"] = pd.to_datetime(df_chennai["datetime"])
    df_chennai['PM25'] = df_chennai['pm25'].astype(float)

    # Plotting with Plotly
    fig = px.line(df_chennai, x='datetime', y='PM25', title=TITLE_CHENNAI,
                  labels={'datetime': 'Date', 'PM25': 'PM2.5 (µg/m³)'})

    # Show the plot
    fig.show()

else:
    print(response_chennai.text)


# **Number of Good Air Quality Days**

In [31]:
import plotly.express as px
import pandas as pd
import requests
import urllib.parse

# Define the asset IDs and API endpoints for each location
locations = [
    {"name": "Ludhiana", "asset_id": "c1efbf9f-bc19-4fee-b6b7-56c44627a879"},
    {"name": "Anand Vihar, Delhi", "asset_id": "0ffec645-06ff-4c52-a2cd-8bf8f72b6a59"},
    {"name": "Chennai", "asset_id": "fde4e673-691e-4936-b16a-d6172c1b75f7"}
]

# Modify start and end dates for the specified period
START_DATE = "2021-01-01T00:00:00.000Z"
END_DATE = "2023-01-31T23:59:59.999Z"

PARAMS = {
    "start_date": START_DATE,
    "end_date": END_DATE,
    "time_bucket": "1d"
}

# Function to calculate and print the number of good air quality days for a location
def calculate_good_air_quality_days(location):
    asset_id = location["asset_id"]
    api_endpoint = f"https://gateway.blueskyhq.io/api/bam-air-quality/{asset_id}"

    # API request for the location
    response = requests.get(api_endpoint, headers=HEADERS, params=urllib.parse.urlencode(PARAMS))

    if response.status_code == 200:
        # Convert response json to dataframe
        df = pd.DataFrame(response.json()["data"])
        # Datetime string to datetime object
        df["datetime"] = pd.to_datetime(df["datetime"])
        df['PM25'] = df['pm25'].astype(float)

        # Calculate the number of good air quality days
        good_air_quality_days = df[df['PM25'] < 60]
        num_good_days = len(good_air_quality_days)

        # Plotting with Plotly
        fig = px.line(df, x='datetime', y='PM25', title=f"{location['name']} Air Quality",
                      labels={'datetime': 'Date', 'PM25': 'PM2.5 (µg/m³)'})
        # Show the plot
        fig.show()

        print(f"Number of good air quality days in {location['name']}: {num_good_days}")

    else:
        print(f"Error fetching data for {location['name']}: {response.text}")

# Loop through each location and calculate/print the number of good air quality days
for location in locations:
    calculate_good_air_quality_days(location)


Number of good air quality days in Ludhiana: 238


Number of good air quality days in Anand Vihar, Delhi: 182


Number of good air quality days in Chennai: 557


# Number of Good Air Quality Days in Different Locations
According to the provided data, the number of good air quality days (PM2.5 < 60 µg/m³) for the selected locations is as follows:

- **Ludhiana: 238 days**
- **Anand Vihar, Delhi: 182 days**
- **Chennai: 557 days**

This data indicates that Delhi had the fewest good air quality days among the mentioned locations, while Chennai had the highest number of good air quality days. The results underscore the variation in air quality across different regions.
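
A raw count depends on how many days each monitor actually reported, so expressing good days as a share of valid days can make locations easier to compare. The sketch below assumes a DataFrame with a `PM25` column as built in the cells above; the function name `good_day_summary` and the sample readings are made up for illustration.

```python
import pandas as pd


def good_day_summary(df, threshold=60.0):
    """Count days with PM2.5 below `threshold` and express them as a share of valid days."""
    valid = df.dropna(subset=["PM25"])  # ignore days with no reading
    good = int((valid["PM25"] < threshold).sum())
    share = good / len(valid) if len(valid) else float("nan")
    return {"good_days": good, "valid_days": len(valid), "good_share": share}


# Made-up readings: two below 60, two above, one missing
sample = pd.DataFrame({
    "datetime": pd.date_range("2022-01-01", periods=5, freq="D"),
    "PM25": [45.0, 72.3, None, 58.9, 140.0],
})
summary = good_day_summary(sample)
print(summary)
```

Applied to each location's DataFrame, the `good_share` value would show whether a low count reflects poor air or simply fewer reported days.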








# **Analysis**


**Causes for Poor Air Quality in Delhi and Nearby Areas:**
Based on the graphs and analysis above, air quality appears worse in Delhi than in cities such as Chennai, and Delhi has far fewer good air quality days. PM2.5 concentration rises noticeably during winter, as the graphs show.

The causes for poor air quality in Delhi and nearby areas are multifaceted and stem from a combination of demographic, industrial, and environmental factors:

   - **Unplanned Development:** The region's development has been largely unplanned, leading to the coexistence of industrial units emitting harmful chemicals in residential and commercial areas.
   
   - **Vehicular Traffic:** Despite the presence of the Delhi metro, increased vehicular traffic remains a significant contributor to air and noise pollution.
   
   - **Solid Waste Management:** The daily generation of solid waste in Delhi, coupled with inadequate waste management, results in garbage piling up, including hazardous waste from industries.
   
   - **Fossil Fuel Dependence:** The high dependence on fossil fuels contributes to the emission of harmful gases into the atmosphere.
   
   - **Construction Activities:** Large-scale construction activities contribute to dust pollution, accounting for a significant portion of PM10 and PM2.5 load.
   
   - **Geographical Factors:** Delhi's landlocked geography, obstructed escape route for air, and north-westerly winds bringing dust contribute to increased pollution. During winters, low-level inversion exacerbates the situation.
   
   - **Stubble Burning:** The practice of stubble burning in Punjab, Haryana, and Rajasthan during winter months adds to air pollution, releasing significant quantities of greenhouse gases.
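
The winter increase noted in this analysis can be checked numerically rather than only by eye. The sketch below assumes a DataFrame with `datetime` and `PM25` columns as built in the cells above; treating November to February as winter is an assumption made for this example, and the function names and sample values are illustrative only.

```python
import pandas as pd


def monthly_mean(df):
    """Average PM2.5 per calendar month (1-12), pooling all years."""
    return df.groupby(df["datetime"].dt.month)["PM25"].mean()


def winter_vs_rest(df, winter_months=(11, 12, 1, 2)):
    """Mean PM2.5 in the assumed winter months versus all other months."""
    is_winter = df["datetime"].dt.month.isin(winter_months)
    return df.loc[is_winter, "PM25"].mean(), df.loc[~is_winter, "PM25"].mean()


# Made-up values for illustration only
sample = pd.DataFrame({
    "datetime": pd.to_datetime(["2022-01-10", "2022-01-20", "2022-07-10", "2022-12-05"]),
    "PM25": [200.0, 180.0, 40.0, 220.0],
})
winter, rest = winter_vs_rest(sample)
print(monthly_mean(sample))
print(winter, rest)
```

Running both helpers on each location's DataFrame would give a per-city winter and non-winter average to set alongside the plots.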
