## Omdena: Monitoring & Predicting Air Quality In Mumbai
## Task 1: Data Collection

### Contents of Notebook

Section 1:
- Obtained the coordinates for the 23 air quality stations from the WAQI API.
- Link: https://aqicn.org/api/

Section 2:
- I used the Open-Meteo API to collect weather data for the 23 air quality stations using the coordinates gathered in Section 1.
- Link: https://open-meteo.com/en/docs/historical-weather-api

Section 3:
- I explored whether the weather coordinates returned by the Open-Meteo API are granular enough and close to their corresponding air quality measuring stations.

**Visualizations of the Results are here on Tableau Public:**

Note: No account is needed to view the map visualization.

https://public.tableau.com/app/profile/gerardo.angulo8689/viz/OmdenaAir_Quality_Stations_Relative_To_Weather_Measuring_Coordinates/Sheet1?publish=yes

## Section 1
WAQI API

Objective: In this section I obtained the latitude and longitude coordinates for each air quality (AQ) measuring station. At the end of the section I created a DataFrame with the location name, station ID, and coordinates of each station.

I used Eyal Hurvitz's code to access the API and collect the initial API dictionary.

Saving credentials outside the Jupyter notebook:

https://towardsdatascience.com/store-api-credentials-easily-and-securely-in-jupyter-notebooks-50411e98e81c
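
Outside Colab, the same idea can be sketched by reading a plain JSON file from disk. The file name `credentials.json` and the `api_key` field match what the cells below use; the `load_api_token` helper and its error handling are illustrative assumptions, not part of the original notebook.

```python
import json
from pathlib import Path


def load_api_token(path="credentials.json", key="api_key"):
    """Read an API token from a JSON file kept outside the notebook.

    Hypothetical helper: the notebook itself uploads the file through
    google.colab.files instead of reading it from disk.
    """
    credentials = json.loads(Path(path).read_text(encoding="utf-8"))
    if key not in credentials:
        raise KeyError(f"'{key}' not found in {path}")
    return credentials[key]
```

Calling `load_api_token()` would then stand in for the upload cells when the notebook runs locally, provided the credentials file is excluded from version control.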

In [2]:
from google.colab import files 
import pandas as pd
import requests
import json
import io
import numpy as np

In [3]:
uploaded = files.upload()

Saving credentials.json to credentials.json


In [4]:
file = io.BytesIO(uploaded['credentials.json'])
credentials = json.load(file)

In [5]:
api_token = credentials['api_key']

In [6]:
#Provide an area using two lat-long positions (opposite corners of a bounding box).
#I used Google Maps to get the lat-long values. Can be refined.
lat1 = '19.265739'
lng1 = '72.782299'
lat2 = '18.960356'
lng2 = '73.221712'
URL = "https://api.waqi.info/v2/map/bounds?latlng="+lat1+","+lng1+","+lat2+","+lng2+"&networks=all&token=" + api_token
r = requests.get(url = URL)
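
The bounding-box request above can also be assembled with `urllib.parse.urlencode`, which keeps the query parameters readable. This is a minimal sketch: the endpoint and the `latlng`, `networks`, and `token` parameters are copied from the cell above, while `build_bounds_url` is a hypothetical helper name and `demo-token` is a placeholder, not a real key.

```python
from urllib.parse import urlencode

WAQI_BOUNDS_ENDPOINT = "https://api.waqi.info/v2/map/bounds"


def build_bounds_url(lat1, lng1, lat2, lng2, token):
    """Build the map/bounds URL for a box given by two opposite corners."""
    latlng = ",".join(str(value) for value in (lat1, lng1, lat2, lng2))
    # safe="," keeps the commas in latlng unescaped, as in the original URL.
    query = urlencode({"latlng": latlng, "networks": "all", "token": token}, safe=",")
    return f"{WAQI_BOUNDS_ENDPOINT}?{query}"


url = build_bounds_url("19.265739", "72.782299", "18.960356", "73.221712", "demo-token")
print(url)
```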


In [7]:
data = r.json()

In [8]:
#get list of station IDs in Mumbai
station_ids = []
for station in data['data']:
  print(station['uid'], station['station']['name'])
  station_ids.append(station['uid'])

12456 Chhatrapati Shivaji Intl. Airport (T2), Mumbai, India
13713 Chakala-Andheri East, Mumbai, India
13711 Kandivali East, Mumbai, India
13714 Borivali East MPCB, Mumbai, India
13709 Mazgaon, Mumbai, India
13712 Deonar, Mumbai, India
12454 Kurla, Mumbai, India
12455 Vile Parle West, Mumbai, India
7020 Mumbai US Consulate, India (मुंबई अमेरिकी वाणिज्य दूतावास)
13708 Mulund West, Mumbai, India
12462 Khadakpada, Kalyan, India
11898 Nerul, Navi Mumbai, India
13702 Sector-19A Nerul, Navi Mumbai, India
11921 Worli, Mumbai, India
12464 Sion, Mumbai, India
12459 Powai, Mumbai, India
13710 Khindipada-Bhandup West, Mumbai, India
12460 Borivali East, Mumbai, India
13803 Malad West, Mumbai, India
12461 Mahape, Navi Mumbai, India
13715 Bandra Kurla Complex, Mumbai, India
13706 Siddharth Nagar-Worli, Mumbai, India
9143 Pimpleshwar Mandir, Dombivali, Thane, India


In [9]:
#Real time AQI data for all stations in Mumbai
aqi_data = []
for s in station_ids:
  loc_code = s
  URL = "https://api.waqi.info/feed/@{loc_code}/".format(loc_code=loc_code)
  PARAMS = {'token':api_token}
  r = requests.get(url = URL, params = PARAMS)
  data = r.json()
  aqi_data.append(data['data'])

In [10]:
#My Code starts here
#collected coordinates for all 23 air quality stations in Mumbai
name = []
station = []
coordinates = []
#aqs = air quality station
aqs_lat = []
aqs_lon = []

for n in range(0, len(aqi_data)):
  temp_name = aqi_data[n]["city"]["name"]
  temp_coordinates = aqi_data[n]["city"]["geo"]
  temp_stationID = aqi_data[n]["idx"]
  temp_lat = aqi_data[n]["city"]["geo"][0]
  temp_lon = aqi_data[n]["city"]["geo"][1]

  name.append(temp_name)
  coordinates.append(temp_coordinates)
  station.append(temp_stationID)
  aqs_lat.append(temp_lat)
  aqs_lon.append(temp_lon)
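
The extraction loop above assumes every feed request succeeded. As a hedged sketch, the per-station extraction could be wrapped in a helper that checks the response first. The `city`, `geo`, and `idx` keys are the ones the notebook already reads; the top-level `status` field with the value `"ok"` is an assumption about the WAQI response format, and `extract_station` is a hypothetical name.

```python
def extract_station(payload):
    """Return (name, station_id, lat, lon) from one full WAQI feed response.

    Assumes the response looks like {"status": "ok", "data": {...}}.
    """
    if payload.get("status") != "ok":
        raise ValueError(f"WAQI request failed: {payload.get('data')}")
    data = payload["data"]
    lat, lon = data["city"]["geo"]
    return data["city"]["name"], data["idx"], float(lat), float(lon)


# Hand-written payload shaped like the fields used above (values from the Kurla station).
sample = {
    "status": "ok",
    "data": {"idx": 12454, "city": {"name": "Kurla, Mumbai, India", "geo": [19.0863, 72.8888]}},
}
print(extract_station(sample))
```

Note that the helper takes the full JSON response, whereas `aqi_data` in the notebook stores only the inner `data` dictionary.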

In [11]:
#confirming intended data was collected by displaying all collected coordinates
coordinates

[[19.10078, 72.87462],
 [19.11074, 72.86084],
 [19.2058, 72.8682],
 [19.2243333, 72.8658113],
 [18.96702, 72.84214],
 [19.04946, 72.923],
 [19.0863, 72.8888],
 [19.10861, 72.83622],
 [19.072830200195, 72.882606506348],
 [19.175, 72.9419],
 [19.25292, 73.142019],
 [19.008751, 73.01662],
 [19.044, 73.0325],
 [18.9936162, 72.8128113],
 [19.047, 72.8746],
 [19.1375, 72.915056],
 [19.1653323, 72.922099],
 [19.23241, 72.86895],
 [19.19709, 72.82204],
 [19.1135051, 73.008978],
 [19.053536, 72.84643],
 [19.000083, 72.813993],
 [19.192056, 72.9585188]]

In [28]:
aq_station_data = {
    "location": name,
    "station_id": station,
    "latitude": aqs_lat,
    "longitude": aqs_lon
        }

The location name, station ID, and coordinates for all 23 AQICN stations in Mumbai

In [29]:
df = pd.DataFrame(aq_station_data)
df.head()

Unnamed: 0,location,station_id,latitude,longitude
0,"Chhatrapati Shivaji Intl. Airport (T2), Mumbai...",12456,19.10078,72.87462
1,"Chakala-Andheri East, Mumbai, India",13713,19.11074,72.86084
2,"Kandivali East, Mumbai, India",13711,19.2058,72.8682
3,"Borivali East MPCB, Mumbai, India",13714,19.224333,72.865811
4,"Mazgaon, Mumbai, India",13709,18.96702,72.84214


In [43]:
df.tail()

Unnamed: 0,location,station_id,latitude,longitude,pairing,type
18,"Malad West, Mumbai, India",13803,19.19709,72.82204,4,air_quality
19,"Mahape, Navi Mumbai, India",12461,19.113505,73.008978,3,air_quality
20,"Bandra Kurla Complex, Mumbai, India",13715,19.053536,72.84643,4,air_quality
21,"Siddharth Nagar-Worli, Mumbai, India",13706,19.000083,72.813993,1,air_quality
22,"Pimpleshwar Mandir, Dombivali, Thane, India",9143,19.192056,72.958519,8,air_quality


In [30]:
df.to_csv("aq_stations.csv")

## Section 2

Open-Meteo API

https://open-meteo.com/en/docs/historical-weather-api

Note: No API key required

Objective: 
Use the coordinates of the 23 WAQI measuring stations to collect the corresponding weather coordinates and check whether the weather data is reliable and granular.

In [14]:
weather_lat = []
weather_lon = []

for x in range(0, len(coordinates)):
  try:
    # the coordinates from the aq measuring station
    lat = str(coordinates[x][0])
    lon = str(coordinates[x][1])

    #create url for each station
    #the query string is built from separate pieces so no whitespace leaks into the URL
    api_url = ("https://archive-api.open-meteo.com/v1/archive?latitude=" + lat + "&longitude=" + lon
               + "&start_date=2021-01-01&end_date=2023-03-06"
               + "&hourly=temperature_2m,rain,windspeed_10m,winddirection_10m&models=best_match&timezone=auto")

    #accessing the api and pulling information
    headers = {"accept": "application/json"}

    response = requests.get(api_url, headers = headers)

    #confirm api pull request is successful
    print(f"Number {x}: status_code: {response.status_code}")

    #"open-meteo.com" closest weather coordinate to aq measuring station coordinate
    temp_lat = response.json()["latitude"]
    temp_lon = response.json()["longitude"]

    #add to list
    weather_lat.append(temp_lat)
    weather_lon.append(temp_lon)

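  except (requests.RequestException, ValueError, KeyError) as err:
    #keep the lists aligned with the station rows even when a request fails
    weather_lat.append(None)
    weather_lon.append(None)
    print(f'The Station with coordinates, {lat}, {lon} did not have data: {err}')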

Number 0: status_code: 200
Number 1: status_code: 200
Number 2: status_code: 200
Number 3: status_code: 200
Number 4: status_code: 200
Number 5: status_code: 200
Number 6: status_code: 200
Number 7: status_code: 200
Number 8: status_code: 200
Number 9: status_code: 200
Number 10: status_code: 200
Number 11: status_code: 200
Number 12: status_code: 200
Number 13: status_code: 200
Number 14: status_code: 200
Number 15: status_code: 200
Number 16: status_code: 200
Number 17: status_code: 200
Number 18: status_code: 200
Number 19: status_code: 200
Number 20: status_code: 200
Number 21: status_code: 200
Number 22: status_code: 200


In [None]:
#confirming the code above worked properly
weather_lat

[19.0,
 19.099998,
 19.200005,
 19.099998,
 19.200005,
 19.200005,
 19.099998,
 19.099998,
 19.200005,
 19.200005,
 19.300003,
 18.900002,
 19.200005,
 19.200005,
 19.200005,
 19.0,
 19.099998,
 19.099998,
 19.099998,
 19.099998,
 19.200005,
 19.099998]

In [31]:
#added closest weather coordinates to the corresponding aq monitoring station in df
df["weather_lat"] = weather_lat
df["weather_lon"] = weather_lon
df.head()

Unnamed: 0,location,station_id,latitude,longitude,weather_lat,weather_lon
0,"Chhatrapati Shivaji Intl. Airport (T2), Mumbai...",12456,19.10078,72.87462,19.099998,72.90001
1,"Chakala-Andheri East, Mumbai, India",13713,19.11074,72.86084,19.099998,72.90001
2,"Kandivali East, Mumbai, India",13711,19.2058,72.8682,19.200005,72.90001
3,"Borivali East MPCB, Mumbai, India",13714,19.224333,72.865811,19.200005,72.90001
4,"Mazgaon, Mumbai, India",13709,18.96702,72.84214,19.099998,72.90001
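
To judge how close each weather coordinate is to its station, the great-circle distance between the two points can be computed. The sketch below uses the haversine formula with a mean Earth radius of 6371 km; `haversine_km` is a hypothetical helper that is not part of the original notebook, and the sample values are row 0 of the table above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Row 0: airport station coordinates vs. the weather coordinate returned for it.
distance = haversine_km(19.10078, 72.87462, 19.099998, 72.90001)
print(f"{distance:.2f} km")
```

Applied to every row, a distance column like this would put a number on the question Section 3 examines visually on the map.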


## Section 3

### Checking For Duplicate Weather Coordinates

In the code below I checked how many weather coordinates are unique by using the pandas duplicated function. The function returns True for a row if its weather coordinate pair ("weather_lat" and "weather_lon") has already appeared in an earlier row, and False otherwise.

Pandas duplicated function documentation:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html
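
As a small illustration of how `duplicated` behaves, the sketch below applies it to a toy DataFrame built from the weather coordinates of the first five stations shown in Section 2. The `demo` frame and the `n_unique` variable are illustrative names only.

```python
import pandas as pd

# Weather coordinates of the first five stations, copied from the Section 2 table.
demo = pd.DataFrame(
    {
        "weather_lat": [19.099998, 19.099998, 19.200005, 19.200005, 19.099998],
        "weather_lon": [72.90001, 72.90001, 72.90001, 72.90001, 72.90001],
    }
)

# With the default keep="first", only repeats after the first occurrence are flagged.
is_duplicate = demo[["weather_lat", "weather_lon"]].duplicated()

# Counting the False values gives the number of distinct coordinate pairs.
n_unique = int((~is_duplicate).sum())
print(is_duplicate.tolist(), n_unique)
```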


### Result
There are 8 unique weather coordinates, each marked by the value False at its first occurrence.

In [32]:
df["is_duplicate"] = df[["weather_lat", "weather_lon"]].duplicated()
df.head()

Unnamed: 0,location,station_id,latitude,longitude,weather_lat,weather_lon,is_duplicate
0,"Chhatrapati Shivaji Intl. Airport (T2), Mumbai...",12456,19.10078,72.87462,19.099998,72.90001,False
1,"Chakala-Andheri East, Mumbai, India",13713,19.11074,72.86084,19.099998,72.90001,True
2,"Kandivali East, Mumbai, India",13711,19.2058,72.8682,19.200005,72.90001,False
3,"Borivali East MPCB, Mumbai, India",13714,19.224333,72.865811,19.200005,72.90001,True
4,"Mazgaon, Mumbai, India",13709,18.96702,72.84214,19.099998,72.90001,True


In [33]:
unique_latitude = []
unique_longitude = []
unique_pairing = []
pairing_label = 0

for j in range(0, len(df)):
  if df["is_duplicate"][j] == False:
    print(f'row {j}: Has unique Weather coordinates')
    unique_lat = df["weather_lat"][j]
    unique_lon = df["weather_lon"][j]
    pairing_label += 1
    
    unique_pairing.append(pairing_label)
    unique_latitude.append(unique_lat)
    unique_longitude.append(unique_lon)
  
  else:
    unique_pairing.append(0)

print('----------------------------------------------------------')
print(f'{unique_latitude}')
print(f'{unique_longitude}')
print(f'{unique_pairing}')

row 0: Has unique Weather coordinates
row 2: Has unique Weather coordinates
row 5: Has unique Weather coordinates
row 7: Has unique Weather coordinates
row 10: Has unique Weather coordinates
row 11: Has unique Weather coordinates
row 14: Has unique Weather coordinates
row 22: Has unique Weather coordinates
----------------------------------------------------------
[19.099998, 19.200005, 19.099998, 19.200005, 19.300003, 19.0, 18.900002, 19.200005]
[72.90001, 72.90001, 73.0, 72.8, 73.100006, 73.0, 73.0, 73.0]
[1, 0, 2, 0, 0, 3, 0, 4, 0, 0, 5, 6, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 8]


In [34]:
df["pairing"] = unique_pairing
df.head()

Unnamed: 0,location,station_id,latitude,longitude,weather_lat,weather_lon,is_duplicate,pairing
0,"Chhatrapati Shivaji Intl. Airport (T2), Mumbai...",12456,19.10078,72.87462,19.099998,72.90001,False,1
1,"Chakala-Andheri East, Mumbai, India",13713,19.11074,72.86084,19.099998,72.90001,True,0
2,"Kandivali East, Mumbai, India",13711,19.2058,72.8682,19.200005,72.90001,False,2
3,"Borivali East MPCB, Mumbai, India",13714,19.224333,72.865811,19.200005,72.90001,True,0
4,"Mazgaon, Mumbai, India",13709,18.96702,72.84214,19.099998,72.90001,True,0


In [35]:
temp_list = df["pairing"].to_list()

for k in range(0, len(df)):  
  if df["is_duplicate"][k] == True:
    for n in range(0, len(df)):
      if df["weather_lat"][k] == df["weather_lat"][n] and df["weather_lon"][k] == df["weather_lon"][n]:
        temp_list[k] = df["pairing"][n]
        break

df["pairing"] = temp_list
df.head()

Unnamed: 0,location,station_id,latitude,longitude,weather_lat,weather_lon,is_duplicate,pairing
0,"Chhatrapati Shivaji Intl. Airport (T2), Mumbai...",12456,19.10078,72.87462,19.099998,72.90001,False,1
1,"Chakala-Andheri East, Mumbai, India",13713,19.11074,72.86084,19.099998,72.90001,True,1
2,"Kandivali East, Mumbai, India",13711,19.2058,72.8682,19.200005,72.90001,False,2
3,"Borivali East MPCB, Mumbai, India",13714,19.224333,72.865811,19.200005,72.90001,True,2
4,"Mazgaon, Mumbai, India",13709,18.96702,72.84214,19.099998,72.90001,True,1
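
The two labeling cells above (assigning a new label at each first appearance, then copying it to the duplicates) can also be expressed as a single pass with a dictionary. This is a plain-Python sketch of the same logic, not the notebook's code; `label_pairs` is a hypothetical helper.

```python
def label_pairs(coords):
    """Give each distinct (lat, lon) pair a 1-based label in order of first appearance."""
    labels_by_pair = {}
    labels = []
    for pair in coords:
        # setdefault stores the next label only the first time a pair is seen.
        labels.append(labels_by_pair.setdefault(pair, len(labels_by_pair) + 1))
    return labels


# Weather coordinates of the first five stations from the table above.
first_five = [
    (19.099998, 72.90001),
    (19.099998, 72.90001),
    (19.200005, 72.90001),
    (19.200005, 72.90001),
    (19.099998, 72.90001),
]
print(label_pairs(first_five))
```

With the DataFrame from this notebook, the input could be built as `list(zip(df["weather_lat"], df["weather_lon"]))` before the weather columns are dropped.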


In [None]:
part_1 = df.loc[(df["is_duplicate"] == False), ("weather_lat", "weather_lon", "pairing")]
df = df.drop(axis=1, columns= ["weather_lat", "weather_lon", "is_duplicate"])
part_1["type"] = "weather"
part_1 = part_1.rename(columns={"weather_lat":"latitude", "weather_lon":"longitude"})
part_1

In [40]:
part_1

Unnamed: 0,latitude,longitude,pairing,type
0,19.099998,72.90001,1,weather
2,19.200005,72.90001,2,weather
5,19.099998,73.0,3,weather
7,19.200005,72.8,4,weather
10,19.300003,73.100006,5,weather
11,19.0,73.0,6,weather
14,18.900002,73.0,7,weather
22,19.200005,73.0,8,weather


In [38]:
df["type"] = "air_quality"
df.head()

Unnamed: 0,location,station_id,latitude,longitude,pairing,type
0,"Chhatrapati Shivaji Intl. Airport (T2), Mumbai...",12456,19.10078,72.87462,1,air_quality
1,"Chakala-Andheri East, Mumbai, India",13713,19.11074,72.86084,1,air_quality
2,"Kandivali East, Mumbai, India",13711,19.2058,72.8682,2,air_quality
3,"Borivali East MPCB, Mumbai, India",13714,19.224333,72.865811,2,air_quality
4,"Mazgaon, Mumbai, India",13709,18.96702,72.84214,1,air_quality


In [41]:
final_coordiates = df.merge(part_1, how="outer")
final_coordiates

Unnamed: 0,location,station_id,latitude,longitude,pairing,type
0,"Chhatrapati Shivaji Intl. Airport (T2), Mumbai...",12456.0,19.10078,72.87462,1,air_quality
1,"Chakala-Andheri East, Mumbai, India",13713.0,19.11074,72.86084,1,air_quality
2,"Kandivali East, Mumbai, India",13711.0,19.2058,72.8682,2,air_quality
3,"Borivali East MPCB, Mumbai, India",13714.0,19.224333,72.865811,2,air_quality
4,"Mazgaon, Mumbai, India",13709.0,18.96702,72.84214,1,air_quality
5,"Deonar, Mumbai, India",13712.0,19.04946,72.923,3,air_quality
6,"Kurla, Mumbai, India",12454.0,19.0863,72.8888,1,air_quality
7,"Vile Parle West, Mumbai, India",12455.0,19.10861,72.83622,4,air_quality
8,"Mumbai US Consulate, India (मुंबई अमेरिकी वाणि...",7020.0,19.07283,72.882607,1,air_quality
9,"Mulund West, Mumbai, India",13708.0,19.175,72.9419,2,air_quality


In [42]:
final_coordiates.to_csv("aq_stations&_weather_stations.csv")

# Results:

There are 8 unique weather measuring coordinates in the open-meteo.com API for the 23 air quality stations.

Results are visualized on Tableau Public here (no account is needed to view):

https://public.tableau.com/app/profile/gerardo.angulo8689/viz/OmdenaAir_Quality_Stations_Relative_To_Weather_Measuring_Coordinates/Sheet1?publish=yes

