## Omdena: Monitoring & Predicting Air Quality In Mumbai
## Task 1: Data Collection

### Contents of Notebook

Section 1:
- Obtained the coordinates for the 23 air quality stations from WAQI API. 
- Link: https://aqicn.org/api/

Section 2:
- I used the Open-Meteo API to collect weather data for 23 air quality stations using the coordiates gathered from section 1. 
- Link: https://open-meteo.com/en/docs/historical-weather-api

Section 3:
- I explored if the weather coordinates collected from open-meteo api are ganular enough and close to their corresponding air quality measuring station. 

**Visualizations of the Results are here on Tableau Public:**

Note: No account is needed view map visualization.

https://public.tableau.com/app/profile/gerardo.angulo8689/viz/OmdenaAir_Quality_Stations_Relative_To_Weather_Measuring_Coordinates/Sheet1?publish=yes

## Section 1
WAQI API

Objective: In this file I obtained the lat and lon coordinates for each AQ measuring station. At the bottom of this file I created a dateframe with the location name, station ID, and the coordinates to each station. 

I used Nirmala's code to acces the API and collect the initail API dictionary.

Saving credentials outside the Jupyter notebook:

https://towardsdatascience.com/store-api-credentials-easily-and-securely-in-jupyter-notebooks-50411e98e81c

In [46]:
from google.colab import files 
import pandas as pd
import requests
import json
import io
import numpy as np

In [47]:
uploaded = files.upload()

Saving credentials.json to credentials.json


In [48]:
file = io.BytesIO(uploaded['credentials.json'])
credentials = json.load(file)

In [49]:
api_token = credentials['api_key']

In [50]:
#Provide an area using to lat-long positions. 
#I used google maps to get the lat-long values. Can be refined.
lat1 = '19.265739'
lng1 = '72.782299'
lat2 = '18.960356'
lng2 = '73.221712'
URL = "https://api.waqi.info/v2/map/bounds?latlng="+lat1+","+lng1+","+lat2+","+lng2+"&networks=all&token=" + api_token
r = requests.get(url = URL)


In [51]:
data = r.json()

In [5]:
#get list of station IDs in Mumbai
station_ids = []
for station in data['data']:
  print(station['uid'], station['station']['name'])
  station_ids.append(station['uid'])

13702 Sector-19A Nerul, Navi Mumbai, India
12461 Mahape, Navi Mumbai, India
13715 Bandra Kurla Complex, Mumbai, India
12459 Powai, Mumbai, India
12460 Borivali East, Mumbai, India
13710 Khindipada-Bhandup West, Mumbai, India
13709 Mazgaon, Mumbai, India
11921 Worli, Mumbai, India
12455 Vile Parle West, Mumbai, India
9143 Pimpleshwar Mandir, Dombivali, Thane, India
12462 Khadakpada, Kalyan, India
12464 Sion, Mumbai, India
13711 Kandivali East, Mumbai, India
13708 Mulund West, Mumbai, India
13803 Malad West, Mumbai, India
11898 Nerul, Navi Mumbai, India
13706 Siddharth Nagar-Worli, Mumbai, India
7020 Mumbai US Consulate, India (मुंबई अमेरिकी वाणिज्य दूतावास)
12454 Kurla, Mumbai, India
12456 Chhatrapati Shivaji Intl. Airport (T2), Mumbai, India
13714 Borivali East MPCB, Mumbai, India
13712 Deonar, Mumbai, India


In [52]:
#Real time AQI data for all stations in Mumbai
aqi_data = []
for s in station_ids:
  loc_code = s
  URL = "https://api.waqi.info/feed/@{loc_code}/".format(loc_code=loc_code)
  PARAMS = {'token':api_token}
  r = requests.get(url = URL, params = PARAMS)
  data = r.json()
  aqi_data.append(data['data'])

In [53]:
#My Code starts here
#collected coordinates for all 23 air quality stations in Mumbai
name = []
station = []
coordinates = []
#aqs = air quality station
aqs_lat = []
aqs_lon = []

for n in range(0, len(aqi_data)):
  temp_name = aqi_data[n]["city"]["name"]
  temp_coordinates = aqi_data[n]["city"]["geo"]
  temp_stationID = aqi_data[n]["idx"]
  temp_lat = aqi_data[n]["city"]["geo"][0]
  temp_lon = aqi_data[n]["city"]["geo"][1]

  name.append(temp_name)
  coordinates.append(temp_coordinates)
  station.append(temp_stationID)
  aqs_lat.append(temp_lat)
  aqs_lon.append(temp_lon)

In [54]:
#confirming intended data was collected by checking the first 10 
coordinates

[[19.044, 73.0325],
 [19.1135051, 73.008978],
 [19.053536, 72.84643],
 [19.1375, 72.915056],
 [19.23241, 72.86895],
 [19.1653323, 72.922099],
 [18.96702, 72.84214],
 [18.9936162, 72.8128113],
 [19.10861, 72.83622],
 [19.192056, 72.9585188],
 [19.25292, 73.142019],
 [19.047, 72.8746],
 [19.2058, 72.8682],
 [19.175, 72.9419],
 [19.19709, 72.82204],
 [19.008751, 73.01662],
 [19.000083, 72.813993],
 [19.072830200195, 72.882606506348],
 [19.0863, 72.8888],
 [19.10078, 72.87462],
 [19.2243333, 72.8658113],
 [19.04946, 72.923]]

In [55]:
aq_station_data = {
    "location": name,
    "station_id": station,
    "latitude": aqs_lat,
    "longitude": aqs_lon
        }

The location name, station id, and coordinates for all 23 AQICN stations in Mumbai

In [56]:
df = pd.DataFrame(aq_station_data)
df.head()

Unnamed: 0,location,station_id,latitude,longitude
0,"Sector-19A Nerul, Navi Mumbai, India",13702,19.044,73.0325
1,"Mahape, Navi Mumbai, India",12461,19.113505,73.008978
2,"Bandra Kurla Complex, Mumbai, India",13715,19.053536,72.84643
3,"Powai, Mumbai, India",12459,19.1375,72.915056
4,"Borivali East, Mumbai, India",12460,19.23241,72.86895


## Section 2

Open-Meteo API

https://open-meteo.com/en/docs/historical-weather-api

Note: No API required

Objective: 
Using the coordinates for the 23 WAQI measuring stations to collect the correspond weather coordinates to see if this is reliable, ganular weather data.  

In [57]:
weather_lat = []
weather_lon = []

for x in range(0, len(coordinates)):
  try:
    # the coordinates from the aq measuring station
    lat = str(coordinates[x][0])
    lon = str(coordinates[x][1])

    #create url for each station
    api_url = "https://archive-api.open-meteo.com/v1/archive?latitude=" + lat + "&longitude=" + lon + \
            "&start_date=2021-01-01&end_date=2023-03-06& \
            hourly=temperature_2m,rain,windspeed_10m,winddirection_10m&models=best_match&timezone=auto"

    #accessing the api and pulling information
    headers = {"accept": "application/json"}

    response = requests.get(api_url, headers = headers)

    #confirm api pull request is sucessful
    print(f"Number {x}: status_code: {response.status_code}")

    #"open-meteo.com" closest weather coordinate to aq measuring station coordinate
    temp_lat = response.json()["latitude"]
    temp_lon = response.json()["longitude"]

    #add to list
    weather_lat.append(temp_lat)
    weather_lon.append(temp_lon)

  except:
    print(f'The Station with coordinates, {lat}, {lon} did not have data')

Number 0: status_code: 200
Number 1: status_code: 200
Number 2: status_code: 200
Number 3: status_code: 200
Number 4: status_code: 200
Number 5: status_code: 200
Number 6: status_code: 200
Number 7: status_code: 200
Number 8: status_code: 200
Number 9: status_code: 200
Number 10: status_code: 200
Number 11: status_code: 200
Number 12: status_code: 200
Number 13: status_code: 200
Number 14: status_code: 200
Number 15: status_code: 200
Number 16: status_code: 200
Number 17: status_code: 200
Number 18: status_code: 200
Number 19: status_code: 200
Number 20: status_code: 200
Number 21: status_code: 200


In [58]:
#confirming the code above worked properly
weather_lat

[19.0,
 19.099998,
 19.200005,
 19.099998,
 19.200005,
 19.200005,
 19.099998,
 19.099998,
 19.200005,
 19.200005,
 19.300003,
 18.900002,
 19.200005,
 19.200005,
 19.200005,
 19.0,
 19.099998,
 19.099998,
 19.099998,
 19.099998,
 19.200005,
 19.099998]

In [59]:
#added closes weather station coordinates to corresponding aq monitoring station in df
df["weather_lat"] = weather_lat
df["weather_lon"] = weather_lon
df.head()

Unnamed: 0,location,station_id,latitude,longitude,weather_lat,weather_lon
0,"Sector-19A Nerul, Navi Mumbai, India",13702,19.044,73.0325,19.0,73.0
1,"Mahape, Navi Mumbai, India",12461,19.113505,73.008978,19.099998,73.0
2,"Bandra Kurla Complex, Mumbai, India",13715,19.053536,72.84643,19.200005,72.8
3,"Powai, Mumbai, India",12459,19.1375,72.915056,19.099998,72.90001
4,"Borivali East, Mumbai, India",12460,19.23241,72.86895,19.200005,72.90001


## Section 3

### Checking For Duplicate Weather Coordinate 

In the code below I checked how many weather coordinates are unique by using the pandas duplicated function. Essentially the code populates a True if the weather coordinate pair("weather_lat and weather_lon) has appeared before in a different row. 

Pandas duplicated function documentation:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html


### Result
There are 8 unique weather coordinates represented by the value "False"

In [60]:
df[["weather_lat", "weather_lon"]].duplicated()

0     False
1     False
2     False
3     False
4     False
5      True
6      True
7      True
8      True
9     False
10    False
11    False
12     True
13     True
14     True
15     True
16     True
17     True
18     True
19     True
20     True
21     True
dtype: bool

In [61]:
pairing_list = [1, 2, 3, 4, 5, 5, 4, 4, 3, 6, 7, 8, 5, 5, 3, 1, 4, 4, 4, 4, 5, 2]
df["pairing"] = pairing_list
df

Unnamed: 0,location,station_id,latitude,longitude,weather_lat,weather_lon,pairing
0,"Sector-19A Nerul, Navi Mumbai, India",13702,19.044,73.0325,19.0,73.0,1
1,"Mahape, Navi Mumbai, India",12461,19.113505,73.008978,19.099998,73.0,2
2,"Bandra Kurla Complex, Mumbai, India",13715,19.053536,72.84643,19.200005,72.8,3
3,"Powai, Mumbai, India",12459,19.1375,72.915056,19.099998,72.90001,4
4,"Borivali East, Mumbai, India",12460,19.23241,72.86895,19.200005,72.90001,5
5,"Khindipada-Bhandup West, Mumbai, India",13710,19.165332,72.922099,19.200005,72.90001,5
6,"Mazgaon, Mumbai, India",13709,18.96702,72.84214,19.099998,72.90001,4
7,"Worli, Mumbai, India",11921,18.993616,72.812811,19.099998,72.90001,4
8,"Vile Parle West, Mumbai, India",12455,19.10861,72.83622,19.200005,72.8,3
9,"Pimpleshwar Mandir, Dombivali, Thane, India",9143,19.192056,72.958519,19.200005,73.0,6


In [28]:
df.to_csv("aq_stations&_weather_stations.csv")

### Second version of storing coordinate data to use in Tableau Map Visualization

In [62]:
#air quality measuring station coordinates
test = {"latitude": aqs_lat, 
        "longitude": aqs_lon,
        }
coordinate_df = pd.DataFrame(test)
coordinate_df["type"] = "air_quality"
coordinate_df["pairing"] = [1, 2, 3, 4, 5, 5, 4, 4, 3, 6, 7, 8, 5, 5, 3, 1, 4, 4, 4, 4, 5, 2]
coordinate_df.head()

Unnamed: 0,latitude,longitude,type,pairing
0,19.044,73.0325,air_quality,1
1,19.113505,73.008978,air_quality,2
2,19.053536,72.84643,air_quality,3
3,19.1375,72.915056,air_quality,4
4,19.23241,72.86895,air_quality,5


In [63]:
#weather measuring coordinates
info = {"latitude": weather_lat, 
        "longitude": weather_lon,
        }
add_to_coordinate_df = pd.DataFrame(info)
add_to_coordinate_df["type"] = "weather"
add_to_coordinate_df["pairing"] = [1, 2, 3, 4, 5, 5, 4, 4, 3, 6, 7, 8, 5, 5, 3, 1, 4, 4, 4, 4, 5, 2]
add_to_coordinate_df.head()

Unnamed: 0,latitude,longitude,type,pairing
0,19.0,73.0,weather,1
1,19.099998,73.0,weather,2
2,19.200005,72.8,weather,3
3,19.099998,72.90001,weather,4
4,19.200005,72.90001,weather,5


In [64]:
#merge two dataframes together
coordinate_df = coordinate_df.merge(add_to_coordinate_df, how="outer")
coordinate_df.head()

Unnamed: 0,latitude,longitude,type,pairing
0,19.044,73.0325,air_quality,1
1,19.113505,73.008978,air_quality,2
2,19.053536,72.84643,air_quality,3
3,19.1375,72.915056,air_quality,4
4,19.23241,72.86895,air_quality,5


In [65]:
#confirm code above worked as intended
coordinate_df.tail()

Unnamed: 0,latitude,longitude,type,pairing
39,19.200005,72.90001,weather,5
40,19.200005,72.90001,weather,5
41,19.200005,73.0,weather,6
42,19.300003,73.100006,weather,7
43,18.900002,73.0,weather,8


In [66]:
#one more confirmation
coordinate_df.loc[(coordinate_df["pairing"] == 1), :]

Unnamed: 0,latitude,longitude,type,pairing
0,19.044,73.0325,air_quality,1
15,19.008751,73.01662,air_quality,1
22,19.0,73.0,weather,1
23,19.0,73.0,weather,1


In [45]:
#to download this file from google collab jupyter notebook to local drive, run this code
#then click folder icon on left pane, then right click folder area and click "refresh"
#you should see csv file, click three dots to the side of csv file and click "download"
coordinate_df.to_csv("coordinates.csv")

# Results:

There are 8 unique weather measuring coordinates in the open-meteon.com API, for the 23 air quality stations. 

Results are visualized on tableau public here: No account is needed to view.

https://public.tableau.com/app/profile/gerardo.angulo8689/viz/OmdenaAir_Quality_Stations_Relative_To_Weather_Measuring_Coordinates/Sheet1?publish=yes

