# Collecting Historical Weather Data From the Open-Meteo API
https://open-meteo.com/en/docs/historical-weather-api

Note:
- The coordinates used are the 8 weather stations in Mumbai that are available in the Open-Meteo API. To see these locations relative to the 22 air quality stations, look at the interactive dashboard I created [here](https://public.tableau.com/app/profile/gerardo.angulo8689/viz/OmdenaAir_Quality_Stations_Relative_To_Weather_Measuring_Coordinates/Sheet1?publish=yes).

- No API key is needed to access this API.

Data Collected:
- Hourly Data
- From January 1, 2021 through March 20, 2023 (both dates included)
- Requested at the coordinates of the 22 air quality stations in Mumbai
- Metrics collected include:
  - Date
  - Temperature
  - Relative Humidity
  - Surface Pressure
  - Rain
  - Weather Code
  - Cloud Cover (Low)
  - Cloud Cover (Mid)
  - Cloud Cover (High)
  - Wind Speed
  - Wind Direction
  - Wind Gusts

Result:
The final DataFrame, containing the historical hourly data for every station, was converted to a CSV file and uploaded to GitHub.

In [1]:
import pandas as pd
import requests
import json

In [2]:
#air quality station coordinates
coordinates = [[19.044, 73.0325],
 [19.1135051, 73.008978],
 [19.053536, 72.84643],
 [19.1375, 72.915056],
 [19.23241, 72.86895],
 [19.1653323, 72.922099],
 [18.96702, 72.84214],
 [18.9936162, 72.8128113],
 [19.10861, 72.83622],
 [19.192056, 72.9585188],
 [19.25292, 73.142019],
 [19.047, 72.8746],
 [19.2058, 72.8682],
 [19.175, 72.9419],
 [19.19709, 72.82204],
 [19.008751, 73.01662],
 [19.000083, 72.813993],
 [19.072830200195, 72.882606506348],
 [19.0863, 72.8888],
 [19.10078, 72.87462],
 [19.2243333, 72.8658113],
 [19.04946, 72.923]]

The unit of measurement for each column is listed below. The same information is available [here](https://open-meteo.com/en/docs/historical-weather-api), under the "Hourly Parameter Definition" section.

In [3]:
# {'time': 'iso8601', 'temperature_2m': '°C','relativehumidity_2m': '%', 
#'surface_pressure': 'hPa',  'rain': 'mm',  'windspeed_10m': 'km/h',  'winddirection_10m': '°'} 
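
The units do not have to be typed out by hand. As far as I can tell, the JSON response carries an `hourly_units` object next to `hourly`, so the labels can be read from the response itself. The sketch below is only an illustration: `sample_payload` is a hand-written stand-in for `response.json()` (not real API output) and `describe_units` is a hypothetical helper name.

```python
# Hand-written stand-in for response.json(); the values mirror the
# units comment above and are not real API output.
sample_payload = {
    "hourly_units": {
        "time": "iso8601",
        "temperature_2m": "°C",
        "relativehumidity_2m": "%",
        "surface_pressure": "hPa",
        "rain": "mm",
        "windspeed_10m": "km/h",
        "winddirection_10m": "°",
    }
}


def describe_units(payload):
    """Return 'variable (unit)' labels for every entry in hourly_units."""
    # .get() keeps the helper safe if the key is missing from the payload
    units = payload.get("hourly_units", {})
    return [f"{name} ({unit})" for name, unit in units.items()]


labels = describe_units(sample_payload)
print(labels)
```

With a real response, the same helper could be called on `response.json()` to confirm the units before naming the DataFrame columns.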

## The function below collects historical weather data for one coordinate location. It collects Date, Temperature, Relative Humidity, Surface Pressure, Rain, Weather Code, Cloud Cover (low, mid, high), Wind Speed, Wind Direction, and Wind Gusts.

In [4]:
def get_weather_data(latitude, longitude):
  #hourly variables requested from the archive endpoint
  hourly_vars = ["temperature_2m", "relativehumidity_2m", "surface_pressure", "rain", "weathercode",
                 "cloudcover_low", "cloudcover_mid", "cloudcover_high",
                 "windspeed_10m", "winddirection_10m", "windgusts_10m"]
  api_url = "https://archive-api.open-meteo.com/v1/archive"
  params = {"latitude": latitude,
            "longitude": longitude,
            "start_date": "2021-01-01",
            "end_date": "2023-03-20",
            "hourly": ",".join(hourly_vars),
            "models": "best_match"}

  #accessing the api and pulling information
  headers = {"accept": "application/json"}
  response = requests.get(api_url, params = params, headers = headers)
  print(f"status_code: {response.status_code}")
  response.raise_for_status()  #stop here if the request failed

  #parse the JSON body once, then collect and store relevant weather metrics for coordinates
  payload = response.json()
  hourly = payload["hourly"]

  #the scalar latitude/longitude values are repeated on every hourly row
  data = {"input_latitude": latitude,
          "input_longitude": longitude,
          "date": hourly["time"],
          "temperature": hourly["temperature_2m"],
          "humidity": hourly["relativehumidity_2m"],
          "surface_pressure": hourly["surface_pressure"],
          "rain": hourly["rain"],
          "weathercode": hourly["weathercode"],
          "cloudcover_low": hourly["cloudcover_low"],
          "cloudcover_mid": hourly["cloudcover_mid"],
          "cloudcover_high": hourly["cloudcover_high"],
          "wind_speed": hourly["windspeed_10m"],
          "wind_direction": hourly["winddirection_10m"],
          "windgusts_10m": hourly["windgusts_10m"],
          "output_latitude": payload["latitude"],
          "output_longitude": payload["longitude"]
          }

  df = pd.DataFrame(data)
  print("DataFrame was succesfully created")
  return df
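
To make the shape of the response easier to picture, here is a minimal offline sketch of the JSON-to-DataFrame step. It assumes a response shaped like the one the function reads (`latitude`, `longitude`, and an `hourly` object of equal-length lists). `sample_payload` is hand-written and trimmed to two variables, and `payload_to_frame` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# Hand-written stand-in for response.json(), trimmed to two variables
# and three hours. The numbers are illustrative only.
sample_payload = {
    "latitude": 19.0,
    "longitude": 73.0,
    "hourly": {
        "time": ["2021-01-01T00:00", "2021-01-01T01:00", "2021-01-01T02:00"],
        "temperature_2m": [20.8, 20.4, 20.1],
        "relativehumidity_2m": [88, 89, 90],
    },
}


def payload_to_frame(payload, input_latitude, input_longitude):
    """Build one row per hour; scalar values are repeated on every row."""
    hourly = payload["hourly"]
    return pd.DataFrame({
        "input_latitude": input_latitude,      # scalar, broadcast by pandas
        "input_longitude": input_longitude,    # scalar, broadcast by pandas
        "date": hourly["time"],
        "temperature": hourly["temperature_2m"],
        "humidity": hourly["relativehumidity_2m"],
        "output_latitude": payload["latitude"],
        "output_longitude": payload["longitude"],
    })


sample_df = payload_to_frame(sample_payload, 19.044, 73.0325)
print(sample_df)
```

Keeping both the input and the output coordinates on each row makes it possible to see later which requested location each returned coordinate belongs to.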

### Confirming code works as intended

In [5]:
location_8_df = get_weather_data(19.044, 73.0325)

status_code: 200
DataFrame was succesfully created


In [6]:
location_8_df.head()

Unnamed: 0,input_latitude,input_longitude,date,temperature,humidity,surface_pressure,rain,weathercode,cloudcover_low,cloudcover_mid,cloudcover_high,wind_speed,wind_direction,windgusts_10m,output_latitude,output_longitude
0,19.044,73.0325,2021-01-01T00:00,20.8,88,1010.7,0.0,0,1,6,0,6.1,50,9.4,19.0,73.0
1,19.044,73.0325,2021-01-01T01:00,20.4,89,1011.0,0.0,0,0,11,0,6.7,54,11.2,19.0,73.0
2,19.044,73.0325,2021-01-01T02:00,20.1,90,1011.9,0.0,0,0,17,0,6.6,61,11.9,19.0,73.0
3,19.044,73.0325,2021-01-01T03:00,22.4,78,1012.4,0.0,0,0,15,0,8.0,72,14.0,19.0,73.0
4,19.044,73.0325,2021-01-01T04:00,25.6,63,1013.2,0.0,0,0,0,0,7.7,79,15.5,19.0,73.0


In [7]:
location_8_df.shape

(19416, 16)
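
The row count can be checked by hand. A small sketch, assuming both the start and end dates are included in the request, which the 19416 rows above support:

```python
from datetime import date

start = date(2021, 1, 1)
end = date(2023, 3, 20)

# +1 because both the start day and the end day are part of the range
days = (end - start).days + 1
hours_per_location = days * 24

# 22 coordinate pairs are listed in the coordinates cell
expected_total_rows = hours_per_location * 22

print(days, hours_per_location, expected_total_rows)  # 809 19416 427152
```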

## The function below collects weather data for all air quality station coordinates and combines it into a single DataFrame.

In [8]:
def get_final_df(coordinates_list):
  #collect one DataFrame per coordinate pair, then stack them in a single step
  station_dfs = []
  for x, (temp_lat, temp_lon) in enumerate(coordinates_list):
    temp_df = get_weather_data(temp_lat, temp_lon)
    print(f'Temperary DataFrame Number {x} has been created.')

    station_dfs.append(temp_df)
    print(f'df #{x}: Merged to final_df')
    print('------------------------------------------------------------------')

  #every frame has the same 16 columns, so stacking the rows is enough;
  #ignore_index=True renumbers the rows 0..n-1 across all locations
  final_df = pd.concat(station_dfs, ignore_index=True)
  print("final_df has been completed")
  return final_df

In [9]:
historical_weather_df = get_final_df(coordinates)

status_code: 200
DataFrame was succesfully created
Temperary DataFrame Number 0 has been created.
df #0: Merged to final_df
------------------------------------------------------------------
status_code: 200
DataFrame was succesfully created
Temperary DataFrame Number 1 has been created.
df #1: Merged to final_df
------------------------------------------------------------------
status_code: 200
DataFrame was succesfully created
Temperary DataFrame Number 2 has been created.
df #2: Merged to final_df
------------------------------------------------------------------
status_code: 200
DataFrame was succesfully created
Temperary DataFrame Number 3 has been created.
df #3: Merged to final_df
------------------------------------------------------------------
status_code: 200
DataFrame was succesfully created
Temperary DataFrame Number 4 has been created.
df #4: Merged to final_df
------------------------------------------------------------------
status_code: 200
DataFrame was succesfully cr

### Confirming this code worked as intended

In [10]:
print(f'{historical_weather_df.shape}')
historical_weather_df.head()

(427152, 16)


Unnamed: 0,input_latitude,input_longitude,date,temperature,humidity,surface_pressure,rain,weathercode,cloudcover_low,cloudcover_mid,cloudcover_high,wind_speed,wind_direction,windgusts_10m,output_latitude,output_longitude
0,19.044,73.0325,2021-01-01T00:00,20.8,88,1010.7,0.0,0,1,6,0,6.1,50,9.4,19.0,73.0
1,19.044,73.0325,2021-01-01T01:00,20.4,89,1011.0,0.0,0,0,11,0,6.7,54,11.2,19.0,73.0
2,19.044,73.0325,2021-01-01T02:00,20.1,90,1011.9,0.0,0,0,17,0,6.6,61,11.9,19.0,73.0
3,19.044,73.0325,2021-01-01T03:00,22.4,78,1012.4,0.0,0,0,15,0,8.0,72,14.0,19.0,73.0
4,19.044,73.0325,2021-01-01T04:00,25.6,63,1013.2,0.0,0,0,0,0,7.7,79,15.5,19.0,73.0


In [11]:
historical_weather_df.tail()

Unnamed: 0,input_latitude,input_longitude,date,temperature,humidity,surface_pressure,rain,weathercode,cloudcover_low,cloudcover_mid,cloudcover_high,wind_speed,wind_direction,windgusts_10m,output_latitude,output_longitude
427147,19.04946,72.923,2023-03-20T19:00,24.3,73,1008.7,0.0,1,13,11,7,3.3,221,10.8,19.099998,73.0
427148,19.04946,72.923,2023-03-20T20:00,24.4,73,1008.7,0.0,1,22,35,1,1.8,90,11.2,19.099998,73.0
427149,19.04946,72.923,2023-03-20T21:00,22.0,84,1007.5,0.1,51,11,47,0,6.1,118,10.1,19.099998,73.0
427150,19.04946,72.923,2023-03-20T22:00,22.1,83,1007.2,0.0,1,9,37,4,5.8,120,11.2,19.099998,73.0
427151,19.04946,72.923,2023-03-20T23:00,22.8,80,1007.2,0.0,1,9,21,7,3.8,139,11.2,19.099998,73.0
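
The head and tail show that `output_latitude` and `output_longitude` differ from the input coordinates, so several requested locations may come back with the same output coordinates. A minimal sketch of how one might count the distinct output coordinates, using a toy frame built from the two input/output pairings visible above (the real check would run on `historical_weather_df`):

```python
import pandas as pd

# Toy frame: the two input/output pairings visible in the head and tail
# output above, repeated for two hours each.
toy_df = pd.DataFrame({
    "input_latitude":   [19.044, 19.044, 19.04946, 19.04946],
    "input_longitude":  [73.0325, 73.0325, 72.923, 72.923],
    "output_latitude":  [19.0, 19.0, 19.099998, 19.099998],
    "output_longitude": [73.0, 73.0, 73.0, 73.0],
})

coord_cols = ["input_latitude", "input_longitude",
              "output_latitude", "output_longitude"]

# One row per distinct (input, output) pairing
pairs = toy_df[coord_cols].drop_duplicates()

# Number of requested locations that map to each returned coordinate
inputs_per_output = pairs.groupby(["output_latitude", "output_longitude"]).size()

print(len(inputs_per_output))
print(inputs_per_output)
```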


### Downloading the DataFrame as a CSV file. I uploaded this CSV file to my GitHub branch [here](https://github.com/OmdenaAI/omdena-mumbai-chapter-air-quality/blob/main/weather_station_data.csv).


In [12]:
#to download this file from a Google Colab notebook to your local drive, run this code
#then click the folder icon in the left pane, right-click the folder area and click "Refresh"
#the csv file should appear; click the three dots next to it and click "Download"
historical_weather_df.to_csv("historical_weather_data_updated.csv")
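
Because `to_csv` is called without `index=False`, the row index is written as an unnamed first column, and reading the file back with the defaults shows it as `Unnamed: 0`. A minimal sketch of loading such a file, using a small in-memory frame in place of the real CSV:

```python
import io

import pandas as pd

# Small stand-in for the real DataFrame
small_df = pd.DataFrame({
    "date": ["2021-01-01T00:00", "2021-01-01T01:00"],
    "temperature": [20.8, 20.4],
})

# Write to an in-memory buffer the same way the notebook writes to disk
buffer = io.StringIO()
small_df.to_csv(buffer)
buffer.seek(0)

# index_col=0 turns the unnamed first column back into the index;
# parse_dates converts the ISO 8601 strings into datetime values
restored = pd.read_csv(buffer, index_col=0, parse_dates=["date"])

print(restored.dtypes)
```

For the real file, the buffer would be replaced by the CSV path.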