# Collecting Historical Weather Data From Open Meteo API
https://open-meteo.com/en/docs/historical-weather-api

Note:
- The coordinates used are those of the 8 weather stations in Mumbai that are available in the Open Meteo API. To see their locations relative to the 23 Air Quality Stations, see the interactive dashboard I created [here](https://public.tableau.com/app/profile/gerardo.angulo8689/viz/OmdenaAir_Quality_Stations_Relative_To_Weather_Measuring_Coordinates/Sheet1?publish=yes).

- No API Key is needed to access this API

Data Collected:
- Hourly Data
- From January 1, 2021 to March 10, 2023
- From the 8 weather stations in Mumbai
- Metrics collected include:
  - Date
  - Temperature
  - Relative Humidity
  - Surface Pressure
  - Rain
  - Wind Speed
  - Wind Direction
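
As a rough sketch, the metrics listed above can be tied to the hourly variable names used in the request URL later in this notebook. The dictionary name `METRIC_TO_API_VARIABLE` is hypothetical and exists only for illustration; the variable names themselves are copied from that request string.

```python
# Hypothetical lookup table: readable metric name -> Open Meteo hourly variable.
# The variable names are the ones used in this notebook's request URL.
METRIC_TO_API_VARIABLE = {
    "Temperature": "temperature_2m",
    "Relative Humidity": "relativehumidity_2m",
    "Surface Pressure": "surface_pressure",
    "Rain": "rain",
    "Wind Speed": "windspeed_10m",
    "Wind Direction": "winddirection_10m",
}

# "Date" has no entry because the timestamps come back under the "time" key
# of the hourly block rather than being requested as a variable.
hourly_param = ",".join(METRIC_TO_API_VARIABLE.values())
print(hourly_param)
```

Joining the values in this order reproduces the `hourly=` portion of the request used below.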

Result:
The final DataFrame, containing all historical hourly data from each weather station, was converted to a CSV file and uploaded to GitHub.

In [49]:
import pandas as pd
import requests
import json

In [50]:
#the coordinates to the 8 weather stations in Mumbai 
weather_stations = [
    [19.000000, 73.000000], [19.099998, 73.000000], [19.200005, 72.800000], [19.099998, 72.900010],
    [19.200005, 72.900010], [19.200005, 73.000000], [19.300003, 73.100006], [18.900002, 73.000000],
]

The units of measurement for each column are listed below. They are also documented [here](https://open-meteo.com/en/docs/historical-weather-api), under the "Hourly Parameter Definition" section.

In [52]:
# {'time': 'iso8601', 'temperature_2m': '°C','relativehumidity_2m': '%', 
#'surface_pressure': 'hPa',  'rain': 'mm',  'windspeed_10m': 'km/h',  'winddirection_10m': '°'} 

## Below is a function that collects historical weather data for one coordinate location. It collects Date, Temperature, Relative Humidity, Surface Pressure, Rain, Wind Speed, and Wind Direction.

In [53]:
def get_weather_data(latitude, longitude):
  api_url = "https://archive-api.open-meteo.com/v1/archive?latitude=" + str(latitude) + "&longitude=" + str(longitude) + \
  "&start_date=2021-01-01&end_date=2023-03-10&" + \
  "hourly=temperature_2m,relativehumidity_2m,surface_pressure,rain,windspeed_10m,winddirection_10m&models=best_match"

  #accessing the api and pulling information
  headers = {"accept": "application/json"}
  response = requests.get(api_url, headers = headers)
  print(f"status_code: {response.status_code}")
  #stop here on an HTTP error instead of failing later with a KeyError
  response.raise_for_status()

  #parse the JSON body once and reuse it, rather than re-parsing for every field
  payload = response.json()
  hourly = payload["hourly"]

  #collect and store relevant weather metrics for coordinates
  #latitude and longitude are single values, so pandas repeats them on every row
  data = {"latitude": payload["latitude"],
          "longitude": payload["longitude"],
          "date": hourly["time"],
          "temperature": hourly["temperature_2m"],
          "humidity": hourly["relativehumidity_2m"],
          "surface_pressure": hourly["surface_pressure"],
          "rain": hourly["rain"],
          "wind_speed": hourly["windspeed_10m"],
          "wind_direction": hourly["winddirection_10m"]
          }
  df = pd.DataFrame(data)
  print("DataFrame was succesfully created")
  return df
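
The request URL above is assembled by string concatenation. As a hedged alternative, the sketch below builds the same URL from a dictionary of query parameters using only the standard library. The names `BASE_URL`, `HOURLY_VARIABLES`, and `build_archive_url` are hypothetical; the endpoint, dates, and variable names are the ones used in this notebook.

```python
from urllib.parse import urlencode

# Endpoint and hourly variables as used in this notebook.
BASE_URL = "https://archive-api.open-meteo.com/v1/archive"
HOURLY_VARIABLES = [
    "temperature_2m",
    "relativehumidity_2m",
    "surface_pressure",
    "rain",
    "windspeed_10m",
    "winddirection_10m",
]


def build_archive_url(latitude, longitude,
                      start_date="2021-01-01", end_date="2023-03-10"):
    """Hypothetical helper: return the archive request URL for one location."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,
        "end_date": end_date,
        "hourly": ",".join(HOURLY_VARIABLES),
        "models": "best_match",
    }
    # safe="," keeps the commas in the hourly list unescaped.
    return BASE_URL + "?" + urlencode(params, safe=",")


url = build_archive_url(18.900002, 73.0)
print(url)
```

Keeping the parameters in a dictionary makes it easier to change the date range or the list of variables without editing a long string. `requests.get` also accepts such a dictionary through its `params` argument.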

### Confirming code works as intended

In [59]:
location_8_df = get_weather_data(18.900002, 73.000000)

status_code: 200
DataFrame was succesfully created


In [65]:
location_8_df.head()

| | latitude | longitude | date | temperature | humidity | surface_pressure | rain | wind_speed | wind_direction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.900002 | 73.0 | 2021-01-01T00:00 | 21.5 | 87.0 | 1008.0 | 0.0 | 6.1 | 50.0 |
| 1 | 18.900002 | 73.0 | 2021-01-01T01:00 | 21.2 | 88.0 | 1008.3 | 0.0 | 6.7 | 54.0 |
| 2 | 18.900002 | 73.0 | 2021-01-01T02:00 | 21.0 | 88.0 | 1009.2 | 0.0 | 6.6 | 61.0 |
| 3 | 18.900002 | 73.0 | 2021-01-01T03:00 | 22.9 | 78.0 | 1009.7 | 0.0 | 8.0 | 72.0 |
| 4 | 18.900002 | 73.0 | 2021-01-01T04:00 | 25.8 | 63.0 | 1010.5 | 0.0 | 7.7 | 79.0 |
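
The `date` column holds ISO 8601 strings such as `2021-01-01T00:00`. The sketch below shows one way to turn such strings into datetime objects and confirm the hourly spacing. It uses only the standard library and three sample timestamps copied from the output above; on a full DataFrame, `pd.to_datetime` applied to the column would be the usual equivalent.

```python
from datetime import datetime

# Sample timestamps copied from the first rows of the output above.
sample_times = ["2021-01-01T00:00", "2021-01-01T01:00", "2021-01-01T02:00"]

# fromisoformat parses the "YYYY-MM-DDTHH:MM" form directly.
parsed = [datetime.fromisoformat(t) for t in sample_times]

# The difference between consecutive rows should be exactly one hour.
step = parsed[1] - parsed[0]
print(parsed[0], step)
```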


In [28]:
location_8_df.shape

(19176, 9)
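
The row count can be checked by hand: the request covers every hour from January 1, 2021 through March 10, 2023, inclusive. The short calculation below is a sanity check only; the variable names are illustrative.

```python
from datetime import date

start = date(2021, 1, 1)
end = date(2023, 3, 10)

# Both the start and end dates are included, hence the + 1.
days = (end - start).days + 1
rows_per_station = days * 24

print(days, rows_per_station)  # 799 19176

# With 8 stations, a combined frame would hold 8 times as many rows.
print(rows_per_station * 8)  # 153408
```

The result, 19176, matches the first element of the shape printed above.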

## The function below collects weather data from all 8 weather stations and combines it into a single DataFrame.

In [54]:
def get_final_df(coordinates):
  #collect one DataFrame per station, then combine them all in a single step
  station_dfs = []
  for x, (temp_lat, temp_lon) in enumerate(coordinates):
    temp_df = get_weather_data(temp_lat, temp_lon)
    print(f'Temperary DataFrame Number {x} has been created.')

    station_dfs.append(temp_df)
    print(f'df #{x}: Merged to final_df')
    print('------------------------------------------------------------------')
  #stack the station frames row-wise; ignore_index gives one continuous row index
  final_df = pd.concat(station_dfs, ignore_index=True)
  print("final_df has been completed")
  return final_df

In [55]:
weather_station_df = get_final_df(weather_stations)

status_code: 200
DataFrame was succesfully created
Temperary DataFrame Number 0 has been created.
df #0: Merged to final_df
------------------------------------------------------------------
status_code: 200
DataFrame was succesfully created
Temperary DataFrame Number 1 has been created.
df #1: Merged to final_df
------------------------------------------------------------------
status_code: 200
DataFrame was succesfully created
Temperary DataFrame Number 2 has been created.
df #2: Merged to final_df
------------------------------------------------------------------
status_code: 200
DataFrame was succesfully created
Temperary DataFrame Number 3 has been created.
df #3: Merged to final_df
------------------------------------------------------------------
status_code: 200
DataFrame was succesfully created
Temperary DataFrame Number 4 has been created.
df #4: Merged to final_df
------------------------------------------------------------------
status_code: 200
DataFrame was succesfully cr

### Confirming this code worked as intended

In [56]:
print(f'{weather_station_df.shape}')
weather_station_df.head()

(153408, 9)


| | latitude | longitude | date | temperature | humidity | surface_pressure | rain | wind_speed | wind_direction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 19.0 | 73.0 | 2021-01-01T00:00 | 21.1 | 88.0 | 1008.0 | 0.0 | 6.1 | 50.0 |
| 1 | 19.0 | 73.0 | 2021-01-01T01:00 | 20.7 | 89.0 | 1008.3 | 0.0 | 6.7 | 54.0 |
| 2 | 19.0 | 73.0 | 2021-01-01T02:00 | 20.4 | 90.0 | 1009.2 | 0.0 | 6.6 | 61.0 |
| 3 | 19.0 | 73.0 | 2021-01-01T03:00 | 22.7 | 78.0 | 1009.7 | 0.0 | 8.0 | 72.0 |
| 4 | 19.0 | 73.0 | 2021-01-01T04:00 | 25.9 | 63.0 | 1010.5 | 0.0 | 7.7 | 79.0 |


In [57]:
weather_station_df.tail()

| | latitude | longitude | date | temperature | humidity | surface_pressure | rain | wind_speed | wind_direction |
|---|---|---|---|---|---|---|---|---|---|
| 153403 | 18.900002 | 73.0 | 2023-03-10T19:00 | | | | | | |
| 153404 | 18.900002 | 73.0 | 2023-03-10T20:00 | | | | | | |
| 153405 | 18.900002 | 73.0 | 2023-03-10T21:00 | | | | | | |
| 153406 | 18.900002 | 73.0 | 2023-03-10T22:00 | | | | | | |
| 153407 | 18.900002 | 73.0 | 2023-03-10T23:00 | | | | | | |
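
The last rows shown above have a timestamp but no values in any metric column. A minimal sketch of how such rows could be counted and, if desired, dropped is given below. It runs on a small made-up frame (`toy_df`, with invented values and only two metric columns), not on the real data, so the numbers are purely illustrative.

```python
import pandas as pd

# Hypothetical miniature frame: two complete rows and two rows with no metrics.
toy_df = pd.DataFrame({
    "date": ["2000-01-01T00:00", "2000-01-01T01:00",
             "2000-01-01T02:00", "2000-01-01T03:00"],
    "temperature": [21.0, 20.5, None, None],
    "humidity": [80.0, 82.0, None, None],
})

metric_cols = ["temperature", "humidity"]

# True for rows where every metric column is missing.
missing_mask = toy_df[metric_cols].isna().all(axis=1)
print(int(missing_mask.sum()))

# Keep only rows that have at least one metric value.
cleaned = toy_df.loc[~missing_mask].reset_index(drop=True)
print(cleaned.shape)
```

Whether to drop such rows or keep them as explicit gaps depends on how the data will be joined with the air quality measurements later.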


### Downloaded DataFrame as CSV file. I uploaded this CSV file to my GitHub branch here.


In [58]:
#to download this file from a Google Colab notebook to your local drive, run this code
#then click the folder icon in the left pane, right-click the folder area, and click "Refresh"
#the csv file should now be listed; click the three dots next to it and click "Download"
weather_station_df.to_csv("weather_station_data.csv")
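
By default, `to_csv` also writes the row index as an unnamed first column, which pandas labels `Unnamed: 0` when the file is read back. The sketch below illustrates the difference on a tiny made-up frame (`toy_df` is hypothetical) and writes to in-memory buffers instead of files. Whether to pass `index=False` is a design choice, not something this notebook requires.

```python
import io

import pandas as pd

# Hypothetical one-row frame used only to show the header difference.
toy_df = pd.DataFrame({"latitude": [19.0], "longitude": [73.0]})

# Default behaviour: the index is written as a leading unnamed column.
with_index = io.StringIO()
toy_df.to_csv(with_index)

# index=False writes only the data columns.
without_index = io.StringIO()
toy_df.to_csv(without_index, index=False)

header_with_index = with_index.getvalue().splitlines()[0]
header_without_index = without_index.getvalue().splitlines()[0]
print(header_with_index)     # ,latitude,longitude
print(header_without_index)  # latitude,longitude
```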