The main purpose of this file is to import data into the bronze layer without modifying it. The raw API responses are therefore stored as JSON in the bronze folder.
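
As a rough illustration of what "stored without editing" can look like, the sketch below writes a payload unchanged into a dated folder. The helper name `write_raw` and the `<root>/<source>/<date>/` layout are assumptions made for this example; the cells further down build their paths by hand and use a slightly different folder name.

```python
import json
from datetime import date
from pathlib import Path


def write_raw(payload, source, name, root="bronze", day=None):
    """Write a payload unchanged to <root>/<source>/<YYYY-MM-DD>/<name>.json.

    Hypothetical helper: the layout is an assumption, not the notebook's own.
    """
    day = day or date.today()
    target_dir = Path(root) / source / day.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)  # create the dated folder if needed
    target = target_dir / f"{name}.json"
    with target.open("w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)  # no filtering or reshaping of the payload
    return target
```

Keeping the write step free of any transformation means the bronze files can always be reprocessed later if the downstream logic changes.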

# 1. Importing libraries and setting API keys/credentials

The pipeline uses the following data sources:
+ **Norwegian MET**, which provides weather data for Norway and Sweden. To register for an API key, visit the [registration page](https://frost.met.no/authentication.html). _Note that you will be given both a credential key and a secret key, but only the credential key is needed._
+ **Danish DMI**, which provides weather data for Denmark. To register for an API key, visit the [registration page](https://opendatadocs.dmi.govcloud.dk/en/Authentication).

Both pages are in English and come with thorough documentation, including Swagger.
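
Because a missing key only shows up later as a failed request, it can help to check the environment up front. The sketch below is one possible way to do that; `require_env` is a hypothetical helper, while the variable names `FROST_CLIENT_CREDENTIAL` and `DMI_API_KEY` are the ones this notebook reads from the `.env` file.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing environment variable {name}; check your .env file")
    return value
```

After `load_dotenv()` has run, `require_env('FROST_CLIENT_CREDENTIAL')` and `require_env('DMI_API_KEY')` could replace the plain `os.getenv` calls, so that a forgotten key stops the run immediately.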

In [None]:
from dotenv import load_dotenv
import os
from datetime import datetime, timedelta
import requests
from requests.auth import HTTPBasicAuth
from math import ceil
import json

# Load API keys from the .env file; remember to rename .env_sample and fill in your API keys
load_dotenv()
client_cred_frost = os.getenv('FROST_CLIENT_CREDENTIAL') # importing Frost API key
client_cred_dmi = os.getenv('DMI_API_KEY') # importing DMI API key

# 2. Gathering data

In [2]:
# Return the date `timelag` days before `input_date` (a datetime) as a string,
# e.g. timelag=1 gives yesterday; used to build intervals such as yesterday/today
def get_date(input_date, timelag):
    timestamp = input_date - timedelta(days=timelag)
    return timestamp.strftime("%Y-%m-%d") # output in format required by the Frost met API

# Using client credentials from .env file, remember to rename env_sample file
auth = HTTPBasicAuth(client_cred_frost, "") 


## 2.1. Frost met data

When requesting Frost met observations for the weather stations, you have to provide the station ids. The first request, `https://frost.met.no/sources/v0.jsonld`, therefore fetches all station ids, and the second, `https://frost.met.no/observations/v0.jsonld?`, fetches the observations for the ids acquired. URI length is limited, which is why the ids are sent in batches.
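
The batching idea can be shown in isolation. This is a minimal sketch, assuming a hypothetical helper `chunked` and made-up station ids; the cell below does the same slicing inline with `ceil` and a batch size of 200.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


station_ids = [f"SN{n}" for n in range(1, 6)]  # made-up ids for illustration
for batch in chunked(station_ids, 2):
    # each batch becomes one comma-separated `sources` parameter
    print(",".join(batch))
```

With five ids and a batch size of two, this produces three requests, the last one carrying a single id.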

In [3]:
# get full list of weather stations
bronze_file_path = str(os.getcwd())+"/bronze/" # getting bronze layer file path
frost_file_path = bronze_file_path + 'frost/' # bronze_file_path already ends with a slash
timestamp = get_date(datetime.today(), 0) +'T12' # for params and archiving name
timestamp_period_last24h = str(get_date(datetime.today(), 1))+'/'+str(get_date(datetime.today(), 0))
os.makedirs("bronze/frost/"+str(timestamp), exist_ok=True)
file_name_ids = 'frostStations_'+str(timestamp)+'_gathered_'+str(datetime.today().strftime("%Y-%m-%d"))+'.json'

url = 'https://frost.met.no/sources/v0.jsonld'
# will have to implement error handling
response = requests.get(url, auth=auth)
sources = []
with open(frost_file_path+file_name_ids, "w") as f:
    json.dump(response.json(), f, indent=2)
for source in response.json()['data']:
    if source['id'].startswith('SN'): # only ids starting with SN are accepted as parameters in the following GET requests
        sources.append(source['id'])
#get all data for each id in batches
url = 'https://frost.met.no/observations/v0.jsonld?sources={}&referencetime={}&elements={}'
# folder for this run's batch files (already created above with os.makedirs)
timestamp_dir = 'bronze/frost/'+str(timestamp)+'/'
element = 'air_temperature' # only need api information for air temp
batch_counter = 1 # for file saving
batch_size = 200  # adjust to stay under URI limits
for i in range(ceil(len(sources) / batch_size)):
    batch = sources[i * batch_size:(i + 1) * batch_size]
    sources_param = ",".join(batch)
    response = requests.get(url.format(sources_param, timestamp_period_last24h, element), auth=auth) 
    file_name = 'frostMetAirTemperature_'+str(timestamp)+'_batch_'+str(batch_counter)+'_gathered_'+str(datetime.today().strftime("%Y-%m-%d"))+'.json'
    with open(timestamp_dir+file_name, "w") as f:
        json.dump(response.json(), f, indent=2)
    batch_counter += 1
 


## 2.2. DMI data

Unlike the Frost met data, the DMI weather readings can be requested without station ids, using `https://dmigw.govcloud.dk/v2/metObs/collections/observation/items`. However, to keep the collected data consistent across sources, the station details are still requested and stored, using `https://dmigw.govcloud.dk/v2/metObs/collections/station/items`. This data could also prove useful for later aggregations.
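
The station request takes a `datetime` interval that has to be URL-encoded. The sketch below shows one way to build that query with the standard library instead of writing `%3A` and `%2F` by hand. The function names are hypothetical, and the `bbox-crs` parameter used in the cell below is left out to keep the example short.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

BASE = "https://dmigw.govcloud.dk/v2/metObs/collections/station/items"


def last_24h_interval(now):
    """Return 'start/end' covering noon yesterday to noon today."""
    end = now.replace(hour=12, minute=0, second=0, microsecond=0)
    start = end - timedelta(days=1)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"{start.strftime(fmt)}/{end.strftime(fmt)}"


def station_url(api_key, now):
    """Build the station URL; urlencode percent-encodes ':' and '/' in the interval."""
    query = urlencode({"datetime": last_24h_interval(now), "api-key": api_key})
    return f"{BASE}?{query}"
```

Calling `station_url(client_cred_dmi, datetime.today())` would give a URL with the same encoded interval that the cell below assembles manually.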

In [4]:
# get full list of weather stations
bronze_file_path = str(os.getcwd())+"/bronze/" # getting bronze layer file path
dmi_file_path = bronze_file_path + 'dmi/'
timestamp_dmi_start = get_date(datetime.today(), 1) + 'T12%3A00%3A00Z' # noon yesterday, URL-encoded
timestamp_dmi_end = get_date(datetime.today(), 0) + 'T12%3A00%3A00Z' # noon today, URL-encoded
timestamp_full = timestamp_dmi_start + '%2F' + timestamp_dmi_end # interval start/end, URL-encoded
url = f'https://dmigw.govcloud.dk/v2/metObs/collections/station/items?datetime={timestamp_full}&bbox-crs=https%3A%2F%2Fwww.opengis.net%2Fdef%2Fcrs%2FOGC%2F1.3%2FCRS84&api-key={client_cred_dmi}'
os.makedirs("bronze/dmi/"+str(timestamp), exist_ok=True) # `timestamp` is defined in the Frost cell above
file_name_ids = 'dmiStations_'+str(timestamp)+'_gathered_'+str(datetime.today().strftime("%Y-%m-%d"))+'.json'
# will have to implement error handling
response = requests.get(url) # the API key is already part of the URL
sources = []
with open(dmi_file_path+file_name_ids, "w") as f:
    json.dump(response.json(), f, indent=2)

url = 'https://dmigw.govcloud.dk/v2/metObs/collections/observation/items?period=latest-day&parameterId=temp_dry&bbox-crs=https%3A%2F%2Fwww.opengis.net%2Fdef%2Fcrs%2FOGC%2F1.3%2FCRS84&api-key={}'
# get all observations for the latest day; no batching is needed due to the structure of the API, otherwise this mirrors the Frost cell
timestamp_dir = 'bronze/dmi/'+str(timestamp)+'/'
batch_counter = 1 
response = requests.get(url.format(client_cred_dmi)) # DMI uses the api-key query parameter, so the Frost basic auth must not be sent
file_name = 'dmiAirTemperature_'+str(timestamp)+'_batch_'+str(batch_counter)+'_gathered_'+str(datetime.today().strftime("%Y-%m-%d"))+'.json'
with open(timestamp_dir+file_name, "w") as f:
    json.dump(response.json(), f, indent=2)
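
Both request cells note that error handling still has to be implemented. One possible starting point is a small retry wrapper around the request, sketched below. The name `with_retries`, the number of attempts, and the linear wait are all assumptions made for illustration and are not part of the notebook.

```python
import time


def with_retries(fetch, attempts=3, delay_seconds=1.0):
    """Call `fetch()` until it succeeds or `attempts` is exhausted, then re-raise."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as error:  # broad on purpose in this sketch
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds * attempt)  # wait a little longer each time
    raise last_error
```

With `requests`, `fetch` could be a small function that calls `requests.get(url, auth=auth, timeout=30)`, then `response.raise_for_status()`, and returns `response.json()`, so that HTTP error codes and timeouts trigger a retry instead of writing an error payload into the bronze layer.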