This script fetches weather data from a weather station managed by the [Norwegian Meteorological Institute](https://www.met.no/en) and outputs it as a `.json` file.

The script connects to [The Frost API](https://frost.met.no/index.html) and is based on their [Python example script](https://frost.met.no/python_example.html) with a few modifications.

**Please note**
 * The base stations in question for the UNIL Business Intelligence project are those listed in `location_nearest_station.txt`.
 * [Weather and Climate Elements](https://frost.met.no/elementtable) lists every weather element that is hypothetically available. In practice, most of them turn out not to be available at every station, since the stations vary in size.
 * Because of the previous point, stations with a time resolution coarser than one hour have been avoided in favor of stations slightly farther from the municipality for which the power consumption is registered.
 * For privacy reasons, the consumption time series are tagged only with municipality, so guessing which weather station is actually closest is a matter of luck anyway. In any case, the distances are small enough that the temperature will be approximately the same.
 * The MAX air temperature over the past hour is used because the MEAN function seems to malfunction.

In [13]:
import requests
from json import dump, load
import pandas as pd
import os

In [None]:
# Retrieve client id from local creds.txt file
# Each line is expected to hold a key and a value separated by whitespace
d = {}
with open('../frost_api/creds.txt', 'r') as f:
    for line in f:
        key, value = line.split()[:2]
        d[key] = value
client_id = d['client_id']

In [None]:
# Retrieve station IDs from municipality_nearest_station.csv
df = pd.read_csv('src//connecting_tables//municipality_nearest_station.csv')
loc_ids = df['nearest_station'].values

In [None]:
loc_id = loc_ids[0] # Or iterate through..
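
To cover every station rather than only the first, the request can be wrapped in a loop. The sketch below is not part of the original notebook: `build_parameters` and `fetch_all` are hypothetical helper names, and the `fetch` argument stands in for whatever function performs the actual HTTP call (for example, a thin wrapper around `requests.get`).

```python
def build_parameters(station_id, element='max(air_temperature PT1H)',
                     period='2018-01-01/2020-03-01'):
    """Return the query parameters for one station (hypothetical helper)."""
    return {
        'sources': str(station_id),
        'elements': element,
        'referencetime': period,
    }


def fetch_all(station_ids, fetch):
    """Call `fetch(parameters)` once per station and collect results by station ID.

    `fetch` is an assumption: any callable that takes the parameter dict
    and returns the parsed JSON response.
    """
    results = {}
    for station_id in station_ids:
        results[station_id] = fetch(build_parameters(station_id))
    return results
```

With the notebook's own variables, this could be called as `fetch_all(loc_ids, lambda p: requests.get(endpoint, p, auth=(client_id, '')).json())`, where `endpoint` is the URL defined in the request cell.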

In [None]:
endpoint = 'https://frost.met.no/observations/v0.jsonld'
parameters = {
    'sources': f'{loc_id}',
    'elements': 'max(air_temperature PT1H)',
    'referencetime': '2018-01-01/2020-03-01',
}

# Issue an HTTP GET request
r = requests.get(endpoint, parameters, auth=(client_id,''))
# Extract JSON data
json = r.json()

In [None]:
# Check if the request worked, print out any errors
if r.status_code == 200:
    print('Data retrieved from frost.met.no!')
else:
    print('Error! Returned status code %s' % r.status_code)
    print('Message: %s' % json['error']['message'])
    print('Reason: %s' % json['error']['reason'])
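
The check above assumes that a failed request still returns a JSON body containing an `error` object. A slightly more defensive variant is sketched here; `describe_response` is a hypothetical helper, not part of the original script, and it falls back to generic text when those fields are missing.

```python
def describe_response(status_code, payload):
    """Return a one-line summary of a Frost response (hypothetical helper).

    `payload` is the parsed JSON body, or None if the body could not be parsed.
    """
    if status_code == 200:
        return 'Data retrieved from frost.met.no!'
    # Tolerate a missing body or a body without an 'error' object
    error = (payload or {}).get('error', {})
    message = error.get('message', 'no message')
    reason = error.get('reason', 'no reason given')
    return (f'Error! Returned status code {status_code}. '
            f'Message: {message}. Reason: {reason}')
```

In the notebook this would be used as `print(describe_response(r.status_code, json))`.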

In [None]:
len(json['data'])

In [None]:
json['data'][:2]

In [None]:
with open(f'src//weather_jsons/{loc_id}.json', 'w') as f:
    dump(json, f)

Run this cell to transform all `weather.json` files to `weather.csv`.

The `.json` files can be read directly into SQL tables later, but with my skills and computational resources, the cost of reading `.json` into SQL tables was approximately 60 times larger than the cost of converting the files to `.csv` at this stage and then reading the `.csv` files into SQL tables.
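
As an illustration of that later step (not a cell of this notebook), the sketch below loads a CSV into an SQL table with pandas. It is not the loading code used in the project: the in-memory SQLite database, the table name `weather`, and the two sample rows are all assumptions made for the example.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample in the shape of a converted file; the values are made up
csv_text = (
    'referenceTime,sourceId,value\n'
    '2018-01-01 00:00:00,SN00000:0,1.5\n'
    '2018-01-01 01:00:00,SN00000:0,1.2\n'
)

df = pd.read_csv(io.StringIO(csv_text))

# An in-memory SQLite database stands in for the project's SQL database
conn = sqlite3.connect(':memory:')
df.to_sql('weather', conn, index=False, if_exists='replace')

# Confirm that every CSV row ended up in the table
row_count = conn.execute('SELECT COUNT(*) FROM weather').fetchone()[0]
print(row_count)
```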

In [34]:
origin_folder = 'src//weather_jsons//'
destination_folder = 'src//weather_csvs//'

for file in os.listdir(origin_folder):
    if file.split('.')[-1] != 'json': continue
        
    with open(os.path.join(origin_folder, file),'r') as f:
        json = load(f)
        
    data = json['data']
    
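    # Collect one small frame per timestamp and concatenate once;
    # DataFrame.append was removed in pandas 2.0
    rows = []
    for entry in data:
        row = pd.DataFrame(entry['observations'])
        row['referenceTime'] = pd.to_datetime(' '.join(entry['referenceTime'][:-5].split('T')))
        row['sourceId'] = entry['sourceId']
        rows.append(row)
    df = pd.concat(rows).reset_index()

    # Remove only the extension; str.strip('.json') strips characters, not a suffix
    df.to_csv(os.path.join(destination_folder, os.path.splitext(file)[0] + '.csv'))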
    
    print('Processed', file)

Processed SN35210.json
Processed SN36200.json
Processed SN36330.json
Processed SN38140.json
Processed SN39040.json
Processed SN39750.json
Processed SN40880.json
Processed SN41090.json
Processed SN41770.json
Processed SN41825.json
Processed SN42940.json
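
The row-by-row conversion above can also be expressed with `pandas.json_normalize`, which flattens the nested `observations` lists in one call. This is an alternative sketch rather than the notebook's method: `flatten_observations` is a hypothetical name, and the sample records are made up and only mirror the three keys the conversion loop reads (`observations`, `referenceTime`, `sourceId`).

```python
import pandas as pd


def flatten_observations(data):
    """Flatten a list of Frost-style records into one row per observation."""
    df = pd.json_normalize(data, record_path='observations',
                           meta=['referenceTime', 'sourceId'])
    # Parse the ISO 8601 timestamp, then drop the UTC marker to get naive datetimes
    df['referenceTime'] = (
        pd.to_datetime(df['referenceTime'], utc=True).dt.tz_localize(None)
    )
    return df


# Made-up sample with the same keys the conversion loop relies on
sample = [
    {'sourceId': 'SN00000:0',
     'referenceTime': '2018-01-01T00:00:00.000Z',
     'observations': [{'elementId': 'max(air_temperature PT1H)', 'value': 1.5}]},
    {'sourceId': 'SN00000:0',
     'referenceTime': '2018-01-01T01:00:00.000Z',
     'observations': [{'elementId': 'max(air_temperature PT1H)', 'value': 1.2}]},
]

flat = flatten_observations(sample)
print(flat[['referenceTime', 'sourceId', 'value']])
```

Parsing the full timestamp instead of slicing off its last five characters avoids depending on the exact `.000Z` suffix.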
