# Fetching observations from MET database services

This notebook serves as a sandbox for fetching data from frost.met.no, havvarsel-frost.met.no and thredds.met.no.

See https://api.met.no/ for all interfaces and available data sources.


## Havvarsel frost
So far, Havvarsel frost delivers `temperature` measurements at bathing sites in Norway, originating from badetassen.no (the `source` reported in the response header further down). Glider data has been integrated in the meantime but is not used here.
> Documentation:
> API documentation for obs/badevann https://havvarsel-frost.met.no/docs/apiref#/obs%2Fbadevann/obsBadevannGet 
> Data structure described at https://havvarsel-frost.met.no/docs/dataset_badevann


## Frost
From the Frost server we retrieve observations from the `n` closest weather stations, including the elements
- `air_temperature`
- `mean(surface_downwelling_shortwave_flux_in_air PT1H)`
- `wind_speed`
- `relative_humidity`
- `sum(duration_of_sunshine PT1H)`
- `cloud_area_fraction` (in octas, from 0 = clear sky to 8 = overcast)
- ...

> Documentation:
> API documentation for observations on https://frost.met.no/api.html#!/observations/observations 
> Available elements (params) are listed on https://frost.met.no/elementtable 
> Examples on Frost data manipulation with Python on https://frost.met.no/python_example.html
>
> See also:
> Complete documentation at https://frost.met.no/howto.html 
> Complete frost API reference at https://frost.met.no/api.html 
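A minimal sketch of the two Frost queries described above: first ask the `sources` endpoint for the `n` stations nearest a point, then request the listed elements for those stations. The endpoint paths and the parameter names `geometry=nearest(POINT(lon lat))`, `nearestmaxcount`, `sources`, `elements` and `referencetime` are taken from the Frost API reference as we understand it; check them against the docs before relying on this. The station ID `SN18700` and the placeholder `client_id` are illustrative assumptions.

```python
import datetime

# Endpoints as listed in the Frost API reference (assumed current)
FROST_SOURCES_ENDPOINT = "https://frost.met.no/sources/v0.jsonld"
FROST_OBS_ENDPOINT = "https://frost.met.no/observations/v0.jsonld"


def nearest_sources_params(lat, lon, n):
    """Query parameters asking Frost for the n stations closest to (lat, lon)."""
    # WKT points are written as (lon lat)
    return {"geometry": f"nearest(POINT({lon} {lat}))", "nearestmaxcount": n}


def observations_params(source_ids, elements, start, end):
    """Query parameters for the given elements from the given stations."""
    return {
        "sources": ",".join(source_ids),
        "elements": ",".join(elements),
        # ISO 8601 interval start/end
        "referencetime": f"{start:%Y-%m-%dT%H:%M}/{end:%Y-%m-%dT%H:%M}",
    }


src_params = nearest_sources_params(59.933329, 10.7166638, 3)
obs_params = observations_params(
    ["SN18700"],  # hypothetical example station
    ["air_temperature", "wind_speed"],
    datetime.datetime(2021, 10, 1),
    datetime.datetime(2021, 11, 1),
)
# With a Frost client ID (hypothetical variable client_id):
# r = requests.get(FROST_OBS_ENDPOINT, params=obs_params, auth=(client_id, ""))
```

Keeping the parameter construction separate from the HTTP call makes it easy to inspect the query before spending a request on it.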

## Thredds
Holds NetCDF files with a wide range of datasets.

> See the catalog: https://thredds.met.no/thredds/catalog.html

We primarily use `Ocean and Ice/met.no (OLD) ROMS NorKyst800m coastal forecasting system` to get forecasted water temperatures. The cells below read atmospheric fields from the `metpparchive` analysis archive instead.


TODO (processing):
 - Tune the processing and storage of the observational data sets to suit whatever code will use them

In [1]:
# Importing general libraries
import sys
import json
import datetime
import requests
from traceback import format_exc

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


import netCDF4
import pyproj as proj


In [2]:
start_time = datetime.datetime.strptime("2021-10-01T00:00", "%Y-%m-%dT%H:%M")
end_time = datetime.datetime.strptime("2021-11-01T00:00", "%Y-%m-%dT%H:%M")


In [3]:
str(start_time.isoformat())

'2021-10-01T00:00:00'

In [4]:
frost_api_base="https://havvarsel-frost.met.no"
endpoint = frost_api_base + "/api/v1/obs/badevann/get"

In [9]:
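# Join the parameters by hand so that ':' and '/' in the time interval are
# not percent-encoded (requests would encode them if given a dict)
payload = {'time': start_time.isoformat() + "Z/" + end_time.isoformat() + "Z",
           'incobs': 'true', 'buoyids': 38, 'parameter': 'temperature'}
payload_str = "&".join("%s=%s" % (k, v) for k, v in payload.items())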
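The same query string can be sketched with the standard library alone: `urllib.parse.urlencode` accepts a `safe` argument, so passing `safe=":/"` leaves the time interval readable, as in the URL printed below. This is an alternative to the hand-written join, not what the notebook itself does; `badevann_query` is a hypothetical helper name.

```python
import datetime
from urllib.parse import urlencode


def badevann_query(start, end, buoyid, parameter=None):
    """Build the query string for the obs/badevann endpoint.

    start/end are naive datetimes interpreted as UTC (hence the 'Z' suffix);
    ':' and '/' are left unescaped.
    """
    params = {
        "time": start.isoformat() + "Z/" + end.isoformat() + "Z",
        "incobs": "true",
        "buoyids": buoyid,
    }
    if parameter is not None:
        params["parameter"] = parameter
    return urlencode(params, safe=":/")


query = badevann_query(datetime.datetime(2021, 10, 1),
                       datetime.datetime(2021, 11, 1), 38, "temperature")
# r = requests.get(endpoint, params=query)
```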

In [10]:
r = requests.get(endpoint, params=payload_str)
print("Trying " + r.url)
# Raises requests.exceptions.HTTPError on 4xx/5xx responses
r.raise_for_status()


Trying https://havvarsel-frost.met.no/api/v1/obs/badevann/get?time=2021-10-01T00:00:00Z/2021-11-01T00:00:00Z&incobs=true&buoyids=38&parameter=temperature


In [7]:
r.json()["data"]["tseries"][0]["header"]

{'id': {'buoyid': '38', 'parameter': 'temperature', 'source': 'badetassen.no'},
 'extra': {'name': 'Sjøsanden',
  'pos': {'lat': '58.018192', 'lon': '7.445817'}}}
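The header above returns `buoyid`, `lat` and `lon` as strings. A small sketch that flattens one such header into a typed record (the helper name `parse_badevann_header` is hypothetical; the field paths are exactly those shown in the output above):

```python
def parse_badevann_header(header):
    """Flatten one tseries header from obs/badevann into a typed record."""
    return {
        "buoyid": int(header["id"]["buoyid"]),
        "name": header["extra"]["name"],
        "source": header["id"]["source"],
        # lat/lon come back as strings, so convert them here
        "lat": float(header["extra"]["pos"]["lat"]),
        "lon": float(header["extra"]["pos"]["lon"]),
    }


# The header printed above, copied verbatim
header = {'id': {'buoyid': '38', 'parameter': 'temperature', 'source': 'badetassen.no'},
          'extra': {'name': 'Sjøsanden',
                    'pos': {'lat': '58.018192', 'lon': '7.445817'}}}
record = parse_badevann_header(header)
```

In the notebook this would be called as `parse_badevann_header(r.json()["data"]["tseries"][0]["header"])`.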

In [17]:
# Scan buoy IDs 0-149 and keep those that return a time series.
# Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
rows = []
for buoyid in range(150):
    payload = {'time': start_time.isoformat() + "Z/" + end_time.isoformat() + "Z",
               'incobs': 'true', 'buoyids': buoyid}
    payload_str = "&".join("%s=%s" % (k, v) for k, v in payload.items())
    try:
        r = requests.get(endpoint, params=payload_str)
        r.raise_for_status()
        header = r.json()["data"]["tseries"][0]["header"]
    except (requests.exceptions.RequestException, ValueError, KeyError, IndexError, TypeError):
        # No buoy with this ID (or no data in the period): skip it
        continue
    rows.append({"buoyid": buoyid, "name": header["extra"]["name"],
                 "lat": header["extra"]["pos"]["lat"], "lon": header["extra"]["pos"]["lon"]})

badevann_df = pd.DataFrame(rows, columns=['buoyid', 'name', 'lat', 'lon'])

In [18]:
badevann_df

|     | buoyid | name                 | lat       | lon       |
|-----|--------|----------------------|-----------|-----------|
| 0   | 5      | Aksdalsvatnet        | 59.416750 | 5.418333  |
| 1   | 7      | Grønnavigå           | 58.988380 | 5.739830  |
| 2   | 10     | Sjøbadet             | 63.435770 | 10.390610 |
| 3   | 11     | Munkholmen           | 63.451380 | 10.384790 |
| 4   | 13     | Vaulen               | 58.926347 | 5.750357  |
| ... | ...    | ...                  | ...       | ...       |
| 64  | 128    | Volsdalsberga        | 62.467929 | 6.198442  |
| 65  | 129    | Sandnessjøen havsbad | 65.989325 | 12.623672 |
| 66  | 130    | Mosvatnet            | 59.420210 | 6.447400  |
| 67  | 131    | Våge                 | 60.044350 | 5.524630  |
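With the buoy list in hand, the "n closest" idea used for the Frost stations can also be applied to the bathing sites. A minimal sketch, assuming great-circle (haversine) distance on a sphere of radius 6371 km is accurate enough for ranking; `nearest_buoys` and the Trondheim query point are hypothetical. The sample rows are copied from the table above.

```python
import numpy as np
import pandas as pd


def nearest_buoys(df, lat, lon, n=3):
    """Return the n rows of df closest to (lat, lon), with a distance_km column.

    lat/lon columns may hold strings, as returned by the API.
    """
    lats = np.radians(pd.to_numeric(df["lat"]))
    lons = np.radians(pd.to_numeric(df["lon"]))
    lat0, lon0 = np.radians(lat), np.radians(lon)
    # Haversine formula
    a = (np.sin((lats - lat0) / 2) ** 2
         + np.cos(lat0) * np.cos(lats) * np.sin((lons - lon0) / 2) ** 2)
    dist_km = 2 * 6371.0 * np.arcsin(np.sqrt(a))
    return df.assign(distance_km=dist_km).nsmallest(n, "distance_km")


sample = pd.DataFrame({
    "buoyid": [5, 7, 10, 11, 13],
    "name": ["Aksdalsvatnet", "Grønnavigå", "Sjøbadet", "Munkholmen", "Vaulen"],
    "lat": ["59.416750", "58.988380", "63.435770", "63.451380", "58.926347"],
    "lon": ["5.418333", "5.739830", "10.390610", "10.384790", "5.750357"],
})
closest = nearest_buoys(sample, 63.43, 10.39, n=2)  # hypothetical point in Trondheim
```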


In [26]:
nc = netCDF4.MFDataset([
    "https://thredds.met.no/thredds/dodsC/metpparchive/2020/09/01/met_analysis_1_0km_nordic_20200901T00Z.nc",
    "https://thredds.met.no/thredds/dodsC/metpparchive/2020/09/02/met_analysis_1_0km_nordic_20200902T00Z.nc",
])
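The two OPeNDAP URLs above follow a date pattern, so the file list for a longer period can be generated rather than typed. A sketch assuming one file per day at a fixed hour, as in the cell above; whether other hours exist in the archive is not checked here, and `met_nordic_analysis_urls` is a hypothetical helper.

```python
import datetime


def met_nordic_analysis_urls(first_day, last_day, hour=0):
    """OPeNDAP URLs for one analysis file per day (inclusive date range).

    The path pattern is copied from the URLs used in the MFDataset cell.
    """
    base = "https://thredds.met.no/thredds/dodsC/metpparchive"
    urls = []
    day = first_day
    while day <= last_day:
        stamp = datetime.datetime(day.year, day.month, day.day, hour)
        urls.append(f"{base}/{stamp:%Y/%m/%d}/met_analysis_1_0km_nordic_{stamp:%Y%m%dT%H}Z.nc")
        day += datetime.timedelta(days=1)
    return urls


urls = met_nordic_analysis_urls(datetime.date(2020, 9, 1), datetime.date(2020, 9, 2))
# nc = netCDF4.MFDataset(urls)
```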

In [3]:
nc.variables.keys()

dict_keys(['ensemble_member', 'forecast_reference_time', 'projection_lcc', 'time', 'x', 'y', 'air_pressure_at_sea_level', 'air_temperature_2m', 'altitude', 'cloud_area_fraction', 'integral_of_surface_downwelling_shortwave_flux_in_air_wrt_time', 'land_area_fraction', 'latitude', 'longitude', 'precipitation_amount', 'relative_humidity_2m', 'wind_direction_10m', 'wind_speed_10m'])

In [4]:
lon =  10.7166638
lat =  59.933329

In [5]:
proj_args = nc.variables["projection_lcc"].proj4
p = proj.Proj(str(proj_args))

xp,yp = p(lon,lat)
lats = nc.variables["latitude"][:]
lons = nc.variables["longitude"][:]
xps,yps = p(lons,lats)

In [6]:
x = (np.abs(xps[0,:]-xp)).argmin()
y = (np.abs(yps[:,0]-yp)).argmin()
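The cell above projects every grid point to Lambert conformal coordinates and reads the x index from the first row and the y index from the first column, which relies on the projected grid axes being aligned. An alternative sketch that needs no pyproj works directly on the 2D `latitude`/`longitude` arrays, using an equirectangular distance approximation (longitude differences scaled by cos(lat)), which we assume is adequate for picking the nearest cell on a 1 km grid. `nearest_grid_index` is hypothetical and is demonstrated on a synthetic grid.

```python
import numpy as np


def nearest_grid_index(lats, lons, lat, lon):
    """(row, col) = (y, x) of the grid cell whose centre is closest to (lat, lon)."""
    dlat = lats - lat
    dlon = (lons - lon) * np.cos(np.radians(lat))
    dist2 = dlat ** 2 + dlon ** 2
    return np.unravel_index(np.argmin(dist2), dist2.shape)


# Synthetic 0.1 degree grid: rows vary in latitude, columns in longitude
grid_lats, grid_lons = np.meshgrid(np.linspace(59, 61, 21), np.linspace(10, 12, 21), indexing="ij")
row, col = nearest_grid_index(grid_lats, grid_lons, 59.93, 10.72)
# In the notebook: y, x = nearest_grid_index(lats, lons, lat, lon)
```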

In [7]:
cftimes = netCDF4.num2date(nc.variables["time"][:], nc.variables["time"].units)

# Convert cftime objects to plain datetime.datetime (minute resolution)
datetimes = [datetime.datetime(t.year, t.month, t.day, t.hour, t.minute) for t in cftimes]
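If the `time` variable is stored as seconds since the Unix epoch (an assumption: inspect `nc.variables["time"].units` first), pandas can do the conversion in one vectorised call and return time-zone-aware UTC timestamps directly, which would make the later `tz_localize` step unnecessary. A sketch with a hypothetical helper and epoch values for 2020-09-01 and 2020-09-02 00Z:

```python
import pandas as pd


def epoch_seconds_to_utc(values, units):
    """Convert 'seconds since 1970-01-01 ...' values to tz-aware UTC timestamps.

    Only the 'seconds since 1970-01-01' prefix is checked; any time-zone
    offset in the units string is not inspected and UTC is assumed.
    """
    if not units.startswith("seconds since 1970-01-01"):
        raise ValueError(f"unsupported time units: {units!r}")
    return pd.to_datetime(values, unit="s", utc=True)


times = epoch_seconds_to_utc([1598918400, 1599004800], "seconds since 1970-01-01 00:00:00 +00:00")
# In the notebook: epoch_seconds_to_utc(nc.variables["time"][:], nc.variables["time"].units)
```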

In [8]:
timeseries = pd.DataFrame()

In [9]:
param = "air_temperature_2m"

# EXTRACT DATA
data = nc.variables[param][:,y,x]

# Dataframe for return
new_timeseries = pd.DataFrame({"referenceTime":datetimes, param:data})

#NOTE: Since the other data sources explicitly specify the time zone
# the tz is manually added to the datetime here
new_timeseries["referenceTime"] = new_timeseries["referenceTime"].dt.tz_localize(tz="UTC") 
            
# Outer joining dataset
if timeseries.empty:
    timeseries = new_timeseries
else:
    timeseries = pd.merge(timeseries.set_index("referenceTime"), new_timeseries.set_index("referenceTime")[param], how="outer", on="referenceTime")
    timeseries = timeseries.reset_index()


In [10]:
timeseries

|   | referenceTime             | air_temperature_2m |
|---|---------------------------|--------------------|
| 0 | 2020-09-01 00:00:00+00:00 | 280.25             |
| 1 | 2020-09-02 00:00:00+00:00 | 282.049988         |
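The extract-and-join cell is repeated once per parameter below. A sketch of the same step as a reusable function, using `DataFrame.merge` with `how="outer"` on the `referenceTime` column instead of the set_index/reset_index round trip; `add_parameter` is a hypothetical name, and `pd.to_datetime(..., utc=True)` treats naive datetimes as UTC, mirroring the `tz_localize` in the cell. The sample values are copied from the outputs in this notebook.

```python
import datetime

import pandas as pd


def add_parameter(timeseries, times, values, param):
    """Outer-join one parameter column onto the running time series table."""
    new = pd.DataFrame({"referenceTime": pd.to_datetime(times, utc=True), param: values})
    if timeseries.empty:
        return new
    return timeseries.merge(new, how="outer", on="referenceTime")


datetimes = [datetime.datetime(2020, 9, 1), datetime.datetime(2020, 9, 2)]
ts = pd.DataFrame()
ts = add_parameter(ts, datetimes, [280.25, 282.049988], "air_temperature_2m")
ts = add_parameter(ts, datetimes, [1.419991, 1.404837], "wind_speed_10m")
# In the notebook:
# for param in ["air_temperature_2m", "wind_speed_10m", "cloud_area_fraction"]:
#     ts = add_parameter(ts, datetimes, nc.variables[param][:, y, x], param)
```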


In [11]:
param = "wind_speed_10m"

# EXTRACT DATA
data = nc.variables[param][:,y,x]

# Dataframe for return
new_timeseries = pd.DataFrame({"referenceTime":datetimes, param:data})

#NOTE: Since the other data sources explicitly specify the time zone
# the tz is manually added to the datetime here
new_timeseries["referenceTime"] = new_timeseries["referenceTime"].dt.tz_localize(tz="UTC") 
            
# Outer joining dataset
if timeseries.empty:
    timeseries = new_timeseries
else:
    timeseries = pd.merge(timeseries.set_index("referenceTime"), new_timeseries.set_index("referenceTime")[param], how="outer", on="referenceTime")
    timeseries = timeseries.reset_index()


In [12]:
timeseries

|   | referenceTime             | air_temperature_2m | wind_speed_10m |
|---|---------------------------|--------------------|----------------|
| 0 | 2020-09-01 00:00:00+00:00 | 280.25             | 1.419991       |
| 1 | 2020-09-02 00:00:00+00:00 | 282.049988         | 1.404837       |
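The `air_temperature_2m` values (around 280) are evidently in kelvin, and Frost reports `cloud_area_fraction` in octas (0 to 8), whereas the gridded `cloud_area_fraction` is presumably a 0 to 1 fraction following the CF convention (an assumption; check the variable's `units` attribute). Before combining sources, a unit harmonisation step along these lines may help; both helpers are hypothetical.

```python
import numpy as np


def kelvin_to_celsius(values):
    """Convert temperatures from K to degrees Celsius."""
    return np.asarray(values, dtype=float) - 273.15


def octas_to_fraction(values):
    """Convert cloud cover in octas (0-8) to a 0-1 fraction."""
    return np.asarray(values, dtype=float) / 8.0


celsius = kelvin_to_celsius([280.25, 282.049988])
fraction = octas_to_fraction([0, 4, 8])
```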


In [14]:
param = 'cloud_area_fraction'

# EXTRACT DATA
data = nc.variables[param][:,y,x]

# Dataframe for return
new_timeseries = pd.DataFrame({"referenceTime":datetimes, param:data})

#NOTE: Since the other data sources explicitly specify the time zone
# the tz is manually added to the datetime here
new_timeseries["referenceTime"] = new_timeseries["referenceTime"].dt.tz_localize(tz="UTC") 
            
# Outer joining dataset
if timeseries.empty:
    timeseries = new_timeseries
else:
    timeseries = pd.merge(timeseries.set_index("referenceTime"), new_timeseries.set_index("referenceTime")[param], how="outer", on="referenceTime")
    timeseries = timeseries.reset_index()



In [15]:
timeseries

|   | referenceTime             | air_temperature_2m | wind_speed_10m | cloud_area_fraction |
|---|---------------------------|--------------------|----------------|---------------------|
| 0 | 2020-09-01 00:00:00+00:00 | 280.25             | 1.419991       | 0.0                 |
| 1 | 2020-09-02 00:00:00+00:00 | 282.049988         | 1.404837       | 0.0                 |


In [23]:
times = pd.date_range(start_time, end_time, freq="H")

In [25]:
times = times.tz_localize("UTC")

In [30]:
data = pd.DataFrame(times, columns=["time"])

|   | index | time                      |
|---|-------|---------------------------|
| 0 | 0     | 2020-01-01 00:00:00+00:00 |
| 1 | 1     | 2020-01-01 01:00:00+00:00 |
| 2 | 2     | 2020-01-01 02:00:00+00:00 |
| 3 | 3     | 2020-01-01 03:00:00+00:00 |
| 4 | 4     | 2020-01-01 04:00:00+00:00 |
| 5 | 5     | 2020-01-01 05:00:00+00:00 |
| 6 | 6     | 2020-01-01 06:00:00+00:00 |
| 7 | 7     | 2020-01-01 07:00:00+00:00 |
| 8 | 8     | 2020-01-01 08:00:00+00:00 |
| 9 | 9     | 2020-01-01 09:00:00+00:00 |
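A fixed UTC time grid like `times` is presumably meant for aligning the different sources. A sketch of that step, assuming each source has been collected into a table with a `referenceTime` column as above: reindexing onto the grid inserts NaN rows where a source has no value and drops timestamps that are off the grid. `align_to_grid` is hypothetical; the example uses a 6-hourly grid built with `periods` and the two analysis values from this notebook.

```python
import pandas as pd


def align_to_grid(timeseries, grid):
    """Reindex a table with a 'referenceTime' column onto a fixed time grid."""
    return (timeseries.set_index("referenceTime")
                      .reindex(grid)
                      .rename_axis("time")
                      .reset_index())


grid = pd.date_range(start="2020-09-01 00:00", end="2020-09-02 00:00", periods=5, tz="UTC")
ts = pd.DataFrame({
    "referenceTime": pd.to_datetime(["2020-09-01 00:00", "2020-09-02 00:00"], utc=True),
    "air_temperature_2m": [280.25, 282.049988],
})
aligned = align_to_grid(ts, grid)
# In the notebook: align_to_grid(timeseries, times)
```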
