<img src='https://www.actris.eu/sites/default/files/inline-images/Actris%20logo.png' width=200 align=right>

# ACTRIS DC 
## Search with ACTRIS Metadata REST API

The goal of this notebook is to show how to access data through the ACTRIS Metadata REST API. This is a machine-to-machine approach to data access. It suits cases where you plan to retrieve large amounts of data or want to work entirely through a programming interface.

Let's get started!

### Using ACTRIS metadata catalog REST API

ACTRIS metadata catalog REST API: https://prod-actris-md.nilu.no/index.html

The ACTRIS REST API uses the ACTRIS vocabulary for several of its search criteria. The vocabulary can be found here: https://vocabulary.actris.nilu.no/skosmos/actris_vocab/en/



**NB!** The ACTRIS REST API is currently undergoing upgrades, so you might experience some time-outs.
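Because time-outs can occur, it may help to wrap requests in a small retry helper. The sketch below is not part of the ACTRIS API: `get_json`, `BASE_URL` and the retry and back-off values are hypothetical choices. It only uses standard `requests` calls (`Session.get` with `timeout`, `raise_for_status`, `json`).

```python
import time

import requests

# Hypothetical constant: the base URL used throughout this notebook
BASE_URL = "https://prod-actris-md.nilu.no"


def get_json(path, retries=3, timeout=60, backoff=2.0, session=None):
    """GET BASE_URL/path and return the decoded JSON, retrying on failure.

    Hypothetical helper: retries on time-outs, connection errors, HTTP errors
    and invalid JSON, sleeping backoff * attempt seconds between attempts.
    """
    http = session if session is not None else requests.Session()
    url = f"{BASE_URL}/{path.lstrip('/')}"
    for attempt in range(1, retries + 1):
        try:
            response = http.get(url, timeout=timeout)
            response.raise_for_status()  # turn 4xx/5xx responses into an exception
            return response.json()       # raises ValueError on a non-JSON body
        except (requests.exceptions.RequestException, ValueError):
            if attempt == retries:
                raise  # give up and surface the last error
            time.sleep(backoff * attempt)
```

For example, `categories = get_json("Vocabulary/categories")` would replace the plain `requests.get(...).json()` calls used below. The linear back-off is a simple choice; tune `retries` and `timeout` to how slow the API is at the time.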

### Import libraries

In [39]:
# Library for working with tabular data (DataFrames)
import pandas as pd

# Libraries for working with JSON files, making HTTP requests, and handling file system operations
import json
import requests
import os

# Library for creating Python widgets
import ipywidgets as widgets

# Library for creating interactive plots
import plotly.express as px

## Browse the metadata archive

### Vocabulary categories

This is an example of how to browse the ACTRIS REST API metadata catalog and get familiar with its search elements. First, we look at the vocabulary categories defined in the API.

In [9]:
response = requests.get("https://prod-actris-md.nilu.no/Vocabulary/categories") # get all vocabulary categories in the metadata archive
archive = response.json()
df = pd.DataFrame(archive)
df.head() # '.head()' displays the first 5 rows; pass a number (e.g. df.head(20)) to see more, or display 'df' to see all rows

| | category | synchronized |
|---|---|---|
| 0 | compliance | True |
| 1 | constrainttype | False |
| 2 | contentattribute | True |
| 3 | contenttype | False |
| 4 | dataprotocol | False |


In [10]:
# dropdown widget, which allows you to see all the categories in the metadata archive
dropdown_categories = widgets.Dropdown(
    options=list(df['category'].sort_values()),
    value=list(df['category'])[0],
    description='Categories:',
    disabled=False,
)

display(dropdown_categories)


Each category can be explored further through its vocabulary values. Here we look at the instrument types.

In [11]:
# Vocabulary category values, choose from the above categories and explore the values. 

category = 'instrumenttype' # Gives all instrument categories
#category = 'contentattribute' # Gives all variable categories

response = requests.get(f"https://prod-actris-md.nilu.no/Vocabulary/{category}")  # get all vocabulary values for the chosen category
archive = response.json()
df = pd.DataFrame(archive)

df.head()

| | label | atom |
|---|---|---|
| 0 | absorption filter sampler | https://vocabulary.actris.nilu.no/actris_vocab... |
| 1 | absorption solution sampler | https://vocabulary.actris.nilu.no/actris_vocab... |
| 2 | absorption solution spectro-photometric sensor | https://vocabulary.actris.nilu.no/actris_vocab... |
| 3 | absorption tube | https://vocabulary.actris.nilu.no/actris_vocab... |
| 4 | adsorption tube | https://vocabulary.actris.nilu.no/actris_vocab... |
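Search elements are matched against vocabulary labels, so a misspelled label (for example "nephlometer" instead of "nephelometer") will not match anything. A minimal sketch of a pre-flight check, using only the standard library's `difflib`; `check_label` is a hypothetical helper, and "integrating nephelometer" is assumed to be the vocabulary spelling of the instrument queried later in this notebook.

```python
import difflib


def check_label(label, vocabulary_labels):
    """Return `label` unchanged if it is an exact vocabulary label.

    Hypothetical helper: otherwise raise ValueError listing up to three
    close matches found with difflib, to catch typos before querying the API.
    """
    labels = list(vocabulary_labels)
    if label in labels:
        return label
    suggestions = difflib.get_close_matches(label, labels, n=3, cutoff=0.6)
    raise ValueError(f"{label!r} is not in the vocabulary; close matches: {suggestions}")


# Labels from the instrumenttype table above, plus "integrating nephelometer"
# (assumed to be the vocabulary spelling of the instrument used later on)
instrument_labels = [
    "absorption filter sampler",
    "absorption solution sampler",
    "absorption tube",
    "adsorption tube",
    "integrating nephelometer",
]
print(check_label("absorption tube", instrument_labels))
```

In the notebook, pass the full label column instead of the sample list, e.g. `check_label("integrating nephelometer", df['label'])`.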


In [12]:
# dropdown widget for the category values chosen above
dropdown_category = widgets.Dropdown(
    options=list(df['label'].sort_values()),
    value=list(df['label'])[0],
    description='{}:'.format(category),
    disabled=False,
    
)

display(dropdown_category)


### Facilities

In [15]:
response = requests.get("https://prod-actris-md.nilu.no/Facilities") # get all facilities in metadata archive
archive = response.json()
df = pd.DataFrame(archive)

df.head()

| | num_id | identifier | name | lat | lon | alt | country_code | identifier_type | uri | wmo_region | active | contact_organisation | facility_type | actris_national_facility | actris_nf_uri |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2489 | 00LJ | Primorskaya | 43.629167 | 132.236944 | 85.0 | RU | other PID | https://prod-actris-md.nilu.no/facilities/00LJ | | | | | | |
| 1 | 2490 | 03MW | Hvasser | 59.066667 | 10.433333 | 35.0 | NO | other PID | https://prod-actris-md.nilu.no/facilities/03MW | | | | | | |
| 2 | 2491 | 03RG | Cottered | 51.966667 | -0.1 | | GB | other PID | https://prod-actris-md.nilu.no/facilities/03RG | | | | | | |
| 3 | 310 | 04ih | Anholt | 56.716667 | 11.516667 | 40.0 | DK | other PID | https://prod-actris-md.nilu.no/facilities/04ih | Europe | | | | | |
| 4 | 311 | 05sb | Ansbach | 49.25 | 10.583333 | 481.0 | DE | other PID | https://prod-actris-md.nilu.no/facilities/05sb | Europe | | | | | |


In [40]:
fig = px.scatter_geo(df, lat="lat", lon="lon", color="actris_national_facility", hover_name="name",
                     projection="natural earth", size_max=15, width=1000, height=500)

fig.update_layout(
    margin=dict(l=20, r=20, t=20, b=20),
)

fig.show()

In [41]:
# show all metadata for Spanish facilities
facilities_country = df[df['country_code']=='ES'] # select Spanish facilities
facilities_country.head() # show archive as table

| | num_id | identifier | name | lat | lon | alt | country_code | identifier_type | uri | wmo_region | active | contact_organisation | facility_type | actris_national_facility | actris_nf_uri |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 83 | 341 | 23j8 | Campisabalos | 41.27417 | -3.1425 | 1360.0 | ES | other PID | https://prod-actris-md.nilu.no/facilities/23j8 | Europe | | | | | |
| 89 | 346 | 290n | Risco Llamo | 39.516667 | -4.35 | 1241.0 | ES | other PID | https://prod-actris-md.nilu.no/facilities/290n | Europe | | | | | |
| 97 | 3867 | 2geA | Valladolid (Jardin Botanico) | 41.668889 | -4.733056 | 694.0 | ES | other PID | https://data.actris.eu/facility/2geA | Europe | True | | [observation platform, fixed] | False | |
| 124 | 362 | 39pk | Vic | 41.935 | 2.239722 | 496.0 | ES | other PID | https://prod-actris-md.nilu.no/facilities/39pk | Europe | | | | | |
| 139 | 178 | 3pb5 | El Arenosillo | 37.1 | -6.73333 | 41.0 | ES | other PID | https://dev-dc.actris.nilu.no/facility/3pb5 | Europe | True | | [observation platform, fixed] | False | https://actris-nf-labelling.out.ocp.fmi.fi/fac... |
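The country filter above is a boolean mask, and masks can be combined with `&` to narrow the selection further. A minimal sketch, using a small hand-built DataFrame whose values are copied from the output above; in the notebook you would apply the same masks to `df`. Using `.eq(True)` on the `active` column is a design choice so that missing values count as "not active".

```python
import pandas as pd

# A few columns from the Spanish facilities shown above (copied from that output)
facilities = pd.DataFrame(
    {
        "identifier": ["23j8", "290n", "2geA", "39pk", "3pb5"],
        "name": ["Campisabalos", "Risco Llamo", "Valladolid (Jardin Botanico)", "Vic", "El Arenosillo"],
        "country_code": ["ES"] * 5,
        "alt": [1360.0, 1241.0, 694.0, 496.0, 41.0],
        "active": [None, None, True, None, True],
    }
)

# Combine conditions with & (each wrapped in parentheses);
# .eq(True) treats missing values in 'active' as False
mask = (facilities["country_code"] == "ES") & facilities["active"].eq(True)
active_es = facilities.loc[mask, ["identifier", "name", "alt"]]

# A numeric condition works the same way, e.g. facilities above 1000 m
high_altitude = facilities[facilities["alt"] > 1000]
print(active_es)
print(high_altitude[["identifier", "name"]])
```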


### Providers

In [43]:
response = requests.get("https://prod-actris-md.nilu.no/Providers") # get all data providers in metadata archive
archive = response.json()
df = pd.DataFrame(archive)

# dropdown widget
dropdown_providers = widgets.Dropdown(
    options=list(df['name'].sort_values()),
    value=list(df['name'])[-1],
    description='Providers:',
    disabled=False,
)

display(dropdown_providers)


In [45]:
df[df['name']==dropdown_providers.value]

| | id | name | acronym | description | created |
|---|---|---|---|---|---|
| 6 | 14 | IN-SITU | IN-SITU | ACTRIS In situ data centre unit (In-Situ) | 2020-06-29T07:20:45.2311600Z |


## Accessing metadata

The full ACTRIS metadata catalog can be accessed at https://prod-actris-md.nilu.no/Metadata/, but this can take a long time. It is therefore best to search using the available search elements, such as instrument, country, station and provider.
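Search elements are encoded as path segments, and labels containing spaces must be percent-encoded (a space becomes `%20`). A minimal sketch of a URL builder using the standard library's `urllib.parse.quote`; `metadata_search_url` is hypothetical, and it assumes the segments follow the key/value order of the example query in this notebook (instrument, country, facility, page). Whether the API accepts every combination of omitted segments is not verified here, so check the Swagger page linked above.

```python
from urllib.parse import quote

BASE_URL = "https://prod-actris-md.nilu.no"


def metadata_search_url(instrument=None, country=None, facility=None, page=0):
    """Build a /metadata search URL from optional search elements.

    Hypothetical helper. Assumption: segments follow the order of the example
    query in this notebook; omitted (None) elements are simply left out.
    """
    segments = ["metadata"]
    for key, value in (("instrument", instrument), ("country", country), ("facility", facility)):
        if value is not None:
            # Percent-encode the value so labels with spaces become e.g. %20
            segments += [key, quote(str(value), safe="")]
    segments += ["page", str(page)]
    return f"{BASE_URL}/" + "/".join(segments)


print(metadata_search_url("integrating nephelometer", "ES", "5qss"))
```

Building URLs this way avoids hand-encoding mistakes, at the cost of hard-coding the segment order.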


In [53]:
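# search the catalogue for integrating nephelometer datasets at facility 5qss in Spain (first page of results)
response = requests.get("https://prod-actris-md.nilu.no/metadata/instrument/integrating%20nephelometer/country/ES/facility/5qss/page/0") 
response.raise_for_status() # raise an HTTPError for 4xx/5xx responses before trying to parse JSON
metadata_archive = response.json() 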

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
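The `page/0` segment in the query above suggests that results are paginated. A minimal sketch of collecting all pages; `iter_metadata_pages` is hypothetical and assumes that an empty list marks the end of the results, which is not confirmed here, so `max_pages` acts as a safety cap.

```python
def iter_metadata_pages(fetch_page, max_pages=50):
    """Yield metadata records page by page.

    Hypothetical helper. Assumptions: fetch_page(n) returns the decoded JSON
    list for page n, and an empty list marks the end of the results.
    max_pages is a safety cap in case that assumption does not hold.
    """
    for page in range(max_pages):
        records = fetch_page(page)
        if not records:
            return
        yield from records


# Possible usage with the query above (not run here):
# import requests
# url = "https://prod-actris-md.nilu.no/metadata/instrument/integrating%20nephelometer/country/ES/facility/5qss/page/{}"
# metadata_archive = list(iter_metadata_pages(lambda n: requests.get(url.format(n), timeout=60).json()))
```

Passing the fetch function in keeps the paging logic independent of how requests are made, so it can be combined with a retry wrapper.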

In [11]:
metadata_archive[0] # show metadata

NameError: name 'metadata_archive' is not defined

In [None]:
# Most of these keys contain a nested dictionary with metadata information.
# An example is md_metadata
md_list = []
for f in metadata_archive:
    md_list.append(f['md_metadata']) 
df_md_metadata = pd.DataFrame.from_records(md_list)

df_md_metadata.iloc[0] #only show first element in list of metadata

id                                                            203849
provider           {'name': 'IN-SITU', 'atom': 'http://localhost:...
file_identifier                                         P3HD-KWCT.nc
language                                                          en
hierarchy_level                                              dataset
online_resource                  {'linkage': 'http://ebas.nilu.no/'}
datestamp                               2024-06-13T22:00:00.0000000Z
created                                 2024-06-14T08:17:20.0000000Z
contact            [{'first_name': 'Markus', 'last_name': 'Fiebig...
Name: 0, dtype: object
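Nested dictionaries such as `provider` and `online_resource` stay packed inside single columns with `from_records`. As an alternative, `pd.json_normalize` expands nested dictionaries into dotted column names. A minimal sketch on one record shaped like the output above (values copied from it; the truncated provider `atom` field is left out); in the notebook you would pass `md_list` instead of `sample_md`.

```python
import pandas as pd

# One record shaped like the md_metadata output above (values copied from it)
sample_md = [
    {
        "id": 203849,
        "provider": {"name": "IN-SITU"},
        "file_identifier": "P3HD-KWCT.nc",
        "online_resource": {"linkage": "http://ebas.nilu.no/"},
        "contact": [{"first_name": "Markus", "last_name": "Fiebig"}],
    }
]

# json_normalize expands nested dictionaries into dotted column names;
# list-valued fields such as 'contact' are kept as lists in a single column
flat = pd.json_normalize(sample_md)
print(flat[["id", "provider.name", "online_resource.linkage"]])
```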

In [None]:
# Above, the column 'contact' includes more information about the contact person(s) for each dataset.

df_md_metadata.iloc[0]['contact'] # show contact information for first dataset

[{'first_name': 'Markus',
  'last_name': 'Fiebig',
  'organisation_name': 'NILU',
  'role_code': ['custodian'],
  'country_code': 'NO',
  'delivery_point': 'Instituttveien 18',
  'address_city': 'Kjeller',
  'administrative_area': 'Viken',
  'postal_code': 2007,
  'email': 'ebas@nilu.no',
  'position_name': 'Senior Scientist'}]
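To see the contacts for all datasets at once, `pd.json_normalize` can unpack the `contact` list into one row per contact while keeping a field from the parent record. A minimal sketch: the first record reuses values from the output above, while the second record (`hypothetical-2.nc`, last name `Example`) is invented purely for illustration. In the notebook you would pass `md_list`.

```python
import pandas as pd

sample_md = [
    {   # values copied from the contact output above
        "file_identifier": "P3HD-KWCT.nc",
        "contact": [{"first_name": "Markus", "last_name": "Fiebig",
                     "organisation_name": "NILU", "role_code": ["custodian"]}],
    },
    {   # hypothetical second dataset, for illustration only
        "file_identifier": "hypothetical-2.nc",
        "contact": [{"first_name": "Jane", "last_name": "Example",
                     "organisation_name": "NILU", "role_code": ["custodian"]}],
    },
]

# One row per (dataset, contact) pair; 'file_identifier' is copied from the parent record
contacts = pd.json_normalize(sample_md, record_path="contact", meta=["file_identifier"])
print(contacts[["file_identifier", "last_name", "organisation_name"]])
```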

In [None]:
# Another example of extracting metadata, here the content information.
files_list = []
for f in metadata_archive:
    content = f['md_content_information']
    files_list.append(content)
    
df_content_information = pd.DataFrame.from_records(files_list)
# Displays the content information for all datasets returned by the search above
df_content_information 


| | attribute_descriptions | content_type |
|---|---|---|
| 0 | [aerosol particle mass concentration] | physicalMeasurement |
| 1 | [aerosol particle light absorption coefficient] | physicalMeasurement |
| 2 | [aerosol particle light absorption coefficient] | physicalMeasurement |
| 3 | [aerosol particle light absorption coefficient] | physicalMeasurement |
| 4 | [aerosol particle light absorption coefficient] | physicalMeasurement |
| ... | ... | ... |
| 105 | [aerosol particle elemental carbon mass concen... | physicalMeasurement |
| 106 | [ozone mass concentration] | physicalMeasurement |
| 107 | [hydrogen amount fraction] | physicalMeasurement |
| 108 | [aerosol particle aluminium mass concentration... | physicalMeasurement |
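Each `attribute_descriptions` cell holds a list, so counting how many datasets report each variable needs one row per list element first. A minimal sketch with `DataFrame.explode` and `value_counts`, on a few rows copied from the table above; in the notebook you would use `df_content_information` instead of `content_df`.

```python
import pandas as pd

# A few rows copied from the content-information table above
content_df = pd.DataFrame(
    {
        "attribute_descriptions": [
            ["aerosol particle mass concentration"],
            ["aerosol particle light absorption coefficient"],
            ["aerosol particle light absorption coefficient"],
            ["ozone mass concentration"],
            ["hydrogen amount fraction"],
        ],
        "content_type": ["physicalMeasurement"] * 5,
    }
)

# explode() gives one row per list element, so a dataset with several
# attributes is counted once per attribute
counts = content_df.explode("attribute_descriptions")["attribute_descriptions"].value_counts()
print(counts)
```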


In [None]:
# Another example of extracting metadata, here the distribution information.
# The distribution information includes data format, dataset URL, protocol, restrictions and more.

files_list = []
for f in metadata_archive:
    distribution = f['md_distribution_information'][0] # first distribution entry of each dataset
    files_list.append(distribution)
    
df_distribution_information = pd.DataFrame.from_records(files_list)
df_distribution_information.iloc[0] # show the distribution information for the first dataset
# To see the distribution information for all datasets, remove .iloc[0]

data_format                                                       NETCDF
version_data_format                                                    4
dataset_url            https://thredds.nilu.no/thredds/dodsC/ebas_doi...
protocol                                                         OPeNDAP
function                                                       streaming
restriction                                               {'set': False}
transfersize                                                   2143439.0
Name: 0, dtype: object
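The distribution table can be filtered to the datasets that are served over OPeNDAP, which allows streaming access. A minimal sketch: the records below are hypothetical and the URLs are placeholders, not real dataset addresses (the real one is truncated in the output above), and `transfersize` is assumed to be in bytes. In the notebook you would filter `df_distribution_information` instead of `dist_df`.

```python
import pandas as pd

# Hypothetical records shaped like md_distribution_information above;
# the URLs and the second protocol value are placeholders for illustration
dist_df = pd.DataFrame(
    [
        {"data_format": "NETCDF", "dataset_url": "https://example.org/dodsC/a.nc",
         "protocol": "OPeNDAP", "transfersize": 2143439.0},
        {"data_format": "NETCDF", "dataset_url": "https://example.org/fileServer/a.nc",
         "protocol": "HTTP", "transfersize": 2143439.0},
    ]
)

opendap = dist_df[dist_df["protocol"] == "OPeNDAP"]
urls = opendap["dataset_url"].tolist()
# Assumption: transfersize is given in bytes
total_mb = opendap["transfersize"].sum() / 1e6
print(urls, round(total_mb, 2))
```

Each URL in `urls` could then be opened lazily with `xarray.open_dataset(url)`, assuming xarray and an OPeNDAP-capable backend (such as netCDF4) are installed.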