# COSMOS-UK API: Example Python snippets for accessing data
The COSMOS-UK soil moisture monitoring network released a data API in March 2023 to allow open access to the near-real-time data collected from the network of sites across the UK.

This notebook provides a basic introduction to how you might interact with the API programmatically, for example to automate the process of bringing COSMOS-UK data into your workflows. More comprehensive documentation will be available soon.

More information about COSMOS-UK can be found on our website: https://cosmos.ceh.ac.uk/

A detailed general user guide for COSMOS-UK, including information about sensors, quality control checks and other information about how our data is collected, is available here: https://cosmos.ceh.ac.uk/sites/default/files/COSMOS-UK%20User%20Guide%20v3.06.pdf

><span style="color:red">**Important**</span>  
>The API is currently in beta phase and no guarantees are given with respect to periods of unexpected downtime. Please send any bugs or issues found to: `ricsmi@ceh.ac.uk`, including the API URL used and a description of the problem you found.

><span style="color:blue">**Information**</span>  
>The data within the API is updated once per day at 03:00GMT. Therefore, data can be expected to be complete up to yesterday's date.

><span style="color:red">**Important**</span>  
>The last 10 days of data is considered 'unchecked' and may be subject to changes based on the routine weekly quality control checks that the COSMOS-UK team carry out.

If using COSMOS-UK data for scientific research/publication, we recommend using and citing the data available (up to the end of 2022) from the Environmental Information Data Centre archive (EIDC): https://catalogue.ceh.ac.uk/documents/5060cc27-0b5b-471b-86eb-71f96da0c80f

In [9]:
from datetime import datetime
import io
import json
import requests
import zipfile

import pandas as pd

In [9]:
BASE_URL = 'https://cosmos-api.ceh.ac.uk'

In [10]:
def get_api_response(url, csv=False):
    """ Helper function to send request to API and get the response 
    
    :param str url: The URL of the API request
    :param bool csv: Whether this is a CSV request. Default False. 
    :return: API response
    """ 
    # Send request and read response
    print(url)
    response = requests.get(url)

    if csv:
        return response
    else:
        # Decode from JSON to Python dictionary
        return json.loads(response.content)

# Data collections
The COSMOS-UK API can provide data for sites at two time steps, known as '**collections**':

- 30 minutes (30M)
- Daily (1D)

Each collection has its own set of variables (known as '**parameters**') for which data is available. You can find these parameters by interrogating the collection metadata with a call to the root collection URL:

## 30 minute (30M) parameters

In [11]:
collection_30M_url = f'{BASE_URL}/collections/30M'
collection_30M_meta = get_api_response(collection_30M_url)

# Get the information about the parameter names from the metadata dictionary
collection_30M_params = collection_30M_meta['parameter_names']

# Here, we're just wrangling the information into a more visually appealing format!
collection_30M_params_df = pd.DataFrame.from_dict(collection_30M_params)
collection_30M_params_df = collection_30M_params_df.T[['label']]
display(collection_30M_params_df)

https://cosmos-api.ceh.ac.uk/collections/30M


Unnamed: 0,label
g1,Soil Heat Flux 1
g2,Soil Heat Flux 2
lwin,Incoming Longwave Radiation
lwout,Outgoing Longwave Radiation
pa,Atmospheric Pressure
precip,Precipitation (pluvio)
precip_raine,Precipitation (rainE)
precip_tipping,Precipitation (tipping bucket)
q,Absolute Humidity
rh,Relative Humidity


## Daily (1D) parameters


In [12]:
collection_1D_url = f'{BASE_URL}/collections/1D'
collection_1D_meta = get_api_response(collection_1D_url)

# Get the information about the parameter names from the metadata dictionary
collection_1D_params = collection_1D_meta['parameter_names']

# Here, we're just wrangling the information into a more visually appealing format!
collection_1D_params_df = pd.DataFrame.from_dict(collection_1D_params)
collection_1D_params_df = collection_1D_params_df.T[['label']]
display(collection_1D_params_df)

https://cosmos-api.ceh.ac.uk/collections/1D


Unnamed: 0,label
albedo,Albedo
cosmos_vwc,COSMOS VWC
cts_mod_corr,COSMOS Neutron Counts (corrected)
d86_75m,D86 75M
g1,Soil Heat Flux 1
g2,Soil Heat Flux 2
lwin,Incoming Longwave Radiation
lwout,Outgoing Longwave Radiation
pa,Atmospheric Pressure
pe,Potential Evaporation


### Data flags parameters
Alongside each data parameter (as listed above), there is a corresponding 'flag' parameter. Each flag parameter has the same structure as its data parameter and holds information about each data value. A flag can currently take one of three values:

**`M`** : Missing - Where data has been lost and not infilled.

**`I`** : Infilled - Where missing data has been filled in directly using an infill method. The only infilling method currently in use is interpolation, used for gaps smaller than 10 values. More information is available in the supporting documentation of the EIDC data holding.

**`E`** : Estimated - Where the value contains some degree of uncertainty. This can be for one of two reasons:

- The value was aggregated from less than a full set of data. For example, a daily mean temperature where some of the sub-daily values were missing.
- The value was aggregated or derived with some degree of infilled data. For example, if net radiation was calculated using an infilled short wave radiation value.

A null value in the flag parameter indicates a 'normal' data value.

><span style="color:red">**Important**</span>   
>The default is to always return the flag parameters alongside the data parameters. You can request only the data parameters by using the query parameter `flags=false`. See below for examples.
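
As a minimal sketch of how the flags might be used once the data is in a dataframe, the snippet below blanks out any value whose flag is set. It assumes the `<parameter>_flag` column naming that appears in the data responses in this notebook; the helper name `mask_flagged` and the small dataframe are made up for illustration.

```python
import pandas as pd


def mask_flagged(df, param, drop_flags=('M', 'I', 'E')):
    """Return the values of `param` with flagged entries replaced by NaN.

    Hypothetical helper: assumes the flag column is named '<param>_flag'.
    """
    flagged = df[f'{param}_flag'].isin(list(drop_flags))
    return df[param].mask(flagged)


# Illustrative values only, not a real API response
example = pd.DataFrame({
    'pe': [0.8, 1.3, 0.9],
    'pe_flag': [None, 'E', None],
})

clean_pe = mask_flagged(example, 'pe')
print(clean_pe)
```

Passing a shorter `drop_flags` tuple, for example `('M',)`, would keep the estimated and infilled values.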

### Site list
Both the 30M and 1D data collections contain data for the same set of COSMOS-UK sites. You can get information about the sites from the following call:

**Note**: 'End date' is the last date for which data is available for each site. For currently open sites, this will be the date this notebook was created. For closed sites, this will be the date the site was closed.

In [66]:
site_info_url = f'{BASE_URL}/collections/1D/locations'
site_info_response = get_api_response(site_info_url)

site_info = {}
for site in site_info_response['features']:
    site_id = site['id']
    site_name = site['properties']['label']
    coordinates = site['geometry']['coordinates']
    date_range = site['properties']['datetime']
    start_date, end_date = date_range.split('/')
    
    site_info[site_id] = {'site_name': site_name, 
                          'coordinates': coordinates, 
                          'start_date': start_date, 
                          'end_date': end_date}    

site_info_df = pd.DataFrame.from_dict(site_info).T
display(site_info_df)

https://cosmos-api.ceh.ac.uk/collections/1D/locations


Unnamed: 0,site_name,coordinates,start_date,end_date
ALIC1,Alice Holt,"[51.153551, -0.858232]",2015-03-06T13:30:00Z,2023-04-17T00:00:00Z
BALRD,Balruddery,"[56.482297, -3.1114881]",2014-05-15T17:30:00Z,2023-04-17T00:00:00Z
BICKL,Bickley Hall,"[53.02635, -2.7005297]",2015-01-28T17:00:00Z,2023-04-17T00:00:00Z
BUNNY,Bunny Park,"[52.86073, -1.12685]",2015-01-27T00:30:00Z,2023-04-17T00:00:00Z
CARDT,Cardington,"[52.105601, -0.424644]",2015-06-24T10:00:00Z,2023-04-17T00:00:00Z
CGARW,Cwm Garw,"[51.951295, -4.746634]",2016-06-29T11:00:00Z,2023-04-17T00:00:00Z
CHIMN,Chimney Meadows,"[51.708021, -1.4787658]",2013-10-02T13:30:00Z,2023-04-17T00:00:00Z
CHOBH,Chobham Common,"[51.367821, -0.597484]",2015-02-24T14:00:00Z,2023-04-17T00:00:00Z
COCHN,Cochno,"[55.941421, -4.4035431]",2017-08-23T00:00:00Z,2020-11-16T00:00:00Z
COCLP,Cockle Park,"[55.216013, -1.6943736]",2014-11-21T14:30:00Z,2023-04-17T00:00:00Z
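
One possible use of the end dates is to pick out closed sites. The sketch below uses three rows copied from the table above and treats a site as closed if its end date is earlier than the most recent end date across all sites. That rule is an assumption made for illustration, not something the API reports directly.

```python
import pandas as pd

# Three rows copied from the site table above
sites = pd.DataFrame(
    {
        'site_name': ['Alice Holt', 'Cochno', 'Cockle Park'],
        'end_date': ['2023-04-17T00:00:00Z',
                     '2020-11-16T00:00:00Z',
                     '2023-04-17T00:00:00Z'],
    },
    index=['ALIC1', 'COCHN', 'COCLP'],
)

# Assumption: a site counts as closed if its end date is earlier than
# the latest end date found across all sites
end_dates = pd.to_datetime(sites['end_date'])
closed_sites = sites[end_dates < end_dates.max()]
print(closed_sites.index.to_list())
```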


---
# Data collection queries

To get hold of the data for COSMOS-UK sites, there are a number of different '*queries*' that you can use against each data collection. They are:

- Location / site ID
- Position
- Cube
- Radius

Data responses can either be in JSON or CSV format. First we will look at the JSON response - **this is the default**. The JSON response is essentially loaded as a nested Python dictionary. We can access the parts of the dictionary we need to build a dataframe of the COSMOS-UK data. The following is a function that we can re-use to wrangle the dictionary into a Pandas dataframe:

In [19]:
def read_json_collection_data(json_response):
    """ Wrangle the response JSON from a COSMOS-API data collection request into a more usable format - in this case a Pandas Dataframe
    
    :param dict json_response: The JSON response dictionary returned from a COSMOS-API data collection request
    :return: Dataframe of data
    :rtype: pd.DataFrame
    """
    # The response is a list of dictionaries, one for each requested site
    
    # You can choose how you want to build your dataframes.  Here, I'm just loading all stations into one big dataframe.  
    # But you could modify this for your own use cases.  For example you might want to build a dictionary of {site_id: dataframe} 
    # to keep site data separate, etc.
    master_df = pd.DataFrame()
    
    for site_data in json_response['coverages']:
        # Read the site ID
        site_id = site_data['dct:identifier']
        
        # Read the time stamps of each data point
        time_values = pd.DatetimeIndex(site_data['domain']['axes']['t']['values'])
        
        # Now read the values for each requested parameter at each of the time stamps
        param_values = {param_name: param_data['values'] for param_name, param_data in site_data['ranges'].items()}
    
        # And put everything into a dataframe
        site_df = pd.DataFrame.from_dict(param_values)
        site_df['datetime'] = time_values
        site_df['site_id'] = site_id
        
        site_df = site_df.set_index(['datetime', 'site_id']) 
        master_df = pd.concat([master_df, site_df])
    
    return master_df
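
If you would rather keep each site separate, as suggested in the comments above, one option is to split the combined dataframe afterwards. This is a sketch only: the small dataframe below stands in for the output of `read_json_collection_data`, with the same `datetime` and `site_id` index levels.

```python
import pandas as pd

# Illustrative stand-in for the dataframe built by read_json_collection_data
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp('2023-03-01', tz='UTC'), 'CHIMN'),
        (pd.Timestamp('2023-03-02', tz='UTC'), 'CHIMN'),
        (pd.Timestamp('2023-03-01', tz='UTC'), 'WIMPL'),
    ],
    names=['datetime', 'site_id'],
)
combined = pd.DataFrame({'albedo': [0.182, 0.209, 0.181]}, index=index)

# Split into {site_id: dataframe}, dropping the now-redundant site_id level
site_dfs = {
    site_id: site_df.droplevel('site_id')
    for site_id, site_df in combined.groupby(level='site_id')
}
print(site_dfs['CHIMN'])
```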

## Location / site ID query
`/collections/<collection_id>/locations/<site_id>`

The location query fetches data for a given site ID. You can constrain the query by specifying a given date or date range and one or more parameter names.

The default without additional query parameters is to return data for all parameters (including flags), for the latest single timestep available.

In [20]:
site_id = 'CHIMN'
query_url = f'{BASE_URL}/collections/1D/locations/{site_id}'
resp = get_api_response(query_url)

df = read_json_collection_data(resp)
display(df)

https://cosmos-api.ceh.ac.uk/collections/1D/locations/CHIMN


Unnamed: 0_level_0,Unnamed: 1_level_0,albedo,albedo_flag,cosmos_vwc,cosmos_vwc_flag,cts_mod_corr,cts_mod_corr_flag,d86_75m,d86_75m_flag,g1,g1_flag,...,tdt1_vwc,tdt1_vwc_flag,tdt2_tsoil,tdt2_tsoil_flag,tdt2_vwc,tdt2_vwc_flag,wd,wd_flag,ws,ws_flag
datetime,site_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
2023-04-12 00:00:00+00:00,CHIMN,0.199,,49.5,,846.21582,,13.2629,,-6.7,E,...,41.5,,8.8,,43.0,,234.7,,4.7,


Let's add some query parameters to select a date range. There are 4 ways to specify a date query parameter:

- A single date in the format: `datetime=<date>`
- A date range in the format: `datetime=<start_date>/<end_date>`
- A date range with an open end (i.e. fetch all from start date): `datetime=<start_date>..`
- A date range with an open beginning (i.e. fetch all up to end date): `datetime=..<end_date>`

Note that dates always need to be provided in the format: "*YYYY-mm-ddTHH:MM:SSZ*", for example: "2023-03-31T10:30:00Z"

In [22]:
def format_datetime(dt):
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


# An example of using specific start and end dates
start_date = format_datetime(datetime(2023, 3, 1))
end_date = format_datetime(datetime(2023, 3, 6))
query_date_range = f'{start_date}/{end_date}'

query_url = f'{BASE_URL}/collections/1D/locations/{site_id}?datetime={query_date_range}'
resp = get_api_response(query_url)

df = read_json_collection_data(resp)
display(df)

https://cosmos-api.ceh.ac.uk/collections/1D/locations/CHIMN?datetime=2023-03-01T00:00:00Z/2023-03-06T00:00:00Z


Unnamed: 0_level_0,Unnamed: 1_level_0,albedo,albedo_flag,cosmos_vwc,cosmos_vwc_flag,cts_mod_corr,cts_mod_corr_flag,d86_75m,d86_75m_flag,g1,g1_flag,...,tdt1_vwc,tdt1_vwc_flag,tdt2_tsoil,tdt2_tsoil_flag,tdt2_vwc,tdt2_vwc_flag,wd,wd_flag,ws,ws_flag
datetime,site_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
2023-03-01 00:00:00+00:00,CHIMN,0.182,,43.3,,870.3839,,14.04427,,0.3,E,...,36.5,,5.4,,39.3,,33.2,,3.4,
2023-03-02 00:00:00+00:00,CHIMN,0.209,,42.6,,873.54724,,14.14392,,-1.9,E,...,36.4,,5.6,,39.3,,45.5,,3.4,
2023-03-03 00:00:00+00:00,CHIMN,0.195,,43.0,,871.49535,,14.08666,,-2.3,E,...,36.3,,5.4,,39.2,,34.8,,2.6,
2023-03-04 00:00:00+00:00,CHIMN,0.193,,41.6,,877.72452,,14.29089,,-2.7,E,...,36.4,,5.4,,39.1,,9.6,,2.4,
2023-03-05 00:00:00+00:00,CHIMN,0.194,,44.3,,865.99066,,13.90633,,-3.8,E,...,36.3,,5.2,,39.1,,277.0,,1.7,
2023-03-06 00:00:00+00:00,CHIMN,0.199,,42.6,,873.31374,,14.14392,,-1.0,E,...,36.2,,5.2,,39.0,,253.4,,2.5,
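
The example above uses a closed date range. As a sketch, the four date forms can be built side by side as shown below. The helper `to_api_datetime` is a stand-alone equivalent of `format_datetime`, redefined here only so the snippet runs on its own.

```python
from datetime import datetime


def to_api_datetime(dt):
    """Format a datetime in the YYYY-mm-ddTHH:MM:SSZ form the API expects."""
    return dt.strftime('%Y-%m-%dT%H:%M:%SZ')


start = to_api_datetime(datetime(2023, 3, 1))
end = to_api_datetime(datetime(2023, 3, 6))

datetime_queries = {
    'single': f'datetime={start}',
    'range': f'datetime={start}/{end}',
    'open_end': f'datetime={start}..',
    'open_start': f'datetime=..{end}',
}

for name, query in datetime_queries.items():
    print(f'{name}: {query}')
```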


We can constrain the data response further by providing a list of parameter names.

Remember, the default is to always return the flag parameters alongside the data parameters:

In [23]:
# Specify a subset of parameter names
param_names = ['albedo', 'cosmos_vwc', 'pe']

query_url = f'{BASE_URL}/collections/1D/locations/{site_id}?datetime={query_date_range}&parameter-name={",".join(param_names)}'
resp = get_api_response(query_url)

df = read_json_collection_data(resp)
display(df)

https://cosmos-api.ceh.ac.uk/collections/1D/locations/CHIMN?datetime=2023-03-01T00:00:00Z/2023-03-06T00:00:00Z&parameter-name=albedo,cosmos_vwc,pe


Unnamed: 0_level_0,Unnamed: 1_level_0,albedo,albedo_flag,cosmos_vwc,cosmos_vwc_flag,pe,pe_flag
datetime,site_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2023-03-01 00:00:00+00:00,CHIMN,0.182,,43.3,,0.8,E
2023-03-02 00:00:00+00:00,CHIMN,0.209,,42.6,,1.3,E
2023-03-03 00:00:00+00:00,CHIMN,0.195,,43.0,,0.8,E
2023-03-04 00:00:00+00:00,CHIMN,0.193,,41.6,,0.9,E
2023-03-05 00:00:00+00:00,CHIMN,0.194,,44.3,,0.7,E
2023-03-06 00:00:00+00:00,CHIMN,0.199,,42.6,,0.9,E


By specifying 'flags=false', we can return only the data parameters:

In [24]:
# Add flags=false to the query:
query_url = f'{BASE_URL}/collections/1D/locations/{site_id}?datetime={query_date_range}&parameter-name={",".join(param_names)}&flags=false'
resp = get_api_response(query_url)

df = read_json_collection_data(resp)
display(df)

https://cosmos-api.ceh.ac.uk/collections/1D/locations/CHIMN?datetime=2023-03-01T00:00:00Z/2023-03-06T00:00:00Z&parameter-name=albedo,cosmos_vwc,pe&flags=false


Unnamed: 0_level_0,Unnamed: 1_level_0,albedo,cosmos_vwc,pe
datetime,site_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2023-03-01 00:00:00+00:00,CHIMN,0.182,43.3,0.8
2023-03-02 00:00:00+00:00,CHIMN,0.209,42.6,1.3
2023-03-03 00:00:00+00:00,CHIMN,0.195,43.0,0.8
2023-03-04 00:00:00+00:00,CHIMN,0.193,41.6,0.9
2023-03-05 00:00:00+00:00,CHIMN,0.194,44.3,0.7
2023-03-06 00:00:00+00:00,CHIMN,0.199,42.6,0.9
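
The query URLs above were assembled by hand. If you build many of them, a small helper may be tidier. The function below is a hypothetical sketch, not part of the API or any client library. It uses `urllib.parse.urlencode` and leaves `/`, `:` and `,` unescaped so that the result matches the hand-built URLs.

```python
from urllib.parse import urlencode

BASE_URL = 'https://cosmos-api.ceh.ac.uk'


def build_location_url(collection, site_id, datetime_range=None,
                       param_names=None, flags=True):
    """Assemble a location query URL from optional pieces (hypothetical helper)."""
    params = {}
    if datetime_range is not None:
        params['datetime'] = datetime_range
    if param_names:
        params['parameter-name'] = ','.join(param_names)
    if not flags:
        params['flags'] = 'false'

    url = f'{BASE_URL}/collections/{collection}/locations/{site_id}'
    if params:
        # Keep '/', ':' and ',' unescaped so the URL matches the hand-built ones
        url += '?' + urlencode(params, safe='/:,')
    return url


url = build_location_url(
    '1D', 'CHIMN',
    datetime_range='2023-03-01T00:00:00Z/2023-03-06T00:00:00Z',
    param_names=['albedo', 'cosmos_vwc', 'pe'],
    flags=False,
)
print(url)
```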


---
## Position query
`/collections/<collection_id>/position?<coord(s)>`

The position query fetches data for the site(s) nearest to the given coordinates. You can specify coordinates in 2 ways:

- A single POINT coordinate to return just one site
- A set of MULTIPOINT coordinates. This is currently the recommended way of getting data for multiple named sites; the method for doing so is explored below.

The default coordinate reference system is WGS84 (i.e. specifying latitude and longitude coordinates). Alternatively, you can specify `crs=OSGB36` as an additional query parameter to use British National Grid coordinates.

As with the location query, you can constrain the query further by specifying a given date or date range and one or more parameter names.

### Single point query

In [25]:
x, y = (-1.12685, 52.86073)
coords = f'POINT({x} {y})'

query_url = f'{BASE_URL}/collections/1D/position?coords={coords}'
resp = get_api_response(query_url)

df = read_json_collection_data(resp)
display(df)

https://cosmos-api.ceh.ac.uk/collections/1D/position?coords=POINT(-1.12685 52.86073)


Unnamed: 0_level_0,Unnamed: 1_level_0,albedo,albedo_flag,cosmos_vwc,cosmos_vwc_flag,cts_mod_corr,cts_mod_corr_flag,d86_75m,d86_75m_flag,g1,g1_flag,...,tdt1_vwc,tdt1_vwc_flag,tdt2_tsoil,tdt2_tsoil_flag,tdt2_vwc,tdt2_vwc_flag,wd,wd_flag,ws,ws_flag
datetime,site_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
2023-04-12 00:00:00+00:00,BUNNY,0.202,,27.7,,1809.57369,,15.11006,,-6.4,E,...,26.6,,9.0,,23.0,,155.4,,4.1,


### Multiple point query

In [26]:
x1, y1 = (-1.12685, 52.86073)
x2, y2 = (-0.8264188, 52.610159)
coords = f'MULTIPOINT(({x1} {y1}),({x2} {y2}))'

query_url = f'{BASE_URL}/collections/1D/position?coords={coords}'
resp = get_api_response(query_url)

df = read_json_collection_data(resp)
display(df)

https://cosmos-api.ceh.ac.uk/collections/1D/position?coords=MULTIPOINT((-1.12685 52.86073),(-0.8264188 52.610159))


Unnamed: 0_level_0,Unnamed: 1_level_0,albedo,albedo_flag,cosmos_vwc,cosmos_vwc_flag,cts_mod_corr,cts_mod_corr_flag,d86_75m,d86_75m_flag,g1,g1_flag,...,tdt1_vwc,tdt1_vwc_flag,tdt2_tsoil,tdt2_tsoil_flag,tdt2_vwc,tdt2_vwc_flag,wd,wd_flag,ws,ws_flag
datetime,site_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
2023-04-12 00:00:00+00:00,LODTN,0.182,,65.2,,1444.10042,,14.35398,,-12.3,E,...,55.3,,8.0,,55.7,,229.1,,5.7,
2023-04-12 00:00:00+00:00,BUNNY,0.202,,27.7,,1809.57369,,15.11006,,-6.4,E,...,26.6,,9.0,,23.0,,155.4,,4.1,


### Position query for specific sites
In the examples above, we had to know the coordinates of the sites we are interested in. However, we could write a function to get those coordinates for us, for a set of sites we are interested in. This would use the site information that we extracted earlier on in this notebook.

In [27]:
def get_site_coords_query_string(site_ids):
    """ Get coordinates query string for given site ids for use in position queries on the COSMOS-UK API
    
    :param list site_ids: A list of site IDs to get coordinates for
    :return: Query string for given site coordinates
    :rtype: str
    """    
    site_coords = site_info_df.loc[site_ids]['coordinates'].to_list()
    
    if len(site_ids) == 1:
        y, x = site_coords[0]
        site_coords_str = f'POINT({x} {y})'
    else:
        points = [f'({x} {y})' for y, x in site_coords]
        site_coords_str = f'MULTIPOINT({",".join(points)})'
    
    return site_coords_str


site_ids = ['CHIMN', 'MOORH', 'WIMPL']
coords = get_site_coords_query_string(site_ids)

start_date = format_datetime(datetime(2023, 3, 1))
end_date = format_datetime(datetime(2023, 3, 3))
query_date_range = f'{start_date}/{end_date}'

query_url = f'{BASE_URL}/collections/1D/position?coords={coords}&datetime={query_date_range}'
resp = get_api_response(query_url)

df = read_json_collection_data(resp)
display(df)

https://cosmos-api.ceh.ac.uk/collections/1D/position?coords=MULTIPOINT((-1.4787658 51.708021),(-2.4678 54.659417),(-0.044411 52.132078))&datetime=2023-03-01T00:00:00Z/2023-03-03T00:00:00Z


Unnamed: 0_level_0,Unnamed: 1_level_0,albedo,albedo_flag,cosmos_vwc,cosmos_vwc_flag,cts_mod_corr,cts_mod_corr_flag,d86_75m,d86_75m_flag,g1,g1_flag,...,tdt1_vwc,tdt1_vwc_flag,tdt2_tsoil,tdt2_tsoil_flag,tdt2_vwc,tdt2_vwc_flag,wd,wd_flag,ws,ws_flag
datetime,site_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
2023-03-01 00:00:00+00:00,WIMPL,0.181,,37.4,,1570.37808,,16.57967,,2.2,E,...,38.2,,6.0,,38.4,,27.2,,2.2,
2023-03-02 00:00:00+00:00,WIMPL,0.196,,40.9,,1541.40998,,15.9557,,-0.3,E,...,38.1,,6.0,,38.5,,38.5,,2.6,
2023-03-03 00:00:00+00:00,WIMPL,0.179,,37.9,,1565.63872,,16.48513,,-1.7,E,...,38.1,,6.0,,38.3,,20.0,,2.2,
2023-03-01 00:00:00+00:00,MOORH,0.159,,70.7,,1306.83835,,24.80186,,-3.1,E,...,71.8,,4.1,,58.3,,41.8,,9.7,
2023-03-02 00:00:00+00:00,MOORH,0.166,,77.1,,1294.0727,,24.01692,,-2.3,E,...,73.8,,4.0,,59.6,,60.0,,5.6,
2023-03-03 00:00:00+00:00,MOORH,0.175,,86.3,,1278.52477,,23.06919,,-3.1,E,...,73.1,,4.0,,58.8,,37.5,,3.5,
2023-03-01 00:00:00+00:00,CHIMN,0.182,,43.3,,870.3839,,14.04427,,0.3,E,...,36.5,,5.4,,39.3,,33.2,,3.4,
2023-03-02 00:00:00+00:00,CHIMN,0.209,,42.6,,873.54724,,14.14392,,-1.9,E,...,36.4,,5.6,,39.3,,45.5,,3.4,
2023-03-03 00:00:00+00:00,CHIMN,0.195,,43.0,,871.49535,,14.08666,,-2.3,E,...,36.3,,5.4,,39.2,,34.8,,2.6,


And just to prove the same works for the 30M collection:

In [29]:
query_url = f'{BASE_URL}/collections/30M/position?coords={coords}&datetime={query_date_range}'
resp = get_api_response(query_url)

df = read_json_collection_data(resp)
display(df)

https://cosmos-api.ceh.ac.uk/collections/30M/position?coords=MULTIPOINT((-1.4787658 51.708021),(-2.4678 54.659417),(-0.044411 52.132078))&datetime=2023-03-01T00:00:00Z/2023-03-03T00:00:00Z


Unnamed: 0_level_0,Unnamed: 1_level_0,g1,g1_flag,g2,g2_flag,lwin,lwin_flag,lwout,lwout_flag,pa,pa_flag,...,tdt1_vwc,tdt1_vwc_flag,tdt2_tsoil,tdt2_tsoil_flag,tdt2_vwc,tdt2_vwc_flag,wd,wd_flag,ws,ws_flag
datetime,site_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
2023-03-01 00:00:00+00:00,WIMPL,-6.21257,,-6.83610,,325.2,,335.9,,1030.191,,...,38.08,,5.74,,38.48,,15.72754,,2.040,
2023-03-01 00:30:00+00:00,WIMPL,-5.85000,I,-6.41000,I,324.9,,335.7,,1030.349,,...,38.15,,5.72,,38.43,,15.74277,,2.135,
2023-03-01 01:00:00+00:00,WIMPL,-5.65000,I,-6.29000,I,326.0,,335.9,,1030.305,,...,38.08,,5.69,,38.31,,28.80631,,2.359,
2023-03-01 01:30:00+00:00,WIMPL,-5.71789,,-6.42189,,324.1,,335.4,,1030.105,,...,38.03,,5.68,,38.31,,33.97706,,2.170,
2023-03-01 02:00:00+00:00,WIMPL,-6.08754,,-6.79954,,323.2,,334.7,,1029.960,,...,38.15,,5.67,,38.37,,49.00548,,1.688,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-03-02 22:00:00+00:00,CHIMN,-7.88240,,-12.06142,,301.2,,332.0,,1019.836,,...,36.27,,5.62,,39.28,,49.02252,,4.855,
2023-03-02 22:30:00+00:00,CHIMN,-7.38722,,-11.22949,,285.3,,328.7,,1019.797,,...,36.27,,5.60,,39.33,,54.77137,,3.575,
2023-03-02 23:00:00+00:00,CHIMN,-7.32857,,-11.32586,,288.2,,328.1,,1019.747,,...,36.27,,5.55,,39.05,,45.94635,,3.389,
2023-03-02 23:30:00+00:00,CHIMN,-7.36281,,-11.42386,,296.7,,328.7,,1019.693,,...,36.27,,5.53,,39.21,,40.50465,,3.066,


---
## Cube and radius queries
Two other methods are available for requesting multiple sites. However, these are potentially less useful than specifying the position coordinates (as shown above). They have been included in the API for completeness and conformance with the OGC Environmental Data Retrieval API standards.

#### Cube
`/collections/<collection_id>/cube?<bbox>`

The cube query fetches data for sites within a given bounding box. The bounding box is given in the format: bbox=<min x>,<min y>,<max x>,<max y>

For example: `/collections/1D/cube?bbox=-0.8,50,0,53`

#### Radius
`/collections/<collection_id>/radius?<coord(s)>`

The radius query fetches data for sites within a circle of a given radius, centred at a given point (or points). The radius query parameter can be omitted, in which case this performs as described in the Position query above.

For example: `/collections/30M/radius?coords=POINT(-1 52)&within=10&within-units=km`

This fetches data for all sites within a 10km radius of the given point.

As with all the other queries, you can constrain the cube and radius queries further by specifying a given date or date range and one or more parameter names.
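
As a sketch, the two example URLs can be built in the same style as the earlier queries. The snippet only constructs and prints the URLs. The point is written in the space-separated `POINT(x y)` form used in the position examples.

```python
BASE_URL = 'https://cosmos-api.ceh.ac.uk'

# Cube: bounding box given as min x, min y, max x, max y
min_x, min_y, max_x, max_y = (-0.8, 50, 0, 53)
cube_url = f'{BASE_URL}/collections/1D/cube?bbox={min_x},{min_y},{max_x},{max_y}'

# Radius: all sites within 10 km of a point
x, y = (-1, 52)
radius_url = (
    f'{BASE_URL}/collections/30M/radius'
    f'?coords=POINT({x} {y})&within=10&within-units=km'
)

print(cube_url)
print(radius_url)
```

Either URL could then be passed to `get_api_response` in the same way as the location and position queries.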

---

## CSV responses
All of the data collection queries described above can return CSV file(s) instead of JSON if required. Simply add `f=csv` as an additional query parameter to the query URL.

A CSV file per site is created and returned within a zip archive.

## Create a CSV query URL for multiple sites

In [30]:
site_ids = ['CHIMN', 'MOORH', 'WIMPL']
coords = get_site_coords_query_string(site_ids)

start_date = format_datetime(datetime(2023, 3, 1))
end_date = format_datetime(datetime(2023, 3, 3))
query_date_range = f'{start_date}/{end_date}'

query_url = f'{BASE_URL}/collections/1D/position?coords={coords}&datetime={query_date_range}&f=csv'
resp = get_api_response(query_url, csv=True)

zip_file = zipfile.ZipFile(io.BytesIO(resp.content))

https://cosmos-api.ceh.ac.uk/collections/1D/position?coords=MULTIPOINT((-1.4787658 51.708021),(-2.4678 54.659417),(-0.044411 52.132078))&datetime=2023-03-01T00:00:00Z/2023-03-03T00:00:00Z&f=csv


## List files within the returned zip archive

In [31]:
for csv_name in zip_file.namelist():
    print(csv_name)

COSMOS_UK_WIMPL_1D_202303010000_202303030000.csv
COSMOS_UK_MOORH_1D_202303010000_202303030000.csv
COSMOS_UK_CHIMN_1D_202303010000_202303030000.csv


 
The CSV files are formatted with some metadata at the top.  You can read the CSV files into a Pandas dataframe with code such as:

In [32]:
# csv_name still holds the last file name printed by the loop above
csv_file = zip_file.open(csv_name)

df = pd.read_csv(csv_file, index_col=0, skiprows=[0, 1, 3, 4])
display(df)

Unnamed: 0_level_0,lwout,stp_tsoil2,q,stp_tsoil5,stp_tsoil10,stp_tsoil20,precip_tipping,swin,stp_tsoil50,rn,...,wd_flag,lwin_flag,stp_tsoil50_flag,tdt2_tsoil_flag,q_flag,lwout_flag,stp_tsoil20_flag,swout_flag,ta_max_flag,ta_min_flag
parameter-id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2023-03-01T00:00:00Z,29.4,5.2,5.9,5.2,5.2,5.3,1.0,5.5,6.0,3.3,...,,,,,,,,,,
2023-03-02T00:00:00Z,29.2,4.9,4.7,5.1,5.3,5.5,0.0,11.2,6.0,3.8,...,,,,,,,,,,
2023-03-03T00:00:00Z,29.2,4.8,4.7,5.0,5.1,5.4,0.0,3.3,6.1,1.2,...,,,,,,,,,,
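
To read every file in the archive rather than just one, something like the following sketch could be used. It assumes the file names follow the `COSMOS_UK_<site_id>_<collection>_...` pattern listed above and that every file has the same metadata rows. The helper name `read_csv_archive` and the tiny in-memory archive are made up so that the snippet runs without calling the API.

```python
import io
import zipfile

import pandas as pd


def read_csv_archive(zip_file, skiprows=(0, 1, 3, 4)):
    """Read every CSV in a zip archive into a {site_id: dataframe} dictionary.

    Hypothetical helper: assumes file names look like
    COSMOS_UK_<site_id>_<collection>_<start>_<end>.csv
    """
    site_dfs = {}
    for csv_name in zip_file.namelist():
        site_id = csv_name.split('_')[2]
        with zip_file.open(csv_name) as csv_file:
            site_dfs[site_id] = pd.read_csv(
                csv_file, index_col=0, skiprows=list(skiprows)
            )
    return site_dfs


# Placeholder file: the metadata rows are stand-ins, not the real contents
fake_csv = (
    'meta row 0\n'
    'meta row 1\n'
    'parameter-id,albedo,pe\n'
    'meta row 3\n'
    'meta row 4\n'
    '2023-03-01T00:00:00Z,0.182,0.8\n'
    '2023-03-02T00:00:00Z,0.209,1.3\n'
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('COSMOS_UK_CHIMN_1D_202303010000_202303020000.csv', fake_csv)

example_zip = zipfile.ZipFile(io.BytesIO(buffer.getvalue()))
site_dfs = read_csv_archive(example_zip)
print(site_dfs['CHIMN'])
```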


## Request limits

Given the large number of stations, variables, and timesteps, there is potential for data requests to the API to be very large, particularly for the 30 minute time step collection. Therefore, it is necessary for a limit to be set on the size of the data request so that the data can be returned in a timely manner without the API server or your local connection being overwhelmed.

The COSMOS-UK API has a 'credit' system to help you manage your data requests in a transparent way. The current credit limit is set to **15,000,000**. Credits are calculated by:

`(Number of sites) x (number of parameters) x (number of timesteps)`

><span style="color:red">**Important**</span>  
>Remember that the default is to always return the flag parameters alongside the data parameters. This will effectively double the number of parameters in the request, so bear that in mind when working out request credit totals.

Example representative credit totals:

- 1 site, all parameters (including flags), 1 year of 30 minute data = 1 * 50 * 17,520 = 876,000 credits
- 1 site, all parameters (including flags), 10 years of 30 minute data = 1 * 50 * 175,200 = 8,760,000 credits
- All sites, all parameters (including flags), 10 years of daily data = 51 * 68 * 3,650 = 12,658,200 credits
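
The example totals above can be reproduced with a few lines of Python, which may help when planning a request. This is a sketch: the function name `estimate_credits` is made up, and treating a total exactly equal to the limit as acceptable is an assumption.

```python
CREDIT_LIMIT = 15_000_000


def estimate_credits(n_sites, n_params, n_timesteps):
    """Credits = sites x parameters x timesteps."""
    return n_sites * n_params * n_timesteps


# One year of 30 minute data has 48 * 365 = 17,520 timesteps
one_year_30m = estimate_credits(1, 50, 48 * 365)
all_sites_daily = estimate_credits(51, 68, 3650)

for label, credits in [('1 site, 1 year, 30M', one_year_30m),
                       ('All sites, 10 years, 1D', all_sites_daily)]:
    # Assumption: a total equal to the limit is still accepted
    status = 'within' if credits <= CREDIT_LIMIT else 'over'
    print(f'{label}: {credits:,} credits ({status} limit)')
```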

A request that exceeds the credit limit will receive a 413 error code, with a description of how many credits the failed request used.
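
One way to surface this in your own code is to check the status code before decoding the response. The helper below is a hypothetical sketch that works on any object with `status_code` and `text` attributes, such as a `requests.Response`. The stand-in objects and their message text are placeholders, not the real API wording.

```python
from types import SimpleNamespace


def raise_if_over_limit(response):
    """Raise a clear error if the request was rejected with a 413 status.

    Hypothetical helper: expects `status_code` and `text` attributes.
    """
    if response.status_code == 413:
        raise RuntimeError(f'Request exceeded the credit limit: {response.text}')
    return response


# Stand-in responses so the sketch runs without calling the API
ok = SimpleNamespace(status_code=200, text='{}')
too_big = SimpleNamespace(status_code=413, text='placeholder description')

raise_if_over_limit(ok)
try:
    raise_if_over_limit(too_big)
except RuntimeError as err:
    error_message = str(err)
    print(error_message)
```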

In [70]:
# Request daily (1D) data for all sites within a bounding box covering the UK, for all parameters,
# over the current query date range. Without a datetime, only the latest day would be returned.
query_url = f'{BASE_URL}/collections/1D/cube?bbox=-9,49.75,2,61&datetime={query_date_range}'
resp = get_api_response(query_url)
#print(resp)
df = read_json_collection_data(resp)
display(df)

https://cosmos-api.ceh.ac.uk/collections/1D/cube?bbox=-9,49.75,2,61&datetime=2015-03-01T00:00:00Z/2023-03-03T00:00:00Z


Unnamed: 0_level_0,Unnamed: 1_level_0,albedo,albedo_flag,cosmos_vwc,cosmos_vwc_flag,cts_mod_corr,cts_mod_corr_flag,d86_75m,d86_75m_flag,g1,g1_flag,...,tdt1_vwc,tdt1_vwc_flag,tdt2_tsoil,tdt2_tsoil_flag,tdt2_vwc,tdt2_vwc_flag,wd,wd_flag,ws,ws_flag
datetime,site_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
2015-03-01 00:00:00+00:00,ALIC1,,,,,,,,,,,...,,,,,,,,,,
2015-03-02 00:00:00+00:00,ALIC1,,,,,,,,,,,...,,,,,,,,,,
2015-03-03 00:00:00+00:00,ALIC1,,,,,,,,,,,...,,,,,,,,,,
2015-03-04 00:00:00+00:00,ALIC1,,,,,,,,,,,...,,,,,,,,,,
2015-03-05 00:00:00+00:00,ALIC1,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-02-27 00:00:00+00:00,WYTH1,,,,M,,M,,M,,M,...,,M,,M,,M,,M,,M
2023-02-28 00:00:00+00:00,WYTH1,,,,M,,M,,M,,M,...,,M,,M,,M,,M,,M
2023-03-01 00:00:00+00:00,WYTH1,,,,M,,M,,M,,M,...,,M,,M,,M,,M,,M
2023-03-02 00:00:00+00:00,WYTH1,,,,M,,M,,M,,M,...,,M,,M,,M,,M,,M


### Request daily data for all sites (takes a few seconds)

In [41]:
# Request all sites daily data
all_sites = site_info_df.index

In [56]:
coords = get_site_coords_query_string(all_sites)

start_date = format_datetime(datetime(2015, 3, 1))
end_date = format_datetime(datetime(2023, 3, 3))
query_date_range = f'{start_date}/{end_date}'

query_url = f'{BASE_URL}/collections/1D/position?coords={coords}&datetime={query_date_range}'
resp = get_api_response(query_url)
#print(resp)
df = read_json_collection_data(resp)
display(df)

https://cosmos-api.ceh.ac.uk/collections/1D/position?coords=MULTIPOINT((-0.858232 51.153551),(-3.1114881 56.482297),(-2.7005297 53.02635),(-1.12685 52.86073),(-0.424644 52.105601),(-4.746634 51.951295),(-1.4787658 51.708021),(-0.597484 51.367821),(-4.4035431 55.941421),(-1.6943736 55.216013),(-3.583386 55.043057),(-3.207115 55.867392),(0.9930645 52.094647),(0.7847112 52.383178),(0.5107284 52.617773),(-7.291954 54.298468),(-2.384689 54.023711),(-2.5621547 56.914403),(-6.0045994 54.838087),(0.320276 51.228582),(-3.828995 55.810254),(-2.025488 55.216677),(-4.01255 53.225198),(-6.068526 54.446959),(-2.6621054 52.02088),(-0.959477 54.110665),(-2.079616 51.20277),(-5.19999 50.03266),(-0.8264188 52.610159),(0.18887 50.79372),(-2.4678 54.659417),(-1.563081 52.199407),(1.0342313 52.548146),(-3.905963 50.773479),(-3.762572 52.453337),(-1.681482 51.120071),(0.42104 52.44577),(0.4291348 51.26287),(-0.5259074 53.261647),(-0.378304 51.813787),(-1.481903 51.530244),(-2.229989 55.479876),(-1.31886 53.

datetime,site_id,albedo,albedo_flag,cosmos_vwc,cosmos_vwc_flag,cts_mod_corr,cts_mod_corr_flag,d86_75m,d86_75m_flag,g1,g1_flag,...,tdt1_vwc,tdt1_vwc_flag,tdt2_tsoil,tdt2_tsoil_flag,tdt2_vwc,tdt2_vwc_flag,wd,wd_flag,ws,ws_flag
2015-03-01 00:00:00+00:00,GISBN,0.171,,61.5,,1198.22255,,22.43854,,-5.3,E,...,77.4,,4.8,,77.0,,281.3,,2.5,
2015-03-02 00:00:00+00:00,GISBN,0.623,,58.8,,1205.99898,,22.83917,,-9.5,E,...,77.6,,3.9,,77.1,,283.6,,2.4,
2015-03-03 00:00:00+00:00,GISBN,0.526,,58.8,,1186.62496,,21.8202,,-8.0,E,...,77.7,,3.4,,77.2,,277.9,,2.3,
2015-03-04 00:00:00+00:00,GISBN,0.198,,64.0,,1191.89593,,22.09314,,-0.4,E,...,77.5,,3.7,,77.2,,292.8,,1.9,
2015-03-05 00:00:00+00:00,GISBN,0.171,,58.8,,1205.75719,,22.83917,,2.6,E,...,77.4,,4.2,,77.1,,275.3,,1.7,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-02-27 00:00:00+00:00,PLYNL,0.211,,50.5,,1317.13144,,27.96073,,-7.0,E,...,72.1,,4.0,,64.1,,59.8,,4.8,
2023-02-28 00:00:00+00:00,PLYNL,0.208,,48.6,,1323.69612,,28.39297,,-3.8,E,...,72.0,,4.2,,63.8,,40.2,,5.5,
2023-03-01 00:00:00+00:00,PLYNL,0.219,,49.8,,1319.32417,,28.11693,,-4.8,E,...,71.8,,4.2,,63.5,,50.0,,6.4,
2023-03-02 00:00:00+00:00,PLYNL,0.229,,53.1,,1308.20379,,27.40959,,-7.2,E,...,71.7,,4.3,,63.0,,62.7,,4.0,
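
Because the request covers the same date range for every site, some sites may contain empty rows for part of that range. The sketch below shows one way you might find the first and last date with a valid value for each site. It assumes the DataFrame is indexed by `datetime` and `site_id` as displayed above; `valid_date_range` and `example_df` are hypothetical names, and the `ALIC1` values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the API DataFrame: same (datetime, site_id) index
# layout, with illustrative values only.
dates = pd.to_datetime(['2015-03-01', '2015-03-02', '2015-03-03'], utc=True)
index = pd.MultiIndex.from_arrays(
    [list(dates) * 2, ['ALIC1'] * 3 + ['GISBN'] * 3],
    names=['datetime', 'site_id'],
)
example_df = pd.DataFrame(
    {'cosmos_vwc': [np.nan, np.nan, 40.0, 61.5, 58.8, 58.8]},
    index=index,
)


def valid_date_range(df, variable):
    """Return the first and last date with a non-missing value, per site."""
    # Drop missing values, then move the index levels into ordinary columns
    valid = df[variable].dropna().reset_index()
    return valid.groupby('site_id')['datetime'].agg(['min', 'max'])


coverage = valid_date_range(example_df, 'cosmos_vwc')
print(coverage)
```

With the real data you would pass `df` in place of `example_df` and choose whichever variable you are interested in.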


In [55]:
df.reset_index().site_id.unique()

array(['GISBN', 'CHOBH', 'REDHL', 'HARWD', 'ALIC1', 'GLENW', 'CARDT',
       'BICKL', 'WYTH1', 'RISEH', 'STGHT', 'ELMST', 'RDMER', 'LIZRD',
       'HENFS', 'MOORH', 'ROTHD', 'FINCH', 'EUSTN', 'HLACY', 'EASTB',
       'COCLP', 'PORTN', 'SPENF', 'WADDN', 'CRICH', 'HYBRY', 'SHEEP',
       'HILLB', 'MOREM', 'HADLW', 'LULLN', 'SYDLG', 'HARTW', 'CGARW',
       'NWYKE', 'WIMPL', 'MORLY', 'STIPS', 'CHIMN', 'BALRD', 'HOLLN',
       'BUNNY', 'WRTTL', 'LODTN', 'GLENS', 'TADHM', 'COCHN', 'SOURH',
       'FIVET', 'PLYNL'], dtype=object)
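
Once all sites are in a single DataFrame, you will often want to pull out one site at a time. A minimal sketch of how this might be done with pandas is shown below, assuming the `(datetime, site_id)` index layout displayed above. `select_site` and `example_df` are hypothetical names, and the small frame is synthetic so the snippet runs without calling the API.

```python
import pandas as pd

# Synthetic stand-in for the API DataFrame, with illustrative values only
index = pd.MultiIndex.from_product(
    [pd.to_datetime(['2015-03-01', '2015-03-02'], utc=True), ['GISBN', 'PLYNL']],
    names=['datetime', 'site_id'],
)
example_df = pd.DataFrame(
    {'cosmos_vwc': [61.5, 50.5, 58.8, 48.6], 'ws': [2.5, 4.8, 2.4, 5.5]},
    index=index,
)


def select_site(df, site_id):
    """Return the rows for one site, indexed by datetime only."""
    # xs() takes a cross-section on the site_id level and drops that level
    return df.xs(site_id, level='site_id')


gisbn = select_site(example_df, 'GISBN')
print(gisbn)
```

The result is an ordinary time series DataFrame, which is convenient for plotting or resampling a single site.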

In [62]:
df.columns

Index(['albedo', 'albedo_flag', 'cosmos_vwc', 'cosmos_vwc_flag',
       'cts_mod_corr', 'cts_mod_corr_flag', 'd86_75m', 'd86_75m_flag', 'g1',
       'g1_flag', 'g2', 'g2_flag', 'lwin', 'lwin_flag', 'lwout', 'lwout_flag',
       'pa', 'pa_flag', 'pe', 'pe_flag', 'precip', 'precip_flag',
       'precip_raine', 'precip_raine_flag', 'precip_tipping',
       'precip_tipping_flag', 'q', 'q_flag', 'rh', 'rh_flag', 'rn', 'rn_flag',
       'snow', 'snow_flag', 'stp_tsoil10', 'stp_tsoil10_flag', 'stp_tsoil2',
       'stp_tsoil20', 'stp_tsoil20_flag', 'stp_tsoil2_flag', 'stp_tsoil5',
       'stp_tsoil50', 'stp_tsoil50_flag', 'stp_tsoil5_flag', 'swe_crns',
       'swe_crns_flag', 'swin', 'swin_flag', 'swout', 'swout_flag', 'ta',
       'ta_flag', 'ta_max', 'ta_max_flag', 'ta_min', 'ta_min_flag',
       'tdt1_tsoil', 'tdt1_tsoil_flag', 'tdt1_vwc', 'tdt1_vwc_flag',
       'tdt2_tsoil', 'tdt2_tsoil_flag', 'tdt2_vwc', 'tdt2_vwc_flag', 'wd',
       'wd_flag', 'ws', 'ws_flag'],
      dtype='object')
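
Each variable in the list above has a matching `_flag` column. The sketch below shows one way you might use those columns to blank out any value that carries a flag. Whether flagged values should be excluded depends on your use case and on what each flag code means (see the COSMOS-UK user guide), so treating every non-empty flag as a reason to mask is an assumption here. `mask_flagged` and `example_df` are hypothetical names, and the data is synthetic.

```python
import pandas as pd

# Synthetic stand-in with two variables and their flag columns
example_df = pd.DataFrame({
    'g1': [-5.3, -9.5],
    'g1_flag': ['E', 'E'],
    'ws': [2.5, 2.4],
    'ws_flag': [None, ''],
})


def mask_flagged(df):
    """Return the value columns with flagged entries replaced by NaN."""
    # Keep only variables that have a matching '<name>_flag' column
    value_cols = [
        c for c in df.columns
        if not c.endswith('_flag') and f'{c}_flag' in df.columns
    ]
    masked = df[value_cols].copy()
    for col in value_cols:
        # Assumption: missing or empty flags mean "no flag raised"
        is_flagged = df[f'{col}_flag'].fillna('').astype(str).str.strip() != ''
        masked[col] = masked[col].where(~is_flagged)
    return masked


masked = mask_flagged(example_df)
print(masked)
```

If you only want to exclude particular flag codes, the comparison inside the loop could be replaced with an `isin([...])` check on the codes you care about.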

