# Data API

## Server
A CLIMADA data file server is available at https://data.iac.ethz.ch; it can be accessed via a REST API at https://climada.ethz.ch.
For REST API details, see the [documentation](https://climada.ethz.ch/rest/docs).

## Client

For programmatic access to the CLIMADA data API, a dedicated Python wrapper for the REST calls is available: `climada.util.api_client.Client`.


In [2]:
from climada.util.api_client import Client, DataTypeInfo
client = Client()
client.host, client.chunk_size

('https://climada.ethz.ch', 8192)

The URL of the API server and the chunk size for file downloads can be configured in `climada.conf`. Just replace the corresponding default values:

```json
    "data_api": {
        "host": "https://climada.ethz.ch",
        "chunk_size": 8192,
        "cache_db": "{local_data.system}/.downloads.db"
    }
```
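
To illustrate what a chunk size setting typically controls, here is a minimal sketch of a chunked copy. It is not the client's actual download code: `copy_in_chunks` is a hypothetical helper, and an in-memory stream stands in for the body of a download response.

```python
import io


def copy_in_chunks(source, target, chunk_size=8192):
    """Copy a binary stream to another one, `chunk_size` bytes at a time.

    Returns the number of chunks written.
    """
    n_chunks = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the stream is exhausted
            break
        target.write(chunk)
        n_chunks += 1
    return n_chunks


# an in-memory payload stands in for the body of a download response
payload = bytes(20000)
target = io.BytesIO()
n_chunks = copy_in_chunks(io.BytesIO(payload), target, chunk_size=8192)
print(n_chunks, len(target.getvalue()))
```

A larger chunk size means fewer read and write calls per file at the cost of more memory per call.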

The other configuration value affecting the data API client, `cache_db`, is the path to an SQLite database file that keeps track of the files successfully downloaded from the API server. Before the client attempts to download a file from the server, it checks whether the file has been downloaded before and, if so, whether the previously downloaded file still looks intact (i.e., its size and time stamp are as expected). If so, the file is simply read from disk without submitting another request.
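
The caching idea can be sketched as follows. This is an illustration under assumptions, not the client's implementation: the table layout, the helper names `record_download` and `cached_path`, and the example URL are all hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def record_download(db, url, path):
    """Remember size and modification time of a freshly downloaded file."""
    stat = path.stat()
    db.execute(
        "INSERT OR REPLACE INTO downloads (url, path, size, mtime) VALUES (?, ?, ?, ?)",
        (url, str(path), stat.st_size, stat.st_mtime),
    )


def cached_path(db, url):
    """Return the local path for `url` if the file still looks intact, else None."""
    row = db.execute(
        "SELECT path, size, mtime FROM downloads WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        return None  # never downloaded
    path, size, mtime = Path(row[0]), row[1], row[2]
    if not path.is_file():
        return None  # file was deleted in the meantime
    stat = path.stat()
    if stat.st_size != size or stat.st_mtime != mtime:
        return None  # file was modified since the download
    return path


# hypothetical schema; an in-memory database stands in for the cache file
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE downloads (url TEXT PRIMARY KEY, path TEXT, size INTEGER, mtime REAL)"
)

local_file = Path(tempfile.mkdtemp()) / "example.nc"
local_file.write_bytes(b"fake file content")
url = "https://example.org/example.nc"  # placeholder URL
record_download(db, url, local_file)

hit = cached_path(db, url)            # file unchanged: cache hit
local_file.write_bytes(b"truncated")  # size changes: the entry is stale
miss = cached_path(db, url)
print(hit is not None, miss is None)
```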

The main methods of the client are `get_dataset`, `get_datasets` and `download_dataset`. The first two return metadata about one or more datasets, and the last one downloads all files associated with a dataset.

The signatures of `get_dataset` and `get_datasets` are almost identical. Both take `data_type`, `name`, `version`, `properties` and `status` as optional arguments.
They differ in the default value of `status` and in the return type. `get_datasets` defaults to `status='active'` and returns a list of `DatasetInfo` objects, which may be empty. `get_dataset` defaults to `status=None` and returns a single `DatasetInfo` object, or raises an exception if the arguments provided do not identify exactly one dataset.
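
The "exactly one match" behaviour can be sketched in plain Python. The exception classes and the helper `exactly_one` below are local stand-ins defined for illustration only; they are not imported from CLIMADA.

```python
class NoResult(Exception):
    """Stand-in: raised when no dataset matches."""


class AmbiguousResult(Exception):
    """Stand-in: raised when more than one dataset matches."""


def exactly_one(matches):
    """Return the only element of `matches` or raise a descriptive error."""
    if not matches:
        raise NoResult("there is no dataset meeting the requirements")
    if len(matches) > 1:
        raise AmbiguousResult(
            f"there are several datasets meeting the requirements: {matches}"
        )
    return matches[0]


print(exactly_one(["tracks_antimeridian"]))

# an empty list and a list with two candidates both raise
for candidates in ([], ["1988234N13299", "tracks_antimeridian"]):
    try:
        exactly_one(candidates)
    except (NoResult, AmbiguousResult) as err:
        print(type(err).__name__, "-", err)
```

Returning a possibly empty list suits exploratory searches, while the strict variant suits scripts that should fail early when a query is not specific enough.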

In [3]:
client.get_datasets?

Signature:
client.get_datasets(
    data_type=None,
    name=None,
    version=None,
    properties=None,
    status='active',
)
Docstring:
Find all datasets matching the given parameters.

Parameters
----------
data_type : str, optional
    data_type of the dataset, e.g., 'litpop' or 'draught'
name : str, optional
    the name of the dataset
version : str, optional
    the version of the dataset
properties : dict, optional
    search parameters for dataset properties, by default None
status : str, optional
    valid values are 'preliminary', 'active', 'expired', and 'test_dataset',
    by default 'active'

Returns
-------
l

## Search
If the identity of a dataset is not known from the start, it may be convenient to collect a meaningful preselection in a data frame.

Let's suppose we have to write a test that involves reading a NetCDF file with tracks, and we want to know whether a suitable test file is available.\
First we get the metadata of all datasets with status 'test_dataset' and data type 'tracks':

In [18]:
import pandas as pd
test_datasets = client.get_datasets(status='test_dataset', data_type='tracks')

ds_df = pd.DataFrame(test_datasets)
ds_df.head()

| | uuid | data_type | name | version | status | properties | files | doi | description | license | activation_date | expiration_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | f8cc4d68-1cd2-4639-8c17-3d8e76b3fe89 | {'data_type': 'tracks', 'data_type_group': 'ha... | 1988234N13299 | v1 | test_dataset | {} | [{'uuid': 'f8cc4d68-1cd2-4639-8c17-3d8e76b3fe8... | | | Attribution 4.0 International (CC BY 4.0) | | |
| 1 | 1352b6ea-1f75-4cea-9003-45c5d16a208f | {'data_type': 'tracks', 'data_type_group': 'ha... | ibtracs_global_intp-None_1992230N11325 | v1 | test_dataset | {} | [{'uuid': '1352b6ea-1f75-4cea-9003-45c5d16a208... | | | Attribution 4.0 International (CC BY 4.0) | | |
| 2 | 625874c2-8323-472c-baa0-3f5ce2d61b8a | {'data_type': 'tracks', 'data_type_group': 'ha... | tracks_antimeridian | v1 | test_dataset | {} | [{'uuid': '625874c2-8323-472c-baa0-3f5ce2d61b8... | | | Attribution 4.0 International (CC BY 4.0) | | |


Then we build a data frame of files to look for the 'netcdf' format:

In [19]:
test_files = [fileinfo for ds in test_datasets for fileinfo in ds.files]
fl_df = pd.DataFrame(test_files)
fl_df.head()

| | uuid | url | file_name | file_format | file_size | check_sum |
|---|---|---|---|---|---|---|
| 0 | f8cc4d68-1cd2-4639-8c17-3d8e76b3fe89 | https://data.iac.ethz.ch/climada/f8cc4d68-1cd2... | 1988234N13299.nc | netcdf | 16278 | md5:448d6c94a3d691682baf6ab38d431fb1 |
| 1 | 1352b6ea-1f75-4cea-9003-45c5d16a208f | https://data.iac.ethz.ch/climada/1352b6ea-1f75... | ibtracs_global_intp-None_1992230N11325.csv | csv | 5161 | md5:67db574ce2e5056157ab847a5abf2f3e |
| 2 | 625874c2-8323-472c-baa0-3f5ce2d61b8a | https://data.iac.ethz.ch/climada/625874c2-8323... | 2018079S09162.nc | netcdf | 23487 | md5:8d01027f4e257df7e6bb96c9ceb99fc1 |
| 3 | 625874c2-8323-472c-baa0-3f5ce2d61b8a | https://data.iac.ethz.ch/climada/625874c2-8323... | 1980052S16155.nc | netcdf | 18756 | md5:de553b826f54e24d4a4353002dcd8579 |


Now we are ready to select the datasets that contain at least one file in 'netcdf' format:

In [20]:
ds_df[ds_df.uuid.isin(fl_df[fl_df.file_format=='netcdf'].uuid)]

| | uuid | data_type | name | version | status | properties | files | doi | description | license | activation_date | expiration_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | f8cc4d68-1cd2-4639-8c17-3d8e76b3fe89 | {'data_type': 'tracks', 'data_type_group': 'ha... | 1988234N13299 | v1 | test_dataset | {} | [{'uuid': 'f8cc4d68-1cd2-4639-8c17-3d8e76b3fe8... | | | Attribution 4.0 International (CC BY 4.0) | | |
| 2 | 625874c2-8323-472c-baa0-3f5ce2d61b8a | {'data_type': 'tracks', 'data_type_group': 'ha... | tracks_antimeridian | v1 | test_dataset | {} | [{'uuid': '625874c2-8323-472c-baa0-3f5ce2d61b8... | | | Attribution 4.0 International (CC BY 4.0) | | |
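
The same selection can be written without pandas. The sketch below uses two reduced, hypothetical dataclasses in place of the client's `DatasetInfo` and `FileInfo` objects and is filled with the names and formats from the outputs above, so it runs without a connection to the server.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:  # hypothetical, reduced stand-in for FileInfo
    file_name: str
    file_format: str


@dataclass
class DatasetRecord:  # hypothetical, reduced stand-in for DatasetInfo
    name: str
    files: list = field(default_factory=list)


datasets = [
    DatasetRecord("1988234N13299", [FileRecord("1988234N13299.nc", "netcdf")]),
    DatasetRecord(
        "ibtracs_global_intp-None_1992230N11325",
        [FileRecord("ibtracs_global_intp-None_1992230N11325.csv", "csv")],
    ),
    DatasetRecord(
        "tracks_antimeridian",
        [
            FileRecord("2018079S09162.nc", "netcdf"),
            FileRecord("1980052S16155.nc", "netcdf"),
        ],
    ),
]

# keep every dataset that has at least one file in netcdf format
netcdf_names = [
    ds.name
    for ds in datasets
    if any(f.file_format == "netcdf" for f in ds.files)
]
print(netcdf_names)
```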


### Search for data types
In the example above we were looking for datasets of type 'tracks'. To find out which data types exist in the first place, we can list them like this:

In [24]:
import pandas as pd
data_types = client.get_data_types()

pd.DataFrame(data_types).tail()

| | data_type | data_type_group | description |
|---|---|---|---|
| 24 | entity | entity | |
| 25 | low_flow | hazard | |
| 26 | crop_yield | hazard | |
| 27 | open_street_map | exposures | |
| 28 | centroids | hazard | |
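
Since every data type belongs to a data type group, it can be handy to arrange the listing by group. A minimal sketch, using only the five rows shown in the output above as hard-coded tuples rather than a live query:

```python
from collections import defaultdict

# the five (data_type, data_type_group) pairs shown in the output above
rows = [
    ("entity", "entity"),
    ("low_flow", "hazard"),
    ("crop_yield", "hazard"),
    ("open_street_map", "exposures"),
    ("centroids", "hazard"),
]

# collect the data types of each group, preserving the original order
groups = defaultdict(list)
for data_type, group in rows:
    groups[group].append(data_type)

for group, members in groups.items():
    print(group, members)
```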


## Get

In any case, a dataset can be identified by `data_type`, `name` and `version`, since this combination must be unique in the database of the API server.

In [33]:
client.get_dataset(name='tracks_antimeridian', data_type='tracks', version='v1')

DatasetInfo(uuid='625874c2-8323-472c-baa0-3f5ce2d61b8a', data_type=DataTypeInfo(data_type='tracks', data_type_group='hazard', description=''), name='tracks_antimeridian', version='v1', status='test_dataset', properties={}, files=[FileInfo(uuid='625874c2-8323-472c-baa0-3f5ce2d61b8a', url='https://data.iac.ethz.ch/climada/625874c2-8323-472c-baa0-3f5ce2d61b8a/2018079S09162.nc', file_name='2018079S09162.nc', file_format='netcdf', file_size=23487, check_sum='md5:8d01027f4e257df7e6bb96c9ceb99fc1'), FileInfo(uuid='625874c2-8323-472c-baa0-3f5ce2d61b8a', url='https://data.iac.ethz.ch/climada/625874c2-8323-472c-baa0-3f5ce2d61b8a/1980052S16155.nc', file_name='1980052S16155.nc', file_format='netcdf', file_size=18756, check_sum='md5:de553b826f54e24d4a4353002dcd8579')], doi=None, description=None, license='Attribution 4.0 International (CC BY 4.0)', activation_date=None, expiration_date=None)

But often, just the name is enough information:

In [36]:
client.get_dataset(name='tracks_antimeridian')

DatasetInfo(uuid='625874c2-8323-472c-baa0-3f5ce2d61b8a', data_type=DataTypeInfo(data_type='tracks', data_type_group='hazard', description=''), name='tracks_antimeridian', version='v1', status='test_dataset', properties={}, files=[FileInfo(uuid='625874c2-8323-472c-baa0-3f5ce2d61b8a', url='https://data.iac.ethz.ch/climada/625874c2-8323-472c-baa0-3f5ce2d61b8a/2018079S09162.nc', file_name='2018079S09162.nc', file_format='netcdf', file_size=23487, check_sum='md5:8d01027f4e257df7e6bb96c9ceb99fc1'), FileInfo(uuid='625874c2-8323-472c-baa0-3f5ce2d61b8a', url='https://data.iac.ethz.ch/climada/625874c2-8323-472c-baa0-3f5ce2d61b8a/1980052S16155.nc', file_name='1980052S16155.nc', file_format='netcdf', file_size=18756, check_sum='md5:de553b826f54e24d4a4353002dcd8579')], doi=None, description=None, license='Attribution 4.0 International (CC BY 4.0)', activation_date=None, expiration_date=None)

However, ambiguous results lead to an error:

In [25]:
client.get_dataset(data_type='tracks', version='v1', status='test_dataset')

AmbiguousResult: there are several datasets meeting the requirements: [DatasetInfo(uuid='f8cc4d68-1cd2-4639-8c17-3d8e76b3fe89', data_type=DataTypeInfo(data_type='tracks', data_type_group='hazard', description=''), name='1988234N13299', version='v1', status='test_dataset', properties={}, files=[FileInfo(uuid='f8cc4d68-1cd2-4639-8c17-3d8e76b3fe89', url='https://data.iac.ethz.ch/climada/f8cc4d68-1cd2-4639-8c17-3d8e76b3fe89/1988234N13299.nc', file_name='1988234N13299.nc', file_format='netcdf', file_size=16278, check_sum='md5:448d6c94a3d691682baf6ab38d431fb1')], doi=None, description=None, license='Attribution 4.0 International (CC BY 4.0)', activation_date=None, expiration_date=None), DatasetInfo(uuid='1352b6ea-1f75-4cea-9003-45c5d16a208f', data_type=DataTypeInfo(data_type='tracks', data_type_group='hazard', description=''), name='ibtracs_global_intp-None_1992230N11325', version='v1', status='test_dataset', properties={}, files=[FileInfo(uuid='1352b6ea-1f75-4cea-9003-45c5d16a208f', url='https://data.iac.ethz.ch/climada/1352b6ea-1f75-4cea-9003-45c5d16a208f/ibtracs_global_intp-None_1992230N11325.csv', file_name='ibtracs_global_intp-None_1992230N11325.csv', file_format='csv', file_size=5161, check_sum='md5:67db574ce2e5056157ab847a5abf2f3e')], doi=None, description=None, license='Attribution 4.0 International (CC BY 4.0)', activation_date=None, expiration_date=None), DatasetInfo(uuid='625874c2-8323-472c-baa0-3f5ce2d61b8a', data_type=DataTypeInfo(data_type='tracks', data_type_group='hazard', description=''), name='tracks_antimeridian', version='v1', status='test_dataset', properties={}, files=[FileInfo(uuid='625874c2-8323-472c-baa0-3f5ce2d61b8a', url='https://data.iac.ethz.ch/climada/625874c2-8323-472c-baa0-3f5ce2d61b8a/2018079S09162.nc', file_name='2018079S09162.nc', file_format='netcdf', file_size=23487, check_sum='md5:8d01027f4e257df7e6bb96c9ceb99fc1'), FileInfo(uuid='625874c2-8323-472c-baa0-3f5ce2d61b8a', 
url='https://data.iac.ethz.ch/climada/625874c2-8323-472c-baa0-3f5ce2d61b8a/1980052S16155.nc', file_name='1980052S16155.nc', file_format='netcdf', file_size=18756, check_sum='md5:de553b826f54e24d4a4353002dcd8579')], doi=None, description=None, license='Attribution 4.0 International (CC BY 4.0)', activation_date=None, expiration_date=None)]

And so does a query without any result:

In [26]:
client.get_dataset(name='tracks_antimeridian', version='v2')

NoResult: there is no dataset meeting the requirements
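
The `FileInfo` records shown in this section carry a `check_sum` of the form `md5:<hexdigest>`. Assuming the prefix names the hash algorithm, a downloaded file could be compared against it roughly as follows; `matches_check_sum` is a hypothetical helper, not a method of the client, and a temporary demo file replaces a real download.

```python
import hashlib
import tempfile
from pathlib import Path


def matches_check_sum(path, check_sum, chunk_size=8192):
    """Compare a file against a check sum string like 'md5:<hexdigest>'."""
    algorithm, _, expected = check_sum.partition(":")
    digest = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        # read the file in chunks so large files need not fit in memory
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


# a small temporary file stands in for a downloaded dataset file
demo_file = Path(tempfile.mkdtemp()) / "demo.bin"
demo_file.write_bytes(b"hello")

good = "md5:" + hashlib.md5(b"hello").hexdigest()
bad = "md5:" + "0" * 32
print(matches_check_sum(demo_file, good))
print(matches_check_sum(demo_file, bad))
```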