# Tutorial 2: Finding and downloading datasets from the DigitalTWINS platform using the digitaltwins-api

## Introduction
The 12 LABOURS DigitalTWINS Platform's harmonised database is organised into **Programs** and **Projects**. For example, Exemplar Project 1 (**EP1**) is a project within the 12 LABOURS (**12L**) Program. Users can only access and download datasets from these projects once they have been granted access; see Tutorial 1 for information on how to request access and connect to the platform. This tutorial shows how to find existing datasets using the DigitalTWINS Platform's portal or its Python API, and how to download them using the Python API.

## Definitions
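- API - Application Programming Interface, used to access the features or data of an application or service.
- SDS - SPARC Data Structure, the standardised folder and metadata structure in which datasets are stored in the platform.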

## Learning outcomes
In this tutorial, you will learn how to:
- find existing datasets using the platform's portal.
- access the platform using the `digitaltwins` Python API and find existing datasets.
- download datasets in SDS format using the `digitaltwins` Python API.

## Finding datasets
Each dataset stored in the platform has a unique identifier (ID), e.g., `12L-EP1-dataset-1-version-1`.
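The example ID above suggests a `<program>-<project>-dataset-<number>-version-<number>` pattern. This is an assumption inferred from that single example, not a documented format. If it holds for your datasets, a small helper like the hypothetical `parse_dataset_id` below can split an ID into its parts, which is handy for filtering a list of IDs by program or project.

```python
import re
from typing import NamedTuple, Optional

# ASSUMPTION: IDs look like "12L-EP1-dataset-1-version-1".
# This pattern is inferred from one example ID and is not a documented platform format.
DATASET_ID_PATTERN = re.compile(
    r"^(?P<program>[^-]+)-(?P<project>[^-]+)-dataset-(?P<dataset>\d+)-version-(?P<version>\d+)$"
)


class DatasetIdParts(NamedTuple):
    program: str
    project: str
    dataset: int
    version: int


def parse_dataset_id(dataset_id: str) -> Optional[DatasetIdParts]:
    """Hypothetical helper: split a dataset ID into its parts, or return None if it does not match."""
    match = DATASET_ID_PATTERN.match(dataset_id)
    if match is None:
        return None
    return DatasetIdParts(
        program=match["program"],
        project=match["project"],
        dataset=int(match["dataset"]),
        version=int(match["version"]),
    )


print(parse_dataset_id("12L-EP1-dataset-1-version-1"))
# DatasetIdParts(program='12L', project='EP1', dataset=1, version=1)
```

IDs that do not follow the assumed pattern return `None` rather than raising, so the helper can be applied safely to a mixed list of IDs.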

### Finding datasets using the platform's portal
Dataset IDs are shown for each dataset listed on the data catalogue page of the 12 LABOURS DigitalTWINS platform's portal (see screenshots below). Please see Tutorial 1 for instructions on how to connect to an instance of the platform and open its portal in a local web browser.

### Finding datasets in the platform using the `digitaltwins` Python API
Using the `digitaltwins` Python API requires a `config.ini` file that specifies the location and API access keys for your instance of the DigitalTWINS Platform. Please see [Tutorial 1](https://github.com/ABI-CTT-Group/digitaltwins-api/blob/main/tutorials/tutorial_1_getting_started.md) for information on how to access the `config.ini` file for your instance of the platform.

```python
import pathlib

# Change the path below to point to the location of your config.ini file as described in Tutorial 1.
config_file = pathlib.Path(r"L:\DigitalTWINS\resources\latest\configs\configs.ini")
```
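Before creating a `Querier`, it can help to confirm that `config_file` points to a file that exists and parses as an INI file. The sketch below uses only Python's standard `configparser` module; the helper name `check_config` is hypothetical, and because the expected section names in your platform's config file are not described here, it simply returns whatever sections it finds.

```python
import configparser
import pathlib
from typing import List


def check_config(config_file: pathlib.Path) -> List[str]:
    """Hypothetical helper: check the config file exists and parses as INI; return its section names."""
    if not config_file.is_file():
        raise FileNotFoundError(f"Config file not found: {config_file}")
    parser = configparser.ConfigParser()
    # read() silently skips files it cannot open, so check the list of files it actually parsed.
    if not parser.read(config_file):
        raise ValueError(f"Could not read config file: {config_file}")
    return parser.sections()
```

For example, `check_config(config_file)` raises `FileNotFoundError` if the path is wrong, which is usually easier to diagnose than an error raised later from inside the API.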

We will use the `digitaltwins` Python API's `Querier` class to list or search for existing datasets in the platform.

```python
import digitaltwins as dts

querier = dts.Querier(config_file)
```

**Please let one of the workshop organisers know if you encounter an error with querying the platform.**

#### Listing programs and projects in the platform

A list of existing programs in the platform can be retrieved as follows.

```python
programs = querier.get_all_programs()
print(programs)
```

A list of existing projects within a program can be retrieved as follows. The optional `program` argument can be used to list projects in a specific program.

```python
projects = querier.get_projects_by_program(program="12L")
print(projects)
```

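A list of all datasets in the platform can be retrieved with the `get_datasets` method. Each returned dataset provides a `get_id` method, which returns its ID.

```python
datasets = querier.get_datasets()
for dataset in datasets:
    print(dataset.get_id())
```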

#### Using the platform's API to search for datasets
The `search_datasets` method of the `Querier` class searches for datasets and returns a Python list of `Dataset` objects that match the search criteria.

Currently, only search text that exactly matches the title of an existing dataset in the platform is supported.
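Because only exact title matches are supported, a small typo or capitalisation difference in the search text can return no results. If you have a list of dataset titles to hand (for example, copied from the portal's data catalogue), a sketch like the one below can suggest the exact title to pass to `search_datasets`. The helper `suggest_titles` and the example titles are hypothetical; the sketch uses Python's standard `difflib` module and does not call the platform.

```python
import difflib
from typing import List


def suggest_titles(query: str, titles: List[str], n: int = 3) -> List[str]:
    """Hypothetical helper: return exact title matches, else the closest titles by similarity."""
    exact = [title for title in titles if title == query]
    if exact:
        return exact
    # Compare case-insensitively, then map the lowercase matches back to the original titles.
    lowered = {title.lower(): title for title in titles}
    close = difflib.get_close_matches(query.lower(), list(lowered), n=n, cutoff=0.6)
    return [lowered[match] for match in close]


titles = ["Breast MRI dataset", "Lung CT dataset"]  # hypothetical example titles
print(suggest_titles("breast mri datset", titles))
```

`difflib.get_close_matches` orders its results from most to least similar, so the first suggestion is the most likely intended title.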

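If you already know a dataset's ID (for example, from the portal's data catalogue), you can retrieve that dataset directly with the `get_dataset` method.

```python
dataset_id = '12L-EP1-dataset-1-version-1'
dataset = querier.get_dataset(dataset_id)
print(dataset.get_id())
```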

## Downloading datasets
Datasets are stored in SDS format within the platform's harmonised database. We can use the DigitalTWINS Python API's `Downloader` class to select and download a dataset in SDS format. Once downloaded, the `sparc-me` Python module can be used to explore the metadata in a dataset (see Tutorial 3).

By default, datasets are downloaded to the current working directory; however, the optional `save_dir` argument can be used to choose a different download destination.
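It is not stated here whether `Downloader` creates a missing `save_dir` for you, so a cautious pattern is to create the destination folder first and then pass it to `download`. In the sketch below, the helper `download_to` is hypothetical; it relies only on the `download(dataset_id, save_dir=...)` call used in this tutorial, and accepts any object with that method so it can be tried without connecting to the platform.

```python
import pathlib


def download_to(downloader, dataset_id: str, save_dir: str) -> pathlib.Path:
    """Hypothetical helper: create save_dir if needed, then download the dataset into it.

    `downloader` is expected to be a dts.Downloader; only its
    download(dataset_id, save_dir=...) method, as used in this tutorial, is called.
    """
    destination = pathlib.Path(save_dir).resolve()
    # ASSUMPTION: create the folder up front rather than relying on Downloader to do it.
    destination.mkdir(parents=True, exist_ok=True)
    downloader.download(dataset_id, save_dir=str(destination))
    return destination
```

With the platform configured, this would be called as `download_to(dts.Downloader(config_file), dataset_id, "./downloads")`, where `./downloads` is an example folder name.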

```python
# Download the dataset selected above into the ./logs folder.
downloader = dts.Downloader(config_file)
downloader.download(dataset_id, save_dir='./logs')
```
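After the download finishes, a quick way to check what arrived is to walk the destination folder. In SDS, a `dataset_description` file sits at the top level of each dataset, so searching for it is one way to locate the downloaded dataset folder inside `save_dir` without assuming how `Downloader` names it. The helper `find_sds_roots` is hypothetical and uses only the standard library; Tutorial 3 covers loading the dataset itself with `sparc-me`.

```python
import pathlib
from typing import List


def find_sds_roots(save_dir: str) -> List[pathlib.Path]:
    """Hypothetical helper: return folders under save_dir that contain an SDS dataset_description file."""
    roots = set()
    # Match any extension (e.g. .xlsx, .csv, .json) used for the dataset_description file.
    for path in pathlib.Path(save_dir).rglob("dataset_description.*"):
        if path.is_file():
            roots.add(path.parent)
    return sorted(roots)


for root in find_sds_roots("./logs"):
    print(root)
```

If nothing is printed, the download may have failed or been saved elsewhere; in that case, check the `save_dir` you passed to `download`.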

**Please let one of the workshop organisers know if you encounter an error with downloading datasets.**

## Feedback
Once you have completed this tutorial, please complete [this survey](https://docs.google.com/forms/d/e/1FAIpQLSe-EsVz6ahz2FXFy906AZh68i50jRYnt3hQe-loc-1DaFWoFQ/viewform?usp=sf_link), which will allow us to improve this and future tutorials.

## Next steps
The [next tutorial](https://github.com/ABI-CTT-Group/digitaltwins-api/blob/main/tutorials/tutorial_3_loading_and_exploring_sds_datasets.ipynb) will show how to load and explore SDS datasets using the sparc-me Python tool.