# Tutorial 2: Exploring and downloading datasets from the DigitalTWINS platform using the digitaltwins-api

## Introduction
The 12 LABOURS DigitalTWINS Platform’s harmonised database is organised into **Programs** and **Projects**. For example, Exemplar Project 1 (**EP1**) is a project within the 12 LABOURS (**12L**) Program. Users can only access and download datasets from these projects once they have been granted access. See Tutorial 1 for information on how to request access and connect to the platform. This tutorial shows how to explore and download existing datasets from the DigitalTWINS Platform using its Python API.

## Definitions
- API - Application Programming Interface, a set of functions used to access the features or data of an application or service.

## Learning outcomes
In this tutorial, you will learn how to:
- access the platform using its Python API.
- find datasets stored in the platform.
- download datasets in SDS format.

## Accessing the platform using its Python API
First, we will specify the path to a `config.ini` file that contains the location and API access keys for your instance of the DigitalTWINS Platform. This file is in the INI format read by Python's built-in `configparser` module.

Please see [Tutorial 1](https://github.com/ABI-CTT-Group/digitaltwins-api/blob/main/tutorials/tutorial_1_getting_started.md) for information on how to access the `config.ini` file for your instance of the platform.

In [1]:
import pathlib
# Change the path below to point to the location of your config.ini file as described in Tutorial 1.
#config_file = pathlib.Path(r"./path/to/config.ini")

config_file = pathlib.Path(r"X:\DigitalTWINS\resources\latest\configs\configs.ini")
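
Before passing the file to the API, it can be useful to confirm that it exists and parses as a valid INI file. The following is a minimal sketch using only the standard library. The section name `example_section` and the key `endpoint` are hypothetical placeholders and are not the keys used by the platform; see Tutorial 1 for the actual contents of your `config.ini`.

```python
import configparser
import pathlib
import tempfile


def load_config(path):
    """Parse an INI file and return the parser, failing loudly if the file is missing."""
    path = pathlib.Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Config file not found: {path}")
    parser = configparser.ConfigParser()
    parser.read(path)
    return parser


# Hypothetical file contents, for illustration only.
example_text = "[example_section]\nendpoint = https://example.org\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    example_file = pathlib.Path(tmp_dir) / "config.ini"
    example_file.write_text(example_text)
    config = load_config(example_file)
    section_names = config.sections()
    endpoint = config["example_section"]["endpoint"]

print(section_names)
```

Calling `load_config(config_file)` on your own file and printing `config.sections()` is a quick way to check that the path is correct.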

We will use the DigitalTWINS Python API's `Querier` class to list or search for existing datasets in the platform.

In [2]:
import digitaltwins as dts

querier = dts.Querier(config_file)

### Listing programs and projects in the platform

A list of existing programs in the platform can be retrieved as follows.

In [3]:
programs = querier.get_all_programs()


A list of existing projects within a program can be retrieved as follows. The optional `program` argument can be used to list projects in a specific program.

In [None]:
projects = querier.get_projects(program=programs[0])
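
The two calls above can be combined to build an overview of every program and its projects. The sketch below uses a hypothetical `StubQuerier` class as a stand-in for `dts.Querier` so that it runs without access to the platform. It assumes that both methods return plain Python lists of names, which may not match the objects returned by the real API, and the catalogue contents are made up from the example in the introduction.

```python
class StubQuerier:
    """Hypothetical stand-in for dts.Querier so this sketch runs offline."""

    def __init__(self):
        # Made-up catalogue: one program containing one project.
        self._catalogue = {"12L": ["EP1"]}

    def get_all_programs(self):
        return list(self._catalogue)

    def get_projects(self, program=None):
        if program is None:
            # No filter given: return the projects of every program.
            return [name for names in self._catalogue.values() for name in names]
        return list(self._catalogue[program])


def map_projects_by_program(querier):
    """Return a dict mapping each program to the list of its projects."""
    return {
        program_name: querier.get_projects(program=program_name)
        for program_name in querier.get_all_programs()
    }


project_map = map_projects_by_program(StubQuerier())
for program_name, project_names in project_map.items():
    print(program_name, project_names)
```

With a working connection, `map_projects_by_program(querier)` could be called with the real `querier` object instead of the stub.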

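A list of existing datasets within a project can be retrieved as follows. The `program` and `project` arguments select the program and project to list datasets from.

In [None]:
# List the datasets in the first program and project retrieved above.
datasets = querier.get_datasets(program=programs[0], project=projects[0])
for dataset in datasets:
    print(dataset.get_id())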

### Using the platform's API to search for datasets
The `search_datasets` method of the `Querier` class searches for datasets and returns a Python list of `Dataset` objects that match the search criteria.

Currently, the search only supports text that exactly matches the title of an existing dataset in the platform.

In [None]:
dataset_id = 'dataset-1-version-1'
datasets = querier.search_datasets(query=dataset_id)
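
To illustrate what exact title matching implies, the sketch below applies the same rule to a plain list of strings. It is not the platform's implementation, and the titles other than `dataset-1-version-1` are hypothetical. Whether the platform's matching is case sensitive is not stated here, so the sketch assumes a strict string comparison.

```python
def exact_title_matches(titles, query):
    """Return the titles that are identical to the query string."""
    return [title for title in titles if title == query]


# Hypothetical dataset titles, for illustration only.
titles = ["dataset-1-version-1", "dataset-1-version-2", "dataset-10-version-1"]

full_match = exact_title_matches(titles, "dataset-1-version-1")
partial_match = exact_title_matches(titles, "dataset-1")

print(full_match)     # ['dataset-1-version-1']
print(partial_match)  # []
```

Under this rule, a partial query such as `dataset-1` returns nothing, so an empty result from `search_datasets` may simply mean that the full title was not supplied.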

## Downloading datasets
Datasets are stored in SDS format within the platform's harmonised database. We can use the DigitalTWINS Python API's `Downloader` class to select and download a dataset in SDS format. Once downloaded, the `sparc-me` Python module can be used to explore the metadata in a dataset (see Tutorial 3).

By default, datasets are downloaded to the current working directory. The optional `save_dir` argument can be used to select a different download destination.

In [None]:
downloader = dts.Downloader(config_file)
downloader.download(dataset_id, save_dir='./')
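
When downloading several datasets, it can help to give each one its own destination folder. The following is a minimal sketch using only the standard library. The helper `prepare_save_dir` is hypothetical and not part of the API, and it is an assumption that `save_dir` does not need to exist beforehand or that the downloader does not already create a per-dataset folder itself.

```python
import pathlib
import tempfile


def prepare_save_dir(base_dir, dataset_id):
    """Create (if needed) and return a per-dataset download directory."""
    save_dir = pathlib.Path(base_dir) / dataset_id
    # parents=True creates missing parent folders; exist_ok=True allows reruns.
    save_dir.mkdir(parents=True, exist_ok=True)
    return save_dir


with tempfile.TemporaryDirectory() as tmp_dir:
    save_dir = prepare_save_dir(tmp_dir, "dataset-1-version-1")
    created = save_dir.is_dir()
    folder_name = save_dir.name
    # The path could then be passed to the downloader, for example:
    # downloader.download(dataset_id, save_dir=str(save_dir))

print(created, folder_name)
```

The path is converted with `str()` in the commented call because it is not confirmed here whether `save_dir` accepts `pathlib.Path` objects.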

Some datasets can be very large, so the optional `metadata_only` argument can be set to `True` to download only the metadata files in a dataset rather than the entire dataset.

In [None]:
downloader.download(dataset_id, save_dir='./', metadata_only=True)
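
After a download, the number of files and their total size can be checked with the standard library, for example to compare a metadata-only download against a full one. The sketch below builds a small temporary folder to stand in for a downloaded dataset. The file and folder names in it are hypothetical and do not reflect the actual layout of an SDS dataset.

```python
import pathlib
import tempfile


def summarise_directory(root):
    """Return (file_count, total_bytes) for all files under root."""
    files = [path for path in pathlib.Path(root).rglob("*") if path.is_file()]
    return len(files), sum(path.stat().st_size for path in files)


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_root = pathlib.Path(tmp_dir)
    # Hypothetical contents standing in for a downloaded dataset.
    (demo_root / "primary").mkdir()
    (demo_root / "description.txt").write_bytes(b"12345")
    (demo_root / "primary" / "data.bin").write_bytes(b"1234567890")
    file_count, total_bytes = summarise_directory(demo_root)

print(file_count, total_bytes)
```

Pointing `summarise_directory` at the folder used as `save_dir` gives a quick indication of how much data was downloaded.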

## Feedback
Once you have completed this tutorial, please complete [this survey](https://docs.google.com/forms/d/e/1FAIpQLSe-EsVz6ahz2FXFy906AZh68i50jRYnt3hQe-loc-1DaFWoFQ/viewform?usp=sf_link), which will allow us to improve this and future tutorials.

## Next steps
The [next tutorial](https://github.com/ABI-CTT-Group/digitaltwins-api/blob/main/tutorials/tutorial_3_loading_and_exploring_sds_datasets.ipynb) will show how to load and explore SDS datasets using the sparc-me Python tool.