# Tutorial 5: Query existing datasets via the Digital Twin Platform API
## Introduction
The 12 LABOURS Digital Twin Platform’s database is organised into Programs and Projects. For example, “12 Labours” would be a program and “EP1” would be a project within that program. Only users who have been granted access to a project can access its dataset metadata and download its datasets. See Tutorial 1 for information on how to request access to data and how to connect to the Digital Twin Platform.

This tutorial describes how to access existing datasets from a specific instance of the Digital Twin Platform using its API.

## Learning outcomes

* How to access metadata for an existing dataset using the 12 LABOURS Digital Twin Platform’s API.
* How to search for existing datasets using the 12 LABOURS Digital Twin Platform’s API.
* How to download existing datasets in SDS format using the 12 LABOURS Digital Twin Platform’s API.

## Setup

We will use the MetadataQuerier class of the Digital Twin Platform API to access metadata of the datasets in the platform.

As described in Tutorial 1, the information required for connecting to the 12 LABOURS Digital Twin Platform can be placed in a ‘config.ini’ file. This file can be loaded using Python’s built-in ‘configparser’ module, with the ‘Path’ class from ‘pathlib’ used to specify the file’s location.



In [1]:
import sys
sys.path.append('../')

from digitaltwins import MetadataQuerier
from pathlib import Path
import configparser


In [None]:
# Load platform configuration.
config = configparser.ConfigParser()
config.read(Path(r"config.ini"))
querier = MetadataQuerier(config)
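
One detail worth guarding against when loading the configuration: `ConfigParser.read` does not raise an error if the file is missing. It returns the list of files it managed to parse, which is empty when nothing was read. The sketch below wraps the loading step in a hypothetical helper, `load_config`, that fails early instead. The `[gen3]` section and `program` key are taken from the calls used later in this tutorial. The value written to the sample file is only a placeholder, and any other keys the platform expects are not shown here.

```python
import configparser
import tempfile
from pathlib import Path


def load_config(path):
    """Load an INI file, failing loudly if it is missing or has no [gen3] section.

    `load_config` is a hypothetical helper for illustration, not part of the
    Digital Twin Platform API.
    """
    config = configparser.ConfigParser()
    files_read = config.read(path)  # list of files that were parsed successfully
    if not files_read:
        raise FileNotFoundError(f"Could not read configuration file: {path}")
    if not config.has_section("gen3"):
        raise KeyError("Configuration file has no [gen3] section")
    return config


# Demonstrate with a throwaway file; the contents are placeholders only.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "config.ini"
    sample_path.write_text("[gen3]\nprogram = 12 Labours\n")
    config = load_config(sample_path)
    program = config["gen3"].get("program")
    print("Program from config:", program)
```

In the notebook, the object returned by `load_config(Path("config.ini"))` could be passed to `MetadataQuerier` in the same way as the `config` object in the cell above.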

## Accessing programs and projects within the platform
A list of existing programs in the platform can be retrieved using the `get_all_programs` method.


In [None]:
# List programs on the platform.
programs = querier.get_all_programs()
print("Programs: " + str(programs))

The projects within a specific program can be retrieved using the `get_projects_by_program` method.


In [None]:
# List projects within a program.
projects = querier.get_projects_by_program(program=config["gen3"].get("program"))
print("Projects: " + str(projects))
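
The two calls above can be combined to build an overview of every program and its projects. The following is a minimal sketch under one assumption that this tutorial does not confirm: that `get_all_programs` returns an iterable of program names that can be passed directly to `get_projects_by_program`. `StubQuerier` is a hypothetical stand-in for `MetadataQuerier` so that the example runs without a platform connection, and its program and project names are placeholders.

```python
def list_projects_by_program(querier):
    """Return a {program: projects} mapping using the two methods shown above."""
    return {
        program: querier.get_projects_by_program(program=program)
        for program in querier.get_all_programs()
    }


class StubQuerier:
    """Hypothetical offline stand-in for MetadataQuerier (illustration only)."""

    _projects = {"12 Labours": ["EP1"]}  # placeholder data, not real platform content

    def get_all_programs(self):
        return list(self._projects)

    def get_projects_by_program(self, program):
        return self._projects.get(program, [])


overview = list_projects_by_program(StubQuerier())
for program_name, project_names in overview.items():
    print(program_name, "->", project_names)
```

With a real connection, `list_projects_by_program(querier)` would be called with the `querier` object created in the Setup section.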


## Interacting with datasets
Each dataset has a unique identifier (UID). Datasets can be retrieved as ‘Dataset’ Python objects. The ‘Dataset’ object provides multiple methods for interacting with a dataset, e.g., ‘get_dataset_description’, which provides dataset information as a data dictionary, or ‘download_dataset’, which downloads a dataset in a specific format such as SDS. The Digital Twin Platform’s API documentation lists all the methods available for interacting with datasets (add link to API documentation). [TO DO]

### Listing all datasets in the platform
The ‘get_datasets’ method retrieves ‘Dataset’ objects for all datasets in the platform as a Python list.

In [None]:
datasets = querier.get_datasets()
for dataset in datasets:
    print(dataset.uid)

### Finding specific dataset identifiers
The first step in accessing a specific dataset is to find its UID. There are multiple ways to find the UID of a dataset of interest:
1. From the 12 LABOURS Digital Twin Portal: dataset IDs are included in each dataset entry in the platform’s data catalogue.
2. By selecting a specific dataset UID after listing all datasets.
3. By searching for existing datasets and retrieving the UID of the dataset of interest.


### Searching for datasets
The ‘search_datasets’ method searches for datasets and returns a Python list of ‘Dataset’ objects that match the search criteria. Currently, only search text that exactly matches the title of an existing dataset in the platform is supported.


In [None]:
# Search for datasets whose title matches the query text.
datasets = querier.search_datasets(query='breast')
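
Because a search returns a list of ‘Dataset’ objects, the UIDs of the matches can be collected from each object's `uid` attribute, and an empty list signals that nothing matched. The sketch below illustrates this with hypothetical stand-ins (`StubDataset`, `StubQuerier`) so that it runs without a platform connection. The stub's matching rule (exact, case-sensitive comparison with the title) is an assumption made for illustration, and the titles and UIDs are placeholders.

```python
from dataclasses import dataclass


@dataclass
class StubDataset:
    """Hypothetical stand-in for a Dataset object; only `uid` is modelled."""
    uid: int


class StubQuerier:
    """Hypothetical offline stand-in for MetadataQuerier (illustration only)."""

    def __init__(self, uid_by_title):
        self._uid_by_title = uid_by_title  # placeholder {title: uid} data

    def search_datasets(self, query):
        # Assumed rule: a dataset matches only if its title equals the query.
        return [
            StubDataset(uid)
            for title, uid in self._uid_by_title.items()
            if title == query
        ]


def find_dataset_uids(querier, query):
    """Return the UIDs of all datasets matched by a search (possibly empty)."""
    return [dataset.uid for dataset in querier.search_datasets(query=query)]


stub = StubQuerier({"breast": 103})
matching_uids = find_dataset_uids(stub, "breast")
if matching_uids:
    print("Matching dataset UIDs:", matching_uids)
else:
    print("No datasets matched the query.")
```

Checking for an empty result before indexing into the list avoids an `IndexError` when the query text does not match any dataset title.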

### Accessing a specific dataset by its identifier
If the UID of a dataset has already been identified, the ‘get_dataset’ method can be used to obtain its ‘Dataset’ object.

In [None]:
# Retrieve the dataset with UID 103.
dataset = querier.get_dataset(103)

### Accessing a dataset’s description
A ‘Dataset_description’ object can be retrieved using the ‘get_dataset_description’ method.

In [None]:
dataset_description = dataset.get_dataset_description()

### Listing subjects in a dataset
A ‘Subjects’ object can be retrieved using the ‘get_subjects’ method.

In [None]:
subjects = dataset.get_subjects()
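
The three calls in this section (`get_dataset`, `get_dataset_description` and `get_subjects`) can be chained into a single helper. The following is a minimal sketch only: this tutorial does not specify the structure of the returned description or subjects objects, so the hypothetical `summarise_dataset` helper simply passes through whatever the API returns. `StubDataset` and `StubQuerier` are stand-ins with placeholder return values so that the example runs offline.

```python
class StubDataset:
    """Hypothetical stand-in for a Dataset object (illustration only)."""

    def __init__(self, uid):
        self.uid = uid

    def get_dataset_description(self):
        return {"title": "placeholder title"}  # placeholder, not real metadata

    def get_subjects(self):
        return ["subject-1", "subject-2"]  # placeholder, not real subjects


class StubQuerier:
    """Hypothetical offline stand-in for MetadataQuerier (illustration only)."""

    def get_dataset(self, uid):
        return StubDataset(uid)


def summarise_dataset(querier, uid):
    """Collect a dataset's UID, description and subjects in one dictionary."""
    dataset = querier.get_dataset(uid)
    return {
        "uid": dataset.uid,
        "description": dataset.get_dataset_description(),
        "subjects": dataset.get_subjects(),
    }


summary = summarise_dataset(StubQuerier(), 103)
print(summary)
```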