# Tutorial 1 - Downloading and exploring SPARC datasets

In this tutorial, you will learn how to use `sparc_me` to interact with SPARC datasets through the Pennsieve API. The examples show how to download an SDS dataset (such as the human whole-body computational scaffold with embedded organs) and how to query data with UBERON ontology terms.

Before you run the examples, make sure the following dependencies are installed.

```bash
pip install -r requirements.txt
```
- Python version used: 3.8.6

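Before running the cells, a quick sanity check can confirm that the interpreter and packages are in place. This is a minimal sketch and not part of `sparc_me`: the helper name `check_environment`, the assumed import name `sparc_me`, and the 3.8 minimum (taken from the Python version noted above) are all assumptions.

```python
import importlib.util
import sys


def check_environment(required_modules=("sparc_me",), min_python=(3, 8)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # Compare only (major, minor) against the assumed minimum version.
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ expected, found {found}")
    # find_spec returns None when a top-level module cannot be found.
    for name in required_modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"module '{name}' is not installed")
    return problems
```

Calling `check_environment()` in a first cell and printing any returned messages surfaces a missing install before the Pennsieve calls fail.
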
## Download an existing curated SDS dataset
### Access Pennsieve metadata

In [None]:
from sparc_me import Dataset_Api
api_tools = Dataset_Api()

In [None]:
# Get the latest version of every dataset from the Pennsieve API.
# This returns the metadata of all SPARC datasets.
datasets = api_tools.get_all_datasets_latest_version_pensieve()

# Get the latest version of a specific dataset by its ID
dataset = api_tools.get_dataset_latest_version_pensieve(156)

# Get the latest version number of a dataset
# :parameter: datasetId number|string
latest_version_num = api_tools.get_dataset_latest_version_number(156)

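With the full list in hand, it is often useful to find a dataset ID by name before passing it to the other calls. The sketch below is a hypothetical helper, and it assumes a response shape that this tutorial does not document: either a list of dicts with `"id"` and `"name"` keys, or a dict wrapping such a list under a `"datasets"` key. Check the actual return value of `get_all_datasets_latest_version_pensieve()` before relying on it.

```python
def find_datasets(datasets, keyword):
    """Return (id, name) pairs for datasets whose name contains keyword.

    Assumption: `datasets` is either a list of dicts with "id" and "name"
    keys, or a dict that wraps such a list under a "datasets" key.
    """
    if isinstance(datasets, dict):
        datasets = datasets.get("datasets", [])
    needle = keyword.lower()  # case-insensitive substring match
    return [
        (entry.get("id"), entry.get("name", ""))
        for entry in datasets
        if needle in str(entry.get("name", "")).lower()
    ]
```

For example, `find_datasets(datasets, "whole-body")` would list candidate IDs to try with `get_dataset_latest_version_pensieve`.
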

### Retrieve a protocol from protocols.io and store it in JSON format

In [None]:
# Use this function to look up the protocols.io DOI of a dataset's protocol.
# You need to provide a dataset ID.
doi = api_tools.get_dataset_protocolsio_link(273)

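If you only need the bare DOI (for example, to cite the protocol or to build a `doi.org` URL), a small sketch like the following can pull it out of the returned value. It assumes that `get_dataset_protocolsio_link` returns a string, or something whose text contains the DOI; `extract_doi` and `doi_to_url` are hypothetical helpers, and the pattern is a common DOI heuristic rather than a full validator.

```python
import re

# Heuristic: "10." + 4-9 digit registrant code + "/" + a suffix without spaces or quotes.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"'<>]+")


def extract_doi(link):
    """Return the first DOI found in `link`, or None if there is none."""
    if not link:
        return None
    match = DOI_PATTERN.search(str(link))
    # Drop trailing punctuation that often sticks to DOIs in prose.
    return match.group(0).rstrip(".,;") if match else None


def doi_to_url(doi):
    """Build a resolvable doi.org URL from a bare DOI."""
    return f"https://doi.org/{doi}"
```
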
In [None]:
# Get the full protocol information for a dataset and store it locally in JSON format.
# This function queries the protocol DOI from the Pennsieve dataset, then retrieves the protocol from protocols.io.
# :parameter: datasetId number|string
# :parameter: savepath string - path of the folder in which to store the data
api_tools.get_protocolsio_text(273, "./datasets")

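To inspect what was saved, the JSON output can be read back with the standard library. This is a minimal sketch that assumes `get_protocolsio_text` writes one or more `.json` files somewhere under the save path; the exact file name and folder layout are not documented here, so the hypothetical `load_saved_json` helper simply collects every JSON file it finds.

```python
import json
from pathlib import Path


def load_saved_json(save_dir="./datasets"):
    """Load every *.json file under save_dir into a {relative path: data} dict."""
    base = Path(save_dir)
    results = {}
    # rglob also walks subfolders, since the exact output layout is an assumption.
    for path in sorted(base.rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            results[path.relative_to(base).as_posix()] = json.load(fh)
    return results
```

Listing `load_saved_json().keys()` after the call above shows which file(s) were written.
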
### Download files from an SDS dataset (folders, xlsx files, etc.)
#### Download dataset_description.xlsx

In [None]:
# Get an xlsx (or csv) file from the Pennsieve API and store it in the given path.
# :parameter: datasetId number|string
# :parameter: target file path within the SPARC dataset
# :parameter: save_path string - path of the folder in which to store the data
api_tools.get_xlsx_csv_file_pennsieve(156, "files/dataset_description.xlsx", "./datasets")

#### Download humanWholeBody_annotations.csv

In [None]:
api_tools.get_xlsx_csv_file_pennsieve(156, "files/docs/humanWholeBody_annotations.csv", "./datasets")

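A quick way to explore a downloaded CSV is to look at its header and row count. The sketch below uses only the standard `csv` module; the helper name is hypothetical, and the local path in the usage line is an assumption, so adjust it to wherever `get_xlsx_csv_file_pennsieve` actually placed the file.

```python
import csv
from pathlib import Path


def summarize_csv(csv_path):
    """Return the header row and the number of data rows of a CSV file."""
    # utf-8-sig transparently drops a byte-order mark if the file has one.
    with Path(csv_path).open(newline="", encoding="utf-8-sig") as fh:
        reader = csv.DictReader(fh)
        n_rows = sum(1 for _ in reader)  # count data rows (header excluded)
        columns = list(reader.fieldnames or [])
    return {"columns": columns, "n_rows": n_rows}
```

For example: `summarize_csv("./datasets/humanWholeBody_annotations.csv")` (hypothetical local path).
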
## Query UBERON ontology term

- To do this, we use the `ontquery` package, which provides a convenient API for querying terms in SciCrunch.
- To obtain an API key, see: https://github.com/tgbugs/ontquery#scicrunch-api-key
- To configure the API key as a local environment variable:
    - create a `.env` file in the project root folder containing the line `SCICRUNCH_API_KEY=<your key>`
    - install `python-decouple` (`pip install python-decouple`), then read the key:
      ```python
      from decouple import config
      SCICRUNCH_API_KEY = config('SCICRUNCH_API_KEY')
      ```

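If you prefer not to add `python-decouple`, a minimal alternative sketch reads the same variable directly from the process environment. Unlike `decouple`, it does not parse `.env` files, so the variable must be exported in your shell (or set by another tool) before Python starts; the helper name is hypothetical.

```python
import os


def get_scicrunch_api_key(env_var="SCICRUNCH_API_KEY"):
    """Return the SciCrunch API key from the process environment."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a readable message instead of sending an empty key.
        raise RuntimeError(f"{env_var} is not set; export it or add it to your .env file")
    return key
```
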
In [None]:
uberons = api_tools.get_UBERONs_From_Dataset(156, "files/docs/humanWholeBody_annotations.csv")
# Show the first UBERON term
uberons[0]

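UBERON terms are usually written as CURIEs such as `UBERON:0000916` (the form used in the query cell below), and each one maps to a resolvable OBO PURL. The sketch assumes that `get_UBERONs_From_Dataset` returns such strings; it also accepts the underscore form in case the CSV uses it. `uberon_to_iri` is a hypothetical helper.

```python
import re

# UBERON identifiers are "UBERON:" followed by seven digits, e.g. UBERON:0000916.
# The underscore form (UBERON_0000916) is accepted too.
_UBERON_ID = re.compile(r"^UBERON[:_](\d{7})$")


def uberon_to_iri(term_id):
    """Convert a UBERON CURIE to its OBO PURL, raising ValueError otherwise."""
    match = _UBERON_ID.match(str(term_id).strip())
    if match is None:
        raise ValueError(f"not a UBERON identifier: {term_id!r}")
    return f"http://purl.obolibrary.org/obo/UBERON_{match.group(1)}"
```

Running `uberon_to_iri(uberons[0])` also validates the extracted value before it is sent to SciCrunch.
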
- Query a UBERON term from SciCrunch (commented out for now, pending an update to ontquery; see [issue #34](https://github.com/tgbugs/ontquery/issues/34))

In [None]:
# from ontquery import OntQuery, SciGraphRemote, OntTerm, OntCuries
# from ontquery.plugins.namespaces import CURIE_MAP
# curies = OntCuries(CURIE_MAP)
# query = OntQuery(SciGraphRemote())
# OntTerm.query = query
# query('UBERON:0000916')

- You can also query a UBERON term extracted from the SPARC dataset

In [None]:
# query(uberons[0])
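
To look up every extracted term rather than only the first, the single call can be wrapped in a loop that skips duplicates, so each term is requested only once. In this sketch `query_fn` is a placeholder for any one-argument callable, for example the configured ontquery `query` object from the commented-out cell (an assumption, since that cell is disabled); `query_all` is a hypothetical helper.

```python
def query_all(term_ids, query_fn):
    """Run query_fn once per distinct term ID, keeping first-seen order."""
    results = {}
    for term_id in term_ids:
        if term_id not in results:  # skip duplicates to avoid repeat requests
            results[term_id] = query_fn(term_id)
    return results
```

Once the query cell works, `query_all(uberons, query)` would return a dict keyed by UBERON ID. If your callable returns a lazy iterator, wrap it, e.g. `lambda t: list(query(t))`.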