# Tutorial 1 - Downloading and exploring SPARC datasets

In this tutorial, you will learn how to use `sparc_me` to interact with SPARC datasets through the Pennsieve API. You will download files from an SDS (SPARC Dataset Structure) dataset, such as the human whole-body computational scaffold with embedded organs, and query data using UBERON ontology terms.

Before you run the examples, make sure the dependencies are installed.

```bash
pip install -r requirements
pip install python-decouple
```
- The examples use Python 3.9.0.

You will also see how to retrieve a protocol through the DOI stored in a SPARC dataset.

Query a UBERON ontology term (this code stays in the example and does not need to be moved into the `sparc-me` module):

1. Hard-code `uberon_code = "UBERON:0000916"` in the example.
2. Call an existing Python library to access information for that UBERON term, e.g. `info = getTermInfo(UBERON_CODE)`.
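Before handing a term to a lookup library, it can help to validate the hard-coded identifier. The following is a minimal sketch, not part of `sparc_me`: the helper names `split_curie` and `obo_purl` are hypothetical, and the URL pattern is the standard OBO PURL form, which applies to OBO ontologies such as UBERON but not to `ILX` identifiers.

```python
# Hypothetical helpers (not part of sparc_me): split a CURIE such as
# "UBERON:0000916" into prefix and local ID, then build its OBO PURL.
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def split_curie(curie: str) -> tuple[str, str]:
    """Return (prefix, local_id) or raise ValueError for malformed input."""
    prefix, sep, local_id = curie.strip().partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


def obo_purl(curie: str) -> str:
    """Build the OBO PURL for a CURIE from an OBO ontology."""
    prefix, local_id = split_curie(curie)
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


uberon_code = "UBERON:0000916"
print(obo_purl(uberon_code))
# prints http://purl.obolibrary.org/obo/UBERON_0000916
```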

## Download an existing curated SDS dataset
### Access Pennsieve metadata

```python
from sparc_me import Dataset_Api

api_tools = Dataset_Api()
```

```python
# Get all datasets from the Pennsieve API.
# This returns a list of all SPARC datasets.
datasets = api_tools.get_all_datasets_latest_version_pensieve()

# Get a specific dataset by its ID.
dataset = api_tools.get_dataset_latest_version_pensieve(156)

# Get the latest version number of a dataset.
# Parameter: datasetId (number | string)
latest_version_num = api_tools.get_dataset_latest_version_number(156)
```




### Retrieve and store a protocol from protocols.io in JSON format

```python
# Get all of the protocol information for a dataset and store it locally in JSON format.
# The function reads the protocol DOI from the Pennsieve dataset, then queries protocols.io.
# Parameters:
#   datasetId (number | string)
#   savepath (string): path in which to store the data
api_tools.get_protocolsio_text(273, "./datasets")
```
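The protocol lookup depends on a DOI stored in the dataset metadata, and DOIs appear in several forms (bare, prefixed with `doi:`, or as a resolver URL). How `sparc_me` handles this internally is not shown here. The sketch below only illustrates one way to normalise such strings: `normalize_doi` and `is_protocolsio_doi` are hypothetical helpers, and the example DOI is made up.

```python
import re

# General DOI shape: "10.<registrant>/<suffix>".
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def normalize_doi(raw: str) -> str:
    """Return the bare DOI found in a DOI string or resolver URL."""
    match = DOI_PATTERN.search(raw.strip())
    if match is None:
        raise ValueError(f"no DOI found in {raw!r}")
    # Drop punctuation that often trails a DOI copied from running text.
    return match.group(0).rstrip(".,;")


def is_protocolsio_doi(raw: str) -> bool:
    """protocols.io DOIs are registered under the 10.17504 prefix."""
    return normalize_doi(raw).lower().startswith("10.17504/protocols.io.")


example = "https://doi.org/10.17504/protocols.io.example1"  # made-up DOI
print(normalize_doi(example))       # prints 10.17504/protocols.io.example1
print(is_protocolsio_doi(example))  # prints True
```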



### Download files from an SDS dataset (folders, xlsx files, etc.)
#### Download dataset_description.xlsx

```python
# Get an xlsx or csv file from the Pennsieve API and store it in the given path.
# Parameters:
#   datasetId (number | string)
#   target file path within the SPARC dataset
#   save_path (string): path in which to store the data
api_tools.get_xlsx_csv_file_pennsieve(156, "files/dataset_description.xlsx", "./datasets")
```



#### Download humanWholeBody_annotations.csv

```python
api_tools.get_xlsx_csv_file_pennsieve(156, "files/docs/humanWholeBody_annotations.csv", "./datasets")
```



## Query UBERON ontology term

- To do this, we use the `ontquery` package, which provides a convenient API for querying terms in SciCrunch.
- To get an API key, see https://github.com/tgbugs/ontquery#scicrunch-api-key
- To make the API key available through local environment variables:
    - create a `.env` file in the project root folder that defines `SCICRUNCH_API_KEY`
    - `pip install python-decouple`
    - read the key in Python:
      ```python
      from decouple import config
      SCICRUNCH_API_KEY = config('SCICRUNCH_API_KEY')
      ```
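A missing key tends to surface later as an opaque request failure, so it can be worth checking for it up front. The sketch below is an assumption-laden illustration rather than part of `sparc_me` or `ontquery`: `load_scicrunch_key` is a hypothetical helper, and it reads only from the process environment (or any mapping passed in), not from the `.env` file. `python-decouple` itself checks environment variables first and then falls back to `.env`.

```python
import os


def load_scicrunch_key(env=os.environ) -> str:
    """Return the SciCrunch API key, or raise a readable error if it is missing."""
    key = env.get("SCICRUNCH_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SCICRUNCH_API_KEY is not set; add it to .env or export it "
            "before running the ontology queries."
        )
    return key


# Demonstration with an in-memory mapping and a placeholder value.
print(load_scicrunch_key({"SCICRUNCH_API_KEY": "placeholder-key"}))
# prints placeholder-key
```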

```python
api_tools.get_UBERONs_From_Dataset(156, "files/docs/humanWholeBody_annotations.csv")
```



```
0     UBERON:0000916
1     UBERON:0000468
4     UBERON:0001103
5     UBERON:0000033
6     UBERON:0000974
8     UBERON:0001003
9     UBERON:0002240
10       ILX:0742178
11              None
12              None
13              None
14              None
15       ILX:0774405
16       ILX:0778144
17       ILX:0778146
18       ILX:0778145
19       ILX:0778147
20    UBERON:0002098
21       ILX:0778116
22       ILX:0778117
23       ILX:0778112
24       ILX:0778113
25       ILX:0778124
26       ILX:0778118
27       ILX:0778120
28       ILX:0778122
29       ILX:0778121
30       ILX:0778126
31       ILX:0778127
32              None
33              None
34              None
35              None
Name: Term ID, dtype: object
```
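The result mixes `UBERON` and `ILX` identifiers with empty rows, so a small post-processing step is often useful before querying an ontology service. The sketch below assumes the values have been converted to a plain Python list (for a pandas Series, `series.tolist()` would do this). `group_by_prefix` is a hypothetical helper, and the sample list holds only a few values copied from the output.

```python
from collections import defaultdict

# A few values copied from the output above; None marks rows without a term.
term_ids = [
    "UBERON:0000916",
    "UBERON:0000468",
    "ILX:0742178",
    None,
    "ILX:0778146",
    "UBERON:0002098",
]


def group_by_prefix(ids):
    """Group term IDs by ontology prefix, skipping anything that is not a string."""
    groups = defaultdict(list)
    for term in ids:
        if not isinstance(term, str):
            continue  # skips None and NaN placeholders
        term = term.strip()  # tolerate stray whitespace around an ID
        prefix = term.split(":", 1)[0]
        groups[prefix].append(term)
    return dict(groups)


groups = group_by_prefix(term_ids)
print(groups["UBERON"])
# prints ['UBERON:0000916', 'UBERON:0000468', 'UBERON:0002098']
print(groups["ILX"])
# prints ['ILX:0742178', 'ILX:0778146']
```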