# Tutorial 2: Create datasets and description
This tutorial shows how to create datasets and descriptions in the SPARC Data Structure (SDS) format.



## 1. Create an empty SDS dataset or load an existing SDS dataset
* Set up sparc-me

    sparc-me is a Python tool to explore, enhance, and expand SPARC datasets and their descriptions in accordance with FAIR principles.

In [None]:
from sparc_me import Dataset

dataset = Dataset()

* Create an empty dataset from the SDS template. 

    For this tutorial, we strongly recommend SDS template version 2.0.0. Remember to specify the path where you want your dataset to be saved.

In [None]:
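# Directory where the dataset should be saved; note that it is not passed to any call in this cell.
save_dir = "./tmp/template/"
# Create an empty dataset from SDS template version 2.0.0.
dataset.load_from_template(version="2.0.0")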

* Load an existing SDS dataset (not recommended in this tutorial)

In [None]:
existing_dir = "./your/dataset/path/here"
# dataset.load_dataset(dataset_path=existing_dir)

## 2. Get metadata files and clear metadata if necessary

This tutorial focuses on the `dataset_description` metadata file. For other metadata files, replace the `category` argument with the name of the metadata file you want to use, e.g., change `category="dataset_description"` to `category="code_parameters"`.

* List the categories and elements of metadata files

In [None]:
categories = dataset.list_categories(version="2.0.0")
elements = dataset.list_elements(category="dataset_description", version="2.0.0")

* Get metadata file

    Again, the `dataset_description` metadata file is used as an example here; feel free to replace the category with the name of the metadata file you want to use.

In [None]:
dataset_description = dataset.get_metadata(category="dataset_description")

* (Optional) clear your metadata values before you edit them.
    * Clear all metadata values in the dataset_description file
    * Clear the entire row (e.g., `field_name='Contributor role'`) of metadata values

In [None]:
dataset_description.clear_values()

In [None]:
dataset_description.clear_values(field_name='Contributor role')

## 3. Add/update metadata values and save them

* The `add_values` function lets you add or update metadata values

    `add_values(*values: Any, row_name: str = '', header: str = '', append: bool = True)`

    * `*values` accepts one or more string values to add or update.
    * `row_name` is the row heading in the `dataset_description` and `code_description` metadata files, or the element name in other metadata files. Only these two metadata files have both row and column headings; the other files have column headings only. Therefore, it is recommended to use this parameter and add values by row in the `dataset_description` and `code_description` metadata files.
    * `header` is the column heading in the metadata file. In `dataset_description` and `code_description`, it defaults to the `Value` column, but you can specify another column. In other metadata files such as code_parameters, manifest, performances, resources, samples, subjects, and submission, it is recommended to use this parameter and add values by column.
    * `append` is a boolean. The default, `True`, appends the new values to the end of the existing list. If `append` is set to `False`, the existing values are replaced with the new values you specify.
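
The effect of `append` can be pictured with a plain Python dictionary of lists. The sketch below is not sparc-me's implementation: `add_values_sketch` and the `rows` dictionary are hypothetical stand-ins, used only to illustrate the append versus overwrite behaviour described above.

```python
from typing import Any, Dict, List


def add_values_sketch(rows: Dict[str, List[Any]], *values: Any,
                      row_name: str, append: bool = True) -> None:
    """Hypothetical stand-in that mimics the described `append` behaviour."""
    if append:
        # append=True: keep the existing values and add the new ones at the end
        rows.setdefault(row_name, []).extend(values)
    else:
        # append=False: discard the existing values and store only the new ones
        rows[row_name] = list(values)


rows: Dict[str, List[Any]] = {}

# Two calls with the default append=True accumulate values in the same row.
add_values_sketch(rows, "Breast cancer", row_name="Keywords")
add_values_sketch(rows, "image processing", row_name="Keywords")
keywords_after_append = list(rows["Keywords"])

# A call with append=False replaces whatever the row held before.
add_values_sketch(rows, "MRI", row_name="Keywords", append=False)
keywords_after_overwrite = list(rows["Keywords"])

print(keywords_after_append)
print(keywords_after_overwrite)
```

Under this assumption, repeated calls with the default `append=True` build up a row incrementally, while `append=False` is the safer choice when re-running a cell so that values are not duplicated.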

In [None]:
dataset_description.add_values("2.0.0", row_name='metadataversion')
dataset_description.add_values("experimental", row_name='type')
dataset_description.add_values("Duke breast cancer MRI preprocessing", row_name='Title')
dataset_description.add_values("""Preprocessing the breast cancer MRI images and saving in Nifti format""",
                                      row_name='subtitle')
dataset_description.add_values("Breast cancer", "image processing", row_name='Keywords')
dataset_description.add_values("""Preprocessing the breast cancer MRI images and saving in Nifti format""",
                                      row_name="Study purpose")
dataset_description.add_values("derived from Duke Breast Cancer MRI dataset",
                                      row_name='Study data Collection')
dataset_description.add_values("NA", row_name='Study primary conclusion')
#dataset_description.add_values("NA", row_name='Study primary conclusion', append=True)
dataset_description.add_values("breast", row_name='Study organ system')
dataset_description.add_values("image processing", row_name='Study approach')
dataset_description.add_values("""dicom2nifti""", row_name='Study technique')
dataset_description.add_values("Lin, Chinchien", "Gao, Linkun", row_name='contributorname')
#dataset_description.add_values("Prasad", "Jiali", row_name='contributorNAME', append=True)
#dataset_description.add_values(*["bob", "db"], row_name="contributor name", append=True)
dataset_description.add_values(
    "https://orcid.org/0000-0001-8170-199X",
    "https://orcid.org/0000-0001-8171-199X",
    "https://orcid.org/0000-0001-8172-199X",
    "https://orcid.org/0000-0001-8173-199X",
    "https://orcid.org/0000-0001-8174-199X",
    "https://orcid.org/0000-0001-8176-199X",
    row_name='Contributor orcid')

dataset_description.add_values(*["University of Auckland"] * 6, row_name='Contributor affiliation')
dataset_description.add_values(*["developer", "developer", "Researcher", "Researcher", "tester", "tester"],
                                      row_name="contributor role")
dataset_description.add_values("source", row_name='Identifier description')
dataset_description.add_values("WasDerivedFrom", row_name='Relation type')
dataset_description.add_values("DTP-UUID", row_name='Identifier')
dataset_description.add_values("12L digital twin UUID", row_name='Identifier type')
dataset_description.add_values("1", row_name='Number of subjects')
dataset_description.add_values("1", row_name='Number of samples')
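
In the cell above, only two contributor names are active (the other name calls are commented out), while six ORCIDs, affiliations, and roles are supplied. Assuming each contributor is meant to occupy one value column across these rows, a small plain-Python check can catch such a mismatch before the values are passed to `add_values`. `check_equal_lengths` and `contributors` are hypothetical names for illustration and are not part of sparc-me.

```python
from typing import Dict, List


def check_equal_lengths(rows: Dict[str, List[str]]) -> Dict[str, int]:
    """Return the rows whose number of values differs from the longest row."""
    longest = max(len(values) for values in rows.values())
    return {name: len(values) for name, values in rows.items()
            if len(values) != longest}


# Values copied from the tutorial cell; only the active (uncommented) names are listed.
contributors = {
    "Contributor name": ["Lin, Chinchien", "Gao, Linkun"],
    "Contributor affiliation": ["University of Auckland"] * 6,
    "Contributor role": ["developer", "developer", "Researcher",
                         "Researcher", "tester", "tester"],
}

# Maps each shorter row to its length; empty when all rows line up.
mismatched = check_equal_lengths(contributors)
print(mismatched)
```

An empty result means every contributor row has the same number of values.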

## 4. Add your data, which can be either folders or files
add_data(folder/file)