# Tutorial 2: Create an SDS dataset from existing data

The SPARC Dataset Structure (SDS) is a standardised method for organising files and metadata. In this tutorial, existing data is loaded into an SDS folder structure, and the metadata is then explored and edited.

## Creating the SDS folder structure

In [None]:
# Make sparc_me importable when running from this folder or its parent
import sys
sys.path.extend(['.', '..'])

from sparc_me import Dataset, Sample, Subject

# Initialise a dataset object
dataset = Dataset()

# Specify the SDS schema version to be created
version = "2.0.0"
dataset.create_empty_dataset(version)

# Specify the location in which to generate the SDS structure
save_dir = "./tmp/template/"

# Create the SDS folder structure at that location
dataset.set_path(save_dir)
dataset.save(save_dir)
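
To check what was written, the saved folder can be listed. The following is a minimal sketch using only the Python standard library. `list_tree` is a hypothetical helper introduced here, not part of `sparc_me`, and the names in the demonstration are stand-ins created in a temporary directory rather than the actual contents of the SDS template.

```python
import tempfile
from pathlib import Path


def list_tree(root):
    """Return sorted POSIX-style relative paths of everything under root."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


# Demonstration on a throwaway directory (stand-in names, not the real SDS layout).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "primary").mkdir()
    (demo / "example_metadata.txt").write_text("placeholder")
    demo_listing = list_tree(demo)
    for entry in demo_listing:
        print(entry)
```

Once the cell above has run, calling `list_tree(save_dir)` should list the folders and metadata files that `dataset.save` produced.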

## Transferring data into the SDS structure

Now that the destination structure exists, your existing data can be transferred into it.

In [None]:
# Add a copy of the data from the specified path into the SDS folder structure

subjects = []
samples = []

sample1 = Sample()
# Set the folder path to the sample's data
sample1.add_path("./test_data/bids_data/sub-01/sequence1/")
samples.append(sample1)

# Create a subject object
subject1 = Subject()
# Add the list of sample objects to the subject
subject1.add_samples(samples)
subjects.append(subject1)

dataset.add_subjects(subjects)
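
After a transfer it can be reassuring to confirm that nothing was left behind. The sketch below is one possible check, written with the standard library only. `missing_files` is a hypothetical helper, not a `sparc_me` function. It compares file names only (not contents), and it assumes the copied files keep their original names, which is not confirmed here. The demonstration uses made-up file names in a temporary directory.

```python
import tempfile
from pathlib import Path


def missing_files(source_dir, dataset_dir):
    """Return names of files under source_dir with no same-named file under dataset_dir."""
    source_names = {p.name for p in Path(source_dir).rglob("*") if p.is_file()}
    copied_names = {p.name for p in Path(dataset_dir).rglob("*") if p.is_file()}
    return sorted(source_names - copied_names)


# Demonstration with stand-in files: one of the two source files was "copied".
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source"
    dst = Path(tmp) / "dataset" / "primary"
    src.mkdir(parents=True)
    dst.mkdir(parents=True)
    (src / "image_001.dat").write_text("a")
    (src / "image_002.dat").write_text("b")
    (dst / "image_001.dat").write_text("a")
    demo_missing = missing_files(src, dst)
    print(demo_missing)
```

An empty list from `missing_files("./test_data/bids_data/sub-01/sequence1/", save_dir)` would suggest that every source file has a same-named counterpart in the dataset folder.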

## Editing the metadata
Now we can explore some of the metadata that was generated automatically while the files were transferred.
In this example, we add age information for a subject and fill in several fields for one of its samples.

In [None]:
# Modify the subject and sample metadata
subject_sds_id = "sub-1"
subject = dataset.get_subject(subject_sds_id)
subject.set_value(
    element='age',
    value=30)

sample_sds_id = "sam-1"
sample = subject.get_sample(sample_sds_id)
sample.set_value(
    element='sample experimental group',
    value='experimental')
sample.set_value(
    element='sample type',
    value='DCE-MRI Contrast Image {0}'.format(sample_sds_id))
sample.set_value(
    element='sample anatomical location',
    value='breast')

# Save changes
dataset.save(save_dir)


If the metadata for a given category is incomplete, it helps to fill in the values so that the rows containing them can be extracted later. The cell below sets the `sex` values in the subjects metadata.

In [None]:
# Metadata column (element) to fill in
header = "sex"

subjects_metadata = dataset.get_metadata(metadata_file="subjects")
subjects_metadata.set_values(element=header, values=["female", "male"])

dataset.save(save_dir)


## Filtering the metadata to identify subjects
We can use the metadata stored in the dataset to select subjects that meet specific criteria.

In [None]:
# Select the metadata for female subjects
index = subjects_metadata.data['sex'] == 'female'
subjects_metadata.data.loc[index, ['subject id', 'age', 'sex']]
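
Several criteria can be combined in the same way. The sketch below assumes that `subjects_metadata.data` is a pandas DataFrame, as the indexing above suggests. To stay self-contained it works on a toy table with made-up values instead of the real metadata.

```python
import pandas as pd

# Toy table with made-up values; the column names mirror those used above.
toy = pd.DataFrame(
    {
        "subject id": ["sub-1", "sub-2", "sub-3"],
        "age": [30, 45, 52],
        "sex": ["female", "male", "female"],
    }
)

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
mask = (toy["sex"] == "female") & (toy["age"] > 40)
selected = toy.loc[mask, ["subject id", "age", "sex"]]
print(selected)
```

The parentheses matter because `&` binds more tightly than `==` and `>` in Python, so omitting them changes how the expression is evaluated.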