Jupyter notebooks can sometimes be hard to work with. A few magic commands come in handy when things don't seem to work. The following cell enables automatic reloading of all changed modules.

In [26]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Table of Contents
1. [Installation](#installation)
2. [Motivation](#motivation)
3. [Importing a dataset](#importing)
4. [Attributes](#attributes)
5. [Accessing Samples](#accessing)
6. [Iteration over samples](#iteration)
7. [Subset selection](#subset)
8. [Saving/Loading a dataset](#saving)
9. [Combining datasets](#merging)

## Installation<a name="installation"></a>

In [2]:
# !pip install MRdataset

In [3]:
from MRdataset import import_dataset
# check if install worked
# check version
# upgrade command
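
The placeholder comments in the cell above can be filled in with a small check. The following is a minimal sketch that uses only the standard library's `importlib.metadata`; the helper name `installed_version` is made up for illustration, and the distribution name `MRdataset` is taken from the `pip install` command above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("MRdataset")
if version is None:
    print("MRdataset is not installed; try: pip install MRdataset")
else:
    print(f"MRdataset {version} is installed")
    # To upgrade to the latest release: pip install --upgrade MRdataset
```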


## Motivation<a name="motivation"></a>
Large-scale neuroimaging datasets play an essential role in studying brain-behavior relationships. While neuroimaging studies have shown promising results, reproducibility can be affected by differences in acquisition parameters at the scanner level. The motivation behind creating MRdataset is to provide a unified interface to access image acquisition data across various formats and sources such as XNAT, BIDS, and LONI.

Having a unified interface to access image acquisition data is important because it allows users to easily and consistently access and manipulate the data, regardless of the specific format or source of the data. This can save time and reduce the potential for errors, as users do not need to worry about dealing with the nuances of different data formats or sources. In addition, a unified interface can make it easier to integrate image acquisition data with other systems and processes, allowing for more efficient and effective analysis and use of the data.

## Importing a dataset<a name="importing"></a>
To make things concrete, let's jump straight to an example. We will complete this tutorial using an example DICOM dataset; note that the outputs will differ depending on your data. The library also supports BIDS datasets, and example code for a BIDS dataset is discussed later in this tutorial.

Let's get started!

A dataset can be imported from disk using the function `import_dataset`. It accepts a `name` for the dataset and a `style`, which can be either `dicom` or `bids`. At this point, the dataset is still empty.

In [15]:
DATA_ROOT = '/home/sinhah/github/MRdataset/examples/example_dicom_data/'
dicom_dataset = import_dataset(data_root=DATA_ROOT,
                               style='dicom',
                               name='dummy_study_experiment')

If a dataset is empty, it means that there is no data stored in it. This can be a problem if the dataset is supposed to contain data that is needed for a particular analysis or task. In such cases, the absence of data can prevent the analysis or task from being performed, or it can lead to incorrect or incomplete results.

We can check that `dicom_dataset` is empty, by printing it.

In [5]:
print(dicom_dataset)

DicomDataset dummy_study_experiment is empty. Use .walk()


It is often beneficial to have a separate method for reading data because it allows the user to explicitly control when the data is read. This can be useful in cases where the data is very large or complex, as it allows the user to manage the amount of data that is being processed at any given time. Additionally, having a separate method for reading data can make the overall design of the code more modular and flexible, as it allows the user to easily swap out different data sources or change the way that the data is read without having to modify the rest of the code.
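
To illustrate this design choice, here is a minimal sketch of a class that separates construction from reading. It is not MRdataset's implementation: the class `LazyFileIndex` and its attributes are hypothetical, and it simply lists files rather than parsing any imaging data.

```python
import tempfile
from pathlib import Path


class LazyFileIndex:
    """Hypothetical stand-in: construction is cheap, walk() does the disk scan."""

    def __init__(self, data_root, name):
        self.data_root = Path(data_root)
        self.name = name
        self.files = []        # stays empty until walk() is called
        self._walked = False

    def walk(self):
        """Scan data_root recursively and record every regular file found."""
        self.files = sorted(p for p in self.data_root.rglob("*") if p.is_file())
        self._walked = True
        return self

    def __str__(self):
        if not self._walked:
            return f"{type(self).__name__} {self.name} is empty. Use .walk()"
        return f"{type(self).__name__} {self.name} with {len(self.files)} files"


# Usage on a throwaway directory, so nothing on your disk is touched.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "example.dcm").write_text("placeholder")
    index = LazyFileIndex(tmp, name="demo")
    print(index)   # reports that the index is empty
    index.walk()   # the expensive step happens only here
    print(index)   # reports the number of files found
```

Because the scan happens only in `walk()`, creating the object stays fast even when the directory is very large.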

After creating the `dicom_dataset` object, the user must call the `.walk()` method to read the data from disk.

In [20]:
dicom_dataset.walk()

2022-12-15 10:31:43,518 - INFO - Localizer: Skipping /home/sinhah/github/MRdataset/examples/example_dicom_data/localizer___64_channel_ND/16756
2022-12-15 10:31:43,518 - INFO - Localizer: Skipping /home/sinhah/github/MRdataset/examples/example_dicom_data/localizer___64_channel_ND/16756
2022-12-15 10:31:43,518 - INFO - Localizer: Skipping /home/sinhah/github/MRdataset/examples/example_dicom_data/localizer___64_channel_ND/16756
2022-12-15 10:31:43,521 - INFO - Localizer: Skipping /home/sinhah/github/MRdataset/examples/example_dicom_data/localizer___64_channel_ND/10995
2022-12-15 10:31:43,521 - INFO - Localizer: Skipping /home/sinhah/github/MRdataset/examples/example_dicom_data/localizer___64_channel_ND/10995
2022-12-15 10:31:43,521 - INFO - Localizer: Skipping /home/sinhah/github/MRdataset/examples/example_dicom_data/localizer___64_channel_ND/10995
2022-12-15 10:31:43,526 - INFO - Localizer: Skipping /home/sinhah/github/MRdataset/examples/example_dicom_data/localizer___64_channel_ND/19598

2022-12-15 10:31:44,720 - INFO - Localizer: Skipping /home/sinhah/github/MRdataset/examples/example_dicom_data/localizer___64_channel/18941
2022-12-15 10:31:44,720 - INFO - Localizer: Skipping /home/sinhah/github/MRdataset/examples/example_dicom_data/localizer___64_channel/18941


Using the `print()` method to see the contents of a dataset can be beneficial because it allows the user to quickly and easily view the data, without having to write additional code to extract and display the data. 

In [21]:
print(dicom_dataset)

DicomDataset dummy_study_experiment with 12 Modality


Printing `dicom_dataset` again now gives concise information about the dataset: its type (`DicomDataset`) and the number of modalities it contains (12).

Using `print_tree()` can be especially useful when working with large or complex datasets, as it can be difficult to manually inspect the data and identify any patterns or issues. It can help the user to verify that the data has been correctly read and processed, and to diagnose any problems that may have occurred during the reading or processing of the data. 

In [22]:
dicom_dataset.print_tree()

dummy_study_experiment
+- ACR_Sag_locator
|  +- 17251
|  |  +- 14
|  |     +- 1.3.12.2.1107.5.2.43.167092.2021082615203170093953502.0.0.0_e20
|  +- 14733
|  |  +- 1
|  |     +- 1.3.12.2.1107.5.2.43.167092.202108261422131955188731.0.0.0_e20
|  |     +- 1.3.12.2.1107.5.2.43.67078.2022050510010820219246816.0.0.0_e20
|  +- 18941
|     +- 16
|        +- 1.3.12.2.1107.5.2.43.167092.2021082615340486712854114.0.0.0_e20
+- ACR_Axial_T1
|  +- 16756
|  |  +- 2
|  |  |  +- 1.3.12.2.1107.5.2.43.67078.2022050510041395171846876.0.0.0_e20
|  |  +- 3
|  |     +- 1.3.12.2.1107.5.2.43.67078.2022050510062560316546934.0.0.0_e20
|  +- 10995
|  |  +- 3
|  |     +- 1.3.12.2.1107.5.2.43.67078.2022050510062560316546934.0.0.0_e20
|  +- 19598
|  |  +- 3
|  |  |  +- 1.3.12.2.1107.5.2.43.67078.2022050510062560316546934.0.0.0_e20
|  |  +- 2
|  |     +- 1.3.12.2.1107.5.2.43.67078.2022050510041395171846876.0.0.0_e20
|  +- 17251
|  |  +- 3
|  |     +- 1.3.12.2.1107.5.2.43.67078.2022050510062560316546934.0.0.0_e20
|  +-

We will discuss `print_tree()` in full later in this tutorial. For now, let's complete our discussion on importing a dataset.

You may have noticed that `.walk()` logged messages reporting that many files were skipped. The method skips any file it identifies as a localizer. In general, these are not required, but if you still want to include them in your object, the `include_phantom` flag can be used to do so.

In [23]:
DATA_ROOT = '/home/sinhah/github/MRdataset/examples/example_dicom_data/'
dicom_dataset_w_phantom = import_dataset(data_root=DATA_ROOT,
                                         style='dicom',
                                         name='dummy_study_experiment',
                                         include_phantom=True)

As before, we invoke the `.walk()` method. This time, there are no messages about skipped files.

In [24]:
dicom_dataset_w_phantom.walk()



However, we see some warning messages saying that some localizer files have issues in their parameters. This is expected, as localizers are not valid MRI volumes: they are short scans that allow the scanner to set up correctly. Note that these checks and warnings are still under active development, and you may observe these messages even if the files in your project are perfectly valid MRI scan volumes.
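
To make the skipping behavior concrete, here is a minimal sketch of how a reader might filter out localizers by series name. This is an illustration only and not how MRdataset identifies localizers: the functions `is_localizer` and `select_series`, the keyword list, and the `include_localizers` parameter are all assumptions. The series names are taken from the log messages and outputs above.

```python
def is_localizer(series_name, keywords=("localizer", "scout")):
    """Heuristic: treat a series as a localizer if its name contains a keyword."""
    lowered = series_name.lower()
    return any(keyword in lowered for keyword in keywords)


def select_series(series_names, include_localizers=False):
    """Return the series to keep, dropping localizers unless asked to keep them."""
    if include_localizers:
        return list(series_names)
    return [name for name in series_names if not is_localizer(name)]


names = ["localizer___64_channel_ND", "ACR_Axial_T1", "DTI_LR"]
print(select_series(names))                           # localizer dropped
print(select_series(names, include_localizers=True))  # everything kept
```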

## Structure

Going further, let's dig deeper into the elements present in our dataset. Describing these elements provides context about the data the dataset contains, which helps other users and researchers working with MRdataset understand its structure and how it should be interpreted. It also helps ensure that the data structure is used correctly and consistently, and it can ease integrating the dataset with other data sources or systems.

The library has a hierarchical structure, as displayed below:

![alt text](mrdataset-structure.png "Title")

The above figure shows a simple schematic to depict the structure of MRdataset object. 

Different MRI modalities, such as T1-weighted, T2-weighted, and diffusion-weighted imaging, can provide different types of information about the structure and composition of tissues in the body. Additionally, MRI scans are often performed on multiple subjects, such as healthy individuals and patients with a specific condition, in order to compare and contrast the differences in their anatomy and physiology. This can help researchers to better understand the underlying mechanisms of a particular condition or disease, and to develop more effective treatments.

Similarly, the MRdataset object is a hierarchical data structure made up of the different elements of a neuroimaging experiment, such as modalities, subjects, sessions and runs. Each element is represented as a node in a tree, and the edges connecting the nodes show the hierarchical relationship between data elements.

So, the **dataset** is at the top of the tree, and the various modalities, like T1-weighted, T2-weighted and diffusion-weighted, branch out beneath it. The term **modality** refers to the specific technique that is used to acquire the imaging data. Each modality contains several subjects, which are part of the experiment. Observe that different modalities typically share subjects, which makes it possible to compare and contrast differences in their brain anatomy and function.

Each **subject** may have one or more sessions for a modality. The term **session** refers to a specific imaging session that is performed on a given subject. Typically, there are multiple sessions in order to obtain multiple sets of data for a given subject. Often, a subject returns to the MR research center several times over a span of 1-2 years, which helps in tracking longitudinal changes in the brain.

Finally, **run** refers to a specific set of imaging data that is acquired during a given session. Often, a single session involves multiple runs to obtain a comprehensive acquisition. For example, an fMRI experiment may involve multiple runs, each of which acquires information about a particular brain region or uses a different behavioral task.
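
The hierarchy described above can be sketched with plain dataclasses. This is a toy model for illustration, not MRdataset's actual classes: the class names, fields, and the helper `count_runs` are assumptions, and the subject and session labels are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    name: str


@dataclass
class Session:
    name: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class Subject:
    name: str
    sessions: List[Session] = field(default_factory=list)


@dataclass
class Modality:
    name: str
    subjects: List[Subject] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    modalities: List[Modality] = field(default_factory=list)


def count_runs(dataset):
    """Count the leaf nodes (runs) by descending through every level."""
    return sum(
        len(session.runs)
        for modality in dataset.modalities
        for subject in modality.subjects
        for session in subject.sessions
    )


toy = Dataset("toy_study", [
    Modality("T1w", [
        Subject("sub-01", [Session("ses-01", [Run("run-01"), Run("run-02")])]),
        Subject("sub-02", [Session("ses-01", [Run("run-01")])]),
    ]),
])
print(count_runs(toy))  # 3
```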

We can observe this hierarchical structure in our dataset `dicom_dataset` object.

In [7]:
print(f"{dicom_dataset.name} dataset contains following modalities:")
for modality in dicom_dataset.modalities:
    print('\t', modality)

dummy_study_experiment dataset contains following modalities:
	 Modality ACR_Sag_locator with 3 Subject
	 Modality ACR_Axial_T1 with 10 Subject
	 Modality DTI_LR with 10 Subject
	 Modality me_fMRI with 5 Subject
	 Modality rsfMRI_RL with 9 Subject
	 Modality DTI_LR_repeat with 10 Subject
	 Modality 2D_GRE-MT with 9 Subject
	 Modality rsfMRI_LR with 10 Subject
	 Modality 3D_T1-weighted with 9 Subject
	 Modality DTI_RL with 10 Subject
	 Modality 3D_T2_FLAIR with 9 Subject
	 Modality me_FieldMap_GRE with 9 Subject


We can also browse through each modality to see that it contains several subjects.

In [27]:
print(f"{dicom_dataset.name} dataset contains following modalities:")
for modality in dicom_dataset.modalities:
    print('\t', modality)
    for subject in modality.subjects:
        print('\t\t', subject)

dummy_study_experiment dataset contains following modalities:
	 Modality ACR_Sag_locator with 3 Subject
		 Subject 17251 with 1 Session
		 Subject 14733 with 1 Session
		 Subject 18941 with 1 Session
	 Modality ACR_Axial_T1 with 10 Subject
		 Subject 16756 with 2 Session
		 Subject 10995 with 1 Session
		 Subject 19598 with 2 Session
		 Subject 17251 with 1 Session
		 Subject 14733 with 2 Session
		 Subject 18489 with 2 Session
		 Subject 15132 with 1 Session
		 Subject 15079 with 1 Session
		 Subject 18710 with 2 Session
		 Subject 18941 with 2 Session
	 Modality DTI_LR with 10 Subject
		 Subject 16756 with 1 Session
		 Subject 10995 with 1 Session
		 Subject 19598 with 1 Session
		 Subject 17251 with 1 Session
		 Subject 14733 with 1 Session
		 Subject 18489 with 1 Session
		 Subject 15132 with 1 Session
		 Subject 15079 with 1 Session
		 Subject 18710 with 1 Session
		 Subject 18941 with 1 Session
	 Modality me_fMRI with 5 Subject
		 Subject 10995 with 1 Session
		 Subject 19598 wit

We can also use `print_tree()` directly rather than manually iterating through a for-loop. This can be especially useful when working with large or complex datasets, as it can be difficult to manually inspect the data and identify any patterns or issues.

In [28]:
dicom_dataset.print_tree()

dummy_study_experiment
+- ACR_Sag_locator
|  +- 17251
|  |  +- 14
|  |     +- 1.3.12.2.1107.5.2.43.167092.2021082615203170093953502.0.0.0_e20
|  +- 14733
|  |  +- 1
|  |     +- 1.3.12.2.1107.5.2.43.167092.202108261422131955188731.0.0.0_e20
|  |     +- 1.3.12.2.1107.5.2.43.67078.2022050510010820219246816.0.0.0_e20
|  +- 18941
|     +- 16
|        +- 1.3.12.2.1107.5.2.43.167092.2021082615340486712854114.0.0.0_e20
+- ACR_Axial_T1
|  +- 16756
|  |  +- 2
|  |  |  +- 1.3.12.2.1107.5.2.43.67078.2022050510041395171846876.0.0.0_e20
|  |  +- 3
|  |     +- 1.3.12.2.1107.5.2.43.67078.2022050510062560316546934.0.0.0_e20
|  +- 10995
|  |  +- 3
|  |     +- 1.3.12.2.1107.5.2.43.67078.2022050510062560316546934.0.0.0_e20
|  +- 19598
|  |  +- 3
|  |  |  +- 1.3.12.2.1107.5.2.43.67078.2022050510062560316546934.0.0.0_e20
|  |  +- 2
|  |     +- 1.3.12.2.1107.5.2.43.67078.2022050510041395171846876.0.0.0_e20
|  +- 17251
|  |  +- 3
|  |     +- 1.3.12.2.1107.5.2.43.67078.2022050510062560316546934.0.0.0_e20
|  +-
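
The same nested iteration can answer simple questions about the dataset, for example which subjects appear in every modality. The sketch below relies on the `.modalities` and `.subjects` attributes used earlier and assumes that each subject exposes a `.name` attribute, which this tutorial has not confirmed. The helper `subjects_in_all_modalities` is hypothetical, and the stand-in objects exist only so the example runs without MRdataset installed.

```python
from types import SimpleNamespace


def subjects_in_all_modalities(dataset):
    """Return the sorted subject names that occur in every modality."""
    per_modality = [
        {subject.name for subject in modality.subjects}  # .name is assumed
        for modality in dataset.modalities
    ]
    if not per_modality:
        return []
    return sorted(set.intersection(*per_modality))


def make_subject(name):
    """Build a stand-in subject object with only a name."""
    return SimpleNamespace(name=name)


toy = SimpleNamespace(modalities=[
    SimpleNamespace(subjects=[make_subject("s1"), make_subject("s2"), make_subject("s3")]),
    SimpleNamespace(subjects=[make_subject("s2"), make_subject("s4")]),
])
print(subjects_in_all_modalities(toy))  # ['s2']
```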

## Saving and Loading a dataset<a name="saving"></a>

Saving and loading a dataset is important because it allows you to store and retrieve your data for later use. This is especially useful when you have a large dataset that takes a long time to process or generate, or when you want to share your dataset with others.

By saving your dataset, you can avoid having to recreate it each time you want to use it, which can save you a significant amount of time and resources. Additionally, storing your dataset in a structured and organized way can make it easier to analyze and manipulate later on.

Overall, the ability to save and load a dataset is a valuable tool that can help you work more efficiently and effectively with your data.

We use `save_mr_dataset` and `load_mr_dataset` to save and load MRdataset objects, respectively. Let's see an example.


In [27]:
from MRdataset import save_mr_dataset, load_mr_dataset
save_mr_dataset(filepath='/home/sinhah/github/MRdataset/examples/example_dicom.mrds.pkl', 
                mrds_obj=dicom_dataset)

In [28]:
ret_dicom_dataset = load_mr_dataset(filepath='/home/sinhah/github/MRdataset/examples/example_dicom.mrds.pkl',
                                   style='dicom')

  ret_dicom_dataset = load_mr_dataset(filepath='/home/sinhah/github/MRdataset/examples/example_dicom.mrds.pkl',


NotImplementedError: Expected dicomdataset to be a subclass of MRdataset.base.Project in MRdataset.dicom_dataset.py.
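
Independently of the error above, the general save/load round trip can be illustrated with the standard library. The `.pkl` suffix in the file name suggests a pickle file, but that is an assumption; the sketch below does not claim to reproduce what `save_mr_dataset` or `load_mr_dataset` do internally. The helpers `save_object` and `load_object` are hypothetical, and a plain dictionary stands in for a dataset.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, filepath):
    """Serialize obj to filepath with pickle."""
    with open(filepath, "wb") as handle:
        pickle.dump(obj, handle)


def load_object(filepath):
    """Load a pickled object; only use this on files you trust."""
    with open(filepath, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy.pkl"
    original = {"name": "dummy_study_experiment", "modalities": ["ACR_Axial_T1"]}
    save_object(original, path)
    restored = load_object(path)
    print(restored == original)  # True
```

Unpickling can execute arbitrary code, so files received from others should only be loaded when their origin is trusted.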

## Accessing Samples<a name="accessing"></a>


In [31]:
from pathlib import Path
filepath=Path('/home/sinhah/github/MRdataset/examples/example_dicom.mrds.pkl')
filepath.is_file()

True

In [33]:
from MRdataset.base import find_dataset_using_style
dataset_class = find_dataset_using_style('dicom')

NotImplementedError: Expected dicomdataset to be a subclass of MRdataset.base.Project in MRdataset.dicom_dataset.py.
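
The error text mentions a class name built from the style string (`dicomdataset`) and a base class, which suggests that the dataset class is looked up by name. The following sketch shows one way such a lookup could work and where this kind of error would come from. It is a guess for illustration, not the code of `find_dataset_using_style`: the classes and the function `find_class_for_style` are hypothetical.

```python
class Project:
    """Hypothetical base class standing in for a dataset base type."""


class DicomDataset(Project):
    pass


class BidsDataset(Project):
    pass


def find_class_for_style(style, base=Project):
    """Return the direct subclass of base named '<style>dataset', ignoring case."""
    target = f"{style}dataset".lower()
    for candidate in base.__subclasses__():
        if candidate.__name__.lower() == target:
            return candidate
    # No subclass matched: the requested style has no registered class.
    raise NotImplementedError(
        f"Expected {target} to be a subclass of {base.__name__}."
    )


print(find_class_for_style("dicom").__name__)  # DicomDataset
```

Under this scheme, the lookup fails whenever the expected class is missing or does not inherit from the base class, even if the file on disk is valid.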


## Iteration over Samples<a name="iteration"></a>


## Subset Selection<a name="subset"></a>



## Merging Datasets<a name="merging"></a>
