# Batch Extraction Tutorial - Radiomics batch extraction using the MEDimage package

@Author : [MEDomics consortium](https://github.com/medomics/)

@Email : medomics.info@gmail.com


**STATEMENT**:
This file is part of <https://github.com/MEDomics/MEDomicsLab/>,
a package providing PYTHON programming tools for radiomics analysis.
--> Copyright (C) MEDomicsLab consortium.

## Introduction

Running this notebook requires first running the [DataManager-tutorial notebook](https://colab.research.google.com/github/MahdiAll99/MEDimage/blob/dev/notebooks/tutorial/DataManager-Tutorial.ipynb). We also recommend taking a look at the [MEDscan-Tutorial notebook](https://colab.research.google.com/github/MahdiAll99/MEDimage/blob/dev/notebooks/tutorial/MEDscan-Tutorial.ipynb).

This notebook is a tutorial on radiomics batch extraction using the *MEDimage* package, specifically its ``BatchExtractor`` class. ``BatchExtractor`` is the main object for this task: it orders the scans, prepares the batches, and runs the processing and feature extraction. It extracts features from all feature families and organizes the results in JSON files and CSV tables.


In a nutshell, this tutorial covers everything you need to know about batch extraction in the *MEDimage* package. We also advise you to read the [class documentation](https://medimage.readthedocs.io/en/documentation/biomarkers.html#module-MEDimage.biomarkers.BatchExtractor) before testing it.

The ``BatchExtractor`` class runs every step of the following flowchart, starting from the MEDscan box, so it takes care of the whole processing and extraction pipeline:

<img src="images/MEDimageFlowchart.png" width=600 height=400 />


### DICOM data

In this tutorial we use data from the STS (soft-tissue sarcoma) study processed by the McGill institute, which contains 204 scans of different types (PTscan, CTscan, ...). We assume that you have already processed these scans in the [DataManager-tutorial notebook](https://colab.research.google.com/github/MahdiAll99/MEDimage/blob/dev/notebooks/tutorial/DataManager-Tutorial.ipynb).

Imports

In [1]:
import os
import sys

# Make the local MEDimage sources importable without installing the package
MODULE_DIR = os.path.dirname(os.path.abspath('../MEDimage/'))
sys.path.append(os.path.dirname(MODULE_DIR))

import MEDimage
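The cell above appends a folder to ``sys.path`` so that ``import MEDimage`` finds the local sources. Assuming the notebook is run from ``notebooks/tutorial/`` (the location used in the Colab links), the two nested ``os.path.dirname`` calls resolve to the folder two levels above the working directory. Here is a small pathlib sketch of that computation; the helper names and the example path are hypothetical:

```python
import os
from pathlib import Path


def repo_root_os_path(cwd: str) -> str:
    """The notebook's expression, anchored at an explicit working directory."""
    # abspath also normalizes: '<cwd>/../MEDimage/' -> '<parent of cwd>/MEDimage'
    module_dir = os.path.dirname(os.path.abspath(os.path.join(cwd, "../MEDimage/")))
    # module_dir is the parent of cwd, so its dirname is the grandparent
    return os.path.dirname(module_dir)


def repo_root_from(cwd: str) -> Path:
    """Hypothetical pathlib equivalent: two levels above the working directory."""
    return Path(os.path.abspath(cwd)).parent.parent


cwd = "/home/user/MEDimage/notebooks/tutorial"  # hypothetical location
print(repo_root_os_path(cwd))  # /home/user/MEDimage on POSIX systems
print(repo_root_from(cwd))     # /home/user/MEDimage on POSIX systems
```

Passing ``os.getcwd()`` instead of the hard-coded path reproduces what the notebook cell does.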

## BatchExtractor initialization

Initializing the ``BatchExtractor`` class is easy. It accepts the following parameters:
 - ``path_read``: path to the data (``MEDscan`` objects).
 - ``path_csv``: path to the folder containing the CSV file that lists the scans to be processed by the ``BatchExtractor``.
 - ``path_params``: path to the JSON file of processing and extraction parameters.
 - ``path_save``: path where the results will be saved.
 - ``n_batch``: number of batches to use for parallel computation (use 0 for serial computation).
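To make the role of ``n_batch`` more concrete, here is a rough illustration of splitting a list of scans into batches, with 0 meaning a single serial pass. This is not *MEDimage*'s internal code: the helper ``split_into_batches``, the round-robin assignment, and the scan names are all hypothetical.

```python
from typing import List, Sequence


def split_into_batches(scans: Sequence[str], n_batch: int) -> List[List[str]]:
    """Hypothetical helper: distribute scans over n_batch batches.

    n_batch == 0 is treated as serial computation, i.e. one batch holding
    every scan, following the convention described for BatchExtractor.
    """
    if n_batch < 0:
        raise ValueError("n_batch must be >= 0")
    if n_batch == 0:
        return [list(scans)]
    # Never create more batches than there are scans (no empty batches)
    n = min(n_batch, len(scans)) or 1
    batches: List[List[str]] = [[] for _ in range(n)]
    for i, scan in enumerate(scans):
        batches[i % n].append(scan)  # round-robin assignment
    return batches


scans = [f"scan_{i}" for i in range(1, 6)]
print(split_into_batches(scans, 2))
# [['scan_1', 'scan_3', 'scan_5'], ['scan_2', 'scan_4']]
print(split_into_batches(scans, 0))
# [['scan_1', 'scan_2', 'scan_3', 'scan_4', 'scan_5']]
```

Each batch could then be handed to a separate worker; with ``n_batch=0`` everything runs in one pass.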
 
We recommend organizing your folders as follows:

<img src="images/BatchExtractionFolderStructure.png"/>

The *data* and *csv* folders contain, respectively, the scans from the McGill study and a CSV file listing those scans; both can be downloaded [here](). A template settings file can be found in the repository.
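If you prefer to prepare the skeleton from code, here is a minimal sketch that creates the folders this notebook reads from. The folder names follow the paths set in the notebook (``data/npy``, ``CSV``, ``settings``); the helper ``make_tutorial_layout`` is hypothetical, and the scans, CSV file, and JSON settings still have to be copied in by hand.

```python
from pathlib import Path
from typing import Dict


def make_tutorial_layout(root: Path) -> Dict[str, Path]:
    """Hypothetical helper: create the folders used by this notebook under root."""
    layout = {
        "path_read": root / "data" / "npy",  # MEDscan objects
        "path_csv": root / "CSV",            # CSV file listing the scans to process
        "settings": root / "settings",       # processing/extraction JSON files
    }
    for folder in layout.values():
        folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return layout
```

Calling ``make_tutorial_layout(Path.cwd())`` creates the folders next to the notebook.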

We will start by initializing the parameters needed.

In [2]:
from pathlib import Path


path_read = Path(os.getcwd()) / "data" / "npy"
path_csv = Path(os.getcwd()) / "CSV"
path_to_params = Path(os.getcwd()) / "settings" / "MEDimage-Tutorial.json"
path_save = Path(os.getcwd())

We now initialize the ``BatchExtractor`` class:

In [3]:
batch_extractor = MEDimage.biomarkers.BatchExtractor(
                                    path_read=path_read,
                                    path_csv=path_csv,
                                    path_params=path_to_params,
                                    path_save=path_save)

We now call ``compute_radiomics``, the method that does all the work (processing, extraction, and saving). It takes a single optional boolean argument, ``create_tables``, which specifies whether to create CSV tables or stop at the JSON files. It defaults to ``True``, and we recommend keeping it that way since JSON files are harder to browse.
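The difference between the two outputs is mostly one of shape: nested JSON has to be walked key by key, while a table gives one flat row per scan. The sketch below flattens a nested dictionary into a single CSV row with the standard library. It only illustrates the idea and is not how ``compute_radiomics`` builds its tables; the nested keys and values are placeholders, not real features.

```python
import csv
import io
from typing import Any, Dict


def flatten(d: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with '__'."""
    flat: Dict[str, Any] = {}
    for key, value in d.items():
        name = f"{prefix}__{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into sub-dictionaries
        else:
            flat[name] = value
    return flat


# Hypothetical nested result for one scan (placeholder keys and values)
nested = {"family_a": {"feature_1": 1.5, "feature_2": 3}, "family_b": {"feature_1": 0.2}}
row = flatten(nested)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```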

In [4]:
batch_extractor.compute_radiomics()

2022-10-04 19:14:45,520	INFO services.py:1090 -- View the Ray dashboard at http://127.0.0.1:8267



 --> Computing features for the "GTV" roi type ...

100%|██████████| 204/204 [14:41<00:00,  4.32s/it]


DONE


100%|██████████| 4/4 [00:00<00:00, 10407.70it/s]

DONE





All the results (extracted features) should now be saved in a folder named ``features`` inside ``path_save``.
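Once the run finishes, the tables can be inspected with any CSV reader. Here is a minimal standard-library sketch, assuming the tables are ``.csv`` files somewhere under ``path_save / "features"``; the exact file names depend on the settings, so none are hard-coded, and ``load_feature_tables`` is a hypothetical helper.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_feature_tables(path_save: Path) -> Dict[str, List[Dict[str, str]]]:
    """Hypothetical helper: read every CSV table under <path_save>/features.

    Returns a mapping from each table's path (relative to the features
    folder) to its rows, each row being a dict keyed by the CSV header.
    """
    features_dir = Path(path_save) / "features"
    if not features_dir.is_dir():
        raise FileNotFoundError(f"No features folder in {path_save}")
    tables: Dict[str, List[Dict[str, str]]] = {}
    for csv_path in sorted(features_dir.rglob("*.csv")):  # search subfolders too
        with csv_path.open(newline="") as f:
            tables[str(csv_path.relative_to(features_dir))] = list(csv.DictReader(f))
    return tables
```

For example, ``print(list(load_feature_tables(path_save)))`` lists the tables that were written.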

## Extra: BatchExtractor class diagram

To better understand the ``BatchExtractor`` class, take a look at the class diagram below, which describes its structure, attributes, methods, and relationships with other objects.

<img src="images/BatchExtractionClassDiagram.png"/>


### Common errors to avoid:

- No CSV found: make sure the CSV file is in the right folder (``path_csv``).
- Bad settings file or wrong keywords used in the JSON settings file: make sure your JSON settings file uses the right keywords in the right places. Several JSON settings files are provided in the repository; use one of them as a template.
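Both errors can be caught before a long run starts. Here is a minimal pre-flight sketch (the helper ``preflight`` is hypothetical) that checks that ``path_read`` and ``path_csv`` are folders, that ``path_csv`` holds at least one CSV file, and that the settings file is valid JSON. It cannot check the keywords *MEDimage* expects, so compare your file against the templates in the repository for that.

```python
import json
from pathlib import Path
from typing import List


def preflight(path_read: Path, path_csv: Path, path_params: Path) -> List[str]:
    """Hypothetical helper: return a list of problems found (empty if none)."""
    problems: List[str] = []
    if not Path(path_read).is_dir():
        problems.append(f"path_read is not a folder: {path_read}")
    if not Path(path_csv).is_dir():
        problems.append(f"path_csv is not a folder: {path_csv}")
    elif not any(Path(path_csv).glob("*.csv")):
        problems.append(f"No CSV file found in {path_csv}")
    try:
        with open(path_params) as f:
            json.load(f)  # only checks JSON syntax, not MEDimage keywords
    except FileNotFoundError:
        problems.append(f"Settings file not found: {path_params}")
    except json.JSONDecodeError as err:
        problems.append(f"Settings file is not valid JSON: {err}")
    return problems
```

Running ``for p in preflight(path_read, path_csv, path_to_params): print(p)`` before ``compute_radiomics`` prints nothing when these checks pass.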