# Preprocessing Steps

This tutorial continues with the example dataset CSVs generated in the previous entry and runs them through a deep-learning-focused pipeline consisting of the following steps, applied in order:

* Binarizing segmentation files (if specified)
* Enforcing desired coordinate orientation (defaults to 'RAS')
* Resampling to desired voxel spacing (defaults to 1mm isotropic)
* Derivation of a skullstrip mask using [synthstrip](https://surfer.nmr.mgh.harvard.edu/docs/synthstrip/)
* Intra-study registration with [synthmorph](https://martinos.org/malte/synthmorph/) of all series within a study to the series with a desired "NormalizedSeriesDescription" (defaults to "T1Post")
* Longitudinal registration with [synthmorph](https://martinos.org/malte/synthmorph/) to the first study of the same patient or to a desired atlas (if specified)
* N4 bias field correction over the skullstrip mask or a foreground mask (defaults to the skullstrip mask unless skullstripping is disabled)
* Zero-mean, unit-variance normalization of the volume, using statistics computed over the skullstrip mask
* Background normalization to 0 based on the skullstrip mask or the foreground mask (defaults to the skullstrip mask unless skullstripping is disabled)
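Several of the steps above (segmentation binarization, intensity normalization over a mask, and background zeroing) are simple voxel-wise operations. The NumPy sketch below illustrates them under stated assumptions: binarization is assumed to map every nonzero label to 1, normalization uses the mean and standard deviation of the voxels inside the mask, and background normalization is taken to mean setting every voxel outside the mask to 0. The function names are hypothetical and are not part of the `preprocessing` library's API. The library's own implementation may differ, for example in how it handles empty masks.

```python
import numpy as np


def binarize_segmentation(seg: np.ndarray) -> np.ndarray:
    """Map every nonzero label to 1 (assumed binarization rule)."""
    return (seg > 0).astype(np.uint8)


def zscore_within_mask(volume: np.ndarray, mask: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Shift and scale the whole volume so voxels inside `mask` have mean 0 and std 1."""
    mask = np.asarray(mask, dtype=bool)
    volume = np.asarray(volume, dtype=np.float64)
    values = volume[mask]
    if values.size == 0:
        raise ValueError("mask is empty; cannot compute normalization statistics")
    # eps guards against division by zero for a constant-intensity region
    return (volume - values.mean()) / (values.std() + eps)


def zero_background(volume: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return a copy of `volume` with every voxel outside `mask` set to 0."""
    mask = np.asarray(mask, dtype=bool)
    out = np.array(volume, dtype=np.float64, copy=True)
    out[~mask] = 0.0
    return out


# Synthetic stand-ins for an MRI series and its skullstrip mask
rng = np.random.default_rng(0)
volume = rng.normal(loc=100.0, scale=20.0, size=(8, 8, 8))
brain_mask = np.zeros(volume.shape, dtype=bool)
brain_mask[2:6, 2:6, 2:6] = True

normalized = zero_background(zscore_within_mask(volume, brain_mask), brain_mask)
print(normalized[brain_mask].mean(), normalized[brain_mask].std())  # approximately 0 and 1

seg = np.array([[0, 1], [2, 4]])
print(binarize_segmentation(seg))
```

Computing the statistics only inside the mask keeps the large zero-valued background from pulling the mean toward 0 and shrinking the standard deviation. Zeroing the background afterwards leaves everything outside the mask at a single constant value.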

```
!preprocessing brain-preprocessing --help
```

```
usage: preprocessing <command> [<args>]

The following commands are available:
    validate-installation       Check that the `preprocessing` library is installed correctly along
                                with all of its dependencies.

    dicom-dataset               Create a DICOM dataset CSV compatible with subsequent `preprocessing`
                                scripts. The final CSV provides a series level summary of the location
                                of each series alongside metadata extracted from DICOM headers.  If the
                                previous organization schems of the dataset does not enforce a DICOM
                                series being isolated to a unique directory (instances belonging to
                                multiple series must not share the same lowest level directory),
                                reorganization must be applied for NIfTI conversion.

    nifti-dataset               Create a NIfTI dataset CSV compat
```
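The `dicom-dataset` help text requires that instances from different series never share the same lowest-level directory. The following is a minimal sketch of how one might check a dataset for that condition before conversion. It assumes you already have (file path, SeriesInstanceUID) pairs, which in practice could be read with `pydicom.dcmread(path, stop_before_pixels=True).SeriesInstanceUID`. The function `find_shared_directories` and the example paths and UIDs are hypothetical and are not part of the `preprocessing` tool.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, Set, Tuple


def find_shared_directories(instances: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return each lowest-level directory that contains instances from more than one series.

    `instances` is an iterable of (file_path, series_instance_uid) pairs.
    """
    series_by_dir: Dict[str, Set[str]] = defaultdict(set)
    for file_path, series_uid in instances:
        # The parent of each file is its lowest-level directory
        series_by_dir[str(PurePosixPath(file_path).parent)].add(series_uid)
    # Keep only directories that mix instances from several series
    return {d: uids for d, uids in series_by_dir.items() if len(uids) > 1}


# Hypothetical paths and UIDs for illustration
instances = [
    ("/data/pt1/study1/seriesA/0001.dcm", "1.2.3.1"),
    ("/data/pt1/study1/seriesA/0002.dcm", "1.2.3.1"),
    ("/data/pt1/study1/mixed/0001.dcm", "1.2.3.2"),
    ("/data/pt1/study1/mixed/0002.dcm", "1.2.3.3"),
]
shared = find_shared_directories(instances)
print(shared)
```

Any directory this reports would need to be split so that each series has its own directory before NIfTI conversion. `PurePosixPath` is used so the example behaves the same on any operating system. Real Windows paths would need `PureWindowsPath` instead.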

```
!preprocessing brain-preprocessing \
    /autofs/space/crater_001/datasets/public/NIH_IDC_Brain/upenn_gbm_preprocessed \
    dicom_dataset_examples/upenn_gbm_dataset.csv \
    -ns \
    -c 30
```

Preprocessing patients:   0%|                           | 0/539 [00:00<?, ?it/s]......Clearing unnecessary files......
Finished preprocessing sub-121:
                                   SeriesInstanceUID  ...                                    PreprocessedSeg
0  1.3.6.1.4.1.14519.5.2.1.1452448862230347881864...  ...  /autofs/space/crater_001/datasets/public/NIH_I...

[1 rows x 18 columns]
Preprocessing patients:   0%|                | 1/539 [01:18<11:45:49, 78.72s/it]......Clearing unnecessary files......
Finished preprocessing sub-109:
                                   SeriesInstanceUID  ...                                    PreprocessedSeg
0  1.3.6.1.4.1.14519.5.2.1.1227166070094937926102...  ...  /autofs/space/crater_001/datasets/public/NIH_I...

[1 rows x 18 columns]
Preprocessing patients:   0%|                 | 2/539 [01:57<8:16:05, 55.43s/it]......Clearing unnecessary files......
Finished preprocessing sub-114:
                                   SeriesInstanceUID  ...        