# SegFormer 3D Training

---

## 1) Data Pre-Processing

---

In ***medical imaging***, particularly with ***brain MRI*** scans, pre-processing is essential for improving image quality and for the ***robustness*** and ***accuracy*** of machine learning models. These steps ***standardize*** the input data, reduce noise, and highlight relevant features, which ***lightens*** the ***computational load*** on the model and helps it converge. Common pre-processing steps include:

1) ***Resampling and Resizing***:
MRI scans are often acquired with different resolutions and voxel sizes depending on the scanner settings. Resampling the images to a consistent resolution ensures uniformity across the dataset, which is crucial for machine learning models to learn effectively without being biased by variations in image dimensions.

2) ***Skull Stripping***:
This process involves removing non-brain tissue, such as the skull and scalp, from MRI images. Skull stripping focuses the analysis on brain tissue, thereby reducing noise and irrelevant features, which can improve the model's ability to extract meaningful patterns from the data.

3) ***Intensity Normalization***:
MRI images often have varying intensity ranges due to differences in scanner parameters and acquisition protocols. Normalizing the intensity values to a standard range helps models learn from more consistent input data and reduces variability, leading to more stable and generalizable performance.

4) ***Noise Reduction (Denoising)***:
MRI scans can contain random noise, which may obscure critical features. Applying denoising filters such as Gaussian smoothing or non-local means can suppress noise that would otherwise hurt the model's learning, while largely preserving the relevant anatomical structures.

5) ***Bias Field Correction***:
MR images sometimes suffer from low-frequency intensity variations, known as bias fields, caused by inhomogeneities in the magnetic field. Correcting these variations ensures more uniform intensity across the brain tissue, improving segmentation and feature extraction.

6) ***Spatial Alignment (Registration)***:
Aligning images to a common coordinate space, such as the MNI152 template, ensures that corresponding anatomical regions are in the same location across different subjects. This spatial consistency helps models generalize better and facilitates group analysis.

7) ***Data Augmentation***:
Pre-processing can also involve data augmentation techniques, such as random rotations, flips, and elastic deformations. Augmentation increases the diversity of the training dataset, helping models become more robust to variations in real-world MRI scans.
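
A few of the steps above can be sketched with NumPy alone. The snippet below is a minimal illustration, not the pipeline used by this notebook: the helper names `resize_nearest`, `zscore_normalize`, and `random_flip` are hypothetical, nearest-neighbour indexing stands in for proper interpolation-based resampling, and normalizing over non-zero voxels assumes the background has already been set to zero (for example after skull stripping).

```python
import numpy as np


def resize_nearest(volume, target_shape):
    """Resize a 3D volume to target_shape with nearest-neighbour sampling."""
    # For each axis, map every output index back to a source index.
    index_per_axis = [
        np.minimum((np.arange(t) * s / t).astype(int), s - 1)
        for s, t in zip(volume.shape, target_shape)
    ]
    return volume[np.ix_(*index_per_axis)]


def zscore_normalize(volume, eps=1e-8):
    """Z-score normalize the non-zero voxels; background voxels stay at 0."""
    volume = volume.astype(np.float32)
    mask = volume > 0  # assumption: background is exactly zero
    if not mask.any():
        return volume
    mean = volume[mask].mean()
    std = volume[mask].std()
    out = np.zeros_like(volume)
    out[mask] = (volume[mask] - mean) / (std + eps)
    return out


def random_flip(volume, rng, p=0.5):
    """Flip the volume along each axis independently with probability p."""
    for axis in range(volume.ndim):
        if rng.random() < p:
            volume = np.flip(volume, axis=axis)
    return volume


# Usage on a synthetic volume (random values, not real MRI data)
rng = np.random.default_rng(0)
scan = rng.integers(1, 1000, size=(8, 10, 12)).astype(np.float32)
scan[:2] = 0  # pretend the first slices are background

resized = resize_nearest(scan, (4, 5, 6))
normalized = zscore_normalize(scan)
augmented = random_flip(normalized, rng)
```

For segmentation, the same spatial transforms (resizing and flips) would also have to be applied to the label volume so that image and mask stay aligned.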

With these pre-processing steps in place, the model can focus on learning meaningful patterns rather than compensating for inconsistencies and noise in the data, which ***reduces*** the computational burden on it. This typically leads to faster training, improved model accuracy, and ***better clinical insights***.

---

In [1]:
# The following code relies on the BraTS 2021 dataset --> https://www.kaggle.com/datasets/dschettler8845/brats-2021-task1

# 1) Build the SegFormer Utils module

In [4]:
%%bash
cd ../segformer-utils/
python -m build
cd ../notebooks
cp ../segformer-utils/dist/*.whl .
python -m pip install *.whl

* Creating isolated environment: venv+pip...
* Installing packages in isolated environment:
  - setuptools >= 40.8.0
* Getting build dependencies for sdist...
running egg_info
writing segformer3dutils.egg-info/PKG-INFO
writing dependency_links to segformer3dutils.egg-info/dependency_links.txt
writing requirements to segformer3dutils.egg-info/requires.txt
writing top-level names to segformer3dutils.egg-info/top_level.txt
reading manifest file 'segformer3dutils.egg-info/SOURCES.txt'
writing manifest file 'segformer3dutils.egg-info/SOURCES.txt'
* Building sdist...
running sdist
running egg_info
writing segformer3dutils.egg-info/PKG-INFO
writing dependency_links to segformer3dutils.egg-info/dependency_links.txt
writing requirements to segformer3dutils.egg-info/requires.txt
writing top-level names to segformer3dutils.egg-info/top_level.txt
reading manifest file 'segformer3dutils.egg-info/SOURCES.txt'
writing manifest file 'segformer3dutils.egg-info/SOURCES.tx

In [12]:
from segformer3dutils.data_splitter.create_train_val_test_csv import create_train_val_test_csv_from_data_folder

In [14]:
# Create the train/validation/test metadata for the BraTS 2021 dataset
create_train_val_test_csv_from_data_folder(
    folder_dir="../data/brats2021_seg/Brats2021/BraTS2021_Training_Data",
    append_dir="../data/brats2021_seg",
    save_dir="../data/brats2021_seg",
    train_split_perc=0.85,
    val_split_perc=0.10,
)

NameError: name 'train_dp' is not defined
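
For intuition about the split fractions passed to `create_train_val_test_csv_from_data_folder`, the sketch below shows one common way to divide a list of case IDs into train, validation, and test subsets. It is not the implementation in `segformer3dutils`: the function `split_case_ids` and the `case_000`-style IDs are hypothetical, and it assumes that whatever remains after the train and validation fractions (here 5%) goes to the test set.

```python
import random


def split_case_ids(case_ids, train_frac=0.85, val_frac=0.10, seed=0):
    """Shuffle case IDs reproducibly and split them into train/val/test lists."""
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("train_frac and val_frac must be >= 0 and sum to <= 1")
    ids = sorted(case_ids)              # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)    # seeded shuffle, independent of global state
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]        # assumption: the remainder is the test set
    return train, val, test


# Usage with 100 made-up case IDs
case_ids = [f"case_{i:03d}" for i in range(100)]
train_ids, val_ids, test_ids = split_case_ids(case_ids)
print(len(train_ids), len(val_ids), len(test_ids))
```

Fixing the seed keeps the three subsets stable across runs, which matters when metrics from different experiments are compared on the same validation and test cases.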