# Python Script: Prepare Data for Segmentation

`prepare_data_segmentation.py`
This script prepares the original MRI scans from the Kaggle competition 'UW-Madison GI Tract Image Segmentation' for training and validating the segmentation model (SegFormer).

The script takes as input the path to the image directory and the `train.csv` file, which contains all the information needed to generate the target masks:
* The images are min-max normalized and saved as RGB PNG files in a specified output folder. A grayscale image is copied into each of the three channels.
* The target masks are decoded from the run-length encoding (RLE) in the `segmentation` column of `train.csv`.
* Optionally, the images can be saved in a 2.5D format: the first (R) channel stores slice `i`, the second (G) channel slice `i + stride`, and the third (B) channel slice `i + 2*stride`.
* The script can also remove non-segmented slices (slices without any mask), so the segmentation model can be trained on either the entire dataset or only the segmented subset.
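The per-slice steps listed above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the script's own code: the RLE is assumed to follow the competition format (1-indexed `start length` pairs over the row-major flattened image), an empty or missing `segmentation` value is treated as an all-zero mask, and in 2.5D mode slice indices past the end of the volume are clamped to the last slice (the script's actual edge handling is not documented here). All function names are hypothetical.

```python
import numpy as np


def min_max_to_uint8(img: np.ndarray) -> np.ndarray:
    """Rescale an image to 0-255 (min-max normalization) and cast to uint8."""
    img = img.astype(np.float32)
    lo, hi = img.min(), img.max()
    if hi > lo:
        img = (img - lo) / (hi - lo)
    else:
        # Constant slice (e.g. all black): avoid division by zero.
        img = np.zeros_like(img)
    return (img * 255).round().astype(np.uint8)


def gray_to_rgb(img: np.ndarray) -> np.ndarray:
    """Copy a single-channel H x W image into all three channels (H x W x 3)."""
    return np.stack([img, img, img], axis=-1)


def rle_decode(rle, shape: tuple) -> np.ndarray:
    """Decode 'start length start length ...' into a binary H x W mask.

    Assumes 1-indexed starts over the row-major flattened image.
    A non-string value (e.g. pandas NaN) or empty string gives an all-zero mask.
    """
    mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    if isinstance(rle, str) and rle.strip():
        values = np.asarray(rle.split(), dtype=int)
        starts, lengths = values[0::2] - 1, values[1::2]
        for start, length in zip(starts, lengths):
            mask[start:start + length] = 1
    return mask.reshape(shape)


def stack_25d(volume: np.ndarray, i: int, stride: int = 1) -> np.ndarray:
    """Build an H x W x 3 image from slices i, i + stride and i + 2*stride.

    volume has shape (num_slices, H, W); out-of-range indices are clamped
    to the last slice (an assumption).
    """
    last = volume.shape[0] - 1
    idx = [min(i + k * stride, last) for k in range(3)]
    return np.stack([volume[j] for j in idx], axis=-1)
```

For example, `rle_decode("2 3 9 2", (3, 4))` sets pixels 2-4 and 9-10 (1-indexed, row-major) of a 3 x 4 image, giving `[[0, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 0]]`.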

Sample usage:

```bash
python .\prepare_data_segmentation.py -dimension 2d -csv data/train.csv -input_dir images/train -output_dir segmentation_data -remove_non_seg 1

```
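With `-remove_non_seg 1`, as in the command above, slices without any mask are dropped, and slices from the patients listed in `-test_patients` are kept apart for inference. One way to do this with the standard library is sketched below. It is hypothetical: it assumes the competition's `train.csv` layout (columns `id`, `class`, `segmentation`; one row per slice and organ class; ids such as `case123_day20_slice_0001`), and `split_slices` and `case_number` are illustrative names, not functions from the script.

```python
import csv
import io
from collections import defaultdict


def case_number(slice_id: str) -> str:
    """Extract the patient number: 'case123_day20_slice_0001' -> '123'."""
    return slice_id.split("_")[0][len("case"):]


def split_slices(csv_file, test_patients, remove_non_seg=True):
    """Return (train_ids, test_ids), optionally dropping non-segmented slices."""
    segmented = defaultdict(bool)
    for row in csv.DictReader(csv_file):
        # A slice has one row per class; it counts as segmented if any row has an RLE.
        segmented[row["id"]] |= bool((row["segmentation"] or "").strip())

    train_ids, test_ids = [], []
    for slice_id, has_mask in segmented.items():
        if remove_non_seg and not has_mask:
            continue
        target = test_ids if case_number(slice_id) in test_patients else train_ids
        target.append(slice_id)
    return train_ids, test_ids


if __name__ == "__main__":
    sample = io.StringIO(
        "id,class,segmentation\n"
        "case2_day1_slice_0001,stomach,\n"
        "case2_day1_slice_0002,stomach,1 4\n"
        "case5_day3_slice_0001,large_bowel,10 2\n"
    )
    # prints (['case5_day3_slice_0001'], ['case2_day1_slice_0002'])
    print(split_slices(sample, test_patients=["2"]))
```

Grouping by slice id first matters because a slice is only "non-segmented" when none of its per-class rows carries an RLE string.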
Run the script with `--help` for a description of all input parameters:

```bash
python .\prepare_data_segmentation.py --help

usage: prepare_data_segmentation.py [-h] [-dimension {2d,2.5d}] [-stride STRIDE] [-csv CSV] [-input_dir INPUT_DIR]
                                    [-output_dir OUTPUT_DIR] [-test_patients TEST_PATIENTS]
                                    [-remove_non_seg REMOVE_NON_SEG] [-mask_rgb MASK_RGB]

options:
-h, --help                     Show this help message and exit
-dimension {2d,2.5d}           Choose either '2d' or '2.5d'
-stride STRIDE                 Specify the stride as an integer (default 1) for 2.5d
-csv CSV                       Path and file name of the csv file with rle data (default 'data/train.csv')
-input_dir INPUT_DIR           Specify the directory where the input images reside (default 'images/train')
-output_dir OUTPUT_DIR         Specify the directory where the images will be stored (default 'segmentation_data')
-test_patients TEST_PATIENTS   Specify the list of validation patients for inference (default "['2', '6', '7', '9', '11', '15', '16', '140', '145', '146', '147', '148', '149', '154', '156']")
-remove_non_seg REMOVE_NON_SEG Remove pictures that are not segmented (default 1)
-mask_rgb MASK_RGB             Generate masks also in RGB format (default 0)
```                
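The interface above can be reproduced with `argparse`, which accepts single-dash, multi-character option names such as `-dimension`. This is a hypothetical sketch, not the script's source: the help text does not show a default for `-dimension`, so `2d` is assumed, and the patient list is assumed to arrive as a Python-literal string that is parsed with `ast.literal_eval`.

```python
import argparse
import ast

# Default patient list copied from the help text above.
DEFAULT_TEST_PATIENTS = ("['2', '6', '7', '9', '11', '15', '16', '140', "
                         "'145', '146', '147', '148', '149', '154', '156']")


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="prepare_data_segmentation.py")
    # Assumed default: the help text does not state one for -dimension.
    parser.add_argument("-dimension", choices=["2d", "2.5d"], default="2d",
                        help="Choose either '2d' or '2.5d'")
    parser.add_argument("-stride", type=int, default=1, help="Stride for 2.5d")
    parser.add_argument("-csv", default="data/train.csv", help="CSV file with RLE data")
    parser.add_argument("-input_dir", default="images/train", help="Directory of input images")
    parser.add_argument("-output_dir", default="segmentation_data",
                        help="Directory where the images will be stored")
    parser.add_argument("-test_patients", default=DEFAULT_TEST_PATIENTS,
                        help="Python-literal list of held-out patient ids")
    parser.add_argument("-remove_non_seg", type=int, default=1,
                        help="Remove slices that are not segmented")
    parser.add_argument("-mask_rgb", type=int, default=0,
                        help="Also generate masks in RGB format")
    args = parser.parse_args(argv)
    # Hypothetical: turn "['2', '6']" into ['2', '6'] safely, without eval().
    args.test_patients = ast.literal_eval(args.test_patients)
    return args
```

Calling `parse_args(["-dimension", "2.5d", "-stride", "2"])` returns a namespace with `dimension='2.5d'`, `stride=2`, and the defaults above for every other option.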

# Usage Sample

```bash
python .\prepare_data_segmentation.py -dimension 2d -csv data/train.csv -input_dir images/train -output_dir segmentation_data -remove_non_seg 1

Dimension: 2d
Stride: 1
CSV: data/train.csv
Input Dir: images/train
Output Dir: segmentation_data
Valid Patients: ['2', '6', '7', '9', '11', '15', '16', '140', '145', '146', '147', '148', '149', '154', '156']
Remove Non-Segmented Images: 1
Mask RGB: 0
```

Train