# Python Script: Prepare Data for Classification

```python
prepare_data_classification.py
```
This script prepares the original MRI scans from the Kaggle competition 'UW-Madison GI Tract Image Segmentation' so that they can be used to train and validate the binary classification model (ResNetClassifier).

The script takes two inputs: the path to the image database and the `train.csv` file, which contains all the information needed to generate the target masks. It processes them as follows:
* The images are min-max normalized and saved in RGB PNG format in a specified target folder. If the image is grayscale, it is copied into each of the three channels.
* The target masks are created from the run-length encoding provided in `train.csv` (column `segmentation`).
* Additionally, the script allows for saving the images in a 2.5D format, where the first (R) channel is used to store the actual image 'i', the second (G) channel to store the image 'i + stride', and the third (B) channel to store the image 'i + 2*stride'.
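
The three steps above can be sketched with NumPy as follows. This is a minimal illustration, not the script's actual code: the function names are hypothetical, the run-length decoding assumes 1-based start positions in row-major order, and clamping out-of-range slice indices to the last slice is an assumption about how the end of a volume might be handled.

```python
import numpy as np


def min_max_to_uint8(image: np.ndarray) -> np.ndarray:
    """Min-max normalize an array to the 0..255 range and return it as uint8."""
    image = image.astype(np.float32)
    lo, hi = float(image.min()), float(image.max())
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros(image.shape, dtype=np.uint8)
    return np.round((image - lo) / (hi - lo) * 255.0).astype(np.uint8)


def gray_to_rgb(image: np.ndarray) -> np.ndarray:
    """Copy a 2D grayscale image into each of the three channels."""
    return np.stack([image, image, image], axis=-1)


def rle_decode(rle: str, height: int, width: int) -> np.ndarray:
    """Decode 'start length start length ...' into a binary mask.

    Assumption: starts are 1-based and pixels are counted in row-major order.
    """
    mask = np.zeros(height * width, dtype=np.uint8)
    values = [int(v) for v in rle.split()]
    for start, length in zip(values[0::2], values[1::2]):
        mask[start - 1:start - 1 + length] = 1
    return mask.reshape(height, width)


def stack_2p5d(slices: list, i: int, stride: int = 1) -> np.ndarray:
    """Build a 2.5D image: R = slice i, G = slice i + stride, B = slice i + 2*stride.

    Assumption: indices past the end of the volume are clamped to the last slice.
    """
    last = len(slices) - 1
    indices = [min(i + k * stride, last) for k in range(3)]
    return np.stack([slices[j] for j in indices], axis=-1)
```

For example, `gray_to_rgb(min_max_to_uint8(scan))` would yield an `(H, W, 3)` array ready to be written as an RGB PNG, and `rle_decode(row_segmentation, H, W).any()` could serve as a binary label, if the classifier's target is "mask present or not" (also an assumption).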

A sample invocation is shown below:

```bash
python prepare_data_classification.py -dimension 2d -csv data/train.csv -input_dir images/train -output_dir classification_data
```
The `--help` option lists all input parameters and their defaults:

```bash
python .\prepare_data_classification.py --help

usage: prepare_data_classification.py [-h] [-dimension {2d,2.5d}] [-stride STRIDE] [-csv CSV] [-input_dir INPUT_DIR] [-output_dir OUTPUT_DIR] [-test_patients TEST_PATIENTS]

options:
-h, --help                   Show this help message and exit
-dimension {2d,2.5d}         Choose either '2d' or '2.5d'
-stride STRIDE               Specify the stride as an integer (default 1) for 2.5d
-csv CSV                     Path and file name of the csv file with rle data (default 'data/train.csv')
-input_dir INPUT_DIR         Specify the directory where the input images reside (default 'images/train')
-output_dir OUTPUT_DIR       Specify the directory where the images will be stored (default 'classification_data')
-test_patients TEST_PATIENTS Specify the list of test images (default "['2', '6', '7', '9', '11', '15', '16', '140', '145', '146', '147', '148', '149', '154', '156']")
```                
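
A command-line interface matching the help text above could be declared with `argparse` roughly as follows. This is a sketch under stated assumptions, not the script's source: the default for `-dimension` is not shown in the help output (`2d` is assumed here), and parsing `-test_patients` with `ast.literal_eval` is one plausible way to turn the string default into a list.

```python
import argparse
import ast

DEFAULT_TEST_PATIENTS = (
    "['2', '6', '7', '9', '11', '15', '16', '140', '145', "
    "'146', '147', '148', '149', '154', '156']"
)


def build_parser() -> argparse.ArgumentParser:
    """Declare the options listed in the help output above."""
    parser = argparse.ArgumentParser(prog="prepare_data_classification.py")
    # Assumption: '2d' as the default; the help text does not state one.
    parser.add_argument("-dimension", choices=["2d", "2.5d"], default="2d",
                        help="Choose either '2d' or '2.5d'")
    parser.add_argument("-stride", type=int, default=1,
                        help="Stride as an integer, used for 2.5d")
    parser.add_argument("-csv", default="data/train.csv",
                        help="Path and file name of the csv file with rle data")
    parser.add_argument("-input_dir", default="images/train",
                        help="Directory where the input images reside")
    parser.add_argument("-output_dir", default="classification_data",
                        help="Directory where the images will be stored")
    parser.add_argument("-test_patients", default=DEFAULT_TEST_PATIENTS,
                        help="List of test patients, written as a Python list")
    return parser


def parse_test_patients(text: str) -> list:
    """Turn a string such as "['2', '6']" into a list of patient-id strings."""
    return [str(p) for p in ast.literal_eval(text)]
```

With this sketch, `build_parser().parse_args(["-dimension", "2.5d", "-stride", "2"])` yields a namespace whose remaining fields fall back to the defaults listed in the help text.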

# Usage Sample

In [1]:
!python prepare_data_classification.py -dimension 2d -csv data/train.csv -input_dir images/train -output_dir classification_data

Dimension: 2d


Train :: case101:   0%|          | 0/576 [00:00<?, ?it/s]


Stride: 1
CSV: data/train.csv
Input Dir: images/train
Output Dir: classification_data
Test Patients: ['2', '6', '7', '9', '11', '15', '16', '140', '145', '146', '147', '148', '149', '154', '156']
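
The `Train :: case101` prefix in the progress bar, together with the `Test Patients` list, suggests that each case folder is assigned to a split by its patient number. A minimal sketch of such an assignment is shown below; the function names, the `caseNNN` folder pattern, and the `"Test"` label are assumptions inferred from the output, not taken from the script.

```python
import re


def patient_id(case_name: str) -> str:
    """Extract the numeric patient id from a folder name such as 'case101'."""
    match = re.fullmatch(r"case(\d+)", case_name)
    if match is None:
        raise ValueError(f"Unexpected case folder name: {case_name!r}")
    return match.group(1)


def split_for(case_name: str, test_patients: list) -> str:
    """Return 'Test' if the patient is in the test list, otherwise 'Train'."""
    return "Test" if patient_id(case_name) in set(test_patients) else "Train"
```

With the default list, `split_for("case101", test_patients)` gives `"Train"`, which agrees with the progress-bar label shown above.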
