# BrainTriage - Group 3 - AngryChickens

Chihab Amghane, s4112288<br>
Freek van den Bergh, s4801709<br>
Max Driessen, s4789628<br>
Jordy Naus, s4722426<br>
Marlous Nijman, s4551400

This notebook expects the following file structure:
```
data/
    BrainTriage/
        final_test_split/
        train/
ISMI-braintriage/
    code/
        full-pipeline.ipynb (this notebook)
    old-code/
        ..
```
where `BrainTriage/train/` and `BrainTriage/final_test_split/` are the original train and test data as provided by the competition, respectively, and `ISMI-braintriage` is our GitHub repository.

The data can be moved or accessed from a different location by modifying the `ORIGINAL_DATA_PATH` variable in the *Generate Slice Data* section below. **However**, note that the code still expects the following file structure:
```
ORIGINAL_DATA_PATH/
    final_test_split/
    train/
```
It is advised to leave the other variables intact, since the code has been tested to work with this set-up.
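
Before running the rest of the notebook, it can help to confirm that the expected layout is actually in place. The following is a minimal sketch, not part of the repository: the helper `missing_subdirs` is a hypothetical name, and the path in the last line simply mirrors the default layout described above.

```python
import os

# Sub-directories the notebook expects inside ORIGINAL_DATA_PATH (taken from the layout above).
EXPECTED_SUBDIRS = ("final_test_split", "train")


def missing_subdirs(original_data_path):
    """Return the expected sub-directories that are absent from the given path.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    return [
        name
        for name in EXPECTED_SUBDIRS
        if not os.path.isdir(os.path.join(original_data_path, name))
    ]


# An empty list means the layout matches what the code expects.
print(missing_subdirs(os.path.join("..", "..", "data", "BrainTriage")))
```

If the list is not empty, either move the data or adjust `ORIGINAL_DATA_PATH` so that both sub-directories sit directly beneath it.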

## Imports

In [1]:
import os
# Import from repository
from utils import clean_up

## Generate Slice Data

As noted above, `ORIGINAL_DATA_PATH` can be changed to the path where the original data resides, but the code still expects the file structure specified above.

If the specified directories do not exist already, this section will extract all slices for every patient from the original train and test data and save them in the specified directories (`SLICED_DATA_DIR`, `TRAIN_PATH`, and `TEST_PATH`). Additionally, a `.csv` file will be created to store the labels of each slice and patient.

### Parameters

In [2]:
DATA_DIR = os.path.join("..","..","data")
ORIGINAL_DATA_PATH = os.path.join(DATA_DIR,"BrainTriage")
SLICED_DATA_DIR = os.path.join(DATA_DIR, "sliced-data")
TRAIN_PATH = os.path.join(SLICED_DATA_DIR, "train")
TEST_PATH = os.path.join(SLICED_DATA_DIR, "test")
DATA_SPLIT_DIR = os.path.join(DATA_DIR, "data-split")
TMP_DIR = "tmp"
SUBMISSION_DIR = os.path.join("..", "..", "submission")
print(DATA_DIR, ORIGINAL_DATA_PATH, SLICED_DATA_DIR, TRAIN_PATH, TEST_PATH, DATA_SPLIT_DIR, TMP_DIR, SUBMISSION_DIR, sep="\n")

../../data
../../data/BrainTriage
../../data/sliced-data
../../data/sliced-data/train
../../data/sliced-data/test
../../data/data-split
tmp
../../submission

### Run section

In [None]:
if not os.path.exists(TRAIN_PATH):
    !python3 dataset/create_slices.py -d $ORIGINAL_DATA_PATH -o $SLICED_DATA_DIR --train
if not os.path.exists(TEST_PATH):
    !python3 dataset/create_slices.py -d $ORIGINAL_DATA_PATH -o $SLICED_DATA_DIR --test

## Create Train/Validation Split

In this section, we define the train/validation split for our entire pipeline, since all networks in the pipeline should use the same split. The split is saved in `.csv` files in the directory specified by `DATA_SPLIT_DIR` above.
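
To illustrate the idea of one fixed split shared by all networks, here is a minimal sketch of a deterministic patient-level split. It is not the logic of `create_data_split.py`, which is not shown here. It assumes that `K` means "hold out roughly 1/K of the patients for validation", that splitting is done per patient so that slices of one patient never end up on both sides, and that a seed is available; the function name `split_patients` is hypothetical.

```python
import random


def split_patients(patient_ids, k, seed):
    """Hold out roughly 1/k of the patients for validation.

    Assumption for illustration only: the real meaning of K in the
    repository's split script may differ.
    """
    ids = sorted(patient_ids)        # sort first so the result does not depend on input order
    rng = random.Random(seed)        # local RNG: leaves the global random state untouched
    rng.shuffle(ids)
    n_val = max(1, len(ids) // k)    # always keep at least one validation patient
    return ids[n_val:], ids[:n_val]  # (train, validation)


train_ids, val_ids = split_patients(range(100), k=10, seed=420)
print(len(train_ids), len(val_ids))  # prints: 90 10
```

Because the shuffle uses its own seeded generator, calling the function again with the same arguments returns the same split, which is the property the pipeline relies on when it stores the split on disk and reuses it.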

### Parameters

In [None]:
K = 10

### Run section

In [None]:
if not os.path.exists(DATA_SPLIT_DIR):
    !python3 dataset/create_data_split.py -k $K -d $TRAIN_PATH -ds $DATA_SPLIT_DIR

## Train ResNet34

Here we train the convolutional neural network (CNN) of our pipeline, ResNet34, to classify the images. ResNet34 is trained at slice level. Note that we pass a seed (`SEED`) to every training stage to ensure that our results are reproducible.
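
For readers unfamiliar with seeding, the sketch below shows one common way a training script might apply a seed such as `SEED`. It is an assumption about what happens behind the `-s` flag, not the repository's actual code; the helper name `seed_everything` is hypothetical, and NumPy and PyTorch are seeded only if they are installed.

```python
import os
import random


def seed_everything(seed):
    """Seed the random number generators that are available.

    Hypothetical helper for illustration; the training scripts may seed differently.
    """
    os.environ["PYTHONHASHSEED"] = str(seed)  # only affects Python processes started afterwards
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored when no GPU is available
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


seed_everything(420)
first = random.random()
seed_everything(420)
assert first == random.random()  # same seed, same draw
```

Seeding alone does not guarantee bit-identical results on a GPU, since some operations are non-deterministic, but it removes the main source of run-to-run variation.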

### Parameters

In [None]:
CNN_NAME = RESNET_TYPE = "resnet34"
SEED = 420
LR = 0.0001
EPOCHS = 30
BATCH_SIZE = 16
N_FEATURES = 128
FLIP_PROBABILITY = ROTATE_PROBABILITY = 0.5
MODEL_DIR = "../../models"

### Run section

In [None]:
!python3 train/train-resnet.py $CNN_NAME $RESNET_TYPE -s $SEED -lr $LR -e $EPOCHS -b $BATCH_SIZE  \
                               -m $MODEL_DIR -f $N_FEATURES -d $TRAIN_PATH -ds $DATA_SPLIT_DIR \
                               -afp $FLIP_PROBABILITY -arp $ROTATE_PROBABILITY -ts 0 32 --tuple 

## Train LSTM (Freeze ResNet34)

Here we train the LSTM of our pipeline. We do this by stripping ResNet34 of its classification layer, so that it outputs feature vectors, and training the LSTM to classify the images based on these feature vectors. ResNet34 is frozen, so its weights are not updated. The LSTM is trained at patient level.

### Parameters

In [None]:
with open(os.path.join(TMP_DIR, CNN_NAME), 'r') as f:
    CNN_LOC = f.read()
LSTM_NAME = "lstm"
BATCH_SIZE = 2

### Run section

In [None]:
!python3 train/train-lstm.py $LSTM_NAME $RESNET_TYPE -s $SEED -d $TRAIN_PATH -ds $DATA_SPLIT_DIR -c $CNN_LOC \
                             -lr $LR -e $EPOCHS -b $BATCH_SIZE -m $MODEL_DIR -f $N_FEATURES \
                             -afp $FLIP_PROBABILITY -arp $ROTATE_PROBABILITY -ts 0 32 --tuple

## Finetune CombinedNet

Here we finetune both ResNet34 and the LSTM with a smaller learning rate (`LR = 0.000001` instead of `0.0001`). This finetuned network will do the final classification on the test set.

### Parameters

In [None]:
with open(os.path.join(TMP_DIR, LSTM_NAME), 'r') as f:
    LSTM_LOC = f.read()
COMBINED_NAME = "combinednet"
LR = 0.000001

### Run section

In [None]:
!python3 train/train-combinednet.py $COMBINED_NAME $RESNET_TYPE -s $SEED -d $TRAIN_PATH -ds $DATA_SPLIT_DIR \
                                    -l $LSTM_LOC -lr $LR -e $EPOCHS -b $BATCH_SIZE -m $MODEL_DIR -f $N_FEATURES \
                                    -afp $FLIP_PROBABILITY -arp $ROTATE_PROBABILITY -ts 0 32 --tuple

## Predict Test Data

Make predictions on the test data with the final finetuned combined network.

### Parameters

In [None]:
with open(os.path.join(TMP_DIR, COMBINED_NAME), 'r') as f:
    COMBINED_FILENAME = os.path.basename(os.path.normpath(f.read()))

### Run section

In [None]:
!python3 test/submission.py $COMBINED_NAME $RESNET_TYPE $COMBINED_FILENAME -d $TEST_PATH -m $MODEL_DIR \
                            -sd $SUBMISSION_DIR -b $BATCH_SIZE -f $N_FEATURES -ts 0 32 --tuple

## Clean-up

Remove the temporary directory that was created in the process.

In [None]:
clean_up()
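
The *Parameters* cells above read each saved model's location from a file in `TMP_DIR` that is named after the model, and `clean_up()` removes that directory afterwards. The sketch below illustrates this hand-off under stated assumptions: the training scripts are assumed to write the location as plain text, and the helpers `write_location`, `read_location`, and `remove_tmp_dir`, as well as the example model path, are hypothetical and not taken from the repository's `utils` module.

```python
import os
import shutil
import tempfile


def write_location(tmp_dir, name, location):
    """Record where a model was saved, in a file named after the model."""
    os.makedirs(tmp_dir, exist_ok=True)
    with open(os.path.join(tmp_dir, name), "w") as f:
        f.write(location)


def read_location(tmp_dir, name):
    """Read a recorded location back, stripping any trailing newline."""
    with open(os.path.join(tmp_dir, name), "r") as f:
        return f.read().strip()


def remove_tmp_dir(tmp_dir):
    """Delete the temporary directory; do nothing if it is already gone."""
    shutil.rmtree(tmp_dir, ignore_errors=True)


tmp_dir = tempfile.mkdtemp()  # stand-in for TMP_DIR
# Hypothetical model path, used only to demonstrate the round trip.
write_location(tmp_dir, "resnet34", os.path.join("models", "resnet34", "example.pt"))
print(read_location(tmp_dir, "resnet34"))
remove_tmp_dir(tmp_dir)
print(os.path.exists(tmp_dir))  # prints: False
```

Stripping whitespace when reading is a defensive choice: a stray trailing newline in the file would otherwise end up in the path passed to the next script.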