https://www.kaggle.com/c/rsna-miccai-brain-tumor-radiogenomic-classification/overview

# Definition

MRI is an imaging modality that uses non-ionizing radiation to create useful diagnostic images.
MRI is used to distinguish pathologic tissue such as a brain tumor or MS lesions from normal tissue.
In simple terms, an MRI scanner consists of a large, powerful magnet in which the patient lies. A radio wave antenna is used to send signals to the body and then a radiofrequency receiver detects the emitted signals. These returning signals are converted into images by a computer attached to the scanner. Imaging of any part of the body can be obtained in any plane.
Source: MRI | Radiology Reference Article | Radiopaedia.org

# Parameter Weighting

Following are some of the main parameter weighting techniques being used in MRI:

![T1- and T2-weighted images](https://geekymedics.com/wp-content/uploads/2020/04/t1-and-t2-770x287.png)

## T1-weighted

ONE tissue is bright: fat.
Provides the most anatomically relevant images.
The shape of the brain can be clearly seen, and morphological abnormalities are easy to detect.
Fat is depicted in white and water in black.
Grey matter is darker than white matter.

## T2-weighted

TWO tissues are bright: fat and water (WW2 – Water is White in T2).
Regular T2 MRIs are important for tracking long-term disease progression.
White matter is darker than grey matter.
Lesions appear white.
Suitable for lesion evaluation.

## T1Gd

A gadolinium (Gd)-enhanced T1-weighted scan reveals only new lesions. These are areas where the disease is currently active.
Before the MRI, an injection of gadolinium (Gd) contrast agent is administered. This distinguishes the active lesions from the normal parts of the brain.
This type of MRI will not show older, inactive lesions.

## Fluid Attenuated Inversion Recovery (FLAIR)

FLAIR is similar to T2, but the fluid is darker or "suppressed".
In T2, the cerebrospinal fluid (water) is white and the lesion is also white, so a white lesion has to be found against a white background, which is difficult. FLAIR can be roughly thought of as a T2 image in which the water is black, making the lesion easier to find.


# Imaging Planes

Magnetic resonance imaging (MRI) is partially defined by the plane or direction of the image that is taken. The most important model coordinate system for medical imaging is the anatomical coordinate system (also called patient coordinate system).

The Three Basic Anatomical Planes:

This coordinate system consists of three planes that describe the standard anatomical position of a human. In basic orientation terms, an MRI of the body taken from the side lies in a sagittal plane, one taken from the front lies in a coronal plane, and one taken from the top down lies in a transverse (axial) plane.
![Image](https://users.fmrib.ox.ac.uk/~stuart/thesis/chapter_3/image3_5.gif)
Source: https://users.fmrib.ox.ac.uk/~stuart/thesis/chapter_3/section3_2.html

## Sagittal plane

Also known as the longitudinal plane.
It is a y-z plane, perpendicular to the ground, which separates left from right.
The mid-sagittal (median) plane is the specific sagittal plane that lies exactly in the middle of the body.
A sagittal MRI looks at the brain from the side in a series of images starting at one ear and moving to the other.

## Coronal plane

Also known as frontal plane.
It is an x-z plane, perpendicular to the ground, which (in humans) separates the anterior from the posterior, the front from the back, the ventral from the dorsal.
A coronal MRI looks at the brain from behind in a series of images starting at the back of the head and moving to the face.

## Axial plane

Also known as the transverse plane.
It is an x-y plane, parallel to the ground, which (in humans) separates the superior from the inferior, or put another way, the head from the feet.
An axial MRI looks at the brain from below in a series of images starting at the chin and moving to the top of the head.
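
As a rough illustration of the three planes, the sketch below slices a synthetic NumPy volume. The axis order `(z, y, x)` and the helper name `plane_slices` are assumptions made for this example only; real scans may store their axes in a different order.

```python
import numpy as np

def plane_slices(volume, z, y, x):
    """Return one axial, coronal and sagittal slice of a (z, y, x) volume."""
    axial = volume[z, :, :]      # fixed height: an x-y plane
    coronal = volume[:, y, :]    # fixed front-back position: an x-z plane
    sagittal = volume[:, :, x]   # fixed left-right position: a y-z plane
    return axial, coronal, sagittal

# synthetic volume: 4 positions along z, 5 along y, 6 along x
volume = np.arange(4 * 5 * 6).reshape(4, 5, 6)
axial, coronal, sagittal = plane_slices(volume, z=1, y=2, x=3)
print(axial.shape, coronal.shape, sagittal.shape)
```

Each slice drops exactly one axis, which is why the three results have different shapes for a non-cubic volume.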

The idea is to use a preprocessing network to select the slices that contain the tumor, and then train a 2D ResNet on those slices.

In [None]:
# matplotlib and its settings
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300
import torch
import torchvision.models as models
# use the first GPU when available, otherwise the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"device: {device}")
import os
import numpy as np
import matplotlib.animation as animation
from IPython.display import HTML
import pydicom
import re
import time
import torchvision
from torchvision import datasets, models, transforms
import torch.optim as optim
from torch.optim import lr_scheduler
import copy
import random
import zipfile
import pandas as pd
import io
import h5py
import torch.nn as nn
from tqdm import tqdm
import shutil
from PIL import Image
# for batch loading
from distributed import Client
import dask
import glob
import toolz
# weight classes
from sklearn.utils import class_weight
# image preprocessing
import cv2

In [None]:
version = "1.4.4"
hastumor_version = "1.4.2"
scan_types = {'t2': 'T2w', 'flair': 'FLAIR'}
model_scan_type = "t2"
data_raw = "../input/rsna-miccai-brain-tumor-radiogenomic-classification/"
data_sourcedir = "./data_1.4"
data_dir = "./data_1.4_filtered"
data_dir_predict = "./data_1.4_predict"
targetsize = 224

# Preprocessing the data

In [None]:
# start from an empty working directory for the extracted slices
shutil.rmtree(data_sourcedir, ignore_errors=True)
# create the <split>/<label> directories for training and validation
for split in ("train", "val"):
    for label in ("0", "1"):
        os.makedirs(os.path.join(data_sourcedir, split, label), exist_ok=True)

In [None]:
# read the labels from the competition data
df = pd.read_csv(os.path.join(data_raw, 'train_labels.csv'))
# zero-pad the patient IDs to five digits to match the dataset directory names
df['BraTS21ID'] = df['BraTS21ID'].apply(lambda x: str(x).zfill(5))
df.set_index("BraTS21ID", inplace=True)
# drop three cases that are excluded from training
deleteid = ["00109", "00123", "00709"]
df.drop(deleteid, inplace=True)

df.head(8)
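
The zero-padding step above matters because the IDs are read from the CSV as numbers, while the scan directories use five-character names. A tiny illustration with made-up IDs:

```python
# made-up numeric IDs, padded the same way as in the cell above
raw_ids = [0, 2, 123]
padded = [str(i).zfill(5) for i in raw_ids]
print(padded)
```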

In [None]:
# the files in the archive
files = []
for root, dirnames, filenames in os.walk(data_raw):
    for filename in filenames:
        files.append(os.path.join(root, filename))

In [None]:
print(files[3])

In [None]:
def imshow(inp, title=None):
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated

def image_orientation(dicom):
    """Return 'coronal', 'axial', 'sagittal' or 'unknown' for a DICOM slice."""
    rt = 'unknown'
    # https://www.kaggle.com/davidbroberts/determining-mr-image-planes
    (x1, y1, _, x2, y2, _) = [round(v) for v in dicom.ImageOrientationPatient]
    if (x1, y1, x2, y2) == (1, 0, 0, 0):
        rt = 'coronal'
    elif (x1, y1, x2, y2) == (1, 0, 0, 1):
        rt = 'axial'
    elif (x1, y1, x2, y2) == (0, 1, 0, 0):
        rt = 'sagittal'
    return rt

def colornormalisation(img):
    """Rescale the pixel values linearly to the 0-255 range (uint8)."""
    if img is None:
        return img
    # shift the minimum to zero before checking the range
    img = img - np.min(img)
    if np.max(img) == 0:
        # constant image: nothing to rescale, avoid dividing by zero
        return img.astype(np.uint8)
    img = img / np.max(img)    # 0~1
    return (img * 255).astype(np.uint8)    # 0~255
    
def imagenormalisation(image, mincutsize=0.1):
    image = image.astype(np.uint8)
    # pixel values at or below this cutoff count as background
    cutoff = 5
    sizecut = 1 # default sizecut is maximum image
    if np.max(image) > cutoff:
        
        # apply binary thresholding to the gray image
        ret, thresh = cv2.threshold(image, cutoff, 255, cv2.THRESH_BINARY)
        # detect the contours on the binary image using cv2.CHAIN_APPROX_SIMPLE
        found = cv2.findContours(image=thresh, mode=cv2.RETR_TREE, method=cv2.CHAIN_APPROX_SIMPLE)
        # OpenCV 4 returns (contours, hierarchy), OpenCV 3 returns (image, contours, hierarchy)
        cnts = found[0] if len(found) == 2 else found[1]
        
        # if no contour was found, return the resized original image
        # (assumption: callers expect the same (image, sizecut) pair as below)
        if len(cnts) == 0:
            return cv2.resize(image, (targetsize, targetsize)), sizecut
        cnts = sorted(cnts, key=cv2.contourArea, reverse=True)

        # Find bounding box and extract ROI
        x = 10000
        y = 10000
        w = 0
        h = 0
        for c in cnts:
            xc,yc,wc,hc = cv2.boundingRect(c)
            if xc < x:
                x = xc
            if yc < y:
                y = yc
            if xc+wc > w:
                w = xc+wc
            if yc+hc > h:
                h = yc+hc
        # fraction of the original image area covered by the bounding box
        sizecut = (h-y)*(w-x)/(image.shape[0]*image.shape[1])
        # skip the image if the bounding box is too small to be worth considering
        if sizecut < mincutsize:
            image = None
        else:
            # crop the image to the bounding box
            image = image[y:h,x:w]
    else:
        # the image has no meaningful content
        image = None
    if image is not None:
        # resize the image
        image = cv2.resize(image, (targetsize, targetsize))
    return image, sizecut
    
    
def fileprocess(path_sorted, direction = 'axial'):
    images = []
    # fraction of the original image area kept after cropping, one per slice
    sizes = []
    orientation = None
    for path in path_sorted:
        ds = pydicom.dcmread(path)
        orientation = image_orientation(ds)
        # collect the raw pixel data; normalisation happens on the stacked volume
        images.append(ds.pixel_array)
        
    # we need to rotate
    newimages = []
    if orientation != direction:
        data = np.stack(images)
        data = colornormalisation(data)
        # rotate to axial
        if direction == 'axial':
            # coronal -> axial
            if orientation == 'coronal':                
                for d in range(data.shape[1]): 
                    i, size = imagenormalisation(data[:, d, :])
                    newimages.append(i)
                    sizes.append(size)
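            # sagittal -> axial
            if orientation == 'sagittal':
                # as in the coronal case, the row index runs along the
                # head-foot axis, so an axial slice fixes the row index;
                # the transpose puts left-right along the columns
                for d in range(data.shape[1]):
                    i, size = imagenormalisation(data[:, d, :].T)
                    newimages.append(i)
                    sizes.append(size)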
    else:
        data = np.stack(images)
        data = colornormalisation(data)
        for d in range(data.shape[0]):
            i, size = imagenormalisation(data[d, :, :])            
            newimages.append(i)
            sizes.append(size)            
    
    return newimages, sizes
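
`image_orientation` above rounds the direction cosines and compares them with three fixed patterns, so an obliquely acquired slice may be left unclassified or assigned to the wrong plane. A possible alternative, sketched here under the assumption that `ImageOrientationPatient` holds the row and column direction cosines in patient coordinates, is to look at the dominant axis of the slice normal. The function name `orientation_from_normal` is hypothetical and not part of the original notebook.

```python
import numpy as np

def orientation_from_normal(image_orientation_patient):
    """Classify a slice by the dominant axis of its normal vector."""
    cosines = np.asarray(image_orientation_patient, dtype=float)
    row_dir, col_dir = cosines[:3], cosines[3:]
    normal = np.cross(row_dir, col_dir)
    # the normal points mostly along x for sagittal, y for coronal, z for axial
    names = ("sagittal", "coronal", "axial")
    return names[int(np.argmax(np.abs(normal)))]

print(orientation_from_normal([1, 0, 0, 0, 1, 0]))
print(orientation_from_normal([1, 0, 0, 0, 0, -1]))
print(orientation_from_normal([0, 1, 0, 0, 0, -1]))
```

For the three exact patterns used in `image_orientation` this gives the same answers; the difference only shows up for tilted acquisitions.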

## Training data

In [None]:
datalength = df.shape[0]
progress = 0
for d in range(datalength):
    progress += 1
    idname = df.index[d]
    print(f"Process: {np.round(100*progress/datalength, 2)}%", end="\r")
    for scan_type in scan_types.keys():
        # get the files
        path = os.path.join( data_raw, "train", idname, scan_types[scan_type] )
        # sort the files by slice number to get the right slice order
        path_sorted = [f for f in files if f.startswith(path + "/")]
        imagenum = [os.path.basename(s).split("-")[1] for s in path_sorted]
        imagenum = [s.split(".")[0] for s in imagenum]
        tempdf = pd.DataFrame()
        tempdf["image_num"] = imagenum
        tempdf["image_num"] = tempdf["image_num"].astype("int")
        tempdf["temp_path"] = path_sorted
        tempdf = tempdf.sort_values("image_num").reset_index(drop=True)
        # collect the files and process them
        images, sizes = fileprocess(tempdf["temp_path"].values)
        # keep only the slices with meaningful content
        images = [ i for i in images if i is not None ]
        # save as jpg
        label = str(df.iloc[d]['MGMT_value'])        
        for idx in range(len(images)):
            # randomly assign each slice to the train (80%) or val (20%) group
            split = "val" if np.random.binomial(1, 0.2) == 1 else "train"
            filename = os.path.join(data_sourcedir, split, label, f"{idname}_{scan_type}_{idx}.jpg")
            cv2.imwrite(filename, images[idx])
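
The loop above assigns every slice to the training or validation group independently, so slices from one patient can end up in both groups. If a per-patient split is preferred, one option is to draw the group once per patient ID. The sketch below is an illustration only; `split_patients` and the example IDs are hypothetical.

```python
import random

def split_patients(patient_ids, val_fraction=0.2, seed=0):
    """Map each patient ID to 'train' or 'val', one decision per patient."""
    ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_val = max(1, round(len(ids) * val_fraction))
    val_ids = set(ids[:n_val])
    return {pid: ("val" if pid in val_ids else "train") for pid in ids}

# ten made-up, zero-padded patient IDs
assignment = split_patients([f"{i:05d}" for i in range(10)])
print(assignment)
```

Looking up the group by patient ID when writing each slice would then keep all slices of a patient together.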

## Test data

In [None]:
# start from an empty prediction directory
shutil.rmtree(data_dir_predict, ignore_errors=True)
# create directories
pre = os.path.join(data_dir_predict, "pre")
os.makedirs(pre, exist_ok=True)

In [None]:
# files to predict
files_pre = [f for f in files if re.search('test/', f)]
# one row per patient ID found under the test directory, in a stable order
df = pd.DataFrame({'BraTS21ID': sorted({f.split("/")[4] for f in files_pre})})
print(df.head())

In [None]:
datalength = df.shape[0]
progress = 0
for d in range(datalength):
    progress += 1
    idname = df.iloc[d]['BraTS21ID']
    print(f"Process: {np.round(100*progress/datalength, 2)}%", end="\r")
    for scan_type in scan_types.keys():
        # get the files
        path = os.path.join( data_raw, "test", idname, scan_types[scan_type] )
        # sort the files by slice number to get the right slice order
        path_sorted = [f for f in files_pre if f.startswith(path + "/")]
        imagenum = [os.path.basename(s).split("-")[1] for s in path_sorted]
        imagenum = [s.split(".")[0] for s in imagenum]
        tempdf = pd.DataFrame()
        tempdf["image_num"] = imagenum
        tempdf["image_num"] = tempdf["image_num"].astype("int")
        tempdf["temp_path"] = path_sorted
        tempdf = tempdf.sort_values("image_num").reset_index(drop=True)
        # collect the files and process them
        images, sizes = fileprocess(tempdf["temp_path"].values)
        # keep only the slices with meaningful content
        images = [ i for i in images if i is not None ]
        # save as jpg
        for idx in range(len(images)):
            filename = os.path.join(data_dir_predict, f"{idname}_{scan_type}_{idx}.jpg")
            cv2.imwrite(filename, images[idx])
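
Both loops sort the slices by the number embedded in the file name rather than by the name itself, because a plain string sort would put `Image-10.dcm` before `Image-2.dcm`. A small sketch of that idea, assuming file names of the form `Image-<number>.dcm` (the helper `slice_number` and the example paths are hypothetical):

```python
import os
import re

def slice_number(path):
    """Extract the integer slice number from a path like '.../Image-12.dcm'."""
    match = re.search(r"-(\d+)\.dcm$", os.path.basename(path))
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    return int(match.group(1))

paths = ["T2w/Image-10.dcm", "T2w/Image-2.dcm", "T2w/Image-1.dcm"]
ordered = sorted(paths, key=slice_number)
print(ordered)
```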

# Build the model

In [None]:
# load the pretrained has_tumor models (model_has_tumor_*.pth), one per scan type
model_has_tumor = {}
for scan_type in scan_types:
    try:
        # map_location lets a model saved on a GPU load on a CPU-only machine
        model_has_tumor[scan_type] = torch.load(
            f"../input/mri-brain-tumor-detection/model_has_tumor_{hastumor_version}_{scan_type}.pth",
            map_location=device)
    except Exception as e:
        print(f"Missing {scan_type} model! {e}")

In [None]:
def test_single_slices(scan_type, num_images=6):
    
    fileslist = []
    # collect the slice images of the requested scan type
    for root, dirs, files in os.walk(data_sourcedir):
        for filename in files:
            if re.search("_"+scan_type+"_", filename, re.IGNORECASE):
                fileslist.append((root, filename))                    
                
    # no augmentation here: the images are only converted to tensors
    data_transforms =  transforms.Compose([
            transforms.ToTensor()
        ])

    # get the model
    model = model_has_tumor[scan_type]
    model.to(device)
    model.eval()
    
    
    with torch.no_grad():
    
        # iterate over the images and get the predictions
        predictions = []
        datalength = len(fileslist)
        progress = 0
        for i in range(len(fileslist)):
            progress += 1
            print(f"Process: {np.round(100*progress/datalength, 2)}%", end="\r")
            # read the file
            filepath = os.path.join(fileslist[i][0], fileslist[i][1])
            image = Image.open(filepath).convert('RGB') # be sure it is 3 channel
            image = data_transforms(image) # transform
            image = image.unsqueeze(0).to(device) # add the batch dimension
            predict = model(image)
            predictions.append(predict.detach().cpu().numpy()[0])
        
    df = pd.DataFrame({
        'path': [ f[0] for f in fileslist], 
        'filename': [ f[1] for f in fileslist],
        'notumor': [ p[0] for p in predictions ],
        'tumor': [ p[1] for p in predictions ]        
    })
    print(df.head())
    
    return df



In [None]:

df = test_single_slices(model_scan_type)
# save data
df.to_csv(f"datalabel_hastomur_{version}_{model_scan_type}.csv", index=False)


In [None]:
df = pd.read_csv(f"datalabel_hastomur_{version}_{model_scan_type}.csv")
print(df.head())

In [None]:
def imshow(inp, title=None):
    """Imshow for Tensor."""
    # convert from (channels, height, width) to (height, width, channels)
    inp = inp.numpy().transpose((1, 2, 0))
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated

def visualize_model(model, dataloaders, class_names, num_images=6):
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['val']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images//2, 2, images_so_far)
                ax.axis('off')
                ax.set_title(
                    f"predicted: {class_names[preds[j]]} real: {class_names[int(labels[j])]}"
                )
                imshow(inputs.cpu().data[j])

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)    
    
def train_model(model, criterions, optimizer, scheduler, dataloaders, dataset_sizes, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in tqdm(dataloaders[phase]):
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterions[phase](outputs, labels)


                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
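
`train_model` calls `scheduler.step()` once per training epoch. For a step schedule, the learning rate after a given number of epochs is the base rate multiplied by `gamma` once for every completed block of `step_size` epochs. The sketch below reproduces only that arithmetic in plain Python (the function `step_lr` is hypothetical, not a PyTorch API), using the values passed to `StepLR` later in this notebook: `step_size=7`, `gamma=0.1` and a base rate of `0.001`.

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate of a step schedule after `epoch` completed epochs."""
    return base_lr * gamma ** (epoch // step_size)

for epoch in (0, 5, 6, 7, 14):
    print(epoch, step_lr(0.001, epoch))
```

With these settings the rate stays at the base value for the first seven epochs, so a run of six epochs never reaches the first decay.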

In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
def buildmodel(scan_type):
    
    df = pd.read_csv(f"datalabel_hastomur_{version}_{scan_type}.csv")
    
    # start from an empty directory for the filtered slices
    shutil.rmtree(data_dir, ignore_errors=True)
    # create directories
    train = os.path.join(data_dir, "train")
    train_1 = os.path.join(train, "1")
    train_0 = os.path.join(train, "0")
    val = os.path.join(data_dir, "val")
    val_1 = os.path.join(val, "1")
    val_0 = os.path.join(val, "0")
    for directory in [train_1, train_0, val_0, val_1]:
        os.makedirs(directory)
    # copy only the slices that the has_tumor model scored as tumor
    for destination in [train_1, train_0, val_0, val_1]:
        source = destination.replace(data_dir, data_sourcedir)
        filelist = df[df['path'] == source]
        filelist = filelist[filelist['tumor'] > filelist['notumor']]
        for filename in filelist['filename']:
            shutil.copy( os.path.join(source, filename), destination)

    # Data augmentation and normalization for training
    # Just normalization for validation
    data_transforms = {
        'train': transforms.Compose([
            transforms.RandomHorizontalFlip(),
            transforms.RandomVerticalFlip(),
            transforms.RandomRotation(degrees=(-10, 10)),
            transforms.ToTensor()
        ]),
        'val': transforms.Compose([
            transforms.ToTensor()
        ]),
    }


    image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                              data_transforms[x])
                      for x in ['train', 'val']}
    print(image_datasets['train'].classes)
    dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=64,
                                                 shuffle=True, num_workers=4)
                  for x in ['train', 'val']}
    dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
    class_names = image_datasets['train'].classes
    
    class_weights = {}
    for phase in data_transforms.keys():
        # the directory path (.../<phase>/<label>) serves as the class label
        y = [ p for p in df['path'].values if re.search(phase, p) ]
        classes = np.unique(y)
        # recent scikit-learn versions require keyword arguments here
        weight = class_weight.compute_class_weight('balanced', classes=classes, y=y)
        class_weights[phase] = torch.tensor(weight,dtype=torch.float).to(device)
    print(class_weights)
    
    # Get a batch of training data
    inputs, classes = next(iter(dataloaders['train']))

    # Make a grid from batch
    out = torchvision.utils.make_grid(inputs)

    imshow(out, title=[class_names[x] for x in classes])
    
    
    # pretrained pytorch model with internet connection
    # model_ft = models.resnet18(pretrained=True)
    # pretrained pytorch model without internet connection
    model_ft = models.resnet18(pretrained=False)
    model_ft.load_state_dict(torch.load(f"../input/pytorch-resnet18/resnet18-f37072fd.pth"))
    # number of input feature from the pretrained model
    num_ftrs = model_ft.fc.in_features
    # Here the size of each output sample is set to 2.
    # Alternatively, it can be generalized to nn.Linear(num_ftrs, len(class_names)).    
    model_ft.fc = nn.Linear(num_ftrs, 2)
        
    model_ft = model_ft.to(device)
    

    criterions = {}
    for phase in data_transforms.keys():
        criterions[phase] = nn.CrossEntropyLoss(weight=class_weights[phase])
    # criterion = torch.nn.BCEWithLogitsLoss() # this applies the sigmoid automatically

    # Observe that all parameters are being optimized    
    LEARNING_RATE = 1e-3
    optimizer_ft = optim.SGD(model_ft.parameters(), lr=LEARNING_RATE, momentum=0.9)
    # optimizer_ft = optim.Adam(model_ft.parameters(), lr=LEARNING_RATE)

    # Decay LR by a factor of 0.1 every 7 epochs
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
    
    for i in range(6):
        model_ft = train_model(model_ft, criterions, optimizer_ft, exp_lr_scheduler, 
                               dataloaders, dataset_sizes,
                               num_epochs=1)
        # save the model
        torch.save(model_ft, f"model_{version}_{scan_type}.pth")
    
    visualize_model(model_ft, dataloaders, class_names)

buildmodel(model_scan_type)
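
The `'balanced'` mode of `compute_class_weight` used in `buildmodel` weights each class by `n_samples / (n_classes * count)`, so the rarer class gets the larger weight in the loss. A NumPy-only sketch of that formula with made-up label counts (the helper `balanced_weights` is hypothetical):

```python
import numpy as np

def balanced_weights(labels):
    """Return (classes, weights) with weights n_samples / (n_classes * count)."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return classes, weights

labels = np.array([0] * 30 + [1] * 70)   # made-up example: 30 vs 70 slices
classes, weights = balanced_weights(labels)
print(dict(zip(classes.tolist(), weights.tolist())))
```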

## Predicting

In [None]:
data_sourcedir = data_dir_predict

In [None]:
def test_single_slices(scan_type, num_images=6):
    
    fileslist = []
    # collect the slice images of the requested scan type
    for root, dirs, files in os.walk(data_sourcedir):
        for filename in files:
            if re.search("_"+scan_type+"_", filename, re.IGNORECASE):
                fileslist.append((root, filename))                    
                
    # no augmentation here: the images are only converted to tensors
    data_transforms =  transforms.Compose([
            transforms.ToTensor()
        ])

    # get the model and make sure it lives on the same device as the inputs
    model = torch.load(f"model_{version}_{scan_type}.pth", map_location=device)
    model.to(device)
    model.eval()
    
    
    with torch.no_grad():
    
        # iterate over the images and get the predictions
        predictions = []
        datalength = len(fileslist)
        progress = 0
        for i in range(len(fileslist)):
            progress += 1
            print(f"Process: {np.round(100*progress/datalength, 2)}%", end="\r")
            # read the file
            filepath = os.path.join(fileslist[i][0], fileslist[i][1])
            image = Image.open(filepath).convert('RGB') # be sure it is 3 channel
            image = data_transforms(image) # transform
            image = image.unsqueeze(0).to(device) # add the batch dimension
            predict = model(image)
            predictions.append(predict.detach().cpu().numpy()[0])
        
    df = pd.DataFrame({
        'path': [ f[0] for f in fileslist], 
        'filename': [ f[1] for f in fileslist],
        '0': [ p[0] for p in predictions ],
        '1': [ p[1] for p in predictions ]        
    })
    print(df.head())
    
    return df

In [None]:
df = test_single_slices(model_scan_type)
# save data
df.to_csv(f"datalabel_prediction_{version}_{model_scan_type}.csv", index=False)

In [None]:
df_result = df.copy()
df_result['BraTS21ID'] = [ int(f.split("_")[0]) for f in df_result['filename']]
df_result.drop(['filename', 'path'], axis=1, inplace=True)
df_result = df_result.groupby(['BraTS21ID']).sum()
# predict 1.0 unless the summed class-0 output is larger than the summed class-1 output
df_result['MGMT_value'] = (df_result['1'] >= df_result['0']).astype(float)
df_result.drop(['0', '1'], axis=1, inplace=True)
print(df_result.head())
print(df_result.shape)
df_result.to_csv('submission.csv')

In [None]:
df_result = df.copy()
df_result['BraTS21ID'] = [ int(f.split("_")[0]) for f in df_result['filename']]
df_result.drop(['filename', 'path'], axis=1, inplace=True)
# one vote per slice: class 0 if its output is larger, otherwise class 1
df_result['MGMT_value_0'] = (df_result['0'] > df_result['1']).astype(float)
df_result['MGMT_value_1'] = 1.0 - df_result['MGMT_value_0']
df_result.drop(['0', '1'], axis=1, inplace=True)
df_result = df_result.groupby(['BraTS21ID']).sum()
df_result['MGMT_value'] = df_result['MGMT_value_1']/(df_result['MGMT_value_0']+df_result['MGMT_value_1'])
df_result.drop(['MGMT_value_1', 'MGMT_value_0'], axis=1, inplace=True)
print(df_result.head())
print(df_result.shape)
df_result.to_csv('submission2.csv')
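
The two cells above turn per-slice scores into one value per patient, first by summing the raw outputs and then by counting slice votes. A third option, shown here only as a sketch, is to convert each slice's two outputs to a softmax probability and average those per patient. It assumes the model outputs are unnormalised logits; `mean_probability` and the example rows are made up.

```python
import math
from collections import defaultdict

def mean_probability(rows):
    """rows: (patient_id, logit_0, logit_1) -> {patient_id: mean P(class 1)}."""
    per_patient = defaultdict(list)
    for patient_id, logit_0, logit_1 in rows:
        # two-class softmax, written as a sigmoid of the logit difference
        per_patient[patient_id].append(1.0 / (1.0 + math.exp(logit_0 - logit_1)))
    return {pid: sum(p) / len(p) for pid, p in per_patient.items()}

# made-up slices: two for patient 1, one for patient 13
rows = [(1, 0.0, 0.0), (1, 2.0, 0.0), (13, -1.0, 1.0)]
result = mean_probability(rows)
print(result)
```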

## Cleanup

In [None]:
import shutil

for d in ['data_1.4_predict',  'data_1.4_filtered',
           'data_1.4', 
           ]:
    try:
        shutil.rmtree(d)
    except Exception as e:
        print(e)
        
for f in ['datalabel_prediction_1.4.4_t2.csv', 'model_1.4.4_t2.pth', 'submission2.csv']:
    os.remove(f)

print(os.listdir("./"))