
<div class="list-group" id="list-tab" role="tablist">
<h1 class="list-group-item list-group-item-action active" data-toggle="list" style='background:maroon; border:1; color:white' role="tab" aria-controls="home"><center>RSNA | EDA + Visual + DeepUnderstanding + W&B</center></h1>


<center><img src = "https://www.radiologybusiness.com/sites/default/files/2019-12/rsna_copy.jpg" width = "550" height = "300"/></center>                                                                                               

<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Contents</center></h2>

1. [Competition Overview](#competition-overview)  
2. [Understanding MRI](#understanding-mri)
3. [Libraries](#libraries)  
4. [Weights and Biases](#weights-and-biases)
5. [Global Config](#global-config)
6. [Load Datasets](#load-datasets)  
7. [Tabular Exploration](#tabular-exploration)  
8. [Model](#model)
9. [wandb System Metrics](#wandb-system-metrics)
10. [References](#references)

<div class="list-group" id="list-tab" role="tablist">
<h3 class="list-group-item list-group-item-action active" data-toggle="list" style='background:purple; border:0; color:white' role="tab" aria-controls="home"><center>If you find this notebook useful, please give it an upvote; it helps keep me motivated. This notebook will be updated frequently, so keep checking for further developments.</center></h3>

<a id="competition-overview"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Competition Overview</center></h2>

## Description

The Radiological Society of North America (RSNA) has teamed up with the Medical Image Computing and Computer Assisted Intervention Society (the MICCAI Society) to improve diagnosis and treatment planning for patients with glioblastoma. 

In this competition, you will use MRI (magnetic resonance imaging) scans to train and test a model that predicts the genetic subtype of glioblastoma, specifically the presence of MGMT promoter methylation.

If successful, you'll help brain cancer patients receive less invasive diagnoses and treatments. The introduction of new and customized treatment strategies before surgery has the potential to improve the management, survival, and prospects of patients with brain cancer.

## Evaluation Criteria

Submissions are evaluated on the [area under the ROC curve](http://en.wikipedia.org/wiki/Receiver_operating_characteristic) between the predicted probability and the observed target.
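AUC depends only on how the predicted probabilities *rank* subjects, not on their absolute values: it equals the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative one. A minimal pure-Python sketch of that pairwise interpretation follows; `pairwise_auc` is an illustrative helper, not part of the competition kit, and `sklearn.metrics.roc_auc_score` should return the same value on these toy inputs.

```python
from itertools import product


def pairwise_auc(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(y_true, y_score))

# Only the ordering matters: a monotonic transform such as squaring leaves AUC unchanged.
print(pairwise_auc(y_true, [s ** 2 for s in y_score]))
```

Because only the ranking counts, rescaling or averaging predictions by a constant factor never changes the leaderboard score.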

<a id="understanding-mri"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Understanding MRI</center></h2>

**Magnetic resonance imaging (MRI)** is one of the most commonly used tests in neurology and neurosurgery. MRI provides exquisite detail of brain, spinal cord and vascular anatomy, and has the advantage of being able to visualize anatomy in all three planes: axial, sagittal and coronal (see the example image below).



<center><img src = "https://case.edu/med/neurology/NR/mri%20slices%20new.jpg"/></center> 

MRI has an advantage over CT in detecting flowing blood and cryptic vascular malformations. It can also detect demyelinating disease, and it has none of the beam-hardening artifacts that can be seen with CT.

Thus, the posterior fossa, which is prone to beam-hardening artifacts on CT, is more easily visualized on MRI. MRI is also performed without any ionizing radiation.

## MRI Imaging Sequences

The most common MRI sequences are **T1-weighted** and **T2-weighted** scans. 

- **T1-weighted** images are produced by using short TE (echo time) and TR (repetition time). The contrast and brightness of the image are predominantly determined by the T1 properties of tissue.

- **T2-weighted** images are produced by using longer TE and TR times. In these images, the contrast and brightness are predominantly determined by the T2 properties of tissue.

In general, T1- and T2-weighted images can be easily told apart by looking at the **CSF** (cerebrospinal fluid), which is dark on T1-weighted imaging and bright on T2-weighted imaging.
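A common simplified model of the spin-echo signal is $S \propto PD \, (1 - e^{-TR/T_1}) \, e^{-TE/T_2}$. The sketch below plugs in rough relaxation times (assumed, illustrative values of the right order of magnitude, not measurements) to show why CSF comes out dark with short TR/TE and bright with long TR/TE. Proton density (PD) is assumed equal for both tissues so that only relaxation effects are compared.

```python
import math

# Illustrative relaxation times in milliseconds (assumed values, roughly 1.5 T order of magnitude).
TISSUES = {
    "white_matter": {"T1": 700.0, "T2": 80.0},
    "csf": {"T1": 4000.0, "T2": 2000.0},
}


def spin_echo_signal(t1, t2, tr, te, proton_density=1.0):
    """Simplified spin-echo signal: PD * (1 - exp(-TR/T1)) * exp(-TE/T2)."""
    return proton_density * (1.0 - math.exp(-tr / t1)) * math.exp(-te / t2)


def contrast(tr, te):
    """Signal of each tissue for a given TR/TE pair."""
    return {name: spin_echo_signal(p["T1"], p["T2"], tr, te) for name, p in TISSUES.items()}


t1_weighted = contrast(tr=500.0, te=15.0)    # short TR, short TE
t2_weighted = contrast(tr=4000.0, te=100.0)  # long TR, long TE

print("T1-weighted:", {k: round(v, 3) for k, v in t1_weighted.items()})
print("T2-weighted:", {k: round(v, 3) for k, v in t2_weighted.items()})
```

With short TR, CSF's long T1 leaves it little time to recover, so it is darker than white matter; with long TE, white matter's short T2 makes it decay faster, so CSF ends up brighter.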

A third commonly used sequence is **Fluid Attenuated Inversion Recovery (Flair)**. The Flair sequence is similar to a T2-weighted image, except that the TE and TR times are very long and an inversion pulse is applied so that the signal from normal CSF is suppressed. As a result, abnormalities remain bright while normal CSF is attenuated and appears dark. This sequence is very sensitive to pathology and makes it much easier to distinguish CSF from an abnormality.
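The CSF suppression comes from the inversion-recovery part of the name. Under the simplifying assumption that TR is much longer than T1, the longitudinal magnetization after a 180° inversion pulse is $M_z(TI) = M_0 \, (1 - 2 e^{-TI/T_1})$, which crosses zero at $TI = T_1 \ln 2$. The sketch below uses an assumed, illustrative CSF T1 of 4000 ms; in practice the chosen TI also depends on TR and field strength.

```python
import math


def inversion_recovery_mz(ti, t1, m0=1.0):
    """Longitudinal magnetization at inversion time TI after a 180 degree pulse.

    Assumes full recovery between repetitions (TR much longer than T1).
    """
    return m0 * (1.0 - 2.0 * math.exp(-ti / t1))


def null_inversion_time(t1):
    """TI at which a tissue with relaxation time T1 gives zero signal."""
    return t1 * math.log(2.0)


CSF_T1_MS = 4000.0  # assumed, illustrative value
WM_T1_MS = 700.0    # assumed, illustrative value

ti_null = null_inversion_time(CSF_T1_MS)
print(f"TI that nulls CSF: {ti_null:.0f} ms")

# At that TI, CSF gives (almost) no signal, while tissue with a shorter T1 has largely recovered.
print(inversion_recovery_mz(ti_null, CSF_T1_MS))
print(inversion_recovery_mz(ti_null, WM_T1_MS))
```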

## Comparison of T1 vs T2 vs Flair (Brain)
<center><img src = "https://case.edu/med/neurology/NR/t1t2flairbrain.jpg"/></center> 

## Comparison of T1 vs T1 with Gadolinium
<center><img src = "https://case.edu/med/neurology/NR/T1%20T1%20gad.jpg"/></center> 

## Comparison of Flair vs Diffusion Weighted
<center><img src = "https://case.edu/med/neurology/NR/flairdwicom.jpg"/></center> 

## Comparison of T1 vs T2 - Spine
<center><img src = "https://case.edu/med/neurology/NR/t1t2spine.jpg"/></center> 

<a id="libraries"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Libraries</center></h2>

In [None]:
import os
import sys
import json
import glob
import time
import random
import collections

import numpy as np
import pandas as pd

import pydicom
from pydicom.pixel_data_handlers.util import apply_voi_lut
import cv2

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px

# Colored terminal text
from termcolor import colored

# Make the offline copy of EfficientNet-PyTorch (Kaggle dataset) importable
package_path = "../input/efficientnet-pytorch/EfficientNet-PyTorch/EfficientNet-PyTorch-master/"
sys.path.append(package_path)

import torch
from torch import nn
from torch.utils import data as torch_data
from torch.nn import functional as torch_functional
import efficientnet_pytorch

from sklearn import model_selection as sk_model_selection
from sklearn.model_selection import StratifiedKFold

# W&B for experiment tracking
import wandb
wandb.login()

<a id="weights-and-biases"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Weights and Biases</center></h2>

<center><img src = "https://i.imgur.com/1sm6x8P.png" width = "750" height = "500"/></center>  

**Weights & Biases** is a machine learning platform that helps developers build better models faster.

You can use W&B's lightweight, interoperable tools to 
- quickly track experiments, 
- version and iterate on datasets, 
- evaluate model performance, 
- reproduce models, 
- visualize results and spot regressions, 
- and share findings with colleagues. 

Set up W&B in 5 minutes, then quickly iterate on your machine learning pipeline with the confidence that your datasets and models are tracked and versioned in a reliable system of record.

In this notebook, I use Weights & Biases to track the training run and visualize its metrics.

<a id="global-config"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Global Config</center></h2>

In [None]:
class config:
    DIRECTORY_PATH = "../input/rsna-miccai-brain-tumor-radiogenomic-classification"
    TRAIN_LABELS_PATH = DIRECTORY_PATH + "/train_labels.csv"


# wandb config
WANDB_CONFIG = {
    'competition': 'rsna-miccai-brain',
    '_wandb_kernel': 'neuracort',
}

In [None]:
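def set_seed(seed):
    """Seed Python, NumPy and PyTorch random number generators for reproducible runs."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
        # Trade some speed for deterministic cuDNN kernels
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


set_seed(42)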

<a id="load-datasets"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Load Datasets</center></h2>

## Data Files

- **train/** - folder containing the training files, with each top-level folder representing a subject

- **train_labels.csv** - file containing the target MGMT_value for each subject in the training data (i.e., the presence of MGMT promoter methylation)

- **test/** - the test files, which use the same structure as train/; your task is to predict the MGMT_value for each subject in the test data. NOTE: the total size of the rerun test set (Public and Private) is ~5x the size of the Public test set

- **sample_submission.csv** - a sample submission file in the correct format
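Each subject folder is named by its BraTS21ID zero-padded to five digits and contains one subfolder per MRI sequence (FLAIR, T1w, T1wCE, T2w), as the loader code later in this notebook assumes. Slice files must be sorted by their number rather than as plain strings. A minimal sketch, assuming slice files are named like `Image-12.dcm` (a pattern inferred from the sort key used in the loader, `int(x[:-4].split("-")[-1])`); the helper names are hypothetical:

```python
import os
import re


def subject_dir(root, brats_id):
    """Path of one subject folder, e.g. train/00012 for BraTS21ID 12 when root is 'train'."""
    return os.path.join(root, str(brats_id).zfill(5))


def slice_number(path):
    """Trailing integer of a slice file name such as 'Image-12.dcm' (naming pattern assumed)."""
    match = re.search(r"-(\d+)\.dcm$", os.path.basename(path))
    if match is None:
        raise ValueError(f"unexpected slice file name: {path}")
    return int(match.group(1))


def sort_slices(paths):
    """Sort slice files by their number, so Image-2 comes before Image-10."""
    return sorted(paths, key=slice_number)


files = ["Image-10.dcm", "Image-2.dcm", "Image-1.dcm"]
print(sorted(files))       # plain string sort puts Image-10 before Image-2
print(sort_slices(files))  # numeric sort keeps the slices in acquisition order
```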

In [None]:
train_df = pd.read_csv(config.TRAIN_LABELS_PATH)
train_df.head()

<a id="tabular-exploration"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Tabular Exploration</center></h2>

## Basic Tabular Details

In [None]:
# Function to print stylized text
def style_text(text, text_color='yellow', attributes=None, data=None):
    """
    Print text stylized with termcolor's `colored`.

    parameters: text(str) - Input text to be stylized
                text_color(str) - Color of the text
                attributes(list of str) - Attributes applied to the text (default: ['bold'])
                data - Optional value printed after the text
    """
    if attributes is None:
        attributes = ['bold']

    # Compare with None so that falsy values such as 0 are still printed
    if data is not None:
        print(colored(text, text_color, attrs=attributes), data)
    else:
        print(colored(text, text_color, attrs=attributes))

In [None]:
# Data shape
style_text("No. of Rows in train_df: ", data = train_df.shape[0])
style_text("No. of Columns in train_df: ", data = train_df.shape[1])

In [None]:
# Missing Values
style_text("Missing Values in train_df:")
print(train_df.isnull().sum())

Thus, there are no missing values in the `train_df` dataset.

In [None]:
#Dataset Info
style_text("Info about train_df:")
train_df.info()

## CountPlot for MGMT Value

In [None]:
plt.figure(figsize=(5, 5))
sns.countplot(data=train_df, x="MGMT_value");

<a id="model"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Model</center></h2>

Credit for the model goes to [Yaroslav Isaienkov](https://www.kaggle.com/ihelon).

I will take his model a step further by integrating Weights & Biases into it and explaining how to use it.

**Wandb Step 1:** In the first step we need to initialize wandb with the name of a `project` where we want to save our runs.

In [None]:
wandb.init(project='brain-tumor-viz', config=WANDB_CONFIG)

In [None]:
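def load_dicom(path):
    """Read one DICOM slice and min-max scale it to a uint8 image (0-255)."""
    dicom = pydicom.dcmread(path)  # read_file is a deprecated alias of dcmread
    data = dicom.pixel_array
    data = data - np.min(data)
    if np.max(data) != 0:  # avoid dividing by zero on blank slices
        data = data / np.max(data)
    data = (data * 255).astype(np.uint8)
    return data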
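The per-slice normalization in `load_dicom` can be checked without any DICOM files. This sketch repeats the same min-max scaling on a plain NumPy array (`to_uint8` is a hypothetical helper name). Note that each slice is scaled by its own minimum and maximum, so absolute intensities are not comparable across slices, and the final cast truncates rather than rounds.

```python
import numpy as np


def to_uint8(pixels):
    """Min-max scale an array to 0..255 and cast to uint8, mirroring load_dicom."""
    data = pixels.astype(np.float64) - np.min(pixels)
    max_value = np.max(data)
    if max_value != 0:  # constant (e.g. blank) slices stay all zero
        data = data / max_value
    return (data * 255).astype(np.uint8)


print(to_uint8(np.array([100, 200, 300], dtype=np.uint16)))
print(to_uint8(np.zeros((2, 2), dtype=np.uint16)))
```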

In [None]:
df = pd.read_csv(config.TRAIN_LABELS_PATH)
# Hold out 20% of subjects for validation, keeping the MGMT_value ratio the same in both splits
df_train, df_valid = sk_model_selection.train_test_split(
    df,
    test_size=0.2,
    random_state=42,
    stratify=df["MGMT_value"],
)

In [None]:
class DataRetriever(torch_data.Dataset):
    """Training/validation dataset: one 3-channel 256x256 image per subject."""

    def __init__(self, paths, targets):
        self.paths = paths      # BraTS21IDs
        self.targets = targets  # MGMT_value labels

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        _id = self.paths[index]
        patient_path = f"../input/rsna-miccai-brain-tumor-radiogenomic-classification/train/{str(_id).zfill(5)}/"
        channels = []
        for t in ("FLAIR", "T1w", "T1wCE"):  # "T2w" is not used
            # Sort slices by the number in the file name (e.g. Image-12.dcm -> 12)
            t_paths = sorted(
                glob.glob(os.path.join(patient_path, t, "*")),
                key=lambda x: int(x[:-4].split("-")[-1]),
            )
            # Use every slice for short stacks; otherwise take evenly spaced
            # slices, skipping roughly the first and last tenth of the volume
            x = len(t_paths)
            if x < 10:
                r = range(x)
            else:
                d = x // 10
                r = range(d, x - d, d)

            channel = [cv2.resize(load_dicom(t_paths[i]), (256, 256)) / 255 for i in r]
            # Average the selected slices into a single 2D image for this sequence
            channels.append(np.mean(channel, axis=0))

        y = torch.tensor(self.targets[index], dtype=torch.float)

        # np.stack avoids the slow list-of-arrays path in torch.tensor
        return {"X": torch.tensor(np.stack(channels)).float(), "y": y}
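Each channel fed to the model is therefore the mean of a handful of evenly spaced slices from one sequence, rather than a single slice; this collapses a 3D volume into one 2D image that a 2D network can take. The index logic is isolated below so it can be inspected on its own. The helper names are illustrative; the behavior mirrors the loop in `DataRetriever.__getitem__`.

```python
import numpy as np


def sample_slice_indices(n_slices):
    """Indices of the slices that get averaged: all of them for fewer than 10 slices,
    otherwise every (n // 10)-th slice, skipping roughly the first and last tenth."""
    if n_slices < 10:
        return list(range(n_slices))
    step = n_slices // 10
    return list(range(step, n_slices - step, step))


def average_channel(slices):
    """Collapse a list of equally sized 2D slices into one 2D image by averaging."""
    return np.mean(np.stack(slices), axis=0)


print(sample_slice_indices(5))    # short stack: every slice is used
print(sample_slice_indices(100))  # long stack: slices 10, 20, ..., 80
```

The number of averaged slices is not fixed: a 25-slice stack uses step 2 and averages 11 slices, while a 100-slice stack averages 8.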

In [None]:
train_data_retriever = DataRetriever(
    df_train["BraTS21ID"].values, 
    df_train["MGMT_value"].values, 
)

valid_data_retriever = DataRetriever(
    df_valid["BraTS21ID"].values, 
    df_valid["MGMT_value"].values,
)

In [None]:
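# Show the three channels (one per MRI sequence) of a single training sample
sample = train_data_retriever[100]["X"].numpy()
plt.figure(figsize=(16, 6))
for i, name in enumerate(("FLAIR", "T1w", "T1wCE")):
    plt.subplot(1, 3, i + 1)
    plt.imshow(sample[i], cmap="gray")
    plt.title(name)
    plt.axis("off")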

In [None]:
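class Model(nn.Module):
    """EfficientNet-B0 with ImageNet weights and a single-logit output head."""

    def __init__(self):
        super().__init__()
        # Build the architecture offline, then load pretrained weights from the Kaggle dataset
        self.net = efficientnet_pytorch.EfficientNet.from_name("efficientnet-b0")
        checkpoint = torch.load("../input/efficientnet-pytorch/efficientnet-b0-08094119.pth")
        self.net.load_state_dict(checkpoint)
        # Replace the 1000-class ImageNet classifier with a single output logit
        n_features = self.net._fc.in_features
        self.net._fc = nn.Linear(in_features=n_features, out_features=1, bias=True)

    def forward(self, x):
        # x: (batch, 3, 256, 256); the three MRI sequences take the place of RGB channels
        return self.net(x)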

In [None]:
class LossMeter:
    """Running mean of per-batch losses."""

    def __init__(self):
        self.avg = 0
        self.n = 0

    def update(self, val):
        self.n += 1
        # incremental update: avg_n = val / n + (n - 1) / n * avg_{n-1}
        self.avg = val / self.n + (self.n - 1) / self.n * self.avg


class AccMeter:
    """Running accuracy over all samples seen so far."""

    def __init__(self):
        self.avg = 0
        self.n = 0

    def update(self, y_true, y_pred):
        y_true = y_true.cpu().numpy().astype(int)
        # y_pred holds logits: logit >= 0 is equivalent to sigmoid probability >= 0.5
        y_pred = y_pred.cpu().numpy() >= 0
        last_n = self.n
        self.n += len(y_true)
        true_count = np.sum(y_true == y_pred)
        # incremental update, weighting each batch by its size
        self.avg = true_count / self.n + last_n / self.n * self.avg
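Both meters keep a running average without storing every value. The update `avg_n = val / n + (n - 1) / n * avg_{n-1}` is algebraically the ordinary mean. The short check below reuses the two update rules in illustrative standalone helpers (not part of the notebook's code) and compares them with the direct computation.

```python
def incremental_mean(values):
    """Running mean using the same update rule as LossMeter.update."""
    avg, n = 0.0, 0
    for val in values:
        n += 1
        avg = val / n + (n - 1) / n * avg
    return avg


def weighted_incremental_accuracy(batches):
    """Running accuracy using AccMeter's rule; each batch is (n_correct, batch_size)."""
    avg, n = 0.0, 0
    for correct, size in batches:
        last_n = n
        n += size
        avg = correct / n + last_n / n * avg
    return avg


losses = [0.9, 0.7, 0.5, 0.3]
print(incremental_mean(losses), sum(losses) / len(losses))

batches = [(6, 8), (5, 8), (1, 2)]  # (correct predictions, batch size)
print(weighted_incremental_accuracy(batches), 12 / 18)
```

One consequence of the design: AccMeter weights each batch by its size, whereas LossMeter gives every batch equal weight, so a smaller final batch counts as much as a full one in the reported loss.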

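**Wandb Step 2:** In this example we log the training and validation losses with `wandb.log()`. We also call `wandb.watch()` on the model, which makes wandb record histograms of its gradients as training progresses. Note that the `score` reported by the trainer is accuracy at a 0.5 probability threshold, not the competition's AUC.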

In [None]:
class Trainer:
    def __init__(
        self,
        model,
        device,
        optimizer,
        criterion,
        loss_meter,
        score_meter
    ):
        self.model = model
        self.device = device
        self.optimizer = optimizer
        self.criterion = criterion
        self.loss_meter = loss_meter    # meter class, instantiated once per epoch
        self.score_meter = score_meter  # meter class, instantiated once per epoch

        self.best_valid_score = -np.inf
        self.n_patience = 0

        self.messages = {
            "epoch": "[Epoch {}: {}] loss: {:.5f}, score: {:.5f}, time: {} s",
            "checkpoint": "The score improved from {:.5f} to {:.5f}. Save model to '{}'",
            "patience": "\nValid score didn't improve last {} epochs."
        }

    def fit(self, epochs, train_loader, valid_loader, save_path, patience):
        # Call wandb.watch() once so wandb logs the model's gradients during training
        wandb.watch(self.model)

        for n_epoch in range(1, epochs + 1):
            self.info_message("EPOCH: {}", n_epoch)

            train_loss, train_score, train_time = self.train_epoch(train_loader)
            valid_loss, valid_score, valid_time = self.valid_epoch(valid_loader)

            self.info_message(
                self.messages["epoch"], n_epoch, "Train", train_loss, train_score, train_time
            )

            self.info_message(
                self.messages["epoch"], n_epoch, "Valid", valid_loss, valid_score, valid_time
            )

            # Save a checkpoint only when the validation score improves
            if self.best_valid_score < valid_score:
                self.info_message(
                    self.messages["checkpoint"], self.best_valid_score, valid_score, save_path
                )
                self.best_valid_score = valid_score
                self.save_model(n_epoch, save_path)
                self.n_patience = 0
            else:
                self.n_patience += 1

            if self.n_patience >= patience:
                self.info_message(self.messages["patience"], patience)
                break

    def train_epoch(self, train_loader):
        self.model.train()
        t = time.time()
        train_loss = self.loss_meter()
        train_score = self.score_meter()

        for step, batch in enumerate(train_loader, 1):
            X = batch["X"].to(self.device)
            targets = batch["y"].to(self.device)
            self.optimizer.zero_grad()
            outputs = self.model(X).squeeze(1)

            loss = self.criterion(outputs, targets)
            wandb.log({"train_loss": loss.item()})    # Use wandb.log() to log desired metrics

            loss.backward()

            train_loss.update(loss.detach().item())
            train_score.update(targets, outputs.detach())

            self.optimizer.step()

            _loss, _score = train_loss.avg, train_score.avg
            message = 'Train Step {}/{}, train_loss: {:.5f}, train_score: {:.5f}'
            self.info_message(message, step, len(train_loader), _loss, _score, end="\r")

        return train_loss.avg, train_score.avg, int(time.time() - t)

    def valid_epoch(self, valid_loader):
        self.model.eval()
        t = time.time()
        valid_loss = self.loss_meter()
        valid_score = self.score_meter()

        with torch.no_grad():
            for step, batch in enumerate(valid_loader, 1):
                X = batch["X"].to(self.device)
                targets = batch["y"].to(self.device)

                outputs = self.model(X).squeeze(1)
                loss = self.criterion(outputs, targets)
                wandb.log({"valid_loss": loss.item()})    # Use wandb.log() to log desired metrics

                valid_loss.update(loss.item())
                valid_score.update(targets, outputs)

                _loss, _score = valid_loss.avg, valid_score.avg
                message = 'Valid Step {}/{}, valid_loss: {:.5f}, valid_score: {:.5f}'
                self.info_message(message, step, len(valid_loader), _loss, _score, end="\r")

        return valid_loss.avg, valid_score.avg, int(time.time() - t)

    def save_model(self, n_epoch, save_path):
        torch.save(
            {
                "model_state_dict": self.model.state_dict(),
                "optimizer_state_dict": self.optimizer.state_dict(),
                "best_valid_score": self.best_valid_score,
                "n_epoch": n_epoch,
            },
            save_path,
        )

    @staticmethod
    def info_message(message, *args, end="\n"):
        print(message.format(*args), end=end)

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_loader = torch_data.DataLoader(
    train_data_retriever,
    batch_size=8,
    shuffle=True,
    num_workers=8,
)

valid_loader = torch_data.DataLoader(
    valid_data_retriever, 
    batch_size=8,
    shuffle=False,
    num_workers=8,
)

model = Model()
model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = torch_functional.binary_cross_entropy_with_logits

trainer = Trainer(
    model, 
    device, 
    optimizer, 
    criterion, 
    LossMeter, 
    AccMeter
)

trainer.fit(
    epochs=1,
    train_loader=train_loader,
    valid_loader=valid_loader,
    save_path="best-model-0.pth",
    patience=100,
)

And that's it: in just two steps, wandb is integrated into the project. You can now open your dashboard and check the logged metrics. The same approach can be used to log hyperparameters, for example by passing them in the `config` argument of `wandb.init()`.

In [None]:
models = []
for i in range(1):
    model = Model()
    model.to(device)

    # map_location lets a GPU-saved checkpoint load on a CPU-only machine
    checkpoint = torch.load(f"best-model-{i}.pth", map_location=device)
    model.load_state_dict(checkpoint["model_state_dict"])
    model.eval()

    models.append(model)

In [None]:
class DataRetriever(torch_data.Dataset):
    """Test dataset: same preprocessing as training, but returns the subject ID instead of a label."""

    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        _id = self.paths[index]
        patient_path = f"../input/rsna-miccai-brain-tumor-radiogenomic-classification/test/{str(_id).zfill(5)}/"
        channels = []
        for t in ("FLAIR", "T1w", "T1wCE"):  # "T2w" is not used
            # Sort slices by the number in the file name (e.g. Image-12.dcm -> 12)
            t_paths = sorted(
                glob.glob(os.path.join(patient_path, t, "*")),
                key=lambda x: int(x[:-4].split("-")[-1]),
            )
            # Same evenly spaced slice sampling as in training
            x = len(t_paths)
            if x < 10:
                r = range(x)
            else:
                d = x // 10
                r = range(d, x - d, d)

            channel = [cv2.resize(load_dicom(t_paths[i]), (256, 256)) / 255 for i in r]
            channels.append(np.mean(channel, axis=0))

        return {"X": torch.tensor(np.stack(channels)).float(), "id": _id}

In [None]:
submission = pd.read_csv("../input/rsna-miccai-brain-tumor-radiogenomic-classification/sample_submission.csv")

test_data_retriever = DataRetriever(
    submission["BraTS21ID"].values, 
)

test_loader = torch_data.DataLoader(
    test_data_retriever,
    batch_size=4,
    shuffle=False,
    num_workers=8,
)

In [None]:
y_pred = []
ids = []

for e, batch in enumerate(test_loader):
    print(f"{e}/{len(test_loader)}", end="\r")
    with torch.no_grad():
        tmp_pred = np.zeros((batch["X"].shape[0], ))
        for model in models:
            # squeeze(1) keeps the batch dimension even when a batch holds one sample
            tmp_res = torch.sigmoid(model(batch["X"].to(device))).cpu().numpy().squeeze(1)
            tmp_pred += tmp_res
        # Average over models so MGMT_value stays a probability in [0, 1]
        y_pred.extend(tmp_pred / len(models))
        ids.extend(batch["id"].numpy().tolist())
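When `models` holds several folds, the per-model probabilities should be averaged rather than summed so that `MGMT_value` stays in [0, 1]; dividing by a constant does not change the ranking, so the AUC is unaffected. A minimal NumPy sketch with hypothetical logits (the helper names are illustrative):

```python
import numpy as np


def sigmoid(x):
    """Map raw logits to probabilities."""
    return 1.0 / (1.0 + np.exp(-x))


def ensemble_probabilities(logits_per_model):
    """Average per-model sigmoid probabilities.

    logits_per_model: array-like of shape (n_models, n_samples) with raw model outputs.
    """
    probs = sigmoid(np.asarray(logits_per_model, dtype=np.float64))
    return probs.mean(axis=0)


logits = [[0.0, 2.0, -2.0],   # model 1 (hypothetical outputs)
          [0.0, 0.0, -4.0]]   # model 2 (hypothetical outputs)
print(ensemble_probabilities(logits))
```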

In [None]:
submission = pd.DataFrame({"BraTS21ID": ids, "MGMT_value": y_pred})
submission.to_csv("submission.csv", index=False)

In [None]:
submission

<a id="wandb-system-metrics"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>wandb System Metrics</center></h2>

W&B also records system metrics such as CPU, GPU and memory utilization during a run and displays them in the dashboard. Some screenshots from my run are shown below for reference; they can also be viewed on my [project page](https://wandb.ai/ishandutta/brain-tumor-viz/runs/1ipijldy/overview?workspace=user-ishandutta).

In [None]:
# Store all wandb screenshot paths in a list
wandb_img_paths = [f"../input/wandb-rsna/wandb_{i}.png" for i in range(1, 8)]

In [None]:
def display_img(img_path):
    """
    Function which takes an image path and displays it.

    params: img_path(str): Path of Image to be displayed
    """
    plt.figure(figsize=(25.5, 17.5))

    img = cv2.imread(img_path)
    # OpenCV loads images as BGR; convert to RGB so matplotlib shows the true colors
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    plt.axis('off')
    plt.imshow(img)

In [None]:
display_img(wandb_img_paths[0])

In [None]:
display_img(wandb_img_paths[1])

In [None]:
display_img(wandb_img_paths[2])

In [None]:
display_img(wandb_img_paths[3])

In [None]:
display_img(wandb_img_paths[4])

In [None]:
display_img(wandb_img_paths[5])

In [None]:
display_img(wandb_img_paths[6])

<a id="references"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>References</center></h2>

> **[01]** [Magnetic Resonance Imaging (MRI) of the Brain and Spine: Basics](https://case.edu/med/neurology/NR/MRI%20Basics.htm)  
> **[02]** [Brain Tumor EDA and Interactive Viz with W&B](https://www.kaggle.com/ayuraj/brain-tumor-eda-and-interactive-viz-with-w-b)  
> **[03]** [🧠Brain Tumor🧠 - EDA with Animations and Modeling](https://www.kaggle.com/ihelon/brain-tumor-eda-with-animations-and-modeling/data)

<div class="list-group" id="list-tab" role="tablist">
<h3 class="list-group-item list-group-item-action active" data-toggle="list" style='background:purple; border:0; color:white' role="tab" aria-controls="home"><center> This notebook will be updated frequently, so keep checking for further developments.</center></h3>

### Connect with me on [LinkedIn](https://www.linkedin.com/in/ishandutta0098) :-)