# RSNA Intracranial Hemorrhage Detection
## Identify acute intracranial hemorrhage and its subtypes


![](https://media.springernature.com/lw685/springer-static/image/art%3A10.1038%2Fs41746-017-0015-z/MediaObjects/41746_2017_15_Fig3_HTML.jpg)

<br>

**Intracranial hemorrhage**, bleeding that occurs inside the cranium, is a serious health problem requiring rapid and often intensive medical treatment. For example, intracranial hemorrhages account for approximately 10% of strokes in the U.S., where stroke is the fifth-leading cause of death. Identifying the location and type of any hemorrhage present is a critical step in treating the patient.

**Diagnosis** requires an urgent procedure. When a patient shows acute neurological symptoms such as severe headache or loss of consciousness, highly trained specialists review medical images of the patient’s cranium to look for the presence, location and type of hemorrhage. The process is complicated and often time-consuming.

**The challenge** is to build an algorithm to detect acute intracranial hemorrhage and its subtypes. 

<br>
### <span style="color:red"> IMPORTANT: </span> I'll update this kernel almost every day, stay tuned :)
<br>

# Table of Contents

1. [EDA](#EDA)
2. [Visualization & Augmentations](#Visualization)
3. [Model](#Model)
4. [Submission](#Submission)

<br>

### References:

- [Basic EDA + Data Visualization 🧠 ](https://www.kaggle.com/marcovasquez/basic-eda-data-visualization)
- [Simple EDA](https://www.kaggle.com/currypurin/simple-eda)
- [Basic EDA + albumentations augs](https://www.kaggle.com/alimbekovkz/basic-eda-albumentations-augs)

<br>
## Hemorrhage Types

**You can find more information [here](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/overview/hemorrhage-types)**

> Hemorrhage in the head (intracranial hemorrhage) is a relatively common condition that has many causes ranging from trauma, stroke, aneurysm, vascular malformations, high blood pressure, illicit drugs and blood clotting disorders. The neurologic consequences also vary extensively depending upon the size, type of hemorrhage and location ranging from headache to death. The role of the Radiologist is to detect the hemorrhage, characterize the hemorrhage subtype, its size and to determine if the hemorrhage might be jeopardizing critical areas of the brain that might require immediate surgery. 

![](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F603584%2F56162e47358efd77010336a373beb0d2%2Fsubtypes-of-hemorrhage.png?generation=1568657910458946&alt=media)

<br>

## Model example

![](https://s3.amazonaws.com/zapnito/uploads/bcf25032e0801dcfebd72ff2f6a2a064/95a8da83-49b5-4081-af62-371a460fc9f0.jpeg)

> [Explainable, Radiologist Mimicking, Deep-Learning for Detection of Acute Intracranial Haemorrhage from Small CT Datasets](https://bioengineeringcommunity.nature.com/users/203140-michael-h-lev-md-faha-facr/posts/42310-explainable-radiologist-mimicking-deep-learning-for-detection-of-acute-intracranial-haemorrhage-from-small-ct-datasets)

<br>

## Metric

**Weighted multi-label logarithmic loss**

- [What is Log Loss?](https://www.kaggle.com/dansbecker/what-is-log-loss)
- [sklearn.metrics.log_loss](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html)
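
As a rough illustration of the metric, here is a minimal NumPy sketch of a weighted multi-label log loss. The per-label weights (`ASSUMED_WEIGHTS`, with `any` counted twice) and the function name are assumptions made for this example, not values taken from this notebook, so check the competition's evaluation page before relying on them.

```python
import numpy as np

# Column order used later in this notebook for the model outputs.
LABELS = ['any', 'epidural', 'intraparenchymal',
          'intraventricular', 'subarachnoid', 'subdural']

# ASSUMPTION: 'any' counts twice as much as each sub-type.
ASSUMED_WEIGHTS = np.array([2.0, 1.0, 1.0, 1.0, 1.0, 1.0])


def weighted_multilabel_log_loss(y_true, y_pred, weights=ASSUMED_WEIGHTS, eps=1e-15):
    """Mean over images of the weighted average of per-label binary log losses."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    per_label = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    per_image = (per_label * weights).sum(axis=1) / weights.sum()
    return float(per_image.mean())


# Two toy images: one with a subarachnoid hemorrhage, one without any.
y_true = np.array([[1, 0, 0, 0, 1, 0],
                   [0, 0, 0, 0, 0, 0]])
y_pred = np.full_like(y_true, 0.5, dtype=float)
print(weighted_multilabel_log_loss(y_true, y_pred))
```

Predicting 0.5 everywhere gives a loss of ln 2 whatever the weights are, which makes a handy baseline to compare a model against.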

In [None]:
import numpy as np
import pandas as pd
import os
import pydicom
import matplotlib.pyplot as plt
import seaborn as sns
import json
import cv2

print ('Packages ready!')

In [None]:
ls ../input/rsna-intracranial-hemorrhage-detection/

### Load data

In [None]:
train = pd.read_csv("../input/rsna-intracranial-hemorrhage-detection/stage_1_train.csv")
sub = pd.read_csv("../input/rsna-intracranial-hemorrhage-detection/stage_1_sample_submission.csv")
train_images = os.listdir("../input/rsna-intracranial-hemorrhage-detection/stage_1_train_images/")
test_images = os.listdir("../input/rsna-intracranial-hemorrhage-detection/stage_1_test_images/")
print ('Train:', train.shape[0])
print ('Sub:', sub.shape[0])

# EDA

### [Data Description](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/data)

The training data is provided as a set of image Ids and multiple labels, one for each of five sub-types of hemorrhage, plus an additional label for any, which should always be true if any of the sub-type labels is true.

There is also a target column, ```Label```, indicating the probability of whether that type of hemorrhage exists in the indicated image.

There will be **6** rows per image ```Id```. The label indicated by a particular row will look like ```[Image Id]_[Sub-type Name]```, as follows:

```
Id,Label
1_epidural_hemorrhage,0
1_intraparenchymal_hemorrhage,0
1_intraventricular_hemorrhage,0
1_subarachnoid_hemorrhage,0.6
1_subdural_hemorrhage,0
1_any,0.9
```
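
The rule that `any` should be true whenever a sub-type label is true can be checked once the labels are in wide form. The sketch below uses a small hand-made table with hypothetical image IDs (`ID_aaa`, `ID_bbb`) and assumes IDs of the form `ID_<image>_<type>`, as in the training CSV loaded above.

```python
import pandas as pd

SUBTYPES = ['epidural', 'intraparenchymal', 'intraventricular',
            'subarachnoid', 'subdural']

# Toy rows in the same long format as the training CSV (hypothetical image IDs).
toy = pd.DataFrame({
    'ID': [f'ID_{img}_{t}' for img in ('aaa', 'bbb') for t in SUBTYPES + ['any']],
    'Label': [0, 0, 0, 1, 0, 1,   # image aaa: subarachnoid positive, any = 1
              0, 0, 0, 0, 0, 0],  # image bbb: no hemorrhage
})

# Split "ID_<image>_<type>" at the last underscore into image and type.
parts = toy['ID'].str.rsplit('_', n=1, expand=True)
toy['image'] = parts[0]
toy['type'] = parts[1]

# One row per image, one column per label.
wide = toy.pivot(index='image', columns='type', values='Label')

# Flag images where a sub-type is positive but 'any' is not.
inconsistent = wide[wide['any'] < wide[SUBTYPES].max(axis=1)]
print(len(inconsistent))
```

Running the same check on the real `train` table would only need the toy frame swapped for the loaded CSV.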

In [None]:
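# IDs look like "ID_<image>_<type>": split out the sub-type and the image identifier.
# Note: 'PatientID' holds the image identifier from the ID string, not a patient-level ID.
train['type'] = train['ID'].str.split("_", n = 3, expand = True)[2]
train['PatientID'] = train['ID'].str.split("_", n = 3, expand = True)[1]
train['filename'] = train['ID'].apply(lambda st: "ID_" + st.split('_')[1] + ".png")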

sub['filename'] = sub['ID'].apply(lambda st: "ID_" + st.split('_')[1] + ".png")
sub['type'] = sub['ID'].apply(lambda st: st.split('_')[2])

train.head()

In [None]:
print ('Train type =', list(train.type.unique()))
print ('Train label =', list(train.Label.unique()))
#train.to_csv('train.csv', index=False)

### Basic Counts

In [None]:
# The 'PatientID' column holds the image identifier, so this counts unique images
print ('Number of unique images: ', train.PatientID.nunique())

**Type freq**

- Every type has the same number of rows, one per image!

In [None]:
train.type.value_counts()

**Labels**

> **Imbalanced data!**

In [None]:
print(train.Label.value_counts())
sns.countplot(x='Label', data=train)

Let's take a closer look at the **labels per type**:

In [None]:
train.groupby('type').Label.value_counts()

In [None]:
sns.countplot(x="Label", hue="type", data=train)

## Visualization

As explained in the post [Window level and width on CT](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/109328#latest-629856):
> Intracranial hemorrhages are better visualized with a brain window (level = 40, width = 80) than the default non normalized HU values.

See: https://www.kaggle.com/omission/eda-view-dicom-images-with-correct-windowing
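
To make the idea of a window concrete, here is a small standalone sketch that clips Hounsfield units (HU) to a window and rescales them to 0-1. The function name `apply_window` and the sample HU values are illustrative assumptions; the notebook's own `window_image` below additionally applies the DICOM rescale slope and intercept.

```python
import numpy as np


def apply_window(hu, level, width):
    """Clip HU values to [level - width/2, level + width/2] and scale to 0-1."""
    lo = level - width / 2
    hi = level + width / 2
    return (np.clip(hu, lo, hi) - lo) / (hi - lo)


# Illustrative HU values, from very low (air-like) to very high (bone-like).
hu = np.array([-1000.0, 0.0, 35.0, 70.0, 1000.0])
brain = apply_window(hu, level=40, width=80)
print(brain)
```

With level 40 and width 80 the window spans 0 to 80 HU, so 35 HU maps to 0.4375 and 70 HU to 0.875, while everything outside the window saturates at 0 or 1. That is why soft-tissue contrast becomes visible instead of being squeezed into a narrow band of the full HU range.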



In [None]:
TRAIN_IMG_PATH = "../input/rsna-intracranial-hemorrhage-detection/stage_1_train_images/"
TEST_IMG_PATH = "../input/rsna-intracranial-hemorrhage-detection/stage_1_test_images/"
BASE_PATH = '/kaggle/input/rsna-intracranial-hemorrhage-detection/'
TRAIN_DIR = 'stage_1_train_images/'
TEST_DIR = 'stage_1_test_images/'

def window_image(img, window_center,window_width, intercept, slope, rescale=True):

    img = (img*slope +intercept)
    img_min = window_center - window_width//2
    img_max = window_center + window_width//2
    img[img<img_min] = img_min
    img[img>img_max] = img_max
    
    if rescale:
        # Extra rescaling to 0-1, not in the original notebook
        img = (img - img_min) / (img_max - img_min)
    
    return img
    
def get_first_of_dicom_field_as_int(x):
    # Return x[0] as an int if x is a pydicom MultiValue, otherwise int(x)
    if isinstance(x, pydicom.multival.MultiValue):
        return int(x[0])
    else:
        return int(x)

def get_windowing(data):
    dicom_fields = [data[('0028','1050')].value, #window center
                    data[('0028','1051')].value, #window width
                    data[('0028','1052')].value, #intercept
                    data[('0028','1053')].value] #slope
    return [get_first_of_dicom_field_as_int(x) for x in dicom_fields]

    
    
def view_images(images, title = '', aug = None):
    width = 5
    height = 2
    fig, axs = plt.subplots(height, width, figsize=(15,5))
    
    for im in range(0, height * width):
        # dcmread replaces the deprecated read_file
        data = pydicom.dcmread(os.path.join(TRAIN_IMG_PATH,'ID_'+images[im]+ '.dcm'))
        image = data.pixel_array
        window_center , window_width, intercept, slope = get_windowing(data)
        image_windowed = window_image(image, window_center, window_width, intercept, slope)


        i = im // width
        j = im % width
        axs[i,j].imshow(image_windowed, cmap=plt.cm.bone) 
        axs[i,j].axis('off')
        
    plt.suptitle(title)
    plt.show()

In [None]:
case = 5
data = pydicom.dcmread(TRAIN_IMG_PATH+train_images[case])

print(data)
window_center , window_width, intercept, slope = get_windowing(data)


# Display the windowed image (reuse the dataset that was already read above)
img = data.pixel_array

img = window_image(img, window_center, window_width, intercept, slope)
plt.imshow(img, cmap=plt.cm.bone)
plt.grid(False)


In [None]:
view_images(train[(train['type'] == 'epidural') & (train['Label'] == 1)][:10].PatientID.values, title = 'Images with epidural')

In [None]:
view_images(train[(train['type'] == 'intraparenchymal') & (train['Label'] == 1)][:10].PatientID.values, title = 'Images with intraparenchymal')

In [None]:
view_images(train[(train['type'] == 'subarachnoid') & (train['Label'] == 1)][:10].PatientID.values, title = 'Images with subarachnoid')

In [None]:
view_images(train[(train['type'] == 'subdural') & (train['Label'] == 1)][:10].PatientID.values, title = 'Images with subdural')

# Model

- Reference: **[RSNA Intracranial: Simple DenseNet in Keras](https://www.kaggle.com/xhlulu/rsna-intracranial-simple-densenet-in-keras)** by @xhlulu

I'm going to spend here a lot of quota ;)

In [None]:
from keras import layers
from keras.applications import DenseNet121
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import Callback, ModelCheckpoint
from keras.models import Sequential
from keras.optimizers import Adam
from tqdm import tqdm

In [None]:
test = pd.DataFrame(sub.filename.unique(), columns=['filename'])
print ('Test:', test.shape[0])
test.head()

In [None]:
np.random.seed(1234)
# replace=False avoids drawing the same file more than once
sample_files = np.random.choice(os.listdir(TRAIN_IMG_PATH), 200000, replace=False)
sample_df = train[train.filename.apply(lambda x: x.replace('.png', '.dcm')).isin(sample_files)]

pivot_df = sample_df[['Label', 'filename', 'type']].drop_duplicates().pivot(
    index='filename', columns='type', values='Label').reset_index()
print(pivot_df.shape)
pivot_df.head()

In [None]:
def save_and_resize(filenames, load_dir):
    save_dir = '/kaggle/tmp/'
    os.makedirs(save_dir, exist_ok=True)

    for filename in tqdm(filenames):
        path = os.path.join(load_dir, filename)
        new_path = os.path.join(save_dir, filename.replace('.dcm', '.png'))

        dcm = pydicom.dcmread(path)
        window_center, window_width, intercept, slope = get_windowing(dcm)
        img = window_image(dcm.pixel_array, window_center, window_width, intercept, slope)

        # window_image returns floats in [0, 1]; PNG stores integer pixel values,
        # so scale to 8-bit (0-255) before writing.
        resized = cv2.resize(img, (224, 224))
        res = cv2.imwrite(new_path, (resized * 255).astype(np.uint8))
        if not res:
            print('Failed to write', new_path)

In [None]:
save_and_resize(filenames=sample_files, load_dir=BASE_PATH + TRAIN_DIR)
save_and_resize(filenames=os.listdir(BASE_PATH + TEST_DIR), load_dir=BASE_PATH + TEST_DIR)

## Data Generator

In [None]:
BATCH_SIZE = 32

def create_datagen():
    return ImageDataGenerator(
        zoom_range=0.1,  # set range for random zoom
        # set mode for filling points outside the input boundaries
        fill_mode='constant',
        cval=0.,  # value used for fill_mode = "constant"
        horizontal_flip=True,  # randomly flip images horizontally
        vertical_flip=True,  # randomly flip images vertically
        validation_split=0.2
    )

def create_test_gen():
    return ImageDataGenerator().flow_from_dataframe(
        test,
        directory='/kaggle/tmp/',
        x_col='filename',
        class_mode=None,
        target_size=(224, 224),
        batch_size=BATCH_SIZE,
        shuffle=False
    )

def create_flow(datagen, subset):
    return datagen.flow_from_dataframe(
        pivot_df, 
        directory='/kaggle/tmp/',
        x_col='filename', 
        y_col=['any', 'epidural', 'intraparenchymal', 
               'intraventricular', 'subarachnoid', 'subdural'],
        class_mode='other',
        target_size=(224, 224),
        batch_size=BATCH_SIZE,
        subset=subset
    )

# Using original generator
data_generator = create_datagen()
train_gen = create_flow(data_generator, 'training')
val_gen = create_flow(data_generator, 'validation')
test_gen = create_test_gen()

## DenseNet Model 

In [None]:
densenet = DenseNet121(
    weights='../input/densenet-keras/DenseNet-BC-121-32-no-top.h5',
    include_top=False,
    input_shape=(224,224,3)
)

In [None]:
def build_model():
    model = Sequential()
    model.add(densenet)
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(6, activation='sigmoid'))
    
    model.compile(
        loss='binary_crossentropy',
        optimizer=Adam(lr=0.001),
        metrics=['accuracy']
    )
    
    return model

In [None]:
model = build_model()
model.summary()

## Training

In [None]:
checkpoint = ModelCheckpoint(
    'model.h5', 
    monitor='val_loss', 
    verbose=0, 
    save_best_only=True, 
    save_weights_only=False,
    mode='auto'
)

# One pass over each generator per epoch: len() gives the number of batches
# as an integer and already reflects the validation_split set in create_datagen().
history = model.fit_generator(
    train_gen,
    steps_per_epoch=len(train_gen),
    validation_data=val_gen,
    validation_steps=len(val_gen),
    callbacks=[checkpoint],
    epochs=11
)

In [None]:
with open('history.json', 'w') as f:
    json.dump(history.history, f)

history_df = pd.DataFrame(history.history)
history_df[['loss', 'val_loss']].plot()
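# The accuracy key is 'acc' or 'accuracy' depending on the Keras version
acc_key = 'acc' if 'acc' in history_df.columns else 'accuracy'
history_df[[acc_key, 'val_' + acc_key]].plot()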

<br>
# Submission

The submission file needs **6 rows** per image ```Id```, one for each sub-type plus ```any```. Each row is identified as ```[Image Id]_[Sub-type Name]```.

The target column, ```Label```, holds the predicted probability that the given type of hemorrhage exists in the indicated image.
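
Before writing the file it can help to sanity-check the layout. The helper below (`check_submission`, a hypothetical name) is a minimal sketch that tests a long-format table against the rules above, using made-up predictions for two hypothetical image IDs.

```python
import numpy as np
import pandas as pd

LABELS = ['any', 'epidural', 'intraparenchymal',
          'intraventricular', 'subarachnoid', 'subdural']


def check_submission(df, n_images):
    """Raise AssertionError if the long-format table breaks the expected layout."""
    assert list(df.columns) == ['ID', 'Label']
    assert len(df) == 6 * n_images, 'expected 6 rows per image'
    assert df['ID'].is_unique, 'duplicate IDs'
    assert df['Label'].between(0, 1).all(), 'labels must be probabilities'
    # The part after the last underscore must be one of the known label names.
    suffixes = df['ID'].str.rsplit('_', n=1).str[-1]
    assert set(suffixes) == set(LABELS), 'unexpected sub-type name'
    return True


# Hypothetical predictions for two images, in wide form (one column per label).
wide = pd.DataFrame(np.random.default_rng(0).random((2, 6)), columns=LABELS)
wide.insert(0, 'image', ['ID_aaa', 'ID_bbb'])

# Unpivot to one row per (image, label) pair and build the combined ID.
long_df = wide.melt(id_vars='image', var_name='type', value_name='Label')
long_df['ID'] = long_df['image'] + '_' + long_df['type']
submission = long_df[['ID', 'Label']]
print(check_submission(submission, n_images=2))
```

The same function could be called on the real table just before `to_csv`, with `n_images` set to the number of test files.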

In [None]:
model.load_weights('model.h5')
y_test = model.predict_generator(
    test_gen,
    steps=len(test_gen),
    verbose=1
)

In [None]:
test_df = test.join(pd.DataFrame(y_test, columns = ['any', 'epidural', 'intraparenchymal', 
         'intraventricular', 'subarachnoid', 'subdural']))

# Unpivot table
test_df = test_df.melt(id_vars=['filename'])

# Combine the filename column with the variable column
test_df['ID'] = test_df.filename.apply(lambda x: x.replace('.png', '')) + '_' + test_df.variable
test_df['Label'] = test_df['value']

test_df[['ID', 'Label']].to_csv('submission.csv', index=False)

In [None]:
test_df[['ID', 'Label']].head(10)

<img src="https://cdn.dopl3r.com/memes_files/tom-to-be-continued-meme-uDhLB.jpg" height="300" width="300"> 