In this notebook, I will walk step by step through applying statistical clustering methods, computer graphics algorithms, and image processing techniques to medical images, to help understand and visualize the data in both 2D and 3D.


NOTE: In order to use plotly you will need to: 
1. Install all necessary packages/extensions following these [instructions](https://plotly.com/python/getting-started/). 
2. Sign up for a free account and activate your key following [these instructions](https://plotly.com/python/getting-started/) (only read the section ‘Initialization for Online Plotting’).

## Step 1: Install Necessary Packages

In [None]:
!pip install chart_studio


Now let's load the packages used throughout the notebook. If an import fails, you might need to install the corresponding package first.

In [None]:
# common packages 
import numpy as np 
import os
import copy
from math import *
import matplotlib.pyplot as plt
from functools import reduce
from glob import glob

# reading in dicom files
import pydicom

# skimage image processing packages
from skimage import measure, morphology
from skimage.morphology import ball, binary_closing
from skimage.measure import label, regionprops

# scipy linear algebra functions 
from scipy.linalg import norm
import scipy.ndimage

# ipywidgets for some interactive plots
from ipywidgets.widgets import * 
import ipywidgets as widgets

# plotly 3D interactive graphs 
import plotly
from plotly.graph_objs import *
import chart_studio
# set your Chart Studio credentials here (replace the placeholders with your own username and API key);
# this allows you to send results to your account
chart_studio.tools.set_credentials_file(username='your_username', api_key='your_api_key')

## Step 2: Loading DICOM Data
Let's set the path to the patient's DICOM files so that we can have a look at them.

In [None]:
patient_id = '6897fa9de148'
patient_folder = f'../input/rsna-str-pulmonary-embolism-detection/train/{patient_id}/'
data_paths = glob(patient_folder + '*/*.dcm')

# Print out the first 5 file names to verify we're in the right folder.
print(f'Total of {len(data_paths)} DICOM images.\nFirst 5 filenames:')
data_paths[:5]

In [None]:
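def load_scan(paths):
    # read every DICOM file (pydicom.read_file is deprecated in favor of dcmread)
    slices = [pydicom.dcmread(path) for path in paths]
    slices.sort(key=lambda x: int(x.InstanceNumber), reverse=True)
    # estimate the slice spacing from the z-coordinate of the first two slices,
    # falling back to SliceLocation if ImagePositionPatient is missing
    try:
        slice_thickness = np.abs(slices[0].ImagePositionPatient[2] - slices[1].ImagePositionPatient[2])
    except AttributeError:
        slice_thickness = np.abs(slices[0].SliceLocation - slices[1].SliceLocation)

    for s in slices:
        s.SliceThickness = slice_thickness

    return slices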

def get_pixels_hu(scans):
    # stack all slices into one 3D array of raw stored values
    image = np.stack([s.pixel_array for s in scans]).astype(np.int16)

    # Set outside-of-scan pixels (stored as -2000) to 0.
    # The intercept is usually -1024, so a stored 0 maps to about -1024 HU (air)
    image[image == -2000] = 0

    # Convert to Hounsfield units (HU): HU = slope * stored value + intercept
    intercept = scans[0].RescaleIntercept
    slope = scans[0].RescaleSlope

    if slope != 1:
        image = (slope * image.astype(np.float64)).astype(np.int16)

    image += np.int16(intercept)

    return np.array(image, dtype=np.int16)
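The rescale in `get_pixels_hu` is a linear map: HU = RescaleSlope × stored value + RescaleIntercept. Here is a quick sanity check as a minimal sketch with made-up stored values. It assumes the typical slope of 1 and intercept of -1024 mentioned in the comments above. `raw_to_hu` is a hypothetical helper, not part of pydicom.

```python
import numpy as np

def raw_to_hu(raw, slope=1.0, intercept=-1024.0):
    """Hypothetical helper: apply the DICOM linear rescale HU = slope * stored + intercept."""
    return slope * np.asarray(raw, dtype=np.float64) + intercept

# made-up stored values, not read from a real scan
stored = [0, 1024]
print(raw_to_hu(stored).tolist())  # [-1024.0, 0.0]
```

A stored value of 0 lands at -1024 HU (air) and 1024 lands at 0 HU (water). This is why the out-of-scan pixels that were set to 0 end up looking like air.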

Run the following code to extract DICOM pixels for each slice location and display a single slice:

In [None]:
# set path and load files 
patient_dicom = load_scan(data_paths)
patient_pixels = get_pixels_hu(patient_dicom)
#sanity check
plt.imshow(patient_pixels[80], cmap=plt.cm.bone)

## Step 3: Image Processing
Let's use thresholding and connected-component labeling to segment the lungs from the rest of the chest:

In [None]:
def largest_label_volume(im, bg=-1):
    vals, counts = np.unique(im, return_counts=True)
    counts = counts[vals != bg]
    vals = vals[vals != bg]
    if len(counts) > 0:
        return vals[np.argmax(counts)]
    else:
        return None
    
def segment_lung_mask(image, fill_lung_structures=True):
    # not actually binary, but 1 and 2. 
    # 0 is treated as background, which we do not want
    binary_image = np.array(image >= -700, dtype=np.int8)+1
    labels = measure.label(binary_image)
 
    # Pick the pixel in the very corner to determine which label is air.
    # Possible improvement: pick multiple background labels from around the patient.
    # That is more robust to "trays" the patient lies on, which can cut the air around the person in half.
    background_label = labels[0,0,0]
 
    # Fill the air around the person
    binary_image[background_label == labels] = 2
 
    # Method of filling the lung structures (that is superior to 
    # something like morphological closing)
    if fill_lung_structures:
        # For every slice we determine the largest solid structure
        for i, axial_slice in enumerate(binary_image):
            axial_slice = axial_slice - 1
            labeling = measure.label(axial_slice)
            l_max = largest_label_volume(labeling, bg=0)
 
            if l_max is not None: #This slice contains some lung
                binary_image[i][labeling != l_max] = 1
    binary_image -= 1 #Make the image actual binary
    binary_image = 1-binary_image # Invert it, lungs are now 1
 
    # Remove other air pockets inside body
    labels = measure.label(binary_image, background=0)
    l_max = largest_label_volume(labels, bg=0)
    if l_max is not None: # There are air pockets
        binary_image[labels != l_max] = 0
 
    return binary_image
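The key idea in `segment_lung_mask` is this: after thresholding at -700 HU, the connected component touching the corner pixel is the air outside the body. Only the air enclosed by the body (the lungs) then survives the inversion. The following minimal toy sketch shows that trick on a synthetic 12×12 slice with made-up HU values (not patient data), repeating the same steps in 2D.

```python
import numpy as np
from skimage import measure

# Synthetic 2D "slice" in HU (made-up values): air outside, a soft-tissue body, one air-filled "lung"
slice_hu = np.full((12, 12), -1000, dtype=np.int16)  # outside air
slice_hu[2:10, 2:10] = 40                             # body (soft tissue)
slice_hu[4:8, 4:8] = -850                             # lung (air enclosed by the body)

# Threshold as in segment_lung_mask: 1 = air-like, 2 = denser tissue
binary = np.array(slice_hu >= -700, dtype=np.int8) + 1
labels = measure.label(binary)

# The corner pixel belongs to the outside air: relabel that whole component as "tissue" (2)
binary[labels == labels[0, 0]] = 2

# Subtract 1 and invert: only air enclosed by the body remains set to 1
lung_mask = 1 - (binary - 1)
print(int(lung_mask.sum()))  # 16, the 4x4 lung
```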

By running the code below, you use the functions defined above (built on skimage) to create masks that cover the lung. We compute the mask with both `fill_lung_structures=True` and `fill_lung_structures=False`. The filled mask isolates the whole lung, and subtracting the unfilled mask from it leaves only the internal structures. Let’s run the code below and display an example of the lung isolated from the chest:

In [None]:
# get masks 
segmented_lungs = segment_lung_mask(patient_pixels, fill_lung_structures=False)
segmented_lungs_fill = segment_lung_mask(patient_pixels, fill_lung_structures=True)
internal_structures = segmented_lungs_fill - segmented_lungs

# isolate lung from chest
# zero out every pixel outside the filled lung mask
copied_pixels = patient_pixels.copy()
copied_pixels[segmented_lungs_fill == 0] = 0
seg_lung_pixels = copied_pixels
# sanity check
f, ax = plt.subplots(1,2, figsize=(10,6))
ax[0].imshow(patient_pixels[80], cmap=plt.cm.bone)
ax[0].axis(False)
ax[0].set_title('Original')
ax[1].imshow(seg_lung_pixels[80], cmap=plt.cm.bone)
ax[1].axis(False)
ax[1].set_title('Segmented')
plt.show()

If the right panel shows only the lung region, then perfect! You just isolated the lung from the rest of the scan. Don’t worry about the masks yet; I will show you some visualizations later on.
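To make the mask arithmetic from the cell above concrete, here is a minimal sketch on a made-up 1D profile through a lung. The values and masks are hypothetical, not taken from the scan. The unfilled mask excludes a vessel, the filled mask includes it, and their difference marks only the vessel.

```python
import numpy as np

# Hypothetical 1D profile (HU): tissue, lung air, a vessel, lung air, tissue
hu = np.array([40, -850, -860, 30, -840, -855, 45], dtype=np.int16)
mask_no_fill = np.array([0, 1, 1, 0, 1, 1, 0], dtype=np.int8)  # vessel left out
mask_fill = np.array([0, 1, 1, 1, 1, 1, 0], dtype=np.int8)     # vessel filled in

# internal structures = filled mask minus unfilled mask
internal = mask_fill - mask_no_fill

# same masking as above: zero out everything outside the filled mask
segmented = hu.copy()
segmented[mask_fill == 0] = 0

print(internal.tolist())   # [0, 0, 0, 1, 0, 0, 0]
print(segmented.tolist())  # [0, -850, -860, 30, -840, -855, 0]
```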

## Step 4: 2D Visualization Techniques
**Non-interactive:**
When visualizing data, I find it very beneficial to visualize each step of the pipeline. This not only helps you understand each step of your code, but also makes for a clean presentation.
The code below visually tells the story of how the data was segmented:

&#9632; the original image <br>
&#9632; the binary mask that covers the lung <br>
&#9632; the segmented lung (the mask applied to the original image) <br>
&#9632; the internal structures of the lung highlighted on top of the segmentation

In [None]:
f, ax = plt.subplots(2,2, figsize = (10,10))

# pick a slice to display
slice_id = 80

ax[0,0].imshow(patient_pixels[slice_id], cmap=plt.cm.bone)
ax[0,0].set_title('Original Dicom')
ax[0,0].axis(False)


ax[0,1].imshow(segmented_lungs_fill[slice_id], cmap=plt.cm.bone)
ax[0,1].set_title('Lung Mask')
ax[0,1].axis(False)

ax[1,0].imshow(seg_lung_pixels[slice_id], cmap=plt.cm.bone)
ax[1,0].set_title('Segmented Lung')
ax[1,0].axis(False)

ax[1,1].imshow(seg_lung_pixels[slice_id], cmap=plt.cm.bone)
ax[1,1].imshow(internal_structures[slice_id], cmap='jet', alpha=0.7)
ax[1,1].set_title('Segmentation with \nInternal Structure')
ax[1,1].axis(False)
plt.show()
## Interactive(1):
There are only a few ways to display multiple images in a Jupyter notebook: plot each image one by one, or make a figure with multiple rows and columns. Neither is helpful if you want to scroll through every slice to get a better understanding of the data. The code below creates an interactive slider that lets you scroll through the images <font color=red>(the catch is that the slider only works in a live, **interactive** session. If you want to browse the slices with it, fork the notebook and run it yourself in interactive mode)</font>:

In [None]:
# slide through dicom images using a slide bar
def dicom_animation(x):
    # draw a fresh figure for every slider position
    plt.figure(figsize=(6, 6))
    plt.imshow(patient_pixels[x], cmap=plt.cm.gray)
    plt.title(f'Slice {x}')
    plt.show()

interact(dicom_animation, x=(0, len(patient_pixels)-1))
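A static alternative that works without a live kernel is the multi-row figure mentioned above, showing every N-th slice. Below is a minimal sketch. `plot_montage` is a hypothetical helper, and the `step`, `cols`, and `cmap` defaults are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_montage(volume, step=10, cols=6, cmap='bone'):
    """Hypothetical helper: show every `step`-th slice of a (slices, H, W) volume in a grid."""
    idx = list(range(0, volume.shape[0], step))
    rows = int(np.ceil(len(idx) / cols))
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows), squeeze=False)
    for ax in axes.ravel():
        ax.axis('off')  # hide ticks everywhere, including unused grid cells
    for ax, i in zip(axes.ravel(), idx):
        ax.imshow(volume[i], cmap=cmap)
        ax.set_title(f'slice {i}', fontsize=8)
    fig.tight_layout()
    return fig, idx
```

In this notebook it could be called as `plot_montage(patient_pixels, step=10)` followed by `plt.show()`.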

## Interactive(2):
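Another way to visualize the CT angiograms in a livelier fashion is to use GIF images. A GIF is simply a series of images shown automatically at a preconfigured interval. <font color=blue>The upside of GIF is that its compression (LZW) is lossless. The catch is that each frame is limited to an 8-bit palette of 256 colors, so 16-bit HU values have to be rescaled and quantized before saving.</font> GIFs are therefore an efficient way to browse CT scan slices, but they are not a faithful storage format for the raw HU data.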
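GIF frames use an 8-bit (256-color) palette. The minimal sketch below shows how much a min-max rescale to 0-255 collapses a made-up range of 2048 integer HU values. Roughly eight neighboring HU values end up sharing one gray level.

```python
import numpy as np

# every integer HU value from -1024 to 1023 (2048 distinct levels)
hu = np.arange(-1024, 1024, dtype=np.int16)

# min-max rescale to 0-255, the kind of conversion needed before writing an 8-bit GIF frame
scaled = ((hu - hu.min()) / (hu.max() - hu.min()) * 255).astype(np.uint8)

print(len(np.unique(hu)), len(np.unique(scaled)))  # 2048 256
```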

In [None]:
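import imageio
from IPython import display

def to_uint8(volume):
    # GIF frames are 8-bit, so min-max rescale the volume to 0-255 ourselves
    # instead of relying on imageio's implicit int16 -> uint8 conversion
    volume = volume.astype(np.float32)
    value_range = max(float(volume.max() - volume.min()), 1e-6)
    return ((volume - volume.min()) / value_range * 255).astype(np.uint8)

print('Original Image Slices before processing')
imageio.mimsave(f'./{patient_id}.gif', to_uint8(patient_pixels), duration=0.1)
display.Image(f'./{patient_id}.gif', format='png')

In [None]:
print('Lung Segmentation Mask')
imageio.mimsave(f'./{patient_id}.gif', to_uint8(segmented_lungs_fill), duration=0.1)
display.Image(f'./{patient_id}.gif', format='png')

In [None]:
print('Segmented Part of Lung Tissue')
imageio.mimsave(f'./{patient_id}.gif', to_uint8(seg_lung_pixels), duration=0.1)
display.Image(f'./{patient_id}.gif', format='png')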

Of the three GIFs above, the Lung Segmentation Mask deserves the closest look. In several slices, the mask covers only the lung tissue and leaves out the airways running through the lungs, i.e. the bronchioles.

<br>
<font color=blue>(**Bronchioles** are air passages inside the lungs that branch off like tree limbs from the bronchi, the two main air passages into which air flows from the trachea (windpipe) after being inhaled through the nose or mouth. The bronchioles deliver air to tiny sacs called alveoli, where oxygen and carbon dioxide are exchanged. They are vulnerable to conditions like asthma, bronchiolitis, cystic fibrosis, and emphysema that can cause constriction and/or obstruction of the airways.) </font>

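We need to close those gaps so that we can segment the whole lung. A morphological closing (a dilation followed by an erosion with the same structuring element) fills holes and gaps that are too small to contain the structuring element. So let's try closing the mask with disks of increasing radius.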
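The following minimal sketch shows closing at work on a synthetic binary mask (not patient data). It is a 9×9 "lung" with a 3×3 gap standing in for an airway left out by the threshold. It uses the same `skimage.morphology.closing` and `disk` functions as the cells below.

```python
import numpy as np
from skimage.morphology import closing, disk

# Synthetic mask: a 9x9 "lung" with a 3x3 gap in the middle
mask = np.zeros((15, 15), dtype=np.uint8)
mask[3:12, 3:12] = 1
mask[6:9, 6:9] = 0

# Count how many of the 9 gap pixels closing with disk(r) fills back in
filled = {r: int(closing(mask, disk(r))[6:9, 6:9].sum()) for r in (1, 2)}
print(filled)
```

With `disk(1)` part of the gap stays open, while `disk(2)` fills all nine pixels. This is why the comparison below sweeps over several disk sizes.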

In [None]:
from skimage.morphology import opening, closing
from skimage.morphology import disk

def plot_comparison(original, filtered, filter_name):

    fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(8, 4), sharex=True,
                                   sharey=True)
    ax1.imshow(original, cmap=plt.cm.gray)
    ax1.set_title('original')
    ax1.axis('off')
    ax2.imshow(filtered, cmap=plt.cm.gray)
    ax2.set_title(filter_name)
    ax2.axis('off')

In [None]:
original = segmented_lungs_fill[65]

rows = 4
cols = 4
f, ax = plt.subplots(rows, cols, figsize=(15, 12))

for i in range(rows * cols):
    r, c = divmod(i, cols)  # grid position of the i-th panel
    if i == 0:
        ax[r, c].imshow(original, cmap=plt.cm.gray)
        ax[r, c].set_title('Original')
    else:
        # morphological closing with a disk of radius i
        closed = closing(original, disk(i))
        ax[r, c].imshow(closed, cmap=plt.cm.gray)
        ax[r, c].set_title(f'closed disk({i})')
    ax[r, c].axis('off')
plt.show()

We can then pick one of these closed masks and multiply it element-wise with the original image to get the lung segmentation. Before settling on a particular disk size, let's compare the segmentation quality of each one.

In [None]:
original_image = patient_pixels[65]
original = segmented_lungs_fill[65]
f, ax = plt.subplots(rows, cols, figsize=(15, 15))

for i in range(rows * cols):
    r, c = divmod(i, cols)  # grid position of the i-th panel
    if i == 0:
        ax[r, c].imshow(original_image, cmap=plt.cm.gray)
        ax[r, c].set_title('Original')
    else:
        # apply the closed mask to the original HU image
        closed = closing(original, disk(i))
        ax[r, c].imshow(original_image * closed, cmap=plt.cm.gray)
        ax[r, c].set_title(f'closed disk({i})')
    ax[r, c].axis('off')
plt.show()

In this comparison, closing with `disk(15)` (the largest radius tried) gives a good lung segmentation for this slice.
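Judging the grid by eye is subjective. One possible way to support the choice with a number, which the notebook itself does not do, is to count how many pixels each closing adds to the mask. A sudden jump can indicate that a disk has started bridging separate structures (in a real slice, possibly the space between the two lungs) rather than only filling airway gaps. Below is a minimal sketch. `pixels_added_by_closing` is a hypothetical helper, and the toy mask with two separated rectangles is synthetic.

```python
import numpy as np
from skimage.morphology import closing, disk

def pixels_added_by_closing(mask, radii):
    """Hypothetical helper: for each radius r, count the pixels that closing with disk(r) adds to a binary mask."""
    mask = np.asarray(mask, dtype=np.uint8)
    base = int(mask.sum())
    return {r: int(closing(mask, disk(r)).sum()) - base for r in radii}

# Synthetic mask: two rectangular "lungs" separated by a 3-pixel-wide gap
toy = np.zeros((20, 20), dtype=np.uint8)
toy[4:15, 3:8] = 1
toy[4:15, 11:16] = 1
print(pixels_added_by_closing(toy, [1, 2]))
```

Here `disk(1)` adds nothing, while `disk(2)` starts bridging the two rectangles. On the real data, the call would be `pixels_added_by_closing(segmented_lungs_fill[65], range(1, 16))`.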

![](https://www.clipartmax.com/png/middle/265-2655834_work-in-progress-icon.png)





### In the meantime, check out my other ongoing works in this same competition: 
💥 [RSNA-STR Pulmonary Embolism [Dummy Sub]](https://www.kaggle.com/redwankarimsony/rsna-str-pulmonary-embolism-dummy-sub)<br>
💥 [CT-Scans, DICOM files, Windowing Explained](https://www.kaggle.com/redwankarimsony/ct-scans-dicom-files-windowing-explained)<br>
💥 [RSNA-STR-PE [Gradient & Sigmoid Windowing]](https://www.kaggle.com/redwankarimsony/rsna-str-pe-gradient-sigmoid-windowing)<br>
💥 [RSNA-STR [✔️3D Stacking ✔️3D Plot ✔️Segmentation]](https://www.kaggle.com/redwankarimsony/rsna-str-3d-stacking-3d-plot-segmentation)<br>
💥 [RSNA-STR [DICOM 👉 GIF 👉 npy]](https://www.kaggle.com/redwankarimsony/rsna-str-dicom-gif-npy)<br>
💥 [RSNA-STR Pulmonary Embolism [EDA]](https://www.kaggle.com/redwankarimsony/rsna-str-pulmonary-embolism-eda)<br>