# 👨‍⚕️​🩺​ Pulmonary Embolism EDA

***Problem Statement:*** If every breath is strained and painful, it could be a serious and potentially life-threatening condition. A pulmonary embolism (PE) is caused by an artery blockage in the lung. Confirming a PE is time-consuming, and the condition is prone to overdiagnosis. Machine learning could help identify PE cases more accurately, which would make management and treatment more effective for patients.

Currently, CT pulmonary angiography (CTPA) is the most common type of medical imaging used to evaluate patients with suspected PE. These CT scans consist of hundreds of images that require detailed review to identify clots within the pulmonary arteries. As the use of imaging continues to grow, constraints on radiologists’ time may contribute to delayed diagnosis.

The Radiological Society of North America (RSNA®) has teamed up with the Society of Thoracic Radiology (STR) to help improve the use of machine learning in the diagnosis of PE.

In this competition, you’ll detect and classify PE cases. In particular, you'll use chest CTPA images (grouped together as studies) and your data science skills to enable more accurate identification of PE. If successful, you'll help reduce human delays and errors in detection and treatment.

With 60,000-100,000 PE deaths annually in the United States, it is among the most fatal cardiovascular diseases. Timely and accurate diagnosis will help these patients receive better care and may also improve outcomes.

[A full set of acknowledgments can be found on this page.](https://www.kaggle.com/c/rsna-str-pulmonary-embolism-detection/overview/acknowledgments)

Please upvote and share if you found this useful or have a loved one affected by PE ❤️

## Table of contents

1. [Example](#example)
    * [Papers](#papers)
2. [Prepare to start](#prepare)

In [None]:
import re
import gc
import os
import cv2
import glob
import keras
import shutil
import pathlib
import PIL
import numpy as np
import pandas as pd
import seaborn as sb
import pydicom as dcm
import networkx as nx
import tensorflow as tf
import matplotlib.pyplot as plt
from shutil import copyfile
from datetime import datetime
from packaging import version
from tensorflow import keras as ks
from tensorflow.keras import datasets, layers, models
from kaggle_datasets import KaggleDatasets
from mpl_toolkits.mplot3d import Axes3D
from tqdm import tqdm

In [None]:
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
# Show current versions
print('TensorFlow Version: {}'.format(tf.__version__))
print('Eager execution: {}'.format(tf.executing_eagerly()))
print('OpenCV Version:{}'.format(cv2.__version__))
print('Keras Version:{}'.format(ks.__version__))
print('Numpy Version:{}'.format(np.__version__))
print('Pandas Version:{}'.format(pd.__version__))

In [None]:
# Check the number of GPUs that are ready
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

In [None]:
# Check the number of TPUs that are ready
print("Num TPUs Available: ", len(tf.config.list_physical_devices('TPU')))

In [None]:
# Read in CSV
train=pd.read_csv("../input/rsna-str-pulmonary-embolism-detection/train.csv")
print(train)

In [None]:
train.head()

In [None]:
train.shape

In [None]:
train.describe()

In [None]:
train.columns

In [None]:
# Classification labels
column_names=['StudyInstanceUID', 'SeriesInstanceUID', 'SOPInstanceUID',
       'pe_present_on_image', 'negative_exam_for_pe', 'qa_motion',
       'qa_contrast', 'flow_artifact', 'rv_lv_ratio_gte_1', 'rv_lv_ratio_lt_1',
       'leftsided_pe', 'chronic_pe', 'true_filling_defect_not_pe',
       'rightsided_pe', 'acute_and_chronic_pe', 'central_pe', 'indeterminate']

In [None]:
print(column_names[3:17])
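
The label columns listed above are 0/1 flags, so the mean of a column is the fraction of rows where that label is set. The following is a minimal sketch of that idea, assuming a small made-up frame in place of `train.csv`; the helper name `label_prevalence` and the toy values are hypothetical and only illustrate the computation.

```python
import pandas as pd

# Hypothetical toy frame standing in for train.csv (values are made up).
toy = pd.DataFrame({
    "StudyInstanceUID": ["a", "a", "b", "b"],
    "pe_present_on_image": [1, 0, 0, 0],
    "negative_exam_for_pe": [0, 0, 1, 1],
    "indeterminate": [0, 0, 0, 0],
})

def label_prevalence(df, id_cols=("StudyInstanceUID", "SeriesInstanceUID", "SOPInstanceUID")):
    """Return the fraction of rows equal to 1 for every non-identifier column."""
    label_cols = [c for c in df.columns if c not in id_cols]
    # The mean of a 0/1 column is the share of positive rows.
    return df[label_cols].mean().sort_values(ascending=False)

prevalence = label_prevalence(toy)
print(prevalence)
```

On the real data the same call would be `label_prevalence(train)`, which gives a quick view of how imbalanced each label is.
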

In [None]:
# Let's have a look at the first 20 rows
for index, row in train.head(n=20).iterrows():
    print(index,row)

In [None]:
# Count how many image rows carry each label.
# Every label column holds 0 or 1, so a column sum is the number of positive rows.
pe_present_on_image_count = int(train['pe_present_on_image'].sum())
negative_exam_for_pe_count = int(train['negative_exam_for_pe'].sum())
leftsided_pe_count = int(train['leftsided_pe'].sum())
rightsided_pe_count = int(train['rightsided_pe'].sum())
acute_and_chronic_pe_count = int(train['acute_and_chronic_pe'].sum())
central_pe_count = int(train['central_pe'].sum())
indeterminate_count = int(train['indeterminate'].sum())

# Show a few example rows where PE is present on the image
print(train.loc[train['pe_present_on_image'] == 1, column_names[:4]].head())

In [None]:
# Display the number of image rows in each category
print("Number of images with PE present:", pe_present_on_image_count)
print("Number of image rows from exams negative for PE:", negative_exam_for_pe_count)
print("Number of image rows from exams with left-sided PE:", leftsided_pe_count)
print("Number of image rows from exams with right-sided PE:", rightsided_pe_count)
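
The counts above are per image row. Labels such as `negative_exam_for_pe` or `leftsided_pe` describe a whole exam and, as far as the data layout suggests, are repeated on every image row of a study, so row counts overstate the number of exams. A minimal sketch of counting per study instead, assuming a made-up toy frame; the helper name `exam_level_counts` is hypothetical.

```python
import pandas as pd

# Hypothetical toy frame: study "a" has three images, study "b" has two.
toy = pd.DataFrame({
    "StudyInstanceUID": ["a", "a", "a", "b", "b"],
    "pe_present_on_image": [0, 1, 1, 0, 0],
    "negative_exam_for_pe": [0, 0, 0, 1, 1],
    "leftsided_pe": [1, 1, 1, 0, 0],
})

def exam_level_counts(df, exam_cols, study_col="StudyInstanceUID"):
    """Count studies (not image rows) in which each exam-level label is set."""
    # max() collapses the repeated per-image copies of a label to one value per study.
    per_study = df.groupby(study_col)[list(exam_cols)].max()
    return per_study.sum().astype(int)

counts = exam_level_counts(toy, ["negative_exam_for_pe", "leftsided_pe"])
print(counts)
```

In the toy frame, `leftsided_pe` sums to 3 over rows but is set in only one study, which is the distinction the helper is meant to expose.
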

In [None]:
# Read in CSV
test=pd.read_csv("../input/rsna-str-pulmonary-embolism-detection/test.csv")
print(test)

In [None]:
test.head()

In [None]:
# Show the number of rows & columns
test.shape

In [None]:
test.describe()

In [None]:
# Read in CSV
sample_submission="../input/rsna-str-pulmonary-embolism-detection/sample_submission.csv"
submission=pd.read_csv(sample_submission)

In [None]:
dcm.dcmread("../input/rsna-str-pulmonary-embolism-detection/train/000f7f114264/9f7378c3b2ab/060f829ca995.dcm")
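
The raw values stored in a CT DICOM file are usually not Hounsfield units (HU) yet; the header carries a linear rescale (the standard `RescaleSlope` and `RescaleIntercept` attributes) that maps them to HU. A common next step is to clip the HU range to a display window. The sketch below shows both steps on a tiny made-up array; the function names are hypothetical, and the window of center 100 and width 700 is only an assumed example setting, not a value taken from this notebook.

```python
import numpy as np

def to_hounsfield(pixel_array, slope=1.0, intercept=0.0):
    """Apply the linear DICOM rescale: HU = raw * slope + intercept."""
    return pixel_array.astype(np.float32) * slope + intercept

def apply_window(hu, center, width):
    """Clip HU values to [center - width/2, center + width/2] and scale to [0, 1]."""
    low, high = center - width / 2, center + width / 2
    return (np.clip(hu, low, high) - low) / (high - low)

# Made-up raw values; with pydicom the slope and intercept would come from
# ds.RescaleSlope and ds.RescaleIntercept of the file being read.
raw = np.array([[0, 1024], [1124, 3000]], dtype=np.int16)
hu = to_hounsfield(raw, slope=1.0, intercept=-1024.0)
windowed = apply_window(hu, center=100, width=700)  # assumed example window
print(hu)
print(windowed)
```

Windowing before plotting tends to make soft tissue and contrast-filled vessels easier to see than plotting the full raw range.
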

In [None]:
# Show the first 25 files returned by glob (file order, not necessarily anatomical order)
fig, axes = plt.subplots(nrows=5, ncols=5, figsize=(20,20))
images = glob.glob("../input/rsna-str-pulmonary-embolism-detection/train/000f7f114264/9f7378c3b2ab/*.dcm")
for i, image in enumerate(images):
    if i == 25:
        break
    row = i // 5
    col = i % 5
    axes[row, col].imshow(dcm.dcmread(image).pixel_array)
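
`glob.glob` returns files in arbitrary order, and the file names here look like hashes rather than slice numbers, so the grid above is probably not in anatomical order. One common approach is to sort slices by the z component of the standard DICOM attribute `ImagePositionPatient`. A minimal sketch, using stand-in objects instead of real datasets; the helper name `sort_slices_by_z` and the fake values are hypothetical.

```python
from types import SimpleNamespace

def sort_slices_by_z(slices):
    """Order slice objects by the z coordinate of ImagePositionPatient."""
    return sorted(slices, key=lambda s: float(s.ImagePositionPatient[2]))

# Stand-ins for pydicom datasets; only the attributes used here are present.
fake_slices = [
    SimpleNamespace(SOPInstanceUID=uid, ImagePositionPatient=[0.0, 0.0, z])
    for uid, z in [("c", -10.0), ("a", -30.0), ("b", -20.0)]
]

ordered_uids = [s.SOPInstanceUID for s in sort_slices_by_z(fake_slices)]
print(ordered_uids)
```

With real files, the list passed in would be something like `[dcm.dcmread(f) for f in images]`.
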

In [None]:
# read the slice's pixel data (CT slices are single-channel, so no color conversion is needed)
gray = dcm.dcmread(image).pixel_array

# resize the image(optional)
gray = cv2.resize(gray, (200, 200))

# apply smoothing operation
gray = cv2.blur(gray,(3,3))

# create grid to plot using numpy
xx, yy = np.mgrid[0:gray.shape[0], 0:gray.shape[1]]

# create the figure
fig = plt.figure(figsize=(150,150))
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(xx, yy, gray,rstride=1, cstride=1, cmap=plt.cm.gray,
 linewidth=1)

# rotate 3d plot
for angle in range(180, 360):
    ax.view_init(45, angle)

In [None]:
"""
Practical Computer Vision: Extract Insightful Information from Images Using TensorFlow, Keras, and OpenCV
Book by Abhinav Dadhich
"""
# read the slice's pixel data (CT slices are single-channel, so no color conversion is needed)
gray = dcm.dcmread(image).pixel_array

# resize the image(optional)
gray = cv2.resize(gray, (800, 800))

# apply smoothing operation
gray = cv2.blur(gray,(3,3))

# create grid to plot using numpy
xx, yy = np.mgrid[0:gray.shape[0], 0:gray.shape[1]]

# create the figure
fig = plt.figure(figsize=(50,50))
ax = fig.add_subplot(111, projection='3d')
ax.contour(xx, yy, gray)

# rotate 3d plot
for angle in range(70, 210):
    ax.view_init(45, angle)

In [None]:
dcm.dcmread("../input/rsna-str-pulmonary-embolism-detection/train/000f7f114264/9f7378c3b2ab/060f829ca995.dcm").pixel_array

In [None]:
images = glob.glob("../input/rsna-str-pulmonary-embolism-detection/train/00db04fdae51/bc1f7e2c4087/*.dcm")
for i, image in tqdm(enumerate(images)):
    #if (i == 3) : break
    # read the slice's pixel data (CT slices are single-channel, so no color conversion is needed)
    scan = dcm.dcmread(image).pixel_array

    # resize the image(optional)
    scan = cv2.resize(scan, (800, 800))

    # apply smoothing operation
    scan = cv2.blur(scan,(3,3))

    # create grid to plot using numpy
    x, y = np.mgrid[0:scan.shape[0], 0:scan.shape[1]]
    
    # create the figure & 3D Axes
    fig = plt.figure(figsize=(20,20))
    ax = fig.add_subplot(111, projection='3d')
    # apply contouring
    ax.contourf(x, y, scan)

    # rotate 3D plot
    for angle in range(70, 210):
        ax.view_init(45, angle)

    # save the frame with a zero-padded index so the files sort in frame order
    plt.savefig(f'./{i:04d}animation.png')
    # Clear the current figure.
    plt.clf() 
    # Closes all the figure windows.
    plt.close('all')

In [None]:
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import numpy as np
from skimage import measure

def plot_3d(image, threshold=700, color="navy"):
    
    # Position the scan upright, 
    # so the head of the patient would be at the top facing the camera
    p = image.transpose(2,1,0)
    
    verts, faces, _, _ = measure.marching_cubes(p, threshold)

    fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(111, projection='3d')

    # Fancy indexing: `verts[faces]` to generate a collection of triangles
    mesh = Poly3DCollection(verts[faces], alpha=0.2)
    mesh.set_facecolor(color)
    ax.add_collection3d(mesh)

    ax.set_xlim(0, p.shape[0])
    ax.set_ylim(0, p.shape[1])
    ax.set_zlim(0, p.shape[2])

    plt.show()

In [None]:
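# plot_3d needs a 3D volume, not a file path, so stack the study's slices first (glob order, not sorted along z)
plot_3d(np.stack([dcm.dcmread(f).pixel_array for f in images]))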

In [None]:
import glob
from PIL import Image

# filepaths
fp_in = "./*.png"
fp_out = "./pe.gif"

# https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#gif
img, *imgs = [Image.open(f) for f in sorted(glob.glob(fp_in))]
img.save(fp=fp_out, format='GIF', append_images=imgs,
         save_all=True, duration=200, loop=0)
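
The GIF frames are assembled in the order `sorted` returns. When file names carry an unpadded counter (for example `2animation.png` and `10animation.png`), plain string sorting puts `10` before `2`. A minimal sketch of a natural-sort key that compares the numeric parts as integers; the helper name `natural_key` and the file names are hypothetical.

```python
import re

def natural_key(path):
    """Split a path into text and integer chunks so that '10' sorts after '2'."""
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", path)]

frames = ["./10animation.png", "./2animation.png", "./0animation.png", "./1animation.png"]
ordered_frames = sorted(frames, key=natural_key)
print(ordered_frames)
```

Passing `key=natural_key` to the `sorted(glob.glob(fp_in))` call would be one way to keep frames in numeric order without renaming files.
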

<img src="./pe.gif"  style="width:900px;" />

In [None]:
"""
Practical Computer Vision: Extract Insightful Information from Images Using TensorFlow, Keras, and OpenCV
Book by Abhinav Dadhich
"""

# read the slice's pixel data (CT slices are single-channel, so no color conversion is needed)
gray = dcm.dcmread(image).pixel_array

# resize the image(optional)
gray = cv2.resize(gray, (800, 800))

# apply smoothing operation
gray = cv2.blur(gray,(3,3))

# create grid to plot using numpy
xx, yy = np.mgrid[0:gray.shape[0], 0:gray.shape[1]]

# create the figure
fig = plt.figure(figsize=(150,150))
ax = fig.add_subplot(111, projection='3d')
ax.contour(xx, yy, gray, stride=1)

# rotate 3d plot
for angle in range(180, 360):
    ax.view_init(45, angle)

In [None]:
"""
Practical Computer Vision: Extract Insightful Information from Images Using TensorFlow, Keras, and OpenCV
Book by Abhinav Dadhich
"""

# read the slice's pixel data (CT slices are single-channel, so no color conversion is needed)
gray = dcm.dcmread(image).pixel_array

# resize the image(optional)
gray = cv2.resize(gray, (100, 100))

# apply smoothing operation
gray = cv2.blur(gray,(3,3))

# create grid to plot using numpy
xx, yy = np.mgrid[0:gray.shape[0], 0:gray.shape[1]]

# create the figure
fig = plt.figure(figsize=(150,150))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(xx, yy, gray,cmap='viridis')

# rotate 3d plot
for angle in range(70, 210):
    ax.view_init(45, angle)

In [None]:
"""
Practical Computer Vision: Extract Insightful Information from Images Using TensorFlow, Keras, and OpenCV
Book by Abhinav Dadhich
"""

# read the slice's pixel data (CT slices are single-channel, so no color conversion is needed)
gray = dcm.dcmread(image).pixel_array

# resize the image(optional)
gray = cv2.resize(gray, (200, 200))

# apply smoothing operation
gray = cv2.blur(gray,(3,3))

# create grid to plot using numpy
xx, yy = np.mgrid[0:gray.shape[0], 0:gray.shape[1]]

# create the figure
fig = plt.figure(figsize=(150,150))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(xx, yy, gray)

# rotate 3d plot
for angle in range(180, 360):
    ax.view_init(90, angle)