In [1]:
%matplotlib notebook
import pandas as pd
import numpy as np
import pydicom
import skimage
import matplotlib.pyplot as plt

In [2]:
bbox = pd.read_csv('bounding_boxes.csv')
bbox

Unnamed: 0.1,Unnamed: 0,Image Index,Finding Label,Bbox [x,y,w,h]
0,583,dicom_00023075_033.dcm,Mass,239.502222,535.077934,72.817778,65.991111
1,584,dicom_00029579_005.dcm,Mass,609.28,189.19349,73.955556,71.68
2,585,dicom_00013659_019.dcm,Mass,559.217778,167.575712,102.4,136.533333


## Step 1: 
Read the DICOM file using the pydicom.dcmread function. The image data is then available as a NumPy array (not a dataframe) through the returned dataset's pixel_array attribute

In [3]:
dcm = pydicom.dcmread('dicom_00023075_033.dcm')

## Step 2: 
Visualize the image using plt.imshow

In [4]:
plt.imshow(dcm.pixel_array,cmap='gray')

<matplotlib.image.AxesImage at 0x239867f7548>

## Step 3: 
Plot a histogram of the image's pixel intensity values

In [6]:
plt.figure(figsize=(5,5))
plt.hist(dcm.pixel_array.ravel(), bins = 256)
plt.show()

Note the peak at zero, which corresponds to background pixels. From this distribution, the mean intensity looks like it is probably around 140, but let's find out for sure:

## Step 4: 
Find the mean and standard deviation of the image's intensity values, and standardize the image

In [7]:
mean_intensity = np.mean(dcm.pixel_array)
mean_intensity

123.25588417053223

In [8]:
std_intensity = np.std(dcm.pixel_array)
std_intensity

57.47256019573095
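The histogram suggested a mean near 140, while the computed mean is about 123.26. One possible contributor is the background peak at zero, which pulls the overall mean down. The sketch below illustrates that effect on a synthetic image only; the array, its foreground distribution, and the 15% background share are all assumptions, not values from the DICOM file.

```python
import numpy as np

# Synthetic image (assumption): foreground values near 140, never exactly zero
rng = np.random.default_rng(0)
img = rng.normal(140, 30, size=(100, 100)).clip(1, 255).astype(np.uint8)

# Set 15% of the pixels to zero to mimic a background region (assumption)
img[:, :15] = 0

overall_mean = img.mean()              # includes the zero background
foreground_mean = img[img > 0].mean()  # non-zero pixels only

print(f"overall mean:    {overall_mean:.1f}")
print(f"foreground mean: {foreground_mean:.1f}")
```

On the real image, the same comparison could be made by replacing `img` with `dcm.pixel_array`.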

In [9]:
new_img = dcm.pixel_array.copy()
new_img = (new_img - mean_intensity)/std_intensity
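The standardization above can be wrapped in a small helper and checked numerically. This is a minimal sketch on a synthetic array: `standardize` is a hypothetical helper name that does not appear in the notebook, and the random image only stands in for `dcm.pixel_array`.

```python
import numpy as np

def standardize(img):
    """Return a float copy of img with zero mean and unit standard deviation."""
    img = img.astype(np.float64)  # avoid integer arithmetic on uint8/uint16 pixel data
    return (img - img.mean()) / img.std()

# Synthetic stand-in for dcm.pixel_array (assumption: 8-bit grayscale values)
rng = np.random.default_rng(0)
fake_img = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)

z = standardize(fake_img)
print(z.mean(), z.std())  # close to 0 and 1, up to floating-point error
```

After this transformation, each pixel value reads as "how many standard deviations above or below the image mean".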

## Step 5: 
Re-plot a histogram of the normalized intensity values

In [10]:
plt.figure(figsize=(5,5))
plt.hist(new_img.ravel(), bins = 256)
plt.show()

Notice how the histogram has the same shape, but now it's centered around 0 and its x-axis is in units of standard deviations. This is a key step in image pre-processing when we prepare imaging data for machine learning. 

## Step 6: 
Use the coordinates in the dataframe, which give the starting x and y values and the width and height of the mass, to visualize only the mass using plt.imshow

In [11]:
bbox

Unnamed: 0.1,Unnamed: 0,Image Index,Finding Label,Bbox [x,y,w,h]
0,583,dicom_00023075_033.dcm,Mass,239.502222,535.077934,72.817778,65.991111
1,584,dicom_00029579_005.dcm,Mass,609.28,189.19349,73.955556,71.68
2,585,dicom_00013659_019.dcm,Mass,559.217778,167.575712,102.4,136.533333


In [13]:
plt.imshow(dcm.pixel_array[535:(535+66),240:(240+73)],cmap='gray')
plt.show()
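The slice bounds in the cell above were rounded by hand from the first row of `bbox`. A sketch of deriving them from the box values follows. `crop_bbox` is a hypothetical helper, rounding to the nearest integer is an assumption chosen because it reproduces the hand-typed bounds, and the zero-filled 1024x1024 array only stands in for `dcm.pixel_array`.

```python
import numpy as np

def crop_bbox(img, x, y, w, h):
    """Crop img to a box given as top-left (x, y) plus width and height.

    NumPy images are indexed [row, column], so y comes first in the slice.
    """
    x0, y0 = int(round(x)), int(round(y))
    w, h = int(round(w)), int(round(h))
    return img[y0:y0 + h, x0:x0 + w]

# Synthetic image standing in for dcm.pixel_array (size is an assumption)
img = np.zeros((1024, 1024), dtype=np.uint8)

# Box values copied from row 0 of the bbox table
patch = crop_bbox(img, 239.502222, 535.077934, 72.817778, 65.991111)
print(patch.shape)  # (66, 73): 66 rows (height) by 73 columns (width)
```

The easy mistake here is swapping the axes: the box lists x before y, but the array slice takes rows (y) before columns (x).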

## Step 7: 
Plot a histogram of the normalized intensity values of the mass

In [14]:
plt.figure(figsize=(5,5))
plt.hist(new_img[535:(535+66),240:(240+73)].ravel(), bins = 256,color='red')
plt.show()

What does this tell us? It tells us that the intensity values of the mass are higher than the image mean, but mostly fall within a single standard deviation of it. This suggests that a simple thresholding mechanism would probably _not_ be appropriate for identifying tumors in an image, because the mass's intensity values are not _that_ different from the rest of the image. 
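One way to make "mostly fall within a single standard deviation" concrete is to count the fraction of standardized mass pixels whose absolute value is at most 1. The sketch below uses synthetic data rather than the notebook's image, so the printed fraction says nothing about the real mass; `fraction_within_one_std` is a hypothetical helper.

```python
import numpy as np

def fraction_within_one_std(z_patch):
    """Fraction of standardized values z with |z| <= 1."""
    return float(np.mean(np.abs(z_patch) <= 1.0))

# Synthetic standardized image (assumption; stands in for new_img)
rng = np.random.default_rng(0)
z_img = rng.standard_normal((1024, 1024))

# Same slice as the mass region used in the cells above
z_mass = z_img[535:535 + 66, 240:240 + 73]

frac = fraction_within_one_std(z_mass)
print(f"{frac:.2f} of patch pixels lie within one standard deviation of the mean")
```

To run this on the real data, pass `new_img[535:(535+66),240:(240+73)]` to the helper instead of `z_mass`.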