# "Medical Imaging Part1"
> "Reviewing fastai notebook 60_medical.imaging.ipynb"
- toc: true
- branch: master
- badges: true
- comments: true
- categories: [medical_imaging, dicom, fastai]

### What are DICOM images?

**DICOM** (**D**igital **I**maging and **CO**mmunications in **M**edicine) is the de facto standard that establishes rules allowing medical images (X-ray, MRI, CT) and associated information to be exchanged between imaging equipment from different vendors, computers, and hospitals. The DICOM format meets [health information exchange](https://www.himss.org/interoperability-and-health-information-exchange) (HIE) standards for transmission of health-related data among facilities, as well as HL7 standards, the messaging standards that enable clinical applications to exchange data.

<img src="images/Dicom_wf.png">

DICOM files typically have a `.dcm` extension and store data in separate **'tags'**, covering patient information as well as image/pixel data. A DICOM file consists of a header and image data sets packed into a single file. The information within the header is organized as a constant and standardized series of tags. By extracting data from these tags one can access important information about patient demographics, study parameters, etc.

Signed 16-bit DICOM images have values ranging from -32768 to 32767, while 8-bit greyscale images store values from 0 to 255. The value ranges in DICOM images are useful as they correlate with the [Hounsfield Scale](https://en.wikipedia.org/wiki/Hounsfield_scale), which is a quantitative scale for describing radiodensity.
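
The range of values a pixel can take follows from its bit depth and whether it is signed. A minimal sketch in plain Python illustrates the arithmetic; the helper name `int_range` is hypothetical and is not part of pydicom or fastai:

```python
def int_range(bits, signed):
    """Return the (min, max) integer values representable with `bits` bits."""
    if signed:
        # Two's complement has one more value on the negative side
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


print(int_range(16, signed=True))   # (-32768, 32767)
print(int_range(8, signed=False))   # (0, 255)
```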


<center>
    
    Parts of a DICOM
</center>

<img src="images/dicom_.png">

### Requirements

Requires installing `pydicom`

- `pip install pydicom`

and `scikit-image`

- `pip install scikit-image`

and `kornia`

- `pip install kornia`

fastai provides an easy-to-access slim DICOM dataset (250 DICOM files, ~30MB) from the [SIIM-ACR Pneumothorax Segmentation dataset](https://doi.org/10.1007/s10278-019-00299-9) for us to experiment with DICOM images. The file structure of the dataset is as follows:
<img src="images/dicom.png">

In [1]:
from fastai2.basics import *
from fastai2.callback.all import *
from fastai2.vision.all import *
from fastai2.medical.imaging import *

import pydicom

In [2]:
#Load the Data
pneumothorax_source = untar_data(URLs.SIIM_SMALL)

### Patching

##### get_dicom_files

Provides a convenient way of recursively loading .dcm images from a folder. By default no `folders` filter is applied, but you can restrict the search to specific folders if required.

In [3]:
#get dicom files
items = get_dicom_files(pneumothorax_source, recurse=True, folders='train')
items

(#250) [Path('C:/Users/avird/.fastai/data/siim_small/train/No Pneumothorax/000000.dcm'),Path('C:/Users/avird/.fastai/data/siim_small/train/No Pneumothorax/000002.dcm'),Path('C:/Users/avird/.fastai/data/siim_small/train/No Pneumothorax/000005.dcm'),Path('C:/Users/avird/.fastai/data/siim_small/train/No Pneumothorax/000006.dcm'),Path('C:/Users/avird/.fastai/data/siim_small/train/No Pneumothorax/000007.dcm'),Path('C:/Users/avird/.fastai/data/siim_small/train/No Pneumothorax/000008.dcm'),Path('C:/Users/avird/.fastai/data/siim_small/train/No Pneumothorax/000009.dcm'),Path('C:/Users/avird/.fastai/data/siim_small/train/No Pneumothorax/000011.dcm'),Path('C:/Users/avird/.fastai/data/siim_small/train/No Pneumothorax/000012.dcm'),Path('C:/Users/avird/.fastai/data/siim_small/train/No Pneumothorax/000014.dcm')...]
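
Conceptually, this kind of lookup is a filtered recursive directory walk. The sketch below uses only the standard library to illustrate the idea. `find_dicom_files` is a hypothetical helper, not fastai's implementation; it assumes `folders` names immediate sub-folders of the root and only matches the `.dcm` extension.

```python
from pathlib import Path


def find_dicom_files(root, folders=None):
    """Recursively collect .dcm files under `root`.

    If `folders` is given, only those immediate sub-folders are searched.
    """
    root = Path(root)
    if isinstance(folders, str):
        folders = [folders]  # accept a single folder name as well as a list
    search_roots = [root / f for f in folders] if folders else [root]
    files = []
    for start in search_roots:
        files.extend(p for p in start.rglob("*.dcm") if p.is_file())
    return sorted(files)
```

Calling `find_dicom_files(pneumothorax_source, folders='train')` would then mirror the cell above, under the assumptions stated.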

#### dcmread 

**Pydicom** is a Python package for parsing DICOM files that makes it easy to convert DICOM files into Pythonic structures for easier manipulation. Files are opened using `pydicom.dcmread`.

In [4]:
img = items[10]
dimg = dcmread(img)
type(dimg)

pydicom.dataset.FileDataset

You can now view all the information of the DICOM file. Explanation of each element is beyond the scope of this tutorial but [this](http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.6.3.html#sect_C.7.6.3.1.4) site has some excellent information about each of the entries.
Information is listed by the **DICOM tag** (eg: 0008, 0005) or **DICOM keyword** (eg: Specific Character Set)

In [5]:
dimg

(0008, 0005) Specific Character Set              CS: 'ISO_IR 100'
(0008, 0016) SOP Class UID                       UI: Secondary Capture Image Storage
(0008, 0018) SOP Instance UID                    UI: 1.2.276.0.7230010.3.1.4.8323329.6340.1517875197.696624
(0008, 0020) Study Date                          DA: '19010101'
(0008, 0030) Study Time                          TM: '000000.00'
(0008, 0050) Accession Number                    SH: ''
(0008, 0060) Modality                            CS: 'CR'
(0008, 0064) Conversion Type                     CS: 'WSD'
(0008, 0090) Referring Physician's Name          PN: ''
(0008, 103e) Series Description                  LO: 'view: AP'
(0010, 0010) Patient's Name                      PN: '13f40bdc-803d-4fe0-b008-21234c2be1c3'
(0010, 0020) Patient ID                          LO: '13f40bdc-803d-4fe0-b008-21234c2be1c3'
(0010, 0030) Patient's Birth Date                DA: ''
(0010, 0040) Patient's Sex                       CS: 'F'
(0010, 1010) Patient's

Some key pointers on the tag information above:
 - **Pixel Data** (7fe0, 0010) - This is where the raw pixel data is stored. The pixels of each image plane are encoded left to right, top to bottom, i.e., the upper left pixel (labeled 1,1) is encoded first
 - **Photometric Interpretation** (0028, 0004) - aka color space. In this case it is MONOCHROME2, where pixel data is represented as a single monochrome image plane and the minimum sample value is intended to be displayed as black [info](http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.6.3.html)
 - **Samples per Pixel** (0028, 0002) - This should be 1 as this image is monochrome. This value would be 3 if the color space was RGB, for example
 - **Bits Stored** (0028, 0101) - Number of bits stored for each pixel sample
 - **Pixel Representation** (0028, 0103) - can either be unsigned (0) or signed (1). The default is unsigned. This [Kaggle notebook](https://www.kaggle.com/jhoward/some-dicom-gotchas-to-be-aware-of-fastai) by Jeremy explains why BitsStored and PixelRepresentation are important
 - **Lossy Image Compression** (0028, 2110) - `00` means the image has not been subjected to lossy compression; `01` means it has
 - **Lossy Image Compression Method** (0028, 2114) - states the type of lossy compression used (in this case JPEG Lossy Compression)

Important tags not included in this dataset:

 - **Rescale Intercept** (0028, 1052) - The value b in relationship between stored values (SV) and the output units. Output units = m*SV + b.
 - **Rescale Slope** (0028, 1053) - m in the equation specified by Rescale Intercept (0028,1052).
 
The Rescale Intercept and Rescale Slope are applied to transform the pixel values of the image into values that are meaningful to the application. Calculating the new values usually follows a linear formula:
 - NewValue = (RawPixelValue * RescaleSlope) + RescaleIntercept
 
When the relationship is not linear, a LUT (lookup table) is used instead.
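
The linear case can be sketched with NumPy as follows. `apply_rescale` is a hypothetical helper, and the slope and intercept values are illustrative assumptions rather than values taken from this dataset (which does not include these tags):

```python
import numpy as np


def apply_rescale(raw, slope=1.0, intercept=0.0):
    """Map stored pixel values to output units: slope * raw + intercept."""
    # Promote to float first so the arithmetic cannot overflow a small integer dtype
    return np.asarray(raw, dtype=np.float64) * slope + intercept


# A slope of 1 and an intercept of -1024 are common for CT, but are
# assumptions here, not values read from the SIIM files.
raw = np.array([[0, 1024], [2048, 4095]], dtype=np.uint16)
print(apply_rescale(raw, slope=1.0, intercept=-1024.0).tolist())
# [[-1024.0, 0.0], [1024.0, 3071.0]]
```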

By default pydicom reads pixel data as the **raw bytes** found in the file, and `PixelData` is often not immediately useful because the data may be stored in a variety of different ways:
- The pixel values may be signed or unsigned integers, or floats
- There may be multiple image frames
- There may be multiple planes per frame (e.g. RGB) and the order of the pixels may be different

These are only a few examples; more information can be found on the [pydicom](https://pydicom.github.io/pydicom/stable/old/working_with_pixel_data.html#dataset-pixel-array) website.
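
To see why signedness matters, the sketch below interprets the same four bytes as two unsigned and then two signed 16-bit pixels. The bytes are made up for illustration and assume uncompressed little-endian data; compressed pixel data (such as JPEG) has to be decoded first and cannot be read this way.

```python
import numpy as np

# Four made-up bytes standing in for two uncompressed 16-bit little-endian pixels
raw = b"\xff\xff\x10\x00"

as_unsigned = np.frombuffer(raw, dtype="<u2")  # Pixel Representation = 0
as_signed = np.frombuffer(raw, dtype="<i2")    # Pixel Representation = 1

print(as_unsigned.tolist())  # [65535, 16]
print(as_signed.tolist())    # [-1, 16]
```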

In [43]:
dimg.PixelData[:200]

b'\xfe\xff\x00\xe0\x00\x00\x00\x00\xfe\xff\x00\xe0\xe0\xcd\x01\x00\xff\xd8\xff\xdb\x00C\x00\x03\x02\x02\x02\x02\x02\x03\x02\x02\x02\x03\x03\x03\x03\x04\x06\x04\x04\x04\x04\x04\x08\x06\x06\x05\x06\t\x08\n\n\t\x08\t\t\n\x0c\x0f\x0c\n\x0b\x0e\x0b\t\t\r\x11\r\x0e\x0f\x10\x10\x11\x10\n\x0c\x12\x13\x12\x10\x13\x0f\x10\x10\x10\xff\xc0\x00\x0b\x08\x04\x00\x04\x00\x01\x01\x11\x00\xff\xc4\x00\x1d\x00\x00\x02\x02\x03\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x04\x02\x05\x00\x01\x06\x07\x08\t\xff\xc4\x00]\x10\x00\x01\x04\x01\x03\x02\x05\x01\x04\x05\x05\n\x08\x0b\x05\t\x01\x00\x02\x03\x11\x04\x12!1\x05A\x06\x13"Qaq\x142\x81\x91\x07#B\xa1\xb1\x15R\xc1\xd1\xd2\x08\x16$3b\x92\x95\xb2\xb3\xe1%CSr\x82\x93\xa2'

Because of the complexity in interpreting `PixelData`, pydicom provides an easy way to get it in a convenient form: `pixel_array` which returns a `numpy.ndarray` containing the pixel data:

In [7]:
dimg.pixel_array, dimg.pixel_array.shape

(array([[  2,   6,   5, ...,   3,   3,   2],
        [  5,   9,   8, ...,   6,   5,   5],
        [  5,   9,   9, ...,   6,   5,   5],
        ...,
        [ 49,  85,  80, ..., 123, 121,  69],
        [ 54,  88,  81, ..., 118, 115,  70],
        [ 17,  48,  39, ...,  46,  52,  27]], dtype=uint8), (1024, 1024))