# Introduction to the DICOM format
In this notebook, you'll learn about the DICOM file format and how to read, modify and anonymize DICOM files.

This demo is a Jupyter notebook, i.e., it is intended to be run step by step.

Author: Eric Einspänner

First version: 6th of July 2023


Copyright 2023 Clinic of Neuroradiology, Magdeburg, Germany

License: Apache-2.0

## Table of contents
0. [File Formats](#Neuroimaging-file-formats)
1. [Initial set-up](#Initial-set-up)
2. [Print DICOM header](#Print-DICOM-Header)
3. [Core elements](#Core-elements)
    - [PyDICOM dataset](#PyDicom-DataSet)
    - [Exercise 1](#Exercise)
4. [Print specific tags](#Print-specific-tags)
5. [Methods](#Methods-for-a-pydicom-dataset)
    - [Keys](#.keys()-Method)
    - [Values](#.values()-Method)
    - [Elements](#.elements()-Method)
    - [Group dataset](#.group_dataset()-Method)
    - [Dir](#.dir()-Method)
6. [Modify tags](#Modify-Tags)
7. [Deleting tags](#Deleting-Elements)
    - [Exercise 2](#Exercise)
8. [Anonymize a DICOM header](#Anonymize-a-DICOM-Header)
9. [Writing](#Writing)
10. [Attributes for pydicom dataset](#Attributes-for-PyDicom-DataSet)
11. [Closing remark](#Closing-Remark)

# Neuroimaging file formats
| Format Name | File Extension | Origin                                         |
|-------------|----------------|------------------------------------------------|
| DICOM       | .dcm           | ACR/NEMA Consortium                            |
| Analyze     | .img/.hdr      | Analyze Software, Mayo Clinic                  |
| NIfTI       | .nii           | Neuroimaging Informatics Technology Initiative |
| MINC        | .mnc           | Montreal Neurological Institute                |

MRI scanners initially store images in the DICOM format. The images can then be converted to one of the other formats to make working with the data easier.

## Initial set-up
To read a DICOM file, you can use the `pydicom.dcmread()` function, which returns a `Dataset` object containing the data from the DICOM file.

In [None]:
# Make sure figures appear inline and animations work
# Edit this to "%matplotlib notebook" when using the "classic" Jupyter notebook interface
%matplotlib widget

import numpy
import matplotlib.pyplot as plt

In [None]:
import pydicom
from pydicom.data import get_testdata_file

# load test file
dcm_data = get_testdata_file('MR_small.dcm')

# read a DICOM file
dcm = pydicom.dcmread(dcm_data)

## Print DICOM Header

In [None]:
print(dcm)

The output follows a clear pattern: the DICOM attributes are printed row by row. Each row shows the attribute's unique tag followed by its other core elements.

## Core elements
`dcmread()` returns a `Dataset`, which behaves like a Python dictionary (`{}`). This Dataset contains keys and values as follows:

- Keys: the DICOM tags of the attributes stored in the DICOM file you are reading, for example:
    - (0x0010, 0x0010) PatientName attribute.
    - (0x0028, 0x0010) Rows attribute.
    - (0x7fe0, 0x0010) PixelData attribute.
    - Each tag consists of two hexadecimal numbers: the first refers to the group, the second to a specific element within that group. Hence, many attributes share the same first number (group).

- Values: the values of this dictionary generally contain the following:
    - Tag: the element’s tag, for example (0028, 0030).
    - Keyword: describes what the attribute refers to. The keyword of the tag (0028, 0030) is `PixelSpacing`; when the header is printed, pydicom shows its name “Pixel Spacing”.
    - VR: a two-character code for the Value Representation of the element, which describes the data type and format of the attribute value. The VR of the tag (0028, 0030) is “DS”, Decimal String. You can look up every VR and how pydicom represents it with Python types in the [pydicom documentation on element value types](https://pydicom.github.io/pydicom/stable/guides/element_value_types.html).
    - Value: the actual value of the element. It can be an integer, a string, a list, or even a Sequence, which contains nested datasets of attributes. The value of the tag (0028, 0030) is a list of two decimal numbers giving the spacing between adjacent rows and between adjacent columns, respectively, in mm.

![Alt text](img/pydicom_overview.png?raw=true "The output of 'dcmread()' function")

### PyDicom DataSet
A pydicom `Dataset` is a mutable mapping of `DataElement` objects. Each `DataElement` (a value of the dictionary) is identified by a unique tag (a key of the dictionary). For example, the “PatientName” attribute corresponds to the tag (0x0010, 0x0010) in the DICOM standard, which identifies the patient’s name data element.

![Alt text](img/pydicom_dataset.png?raw=true "The contents of PyDicom DataSet class")

### Exercise
Explore the DICOM file and answer the following questions:
- Which keys contain information that defines the image size (matrix size)?
- What is the largest pixel value? Which VR does it have?
- On which day was the study recorded? Which VR was used here?

## Print specific tags
You can access specific DICOM attributes in several ways, for example:

In [None]:
# Extract the patient's name.
patient_name = dcm.PatientName
print(patient_name)

# Extract the patient's name using its unique DICOM tag (0010, 0010)
patient_value = dcm[0x0010, 0x0010]
print(patient_value)

Can you see the difference between the two outputs?

## Methods for a pydicom dataset

### .keys() Method
`.keys()` returns the keys of the Dataset, i.e., the DICOM tags it contains. This method can be helpful when joining metadata from multiple DICOM files that share common attributes.

In [None]:
# Extract the keys, the DICOM tags, that are in a DICOM file
dcm.keys()

### .values() Method
`.values()` returns the values of the Dataset, i.e., its data elements. The output is bulky and hard to read in this form, but the method can be useful when you need to iterate over all values.

In [None]:
# Extract the values, the DICOM attributes, that are in a DICOM file
dcm.values()

### .elements() Method
`.elements()` yields the top-level elements of the Dataset. This is useful when you don’t need the DICOM attributes nested inside any Sequences in the files you’re working with. If the file contains Sequences, notice how they are represented in the output of the cell below.

In [None]:
# Extract the top-level elements of the Dataset Class
[*dcm.elements()]

### .group_dataset() Method
As mentioned above, many attributes share the same first number (the group). Attributes of the same group usually describe related information. For example, group 0x0010 contains patient-related attributes, and group 0x0028 contains Image Pixel attributes. Sometimes it’s helpful to see all attributes of a specific group. The `.group_dataset()` method returns a Dataset containing only the elements of a given group.

In [None]:
# Extract the attributes of group 0x0028 (Image Pixel)
dcm.group_dataset(0x0028)

### .dir() Method
`.dir()` returns an alphabetical list of the element keywords in the Dataset. It’s a great way to get a first overview of the metadata you’re dealing with.

In [None]:
# An alphabetical list of the element keywords in the DICOM DataSet.
dcm.dir()

In [None]:
# Extract all keywords that contain "Pixel"
dcm.dir('Pixel')

## Modify Tags
You can modify an element either by assigning a new value to its keyword or by retrieving the element and setting its `.value`:

In [None]:
# use the keyword
dcm.PatientName = 'Mustermann^Max'
print(dcm[0x0010, 0x0010])

# alternatively, retrieve the element by its tag and set its value
elem = dcm[0x0010, 0x0010]
elem.value = 'Musterfrau^Max'
print(elem)

Multi-valued elements can be set using a list or modified using the list methods:

In [None]:
print(dcm.ImageType)
# replaces the 2nd element of the list
dcm.ImageType[1] = 'BLINDTEXT'
# inserts a new value at the 2nd position
dcm.ImageType.insert(1, '2.BLINDTEXT')
print(dcm.ImageType)

## Deleting Elements
Any element can be deleted with the `del` operator, using either its keyword or its tag:

In [None]:
# use the keyword
del dcm.WindowCenter

# check whether WindowCenter is still in the DICOM header
print('WindowCenter' in dcm)

### Exercise
Delete the Window Width using the correct element tag and then check whether the tag is still present.

In [None]:
# Write your code here (the solution is below)








In [None]:
### Solution
# use the element tag (here: Window Width)
del dcm[0x0028, 0x1051]

print('WindowWidth' in dcm)

## Anonymize a DICOM Header

In [32]:
# change the PatientID
dcm.PatientID = "Anonymous"

# change the PatientBirthDate
tag = "PatientBirthDate"
if tag in dcm:
    dcm.data_element(tag).value = "19000101"

We can define a callback function that finds all elements with the VR PN (Person Name) in the dataset and sets their values to 'anonymous'.

In [33]:
def person_names_callback(dataset, data_element):
    if data_element.VR == "PN":
        data_element.value = "anonymous"

# use the callback function to iterate through the dataset
dcm.walk(person_names_callback)

pydicom can remove all private tags with the `remove_private_tags()` method:

In [None]:
dcm.remove_private_tags()

## Writing
After changing the dataset, the final step is to write the modifications back to a file. This can be done with `save_as()`, which writes the dataset to the supplied path or file object:

In [30]:
# create a temporary file; the file is deleted as soon as it is closed
import tempfile
temp = tempfile.NamedTemporaryFile()

# save the modified DICOM dataset to the temporary file
dcm.save_as(temp)

## Attributes for PyDicom DataSet
The Dataset class also provides some useful attributes. The most important and commonly used one is `pixel_array`, which returns the image data as a NumPy array.

In [None]:
# Extract the image pixels
im = dcm.pixel_array
print(im)

Each value of this matrix represents one pixel of the image. Now, let's plot the image:

In [None]:
# Image representation
plt.imshow(im, cmap='gray')
plt.axis('off')
plt.title('Pixel Array')
plt.show()

## Closing Remark
pydicom is arguably the best package for working with DICOM files, since it is designed specifically for DICOM. It provides flexible options, especially when you need to work with DICOM metadata and not just pixel data. On the other hand, if you only need the pixel data, ImageIO is a lighter alternative, as it provides only the basic metadata needed to work with pixel data.