<a href="https://colab.research.google.com/github/retico/SSFM_2021/blob/main/Demo3_anonymize.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%matplotlib inline

# Reading the dataset from Google Drive
Before running the next cell, make sure you have added the shared folder to your Google Drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
!ls "/content/drive/My Drive/cmepda_medphys_dataset"

In [None]:
DATASETS = "/content/drive/My Drive/cmepda_medphys_dataset"

## Anonymize a single file

In [None]:
!pip install pydicom

In [None]:
import pydicom

In [None]:
filename = DATASETS + "/IMAGES/DICOM_Examples/Brain_MRI/IM67_1slice.dcm"
dataset = pydicom.dcmread(filename)

In [None]:
elements = ['PatientID',
                 'PatientBirthDate']
for element in elements:
    print(dataset.data_element(element))

In [None]:
dataset

We can define callback functions that act on all the data elements with a given
value representation (VR), for example all person names inside the dataset.

In [None]:
def person_names_callback(dataset, data_element):
    # VR = value representation, PN = person name
    if data_element.VR == "PN":
        data_element.value = "anonymous"

def scanner_info_callback(dataset, data_element):
    # LO = long string; this also matches tags such as PatientID
    if data_element.VR == "LO":
        data_element.value = "scanner info"

We can use these callback functions to iterate through the dataset and replace
person names as well as other tags, such as the patient ID.

This can be achieved by means of the `walk` method, which iterates over the data elements of the *FileDataset* object:

In [None]:
dataset.walk(person_names_callback)
dataset.walk(scanner_info_callback)

or, equivalently, as:

In [None]:
callbacks = [person_names_callback, scanner_info_callback]
for callback in callbacks:
    dataset.walk(callback)

pydicom can remove all private tags with the `remove_private_tags` method:

In [None]:
dataset.remove_private_tags()

Optional data elements can be easily deleted using `del` or `delattr`.



In [None]:
if 'OtherPatientIDs' in dataset:
    delattr(dataset, 'OtherPatientIDs')

if 'OtherPatientIDsSequence' in dataset:
    del dataset.OtherPatientIDsSequence

Data can also be modified via assignments:

In [None]:
dataset.OperatorsName = 'Lucio Verdi'


# Anonymize DICOM data


This example is a starting point to anonymize DICOM data.

It shows how to read the data, replace tags (person names, patient ID),
optionally remove curves and private tags, and write the result to a new file.


# Anonymizing the birthdate

Let's try to set the birth date to the first day of the month

In [None]:
import datetime
import time

In [None]:
date = '20000122'

In [None]:
format_ = "%Y%m%d"
time_struct = time.strptime(date, format_)
time_struct

In [None]:
birth_date = datetime.datetime(*time_struct[:3])
birth_date

`datetime.datetime` objects are immutable, so assigning to one of their attributes fails:

In [None]:
birth_date.day = 1  # raises AttributeError

In [None]:
new_date = birth_date.replace(day=1, month=5)
new_date

In [None]:
new_date.strftime(format_)

In [None]:
def anonymize_day(date_str, format_="%Y%m%d", day=1):
    # Parse the date string, replace the day and format it back.
    time_struct = time.strptime(date_str, format_)
    date = datetime.datetime(*time_struct[:3])
    new_date = date.replace(day=day)
    return new_date.strftime(format_)


In [None]:
tag = 'PatientBirthDate'
if tag in dataset:
    date_str = dataset.data_element(tag).value
    dataset.data_element(tag).value = anonymize_day(date_str, day=5)
dataset.PatientBirthDate
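
The same result can be obtained with `datetime.datetime.strptime` alone, without the `time` module. The following is a minimal sketch; the function name `first_of_month` is hypothetical, and returning an empty string unchanged is a choice of this sketch (the tag may be present but empty), not something the cells above do.

```python
import datetime


def first_of_month(date_str, format_="%Y%m%d"):
    """Return date_str with the day set to 1, keeping year and month."""
    if not date_str:
        # Nothing to anonymize: leave an empty value as it is.
        return date_str
    date = datetime.datetime.strptime(date_str, format_)
    return date.replace(day=1).strftime(format_)


print(first_of_month("20000122"))
```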

Finally, we can save the anonymized dataset to a new file:



In [None]:
output_filename = 'IM67_orig_anon.dcm'
dataset.save_as(output_filename)

## Anonymize a folder

In [None]:
import glob
import os

In [None]:
DIR_NAME = DATASETS + "/IMAGES/DICOM_Examples/Breast_Mammography_Case2/"
PATTERN = '*.dcm'

In [None]:
!ls -R '/content/drive/My Drive/cmepda_medphys_dataset/IMAGES/DICOM_Examples/Breast_Mammography_Case2/'

Here we define a generator that yields the names of the DICOM files, and an anonymization function.

In [None]:
import fnmatch

def file_list_generator(dir_name, pattern):
    # Walk dir_name recursively and yield the files matching pattern.
    for path, _subfolders, files in os.walk(dir_name):
        for file in files:
            if fnmatch.fnmatch(file, pattern):
                yield os.path.join(path, file)
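
To check the generator without access to the Drive folder, the sketch below builds a throwaway directory tree with `tempfile` and lists the matching files. The file names are invented for illustration, and the generator is repeated (using the standard `fnmatch` module) so the cell is self-contained.

```python
import fnmatch
import os
import tempfile


def file_list_generator(dir_name, pattern):
    # Walk dir_name recursively and yield paths whose file name matches pattern.
    for path, _subfolders, files in os.walk(dir_name):
        for file in files:
            if fnmatch.fnmatch(file, pattern):
                yield os.path.join(path, file)


with tempfile.TemporaryDirectory() as root:
    # Made-up layout: two .dcm files in different folders and one text file.
    os.makedirs(os.path.join(root, "series1"))
    for name in ("a.dcm", os.path.join("series1", "b.dcm"), "notes.txt"):
        open(os.path.join(root, name), "w").close()

    # Collect paths relative to the root so the result is easy to read.
    found = sorted(os.path.relpath(p, root)
                   for p in file_list_generator(root, "*.dcm"))
    print(found)
```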

In [None]:
def anonymize_file(fname):
    dataset = pydicom.dcmread(fname)
    callbacks = [person_names_callback, scanner_info_callback]
    for callback in callbacks:
        dataset.walk(callback)
    output_filename = fname.replace('.dcm', '_anonym.dcm')
    output_filename = os.path.basename(output_filename)
    output_filename = os.path.join('/content', output_filename)
    dataset.save_as(output_filename)
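
Because `anonymize_file` keeps only the base name, two input files with the same name in different subfolders would overwrite each other in `/content`. One possible way around this, sketched below, is to mirror the input folder structure under the output folder. The helper `anonymized_path`, its `suffix` parameter and the example paths are hypothetical and not part of the original notebook.

```python
import os


def anonymized_path(fname, src_root, dst_root, suffix="_anonym"):
    """Map an input file under src_root to an output path under dst_root."""
    rel = os.path.relpath(fname, src_root)   # path relative to the input folder
    stem, ext = os.path.splitext(rel)        # split off the extension only
    return os.path.join(dst_root, stem + suffix + ext)


# Example paths are invented for illustration.
print(anonymized_path("/data/case2/left/im1.dcm", "/data/case2", "/content/anon"))
print(anonymized_path("/data/case2/right/im1.dcm", "/data/case2", "/content/anon"))
```

Before saving to such a path, the target folder would need to exist, for instance via `os.makedirs(os.path.dirname(path), exist_ok=True)`.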


In [None]:
import multiprocessing
pool = multiprocessing.Pool()

The `map` method applies the function *anonymize_file* to each file name yielded by the generator *file_list_generator*.

In [None]:
pool.map(anonymize_file, file_list_generator(DIR_NAME, PATTERN))

In [None]:
os.listdir('/content')