# HIPAA Privacy Rule-based De-identification on DICOM Dataset

The HIPAA Privacy Rule provides two methods for de-identification: the "Safe Harbor" method and the "Expert Determination" method. The Safe Harbor method is the more straightforward of the two and involves removing or redacting 18 specific types of identifiers from the data.

Here, we will focus on the Safe Harbor method, which includes removing or redacting identifiers such as names, geographic subdivisions smaller than a state, dates directly related to an individual, phone numbers, email addresses, and more.

After de-identification, the DICOM file is updated, uploaded to the destination storage, and evaluated with AWS services: Amazon Rekognition, Amazon Comprehend, and Amazon Comprehend Medical.
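To make a few of the Safe Harbor rules concrete, here is a minimal sketch of how dates, ages, and ZIP codes can be generalized. The function names and the `populous_prefixes` argument are assumptions for illustration and are not part of `med_img_de_id_class`. The sketch assumes dates arrive in the DICOM `DA` form (`YYYYMMDD`) and that the caller supplies the set of three-digit ZIP prefixes covering more than 20,000 people.

```python
import re


def generalize_date(da_value: str) -> str:
    """Keep only the year of a DICOM DA value (YYYYMMDD); return '' if malformed."""
    match = re.fullmatch(r"(\d{4})\d{4}", da_value.strip())
    return match.group(1) if match else ""


def generalize_age(age_years: int) -> str:
    """Report ages above 89 as a single '90+' category."""
    return "90+" if age_years > 89 else str(age_years)


def generalize_zip(zip_code: str, populous_prefixes: set) -> str:
    """Keep the first three ZIP digits only for populous areas, else use '000'."""
    prefix = zip_code.strip()[:3]
    return prefix if prefix in populous_prefixes else "000"


print(generalize_date("19630412"))        # 1963
print(generalize_age(93))                 # 90+
print(generalize_zip("20850", {"208"}))   # 208
```

Returning a bare year is not a valid `DA` value, so a real pipeline would still have to decide how to store the generalized date in the DICOM file.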

## Setup

Start by setting the environment variables used to de-identify the DICOM files:
1) Set the local path of the DICOM image folder.
2) Set the source and destination S3 buckets.
3) Set the source and destination prefixes for the DICOM files.
4) Set the AWS user profile if it differs from the default.

In [1]:
from med_img_de_id_class import ProcessMedImage
# setup environment
LOC_DICOM_FOLDER = '/Users/gup2/Documents/AI/Dicom_files/manifest-1617826555824/Pseudo-PHI-DICOM-Data'
LOC_DE_ID_DICOM_FOLDER = '../images/med_de_id_img/Pseudo-PHI-DICOM-Data'
SOURCE_BUCKET = "crdcdh-test-submission"
DESTINATION_BUCKET = "crdc-hub-dev"
SOURCE_PREFIX = "dicom-images/"
DESTINATION_PREFIX = "de-id-dicom-images/"
EVAL_BUCKET = "crdc-hub-dev"
EVAL_PREFIX = "eval_de-id-dicom-images/"

processor = ProcessMedImage()

sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /Users/gup2/Library/Application Support/sagemaker/config.yaml
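The settings above are hard-coded in the cell. One possible alternative, sketched here under assumptions, is to read them from real environment variables so that paths and bucket names stay out of the notebook. The `DeIdConfig` class, the `load_config` helper, the environment variable names, and the fallback values are all hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DeIdConfig:
    """Hypothetical container for the notebook's settings."""
    loc_dicom_folder: str
    source_bucket: str
    destination_bucket: str
    source_prefix: str
    destination_prefix: str
    aws_profile: str


def load_config(env=None) -> DeIdConfig:
    """Build the config from a mapping (os.environ by default) with fallbacks."""
    env = os.environ if env is None else env
    return DeIdConfig(
        loc_dicom_folder=env.get("LOC_DICOM_FOLDER", "./dicom"),
        source_bucket=env.get("SOURCE_BUCKET", "my-source-bucket"),
        destination_bucket=env.get("DESTINATION_BUCKET", "my-destination-bucket"),
        source_prefix=env.get("SOURCE_PREFIX", "dicom-images/"),
        destination_prefix=env.get("DESTINATION_PREFIX", "de-id-dicom-images/"),
        aws_profile=env.get("AWS_PROFILE", "default"),
    )


config = load_config({"SOURCE_BUCKET": "example-bucket"})
print(config.source_bucket, config.aws_profile)
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test.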


## Loop through all DICOM files under the local DICOM folder

In [8]:
import glob, os
FILE_NAME = 'file_name'
FILE_PATH = 'file_path'
FILE_PREFIX = 'prefix'
dicom_list = []
dicom_files = glob.glob('{}/**/*.dcm'.format(LOC_DICOM_FOLDER), recursive=True)

for filepath in dicom_files:
    filename = os.path.basename(filepath)
    # S3 prefix = source prefix + name of the folder that contains the file
    parent_folder = os.path.basename(os.path.dirname(filepath))
    prefix = SOURCE_PREFIX + parent_folder
    dicom_list.append({FILE_NAME: filename, FILE_PATH: filepath, FILE_PREFIX: prefix})

    # print('filename: {}, filepath: {}, prefix: {}'.format(filename, filepath, prefix))
print(f'Found {len(dicom_list)} DICOM files under {LOC_DICOM_FOLDER}')

Found 1693 DICOM files under /Users/gup2/Documents/AI/Dicom_files/manifest-1617826555824/Pseudo-PHI-DICOM-Data
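S3 keys always use forward slashes, whereas local paths depend on the operating system. As a small sketch, the key for a file can be derived with `pathlib` so that the result is the same on every platform. The `build_s3_key` helper and the example path are hypothetical, and the helper returns the full object key (prefix, parent folder, file name) rather than the separate prefix stored in `dicom_list`.

```python
from pathlib import Path, PurePosixPath


def build_s3_key(file_path: str, source_prefix: str) -> str:
    """Return '<source_prefix>/<parent folder>/<file name>' with '/' separators."""
    path = Path(file_path)
    return str(PurePosixPath(source_prefix) / path.parent.name / path.name)


print(build_s3_key("/data/Pseudo-PHI-DICOM-Data/series-01/1-1.dcm", "dicom-images/"))
# dicom-images/series-01/1-1.dcm
```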


## De-identification of DICOM files based on the HIPAA Privacy Rule

In [None]:
for file in dicom_list:
    # 1 parse dicom file

## De-identification of DICOM metadata

In [None]:
processor.de_identify_dicom(dicom_dataset)
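The internals of `de_identify_dicom` are not shown in this notebook. As a rough illustration of what tag-level redaction can look like, the sketch below works on a plain dictionary keyed by standard DICOM keywords rather than on a real dataset object. The `IDENTIFYING_KEYWORDS` set, the `redact_metadata` helper, and the `REDACTED` placeholder are assumptions, and the keyword list is far from a complete Safe Harbor list.

```python
# Illustrative subset of DICOM keywords that commonly carry identifiers
IDENTIFYING_KEYWORDS = {
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "InstitutionName",
    "ReferringPhysicianName",
    "AccessionNumber",
}


def redact_metadata(metadata: dict, keywords=IDENTIFYING_KEYWORDS,
                    placeholder: str = "REDACTED") -> dict:
    """Return a copy in which non-empty identifying values are replaced."""
    return {
        key: (placeholder if key in keywords and value else value)
        for key, value in metadata.items()
    }


sample = {"PatientName": "Doe^Jane", "Modality": "CT", "PatientID": ""}
print(redact_metadata(sample))
# {'PatientName': 'REDACTED', 'Modality': 'CT', 'PatientID': ''}
```

Returning a copy leaves the original metadata available for a before/after comparison.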

## Draw DICOM image before de-identification

In [None]:
# show med image before de-identification
processor.draw_img(pixel_data)

## De-identification of DICOM pixel data

In [None]:
import os
from PIL import Image

# S3 key of the PNG rendition of the DICOM image
src_png_key = src_key.replace('.dcm', '.png')
# Local path where the redacted PNG is written
local_de_id_png = os.path.join(LOC_DE_ID_DICOM_FOLDER, local_img_file.replace('.dcm', '.png'))
id_text_detected = processor.detect_id_in_img(SOURCE_BUCKET, src_png_key)
if id_text_detected:
    print(f'Sensitive text detected in {src_png_key}')
    print(id_text_detected)
    processor.redact_id_in_image(SOURCE_BUCKET, src_png_key, id_text_detected, local_de_id_png)
    # show redacted image
    print('Showing redacted image')
    img = Image.open(local_de_id_png)
    processor.draw_img(img)
else:
    print(f'No sensitive text detected in {src_png_key}')
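How `redact_id_in_image` masks the detected text is not shown here. Amazon Rekognition's text detection reports each bounding box as ratios of the image size (`Left`, `Top`, `Width`, `Height`), so a redaction step first has to convert those ratios to pixel coordinates. The sketch below shows that conversion and a simple fill on a nested list of pixel values. Both helpers are hypothetical, and a real pipeline would more likely operate on a PIL image or a NumPy array.

```python
import math


def box_to_pixels(box: dict, width: int, height: int) -> tuple:
    """Convert a ratio-based bounding box to (left, top, right, bottom) pixels."""
    left = max(0, math.floor(box["Left"] * width))
    top = max(0, math.floor(box["Top"] * height))
    right = min(width, math.ceil((box["Left"] + box["Width"]) * width))
    bottom = min(height, math.ceil((box["Top"] + box["Height"]) * height))
    return left, top, right, bottom


def mask_region(pixels: list, region: tuple, fill: int = 0) -> list:
    """Overwrite the given region of a row-major pixel grid in place."""
    left, top, right, bottom = region
    for row in pixels[top:bottom]:
        row[left:right] = [fill] * (right - left)
    return pixels


image = [[255] * 4 for _ in range(4)]
region = box_to_pixels({"Left": 0.25, "Top": 0.25, "Width": 0.5, "Height": 0.5}, 4, 4)
print(region)  # (1, 1, 3, 3)
mask_region(image, region)
```

Rounding the left and top edges down and the right and bottom edges up errs on the side of covering slightly more than the reported box.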


## Update the DICOM file with redacted metadata and blurred sensitive text in the image

In [None]:
local_de_id_dicom = local_de_id_png.replace('.png', '.dcm')
if id_text_detected:
    processor.convert_back_dicom(local_de_id_png, dicom_dataset, local_de_id_dicom)

## Upload the redacted DICOM to the destination S3 bucket for evaluation with Amazon Comprehend and Comprehend Medical

In [None]:
import os

local_img_file = 'MartinChad-1-1.dcm'

# check after redaction
local_img_path = os.path.join(LOC_DE_ID_DICOM_FOLDER, local_img_file)
src_key = DESTINATION_PREFIX + local_img_file
dist_key= EVAL_PREFIX + 'lung-1-1.dcm'
result, dicom_dataset, pixel_data = processor.upload_dicom_file(DESTINATION_BUCKET, src_key, local_img_path)

# check before redaction
# local_img_path = os.path.join(LOC_DICOM_FOLDER, local_img_file)
# src_key= SOURCE_PREFIX + local_img_file
# dist_key= SOURCE_PREFIX + 'lung-1-1.dcm'
# result, dicom_dataset, pixel_data = processor.upload_dicom_file(SOURCE_BUCKET, src_key, local_img_path)

## Evaluate Redacted DICOM Metadata

In [None]:
tags, ids = processor.detect_id_in_tags(dicom_dataset)
if ids and len(ids) > 0:
    print("Found PII/PHI in redacted DICOM: ", ids)
else:
    print("No PII/PHI found in redacted DICOM")
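The check above relies on `detect_id_in_tags`, whose implementation is not shown. Amazon Comprehend's PII detection returns entities with a `Type`, a `Score`, and character offsets (`BeginOffset`, `EndOffset`) into the submitted text, so an evaluation step typically filters by score and maps the offsets back to the text. The sketch below does this on a hand-written response. The `flagged_spans` helper, the 0.8 threshold, and the sample entities are assumptions.

```python
def flagged_spans(text: str, entities: list, min_score: float = 0.8) -> list:
    """Return (type, matched text) for each entity at or above the score threshold."""
    return [
        (entity["Type"], text[entity["BeginOffset"]:entity["EndOffset"]])
        for entity in entities
        if entity["Score"] >= min_score
    ]


text = "Patient Jane Doe seen 04/12/1963"
# Hand-written stand-in for the 'Entities' list of a detection response
entities = [
    {"Type": "NAME", "Score": 0.99, "BeginOffset": 8, "EndOffset": 16},
    {"Type": "DATE_TIME", "Score": 0.55, "BeginOffset": 22, "EndOffset": 32},
]
print(flagged_spans(text, entities))  # [('NAME', 'Jane Doe')]
```

For a de-identification check, a lower threshold may be preferable, since a missed identifier usually costs more than a false alarm.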

## Evaluate Redacted DICOM Pixel Data

In [None]:
import os
from PIL import Image

src_png_key = src_key.replace('.dcm', '.png')
local_img_file = src_key.split('/')[-1]
# check after redaction
local_de_id_png = os.path.join(LOC_DE_ID_DICOM_FOLDER, local_img_file.replace('.dcm', '.png'))
id_text_detected = processor.detect_id_in_img(DESTINATION_BUCKET, src_png_key, True)
# check before redaction
# local_de_id_png = os.path.join(LOC_DICOM_FOLDER, local_img_file.replace('.dcm', '.png'))
# id_text_detected = processor.detect_id_in_img(SOURCE_BUCKET, src_png_key, True)
if id_text_detected:
    print(f'Sensitive text detected in {src_png_key}')
    print(id_text_detected)
    # redact the same object that was scanned above
    processor.redact_id_in_image(DESTINATION_BUCKET, src_png_key, id_text_detected, local_de_id_png)
    # show redacted image
    print('Showing redacted image')
    img = Image.open(local_de_id_png)
    processor.draw_img(img)
else:
    print(f'No sensitive text detected in {src_png_key}')

## Self-learning: update the rules for detecting PHI/PII in DICOM files

In [None]:
if ids and len(ids) > 0:
    processor.update_rules_in_configs(processor.dicom_tags, processor.sensitive_words, '../configs/de-id/output.yaml')
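The format written by `update_rules_in_configs` is not shown in this notebook. As a hedged sketch of the underlying idea, newly flagged tags and words can be merged into the existing rule sets without duplicates before they are written back to a config file. The `merge_rules` helper and the `dicom_tags` / `sensitive_words` dictionary keys are assumptions that mirror the attribute names used above.

```python
def merge_rules(existing: dict, new_tags, new_words) -> dict:
    """Merge newly detected tags and words into the rule sets, de-duplicated and sorted."""
    tags = set(existing.get("dicom_tags", [])) | set(new_tags)
    words = {w.lower() for w in existing.get("sensitive_words", [])}
    words |= {w.lower() for w in new_words}
    return {"dicom_tags": sorted(tags), "sensitive_words": sorted(words)}


rules = {"dicom_tags": ["PatientName"], "sensitive_words": ["martin"]}
print(merge_rules(rules, ["PatientID", "PatientName"], ["Martin", "Chad"]))
# {'dicom_tags': ['PatientID', 'PatientName'], 'sensitive_words': ['chad', 'martin']}
```

Sorting the merged lists keeps the serialized config stable between runs, which makes changes easier to review.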