# HIPAA Privacy Rule-based De-identification on DICOM Dataset

HIPAA provides two methods for de-identification: the "Safe Harbor" method and the "Expert Determination" method. The Safe Harbor method is more straightforward and involves anonymizing/redacting 18 specific types of identifiers from the data.

Here, we will focus on the Safe Harbor method, which includes removing or redacting identifiers such as names, geographic subdivisions smaller than a state, dates directly related to an individual, phone numbers, email addresses, and more.
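
For reference, the 18 Safe Harbor identifier categories can be written down as a simple checklist. The sketch below paraphrases the categories listed in 45 CFR 164.514(b)(2); the constant name `SAFE_HARBOR_IDENTIFIERS` is only illustrative and is not part of this notebook's code.

```python
# Paraphrased checklist of the 18 HIPAA Safe Harbor identifier categories.
# The name SAFE_HARBOR_IDENTIFIERS is hypothetical, chosen for this sketch.
SAFE_HARBOR_IDENTIFIERS = (
    "Names",
    "Geographic subdivisions smaller than a state",
    "Dates (except year) directly related to an individual, and ages over 89",
    "Telephone numbers",
    "Fax numbers",
    "Email addresses",
    "Social Security numbers",
    "Medical record numbers",
    "Health plan beneficiary numbers",
    "Account numbers",
    "Certificate/license numbers",
    "Vehicle identifiers and serial numbers, including license plates",
    "Device identifiers and serial numbers",
    "Web URLs",
    "IP addresses",
    "Biometric identifiers, including finger and voice prints",
    "Full-face photographs and comparable images",
    "Any other unique identifying number, characteristic, or code",
)

for number, category in enumerate(SAFE_HARBOR_IDENTIFIERS, start=1):
    print(f"{number:2d}. {category}")
```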

After de-identification, the DICOM file is updated, uploaded to the destination storage, and evaluated with AWS services: Amazon Rekognition, Amazon Comprehend, and Amazon Comprehend Medical.

## Setup

Let's start by setting the environment variables used to de-identify the DICOM file:
1) Set the local path of the DICOM image folder.
2) Set the source and destination S3 buckets.
3) Set the source and destination prefixes for the DICOM file.
4) Set the AWS user profile if it differs from the default.

In [3]:
from med_img_de_id_class import ProcessMedImage
from common.utils import generate_regex
# setup environment
LOC_DICOM_FOLDER = '../images/med_phi_img/'
LOC_DE_ID_DICOM_FOLDER = '../images/med_de_id_img/'
SOURCE_BUCKET = "crdcdh-test-submission"
DESTINATION_BUCKET = "crdc-hub-dev"
SOURCE_PREFIX = "dicom-images/"
DESTINATION_PREFIX = "de-id-dicom-images/"
EVAL_BUCKET = "crdc-hub-dev"
EVAL_PREFIX = "eval-de-id-dicom-images/"


\d{3}\s[a-zA-Z]{4}\s[a-zA-Z]{6}
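
The pattern above appears without the cell that produced it (it may come from `generate_regex`, but that is an assumption). As a quick check of what it matches, here is a minimal sketch using only Python's `re` module; the helper name `matches_rule` and the sample strings are made up for illustration.

```python
import re

# The pattern shown above: three digits, one whitespace character,
# four ASCII letters, one whitespace character, six ASCII letters.
PATTERN = re.compile(r"\d{3}\s[a-zA-Z]{4}\s[a-zA-Z]{6}")


def matches_rule(text: str) -> bool:
    """Return True when the whole string fits the pattern (hypothetical helper)."""
    return PATTERN.fullmatch(text) is not None


# Illustrative strings only; none of them come from a real DICOM file.
samples = ["123 Main Street", "456 Pine Avenue", "12 Main Street", "Chest PA view"]
for sample in samples:
    print(f"{sample!r}: {matches_rule(sample)}")
```

A pattern of this shape fits short street-address fragments, which is one kind of identifier Safe Harbor requires removing.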


## Upload DICOM image(s) to S3 bucket

In [None]:
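# `processor` must be a ProcessMedImage instance created beforehand
# (its construction is not shown in this notebook).
local_img_file = 'MartinChad-1-1.dcm'
local_img_path = LOC_DICOM_FOLDER + local_img_file
src_key = SOURCE_PREFIX + local_img_file
dist_key = DESTINATION_PREFIX + local_img_file
# upload the original DICOM file and keep the returned dataset for the next steps
result, dicom_dataset = processor.upload_dicom_file(SOURCE_BUCKET, src_key, local_img_path)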

## De-identify DICOM metadata

In [None]:
processor.de_identify_dicom(dicom_dataset)
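
To make the metadata step concrete, here is a minimal sketch of tag-level redaction on a plain dictionary. It is not the implementation of `de_identify_dicom`, which is not shown in this notebook. The keys are standard DICOM attribute keywords, but the selection in `PHI_KEYWORDS`, the function name `redact_metadata`, and the sample values are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable

# Illustrative subset of DICOM attribute keywords that commonly hold PHI.
# The list actually used by the notebook's processor is not shown; this is an assumption.
PHI_KEYWORDS = (
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "PatientTelephoneNumbers",
    "AccessionNumber",
)


def redact_metadata(
    metadata: Dict[str, Any],
    keywords: Iterable[str] = PHI_KEYWORDS,
    placeholder: str = "",
) -> Dict[str, Any]:
    """Return a copy of the metadata with the listed attributes blanked out."""
    blocked = set(keywords)
    return {key: (placeholder if key in blocked else value) for key, value in metadata.items()}


# Made-up sample metadata, not taken from a real study.
sample = {"PatientName": "DOE^JANE", "PatientBirthDate": "19600102", "Modality": "CT", "Rows": 512}
print(redact_metadata(sample))
```

Returning a copy rather than mutating the input keeps the original values available for a before/after comparison during evaluation.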

## Draw DICOM image before de-identification

In [None]:
# show med image before de-identification
processor.draw_img(processor.image_data)

## De-identify DICOM pixel data

In [None]:
from PIL import Image

local_de_id_png = LOC_DE_ID_DICOM_FOLDER + local_img_file.replace('.dcm', '.png')
# no local path or S3 location is passed here (all three arguments are None)
id_text_detected = processor.detect_id_in_img(None, None, None)
if id_text_detected:
    print(f'Sensitive text detected in {local_img_file}')
    print(id_text_detected)
    processor.redact_id_in_image(id_text_detected, local_de_id_png)
    # show redacted image
    print('Sensitive text in the image has been redacted')
    img = Image.open(local_de_id_png)
    processor.draw_img(img)
else:
    print(f'No sensitive text detected in {local_img_file}')
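
The pixel-level step can be pictured as filling each detected text region with a solid color. The sketch below assumes the detections carry bounding boxes in the ratio form that Amazon Rekognition returns (`Left`, `Top`, `Width`, `Height` as fractions of the image size); whether `redact_id_in_image` works this way is an assumption. The image is a plain nested list so the example runs without extra libraries, and both function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def box_to_pixels(box: Dict[str, float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a ratio-based bounding box to (left, top, right, bottom) pixel bounds."""
    left = max(0, math.floor(box["Left"] * width))
    top = max(0, math.floor(box["Top"] * height))
    right = min(width, math.ceil((box["Left"] + box["Width"]) * width))
    bottom = min(height, math.ceil((box["Top"] + box["Height"]) * height))
    return left, top, right, bottom


def redact_boxes(pixels: List[List[int]], boxes: List[Dict[str, float]], fill: int = 0) -> List[List[int]]:
    """Return a copy of a grayscale image with every box filled with `fill`."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    redacted = [row[:] for row in pixels]  # copy so the input stays untouched
    for box in boxes:
        left, top, right, bottom = box_to_pixels(box, width, height)
        for y in range(top, bottom):
            for x in range(left, right):
                redacted[y][x] = fill
    return redacted


# Made-up 8x4 all-white image and one box covering the lower middle region.
image = [[255] * 8 for _ in range(4)]
detected_boxes = [{"Left": 0.25, "Top": 0.5, "Width": 0.5, "Height": 0.5}]
for row in redact_boxes(image, detected_boxes):
    print(row)
```

Rounding the box outward (floor for the start, ceil for the end) errs on the side of covering slightly more than the detected text, which is the safer direction for redaction.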


## Update the DICOM with redacted metadata and redacted sensitive text in the image

In [None]:
local_de_id_dicom = local_de_id_png.replace('.png', '.dcm')
if id_text_detected:
    # write the redacted pixels and metadata back into a new DICOM file
    processor.save_de_id_dicom(local_de_id_png, dicom_dataset, local_de_id_dicom)

## Upload redacted DICOM to destination S3 bucket for evaluation with Amazon Comprehend and Comprehend Medical

In [None]:
local_img_file = src_key.split('/')[-1]

# check after redaction
local_img_path = LOC_DE_ID_DICOM_FOLDER + local_img_file
src_key = DESTINATION_PREFIX + local_img_file
dist_key = DESTINATION_PREFIX + local_img_file
result, dicom_dataset = processor.upload_dicom_file(DESTINATION_BUCKET, src_key, local_img_path, True)

# check before redaction
# local_img_path = LOC_DICOM_FOLDER + local_img_file
# src_key = SOURCE_PREFIX + local_img_file
# dist_key = DESTINATION_PREFIX + local_img_file
# result, dicom_dataset = processor.upload_dicom_file(SOURCE_BUCKET, src_key, local_img_path, True)

## Evaluate Redacted DICOM Metadata

In [None]:
from common.utils import dump_dict_to_tsv, get_date_time

tags, ids = processor.detect_id_in_tags(dicom_dataset)
if ids:
    print("Found PII/PHI in redacted DICOM: ", ids)
    # create an evaluation report pairing each tag with the PHI detected in it
    eval_dict_list = [{"tag": tag, "Detected PHI": phi} for tag, phi in zip(tags, ids)]
    dump_dict_to_tsv(eval_dict_list, f"../output/eval_report/tags_de_id_evaluation_report_{get_date_time()}.tsv")
    # redact remaining PHI
    processor.redact_tags(dicom_dataset, tags)
else:
    print("No PII/PHI found in redacted DICOM")
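
The evaluation report is a list of dictionaries written as tab-separated values. The implementation of `dump_dict_to_tsv` is not shown here, so the following is only a sketch of how such a writer could look using the standard `csv` module; `write_tsv` is a hypothetical stand-in and the sample row is made up.

```python
import csv
import io
from typing import Dict, List, TextIO


def write_tsv(rows: List[Dict[str, str]], stream: TextIO) -> None:
    """Write a list of dictionaries as TSV, using the first row's keys as the header."""
    if not rows:
        return  # nothing to report
    writer = csv.DictWriter(
        stream,
        fieldnames=list(rows[0].keys()),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


# Made-up report row in the same shape as eval_dict_list.
report_rows = [{"tag": "PatientName", "Detected PHI": "DOE^JANE"}]
buffer = io.StringIO()
write_tsv(report_rows, buffer)
print(buffer.getvalue())
```

Writing to a stream rather than a path keeps the sketch easy to test; a real report would pass an open file handle instead of `io.StringIO`.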

## Evaluate Redacted DICOM Pixel Data

In [None]:
from PIL import Image

local_img_file = src_key.split('/')[-1]
local_de_id_png = LOC_DE_ID_DICOM_FOLDER + local_img_file.replace('.dcm', '.png')
# check after redaction
src_png_key = dist_key.replace('.dcm', '.png')
local_png = local_de_id_png
id_text_detected = processor.detect_id_in_img(local_png, DESTINATION_BUCKET, src_png_key, True)
# check before redaction
# local_png = LOC_DICOM_FOLDER + local_img_file.replace('.dcm', '.png')
# id_text_detected = processor.detect_id_in_img(local_png, SOURCE_BUCKET, src_key.replace('.dcm', '.png'), True)
if id_text_detected:
    print(f'Sensitive text detected in {local_png}')
    # dump_dict_to_tsv and get_date_time were imported in the metadata evaluation cell
    dump_dict_to_tsv(id_text_detected, f"../output/eval_report/img_de_id_evaluation_report_{get_date_time()}.tsv")
    print(id_text_detected)
    processor.redact_id_in_image(id_text_detected, local_de_id_png)
    # show redacted image
    print('Showing redacted image')
    img = Image.open(local_de_id_png)
    processor.draw_img(img)
else:
    print(f'No sensitive text detected in {src_png_key}')
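
Deciding whether text found in the image is sensitive needs a classifier. The sketch below assumes that decision is based on a response shaped like Amazon Comprehend's `detect_pii_entities` output (an `Entities` list with `Score`, `Type`, `BeginOffset`, and `EndOffset`); how `detect_id_in_img` actually combines the AWS services is not shown in this notebook. The function name, the score threshold, and the hand-written sample response are illustrative only, and no AWS call is made.

```python
from typing import Any, Dict, List


def extract_pii_spans(text: str, response: Dict[str, Any], min_score: float = 0.8) -> List[Dict[str, Any]]:
    """Keep entities at or above the score threshold and attach the matched text."""
    spans = []
    for entity in response.get("Entities", []):
        if entity["Score"] >= min_score:
            spans.append(
                {
                    "type": entity["Type"],
                    "text": text[entity["BeginOffset"]:entity["EndOffset"]],
                    "score": entity["Score"],
                }
            )
    return spans


def _entity(text: str, fragment: str, pii_type: str, score: float) -> Dict[str, Any]:
    """Build one sample entity with offsets computed from the text (test data only)."""
    begin = text.index(fragment)
    return {"Score": score, "Type": pii_type, "BeginOffset": begin, "EndOffset": begin + len(fragment)}


# Made-up text and a hand-written response in the Comprehend response shape.
burned_in_text = "Patient: Jane Doe, DOB 01/02/1960"
sample_response = {
    "Entities": [
        _entity(burned_in_text, "Jane Doe", "NAME", 0.99),
        _entity(burned_in_text, "01/02/1960", "DATE_TIME", 0.95),
        _entity(burned_in_text, "Patient", "NAME", 0.30),  # low score, filtered out
    ]
}
for span in extract_pii_spans(burned_in_text, sample_response):
    print(span)
```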

## Update the evaluated DICOM file if remaining PHI was detected and redacted

In [None]:
# save again if either the tag check or the image check found and redacted remaining PHI
if ids or id_text_detected:
    local_de_id_dicom = local_de_id_png.replace('.png', '.dcm')
    processor.save_de_id_dicom(local_de_id_png, dicom_dataset, local_de_id_dicom)

## Self-learning: update rules for detecting PHI/PII in DICOM files

In [None]:
if ids and len(ids) > 0:
    processor.update_rules_in_configs('../configs/de-id/output.yaml')
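
One plausible way a new detection rule could be derived from a leaked value is to generalize it into a regular expression by character class. The behavior of `update_rules_in_configs` and `generate_regex` is not shown in this notebook, so the function below (`generalize_to_regex`) is a hypothetical sketch rather than the project's method. It is written so that the made-up input `"123 Main Street"` yields a pattern of the same shape as the one printed after the setup cell.

```python
import re
from itertools import groupby


def char_class(ch: str) -> str:
    """Map one character to a regex fragment describing its class."""
    if ch.isdigit():
        return r"\d"
    if ch.isalpha():
        return "[a-zA-Z]"  # simplification: non-ASCII letters would need a wider class
    if ch.isspace():
        return r"\s"
    return re.escape(ch)  # punctuation is kept literally


def generalize_to_regex(text: str) -> str:
    """Collapse runs of same-class characters into `class{count}` fragments."""
    parts = []
    for fragment, run in groupby(text, key=char_class):
        count = len(list(run))
        parts.append(fragment if count == 1 else f"{fragment}{{{count}}}")
    return "".join(parts)


# Made-up example value, not taken from a real DICOM file.
rule = generalize_to_regex("123 Main Street")
print(rule)
print(bool(re.fullmatch(rule, "456 Pine Avenue")))
```

A rule generalized this way also matches other values with the same layout, which is the point of learning from a single miss, but it can over-match harmless text, so new rules are worth reviewing before they are added to the configuration.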