## Metadata deidentification of SVS files

### Removal of auxiliary images (macro and label) along with metadata

This step removes the auxiliary images (macro and label) and the sensitive metadata from SVS files. The remove_phi() function from sparkocr.utils.svs.phi_cleaning comes in handy here. Let's take a look at its documentation:

In [1]:
from sparkocr.utils.svs.phi_cleaning import remove_phi

help(remove_phi)

Help on function remove_phi in module sparkocr.utils.svs.phi_cleaning:

remove_phi(input, output, tags=['ImageDescription.Filename', 'ImageDescription.Date', 'ImageDescription.Time', 'ImageDescription.User'], append_tags=[], rename=False, verbose=False)
    Remove label images, macro images, and specified metadata from SVS files.

    This function processes SVS files to remove sensitive metadata and associated images.
    By default, it removes specific metadata tags: "ImageDescription.Filename",
    "ImageDescription.Date", "ImageDescription.Time", and "ImageDescription.User".

    Parameters:
    - input (str): The file path or directory containing the SVS files to be de-identified.
    - output (str): The file path or directory where the de-identified files will be saved.
    - tags (list, optional): A list of metadata tags to remove, replacing the default tags if provided.
    - append_tags (list, optional): Additional tags to remove, added to the defaults without overriding them.
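
The difference between tags and append_tags is easiest to see side by side. The sketch below is not the library's implementation. It is a minimal, hypothetical helper (`effective_tags` is an invented name) that mirrors the behavior described in the help text, under the assumption that append_tags is simply added to whichever tag list is in effect.

```python
# Default tags, copied from the remove_phi signature shown in the help text.
DEFAULT_TAGS = [
    "ImageDescription.Filename",
    "ImageDescription.Date",
    "ImageDescription.Time",
    "ImageDescription.User",
]


def effective_tags(tags=None, append_tags=None):
    """Hypothetical helper: return the tags that would be removed.

    Assumption: `tags` replaces the defaults, `append_tags` extends them.
    """
    base = list(DEFAULT_TAGS) if tags is None else list(tags)
    for tag in append_tags or []:
        if tag not in base:  # avoid listing a tag twice
            base.append(tag)
    return base


# Passing tags replaces the defaults entirely.
only_user = effective_tags(tags=["ImageDescription.User"])
# Passing append_tags keeps the defaults and adds one more tag.
with_extra = effective_tags(append_tags=["ImageDescription.ScannerType"])
print(only_user)
print(with_extra)
```

In practice this means tags is the right choice when the defaults remove too much, and append_tags when they remove too little.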

### Process a set of files

In [2]:
input_path = "../data/svs/"
output_path = "./outputs"

remove_phi(input_path, output_path, verbose=True)

de-identifying ../data/svs/62893.svs
no need to clean: label
no need to clean: macro
cleaned tag field: ImageDescription.Filename
cleaned tag field: ImageDescription.Date
cleaned tag field: ImageDescription.Time
cleaned tag field: ImageDescription.User
cleaned tag field: ImageDescription.Filename
cleaned tag field: ImageDescription.Date
cleaned tag field: ImageDescription.Time
cleaned tag field: ImageDescription.User
Copied ../data/svs/62893.svs as ./outputs/62893.svs
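
To double-check a de-identified file, one option is to inspect its ImageDescription string and confirm that the listed fields no longer carry values. The following is a rough sketch, assuming the Aperio-style layout in which fields are separated by `|` and written as `key = value`. Both function names are hypothetical, and the sample string is made up for illustration. Because the exact way remove_phi cleans a field is not documented here, a field counts as cleaned when it is either missing or empty.

```python
def parse_description(description):
    """Split an Aperio-style ImageDescription into a {key: value} dict.

    Assumption: fields are separated by '|' and written as 'key = value';
    the leading segment (library banner, image dimensions) is skipped.
    """
    fields = {}
    for segment in description.split("|")[1:]:
        key, sep, value = segment.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def remaining_phi(description, tags):
    """Return the field names from `tags` that still carry a non-empty value."""
    fields = parse_description(description)
    prefix = "ImageDescription."
    keys = [t[len(prefix):] for t in tags if t.startswith(prefix)]
    return [k for k in keys if fields.get(k)]


# Made-up sample string for illustration only, not taken from a real slide.
sample = "Aperio Image Library|AppMag = 20|Filename = 12345|User = jdoe|Date = "
leftover = remaining_phi(
    sample,
    ["ImageDescription.Filename", "ImageDescription.Date", "ImageDescription.User"],
)
print(leftover)
```

The description string itself has to be read from the output file with a TIFF reader of your choice; with the tifffile package, for example, it should be available as the `description` attribute of each page. An empty result from `remaining_phi` would indicate that none of the checked fields still holds a value.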


### [optional] Tweaking additional parameters

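You can safely skip this step if the default set of removed metadata tags is sufficient for your use case. To remove more tags, pass them in append_tags, which adds them to the defaults instead of replacing them.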

In [3]:
input_path = "../data/svs/"
output_path = "./svs_output"

# append_tags extends the default tag list rather than replacing it.
extra_tags = ["ImageDescription.ScanScope ID", "ImageDescription.Time Zone", "ImageDescription.ScannerType"]
remove_phi(input_path, output_path, verbose=True, append_tags=extra_tags)

de-identifying ../data/svs/62893.svs
no need to clean: label
no need to clean: macro
cleaned tag field: ImageDescription.Filename
cleaned tag field: ImageDescription.Date
cleaned tag field: ImageDescription.Time
cleaned tag field: ImageDescription.User
cleaned tag field: ImageDescription.ScanScope ID
cleaned tag field: ImageDescription.Time Zone
cleaned tag field: ImageDescription.ScannerType
cleaned tag field: ImageDescription.Filename
cleaned tag field: ImageDescription.Date
cleaned tag field: ImageDescription.Time
cleaned tag field: ImageDescription.User
cleaned tag field: ImageDescription.ScanScope ID
cleaned tag field: ImageDescription.Time Zone
cleaned tag field: ImageDescription.ScannerType
Copied ../data/svs/62893.svs as ./svs_output/62893.svs
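
When many files are processed, the verbose output gets long, and each tag field can be reported more than once per file. A small sketch like the one below can condense it into one set of cleaned tags per file. It assumes the log has already been captured as text (for instance with `contextlib.redirect_stdout`, if remove_phi prints to standard output, which is an assumption) and that the lines follow the format shown in the output above. `summarize_log` is a hypothetical helper, and the log excerpt is shortened.

```python
from collections import defaultdict


def summarize_log(log_text):
    """Map each de-identified file to the set of tag fields reported as cleaned."""
    file_prefix = "de-identifying "
    tag_prefix = "cleaned tag field: "
    cleaned = defaultdict(set)
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(file_prefix):
            current = line[len(file_prefix):]
        elif line.startswith(tag_prefix) and current is not None:
            cleaned[current].add(line[len(tag_prefix):])
    return dict(cleaned)


# Shortened excerpt in the same format as the verbose output.
log = """de-identifying ../data/svs/62893.svs
no need to clean: label
cleaned tag field: ImageDescription.User
cleaned tag field: ImageDescription.ScannerType
cleaned tag field: ImageDescription.User
cleaned tag field: ImageDescription.ScannerType
"""
summary = summarize_log(log)
print({path: sorted(tags) for path, tags in summary.items()})
```

Comparing each file's set against the tags you requested is a quick way to spot a tag that was never reported as cleaned.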
