## 1. Subscribe to the model package

To subscribe to the model package:
1. Open the model package listing page <font color='red'> For Seller to update:[Title_of_your_product](Provide link to your marketplace listing of your product).</font>
1. On the AWS Marketplace listing, choose **Continue to subscribe**.
1. On the **Subscribe to this software** page, review the EULA, pricing, and support terms, and choose **Accept Offer** if you and your organization agree with them.
1. Choose **Continue to configuration**, and then choose a **Region**. A **Product ARN** is displayed. This is the model package ARN you need to specify when creating a deployable model with the SageMaker Python SDK (`ModelPackage`). Copy the ARN for your Region and specify it in the following cell.
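
Because the ARN you copy must correspond to the Region your SageMaker session runs in, a small check can catch a mismatched copy before deployment. The sketch below assumes the standard ARN layout `arn:<partition>:sagemaker:<region>:<account-id>:model-package/<name>`; `parse_model_package_arn` and `check_arn_region` are hypothetical helpers, not part of any AWS SDK.

```python
# Hypothetical helpers (not an AWS API). Assumed ARN layout:
#   arn:<partition>:sagemaker:<region>:<account-id>:model-package/<name>


def parse_model_package_arn(arn: str) -> dict:
    """Split a SageMaker model package ARN into its components."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    resource_type, _, resource_name = parts[5].partition("/")
    if resource_type != "model-package" or not resource_name:
        raise ValueError(f"Not a model package ARN: {arn!r}")
    return {
        "partition": parts[1],
        "region": parts[3],
        "account": parts[4],
        "name": resource_name,
    }


def check_arn_region(arn: str, region: str) -> None:
    """Raise ValueError if the ARN was copied for a different Region."""
    arn_region = parse_model_package_arn(arn)["region"]
    if arn_region != region:
        raise ValueError(
            f"ARN is for {arn_region}, but the session Region is {region}"
        )
```

Once the session cell below has defined `region`, calling `check_arn_region(model_package_arn, region)` fails fast on a Region mismatch instead of at deployment time.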

## Clinical Deidentification German

Deidentification is essential for safeguarding patient privacy in clinical data, including texts, PDFs, images, and DICOM files containing Protected Health Information (PHI). PHI encompasses various health-related data, including common identifiers such as name, address, birth date, and Social Security Number.

- **Model**: `de.deid.clinical`
- **Model Description**: This pipeline deidentifies PHI in German medical texts. Depending on the `masking_policy` parameter, detected PHI is either masked or obfuscated in the returned text. The pipeline can mask or obfuscate the following entities: PATIENT, HOSPITAL, DATE, ORGANIZATION, CITY, STREET, USERNAME, PROFESSION, PHONE, COUNTRY, DOCTOR, AGE, CONTACT, ID, ZIP, ACCOUNT, SSN, DLN, PLATE.
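
With the default `masked` policy, each detected entity is replaced by its label in angle brackets (for example `<DOCTOR>`), as the outputs further down show. A quick way to audit what was removed from a document is to count those placeholders. This is a minimal sketch: the label set is copied from the entity list above, and `count_masked_entities` is a hypothetical helper name.

```python
import re
from collections import Counter

# Entity labels taken from the model description above.
DEID_LABELS = {
    "PATIENT", "HOSPITAL", "DATE", "ORGANIZATION", "CITY", "STREET",
    "USERNAME", "PROFESSION", "PHONE", "COUNTRY", "DOCTOR", "AGE",
    "CONTACT", "ID", "ZIP", "ACCOUNT", "SSN", "DLN", "PLATE",
}

# Assumption (based on the sample outputs): masked placeholders look like <LABEL>.
TAG_PATTERN = re.compile(r"<([A-Z]+)>")


def count_masked_entities(masked_text: str) -> Counter:
    """Count placeholder tags in one masked prediction, ignoring unknown labels."""
    return Counter(
        label for label in TAG_PATTERN.findall(masked_text) if label in DEID_LABELS
    )
```

Applied to a masked prediction such as `"Dr. <DOCTOR> - <USERNAME>, <HOSPITAL>, <STREET>, <CITY>"`, it returns one count for each of the five labels.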

In [23]:
model_package_arn = "<Customer to specify Model package ARN corresponding to their AWS region>"

In [24]:
import base64
import json
import uuid
from sagemaker import ModelPackage
import sagemaker as sage
from sagemaker import get_execution_role
import boto3
from IPython.display import Image, display
from PIL import Image as ImageEdit
import numpy as np

In [25]:
sagemaker_session = sage.Session()
s3_bucket = sagemaker_session.default_bucket()
region = sagemaker_session.boto_region_name
account_id = boto3.client("sts").get_caller_identity().get("Account")
role = get_execution_role()

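# Low-level Boto3 clients, created in the same Region as the SageMaker session.
# The SageMaker client is named sagemaker_client so it does not shadow the sagemaker package.
sagemaker_client = boto3.client("sagemaker", region_name=region)
s3_client = sagemaker_session.boto_session.client("s3")
ecr = boto3.client("ecr", region_name=region)
sm_runtime = boto3.client("sagemaker-runtime", region_name=region)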

## 2. Create an endpoint and perform real-time inference

If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html).

In [26]:
model_name = "de-deid-clinical"

content_type = "application/json"

real_time_inference_instance_type = "ml.m4.xlarge"
batch_transform_inference_instance_type = "ml.m4.xlarge"


### A. Create an endpoint

In [27]:
# create a deployable model from the model package.
model = ModelPackage(
    role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session
)

# Deploy the model
predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)

INFO:sagemaker:Creating model with name: de-deid-clinical-2024-04-02-17-44-41-617
INFO:sagemaker:Creating endpoint-config with name de-deid-clinical
INFO:sagemaker:Creating endpoint with name de-deid-clinical


--------!

Once the endpoint has been created, you can perform real-time inference.

In [28]:
import json
import pandas as pd
import os
import boto3


# Set display options
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)


def process_data_and_invoke_realtime_endpoint(data_dicts):
    """Send each request dict to the endpoint, display the predictions, and save inputs/outputs locally."""
    for data_dict in data_dicts:
        json_input_data = json.dumps(data_dict, ensure_ascii=False)

        # Find the first index i for which neither inputN.json nor outN.out exists,
        # so files from earlier runs are never overwritten.
        i = 1
        input_file_name = f'inputs/real-time/input{i}.json'
        output_file_name = f'outputs/real-time/out{i}.out'

        while os.path.exists(input_file_name) or os.path.exists(output_file_name):
            i += 1
            input_file_name = f'inputs/real-time/input{i}.json'
            output_file_name = f'outputs/real-time/out{i}.out'

        os.makedirs(os.path.dirname(input_file_name), exist_ok=True)
        os.makedirs(os.path.dirname(output_file_name), exist_ok=True)

        # Keep a local copy of the request and upload it to S3 for reference.
        with open(input_file_name, 'w', encoding="utf-8") as f:
            f.write(json_input_data)

        s3_client.put_object(
            Bucket=s3_bucket,
            Key=f"{model_name}/validation-input-json/real-time/{os.path.basename(input_file_name)}",
            Body=json_input_data.encode("utf-8"),
        )

        # Invoke the endpoint; encode explicitly so umlauts are sent as UTF-8.
        response = sm_runtime.invoke_endpoint(
            EndpointName=model_name,
            ContentType=content_type,
            Accept="application/json",
            Body=json_input_data.encode("utf-8"),
        )

        # Parse the JSON response and show the predictions as a table.
        response_data = json.loads(response["Body"].read().decode("utf-8"))
        df = pd.DataFrame(response_data)
        display(df)

        # Save the response; UTF-8 is needed because ensure_ascii=False keeps umlauts as-is.
        with open(output_file_name, 'w', encoding="utf-8") as f_out:
            json.dump(response_data, f_out, indent=4, ensure_ascii=False)
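
The helper above also writes local files and uploads every request to S3. If you only need the predictions, a leaner call is enough. This sketch assumes the response body is a JSON object with a `predictions` list, which is inferred from the `predictions` column in the outputs shown below; `deidentify` is a hypothetical helper name, and `runtime_client` is any object with a Boto3-style `invoke_endpoint` method (such as `sm_runtime`).

```python
import json
from typing import Any, Dict, List, Optional, Union


def deidentify(
    runtime_client: Any,
    endpoint_name: str,
    text: Union[str, List[str]],
    masking_policy: Optional[str] = None,
    sep: Optional[str] = None,
) -> List[str]:
    """Send one request to the endpoint and return its predictions (no files, no S3)."""
    payload: Dict[str, Any] = {"text": text}
    # Only include optional parameters when set, so the endpoint defaults apply otherwise.
    if masking_policy is not None:
        payload["masking_policy"] = masking_policy
    if sep is not None:
        payload["sep"] = sep
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
    )
    body = json.loads(response["Body"].read().decode("utf-8"))
    # Assumption: the response JSON stores results under the "predictions" key.
    return body["predictions"]
```

For example, `deidentify(sm_runtime, model_name, docs, masking_policy="obfuscated")` would return one string per input document while the endpoint is running.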

### B. Perform real-time inference

In [29]:
docs = [
    '''Herr Reus - Telefon: 0604876475 - Beruf: Koch an der Friedrich-Alexander Universität. Er berichtet über Krankenhausaufenthalte im Klinikum Nürnberg-Süd, und die Akte zeigt weitere Krankenhausaufenthalte, darunter Kreiskrankenhaus im Jahr 2001, Sankt Antonius Krankenhaus am 28 Juli 2000, Klinikum Ludwigsburg im Jahr 05/1966.''',

    '''Dr.  Hans-Wolfgang Weihmann - RM57, Städt Klinikum Dresden-Friedrichstadt, Friedrichstraße 41, Dresden''',

    '''Er arbeitete bis 24.08.1940 - Gärtner bei Planten un Blomen in Hamburg, verbrannte sich an beiden Beinen - entwickelte Geschwüre. Der Patient konsultierte Dr. Klein im September.''',

    '''Koronare Herzkrankheit : s/p ant SEMI + Stent LAD Aug/1963 , Dr . Lars Meister , ETT Fallingbostel 24.02.2009 - neg . Scan auf Ischämie .'''

]


sample_text = """Zusammenfassung : Michael Berger wird am Morgen des 12 Dezember 2018 ins St.Elisabeth Krankenhaus eingeliefert. 
Herr Michael Berger ist 76 Jahre alt und hat zu viel Wasser in den Beinen.

Persönliche Daten :
ID-Nummer: T0110053F
Platte A-BC124
Kontonummer: DE89370400440532013000
SSN : 13110587M565
Lizenznummer: B072RRE2I55
Adresse : St.Johann-Straße 13 19300
"""

### Important Parameters

- **masking_policy**: `str`

    Users can select a masking policy to determine how sensitive entities are handled:

    Example: "**Dr. Hans-Wolfgang Weihmann - RM57, Städt Klinikum Dresden-Friedrichstadt, Friedrichstraße 41, Dresden**"

    - **masked**: Default policy; replaces each entity with its type label in angle brackets.

      -> 'Dr.  `<DOCTOR>` - `<USERNAME>`, `<HOSPITAL>`, `<STREET>`, `<CITY>`'

    - **obfuscated**: Replaces sensitive entities with random values of the same type.

      -> 'Dr.  `Karl-August Blümel` - `RP400`, `University Hospital Cologne`, `Fadime-Pölitz-Allee`, `Böblingen`'

    - **masked_fixed_length_chars**: Replaces each entity with a fixed-length run of four asterisks (\*), regardless of the entity's length.

      -> 'Dr.  `****` - `****`, `****`, `****`, `****`'

    - **masked_with_chars**: Replaces each entity with asterisks (\*) enclosed in square brackets, keeping the entity's original character length.

      -> 'Dr.  [`********************`] - [`**`], [`***********************************`], [`****************`], [`*****`]'

- **sep**: `str`

    Separator used to join the subparts of each prediction.

    The model outputs each prediction as separate subparts, and the separator joins them into coherent text. The default is a single space (" ").

    The separator must be one of the following characters: space (' '), newline ('\n'), comma (','), tab ('\t'), or colon (':').
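
The two asterisk-based policies differ in whether the entity's length survives. Judging from the examples above (`RM57` becomes `[**]`, `Dresden` becomes `[*****]`), `masked_with_chars` wraps `len(entity) - 2` asterisks in brackets so the total length is unchanged, while `masked_fixed_length_chars` always emits `****`. The hypothetical function below only reproduces that observed pattern for a single entity string; it is not the model's implementation, and the behavior for one-character entities is not shown in the examples.

```python
# Illustration only: mimics the masked_with_chars pattern seen in the sample
# outputs. It is NOT the model's code.


def mask_with_chars(entity: str) -> str:
    """Replace an entity with bracketed asterisks of the same total length."""
    if len(entity) < 2:
        # Not covered by the examples, so this sketch refuses to guess.
        raise ValueError("entity too short to illustrate")
    return "[" + "*" * (len(entity) - 2) + "]"
```

Because the masked string has the same length as the original, character offsets elsewhere in the document stay aligned, which is the practical difference from the fixed-length policy.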

You can specify these parameters in the input as follows:

```json
{
    "text": [
        "Text document 1",
        "Text document 2",
        ...
    ],
    "masking_policy": "masked",
    "sep": " "
}
```


### **Input format**: Single Text Document

Provide a single text document as a string.

  
  
```json
{
    "text": "Single text document"
}
```

In [30]:
# masked (default-policy)
data_dicts = [
    {
        "text": sample_text
    }
]

process_data_and_invoke_realtime_endpoint(data_dicts)

| | predictions |
|---|---|
| 0 | `Zusammenfassung : <PATIENT> wird am Morgen des <DATE> ins <HOSPITAL> eingeliefert. Herr <PATIENT> ist <AGE> Jahre alt und hat zu viel Wasser in den Beinen. Persönliche Daten :\nID-Nummer: <ID> Platte <PLATE> Kontonummer: <ACCOUNT>\nSSN : <SSN> Lizenznummer: <DLN> Adresse : <STREET> <ZIP>` |


In [31]:
# obfuscated
data_dicts = [
    {
        "text": sample_text,
        "masking_policy": "obfuscated"
    }
]

process_data_and_invoke_realtime_endpoint(data_dicts)

| | predictions |
|---|---|
| 0 | `Zusammenfassung : Hansgeorg Burger wird am Morgen des 12 Dezember 2018 ins Krankenhaus am Zoo Karlsruhe eingeliefert. Herr Hansgeorg Burger ist 10 Jahre alt und hat zu viel Wasser in den Beinen. Persönliche Daten :\nID-Nummer: Z6109604V Platte W-UJ811 Kontonummer: 192837465738\nSSN : 91478295A213 Lizenznummer: Y865HQI6N62 Adresse : Kösterring 4/1 95284` |


### **Input format**: Array of Text Documents

Use an array containing multiple text documents. Each element represents a separate text document.

```json
{
    "text": [
        "Text document 1",
        "Text document 2",
        ...
    ]
}
```
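
Both input formats and the optional parameters can be assembled and checked client-side before a request is sent. The sketch below rejects values outside the documented lists in "Important Parameters". It always sends `masking_policy` and `sep` explicitly, on the assumption that passing the defaults behaves the same as omitting them; `build_request` is a hypothetical helper, not part of the model's API.

```python
from typing import Any, Dict, List, Union

# Allowed values, copied from the "Important Parameters" section.
MASKING_POLICIES = {"masked", "obfuscated", "masked_fixed_length_chars", "masked_with_chars"}
ALLOWED_SEPARATORS = {" ", "\n", ",", "\t", ":"}


def build_request(
    text: Union[str, List[str]],
    masking_policy: str = "masked",
    sep: str = " ",
) -> Dict[str, Any]:
    """Hypothetical helper: build a request body for one text or an array of texts."""
    is_single = isinstance(text, str)
    is_array = isinstance(text, list) and all(isinstance(t, str) for t in text)
    if not (is_single or is_array):
        raise TypeError("text must be a string or a list of strings")
    if masking_policy not in MASKING_POLICIES:
        raise ValueError(f"unsupported masking_policy: {masking_policy!r}")
    if sep not in ALLOWED_SEPARATORS:
        raise ValueError(f"unsupported sep: {sep!r}")
    # Assumption: sending the default values explicitly behaves like omitting them.
    return {"text": text, "masking_policy": masking_policy, "sep": sep}
```

For example, `json.dumps(build_request(docs, masking_policy="masked_with_chars"), ensure_ascii=False)` yields a body that can be passed to `invoke_endpoint` or written to a batch input file.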

In [32]:
# masked (default-policy)
data_dicts = [
    {
        "text": docs
    }
]

process_data_and_invoke_realtime_endpoint(data_dicts)

| | predictions |
|---|---|
| 0 | `Herr <PATIENT> - Telefon: <ID> - Beruf: <PROFESSION> an der <ORGANIZATION>. Er berichtet über Krankenhausaufenthalte im <HOSPITAL>, und die Akte zeigt weitere Krankenhausaufenthalte, darunter <HOSPITAL> im Jahr <DATE>, <HOSPITAL> am <DATE>, <HOSPITAL> im Jahr <DATE>.` |
| 1 | `Dr. <DOCTOR> - <USERNAME>, <HOSPITAL>, <STREET>, <CITY>` |
| 2 | `Er arbeitete bis <DATE> - <PROFESSION> bei <ORGANIZATION> in <CITY>, verbrannte sich an beiden Beinen - entwickelte Geschwüre. Der Patient konsultierte Dr. <DOCTOR> im <DATE>.` |
| 3 | `Koronare Herzkrankheit : s/p ant SEMI + Stent LAD <DATE> , Dr . <DOCTOR> , ETT <CITY> <DATE> - neg . Scan auf Ischämie .` |


In [33]:
# obfuscated
data_dicts = [
    {
        "text": docs,
        "masking_policy": "obfuscated"
    }
]

process_data_and_invoke_realtime_endpoint(data_dicts)

| | predictions |
|---|---|
| 0 | `Herr Jolanta Ullmann - Telefon: 1610960454 - Beruf: Webentwickler an der SYSCO. Er berichtet über Krankenhausaufenthalte im Sankt Elisabeth Krankenhaus Leipzig, und die Akte zeigt weitere Krankenhausaufenthalte, darunter Universitätsklinikum Göttingen im Jahr 2001, St. Petrus Klinikum am 28 Juli 2000, Krankenhaus Nabburg im Jahr 05/1966.` |
| 1 | `Dr. Karl-Dieter Lukas - Jupiter.Schürmann, Klinikum Hannover-Süd, Heinweg 2/9, Hannover` |
| 2 | `Er arbeitete bis 24.08.1940 - IT-Administrator bei Belkin in Teterow, verbrannte sich an beiden Beinen - entwickelte Geschwüre. Der Patient konsultierte Dr. Sorgatz Junker im September.` |
| 3 | `Koronare Herzkrankheit : s/p ant SEMI + Stent LAD Aug/1963 , Dr . Simon Reuter , ETT Hohenmölsen 24.02.2009 - neg . Scan auf Ischämie .` |


### C. Delete the endpoint

Now that you have performed real-time inference, you no longer need the endpoint. Delete it to avoid incurring further charges.

In [34]:
model.sagemaker_session.delete_endpoint(model_name)
model.sagemaker_session.delete_endpoint_config(model_name)

INFO:sagemaker:Deleting endpoint with name: de-deid-clinical
INFO:sagemaker:Deleting endpoint configuration with name: de-deid-clinical
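
The same cleanup can also be done with the low-level Boto3 SageMaker client, which is handy outside a notebook. This sketch uses the client's `delete_endpoint` and `delete_endpoint_config` calls and the `endpoint_deleted` waiter; the function name is hypothetical, and defaulting the config name to the endpoint name is an assumption based on the logs above, where both were created as `de-deid-clinical`.

```python
from typing import Any, Optional


def delete_endpoint_and_config(
    sm_client: Any,
    endpoint_name: str,
    endpoint_config_name: Optional[str] = None,
    wait: bool = True,
) -> None:
    """Hypothetical helper: delete an endpoint and its endpoint configuration."""
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    if wait:
        # Optionally block until the endpoint no longer exists.
        sm_client.get_waiter("endpoint_deleted").wait(EndpointName=endpoint_name)
    # Assumption: the config shares the endpoint's name unless told otherwise.
    sm_client.delete_endpoint_config(
        EndpointConfigName=endpoint_config_name or endpoint_name
    )
```

Usage would look like `delete_endpoint_and_config(boto3.client("sagemaker", region_name=region), model_name)`.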


## 3. Batch inference

In [35]:
import os

validation_file_name_1 = "input_1.json"
validation_file_name_2 = "input_2.json"
validation_file_name_3 = "input_3.json"
validation_file_name_4 = "input_4.json"

validation_input_path = f"s3://{s3_bucket}/{model_name}/validation-input-json/batch"
validation_output_path = f"s3://{s3_bucket}/{model_name}/validation-output-json/batch"

input_dir = 'inputs/batch'
output_dir = 'outputs/batch'

os.makedirs(input_dir, exist_ok=True)
os.makedirs(output_dir, exist_ok=True)

In [36]:
import json

def write_and_upload_to_s3(json_input_data, file_name):
    """Write one request as UTF-8 JSON to file_name and upload it under the batch input prefix."""
    json_data = json.dumps(json_input_data, ensure_ascii=False)

    with open(file_name, "w", encoding="utf-8") as f:
        f.write(json_data)

    s3_client.put_object(
        Bucket=s3_bucket,
        Key=f"{model_name}/validation-input-json/batch/{os.path.basename(file_name)}",
        Body=json_data.encode("utf-8"),
    )

In [37]:
# Define input JSON data for each validation file
input_json_data = {
    validation_file_name_1: {"text": docs},
    validation_file_name_2: {"text": docs, "masking_policy": "obfuscated"},
    validation_file_name_3: {"text": docs, "masking_policy": "masked_fixed_length_chars"},
    validation_file_name_4: {"text": docs, "masking_policy": "masked_with_chars"},
}

# Write and upload each input JSON data to S3
for file_name, json_data in input_json_data.items():
    write_and_upload_to_s3(json_data, f"{input_dir}/{file_name}")

In [None]:
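# Initialize a SageMaker Transformer object for making predictions.
# Results are written under validation_output_path, defined earlier.
transformer = model.transformer(
    instance_count=1,
    instance_type=batch_transform_inference_instance_type,
    output_path=validation_output_path,
    accept="application/json",
)
# Run the batch transform job on all files under validation_input_path and wait for it to finish.
transformer.transform(validation_input_path, content_type=content_type)
transformer.wait()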

In [39]:
from urllib.parse import urlparse

def process_s3_output_and_save(validation_file_name, output_file_name):
    """Download the batch result for one input file, display it, and save it locally."""
    output_file_path = f"{output_dir}/{output_file_name}"

    # Each result is stored as <input file name>.out under the transformer's output path;
    # take both the bucket and the prefix from that path.
    parsed_url = urlparse(transformer.output_path)
    file_key = f"{parsed_url.path.strip('/')}/{validation_file_name}.out"
    response = s3_client.get_object(Bucket=parsed_url.netloc, Key=file_key)

    data = json.loads(response["Body"].read().decode("utf-8"))
    df = pd.DataFrame(data)
    display(df)

    # Save the data to the output file
    with open(output_file_path, 'w', encoding='utf-8') as f_out:
        json.dump(data, f_out, indent=4, ensure_ascii=False)
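
The function above locates each result by appending `.out` to the input file name under the transformer's output path. Splitting the S3 URI into bucket and key can be isolated into a small, testable helper. The names below are hypothetical, and the `.out` naming is taken from the notebook's own code.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split s3://bucket/prefix into (bucket, prefix) without surrounding slashes."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.strip("/")


def batch_output_location(output_path: str, input_file_name: str) -> Tuple[str, str]:
    """Bucket and key of the batch result for one input file (<input>.out)."""
    bucket, prefix = split_s3_uri(output_path)
    key = f"{prefix}/{input_file_name}.out" if prefix else f"{input_file_name}.out"
    return bucket, key
```

Taking the bucket from the URI rather than assuming `s3_bucket` keeps the lookup correct if the output path ever points to a different bucket.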

#### masked (default-policy)

In [40]:
process_s3_output_and_save(validation_file_name_1, "out_1.out")

| | predictions |
|---|---|
| 0 | `Herr <PATIENT> - Telefon: <ID> - Beruf: <PROFESSION> an der <ORGANIZATION>. Er berichtet über Krankenhausaufenthalte im <HOSPITAL>, und die Akte zeigt weitere Krankenhausaufenthalte, darunter <HOSPITAL> im Jahr <DATE>, <HOSPITAL> am <DATE>, <HOSPITAL> im Jahr <DATE>.` |
| 1 | `Dr. <DOCTOR> - <USERNAME>, <HOSPITAL>, <STREET>, <CITY>` |
| 2 | `Er arbeitete bis <DATE> - <PROFESSION> bei <ORGANIZATION> in <CITY>, verbrannte sich an beiden Beinen - entwickelte Geschwüre. Der Patient konsultierte Dr. <DOCTOR> im <DATE>.` |
| 3 | `Koronare Herzkrankheit : s/p ant SEMI + Stent LAD <DATE> , Dr . <DOCTOR> , ETT <CITY> <DATE> - neg . Scan auf Ischämie .` |


#### obfuscated

In [41]:
process_s3_output_and_save(validation_file_name_2, "out_2.out")

| | predictions |
|---|---|
| 0 | `Herr Jolanta Ullmann - Telefon: 1610960454 - Beruf: Webentwickler an der SYSCO. Er berichtet über Krankenhausaufenthalte im Sankt Elisabeth Krankenhaus Leipzig, und die Akte zeigt weitere Krankenhausaufenthalte, darunter Universitätsklinikum Göttingen im Jahr 2001, St. Petrus Klinikum am 28 Juli 2000, Krankenhaus Nabburg im Jahr 05/1966.` |
| 1 | `Dr. Karl-Dieter Lukas - Jupiter.Schürmann, Klinikum Hannover-Süd, Heinweg 2/9, Hannover` |
| 2 | `Er arbeitete bis 24.08.1940 - IT-Administrator bei Belkin in Teterow, verbrannte sich an beiden Beinen - entwickelte Geschwüre. Der Patient konsultierte Dr. Sorgatz Junker im September.` |
| 3 | `Koronare Herzkrankheit : s/p ant SEMI + Stent LAD Aug/1963 , Dr . Simon Reuter , ETT Hohenmölsen 24.02.2009 - neg . Scan auf Ischämie .` |


#### masked_fixed_length_chars

In [42]:
process_s3_output_and_save(validation_file_name_3, "out_3.out")

| | predictions |
|---|---|
| 0 | `Herr **** - Telefon: **** - Beruf: **** an der ****. Er berichtet über Krankenhausaufenthalte im ****, und die Akte zeigt weitere Krankenhausaufenthalte, darunter **** im Jahr ****, **** am ****, **** im Jahr ****.` |
| 1 | `Dr. **** - ****, ****, ****, ****` |
| 2 | `Er arbeitete bis **** - **** bei **** in ****, verbrannte sich an beiden Beinen - entwickelte Geschwüre. Der Patient konsultierte Dr. **** im ****.` |
| 3 | `Koronare Herzkrankheit : s/p ant SEMI + Stent LAD **** , Dr . **** , ETT **** **** - neg . Scan auf Ischämie .` |


#### masked_with_chars

In [43]:
process_s3_output_and_save(validation_file_name_4, "out_4.out")

| | predictions |
|---|---|
| 0 | `Herr [**] - Telefon: [********] - Beruf: [**] an der [*****************************]. Er berichtet über Krankenhausaufenthalte im [*******************], und die Akte zeigt weitere Krankenhausaufenthalte, darunter [**************] im Jahr [**], [************************] am [**********], [******************] im Jahr [*****].` |
| 1 | `Dr. [********************] - [**], [***********************************], [****************], [*****]` |
| 2 | `Er arbeitete bis [********] - [*****] bei [***************] in [*****], verbrannte sich an beiden Beinen - entwickelte Geschwüre. Der Patient konsultierte Dr. [***] im [*******].` |
| 3 | `Koronare Herzkrankheit : s/p ant SEMI + Stent LAD [******] , Dr . [**********] , ETT [***********] [********] - neg . Scan auf Ischämie .` |


In [44]:
model.delete_model()

INFO:sagemaker:Deleting model with name: de-deid-clinical-2024-04-02-17-49-33-976


### Unsubscribe from the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. You can identify such models by the container name associated with each model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on the [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust).
2. Locate the listing that you want to cancel the subscription for, and then choose __Cancel Subscription__.

