## 1. Subscribe to the model package

To subscribe to the model package:
1. Open the model package listing page: [Clinical De-identification for Romanian](https://aws.amazon.com/marketplace/pp/prodview-ytlqgqtgkou7s)
1. On the AWS Marketplace listing, click on the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click **Accept Offer** if you and your organization agree with the EULA, pricing, and support terms.
1. Click **Continue to configuration** and then choose a **region**. A **Product Arn** is displayed. This is the model package ARN that you need to specify when creating a deployable model using Boto3. Copy the ARN corresponding to your region and assign it to `model_package_arn` in the cell below.

## Clinical Deidentification Romanian

Deidentification is essential for safeguarding patient privacy in clinical data, including texts, PDFs, images, and DICOM files containing Protected Health Information (PHI). PHI encompasses various health-related data, including common identifiers such as name, address, birth date, and Social Security Number.

- **Model**: [ro.deid.clinical](https://nlp.johnsnowlabs.com/2023/06/17/clinical_deidentification_ro.html)
- **Model Description**: This pipeline is trained with w2v_cc_300d Romanian embeddings and can be used to deidentify PHI in medical texts in Romanian. PHI is masked or obfuscated in the resulting text. The pipeline can mask, fake, or obfuscate the following entities: AGE, CITY, COUNTRY, DATE, DOCTOR, EMAIL, FAX, HOSPITAL, IDNUM, LOCATION-OTHER, MEDICALRECORD, ORGANIZATION, PATIENT, PHONE, PROFESSION, STREET, ZIP, ACCOUNT, LICENSE, PLATE

In [1]:
model_package_arn = "<Customer to specify Model package ARN corresponding to their AWS region>"
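
A quick sanity check can catch a mismatched region before deployment. The sketch below assumes the general ARN layout `arn:partition:service:region:account-id:resource`; `arn_region` is a hypothetical helper that is not part of the original notebook, and the ARN in the usage line is made up for illustration.

```python
def arn_region(arn: str) -> str:
    """Return the region field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    return parts[3]


# Made-up ARN, for illustration only (not a real model package).
example_arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-package"
print(arn_region(example_arn))
```

In the notebook, the returned value could be compared with `sagemaker_session.boto_region_name` once the session has been created.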

In [None]:
import json
import os
import boto3
import pandas as pd
import sagemaker as sage
from sagemaker import ModelPackage
from sagemaker import get_execution_role
from IPython.display import display
from urllib.parse import urlparse

In [None]:
sagemaker_session = sage.Session()
s3_bucket = sagemaker_session.default_bucket()
region = sagemaker_session.boto_region_name
account_id = boto3.client("sts").get_caller_identity().get("Account")
role = get_execution_role()

sagemaker = boto3.client("sagemaker")
s3_client = sagemaker_session.boto_session.client("s3")
ecr = boto3.client("ecr")
sm_runtime = boto3.client("sagemaker-runtime")

# Set display options
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

In [4]:
model_name = "ro-deid-clinical"

real_time_inference_instance_type = "ml.m4.xlarge"
batch_transform_inference_instance_type = "ml.m4.2xlarge"

## 2. Create a deployable model from the model package.

In [5]:
model = ModelPackage(
    role=role, 
    model_package_arn=model_package_arn,
    sagemaker_session=sagemaker_session,
)

### Input Format

To use the model, you need to provide input in one of the following supported formats:

#### JSON Format

Provide input as JSON. We support two variations within this format:

1. **Array of Text Documents**: 
   Use an array containing multiple text documents. Each element represents a separate text document.

   ```json
   {
       "text": [
           "Text document 1",
           "Text document 2",
           ...
       ]
   }
   ```

2. **Single Text Document**:
   Provide a single text document as a string.


   ```json
    {
        "text": "Single text document"
    }
   ```

#### JSON Lines (JSONL) Format

Provide input in JSON Lines format, where each line is a JSON object representing a text document.

```
{"text": "Text document 1"}
{"text": "Text document 2"}
```
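
As a minimal sketch, the three input shapes described above can be produced with the standard `json` module. The variable names are illustrative only, and the document strings are placeholders.

```python
import json

texts = ["Text document 1", "Text document 2"]

# JSON, variation 1: an array of documents under the "text" key.
json_array_payload = json.dumps({"text": texts}, ensure_ascii=False)

# JSON, variation 2: a single document as a plain string.
json_single_payload = json.dumps({"text": texts[0]}, ensure_ascii=False)

# JSON Lines: one JSON object per line, one document per object.
jsonl_payload = "\n".join(json.dumps({"text": t}, ensure_ascii=False) for t in texts)

print(json_array_payload)
print(json_single_payload)
print(jsonl_payload)
```

Passing `ensure_ascii=False` keeps Romanian diacritics readable in the serialized payload instead of escaping them.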

### Important Parameter

- **masking_policy**: `str`

    Users can select a masking policy to determine how sensitive entities are handled:

    Example: "**LABORATOR RADIOLOGIE, Strada Ababei, Sacueni, 354573 , TEL : 0265-210110 , E-MAIL: OFFICE@SMURD**"

    - **masked**: Default policy that masks entities with their type.

      -> 'LABORATOR RADIOLOGIE, `<STREET>`, `<CITY>`, `<ZIP>` , TEL : `<PHONE>` , E-MAIL: `<EMAIL>`'

    - **obfuscated**: Replaces sensitive entities with random values of the same type.

      -> 'LABORATOR RADIOLOGIE, `Intrarea Diaconescu`, `Aiud`, `302784` , TEL : `0263 144 119` , E-MAIL: `jeneltudor@email.ro`'

    - **masked_fixed_length_chars**: Masks each entity with a fixed-length run of asterisks (\*), regardless of the entity's original length.

      -> 'LABORATOR RADIOLOGIE, `****`, `****`, `****` , TEL : `****` , E-MAIL: `****`'

    - **masked_with_chars**: Masks entities with asterisks (\*) while keeping each entity's original length.

      -> 'LABORATOR RADIOLOGIE, [`***********`], [`*****`], [`****`] , TEL : [`*********`] , E-MAIL: [`**********`]'
    
You can specify this parameter in the input as follows:

```json
{
    "text": [
        "Text document 1",
        "Text document 2",
        ...
    ],
    "masking_policy": "masked"
}
```
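
To make the difference between the policies concrete, here is a toy re-implementation that works on hand-labelled entity spans. It is only an illustration inferred from the sample outputs shown above, not the pipeline's actual logic. `apply_policy` and `span` are hypothetical helpers, and `obfuscated` is left out because it requires generating fake values.

```python
def span(text, value, label):
    """Locate `value` in `text` and return a (start, end, label) tuple."""
    start = text.index(value)
    return (start, start + len(value), label)


def apply_policy(text, spans, policy="masked"):
    """Replace each (start, end, label) span in `text` according to `policy`."""
    out, cursor = [], 0
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])
        length = end - start
        if policy == "masked":
            out.append(f"<{label}>")
        elif policy == "masked_fixed_length_chars":
            out.append("****")
        elif policy == "masked_with_chars":
            # Assumption drawn from the sample outputs: the original length is
            # kept, with brackets counted as part of it when there is room.
            out.append(("[" + "*" * (length - 2) + "]") if length > 2 else "*" * length)
        else:
            raise ValueError(f"Unsupported policy in this sketch: {policy}")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "Strada Ababei, Sacueni, 354573"
spans = [
    span(text, "Strada Ababei", "STREET"),
    span(text, "Sacueni", "CITY"),
    span(text, "354573", "ZIP"),
]
for policy in ("masked", "masked_fixed_length_chars", "masked_with_chars"):
    print(policy, "->", apply_policy(text, spans, policy))
```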

## 3. Create an endpoint and perform real-time inference

If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html).

### A. Deploy the SageMaker model to an endpoint

In [None]:
predictor = model.deploy(
    initial_instance_count=1,
    instance_type=real_time_inference_instance_type, 
    endpoint_name=model_name,
)

Once the endpoint has been created, you can perform real-time inference.

In [7]:
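def invoke_realtime_endpoint(record, content_type="application/json", accept="application/json"):
    """Send `record` to the endpoint and return the decoded response.

    `record` is a dict for JSON input, or a pre-serialized string for JSON Lines.
    """
    response = sm_runtime.invoke_endpoint(
        EndpointName=model_name,
        ContentType=content_type,
        Accept=accept,
        # Serialize dicts for JSON; JSON Lines input is already a string.
        Body=json.dumps(record) if content_type == "application/json" else record,
    )

    response_body = response["Body"].read().decode("utf-8")

    if accept == "application/json":
        return json.loads(response_body)  # parsed dict
    elif accept == "application/jsonlines":
        return response_body  # raw string, one JSON object per line
    else:
        raise ValueError(f"Unsupported accept type: {accept}")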

### Initial Setup

In [8]:
docs = [
'''LABORATOR RADIOLOGIE, Strada Ababei, Sacueni, 354573 , TEL : 0265-210110 , E-MAIL: OFFICE@SMURD
Data... 
LABORATOR RADIOLOGIE, Strada Ababei, Sacueni, 354573 , TEL : 0265-210110 , E-MAIL: OFFICE@SMURD
Data setului de analize : 10 May 2022 18:45:00
Inregistrat de : PAUL DANIEL TUDOR 
Nume si prenume : MILASAN DANUT 
Varsta : 45 , Sex : Masculin
TEL : 456 45 789 2 
E-mail : damut__d@hotmail.com 
C.N.P : 1761218264378 
Cod pacient : 627480543615010
Licență : K0004567S, 
Înmatriculare : BD32904, 
Cont : SEWS324095192710408,
 
LA ORA : 10/12/2021 14:54_ 
VALIDAT DE :Dr. IOANA GHIBAN ; 
INVESTIGATII : CT TORACICÃ NATIVÃ''',

'''Medic : Dr. Agota EVELYN, 
C.N.P : 2450502264401, Data setului de analize: 25 May 2022 
Varsta : 77,... 
Medic : Dr. Agota EVELYN, 
C.N.P : 2450502264401, Data setului de analize: 25 May 2022 
Varsta : 77, Nume si Prenume : BUREAN MARIA 
Tel: +40(235)413773, 
E-mail : maria@gmail.com,
Licență : B004256985M, 
Înmatriculare : CD205113, 
Cont : FXHZ7170951927104999, 
Spitalul Pentru Ochi, Drumul de Deal ,Nr. 972 Vaslui, 737405

bolus 10 ml Iomeron 350 urmate de 30 ml flush salin cu 5 ml/s .
Se continua cu 65 ml contrast si 30 ml flush salin , cu acelasi flux .
Se efectueaza o examinare angio-CT coronariana cu achizitii spirale cu reconstructii retrospective la o frecventa cardiaca medie de 60/min .
Opacifiere buna a patului vascular si cavitatilor cardiace .
Fara incidente sau accidente pe parcursul examinarii .

CONCLUZII : - Stare dupa stentare ramus intermedius si artera coronara dreapta .
Aspect corespunzator al stenturilor , cu minima stenoza la nivelul ACD si cu flux distal corespunzator.'''

]


sample_text = """TOMOGRAFIA COMPUTERIZATA A CREIERULUI INVESTIGATII REZULTAT DLP 
Medic Laborator : BURIAN MONICA
PRO... 
TOMOGRAFIA COMPUTERIZATA A CREIERULUI INVESTIGATII REZULTAT DLP 
Medic Laborator : BURIAN MONICA
PROTOCOL DE EXPLORARE PRIN COMPUTER TOMOGRAFIE DATA : 11 FEBRUARIE 2022 
Nume si prenume : RUS OLIMPIU DUMITRU , 49 ani, Sex : M, TEL : +40 67 5745, E-mail : olimpiu_d@gmail.com
CNP : 1730104060763, Licență : E000198985A, Înmatriculare : CD32156, Cont : LZWZ7170951927104999,
 
Solicitare : Angio CT coronarian , kg : 100 Dg . de trimitere Stare post-AVCI . Stare post-trombectomie . Proces inlocuitor de spatiu ventricular stang
Trimis de : Sectia Clinica Neurologie 1a SCJU Tirgu Mures
Medic : Dr.Melanie Dana
CONCLUZII : - Tromb ventricular stang de 20/40/17 mm , - Leziuni sechelar ischemice in segmentele apicale septal si inferior , cu aspect anevrismal apical VS - Leziune non-ischemica/MINOCA ?
48	mediocardiaca septala - Examenul AngioCT coronarian nu pune in evidenta ateromatoza sau stenoze coronariene semnificative."""

### JSON

#### Example 1: masked (default-policy)

In [9]:
input_json_data = {"text": sample_text}
response_json = invoke_realtime_endpoint(input_json_data, content_type="application/json", accept="application/json")
print(response_json["predictions"][0])

TOMOGRAFIA COMPUTERIZATA A CREIERULUI INVESTIGATII REZULTAT DLP 
Medic Laborator : <DOCTOR>
PRO... 
TOMOGRAFIA COMPUTERIZATA A CREIERULUI INVESTIGATII REZULTAT DLP 
Medic Laborator : <DOCTOR>
PROTOCOL DE EXPLORARE PRIN COMPUTER TOMOGRAFIE DATA : <DATE> 
Nume si prenume : <PATIENT> , <AGE> ani, Sex : M, TEL : <PHONE>, E-mail : <EMAIL>
CNP : <LICENSE>, Licență : <LICENSE>, Înmatriculare : <PLATE>, Cont : <ACCOUNT>,
 
Solicitare : Angio CT coronarian , kg : 100 Dg . de trimitere Stare post-AVCI . Stare post-trombectomie . Proces inlocuitor de spatiu ventricular stang
Trimis de : Sectia Clinica Neurologie 1a <HOSPITAL>
Medic : <DOCTOR>
CONCLUZII : - Tromb ventricular stang de 20/40/17 mm , - Leziuni sechelar ischemice in segmentele apicale septal si inferior , cu aspect anevrismal apical VS - Leziune non-ischemica/MINOCA ?
48	mediocardiaca septala - Examenul AngioCT coronarian nu pune in evidenta ateromatoza sau stenoze coronariene semnificative.


#### Example 2: obfuscated

In [11]:
input_json_data = {"text": sample_text, "masking_policy": "obfuscated"}
response_json = invoke_realtime_endpoint(input_json_data, content_type="application/json", accept="application/json")
print(response_json["predictions"][0])

TOMOGRAFIA COMPUTERIZATA A CREIERULUI INVESTIGATII REZULTAT DLP 
Medic Laborator : DOINA GHEORGHIU
PRO... 
TOMOGRAFIA COMPUTERIZATA A CREIERULUI INVESTIGATII REZULTAT DLP 
Medic Laborator : DOINA GHEORGHIU
PROTOCOL DE EXPLORARE PRIN COMPUTER TOMOGRAFIE DATA : 11 FEBRUARIE 2022 
Nume si prenume : DINUT LILIANA DANIELA , 58 ani, Sex : M, TEL : +57 96 2652, E-mail : lstancu@zzup.ro
CNP : 4687475797698, Licență : R777403032X, Înmatriculare : HM81429, Cont : SEAE8282288015490352,
 
Solicitare : Angio CT coronarian , kg : 100 Dg . de trimitere Stare post-AVCI . Stare post-trombectomie . Proces inlocuitor de spatiu ventricular stang
Trimis de : Sectia Clinica Neurologie 1a Institutul Clinic de Urologie si Transplant Renal
Medic : Agata Voinea
CONCLUZII : - Tromb ventricular stang de 20/40/17 mm , - Leziuni sechelar ischemice in segmentele apicale septal si inferior , cu aspect anevrismal apical VS - Leziune non-ischemica/MINOCA ?
48	mediocardiaca septala - Examenul AngioCT coronarian nu pune 

### JSON Lines

In [13]:
def create_jsonl(records, masking_policy=None):
    json_records = []

    if isinstance(records, str):
        records = [records]

    for text in records:
        record = {"text": text}

        if masking_policy is not None:
            record["masking_policy"] = masking_policy
        json_records.append(record)

    json_lines = '\n'.join(json.dumps(record, ensure_ascii=False) for record in json_records)
    return json_lines
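
When `accept` is `application/jsonlines`, `invoke_realtime_endpoint` returns the raw response string. The following sketch parses such a string back into Python objects. `parse_jsonl` is a hypothetical helper, and the sample body is invented. It only assumes that each line is a JSON object with a `predictions` key, as in the outputs shown in this notebook.

```python
import json


def parse_jsonl(body: str):
    """Parse a JSON Lines string into a list of Python objects, skipping blank lines."""
    # Split on "\n" only: str.splitlines() would also break on other Unicode
    # line separators that may appear unescaped inside JSON strings.
    return [json.loads(line) for line in body.split("\n") if line.strip()]


# Invented response body, for illustration only.
sample_body = '{"predictions": "Medic : <DOCTOR>"}\n{"predictions": "Pacient : <PATIENT>"}\n'
records = parse_jsonl(sample_body)
for record in records:
    print(record["predictions"])
```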


#### Example 1: masked (default-policy)

In [14]:
input_jsonl_data = create_jsonl(sample_text, masking_policy="masked")
data = invoke_realtime_endpoint(input_jsonl_data, content_type="application/jsonlines" , accept="application/jsonlines" )
print(data)

{"predictions": "TOMOGRAFIA COMPUTERIZATA A CREIERULUI INVESTIGATII REZULTAT DLP \nMedic Laborator : <DOCTOR>\nPRO... \nTOMOGRAFIA COMPUTERIZATA A CREIERULUI INVESTIGATII REZULTAT DLP \nMedic Laborator : <DOCTOR>\nPROTOCOL DE EXPLORARE PRIN COMPUTER TOMOGRAFIE DATA : <DATE> \nNume si prenume : <PATIENT> , <AGE> ani, Sex : M, TEL : <PHONE>, E-mail : <EMAIL>\nCNP : <LICENSE>, Licență : <LICENSE>, Înmatriculare : <PLATE>, Cont : <ACCOUNT>,\n \nSolicitare : Angio CT coronarian , kg : 100 Dg . de trimitere Stare post-AVCI . Stare post-trombectomie . Proces inlocuitor de spatiu ventricular stang\nTrimis de : Sectia Clinica Neurologie 1a <HOSPITAL>\nMedic : <DOCTOR>\nCONCLUZII : - Tromb ventricular stang de 20/40/17 mm , - Leziuni sechelar ischemice in segmentele apicale septal si inferior , cu aspect anevrismal apical VS - Leziune non-ischemica/MINOCA ?\n48\tmediocardiaca septala - Examenul AngioCT coronarian nu pune in evidenta ateromatoza sau stenoze coronariene semnificative."}


#### Example 2: obfuscated

In [16]:
input_jsonl_data = create_jsonl(sample_text, masking_policy="obfuscated")
data = invoke_realtime_endpoint(input_jsonl_data, content_type="application/jsonlines" , accept="application/jsonlines" )
print(data)

{"predictions": "TOMOGRAFIA COMPUTERIZATA A CREIERULUI INVESTIGATII REZULTAT DLP \nMedic Laborator : DOINA GHEORGHIU\nPRO... \nTOMOGRAFIA COMPUTERIZATA A CREIERULUI INVESTIGATII REZULTAT DLP \nMedic Laborator : DOINA GHEORGHIU\nPROTOCOL DE EXPLORARE PRIN COMPUTER TOMOGRAFIE DATA : 11 FEBRUARIE 2022 \nNume si prenume : DINUT LILIANA DANIELA , 58 ani, Sex : M, TEL : +57 96 2652, E-mail : lstancu@zzup.ro\nCNP : 4687475797698, Licență : R777403032X, Înmatriculare : HM81429, Cont : SEAE8282288015490352,\n \nSolicitare : Angio CT coronarian , kg : 100 Dg . de trimitere Stare post-AVCI . Stare post-trombectomie . Proces inlocuitor de spatiu ventricular stang\nTrimis de : Sectia Clinica Neurologie 1a Institutul Clinic de Urologie si Transplant Renal\nMedic : Agata Voinea\nCONCLUZII : - Tromb ventricular stang de 20/40/17 mm , - Leziuni sechelar ischemice in segmentele apicale septal si inferior , cu aspect anevrismal apical VS - Leziune non-ischemica/MINOCA ?\n48\tmediocardiaca septala - Exame

### B. Delete the endpoint

Now that you have successfully performed a real-time inference, you do not need the endpoint any more. You can terminate the endpoint to avoid being charged.

In [None]:
model.sagemaker_session.delete_endpoint(model_name)
model.sagemaker_session.delete_endpoint_config(model_name)

## 4. Batch inference

In [19]:
validation_input_json_path = f"s3://{s3_bucket}/{model_name}/validation-input/json/"
validation_output_json_path = f"s3://{s3_bucket}/{model_name}/validation-output/json/"

validation_input_jsonl_path = f"s3://{s3_bucket}/{model_name}/validation-input/jsonl/"
validation_output_jsonl_path = f"s3://{s3_bucket}/{model_name}/validation-output/jsonl/"

def upload_to_s3(input_data, file_name):
    file_format = os.path.splitext(file_name)[1].lower()
    s3_client.put_object(
        Bucket=s3_bucket,
        Key=f"{model_name}/validation-input/{file_format[1:]}/{file_name}",
        Body=input_data.encode("UTF-8"),
    )
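
The key layout used by `upload_to_s3` can be checked locally without touching S3. This sketch isolates the same key-building logic in a hypothetical `build_input_key` function; the model name matches the one set earlier in the notebook.

```python
import os


def build_input_key(model_name: str, file_name: str) -> str:
    """Build the S3 object key, using the file extension as a sub-folder."""
    file_format = os.path.splitext(file_name)[1].lower()  # e.g. ".jsonl"
    return f"{model_name}/validation-input/{file_format[1:]}/{file_name}"


json_key = build_input_key("ro-deid-clinical", "input1.json")
jsonl_key = build_input_key("ro-deid-clinical", "input1.jsonl")
print(json_key)
print(jsonl_key)
```

Because the extension selects the folder, JSON and JSON Lines inputs land under separate prefixes, which is what lets each batch transform job read only its own format.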

In [20]:
# Create JSON and JSON Lines data
input_json_data = {
    "input1.json": json.dumps({"text": docs, "masking_policy": "masked"}, ensure_ascii=False),
    "input2.json": json.dumps({"text": docs, "masking_policy": "obfuscated"}, ensure_ascii=False),
    "input3.json": json.dumps({"text": docs, "masking_policy": "masked_fixed_length_chars"}, ensure_ascii=False),
    "input4.json": json.dumps({"text": docs, "masking_policy": "masked_with_chars"}, ensure_ascii=False),
}

input_jsonl_data = {
    "input1.jsonl": create_jsonl(docs, masking_policy="masked"),
    "input2.jsonl": create_jsonl(docs, masking_policy="obfuscated"),
    "input3.jsonl": create_jsonl(docs, masking_policy="masked_fixed_length_chars"),
    "input4.jsonl": create_jsonl(docs, masking_policy="masked_with_chars")
}

# Upload JSON and JSON Lines data to S3
for file_name, data in input_json_data.items():
    upload_to_s3(data, file_name)

for file_name, data in input_jsonl_data.items():
    upload_to_s3(data, file_name)


### JSON

In [None]:
transformer = model.transformer(
    instance_count=1,
    instance_type=batch_transform_inference_instance_type,
    accept="application/json",
    output_path=validation_output_json_path
)

transformer.transform(validation_input_json_path, content_type="application/json")
transformer.wait()

In [22]:
def retrieve_json_output_from_s3(validation_file_name):
    parsed_url = urlparse(transformer.output_path)
    file_key = f"{parsed_url.path[1:]}{validation_file_name}.out"
    response = s3_client.get_object(Bucket=s3_bucket, Key=file_key)

    data = json.loads(response["Body"].read().decode("utf-8"))
    display(data)
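
The retrieval function relies on batch transform writing each result under the input file name with an `.out` suffix. The sketch below shows how the output key is derived from the output path. `build_output_key` is a hypothetical helper and `example-bucket` is a placeholder bucket name.

```python
from urllib.parse import urlparse


def build_output_key(output_path: str, input_file_name: str) -> str:
    """Derive the S3 key of a batch transform result from the output path."""
    parsed = urlparse(output_path)
    # parsed.path starts with "/", which is not part of the object key.
    return f"{parsed.path[1:]}{input_file_name}.out"


example_path = "s3://example-bucket/ro-deid-clinical/validation-output/json/"
bucket = urlparse(example_path).netloc
output_key = build_output_key(example_path, "input1.json")
print(bucket)
print(output_key)
```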

In [23]:
masking_policies = {
    "masked": "input1.json",
    "obfuscated": "input2.json",
    "masked_fixed_length_chars": "input3.json",
    "masked_with_chars": "input4.json",
}

for policy_name, validation_file_name in masking_policies.items():
    print("-"*50, policy_name ,"-"*50)
    retrieve_json_output_from_s3(validation_file_name)
    print("\n")

-------------------------------------------------- masked --------------------------------------------------


{'predictions': ['LABORATOR RADIOLOGIE, <STREET>, <CITY>, <ZIP> , TEL : <PHONE> , E-MAIL: <EMAIL>\nData... \nLABORATOR RADIOLOGIE, <STREET>, <CITY>, <ZIP> , TEL : <PHONE> , E-MAIL: <EMAIL>\nData setului de analize : <DATE> 18:45:00\nInregistrat de : <DOCTOR> <DOCTOR> \nNume si prenume : <PATIENT> \nVarsta : <AGE> , Sex : Masculin\nTEL : <PHONE> \nE-mail : <EMAIL> \nC.N.P : <IDNUM> \nCod pacient : <IDNUM>\nLicență : <LICENSE>, \nÎnmatriculare : <PLATE>, \nCont : <ACCOUNT>,\n \nLA ORA : <DATE> 14:54_ \nVALIDAT DE :Dr. <DOCTOR> ; \nINVESTIGATII : CT TORACICÃ NATIVÃ',
  'Medic : Dr. <DOCTOR>, \nC.N.P : <IDNUM>, Data setului de analize: <DATE> \nVarsta : <AGE><PATIENT> \nMedic : Dr. <DOCTOR>, \nC.N.P : <IDNUM>, Data setului de analize: <DATE> \nVarsta : <AGE>, Nume si Prenume : <PATIENT> \nTel: <PHONE>, \nE-mail : <EMAIL>,\nLicență : <LICENSE>, \nÎnmatriculare : <PLATE>, \nCont : <ACCOUNT>, \n<HOSPITAL> ,<STREET> <CITY>, <ZIP>\n\nbolus 10 ml Iomeron 350 urmate de 30 ml flush salin cu 5 ml/s



-------------------------------------------------- obfuscated --------------------------------------------------


{'predictions': ['LABORATOR RADIOLOGIE, Drumul Tudor, Titu, 825268 , TEL : 7192-147447 , E-MAIL: jeneltudor@email.ro\nData... \nLABORATOR RADIOLOGIE, Drumul Tudor, Titu, 825268 , TEL : 7192-147447 , E-MAIL: jeneltudor@email.ro\nData setului de analize : 10 May 2022 18:45:00\nInregistrat de : EFTIMIE, SINICĂ DOINA GHEORGHIU \nNume si prenume : DOBRIN FIGAN \nVarsta : 42 , Sex : Masculin\nTEL : 529 52 630 1 \nE-mail : hcristea@141.ro \nC.N.P : 4694143195863 \nCod pacient : 916537258942747\nLicență : V7775296J, \nÎnmatriculare : CM81075, \nCont : OATD2344587666964477,\n \nLA ORA : 10/12/2021 14:54_ \nVALIDAT DE :Dr. IURIE CAIUS STOICA ; \nINVESTIGATII : CT TORACICÃ NATIVÃ',
  'Medic : Dr. R.T., \nC.N.P : 1527271195574, Data setului de analize: 25 May 2022 \nVarsta : 68DINUT LILIANA DANIELA \nMedic : Dr. R.T., \nC.N.P : 1527271195574, Data setului de analize: 25 May 2022 \nVarsta : 68, Nume si Prenume : DRAGAN MIHAI \nTel: +57(182)548668, \nE-mail : jeneltudor@email.ro,\nLicență : C7751290



-------------------------------------------------- masked_fixed_length_chars --------------------------------------------------


{'predictions': ['LABORATOR RADIOLOGIE, ****, ****, **** , TEL : **** , E-MAIL: ****\nData... \nLABORATOR RADIOLOGIE, ****, ****, **** , TEL : **** , E-MAIL: ****\nData setului de analize : **** 18:45:00\nInregistrat de : **** **** \nNume si prenume : **** \nVarsta : **** , Sex : Masculin\nTEL : **** \nE-mail : **** \nC.N.P : **** \nCod pacient : ****\nLicență : ****, \nÎnmatriculare : ****, \nCont : ****,\n \nLA ORA : **** 14:54_ \nVALIDAT DE :Dr. **** ; \nINVESTIGATII : CT TORACICÃ NATIVÃ',
  'Medic : Dr. ****, \nC.N.P : ****, Data setului de analize: **** \nVarsta : ******** \nMedic : Dr. ****, \nC.N.P : ****, Data setului de analize: **** \nVarsta : ****, Nume si Prenume : **** \nTel: ****, \nE-mail : ****,\nLicență : ****, \nÎnmatriculare : ****, \nCont : ****, \n**** ,**** ****, ****\n\nbolus 10 ml Iomeron 350 urmate de 30 ml flush salin cu 5 ml/s .\nSe continua cu 65 ml contrast si 30 ml flush salin , cu acelasi flux .\nSe efectueaza o examinare angio-CT coronariana cu achizitii



-------------------------------------------------- masked_with_chars --------------------------------------------------


{'predictions': ['LABORATOR RADIOLOGIE, [***********], [*****], [****] , TEL : [*********] , E-MAIL: [**********]\nData... \nLABORATOR RADIOLOGIE, [***********], [*****], [****] , TEL : [*********] , E-MAIL: [**********]\nData setului de analize : [*********] 18:45:00\nInregistrat de : [*********] [***] \nNume si prenume : [***********] \nVarsta : ** , Sex : Masculin\nTEL : [**********] \nE-mail : [******************] \nC.N.P : [***********] \nCod pacient : [*************]\nLicență : [*******], \nÎnmatriculare : [*****], \nCont : [*****************],\n \nLA ORA : [********] 14:54_ \nVALIDAT DE :Dr. [**********] ; \nINVESTIGATII : CT TORACICÃ NATIVÃ',
  'Medic : Dr. [**********], \nC.N.P : [***********], Data setului de analize: [*********] \nVarsta : **[**] \nMedic : Dr. [**********], \nC.N.P : [***********], Data setului de analize: [*********] \nVarsta : **, Nume si Prenume : [**********] \nTel: [************], \nE-mail : [*************],\nLicență : [*********], \nÎnmatriculare : [**





### JSON Lines

In [None]:
transformer = model.transformer(
    instance_count=1,
    instance_type=batch_transform_inference_instance_type,
    accept="application/jsonlines",
    output_path=validation_output_jsonl_path
)
transformer.transform(validation_input_jsonl_path, content_type="application/jsonlines")
transformer.wait()

In [25]:
def retrieve_jsonlines_output_from_s3(validation_file_name):

    parsed_url = urlparse(transformer.output_path)
    file_key = f"{parsed_url.path[1:]}{validation_file_name}.out"
    response = s3_client.get_object(Bucket=s3_bucket, Key=file_key)

    data = response["Body"].read().decode("utf-8")
    print(data)

In [26]:
masking_policies = {
    "masked": "input1.jsonl",
    "obfuscated": "input2.jsonl",
    "masked_fixed_length_chars": "input3.jsonl",
    "masked_with_chars": "input4.jsonl",
}

for policy_name, validation_file_name in masking_policies.items():
    print("-"*50, policy_name ,"-"*50)
    retrieve_jsonlines_output_from_s3(validation_file_name)
    print("\n")

-------------------------------------------------- masked --------------------------------------------------
{"predictions": "LABORATOR RADIOLOGIE, <STREET>, <CITY>, <ZIP> , TEL : <PHONE> , E-MAIL: <EMAIL>\nData... \nLABORATOR RADIOLOGIE, <STREET>, <CITY>, <ZIP> , TEL : <PHONE> , E-MAIL: <EMAIL>\nData setului de analize : <DATE> 18:45:00\nInregistrat de : <DOCTOR> <DOCTOR> \nNume si prenume : <PATIENT> \nVarsta : <AGE> , Sex : Masculin\nTEL : <PHONE> \nE-mail : <EMAIL> \nC.N.P : <IDNUM> \nCod pacient : <IDNUM>\nLicență : <LICENSE>, \nÎnmatriculare : <PLATE>, \nCont : <ACCOUNT>,\n \nLA ORA : <DATE> 14:54_ \nVALIDAT DE :Dr. <DOCTOR> ; \nINVESTIGATII : CT TORACICÃ NATIVÃ"}
{"predictions": "Medic : Dr. <DOCTOR>, \nC.N.P : <IDNUM>, Data setului de analize: <DATE> \nVarsta : <AGE><PATIENT> \nMedic : Dr. <DOCTOR>, \nC.N.P : <IDNUM>, Data setului de analize: <DATE> \nVarsta : <AGE>, Nume si Prenume : <PATIENT> \nTel: <PHONE>, \nE-mail : <EMAIL>,\nLicență : <LICENSE>, \nÎnmatriculare : <PLATE>,

In [None]:
model.delete_model()

### Unsubscribe from the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on the [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)
2. Locate the listing whose subscription you want to cancel, and then choose __Cancel Subscription__.

