## 1. Subscribe to the model package

To subscribe to the model package:
1. Open the model package listing page [Clinical De-identification for PDF (Signature aware)](https://aws.amazon.com/marketplace/pp/prodview-jmtngbxhcmmfk)
1. On the AWS Marketplace listing, click on the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the EULA, pricing, and support terms, then click **Accept Offer** if you and your organization agree with them.
1. Once you click the **Continue to configuration** button and choose a **region**, a **Product Arn** is displayed. This is the model package ARN that you need to specify when creating a deployable model using Boto3. Copy the ARN that corresponds to your region and specify it in the following cell.

## DEID Multi Model


- **Model**: `pdf_deid_multi_model_context_signature_aware_pipeline`
- **Model Description**: This pipeline de-identifies all printed text in single-page and multi-page PDFs.


In [1]:
model_package_arn = 'customer to specify model package arn'
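Because the ARN is region-specific, a quick sanity check before deployment can catch a placeholder or a mismatched region. The following is a minimal sketch, not part of the original notebook: the helper names and the example ARN are hypothetical, and it assumes the usual `arn:partition:sagemaker:region:account:model-package/name` layout.

```python
def parse_model_package_arn(arn: str) -> dict:
    """Split a SageMaker model package ARN into its named parts."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    if not parts[5].startswith("model-package/"):
        raise ValueError(f"Not a model package ARN: {arn!r}")
    return {
        "partition": parts[1],
        "region": parts[3],
        "account_id": parts[4],
        "resource": parts[5],
    }


def arn_matches_region(arn: str, region: str) -> bool:
    """Return True when the ARN was issued for the given region."""
    return parse_model_package_arn(arn)["region"] == region


# Hypothetical ARN, for illustration only
example_arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-model-package"
print(arn_matches_region(example_arn, "us-east-1"))
```

In the notebook, this could be called as `arn_matches_region(model_package_arn, region)` once the session region is known; the unedited placeholder string raises `ValueError`.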

In [2]:

import shutil

from sagemaker import ModelPackage
import sagemaker as sage
from sagemaker import get_execution_role
import boto3
from IPython.display import Image, display, IFrame
from PIL import Image as ImageEdit
from urllib.parse import urlparse
import numpy as np

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


In [3]:
sagemaker_session = sage.Session()
s3_bucket = sagemaker_session.default_bucket()
region = sagemaker_session.boto_region_name
account_id = boto3.client("sts").get_caller_identity().get("Account")
role = get_execution_role()

sagemaker = boto3.client("sagemaker")
s3_client = sagemaker_session.boto_session.client("s3")
ecr = boto3.client("ecr")
sm_runtime = boto3.client("sagemaker-runtime")

## 2. Create an endpoint and perform real-time inference

If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html).

In [4]:
model_name = "pdf_deid_multi_model_context_signature_aware_pipeline"

real_time_inference_instance_type = "ml.c5.9xlarge"
batch_transform_inference_instance_type = "ml.c5.9xlarge"

### A. Create an endpoint

In [5]:
# create a deployable model from the model package.
model = ModelPackage(
    role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session
)

# Deploy the model
predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)

-----------!

Once the endpoint has been created, you can perform real-time inference.
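To confirm that the endpoint is ready before sending requests, its status can be read with the SageMaker `describe_endpoint` call. This is a minimal sketch with a hypothetical helper name; it takes the client as an argument so it can be exercised without AWS access.

```python
def is_endpoint_ready(sagemaker_client, endpoint_name: str) -> bool:
    """Return True when the endpoint reports the InService status.

    sagemaker_client is expected to behave like boto3.client("sagemaker").
    """
    description = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return description["EndpointStatus"] == "InService"
```

With the clients created earlier, this could be called as `is_endpoint_ready(boto3.client("sagemaker"), model_name)`.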

In [6]:
import pandas as pd
import os
import io


# Set display options
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

def process_data_and_invoke_realtime_endpoint(pdf_file, content_type):
    with open(pdf_file, "rb") as file:
        pdf_data = file.read()

    i = 1
    input_extension = os.path.splitext(pdf_file)[-1]
    input_file_name = f'inputs/real-time/input{i}{input_extension}'

    while os.path.exists(input_file_name):
        i += 1
        input_file_name = f'inputs/real-time/input{i}{input_extension}'

    output_file_name = f'outputs/real-time/{os.path.basename(input_file_name)}.out'

    os.makedirs(os.path.dirname(input_file_name), exist_ok=True)
    os.makedirs(os.path.dirname(output_file_name), exist_ok=True)

    shutil.copy2(pdf_file, input_file_name)

    # Keep a copy of the input in S3 (s3_client and s3_bucket come from the setup cell)
    validation_input_file_path = f"{model_name}/validation-input/real-time/{os.path.basename(input_file_name)}"
    s3_client.upload_file(pdf_file, s3_bucket, validation_input_file_path)

    # Send the raw PDF bytes to the endpoint (sm_runtime comes from the setup cell)
    response = sm_runtime.invoke_endpoint(
        EndpointName=model_name,
        ContentType=content_type,
        Accept="application/json",
        Body=pdf_data,
    )

    # Stream the response body to disk using the public StreamingBody API
    with open(output_file_name, 'wb') as file:
        for chunk in response['Body'].iter_chunks():
            file.write(chunk)

    return output_file_name
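The function returns the path of the file that holds the raw response. Since the request asks for `application/json`, a quick way to peek at the result without assuming any particular schema is sketched below. The helper name is hypothetical, and it assumes the saved body is valid JSON.

```python
import json


def summarize_json_file(path, max_keys=10):
    """Describe the top-level structure of a JSON file without assuming a schema."""
    with open(path, "r", encoding="utf-8") as fh:
        payload = json.load(fh)

    if isinstance(payload, dict):
        # Map each top-level key to the type name of its value
        return {key: type(value).__name__ for key, value in list(payload.items())[:max_keys]}
    if isinstance(payload, list):
        first_type = type(payload[0]).__name__ if payload else None
        return {"list_length": len(payload), "first_item_type": first_type}
    return {"scalar_type": type(payload).__name__}
```

For example, `summarize_json_file(data)` could be run on the path returned by `process_data_and_invoke_realtime_endpoint`.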

### JSON

#### Example 1 - Multiple Page PDF

In [None]:
!wget --no-check-certificate https://raw.githubusercontent.com/JohnSnowLabs/pdf-deid-dataset/cc91a94e7fb044591d987b7650807731b73980fb/PDF_Original/Medium/PDF_Deid_Deidentification_Medium_7.pdf -O example_input_1.pdf

In [8]:
#IFrame("./example_input_1.pdf", width=1000, height=800)

In [9]:
data = process_data_and_invoke_realtime_endpoint(pdf_file='./example_input_1.pdf', content_type='application/octet-stream')

In [None]:
#IFrame(data, width=1000, height=800)

#### Example 2

In [None]:
!wget --no-check-certificate https://raw.githubusercontent.com/JohnSnowLabs/pdf-deid-dataset/cc91a94e7fb044591d987b7650807731b73980fb/PDF_Original/Easy/PDF_Deid_Deidentification_0.pdf -O example_input_2.pdf

In [12]:
#IFrame("./example_input_2.pdf", width=1000, height=800)

In [13]:
data = process_data_and_invoke_realtime_endpoint(pdf_file='./example_input_2.pdf', content_type='application/octet-stream')

In [None]:
#IFrame(data, width=1000, height=800)

### C. Delete the endpoint

Now that you have successfully performed real-time inference, you no longer need the endpoint. Delete it to avoid further charges.

In [19]:
model.sagemaker_session.delete_endpoint(model_name)
model.sagemaker_session.delete_endpoint_config(model_name)
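To double-check that nothing is still running, the endpoint list can be queried after deletion. This is a minimal sketch with a hypothetical helper name, using the SageMaker `list_endpoints` call and its `NameContains` filter; it only inspects the first page of results, which is assumed to be enough for a distinctive endpoint name.

```python
def endpoint_exists(sagemaker_client, endpoint_name: str) -> bool:
    """Return True if an endpoint with exactly this name is still listed.

    sagemaker_client is expected to behave like boto3.client("sagemaker").
    """
    response = sagemaker_client.list_endpoints(NameContains=endpoint_name)
    names = [item["EndpointName"] for item in response.get("Endpoints", [])]
    return endpoint_name in names
```

Calling `endpoint_exists(boto3.client("sagemaker"), model_name)` should return `False` once deletion has completed.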

## 3. Batch inference

In [20]:
validation_file_name1 = "example_input_1.pdf"
validation_file_name2 = "example_input_2.pdf"

validation_input_path = f"s3://{s3_bucket}/{model_name}/validation-input/batch"
validation_output_path = f"s3://{s3_bucket}/{model_name}/validation-output/batch"

input_dir = 'inputs/batch'
output_dir = 'outputs/batch'

os.makedirs(input_dir, exist_ok=True)
os.makedirs(output_dir, exist_ok=True)

In [21]:
def upload_pdf_to_s3(local_file_path):
    shutil.copy2(local_file_path, input_dir)
    base_file_name = os.path.basename(local_file_path)
    validation_input_file_path = f"{model_name}/validation-input/batch/{base_file_name}"
    s3_client.upload_file(local_file_path, s3_bucket, validation_input_file_path)

In [22]:
upload_pdf_to_s3(validation_file_name1)
upload_pdf_to_s3(validation_file_name2)

In [None]:
# Initialize a SageMaker Transformer object for making predictions
transformer = model.transformer(
    instance_count=1,
    instance_type=batch_transform_inference_instance_type,
    accept="application/json",
    output_path=validation_output_path
)

transformer.transform(validation_input_path, content_type='application/octet-stream')
transformer.wait()
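Batch transform writes one result object per input file, named after the input with `.out` appended, under the configured output path. The sketch below derives the bucket and key for a given input from that convention; the helper name and the example bucket are hypothetical.

```python
from urllib.parse import urlparse


def batch_output_location(output_s3_uri: str, input_file_name: str):
    """Return (bucket, key) of the batch transform result for one input file."""
    parsed = urlparse(output_s3_uri)
    prefix = parsed.path.strip("/")
    key = f"{prefix}/{input_file_name}.out" if prefix else f"{input_file_name}.out"
    return parsed.netloc, key


# Hypothetical bucket name, for illustration only
print(batch_output_location("s3://example-bucket/demo/validation-output/batch", "example_input_1.pdf"))
```

In the notebook, `batch_output_location(transformer.output_path, validation_file_name1)` would give the location of the first result.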

In [24]:
parsed_url = urlparse(transformer.output_path)
# Batch transform names each result after its input file, with ".out" appended
file_key = f"{parsed_url.path[1:]}/{validation_file_name1}.out"
response = s3_client.get_object(Bucket=s3_bucket, Key=file_key)
with open(f"{output_dir}/example_out_1.pdf", 'wb') as file:
    for chunk in response['Body'].iter_chunks():
        file.write(chunk)

IFrame(f"{output_dir}/example_out_1.pdf", width=600, height=300)

In [None]:
parsed_url = urlparse(transformer.output_path)
# Batch transform names each result after its input file, with ".out" appended
file_key = f"{parsed_url.path[1:]}/{validation_file_name2}.out"
response = s3_client.get_object(Bucket=s3_bucket, Key=file_key)
with open(f"{output_dir}/example_out_2.pdf", 'wb') as file:
    for chunk in response['Body'].iter_chunks():
        file.write(chunk)

IFrame(f"{output_dir}/example_out_2.pdf", width=600, height=300)

In [28]:
model.delete_model()

INFO:sagemaker:Deleting model with name: en-printed-transformer-extraction-pipel-2024-07-24-11-48-08-864


### Unsubscribe from the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)
2. Locate the listing that you want to cancel the subscription for, and then choose __Cancel Subscription__ to cancel the subscription.
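Before cancelling, the check for leftover deployable models mentioned above can also be done programmatically. The following is a minimal sketch with a hypothetical helper name. It assumes that models created from a model package report the package ARN in the `ModelPackageName` field of their container definitions, as returned by the SageMaker `describe_model` call.

```python
def models_using_package(sagemaker_client, package_arn: str) -> list:
    """Return the names of models whose containers reference the given package.

    sagemaker_client is expected to behave like boto3.client("sagemaker").
    """
    matches = []
    kwargs = {}
    while True:
        page = sagemaker_client.list_models(**kwargs)
        for summary in page.get("Models", []):
            name = summary["ModelName"]
            detail = sagemaker_client.describe_model(ModelName=name)
            # A model may define a single primary container or a container list
            containers = list(detail.get("Containers", []))
            if "PrimaryContainer" in detail:
                containers.append(detail["PrimaryContainer"])
            if any(c.get("ModelPackageName") == package_arn for c in containers):
                matches.append(name)
        token = page.get("NextToken")
        if not token:
            return matches
        kwargs = {"NextToken": token}
```

An empty list from `models_using_package(boto3.client("sagemaker"), model_package_arn)` suggests that no deployable model created from the package remains in the current region.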

