## 1. Subscribe to the model package

To subscribe to the model package:
1. Open the model package listing page <font color='red'> For Seller to update:[Title_of_your_product](Provide link to your marketplace listing of your product).</font>
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click **Accept Offer** if you and your organization agree with the EULA, pricing, and support terms.
1. Click the **Continue to configuration** button and choose a **region**. A **Product Arn** is then displayed. This is the model package ARN that you need to specify when creating a deployable model using Boto3. Copy the ARN corresponding to your region and assign it to `model_package_arn` in the first code cell below.
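
Because the ARN is region-specific, it can help to check that the copied value matches the region of your SageMaker session before deploying. The helper below is a minimal sketch, not part of the original notebook: the function name `model_package_arn_region` is hypothetical, and the example ARN uses a made-up account ID and package name. It relies only on the standard ARN layout `arn:partition:service:region:account-id:resource`.

```python
def model_package_arn_region(arn: str) -> str:
    """Return the region field of a SageMaker model package ARN."""
    # An ARN has six colon-separated fields; the resource is the last one.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    if not parts[5].startswith("model-package/"):
        raise ValueError(f"Not a model package ARN: {arn!r}")
    return parts[3]


# Placeholder ARN for illustration only (account ID and name are made up).
example_arn = (
    "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-model-package"
)
print(model_package_arn_region(example_arn))
```

In the notebook, the returned value could be compared with `sagemaker_session.boto_region_name` once the session has been created.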

## Summarization

Text summarization condenses lengthy text into a brief form while retaining its essential information. The aim is to extract the key details from a document and present them concisely and comprehensibly. In domains such as healthcare, this supports efficient communication and decision-making.


- **Model**: `en.summarize.radiology.pipeline`
- **Model Description**: This pretrained pipeline is built on top of the `summarizer_radiology` model, which summarizes radiology reports while preserving important information such as imaging tests and findings.


In [1]:
model_package_arn = "<Customer to specify Model package ARN corresponding to their AWS region>"

In [2]:
import base64
import json
import uuid
from sagemaker import ModelPackage
import sagemaker as sage
from sagemaker import get_execution_role
import boto3
from IPython.display import Image, display
from PIL import Image as ImageEdit
import numpy as np

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


In [3]:
sagemaker_session = sage.Session()
s3_bucket = sagemaker_session.default_bucket()
region = sagemaker_session.boto_region_name
account_id = boto3.client("sts").get_caller_identity().get("Account")
role = get_execution_role()

sagemaker = boto3.client("sagemaker")
s3_client = sagemaker_session.boto_session.client("s3")
ecr = boto3.client("ecr")
sm_runtime = boto3.client("sagemaker-runtime")

## 2. Create an endpoint and perform real-time inference

If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html).

In [4]:
model_name = "en-summarize-radiology-pipeline"

real_time_inference_instance_type = "ml.m4.xlarge"
batch_transform_inference_instance_type = "ml.m4.xlarge"

### A. Create an endpoint

In [5]:
# create a deployable model from the model package.
model = ModelPackage(
    role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session
)

# Deploy the model
predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)

--------!

Once the endpoint has been created, you can perform real-time inference.

### B. Perform real-time inference

In [6]:
import json
import pandas as pd
import os
import boto3

# Set display options
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

def process_data_and_invoke_realtime_endpoint(data, content_type, accept):

    content_type_to_format = {'application/json': 'json', 'application/jsonlines': 'jsonl'}
    input_format = content_type_to_format.get(content_type)
    if content_type not in content_type_to_format.keys() or accept not in content_type_to_format.keys():
        raise ValueError("Invalid content_type or accept. It should be either 'application/json' or 'application/jsonlines'.")

    i = 1
    input_dir = f'inputs/real-time/{input_format}'
    output_dir = f'outputs/real-time/{input_format}'
    s3_input_dir = f"{model_name}/validation-input/real-time/{input_format}"
    s3_output_dir = f"{model_name}/validation-output/real-time/{input_format}"

    input_file_name = f'{input_dir}/input{i}.{input_format}'
    output_file_name = f'{output_dir}/{os.path.basename(input_file_name)}.out'

    while os.path.exists(input_file_name) or os.path.exists(output_file_name):
        i += 1
        input_file_name = f'{input_dir}/input{i}.{input_format}'
        output_file_name = f'{output_dir}/{os.path.basename(input_file_name)}.out'

    os.makedirs(os.path.dirname(input_file_name), exist_ok=True)
    os.makedirs(os.path.dirname(output_file_name), exist_ok=True)

    input_data = json.dumps(data) if content_type == 'application/json' else data

    # Write input data to file
    with open(input_file_name, 'w') as f:
        f.write(input_data)

    # Upload input data to S3
    s3_client.put_object(Bucket=s3_bucket, Key=f"{s3_input_dir}/{os.path.basename(input_file_name)}", Body=bytes(input_data.encode('UTF-8')))

    # Invoke the SageMaker endpoint
    response = sm_runtime.invoke_endpoint(
        EndpointName=model_name,
        ContentType=content_type,
        Accept=accept,
        Body=input_data,
    )

    # Read response data
    response_data = json.loads(response["Body"].read().decode("utf-8")) if accept == 'application/json' else response['Body'].read().decode('utf-8')

    # Save response data to file
    with open(output_file_name, 'w') as f_out:
        if accept == 'application/json':
            json.dump(response_data, f_out, indent=4)
        else:
            for item in response_data.split('\n'):
                f_out.write(item + '\n')

    # Upload response data to S3
    output_s3_key = f"{s3_output_dir}/{os.path.basename(output_file_name)}"
    if accept == 'application/json':
        s3_client.put_object(Bucket=s3_bucket, Key=output_s3_key, Body=json.dumps(response_data).encode('UTF-8'))
    else:
        s3_client.put_object(Bucket=s3_bucket, Key=output_s3_key, Body=response_data)

    return response_data

### Initial setup

In [7]:
docs = [
    
    """INTERPRETATION: No significant pericardial effusion was identified.
    The aortic root dimensions are within normal limits. The four cardiac chambers dimensions are within normal limits. No discrete regional wall motion abnormalities are identified. The left ventricular systolic function is preserved with an estimated ejection fraction of 60%. The left ventricular wall thickness is within normal limits.

    The aortic valve is trileaflet with adequate excursion of the leaflets. The mitral valve and tricuspid valve motion is unremarkable. The pulmonic valve is not well visualized.

    Color flow and conventional Doppler interrogation of cardiac valvular structures revealed mild mitral regurgitation and mild tricuspid regurgitation with an RV systolic pressure calculated to be 28 mmHg. Doppler interrogation of the mitral in-flow pattern is within normal limits for age.

    IMPRESSION:

    1-Preserved left ventricular systolic function.

    2-Mild mitral regurgitation.

    3-Mild tricuspid regurgitation.""",


    """CT ABDOMEN WITHOUT CONTRAST AND CT PELVIS WITHOUT CONTRAST
    REASON FOR EXAM: Evaluate for retroperitoneal hematoma, the patient has been following, is currently on Coumadin.


    CT ABDOMEN: There is no evidence for a retroperitoneal hematoma.


    The liver, spleen, adrenal glands, and pancreas are unremarkable. Within the superior pole of the left kidney, there is a 3.9 cm cystic lesion. A 3.3 cm cystic lesion is also seen within the inferior pole of the left kidney. No calcifications are noted. The kidneys are small bilaterally.


    CT PELVIS: Evaluation of the bladder is limited due to the presence of a Foley catheter, the bladder is nondistended. The large and small bowels are normal in course and caliber. There is no obstruction.


    Bibasilar pleural effusions are noted.


    IMPRESSION:


    1-No evidence for retroperitoneal bleed.


    2-There are two left-sided cystic lesions within the kidney, correlation with a postcontrast study versus further characterization with an ultrasound is advised as the cystic lesions appear slightly larger as compared to the prior exam.


    3-The kidneys are small in size bilaterally.


    4-Bibasilar pleural effusions."""
]



sample_text = """INDICATIONS: Peripheral vascular disease with claudication.

RIGHT:
1. Normal arterial imaging of right lower extremity.
2. Peak systolic velocity is normal.
3. Arterial waveform is triphasic.
4. Ankle brachial index is 0.96.

LEFT:
1. Normal arterial imaging of left lower extremity.
2. Peak systolic velocity is normal.
3. Arterial waveform is triphasic throughout except in posterior tibial artery where it is biphasic.
4. Ankle brachial index is 1.06.

IMPRESSION: 
Normal arterial imaging of both lower lobes.
"""


### JSON

#### Example 1

  **Input format**:
  
  
```json
{
    "text": "Single text document"
}
```

In [8]:
input_json_data = {"text": sample_text}

data =  process_data_and_invoke_realtime_endpoint(input_json_data, content_type="application/json" , accept="application/json" )
pd.DataFrame(data)

Unnamed: 0,summary
0,"The patient has peripheral vascular disease with claudication. The right lower extremity shows normal arterial imaging, but the peak systolic velocity is normal. The arterial waveform is triphasic throughout, except for the posterior tibial artery, which is biphasic. The ankle brachial index is 0.96. The impression is normal arterial imaging of both lower lobes."


#### Example 2

  **Input format**:
  
  
```json
{
    "text": [
        "Text document 1",
        "Text document 2",
        ...
    ]
}
```
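
Both JSON input shapes use the same `"text"` key, holding either a single string or a list of strings. As a minimal sketch (the helper name `build_json_payload` is hypothetical and not part of the original notebook), the function below builds either shape and rejects inputs that fit neither, which can catch mistakes before the endpoint is invoked.

```python
def build_json_payload(texts):
    """Wrap one document or a list of documents in the {"text": ...} shape."""
    if isinstance(texts, str):
        # Single document: the value is a plain string.
        return {"text": texts}

    documents = list(texts)
    if not documents or not all(isinstance(t, str) for t in documents):
        raise ValueError("Expected a string or a non-empty list of strings.")
    # Multiple documents: the value is a list of strings.
    return {"text": documents}


single = build_json_payload("Single text document")
multiple = build_json_payload(["Text document 1", "Text document 2"])
print(single)
print(multiple)
```

The returned dictionary can be passed as `data` to `process_data_and_invoke_realtime_endpoint` with `content_type="application/json"`, since that function serializes the dictionary itself.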

In [9]:
input_json_data = {"text": docs}

data =  process_data_and_invoke_realtime_endpoint(input_json_data, content_type="application/json" , accept="application/json" )
pd.DataFrame(data)

Unnamed: 0,summary
0,"The radiology report indicates that there is no significant pericardial effusion, and the aortic root dimensions, four cardiac chambers dimensions, and regional wall motion abnormalities are within normal limits. The left ventricular systolic function is preserved with an estimated ejection fraction of 60%, and the left ventricular wall thickness is within normal limits. The aortic valve is trileaflet with adequate excursion of the leaflets, and the mitral valve and tricuspid valve motion is unremarkable. The report also notes mild mitral regurgitation and mild tricuspid regurgitation with an RV systolic pressure calculated to be 28 mmHg. The impression is preserved left ventricular systolic function, mild mitral regurgitation, and mild tricuspid regurgitation."
1,"The patient underwent a CT abdomen without contrast and CT pelvis without contrast to evaluate for retroperitoneal hematoma. The findings showed no evidence of retroperitoneal hematoma, but there were two left-sided cystic lesions within the kidney, which were slightly larger than the prior exam. The kidneys were small in size bilaterally, and there were bibasilar pleural effusions. The bladder was limited due to the presence of a Foley catheter, and further characterization with an ultrasound is advised."


### JSON Lines


In [10]:
import json

def create_jsonl(records):
    json_records = []

    for text in records:
        record = {
            "text": text
        }
        json_records.append(record)

    json_lines = '\n'.join(json.dumps(record) for record in json_records)

    return json_lines

input_jsonl_data = create_jsonl(docs)

#### Example 1

  **Input format**:
  
```json
{"text": "Text document 1"}
{"text": "Text document 2"}
```

In [11]:
data = process_data_and_invoke_realtime_endpoint(input_jsonl_data, content_type="application/jsonlines" , accept="application/jsonlines" )
print(data)

{"summary": "The radiology report indicates that there is no significant pericardial effusion, and the aortic root dimensions, four cardiac chambers dimensions, and regional wall motion abnormalities are within normal limits. The left ventricular systolic function is preserved with an estimated ejection fraction of 60%, and the left ventricular wall thickness is within normal limits. The aortic valve is trileaflet with adequate excursion of the leaflets, and the mitral valve and tricuspid valve motion is unremarkable. The report also notes mild mitral regurgitation and mild tricuspid regurgitation with an RV systolic pressure calculated to be 28 mmHg. The impression is preserved left ventricular systolic function, mild mitral regurgitation, and mild tricuspid regurgitation."}
{"summary": "The patient underwent a CT abdomen without contrast and CT pelvis without contrast to evaluate for retroperitoneal hematoma. The findings showed no evidence of retroperitoneal hematoma, but there were
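
With `accept="application/jsonlines"`, the response is returned as a plain string containing one JSON object per line. The sketch below shows one way to turn such a string into a list of records; the helper name `parse_jsonl` and the sample string are illustrative assumptions, not part of the original notebook.

```python
import json


def parse_jsonl(payload: str) -> list:
    """Parse a JSON Lines string into a list of Python objects."""
    # Skip blank lines, e.g. a trailing newline at the end of the response.
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


# Made-up sample in the same shape as the endpoint response.
sample_response = '{"summary": "First summary."}\n{"summary": "Second summary."}\n'
records = parse_jsonl(sample_response)
print(records)
```

In the notebook, `pd.DataFrame(parse_jsonl(data))` would display the JSON Lines response in the same tabular form as the JSON examples.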

### C. Delete the endpoint

Now that you have successfully performed real-time inference, you no longer need the endpoint. Delete it to avoid further charges.

In [12]:
model.sagemaker_session.delete_endpoint(model_name)
model.sagemaker_session.delete_endpoint_config(model_name)

## 3. Batch inference

In [13]:
import json
import os

input_dir = 'inputs/batch'
json_input_dir = f"{input_dir}/json"
jsonl_input_dir = f"{input_dir}/jsonl"

output_dir = 'outputs/batch'
json_output_dir = f"{output_dir}/json"
jsonl_output_dir = f"{output_dir}/jsonl"

os.makedirs(json_input_dir, exist_ok=True)
os.makedirs(jsonl_input_dir, exist_ok=True)
os.makedirs(json_output_dir, exist_ok=True)
os.makedirs(jsonl_output_dir, exist_ok=True)

validation_json_file_name = "input.json"

validation_jsonl_file_name = "input.jsonl"

validation_input_json_path = f"s3://{s3_bucket}/{model_name}/validation-input/batch/json/"
validation_output_json_path = f"s3://{s3_bucket}/{model_name}/validation-output/batch/json/"

validation_input_jsonl_path = f"s3://{s3_bucket}/{model_name}/validation-input/batch/jsonl/"
validation_output_jsonl_path = f"s3://{s3_bucket}/{model_name}/validation-output/batch/jsonl/"

def write_and_upload_to_s3(input_data, file_name):
    file_format = os.path.splitext(file_name)[1].lower()
    if file_format == ".json":
        input_data = json.dumps(input_data)

    with open(file_name, "w") as f:
        f.write(input_data)

    s3_client.put_object(
        Bucket=s3_bucket,
        Key=f"{model_name}/validation-input/batch/{file_format[1:]}/{os.path.basename(file_name)}",
        Body=(bytes(input_data.encode("UTF-8"))),
    )



In [14]:
input_jsonl_data = create_jsonl(docs)
input_json_data = {"text": docs}

write_and_upload_to_s3(input_json_data, f"{json_input_dir}/{validation_json_file_name}")

write_and_upload_to_s3(input_jsonl_data, f"{jsonl_input_dir}/{validation_jsonl_file_name}")

### JSON

In [None]:
# Initialize a SageMaker Transformer object for making predictions
transformer = model.transformer(
    instance_count=1,
    instance_type=batch_transform_inference_instance_type,
    accept="application/json",
    output_path=validation_output_json_path
)

transformer.transform(validation_input_json_path, content_type="application/json")
transformer.wait()
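
Batch Transform stores each result under the configured output path, using the input file name with `.out` appended. The helper below is a small sketch of that mapping from an output S3 URI and an input file name to the bucket and key of the result object. The function name `batch_output_location` and the example bucket name are hypothetical.

```python
from urllib.parse import urlparse


def batch_output_location(output_s3_uri: str, input_file_name: str):
    """Return (bucket, key) of the Batch Transform result for one input file."""
    parsed = urlparse(output_s3_uri)
    # Drop the leading "/" of the URI path to obtain the S3 key prefix.
    prefix = parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return parsed.netloc, f"{prefix}{input_file_name}.out"


# Placeholder bucket name for illustration only.
bucket, key = batch_output_location(
    "s3://example-bucket/en-summarize-radiology-pipeline/validation-output/batch/json/",
    "input.json",
)
print(bucket, key)
```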

In [16]:
from urllib.parse import urlparse

def process_s3_json_output_and_save(validation_file_name):

    output_file_path = f"{json_output_dir}/{validation_file_name}.out"
    parsed_url = urlparse(transformer.output_path)
    file_key = f"{parsed_url.path[1:]}{validation_file_name}.out"
    response = s3_client.get_object(Bucket=s3_bucket, Key=file_key)

    data = json.loads(response["Body"].read().decode("utf-8"))
    df = pd.DataFrame(data)
    display(df)

    # Save the data to the output file
    with open(output_file_path, 'w') as f_out:
        json.dump(data, f_out, indent=4)

In [17]:
process_s3_json_output_and_save(validation_json_file_name)

Unnamed: 0,summary
0,"The radiology report indicates that there is no significant pericardial effusion, and the aortic root dimensions, four cardiac chambers dimensions, and regional wall motion abnormalities are within normal limits. The left ventricular systolic function is preserved with an estimated ejection fraction of 60%, and the left ventricular wall thickness is within normal limits. The aortic valve is trileaflet with adequate excursion of the leaflets, and the mitral valve and tricuspid valve motion is unremarkable. The report also notes mild mitral regurgitation and mild tricuspid regurgitation with an RV systolic pressure calculated to be 28 mmHg. The impression is preserved left ventricular systolic function, mild mitral regurgitation, and mild tricuspid regurgitation."
1,"The patient underwent a CT abdomen without contrast and CT pelvis without contrast to evaluate for retroperitoneal hematoma. The findings showed no evidence of retroperitoneal hematoma, but there were two left-sided cystic lesions within the kidney, which were slightly larger than the prior exam. The kidneys were small in size bilaterally, and there were bibasilar pleural effusions. The bladder was limited due to the presence of a Foley catheter, and further characterization with an ultrasound is advised."


### JSON Lines

In [None]:
transformer = model.transformer(
    instance_count=1,
    instance_type=batch_transform_inference_instance_type,
    accept="application/jsonlines",
    output_path=validation_output_jsonl_path
)
transformer.transform(validation_input_jsonl_path, content_type="application/jsonlines")
transformer.wait()

In [19]:
from urllib.parse import urlparse

def process_s3_jsonlines_output_and_save(validation_file_name):

    output_file_path = f"{jsonl_output_dir}/{validation_file_name}.out"
    parsed_url = urlparse(transformer.output_path)
    file_key = f"{parsed_url.path[1:]}{validation_file_name}.out"
    response = s3_client.get_object(Bucket=s3_bucket, Key=file_key)

    data = response["Body"].read().decode("utf-8")
    print(data)

    # Save the data to the output file
    with open(output_file_path, 'w') as f_out:
        for item in data.split('\n'):
            f_out.write(item + '\n')

In [20]:
process_s3_jsonlines_output_and_save(validation_jsonl_file_name)

{"summary": "The radiology report indicates that there is no significant pericardial effusion, and the aortic root dimensions, four cardiac chambers dimensions, and regional wall motion abnormalities are within normal limits. The left ventricular systolic function is preserved with an estimated ejection fraction of 60%, and the left ventricular wall thickness is within normal limits. The aortic valve is trileaflet with adequate excursion of the leaflets, and the mitral valve and tricuspid valve motion is unremarkable. The report also notes mild mitral regurgitation and mild tricuspid regurgitation with an RV systolic pressure calculated to be 28 mmHg. The impression is preserved left ventricular systolic function, mild mitral regurgitation, and mild tricuspid regurgitation."}
{"summary": "The patient underwent a CT abdomen without contrast and CT pelvis without contrast to evaluate for retroperitoneal hematoma. The findings showed no evidence of retroperitoneal hematoma, but there were

In [21]:
model.delete_model()

INFO:sagemaker:Deleting model with name: en-summarize-radiology-pipeline-2024-05-08-09-18-33-234


### Unsubscribe from the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust).
2. Locate the listing whose subscription you want to cancel, and then choose __Cancel Subscription__.
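
The check for remaining deployable models can also be scripted. The sketch below assumes the response shape of the boto3 SageMaker `describe_model` call, where each container definition may carry a `ModelPackageName` field. The helper name `containers_using_package` and the sample dictionary are hypothetical and only illustrate the idea.

```python
def containers_using_package(model_description: dict, package_arn: str) -> list:
    """Return the container definitions of a model that reference package_arn."""
    # A model may define a list of containers, a primary container, or both.
    containers = list(model_description.get("Containers", []))
    primary = model_description.get("PrimaryContainer")
    if primary:
        containers.append(primary)
    return [c for c in containers if c.get("ModelPackageName") == package_arn]


# Made-up description in the assumed shape of a describe_model response.
example_arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/example"
example_description = {
    "ModelName": "example-model",
    "PrimaryContainer": {"ModelPackageName": example_arn},
}
print(containers_using_package(example_description, example_arn))
```

An empty list for every model in the account would suggest that no deployable model still depends on the model package.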

