## 1. Subscribe to the model package

To subscribe to the model package:
1. Open the model package listing page: [Extract entities from patient narratives](https://aws.amazon.com/marketplace/pp/prodview-y45tz4mzlqlym)
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click **Accept Offer** if you and your organization agree with the EULA, pricing, and support terms.
1. Click the **Continue to configuration** button and choose a **region**. A **Product Arn** is then displayed. This is the model package ARN that you need to specify when creating a deployable model using Boto3. Copy the ARN corresponding to your region and specify it in the following cell.

## Explain Voice Of Patient (VOP) Pipeline


- **Model**: [explain_clinical_doc_vop_small_en](https://nlp.johnsnowlabs.com/2024/09/09/explain_clinical_doc_vop_small_en.html)
- **Model Description**: This pipeline extracts all clinical/medical entities from text, assigns assertion statuses, and establishes relationships between the extracted entities.


In [1]:
model_package_arn = "<Customer to specify Model package ARN corresponding to their AWS region>"
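
Because the ARN is region-specific, a quick sanity check before deployment can catch an ARN copied for the wrong region. The following is a minimal sketch, assuming the standard ARN layout `arn:partition:service:region:account-id:resource`. The helper `arn_region` and the sample ARN are hypothetical illustrations, not part of the listing.

```python
def arn_region(arn: str) -> str:
    """Return the region field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a well-formed ARN: {arn!r}")
    return parts[3]


# Hypothetical ARN used only for illustration.
example_arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-package"
print(arn_region(example_arn))
```

In this notebook, one could compare `arn_region(model_package_arn)` with `region` once the session cell below has run.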

In [None]:
import json
import os
import boto3
import pandas as pd
import sagemaker as sage
from sagemaker import ModelPackage
from sagemaker import get_execution_role
from IPython.display import display
from urllib.parse import urlparse

In [None]:
sagemaker_session = sage.Session()
s3_bucket = sagemaker_session.default_bucket()
region = sagemaker_session.boto_region_name
account_id = boto3.client("sts").get_caller_identity().get("Account")
role = get_execution_role()

sagemaker = boto3.client("sagemaker")
s3_client = sagemaker_session.boto_session.client("s3")
ecr = boto3.client("ecr")
sm_runtime = boto3.client("sagemaker-runtime")

# Set display options
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

In [4]:
model_name = "explain-clinical-doc-vop-small-en"

real_time_inference_instance_type = "ml.m4.xlarge"
batch_transform_inference_instance_type = "ml.m4.2xlarge"

## 2. Create a deployable model from the model package

In [5]:
model = ModelPackage(
    role=role, 
    model_package_arn=model_package_arn,
    sagemaker_session=sagemaker_session,
)

### Input Format

To use the model, you need to provide input in one of the following supported formats:

#### JSON Format

Provide input as JSON. We support two variations within this format:

1. **Array of Text Documents**: 
   Use an array containing multiple text documents. Each element represents a separate text document.

   ```json
   {
       "text": [
           "Text document 1",
           "Text document 2",
           ...
       ]
   }
   ```

2. **Single Text Document**:
   Provide a single text document as a string.

   ```json
   {
       "text": "Single text document"
   }
   ```

#### JSON Lines (JSONL) Format

Provide input in JSON Lines format, where each line is a JSON object representing a text document.

```
{"text": "Text document 1"}
{"text": "Text document 2"}
```
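
The three input shapes above can be produced with the standard `json` module. This is a small sketch, and the variable names are illustrative only.

```python
import json

documents = ["Text document 1", "Text document 2"]

# JSON, array of documents: one object whose "text" field is a list.
json_array_payload = json.dumps({"text": documents})

# JSON, single document: one object whose "text" field is a string.
json_single_payload = json.dumps({"text": documents[0]})

# JSON Lines: one JSON object per line, each holding one document.
jsonl_payload = "\n".join(json.dumps({"text": doc}) for doc in documents)

print(jsonl_payload)
```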

## 3. Create an endpoint and perform real-time inference

If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html).

### A. Deploy the SageMaker model to an endpoint

In [None]:
predictor = model.deploy(
    initial_instance_count=1,
    instance_type=real_time_inference_instance_type, 
    endpoint_name=model_name,
)

Once the endpoint has been created, you can perform real-time inference.

In [8]:
def invoke_realtime_endpoint(record, content_type="application/json", accept="application/json"):
    response = sm_runtime.invoke_endpoint(
        EndpointName=model_name,
        ContentType=content_type,
        Accept=accept,
        Body=json.dumps(record) if content_type == "application/json" else record,
    )

    response_body = response["Body"].read().decode("utf-8")

    if accept == "application/json":
        return json.loads(response_body)
    elif accept == "application/jsonlines":
        return response_body
    else:
        raise ValueError(f"Unsupported accept type: {accept}")

### Initial Setup

In [9]:
docs = [
    "My nose is dry most of the time, when my nose is not dry, I have to clear it of hard mucus – when I do that, it makes my nose throb.",
    "Morning Symptoms:Excessive Mucus build upExcessive coughing 10 minutes after waking upFuzzy sinus headache every morningSore throat every morningBloated stomachExtremely tight chest (Has an inhaler for the asthma)Mid-day Symptoms:Fuzzy headache throughout the dayExtreme fatigue around 3pm every dayEnd-of-day Symptoms:Headache increases to be unbearable.",
]

sample_text = """I had been feeling really tired all the time and was losing weight without even trying. My doctor checked my sugar levels and they came out to be high. So, I have type 2 diabetes. He put me on two medications - I take metformin 500 mg twice a day, and glipizide 5 mg before breakfast and dinner. I also have to watch what I eat and try to exercise more. Now, I also have chronic acid reflux disease or GERD. Now I take daily omeprazole 20 mg to control the heartburn symptoms."""

### JSON

In [10]:
input_json_data = {"text": sample_text}
data = invoke_realtime_endpoint(input_json_data, content_type="application/json", accept="application/json")

ner_df = pd.DataFrame(data[0]["ner_predictions"])
assertion_df = pd.DataFrame(data[0]["assertion_predictions"])
relation_df = pd.DataFrame(data[0]["relation_predictions"])

In [11]:
ner_df

| | ner_chunk | begin | end | ner_label | ner_confidence |
|---|---|---|---|---|---|
| 0 | really | 19 | 24 | Modifier | 0.5723 |
| 1 | tired | 26 | 30 | Symptom | 0.9959 |
| 2 | all the time | 32 | 43 | Duration | 0.5925 |
| 3 | losing weight | 53 | 65 | Symptom | 0.81445 |
| 4 | doctor | 91 | 96 | Employment | 0.9901 |
| 5 | sugar levels | 109 | 120 | Test | 0.82229996 |
| 6 | high | 146 | 149 | TestResult | 0.9231 |
| 7 | type 2 diabetes | 163 | 177 | Disease | 0.40653333 |
| 8 | He | 180 | 181 | Gender | 1.0 |
| 9 | metformin | 218 | 226 | Drug | 0.9985 |


In [12]:
assertion_df

| | ner_chunk | begin | end | ner_label | assertion | assertion_confidence |
|---|---|---|---|---|---|---|
| 0 | tired | 26 | 30 | Symptom | Present_Or_Past | 0.9999 |
| 1 | losing weight | 53 | 65 | Symptom | Present_Or_Past | 0.9983 |
| 2 | doctor | 91 | 96 | Employment | SomeoneElse | 1.0 |
| 3 | sugar levels | 109 | 120 | Test | Present_Or_Past | 0.886 |
| 4 | high | 146 | 149 | TestResult | Present_Or_Past | 0.5596 |
| 5 | type 2 diabetes | 163 | 177 | Disease | Present_Or_Past | 0.5282 |
| 6 | metformin | 218 | 226 | Drug | Present_Or_Past | 0.958 |
| 7 | glipizide | 252 | 260 | Drug | Present_Or_Past | 0.8632 |
| 8 | acid reflux disease | 379 | 397 | Disease | Present_Or_Past | 1.0 |
| 9 | GERD | 402 | 405 | Disease | Present_Or_Past | 0.9688 |


In [13]:
relation_df

| | ner_chunk1 | ner_chunk1_begin | ner_chunk1_end | ner_label1 | ner_chunk2 | ner_chunk2_begin | ner_chunk2_end | ner_label2 | relations | relation_confidence |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sugar levels | 109 | 120 | Test | high | 146 | 149 | TestResult | Test-TestResult | 1.0 |
| 1 | metformin | 218 | 226 | Drug | 500 mg | 228 | 233 | Dosage | Drug-Dosage | 1.0 |
| 2 | metformin | 218 | 226 | Drug | twice a day | 235 | 245 | Frequency | Drug-Frequency | 1.0 |
| 3 | metformin | 218 | 226 | Drug | 5 mg | 262 | 265 | Dosage | Drug-Dosage | 1.0 |
| 4 | 500 mg | 228 | 233 | Dosage | glipizide | 252 | 260 | Drug | Dosage-Drug | 1.0 |
| 5 | glipizide | 252 | 260 | Drug | 5 mg | 262 | 265 | Dosage | Drug-Dosage | 1.0 |
| 6 | Now | 408 | 410 | DateTime | omeprazole | 425 | 434 | Drug | DateTime-Drug | 1.0 |
| 7 | daily | 419 | 423 | Frequency | omeprazole | 425 | 434 | Drug | Frequency-Drug | 1.0 |
| 8 | omeprazole | 425 | 434 | Drug | 20 mg | 436 | 440 | Dosage | Drug-Dosage | 1.0 |
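
The prediction lists can also be filtered without pandas. The sketch below keeps entities at or above a confidence threshold. It converts confidences with `float()` because the raw JSON Lines output shown later in this notebook encodes them as strings. The helper `filter_by_confidence`, the 0.8 threshold, and the three sample records (taken from the NER output above) are illustrative choices.

```python
def filter_by_confidence(predictions, key, threshold):
    """Keep predictions whose confidence (string or number) is >= threshold."""
    return [p for p in predictions if float(p[key]) >= threshold]


sample_ner = [
    {"ner_chunk": "really", "ner_label": "Modifier", "ner_confidence": "0.5723"},
    {"ner_chunk": "tired", "ner_label": "Symptom", "ner_confidence": "0.9959"},
    {"ner_chunk": "losing weight", "ner_label": "Symptom", "ner_confidence": "0.81445"},
]

confident = filter_by_confidence(sample_ner, "ner_confidence", 0.8)
print([p["ner_chunk"] for p in confident])
```

The same helper works for the assertion and relation lists by passing `"assertion_confidence"` or `"relation_confidence"` as the key.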


### JSON Lines

In [14]:
def create_jsonl(records):
    if isinstance(records, str):
        records = [records]
    json_records = [{"text": text} for text in records]
    json_lines = "\n".join(json.dumps(record) for record in json_records)
    return json_lines

In [15]:
input_jsonl_data = create_jsonl(sample_text)
data = invoke_realtime_endpoint(input_jsonl_data, content_type="application/jsonlines" , accept="application/jsonlines" )
print(data)

{"ner_predictions": [{"ner_chunk": "really", "begin": 19, "end": 24, "ner_label": "Modifier", "ner_confidence": "0.5723"}, {"ner_chunk": "tired", "begin": 26, "end": 30, "ner_label": "Symptom", "ner_confidence": "0.9959"}, {"ner_chunk": "all the time", "begin": 32, "end": 43, "ner_label": "Duration", "ner_confidence": "0.5925"}, {"ner_chunk": "losing weight", "begin": 53, "end": 65, "ner_label": "Symptom", "ner_confidence": "0.81445"}, {"ner_chunk": "doctor", "begin": 91, "end": 96, "ner_label": "Employment", "ner_confidence": "0.9901"}, {"ner_chunk": "sugar levels", "begin": 109, "end": 120, "ner_label": "Test", "ner_confidence": "0.82229996"}, {"ner_chunk": "high", "begin": 146, "end": 149, "ner_label": "TestResult", "ner_confidence": "0.9231"}, {"ner_chunk": "type 2 diabetes", "begin": 163, "end": 177, "ner_label": "Disease", "ner_confidence": "0.40653333"}, {"ner_chunk": "He", "begin": 180, "end": 181, "ner_label": "Gender", "ner_confidence": "1.0"}, {"ner_chunk": "metformin", "beg
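
With `accept="application/jsonlines"` the helper returns the raw string, so each line still has to be decoded. The following is a minimal sketch, assuming one JSON object per non-empty line. `parse_jsonl` is a hypothetical helper, and the sample string is a shortened stand-in for a real response.

```python
import json


def parse_jsonl(payload: str) -> list:
    """Decode a JSON Lines string into a list of Python objects, skipping blank lines."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


# Shortened stand-in for an endpoint response (two documents).
sample_response = (
    '{"ner_predictions": [{"ner_chunk": "tired", "ner_label": "Symptom"}]}\n'
    '{"ner_predictions": []}\n'
)

for doc_index, result in enumerate(parse_jsonl(sample_response)):
    print(doc_index, len(result["ner_predictions"]))
```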

### B. Delete the endpoint

Now that you have successfully performed real-time inference, you no longer need the endpoint. You can delete it to avoid being charged.

In [None]:
model.sagemaker_session.delete_endpoint(model_name)
model.sagemaker_session.delete_endpoint_config(model_name)

## 4. Batch inference

In [17]:
validation_json_file_name = "input.json"
validation_jsonl_file_name = "input.jsonl"

validation_input_json_path = f"s3://{s3_bucket}/{model_name}/validation-input/json/"
validation_output_json_path = f"s3://{s3_bucket}/{model_name}/validation-output/json/"

validation_input_jsonl_path = f"s3://{s3_bucket}/{model_name}/validation-input/jsonl/"
validation_output_jsonl_path = f"s3://{s3_bucket}/{model_name}/validation-output/jsonl/"

def upload_to_s3(input_data, file_name):
    file_format = os.path.splitext(file_name)[1].lower()
    s3_client.put_object(
        Bucket=s3_bucket,
        Key=f"{model_name}/validation-input/{file_format[1:]}/{file_name}",
        Body=input_data.encode("UTF-8"),
    )

In [18]:
# Create JSON and JSON Lines data
input_jsonl_data = create_jsonl(docs)
input_json_data = json.dumps({"text": docs})

# Upload JSON and JSON Lines data to S3
upload_to_s3(input_json_data, validation_json_file_name)
upload_to_s3(input_jsonl_data, validation_jsonl_file_name)

### JSON

In [None]:
transformer = model.transformer(
    instance_count=1,
    instance_type=batch_transform_inference_instance_type,
    accept="application/json",
    output_path=validation_output_json_path
)

transformer.transform(validation_input_json_path, content_type="application/json")
transformer.wait()

In [20]:
def retrieve_json_output_from_s3(validation_file_name):
    parsed_url = urlparse(transformer.output_path)
    file_key = f"{parsed_url.path[1:]}{validation_file_name}.out"
    response = s3_client.get_object(Bucket=s3_bucket, Key=file_key)

    data = json.loads(response["Body"].read().decode("utf-8"))
    display(data)
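
Batch transform stores each result under the output path, using the input file name with `.out` appended, which is what the function above relies on. The sketch below isolates that key computation so it can be checked without touching S3. The helper `output_key` and the bucket name are hypothetical.

```python
from urllib.parse import urlparse


def output_key(output_path: str, input_file_name: str) -> str:
    """Build the S3 object key of a batch transform result for one input file."""
    prefix = urlparse(output_path).path.lstrip("/")
    # Make sure the prefix and the file name are separated by exactly one slash.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return f"{prefix}{input_file_name}.out"


# Hypothetical bucket; the prefix mirrors the layout used in this notebook.
path = "s3://example-bucket/explain-clinical-doc-vop-small-en/validation-output/json/"
print(output_key(path, "input.json"))
```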

In [21]:
retrieve_json_output_from_s3(validation_json_file_name)

[{'ner_predictions': [{'ner_chunk': 'nose',
    'begin': 3,
    'end': 6,
    'ner_label': 'BodyPart',
    'ner_confidence': '0.9717'},
   {'ner_chunk': 'dry',
    'begin': 11,
    'end': 13,
    'ner_label': 'Symptom',
    'ner_confidence': '0.9457'},
   {'ner_chunk': 'nose',
    'begin': 41,
    'end': 44,
    'ner_label': 'BodyPart',
    'ner_confidence': '0.9659'},
   {'ner_chunk': 'dry',
    'begin': 53,
    'end': 55,
    'ner_label': 'Symptom',
    'ner_confidence': '0.9596'},
   {'ner_chunk': 'hard',
    'begin': 80,
    'end': 83,
    'ner_label': 'Modifier',
    'ner_confidence': '0.7837'},
   {'ner_chunk': 'mucus',
    'begin': 85,
    'end': 89,
    'ner_label': 'Symptom',
    'ner_confidence': '0.9452'},
   {'ner_chunk': 'nose',
    'begin': 121,
    'end': 124,
    'ner_label': 'BodyPart',
    'ner_confidence': '0.9109'}],
  'assertion_predictions': [{'ner_chunk': 'dry',
    'begin': 11,
    'end': 13,
    'ner_label': 'Symptom',
    'assertion': 'Present_Or_Past',
    'a

### JSON Lines

In [None]:
transformer = model.transformer(
    instance_count=1,
    instance_type=batch_transform_inference_instance_type,
    accept="application/jsonlines",
    output_path=validation_output_jsonl_path
)
transformer.transform(validation_input_jsonl_path, content_type="application/jsonlines")
transformer.wait()

In [23]:
def retrieve_jsonlines_output_from_s3(validation_file_name):

    parsed_url = urlparse(transformer.output_path)
    file_key = f"{parsed_url.path[1:]}{validation_file_name}.out"
    response = s3_client.get_object(Bucket=s3_bucket, Key=file_key)

    data = response["Body"].read().decode("utf-8")
    print(data)

In [24]:
retrieve_jsonlines_output_from_s3(validation_jsonl_file_name)

{"ner_predictions": [{"ner_chunk": "nose", "begin": 3, "end": 6, "ner_label": "BodyPart", "ner_confidence": "0.9717"}, {"ner_chunk": "dry", "begin": 11, "end": 13, "ner_label": "Symptom", "ner_confidence": "0.9457"}, {"ner_chunk": "nose", "begin": 41, "end": 44, "ner_label": "BodyPart", "ner_confidence": "0.9659"}, {"ner_chunk": "dry", "begin": 53, "end": 55, "ner_label": "Symptom", "ner_confidence": "0.9596"}, {"ner_chunk": "hard", "begin": 80, "end": 83, "ner_label": "Modifier", "ner_confidence": "0.7837"}, {"ner_chunk": "mucus", "begin": 85, "end": 89, "ner_label": "Symptom", "ner_confidence": "0.9452"}, {"ner_chunk": "nose", "begin": 121, "end": 124, "ner_label": "BodyPart", "ner_confidence": "0.9109"}], "assertion_predictions": [{"ner_chunk": "dry", "begin": 11, "end": 13, "ner_label": "Symptom", "assertion": "Present_Or_Past", "assertion_confidence": "0.8839"}, {"ner_chunk": "dry", "begin": 53, "end": 55, "ner_label": "Symptom", "assertion": "Hypothetical_Or_Absent", "assertion_c

In [None]:
model.delete_model()

### Unsubscribe from the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)
2. Locate the listing whose subscription you want to cancel, and then choose __Cancel Subscription__.

