## 1. Subscribe to the model package

To subscribe to the model package:
1. Open the model package listing page: [Anatomic therapeutic chemical resolver](https://aws.amazon.com/marketplace/pp/prodview-53ydbwzfkmbmo).
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click **Accept Offer** if you and your organization agree with the EULA, pricing, and support terms.
1. Once you click the **Continue to configuration** button and choose a **region**, a **Product Arn** is displayed. This is the model package ARN that you need to specify when creating a deployable model using Boto3. Copy the ARN corresponding to your region and assign it to `model_package_arn` in the cell below.

## Pipeline for Anatomic Therapeutic Chemical (ATC) Sentence Entity Resolver

- **Model**: `atc_vdb_resolver`
- **Model Description**: This pretrained pipeline extracts `DRUG` entities from clinical text and maps them to their corresponding Anatomic Therapeutic Chemical (ATC) codes.

In [1]:
model_package_arn = "<Customer to specify Model package ARN corresponding to their AWS region>"
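
Because the ARN is specific to a region, a quick sanity check before deployment can catch a copy-paste mistake early. The following is a minimal sketch, not part of the original notebook: the helper names `parse_model_package_arn` and `arn_matches_region` and the example ARN (account `123456789012`, package `example-model-package`) are hypothetical, and the check assumes the usual `arn:<partition>:sagemaker:<region>:<account>:model-package/<name>` layout.

```python
def parse_model_package_arn(arn):
    """Split a SageMaker model package ARN into labeled parts.

    Raises ValueError if the string does not look like a model package ARN,
    which is also what happens with the unedited placeholder text.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    resource_type, _, resource_name = parts[5].partition("/")
    if resource_type != "model-package" or not resource_name:
        raise ValueError(f"Not a model package ARN: {arn!r}")
    return {
        "partition": parts[1],
        "region": parts[3],
        "account_id": parts[4],
        "name": resource_name,
    }


def arn_matches_region(arn, region):
    """Return True when the ARN's region equals the given region name."""
    return parse_model_package_arn(arn)["region"] == region


# Hypothetical ARN, used only to illustrate the check.
example_arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-model-package"
print(parse_model_package_arn(example_arn))
print(arn_matches_region(example_arn, "us-east-1"))
```

In the notebook, the same check could be run as `arn_matches_region(model_package_arn, region)` once `region` has been read from the SageMaker session.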

In [None]:
import json
import os
import boto3
import pandas as pd
import sagemaker as sage
from sagemaker import ModelPackage
from sagemaker import get_execution_role
from IPython.display import display
from urllib.parse import urlparse


In [3]:
sagemaker_session = sage.Session()
s3_bucket = sagemaker_session.default_bucket()
region = sagemaker_session.boto_region_name
account_id = boto3.client("sts").get_caller_identity().get("Account")
role = get_execution_role()

sagemaker = boto3.client("sagemaker")
s3_client = sagemaker_session.boto_session.client("s3")
ecr = boto3.client("ecr")
sm_runtime = boto3.client("sagemaker-runtime")

# Set display options
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

In [5]:
model_name = "atc-vdb-resolver"

real_time_inference_instance_type = "ml.m4.xlarge"
batch_transform_inference_instance_type = "ml.m4.2xlarge"

## 2. Create a deployable model from the model package

In [6]:
model = ModelPackage(
    role=role, 
    model_package_arn=model_package_arn,
    sagemaker_session=sagemaker_session,
)

### Input Format

To use the model, you need to provide input in one of the following supported formats:

#### JSON Format

Provide input as JSON. We support two variations within this format:

1. **Array of Text Documents**: 
   Use an array containing multiple text documents. Each element represents a separate text document.

   ```json
   {
       "text": [
           "Text document 1",
           "Text document 2",
           ...
       ]
   }
   ```

2. **Single Text Document**:
   Provide a single text document as a string.


   ```json
   {
       "text": "Single text document"
   }
   ```

#### JSON Lines (JSONL) Format

Provide input in JSON Lines format, where each line is a JSON object representing a text document.

```
{"text": "Text document 1"}
{"text": "Text document 2"}
```
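
The formats above can be checked locally before a request is sent. The sketch below is illustrative only: the function names are hypothetical, and the rules it enforces (a `"text"` key holding a string or a non-empty list of strings, and one `{"text": "<string>"}` object per JSONL line) are assumptions read off the examples above, not a specification published by the model.

```python
import json


def validate_json_payload(payload):
    """Check a JSON payload and return the number of documents it carries."""
    if not isinstance(payload, dict) or "text" not in payload:
        raise ValueError('payload must be an object with a "text" key')
    text = payload["text"]
    if isinstance(text, str):
        return 1  # single text document
    if isinstance(text, list) and text and all(isinstance(t, str) for t in text):
        return len(text)  # array of text documents
    raise ValueError('"text" must be a string or a non-empty list of strings')


def validate_jsonl_payload(body):
    """Check a JSON Lines body and return the number of documents it carries."""
    count = 0
    for line_number, line in enumerate(body.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        if not isinstance(record, dict) or not isinstance(record.get("text"), str):
            raise ValueError(f"line {line_number}: expected an object with a string 'text'")
        count += 1
    return count


print(validate_json_payload({"text": ["Text document 1", "Text document 2"]}))
print(validate_jsonl_payload('{"text": "Text document 1"}\n{"text": "Text document 2"}'))
```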

## 3. Create an endpoint and perform real-time inference

To understand how real-time inference with Amazon SageMaker works, see the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html).

### A. Deploy the SageMaker model to an endpoint

In [7]:
predictor = model.deploy(
    initial_instance_count=1,
    instance_type=real_time_inference_instance_type, 
    endpoint_name=model_name,
)

------------!

Once the endpoint has been created, you can perform real-time inference.

In [8]:
def invoke_realtime_endpoint(record, content_type="application/json", accept="application/json"):
    response = sm_runtime.invoke_endpoint(
        EndpointName=model_name,
        ContentType=content_type,
        Accept=accept,
        Body=json.dumps(record) if content_type == "application/json" else record,
    )

    response_body = response["Body"].read().decode("utf-8")

    if accept == "application/json":
        return json.loads(response_body)
    elif accept == "application/jsonlines":
        return response_body
    else:
        raise ValueError(f"Unsupported accept type: {accept}")

### Initial Setup

In [10]:
docs = [
    """He was seen by the endocrinology service and she was discharged on eltrombopag at night, amlodipine with meals metformin two times a day.""",
    """She was given antidepressant for a month""",
]

sample_text = """She was immediately given hydrogen peroxide 30 mg and amoxicillin twice daily for 10 days to treat the infection on her leg. She has a history of taking magnesium hydroxide."""

### JSON

In [11]:
input_json_data = {"text": sample_text}
response_json = invoke_realtime_endpoint(input_json_data, content_type="application/json", accept="application/json")
pd.DataFrame(response_json["predictions"][0])

| | begin | end | ner_chunk | ner_label | ner_confidence | concept_code | resolution | score | concept_class_id | all_codes | all_resolutions | all_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 26 | 48 | hydrogen peroxide 30 mg | DRUG | 0.661975 | A01AB02 | hydrogen peroxide | 0.814119 | [ATC 5th, ATC 5th, ATC 5th, ATC 5th, ATC 5th] | [A01AB02, D11AX25, D11AX25, A01AB02, A02AA03] | [hydrogen peroxide , hydrogen peroxide / lactate , hydrogen peroxide / menthol , hydrogen peroxide oral solution, magnesium peroxide] | [0.8141188025474548, 0.806573748588562, 0.805327296257019, 0.803802490234375, 0.795920193195343] |
| 1 | 54 | 64 | amoxicillin | DRUG | 0.9962 | J01CA04 | amoxicillin | 1.0 | [ATC 5th, ATC 5th, ATC 5th, ATC 5th, ATC 5th] | [J01CA04, J01CA04, J01CA04, J01CA04, J01CA04] | [amoxicillin , amoxicillin; systemic, amoxicillin / clonixin , amoxicillin / floxacillin , ambroxol / amoxicillin ] | [1.000000238418579, 0.8760582804679871, 0.8621703386306763, 0.855049192905426, 0.8429436683654785] |
| 2 | 153 | 171 | magnesium hydroxide | DRUG | 0.90405 | A02AA04 | magnesium hydroxide | 1.0 | [ATC 5th, ATC 5th, ATC 5th, ATC 5th, ATC 5th] | [A02AA04, A02AA04, D10AX30, A02AA02, A02AA04] | [magnesium hydroxide , aluminum hydroxide / magnesium hydroxide , aluminum oxide / magnesium hydroxide , aluminum hydroxide / magnesium oxide , magnesium hydroxide / mineral oil ] | [0.9999998807907104, 0.876952588558197, 0.8733229637145996, 0.8709627389907837, 0.8609710335731506] |
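
Two details of this output are easy to miss: `begin` and `end` are character offsets into the input text with an inclusive `end`, and `all_codes`, `all_resolutions`, and `all_score` are parallel lists of ranked candidates. The sketch below checks both against values copied from the output above. The helper names are hypothetical, and the records are abridged to the fields used here.

```python
sample_text = (
    "She was immediately given hydrogen peroxide 30 mg and amoxicillin twice daily "
    "for 10 days to treat the infection on her leg. She has a history of taking "
    "magnesium hydroxide."
)

# Abridged predictions copied from the output shown above.
predictions = [
    {
        "begin": 26,
        "end": 48,
        "ner_chunk": "hydrogen peroxide 30 mg",
        "all_codes": ["A01AB02", "D11AX25", "D11AX25", "A01AB02", "A02AA03"],
        "all_resolutions": [
            "hydrogen peroxide ",
            "hydrogen peroxide / lactate ",
            "hydrogen peroxide / menthol ",
            "hydrogen peroxide oral solution",
            "magnesium peroxide",
        ],
        "all_score": [
            0.8141188025474548,
            0.806573748588562,
            0.805327296257019,
            0.803802490234375,
            0.795920193195343,
        ],
    },
    {"begin": 54, "end": 64, "ner_chunk": "amoxicillin"},
    {"begin": 153, "end": 171, "ner_chunk": "magnesium hydroxide"},
]


def chunk_from_offsets(text, prediction):
    """Recover the entity text; `end` is inclusive, hence the + 1."""
    return text[prediction["begin"] : prediction["end"] + 1]


def ranked_candidates(prediction):
    """Pair each candidate code with its resolution and score, best first."""
    return list(
        zip(prediction["all_codes"], prediction["all_resolutions"], prediction["all_score"])
    )


for p in predictions:
    print(repr(chunk_from_offsets(sample_text, p)), "==", repr(p["ner_chunk"]))

for code, resolution, score in ranked_candidates(predictions[0]):
    print(code, resolution.strip(), round(score, 4))
```

The resolution strings carry trailing spaces in the raw output, so `strip()` is applied before printing.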


### JSON Lines

In [13]:
def create_jsonl(records):
    if isinstance(records, str):
        records = [records]
    json_records = [{"text": text} for text in records]
    json_lines = "\n".join(json.dumps(record) for record in json_records)
    return json_lines

In [14]:
input_jsonl_data = create_jsonl(sample_text)
data = invoke_realtime_endpoint(input_jsonl_data, content_type="application/jsonlines" , accept="application/jsonlines" )
print(data)

{"predictions": [{"begin": 26, "end": 48, "ner_chunk": "hydrogen peroxide 30 mg", "ner_label": "DRUG", "ner_confidence": "0.661975", "concept_code": "A01AB02", "resolution": "hydrogen peroxide ", "score": 0.8141188025474548, "concept_class_id": ["ATC 5th", "ATC 5th", "ATC 5th", "ATC 5th", "ATC 5th"], "all_codes": ["A01AB02", "D11AX25", "D11AX25", "A01AB02", "A02AA03"], "all_resolutions": ["hydrogen peroxide ", "hydrogen peroxide / lactate ", "hydrogen peroxide / menthol ", "hydrogen peroxide oral solution", "magnesium peroxide"], "all_score": [0.8141188025474548, 0.806573748588562, 0.805327296257019, 0.803802490234375, 0.795920193195343]}, {"begin": 54, "end": 64, "ner_chunk": "amoxicillin", "ner_label": "DRUG", "ner_confidence": "0.9962", "concept_code": "J01CA04", "resolution": "amoxicillin ", "score": 1.000000238418579, "concept_class_id": ["ATC 5th", "ATC 5th", "ATC 5th", "ATC 5th", "ATC 5th"], "all_codes": ["J01CA04", "J01CA04", "J01CA04", "J01CA04", "J01CA04"], "all_resolutions":
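
With `accept="application/jsonlines"` the helper returns the raw response string, so the caller has to parse it. Judging by the output above, each line is a JSON object with a `predictions` list for one input document. The sketch below parses such a body under that assumption. The function name `parse_jsonl_response` is hypothetical, and the example body combines two abridged records taken from outputs in this notebook purely for illustration.

```python
import json


def parse_jsonl_response(body):
    """Return one list of predictions per non-empty line of a JSON Lines body."""
    per_document = []
    for line in body.splitlines():
        if not line.strip():
            continue  # tolerate blank or trailing lines
        per_document.append(json.loads(line)["predictions"])
    return per_document


# Abridged records, one line per input document.
example_body = "\n".join(
    [
        json.dumps({"predictions": [{"ner_chunk": "hydrogen peroxide 30 mg", "concept_code": "A01AB02"}]}),
        json.dumps({"predictions": [{"ner_chunk": "eltrombopag", "concept_code": "B02BX05"}]}),
    ]
)

for doc_index, doc_predictions in enumerate(parse_jsonl_response(example_body)):
    for p in doc_predictions:
        print(doc_index, p["ner_chunk"], p["concept_code"])
```

In the notebook, the string returned by `invoke_realtime_endpoint(..., accept="application/jsonlines")` could be passed to the same function.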

### B. Delete the endpoint

Now that you have successfully performed real-time inference, you no longer need the endpoint. Delete it to avoid further charges.

In [16]:
model.sagemaker_session.delete_endpoint(model_name)
model.sagemaker_session.delete_endpoint_config(model_name)

## 4. Batch inference

In [17]:
validation_json_file_name = "input.json"
validation_jsonl_file_name = "input.jsonl"

validation_input_json_path = f"s3://{s3_bucket}/{model_name}/validation-input/json/"
validation_output_json_path = f"s3://{s3_bucket}/{model_name}/validation-output/json/"

validation_input_jsonl_path = f"s3://{s3_bucket}/{model_name}/validation-input/jsonl/"
validation_output_jsonl_path = f"s3://{s3_bucket}/{model_name}/validation-output/jsonl/"

def upload_to_s3(input_data, file_name):
    file_format = os.path.splitext(file_name)[1].lower()
    s3_client.put_object(
        Bucket=s3_bucket,
        Key=f"{model_name}/validation-input/{file_format[1:]}/{file_name}",
        Body=input_data.encode("UTF-8"),
    )

In [18]:
# Create JSON and JSON Lines data
input_jsonl_data = create_jsonl(docs)
input_json_data = json.dumps({"text": docs})

# Upload JSON and JSON Lines data to S3
upload_to_s3(input_json_data, validation_json_file_name)
upload_to_s3(input_jsonl_data, validation_jsonl_file_name)

### JSON

In [None]:
transformer = model.transformer(
    instance_count=1,
    instance_type=batch_transform_inference_instance_type,
    accept="application/json",
    output_path=validation_output_json_path
)

transformer.transform(validation_input_json_path, content_type="application/json")
transformer.wait()

In [None]:
def retrieve_json_output_from_s3(validation_file_name):
    parsed_url = urlparse(transformer.output_path)
    file_key = f"{parsed_url.path[1:]}{validation_file_name}.out"
    response = s3_client.get_object(Bucket=s3_bucket, Key=file_key)

    data = json.loads(response["Body"].read().decode("utf-8"))
    display(data)
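
Batch transform writes each result under the output prefix using the input file name with `.out` appended, which is what the `file_key` expression above relies on. The sketch below isolates that path logic so it can be checked without touching S3. The function name `batch_output_location` and the bucket name `example-bucket` are hypothetical.

```python
from urllib.parse import urlparse


def batch_output_location(output_path, input_file_name):
    """Return (bucket, key) of the batch transform result for one input file."""
    parsed = urlparse(output_path)
    prefix = parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"  # accept output paths given without a trailing slash
    return parsed.netloc, f"{prefix}{input_file_name}.out"


bucket, key = batch_output_location(
    "s3://example-bucket/atc-vdb-resolver/validation-output/json/", "input.json"
)
print(bucket)
print(key)
```

Unlike the cell above, this variant reads the bucket from the output path itself instead of relying on the `s3_bucket` variable, which may be convenient if the output path ever points at a different bucket.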

In [22]:
retrieve_json_output_from_s3(validation_json_file_name)

{'predictions': [[{'begin': 67,
    'end': 77,
    'ner_chunk': 'eltrombopag',
    'ner_label': 'DRUG',
    'ner_confidence': '0.9923',
    'concept_code': 'B02BX05',
    'resolution': 'eltrombopag ',
    'score': 1.0000005960464478,
    'concept_class_id': ['ATC 5th',
     'ATC 5th',
     'ATC 5th',
     'ATC 5th',
     'ATC 5th'],
    'all_codes': ['B02BX05', 'B02BX05', 'B02BX08', 'B02BX07', 'L01AX02'],
    'all_resolutions': ['eltrombopag ',
     'eltrombopag; oral',
     'avatrombopag ',
     'lusutrombopag ',
     'pipobroman '],
    'all_score': [1.0000005960464478,
     0.8686228394508362,
     0.8568205833435059,
     0.8277719616889954,
     0.6771407723426819]},
   {'begin': 89,
    'end': 98,
    'ner_chunk': 'amlodipine',
    'ner_label': 'DRUG',
    'ner_confidence': '0.9991',
    'concept_code': 'C08CA01',
    'resolution': 'amlodipine ',
    'score': 1.0000001192092896,
    'concept_class_id': ['ATC 5th',
     'ATC 5th',
     'ATC 5th',
     'ATC 5th',
     'ATC 5th'],
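
In the JSON batch output, `predictions` is a list of lists: the outer list has one entry per input document and the inner list holds that document's entities. The sketch below flattens this into one row per entity while keeping the document index. The function name `flatten_batch_predictions` is hypothetical, and the example is abridged to the two entities of the first document that are visible in the output above.

```python
def flatten_batch_predictions(output):
    """Turn nested per-document predictions into a flat list of rows."""
    rows = []
    for doc_index, doc_predictions in enumerate(output["predictions"]):
        for p in doc_predictions:
            rows.append(
                {
                    "doc_index": doc_index,
                    "ner_chunk": p["ner_chunk"],
                    "concept_code": p["concept_code"],
                    "score": p["score"],
                }
            )
    return rows


# Abridged output: first document only, fields copied from the result above.
example_output = {
    "predictions": [
        [
            {"ner_chunk": "eltrombopag", "concept_code": "B02BX05", "score": 1.0000005960464478},
            {"ner_chunk": "amlodipine", "concept_code": "C08CA01", "score": 1.0000001192092896},
        ]
    ]
}

for row in flatten_batch_predictions(example_output):
    print(row)
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame` if a tabular view is preferred.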
 

### JSON Lines

In [None]:
transformer = model.transformer(
    instance_count=1,
    instance_type=batch_transform_inference_instance_type,
    accept="application/jsonlines",
    output_path=validation_output_jsonl_path
)
transformer.transform(validation_input_jsonl_path, content_type="application/jsonlines")
transformer.wait()

In [None]:
def retrieve_jsonlines_output_from_s3(validation_file_name):

    parsed_url = urlparse(transformer.output_path)
    file_key = f"{parsed_url.path[1:]}{validation_file_name}.out"
    response = s3_client.get_object(Bucket=s3_bucket, Key=file_key)

    data = response["Body"].read().decode("utf-8")
    print(data)

In [25]:
retrieve_jsonlines_output_from_s3(validation_jsonl_file_name)

{"predictions": [{"begin": 67, "end": 77, "ner_chunk": "eltrombopag", "ner_label": "DRUG", "ner_confidence": "0.9923", "concept_code": "B02BX05", "resolution": "eltrombopag ", "score": 1.0000005960464478, "concept_class_id": ["ATC 5th", "ATC 5th", "ATC 5th", "ATC 5th", "ATC 5th"], "all_codes": ["B02BX05", "B02BX05", "B02BX08", "B02BX07", "L01AX02"], "all_resolutions": ["eltrombopag ", "eltrombopag; oral", "avatrombopag ", "lusutrombopag ", "pipobroman "], "all_score": [1.0000005960464478, 0.8686228394508362, 0.8568205833435059, 0.8277719616889954, 0.6771407723426819]}, {"begin": 89, "end": 98, "ner_chunk": "amlodipine", "ner_label": "DRUG", "ner_confidence": "0.9991", "concept_code": "C08CA01", "resolution": "amlodipine ", "score": 1.0000001192092896, "concept_class_id": ["ATC 5th", "ATC 5th", "ATC 5th", "ATC 5th", "ATC 5th"], "all_codes": ["C08CA01", "C08CA17", "C08CA01", "C09XA02", "C09CA03"], "all_resolutions": ["amlodipine ", "levamlodipine ", "amlodipine; oral", "aliskiren / amlod

In [None]:
model.delete_model()

### Unsubscribe from the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust).
2. Locate the listing whose subscription you want to cancel, and then choose __Cancel Subscription__.

