## 1. Subscribe to the model package

To subscribe to the model package:
1. Open the model package listing page: [Extract Adverse Drug Events (ADE)](https://aws.amazon.com/marketplace/pp/prodview-ybvpckhvgtsb4).
1. On the AWS Marketplace listing, choose the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the EULA, pricing, and support terms, and choose **Accept Offer** if you and your organization agree with them.
1. Choose the **Continue to configuration** button and then choose a **region**. A **Product Arn** is displayed. This is the model package ARN that you need to specify when creating a deployable model with the SageMaker Python SDK. Copy the ARN corresponding to your region and paste it into the following cell.
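The ARN must belong to the same region as your SageMaker session, so a quick sanity check can catch a copy-paste mix-up before the model is created. The sketch below uses hypothetical helpers (`parse_model_package_arn` and `check_arn_region` are not part of any SDK) and assumes the standard ARN layout `arn:<partition>:sagemaker:<region>:<account>:model-package/<name>`.

```python
from typing import NamedTuple


class ModelPackageArn(NamedTuple):
    partition: str
    region: str
    account: str
    package_name: str


def parse_model_package_arn(arn: str) -> ModelPackageArn:
    """Split a SageMaker model package ARN into its parts.

    Assumes the layout arn:<partition>:sagemaker:<region>:<account>:model-package/<name>.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    resource_type, _, name = parts[5].partition("/")
    if resource_type != "model-package" or not name:
        raise ValueError(f"Not a model package ARN: {arn!r}")
    return ModelPackageArn(parts[1], parts[3], parts[4], name)


def check_arn_region(arn: str, region: str) -> None:
    """Raise if the ARN was copied for a different region than the session's."""
    parsed = parse_model_package_arn(arn)
    if parsed.region != region:
        raise ValueError(f"ARN is for {parsed.region}, but the session region is {region}")
```

Once the setup cells have run, `check_arn_region(model_package_arn, region)` would raise if the ARN belongs to another region, or if the placeholder string was never replaced.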

## Pipeline for Adverse Drug Events

Overview:

Clinical entity detection, assertion status assignment, and relation extraction are core steps in analyzing medical text. They help healthcare professionals, researchers, and medical NLP practitioners extract structured information from clinical literature, electronic health records, and patient notes.

- **Model**: [en.explain_doc.clinical_ade](https://nlp.johnsnowlabs.com/2024/03/20/explain_clinical_doc_ade_en.html)
- **Model Description**: This pipeline classifies the text for ADE content, extracts ADE and DRUG clinical entities, assigns assertion status to the extracted entities, and relates drugs to their ADEs.

In [1]:
model_package_arn = "<Customer to specify Model package ARN corresponding to their AWS region>"

In [None]:
import json
import os
import boto3
import pandas as pd
import sagemaker as sage
from sagemaker import ModelPackage
from sagemaker import get_execution_role
from IPython.display import display
from urllib.parse import urlparse

In [None]:
sagemaker_session = sage.Session()
s3_bucket = sagemaker_session.default_bucket()
region = sagemaker_session.boto_region_name
account_id = boto3.client("sts").get_caller_identity().get("Account")
role = get_execution_role()

sm_client = boto3.client("sagemaker")  # low-level client; named to avoid shadowing the sagemaker package
s3_client = sagemaker_session.boto_session.client("s3")
ecr = boto3.client("ecr")
sm_runtime = boto3.client("sagemaker-runtime")

# Set display options
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

In [4]:
model_name = "en-explain-doc-clinical-ade"

real_time_inference_instance_type = "ml.m4.xlarge"
batch_transform_inference_instance_type = "ml.m4.2xlarge"

## 2. Create a deployable model from the model package

In [5]:
model = ModelPackage(
    role=role, 
    model_package_arn=model_package_arn,
    sagemaker_session=sagemaker_session,
)

### Input Format

To use the model, you need to provide input in one of the following supported formats:

#### JSON Format

Provide input as JSON. We support two variations within this format:

1. **Array of Text Documents**: 
   Use an array containing multiple text documents. Each element represents a separate text document.

   ```json
   {
       "text": [
           "Text document 1",
           "Text document 2",
           ...
       ]
   }
   ```

2. **Single Text Document**:
   Provide a single text document as a string.

   ```json
   {
       "text": "Single text document"
   }
   ```

#### JSON Lines (JSONL) Format

Provide input in JSON Lines format, where each line is a JSON object representing a text document.

```
{"text": "Text document 1"}
{"text": "Text document 2"}
```
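The two JSON variants and the JSON Lines format differ only in how the `text` field is laid out. A minimal sketch of building each payload, with a type check so malformed input fails before it reaches the endpoint. The helper names are hypothetical; the notebook's own `create_jsonl` function, defined later, covers the JSON Lines case.

```python
import json
from typing import List, Union


def build_json_payload(texts: Union[str, List[str]]) -> str:
    """Serialize one document (str) or several (list of str) as {"text": ...}."""
    is_single = isinstance(texts, str)
    is_many = isinstance(texts, list) and all(isinstance(t, str) for t in texts)
    if not (is_single or is_many):
        raise TypeError("texts must be a string or a list of strings")
    return json.dumps({"text": texts})


def build_jsonl_payload(texts: List[str]) -> str:
    """Serialize documents as JSON Lines: one {"text": ...} object per line."""
    if not all(isinstance(t, str) for t in texts):
        raise TypeError("every document must be a string")
    return "\n".join(json.dumps({"text": t}) for t in texts)
```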

## 3. Create an endpoint and perform real-time inference

To learn how real-time inference works in Amazon SageMaker, see the [SageMaker hosting documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html).

### A. Deploy the SageMaker model to an endpoint

In [None]:
predictor = model.deploy(
    initial_instance_count=1,
    instance_type=real_time_inference_instance_type, 
    endpoint_name=model_name,
)

Once the endpoint has been created, you can perform real-time inference.

In [7]:
def invoke_realtime_endpoint(record, content_type="application/json", accept="application/json"):
    response = sm_runtime.invoke_endpoint(
        EndpointName=model_name,
        ContentType=content_type,
        Accept=accept,
        Body=json.dumps(record) if content_type == "application/json" else record,
    )

    response_body = response["Body"].read().decode("utf-8")

    if accept == "application/json":
        return json.loads(response_body)
    elif accept == "application/jsonlines":
        return response_body
    else:
        raise ValueError(f"Unsupported accept type: {accept}")
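`invoke_realtime_endpoint` returns parsed JSON for `application/json` but a raw string for `application/jsonlines`. A small sketch that normalizes both into a list of per-document dicts, assuming (as the `data[0]` indexing below implies) that the JSON response is an array and that the JSON Lines response holds one object per line. `parse_predictions` is a hypothetical helper.

```python
import json


def parse_predictions(response_body: str, accept: str) -> list:
    """Normalize an endpoint response body into a list of per-document dicts."""
    if accept == "application/json":
        parsed = json.loads(response_body)
        # Wrap a single object so callers can always iterate over documents.
        return parsed if isinstance(parsed, list) else [parsed]
    if accept == "application/jsonlines":
        # One JSON object per non-empty line.
        return [json.loads(line) for line in response_body.splitlines() if line.strip()]
    raise ValueError(f"Unsupported accept type: {accept}")
```

For example, it could be applied to the string returned by the JSON Lines call below, so both formats can then be handled with the same DataFrame code.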

### Initial Setup

In [8]:
docs = [
    '''Always tired, and possible blood clots. I was on Voltaren for about 4 years and all of the sudden had a minor stroke and had blood clots that traveled to my eye. I had every test in the book done at the hospital, and they couldn't find anything. I was completley healthy! I am thinking it was from the voltaren. I have been off of the drug for 8 months now, and have never felt better. I started eating healthy and working out and that has help alot. I can now sleep all thru the night. I wont take this again. If I have the pain, I will pop a tylonol instead.''',
    '''I have an allergic reaction to vancomycin so I have itchy skin, sore throat/burning/itching, numbness of tongue and gums.I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication.'''
]


sample_text = '''Been taking Lipitor for 15 years , have experienced severe fatigue a lot!!! . Doctor moved me to voltaren 2 months ago , so far , have only experienced cramps.'''

### JSON

In [9]:
input_json_data = {"text": sample_text}
data = invoke_realtime_endpoint(input_json_data, content_type="application/json", accept="application/json")

ner_df = pd.DataFrame(data[0]["ner_predictions"])
assertion_df = pd.DataFrame(data[0]["assertion_predictions"])
relation_df = pd.DataFrame(data[0]["relation_predictions"])
classification_df = pd.DataFrame(data[0]["classification_predictions"])
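The cell above reads only `data[0]`, which is enough for a single document. When `text` holds a list, the response presumably contains one result per document (an assumption based on the `data[0]` indexing and the batch output later in this notebook). A hedged sketch that flattens all documents into rows tagged with a document index; `flatten_predictions` is a hypothetical helper.

```python
def flatten_predictions(results: list, key: str) -> list:
    """Turn per-document results into flat rows tagged with a document index.

    results: list of dicts, one per input document.
    key: e.g. "ner_predictions", "assertion_predictions",
         "relation_predictions" or "classification_predictions".
    """
    rows = []
    for doc_index, result in enumerate(results):
        for prediction in result.get(key, []):
            rows.append({"doc_index": doc_index, **prediction})
    return rows
```

The rows can be passed straight to pandas, for example `pd.DataFrame(flatten_predictions(data, "ner_predictions"))`.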

In [10]:
ner_df

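|   | ner_chunk | begin | end | ner_label | ner_confidence |
|---|-----------|-------|-----|-----------|----------------|
| 0 | Lipitor | 12 | 18 | DRUG | 0.9927 |
| 1 | severe fatigue | 52 | 65 | ADE | 0.48995 |
| 2 | voltaren | 97 | 104 | DRUG |  |
| 3 | cramps | 152 | 157 | ADE | 0.7472 |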
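Not every entity carries a score ("voltaren" has none), and in the JSON Lines output the scores arrive as strings such as `"0.9927"` or as `null`. A sketch that partitions entities by confidence while keeping unscored ones visible instead of silently dropping them. `split_by_confidence` is hypothetical, and the 0.5 threshold in the test is purely illustrative, not a recommended value.

```python
def split_by_confidence(ner_predictions, threshold):
    """Partition NER entities into (kept, low_confidence, unscored)."""
    kept, low, unscored = [], [], []
    for entity in ner_predictions:
        raw = entity.get("ner_confidence")
        if raw is None:
            unscored.append(entity)  # no score reported; keep it visible
        elif float(raw) >= threshold:  # scores may be strings, so convert
            kept.append(entity)
        else:
            low.append(entity)
    return kept, low, unscored
```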


In [11]:
assertion_df

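|   | ner_chunk | begin | end | ner_label | assertion |
|---|-----------|-------|-----|-----------|-----------|
| 0 | Lipitor | 12 | 18 | DRUG | Past |
| 1 | severe fatigue | 52 | 65 | ADE | Past |
| 2 | cramps | 152 | 157 | ADE | Past |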
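The assertion table lists fewer rows than the NER table ("voltaren" has no assertion here), so the two are easiest to compare after joining them on the character span. A minimal sketch, assuming `(begin, end)` uniquely identifies an entity within a document; `attach_assertions` is a hypothetical helper.

```python
def attach_assertions(ner_predictions, assertion_predictions):
    """Add an "assertion" field to each NER entity, matched on (begin, end).

    Entities without a matching assertion get assertion=None.
    """
    by_span = {
        (int(a["begin"]), int(a["end"])): a["assertion"] for a in assertion_predictions
    }
    return [
        {**e, "assertion": by_span.get((int(e["begin"]), int(e["end"])))}
        for e in ner_predictions
    ]
```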


In [12]:
relation_df

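|   | ner_chunk1 | ner_chunk1_begin | ner_chunk1_end | ner_label1 | ner_chunk2 | ner_chunk2_begin | ner_chunk2_end | ner_label2 | relations | relation_confidence |
|---|------------|------------------|----------------|------------|------------|------------------|----------------|------------|-----------|---------------------|
| 0 | Lipitor | 12 | 18 | DRUG | severe fatigue | 52 | 65 | ADE | 1 | 1.0 |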
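Each relation row links two chunks, and in the JSON Lines output the `relations` and `relation_confidence` values arrive as strings. A sketch that reduces the rows to `(drug, ade, confidence)` tuples, assuming that `relations == "1"` marks a positive DRUG-ADE link and any other value marks no link. That label convention is an assumption here, so check it against the model card. `drug_ade_pairs` is hypothetical.

```python
def drug_ade_pairs(relation_predictions):
    """Return (drug, ade, confidence) tuples for relations labelled "1"."""
    pairs = []
    for rel in relation_predictions:
        if str(rel["relations"]) != "1":
            continue
        # Orient each pair as (drug, ade) regardless of which chunk came first.
        if rel["ner_label1"] == "DRUG":
            drug, ade = rel["ner_chunk1"], rel["ner_chunk2"]
        else:
            drug, ade = rel["ner_chunk2"], rel["ner_chunk1"]
        pairs.append((drug, ade, float(rel["relation_confidence"])))
    return pairs
```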


In [13]:
classification_df

|   | sentence | begin | end | class | class_confidence |
|---|----------|-------|-----|-------|------------------|
| 0 | Been taking Lipitor for 15 years , have experienced severe fatigue a lot!!! . | 0 | 76 | ADE | 0.9999447 |
| 1 | Doctor moved me to voltaren 2 months ago , so far , have only experienced cramps. | 78 | 158 | ADE | 0.9983479 |
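The `begin` and `end` columns in these tables use inclusive end offsets: "Lipitor" is seven characters long and spans 12 to 18. A sketch for checking that every chunk (or sentence) matches `text[begin:end + 1]`, which is handy before highlighting entities in the source text. `check_offsets` is a hypothetical helper; it returns the predictions that do not match.

```python
def check_offsets(text, predictions, chunk_key="ner_chunk"):
    """Return predictions whose chunk differs from text[begin:end + 1]."""
    mismatches = []
    for prediction in predictions:
        # Offsets may arrive as strings (as in relation rows), so convert.
        begin, end = int(prediction["begin"]), int(prediction["end"])
        # Inclusive end: the chunk covers text[begin] through text[end].
        if text[begin:end + 1] != prediction[chunk_key]:
            mismatches.append(prediction)
    return mismatches
```

For the sample above, `check_offsets(sample_text, data[0]["ner_predictions"])` and `check_offsets(sample_text, data[0]["classification_predictions"], chunk_key="sentence")` should both return an empty list.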


### JSON Lines

In [14]:
def create_jsonl(records):
    if isinstance(records, str):
        records = [records]
    json_records = [{"text": text} for text in records]
    json_lines = "\n".join(json.dumps(record) for record in json_records)
    return json_lines

In [15]:
input_jsonl_data = create_jsonl(sample_text)
data = invoke_realtime_endpoint(input_jsonl_data, content_type="application/jsonlines", accept="application/jsonlines")
print(data)

{"ner_predictions": [{"ner_chunk": "Lipitor", "begin": 12, "end": 18, "ner_label": "DRUG", "ner_confidence": "0.9927"}, {"ner_chunk": "severe fatigue", "begin": 52, "end": 65, "ner_label": "ADE", "ner_confidence": "0.48995"}, {"ner_chunk": "voltaren", "begin": 97, "end": 104, "ner_label": "DRUG", "ner_confidence": null}, {"ner_chunk": "cramps", "begin": 152, "end": 157, "ner_label": "ADE", "ner_confidence": "0.7472"}], "relation_predictions": [{"ner_chunk1": "Lipitor", "ner_chunk1_begin": "12", "ner_chunk1_end": "18", "ner_label1": "DRUG", "ner_chunk2": "severe fatigue", "ner_chunk2_begin": "52", "ner_chunk2_end": "65", "ner_label2": "ADE", "relations": "1", "relation_confidence": "1.0"}], "assertion_predictions": [{"ner_chunk": "Lipitor", "begin": 12, "end": 18, "ner_label": "DRUG", "assertion": "Past"}, {"ner_chunk": "severe fatigue", "begin": 52, "end": 65, "ner_label": "ADE", "assertion": "Past"}, {"ner_chunk": "cramps", "begin": 152, "end": 157, "ner_label": "ADE", "assertion": "P

### B. Delete the endpoint

Now that you have performed real-time inference, you no longer need the endpoint. Delete it to avoid incurring further charges.

In [None]:
model.sagemaker_session.delete_endpoint(model_name)
model.sagemaker_session.delete_endpoint_config(model_name)

## 4. Batch inference

In [17]:
validation_json_file_name = "input.json"
validation_jsonl_file_name = "input.jsonl"

validation_input_json_path = f"s3://{s3_bucket}/{model_name}/validation-input/json/"
validation_output_json_path = f"s3://{s3_bucket}/{model_name}/validation-output/json/"

validation_input_jsonl_path = f"s3://{s3_bucket}/{model_name}/validation-input/jsonl/"
validation_output_jsonl_path = f"s3://{s3_bucket}/{model_name}/validation-output/jsonl/"

def upload_to_s3(input_data, file_name):
    file_format = os.path.splitext(file_name)[1].lower()
    s3_client.put_object(
        Bucket=s3_bucket,
        Key=f"{model_name}/validation-input/{file_format[1:]}/{file_name}",
        Body=input_data.encode("UTF-8"),
    )
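Batch transform writes the result for each input object to the output path under the input file name with a `.out` suffix, which is the convention the retrieval helpers later in this notebook rely on. A small stdlib sketch of that key arithmetic; `split_s3_uri` and `batch_output_key` are hypothetical helpers, not SDK functions.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split s3://bucket/prefix into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def batch_output_key(output_path, input_file_name):
    """(bucket, key) where batch transform writes <input_file_name>.out."""
    bucket, prefix = split_s3_uri(output_path)
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return bucket, f"{prefix}{input_file_name}.out"
```

For example, `batch_output_key(validation_output_json_path, validation_json_file_name)` gives the bucket and key that `s3_client.get_object` needs.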

In [18]:
# Create JSON and JSON Lines data
input_jsonl_data = create_jsonl(docs)
input_json_data = json.dumps({"text": docs})

# Upload JSON and JSON Lines data to S3
upload_to_s3(input_json_data, validation_json_file_name)
upload_to_s3(input_jsonl_data, validation_jsonl_file_name)

### JSON

In [None]:
transformer = model.transformer(
    instance_count=1,
    instance_type=batch_transform_inference_instance_type,
    accept="application/json",
    output_path=validation_output_json_path
)

transformer.transform(validation_input_json_path, content_type="application/json")
transformer.wait()

In [20]:
def retrieve_json_output_from_s3(validation_file_name):
    parsed_url = urlparse(transformer.output_path)
    file_key = f"{parsed_url.path[1:]}{validation_file_name}.out"
    response = s3_client.get_object(Bucket=s3_bucket, Key=file_key)

    data = json.loads(response["Body"].read().decode("utf-8"))
    display(data)

In [21]:
retrieve_json_output_from_s3(validation_json_file_name)

[{'ner_predictions': [{'ner_chunk': 'Voltaren',
    'begin': 49,
    'end': 56,
    'ner_label': 'DRUG',
    'ner_confidence': '0.9929'},
   {'ner_chunk': 'blood clots that traveled to my eye',
    'begin': 125,
    'end': 159,
    'ner_label': 'ADE',
    'ner_confidence': '0.7211286'},
   {'ner_chunk': 'voltaren',
    'begin': 302,
    'end': 309,
    'ner_label': 'DRUG',
    'ner_confidence': '0.9874'},
   {'ner_chunk': 'tylonol',
    'begin': 544,
    'end': 550,
    'ner_label': 'DRUG',
    'ner_confidence': '0.9445'}],
  'relation_predictions': [],
  'assertion_predictions': [{'ner_chunk': 'Voltaren',
    'begin': 49,
    'end': 56,
    'ner_label': 'DRUG',
    'assertion': 'Past'},
   {'ner_chunk': 'blood clots that traveled to my eye',
    'begin': 125,
    'end': 159,
    'ner_label': 'ADE',
    'assertion': 'Past'},
   {'ner_chunk': 'voltaren',
    'begin': 302,
    'end': 309,
    'ner_label': 'DRUG',
    'assertion': 'Past'},
   {'ner_chunk': 'tylonol',
    'begin': 544,
   

### JSON Lines

In [None]:
transformer = model.transformer(
    instance_count=1,
    instance_type=batch_transform_inference_instance_type,
    accept="application/jsonlines",
    output_path=validation_output_jsonl_path
)
transformer.transform(validation_input_jsonl_path, content_type="application/jsonlines")
transformer.wait()

In [23]:
def retrieve_jsonlines_output_from_s3(validation_file_name):

    parsed_url = urlparse(transformer.output_path)
    file_key = f"{parsed_url.path[1:]}{validation_file_name}.out"
    response = s3_client.get_object(Bucket=s3_bucket, Key=file_key)

    data = response["Body"].read().decode("utf-8")
    print(data)

In [24]:
retrieve_jsonlines_output_from_s3(validation_jsonl_file_name)

{"ner_predictions": [{"ner_chunk": "Voltaren", "begin": 49, "end": 56, "ner_label": "DRUG", "ner_confidence": "0.9929"}, {"ner_chunk": "blood clots that traveled to my eye", "begin": 125, "end": 159, "ner_label": "ADE", "ner_confidence": "0.7211286"}, {"ner_chunk": "voltaren", "begin": 302, "end": 309, "ner_label": "DRUG", "ner_confidence": "0.9874"}, {"ner_chunk": "tylonol", "begin": 544, "end": 550, "ner_label": "DRUG", "ner_confidence": "0.9445"}], "relation_predictions": [], "assertion_predictions": [{"ner_chunk": "Voltaren", "begin": 49, "end": 56, "ner_label": "DRUG", "assertion": "Past"}, {"ner_chunk": "blood clots that traveled to my eye", "begin": 125, "end": 159, "ner_label": "ADE", "assertion": "Past"}, {"ner_chunk": "voltaren", "begin": 302, "end": 309, "ner_label": "DRUG", "assertion": "Past"}, {"ner_chunk": "tylonol", "begin": 544, "end": 550, "ner_label": "DRUG", "assertion": "Planned"}], "classification_predictions": [{"sentence": "Always tired, and possible blood clots
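To relate each output line back to the document that produced it, the input and output JSON Lines can be zipped together. This sketch assumes the output keeps exactly one line per input line, in the same order, and raises if the counts differ. `pair_jsonl` is hypothetical, and because `retrieve_jsonlines_output_from_s3` prints rather than returns its data, you would need a variant that returns the decoded string.

```python
import json


def pair_jsonl(input_jsonl, output_jsonl):
    """Zip each input document's text with its prediction object."""
    inputs = [json.loads(line) for line in input_jsonl.splitlines() if line.strip()]
    outputs = [json.loads(line) for line in output_jsonl.splitlines() if line.strip()]
    if len(inputs) != len(outputs):
        raise ValueError(f"{len(inputs)} inputs but {len(outputs)} outputs")
    return [(record["text"], prediction) for record, prediction in zip(inputs, outputs)]
```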

In [None]:
model.delete_model()

### Unsubscribe from the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable models](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. You can find this information by looking at the container name associated with each model.
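The check for leftover models can also be scripted. This sketch takes a boto3 SageMaker client (such as `sm_client` from the setup cell) and uses the `list_models` paginator and `describe_model`. It assumes that models created through `ModelPackage` record the package ARN in the container's `ModelPackageName` field; verify that assumption in your account. It makes one `describe_model` call per model, so it may be slow in accounts with many models. `models_using_package` is a hypothetical helper.

```python
def models_using_package(sm_client, model_package_arn):
    """Return names of SageMaker models whose containers reference the package."""
    matches = []
    paginator = sm_client.get_paginator("list_models")
    for page in paginator.paginate():
        for summary in page["Models"]:
            name = summary["ModelName"]
            desc = sm_client.describe_model(ModelName=name)
            # A model has either a PrimaryContainer or a list of Containers.
            containers = [desc.get("PrimaryContainer", {})] + desc.get("Containers", [])
            if any(c.get("ModelPackageName") == model_package_arn for c in containers):
                matches.append(name)
    return matches
```

An empty result from `models_using_package(sm_client, model_package_arn)` suggests no deployable models still reference the package.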

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on the [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust).
2. Locate the listing that you want to cancel the subscription for, and then choose __Cancel Subscription__.

