## 1. Subscribe to the model package

To subscribe to the model package:
1. Open the model package listing page: [Extract clinical risk factors](https://aws.amazon.com/marketplace/pp/prodview-u4jv2pqhkra7i)
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click **Accept Offer** if you and your organization agree with the EULA, pricing, and support terms.
1. Click the **Continue to configuration** button and choose a **region**. A **Product Arn** is then displayed. This is the model package ARN that you need to specify when creating a deployable model using Boto3. Copy the ARN corresponding to your region and specify it in the following cell.

## Extract clinical risk factors

- **Model**: [en.med_ner.risk_factors.pipeline](https://nlp.johnsnowlabs.com/2023/06/17/ner_risk_factors_pipeline_en.html)
- **Model Description**: Pretrained named entity recognition pipeline specifically designed for identifying heart disease risk factors and personal health information.
- **Predicted Entities:** `SMOKER`, `PHI`, `CAD`, `HYPERTENSION`, `HYPERLIPIDEMIA`, `MEDICATION`, `DIABETES`, `OBESE`, `FAMILY_HIST`

In [1]:
model_package_arn = "<Customer to specify Model package ARN corresponding to their AWS region>"

In [22]:
import json
import os
import boto3
import pandas as pd
import sagemaker as sage
from sagemaker import ModelPackage
from sagemaker import get_execution_role
from IPython.display import display
from urllib.parse import urlparse

In [23]:
sagemaker_session = sage.Session()
s3_bucket = sagemaker_session.default_bucket()
region = sagemaker_session.boto_region_name
account_id = boto3.client("sts").get_caller_identity().get("Account")
role = get_execution_role()

sagemaker = boto3.client("sagemaker")
s3_client = sagemaker_session.boto_session.client("s3")
ecr = boto3.client("ecr")
sm_runtime = boto3.client("sagemaker-runtime")

# Set display options
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

In [24]:
model_name = "en-med-ner-risk-factors-pipeline"

real_time_inference_instance_type = "ml.m4.xlarge"
batch_transform_inference_instance_type = "ml.m4.2xlarge"

## 2. Create a deployable model from the model package

In [25]:
model = ModelPackage(
    role=role, 
    model_package_arn=model_package_arn,
    sagemaker_session=sagemaker_session,
)

### Input Format

To use the model, you need to provide input in one of the following supported formats:

#### JSON Format

Provide input as JSON. We support two variations within this format:

1. **Array of Text Documents**: 
   Use an array containing multiple text documents. Each element represents a separate text document.

   ```json
   {
       "text": [
           "Text document 1",
           "Text document 2",
           ...
       ]
   }
   ```

2. **Single Text Document**:
   Provide a single text document as a string.


   ```json
   {
       "text": "Single text document"
   }
   ```

#### JSON Lines (JSONL) Format

Provide input in JSON Lines format, where each line is a JSON object representing a text document.

```
{"text": "Text document 1"}
{"text": "Text document 2"}
```
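
The three input shapes described above can be sketched as small payload builders. This is a minimal illustration, not part of the model package: the helper names `build_json_payload` and `build_jsonl_payload` are hypothetical, and only the `{"text": ...}` structure comes from the format description.

```python
import json


def build_json_payload(texts):
    """Return a JSON request body for one document (str) or several (list of str)."""
    if isinstance(texts, str):
        return json.dumps({"text": texts})
    return json.dumps({"text": list(texts)})


def build_jsonl_payload(texts):
    """Return a JSON Lines request body: one {"text": ...} object per line."""
    if isinstance(texts, str):
        texts = [texts]
    return "\n".join(json.dumps({"text": t}) for t in texts)


single = build_json_payload("Single text document")
batch = build_json_payload(["Text document 1", "Text document 2"])
lines = build_jsonl_payload(["Text document 1", "Text document 2"])

print(single)
print(batch)
print(lines)
```

The JSON variants would be sent with content type `application/json` and the JSON Lines variant with `application/jsonlines`, matching the calls used later in this notebook.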

## 3. Create an endpoint and perform real-time inference

If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html).

### A. Deploy the SageMaker model to an endpoint

In [None]:
predictor = model.deploy(
    initial_instance_count=1,
    instance_type=real_time_inference_instance_type, 
    endpoint_name=model_name,
)

Once the endpoint has been created, you can perform real-time inference.

In [27]:
def invoke_realtime_endpoint(record, content_type="application/json", accept="application/json"):
    response = sm_runtime.invoke_endpoint(
        EndpointName=model_name,
        ContentType=content_type,
        Accept=accept,
        Body=json.dumps(record) if content_type == "application/json" else record,
    )

    response_body = response["Body"].read().decode("utf-8")

    if accept == "application/json":
        return json.loads(response_body)
    elif accept == "application/jsonlines":
        return response_body
    else:
        raise ValueError(f"Unsupported accept type: {accept}")

### Initial Setup

In [28]:
docs = [
    '''HISTORY OF PRESENT ILLNESS: The patient is a 40-year-old white male who presents with a chief complaint of "chest pain".

The patient is diabetic and has a prior history of coronary artery disease. The patient presents today stating that his chest pain started yesterday evening and has been somewhat intermittent. The severity of the pain has progressively increased. He describes the pain as a sharp and heavy pain which radiates to his neck & left arm. He ranks the pain a 7 on a scale of 1-10. He admits some shortness of breath & diaphoresis. He states that he has had nausea & 3 episodes of vomiting tonight. He denies any fever or chills. He admits prior episodes of similar pain prior to his PTCA in 1995. He states the pain is somewhat worse with walking and seems to be relieved with rest. There is no change in pain with positioning. He states that he took 3 nitroglycerin tablets sublingually over the past 1 hour, which he states has partially relieved his pain. The patient ranks his present pain a 4 on a scale of 1-10. The most recent episode of pain has lasted one-hour.

The patient denies any history of recent surgery, head trauma, recent stroke, abnormal bleeding such as blood in urine or stool or nosebleed.

REVIEW OF SYSTEMS: All other systems reviewed & are negative.

PAST MEDICAL HISTORY: Diabetes mellitus type II, hypertension, coronary artery disease, atrial fibrillation, status post PTCA in 1995 by Dr. ABC.

SOCIAL HISTORY: Denies alcohol or drugs. Smokes 2 packs of cigarettes per day. Works as a banker.

FAMILY HISTORY: Positive for coronary artery disease (father & brother).''',


    '''HISTORY OF PRESENT ILLNESS: This 57-year-old black female complains of having pain and discomfort in the left upper arm, especially when she walks and after heavy meals. This lasts anywhere from a few hours and is not associated with shortness of breath, palpitations, dizziness, or syncope. Patient does not get any chest pain or choking in the neck or pain in the back. Patient denies history of hypertension, diabetes mellitus, enlarged heart, heart murmur, history suggestive of previous myocardial infarction, or acute rheumatic polyarthritis during childhood. Her exercise tolerance is one to two blocks for shortness of breath and easy fatigability.

MEDICATIONS: Patient does not take any specific medications.

PAST HISTORY: The patient underwent hysterectomy in 1986.

FAMILY HISTORY: The patient is married, has four children who are doing fine. Family history is positive for hypertension, congestive heart failure, obesity, cancer, and cerebrovascular accident.

SOCIAL HISTORY: The patient smokes one pack of cigarettes per day and takes drinks on social occasions.'''
]


sample_text = """In short, the patient is a 55-year-old gentleman with long-standing morbid obesity, resistant to nonsurgical methods of weight loss, with a BMI of 69.7. He has comorbidities including hypertension, atrial fibrillation, hyperlipidemia, possible sleep apnea, and osteoarthritis of the lower extremities. He is an ex-smoker, but he is currently smoking. He is planning to quit, and at least he should do this six to eight days before surgery for multiple reasons, including decreasing the DVT and PE rates and minimizing marginal ulcer problems after surgery, which will be discussed later on."""

### JSON

In [29]:
input_json_data = {"text": sample_text}
response_json = invoke_realtime_endpoint(input_json_data, content_type="application/json", accept="application/json")
pd.DataFrame(response_json["predictions"][0])

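|   | ner_chunk | begin | end | ner_label | ner_confidence |
|---|-----------|-------|-----|-----------|----------------|
| 0 | morbid obesity | 68 | 81 | OBESE | 0.7796 |
| 1 | BMI of 69.7 | 140 | 150 | OBESE | 0.6635 |
| 2 | hypertension | 184 | 195 | HYPERTENSION | 0.8817 |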


### JSON Lines

In [30]:
def create_jsonl(records):
    if isinstance(records, str):
        records = [records]
    json_records = [{"text": text} for text in records]
    json_lines = "\n".join(json.dumps(record) for record in json_records)
    return json_lines

In [31]:
input_jsonl_data = create_jsonl(sample_text)
data = invoke_realtime_endpoint(input_jsonl_data, content_type="application/jsonlines" , accept="application/jsonlines" )
print(data)

{"predictions": [{"ner_chunk": "morbid obesity", "begin": 68, "end": 81, "ner_label": "OBESE", "ner_confidence": "0.7796"}, {"ner_chunk": "BMI of 69.7", "begin": 140, "end": 150, "ner_label": "OBESE", "ner_confidence": "0.6635"}, {"ner_chunk": "hypertension", "begin": 184, "end": 195, "ner_label": "HYPERTENSION", "ner_confidence": "0.8817"}]}
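
Two details of the output above are easy to miss: in the JSON Lines response `ner_confidence` arrives as a string, and the `end` offset appears to be inclusive. The sketch below is an illustration only. The helper name `parse_jsonl_response` is hypothetical, and the input is a prefix of `sample_text` plus the first entity from the response shown above.

```python
import json

# Prefix of the sample text, long enough to cover the first entity.
text = "In short, the patient is a 55-year-old gentleman with long-standing morbid obesity"

# One line of JSON Lines output, trimmed to the first entity shown above.
raw = (
    '{"predictions": [{"ner_chunk": "morbid obesity", "begin": 68, "end": 81, '
    '"ner_label": "OBESE", "ner_confidence": "0.7796"}]}'
)


def parse_jsonl_response(body):
    """Parse a JSON Lines response into one list of entities per document."""
    documents = []
    for line in body.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entities = json.loads(line)["predictions"]
        for entity in entities:
            # Confidence is serialized as a string; convert it for numeric use.
            entity["ner_confidence"] = float(entity["ner_confidence"])
        documents.append(entities)
    return documents


first = parse_jsonl_response(raw)[0][0]

# "end" is treated as inclusive here, so the Python slice needs end + 1.
span = text[first["begin"]:first["end"] + 1]
print(span, first["ner_label"], first["ner_confidence"])
```

If a different model version reports exclusive end offsets, the `+ 1` would need to be dropped, so it is worth checking `span` against `ner_chunk` before relying on the offsets.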


### B. Delete the endpoint

Now that you have performed real-time inference, you no longer need the endpoint. Delete the endpoint and its configuration to avoid further charges.

In [None]:
model.sagemaker_session.delete_endpoint(model_name)
model.sagemaker_session.delete_endpoint_config(model_name)
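
Because a running endpoint keeps accruing charges, one option is to wrap inference in a context manager so the two delete calls above run even when an inference call raises. This is a sketch under assumptions: `endpoint_cleanup` is a hypothetical helper, and `_RecordingSession` is a stand-in for the SageMaker session so the example runs without AWS access. It relies only on the `delete_endpoint` and `delete_endpoint_config` methods used in the cell above.

```python
from contextlib import contextmanager


@contextmanager
def endpoint_cleanup(session, endpoint_name):
    """Delete the endpoint and its config on exit, even if the body raises."""
    try:
        yield endpoint_name
    finally:
        session.delete_endpoint(endpoint_name)
        session.delete_endpoint_config(endpoint_name)


class _RecordingSession:
    """Stand-in for the SageMaker session; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def delete_endpoint(self, name):
        self.calls.append(("delete_endpoint", name))

    def delete_endpoint_config(self, name):
        self.calls.append(("delete_endpoint_config", name))


stub = _RecordingSession()
try:
    with endpoint_cleanup(stub, "en-med-ner-risk-factors-pipeline"):
        raise RuntimeError("simulated inference failure")
except RuntimeError:
    pass

print(stub.calls)
```

In the notebook, `model.sagemaker_session` would take the place of the stub.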

## 4. Batch inference

In [13]:
validation_json_file_name = "input.json"
validation_jsonl_file_name = "input.jsonl"

validation_input_json_path = f"s3://{s3_bucket}/{model_name}/validation-input/json/"
validation_output_json_path = f"s3://{s3_bucket}/{model_name}/validation-output/json/"

validation_input_jsonl_path = f"s3://{s3_bucket}/{model_name}/validation-input/jsonl/"
validation_output_jsonl_path = f"s3://{s3_bucket}/{model_name}/validation-output/jsonl/"

def upload_to_s3(input_data, file_name):
    file_format = os.path.splitext(file_name)[1].lower()
    s3_client.put_object(
        Bucket=s3_bucket,
        Key=f"{model_name}/validation-input/{file_format[1:]}/{file_name}",
        Body=input_data.encode("UTF-8"),
    )
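
The object key used by `upload_to_s3` is derived from the file extension, so `input.json` and `input.jsonl` land under separate prefixes that match the two input paths defined above. The sketch below isolates that derivation. `validation_input_key` is a hypothetical helper, and the extension check is an added assumption rather than behavior of the original function.

```python
import os

model_name = "en-med-ner-risk-factors-pipeline"


def validation_input_key(file_name, prefix=model_name):
    """Build the S3 object key for a validation input file from its extension."""
    extension = os.path.splitext(file_name)[1].lower().lstrip(".")
    if extension not in {"json", "jsonl"}:
        # Assumption: only the two formats used in this notebook are expected.
        raise ValueError(f"Unsupported input file type: {file_name!r}")
    return f"{prefix}/validation-input/{extension}/{file_name}"


print(validation_input_key("input.json"))
print(validation_input_key("input.jsonl"))
```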

In [14]:
# Create JSON and JSON Lines data
input_jsonl_data = create_jsonl(docs)
input_json_data = json.dumps({"text": docs})

# Upload JSON and JSON Lines data to S3
upload_to_s3(input_json_data, validation_json_file_name)
upload_to_s3(input_jsonl_data, validation_jsonl_file_name)

### JSON

In [None]:
transformer = model.transformer(
    instance_count=1,
    instance_type=batch_transform_inference_instance_type,
    accept="application/json",
    output_path=validation_output_json_path
)

transformer.transform(validation_input_json_path, content_type="application/json")
transformer.wait()

In [16]:
def retrieve_json_output_from_s3(validation_file_name):
    parsed_url = urlparse(transformer.output_path)
    file_key = f"{parsed_url.path[1:]}{validation_file_name}.out"
    response = s3_client.get_object(Bucket=s3_bucket, Key=file_key)

    data = json.loads(response["Body"].read().decode("utf-8"))
    display(data)
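
The function above builds the output key by appending `.out` to the input file name under the transformer's output prefix. The sketch below shows that derivation on its own. `batch_output_location` is a hypothetical helper and `example-bucket` is a placeholder bucket name, not a value from this notebook.

```python
from urllib.parse import urlparse


def batch_output_location(output_path, input_file_name):
    """Return (bucket, key) of the batch transform result for one input file."""
    parsed = urlparse(output_path)
    bucket = parsed.netloc
    prefix = parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"  # tolerate output paths given without a trailing slash
    return bucket, f"{prefix}{input_file_name}.out"


bucket, key = batch_output_location(
    "s3://example-bucket/en-med-ner-risk-factors-pipeline/validation-output/json/",
    "input.json",
)
print(bucket)
print(key)
```

Taking the bucket from the parsed URL rather than from `s3_bucket` keeps the lookup correct if the output path ever points at a different bucket.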

In [17]:
retrieve_json_output_from_s3(validation_json_file_name)

{'predictions': [[{'ner_chunk': 'diabetic',
    'begin': 137,
    'end': 144,
    'ner_label': 'DIABETES',
    'ner_confidence': '0.9992'},
   {'ner_chunk': 'coronary artery disease',
    'begin': 173,
    'end': 195,
    'ner_label': 'CAD',
    'ner_confidence': '0.6896667'},
   {'ner_chunk': 'Diabetes mellitus type II',
    'begin': 1317,
    'end': 1341,
    'ner_label': 'DIABETES',
    'ner_confidence': '0.71244997'},
   {'ner_chunk': 'hypertension',
    'begin': 1344,
    'end': 1355,
    'ner_label': 'HYPERTENSION',
    'ner_confidence': '0.987'},
   {'ner_chunk': 'coronary artery disease',
    'begin': 1358,
    'end': 1380,
    'ner_label': 'CAD',
    'ner_confidence': '0.89136666'},
   {'ner_chunk': '1995',
    'begin': 1424,
    'end': 1427,
    'ner_label': 'PHI',
    'ner_confidence': '0.9998'},
   {'ner_chunk': 'ABC',
    'begin': 1436,
    'end': 1438,
    'ner_label': 'PHI',
    'ner_confidence': '0.9998'},
   {'ner_chunk': 'Smokes 2 packs of cigarettes per day',
    'be
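
In the JSON batch output above, `predictions` appears to hold one inner list of entities per input document. A minimal sketch of flattening that structure into rows tagged with a document index follows. `flatten_predictions` is a hypothetical helper, the first document reuses the first two entities shown above, and the empty second document is a placeholder for illustration only.

```python
def flatten_predictions(predictions):
    """Flatten per-document entity lists into rows tagged with a doc_id."""
    rows = []
    for doc_id, entities in enumerate(predictions):
        for entity in entities:
            row = {"doc_id": doc_id, **entity}
            # Confidence is serialized as a string in the output above.
            row["ner_confidence"] = float(row["ner_confidence"])
            rows.append(row)
    return rows


sample_predictions = [
    [
        {"ner_chunk": "diabetic", "begin": 137, "end": 144,
         "ner_label": "DIABETES", "ner_confidence": "0.9992"},
        {"ner_chunk": "coronary artery disease", "begin": 173, "end": 195,
         "ner_label": "CAD", "ner_confidence": "0.6896667"},
    ],
    [],  # placeholder: a document with no entities
]

rows = flatten_predictions(sample_predictions)
for row in rows:
    print(row["doc_id"], row["ner_label"], row["ner_chunk"], row["ner_confidence"])
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame` for inspection.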

### JSON Lines

In [None]:
transformer = model.transformer(
    instance_count=1,
    instance_type=batch_transform_inference_instance_type,
    accept="application/jsonlines",
    output_path=validation_output_jsonl_path
)
transformer.transform(validation_input_jsonl_path, content_type="application/jsonlines")
transformer.wait()

In [19]:
def retrieve_jsonlines_output_from_s3(validation_file_name):

    parsed_url = urlparse(transformer.output_path)
    file_key = f"{parsed_url.path[1:]}{validation_file_name}.out"
    response = s3_client.get_object(Bucket=s3_bucket, Key=file_key)

    data = response["Body"].read().decode("utf-8")
    print(data)

In [20]:
retrieve_jsonlines_output_from_s3(validation_jsonl_file_name)

{"predictions": [{"ner_chunk": "diabetic", "begin": 137, "end": 144, "ner_label": "DIABETES", "ner_confidence": "0.9992"}, {"ner_chunk": "coronary artery disease", "begin": 173, "end": 195, "ner_label": "CAD", "ner_confidence": "0.6896667"}, {"ner_chunk": "Diabetes mellitus type II", "begin": 1317, "end": 1341, "ner_label": "DIABETES", "ner_confidence": "0.71244997"}, {"ner_chunk": "hypertension", "begin": 1344, "end": 1355, "ner_label": "HYPERTENSION", "ner_confidence": "0.987"}, {"ner_chunk": "coronary artery disease", "begin": 1358, "end": 1380, "ner_label": "CAD", "ner_confidence": "0.89136666"}, {"ner_chunk": "1995", "begin": 1424, "end": 1427, "ner_label": "PHI", "ner_confidence": "0.9998"}, {"ner_chunk": "ABC", "begin": 1436, "end": 1438, "ner_label": "PHI", "ner_confidence": "0.9998"}, {"ner_chunk": "Smokes 2 packs of cigarettes per day", "begin": 1483, "end": 1518, "ner_label": "SMOKER", "ner_confidence": "0.63425714"}, {"ner_chunk": "banker", "begin": 1532, "end": 1537, "ner_

In [None]:
model.delete_model()

### Unsubscribe from the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust).
2. Locate the listing whose subscription you want to cancel, and then choose __Cancel Subscription__.

