## 1. Subscribe to the model package

To subscribe to the model package:
1. Open the model package listing page [ICD-10-CM Sentence entity resolver](https://aws.amazon.com/marketplace/pp/prodview-dv5wwrwx4b6ve).
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click **Accept Offer** if you and your organization agree with the EULA, pricing, and support terms.
1. Click the **Continue to configuration** button and choose a **region**; a **Product Arn** is then displayed. This is the model package ARN that you need to specify when creating a deployable model using Boto3. Copy the ARN corresponding to your region and specify it in the following cell.
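
Because the Product ARN is region-specific, a quick sanity check before deployment can catch a wrong-region ARN early. The sketch below assumes the standard ARN layout `arn:partition:service:region:account-id:resource`; the helper name `arn_region` and the example ARN are hypothetical and are not values taken from the listing.

```python
def arn_region(arn: str) -> str:
    """Return the region field of an ARN (the fourth colon-separated field)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a well-formed ARN: {arn!r}")
    return parts[3]


# Hypothetical ARN for illustration only; use the one shown on your configuration page.
example_arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-package"
print(arn_region(example_arn))
```

In this notebook, comparing `arn_region(model_package_arn)` with `sagemaker_session.boto_region_name` before constructing the `ModelPackage` would be one way to surface a mismatch.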

## Pipeline to Resolve ICD-10-CM Codes

- **Model**: `icd10cm_vdb_resolver`
- **Model Description**: This pretrained pipeline extracts clinical entities from clinical text and maps them to their corresponding ICD-10-CM codes.

In [1]:
model_package_arn = "<Customer to specify Model package ARN corresponding to their AWS region>"

In [None]:
import json
import os
import boto3
import pandas as pd
import sagemaker as sage
from sagemaker import ModelPackage
from sagemaker import get_execution_role
from IPython.display import display
from urllib.parse import urlparse

In [None]:
sagemaker_session = sage.Session()
s3_bucket = sagemaker_session.default_bucket()
region = sagemaker_session.boto_region_name
account_id = boto3.client("sts").get_caller_identity().get("Account")
role = get_execution_role()

sagemaker = boto3.client("sagemaker")
s3_client = sagemaker_session.boto_session.client("s3")
ecr = boto3.client("ecr")
sm_runtime = boto3.client("sagemaker-runtime")

# Set display options
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

In [4]:
model_name = "icd10cm-vdb-resolver"

real_time_inference_instance_type = "ml.m4.xlarge"
batch_transform_inference_instance_type = "ml.m4.2xlarge"

## 2. Create a deployable model from the model package

In [5]:
model = ModelPackage(
    role=role, 
    model_package_arn=model_package_arn,
    sagemaker_session=sagemaker_session,
)

### Input Format

To use the model, you need to provide input in one of the following supported formats:

#### JSON Format

Provide input as JSON. We support two variations within this format:

1. **Array of Text Documents**: 
   Use an array containing multiple text documents. Each element represents a separate text document.

   ```json
   {
       "text": [
           "Text document 1",
           "Text document 2",
           ...
       ]
   }
   ```

2. **Single Text Document**:
   Provide a single text document as a string.


   ```json
   {
       "text": "Single text document"
   }
   ```

#### JSON Lines (JSONL) Format

Provide input in JSON Lines format, where each line is a JSON object representing a text document.

```
{"text": "Text document 1"}
{"text": "Text document 2"}
```
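
A client-side check of the request body can catch malformed input before it reaches the endpoint. The sketch below is one possible validator for the three shapes described above; the function name `check_payload` is hypothetical, and the content types `application/json` and `application/jsonlines` are the ones used in the inference calls later in this notebook.

```python
import json


def check_payload(body: str, content_type: str) -> int:
    """Return the number of documents in a request body, or raise ValueError."""
    if content_type == "application/json":
        payload = json.loads(body)
        text = payload.get("text") if isinstance(payload, dict) else None
        if isinstance(text, str):
            return 1  # single text document
        if isinstance(text, list) and all(isinstance(t, str) for t in text):
            return len(text)  # array of text documents
        raise ValueError('"text" must be a string or an array of strings')
    if content_type == "application/jsonlines":
        # One JSON object per non-empty line, each with a string "text" field.
        lines = [line for line in body.split("\n") if line.strip()]
        for line in lines:
            record = json.loads(line)
            if not isinstance(record, dict) or not isinstance(record.get("text"), str):
                raise ValueError('each line needs a string "text" field')
        return len(lines)
    raise ValueError(f"Unsupported content type: {content_type}")


print(check_payload('{"text": ["Text document 1", "Text document 2"]}', "application/json"))
```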

## 3. Create an endpoint and perform real-time inference

If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html).

### A. Deploy the SageMaker model to an endpoint

In [None]:
predictor = model.deploy(
    initial_instance_count=1,
    instance_type=real_time_inference_instance_type, 
    endpoint_name=model_name,
)

Once the endpoint has been created, you can perform real-time inference.

In [7]:
def invoke_realtime_endpoint(record, content_type="application/json", accept="application/json"):
    response = sm_runtime.invoke_endpoint(
        EndpointName=model_name,
        ContentType=content_type,
        Accept=accept,
        Body=json.dumps(record) if content_type == "application/json" else record,
    )

    response_body = response["Body"].read().decode("utf-8")

    if accept == "application/json":
        return json.loads(response_body)
    elif accept == "application/jsonlines":
        return response_body
    else:
        raise ValueError(f"Unsupported accept type: {accept}")

### Initial Setup

In [8]:
docs = [
    "An 86-year-old female with persistent abdominal pain, nausea and projectile vomiting, during evaluation in the emergency room, was found to have a high amylase, as well as lipase count and she is being admitted for management of unspecified gastrointestinal hemorrhage.", 
    "Complaints of unspecified upper abdominal pain and swelling in a 32-year-old woman led to the evaluation of possible disease of intestine. She had a history of mixed irritable bowel syndrome and congenital lactase deficiency. Several diagnostic tests were performed. Blood tests showed abnormal results of blood chemistry and colonoscopy was performed.",
]

sample_text = "A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus, associated with an acute hepatitis, and obesity with a body mass index (BMI) of 33.5 kg/m2, presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting. Two weeks prior to presentation, she was treated with a five-day course of amoxicillin for a respiratory tract infection. She had been on dapagliflozin for six months at the time of presentation. Physical examination on presentation was significant for dry oral mucosa; significantly, her abdominal examination was benign with no tenderness or guarding. Pertinent laboratory findings on admission were: serum glucose 111 mg/dl, bicarbonate 18 mmol/l, anion gap 20, creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, glycated hemoglobin (HbA1c) 10%, and venous pH 7.27. Serum lipase was normal at 43 U/L. Serum acetone levels could not be assessed as blood samples kept hemolyzing due to significant lipemia. The patient was initially admitted for starvation ketosis, as she reported poor oral intake for three days prior to admission."

### JSON

In [9]:
input_json_data = {"text": sample_text}
response_json = invoke_realtime_endpoint(input_json_data, content_type="application/json", accept="application/json")
pd.DataFrame(response_json["predictions"][0])

Unnamed: 0,begin,end,ner_chunk,ner_label,ner_confidence,concept_code,resolution,score,billable_hcc,all_codes,concept_name_detailed,all_resolutions,all_score
0,39,67,gestational diabetes mellitus,PROBLEM,0.9255,O24.4,gestational diabetes mellitus,1.0,"[0||0||0, 0||0||0, 0||0||0, 0||0||0, 0||0||0]","[O24.4, O24.41, O24.41, O24.41, O24.41]","[gestational diabetes mellitus [gestational diabetes mellitus], gestational diabetes mellitus (disorder) [gestational diabetes mellitus in pregnancy], gestational diabetes [gestational diabetes mellitus in pregnancy], gdm - gestational diabetes mellitus [gestational diabetes mellitus in pregnancy], gestational diabetes mellitus in pregnancy [gestational diabetes mellitus in pregnancy]]","[gestational diabetes mellitus, gestational diabetes mellitus (disorder), gestational diabetes, gdm - gestational diabetes mellitus, gestational diabetes mellitus in pregnancy]","[1.0000001192092896, 0.9637089967727661, 0.9251672625541687, 0.9209030866622925, 0.9168062806129456]"
1,117,153,subsequent type two diabetes mellitus,PROBLEM,0.77357996,E13.9,secondary diabetes mellitus,0.849519,"[1||1||19, 1||1||19, 0||0||0, 1||1||19, 1||0||0]","[E13.9, E11.9, O24.11, E13.9, Z86.39]","[secondary diabetes mellitus [other specified diabetes mellitus without complications], diabetes mellitus type 2 [type 2 diabetes mellitus without complications], pre-existing type 2 diabetes mellitus [pre-existing type 2 diabetes mellitus, in pregnancy], secondary diabetes mellitus (disorder) [other specified diabetes mellitus without complications], history of diabetes mellitus type 2 [personal history of other endocrine, nutritional and metabolic disease]]","[secondary diabetes mellitus, diabetes mellitus type 2, pre-existing type 2 diabetes mellitus, secondary diabetes mellitus (disorder), history of diabetes mellitus type 2]","[0.8495188355445862, 0.8261897563934326, 0.8239018321037292, 0.819419264793396, 0.8158330917358398]"
2,172,189,an acute hepatitis,PROBLEM,0.9745667,K72.0,acute hepatitis,0.900641,"[0||0||0, 0||0||0, 1||0||0, 1||0||0, 1||0||0]","[K72.0, B15, Z03.89, B17.9, Z03.89]","[acute hepatitis [acute and subacute hepatic failure], acute hepatitis a [acute hepatitis a], acute hepatitis caused by infection suspected [encounter for observation for other suspected diseases and conditions ruled out], acute viral hepatitis [acute viral hepatitis, unspecified], acute hepatitis caused by infection suspected (situation) [encounter for observation for other suspected diseases and conditions ruled out]]","[acute hepatitis, acute hepatitis a, acute hepatitis caused by infection suspected, acute viral hepatitis, acute hepatitis caused by infection suspected (situation)]","[0.9006407260894775, 0.8742561936378479, 0.8741282224655151, 0.8730584979057312, 0.8706690073013306]"
3,196,202,obesity,PROBLEM,0.9973,E66.9,obesity,1.0,"[1||0||0, 0||0||0, 1||0||0, 1||0||0, 1||0||0]","[E66.9, E66, E66.9, E66.9, E66.8]","[obesity [obesity, unspecified], overweight and obesity [overweight and obesity], alimentary obesity [obesity, unspecified], obesity (disorder) [obesity, unspecified], abdominal obesity [other obesity]]","[obesity, overweight and obesity, alimentary obesity, obesity (disorder), abdominal obesity]","[0.9999998807907104, 0.8891978859901428, 0.8856317400932312, 0.8825734853744507, 0.869437575340271]"
4,209,225,a body mass index,PROBLEM,0.845925,Z68,body mass index [bmi],0.87087,"[0||0||0, 1||1||22, 1||0||0, 1||0||0, 1||0||0]","[Z68, Z68.41, E66.9, Z68.1, E66.9]","[body mass index [bmi] [body mass index [bmi]], finding of body mass index [body mass index [bmi] 40.0-44.9, adult], observation of body mass index [obesity, unspecified], finding of body mass index (finding) [body mass index [bmi] 19.9 or less, adult], increased body mass index [obesity, unspecified]]","[body mass index [bmi], finding of body mass index, observation of body mass index, finding of body mass index (finding), increased body mass index]","[0.8708701133728027, 0.8615623712539673, 0.83838951587677, 0.8280797004699707, 0.8276996612548828]"
5,285,292,polyuria,PROBLEM,0.9897,R35,polyuria,1.0,"[0||0||0, 0||0||0, 1||0||0, 1||0||0, 1||0||0]","[R35, R35, R35.89, R35.81, R35.0]","[polyuria [polyuria], polyuria (finding) [polyuria], other polyuria [other polyuria], nocturnal polyuria [nocturnal polyuria], micturition frequency and polyuria [frequency of micturition]]","[polyuria, polyuria (finding), other polyuria, nocturnal polyuria, micturition frequency and polyuria]","[1.0000003576278687, 0.9479261636734009, 0.9336186051368713, 0.8711942434310913, 0.8455271124839783]"
6,295,304,polydipsia,PROBLEM,0.9931,R63.1,polydipsia,1.0,"[1||0||0, 1||1||23, 1||0||0, 1||0||0, 0||0||0]","[R63.1, E23.2, F63.89, F63.9, O99.89]","[polydipsia [polydipsia], primary polydipsia [diabetes insipidus], psychogenic polydipsia [other impulse disorders], psychogenic polydipsia (disorder) [impulse disorder, unspecified], polyhydramnios (disorder) [other specified diseases and conditions complicating pregnancy, childbirth and the puerperium]]","[polydipsia, primary polydipsia, psychogenic polydipsia, psychogenic polydipsia (disorder), polyhydramnios (disorder)]","[1.0000004768371582, 0.8982400894165039, 0.8386476039886475, 0.8318062424659729, 0.7765091061592102]"
7,307,319,poor appetite,PROBLEM,0.998,R63.0,poor appetite,1.0,"[1||0||0, 1||0||0, 1||0||0, 1||0||0, 1||0||0]","[R63.0, R63.0, R63.0, R63.0, R63.0]","[poor appetite [anorexia], lack of appetite [anorexia], loss of appetite [anorexia], reduced appetite [anorexia], decrease in appetite [anorexia]]","[poor appetite, lack of appetite, loss of appetite, reduced appetite, decrease in appetite]","[1.0, 0.9270852208137512, 0.8898344039916992, 0.8832587003707886, 0.878162145614624]"
8,326,333,vomiting,PROBLEM,0.9934,R11.1,vomiting,1.0,"[0||0||0, 0||0||0, 0||0||0, 1||0||0, 1||0||0]","[R11.1, R11, R11, R11.10, K91.0]","[vomiting [vomiting], vomiting (disorder) [nausea and vomiting], vomiting food [nausea and vomiting], vomiting symptoms [vomiting, unspecified], vomiting bile [vomiting following gastrointestinal surgery]]","[vomiting, vomiting (disorder), vomiting food, vomiting symptoms, vomiting bile]","[1.000000238418579, 0.9199676513671875, 0.9054354429244995, 0.8747051954269409, 0.8656845092773438]"
9,427,455,a respiratory tract infection,PROBLEM,0.91305006,J98.8,respiratory tract infection,0.878662,"[1||0||0, 1||0||0, 1||0||0, 1||0||0, 1||0||0]","[J98.8, Z59.3, J98.8, A49.9, J06.9]","[respiratory tract infection [other specified respiratory disorders], institutionally acquired respiratory infection [problems related to living in residential institution], respiratory infection, institutional [other specified respiratory disorders], bacterial respiratory infection [bacterial infection, unspecified], upper respiratory tract infection [acute upper respiratory infection, unspecified]]","[respiratory tract infection, institutionally acquired respiratory infection, respiratory infection, institutional, bacterial respiratory infection, upper respiratory tract infection]","[0.8786619901657104, 0.8161045908927917, 0.8152045011520386, 0.8124687075614929, 0.8095167875289917]"
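
Each prediction carries parallel lists (`all_codes`, `all_resolutions`, `all_score`) holding the alternative candidates for the entity. A minimal sketch for pairing them up and keeping only candidates above a cutoff follows; the helper name `ranked_candidates` and the `0.88` threshold are hypothetical choices for illustration, and the sample dictionary is an abridged copy of the "obesity" row above.

```python
def ranked_candidates(prediction: dict, min_score: float = 0.0) -> list:
    """Pair each candidate code with its resolution and score, best first."""
    rows = zip(
        prediction["all_codes"],
        prediction["all_resolutions"],
        prediction["all_score"],
    )
    kept = [(code, name, score) for code, name, score in rows if score >= min_score]
    return sorted(kept, key=lambda row: row[2], reverse=True)


# Abridged copy of the "obesity" row from the output above.
obesity = {
    "all_codes": ["E66.9", "E66", "E66.9", "E66.9", "E66.8"],
    "all_resolutions": [
        "obesity",
        "overweight and obesity",
        "alimentary obesity",
        "obesity (disorder)",
        "abdominal obesity",
    ],
    "all_score": [
        0.9999998807907104,
        0.8891978859901428,
        0.8856317400932312,
        0.8825734853744507,
        0.869437575340271,
    ],
}

for code, name, score in ranked_candidates(obesity, min_score=0.88):
    print(code, name, round(score, 3))
```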


### JSON Lines

In [10]:
def create_jsonl(records):
    if isinstance(records, str):
        records = [records]
    json_records = [{"text": text} for text in records]
    json_lines = "\n".join(json.dumps(record) for record in json_records)
    return json_lines

In [11]:
input_jsonl_data = create_jsonl(sample_text)
data = invoke_realtime_endpoint(input_jsonl_data, content_type="application/jsonlines", accept="application/jsonlines")
print(data)

{"predictions": [{"begin": 39, "end": 67, "ner_chunk": "gestational diabetes mellitus", "ner_label": "PROBLEM", "ner_confidence": "0.9255", "concept_code": "O24.4", "resolution": "gestational diabetes mellitus", "score": 1.0000001192092896, "billable_hcc": ["0||0||0", "0||0||0", "0||0||0", "0||0||0", "0||0||0"], "all_codes": ["O24.4", "O24.41", "O24.41", "O24.41", "O24.41"], "concept_name_detailed": ["gestational diabetes mellitus [gestational diabetes mellitus]", "gestational diabetes mellitus (disorder) [gestational diabetes mellitus in pregnancy]", "gestational diabetes [gestational diabetes mellitus in pregnancy]", "gdm - gestational diabetes mellitus [gestational diabetes mellitus in pregnancy]", "gestational diabetes mellitus in pregnancy [gestational diabetes mellitus in pregnancy]"], "all_resolutions": ["gestational diabetes mellitus", "gestational diabetes mellitus (disorder)", "gestational diabetes", "gdm - gestational diabetes mellitus", "gestational diabetes mellitus in pre
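
With `accept="application/jsonlines"`, `invoke_realtime_endpoint` returns the raw response text rather than parsed objects. A minimal sketch for decoding it is shown below, assuming each non-empty line is a JSON object with a `predictions` key, as in the output above. The helper name `parse_jsonlines` and the shortened sample body are hypothetical.

```python
import json


def parse_jsonlines(body: str) -> list:
    """Decode a JSON Lines string into a list of Python objects, skipping blank lines."""
    return [json.loads(line) for line in body.split("\n") if line.strip()]


# Hypothetical, heavily shortened response body for illustration.
sample_body = '{"predictions": [{"ner_chunk": "obesity", "concept_code": "E66.9"}]}\n'

for doc_index, record in enumerate(parse_jsonlines(sample_body)):
    for entity in record["predictions"]:
        print(doc_index, entity["ner_chunk"], entity["concept_code"])
```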

### B. Delete the endpoint

Now that you have successfully performed a real-time inference, you do not need the endpoint any more. You can terminate the endpoint to avoid being charged.

In [None]:
model.sagemaker_session.delete_endpoint(model_name)
model.sagemaker_session.delete_endpoint_config(model_name)

## 4. Batch inference

In [13]:
validation_json_file_name = "input.json"
validation_jsonl_file_name = "input.jsonl"

validation_input_json_path = f"s3://{s3_bucket}/{model_name}/validation-input/json/"
validation_output_json_path = f"s3://{s3_bucket}/{model_name}/validation-output/json/"

validation_input_jsonl_path = f"s3://{s3_bucket}/{model_name}/validation-input/jsonl/"
validation_output_jsonl_path = f"s3://{s3_bucket}/{model_name}/validation-output/jsonl/"

def upload_to_s3(input_data, file_name):
    file_format = os.path.splitext(file_name)[1].lower()
    s3_client.put_object(
        Bucket=s3_bucket,
        Key=f"{model_name}/validation-input/{file_format[1:]}/{file_name}",
        Body=input_data.encode("UTF-8"),
    )

In [14]:
# Create JSON and JSON Lines data
input_jsonl_data = create_jsonl(docs)
input_json_data = json.dumps({"text": docs})

# Upload JSON and JSON Lines data to S3
upload_to_s3(input_json_data, validation_json_file_name)
upload_to_s3(input_jsonl_data, validation_jsonl_file_name)

### JSON

In [None]:
transformer = model.transformer(
    instance_count=1,
    instance_type=batch_transform_inference_instance_type,
    accept="application/json",
    output_path=validation_output_json_path
)

transformer.transform(validation_input_json_path, content_type="application/json")
transformer.wait()

In [16]:
def retrieve_json_output_from_s3(validation_file_name):
    parsed_url = urlparse(transformer.output_path)
    file_key = f"{parsed_url.path[1:]}{validation_file_name}.out"
    response = s3_client.get_object(Bucket=s3_bucket, Key=file_key)

    data = json.loads(response["Body"].read().decode("utf-8"))
    display(data)
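
The lookup above relies on batch transform writing each result under the output prefix with `.out` appended to the input file name. The sketch below isolates that key derivation so it can be checked without touching S3; the helper name `output_location` and the bucket name are hypothetical.

```python
from urllib.parse import urlparse


def output_location(output_path: str, input_file_name: str) -> tuple:
    """Split an s3:// output prefix into (bucket, key) for one input file's result."""
    parsed = urlparse(output_path)
    if parsed.scheme != "s3":
        raise ValueError(f"Expected an s3:// URI, got {output_path!r}")
    prefix = parsed.path.lstrip("/")
    # Make sure the prefix ends with a slash before appending the file name.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return parsed.netloc, f"{prefix}{input_file_name}.out"


# Hypothetical bucket name for illustration.
print(output_location("s3://example-bucket/icd10cm-vdb-resolver/validation-output/json/", "input.json"))
```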

In [17]:
retrieve_json_output_from_s3(validation_json_file_name)

{'predictions': [[{'begin': 27,
    'end': 51,
    'ner_chunk': 'persistent abdominal pain',
    'ner_label': 'PROBLEM',
    'ner_confidence': '0.9549',
    'concept_code': 'K80.5',
    'resolution': 'recurrent abdominal pain',
    'score': 0.8763471841812134,
    'billable_hcc': ['0||0||0', '0||0||0', '1||0||0', '1||0||0', '1||0||0'],
    'all_codes': ['K80.5', 'K80.5', 'R10.9', 'R10.0', 'R10.9'],
    'concept_name_detailed': ['recurrent abdominal pain [calculus of bile duct without cholangitis or cholecystitis]',
     'recurrent abdominal pain (finding) [calculus of bile duct without cholangitis or cholecystitis]',
     'refractory abdominal pain [unspecified abdominal pain]',
     'recurrent acute abdominal pain [acute abdomen]',
     'nonspecific abdominal pain [unspecified abdominal pain]'],
    'all_resolutions': ['recurrent abdominal pain',
     'recurrent abdominal pain (finding)',
     'refractory abdominal pain',
     'recurrent acute abdominal pain',
     'nonspecific abdomi
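
For array input, `predictions` is a list with one inner list of entities per input document, as the output above shows. A minimal sketch for flattening that nesting into rows tagged with a document index follows; the helper name `flatten_predictions`, the `doc_index` field, and the shortened sample result are hypothetical.

```python
def flatten_predictions(result: dict) -> list:
    """Turn per-document prediction lists into flat rows tagged with a document index."""
    rows = []
    for doc_index, entities in enumerate(result["predictions"]):
        for entity in entities:
            rows.append({"doc_index": doc_index, **entity})
    return rows


# Hypothetical, heavily shortened result with two documents.
sample_result = {
    "predictions": [
        [{"ner_chunk": "persistent abdominal pain", "concept_code": "K80.5"}],
        [],
    ]
}

for row in flatten_predictions(sample_result):
    print(row)
```

The resulting list of dictionaries could then be passed to `pd.DataFrame` to view all documents in a single table.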

### JSON Lines

In [None]:
transformer = model.transformer(
    instance_count=1,
    instance_type=batch_transform_inference_instance_type,
    accept="application/jsonlines",
    output_path=validation_output_jsonl_path
)
transformer.transform(validation_input_jsonl_path, content_type="application/jsonlines")
transformer.wait()

In [19]:
def retrieve_jsonlines_output_from_s3(validation_file_name):

    parsed_url = urlparse(transformer.output_path)
    file_key = f"{parsed_url.path[1:]}{validation_file_name}.out"
    response = s3_client.get_object(Bucket=s3_bucket, Key=file_key)

    data = response["Body"].read().decode("utf-8")
    print(data)

In [20]:
retrieve_jsonlines_output_from_s3(validation_jsonl_file_name)

{"predictions": [{"begin": 27, "end": 51, "ner_chunk": "persistent abdominal pain", "ner_label": "PROBLEM", "ner_confidence": "0.9549", "concept_code": "K80.5", "resolution": "recurrent abdominal pain", "score": 0.8763471841812134, "billable_hcc": ["0||0||0", "0||0||0", "1||0||0", "1||0||0", "1||0||0"], "all_codes": ["K80.5", "K80.5", "R10.9", "R10.0", "R10.9"], "concept_name_detailed": ["recurrent abdominal pain [calculus of bile duct without cholangitis or cholecystitis]", "recurrent abdominal pain (finding) [calculus of bile duct without cholangitis or cholecystitis]", "refractory abdominal pain [unspecified abdominal pain]", "recurrent acute abdominal pain [acute abdomen]", "nonspecific abdominal pain [unspecified abdominal pain]"], "all_resolutions": ["recurrent abdominal pain", "recurrent abdominal pain (finding)", "refractory abdominal pain", "recurrent acute abdominal pain", "nonspecific abdominal pain"], "all_score": [0.8763471841812134, 0.86885005235672, 0.8286551237106323, 0

In [None]:
model.delete_model()

### Unsubscribe from the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on the [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust).
2. Locate the listing whose subscription you want to cancel, then choose __Cancel Subscription__.

