# Generate clinical notes with AI using AWS HealthScribe

> *This notebook should work well with the **`Data Science 3.0`** kernel in SageMaker Studio* on an **`ml.t3.medium`** instance.

## Introduction

This notebook shows how to invoke AWS HealthScribe with the AWS SDK for Python (boto3) and how to combine its output with other AWS services, such as Amazon Comprehend Medical.

## Setup

Update the boto3 SDK to version **`1.33.0`** or higher, the minimum version that supports the HealthScribe APIs.

In [None]:
!pip install botocore boto3 --upgrade

Verify that the correct boto3 version is installed. Expected version is **`1.33.0`** or higher.

In [None]:
!python3 -c "import boto3; print(boto3.__version__)"
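The shell check above runs in a separate Python process. Checking inside the kernel confirms the version this notebook will actually import (for example, if boto3 was already imported before the upgrade). The following is a minimal sketch; `version_tuple` and `meets_minimum` are hypothetical helpers, and they assume plain `X.Y.Z` release strings.

```python
# Hypothetical helpers (not part of boto3): compare a dotted version string
# against the 1.33.0 minimum using only the standard library.
# Assumes plain "X.Y.Z" release strings; suffixes such as "rc1" are not handled.
MINIMUM_BOTO3 = (1, 33, 0)


def version_tuple(version: str) -> tuple:
    """Turn "1.34.12" into (1, 34, 12), padding missing parts with 0."""
    parts = [int(part) for part in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(version: str, minimum: tuple = MINIMUM_BOTO3) -> bool:
    """Return True if `version` is at least `minimum`."""
    return version_tuple(version) >= minimum


# Check the boto3 that this kernel imports, if it is available
try:
    import boto3
except ImportError:
    print("boto3 is not installed")
else:
    if not meets_minimum(boto3.__version__):
        print(f"boto3 {boto3.__version__} is older than 1.33.0")
```

If the kernel still reports an older version after the upgrade, restarting the kernel usually resolves it.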

## 1. Batch Transcription Using Python SDK

#### 1.1. Starting an AWS HealthScribe job
Invoke the **`start_medical_scribe_job`** API to start a HealthScribe job. In boto3, the HealthScribe APIs are part of the Amazon Transcribe client:

In [None]:
import json
import time

import boto3

# HealthScribe APIs are exposed through the Amazon Transcribe client
transcribe = boto3.client('transcribe', region_name='us-east-1')

This variable defines the name of the HealthScribe job. Job names must be unique within your AWS account, so choose a new name if you rerun the notebook.

In [None]:
job_name = "LowerBackPain"
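To rerun the notebook without a name clash, one option is to append a timestamp to the base name. This is a minimal sketch: `unique_job_name` is a hypothetical helper, and the naming rule it checks (letters, digits, `.`, `_`, `-`, at most 200 characters) is an assumption to confirm against the API reference.

```python
# Hypothetical helper (not part of boto3): append a UTC timestamp to a base
# name so each run of the notebook gets a fresh job name.
import re
from datetime import datetime, timezone
from typing import Optional

# Assumed naming rule: letters, digits, ".", "_" and "-", 1 to 200 characters.
# Confirm against the StartMedicalScribeJob API reference.
JOB_NAME_PATTERN = re.compile(r"^[0-9a-zA-Z._-]{1,200}$")


def unique_job_name(base: str, now: Optional[datetime] = None) -> str:
    """Return e.g. "LowerBackPain-20240102-030405" for the given UTC time."""
    now = now or datetime.now(timezone.utc)
    name = f"{base}-{now:%Y%m%d-%H%M%S}"
    if not JOB_NAME_PATTERN.match(name):
        raise ValueError(f"Invalid job name: {name!r}")
    return name
```

For example, `job_name = unique_job_name("LowerBackPain")`. Two calls within the same second return the same name, which is acceptable for interactive reruns.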

The `s3_input_uri` variable defines the S3 URI of the input audio file, and `output_bucket_name` defines the bucket where HealthScribe writes its output files. In the cell below, replace the following placeholders with the values for your environment:
- **`[S3_BUCKET_NAME]`**: name of the S3 bucket (this example uses the same bucket for input and output)
- **`[OBJECT_NAME]`**: object key of the audio file, including the extension (e.g. knee-consultation.m4a)
- **`[IAM_ROLE]`**: ARN of the IAM role that HealthScribe assumes to read the input audio and write the output files.
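A forgotten placeholder only surfaces as an API error, so it can help to validate the values first. This is a minimal sketch: `check_inputs` is a hypothetical helper (not part of boto3), and the role ARN pattern is an assumption about the usual IAM role ARN shape.

```python
# Hypothetical pre-flight check (not part of boto3): fail fast if a
# "[PLACEHOLDER]" was left unreplaced or the role ARN looks malformed.
import re

# Assumed IAM role ARN shape, e.g. arn:aws:iam::123456789012:role/MyRole;
# the optional partition suffix covers values such as aws-cn and aws-us-gov.
ROLE_ARN_PATTERN = re.compile(r"^arn:aws(-[a-z]+)*:iam::\d{12}:role/\S+$")
# Matches the bracketed placeholders used in this notebook, e.g. [S3_BUCKET_NAME]
PLACEHOLDER_PATTERN = re.compile(r"\[[A-Z0-9_]+\]")


def check_inputs(s3_input_uri: str, output_bucket_name: str, role_arn: str) -> None:
    """Raise ValueError if any input still looks like a template value."""
    values = {
        "s3_input_uri": s3_input_uri,
        "output_bucket_name": output_bucket_name,
        "role_arn": role_arn,
    }
    for label, value in values.items():
        if PLACEHOLDER_PATTERN.search(value):
            raise ValueError(f"{label} still contains a placeholder: {value}")
    if not s3_input_uri.startswith("s3://"):
        raise ValueError("s3_input_uri must start with s3://")
    if not ROLE_ARN_PATTERN.match(role_arn):
        raise ValueError(f"role_arn does not look like an IAM role ARN: {role_arn}")
```

Call it with the same three values you pass to `start_medical_scribe_job`, right before that call.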

In [None]:
s3_input_uri = "s3://[S3_BUCKET_NAME]/[OBJECT_NAME]"
output_bucket_name = "[S3_BUCKET_NAME]"
role_arn = "[IAM_ROLE]"

response = transcribe.start_medical_scribe_job(
    MedicalScribeJobName=job_name,
    Media={'MediaFileUri': s3_input_uri},
    OutputBucketName=output_bucket_name,
    DataAccessRoleArn=role_arn,
    Settings={
        'ShowSpeakerLabels': True,       # partition the audio by speaker
        'MaxSpeakerLabels': 2,           # expect two speakers in the recording
        'ChannelIdentification': False,  # audio is not split per channel
    },
)
print(response)

#### 1.2. Checking job status

The code below calls HealthScribe's **`get_medical_scribe_job`** API to retrieve the status of the job started in the previous step. While the job is still queued or in progress, it waits 5 seconds and checks again, until the job reaches a final state (`COMPLETED` or `FAILED`).

In [None]:
while True:
    status = transcribe.get_medical_scribe_job(MedicalScribeJobName=job_name)
    job = status['MedicalScribeJob']
    if job['MedicalScribeJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    print("Not ready yet...")
    time.sleep(5)

print("Job status: " + job['MedicalScribeJobStatus'])

if job['MedicalScribeJobStatus'] == 'FAILED':
    # Output URIs may be missing for a failed job; report the reason instead
    print("Failure reason: " + job.get('FailureReason', 'unknown'))
else:
    # StartTime and CompletionTime are datetime objects, so the difference is a timedelta
    diff = job['CompletionTime'] - job['StartTime']
    print("Job duration: " + str(diff))
    print("Transcription file: " + job['MedicalScribeOutput']['TranscriptFileUri'])
    print("Summary file: " + job['MedicalScribeOutput']['ClinicalDocumentUri'])
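The loop above polls every 5 seconds with no upper bound, so a stuck job would block the notebook indefinitely. Below is a minimal sketch of a bounded variant; `wait_for_terminal_state` is a hypothetical helper, and the 30-minute default timeout is an arbitrary assumption.

```python
# Hypothetical helper (not part of boto3): poll until a terminal state or timeout.
# `fetch_status` is any zero-argument callable that returns a status string.
import time
from typing import Callable

TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_terminal_state(
    fetch_status: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 1800.0,  # assumed upper bound, adjust as needed
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the first terminal status, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still {status} after {timeout_seconds} s")
        sleep(poll_seconds)
```

In this notebook, `fetch_status` could be `lambda: transcribe.get_medical_scribe_job(MedicalScribeJobName=job_name)['MedicalScribeJob']['MedicalScribeJobStatus']`. Injecting `sleep` keeps the helper testable without real waiting.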

#### 1.3. Analysing the scribe results
The code below downloads the **`summary.json`** file generated by HealthScribe, parses it, and extracts the treatment plan.

In [None]:
s3 = boto3.client('s3', 'us-east-1')

bucket = output_bucket_name
transcription_file = job_name + "/transcript.json"
summary_file = job_name + "/summary.json"

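obj = s3.get_object(Bucket=bucket, Key=summary_file)
summary_json = json.loads(obj['Body'].read())

# Look up the PLAN section by name instead of relying on its position in the list
sections = summary_json['ClinicalDocumentation']['Sections']
plan_section = next(s for s in sections if s['SectionName'] == 'PLAN')

# Join the summarized segments of the plan into one text block
plan = "\n".join(item['SummarizedSegment'] for item in plan_section['Summary'])
print("Plan:")
print(plan)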
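The cell above builds the object keys from `job_name`. An alternative is to derive bucket and key from the URIs returned by `get_medical_scribe_job`. This is a minimal sketch: `split_s3_https_url` is a hypothetical helper, and it assumes the returned URI is either `s3://bucket/key` or a path-style HTTPS URL such as `https://s3.us-east-1.amazonaws.com/<bucket>/<key>`.

```python
# Hypothetical helper (not part of boto3): split an S3 URI or a path-style
# HTTPS S3 URL into (bucket, key) for s3.get_object.
from typing import Tuple
from urllib.parse import unquote, urlparse


def split_s3_https_url(url: str) -> Tuple[str, str]:
    parsed = urlparse(url)
    if parsed.scheme == "s3":
        # s3://bucket/key: the bucket is the network location
        return parsed.netloc, parsed.path.lstrip("/")
    if parsed.scheme != "https":
        raise ValueError(f"Unsupported URL scheme: {url}")
    # Path-style URL: the first path segment is the bucket, the rest is the key
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    if not bucket or not key:
        raise ValueError(f"Cannot find bucket and key in {url}")
    return bucket, unquote(key)
```

For example, `bucket, summary_key = split_s3_https_url(status['MedicalScribeJob']['MedicalScribeOutput']['ClinicalDocumentUri'])`, after which `s3.get_object(Bucket=bucket, Key=summary_key)` fetches the same file.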

Store the plan with IPython's `%store` magic so that another notebook can load it with `%store -r plan`:

In [None]:
# save plan to be used later with Bedrock in a different notebook
%store plan
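Besides the plan, `summary.json` contains the other sections of the clinical note. This is a minimal sketch that collects every section into a `{SectionName: text}` dictionary. `sections_as_text` is a hypothetical helper, and it assumes the same layout the plan extraction relies on (`ClinicalDocumentation` → `Sections`, each with a `SectionName` and a `Summary` list of `SummarizedSegment` items).

```python
# Hypothetical helper: flatten all summary.json sections into plain text.
# Assumed layout: {"ClinicalDocumentation": {"Sections": [
#     {"SectionName": ..., "Summary": [{"SummarizedSegment": ...}, ...]}, ...]}}
from typing import Dict


def sections_as_text(summary: dict) -> Dict[str, str]:
    result = {}
    for section in summary.get("ClinicalDocumentation", {}).get("Sections", []):
        segments = [item.get("SummarizedSegment", "") for item in section.get("Summary", [])]
        # One line per summarized segment, keyed by the section name
        result[section.get("SectionName", "UNKNOWN")] = "\n".join(segments)
    return result
```

For example, `for name, text in sections_as_text(summary_json).items(): print(name, text, sep="\n")` prints each section under its name.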

---

## 2. Combining HealthScribe with Comprehend Medical

Call the RxNorm and ICD-10-CM ontology linking APIs in Amazon Comprehend Medical with the treatment plan extracted from the HealthScribe summary.

#### 2.1. RxNorm linking
For each medication it detects, the InferRxNorm API (**`infer_rx_norm`** in boto3) lists the top potentially matching RxCUIs in descending order of confidence score. The RxCUI codes enable downstream analysis that is not possible with unstructured text. Related information such as strength, frequency, dose, dose form, and route of administration is listed as attributes in the JSON response.

In [None]:
cm_client = boto3.client(service_name='comprehendmedical', region_name='us-east-1')

result = cm_client.infer_rx_norm(Text=plan)
entities = result['Entities']
print(json.dumps(entities, indent=2))
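The raw response is verbose. This is a minimal sketch that keeps only the highest-scoring RxNorm concept per detected medication. `top_rxnorm_concepts` is a hypothetical helper; it reads the `Text` field and the `RxNormConcepts` list (`Code`, `Description`, `Score`) of each entity.

```python
# Hypothetical helper: one (text, RxCUI, description, score) row per medication,
# keeping the highest-scoring RxNorm concept.
from typing import List, Tuple


def top_rxnorm_concepts(entities: list, min_score: float = 0.0) -> List[Tuple[str, str, str, float]]:
    rows = []
    for entity in entities:
        concepts = entity.get("RxNormConcepts", [])
        if not concepts:
            continue  # detected medication with no linked concept
        best = max(concepts, key=lambda c: c.get("Score", 0.0))
        score = best.get("Score", 0.0)
        if score >= min_score:
            rows.append((entity.get("Text", ""), best.get("Code", ""), best.get("Description", ""), score))
    return rows
```

For example, `for row in top_rxnorm_concepts(entities): print(row)` prints a compact table of the medications in the plan. Raising `min_score` drops low-confidence links.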

#### 2.2. ICD-10-CM linking
When medical conditions are detected, InferICD10CM returns the matching ICD-10-CM codes and descriptions. The detected conditions are listed in descending order of confidence. The scores indicate the confidence in the accuracy of the entities matched to the concepts found in the text. Related information such as family history, signs, symptoms, and negation are recognized as traits. Additional information such as anatomical designations and acuity are listed as attributes.

In [None]:
result = cm_client.infer_icd10_cm(Text=plan)
entities = result['Entities']
print(json.dumps(entities, indent=2))
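Because negation is reported as a trait, a condition that the clinician ruled out can still appear among the entities. This is a minimal sketch that skips negated conditions and keeps the top ICD-10-CM code for the rest. `top_icd10_concepts` is a hypothetical helper; it assumes a `NEGATION` trait with a `Score`, and the 0.5 threshold is an arbitrary choice.

```python
# Hypothetical helper: top ICD-10-CM concept per condition, skipping conditions
# whose NEGATION trait score reaches `negation_threshold` (assumed default 0.5).
from typing import List, Tuple


def top_icd10_concepts(entities: list, negation_threshold: float = 0.5) -> List[Tuple[str, str, str, float]]:
    rows = []
    for entity in entities:
        negated = any(
            trait.get("Name") == "NEGATION" and trait.get("Score", 0.0) >= negation_threshold
            for trait in entity.get("Traits", [])
        )
        concepts = entity.get("ICD10CMConcepts", [])
        if negated or not concepts:
            continue
        best = max(concepts, key=lambda c: c.get("Score", 0.0))
        rows.append((entity.get("Text", ""), best.get("Code", ""), best.get("Description", ""), best.get("Score", 0.0)))
    return rows
```

For example, `top_icd10_concepts(entities)` applied to the response above lists only the conditions that are not negated in the plan.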