## Arrhythmia Identification from ECG waveform data 

This solution analyses ECG waveform data and classifies each peak as normal or as one of several types of arrhythmia.

This sample notebook shows you how to train a custom ML model for detecting arrhythmia from ECG data and how to run inference with it.

> **Note**: This is a reference notebook and it cannot run unless you make changes suggested in the notebook.

#### Pre-requisites:
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has **AmazonSageMakerFullAccess**.
1. Some hands-on experience using [Amazon SageMaker](https://aws.amazon.com/sagemaker/).
1. To use this algorithm successfully, ensure that:
    1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: 
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
    2. Or your AWS account already has a subscription to Arrhythmia Identification from ECG.

#### Contents:
1. [Subscribe to the algorithm](#1.-Subscribe-to-the-algorithm)
1. [Prepare dataset](#2.-Prepare-dataset)
    1. [Dataset format expected by the algorithm](#A.-Dataset-format-expected-by-the-algorithm)
    1. [Configure dataset](#B.-Configure-dataset)
    1. [Upload datasets to Amazon S3](#C.-Upload-datasets-to-Amazon-S3)
1. [Train a machine learning model](#3:-Train-a-machine-learning-model)
    1. [Set up environment](#3.1-Set-up-environment)
    1. [Train a model](#3.2-Train-a-model)
1. [Deploy model and verify results](#4:-Deploy-model-and-verify-results)
    1. [Deploy trained model](#A.-Deploy-trained-model)
    1. [Create input payload](#B.-Create-input-payload)
    1. [Perform real-time inference](#C.-Perform-real-time-inference)
    1. [Visualize output](#D.-Visualize-output)
    1. [Delete the endpoint](#F.-Delete-the-endpoint)
1. [Perform Batch inference](#5.-Perform-Batch-inference)
1. [Clean-up](#7.-Clean-up)
    1. [Delete the model](#A.-Delete-the-model)
    1. [Unsubscribe to the listing (optional)](#B.-Unsubscribe-to-the-listing-(optional))


#### Usage instructions
Run this notebook one cell at a time (press Shift+Enter to run a cell).

### 1. Subscribe to the algorithm

To subscribe to the algorithm:
1. Open the algorithm listing page, Arrhythmia Identification from ECG.
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the EULA, pricing, and support terms, and click **"Accept Offer"** if you agree with them.
1. Click **Continue to configuration** and then choose a **Region**. You will see a **Product ARN**. This is the algorithm ARN that you need to specify while training a custom ML model. Copy the ARN corresponding to your Region and specify it in the following cell.

In [40]:
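# Replace with the Product ARN for your Region (arn:aws:sagemaker:<region>:<account-id>:algorithm/<name>)
algo_arn='arrhythmia-identification-from-ecg-v2'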
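The value assigned above is a short name rather than a full ARN. Once you paste the Product ARN from the listing, a small check such as the sketch below can catch a malformed ARN, or one copied for a different Region than your SageMaker session uses. The ARN layout in the regular expression is an assumption based on the general AWS ARN format, and `check_algorithm_arn` is a hypothetical helper, not part of the SageMaker SDK.

```python
import re

# Assumed layout: arn:<partition>:sagemaker:<region>:<account-id>:algorithm/<name>
_ALGORITHM_ARN = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):sagemaker:(?P<region>[a-z0-9-]+):"
    r"(?P<account>\d{12}):algorithm/(?P<name>[A-Za-z0-9-]+)$"
)


def check_algorithm_arn(arn, session_region):
    """Return the ARN's Region, or raise ValueError if the ARN is malformed
    or belongs to a different Region than the SageMaker session."""
    match = _ALGORITHM_ARN.match(arn)
    if match is None:
        raise ValueError(f"Not a SageMaker algorithm ARN: {arn!r}")
    region = match.group("region")
    if region != session_region:
        raise ValueError(f"ARN is for {region}, but the session uses {session_region}")
    return region
```

For example, once `sagemaker_session` exists (step 2.C), calling `check_algorithm_arn(algo_arn, sagemaker_session.boto_region_name)` before creating the estimator fails fast on a mismatched ARN.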

### 2. Prepare dataset

In [11]:
import base64
import json 
import uuid
from sagemaker import ModelPackage
import sagemaker as sage
from sagemaker import get_execution_role
from urllib.parse import urlparse
import boto3
from IPython.display import Image
from PIL import Image as ImageEdit
import urllib.request
import numpy as np

#### A. Dataset format expected by the algorithm

Usage instructions:
- For training, upload the patient records as files in a folder.
- For inference, upload all the patients' records as a single zip file.
- Name the uploaded files according to the convention described below.
- All the patients' data is used to train the model.
- The inference engine returns predictions for each patient.

For best results, the algorithm requires data in the following format:
- Each patient must have a .hea, an .atr, and a .dat file (for example, <patient_ID>.hea, <patient_ID>.dat, <patient_ID>.atr).
- The data for inference must be in a zip file named Input.zip.
- The files should be formatted in the same way as the provided sample files.
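Before uploading, it can help to confirm that every patient has all three files and to build `Input.zip` locally. The sketch below does this with the Python standard library. `find_incomplete_records` and `build_input_zip` are hypothetical helpers, and placing the files at the root of the archive (no subfolders) is an assumption about the layout the algorithm expects, so compare it against the provided sample files.

```python
import zipfile
from collections import defaultdict
from pathlib import Path

# Each patient needs one file per extension, e.g. 222.hea, 222.dat, 222.atr
REQUIRED_EXTENSIONS = {".hea", ".dat", ".atr"}


def find_incomplete_records(folder):
    """Return {patient_id: missing_extensions} for patients in `folder`
    that do not have all three required files."""
    found = defaultdict(set)
    for path in Path(folder).iterdir():
        if path.is_file() and path.suffix in REQUIRED_EXTENSIONS:
            found[path.stem].add(path.suffix)
    return {
        patient_id: REQUIRED_EXTENSIONS - extensions
        for patient_id, extensions in found.items()
        if extensions != REQUIRED_EXTENSIONS
    }


def build_input_zip(folder, zip_path):
    """Write every .hea/.dat/.atr file in `folder` to the root of `zip_path`.

    Assumption: the algorithm expects a flat archive with no subfolders.
    """
    missing = find_incomplete_records(folder)
    if missing:
        raise ValueError(f"Incomplete patient records: {missing}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(folder).iterdir()):
            if path.is_file() and path.suffix in REQUIRED_EXTENSIONS:
                archive.write(path, arcname=path.name)
    return zip_path
```

For example, `build_input_zip("input/data/test_records", "input/data/test/Input.zip")`, where `input/data/test_records` is a hypothetical folder holding the unzipped test records.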

#### B. Configure dataset

In [21]:
training_dataset="input/data/train/"
test_dataset = "input/data/test/Input.zip"

#### C. Upload datasets to Amazon S3

In [12]:
sagemaker_session = sage.Session()
bucket=sagemaker_session.default_bucket()
bucket

'sagemaker-us-east-2-786796469737'

In [23]:
# S3 prefixes for the training and test inputs
common_prefix = "mphasis-arrhythmia-identification"
training_input_prefix = common_prefix + "/input/train"
TRAINING_WORKDIR = "input"  # input directory on the Jupyter server
test_input_prefix = common_prefix + "/input/test"

# Upload the data from the Jupyter server to S3
training_input = sagemaker_session.upload_data(training_dataset, bucket=bucket, key_prefix=training_input_prefix)
print("Training input uploaded to " + training_input)
test_input = sagemaker_session.upload_data(test_dataset, bucket=bucket, key_prefix=test_input_prefix)
print("Test input uploaded to " + test_input)

Training input uploaded to s3://sagemaker-us-east-2-786796469737/mphasis-arrhythmia-identification/input/train
Test input uploaded to s3://sagemaker-us-east-2-786796469737/mphasis-arrhythmia-identification/input/test/Input.zip
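`upload_data` returns an `s3://` URI. If you later want to inspect the uploaded objects with boto3, which takes the bucket and key separately, you can split the URI with `urlparse` (already imported above). `split_s3_uri` is a hypothetical helper shown as a sketch.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # The path keeps a leading "/", which S3 keys do not have
    return parsed.netloc, parsed.path.lstrip("/")
```

For example, `bucket_name, prefix = split_s3_uri(training_input)` can be passed to `boto3.client("s3").list_objects_v2(Bucket=bucket_name, Prefix=prefix)` to list what was uploaded.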


### 3: Train a machine learning model

Now that the dataset is available in an accessible Amazon S3 bucket, you are ready to train a machine learning model.

#### 3.1 Set up environment

In [24]:
role = get_execution_role()

In [26]:
output_location = 's3://{}/mphasis-arrhythmia-identification/{}'.format(bucket, 'output')
output_location

's3://sagemaker-us-east-2-786796469737/mphasis-arrhythmia-identification/output'

#### 3.2 Train a model

In [37]:
training_instance_type="ml.m5.4xlarge"

In [43]:
# Create an estimator object for running a training job.
# instance_count/instance_type are the SageMaker Python SDK v2 names; the v1 names
# train_instance_count/train_instance_type are deprecated and no longer needed.
estimator = sage.algorithm.AlgorithmEstimator(
    algorithm_arn=algo_arn,
    base_job_name="arrhythmia-identification-from-ecg",
    role=role,
    instance_count=1,
    instance_type=training_instance_type,
    input_mode="File",
    output_path=output_location,
    sagemaker_session=sagemaker_session,
)

# Run the training job.
estimator.fit({"train": training_input})

INFO:sagemaker:Creating training-job with name: arrhythmia-identification-from-ecg-2023-01-06-11-06-48-733


2023-01-06 11:06:48 Starting - Starting the training job...
2023-01-06 11:07:03 Starting - Preparing the instances for training......
2023-01-06 11:08:02 Downloading - Downloading input data...
2023-01-06 11:08:32 Training - Training image download completed. Training in progress..


2023-01-06 11:09:04 Uploading - Uploading generated training model
2023-01-06 11:09:04 Completed - Training job completed
Training seconds: 63
Billable seconds: 63


See this [blog post](https://aws.amazon.com/blogs/machine-learning/easily-monitor-and-visualize-metrics-while-training-models-on-amazon-sagemaker/) for more information on how to visualize metrics during training. You can also open the training job from the [Amazon SageMaker console](https://console.aws.amazon.com/sagemaker/home?#/jobs/) and monitor the metrics and logs in the **Monitor** section.
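To check the same numbers programmatically, the response of the DescribeTrainingJob API contains the job status, training time, billable time, and model artifact location. The sketch below extracts them from that response dictionary; `summarize_training_job` is a hypothetical helper.

```python
def summarize_training_job(description):
    """Pick the fields reported in the training log out of a
    DescribeTrainingJob response dictionary."""
    return {
        "name": description["TrainingJobName"],
        "status": description["TrainingJobStatus"],
        # Time fields are only present once the job has progressed far enough
        "training_seconds": description.get("TrainingTimeInSeconds"),
        "billable_seconds": description.get("BillableTimeInSeconds"),
        "model_artifacts": description.get("ModelArtifacts", {}).get("S3ModelArtifacts"),
    }
```

With the SageMaker Python SDK, `summarize_training_job(estimator.latest_training_job.describe())` should return these fields for the job that `fit()` just ran.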

### 4: Deploy model and verify results

Now you can deploy the model for performing real-time inference.

In [77]:
model_name='ECG_model.pth'

content_type='application/zip'  # content type of the inference input (a zip archive)

real_time_inference_instance_type='ml.m5.large'
batch_transform_inference_instance_type='ml.m5.large'

#### A. Deploy trained model

In [81]:
predictor = estimator.deploy(1, real_time_inference_instance_type)

INFO:sagemaker:Creating model package with name: arrhythmia-identification-from-ecg-2023-01-06-12-19-08-081


..........

INFO:sagemaker:Creating model with name: arrhythmia-identification-from-ecg-2023-01-06-12-19-08-081





INFO:sagemaker:Creating endpoint-config with name arrhythmia-identification-from-ecg-2023-01-06-12-19-08-081
INFO:sagemaker:Creating endpoint with name arrhythmia-identification-from-ecg-2023-01-06-12-19-08-081


----!

Once the endpoint is created, you can perform real-time inference.

#### B. Create input payload

In [82]:
file_name = test_dataset
output_file_name = "inference_out.json"

#### C. Perform real-time inference

In [83]:
!aws sagemaker-runtime invoke-endpoint \
    --endpoint-name $predictor.endpoint_name \
    --body fileb://$file_name \
    --content-type $content_type \
    --region $sagemaker_session.boto_region_name \
    $output_file_name

See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


{
    "InvokedProductionVariant": "AllTraffic", 
    "ContentType": "application/json"
}
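The CLI call above can also be made from Python with the SageMaker Runtime `invoke_endpoint` API. In the sketch below the runtime client is passed in as a parameter, so it works with a boto3 client (or a stub in tests); `invoke_zip_endpoint` is a hypothetical helper name.

```python
import json


def invoke_zip_endpoint(runtime_client, endpoint_name, zip_path,
                        content_type="application/zip"):
    """Send a zip archive of patient records to a SageMaker endpoint and
    return the decoded JSON response."""
    with open(zip_path, "rb") as f:
        payload = f.read()
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=payload,
    )
    # response["Body"] is a stream; read it fully and parse it as JSON
    return json.loads(response["Body"].read())
```

With boto3 this might look like `runtime = boto3.client("sagemaker-runtime", region_name=sagemaker_session.boto_region_name)` followed by `invoke_zip_endpoint(runtime, predictor.endpoint_name, file_name, content_type)`.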


#### D. Visualize output

In [91]:
with open(output_file_name, "r") as file:
    print(file.read())

{"222": {"1576": "Right bundle branch", "664": "Normal", "625": "Normal", "534": "Normal", "823": "Normal", "351": "Right bundle branch", "1200": "Right bundle branch", "95": "Normal", "1358": "Normal", "432": "Normal", "241": "Right bundle branch", "517": "Normal", "229": "Normal", "372": "Right bundle branch", "918": "Normal", "1196": "Normal", "1287": "Right bundle branch", "147": "Right bundle branch", "1257": "Normal", "1732": "Normal", "127": "Right bundle branch", "311": "Normal", "676": "Right bundle branch", "962": "Normal", "1073": "Normal", "1388": "Normal", "1751": "Right bundle branch", "287": "Normal", "984": "Normal", "806": "Right bundle branch", "1081": "Normal", "172": "Right bundle branch", "281": "Normal", "567": "Normal", "514": "Normal", "1314": "Normal", "1723": "Normal", "946": "Normal", "1605": "Normal", "237": "Normal", "1304": "Normal", "294": "Normal", "1030": "Right bundle branch", "386": "Right bundle branch", "1316": "Normal", "815": "Normal", "1003": "No
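The printed output appears to map each patient ID (here `222`) to a dictionary of peak keys and predicted labels. The sketch below counts the labels per patient and lists them in peak order. Treating the keys as integer peak identifiers is an assumption based on the sample output above, and both helper names are hypothetical.

```python
from collections import Counter


def summarize_predictions(predictions):
    """Count the predicted labels for each patient.

    Assumes the layout printed above: {patient_id: {peak_key: label}}.
    """
    return {
        patient_id: Counter(peaks.values())
        for patient_id, peaks in predictions.items()
    }


def labels_in_peak_order(peaks):
    """Return (peak_key, label) pairs sorted numerically by peak key.

    Assumes every peak key is an integer string such as "1576".
    """
    return sorted(peaks.items(), key=lambda item: int(item[0]))
```

For example, after loading the file with `predictions = json.load(open(output_file_name))`, `summarize_predictions(predictions)["222"].most_common()` lists how many peaks of patient 222 received each label.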

#### F. Delete the endpoint

Now that you have successfully performed real-time inference, you no longer need the endpoint. Delete it to avoid being charged.

In [85]:
predictor.delete_endpoint(delete_endpoint_config=True)

INFO:sagemaker:Deleting endpoint configuration with name: arrhythmia-identification-from-ecg-2023-01-06-12-19-08-081
INFO:sagemaker:Deleting endpoint with name: arrhythmia-identification-from-ecg-2023-01-06-12-19-08-081


Because this is an experiment, you do not need to run a hyperparameter tuning job. If you want to tune a model trained with a third-party algorithm, you can use Amazon SageMaker's hyperparameter tuning functionality.

### 5. Perform Batch inference

In this section, you perform batch inference on all the patient records in `Input.zip` in a single batch transform job.

In [88]:
# Upload the batch transform input (Input.zip) to S3
transform_input_folder = test_dataset
batch_input_prefix = common_prefix + "/batch"
transform_input = sagemaker_session.upload_data(transform_input_folder, key_prefix=batch_input_prefix)
print("Transform input uploaded to " + transform_input)

Transform input uploaded to s3://sagemaker-us-east-2-786796469737/mphasis-arrhythmia-identification/batch/Input.zip


In [87]:
# Run the batch transform job and wait for it to finish
transformer = estimator.transformer(1, batch_transform_inference_instance_type)
transformer.transform(transform_input, content_type=content_type)
transformer.wait()

INFO:sagemaker:Creating model package with name: arrhythmia-identification-from-ecg-v2-2023-01-06-12-26-59-293


..........

INFO:sagemaker:Creating model with name: arrhythmia-identification-from-ecg-v2-2-2023-01-06-12-27-44-676





INFO:sagemaker:Creating transform job with name: arrhythmia-identification-from-ecg-2023-01-06-12-27-47-602


.......................
Starting the inference server with 2 workers.
[2023-01-06 12:31:33 +0000] [10] [INFO] Starting gunicorn 20.1.0
[2023-01-06 12:31:33 +0000] [10] [INFO] Listening at: unix:/tmp/gunicorn.sock (10)
[2023-01-06 12:31:33 +0000] [10] [INFO] Using worker: gevent
[2023-01-06 12:31:33 +0000] [13] [INFO] Booting worker with pid: 13
[2023-01-06 12:31:33 +0000] [14] [INFO] Booting worker with pid: 14

169.254.255.130 - - [06/Jan/2023:12:31:41 +0000] "GET /ping HTTP/1.1" 200 1 "-" "Go-http-client/1.1"
169.254.255.130 - - [06/Jan/2023:12:31:41 +0000] "GET /execution-parameters HTTP/1.1" 404 2 "-" "Go-http-client/1.1"
File Name                                             Modified             Size
222.atr                                        2022-09-13 17:43:50         6230


In [89]:
# The batch transform output is written under this S3 path
transformer.output_path

's3://sagemaker-us-east-2-786796469737/arrhythmia-identification-from-ecg-2023-01-06-12-27-47-602'
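SageMaker batch transform typically names the result for each input object after that object with `.out` appended, so for `Input.zip` you would look for `Input.zip.out` under the output path. The sketch below builds that URI. `expected_transform_output` is a hypothetical helper, and the `.out` naming is an assumption you should confirm against the objects actually listed under `transformer.output_path`.

```python
import posixpath
from urllib.parse import urlparse


def expected_transform_output(output_path, input_uri):
    """Return <output_path>/<input file name>.out for one batch transform input.

    Assumption: batch transform names each result after its input object
    with ".out" appended.
    """
    input_name = posixpath.basename(urlparse(input_uri).path)
    return output_path.rstrip("/") + "/" + input_name + ".out"
```

For example, `S3Downloader.download(expected_transform_output(transformer.output_path, transform_input), "batch_output")` (with `from sagemaker.s3 import S3Downloader`) would copy the result to a local `batch_output` folder.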

### 7. Clean-up

#### A. Delete the model

In [90]:
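sagemaker_session.delete_model(predictor.endpoint_name)  # estimator.delete_endpoint() is removed in SDK v2; deploy() named the model after the endpoint (see log above)
sagemaker_session.delete_model(transformer.model_name)  # model created for the batch transform job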

See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


#### B. Unsubscribe to the listing (optional)

If you would like to unsubscribe from the algorithm, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)
2. Locate the listing that you want to cancel the subscription for, and then choose __Cancel Subscription__ to cancel the subscription.

