# Human Activity Recognition - AWS

### Problem Statement
Let’s open the notebook “HAR Model training notebook”. The problem statement is as follows: we deploy the Human Activity Recognition (HAR) problem using the Level 1 MLOps architecture, with the aim of improving Blackmi's health app by overcoming the problems faced in the Level 0 architecture. Using the Human Activity Recognition dataset, we build a machine learning model and the accompanying ML pipelines in Amazon SageMaker Studio to categorise user activities for real-time health alerts. We also monitor model performance and deploy the model using different deployment techniques.

### Approach 
In this notebook we build the Level 1 MLOps architecture, focusing on creating the ML pipeline, model monitoring, and model deployment. The main takeaway from this lesson is:

1. Feature engineering with Amazon SageMaker Processing
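
As a rough illustration of the kind of step a Processing script might perform before writing to the train, validation and test locations used later in this notebook, the sketch below shuffles rows reproducibly and splits them into three sets. The function name `split_rows`, the 70/15/15 ratios and the seed are assumptions made for illustration; they are not taken from the notebook's actual preprocessing logic.

```python
import random


def split_rows(rows, train_frac=0.7, validation_frac=0.15, seed=42):
    """Shuffle rows reproducibly and split them into train/validation/test lists.

    The fractions and the seed are illustrative assumptions, not notebook values.
    """
    if not 0 < train_frac + validation_frac < 1:
        raise ValueError("train_frac + validation_frac must be between 0 and 1")
    shuffled = list(rows)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_validation = int(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains
    return train, validation, test
```

Fixing the seed means rerunning the job produces the same split, which makes later model comparisons easier to interpret.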



In [8]:
# Importing all the necessary libraries 
# Importing pandas and numpy for data preprocessing. 
import pandas as pd
import numpy as np
# Boto3 is the AWS SDK for Python; it is used here to work with S3 and to invoke the endpoint.
import boto3
# Sagemaker is imported for building, training and deploying machine learning models.
import sagemaker


In [9]:
# Initialising new sagemaker session as "sess".
sess = sagemaker.Session()
# Retrieve the IAM execution role that grants the permissions needed for training and deploying models.
role = sagemaker.get_execution_role()
# Look up the AWS region in which this session is configured to operate.
region = boto3.Session().region_name
region


INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole


'ap-south-1'

In [13]:
# Name of the S3 bucket that holds the data and the model artifacts.
bucket = 'sagemaker-studio-009676737623-l4vs7j0o0ib'
# Key prefix under which all objects for this project are stored.
prefix = 'mlops-level1-data'
# S3 location of the input dataset.
input_source = f's3://{bucket}/{prefix}/train_data.gzip'


In [11]:
train_path = f"s3://{bucket}/{prefix}/train"
validation_path = f"s3://{bucket}/{prefix}/validation"
test_path = f"s3://{bucket}/{prefix}/test"
feature_path = f"s3://{bucket}/{prefix}/feature"
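
The SageMaker SDK takes full `s3://` URIs such as the paths above, while boto3 S3 calls (used later for the data capture files) take the bucket and key separately. A minimal sketch of helpers for converting between the two forms follows; `join_s3_uri` and `split_s3_uri` are hypothetical names introduced here, not part of either library.

```python
from urllib.parse import urlparse


def join_s3_uri(bucket, *parts):
    """Build an s3:// URI from a bucket name and key components."""
    key = "/".join(part.strip("/") for part in parts if part)
    return f"s3://{bucket}/{key}" if key else f"s3://{bucket}"


def split_s3_uri(uri):
    """Split an s3:// URI into a (bucket, key) tuple."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")
```

For example, `split_s3_uri(train_path)` would give the `Bucket` and `Prefix` arguments needed to list the training files with an S3 client.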


## Training

In [5]:
# train_path and validation_path are already complete S3 URIs, so no further formatting is needed.
s3_input_train = sagemaker.inputs.TrainingInput(s3_data=train_path, content_type='text/csv')
s3_input_validation = sagemaker.inputs.TrainingInput(s3_data=validation_path, content_type='text/csv')
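
For CSV training input, the SageMaker built-in XGBoost algorithm expects files without a header row and with the target label in the first column. The sketch below shows one way to bring rows into that layout using only the standard library. The helper name `to_xgboost_csv` and the column name `activity` in the usage note are hypothetical.

```python
import csv
import io


def to_xgboost_csv(rows, fieldnames, label_column):
    """Return headerless CSV text with the label as the first column.

    rows is an iterable of dicts keyed by the names in fieldnames.
    """
    feature_columns = [name for name in fieldnames if name != label_column]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Label first, then the features in their original order; no header is written.
        writer.writerow([row[label_column]] + [row[name] for name in feature_columns])
    return buffer.getvalue()
```

Calling `to_xgboost_csv(rows, ["f1", "f2", "activity"], "activity")` would emit one line per row, starting with the activity label.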

In [67]:
import sagemaker
import boto3
from sagemaker import image_uris
from sagemaker.session import Session
from sagemaker.inputs import TrainingInput

# initialize hyperparameters
hyperparameters = {
        "num_class":6,
        "max_depth":"5",
        "eta":"0.2",
        "gamma":"4",
        "min_child_weight":"6",
        "subsample":"0.7",
        "objective":"multi:softmax",
        "num_round":"50"}

# Set the S3 output path where the trained model artifact will be saved.
# Note: the 'abalone-xgb-built-in-algo' folder name is reused in model_url in the Deployment section.
output_path = 's3://{}/{}/{}/output'.format(bucket, prefix, 'abalone-xgb-built-in-algo')

# Retrieve the URI of the SageMaker-managed XGBoost container image for this region.
# Change the version ("1.7-1") to use a different XGBoost release.
xgboost_container = sagemaker.image_uris.retrieve("xgboost", region, "1.7-1")

# Construct a SageMaker estimator that runs the XGBoost container.
estimator = sagemaker.estimator.Estimator(image_uri=xgboost_container,
                                          hyperparameters=hyperparameters,
                                          role=role,  # execution role retrieved earlier
                                          instance_count=1,
                                          instance_type='ml.m5.2xlarge',
                                          volume_size=5,  # 5 GB
                                          output_path=output_path)

# Define the content type and the S3 paths of the training and validation datasets.
content_type = "text/csv"
train_input = TrainingInput("s3://{}/{}/{}/".format(bucket, prefix, 'train'), content_type=content_type)
validation_input = TrainingInput("s3://{}/{}/{}/".format(bucket, prefix, 'validation'), content_type=content_type)

# execute the XGBoost training job
estimator.fit({'train': train_input, 'validation': validation_input})

INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating training-job with name: sagemaker-xgboost-2023-09-25-09-06-54-010


2023-09-25 09:06:54 Starting - Starting the training job...
2023-09-25 09:07:08 Starting - Preparing the instances for training......
2023-09-25 09:08:11 Downloading - Downloading input data
2023-09-25 09:08:11 Training - Downloading the training image...
2023-09-25 09:08:47 Training - Training image download completed. Training in progress...
[2023-09-25 09:09:04.087 ip-10-0-185-47.ap-south-1.compute.internal:7 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None
[2023-09-25 09:09:04.110 ip-10-0-185-47.ap-south-1.compute.internal:7 INFO profiler_config_parser.py:111] User has disabled profiler.
[2023-09-25:09:09:04:INFO] Imported framework sagemaker_xgboost_container.training
[2023-09-25:09:09:04:INFO] Failed to parse hyperparameter objective value multi:softmax to Json.
Returning the value itself
[2023-09-25:09:09:04:INFO] No GPUs detected (normal if no gpus installed)
[2023-09-25:09:09:04:INFO] Running XGBoost Sagemaker in a

## Deployment 

In [68]:
model_url = f's3://{bucket}/{prefix}/abalone-xgb-built-in-algo/output/sagemaker-xgboost-2023-09-25-09-06-54-010/output/model.tar.gz'
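
The hard-coded URL above follows the layout `<output_path>/<training job name>/output/model.tar.gz`, where the job name is the one printed in the training log. A minimal sketch of composing it from those two pieces is shown below; `model_artifact_uri` is a hypothetical helper. In a live session the same value should also be available as `estimator.model_data` once training has finished.

```python
def model_artifact_uri(output_path, training_job_name):
    """Return the S3 URI of the model archive written by a training job."""
    return f"{output_path.rstrip('/')}/{training_job_name}/output/model.tar.gz"
```

Deriving the URL this way avoids editing the timestamped job name by hand each time the model is retrained.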

In [69]:
from sagemaker import image_uris

# Name of the framework or algorithm
framework = 'xgboost'

# Version of the framework or algorithm
version = '1.7-1'

# Retrieve the URI of the AWS-managed container image for this framework and version.
container = image_uris.retrieve(region=region,
                                framework=framework,
                                version=version)

INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.


In [70]:
from sagemaker.model import Model

model = Model(image_uri=container, 
              model_data=model_url, 
              role=role)

In [71]:
from datetime import datetime, timezone

# Build a unique endpoint name from the current UTC time.
endpoint_name = f"xgboost-inference-{datetime.now(timezone.utc):%Y-%m-%d-%H%M}"
print("EndpointName =", endpoint_name)

EndpointName = xgboost-inference-2023-09-25-0911
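
Endpoint names are subject to naming rules. The sketch below wraps the timestamped naming scheme in a helper that validates the result before use. The rule encoded here (alphanumeric characters separated by hyphens, at most 63 characters) is an assumption based on the SageMaker API documentation and should be checked against the current reference; `make_endpoint_name` is a hypothetical helper.

```python
import re
from datetime import datetime, timezone

# Assumed naming rule: alphanumeric characters separated by hyphens, at most 63 characters.
ENDPOINT_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")
MAX_ENDPOINT_NAME_LENGTH = 63


def make_endpoint_name(prefix, now=None):
    """Return '<prefix>-YYYY-MM-DD-HHMM' (UTC) after checking the assumed naming rule."""
    now = now or datetime.now(timezone.utc)
    name = f"{prefix}-{now:%Y-%m-%d-%H%M}"
    if len(name) > MAX_ENDPOINT_NAME_LENGTH or not ENDPOINT_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid endpoint name: {name!r}")
    return name
```

Failing early on an invalid name is cheaper than waiting for the deployment call to reject it.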


In [72]:
from sagemaker.model_monitor import DataCaptureConfig

# Set to True to enable data capture
enable_capture = True

# Optional - Sampling percentage: an integer between 0 and 100
sampling_percentage = 100

# Optional - S3 URI under which the captured data is stored
s3_capture_upload_path = f"s3://{bucket}/{prefix}/datacapture/"

# Capture the request (input), the response (output), or both; here we capture both.
capture_modes = ['REQUEST', 'RESPONSE']

# Configuration object passed in when deploying models to SageMaker endpoints
data_capture_config = DataCaptureConfig(
    enable_capture=enable_capture,
    sampling_percentage=sampling_percentage,  # Optional
    destination_s3_uri=s3_capture_upload_path,  # Optional
    capture_options=capture_modes,
)

In [73]:
initial_instance_count = 1
instance_type = 'ml.m4.xlarge'

# Create the endpoint with data capture enabled and wait until it is in service.
model.deploy(
    initial_instance_count=initial_instance_count,
    instance_type=instance_type,
    endpoint_name=endpoint_name,
    data_capture_config=data_capture_config,
    wait=True,
)

INFO:sagemaker:Creating model with name: sagemaker-xgboost-2023-09-25-09-11-40-136
INFO:sagemaker:Creating endpoint-config with name xgboost-inference-2023-09-25-0911
INFO:sagemaker:Creating endpoint with name xgboost-inference-2023-09-25-0911


-----!

### Real time prediction

In [74]:
endpoint_name

'xgboost-inference-2023-09-25-0911'

In [86]:
%%time

file_name = "test.csv"  # customize to your test file

with open(file_name, "r") as f:
    payload = f.read().strip()

# Create the runtime client once and reuse it for every request.
runtime_client = boto3.client("sagemaker-runtime")

# Send the first 10 rows to the endpoint, one row per request, and keep each prediction.
predictions = []
for payload_input in payload.split("\n")[:10]:
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="text/csv", Body=payload_input
    )
    predictions.append(response["Body"].read().decode("ascii").strip())
# print("Predicted classes: {}".format(predictions))

CPU times: user 781 ms, sys: 4.55 ms, total: 786 ms
Wall time: 1.2 s
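
Because the model was trained with the `multi:softmax` objective, each response body carries a predicted class index as text rather than per-class probabilities. A minimal sketch of turning that text into an integer and a readable label is shown below. The `ACTIVITY_LABELS` list and its ordering are hypothetical: the notebook does not state how the six classes are encoded, so the mapping must be replaced with the encoding actually used during preprocessing.

```python
# Hypothetical label order; replace with the encoding used when the data was prepared.
ACTIVITY_LABELS = [
    "WALKING",
    "WALKING_UPSTAIRS",
    "WALKING_DOWNSTAIRS",
    "SITTING",
    "STANDING",
    "LAYING",
]


def parse_prediction(body_text):
    """Convert a response body such as '1.0' plus a trailing newline into a class index."""
    class_index = int(float(body_text.strip()))
    if not 0 <= class_index < len(ACTIVITY_LABELS):
        raise ValueError(f"class index out of range: {class_index}")
    return class_index


def describe_prediction(body_text):
    """Return the (assumed) activity label for a response body."""
    return ACTIVITY_LABELS[parse_prediction(body_text)]
```

Range-checking the index guards against silently mislabelling an unexpected response.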


In [87]:
# List the data capture files that the endpoint has written to S3 so far.
data_capture_prefix = "{}/datacapture".format(prefix)
s3_client = boto3.Session().client("s3")
current_endpoint_capture_prefix = "{}/{}".format(data_capture_prefix, endpoint_name)
result = s3_client.list_objects(Bucket=bucket, Prefix=current_endpoint_capture_prefix)
# "Contents" is absent from the response when no objects match, so default to an empty list.
capture_files = [capture_file.get("Key") for capture_file in result.get("Contents", [])]
print("Found Capture Files:")
print("\n ".join(capture_files))

INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole


Found Capture Files:
mlops-level1-data/datacapture/xgboost-inference-2023-09-25-0911/AllTraffic/2023/09/25/09/18-21-625-e029529b-f453-403c-83ea-942ffc4f695d.jsonl
 mlops-level1-data/datacapture/xgboost-inference-2023-09-25-0911/AllTraffic/2023/09/25/09/36-00-255-6f1f9af6-697c-4bbc-9673-f362b9dea1be.jsonl
 mlops-level1-data/datacapture/xgboost-inference-2023-09-25-0911/AllTraffic/2023/09/25/10/34-34-667-136c72cf-7b8c-46cc-a12a-2f72e882995d.jsonl
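
The keys listed above follow the layout `<capture prefix>/<endpoint name>/<variant>/<yyyy>/<mm>/<dd>/<hh>/<file>.jsonl`, with `AllTraffic` as the variant name. The sketch below builds the prefix for a single hour, which can be passed as `Prefix` when listing objects to narrow the search. `capture_prefix_for_hour` is a hypothetical helper, and treating the hour folder as UTC is an assumption based on the timestamps in this notebook.

```python
from datetime import datetime


def capture_prefix_for_hour(capture_prefix, endpoint_name, hour, variant="AllTraffic"):
    """Return the S3 key prefix holding capture files for one endpoint variant and hour."""
    return f"{capture_prefix.strip('/')}/{endpoint_name}/{variant}/{hour:%Y/%m/%d/%H}/"
```

For the endpoint in this notebook, `capture_prefix_for_hour(data_capture_prefix, endpoint_name, datetime(2023, 9, 25, 9))` would select only the files captured during that hour.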


In [88]:
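def get_obj_body(obj_key):
    """Download an S3 object from the bucket and return its contents as text."""
    return s3_client.get_object(Bucket=bucket, Key=obj_key).get("Body").read().decode("utf-8")


# Show the first 5,000 characters of the last listed capture file.
capture_file = get_obj_body(capture_files[-1])
print(capture_file[:5000])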

{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"-0.84124676,0.96339614,0.89946864,0.89205451,0.97743631,-0.16126549,-0.93472378,-0.14083968,-0.12321341,0.17994061,-1.0,-0.97090521","encoding":"CSV"},"endpointOutput":{"observedContentType":"text/csv; charset=utf-8","mode":"OUTPUT","data":"1.0\n","encoding":"CSV"}},"eventMetadata":{"eventId":"b8eb498a-5d88-450f-a74e-3ed3345aa6ee","inferenceTime":"2023-09-25T10:34:34Z"},"eventVersion":"0"}
{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"-0.8447876,0.96656113,0.9078289,0.89206031,0.98452014,-0.16134256,-0.94306751,-0.14155127,-0.11489334,0.18028889,-1.0,-0.97058275","encoding":"CSV"},"endpointOutput":{"observedContentType":"text/csv; charset=utf-8","mode":"OUTPUT","data":"1.0\n","encoding":"CSV"}},"eventMetadata":{"eventId":"daad485d-d8bc-47a8-9b4e-957ec22b8523","inferenceTime":"2023-09-25T10:34:34Z"},"eventVersion":"0"}
{"captureData":{"endpointInput":{"obse
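
Each line of a capture file is a standalone JSON record holding the request, the response and event metadata. The sketch below pulls the feature values and the prediction out of one such line. It assumes the field names seen in the output above and CSV-encoded payloads; `parse_capture_line` is a hypothetical helper.

```python
import json


def parse_capture_line(line):
    """Extract features, prediction and metadata from one captured JSON line."""
    record = json.loads(line)
    capture = record["captureData"]
    request, response = capture["endpointInput"], capture["endpointOutput"]
    # This sketch only handles payloads that were captured as plain CSV text.
    if request.get("encoding") != "CSV" or response.get("encoding") != "CSV":
        raise ValueError("only CSV-encoded capture records are supported")
    return {
        "features": [float(value) for value in request["data"].split(",")],
        "prediction": float(response["data"].strip()),
        "event_id": record["eventMetadata"]["eventId"],
        "inference_time": record["eventMetadata"]["inferenceTime"],
    }
```

Apply it to `capture_file.splitlines()` rather than to the 5,000-character preview, since the preview can cut the final record in the middle.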