Installing required libraries for stepfunctions

In [14]:
import sys
!{sys.executable} -m pip install --upgrade pip
!{sys.executable} -m pip install -qU awscli boto3 "sagemaker>=2.0.0"
!{sys.executable} -m pip install -qU "stepfunctions>=2.0.0"
!{sys.executable} -m pip show sagemaker stepfunctions

Name: sagemaker
Version: 2.109.0
Summary: Open source library for training and deploying models on Amazon SageMaker.
Home-page: https://github.com/aws/sagemaker-python-sdk/
Author: Amazon Web Services
Author-email: 
License: Apache License 2.0
Location: /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages
Requires: attrs, boto3, google-pasta, importlib-metadata, numpy, packaging, pandas, pathos, protobuf, protobuf3-to-dict, smdebug-rulesconfig
Required-by: stepfunctions
---
Name: stepfunctions
Version: 2.3.0
Summary: Open source library for developing data science workflows on AWS Step Functions.
Home-page: https://github.com/aws/aws-step-functions-data-science-sdk-python
Author: Amazon Web Services
Author-email: 
License: Apache License 2.0
Location: /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages
Requires: boto3, pyyaml, sagemaker
Required-by: 


## 1. Setting up the notebook with parameters and libraries

In [12]:
!pip install stepfunctions



In [16]:
# Import Python libraries
import stepfunctions
import logging

from stepfunctions.steps import *
from stepfunctions.workflow import Workflow
from stepfunctions import steps
from stepfunctions.inputs import ExecutionInput
from sagemaker.processing import Processor, ProcessingInput, ProcessingOutput
import calendar
import time
import sagemaker
from sagemaker.inputs import TrainingInput
import boto3
from sagemaker.network import NetworkConfig

stepfunctions.set_stream_logger(level=logging.INFO)

ModuleNotFoundError: No module named 'sagemaker.workflow'; 'sagemaker' is not a package
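
The message "'sagemaker' is not a package" often appears when a local file such as `sagemaker.py` shadows the installed library. That cause is an assumption here, since the output above does not confirm it. A minimal diagnostic sketch, using only the standard library, shows where Python resolves the name from; the helper name `describe_module` is hypothetical.

```python
import importlib.util


def describe_module(name):
    """Return where `name` resolves from and whether it is a package, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return {
        "origin": spec.origin,
        # Packages expose search locations for submodules; plain .py modules do not.
        "is_package": spec.submodule_search_locations is not None,
    }


# If "origin" points at a file in the notebook's working directory rather than
# site-packages, a local file is probably shadowing the installed library.
print(describe_module("sagemaker"))
```

If the reported origin is a local `sagemaker.py`, renaming that file and restarting the kernel is the usual remedy.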

### Defining parameters

These values are environment-specific and must be updated before running the notebook in a different environment.

In [3]:
v_workflow_execution_role = "arn:aws:iam::525102048888:role/poc-sagemaker-step-functi-MachineLearningWorkflowE-1XFI2UPRXFTXE"  # Step Functions workflow execution IAM role ARN
v_preprocessing_iam_role = "arn:aws:iam::525102048888:role/service-role/AmazonSageMaker-ExecutionRole-20191105T125227"  # IAM role for the preprocessing container
v_preprocessing_instance_type = "ml.m5.xlarge"  # Instance type for the preprocessing container; adjust to the workload
v_s3_input_bucket = "wi-cred-datalake-dev-raw"  # S3 bucket for input and output data
v_prefix_for_input_data = "nlp/feature/turi/Merged_groundtruth/"  # Prefix where the input data is stored
v_region = "us-east-1"  # AWS region
sec_groups = ["sg-044e0e7ce4f5721c0"]  # Security groups for the optional NetworkConfig
subnets = ["subnet-0cf0e3f46326aa259",
           "subnet-0156b7f5500cf0b78",
           "subnet-032420199163cff9b"]  # Subnets for the optional NetworkConfig
config_bucket = "wi-cred-datalake-dev-s3-mlops-config"  # S3 bucket the baseline reports are downloaded from (section 4)
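
Because these values change per environment, one option is to read them from environment variables and fall back to the current values. This is only a sketch: the `load_param` helper and the variable names (`PREPROCESSING_INSTANCE_TYPE`, `S3_INPUT_BUCKET`, `NOTEBOOK_AWS_REGION`) are hypothetical and not part of the original notebook.

```python
import os


def load_param(name, default, env=None):
    """Return environment variable `name`, or `default` when it is unset or blank."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    return value if value else default


# Hypothetical variable names; the defaults are the values used in this notebook.
v_preprocessing_instance_type = load_param("PREPROCESSING_INSTANCE_TYPE", "ml.m5.xlarge")
v_s3_input_bucket = load_param("S3_INPUT_BUCKET", "wi-cred-datalake-dev-raw")
v_region = load_param("NOTEBOOK_AWS_REGION", "us-east-1")
```

With this approach the notebook body stays unchanged across environments, and only the variables set on the notebook instance differ.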

## 2. Defining preprocessing jobs

In [5]:
# Defining environment config for baseline jobs
environment = {
    "analysis_type": "MODEL_QUALITY",
    "dataset_format": "{\"csv\":{\"header\":true,\"output_columns_position\": \"START\"}}",
    "dataset_source": "/opt/ml/processing/input/baseline_dataset_input",
    "output_path": "/opt/ml/processing/output",
    "publish_cloudwatch_metrics": "Disabled",
    "ground_truth_attribute": "groundtruth",
    "inference_attribute": "prediction",
    "problem_type": "BinaryClassification"
}
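
The hand-escaped JSON string in `dataset_format` is easy to get wrong. A possible alternative, sketched below, builds the same value with `json.dumps` from a plain dictionary. The whitespace of the resulting string differs slightly from the original, but it parses to the same JSON document.

```python
import json

# Build the nested dataset_format value from a dict instead of escaping quotes by hand.
dataset_format = json.dumps(
    {"csv": {"header": True, "output_columns_position": "START"}}
)

environment = {
    "analysis_type": "MODEL_QUALITY",
    "dataset_format": dataset_format,
    "dataset_source": "/opt/ml/processing/input/baseline_dataset_input",
    "output_path": "/opt/ml/processing/output",
    "publish_cloudwatch_metrics": "Disabled",
    "ground_truth_attribute": "groundtruth",
    "inference_attribute": "prediction",
    "problem_type": "BinaryClassification",
}
```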

In [6]:
# Create the baseline processor
baseline_processor = Processor(
    image_uri='156813124566.dkr.ecr.us-east-1.amazonaws.com/sagemaker-model-monitor-analyzer',
    role=v_preprocessing_iam_role,
    instance_count=1,
    instance_type=v_preprocessing_instance_type,
    # network_config = NetworkConfig(security_group_ids = sec_groups, subnets = subnets),
    env=environment)

In [7]:
input_data = "s3://{}/{}".format(v_s3_input_bucket, v_prefix_for_input_data)
inputs = [
    ProcessingInput(
        source=input_data,
        destination="/opt/ml/processing/input/baseline_dataset_input",
        input_name="input_data",
    )
]

outputs = [
    ProcessingOutput(
        source="/opt/ml/processing/output",
        destination="s3://{}/{}".format(v_s3_input_bucket, "vehicle/usedcars/feature/lr/ModelDrift-BaselineOutput/"),
        output_name="baseline_data",
    )
]
gmt = time.gmtime()
ts = calendar.timegm(gmt)
baseline_name = "baseline-{}".format(ts)
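
The job name above appends a Unix timestamp so that each run gets a distinct name. A minimal sketch of the same idea, wrapped in a hypothetical `make_job_name` helper, also checks the name before it is sent to SageMaker. The pattern and the 63-character limit reflect the processing job naming rule as I understand it from the SageMaker API reference, so treat them as an assumption to verify.

```python
import re
import time

# Assumed naming rule: alphanumerics and hyphens, starting and ending with an
# alphanumeric character, at most 63 characters in total.
_JOB_NAME_RE = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def make_job_name(prefix, now=None):
    """Return '<prefix>-<unix timestamp>' and validate it against the assumed rule."""
    # int(time.time()) yields the same value as calendar.timegm(time.gmtime()).
    ts = int(time.time() if now is None else now)
    name = "{}-{}".format(prefix, ts)
    if len(name) > 63 or not _JOB_NAME_RE.match(name):
        raise ValueError("invalid job name: {!r}".format(name))
    return name


baseline_name = make_job_name("baseline")
```

Note that a timestamp with one-second resolution only guarantees uniqueness when runs start at least a second apart.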

In [8]:
print("s3://{}/{}".format(v_s3_input_bucket,"vehicle/usedcars/feature/lr/ModelDrift-BaselineOutput/"))


s3://wi-cred-datalake-dev-raw/vehicle/usedcars/feature/lr/ModelDrift-BaselineOutput/


In [9]:
baseline_preprocessing_step = steps.ProcessingStep(
    state_id='Baseline', 
    processor=baseline_processor,
    job_name=baseline_name, 
    inputs=inputs, 
    #kms_key_id='3084dc48-1a82-435b-8a8d-8001f8890c08',
    outputs=outputs, 
    experiment_config=None, 
    wait_for_completion=True
)

## 3. Step Function

In [10]:
# Chain the workflow steps; here the chain contains only the baseline processing step
basic_path=Chain([baseline_preprocessing_step])

In [11]:
import time
print(time.localtime())

time.struct_time(tm_year=2022, tm_mon=6, tm_mday=7, tm_hour=5, tm_min=28, tm_sec=2, tm_wday=1, tm_yday=158, tm_isdst=0)


In [12]:
# Next, we define the workflow
basic_workflow = Workflow(
    # The state machine name is fixed (no unique suffix is appended)
    name="wi-mlops-modeldrift-baseline-job",
    definition=basic_path,
    role=v_workflow_execution_role
)

# Render the workflow
basic_workflow.render_graph()



## 3.1 Create the workflow on AWS Step Functions

Create the workflow in AWS Step Functions with [create](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Workflow.create).

In [13]:
basic_workflow.create()

[INFO] Workflow created successfully on AWS Step Functions.


'arn:aws:states:us-east-1:525102048888:stateMachine:wi-mlops-modeldrift-baseline-job'

In [None]:
basic_workflow.update(definition=basic_workflow.definition,role=basic_workflow.role)

In [14]:
basic_workflow_execution = basic_workflow.execute(
    inputs={
    }
)

[INFO] Workflow execution started successfully on AWS Step Functions.


## 3.2 Review the execution progress

Render the workflow's progress with [render_progress](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Execution.render_progress).

This generates a static snapshot of the workflow's current state as it executes. Run the cell again to check progress.

In [None]:
basic_workflow_execution.render_progress()

## 4. Downloading the generated report to the notebook

In [None]:
# Download the constraints file for evaluation
!aws s3 cp s3://$config_bucket/custommonitor/constraints.json .

In [None]:
# Download the statistics file for evaluation
!aws s3 cp s3://$config_bucket/custommonitor/statistics.json .

We can analyze these files and commit them to GitHub.
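
As a starting point for that analysis, the sketch below loads each downloaded file and lists its top-level keys with their value types. It deliberately makes no assumption about the report schema beyond the top level being a JSON object; the helper name `top_level_summary` is hypothetical.

```python
import json
import os


def top_level_summary(path):
    """Map each top-level key of a JSON object file to the type name of its value."""
    with open(path) as f:
        doc = json.load(f)
    return {key: type(value).__name__ for key, value in doc.items()}


# Summarize whichever of the downloaded files are present in the working directory.
for filename in ("constraints.json", "statistics.json"):
    if os.path.exists(filename):
        print(filename, top_level_summary(filename))
```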

**Note:**
For the monitoring schedule Lambda function to reference the modified location, change the baseline statistics location prefix from `monitoring` to `custom_monitoring/`.
