## Introduction

This notebook shows how to use the AWS Step Functions Data Science SDK to create and manage workflows. The Step Functions SDK is an open source library that allows data scientists to easily create and execute machine learning workflows using AWS Step Functions and Amazon SageMaker. For more information, see the following resources:
* [AWS Step Functions](https://aws.amazon.com/step-functions/)
* [AWS Step Functions Developer Guide](https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html)
* [AWS Step Functions Data Science SDK](https://aws-step-functions-data-science-sdk.readthedocs.io)

In this notebook we will use the SDK to create steps, link them together to create a workflow, and execute the workflow in AWS Step Functions. 

In [1]:
import sys
!{sys.executable} -m pip install --upgrade pip
!{sys.executable} -m pip install -qU awscli boto3 "sagemaker>=2.0.0"
!{sys.executable} -m pip install -qU "stepfunctions>=2.0.0"
!{sys.executable} -m pip show sagemaker stepfunctions

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
aiobotocore 1.3.0 requires botocore<1.20.50,>=1.20.49, but you have botocore 1.26.4 which is incompatible.
Name: sagemaker
Version: 2.91.1
Summary: Open source library for training and deploying models on Amazon SageMaker.
Home-page: https://github.com/aws/sagemaker-python-sdk/
Author: Amazon Web Services
Author-email: 
License: Apache License 2.0
Location: /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages
Requires: attrs, boto3, google-pasta, importlib-metadata, numpy, packaging, pandas, pathos, protobuf, protobuf3-to-dict, smdebug-rulesconfig
Required-by: stepfunctions
---
Name: stepfunctions
Version: 2.3.0
Summary: Open source library for developing data science workflows on AWS Step Functions.
Home-page: https://github.com/aws/aws-step-functions-data-science-sdk-python
Author: Amazon W

## Prerequisites

In [2]:
import sagemaker

In [11]:
v_workflow_execution_role = "arn:aws:iam::525102048888:role/poc-sagemaker-step-functi-MachineLearningWorkflowE-1XFI2UPRXFTXE"  # Step Functions IAM role ARN
v_preprocessing_iam_role = "arn:aws:iam::525102048888:role/service-role/AmazonSageMaker-ExecutionRole-20191105T125227"  # IAM role for the preprocessing container
v_lambda_execution_role = "arn:aws:iam::525102048888:role/poc-sagemaker-step-functi-LambaForDataGenerationEx-PKONGQTFWLRF"  # Lambda execution IAM role ARN
v_preprocessing_instance_type = "ml.m5.xlarge"  # Instance type for the preprocessing container; adjust to the workload
v_s3_input_bucket = "poc-vci-sagemaker"  # S3 bucket for input and output data
v_score_instance_type = "ml.m5.xlarge"  # Instance type for training
v_validation_scoring_instance_type = "ml.m5.large"  # Instance type for batch scoring
v_model_name = "wi-mlops-lease-pric-ml-train-piln-lr-3031958562"  # Name of DS_MLOPS model to be kept
# Set the model name above to run this for XGBoost or Linear Learner
v_region = "us-east-1"  # AWS region
v_model_container = sagemaker.image_uris.retrieve("xgboost", v_region, "1.2-1")  # XGBoost container image URI
outputloc = "s3://wi-cred-datalake-dev-raw/vehicle/usedcars/feature/lr/"  # S3 output location

## 3 Import the required modules from the SDK and upload code to S3

In [4]:
import logging
import uuid

import boto3
import sagemaker
import stepfunctions
from sagemaker.inputs import TrainingInput
from sagemaker.network import NetworkConfig
from sagemaker.processing import Processor, ProcessingInput, ProcessingOutput
from stepfunctions import steps
from stepfunctions.inputs import ExecutionInput
from stepfunctions.steps import *  # provides Chain and TransformStep used below
from stepfunctions.workflow import Workflow

stepfunctions.set_stream_logger(level=logging.INFO)

## 4. Create workflow

In the following cells, you will define the step used in the workflow, then create, visualize, and execute the workflow.

Steps relate to states in AWS Step Functions. For more information, see [States](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-states.html) in the *AWS Step Functions Developer Guide*. For more information on the AWS Step Functions Data Science SDK APIs, see: https://aws-step-functions-data-science-sdk.readthedocs.io. 

In [9]:
import datetime

# Capture the timestamp once so that all components refer to the same instant
now = datetime.datetime.now()
year = now.strftime("%Y")
month = now.strftime("%m")
day = now.strftime("%d")
hour = now.strftime("%H")
print(year)
print(month)
print(day)
print(hour)

2022
05
20
11


In [13]:
# SageMaker requires a unique name for each job, model, and endpoint.
# If a name is reused, the execution fails. Pass the names dynamically for each execution using placeholders.

execution_input = ExecutionInput(
    schema={
        # "PreprocessingJobName": str,
        "scoringstep": str,
    }
)
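
The schema above maps each placeholder name to the Python type expected at execution time. As a rough illustration of what that contract means, the following sketch checks a dictionary of inputs against such a schema in plain Python. The helper `validate_inputs` is hypothetical and is not part of the SDK; it only mirrors the shape of the schema used in this notebook.

```python
def validate_inputs(schema, inputs):
    """Raise ValueError if `inputs` lacks a key or has a value of the wrong type.

    Hypothetical helper for illustration; not part of the Step Functions SDK.
    """
    for key, expected_type in schema.items():
        if key not in inputs:
            raise ValueError("missing execution input: {!r}".format(key))
        if not isinstance(inputs[key], expected_type):
            raise ValueError(
                "execution input {!r} should be {}, got {}".format(
                    key, expected_type.__name__, type(inputs[key]).__name__
                )
            )


# Same shape as the schema passed to ExecutionInput above.
schema = {"scoringstep": str}

# A well-formed input passes silently.
validate_inputs(schema, {"scoringstep": "LR-gen-Prediction-example"})
```

Running a check like this before starting an execution can surface a missing or mistyped name locally instead of as a failed state machine run.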

## 4.5 Create a batch transform step

With the preceding steps in place, we score a small data set with a batch transform job to confirm that all the components work together.

In [16]:
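from sagemaker.inputs import TransformInput

# `lr` is assumed to be a SageMaker Transformer defined in an earlier cell that is not shown here.
batch_scoring = TransformStep(
    state_id="batchscoring-step",
    job_name=execution_input["scoringstep"],  # unique name supplied at execution time
    transformer=lr,
    model_name=v_model_name,
    data="s3://wi-cred-datalake-dev-raw/vehicle/usedcars/feature/lr/baseline_modeldrift/",
    data_type="S3Prefix",
    content_type="text/csv",
    split_type="Line",
    wait_for_completion=True,
    input_filter="$[1:]",  # send every column except the first to the model
    join_source="Input",  # join each prediction to its input record in the output
)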
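
The combination of `input_filter="$[1:]"` and `join_source='Input'` can be hard to picture. The sketch below imitates it on CSV text in plain Python, assuming the documented batch transform behavior: the filter drops the first column before the record reaches the model, and the join appends the prediction to the full original record. The sample rows and the `predict` callable are made up for illustration; nothing here calls SageMaker.

```python
import csv
import io


def simulate_filter_and_join(csv_text, predict):
    """Mimic input_filter="$[1:]" with join_source="Input" for CSV records.

    `predict` is a hypothetical stand-in for the model: it receives the
    filtered columns of one record and returns one prediction.
    """
    joined_rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        features = record[1:]  # input filter: drop column 0
        prediction = predict(features)  # the model sees only the filtered columns
        joined_rows.append(record + [str(prediction)])  # join: append to the full record
    return joined_rows


# Made-up data: column 0 is an identifier, the rest are numeric features.
sample = "car-001,2.0,3.5\ncar-002,1.0,0.5\n"
rows = simulate_filter_and_join(sample, lambda cols: sum(float(c) for c in cols))
for row in rows:
    print(",".join(row))
```

The identifier column never reaches the model but still appears in the output, which is what makes it possible to match each prediction back to its source record.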

## 4.6 Chain together steps for the basic path

The following cell links together the steps you've created into a sequential group called `basic_path`. We will chain a single step to create our basic path. See [Chain](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/states.html#stepfunctions.steps.states.Chain) in the AWS Step Functions Data Science SDK documentation.

After chaining together the steps for the basic path, in this case only one step, we will visualize the basic path.

In [17]:
# Chain the steps; this path contains only the batch transform step
basic_path = Chain([batch_scoring])
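
To make the idea of chaining concrete, here is a minimal sketch of what a sequential chain amounts to in Amazon States Language: each state points to the next one through `Next`, and the last state is marked with `End`. The helper `chain_states` is hypothetical and is not the SDK's implementation; the state body is reduced to a bare `Type` field.

```python
import json


def chain_states(states):
    """Link (name, definition) pairs in order, as a sequential chain.

    Hypothetical helper for illustration; not the SDK's implementation.
    """
    if not states:
        raise ValueError("a chain needs at least one state")
    linked = {}
    for index, (name, definition) in enumerate(states):
        state = dict(definition)  # copy so the caller's dict is not mutated
        if index + 1 < len(states):
            state["Next"] = states[index + 1][0]  # hand off to the following state
        else:
            state["End"] = True  # the last state ends the execution
        linked[name] = state
    return {"StartAt": states[0][0], "States": linked}


# A one-step chain, like the basic path in this notebook.
single = chain_states([("batchscoring-step", {"Type": "Task"})])
print(json.dumps(single, indent=4))
```

With a single step, the chain reduces to a `StartAt` entry and one terminal state.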


## 4.7 Define the workflow instance

The following cell defines the [workflow](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Workflow) with the path we just defined.

After defining the workflow, we will render the graph to see what our workflow looks like.

In [18]:
# Next, we define the workflow
basic_workflow = Workflow(
    name="ds-mlops-dev-lr-gen-pred-function",
    definition=basic_path,
    role=v_workflow_execution_role
)

#Render the workflow
basic_workflow.render_graph()



## 4.8 Review the Amazon States Language code for your workflow

The following renders the JSON of the [Amazon States Language](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-amazon-states-language.html) definition of the workflow you created. 

In [19]:
print(basic_workflow.definition.to_json(pretty=True))  # This JSON can serve as the basis for a parameterized CloudFormation template

{
    "StartAt": "batchscoring-step",
    "States": {
        "batchscoring-step": {
            "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
            "Parameters": {
                "TransformJobName.$": "$$.Execution.Input['scoringstep']",
                "ModelName": "wi-mlops-lease-pric-ml-train-piln-lr-3031958562",
                "TransformInput": {
                    "DataSource": {
                        "S3DataSource": {
                            "S3DataType": "S3Prefix",
                            "S3Uri": "s3://wi-cred-datalake-dev-raw/vehicle/usedcars/feature/lr/baseline_modeldrift/"
                        }
                    },
                    "ContentType": "text/csv",
                    "SplitType": "Line"
                },
                "TransformOutput": {
                    "S3OutputPath": "s3://wi-cred-datalake-dev-raw/vehicle/usedcars/feature/lr/",
                    "AssembleWith": "Line",
                    "Accept": "tex
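
In the definition above, the key `TransformJobName.$` ends in `.$`, which in Amazon States Language marks the value as a path to resolve at run time, and `$$` refers to the execution's context object. The sketch below is a toy resolver for that one style of path, applied to a made-up context dictionary. It is hypothetical, handles only dotted names and `['key']` lookups, and is not a JSONPath implementation.

```python
import re


def resolve_context_path(path, context):
    """Resolve a path such as "$$.Execution.Input['scoringstep']".

    Hypothetical toy resolver: supports only .name and ['name'] segments.
    """
    if not path.startswith("$$"):
        raise ValueError("expected a context-object path starting with '$$'")
    value = context
    # Each match is either a dotted name or a bracketed, quoted key.
    for dotted, bracketed in re.findall(r"\.(\w+)|\['([^']+)'\]", path[2:]):
        value = value[dotted or bracketed]
    return value


# Made-up context object for one execution.
context = {"Execution": {"Input": {"scoringstep": "LR-gen-Prediction-example"}}}
job_name = resolve_context_path("$$.Execution.Input['scoringstep']", context)
print(job_name)
```

This is how the `scoringstep` placeholder declared in `ExecutionInput` ends up as the transform job name for a given execution.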

## 4.9 Create the workflow on AWS Step Functions

Create the workflow in AWS Step Functions with [create](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Workflow.create).

In [20]:
basic_workflow.create()

[INFO] Workflow created successfully on AWS Step Functions.


'arn:aws:states:us-east-1:525102048888:stateMachine:ds-mlops-dev-lr-gen-pred-function'

In [21]:
basic_workflow.update(definition=basic_workflow.definition,role=basic_workflow.role)

[INFO] Workflow updated successfully on AWS Step Functions. All execute() calls will use the updated definition and role within a few seconds.


'arn:aws:states:us-east-1:525102048888:stateMachine:ds-mlops-dev-lr-gen-pred-function'

## 5 Execute the workflow

Run the workflow with [execute](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Workflow.execute). Because the batch transform step is created with `wait_for_completion=True`, the execution keeps running until the SageMaker transform job finishes.

In [22]:
# Generate unique job names for this Step Functions execution
preprocessing_job_name = "xg-boost-score-preprocessing-{}".format(
    uuid.uuid1().hex
)  # Each preprocessing job requires a unique name
scoring_job_name = "LR-gen-Prediction-{}".format(
    uuid.uuid1().hex
)  # Each batch transform (scoring) job requires a unique name
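
Because a rejected job name only shows up once the execution is already running, it can help to check the name locally first. The sketch below assumes the naming constraint documented for SageMaker transform jobs (at most 63 characters, alphanumeric with hyphens only between alphanumeric characters); treat the pattern as an assumption and confirm it against the current API reference. It also swaps in `uuid.uuid4()` (random) where the notebook uses `uuid.uuid1()` (time and host based), and the helper `make_job_name` is hypothetical.

```python
import re
import uuid

# Assumed constraint: 1 to 63 characters, hyphens only between alphanumerics.
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")
MAX_JOB_NAME_LENGTH = 63


def make_job_name(prefix):
    """Append a random 32-character hex suffix and validate the result.

    Hypothetical helper for illustration.
    """
    name = "{}-{}".format(prefix, uuid.uuid4().hex)
    if len(name) > MAX_JOB_NAME_LENGTH or not JOB_NAME_PATTERN.match(name):
        raise ValueError("invalid job name: {!r}".format(name))
    return name


scoring_job_name_example = make_job_name("LR-gen-Prediction")
print(scoring_job_name_example)
```

The 32-character suffix leaves room for a prefix of at most 30 characters under the assumed limit.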


In [23]:
basic_workflow_execution = basic_workflow.execute(
    inputs={
        # "PreprocessingJobName": preprocessing_job_name,
        "scoringstep": scoring_job_name,  # Each batch transform job requires a unique name
    }
)

[INFO] Workflow execution started successfully on AWS Step Functions.


## 5.1 Review the execution progress

Render workflow progress with [render_progress](https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/workflow.html#stepfunctions.workflow.Execution.render_progress).

This generates a snapshot of the current state of your workflow as it executes. This is a static image. Run the cell again to check progress. 

In [24]:
basic_workflow_execution.render_progress()
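
Since the rendered progress is a static snapshot, an alternative to rerunning the cell by hand is a small polling loop. The sketch below is one possible shape for it: the helper `wait_for_terminal_status` is hypothetical and takes any zero-argument callable that returns a dictionary with a `"status"` key. Passing `basic_workflow_execution.describe` as that callable is an assumption about the SDK's response shape, so the example runs against a stand-in instead.

```python
import time

# Execution statuses that mean the run is over.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def wait_for_terminal_status(describe, poll_seconds=30.0, max_polls=120):
    """Call `describe()` until its "status" field is terminal, then return it.

    Hypothetical helper; `describe` is any zero-argument callable that
    returns a dict containing a "status" key.
    """
    for _ in range(max_polls):
        status = describe()["status"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("execution did not finish within the polling budget")


# Stand-in for a real execution: reports RUNNING twice, then SUCCEEDED.
fake_responses = iter(
    [{"status": "RUNNING"}, {"status": "RUNNING"}, {"status": "SUCCEEDED"}]
)
final_status = wait_for_terminal_status(lambda: next(fake_responses), poll_seconds=0)
print(final_status)
```

Capping the number of polls keeps the notebook cell from blocking indefinitely if the transform job stalls.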