Train ML model

This sample project demonstrates how to use Step Functions to pre-process data with AWS Lambda and store it in Amazon S3, then train a machine learning model and run a batch transformation through Amazon SageMaker. Deploying this sample project creates an AWS Step Functions state machine, a Lambda function, and an S3 bucket, along with the required IAM roles and log group.

Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the AWS Pricing page for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.

Requirements

• An AWS account with sufficient permissions to create the resources
• AWS CLI installed and configured
• AWS SAM CLI installed
• Git installed

Deployment Instructions

  1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository:

    git clone https://github.com/aws-samples/step-functions-workflows-collection
    
  2. Change directory to the pattern directory:

    cd step-functions-workflows-collection/train-ml-model
    
  3. From the command line, use AWS SAM to deploy the AWS resources for the workflow as specified in the template.yaml file:

    sam deploy --guided
    
  4. During the prompts:

    • Enter a stack name
    • Enter the desired AWS Region
    • Allow SAM CLI to create IAM roles with the required permissions.
    • Accept all other defaults

    Once you have run sam deploy --guided once and saved the arguments to a configuration file (samconfig.toml), you can use sam deploy in the future to reuse these defaults.

  5. Note the outputs from the SAM deployment process. These contain the resource names and/or ARNs which are used for testing.

    • StateMachineName: Name of the Step Functions state machine orchestrating the process.
    • StateMachineArn: ARN of the Step Functions state machine orchestrating the process.
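
You can also retrieve these outputs later from the command line; a minimal sketch, assuming <stack-name> is the stack name you entered during sam deploy --guided:

    # List the stack outputs (StateMachineName, StateMachineArn)
    aws cloudformation describe-stacks \
        --stack-name <stack-name> \
        --query "Stacks[0].Outputs" \
        --output table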

How it works

Below are the stages of the Step Functions workflow and how it orchestrates the steps for training a machine learning model in SageMaker.

a. The first stage of the Step Functions workflow calls a Lambda function that generates data and processes it into a train-test split. The train and test datasets are placed as CSV files in the S3 bucket.

b. In the second stage, using the SageMaker service integration, Step Functions starts a SageMaker training job that uses the XGBoost algorithm to train a logistic regression model on the train dataset.

c. Once the model is trained, it is saved to the S3 bucket via a SageMaker create-model job in the third stage of the state machine run.

d. In the last stage, the test data is run through a batch transformation using a SageMaker transform job, and the output file is placed in the S3 output location.
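
To inspect how these four stages are expressed as states, you can print the deployed state machine's Amazon States Language definition; a minimal sketch, assuming <StateMachineArn> is the ARN from the stack outputs:

    # Print the state machine definition as JSON
    aws stepfunctions describe-state-machine \
        --state-machine-arn <StateMachineArn> \
        --query definition \
        --output text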

Image

State machine workflow diagram

Testing

Manually trigger the workflow via the Console or the AWS CLI. The state machine ARN is available as the StateMachineArn stack output, and the state machine name as the StateMachineName output.

To trigger the workflow in the console, navigate to Step Functions and choose the state machine from the list of state machines. In the Executions panel, choose Start Execution, then choose Start Execution again in the popup. No additional input is required.
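
To trigger the workflow from the AWS CLI, start an execution against the deployed state machine; a minimal sketch, assuming <StateMachineArn> is the ARN from the stack outputs:

    # Start a new execution (this workflow needs no input payload)
    aws stepfunctions start-execution \
        --state-machine-arn <StateMachineArn>

    # Poll the execution status using the executionArn returned above
    aws stepfunctions describe-execution \
        --execution-arn <execution-arn> \
        --query status \
        --output text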

On successful execution, the machine learning model artifacts are placed under the 'models/' path and the transformed data under the 'output/' path in the S3 bucket. You can also view and verify the generated logs using the Console or CLI.
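
The S3 output can be checked from the command line as well; a sketch, assuming <bucket-name> is the name of the bucket created by the stack:

    # List the trained model artifacts
    aws s3 ls s3://<bucket-name>/models/ --recursive

    # List the batch transformation output
    aws s3 ls s3://<bucket-name>/output/ --recursive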

Cleanup

  1. Empty the S3 bucket using the AWS Console (a CLI alternative is shown after these steps).

  2. Delete the stack:

    sam delete
  3. During the prompts:

        Are you sure you want to delete the stack <stack-name> in the region <region>? [y/N]: y
        Are you sure you want to delete the folder <stack-name> in S3 which contains the artifacts? [y/N]: y
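
As noted in step 1, the bucket can also be emptied from the command line before deleting the stack; a sketch, assuming <bucket-name> is the bucket created by the stack:

    # Delete all objects so the bucket (and stack) can be removed
    aws s3 rm s3://<bucket-name> --recursive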

Copyright 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.

SPDX-License-Identifier: MIT-0