# MLOps with Amazon SageMaker: Getting Started

> *This notebook works well with the `Python 3 (Data Science)` kernel on SageMaker Studio*

Welcome to this short workshop on MLOps with Amazon SageMaker!

First we'll install a couple of useful packages for later, in case they're not already available in the kernel environment:

In [1]:
!pip install altair

Collecting altair
  Downloading altair-4.1.0-py3-none-any.whl (727 kB)
Installing collected packages: altair
Successfully installed altair-4.1.0


## Locating our project environment

In [6]:
%load_ext autoreload
%autoreload 2
import util

project_id = "creditmodel"

project_config = util.project.init(project_id)
project_config

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


<util.project.ProjectSession(
  project_id=creditmodel,
  role=arn:aws:iam::024103970757:role/mlopsintro-SageMakerExecutionRole-1CERTWL51GJ97,
  raw_bucket=creditmodel-mlrawdata-024103970757-ap-northeast-1,
  sandbox_bucket=creditmodel-mlsandbox-024103970757-ap-northeast-1
) at 0x7f02f823b310>

In [7]:
print(f"Raw data bucket: s3://{project_config.raw_bucket}/")
print(f"Sandbox bucket: s3://{project_config.sandbox_bucket}/")

Raw data bucket: s3://creditmodel-mlrawdata-024103970757-ap-northeast-1/
Sandbox bucket: s3://creditmodel-mlsandbox-024103970757-ap-northeast-1/


## Preparing data with SageMaker Data Wrangler

- Open `credit-data.flow`
- Build the flow
- Export the final node to the feature store?

## Training an initial model with SageMaker XGBoost Algorithm

- Query the feature store to create separate training/validation/test sets?
- Train a model with the SageMaker built-in XGBoost algorithm
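
As a rough illustration of the split step above, the sketch below shuffles a list of records and renders header-less CSV with the target in the first column, which is the layout the built-in XGBoost algorithm expects for CSV training data. The `split_rows` and `to_xgboost_csv` helpers, the 70/15/15 proportions, and the toy column names are all assumptions for illustration; the actual feature store query is not shown here.

```python
import csv
import io
import random


def split_rows(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle rows and split them into train / validation / test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


def to_xgboost_csv(rows, target_key, feature_keys):
    """Render rows as header-less CSV with the target in the first column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow([row[target_key]] + [row[k] for k in feature_keys])
    return buf.getvalue()


# Toy records standing in for a feature store query result (hypothetical columns)
records = [{"default": i % 2, "age": 20 + i, "amount": 100 * i} for i in range(20)]
train, val, test = split_rows(records)
train_csv = to_xgboost_csv(train, "default", ["age", "amount"])
```

The resulting CSV text could then be uploaded to the sandbox bucket and referenced from the `train_uri` and `val_uri` placeholders in the cells that follow.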

In [13]:
# Python Built-Ins:

# External Dependencies:
import sagemaker

smsess = sagemaker.Session()
region = smsess.boto_region_name
print(region)

ap-northeast-1


In [14]:
training_image = sagemaker.image_uris.retrieve("xgboost", region=region, version="1.0-1")
print(training_image)

354813040037.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3


In [23]:
# TODO: Replace '???' with the path where you saved your training file in S3
train_uri = f"s3://{project_config.sandbox_bucket}/???"
print(f"Training data: {train_uri}")
s3_input_train = sagemaker.inputs.TrainingInput(train_uri, content_type="csv")

# TODO: Replace '???' with the path where you saved your training file in S3
val_uri = f"s3://{project_config.sandbox_bucket}/???"
print(f"Validation data: {val_uri}")
s3_input_validation = sagemaker.inputs.TrainingInput(val_uri, content_type="csv")

Training data: s3://creditmodel-mlsandbox-024103970757-ap-northeast-1/???
Validation data: s3://creditmodel-mlsandbox-024103970757-ap-northeast-1/???


In [None]:
# Instantiate an XGBoost estimator object
estimator = sagemaker.estimator.Estimator(
    image_uri=training_image,  # XGBoost algorithm container
    instance_type="ml.m5.xlarge",  # type of training instance
    instance_count=1,  # number of instances to be used
    role=project_config.role,  # IAM role to be used (from the project session above)
    max_run=20*60,  # Maximum allowed active runtime
    use_spot_instances=True,  # Use spot instances to reduce cost
    max_wait=30*60,  # Maximum clock time (including spot delays)
)

# define its hyperparameters
estimator.set_hyperparameters(
    num_round=150,  # int: [1,300]
    max_depth=5,  # int: [1,10]
    alpha=2.5,  # float: [0,5]
    eta=0.5,  # float: [0,1]
    objective="binary:logistic",
)

# start a training (fitting) job
estimator.fit({ "train": s3_input_train, "validation": s3_input_validation })

## Testing the model with SageMaker Batch Transform

In [None]:
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{project_config.sandbox_bucket}/batch-results/",
)

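# TODO: Replace '???' with the path where you saved your test file in S3
# (features only: the label column must not be included for inference)
s3_batch_input = f"s3://{project_config.sandbox_bucket}/???"

# Call the transformer's transform method to create a batch transform job
transformer.transform(
    data=s3_batch_input,
    data_type="S3Prefix",
    content_type="text/csv",
    split_type="Line",
)

# In SageMaker Python SDK v2, transform() waits for the job to finish by default (wait=True).
# Pass wait=False and call transformer.wait() later if you want to block separately.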

In [None]:
util.plotting.generate_classification_report(
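    y_real=...,  # TODO: actual labels for the test set
    y_predict_proba=...,  # TODO: predicted probabilities from the batch transform output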
    decision_threshold=0.5,
    class_names_list=[...],
    title="Initial model",
)
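
To fill in `y_predict_proba` above, the batch transform output first needs to be parsed into numbers. The sketch below assumes the output file holds one probability per record, separated by newlines or commas (the exact separator can depend on the container version), and shows how a decision threshold turns those probabilities into confusion-matrix counts. `parse_probabilities`, `confusion_counts`, and the sample values are hypothetical and are not part of the `util` package.

```python
def parse_probabilities(output_text):
    """Parse predicted probabilities separated by newlines and/or commas."""
    return [float(token) for token in output_text.replace(",", "\n").split()]


def confusion_counts(y_real, y_proba, threshold=0.5):
    """Count TP / FP / TN / FN for a binary classifier at the given threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, proba in zip(y_real, y_proba):
        predicted = 1 if proba >= threshold else 0
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["tn" if actual == 0 else "fn"] += 1
    return counts


# Made-up stand-in for the contents of a downloaded batch transform output file
sample_output = "0.91\n0.12\n0.55\n0.40\n"
y_predict_proba = parse_probabilities(sample_output)
y_real = [1, 0, 0, 1]  # made-up labels, in the same order as the predictions

counts = confusion_counts(y_real, y_predict_proba, threshold=0.5)
accuracy = (counts["tp"] + counts["tn"]) / len(y_real)
```

Raising or lowering `threshold` trades false positives against false negatives, which is the same trade-off the `decision_threshold` argument controls in the report.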

## Next steps