# Script mode on Amazon SageMaker


Script mode is a way to do machine learning on Amazon SageMaker by providing only the script for processing, training, or inference. In this notebook we focus on the lowest level of script mode usage, that is, using the base classes provided by Amazon SageMaker.

This notebook follows each part of a typical ML workflow, with explanations of the SageMaker commands used.

First we want to import the different packages and load the data to S3 if it is not already done.

In [89]:
import boto3
import sagemaker

import pandas as pd
import numpy as np
import os

In [90]:
#Manage interactions with the Amazon SageMaker APIs and any other AWS services needed
session = sagemaker.Session()
#see the region in which we work
region = session.boto_region_name
print("AWS Region : {}".format(region))
#get the role of the running session
role = sagemaker.get_execution_role()
#get the bucket name of the session
bucket = session.default_bucket()

AWS Region : us-east-1


We can now push the data to S3 :

In [91]:
#Upload the dataset to S3
prefix = "data_script_mode"
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'data/dataset.csv')).upload_file('predictive_maintenance.csv')

INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
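
The upload above builds the object key with `os.path.join`, which gives forward slashes on the Linux notebook instance but would not on Windows. As a minimal sketch, assuming two hypothetical helpers (`s3_key` and `s3_uri`, neither of which is in the original notebook) and a placeholder bucket name, the key and its matching URI can be built with `posixpath` so the result is the same on every platform:

```python
import posixpath


def s3_key(prefix: str, *parts: str) -> str:
    """Join key segments with forward slashes, independent of the local OS."""
    return posixpath.join(prefix, *parts)


def s3_uri(bucket: str, key: str) -> str:
    """Return the s3:// URI of an object key inside a bucket."""
    return f"s3://{bucket}/{key}"


# "my-bucket" is a placeholder; the notebook uses session.default_bucket().
key = s3_key("data_script_mode", "data", "dataset.csv")
print(key)
print(s3_uri("my-bucket", key))
```

The same URI shape is what the processing job later receives as its input source.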


## Processing

This diagram summarizes how SageMaker handles a processing task:

<img src="images/smprocess.PNG" width="600" height="400">

A processing job requires the specification of a path to an input S3 bucket that holds the data to be processed. The job utilizes a provided script to perform the processing task. The resulting output data is then stored in a separate S3 path.

SageMaker runs the job inside Docker containers. These can be either pre-built containers provided by SageMaker, which are available in the Elastic Container Registry (ECR), or custom containers built from your own images, which must be pushed to ECR.





SageMaker provides several classes for instantiating processor objects that run processing jobs:

<img src="images/processing.PNG" width="700" height="500">

We will use the Processor class. To do so, we first need a Docker image that contains the script we want to run for processing. Let's build such an image and push it to ECR:

In [98]:
!mkdir docker
!cp processing.py docker/

mkdir: cannot create directory ‘docker’: File exists


In [99]:
%%writefile docker/dockerfile

FROM python:3.7-slim-buster

RUN pip3 install pandas scikit-learn imblearn
ENV PYTHONUNBUFFERED=TRUE
COPY processing.py .

Overwriting docker/dockerfile


In [100]:
account_id = boto3.client('sts').get_caller_identity().get('Account')
ecr_repository = 'sagemaker-processing-container'
tag = ':latest'
processing_repository_uri = '{}.dkr.ecr.{}.amazonaws.com/{}'.format(account_id, region, ecr_repository + tag)
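
The URI assembled above follows the usual layout of a private ECR image reference. As a small sketch, assuming a hypothetical helper `ecr_image_uri` and a placeholder account ID (neither comes from the original notebook), the same string can be built with the tag kept as a separate argument:

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


# 123456789012 is a placeholder account ID, not the one used in this notebook.
example_uri = ecr_image_uri("123456789012", "us-east-1", "sagemaker-processing-container")
print(example_uri)
```

Keeping the tag separate avoids having to remember that the `tag` variable already contains the leading colon.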

In [101]:
!docker build -t $ecr_repository docker

!aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account_id}.dkr.ecr.{region}.amazonaws.com

!aws ecr create-repository --repository-name $ecr_repository

!docker tag {ecr_repository + tag} $processing_repository_uri

!docker push $processing_repository_uri

Sending build context to Docker daemon  4.608kB
Step 1/4 : FROM python:3.7-slim-buster
 ---> 099f4583c701
Step 2/4 : RUN pip3 install pandas scikit-learn imblearn
 ---> Using cache
 ---> 2ccc7b4c985f
Step 3/4 : ENV PYTHONUNBUFFERED=TRUE
 ---> Using cache
 ---> 7cb0c1eac9d8
Step 4/4 : COPY processing.py .
 ---> bc38551ee091
Successfully built bc38551ee091
Successfully tagged sagemaker-processing-container:latest
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

An error occurred (RepositoryAlreadyExistsException) when calling the CreateRepository operation: The repository with name 'sagemaker-processing-container' already exists in the registry with id '222978838857'
The push refers to repository [222978838857.dkr.ecr.us-east-1.amazonaws.com/sagemaker-processing-container]

058ba0b2: Preparing
e4f6bf59: Preparing
8d012914: Preparing
d30bdfa9: Preparing
9f968310: Preparing
55769c5e: Preparing
058ba0b2: Pushed

Once our image is pushed to ECR, we need to create a Processor object, which will be used to launch the processing job. For more information about the processing classes, see https://sagemaker.readthedocs.io/en/stable/api/training/processing.html

The ProcessingInput class represents an input source for a processing job in Amazon SageMaker. It describes where the input data lives (`source`, typically an S3 URI) and where it is made available inside the container (`destination`).
The ProcessingOutput class represents an output of a processing job. It describes the directory inside the container from which results are collected (`source`) and, optionally, the S3 location where they should be stored (`destination`).
We can also pass command-line arguments, which the processing script parses with argparse. See the processing.py file for more information about the structure of the code.
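
The processing.py file itself is not reproduced in this notebook, so the following is only a sketch of how such a script might read the `--train-test-split-ratio` argument passed through `arguments`. The `parse_args` helper and the default value of 0.3 are assumptions made for illustration:

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line arguments forwarded by the processing job."""
    parser = argparse.ArgumentParser()
    # argparse turns the dashes of the flag into underscores in the attribute name.
    # The default of 0.3 is an assumption, not taken from the real script.
    parser.add_argument("--train-test-split-ratio", type=float, default=0.3)
    return parser.parse_args(argv)


# Same argument list as the one given to processor.run(arguments=...).
args = parse_args(["--train-test-split-ratio", "0.2"])
print("Received arguments {}".format(args))
```

Because the arguments are appended to the container entrypoint, they reach the script exactly as if it had been started from a shell.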

In [102]:
from sagemaker.processing import Processor
from sagemaker.processing import ProcessingInput, ProcessingOutput

#first we instantiate the processor with the URI of our ECR image and, as described above, we need to provide the entrypoint of the Docker container
processor = Processor(
    role = role,
    image_uri = "222978838857.dkr.ecr.us-east-1.amazonaws.com/sagemaker-processing-container",
    instance_count = 1,
    instance_type = "local",
    entrypoint = ["python3", "processing.py"]
    )
#The path of our S3 bucket
bucket_path = 's3://{}'.format(bucket)

#we then launch the processing job
processor.run(
    inputs=[ProcessingInput(source=f"{bucket_path}/{prefix}/data/dataset.csv", destination="/opt/ml/processing/input")],
    outputs=[
        ProcessingOutput(output_name="train_data", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="test_data", source="/opt/ml/processing/test"),
    ],
    arguments=["--train-test-split-ratio", "0.2"],
)

INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
INFO:sagemaker:Creating processing-job with name sagemaker-processing-container-2023-06-23-10-03-13-195
INFO:sagemaker.local.local_session:Starting processing job
INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
INFO:sagemaker.local.image:No AWS credentials found in session but credentials from EC2 Metadata Service are available.
INFO:sagemaker.local.image:docker compose file: 
networks:
  sagemaker-local:
    name: sagemaker-local
services:
  algo-1-y06hu:
    container_name: xp57q3z8wp-algo-1-y06hu
    entrypoint:
    - python3
    - processing.py
    - --train-test-split-ratio
    - '0.2'
    environment: []
    image: 222978838857.dkr.ecr.us-east-1.amazonaws.com/sagemaker-processing-container
    networks:
      sagemaker-local:
        aliases:
        - algo-1-y06hu
    stdin_open: true
    tty: true
    volumes:
    - /tmp/tmp3z4d2tun/algo-

Creating xp57q3z8wp-algo-1-y06hu ... 
Creating xp57q3z8wp-algo-1-y06hu ... done
Attaching to xp57q3z8wp-algo-1-y06hu
xp57q3z8wp-algo-1-y06hu | Received arguments Namespace(train_test_split_ratio=0.2)
xp57q3z8wp-algo-1-y06hu | Reading input data from /opt/ml/processing/input/dataset.csv
xp57q3z8wp-algo-1-y06hu | Splitting data into train and test sets with ratio 0.2
xp57q3z8wp-algo-1-y06hu | Resampling the dataset...
xp57q3z8wp-algo-1-y06hu | Scaling the dataset...
xp57q3z8wp-algo-1-y06hu exited with code 0
Aborting on container exit...
===== Job Complete =====


Once the job is completed, we can retrieve information about it, in particular the S3 path of the output data, so that we can use it for training:

In [103]:
preprocessing_job_description = processor.jobs[-1].describe()

output_config = preprocessing_job_description["ProcessingOutputConfig"]

for output in output_config["Outputs"]:
    if output["OutputName"] == "train_data":
        preprocessed_training_data = output["S3Output"]["S3Uri"]
    if output["OutputName"] == "test_data":
        preprocessed_test_data = output["S3Output"]["S3Uri"]
        
        
print(preprocessed_training_data)
#Observe the processed data 
training_features = pd.read_csv(preprocessed_training_data + "/dataset_train.csv",nrows=10)
print("Training features shape: {}".format(training_features.shape))
training_features

s3://sagemaker-us-east-1-222978838857/sagemaker-processing-container-2023-06-23-10-03-13-195/output/train_data
Training features shape: (10, 8)


| | Unnamed: 0 | 0 | 1 | 2 | 3 | 4 | 5 | Target |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | -0.558041 | -1.176855 | -1.099352 | 0.289338 | -1.086507 | -1.096994 | 0.0 |
| 1 | 2 | -0.558041 | 0.004155 | -0.26935 | 0.103445 | -0.516629 | 0.323171 | 0.0 |
| 2 | 3 | 1.19399 | 1.595951 | 1.994294 | 0.573248 | -0.822939 | -1.532886 | 0.0 |
| 3 | 4 | 1.19399 | -1.484944 | -1.325717 | -0.410296 | 0.302572 | -0.872017 | 0.0 |
| 4 | 5 | 1.19399 | -0.817417 | -0.722078 | -0.362978 | -0.039355 | 0.210683 | 0.0 |
| 5 | 6 | -0.558041 | 0.106852 | 1.164291 | -0.068929 | -0.445394 | 1.12465 | 0.0 |
| 6 | 7 | -0.558041 | -1.125506 | -0.872988 | -0.001331 | -0.552246 | -1.18136 | 0.0 |
| 7 | 8 | -0.558041 | -0.098541 | 0.862472 | 0.018948 | -0.281554 | -0.281453 | 0.0 |
| 8 | 9 | 1.19399 | -1.536293 | -0.722078 | 0.035848 | -0.331418 | 1.138711 | 0.0 |
| 9 | 11 | 2.94602 | -1.3309 | -1.8539 | 0.333277 | -0.623481 | -0.464247 | 0.0 |
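
The loop used above to pick out the two output locations can also be written as a single lookup table. This is a sketch only: `output_uris` is a hypothetical helper, and the dictionary below is a hand-written stand-in with placeholder URIs that mimics just the fields of the job description read in this notebook:

```python
def output_uris(job_description: dict) -> dict:
    """Map each processing output name to its S3 URI."""
    outputs = job_description["ProcessingOutputConfig"]["Outputs"]
    return {item["OutputName"]: item["S3Output"]["S3Uri"] for item in outputs}


# Stand-in for processor.jobs[-1].describe(); bucket and job names are placeholders.
example_description = {
    "ProcessingOutputConfig": {
        "Outputs": [
            {"OutputName": "train_data", "S3Output": {"S3Uri": "s3://my-bucket/my-job/output/train_data"}},
            {"OutputName": "test_data", "S3Output": {"S3Uri": "s3://my-bucket/my-job/output/test_data"}},
        ]
    }
}

uris = output_uris(example_description)
print(uris["train_data"])
print(uris["test_data"])
```

A missing output then raises a `KeyError` at the lookup, instead of leaving a variable undefined until it is first used.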


## Training

The training code is similar to the processing code, but SageMaker handles the task somewhat differently.

<img src="images/training.PNG" width="500" height="700">

(1) In the Jupyter notebook, you instantiate the training object that will make the API call to SageMaker, push the data to S3 if needed, and push the image to ECR if needed.


(2) Once you run the fit method of the estimator you instantiated, the SDK calls the SageMaker API with a create_training_job request (see: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_training_job.html). SageMaker launches the EC2 instances with the information you provided in the request, which is essentially a JSON document sent to the SageMaker HTTP API.

(3) The training job runs on the EC2 instances. When it finishes, it stores the outputs (model artifact, logs...) in an S3 bucket and shuts down every instance.

(4) All the training outputs are available in an S3 bucket, and the model is ready to be deployed to an endpoint.
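
To make step (2) more concrete, here is a sketch of what the body of a create_training_job request can look like. The field names follow the CreateTrainingJob API, but `build_training_request` is a hypothetical helper and every value (names, ARNs, URIs, instance type, volume size, time limit) is a placeholder chosen for illustration. The SageMaker SDK assembles the real request for you when you call fit:

```python
import json


def build_training_request(job_name, image_uri, role_arn, train_uri, output_uri, hyperparameters):
    """Assemble a CreateTrainingJob-style request body as a plain dictionary."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_uri},
        "ResourceConfig": {"InstanceType": "ml.m5.large", "InstanceCount": 1, "VolumeSizeInGB": 30},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # The API expects hyperparameter values as strings.
        "HyperParameters": {name: str(value) for name, value in hyperparameters.items()},
    }


request = build_training_request(
    job_name="job-example",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
    role_arn="arn:aws:iam::123456789012:role/MyExampleRole",
    train_uri="s3://my-bucket/train",
    output_uri="s3://my-bucket/output",
    hyperparameters={"C": 1, "kernel": "poly"},
)
print(json.dumps(request, indent=2))
```

Nothing in this sketch contacts AWS; it only shows the shape of the document that describes the job.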



In our case, we retrieve a SageMaker image from ECR and use it for training. We then only have to provide the training script as the entry point.
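
The train.py script is not shown in this notebook. As a hedged sketch of its argument handling, the snippet below assumes that hyperparameters reach the script as command-line flags and that data and model locations come from the `SM_CHANNEL_TRAIN`, `SM_CHANNEL_TEST` and `SM_MODEL_DIR` environment variables set inside the training container. The flag names for the paths, the defaults and the fallback paths are illustrative assumptions:

```python
import argparse
import os


def parse_args(argv=None):
    """Read hyperparameters and container paths for a training script."""
    parser = argparse.ArgumentParser()
    # Hyperparameters: defaults are assumptions, the notebook sets C=1 and kernel="poly".
    parser.add_argument("--C", type=float, default=1.0)
    parser.add_argument("--kernel", type=str, default="rbf")
    # Locations: taken from the SM_* environment variables when they are present.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    parser.add_argument("--test", default=os.environ.get("SM_CHANNEL_TEST", "/opt/ml/input/data/test"))
    return parser.parse_args(argv)


train_args = parse_args(["--C", "1", "--kernel", "poly"])
print(train_args.C, train_args.kernel)
```

The channel names `train` and `test` correspond to the keys of the dictionary passed to fit.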

In [104]:
from sagemaker import image_uris
from sagemaker.estimator import Estimator

training_image = image_uris.retrieve(framework='sklearn',region='us-east-1',version='1.2-1',image_scope='training')


metric = {
    'Name' : 'Accuracy', 'Regex' : 'Accuracy : ([0-9\\.]+)'
}

estimator = Estimator(
    role = role,
    instance_count = 1,
    instance_type = "local",
    base_job_name = "job",
    image_uri = training_image,
    entry_point = "train.py",
    metric_definitions = [metric]
)

estimator.set_hyperparameters(
    C = 1,
    kernel = "poly"
)


estimator.fit({"train" : preprocessed_training_data, "test" : preprocessed_test_data})

INFO:sagemaker.image_uris:Defaulting to only available Python version: py3
INFO:sagemaker.image_uris:Defaulting to only supported image scope: cpu.
INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
INFO:sagemaker:Creating training-job with name: job-2023-06-23-10-03-26-412
INFO:sagemaker.local.local_session:Starting training job
INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
INFO:sagemaker.local.image:No AWS credentials found in session but credentials from EC2 Metadata Service are available.
INFO:sagemaker.local.image:docker compose file: 
networks:
  sagemaker-local:
    name: sagemaker-local
services:
  algo-1-3msu4:
    command: train
    container_name: kj86d2edkr-algo-1-3msu4
    environment:
    - '[Masked]'
    - '[Masked]'
    image: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3
    networks:
      sagemaker-local:
        aliases:
        - algo-1-3

Creating kj86d2edkr-algo-1-3msu4 ... 
Creating kj86d2edkr-algo-1-3msu4 ... done
Attaching to kj86d2edkr-algo-1-3msu4
kj86d2edkr-algo-1-3msu4 | 2023-06-23 10:03:30,153 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
kj86d2edkr-algo-1-3msu4 | 2023-06-23 10:03:30,158 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
kj86d2edkr-algo-1-3msu4 | 2023-06-23 10:03:30,159 sagemaker-training-toolkit INFO     Failed to parse hyperparameter kernel value poly to Json.
kj86d2edkr-algo-1-3msu4 | Returning the value itself
kj86d2edkr-algo-1-3msu4 | 2023-06-23 10:03:30,170 sagemaker_sklearn_container.training INFO     Invoking user training script.
kj86d2edkr-algo-1-3msu4 | 2023-06-23 10:03:30,486 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
kj86d2edkr-algo-1-3msu4 | 2023-06-23 10:03:30,487 sagemaker-training-toolkit INFO     Failed t

kj86d2edkr-algo-1-3msu4 | accuracy on test is : 0.9651474530831099
kj86d2edkr-algo-1-3msu4 | 2023-06-23 10:03:31,719 - INFO - Accuracy : 0.9651474530831099
kj86d2edkr-algo-1-3msu4 | 2023-06-23 10:03:31,719 - INFO - saving the model...
kj86d2edkr-algo-1-3msu4 | 2023-06-23 10:03:31,722 - INFO - Training complete.
kj86d2edkr-algo-1-3msu4 | 2023-06-23 10:03:31,944 sagemaker-containers INFO     Reporting training SUCCESS
kj86d2edkr-algo-1-3msu4 exited with code 0
Aborting on container exit...


INFO:root:creating /tmp/tmpypeo_72x/artifacts/output/data
INFO:root:copying /tmp/tmpypeo_72x/algo-1-3msu4/output/success -> /tmp/tmpypeo_72x/artifacts/output
INFO:root:copying /tmp/tmpypeo_72x/model/model.joblib -> /tmp/tmpypeo_72x/artifacts/model


===== Job Complete =====
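
The metric definition passed to the estimator is a regular expression that is matched against the training logs. The sketch below applies the same pattern to two of the log lines printed above; `extract_metric` is a hypothetical helper written for this illustration and is not part of the SageMaker SDK:

```python
import re

# Same pattern as the 'Regex' entry of the metric definition.
METRIC_REGEX = r"Accuracy : ([0-9\.]+)"


def extract_metric(log_line, pattern=METRIC_REGEX):
    """Return the captured value as a float, or None when the line does not match."""
    match = re.search(pattern, log_line)
    return float(match.group(1)) if match else None


log_lines = [
    "accuracy on test is : 0.9651474530831099",
    "2023-06-23 10:03:31,719 - INFO - Accuracy : 0.9651474530831099",
]
for line in log_lines:
    print(extract_metric(line))
```

Only the second line matches, because the pattern is case sensitive and expects the exact text `Accuracy : ` before the number.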


As with processing, we can retrieve information about the job, for example the S3 path of the model artifact, which we need for inference with a custom inference script.

In [105]:
training_job_description = estimator.jobs[-1].describe()
#S3 location of the model.tar.gz produced by the training job
model_data_s3_uri = training_job_description["ModelArtifacts"]["S3ModelArtifacts"]
model_data_s3_uri

's3://sagemaker-us-east-1-222978838857/job-2023-06-23-10-03-26-412/model.tar.gz'
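
The model artifact is a gzip-compressed tar archive. As a sketch of how its contents could be checked once the file has been downloaded, the snippet below lists the members of an archive. To stay runnable without AWS access it first builds a stand-in archive containing a placeholder `model.joblib`, which is the file name seen in the training logs above. The `list_model_archive` helper is hypothetical:

```python
import os
import tarfile
import tempfile


def list_model_archive(archive_path):
    """Return the sorted member names of a model.tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(archive.getnames())


# Build a stand-in archive locally; the bytes are a placeholder, not a real model.
with tempfile.TemporaryDirectory() as workdir:
    model_file = os.path.join(workdir, "model.joblib")
    with open(model_file, "wb") as handle:
        handle.write(b"placeholder bytes")
    archive_path = os.path.join(workdir, "model.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(model_file, arcname="model.joblib")
    members = list_model_archive(archive_path)

print(members)
```
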

## Deploy

Once training is done, our model is ready to be deployed to an endpoint. We could call the deploy() method directly on our estimator, but here we have not implemented an inference part in our training script and we want to use a different image for training and inference.
We will use the Model class to deploy our model to an endpoint:

In [106]:
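from sagemaker.model import Model
from sagemaker.predictor import Predictor

inference_image = image_uris.retrieve(framework='sklearn',region='us-east-1',version='1.2-1',image_scope='inference')

model = Model(
    image_uri = inference_image,
    model_data = model_data_s3_uri,
    role = role,
    entry_point = "inference.py",
    #without predictor_cls, deploy() returns None and the clean-up calls on predictor would fail
    predictor_cls = Predictor,
)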


INFO:sagemaker.image_uris:Defaulting to only available Python version: py3
INFO:sagemaker.image_uris:Defaulting to only supported image scope: cpu.


In [107]:

predictor = model.deploy(    
    initial_instance_count = 1,
    instance_type = "local",
    endpoint_name = "myendpoint")

INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
INFO:sagemaker:Creating model with name: sagemaker-scikit-learn-2023-06-23-10-03-44-097
INFO:sagemaker:Creating endpoint-config with name myendpoint
INFO:sagemaker:Creating endpoint with name myendpoint
INFO:sagemaker.local.image:serving
INFO:sagemaker.local.image:creating hosting dir in /tmp/tmpj3ohlxzo
INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
INFO:sagemaker.local.image:No AWS credentials found in session but credentials from EC2 Metadata Service are available.
INFO:sagemaker.local.image:docker compose file: 
networks:
  sagemaker-local:
    name: sagemaker-local
services:
  algo-1-4jqzs:
    command: serve
    container_name: o4afpz6lbw-algo-1-4jqzs
    environment:
    - '[Masked]'
    - '[Masked]'
    - '[Masked]'
    - '[Masked]'
    image: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3
    networks:

Attaching to o4afpz6lbw-algo-1-4jqzs
o4afpz6lbw-algo-1-4jqzs | 2023-06-23 10:03:46,981 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
o4afpz6lbw-algo-1-4jqzs | 2023-06-23 10:03:46,985 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
o4afpz6lbw-algo-1-4jqzs | 2023-06-23 10:03:46,986 INFO - sagemaker-containers - nginx config: 
o4afpz6lbw-algo-1-4jqzs | worker_processes auto;
o4afpz6lbw-algo-1-4jqzs | daemon off;
o4afpz6lbw-algo-1-4jqzs | pid /tmp/nginx.pid;
o4afpz6lbw-algo-1-4jqzs | error_log  /dev/stderr;
o4afpz6lbw-algo-1-4jqzs |
o4afpz6lbw-algo-1-4jqzs | worker_rlimit_nofile 4096;
o4afpz6lbw-algo-1-4jqzs |
o4afpz6lbw-algo-1-4jqzs | events {
o4afpz6lbw-algo-1-4jqzs |   worker_connections 2048;
o4afpz6lbw-algo-1-4jqzs | }
o4afpz6lbw-algo-1-4jqzs |
o4afpz6lbw-algo-1-4jqzs | http {
o4afpz6l

INFO:sagemaker.local.entities:Checking if serving container is up, attempt: 10
INFO:sagemaker.local.entities:Container still not up, got: 502


o4afpz6lbw-algo-1-4jqzs | 2023/06/23 10:03:49 [crit] 14#14: *1 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 172.18.0.1, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "localhost:8080"
o4afpz6lbw-algo-1-4jqzs | 172.18.0.1 - - [23/Jun/2023:10:03:49 +0000] "GET /ping HTTP/1.1" 502 182 "-" "python-urllib3/1.26.14"
o4afpz6lbw-algo-1-4jqzs | [2023-06-23 10:03:50 +0000] [27] [INFO] Starting gunicorn 20.0.4
o4afpz6lbw-algo-1-4jqzs | [2023-06-23 10:03:50 +0000] [27] [INFO] Listening at: unix:/tmp/gunicorn.sock (27)
o4afpz6lbw-algo-1-4jqzs | [2023-06-23 10:03:50 +0000] [27] [INFO] Using worker: gevent
o4afpz6lbw-algo-1-4jqzs | [2023-06-23 10:03:50 +0000] [29] [INFO] Booting worker with pid: 29
o4afpz6lbw-algo-1-4jqzs | [2023-06-23 10:03:50 +0000] [30] [INFO] Booting worker with pid: 30


INFO:sagemaker.local.entities:Checking if serving container is up, attempt: 15


o4afpz6lbw-algo-1-4jqzs | 2023-06-23 10:03:54,530 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)
o4afpz6lbw-algo-1-4jqzs | 172.18.0.1 - - [23/Jun/2023:10:03:55 +0000] "GET /ping HTTP/1.1" 200 0 "-" "python-urllib3/1.26.14"
!

In [None]:
predictor.delete_model()
predictor.delete_endpoint()