## CatBoost Scikit Learn Script Mode Local Training and Serving 

This is a sample Python program that trains a simple CatBoost model using the SageMaker scikit-learn Docker image, and then performs inference. This implementation will work on your *local computer* or in the *AWS Cloud*.

#### Prerequisites:
1. Install required Python packages:
   `pip install -r requirements.txt`
2. Make sure Docker Desktop is installed and running on your computer:
   `docker ps`
3. Configure AWS credentials on your local machine so that the Docker image can be pulled from ECR.
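
The Docker prerequisite can also be checked from Python before launching a local job. The following is a minimal sketch, not part of the original sample; the helper name `docker_is_running` is hypothetical, and it only wraps the `docker ps` command listed above.

```python
import shutil
import subprocess


def docker_is_running(timeout=10):
    """Return True if the docker CLI is on PATH and `docker ps` exits with code 0."""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "ps"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The CLI could not be started or the daemon did not answer in time
        return False
    return result.returncode == 0


print("Docker daemon reachable:", docker_is_running())
```

If this prints `False`, start Docker Desktop before running the local training cells.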

In [None]:
import os
import boto3  # used below when the notebook runs outside SageMaker
import sagemaker
import pandas as pd
from sagemaker.predictor import csv_serializer
from sagemaker.sklearn import SKLearn
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

In [None]:
# Useful SageMaker variables
try:
    # You're using a SageMaker notebook
    sess = sagemaker.Session()
    bucket = sess.default_bucket()
    role = sagemaker.get_execution_role()
except ValueError:
    # You're using a notebook somewhere else
    print("Setting role and SageMaker session manually...")
    
    # Change the bucket, region, and IAM role as needed
    region = "us-west-2"
    bucket = f"sagemaker-{region}-demo"

    # Set the default session first so the clients below pick up the region and profile
    boto3.setup_default_session(region_name=region, profile_name="default")
    iam = boto3.client("iam")
    sagemaker_client = boto3.client("sagemaker")

    sagemaker_execution_role_name = (
        "AmazonSageMaker-ExecutionRole-20200101T000001"  # Change this to your role name
    )
    role = iam.get_role(RoleName=sagemaker_execution_role_name)["Role"]["Arn"]
    sess = sagemaker.Session(sagemaker_client=sagemaker_client, default_bucket=bucket)
    
prefix = "catboost_scikit_learn"

## Downloading Data
Load the Boston housing dataset from scikit-learn and write the training, validation, and test splits to local CSV files.

In [None]:
local_train = './data/train/boston_train.csv'
local_validation = './data/validation/boston_validation.csv'
local_test = './data/test/boston_test.csv'

In [None]:
if all(os.path.isfile(path) for path in (local_train, local_validation, local_test)):
    print('Training dataset already exists. Skipping download')
else:
    print('Downloading training dataset')

    os.makedirs("./data", exist_ok=True)
    os.makedirs("./data/train", exist_ok=True)
    os.makedirs("./data/validation", exist_ok=True)
    os.makedirs("./data/test", exist_ok=True)

    data = load_boston()

    X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.25, random_state=45)
    X_val, X_test, y_val, y_test = train_test_split(X_test, y_test, test_size=0.5, random_state=45)

    trainX = pd.DataFrame(X_train, columns=data.feature_names)
    trainX['target'] = y_train

    # Use the validation split here (not the test split)
    valX = pd.DataFrame(X_val, columns=data.feature_names)
    valX['target'] = y_val

    testX = pd.DataFrame(X_test, columns=data.feature_names)

    trainX.to_csv(local_train, header=None, index=False)
    valX.to_csv(local_validation, header=None, index=False)
    testX.to_csv(local_test, header=None, index=False)

    print('Downloading completed')
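
To see how many rows end up in each file, the two chained splits can be worked out by hand. The sketch below assumes the 506-row Boston dataset and mirrors how `train_test_split` sizes a fractional `test_size` (the test part is rounded up, and the remainder goes to the other part); `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_first, n_second) for a fractional test_size; the second part is rounded up."""
    n_second = math.ceil(test_size * n_samples)
    return n_samples - n_second, n_second


# First split: 75% train, 25% held out
n_train, n_holdout = split_sizes(506, 0.25)
# Second split: the held-out rows are divided into validation and test
n_val, n_test = split_sizes(n_holdout, 0.5)

print(n_train, n_val, n_test)  # 379 63 64
```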

## Model Training
Starting model training using **local mode**. Note: if launching for the first time in local mode, container image download might take a few minutes to complete.

In [None]:
# Switch between "local" and "remote" training mode with the flag below
# (setting it to False launches the training job on a remote training instance)
local_training = True
if local_training:
    training_instance_type = "local"
    train_location = 'file://' + local_train
    validation_location = 'file://' + local_validation
else:
    training_instance_type = "ml.m5.xlarge"
    train_location = sess.upload_data(
        local_train, key_prefix="{}/data/{}".format(prefix, "train")
    )
    validation_location = sess.upload_data(
        local_validation, key_prefix="{}/data/{}".format(prefix, "validation")
    )
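
The branch above boils down to one rule: local mode reads channels from `file://` locations, while remote mode needs the data uploaded to S3 first. A minimal sketch of that rule as a reusable helper follows; `channel_location` and its `upload` parameter are hypothetical names, and `upload` is assumed to be a callable such as `sess.upload_data`.

```python
import os


def channel_location(local_path, local_mode, upload=None, key_prefix="data"):
    """Return the input location for one channel.

    local_mode=True  -> file:// URI built from the absolute local path
    local_mode=False -> whatever `upload` returns (for example an S3 URI)
    """
    if local_mode:
        return "file://" + os.path.abspath(local_path)
    if upload is None:
        raise ValueError("An upload callable is required for remote training")
    return upload(local_path, key_prefix=key_prefix)


print(channel_location("./data/train/boston_train.csv", local_mode=True))
```

For remote training the call would look like `channel_location(local_train, False, upload=sess.upload_data, key_prefix=prefix + "/data/train")`.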
        

In [None]:
# create a SKLearn estimator and call fit() to start the training

sklearn = SKLearn(
    entry_point="catboost_train_deploy.py",
    source_dir='code',
    framework_version="0.23-1",
    instance_type=training_instance_type,
    role=role,
)

sklearn.fit({'train': train_location, 'validation': validation_location})
print('Completed model training')


## Local training behind the scenes

When you run training or hosting in "*local*" mode, the job first pulls the requested Docker image from ECR (in this case, from the service team account, to get the prebuilt [docker container for scikit learn](https://github.com/aws/sagemaker-scikit-learn-container)). You can run the command below in a terminal to check which Docker images are available on your local instance or machine:
```
$ docker images
```

![](./image/docker_images.png)

The pulled image then runs in the local environment much as it would on a remote SageMaker training instance. To better understand how the entry point script is executed inside the container, you can mimic the same behavior by starting the container and running the script manually.
<div class="alert alert-block alert-danger">
<b>Warning:</b> when you run the command below, make sure you are in the current directory (catboost_scikit_learn_script_mode_local_training_and_serving). You also need to create a "model" folder under the current directory; otherwise you will see an error message at the model saving stage.
</div>
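
The "model" folder from the warning can be created from Python before starting the container. This is a minimal sketch: `prepare_workdir` is a hypothetical helper, and the list of expected folders is an assumption based on the paths used elsewhere in this notebook (`code`, `data/train`, `data/validation`).

```python
import os

# Folders this notebook is assumed to rely on, relative to the current directory
EXPECTED_FOLDERS = ("code", "data/train", "data/validation")


def prepare_workdir(root="."):
    """Create the model folder and return the expected folders that are missing."""
    os.makedirs(os.path.join(root, "model"), exist_ok=True)
    return [
        folder
        for folder in EXPECTED_FOLDERS
        if not os.path.isdir(os.path.join(root, folder))
    ]


missing = prepare_workdir(".")
if missing:
    print("Missing folders:", ", ".join(missing))
else:
    print("Working directory is ready for the docker run command.")
```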

![](./image/folder_structure.png)

#### <span style="color:blue">Step 1: Running docker image from terminal</span>
```
$ docker run -v $(pwd):/opt/ml -v $(pwd)/data:/opt/ml/input/data -it $(docker images -f "reference=*/*0.23-1-cpu-py3" --quiet)
```

#### <span style="color:blue">Step 2: Prepare the environment and install python packages</span>

Once inside the running Docker container, we can execute the same commands that appeared in the logs of the previous local training job. First, we install the packages specified in the *requirements.txt* file in the code folder, and then we execute the entry point script defined by the job.


```
# export SM_OUTPUT_DATA_DIR=/opt/ml/model
# export SM_CHANNEL_TRAIN=/opt/ml/input/data/train
# export SM_CHANNEL_VALIDATION=/opt/ml/input/data/validation
# cd /opt/ml/code
# python -m pip install -r requirements.txt
```

![](./image/docker_run.png)

*Note that* the first three commands set up environment variables manually. When you run in local or remote mode using the prebuilt containers, these environment variables are set up by the [SageMaker Training Toolkit](https://github.com/aws/sagemaker-training-toolkit/blob/master/src/sagemaker_training/environment.py). You can also find all the environment variables set up by the SageMaker Training Toolkit in the local training logs above.
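
An entry point script typically picks these variables up as defaults for its command-line arguments. The sketch below illustrates that pattern only; it is not the contents of `catboost_train_deploy.py`, and the argument names (`--train`, `--validation`, `--output-data-dir`) are assumptions chosen for the example.

```python
import argparse
import os


def parse_args(argv=None):
    """Read input and output locations from arguments, falling back to SM_* variables."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--output-data-dir", default=os.environ.get("SM_OUTPUT_DATA_DIR")
    )
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument(
        "--validation", default=os.environ.get("SM_CHANNEL_VALIDATION")
    )
    return parser.parse_args(argv)


# An empty argument list means every value comes from the environment (or is None)
args = parse_args([])
print(args.train, args.validation, args.output_data_dir)
```

Outside the container the three values print as `None` unless the variables have been exported, which is why the manual `export` commands are needed in the walkthrough above.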

#### <span style="color:blue">Step 3: Execute training</span>

Run the training script using the command below:
```
# /miniconda3/bin/python catboost_train_deploy.py
```

![](./image/training_in_docker.png)

You can compare the training job outputs above with the outputs in the local training logs; they should be the same.

*To exit from the running Docker container, press Ctrl+D.*

## Deploying the trained model on the local instance
We can also deploy the trained model and invoke the local endpoint.

In [None]:
print('Deploying endpoint in local mode')
predictor = sklearn.deploy(1, 'local', serializer=csv_serializer)


In [None]:
with open(local_test, 'r') as f:
    payload = f.read().strip()

predictions = predictor.predict(payload)
print('predictions: {}'.format(predictions))
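
Because the test file is written without a header or index, the payload is plain CSV text with one row of feature values per line. A quick sanity check on its shape before invoking the endpoint can catch a malformed file early. This is a minimal sketch; `payload_shape` is a hypothetical helper and the sample string is made-up data.

```python
import csv
import io


def payload_shape(payload):
    """Return (n_rows, n_columns) of a header-less CSV payload; rows must be equally wide."""
    rows = [row for row in csv.reader(io.StringIO(payload)) if row]
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError("Rows have different numbers of columns: {}".format(sorted(widths)))
    return len(rows), (widths.pop() if widths else 0)


sample = "0.1,0.0,2.3\n0.2,1.0,4.5"
print(payload_shape(sample))  # (2, 3)
```

For the Boston test split, `payload_shape(payload)` would be expected to report 13 columns, one per feature.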

## Clean up resources
Delete the locally deployed endpoint.

In [None]:
# delete_endpoint() already targets this predictor's endpoint, so no argument is needed
predictor.delete_endpoint()