In [None]:
import sagemaker
import pandas as pd
import numpy as np
from ag_model import AutoGluonTraining
from sagemaker import utils
import os

# Training

In [None]:
role = sagemaker.get_execution_role()

We use the [official](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#autogluon-training-containers) AutoGluon 0.3.1 Deep Learning Container images with custom training scripts (see the `scripts/` directory).

In [None]:
ag = AutoGluonTraining(
    role=role,
    entry_point="scripts/tabular_train.py",
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    base_job_name="autogluon-tabular-train",
)

### Writing Training Scripts

Users can create their own training/inference scripts using [SageMaker Python SDK examples](https://sagemaker.readthedocs.io/en/stable/overview.html#prepare-a-training-script).
The scripts provided here let you pass the AutoGluon configuration as a YAML file (located in the `data/config` directory).

AutoGluon code can be quickly authored starting with [tutorials](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-quickstart.html) available on our website.
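As a rough illustration of what a custom training script has to do first, the sketch below resolves the input and output directories inside the training container. It assumes the standard SageMaker conventions: each channel passed to `fit` is exposed through an `SM_CHANNEL_<NAME>` environment variable and mounted under `/opt/ml/input/data/<name>`, and `SM_MODEL_DIR` points to the directory that is packaged into the model tarball. The function name `resolve_job_paths` is hypothetical, and this is not necessarily how `scripts/tabular_train.py` is written.

```python
import os

# Default mount point for input channels inside a SageMaker training container.
DEFAULT_INPUT_ROOT = "/opt/ml/input/data"


def resolve_job_paths(env=None, channels=("config", "train", "test")):
    """Return the directory of each input channel plus the model output directory.

    Hypothetical helper for illustration; the channel names match the keys
    passed to `fit` in this notebook.
    """
    env = os.environ if env is None else env
    paths = {
        # SageMaker sets SM_CHANNEL_<NAME> (upper-cased) for every input channel.
        name: env.get(f"SM_CHANNEL_{name.upper()}", f"{DEFAULT_INPUT_ROOT}/{name}")
        for name in channels
    }
    # Files written to SM_MODEL_DIR end up in model.tar.gz when the job finishes.
    paths["model"] = env.get("SM_MODEL_DIR", "/opt/ml/model")
    return paths


# Example with a fake environment (made-up values, for illustration only).
fake_env = {"SM_CHANNEL_TRAIN": "/tmp/train", "SM_MODEL_DIR": "/tmp/model"}
print(resolve_job_paths(fake_env))
```

Taking the environment as a parameter makes it possible to try the path logic locally before submitting a job.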

### Data Pre-processing
Let's download the sample datasets and upload them to S3 so that SageMaker can use them.

In [None]:
!wget https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv -O data/train.csv
!wget https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv -O data/test.csv

In [None]:
device = "cpu"  # selects which config file to upload (config-full.<device>.yaml)
data_path = "data"
# A timestamped prefix keeps the uploads of each run separate.
s3_prefix = "autogluon_sm/{}".format(utils.sagemaker_timestamp())
# upload_data returns the S3 URI of the uploaded file.
train_input = ag.sagemaker_session.upload_data(
    path=os.path.join(data_path, "train.csv"), key_prefix=s3_prefix
)
eval_input = ag.sagemaker_session.upload_data(
    path=os.path.join(data_path, "test.csv"), key_prefix=s3_prefix
)
config_input = ag.sagemaker_session.upload_data(
    path=os.path.join(data_path, "config", f"config-full.{device}.yaml"), key_prefix=s3_prefix
)
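To make the upload results easier to reason about, here is a small sketch of the S3 URI that a single-file `upload_data` call is expected to return: the bucket, then the key prefix, then the file's base name. When no bucket is passed, as in the cell above, the session's default bucket is used. The helper `expected_upload_uri` and the bucket and prefix values are hypothetical and only illustrate the layout.

```python
import os


def expected_upload_uri(bucket, key_prefix, local_path):
    """Build the S3 URI expected for a single uploaded file.

    Hypothetical helper, not part of the SageMaker SDK.
    """
    filename = os.path.basename(local_path)
    return f"s3://{bucket}/{key_prefix}/{filename}"


# Made-up bucket and prefix, for illustration only.
print(expected_upload_uri("example-bucket", "autogluon_sm/example-run", "data/train.csv"))
```

Comparing this with the actual values of `train_input`, `eval_input`, and `config_input` is a quick way to confirm that the files landed where the training job will look for them.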

### Model Fitting
Fit the model with a SageMaker training job.

In [None]:
job_name = utils.unique_name_from_base("test-autogluon-image")
ag.fit({"config": config_input, "train": train_input, "test": eval_input}, job_name=job_name)

### Model Export

AutoGluon models are portable: everything you need to deploy a trained model is in the tarball created by SageMaker.

The artifact can be used locally, on EC2/ECS/EKS or served via SageMaker Inference.

In [None]:
!aws s3 cp {ag.model_data} .

In [None]:
!ls -alF model.tar.gz
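To use the artifact locally, the tarball first has to be unpacked. The sketch below uses only the standard library to list and extract the archive; the function names and the `model` target directory are hypothetical choices, and the files inside the archive depend on what the training script saved.

```python
import tarfile


def list_model_artifact(archive_path="model.tar.gz"):
    """Return the names of all members stored in the model tarball."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


def extract_model_artifact(archive_path="model.tar.gz", target_dir="model"):
    """Unpack the model tarball into target_dir and return that directory."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=target_dir)
    return target_dir


# Usage once model.tar.gz has been downloaded:
#   print(list_model_artifact())
#   extract_model_artifact()
```

Assuming the training script saved the predictor at the root of the model directory, the extracted folder can then be passed to `TabularPredictor.load` from `autogluon.tabular`.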