# Train a TLC Demand Predictor with SageMaker AutoGluon Tabular

[AutoGluon](https://github.com/awslabs/autogluon) automates machine learning tasks, enabling you to achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy deep learning models on tabular, image, and text data.
This example shows how to use AutoGluon-Tabular with Amazon SageMaker by applying [pre-built deep learning containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#autogluon-training-containers).

# Prerequisites

In [7]:
import sagemaker
import pandas as pd
from ag_model import (
    AutoGluonSagemakerEstimator,
    AutoGluonNonRepackInferenceModel,
    AutoGluonSagemakerInferenceModel,
    AutoGluonRealtimePredictor,
    AutoGluonBatchPredictor,
)
from sagemaker import utils
from sagemaker.serializers import CSVSerializer
import os
import boto3

role = "arn:aws:iam::178770047227:role/service-role/SageMaker-ExecutionRole-20231202T212840" # change to your role
sagemaker_session = sagemaker.session.Session()
region = sagemaker_session.boto_region_name  # public attribute; avoids the private _region_name

bucket = "qiaoshi-aws-ml"
s3_prefix = f"tlc/ml/{utils.sagemaker_timestamp()}"
output_path = f"s3://{bucket}/{s3_prefix}/output/"

# Training

Users can create their own training/inference scripts using [SageMaker Python SDK examples](https://sagemaker.readthedocs.io/en/stable/overview.html#prepare-a-training-script).
The scripts we created let you pass the AutoGluon configuration as a YAML file (this notebook uploads it from the `config` directory).

We are using [official AutoGluon Deep Learning Container images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#autogluon-training-containers) with custom training scripts (see `scripts/` directory).
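
The exact YAML schema is defined by the training script, which is not shown here. As a rough sketch, assuming a config with two hypothetical top-level keys (`ag_predictor_args` for the predictor constructor and `ag_fit_args` for the `fit` call), a training script could split the parsed config as follows. The key names, the `demand` label, and all values are illustrative assumptions, not the contents of `config-med.yaml`.

```python
from typing import Any, Dict, Tuple

# Hypothetical parsed config; in a real script this would come from
# loading the YAML file delivered through the "config" channel.
example_config: Dict[str, Any] = {
    "ag_predictor_args": {"label": "demand", "eval_metric": "root_mean_squared_error"},
    "ag_fit_args": {"presets": "medium_quality", "time_limit": 600},
}


def split_config(config: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (predictor constructor args, fit args) from a parsed config."""
    predictor_args = dict(config.get("ag_predictor_args", {}))
    fit_args = dict(config.get("ag_fit_args", {}))
    # The target column is the one setting that cannot be defaulted.
    if "label" not in predictor_args:
        raise ValueError("config must name the target column under 'label'")
    return predictor_args, fit_args


predictor_args, fit_args = split_config(example_config)
```

Keeping constructor and fit arguments in separate sections lets the same script be reused for different datasets by swapping only the YAML file.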

In [2]:
ag = AutoGluonSagemakerEstimator(
    role=role,
    entry_point="scripts/tabular_train.py",
    region=region,
    instance_count=1,
    instance_type="ml.p3.8xlarge",
    framework_version="0.8.2",
    py_version="py39",
    base_job_name="tlc-tabular-train",
    disable_profiler=True,
    debugger_hook_config=False,
)

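Upload the configuration and the inference script to S3. The training and evaluation data are referenced by their existing S3 URIs.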

In [8]:
s3_prefix = f"autogluon_sm/{utils.sagemaker_timestamp()}"

train_input = "s3://qiaoshi-aws-ml/tlc/results/ml/trips_with_weather_merged/train.csv"

eval_input = "s3://qiaoshi-aws-ml/tlc/results/ml/trips_with_weather_merged/eval.csv"


config_input = ag.sagemaker_session.upload_data(
    path=os.path.join("config", "config-med.yaml"), key_prefix=s3_prefix
)

# Provide inference script so the script repacking is not needed later
# See more here: https://docs.aws.amazon.com/sagemaker/latest/dg/mlopsfaq.html
# Q. Why do I see a repack step in my SageMaker pipeline?
inference_script = ag.sagemaker_session.upload_data(
    path=os.path.join("scripts", "tabular_serve.py"), key_prefix=s3_prefix
)

In [9]:
eval_input

's3://qiaoshi-aws-ml/tlc/results/ml/trips_with_weather_merged/eval.csv'

In [10]:
config_input

's3://sagemaker-us-east-1-178770047227/autogluon_sm/2024-01-17-03-44-57-240/config-med.yaml'
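
Note that `config_input` lives in a different bucket than `eval_input`: `upload_data` was called without a `bucket` argument, so the file went to the session's default bucket. A small helper, sketched here with only the standard library (the name `split_s3_uri` is hypothetical), makes the difference easy to check:

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# URIs copied from the cell outputs above.
data_bucket, _ = split_s3_uri(
    "s3://qiaoshi-aws-ml/tlc/results/ml/trips_with_weather_merged/eval.csv"
)
config_bucket, config_key = split_s3_uri(
    "s3://sagemaker-us-east-1-178770047227/autogluon_sm/2024-01-17-03-44-57-240/config-med.yaml"
)
print(data_bucket, config_bucket)
```

Passing `bucket=bucket` to `upload_data` would keep all artifacts in one bucket, if that is preferred.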

### Fit The Model
For local training, set `instance_type` to `local`.

For non-local training, the recommended instance type is `ml.m5.2xlarge`.

In [11]:
job_name = utils.unique_name_from_base("tlc-training")
ag.fit(
    {
        "config": config_input,
        "train": train_input,
        "test": eval_input,
        "serving": inference_script,
    },
    job_name=job_name,
)

INFO:sagemaker:Creating training-job with name: tlc-training-1705463118-1fb3


2024-01-17 03:45:21 Starting - Starting the training job...
2024-01-17 03:45:35 Starting - Preparing the instances for training......
2024-01-17 03:46:51 Downloading - Downloading input data......
2024-01-17 03:47:46 Downloading - Downloading the training image...
2024-01-17 03:48:30 Training - Training image download completed. Training in progress...
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2024-01-17 03:48:53,284 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2024-01-17 03:48:53,286 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2024-01-17 03:48:53,288 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-01-17 03:48:53,299 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2024-01-17 03:48:53,301 sagemaker_pytorch_container.training INFO     Invoking user tr
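
Each key in the dictionary passed to `fit` becomes an input channel. Inside the training container, SageMaker places each channel under `/opt/ml/input/data/<channel>`, and the training toolkit exposes that path as an `SM_CHANNEL_<NAME>` environment variable. A minimal sketch of how a training script might resolve these paths follows; the helper name `channel_dir` is hypothetical, and `scripts/tabular_train.py` may do this differently.

```python
import os
from typing import Mapping


def channel_dir(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the directory where SageMaker placed the given input channel."""
    env_key = f"SM_CHANNEL_{name.upper()}"
    # Fall back to the conventional mount point when the variable is unset,
    # for example when running the script outside a training job.
    return environ.get(env_key, f"/opt/ml/input/data/{name}")


# The four channel names used in the fit call above.
for channel in ("config", "train", "test", "serving"):
    print(channel, "->", channel_dir(channel))
```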

### Model export

AutoGluon models are portable: everything needed to deploy a trained model is in the tarball created by SageMaker.

The artifact can be used locally, on EC2/ECS/EKS or served via SageMaker Inference.

In [None]:
!aws s3 cp {ag.model_data} .

In [None]:
!ls -alF model.tar.gz
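
To see what the artifact contains before deploying it, the archive can be listed without extracting it. This is a minimal standard-library sketch; the helper name `list_archive` is hypothetical, and the file layout inside the tarball depends on the training script.

```python
import os
import tarfile
from typing import List


def list_archive(path: str) -> List[str]:
    """Return the member names of a gzip-compressed tar archive."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()


# Only runs when the artifact has been downloaded to the working directory.
if os.path.exists("model.tar.gz"):
    for member in list_archive("model.tar.gz")[:20]:
        print(member)
```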

# Endpoint Deployment

Upload the model we trained earlier

In [None]:
endpoint_name = sagemaker.utils.unique_name_from_base("sagemaker-autogluon-serving-trained-model")

model_data = sagemaker_session.upload_data(
    path=os.path.join(".", "model.tar.gz"), key_prefix=f"{endpoint_name}/models"
)

Deploy to a remote or local endpoint

In [None]:
instance_type = "ml.m5.2xlarge"
# instance_type = 'local'

In [None]:
model = AutoGluonNonRepackInferenceModel(
    model_data=model_data,
    role=role,
    region=region,
    framework_version="0.8.2",  # match the AutoGluon version used for training
    py_version="py39",
    instance_type=instance_type,
    source_dir="scripts",
    entry_point="tabular_serve.py",
)

In [None]:
model.deploy(initial_instance_count=1, serializer=CSVSerializer(), instance_type=instance_type)

In [None]:
predictor = AutoGluonRealtimePredictor(model.endpoint_name)

### Predict on unlabeled test data

Remove the target variable (`class`) from the data and get predictions for a sample of 100 rows from the deployed endpoint.

In [None]:
df = pd.read_csv("data/test.csv")
data = df[:100]

In [None]:
preds = predictor.predict(data.drop(columns="class"))
preds

In [None]:
p = preds[["pred"]]
p = p.join(data["class"]).rename(columns={"class": "actual"})
p.head()

In [None]:
print(f"{(p.pred==p.actual).astype(int).sum()}/{len(p)} are correct")
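
Counting exact matches suits a categorical target. If the target is numeric, as a demand count would be, an error metric is usually more informative. Below is a minimal standard-library sketch of mean absolute error and root mean squared error. The function names are illustrative, and with a DataFrame `p` like the one above, `list(p.actual)` and `list(p.pred)` could be passed in.

```python
import math
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Square root of the average squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up numbers, purely to show the call pattern.
sample_actual = [10.0, 12.0, 8.0]
sample_predicted = [11.0, 12.0, 6.0]
print(mean_absolute_error(sample_actual, sample_predicted))
print(root_mean_squared_error(sample_actual, sample_predicted))
```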

### Cleanup Endpoint

In [None]:
predictor.delete_endpoint()

# Batch Transform

Deploying a trained model to a hosted endpoint has been available in SageMaker since launch and is a great way to provide real-time predictions to a service such as a website or mobile app. However, if the goal is to generate predictions from a trained model on a large dataset where latency is not a concern, batch transform may be easier, more scalable, and more appropriate.

[Read more about Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html).

In [None]:
endpoint_name = sagemaker.utils.unique_name_from_base(
    "sagemaker-autogluon-batch_transform-trained-model"
)

model_data = sagemaker_session.upload_data(
    path=os.path.join(".", "model.tar.gz"), key_prefix=f"{endpoint_name}/models"
)

In [None]:
instance_type = "ml.m5.2xlarge"

In [None]:
model = AutoGluonSagemakerInferenceModel(
    model_data=model_data,
    role=role,
    region=region,
    framework_version="0.8.2",  # match the AutoGluon version used for training
    py_version="py39",
    instance_type=instance_type,
    entry_point="tabular_serve-batch.py",
    source_dir="scripts",
    predictor_cls=AutoGluonBatchPredictor,
)

In [None]:
transformer = model.transformer(
    instance_count=1,
    instance_type=instance_type,
    strategy="MultiRecord",
    max_payload=6,
    max_concurrent_transforms=1,
    output_path=output_path,
    accept="application/json",
    assemble_with="Line",
)

Prepare data for batch transform

In [None]:
pd.read_csv("data/test.csv")[:100].to_csv("data/test_no_header.csv", header=False, index=False)

Upload the data to S3 through the SageMaker session

In [None]:
test_input = transformer.sagemaker_session.upload_data(
    path=os.path.join("data", "test_no_header.csv"), key_prefix=s3_prefix
)

In [None]:
transformer.transform(
    test_input,
    input_filter="$[:14]",  # filter-out target variable
    split_type="Line",
    content_type="text/csv",
    output_filter="$['class']",  # keep only prediction class in the output
)

transformer.wait()
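
For CSV input, Batch Transform applies the JSONPath filter to each row as if it were an array of columns, so the slice `$[:14]` keeps columns 0 through 13 and drops everything from the fifteenth column on. The effect can be emulated locally to check that the filter removes the target column. This only illustrates the slice semantics, not how SageMaker implements them, and the helper name `apply_column_slice` is hypothetical.

```python
import csv
import io


def apply_column_slice(csv_text: str, stop: int) -> str:
    """Keep the first `stop` columns of every CSV row, like the filter $[:stop]."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(row[:stop])
    return out.getvalue()


# Made-up rows with three feature columns followed by a target column.
sample = "39,State-gov,77516,<=50K\n50,Private,83311,>50K\n"
print(apply_column_slice(sample, 3))
```

If the test file has a different number of feature columns, the slice bound in `input_filter` needs to change accordingly.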

Download batch transform outputs

In [None]:
!aws s3 cp {transformer.output_path[:-1]}/test_no_header.csv.out .

In [None]:
p = pd.concat(
    [
        pd.read_json("test_no_header.csv.out", orient="index")
        .sort_index()
        .rename(columns={0: "preds"}),
        pd.read_csv("data/test.csv")[["class"]].iloc[:100].rename(columns={"class": "actual"}),
    ],
    axis=1,
)
p.head()

In [None]:
print(f"{(p.preds==p.actual).astype(int).sum()}/{len(p)} are correct")

# Conclusion

In this tutorial, we trained an AutoGluon model and explored a few options for deploying it with SageMaker. Each section of this tutorial (training, endpoint inference, batch inference) can be used independently: for example, you can train locally and deploy to SageMaker, or vice versa.

Next steps:
* [Learn more](https://auto.gluon.ai) about AutoGluon and explore the [tutorials](https://auto.gluon.ai/stable/tutorials/index.html).
* Explore [SageMaker inference documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html).