# AutoGluon Tabular with SageMaker

[AutoGluon](https://github.com/awslabs/autogluon) automates machine learning tasks, enabling you to achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy deep learning models on tabular, image, and text data.
This notebook shows how to use AutoGluon-Tabular with Amazon SageMaker by creating custom containers.

In [2]:
# Imports
import os
import boto3
import sagemaker
from time import sleep
from collections import Counter
import pandas as pd
from sagemaker import get_execution_role, local, Model, utils, fw_utils, s3
from sagemaker.estimator import Estimator
from sagemaker.predictor import Predictor as RealTimePredictor
from sagemaker.serializers import CSVSerializer

from sklearn.metrics import accuracy_score, classification_report
from IPython.display import display, HTML
from IPython.core.interactiveshell import InteractiveShell

# Print settings
InteractiveShell.ast_node_interactivity = "all"
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 10)

# Account/s3 setup
session = sagemaker.Session()
local_session = local.LocalSession()
bucket = session.default_bucket()
prefix = 'sagemaker/autogluon-tabular'
region = session.boto_region_name
role = get_execution_role()
client = session.boto_session.client(
    "sts", region_name=region, endpoint_url=utils.sts_regional_endpoint(region)
)


First, download the data and split it into training and test sets.

In [4]:
# Download and unzip the data
!aws s3 cp --region {region} s3://sagemaker-sample-data-{region}/autopilot/direct_marketing/bank-additional.zip .
!unzip -qq -o bank-additional.zip
!rm bank-additional.zip

local_data_path = './bank-additional/bank-additional-full.csv'
data = pd.read_csv(local_data_path)

# Split train/test data
train = data.sample(frac=0.7, random_state=42)
test = data.drop(train.index)

# Split test X/y
label = 'y'
y_test = test[label]
X_test = test.drop(columns=[label])

download: s3://sagemaker-sample-data-us-east-1/autopilot/direct_marketing/bank-additional.zip to ./bank-additional.zip


##### Check the data
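
One way to check the split is to compare the label distribution in the training and test sets, since the two should be roughly similar after random sampling. The sketch below uses `collections.Counter` (already imported above); the helper name `label_proportions` and the toy labels are illustrative assumptions, not part of the original notebook.

```python
from collections import Counter


def label_proportions(labels):
    """Return a dict mapping each label value to its share of the rows."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


# Toy labels standing in for train[label] and test[label] (illustrative only).
toy_train_labels = ["no", "no", "no", "yes"]
toy_test_labels = ["no", "yes"]

train_props = label_proportions(toy_train_labels)
test_props = label_proportions(toy_test_labels)
print(train_props)
print(test_props)
```

In the notebook, the same helper could be called as `label_proportions(train[label])` and `label_proportions(y_test)`, since a pandas Series can be iterated like a list.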

Upload the data to S3.

In [5]:
train_file = 'train.csv'
train.to_csv(train_file, index=False)
train_s3_path = session.upload_data(train_file, key_prefix='{}/data'.format(prefix))

test_file = 'test.csv'
test.to_csv(test_file, index=False)
test_s3_path = session.upload_data(test_file, key_prefix='{}/data'.format(prefix))

X_test_file = 'X_test.csv'
X_test.to_csv(X_test_file, index=False)
X_test_s3_path = session.upload_data(X_test_file, key_prefix='{}/data'.format(prefix))

train_s3_path = 's3://sagemaker-us-east-1-181880743555/sagemaker/autogluon-tabular/data/train.csv'
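
`session.upload_data` returns the S3 URI of the uploaded file, which for a single file is built from the bucket, the key prefix, and the local file name. The sketch below reproduces that layout so the expected paths can be checked before training. The helper `expected_s3_uri` and the bucket name `example-bucket` are hypothetical and used only for illustration.

```python
import posixpath


def expected_s3_uri(bucket, key_prefix, local_path):
    """Build the URI that a single-file upload to bucket/key_prefix should produce."""
    # S3 keys always use forward slashes, so posixpath is used on every platform.
    file_name = posixpath.basename(local_path.replace("\\", "/"))
    key = posixpath.join(key_prefix.strip("/"), file_name)
    return "s3://{}/{}".format(bucket, key)


prefix = "sagemaker/autogluon-tabular"
uri = expected_s3_uri("example-bucket", "{}/data".format(prefix), "train.csv")
print(uri)
```

Comparing this value against the string returned by `session.upload_data` is a cheap way to confirm that the files landed under the intended prefix.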

In [16]:
%%writefile src/requirements.txt
# git+git://github.com/EmilyWebber/autogluon.git
autogluon
PrettyTable
bokeh
numpy==1.16.1
boto3
matplotlib

Overwriting src/requirements.txt


In [21]:
from sagemaker.mxnet.estimator import MXNet
from sagemaker import get_execution_role

role = get_execution_role()

estimator = MXNet(
    source_dir='src',
    entry_point='train.py',
    role=role,
    framework_version='1.7.0',
    py_version='py3',
    instance_count=1,
    instance_type='ml.m5.2xlarge',
    volume_size=100,
    # Alternative tag value: "git+git://github.com/EmilyWebber/autogluon.git"
    tags=[{"Key": "autogluon-version", "Value": "vanilla"}],
    hyperparameters={
        'fit_args': {
            'label': 'y',
            'presets': ['optimize_for_deployment'],
        },
        'feature_importance': True,
    },
)

# train_s3_path was set in the upload cell above

estimator.fit(train_s3_path, wait=False)
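
Because `fit` is called with `wait=False`, the cell returns as soon as the training job is submitted, so the job status has to be checked separately. The following is a minimal polling sketch, assuming the status is read from the `TrainingJobStatus` field returned by the boto3 SageMaker `describe_training_job` call. The helper `wait_for_training_job` and the stand-in `fake_describe` function are hypothetical and only illustrate the loop.

```python
from time import sleep

# Terminal states reported in the TrainingJobStatus field.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(describe, job_name, poll_seconds=30, max_polls=240):
    """Poll describe(job_name) until the job reaches a terminal state.

    describe: callable returning a dict with a 'TrainingJobStatus' key.
    Returns the final status, or raises TimeoutError after max_polls checks.
    """
    for _ in range(max_polls):
        status = describe(job_name)["TrainingJobStatus"]
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("Job {} did not finish in time".format(job_name))


# Stand-in for the real API call so the sketch runs without AWS access.
_fake_statuses = iter(["InProgress", "InProgress", "Completed"])


def fake_describe(job_name):
    return {"TrainingJobStatus": next(_fake_statuses)}


final_status = wait_for_training_job(fake_describe, "example-job", poll_seconds=0)
print(final_status)
```

In the notebook, `describe` could be `lambda name: session.sagemaker_client.describe_training_job(TrainingJobName=name)` and the job name could be taken from `estimator.latest_training_job.name`. Passing the describe call in as an argument keeps the loop easy to test without a live training job.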