# Churn Prediction with Multimodality of Tabular and Text features




Customer churn is a problem faced by a wide range of companies, from
telecommunications to banking, where customers are typically lost to
competitors. It's in a company's best interest to retain existing
customers rather than acquire new ones, because attracting new customers
usually costs significantly more. When trying to retain
customers, companies often focus their efforts on customers who are more
likely to leave. User behaviour and customer support chat logs can
contain valuable indicators of the likelihood that a customer will end the
service. In this solution, we train and deploy a churn prediction model
that uses a state-of-the-art natural language processing model to find
useful signals in text. In addition to textual inputs, this model uses
traditional structured data inputs such as numerical and categorical
fields.

In this notebook, we train, deploy and use a churn prediction model
that processes numerical, categorical and textual features to make its
prediction.

**Note**: When running this notebook on SageMaker Studio, you should make
sure to use the `Python 3 (PyTorch 1.10 Python 3.8 CPU Optimized)` image/kernel.

Install required packages to run this notebook

In [None]:
!pip install -U sagemaker

We start by importing a variety of packages that are used throughout
the notebook. One of the most important packages is the Amazon SageMaker
Python SDK (i.e. `import sagemaker`). 

In [None]:
import boto3
from pathlib import Path
import sagemaker
from sagemaker.pytorch import PyTorch
import sys

Next, we create a SageMaker client, which lets us call SageMaker APIs
directly as an alternative to using the Amazon SageMaker SDK. We use
it at the end of the notebook to delete certain resources that are
created in this notebook. The `config` file defines naming variables that are used in the following sections, including S3 bucket prefixes and the prefixes for models and training jobs.

In [None]:
sys.path
import config

sagemaker_client = boto3.client("sagemaker")
sagemaker_session = sagemaker.Session()
IAM_ROLE = sagemaker.get_execution_role()

## 1. Data visualization

Download the data from source S3 buckets for visualization.

In [None]:
sagemaker.s3.S3Downloader.download(
    f"s3://sagemaker-example-files-prod-{sagemaker_session.boto_region_name}/datasets/tabular/synthetic_churn_prediction_with_text",
    "data",
)

Upload the data to the S3 bucket in your own account for training.

In [None]:
DEFAULT_BUCKET = sagemaker_session.default_bucket()
sagemaker.s3.S3Uploader.upload("data", f"s3://{DEFAULT_BUCKET}/{config.DATASETS_S3_PREFIX}")

In [None]:
import pandas as pd
import json


def load_jsonl(filepath):
    with open(filepath, "r") as f:
        lines = f.readlines()
    data = [json.loads(line) for line in lines]
    return data


train_data = pd.DataFrame(load_jsonl("data/train.jsonl"))
validation_data = pd.DataFrame(load_jsonl("data/validation.jsonl"))
test_data = load_jsonl("data/test.jsonl")

Prepare the test data to be used as a hold-out dataset for evaluating model performance by separating the labels from the features.

In [None]:
ground_truth_label = []

for each_example in test_data:
    ground_truth_label.append(each_example["y"])
    del each_example["y"]

Here are the first 10 observations of the training data.

In [None]:
train_data.head(10)

By modern standards, it’s a medium-sized dataset, with only 43,000 records, where each record uses 21 attributes to describe the profile of a customer of an unknown US mobile operator. The attributes are:

`CustServ Calls`: the number of calls placed to Customer Service. Positive values mean customers called customer service and negative values mean customer service called customers 

`Day Charge`: the billed cost of daytime calls

`Day Mins`: the total number of calling minutes used during the day

`Day Calls`: the total number of calls placed during the day

`VMail Message`: the average number of voice mail messages per month

`Eve Mins`, `Eve Calls`, `Eve Charge`: the minutes used, number of calls placed, and billed cost for calls during the evening

`Night Mins`, `Night Calls`, `Night Charge`: the minutes used, number of calls placed, and billed cost for calls during nighttime

`Intl Mins`, `Intl Calls`, `Intl Charge`: the minutes used, number of calls placed, and billed cost for international calls

`Account Length`: the number of days that this account has been active

`State`: the US state in which the customer resides, indicated by a two-letter abbreviation; for example, OH or NJ

`Location`: the location of the corresponding customer’s phone number: 'urban', 'suburban', 'rural', None, or 'other'

`Phone`: the remaining seven-digit phone number

`Plan`: the plan the customer has with the company

`Limit`: whether the customer's plan is limited or unlimited

`Text`: chat record written in text


`y`: whether the customer left the service: true/false

The attribute, `y`, is known as the target attribute: the attribute that we want the ML model to predict. Because the target attribute is binary, our model performs binary prediction, also known as binary classification.

**In addition, the features include missing values, which are taken care of in the model-fitting stage that follows.**
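
Before relying on that handling, it can help to see how many values are actually missing per feature. The sketch below counts absent or null entries in a few made-up records. The records and the helper name `count_missing` are illustrative assumptions; with the real data, the same function could be applied to the list returned by `load_jsonl("data/train.jsonl")`.

```python
def count_missing(records, feature_names):
    """Count the records in which each feature is absent or null (None)."""
    missing = {name: 0 for name in feature_names}
    for record in records:
        for name in feature_names:
            if record.get(name) is None:
                missing[name] += 1
    return missing


# Made-up records for illustration only; they are not taken from the dataset.
toy_records = [
    {"CustServ Calls": 3.0, "plan": "A", "text": "My bill looks wrong."},
    {"CustServ Calls": None, "plan": "B", "text": "Thinking about switching."},
    {"CustServ Calls": 1.0, "plan": None, "text": None},
]

missing_counts = count_missing(toy_records, ["CustServ Calls", "plan", "text"])
print(missing_counts)
```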

Let’s begin exploring the data:

Summary statistics for train and validation data.

In [None]:
train_data.describe()

In [None]:
validation_data.describe()

Shape of train and validation data

In [None]:
train_data.shape, validation_data.shape

## 2. Fitting a multimodality model with huggingface sentence transformer and scikit-learn random forest classifier

The model consists of two components: (1) a feature engineering step that processes the numerical, categorical, and text features, and (2) a model fitting step that fits a [scikit-learn RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) on the transformed numerical, categorical, and text features.

For the feature engineering step, we do the following:
1. Fill missing values for numerical features.
2. One-hot encode categorical features, treating missing values as a separate category for each feature.
3. Use a [HuggingFace sentence transformer](https://huggingface.co/sentence-transformers?sort_models=downloads#models) to encode the text feature into an X-dimensional dense vector, where X depends on the particular sentence transformer.
> We choose the three most downloaded sentence transformer models and use them in the following model fitting and hyperparameter optimization. Specifically, they are [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2), [multi-qa-mpnet-base-dot-v1](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1), [paraphrase-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2).
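
The three steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the notebook's training script: the median is assumed as the fill value, `"<missing>"` is an invented category name, and `toy_embed` is a stand-in for a real sentence transformer (in practice something like `SentenceTransformer(model_name).encode(text)` from the `sentence-transformers` package would produce the dense vector).

```python
from statistics import median


def fit_feature_transform(records, numerical, categorical):
    """Learn fill values and one-hot vocabularies from the training records."""
    # Assumption: numerical gaps are filled with the training median.
    fill_values = {
        name: median(r[name] for r in records if r.get(name) is not None)
        for name in numerical
    }
    # Missing categorical values get their own category, "<missing>".
    vocab = {
        name: sorted({r.get(name) or "<missing>" for r in records})
        for name in categorical
    }
    return fill_values, vocab


def transform(record, fill_values, vocab, embed_text, text_name="text"):
    """Build one flat vector: numerical + one-hot + text embedding."""
    features = []
    for name, fill in fill_values.items():
        value = record.get(name)
        features.append(fill if value is None else float(value))
    for name, categories in vocab.items():
        value = record.get(name) or "<missing>"
        features.extend(1.0 if value == c else 0.0 for c in categories)
    features.extend(embed_text(record.get(text_name) or ""))
    return features


def toy_embed(text):
    """Stand-in for a sentence transformer; NOT a real embedding."""
    n_words = len(text.split())
    return [float(len(text)), float(n_words)]


# Made-up training records for illustration only.
train = [
    {"Account Length": 100.0, "plan": "A", "text": "please cancel"},
    {"Account Length": None, "plan": None, "text": "great service"},
    {"Account Length": 50.0, "plan": "B", "text": None},
]
fill_values, vocab = fit_feature_transform(train, ["Account Length"], ["plan"])
vector = transform(train[1], fill_values, vocab, toy_embed)
print(vector)
```

The resulting flat vector is what a classifier such as a random forest would be fitted on; with a real sentence transformer, the last two entries would be replaced by the X-dimensional embedding.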

For the hyperparameters of the [scikit-learn RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html), see the [source on GitHub](https://github.com/scikit-learn/scikit-learn/blob/0.24.X/sklearn/ensemble/_forest.py#L899).

Here is the architecture diagram.

<p align="center">
  <img src="diagram/architecture_rf.png" style="width: 600px;"/>
</p>

For demonstration purposes, we use only the numerical features `CustServ Calls` and `Account Length`, the categorical features `plan` and `limit`, and the text feature `text` to fit the model. Multiple features should be separated by `,` as shown below.

The hyperparameters are explained below.

* `numerical-feature-names`: numerical feature names separated by a comma `,`.
* `categorical-feature-names`: categorical feature names separated by a comma `,`.
* `textual-feature-names`: text feature names separated by a comma `,`.
* `label-name`: target column.

Hyperparameters for the random forest:
* `n-estimators`: the number of trees in the forest.
* `min-impurity-decrease`: a node is split if the split induces a decrease of the impurity greater than or equal to this value. For details, see [random forest documentation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html).
* `ccp-alpha`: complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than `ccp_alpha` is chosen. By default, no pruning is performed.
* `criterion`: the function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “log_loss” and “entropy” both for the Shannon information gain.
* `max-depth`: the maximum depth of the tree. If -1, nodes are expanded until all leaves are pure.
* `boostrap`: whether bootstrap samples are used when building trees. If "False", the whole dataset is used to build each tree.
* `min-samples-split`: the minimum number of samples required to split an internal node.
* `min-samples-leaf`: the minimum number of samples required to be at a leaf node.
* `balanced-data`: whether to use different weights based on the data imbalance.

Hyperparameter for the text sentence transformer:
* `sentence-transformer`: the [HuggingFace sentence transformer](https://huggingface.co/sentence-transformers?sort_models=downloads#models) used to encode the text feature into an X-dimensional dense vector, where X depends on the particular sentence transformer. We use the same three models listed in the feature engineering steps above.
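
In SageMaker script mode, each hyperparameter reaches the entry point as a command-line argument of the form `--name value`, so values such as `"True"` arrive as strings. The sketch below shows one way a training script might parse a few of the hyperparameters above. The parser is hypothetical (the notebook's `entry_point.py` is not shown here), and mapping `-1` to `None` for the depth is an assumption about how "no limit" would be passed to scikit-learn.

```python
import argparse


def str_to_bool(value):
    """Turn the strings "True"/"False" from the command line into booleans."""
    return str(value).strip().lower() == "true"


def parse_hyperparameters(argv):
    """Hypothetical parser; the notebook's real entry_point.py may differ."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--n-estimators", type=int, default=100)
    parser.add_argument("--max-depth", type=int, default=-1)
    # The key is spelled "boostrap" in this notebook, so the flag matches it.
    parser.add_argument("--boostrap", type=str_to_bool, default=True)
    parser.add_argument("--numerical-feature-names", type=str, default="")
    args = parser.parse_args(argv)

    return {
        "n_estimators": args.n_estimators,
        # Assumption: -1 is mapped to scikit-learn's "no limit" value, None.
        "max_depth": None if args.max_depth == -1 else args.max_depth,
        "bootstrap": args.boostrap,
        "numerical_features": [n for n in args.numerical_feature_names.split(",") if n],
    }


example_argv = [
    "--n-estimators", "50",
    "--max-depth", "-1",
    "--boostrap", "True",
    "--numerical-feature-names", "CustServ Calls,Account Length",
]
parsed = parse_hyperparameters(example_argv)
print(parsed)
```

Note that `argparse` exposes hyphenated flags as attributes with underscores, which is why `--n-estimators` is read back as `args.n_estimators`.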

In [None]:
hyperparameters = {
    "n-estimators": 50,
    "min-impurity-decrease": 0.0,
    "ccp-alpha": 0.0,
    "sentence-transformer": "sentence-transformers/all-MiniLM-L6-v2",
    "criterion": "gini",
    "max-depth": 6,
    "boostrap": "True",
    "min-samples-split": 4,
    "min-samples-leaf": 1,
    "balanced-data": True,
    "numerical-feature-names": "CustServ Calls,Account Length",
    "categorical-feature-names": "plan,limit",
    "textual-feature-names": "text",
    "label-name": "y",
}


current_folder = config.get_current_folder(globals())
estimator = PyTorch(
    framework_version="1.5.0",
    py_version="py3",
    entry_point="entry_point.py",
    source_dir=str(
        Path(current_folder, "containers/huggingface_transformer_randomforest").resolve()
    ),
    hyperparameters=hyperparameters,
    role=IAM_ROLE,
    instance_count=1,
    instance_type=config.TRAINING_INSTANCE_TYPE,
    output_path="s3://" + str(Path(DEFAULT_BUCKET, config.OUTPUTS_S3_PREFIX_RF)),
    code_location="s3://" + str(Path(DEFAULT_BUCKET, config.OUTPUTS_S3_PREFIX_RF)),
    base_job_name=config.SOLUTION_PREFIX,
    tags=[{"Key": config.TAG_KEY, "Value": config.SOLUTION_PREFIX}],
    sagemaker_session=sagemaker_session,
    volume_size=30,
)

In [None]:
estimator.fit(
    {
        "train": "s3://" + str(Path(DEFAULT_BUCKET, config.DATASETS_S3_PREFIX, "train.jsonl")),
        "validation": "s3://"
        + str(Path(DEFAULT_BUCKET, config.DATASETS_S3_PREFIX, "validation.jsonl")),
    }
)

We use the unique solution prefix to name the model and endpoint.

In [None]:
endpoint_name = f"{config.SOLUTION_PREFIX}-endpoint"

In [None]:
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor = estimator.deploy(
    endpoint_name=endpoint_name,
    instance_type=config.HOSTING_INSTANCE_TYPE,
    initial_instance_count=1,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

When calling our new endpoint from the notebook, we use an Amazon
SageMaker SDK
[`Predictor`](https://sagemaker.readthedocs.io/en/stable/predictors.html).
A `Predictor` is used to send data to an endpoint (as part of a request)
and interpret the response. Our `estimator.deploy` command returned a
`Predictor` but, by default, it sends and receives numpy arrays. Our
endpoint expects to receive (and also sends) JSON-formatted objects, so
we pass a JSON serializer and deserializer to `estimator.deploy` instead
of relying on the PyTorch endpoint default of numpy arrays. JSON is used
here because it is a standard endpoint format and the endpoint response
can contain nested data structures.
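
To make the JSON round trip concrete, the sketch below builds a request body from a list of records and parses a response of the shape the notebook indexes later (`response["probability"][i][1]`). The records and the probabilities are made up for illustration, and the response shape is inferred from that indexing rather than from the inference script itself.

```python
import json

# One request carries a list of records; these two are made up for illustration.
records = [
    {"CustServ Calls": 2.0, "Account Length": 120.0, "plan": "A", "limit": "limited", "text": "Hi"},
    {"CustServ Calls": 0.0, "Account Length": 30.5, "plan": "D", "limit": "unlimited", "text": "Bye"},
]
request_body = json.dumps(records)  # conceptually, what a JSON serializer sends

# Assumed response shape: one [P(no churn), P(churn)] pair per record.
response_body = '{"probability": [[0.9, 0.1], [0.35, 0.65]]}'
response = json.loads(response_body)  # conceptually, what a JSON deserializer returns

churn_probabilities = [row[1] for row in response["probability"]]
print(churn_probabilities)
```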

With our model successfully deployed and our predictor configured, we can
try out the churn prediction model on example inputs.

In [None]:
data = {
    "CustServ Calls": -20.0,
    "Account Length": 133.12,
    "plan": "D",
    "limit": "unlimited",
    "text": "Well, I've been dealing with TelCom for three months now, and I feel like they're very helpful and responsive to my issues, but for a month now, I've only had one technical support call and that was very long and involved. My phone number was wrong on both contracts, and they gave me a chance to work with TelCom customer service and it was extremely helpful, so I've decided to stick with it. But I would like to have more help in terms of technical support, I haven't had the kind of help with my phone line and I don't have the type of tech support I want. So I would like to negotiate a phone contract, maybe an upgrade from a Sprint plan, or maybe from a Verizon plan.\\nTelCom Agent: Very good.",
}
response = predictor.predict(data=[data])

In [None]:
response

We have the response and we can print out the probability of churn.

In [None]:
print("{:.2%} probability of churn".format(response["probability"][0][1]))

**Caution**: the probability returned by this model has not been
calibrated. When the model gives a probability of churn of 20%,
for example, this does not necessarily mean that 20% of customers with
a probability of 20% resulted in churn. Calibration is a useful
property in certain circumstances, but is not required in cases where
discrimination between cases of churn and non-churn is sufficient.
[CalibratedClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html)
from [Scikit-learn](https://scikit-learn.org/stable/modules/calibration.html) can be used to calibrate a model.
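
One way to see what calibration means is to group predictions into probability bins and compare the mean predicted probability in each bin with the churn rate actually observed there. The sketch below does this on made-up numbers; it only illustrates the comparison behind a calibration check and does not calibrate a model. The function name `reliability_table` and the toy data are assumptions.

```python
def reliability_table(probabilities, labels, n_bins=5):
    """Compare mean predicted probability with observed churn rate per bin.

    Returns one (mean_predicted, observed_rate, count) tuple per non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probabilities, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))

    rows = []
    for members in bins:
        if members:
            mean_predicted = sum(p for p, _ in members) / len(members)
            observed_rate = sum(y for _, y in members) / len(members)
            rows.append((round(mean_predicted, 3), round(observed_rate, 3), len(members)))
    return rows


# Made-up predicted churn probabilities and true labels (1 = churned).
toy_probabilities = [0.1, 0.15, 0.3, 0.35, 0.8, 0.9]
toy_labels = [0, 0, 1, 0, 1, 1]

rows = reliability_table(toy_probabilities, toy_labels)
for mean_predicted, observed_rate, count in rows:
    print(f"predicted={mean_predicted:.3f} observed={observed_rate:.3f} n={count}")
```

For a well-calibrated model, the two columns would be close to each other in every bin, given enough examples per bin.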

Now, we query the endpoint with each of the test examples to get predictions and compute the evaluation metrics. Note that even though we send all features of each test example to the endpoint, the inference script only uses the numerical features `CustServ Calls` and `Account Length`, the categorical features `plan` and `limit`, and the text feature `text` to make predictions. This matches the training process: the numerical, categorical, and text feature names are saved in the training container and loaded back in the inference container, where they are used to select the relevant fields from each example before generating a prediction.
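
The selection step described above can be pictured as follows. This is a minimal sketch, assuming the saved feature names are available as a plain list; the helper `select_features` and the example record are hypothetical and not taken from the inference script.

```python
def select_features(record, feature_names):
    """Keep only the features the model was trained on; ignore everything else."""
    return {name: record.get(name) for name in feature_names}


# Assumed to have been saved at training time and reloaded at inference time.
saved_feature_names = ["CustServ Calls", "Account Length", "plan", "limit", "text"]

# A made-up record that carries more fields than the model uses.
full_record = {
    "CustServ Calls": 4.0,
    "Account Length": 87.0,
    "plan": "B",
    "limit": "limited",
    "text": "I keep getting dropped calls.",
    "State": "OH",
    "Day Mins": 180.2,
}

model_input = select_features(full_record, saved_feature_names)
print(model_input)
```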

In [None]:
import numpy as np

batch_size = 20
num_examples = len(test_data)
predicted_prob = []

for i in np.arange(0, num_examples, step=batch_size):
    query_response_batch = predictor.predict(
        test_data[i : (i + batch_size)],
    )

    predicted_prob_batch = query_response_batch["probability"]
    predicted_prob.append(predicted_prob_batch)

predicted_prob = np.concatenate(predicted_prob, axis=0)
predicted_label = np.argmax(predicted_prob, axis=1)

In [None]:
# Measure the prediction results quantitatively.
import pandas as pd
from sklearn.metrics import accuracy_score, roc_auc_score

eval_accuracy = accuracy_score(ground_truth_label, predicted_label)
eval_auc = roc_auc_score(ground_truth_label, predicted_prob[:, 1])

Randomforest_BERT = pd.DataFrame.from_dict(
    {
        "Accuracy": eval_accuracy,
        "ROC AUC": eval_auc,
    },
    orient="index",
    columns=["BERT + Random Forest"],
)

In [None]:
Randomforest_BERT
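
The two metrics above answer different questions: accuracy depends on the predicted label (here the `argmax` over the two class probabilities), while ROC AUC depends only on how well the churn probabilities rank churners above non-churners. The sketch below reimplements both on made-up numbers to show the difference; it is for illustration only, and the scikit-learn functions used in the notebook should be preferred in practice.

```python
def argmax_labels(probabilities):
    """Predicted class = index of the largest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def accuracy(labels, predicted):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(int(y == p) for y, p in zip(labels, predicted)) / len(labels)


def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. Quadratic in the number of examples.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up [P(no churn), P(churn)] rows and true labels (1 = churned).
toy_prob = [[0.9, 0.1], [0.6, 0.4], [0.7, 0.3], [0.2, 0.8]]
toy_truth = [0, 0, 1, 1]

toy_accuracy = accuracy(toy_truth, argmax_labels(toy_prob))
toy_auc = roc_auc(toy_truth, [row[1] for row in toy_prob])
print(toy_accuracy, toy_auc)
```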

## 3. Fitting a multimodality model with hyperparameter optimization (HPO)

In this section, we further improve the model performance with hyperparameter optimization (HPO) using [SageMaker Automatic Model Tuning](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html). Amazon SageMaker automatic model tuning, also known as hyperparameter tuning, finds the best version of a model by running many training jobs on your dataset using the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that result in the model that performs best, as measured by a metric that you choose. The best model and its corresponding hyperparameters are selected on the validation data. Next, the best model is evaluated on the hold-out test data, which is the same test data created in Section 1. Finally, we show that the model trained with HPO performs significantly better than the one trained without HPO.

Below are the static hyperparameters that we do not tune, followed by the hyperparameters that we do tune and their search ranges.

In [None]:
from sagemaker.tuner import (
    ContinuousParameter,
    IntegerParameter,
    CategoricalParameter,
    HyperparameterTuner,
)

hyperparameters = {
    # Hyphenated names, matching the keys used in Section 2.
    "min-impurity-decrease": 0.0,
    "ccp-alpha": 0.0,
    "numerical-feature-names": "CustServ Calls,Account Length",
    "categorical-feature-names": "plan,limit",
    "textual-feature-names": "text",
    "label-name": "y",
}

hyperparameter_ranges = {
    "sentence-transformer": CategoricalParameter(
        [
            "sentence-transformers/all-MiniLM-L6-v2",
            "sentence-transformers/multi-qa-mpnet-base-dot-v1",
            "sentence-transformers/paraphrase-MiniLM-L6-v2",
        ]
    ),
    "criterion": CategoricalParameter(["gini", "entropy"]),
    "max-depth": CategoricalParameter([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, -1]),
    "boostrap": CategoricalParameter(["True", "False"]),
    "min-samples-split": IntegerParameter(2, 10),
    "min-samples-leaf": IntegerParameter(1, 5),
    "n-estimators": CategoricalParameter([100, 200, 400, 800, 1000]),
}

In [None]:
tuning_job_name = f"{config.SOLUTION_PREFIX}-hpo"

current_folder = config.get_current_folder(globals())
estimator = PyTorch(
    framework_version="1.5.0",
    py_version="py3",
    entry_point="entry_point.py",
    source_dir=str(
        Path(current_folder, "containers/huggingface_transformer_randomforest").resolve()
    ),
    hyperparameters=hyperparameters,
    role=IAM_ROLE,
    instance_count=1,
    instance_type=config.TRAINING_INSTANCE_TYPE,
    output_path="s3://" + str(Path(DEFAULT_BUCKET, config.OUTPUTS_S3_PREFIX_RF)),
    code_location="s3://" + str(Path(DEFAULT_BUCKET, config.OUTPUTS_S3_PREFIX_RF)),
    tags=[{"Key": config.TAG_KEY, "Value": config.SOLUTION_PREFIX}],
    sagemaker_session=sagemaker_session,
    volume_size=30,
)

Define the objective metric name, metric definition (with regex pattern), and objective type for the tuning job.

In [None]:
objective_metric_name = "roc auc"
metric_definitions = [{"Name": "roc auc", "Regex": "roc auc score on validation data: ([0-9\\.]+)"}]
objective_type = "Maximize"
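
SageMaker applies each metric regex to the training job's log output and records the captured group as the metric value. The sketch below runs the same pattern over a few log lines to show what gets captured. The log lines are made up; only the wording after which the number appears is taken from the regex above.

```python
import re

metric_regex = r"roc auc score on validation data: ([0-9\.]+)"

# Made-up log output for illustration; a real training job prints its own lines.
log_lines = [
    "Loading training data ...",
    "Fitting the random forest ...",
    "roc auc score on validation data: 0.8731",
]

captured = []
for line in log_lines:
    match = re.search(metric_regex, line)
    if match:
        captured.append(float(match.group(1)))

print(captured)
```

If the training script changed the wording of the log line, the regex would no longer match and the tuning job would have no objective value to optimize, so the two need to stay in sync.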

In [None]:
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name,
    hyperparameter_ranges,
    metric_definitions,
    max_jobs=18,  # increasing the maximum number of jobs will likely improve performance
    max_parallel_jobs=3,
    objective_type=objective_type,
    base_tuning_job_name=tuning_job_name,
)

In [None]:
tuner.fit(
    {
        "train": "s3://" + str(Path(DEFAULT_BUCKET, config.DATASETS_S3_PREFIX, "train.jsonl")),
        "validation": "s3://"
        + str(Path(DEFAULT_BUCKET, config.DATASETS_S3_PREFIX, "validation.jsonl")),
    },
    logs=True,
)

Find the tuning job name

In [None]:
sm_client = boto3.Session().client("sagemaker")

tuning_job_name = tuner.latest_tuning_job.name
tuning_job_name

Check the status of the tuning job and count how many of its training jobs have finished with 'Completed' status.

In [None]:
tuning_job_result = sm_client.describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuning_job_name
)

status = tuning_job_result["HyperParameterTuningJobStatus"]
if status != "Completed":
    print("Reminder: the tuning job has not been completed.")

job_count = tuning_job_result["TrainingJobStatusCounters"]["Completed"]
print("%d training jobs have completed" % job_count)

# The objective is minimized unless the tuning job reports "Maximize".
is_minimize = (
    tuning_job_result["HyperParameterTuningJobConfig"]["HyperParameterTuningJobObjective"]["Type"]
    != "Maximize"
)
objective_name = tuning_job_result["HyperParameterTuningJobConfig"][
    "HyperParameterTuningJobObjective"
]["MetricName"]

Once the tuning job finishes, we can bring in a table of metrics.

In [None]:
tuner_analytics = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name)

full_df = tuner_analytics.dataframe()

if len(full_df) > 0:
    df = full_df[full_df["FinalObjectiveValue"] > -float("inf")]
    if len(df) > 0:
        # Sort so that the best training job comes first (descending when maximizing).
        df = df.sort_values("FinalObjectiveValue", ascending=is_minimize)
        print("Number of training jobs with valid objective: %d" % len(df))
        print({"lowest": min(df["FinalObjectiveValue"]), "highest": max(df["FinalObjectiveValue"])})
        # None disables truncation; the older value -1 is no longer accepted by recent pandas.
        pd.set_option("display.max_colwidth", None)  # Don't truncate TrainingJobName
    else:
        print("No training jobs have reported valid results yet.")

df

Deploy the best model

In [None]:
endpoint_name_hpo = f"{config.SOLUTION_PREFIX}-hpo-endpoint"

# Deploy the best training job found by the tuner to a SageMaker endpoint
predictor_hpo = tuner.deploy(
    endpoint_name=endpoint_name_hpo,
    instance_type=config.HOSTING_INSTANCE_TYPE,
    initial_instance_count=1,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

After deploying the endpoint, we query it using the same test data, compute the evaluation metrics, and compare them with the results obtained without HPO tuning.

In [None]:
import numpy as np

batch_size = 20
num_examples = len(test_data)
predicted_prob_hpo = []

for i in np.arange(0, num_examples, step=batch_size):
    query_response_batch = predictor_hpo.predict(
        test_data[i : (i + batch_size)],
    )

    predicted_prob_hpo_batch = query_response_batch["probability"]
    predicted_prob_hpo.append(predicted_prob_hpo_batch)

predicted_prob_hpo = np.concatenate(predicted_prob_hpo, axis=0)
predicted_label_hpo = np.argmax(predicted_prob_hpo, axis=1)

In [None]:
# Measure the prediction results quantitatively.

eval_accuracy_hpo = accuracy_score(ground_truth_label, predicted_label_hpo)
eval_auc_hpo = roc_auc_score(ground_truth_label, predicted_prob_hpo[:, 1])

Randomforest_BERT_HPO = pd.DataFrame.from_dict(
    {
        "Accuracy": eval_accuracy_hpo,
        "ROC AUC": eval_auc_hpo,
    },
    orient="index",
    columns=["BERT + Random Forest with HPO"],
)

In [None]:
rf_hpo = pd.concat([Randomforest_BERT, Randomforest_BERT_HPO], axis=1)

In [None]:
rf_hpo

## 4. Fitting an AutoGluon multimodality weighted / stacked ensemble model

There are two types of AutoGluon multimodality: 1. Train multiple tabular models as well as the `TextPredictor` model (utilizing the `TextPredictor` model inside of `TabularPredictor`), and then combine them via either a weighted ensemble or a stack ensemble, as explained in the [AutoGluon Tabular Paper](https://arxiv.org/pdf/2003.06505.pdf). 2. Fuse multiple neural network models directly so that they handle raw text and are also capable of handling additional numerical/categorical columns, with a diagram shown below.

In this section, we first train a multimodality weighted / stacked ensemble model; in the next section, we train a fusion neural network model.
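
As a rough picture of the weighted-ensemble idea, the sketch below blends the churn probabilities of two models with fixed weights. It is a simplified illustration, not AutoGluon's implementation: AutoGluon learns its ensemble weights from validation data, whereas the weights, model outputs, and the function name `weighted_ensemble` here are made up.

```python
def weighted_ensemble(model_probabilities, weights):
    """Blend per-model churn probabilities with weights that sum to 1."""
    if len(model_probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n_examples = len(model_probabilities[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probabilities))
        for i in range(n_examples)
    ]


# Made-up churn probabilities for two customers from two hypothetical models.
tabular_model = [0.2, 0.7]
text_model = [0.4, 0.9]

blended = weighted_ensemble([tabular_model, text_model], weights=[0.5, 0.5])
print(blended)
```

A stack ensemble goes one step further: instead of fixed weights, another model is trained on the base models' predictions.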

Retrieve the training image.

In [None]:
from sagemaker import image_uris
from sagemaker.estimator import Estimator

train_image_uri = image_uris.retrieve(
    "autogluon",
    region=boto3.Session().region_name,
    version="0.5.2",
    py_version="py38",
    image_scope="training",
    instance_type=config.TRAINING_INSTANCE_TYPE,
)

In [None]:
train_image_uri

The hyperparameters are explained below.

* `numerical-feature-names`: numerical feature names separated by a comma `,`.
* `categorical-feature-names`: categorical feature names separated by comma `,`.
* `textual-feature-names`: text feature names separated by comma `,`.
* `label-name`: target column.
* `problem_type`: either 'classification' or 'regression'. For classification task, we identify binary or multiclass classification in the training script.
* `eval_metric`:  evaluation metrics for validation data. For all the options, see [AutoGluon-Tabular documentation](https://auto.gluon.ai/stable/api/autogluon.predictor.html#module-0).
* `presets`: `best_quality` means the best predictive accuracy with little consideration given to inference time or disk usage. `high_quality` means high predictive accuracy with fast inference. `good_quality` means good predictive accuracy with very fast inference. `medium_quality` means medium predictive accuracy with very fast inference and very fast training time. `optimize_for_deployment` means optimizing the result immediately for deployment by deleting unused models and removing training artifacts. `interpretable` means fitting only interpretable rule-based models from the `imodels` package. For details, see [AutoGluon documentation](https://auto.gluon.ai/stable/api/autogluon.predictor.html#autogluon.tabular.TabularPredictor.fit).
* `text_nn_presets`: the presets for text neural network models. Options: `medium_quality_faster_train`, `high_quality`, and `best_quality`.
* `auto_stack`: whether AutoGluon should automatically utilize bagging and multi-layer stack ensembling to boost predictive accuracy. Set this = True if you are willing to tolerate longer training times in order to maximize predictive accuracy! Automatically sets num_bag_folds and num_stack_levels arguments based on dataset properties. Note: Setting num_bag_folds and num_stack_levels arguments overrides auto_stack. Note: This can increase training time (and inference time) by up to 20x, but can greatly improve predictive performance.
* `num_bag_folds`: number of folds used for bagging of models. When num_bag_folds = k, training time is roughly increased by a factor of k (set = 0 to disable bagging). Disabled by default (0), but we recommend values between 5-10 to maximize predictive performance. Increasing num_bag_folds will result in models with lower bias but that are more prone to overfitting. num_bag_folds = 1 is an invalid value, and raises a ValueError. Values > 10 may produce diminishing returns, and can even harm overall results due to overfitting. To further improve predictions, avoid increasing num_bag_folds much beyond 10 and instead increase num_bag_sets.
* `num_bag_sets`: number of repeats of kfold bagging to perform (values must be >= 1). Total number of models trained during bagging = num_bag_folds * num_bag_sets. Defaults to 1 if time_limit is not specified, otherwise 20 (always disabled if num_bag_folds is not specified). Values greater than 1 will result in superior predictive performance, especially on smaller problems and with stacking enabled (reduces overall variance).
* `num_stack_levels`: number of stacking levels to use in stack ensemble. Roughly increases model training time by factor of num_stack_levels+1 (set = 0 to disable stack ensembling). Disabled by default (0), but we recommend values between 1-3 to maximize predictive performance. To prevent overfitting, num_bag_folds >= 2 must also be set or else a ValueError will be raised.
* `refit_full`: whether to retrain all models on all of the data (training + validation) after the normal training procedure. For details, see [AutoGluon documentation](https://auto.gluon.ai/stable/api/autogluon.predictor.html#autogluon.tabular.TabularPredictor.fit).
* `set_best_to_refit_full`: if True, will change the default model that Predictor uses for prediction when model is not specified to the refit_full version of the model that exhibited the highest validation score. Only valid if refit_full is set.
* `save_space`: if True, reduces the memory and disk size of predictor by deleting auxiliary model files that aren't needed for prediction on new data. This has NO impact on inference accuracy. It is recommended if the only goal is to use the trained model for prediction.
* `verbosity`: verbosity levels range from `0` to `4` and control how much information is printed. Higher levels correspond to more detailed print statements (you can set verbosity = 0 to suppress warnings).
* `pretrained-transformer`: the pre-trained transformer used to encode the text data. The transformers can be selected from [Hugging Face AutoModel](https://huggingface.co/transformers/v3.0.2/model_doc/auto.html#automodel). Some examples are shown below.
  - 'microsoft/deberta-v3-base'
  - 'bert-base-uncased'
  - 'google/electra-base-discriminator'
  - 'distilroberta-base'
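
The constraints stated above for the bagging and stacking settings can be collected into a small check. The sketch below encodes only what the descriptions say (an invalid fold count of 1, at least one bag set, stacking requiring at least two folds, and `num_bag_folds * num_bag_sets` models trained during bagging). The function `check_bagging_config` is a hypothetical helper, not part of AutoGluon.

```python
def check_bagging_config(num_bag_folds, num_bag_sets, num_stack_levels):
    """Validate the settings and return the number of models trained during bagging."""
    if num_bag_folds == 1:
        raise ValueError("num_bag_folds = 1 is invalid; use 0 to disable bagging or a value >= 2")
    if num_bag_sets < 1:
        raise ValueError("num_bag_sets must be >= 1")
    if num_stack_levels > 0 and num_bag_folds < 2:
        raise ValueError("stack ensembling requires num_bag_folds >= 2")
    return num_bag_folds * num_bag_sets


# The settings used in this notebook: bagging and stacking are both disabled.
notebook_models = check_bagging_config(num_bag_folds=0, num_bag_sets=1, num_stack_levels=0)

# A made-up alternative with 5-fold bagging repeated twice and one stacking level.
alternative_models = check_bagging_config(num_bag_folds=5, num_bag_sets=2, num_stack_levels=1)

print(notebook_models, alternative_models)
```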




Unlike existing AutoML frameworks that primarily focus on model/hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models
and stacking them in multiple layers. Thus, hyperparameter optimization is usually not required for AutoGluon ensemble models.

In [None]:
hyperparameters = {
    "numerical-feature-names": "CustServ Calls,Account Length",
    "categorical-feature-names": "plan,limit",
    "textual-feature-names": "text",
    "label-name": "y",
    "problem_type": "classification",  # either classification or regression. For classification, we identify binary or multiclass classification in the training script
    "eval_metric": "roc_auc",
    "presets": "medium_quality",
    "text_nn_presets": "medium_quality_faster_train",
    "auto_stack": "False",
    "num_bag_folds": 0,
    "num_bag_sets": 1,
    "num_stack_levels": 0,
    "refit_full": "False",
    "set_best_to_refit_full": "False",
    "save_space": "True",
    "verbosity": 2,
    "pretrained-transformer": "google/electra-small-discriminator",
}

In [None]:
# Create SageMaker Estimator instance

training_job_name_ag = f"{config.SOLUTION_PREFIX}-ag"

tabular_estimator_ag = Estimator(
    role=sagemaker.get_execution_role(),
    image_uri=train_image_uri,
    entry_point="train.py",
    source_dir=str(Path(current_folder, "containers/autogluon_multimodal_ensemble").resolve()),
    instance_count=1,
    instance_type=config.TRAINING_INSTANCE_TYPE,
    max_run=360000,
    hyperparameters=hyperparameters,
    base_job_name=training_job_name_ag,
    output_path="s3://" + str(Path(DEFAULT_BUCKET, config.OUTPUTS_S3_PREFIX_AG_ENSEMBLE)),
    code_location="s3://" + str(Path(DEFAULT_BUCKET, config.OUTPUTS_S3_PREFIX_AG_ENSEMBLE)),
    tags=[{"Key": config.TAG_KEY, "Value": config.SOLUTION_PREFIX}],
)

In [None]:
tabular_estimator_ag.fit(
    {
        "train": "s3://" + str(Path(DEFAULT_BUCKET, config.DATASETS_S3_PREFIX, "train.jsonl")),
        "validation": "s3://"
        + str(Path(DEFAULT_BUCKET, config.DATASETS_S3_PREFIX, "validation.jsonl")),
    },
    logs=False,
)

In [None]:
import config

# Retrieve the inference docker container uri
inference_image_uri = image_uris.retrieve(
    "autogluon",
    region=boto3.Session().region_name,
    version="0.5.2",
    py_version="py38",
    image_scope="inference",
    instance_type=config.HOSTING_INSTANCE_TYPE,
)

endpoint_name_ag = f"{config.SOLUTION_PREFIX}-ag-endpoint"
current_folder = config.get_current_folder(globals())
# Use the estimator from the previous step to deploy to a SageMaker endpoint
predictor_ag = tabular_estimator_ag.deploy(
    initial_instance_count=1,
    instance_type=config.HOSTING_INSTANCE_TYPE,
    entry_point="inference.py",
    image_uri=inference_image_uri,
    source_dir=str(Path(current_folder, "containers/autogluon_multimodal_ensemble").resolve()),
    endpoint_name=endpoint_name_ag,
)

In [None]:
test_features = pd.DataFrame(test_data)
test_features = test_features[["CustServ Calls", "Account Length", "plan", "limit", "text"]]

In [None]:
test_features.shape

In [None]:
import numpy as np

content_type = "text/csv"


def query_endpoint(encoded_tabular_data, endpoint_name):
    """Send one CSV-encoded batch to the endpoint and return the raw response."""
    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=encoded_tabular_data,
    )
    return response


def parse_response(query_response):
    """Extract the predicted class probabilities from an endpoint response."""
    model_predictions = json.loads(query_response["Body"].read())
    predicted_probabilities = model_predictions["probabilities"]
    return np.array(predicted_probabilities)


# Split the test data into smaller batches so that each request to the endpoint stays small.
batch_size = 300
predict_prob_ag = []
for i in np.arange(0, num_examples, step=batch_size):
    query_response_batch = query_endpoint(
        test_features.iloc[i : (i + batch_size), :]
        .to_csv(header=False, index=False)
        .encode("utf-8"),
        endpoint_name_ag,
    )
    predict_prob_batch = parse_response(query_response_batch)  # prediction probability per batch
    predict_prob_ag.append(predict_prob_batch)


predict_prob_ag = np.concatenate(predict_prob_ag, axis=0)
predict_label_ag = np.argmax(predict_prob_ag, axis=1)
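
To make the batching loop above easier to follow without a live endpoint, here is a dependency-free sketch of the same pattern: slice the rows into batches, encode each batch as header-less CSV, collect the per-batch probabilities, and take the arg-max per row. The function `fake_invoke` is a hypothetical stand-in for the endpoint, and its scoring rule is invented purely for illustration; only the `{"probabilities": [...]}` response shape is taken from `parse_response`.

```python
import csv
import io
import json


def iter_csv_batches(rows, batch_size):
    """Yield UTF-8 encoded, header-less CSV payloads of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(rows[start : start + batch_size])
        yield buffer.getvalue().encode("utf-8")


def fake_invoke(payload):
    """Hypothetical endpoint stand-in: one [p0, p1] pair per CSV row."""
    reader = csv.reader(io.StringIO(payload.decode("utf-8")))
    probabilities = []
    for record in reader:
        p_churn = int(record[0]) / 4  # invented rule: more service calls, more churn
        probabilities.append([1 - p_churn, p_churn])
    return json.dumps({"probabilities": probabilities})


# Seven toy rows: service calls, account length, plan, limit, text
rows = [[i % 5, 100 + i, "A", "unlimited", f"chat {i}"] for i in range(7)]

all_probabilities = []
for payload in iter_csv_batches(rows, batch_size=3):
    all_probabilities.extend(json.loads(fake_invoke(payload))["probabilities"])

# Arg-max per row; ties resolve to the first class, as np.argmax does
labels = [max(range(len(p)), key=p.__getitem__) for p in all_probabilities]
```

With 7 rows and a batch size of 3, the loop issues three requests (3, 3, and 1 rows), and the concatenated result has one probability pair per input row.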

In [None]:
eval_accuracy_ag = accuracy_score(ground_truth_label, predict_label_ag)
eval_auc_ag = roc_auc_score(ground_truth_label, predict_prob_ag[:, 1])

AG_multimodality = pd.DataFrame.from_dict(
    {
        "Accuracy": eval_accuracy_ag,
        "ROC AUC": eval_auc_ag,
    },
    orient="index",
    columns=["AutoGluon Multimodality Ensemble"],
)
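
The evaluation above passes hard labels to `accuracy_score` but the positive-class column `predict_prob_ag[:, 1]` to `roc_auc_score`. The following sketch, which does not use scikit-learn, illustrates why: accuracy compares labels, while ROC AUC for a binary problem equals the fraction of positive/negative pairs in which the positive example receives the higher score. The toy labels and probabilities are made up for illustration, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Pairwise (rank-based) ROC AUC for binary labels; ties count as half a win."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy data: true labels and [P(no churn), P(churn)] per example
toy_truth = [0, 0, 1, 1]
toy_probabilities = [[0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8]]

toy_labels = [max(range(2), key=p.__getitem__) for p in toy_probabilities]
toy_scores = [p[1] for p in toy_probabilities]  # positive-class column

toy_accuracy = accuracy(toy_truth, toy_labels)
toy_auc = roc_auc(toy_truth, toy_scores)
```

The pairwise loop is quadratic in the number of examples, so it is suitable only as an illustration; the notebook's `roc_auc_score` call remains the practical choice.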

In [None]:
rf_hpo_ag = pd.concat([rf_hpo, AG_multimodality], axis=1)

In [None]:
rf_hpo_ag

## 5. Fitting an AutoGluon multimodality fusion model

The architecture of the model is shown below. For details, see the official [AutoGluon documentation](https://auto.gluon.ai/stable/tutorials/multimodal/multimodal_text_tabular.html).

<p align="center">
  <img src="diagram/architecture_ag.png" style="width: 600px;"/>
</p>

In [None]:
# Create SageMaker Estimator instance
training_job_name_ag_fusion = f"{config.SOLUTION_PREFIX}-ag-fusion"


hyperparameters = {
    "numerical-feature-names": "CustServ Calls,Account Length",
    "categorical-feature-names": "plan,limit",
    "textual-feature-names": "text",
    "label-name": "y",
    "problem_type": "classification",  # either classification or regression. For classification, we identify binary or multiclass classification in the training script
    "eval_metric": "roc_auc",
    "verbosity": 2,
    "pretrained-transformer": "google/electra-small-discriminator",
}


tabular_estimator_ag_fusion = Estimator(
    role=IAM_ROLE,
    image_uri=train_image_uri,
    entry_point="train.py",
    source_dir=str(Path(current_folder, "containers/autogluon_multimodal_fusion").resolve()),
    instance_count=1,
    instance_type=config.TRAINING_INSTANCE_TYPE,
    max_run=360000,
    hyperparameters=hyperparameters,
    base_job_name=training_job_name_ag_fusion,
    output_path="s3://" + str(Path(DEFAULT_BUCKET, config.OUTPUTS_S3_PREFIX_AG_FUSION)),
    code_location="s3://" + str(Path(DEFAULT_BUCKET, config.OUTPUTS_S3_PREFIX_AG_FUSION)),
    tags=[{"Key": config.TAG_KEY, "Value": config.SOLUTION_PREFIX}],
)

In [None]:
tabular_estimator_ag_fusion.fit(
    {
        "train": "s3://" + str(Path(DEFAULT_BUCKET, config.DATASETS_S3_PREFIX, "train.jsonl")),
        "validation": "s3://"
        + str(Path(DEFAULT_BUCKET, config.DATASETS_S3_PREFIX, "validation.jsonl")),
    },
    logs=False,
)

In [None]:
endpoint_name_ag_fusion = f"{config.SOLUTION_PREFIX}-ag-fusion-endpoint"

# Use the estimator from the previous step to deploy to a SageMaker endpoint
predictor_ag_fusion = tabular_estimator_ag_fusion.deploy(
    initial_instance_count=1,
    instance_type=config.HOSTING_INSTANCE_TYPE,
    entry_point="inference.py",
    image_uri=inference_image_uri,
    source_dir=str(Path(current_folder, "containers/autogluon_multimodal_fusion").resolve()),
    endpoint_name=endpoint_name_ag_fusion,
)

In [None]:
# Split the test data into smaller batches so that each request to the endpoint stays small.

batch_size = 50
predict_prob_ag_fusion = []
for i in np.arange(0, num_examples, step=batch_size):
    query_response_batch = query_endpoint(
        test_features.iloc[i : (i + batch_size), :]
        .to_csv(header=False, index=False)
        .encode("utf-8"),
        endpoint_name_ag_fusion,
    )
    predict_prob_batch = parse_response(query_response_batch)  # prediction probability per batch
    predict_prob_ag_fusion.append(predict_prob_batch)


predict_prob_ag_fusion = np.concatenate(predict_prob_ag_fusion, axis=0)
predict_label_ag_fusion = np.argmax(predict_prob_ag_fusion, axis=1)

In [None]:
eval_accuracy_ag_fusion = accuracy_score(ground_truth_label, predict_label_ag_fusion)
eval_auc_ag_fusion = roc_auc_score(ground_truth_label, predict_prob_ag_fusion[:, 1])

AG_multimodality_fusion = pd.DataFrame.from_dict(
    {
        "Accuracy": eval_accuracy_ag_fusion,
        "ROC AUC": eval_auc_ag_fusion,
    },
    orient="index",
    columns=["AutoGluon Multimodality Fusion"],
)

In [None]:
pd.concat([rf_hpo_ag, AG_multimodality_fusion], axis=1)

All four models deliver similar performance on the hold-out test data. For your own use case, replace the example dataset with your own to find the model that works best for your data.

## Clean Up

When you've finished with the churn prediction endpoints (and their associated
endpoint configurations), make sure that you delete them to avoid accidental
charges.

In [None]:
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_name)

sagemaker_client.delete_endpoint(EndpointName=endpoint_name_hpo)
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_name_hpo)

sagemaker_client.delete_endpoint(EndpointName=endpoint_name_ag)
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_name_ag)

sagemaker_client.delete_endpoint(EndpointName=endpoint_name_ag_fusion)
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_name_ag_fusion)
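
If one of the endpoints was never created or has already been deleted, the corresponding call above raises an error and the remaining deletions are skipped. As an optional alternative, here is a minimal sketch of a loop that keeps going and reports which names could not be deleted. The helper `delete_endpoints` and the `FakeSageMakerClient` stand-in are hypothetical and exist only so the example runs without AWS access; with a real boto3 client, `client.exceptions.ClientError` would catch the service errors.

```python
class _FakeClientError(Exception):
    """Stand-in for botocore's ClientError, used only by the demo client below."""


class FakeSageMakerClient:
    """Hypothetical in-memory stand-in for boto3.client("sagemaker")."""

    class exceptions:
        ClientError = _FakeClientError

    def __init__(self, endpoint_names):
        self.endpoints = set(endpoint_names)
        self.configs = set(endpoint_names)

    def delete_endpoint(self, EndpointName):
        if EndpointName not in self.endpoints:
            raise _FakeClientError(f"Could not find endpoint {EndpointName}")
        self.endpoints.remove(EndpointName)

    def delete_endpoint_config(self, EndpointConfigName):
        if EndpointConfigName not in self.configs:
            raise _FakeClientError(f"Could not find config {EndpointConfigName}")
        self.configs.remove(EndpointConfigName)


def delete_endpoints(client, endpoint_names):
    """Delete each endpoint and its same-named config; continue past failures."""
    deleted, skipped = [], []
    for name in endpoint_names:
        try:
            client.delete_endpoint(EndpointName=name)
            client.delete_endpoint_config(EndpointConfigName=name)
        except client.exceptions.ClientError:
            skipped.append(name)
        else:
            deleted.append(name)
    return deleted, skipped


# Demo: only one of the two (made-up) endpoint names exists
demo_client = FakeSageMakerClient(["churn-ag-endpoint"])
deleted, skipped = delete_endpoints(
    demo_client, ["churn-ag-endpoint", "churn-ag-fusion-endpoint"]
)
```

In the notebook, the same function could be called with `sagemaker_client` and the four endpoint name variables used above.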

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/introduction_to_applying_machine_learning|churn_prediction_multimodality_of_text_and_tabular|churn_prediction_multimodality_of_text_and_tabular.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/introduction_to_applying_machine_learning|churn_prediction_multimodality_of_text_and_tabular|churn_prediction_multimodality_of_text_and_tabular.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/introduction_to_applying_machine_learning|churn_prediction_multimodality_of_text_and_tabular|churn_prediction_multimodality_of_text_and_tabular.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/introduction_to_applying_machine_learning|churn_prediction_multimodality_of_text_and_tabular|churn_prediction_multimodality_of_text_and_tabular.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/introduction_to_applying_machine_learning|churn_prediction_multimodality_of_text_and_tabular|churn_prediction_multimodality_of_text_and_tabular.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/introduction_to_applying_machine_learning|churn_prediction_multimodality_of_text_and_tabular|churn_prediction_multimodality_of_text_and_tabular.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/introduction_to_applying_machine_learning|churn_prediction_multimodality_of_text_and_tabular|churn_prediction_multimodality_of_text_and_tabular.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/introduction_to_applying_machine_learning|churn_prediction_multimodality_of_text_and_tabular|churn_prediction_multimodality_of_text_and_tabular.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/introduction_to_applying_machine_learning|churn_prediction_multimodality_of_text_and_tabular|churn_prediction_multimodality_of_text_and_tabular.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/introduction_to_applying_machine_learning|churn_prediction_multimodality_of_text_and_tabular|churn_prediction_multimodality_of_text_and_tabular.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/introduction_to_applying_machine_learning|churn_prediction_multimodality_of_text_and_tabular|churn_prediction_multimodality_of_text_and_tabular.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/introduction_to_applying_machine_learning|churn_prediction_multimodality_of_text_and_tabular|churn_prediction_multimodality_of_text_and_tabular.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/introduction_to_applying_machine_learning|churn_prediction_multimodality_of_text_and_tabular|churn_prediction_multimodality_of_text_and_tabular.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/introduction_to_applying_machine_learning|churn_prediction_multimodality_of_text_and_tabular|churn_prediction_multimodality_of_text_and_tabular.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/introduction_to_applying_machine_learning|churn_prediction_multimodality_of_text_and_tabular|churn_prediction_multimodality_of_text_and_tabular.ipynb)
