# AWS re:Invent AutoGluon Workshop
### This workshop demonstrates a machine learning problem solved with AutoGluon.
* Refer to the AutoGluon documentation and tutorials [here](https://auto.gluon.ai/stable/index.html).

### Context
In this notebook we use AutoGluon in several ways. We start with the default settings and let AutoGluon fit our dataset. Next, we use the `hyperparameters` argument to specify a set of models to try, each with a defined hyperparameter search range, and compare the accuracy with that of the default settings. Lastly, we pass ensembling arguments to the AutoGluon predictor with the same hyperparameter settings as in the previous section, and compare the resulting accuracies.

In [1]:
# If running on your own computer please refer to AutoGluon installation instructions:
# https://auto.gluon.ai/stable/install.html
# This notebook assumes running in SageMaker Studio with "PyTorch 1.12 Python 3.8 CPU Optimized" kernel.
!pip3 install autogluon
!pip3 install ipywidgets

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com


In [2]:
import autogluon
from autogluon.tabular import TabularDataset, TabularPredictor

In [3]:
# Importing data directly from Autogluon datasets.
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
# subsample subset of data for faster demo, try setting this to much larger values
subsample_size = 500  
train_data = train_data.sample(n=subsample_size, random_state=0)
train_data.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,class
6118,51,Private,39264,Some-college,10,Married-civ-spouse,Exec-managerial,Wife,White,Female,0,0,40,United-States,>50K
23204,58,Private,51662,10th,6,Married-civ-spouse,Other-service,Wife,White,Female,0,0,8,United-States,<=50K
29590,40,Private,326310,Some-college,10,Married-civ-spouse,Craft-repair,Husband,White,Male,0,0,44,United-States,<=50K
18116,37,Private,222450,HS-grad,9,Never-married,Sales,Not-in-family,White,Male,0,2339,40,El-Salvador,<=50K
33964,62,Private,109190,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,15024,0,40,United-States,>50K


In this dataset, we want to predict each person's income class (`<=50K` or `>50K`) from the other columns.

In [4]:
# We specify the column of our dataset that is the label.
label = 'class'
print("Summary of class variable: \n", train_data[label].describe())

Summary of class variable: 
 count        500
unique         2
top        <=50K
freq         365
Name: class, dtype: object
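
Before looking at model accuracy, it helps to know what a trivial model would score. The sketch below is a small illustration (not part of the original workshop) that uses the counts printed above to compute the accuracy of always predicting the most frequent class.

```python
# Counts taken from the describe() output above: 500 rows, 365 of them "<=50K".
total_rows = 500
majority_count = 365

# Accuracy of a trivial model that always predicts the most frequent class.
baseline_accuracy = majority_count / total_rows
print(f"Majority-class baseline accuracy: {baseline_accuracy:.2f}")
```

Any model worth keeping should score clearly above this baseline on held-out data.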


In [5]:
# Run tabular predictor with default settings.
predictor = TabularPredictor(label=label).fit(train_data)

No path specified. Models will be saved in: "AutogluonModels/ag-20221125_121604/"
Beginning AutoGluon training ...
AutoGluon will save models to "AutogluonModels/ag-20221125_121604/"
AutoGluon Version:  0.5.2
Python Version:     3.8.12
Operating System:   Linux
Train Data Rows:    500
Train Data Columns: 14
Label Column: class
Preprocessing data ...
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [' >50K', ' <=50K']
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify po

With the default settings, AutoGluon finds that WeightedEnsemble_L2 is the best model, with a validation accuracy of 87%.

In [6]:
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
y_test = test_data[label]  # values to predict
# delete label column to prove we're not cheating
test_data_nolab = test_data.drop(columns=[label])  
test_data_nolab.head()

Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv | Columns = 15 / 15 | Rows = 9769 -> 9769


Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country
0,31,Private,169085,11th,7,Married-civ-spouse,Sales,Wife,White,Female,0,0,20,United-States
1,17,Self-emp-not-inc,226203,12th,8,Never-married,Sales,Own-child,White,Male,0,0,45,United-States
2,47,Private,54260,Assoc-voc,11,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,1887,60,United-States
3,21,Private,176262,Some-college,10,Never-married,Exec-managerial,Own-child,White,Female,0,0,30,United-States
4,17,Private,241185,12th,8,Never-married,Prof-specialty,Own-child,White,Male,0,0,20,United-States


In [7]:
y_pred = predictor.predict(test_data_nolab)
perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=False)

Evaluation: accuracy on test data: 0.8374449790152523
Evaluations on test data:
{
    "accuracy": 0.8374449790152523
}


In [8]:
predictor.evaluate(test_data, auxiliary_metrics=False)

Evaluation: accuracy on test data: 0.8374449790152523
Evaluations on test data:
{
    "accuracy": 0.8374449790152523
}


{'accuracy': 0.8374449790152523}

There are two evaluation functions for the tabular predictor: `evaluate_predictions`, where you pass the predictions and the true labels, and `evaluate`, where you pass the test dataset directly.
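
With accuracy as the metric, both calls come down to comparing predicted labels with true labels, which is why they report the same number above. The following is a minimal sketch of that comparison in plain Python. The `accuracy` helper and the toy labels are hypothetical and are not AutoGluon's implementation.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)

# Toy labels (hypothetical, not taken from the dataset).
# The leading space mirrors how the labels appear in the training log.
y_true_toy = [" <=50K", " >50K", " <=50K", " <=50K"]
y_pred_toy = [" <=50K", " <=50K", " <=50K", " >50K"]

toy_accuracy = accuracy(y_true_toy, y_pred_toy)
print(toy_accuracy)
```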

In [9]:
predictor.leaderboard(test_data, silent=True)

Unnamed: 0,model,score_test,score_val,pred_time_test,pred_time_val,fit_time,pred_time_test_marginal,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,RandomForestGini,0.842973,0.84,0.158744,0.062258,0.579492,0.158744,0.062258,0.579492,1,True,5
1,CatBoost,0.842461,0.85,0.041671,0.010297,1.129016,0.041671,0.010297,1.129016,1,True,7
2,RandomForestEntr,0.84113,0.83,0.163591,0.064237,0.474408,0.163591,0.064237,0.474408,1,True,6
3,LightGBM,0.839799,0.85,0.028759,0.005999,0.226991,0.028759,0.005999,0.226991,1,True,4
4,XGBoost,0.837445,0.87,0.039582,0.007091,0.229637,0.039582,0.007091,0.229637,1,True,11
5,WeightedEnsemble_L2,0.837445,0.87,0.042729,0.007671,0.609865,0.003148,0.000581,0.380228,2,True,14
6,LightGBMXT,0.836421,0.83,0.016908,0.005784,1.010078,0.016908,0.005784,1.010078,1,True,3
7,ExtraTreesGini,0.833453,0.82,0.169352,0.060101,0.470258,0.169352,0.060101,0.470258,1,True,8
8,ExtraTreesEntr,0.832839,0.81,0.203741,0.059756,0.469765,0.203741,0.059756,0.469765,1,True,9
9,LightGBMLarge,0.828949,0.83,0.032964,0.00631,0.351349,0.032964,0.00631,0.351349,1,True,13


We use the leaderboard function to compare the models that the predictor trained. RandomForestGini has the best `score_test` (accuracy on the test dataset), so without other constraints we can choose it as the production model. If we need a model that returns its test-set predictions in less than 0.1 s, we can select CatBoost instead (about 0.04 s versus 0.16 s, for nearly the same accuracy). The leaderboard above is the basis for the final choice of production model.
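
The selection rule described here can be written down explicitly. The sketch below copies the `model`, `score_test` and `pred_time_test` columns from the leaderboard above into a plain list and picks the most accurate model under an optional prediction-time budget. The `pick_model` helper is hypothetical and is not part of AutoGluon.

```python
# (model, score_test, pred_time_test) rows copied from the leaderboard above.
leaderboard_rows = [
    ("RandomForestGini", 0.842973, 0.158744),
    ("CatBoost", 0.842461, 0.041671),
    ("RandomForestEntr", 0.841130, 0.163591),
    ("LightGBM", 0.839799, 0.028759),
    ("XGBoost", 0.837445, 0.039582),
    ("WeightedEnsemble_L2", 0.837445, 0.042729),
    ("LightGBMXT", 0.836421, 0.016908),
    ("ExtraTreesGini", 0.833453, 0.169352),
    ("ExtraTreesEntr", 0.832839, 0.203741),
    ("LightGBMLarge", 0.828949, 0.032964),
]

def pick_model(rows, max_pred_time=None):
    """Return the name of the most accurate model within the time budget."""
    candidates = [
        row for row in rows
        if max_pred_time is None or row[2] <= max_pred_time
    ]
    if not candidates:
        raise ValueError("no model satisfies the prediction-time budget")
    return max(candidates, key=lambda row: row[1])[0]

print(pick_model(leaderboard_rows))                     # no constraint
print(pick_model(leaderboard_rows, max_pred_time=0.1))  # 0.1 s budget
```

The chosen name can then be passed to `predictor.predict` through its `model` argument so that predictions come from that model rather than from the default best model.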

### Using hyperparameter settings
#### The available models and the hyperparameters you can set for each are listed [here](https://auto.gluon.ai/dev/api/autogluon.tabular.models.html).

In [10]:
# We need the AutoGluon core library to define hyperparameter search spaces (ag.space)
import autogluon.core as ag

train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')

Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv | Columns = 15 / 15 | Rows = 39073 -> 39073


We are going to use four model types: neural network, gradient boosting (LightGBM), random forest and CatBoost. To use other models or tune other hyperparameters, see the [documentation](https://auto.gluon.ai/stable/api/autogluon.predictor.html#autogluon.tabular.TabularPredictor.fit).
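
The next cell describes each tunable hyperparameter as a search space (real-valued, integer or categorical). As a rough illustration of what a random search over such spaces does, the sketch below draws five configurations using only the standard library. It is not AutoGluon's searcher: the helper names are hypothetical, and treating the integer upper bound as inclusive is an assumption of this sketch.

```python
import math
import random

def sample_log_uniform(lower, upper, rng):
    """Draw a value whose logarithm is uniform between log(lower) and log(upper)."""
    return math.exp(rng.uniform(math.log(lower), math.log(upper)))

def sample_config(rng):
    """Draw one configuration from ranges matching those in the next cell."""
    return {
        "learning_rate": sample_log_uniform(1e-4, 1e-2, rng),  # log-scale real
        "activation": rng.choice(["relu", "softrelu", "tanh"]),  # categorical
        "dropout_prob": rng.uniform(0.0, 0.5),  # linear-scale real
        "num_leaves": rng.randint(26, 66),  # integer, both ends inclusive here
    }

rng = random.Random(0)  # fixed seed so the draws are repeatable
trials = [sample_config(rng) for _ in range(5)]
for trial in trials:
    print(trial)
```

Sampling the learning rate on a log scale spreads trials evenly across orders of magnitude, instead of concentrating them near the upper bound as a linear draw would.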

In [11]:
nn_options = {  # specifies non-default hyperparameter values for neural network models
    'num_epochs': 10,  # number of training epochs (controls training time of NN models)
    'learning_rate': ag.space.Real(1e-4, 1e-2, default=5e-4, log=True),  # learning rate used in training (real-valued hyperparameter searched on log-scale)
    'activation': ag.space.Categorical('relu', 'softrelu', 'tanh'),  # activation function used in NN (categorical hyperparameter, default = first entry)
    'dropout_prob': ag.space.Real(0.0, 0.5, default=0.1),  # dropout probability (real-valued hyperparameter)
}

gbm_options = {  # specifies non-default hyperparameter values for lightGBM gradient boosted trees
    'num_boost_round': 100,  # number of boosting rounds (controls training time of GBM models)
    'num_leaves': ag.space.Int(lower=26, upper=66, default=36),  # number of leaves in trees (integer hyperparameter)
}

rf_options = {
    'n_estimators': 400,  # number of trees in the forest
    'max_leaf_nodes' : ag.space.Int(lower=500, upper=15000, default=2000),  # search range for the maximum number of leaf nodes per tree
}

cat_options = {
    'learning_rate' : ag.space.Real(0.0,0.2,default=0.05),
}

hyperparameters = {  # hyperparameters of each model type
                   'GBM': gbm_options,
                   'NN_TORCH': nn_options,  # NOTE: comment this line out if you get errors on Mac OSX
                   'RF': rf_options,
                   'CAT': cat_options,
                  }  # When these keys are missing from hyperparameters dict, no models of that type are trained

time_limit = 2*60  # train various models for ~2 min
num_trials = 5  # try at most 5 different hyperparameter configurations for each type of model
search_strategy = 'auto'  # to tune hyperparameters using random search routine with a local scheduler
                          # other search strategies (e.g. Bayesian optimization) can be selected here


hyperparameter_tune_kwargs = {  # HPO is not performed unless hyperparameter_tune_kwargs is specified
    'num_trials': num_trials,
    'scheduler' : 'local',
    'searcher': search_strategy,
}

predictor = TabularPredictor(label=label, eval_metric='acc').fit(
    train_data, time_limit=time_limit,
    hyperparameters=hyperparameters, hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
)

No path specified. Models will be saved in: "AutogluonModels/ag-20221125_121620/"
Beginning AutoGluon training ... Time limit = 120s
AutoGluon will save models to "AutogluonModels/ag-20221125_121620/"
AutoGluon Version:  0.5.2
Python Version:     3.8.12
Operating System:   Linux
Train Data Rows:    39073
Train Data Columns: 14
Label Column: class
Preprocessing data ...
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [' <=50K', ' >50K']
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 

  0%|          | 0/5 [00:00<?, ?it/s]

Fitted model: LightGBM/T1 ...
	0.8756	 = Validation score   (accuracy)
	0.5s	 = Training   runtime
	0.02s	 = Validation runtime
Fitted model: LightGBM/T2 ...
	0.8764	 = Validation score   (accuracy)
	0.53s	 = Training   runtime
	0.02s	 = Validation runtime
Fitted model: LightGBM/T3 ...
	0.8756	 = Validation score   (accuracy)
	0.64s	 = Training   runtime
	0.03s	 = Validation runtime
Fitted model: LightGBM/T4 ...
	0.8186	 = Validation score   (accuracy)
	0.53s	 = Training   runtime
	0.02s	 = Validation runtime
Fitted model: LightGBM/T5 ...
	0.8742	 = Validation score   (accuracy)
	0.58s	 = Training   runtime
	0.02s	 = Validation runtime
Hyperparameter tuning model: RandomForest ... Tuning model for up to 26.94s of the 116.55s of remaining time.


  0%|          | 0/5 [00:00<?, ?it/s]

	Stopping HPO to satisfy time limit...
Fitted model: RandomForest/T1 ...
	0.8637	 = Validation score   (accuracy)
	4.0s	 = Training   runtime
	1.51s	 = Validation runtime
Fitted model: RandomForest/T2 ...
	0.8621	 = Validation score   (accuracy)
	4.96s	 = Training   runtime
	1.62s	 = Validation runtime
Fitted model: RandomForest/T3 ...
	0.8575	 = Validation score   (accuracy)
	4.99s	 = Training   runtime
	1.84s	 = Validation runtime
Fitted model: RandomForest/T4 ...
	0.8575	 = Validation score   (accuracy)
	4.96s	 = Training   runtime
	1.83s	 = Validation runtime
Hyperparameter tuning model: CatBoost ... Tuning model for up to 26.94s of the 88.59s of remaining time.


  0%|          | 0/5 [00:00<?, ?it/s]

	Stopping HPO to satisfy time limit...
Fitted model: CatBoost/T1 ...
	0.8766	 = Validation score   (accuracy)
	12.27s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: CatBoost/T2 ...
	0.877	 = Validation score   (accuracy)
	6.25s	 = Training   runtime
	0.01s	 = Validation runtime
Hyperparameter tuning model: NeuralNetTorch ... Tuning model for up to 26.94s of the 69.96s of remaining time.
NaN or Inf found in input tensor.
Fitted model: NeuralNetTorch/16846_00000 ...
	0.8514	 = Validation score   (accuracy)
	20.19s	 = Training   runtime
	0.07s	 = Validation runtime
Fitted model: NeuralNetTorch/16846_00001 ...
	0.8568	 = Validation score   (accuracy)
	19.73s	 = Training   runtime
	0.15s	 = Validation runtime
Fitted model: NeuralNetTorch/16846_00002 ...
	0.846	 = Validation score   (accuracy)
	19.7s	 = Training   runtime
	0.13s	 = Validation runtime
Fitted model: NeuralNetTorch/16846_00003 ...
	0.8486	 = Validation score   (accuracy)
	19.63s	 = Training   runtime
	0.13s	 =

In [12]:
y_pred = predictor.predict(test_data_nolab)
perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=False)

Evaluation: accuracy on test data: 0.8751151602006346
Evaluations on test data:
{
    "accuracy": 0.8751151602006346
}


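With these hyperparameter settings, the test accuracy improved from 0.8374 to 0.8751. Note that this run was also trained on the full training set (39,073 rows) rather than the 500-row subsample used with the default settings, so the gain cannot be attributed to hyperparameter tuning alone.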

In [13]:
predictor.leaderboard(test_data, silent=True)

Unnamed: 0,model,score_test,score_val,pred_time_test,pred_time_val,fit_time,pred_time_test_marginal,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,LightGBM/T2,0.875934,0.8764,0.035081,0.017068,0.529088,0.035081,0.017068,0.529088,1,True,2
1,WeightedEnsemble_L2,0.875115,0.879,1.3538,3.787422,71.307823,0.00575,0.007946,2.433342,2,True,16
2,CatBoost/T1,0.874706,0.8766,0.021959,0.013993,12.268428,0.021959,0.013993,12.268428,1,True,10
3,LightGBM/T1,0.874296,0.8756,0.037819,0.017509,0.503016,0.037819,0.017509,0.503016,1,True,1
4,CatBoost/T2,0.873989,0.877,0.017755,0.01382,6.253347,0.017755,0.01382,6.253347,1,True,11
5,LightGBM/T3,0.873887,0.8756,0.052321,0.02902,0.637372,0.052321,0.02902,0.637372,1,True,3
6,LightGBM/T5,0.872249,0.8742,0.045976,0.018671,0.58248,0.045976,0.018671,0.58248,1,True,5
7,RandomForest/T1,0.864367,0.863704,0.343115,1.510821,4.004969,0.343115,1.510821,4.004969,1,True,6
8,RandomForest/T2,0.860272,0.862061,0.401631,1.623583,4.955801,0.401631,1.623583,4.955801,1,True,7
9,RandomForest/T4,0.856792,0.857453,0.505308,1.833908,4.957541,0.505308,1.833908,4.957541,1,True,9


LightGBM/T2 has the best test accuracy (0.8759), followed by WeightedEnsemble_L2 (0.8751) and CatBoost/T1 (0.8747).

### Using ensembling techniques 
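
As a rough illustration of what bagging with `num_bag_folds` and `num_bag_sets` means, the sketch below splits a toy dataset into folds in plain Python. Each fold is held out once while a child model trains on the remaining rows, and the whole procedure is repeated once per bag set. The `kfold_indices` helper and the 12-row dataset are hypothetical. Real k-fold bagging also shuffles the rows, which this sketch omits, and a run can fit fewer child models than the maximum computed here if the time limit is reached.

```python
def kfold_indices(n_rows, n_folds):
    """Split row indices 0..n_rows-1 into n_folds contiguous held-out folds."""
    base, extra = divmod(n_rows, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first folds
        folds.append(list(range(start, start + size)))
        start += size
    return folds

num_bag_folds = 5
num_bag_sets = 2
n_rows = 12  # toy dataset size, for illustration only

folds = kfold_indices(n_rows, num_bag_folds)

# Within one bag set, every row is held out exactly once,
# so every row receives one out-of-fold prediction.
held_out_counts = [sum(i in fold for fold in folds) for i in range(n_rows)]

# Upper bound on the number of child models behind one bagged model.
max_child_models = num_bag_folds * num_bag_sets

print(folds)
print(max_child_models)
```

The out-of-fold predictions are what make stacking possible: models at the next stack level can train on predictions that the lower-level child models produced for rows they never saw.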

In [14]:
# We define the same hyperparameter settings as before and add ensembling.

nn_options = {  # specifies non-default hyperparameter values for neural network models
    'num_epochs': 10,  # number of training epochs (controls training time of NN models)
    'learning_rate': ag.space.Real(1e-4, 1e-2, default=5e-4, log=True),  # learning rate used in training (real-valued hyperparameter searched on log-scale)
    'activation': ag.space.Categorical('relu', 'softrelu', 'tanh'),  # activation function used in NN (categorical hyperparameter, default = first entry)
    'dropout_prob': ag.space.Real(0.0, 0.5, default=0.1),  # dropout probability (real-valued hyperparameter)
}

gbm_options = {  # specifies non-default hyperparameter values for lightGBM gradient boosted trees
    'num_boost_round': 100,  # number of boosting rounds (controls training time of GBM models)
    'num_leaves': ag.space.Int(lower=26, upper=66, default=36),  # number of leaves in trees (integer hyperparameter)
}

rf_options = {
    'n_estimators': 400,
    'max_leaf_nodes' : ag.space.Int(lower=500, upper=15000, default=2000),
}

cat_options = {
    'learning_rate' : ag.space.Real(0.0,0.2,default=0.05),
}

hyperparameters = {  # hyperparameters of each model type
                   'GBM': gbm_options,
                   'NN_TORCH': nn_options,  # NOTE: comment this line out if you get errors on Mac OSX
                   'RF': rf_options,
                   'CAT': cat_options,
                  }  # When these keys are missing from hyperparameters dict, no models of that type are trained

time_limit = 10*60  # train various models for ~10 min
num_trials = 5  # try at most 5 different hyperparameter configurations for each type of model
search_strategy = 'auto'  # to tune hyperparameters using random search routine with a local scheduler

hyperparameter_tune_kwargs = {  # HPO is not performed unless hyperparameter_tune_kwargs is specified
    'num_trials': num_trials,
    'scheduler' : 'local',
    'searcher': search_strategy,
}

# To run ensembling, pass the arguments num_bag_folds, num_bag_sets and num_stack_levels.
# Alternatively, pass auto_stack=True to let AutoGluon choose them if you don't know
# which values of num_bag_folds, num_bag_sets and num_stack_levels to use.

predictor = TabularPredictor(label=label, eval_metric='acc').fit(
    train_data, time_limit=time_limit, #auto_stack=True,
    num_bag_folds=5, num_bag_sets=2, num_stack_levels=2,
    hyperparameters=hyperparameters, hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
)

No path specified. Models will be saved in: "AutogluonModels/ag-20221125_121748/"
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20221125_121748/"
AutoGluon Version:  0.5.2
Python Version:     3.8.12
Operating System:   Linux
Train Data Rows:    39073
Train Data Columns: 14
Label Column: class
Preprocessing data ...
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [' <=50K', ' >50K']
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 

  0%|          | 0/5 [00:00<?, ?it/s]

Fitted model: LightGBM_BAG_L1/T1 ...
	0.8713	 = Validation score   (accuracy)
	0.5s	 = Training   runtime
	0.03s	 = Validation runtime
Fitted model: LightGBM_BAG_L1/T2 ...
	0.8715	 = Validation score   (accuracy)
	0.57s	 = Training   runtime
	0.03s	 = Validation runtime
Fitted model: LightGBM_BAG_L1/T3 ...
	0.8694	 = Validation score   (accuracy)
	0.64s	 = Training   runtime
	0.03s	 = Validation runtime
Fitted model: LightGBM_BAG_L1/T4 ...
	0.8193	 = Validation score   (accuracy)
	0.53s	 = Training   runtime
	0.02s	 = Validation runtime
Fitted model: LightGBM_BAG_L1/T5 ...
	0.8681	 = Validation score   (accuracy)
	0.6s	 = Training   runtime
	0.03s	 = Validation runtime
Hyperparameter tuning model: RandomForest_BAG_L1 ... Tuning model for up to 11.99s of the 596.18s of remaining time.


  0%|          | 0/5 [00:00<?, ?it/s]

	Stopping HPO to satisfy time limit...
Fitted model: RandomForest_BAG_L1/T1 ...
	0.8644	 = Validation score   (accuracy)
	4.24s	 = Training   runtime
	1.54s	 = Validation runtime
Hyperparameter tuning model: CatBoost_BAG_L1 ... Tuning model for up to 11.99s of the 588.39s of remaining time.


  0%|          | 0/5 [00:00<?, ?it/s]

	Ran out of time, early stopping on iteration 279.
	Stopping HPO to satisfy time limit...
Fitted model: CatBoost_BAG_L1/T1 ...
	0.8704	 = Validation score   (accuracy)
	9.55s	 = Training   runtime
	0.02s	 = Validation runtime
Hyperparameter tuning model: NeuralNetTorch_BAG_L1 ... Tuning model for up to 11.99s of the 578.72s of remaining time.
2022-11-25 12:18:20,464	INFO stopper.py:363 -- Reached timeout of 9.59327265955925 seconds. Stopping all trials.
Fitted model: NeuralNetTorch_BAG_L1/T1 ...
	0.8395	 = Validation score   (accuracy)
	5.84s	 = Training   runtime
	0.2s	 = Validation runtime
Fitting model: LightGBM_BAG_L1/T1 ... Training model for up to 234.38s of the 567.62s of remaining time.
	Fitting 4 child models (S1F2 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy
	0.8733	 = Validation score   (accuracy)
	3.15s	 = Training   runtime
	0.38s	 = Validation runtime
Fitting model: LightGBM_BAG_L1/T2 ... Training model for up to 229.22s of the 562.47s of remaining time.
	Fitti

  0%|          | 0/5 [00:00<?, ?it/s]

Fitted model: LightGBM_BAG_L2/T1 ...
	0.8741	 = Validation score   (accuracy)
	0.59s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: LightGBM_BAG_L2/T2 ...
	0.8737	 = Validation score   (accuracy)
	0.61s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: LightGBM_BAG_L2/T3 ...
	0.8742	 = Validation score   (accuracy)
	0.74s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: LightGBM_BAG_L2/T4 ...
	0.8398	 = Validation score   (accuracy)
	0.62s	 = Training   runtime
	0.02s	 = Validation runtime
Fitted model: LightGBM_BAG_L2/T5 ...
	0.8728	 = Validation score   (accuracy)
	0.67s	 = Training   runtime
	0.01s	 = Validation runtime
Hyperparameter tuning model: RandomForest_BAG_L2 ... Tuning model for up to 12.64s of the 417.59s of remaining time.


  0%|          | 0/5 [00:00<?, ?it/s]

	Stopping HPO to satisfy time limit...
Fitted model: RandomForest_BAG_L2/T1 ...
	0.8739	 = Validation score   (accuracy)
	9.06s	 = Training   runtime
	1.52s	 = Validation runtime
Hyperparameter tuning model: CatBoost_BAG_L2 ... Tuning model for up to 12.64s of the 404.95s of remaining time.


  0%|          | 0/5 [00:00<?, ?it/s]

	Ran out of time, early stopping on iteration 55.
	Stopping HPO to satisfy time limit...
Fitted model: CatBoost_BAG_L2/T1 ...
	0.8733	 = Validation score   (accuracy)
	3.72s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: CatBoost_BAG_L2/T2 ...
	0.8729	 = Validation score   (accuracy)
	2.92s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: CatBoost_BAG_L2/T3 ...
	0.8747	 = Validation score   (accuracy)
	3.32s	 = Training   runtime
	0.01s	 = Validation runtime
Hyperparameter tuning model: NeuralNetTorch_BAG_L2 ... Tuning model for up to 12.64s of the 394.76s of remaining time.
NaN or Inf found in input tensor.
2022-11-25 12:21:25,153	INFO stopper.py:363 -- Reached timeout of 10.111261961485866 seconds. Stopping all trials.
Fitted model: NeuralNetTorch_BAG_L2/T1 ...
	0.8717	 = Validation score   (accuracy)
	6.14s	 = Training   runtime
	0.23s	 = Validation runtime
Fitted model: NeuralNetTorch_BAG_L2/T2 ...
	0.8722	 = Validation score   (accuracy)
	6.63s	 =

  0%|          | 0/5 [00:00<?, ?it/s]

Fitted model: LightGBM_BAG_L3/T1 ...
	0.8815	 = Validation score   (accuracy)
	0.62s	 = Training   runtime
	0.02s	 = Validation runtime
Fitted model: LightGBM_BAG_L3/T2 ...
	0.8833	 = Validation score   (accuracy)
	0.64s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: LightGBM_BAG_L3/T3 ...
	0.8815	 = Validation score   (accuracy)
	0.8s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: LightGBM_BAG_L3/T4 ...
	0.8464	 = Validation score   (accuracy)
	0.67s	 = Training   runtime
	0.02s	 = Validation runtime
Fitted model: LightGBM_BAG_L3/T5 ...
	0.8815	 = Validation score   (accuracy)
	0.73s	 = Training   runtime
	0.02s	 = Validation runtime
Hyperparameter tuning model: RandomForest_BAG_L3 ... Tuning model for up to 10.35s of the 225.86s of remaining time.


  0%|          | 0/5 [00:00<?, ?it/s]

	Stopping HPO to satisfy time limit...
Fitted model: RandomForest_BAG_L3/T1 ...
	0.8715	 = Validation score   (accuracy)
	7.38s	 = Training   runtime
	1.05s	 = Validation runtime
Hyperparameter tuning model: CatBoost_BAG_L3 ... Tuning model for up to 10.35s of the 215.93s of remaining time.


  0%|          | 0/5 [00:00<?, ?it/s]

	Ran out of time, early stopping on iteration 222.
	Stopping HPO to satisfy time limit...
Fitted model: CatBoost_BAG_L3/T1 ...
	0.8828	 = Validation score   (accuracy)
	8.22s	 = Training   runtime
	0.02s	 = Validation runtime
Hyperparameter tuning model: NeuralNetTorch_BAG_L3 ... Tuning model for up to 10.35s of the 207.57s of remaining time.
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
Fitted model: NeuralNetTorch_BAG_L3/T1 ...
	0.88	 = Validation score   (accuracy)
	3.97s	 = Training   runtime
	0.13s	 = Validation runtime
Fitted model: NeuralNetTorch_BAG_L3/T2 ...
	0.8801	 = Validation score   (accuracy)
	4.13s	 = Training   runtime
	0.12s	 = Validation runtime
Fitting model: LightGBM_BAG_L3/T1 ... Training model for up to 197.99s of the 197.98s of remaining time.
	Fitting 4 child models (S1F2 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy
	0.8751	 = Validation score   (accuracy)
	3.19s	 = Training   runtime
	0.16s	 = 

In [15]:
y_pred = predictor.predict(test_data_nolab)
perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=False)

Evaluation: accuracy on test data: 0.8757293479373528
Evaluations on test data:
{
    "accuracy": 0.8757293479373528
}


We improved the test accuracy slightly, from 0.8751 to 0.8757.

In [16]:
predictor.leaderboard(test_data, silent=True)

Unnamed: 0,model,score_test,score_val,pred_time_test,pred_time_val,fit_time,pred_time_test_marginal,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,LightGBM_BAG_L2/T5,0.876548,0.873928,2.496034,4.421388,144.474092,0.134856,0.167992,3.682369,2,True,14
1,LightGBM_BAG_L2/T2,0.876446,0.874773,2.45506,4.357116,144.298052,0.093883,0.103721,3.506329,2,True,11
2,NeuralNetTorch_BAG_L2/T2,0.876241,0.874287,3.128216,5.040683,172.095147,0.767038,0.787287,31.303424,2,True,20
3,CatBoost_BAG_L2/T3,0.876139,0.875106,2.411528,4.360543,167.986573,0.050351,0.107147,27.194849,2,True,18
4,LightGBM_BAG_L2/T1,0.876036,0.873928,2.481079,4.421201,144.171853,0.119901,0.167805,3.38013,2,True,10
5,WeightedEnsemble_L3,0.875934,0.875131,2.461216,4.5126,194.881244,0.001792,0.064934,11.882261,3,True,21
6,CatBoost_BAG_L2/T1,0.875729,0.874645,2.407532,4.378627,157.155596,0.046355,0.125231,16.363873,2,True,16
7,CatBoost_BAG_L3/T1,0.875729,0.875259,5.602559,8.980485,310.395371,0.053654,0.118517,21.109342,3,True,28
8,WeightedEnsemble_L4,0.875729,0.875259,5.603869,9.042653,320.114155,0.00131,0.062168,9.718784,4,True,31
9,RandomForest_BAG_L3/T1,0.875422,0.871548,5.767019,9.913685,296.668267,0.218114,1.051717,7.382238,3,True,27


## Extra automation with longer training time

In [25]:
time_limit = 60*60  # train various models for ~1 h
search_strategy = 'auto'  # to tune hyperparameters using random search routine with a local scheduler

hyperparameter_tune_kwargs = {  # HPO is not performed unless hyperparameter_tune_kwargs is specified
    'num_trials': 20,
    'scheduler' : 'local',
    'searcher': search_strategy,
}

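# The label is cast from str to int because it is also passed as the sample_weight column below.
# Caution: sample_weight names a column of per-row weights, so each row is weighted by its class value (0 or 1).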
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
train_data[label] = train_data[label].apply(lambda x: 0 if '<=50K' in x else 1)

predictor = TabularPredictor(label=label, eval_metric='acc',
                             sample_weight=label, problem_type='binary'
                            ).fit(train_data, time_limit=time_limit,
                                  auto_stack=True, presets='best_quality',
                                  hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
                                 )

Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv | Columns = 15 / 15 | Rows = 39073 -> 39073
No path specified. Models will be saved in: "AutogluonModels/ag-20221128_020212/"
Presets specified: ['best_quality']
Stack configuration (auto_stack=True): num_stack_levels=0, num_bag_folds=8, num_bag_sets=20
Values in column 'class' used as sample weights instead of predictive features. Evaluation metrics will ignore sample weights, specify weight_evaluation=True to instead report weighted metrics.
Beginning AutoGluon training ... Time limit = 3600s
AutoGluon will save models to "AutogluonModels/ag-20221128_020212/"
AutoGluon Version:  0.5.2
Python Version:     3.8.12
Operating System:   Linux
Train Data Rows:    39073
Train Data Columns: 14
Label Column: class
Preprocessing data ...
Selected class <--> label mapping:  class 1 = 1, class 0 = 0
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:            

  0%|          | 0/20 [00:00<?, ?it/s]

[1000]	valid_set's binary_error: 0.132242
[2000]	valid_set's binary_error: 0.130604


	Stopping HPO to satisfy time limit...
Fitted model: LightGBMXT_BAG_L1/T1 ...
	0.8727	 = Validation score   (accuracy)
	3.19s	 = Training   runtime
	0.11s	 = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T2 ...
	0.87	 = Validation score   (accuracy)
	1.81s	 = Training   runtime
	0.11s	 = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T3 ...
	0.8698	 = Validation score   (accuracy)
	1.42s	 = Training   runtime
	0.08s	 = Validation runtime
Fitted model: LightGBMXT_BAG_L1/T4 ...
	0.87	 = Validation score   (accuracy)
	16.42s	 = Training   runtime
	1.11s	 = Validation runtime
Hyperparameter tuning model: LightGBM_BAG_L1 ... Tuning model for up to 31.15s of the 3569.53s of remaining time.


  0%|          | 0/20 [00:00<?, ?it/s]

[1000]	valid_set's binary_error: 0.125281


	Ran out of time, early stopping on iteration 497. Best iteration is:
	[495]	valid_set's binary_error: 0.125896
	Stopping HPO to satisfy time limit...
Fitted model: LightGBM_BAG_L1/T1 ...
	0.8747	 = Validation score   (accuracy)
	1.63s	 = Training   runtime
	0.09s	 = Validation runtime
Fitted model: LightGBM_BAG_L1/T2 ...
	0.8745	 = Validation score   (accuracy)
	2.05s	 = Training   runtime
	0.06s	 = Validation runtime
Fitted model: LightGBM_BAG_L1/T3 ...
	0.8755	 = Validation score   (accuracy)
	2.48s	 = Training   runtime
	0.07s	 = Validation runtime
Fitted model: LightGBM_BAG_L1/T4 ...
	0.8747	 = Validation score   (accuracy)
	7.0s	 = Training   runtime
	0.47s	 = Validation runtime
Fitted model: LightGBM_BAG_L1/T5 ...
	0.8759	 = Validation score   (accuracy)
	1.16s	 = Training   runtime
	0.03s	 = Validation runtime
Fitted model: LightGBM_BAG_L1/T6 ...
	0.8743	 = Validation score   (accuracy)
	2.81s	 = Training   runtime
	0.11s	 = Validation runtime
Fitted model: LightGBM_BAG_L1/T7 .

  0%|          | 0/20 [00:00<?, ?it/s]

	Ran out of time, early stopping on iteration 428.
	Stopping HPO to satisfy time limit...
Fitted model: CatBoost_BAG_L1/T1 ...
	0.8755	 = Validation score   (accuracy)
	24.89s	 = Training   runtime
	0.02s	 = Validation runtime
Hyperparameter tuning model: ExtraTreesGini_BAG_L1 ... Tuning model for up to 31.15s of the 3495.53s of remaining time.
	No hyperparameter search space specified for ExtraTreesGini. Skipping HPO. Will train one model based on the provided hyperparameters.
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1380, in _train_single_full
    hpo_models, hpo_results = model.hyperparameter_tune(
  File "/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/autogluon/core/models/abstract/abstract_model.py", line 1001, in hyperparameter_tune
    return self._hyperparameter_tune(hpo_executor=hpo_executor, **kwargs)
  File "/home/ec2-user/anaconda3

  0%|          | 0/20 [00:00<?, ?it/s]

	Stopping HPO to satisfy time limit...
Fitted model: XGBoost_BAG_L1/T1 ...
	0.8776	 = Validation score   (accuracy)
	2.96s	 = Training   runtime
	0.07s	 = Validation runtime
Fitted model: XGBoost_BAG_L1/T2 ...
	0.8716	 = Validation score   (accuracy)
	2.96s	 = Training   runtime
	0.06s	 = Validation runtime
Fitted model: XGBoost_BAG_L1/T3 ...
	0.8712	 = Validation score   (accuracy)
	3.58s	 = Training   runtime
	0.07s	 = Validation runtime
Fitted model: XGBoost_BAG_L1/T4 ...
	0.8712	 = Validation score   (accuracy)
	2.32s	 = Training   runtime
	0.03s	 = Validation runtime
Fitted model: XGBoost_BAG_L1/T5 ...
	0.8716	 = Validation score   (accuracy)
	3.99s	 = Training   runtime
	0.07s	 = Validation runtime
Fitted model: XGBoost_BAG_L1/T6 ...
	0.8684	 = Validation score   (accuracy)
	2.14s	 = Training   runtime
	0.05s	 = Validation runtime
Fitted model: XGBoost_BAG_L1/T7 ...
	0.8678	 = Validation score   (accuracy)
	1.38s	 = Training   runtime
	0.04s	 = Validation runtime
Fitted model: XG

In [26]:
test_data2 = test_data.copy()
test_data2[label] = test_data2[label].apply(lambda x: 0 if '<=50K' in x else 1)
perf = predictor.evaluate(test_data2, auxiliary_metrics=False)

Evaluation: accuracy on test data: 0.877162452656362
Evaluations on test data:
{
    "accuracy": 0.877162452656362
}


We improved the test accuracy from 0.8757 to 0.8772.

In [27]:
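# NOTE: test_data still holds string labels, while this predictor was trained on 0/1 labels.
# This most likely explains the score_test of 0.0 below; pass test_data2 for meaningful test scores.
predictor.leaderboard(test_data, silent=True)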

Unnamed: 0,model,score_test,score_val,pred_time_test,pred_time_val,fit_time,pred_time_test_marginal,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,CatBoost_BAG_L1/T1,0.0,0.8738,0.5632,0.366137,172.614355,0.5632,0.366137,172.614355,1,True,14
1,LightGBM_BAG_L1/T7,0.0,0.873852,0.719223,0.568975,11.952137,0.719223,0.568975,11.952137,1,True,11
2,LightGBM_BAG_L1/T8,0.0,0.874159,0.809996,0.75538,11.260965,0.809996,0.75538,11.260965,1,True,12
3,LightGBM_BAG_L1/T2,0.0,0.87485,0.848019,0.74908,12.478494,0.848019,0.74908,12.478494,1,True,6
4,LightGBM_BAG_L1/T5,0.0,0.874568,0.921795,0.597973,11.775184,0.921795,0.597973,11.775184,1,True,9
5,LightGBM_BAG_L1/T3,0.0,0.873979,0.995435,0.899142,13.289551,0.995435,0.899142,13.289551,1,True,7
6,LightGBM_BAG_L1/T1,0.0,0.874543,1.037089,0.667051,11.771457,1.037089,0.667051,11.771457,1,True,5
7,XGBoost_BAG_L1/T1,0.0,0.875797,1.112917,0.89137,19.428157,1.112917,0.89137,19.428157,1,True,19
8,XGBoost_BAG_L1/T5,0.0,0.872111,1.37293,1.289406,30.363995,1.37293,1.289406,30.363995,1,True,23
9,NeuralNetFastAI_BAG_L1/T4,0.0,0.861644,1.386764,0.745765,98.869826,1.386764,0.745765,98.869826,1,True,18
