<p style="padding: 10px; border: 1px solid black;">
<img src=".././images/MLU-NEW-logo.png" alt="drawing" width="400"/> <br/>

# MLU Day One Machine Learning - Walkthrough & Advanced AutoGluon Features

# Part I - Walkthrough & Discussions
Now that you have finished your hands-on activity, let's walk through the code you have used and discuss it. <br/>

In [1]:
# Importing the newly installed AutoGluon code library
from autogluon.tabular import TabularPredictor, TabularDataset

train = TabularDataset(".././datasets/training.csv")
mlu_test_data = TabularDataset(".././datasets/mlu-leaderboard-test.csv")


predictor = TabularPredictor(label="Price").fit(train_data=train, time_limit=60)

predictions = predictor.predict(mlu_test_data)

# Creating a new dataframe for the submission
submission = mlu_test_data[["ID"]].copy(deep=True)

# Creating label column from price prediction list
submission["Price"] = predictions

# Saving our csv file for Leaderboard submission
# index=False prevents printing the row IDs as separate values
submission.to_csv(
    ".././datasets/predictions/Solution-Demo.csv",
    index=False,
)

No path specified. Models will be saved in: "AutogluonModels/ag-20210814_182843/"
Beginning AutoGluon training ... Time limit = 60s
AutoGluon will save models to "AutogluonModels/ag-20210814_182843/"
AutoGluon Version:  0.2.0
Train Data Rows:    5051
Train Data Columns: 9
Preprocessing data ...
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == float and many unique label-values observed).
	Label info (max, min, mean, stddev): (4.149249912590282, 1.414973347970818, 2.60147, 0.33003)
	If 'regression' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    5203.6 MB
	Train Data (Original)  Memory Usage: 10.91 MB (0.2% of available memory)
	Inferring data type of each feature based on column values. Set 

---

# Part II - Advanced AutoGluon Features

## ML Problem Description
Predict the occupation of individuals using census data. 
> This is a multiclass classification task (15 distinct classes). <br>

For the advanced feature demonstration we want to use a new dataset: Census data. In this particular dataset, each row corresponds to an individual person, and the columns contain various demographic characteristics collected for the census.

We’ll predict the occupation of an individual - this is a multiclass classification problem. Start by importing AutoGluon’s `TabularPredictor` and `TabularDataset`, and load the data from an S3 bucket.

In [2]:
!pip install -q bokeh==2.0.1

### Loading the data

In [3]:
from autogluon.tabular import TabularDataset, TabularPredictor
from sklearn.model_selection import train_test_split
import numpy as np

# Load in the dataset
train_data = TabularDataset("https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv")
# Subsample the data for a faster demo; try setting this to a much larger value
subsample_size = 5000
train_data = train_data.sample(n=subsample_size, random_state=0)
train_data.head()

Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv | Columns = 15 / 15 | Rows = 39073 -> 39073


| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6118 | 51 | Private | 39264 | Some-college | 10 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 40 | United-States | >50K |
| 23204 | 58 | Private | 51662 | 10th | 6 | Married-civ-spouse | Other-service | Wife | White | Female | 0 | 0 | 8 | United-States | <=50K |
| 29590 | 40 | Private | 326310 | Some-college | 10 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 44 | United-States | <=50K |
| 18116 | 37 | Private | 222450 | HS-grad | 9 | Never-married | Sales | Not-in-family | White | Male | 0 | 2339 | 40 | El-Salvador | <=50K |
| 33964 | 62 | Private | 109190 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 15024 | 0 | 40 | United-States | >50K |


### Setting the target


In [4]:
# Assign column that contains the label to a variable that can be re-used later
label = "occupation"

print("Summary of occupation column: \n")
train_data["occupation"].describe()

Summary of occupation column: 



count              5000
unique               15
top        Craft-repair
freq                672
Name: occupation, dtype: object
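
The summary above also gives a useful reference point for later accuracy numbers: a predictor that always outputs the most frequent class. The sketch below computes that majority-class baseline from the reported counts (5000 rows, top class seen 672 times). The other 14 class names are placeholders invented for illustration; only the counts come from the output above.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(labels)


# 672 rows of the top class, as reported by describe() above.
labels = ["Craft-repair"] * 672
# Placeholder names for the remaining 14 classes, spread evenly (hypothetical).
labels += [f"class_{i % 14}" for i in range(5000 - 672)]

baseline = majority_baseline_accuracy(labels)
print(f"Majority-class baseline accuracy: {baseline:.4f}")  # 0.1344
```

Any model worth keeping should score clearly above this value on held-out data.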

### Train, validation, test split

In [5]:
# Create a train & validation split
train_data, val_data = train_test_split(
    train_data, test_size=0.1, shuffle=True, random_state=23
)

# Let's load the test data
test_data = TabularDataset("https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv")

# We need to split the test dataset into a features and a label subset
y_test = test_data[label]
test_data_nolabel = test_data.drop(columns=[label])  # delete label column

Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv | Columns = 15 / 15 | Rows = 9769 -> 9769


### Specifying performance metric

In [6]:
# We specify eval-metric just for demo (unnecessary as it's the default)
metric = "accuracy"

The classification metrics that can be passed as `eval_metric` include:

`'accuracy', 'balanced_accuracy', 'f1', 'f1_macro', 'f1_micro', 'f1_weighted', 'roc_auc', 'average_precision', 'precision', 'precision_macro', 'precision_micro', 'precision_weighted', 'recall', 'recall_macro', 'recall_micro', 'recall_weighted', 'log_loss', 'pac_score'`
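
To make the difference between two of these metrics concrete, here is a minimal plain-Python sketch of `accuracy` and `balanced_accuracy`. It assumes the common definition of balanced accuracy as the mean of per-class recall; it is not AutoGluon's own implementation, and the labels are toy values.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy data: a model that always predicts the frequent class.
y_true = ["Sales"] * 8 + ["Tech-support"] * 2
y_pred = ["Sales"] * 10

acc = accuracy(y_true, y_pred)           # 8 of 10 rows correct
bal = balanced_accuracy(y_true, y_pred)  # recall 1.0 and 0.0, averaged
print(acc, bal)  # 0.8 0.5
```

With imbalanced classes, plain accuracy can look acceptable while the rare classes are never predicted correctly, which is what the balanced variant exposes.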

### Specifying hyperparameters and tuning them

In [7]:
import autogluon.core as ag

# Set Neural Net options
# Specifies non-default hyperparameter values for neural network models
nn_options = {
    # number of training epochs (controls training time of NN models)
    "num_epochs": 10,
    # learning rate used in training (real-valued hyperparameter searched on log-scale)
    "learning_rate": ag.space.Real(1e-4, 1e-2, default=5e-4, log=True),
    # activation function used in NN (categorical hyperparameter, default = first entry)
    "activation": ag.space.Categorical("relu", "softrelu", "tanh"),
    # each choice for categorical hyperparameter 'layers' corresponds to list of sizes for each NN layer to use
    "layers": ag.space.Categorical([100], [1000], [200, 100], [300, 200, 100]),
    # dropout probability (real-valued hyperparameter)
    "dropout_prob": ag.space.Real(0.0, 0.5, default=0.1),
}

# Set GBM options
# Specifies non-default hyperparameter values for lightGBM gradient boosted trees
gbm_options = {
    # number of boosting rounds (controls training time of GBM models)
    "num_boost_round": 100,
    # number of leaves in trees (integer hyperparameter)
    "num_leaves": ag.space.Int(lower=26, upper=66, default=36),
}

# Add both NN and GBM options into a hyperparameter dictionary
# hyperparameters of each model type
# When these keys are missing from the hyperparameters dict, no models of that type are trained
hyperparameters = {
    "GBM": gbm_options,
    "NN": nn_options,
}

# Train various models for ~2 min
time_limit = 2 * 60
# Number of trials for hyperparameters
num_trials = 5

# To tune hyperparameters using Bayesian optimization to find best combination of params
search_strategy = "auto"

# HPO is not performed unless hyperparameter_tune_kwargs is specified
hyperparameter_tune_kwargs = {
    "num_trials": num_trials,
    "scheduler": "local",
    "searcher": search_strategy,
}
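
The `log=True` flag on the learning-rate search space above means candidate values are spread evenly across orders of magnitude rather than evenly across the raw interval. The sketch below illustrates that idea with plain log-uniform sampling. It is an assumption-laden illustration, not AutoGluon's searcher, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(lower, upper, rng):
    """Draw a value whose logarithm is uniform on [log(lower), log(upper)]."""
    return math.exp(rng.uniform(math.log(lower), math.log(upper)))


rng = random.Random(0)
draws = [sample_log_uniform(1e-4, 1e-2, rng) for _ in range(10_000)]

# On a log scale, 1e-3 is the midpoint of [1e-4, 1e-2], so about half of the
# draws fall below it. Uniform sampling on the raw interval would put only
# about 9% of the draws there.
share_below = sum(d < 1e-3 for d in draws) / len(draws)
print(f"Share of draws below 1e-3: {share_below:.2f}")
```

This is why learning rates are usually searched on a log scale: small values get as much attention as large ones.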

### Specifying settings for TabularPredictor

In [8]:
# Train various models for ~2 min
time_limit = 2 * 60
# Number of trials for hyperparameters
num_trials = 5

### Train Model using TabularPredictor

In [9]:
predictor = TabularPredictor(label=label, eval_metric=metric).fit(
    train_data,
    tuning_data=val_data,
    time_limit=time_limit,
    hyperparameters=hyperparameters,
    hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
)

No path specified. Models will be saved in: "AutogluonModels/ag-20210814_183542/"
Beginning AutoGluon training ... Time limit = 120s
AutoGluon will save models to "AutogluonModels/ag-20210814_183542/"
AutoGluon Version:  0.2.0
Train Data Rows:    4500
Train Data Columns: 14
Tuning Data Rows:    500
Tuning Data Columns: 14
Preprocessing data ...
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == object).
	First 10 (of 15) unique label values:  [' Sales', ' Adm-clerical', ' ?', ' Prof-specialty', ' Other-service', ' Machine-op-inspct', ' Craft-repair', ' Exec-managerial', ' Handlers-cleaners', ' Transport-moving']
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Fraction of data from classes with at least 10 examples that will be kept for training models: 0.9995555555555555
Train Data Class Count: 14
Using Fe

  0%|          | 0/5 [00:00<?, ?it/s]

	Ran out of time, early stopping on iteration 16. Best iteration is:
	[6]	train_set's multi_error: 0.5249	valid_set's multi_error: 0.599198
	Time limit exceeded
Fitted model: LightGBM/T0 ...
	0.4008	 = Validation accuracy score
	53.93s	 = Training runtime
	0.03s	 = Validation runtime
Hyperparameter tuning model: NeuralNetMXNet ...


  0%|          | 0/5 [00:00<?, ?it/s]

Fitted model: NeuralNetMXNet/T0 ...
	0.4028	 = Validation accuracy score
	7.35s	 = Training runtime
	0.05s	 = Validation runtime
Fitted model: NeuralNetMXNet/T1 ...
	0.3888	 = Validation accuracy score
	6.6s	 = Training runtime
	0.05s	 = Validation runtime
Fitted model: NeuralNetMXNet/T2 ...
	0.4008	 = Validation accuracy score
	9.76s	 = Training runtime
	0.06s	 = Validation runtime
Fitted model: NeuralNetMXNet/T3 ...
	0.4008	 = Validation accuracy score
	7.2s	 = Training runtime
	0.05s	 = Validation runtime
Fitted model: NeuralNetMXNet/T4 ...
	0.2906	 = Validation accuracy score
	7.19s	 = Training runtime
	0.05s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 119.83s of the 21.76s of remaining time.
	0.4128	 = Validation accuracy score
	0.34s	 = Training runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 98.6s ...
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20210814_1835

### Predict on the test data

In [10]:
y_pred = predictor.predict(test_data_nolabel)
print(f"Predictions:  {list(y_pred)[:5]}")
perf = predictor.evaluate(test_data, auxiliary_metrics=False)

Predictions:  [' Other-service', ' Craft-repair', ' Craft-repair', ' Other-service', ' Other-service']


Evaluation: accuracy on test data: 0.37015047599549594
Evaluations on test data:
{
    "accuracy": 0.37015047599549594
}


Use the following to view a summary of what happened during the fit. This time, the summary also shows details of the hyperparameter-tuning process for each type of model:

In [11]:
predictor.fit_summary()

*** Summary of fit() ***
Estimated performance of each model:
                 model  score_val  pred_time_val   fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0  WeightedEnsemble_L2   0.412826       0.141045  71.390454                0.000708           0.340977            2       True          7
1    NeuralNetMXNet/T0   0.402806       0.053254   7.354919                0.053254           7.354919            1       True          2
2          LightGBM/T0   0.400802       0.027492  53.932863                0.027492          53.932863            1       True          1
3    NeuralNetMXNet/T3   0.400802       0.053175   7.196842                0.053175           7.196842            1       True          5
4    NeuralNetMXNet/T2   0.400802       0.059591   9.761696                0.059591           9.761696            1       True          4
5    NeuralNetMXNet/T1   0.388778       0.052324   6.600852                0.052324           6.600852        

{'model_types': {'LightGBM/T0': 'LGBModel',
  'NeuralNetMXNet/T0': 'TabularNeuralNetModel',
  'NeuralNetMXNet/T1': 'TabularNeuralNetModel',
  'NeuralNetMXNet/T2': 'TabularNeuralNetModel',
  'NeuralNetMXNet/T3': 'TabularNeuralNetModel',
  'NeuralNetMXNet/T4': 'TabularNeuralNetModel',
  'WeightedEnsemble_L2': 'WeightedEnsembleModel'},
 'model_performance': {'LightGBM/T0': 0.40080160320641284,
  'NeuralNetMXNet/T0': 0.4028056112224449,
  'NeuralNetMXNet/T1': 0.38877755511022044,
  'NeuralNetMXNet/T2': 0.40080160320641284,
  'NeuralNetMXNet/T3': 0.40080160320641284,
  'NeuralNetMXNet/T4': 0.2905811623246493,
  'WeightedEnsemble_L2': 0.41282565130260523},
 'model_best': 'WeightedEnsemble_L2',
 'model_paths': {'LightGBM/T0': 'AutogluonModels/ag-20210814_183542/models/LightGBM/T0/',
  'NeuralNetMXNet/T0': 'AutogluonModels/ag-20210814_183542/models/NeuralNetMXNet/T0/',
  'NeuralNetMXNet/T1': 'AutogluonModels/ag-20210814_183542/models/NeuralNetMXNet/T1/',
  'NeuralNetMXNet/T2': 'AutogluonModels

In the above example, the predictive performance may be poor because we are using few training datapoints and small ranges for hyperparameters to ensure quick runtimes. You can call `fit()` multiple times while modifying these settings to better understand how these choices affect performance outcomes. For example: you can increase `subsample_size` to train using a larger dataset, increase the `num_epochs` and `num_boost_round` hyperparameters, and increase the `time_limit` (which you should do for all code in these tutorials). To see more detailed output during the execution of `fit()`, you can also pass in the argument: `verbosity = 3`.

### Model ensembling with stacking/bagging
Beyond hyperparameter-tuning with a correctly-specified evaluation metric, there are two other methods to boost predictive performance:
- bagging and
- stack-ensembling

You’ll often see performance improve if you specify `num_bag_folds = 5-10`, `num_stack_levels = 1-3` in the call to `fit()`. Beware that doing this will increase training times and memory/disk usage.
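
As a rough picture of what `num_bag_folds` implies, the following sketch splits row indices into k folds so that each row is held out exactly once. In k-fold bagging, one copy of a model is trained per fold on the remaining rows, and the held-out predictions can then serve for validation and as inputs to a stacking layer. This is a simplified, unshuffled illustration; `kfold_indices` is a hypothetical helper, not AutoGluon's splitter.

```python
def kfold_indices(n_rows, k):
    """Yield (train_idx, holdout_idx) pairs; each row is held out exactly once."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        holdout = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, holdout
        start += size


n_rows, num_bag_folds = 4500, 5
folds = list(kfold_indices(n_rows, num_bag_folds))

# Every row appears in exactly one hold-out fold.
held_out = sorted(i for _, holdout in folds for i in holdout)
for train, holdout in folds:
    print(f"train on {len(train)} rows, hold out {len(holdout)} rows")
```

Each of the five model copies sees 3600 rows and is scored on the 900 it did not see, which is why bagging multiplies training time by roughly the number of folds.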



In [12]:
predictor = TabularPredictor(label=label, eval_metric=metric).fit(
    train_data,
    num_bag_folds=5,
    num_bag_sets=1,
    num_stack_levels=1,
    # The last argument is just for a quick demo here; omit it in real applications
    hyperparameters={
        "NN": {"num_epochs": 2},
        "GBM": {"num_boost_round": 20},
    },
)

No path specified. Models will be saved in: "AutogluonModels/ag-20210814_183733/"
Beginning AutoGluon training ...
AutoGluon will save models to "AutogluonModels/ag-20210814_183733/"
AutoGluon Version:  0.2.0
Train Data Rows:    4500
Train Data Columns: 14
Preprocessing data ...
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == object).
	First 10 (of 15) unique label values:  [' Sales', ' Adm-clerical', ' ?', ' Prof-specialty', ' Other-service', ' Machine-op-inspct', ' Craft-repair', ' Exec-managerial', ' Handlers-cleaners', ' Transport-moving']
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Fraction of data from classes with at least 10 examples that will be kept for training models: 0.9995555555555555
Train Data Class Count: 14
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineF

You should not provide `tuning_data` when stacking/bagging, and instead provide all your available data as train_data (which AutoGluon will split in more intelligent ways). Parameter `num_bag_sets` controls how many times the K-fold bagging process is repeated to further reduce variance (increasing this may further boost accuracy but will substantially increase training times, inference latency, and memory/disk usage). Rather than manually searching for good bagging/stacking values yourself, AutoGluon will automatically select good values for you if you specify `auto_stack` instead:

In [13]:
# Folder where to store trained models
save_path = "agModels-predictOccupation"

predictor = TabularPredictor(label=label, eval_metric=metric, path=save_path).fit(
    train_data,
    auto_stack=True,
    time_limit=30,
    # Last 2 arguments are for quick demo, omit them in real applications
    hyperparameters={
        "NN": {"num_epochs": 2},
        "GBM": {"num_boost_round": 20},
    },
)

Beginning AutoGluon training ... Time limit = 30s
AutoGluon will save models to "agModels-predictOccupation/"
AutoGluon Version:  0.2.0
Train Data Rows:    4500
Train Data Columns: 14
Preprocessing data ...
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == object).
	First 10 (of 15) unique label values:  [' Sales', ' Adm-clerical', ' ?', ' Prof-specialty', ' Other-service', ' Machine-op-inspct', ' Craft-repair', ' Exec-managerial', ' Handlers-cleaners', ' Transport-moving']
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Fraction of data from classes with at least 10 examples that will be kept for training models: 0.9995555555555555
Train Data Class Count: 14
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    14080.86 MB
	Tra

Stacking/bagging will often produce higher accuracy than hyperparameter-tuning, but you may try combining both techniques (note: specifying `presets='best_quality'` in `fit()` simply sets `auto_stack=True`).

### Prediction options (inference)

Even if you’ve started a new Python session since last calling `fit()`, you can still load a previously trained predictor from disk:

In [14]:
# `predictor.path` is another way to get the relative path needed to later load predictor.
predictor = TabularPredictor.load(save_path)

Above, `save_path` is the same folder previously passed to `TabularPredictor`, in which all the trained models have been saved. You can easily train models on one machine and deploy them on another. Simply copy the `save_path` folder to the new machine and specify its new path in `TabularPredictor.load()`.

We can make a prediction on an individual example rather than on a full dataset:

In [15]:
# Note: .iloc[0] won't work because it returns pandas Series instead of DataFrame
datapoint = test_data_nolabel.iloc[[0]]

predictor.predict(datapoint)

0     Other-service
Name: occupation, dtype: object

To output predicted class probabilities instead of predicted classes, you can use:



In [16]:
# Returns a DataFrame that shows which probability corresponds to which class
predictor.predict_proba(datapoint)

| | ? | Adm-clerical | Armed-Forces | Craft-repair | Exec-managerial | Farming-fishing | Handlers-cleaners | Machine-op-inspct | Other-service | Priv-house-serv | Prof-specialty | Protective-serv | Sales | Tech-support | Transport-moving |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.038169 | 0.106904 | 0.0 | 0.090346 | 0.089637 | 0.021648 | 0.052957 | 0.054275 | 0.278535 | 0.004144 | 0.083228 | 0.015408 | 0.111708 | 0.019857 | 0.033182 |


By default, `predict()` and `predict_proba()` will utilize the model that AutoGluon thinks is most accurate, which is usually an ensemble of many individual models. Here’s how to see which model this corresponds to:

In [17]:
predictor.get_model_best()

'LightGBM_BAG_L1'

We can instead specify a particular model to use for predictions (e.g. to reduce inference latency). Note that a ‘model’ in AutoGluon may refer to, for example, a single Neural Network, a bagged ensemble of many Neural Network copies trained on different training/validation splits, a weighted ensemble that aggregates the predictions of many other models, or a stacked model that operates on predictions output by other models. This is akin to viewing a RandomForest as one ‘model’ when it is in fact an ensemble of many decision trees.
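
To illustrate what "a weighted ensemble that aggregates the predictions of many other models" can look like, here is a minimal sketch that blends class probabilities from two models with fixed weights and then picks the most probable class. The model names, weights, and probabilities are invented for illustration, and AutoGluon's own ensemble selects its weights automatically.

```python
def weighted_ensemble(proba_by_model, weights):
    """Blend per-class probabilities as a weighted average across models."""
    classes = next(iter(proba_by_model.values())).keys()
    total_weight = sum(weights.values())
    return {
        c: sum(weights[m] * proba_by_model[m][c] for m in proba_by_model) / total_weight
        for c in classes
    }


# Hypothetical base-model outputs for a single row.
proba_by_model = {
    "model_a": {"Sales": 0.6, "Craft-repair": 0.4},
    "model_b": {"Sales": 0.2, "Craft-repair": 0.8},
}
weights = {"model_a": 0.25, "model_b": 0.75}

blended = weighted_ensemble(proba_by_model, weights)
prediction = max(blended, key=blended.get)
print(blended, prediction)
```

Here the more heavily weighted `model_b` dominates, so the blended prediction is `Craft-repair` even though `model_a` alone would have predicted `Sales`.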

Before deciding which model to use, let’s evaluate all of the models AutoGluon has previously trained on our test data:

### AutoGluon leaderboard function options

In [18]:
predictor.leaderboard(test_data, silent=True)

| | model | score_test | score_val | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LightGBM_BAG_L1 | 0.369843 | 0.382837 | 1.43324 | 0.243411 | 18.828165 | 1.43324 | 0.243411 | 18.828165 | 1 | True | 1 |
| 1 | WeightedEnsemble_L2 | 0.369843 | 0.382837 | 1.437013 | 0.244729 | 18.831213 | 0.003772 | 0.001318 | 0.003048 | 2 | True | 2 |


The leaderboard shows each model’s predictive performance on the test data (`score_test`) and validation data (`score_val`), as well as the time required to produce predictions for the test data (`pred_time_test`), to produce predictions on the validation data (`pred_time_val`), and to train the model (`fit_time`). The `*_marginal` columns count only the model itself, whereas the unsuffixed times also include the models it depends on. Below, we show that a leaderboard can be produced without new data (it then uses only the data previously reserved for validation inside `fit`) and can display extra information about each model:

In [19]:
predictor.leaderboard(extra_info=True, silent=True)

| | model | score_val | pred_time_val | fit_time | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order | num_features | ... | child_model_type | hyperparameters | hyperparameters_fit | ag_args_fit | features | child_hyperparameters | child_hyperparameters_fit | child_ag_args_fit | ancestors | descendants |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LightGBM_BAG_L1 | 0.382837 | 0.243411 | 18.828165 | 0.243411 | 18.828165 | 1 | True | 1 | 14 | ... | LGBModel | `{'use_orig_features': True, 'max_base_models':...` | `{}` | `{'max_memory_usage_ratio': 1.0, 'max_time_limi...` | `[education-num, age, fnlwgt, hours-per-week, e...` | `{'num_boost_round': 20, 'num_threads': -1, 'le...` | `{'num_boost_round': 11}` | `{'max_memory_usage_ratio': 1.0, 'max_time_limi...` | `[]` | `[WeightedEnsemble_L2]` |
| 1 | WeightedEnsemble_L2 | 0.382837 | 0.244729 | 18.831213 | 0.001318 | 0.003048 | 2 | True | 2 | 14 | ... | GreedyWeightedEnsembleModel | `{'use_orig_features': False, 'max_base_models'...` | `{}` | `{'max_memory_usage_ratio': 1.0, 'max_time_limi...` | `[LightGBM_BAG_L1_9, LightGBM_BAG_L1_11, LightG...` | `{'ensemble_size': 100}` | `{'ensemble_size': 1}` | `{'max_memory_usage_ratio': 1.0, 'max_time_limi...` | `[LightGBM_BAG_L1]` | `[]` |


The expanded leaderboard shows properties like how many features are used by each model (`num_features`), which other models are ancestors whose predictions are required inputs for each model (`ancestors`), and how much memory each model and all its ancestors would occupy if simultaneously persisted (`memory_size_w_ancestors`). See AutoGluon's leaderboard documentation for full details.
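
The relationship between the `ancestors` column and the timing columns can be checked by hand. The sketch below assumes that a model's `fit_time` equals its own `fit_time_marginal` plus the marginal times of all its ancestors, which is consistent with the numbers in the leaderboards above. The values are copied from those tables; `end_to_end_time` is a hypothetical helper.

```python
# Marginal fit times in seconds, copied from the leaderboard above.
fit_time_marginal = {
    "LightGBM_BAG_L1": 18.828165,
    "WeightedEnsemble_L2": 0.003048,
}
# Ancestors as listed in the expanded leaderboard.
ancestors = {
    "LightGBM_BAG_L1": [],
    "WeightedEnsemble_L2": ["LightGBM_BAG_L1"],
}


def end_to_end_time(model):
    """Own marginal time plus the marginal times of every ancestor."""
    return fit_time_marginal[model] + sum(fit_time_marginal[a] for a in ancestors[model])


total = end_to_end_time("WeightedEnsemble_L2")
print(f"{total:.6f}")  # the leaderboard reports fit_time = 18.831213
```

The same reading applies to the prediction-time columns: the ensemble itself adds only a few milliseconds on top of the base model it depends on.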

To show scores for other metrics, you can specify the `extra_metrics` argument when passing in `test_data`:

In [20]:
predictor.leaderboard(
    test_data, extra_metrics=["accuracy", "balanced_accuracy", "log_loss"], silent=True
)

| | model | score_test | accuracy | balanced_accuracy | log_loss | score_val | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LightGBM_BAG_L1 | 0.369843 | 0.369843 | 0.255293 | -7.175304 | 0.382837 | 1.166313 | 0.243411 | 18.828165 | 1.166313 | 0.243411 | 18.828165 | 1 | True | 1 |
| 1 | WeightedEnsemble_L2 | 0.369843 | 0.369843 | 0.255293 | -7.175304 | 0.382837 | 1.170238 | 0.244729 | 18.831213 | 0.003925 | 0.001318 | 0.003048 | 2 | True | 2 |


Notice that `log_loss` scores are negative. This is because metrics in AutoGluon are always shown in `higher_is_better` form. This means that metrics such as `log_loss` and `root_mean_squared_error` will have their signs __FLIPPED__, and values will be negative. This way, you never need to know a metric’s direction to tell which model is better when reading the leaderboard.

One additional caveat: It is possible that `log_loss` values can be `-inf` when computed via `extra_metrics`. This is because the models were not optimized with `log_loss` in mind during training and may assign a predicted probability of exactly 0 to a class (particularly common with K Nearest Neighbors models). Because `log_loss` gives infinite error when the correct class is given 0 probability, this results in a score of `-inf`. It is therefore recommended that `log_loss` not be used as a secondary metric to determine model quality. Either use `log_loss` as the `eval_metric` or avoid it altogether.
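
Both points about `log_loss` can be reproduced in a few lines of plain Python. The sketch below computes the mean negative log-probability of the correct class, flips its sign to get a higher-is-better score, and shows how a single zero probability on the correct class yields `-inf`. The labels and probabilities are toy values, and the helper functions are illustrative rather than AutoGluon's implementation.

```python
import math


def log_loss(y_true, proba_rows):
    """Mean negative log-probability assigned to the correct class."""
    total = 0.0
    for label, probs in zip(y_true, proba_rows):
        p = probs.get(label, 0.0)
        if p == 0.0:
            # The correct class was ruled out entirely: the error is infinite.
            return math.inf
        total -= math.log(p)
    return total / len(y_true)


def higher_is_better(loss):
    """Flip the sign so that larger scores always mean better models."""
    return -loss


y_true = ["Sales", "Craft-repair"]
soft_preds = [
    {"Sales": 0.5, "Craft-repair": 0.5},
    {"Sales": 0.75, "Craft-repair": 0.25},
]
# Overconfident predictions: the second row gives the true class probability 0.
hard_preds = [
    {"Sales": 1.0, "Craft-repair": 0.0},
    {"Sales": 1.0, "Craft-repair": 0.0},
]

soft_score = higher_is_better(log_loss(y_true, soft_preds))
hard_score = higher_is_better(log_loss(y_true, hard_preds))
print(soft_score)  # a finite negative number
print(hard_score)  # -inf
```

Note that the overconfident model gets half of the rows right, so its accuracy is the same as the soft model's, yet its `log_loss` score is unboundedly worse.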

### Selecting individual models
Here’s how to specify a particular model to use for prediction instead of AutoGluon’s default model-choice:

In [21]:
# index of model to use
i = 0
model_to_use = predictor.get_model_names()[i]
model_pred = predictor.predict(datapoint, model=model_to_use)
print(f"Prediction from {model_to_use} model: {model_pred.iloc[0]}")

Prediction from LightGBM_BAG_L1 model:  Other-service


We can easily access information about the trained predictor or a particular model:

In [22]:
all_models = predictor.get_model_names()
model_to_use = all_models[i]
specific_model = predictor._trainer.load_model(model_to_use)

# Objects defined below are dicts with information (not printed here as they are quite large):
model_info = specific_model.get_info()
predictor_information = predictor.info()

The predictor also remembers which metric predictions should be evaluated with, which can be done with ground truth labels as follows:

In [23]:
y_pred_proba = predictor.predict_proba(test_data_nolabel)
perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred_proba)

Evaluation: accuracy on test data: 0.36984338212713685
Evaluations on test data:
{
    "accuracy": 0.36984338212713685,
    "balanced_accuracy": 0.2552934550252014,
    "mcc": 0.29567504858557747
}


Since the label column remains in the `test_data` DataFrame, we can instead use the shorthand:

In [24]:
perf = predictor.evaluate(test_data)

Evaluation: accuracy on test data: 0.36984338212713685
Evaluations on test data:
{
    "accuracy": 0.36984338212713685,
    "balanced_accuracy": 0.2552934550252014,
    "mcc": 0.29567504858557747
}


___
## Interpretability: Feature importance
To better understand our trained predictor, we can estimate the overall importance of each feature:

In [25]:
predictor.feature_importance(test_data)

Computing feature importance via permutation shuffling for 14 features using 1000 rows with 3 shuffle sets...
	24.92s	= Expected runtime (8.31s per shuffle set)
	6.35s	= Actual runtime (Completed 3 of 3 shuffle sets)


| | importance | stddev | p_value | n | p99_high | p99_low |
|---|---|---|---|---|---|---|
| workclass | 0.081333 | 0.010066 | 0.002534 | 3 | 0.139015 | 0.023651 |
| education-num | 0.081 | 0.022517 | 0.012402 | 3 | 0.210023 | -0.048023 |
| sex | 0.049333 | 0.016166 | 0.016989 | 3 | 0.141965 | -0.043299 |
| hours-per-week | 0.024333 | 0.00611 | 0.010188 | 3 | 0.059345 | -0.010678 |
| age | 0.019667 | 0.002082 | 0.001857 | 3 | 0.031595 | 0.007738 |
| education | 0.004667 | 0.003055 | 0.059041 | 3 | 0.022172 | -0.012839 |
| capital-gain | 0.001333 | 0.003055 | 0.264298 | 3 | 0.018839 | -0.016172 |
| race | 0.001333 | 0.000577 | 0.028595 | 3 | 0.004642 | -0.001975 |
| class | 0.001 | 0.001732 | 0.211325 | 3 | 0.010925 | -0.008925 |
| native-country | 0.0 | 0.0 | 0.5 | 3 | 0.0 | 0.0 |


Computed via permutation-shuffling, these feature importance scores quantify the drop in predictive performance (of the already trained predictor) when one column’s values are randomly shuffled across rows. The top features in this list contribute most to AutoGluon’s accuracy (for predicting an individual’s occupation). Features with non-positive importance scores hardly contribute to the predictor’s accuracy, or may even be actively harmful to include in the data (consider removing these features from your data and calling `fit` again). These scores facilitate interpretability of the predictor’s global behavior (which features it relies on for all predictions) rather than local explanations that only rationalize one particular prediction.
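
The permutation-shuffling idea can be sketched without any ML library. The example below uses a toy "model" that reads only feature 0, shuffles one column at a time, and records the average drop in accuracy over a few shuffles. The data, the model, and the `permutation_importance` helper are all hypothetical; AutoGluon's `feature_importance()` additionally reports the statistics shown in the table above.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model output equals the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_features, n_shuffles=3, seed=0):
    """Average accuracy drop when one feature column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_shuffles):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_shuffles)
    return importances


# Toy data: the label depends only on feature 0; feature 1 is unrelated.
rows = [(i % 2, (i * 7) % 5) for i in range(200)]
labels = [r[0] for r in rows]


def model(row):
    """Stand-in for a trained model that relies on feature 0 only."""
    return row[0]


importances = permutation_importance(model, rows, labels, n_features=2)
print(importances)
```

Shuffling feature 0 destroys most of the accuracy, while shuffling feature 1 changes nothing, so its importance is exactly zero, mirroring the `native-country` row above.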


___
## Inference Speed: Model distillation

While computationally-favorable, single individual models will usually have lower accuracy than weighted/stacked/bagged ensembles. Model Distillation offers one way to retain the computational benefits of a single model, while enjoying some of the accuracy-boost that comes with ensembling. The idea is to train the individual model (which we can call the student) to mimic the predictions of the full stack ensemble (the teacher). Like `refit_full()`, the `distill()` function will produce additional models we can opt to use for prediction.
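
A minimal way to picture "mimicking the teacher" is a soft-target loss: the student is scored against the teacher's full probability vector instead of a single hard label. The sketch below uses a cross-entropy between teacher and student probabilities, with made-up numbers. It illustrates the general idea only and is not the loss or augmentation procedure that `distill()` uses internally.

```python
import math


def soft_cross_entropy(teacher_probs, student_probs, eps=1e-12):
    """Cross-entropy of the student's probabilities against the teacher's soft targets."""
    return -sum(
        t * math.log(max(s, eps)) for t, s in zip(teacher_probs, student_probs)
    )


# Hypothetical class probabilities for one row (three classes).
teacher = [0.7, 0.2, 0.1]
close_student = [0.65, 0.25, 0.10]
far_student = [0.1, 0.2, 0.7]

loss_perfect = soft_cross_entropy(teacher, teacher)
loss_close = soft_cross_entropy(teacher, close_student)
loss_far = soft_cross_entropy(teacher, far_student)
print(loss_perfect, loss_close, loss_far)
```

The loss is smallest when the student reproduces the teacher's probabilities exactly and grows as the two distributions diverge, so minimizing it pulls the student toward the ensemble's behavior, including its uncertainty between classes.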

### Training student models

In [26]:
# Specify much longer time limit in real applications
student_models = predictor.distill(time_limit=30)
student_models

Distilling with teacher='WeightedEnsemble_L2', teacher_preds=soft, augment_method=spunge ...
SPUNGE: Augmenting training data with 19990 synthetic samples for distillation...
Distilling with each of these student models: ['LightGBM_DSTL', 'NeuralNetMXNet_DSTL', 'RandomForestMSE_DSTL', 'CatBoost_DSTL']
Fitting model: LightGBM_DSTL ... Training model for up to 30.0s of the 30.0s of remaining time.
	Ran out of time, early stopping on iteration 161. Best iteration is:
	[161]	train_set's soft_log_loss: -2.16912	valid_set's soft_log_loss: -1.8642
	Note: model has different eval_metric than default.
	-1.8642	 = Validation soft_log_loss score
	30.57s	 = Training runtime
	0.09s	 = Validation runtime
Distilling with each of these student models: ['WeightedEnsemble_L2_DSTL']
Fitting model: WeightedEnsemble_L2_DSTL ... Training model for up to 30.0s of the -1.21s of remaining time.
	Note: model has different eval_metric than default.
	-1.8642	 = Validation soft_log_loss score
	0.0s	 = Training run

['LightGBM_DSTL', 'WeightedEnsemble_L2_DSTL']

In [27]:
preds_student = predictor.predict(test_data_nolabel, model=student_models[0])
print(f"predictions from {student_models[0]}: {list(preds_student)[:5]}")

predictions from LightGBM_DSTL: [' Other-service', ' Farming-fishing', ' Exec-managerial', ' Other-service', ' Other-service']


In [28]:
predictor.leaderboard(test_data, silent=True)

| | model | score_test | score_val | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LightGBM_DSTL | 0.371891 | 0.4 | 1.227564 | 0.087048 | 30.5737 | 1.227564 | 0.087048 | 30.5737 | 1 | True | 3 |
| 1 | WeightedEnsemble_L2_DSTL | 0.371891 | 0.4 | 1.229951 | 0.089478 | 30.578162 | 0.002387 | 0.00243 | 0.004462 | 2 | True | 4 |
| 2 | LightGBM_BAG_L1 | 0.369843 | 0.382837 | 1.079025 | 0.243411 | 18.828165 | 1.079025 | 0.243411 | 18.828165 | 1 | True | 1 |
| 3 | WeightedEnsemble_L2 | 0.369843 | 0.382837 | 1.084935 | 0.244729 | 18.831213 | 0.005911 | 0.001318 | 0.003048 | 2 | True | 2 |


### Presets

If you know inference latency or memory will be an issue, then you can adjust the training process accordingly to ensure `fit()` does not produce unwieldy models.

One option is to specify more lightweight presets:

In [29]:
presets = ["good_quality_faster_inference_only_refit", "optimize_for_deployment"]

predictor_light = TabularPredictor(label=label, eval_metric=metric).fit(
    train_data, presets=presets, time_limit=30
)

No path specified. Models will be saved in: "AutogluonModels/ag-20210814_184406/"
Presets specified: ['good_quality_faster_inference_only_refit', 'optimize_for_deployment']
Beginning AutoGluon training ... Time limit = 30s
AutoGluon will save models to "AutogluonModels/ag-20210814_184406/"
AutoGluon Version:  0.2.0
Train Data Rows:    4500
Train Data Columns: 14
Preprocessing data ...
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == object).
	First 10 (of 15) unique label values:  [' Sales', ' Adm-clerical', ' ?', ' Prof-specialty', ' Other-service', ' Machine-op-inspct', ' Craft-repair', ' Exec-managerial', ' Handlers-cleaners', ' Transport-moving']
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Fraction of data from classes with at least 10 examples that will be kept for training models: 0.99955555555

### Lightweight hyperparameters
Another option is to specify more lightweight hyperparameters:

In [30]:
predictor_light = TabularPredictor(label=label, eval_metric=metric).fit(
    train_data, hyperparameters="very_light", time_limit=30
)

No path specified. Models will be saved in: "AutogluonModels/ag-20210814_184441/"
Beginning AutoGluon training ... Time limit = 30s
AutoGluon will save models to "AutogluonModels/ag-20210814_184441/"
AutoGluon Version:  0.2.0
Train Data Rows:    4500
Train Data Columns: 14
Preprocessing data ...
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == object).
	First 10 (of 15) unique label values:  [' Sales', ' Adm-clerical', ' ?', ' Prof-specialty', ' Other-service', ' Machine-op-inspct', ' Craft-repair', ' Exec-managerial', ' Handlers-cleaners', ' Transport-moving']
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Fraction of data from classes with at least 10 examples that will be kept for training models: 0.9995555555555555
Train Data Class Count: 14
Using Feature Generators to preprocess the data ...
Fittin

Here you can set hyperparameters to either `'light'`, `'very_light'`, or `'toy'` to obtain progressively smaller (but less accurate) models and predictors. Advanced users may instead try manually specifying particular models’ hyperparameters in order to make them faster/smaller.

### Excluding models

Finally, you may also exclude specific unwieldy models from being trained at all. Below we exclude models that tend to be slower (K Nearest Neighbors, Neural Network, models with custom larger-than-default hyperparameters):

In [31]:
excluded_model_types = ["KNN", "NN", "custom"]
predictor_light = TabularPredictor(label=label, eval_metric=metric).fit(
    train_data, excluded_model_types=excluded_model_types, time_limit=30
)

No path specified. Models will be saved in: "AutogluonModels/ag-20210814_184514/"
Beginning AutoGluon training ... Time limit = 30s
AutoGluon will save models to "AutogluonModels/ag-20210814_184514/"
AutoGluon Version:  0.2.0
Train Data Rows:    4500
Train Data Columns: 14
Preprocessing data ...
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == object).
	First 10 (of 15) unique label values:  [' Sales', ' Adm-clerical', ' ?', ' Prof-specialty', ' Other-service', ' Machine-op-inspct', ' Craft-repair', ' Exec-managerial', ' Handlers-cleaners', ' Transport-moving']
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Fraction of data from classes with at least 10 examples that will be kept for training models: 0.9995555555555555
Train Data Class Count: 14
Using Feature Generators to preprocess the data ...
Fittin

<p style="padding: 10px; border: 1px solid black;">
<img src=".././images/MLU-NEW-logo.png" alt="drawing" width="400"/> <br/>

# Thank you!