# WS 12 AutoML with AutoGluon Hands-on Module

We start by installing the `autogluon` and `ucimlrepo` packages with pip.

In [1]:
!pip install autogluon
!pip install ucimlrepo

Collecting autogluon
  Downloading autogluon-1.5.0-py3-none-any.whl.metadata (12 kB)
Collecting autogluon.core==1.5.0 (from autogluon.core[all]==1.5.0->autogluon)
  Downloading autogluon_core-1.5.0-py3-none-any.whl.metadata (13 kB)
Collecting autogluon.features==1.5.0 (from autogluon)
  Downloading autogluon_features-1.5.0-py3-none-any.whl.metadata (12 kB)
Collecting autogluon.tabular==1.5.0 (from autogluon.tabular[all]==1.5.0->autogluon)
  Downloading autogluon_tabular-1.5.0-py3-none-any.whl.metadata (16 kB)
Collecting autogluon.multimodal==1.5.0 (from autogluon)
  Downloading autogluon_multimodal-1.5.0-py3-none-any.whl.metadata (13 kB)
Collecting autogluon.timeseries==1.5.0 (from autogluon.timeseries[all]==1.5.0->autogluon)
  Downloading autogluon_timeseries-1.5.0-py3-none-any.whl.metadata (13 kB)
Collecting boto3<2,>=1.10 (from autogluon.core==1.5.0->autogluon.core[all]==1.5.0->autogluon)
  Downloading boto3-1.42.46-py3-none-any.whl.metadata (6.8 kB)
Collecting autogluon.common==1.5

Now we import packages and load two datasets: the heart disease data from the UCI Machine Learning Repository and the Pima Indian Diabetes Dataset hosted in the GitHub repo.

In [2]:

import pandas as pd
from ucimlrepo import fetch_ucirepo

In [None]:
# load in the heart disease dataset from UCI
heart_disease = fetch_ucirepo(id=45)

# data (as pandas dataframes)
X = heart_disease.data.features
y = heart_disease.data.targets



In [9]:

# variable information
print(heart_disease.variables)

        name     role         type demographic  \
0        age  Feature      Integer         Age   
1        sex  Feature  Categorical         Sex   
2         cp  Feature  Categorical        None   
3   trestbps  Feature      Integer        None   
4       chol  Feature      Integer        None   
5        fbs  Feature  Categorical        None   
6    restecg  Feature  Categorical        None   
7    thalach  Feature      Integer        None   
8      exang  Feature  Categorical        None   
9    oldpeak  Feature      Integer        None   
10     slope  Feature  Categorical        None   
11        ca  Feature      Integer        None   
12      thal  Feature  Categorical        None   
13       num   Target      Integer        None   

                                          description  units missing_values  
0                                                None  years             no  
1                                                None   None             no  
2              

In [10]:
# finalize the heart disease dataset in a single DataFrame with predictors and labels
# convert the multi-valued `num` target to a binary label (1 = heart disease, 0 = no heart disease)
heart_disease_df = X.assign(binary_label=(y['num'] > 0).astype(int))
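
As a small illustration of the label conversion above, the sketch below applies the same `> 0` rule to a toy `num` column. The values are made up for illustration and `toy_targets` is a hypothetical stand-in for the UCI targets DataFrame; in the UCI data, `num` is an integer from 0 (no disease) to 4.

```python
import pandas as pd

# Toy stand-in for the UCI targets DataFrame (hypothetical values, not real data)
toy_targets = pd.DataFrame({"num": [0, 2, 0, 4, 1]})

# Any value above 0 indicates heart disease, so compare and cast to int
toy_binary = (toy_targets["num"] > 0).astype(int)

print(toy_binary.tolist())  # [0, 1, 0, 1, 1]
```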

In [11]:
# load in the diabetes dataset from the GitHub repository
diabetes_df = pd.read_csv('https://github.com/btwooton/arch_workshop_automl_ws14/raw/refs/heads/main/data/diabetes.csv')

In [12]:
diabetes_df

     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0              6      148             72             35        0  33.6                     0.627   50        1
1              1       85             66             29        0  26.6                     0.351   31        0
2              8      183             64              0        0  23.3                     0.672   32        1
3              1       89             66             23       94  28.1                     0.167   21        0
4              0      137             40             35      168  43.1                     2.288   33        1
...          ...      ...            ...            ...      ...   ...                       ...  ...      ...
763           10      101             76             48      180  32.9                     0.171   63        0
764            2      122             70             27        0  36.8                     0.340   27        0
765            5      121             72             23      112  26.2                     0.245   30        0
766            1      126             60              0        0  30.1                     0.349   47        1


Now we split each of the two datasets into training and test sets with an 80%/20% split.

In [16]:
# splitting the heart disease dataset into training and test sets using DataFrame.sample()
hd_train = heart_disease_df.sample(frac=0.8)
hd_test = heart_disease_df.drop(hd_train.index)

In [23]:
hd_train['binary_label'].value_counts()

binary_label  count
           0    128
           1    114


In [24]:
hd_test['binary_label'].value_counts()

binary_label  count
           0     36
           1     25


In [19]:
# splitting the diabetes dataset into training and test sets
diabetes_train = diabetes_df.sample(frac=0.8)
diabetes_test = diabetes_df.drop(diabetes_train.index)

In [20]:
diabetes_train['Outcome'].value_counts()

Outcome  count
      0    408
      1    206


In [21]:
diabetes_test['Outcome'].value_counts()

Outcome  count
      0     92
      1     62
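
`DataFrame.sample()` without a seed draws a different split on every run, and the share of positive cases can drift between the training and test sets, as the counts above show. One possible alternative is a seeded, stratified split built on pandas' `groupby(...).sample(...)`. The sketch below is a minimal version under stated assumptions: the helper name `stratified_split`, the seed value, and the toy DataFrame are illustrative and not part of the original notebook, and the DataFrame index is assumed to be unique.

```python
import pandas as pd

def stratified_split(df, label, frac=0.8, seed=0):
    """Sample `frac` of the rows within each class, so both splits keep
    roughly the same class ratio. Assumes a unique index."""
    train = df.groupby(label, group_keys=False).sample(frac=frac, random_state=seed)
    test = df.drop(train.index)
    return train, test

# Toy data: 60 negatives and 40 positives (made-up values)
toy = pd.DataFrame({"x": range(100), "label": [0] * 60 + [1] * 40})
toy_train, toy_test = stratified_split(toy, "label")

print(toy_train["label"].value_counts().to_dict())
print(toy_test["label"].value_counts().to_dict())
```

With the notebook's data, the equivalent calls would be `stratified_split(heart_disease_df, 'binary_label')` and `stratified_split(diabetes_df, 'Outcome')`. The resulting counts, and every metric computed from them, would then differ from the outputs shown in this notebook.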


Now we use AutoGluon's `TabularPredictor` class to fit a set of classifiers on each of the two datasets. AutoGluon scores the individual models on held-out validation data and combines the best-performing ones into a weighted ensemble.
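
To give a feel for what "weighted ensemble" means here, the sketch below shows a simplified greedy ensemble selection on validation predictions: models are added one at a time, with replacement, always choosing the one that most reduces validation log loss. Each model's weight is the fraction of rounds in which it was picked. This is a standalone illustration of the general idea, not AutoGluon's implementation. The function names, the number of rounds, and the synthetic predictions are all assumptions made for the example.

```python
import numpy as np

def log_loss(y_true, p, eps=1e-12):
    """Binary cross-entropy of predicted probabilities p against 0/1 labels."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def greedy_ensemble_weights(val_probs, y_val, n_rounds=20):
    """val_probs maps a model name to its P(class 1) on the validation rows."""
    names = list(val_probs)
    counts = {name: 0 for name in names}
    running_sum = np.zeros(len(y_val))
    for step in range(1, n_rounds + 1):
        # Pick the model whose addition gives the lowest validation loss
        best = min(
            names,
            key=lambda name: log_loss(y_val, (running_sum + val_probs[name]) / step),
        )
        counts[best] += 1
        running_sum = running_sum + val_probs[best]
    return {name: counts[name] / n_rounds for name in names}

# Synthetic validation labels and predictions (hypothetical models)
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=200)
val_probs = {
    "model_a": np.clip(0.2 + 0.6 * y_val + rng.normal(0, 0.15, 200), 0.01, 0.99),
    "model_b": np.clip(0.3 + 0.4 * y_val + rng.normal(0, 0.25, 200), 0.01, 0.99),
    "model_c": np.full(200, 0.5),  # uninformative baseline
}

weights = greedy_ensemble_weights(val_probs, y_val)
ensemble_probs = sum(w * val_probs[name] for name, w in weights.items())
print(weights)
print(log_loss(y_val, ensemble_probs))
```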

In [22]:
from autogluon.tabular import TabularPredictor

In [25]:
# Fitting a tabular predictor on the Heart Disease Dataset
predictor_hd = TabularPredictor(label='binary_label', eval_metric='roc_auc').fit(hd_train)

No path specified. Models will be saved in: "AutogluonModels/ag-20260210_234907"
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.5.0
Python Version:     3.12.12
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Thu Oct  2 10:42:05 UTC 2025
CPU Count:          2
Pytorch Version:    2.9.0+cpu
CUDA Version:       CUDA is not available
Memory Avail:       10.97 GB / 12.67 GB (86.6%)
Disk Space Avail:   85.47 GB / 107.72 GB (79.3%)
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='extreme'  : New in v1.5: The state-of-the-art for tabular data. Massively better than 'best' on datasets <100000 samples by using new Tabular Foundation Models (TFMs) meta-learned on https://tabarena.ai: TabPFNv2, TabICL, Mitra, TabDPT, and TabM. Require

In [26]:
predictor_diabetes = TabularPredictor(label='Outcome', eval_metric='roc_auc').fit(diabetes_train)

No path specified. Models will be saved in: "AutogluonModels/ag-20260211_000313"
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.5.0
Python Version:     3.12.12
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Thu Oct  2 10:42:05 UTC 2025
CPU Count:          2
Pytorch Version:    2.9.0+cpu
CUDA Version:       CUDA is not available
Memory Avail:       10.84 GB / 12.67 GB (85.5%)
Disk Space Avail:   85.42 GB / 107.72 GB (79.3%)
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='extreme'  : New in v1.5: The state-of-the-art for tabular data. Massively better than 'best' on datasets <100000 samples by using new Tabular Foundation Models (TFMs) meta-learned on https://tabarena.ai: TabPFNv2, TabICL, Mitra, TabDPT, and TabM. Require

In [28]:
# refit each predictor's models on all of the training data (train plus validation) with refit_full(), which may give slightly better performance
predictor_hd.refit_full()
predictor_diabetes.refit_full()

Refitting models via `predictor.refit_full` using all of the data (combined train and validation)...
	Models trained in this way will have the suffix "_FULL" and have NaN validation score.
	This process is not bound by time_limit, but should take less time than the original `predictor.fit` call.
	To learn more, refer to the `.refit_full` method docstring which explains how "_FULL" models differ from normal models.
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBMXT_FULL ...
	Fitting with cpus=1, gpus=0, mem=0.0/10.7 GB
	0.73s	 = Training   runtime
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM_FULL ...
	Fitting with cpus=1, gpus=0, mem=0.0/10.7 GB
	0.67s	 = Training   runtime
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: RandomForestGini_FULL ...
	Fitting with cpus=2, gpus=0
	0.61s	 = Training   runtime
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: RandomForestEntr_FULL ...
	Fitting with cpus=2

{'LightGBMXT': 'LightGBMXT_FULL',
 'LightGBM': 'LightGBM_FULL',
 'RandomForestGini': 'RandomForestGini_FULL',
 'RandomForestEntr': 'RandomForestEntr_FULL',
 'CatBoost': 'CatBoost_FULL',
 'ExtraTreesGini': 'ExtraTreesGini_FULL',
 'ExtraTreesEntr': 'ExtraTreesEntr_FULL',
 'NeuralNetFastAI': 'NeuralNetFastAI_FULL',
 'XGBoost': 'XGBoost_FULL',
 'NeuralNetTorch': 'NeuralNetTorch_FULL',
 'LightGBMLarge': 'LightGBMLarge_FULL',
 'WeightedEnsemble_L2': 'WeightedEnsemble_L2_FULL'}

Now we evaluate the models on the test datasets and show a leaderboard with a performance breakdown across all models trained during construction of the ensemble.
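
Both predictors were fitted with `eval_metric='roc_auc'`. ROC AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting as one half. The pure-Python sketch below implements that definition on made-up scores. The function name `pairwise_roc_auc` is hypothetical, and AutoGluon computes the metric with its own scorers rather than this quadratic-time loop.

```python
def pairwise_roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # A pair counts 1 if the positive scores higher, 0.5 on a tie, 0 otherwise
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and predicted scores (illustrative values only)
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

toy_auc = pairwise_roc_auc(toy_labels, toy_scores)
print(toy_auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```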

In [31]:
predictor_hd.evaluate(hd_test)

If you only need to load model weights and optimizer state, use the safe `Learner.load` instead.
  warn("load_learner` uses Python's insecure pickle module, which can execute malicious arbitrary code when loading. Only load files you trust.\nIf you only need to load model weights and optimizer state, use the safe `Learner.load` instead.")


{'roc_auc': np.float64(0.9222222222222223),
 'accuracy': 0.8688524590163934,
 'balanced_accuracy': np.float64(0.8827777777777778),
 'mcc': np.float64(0.753106668091906),
 'f1': 0.8571428571428571,
 'precision': 0.7741935483870968,
 'recall': 0.96}

In [32]:
predictor_diabetes.evaluate(diabetes_test)

If you only need to load model weights and optimizer state, use the safe `Learner.load` instead.
  warn("load_learner` uses Python's insecure pickle module, which can execute malicious arbitrary code when loading. Only load files you trust.\nIf you only need to load model weights and optimizer state, use the safe `Learner.load` instead.")


{'roc_auc': np.float64(0.8374824684431978),
 'accuracy': 0.7532467532467533,
 'balanced_accuracy': np.float64(0.7093267882187939),
 'mcc': np.float64(0.4851234967443302),
 'f1': 0.6122448979591837,
 'precision': 0.8333333333333334,
 'recall': 0.4838709677419355}
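
The threshold-based metrics above can be cross-checked by hand. The sketch below takes the test-set class counts (36 negative / 25 positive for heart disease, 92 / 62 for diabetes) together with the reported precision and recall. The confusion-matrix counts it uses are the ones consistent with those numbers. They are inferred from the outputs, not printed by the notebook, and the helper name `binary_metrics` is an assumption for illustration.

```python
from math import sqrt

def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "balanced_accuracy": (recall + specificity) / 2,
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "precision": precision,
        "recall": recall,
    }

# Counts inferred from the reported precision/recall and the test-set class sizes
heart_metrics = binary_metrics(tp=24, fp=7, fn=1, tn=29)
diabetes_metrics = binary_metrics(tp=30, fp=6, fn=32, tn=86)

for name, metrics in [("heart", heart_metrics), ("diabetes", diabetes_metrics)]:
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

Read this way, the diabetes model recovers only 30 of the 62 positive test cases at the decision threshold used by `evaluate`, which explains the low recall alongside the high precision. The heart disease model misses just 1 of 25 positives but produces 7 false positives.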