<a href="https://colab.research.google.com/github/akshatamadavi/data_mining/blob/main/autogluon/tabular-gpu.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Training models with GPU support

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/autogluon/autogluon/blob/stable/docs/tutorials/tabular/advanced/tabular-gpu.ipynb)
[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/autogluon/autogluon/blob/stable/docs/tutorials/tabular/advanced/tabular-gpu.ipynb)



Training with a GPU can significantly speed up base algorithms, and is a necessity for text and vision models, where training without a GPU is infeasibly slow.
The CUDA toolkit is required for GPU training. Please refer to the [official documentation](https://docs.nvidia.com/cuda/) for installation instructions.

To grant GPUs to the whole predictor, pass `num_gpus` to `fit`:

```python
predictor = TabularPredictor(label=label).fit(
    train_data,
    num_gpus=1,  # Grant 1 gpu for the entire Tabular Predictor
)
```
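Before requesting a GPU, it can help to confirm that one is visible to the driver. The following is a minimal sketch and not part of AutoGluon: `count_visible_gpus` is a hypothetical helper that shells out to `nvidia-smi -L` (which prints one line per device) and returns 0 when the tool is missing or fails.

```python
import shutil
import subprocess


def count_visible_gpus() -> int:
    """Return the number of GPUs listed by `nvidia-smi -L`, or 0 if unavailable.

    Hypothetical helper for illustration; AutoGluon performs its own detection.
    """
    if shutil.which("nvidia-smi") is None:
        return 0  # NVIDIA driver tools are not on PATH
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True, text=True, check=True, timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        return 0  # the tool exists but the driver is not responding
    # Each detected device is printed on its own line starting with "GPU <index>:"
    return sum(1 for line in result.stdout.splitlines() if line.startswith("GPU "))


num_gpus = min(1, count_visible_gpus())  # request at most one GPU, or 0 if none
print("num_gpus to pass to fit():", num_gpus)
```

The resulting value could then be passed as `num_gpus` in the `fit` call above, so the same notebook runs on CPU-only machines.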


To enable GPU acceleration on only specific models, the same parameter can be passed into model `hyperparameters`:

```python
hyperparameters = {
    'GBM': [
        {'ag_args_fit': {'num_gpus': 0}},  # Train with CPU
        {'ag_args_fit': {'num_gpus': 1}}   # Train with GPU. This amount needs to be <= total num_gpus granted to TabularPredictor
    ]
}
predictor = TabularPredictor(label=label).fit(
    train_data,
    num_gpus=1,
    hyperparameters=hyperparameters,
)
```
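The constraint noted in the comment above (a per-model request must not exceed the predictor-level total) can be checked before calling `fit`. This is a sketch in plain Python; `check_gpu_requests` is a hypothetical helper, not an AutoGluon function, and it assumes the `hyperparameters` layout shown in this tutorial.

```python
def check_gpu_requests(hyperparameters: dict, total_num_gpus: float) -> list:
    """Return (model, index, requested) tuples whose GPU request exceeds the total."""
    violations = []
    for model, configs in hyperparameters.items():
        # A model key maps either to one config dict or to a list of config dicts
        if isinstance(configs, dict):
            configs = [configs]
        for i, cfg in enumerate(configs):
            requested = cfg.get("ag_args_fit", {}).get("num_gpus", 0)
            if requested > total_num_gpus:
                violations.append((model, i, requested))
    return violations


hyperparameters = {
    "GBM": [
        {"ag_args_fit": {"num_gpus": 0}},  # Train with CPU
        {"ag_args_fit": {"num_gpus": 1}},  # Train with GPU
    ]
}
print(check_gpu_requests(hyperparameters, total_num_gpus=1))  # → []
print(check_gpu_requests(hyperparameters, total_num_gpus=0))  # → [('GBM', 1, 1)]
```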


## Multi-modal

In the [Multimodal Data Tables: Tabular, Text, and Image](../tabular-multimodal.ipynb) tutorial, we showed how to train an ensemble that can use tabular, text, and image data.
If the available GPUs do not have enough VRAM to fit the default model, or if testing needs to be faster, different backends can be used.

The regular configuration is retrieved like this:

In [1]:
!pip install autogluon.tabular[all]


In [2]:
from autogluon.tabular.configs.hyperparameter_configs import get_hyperparameter_config
hyperparameters = get_hyperparameter_config('multimodal')
hyperparameters

{'NN_TORCH': {},
 'GBM': [{},
  {'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}},
  {'learning_rate': 0.03,
   'num_leaves': 128,
   'feature_fraction': 0.9,
   'min_data_in_leaf': 3,
   'ag_args': {'name_suffix': 'Large',
    'priority': 0,
    'hyperparameter_tune_kwargs': None}}],
 'CAT': {},
 'XGB': {},
 'AG_AUTOMM': {}}
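One way to reduce GPU memory pressure, using only the `ag_args_fit` mechanism shown earlier, is to reserve the GPU for the multimodal model and keep the other models on CPU. This is a sketch under assumptions: the configuration is written out as a literal copy of the output above so that it runs without AutoGluon, `with_gpu_only_for` is a hypothetical helper, and whether every model type honors the setting is not verified here. It is also not the backend swap mentioned in the text, only a resource-level alternative.

```python
import copy

# Literal copy of the 'multimodal' configuration printed above
multimodal_config = {
    "NN_TORCH": {},
    "GBM": [
        {},
        {"extra_trees": True, "ag_args": {"name_suffix": "XT"}},
        {
            "learning_rate": 0.03,
            "num_leaves": 128,
            "feature_fraction": 0.9,
            "min_data_in_leaf": 3,
            "ag_args": {
                "name_suffix": "Large",
                "priority": 0,
                "hyperparameter_tune_kwargs": None,
            },
        },
    ],
    "CAT": {},
    "XGB": {},
    "AG_AUTOMM": {},
}


def with_gpu_only_for(config: dict, gpu_models: set, num_gpus: int = 1) -> dict:
    """Return a copy of config in which only gpu_models request a GPU."""
    updated = copy.deepcopy(config)  # leave the original configuration untouched
    for model, entries in updated.items():
        request = num_gpus if model in gpu_models else 0
        # Normalize a single config dict to a one-element list
        for entry in entries if isinstance(entries, list) else [entries]:
            entry.setdefault("ag_args_fit", {})["num_gpus"] = request
    return updated


hyperparameters = with_gpu_only_for(multimodal_config, {"AG_AUTOMM"})
print(hyperparameters["AG_AUTOMM"])  # → {'ag_args_fit': {'num_gpus': 1}}
print(hyperparameters["GBM"][0])     # → {'ag_args_fit': {'num_gpus': 0}}
```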

## Enabling GPU for LightGBM

The default installation of LightGBM does not support GPU training; however, GPU support can be enabled via a special install. If `num_gpus` is set and LightGBM was installed without GPU support, the following warning is displayed:

```
Warning: GPU mode might not be installed for LightGBM, GPU training raised an exception. Falling back to CPU training...
Refer to LightGBM GPU documentation: https://github.com/Microsoft/LightGBM/tree/master/python-package#build-gpu-version
One possible method is:
	pip uninstall lightgbm -y
	pip install lightgbm --install-option=--gpu
```


If the suggested commands do not work, uninstall the existing LightGBM package with `pip uninstall -y lightgbm` and install it from source following the instructions in the [official guide](https://lightgbm.readthedocs.io/en/latest/GPU-Tutorial.html). The optional [Install Python Interface](https://lightgbm.readthedocs.io/en/latest/GPU-Tutorial.html#install-python-interface-optional) section of that guide is also required for it to work with AutoGluon.
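To check whether the installed LightGBM build can actually train on a GPU, one option is to attempt a tiny fit outside AutoGluon. The sketch below is an assumption-laden probe rather than an official test: `lightgbm_gpu_works` is a hypothetical helper, it relies on LightGBM's `device` parameter being set to `"gpu"`, and it treats any exception as "GPU training not available".

```python
def lightgbm_gpu_works():
    """Return True/False if a tiny GPU fit succeeds/fails, or None if LightGBM is missing."""
    try:
        import lightgbm as lgb
        import numpy as np
    except ImportError:
        return None  # LightGBM (or NumPy) is not installed at all

    # Small synthetic binary classification problem
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] > 0).astype(int)

    params = {"objective": "binary", "device": "gpu", "verbose": -1}
    try:
        lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=1)
    except Exception:
        # CPU-only builds, or machines without a usable GPU, end up here
        return False
    return True


print("LightGBM GPU training available:", lightgbm_gpu_works())
```

A result of `False` suggests that AutoGluon would fall back to CPU training for LightGBM with the warning shown above.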

## Advanced Resource Allocation

Most of the time, you only need to set `num_cpus` and `num_gpus` at the predictor `fit` level to control the total resources granted to the TabularPredictor.
However, if you want more detailed control, the following options are available.

`ag_args_ensemble: ag_args_fit: { RESOURCES }` controls the total resources granted to a bagged model.
If the parallel folding strategy is used, the resources of each individual base model are calculated accordingly.
This value needs to be <= the total resources granted to the TabularPredictor.
This parameter is ignored if bagging is not enabled.

`ag_args_fit: { RESOURCES }` controls the total resources granted to a single base model.
This value needs to be <= the total resources granted to the TabularPredictor and, if applicable, <= the total resources granted to a bagged model.

As an example, consider the following scenario:

```python
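predictor.fit(
    train_data,
    num_cpus=32,  # Total resources granted to the TabularPredictor
    num_gpus=4,
    hyperparameters={
        'NN_TORCH': {},
    },
    num_bag_folds=2,
    ag_args_ensemble={
        'ag_args_fit': {
            'num_cpus': 10,  # Resources granted to each bagged model
            'num_gpus': 2,
        }
    },
    ag_args_fit={
        'num_cpus': 4,  # Resources granted to each individual base model
        'num_gpus': 0.5,
    },
    hyperparameter_tune_kwargs={
        'searcher': 'random',
        'scheduler': 'local',
        'num_trials': 2
    }
)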
```


In [4]:
# ============================
# 🚀 AutoGluon GPU Colab Setup
# ============================

# 1️⃣ Install dependencies
!pip install -U pip
!pip install -q "autogluon.tabular[all]"  # same package as installed above; the logged run used AutoGluon 1.4.0
!nvidia-smi  # ✅ Check GPU is available

# ============================
# 📦 Import Libraries
# ============================
from autogluon.tabular import TabularDataset, TabularPredictor
import pandas as pd

# ============================
# 📂 Load Sample Dataset
# ============================
train_data = TabularDataset("https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv")
test_data = TabularDataset("https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv")

print("Train shape:", train_data.shape)
print("Test shape:", test_data.shape)

label = 'class'
print("Label column:", label)

# ============================
# ⚙️ Train Models with GPU
# ============================
predictor = TabularPredictor(label=label, path="AutogluonModels/") \
    .fit(train_data, presets='best_quality', time_limit=600)

# ============================
# 📈 Evaluate Performance
# ============================
leaderboard = predictor.leaderboard(test_data, silent=True)
print(leaderboard)

# ============================
# 🔮 Make Predictions
# ============================
preds = predictor.predict(test_data)
print(preds[:10])

# ============================
# 💾 Save Model
# ============================
predictor.save()
print("Model saved to:", predictor.path)


Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.4.0
Python Version:     3.12.12
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Thu Oct  2 10:42:05 UTC 2025
CPU Count:          8
Memory Avail:       49.10 GB / 50.99 GB (96.3%)
Disk Space Avail:   190.35 GB / 235.68 GB (80.8%)
Presets specified: ['best_quality']
Using hyperparameters preset: hyperparameters='zeroshot'
Setting dynamic_stacking from 'auto' to True. Reason: Enable dynamic_stacking when use_bag_holdout is disabled. (use_bag_holdout=False)
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=1
DyStack is enabled (dynamic_stacking=True). AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence.
	This is used to identify the optimal `num_stack_levels` value. Copies of AutoGluon will be fit on subsets of the data. Then holdout validation data is used to detect stacked overfit

Train shape: (39073, 15)
Test shape: (9769, 15)
Label column: class


	Running DyStack sub-fit in a ray process to avoid memory leakage. Enabling ray logging (enable_ray_logging=True). Specify `ds_args={'enable_ray_logging': False}` if you experience logging issues.
2025-10-31 02:41:37,848	INFO worker.py:1843 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
		Context path: "/content/AutogluonModels/ds_sub_fit/sub_fit_ho"
(_dystack pid=6570) Running DyStack sub-fit ...
(_dystack pid=6570) Beginning AutoGluon training ... Time limit = 137s
(_dystack pid=6570) AutoGluon will save models to "/content/AutogluonModels/ds_sub_fit/sub_fit_ho"
(_dystack pid=6570) Train Data Rows:    34731
(_dystack pid=6570) Train Data Columns: 14
(_dystack pid=6570) Label Column:       class
(_dystack pid=6570) Problem Type:       binary
(_dystack pid=6570) Preprocessing data ...
(_dystack pid=6570) Selected class <--> label mapping:  class 1 =  >50K, 

                      model  score_test  score_val eval_metric  \
0            XGBoost_BAG_L1    0.876651   0.875771    accuracy   
1       WeightedEnsemble_L2    0.876651   0.875771    accuracy   
2           CatBoost_BAG_L1    0.876139   0.873775    accuracy   
3           LightGBM_BAG_L1    0.876036   0.874543    accuracy   
4      CatBoost_r177_BAG_L1    0.875115   0.874926    accuracy   
5      LightGBMLarge_BAG_L1    0.874092   0.873800    accuracy   
6         LightGBMXT_BAG_L1    0.870816   0.868298    accuracy   
7   RandomForestEntr_BAG_L1    0.861910   0.856781    accuracy   
8    NeuralNetFastAI_BAG_L1    0.861603   0.859238    accuracy   
9   RandomForestGini_BAG_L1    0.861501   0.856883    accuracy   
10    NeuralNetTorch_BAG_L1    0.859453   0.859187    accuracy   
11    ExtraTreesEntr_BAG_L1    0.853414   0.851765    accuracy   
12    ExtraTreesGini_BAG_L1    0.851878   0.850869    accuracy   

    pred_time_test  pred_time_val    fit_time  pred_time_test_marginal  \
0

TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/content/AutogluonModels")


Model saved to: /content/AutogluonModels


In the resource allocation example above (the `predictor.fit` call with `num_cpus=32` and `num_gpus=4`), we run 2 HPO trials, and each trial trains a bagged model with 2 folds. The total resources granted to the TabularPredictor are 32 CPUs and 4 GPUs.

Each bagged model is granted 10 CPUs and 2 GPUs.
The predictor-level budget therefore allows both HPO trials to run in parallel, reserving up to 20 CPUs and 4 GPUs in total.

Each individual base model is granted 4 CPUs and 0.5 GPUs, so the budget of a bagged model allows its two folds to train in parallel. That amounts to 8 CPUs and 1 GPU per bagged model, or 16 CPUs and 2 GPUs when the two trials run in parallel.

Therefore, the run uses 16 CPUs and 2 GPUs in total: two bagged-model trials run in parallel, each training two folds in parallel, for 4 models training at the same time.
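The arithmetic in this walkthrough can be reproduced with a few lines of Python. This is a simplified illustration of the reasoning in the text, not AutoGluon's actual scheduler; `parallel_plan` is a hypothetical helper that takes `(num_cpus, num_gpus)` tuples for each level and assumes all values are nonzero.

```python
import math


def parallel_plan(total, bagged, base, num_trials, num_folds):
    """Estimate concurrency from (num_cpus, num_gpus) budgets at each level."""
    # HPO trials (bagged models) that fit within the predictor-level budget
    trials = min(
        num_trials,
        math.floor(total[0] / bagged[0]),
        math.floor(total[1] / bagged[1]),
    )
    # Folds (base models) that fit within one bagged model's budget
    folds = min(
        num_folds,
        math.floor(bagged[0] / base[0]),
        math.floor(bagged[1] / base[1]),
    )
    models = trials * folds
    return {
        "trials_in_parallel": trials,
        "folds_in_parallel": folds,
        "models_in_parallel": models,
        "cpus_used": models * base[0],
        "gpus_used": models * base[1],
    }


plan = parallel_plan(
    total=(32, 4), bagged=(10, 2), base=(4, 0.5), num_trials=2, num_folds=2
)
print(plan)
# → {'trials_in_parallel': 2, 'folds_in_parallel': 2, 'models_in_parallel': 4, 'cpus_used': 16, 'gpus_used': 2.0}
```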

In [5]:
!nvidia-smi


Fri Oct 31 02:52:56 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   51C    P0             28W /   70W |     102MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
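When VRAM is the limiting factor, the memory figures in this table can also be read programmatically. The sketch below parses the CSV form that `nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv,noheader,nounits` produces; `parse_gpu_memory` is a hypothetical helper, and the sample row is typed in by hand from the Tesla T4 values shown above rather than captured from a live command.

```python
def parse_gpu_memory(csv_text: str) -> list:
    """Parse `name, memory.total, memory.used` rows (values in MiB, no header or units)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, total, used = [field.strip() for field in line.split(",")]
        gpus.append(
            {
                "name": name,
                "total_mib": int(total),
                "free_mib": int(total) - int(used),
            }
        )
    return gpus


# Hand-written sample matching the table above (15360 MiB total, 102 MiB used)
sample = "Tesla T4, 15360, 102"
print(parse_gpu_memory(sample))
# → [{'name': 'Tesla T4', 'total_mib': 15360, 'free_mib': 15258}]
```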
                                                