# Fit Presets

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mli/ag-docs/blob/main/tabular/fit/presets.ipynb)

When you call `fit`, AutoGluon explores a set of models so that you don't have to spend time on manual hyperparameter tuning. Exploring more models often gives better accuracy, but it also increases computational cost. There are several ways to balance model accuracy against computational cost. The easiest is the `presets` argument of the `fit` method.

A preset specifies a particular set of models and how they are combined for prediction. AutoGluon provides four presets: `medium_quality`, `good_quality`, `high_quality`, and `best_quality`. The following table lists the differences. Fit time, predict time, and disk usage are given relative to `medium_quality`.

| Preset | Model Quality | Fit Time | Predict Time | Disk Usage | Use Cases | 
|:---|:---|:---|:---|:---|:---|
| `best_quality` | Best | 16x  | 32x | 16x | When accuracy is what matters |
| `high_quality` | High | 16x  | 4x | 2x | When you need a very powerful solution with fast (batch) inference |
| `good_quality` | Good | 16x | 2x | 1x | When a powerful, highly portable solution with very fast inference is required: billion-scale batch inference, sub-100 ms online inference, edge devices |
| `medium_quality` | Medium | 1x | 1x | 1x | Initial prototyping, establishing a performance baseline |
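
Because every multiplier in the table is relative to `medium_quality`, a single measured `medium_quality` run lets you roughly budget the other presets. The sketch below copies the table into a dictionary and scales baseline measurements. `PRESET_COSTS` and `estimate_costs` are hypothetical names for illustration, not AutoGluon APIs. The baseline numbers are made up, and real costs depend on your data and hardware.

```python
# Relative costs copied from the preset table; every value is a multiple of
# the medium_quality preset (1x). These are rough guides, not guarantees.
PRESET_COSTS = {
    "medium_quality": {"fit": 1, "predict": 1, "disk": 1},
    "good_quality": {"fit": 16, "predict": 2, "disk": 1},
    "high_quality": {"fit": 16, "predict": 4, "disk": 2},
    "best_quality": {"fit": 16, "predict": 32, "disk": 16},
}


def estimate_costs(preset, medium_fit_s, medium_predict_s, medium_disk_mb):
    """Scale measured medium_quality costs by the table's multipliers."""
    m = PRESET_COSTS[preset]
    return {
        "fit_s": medium_fit_s * m["fit"],
        "predict_s": medium_predict_s * m["predict"],
        "disk_mb": medium_disk_mb * m["disk"],
    }


# Hypothetical medium_quality measurements: 100 s fit, 0.5 s predict, 50 MB on disk.
print(estimate_costs("high_quality", medium_fit_s=100, medium_predict_s=0.5, medium_disk_mb=50))
# {'fit_s': 1600, 'predict_s': 2.0, 'disk_mb': 100}
```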

We recommend starting with `medium_quality`, the default, to get a sense of the problem and to identify any data-related issues. It is the fastest option. You can accelerate it further by subsampling your data or by passing an appropriate `time_limit` argument to the `fit` method. 
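
Here is a minimal sketch of the subsampling idea. It assumes pandas is available. AutoGluon's `TabularDataset` is a subclass of `pandas.DataFrame`, so `DataFrame.sample` works on it. The helper `prototype_subset`, the 1,000-row cap, and the 60-second `time_limit` are hypothetical choices for illustration. The AutoGluon call is left as a comment so the snippet runs on its own.

```python
import pandas as pd


def prototype_subset(df, max_rows=1000, seed=0):
    """Return at most `max_rows` randomly chosen rows for a fast first fit."""
    if len(df) <= max_rows:
        return df
    # DataFrame.sample draws rows without replacement; random_state makes it repeatable.
    return df.sample(n=max_rows, random_state=seed)


# Synthetic stand-in for train_data, with as many rows as the knot theory training set.
demo_df = pd.DataFrame({"x": range(10000), "signature": [0, 2] * 5000})
subset = prototype_subset(demo_df, max_rows=1000)
print(len(subset))  # 1000

# With AutoGluon installed (not executed here), a quick baseline could be:
# TabularPredictor(label="signature").fit(subset, presets="medium_quality", time_limit=60)
```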

Once you are comfortable, try `best_quality` next. Make sure to give it a `time_limit` at least 16x the value you used for `medium_quality`. When it finishes, you should have a very powerful solution that is often stronger than `medium_quality`, especially on complex data. 
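
The "at least 16x the `time_limit`" rule can be written as a small helper that builds the `fit` keyword arguments for the follow-up run. `presets` and `time_limit` are the `fit` arguments named in the text. `best_quality_fit_kwargs` is a hypothetical helper, and the 120-second baseline is an assumed example value.

```python
def best_quality_fit_kwargs(medium_time_limit_s, factor=16):
    """Build fit() keyword arguments for a follow-up best_quality run.

    Scales the time_limit used for medium_quality by `factor`; the
    recommendation is a factor of at least 16.
    """
    if medium_time_limit_s <= 0:
        raise ValueError("medium_time_limit_s must be positive")
    if factor < 16:
        raise ValueError("give best_quality at least 16x the medium_quality time_limit")
    return {"presets": "best_quality", "time_limit": medium_time_limit_s * factor}


kwargs = best_quality_fit_kwargs(120)  # assumes the medium_quality run used time_limit=120
print(kwargs)  # {'presets': 'best_quality', 'time_limit': 1920}

# With AutoGluon installed (not executed here):
# predictor = TabularPredictor(label=label).fit(train_data, **kwargs)
```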

After evaluating both `best_quality` and `medium_quality`, check whether either satisfies your needs. If neither does, consider trying `high_quality` and/or `good_quality`. 
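
One way to reason about the middle presets is to pick the highest-quality preset whose predict-time and disk-usage multipliers fit within your deployment budget. The sketch below encodes that rule using the table's multipliers and its quality ordering (Best > High > Good > Medium). `PRESETS` and `pick_preset` are hypothetical names, and the rule is an illustrative simplification, not AutoGluon's own logic.

```python
# (preset, predict-time multiplier, disk-usage multiplier), ordered from
# highest to lowest model quality, copied from the preset table.
PRESETS = [
    ("best_quality", 32, 16),
    ("high_quality", 4, 2),
    ("good_quality", 2, 1),
    ("medium_quality", 1, 1),
]


def pick_preset(max_predict_x, max_disk_x):
    """Return the highest-quality preset within both cost budgets, or None."""
    for name, predict_x, disk_x in PRESETS:
        if predict_x <= max_predict_x and disk_x <= max_disk_x:
            return name
    return None


print(pick_preset(max_predict_x=5, max_disk_x=2))      # high_quality
print(pick_preset(max_predict_x=2, max_disk_x=1))      # good_quality
print(pick_preset(max_predict_x=100, max_disk_x=100))  # best_quality
```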

Now let's train a model with the `best_quality` preset and evaluate its performance.

```bash
# Install autogluon (in a Jupyter or Colab cell, prefix the command with !)
pip install autogluon==0.5.0
```

```python
# Load the knot theory data
from autogluon.tabular import TabularDataset, TabularPredictor

url = 'https://raw.githubusercontent.com/mli/ag-docs/main/knot_theory/'
train_data = TabularDataset(url+'train.csv')
test_data = TabularDataset(url+'test.csv')
label = 'signature'
```

```python
predictor = TabularPredictor(label=label).fit(
    train_data, presets='best_quality')
```

```text
Loaded data from: https://raw.githubusercontent.com/mli/ag-docs/main/knot_theory/train.csv | Columns = 19 / 19 | Rows = 10000 -> 10000
Loaded data from: https://raw.githubusercontent.com/mli/ag-docs/main/knot_theory/test.csv | Columns = 19 / 19 | Rows = 5000 -> 5000
No path specified. Models will be saved in: "AutogluonModels/ag-20220709_053417/"
Presets specified: ['best_quality']
Beginning AutoGluon training ...
AutoGluon will save models to "AutogluonModels/ag-20220709_053417/"
AutoGluon Version:  0.5.0
Python Version:     3.9.12
Operating System:   Linux
Train Data Rows:    10000
Train Data Columns: 18
Label Column: signature
Preprocessing data ...
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
	First 10 (of 13) unique label values:  [-2, 0, 2, -8, 4, -4, -6, 8, 6, 10]
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You
...
	0.956	 = Validation score   (accuracy)
	60.35s	 = Training   runtime
	0.6s	 = Validation runtime
Fitting model: WeightedEnsemble_L3 ...
	0.9569	 = Validation score   (accuracy)
	2.51s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 551.4s ... Best model: "WeightedEnsemble_L3"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20220709_053417/")
```


From the log you can see that more than 16x as many models were trained as with the default `medium_quality` preset in {doc}`../tabular_quick_start`. However, the training time increased only about 5x, from 1.7 min to 9 min on an Intel E5-2686 CPU, because many of the models can be trained in parallel. 
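
The "about 5x" figure is simple arithmetic on numbers from this page. The log reports a total runtime of 551.4 s, and the text gives 1.7 min for `medium_quality`. The snippet only recomputes that ratio and does not measure anything.

```python
# Numbers taken from this page: the "total runtime" line in the log above,
# and the medium_quality runtime quoted in the paragraph.
best_runtime_s = 551.4
medium_runtime_min = 1.7

best_runtime_min = best_runtime_s / 60
slowdown = best_runtime_min / medium_runtime_min

print(f"{best_runtime_min:.1f} min")  # 9.2 min
print(f"{slowdown:.1f}x")             # 5.4x
```

So the wall-clock slowdown here is roughly 5.4x, well below the 16x fit-time figure listed for `best_quality` in the preset table.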

```python
predictor.evaluate(test_data, silent=True)
```

```text
{'accuracy': 0.9524,
 'balanced_accuracy': 0.7701982259040702,
 'mcc': 0.9417052737452929}
```

You can see that the accuracy, and especially the balanced accuracy, is higher than with the default preset.
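
Balanced accuracy is commonly defined as the mean of per-class recall. This is what scikit-learn's `balanced_accuracy_score` computes. Plain accuracy instead weights each class by its frequency, so a balanced accuracy (0.7702) well below the accuracy (0.9524) means the errors are concentrated in the less frequent label values. Below is a minimal pure-Python sketch of both metrics on hypothetical labels. It is not AutoGluon's implementation.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall: every class counts equally, however rare."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical labels: 8 samples of class 0 and 2 samples of a rare class 10.
y_true = [0] * 8 + [10] * 2
y_pred = [0] * 8 + [0, 10]  # one rare-class sample is misclassified
print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
```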


```{seealso}
If none of the presets satisfy your requirements, you can manually specify the set of models to fit with their hyperparameters. Refer to {doc}`./model_hyperparameters` for more details.
```
