# Tabular Quickstart

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/awslabs/autogluon/blob/new/docs/tutorials/tabular/tabular_quick_start.ipynb)
[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/awslabs/autogluon/blob/new/docs/tutorials/tabular/tabular_quick_start.ipynb)

In this tutorial, we will see how to use AutoGluon's `TabularPredictor` to predict the values of a target column based on the other columns in a tabular dataset.

To begin, make sure AutoGluon is installed, and then import AutoGluon's `TabularDataset` and `TabularPredictor`. We will use the former to load data and the latter to train models and make predictions. 

In [None]:
!python -m pip install --upgrade pip
!python -m pip install autogluon

In [4]:
from autogluon.tabular import TabularDataset, TabularPredictor

The dataset we will use is from the cover story of [Nature issue 7887](https://www.nature.com/nature/volumes/600/issues/7887): [AI-guided intuition for math theorems](https://www.nature.com/articles/s41586-021-04086-x.pdf). The task is to predict a knot's signature based on its properties. We sampled 10K training and 5K test examples from the [original data](https://github.com/deepmind/mathematics_conjectures/blob/main/knot_theory.ipynb). The sampled dataset makes this tutorial run quickly, but AutoGluon can handle the full dataset if desired.

We load this dataset directly from a URL. Note that the `TabularDataset` class is a subclass of [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), and any pandas `DataFrame` methods can be used on it as well.

In [None]:
data_url = 'https://raw.githubusercontent.com/mli/ag-docs/main/knot_theory/'
train_data = TabularDataset(f'{data_url}train.csv')
train_data.head()

Our targets are stored in the "signature" column, which has 18 unique integers. pandas reads this column as plain integers rather than as a categorical type, but AutoGluon will recognize that the values are class labels.


In [None]:
label = 'signature'
train_data[label].describe()
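
As a quick check of how pandas sees a label column like this one, the sketch below uses a small made-up `Series` in place of `train_data[label]`. The values are illustrative and not drawn from the knot dataset; only standard pandas calls are used.

```python
import pandas as pd

# Hypothetical stand-in for train_data[label]; the values are made up.
labels = pd.Series([-4, -2, 0, 0, 2, 4, 0, -2], name="signature")

# pandas infers an integer dtype, not a categorical one.
is_integer = pd.api.types.is_integer_dtype(labels)

# Number of distinct classes and how many rows fall into each.
n_classes = labels.nunique()
class_counts = labels.value_counts().sort_index()

print(is_integer, n_classes)
print(class_counts)
```

On the real data, the same calls on `train_data[label]` should report the 18 distinct values mentioned above.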

We now construct a `TabularPredictor` by specifying the label column name and then train on the dataset with `TabularPredictor.fit()`. We don't need to specify any other hyperparameters. AutoGluon will recognize this is a multi-class classification task, perform automatic feature engineering, train multiple models, and then ensemble them to form the final predictions. 

In [None]:
predictor = TabularPredictor(label=label).fit(train_data)

Model fitting should take a few minutes or less depending on your CPU. You can make training faster by specifying the `time_limit` argument. For example, `fit(..., time_limit=60)` will stop training after 60 seconds. Higher time limits will generally result in better prediction performance, and excessively low time limits will prevent AutoGluon from training and ensembling a reasonable set of models.
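
The time budget can be wrapped in a small helper, sketched below. The helper name `fit_with_time_limit` is hypothetical and not part of AutoGluon; it only forwards to the `TabularPredictor(label=...).fit(..., time_limit=...)` call described above.

```python
def fit_with_time_limit(train_data, label, time_limit=60):
    """Train a TabularPredictor with a time budget given in seconds.

    Hypothetical convenience wrapper, not an AutoGluon function.
    """
    # Imported here so the helper can be defined before AutoGluon is installed.
    from autogluon.tabular import TabularPredictor

    # fit() returns the trained predictor, so the call can be chained.
    return TabularPredictor(label=label).fit(train_data, time_limit=time_limit)
```

In this notebook, `predictor = fit_with_time_limit(train_data, label, time_limit=60)` would replace the earlier `fit` call. A larger budget is worth trying when the resulting scores look weak.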

Once training is done, we can load a separate set of data to use for prediction and evaluation.

In [26]:
test_data = TabularDataset(f'{data_url}test.csv')

y_pred = predictor.predict(test_data.drop(columns=[label]))
y_pred.head()

Loaded data from: https://raw.githubusercontent.com/mli/ag-docs/main/knot_theory/test.csv | Columns = 19 / 19 | Rows = 5000 -> 5000


0   -4
1   -2
2    0
3    4
4    2
Name: signature, dtype: int64

If you just want to evaluate the model performance, you can call the {func}`autogluon.tabular.TabularPredictor.evaluate` method.

In [27]:
predictor.evaluate(test_data, silent=True)

{'accuracy': 0.95,
 'balanced_accuracy': 0.7619277504882699,
 'mcc': 0.9387411901257484}
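
The gap between `accuracy` and `balanced_accuracy` in the output above is typical when some classes are rare. The sketch below illustrates why, assuming the usual definitions: accuracy is the fraction of correct predictions, and balanced accuracy is the mean of the per-class recall. It is plain Python on made-up labels, not AutoGluon's own implementation.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recall, so every class counts equally."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of examples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)

# Toy labels (made up for illustration): class 0 is common, class 4 is rare.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 4, 4]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 4, 0]

acc = accuracy(y_true, y_pred)               # 9 of 10 predictions are correct
bal_acc = balanced_accuracy(y_true, y_pred)  # mean of recalls 8/8 and 1/2
print(acc, bal_acc)
```

A single mistake on the rare class barely moves accuracy but halves that class's recall, which pulls balanced accuracy down much further.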

This concludes our quick walkthrough of using AutoGluon for tabular prediction. We used two classes: {class}`autogluon.tabular.TabularDataset` (essentially a pandas DataFrame) to load data and {class}`autogluon.tabular.TabularPredictor` to train (via the `fit` method) and predict (via the `predict` method). You will see similar APIs for other tasks, namely a `Dataset` class to load data and a `Predictor` class to train and predict.


In addition, AutoGluon simplifies model training by not requiring you to engineer features or specify model hyperparameters; it performs both jobs automatically when you run `fit`. If you are worried about the resulting longer training time, note that AutoGluon balances computational cost against model quality. You can benchmark AutoGluon's performance on the whole dataset loaded above against your favorite machine learning model. To be fair, though, you should also count the time you spend preprocessing data and tuning your models.

```{seealso}
To learn more about AutoGluon, you can next read:

- the cheatsheet for a quick overview of the APIs
- the tutorials on customizing training and inference
- how AutoGluon performs feature engineering and model ensembling
```