# Building Classifiers and Regressors in Teras

teras 0.3 offers backbone models which are task independent, i.e. they are headless. Hence they can be coupled with a classification head or a regression head to build a classifier or a regressor respectively.


Let's see how easy it is!

For the purpose of this tutorial, we'll take on a binary classification task using the famous [Adult Income](https://archive.ics.uci.edu/dataset/2/adult) dataset. And for our model, we'll use the ``TabTransformerBackbone`` [arXiv](https://arxiv.org/abs/2012.06678).

But, first, let's set our backend. My personal preference is JAX, as among other things, it's the most efficient and fun to play with. So, for this tutorial I'll be using that!

**NOTE:** You must configure your Keras backend before importing ``teras`` or ``keras``

In [1]:
import os
os.environ["KERAS_BACKEND"] = "jax"

We'll take on a classification task using the famous Adult Income dataset from the UCI dataset repository.

In [2]:
!curl https://archive.ics.uci.edu/static/public/2/adult.zip --output adult.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  605k    0  605k    0     0  57806      0 --:--:--  0:00:10 --:--:-- 72609


In [3]:
!unzip adult.zip

Archive:  adult.zip
replace Index? [y]es, [n]o, [A]ll, [N]one, [r]ename: ^C


Since you're here, you must already be familiar with the tabular machine learning ecosystem, the pandas, the sklearn etc. So, I'll assume you understand the following boilerplate code to load and preprocess dataset.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder, Normalizer


# Load dataset
columns = ["age", "workclass", "fnlwgt", "education", "education-num", 
           "marital", "occupation", "relationship", "race", "sex",
           "capital-gain", "capital-loss", "hours-per-week", "native-country",
           ">50K"]
continous_columns = ["age", "fnlwgt", "education-num", "capital-gain",
                     "capital-loss", "hours-per-week"]
categorical_columns = list(set(columns) - set(continous_columns))
target_column = ">50K"
df = pd.read_csv("adult.data", names=columns, header=None)

# Ordinally encode categorical values
encoder = OrdinalEncoder()
df[categorical_columns] = encoder.fit_transform(df[categorical_columns])
# Normalize continuous features
normalizer = Normalizer()
df[continous_columns] = normalizer.fit_transform(df[continous_columns])
y = df.pop(target_column)
X = df

# Split into train and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Let's now import the ``TabTransformerBackbone``. In addition, we'll also import a ``Classifier`` model class that will wrap our backbone model to give us a model ready to be trained for the classification task at hand. You can think of the backbone and the classifier class as LEGO pieces, combined they make a classification model, but they can both be combined with other different LEGO pieces made availble by teras, like the ``Regressor`` class for instance, to build a model for regression.

Anyway, let's get to coding! 

In [5]:
from teras.models import TabTransformerBackbone
from teras.models import Classifier

2024-04-10 11:34:48.108866: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-10 11:34:48.108954: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-10 11:34:48.110946: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


Looking at the [documentation](file:///home/abaid/projects/teras/docs/build/html/_autosummary/teras.models.TabTransformerBackbone.html#teras.models.TabTransformerBackbone) of ``TabTransformerBackbone``, we see that it requires 3 positional arguments, in the order ``input_dim``, ``cardinalities`` and ``embedding_dim``.

Now ``input_dim`` and ``embedding_dim`` is easy. ``input_dim`` is equal to the dimensionality of the dataset and for ``embedding_dim`` we can pass any reasonable value like 32, 64 etc. 

'So... am i going to have to compute cardinalities myself, or is there a quick way of doing it?' you might ask. And, sure enough, unsprisingly, teras offers a handy utility function ``compute_cardinalities`` for this purpose. Let's import it as follows

In [6]:
from teras.utils import compute_cardinalities

In [7]:
categorical_idx = [idx for idx, col in enumerate(columns) 
                   if col in categorical_columns]

cardinalties = compute_cardinalities(X_train.values,
                                     categorical_idx=categorical_idx)
cardinalties    # A value of zero indicates a continous feature

array([ 0,  9,  0, 16,  0,  7, 15,  6,  5,  2,  0,  0,  0, 42])

Now we're ready to instantiate our backbone.

In [8]:
backbone = TabTransformerBackbone(input_dim=X_train.shape[1],
                                  cardinalities=cardinalties,
                                  embedding_dim=32)

Let's print backbone model's summary to see what's going on under the hood. This is of great help when you want to get familiar with the underlying structure.

In [9]:
backbone.summary()

Let's now plug our backbone and classifier models

In [10]:
model = Classifier(backbone=backbone,
                   num_classes=1,
                   activation="sigmoid")

Before training our model, we need to compile it.
In this compile step, we specify the loss function and the optimizer to use for the model.

In [11]:
import keras

model.compile(loss=keras.losses.BinaryCrossentropy(),
              optimizer=keras.optimizers.RMSprop(),
              metrics=[keras.metrics.BinaryAccuracy()])

Now we're ready to train!

In [12]:
history = model.fit(X_train, y_train, epochs=2,
                    batch_size=512, validation_split=0.1)

Epoch 1/2
[1m46/46[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m54s[0m 876ms/step - binary_accuracy: 0.7196 - loss: 0.6436 - val_binary_accuracy: 0.7973 - val_loss: 0.4587
Epoch 2/2
[1m46/46[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m22s[0m 470ms/step - binary_accuracy: 0.7884 - loss: 0.4669 - val_binary_accuracy: 0.7939 - val_loss: 0.4549


And just like that, teras makes it super easy to make use of tabular deep learning!

If you have any questions or run into an issue, reach us at twitter 
[@TerasML](https://twitter.com/TerasML) or file an issue at [teras github repository](https://github.com/KhawajaAbaid/teras).