# Tabular Regression with ThirdAI's Universal Deep Transformer
This notebook shows how to build a tabular regression model with ThirdAI's Universal Deep Transformer (UDT) model, our all-purpose classifier for tabular datasets. In this demo, we will train and evaluate the model on the Mercedes Benz Greener Manufacturing dataset, but you can easily replace this with your own dataset.

To run this notebook, you will need to obtain a ThirdAI license at the following link if you have not already: https://www.thirdai.com/try-bolt/

In [None]:
!pip3 install -r requirements.txt
!pip3 install thirdai --upgrade

# Dataset Download
We will use the utils module in this repo to download the Mercedes Benz Greener Manufacturing dataset. If you have copied only this notebook rather than cloning the entire repo, you will need to copy the utils.py file as well. You can replace this step and the next with a download method and a UDT initialization step specific to your own dataset.

In [None]:
import utils

train_filename, test_filename, test_x, test_y = utils.download_mercedes_manufacturing_data()

# UDT Initialization
We can now create a UDT model by passing in the types of each column in the dataset and the target column we want to be able to predict.

In [None]:
from thirdai import bolt, data

model = bolt.UniversalDeepTransformer(
    data_types=data.get_udt_col_types(train_filename),
    target="y",
)
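
To make "the types of each column" concrete, here is a minimal sketch that labels each CSV column as numerical or categorical using only the standard library. It is an illustration of the idea, not how `data.get_udt_col_types` is implemented; the function name `infer_column_kinds` and the sample rows are made up for this example.

```python
import csv
import io


def infer_column_kinds(csv_file):
    """Label each column "numerical" if every value parses as a float,
    otherwise "categorical". Hypothetical helper, for illustration only."""
    reader = csv.DictReader(csv_file)
    kinds = {name: "numerical" for name in (reader.fieldnames or [])}
    for row in reader:
        for name, value in row.items():
            if kinds.get(name) != "numerical":
                continue
            try:
                float(value)
            except (TypeError, ValueError):
                # A single non-numeric value makes the column categorical.
                kinds[name] = "categorical"
    return kinds


# Made-up sample rows in the general shape of a tabular CSV file.
sample_csv = io.StringIO("ID,y,X0,X10\n0,130.81,k,0\n1,88.53,az,1\n")
column_kinds = infer_column_kinds(sample_csv)
print(column_kinds)
```

For a real file, you would pass an open file handle instead of the in-memory sample, for example `open(train_filename, newline="")`.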

# Training
We can now train our UDT model with just one line! Feel free to customize the number of epochs and the learning rate; we have chosen values that give good convergence.

In [None]:
model.train(train_filename, epochs=25, learning_rate=0.001)

# Evaluation
Evaluating the UDT model takes just two lines: one to get predictions on the test set and one to compute the mean squared error against the true targets.

In [None]:
import numpy as np

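# Run the model on the held-out test file
y_pred = model.evaluate(test_filename)
# Mean squared error between the predictions and the true targets
np.mean(np.square(y_pred - test_y))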
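
If you want more than a single number, the sketch below computes a few common regression metrics with NumPy. It flattens both arrays first. That guards against a subtle pitfall: if predictions arrive as a column of shape (n, 1) while the labels have shape (n,), subtracting them broadcasts to an (n, n) matrix and silently gives the wrong error. Whether this applies depends on the shapes `model.evaluate` and `utils` return, which are not assumed here. The helper name `regression_metrics` and the sample values are made up for illustration.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for two equally sized arrays."""
    # Flatten so that (n, 1) predictions line up with (n,) labels.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must contain the same number of values")

    residuals = y_pred - y_true
    mse = float(np.mean(residuals ** 2))
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "mae": float(np.mean(np.abs(residuals))),
        # R^2 is undefined when all true values are identical.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


# Made-up values: labels of shape (4,), predictions as a (4, 1) column.
example_metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [[1.0], [2.0], [3.0], [6.0]])
print(example_metrics)
```

In this notebook you could call `regression_metrics(test_y, y_pred)`, assuming both hold one value per test row.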

# Saving and Loading
Saving a trained UDT model to disk and loading it back is also straightforward.

In [None]:
save_location = "tabular_regression.model"

# Saving
model.save(save_location)

# Loading
model = bolt.UniversalDeepTransformer.load(save_location)

# Testing Predictions
The evaluate method is great for testing, but it requires labels, which don't exist in a production setting. For that case, the predict method takes a single in-memory row and the predict_batch method takes a batch of rows (both without the target column), allowing easy integration into production pipelines.

In [None]:
prediction = model.predict(test_x[0])
print("Label:", test_y[0], "Prediction:", prediction, "\n")

prediction_batch = model.predict_batch(test_x[:20])

print("Batch Prediction Results")
for label, prediction in zip(test_y[:20], prediction_batch):
    print("Label:", label, "Prediction:", prediction)
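
If you are bringing your own data, you will need to build the in-memory rows yourself. The sketch below reads a CSV file and splits each row into an input dictionary (with the target column removed) and a numeric label. It assumes rows are represented as dictionaries mapping column names to string values. That is an assumption about the input format, so check utils.py or the ThirdAI documentation before relying on it. The helper name `split_rows_and_labels` and the sample rows are made up for illustration.

```python
import csv
import io


def split_rows_and_labels(csv_file, target):
    """Return (rows, labels): rows are dicts without the target column,
    labels are the target values parsed as floats."""
    rows, labels = [], []
    for row in csv.DictReader(csv_file):
        # Remove the target so the row looks like unlabeled production input.
        labels.append(float(row.pop(target)))
        rows.append(dict(row))
    return rows, labels


# Made-up sample rows; "y" is the target column, as in this notebook.
sample_csv = io.StringIO("ID,y,X0,X1\n0,130.81,k,v\n1,88.53,az,t\n")
sample_rows, sample_labels = split_rows_and_labels(sample_csv, target="y")
print(sample_rows[0], sample_labels[0])
```

Under that assumption, `sample_rows[0]` would play the role of `test_x[0]`, and `sample_rows` the role of the batch.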