# Click-Through Prediction with ThirdAI's Universal Deep Transformer

This notebook shows how to build a click-through prediction model with ThirdAI's
Universal Deep Transformer (UDT), our all-purpose classifier for a wide range of problems.
In this demo, we train and evaluate the model on the Kaggle Display Advertising (Criteo) dataset,
but you can easily substitute your own dataset.

To run this notebook, you will need to obtain a ThirdAI license at the following link if you have not already:
https://www.thirdai.com/try-bolt/



In [None]:
!pip3 install thirdai==0.5.1
!pip3 install pandas # We need Pandas in the utils module to convert downloaded datasets to CSV format
!pip3 install numpy # We use numpy to analyze UDT performance in this notebook
!pip3 install scikit-learn # The PyPI package is named scikit-learn; it is imported as sklearn

# Dataset Download

We will use the utils module in this repo to download the Kaggle Display Advertising dataset. If you have copied only this notebook rather than cloning the entire repo, you will need to copy the utils.py file as well. You can replace this step with a downloader and preprocessor specific to your own dataset. The files that you pass to UDT's train and evaluate methods must be in CSV format, with a header row that lists all column names.
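
If you are preparing your own data, the following is a minimal sketch of producing CSV text with a header row, using only Python's standard library. The column names mirror the naming style used later in this notebook, the rows are made up for illustration, and `rows_to_csv` is a hypothetical helper, not part of the utils module.

```python
import csv
import io

# Hypothetical column layout: a label plus two numeric and two categorical
# columns, named in the same style as the columns used in this notebook.
columns = ["label", "num_1", "num_2", "cat_1", "cat_2"]

# Made-up rows for illustration only; these are not real Criteo records.
rows = [
    {"label": 0, "num_1": 1, "num_2": 5, "cat_1": "68fd1e64", "cat_2": "80e26c9b"},
    {"label": 1, "num_1": 2, "num_2": 44, "cat_1": "05db9164", "cat_2": "38a947a1"},
]


def rows_to_csv(rows, columns):
    """Serialize rows to CSV text whose first line is the header of column names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()  # the header row listing every column name
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows, columns)
print(csv_text)
```

To write a file instead of a string, the same writer can be pointed at `open(path, "w", newline="")`.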


In [None]:
import utils

(
    train_filename,
    test_filename,
    y_train,
    y_test,
    min_vals_of_numeric_cols,
    max_vals_of_numeric_cols,
    n_unique_classes,
    sample_batch,
) = utils.download_criteo()
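
The names of the returned values suggest that the downloader reports, for each numeric column, its minimum and maximum, and for each categorical column, its number of distinct values. If you write your own preprocessor, a sketch of computing comparable statistics might look like the following. This is an assumption about what those values represent, not a description of what utils.py actually does; `summarize_columns` is a hypothetical helper, and it does not handle missing values.

```python
def summarize_columns(rows, numeric_cols, categorical_cols):
    """Return (min_vals, max_vals, n_unique), each aligned with its column list."""
    # Per-column minimum and maximum for the numeric columns.
    min_vals = [min(float(row[col]) for row in rows) for col in numeric_cols]
    max_vals = [max(float(row[col]) for row in rows) for col in numeric_cols]
    # Number of distinct values observed in each categorical column.
    n_unique = [len({row[col] for row in rows}) for col in categorical_cols]
    return min_vals, max_vals, n_unique


# Made-up rows, as they might come out of csv.DictReader (all values are strings).
rows = [
    {"num_1": "1", "num_2": "5", "cat_1": "a", "cat_2": "x"},
    {"num_1": "7", "num_2": "-2", "cat_1": "b", "cat_2": "x"},
    {"num_1": "3", "num_2": "0", "cat_1": "a", "cat_2": "x"},
]

min_vals, max_vals, n_unique = summarize_columns(
    rows, numeric_cols=["num_1", "num_2"], categorical_cols=["cat_1", "cat_2"]
)
print(min_vals, max_vals, n_unique)
```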

# UDT Initialization

We can now create a UDT model by passing in the types of each column in the dataset
and the target column we want to be able to predict.

In [None]:
from thirdai import bolt
from sklearn.metrics import roc_auc_score
import numpy as np

# The dataset has 13 numerical columns (num_1..num_13) and
# 26 categorical columns (cat_1..cat_26), plus the binary label.
data_types = {
    f"num_{i + 1}": bolt.types.numerical(
        range=(min_vals_of_numeric_cols[i], max_vals_of_numeric_cols[i])
    )
    for i in range(13)
}
data_types.update(
    {
        f"cat_{i + 1}": bolt.types.categorical(n_unique_classes=n_unique_classes[i])
        for i in range(26)
    }
)
data_types["label"] = bolt.types.categorical(n_unique_classes=2)

model = bolt.UniversalDeepTransformer(
    data_types=data_types,
    target="label",
)
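
With this many columns, a mismatch between the CSV header and the declared column names is an easy mistake to make. The sketch below compares a header against a list of expected names using only the standard library. `missing_and_extra_columns` is a hypothetical helper, and the sample data is made up; in this notebook you would pass an open handle to the training file and the keys of the `data_types` dictionary.

```python
import csv
import io


def missing_and_extra_columns(csv_file, expected_columns):
    """Compare a CSV header with the expected names; return (missing, extra)."""
    header = next(csv.reader(csv_file))  # the first row holds the column names
    expected = set(expected_columns)
    found = set(header)
    return sorted(expected - found), sorted(found - expected)


# Made-up example: "num_2" was declared but is absent, "cat_9" is undeclared.
expected_columns = ["num_1", "num_2", "cat_1", "label"]
sample = io.StringIO("label,num_1,cat_1,cat_9\n0,3,abc,def\n")

missing, extra = missing_and_extra_columns(sample, expected_columns)
print("missing:", missing, "extra:", extra)
```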

# Training

We can now train our UDT model with just two lines! You can customize the
number of epochs and the learning rate through the *train_config* object. If you do not pass a *train_config*, the *train()* method will auto-tune one for you.

In [None]:
train_config = bolt.TrainConfig(epochs=1, learning_rate=0.001)

model.train(train_filename, train_config)

# Evaluation

Evaluating the performance of the UDT model takes just a few lines as well! For this application, AUC (the area under the ROC curve) is the preferred metric. We first get the activations from the model and then compute the AUC from the activations of the positive class.

In [None]:
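# Ask evaluate() to return the raw output activations
eval_config = bolt.EvalConfig().return_activations()

activations = model.evaluate(test_filename, eval_config)

# Column 1 holds the activation for the positive class (label 1)
roc_auc = roc_auc_score(y_test, activations[:,1])
print(f"ROC AUC: {roc_auc:.4f}")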
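
To make the metric concrete: AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way in pure Python on made-up activations. It is a quadratic-time illustration of the definition, not how scikit-learn implements `roc_auc_score`; `positive_scores` and `pairwise_auc` are hypothetical helpers, and treating column 1 as the positive-class score follows the `activations[:,1]` indexing used above.

```python
def positive_scores(activations):
    """Take column 1 of each row, mirroring activations[:, 1] on a NumPy array."""
    return [row[1] for row in activations]


def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and two-column activations (one column per class).
labels = [0, 0, 1, 1]
activations = [[0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8]]

auc = pairwise_auc(labels, positive_scores(activations))
print(auc)  # 3 of the 4 positive/negative pairs are ordered correctly: 0.75
```

For real evaluation sets, prefer `roc_auc_score`, which avoids comparing every pair.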

# Saving and Loading

Saving a trained UDT model to disk and loading it back is also straightforward.

In [None]:
save_location = "click_through.model"

# Saving
model.save(save_location)

# Loading
model = bolt.UniversalDeepTransformer.load(save_location)


# Testing Predictions

The evaluation method is great for testing, but it requires labels, which don't
exist in a production setting. The model also provides *predict* and *predict_batch*
methods, which take a single in-memory row or a batch of rows (without the target
column), allowing easy integration into production pipelines.

In [None]:
# Predict on a single row
prediction = model.predict(sample_batch[0])

# Predict on a batch of rows
prediction_batch = model.predict_batch(sample_batch)
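
If your production rows come from labeled records, the target column has to be removed before prediction. The sketch below shows one way to do that, assuming each row is a dictionary that maps column names to values. That row format is an assumption, since the structure of `sample_batch` is not shown here, and `strip_target` is a hypothetical helper.

```python
def strip_target(row, target="label"):
    """Return a copy of the row without the target column."""
    return {col: val for col, val in row.items() if col != target}


# Made-up labeled rows for illustration only.
labeled_rows = [
    {"label": "0", "num_1": "1", "cat_1": "68fd1e64"},
    {"label": "1", "num_1": "2", "cat_1": "05db9164"},
]

unlabeled_batch = [strip_target(row) for row in labeled_rows]
print(unlabeled_batch)
```

The original rows are left unchanged, so the labels stay available for later comparison with the predictions.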