# Criteo Click-Through Prediction Using ThirdAI's Universal Deep Transformer (UDT) APIs
This notebook shows how to build a click-through prediction model using ThirdAI's UDT.

In [None]:
!pip3 install thirdai --upgrade

# Download Dataset

# UDT Initialization


We can create a UDT model for Criteo click-through prediction as follows. Here we define the Bolt data type of every column, specify the target column and the number of target classes (`n_target_classes`), and set the embedding dimension through `options`.

In [None]:
from thirdai import bolt
import numpy as np
from sklearn.metrics import roc_auc_score


tabular_model = bolt.UniversalDeepTransformer(
    data_types={
        "numeric_1": bolt.types.numerical(range=(0, 1500)),
        "numeric_2": bolt.types.numerical(range=(0, 1500)),
        "numeric_3": bolt.types.numerical(range=(0, 1500)),
        "numeric_4": bolt.types.numerical(range=(0, 1500)),
        "numeric_5": bolt.types.numerical(range=(0, 1500)),
        "numeric_6": bolt.types.numerical(range=(0, 1500)),
        "numeric_7": bolt.types.numerical(range=(0, 1500)),
        "numeric_8": bolt.types.numerical(range=(0, 1500)),
        "numeric_9": bolt.types.numerical(range=(0, 1500)),
        "numeric_10": bolt.types.numerical(range=(0, 1500)),
        "numeric_11": bolt.types.numerical(range=(0, 1500)),
        "numeric_12": bolt.types.numerical(range=(0, 1500)),
        "numeric_13": bolt.types.numerical(range=(0, 1500)),
        "cat_1": bolt.types.categorical(),
        "cat_2": bolt.types.categorical(),
        "cat_3": bolt.types.categorical(),
        "cat_4": bolt.types.categorical(),
        "cat_5": bolt.types.categorical(),
        "cat_6": bolt.types.categorical(),
        "cat_7": bolt.types.categorical(),
        "cat_8": bolt.types.categorical(),
        "cat_9": bolt.types.categorical(),
        "cat_10": bolt.types.categorical(),
        "cat_11": bolt.types.categorical(),
        "cat_12": bolt.types.categorical(),
        "cat_13": bolt.types.categorical(),
        "cat_14": bolt.types.categorical(),
        "cat_15": bolt.types.categorical(),
        "cat_16": bolt.types.categorical(),
        "cat_17": bolt.types.categorical(),
        "cat_18": bolt.types.categorical(),
        "cat_19": bolt.types.categorical(),
        "cat_20": bolt.types.categorical(),
        "cat_21": bolt.types.categorical(),
        "cat_22": bolt.types.categorical(),
        "cat_23": bolt.types.categorical(),
        "cat_24": bolt.types.categorical(),
        "cat_25": bolt.types.categorical(),
        "cat_26": bolt.types.categorical(),
        "label": bolt.types.categorical(),
    },
    target="label",
    n_target_classes=2,
    options={"embedding_dimension": 512},
)
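
Before training, it can help to confirm that the column names in `data_types` match the header of the data file. The sketch below is an illustration only: the helper names `expected_columns` and `missing_columns` are hypothetical, and it assumes a comma-separated file whose first line is a header. It uses only the standard library and does not call any ThirdAI API.

```python
import csv
import io


def expected_columns(n_numeric=13, n_categorical=26, target="label"):
    """Column names used in the data_types mapping above (hypothetical helper)."""
    numeric = [f"numeric_{i}" for i in range(1, n_numeric + 1)]
    categorical = [f"cat_{i}" for i in range(1, n_categorical + 1)]
    return numeric + categorical + [target]


def missing_columns(header_line, expected):
    """Return the expected column names that do not appear in a CSV header line."""
    header = next(csv.reader(io.StringIO(header_line)))
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]


columns = expected_columns()
# Example header for illustration; the real file's column order is an assumption.
sample_header = ",".join(["label"] + columns[:-1])
print(len(columns), missing_columns(sample_header, columns))
```

An empty list from `missing_columns` means every name in the mapping was found in the header.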

# Training

We will now train the UDT with a single line of code. Here we specify the training file name, the number of epochs to train, and the maximum number of batches to load into memory at once. You can adjust `max_in_memory_batches` to fit the RAM available on your machine.

In [None]:


tabular_model.train(filename="/share/data/criteo_tb/bigfile.txt", epochs=1, max_in_memory_batches=1000)
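
To illustrate what capping the number of in-memory batches means, here is a minimal standard-library sketch of chunked reading. It is not ThirdAI's implementation; `iter_chunks`, `batch_size`, and `max_batches` are hypothetical names, and the only idea shown is that at most `batch_size * max_batches` rows are held in memory at a time.

```python
import io
from itertools import islice


def iter_chunks(lines, batch_size, max_batches):
    """Yield lists of at most batch_size * max_batches rows from an iterable of lines."""
    rows_per_chunk = batch_size * max_batches
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, rows_per_chunk))
        if not chunk:
            return
        yield chunk


# Ten fake rows stand in for a large training file.
fake_file = io.StringIO("".join(f"row_{i}\n" for i in range(10)))
chunk_sizes = [len(chunk) for chunk in iter_chunks(fake_file, batch_size=2, max_batches=2)]
print(chunk_sizes)
```

A smaller cap lowers peak memory use, since fewer rows are resident at once.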



# Evaluation

Evaluating the UDT also takes a single line of code. By default, `evaluate` returns the activations. We then use sklearn's `roc_auc_score` to compute the ROC AUC of the model we trained here.

In [None]:
test_file = "/share/data/criteo_tb/train_csvs/test"

# evaluate() returns one row of activations per test example.
activations = tabular_model.evaluate(filename=test_file, metrics=["categorical_accuracy"])

# Read the ground-truth labels from the first column of the test file.
true_labels = np.zeros(activations.shape[0], dtype=np.float32)
with open(test_file) as f:
    f.readline()  # skip the header row
    for i, line in enumerate(f):
        true_labels[i] = np.float32(line.split(",")[0])

# Column 1 of the activations is used as the score for label 1.
roc_auc = roc_auc_score(true_labels, activations[:, 1])

print("ROC_AUC:", roc_auc)
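
For intuition about the metric, ROC AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes this directly on toy data. `pairwise_auc` is a hypothetical helper written for illustration; it is quadratic in the number of examples and is not a replacement for `roc_auc_score`.

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and scores, not results from the model above.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(toy_labels, toy_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of clicks above non-clicks.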