## Introduction

The goal here is to measure the Mean Absolute Error (MAE) of our model on the test data and compare it to a baseline.

This step checks that the model actually learned something during training before we continue.

## Imports

In [1]:
import tensorflow as tf
import pandas as pd
import numpy as np
import os


## Loading the test data

In [2]:
export_dir = "./exports/test_data/"
test_dataset = pd.read_parquet(export_dir + "test_dataset.pq")

## Creating the TensorFlow test dataset

In [3]:
user_columns = [f"avg_feat_{i}" for i in range(31)] + [f"avg_category_{i}" for i in range(1, 40)]
video_columns = ["video_duration", "trend_score"] + [f"feat_{i}" for i in range(31)] + [f"category_{i}" for i in range(1, 40)]
label_column = ["watch_ratio"]

X_user_test = test_dataset[user_columns].to_numpy()
X_video_test = test_dataset[video_columns].to_numpy()
y_test = test_dataset[label_column].values

dataset = tf.data.Dataset.from_tensor_slices(((X_user_test, X_video_test),))
BATCH_SIZE = 2048
dataset = dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)



## Loading the model

In [4]:
export_dir = "./exports/trained_model/"
model = tf.keras.models.load_model(export_dir + "model.keras")

## Comparing to a baseline on the test data

In order to know if our model learned something, we have to compare it to a baseline.

The baseline we chose always outputs the average watch ratio, regardless of the features; in the code below this average is computed over the test labels. We use the MAE (Mean Absolute Error) to compare the baseline to our model.

The goal is for our trained neural network to be better than this baseline.
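
A side note on the choice of constant: among all constant predictions, the median of the labels (not the mean) is the one that minimizes MAE, so a median baseline is at least as strong as a mean baseline under this metric. The sketch below illustrates this on synthetic, skewed labels. The data and the helper `constant_mae` are hypothetical and are not taken from the notebook's dataset.

```python
import numpy as np

# Synthetic, right-skewed labels standing in for watch ratios (an assumption,
# not the real test set).
rng = np.random.default_rng(0)
y_fake = rng.exponential(scale=0.5, size=10_000)

def constant_mae(y, value):
    """MAE of a predictor that always outputs `value` (hypothetical helper)."""
    return float(np.mean(np.abs(y - value)))

mean_baseline_mae = constant_mae(y_fake, np.mean(y_fake))
median_baseline_mae = constant_mae(y_fake, np.median(y_fake))

print(f"Mean-constant baseline MAE:   {mean_baseline_mae:.4f}")
print(f"Median-constant baseline MAE: {median_baseline_mae:.4f}")
```

If the watch ratio distribution is skewed, it may be worth reporting the median baseline as well, since it is the harder constant to beat under MAE.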

In [5]:
baseline_predictions = np.full_like(y_test, np.mean(y_test), dtype=np.float32)
baseline_mae = np.mean(np.abs(baseline_predictions.flatten() - y_test.flatten()))
print(f"Baseline MAE: {baseline_mae:.4f}")

model_predictions = model.predict(dataset, verbose=False)
model_mae = np.mean(np.abs(model_predictions.flatten() - y_test.flatten()))
print(f"Trained model mean absolute error: {model_mae:.4f}")

Baseline MAE: 0.4207
Trained model mean absolute error: 0.3485


Our model reduces the MAE from 0.4207 to 0.3485, a clear improvement over the baseline, which is promising.
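
To put the gap in relative terms, the two printed values can be compared directly. This is a small sketch that hard-codes the numbers from the output above. The variable names `absolute_gain` and `relative_gain` are illustrative only.

```python
# Values copied from the cell output above.
baseline_mae = 0.4207
model_mae = 0.3485

# Absolute and relative reduction in MAE compared to the baseline.
absolute_gain = baseline_mae - model_mae
relative_gain = absolute_gain / baseline_mae

print(f"Absolute MAE reduction: {absolute_gain:.4f}")
print(f"Relative MAE reduction: {relative_gain:.1%}")
```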