## Demo: learning with the $\mathcal{T}$-similarity

In this notebook, we show how to fit and predict with the diverse ensemble estimator illustrated in Figure 2 of the original [paper](https://arxiv.org/pdf/2310.14814.pdf).

In [5]:
%reload_ext autoreload
%autoreload 2

In [6]:
import matplotlib.pyplot as pl
import numpy as np
import seaborn as sns
import sys
import os

sys.path.append("..")
from src.datasets.read_dataset import RealDataSet
from src.models.diverse_ensemble import DiverseEnsembleMLP

custom_params = {"axes.grid": False}
sns.set_theme(style="ticks", rc=custom_params)
sns.set_context("talk")
pl.rcParams.update({"figure.autolayout": True})

In [7]:
dataset_name = "mushrooms"
seed = 0
nb_lab_samples_per_class = 40
num_epochs = 5
gamma = 1

# Fixed params
test_size = 0.25
n_iters = 100
n_classifiers = 5

# Data split
dataset = RealDataSet(dataset_name=dataset_name, seed=seed)

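# Fraction of the non-test data to label, chosen so that the labeled set
# holds nb_lab_samples_per_class * num_classes examples in total
num_classes = len(set(dataset.y))
ratio = num_classes / ((1 - test_size) * len(dataset.y))
lab_size = nb_lab_samples_per_class * ratio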

real_biases = ["IID", "SSB"]
for i, selection_bias in enumerate([False, True]):
    # Split
    x_l, x_u, y_l, y_u, x_test, y_test, n_classes = dataset.get_split(
        test_size=test_size, lab_size=lab_size, selection_bias=selection_bias
    )

    # Define base classifier
    base_classifier = DiverseEnsembleMLP(
        num_epochs=num_epochs,
        gamma=gamma,
        n_iters=n_iters,
        n_classifiers=n_classifiers,
        device="cpu",
        verbose=False,
        random_state=seed,
    )

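    # Train on the labeled set (x_l, y_l) and the unlabeled set x_u
    base_classifier.fit(x_l, y_l, x_u)

    # Evaluate: test accuracy (in %) and mean T-similarity on the test set
    test_acc = (base_classifier.predict(x_test) == y_test).mean() * 100
    tsim = base_classifier.predict_t_similarity(x_test).mean()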
    print(f"Selection bias: {real_biases[i]}")
    print(
        f"The supervised prediction head achieves an accuracy of {test_acc:.3}% on the test set."
    )
    print(f"The average T-similarity on the test set is {tsim:.3}. \n")

Selection bias: IID
The supervised prediction head achieves an accuracy of 97.7% on the test set.
The average T-similarity on the test set is 0.694. 

Selection bias: SSB
The supervised prediction head achieves an accuracy of 67.1% on the test set.
The average T-similarity on the test set is 0.583. 
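
The output above reports a lower average $\mathcal{T}$-similarity under SSB (0.583) than under IID sampling (0.694). To make the quantity concrete, here is a minimal NumPy sketch. It assumes, following our reading of the paper, that the $\mathcal{T}$-similarity of a sample is the average dot product between the class-probability vectors of every pair of distinct ensemble heads. The function name `t_similarity` and the `(M, n, K)` input layout are illustrative choices, not the repository's API, and this is not the code behind `predict_t_similarity`.

```python
import numpy as np


def t_similarity(probs):
    """Average pairwise agreement between ensemble heads (illustrative sketch).

    probs: array of shape (M, n, K) holding, for each of M heads, the
    predicted class probabilities of n samples over K classes.
    Returns an array of shape (n,), one score per sample.
    """
    probs = np.asarray(probs, dtype=float)
    n_heads = probs.shape[0]
    if n_heads < 2:
        raise ValueError("At least two heads are needed to compare predictions.")

    # gram[i, m, l] = dot product between heads m and l on sample i
    gram = np.einsum("mnk,lnk->nml", probs, probs)

    # Sum over all ordered pairs of heads, then drop the m == l terms
    total = gram.sum(axis=(1, 2))
    self_terms = np.trace(gram, axis1=1, axis2=2)

    return (total - self_terms) / (n_heads * (n_heads - 1))


# Heads that all predict the same class with full confidence agree perfectly
confident = np.tile(np.array([1.0, 0.0]), (3, 4, 1))  # shape (3, 4, 2)
print(t_similarity(confident))  # every entry equals 1.0

# Heads that all output the uniform distribution over K = 4 classes score 1 / K
uniform = np.full((5, 2, 4), 0.25)
print(t_similarity(uniform))  # every entry equals 0.25
```

Under this definition the score lies between 0 and 1 when the inputs are probability vectors. It drops when the heads disagree or are individually unconfident, which is the behaviour a diverse ensemble is meant to show on samples far from the labeled data.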

