Repo retrieves only one class for binary classification tasks #61

sebastianpinedaar · 2024-05-28T09:08:24Z

Thanks for the nice package! It fosters research in different aspects of AutoML!

However, I am curious about the expected behavior when loading the predictions for binary classification tasks (e.g. the "Australian" dataset). According to the documentation, it should output a tensor with shape (n_configs, n_rows, n_classes). In practice, the code below yields predictions with shape (n_configs, n_rows).

Since it is a binary classification problem, I expected (n_configs, n_rows, n_classes) with n_classes = 2. Is the current setup returning only the probability of one class? If so, I can easily compute the probability of the other class; however, it would be better to output both directly. Otherwise, please let me know what I am missing.

To reproduce the issue:

```python
from tabrepo import load_repository, EvaluationRepository

context_name = "D244_F3_C1530_100"
repo: EvaluationRepository = load_repository(context_name, cache=True)

configs = ["CatBoost_r1_BAG_L1", "LightGBM_r41_BAG_L1"]
shape1 = repo.predict_val_multi(dataset="Australian", fold=0, configs=configs).shape  # binary task
shape2 = repo.predict_val_multi(dataset="autoUniv-au7-700", fold=0, configs=configs).shape  # 3-class task

print("Shape 1:", shape1)
print("Shape 2:", shape2)
```

Output:

```
Shape 1: (2, 621)
Shape 2: (2, 630, 3)
```
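If the 2-D array holds one class probability per row, the missing column is simply its complement, because the two class probabilities of a binary problem sum to 1. The sketch below rebuilds the (n_configs, n_rows, 2) layout with NumPy on a synthetic array shaped like the Australian output. Which class the stored value belongs to is an assumption here: the maintainers' replies in this thread say "first class" and "positive class" respectively, so the hypothetical `stored_class_index` parameter makes the column order explicit instead of guessing.

```python
import numpy as np


def binary_to_two_columns(pred: np.ndarray, stored_class_index: int = 1) -> np.ndarray:
    """Expand binary predictions from (n_configs, n_rows) to (n_configs, n_rows, 2).

    Hypothetical helper. `stored_class_index` is the class whose probability the
    2-D array is assumed to contain (1 = positive class, 0 = first class).
    """
    if pred.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {pred.shape}")
    if stored_class_index not in (0, 1):
        raise ValueError("stored_class_index must be 0 or 1")
    complement = 1.0 - pred  # probability of the other class
    columns = [complement, pred] if stored_class_index == 1 else [pred, complement]
    # Stack along a new trailing axis so the class dimension comes last.
    return np.stack(columns, axis=-1)


# Synthetic stand-in for the (2, 621) "Australian" output; not real TabRepo data.
rng = np.random.default_rng(0)
pred = rng.random((2, 621))
probs = binary_to_two_columns(pred)
print(probs.shape)  # (2, 621, 2)
print(np.allclose(probs.sum(axis=-1), 1.0))  # True
```

If the stored value turns out to be the first class rather than the positive one, pass `stored_class_index=0` and the columns swap accordingly.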

Regards,

Sebastian

geoalgo · 2024-05-29T11:54:10Z

Sorry for the confusion; our documentation indeed needs improvement there. We return a tensor with shape (n_configs, n_rows, n_classes) for multiclass classification, and a tensor with shape (n_configs, n_rows) for both regression and binary classification.

For binary classification, we return the probability of the first class IIRC.

Thanks for pointing this out, we will improve our doc to avoid this confusion.

Innixma · 2024-06-13T23:06:28Z

Yeah, @geoalgo's answer is correct. We only return the positive class prediction probability for efficiency: it halves the memory usage and runtime of these operations. It might be a good idea, though, for us to add a flag that users can set to return the multiclass representation, for ease of use.
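The factor of two follows from the shapes alone: dropping a class axis of length 2 halves the number of stored values. A quick NumPy check, using the Australian shapes from the issue and assuming float32 storage (the dtype is an assumption; the 2× ratio holds for any dtype):

```python
import numpy as np

n_configs, n_rows = 2, 621  # shape of the "Australian" output in the issue

# One probability per row (current binary layout) vs. one per row and class.
one_column = np.zeros((n_configs, n_rows), dtype=np.float32)
two_columns = np.zeros((n_configs, n_rows, 2), dtype=np.float32)

print(one_column.nbytes)   # 2 * 621 * 4 bytes = 4968
print(two_columns.nbytes)  # 2 * 621 * 2 * 4 bytes = 9936
print(two_columns.nbytes / one_column.nbytes)  # 2.0
```

Element-wise operations over the predictions also touch half as many values in the one-column layout, which is plausibly where the runtime saving comes from.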

sebastianpinedaar · 2024-06-18T09:15:16Z

Thanks for the information! Indeed, a flag would be useful, especially for anyone who wants to write general code that works for any number of classes.
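Until such a flag exists, a small wrapper can give downstream code a single layout to work with. This is a hypothetical sketch, not part of TabRepo: `as_multiclass` and `predicted_labels` are illustrative names, the problem-type strings "binary", "multiclass" and "regression" are assumed labels, and the binary array is assumed to hold the positive-class probability. The problem type must be passed in explicitly because, as explained above, binary and regression predictions share the 2-D (n_configs, n_rows) shape and cannot be told apart from the array alone.

```python
import numpy as np


def as_multiclass(pred: np.ndarray, problem_type: str) -> np.ndarray:
    """Give classification predictions a trailing class axis; leave regression as-is.

    Hypothetical helper emulating the flag discussed in this thread. Assumes a
    binary 2-D array holds the positive-class (index 1) probability.
    """
    if problem_type == "regression":
        return pred  # (n_configs, n_rows): there is no class axis to add
    if problem_type == "binary":
        if pred.ndim != 2:
            raise ValueError(f"binary predictions should be 2-D, got {pred.shape}")
        return np.stack([1.0 - pred, pred], axis=-1)  # (n_configs, n_rows, 2)
    if problem_type == "multiclass":
        if pred.ndim != 3:
            raise ValueError(f"multiclass predictions should be 3-D, got {pred.shape}")
        return pred  # already (n_configs, n_rows, n_classes)
    raise ValueError(f"unknown problem_type: {problem_type!r}")


def predicted_labels(pred: np.ndarray, problem_type: str) -> np.ndarray:
    """Class index per config and row; identical code for 2 or more classes."""
    if problem_type == "regression":
        raise ValueError("labels are only defined for classification")
    return as_multiclass(pred, problem_type).argmax(axis=-1)


# Synthetic arrays shaped like the two outputs in the issue; not real TabRepo data.
rng = np.random.default_rng(0)
binary = rng.random((2, 621))
multi = rng.random((2, 630, 3))
print(as_multiclass(binary, "binary").shape)        # (2, 621, 2)
print(predicted_labels(multi, "multiclass").shape)  # (2, 630)
```

With this normalization, later steps such as averaging over configs or taking an argmax can assume a trailing class axis for every classification task.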
