Repo retrieves only one class for binary classification tasks #61

Open
sebastianpinedaar opened this issue May 28, 2024 · 3 comments
Comments

@sebastianpinedaar

Thanks for the nice package! It fosters research in different aspects of AutoML!

However, I am curious about the expected behavior when loading predictions for binary classification tasks (e.g. the "Australian" dataset). According to the documentation, the output should be a tensor of shape (n_configs, n_rows, n_classes). However, the code below yields predictions of shape (n_configs, n_rows).

Given that it is a binary classification problem, I expected something like (n_configs, n_rows, n_classes), where n_classes = 2. Is the current setup giving just the probability of one class? If so, I can easily compute the probability of the other class, but it would be better to output both directly. Otherwise, please let me know what I am missing.

To reproduce the issue:

from tabrepo import load_repository, get_context, EvaluationRepository

context_name = "D244_F3_C1530_100"
repo: EvaluationRepository = load_repository(context_name, cache=True)

shape1 = repo.predict_val_multi(dataset="Australian", fold=0, configs=["CatBoost_r1_BAG_L1", "LightGBM_r41_BAG_L1"]).shape
shape2 = repo.predict_val_multi(dataset="autoUniv-au7-700", fold=0, configs=["CatBoost_r1_BAG_L1", "LightGBM_r41_BAG_L1"]).shape

print("Shape 1:", shape1)
print("Shape 2:", shape2)

Output:

Shape 1: (2, 621)
Shape 2: (2, 630, 3)

Regards,

Sebastian

@geoalgo
Collaborator

geoalgo commented May 29, 2024

Sorry for the confusion; our docs do need improving there. We return a tensor with shape (n_configs, n_rows, n_classes) for multi-class classification, and a tensor with shape (n_configs, n_rows) otherwise, i.e. for both regression and binary classification.

For binary classification, we return the probability of the first class IIRC.

Thanks for pointing this out, we will improve our doc to avoid this confusion.

@Innixma
Collaborator

Innixma commented Jun 13, 2024

Yeah, @geoalgo's answer is correct. The reason we only return the positive class prediction probability is for efficiency. It allows us to halve the memory usage and runtime of the operations. It might be a good idea for us to add a flag that users can set to make it return the multiclass representation though, for ease of use purposes.
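
To make the memory argument concrete, here is a rough sketch that compares the footprint of the two layouts for the shape reported for "Australian" above. The float32 dtype is an assumption made for illustration; the thread does not say which dtype the stored predictions use, but the 2x ratio holds for any fixed dtype.

```python
import numpy as np

# Shape reported for the "Australian" dataset earlier in this thread.
n_configs, n_rows = 2, 621

# Assumption: float32 storage, chosen only for illustration.
positive_only = np.zeros((n_configs, n_rows), dtype=np.float32)
both_classes = np.zeros((n_configs, n_rows, 2), dtype=np.float32)

# Storing both class probabilities doubles the number of stored values.
ratio = both_classes.nbytes / positive_only.nbytes
print(positive_only.nbytes, both_classes.nbytes, ratio)
```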

@sebastianpinedaar
Author

Thanks for the information! Indeed, a flag would be useful, especially for anyone who wants to write general code that works for any number of classes.
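
Until such a flag exists, a small wrapper on the user side might cover the general case. The sketch below is only an illustration: `to_class_axis` and its `positive_index` parameter are hypothetical names, not part of tabrepo, and it assumes the 2-D binary output holds the probability of the positive class, as described above. Regression outputs are also 2-D, so it should only be applied to classification tasks.

```python
import numpy as np


def to_class_axis(pred, positive_index=1):
    """Return predictions shaped (n_configs, n_rows, n_classes).

    Hypothetical helper, not part of tabrepo. Assumes a 2-D input holds the
    positive-class probability of a binary classification task.
    """
    pred = np.asarray(pred)
    if pred.ndim == 3:
        # Multi-class output already has an explicit class axis.
        return pred
    if pred.ndim != 2:
        raise ValueError(f"expected a 2-D or 3-D array, got {pred.ndim}-D")
    other = 1.0 - pred  # probability of the remaining class
    # Place the positive class at the requested index along the last axis.
    columns = [other, pred] if positive_index == 1 else [pred, other]
    return np.stack(columns, axis=-1)
```

With this, the result of `repo.predict_val_multi(...)` for "Australian" would go from shape (2, 621) to (2, 621, 2), while the "autoUniv-au7-700" result would pass through unchanged at (2, 630, 3).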
