Reverse engineer fold indices to simulate cross-validation early stopping #59

Open · 2 tasks · Tracked by #60
Innixma opened this issue May 7, 2024 · 0 comments
Innixma (Collaborator) commented May 7, 2024

All folds in TabRepo's current suite were generated with the same seed (the AutoGluon default), so we can reconstruct those fold indices for each dataset and use them to simulate cross-validation early stopping at the fold level.

Reference Paper: "Don’t Waste Your Time: Early Stopping Cross-Validation"

- [ ] For datasets with <800 rows, check the training logs to see how many folds were used. It may always be 8, or it may follow the older logic that uses between 5 and 8 folds depending on the number of rows (see the sketch after this list).
- [ ] Add special handling for RandomForest, ExtraTrees, and KNN, where we did not use traditional folds when fitting.