Change "best" model_selection to include a loss threshold #177

Merged
MilesCranmer merged 7 commits into master from improved-model-selection on Aug 10, 2022

Conversation

MilesCranmer (Owner) commented Aug 10, 2022

For PySR v0.10.0, this changes the definition of model_selection="best" to also include a loss threshold. Sometimes the equation with the max score has a very bad loss, perhaps because a very simple equation just happens to sit at a large derivative in the loss-complexity curve.

With this PR, only equations with loss < 1.5 * min_loss are considered, and the highest-scoring equation among them is returned. This is what PySR used in the GECCO contest, and it seemed to work well there.

To select purely based on score, as "best" did before this change, one can now use model_selection="score".

This PR also refactors the model_selection handling to avoid shotgun edits.
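
For illustration, the selection rule described above could be sketched roughly as follows. This is not the code from this PR: `select_row` is a hypothetical helper, and the `loss` and `score` column names and the example numbers are assumptions made up for the sketch.

```python
import pandas as pd


def select_row(equations: pd.DataFrame, model_selection: str = "best"):
    """Return the index label of the chosen equation (hypothetical helper)."""
    if model_selection == "score":
        # Purely score-based: take the row with the highest score.
        return equations["score"].idxmax()
    if model_selection == "best":
        # Keep only equations with loss < 1.5 * min_loss, then take the
        # highest score among them. Assumes min_loss > 0; with a strict
        # inequality, a minimum loss of zero would leave no candidates.
        threshold = 1.5 * equations["loss"].min()
        candidates = equations[equations["loss"] < threshold]
        return candidates["score"].idxmax()
    raise ValueError(f"unknown model_selection: {model_selection!r}")


# Made-up table: row 1 has the highest score but a poor loss.
equations = pd.DataFrame(
    {
        "complexity": [1, 3, 5, 7],
        "loss": [100.0, 20.0, 1.4, 1.0],
        "score": [0.0, 0.9, 0.6, 0.1],
    }
)

print(select_row(equations, "score"))  # row 1: highest score, loss 20.0
print(select_row(equations, "best"))   # row 2: highest score with loss < 1.5
```

In this made-up table, "score" picks the equation with loss 20.0, while "best" restricts itself to the two equations under the threshold and picks the higher-scoring one.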

MilesCranmer (Owner, Author) commented Aug 10, 2022

  • Should add a unit test with a manually created dataframe to check that the model selection strategies work as expected.
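
A test along these lines might look like the sketch below. It is only an assumption about the shape such a test could take: `pick` is a hypothetical stand-in for the real selection code, and the dataframe columns and values are invented so that the two strategies disagree.

```python
import unittest

import pandas as pd


def pick(equations: pd.DataFrame, model_selection: str):
    """Hypothetical stand-in for the selection logic under test."""
    if model_selection == "best":
        # Apply the loss threshold from the PR description first.
        equations = equations[equations["loss"] < 1.5 * equations["loss"].min()]
    return equations["score"].idxmax()


class TestModelSelection(unittest.TestCase):
    def setUp(self):
        # Manually created dataframe: row 0 has the top score but a bad loss.
        self.equations = pd.DataFrame(
            {"loss": [50.0, 1.2, 1.0], "score": [2.0, 0.5, 0.1]}
        )

    def test_score_ignores_loss(self):
        self.assertEqual(pick(self.equations, "score"), 0)

    def test_best_applies_loss_threshold(self):
        # Threshold is 1.5 * 1.0, so only rows 1 and 2 remain.
        self.assertEqual(pick(self.equations, "best"), 1)
```

Saved as a test module, this can be run with `python -m unittest`.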

MilesCranmer merged commit 35e6ab1 into master on Aug 10, 2022
MilesCranmer deleted the improved-model-selection branch on November 4, 2022 18:02