Testing reminder #574
@ddugovic While I welcome that you are now considering regression testing, testing a trivial merge like bd3e681 with 10000 games for all variants is, in my opinion, massively overdoing it. There is a wide range between testing nothing and testing everything. First of all, one should be clear about the goal. E.g., I think restricting detection to regressions of >=20 Elo is very reasonable, and 1000 games are already enough for that (a threshold of ~50 Elo with something like 400 games could also make sense, since investigating 20 Elo regressions might not be worth the time). In any case, >>100 Elo regressions can be detected easily. E.g., my policy for regression testing of upstream merges from official SF into Fairy-Stockfish has two parts:
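The relationship between game count and detectable Elo difference can be sketched with the standard logistic Elo model. This is a rough back-of-the-envelope illustration, not the exact statistics of any particular testing tool: it uses the worst-case per-game standard deviation of 0.5 (draws only shrink it), so the figures are conservative.

```python
import math

def elo_from_score(score):
    # Standard logistic model: expected score -> Elo difference.
    return -400 * math.log10(1 / score - 1)

def detectable_elo(games, z=1.96):
    """Smallest Elo gap distinguishable from zero at ~95% confidence
    after `games` games, assuming the worst-case per-game standard
    deviation of 0.5 (draws reduce variance, so this is conservative)."""
    margin = z * 0.5 / math.sqrt(games)
    return elo_from_score(0.5 + margin)

print(round(detectable_elo(1000)))  # about 22 Elo
print(round(detectable_elo(400)))   # about 34 Elo
```

This is consistent with the claim above: roughly 1000 games suffice to flag a ~20 Elo regression, and a few hundred games already catch anything in the 35-50 Elo range.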
Such a combination of targeted and regular testing is, in my opinion, a good compromise between the coverage and the effort of regression testing.
That's a fine policy, although the first part is more than I can manage. I could make claims about the tests I run before pushing to GitHub, but there is no value in arguing. Progress on building tests into Travis CI (checking for large mistakes; incidentally this means I will also tackle #158 by creating "puzzle" EPDs for all variants) is gradual:
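A puzzle-EPD check of the kind mentioned above would read EPD records, each a FEN prefix followed by semicolon-terminated opcodes such as `bm` (best move) and `id`. Here is a minimal parsing sketch; the record shown is made up for illustration, and a real CI test would additionally feed the position to the engine over UCI and compare its answer against `bm`.

```python
def parse_epd(line):
    """Split one EPD record into its FEN prefix (first four fields)
    and a dict of opcode -> value pairs."""
    fields = line.split()
    fen = " ".join(fields[:4])
    ops = {}
    for op in " ".join(fields[4:]).split(";"):
        op = op.strip()
        if op:
            key, _, value = op.partition(" ")
            ops[key] = value.strip('"')
    return fen, ops

# Hypothetical record, not from the actual test suite:
epd = 'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4; id "demo.001";'
fen, ops = parse_epd(epd)
print(fen)         # rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -
print(ops["bm"])   # e4
print(ops["id"])   # demo.001
```

Running one such file per variant at a shallow fixed depth is cheap enough for CI and catches gross move-generation or evaluation mistakes.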
Detecting regressions that would otherwise be missed is not the point of the targeted tests for me. Running regular regression tests for all variants already ensures that all significant regressions are found, so the targeted tests only serve to catch obvious potential regressions early and to avoid time-consuming bisection later on. They do, of course, require making good guesses about where regressions are likely; otherwise they are no more efficient than bisection. This is just my suggestion, since it works well for me; if you are not comfortable with it, there are of course many other good ways to test.
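The cost comparison above comes down to binary search: bisecting a regression over n merges takes about log2(n) full regression matches, each expensive, which is why catching a suspect merge early is cheaper. A minimal sketch of the search itself (the logic that `git bisect run` automates), assuming a linear history ordered oldest to newest with a single good-to-bad transition:

```python
def first_bad(commits, is_bad):
    """Binary-search a linear, oldest-first commit list for the first
    commit at which is_bad() flips from False to True. In practice
    each is_bad() call is a full regression match, so every probe
    saved by an early targeted test is real time saved."""
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # regression is at mid or earlier
        else:
            lo = mid + 1      # regression is after mid
    return commits[lo]

# Toy history: commits 0..19, regression introduced at commit 13.
print(first_bad(list(range(20)), lambda c: c >= 13))  # 13
```

With 20 candidate merges this needs about five matches; a good guess from a targeted test can cut that to one.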
is everything okay
During merging I have concerns which I need to write down and act upon; otherwise I need to develop a habit of testing everything blindly.
As the repository's owner I need to remind myself to submit tests frequently; relying on my bot to measure Elo swings after a mis-merge is proving insufficient. (I want to offer to transfer ownership, but that would be grossly unfair to the new owner; besides, the current owner/maintainer arrangement is working well despite my incompetence.)
I have enjoyed owning this project each time I added a variant or competed with other engines (crazyhouse, atomic), although I'm likely done adding variants, and NN engines are too strong. My most recent variant additions are: