Search Refactor & Verification #27
Null-move pruning (NMP) was previously allowed whenever beta <= CHECKMATE, which was effectively no restriction at all. NMP is only useful for a potential beta cutoff, i.e. when the null-move score is >= beta, yet we do not trust a null-move search that returns a mate score. These two conditions conflict when beta is itself a mate score, so NMP is now used only when beta is less than any mating score. A second restriction is also added: two successive null moves are not allowed, since that just amounts to a reduced search of the initial position and won't be much help. More than one null move in total within a variation is still allowed, though. This is potentially a fix for #25. However, that issue will not be closed before more rigorous testing has been done.
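A minimal sketch of the two conditions described above, not the actual Goldfish code. The names `NodeInfo`, `may_try_null_move`, `MAX_PLY`, `CHECKMATE_THRESHOLD` and the constant values are assumptions made for illustration only.

```cpp
#include <cassert>

// Hypothetical constants; the engine's real values may differ.
constexpr int MAX_PLY = 128;
constexpr int CHECKMATE = 100000;
// Assumption: any score at or above this encodes a mate within MAX_PLY plies.
constexpr int CHECKMATE_THRESHOLD = CHECKMATE - MAX_PLY;

// Hypothetical per-node state, reduced to what the condition needs.
struct NodeInfo {
    int  beta;               // upper bound of the current search window
    bool previous_was_null;  // the move leading to this node was a null move
};

// Returns true when a null-move search may be tried at this node.
bool may_try_null_move(const NodeInfo& node) {
    assert(node.beta <= CHECKMATE);
    // Two null moves in a row only re-search the same position at reduced depth.
    if (node.previous_was_null) return false;
    // If beta is a mate score, a cutoff would require trusting a null-move mate score.
    return node.beta < CHECKMATE_THRESHOLD;
}
```

Under these assumptions, a node with `beta = 50` reached by a regular move qualifies, while the same node reached by a null move does not. A null move further up the variation does not block it, which matches the "more than one null move in total" remark.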
Causes mates to be found much more slowly, and so far no increase in playing strength has been demonstrated. Might be re-applied after tuning and proper verification.
Leads to fewer nodes searched in the benchmark, as the null move is only tried when the evaluation indicates that it will give a beta cutoff. The margin could be subject to tuning.
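One way such an evaluation gate could look, as a sketch only. The function name, the non-negative margin, its default value and the direction of the comparison are all assumptions; the commit message does not spell them out.

```cpp
#include <cassert>

// Hypothetical tunable margin in centipawns. The value 0 is a placeholder.
constexpr int NULL_MOVE_MARGIN = 0;

// Assumed rule: try the null move only if the evaluation is already within
// `margin` of beta, so that a beta cutoff looks plausible.
bool eval_suggests_cutoff(int eval, int beta, int margin = NULL_MOVE_MARGIN) {
    assert(margin >= 0);
    return eval >= beta - margin;
}
```

With a margin of 0 this reduces to the common `eval >= beta` test; a larger margin tries the null move more often at the cost of more wasted null-move searches.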
Thought this was a sure thing, but this actually seemed to fix an issue with not finding mates.
Meant to serve as the primary (only?) verification of search, by ensuring that certain puzzles are solved correctly. Other types of "puzzles" can easily be implemented in the same fashion.
Some calls were made after save_pv, which would cause the save_pv call to be deleted. Now update_search is called once per node.
4x depth can take forever when the mate is not found. Stick to 2x, which should be enough anyway.
Was trying to have stockfish play a move after us each time, but if a mate is found, then there is no move to play!
Still a bit strange with the puzzle tests...
Not quite sure yet why, but the order of update_search calls w.r.t. save_pv calls is quite sensitive. This is the old way, which seems to work.
Even "safe" optimization "-Og" leaves the debugger quite useless...
Also assign `Bound::EXACT` to stalemate/checkmates, as there is no doubt about their score.
Allows use of &, | operators, which are relevant for its use. Fixes compile issue with last commit (which used & on bounds).
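For illustration, here is one way to make `&` and `|` available on a scoped bound type. This is a sketch under assumptions: the enumerator names other than `Bound::EXACT`, the bit layout and the helper `is_lower` are hypothetical, and the actual commit may have solved this differently (for example by using an unscoped enum).

```cpp
#include <cstdint>

// Hypothetical layout: EXACT is the union of the two one-sided bounds.
enum class Bound : std::uint8_t {
    NONE  = 0,
    UPPER = 1 << 0,
    LOWER = 1 << 1,
    EXACT = UPPER | LOWER
};

// Bitwise operators are not provided for scoped enums, so define them.
constexpr Bound operator|(Bound a, Bound b) {
    return static_cast<Bound>(static_cast<std::uint8_t>(a) | static_cast<std::uint8_t>(b));
}

constexpr Bound operator&(Bound a, Bound b) {
    return static_cast<Bound>(static_cast<std::uint8_t>(a) & static_cast<std::uint8_t>(b));
}

// True if `stored` carries lower-bound information (LOWER or EXACT).
constexpr bool is_lower(Bound stored) {
    return (stored & Bound::LOWER) != Bound::NONE;
}

static_assert((Bound::UPPER | Bound::LOWER) == Bound::EXACT, "EXACT combines both bounds");
```

With this layout, a check such as `is_lower(entry_bound)` accepts both lower-bound and exact entries in a single test, which is the kind of use that makes the operators relevant.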
Was not being careful with the difference between how mates are interpreted when they are stored vs. when they are retrieved (plies to mate from the root vs. plies to mate from the current position). Passing the value through the lightweight functions added here ensures that this is handled consistently. This issue resulted in bugs where Goldfish reported mates which were shorter than possible. This is another candidate for the bugs seen in #25, which now might be resolved (to be verified).
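A sketch of what such lightweight conversion functions commonly look like. The names `value_to_tt` and `value_from_tt`, the constants, and the convention (search values count plies from the root, stored values count plies from the current position) are assumptions here, not taken from the Goldfish source.

```cpp
#include <cassert>

// Hypothetical constants; the engine's real values may differ.
constexpr int MAX_PLY = 128;
constexpr int CHECKMATE = 100000;
constexpr int CHECKMATE_THRESHOLD = CHECKMATE - MAX_PLY;

// Search value (mate distance counted from the root) -> stored value
// (mate distance counted from the node at `ply`).
int value_to_tt(int value, int ply) {
    assert(ply >= 0 && ply <= MAX_PLY);
    if (value >= CHECKMATE_THRESHOLD)  return value + ply;
    if (value <= -CHECKMATE_THRESHOLD) return value - ply;
    return value;  // ordinary scores are position-independent
}

// Stored value -> search value for a node that sits at `ply` from the root.
int value_from_tt(int value, int ply) {
    assert(ply >= 0 && ply <= MAX_PLY);
    if (value >= CHECKMATE_THRESHOLD)  return value - ply;
    if (value <= -CHECKMATE_THRESHOLD) return value + ply;
    return value;
}
```

Example under these assumptions: a mate found 3 plies below a node at ply 5 has search value `CHECKMATE - 8` and is stored as `CHECKMATE - 3`. If the same position is later reached at ply 2, retrieval yields `CHECKMATE - 5`, i.e. mate in 5 plies from the root. Skipping the conversion would have reported the stale root distance instead.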
Disabled in all public commits, but there for easy enable/disable when debugging.
Now checks all reported cases that failed in #25.
Codecov Report
@@ Coverage Diff @@
## master #27 +/- ##
==========================================
+ Coverage 85.64% 90.62% +4.97%
==========================================
Files 20 31 +11
Lines 1066 1450 +384
==========================================
+ Hits 913 1314 +401
+ Misses 153 136 -17
Moved all non-search critical functions outside of src/search.cpp and into separate files. This reduces clutter, and makes this (most important) file shorter and easier to comprehend (IMO).
Slightly less code duplication across search/search_root.
Repealed aspiration window search in #27.
No assertions made other than demanding that no errors occur during play, such that a result is found. Not a strict test, but if for some reason the game clock logic breaks, this might catch it.
Forgot to reset on copy construction.
Contains a needed bugfix compared to 0.2.0
Preliminary testing indicates a great improvement of ~100 Elo compared to previous versions. All test matches were bullet, 1 min with no increment. The games were played _without_ arbitration, so all games were played out to their end. This was to verify that the engines could actually deliver a checkmate when given the chance. Will add more games for future revisions, but this seems quite clearly to be a positive change.

Strangely, one of the versions v1.7.0, v1.8.2 or v1.9.0 crashed after 100 or so rounds. Could be a random thing, but the chester package should be updated to handle this with more information in the future.

Hard to tell exactly which part caused the increase, since this PR contains quite a few bugfixes plus changes. In any case, nice to see that the now less buggy version plays better!

Head to head statistics:

1) Goldfish v1.9.0 2353.1 : 430 (+201,=159,-70), 65.2 %
vs. : games ( +, =, -), (%) : Diff
Goldfish v1.7.0 : 215 ( 101, 84, 30), 66.5 : +101.6
Goldfish v1.8.2 : 215 ( 100, 75, 40), 64.0 : +119.1

2) Goldfish v1.7.0 2251.5 : 1699 (+383,=989,-327), 51.6 %
vs. : games ( +, =, -), (%) : Diff
Goldfish v1.9.0 : 215 ( 30, 84, 101), 33.5 : -101.6
Goldfish v1.7.1 : 250 ( 44, 168, 38), 51.2 : +7.7
Goldfish v1.8.0 : 150 ( 32, 84, 34), 49.3 : +14.5
Goldfish v1.8.1 : 500 ( 114, 305, 81), 53.3 : +17.4
Goldfish v1.8.2 : 214 ( 48, 139, 27), 54.9 : +17.5
Goldfish v1.6.0 : 160 ( 47, 85, 28), 55.9 : +32.8
Goldfish v1.5.1 : 160 ( 43, 102, 15), 58.8 : +82.7
Goldfish v1.5 : 10 ( 4, 5, 1), 65.0 : +92.9
Goldfish v1.4 : 10 ( 4, 6, 0), 70.0 : +98.0
Goldfish v1.3 : 10 ( 5, 5, 0), 75.0 : +120.8
Goldfish v1.2 : 10 ( 4, 4, 2), 60.0 : +139.3
Goldfish v1.1 : 10 ( 8, 2, 0), 90.0 : +196.3

3) Goldfish v1.7.1 2243.8 : 477 (+97,=294,-86), 51.2 %
vs. : games ( +, =, -), (%) : Diff
Goldfish v1.7.0 : 250 ( 38, 168, 44), 48.8 : -7.7
Goldfish v1.6.0 : 77 ( 19, 45, 13), 53.9 : +25.1
Goldfish v1.7.2 : 150 ( 40, 81, 29), 53.7 : +25.8

4) Goldfish v1.8.0 2237.0 : 650 (+133,=382,-135), 49.8 %
vs. : games ( +, =, -), (%) : Diff
Goldfish v1.7.0 : 150 ( 34, 84, 32), 50.7 : -14.5
Goldfish v1.8.1 : 500 ( 99, 298, 103), 49.6 : +2.9

5) Goldfish v1.8.1 2234.1 : 1000 (+184,=603,-213), 48.5 %
vs. : games ( +, =, -), (%) : Diff
Goldfish v1.7.0 : 500 ( 81, 305, 114), 46.7 : -17.4
Goldfish v1.8.0 : 500 ( 103, 298, 99), 50.4 : -2.9

6) Goldfish v1.8.2 2234.0 : 429 (+67,=214,-148), 40.6 %
vs. : games ( +, =, -), (%) : Diff
Goldfish v1.9.0 : 215 ( 40, 75, 100), 36.0 : -119.1
Goldfish v1.7.0 : 214 ( 27, 139, 48), 45.1 : -17.5

7) Goldfish v1.6.0 2218.7 : 797 (+193,=483,-121), 54.5 %
vs. : games ( +, =, -), (%) : Diff
Goldfish v1.7.0 : 160 ( 28, 85, 47), 44.1 : -32.8
Goldfish v1.7.1 : 77 ( 13, 45, 19), 46.1 : -25.1
Goldfish v1.5.1 : 260 ( 66, 162, 32), 56.5 : +49.8
Goldfish v1.5 : 260 ( 77, 163, 20), 61.0 : +60.1
Goldfish v1.4 : 10 ( 1, 8, 1), 50.0 : +65.2
Goldfish v1.3 : 10 ( 1, 8, 1), 50.0 : +87.9
Goldfish v1.2 : 10 ( 4, 6, 0), 70.0 : +106.5
Goldfish v1.1 : 10 ( 3, 6, 1), 60.0 : +163.4

8) Goldfish v1.7.2 2218.0 : 150 (+29,=81,-40), 46.3 %
vs. : games ( +, =, -), (%) : Diff
Goldfish v1.7.1 : 150 ( 29, 81, 40), 46.3 : -25.8

9) Goldfish v1.5.1 2168.8 : 970 (+145,=631,-194), 47.5 %
vs. : games ( +, =, -), (%) : Diff
Goldfish v1.7.0 : 160 ( 15, 102, 43), 41.2 : -82.7
Goldfish v1.6.0 : 260 ( 32, 162, 66), 43.5 : -49.8
Goldfish v1.5 : 260 ( 45, 172, 43), 50.4 : +10.2
Goldfish v1.4 : 260 ( 45, 176, 39), 51.2 : +15.3
Goldfish v1.3 : 10 ( 2, 7, 1), 55.0 : +38.1
Goldfish v1.2 : 10 ( 2, 6, 2), 50.0 : +56.6
Goldfish v1.1 : 10 ( 4, 6, 0), 70.0 : +113.6

10) Goldfish v1.5 2158.6 : 1145 (+174,=761,-210), 48.4 %
vs. : games ( +, =, -), (%) : Diff
Goldfish v1.7.0 : 10 ( 1, 5, 4), 35.0 : -92.9
Goldfish v1.6.0 : 260 ( 20, 163, 77), 39.0 : -60.1
Goldfish v1.5.1 : 260 ( 43, 172, 45), 49.6 : -10.2
Goldfish v1.4 : 510 ( 88, 352, 70), 51.8 : +5.1
Goldfish v1.3 : 85 ( 12, 61, 12), 50.0 : +27.8
Goldfish v1.2 : 10 ( 4, 6, 0), 70.0 : +46.4
Goldfish v1.1 : 10 ( 6, 2, 2), 70.0 : +103.4

11) Goldfish v1.4 2153.5 : 970 (+164,=646,-160), 50.2 %
vs. : games ( +, =, -), (%) : Diff
Goldfish v1.7.0 : 10 ( 0, 6, 4), 30.0 : -98.0
Goldfish v1.6.0 : 10 ( 1, 8, 1), 50.0 : -65.2
Goldfish v1.5.1 : 260 ( 39, 176, 45), 48.8 : -15.3
Goldfish v1.5 : 510 ( 70, 352, 88), 48.2 : -5.1
Goldfish v1.3 : 60 ( 13, 37, 10), 52.5 : +22.8
Goldfish v1.2 : 60 ( 17, 34, 9), 56.7 : +41.3
Goldfish v1.1 : 60 ( 24, 33, 3), 67.5 : +98.3

12) Goldfish v1.3 2130.7 : 325 (+55,=215,-55), 50.0 %
vs. : games ( +, =, -), (%) : Diff
Goldfish v1.7.0 : 10 ( 0, 5, 5), 25.0 : -120.8
Goldfish v1.6.0 : 10 ( 1, 8, 1), 50.0 : -87.9
Goldfish v1.5.1 : 10 ( 1, 7, 2), 45.0 : -38.1
Goldfish v1.5 : 85 ( 12, 61, 12), 50.0 : -27.8
Goldfish v1.4 : 60 ( 10, 37, 13), 47.5 : -22.8
Goldfish v1.2 : 90 ( 17, 61, 12), 52.8 : +18.6
Goldfish v1.1 : 60 ( 14, 36, 10), 53.3 : +75.5

13) Goldfish v1.2 2112.2 : 230 (+37,=141,-52), 46.7 %
vs. : games ( +, =, -), (%) : Diff
Goldfish v1.7.0 : 10 ( 2, 4, 4), 40.0 : -139.3
Goldfish v1.6.0 : 10 ( 0, 6, 4), 30.0 : -106.5
Goldfish v1.5.1 : 10 ( 2, 6, 2), 50.0 : -56.6
Goldfish v1.5 : 10 ( 0, 6, 4), 30.0 : -46.4
Goldfish v1.4 : 60 ( 9, 34, 17), 43.3 : -41.3
Goldfish v1.3 : 90 ( 12, 61, 17), 47.2 : -18.6
Goldfish v1.1 : 40 ( 12, 24, 4), 60.0 : +57.0

14) Goldfish v1.1 2055.2 : 232 (+27,=132,-73), 40.1 %
vs. : games ( +, =, -), (%) : Diff
Goldfish v1.7.0 : 10 ( 0, 2, 8), 10.0 : -196.3
Goldfish v1.6.0 : 10 ( 1, 6, 3), 40.0 : -163.4
Goldfish v1.5.1 : 10 ( 0, 6, 4), 30.0 : -113.6
Goldfish v1.5 : 10 ( 2, 2, 6), 30.0 : -103.4
Goldfish v1.4 : 60 ( 3, 33, 24), 32.5 : -98.3
Goldfish v1.3 : 60 ( 10, 36, 14), 46.7 : -75.5
Goldfish v1.2 : 40 ( 4, 24, 12), 40.0 : -57.0
Goldfish v1.0 : 32 ( 7, 23, 2), 57.8 : +55.2

15) Goldfish v1.0 2000.0 : 32 (+2,=23,-7), 42.2 %
vs. : games ( +, =, -), (%) : Diff
Goldfish v1.1 : 32 ( 2, 23, 7), 42.2 : -55.2

File: match-history.pgn
Total games 4769
- White wins 966
- Draws 2877
- Black wins 925
- Truncated/Discarded 1
Unique head to head 1.59%
Reference rating 2000.0 (set to "Goldfish v1.0")
players with no games = 0
players with all wins = 0
players w/ all losses = 0
White Advantage = 3.0
Draw Rate (eq.) = 61.8 %

# PLAYER : RATING POINTS PLAYED (%)
1 Goldfish v1.9.0 : 2353.1 280.5 430 65
2 Goldfish v1.7.0 : 2251.5 877.5 1699 52
3 Goldfish v1.7.1 : 2243.8 244.0 477 51
4 Goldfish v1.8.0 : 2237.0 324.0 650 50
5 Goldfish v1.8.1 : 2234.1 485.5 1000 49
6 Goldfish v1.8.2 : 2234.0 174.0 429 41
7 Goldfish v1.6.0 : 2218.7 434.5 797 55
8 Goldfish v1.7.2 : 2218.0 69.5 150 46
9 Goldfish v1.5.1 : 2168.8 460.5 970 47
10 Goldfish v1.5 : 2158.6 554.5 1145 48
11 Goldfish v1.4 : 2153.5 487.0 970 50
12 Goldfish v1.3 : 2130.7 162.5 325 50
13 Goldfish v1.2 : 2112.2 107.5 230 47
14 Goldfish v1.1 : 2055.2 93.0 232 40
15 Goldfish v1.0 : 2000.0 13.5 32 42
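As a rough cross-check of the "~100 Elo" figure, a score percentage can be converted to a rating difference with the standard logistic Elo formula. This is only a sketch: the function names are made up for illustration, and the rating tool that produced the table fits all games jointly (including white advantage and draw rate), so its numbers will not match a pairwise calculation exactly.

```cpp
#include <cassert>
#include <cmath>

// Score fraction from win/draw/loss counts (a draw counts as half a point).
double score_fraction(int wins, int draws, int losses) {
    const int games = wins + draws + losses;
    assert(games > 0);
    return (wins + 0.5 * draws) / games;
}

// Rating difference implied by a score fraction under the logistic Elo model:
// expected score = 1 / (1 + 10^(-diff / 400)).
double elo_diff_from_score(double score) {
    assert(score > 0.0 && score < 1.0);
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

For v1.9.0's overall record of +201 =159 -70, `score_fraction(201, 159, 70)` is 280.5/430, about 0.652, and `elo_diff_from_score` of that is about 109 Elo against the average of its opponents, which is in line with the reported improvement.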
Significant strength boost:
This PR fixes #25, at least to the confidence provided by the testing that is now done. It also fixes #26 through the addition of search tests. The test suite can always be extended, and the current setup makes this very easy: simply provide an EPD describing a puzzle. Given the above Elo strength improvement that has resulted from this PR, I consider this satisfactory for a merge into master, closing the two issues mentioned.
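To illustrate what "providing an EPD" involves, here is a small sketch that splits an EPD record into the position fields and the `bm` (best move) operands a puzzle test would compare against. The `Puzzle` struct and `parse_epd` function are hypothetical and not taken from the Goldfish test suite; the parser ignores quoting rules and every opcode other than `bm`.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one puzzle.
struct Puzzle {
    std::string position;                 // the four EPD position fields
    std::vector<std::string> best_moves;  // operands of the "bm" opcode
};

Puzzle parse_epd(const std::string& epd) {
    std::istringstream in(epd);
    Puzzle puzzle;

    // An EPD record starts with four fields:
    // piece placement, side to move, castling rights, en passant square.
    std::string field;
    for (int i = 0; i < 4 && (in >> field); ++i) {
        if (i > 0) puzzle.position += ' ';
        puzzle.position += field;
    }

    // The rest is a list of operations of the form "opcode operand...;".
    std::string operation;
    while (std::getline(in, operation, ';')) {
        std::istringstream parts(operation);
        std::string opcode;
        if (!(parts >> opcode) || opcode != "bm") continue;
        std::string move;
        while (parts >> move) puzzle.best_moves.push_back(move);
    }
    return puzzle;
}
```

For a made-up back-rank example such as `6k1/5ppp/8/8/8/8/8/R3K3 w - - bm Ra8#; id "back-rank";`, this yields the position `6k1/5ppp/8/8/8/8/8/R3K3 w - -` and the single expected move `Ra8#`, which a test could then compare with the move the search returns.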
The search logic should get a cleanup, with the aim of making it clearer and easier to follow.
More importantly, proper verification of the search results should be implemented.
This PR will not be merged until #26 is satisfactorily resolved, and #25 is confirmed to finally be fixed.