Search Refactor & Verification #27

bsamseth · 2019-02-10T15:37:30Z

The search logic should be cleaned up so that it is clearer and easier to follow.

More importantly, proper verification of the search results should be implemented.

This PR will not be merged until #26 is satisfactorily resolved and #25 is confirmed to be fixed.

Null-move pruning (NMP) was previously restricted to cases where beta <= CHECKMATE, which was essentially no restriction at all. NMP is only useful for a potential beta cutoff, i.e. when the null-move score is >= beta, but we also don't trust a null-move search when it returns a mate score. These two things don't match up, so NMP is now only used when beta is less than any mating score.

Also added is the restriction that two successive null moves are not allowed, since that just amounts to a reduced-depth search of the initial position and won't be much help. More than one null move in total within a variation is still allowed, though.

This is potentially a fix for #25. However, that issue will not be closed before more rigorous testing has been done.
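The two null-move restrictions described above can be sketched as a single predicate. This is only an illustration under stated assumptions, not the engine's actual code: `Value`, `CHECKMATE_THRESHOLD`, `NodeState` and `null_move_allowed` are hypothetical names, and the numeric constants are placeholders.

```cpp
#include <cstdint>

// Hypothetical score type and constants; the engine's real values may differ.
using Value = std::int32_t;
constexpr Value CHECKMATE = 100000;  // placeholder score for mate at the root
constexpr int MAX_PLY = 256;         // placeholder maximum search ply
// Smallest score that still encodes "we deliver mate" (assumed encoding:
// a mate score is CHECKMATE minus the number of plies to the mate).
constexpr Value CHECKMATE_THRESHOLD = CHECKMATE - MAX_PLY;

// Minimal per-node state needed for the decision (illustrative only).
struct NodeState {
    bool previous_was_null;  // the move leading to this node was a null move
};

// Returns true when a null-move search may be tried at this node.
//  - Two null moves in a row just reproduce the earlier position at reduced
//    depth, so a null move directly after another one is rejected.
//  - A null-move cutoff means null_score >= beta. If beta were a mating
//    score, the cutoff would claim a mate that null-move search cannot be
//    trusted to prove, so beta must lie below every mating score.
constexpr bool null_move_allowed(const NodeState& node, Value beta) {
    return !node.previous_was_null && beta < CHECKMATE_THRESHOLD;
}
```

A call such as `null_move_allowed(NodeState{false}, 50)` is accepted, while the same call with `beta` at or above `CHECKMATE_THRESHOLD`, or directly after another null move, is rejected.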

coveralls · 2019-02-10T15:43:01Z

Coverage increased (+5.7%) to 91.71% when pulling a56b2f8 on search-refactor into 6ed378c on master.

Was not being careful with the difference between how mate scores are interpreted when they are stored vs. when they are retrieved (plies to mate from the root vs. plies to mate from the current position). Passing the value through the lightweight functions added here ensures that this is handled consistently. This issue resulted in bugs where Goldfish reported mates which were shorter than possible. This is another candidate for the bugs seen in #25, which might now be resolved (to be verified).
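A common way to keep the two interpretations apart is to convert mate scores on every store and load. The sketch below shows that general technique; the function names `value_to_tt` and `value_from_tt`, the constants, and the score encoding are assumptions for illustration and may not match the functions added in this PR.

```cpp
#include <cstdint>

// Hypothetical score type and constants (placeholders, not the engine's).
using Value = std::int32_t;
constexpr Value CHECKMATE = 100000;
constexpr int MAX_PLY = 256;
constexpr Value CHECKMATE_THRESHOLD = CHECKMATE - MAX_PLY;

// Assumed encoding inside the search: a mate score is CHECKMATE minus the
// number of plies from the ROOT to the mate (negated when we are mated).

// Before storing: make the score relative to the CURRENT node, which is
// `ply` plies from the root, so the entry stays valid wherever the same
// position is reached again.
constexpr Value value_to_tt(Value v, int ply) {
    return v >= CHECKMATE_THRESHOLD  ? v + ply
         : v <= -CHECKMATE_THRESHOLD ? v - ply
                                     : v;
}

// After loading: make the score relative to the root again, using the ply
// of the node that is doing the lookup.
constexpr Value value_from_tt(Value v, int ply) {
    return v >= CHECKMATE_THRESHOLD  ? v - ply
         : v <= -CHECKMATE_THRESHOLD ? v + ply
                                     : v;
}
```

For example, a mate 7 plies from the root that is found at ply 3 is stored as "mate in 4 plies from here". If the same position is later reached at ply 5, the loaded score becomes "mate 9 plies from the root" instead of an impossibly short mate. Ordinary (non-mate) scores pass through unchanged.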

codecov · 2019-02-12T17:58:53Z

Codecov Report

Merging #27 into master will increase coverage by 4.97%.
The diff coverage is 91.01%.

@@            Coverage Diff             @@
##           master      #27      +/-   ##
==========================================
+ Coverage   85.64%   90.62%   +4.97%     
==========================================
  Files          20       31      +11     
  Lines        1066     1450     +384     
==========================================
+ Hits          913     1314     +401     
+ Misses        153      136      -17

Impacted Files              Coverage Δ
src/position.cpp            97.94% <100%> (-0.08%) ⬇️
include/search.hpp          100% <100%> (ø)
src/semaphore.cpp           100% <100%> (ø)
include/tt.hpp              100% <100%> (+7.69%) ⬆️
src/timer.cpp               100% <100%> (ø)
src/searchmanagement.cpp    85.22% <85.22%> (ø)
src/search.cpp              97.29% <92.68%> (ø)
include/value.hpp           57.14% <0%> (-42.86%) ⬇️
include/depth.hpp           100% <0%> (ø) ⬆️
... and 18 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6ed378c...a56b2f8. Read the comment docs.

Preliminary testing indicates a great improvement of ~100 Elo compared to previous versions. All test matches were bullet, 1 min with no increment. The games were played _without_ arbitration, so all games were played out to their end. This was to verify that the engines could actually deliver checkmate when given the chance. Will add more games for future revisions, but this seems quite clearly to be a positive change.

Strangely, one of the versions v1.7.0, v1.8.2 or v1.9.0 crashed after 100 or so rounds. Could be a random thing, but the chester package should be updated to handle this with more information in the future.

Hard to tell exactly which part caused the increase, since this PR contains quite a few bugfixes plus other changes. In any case, nice to see that the now less buggy version plays better!

Head to head statistics:

    1) Goldfish v1.9.0 2353.1 : 430 (+201,=159,-70), 65.2 %

    vs.             : games (   +,   =,   -),   (%) :    Diff
    Goldfish v1.7.0 :   215 ( 101,  84,  30),  66.5 :  +101.6
    Goldfish v1.8.2 :   215 ( 100,  75,  40),  64.0 :  +119.1

    2) Goldfish v1.7.0 2251.5 : 1699 (+383,=989,-327), 51.6 %

    vs.             : games (   +,   =,   -),   (%) :    Diff
    Goldfish v1.9.0 :   215 (  30,  84, 101),  33.5 :  -101.6
    Goldfish v1.7.1 :   250 (  44, 168,  38),  51.2 :    +7.7
    Goldfish v1.8.0 :   150 (  32,  84,  34),  49.3 :   +14.5
    Goldfish v1.8.1 :   500 ( 114, 305,  81),  53.3 :   +17.4
    Goldfish v1.8.2 :   214 (  48, 139,  27),  54.9 :   +17.5
    Goldfish v1.6.0 :   160 (  47,  85,  28),  55.9 :   +32.8
    Goldfish v1.5.1 :   160 (  43, 102,  15),  58.8 :   +82.7
    Goldfish v1.5   :    10 (   4,   5,   1),  65.0 :   +92.9
    Goldfish v1.4   :    10 (   4,   6,   0),  70.0 :   +98.0
    Goldfish v1.3   :    10 (   5,   5,   0),  75.0 :  +120.8
    Goldfish v1.2   :    10 (   4,   4,   2),  60.0 :  +139.3
    Goldfish v1.1   :    10 (   8,   2,   0),  90.0 :  +196.3

    3) Goldfish v1.7.1 2243.8 : 477 (+97,=294,-86), 51.2 %

    vs.             : games (   +,   =,   -),   (%) :    Diff
    Goldfish v1.7.0 :   250 (  38, 168,  44),  48.8 :    -7.7
    Goldfish v1.6.0 :    77 (  19,  45,  13),  53.9 :   +25.1
    Goldfish v1.7.2 :   150 (  40,  81,  29),  53.7 :   +25.8

    4) Goldfish v1.8.0 2237.0 : 650 (+133,=382,-135), 49.8 %

    vs.             : games (   +,   =,   -),   (%) :    Diff
    Goldfish v1.7.0 :   150 (  34,  84,  32),  50.7 :   -14.5
    Goldfish v1.8.1 :   500 (  99, 298, 103),  49.6 :    +2.9

    5) Goldfish v1.8.1 2234.1 : 1000 (+184,=603,-213), 48.5 %

    vs.             : games (   +,   =,   -),   (%) :    Diff
    Goldfish v1.7.0 :   500 (  81, 305, 114),  46.7 :   -17.4
    Goldfish v1.8.0 :   500 ( 103, 298,  99),  50.4 :    -2.9

    6) Goldfish v1.8.2 2234.0 : 429 (+67,=214,-148), 40.6 %

    vs.             : games (   +,   =,   -),   (%) :    Diff
    Goldfish v1.9.0 :   215 (  40,  75, 100),  36.0 :  -119.1
    Goldfish v1.7.0 :   214 (  27, 139,  48),  45.1 :   -17.5

    7) Goldfish v1.6.0 2218.7 : 797 (+193,=483,-121), 54.5 %

    vs.             : games (   +,   =,   -),   (%) :    Diff
    Goldfish v1.7.0 :   160 (  28,  85,  47),  44.1 :   -32.8
    Goldfish v1.7.1 :    77 (  13,  45,  19),  46.1 :   -25.1
    Goldfish v1.5.1 :   260 (  66, 162,  32),  56.5 :   +49.8
    Goldfish v1.5   :   260 (  77, 163,  20),  61.0 :   +60.1
    Goldfish v1.4   :    10 (   1,   8,   1),  50.0 :   +65.2
    Goldfish v1.3   :    10 (   1,   8,   1),  50.0 :   +87.9
    Goldfish v1.2   :    10 (   4,   6,   0),  70.0 :  +106.5
    Goldfish v1.1   :    10 (   3,   6,   1),  60.0 :  +163.4

    8) Goldfish v1.7.2 2218.0 : 150 (+29,=81,-40), 46.3 %

    vs.             : games (   +,   =,   -),   (%) :    Diff
    Goldfish v1.7.1 :   150 (  29,  81,  40),  46.3 :   -25.8

    9) Goldfish v1.5.1 2168.8 : 970 (+145,=631,-194), 47.5 %

    vs.             : games (   +,   =,   -),   (%) :    Diff
    Goldfish v1.7.0 :   160 (  15, 102,  43),  41.2 :   -82.7
    Goldfish v1.6.0 :   260 (  32, 162,  66),  43.5 :   -49.8
    Goldfish v1.5   :   260 (  45, 172,  43),  50.4 :   +10.2
    Goldfish v1.4   :   260 (  45, 176,  39),  51.2 :   +15.3
    Goldfish v1.3   :    10 (   2,   7,   1),  55.0 :   +38.1
    Goldfish v1.2   :    10 (   2,   6,   2),  50.0 :   +56.6
    Goldfish v1.1   :    10 (   4,   6,   0),  70.0 :  +113.6

    10) Goldfish v1.5 2158.6 : 1145 (+174,=761,-210), 48.4 %

    vs.             : games (   +,   =,   -),   (%) :    Diff
    Goldfish v1.7.0 :    10 (   1,   5,   4),  35.0 :   -92.9
    Goldfish v1.6.0 :   260 (  20, 163,  77),  39.0 :   -60.1
    Goldfish v1.5.1 :   260 (  43, 172,  45),  49.6 :   -10.2
    Goldfish v1.4   :   510 (  88, 352,  70),  51.8 :    +5.1
    Goldfish v1.3   :    85 (  12,  61,  12),  50.0 :   +27.8
    Goldfish v1.2   :    10 (   4,   6,   0),  70.0 :   +46.4
    Goldfish v1.1   :    10 (   6,   2,   2),  70.0 :  +103.4

    11) Goldfish v1.4 2153.5 : 970 (+164,=646,-160), 50.2 %

    vs.             : games (   +,   =,   -),   (%) :    Diff
    Goldfish v1.7.0 :    10 (   0,   6,   4),  30.0 :   -98.0
    Goldfish v1.6.0 :    10 (   1,   8,   1),  50.0 :   -65.2
    Goldfish v1.5.1 :   260 (  39, 176,  45),  48.8 :   -15.3
    Goldfish v1.5   :   510 (  70, 352,  88),  48.2 :    -5.1
    Goldfish v1.3   :    60 (  13,  37,  10),  52.5 :   +22.8
    Goldfish v1.2   :    60 (  17,  34,   9),  56.7 :   +41.3
    Goldfish v1.1   :    60 (  24,  33,   3),  67.5 :   +98.3

    12) Goldfish v1.3 2130.7 : 325 (+55,=215,-55), 50.0 %

    vs.             : games (   +,   =,   -),   (%) :    Diff
    Goldfish v1.7.0 :    10 (   0,   5,   5),  25.0 :  -120.8
    Goldfish v1.6.0 :    10 (   1,   8,   1),  50.0 :   -87.9
    Goldfish v1.5.1 :    10 (   1,   7,   2),  45.0 :   -38.1
    Goldfish v1.5   :    85 (  12,  61,  12),  50.0 :   -27.8
    Goldfish v1.4   :    60 (  10,  37,  13),  47.5 :   -22.8
    Goldfish v1.2   :    90 (  17,  61,  12),  52.8 :   +18.6
    Goldfish v1.1   :    60 (  14,  36,  10),  53.3 :   +75.5

    13) Goldfish v1.2 2112.2 : 230 (+37,=141,-52), 46.7 %

    vs.             : games (   +,   =,   -),   (%) :    Diff
    Goldfish v1.7.0 :    10 (   2,   4,   4),  40.0 :  -139.3
    Goldfish v1.6.0 :    10 (   0,   6,   4),  30.0 :  -106.5
    Goldfish v1.5.1 :    10 (   2,   6,   2),  50.0 :   -56.6
    Goldfish v1.5   :    10 (   0,   6,   4),  30.0 :   -46.4
    Goldfish v1.4   :    60 (   9,  34,  17),  43.3 :   -41.3
    Goldfish v1.3   :    90 (  12,  61,  17),  47.2 :   -18.6
    Goldfish v1.1   :    40 (  12,  24,   4),  60.0 :   +57.0

    14) Goldfish v1.1 2055.2 : 232 (+27,=132,-73), 40.1 %

    vs.             : games (   +,   =,   -),   (%) :    Diff
    Goldfish v1.7.0 :    10 (   0,   2,   8),  10.0 :  -196.3
    Goldfish v1.6.0 :    10 (   1,   6,   3),  40.0 :  -163.4
    Goldfish v1.5.1 :    10 (   0,   6,   4),  30.0 :  -113.6
    Goldfish v1.5   :    10 (   2,   2,   6),  30.0 :  -103.4
    Goldfish v1.4   :    60 (   3,  33,  24),  32.5 :   -98.3
    Goldfish v1.3   :    60 (  10,  36,  14),  46.7 :   -75.5
    Goldfish v1.2   :    40 (   4,  24,  12),  40.0 :   -57.0
    Goldfish v1.0   :    32 (   7,  23,   2),  57.8 :   +55.2

    15) Goldfish v1.0 2000.0 : 32 (+2,=23,-7), 42.2 %

    vs.             : games (   +,   =,   -),   (%) :    Diff
    Goldfish v1.1   :    32 (   2,  23,   7),  42.2 :   -55.2

Summary:

    File: match-history.pgn

    Total games              4769
     - White wins             966
     - Draws                 2877
     - Black wins             925
     - Truncated/Discarded      1
    Unique head to head      1.59%
    Reference rating       2000.0 (set to "Goldfish v1.0")

    players with no games = 0
    players with all wins = 0
    players w/ all losses = 0

    White Advantage = 3.0
    Draw Rate (eq.) = 61.8 %

Ratings:

       # PLAYER             :  RATING  POINTS  PLAYED   (%)
       1 Goldfish v1.9.0    :  2353.1   280.5     430    65
       2 Goldfish v1.7.0    :  2251.5   877.5    1699    52
       3 Goldfish v1.7.1    :  2243.8   244.0     477    51
       4 Goldfish v1.8.0    :  2237.0   324.0     650    50
       5 Goldfish v1.8.1    :  2234.1   485.5    1000    49
       6 Goldfish v1.8.2    :  2234.0   174.0     429    41
       7 Goldfish v1.6.0    :  2218.7   434.5     797    55
       8 Goldfish v1.7.2    :  2218.0    69.5     150    46
       9 Goldfish v1.5.1    :  2168.8   460.5     970    47
      10 Goldfish v1.5      :  2158.6   554.5    1145    48
      11 Goldfish v1.4      :  2153.5   487.0     970    50
      12 Goldfish v1.3      :  2130.7   162.5     325    50
      13 Goldfish v1.2      :  2112.2   107.5     230    47
      14 Goldfish v1.1      :  2055.2    93.0     232    40
      15 Goldfish v1.0      :  2000.0    13.5      32    42

bsamseth · 2019-02-15T08:22:12Z

Significant strength boost:

    Goldfish v1.9.0 2353.1 :    430 (+201,=159,-70),  65.2 %

    vs.                    :  games (   +,   =,  -),   (%) :    Diff
    Goldfish v1.7.0        :    215 ( 101,  84, 30),  66.5 :  +101.6
    Goldfish v1.8.2        :    215 ( 100,  75, 40),  64.0 :  +119.1

   # PLAYER             :  RATING  POINTS  PLAYED   (%)
   1 Goldfish v1.9.0    :  2353.1   280.5     430    65
   2 Goldfish v1.7.0    :  2251.5   877.5    1699    52
   3 Goldfish v1.7.1    :  2243.8   244.0     477    51
   4 Goldfish v1.8.0    :  2237.0   324.0     650    50
   5 Goldfish v1.8.1    :  2234.1   485.5    1000    49
   6 Goldfish v1.8.2    :  2234.0   174.0     429    41
   7 Goldfish v1.6.0    :  2218.7   434.5     797    55
   8 Goldfish v1.7.2    :  2218.0    69.5     150    46
   9 Goldfish v1.5.1    :  2168.8   460.5     970    47
  10 Goldfish v1.5      :  2158.6   554.5    1145    48
  11 Goldfish v1.4      :  2153.5   487.0     970    50
  12 Goldfish v1.3      :  2130.7   162.5     325    50
  13 Goldfish v1.2      :  2112.2   107.5     230    47
  14 Goldfish v1.1      :  2055.2    93.0     232    40
  15 Goldfish v1.0      :  2000.0    13.5      32    42
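For reading the table above: the Diff column is the difference between the two fitted ratings (e.g. 2353.1 - 2251.5 = +101.6). Under the standard logistic Elo model, a rating difference maps to an expected score as sketched below. This is only the textbook formula; the tool that produced the table fits all games jointly and also reports a white advantage and a draw rate, so per-pairing percentages are not expected to match it exactly. The function names are hypothetical.

```cpp
#include <cmath>

// Expected score (between 0 and 1) for a player rated `rating_diff` Elo
// points above the opponent, using the standard logistic Elo curve.
inline double expected_score(double rating_diff) {
    return 1.0 / (1.0 + std::pow(10.0, -rating_diff / 400.0));
}

// Inverse mapping: the rating difference implied by a score fraction
// strictly between 0 and 1.
inline double rating_diff_from_score(double score) {
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

With this formula, a difference of about 100 points corresponds to an expected score of roughly 64 %, which is in the same range as the 65.2 % that v1.9.0 scored overall.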

bsamseth · 2019-02-15T08:27:27Z

This PR fixes #25, at least to the confidence provided by the testing that is now done.

This also fixes #26 with the addition of search tests. This test suite can always be extended, and the current setup makes this very easy: simply provide an EPD describing a puzzle.

With the above Elo strength improvement that has resulted from this PR, I consider this satisfactory for a merge into master, closing the two issues mentioned.
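For readers unfamiliar with the format: an EPD record is a position (the first four FEN fields) followed by `;`-terminated operations such as `bm` (best move) and `id`. The sketch below shows how such a line could be split up. It is not how the project's puzzle tests read their input; `EpdRecord`, `trim` and `parse_epd` are hypothetical names, and the parser assumes operands contain no `;`.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parsed form of one EPD record (illustrative field names).
struct EpdRecord {
    std::string position;                           // four FEN-style fields
    std::map<std::string, std::string> operations;  // opcode -> operand text
};

// Remove leading and trailing spaces and tabs.
inline std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Split an EPD line into its position part and its operations.
inline EpdRecord parse_epd(const std::string& line) {
    std::istringstream in(line);
    EpdRecord record;
    std::string field;
    // Piece placement, side to move, castling rights, en passant square.
    for (int i = 0; i < 4 && (in >> field); ++i) {
        if (i > 0) record.position += ' ';
        record.position += field;
    }
    // Everything else is a sequence of "opcode operand;" entries.
    std::string op;
    while (std::getline(in, op, ';')) {
        op = trim(op);
        if (op.empty()) continue;
        const auto space = op.find(' ');
        const std::string opcode = op.substr(0, space);
        const std::string operand =
            space == std::string::npos ? "" : trim(op.substr(space + 1));
        record.operations[opcode] = operand;
    }
    return record;
}
```

For a made-up back-rank puzzle such as `6k1/5ppp/8/8/8/8/8/R5K1 w - - bm Ra8#; id "back rank";`, the `bm` operation holds the move a verification test would expect the search to find.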

bsamseth added 2 commits February 10, 2019 16:19

Keep track of moves made.

faca376

bsamseth added 27 commits February 11, 2019 09:48

Repeal aspiration window search.

2dc3f8c

Causes mates to be found much more slowly, and so far no increase in playing strength has been demonstrated. Might re-apply after tuning and proper verification.

Only do NMP when eval indicates it.

d1471b8

Leads to fewer nodes searched in the benchmark, as the null move is only tried when the evaluation indicates that it will give a beta cutoff. The margin could be subject to tuning.
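A gate of this kind can be sketched as follows. The names are hypothetical, and both the size and the sign convention of the margin are assumptions; the commit only states that a tunable margin exists.

```cpp
#include <cstdint>

using Value = std::int32_t;  // hypothetical score type

// Hypothetical tunable margin; value and sign convention are assumptions.
constexpr Value NULL_MOVE_MARGIN = 0;

// True when the static evaluation already suggests a fail-high, so that a
// null-move search is likely to confirm a beta cutoff and is worth trying.
constexpr bool eval_suggests_cutoff(Value static_eval, Value beta,
                                    Value margin = NULL_MOVE_MARGIN) {
    return static_eval >= beta + margin;
}
```

With a positive margin the null move is tried less often (fewer nodes spent on null-move searches that fail), while a negative margin tries it more often.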

Store mate/draw with current depth, not MAX.

59c2290

Thought storing these with MAX depth was a sure thing, but using the current depth actually seemed to fix an issue with mates not being found.

Clean up bound determining in search

4eb5220

Fix unused variable warning.

9375007

Add puzzle tests for search verification.

10429da

Meant to serve as the primary (only?) verification of search, by ensuring that certain puzzles are solved correctly. Other types of puzzles can easily be implemented in the same fashion.

Update puzzle tests to play nice with unittest

f27766b

Set up build dir smart if no dir is present.

1e11ed6

Add search verification to CI

d17a96c

Fix CI

cf40f0a

Non-functional change

37d2a22

Fix update_search calls

b1f28b2

Some calls were made after save_pv, which would cause the save_pv call to be deleted. Now update_search is called once per node.

Store correct depth for null value entries

2afe4a1

Change search limits in puzzles.

5ed86d4

4x depth can take forever when the mate is not found. Stick to 2x, which should be enough anyway.

Fix bug in multi-variation mate distance puzzle

2919e7d

Was trying to have Stockfish play a move after us each time, but if a mate is found, then there is no move to play!

Attempt to fix sudden abort due to update_search

da91cca

Still a bit strange with the puzzle tests...

Revert update_search confusion

5d829e4

Not quite sure yet why, but the order of update_search calls w.r.t. save_pv calls is quite sensitive. This is the old way, which seems to work.

Remove all optimization in debug mode.

735bd53

Even "safe" optimization "-Og" leaves the debugger quite useless...

Consolidate bound type assignment

7e11573

Also assign `Bound::EXACT` to stalemate/checkmates, as there is no doubt about their score.

Simplify ttable lookup logic

dc4971f

Allow Bound to be used as integer

bc6f96b

Allows use of &, | operators, which are relevant for its use. Fixes compile issue with last commit (which used & on bounds).
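One common layout that makes `&` and `|` meaningful on bounds is a bit-flag enum in which `EXACT` is the union of the two one-sided bounds. The sketch below illustrates that idea together with the usual bound classification and cutoff test; the enumerator values and the helper names `bound_for` and `tt_value_usable` are assumptions, not taken from the engine.

```cpp
// Unscoped enum with an int underlying type, so values convert to integers
// and the built-in & and | operators work. Enumerator values are assumed.
enum Bound : int {
    NONE  = 0,
    UPPER = 1,              // true score is <= the stored value (fail-low)
    LOWER = 2,              // true score is >= the stored value (fail-high)
    EXACT = UPPER | LOWER   // both bounds hold, i.e. the score is exact
};

// Classify a finished node from its best score and the ORIGINAL window.
// Checkmate and stalemate scores are known exactly, so such terminal nodes
// would be stored with Bound::EXACT directly.
constexpr Bound bound_for(int best, int alpha_original, int beta) {
    return best <= alpha_original ? UPPER
         : best >= beta           ? LOWER
                                  : EXACT;
}

// Can a stored value be returned immediately for the window (alpha, beta)?
constexpr bool tt_value_usable(Bound bound, int value, int alpha, int beta) {
    return bound == EXACT
        || ((bound & LOWER) != 0 && value >= beta)
        || ((bound & UPPER) != 0 && value <= alpha);
}
```

Because `EXACT` has both bits set, a test like `bound & LOWER` is true for exact entries as well as for lower bounds.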

Add optional debug logging to puzzle tests.

117241f

Disabled in all public commits, but there for easy enable/disable when debugging.

Add more puzzles to search verification.

13ac30e

Now checks all reported cases that failed in #25.

Include search verification in coverage report.

52a267b

Use make target for bench in CI

b785278

Fix CI

f0edccf

bsamseth added 13 commits February 12, 2019 19:27

Add codecov badge to README

2ecba25

Fragment src/search.cpp

44d265b

Moved all non-search critical functions outside of src/search.cpp and into separate files. This reduces clutter, and makes this (most important) file shorter and easier to comprehend (IMO).

Add new source files to source list.

d542035

Split out PVS as separate function.

2560c62

Slightly less code duplication across search/search_root.

Add non-mate test positions to search verification

96c0ef9

Bench before calculating coverage

bae0cd1

Update todo-list

4be34ed

Repealed aspiration window search in #27.

Add tests of node- and time limited search.

3aa38ee

Add pipfile to handle python deps

4b0f8c1

Add test for playing matches.

909a37c

No assertions made other than demanding that no errors occur during play, such that a result is found. Not a strict test, but if for some reason the game clock logic breaks, this might catch it.

Bump version number to v1.9.0

5ada898

Fix bug with move_count implementation.

349ecbf

Forgot to reset on copy construction.

Remove unused import

13bc0b1

bsamseth added this to the v1.9.0 milestone Feb 14, 2019

Repository owner deleted a comment Feb 14, 2019

bsamseth added 2 commits February 15, 2019 06:36

Update chester dep to v0.2.1

807eb58

Contains a needed bugfix compared to 0.2.0

bsamseth merged commit 112802e into master Feb 15, 2019

bsamseth deleted the search-refactor branch February 15, 2019 08:32

bsamseth mentioned this pull request Feb 15, 2019

Add tests of search logic #26

Closed
