Crazyhouse: missed mate in 4 #354

Vinvin20 · 2017-05-25T11:07:22Z

opperwezen won one game after many losses.
SF level 8 and the server analysis overlooked this very nice 15...Rg1!! (SF finds it after about 17 million nodes)
https://fr.lichess.org/1JQ3DgMe#29

This game is very interesting for tuning search and evaluation: https://fr.lichess.org/TFT5J2FS#66
The analysis shows a lot of ups and downs,
especially this very strange 45...gxh5???: https://fr.lichess.org/TFT5J2FS#89

Note that opperwezen has been playing a lot of 25+2 and 10+2 games these days: https://fr.lichess.org/@/opperwezen/search?perf=18&hasAi=1&aiLevelMin=8&sort.field=d&sort.order=desc

ddugovic · 2017-05-25T13:08:22Z

Bear in mind that even at AI level 8, Lichess caps the maximum search depth and thinking time. So unless the AI selects an easily refuted bad move (as it did with 45...gxh5??), there isn't a bug.

Vinvin20 · 2017-05-25T13:21:07Z

I've been interested in computer chess for more than 25 years; I never forget this kind of thing.

ddugovic · 2017-06-23T08:07:47Z

The author of http://chessvariants.training/ supplied me with a collection of 4,648 crazyhouse checkmate puzzles. I am using them to help identify whether there is a bug.

Vinvin20 · 2017-06-23T08:22:09Z

Very nice !

niklasf · 2017-06-23T09:02:20Z

Nice indeed, but note that afaik most puzzles were created using this Stockfish in the first place.

ddugovic · 2017-06-25T15:48:10Z

True. I found that (at 1 second/puzzle) simplifying the material-based singular extensions halves the error rate, since a newer upstream change works well; however, that simplification loses Elo overall.

So unless I'm missing something, the problem of "Stockfish occasionally misses mate in 4" is not a solvable problem. Stockfish is a strong engine; it's just bad at analyzing crazyhouse checkmates.

ianfab · 2017-06-25T18:51:48Z

In my experience, writing a patch that improves Stockfish specifically for certain positions and gains Elo overall is very difficult, but simply improving Elo in general will over time also improve play in specific positions. E.g., I did not design patches for SF to choose better antichess openings, but nevertheless over time it has started to play 1. e3, to avoid (1. e3 b5 2. Bxb5) Bb7, etc. I have observed a similar trend for crazyhouse mating combinations in the past. IMO, if SF cannot find some mating combinations, that is not really an issue, since you will always be able to find such positions (even for standard chess, although on a different level); but if an earlier version is way better at the same task, we should check what caused this and whether it can be fixed.

ddugovic · 2017-07-02T22:05:23Z

Thanks to #360 being resolved, Stockfish now scores 4392.33 / 4648 on the chessvariants.training test suite (at 4 threads, 1 second per puzzle) consisting of mate-in-[2-6] puzzles. Previously it scored 4357.17 / 4648.

When there's a lull in the queue, I'll redo the test suite with today's crazyhouse improvements and try to identify "easy puzzles" that it misses, in order to find another pattern.
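For scale, the two scores can be turned into success rates. A small sketch of the arithmetic; the helper names are illustrative only, and the fractional scores are used exactly as reported (how partial credit arises is not stated here).

```cpp
#include <cassert>

// Percentage of the suite solved; fractional scores are kept as reported.
double solvedPercent(double score, int puzzles) {
    assert(puzzles > 0);
    return 100.0 * score / puzzles;
}

// Net reduction in unsolved puzzles, as a share of those unsolved before.
double failureReductionPercent(double before, double after, int puzzles) {
    const double failedBefore = puzzles - before;
    assert(failedBefore > 0.0);
    return 100.0 * (after - before) / failedBefore;
}
```

With the numbers above this gives roughly 94.5% now versus 93.7% before, i.e. about 12% fewer unsolved puzzles.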

ianfab · 2017-07-02T22:34:41Z

Apropos, it would be a nice feature to support test suites on fishtest. I'll note that down, although I do not think that I will have time to implement that any time soon.

Vinvin20 · 2017-07-03T08:50:19Z

> Thanks to #360 being resolved, Stockfish now scores 4392.33 / 4648 on the chessvariants.training test suite (at 4 threads, 1 second per puzzle)

Do you mean 4 threads for 1 engine?
A 4- (or 2-) thread configuration sometimes finds solutions more easily.
It would be interesting to test single-threaded too.

ddugovic · 2017-07-03T12:52:07Z

> Do you mean 4 threads for 1 engine?
> A 4- (or 2-) thread configuration sometimes finds solutions more easily.
> It would be interesting to test single-threaded too.

I'm going to assume you mean the following:

2 engines in parallel each set at 2 threads/engine and 2 seconds/puzzle

versus:

1 engine set at 4 threads and 1 second/puzzle

and while I agree the former may be more accurate, it's more work to set up, because I want to be able to compare the output results easily (in the same order).
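One way to run two engine processes in parallel and still compare results in the original order is to tag each answer with its puzzle index and merge by index afterwards. A minimal sketch, assuming each engine instance is driven separately (process handling is omitted); `Result`, `partition`, and `mergeInSuiteOrder` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result record: the puzzle index travels with the answer.
struct Result {
    std::size_t puzzle;    // index into the original suite
    std::string bestMove;  // engine's first move, e.g. "f3g5"
};

// Round-robin assignment of puzzle indices to engine instances.
std::vector<std::vector<std::size_t>> partition(std::size_t puzzles, std::size_t engines) {
    assert(engines > 0);
    std::vector<std::vector<std::size_t>> work(engines);
    for (std::size_t i = 0; i < puzzles; ++i)
        work[i % engines].push_back(i);
    return work;
}

// Merge per-engine outputs back into suite order, whatever order they finished in.
std::vector<std::string> mergeInSuiteOrder(std::size_t puzzles,
                                           const std::vector<std::vector<Result>>& perEngine) {
    std::vector<std::string> merged(puzzles);
    for (const auto& results : perEngine)
        for (const auto& r : results) {
            assert(r.puzzle < puzzles);
            merged[r.puzzle] = r.bestMove;
        }
    return merged;
}
```

Because the merged vector is indexed by puzzle, the output lines up with a single-engine run regardless of which engine finished first.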

Vinvin20 · 2017-07-03T13:04:25Z

Yes, I meant "the best is to set only 1 thread inside each engine" (as you did, if I understand correctly).

ddugovic · 2017-07-16T23:52:09Z

Stockfish is rapidly improving at solving for mate. The most recent puzzle it struggles with is:

setoption name Threads value 1
setoption name MultiPV value 1
setoption name UCI_Variant value crazyhouse
position fen r2q1rk1/pp3ppp/2p1p3/3p4/3p4/3BPP1P/PPPKN1n1/R6R[BNPqbbn] b - - 0 1
d
go movetime 10000
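The FEN in this session stores the pieces in hand in brackets after the board (`[BNPqbbn]`: uppercase for White, lowercase for Black), while the FEN quoted later in this thread appends them as a ninth slash-separated field (`/Qbp`). A minimal sketch of reading either form, assuming these two conventions are the only ones in play and that only the board field (the text before the first space) is passed in; `Pocket` and `parsePocket` are hypothetical names, not Stockfish code.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Pieces in hand per side, keyed by uppercase piece letter (P, N, B, R, Q).
struct Pocket {
    std::map<char, int> white;
    std::map<char, int> black;
};

// Extract the pocket from the board field of a crazyhouse FEN, assuming either
// ".../R6R[BNPqbbn]" (bracketed) or ".../R2Q1RK1/Qbp" (ninth "rank").
Pocket parsePocket(const std::string& board) {
    std::string pieces;
    const auto open = board.find('[');
    if (open != std::string::npos) {
        const auto close = board.find(']', open);
        pieces = board.substr(open + 1, close - open - 1);
    } else {
        // 8 ranks use 7 separators, so an 8th '/' starts the pocket.
        std::size_t slashes = 0, start = 0;
        for (std::size_t i = 0; i < board.size(); ++i)
            if (board[i] == '/' && ++slashes == 8) { start = i + 1; break; }
        if (slashes == 8) pieces = board.substr(start);
    }
    Pocket p;
    for (char c : pieces) {
        const auto uc = static_cast<unsigned char>(c);
        if (!std::isalpha(uc)) continue;
        const char piece = static_cast<char>(std::toupper(uc));
        if (std::isupper(uc)) ++p.white[piece];
        else ++p.black[piece];
    }
    return p;
}
```

For the position above, White holds B, N, P in hand and Black holds q, b, b, n.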

ddugovic · 2017-07-17T00:19:21Z

After some thought, I have created two short tuning sessions (perhaps these should be tuned together; I don't know, that really depends on the "shape" of the Elo gain curve in the N-dimensional parameter space):
http://35.161.250.236:6543/tests/view/596c01bc6e23db67e90ddaf6
http://35.161.250.236:6543/tests/view/596c02246e23db67e90ddaf8

Vinvin20 · 2017-07-17T18:18:48Z

The values didn't change much after tuning, did they?

ianfab · 2017-07-17T18:28:10Z

@Vinvin20 He stopped the two tuning sessions early and combined them in http://35.161.250.236:6543/tests/view/596c9ffd6e23db67e90ddb01. If the change is still small, decreasing the tuning parameter A or increasing the number of games might help.

ddugovic · 2017-07-17T19:36:55Z

Indeed, initially I assumed that tuning them together would be much too large a change, but so far that isn't the case...

I think I'll double the number of games in this same session (without doubling A mid-session), which might produce a more precise result. I think the SPSA documentation cautions that this algorithm only produces approximations, but increasing the game count (or decreasing A) could in principle yield a more precise approximation.

ianfab · 2017-07-17T20:02:39Z

@ddugovic Increasing the number of games does not work for a tuning session that has already started, since, if I remember correctly, the number of iterations is not updated; it is necessary to resubmit it with a larger number of games. I'll have a look at the code to see whether this can be fixed easily to avoid wasting resources.

ddugovic · 2017-07-17T20:09:21Z

@ianfab Oops. Well, worst case I can copy the test results, manually translate them into the input format, and submit a new test with A reduced (though I'm unsure to what).

ianfab · 2017-07-17T20:36:59Z

@ddugovic Right. After a quick look at the code I am not sure what will happen after reaching the 10000 games limit, so let's see.

ddugovic · 2017-07-18T11:52:38Z

Also, in atomic chess, allowing a ShelterWeakness value to be negative proved useful. I'm trying that (using SPSA on my PC) right now...

[Main]
Simulate = 0
Variables = crazyhouse.var
Log = crazyhouse.log
GameLog = crazyhouse_$THREAD.log
Iterations = 10000
A = 1000
Gamma = 0.101
Alpha = 0.602

[Engine]
Engine1 = ./stockfish
Engine2 = ./stockfish
EPDBook = ./books/crazyhouse.epd
BaseTime = 1000
IncTime = 50
Concurrency = 4
DrawScoreLimit = 4
DrawMoveLimit = 8
WinScoreLimit = 650
WinMoveLimit = 8
Variant = crazyhouse
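The A, Alpha, and Gamma entries above are the usual SPSA gain-sequence parameters. As a sketch of how they interact, assuming the tuner follows Spall's standard formulation (an assumption; neither this tuner's nor fishtest's exact code is shown here): the step size is a_k = a / (A + k + 1)^alpha and the perturbation size is c_k = c / (k + 1)^gamma. The base values a and c do not appear in the config, so any values used with this sketch are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard SPSA gain sequences: both shrink as iteration k grows; A delays the decay of a_k.
struct SpsaGains {
    double a, c;          // hypothetical base step and perturbation sizes
    double A;             // stability constant (A = 1000 in the config above)
    double alpha, gamma;  // decay exponents (0.602 and 0.101 above)

    double step(int k) const { return a / std::pow(A + k + 1, alpha); }
    double perturbation(int k) const { return c / std::pow(k + 1, gamma); }
};

// One SPSA update for maximization. delta holds the random +1/-1 signs used to build
// theta + c_k*delta and theta - c_k*delta; yPlus - yMinus is the measured difference
// in the noisy objective (e.g. a game result between the two perturbed engines).
void spsaUpdate(std::vector<double>& theta, const std::vector<int>& delta,
                double yPlus, double yMinus, const SpsaGains& g, int k) {
    assert(theta.size() == delta.size());
    const double ak = g.step(k), ck = g.perturbation(k);
    for (std::size_t i = 0; i < theta.size(); ++i)
        theta[i] += ak * (yPlus - yMinus) / (2.0 * ck * delta[i]);
}
```

With a fixed a, lowering A (say from 1000 to 100) makes the early steps larger, which is consistent with the suggestion in this thread to decrease A when a session barely moves the values.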

ddugovic · 2017-07-21T11:14:37Z

I've set aside my tweaks as they aren't making a significant impact.

ddugovic · 2017-07-22T17:26:13Z

I have created a suite of 100 challenging puzzles for Stockfish to solve!

Vinvin20 · 2017-07-23T17:13:04Z

Don't you have the shortest distance to mate?
How does the current SF do on this set?

ddugovic · 2017-07-25T07:58:50Z

I'm not sure my generated solutions are even correct. Assume DTM = 6 (distance to mate).
About 30% at the last test (single-threaded, 1 second/puzzle), but I haven't tested with the latest master.

Vinvin20 · 2017-07-27T18:37:45Z

I checked the first 10 positions.
I copied the analysis there.

I found some more solutions and improvements:
Position 3: Nf5 is mate in 5
Position 7: P@h5 is mate in 5
Position 8: R@g8 is mate in 5 too
Position 9: Qxg4 is mate in 6

The incomplete solutions could explain why SF gets such a low score ;-)

rpdelaney · 2017-07-28T20:02:45Z

@ddugovic I have quite a few more puzzles than we have currently published on chessvariants.training; if you would find them useful I could supply them to you as well.

ianfab · 2017-07-28T20:20:18Z

The most important thing is to have correct puzzles, i.e. there is only one solution or all winning moves are given in the EPD. If such puzzles are available, that would be very useful.

rpdelaney · 2017-07-28T20:58:43Z

@ianfab Mating combinations with exactly one winning line are extremely rare in crazyhouse. My puzzle generator builds a solution tree in json with all of the mating lines found by stockfish.

ianfab · 2017-07-28T21:17:55Z

@rpdelaney I meant that the first move has to be an only move (or the alternatives have to be given), but the subsequent moves can contain alternative solutions, since it usually is sufficient to only check the first move when running a test suite.

rpdelaney · 2017-07-28T21:19:50Z

@ianfab Alternatives are returned at every level, including the first move. Only those lines that mate equally fast as the fastest mating line are returned, though.
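Following the point that only the first move usually needs checking, a test-suite entry can carry the full set of winning first moves, and the engine's move counts as a solve if it is any of them. A minimal sketch, assuming moves are stored in UCI notation (as in the engine's `pv` output); `MatePuzzle`, `solved`, and `suiteScore` are hypothetical names. Note that if the set only holds the fastest mates, a slower but still winning move is scored as a miss.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical test-suite entry: a position plus every accepted winning first move.
struct MatePuzzle {
    std::string fen;
    std::set<std::string> winningFirstMoves;
};

// A puzzle counts as solved if the engine's first move is any listed winning move.
bool solved(const MatePuzzle& p, const std::string& engineMove) {
    return p.winningFirstMoves.count(engineMove) > 0;
}

// Fraction of the suite solved, given one engine move per puzzle (same order).
double suiteScore(const std::vector<MatePuzzle>& suite, const std::vector<std::string>& moves) {
    assert(suite.size() == moves.size() && !suite.empty());
    std::size_t hits = 0;
    for (std::size_t i = 0; i < suite.size(); ++i)
        if (solved(suite[i], moves[i])) ++hits;
    return static_cast<double>(hits) / suite.size();
}
```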

ianfab · 2017-07-28T21:35:11Z

Finding the fastest mate is also an interesting task, but it probably is not a good indicator for general playing strength, so I am hesitant to use such problems for optimizing stockfish, but nevertheless the puzzles might be interesting.

rpdelaney · 2017-07-28T21:39:02Z

However, as @niklasf pointed out, they were generated using this version of stockfish to begin with. So I'm not sure what use they would be for debugging stockfish.

ianfab · 2017-07-28T22:21:03Z

Yes, that's clearly a limitation. It might still be useful if Stockfish can be tuned towards achieving results similar to those at much longer time controls (perhaps at least 1-2 orders of magnitude longer), but calculating a set of all winning moves is important for that, I think, because otherwise it gets overfitted towards finding the fastest solution.

ddugovic · 2017-07-29T00:46:47Z

It's true that finding the fastest mate is more expensive than finding a strong move. Given that the success rate is already over 94%, I'm not interested in tuning to maximize that result, but rather in using it to identify new ideas to try in the testing queue.

Vinvin20 · 2017-07-29T18:00:02Z

> The incomplete solutions could explain why SF gets such a low score ;-)

We need a script that makes SF analyze all the positions: 10 best moves, 1 core, 2 minutes per position.
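Whatever language the driver is written in, the requested settings map onto a short UCI command sequence: Threads 1 for one core, MultiPV 10 for the 10 best moves, and `go movetime 120000` for 2 minutes. A sketch of building those commands, assuming the multi-variant Stockfish `UCI_Variant` option used in the sessions above; process I/O is omitted, and `setupCommands`/`analyseCommands` are hypothetical helpers. Per the UCI protocol, the driver should wait for `readyok` after `isready` and for `bestmove` before sending the next position.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Commands sent once per engine process, before any search starts.
std::vector<std::string> setupCommands(int threads, int multiPv) {
    return {
        "uci",
        "setoption name UCI_Variant value crazyhouse",  // multi-variant Stockfish option
        "setoption name Threads value " + std::to_string(threads),
        "setoption name MultiPV value " + std::to_string(multiPv),
        "isready",  // wait for "readyok" before continuing
    };
}

// Commands for one position; wait for "bestmove" before sending the next batch.
std::vector<std::string> analyseCommands(const std::string& fen, int seconds) {
    assert(seconds > 0);
    return {
        "ucinewgame",
        "isready",
        "position fen " + fen,
        "go movetime " + std::to_string(seconds * 1000),
    };
}
```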

rpdelaney · 2017-07-29T19:08:47Z

@Vinvin20 This seems to work, but I haven't tested it much https://gist.github.com/rpdelaney/7e7e19ea1f6ed6cb2a67c03137e0f040

ddugovic · 2017-07-29T20:12:10Z

@Vinvin20 Here is another option:
https://github.com/niklasf/python-chess/blob/master/examples/bratko_kopec/bratko_kopec.py

ppigazzini · 2017-08-07T12:22:06Z

@rpdelaney JHellis has derived Matefinder from official SF. Perhaps some of its tricks could be useful for a "Multi Variant Mate Finder" used to build harder mates for Multi Variant SF.

jhellis3 · 2017-08-07T17:39:04Z

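I'm not sure how many of my changes in Matefinder would be generally applicable; however, the changes to LMR (late move reductions) are probably the most likely to improve the tactical awareness of the engine in a general sense:

     // Extra clause in the LMR condition (surrounding condition not shown):
     // do not reduce at depth >= 16 within the first 3 plies from the root
     && !(depth >= 16 * ONE_PLY && ss->ply <= 3 * ONE_PLY))

     // Cap the reduction at 3 plies when the reduced depth is more than
     // 8 plies below the current root iteration depth
     if (newDepth - r + 8 * ONE_PLY < thisThread->rootDepth)
         r = std::min(r, 3 * ONE_PLY);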

Both are likely to cost Elo (due to a longer TTD, time to depth), but you are much less likely to miss tactical shots.

Vinvin20 · 2017-08-23T08:09:47Z

> Both are likely to cost Elo (due to a longer TTD, time to depth), but you are much less likely to miss tactical shots.

Crazyhouse is way more tactical than chess.
After 5-10 moves, each side begins to attack heavily around the king.
One missed move, and the game can be over a couple of moves later.

ddugovic · 2017-08-23T09:39:53Z

Right, although one missed move isn't necessarily a missed tactical shot; in general, we measure the Elo gain of patches.

I do wonder whether adding a "Study" mode similar to matefinder could be useful for puzzle generation and puzzle-solving.

ddugovic · 2017-08-27T17:17:56Z

I have updated the 100 challenging puzzles link. Currently SF scores 96% on the chessvariants.training crazyhouse set, and of the 100 puzzles it fails, it solves 40% on a second attempt.

Again, the goal isn't to score 100%, but to find low-cost improvements.

ianfab · 2017-09-13T17:42:07Z

According to a test that uses @ddugovic's set of 100 puzzles as starting positions and some local tests with difficult mate combinations, mate finding has (to my surprise) improved a lot with the crazyhouse qsearch patch.

ppigazzini · 2017-09-15T07:11:32Z

@ianfab from the regression test it seems that the patch a16ebb0 performs better with LTC

ddugovic · 2017-09-15T07:18:05Z

Wow, did this release gain 121 Elo over the previous release?

ianfab · 2017-09-15T09:59:46Z

@ppigazzini @ddugovic
This Elo gain is expected, but nevertheless an awesome result considering that improvement had slowed down before. Search for "crazyhouse" on the diff page niklasf/Stockfish@4257b04...af68133 to find the relevant patches that gave the improvement. The improvements mainly came from the qsearch patch (~50-60 Elo), PSQT tweaks (~20-30 Elo), and king safety related parameter tweaks (~20-30 Elo). (The Elo gains are just rough estimates from the SPRT results at LTC, no direct measurements.)

Vinvin20 · 2017-09-15T12:10:34Z

Great! Incredible improvement!
I often see Yasser Seirawan playing ZH against SF level 7: https://lichess.org/@/yasser-seirawan/search?hasAi=1&sort.field=d&sort.order=desc
I'll watch more closely now with this improvement!
Very good job from the whole team!

ddugovic · 2017-09-16T15:31:31Z

Take a difficult position such as r1bn1r1b/ppp1Nppk/3pnP1p/8/2BpP3/5N2/PP3PPP/R2Q1RK1/Qbp w - - 0 1 where the mate is 1. Ng5+! Nxg5 2. Bxf7!! with unstoppable threats of Bg6#, Q@g6#, and Q@g8+ forcing mate. Currently this is a bit much for Stockfish to find in < 1 second, but it did highlight two things a human player considers:

Atari - Black's king is surrounded, so White more aggressively looks for a knock-out blow
Persistent threats - after 2. Bxf7, Black needs to deliver check/checkmate, or White mates by force. @ianfab's qsearch patch improves null-move search, which resolves "both kings are getting mated" race situations for a ~50-60 Elo (estimated) gain.

ianfab · 2017-09-16T16:25:29Z

I tried to address the first point with a patch some time ago, but my tests unfortunately were not very successful, see, e.g., http://35.161.250.236:6543/tests/view/58f75dbc6e23db2fa80810d1. The second point should mostly already be addressed in king safety and threat evaluation, but perhaps some more crazyhouse specific ideas could be added there.

ddugovic · 2017-10-04T07:44:48Z

I think #416 resolves the initial issue. There will always be new ideas & room for further improvement but shallow mates seem not to pose a problem anymore:

setoption name UCI_Variant value crazyhouse
position fen r1bn1r1b/ppp1Nppk/3pnP1p/8/2BpP3/5N2/PP3PPP/R2Q1RK1/Qbp w - - 0 1
go movetime 1000
info string variant crazyhouse startpos rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR[] w KQkq - 0 1

info depth 1 seldepth 1 multipv 1 score cp 1041 nodes 178 nps 178000 tbhits 0 time 1 pv c4e6 c8e6
info depth 2 seldepth 2 multipv 1 score cp 1077 nodes 1228 nps 614000 tbhits 0 time 2 pv e4e5 g7g6 f3d4 e6d4
info depth 3 seldepth 3 multipv 1 score cp 1083 nodes 1392 nps 464000 tbhits 0 time 3 pv e4e5 g7g6 c4e6 c8e6
info depth 4 seldepth 4 multipv 1 score cp 1083 nodes 1585 nps 528333 tbhits 0 time 3 pv e4e5 g7g6 c4e6 c8e6
info depth 5 seldepth 5 multipv 1 score cp 1095 nodes 2532 nps 633000 tbhits 0 time 4 pv e4e5 P@e4 f3d4 g7f6 P@g6 f7g6
info depth 6 seldepth 7 multipv 1 score cp 1117 nodes 4478 nps 746333 tbhits 0 time 6 pv e4e5 g7g6 c4e6 d8e6 f3d4 e6d4 d1d4
info depth 7 seldepth 8 multipv 1 score cp 984 nodes 9353 nps 623533 tbhits 0 time 15 pv f3d4 g7f6 d4e6 d8e6 e7c8 P@g6 c4e6 f7e6
info depth 8 seldepth 10 multipv 1 score cp 1134 nodes 18037 nps 546575 tbhits 0 time 33 pv f3d4 P@e2 d4e2 g7f6 e7c8 a8c8 P@e7 P@g6 e7f8q e6f8
info depth 9 seldepth 13 multipv 1 score cp 1175 nodes 30785 nps 513083 tbhits 0 time 60 pv f3d4 g7f6 c4e6 d8e6 e7c8 e6g7 B@e3 a8c8 P@e7 P@g6 e7f8q c8f8
info depth 10 seldepth 17 multipv 1 score cp 1190 nodes 56157 nps 492605 tbhits 0 time 114 pv f3d4 P@g6 e7c8 B@e5 P@e7 a8c8 d4e6 d8e6 e7f8q c8f8 c4e6 f7e6
info depth 11 seldepth 15 multipv 1 score cp 1243 nodes 91278 nps 490741 tbhits 0 time 186 pv f3d4 P@g6 e7c8 e6d4 P@e7 f8e8 c8d6 c7d6 d1d4 d8e6 d4d6 g7f6
info depth 12 seldepth 16 multipv 1 score mate 6 nodes 292311 nps 526686 tbhits 0 time 555 pv f3g5 e6g5 c4f7 g5f3 d1f3 N@h3 g2h3 c8f5 f3f5 P@g6 f7g6
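To check runs like this automatically (did the engine report a mate, and does its PV start with f3g5, i.e. 1. Ng5+?), a driver can parse the `info` lines. A minimal sketch, assuming the standard UCI `info` format shown above; `InfoLine` and `parseInfo` are hypothetical names, and bound qualifiers such as `lowerbound` are simply skipped.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Fields of interest from one UCI "info" line.
struct InfoLine {
    int depth = 0;
    bool mate = false;      // true for "score mate N", false for "score cp N"
    int score = 0;          // moves to mate, or centipawns
    std::string firstMove;  // first move of the principal variation
};

// Parse "info depth D ... score (cp|mate) N ... pv m1 m2 ...". Unknown tokens are skipped.
// Returns false for lines without a score (e.g. "info string ...").
bool parseInfo(const std::string& line, InfoLine& out) {
    std::istringstream in(line);
    std::string tok;
    if (!(in >> tok) || tok != "info") return false;
    bool sawScore = false;
    while (in >> tok) {
        if (tok == "depth") {
            in >> out.depth;
        } else if (tok == "score") {
            std::string kind;
            in >> kind >> out.score;
            out.mate = (kind == "mate");
            sawScore = true;
        } else if (tok == "pv") {
            in >> out.firstMove;
            break;  // the rest of the line is the PV itself
        }
    }
    return sawScore;
}
```

Applied to the final line above, this yields depth 12, a mate in 6, and first move f3g5.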

ddugovic changed the title from "ZH : humans still have a chance against level 8" to "Crazyhouse: missed mate in 4" on May 25, 2017

ddugovic closed this as completed Oct 4, 2017
