Crazyhouse: missed mate in 4 #354

Closed
Vinvin20 opened this issue May 25, 2017 · 51 comments

@Vinvin20

Vinvin20 commented May 25, 2017

opperwezen won one game after many losses.
SF-Lvl8 and the server analysis overlooked this very nice 15...Rg1!! (found by SF after about 17 million nodes)
https://fr.lichess.org/1JQ3DgMe#29

This game is also very interesting for tuning search and eval: https://fr.lichess.org/TFT5J2FS#66
The evaluation swings up and down a lot in the analysis.
Especially this very strange 45...gxh5??? https://fr.lichess.org/TFT5J2FS#89

Note that opperwezen plays a lot of 25+2 and 10+2 these days : https://fr.lichess.org/@/opperwezen/search?perf=18&hasAi=1&aiLevelMin=8&sort.field=d&sort.order=desc

@ddugovic
Owner

Bear in mind even at AI level 8, Lichess caps the maximum search depth and thinking time. So unless the AI selects an easily refuted bad move (as it did with 45...gxh5??) there isn't a bug.

@ddugovic ddugovic changed the title from "ZH : humans still have a chance against level 8" to "Crazyhouse: missed mate in 4" May 25, 2017
@Vinvin20
Author

Vinvin20 commented May 25, 2017

I have been interested in computer chess for more than 25 years; I never forget this kind of thing.

@ddugovic
Owner

ddugovic commented Jun 23, 2017

Author of http://chessvariants.training/ supplied me with a collection of 4,648 crazyhouse checkmate puzzles. I am using these to help identify whether there is a bug.

@Vinvin20
Author

Very nice !

@niklasf
Collaborator

niklasf commented Jun 23, 2017

Nice indeed, but note that afaik most puzzles were created using this Stockfish in the first place.

@ddugovic
Owner

ddugovic commented Jun 25, 2017

True. I found that (at the rate of 1 second/puzzle) the error rate is halved by simplifying the material-based singular extensions, since a newer upstream change works well; however, that simplification loses Elo overall.

So unless I'm missing something the problem of "Stockfish occasionally misses mate in 4" is not a solvable problem. Stockfish is a strong engine, it's just bad at analyzing crazyhouse checkmates.

@ianfab
Collaborator

ianfab commented Jun 25, 2017

In my experience, writing a patch that improves Stockfish specifically for certain positions and gains Elo overall is very difficult, but simply improving Elo in general will over time also improve play in specific positions. E.g., I did not design patches for SF to choose better antichess openings, but nevertheless over time it has started to play 1. e3, to avoid (1. e3 b5 2. Bxb5) Bb7, etc. I have observed a similar trend for crazyhouse mating combinations in the past. IMO, if SF can not find some mating combinations that is not really an issue, since you will always be able to find such positions (even for standard chess, although on a different level), but if an earlier version is way better at the same task, it should be checked what caused this and whether it can be fixed.

@ddugovic
Owner

ddugovic commented Jul 2, 2017

Thanks to #360 being resolved, Stockfish now scores 4392.33 / 4648 on the chessvariants.training test suite (at 4 threads, 1 second per puzzle) consisting of mate-in-[2-6] puzzles. Previously it scored 4357.17 / 4648.

When there's a lull in the queue, I'll redo the test suite with today's crazyhouse improvements and attempt to identify "easy puzzles" that it misses & identify another pattern.
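
As a quick check of the two suite scores quoted above, the solve rates can be computed directly. This is only arithmetic on the figures given in this comment; the function name `solve_rate` is made up for illustration.

```python
TOTAL_PUZZLES = 4648  # size of the chessvariants.training suite quoted above


def solve_rate(score, total=TOTAL_PUZZLES):
    """Return the fraction of the suite credited as solved."""
    return score / total


before = 4357.17  # score before #360 was resolved
after = 4392.33   # score after #360 was resolved

print(f"before: {solve_rate(before):.2%}")
print(f"after:  {solve_rate(after):.2%}")
print(f"gain:   {after - before:.2f} puzzles")
```

That works out to roughly 93.7% before and 94.5% after, a gain of about 35 puzzles.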

@ianfab
Collaborator

ianfab commented Jul 2, 2017

Apropos, it would be a nice feature to support test suites on fishtest. I'll note that down, although I do not think that I will have time to implement that any time soon.

@Vinvin20
Author

Vinvin20 commented Jul 3, 2017

Thanks to #360 being resolved, Stockfish now scores 4392.33 / 4648 on the chessvariants.training test suite (at 4 threads, 1 second per puzzle)

Do you mean 4 threads for 1 engine?
A 4-thread (or 2-thread) configuration sometimes finds solutions more easily.
It would be interesting to test with a single thread too.

@ddugovic
Owner

ddugovic commented Jul 3, 2017

Do you mean 4 threads for 1 engine?
A 4-thread (or 2-thread) configuration sometimes finds solutions more easily.
It would be interesting to test with a single thread too.

I'm going to assume you mean the following:

  • 2 engines in parallel each set at 2 threads/engine and 2 seconds/puzzle

versus:

  • 1 engine set at 4 threads and 1 second/puzzle

and while I agree the former may be more accurate, it's more work to set up because I want to be able to easily compare the output results (in the same order).

@Vinvin20
Author

Vinvin20 commented Jul 3, 2017

Yes, I meant "the best is to set only 1 thread inside each engine" (as you did, if I understand correctly).

@ddugovic
Owner

Stockfish is rapidly improving at solving for mate. The most recent puzzle it struggles with is:

setoption name Threads value 1
setoption name MultiPV value 1
setoption name UCI_Variant value crazyhouse
position fen r2q1rk1/pp3ppp/2p1p3/3p4/3p4/3BPP1P/PPPKN1n1/R6R[BNPqbbn] b - - 0 1
d
go movetime 10000
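
For anyone who wants to script this test, here is a minimal sketch that sends the same commands to an engine over stdin and reads lines until `bestmove`. The function names and the `./stockfish` path are assumptions for illustration; the Stockfish-specific `d` (display board) command is left out because it is not needed for the search.

```python
import subprocess


def build_uci_script(fen, variant="crazyhouse", threads=1, multipv=1, movetime_ms=10000):
    """Return the UCI commands from the comment above as a list of lines."""
    return [
        f"setoption name Threads value {threads}",
        f"setoption name MultiPV value {multipv}",
        f"setoption name UCI_Variant value {variant}",
        f"position fen {fen}",
        f"go movetime {movetime_ms}",
    ]


def run_engine(engine_path, commands):
    """Feed the commands to the engine and return its output up to 'bestmove'."""
    proc = subprocess.Popen(
        [engine_path], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    for command in commands:
        proc.stdin.write(command + "\n")
    proc.stdin.flush()
    output = []
    for line in proc.stdout:
        output.append(line.rstrip("\n"))
        if line.startswith("bestmove"):
            break
    proc.stdin.write("quit\n")
    proc.stdin.flush()
    proc.wait()
    return output


PUZZLE_FEN = "r2q1rk1/pp3ppp/2p1p3/3p4/3p4/3BPP1P/PPPKN1n1/R6R[BNPqbbn] b - - 0 1"

if __name__ == "__main__":
    # Assumes a multi-variant Stockfish binary at ./stockfish (hypothetical path).
    for line in run_engine("./stockfish", build_uci_script(PUZZLE_FEN)):
        print(line)
```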

@ddugovic
Owner

ddugovic commented Jul 17, 2017

After some thought, I have created two short tuning sessions (perhaps these should be tuned together, I don't know, that really depends upon the "shape" of the Elo gain curve in the N-dimensional parameter space):
http://35.161.250.236:6543/tests/view/596c01bc6e23db67e90ddaf6
http://35.161.250.236:6543/tests/view/596c02246e23db67e90ddaf8

@Vinvin20
Author

The values didn't change much after tuning, did they?

@ianfab
Collaborator

ianfab commented Jul 17, 2017

@Vinvin20 He stopped the two tuning sessions early and combined them in http://35.161.250.236:6543/tests/view/596c9ffd6e23db67e90ddb01. If the change is still small, decreasing the tuning parameter A or increasing the number of games might help.

@ddugovic
Owner

ddugovic commented Jul 17, 2017

Indeed, initially I assumed that tuning them together would be much too large a change, but so far that isn't the case...

I think I'll double the number of games in this same session (without doubling A mid-session) which might produce a more precise result. I think the SPSA documentation cautions this algorithm only produces approximations, but increasing the game count (or decreasing A) could in principle yield a more precise approximation.

@ianfab
Collaborator

ianfab commented Jul 17, 2017

@ddugovic Increasing the number of games does not work for a started tuning session, since the number of iterations is not updated if I remember correctly, so it is necessary to resubmit it with a larger number of games. I'll have a look at the code to see whether this can be fixed easily to avoid wasting resources.

@ddugovic
Owner

@ianfab Oops. Well, worst case I can copy the test results, manually translate them into the input format, and submit a new test with A reduced (though I'm unsure to what).

@ianfab
Collaborator

ianfab commented Jul 17, 2017

@ddugovic Right. After a quick look at the code I am not sure what will happen after reaching the 10000 games limit, so let's see.

@ddugovic
Owner

ddugovic commented Jul 18, 2017

Also, in atomic chess allowing a ShelterWeakness value to be negative proved useful. I'm trying that (using SPSA on my PC) right now...

[Main]
Simulate = 0
Variables = crazyhouse.var
Log = crazyhouse.log
GameLog = crazyhouse_$THREAD.log
Iterations = 10000
A = 1000
Gamma = 0.101
Alpha = 0.602

[Engine]
Engine1 = ./stockfish
Engine2 = ./stockfish
EPDBook = ./books/crazyhouse.epd
BaseTime = 1000
IncTime = 50
Concurrency = 4
DrawScoreLimit = 4
DrawMoveLimit = 8
WinScoreLimit = 650
WinMoveLimit = 8
Variant = crazyhouse
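
To see how A, Alpha and Gamma interact, here is a minimal sketch of the gain sequences, assuming the tuner follows the standard SPSA forms a_k = a / (A + k + 1)^alpha and c_k = c / (k + 1)^gamma. The actual tuner may index or scale these differently, and the per-parameter constants `a` and `c` are hypothetical because they are not shown in this thread.

```python
def spsa_gains(k, a=1.0, c=1.0, A=1000.0, alpha=0.602, gamma=0.101):
    """Return (a_k, c_k) for 0-based iteration k under the standard SPSA gain forms.

    a_k scales the parameter update; c_k scales the perturbation size.
    The a and c defaults are placeholders, not values from the thread.
    """
    a_k = a / (A + k + 1) ** alpha
    c_k = c / (k + 1) ** gamma
    return a_k, c_k


if __name__ == "__main__":
    # Compare the step size at a few iterations for A = 1000 (as configured) and A = 100.
    for k in (0, 1000, 9999):
        step_large_A, perturbation = spsa_gains(k, A=1000.0)
        step_small_A, _ = spsa_gains(k, A=100.0)
        print(k, step_large_A, step_small_A, perturbation)
```

Under these assumptions a smaller A gives larger steps early in the session, which is consistent with the earlier suggestion to decrease A when the tuned values barely move.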

@ddugovic
Owner

I've set aside my tweaks as they aren't making a significant impact.

@ddugovic
Owner

I have created a suite of 100 challenging puzzles for Stockfish to solve!

@Vinvin20
Author

  1. Don't you have the shortest distance to mate?
  2. How does the current SF do on this set?

@ddugovic
Owner

  1. I'm unsure that my generated solutions are even correct. Assume DTM=6.
  2. About 30% at last test (single-threaded, 1 second/puzzle), but I haven't tested with latest master.

@Vinvin20
Author

I checked the first 10 positions.
I copied and pasted the analysis there.

I found some more solutions and improvements :
Position 3 : Nf5 is mate in 5
Position 7 : p@h5 is mate in 5
Position 8 : R@g8 is mate in 5 too
Position 9 : Qxg4 mate in 6

The incomplete solutions could explain why SF gets such a low score ;-)

@rpdelaney

@ddugovic I have quite a few more puzzles than we have currently published on chessvariants.training; if you would find them useful I could supply them to you as well.

@ianfab
Collaborator

ianfab commented Jul 28, 2017

The most important thing is to have correct puzzles, i.e. there is only one solution or all winning moves are given in the EPD. If such puzzles are available, that would be very useful.

@rpdelaney

rpdelaney commented Jul 28, 2017

@ianfab Mating combinations with exactly one winning line are extremely rare in crazyhouse. My puzzle generator builds a solution tree in json with all of the mating lines found by stockfish.

@ianfab
Collaborator

ianfab commented Jul 28, 2017

@rpdelaney I meant that the first move has to be an only move (or the alternatives have to be given), but the subsequent moves can contain alternative solutions, since it usually is sufficient to only check the first move when running a test suite.
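
The first-move check described here can be sketched as follows. This is only an illustration: the puzzle ids, the dictionary layout, and the use of UCI move strings are assumptions, not the format of any existing suite.

```python
def score_suite(accepted_first_moves, engine_first_moves):
    """Count puzzles where the engine's first move is among the accepted first moves.

    accepted_first_moves: puzzle id -> set of winning first moves (UCI strings)
    engine_first_moves:   puzzle id -> first move the engine chose (UCI string)
    """
    solved = 0
    for puzzle_id, accepted in accepted_first_moves.items():
        if engine_first_moves.get(puzzle_id) in accepted:
            solved += 1
    return solved, len(accepted_first_moves)


# Hypothetical data: one puzzle with a single winning first move,
# one puzzle where two first moves both win.
accepted = {
    "puzzle-a": {"f3g5"},
    "puzzle-b": {"e7g8", "d1h5"},
}
engine = {
    "puzzle-a": "f3g5",
    "puzzle-b": "c4e6",
}
solved, total = score_suite(accepted, engine)
print(f"{solved}/{total} solved")
```

Listing every winning first move in the accepted set is what keeps a puzzle with several solutions from being counted as a failure.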

@rpdelaney

@ianfab Alternatives are returned at every level, including the first move. Only those lines that mate equally fast as the fastest mating line are returned, though.

@ianfab
Collaborator

ianfab commented Jul 28, 2017

Finding the fastest mate is also an interesting task, but it probably is not a good indicator for general playing strength, so I am hesitant to use such problems for optimizing stockfish, but nevertheless the puzzles might be interesting.

@rpdelaney

However, as @niklasf pointed out, they were generated using this version of stockfish to begin with. So I'm not sure what use they would be for debugging stockfish.

@ianfab
Collaborator

ianfab commented Jul 28, 2017

Yes, that's clearly a limitation. It might still be useful if Stockfish can be tuned towards achieving similar results as at much longer time control (perhaps at least 1-2 orders of magnitude), but calculating a set of all winning moves is important for that, I think, because it otherwise gets overfitted towards finding the fastest solution.

@ddugovic
Owner

It's true that finding the fastest mate is more expensive than finding a strong move. Given that the success rate is already over 94% I'm not interested in tuning to maximize that result, but rather to help identify new ideas for trying in the testing queue.

@Vinvin20
Author

The incomplete solutions could explain why SF gets such a low score ;-)

We need a script to make SF analyze all the positions: 10 best moves, 1 core, 2 minutes per position.
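
One piece of such a script could look like the sketch below: the engine settings for this request, plus a parser that keeps the latest line for each MultiPV rank and returns the first moves that lead to mate. The helper name and the sample info lines are made up for illustration; only the `info ... multipv ... score ... pv ...` layout follows standard UCI output.

```python
import re

# Settings matching the request: 1 core, 10 best moves, 2 minutes per position.
ANALYSIS_COMMANDS = [
    "setoption name Threads value 1",
    "setoption name MultiPV value 10",
    "setoption name UCI_Variant value crazyhouse",
]
GO_COMMAND = "go movetime 120000"

INFO_RE = re.compile(r"multipv (\d+) score (cp|mate) (-?\d+).* pv (\S+)")


def mating_first_moves(info_lines):
    """Map each first move to its mate distance, using the latest line per MultiPV rank.

    Only lines that report a mate for the side to move (positive mate score) are kept.
    """
    latest = {}
    for line in info_lines:
        match = INFO_RE.search(line)
        if match:
            rank = int(match[1])
            latest[rank] = (match[2], int(match[3]), match[4])
    return {
        first_move: value
        for kind, value, first_move in latest.values()
        if kind == "mate" and value > 0
    }


# Made-up sample output, not taken from a real run.
sample = [
    "info depth 10 seldepth 12 multipv 1 score cp 300 nodes 1 nps 1 tbhits 0 time 1 pv a1a2 b1b2",
    "info depth 12 seldepth 16 multipv 1 score mate 6 nodes 2 nps 1 tbhits 0 time 2 pv f3g5 e6g5",
    "info depth 12 seldepth 16 multipv 2 score cp 1200 nodes 2 nps 1 tbhits 0 time 2 pv f3d4 g7f6",
]
print(mating_first_moves(sample))
```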

@rpdelaney

@Vinvin20 This seems to work, but I haven't tested it much https://gist.github.com/rpdelaney/7e7e19ea1f6ed6cb2a67c03137e0f040

@ddugovic
Owner

@ppigazzini

ppigazzini commented Aug 7, 2017

@rpdelaney JHellis has derived Mate Finder from Official SF. Perhaps some of those tricks can be useful for a "Multi Variant Mate Finder" to be used to build some harder mates for Multi Variant SF.

@jhellis3

jhellis3 commented Aug 7, 2017

I'm not sure how many of my changes in Matefinder would be generally applicable, however, the changes to LMR are probably the most likely to improve the tactical awareness of the engine in a general sense:

  •      && !(depth >= 16 * ONE_PLY && ss->ply <= 3 * ONE_PLY))
    
  •      if (newDepth - r + 8 * ONE_PLY < thisThread->rootDepth)
             r = std::min(r, 3 * ONE_PLY); 
    

Both are likely to cost Elo (due to longer TTD)... but you are much less likely to miss tactical shots.
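
To make the two fragments easier to read in isolation, here is a small Python restatement. It is a sketch under assumptions: `ONE_PLY` is taken as 1, the first fragment is read as a guard that switches late move reductions off, and the function names are invented. It is not the Matefinder source.

```python
ONE_PLY = 1  # assumed depth unit, for illustration only


def lmr_allowed(depth, ply):
    """First fragment: no late move reduction for deep searches close to the root."""
    return not (depth >= 16 * ONE_PLY and ply <= 3 * ONE_PLY)


def clamp_reduction(new_depth, r, root_depth):
    """Second fragment: cap the reduction at 3 plies when the reduced depth
    would fall more than 8 plies below the root depth."""
    if new_depth - r + 8 * ONE_PLY < root_depth:
        r = min(r, 3 * ONE_PLY)
    return r


# A move that would be reduced by 5 plies far below the root depth is capped at 3.
print(clamp_reduction(new_depth=10, r=5, root_depth=20))
# Near the root depth the original reduction is kept.
print(clamp_reduction(new_depth=18, r=5, root_depth=20))
```

Both changes search more nodes per iteration, which is why a longer time to depth (TTD) is the expected cost.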

@Vinvin20
Author

Both are likely to cost Elo (due to longer TTD)... but you are much less likely to miss tactical shots.

Crazyhouse is way more tactical than chess.
After 5-10 moves, each side begins to attack heavily around the king.
One missed move and it can be over a couple of moves later.

@ddugovic
Owner

Right, although one missed move isn't necessarily a tactical shot move -- in general we measure Elo gain of patches.

I do wonder whether adding a "Study" mode similar to matefinder could be useful for puzzle generation and puzzle-solving.

@ddugovic
Owner

I have updated the 100 challenging puzzles link. Currently SF scores 96% on the chessvariants.training crazyhouse set, and of the 100 puzzles it fails, it solves 40% on a second attempt.

Again, the goal isn't to score 100%, but to find low-cost improvements.

@ianfab
Collaborator

ianfab commented Sep 13, 2017

According to a test that uses @ddugovic's set of 100 puzzles as starting positions and some local tests with difficult mate combinations, mate finding has (to my surprise) improved a lot with the crazyhouse qsearch patch.

@ppigazzini

@ianfab from the regression test it seems that the patch a16ebb0 performs better with LTC

@ddugovic
Owner

Wow, did this release gain 121 Elo over the previous release?

@ianfab
Collaborator

ianfab commented Sep 15, 2017

@ppigazzini @ddugovic
This Elo gain is expected, but nevertheless an awesome result considering that improvement had slowed down before. Search for "crazyhouse" on the diff page niklasf/Stockfish@4257b04...af68133 to find the relevant patches that gave the improvement. The improvements mainly came from the qsearch patch (~50-60 Elo), PSQT tweaks (~20-30 Elo), and king safety related parameter tweaks (~20-30 Elo). (The Elo gains are just rough estimates from the SPRT results at LTC, no direct measurements.)

@Vinvin20
Author

Great ! Incredible improvement !
I often see Yasser Seirawan playing ZH against SF Level 7 : https://lichess.org/@/yasser-seirawan/search?hasAi=1&sort.field=d&sort.order=desc
I'll watch more closely with this improvement !
Very good job from the whole team !

@ddugovic
Owner

ddugovic commented Sep 16, 2017

Take a difficult position such as r1bn1r1b/ppp1Nppk/3pnP1p/8/2BpP3/5N2/PP3PPP/R2Q1RK1/Qbp w - - 0 1 where the mate is 1. Ng5+! Nxg5 2. Bxf7!! with unstoppable threats of Bg6#, Q@g6#, and Q@g8+ forcing mate. Currently this is a bit much for Stockfish to find in < 1 second, but it did highlight two things a human player considers:

  • Atari - Black's king is surrounded, so White more aggressively looks for a knock-out blow
  • Persistent threats - after 2. Bxf7 Black needs to deliver check/checkmate, or White mates by force. @ianfab's qsearch patch improves null-move search which resolves "both kings are getting mated" race situations for a ~50-60 Elo (estimated) gain.

@ianfab
Collaborator

ianfab commented Sep 16, 2017

I tried to address the first point with a patch some time ago, but my tests unfortunately were not very successful, see, e.g., http://35.161.250.236:6543/tests/view/58f75dbc6e23db2fa80810d1. The second point should mostly already be addressed in king safety and threat evaluation, but perhaps some more crazyhouse specific ideas could be added there.

@ddugovic
Owner

ddugovic commented Oct 4, 2017

I think #416 resolves the initial issue. There will always be new ideas & room for further improvement but shallow mates seem not to pose a problem anymore:

setoption name UCI_Variant value crazyhouse
position fen r1bn1r1b/ppp1Nppk/3pnP1p/8/2BpP3/5N2/PP3PPP/R2Q1RK1/Qbp w - - 0 1
go movetime 1000
info string variant crazyhouse startpos rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR[] w KQkq - 0 1

info depth 1 seldepth 1 multipv 1 score cp 1041 nodes 178 nps 178000 tbhits 0 time 1 pv c4e6 c8e6
info depth 2 seldepth 2 multipv 1 score cp 1077 nodes 1228 nps 614000 tbhits 0 time 2 pv e4e5 g7g6 f3d4 e6d4
info depth 3 seldepth 3 multipv 1 score cp 1083 nodes 1392 nps 464000 tbhits 0 time 3 pv e4e5 g7g6 c4e6 c8e6
info depth 4 seldepth 4 multipv 1 score cp 1083 nodes 1585 nps 528333 tbhits 0 time 3 pv e4e5 g7g6 c4e6 c8e6
info depth 5 seldepth 5 multipv 1 score cp 1095 nodes 2532 nps 633000 tbhits 0 time 4 pv e4e5 P@e4 f3d4 g7f6 P@g6 f7g6
info depth 6 seldepth 7 multipv 1 score cp 1117 nodes 4478 nps 746333 tbhits 0 time 6 pv e4e5 g7g6 c4e6 d8e6 f3d4 e6d4 d1d4
info depth 7 seldepth 8 multipv 1 score cp 984 nodes 9353 nps 623533 tbhits 0 time 15 pv f3d4 g7f6 d4e6 d8e6 e7c8 P@g6 c4e6 f7e6
info depth 8 seldepth 10 multipv 1 score cp 1134 nodes 18037 nps 546575 tbhits 0 time 33 pv f3d4 P@e2 d4e2 g7f6 e7c8 a8c8 P@e7 P@g6 e7f8q e6f8
info depth 9 seldepth 13 multipv 1 score cp 1175 nodes 30785 nps 513083 tbhits 0 time 60 pv f3d4 g7f6 c4e6 d8e6 e7c8 e6g7 B@e3 a8c8 P@e7 P@g6 e7f8q c8f8
info depth 10 seldepth 17 multipv 1 score cp 1190 nodes 56157 nps 492605 tbhits 0 time 114 pv f3d4 P@g6 e7c8 B@e5 P@e7 a8c8 d4e6 d8e6 e7f8q c8f8 c4e6 f7e6
info depth 11 seldepth 15 multipv 1 score cp 1243 nodes 91278 nps 490741 tbhits 0 time 186 pv f3d4 P@g6 e7c8 e6d4 P@e7 f8e8 c8d6 c7d6 d1d4 d8e6 d4d6 g7f6
info depth 12 seldepth 16 multipv 1 score mate 6 nodes 292311 nps 526686 tbhits 0 time 555 pv f3g5 e6g5 c4f7 g5f3 d1f3 N@h3 g2h3 c8f5 f3f5 P@g6 f7g6
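
A small sketch for pulling the "time to first mate score" out of output like the above, which is handy when comparing engine versions on the same position. The function name is made up; the two sample lines are shortened copies of the last two info lines in this log.

```python
import re

MATE_RE = re.compile(r"info depth (\d+) .*score mate (-?\d+) .*\btime (\d+)")


def first_mate_report(info_lines):
    """Return (depth, time_ms, mate_in) for the first info line with a mate score, else None."""
    for line in info_lines:
        match = MATE_RE.search(line)
        if match:
            return int(match[1]), int(match[3]), int(match[2])
    return None


log_tail = [
    "info depth 11 seldepth 15 multipv 1 score cp 1243 nodes 91278 nps 490741 tbhits 0 time 186 pv f3d4 P@g6",
    "info depth 12 seldepth 16 multipv 1 score mate 6 nodes 292311 nps 526686 tbhits 0 time 555 pv f3g5 e6g5 c4f7",
]
print(first_mate_report(log_tail))  # (12, 555, 6)
```

For this log the mate score first appears at depth 12 after 555 ms, well inside the 1 second budget.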

@ddugovic ddugovic closed this as completed Oct 4, 2017