EGTB not working for eligible variants #471

Closed
Nordlandia opened this issue Dec 5, 2017 · 41 comments

Comments

@Nordlandia

Nordlandia commented Dec 5, 2017

In the twokings variant, Syzygy probing does not work even after both sides' extra kings have been removed from the board. Once both spare kings have vanished, the position is back to normal FIDE chess.

Doesn't it make sense to probe Syzygy if the PV leads to a classic FIDE position without extra kings?

@ddugovic
Owner

ddugovic commented Dec 5, 2017

I wonder how strong SF twokings is in comparison to other engines & whether the code would be much simpler (and handle all single-king positions properly) were it a subvariant (of normal chess).

I'm more than a bit nervous about making any changes before upstream official-stockfish@be382bb is merged since merging it will probably lose Elo.

@Nordlandia
Author

Nordlandia commented Dec 5, 2017

Syzygy adds endgame strength. One way is to activate Syzygy once one king on each side has been captured. Maybe this can be added as a patch; I don't know how much work it would require.

@ianfab
Collaborator

ianfab commented Dec 5, 2017

@ddugovic Since Two Kings is not that different from standard chess, I think that the few improvements to piece values and king danger evaluation might have already been sufficient to make it stronger than other engines.

I thought about making it a subvariant of normal chess when I implemented it, but concluded that we would not gain much (only removing some array entries that might in some cases even be useful to optimize), while introducing a non-zero piece value for the king in standard chess would be ugly, and workarounds do not seem more elegant either.

The contempt change is no functional change for default contempt 0, so I do not see how it is related to twokings or how it affects playing strength. There are some merge conflicts, but the removal of DrawValue should simplify the variant code, e.g., regarding the stalemate evaluation code.

@Nordlandia The gain from tablebases usually is not that much (maybe 10-20 Elo), but it would of course be a nice feature. From a first look at the code, it seems to require a couple of changes, but it would not add much additional logic, so it is not very problematic regarding code maintenance.

@Nordlandia
Author

Nordlandia commented Dec 5, 2017

Which value is currently used for the non-royal king?

The Mann or Commoner piece is worth roughly a knight. However, in TwoKings, having both monarchs is a valuable asset, I think.

@ianfab Syzygy support is mainly intended for engine matches or for analysing endgames.

Candidates

  1. Probe Syzygy at all times. This means SF can consult the EGTB (Syzygy assistance) even if 3 or 4 kings are present on the board.
  2. Probe Syzygy only once a FIDE position is present on the board.

@ianfab
Collaborator

ianfab commented Dec 5, 2017

@Nordlandia The king values (same for royal and non-royal) for middlegame and endgame resulted from SPSA tuning and can be found in types.h. It is a bit less than a knight.

Regarding Syzygy, I currently consider only the second option feasible. Adapting the Syzygy generation and probing code requires far more work (and I, at least, am not very familiar with that code).
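The second option amounts to a cheap eligibility test before probing. Here is a minimal Python sketch of that test, working on a FEN string rather than on the engine's position class. The function name `is_syzygy_eligible` and the `max_pieces` default are hypothetical, not taken from the Stockfish code; the castling check reflects the fact that Syzygy tables do not cover positions with castling rights.

```python
def is_syzygy_eligible(fen: str, max_pieces: int = 6) -> bool:
    """Return True if a twokings position has collapsed to plain chess
    and is small enough to look up in the tablebases.

    max_pieces is an assumed tablebase size, not a value from the thread.
    """
    fields = fen.split()
    placement = fields[0]
    # Castling field is optional in shortened FENs; treat a missing field as "-".
    castling = fields[2] if len(fields) > 2 else "-"

    white_kings = placement.count("K")
    black_kings = placement.count("k")
    # Every letter in the placement field is one piece; digits and "/" are not.
    total_pieces = sum(ch.isalpha() for ch in placement)

    return (
        white_kings == 1
        and black_kings == 1
        and castling == "-"
        and total_pieces <= max_pieces
    )


if __name__ == "__main__":
    # Two white kings still on the board: not a FIDE position.
    print(is_syzygy_eligible("8/8/8/3n4/4k3/8/8/K6K w - -"))
    # One king each and three pieces in total.
    print(is_syzygy_eligible("8/8/8/3n4/4k3/8/8/K7 w - -"))
```

A real implementation would also have to make sure the remaining king is treated as the royal one, which this sketch does not address.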

@Nordlandia
Author

Nordlandia commented Dec 5, 2017

According to the Wikipedia article, the king surpasses the knight in the endgame. A value of 780 might not be the best EG value. Perhaps 800-815 is more accurate, narrowing the gap between the king's and the knight's endgame values.

Better ask HGM.

@ianfab
Collaborator

ianfab commented Dec 5, 2017

@Nordlandia In my experience, tuning usually finds better values than the ones estimated by humans, so human input is more useful for finding new evaluation terms than for optimizing their values. The optimal values can also differ between engines, so human estimates are mostly useful for finding a reasonable starting value; afterwards, automated tuning can take over. Please also note that the effective piece value for the king might be a bit higher than the raw piece value itself, because the piece-square table bonuses are quite high for the king.

@Nordlandia
Author

Nordlandia commented Dec 5, 2017

You're right.

Shouldn't SF treat the extra king as more valuable (a bonus) than a knight or bishop when it is a king ahead?

White is handicapped by the removal of the f1 king. SF's evaluation is only +3, which strikes me as slightly odd. The human evaluation is 4+.

[image: tk]

@ianfab
Collaborator

ianfab commented Dec 6, 2017

@Nordlandia In any case, the value of the king only plays a role if there is a difference in the number of kings (since it otherwise cancels out), and as long as you are not playing with more than two kings, the only possible difference is 2 vs. 1 kings. So the values should already be optimized for that case (and for SEE).
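To illustrate the cancellation argument, here is a small Python sketch that sums raw material from a FEN string. It uses the TwoKings endgame piece values quoted elsewhere in this thread (pawn 240, knight 848, bishop 891, rook 1373, queen 2646, king 780); the function name `material_balance` is hypothetical, and the sketch ignores piece-square tables and every other evaluation term.

```python
# Endgame piece values for twokings as quoted in this thread.
EG_VALUE = {"p": 240, "n": 848, "b": 891, "r": 1373, "q": 2646, "k": 780}


def material_balance(fen: str) -> int:
    """Raw material difference (White minus Black) in internal units."""
    placement = fen.split()[0]
    balance = 0
    for ch in placement:
        value = EG_VALUE.get(ch.lower())
        if value is None:
            continue  # digit or "/" separator, not a piece
        balance += value if ch.isupper() else -value
    return balance


if __name__ == "__main__":
    # Equal king counts: the king term cancels out.
    print(material_balance("3kk3/8/8/8/8/8/8/3KK3 w - - 0 1"))
    # 2 vs. 1 kings: the balance is exactly one king value.
    print(material_balance("4k3/8/8/8/8/8/8/3KK3 w - - 0 1"))
```

With equal king counts the balance is zero regardless of the king value, so only the 2 vs. 1 case is sensitive to how that value is tuned.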

@HGMuller

HGMuller commented Dec 6, 2017

@Nordlandia: I don't see why you would think humans would evaluate that position as 4+. I certainly would not do that, and despite everything I do consider myself human. In fact +3 seems already optimistic. This is an opening position, and a genuine Commoner usually already performs worse than a Knight there. Surely the fact that the second King would turn absolutely royal on part of the board would suppress its value even further.

@Nordlandia
Author

Nordlandia commented Dec 6, 2017

@HGMuller: in the opening the Commoner is inferior to the knight, that's true. What about the endgame? Doesn't the Commoner gain strength in the endgame compared to the opening?

The EG value for the non-royal king is relatively high compared to its opening value, which indicates stronger EG capabilities.

KingValueMgTwoKings = 526
KingValueEgTwoKings = 780

Isn't SF slightly overoptimistic here? A 0.33 advantage for Black is trivial, though.

Maybe the K-pair suffers from redundancy: neither King does anything that the other one can't do.

[image: tk2]

@HGMuller

HGMuller commented Dec 6, 2017

Why ask me, rather than trying it yourself by playing out that position a couple of hundred times? If Stockfish does not randomize enough you can give it a book with 20x20 start positions where one of the Pawns of each side is advanced one step. Then you would know who has the advantage here.

I have only tested this with Fairy-Max, and Fairy-Max is not really smart w.r.t. mating potential (it thinks KNK = +3). So I cannot trust the results are 100% accurate.

@ddugovic
Owner

ddugovic commented Dec 6, 2017

@Nordlandia Maybe start with SPSA and https://github.com/ddugovic/Stockfish/tree/tune_variant if you're unsure what parameter value(s) you wish to submit to a test.

@Nordlandia
Author

Nordlandia commented Dec 6, 2017

My experimental idea/suggestion was to increase the endgame value of the Commoner to 800. If you're satisfied with the current piece values then everything is fine.

I can launch a 100-game match at a 3+2 time control from a KK vs KN position and see which side is superior.

I chose the knight over the bishop because the knight has limited mobility.

@ianfab
Collaborator

ianfab commented Dec 6, 2017

@Nordlandia Feel free to change the king value and submit tests on fishtest (let me know if you need help). Ideas for improvements are very welcome, and I do not have any preference for a certain value except for the one that turns out to give the best performance Elo-wise. In my experience it is best to simply do tests and let the results decide to avoid overthinking ideas, because whether a certain idea or value works well in an engine also often changes over time as other changes are applied.

@Nordlandia
Author

Nordlandia commented Dec 6, 2017

The problem is that I have never submitted an idea for testing before. How do I proceed?

@ianfab
Collaborator

ianfab commented Dec 6, 2017

@Nordlandia There is some documentation in the wiki regarding the three steps: setting up the repository, creating the branches for testing, and submitting a test. I do not know your experience with git, coding, or the fishtest test submission page, so let me know if there is a certain area where you need a more detailed explanation.

You do not need to be hesitant when using the test submission page if you are not familiar with it. Worst case something will be inconsistent, then I will not approve the test and it can be deleted (and maybe resubmitted).

@Nordlandia
Author

Nordlandia commented Dec 7, 2017

KK vs KN scored 13 wins, with the rest drawn, over 100 games at 2+2 on 1 core. No games were won by Black.

Each participant played 50 games with each color.

1n2k3/pppppppp/8/8/8/8/PPPPPPPP/3KK3 w - - 0 1

[image: kk]
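For reference, a match result like this can be turned into an approximate Elo difference with the standard logistic formula. The Python sketch below is only an illustration; it assumes the counts are taken from the KK side's point of view (13 wins, 87 draws, 0 losses) and it gives no error bars. The function name `elo_from_results` is hypothetical.

```python
import math


def elo_from_results(wins: int, draws: int, losses: int) -> float:
    """Approximate Elo difference implied by a win/draw/loss record."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    # Inverse of the logistic expected-score curve used by the Elo model.
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # Assumed reading of the result above: 13 wins, 87 draws, 0 losses.
    print(round(elo_from_results(13, 87, 0), 1))
```

With only 100 games from a single start position the uncertainty on such a number is large, so it should be read as a rough indication at best.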

@ddugovic
Owner

ddugovic commented Dec 7, 2017

Maybe KnightValueEgTwoKings is wrong. I've submitted an SPSA parameter tuning session using parameters specified by running ./stockfish:

$ ./stockfish 
Stockfish 071217 64 BMI2 by T. Romstad, M. Costalba, J. Kiiski, G. Linscott
PieceValue[TWOKINGS_VARIANT][MG][1],171,0,342,17.1,0.0020
PieceValue[TWOKINGS_VARIANT][MG][2],764,0,1528,76.4,0.0020
PieceValue[TWOKINGS_VARIANT][MG][3],826,0,1652,82.6,0.0020
PieceValue[TWOKINGS_VARIANT][MG][4],1282,0,2564,128.2,0.0020
PieceValue[TWOKINGS_VARIANT][MG][5],2526,0,5052,252.6,0.0020
PieceValue[TWOKINGS_VARIANT][MG][6],526,0,1052,52.6,0.0020
PieceValue[TWOKINGS_VARIANT][EG][1],240,0,480,24,0.0020
PieceValue[TWOKINGS_VARIANT][EG][2],848,0,1696,84.8,0.0020
PieceValue[TWOKINGS_VARIANT][EG][3],891,0,1782,89.1,0.0020
PieceValue[TWOKINGS_VARIANT][EG][4],1373,0,2746,137.3,0.0020
PieceValue[TWOKINGS_VARIANT][EG][5],2646,0,5292,264.6,0.0020
PieceValue[TWOKINGS_VARIANT][EG][6],780,0,1560,78,0.0020
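Each parameter line above can be read as comma-separated fields. The Python sketch below parses one line, assuming the field order is name, start value, minimum, maximum, c_end, r_end; that order is an assumption based on the usual SPSA input format, not something stated in this thread. In the listed lines the maximum is twice the start value and the fifth field is one tenth of it. The names `SpsaParam` and `parse_spsa_line` are hypothetical.

```python
from typing import NamedTuple


class SpsaParam(NamedTuple):
    name: str
    start: float
    minimum: float
    maximum: float
    c_end: float  # assumed meaning: final perturbation size
    r_end: float  # assumed meaning: final learning rate


def parse_spsa_line(line: str) -> SpsaParam:
    """Parse one 'name,start,min,max,c_end,r_end' line."""
    # Split from the right so a name containing commas would still survive.
    parts = line.strip().rsplit(",", 5)
    if len(parts) != 6:
        raise ValueError(f"expected 6 comma-separated fields: {line!r}")
    name, *numbers = parts
    return SpsaParam(name, *(float(n) for n in numbers))


if __name__ == "__main__":
    param = parse_spsa_line("PieceValue[TWOKINGS_VARIANT][EG][6],780,0,1560,78,0.0020")
    print(param.name, param.start, param.maximum, param.c_end)
```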

@HGMuller

HGMuller commented Dec 7, 2017

@Nordlandia: Are you sure the games are independent enough, when you start them all from the same position? If Stockfish Variant does not randomize, it could be that you are basically playing the same game all the time (which ends in a draw), and that only occasionally another move is done at some point (because Stockfish happens to change move very close to a timeout that would force it to play a move), leading to two other games with low probability, one ending in a win, the other in a loss.

When you say 13 wins, was it always the KK side that won? The result reports 7 wins and 6 losses. Is this just because you were alternating colors? Results are easier to interpret when you don't alternate colors. Or, when you cannot disable such alternation, alternate the starting position, so that even games are played with a pair of black Kings, and odd games with a pair of white Kings.

@Nordlandia
Author

Nordlandia commented Dec 7, 2017

@HGMuller I had to abort the 3+2 match because of repetition. All games were drawn. Maybe contempt should be used in selfplay matches to reduce the draw frequency. Kaufman encourages using the default contempt of 10 in K selfplay matches. Now that SF has updated contempt, maybe it's useful for selfplay?

Yes, I wish cutechess supported an alternate-colors option.

Also, I asked cutechess to implement locking an engine as White in engine matches.

cutechess/cutechess#321

@HGMuller

HGMuller commented Dec 7, 2017

Late end-games tend to be drawish (if nearly equal); not much can be done about that. But to prevent repetition, you can start every game from a different position.

@Nordlandia
Author

Nordlandia commented Dec 7, 2017

I have to use EPD for that, right?

@HGMuller

HGMuller commented Dec 7, 2017

I don't know. I never used cute-chess. I don't know how you specify the start position now. With XBoard the start FEN would have to be in a file, and you can have 100 FENs in that file just as easily as you can have one.
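A file of distinct start positions can be generated with a few lines of script. The Python sketch below follows the earlier suggestion of advancing one pawn of each side by one step, applied to the KK vs KN position 1n2k3/pppppppp/8/8/8/8/PPPPPPPP/3KK3 from this thread, which yields 8 x 8 = 64 positions. The helper names and the choice to keep the move counters at "0 1" are assumptions made for illustration.

```python
def compress(squares: list[str]) -> str:
    """Turn 8 squares ('.' for empty) into a FEN rank string."""
    out, empties = [], 0
    for square in squares:
        if square == ".":
            empties += 1
            continue
        if empties:
            out.append(str(empties))
            empties = 0
        out.append(square)
    if empties:
        out.append(str(empties))
    return "".join(out)


def pawn_step_positions(top: str = "1n2k3", bottom: str = "3KK3") -> list[str]:
    """All positions with one black and one white pawn advanced one step."""
    fens = []
    for b in range(8):      # file index of the advanced black pawn (0 = a-file)
        for w in range(8):  # file index of the advanced white pawn
            rank7 = ["p"] * 8
            rank7[b] = "."
            rank6 = ["."] * 8
            rank6[b] = "p"
            rank3 = ["."] * 8
            rank3[w] = "P"
            rank2 = ["P"] * 8
            rank2[w] = "."
            ranks = [top, compress(rank7), compress(rank6), "8", "8",
                     compress(rank3), compress(rank2), bottom]
            fens.append("/".join(ranks) + " w - - 0 1")
    return fens


if __name__ == "__main__":
    # One FEN per line; redirect the output to a file to use it as a book.
    print("\n".join(pawn_step_positions()))
```

Whether the resulting file can be loaded directly depends on the match tool, so the output format may need adjusting.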

@ddugovic
Owner

ddugovic commented Dec 7, 2017

SPSA/SF suggest the commoner is overvalued (but not by much)!
[image: commoner]

@Nordlandia
Author

Nordlandia commented Dec 8, 2017

@ddugovic: my testing suggests the opposite. KK vs KN is not worse.

@ddugovic
Owner

ddugovic commented Dec 8, 2017

@Nordlandia I don't disagree; I'm just saying the commoner seems overvalued. I'll try some testing from 8/8/8/3n4/4k3/8/8/K6K w - - later (maybe even increasing the 50-move-rule parameter to 100; I think there's a parameter for that?)

@Nordlandia
Author

Nordlandia commented Dec 8, 2017

Interesting position; clearly only KK can win.
Only one problem: cutechess adjudicates KK vs KN as a draw by insufficient mating material.

@ddugovic
Owner

ddugovic commented Dec 8, 2017

Good point. I'll need to use -draw movecount=100 and -tbpieces 2 (running locally):
https://github.com/cutechess/cutechess/blob/master/projects/cli/res/doc/help.txt

@ddugovic
Owner

ddugovic commented Feb 24, 2018

Perhaps once https://github.com/syzygy1/Rustfish matures (or perhaps it has already?) I'll fork it and address this issue in Rust...
