
Trade penalty per Node #536

Closed
wants to merge 20 commits into from

Conversation

nblaxall

Add trade-penalty to search

mooskagh and others added 17 commits November 3, 2018 19:41
* move kStartBoardPos to ChessBoard

* always build with WholeProgramOptimization on Appveyor

* make kStartingFen char*

* better constant names
* add --movetime option

* set benchmark search defaults to uci ones
* opencl: replace thread_local with a resource pool.

* Local variables clean-up.

* clang-format with -style=google

* Removing compiler warnings.

* Local variables naming fix.

* Removed a no longer used mutex.

* Fix XgemmBatched/Xgemv for retrieving wavefront size.

* Fixing OpenCLBuffers ctor (const ref).

* Fixing OpenCLComputation ctor (const ref).
@nblaxall
Author

nblaxall commented Nov 24, 2018

With the help of jjosh and Videodr0me I changed the trade penalty (piece-trade "contempt") so that it changes with the depth of the search, because the original only used the total piece count at the root and so never changed during search.
I played around with different values, and "contempt = 0.005 * (node_to_process->piececount - 32);" seemed to help the most so far. I also flip the sign depending on whose move it is.
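As a minimal sketch, the per-node adjustment described above could look like the following. The function name and structure are illustrative, not the exact PR code; only the formula and the sign flip come from the description above:

```cpp
// Sketch of the per-node trade penalty ("contempt") described above.
// piece_count: pieces on the board at this node (32 at the start position).
// black_to_move: the sign is flipped depending on whose move it is.
float TradePenalty(int piece_count, bool black_to_move) {
  // With fewer than 32 pieces the term is negative, i.e. positions where
  // material has already been traded off are scored as less favorable.
  float contempt = 0.005f * (piece_count - 32);
  return black_to_move ? -contempt : contempt;
}
```

The key difference from the original approach is that piece_count is the count at the node being evaluated, not at the root, so the penalty varies within the search tree.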

My nickname is "Swynndla" on discord.

The test below was all run with lc0 v0.19.0-rc5 at a TC of 30s+0.5s.

   # PLAYER                :  RATING  PLAYED   (%)  CFS(%)    W    D     L  D(%)  OppN
   1 11248-tp005_-+_-32    :     366     566  88.8      90  442  121     3  21.4     1
   2 11248-tp010_-+_-32    :     336     442  87.0      98  338   93    11  21.0     1
   3 11248                 :     293     570  83.9     100  396  165     9  28.9     1
   4 laser                 :       0    1578  13.5     ---   23  379  1176  24.0     3


Head to head statistics:

1) 11248-tp005_-+_-32 366 :    566 (+442,=121,-3),  88.8 %

   vs.                      :  games (   +,   =, -),   (%) :   Diff,   SD, CFS (%)
   laser                    :    566 ( 442, 121, 3),  88.8 :   +366,   16,  100.0

2) 11248-tp010_-+_-32 336 :    442 (+338,=93,-11),  87.0 %

   vs.                      :  games (   +,  =,  -),   (%) :   Diff,   SD, CFS (%)
   laser                    :    442 ( 338, 93, 11),  87.0 :   +336,   17,  100.0

3) 11248              293 :    570 (+396,=165,-9),  83.9 %

   vs.                      :  games (   +,   =, -),   (%) :   Diff,   SD, CFS (%)
   laser                    :    570 ( 396, 165, 9),  83.9 :   +293,   15,  100.0

4) laser                0 :   1578 (+23,=379,-1176),  13.5 %

   vs.                      :  games (  +,   =,    -),   (%) :   Diff,   SD, CFS (%)
   11248-tp005_-+_-32       :    566 (  3, 121,  442),  11.2 :   -366,   16,    0.0
   11248-tp010_-+_-32       :    442 ( 11,  93,  338),  13.0 :   -336,   17,    0.0
   11248                    :    570 (  9, 165,  396),  16.1 :   -293,   15,    0.0


LOS csv:
"N","NAME",0,1,2,3
0,"   11248-tp005_-+_-32",,89.4,100.0,100.0
1,"   11248-tp010_-+_-32",10.6,,97.2,100.0
2,"                11248",0.0,2.8,,100.0
3,"                laser",0.0,0.0,0.0,

@jjoshua2 jjoshua2 changed the title Patch 1 Trade penalty per Node Nov 24, 2018
@mooskagh
Member

I'm still hesitant to add ("non-zero") changes which are intended to patch some misbehavior rather than help the NN avoid that behavior.
I'm sure we have a bug somewhere in the training process which prevents NN strength from progressing as it should. Even if there is no such bug, we should try to make the NN learn the desired behaviour (e.g. by adding features or tweaking the training process) rather than adding what seems to be a hack.

@jkiliani

This PR or #466 may not be "zero", but they seem very logical to me. To win, a chess-playing entity has to create a favorable imbalance, and agreeing to equal trades is opposed to that concept. If either PR gains significantly against AB engines, and doesn't regress in self-play, why not just use it?

@zz4032
Contributor

zz4032 commented Nov 25, 2018

@mooskagh
Whether trading pieces is favorable depends on the opponent's strength. This is part of the concept of contempt, and this information (whether the opponent is stronger) can only be provided by the user, because the engine cannot know it or learn it.

In the case of opponents of equal strength (not related to contempt), it still makes sense to have an "avoid trading" penalty for the winning side. I agree the engine should find this out by itself, but that requires looking far ahead and having knowledge of the endgame.

@ddobbelaere
Contributor

ddobbelaere commented Nov 25, 2018

From a "purist" point of view, the cleanest solution is to let the NN decide everything, including trading down pieces.

Moreover, I disagree with the statement that you should tweak the behavior depending on the opponent. Suppose you are a chess-truth oracle: you just play (one of) the best move(s), regardless of the opponent. I am not claiming Leela is such an oracle already, but why not aim for that in a consistent zero spirit?

@zz4032
Contributor

zz4032 commented Nov 25, 2018

Because chess starts with a score of 0.0, and even a perfect engine could run into a 3-fold-repetition draw. Would a human grandmaster ever do this to a chess rookie?
Contempt was the reason Stockfish 9 started dominating chess tournaments.

Btw, I don't know why the zero concept is still floating around. Even the bugged test10 networks on good GPUs are stronger against crippled Stockfish 8 than AlphaZero was on Google's hardware.

@ddobbelaere
Contributor

ddobbelaere commented Nov 25, 2018

Let's indeed say chess is a draw (and you know it as an oracle); I didn't say you have to make life easy for your opponent by heading for the quickest 3-fold repetition.

A perfect engine should also try to maximize the possibility for its (imperfect) opponent to go astray, on top of playing the best moves that don't change the outcome of the game. Keeping pieces on the board is most likely a good strategy to do that, but why not let Leela find that out by herself?

(EDIT: I now realize that there is maybe a loophole in my argument. If Leela gets better and better, she doesn't play against "weaker" opponents during selfplay to reinforce this kind of behavior, namely not trading pieces, but only against her (stronger and stronger) self. For this stronger self, it might be less important whether pieces are traded or not, if it doesn't affect the theoretical outcome of the game. This seems like a strong argument in favor of contempt during matchplay.)

@hans-ekbrand
Contributor

hans-ekbrand commented Nov 25, 2018

As ddobbelaere points out, playing training games only against an opponent of the same strength as itself will never teach Leela to maximise Elo in tournament settings. If we want Leela to really behave like a human and take the strength of the opponent into account, the zero way of doing it would be to provide the opponent's Elo as an input to the NN, include weaker versions of herself in training, and, during tournament play, pass the opponent's Elo through the UCI protocol. However, the UCI protocol is what it is and not up to us to change, so while a zero solution would be possible and preferable, it is not likely to happen.

That said, all non-zero patches to playing behaviour should also be thoroughly tested against SF-dev, since the goal is to be best, not to beat laser more convincingly, and they should only be merged into master if they do not perform worse against SF-dev.

@zz4032
Contributor

zz4032 commented Nov 25, 2018

   # PLAYER           :  RATING  ERROR   GAMES  WON  DRAWN  LOST  DRAWS(%)  OppN
   1 lc0-536-tuned    :     317     35     500  396     67    37      13.4     1
   2 lc0              :     317     38     500  388     83    29      16.6     1
   3 lc0-536          :     302     36     500  381     86    33      17.2     1
   4 stockfish        :       0   ----    1500   99    236  1165      15.7     3

"lc0-536-tuned" is with parameters from the tuning I did resulting in:
auto contempt = 0.0225 * (node_to_process->piececount - 13.1);

Tuning and matches on very short TC.

"lc0-536-tuned" vs. default:

   # PLAYER           :  RATING  ERROR   GAMES  DRAWS(%)
   1 lc0              :       0   ----     500      42.8
   2 lc0-536-tuned    :       0     25     500      42.8

@evalon32
Contributor

@hans-ekbrand The UCI protocol already has the option to pass Elo (whether it's used in any tournaments is a separate question):

= UCI_Opponent, type string
With this command the GUI can send the name, title, elo and if the engine is playing a human or computer to the engine.
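For reference, the UCI specification defines the value format as shown on the first line below; the second line is an illustrative example, not taken from any actual tournament setup:

```
setoption name UCI_Opponent value [GM|IM|FM|WGM|WIM|none] [<elo>|none] [computer|human] <name>
setoption name UCI_Opponent value none 3000 computer Stockfish 9
```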

@jjoshua2
Contributor

This PR (lc0tradepenalty) has shown 98% LOS over defaults, enough for me to quit testing and be sure it's better at 30s games. I added it to my 30s list; it's roughly identical to my PR in performance, but probably not better against Vajolet. They could both be compared to SF, and this one might do a bit better, but mine also seemed to gain. Swynndla has shown me some results of his doing better against SF as well.

Rank Name                         Elo    +    - games score oppo. draws
   1 lc0contempt .0015             85   40   40   394   89%  -245   18%
   2 lc0tradepenalty               76   37   37   441   89%  -245   19%
   3 lc0contempt .0015 fpur 1.3    35   78   78    85   85%  -245   18%
   4 lc0contempt 0 fpur 1.3        28   62   62   128   86%  -245   23%
   5 lc0contempt 0                 21   35   35   394   85%  -245   25%
   6 Vajolet2_2.5 30CPU          -245   19   19  1442   12%    57   20%
ResultSet-EloRating>los
                            lc lc lc lc lc Va
lc0contempt .0015              63 86 93 99100
lc0tradepenalty             36    82 90 98100
lc0contempt .0015 fpur 1.3  13 17    55 62 99
lc0contempt 0 fpur 1.3       6  9 44    58100
lc0contempt 0                0  1 37 41   100
Vajolet2_2.5 30CPU           0  0  0  0  0

@nblaxall
Author

nblaxall commented Nov 25, 2018

Results so far vs SF (30s+0.5, SF only has 4 cores):

   # PLAYER                :  RATING  PLAYED   (%)  CFS(%)    W    D    L  D(%)  OppN
   1 11248-tp005_-+_-32    :      81     204  61.3      99   67  116   21  56.9     1
   2 11248                 :      30     204  54.2      97   45  131   28  64.2     1
   3 SF9-bmi               :       0     408  42.3     ---   49  247  112  60.5     2


Head to head statistics:

1) 11248-tp005_-+_-32 81 :    204 (+67,=116,-21),  61.3 %

   vs.                      :  games (  +,   =,  -),   (%) :   Diff,   SD, CFS (%)
   SF9-bmi                  :    204 ( 67, 116, 21),  61.3 :    +81,   16,  100.0

2) 11248              30 :    204 (+45,=131,-28),  54.2 %

   vs.                      :  games (  +,   =,  -),   (%) :   Diff,   SD, CFS (%)
   SF9-bmi                  :    204 ( 45, 131, 28),  54.2 :    +30,   15,   97.2

3) SF9-bmi             0 :    408 (+49,=247,-112),  42.3 %

   vs.                      :  games (  +,   =,   -),   (%) :   Diff,   SD, CFS (%)
   11248-tp005_-+_-32       :    204 ( 21, 116,  67),  38.7 :    -81,   16,    0.0
   11248                    :    204 ( 28, 131,  45),  45.8 :    -30,   15,    2.8


LOS csv:
"N","NAME",0,1,2
0,"   11248-tp005_-+_-32",,98.9,100.0
1,"                11248",1.1,,97.2
2,"              SF9-bmi",0.0,2.8,

@nblaxall
Author

I'm wondering: if trade-penalty / piece-trade contempt works, then it should maybe have even more of an effect vs SF (rather than laser), because SF is so good at endgames. And if so, it means we don't have to choose to use it only against weaker opponents: we can use it all the time.

Also, it's true that Leela is weak in endgames compared to AB engines. Part of that may be things like temperature in training, and Leela not getting to see many endgames; but even if that is fixed in training, Leela will still be weaker than AB engines in endgames due to the nature of MCTS vs AB. AB really comes to life in endgames thanks to their low branching factor (i.e. seeing ahead 50-70 ply) and the requirement of precise play along narrow lines (which is also why MCTS can outperform AB in middlegames more than in endgames). Therefore some sort of piece-trade contempt would probably help even if Leela's endgame training is improved.

Certainty propagation would help a bit, and maybe an alternative to MCTS would be better, but for now MCTS is the best we have.

@nblaxall
Author

Another thing I'd like to mention: if some sort of piece-trade-penalty method is being tested, it probably doesn't make sense to test it at very short TC, because it depends heavily on the search. Trades are compared within the search, and at very short TC there'd be lots of noise, whereas the longer the TC, the more effect the piece-trade penalty would have.

I've tested at 30s+0.5, and I wouldn't think anything less than this is a good idea. I'd like to see tests with longer TCs.

@nblaxall
Author

Final results vs SF (4-threads) 30s+0.5:

   # PLAYER                :  RATING  PLAYED   (%)  CFS(%)    W    D    L  D(%)
   1 11248-tp005_-+_-32    :      66     350  59.3     100  105  205   40  58.6
   2 11248                 :      18     350  52.6      94   75  218   57  62.3
   3 SF9-bmi               :       0     700  44.1     ---   97  423  180  60.4

Head to head statistics:

1) 11248-tp005_-+_-32 66 :    350 (+105,=205,-40),  59.3 %

   vs.                      :  games (   +,   =,  -),   (%) :   Diff,   SD, CFS (%)
   SF9-bmi                  :    350 ( 105, 205, 40),  59.3 :    +66,   12,  100.0

2) 11248              18 :    350 (+75,=218,-57),  52.6 %

   vs.                      :  games (  +,   =,  -),   (%) :   Diff,   SD, CFS (%)
   SF9-bmi                  :    350 ( 75, 218, 57),  52.6 :    +18,   11,   94.3

3) SF9-bmi             0 :    700 (+97,=423,-180),  44.1 %

   vs.                      :  games (  +,   =,   -),   (%) :   Diff,   SD, CFS (%)
   11248-tp005_-+_-32       :    350 ( 40, 205, 105),  40.7 :    -66,   12,    0.0
   11248                    :    350 ( 57, 218,  75),  47.4 :    -18,   11,    5.7

"N","NAME",0,1,2
0,"   11248-tp005_-+_-32",,99.8,100.0
1,"                11248",0.2,,94.3
2,"              SF9-bmi",0.0,5.7,

Will test longer TC next.

@zz4032
Contributor

zz4032 commented Nov 26, 2018

Added the source-code update and tuned again for long TC. The tuned parameter values are very similar to those from the short-TC tuning, but converged in fewer games:
auto contempt = 0.0258 * (node_to_process->piececount - 11.1);

Now there is clearly an improvement:

   # PLAYER           :  RATING  ERROR   GAMES  WON  DRAWN  LOST  DRAWS(%)  LOS(%)  OppN
   1 lc0-536-tuned    :     387     46     300  246     48     6      16.0   100.0     1
   2 lc0              :     281     34     300  203     93     4      31.0   100.0     1
   3 stockfish        :       0   ----     600   10    141   449      23.5     ---     2

Running "lc0-536-tuned" vs. "lc0" next...

@gonzalezjo

It’s also useful as an analysis tool, so I wouldn’t call it a patch for a misbehavior. It is an excellent feature for trying to get different insights into a position regarding complexity. You cannot train Leela into offering different analyses, so offering a feature like this is useful. Gaining Elo against some engines is a nice side effect.

@nblaxall
Author

Added parameter options for trade penalty so people can specify their own:

To be called with eg: --trade-penalty=0.0258 --trade-penalty2=11.1
(those are the defaults)

@gonzalezjo

They should probably default to off.

@jjoshua2
Contributor

jjoshua2 commented Nov 27, 2018

All 30+0.5 testing

Tried zz's CLOP values and they were so much worse that I stopped the test early, since Swynndla was also having bad early results and I have more promising things to test.

   0 cfish 11/1 30cpu               140      64      60   69.2%   48.3%
   1 lc0v19 11248                 -108      73      30   35.0%   63.3%
   2 lc0tradepenalty zz           -176     113      30   26.7%   33.3%

The prior defaults though are stronger than mine against SF! 83% LOS

                   cf lc lc
cfish 9/9 30cpu       99 99
lc0tradepenalty     0    83
lc0contempt .0015   0 16
ResultSet-EloRating>ratings
Rank Name                Elo    +    - games score oppo. draws
   1 cfish 11/1 30cpu      47   21   21   659   62%   -24   56%
   2 lc0tradepenalty     -13   29   29   330   39%    47   58%
   3 lc0contempt .0015   -34   29   29   329   36%    47   54%

Both are nearly as strong as a fast cfish dev built November 1st! Regular leela is ~250 Elo behind, vs. 60 Elo or 8 Elo here.

@nblaxall
Author

nblaxall commented Nov 27, 2018

Yeah, I tried ZZ's CLOP-tuned values and they were worse. I'll try with the 0.005 & 32 numbers I was using before and test again, just to make sure I didn't make a mistake in the parameter-options version (although testing on my test position showed it was fine). I'll also change the parameters to default to off, as gonzalezjo suggested.

@jjoshua2 jjoshua2 mentioned this pull request Nov 27, 2018

@navs25 navs25 left a comment

(screenshot attached)
Performed extensive NPS-Tuner clop to identify the best parameters for the following versions.
Followed up with gauntlets against TCEC div 3 engines at 30 min + 10 sec increment.

Config 1 (v0.19): submitted to CCC 3, performing well to date at 78% (currently 2nd).
Config 4 (v0.19-rc4 contempt) and v20-dev trade prevention both returned 82.14% against TCEC div 3 engines with no losses.

I strongly recommend submitting v20-dev "Trade Prevention" for TCEC div 3 with the above recommended parameters and/or adjustments by jjosh.

The final release v20 with merge of Trade Prevention to be submitted for TCEC div 2 following parameter tuning and testing prior to div 2 starting.

There is little to zero downside risk (imo) in adopting the above approach.

@jkiliani

Trade-penalty should probably default to off so the PR is neutral for people that don't like the concept (@mooskagh), but there isn't really a good reason not to use this in TCEC/CCC. The tunings show a rather convincing benefit in my opinion.

@navs25

navs25 commented Nov 27, 2018

Trade Penalty is no penalty, it is an advantage!
Perhaps we should rename the concept to “Positional Complexity”
This reflects what it is doing: playing to Leela's positional strength (the opposite of AB engines' tactical simplicity).

@zz4032
Contributor

zz4032 commented Nov 27, 2018

Selfplay with my tuned values didn't go well:

   # PLAYER           :  RATING  ERROR   GAMES  WON  DRAWN  LOST  DRAWS(%)  LOS(%)  OppN
   1 lc0              :       0   ----     300  168    107    25      35.7   100.0     1
   2 lc0-536-tuned    :    -182     31     300   25    107   168      35.7     ---     1

Now I will try tuning 1) at selfplay and 2) at selfplay alternating with games against a weaker Stockfish (an equal opponent and a weaker opponent simultaneously).

@nblaxall
Author

nblaxall commented Nov 27, 2018

I re-ran the 30s+0.5 vs SF for peace of mind, because my previous test was using v19 rc5 and also didn't use the -0.9999 to 0.9999 clamp. The PR is using the v19 release and it's also using:

  // Clamp the (contempt-adjusted) value back into (-1, 1) so it stays a
  // valid expected-outcome value.
  if (node_to_process->v < -1.0) {
    node_to_process->v = -0.9999;
  } else if (node_to_process->v > 1.0) {
    node_to_process->v = 0.9999;
  }

... the results of the new 30s + 0.5 vs SF that I ran with up-to-date PR are:

   # PLAYER               :  RATING  PLAYED   (%)  CFS(%)    W    D    L  D(%)
   1 11248-tp0.005_-32    :      57     244  58.0      85   70  143   31  58.6
   2 11248                :      36     244  55.1     100   55  159   30  65.2
   3 SF9-bmi              :       0     488  43.4     ---   61  302  125  61.9

"N","NAME",0,1,2
0,"    11248-tp0.005_-32",,85.4,100.0
1,"                11248",14.6,,99.7
2,"              SF9-bmi",0.0,0.3,

So it's still good: not quite as good as before, but still good.
Maybe the 0.9999 part isn't needed and the if statement is slowing it down a tiny bit?
I'm tempted to take that if statement out.
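For what it's worth, if the bound is kept, the branchy if/else could be collapsed into a single std::clamp call (C++17). This is a sketch, not the PR's code, and note it is slightly stricter than the original: the if/else only touched values whose magnitude exceeded 1.0, while this also pins values between 0.9999 and 1.0 down to 0.9999.

```cpp
#include <algorithm>  // std::clamp (C++17)

// Keep the contempt-adjusted value strictly inside (-1, 1).
// Slightly stricter than the original if/else: values in
// [0.9999, 1.0] are also pinned to 0.9999 (and symmetrically).
float ClampValue(float v) {
  return std::clamp(v, -0.9999f, 0.9999f);
}
```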

@zz4032
Contributor

zz4032 commented Nov 28, 2018

Tuning result from selfplay and vs. a 300-Elo-weaker Stockfish simultaneously:
auto contempt = 0.001 * (node_to_process->piececount - 27.7);
That's much closer now to the initial values others are using. Going to try that in matches next.

Edit: The difference in play with the parameter changes above was negligible. I retuned vs. the weaker SF again, but with narrower bounds ([-0.01;0.01] for the first parameter), and this time I got:
auto contempt = 0.0031 * (node_to_process->piececount - 22.9);
This is close to what others have been testing.
Matches with 0.0031/22.9:

   # PLAYER           :  RATING  ERROR   GAMES  WON  DRAWN  LOST  DRAWS(%)  LOS(%)  OppN
   1 lc0-536-tuned    :     274     35     300  201     94     5      31.3    81.1     1
   2 lc0              :     245     56     131   80     50     1      38.2   100.0     1
   3 stockfish        :       0   ----     431    6    144   281      33.4     ---     2

Selfplay:

   # PLAYER           :  RATING  ERROR   GAMES  WON  DRAWN  LOST  DRAWS(%)  LOS(%)  OppN
   1 lc0              :       0   ----     378   40    308    30      81.5    90.0     1
   2 lc0-536-tuned    :      -9     14     378   30    308    40      81.5     ---     1

@nblaxall
Author

nblaxall commented Dec 5, 2018

I'm continuing to test this tp version with a lot of different values and ideas, but so far the one submitted to TCEC keeps coming out on top (i.e. the 0.005 & 32 one), at least for short time controls, as can be seen here (30s+0.5):

   # PLAYER             :  RATING  PLAYED   (%)  CFS(%)    W    D    L  D(%)
   1 11248-tp_005_32    :     391     144  90.3     100  116   28    0  19.4
   2 11248              :     270     144  82.3     100   97   43    4  29.9
   3 laser              :       0     288  13.7     ---    4   71  213  24.7

... and if I combine that version with previous 30s+0.5 results:

   # PLAYER             :  RATING  PLAYED   (%)  CFS(%)    W    D     L  D(%)
   1 11248-tp_005_32    :     370     710  89.1     100  558  149     3  21.0
   2 11248              :     287     714  83.6     100  493  208    13  29.1
   3 laser              :       0    1424  13.7     ---   16  357  1051  25.1

(I believe jjosh & Navs are testing longer time controls.)

@nblaxall
Author

nblaxall commented Dec 5, 2018

...and combining the 30s+0.5 tests vs SF9:

   # PLAYER             :  RATING  PLAYED   (%)  CFS(%)    W    D    L  D(%)
   1 11248-tp_005_32    :      56     808  57.9      98  227  481  100  59.5
   2 11248              :      31     678  54.4     100  156  426   96  62.8
   3 SF9-bmi            :       0    1486  43.7     ---  196  907  383  61.0

@nblaxall
Author

nblaxall commented Dec 7, 2018

The 3-fold-repetition draw: rofChade vs Lc0 TP in TCEC S14 div 3, 2018-12-05:

  1. Nf3 Nf6 2. Nc3 g6 3. e4 d6 4. d4 c6 5. Be2 Bg7 6. O-O Qc7 7. a4 O-O 8. Be3 e5 9. dxe5 dxe5 10. Qc1 a5 11. h3 Re8 12. Rd1 Bf8 13. Bg5 Nbd7 14. Bc4 Nh5 15. Qe3 Nc5 16. Bh6 Ng7 17. Qg5 Nge6 18. Qh4 Bxh6 19. Qxh6 Nf8 20. Nh2 Qe7 21. Nf3 Qc7 22. Nh2 Qe7 23. Nf3 Qc7 1/2-1/2
    ... standard Leela also wants to take the draw (even after a long think).

SF wants to play 22...Kg8 after a while of thinking.
The TP version also wanted to take the draw, but using --trade-penalty2=16.0 instead of 32 gives contempt (i.e. it makes Leela think she's winning by more) whenever the number of pieces is greater than 16.
Using --trade-penalty=0.005 --trade-penalty2=16.0 avoids the repetition, and after 244313 nodes (pretty quick!) Leela settles on 22...Kg8 (see: https://pastebin.com/sQCQrj0A).

Using --trade-penalty=0.005 --trade-penalty2=16.0 instead of --trade-penalty=0.005 --trade-penalty2=32.0 tests just as well so far (not many games) using 30s+0.5:

   # PLAYER             :  RATING  PLAYED   (%)  CFS(%)    W    D    L  D(%)
   1 11248-tp_005_16    :     361     202  88.6      58  159   40    3  19.8
   2 11248-tp_005_32    :     353     202  88.1      99  156   44    2  21.8
   3 11248              :     269     182  82.1     100  121   57    4  31.3
   4 laser              :       0     586  13.6     ---    9  141  436  24.1

It's interesting that the tests show there's not much Elo difference between --trade-penalty2 values of 32 and 16 (and even 36 seems fine), but using 0 loses some Elo, -16 loses more, and -32 loses so much in tests that it's weaker than laser.

So it looks like --trade-penalty=0.005 is giving the penalty for trading pieces, and --trade-penalty2=16.0 is giving standard contempt (due to the calculation that's used; in the above position it would be 24 pieces - 16).
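That decomposition can be checked numerically. The sketch below is illustrative (tp and tp2 mirror the command-line options; the function name is invented), not the PR's code:

```cpp
// contempt = tp * (piece_count - tp2) splits into a per-piece trade
// penalty (tp * piece_count) minus a constant offset (tp * tp2).
// In the drawn position above (24 pieces), tp = 0.005 and tp2 = 16 give
// 0.005 * (24 - 16) = 0.04: a fixed "winning by more" bonus on top of
// the trade penalty, positive as long as more than 16 pieces remain.
double Contempt(double tp, double tp2, int piece_count) {
  return tp * (piece_count - tp2);
}
```

With tp2 = 32 the same 24-piece position instead gets a negative adjustment, which is why the larger value behaves as a pure trade penalty rather than contempt.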

@nblaxall
Author

nblaxall commented Dec 8, 2018

For v0.19.1-rc2 vs v0.19.1-rc2 tp (30s+0.5) tested:

   # PLAYER                          :  RATING  PLAYED   (%)  CFS(%)    W    D    L  D(%)
   1 11248-v0.19.1-rc2-tp2_005_16    :     374     240  89.0      98  190   47    3  19.6
   2 11248-v0.19.1-rc2               :     309     240  84.8     100  169   69    2  28.8
   3 laser                           :       0     480  13.1     ---    5  116  359  24.2

... i.e. tp still helps.

@nblaxall
Author

nblaxall commented Dec 9, 2018

For v0.19.1-rc2 vs SF9 (30s+0.5), although it initially started out well, it has since not done so well in my test:

   # PLAYER                         :  RATING  PLAYED   (%)  CFS(%)    W    D    L  D(%)
   1 11248-v0.19.1-rc2              :      52     234  57.3      92   57  154   23  65.8
   2 11248-v0.19.1-rc2-tp_005_16    :      24     234  53.4      97   54  142   38  60.7
   3 SF9-bmi                        :       0     468  44.7     ---   61  296  111  63.2

... The number of games isn't great, so this might be noise, but it looks as though tp isn't that helpful vs SF (even though it seemed to be for v0.19.0).

@nblaxall
Author

I tried v0.19.1-rc2 tp vs SF with a longer TC of 2m+2s (120+2) so that variable cpuct could kick in (my theory is also that tp does well with deeper search):

   # PLAYER                         :  RATING  PLAYED   (%)  CFS(%)    W    D    L  D(%)
   1 11248-v0.19.1-rc2-tp_005_16    :      87      50  62.0      86   13   36    1  72.0
   2 11248-v0.19.1-rc2              :      50      50  57.0      99    9   39    2  78.0
   3 SF9-bmi                        :       0     100  40.5     ---    3   75   22  75.0

... not many games yet but at least it's promising so far.

Next I'll actually merge rc2 into this PR so others can test too.

@nblaxall
Author

I broke git trying to upgrade to rc2. I googled, asked on discord, and tried heaps of things (spent ages on this), but I think I've broken git even more.

When I do a git clone: git clone -b patch-1 --recurse-submodules https://github.com/nblaxall/lc0.git
... and compile it, I get these errors:

In file included from ../../src/neural/network_legacy.cc:19:0:
../../src/neural/network_legacy.h:55:46: error: ‘SEunit’ in ‘class pblczero::Weights’ does not name a type
     explicit SEunit(const pblczero::Weights::SEunit& se);
                                              ^~~~~~
../../src/neural/network_legacy.cc:55:56: error: ‘SEunit’ in ‘class pblczero::Weights’ does not name a type
 LegacyWeights::SEunit::SEunit(const pblczero::Weights::SEunit& se)
                                                        ^~~~~~
../../src/neural/network_legacy.cc: In constructor ‘lczero::LegacyWeights::SEunit::SEunit(const int&)’:
../../src/neural/network_legacy.cc:56:26: error: request for member ‘w1’ in ‘se’, which is of non-class type ‘const int’
     : w1(LayerAdapter(se.w1()).as_vector()),
                          ^~
../../src/neural/network_legacy.cc:57:26: error: request for member ‘b1’ in ‘se’, which is of non-class type ‘const int’
       b1(LayerAdapter(se.b1()).as_vector()),
                          ^~
../../src/neural/network_legacy.cc:58:26: error: request for member ‘w2’ in ‘se’, which is of non-class type ‘const int’
       w2(LayerAdapter(se.w2()).as_vector()),
                          ^~
../../src/neural/network_legacy.cc:59:26: error: request for member ‘b2’ in ‘se’, which is of non-class type ‘const int’
       b2(LayerAdapter(se.b2()).as_vector()) {}
                          ^~
../../src/neural/network_legacy.cc: In constructor ‘lczero::LegacyWeights::Residual::Residual(const Residual&)’:
../../src/neural/network_legacy.cc:64:19: error: ‘const Residual {aka const class pblczero::Weights_Residual}’ has no member named ‘se’
       se(residual.se()),
                   ^~
../../src/neural/network_legacy.cc:65:23: error: ‘const Residual {aka const class pblczero::Weights_Residual}’ has no member named ‘has_se’
       has_se(residual.has_se()) {}
                       ^~~~~~
../../src/neural/network_legacy.cc: In constructor ‘lczero::LegacyWeights::ConvBlock::ConvBlock(const ConvBlock&)’:
../../src/neural/network_legacy.cc:70:36: error: ‘const ConvBlock {aka const class pblczero::Weights_ConvBlock}’ has no member named ‘bn_gammas’; did you mean ‘bn_means’?
       bn_gammas(LayerAdapter(block.bn_gammas()).as_vector()),
                                    ^~~~~~~~~
../../src/neural/network_legacy.cc:71:35: error: ‘const ConvBlock {aka const class pblczero::Weights_ConvBlock}’ has no member named ‘bn_betas’; did you mean ‘bn_means’?
       bn_betas(LayerAdapter(block.bn_betas()).as_vector()),
                                   ^~~~~~~~

@killerducky killerducky mentioned this pull request Dec 11, 2018
@jjoshua2 jjoshua2 closed this Jan 26, 2019