
Progress stalled? #34

Open
Vandertic opened this issue Nov 2, 2019 · 83 comments
Labels
good first issue Good for newcomers

Comments

@Vandertic
Member

I just wanted to reassure everyone that if the progress stalls we are going to increase the visits and we believe that in a few generations the upgrade will restore a good rate of improvement

@Vandertic added the good first issue label Nov 2, 2019
@kennyfs

kennyfs commented Nov 2, 2019

Yes, I also think it has stalled.

@kennyfs

kennyfs commented Nov 2, 2019

If older nets beat the newest one with a win rate above 60%, it has obviously stalled.

@l1t1

l1t1 commented Nov 3, 2019

Matches of 50, 40, or 30 games are all too small to estimate the true win rate between two nets of similar strength.
For example:
SAI 52 beat SAI 51 (65.45%) in a 50-game match,
but
SAI 51 beat SAI 52 (58.82%) in a 30-game match.
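To illustrate how little such short matches can tell apart two nets of similar strength, here is a minimal sketch (not part of the SAI tooling) of the confidence interval implied by match results of this size; the win counts are illustrative round numbers close to the percentages quoted above.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """95% Wilson score interval for an observed win rate."""
    p = wins / games
    denom = 1 + z ** 2 / games
    centre = (p + z ** 2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z ** 2 / (4 * games ** 2)) / denom
    return centre - half, centre + half

# The 30-game interval straddles 50%, and the 50-game interval only barely
# excludes it: samples this small say little about nets of similar strength.
print(wilson_interval(wins=33, games=50))  # roughly (0.52, 0.78)
print(wilson_interval(wins=18, games=30))  # roughly (0.42, 0.75)
```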

@kennyfs

kennyfs commented Nov 3, 2019 via email

@Vandertic
Member Author

Visits increased to 1200

@kennyfs

kennyfs commented Nov 3, 2019

How do we judge whether it has stalled?
How do we decide whether to increase the visits or to enlarge the net size?

@Vandertic
Member Author

There's no general rule. We try to understand what's happening and make decisions accordingly

@l1t1

l1t1 commented Nov 4, 2019

Did the number of training games per generation increase after training with 1200 visits?

@Vandertic
Member Author

No, there was a problem with the script. Corrected this morning.

@l1t1

l1t1 commented Nov 4, 2019

I suggest setting the number of training games per generation to a constant times the promotion win rate, e.g. SAI 55: 20000 * 0.33 = 6600 games, SAI 54: 20000 * 0.64 = 12800 games, so that the stronger nets produce more of the higher-quality games.
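A minimal sketch of this suggestion (the 20000-game budget is just the figure implied by the example above, not a project setting):

```python
# Sketch of the suggestion above: scale each generation's self-play games by
# the promotion win rate against a fixed budget.  BASE_GAMES is illustrative.
BASE_GAMES = 20000

def games_for_generation(promotion_winrate):
    return int(BASE_GAMES * promotion_winrate)

print(games_for_generation(0.33))  # 6600 games, as in the SAI 55 example
print(games_for_generation(0.64))  # 12800 games, as in the SAI 54 example
```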

@Vandertic
Member Author

@l1t1 It is a bit complicated to implement, but I will think about it.

@Vandertic
Member Author

Visits increased to 1600 from the next generation.

@Vandertic
Member Author

If you are wondering what happened with the current promotion and this sudden, huge strength jump... there was previously a stupid error in the training script. The learning rate was never dropped as intended and was still 0.05. Now we are quenching it (slowly). The first change is from 0.05 to 0.01. (I thought I was training at 0.00025.)
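For context, a minimal sketch of the kind of stepwise "quenching" schedule described here; the stage boundaries are invented and the values simply echo rates mentioned elsewhere in this thread, not the project's actual schedule.

```python
# Sketch of a stepwise learning-rate "quench": drop the rate in stages rather
# than all at once.  Stage boundaries and values are illustrative only.
SCHEDULE = [(0, 0.05), (1, 0.01), (2, 0.004), (3, 0.0005)]  # (stage, learning rate)

def learning_rate_for(stage):
    lr = SCHEDULE[0][1]
    for start, rate in SCHEDULE:
        if stage >= start:
            lr = rate
    return lr

print(learning_rate_for(2))  # 0.004
```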

@sheeryjay

@Vandertic would it be possible to publish the parameters used (in the repository)? As it is, nobody could have spotted this error without asking for them, whereas if they were in the repository, people could see them and maybe spot the error.

Similarly with the webpage and the mentions of the change from 800 -> 1200 -> 1600 visits. If people do not watch this issue, they have no way of even knowing when the change happened.

@Vandertic
Member Author

You are right. Will be done ASAP. (I have a conference Tuesday, so after that.)

@Vandertic
Member Author

@sheeryjay just a quick update before I write some real documentation next week: the bug was (and still is) in the repository.

The rate should be configured here:
https://github.com/sai-dev/sai/blob/214a68f54e07f484086faa8ba15a6cfe91da6821/training/tf/config.py#L93

But then it is not used here:
https://github.com/sai-dev/sai/blob/214a68f54e07f484086faa8ba15a6cfe91da6821/training/tf/tfprocess.py#L198
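To make the failure mode concrete, here is a hypothetical sketch of that kind of mismatch (the names and values below are placeholders, not the actual SAI code; see the config.py and tfprocess.py links above for the real files):

```python
# "config.py" (hypothetical sketch)
LEARNING_RATE = 0.00025        # the value the maintainers believed was in effect

# "tfprocess.py" (hypothetical sketch): the optimizer is built from a hard-coded
# literal instead of the configured value, so editing the config has no effect.
def build_optimizer():
    lr = 0.05                  # bug: should read LEARNING_RATE from the config
    return {"learning_rate": lr, "momentum": 0.9}

print(build_optimizer())       # still reports 0.05 regardless of the config
```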

@Vandertic
Member Author

BTW, I am temporarily increasing the number of nets for promotion and the steps between each net and the following one, so as to get an updated measure of which numbers make sense with the current training rate.

@l1t1

l1t1 commented Nov 16, 2019

Did SAI 96 update the learning rate again?

@Vandertic
Member Author

Yes. Down to 0.004. That's also why we have more promotion candidates for this generation.

@Vandertic
Member Author

We have been at 0.0005 since generation 108. Almost stalled. Time to scale up to 9x192, as soon as the training of that structure catches up with the current network in a couple of days.

@barrtgt

barrtgt commented Nov 24, 2019

Heatmap of SAI net # 116
d9cf4b3795f3ca47f19d3942630b386b3781d33c8d862de83b5233f96cb47a65

  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0  33  24   2   0   0   1   0   0   0   1   1   1   2  23  31   0   0
  0   0  26  90   1   1   0   1   1   1   1   0   1   1   1  85  24   0   0
  0   0   2   1   0   0   0   0   0   0   0   0   0   0   0   1   2   0   0
  0   0   1   1   0   0   0   0   0   1   0   0   0   0   0   1   1   0   0
  0   0   1   1   0   0   1   1   1   1   1   0   1   0   0   0   0   0   0
  0   0   1   1   0   0   0   1   1   2   1   1   0   0   0   0   1   0   0
  0   0   0   0   0   0   0   1   1   1   1   1   1   1   0   0   0   0   0
  0   0   0   1   0   0   1   2   1   6   1   2   1   1   0   0   0   0   0
  0   0   0   0   0   0   1   1   1   1   1   1   1   0   0   0   0   0   0
  0   0   1   0   1   0   1   1   1   2   1   1   1   0   0   0   1   0   0
  0   0   1   0   0   0   1   1   1   1   1   1   1   0   0   0   1   0   0
  0   0   1   1   0   0   0   0   1   1   1   0   0   0   0   1   1   0   0
  0   0   2   1   0   0   0   0   0   0   0   0   0   0   0   1   2   0   0
  0   0  26  84   1   1   1   0   0   0   0   0   1   1   1  81  24   0   0
  0   0  36  27   2   1   1   1   0   0   0   0   0   1   2  24  31   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

Heatmap of the final 6x128 LZ net, # 91
b3b00c6d75b4e74946a97b88949307c9eae2355a88f518ebf770c7758f90e357

  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   1  73   0   0   0   0   0   0   0   0   0   0   0  67   0   0   0
  0   0  76  95   0   0   0   0   0   0   0   0   0   0   0  97  63   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0  66  88   0   0   0   0   0   0   0   0   0   0   0 102  69   0   0
  0   0   0  62   0   0   0   0   0   0   0   0   0   0   0  76   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

Policy with --nrsymm:

net 33+34+44 peak 1st line
116 67.4 0.5
111 60.9 0.5
106 56.3 0.5
101 54.6 0.4
96 44.1 0.6
91 44.1 0.7
86 29.8 0.9
81 20.5 1
76 16.8 1.3
71 13.4 1.3
66 10.6 1.3
61 7.7 1.4
56 7.1 1.5
51 7.3 1.7
41 5.9 1.8
31 5.9 1.8
21 4.5 2.2
11 4.4 2.2
1 3.8 2.6

@barrtgt

barrtgt commented Nov 24, 2019

There is always the --randomvisits option, or you could do what MuZero does and reduce the temperature to 0.5 until it stalls again, then to 0.25.
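For reference, a minimal sketch (not SAI code) of the MuZero-style temperature reduction mentioned here: move selection samples from visit counts raised to the power 1/T, so lowering T sharpens the choice.

```python
import numpy as np

# Sketch of temperature-scaled move selection: visit counts from search are
# raised to the power 1/T before sampling; T=0.5 squares them, T=0.25 raises
# them to the 4th power, making the best-searched move ever more likely.
def sample_move(visit_counts, temperature=1.0):
    counts = np.asarray(visit_counts, dtype=np.float64)
    weights = counts ** (1.0 / temperature)
    probs = weights / weights.sum()
    return np.random.choice(len(counts), p=probs)

print(sample_move([10, 30, 60], temperature=0.5))  # usually picks index 2
```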

@Vandertic
Member Author

Visits raised to 2400

@l1t1

l1t1 commented Nov 25, 2019

it seems good
2019-11-25 00:32 8c46b273 VS 7eb9f119 38 : 1 : 11 (77.00%) 50 / 50 promotion

@nanzi

nanzi commented Nov 25, 2019

Black and White are moving away from the 3-3 opening style. It reminds me that the Minigo v16 weights also had the same opening style, with Black opening on 3-3 + 4-4 (later White gets a komoku joseki + 4-4). This might be a local optimum, in my opinion.

@l1t1

l1t1 commented Nov 25, 2019

The first 4 moves of 3 games are the same.
autogtp -g 3 -k sga --url http://sai.unich.it/ --username aaa --password aaa

1 (B Q17) 1 (B Q17) 1 (B Q17) 2 (W C3) 2 (W C3) 2 (W C3) 3 (B D4) 3 (B D4) 3 (B D4) 4 (W C4) 4 (W C4) 4 (W C4) 5 (B D5) 5 (B D17) 5 (B D5) 6 (W R4) 6 (W Q3) 6 (W D17) 7 (B C5) 7 (B Q5) 7 (B C5) 8 (W E2)

@nanzi

nanzi commented Nov 25, 2019

In my 400-visit match tests, SAI 120 vs LZ 081 scored 51/47 (100 games total) and 194/197 (400 games total).
Games where both sides passed were not counted.

The SGF files are commented in simplified Chinese.

@barrtgt

barrtgt commented Nov 25, 2019

Are reference and comparison games staying at 1600 visits?

@Glrr

Glrr commented Dec 9, 2019

It's a bit frustrating indeed. I was convinced that 6x128 had not reached its max yet, but I expected that 9x192 would progress faster.

@cryptsport

A large network progresses more slowly; look at Leela Zero at 10 and 15 blocks - growth was not as fast as at 5 blocks. And compare what SAI and Leela Zero had achieved after 700,000 games: what was Leela Zero's level? 5x64, Elo 1650!

@Vandertic
Member Author

I agree with @cryptsport, but if you have any idea on how to improve faster, we are happy to consider options.

@Glrr

Glrr commented Dec 9, 2019

A large network progresses more slowly; look at Leela Zero at 10 and 15 blocks - growth was not as fast as at 5 blocks.

LZ's progress was slower at 15 blocks because it was stronger. Look at AGZ and AZ: their ratings grew very quickly at the beginning even though the size of the net didn't change.

The main reason for switching from 6x128 to 9x192 was that progress is supposed to stall at some point, and the fact that the 9x192 net was stronger looks like evidence that 6x128 was close to its max (even if some progress was still possible). I don't see any reason for the 9x192 not to learn faster, at least at the beginning. But I agree that it's not an issue. A concern at most.

@nanzi

nanzi commented Dec 10, 2019

The win rate on the empty board is a good indicator; we found this in the 9x9 training.
Black's win rate will gradually get smaller and smaller until it reaches 0.

In the recent leelaz weights 249-254-255rc, White's win rates are 56.8%-57.1%-57.3%.
Leelaz's current situation is very consistent with this theory.

In the recent SAI weights 140-145-150, White's win rates are 51.4%-50.8%-50.3%, because SAI hasn't reached leelaz's strength yet. Black has found more techniques to avoid losing games, which IMO we would also see on 6x128 if its training were still going on.

Today, with SAI 152-153, White's win rates are 50.3%-50.8%, rising again. Comparing the opening variations from 140 to 153 at 40K visits, we can see that SAI knows the 3-3 opening is a solid move that avoids fighting, is slow, and may be phased out. In the SAI 153 promotion match I saw a lot of good moves in the 4-4 and 4-3 corners, for example the moves after move 20 in match 24: the early stage of the most fashionable star-point joseki (the corner enclosure and invasion at moves 6 and 11 are also amazing). I guess that in the 160s' matches there won't be a 3-3 opening any more, and AI-fashion joseki will appear.

Although the recent Elo values are very close (around 9550), I think progress is still obvious.

@sheeryjay

Is the learning rate correct? Shouldn't the parameter be perhaps higher?

The reason I am asking is that the current behaviour of promotions reminds me of the situation when the parameter was too high and we were getting some nets with 0% promotion wins, except that it also looks just like what happened when the parameter was (rightfully) lowered step by step.

My naive and intuitive guess would be that, after training up to the first released 9x192, the learning rate would be increased to let the learning process make bigger jumps initially. Or would that not help? Having 12+ networks make this slow advancement is kind of weird and, I guess, unexpected at this point?

@l1t1

l1t1 commented Dec 10, 2019

Maybe shift to 15x192? Minigo and KataGo did not train small nets.

@l1t1

l1t1 commented Dec 11, 2019

155 is weaker than many previously promoted nets in the reference matches, so why is its Elo higher than theirs?
Update: its Elo is correct now.

num Upload Date Hash Size Elo Games Training num
155 2019-12-10 21:33 6e6c3cc1 9x192 9542 284 735018
2019-12-11 00:29 ca7661d0  VS  6e6c3cc1 21 : 3 : 18 (53.57%) 42 / 40 reference
2019-12-11 00:29 7826692b  VS  6e6c3cc1 20 : 4 : 22 (47.83%) 46 / 40 reference
2019-12-11 00:29 39fab1a2  VS  6e6c3cc1 22 : 3 : 19 (53.41%) 44 / 40 reference
2019-12-11 00:29 72d2f752  VS  6e6c3cc1 22 : 1 : 19 (53.57%) 42 / 40 reference
2019-12-11 00:29 2ff1f06d  VS  6e6c3cc1 20 : 3 : 23 (46.74%) 46 / 40 reference
num Upload Date Hash Size Elo Games Training num
155 2019-12-10 21:33 6e6c3cc1 9x192 9609 33 735018
154 2019-12-10 12:46 ccf56a7e 9x192 9549 6106 729306
153 2019-12-10 02:23 2ff1f06d 9x192 9574 5188 725066
152 2019-12-09 15:57 72d2f752 9x192 9543 5168 718863
151 2019-12-09 06:18 776bf08c 9x192 9541 5201 713813
150 2019-12-08 20:13 c8a192b6 9x192 9537 5195 709049
149 2019-12-08 11:16 39fab1a2 9x192 9546 5190 703947
148 2019-12-07 22:08 b6628bb4 9x192 9505 5194 698108
147 2019-12-07 13:59 75e000fd 9x192 9545 5174 693547
146 2019-12-06 22:12 7826692b 9x192 9502 5205 687454
145 2019-12-06 10:35 789a72b1 9x192 9484 5180 682303
144 2019-12-06 01:25 e28e15d2 9x192 9475 5187 677188
143 2019-12-05 18:55 ca7661d0 9x192 9511 5238 672799

@Glrr

Glrr commented Dec 11, 2019

Even if the blue line on the rating graph looks flat, the cloud of green dots seems to rise faster. This is somewhat puzzling. Wait and see...

@l1t1

l1t1 commented Dec 11, 2019

I think the green-dot Elo is calculated after the promotion match; if some of those nets don't play in other matches, their Elo won't change.

@Vandertic
Member Author

Is the learning rate correct? Shouldn't the parameter be perhaps higher?

For every generation, the (currently 11) candidate networks come from three different training experiments (or replications). We vary the rate and the window size, so we are continuously trying to understand which training rate and window work best.

But I agree with you: after some generations in which the higher training rate has worked at least as well as the lower one, we may now try raising it another step.
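Purely as an illustration of this kind of replication setup, here is a hypothetical sketch; the learning rates and window sizes below are made up, not the project's actual grid.

```python
import itertools

# Hypothetical sketch of per-generation replications: train one candidate for
# each (learning rate, window size) pair.  All values here are illustrative.
learning_rates = [0.0005, 0.001, 0.002]
window_sizes = [250_000, 500_000]

candidates = [
    {"learning_rate": lr, "window_games": w}
    for lr, w in itertools.product(learning_rates, window_sizes)
]
for c in candidates:
    print(c)
```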

@Vandertic
Member Author

Maybe shift to 15x192? Minigo and KataGo did not train small nets.

It would slow down progress terribly. I think it's too early.

I think we may wait a little bit and see if we get improvements. SAI153 showed the best performance to date against LZ092.

@Vandertic
Member Author

I think the green-dot Elo is calculated after the promotion match; if some of those nets don't play in other matches, their Elo won't change.

Correct

@Vandertic
Member Author

By the way, we will add a few useful commits in the next week or two and then make a new release. Any suggestions are welcome.
The minimal updates will be the most recent LZ commits, a small tweak to improve ladder reading (not much, but hopefully enough to allow ladder learning), and hopefully Lizzie compatibility.

@Glrr

Glrr commented Dec 11, 2019

I think the green-dot Elo is calculated after the promotion match; if some of those nets don't play in other matches, their Elo won't change.

Of course, that's why there are so many green dots above the blue line. Nonetheless, it can give a trend over a dozen generations.

@Glrr

Glrr commented Dec 11, 2019

Well, we are now below the first 9x192. The situation is becoming more and more concerning. I would be happy if I had any clue about what is happening. Could it be just an issue with the evaluation of the nets?

@trinetra75
Member

I tried to compare the last network, 6e65, against the one with the highest Elo four generations earlier, on an empty goban and some middle-game positions, but nothing major came out of the comparison...
The only potential difference is that the newer network seems to have a slightly sharper policy...

@l1t1

l1t1 commented Dec 12, 2019

my test shows in the four 9x192 nets that play against lz 250 v1, lz 139 is the best
#47

@Vandertic
Member Author

Maybe the fall of SAI157 changed things and now we will start to improve. We increased the training rate somewhat, as suggested. We are also increasing the number of candidates a bit for one or two generations, to test parameters more thoroughly.

@Vandertic
Member Author

I have done some analysis on the pipeline hyperparameters. You can find it here.

@dbosst

dbosst commented Dec 14, 2019

I have done some analysis on the pipeline hyperparameters. You can find it here.

It was impressive that you could reverse-engineer all that information!

So, in summary, if the training falls below ~0.4 Elo on average for a long period of time, then the training window size should be experimented with more?

@barrtgt

barrtgt commented Dec 16, 2019

Since DeepMind anchored AlphaGo to human Elo, doesn't that change things a bit?

@Vandertic
Member Author

Not for the derivative. It is just an additive constant.
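To spell out why: anchoring only shifts every rating by the same constant, so the differences between generations (the "derivative" of the curve) are untouched. A tiny illustration with made-up offsets:

```python
# Anchoring to a human reference adds the same constant to every Elo value,
# so generation-to-generation differences are unchanged.  Numbers illustrative.
ratings = [9475, 9511, 9542]
anchored = [r + 250 for r in ratings]   # 250 is an arbitrary anchoring offset

diffs = [b - a for a, b in zip(ratings, ratings[1:])]
anchored_diffs = [b - a for a, b in zip(anchored, anchored[1:])]
assert diffs == anchored_diffs          # the slope of the curve is identical
print(diffs)                            # [36, 31]
```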

@Vandertic
Member Author

So, in summary, if the training falls below ~0.4 Elo on average for a long period of time, then the training window size should be experimented with more?

I would say that if we fall below 1-2 Elo/generation before we reach a much higher level, then we have a problem and should look for solutions (increase the network size, improve the training somehow, ...).
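As a concrete reading of "Elo per generation", the slope can be estimated directly from the promotion table posted above; a minimal sketch using two of its entries:

```python
# Elo gained per generation, estimated from two entries of the table above
# (net 143 at 9511 Elo, net 153 at 9574 Elo).
gens = (143, 153)
elos = (9511, 9574)
slope = (elos[1] - elos[0]) / (gens[1] - gens[0])
print(slope)   # 6.3 Elo per generation over that span
```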

@barrtgt

barrtgt commented Jan 16, 2020

Improvement per generation seems to have slowed quite a bit since we had that huge boost of clients. SAI has improved about 1.5 Elo per generation since net 214. Does SAI training use SWA? If not, adding it should let us crank up the learning rate a lot and see whether it's just stuck in a local minimum.
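For anyone unfamiliar with it, a minimal sketch of what SWA (stochastic weight averaging) does, assuming checkpoints are simple name-to-value parameter maps (real training code would use tensors):

```python
# Minimal sketch of stochastic weight averaging: average the parameters of
# several checkpoints taken late in training and evaluate the averaged net.
def swa_average(checkpoints):
    avg = {name: 0.0 for name in checkpoints[0]}
    for ckpt in checkpoints:
        for name, value in ckpt.items():
            avg[name] += value / len(checkpoints)
    return avg

print(swa_average([{"w": 1.0}, {"w": 3.0}]))  # {'w': 2.0}
```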

@Vandertic
Member Author

Vandertic commented Jan 20, 2020

We are trying different things and occasionally we also make quick attempts to increase the rate, but it seems that current progress is the best we can get, for now.

Consider that progress is now roughly as fast as it was for LZ at this stage. And we are clearly, if slowly, improving against LZ too. In fact LZ092 is finally well behind, and LZ098 was beaten today for the first time.

We are also prepared in case there is again a boost of clients, this time.

And, no, we are not using SWA.

Edit: BTW, the playing style of SAI has changed quite a lot in the last 20-30 generations, despite the apparently small increase in strength, so maybe we really are improving steadily. I also believe that matches are not the best way to prove how good a net is at this high level of play. As you can see, there should be 400 Elo points between LZ103 and LZ113 (going by LZ's values), but we have similar performances against the two nets, implying that the huge progress between LZ103 and LZ113 does not immediately convert into match performance.
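For scale, the standard Elo formula says a 400-point gap corresponds to roughly a 91% expected score, which is why similar results against LZ103 and LZ113 are surprising; a minimal sketch:

```python
# Expected score implied by an Elo difference: a 400-point gap predicts ~91%.
def expected_score(elo_diff):
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))

print(expected_score(400))   # ~0.909
```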

@nanzi

nanzi commented Jan 21, 2020

[Bar chart: sai_policy_on_empty_board]

I have made a bar chart to show SAI's policy rate on the empty board with komi 7.5.

When I run SAI 253, as visits go up from 30k to 300k, komoku (the 4-3 point) becomes more preferred in all four corners.

Happy lunar new year !

@kennyfs

kennyfs commented Jan 21, 2020

A line chart may be better, and a smoothed one would be better still.

@Vandertic
Member Author

Hello! We are moving to 12x256 with the next promotion. Stay tuned!
