
Progress stalled? #34

Open
Vandertic opened this issue Nov 2, 2019 · 83 comments
Labels
good first issue Good for newcomers

Comments

@Vandertic
Member

I just wanted to reassure everyone that if the progress stalls we are going to increase the visits and we believe that in a few generations the upgrade will restore a good rate of improvement

@Vandertic added the good first issue label Nov 2, 2019
@kennyfs

kennyfs commented Nov 2, 2019

Yes, I also think it has stalled.

@kennyfs

kennyfs commented Nov 2, 2019

If older nets beat the newest one with a win rate above 60%, it has obviously stalled.

@l1t1

l1t1 commented Nov 3, 2019

Matches of 50, 40, or 30 games are all too small to estimate the true win rate between two nets of similar strength.
For example:
SAI 52 beat SAI 51 (65.45%) in a 50-game match,
but
SAI 51 beat SAI 52 (58.82%) in a 30-game match.
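To illustrate how little such short matches can tell apart two nets of similar strength, here is a minimal sketch (not part of the SAI tooling) of the confidence interval implied by match results of this size; the win counts are illustrative round numbers close to the percentages quoted above.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """95% Wilson score interval for an observed win rate."""
    p = wins / games
    denom = 1 + z ** 2 / games
    centre = (p + z ** 2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z ** 2 / (4 * games ** 2)) / denom
    return centre - half, centre + half

# The 30-game interval straddles 50%, and the 50-game interval only barely
# excludes it: samples this small say little about nets of similar strength.
print(wilson_interval(wins=33, games=50))  # roughly (0.52, 0.78)
print(wilson_interval(wins=18, games=30))  # roughly (0.42, 0.75)
```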

@kennyfs

kennyfs commented Nov 3, 2019 via email

@Vandertic
Member Author

Visits increased to 1200

@kennyfs

kennyfs commented Nov 3, 2019

How do we judge whether it has stalled?
How do we decide whether to increase the visits or to enlarge the net size?

@Vandertic
Member Author

There's no general rule. We try to understand what's happening and make decisions accordingly

@l1t1

l1t1 commented Nov 4, 2019

Did the number of training games per generation increase after training with 1200 visits?

@Vandertic
Member Author

No, there was a problem with the script. Corrected this morning.

@l1t1

l1t1 commented Nov 4, 2019

I suggest setting the number of training games per generation to a constant times the promotion win rate, e.g. SAI 55: 20000 * 0.33 = 6600 games, SAI 54: 20000 * 0.64 = 12800 games, so that the stronger nets produce more of the higher-quality games.
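A minimal sketch of this suggestion (the 20000-game budget is just the figure implied by the example above, not a project setting):

```python
# Sketch of the suggestion above: scale each generation's self-play games by
# the promotion win rate against a fixed budget.  BASE_GAMES is illustrative.
BASE_GAMES = 20000

def games_for_generation(promotion_winrate):
    return int(BASE_GAMES * promotion_winrate)

print(games_for_generation(0.33))  # 6600 games, as in the SAI 55 example
print(games_for_generation(0.64))  # 12800 games, as in the SAI 54 example
```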

@Vandertic
Member Author

@l1t1 It is a bit complicated to implement, but I will think about it.

@Vandertic
Member Author

Visits increased to 1600 from the next generation.

@Vandertic
Member Author

If you are wondering what happened with the current promotion and this sudden, huge strength jump... there was previously a stupid error in the training script. The learning rate was never dropped as intended and was still 0.05. Now we are quenching it (slowly). The first change is from 0.05 to 0.01. (I thought I was training at 0.00025.)
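For context, a minimal sketch of the kind of stepwise "quenching" schedule described here; the stage boundaries are invented and the values simply echo rates mentioned elsewhere in this thread, not the project's actual schedule.

```python
# Sketch of a stepwise learning-rate "quench": drop the rate in stages rather
# than all at once.  Stage boundaries and values are illustrative only.
SCHEDULE = [(0, 0.05), (1, 0.01), (2, 0.004), (3, 0.0005)]  # (stage, learning rate)

def learning_rate_for(stage):
    lr = SCHEDULE[0][1]
    for start, rate in SCHEDULE:
        if stage >= start:
            lr = rate
    return lr

print(learning_rate_for(2))  # 0.004
```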

@sheeryjay

@Vandertic would it be possible to publish the parameters used (in the repository)? As it is, nobody could have spotted this error without asking for them, whereas if they were in the repository, people could see them and maybe spot the error.

Similarly with the webpage and the mentions of the change from 800 -> 1200 -> 1600 visits. If people do not watch this issue, they have no way of even knowing when the change happened.

@Vandertic
Member Author

You are right. Will be done ASAP. (I have a conference Tuesday, so after that.)

@Vandertic
Member Author

@sheeryjay just a quick update before I write some real documentation next week: the bug was (and still is) in the repository.

The rate should be configured here:
https://github.com/sai-dev/sai/blob/214a68f54e07f484086faa8ba15a6cfe91da6821/training/tf/config.py#L93

But then it is not used here:
https://github.com/sai-dev/sai/blob/214a68f54e07f484086faa8ba15a6cfe91da6821/training/tf/tfprocess.py#L198
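To make the failure mode concrete, here is a hypothetical sketch of that kind of mismatch (the names and values below are placeholders, not the actual SAI code; see the config.py and tfprocess.py links above for the real files):

```python
# "config.py" (hypothetical sketch)
LEARNING_RATE = 0.00025        # the value the maintainers believed was in effect

# "tfprocess.py" (hypothetical sketch): the optimizer is built from a hard-coded
# literal instead of the configured value, so editing the config has no effect.
def build_optimizer():
    lr = 0.05                  # bug: should read LEARNING_RATE from the config
    return {"learning_rate": lr, "momentum": 0.9}

print(build_optimizer())       # still reports 0.05 regardless of the config
```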

@Vandertic
Member Author

BTW, I am temporarily increasing the number of nets for promotion and the steps between each net and the following one, so as to get an updated measure of which numbers make sense with the current training rate.

@l1t1

l1t1 commented Nov 16, 2019

Did SAI 96 update the learning rate again?

@Vandertic
Member Author

Yes. Down to 0.004. That's also why we have more promotion candidates for this generation.

@Vandertic
Member Author

We have been at 0.0005 since generation 108. Almost stalled. Time to scale up to 9x192, as soon as the training of that structure catches up with the current network in a couple of days.

@barrtgt

barrtgt commented Nov 24, 2019

Heatmap of SAI net # 116
d9cf4b3795f3ca47f19d3942630b386b3781d33c8d862de83b5233f96cb47a65

  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0  33  24   2   0   0   1   0   0   0   1   1   1   2  23  31   0   0
  0   0  26  90   1   1   0   1   1   1   1   0   1   1   1  85  24   0   0
  0   0   2   1   0   0   0   0   0   0   0   0   0   0   0   1   2   0   0
  0   0   1   1   0   0   0   0   0   1   0   0   0   0   0   1   1   0   0
  0   0   1   1   0   0   1   1   1   1   1   0   1   0   0   0   0   0   0
  0   0   1   1   0   0   0   1   1   2   1   1   0   0   0   0   1   0   0
  0   0   0   0   0   0   0   1   1   1   1   1   1   1   0   0   0   0   0
  0   0   0   1   0   0   1   2   1   6   1   2   1   1   0   0   0   0   0
  0   0   0   0   0   0   1   1   1   1   1   1   1   0   0   0   0   0   0
  0   0   1   0   1   0   1   1   1   2   1   1   1   0   0   0   1   0   0
  0   0   1   0   0   0   1   1   1   1   1   1   1   0   0   0   1   0   0
  0   0   1   1   0   0   0   0   1   1   1   0   0   0   0   1   1   0   0
  0   0   2   1   0   0   0   0   0   0   0   0   0   0   0   1   2   0   0
  0   0  26  84   1   1   1   0   0   0   0   0   1   1   1  81  24   0   0
  0   0  36  27   2   1   1   1   0   0   0   0   0   1   2  24  31   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

Heatmap of the final 6x128 LZ net, # 91
b3b00c6d75b4e74946a97b88949307c9eae2355a88f518ebf770c7758f90e357

  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   1  73   0   0   0   0   0   0   0   0   0   0   0  67   0   0   0
  0   0  76  95   0   0   0   0   0   0   0   0   0   0   0  97  63   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0  66  88   0   0   0   0   0   0   0   0   0   0   0 102  69   0   0
  0   0   0  62   0   0   0   0   0   0   0   0   0   0   0  76   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

Policy with --nrsymm:

net 33+34+44 peak 1st line
116 67.4 0.5
111 60.9 0.5
106 56.3 0.5
101 54.6 0.4
96 44.1 0.6
91 44.1 0.7
86 29.8 0.9
81 20.5 1
76 16.8 1.3
71 13.4 1.3
66 10.6 1.3
61 7.7 1.4
56 7.1 1.5
51 7.3 1.7
41 5.9 1.8
31 5.9 1.8
21 4.5 2.2
11 4.4 2.2
1 3.8 2.6

@barrtgt

barrtgt commented Nov 24, 2019

There is always the --randomvisits option, or you could do what MuZero does and reduce the temperature to 0.5 until it stalls again, then to 0.25.
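For reference, a minimal sketch (not SAI code) of the MuZero-style temperature reduction mentioned here: move selection samples from visit counts raised to the power 1/T, so lowering T sharpens the choice.

```python
import numpy as np

# Sketch of temperature-scaled move selection: visit counts from search are
# raised to the power 1/T before sampling; T=0.5 squares them, T=0.25 raises
# them to the 4th power, making the best-searched move ever more likely.
def sample_move(visit_counts, temperature=1.0):
    counts = np.asarray(visit_counts, dtype=np.float64)
    weights = counts ** (1.0 / temperature)
    probs = weights / weights.sum()
    return np.random.choice(len(counts), p=probs)

print(sample_move([10, 30, 60], temperature=0.5))  # usually picks index 2
```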

@Vandertic
Member Author

Visits raised to 2400

@l1t1

l1t1 commented Nov 25, 2019

it seems good
2019-11-25 00:32 8c46b273 VS 7eb9f119 38 : 1 : 11 (77.00%) 50 / 50 promotion

@nanzi

nanzi commented Nov 25, 2019

Black and White are moving away from the 3-3 opening style. It reminds me that the Minigo v16 weights also had the same opening style, with Black opening on 3-3 + 4-4 (later White gets a komoku joseki + 4-4). This might be a local optimum, in my opinion.

@l1t1

l1t1 commented Nov 25, 2019

The first 4 moves of 3 games are the same.
autogtp -g 3 -k sga --url http://sai.unich.it/ --username aaa --password aaa

1 (B Q17) 1 (B Q17) 1 (B Q17) 2 (W C3) 2 (W C3) 2 (W C3) 3 (B D4) 3 (B D4) 3 (B D4) 4 (W C4) 4 (W C4) 4 (W C4) 5 (B D5) 5 (B D17) 5 (B D5) 6 (W R4) 6 (W Q3) 6 (W D17) 7 (B C5) 7 (B Q5) 7 (B C5) 8 (W E2)

@nanzi

nanzi commented Nov 25, 2019

In my 400-visit match tests, SAI 120 vs LZ 081 scored 51/47 (100 games total) and 194/197 (400 games total).
Games where both sides passed were not counted.

The SGF files are commented in simplified Chinese.

@barrtgt

barrtgt commented Nov 25, 2019

Are reference and comparison games staying at 1600 visits?

@Glrr

Glrr commented Dec 9, 2019

It's a bit frustrating indeed. I was convinced that 6x128 had not reached its max yet, but I expected that 9x192 would progress faster.

@cryptsport

A large network progresses more slowly; look at Leela Zero at 10 and 15 blocks - growth was not as fast as at 5 blocks. And compare what SAI and Leela Zero had achieved after 700,000 games: what was Leela Zero's level? 5x64, Elo 1650!

@Vandertic
Member Author

I agree with @cryptsport, but if you have any idea on how to improve faster, we are happy to consider options.

@Glrr

Glrr commented Dec 9, 2019

A large network progresses more slowly; look at Leela Zero at 10 and 15 blocks - growth was not as fast as at 5 blocks.

LZ's progress was slower at 15 blocks because it was stronger. Look at AGZ and AZ: their ratings grew very quickly at the beginning even though the size of the net didn't change.

The main reason for switching from 6x128 to 9x192 was that progress is supposed to stall at some point, and the fact that the 9x192 net was stronger looks like evidence that 6x128 was close to its max (even if some progress was still possible). I don't see any reason for the 9x192 not to learn faster, at least at the beginning. But I agree that it's not an issue. A concern at most.

@nanzi

nanzi commented Dec 10, 2019

The win rate on the empty board is a good indicator; we found this in the 9x9 training.
Black's win rate will gradually get smaller and smaller until it reaches 0.

In the recent leelaz weights 249-254-255rc, White's win rates are 56.8%-57.1%-57.3%.
Leelaz's current situation is very consistent with this theory.

In the recent SAI weights 140-145-150, White's win rates are 51.4%-50.8%-50.3%, because SAI hasn't reached leelaz's strength yet. Black has found more techniques to avoid losing games, which IMO we would also see on 6x128 if its training were still going on.

Today, with SAI 152-153, White's win rates are 50.3%-50.8%, rising again. Comparing the opening variations from 140 to 153 at 40K visits, we can see that SAI knows the 3-3 opening is a solid move that avoids fighting, is slow, and may be phased out. In the SAI 153 promotion match I saw a lot of good moves in the 4-4 and 4-3 corners, for example the moves after move 20 in match 24: the early stage of the most fashionable star-point joseki (the corner enclosure and invasion at moves 6 and 11 are also amazing). I guess that in the 160s' matches there won't be a 3-3 opening any more, and AI-fashion joseki will appear.

Although the recent Elo values are very close (around 9550), I think progress is still obvious.

@sheeryjay

Is the learning rate correct? Shouldn't the parameter be perhaps higher?

The reason I am asking is that the current behaviour of promotions reminds me of the situation when the parameter was too high and we were getting some nets with 0% promotion wins, except that it also looks just like what happened when the parameter was (rightfully) lowered step by step.

My naive and intuitive guess would be that, after training up to the first released 9x192, the learning rate would be increased to let the learning process make bigger jumps initially. Or would that not help? Having 12+ networks make this slow advancement is kind of weird and, I guess, unexpected at this point?

@l1t1

l1t1 commented Dec 10, 2019

Maybe shift to 15x192? Minigo and KataGo did not train small nets.

@l1t1

l1t1 commented Dec 11, 2019

155 is weaker than many previously promoted nets in the reference matches, so why is its Elo higher than theirs?
Update: its Elo is correct now.

num Upload Date Hash Size Elo Games Training num
155 2019-12-10 21:33 6e6c3cc1 9x192 9542 284 735018
2019-12-11 00:29 ca7661d0  VS  6e6c3cc1 21 : 3 : 18 (53.57%) 42 / 40 reference
2019-12-11 00:29 7826692b  VS  6e6c3cc1 20 : 4 : 22 (47.83%) 46 / 40 reference
2019-12-11 00:29 39fab1a2  VS  6e6c3cc1 22 : 3 : 19 (53.41%) 44 / 40 reference
2019-12-11 00:29 72d2f752  VS  6e6c3cc1 22 : 1 : 19 (53.57%) 42 / 40 reference
2019-12-11 00:29 2ff1f06d  VS  6e6c3cc1 20 : 3 : 23 (46.74%) 46 / 40 reference
num Upload Date Hash Size Elo Games Training num
155 2019-12-10 21:33 6e6c3cc1 9x192 9609 33 735018
154 2019-12-10 12:46 ccf56a7e 9x192 9549 6106 729306
153 2019-12-10 02:23 2ff1f06d 9x192 9574 5188 725066
152 2019-12-09 15:57 72d2f752 9x192 9543 5168 718863
151 2019-12-09 06:18 776bf08c 9x192 9541 5201 713813
150 2019-12-08 20:13 c8a192b6 9x192 9537 5195 709049
149 2019-12-08 11:16 39fab1a2 9x192 9546 5190 703947
148 2019-12-07 22:08 b6628bb4 9x192 9505 5194 698108
147 2019-12-07 13:59 75e000fd 9x192 9545 5174 693547
146 2019-12-06 22:12 7826692b 9x192 9502 5205 687454
145 2019-12-06 10:35 789a72b1 9x192 9484 5180 682303
144 2019-12-06 01:25 e28e15d2 9x192 9475 5187 677188
143 2019-12-05 18:55 ca7661d0 9x192 9511 5238 672799

@Glrr

Glrr commented Dec 11, 2019

Even if the blue line on the rating graph looks flat, the cloud of green dots seems to rise faster. This is somewhat puzzling. Wait and see...

@l1t1

l1t1 commented Dec 11, 2019

I think the green-dot Elo is calculated after the promotion match; if some of those nets don't play in other matches, their Elo won't change.

@Vandertic
Member Author

Is the learning rate correct? Shouldn't the parameter be perhaps higher?

For every generation, the (currently 11) candidate networks come from three different training experiments (or replications). We vary the rate and the window size, so we are continuously trying to understand which training rate and window work best.

But I agree with you: after some generations in which the higher training rate has worked at least as well as the lower one, we may now try raising it another step.
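Purely as an illustration of this kind of replication setup, here is a hypothetical sketch; the learning rates and window sizes below are made up, not the project's actual grid.

```python
import itertools

# Hypothetical sketch of per-generation replications: train one candidate for
# each (learning rate, window size) pair.  All values here are illustrative.
learning_rates = [0.0005, 0.001, 0.002]
window_sizes = [250_000, 500_000]

candidates = [
    {"learning_rate": lr, "window_games": w}
    for lr, w in itertools.product(learning_rates, window_sizes)
]
for c in candidates:
    print(c)
```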

@Vandertic
Member Author

Maybe shift to 15x192? Minigo and KataGo did not train small nets.

It would slow down progress terribly. I think it's too early.

I think we may wait a little bit and see if we get improvements. SAI153 showed the best performance to date against LZ092.

@Vandertic
Member Author

I think the green-dot Elo is calculated after the promotion match; if some of those nets don't play in other matches, their Elo won't change.

Correct

@Vandertic
Member Author

By the way, we will add a few useful commits in the next week or two and then make a new release. Any suggestions are welcome.
The minimal updates will be the most recent LZ commits, a small tweak to improve ladder reading (not much, but hopefully enough to allow ladder learning), and hopefully Lizzie compatibility.

@Glrr

Glrr commented Dec 11, 2019

I think the green-dot Elo is calculated after the promotion match; if some of those nets don't play in other matches, their Elo won't change.

Of course, that's why there are so many green dots above the blue line. Nonetheless, it can give a trend over a dozen generations.

@Glrr

Glrr commented Dec 11, 2019

Well, we are now below the first 9x192. The situation is becoming more and more concerning. I would be happy if I had any clue about what is happening. Could it be just an issue with the evaluation of the nets?

@trinetra75
Member

I tried to compare the last network, 6e65, against the one with the highest Elo four generations earlier, on an empty goban and some middle-game positions, but nothing major came out of the comparison...
The only potential difference is that the newer network seems to have a slightly sharper policy...

@l1t1

l1t1 commented Dec 12, 2019

my test shows in the four 9x192 nets that play against lz 250 v1, lz 139 is the best
#47

@Vandertic
Member Author

Maybe the fall of SAI157 changed things and now we will start to improve. We increased the training rate somewhat, as suggested. We are also increasing the number of candidates a bit for one or two generations, to test parameters more thoroughly.

@Vandertic
Member Author

I have done some analysis on the pipeline hyperparameters. You can find it here.

@dbosst

dbosst commented Dec 14, 2019

I have done some analysis on the pipeline hyperparameters. You can find it here.

It was impressive that you could reverse-engineer all that information!

So, in summary, if the training falls below ~0.4 Elo on average for a long period of time, then the training window size should be experimented with more?

@barrtgt

barrtgt commented Dec 16, 2019

Since DeepMind anchored AlphaGo to human Elo, doesn't that change things a bit?

@Vandertic
Member Author

Not for the derivative. It is just an additive constant.
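To spell out why: anchoring only shifts every rating by the same constant, so the differences between generations (the "derivative" of the curve) are untouched. A tiny illustration with made-up offsets:

```python
# Anchoring to a human reference adds the same constant to every Elo value,
# so generation-to-generation differences are unchanged.  Numbers illustrative.
ratings = [9475, 9511, 9542]
anchored = [r + 250 for r in ratings]   # 250 is an arbitrary anchoring offset

diffs = [b - a for a, b in zip(ratings, ratings[1:])]
anchored_diffs = [b - a for a, b in zip(anchored, anchored[1:])]
assert diffs == anchored_diffs          # the slope of the curve is identical
print(diffs)                            # [36, 31]
```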

@Vandertic
Member Author

So, in summary, if the training falls below ~0.4 Elo on average for a long period of time, then the training window size should be experimented with more?

I would say that if we fall below 1-2 Elo/generation before we reach a much higher level, then we have a problem and should look for solutions (increase the network size, improve the training somehow, ...).
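As a concrete reading of "Elo per generation", the slope can be estimated directly from the promotion table posted above; a minimal sketch using two of its entries:

```python
# Elo gained per generation, estimated from two entries of the table above
# (net 143 at 9511 Elo, net 153 at 9574 Elo).
gens = (143, 153)
elos = (9511, 9574)
slope = (elos[1] - elos[0]) / (gens[1] - gens[0])
print(slope)   # 6.3 Elo per generation over that span
```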

@barrtgt

barrtgt commented Jan 16, 2020

Improvement per generation seems to have slowed quite a bit since we had that huge boost of clients. SAI has improved about 1.5 Elo per generation since net 214. Does SAI training use SWA? If not, adding it should let us crank up the learning rate a lot and see whether it's just stuck in a local minimum.
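For anyone unfamiliar with it, a minimal sketch of what SWA (stochastic weight averaging) does, assuming checkpoints are simple name-to-value parameter maps (real training code would use tensors):

```python
# Minimal sketch of stochastic weight averaging: average the parameters of
# several checkpoints taken late in training and evaluate the averaged net.
def swa_average(checkpoints):
    avg = {name: 0.0 for name in checkpoints[0]}
    for ckpt in checkpoints:
        for name, value in ckpt.items():
            avg[name] += value / len(checkpoints)
    return avg

print(swa_average([{"w": 1.0}, {"w": 3.0}]))  # {'w': 2.0}
```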

@Vandertic
Member Author

Vandertic commented Jan 20, 2020

We are trying different things and occasionally we also make quick attempts to increase the rate, but it seems that current progress is the best we can get, for now.

Consider that progress is now roughly as fast as it was for LZ at this stage. And we are clearly, if slowly, improving against LZ too. In fact LZ092 is finally well behind, and LZ098 was beaten today for the first time.

We are also prepared in case there is again a boost of clients, this time.

And, no, we are not using SWA.

Edit: BTW, the playing style of SAI has changed quite a lot in the last 20-30 generations, despite the apparently small increase in strength, so maybe we really are improving steadily. I also believe that matches are not the best way to prove how good a net is at this high level of play. As you can see, there should be 400 Elo points between LZ103 and LZ113 (going by LZ's values), but we have similar performances against the two nets, implying that the huge progress between LZ103 and LZ113 does not immediately convert into match performance.
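For scale, the standard Elo formula says a 400-point gap corresponds to roughly a 91% expected score, which is why similar results against LZ103 and LZ113 are surprising; a minimal sketch:

```python
# Expected score implied by an Elo difference: a 400-point gap predicts ~91%.
def expected_score(elo_diff):
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))

print(expected_score(400))   # ~0.909
```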

@nanzi

nanzi commented Jan 21, 2020

[Bar chart: sai_policy_on_empty_board]

I have made a bar chart to show SAI's policy rate on the empty board with komi 7.5.

When I run SAI 253, as visits go up from 30k to 300k, komoku (the 4-3 point) becomes more preferred in all four corners.

Happy lunar new year !

@kennyfs

kennyfs commented Jan 21, 2020

A line chart may be better, and a smoothed one would be better still.

@Vandertic
Member Author

Hello! We are moving to 12x256 with the next promotion. Stay tuned!
