This repository has been archived by the owner on Jun 16, 2021. It is now read-only.

Type of self.output in policy.py #38

Open
CezCz opened this issue Mar 10, 2018 · 11 comments

Comments

CezCz commented Mar 10, 2018

Hey @brilee ,

I'm trying to play against this wonderful library. However, when I run genmove b I get:

File "\MuGo\policy.py", line 152, in run
probabilities = self.session.run(self.output, feed_dict={self.x: processed_position[None, :]})[0]
AttributeError: 'PolicyNetwork' object has no attribute 'output'

What should self.output be?

CezCz commented Mar 17, 2018

Looking through previous commits, I found that output used to be defined, but it was later deleted during the log_likelihood_cost refactor:

output = tf.nn.softmax(tf.reshape(h_conv_final, [-1, go.N ** 2]) + b_conv_final)
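
A minimal fix sketch, assuming h_conv_final and b_conv_final still exist under those names in the current graph-building code (they may not), would be to reinstate it next to the logits:

    # Sketch only: reinstate self.output so PolicyNetwork.run() can fetch it.
    # h_conv_final / b_conv_final are the final convolution output and bias
    # from the old commit; the exact names may differ in the current code.
    logits = tf.reshape(h_conv_final, [-1, go.N ** 2]) + b_conv_final
    self.output = tf.nn.softmax(logits)  # move probabilities over the N*N board points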

brilee commented Mar 17, 2018

Hm. Sorry about that - work on this repo is continuing at https://github.com/tensorflow/minigo. I'll update the README.md

CezCz commented May 31, 2018 via email

JoeyQWu commented Jun 5, 2018

@CezCz yeah, thanks for your kind answer.
Actually, I fixed line 88 with
"log_likelihood_cost = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))" and it works now,
but I can't understand the output of MCTS: why does it often choose the bigger value even when it is negative?
I am confused by the result; I would appreciate it if you could tell me the reason, @CezCz.
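
For context, this is roughly where my fix sits (a sketch only; logits is the reshaped final-layer output from the old self.output definition and y is the training labels placeholder - the names in the current code may differ):

    # Sketch of the corrected line 88 in context; variable names are assumed.
    logits = tf.reshape(h_conv_final, [-1, go.N ** 2]) + b_conv_final
    log_likelihood_cost = tf.reduce_sum(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))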

CezCz commented Jun 5, 2018 via email

JoeyQWu commented Jun 5, 2018

As in the first screenshot, white's move is at R4, and the second screenshot shows its value is -7.5.
The other position is Q3 (third screenshot), and its value is 8.5 (fourth screenshot).
So why did white choose R4 rather than Q3, when the latter value is greater than the former? I am very confused about this. Perhaps I don't understand the code, or maybe this is a silly question, but I really want to know the reason. I am very grateful to you, @CezCz; you are a very kind person, and thank you very much!

CezCz commented Jun 5, 2018 via email

JoeyQWu commented Jun 6, 2018

Hi @CezCz,
So the next move is chosen simply because the algorithm picks the most-visited move, and what the search backpropagates are the visit counts and the winner predicted by the value network (a positive value meaning the current player wins the game). The move that is selected is not related to the value network's value, just to the visit count, right?

CezCz commented Jun 6, 2018

@JoeyQWu
For the move that is chosen to be played in the actual game, yes. Not to be confused with the move chosen within the selection phase - that one is chosen based on a more sophisticated heuristic that takes exploration into consideration (a rough sketch follows the links below).
You may want to read:
https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/ - a nice MCTS introduction with examples
http://www.baeldung.com/java-monte-carlo-tree-search - a simple Monte Carlo tree search implementation
https://deepmind.com/documents/119/agz_unformatted_nature.pdf - pages 25-27, the MCTS implementation within AlphaGo Zero (don't be confused by the temperature parameter and parent visit count; those are just additional parameters to promote exploration during training, but the core is the visit count)
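
Very roughly, and with made-up attribute names (this is not MuGo's actual API, just an illustration of the idea):

    import math

    def select_child_during_search(node, c_puct=1.0):
        # Selection phase inside the tree: trade the value estimate (Q) off
        # against an exploration bonus that favours rarely visited children.
        def puct_score(child):
            exploration = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
            return child.q + exploration
        return max(node.children, key=puct_score)

    def pick_move_to_play(root):
        # After the search budget is spent, the move actually played is simply
        # the most visited child; its raw value (which can be negative) is not
        # compared directly against the other candidates' values.
        return max(root.children, key=lambda child: child.visits)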

JoeyQWu commented Jun 6, 2018

@CezCz
Okay, I will read more to understand. Thank you very much, you are so nice; I'm very grateful for your help!

brilee commented Jun 6, 2018

I also wrote http://www.moderndescartes.com/essays/deep_dive_mcts/ recently
