This repository has been archived by the owner on Jun 16, 2021. It is now read-only.

Type of self.output in policy.py #38

Open
CezCz opened this issue Mar 10, 2018 · 11 comments

Comments

CezCz commented Mar 10, 2018

Hey @brilee ,

I'm trying to play against this wonderful library. However, when I run genmove b I get:

File "\MuGo\policy.py", line 152, in run
probabilities = self.session.run(self.output, feed_dict={self.x: processed_position[None, :]})[0]
AttributeError: 'PolicyNetwork' object has no attribute 'output'

What should self.output be?

CezCz commented Mar 17, 2018

Looking through previous commits, I found that output used to be defined, but it was later deleted during the log_likelihood_cost refactor:

output = tf.nn.softmax(tf.reshape(h_conv_final, [-1, go.N ** 2]) + b_conv_final)
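
A minimal fix sketch, assuming h_conv_final and b_conv_final still exist under those names in the current graph-building code (they may not), would be to reinstate it next to the logits:

    # Sketch only: reinstate self.output so PolicyNetwork.run() can fetch it.
    # h_conv_final / b_conv_final are the final convolution output and bias
    # from the old commit; the exact names may differ in the current code.
    logits = tf.reshape(h_conv_final, [-1, go.N ** 2]) + b_conv_final
    self.output = tf.nn.softmax(logits)  # move probabilities over the N*N board points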

brilee commented Mar 17, 2018

Hm. Sorry about that - work on this repo is continuing at https://github.com/tensorflow/minigo. I'll update the README.md

CezCz commented May 31, 2018 via email

JoeyQWu commented Jun 5, 2018

@CezCz yeah, thanks for your kind answer.
Actually, I fixed line 88 with
"log_likelihood_cost = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))" and it works now,
but I can't understand the output of MCTS: why does it often choose the bigger value even when it is negative?
I am confused by the result; I would appreciate it if you could tell me the reason, @CezCz.
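
For context, this is roughly where my fix sits (a sketch only; logits is the reshaped final-layer output from the old self.output definition and y is the training labels placeholder - the names in the current code may differ):

    # Sketch of the corrected line 88 in context; variable names are assumed.
    logits = tf.reshape(h_conv_final, [-1, go.N ** 2]) + b_conv_final
    log_likelihood_cost = tf.reduce_sum(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))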

CezCz commented Jun 5, 2018 via email

JoeyQWu commented Jun 5, 2018

As in the first screenshot, white's move is at R4, and the second screenshot shows its value is -7.5.
The other position is Q3 (third screenshot), and its value is 8.5 (fourth screenshot).
So why did white choose R4 rather than Q3, when the latter value is greater than the former? I am very confused about this. Perhaps I don't understand the code, or maybe this is a silly question, but I really want to know the reason. I am very grateful to you, @CezCz; you are a very kind person, and thank you very much!

CezCz commented Jun 5, 2018 via email

JoeyQWu commented Jun 6, 2018

Hi @CezCz,
So the next move is chosen simply because the algorithm picks the most-visited move, and what the search backpropagates are the visit counts and the winner predicted by the value network (a positive value meaning the current player wins the game). The move that is selected is not related to the value network's value, just to the visit count, right?

CezCz commented Jun 6, 2018

@JoeyQWu
For the move that is chosen to be played in the actual game, yes. Not to be confused with the move chosen within the selection phase - that one is chosen based on a more sophisticated heuristic that takes exploration into consideration (a rough sketch follows the links below).
You may want to read:
https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/ - a nice MCTS introduction with examples
http://www.baeldung.com/java-monte-carlo-tree-search - a simple Monte Carlo tree search implementation
https://deepmind.com/documents/119/agz_unformatted_nature.pdf - pages 25-27, the MCTS implementation within AlphaGo Zero (don't be confused by the temperature parameter and parent visit count; those are just additional parameters to promote exploration during training, but the core is the visit count)
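
Very roughly, and with made-up attribute names (this is not MuGo's actual API, just an illustration of the idea):

    import math

    def select_child_during_search(node, c_puct=1.0):
        # Selection phase inside the tree: trade the value estimate (Q) off
        # against an exploration bonus that favours rarely visited children.
        def puct_score(child):
            exploration = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
            return child.q + exploration
        return max(node.children, key=puct_score)

    def pick_move_to_play(root):
        # After the search budget is spent, the move actually played is simply
        # the most visited child; its raw value (which can be negative) is not
        # compared directly against the other candidates' values.
        return max(root.children, key=lambda child: child.visits)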

JoeyQWu commented Jun 6, 2018

@CezCz
Okay, I will read more to understand. Thank you very much, you are so nice; I'm very grateful for your help!

brilee commented Jun 6, 2018

I also wrote http://www.moderndescartes.com/essays/deep_dive_mcts/ recently
