Question about Rollout #19

Closed
nathan-whitaker opened this issue Jun 13, 2017 · 3 comments

Comments

@nathan-whitaker

In this loop:

for i in range(rollout_num):

This is N-time Monte Carlo sampling, with n = 16 in the code. But how are the different samples generated? given_num represents how many tokens to use from the input, and i represents the i-th sample. Why are the samples different for different values of i? Is the rollout network being updated somewhere within the call to get_reward and I'm missing it? I also don't see where the randomness comes in for the Monte Carlo estimate of the partial-sequence reward.

From my examination of the code, the network doesn't get updated and the session parameters stay the same, so I'm not sure how different samples are being generated.

Can someone help me understand how a) different samples are being generated, b) where is the randomness coming from, c) if the rollout network has the same parameters as the Generator network, how is it generating different samples than the generator?

Any help is greatly appreciated! Thank you for providing this code; it has been very helpful to me.

@nathan-whitaker
Author

I think I found what I was looking for.

next_token = tf.cast(tf.reshape(tf.multinomial(log_prob, 1), [self.batch_size]), tf.int32)

This line contains a call to tf.multinomial (https://www.tensorflow.org/api_docs/python/tf/multinomial), which draws a random sample from the distribution defined by the logits the network produces, instead of taking the max like the generator network does.
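
To make the difference concrete, here is a minimal sketch of sampling versus taking the max. It uses only the Python standard library rather than TensorFlow, and the helper names (softmax, greedy_token, sampled_token) and the logits are made up for illustration; it is not the repository's code.

```python
import math
import random


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_token(logits):
    # Taking the max: the same logits always give the same token.
    return max(range(len(logits)), key=lambda k: logits[k])


def sampled_token(logits, rng):
    # Sampling: draw a token index with probability softmax(logits)[index].
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits for a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, 0.1]
rng = random.Random(0)

# 16 draws from the same, unchanged "network output".
greedy = [greedy_token(logits) for _ in range(16)]
sampled = [sampled_token(logits, rng) for _ in range(16)]

print("greedy :", greedy)
print("sampled:", sampled)
```

The greedy list repeats a single token, while the sampled list varies even though the logits never change. That is the sense in which the randomness comes from the sampling step and not from any parameter update.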

@guotong1988

Great investigation!

@guotong1988

guotong1988 commented Aug 2, 2017

I still don't know exactly what N-time Monte Carlo sampling is. Could you please explain? Thank you @LantaoYu @nathan-whitaker
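
For what it's worth, here is a rough sketch of the general idea as I understand it: keep the first given_num tokens, let a stochastic policy finish the sequence rollout_num times, score every completed sequence, and average the scores. Everything in it is a toy assumption: toy_policy_sample stands in for the rollout network, toy_reward stands in for the discriminator, and SEQ_LEN and VOCAB are arbitrary. It is not the repository's get_reward.

```python
import random

SEQ_LEN = 8          # hypothetical full sequence length
VOCAB = [0, 1, 2, 3]  # hypothetical vocabulary


def toy_policy_sample(prefix, rng):
    # Hypothetical stand-in for the rollout network: ignores the prefix
    # and picks the next token uniformly at random.
    return rng.choice(VOCAB)


def toy_reward(sequence):
    # Hypothetical stand-in for the discriminator score:
    # the fraction of tokens equal to 1.
    return sum(1 for t in sequence if t == 1) / len(sequence)


def rollout_reward(prefix, rollout_num, rng):
    # N-time Monte Carlo estimate of the reward of a partial sequence.
    total = 0.0
    for i in range(rollout_num):
        seq = list(prefix)
        # Complete the sequence by sampling; each pass differs only
        # because the sampling is random.
        while len(seq) < SEQ_LEN:
            seq.append(toy_policy_sample(seq, rng))
        total += toy_reward(seq)
    return total / rollout_num


rng = random.Random(0)
prefix = [1, 1, 1]  # the first given_num = 3 tokens are kept fixed
estimate = rollout_reward(prefix, rollout_num=16, rng=rng)
print("estimated reward for prefix", prefix, "=", estimate)
```

Averaging over more rollouts reduces the variance of the estimate at the cost of more forward passes.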
