
Question about the weight for correction in the importance sampling #7

Closed
miracle24 opened this issue Dec 9, 2019 · 4 comments

@miracle24

miracle24 commented Dec 9, 2019

Hello. Thanks for your great work, from which I have learned a lot about reinforcement learning. I am confused about the computation of the correction weight in importance sampling.

According to the paper "Top-K Off-Policy Correction for a REINFORCE Recommender System", the correction weight is pi_theta(a_t | s_t) / beta(a_t | s_t), in which a_t, I think, is an action sampled from the behavior policy, i.e. a_t ~ beta(· | s_t). The reward of a sequence is therefore corrected by dividing the likelihood of action a_t given s_t under the updated policy pi_theta by the likelihood of the same action a_t given s_t under beta. Note that the same a_t is fed into both pi_theta and beta. However, in your implementation, I found that the actions are not the same for pi and beta; see the function "pi_beta_sample" here.
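For concreteness, here is a minimal sketch of this reading (illustrative code, not from the repository, assuming categorical pi and beta over the item catalogue):

import torch
from torch.distributions import Categorical

# illustrative only: per-item logits of the updated policy pi and the
# behavior policy beta at the same state s_t (100 items, batch of one)
pi = Categorical(logits=torch.randn(1, 100))
beta = Categorical(logits=torch.randn(1, 100))

# a_t comes from the behavior (logged) policy, and the SAME a_t is
# scored under both pi and beta
a_t = beta.sample()

# correction weight: pi(a_t | s_t) / beta(a_t | s_t)
weight = torch.exp(pi.log_prob(a_t) - beta.log_prob(a_t))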

Am I wrong about this?

Thanks!

@awarebayes
Owner

But I will consider changing it as you said. That will probably be a better solution.

@awarebayes
Owner

awarebayes commented Dec 9, 2019

Alright, I'm about to release a commit for this.

The solution is simple: substitute the historical action with the one generated by the beta network.

  • Why do we need it? Because we need the probabilities.
    P.S. The historical a_t is just a number, a movie_id if you like. For everything else a plain number is not enough; we need at least something like softmax probs or a log_prob. You could translate that historical number into a one-hot vector (see the sketch after this list), but I think beta is less hassle and simply better. Either way, the idea is the same.

  • Can we do that? Since the entire purpose of beta is to emulate the historical action, it can give us the probabilities we need, as well as that historically accurate action you were talking about.
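A rough sketch of that one-hot route (illustrative only, not repository code):

import torch
import torch.nn.functional as F

num_items = 100                       # illustrative catalogue size
historical_action = torch.tensor(7)   # the logged movie_id, just a number

# translate the plain id into a one-hot vector ...
one_hot = F.one_hot(historical_action, num_classes=num_items).float()

# ... so it can be scored against the softmax probs a policy head outputs
pi_probs = torch.softmax(torch.randn(num_items), dim=-1)
log_prob = torch.log((pi_probs * one_hot).sum())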

I have added a parameter called 'action_source', and so far I have come up with:

class DiscreteActor(nn.Module):
    def __init__(self, input_dim, action_dim, hidden_size, init_w=0):
        super(DiscreteActor, self).__init__()
        ....
        # What's action source? See this issue: https://github.com/awarebayes/RecNN/issues/7
        # by default {pi: pi, beta: beta}
        # you can change it to be like {pi: beta, beta: beta} as miracle24 suggested
        self.action_source = {'pi': 'pi', 'beta': 'beta'}

    def pi_beta_sample(self, state, beta, action, **kwargs):
        ....
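        # (steps 1 and 2, elided above, are assumed to build pi_categorical and
        # beta_categorical from the pi forward pass and the given beta probs)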

        # 3. sample the actions
        # See this issue: https://github.com/awarebayes/RecNN/issues/7
        # usually it works like:
        # pi_action = pi_categorical.sample(); beta_action = beta_categorical.sample();
        # but changing the action_source to {pi: beta, beta: beta} can be configured to be:
        # pi_action = beta_categorical.sample(); beta_action = beta_categorical.sample();
        available_actions = {'pi': pi_categorical.sample(), 'beta': beta_categorical.sample()}
        pi_action = available_actions[self.action_source['pi']]
        beta_action = available_actions[self.action_source['beta']]

policy_net = recnn.nn.DiscreteActor(1290, num_items, 2048).to(cuda)
# as miracle24 has suggested https://github.com/awarebayes/RecNN/issues/7
# you can enable this to make it behave more like the paper's authors intended:
# policy_net.action_source = {'pi': 'beta', 'beta': 'beta'}
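As a rough standalone illustration of what that switch changes (names here are assumptions, not repository code):

import torch
from torch.distributions import Categorical

pi_categorical = Categorical(logits=torch.randn(10))
beta_categorical = Categorical(logits=torch.randn(10))

action_source = {'pi': 'beta', 'beta': 'beta'}   # the setting suggested above
available_actions = {'pi': pi_categorical.sample(),
                     'beta': beta_categorical.sample()}
pi_action = available_actions[action_source['pi']]
beta_action = available_actions[action_source['beta']]

# both log-probs now refer to the same (beta-sampled) action, so the
# importance weight pi(a|s) / beta(a|s) matches the paper's correction
weight = torch.exp(pi_categorical.log_prob(pi_action) -
                   beta_categorical.log_prob(beta_action))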

@awarebayes
Owner

I have released a commit for this; you can look at it here.

Write me back to let me know whether I can close the issue.

@miracle24
Author

miracle24 commented Dec 9, 2019

Actually, I do not know which one is correct or better. Maybe I can try both in my task. Thanks a lot.
