Gluon: name issue about example "Actor Critic" #13397

dbsxdbsx · 2018-11-25T06:24:44Z

I am referring to the Gluon example "Actor Critic".

According to the code in actor_critic.py, the return for each state is calculated as:

        # reverse accumulate and normalize rewards
        running_reward = running_reward * 0.99 + t * 0.01
        R = 0
        for i in range(len(rewards)-1, -1, -1):
            R = rewards[i] + args.gamma * R
            rewards[i] = R

This is a Monte Carlo method without bootstrapping, so I think the example should be named REINFORCE with Baseline rather than Actor Critic. As stated in Section 13.5 of the book Reinforcement Learning: An Introduction:

Although the REINFORCE-with-baseline method learns both a policy and a state-value function, we
do not consider it to be an actor–critic method because its state-value function is used only as a
baseline, not as a critic. That is, it is not used for bootstrapping (updating the value estimate for
a state from the estimated values of subsequent states), but only as a baseline for the state whose
estimate is being updated.
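
To make the distinction concrete, here is a minimal sketch contrasting the two targets. It is not taken from actor_critic.py: the function names, the toy `rewards`, and the `values` list (standing in for critic estimates V(s_t)) are all assumptions for illustration. The first function mirrors the reverse-accumulation loop quoted above; the second shows what a one-step bootstrapped target would look like instead.

```python
def monte_carlo_returns(rewards, gamma):
    """Discounted return G_t built from observed rewards only (no bootstrapping)."""
    returns = [0.0] * len(rewards)
    R = 0.0
    for i in range(len(rewards) - 1, -1, -1):
        R = rewards[i] + gamma * R
        returns[i] = R
    return returns


def one_step_td_targets(rewards, values, gamma):
    """Bootstrapped target r_t + gamma * V(s_{t+1}); V is taken as 0 after the last step."""
    targets = []
    for i, r in enumerate(rewards):
        next_value = values[i + 1] if i + 1 < len(values) else 0.0
        targets.append(r + gamma * next_value)
    return targets


# Hypothetical toy episode; values are made-up critic estimates V(s_t).
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5]
gamma = 0.5

mc = monte_carlo_returns(rewards, gamma)
td = one_step_td_targets(rewards, values, gamma)

# REINFORCE with baseline: V(s_t) is only subtracted from the full return.
mc_advantage = [g - v for g, v in zip(mc, values)]
# One-step actor-critic: V(s_{t+1}) appears inside the target (the TD error).
td_advantage = [y - v for y, v in zip(td, values)]

print(mc, td)
```

In the first case the value estimate only shifts the return for the current state; in the second, the estimate for the next state is part of the target itself, which is the bootstrapping the quoted passage refers to.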

I also found that PyTorch's example has the same issue. In any case, it is just a naming problem. If most people think this should also be treated as Actor Critic, then never mind~

vdantu · 2018-11-25T17:36:21Z

@mxnet-label-bot add [Question, Example]
