
# For Policy Network Problem #28

Closed
cumttang opened this issue Apr 6, 2017 · 3 comments

@cumttang commented Apr 6, 2017

Thanks for your code, but I have a question: if the rewards are negative, does the code still work?
If not, how can I fix it, or ensure the loss stays positive?

@IbrahimSobh

Hi,

Rewards can be negative (if they represent a penalty).

For example, in the basic scenario, living_reward = -1, which means the agent receives a negative reward on every step and therefore has to achieve its goal as quickly as possible.
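As a minimal sketch (illustrative, not code from this repo), the discounted returns computed from a per-step living_reward of -1 are simply negative numbers; nothing in the return computation requires positivity:

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Compute discounted returns G_t = sum_k gamma^k * r_{t+k}."""
    returns = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Episode of 5 steps with living_reward = -1 at every step.
print(discount_rewards(np.array([-1.0] * 5)))
# All entries are negative; shorter episodes still yield higher (less
# negative) returns, so the agent is pushed to finish quickly.
```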

@cumttang (author) commented Apr 6, 2017

Thanks a lot.
But I still have one question: what value does the loss function converge to? A negative value, or zero?

@awjuliani (owner) commented

Hi cumttang,

Which RL algorithm are you referring to? In all cases the loss function is designed to support both positive and negative rewards, and having negative rewards should not interfere with training at all, so long as the reward function of the environment itself is sensible.
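To make this concrete, here is a hedged sketch (names are illustrative, not the repo's) of the standard REINFORCE-style policy-gradient loss, -log π(a|s) · advantage. The advantage can carry either sign: a positive advantage increases the chosen action's probability, a negative one decreases it, and the loss value itself can legitimately be negative:

```python
import numpy as np

def pg_loss(logits, action, advantage):
    """REINFORCE loss for one step: -log pi(action) * advantage."""
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return -np.log(probs[action]) * advantage

logits = np.array([0.1, 0.5, -0.2])
print(pg_loss(logits, action=1, advantage=+2.0))  # positive loss term
print(pg_loss(logits, action=1, advantage=-2.0))  # negative loss term; the gradient flips sign
```

This also bears on the convergence question: because the loss is a reward-weighted log-probability rather than a supervised error, its sign and final value depend on the environment's reward scale, so there is no universal target value such as zero.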

One thing to keep in mind, though, is that methods such as DQN and A3C have issues with overly large rewards (either positive or negative). It is recommended that the rewards fed into the network have a magnitude no greater than 1.
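A minimal sketch (an assumption on my part, not this repo's code) of two common ways to keep reward magnitudes at or below 1: hard clipping, as used in the DQN paper, and scaling by a known maximum absolute reward:

```python
import numpy as np

def clip_reward(r):
    """Hard-clip the reward into [-1, 1] (DQN-style)."""
    return float(np.clip(r, -1.0, 1.0))

def scale_reward(r, max_abs_reward):
    """Scale by a known bound; preserves relative reward magnitudes."""
    return r / max_abs_reward

print(clip_reward(-100.0))         # -> -1.0 (magnitude information is lost)
print(scale_reward(-50.0, 100.0))  # -> -0.5 (relative magnitudes preserved)
```

Clipping is simpler but discards magnitude information; scaling preserves it, at the cost of needing to know the environment's reward bound in advance.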
