
# For Policy Network Problem #28

Closed
cumttang opened this issue Apr 6, 2017 · 3 comments

@cumttang commented Apr 6, 2017

Thanks for your code, but I have a question: if the rewards are negative, does the code still work?
If not, how can I fix it, or ensure the loss stays positive?

@IbrahimSobh

Hi,

Rewards can be negative (if they represent a penalty).

For example, in the basic scenario, living_reward = -1, which means the agent receives a negative reward on every step and therefore has to achieve its goal as quickly as possible.
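As a minimal sketch (illustrative, not code from this repo), the discounted returns computed from a per-step living_reward of -1 are simply negative numbers; nothing in the return computation requires positivity:

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Compute discounted returns G_t = sum_k gamma^k * r_{t+k}."""
    returns = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Episode of 5 steps with living_reward = -1 at every step.
print(discount_rewards(np.array([-1.0] * 5)))
# All entries are negative; shorter episodes still yield higher (less
# negative) returns, so the agent is pushed to finish quickly.
```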

@cumttang (author) commented Apr 6, 2017

Thanks a lot.
But I still have one question: what value does the loss function converge to? A negative value, or zero?

@awjuliani (owner) commented

Hi cumttang,

Which RL algorithm are you referring to? In all cases the loss function is designed to support both positive and negative rewards, and having negative rewards should not interfere with training at all, so long as the reward function of the environment itself is sensible.
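To make this concrete, here is a hedged sketch (names are illustrative, not the repo's) of the standard REINFORCE-style policy-gradient loss, -log π(a|s) · advantage. The advantage can carry either sign: a positive advantage increases the chosen action's probability, a negative one decreases it, and the loss value itself can legitimately be negative:

```python
import numpy as np

def pg_loss(logits, action, advantage):
    """REINFORCE loss for one step: -log pi(action) * advantage."""
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return -np.log(probs[action]) * advantage

logits = np.array([0.1, 0.5, -0.2])
print(pg_loss(logits, action=1, advantage=+2.0))  # positive loss term
print(pg_loss(logits, action=1, advantage=-2.0))  # negative loss term; the gradient flips sign
```

This also bears on the convergence question: because the loss is a reward-weighted log-probability rather than a supervised error, its sign and final value depend on the environment's reward scale, so there is no universal target value such as zero.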

One thing to keep in mind, though, is that methods such as DQN and A3C have issues with overly large rewards (either positive or negative). It is recommended that the rewards fed into the network have a magnitude no greater than 1.
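A minimal sketch (an assumption on my part, not this repo's code) of two common ways to keep reward magnitudes at or below 1: hard clipping, as used in the DQN paper, and scaling by a known maximum absolute reward:

```python
import numpy as np

def clip_reward(r):
    """Hard-clip the reward into [-1, 1] (DQN-style)."""
    return float(np.clip(r, -1.0, 1.0))

def scale_reward(r, max_abs_reward):
    """Scale by a known bound; preserves relative reward magnitudes."""
    return r / max_abs_reward

print(clip_reward(-100.0))         # -> -1.0 (magnitude information is lost)
print(scale_reward(-50.0, 100.0))  # -> -0.5 (relative magnitudes preserved)
```

Clipping is simpler but discards magnitude information; scaling preserves it, at the cost of needing to know the environment's reward bound in advance.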
