# For Policy Network Problem #28
**cumttang:** Thanks for your code, but I have a question: if the reward is negative, does the code still work? If not, how can I fix it, or ensure the loss stays positive?

For example, the agent receives a negative reward at each step, which means it has to achieve its goal as soon as possible. Thanks a lot.

**Reply:** Hi cumttang, which RL algorithm are you referring to? In all cases the loss function is designed to support both positive and negative rewards, and negative rewards should not interfere with training at all, as long as the environment's reward function itself is sensible. One thing to keep in mind, though, is that methods such as DQN and A3C have issues with overly large rewards (either positive or negative). It is recommended that rewards fed into the network have a magnitude no greater than 1.
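To illustrate the point in the reply, here is a minimal sketch in plain NumPy (not the repository's actual code; the helper names are hypothetical). A REINFORCE-style policy-gradient loss handles negative rewards naturally, since a negative return simply pushes the chosen action's probability down, and a simple normalize-and-clip step keeps reward magnitudes near 1 as suggested for DQN/A3C:

```python
import numpy as np

def policy_gradient_loss(log_probs, rewards):
    """Mean REINFORCE loss: -E[log pi(a|s) * R].

    Works for positive and negative rewards alike; a negative R
    flips the sign of the gradient (discouraging the action)
    rather than breaking training.
    """
    return -np.mean(log_probs * rewards)

def scale_rewards(rewards):
    """Standardize rewards when possible, then clip to [-1, 1]."""
    rewards = np.asarray(rewards, dtype=np.float64)
    std = rewards.std()
    if std > 0:
        rewards = (rewards - rewards.mean()) / std
    return np.clip(rewards, -1.0, 1.0)

# All-negative rewards, e.g. a per-step penalty that pressures the
# agent to reach its goal quickly.
log_probs = np.log(np.array([0.2, 0.5, 0.9]))
rewards = np.array([-1.0, -1.0, -1.0])
loss = policy_gradient_loss(log_probs, scale_rewards(rewards))
```

Note that the loss itself can come out negative here; that is fine. What matters for learning is the gradient of the loss, not its sign, so there is no need to force the loss to stay positive.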