bugfix: don't sample action distribution twice
There was a nasty bug that caused the controller to sample from the softmax action probability distribution twice. This is problematic because a single sampled action must determine both the log probability of that action and the choice of the next action_classifier. Drawing two samples lets the log probability describe one action while the next action_classifier is chosen from another.
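A minimal sketch of the fixed behavior follows. The controller code itself is not shown in this commit, so this assumes a controller that emits one logit per action and a hypothetical `classifiers` list that maps each action index to the next action_classifier. The key point is that exactly one sample is drawn and that same index is reused for both the log probability and the next-step choice.

```python
import math
import random


def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def controller_step(logits, classifiers, rng):
    """Sample ONE action and reuse it everywhere it is needed.

    `classifiers` is a hypothetical list mapping an action index to the
    next action_classifier; it stands in for whatever the real controller uses.
    """
    probs = softmax(logits)
    # Draw a single sample from the action distribution.
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # The log probability refers to the SAME action that was sampled...
    log_prob = math.log(probs[action])
    # ...and so does the next action_classifier. Do not draw a second
    # sample here; that was the bug.
    next_classifier = classifiers[action]
    return action, log_prob, next_classifier


rng = random.Random(0)
action, log_prob, next_classifier = controller_step(
    [1.0, 2.0, 0.5], ["clf_a", "clf_b", "clf_c"], rng
)
```

Because `log_prob` and `next_classifier` are both derived from the one `action` value, a policy-gradient update on `log_prob` always credits the action that actually determined the next step.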