bugfix: don't sample action distribution twice
there was a nasty bug that caused the controller to sample
from the softmax action probability distribution twice.
this is problematic because the action choice determines
both the log probability of the action and the choice of
the next action_classifier, so the two draws could disagree.
cosmicBboy committed Aug 9, 2018
1 parent 1208c78 commit 59ec74e
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion deep_cash/deep_cash/cash_controller.py
@@ -143,6 +143,8 @@ def decode(
         Where the actions are a sequence of algorithm components and
         hyperparameter settings.
+        TODO: add unit tests for this method and related methods.
         """
         input_tensor, hidden = init_input_tensor, init_hidden
         actions = []
@@ -173,7 +175,7 @@ def select_action(self, action_probs, action_index):
         action_classifier = self.action_classifiers[action_index]
         action_dist = Categorical(action_probs)
         choice_index = action_dist.sample()
-        _choice_index = int(action_dist.sample().data)
+        _choice_index = int(choice_index.data)
         return {
             "action_type": action_classifier["action_type"],
             "action_name": action_classifier["name"],
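
To make the effect of the change easier to see, here is a minimal sketch of the failure mode. It is not the code from cash_controller.py: `TinyCategorical` is a hypothetical standard-library stand-in for `torch.distributions.Categorical`, and `select_action_buggy`, `select_action_fixed`, and `count_mismatches` are illustrative names only. The sketch assumes the log probability is taken from the first draw while the returned choice comes from the second, which is one way the double sampling could go wrong.

```python
import math
import random


class TinyCategorical:
    """Hypothetical stdlib stand-in for torch.distributions.Categorical."""

    def __init__(self, probs, rng):
        self.probs = list(probs)
        self.rng = rng

    def sample(self):
        # Draw one index according to the given probabilities.
        indices = range(len(self.probs))
        return self.rng.choices(indices, weights=self.probs, k=1)[0]

    def log_prob(self, index):
        return math.log(self.probs[index])


def select_action_buggy(dist):
    # Two independent draws: the log-prob describes the first sample,
    # but the choice that is returned comes from the second.
    choice_index = dist.sample()
    _choice_index = int(dist.sample())
    return {"choice_index": _choice_index,
            "log_prob": dist.log_prob(choice_index)}


def select_action_fixed(dist):
    # One draw, reused for both the log-prob and the returned choice.
    choice_index = dist.sample()
    _choice_index = int(choice_index)
    return {"choice_index": _choice_index,
            "log_prob": dist.log_prob(choice_index)}


def count_mismatches(select_fn, probs, trials=1000, seed=0):
    """Count trials where log_prob does not belong to the returned choice."""
    dist = TinyCategorical(probs, random.Random(seed))
    mismatches = 0
    for _ in range(trials):
        out = select_fn(dist)
        expected = math.log(probs[out["choice_index"]])
        if not math.isclose(out["log_prob"], expected):
            mismatches += 1
    return mismatches
```

With distinct probabilities such as `[0.5, 0.3, 0.2]`, `count_mismatches(select_action_fixed, ...)` returns 0 by construction, while the buggy variant reports a mismatch whenever its two draws differ. In a policy-gradient setting, a mismatch of this kind would mean the gradient credits an action that was not the one actually taken.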
