bugfix: don't sample action distribution twice

there was a nasty bug that caused the controller to sample from the softmax action probability distribution twice. this is problematic because the sampled choice determines both the log probability of the action and the choice of the next action_classifier, so two independent samples can disagree with each other.
cosmicBboy · Aug 9, 2018 · 59ec74e
1 parent 1208c78
Showing 1 changed file with 3 additions and 1 deletion.
diff --git a/deep_cash/deep_cash/cash_controller.py b/deep_cash/deep_cash/cash_controller.py
@@ -143,6 +143,8 @@ def decode(
 
         Where the actions are a sequence of algorithm components and
         hyperparameter settings.
+
+        TODO: add unit tests for this method and related methods.
         """
         input_tensor, hidden = init_input_tensor, init_hidden
         actions = []
@@ -173,7 +175,7 @@ def select_action(self, action_probs, action_index):
         action_classifier = self.action_classifiers[action_index]
         action_dist = Categorical(action_probs)
         choice_index = action_dist.sample()
-        _choice_index = int(action_dist.sample().data)
+        _choice_index = int(choice_index.data)
         return {
             "action_type": action_classifier["action_type"],
             "action_name": action_classifier["name"],
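
A minimal sketch of why the double sample matters, not the repository's actual code. It swaps `torch.distributions.Categorical` for the standard library's `random.choices` so it runs without PyTorch, and it assumes the log probability is computed from the first sample while the returned index comes from the second (the diff does not show where the log probability is computed). The names `select_action_buggy`, `select_action_fixed`, and `mismatch_rate` are hypothetical.

```python
import math
import random


def select_action_buggy(action_probs, rng):
    """Draw twice: the log-prob and the returned index can refer to different actions."""
    indices = list(range(len(action_probs)))
    choice_index = rng.choices(indices, weights=action_probs)[0]
    # Second, independent draw (mirrors the removed line in the diff).
    _choice_index = rng.choices(indices, weights=action_probs)[0]
    return {
        "log_prob": math.log(action_probs[choice_index]),
        "choice_index": _choice_index,
    }


def select_action_fixed(action_probs, rng):
    """Draw once and reuse the same index for both outputs."""
    indices = list(range(len(action_probs)))
    choice_index = rng.choices(indices, weights=action_probs)[0]
    return {
        "log_prob": math.log(action_probs[choice_index]),
        "choice_index": choice_index,
    }


def mismatch_rate(select_fn, action_probs, n_trials=10_000, seed=0):
    """Fraction of trials where log_prob does not belong to the returned index."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(n_trials):
        out = select_fn(action_probs, rng)
        expected = math.log(action_probs[out["choice_index"]])
        if not math.isclose(out["log_prob"], expected):
            mismatches += 1
    return mismatches / n_trials


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]  # distinct values so a mismatch is detectable
    print("buggy:", mismatch_rate(select_action_buggy, probs))
    print("fixed:", mismatch_rate(select_action_fixed, probs))
```

With distinct probabilities, two independent draws disagree with probability 1 - sum(p_i^2), which is 0.62 for the values above, so the buggy version should report a mismatch rate near that figure while the fixed version reports zero. In a policy-gradient setting, a mismatch means the gradient credits an action that was never actually taken.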