Description
Hello all, and thank you for this beautiful library.
I have found a serious bug that prevents any RL algorithm from working. The bug is in select_softmax, used e.g. in breakout_stdp.py.
During the first iteration (time=0), pipeline.spike_record[output] has a shape of [100, 4], the spikes are tensor([0., 0., 0., 0.]), so the softmax is tensor([0.25, 0.25, 0.25, 0.25]) and all is good.
However, at every other timestep pipeline.spike_record[output] has a shape of [100, 1, 4] and the spikes are something like tensor([[17., 17., 17., 17.]]), which means that torch.softmax(spikes, dim=0) computes the softmax over the wrong dimension. The result is always probabilities = tensor([[1., 1., 1., 1.]]), so the agent always performs a random action.
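For reference, here is a minimal sketch of the kind of fix I have in mind (the name select_softmax_fixed and its standalone signature are just for illustration, not the library's actual API): sum the spike record over time, flatten it to a 1-D vector, and only then take the softmax over the action dimension.

```python
import torch

def select_softmax_fixed(spike_record: torch.Tensor) -> int:
    # spike_record is [time, n_actions] at t=0 but [time, 1, n_actions] afterwards,
    # as described above. Summing over time and flattening guarantees a 1-D
    # vector of length n_actions, so the softmax is always taken over actions.
    spikes = spike_record.sum(dim=0).flatten()        # -> [n_actions]
    probabilities = torch.softmax(spikes, dim=0)      # proper distribution over actions
    return torch.multinomial(probabilities, num_samples=1).item()

# Reproducing the shapes from the report:
rec = torch.full((100, 1, 4), 0.17)                   # shape [100, 1, 4] after t=0
spikes = rec.sum(dim=0)                               # shape [1, 4], ~[[17., 17., 17., 17.]]
print(torch.softmax(spikes, dim=0))                   # tensor([[1., 1., 1., 1.]]) -- the bug
print(torch.softmax(spikes.flatten(), dim=0))         # tensor([0.25, 0.25, 0.25, 0.25]) -- fixed
```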
I am happy to fix this. In the meantime, I think it is worth sharing, as some people might get confused.
This issue does not affect other examples such as eth_mnist.py, which is why that network can still learn in the supervised setting.