
A (serious) bug preventing RL algorithms from working #577

@ValerioB88

Description

Hello all, and thank you for this beautiful library.

I have found a serious bug that prevents any RL algorithm from working. The bug is in select_softmax, used in e.g. breakout_stdp.py.
During the first iteration (time=0), pipeline.spike_record[output] has a shape of [100, 4], spikes is tensor([0., 0., 0., 0.]), so the softmax is tensor([0.25, 0.25, 0.25, 0.25]) and all is good.

However, at any other timestep pipeline.spike_record[output] has a shape of [100, 1, 4], so spikes is something like tensor([[17., 17., 17., 17.]]), which means that torch.softmax(spikes, dim=0) computes the softmax over the wrong dimension. The result is always probabilities = tensor([[1., 1., 1., 1.]]), meaning that the agent will always perform a random action.
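
A minimal standalone sketch of the issue (the tensors here are made-up stand-ins for pipeline.spike_record[output], and the squeeze-based repair at the end is just one possible fix, not necessarily the change that should land in select_softmax):

```python
import torch

# Hypothetical stand-ins for pipeline.spike_record[output].
record_t0 = torch.zeros(100, 4)                        # first iteration: shape [100, 4]
record_t1 = torch.randint(0, 2, (100, 1, 4)).float()   # later timesteps: shape [100, 1, 4]

# First iteration: summing over time gives shape [4]; softmax over dim=0 is
# taken across the four actions, as intended -> tensor([0.25, 0.25, 0.25, 0.25]).
spikes = torch.sum(record_t0, dim=0)
print(torch.softmax(spikes, dim=0))

# Later timesteps: summing over time leaves a singleton dimension, shape [1, 4].
# softmax over dim=0 (a dimension of size 1) returns all ones, whatever the spikes.
spikes = torch.sum(record_t1, dim=0)
print(torch.softmax(spikes, dim=0))        # tensor([[1., 1., 1., 1.]])

# One possible fix (an assumption, not the actual patch): squeeze the singleton
# dimension, or take the softmax over the last (action) dimension instead.
probabilities = torch.softmax(spikes.squeeze(), dim=-1)
action = torch.multinomial(probabilities, num_samples=1).item()
print(probabilities, action)
```
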

I am happy to fix this. In the meantime, I think it's worth sharing, as some people might get confused.
This issue does not affect other examples such as eth_mnist.py, which is why that network can still learn in a supervised setting.
