
Implementation of MSTDP and MSTDPET #217

Closed
Huizerd opened this issue Mar 22, 2019 · 13 comments

@Huizerd
Collaborator

Huizerd commented Mar 22, 2019

I'm currently working on reward-modulated STDP, and I noticed that there are some discrepancies between the Florian 2007 paper and your MSTDP(ET) implementation. This was already mentioned in #140 and fixed (it seems) in #141; however, these changes were reverted in #165. Was this intentional? I was planning to fix everything with some PRs, but if there are reasons why you wouldn't want the original paper implementation in BindsNET and thus reverted the fixes, I would like to know 😄

@djsaunde
Collaborator

Yeah, we haven't really been able to get MSTDP or MSTDPET to work on anything, and we aren't actively working on it. If you can fix it and demonstrate that it works on some task, that would be a big win for anyone who wants to use BindsNET with reward-modulated learning rules.

@djsaunde added the bug label Mar 22, 2019
@Huizerd
Collaborator Author

Huizerd commented Mar 26, 2019

OK, so far I've managed to replicate Figure 1 from Florian 2007 (see https://gist.github.com/Huizerd/3dc3bfa79cfb721aea491b94dc04efa7). So the code for the learning rules seems to be right, at least. Now I'm replicating one of his experiments; however, he uses connections between two layers of neurons where half of the weights are negatively bounded (inhibitory) and half are positively bounded (excitatory). Is there a way I could implement this with BindsNET?

@Huizerd
Collaborator Author

Huizerd commented Mar 26, 2019

I implemented experiment 4.3 from the paper by R. Florian; see the code here: https://gist.github.com/Huizerd/9c794260e629613b66750043d583a1a2 and the required modifications to BindsNET here: master...Huizerd:mstdp. It resulted in the following learning curves:

[Figure: reward curves from the replicated experiment]

These show patterns similar to those in the paper:

[Figure: corresponding learning curves from Florian 2007]

The code is still very rough, as I first wanted to hear your opinions. If all is well, I will make it all nice and compatible and do a PR! Note that I also made some changes to the LIF neuron to make it more similar to the one used in the paper.

@djsaunde
Collaborator

Really impressive work!

So the code for the learning rules seems to be right, at least. Now I'm replicating one of his experiments; however, he uses connections between two layers of neurons where half of the weights are negatively bounded (inhibitory) and half are positively bounded (excitatory). Is there a way I could implement this with BindsNET?

You can create two connections, one negatively bounded and the other positively bounded. You could also implement this feature within BindsNET: for each weight, include a wmin and wmax (instead of having a single, global wmin, wmax for the whole connection).
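For the two-connection route, something like the following should work (untested sketch; exact Connection arguments such as wmin, wmax, update_rule, and nu may differ between BindsNET versions):

import torch

from bindsnet.network import Network
from bindsnet.network.nodes import Input, LIFNodes
from bindsnet.network.topology import Connection
from bindsnet.learning import MSTDP

network = Network(dt=1.0)

# Split the presynaptic population into two layers so each half can get its
# own weight bounds; BindsNET keys connections by (source, target) layer names.
exc = Input(n=30)   # excitatory half of the input population
inh = Input(n=30)   # inhibitory half
out = LIFNodes(n=1)

network.add_layer(exc, name='X_exc')
network.add_layer(inh, name='X_inh')
network.add_layer(out, name='Y')

# Positively bounded (excitatory) weights in [0, 1].
exc_conn = Connection(
    source=exc, target=out, w=torch.rand(exc.n, out.n),
    wmin=0.0, wmax=1.0, update_rule=MSTDP, nu=1e-3,
)

# Negatively bounded (inhibitory) weights in [-1, 0].
inh_conn = Connection(
    source=inh, target=out, w=-torch.rand(inh.n, out.n),
    wmin=-1.0, wmax=0.0, update_rule=MSTDP, nu=1e-3,
)

network.add_connection(exc_conn, source='X_exc', target='Y')
network.add_connection(inh_conn, source='X_inh', target='Y')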

If all is well, I will make it all nice and compatible and do a PR! Note that I also made some changes to the LIF neuron to make it more similar to the one used in the paper.

Yes, I think the correct MSTDP and MSTDPET algorithms would be an awesome addition to the library! Maybe create a new Nodes object for your modified LIFNodes, but we can address that during the review of the PR.

Do you have plans to write a paper on this? Seems like there's a lot of interesting applications of this method. I'm open to collaboration 😃

@dee0512
Collaborator

dee0512 commented Mar 26, 2019

Hi everyone,

Take a look at my branch devdhar-master.

https://github.com/Hananel-Hazan/bindsnet/blob/devdhar-master/examples/XOR/mstdp.py

I have made similar changes to the MSTDP method, though not to the MSTDPET method. The wmin and wmax feature has been implemented in that branch.

5e990f7
@djsaunde Would you like me to create a PR for that?

@djsaunde
Collaborator

Since @Huizerd has fixes for both methods, I think a PR from his branch would be preferable. You could make a PR against his (once it's made) if you want to propose changes to his implementation.

@dee0512
Collaborator

dee0512 commented Mar 27, 2019

I meant the wmin/wmax update, since it is not done by @Huizerd and it would make it easier for him to implement his experiments. In terms of implementing the experiments in BindsNET, there is also the question of handling the reward for each step. The reward can be different for each timestep. How are we planning to implement that? The fix I have is that I pass the desired output to the network.run function, and the reward is calculated from the output during each step.

@djsaunde
Collaborator

I meant the wmin/wmax update, since it is not done by @Huizerd and it would make it easier for him to implement his experiments.

I'm not sure what you mean by this; can you explain or link me to the implementation?

In terms of implementing the experiments in BindsNET, there is also the question of handling the reward for each step. The reward can be different for each timestep. How are we planning to implement that?

We could pass in a reference to a reward function. In the constructor of a reward-modulated learning rule, we could pass in a reward function that takes (state, action) pairs and maps them to a scalar reward. Is this similar to your approach? Could you link me to your code that implements this fix?
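Roughly what I have in mind (purely illustrative; none of these class or argument names exist in BindsNET yet):

from typing import Callable

class RewardModulatedRule:
    # Hypothetical rule that holds a reference to a user-supplied reward
    # function mapping (state, action) pairs to a scalar reward.
    def __init__(self, reward_fn: Callable, nu: float = 1e-3):
        self.reward_fn = reward_fn
        self.nu = nu

    def update(self, eligibility, state, action):
        # MSTDP-style update: scale the eligibility trace by the task reward.
        reward = self.reward_fn(state, action)
        return self.nu * reward * eligibility

def xor_reward(state, action):
    # Example task-specific reward: +1 if the action matches the target
    # stored in the (user-defined) state, -1 otherwise.
    return 1.0 if action == state['target'] else -1.0

rule = RewardModulatedRule(reward_fn=xor_reward)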

@dee0512
Collaborator

dee0512 commented Mar 27, 2019

In this commit: 5e990f7

I have implemented what you said:

You can create two connections, one negatively bounded and the other positively bounded. You could also implement this feature within BindsNET: for each weight, include a wmin and wmax (instead of having a single, global wmin, wmax for the whole connection).

I have also modified network.run to accept the desired output and calculate the reward based on it. Basically, in the Florian experiments the reward depends on whether the output neuron spikes or stays silent at each step, so you can only know the reward while simulating.
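Schematically (the real code is in the commit above), the reward for a timestep only becomes available once that step's output spike is known:

def step_reward(output_spiked: bool, desired_spike: bool) -> float:
    # Hypothetical per-timestep reward: positive when the output neuron's
    # behaviour matches the desired output for that step, negative otherwise.
    return 1.0 if output_spiked == desired_spike else -1.0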

@djsaunde
Collaborator

@dee0512 I made some comments on your commit, to help me understand some things.

@Huizerd
Collaborator Author

Huizerd commented Mar 27, 2019

We could pass in a reference to a reward function. In the constructor of a reward-modulated learning rule, we could pass in a reward function that takes (state, action) pairs and maps them to a scalar reward. Is this similar to your approach? Could you link me to your code that implements this fix?

I think something like this is preferred; however, a reward function that accepts (state, action) might not work well in all cases. The Gym environments, for example, have this reward function built in and only give back a reward (so we should at least leave that as an option). In the case where someone creates a custom environment (without the Gym framework), they would need an action selection function (spikes of a certain layer -> action) and a reward function (state + action -> reward), so it might work for that case.
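For a custom (non-Gym) environment, that pair of functions might look like this (illustrative only, not existing BindsNET API):

import torch

def select_action(output_spikes: torch.Tensor) -> int:
    # Map the output layer's spikes over the last simulation window
    # (time x neurons) to a discrete action, e.g. the neuron that spiked most.
    return int(output_spikes.sum(dim=0).argmax())

def compute_reward(state, action: int) -> float:
    # Map the (user-defined) environment state and chosen action to a scalar
    # reward; here, +1 if the action equals the target, -1 otherwise.
    return 1.0 if action == state['target'] else -1.0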

The approach by @dee0512 modifies network.run in a non-universal way: it works only for the Florian experiments; see https://github.com/Hananel-Hazan/bindsnet/blob/8c62e027f741f8acdfc51f431d351b9abf67b022/bindsnet/network/__init__.py#L300

I calculated the reward inside the loop where network.run is called each time (see https://gist.github.com/Huizerd/9c794260e629613b66750043d583a1a2#file-rl_mstdp_florian2007_exp4-3-py-L177), but I think @djsaunde's way (or something analogous to it) is the cleanest solution.

@djsaunde
Collaborator

djsaunde commented Mar 27, 2019

Also, if we can't come up with a universal solution (so to speak), you could simply subclass Network and redefine functions as needed. This could be done on a per-experiment basis, e.g.:

class RewardNetwork(bindsnet.network.Network):
    def __init__(self, ...):
        super().__init__()
        # custom constructor logic...

    def reward_fn(self, ...):
        # task-specific reward function

This is essentially what is being done in @dee0512's approach, except in the base Network class.

@Hananel-Hazan
Collaborator

I don't see a problem with having two run methods. It should simplify use: one for situations where the reward is already known (or for simple inference), and another for RL, where the reward is calculated based on the activity and the desired output.
