
Implementation of MSTDP and MSTDPET #217

Closed
Huizerd opened this issue Mar 22, 2019 · 13 comments

@Huizerd
Collaborator

Huizerd commented Mar 22, 2019

I'm currently working on reward-modulated STDP, and I noticed that there are some discrepancies between the Florian 2007 paper and your MSTDP(ET) implementation. This was already mentioned in #140 and fixed (it seems) in #141; however, these changes were reverted in #165. Was this intentional? I was planning to fix everything with some PRs, but if there are reasons why you wouldn't want the original paper implementation in BindsNET and thus reverted the fixes, I would like to know 😄

@djsaunde
Collaborator

Yeah, we haven't really been able to get MSTDP or MSTDPET to work on anything, and we aren't actively working on it. If you can fix it and demonstrate that it works on some task, that would be a big win for anyone who wants to use BindsNET with reward-modulated learning rules.

@djsaunde added the bug label Mar 22, 2019
@Huizerd
Collaborator Author

Huizerd commented Mar 26, 2019

OK, so far I've managed to replicate Figure 1 from Florian 2007 (see https://gist.github.com/Huizerd/3dc3bfa79cfb721aea491b94dc04efa7). So the code for the learning rules seems to be right, at least. Now I'm replicating one of his experiments; however, he uses connections between two layers of neurons where half of the weights are negatively bounded (inhibitory) and half are positively bounded (excitatory). Is there a way I could implement this with BindsNET?

@Huizerd
Collaborator Author

Huizerd commented Mar 26, 2019

I implemented experiment 4.3 from the paper by R. Florian; see the code here: https://gist.github.com/Huizerd/9c794260e629613b66750043d583a1a2 and the required modifications to BindsNET here: master...Huizerd:mstdp. It resulted in the following learning curves:

[Figure: reward curves from the replicated experiment]

These show patterns similar to those in the paper:

[Figure: corresponding learning curves from Florian 2007]

The code is still very rough, as I first wanted to hear your opinions. If all is well, I will make it all nice and compatible and do a PR! Note that I also made some changes to the LIF neuron to make it more similar to the one used in the paper.

@djsaunde
Collaborator

Really impressive work!

So the code for the learning rules seems to be right, at least. Now I'm replicating one of his experiments; however, he uses connections between two layers of neurons where half of the weights are negatively bounded (inhibitory) and half are positively bounded (excitatory). Is there a way I could implement this with BindsNET?

You can create two connections, one negatively bounded and the other positively bounded. You could also implement this feature within BindsNET: for each weight, include a wmin and wmax (instead of having a single, global wmin, wmax for the whole connection).
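For the two-connection route, something like the following should work (untested sketch; exact Connection arguments such as wmin, wmax, update_rule, and nu may differ between BindsNET versions):

import torch

from bindsnet.network import Network
from bindsnet.network.nodes import Input, LIFNodes
from bindsnet.network.topology import Connection
from bindsnet.learning import MSTDP

network = Network(dt=1.0)

# Split the presynaptic population into two layers so each half can get its
# own weight bounds; BindsNET keys connections by (source, target) layer names.
exc = Input(n=30)   # excitatory half of the input population
inh = Input(n=30)   # inhibitory half
out = LIFNodes(n=1)

network.add_layer(exc, name='X_exc')
network.add_layer(inh, name='X_inh')
network.add_layer(out, name='Y')

# Positively bounded (excitatory) weights in [0, 1].
exc_conn = Connection(
    source=exc, target=out, w=torch.rand(exc.n, out.n),
    wmin=0.0, wmax=1.0, update_rule=MSTDP, nu=1e-3,
)

# Negatively bounded (inhibitory) weights in [-1, 0].
inh_conn = Connection(
    source=inh, target=out, w=-torch.rand(inh.n, out.n),
    wmin=-1.0, wmax=0.0, update_rule=MSTDP, nu=1e-3,
)

network.add_connection(exc_conn, source='X_exc', target='Y')
network.add_connection(inh_conn, source='X_inh', target='Y')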

If all is well, I will make it all nice and compatible and do a PR! Note that I also made some changes to the LIF neuron to make it more similar to the one used in the paper.

Yes, I think the correct MSTDP and MSTDPET algorithms would be an awesome addition to the library! Maybe create a new Nodes object for your modified LIFNodes, but we can address that during the review of the PR.

Do you have plans to write a paper on this? Seems like there's a lot of interesting applications of this method. I'm open to collaboration 😃

@dee0512
Collaborator

dee0512 commented Mar 26, 2019

Hi everyone,

Take a look at my branch devdhar-master.

https://github.com/Hananel-Hazan/bindsnet/blob/devdhar-master/examples/XOR/mstdp.py

I have made similar changes to the MSTDP method, though not to the MSTDPET method. The wmin and wmax feature has been implemented in that branch.

5e990f7
@djsaunde Would you like me to create a PR for that?

@djsaunde
Collaborator

Since @Huizerd has fixes for both methods, I think a PR from his branch would be preferable. You could make a PR against his (once it's made) if you want to propose changes to his implementation.

@dee0512
Collaborator

dee0512 commented Mar 27, 2019

I meant the wmin/wmax update, since it is not done by @Huizerd and it would make it easier for him to implement his experiments. In terms of implementing the experiments in BindsNET, there is also the question of handling the reward for each step. The reward can be different for each timestep. How are we planning to implement that? The fix I have is that I pass the desired output to the network.run function, and the reward is calculated from the output during each step.

@djsaunde
Collaborator

I meant the wmin/wmax update, since it is not done by @Huizerd and it would make it easier for him to implement his experiments.

I'm not sure what you mean by this; can you explain or link me to the implementation?

In terms of implementing the experiments in BindsNET, there is also the question of handling the reward for each step. The reward can be different for each timestep. How are we planning to implement that?

We could pass in a reference to a reward function. In the constructor of a reward-modulated learning rule, we could pass in a reward function that takes (state, action) pairs and maps them to a scalar reward. Is this similar to your approach? Could you link me to your code that implements this fix?
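Roughly what I have in mind (purely illustrative; none of these class or argument names exist in BindsNET yet):

from typing import Callable

class RewardModulatedRule:
    # Hypothetical rule that holds a reference to a user-supplied reward
    # function mapping (state, action) pairs to a scalar reward.
    def __init__(self, reward_fn: Callable, nu: float = 1e-3):
        self.reward_fn = reward_fn
        self.nu = nu

    def update(self, eligibility, state, action):
        # MSTDP-style update: scale the eligibility trace by the task reward.
        reward = self.reward_fn(state, action)
        return self.nu * reward * eligibility

def xor_reward(state, action):
    # Example task-specific reward: +1 if the action matches the target
    # stored in the (user-defined) state, -1 otherwise.
    return 1.0 if action == state['target'] else -1.0

rule = RewardModulatedRule(reward_fn=xor_reward)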

@dee0512
Collaborator

dee0512 commented Mar 27, 2019

In this commit: 5e990f7

I have implemented what you said:

You can create two connections, one negatively bounded and the other positively bounded. You could also implement this feature within BindsNET: for each weight, include a wmin and wmax (instead of having a single, global wmin, wmax for the whole connection).

I have also modified network.run to accept the desired output and calculate the reward based on it. Basically, in the Florian experiments the reward depends on whether the output neuron spikes or stays silent at each step, so you can only know the reward while simulating.
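Schematically (the real code is in the commit above), the reward for a timestep only becomes available once that step's output spike is known:

def step_reward(output_spiked: bool, desired_spike: bool) -> float:
    # Hypothetical per-timestep reward: positive when the output neuron's
    # behaviour matches the desired output for that step, negative otherwise.
    return 1.0 if output_spiked == desired_spike else -1.0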

@djsaunde
Collaborator

@dee0512 I made some comments on your commit, to help me understand some things.

@Huizerd
Collaborator Author

Huizerd commented Mar 27, 2019

We could pass in a reference to a reward function. In the constructor of a reward-modulated learning rule, we could pass in a reward function that takes (state, action) pairs and maps them to a scalar reward. Is this similar to your approach? Could you link me to your code that implements this fix?

I think something like this is preferred; however, a reward function that accepts (state, action) might not work well in all cases. The Gym environments, for example, have this reward function built in and only give back a reward (so we should at least leave that as an option). In the case where someone creates a custom environment (without the Gym framework), they would need an action selection function (spikes of a certain layer -> action) and a reward function (state + action -> reward), so it might work for that case.
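For a custom (non-Gym) environment, that pair of functions might look like this (illustrative only, not existing BindsNET API):

import torch

def select_action(output_spikes: torch.Tensor) -> int:
    # Map the output layer's spikes over the last simulation window
    # (time x neurons) to a discrete action, e.g. the neuron that spiked most.
    return int(output_spikes.sum(dim=0).argmax())

def compute_reward(state, action: int) -> float:
    # Map the (user-defined) environment state and chosen action to a scalar
    # reward; here, +1 if the action equals the target, -1 otherwise.
    return 1.0 if action == state['target'] else -1.0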

The approach by @dee0512 modifies network.run in a non-universal way: it works only for the Florian experiments; see https://github.com/Hananel-Hazan/bindsnet/blob/8c62e027f741f8acdfc51f431d351b9abf67b022/bindsnet/network/__init__.py#L300

I calculated the reward inside the loop where network.run is called each time (see https://gist.github.com/Huizerd/9c794260e629613b66750043d583a1a2#file-rl_mstdp_florian2007_exp4-3-py-L177), but I think @djsaunde's way (or something analogous to it) is the cleanest solution.

@djsaunde
Collaborator

djsaunde commented Mar 27, 2019

Also, if we can't come up with a universal solution (so to speak), you could simply subclass Network and redefine functions as needed. This could be done on a per-experiment basis, e.g.:

class RewardNetwork(bindsnet.network.Network):
    def __init__(self, ...):
        super().__init__()
        # custom constructor logic...

    def reward_fn(self, ...):
        # task-specific reward function

This is essentially what is being done in @dee0512's approach, except in the base Network class.

@Hananel-Hazan
Collaborator

I don't see a problem with having two run methods. It should simplify use: one for situations where the reward is already known (or for simple inference), and another for RL, where the reward is calculated based on the activity and the desired output.
