Review Request: Vitay #19

Closed
wants to merge 16 commits into
from

5 participants
@vitay

vitay commented Jun 30, 2016

Dear @ReScience/editors,

I request a review for the reproduction of the following paper:

  • Laje, R. and Buonomano, D.V. (2013). Robust timing and motor patterns by taming chaos in recurrent neural networks. Nat Neurosci. 16(7) pp 925-33. doi:10.1038/nn.3405

I believe the original results have been faithfully reproduced as explained in the accompanying article.

The repository lives at https://github.com/vitay/ReScience-submission/tree/vitay

Best regards,
Julien Vitay


EDITOR

  • Editor acknowledgment (@rougier 01 July 2016)
  • Reviewer 1 (@neuronalX 01 July 2016)
  • Reviewer 2 (@piero-le-fou 11 July 2016)
  • Review 1 decision [accept 07 October 2016]
  • Review 2 decision [accept 29 August 2016]
  • Editor decision [accept 07 October 2016]

@vitay vitay changed the title from Vitay to Review Request: Vitay Jun 30, 2016

@rougier rougier assigned rougier and unassigned otizonaizit Jul 1, 2016

@rougier

Member

rougier commented Jul 1, 2016

EDITOR

Thanks @vitay for the submission. I will soon assign two reviewers.

@rougier

Member

rougier commented Jul 1, 2016

EDITOR

@neuronalX Could you review this submission?

@neuronalX

neuronalX commented Jul 1, 2016

REVIEWER 1

Yes I can!

@rougier

Member

rougier commented Jul 1, 2016

EDITOR

Thanks @neuronalX, I will assign a second reviewer and the review will start.

@rougier

Member

rougier commented Jul 8, 2016

EDITOR

@vitay Sorry for the delay, but because of the season (conferences and/or vacations), many reviewers are not available. I am still searching and will keep you informed here.

@rougier

Member

rougier commented Jul 11, 2016

EDITOR

@vitay The second reviewer will be @piero-le-fou.

@neuronalX @piero-le-fou Thanks for accepting; you can start the review now and post your comments in this thread.

@rougier

Member

rougier commented Jul 24, 2016

EDITOR

@neuronalX @piero-le-fou Any update?

@p-enel

p-enel commented Jul 25, 2016

REVIEWER 2

So, I have been a bit unlucky with the simulations, though one failure was my own fault: I did not read the Readme file carefully. Nonetheless, it might be helpful for others: check that you have the latest versions of the required Python packages (see the Readme). In my case, the code executed properly until the call to "linspace" with a dtype argument, at the end of the Fig2.py file.

Other issue: the end of the Fig3.py script has a missing import and a missing variable conversion.

  • The variable pearsons is still a list at line 110, therefore the "**2" returns an error
  • pyplot has not been imported in the file, so calling plt.errorbar() and the subsequent functions won't work

Other than that, all the code runs perfectly!
I'll come back to you with an update on the results.
Cheers,
Pierre Enel
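The pearsons issue above amounts to applying "**2" to a Python list; converting to a NumPy array first fixes it. A minimal sketch (the values are hypothetical, not the actual Fig3.py data):

```python
import numpy as np

# Hypothetical correlation coefficients; Fig3.py builds them in a list.
pearsons = [0.91, 0.87, 0.95]

# A list does not support "**2"; convert to an array before squaring.
r_squared = np.array(pearsons) ** 2
```
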

@vitay

vitay commented Jul 25, 2016

AUTHOR

Thanks for the report. I uploaded a fix for Fig3.py. The script took so long (3 days) that I only ran the simulation once without the plot, and never checked that it worked standalone...

Also good to know that np.linspace(..., dtype=) is new; perhaps I should avoid using it for broader compatibility.

@rougier

Member

rougier commented Aug 10, 2016

EDITOR

@neuronalX Any update on your side?

@neuronalX

neuronalX commented Aug 11, 2016

Dear Julien Vitay,
The paper you propose is very interesting and self-sufficient (understandable without reading the reproduced paper), and the code is well commented and organised.

Because the code is well written, I was able to go through it and understand most of it. There are a few things I was not able to understand, so I have some small remarks on the code. I tried to organise them by file.

(I will give comments on the paper later.)

RecurrentNetwork.py

Do the recordings need to be lists to which you append the 2950 arrays/scalars (for Fig. 1) one at a time?

    record_r = []
    record_z = []
        record_r.append(self.r)
        record_z.append(self.z)

Or can you use a numpy array instead? (may be more efficient for long simulations)

    record_r = np.zeros((duration, self.N, 1))
    record_z = np.zeros((duration, self.No, 1))
        record_r[t] = self.r
        record_z[t] = self.z
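A quick sketch (sizes taken from the Fig. 1 simulation; the recorded vector is a stand-in) confirming that the two recording schemes store the same values, so switching to preallocation is safe:

```python
import numpy as np

N, duration = 800, 2950          # network size and number of steps, as in Fig. 1
r = np.ones((N, 1))              # stand-in for self.r at each step

# Scheme 1: append to a list, convert at the end
record_list = []
for t in range(duration):
    record_list.append(r.copy())

# Scheme 2: preallocate and assign
record_arr = np.zeros((duration, N, 1))
for t in range(duration):
    record_arr[t] = r
```
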

Why is the index "t" not in the same position in the two uses of "trajectory" and in the use of "stimulus"?

# Learning
if trajectory.size > 0 and t>=learn_start and t<learn_stop and t%2==0:
    if not learn_readout:
        self.rls_recurrent(trajectory[t, :, :])
    else:
        self.rls_readout(trajectory[:, :, t])
# Update the neurons' firing rates
self.update_neurons(stimulus[:, :, t], noise)

Could you please explain the dimensions/shape of "error" in the code (and the paper) where you give this line of code:

error = self.r - target

Moreover, could you please explain why you use 0 indices in the following lines?

RxPxR = (1. + np.dot(r_plastic.T,  PxR))[0, 0]
self.W_rec[i, self.W_plastic[i]] -= error[i, 0] * (PxR[:, 0]/RxPxR)

Optional remark: could you use the more explicit "_" instead of "dummy"?

nb, dummy, duration = stimulus.shape

Fig1.py

Names of variables do not seem consistent for the output of net.simulate()

Should "trajectory" and "pretraining" be understood as "initial_trajectory" and "(pre)trained_trajectory" resp.?

trajectory, initial_output = net.simulate(stimulus=impulse, noise=False)
pretraining, pretraining_output = net.simulate(stimulus=impulse)

Fig2.py

Got a warning at lines 74 and 77

/code/Fig2.py:74: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  perturbation_chaos[1, 0, t_offset + t_perturbation: t_offset + t_perturbation + d_perturbation] = perturbation_amplitude
/code/Fig2.py:77: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  perturbation_neuron[3, 0, t_offset + t_perturbation: t_offset + t_perturbation + d_perturbation] = perturbation_amplitude

This is because "t_perturbation" is a float and not an int

t_perturbation = 300. # Offset for the perturbation

Optional: could you increase the compatibility of the np.linspace() use?

Do you really need to use "dtype=np.int32"? Because without this it should be compatible with prior versions of numpy.
(So more people would be able to run the code easily.)

subsampling_chaos = t_offset + d_stim + np.linspace(0, d_chaos, 20, dtype=np.int32)
subsampling_neuron = t_offset + d_stim + np.linspace(0, d_neuron, 20, dtype=np.int32)

Optional: could you use an exception (for data loading)?

It would be better for end-users; otherwise the error one gets is not explicit (it concerns the next line instead).
For instance, modifying/adding these lines would be enough:

try:
    targets = sio.loadmat('../data/DAC_handwriting_output_targets.mat')
except Exception, e: #1
    print("You have to download the handwriting data first.")
    print("Go to the data/ folder and run the script get_handwriting.sh:")
    print("$ bash get_handwriting.sh")
    raise Exception(e) #2

(NB: I am not sure for which version of Python this code is working)
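A version-agnostic form of the same idea: "except ... as e" is valid on both Python 2.6+ and 3, unlike the Python-2-only "except Exception, e" comma syntax. This is a hedged sketch, not the submission's code; the `open()` call stands in for `sio.loadmat()` so the example is self-contained:

```python
import os

def load_targets(path):
    # "except ... as e" works on Python 2.6+ and 3,
    # unlike the Python-2-only "except Exception, e" form.
    try:
        with open(path, 'rb') as f:   # stand-in for sio.loadmat(path)
            return f.read()
    except IOError as e:
        print("You have to download the handwriting data first.")
        print("Go to the data/ folder and run the script get_handwriting.sh:")
        print("$ bash get_handwriting.sh")
        raise                          # re-raise so the traceback is preserved
```
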

Fig3.py

Concerning the simulation time ("3 days"), did you try changing the numpy arrays from dtype "float64" to dtype "float32"?

(Because 3 days is a bit long for someone who wants to try the code, so it could be interesting to improve it.)

Moreover (but I may not have understood it well), you may also save time by using the same "10 initial networks" (as in the original experiment) and by training the recurrent connections only once (because retraining the readout can be done for different delays with the same recurrent dynamics).

Thank you in advance for your answers!
Best,
Xavier Hinaut


@rougier

Member

rougier commented Aug 19, 2016

EDITOR

@vitay Can you address the comments?

@rougier rougier added the 02 - Review label Aug 20, 2016

@vitay

vitay commented Aug 22, 2016

Thanks a lot for the very useful comments. I pushed the corresponding changes.

RecurrentNetwork.py

  • Recording: preallocating the recording arrays is indeed nicer and is now used, but does not speed up the simulations (the learning methods take most of the computation time anyway).
  • Position of the time index t: it was indeed not consistent. Now all stimuli, recordings and trajectories use time as the first index.
  • Shape of error: the error is a vector with one element per neuron, so (800, 1) for the recurrent units and (1, 1) or (2, 1) for the read-out units. The article is modified to explain it.
  • 0 indices: I took the convention to define vectors as (N, 1) arrays instead of (N,) throughout the code, in order to avoid alignment mistakes when doing matrix-vector multiplications. Unfortunately, indexing the recurrent weight matrix as self.W_rec[i, self.W_plastic[i]] expects a (80,) vector, not the (80, 1) vector returned by PxR, so we have to slice it. But it is true that the indexing of RxPxR was not needed, so now the code looks like:
RxPxR = (1. + np.dot(r_plastic.T,  PxR))
self.W_rec[i, self.W_plastic[i]] -= error[i, 0] * (PxR/RxPxR)[:, 0]

I added a comment in the article on this.

  • Useless return values: "_" is now used instead of "dummy"
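Put together, the per-neuron RLS update discussed above amounts to the following minimal sketch. The sizes, the identity initialization of P, and the random rates are illustrative assumptions; only the shapes and the final `[:, 0]` slice mirror the convention described:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 80                                    # number of plastic inputs to neuron i (assumption)
P = np.eye(K)                             # running estimate of the inverse correlation matrix
w = np.zeros(K)                           # plastic weights onto neuron i, shape (K,)
r_plastic = rng.standard_normal((K, 1))   # presynaptic rates, kept as a (K, 1) array
error_i = 0.3                             # scalar error of neuron i (illustrative)

PxR = np.dot(P, r_plastic)                # (K, 1)
RxPxR = 1. + np.dot(r_plastic.T, PxR)     # (1, 1): kept 2-D, sliced only at the end
P -= np.dot(PxR, PxR.T) / RxPxR           # Sherman-Morrison update of P
w -= error_i * (PxR / RxPxR)[:, 0]        # weight update, sliced down to shape (K,)
```
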

Fig1.py

The variable names have been made more consistent:

initial_trajectory, initial_output = net.simulate(stimulus=impulse, noise=False)
pretraining_trajectory, pretraining_output = net.simulate(stimulus=impulse)

Fig2.py

  • Warning: "t_perturbation" is now an int (the issue also occurred in Fig1).
  • linspace: the cast is now done after the call to linspace, which should allow the use of earlier versions of Numpy:
subsampling_chaos = (t_offset + d_stim + np.linspace(0, d_chaos, 20)).astype(np.int32)
subsampling_neuron = (t_offset + d_stim + np.linspace(0, d_neuron, 20)).astype(np.int32)
  • Data loading: I am also not fully sure about exceptions across Python versions (especially 2 and 3; your code would fail on 3). In the uploaded code, I simply call exit() at the end of the exception handler, which I think is enough.

Fig3.py

  • Changing the numpy arrays from dtype "float64" to dtype "float32" yields a 30% speed improvement. RecurrentNetwork now accepts a dtype argument to specify the floating-point precision. float32 is the default, but I once had a numerical stability problem with Fig1.py, so one may have to be careful with that.
  • I do not think using the same ten initial networks would change the computation time much: in the original article, the recurrent weights are also trained every time; it is just the initialization that is the same. The timing capacity of the network only depends on the recurrent weights, but it would be interesting to check whether, for long durations such as 8 seconds, the impairment occurs only after the initial 4 seconds (outside the timing capacity), or whether the whole interval is corrupted.
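A minimal sketch of what such a dtype switch can look like. The class below is a hypothetical simplification, not the submission's actual RecurrentNetwork signature; only the idea of a dtype argument with a float32 default is taken from the comment above:

```python
import numpy as np

class Net(object):
    """Hypothetical sketch: floating-point precision as a constructor argument."""
    def __init__(self, N=800, dtype=np.float32):
        self.dtype = dtype
        self.r = np.zeros((N, 1), dtype=dtype)
        self.W_rec = (np.random.randn(N, N) / np.sqrt(N)).astype(dtype)

net32 = Net()                    # default: single precision, faster
net64 = Net(dtype=np.float64)    # fall back to double precision if unstable
```
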

@rougier

Member

rougier commented Aug 23, 2016

Thanks @vitay

@neuronalX @piero-le-fou Are you satisfied with the changes?

@p-enel

p-enel commented Aug 23, 2016

@rougier @vitay

I would like to add a few remarks.

But before that, I should mention these two typos at the beginning of the paper:

  • first sentence of the introduction:

Recurrent neural networks have the capacity…

  • end of the first paragraph of Methods:

60% of the recurrent connections? are plastic...

First, I want to emphasise that the author did a great job at reproducing the original paper's results, and that the additional comments from the author on the details of the implementation are quite useful and even interesting. In addition, the code provided by the author is substantially more readable than the original Matlab code.

The only question I have regards the reproduction of figure 2. It seems that, in the original paper, each instance of perturbation injected into the network elicited temporarily divergent trajectories. In other words, in each simulation the effect of the perturbation was different. However, this does not seem to be the case in the reproduced figure, and I noticed in the code that the same connectivity and impulse are used for each simulation. Was a different impulse input connectivity used for each simulation in the original code? If not, how would you explain this difference? Although I do not doubt that, whatever the impulse conditions, the perturbed trajectories would return to their attracting course, I suppose that one of the goals of the figure was to demonstrate that, with different perturbations of a given amplitude, the trajectory always ultimately converges back within a reasonable amount of time.

Of minor importance, but since it was brought up by the other reviewer: a more elegant way of discarding irrelevant outputs of a function in Python is to index the outputs directly from the function call, e.g. out2 = f(arg)[1]

Thank you

@vitay

vitay commented Aug 24, 2016

Thanks for the comments. I uploaded the manuscript with the missing words.

For the perturbation in Fig. 2, I am not fully sure: the original code does not provide the complete script for the figure; it is a GUI where the user can click during the trajectory to perturb the neurons. The text only states:

The perturbation was produced by a 10-ms pulse of an additional input unit randomly connected to all units in the RRN with an input amplitude of 0.2, injected at t=300 ms (approximately the time of the ‘h’ and ‘e’ during the ‘chaos’ and ‘neuron’ trajectories respectively).

So I think it should be the same perturbation every time, as only the weights are random. I just realized that I used two different perturbation neurons (one for each word), while the text implies only one, but that should not make a big difference. Perhaps Laje and Buonomano indeed used a different perturbation every time (with different weights), which would explain the higher variance. The perturbation is rather short (10 ms), so it should not have such a dramatic effect on the trajectories if the perturbation is deterministic.
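The quoted protocol can be sketched as follows. Only the 10-ms duration, the 0.2 amplitude, the t=300 ms onset, and the random connectivity come from the quoted text; the network size, trial length, and 1-ms step are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
N, duration = 800, 1000                  # network size and trial length in ms (assumptions)
W_pert = rng.standard_normal((N, 1))     # fixed random connectivity of the extra input unit

pulse = np.zeros(duration)               # one value per 1-ms step (assumption)
pulse[300:310] = 0.2                     # 10-ms pulse of amplitude 0.2 at t = 300 ms

# At each step t, the perturbation current added to the units is W_pert * pulse[t]:
I_t = W_pert * pulse[305]                # nonzero only during the pulse window
```

With W_pert drawn once and reused, the perturbation is deterministic across trials; redrawing W_pert on every trial gives the higher-variance behaviour discussed above.
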

@p-enel

p-enel commented Aug 29, 2016

@rougier
I have no further questions, it's good to go for me!

@rougier

Member

rougier commented Aug 29, 2016

@piero-le-fou Ok, thanks for the review.

@rougier

Member

rougier commented Sep 8, 2016

@neuronalX Any update on your review?

@neuronalX

neuronalX commented Sep 12, 2016

Thank you for the interesting answers and the modifications to the code.

Here are some comments on the paper.

Strongly / Weakly

Strongly or Weakly connected?

At the beginning of the paper, you talk about "weakly/sparsely connected" networks (line 3 of the Introduction and line 2 of Methods), and also about "strongly" connected ones (l. 4, Intro.)

The research field of reservoir computing [2, 4] focuses on forcing weakly connected (i.e. deterministic) recurrent networks to produce a desired output through supervised learning (gradient descent-like). Strongly-connected recurrent networks can exhibit rich and complex patterns in the absence of stimulation, but they are chaotic, as they can not reproduce twice the same trajectory in the presence of noise. In Laje and Buonomano [3], the authors proposed a method to tame the chaos in such networks ...

And then, you state:

The network proposed by Laje and Buonomano [3] is a pool of 800 rate-coded neurons
sparsely connected with each other with a fixed probability.

Is the network used by Laje and Buonomano strongly or sparsely connected?

Strongly / Weakly connected vs. Chaotic / Deterministic

You seem to state that weakly connected RNNs are deterministic, and strongly connected ones are chaotic. You should clearly state the references for such statements, because from my understanding of reservoir computing these chaotic/deterministic properties do not come from the sparseness of the network, but are, for instance, more related to the "effective spectral radius" [1] of the weight matrix (in the absence of output feedback). Moreover, the order of the statements could be read as opposing chaotic and deterministic as antagonists: a deterministic network can be chaotic.

[1] Jaeger Herbert, Mantas Lukoševičius, Dan Popovici, and Udo Siewert (2007) Optimization and Applications of Echo State Networks with Leaky-Integrator Neurons. Neural Networks 20(3): 335–52.

"supervised learning (gradient descent-like)"

Could you also please give a reference for this statement? One particularity of Reservoir Computing (RC) over "more classical" learning schemes is the possibility of one-shot learning of the weights by linear or ridge regression, so I do not think that "gradient descent-like" is a good summarizing qualifier for RC.

Thank you for the answers, the modifications in the code and the interesting answers.

Hereafter, I have some comments on the paper.

Strongly / Weakly

Strongly or Weakly connected?

In the beginning of the paper, you talk about "weakly/sparsely connected" (line 3 of Introduction and line 2 of Methods), and also about "strongly" (l.4 Intro.)

The research field of reservoir computing [2, 4] focuses on forcing weakly connected (i.e. deterministic) recurrent networks to produce a desired output through supervised learning (gradient descent-like). Strongly-connected recurrent networks can exhibit rich and complex patterns in the absence of stimulation, but they are chaotic, as they can not reproduce twice the same trajectory in the presence of noise. In Laje and Buonomano [3], the authors proposed a method to tame the chaos in such networks ...

And then, you state:

The network proposed by Laje and Buonomano [3] is a pool of 800 rate-coded neurons
sparsely connected with each other with a fixed probability.

Is the network used by Laje and Buonomano strongly or sparsely connected?

Strongly / Weakly connected vs. Chaotic / Deterministic

You seem to state that weakly connected RNNs are deterministic, and strongly connected ones are chaotic. You should clearly state the references for such statements, because from my understanding of reservoir computing these chaotic/deterministic properties do not come from the sparseness of the network, but are for instance more related to the "effective spectral radius" [1] of the weight matrix (in the absence of output feedback). Moreover, the order of the statements could be interpreted as opposing chaotic and deterministic as antagonists: a deterministic network can be chaotic.

[1] Jaeger Herbert, Mantas Lukoševičius, Dan Popovici, and Udo Siewert (2007) Optimization and Applications of Echo State Networks with Leaky-Integrator Neurons. Neural Networks 20(3): 335–52.

"supervised learning (gradient descent-like)"

Could you also please give a reference for this statement? Because one particularity of Reservoir Computing (RC) over "more classical" learning schemes is the opportunity to do a one-shot learning of the weights by linear or ridge regression. Thus, I do not think that "gradient descent-like" is a good summarizing qualifier for RC.

vitay commented Sep 13, 2016

Strongly or sparsely connected?

Actually both: the connection matrix is sparse (connection probability of 10%) but the weights, when they exist, are strong (gain g>1.5). I called the second aspect "strongly connected", but maybe a more correct term would be "high-gain regime" as in the original article.

Chaotic/deterministic

The understanding I had (I am not an expert) was based on citations of this paper:

Sompolinsky, Crisanti, and Sommers (1988). Chaos in Random Neural Networks. Phys. Rev. Lett. 61, 259. doi:10.1103/PhysRevLett.61.259

For example, Sussillo and Abbott (2009) state:

To study the effect of spontaneous chaotic activity on network performance, we introduce a factor g that scales the strengths of the recurrent connections within the network. Networks with g < 1 are inactive prior to training, whereas networks with g > 1 exhibit chaotic spontaneous activity (Sompolinsky et al., 1988) that gets more irregular and fluctuates more rapidly as g is increased beyond 1 (we typically use g = 1.5).

So I inferred that the chaotic behavior of a recurrent network depends mostly on the gain of the connections (g > 1.5). There is probably a link between this gain and the effective spectral radius of Jaeger et al. (2007), but I am not sure which one.
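The link between the gain and the spectral radius can be checked empirically. The following is a minimal sketch, not code from the submission: a sparse random matrix is built with the conventions of this model family (connection probability p = 0.1, non-zero weights drawn from a normal distribution of variance g^2/(p*N)), and its spectral radius is measured; for large N it should be close to g, which is why g > 1 pushes the network into the chaotic regime.

```python
import numpy as np

rng = np.random.default_rng(42)
N, p, g = 800, 0.1, 1.5  # network size, connection probability, gain

# Sparse random recurrent matrix: each connection exists with probability p,
# and non-zero weights are drawn from N(0, g^2 / (p*N)).
mask = rng.random((N, N)) < p
W = np.where(mask, rng.normal(0.0, g / np.sqrt(p * N), (N, N)), 0.0)

# The empirical spectral radius (largest eigenvalue modulus) concentrates
# around g for large N, tying the "gain" and "spectral radius" views together.
rho = np.max(np.abs(np.linalg.eigvals(W)))
print(f"g = {g}, empirical spectral radius = {rho:.2f}")
```

With g = 1.5 the measured radius lands close to 1.5, so scaling the weight variance by g and tuning the effective spectral radius are, in this setting, two descriptions of the same knob.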

supervised learning (gradient descent-like)

My bad, I thought only iterative rules like RLS were used.

For all these reasons, I changed the introduction into:

Recurrent neural networks have the capacity to exhibit complex spatio-temporal patterns that can be used to produce motor patterns. The research field of reservoir computing [@Jaeger2001; @Maass2002] focuses on forcing recurrent networks to produce a desired read-out output through supervised learning. In the high-gain regime, recurrent networks can even exhibit rich and complex patterns in the absence of stimulation, but they become chaotic, as they cannot reproduce the same trajectory twice in the presence of noise [@Sompolinsky1988]. In @Laje2013, the authors proposed a method to tame the chaos in such networks, by training the recurrent weights to minimize the error between a desired trajectory - spontaneously generated by the network in an initial trial - and the current trajectory. By selectively creating stable dynamic attractors, this method makes it possible to benefit at the same time from the rich dynamics of a chaotic network and from the reproducibility of a deterministic one.

I hope it is less confusing that way.


neuronalX commented Sep 13, 2016

@vitay Thank you very much for the clarifications. I now understand much better what you mean. Of course the strength of the connections (which is related to the spectral radius) influences the chaotic/non-chaotic regime.
My misunderstanding was due to the fact that I understood "weakly"/"sparsely connected" and "strongly"/"fully connected" as synonyms.
Thank you for the changes.


rougier commented Sep 16, 2016

@neuronalX Do you have any other comments or are you happy with the proposed changes ?


rougier commented Sep 27, 2016

@neuronalX Can you tell me if you accept/reject or intend to upload new comments ?


neuronalX commented Sep 27, 2016

Yes, I will have more comments.


rougier commented Sep 27, 2016

Ok. Let's give us two more weeks, does that sound good ?


neuronalX commented Sep 27, 2016

I hope to finish this week.

On 27.09.2016 at 12:54, Nicolas P. Rougier wrote:

Ok. Let's give us two more weeks, does that sound good ?



neuronalX commented Oct 6, 2016

Hello, I am very sorry for the long delay.
Here are my last comments.

1st equation of the network

Please define y_j(t) in the text.

Input matrix Win

The weights of the input matrix Win are taken from the normal distribution, with mean 0 and variance 1.

It means that Laje & Buonomano only scale the recurrent matrix Wrec, but there is no scaling factor for Win? It might be worth adding a remark on this if they never mention it (just to note it, at least, without further comment), because depending on the number of inputs it could influence the experiments.

I_0 & chaotic behavior

I_0 is chosen very small in the experiments reproduced here (I_0 = 0.001) but is enough to induce a chaotic behavior of the recurrent neurons (non-reproducibility between two trials).

I would say that the behavior of the network is already chaotic, or more precisely its "regime" is chaotic, even if I_0 is zero. So I would not say that a non-zero I_0 "induces" the chaotic behavior, but rather that it highlights the chaotic regime of the network.

typo Wout

Read-out neurons
The read-out neurons simply sum the activity of the recurrent neurons using a matrix
W o ut:

error[i, 0]

# Learning rule W <- W - e * (P*R)/(1+R'*P*R)
W_rec[i, W_plastic[i]] -= error[i, 0] * (PxR/RxPxR)[:, 0]

The weight update uses the error of the neuron i (a scalar) and a vector corresponding to (P*R)/(1+R'*P*R). The shape of W_rec[i, W_plastic[i]] is (80,) on average, but PxR/RxPxR is (80, 1), so we need to slice this array to the correct shape using [:, 0]. The implementation for the read-out weights is similar, except we can directly use r instead of r_plastic in the update rules.
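The shape logic quoted above can be checked in isolation. This is a hypothetical sketch (the sizes N = 800 and K = 80 mimic the text, and P is stood in by a scaled identity), not the submission's code:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 800, 80  # recurrent units; plastic connections of one neuron

error = rng.normal(size=(N, 1))   # one scalar error per unit, stored as (N, 1)
r = rng.normal(size=(K, 1))       # rates of the plastic inputs, (K, 1)
P = np.eye(K) / 2.0               # stand-in for the inverse correlation estimate
PxR = P @ r                       # shape (K, 1)
RxPxR = 1.0 + float(r.T @ PxR)    # scalar 1 + r' P r

W_row = np.zeros(K)               # plays the role of W_rec[i, W_plastic[i]], shape (K,)
i = 3
# error[i] is an array of shape (1,); error[i, 0] is a true scalar.
# Slicing (PxR/RxPxR) with [:, 0] turns (K, 1) into (K,) so it matches W_row.
delta = error[i, 0] * (PxR / RxPxR)[:, 0]
W_row -= delta
print(error[i].shape, delta.shape)
```

This makes the reviewer's question concrete: `error[i]` alone would broadcast correctly here, but it is a length-1 array rather than a scalar, which is why the author indexes `error[i, 0]`.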

I am not sure I understood correctly, but if error[i] is a scalar, why do you need to add 0 in the next dimension?

g

The scaling factor g was chosen smaller as in the previous experiment (1.5 instead of 1.8) to accelerate training.

Could you please give more details about this sentence? Why does it accelerate training?

exact function

The exact function is irrelevant: only the length of the training window actually matters.

You are talking about the "gaussian peak" function, right? If you tried other functions that do not change the results, please give more info in just one sentence; it could be interesting for readers.


rougier commented Oct 6, 2016

Thanks @neuronalX. Provided @vitay makes the recommended corrections, would you recommend acceptance?


neuronalX commented Oct 6, 2016

@rougier Sure!

vitay commented Oct 6, 2016

Thanks, I have pushed the desired modifications.

1st equation of the network

The sentence is now:

The weights of the input matrix $W^{in}$ are taken from the normal distribution, with mean 0 and variance 1, and multiply the rates of the input neurons $y_j(t)$

Input matrix Win

Win is indeed not scaled, as they use only 1 to 4 input neurons in the experiments, which are not even activated at the same time (the impulses occur at different times).

I added the following sentence:

The variance of these weights does not depend on the number of inputs, as only one input neuron is activated at the same time.

I_0 & chaotic behavior

I am not sure about this: I ran the simulations without noise and the trajectories were always deterministic for the durations considered here. Even with this amount of noise, trajectories diverge only after 1 s. Perhaps if you simulate long enough you would see some divergence without noise as rounding errors accumulate. It seems like the impulses bring the network into a highly deterministic state which "survives" the chaotic nature of the network. I simply modified the sentence to:

$I_0$ is chosen very small in the experiments reproduced here ($I_0 = 0.001$) but is enough to highlight the chaotic behavior of the recurrent neurons (non-reproducibility between two trials).
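The effect of a tiny noise term on a chaotic rate network can be illustrated with a toy sketch. The parameters here (N = 200, dense weights, dt/tau = 0.1) are made up for the demonstration and are not the simulation code of the submission:

```python
import numpy as np

def simulate(W, x0, steps, I0, seed):
    """Euler-integrate tau*dx/dt = -x + W.tanh(x) + noise, with dt/tau = 0.1."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    traj = []
    for _ in range(steps):
        noise = I0 * rng.normal(size=x.shape)
        x += 0.1 * (-x + W @ np.tanh(x) + noise)
        traj.append(x.copy())
    return np.array(traj)

rng = np.random.default_rng(1)
N, g = 200, 1.8
W = g * rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))  # high-gain regime, g > 1
x0 = rng.normal(size=N)

# Without noise, two runs from the same state are bit-for-bit identical;
# with a tiny noise (I0 = 0.001), the chaotic dynamics amplify the
# difference between two runs until the trajectories diverge.
a = simulate(W, x0, 2000, I0=0.0, seed=2)
b = simulate(W, x0, 2000, I0=0.0, seed=3)
c = simulate(W, x0, 2000, I0=0.001, seed=2)
d = simulate(W, x0, 2000, I0=0.001, seed=3)
print(np.abs(a - b).max())  # exactly 0: deterministic runs coincide
print(np.abs(c - d).max())  # much larger than I0: noisy runs diverge
```

This matches the author's observation: the noise does not create the chaos, it merely exposes it by seeding a perturbation that the chaotic dynamics then amplify.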

typo Wout

Fixed.

error[i, 0]

That is because I use (N, 1) shapes for vectors instead of (N,). Addressing error[i] returns an array of shape (1,), which would work in this case, but is not that clean. I extended the description of the error vector:

The error is computed as a vector of shape $(800, 1)$ (one scalar value per recurrent unit), but in practice only used for the first N_plastic units:

g

This is the value taken by Laje and Buonomano in the original article. With g=1.5 the network is less chaotic and it takes less training trials to "tame" that chaos. I added the following comment:

As in the original paper, the scaling factor $g$ was chosen smaller than in the previous experiment (1.5 instead of 1.8) to accelerate training: the network is less chaotic and it becomes easier to find values for the recurrent weights that lead to stable trajectories. Using $g=1.8$ would only require more training trials.

exact function

I have not systematically studied all possible functions to make a strong claim, but my guess is that as long as the dynamics of the reservoir are complex enough, the read-out neurons can learn virtually any function (except perhaps at high frequencies). Here we focus on learning the recurrent weights (i.e. exhibiting stable trajectories long enough), not the read-out ones, so any function of the same duration could have been used (e.g. flat). The gaussian bump after a delay is primarily there to attract time-perception researchers. I have extended that comment a bit:

The position of the peak within the training window is systematically varied from 250 ms to 8 s, which primarily influences the total duration of a trial. The exact shape of the output function is not very important: only the length of the training window actually matters, as we want the recurrent units to exhibit stable trajectories long enough. Other functions with the same total duration would lead to similar results.
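For concreteness, the delayed gaussian bump discussed above can be written down in a few lines. The peak time and width here are hypothetical illustrative values, not the ones from the article:

```python
import numpy as np

dt = 1.0                              # time step in ms
t = np.arange(0.0, 2000.0, dt)        # a 2 s training window
peak, width = 1000.0, 30.0            # hypothetical peak time and width (ms)

# Desired read-out: flat at 0, with a gaussian bump centered on `peak`.
target = np.exp(-((t - peak) ** 2) / (2.0 * width ** 2))
print(t[target.argmax()])  # the maximum sits at the chosen peak time
```

Varying `peak` (here from 250 ms up to 8 s in the experiments) only moves the bump and stretches the trial; as the author notes, the shape itself is incidental to taming the recurrent dynamics.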


rougier commented Oct 7, 2016

Congratulations @vitay, your submission has been accepted and will be soon published.
Please do not push new commits until it is published.


@ReScience ReScience locked and limited conversation to collaborators Oct 7, 2016

rougier commented Oct 7, 2016

@vitay Can you provide a list of keywords ?


vitay commented Oct 7, 2016

For the model:

recurrent neural networks, reservoir computing, dynamical systems, learning, chaos

Perhaps also the Python/Numpy keywords, I do not know if they are implicit.


rougier commented Oct 7, 2016

The article has been published and will appear soon on https://rescience.github.io/read


@rougier rougier closed this Oct 7, 2016
