Review Request: Vitay #19
I request a review for the reproduction of the following paper:
I believe the original results have been faithfully reproduced as explained in the accompanying article.
The repository lives at https://github.com/vitay/ReScience-submission/tree/vitay
So, I've been a bit unlucky with the simulations, though one failure was my own fault: I didn't read the Readme file carefully. Nonetheless, it might be helpful for others: check that you have the latest versions of the required Python packages (see the Readme). In my case, the code executed properly until the call to "linspace" with a dtype argument at the end of the Fig2.py file, where it failed.
Another issue: the end of the Fig3.py script is missing an import and a variable conversion.
Other than that, all the code runs perfectly!
Thanks for the report. I uploaded a fix for Fig3.py. The script took so long (3 days) that I ran the simulation only once without the plot, and never checked that it worked as a standalone script...
Also good to know that np.linspace(..., dtype=) is recent; perhaps I should avoid it for broader compatibility.
Dear Julien Vitay,
Because the code is well written, I was able to go through it and understand most of it. There are a few things I could not understand, so I have some small remarks on the code. I tried to organise them by file.
(I will give comments on the paper later.)
Does it need to be a list to which you append the 2950 arrays/scalars (for Fig. 1) one at a time?
Or could you use a numpy array instead? (It may be more efficient for long simulations.)
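To illustrate the suggestion, a minimal sketch (the shapes and the simulation step stand-in are hypothetical, not taken from the repository) of preallocating a numpy array instead of appending to a list:

```python
import numpy as np

T, N = 2950, 10  # hypothetical number of time steps and recorded units

# Instead of appending T small arrays to a Python list, preallocate the
# full record once; this avoids per-step list growth in long simulations.
record = np.empty((T, N))
for t in range(T):
    record[t] = np.tanh(np.random.randn(N))  # stand-in for one net.step()
```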
Why is the index "t" not in the same place in the two uses of "trajectory" and the use of "stimulus"?
Could you please give more explanations about the dimensions/shape of "error" in the code (and paper) when you give this line of code:
Moreover, could you please explain why you use 0 indices in all the following lines?
Optional remark: could you use the more explicit "_" instead of "dummy"?
Names of variables do not seem consistent for the output of net.simulate()
Should "trajectory" and "pretraining" be understood as "initial_trajectory" and "(pre)trained_trajectory" resp.?
I got a warning at lines 74 and 77.
This is because "t_perturbation" is a float rather than an int.
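A small sketch of the fix (the variable names are hypothetical): casting the float to an int before using it as an index silences the warning:

```python
import numpy as np

t_perturbation = 2000 * 0.25   # arithmetic on floats yields a float
x = np.zeros(2000)
# x[t_perturbation] = 1.0 triggers a deprecation warning (an error in
# recent numpy) because the index is a float; cast explicitly instead:
x[int(t_perturbation)] = 1.0
```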
Optional: could you increase the compatibility of the np.linspace() calls?
Do you really need "dtype=np.int32"? Without it, the code should be compatible with older versions of numpy.
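For reference, the `dtype` argument of `np.linspace` only appeared in NumPy 1.9; a backward-compatible sketch is to generate the floats and cast afterwards:

```python
import numpy as np

# np.linspace(..., dtype=np.int32) requires NumPy >= 1.9; casting the
# result instead also works on older versions:
ticks = np.linspace(0, 100, 11).astype(np.int32)
```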
Optional: could you use an exception (for data loading)?
It would be better for end-users; otherwise the error one gets is not explicit (it concerns the next line instead).
(NB: I am not sure for which version of Python this code is working)
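One way to realise the suggestion, as a sketch (the file name and error message are hypothetical): wrap the load so a missing data file produces an explicit message instead of an obscure error on the following line.

```python
import numpy as np

def load_weights(path):
    """Load precomputed weights, failing with an explicit message if the
    simulation has not been run yet (file name is hypothetical)."""
    try:
        return np.load(path)
    except IOError as err:
        raise RuntimeError(
            "Could not read %s; run the simulation script first (%s)"
            % (path, err))
```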
Concerning the simulation time ("3 days"), did you try changing the numpy arrays from dtype "float64" to dtype "float32"?
(Three days is a bit long for someone who wants to try the code, so it could be worth improving.)
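A sketch of the suggested change (the network size is hypothetical): creating the state arrays in single precision halves their memory footprint, though any actual speed-up depends on the BLAS backend.

```python
import numpy as np

N = 800  # hypothetical network size
# Single precision halves the memory traffic of the state arrays; the
# dot products then stay in float32 throughout the update.
W = np.random.randn(N, N).astype(np.float32)
r = np.random.randn(N).astype(np.float32)
x = W.dot(r)
```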
Moreover (though I may have misunderstood), you might also save time by reusing the same "10 initial networks" (as in the original experiment) and by training the recurrent connections only once, because the readout can be retrained for different delays on the same recurrent dynamics.
Thank you in advance for your answers!
Thanks a lot for the very useful comments. I pushed the corresponding changes.
I added a comment in the article on this.
The variable names have been made more consistent:
I would like to add a few remarks.
But before, I should mention these two typos at the beginning of the paper:
First, I want to emphasise that the author did a great job of reproducing the original paper's results, and that the author's additional comments on the details of the implementation are quite useful and even interesting. In addition, the code provided by the author is substantially more readable than the original Matlab code.
The only question I have is regarding the reproduction of figure 2. It seems that, in the original paper, each instance of perturbation injected into the network elicited temporarily divergent trajectories. In other words, the effect of the perturbation was different in each simulation. However, this does not seem to be the case in the reproduced figure, and I noticed in the code that the same connectivity and impulse are used for each simulation. Was a different impulse input connectivity used for each simulation in the original code? If not, how would you explain this difference? While I do not doubt that, whatever the impulse conditions, the perturbed trajectories would return to their attracting course, I suppose that one of the goals of the figure was to demonstrate that, for different perturbations of a given amplitude, the trajectory always ultimately converges back within a reasonable amount of time.
Of minor importance, but since it was brought up by the other reviewer: a more elegant way of discarding irrelevant outputs of a function in Python is to address the desired output directly from the function's return value, e.g. out2 = f(arg)[1]
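A small illustration of the two conventions, with a hypothetical two-output function standing in for net.simulate():

```python
def simulate():
    """Hypothetical stand-in for net.simulate(), returning two values."""
    return "trajectory", "readout"

_, readout = simulate()    # underscore convention for a discarded output
readout = simulate()[1]    # or index the returned tuple directly
```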
Thanks for the comments. I uploaded the manuscript with the missing words.
For the perturbation in Fig. 2, I am not fully sure: the original code does not provide the complete script for the figure; it is a GUI where the user can click during the trajectory to perturb the neurons. The text only states:
So I think it should be the same perturbation every time, as only the weights are random. I just realized I used two different perturbation neurons (one for each word), while the text implies only one, but that should not make a big difference. Perhaps Laje and Buonomano indeed used a different perturbation each time (with different weights), which would explain the higher variance. The perturbation is rather short (10 ms), so it should not have such a dramatic effect on the trajectories if it is deterministic.
Thank you for the answers, the modifications to the code, and the interesting explanations.
Hereafter, I have some comments on the paper.
Strongly / Weakly
Strongly or Weakly connected?
At the beginning of the paper, you talk about "weakly/sparsely connected" networks (line 3 of the Introduction and line 2 of Methods), and also about "strongly" connected ones (l. 4 Intro.).
And then, you state:
Are the networks used by Laje and Buonomano strongly or sparsely connected?
Strongly / Weakly connected vs. Chaotic / Deterministic
You seem to state that weakly connected RNNs are deterministic while strongly connected ones are chaotic. You should clearly state the references for such statements, because from my understanding of reservoir computing these chaotic/deterministic properties do not come from the sparseness of the network but are, for instance, more related to the "effective spectral radius" of the weight matrix (in the absence of output feedback). Moreover, the order of the statements could be read as opposing chaotic and deterministic as antagonists: a deterministic network can be chaotic.
Jaeger, Herbert, Mantas Lukoševičius, Dan Popovici, and Udo Siewert (2007). Optimization and Applications of Echo State Networks with Leaky-Integrator Neurons. Neural Networks 20(3): 335–352.
"supervised learning (gradient descent-like)"
Could you please also give a reference for this statement? One particularity of Reservoir Computing (RC) compared with more classical learning schemes is the possibility of one-shot learning of the weights by linear or ridge regression. Thus, I do not think that "gradient descent-like" is a good summarising qualifier for RC.
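For illustration, a minimal sketch of such one-shot readout learning by ridge regression on synthetic data (the shapes and regularisation strength are arbitrary, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))      # reservoir states (time x units)
w_true = rng.standard_normal(50)
y = X @ w_true + 0.01 * rng.standard_normal(1000)  # noisy target signal

# One-shot ridge regression: w = (X^T X + lam*I)^{-1} X^T y,
# solved in a single step rather than by iterative gradient descent.
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(50), X.T @ y)
```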
Strongly or sparsely connected?
Actually both: the connection matrix is sparse (connection probability of 10%) but the weights, when they exist, are strong (gain g>1.5). I called the second aspect "strongly connected", but maybe a more correct term would be "high-gain regime" as in the original article.
The understanding I had (I am not an expert) was based on citations of this paper:
Sompolinsky, Crisanti, and Sommers (1998). Chaos in Random Neural Networks. Phys. Rev. Lett. 61, 259. doi:10.1103/PhysRevLett.61.259
For example, Sussillo and Abbott (2009) state:
So I inferred that the chaotic behavior of a recurrent network depends mostly on the gain of the connections (g > 1.5). There is probably a link between this gain and the effective spectral radius of Jaeger et al. (2007), but I am not sure which one.
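A quick numerical check of that link, as a sketch (assuming the sparse Gaussian scaling of the original model: connection probability p = 0.1, weight std g/sqrt(p·N)): the spectral radius of such a recurrent matrix comes out close to the gain g.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, g = 800, 0.1, 1.5
# Sparse Gaussian weights with std g / sqrt(p*N); by the circular law the
# spectral radius of this matrix approaches g for large N.
mask = rng.random((N, N)) < p
W = mask * rng.normal(0.0, g / np.sqrt(p * N), (N, N))
radius = np.max(np.abs(np.linalg.eigvals(W)))
# radius is close to g = 1.5, linking the gain to the spectral radius
```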
supervised learning (gradient descent-like)
My bad, I thought only iterative rules like RLS were used.
For all these reasons, I changed the introduction into:
I hope it is less confusing that way.
@vitay Thank you very much for the clarifications. I now understand much better what you mean. Of course the strength of the connections (which is related to the spectral radius) influences the chaotic/non-chaotic regime.
I hope to finish this week.
On 27.09.2016 at 12:54, Nicolas P. Rougier wrote:
Hello, I am very sorry for the long delay.
1st equation of the network
Thanks, I have pushed the desired modifications.
1st equation of the network
The sentence is now:
Input matrix Win
Win is indeed not scaled, as they use only 1 to 4 input neurons in the experiments, which are even not activated at the same time (the impulses occur at different times).
I added the following sentence:
I_0 & chaotic behavior
I am not sure about this: I ran the simulations without noise and the trajectories were always deterministic over the durations considered here. Even with this amount of noise, trajectories diverge only after 1 s. Perhaps, if you simulated long enough, you would see some divergence without noise as rounding errors accumulate. It seems that the impulses bring the network into a highly deterministic state which "survives" the chaotic nature of the network. I simply modified the sentence to:
That is because I use (N, 1) shapes for vectors instead of (N,). Addressing error[i] returns an array of shape (1,), which would work in this case, but is not that clean. I extended the description of the error vector:
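The difference in question, in a short sketch (array name borrowed from the discussion, contents arbitrary):

```python
import numpy as np

error = np.zeros((10, 1))   # column-vector convention: shape (N, 1)
row = error[3]              # shape (1,): a one-element array, not a scalar
value = error[3, 0]         # explicit scalar access needs both indices
```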
This is the value taken by Laje and Buonomano in the original article. With g=1.5 the network is less chaotic and it takes less training trials to "tame" that chaos. I added the following comment:
I have not systematically studied all possible functions, so I cannot make a strong claim, but my guess is that as long as the dynamics of the reservoir are complex enough, the read-out neurons can learn virtually any function (except perhaps at high frequencies). Here we focus on learning the recurrent weights (i.e. exhibiting stable trajectories for long enough), not the readout ones, so any function of the same duration could have been used (e.g. a flat one). The Gaussian bump after a delay is primarily there to attract time-perception researchers. I have extended that comment a bit: