# Review Request : G. Viejo, B. Girard, M. Khamassi #14


## Conversation


### gviejo commented Jan 20, 2016

 AUTHOR Dear @ReScience/editors, I request a review for the replication of the following paper:

Keramati, M., Dezfouli, A., & Piray, P. (2011). Speed/accuracy trade-off between the habitual and the goal-directed processes. PLoS Computational Biology, 7(5), e1002055.

I believe the original results have been faithfully replicated, as explained in the accompanying article. The repository lives at https://github.com/gviejo/ReScience-submission/tree/VIEJO-GIRARD-KHAMASSI-2016

```
$ git clone https://github.com/gviejo/ReScience-submission.git
$ cd ReScience-submission
$ git checkout VIEJO-GIRARD-KHAMASSI-2016
$ cd code
```

EDITOR

- Editor acknowledgement (@rougier) Jan 20, 2016
- Reviewer 1 (@vitay) Jan 20, 2016
- Reviewer 2 (@gdetor) Jan 21, 2016
- Review 1 decision [accept] Feb 9, 2016
- Review 2 decision [accept] Feb 8, 2016
- Editor decision [accept] Feb 9, 2016

Member

### rougier commented Jan 20, 2016

 EDITOR Thank you for the submission. I will soon assign two reviewers. In the meantime, could you edit your first comment so that it follows the structure of #3? Also, I think the link to the PDF you provided is wrong.

Member

### rougier commented Jan 20, 2016

 EDITOR Julien Vitay @vitay has agreed to review your submission.
Member

### rougier commented Jan 21, 2016

 EDITOR Georgios Detorakis @gdetor has agreed to review your submission.

Member

### rougier commented Jan 23, 2016

 EDITOR @gviejo Before the actual review starts, you might consider cleaning your repo (you added some temporary files):

- article/missfont.log
- article/rescience-template.log
- article/texput.log
- article/viejo_girard_khamassi.bcf
- article/viejo_girard_khamassi.log
- article/viejo_girard_khamassi.out
- article/viejo_girard_khamassi.run.xml
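One way to keep such files out of the repository for good is a `.gitignore` covering LaTeX build artifacts; a minimal sketch matching the temporary files listed above:

```
# LaTeX build artifacts
*.log
*.bcf
*.out
*.run.xml
missfont.log
texput.log
```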

### gviejo commented Feb 3, 2016

 AUTHOR @rougier The repository should be clean.

### vitay commented Feb 8, 2016

 REVIEWER 1 @gviejo

Article

The article describes precisely what the reproduced model does, what the main difficulties in reimplementing the model were (missing parameter values, typos), and where the reproduction differs from the original. The authors sufficiently justify the small difference in probabilities of action. The authors only reproduce the first figure of the original article, which is sufficient. If the authors find the time to implement the other tasks, or at least to hint at what should be changed in order to reproduce them, it would be of great value.

The manuscript has some formatting issues:

The parameters table is not correct. By suppressing the first dashed line, it looks much nicer on my computer, but it may depend on pandoc's version:

```
Name                 Description                                 Value
------------------   -----------------------------------------  -------------------------
$\sigma$             Updating rate of the average reward         0.02
$\eta$               Variance of evolution noise                 0.0001
$P_n$                Variance of observation noise               0.05
$\beta$              Rate of exploration                         1.0
$\rho$               Update rate of the reward function          0.1
$\gamma$             Discount factor                             0.95
$\tau$               Time step of graph exploration              0.08
depth                Depth of search in graph exploration        3
$\phi$               Update rate of the transition function      0.5
init cov             Initialisation of covariance matrice        1.0
$\kappa$             Unscentered transform parameters            0.1
------------------   -----------------------------------------  -------------------------
```

Notice the typo on "Initialisation" for init cov.

The algorithm should not be indented that much to the right; it feels almost centered. Some equations are centered in the initialization phase, which impairs reading.
Below is a modification that looks more consistent:

> **Initialization**
>> $Q(s, a)^{Goal-Directed} = \{ 0, \ldots \}$
>> $Q(s, a)^{Habitual} = \{ 0, \ldots \}$
>> $\Sigma = \left( \begin{array}{*4{c}} cov \times \eta & 0 & \ldots & 0 \\ 0 & cov \times \eta & \ldots & \vdots \\ \vdots & \ldots & \ddots & 0 \\ 0 & \ldots & 0 & cov \times \eta \\ \end{array}\right)$
>> $R(S1, EM) = 1$
>> $\bar{R} = 0$
>> $\hat{R}(s,a) = \{0,\ldots\}$

> **Main Loop**
>> $\textbf{FOR}\ i = 1:T$
>>> $s_t = S_0$
>>> $\textbf{IF}\ i = T_{devaluation}$
>>>> $R(S1, EM) = 0$
>>>> $\hat{R}(S1, EM) = -1$
>>> $\textbf{WHILE}\ s_t \neq S1 \bigwedge a_t \neq EM$
>>>> $a_t = \textbf{Selection}(s_t)$
>>>> $r_t = R(s_t,a_t)$
>>>> $s_{t+1} = transition(s_t, a_t)$
>>>> $\textbf{Update}(s_t,a_t, s_{t+1}, r_t)$

> **Selection**
>> $\{a_1,\ldots,a_i,\ldots\} \leftarrow sort(Q(s_t,a_i))$
>> $VPI(s_t, a_1) = (Q(s_t,a_2)^{H}-Q(s_t,a_1)^{H})P(Q(s_t,a_1)^{H} < Q(s_t,a_2)^{H}) + \frac{\sigma(s_t,a_t)}{\sqrt{2\pi}} e^{-\frac{(Q(s_t,a_1)^H - Q(s_t,a_2)^H)^2}{2\sigma(s_t,a_t)^2}}$
>> $VPI(s_t, a_i) = (Q(s_t,a_i)^{H}-Q(s_t,a_1)^{H})P(Q(s_t,a_i)^{H}>Q(s_t,a_1)^{H}) + \frac{\sigma(s_t,a_t)}{\sqrt{2\pi}} e^{-\frac{(Q(s_t,a_1)^H - Q(s_t,a_i)^H)^2}{2\sigma(s_t,a_t)^2}}$
>> $\textbf{FOR}\ i \in \{a_1, a_2,\ldots, a_i, \ldots\}$
>>> $\textbf{IF}\ VPI(s_t, a_i) \geq \tau \bar{R}$
>>>> $Q(s_t,a_i) = \hat{R}(s_t,a_i) + \gamma \sum\limits_{s'}p_{T}(\{s,a\}\rightarrow s') \max\limits_{b \in A} Q(s',b)^{Goal-directed}$
>>> $\textbf{ELSE}$
>>>> $Q(s_t,a_i) = Q(s_t,a_i)^{Habitual}$
>> $a_t \leftarrow \textit{SoftMax}(Q(s_t,a), \beta)$

> **Update**
>> $\bar{R} = (1-\sigma) \bar{R} + \sigma r_t$
>> $\hat{R}(s_t,a_t) = (1 - \rho) \hat{R} + \rho r_t$
>> $p_{T}(s_t, a_t, s_{t+1}) = (1 - \phi) p_{T}(s_t, a_t, s_{t+1}) + \phi$
>> Specific to Kalman Q-Learning:
>> $\Theta = \{ \theta_j,\ 0 \leq j \leq 2|S.A|\}$
>> $\check{W} = \{ w_j,\ 0 \leq j \leq 2|S.A| \}$
>> $\check{R} = \{ \check{r}_j = \theta_j(s_t,a_t) - \gamma \max\limits_{b \in A} \theta_j(s_{t+1},b),\ 0 \leq j \leq 2|S.A|\}$
>> $r_{predicted} = \sum\limits_{j=0}^{2|S.A|} w_j \check{r}_j$
>> $P_{\theta_j \check{r}_j} = \sum\limits_{j=0}^{2|S.A|} w_j (\theta_j - Q^{Habitual}_t)(\check{r}_j - r_{predicted})$
>> $P_{\check{r}_j} = \sum\limits_{j=0}^{2|S.A|} w_j (\check{r}_j - r_{predicted})^2 + P_n$
>> $K_t = P_{\theta_j \check{r}_j} P_{\check{r}_j}^{-1}$
>> $\delta_t = r_t - r_{predicted}$
>> $Q_{t+1}^{Habitual} = Q_{t}^{H} + K_t \delta_t$
>> $P_{t+1}^{H} = P_{t}^{H} - K_t P_{\check{r}_j} K_t^T$

As the algorithm is quite complex, it would be good to add more comments inside the algorithm, such as:

>> \# Sort the Q-values in descending order
>> $\{a_1,\ldots,a_i,\ldots\} \leftarrow sort(Q(s_t,a_i))$

Code

The code runs correctly on Linux Mint 17.3 (Python 2.7.6, numpy 1.8.2, matplotlib 1.3.1, scipy 0.13.3), with only the mentioned dependencies required. It reproduces the figure of the article as expected.

At the small cost of importing the future module at the beginning of run.py:

```python
from __future__ import print_function
```

and calling the print function like this:

```python
print(exp, nb_trials, deval_time)   # in run.py
print("Parameters not found")       # in models.py
```

as well as using range() instead of xrange():

```python
from __future__ import print_function
try:
    range = xrange  # Python 2: use the lazy xrange as range
except NameError:
    pass            # Python 3: range is already lazy
import numpy as np
...
for i in range(nb_blocs):
    for exp, nb_trials, deval_time in zip(['mod','ext'], [nb_iter_mod, nb_iter_ext], [deval_mod_time, deval_ext_time]):
        ...
        for j in range(nb_trials):
            ...
```

the code would run with both Python 2 and Python 3. I suggest making this change, as the README specifies "at least python 2.7.3", which includes Python 3.x.

run.py is well structured, with the parameters grouped and commented before the main loop is actually performed. However, the two other files contain many methods which are not documented at all. The method names are sufficiently well chosen that one understands what they are supposed to do.
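For readers who want to check the VPI formulas above numerically, here is a minimal sketch (not the authors' implementation), assuming Gaussian posteriors over the habitual Q-values with means `q` sorted in descending order and standard deviations `sigma`:

```python
import math

def vpi(q, sigma):
    """Value of perfect information for each action.

    q     : posterior means of the habitual Q-values, sorted in
            descending order (q[0] belongs to the best action a_1).
    sigma : posterior standard deviations, in the same order.
    """
    out = []
    for i, (qi, si) in enumerate(zip(q, sigma)):
        # Gain if the ranking changes: against a_2 for the best
        # action, against a_1 for every other action.
        diff = (q[1] - q[0]) if i == 0 else (qi - q[0])
        # Probability of that gain under a Gaussian posterior: Phi(diff/sigma)
        prob = 0.5 * (1.0 + math.erf(diff / (si * math.sqrt(2.0))))
        # Gaussian density term from the closed-form expectation
        gauss = si / math.sqrt(2.0 * math.pi) * math.exp(-diff**2 / (2.0 * si**2))
        out.append(diff * prob + gauss)
    return out

# With identical posteriors the gain term vanishes and the VPI
# reduces to sigma / sqrt(2*pi) for every action:
print(vpi([1.0, 1.0], [1.0, 1.0]))  # both entries ~ 0.3989
```

When the best action is far ahead of the others, `diff * prob` is a large negative gain times a small probability, so the VPI shrinks toward zero and the habitual values are used directly, as in the Selection step above.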
In order to allow users to adapt and extend your code, it would be great if you could add a docstring to each method, even if it sounds trivial, as in:

```python
def createQValuesDict(states, actions):
    """ Returns a dictionary containing the Q-values of all state-action pairs.

    The keys are either:

    * states, in which case the value is a list of action indices.
    * (state, action) tuples, in which case ...
    * (state, index) tuples, in which case ...

    Parameters:

    * states: list of states
    * actions: list of actions
    """
    values = {0: np.zeros(len(states)*len(actions))}
    tmp = 0
    for s in states:
        values[s] = []
        n = 0
        for a in actions:
            values[(s,a)] = tmp
            values[s].append(tmp)
            values[(s,n)] = a
            tmp = tmp + 1
            n = n + 1
    return values
```

Even inside a single method, it would be good for comprehension to add comments indicating what is currently being done, for example:

```python
def updateValue(self, reward):
    # Retrieve the current state, action and reward
    r = (reward==1)*1
    s = self.state
    a = self.action
    # Compute the sigma points
    sigma_points, weights = computeSigmaPoints(self.values[0], self.covariance['cov'], self.kappa)
    # Predicted rewards
    rewards_predicted = (sigma_points[:,self.values[(s,a)]]-self.gamma*np.max(sigma_points[:,self.values[s]], 1)).reshape(len(sigma_points), 1)
    reward_predicted = np.dot(rewards_predicted.flatten(), weights.flatten())
    # Covariances
    cov_values_rewards = np.sum(weights*(sigma_points-self.values[0])*(rewards_predicted-reward_predicted), 0)
    cov_rewards = np.sum(weights*(rewards_predicted-reward_predicted)**2) + self.var_obs
    # Kalman gain
    kalman_gain = cov_values_rewards/cov_rewards
    # Update the values
    self.values[0] = self.values[0] + kalman_gain*(reward-reward_predicted)
    # Update the covariance
    self.covariance['cov'][:,:] = self.covariance['cov'][:,:] - (kalman_gain.reshape(len(kalman_gain), 1)*cov_rewards)*kalman_gain
```

This may be redundant with the method names, but it allows a much faster comprehension of the code.
The code is released under the BSD license (as stated in the header of the Python files), but no LICENSE.txt file is present in the directory. I am not fully sure, but I think you should add one.

REVIEWER 2

## Article

The text explains the main goal of this replication and provides all the necessary parameters as well as the algorithm of the implementation. The table is not so clear. It would be better to put the parameters on separate rows in the first column, the comments regarding each parameter in a second column, and finally the numerical values of the parameters in a third column. It would be nice if the authors could explain (or speculate on) why there is a slight difference in time steps and probabilities of action (different parameters, different implementation of Kalman Q-learning).

## Source codes

The code runs smoothly on a Lenovo ThinkPad X250 laptop running Linux Fedora 22. It consumes ~44 MB of physical memory and takes ~274.0 s to finish execution and plot the results. The results are qualitatively similar to the ones in the original article.

Here are some comments regarding the source codes:

• Use range instead of xrange
• params dictionary in run.py is defined but never used.
• run.py file is well structured and commented.
• models.py and fonctions.py are well structured, but there are no comments at all. It is good to comment at least the functions and the basic methods of your classes.
• In models.py, for the two classes, you can use the configuration file parser (ConfigParser, see https://docs.python.org/2/library/configparser.html) for loading the parameters of the model.
• In models.py, fonctions.py and run.py there are long lines of code. It is better to break them into multiple lines.
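As an illustration of the ConfigParser suggestion, here is a minimal sketch; the section and parameter names below are hypothetical, not taken from the repository (under Python 2 the module is named `ConfigParser` rather than `configparser`):

```python
from configparser import ConfigParser

# Hypothetical parameters.cfg content, inlined for the example
example = """\
[model]
gamma = 0.95
beta = 1.0
eta = 0.0001
"""

config = ConfigParser()
config.read_string(example)  # or config.read('parameters.cfg') for a real file

# Build a plain dict of floats to pass to the model constructor
params = {name: float(value) for name, value in config.items('model')}
print(params['gamma'])  # -> 0.95
```

This keeps all parameter values in one editable file instead of hard-coding them in the class definitions.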

## Typos

In the introduction:
"We replicate the results of the first task i.e. the devaluation experiment with two states
and two actions." Put a comma before i.e.

The authors use a BSD license for their source code, although they do not provide any LICENSE file along with their sources. In addition, they have not used the proper BSD statement within their *.py files (see https://opensource.org/licenses/BSD-3-Clause).

Member

### rougier commented Feb 8, 2016

 EDITOR Dear @gviejo Could you address the reviewers' comments? For the code license, have a look at http://rescience.github.io/write/; you'll find explanations for including a license in your code repository.

### gviejo commented Feb 9, 2016

 AUTHOR @rougier I changed the repository according to the reviewers' comments, except for the use of ConfigParser. I chose to remove some functions related to the parameters dictionary that were not used. I added LICENSE.md to the code repository, but I am not sure I understand the comments about the LICENSE within the .py files.
Member

### vitay commented Feb 9, 2016

 REVIEWER 1 The changes are great, thanks. It is good to publish.

### gdetor commented Feb 9, 2016

 REVIEWER 2 Everything's fine. It can be published. Thank you.

Member

### rougier commented Feb 9, 2016

 EDITOR @gviejo Congratulations, your submission has been accepted. Please do not modify your repository until further notice.

Member

### rougier commented Feb 10, 2016

 This submission has been accepted for publication, and has been published at https://github.com/ReScience/ReScience/wiki/Current-Issue