# Review Request: Mondeel, Ogundipe, Westerhoff #41

Closed
wants to merge 19 commits into
from

## Conversation

Projects
None yet
5 participants

### ThierryMondeel commented Oct 26, 2017 • edited by ctb

AUTHOR

Dear @ReScience/editors,

I request a review for the following replication:

### Original article

Title: Predicting metabolic biomarkers of human inborn errors of metabolism
Author(s): T. Shlomi, M.N. Cabili, E. Ruppin
Journal (or Conference): Mol. Syst. Biol
Year: 2009
DOI: 10.1038/msb.2009.22
PDF: http://msb.embopress.org/content/msb/5/1/263.full.pdf

### Replication

Author(s): Thierry D.G.A. Mondeel, Vivian Ogundipe, and Hans V. Westerhoff
Repository: https://github.com/ThierryMondeel/ReScience-submission-Shlomi2009/tree/Mondeel-Ogundipe-Westerhoff-2017
PDF: https://github.com/ThierryMondeel/ReScience-submission-Shlomi2009/blob/Mondeel-Ogundipe-Westerhoff-2017/article/Mondeel_Ogundipe_Westerhoff-2017.pdf
Keywords: Flux variability analysis (FVA), inborn errors of metabolism (IEM), metabolic reconstruction
Language: Python
Domain: Systems biology, computational biology

### Results

• Article has been fully replicated
• Article has been partially replicated
• Article has not been replicated

### Potential reviewers

Potential editors: Timothée Poisot, C. Titus Brown, Karthik Ram
Potential reviewers: Simon van Heeringen, Gökcen Eraslan, Federico Vaggi, Aaron Shifman

EDITOR

• Editor acknowledgment (@ctb Oct 31, 2017)
• Reviewer 1 (@aaronshifman Nov 7, 2017)
• Reviewer 2 (@FedericoV Nov 8, 2017)
• Review 1 decision accept
• Review 2 decision accept
• Editor decision accept (@ctb Mar 25, 2018)

### ThierryMondeel added some commits Oct 26, 2017

 initial article commit 
 e8d15f5 
 gitignore 
 fa38712 
 License + manuscript edits + module versions 
 169cb95 
 Update of main ReadMe 
 bfe6186 
 Update the ReadMe 
 ce78559 
 bug fix in recon 1 map file name 
 dcbd885 
 ReadMe update 
 24df3c7 
 missing requirement for binder 
 eff4845 
 rename notebooks and move all data to the data folder 
 4ba2461 

Member

### rougier commented Oct 26, 2017

 Thanks for your submission, an editor will be soon assigned.
Member

### rougier commented Oct 26, 2017

 @ctb can you handle this submission ?

Member

### rougier commented Oct 29, 2017

 @ctb 🛎 Can you handle this submission ?

 yes.
Member

### rougier commented Nov 2, 2017

 @ctb Thanks. Did you find reviewers?
Member

 @ctb 🛎

### ctb commented Nov 6, 2017

 next on my list ;)

### ctb commented Nov 6, 2017

 @FedericoV would you be willing to review?

### ctb commented Nov 7, 2017

 @FedericoV bump. @simonvh @gokceneraslan @aaronshifman interested/available?

### aaronshifman commented Nov 7, 2017

 @ctb I'b be happy to do it

### FedericoV commented Nov 7, 2017

 Hi Titus, Metabolomics is not really my area of expertise, but I should still be able to take a look at it. I'm happy to help. Federico … On Tue, 7 Nov 2017 at 10:49 Aaron Shifman ***@***.***> wrote: @ctb I'b be happy to do it — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#41 (comment)>, or mute the thread .

### ctb commented Nov 8, 2017 • edited

 thank you both!

### aaronshifman commented Nov 21, 2017 • edited

@ThierryMondeel I've split this review into 3 independent sections for the purpose of clarity. Overall I found the code well written and presented. I had some small issues with the manuscript that should be relatively simple to address (only after sending this I noticed that latex doesn't render in the comments, so my apologies)

# Code presentation

The code in general is very well written and quite well commented and that is always appreciated, generally list comprehensions are less desirable than a vectorized statement - however given your use of cobra package these are unavoidable. I personally am also not a fan of the notebook format, but the fact the the critical component (the algorithm) is in a separate importable module more or less waives that concern. I do have some overall critiques of the code which should be quite simple to fix

• There are unused imports (for example from cobra import Metabolite, Reaction) which if they're unused should be removed for cleanliness.
• There are a few lines that are too long most of these arise from having a long comment at the end of a long print statement (see the except block from line 174 in findBiomarkers.py).
• There is a mix of tabs and spaces creating indents
• You also switch between multiline string (''') and lines of comments (#####), stick to the multiline strings
• Lastly increasing some space between lines would significantly improve visibility
• For most, if not all of these comments a quick auto-format in most IDEs should take care of them

# Manuscript

I'm not a metabalomics person, so feel free to tell me that some of these comments are not warranted. Nevertheless, I do believe that in places things could be made clearer.

First off, the introduction is very direct, FVA is never actually defined aside from a numerical algorithm, most of your definitions in the methods section (boundary metabolites, exchange reactions, \ldots) I think could go into the introduction. This would make a much more gradual introduction as well as bringing everyone up to the same place. Also setting up the problem a bit more (giving context) could help.

I do want to highlight a concern with the manuscript which is how the replication is implemented. My first concern is that you say you replicate figures 1, and 2 in the manuscript when in reality figure 1A is replicated and figure 2 is presented in a table that doesn't quite match. For example what the original text calls ARGININEMIA you call Arginase deficiency - I imagine they're synonymous (google seems to think so), but the closer you can make your replication to the original results the easier it is to have confidence in the replication. Secondly I don't understand why this was presented as a table instead of a figure as done in the original.

With regards to the flux variability calculation section, I found it confusing in places.

• Maximize $\pm v_i$, I assume that you just mean that $v_i$ can be positive or negative, in which case you can omit the $\pm$.
-When you introduce the concept you index by $i$, then later on you index by $k$, again I assume that these are representing the same thing - in which case keep the same indexing variable.
• $S$ is never defined in the manuscript
• With regards to $Z$, I follow your explanation to where you define $Z$ as a dot product and then I get lost, what is $c$, and what do you mean by "one or more fluxes". On the same vein, you define $Z_{max}$ in the text but $Z_{opt}$ appears in the equation - are these supposed to be that same thing.
• As one of the constraints you say that $\alpha_k \leq v_k \leq \beta_k$ where do $\alpha$ and $\beta$ come from?

I think this if these concerns were addressed that I would be better able to follow the IEM algorithm you present later.

My last concern is with your sensitivity analysis. While I applaud your going above and beyond so to speak, it seems to take up almost half of your replication. Given that this is a replication paper, I feel that the same result could have been attained with a few sentences (we tried some slightly permuted topologies \ldots different results \ldots method is only as good as the pathway annotation \ldots). Something like this would clarify that you did look into it but it's not the main focus (the replication is).

Lastly in your references some references use the form of page numbers 1000-1020 and some use the form 1000-20. If you could please make that consistent.

# Software replication

The code does run and does seem to replicate the results found in the manuscript. Due to my not quite understanding the algorithm, I'm having difficulty picking apart the code. However given the care put into putting together the code and that it does seem to replicate the results - my guess is that I won't find anything pathological with further analysis.

There is a part of this about the reference implementation being easy to use. I didn't have tqdm installed on any of my environments. Yes it does only take 15s to install, but it's an extra step - and at least for me it hasn't added much. Whether or not it's included doesn't sway my opinion in either direction - it's just something to consider.

### ctb commented Nov 22, 2017

 Thanks for the review @aaronshifman! @ThierryMondeel, minor suggestion, I've found pyflakes to be both simple and useful in identifying unused variables and imports.
Member

### rougier commented Dec 11, 2017

 @ctb @ThierryMondeel Any progress ?
 [Code] update following Aaron Shifman's suggestions 
- unused imports
- tabs v. spaces: spaces only
- fixed (some) long lines
- fixed (some) pylint warnings
mainly space after comma warnings
- some increases in whitespace for readability
 38e7069 

### ThierryMondeel commented Dec 18, 2017

 Apologies for the slow response. I have been on vacation for two weeks. @aaronshifman Thanks for the comments. I am addressing them. @ctb Is it preferable that I commit changes to address @aaronshifman's suggestions, or should I wait for @FedericoV to submit comments?

### aaronshifman commented Feb 17, 2018

Hello @ThierryMondeel thank you for the changes you made after the last round of revisions, I find that they make the paper much more understandable. I still have some minor comments

# Paper

• The word fraction is redundant

they found a 56% fraction of the biomarkers that are correctly predicted and a fraction of 76% of predicted biomarkers that were predicted correctly

• I'm not sure I understand this - is X originating in the cell? In which case it should read "where X is understood to originate from the cell ..."

where X is understood to originate from or efflux to the biofluids.

• In the explanation of the flux variability calculations since you describe the size of $S$ as $m\times r$ for r reactions, could you define r earlier on as the number of reactions (and hence the number of fluxes)

• In the same part, Thank you for the clarification, I understand what you are going for with the $\pm v_i$, however that notation is quite cumbersome. The original paper uses the notation Min $v_i$ or Max $v_i$, if you wanted a less wordy statement you might use something like $v_i^\pm$ as a notation and explain it in the text.

• In the same part, I'm not sure I follow why the sparseness of S is relevant. Since you never discuss numerical methods for spareness, you can remove that sentence.

• Thank you for the clarification about $c^Tv$ term, this now makers sense.

• There is a cross-ref error after table 1 ([tbl:PKU_results])

• As an aesthetic comment when you describe the reaction for PHETHPTOX2, I assume the line breaks are because the equation doesn't fit on one like, having the $\leftrightarrow$ off to the side looks slightly odd. If you centered the arrow and wrote it as $\updownarrow$ then it would look like an equation written vertically (the same applies to the following chemical equation)

• In your replication of the original figure 1, what you call "EX_M*" is called "V*" in the original figure also in panel B the original figure is V2,V1,... whereas your implementation goes EX_M1,EX_M2,... I appreciate that given different software packages it may be difficult to replicate exactly, but if you could change at least one of those features (naming convention or order) they would be more easily comparable because at first glance the replication does not look correct due to the ordering.

• Thank you for the changes to the table representing figure 2. Now that the names match the replication is quite clearer.

• As far as figure 3 goes the replication seems quite clear, and you identified a difference between the replication and the original (the WT interval), a bit more discussion may be warranted on potential sources of differences - or if that isn't possible how much of an effect this should have on the results.

# Code

• Overall the code if clean and fairly easy to read - at times it can be a bit heavy on the list comprehensions, but that's not the end of the world. I do however have a couple of concerns about the software implementation.

• I can't see where the figures are being created, I see you have illustrator files, I can't open them right now, but would I be wrong in guessing that you are drawing the figures based on the output from the model - that's fine but should be mentioned in the article.

• For the replication of figure 3, it would be beneficial to have another cell with the extracted values that you use (EX_Met__L_e for CBS and AHCY) so that it's more clear to see so that the reader doesn't have to search through a large table.

• Lastly the tqdm comment - it wasn't a major issue I'm fine leaving it in.

### ThierryMondeel added some commits Feb 20, 2018

 RE: Aaron 2.0 
- Small text changes
- medium is not -10 inward not -1. This results in correct predictions for Figure 3.
- Include text and code to show result for -1 related to Figure 3
- Fix ordering Figure 1
 cc9d69e 
 Additional explanation of the Figure 3 interval prediction 
 51c23f9 

### ThierryMondeel commented Feb 20, 2018

@aaronshifman Thanks! See below.

# Paper

The word fraction is redundant

they found a 56% fraction of the biomarkers that are correctly predicted and a fraction of 76% of predicted biomarkers that were predicted correctly

Yes, fixed.

I'm not sure I understand this - is X originating in the cell? In which case it should read "where X is understood to originate from the cell ..."

where X is understood to originate from or efflux to the biofluids.

Ok, I removed this sentence. The sentence following it should explain: "Positive flux indicates net secretion of $X$ by the cell and negative flux indicates the net uptake of $X$."

X is is typically the extracellular form of the metabolite. Take the example of lactate: it would be produced intracellularly, be transported outward across the membrane and then efflux through the exchange reaction signifying net production.

In the explanation of the flux variability calculations since you describe the size of $S$ as $m\times r$ for r reactions, could you define r earlier on as the number of reactions (and hence the number of fluxes)

Ok, slightly rephrased this now.

In the same part, Thank you for the clarification, I understand what you are going for with the $\pm v_i$, however that notation is quite cumbersome. The original paper uses the notation Min $v_i$ or Max $v_i$, if you wanted a less wordy statement you might use something like $v_i^\pm$ as a notation and explain it in the text.

Yes ok, I see your point. I reverted to the original paper's notation.

In the same part, I'm not sure I follow why the sparseness of S is relevant. Since you never discuss numerical methods for spareness, you can remove that sentence.

Ok, done.

Thank you for the clarification about $c^Tv$ term, this now makers sense.

Great!

There is a cross-ref error after table 1 ([tbl:PKU_results])

Good catch. Fixed.

As an aesthetic comment when you describe the reaction for PHETHPTOX2, I assume the line breaks are because the equation doesn't fit on one like, having the $\leftrightarrow$ off to the side looks slightly odd. If you centered the arrow and wrote it as $\updownarrow$ then it would look like an equation written vertically (the same applies to the following chemical equation)

Thanks, good suggestion, included now.

In your replication of the original figure 1, what you call "EX_M*" is called "V*" in the original figure also in panel B the original figure is V2,V1,... whereas your implementation goes EX_M1,EX_M2,... I appreciate that given different software packages it may be difficult to replicate exactly, but if you could change at least one of those features (naming convention or order) they would be more easily comparable because at first glance the replication does not look correct due to the ordering.

I agree, I fixed the ordering. I prefer to keep calling the reactions EX_M* since that is more standard in the field and more clearly emphasizes that they are exchange reactions.

As far as figure 3 goes the replication seems quite clear, and you identified a difference between the replication and the original (the WT interval), a bit more discussion may be warranted on potential sources of differences - or if that isn't possible how much of an effect this should have on the results.

Following this comment, I tried to think on what I could do to investigate what the source of the differences might be. Recall that I used to set the influx of medium components to -1. The reasoning was just that this worked to produce the results in Figure 2. Not so for Figure 3, as you and we pointed out.

So I tried changing the influx to -10. And what do you know! The Figure 2 results are still upheld and Figure 3 now gives the same predictions.

I adjusted the change to -10 in the text and I included the -1 vs. -10 simulations in the notebook of Figure 2 and discuss them in the text now.

Thanks! without your "push" I wouldn't have figured this out ;)

#Code

Overall the code if clean and fairly easy to read - at times it can be a bit heavy on the list comprehensions, but that's not the end of the world. I do however have a couple of concerns about the software implementation.
I can't see where the figures are being created, I see you have illustrator files, I can't open them right now, but would I be wrong in guessing that you are drawing the figures based on the output from the model - that's fine but should be mentioned in the article.

Yes you are correct. Figure 1 and Figure 3 in our reproduction are made in Illustrator based on the numerical results from the algorithm. I now mention this in the text.

For the replication of figure 3, it would be beneficial to have another cell with the extracted values that you use (EX_Met__L_e for CBS and AHCY) so that it's more clear to see so that the reader doesn't have to search through a large table.

I'm not sure what you have in mind here. The output is already in Figure 3B. The output of the algorithm is the 4 intervals depicted in black and red and I included the numerical values for the interval. No more information comes out of the algorithm. I simply mentioned the table that the notebook generates for readers that want to see exactly where the output is generated and whether I copied it correctly to the figure.

Lastly the tqdm comment - it wasn't a major issue I'm fine leaving it in.

Thanks!

Member

### rougier commented Mar 6, 2018

 @ctb What's the status of the submission?

### FedericoV commented Mar 12, 2018

 I apologize for the delay in reviewing the last set of changes. I'm completely satisfied with current state of the work - in particular, having findBiomarkers.py in the main repo makes it much easier to follow through the logic of the implementation. I would recommend accepting the reproduction.

### ctb commented Mar 12, 2018

 thank you!

### aaronshifman commented Mar 14, 2018

 @ThierryMondeel Thank you for the changes - I have no further comments on the code or manuscript. I have no issue recommending acceptance @ctb

### ctb commented Mar 14, 2018

 I would like to look over in the next day or two - thank you very much for your reviews!

### ThierryMondeel commented Mar 15, 2018

 @FedericoV @aaronshifman Thank you! @ctb Ok!

### ctb commented Mar 25, 2018

 Thanks, this is definitely an accept! Very nice discussion by all, thanks much - it made for great reading :). I am about to go on vacation for a week, and will attend to this next weekend...

### ThierryMondeel commented Mar 25, 2018 • edited

 @ctb Great news, thanks! Enjoy your vacation :)
Member

### rougier commented Mar 28, 2018

 @ThierryMondeel Congrats. @ctb Just ping me if you need help during the publication process
Member

### rougier commented Apr 9, 2018

 @ctb Are you back from vacation? Can you handle the publication? (You'll probably need help from me or @khinsen at some point).
Member

### rougier commented Apr 23, 2018

 @ctb Can you handle the publication?

### ThierryMondeel commented Apr 30, 2018

 @ctb @rougier Anything I can do to help things along?
Member

### rougier commented Apr 30, 2018

 I'll try to contact @ctb on twitter, he's more reactive there I think.
Member

### rougier commented May 9, 2018

 @ctb Ping!

### ctb commented May 11, 2018

 This weekend. Apologies for the delay, everyone.

Member

### rougier commented May 22, 2018

 @ctb @ThierryMondeel Re-opening the PR since the paper is not yet published.
Member

### rougier commented May 25, 2018

 @ctb Can you give us some update on the publication process?

### ctb commented May 29, 2018

 Published as 10.5281/zenodo.1254629