Trajectory inference methods
While not directly related to the accuracy of the inferred trajectory, the quality of a method's implementation is also an important evaluation metric¹. User-friendly tools can be easily installed, have an intuitive user interface, and contain in-depth documentation. Such tools are thus easy to apply to new datasets by both experienced and novice users. Developer-friendly tools can be easily adapted by other developers, who can expand the scope, scalability or accuracy of the tool and thus stimulate new developments in the field. Finally, future-proof tools show several indications that they will stand the test of time, for example by including a rigorous assessment of the accuracy and robustness of the method.
To assess these three qualities, we created a transparent checklist of important scientific and software development practices. Each item of this checklist belongs to an “aspect”, which is weighted by how often it was cited in a set of articles discussing good practices (Table 1). We also labelled each item according to whether it concerns user friendliness, developer friendliness or potential broad applicability.
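The aspect weights listed in Table 1 are consistent with a simple counting rule: one plus the number of cited articles that mention the aspect. For example, Open source is cited by seven articles and has weight 8, whereas Plotting is cited by none and has weight 1. The Python sketch below reproduces a few of these weights under that assumed rule; the dictionary is a hand-copied subset of Table 1, not code taken from the dynverse repositories.

```python
# Articles citing each aspect, hand-copied from Table 1 (a subset, for illustration).
CITING_ARTICLES = {
    "Open source": [2, 3, 4, 5, 6, 7, 8],
    "Version control": [2, 3, 4, 5, 6, 7],
    "Interface": [7],
    "Plotting": [],
}


def aspect_weight(citing_articles: list[int]) -> int:
    """Assumed rule: weight = 1 + number of good-practice articles citing the aspect."""
    return 1 + len(citing_articles)


weights = {aspect: aspect_weight(refs) for aspect, refs in CITING_ARTICLES.items()}
print(weights)  # {'Open source': 8, 'Version control': 7, 'Interface': 2, 'Plotting': 1}
```

Under this rule, aspects without any citing article (Plotting, Code coverage, Trajectory format, Prior information and Publishing) still receive a weight of 1 rather than being dropped.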
| Aspect | Category | Weight | References | Items (item weight) |
|---|---|---|---|---|
| Open source | availability | 8 | 2, 3, 4, 5, 6, 7, 8 | The method’s code is freely available (0.5). The code can be run on a freely available platform (0.5). |
| Version control | availability | 7 | 2, 3, 4, 5, 6, 7 | The code is available on a public version-controlled repository, such as GitHub (1). |
| Packaging | availability | 5 | 2, 5, 8, 7 | The code is provided as a “package”, exposing functionality through functions or shell commands (0.5). The code can be easily installed through a repository such as CRAN, Bioconductor, PyPI, CPAN, Debian packages, … (0.5). |
| Dependencies | availability | 5 | 4, 5, 6, 9 | Dependencies are clearly stated in the tutorial or in the code (0.5). Dependencies are automatically installed (0.5). |
| License | availability | 7 | 2, 4, 5, 6, 7, 8 | The code is licensed (0.5). The license allows academic use (0.5). |
| Interface | availability | 2 | 7 | The tool can be run using a graphical user interface, either locally or on a web server (0.5). The tool can be run through the command line or through a programming language (0.5). |
| Function and object naming | code_quality | 3 | 3, 5 | Functions/commands have well-chosen names (0.67). Arguments/parameters have well-chosen names (0.33). |
| Code style | code_quality | 4 | 3, 5, 6 | The code has a consistent style (0.5). The code follows (basic) good practices in the programming language of choice, for example PEP8 or the tidyverse style guide (0.5). |
| Code duplication | code_quality | 3 | 3, 5 | Duplicated code is minimal (1). |
| Self-contained functions | code_quality | 4 | 10, 4, 7 | The method is exposed to the user as self-contained functions or commands (1). |
| Plotting | code_quality | 1 |  | Plotting functions are provided for the final and/or intermediate results (1). |
| Dummy proofing | code_quality | 3 | 2, 9 | The package contains dummy proofing, i.e. it tests whether the parameters and data supplied by the user make sense and are useful (1). |
| Unit testing | code_assurance | 6 | 2, 3, 10, 5, 7 | The method is tested using unit tests (0.5). Tests are run automatically using functionality from the programming language (0.5). |
| Continuous integration | code_assurance | 5 | 11, 5, 6, 7 | The method uses continuous integration, for example on Travis CI (1). |
| Code coverage | code_assurance | 1 |  | The code coverage of the repository is assessed (1). The percentage of code covered by tests (1). |
| Support | code_assurance | 6 | 3, 5, 6, 7, 8 | There is a support ticket system, for example on GitHub (0.5). The authors respond to tickets and issues are resolved within a reasonable time frame (0.5). |
| Development model | code_assurance | 2 | 12 | The repository separates the development code from the master code, for example using separate git master and development branches (0.4). The repository has created releases, or several branches corresponding to major releases (0.4). The repository has branches for the development of separate features (0.2). |
| Tutorial | documentation | 6 | 5, 7, 8, 9, 13 | A tutorial or vignette is available (0.25). The tutorial has example results (0.25). The tutorial has real example data (0.25). The tutorial showcases the method on several datasets (one dataset = 0, two = 0.5, more than two = 1) (0.25). |
| Function documentation | documentation | 6 | 3, 4, 5, 7, 9 | The purpose and usage of functions/commands are documented (0.33). The parameters of functions/commands are documented (0.33). The output of functions/commands is documented (0.33). |
| Inline documentation | documentation | 6 | 3, 4, 5, 7, 9 | Inline documentation is present in the code (1). |
| Parameter transparency | documentation | 2 | 4 | All important parameters are exposed to the user (1). |
| Seed setting | behaviour | 2 | 14 | The method does not artificially become deterministic, for example by setting some (0.5) or many (1) seeds (1). |
| Unexpected output | behaviour | 2 | 6 | No unexpected output messages are generated by the method (0.25). No unexpected files, folders or plots are generated (0.25). No unexpected warnings are generated during runtime or compilation (0.5). |
| Trajectory format | behaviour | 1 |  | The post-processing necessary to extract the relevant output from the method is minimal (1), moderate (0.5) or extensive (0) (1). |
| Prior information | behaviour | 1 |  | Prior information is required (0), optional (1) or not required (1) (1). |
| Publishing | paper | 1 |  | The method is published (1). |
| Peer review | paper | 4 | 9, 15, 16 | The paper is published in a peer-reviewed journal (1). |
| Evaluation on real data | paper | 3 | 17, 18 | The paper shows the method’s usefulness on several (1), one (0.25) or no (0) real datasets (0.5). The paper quantifies the accuracy of the method given a gold or silver standard trajectory (0.5). |
| Evaluation of robustness | paper | 5 | 9, 17, 13, 18 | The paper assesses method robustness (e.g. to noise, subsampling, parameter changes, stability) in one (0.5) or several (1) ways (1). |
Table 1: Scoring checklist for tool quality control. Each quality aspect was given a weight based on how often it was mentioned in a set of articles discussing best practices for tool development (References column). The weight of each individual item within an aspect is given in parentheses.
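Within an aspect, the item scores can be combined into a single aspect score using the item weights given in parentheses. The Python sketch below does this for the Tutorial aspect, including its graded item on the number of datasets. Dividing by the total item weight is our own assumption, added so that aspects whose item weights do not sum exactly to 1 (such as three items of 0.33) still score between 0 and 1; the tool being scored is hypothetical.

```python
def dataset_count_score(n_datasets: int) -> float:
    """Graded Tutorial item: one dataset -> 0, two -> 0.5, more than two -> 1."""
    if n_datasets > 2:
        return 1.0
    if n_datasets == 2:
        return 0.5
    return 0.0


def aspect_score(item_scores: list[float], item_weights: list[float]) -> float:
    """Weighted mean of item scores (each in [0, 1]), normalised by the total item weight."""
    if len(item_scores) != len(item_weights):
        raise ValueError("each item needs exactly one weight")
    total_weight = sum(item_weights)
    return sum(s * w for s, w in zip(item_scores, item_weights)) / total_weight


# Hypothetical tool: it has a tutorial with example results and real data,
# and the tutorial shows the method on two datasets.
tutorial_items = [1.0, 1.0, 1.0, dataset_count_score(2)]
tutorial_weights = [0.25, 0.25, 0.25, 0.25]
print(aspect_score(tutorial_items, tutorial_weights))  # 0.875
```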
We made an initial assessment of the quality of each tool based on this score sheet. Next, we allowed the authors to respond and rebut through the GitHub issue system of our dynmethods repository (https://github.com/dynverse/dynmethods). After several adaptations, we created the final QC score for each method (Figure 1).
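As described in the legend of Figure 2, each category contributes equally to the final score, while aspects are weighted within their category. A minimal Python sketch of that aggregation follows, assuming aspect scores are already on a 0 to 1 scale. Only a subset of aspects is included, and the aspect scores are invented for illustration rather than taken from any evaluated method.

```python
from collections import defaultdict

# Category and weight of each aspect, copied from Table 1 (subset for brevity).
ASPECTS = {
    "Open source": ("availability", 8),
    "License": ("availability", 7),
    "Unit testing": ("code_assurance", 6),
    "Code coverage": ("code_assurance", 1),
}


def overall_qc_score(aspect_scores: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Weighted mean of aspect scores per category, then an unweighted mean over categories."""
    weighted_sums: dict[str, float] = defaultdict(float)
    weight_totals: dict[str, float] = defaultdict(float)
    for aspect, score in aspect_scores.items():
        category, weight = ASPECTS[aspect]
        weighted_sums[category] += weight * score
        weight_totals[category] += weight
    category_scores = {c: weighted_sums[c] / weight_totals[c] for c in weighted_sums}
    # Every category counts equally, regardless of how many aspects it contains.
    overall = sum(category_scores.values()) / len(category_scores)
    return overall, category_scores


# Hypothetical method: open source and licensed, half of the unit-testing items met,
# no code coverage assessment.
overall, per_category = overall_qc_score({
    "Open source": 1.0,
    "License": 1.0,
    "Unit testing": 0.5,
    "Code coverage": 0.0,
})
print({c: round(v, 3) for c, v in per_category.items()})  # {'availability': 1.0, 'code_assurance': 0.429}
print(round(overall, 3))  # 0.714
```

Averaging the category scores without further weighting keeps a category with many aspects, such as availability, from dominating the final score.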
Figure 1: Overall quality control score for each method
Only two tools reached a near-perfect quality control score (CellTrails and Slingshot), with only minor issues regarding the absence of a graphical user interface or of separate development branches (Figure 2). Most tools reached a score between 0.5 and 0.7, with several QC items consistently lacking among them (Figure 2, right), mostly concerning code assurance and the depth with which the tool is evaluated in its paper. Only four methods scored lower than 0.5 (CALISTA, FORKS, reCAT and Waterfall), with issues across all categories.
Figure 2: Overview of the quality control scores for every tool. Shown is the score given to each method on every item of our quality control score sheet. Each aspect of the quality control belongs to a category, and each category was weighted so that it contributes equally to the final quality score. Within each category, each aspect received a weight depending on how often it was mentioned in a set of papers discussing good practices in tool development and evaluation; this weight is represented in the plot as the height on the y-axis. Top: average QC score for each method. Right: average score of each quality control item, with the items that had an average score lower than 0.5 shown in more detail.
1. Does your code stand up to scrutiny? Nature 555, 142 (2018).
2. Leek, J. Rpackages: R package development - the Leek group way! (2017).
3. Wilson, G. et al. Best Practices for Scientific Computing. PLOS Biology 12, e1001745 (2014).
4. Taschuk, M. & Wilson, G. Ten simple rules for making research software more robust. PLOS Computational Biology 13, e1005412 (2017).
5. Wickham, H. R Packages: Organize, Test, Document, and Share Your Code. (“O’Reilly Media, Inc.”, 2015).
6. Artaza, H. et al. Top 10 metrics for life science software good practices. F1000Research 5, 2000 (2016).
7. Silva, L. B., Jimenez, R. C., Blomberg, N. & Luis Oliveira, J. General guidelines for biomedical software development. F1000Research 6, (2017).
8. Jiménez, R. C. et al. Four simple recommendations to encourage best practices in research software. F1000Research 6, (2017).
9. Karimzadeh, M. & Hoffman, M. M. Top considerations for creating bioinformatics software documentation. Briefings in Bioinformatics doi:[10.1093/bib/bbw134](https://doi.org/10.1093/bib/bbw134)
10. Anderson, A. Writing Great Scientific Code. (2016).
11. Beaulieu-Jones, B. K. & Greene, C. S. Reproducibility of computational workflows is automated using continuous analysis. Nature Biotechnology 35, nbt.3780 (2017).
12. Driessen, V. A successful Git branching model. nvie.com (2010).
13. Boulesteix, A.-L. Ten Simple Rules for Reducing Overoptimistic Reporting in Methodological Computational Research. PLOS Computational Biology 11, e1004191 (2015).
14. Puget, J. F. Green dice are loaded (welcome to p-hacking). (2016).
15. Gannon, F. The essential role of peer review. EMBO Reports 2, 743 (2001).
16. Baldwin, M. In referees we trust? Physics Today 70, 44–49 (2017).
17. Aniba, M. R., Poch, O. & Thompson, J. D. Issues in bioinformatics benchmarking: The case study of multiple sequence alignment. Nucleic Acids Research 38, 7353–7363 (2010).
18. Jelizarow, M., Guillemot, V., Tenenhaus, A., Strimmer, K. & Boulesteix, A.-L. Over-optimism in bioinformatics: An illustration. Bioinformatics 26, 1990–1998 (2010).
Figure 1: Number of trajectory inference tools over time
Figure 2: Computer languages in which these TI tools are written
Figure 3: Number of tools able to predict a particular trajectory type over time
Figure 4: Number of tools fixing the topology over time