\documentclass[12pt]{article}
\usepackage{f1000_styles}
\usepackage[numbers]{natbib}
\usepackage{hyperref}
\hypersetup{colorlinks=false, pdfborderstyle={/S/U/W 1}, pdfborder={0 0 1}}
\usepackage{url}
\usepackage{setspace}
\usepackage{booktabs}
\usepackage{doi} % turn DOIs into links
\usepackage{academicons} % for ORCID icon, via https://tex.stackexchange.com/a/534476/212961
\definecolor{idcolor}{HTML}{A6CE39}
\newcommand{\orcid}[1]{\href{https://orcid.org/#1}{\color{idcolor}\aiOrcid}}
\newcommand{\email}[1]{\href{mailto:#1}{#1}}
\newcommand{\rev}[1]{#1}
%\newcommand{\rev}[1]{\textit{#1}}
%\newcommand{\rev}[1]{\textcolor{blue}{#1}}
\usepackage{fontawesome}
\newcommand{\github}[1]{\href{https://github.com/#1}{\faGithub}}
\newcommand{\twitter}[1]{\href{https://twitter.com/#1}{\faTwitter}}
\onehalfspacing
\begin{document}
\title{CODECHECK: an Open Science initiative for the independent
execution of computations underlying research articles during peer
review to improve reproducibility}
\author[1,$\ast$]{Daniel~N\"{u}st \orcid{0000-0002-0024-5046}}
\author[2,$\ast$]{Stephen~J~Eglen \orcid{0000-0001-8607-8025}}
\affil[1]{Institute for Geoinformatics, University of M\"unster, Germany; \email{daniel.nuest@uni-muenster.de}}
\affil[2]{Department of Applied Mathematics and Theoretical Physics, University of Cambridge, UK; \email{sje30@cam.ac.uk}}
\affil[$\ast$]{Corresponding authors}
\maketitle
\begin{abstract}
The traditional scientific paper falls short of effectively communicating computational research. To help improve this situation, we propose a system by which the computational workflows underlying research articles are checked. The CODECHECK system uses open infrastructure and tools and can be integrated into review and publication processes in multiple ways. We describe these integrations along multiple dimensions (importance, who, openness, when). In collaboration with academic publishers and conferences, we demonstrate CODECHECK with 25 reproductions of diverse scientific publications. These CODECHECKs show that asking for reproducible workflows during a collaborative review can effectively improve executability. While CODECHECK has clear limitations, it may represent a building block in Open Science and publishing ecosystems for improving the reproducibility, appreciation, and, potentially, the quality of non-textual research artefacts. The CODECHECK website can be accessed here: \url{https://codecheck.org.uk/}.
\end{abstract}
\subsection*{Abbreviations}
ACM: Association for Computing Machinery;
ECRs: Early Career Researchers;
RCR: Replicated Computational Results;
TOMS: Transactions on Mathematical Software.
\clearpage
\section*{Introduction}\label{introduction}
Many areas of scientific research use computations to simulate or
analyse their data. These complex computations are difficult to
explain coherently in a paper \citep{marwick_how_2015}. To complement
the traditional route of sharing research by writing papers, there is
a growing demand to share the underlying artefacts, notably code and
datasets, so that others can inspect, reproduce or expand that work
(see Figure~\ref{fig:inverse}). Early proponents of this initiative
were Buckheit and Donoho
\cite{buckheit_wavelab_1995,claerbout_electronic_1992}, who noted:
\emph{``An article about computational science in a scientific
publication is \textup{not} the scholarship itself, it is merely
\textbf{advertising} of the scholarship. The actual scholarship is
the complete software development environment and the complete set
of instructions which generated the figures.''}
If researchers start sharing more artefacts, how might these artefacts
be examined to ensure that they do what they claim? For example,
although scientific journals now require a data sharing statement that
outlines what data the authors have shared (or will share), journals
implement this differently. On one hand, journals have
been created to accept ``data papers''
(e.g.,~\href{https://www.nature.com/sdata/}{\emph{Scientific~Data}},
\href{https://essd.copernicus.org/}{\emph{Earth System Science Data}},
\href{https://rmets.onlinelibrary.wiley.com/journal/20496060}{\emph{Geoscience
Data Journal}}, \href{https://bdj.pensoft.net/}{\emph{Biodiversity
Data Journal}},
\href{https://openpsychologydata.metajnl.com/}{\emph{Journal of Open
Psychology Data}}, \href{https://odjar.org/}{\emph{Open Data
Journal for Agricultural Research}},
\href{https://openhealthdata.metajnl.com}{\emph{Journal of Open Health
Data}}); these journals have established rigorous procedures by
which data are validated according to standards in each field. On
the other hand, many journals still allow authors to
state ``Data available upon reasonable request''. Authors, while
possibly well intentioned at the time of writing the article, often
cannot provide data when requested as data disappears over time
\cite{Vines2014-hf}.
Given that data are not routinely shared, what hope might there be for
sharing computer programs? Both data and software are required to
validate a computational analysis; data can be seen as inert whereas
code requires an environment to be run in. This makes software harder
to share. Our experience is that researchers offer several reasons
for why code is not shared, e.g.,~``there is no documentation'', ``I
cannot maintain it'', or ``I do not want to give away my code to
competitors''. Our view is that sharing code, wherever possible, is
good for the community and the individual
\cite{Barnes2010-iv,markowetz_five_2015}. Having code and data openly
available, and archived, provides a valuable resource for others to
learn from, even if the code is broken or lacks documentation.
However, we believe that if an independent person can re-run the
programs with a little effort, this is worth documenting, and that
doing so reduces the barrier to evaluating non-text research materials.
Just as data journals' validations of data and all journals' peer
review provide a ``baseline reassurance'', i.e.,~that a paper has
been checked by someone with an understanding of the topic
\cite{fyfe_mission_2019}, the same baseline could be provided for the
\rev{computational} workflow underlying a paper. With this in mind, we have developed a
set of principles and an example \rev{CODECHECK} workflow that provides a pragmatic
way of checking that a paper's code works, i.e., it is reproducible
following the Claerbout/Donoho/Peng terminology
\cite{barba_terminologies_2018}.
Here we offer a thorough description of a process and its variations
to integrate a much-needed evaluation of computational reproducibility
into peer review, and we demonstrate its feasibility by means of 25
reproductions across scientific disciplines. We call this system
CODECHECK.
\begin{figure}
% emojis: https://github.com/googlefonts/noto-emoji
% cross mark: u274c
% check mark: u2714
\centering
\includegraphics[width=0.7\textwidth]{figs/rr.pdf}
\caption{The inverse problem in reproducible research. The left
half of the diagram shows a diverse range of materials used
within a laboratory. These materials are often then
condensed for sharing with the outside world via the
research paper, a static PDF document. Working backwards from the
PDF to the underlying materials is impossible. This prohibits reuse
and is not only non-transparent for a specific paper but is also
ineffective for science as a whole. By sharing the
materials on the left, others outside the lab can enhance this work.}
\label{fig:inverse}
\end{figure}
\section*{What is a CODECHECK?}\label{what-is-a-codecheck}
\subsection*{\rev{CODECHECK w}orkflow and people}\label{workflow-people}
CODECHECK is best demonstrated by way of our example workflow, and later
we expand on the underlying principles. The \rev{CODECHECK} workflow involves three
groups of people:
(1) the \textbf{author} of a paper providing the code to be checked,
(2) the \textbf{publisher} of a journal interested in publishing the author's paper, and
(3) the \textbf{codechecker}, who checks that the author's code works.
The six-step \rev{CODECHECK} workflow we have refined is shown in
Figure~\ref{fig:workflow}. In this article, we also refer to a
\textbf{peer-reviewer} who is independent of this process, and
performs the traditional academic review of the content of an article.
\begin{figure}
% original figure source: https://docs.google.com/drawings/d/1tGOZFNZle-oE1Ynw_tLZFmNcv3wgxD0HPs5bXzDOrng/edit
% emojis: https://github.com/googlefonts/noto-emoji and https://github.com/twitter/twemoji
% emoji codes:
% book stack: 1f4da
% package: 1f4e6
% file drawer: 1f5c4
% disk: 1f4be
% document: 1f4c4
% phd: 1f393
% detective: u1f575
% laptop: 1f4bb
% folder: 1f4c1 1f4c2
% receipt: 1f9fe
\centering
\includegraphics[width=\textwidth]{figs/codecheck_overview.pdf}
\caption{The CODECHECK example \rev{workflow} implementation.
Codecheckers act as detectives: They investigate and record,
but do not fix issues. Numbers in bold refer to steps
outlined in the text.}
\label{fig:workflow}
\end{figure}
\textbf{Step 1:} The author submits their manuscript along with the
code and data to the publisher. The code and data need not be openly
available at this point. However, in many cases the code and data may
be published on a code hosting platform, such as GitHub or
GitLab. Ideally, the author is expecting the CODECHECK and prepares
for it, e.g.,~by asking a colleague to attempt a reproduction, and
providing a set of instructions on how to re-run the \rev{computational} workflow.
\textbf{Step 2:} The publisher finds a codechecker to check the
code. This is analogous to the publisher finding one or more
peer-reviewers to evaluate the paper, except we suggest that the
codechecker and the author talk directly to each other.
\textbf{Step 3:} The codechecker runs the code, based on instructions
provided by the author. They check if some or all of the results from
the paper can be reproduced. If there are any problems running the
code, the codechecker asks the author for help, updates, or further
documentation. The burden to provide reproducible material lies with
the author. The codechecker then tries to run the code again. This
process iterates until either the codechecker is successful, or the
codechecker concludes the \rev{paper's} workflow is not reproducible. As part of
this process, the codechecker could work entirely locally, relying on
their own computing resources, or in the cloud, e.g.,~using the open
MyBinder infrastructure \cite{jupyter_binder_2018} or alternatives,
some of which are more tailored to scientific publications while
others offer commercial options for, e.g.,~publishers
(cf. \cite{konkol_publishing_2020}). A cloud-based infrastructure
allows for the codechecker and author to collaboratively improve the
code and enforces a complete definition of the computing environment;
but, unless secure infrastructure is provided, e.g.,~by the publisher,
this requires the code and data to be published openly online. Note
that the task of the codechecker is to check only the ``mechanics'' of the
\rev{computational} workflow. In the context of
mathematics, Stodden et~al. \cite{stodden_setting_2013} distinguish
between \emph{verification} and \emph{validation}; following their
definition, a CODECHECK ensures verification
%\footnote{Note this is a different use from that referred to as the \emph{Verification Reports} article type in the journal \emph{Cortex} \cite{chambers_verification_2020}.},
of computational results, i.e.,~checking that code generates the
output it claims to create, but not a validation, i.e.,~checking that
the code implements the right algorithm to solve the specific research
problem. Nevertheless, simply attempting to reproduce an output may
highlight a submission's shortcomings in meeting a journal's
requirements (cf.~\cite{christian_journal_2020}) and may effectively
increase transparency, thereby improving practices
(cf.~\cite{nosek_scientific_2012}) even if the check does not go into
every detail.
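To make this step concrete: when using MyBinder, the computing
environment can be pinned down by a declarative specification file in
the code repository. The following \texttt{environment.yml} is a
minimal sketch; package names and versions are illustrative examples,
not taken from any checked paper:
\begin{verbatim}
# environment.yml -- read by MyBinder (repo2docker) to build the
# container in which the workflow is executed
channels:
  - conda-forge
dependencies:
  - python=3.8       # pin the interpreter version
  - numpy=1.19       # pin the analysis dependencies
  - matplotlib=3.3   # needed to regenerate the figures
\end{verbatim}
Pinning versions in such a file lets author and codechecker run the
code in the same environment, removing a common source of failed
reproductions.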
\textbf{Step 4:} The codechecker writes a certificate stating how the
code was run and includes a copy of outputs (figures or tables) that
were independently generated. The certificate may include
recommendations on how to improve the material. The free text in the
certificate can describe exactly what was checked, because each
\rev{computational} workflow is unique. Because no specific tool or platform is required,
so that no authors are excluded, the codechecker cannot rely on
automation or fixed checklists.
\textbf{Step 5:} The certificate and auxiliary files created during
the check, e.g.,~a specification of a computing environment, data
subsets or helper scripts, and the original code and data get
deposited in an open archive unless restrictions (data size, license
or sensitivity) apply. Currently, codecheckers deposit the material
on Zenodo themselves, but a publisher may complete this step after
integrating CODECHECK into its review process. A badge or other
visual aid may be added to the deposit and the paper, linking to the
certificate. Although a badge simplifies the CODECHECK into a binary
value and risks introducing confusion regarding the extent of the
check, a badge provides recognition value and acknowledges the
completed CODECHECK. The badge and the actual check are incentives
for undertaking the effort needed to provide a reproducible workflow.
\textbf{Step 6:} The publisher can, depending on the timing, provide
the certificate to peer-reviewers or editors or publish it and link
between certificate, paper, and any repositories. Currently, the
codechecker creates these connections on Zenodo. They appear as links
with a relationship type on the Zenodo landing page for a certificate,
e.g., the ``related identifiers'' and ``alternate identifiers'' of
certificate \texttt{2020-025} \cite{cert-2020-025}. The publisher
also credits the codechecker's work by depositing the activity
in scholarly profiles, such as ORCID (see peer review contributions in
\href{https://support.orcid.org/hc/en-us/articles/360006971333-Peer-Review}{ORCID records}).
The publisher also ensures proper publication metadata, e.g.,~links
from the certificate repository to the published paper or the original
code repository.
\subsection*{Variations}\label{variations}
\subsubsection*{Dimensions of CODECHECK workflows}\label{dimensions-of-workflows}
Our workflow is just one of many possible CODECHECK workflows.
Here we consider several dimensions in a space of possible
CODECHECK workflows (Figure~\ref{fig:dimensions}). These aspects touch
on timing, responsibilities, and transparency.
\begin{figure}
% original figure source: https://docs.google.com/drawings/d/1q2EdW3Ad-IpcD-CeoR1eIFMJW2fF6OVmDhCtCUmLO1k/edit
\centering
\includegraphics[width=0.8\textwidth]{figs/codecheck_dimensions.pdf}
\caption{The dimensions of implementing a CODECHECK \rev{workflow}.}
\label{fig:dimensions}
\end{figure}
\subsubsection*{When to do a CODECHECK and with what importance?}
\label{when-to-do-a-codecheck}
The time at which a CODECHECK is done and its ascribed importance are
closely connected, so we describe the dimensions \emph{When} and
\emph{Importance} together. The earlier a CODECHECK happens in the
publishing process, the more it can affect editorial decisions: Is a
paper published, sent back for revisions, or rejected? Even earlier
checks, i.e.,~a CODECHECK of a preprint, may help to improve the
\rev{computational} workflow itself, even before a publisher is involved. As such,
codechecking papers could be part of a preprint server's policy or
initiated by interested authors.
Publishers could introduce a CODECHECK as a \textbf{strict prerequisite}.
As this can reduce the workload of reviewers, such a check should occur early in the review process.
Yet, the later in the review process the check happens, the easier it
is to allow bidirectional communication between the author and
codechecker, e.g., because the author might already be notified of the
paper's acceptance and may be more willing to share materials online closer to the paper's publication date.
A \textbf{pre-review} CODECHECK means editors would send a submission
for peer review only if it passes the check, or include the certificate in the submission package provided to peer-reviewers.
Peer-reviewers may then judge the relevance of the computations for the results of the work.
A CODECHECK may also be conducted in \textbf{parallel} to the academic
peer review. This puts less burden on the turnaround time for the
CODECHECK, yet it only makes the outcomes available during the final
consideration by the handling editor. The check could also be
assigned after suggestion by a reviewer, which would remove the need
for submissions to undergo a pre-review screening. However,
soliciting such a ``specialist review'' is much less desirable than
having a regular CODECHECK, thus avoiding the situation in which some
submissions get special treatment. In both cases, the editor's
decision could be based both on CODECHECK and peer-review reports.
A \textbf{post-acceptance} CODECHECK would have the smallest impact on
editorial decisions and may simply provide \textbf{extra merit} on top
of the submission's acceptance. This is the least impactful solution
in which all material is still evaluated and the results of the check
are properly acknowledged, because the check can be completed before
publication of the paper. The GIScience checks (see
below) fall into this category: by displaying a badge on the volume
and article landing pages, the AGILE conference highlights articles
whose reproducibility was confirmed. Similarly, in
collaborations with journals, some GIScience articles were
checked whilst authors worked on revisions.
A CODECHECK may also be conducted \textbf{post-publication}, though
this requires an update to the article and article metadata to
reference the check so that readers can find the CODECHECK. In
general, publishers hesitate to make such revisions to published
articles. We do not prefer this option as it has the least impact on
current publishing practices and downplays the importance
of reproducible workflows for ensuring good scientific practice.
%The timing of a CODECHECK needs connecting to typical scholarly communication process.
%While ideas exist for radically changing how researchers can share their work
% \footnote{For example Octopus, \url{https://science-octopus.org/about}, or Hypergraph, \url{https://www.libscie.org/hypergraph}.},
% % mimosa is another platform idea, but no real prototype AFAICT:
% % https://projects.invisionapp.com/share/E9Z3RKE7W3P#/screens/436978924
% % https://docs.google.com/presentation/d/1V_K8hghgnvGfEtW7TdwtowTveZyy4Qs8rMemNyHx_Dg/edit#slide=id.p
% an evolutionary approach has value, and we do not yet suggest to completely eliminating research papers as a way of sharing work.
Enhancing existing \rev{review and publication} processes with CODECHECKs allows communities to
gradually transition towards more open practices.
When integrating a CODECHECK into existing review and publication processes, the \emph{turnaround time} is crucial.
Depending on when and who conducts the check, it might be done quickly
or it might delay publication.
We found that a CODECHECK generally takes 2--5 hours, with some outliers on the higher end.
This time includes writing and publishing the certificate but excludes
actual computation time, some of which took days.
These efforts are comparable to the time needed to peer review a submission,
which aligns with the efforts some volunteer codecheckers are willing to
make.
Currently, there is a considerable amount of communication about the \rev{CODECHECK workflow}, especially regarding who publishes which document when, so that proper
cross-referencing between paper and certificate is ensured via persistent
identifiers. When integrated into a peer review platform, this
handling of documents should become much more streamlined.
\subsubsection*{Openness, or ``Who knows who?''}\label{who-knows-who}
Anonymity is broadly discussed, especially in the push towards
open peer review as part of the Open Science movement
(cf. \cite{ross-hellauer_guidelines_2019}).
Without taking a strong stance on this topic, our motivation
behind CODECHECK
for higher transparency and reproducibility does indeed favour a more open
review process.
However, anonymity can protect individuals
\cite{tennant_limitations_2020}, e.g.,~junior scientists.
The negative effects of a signed review may be reduced if a CODECHECK is not relevant for a journal's decision to accept or reject, but that is, of course, not desirable when the goal is higher transparency and reproducibility.
Instead, CODECHECK is a technical process that should generally find fixable problems; it is not aimed at giving an opinion or identifying a faulty approach.
If passing a CODECHECK becomes mandatory, full transparency may need
revisiting as the relations between authors and codecheckers would fall under the
same social and community challenges as open peer review
(cf.~\cite{everythinghertz123}).
% previous sentence inspired by https://osf.io/9cftx/
The technical nature of the check and the challenge of providing sufficient documentation is why we see great benefits in bidirectional communication between author and codechecker.
Instead of trying to
fix problems or guess the next step, the codechecker can ask the author to
rework the documentation or update code.
Instead of struggling to provide perfect instructions and as a result possibly not sharing any code or data, the author can make a best effort to document sufficiently.
Authors and readers can profit from a codechecker's experience and approach, as during the check they may create useful and instructive files, e.g.,~a machine-readable computing environment specification.
While communication between author and codechecker may be anonymised via
the publisher, it most likely only helps to protect the identity of
the codechecker, because code is hard to anonymise.
Therefore, the most effective and desirable situation for the
stakeholders is to hold an open and collaborative CODECHECK.
The contributions by the codechecker may even be integrated into
the code of the \rev{paper's} workflow and be acknowledged as code commits. This way,
proper credit can be given within the research software development community.
\subsubsection*{Who does the CODECHECK?}\label{who-does-the-codecheck}
Just as with peer-reviewers, a potential codechecker should have the
right skills and availability to do the work. Ideally, the
codechecker has a matching code \emph{and} domain expertise to the
paper, although a well-documented \rev{analysis} should be executable by any
computationally-competent person. Naturally, the more
prerequisite knowledge the codechecker has, the quicker they can
understand the goals and mechanics of an analysis. From our
experiences, the priority should be given to matching technical
expertise first, as lacking knowledge in setting up a computing
environment with a particular language or tool is much more of a
problem than assessing the outcome, e.g.,~comparing created figures
with the original, without an in-depth understanding of the domain.
The depth of the check will mostly be driven by the time available and
expertise of the checker, though in general, we expect a CODECHECK
to consider reproducibility of the results above performance of the code.
Codecheckers could be drawn from a regular pool of
\textbf{peer-reviewers}, or from a special group of
\textbf{reproducibility reviewers} via specific roles such as
\textbf{reproducibility editors}, or \textbf{editorial staff} with a
publisher. One codechecker is sufficient to verify the \rev{paper's} workflow since
it is mostly a factual process. Code usually harbours systematic and
repeatable mistakes and is thereby more reliable and auditable than
processes controlled by humans \cite{tibav:42484}, e.g.,~in a
laboratory. If however publication of the paper depends on the
CODECHECK, a second opinion may be required.
We also see a great opportunity to involve \emph{early-career
researchers} (ECRs) as codecheckers. ECRs arguably have a high
interest in learning about new tools and technologies, to build up
their own expertise. CODECHECK offers a way for ECRs to gain insights
into new research and highlight the importance of reproduction.
\emph{ReScience X}, a journal devoted to reproduction and replication
experiments \cite{roesch_new_2020}, shares an interest in this
combination. ECRs are also often familiar with new technologies,
making them likely to author CODECHECK-ready manuscripts. A
supporting data point for ECRs as early adopters is that they are
responsible for 77\% of 141 registered reports that were submitted
\cite{chambers_registered_2019}. As ECRs are introduced to peer
review as codecheckers, they may transition into the role of
peer-reviewer over time. Overall, we see several opportunities
and benefits to setting up a new process for codechecking with a clear
commitment to openness and transparency, independent of the current
peer review process (see \emph{Openness} dimension).
The codechecker could be a member of \textbf{editorial staff}; this is
the most controlled but also resource-intensive option. Such a
resource commitment would show that publishers are investing in
reproducibility, yet this commitment may be hard for small publishers.
These codecheckers could be fully integrated into the internal \rev{publication process}.
Credit for doing the check is implicit, as it is part of
their duties. By contrast, it is useful for researchers to be
publicly credited for their reviewing activity. A regular review
may be listed in public databases (e.g.,~ORCID, see Step~6 above,
or commercial offerings such as \href{https://publons.com/about/home/}{Publons}, and
\href{https://www.reviewercredits.com/}{ReviewerCredits});
a codechecker could be similarly listed.
% The number of scientific reviews is of course only a coarse indicator of an individual's community contribution, yet it is unclear if a regular reviewer, who conducts a regular scientific review and in addition also conducts a CODECHECK, should be credited twice.
% In this case, the published CODECHECK certificate and possibly the published peer review can clearly describe the possibly large efforts put into both.
The codechecker community has over 20 volunteers who signed up in the last
year, see \url{https://github.com/codecheckers/codecheckers/}. Their
motivations, mentioned in the \href{https://github.com/codecheckers/codecheckers/labels/registration}{registration
information},
include: supporting reproducible research and Open Science, improving
coding skills, gaining experience in helping scientists with their
code, encouraging a sharing culture, and learning from other people's
mistakes; many are also motivated simply by curiosity. We see
benefits to an open shared list of codecheckers across journals rather
than a private in-house group, as this may allow for better matches
regarding expertise and workload sharing.
This community can establish CODECHECK as a viable option
for independent no-cost Open Access journals.
\section*{Core principles}\label{core-principles}
The \rev{CODECHECK} workflow and variations outlined describe our current
views on how code could be checked. They are not immutable, but
we believe the following core principles underpin our CODECHECK
\rev{workflow}:
\textbf{1. Codecheckers record but don't investigate or fix.} \\
The codechecker follows the author's instructions to run the code. If
instructions are unclear, or if code does not run, the codechecker
tells the author. We believe that the job of the codechecker is not to
fix these problems but simply to report them to the author and await a
fix. The level of documentation required for third parties to
reproduce a \rev{computational} workflow is hard to get right, and too often this
uncertainty leads researchers to give up and not document it at all.
The conversation with a codechecker fixes this problem.
\textbf{2. Communication between humans is key.} \\
Some code may work without any interaction, e.g.,~\cite{cert-2020-013},
but often there are hidden dependencies that need adjusting for a
particular system. Allowing the codechecker to communicate directly
and openly with the author makes this process as constructive as
possible; routing this conversation (possibly anonymously) through a
publisher would introduce delays and inhibit community building.
\textbf{3. Credit is given to codecheckers.} \\
The value of performing a CODECHECK is comparable to that of a peer
review, and it may require a similar amount of time. Therefore, the
codechecker's activity should be recorded, ideally in the published
paper. The public record can be realised by publishing the
certificate in a citable form (i.e.,~with a DOI), by
listing codecheckers on the journal's website or, ideally, by
publishing the checks alongside peer review activities in public
databases.
\textbf{4. \rev{Computational} workflows must be auditable.} \\
The codechecker should have sufficient material to validate the
\rev{computational} workflow outputs submitted by the
authors. Stark~\cite{stark_before_2018} calls this
``preproducibility'' and the ICERM report \cite{stodden_setting_2013}
defines the level ``Auditable Research'' similarly. Communities can
establish their own good practices or adapt generic concepts and
practical tools, such as publishing all building blocks of science in
a research compendium (cf.~\url{https://research-compendium.science/})
or ``repro-pack'' \cite{barba_praxis_2018}. A completed check means
that code could be executed at least once using the provided
instructions, and, therefore, all code and data were provided and could be
investigated more deeply or extended in the future. Ideally, this is
a ``one~click'' step, but achieving this requires particular skills
and a sufficient level of documentation for third
parties. Furthermore, automation may lead to people gaming the system
or reliance on technology, which can often hide important details.
All such aspects can reduce the understandability of the material, so
we consider our approach to codechecking, done without automation and
with open human communication, to be a simple way to ensure long-term
transparency and usefulness. We acknowledge that others have argued
in favour of bitwise reproducibility because, in the long run, it can
\rev{help to automate checking by comparing outputs algorithmically} (e.g.,
\url{https://twitter.com/khinsen/status/1242842759733665799}), but
until \rev{such an ideal is achievable} we need CODECHECK's approach.
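As an illustration of auditable material, a research compendium or
``repro-pack'' commonly follows a layout along these lines (a generic
sketch, not a required structure):
\begin{verbatim}
compendium/
|-- README.md        # entry point: what to run, in which order
|-- LICENSE
|-- codecheck.yml    # CODECHECK metadata (see Implementation)
|-- data/            # input data, or instructions to obtain it
|-- code/            # analysis scripts
|-- environment/     # machine-readable environment specification
`-- results/         # expected outputs listed in the manifest
\end{verbatim}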
\textbf{5. Open by default and transitional by disposition.} \\
Unless there are strong reasons to the contrary
(e.g.,~sensitive data on human subjects), all code and data, both from
author and codechecker, will be made freely available when
the certificate is published. Openness is not required for the paper
itself, to accommodate journals in their transition to
Open Access models. The code and data publication should follow
community good practices. Ultimately we may find that CODECHECK activities are
subsumed within peer review.
%% By disposition, a CODECHECK workflow should
%% be designed (a) recognising shortcomings in asserting reproducibility
%% of submissions and (b) with an understanding of the complex situation
%% that led to this. However, establishing a CODECHECK workflow should
%% mirror initiatives to improve education around reproducibility of
%% methods. The long term goal is for a specific community to reach a
%% state where the level of verification ensured by codecheckers becomes
%% part of peer review and a regular merit of all accepted papers.
\section*{Implementation}\label{implementation}
\subsection*{Register}\label{register}
To date we have created 25 certificates (Table~\ref{tab:register})
falling into three broad themes: (1) classic and current
papers from computational neuroscience, (2) COVID-19 modelling
preprints, and (3) GIScience.
The first theme was an initial set of papers used to explore the
concept of CODECHECK. The idea was to take well-known articles from a
domain of interest (Neuroscience). Our first CODECHECK (certificate
number \texttt{2020-001}) was performed before publication on an
article for the journal \emph{GigaScience}, which visualised the
outputs from a family of supervised classification algorithms.
The second theme was a response to the COVID-19 pandemic, selecting
papers that predicted outcomes. The checks were solicited through
community interaction or by our initiative rather than requested from
journals. Some certificates have since been acknowledged in the accepted
papers \cite{Davies2020-vj,kucharski_effectiveness_2020}. In
particular, we codechecked the well-known Imperial College model of UK
lockdown procedures from March 2020,
demonstrating that the model results were reproducible
\cite{Chawla2020-hi,cert-2020-010}.
The third theme represents co-author DN's service as a Reproducibility
Reviewer at the AGILE conference series, where the \emph{Reproducible
AGILE} Initiative \cite{reproducible_agile} independently
established a process for reproducing \rev{computational} workflows
\cite{nust_improving_2020}. While using slightly
different terms and infrastructure (``reproducibility reports'' are
published on the Open Science Framework instead of certificates
on Zenodo) AGILE reproducibility reviews adhere to CODECHECK
principles. A few checks were also completed as part of peer reviews
for GIScience journals.
\begin{table}
\footnotesize
\centering
\begin{tabular}{llp{12cm}}
\toprule
\textbf{Certificate} & \textbf{Research area} & \textbf{Description} \\ \midrule
\texttt{2020-001} \cite{cert-2020-001} & Machine learning & Code for benchmarking ML classification tool checked after acceptance of manuscript and before its publication in \textit{GigaScience} \cite{Piccolo2020-lo}. \\
\texttt{2020-002} \cite{cert-2020-002} & Neuroscience & Code written for this project checked by a second project member as a demonstration, using a paper from 1992 on unsupervised learning from natural images \cite{Hancock1992-mp}. \\
\texttt{2020-003} \cite{cert-2020-003} & Neuroscience & Code written for this project checked by a second project member as a demonstration, using a classic paper on models of associative memory \cite{Hopfield1982-mz}. \\
\texttt{2020-004} \cite{cert-2020-004} & Neuroscience & Code written for this project checked by a second project member as a demonstration, using a classic paper on the cart-pole balancing problem \cite{Barto1983-rg}. \\
\texttt{2020-005} \cite{cert-2020-005} & Neuroscience & Check of independent reimplementation of spike-timing-dependent plasticity (STDP) model \cite{larisch_re_2019} conducted as demonstration for this paper. \\ % Python ANNarchy
\texttt{2020-006} \cite{cert-2020-006} & Neuroscience & Check of independent reimplementation of a generalized linear integrate-and-fire neural model \cite{detorakis_re_2017} conducted as demonstration for this paper. \\
\texttt{2020-007} \cite{cert-2020-007} & Neuroscience & Check of independent reimplementation of analysing spike patterns of neurons \cite{hathway_re_2018} conducted as demonstration for this paper. \\
\texttt{2020-008} \cite{cert-2020-008} & COVID-19 & Code for modelling of interventions on COVID-19 cases in the UK checked at preprint stage \cite{davies-preprint-2020} and later published \cite{Davies2020-vj}. \\
\texttt{2020-009} \cite{cert-2020-009} & COVID-19 & Code for analysis of effectiveness of measures to reduce transmission of SARS-CoV-2 checked as preprint \cite{kucharski-preprint-2020} and later published \cite{kucharski_effectiveness_2020}. \\ % R, check mentioned in acks, but not linked
\texttt{2020-010} \cite{cert-2020-010} & COVID-19 & Code for analysis of non-pharmaceutical interventions (Report 9) checked as a preprint \cite{ferguson_report_2020}. \\ % CovidSim
\texttt{2020-011} \cite{cert-2020-011} & COVID-19 & Code for modelling of COVID-19 spread across Europe was provided by authors and checked while paper was in press \cite{flaxman_estimating_2020}. \\
\texttt{2020-012} \cite{cert-2020-012} & COVID-19 & Code for modelling of COVID-19 spread across the USA was checked as preprint \cite{unwin_report_2020} and later published \cite{unwin_state-level_2020}. \\
\texttt{2020-013} \cite{cert-2020-013} & Neuroscience & Code for analysis of rest-activity patterns in people without cone-mediated vision was checked as a preprint \cite{Spitschan2020.06.02.129502} after direct contact with the authors. \\ % MATLAB
\texttt{2020-014} \cite{cert-2020-014} & Neuroscience & Code for analysis of perturbation patterns of neural activity was checked after publication as part of publisher collaboration \cite{Sadeh2020}. \\ % MATLAB
\texttt{2020-015} \cite{cert-2020-015} & Neuroscience & Code for a neural network model for human focal seizures was checked after publication as part of publisher collaboration \cite{Liou2020}. \\ % MATLAB
\texttt{2020-016} \cite{cert-2020-016} & GIScience & Code for models demonstrating the Modifiable Areal Unit Problem (MAUP) in spatial data science \cite{Brunsdon2020} was checked during peer review. \\ % R
\texttt{2020-017} \cite{cert-2020-017} & GIScience & Code for spatial data handling, analysis, and visualisation using a variety of R packages \cite{Bivand2020} was checked after peer review before publication. \\ % R
\texttt{2020-018} \cite{cert-2020-018} & GIScience & AGILE conference reproducibility report using a demonstration data subset with cellular automaton for modeling dynamic phenomena \cite{Hojati2020}. \\ % R
\texttt{2020-019} \cite{cert-2020-019} & GIScience & AGILE conference reproducibility report with subsampled dataset for reachability analysis of suburban transportation using shared cars \cite{Illium2020}. \\
\texttt{2020-020} \cite{cert-2020-020} & GIScience & AGILE conference reproducibility report using a container for checking in-database windows operators for processing spatio-temporal data \cite{Werner2020}. \\ % Python + PostGIS
\texttt{2020-021} \cite{cert-2020-021} & GIScience & AGILE conference reproducibility report checking code for comparing supervised machine learning models for spatial nominal entity recognition \cite{Medad2020}. \\ % Python
\texttt{2020-022} \cite{cert-2020-022} & GIScience & AGILE conference reproducibility report checking code for visualising text analysis on intents and concepts from geo-analytic questions \cite{Xu2020}. \\ % Python
\texttt{2020-023} \cite{cert-2020-023} & GIScience & AGILE conference reproducibility report on analysis of spatial footprints of geo-tagged extreme weather events from social media \cite{Owuor2020}. \\
\texttt{2020-024} \cite{cert-2020-024} & Neuroscience & Code for multi-agent system for concept drift detection in electromyography \cite{vieira_driftage_2020} was checked during peer review. \\ % R
\texttt{2020-025} \cite{cert-2020-025} & GIScience & Adaptation and application of Local Indicators for Categorical Data (LICD) to archaeological data \cite{carrer_application_2021} was checked after peer review before publication. \\ % R
\bottomrule
\end{tabular}
\caption{Register of completed certificates as of December 2020. An interactive version
is available at \url{http://codecheck.org.uk/register}.
}
\label{tab:register}
\end{table}
\subsection*{Annotated certificate and check metadata}\label{annotated-certificate}
After running the \rev{paper's} workflow, the codechecker writes a certificate
stating which outputs from the original article, i.e.,~numbers,
figures or tables, could be reproduced. This certificate is made
openly available so that everyone can see which
elements were reproduced and what limitations or issues were found.
The certificate links to code and data used by the codechecker,
allowing others to build on the work. The format of the certificates
evolved during the project, as we learnt to automate different
aspects of the certification. The metadata
is stored in a machine-readable structured file in YAML, the CODECHECK
configuration file \texttt{codecheck.yml}.
The technical specification of the CODECHECK configuration file is
published at \url{https://codecheck.org.uk/spec/config/latest/}. The
configuration file enables current and future automation of \rev{CODECHECK} workflows
and meta-analyses.
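The following abridged example illustrates the structure of a
\texttt{codecheck.yml} file; all values are invented for illustration,
and the published specification remains authoritative:
\begin{verbatim}
---
version: https://codecheck.org.uk/spec/config/1.0/
paper:
  title: "An example paper title"
  authors:
    - name: Jane Doe
      ORCID: 0000-0000-0000-0001
  reference: https://doi.org/10.0000/example
manifest:            # outputs the codechecker compares to the paper
  - file: figure1.pdf
    comment: corresponds to Figure 1 in the paper
codechecker:
  - name: John Checker
    ORCID: 0000-0000-0000-0002
report: https://doi.org/10.5281/zenodo.0000000   # the certificate
check_time: "2020-12-01 10:00:00"
\end{verbatim}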
Figure~\ref{fig:cert} shows pages 1--4 (of 10) of an example
certificate to check predictions of COVID-19 spread across the USA
\cite{cert-2020-012,unwin_report_2020}. Figure~\ref{fig:cert}A shows
the certificate number and its DOI, which points to the certificate
and any supplemental files on Zenodo. The CODECHECK logo is added for
recognition and to denote successful reproduction.
Figure~\ref{fig:cert}B provides the key metadata extracted from
\texttt{codecheck.yml}; it names the paper that was checked (title,
DOI), the authors, the codechecker, when the check was performed, and
where code/data are available.
Figure~\ref{fig:cert}C shows a textual summary of how the CODECHECK
was performed and key findings. Figure~\ref{fig:cert}D
(page 2 of the certificate) shows the outputs that were generated
based on the MANIFEST of output files in the CODECHECK. It shows the
file name (Output), the description stating to which figure/table each
file should be compared in the original paper (Comment), and the file
size. Page 3 of the certificate (Figure~\ref{fig:cert}E) gives
detailed notes from the codechecker, here documenting what steps were
needed to run the code and that the code took about 17~hours to
complete. Finally, page 4 of the certificate shows the first output
generated by the CODECHECK (Figure~\ref{fig:cert}F). In this case, the
figure matched figure~4 of \cite{unwin_report_2020}. The remaining
pages of the certificate show other outputs and the computing
environment in which the certificate itself was created (not shown
here).
\begin{figure}
\centering
\includegraphics[width=\textwidth]{figs/annotate-cert-crop.pdf}
\caption{Annotated certificate \texttt{2020-012} \cite{cert-2020-012} (first four pages only).}
\label{fig:cert}
\end{figure}
\subsection*{Tools and resources}\label{tools}
We use freely available infrastructure, GitHub and Zenodo, to run our
system. The \texttt{codecheckers} \textbf{GitHub} organisation at
\url{https://github.com/codecheckers} contains projects for managing
the project website, the codecheckers community and its discussions,
code repositories, and the main register of CODECHECKs. Both the
project website \url{https://codecheck.org.uk/} and the register at
\url{https://codecheck.org.uk/register} are hosted as GitHub pages.
The register database is a single table in CSV format that connects
the certificate identifier with the repository associated with a
CODECHECK. Each of these repositories, which currently can be hosted
on GitHub or Open Science Framework, contains the CODECHECK
metadata file \texttt{codecheck.yml}. The register further contains a
column for the type of check, e.g.,~community, journal, or conference,
and the respective GitHub issue where communications and assignments
around a specific check are organised. No information is duplicated
between the register and the metadata files. The continuous
integration infrastructure of GitHub, GitHub Actions, is used to
automate generation of the register. \textbf{Zenodo} is our preferred open
repository for storing certificates. It mints DOIs for deposits and
ensures long-term availability of all digital artefacts related to the
project. The CODECHECK community on Zenodo is available at
\url{https://zenodo.org/communities/codecheck/}. It holds
certificates, the regularly archived register
\cite{codecheck_register_jan2021}, and other material related to
CODECHECK.
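For illustration, the register connecting certificates to repositories
is a plain table along the following lines (rows and values are
simplified placeholders, not verbatim entries):
\begin{verbatim}
Certificate,Repository,Type,Issue
2020-001,github.com/codecheckers/...,community,...
2020-018,osf.io/...,conference,...
2020-024,github.com/codecheckers/...,journal,...
\end{verbatim}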
A \textbf{custom R package}, \texttt{codecheck}, automates
repetitive tasks around authoring certificates and managing the
register. The package is published at
\url{https://github.com/codecheckers/codecheck} under MIT license
\cite{stephen_eglen_codecheckerscodecheck_2021}.
It includes scripts to deposit certificates and related
files to Zenodo using the R package \texttt{zen4R} \cite{zen4r} and
for the register update process outlined above. Codecheckers can
ignore this package, and use their own tools for creating
and depositing the certificate. This flexibility accommodates
different skill sets and unforeseen technical advances or challenges.
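As a sketch of what this automation entails, depositing a certificate
with \texttt{zen4R} follows roughly the pattern below; the calls
reflect our understanding of the \texttt{zen4R} API at the time of
writing, metadata values are placeholders, and the \texttt{codecheck}
package wraps such steps in higher-level helpers:
\begin{verbatim}
library(zen4R)

## connect to Zenodo with a personal access token,
## assumed here to be stored in an environment variable
zenodo <- ZenodoManager$new(token = Sys.getenv("ZENODO_TOKEN"))

## create a deposit and fill in the certificate metadata
rec <- ZenodoRecord$new()
rec$setTitle("CODECHECK certificate 20XX-NNN")
rec$setUploadType("publication")
rec$setPublicationType("report")
rec <- zenodo$depositRecord(rec)

## attach the certificate PDF to the deposit
zenodo$uploadFile("codecheck.pdf", rec)
\end{verbatim}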
These tools and resources demonstrate that a CODECHECK \rev{workflow} can be
managed on freely available platforms.
Automation of some aspects may improve turnaround time. Our main
resource requirements are the \textbf{humans} needed for managing the
project and processes and the codecheckers. All contributions
currently rely on (partly grant-based) public funding and
volunteering.
\section*{Related work}\label{related-work}
The journal \emph{ACM Transactions on Mathematical Software (TOMS)}
recently established a ``Replicated Computational Results'' (RCR)
review process \cite{heroux_editorial_2015}, where ``replicable'' is
the same as our use of ``reproducible''. Fifteen RCR Reports have
been published so far (search on \url{https://search.crossref.org/}
with the term \texttt{"Replicated Computational Results (RCR) Report"}
on 2020-12-10),
% 15: https://academic.microsoft.com/search?q=Replicated%20Computations%20Results%20%22(RCR)%20Report%22&f=&orderBy=0&skip=10&take=10
and the process is being extended to the ACM journal
\emph{Transactions on Modeling and Computer Simulation}.
%% (cf. \url{https://dl.acm.org/journal/tomacs/author-guidelines}).
The TOMS RCR follows CODECHECK principles 1--4, although our work was
developed independently of theirs. The TOMS editorial
\cite{heroux_editorial_2015} shares similar concerns about selection
of reviewers, as we discussed above. Unlike existing CODECHECK
certificates, the RCR reports undergo editorial review. Publication
of the RCR report recognises the efforts of the reproducing person,
while the potential for this motive to be a conflict of interest is
acknowledged. TOMS also recognises reviewer activity in a partnership
with Publons (see
\url{https://authors.acm.org/author-services/publons}). \rev{In
addition, ACM provides several badges to indicate what kind of artifact review or reproduction a paper submitted to an ACM journal has completed (\url{https://www.acm.org/publications/policies/artifact-review-and-badging-current}), but neither provides nor requires a specific review process. In principle, these badges could be awarded by a codechecker, too, though the different levels, and even the partial replacement of artifacts, required to achieve a \emph{Results Reproduced} badge go beyond a CODECHECK's scope. A completed check certainly warrants the ACM badge \emph{Artifacts Evaluated -- Functional}, possibly \emph{Artifacts Evaluated -- Reusable}, and likely \emph{Artifacts Available}, depending on additional requirements by implementing journals. However, we do not require codecheckers to evaluate code quality or to ensure proper archival of artifacts, though, in our experience, they are likely to encounter or comment on these topics.}
This activity in the ACM journals can be seen as one possible \rev{process}
within a CODECHECK system, and clearly shares much in spirit.
CODECHECK, however, specifically aims to give codecheckers recognition
as reviewers. In our view, the reviewer role removes the possible
conflict of interest while keeping the public acknowledgement.
Specific to the field of mathematics, the RCR is also expected to
apply a review of the software itself if the system it runs on cannot
be evaluated by an independent party. The TOMS RCR creators concur
with the importance of communication, expect collaboration between
author and RCR reviewers, share the considerations around reviewer
selection, and also put trust in reviewer judgement over numerical
bit-wise perfection. A key difference is that for TOMS RCR, authors
opt-in with an \emph{RCR Review Request} and the RCR reports are
published in the TOMS journal next to the actual papers.
Several journals provide special article types for reproductions of
published papers. \emph{Information Systems} has an invitation-only
Reproducibility Section for articles describing the reproducibility
efforts of published articles, which are co-authored by the original
authors and the reproducibility reviewer(s) (see
\url{https://www.elsevier.com/journals/information-systems/0306-4379/guide-for-authors}).
\emph{Nature Machine Intelligence} recently introduced a new type of
article, the reusability report \cite{noauthor_research_2020}.
Inspired by the detailed and nuanced submissions to a reproducibility
challenge, the reusability report focuses on the exploration of
robustness and generalizability of the original paper's claims
\cite{noauthor_research_2020}. This answers the specific community's
challenges around computational reproducibility and also values these
kinds of contributions as independent publications, which goes beyond
the goals of CODECHECK. The journal \emph{Cortex} has a special
article type, \emph{Verification Reports}, which are actually about
replication of results and are carefully designed and reasoned
\cite{chambers_verification_2020}.
\rev{The \emph{Journal of Water Resources Planning and Management}'s
policy recognises reproducible papers in a special collection and
incentivises authors with waived or reduced fees
\cite{rosenberg_reproducible_2021}.}
In a similar vein, the CODECHECK
certificates could also be published as a special article type within
journals. \rev{Finally, the \textit{Journal of Open Source Software}
provides its reviewers with a checklist of items to check during
review,
\url{https://joss.readthedocs.io/en/latest/review_checklist.html\#software-paper},
effectively providing a much more detailed form of check for scientific
software that could complement CODECHECKs, too.}
Going beyond individual articles, the journal
\href{https://rescience.github.io}{\emph{ReScience~C}} publishes only
replications, also requiring open code and replication by a third
party. \rev{The journal now accepts ``Reproduction reports'' that
describe if some code accompanying a published article can (or can
not) reproduce the same results as shown in the article.} ReScience~C
also relies on free infrastructure (GitHub and Zenodo).
For research with high stakes, where reproduction would be too weak and
post-publication replication possibly too late because of policy impact,
Benjamin-Chung et~al. \cite{benjamin-chung_internal_2020} propose
\emph{internal replication}.
A \rev{computational} workflow that has undergone internal
replication would likely be of high quality and relatively easy to check.
Similarly, internal CODECHECKs may be used, with the same limitations such
as group think \cite{benjamin-chung_internal_2020},
to ensure reproducibility before submission.
Such internal checks are professionalised in local reproduction services,
such as \href{https://ciser.cornell.edu/research/results-reproduction-r-squared-service/}{CISER R-squared} or \href{https://isps.yale.edu/research/data/approach}{YARD},
or in communities such as \href{https://github.com/OxfordCodeReviewNet/forum}{Oxford's code review network}.
Gavish and Donoho \cite{gavish_universal_2011} propose a new discipline
and infrastructure for reproducible computational research. Their
specific packaging format, provenance record, and cryptographic
\emph{Verifiable Result Identifier} would indeed provide excellent
reproducibility. However, the system is also complex and since its
creation in 2011 we are not aware of any publisher using it; also, the
system is not open source. In comparison, CODECHECK is less powerful
but also much more flexible and less dependent on specific tools or
infrastructure. If data and code are deposited properly, i.e.,~very
unlikely to disappear, then the certificate's DOI is practically close
to the cryptographic identifier.
Another platform for publishing results of reproductions is
\href{https://scigen.report}{\emph{SciGen.Report}}. It is a
community-run independent platform to foster communication on
reproducibility. People can report fully successful, partially
successful, or failed reproductions of articles after publication.
CODECHECK is uniquely designed to be adopted across journals or events
and to build a community of codecheckers. CODECHECK shares its
interdisciplinary nature with other community initiatives concerned
with reproducibility awareness, education, and support, such as
\href{https://reprohack.github.io/reprohack-hq/}{ReproHack},
\href{https://twitter.com/Code_Copilot}{Code Copilot}, or
\href{https://paperswithcode.com/about}{Papers with Code}.
The latter recently announced a collaboration with the preprint server
\emph{arXiv} on providing data and code supplements for machine
learning manuscripts
%\footnote{\url{https://medium.com/paperswithcode/papers-with-code-partners-with-arxiv-ecc362883167}}
and runs a reproducibility challenge.
%\footnote{\url{https://paperswithcode.com/rc2020}}.
% Papers with Code now also does more disciplines than just ML.
Likewise, different disciplines and journals provide reproducibility
checklists, e.g.,~science and engineering \cite{rosenberg_next_2020} or
GIScience \cite{nust_agile_2019}; these checklists naturally share
some aspects while also addressing the particularities of researchers
from different fields. Regarding education and guidance for
authors, we see CODECHECK's role as referencing and linking
educational efforts and helpful material, not as creating and
maintaining such content.
\section*{Limitations}\label{limitations}
\textbf{Isn't CODECHECK what peer review should be doing already?} On
the surface, yes, but peer reviewers are overburdened enough and
asking them to do more work around peer review is not likely to
succeed. When an editor (Tsuyoshi Miyakawa) requested raw data from
the authors of 41 manuscripts before reviewing, 21 authors withdrew
their manuscripts, and 19 of the 20 remaining articles were rejected
after peer review \cite{miyakawa_no_2020}. Such basic checks require
effort from editors, yet they rely only on the availability of data
files and the content of the paper. CODECHECKs extend these
availability checks by also requesting the code and then executing it.
This might fall within idealistic expectations of peer review, but is
rare. Establishing a CODECHECK \rev{workflow} acknowledges that peer
reviewing practices have been unable to adapt to the challenges of
computational papers. The concept of a CODECHECK, like the
concepts of reproducible research and Open Science, may be
transitional by nature. If the activities described here as part
of a CODECHECK are integrated into the publication process, the
initiative will have succeeded.
\textbf{Should CODECHECK requirements be more demanding?}
CODECHECK by design does not require authors to provide (and sustain)
an eternally functional \rev{computational} workflow, nor does it
suggest a specific software stack or practical approach.
Creating something that anyone can reproduce
\href{https://twitter.com/DougBlank/status/1135904909663068165}{has
been called a fool's errand}, and we tend to agree. However, the
package of data, code, and
documentation collaboratively created by authors and codecheckers is a
snapshot of a working analysis that greatly increases the likelihood
of a successful reproduction and the possibility that a \rev{computational} workflow can
be extended by third parties in the future, if they have
access to suitable resources and a matching skill set.
\rev{The CODECHECK principles make very clear what a CODECHECK
badge on a paper means and also ensure a minimum standard that
other processes or badges may not have, e.g., self-awarded badges
that are only superficially checked (see
\url{https://www.cambridge.org/core/journals/environmental-data-science/information/instructions-for-authors}).}
Concrete
implementations of CODECHECK workflows, especially for specific
disciplines, may provide much more helpful guidelines for authors on how
to create reproducibility packages. Our author-friendly ``low bar''
should not stay low forever, but cultural change takes time and the
encouragement and guidance that CODECHECK, as part of the widely
accepted peer review concept, can provide may eventually allow the bar
to be raised much higher, e.g.,~with executable research
compendia \cite{nust_opening_2017}, ``Whole Tales''
\cite{brinckman_computing_2018}, or continuous analysis
\cite{beaulieu-jones_reproducibility_2017-1}. However, considering
that missing artefacts and lack of documentation have repeatedly been
identified as key barriers to reproducibility
(e.g.,~\cite{stagge_assessing_2019,nust_improving_2020}), we would not
underestimate the power of a simple check. For example, ModelDB's
curation policy requires only that a single figure be manually reproduced
\cite{mcdougal_reproducibility_2016}, but that has limited neither the
usefulness nor the success of the platform.
A codechecker does not fulfil the same role as a \emph{statistical
reviewer}, a role employed by some journals in the biomedical
domain (cf.~\cite{petrovecki_role_2009,greenwood_how_2015}). The
statistical reviewer evaluates the appropriateness of statistical
methods \cite{greenwood_how_2015} and can support topical reviewers
if, e.g.,~complex methods or sophisticated variants of statistical
tests are applied \cite{petrovecki_role_2009}. The codechecker may go
equally deep into the review, but only if they have the expertise and
time. We can imagine a tiered \rev{CODECHECK} workflow in which a
codechecker, just like a conventional reviewer, could recommend a
detailed code review (see next paragraph) to the editor if they come
upon certain issues while examining the work.
A codechecker does not conduct a \emph{code review}. Code reviews are
valuable to improve reproducibility and reusability, and their
proponents even believe they can improve the research
\cite{petre_code_2014}. Code reviews, however, have quite different
structural challenges and require even more resources. That said, a
well-reviewed codebase is likely to be easier to codecheck, and the
awareness of high-quality code raised through CODECHECK may lead to
more support for code reviewing. Initiatives and journals that
conduct software reviews independent of a specific publication or
\rev{venue} include \href{https://ropensci.org/}{rOpenSci}, \href{https://www.pyopensci.org/}{pyOpenSci}, and \href{https://joss.theoj.org}{JOSS}. Furthermore, the codechecker's task
list is intentionally not overloaded with related issues such as
ensuring proper citation of data and software or depositing material
in suitable repositories. Nevertheless, codecheckers are free to
highlight these issues.
\textbf{How are failures during checks handled?} We do not yet have a
process for denoting that a reproduction has failed, as our case studies
were all successful. If a journal adopts CODECHECK for all submissions,
the question remains what to do if a check fails after exhausting
efforts between author and
codechecker to reproduce the \rev{computational} workflow. A negative comment in a
CODECHECK certificate or a failed check does not necessarily mean the
paper or research is bad (cf. discussion on negative comments in
\cite{everythinghertz123}). We doubt that publicly reporting failures
(i.e.,~that the code would not run) would increase overall
reproducibility; it may instead deter authors from sharing their work
at all, and sharing something is always more desirable than sharing
nothing. Therefore, we recommend sharing interim
reproduction efforts only with the authors, even if that means that volunteer
efforts may go unnoticed if no certificate is published.
% eLife's new model (see https://osf.io/9cftx/) will publish reports eventually, e.g.,~after preprint has been accepted elsewhere
Rosenthal et~al. \cite{Rosenthal2016b} discuss such incentives for
different actors around the implementation of reproducibility. We see
CODECHECK as one way for organisations to invest in reproducibility by
creating incentives until reproducible computations become the norm.
\textbf{Who will pay for the compute time?} For papers that take
significant compute time (days, not minutes), it is unclear who will
pay for it. One must carefully consider the sustainability of
rerunning computations and the environmental impact of large
computations, such as training machine learning models. A
pragmatic workaround is to request that authors provide a ``toy''
example or a small dataset that can be analysed quickly to demonstrate
that the \rev{paper's} workflow runs correctly.
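As a minimal sketch of this idea (the script name, file paths, and
the \texttt{--toy} flag are hypothetical, not a CODECHECK requirement),
an analysis script could expose a switch that substitutes a small
bundled dataset for the full one:
\begin{verbatim}
# toy_run.py -- hypothetical sketch of a workflow with a fast "toy" mode
import argparse
import csv

def analyse(path):
    """Toy analysis: mean of the 'value' column of a CSV file."""
    with open(path) as f:
        values = [float(row["value"]) for row in csv.DictReader(f)]
    return sum(values) / len(values)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--toy", action="store_true",
                        help="run on the small demonstration dataset")
    args = parser.parse_args()
    # A codechecker can verify the workflow in minutes with --toy,
    # while the full run may take days on the complete data.
    path = "data/sample_small.csv" if args.toy else "data/full.csv"
    print("mean value:", analyse(path))
\end{verbatim}
Because the toy and full runs share the same code path, a successful
toy run demonstrates the workflow's executability without the full
computational cost.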
\textbf{What about my proprietary software and sensitive data?} Given
the prevalence of proprietary software, e.g.,~MATLAB, in some
disciplines, we pragmatically
decided that we should accept code as long as we could find a machine
with suitable licences to run it. However, this prohibits us from
using open infrastructure for reproducibility
(cf.~\cite{konkol_publishing_2020,perkel_make_2019}) and requires the
codechecker to have access to that particular software. Non-open
software also considerably hampers reuse, especially by researchers
from the global south. \rev{Likewise, if a piece of research
requires specific hardware, e.g.,~a GPU, we rely on the
codechecker having access to similar hardware. Both licences and costs
can be barriers to a CODECHECK, but the focus on the codechecker's
assessment provides options to overcome these barriers if
needed.} \rev{Therefore, allowing proprietary software and
specialised hardware is a compromise that should be reconsidered over
time. In any case, authors must state such requirements clearly, and
how codecheckers can meet them must be documented.}
Solutions for proprietary and sensitive data exist. Authors can
provide synthetic data (cf. \cite{shannon_opening_2018}), some data
can effectively be redacted \cite{oloughlin_data_2015}, and publishers
or independent entities can provide infrastructure for sharing data
and \rev{computational} workflows confidentially \cite{perignon_certify_2019} or with
access to derived results but not raw data
\cite{shannon_opening_2018}, i.e.,~data enclaves
\cite{foster_research_2018}, or domains of reproducibility
\cite{harris_more_2017}.
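To illustrate the synthetic-data option, the following minimal sketch
(column names, distributions, and summary statistics are hypothetical;
real applications need vetted tools, disclosure-risk checks, and should
preserve correlations where relevant) replaces sensitive records with
random draws matching published per-column means and standard
deviations:
\begin{verbatim}
# make_synthetic.py -- hypothetical sketch: stand-in data for a CODECHECK
import numpy as np

rng = np.random.default_rng(42)  # fixed seed keeps the check reproducible

# Assumed published summary statistics of the sensitive data: (mean, std)
summary = {"age": (47.3, 12.1), "dose_mg": (5.2, 1.4)}
n = 500  # number of synthetic records

synthetic = {col: rng.normal(mean, std, size=n)
             for col, (mean, std) in summary.items()}

# The paper's analysis runs on this stand-in data; numeric results will
# differ from the published ones, but executability is demonstrated.
for col, values in synthetic.items():
    print(f"{col}: mean={values.mean():.2f} std={values.std():.2f}")
\end{verbatim}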
\textbf{Can't someone cheat?} Yes. We simply check that the code runs,
not that it is correct or sound science. This ``mechanical'' test is