--- title: | A Link is not Enough – Reproducibility of Data link: https://link.springer.com/content/pdf/10.1007%2Fs13222-019-00317-8.pdf date: 2019-06-18 00:00:00 tags: [reproducible paper] description: | Although many works in the database community use open data in their experimental evaluation, repeating the empirical results of previous works remains a challenge. This holds true even if the source code or binaries of the tested algorithms are available. In this paper, we argue that providing access to the raw, original datasets is not enough. Real-world datasets are rarely processed without modification. Instead, the data is adapted to the needs of the experimental evaluation in the data preparation process. We showcase that the details of the data preparation process matter and subtle differences during data conversion can have a large impact on the outcome of runtime results. We introduce a data reproducibility model, identify three levels of data reproducibility, report on our own experience, and exemplify our best practices. --- title: | Automated Documentation of End-to-End Experiments in Data Science link: https://sergred.github.io/files/phd.proposal.reds.icde.pdf date: 2019-06-11 00:00:00 tags: [reproducible paper] description: | Reproducibility plays a crucial role in experimentation. However, the modern research ecosystem and the underlying frameworks are constantly evolving, thereby making it extremely difficult to reliably reproduce scientific artifacts such as data, algorithms, trained models, and visualizations. We therefore aim to design a novel system for assisting data scientists with rigorous end-to-end documentation of data-oriented experiments. Capturing data lineage, metadata, and other artifacts helps in reproducing and sharing experimental results. We summarize this challenge as automated documentation of data science experiments.
We aim to reduce the manual overhead for experimenting researchers, and intend to create a novel approach to dataflow and metadata tracking based on the analysis of the experiment source code. The envisioned system will accelerate the research process in general, and enable capturing fine-grained meta-information by deriving a declarative representation of data science experiments. --- title: | Open Science for Computational Science and for Computer Science link: http://oceanrep.geomar.de/46540/1/2019-05-08SotonWAIS.pdf date: 2019-06-04 00:00:00 tags: [reproducibility talk] description: | Talk on open science for computational sciences. --- title: | All models are wrong, some are useful, but are they reproducible? Commentary on Lee et al. (2019) link: https://psyarxiv.com/af6w7/ date: 2019-06-04 00:00:00 tags: [reproducible paper] description: | Lee et al. (2019) make several practical recommendations for replicable, useful cognitive modeling. They also point out that the ultimate test of the usefulness of a cognitive model is its ability to solve practical problems. In this commentary, we argue that for cognitive modeling to reach applied domains, there is a pressing need to improve the standards of transparency and reproducibility in cognitive modeling research. Solution-oriented modeling requires engaging practitioners who understand the relevant domain. We discuss mechanisms by which reproducible research can foster engagement with applied practitioners. Notably, reproducible materials provide a starting point for practitioners to experiment with cognitive models and determine whether those models might be suitable for their domain of expertise. This is essential because solving complex problems requires exploring a range of modeling approaches, and there may not be time to implement each possible approach from the ground up. We also note the broader benefits of reproducibility within the field.
--- title: | Modeling Provenance and Understanding Reproducibility for OpenRefine Data Cleaning Workflows link: https://www.usenix.org/conference/tapp2019/presentation/mcphillips date: 2019-06-04 00:00:00 tags: [reproducibility talk] description: | Preparation of data sets for analysis is a critical component of research in many disciplines. Recording the steps taken to clean data sets is equally crucial if such research is to be transparent and results reproducible. OpenRefine is a tool for interactively cleaning data sets via a spreadsheet-like interface and for recording the sequence of operations carried out by the user. OpenRefine uses its operation history to provide an undo/redo capability that enables a user to revisit the state of the data set at any point in the data cleaning process. OpenRefine additionally allows the user to export sequences of recorded operations as recipes that can be applied later to different data sets. Although OpenRefine internally records details about every change made to a data set following data import, exported recipes do not include the initial data import step. Details related to parsing the original data files are not included. Moreover, exported recipes do not include any edits made manually to individual cells. Consequently, neither a single recipe, nor a set of recipes exported by OpenRefine, can in general represent an entire, end-to-end data preparation workflow. Here we report early results from an investigation into how the operation history recorded by OpenRefine can be used to (1) facilitate reproduction of complete, real-world data cleaning workflows; and (2) support queries and visualizations of the provenance of cleaned data sets for easy review. --- title: | The importance of standards for sharing of computational models and data link: https://psyarxiv.com/q3rnx date: 2019-05-28 00:00:00 tags: [reproducible paper] description: | The Target Article by Lee et al. 
(2019) highlights the ways in which ongoing concerns about research reproducibility extend to model-based approaches in cognitive science. Whereas Lee et al. focus primarily on the importance of research practices to improve model robustness, we propose that the transparent sharing of model specifications, including their inputs and outputs, is also essential to improving the reproducibility of model-based analyses. We outline an ongoing effort (within the context of the Brain Imaging Data Structure community) to develop standards for the sharing of the structure of computational models and their outputs. --- title: | A Roadmap for Computational Communication Research link: https://osf.io/preprints/socarxiv/4dhfk/ date: 2019-05-25 00:00:00 tags: [reproducible paper] description: | Computational Communication Research (CCR) is a new open access journal dedicated to publishing high quality computational research in communication science. This editorial introduction describes the role that we envision for the journal. First, we explain what computational communication science is and why a new journal is needed for this subfield. Then, we elaborate on the type of research this journal seeks to publish, and stress the need for transparent and reproducible science. The relation between theoretical development and computational analysis is discussed, and we argue for the value of null-findings and risky research in additive science. Subsequently, the (experimental) two-phase review process is described. In this process, after the first double-blind review phase, an editor can signal that they intend to publish the article conditional on satisfactory revisions. This starts the second review phase, in which authors and reviewers are no longer required to be anonymous and the authors are encouraged to publish a preprint to their article which will be linked as working paper from the journal. 
Finally, we introduce the four articles that, together with this Introduction, form the inaugural issue. --- title: | A response to O. Arandjelovic's critique of "The reproducibility of research and the misinterpretation of p-values" link: https://arxiv.org/ftp/arxiv/papers/1905/1905.08338.pdf date: 2019-05-25 00:00:00 tags: [reproducible paper] description: | The main criticism of my piece in ref (2) seems to be that my calculations rely on testing a point null hypothesis, i.e. the hypothesis that the true effect size is zero. He objects to my contention that the true effect size can be zero, "just give the same pill to both groups", on the grounds that two pills can't be exactly identical. He then says "I understand that this criticism may come across as frivolous semantic pedantry of no practical consequence: of course that the author meant to say 'pills with the same contents' as everybody would have understood". Yes, that is precisely how it comes across to me. I shall try to explain in more detail why I think that this criticism has little substance. --- title: | Analysis of Open Data and Computational Reproducibility in Registered Reports in Psychology link: https://psyarxiv.com/fk8vh date: 2019-05-24 00:00:00 tags: [reproducible paper] description: | Ongoing technological developments have made it easier than ever before for scientists to share their data, materials, and analysis code. Sharing data and analysis code makes it easier for other researchers to re-use or check published research. These benefits will only emerge if researchers can reproduce the analysis reported in published articles, and if data is annotated well enough so that it is clear what all variables mean. Because most researchers have not been trained in computational reproducibility, it is important to evaluate current practices to identify practices that can be improved. 
We examined data and code sharing, as well as computational reproducibility of the main results without contacting the original authors, for Registered Reports published in the psychological literature between 2014 and 2018. Of the 62 articles that met our inclusion criteria, data was available for 40 articles, and analysis scripts for 43 articles. For the 35 articles that shared both data and code and performed analyses in SPSS, R, or JASP, we could run the scripts for 30 articles, and reproduce the main results for 19 articles. Although the percentage of articles that shared both data and code (61%) and articles that could be computationally reproduced (54%) was relatively high compared to other studies, there is clear room for improvement. We provide practical recommendations based on our observations, and link to examples of good research practices in the papers we reproduced. --- title: | Automatic generation of provenance metadata during execution of scientific workflows link: http://ceur-ws.org/Vol-2357/paper8.pdf date: 2019-05-14 00:00:00 tags: [reproducible paper] description: | Data processing in data-intensive scientific fields like bioinformatics is automated to a great extent. Among others, automation is achieved with workflow engines that execute an explicitly stated sequence of computations. Scientists can use these workflows through science gateways, or they develop them on their own. In both cases they may have to preprocess their raw data and also may want to further process the workflow output. The scientist has to take care of the provenance of the whole data processing pipeline. This is not a trivial task due to the diverse set of computational tools and environments used during the transformation of raw data to the final results. Thus we created a metadata schema to provide provenance for data processing pipelines and implemented a tool that creates this metadata during the execution of typical scientific computations.
--- title: | Methodological Reporting Behavior, Sample Sizes, and Statistical Power in Studies of Event-Related Potentials: Barriers to Reproducibility and Replicability link: https://psyarxiv.com/kgv9z date: 2019-05-09 00:00:00 tags: [reproducible paper] description: | Methodological reporting guidelines for studies of event-related potentials (ERPs) were updated in Psychophysiology in 2014. These guidelines facilitate the communication of key methodological parameters (e.g., preprocessing steps). Failing to report key parameters represents a barrier to replication efforts, and difficulty with replicability increases in the presence of small sample sizes and low statistical power. We assessed whether guidelines are followed and estimated the average sample size and power in recent research. Reporting behavior, sample sizes, and statistical designs were coded for 150 randomly sampled articles, published from 2011 to 2017, from five high-impact journals that frequently publish ERP studies. An average of 63% of guidelines were reported, and reporting behavior was similar across journals, suggesting that gaps in reporting are a shortcoming of the field rather than of any specific journal. Publication of the guidelines paper had no impact on reporting behavior, suggesting that editors and peer reviewers are not enforcing these recommendations. The average sample size per group was 21. Statistical power was conservatively estimated as .72-.98 for a large effect size, .35-.73 for a medium effect, and .10-.18 for a small effect. These findings indicate that failing to report key guidelines is ubiquitous and that ERP studies are only powered to detect large effects. Such low power and insufficient following of reporting guidelines represent substantial barriers to replication efforts.
The methodological transparency and replicability of studies can be improved by the open sharing of processing code and experimental tasks and by a priori sample size calculations to ensure adequately powered studies. --- title: | Empirical examination of the replicability of associations between brain structure and psychological variables link: https://elifesciences.org/articles/43464 date: 2019-05-09 00:00:00 tags: [reproducible paper] description: | Linking interindividual differences in psychological phenotype to variations in brain structure is an old dream for psychology and a crucial question for cognitive neurosciences. Yet, the replicability of previously reported ‘structural brain behavior’ (SBB) associations has recently been questioned. Here, we conducted an empirical investigation, assessing the replicability of SBB among healthy adults. For a wide range of psychological measures, the replicability of associations with gray matter volume was assessed. Our results revealed that among healthy individuals: 1) finding an association between performance at standard psychological tests and brain morphology is relatively unlikely; 2) significant associations, found using an exploratory approach, have overestimated effect sizes; and 3) such associations can hardly be replicated in an independent sample. After considering factors such as sample size, and comparing our findings with more replicable SBB associations in a clinical cohort and replicable associations between brain structure and non-psychological phenotypes, we discuss the potential causes and consequences of these findings. --- title: | A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks link: http://www.ic.uff.br/~leomurta/papers/pimentel2019a.pdf date: 2019-05-07 00:00:00 tags: [reproducible paper] description: | Jupyter Notebooks have been widely adopted by many different communities, both in science and industry.
They support the creation of literate programming documents that combine code, text, and execution results with visualizations and all sorts of rich media. The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we studied 1.4 million notebooks from GitHub. We present a detailed analysis of their characteristics that impact reproducibility. We also propose a set of best practices that can improve the rate of reproducibility and discuss open challenges that require further research and development. --- title: | Towards minimum reporting standards for life scientists link: https://osf.io/preprints/metaarxiv/9sm4x/ date: 2019-05-07 00:00:00 tags: [reproducible paper] description: | Transparency in reporting benefits scientific communication on many levels. While specific needs and expectations vary across fields, the effective use of research findings relies on the availability of core information about research materials, data, and analysis. In December 2017, a working group of journal editors and experts in reproducibility convened to create the “minimum standards” working group. This working group aims to devise a set of minimum expectations that journals could ask their authors to meet, and will draw from the collective experience of journals implementing a range of different approaches designed to enhance reporting and reproducibility (e.g. STAR Methods), existing life science checklists (e.g. the Nature Research reporting summary), and the results of recent meta-research studying the efficacy of such interventions (e.g. Macleod et al. 2017; Han et al. 2017).
--- title: | Replication Redux: The Reproducibility Crisis and the Case of Deworming link: https://econpapers.repec.org/paper/wbkwbrwps/8835.htm date: 2019-05-06 00:00:00 tags: [reproducible paper] description: | In 2004, a landmark study showed that an inexpensive medication to treat parasitic worms could improve health and school attendance for millions of children in many developing countries. Eleven years later, a headline in the Guardian reported that this treatment, deworming, had been "debunked." The pronouncement followed an effort to replicate and re-analyze the original study, as well as an update to a systematic review of the effects of deworming. This story made waves amidst discussion of a reproducibility crisis in some of the social sciences. This paper explores what it means to "replicate" and "reanalyze" a study, both in general and in the specific case of deworming. The paper reviews the broader replication efforts in economics, then examines the key findings of the original deworming paper in light of the "replication," "reanalysis," and "systematic review." The paper also discusses the nature of the link between this single paper's findings, other papers' findings, and any policy recommendations about deworming. This example provides a perspective on the ways replication and reanalysis work, the strengths and weaknesses of systematic reviews, and whether there is, in fact, a reproducibility crisis in economics. --- title: | An empirical assessment of transparency and reproducibility-related research practices in the social sciences (2014-2017) link: https://osf.io/preprints/metaarxiv/6uhg5/ date: 2019-05-06 00:00:00 tags: [reproducible paper] description: | Serious concerns about research quality have catalyzed a number of reform initiatives intended to improve transparency and reproducibility and thus facilitate self-correction, increase efficiency, and enhance research credibility.
Meta-research has evaluated the merits of individual initiatives; however, this may not capture broader trends reflecting the cumulative contribution of these efforts. In this study, we evaluated a broad range of indicators related to transparency and reproducibility in a random sample of 198 articles published in the social sciences between 2014 and 2017. Few articles indicated availability of materials (15/96, 16% [95% confidence interval, 9% to 23%]), protocols (0/103), raw data (8/103, 8% [2% to 15%]), or analysis scripts (3/103, 3% [1% to 6%]), and no studies were pre-registered (0/103). Some articles explicitly disclosed funding sources (or lack thereof; 72/179, 40% [33% to 48%]) and some declared no conflicts of interest (32/179, 18% [13% to 24%]). Replication studies were rare (2/103, 2% [0% to 4%]). Few studies were included in evidence synthesis via systematic review (6/96, 6% [3% to 11%]) or meta-analysis (2/96, 2% [0% to 4%]). Slightly less than half the articles were publicly available (95/198, 48% [41% to 55%]). Minimal adoption of transparency and reproducibility-related research practices could be undermining the credibility and efficiency of social science research. The present study establishes a baseline that can be revisited in the future to assess progress. --- title: | Brain and Behavior: Assessing reproducibility in association studies link: https://elifesciences.org/articles/46757 date: 2019-05-06 00:00:00 tags: [reproducible paper] description: | Research that links brain structure with behavior needs more data, better analyses, and more intelligent approaches. --- title: | Helping Science Succeed: The Librarian’s Role in Addressing the Reproducibility Crisis link: https://conservancy.umn.edu/handle/11299/202524 date: 2019-04-21 00:00:00 tags: [reproducibility talk] description: | Headlines and scholarly publications portray a crisis in biomedical and health sciences.
In this webinar, you will learn what the crisis is and the vital role of librarians in addressing it. You will see how you can directly and immediately support reproducible and rigorous research using your expertise and your library services. You will explore reproducibility guidelines and recommendations and develop an action plan for engaging researchers and stakeholders at your institution. #MLAReproducibility Learning Outcomes: By the end of this webinar, participants will be able to: describe the basic history of the “reproducibility crisis” and define reproducibility and replicability; explain why librarians have a key role in addressing concerns about reproducibility, specifically in terms of the packaging of science; explain 3-4 areas where librarians can immediately and directly support reproducible research through existing expertise and services; and start developing an action plan to engage researchers and stakeholders at their institution about how they will help address research reproducibility and rigor. Audience: Librarians who work with researchers; librarians who teach, conduct, or assist with evidence synthesis or critical appraisal; and managers and directors who are interested in allocating resources toward supporting research rigor. No prior knowledge or skills required. Basic knowledge of scholarly research and publishing helpful. Recording ($) is available here: www.medlib-ed.org/products/2069/helping-science-succeed-the-librarians-role-in-addressing-the-reproducibility-crisis-recording --- title: | Practical open science: tools and techniques for improving the reproducibility and transparency of your research link: https://acmi.microbiologyresearch.org/content/journal/acmi/10.1099/acmi.ac2019.po0446 date: 2019-04-21 00:00:00 tags: [reproducibility report] description: | Science progresses through critical evaluation of underlying evidence and independent replication of results.
However, most research findings are disseminated without access to supporting raw data, and findings are not routinely replicated. Furthermore, undisclosed flexibility in data analysis, such as incomplete reporting, unclear exclusion criteria, and optional stopping rules, allows exploratory research findings to be presented using the tools of confirmatory hypothesis testing. These questionable research practices make results more publishable, but at the expense of their credibility and future replicability. The Center for Open Science builds tools and encourages practices that incentivize work that is not only good for the scientist, but also good for science. These include open-source platforms to organize research, archive results, preregister analyses, and disseminate findings. This poster presents an overview of those practices and gives practical advice for researchers who want to increase the rigor of their practices. --- title: | Promoting and supporting credibility in neuroscience link: https://journals.sagepub.com/doi/full/10.1177/2398212819844167 date: 2019-04-18 00:00:00 tags: [reproducibility report] description: | Over the coming years, a core objective of the BNA is to promote and support credibility in neuroscience, facilitating a cultural shift away from ‘publish or perish’ towards one which is best for neuroscience, neuroscientists, policymakers and the public. Among many of our credibility activities, we will lead by example by ensuring that our journal, Brain and Neuroscience Advances, exemplifies scientific practices that aim to improve the reproducibility, replicability and reliability of neuroscience research. To support these practices, we are implementing some of the Transparency and Openness Promotion (TOP) guidelines, including badges for open data, open materials and preregistered studies. The journal also offers the Registered Report (RR) article format.
In this editorial, we describe our expectations for articles submitted to Brain and Neuroscience Advances. --- title: | Open and Reproducible Research on Open Science Framework link: https://currentprotocols.onlinelibrary.wiley.com/doi/full/10.1002/cpet.32 date: 2019-04-16 00:00:00 tags: [reproducible paper] description: | By implementing more transparent research practices, authors have the opportunity to stand out and showcase work that is more reproducible, easier to build upon, and more credible. Scientists gain by making work easier to share and maintain within their own laboratories, and the scientific community gains by making underlying data or research materials more available for confirmation or making new discoveries. The following protocol gives authors step‐by‐step instructions for using the free and open source Open Science Framework (OSF) to create a data management plan, preregister their study, use version control, share data and other research materials, or post a preprint for quick and easy dissemination. --- title: | Rigor, Reproducibility, and Responsibility: A Quantum of Solace link: https://www.cmghjournal.org/article/S2352-345X(19)30032-3/pdf date: 2019-04-15 00:00:00 tags: [reproducible paper] description: | Lack of reproducibility in biomedical science is a serious and growing issue. Two publications, in 2011 and 2012, along with other analyses, documented failures to replicate key findings and other fundamental flaws in high-visibility research articles. This triggered action among funding bodies, journals, and other change-agents. Here, I examine well-recognized and underrecognized factors that contribute to experimental failure and suggest individual and community approaches that can be used to attack these factors and eschew the SPECTRE of irreproducibility.
--- title: | Encouraging Reproducibility in Scientific Research of the Internet link: http://drops.dagstuhl.de/opus/volltexte/2019/10347/pdf/dagrep_v008_i010_p041_18412.pdf date: 2019-04-09 00:00:00 tags: [reproducible paper] description: | Reproducibility of research in Computer Science (CS), and in the field of networking in particular, is a well-recognized problem. For several reasons, including the sensitive and/or proprietary nature of some Internet measurements, the networking research community pays limited attention to the reproducibility of results, instead tending to accept papers that appear plausible. This article summarises a 2.5-day Dagstuhl seminar on Encouraging Reproducibility in Scientific Research of the Internet held in October 2018. The seminar discussed challenges to improving reproducibility of scientific Internet research, and developed a set of recommendations that we as a community can undertake to initiate a cultural change toward reproducibility of our work. It brought together people from both academia and industry to set expectations and formulate concrete recommendations for reproducible research. This iteration of the seminar was scoped to computer networking research, although the outcomes are likely relevant for a broader audience from multiple interdisciplinary fields. --- title: | Data Repositories For Research Reproducibility link: https://augusta.openrepository.com/bitstream/handle/10675.2/622251/Data%20Repositories%20For%20Reproducibility.pdf?sequence=1 date: 2019-04-09 00:00:00 tags: [reproducibility talk] description: | A presentation that gives an overview of data reproducibility, data reproducibility components and challenges, data reproducibility initiatives, data journals and repositories, and university library resources, all within the scope of the health sciences, social sciences, and the arts and humanities disciplines.
--- title: | The battle for reproducibility over storytelling link: https://psyarxiv.com/shryx/ date: 2019-03-25 00:00:00 tags: [reproducible paper] description: | This issue of Cortex plays host to a lively debate about the reliability of cognitive neuroscience research. Across seven Discussion Forum pieces, scientists representing a range of backgrounds and career levels reflect on whether the "reproducibility crisis" – or "credibility revolution" (Vazire, 2018; Munafò et al., 2017) – that has achieved such prominence in psychology has extended into cognitive neuroscience. If so, they ask, what is the underlying cause and how can we solve it? --- title: | Study on automatic citation screening in systematic reviews: reporting, reproducibility and complexity link: http://eprints.keele.ac.uk/6073/1/OlorisadePhD2019.pdf date: 2019-03-25 00:00:00 tags: [reproducible paper] description: | Research into text-mining-based tool support for citation screening in systematic reviews is growing. The field has not experienced much independent validation. It is anticipated that more transparency in studies will increase reproducibility and in-depth understanding, leading to the maturation of the field. The citation screening tool presented aims to support research transparency, reproducibility, and the timely evolution of sustainable tools. --- title: | Successes and struggles with computational reproducibility: Lessons from the Fragile Families Challenge link: https://osf.io/preprints/socarxiv/g3pdb/ date: 2019-03-21 00:00:00 tags: [reproducible paper] description: | Reproducibility is fundamental to science, and an important component of reproducibility is computational reproducibility: the ability of a researcher to recreate the results in a published paper using the original author's raw data and code. Although most people agree that computational reproducibility is important, it is still difficult to achieve in practice.
In this paper, we describe our approach to enabling computational reproducibility for the 12 papers in this special issue of Socius about the Fragile Families Challenge. Our approach draws on two tools commonly used by professional software engineers but not widely used by academic researchers: software containers (e.g., Docker) and cloud computing (e.g., Amazon Web Services). These tools enabled us to standardize the computing environment around each submission, which will ease computational reproducibility both today and in the future. Drawing on our successes and struggles, we conclude with recommendations to authors and journals. --- title: | From the Wet Lab to the Web Lab: A Paradigm Shift in Brain Imaging Research link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6405692/ date: 2019-03-19 00:00:00 tags: [reproducible paper] description: | Web technology has transformed our lives, and has led to a paradigm shift in the computational sciences. As the neuroimaging informatics research community amasses large datasets to answer complex neuroscience questions, we find that the web is the best medium to facilitate novel insights by way of improved collaboration and communication. Here, we review the landscape of web technologies used in neuroimaging research, and discuss future applications, areas for improvement, and the limitations of using web technology in research. Fully incorporating web technology in our research lifecycle requires not only technical skill, but a widespread culture change; a shift from the small, focused "wet lab" to a multidisciplinary and largely collaborative "web lab." --- title: | Designing for Reproducibility: A Qualitative Study of Challenges and Opportunities in High Energy Physics link: https://arxiv.org/pdf/1903.05875.pdf date: 2019-03-19 00:00:00 tags: [reproducible paper] description: | Reproducibility should be a cornerstone of scientific research and is a growing concern among the scientific community and the public. 
Understanding how to design services and tools that support documentation, preservation and sharing is required to maximize the positive impact of scientific research. We conducted a study of user attitudes towards systems that support data preservation in High Energy Physics, one of science's most data-intensive branches. We report on our interview study with 12 experimental physicists, studying requirements and opportunities in designing for research preservation and reproducibility. Our findings suggest that we need to design for motivation and benefits in order to stimulate contributions and to address the observed scalability challenge. Therefore, researchers' attitudes towards communication, uncertainty, collaboration and automation need to be reflected in design. Based on our findings, we present a systematic view of user needs and constraints that define the design space of systems supporting reproducible practices. --- title: | A demonstration of modularity, reuse, reproducibility, portability and scalability for modeling and simulation of cardiac electrophysiology using Kepler Workflows link: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006856 date: 2019-03-13 00:00:00 tags: [reproducible paper] description: | Multi-scale computational modeling is a major branch of computational biology as evidenced by the US federal interagency Multi-Scale Modeling Consortium and major international projects. It invariably involves specific and detailed sequences of data analysis and simulation, often with multiple tools and datasets, and the community recognizes improved modularity, reuse, reproducibility, portability and scalability as critical unmet needs in this area. Scientific workflows are a well-recognized strategy for addressing these needs in scientific computing. 
While there are good examples of the use of scientific workflows in bioinformatics, medical informatics, biomedical imaging and data analysis, there are fewer examples in multi-scale computational modeling in general and cardiac electrophysiology in particular. Cardiac electrophysiology simulation is a mature area of multi-scale computational biology that serves as an excellent use case for developing and testing new scientific workflows. In this article, we develop, describe and test a computational workflow that serves as a proof of concept of a platform for the robust integration and implementation of a reusable and reproducible multi-scale cardiac cell and tissue model that is expandable, modular and portable. The workflow described leverages Python and the Kepler-Python actor for plotting and pre/post-processing. During all stages of the workflow design, we rely on freely available open-source tools to make our workflow freely usable by scientists. --- title: | Reproducibility and Reuse of Experiments in eScience: Workflows, Ontologies and Scripts link: http://repositorio.unicamp.br/bitstream/REPOSIP/333317/1/Carvalho_LucasAugustoMontalvaoCosta_D.pdf date: 2019-03-03 00:00:00 tags: [reproducible paper] description: | Scripts and Scientific Workflow Management Systems (SWfMSs) are common approaches that have been used to automate the execution flow of processes and data analysis in scientific (computational) experiments. Although widely used in many disciplines, scripts are hard to understand, adapt, reuse, and reproduce. For this reason, several solutions have been proposed to aid experiment reproducibility for script-based environments. However, they neither allow the experiment to be fully documented nor do they help when third parties want to reuse just part of the code.
SWfMSs, on the other hand, help documentation and reuse by supporting scientists in the design and execution of their experiments, which are specified and run as interconnected (reusable) workflow components (a.k.a. building blocks). While workflows are better than scripts for understandability and reuse, they still require additional documentation. During experiment design, scientists frequently create workflow variants, e.g., by changing workflow components. Reuse and reproducibility require understanding and tracking variant provenance, a time-consuming task. This thesis aims to support reproducibility and reuse of computational experiments. To meet these challenges, we address two research problems: (1) understanding a computational experiment, and (2) extending a computational experiment. Our work towards solving these problems led us to choose workflows and ontologies to answer both problems. The main contributions of this thesis are thus: (i) to present the requirements for the conversion of scripts to reproducible research; (ii) to propose a methodology that guides scientists through the process of converting script-based experiments into reproducible workflow research objects; (iii) to design and implement features for quality assessment of computational experiments; (iv) to design and implement W2Share, a framework to support the conversion methodology, which exploits tools and standards that have been developed by the scientific community to promote reuse and reproducibility; (v) to design and implement OntoSoft-VFF, a framework for capturing information about software and workflow components to help scientists manage workflow exploration and evolution. Our work is showcased via use cases in Molecular Dynamics, Bioinformatics and Weather Forecasting. --- title: | Assessing data availability and research reproducibility in hydrology and water resources link: https://www.nature.com/articles/sdata201930 date: 2019-03-03 00:00:00 tags: [reproducible paper]
description: | There is broad interest in improving the reproducibility of published research. We developed a survey tool to assess the availability of digital research artifacts published alongside peer-reviewed journal articles (e.g. data, models, code, directions for use) and reproducibility of article results. We used the tool to assess 360 of the 1,989 articles published by six hydrology and water resources journals in 2017. Like studies from other fields, we reproduced results for only a small fraction of articles (1.6% of tested articles) using their available artifacts. We estimated, with 95% confidence, that results might be reproduced for only 0.6% to 6.8% of all 1,989 articles. Unlike prior studies, the survey tool identified key bottlenecks to making work more reproducible. Bottlenecks include: only some digital artifacts available (44% of articles), no directions (89%), or all artifacts available but results not reproducible (5%). The tool (or extensions) can help authors, journals, funders, and institutions to self-assess manuscripts, provide feedback to improve reproducibility, and recognize and reward reproducible articles as examples for others. --- title: | Big Idea for a Big Challenge: Influencing Reproducibility on an Institutional Scale link: https://dc.uthsc.edu/cgi/viewcontent.cgi?article=1003&context=scmla date: 2019-02-19 00:00:00 tags: [reproducibility talk] description: | Presentation about the reproducibility initiative at the University of Utah. --- title: | The ReproRubric: Evaluation Criteria for the Reproducibility of Computational Analyses link: https://osf.io/thvef/ date: 2019-02-19 00:00:00 tags: [reproducible paper] description: | The computational reproducibility of analytic results has been discussed and evaluated in many different scientific disciplines, all of which have one finding in common: analytic results are far too often not reproducible.
There are numerous examples of reproducibility guidelines for various applications; however, a comprehensive assessment tool for evaluating the individual components of the research pipeline was unavailable. To address this need, COS developed the ReproRubric, which defines multiple Tiers of reproducibility based on criteria established for each critical stage of the typical research workflow - from initial design of the experiment through final reporting of the results. --- title: | How to increase reproducibility and transparency in your research link: https://blogs.egu.eu/geolog/2019/02/01/reproducibility-and-transparency-in-research/ date: 2019-02-09 00:00:00 tags: [news article] description: | Contemporary science faces many challenges in publishing results that are reproducible. This is due to increased usage of data and digital technologies as well as heightened demands for scholarly communication. These challenges have led to widespread calls for more research transparency, accessibility, and reproducibility from the science community. This article presents current findings and solutions to these problems, including recent new software that makes writing submission-ready manuscripts for journals of Copernicus Publications a lot easier. --- title: | Facilitating Replication and Reproducibility in Team Science: The 'projects' R Package link: https://www.biorxiv.org/content/biorxiv/early/2019/02/04/540542.full.pdf1 date: 2019-02-09 00:00:00 tags: [reproducible paper] description: | The contemporary scientific community places a growing emphasis on the reproducibility of research. The projects R package is a free, open-source endeavor created in the interest of facilitating reproducible research workflows. It adds to existing software tools for reproducible research and introduces several practical features that are helpful for scientists and their collaborative research teams.
For each individual project, it supplies an intuitive framework for storing raw and cleaned study data sets, and provides script templates for protocol creation, data cleaning, data analysis and manuscript development. Internal databases of project and author information are generated and displayed, and manuscript title pages containing author lists and their affiliations are automatically generated from the internal database. File management tools allow teams to organize multiple projects. When used on a shared file system, multiple researchers can harmoniously contribute to the same project in a less punctuated manner, reducing the frequency of misunderstandings and the need for status updates. --- title: | The Brazilian Reproducibility Initiative link: https://cdn.elifesciences.org/articles/41602/elife-41602-v1.pdf date: 2019-02-09 00:00:00 tags: [reproducible paper] description: | Most efforts to estimate the reproducibility of published findings have focused on specific areas of research, even though science is usually assessed and funded on a regional or national basis. Here we describe a project to assess the reproducibility of findings in biomedical science published by researchers based in Brazil. The Brazilian Reproducibility Initiative is a systematic, multi-center effort to repeat between 60 and 100 experiments: the project will focus on a set of common laboratory methods, repeating each experiment in three different laboratories. The results, due in 2021, will allow us to estimate the level of reproducibility of biomedical science in Brazil, and to investigate what the published literature can tell us about the reproducibility of research in a given area.
--- title: | The reproducibility crisis in the age of digital medicine link: https://www.nature.com/articles/s41746-019-0079-z date: 2019-02-02 00:00:00 tags: [news article] description: | As databases of medical information are growing, the cost of analyzing data is falling, and computer scientists, engineers, and investment are flooding into the field, digital medicine is subject to increasingly hyperbolic claims. Every week brings news of advances: superior algorithms that can predict clinical events and disease trajectory, classify images better than humans, translate clinical texts, and generate sensational discoveries around new risk factors and treatment effects. Yet the excitement about digital medicine—along with the technologies like the ones that enable a million people to watch a major event—poses risks for its robustness. How many of those new findings, in other words, are likely to be reproducible? --- title: | Ten simple rules on how to create open access and reproducible molecular simulations of biological systems link: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006649 date: 2019-01-22 00:00:00 tags: [reproducible paper] description: | These 10 simple rules should not be limited to molecular dynamics but also include Monte Carlo simulations, quantum mechanics calculations, molecular docking, and any other computational methods involving computations on biological molecules. --- title: | The Reproducibility of Economics Research: A Case Study link: https://hautahi.com/static/docs/Replication_aejae.pdf date: 2019-01-22 00:00:00 tags: [reproducible paper] description: | Published reproductions or replications of economics research are rare. However, recent years have seen increased recognition of the important role of replication in the scientific endeavor. 
We describe and present the results of a large reproduction exercise in which we assess the reproducibility of research articles published in the American Economic Journal: Applied Economics over the last decade. 69 of the 162 eligible replication attempts (42.6%) successfully replicated the article’s analysis. A further 68 (42%) were at least partially successful. A total of 98 out of 303 (32.3%) relied on confidential or proprietary data, and were thus not reproducible by this project. We also conduct several bibliometric analyses of reproducible vs. non-reproducible articles. --- title: | The Dagstuhl Beginners Guide to Reproducibility for Experimental Networking Research link: https://vaibhavbajpai.com/documents/papers/proceedings/dagstuhl-reproducibility-guidelines-ccr-2019.pdf date: 2019-01-22 00:00:00 tags: [reproducible paper] description: | Reproducibility is one of the key characteristics of good science, but hard to achieve for experimental disciplines like Internet measurements and networked systems. This guide provides advice to researchers, particularly those new to the field, on designing experiments so that their work is more likely to be reproducible and to serve as a foundation for follow-on work by others. --- title: | Is it Safe to Dockerize my Database Benchmark? link: https://dbermbach.github.io/publications/2019-sac-dads.pdf date: 2019-01-22 00:00:00 tags: [reproducible paper] description: | Docker seems to be an attractive solution for cloud database benchmarking as it simplifies the setup process through pre-built images that are portable and simple to maintain. However, the usage of Docker for benchmarking is only valid if there is no effect on measurement results. Existing work has so far only focused on the performance overheads that Docker directly induces for specific applications. In this paper, we have studied indirect effects of dockerization on the results of database benchmarking.
Among others, our results clearly show that containerization has a measurable and non-constant influence on measurement results and should, hence, only be used after careful analysis. --- title: | A Reaction Norm Perspective on Reproducibility link: https://www.biorxiv.org/content/biorxiv/early/2019/01/07/510941.full.pdf date: 2019-01-15 00:00:00 tags: [reproducible paper] description: | Reproducibility in biomedical research, and more specifically in preclinical animal research, has been seriously questioned. Several cases of spectacular failures to replicate findings published in the primary scientific literature have led to a perceived reproducibility crisis. Diverse threats to reproducibility have been proposed, including lack of scientific rigour, low statistical power, publication bias, analytical flexibility and fraud. An important aspect that is generally overlooked is the lack of external validity caused by rigorous standardization of both the animals and the environment. Here, we argue that a reaction norm approach to phenotypic variation, acknowledging gene-by-environment interactions, can help us see the reproducibility of animal experiments in a new light. We illustrate how dominating environmental effects can affect inference and effect size estimates of studies and how elimination of dominant factors through standardization affects the nature of the expected phenotype variation. We do this by introducing a construct that we dubbed the reaction norm of small effects. Finally, we discuss the consequences of a reaction norm of small effects for statistical analysis, specifically for random effect latent variable models and the random lab model. --- title: | The Costs of Reproducibility link: https://www.sciencedirect.com/science/article/pii/S0896627318310390 date: 2019-01-08 00:00:00 tags: [reproducible paper] description: | Improving the reproducibility of neuroscience research is of great concern, especially to early-career researchers (ECRs).
Here I outline the potential costs for ECRs in adopting practices to improve reproducibility. I highlight the ways in which ECRs can achieve their career goals while doing better science and the need for established researchers to support them in these efforts. --- title: | Towards an Open (Data) Science Analytics-Hub for Reproducible Multi-Model Climate Analysis at Scale link: https://github.com/osbd/osbd-2018/blob/master/proceedings/S09213_5108.pdf date: 2019-01-03 00:00:00 tags: [reproducible paper] description: | Open Science is key to future scientific research and promotes a deep transformation in the whole scientific research process encouraging the adoption of transparent and collaborative scientific approaches aimed at knowledge sharing. Open Science is increasingly gaining attention in the current and future research agenda worldwide. To effectively address Open Science goals, besides Open Access to results and data, it is also paramount to provide tools or environments to support the whole research process, in particular the design, execution and sharing of transparent and reproducible experiments, including data provenance (or lineage) tracking. This work introduces the Climate Analytics-Hub, a new component on top of the Earth System Grid Federation (ESGF), which joins big data approaches and parallel computing paradigms to provide an Open Science environment for reproducible multi-model climate change data analytics experiments at scale. An operational implementation has been set up at the SuperComputing Centre of the Euro-Mediterranean Center on Climate Change, with the main goal of becoming a reference Open Science hub in the climate community regarding the multi-model analysis based on the Coupled Model Intercomparison Project (CMIP). 
--- title: | A Practical Roadmap for Provenance Capture and Data Analysis in Spark-based Scientific Workflows link: https://sc18.supercomputing.org/proceedings/workshops/workshop_files/ws_works111s2-file1.pdf date: 2018-12-29 00:00:00 tags: [reproducible paper] description: | Whenever high-performance computing applications meet data-intensive scalable systems, an attractive approach is the use of Apache Spark for the management of scientific workflows. Spark provides several advantages such as being widely supported and granting efficient in-memory data management for large-scale applications. However, Spark still lacks support for data tracking and workflow provenance. Additionally, Spark’s memory management requires accessing all data movements between the workflow activities. Therefore, the running of legacy programs on Spark is interpreted as a "black-box" activity, which prevents the capture and analysis of implicit data movements. Here, we present SAMbA, an Apache Spark extension for the gathering of prospective and retrospective provenance and domain data within distributed scientific workflows. Our approach relies on enveloping both RDD structure and data contents at runtime so that (i) RDD-enclosure consumed and produced data are captured and registered by SAMbA in a structured way, and (ii) provenance data can be queried during and after the execution of scientific workflows. By following the W3C PROV representation, we model the roles of RDD regarding prospective and retrospective provenance data. Our solution provides mechanisms for the capture and storage of provenance data without jeopardizing Spark’s performance. The provenance retrieval capabilities of our proposal are evaluated in a practical case study, in which data analytics are provided by several SAMbA parameterizations. 
--- title: | Everything Matters: The ReproNim Perspective on Reproducible Neuroimaging link: https://osf.io/u78a6/ date: 2018-12-29 00:00:00 tags: [reproducible paper] description: | There has been a recent major upsurge in the concerns about reproducibility in many areas of science. Within the neuroimaging domain, one approach to promote reproducibility is to target the re-executability of the publication. The information supporting such re-executability can enable the detailed examination of how an initial finding generalizes across changes in the processing approach and sampled population, in a controlled scientific fashion. ReproNim: A Center for Reproducible Neuroimaging Computation is a recently funded initiative that seeks to facilitate the ‘last mile’ implementations of core re-executability tools in order to reduce the accessibility barrier and increase adoption of standards and best practices at the neuroimaging research laboratory level. In this report, we summarize the overall approach and tools we have developed in this domain. --- title: | Semantic workflows for benchmark challenges: Enhancing comparability, reusability and reproducibility link: https://psb.stanford.edu/psb-online/proceedings/psb19/srivastava.pdf date: 2018-12-17 00:00:00 tags: [reproducible paper] description: | WINGS enables researchers to submit complete semantic workflows as challenge submissions. By submitting entries as workflows, it then becomes possible to compare not just the results and performance of a challenger, but also the methodology employed. This is particularly important when dozens of challenge entries may use nearly identical tools, but with only subtle changes in parameters (and radical differences in results). WINGS uses a component-driven workflow design and offers intelligent parameter and data selection by reasoning about data characteristics.
--- title: | Analysis validation has been neglected in the Age of Reproducibility link: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000070 date: 2018-12-17 00:00:00 tags: [reproducible paper] description: | Increasingly complex statistical models are being used for the analysis of biological data. Recent commentary has focused on the ability to compute the same outcome for a given dataset (reproducibility). We argue that a reproducible statistical analysis is not necessarily valid because of unique patterns of nonindependence in every biological dataset. We advocate that analyses should be evaluated with known-truth simulations that capture biological reality, a process we call “analysis validation.” We review the process of validation and suggest criteria that a validation project should meet. We find that different fields of science have historically failed to meet all criteria, and we suggest ways to implement meaningful validation in training and practice. --- title: | Papers with code. Sorted by stars. Updated weekly. link: https://github.com/zziz/pwc date: 2018-12-11 00:00:00 tags: [reproducibility talk] description: | A long list of papers with link to the code that supports the papers. --- title: | "I want to do the right thing but..." link: https://www3.ntu.edu.sg/conference/RIC/Slides/6C_SHAPES_CBmE%20slides_I_want_to_do_the_right_thing_but....pdf date: 2018-12-11 00:00:00 tags: [reproducibility talk] description: | Presentation from Singapore Meeting on Research Integrity Reproducibility: Research integrity but much, much more. --- title: | Improving Quality, Reproducibility, and Usability of FRET‐Based Tension Sensors link: https://onlinelibrary.wiley.com/doi/pdf/10.1002/cyto.a.23688 date: 2018-12-11 00:00:00 tags: [reproducible paper] description: | Mechanobiology, the study of how mechanical forces affect cellular behavior, is an emerging field of study that has garnered broad and significant interest. 
Researchers are currently seeking to better understand how mechanical signals are transmitted, detected, and integrated at a subcellular level. One tool for addressing these questions is a Förster resonance energy transfer (FRET)‐based tension sensor, which enables the measurement of molecular‐scale forces across proteins based on changes in emitted light. However, the reliability and reproducibility of measurements made with these sensors has not been thoroughly examined. To address these concerns, we developed numerical methods that improve the accuracy of measurements made using sensitized emission‐based imaging. To establish that FRET‐based tension sensors are versatile tools that provide consistent measurements, we used these methods, and demonstrated that a vinculin tension sensor is unperturbed by cell fixation, permeabilization, and immunolabeling. This suggests FRET‐based tension sensors could be coupled with a variety of immuno‐fluorescent labeling techniques. Additionally, as tension sensors are frequently employed in complex biological samples where large experimental repeats may be challenging, we examined how sample size affects the uncertainty of FRET measurements. In total, this work establishes guidelines to improve FRET‐based tension sensor measurements, validate novel implementations of these sensors, and ensure that results are precise and reproducible. --- title: | Science in hand: how art and craft can boost reproducibility link: https://www.nature.com/articles/d41586-018-07676-4 date: 2018-12-10 00:00:00 tags: [news article] description: | We — a surgeon, a research nurse and a synthetic chemist — looked beyond science to discover how people steeped in artistic skills might help to close this 'haptic gap', the deficit in skills of touch and object manipulation. We have found that craftspeople and performers can work fruitfully alongside scientists to address some of the challenges. 
We have also discovered striking similarities between the observational skills of an entomologist and an analytical chemist; the dexterity of a jeweller and a microsurgeon; the bodily awareness of a dancer and a space scientist; and the creative skills of a scientific glassblower, a reconstructive surgeon, a potter and a chef. --- title: | Data Pallets: Containerizing Storage For Reproducibility and Traceability link: https://arxiv.org/abs/1811.04740 date: 2018-11-18 00:00:00 tags: [reproducible paper] description: | Trusting simulation output is crucial for Sandia's mission objectives. We rely on these simulations to perform our high-consequence mission tasks given national treaty obligations. Other science and modeling applications, while they may have high-consequence results, still require the strongest levels of trust to enable using the result as the foundation for both practical applications and future research. To this end, the computing community has developed workflow and provenance systems to aid in both automating simulation and modeling execution as well as determining exactly how some output was created so that conclusions can be drawn from the data. Current approaches for workflows and provenance systems are all at the user level and have little to no system-level support, making them fragile, difficult to use, and incomplete solutions. The introduction of container technology is a first step towards encapsulating and tracking artifacts used in creating data and resulting insights, but their current implementation is focused solely on making it easy to deploy an application in an isolated "sandbox" and maintaining a strictly read-only mode to avoid any potential changes to the application. All storage activities are still using the system-level shared storage. This project explores extending the container concept to include storage as a new container type we call \emph{data pallets}.
Data Pallets are potentially writeable, auto generated by the system based on IO activities, and usable as a way to link the contained data back to the application and input deck used to create it. --- title: | A Model-Centric Analysis of Openness, Replication, and Reproducibility link: https://arxiv.org/abs/1811.04525 date: 2018-11-14 00:00:00 tags: [reproducible paper] description: | The literature on the reproducibility crisis presents several putative causes for the proliferation of irreproducible results, including HARKing, p-hacking and publication bias. Without a theory of reproducibility, however, it is difficult to determine whether these putative causes can explain most irreproducible results. Drawing from an historically informed conception of science that is open and collaborative, we identify the components of an idealized experiment and analyze these components as a precursor to develop such a theory. Openness, we suggest, has long been intuitively proposed as a solution to irreproducibility. However, this intuition has not been validated in a theoretical framework. Our concern is that the under-theorizing of these concepts can lead to flawed inferences about the (in)validity of experimental results or integrity of individual scientists. We use probabilistic arguments and examine how openness of experimental components relates to reproducibility of results. We show that there are some impediments to obtaining reproducible results that precede many of the causes often cited in literature on the reproducibility crisis. For example, even if erroneous practices such as HARKing, p-hacking, and publication bias were absent at the individual and system level, reproducibility may still not be guaranteed. 
--- title: | Reproducible Publications at AGILE Conferences link: https://agile-online.org/agile-actions/current-initiatives/reproducible-publications-at-agile-conferences date: 2018-11-09 00:00:00 tags: [reproducibility conference, reproducibility guidelines] description: | The council of the Association of Geographic Information Laboratories in Europe (AGILE) provides funding to support a new AGILE initiative, "Reproducible Publications at AGILE Conferences", which will develop protocols for publishing reproducible research in AGILE conference publications. The aim is to support and improve the way we describe our science and to enhance the usefulness of AGILE conference publications to the wider community. The potential benefits of this include greater research transparency, enhanced citations of published articles and increased relevance of the conference in the field. The funding will support a workshop attended by domain experts to develop author and reviewer guidelines that will be presented at the AGILE 2019 conference. The initiative members are Daniel Nüst (Institute for Geoinformatics, University of Münster, Münster, Germany), Frank Ostermann (Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Enschede, The Netherlands), Rusne Sileryte (Faculty of Architecture and the Built Environment, Delft University of Technology, Delft, The Netherlands), Carlos Granell (Institute of New Imaging Technologies, Universitat Jaume I de Castellón, Castellón, Spain), and Barbara Hofer (Interfaculty Department of Geoinformatics - Z_GIS, University of Salzburg, Salzburg, Austria).
--- title: | Conducting Replication Studies With Confidence link: https://www.healio.com/nursing/journals/jne/2018-11-57-11/%7B304888b9-6291-4a92-b4e4-2d5910e85110%7D/conducting-replication-studies-with-confidence date: 2018-11-08 00:00:00 tags: [reproducible paper] description: | Although essential to the development of a robust evidence base for nurse educators, the concepts of replication and reproducibility have received little attention in the nursing education literature. In this Methodology Corner installment, the concepts of study replication and reproducibility are explored in depth. In designing, conducting, and documenting the findings of studies in nursing education, researchers are encouraged to make design choices that improve study replicability and reproducibility of study findings. [J Nurs Educ. 2018;57(11):638–640.] There has been considerable discussion in the professional literature about questionable research practices that raise doubt about the credibility of research findings (Shrout & Rodgers, 2018) and that limit reproducibility of research findings (Shepherd, Peratikos, Rebeiro, Duda, & McCowan, 2017). This discussion has led to what scientists term a replication crisis (Goodman, Fanelli, & Ioannidis, 2016). Although investigators in various disciplines have provided suggestions to address this crisis (Alvarez, Key, & Núñez, 2018; Goodman et al., 2016; Shrout & Rodgers, 2018), similar discussions or reports of replication within nursing education literature are limited, despite a call for replication studies (Morin, 2016). Consequently, the focus of this article is on replication and reproducibility. The topic is important, given that the hallmark of good science is being able to replicate or reproduce findings (Morin, 2016). Replication serves to provide “stability in our knowledge of nature” (Schmidt, 2009, p. 92).
--- title: | Software to improve transfer and reproducibility of cell culture methods link: https://www.future-science.com/doi/pdf/10.2144/btn-2018-0062 date: 2018-11-08 00:00:00 tags: [reproducible paper] description: | Cell culture is a vital component of laboratories throughout the scientific community, yet the absence of standardized protocols and documentation practice challenges laboratory efficiency and scientific reproducibility. We examined the effectiveness of a cloud-based software application, CultureTrax®, as a tool for standardizing and transferring a complex cell culture protocol. The software workflow and template were used to electronically format a cardiomyocyte differentiation protocol and share a digitally executable copy with a different lab user. While the protocol was unfamiliar to the recipient, they executed the experiment by solely using CultureTrax and successfully derived cardiomyocytes from human induced pluripotent stem cells. This software tool significantly reduced the time and resources required to effectively transfer and implement a novel protocol. --- title: | Practical Data Curation for Reproducibility link: https://osf.io/preprints/lissa/fp8cd date: 2018-11-08 00:00:00 tags: [reproducibility talk] description: | This presentation will review incentives for researchers to engage in reproducibility and data sharing practices and offer practical solutions for metadata, file handling, preservation, and licensing issues. It will focus on pragmatic motivations and methods for integrating reproducibility concepts into existing processes. --- title: | Replicability or reproducibility? 
On the replication crisis in computational neuroscience and sharing only relevant detail link: https://link.springer.com/article/10.1007/s10827-018-0702-z date: 2018-11-01 00:00:00 tags: [reproducible paper] description: | Replicability and reproducibility of computational models has been somewhat understudied by “the replication movement.” In this paper, we draw on methodological studies into the replicability of psychological experiments and on the mechanistic account of explanation to analyze the functions of model replications and model reproductions in computational neuroscience. We contend that model replicability, or independent researchers' ability to obtain the same output using original code and data, and model reproducibility, or independent researchers' ability to recreate a model without original code, serve different functions and fail for different reasons. This means that measures designed to improve model replicability may not enhance (and, in some cases, may actually damage) model reproducibility. We claim that although both are undesirable, low model reproducibility poses more of a threat to long-term scientific progress than low model replicability. In our opinion, low model reproducibility stems mostly from authors' omitting to provide crucial information in scientific papers and we stress that sharing all computer code and data is not a solution. Reports of computational studies should remain selective and include all and only relevant bits of code. --- title: | To Clean or Not to Clean: Document Preprocessing and Reproducibility link: https://dl.acm.org/citation.cfm?id=3242180 date: 2018-11-01 00:00:00 tags: [reproducible paper] description: | Web document collections such as WT10G, GOV2, and ClueWeb are widely used for text retrieval experiments. Documents in these collections contain a fair amount of non-content-related markup in the form of tags, hyperlinks, and so on. 
Published articles that use these corpora generally do not provide specific details about how this markup information is handled during indexing. However, this question turns out to be important: Through experiments, we find that including or excluding metadata in the index can produce significantly different results with standard IR models. More importantly, the effect varies across models and collections. For example, metadata filtering is found to be generally beneficial when using BM25, or language modeling with Dirichlet smoothing, but can significantly reduce retrieval effectiveness if language modeling is used with Jelinek-Mercer smoothing. We also observe that, in general, the performance differences become more noticeable as the amount of metadata in the test collections increases. Given this variability, we believe that the details of document preprocessing are significant from the point of view of reproducibility. In a second set of experiments, we also study the effect of preprocessing on query expansion using RM3. In this case, once again, we find that it is generally better to remove markup before using documents for query expansion. --- title: | The Rule of Two: A Star Wars Edict or a Method of Reproducibility and Quality? link: https://www.preprints.org/manuscript/201810.0357/v1 date: 2018-10-23 00:00:00 tags: [reproducible paper] description: | In recent years, biomedical research has faced increased scrutiny over issues related to reproducibility and quality in scientific findings (1-3). In response to this scrutiny, funding institutions and journals have implemented top-down policies for grant and manuscript review. While a positive step forward, the long-term merit of these policies is questionable given their emphasis on completing a check-list of items instead of a fundamental re-assessment of how scientific investigation is conducted. 
Moreover, the top-down style of management used to institute these policies can be argued to be ineffective in engaging the scientific workforce to act upon these issues. To meet current and future biomedical needs, new investigative methods that emphasize collective-thinking, teamwork, shared knowledge and cultivate change from the bottom-up are warranted. Here, a perspective on a new approach to biomedical investigation within the individual laboratory that emphasizes collaboration and quality is discussed. --- title: | Editorial: Revised Guidelines to Enhance the Rigor and Reproducibility of Research Published in American Physiological Society Journals link: https://www.physiology.org/doi/pdf/10.1152/ajpregu.00274.2018 date: 2018-10-23 00:00:00 tags: [news article] description: | A challenge in modern research is the common inability to repeat novel findings published in even the most “impact-heavy” journals. In the great majority of instances, this may simply be due to a failure of the published manuscripts to include—and the publisher to require—comprehensive information on experimental design, methods, reagents, or the in vitro and in vivo systems under study. Failure to accurately reproduce all environmental influences on an experiment, particularly those using animals, also contributes to inability to repeat novel findings. The most common reason for failures of reproducibility may well be in the rigor and transparency with which methodology is described by authors. Another reason may be the reluctance by more established investigators to break with traditional methods of data presentation. However, one size does not fit all when it comes to data presentation, particularly because of the wide variety of data formats presented in individual disciplines represented by journals. Thus, some flexibility needs to be allowed. 
The American Physiological Society (APS) has made available guidelines for transparent reporting that it recommends all authors follow (https://www.physiology.org/author-info.promoting-transparent-reporting) (https://www.physiology.org/author-info.experimental-details-to-report). These are just some of the efforts being made to facilitate the communication of discovery in a transparent manner, which complement what has been a strength of the discipline for many years—the ability of the scientists and scientific literature to self-correct (8). --- title: | Teaching Computational Reproducibility for Neuroimaging link: https://www.frontiersin.org/articles/10.3389/fnins.2018.00727/full date: 2018-10-23 00:00:00 tags: [reproducible paper] description: | We describe a project-based introduction to reproducible and collaborative neuroimaging analysis. Traditional teaching on neuroimaging usually consists of a series of lectures that emphasize the big picture rather than the foundations on which the techniques are based. The lectures are often paired with practical workshops in which students run imaging analyses using the graphical interface of specific neuroimaging software packages. Our experience suggests that this combination leaves the student with a superficial understanding of the underlying ideas, and an informal, inefficient, and inaccurate approach to analysis. To address these problems, we based our course around a substantial open-ended group project. This allowed us to teach: (a) computational tools to ensure computationally reproducible work, such as the Unix command line, structured code, version control, automated testing, and code review and (b) a clear understanding of the statistical techniques used for a basic analysis of a single run in an MR scanner. The emphasis we put on the group project showed the importance of standard computational tools for accuracy, efficiency, and collaboration. 
The projects were broadly successful in engaging students in working reproducibly on real scientific questions. We propose that a course on this model should be the foundation for future programs in neuroimaging. We believe it will also serve as a model for teaching efficient and reproducible research in other fields of computational science. --- title: | Experimental deception: Science, performance, and reproducibility link: https://psyarxiv.com/93p45/ date: 2018-10-23 00:00:00 tags: [reproducible paper] description: | Experimental deception has not been seriously examined in terms of its impact on reproducible science. I demonstrate, using data from the Open Science Collaboration’s Reproducibility Project (2015), that experiments involving deception have a higher probability of not replicating and have smaller effect sizes compared to experiments that do not have deception procedures. This trend is possibly due to missing information about the context and performance of agents in the studies in which the original effects were generated, leading to either compromised internal validity, or an incomplete specification and control of variables in replication studies. Of special interest are the mechanisms by which deceptions are implemented and how these present challenges for the efficient transmission of critical information from experimenter to participant. I rehearse possible frameworks that might form the basis of a future research program on experimental deception and make some recommendations as to how such a program might be initiated. --- title: | University researchers push for better research methods link: http://www.mndaily.com/article/2018/10/n-university-researchers-push-for-better-research-methods date: 2018-10-16 00:00:00 tags: [popular media] description: | Faculty members and graduate students at the University of Minnesota have formed a workshop to hold discussions about reproducibility in research studies. 
The discussions come during a national movement to replicate research in social science fields, such as psychology. The movement has shown many previous studies are not reliable. After discussions last spring regarding ways the University can address these research practices, the Minnesota Center for Philosophy of Science designed workshops for faculty and students to discuss ways to develop replicable research methods. --- title: | Open Science as Better Gatekeepers for Science and Society: A Perspective from Neurolaw link: https://psyarxiv.com/8dr23/ date: 2018-10-22 00:00:00 tags: [reproducible paper] description: | Results from cognitive neuroscience have been cited as evidence in courtrooms around the world, and their admissibility has been a challenge for the legal system. Unfortunately, the recent reproducibility crisis in cognitive neuroscience, showing that the published studies in cognitive neuroscience may not be as trustworthy as expected, has made the situation worse. Here we analysed how the irreproducible results in cognitive neuroscience literature could compromise the standards for admissibility of scientific evidence, and pointed out how the open science movement may help to alleviate these problems. We conclude that open science not only benefits the scientific community but also the legal system, and society in a broad sense. Therefore, we suggest both scientists and practitioners follow open science recommendations and uphold the best available standards in order to serve as good gatekeepers in their own fields. Moreover, scientists and practitioners should collaborate closely to maintain an effective functioning of the entire gatekeeping system of the law. 
--- title: | Responding to the growing issue of research reproducibility link: https://avs.scitation.org/doi/pdf/10.1116/1.5049141 date: 2018-10-16 00:00:00 tags: [reproducible paper] description: | An increasing number of studies, surveys, and editorials highlight experimental and computational reproducibility and replication issues that appear to pervade most areas of modern science. This perspective examines some of the multiple and complex causes of what has been called a "reproducibility crisis," which can impact materials, interface/(bio)interphase, and vacuum sciences. Reproducibility issues are not new to science, but they are now appearing in new forms requiring innovative solutions. Drivers include the increasingly multidiscipline, multimethod nature of much advanced science, increased complexity of the problems and systems being addressed, and the large amounts and multiple types of experimental and computational data being collected and analyzed in many studies. Sustained efforts are needed to address the causes of reproducibility problems that can hinder the rate of scientific progress and lower public and political regard for science. The initial efforts of the American Vacuum Society to raise awareness of a new generation of reproducibility challenges and provide tools to help address them serve as examples of mitigating actions that can be undertaken. --- title: | Building a Reproducible Machine Learning Pipeline link: https://arxiv.org/abs/1810.04570v1 date: 2018-10-16 00:00:00 tags: [reproducible paper] description: | Reproducibility of modeling is a problem that exists for any machine learning practitioner, whether in industry or academia. The consequences of an irreproducible model can include significant financial costs, lost time, and even loss of personal reputation (if results prove unable to be replicated). 
This paper will first discuss the problems we have encountered while building a variety of machine learning models, and subsequently describe the framework we built to tackle the problem of model reproducibility. The framework comprises four main components (data, feature, scoring, and evaluation layers), which are themselves composed of well-defined transformations. This enables us to not only exactly replicate a model, but also to reuse the transformations across different models. As a result, the platform has dramatically increased the speed of both offline and online experimentation while also ensuring model reproducibility. --- title: | Reference environments: A universal tool for reproducibility in computational biology link: https://arxiv.org/pdf/1810.03766.pdf date: 2018-10-15 00:00:00 tags: [reproducible paper] description: | The drive for reproducibility in the computational sciences has provoked discussion and effort across a broad range of perspectives: technological, legislative/policy, education, and publishing. Discussion on these topics is not new, but the need to adopt standards for reproducibility of claims made based on computational results is now clear to researchers, publishers and policymakers alike. Many technologies exist to support and promote reproduction of computational results: containerisation tools like Docker, literate programming approaches such as Sweave, knitr, iPython or cloud environments like Amazon Web Services. But these technologies are tied to specific programming languages (e.g. Sweave/knitr to R; iPython to Python) or to platforms (e.g. Docker for 64-bit Linux environments only). To date, no single approach is able to span the broad range of technologies and platforms represented in computational biology and biotechnology. 
To enable reproducibility across computational biology, we demonstrate an approach and provide a set of tools that is suitable for all computational work and is not tied to a particular programming language or platform. We present published examples from a series of papers in different areas of computational biology, spanning the major languages and technologies in the field (Python/R/MATLAB/Fortran/C/Java). Our approach produces a transparent and flexible process for replication and recomputation of results. Ultimately, its most valuable aspect is the decoupling of methods in computational biology from their implementation. Separating the 'how' (method) of a publication from the 'where' (implementation) promotes genuinely open science and benefits the scientific community as a whole. --- title: | The Brazilian Reproducibility Initiative: a systematic assessment of Brazilian biomedical science link: https://osf.io/ahf7t/ date: 2018-10-08 00:00:00 tags: [reproducible paper] description: | With concerns over research reproducibility on the rise, systematic replications of published science have become an important tool to estimate the replicability of findings in specific areas. Nevertheless, such initiatives are still uncommon in biomedical science, and have never been performed at a national level. The Brazilian Reproducibility Initiative is a multicenter, systematic effort to assess the reproducibility of the country’s biomedical research by replicating between 50 and 100 experiments from Brazilian life sciences articles. The project will focus on a set of common laboratory methods, performing each experiment in multiple institutions across the country, with the reproducibility of published findings analyzed in the light of interlaboratory variability. 
The results, due in 2021, will allow us not only to estimate the reproducibility of Brazilian biomedical science, but also to investigate if there are aspects of the published literature that can be used to predict it. --- title: | Towards Reproducible and Reusable Deep Learning Systems Research Artifacts link: https://openreview.net/pdf?id=Ske-2Gyk9X date: 2018-10-08 00:00:00 tags: [reproducible paper] description: | This paper discusses results and insights from the 1st ReQuEST workshop, a collective effort to promote reusability, portability and reproducibility of deep learning research artifacts within the Architecture/PL/Systems communities. ReQuEST (Reproducible Quality-Efficient Systems Tournament) exploits the open-source Collective Knowledge framework (CK) to unify benchmarking, optimization, and co-design of deep learning systems implementations and exchange results via a live multi-objective scoreboard. Systems evaluated under ReQuEST are diverse and include an FPGA-based accelerator, optimized deep learning libraries for x86 and ARM systems, and distributed inference in Amazon Cloud and over a cluster of Raspberry Pis. We finally discuss limitations to our approach, and how we plan to improve upon those limitations for the upcoming SysML artifact evaluation effort. --- title: | Predicting computational reproducibility of data analysis pipelines in large population studies using collaborative filtering link: https://arxiv.org/abs/1809.10139 date: 2018-10-03 00:00:00 tags: [reproducible paper] description: | Evaluating the computational reproducibility of data analysis pipelines has become a critical issue. It is, however, a cumbersome process for analyses that involve data from large populations of subjects, due to their computational and storage requirements. We present a method to predict the computational reproducibility of data analysis pipelines in large population studies. 
We formulate the problem as a collaborative filtering process, with constraints on the construction of the training set. We propose 6 different strategies to build the training set, which we evaluate on 2 datasets, a synthetic one modeling a population with a growing number of subject types, and a real one obtained with neuroinformatics pipelines. Results show that one sampling method, "Random File Numbers (Uniform)", is able to predict computational reproducibility with a good accuracy. We also analyze the relevance of including file and subject biases in the collaborative filtering model. We conclude that the proposed method is able to speed up reproducibility evaluations substantially, with a reduced accuracy loss. --- title: | A Serverless Tool for Platform Agnostic Computational Experiment Management link: https://arxiv.org/abs/1809.07693v1 date: 2018-09-25 00:00:00 tags: [reproducible paper, reproducibility infrastructure] description: | Neuroscience has been carried into the domain of big data and high performance computing (HPC) on the backs of initiatives in data collection and increasingly compute-intensive tools. While managing HPC experiments requires considerable technical acumen, platforms and standards have been developed to ease this burden on scientists. While web-portals make resources widely accessible, data organizations such as the Brain Imaging Data Structure and tool description languages such as Boutiques provide researchers with a foothold to tackle these problems using their own datasets, pipelines, and environments. While these standards lower the barrier to adoption of HPC and cloud systems for neuroscience applications, they still require the consolidation of disparate domain-specific knowledge. We present Clowdr, a lightweight tool to launch experiments on HPC systems and clouds, record rich execution records, and enable the accessible sharing of experimental summaries and results. 
Clowdr uniquely sits between web platforms and bare-metal applications for experiment management by preserving the flexibility of do-it-yourself solutions while providing a low barrier for developing, deploying and disseminating neuroscientific analysis. --- title: | Exploration of reproducibility issues in scientometric research link: https://openaccess.leidenuniv.nl/bitstream/handle/1887/65315/STI2018_paper_113.pdf?sequence=1 date: 2018-09-21 00:00:00 tags: [reproducible paper] description: | In scientometrics, we have not yet had an intensive debate about the reproducibility of research published in our field, although concerns about a lack of reproducibility have occasionally surfaced (see e.g. Glänzel & Schöpflin 1994 and Van den Besselaar et al. 2017), and the need to improve the reproducibility is used as an important argument for open citation data (see www.issi-society.org/open-citations-letter/). We initiated a first discussion about reproducibility in scientometrics with a workshop at ISSI 2017 in Wuhan. One of the outcomes was the sense that scientific fields differ with regard to the type and pervasiveness of threats to the reproducibility of their published research, last but not least due to their differences in modes of knowledge production, such as confirmatory versus exploratory study designs, and differences in methods and empirical objects. --- title: | Reproducibility and Replicability in a Fast-paced Methodological World link: https://psyarxiv.com/cnq4d/ date: 2018-09-21 00:00:00 tags: [reproducible paper] description: | Methodological developments and software implementations progress in increasingly faster time-frames. The introduction and widespread acceptance of pre-print archived reports and open-source software make state-of-the-art statistical methods readily accessible to researchers. 
At the same time, researchers increasingly emphasize that their results should be reproducible (using the same data obtaining the same results), which is a basic requirement for assessing the replicability (obtaining similar results in new data) of results. While the age of fast-paced methodology greatly facilitates reproducibility, it also undermines it in ways not often realized by researchers. The goal of this paper is to make researchers aware of these caveats. I discuss sources of limited replicability and reproducibility in both the development of novel statistical methods and their implementation in software routines. Novel methodology comes with many researcher degrees of freedom, and new understanding comes with changing standards over time. In software development, reproducibility may be impacted by software developing and changing over time, a problem that is greatly magnified by large dependency trees between software packages. The paper concludes with a list of recommendations for both developers and users of new methods to improve reproducibility of results. --- title: | Simple changes of individual studies can improve the reproducibility of the biomedical scientific process as a whole link: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0202762 date: 2018-09-11 00:00:00 tags: [reproducible paper] description: | We developed a new probabilistic model to assess the impact of recommendations rectifying the reproducibility crisis (by publishing both positive and 'negative' results and increasing statistical power) on competing objectives, such as discovering causal relationships, avoiding publishing false positive results, and reducing resource consumption. In contrast to recent publications, our model quantifies the impact of each single suggestion not only for an individual study but especially their relation and consequences for the overall scientific process. 
We can prove that higher-powered experiments can save resources in the overall research process without generating excess false positives. The better the quality of the pre-study information and its exploitation, the more likely this beneficial effect is to occur. Additionally, we quantify the adverse effects of both neglecting good practices in the design and conduct of hypotheses-based research, and the omission of the publication of 'negative' findings. Our contribution is a plea for adherence to or reinforcement of the good scientific practice and publication of 'negative' findings. --- title: | Issues in Reproducible Simulation Research link: https://link.springer.com/article/10.1007/s11538-018-0496-1 date: 2018-09-11 00:00:00 tags: [reproducible paper] description: | In recent years, serious concerns have arisen about reproducibility in science. Estimates of the cost of irreproducible preclinical studies range from 28 billion USD per year in the USA alone (Freedman et al. in PLoS Biol 13(6):e1002165, 2015) to over 200 billion USD per year worldwide (Chalmers and Glasziou in Lancet 374:86–89, 2009). The situation in the social sciences is not very different: Reproducibility in psychological research, for example, has been estimated to be below 50% as well (Open Science Collaboration in Science 349:6251, 2015). Less well studied is the issue of reproducibility of simulation research. A few replication studies of agent-based models, however, suggest the problem for computational modeling may be more severe than for laboratory experiments (Willensky and Rand in JASSS 10(4):2, 2007; Donkin et al. in Environ Model Softw 92:142–151, 2017; Bajracharya and Duboz in: Proceedings of the symposium on theory of modeling and simulation—DEVS integrative M&S symposium, pp 6–11, 2013). 
In this perspective, we discuss problems of reproducibility in agent-based simulations of life and social science problems, drawing on best practices research in computer science and in wet-lab experiment design and execution to suggest some ways to improve simulation research practice. --- title: | Reproducibility study of a PDEVS model application to fire spreading link: https://dl.acm.org/citation.cfm?id=3275411 date: 2018-09-11 00:00:00 tags: [reproducible paper,reproducibility infrastructure] description: | The results of a scientific experiment have to be reproduced to be valid. The scientific method is well known in experimental sciences but this is not always the case for computer scientists. Recent publications and studies have shown that there is a significant reproducibility crisis in Biology and Medicine. This problem has also been demonstrated for hundreds of publications in computer science where only a limited set of publication results could be reproduced. In this paper we present the reproducibility challenge and we examine the reproducibility of a Parallel Discrete Event System Specification (PDEVS) model with two different execution frameworks. --- title: | Classification of Provenance Triples for Scientific Reproducibility: A Comparative Evaluation of Deep Learning Models in the ProvCaRe Project link: https://link.springer.com/chapter/10.1007/978-3-319-98379-0_3 date: 2018-09-11 00:00:00 tags: [reproducible paper,reproducibility infrastructure] description: | Scientific reproducibility is key to the advancement of science as researchers can build on sound and validated results to design new research studies. However, recent studies in biomedical research have highlighted key challenges in scientific reproducibility as more than 70% of researchers in a survey of more than 1500 participants were not able to reproduce results from other groups and 50% of researchers were not able to reproduce their own experiments. 
Provenance metadata is a key component of scientific reproducibility and as part of the Provenance for Clinical and Health Research (ProvCaRe) project, we have: (1) identified and modeled important provenance terms associated with a biomedical research study in the S3 model (formalized in the ProvCaRe ontology); (2) developed a new natural language processing (NLP) workflow to identify and extract provenance metadata from published articles describing biomedical research studies; and (3) developed the ProvCaRe knowledge repository to enable users to query and explore provenance of research studies using the S3 model. However, a key challenge in this project is the automated classification of provenance metadata extracted by the NLP workflow according to the S3 model and its subsequent querying in the ProvCaRe knowledge repository. In this paper, we describe the development and comparative evaluation of deep learning techniques for multi-class classification of structured provenance metadata extracted from biomedical literature using 12 different categories of provenance terms represented in the S3 model. We describe the application of the Long Short-Term Memory (LSTM) network, which has the highest classification accuracy of 86% in our evaluation, to classify more than 48 million provenance triples in the ProvCaRe knowledge repository (available at: https://provcare.case.edu/). --- title: | The reproducibility opportunity link: https://www.nature.com/articles/s41562-018-0398-0 date: 2018-09-04 00:00:00 tags: [news article] description: | It is important for research users to know how likely it is that reported research findings are true. The Social Science Replication Project finds that, in highly powered experiments, only 13 of 21 high-profile reports could be replicated. Investigating the factors that contribute to reliable results offers new opportunities for the social sciences. 
--- title: | Scientists Only Able to Reproduce Results for 13 out of 21 Human Behavior Studies link: https://gizmodo.com/scientists-only-able-to-reproduce-results-for-13-out-of-1828632222 date: 2018-08-29 00:00:00 tags: [news article] description: | If the results in a published study can’t be replicated in subsequent experiments, how can you trust what you read in scientific journals? One international group of researchers is well aware of this reproducibility crisis, and has been striving to hold scientists accountable. For their most recent test, they attempted to reproduce 21 studies from two of the top scientific journals, Science and Nature, that were published between 2010 and 2015. Only 13 of the reproductions produced the same results as the original study. --- title: | Practical guidelines for rigor and reproducibility in preclinical and clinical studies on cardioprotection link: https://link.springer.com/article/10.1007/s00395-018-0696-8 date: 2018-08-21 00:00:00 tags: [reproducible paper] description: | We refer to the recent guidelines for experimental models of myocardial ischemia and infarction [279], and aim to provide now practical guidelines to ensure rigor and reproducibility in preclinical and clinical studies on cardioprotection. In line with the above guidelines [279], we define rigor as standardized state-of-the-art design, conduct and reporting of a study, which is then a prerequisite for reproducibility, i.e. replication of results by another laboratory when performing exactly the same experiment. 
--- title: | Editorial: Data repositories, registries, and standards in the search for valid and reproducible biomarkers link: https://onlinelibrary.wiley.com/doi/abs/10.1111/jcpp.12962 date: 2018-08-21 00:00:00 tags: [reproducible paper] description: | The paucity of major scientific breakthroughs leading to new or improved treatments, and the inability to identify valid and reproducible biomarkers that improve clinical management, has produced a crisis in confidence in the validity of our pathogenic theories and the reproducibility of our research findings. This crisis in turn has driven changes in standards for research methodologies and prompted calls for the creation of open-access data repositories and the preregistration of research hypotheses. Although we should embrace the creation of repositories and registries, and the promise for greater statistical power, reproducibility, and generalizability of research findings they afford, we should also recognize that they alone are no substitute for sound design in minimizing study confounds, and they are no guarantor of faith in the validity of our pathogenic theories, findings, and biomarkers. One way, and maybe the only sure way, of knowing that we have a valid understanding of brain processes and disease mechanisms in human studies is by experimentally manipulating variables and predicting their effects on outcome measures and biomarkers. --- title: | ReproServer: Making Reproducibility Easier and Less Intensive link: https://arxiv.org/abs/1808.01406 date: 2018-08-12 00:00:00 tags: [reproducible paper, ReproZip] description: | Reproducibility in the computational sciences has been stymied because of the complex and rapidly changing computational environments in which modern research takes place. While many will espouse reproducibility as a value, the challenge of making it happen (both for themselves and testing the reproducibility of others' work) often outweighs the benefits. 
There have been a few reproducibility solutions designed and implemented by the community. In particular, the authors are contributors to ReproZip, a tool to enable computational reproducibility by tracing and bundling together research in the environment in which it takes place (e.g. one's computer or server). In this white paper, we introduce a tool for unpacking ReproZip bundles in the cloud, ReproServer. ReproServer takes an uploaded ReproZip bundle (.rpz file) or a link to a ReproZip bundle, and users can then unpack them in the cloud via their browser, allowing them to reproduce colleagues' work without having to install anything locally. This will help lower the barrier to reproducing others' work, which will aid reviewers in verifying the claims made in papers and reusing previously published research. --- title: | Building research evidence towards reproducibility of animal research link: http://blogs.plos.org/everyone/2018/08/06/arrive-rct date: 2018-08-07 00:00:00 tags: [popular news] description: | Since our debut in late 2006, PLOS ONE has strived to promote best practices in research reporting as a way to improve reproducibility in research. We have supported initiatives towards increased transparency, as well as the gathering of evidence that can inform improvements in the quality of reporting in research articles. In line with this commitment, PLOS ONE collaborated in a randomized controlled trial (RCT) to test the impact of an intervention asking authors to complete a reporting checklist at the time of manuscript submission. The results from this trial have recently been posted on bioRxiv (1) and provide a further step toward building the necessary evidence base to inform editorial interventions towards improving reporting quality. 
--- title: | Open Science Badges in the Journal of Neurochemistry link: https://onlinelibrary.wiley.com/doi/full/10.1111/jnc.14536 date: 2018-08-07 00:00:00 tags: [reproducible paper] description: | The Open Science Framework (OSF) has the mission to increase openness, integrity, and reproducibility in research. The Journal of Neurochemistry became a signatory of their Transparency and Openness guidelines in 2016, which provide eight modular standards (Citation standards, Data Transparency, Analytic Methods/Code Transparency, Research Materials Transparency, Design and Analysis Transparency, Study Pre-registration, Analysis Plan Transparency, Replication) with increasing levels of stringency. Furthermore, OSF recommends and offers a collection of practices intended to make scientific processes and results more transparent and available in a standardized way for reuse to people outside the research team. It includes making research materials, data, and laboratory procedures freely accessible online to anyone. This editorial announces the decision of the Journal of Neurochemistry to introduce Open Science Badges, maintained by the Open Science Badges Committee and by the Center for Open Science (COS). The Open Science Badges, visual icons placed on publications, certify that an open practice was followed and signal to readers that an author has shared the corresponding research evidence, thus allowing an independent researcher to understand how to reproduce the procedure. --- title: | Assessment of the impact of shared brain imaging data on the scientific literature link: https://europepmc.org/abstract/med/30026557 date: 2018-07-26 00:00:00 tags: [reproducible paper] description: | Data sharing is increasingly recommended as a means of accelerating science by facilitating collaboration, transparency, and reproducibility. While few oppose data sharing philosophically, a range of barriers deter most researchers from implementing it in practice. 
To justify the significant effort required for sharing data, funding agencies, institutions, and investigators need clear evidence of benefit. Here, using the International Neuroimaging Data-sharing Initiative, we present a case study that provides direct evidence of the impact of open sharing on brain imaging data use and resulting peer-reviewed publications. We demonstrate that openly shared data can increase the scale of scientific studies conducted by data contributors, and can recruit scientists from a broader range of disciplines. These findings dispel the myth that scientific findings using shared data cannot be published in high-impact journals, suggest the transformative power of data sharing for accelerating science, and underscore the need for implementing data sharing universally. --- title: | The State of Sustainable Research Software: Results from the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1) link: https://arxiv.org/pdf/1807.07387.pdf date: 2018-07-24 00:00:00 tags: [reproducibility report] description: | This article summarizes motivations, organization, and activities of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1) held in Manchester, UK in September 2017. The WSSSPE series promotes sustainable research software by positively impacting principles and best practices, careers, learning, and credit. This article discusses the Code of Conduct, idea papers, position papers, experience papers, demos, and lightning talks presented during the workshop. The main part of the article discusses the speed-blogging groups that formed during the meeting, along with the outputs of those sessions. 
--- title: | The value of universally available raw NMR data for transparency, reproducibility, and integrity in natural product research link: https://pubs.rsc.org/en/content/articlehtml/2018/np/c7np00064b date: 2018-07-19 00:00:00 tags: [reproducible paper] description: | With contributions from the global natural product (NP) research community, and continuing the Raw Data Initiative, this review collects a comprehensive demonstration of the immense scientific value of disseminating raw nuclear magnetic resonance (NMR) data, independently of, and in parallel with, classical publishing outlets. A comprehensive compilation of historic to present-day cases as well as contemporary and future applications shows that addressing the urgent need for a repository of publicly accessible raw NMR data has the potential to transform natural products (NPs) and associated fields of chemical and biomedical research. The call for advancing open sharing mechanisms for raw data is intended to enhance the transparency of experimental protocols, augment the reproducibility of reported outcomes, including biological studies, become a regular component of responsible research, and thereby enrich the integrity of NP research and related fields. --- title: | Preserving Workflow Reproducibility: The RePlay-DH Client as a Tool for Process Documentation link: http://www.lrec-conf.org/proceedings/lrec2018/pdf/707.pdf date: 2018-07-17 00:00:00 tags: [reproducible paper] description: | In this paper we present a software tool for elicitation and management of process metadata. It follows our previously published design idea of an assistant for researchers that aims at minimizing the additional effort required for producing a sustainable workflow documentation. With the ever-growing number of linguistic resources available, it also becomes increasingly important to provide proper documentation to make them comparable and to allow meaningful evaluations for specific use cases. 
The often prevailing practice of post hoc documentation of resource generation or research processes bears the risk of information loss. Not only does detailed documentation of a process aid in achieving reproducibility, it also increases usefulness of the documented work for others as a cornerstone of good scientific practice. Time pressure together with the lack of simple documentation methods leads to workflow documentation in practice being an arduous and often neglected task. Our tool ensures a clean documentation for common workflows in natural language processing and digital humanities. Additionally, it can easily be integrated into existing institutional infrastructures. --- title: | Writing Empirical Articles: Transparency, Reproducibility, Clarity, and Memorability link: http://journals.sagepub.com/doi/abs/10.1177/2515245918754485 date: 2018-07-17 00:00:00 tags: [reproducible paper] description: | This article provides recommendations for writing empirical journal articles that enable transparency, reproducibility, clarity, and memorability. Recommendations for transparency include preregistering methods, hypotheses, and analyses; submitting registered reports; distinguishing confirmation from exploration; and showing your warts. Recommendations for reproducibility include documenting methods and results fully and cohesively, by taking advantage of open-science tools, and citing sources responsibly. Recommendations for clarity include writing short paragraphs, composed of short sentences; writing comprehensive abstracts; and seeking feedback from a naive audience. Recommendations for memorability include writing narratively; embracing the hourglass shape of empirical articles; beginning articles with a hook; and synthesizing, rather than Mad Libbing, previous literature. 
--- title: | A Guide to Reproducibility in Preclinical Research link: https://europepmc.org/abstract/med/29995667 date: 2018-07-17 00:00:00 tags: [reproducible paper] description: | Many have raised concerns about the reproducibility of biomedical research. In this Perspective, the authors address this "reproducibility crisis" by distilling discussions around reproducibility into a simple guide to facilitate understanding of the topic. Reproducibility applies both within and across studies. The following questions address reproducibility within studies: "Within a study, if the investigator repeats the data management and analysis will she get an identical answer?" and "Within a study, if someone else starts with the same raw data, will she draw a similar conclusion?" Contrastingly, the following questions address reproducibility across studies: "If someone else tries to repeat an experiment as exactly as possible, will she draw a similar conclusion?" and "If someone else tries to perform a similar study, will she draw a similar conclusion?" Many elements of reproducibility from clinical trials can be applied to preclinical research (e.g., changing the culture of preclinical research to focus more on transparency and rigor). For investigators, steps toward improving reproducibility include specifying data analysis plans ahead of time to decrease selective reporting, more explicit data management and analysis protocols, and increasingly detailed experimental protocols, which allow others to repeat experiments. Additionally, senior investigators should take greater ownership of the details of their research (e.g., implementing active laboratory management practices, such as random audits of raw data [or at least reduced reliance on data summaries], more hands-on time overseeing experiments, and encouraging a healthy skepticism from all contributors). 
These actions will support a culture where rigor + transparency = reproducibility. This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial-No Derivatives License 4.0 (CC BY-NC-ND), where it is permissible to download and share the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal. --- title: | Institutional Framework and Responsibilities: Facing Open Science’s challenges and assuring quality of research link: https://www.repository.cam.ac.uk/handle/1810/276203 date: 2018-07-13 00:00:00 tags: [reproducibility talk] description: | This presentation to the LERU workshop Nurturing a Culture of Responsible Research in the Era of Open Science considered the issue of the credibility of science being in question in a 'post-truth' world and how reproducibility is adding to the problem. Open Science offers a solution, but it is not easy to implement, particularly by research institutions. The main issues relate to language used in the open space, that solutions look different to different disciplines, that researchers are often feeling "under siege" and that we need to reward good open practice. --- title: | Reproducible science: What, why, how link: https://digital.csic.es/handle/10261/145975 date: 2018-07-13 00:00:00 tags: [reproducible paper] description: | Most scientific papers are not reproducible: it is really hard, if not impossible, to understand how results are derived from data, and to regenerate them in the future (even by the same researchers). However, traceability and reproducibility of results are indispensable elements of high-quality science, and an increasing requirement of many journals and funding sources. Reproducible studies include code able to regenerate results from the original data. 
This practice not only provides a perfect record of the whole analysis but also reduces the probability of errors and facilitates code reuse, thus accelerating scientific progress. But doing reproducible science also brings many benefits to the individual researcher, including saving time and effort, improved collaborations, and higher quality and impact of final publications. In this article we introduce reproducible science, why it is important, and how we can improve the reproducibility of our work. We introduce principles and tools for data management, analysis, version control, and software management that help us achieve reproducible workflows in the context of ecology. --- title: | A Primer on the ‘Reproducibility Crisis’ and Ways to Fix It link: https://onlinelibrary.wiley.com/doi/abs/10.1111/1467-8462.12262 date: 2018-07-13 00:00:00 tags: [reproducible paper] description: | This article uses the framework of Ioannidis (2005) to organise a discussion of issues related to the ‘reproducibility crisis’. It then goes on to use that framework to evaluate various proposals to fix the problem. Of particular interest is the ‘post‐study probability’, the probability that a reported research finding represents a true relationship. This probability is inherently unknowable. However, a number of insightful results emerge if we are willing to make some conjectures about reasonable parameter values. Among other things, this analysis demonstrates the important role that replication can play in improving the signal value of empirical research. 
--- title: | A Statistical Model to Investigate the Reproducibility Rate Based on Replication Experiments link: https://onlinelibrary.wiley.com/doi/abs/10.1111/insr.12273 date: 2018-07-13 00:00:00 tags: [reproducible paper] description: | The reproducibility crisis, that is, the fact that many scientific results are difficult to replicate, pointing to their unreliability or falsehood, is a hot topic in the recent scientific literature, and statistical methodologies, testing procedures and p-values, in particular, are at the centre of the debate. Assessment of the extent of the problem – the reproducibility rate or the false discovery rate – and of the role of contributing factors is still an open problem. Replication experiments, that is, systematic replications of existing results, may offer relevant information on these issues. We propose a statistical model to deal with such information, in particular to estimate the reproducibility rate and the effect of some study characteristics on its reliability. We analyse data from a recent replication experiment in psychology, finding a reproducibility rate broadly coherent with other assessments from the same experiment. Our results also confirm the expected role of some contributing factors (unexpectedness of the result and room for bias), while they suggest that the similarity between the original study and the replica is not so relevant, thus mitigating some criticism directed at replication experiments. --- title: | Using Provenance for Generating Automatic Citations link: https://www.usenix.org/system/files/conference/tapp2018/tapp2018-paper-ton-that.pdf date: 2018-07-12 00:00:00 tags: [reproducible paper] description: | When computational experiments include only datasets, they could be shared through the Uniform Resource Identifiers (URIs) or Digital Object Identifiers (DOIs) which point to these resources. 
However, experiments seldom include only datasets, but most often also include software, execution results, provenance, and other associated documentation. The Research Object has recently emerged as a comprehensive and systematic method for aggregation and identification of diverse elements of computational experiments. While an entire Research Object may be citable using a URI or a DOI, it is often desirable to cite specific sub-components of a research object to help identify, authorize, date, and retrieve the published sub-components of these objects. In this paper, we present an approach to automatically generate citations for sub-components of research objects by using the object’s recorded provenance traces. The generated citations can be used "as is" or taken as suggestions that can be grouped and combined to produce higher level citations. --- title: | Variable Bibliographic Database Access Could Limit Reproducibility link: https://academic.oup.com/bioscience/advance-article/doi/10.1093/biosci/biy074/5050469 date: 2018-07-12 00:00:00 tags: [reproducible paper] description: | Bibliographic databases provide access to scientific literature through targeted queries. The most common uses of these services, aside from accessing scientific literature for personal use, are to find relevant citations for formal surveys of scientific literature, such as systematic reviews or meta-analyses, or to estimate the number of publications on a certain topic as a measure of sampling effort. Bibliographic search tools vary in the level of access to the scientific literature they allow. For instance, Google Scholar is a bibliographic search engine which allows users to find (but not necessarily access) scientific literature for no charge, whereas other services, such as Web of Science, are subscription based, allowing access to full texts of academic works at costs that can exceed $100,000 annually for large universities (Goodman 2005). 
One of the most commonly used bibliographic databases, Clarivate Analytics–produced Web of Science, offers tailored subscriptions to their citation indexing service. This flexibility allows subscriptions and resulting access to be tailored to the needs of researchers at the institution (Goodwin 2014). However, there are issues created by this differential access, which we discuss further below. --- title: | Software Reproducibility: How to put it into practice? link: https://osf.io/z48cm/ date: 2018-07-11 00:00:00 tags: [reproducible paper] description: | On 24 May 2018, Maria Cruz, Shalini Kurapati, and Yasemin Türkyilmaz-van der Velden led a workshop titled “Software Reproducibility: How to put it into practice?”, as part of the event Towards cultural change in data management - data stewardship in practice held at TU Delft, the Netherlands. There were 17 workshop participants, including researchers, data stewards, and research software engineers. Here we describe the rationale of the workshop, what happened on the day, key discussions and insights, and suggested next steps. --- title: | DataPackageR: Reproducible data preprocessing, standardization and sharing using R/Bioconductor for collaborative data analysis link: https://gatesopenresearch.org/articles/2-31/v1 date: 2018-07-05 00:00:00 tags: [reproducible paper, reproducibility infrastructure] description: | A central tenet of reproducible research is that scientific results are published along with the underlying data and software code necessary to reproduce and verify the findings. A host of tools and software have been released that facilitate such work-flows and scientific journals have increasingly demanded that code and primary data be made available with publications. There has been little practical advice on implementing reproducible research work-flows for large ’omics’ or systems biology data sets used by teams of analysts working in collaboration. 
In such instances it is important to ensure all analysts use the same version of a data set for their analyses. Yet, instantiating relational databases and standard operating procedures can be unwieldy, with high "startup" costs and poor adherence to procedures when they deviate substantially from an analyst’s usual work-flow. Ideally a reproducible research work-flow should fit naturally into an individual’s existing work-flow, with minimal disruption. Here, we provide an overview of how we have leveraged popular open source tools, including Bioconductor, Rmarkdown, git version control, R, and specifically R’s package system combined with a new tool DataPackageR, to implement a lightweight reproducible research work-flow for preprocessing large data sets, suitable for sharing among small-to-medium sized teams of computational scientists. Our primary contribution is the DataPackageR tool, which decouples time-consuming data processing from data analysis while leaving a traceable record of how raw data is processed into analysis-ready data sets. The software ensures packaged data objects are properly documented and performs checksum verification of these along with basic package version management, and importantly, leaves a record of data processing code in the form of package vignettes. Our group has implemented this work-flow to manage, analyze and report on pre-clinical immunological trial data from multi-center, multi-assay studies for the past three years. --- title: | How to Read a Research Compendium link: https://arxiv.org/pdf/1806.09525.pdf date: 2018-06-30 00:00:00 tags: [reproducible paper, reproducibility infrastructure] description: | Researchers spend a great deal of time reading research papers. Keshav (2012) provides a three-pass method to researchers to improve their reading skills. This article extends Keshav's method for reading a research compendium. 
Research compendia are an increasingly used form of publication, which packages not only the research paper's text and figures, but also all data and software for better reproducibility. We introduce the existing conventions for research compendia and suggest how to utilise their shared properties in a structured reading process. Unlike the original, this article is not built upon a long history but intends to provide guidance at the outset of an emerging practice. --- title: | Teaching computational reproducibility for neuroimaging link: https://arxiv.org/pdf/1806.06145.pdf date: 2018-06-23 00:00:00 tags: [reproducible paper] description: | We describe a project-based introduction to reproducible and collaborative neuroimaging analysis. Traditional teaching on neuroimaging usually consists of a series of lectures that emphasize the big picture rather than the foundations on which the techniques are based. The lectures are often paired with practical workshops in which students run imaging analyses using the graphical interface of specific neuroimaging software packages. Our experience suggests that this combination leaves the student with a superficial understanding of the underlying ideas, and an informal, inefficient, and inaccurate approach to analysis. To address these problems, we based our course around a substantial open-ended group project. This allowed us to teach: (a) computational tools to ensure computationally reproducible work, such as the Unix command line, structured code, version control, automated testing, and code review; and (b) a clear understanding of the statistical techniques used for a basic analysis of a single run in an MRI scanner. The emphasis we put on the group project showed the importance of standard computational tools for accuracy, efficiency, and collaboration. The projects were broadly successful in engaging students in working reproducibly on real scientific questions. 
We propose that a course on this model should be the foundation for future programs in neuroimaging. We believe it will also serve as a model for teaching efficient and reproducible research in other fields of computational science. --- title: | YAMP: a containerised workflow enabling reproducibility in metagenomics research link: https://watermark.silverchair.com/giy072.pdf?token=AQECAHi208BE49Ooan9kkhW_Ercy7Dm3ZL_9Cf3qfKAc485ysgAAAdIwggHOBgkqhkiG9w0BBwagggG_MIIBuwIBADCCAbQGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQM9e5qZs5DsbcT7x6rAgEQgIIBhSLDG-1fl8D5GYdRDo-2wBE35B-JUWT1SELphgsGS_7CBSCg9GQc7B81NfOMex-RSvfwHbIBIbz5nksrXpTSpkpqjAMuzfrTNxDmLykrRHKT77Aor9BJNfACV3E2eOy1GXT6B08kbo77o85nn8G8vdSE9Qf7Dv-ACvv2bEi07bZrQ2WPC14oEFIOWKmorKXrhIcQrI7CrU3MyoypLWGhEsLh4BnTgSbs13V4yTpI7FbPur6wUMVBP81cru_Ud33rwrH4GKKADD8RYRMMwuN3RU_ZSl-XVDe2ph6RBw6KPfnA_imkXp8SRjPfsy5xieC8JJo4RqnQKsrMp895HOc3OI5nD0gyPJqWpPqqBOdDGb5hPaFpBfT_bJidP6xsIrQP9zleZAHSsEewOPfDOzCnycVPoJIhcyWCpTu3McLAu3IRny3XZKa2EtxvgRR9Tgcm7s-WHYh1MxvlabnKaboJE08SiPCr_T6CCKEkUhTovpwja1NrmhuQM5HYPY9ZRjrPd4-Nt2xG date: 2018-06-19 00:00:00 tags: [reproducible paper, reproducibility infrastructure] description: | YAMP is a user-friendly workflow that enables the analysis of whole shotgun metagenomic data while using containerisation to ensure computational reproducibility and facilitate collaborative research. YAMP can be executed on any UNIX-like system, and offers seamless support for multiple job schedulers as well as for Amazon AWS cloud. Although YAMP has been developed to be ready-to-use by non-experts, bioinformaticians will appreciate its flexibility, modularisation, and simple customisation. The YAMP script, parameters, and documentation are available at https://github.com/alesssia/YAMP. 
--- title: | The RAMP framework: from reproducibility to transparency in the design and optimization of scientific workflows link: https://openreview.net/pdf?id=Syg4NHz4eQ date: 2018-06-19 00:00:00 tags: [reproducible paper] description: | RAMP (Rapid Analytics and Model Prototyping) is a software and project management tool developed by the Paris-Saclay Center for Data Science. The original goal was to accelerate the adoption of high-quality data science solutions for domain science problems by running rapid collaborative prototyping sessions. Today it is a full-blown data science project management tool promoting reproducibility, fair and transparent model evaluation, and democratization of data science. We have used the framework for setting up and solving about twenty scientific problems, for organizing scientific sub-communities around these events, and for training novice data scientists. --- title: | Three Dimensions of Reproducibility in Natural Language Processing link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998676/ date: 2018-06-19 00:00:00 tags: [reproducible paper] description: | Despite considerable recent attention to problems with reproducibility of scientific research, there is a striking lack of agreement about the definition of the term. That is a problem, because the lack of a consensus definition makes it difficult to compare studies of reproducibility, and thus to have even a broad overview of the state of the issue in natural language processing. This paper proposes an ontology of reproducibility in that field. Its goal is to enhance both future research and communication about the topic, and retrospective meta-analyses. We show that three dimensions of reproducibility, corresponding to three kinds of claims in natural language processing papers, can account for a variety of types of research reports. These dimensions are reproducibility of a conclusion, of a finding, and of a value. 
Three biomedical natural language processing papers by the authors of this paper are analyzed with respect to these dimensions. --- title: | Enabling the Verification of Computational Results: An Empirical Evaluation of Computational Reproducibility link: https://dl.acm.org/citation.cfm?doid=3214239.3214242 date: 2018-06-16 00:00:00 tags: [reproducible paper] description: | The ability to independently regenerate published computational claims is widely recognized as a key component of scientific reproducibility. In this article we take a narrow interpretation of this goal, and attempt to regenerate published claims from author-supplied information, including data, code, inputs, and other provided specifications, on a different computational system than that used by the original authors. We are motivated by Claerbout and Donoho's exhortation of the importance of providing complete information for reproducibility of the published claim. We chose the Elsevier journal, the Journal of Computational Physics, which has stated author guidelines that encourage the availability of computational digital artifacts that support scholarly findings. In an IRB approved study at the University of Illinois at Urbana-Champaign (IRB #17329) we gathered artifacts from a sample of authors who published in this journal in 2016 and 2017. We then used the ICERM criteria generated at the 2012 ICERM workshop "Reproducibility in Computational and Experimental Mathematics" to evaluate the sufficiency of the information provided in the publications and the ease with which the digital artifacts afforded computational reproducibility. We find that, for the articles for which we obtained computational artifacts, we could not easily regenerate the findings for 67% of them, and we were unable to easily regenerate all the findings for any of the articles. 
We then evaluated the artifacts we did obtain (55 of 306 articles) and find that the main barriers to computational reproducibility are inadequate documentation of code, data, and workflow information (70.9%), missing code function and setting information, and missing licensing information (75%). We recommend improvements based on these findings, including the deposit of supporting digital artifacts for reproducibility as a condition of publication, and verification of computational findings via re-execution of the code when possible. --- title: | Improving Reproducibility of Distributed Computational Experiments link: https://dl.acm.org/citation.cfm?doid=3214239.3214241 date: 2018-06-16 00:00:00 tags: [reproducible paper] description: | Conference and journal publications increasingly require experiments associated with a submitted article to be repeatable. Authors comply to this requirement by sharing all associated digital artifacts, i.e., code, data, and environment configuration scripts. To ease aggregation of the digital artifacts, several tools have recently emerged that automate the aggregation of digital artifacts by auditing an experiment execution and building a portable container of code, data, and environment. However, current tools only package non-distributed computational experiments. Distributed computational experiments must either be packaged manually or supplemented with sufficient documentation. In this paper, we outline the reproducibility requirements of distributed experiments using a distributed computational science experiment involving use of message-passing interface (MPI), and propose a general method for auditing and repeating distributed experiments. Using Sciunit we show how this method can be implemented. We validate our method with initial experiments showing application re-execution runtime can be improved by 63% with a trade-off of longer run-time on initial audit execution. 
--- title: | Popper Pitfalls link: https://dl.acm.org/citation.cfm?doid=3214239.3214243 date: 2018-06-16 00:00:00 tags: [reproducible paper] description: | We describe the four publications we have tried to make reproducible and discuss how each paper has changed our workflows, practices, and collaboration policies. The fundamental insight is that paper artifacts must be made reproducible from the start of the project; artifacts are too difficult to make reproducible when the papers are (1) already published and (2) authored by researchers who are not thinking about reproducibility. In this paper, we present the best practices adopted by our research laboratory, which were shaped by the pitfalls we have identified for the Popper convention. We conclude with a "call-to-arms" for the community focused on enhancing reproducibility initiatives for academic conferences, industry environments, and national laboratories. We hope that our experiences will shape a best practices guide for future reproducible papers. --- title: | Supporting Long-term Reproducible Software Execution link: https://dl.acm.org/citation.cfm?doid=3214239.3214245 date: 2018-06-16 00:00:00 tags: [reproducible paper] description: | A recent widespread realization that software experiments are not as easily replicated as once believed has brought software execution preservation into the scientific spotlight. As a result, scientists, institutions, and funding agencies have recently been pushing for the development of methodologies and tools that preserve software artifacts. Despite current efforts, long-term reproducibility still eludes us. In this paper, we present the requirements for software execution preservation and discuss how to improve long-term reproducibility in science. In particular, we discuss the reasons why preserving binaries and pre-built execution environments is not enough and why preserving the ability to replicate results is not the same as preserving software for reproducible science.
Finally, we show how these requirements are supported by Occam, an open curation framework that fully preserves software and its dependencies from source to execution, promoting transparency, longevity, and re-use. Specifically, Occam provides the ability to automatically deploy workflows in a fully-functional environment that is able to not only run them but also make them easily replicable. --- title: | Have researchers increased reporting of outliers in response to the reproducibility crisis? link: https://osf.io/jq8eu date: 2018-06-02 00:00:00 tags: [reproducible paper] description: | Psychology is currently experiencing a "renaissance" where the replication and reproducibility of published reports are at the forefront of conversations in the field. While researchers have worked to discuss possible problems and solutions, work has yet to uncover how this new culture may have altered reporting practices in the social sciences. As outliers can bias both descriptive and inferential statistics, the search for these data points is essential to any analysis using these parameters. We quantified the rates of reporting of outliers within psychology at two time points: 2012, when the replication crisis was born, and 2017, after the publication of reports concerning replication, questionable research practices, and transparency. A total of 2235 experiments were identified and analyzed, finding an increase in reporting of outliers from only 15.7% of experiments in 2012 to 25.0% in 2017. We investigated differences across years given the psychological field or statistical analysis that each experiment employed. Further, we inspected whether the outliers mentioned are whole participant observations or data points, and what reasons authors gave for deeming an observation deviant.
We conclude that while report rates are improving overall, there is still room for improvement in the reporting practices of psychological scientists, which can only aid in strengthening our science. --- title: | Sharing and Preserving Computational Analyses for Posterity with encapsulator link: http://tfjmp.org/files/publications/cise-2018.pdf date: 2018-06-02 00:00:00 tags: [reproducible paper] description: | Open data and open-source software may be part of the solution to science's reproducibility crisis, but they are insufficient to guarantee reproducibility. Requiring minimal end-user expertise, encapsulator creates a "time capsule" with reproducible code (right now, only supporting R code) in a self-contained computational environment. encapsulator provides end-users with a fully-featured desktop environment for reproducible research. --- title: | The Crisis of Reproducibility, the Denominator Problem and the Scientific Role of Multi-Scale Modeling link: https://www.preprints.org/manuscript/201805.0308/v2 date: 2018-05-29 00:00:00 tags: [reproducible paper] description: | The "Crisis of Reproducibility" has received considerable attention both within the scientific community and without. While factors associated with scientific culture and day-to-day practice are most often invoked, I propose that the Crisis of Reproducibility is ultimately a failure of generalization with a fundamental scientific basis in the methods used for biomedical research. The Denominator Problem describes how limitations intrinsic to the two primary approaches of biomedical research, clinical studies and pre-clinical experimental biology, lead to an inability to effectively characterize the full extent of biological heterogeneity, which compromises the task of generalizing acquired knowledge.
Drawing on the example of the unifying role of theory in the physical sciences, I propose that multi-scale mathematical and dynamic computational models, when mapped to the modular structure of biological systems, can serve a unifying role as formal representations of what is conserved and similar from one biological context to another. This ability to explicitly describe the generation of heterogeneity from similarity addresses the Denominator Problem and provides a scientific response to the Crisis of Reproducibility. --- title: | The 'end of the expert': why science needs to be above criticism link: https://www.repository.cam.ac.uk/handle/1810/276106 date: 2018-05-29 00:00:00 tags: [reproducible paper] description: | In 1942, Robert Merton wrote that "Incipient and actual attacks upon the integrity of science" meant that science needed to "restate its objectives, seek out its rationale". Some 77 years later, we are similarly in an environment where “the people of this country have had enough of experts". It is essential that science is able to withstand rigorous scrutiny to avoid being dismissed, pilloried or ignored. Transparency and reproducibility in the scientific process are a mechanism to meet this challenge, and good research data management is a fundamental factor in this. --- title: | Replication and Reproducibility in Cross-Cultural Psychology link: http://journals.sagepub.com/doi/abs/10.1177/0022022117744892 date: 2018-05-26 00:00:00 tags: [reproducible paper] description: | Replication is the scientific gold standard that enables the confirmation of research findings. Concerns related to publication bias, flexibility in data analysis, and high-profile cases of academic misconduct have led to recent calls for more replication and systematic accumulation of scientific knowledge in psychological science.
This renewed emphasis on replication may pose specific challenges to cross-cultural research due to inherent practical difficulties in emulating an original study in other cultural groups. The purpose of the present article is to discuss how the core concepts of this replication debate apply to cross-cultural psychology. Distinct to replications in cross-cultural research are examinations of bias and equivalence in manipulations and procedures, and that targeted research populations may differ in meaningful ways. We identify issues in current psychological research (analytic flexibility, low power) and possible solutions (preregistration, power analysis), and discuss ways to implement best practices in cross-cultural replication attempts. --- title: | Transparency on scientific instruments link: http://embor.embopress.org/content/early/2018/05/22/embr.201845853 date: 2018-05-26 00:00:00 tags: [reproducible paper] description: | Scientific instruments are at the heart of the scientific process, from 17th‐century telescopes and microscopes, to modern particle colliders and DNA sequencing machines. Nowadays, most scientific instruments in biomedical research come from commercial suppliers [1], [2], and yet, compared to the biopharmaceutical and medical devices industries, little is known about the interactions between scientific instrument makers and academic researchers. Our research suggests that this knowledge gap is a cause for concern. --- title: | Before reproducibility must come preproducibility link: https://www.nature.com/articles/d41586-018-05256-0 date: 2018-05-24 00:00:00 tags: [popular news] description: | The lack of standard terminology means that we do not clearly distinguish between situations in which there is not enough information to attempt repetition, and those in which attempts do not yield substantially the same outcome. To reduce confusion, I propose an intuitive, unambiguous neologism: ‘preproducibility’. 
An experiment or analysis is preproducible if it has been described in adequate detail for others to undertake it. Preproducibility is a prerequisite for reproducibility, and the idea makes sense across disciplines. --- title: | Facilitating Reproducibility and Collaboration with Literate Programming link: https://hdekk.github.io/escience2018/ date: 2018-05-13 00:00:00 tags: [reproducibility talk] description: | A fundamental challenge for open science is how best to create and share documents containing computational results. Traditional methods involve maintaining the code, generated tables and figures, and text as separate files and manually assembling them into a finished document. As projects grow in complexity, this approach can lead to procedures which are error prone and hard to replicate. Fortunately, new tools are emerging to address this problem and librarians who provide data services are ideally positioned to provide training. In the workshop we’ll use RStudio to demonstrate how to create a "compilable" document containing all the text elements (including bibliography), as well as the code required to create embedded graphs and tables. We’ll demonstrate how the process facilitates making revisions when, for example, a reviewer has suggested a revision or when there has been a change in the underlying data. We’ll also demonstrate the convenience of integrating version control into the workflow using RStudio’s built-in support for git. --- title: | Evaluating Reproducibility in Computational Biology Research link: https://scholarworks.gvsu.edu/cgi/viewcontent.cgi?article=1682&context=honorsprojects date: 2018-05-11 00:00:00 tags: [replication study] description: | For my Honors Senior Project, I read five research papers in the field of computational biology and attempted to reproduce the results. However, for the most part, this proved a challenge, as many details vital to utilizing relevant software and data had been excluded. 
Using Geir Kjetil Sandve's paper "Ten Simple Rules for Reproducible Computational Research" as a guide, I discuss how authors of these five papers did and did not obey these rules of reproducibility and how this affected my ability to reproduce their results. --- title: | Systematic reviews and evidence synthesis link: https://crln.acrl.org/index.php/crlnews/article/view/16967/18703 date: 2018-05-11 00:00:00 tags: [reproducibility bibliography] description: | While comprehensive and expert searching may be part of the traditional aspects of academic librarianship, systematic reviews also require transparency and reproducibility of search methodology. This work is supported by use of reporting guidelines and related librarian expertise. This guide provides resources that are useful to librarians assisting with systematic reviews in a broad range of disciplines outside the biomedical sciences. Because the bulk of published literature on systematic reviews is concentrated in the health sciences, some resources are subject-specific in title, but have broader applications. --- title: | Climate Science Can Be More Transparent, Researchers Say link: https://www.scientificamerican.com/article/climate-science-can-be-more-transparent-researchers-say/ date: 2018-05-11 00:00:00 tags: [popular news] description: | Top climate scientists say their field can improve its transparency. A group of researchers presented their findings on reproducibility in climate science to the National Academies of Sciences, Engineering and Medicine yesterday as part of a monthslong examination of scientific transparency. --- title: | Scientific Research: Reproducibility and Bias in Chemistry link: https://www.decodedscience.org/scientific-research-reproducibility-bias-chemistry/62997 date: 2018-05-11 00:00:00 tags: [popular news] description: | When scientists are able to recreate earlier research results, published by other scientists, the research is considered reproducible. 
But what happens when the results don’t match? It means that the initial research is non-reproducible. Reproducibility, or non-reproducibility, of scientific experiments seems straightforward; it implies that an experimental result is either valid or invalid. In fact, researchers affiliated with Stanford University, Tufts University, and the University of Ioannina in Greece concluded in 2005 that a majority of all research findings are false. How do those invalid results end up in scientific papers? A group of Stanford researchers concluded that, in many cases, bias is to blame. --- title: | Use Cases of Computational Reproducibility for Scientific Workflows at Exascale link: https://arxiv.org/pdf/1805.00967.pdf date: 2018-05-05 00:00:00 tags: [reproducible paper] description: | We propose an approach for improved reproducibility that includes capturing and relating provenance characteristics and performance metrics, in a hybrid queryable system, the ProvEn server. The system capabilities are illustrated on two use cases: scientific reproducibility of results in the ACME climate simulations and performance reproducibility in molecular dynamics workflows on HPC computing platforms. --- title: | DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments link: https://arxiv.org/pdf/1805.00329.pdf date: 2018-05-05 00:00:00 tags: [reproducibility infrastructure] description: | We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA, a researcher can either reproduce a given experiment with a very limited amount of information or share their own experiments with others.
Moreover, the framework offers a large range of functions, such as boilerplate code, keeping track of experiments, hyper-parameter optimization, and visualization of data and results. To demonstrate the effectiveness of this framework, this paper presents case studies in the area of handwritten document analysis where researchers benefit from the integrated functionality. DeepDIVA is implemented in Python and uses the deep learning framework PyTorch. It is completely open source, and accessible as Web Service through DIVAServices. --- title: | Open Data & Reproducibility link: https://escholarship.org/uc/item/115856kh date: 2018-05-03 00:00:00 tags: [reproducibility talk] description: | Presentation given for Love Data Week 2018. --- title: | Qualitative Coding: Strategies for Transparency and Reproducibility link: https://scholarworks.iu.edu/dspace/bitstream/handle/2022/22052/2018-04-27_wim_meanwell_coding_slides.pdf?sequence=3&isAllowed=y date: 2018-05-01 00:00:00 tags: [reproducibility talk] description: | Workshop in methods at IU. --- title: | Open access to data at Yale University link: https://elischolar.library.yale.edu/dayofdata/2017/posters/6/ date: 2018-05-01 00:00:00 tags: [reproducibility talk] description: | Open access to research data increases knowledge, advances science, and benefits society. Many researchers are now required to share data. Two research centers at Yale have launched projects that support this mission. Both centers have developed technology, policies, and workflows to facilitate open access to data in their respective fields. The Yale University Open Data Access (YODA) Project at the Center for Outcomes Research and Evaluation advocates for the responsible sharing of clinical research data. The Project, which began in 2014, is committed to open science and data transparency, and supports research attempting to produce concrete benefits to patients, the medical community, and society as a whole. 
Early experience sharing data, made available by Johnson & Johnson (J&J) through the YODA Project, has demonstrated a demand for shared clinical research data as a resource for investigators. To date, the YODA Project has facilitated the sharing of data for over 65 research projects. The Institution for Social and Policy Studies (ISPS) Data Archive is a digital repository that shares and preserves the research produced by scholars affiliated with ISPS. Since its launch in 2011, the Archive has grown to hold data and code underlying almost 90 studies. The Archive is committed to the ideals of scientific reproducibility and transparency: It provides free and public access to research materials and accepts content for distribution under a Creative Commons license. The Archive has pioneered a workflow, “curating for reproducibility,” that ensures long-term usability and data quality. --- title: | Reflections on the Future of Research Curation and Research Reproducibility link: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8347157 date: 2018-05-01 00:00:00 tags: [reproducibility report] description: | In the years since the launch of the World Wide Web in 1993, there have been profoundly transformative changes to the entire concept of publishing—exceeding all the previous combined technical advances of the centuries following the introduction of movable type in medieval Asia around the year 1000 and the subsequent large-scale commercialization of printing several centuries later by J. Gutenberg (circa 1440). Periodicals in print—from daily newspapers to scholarly journals—are now quickly disappearing, never to return, and while no publishing sector has been unaffected, many scholarly journals are almost unrecognizable in comparison with their counterparts of two decades ago. To say that digital delivery of the written word is fundamentally different is a huge understatement.
Online publishing permits inclusion of multimedia and interactive content that add new dimensions to what had been available in print-only renderings. As of this writing, the IEEE portfolio of journal titles comprises 59 online-only titles (31%) and 132 that are published in both print and online. The migration from print to online is more stark than these numbers indicate because, for the 132 periodicals that are both print and online, the print runs are now quite small and continue to decline. In short, most readers prefer to have their subscriptions fulfilled by digital renderings only. --- title: | Dealing with the reproducibility crisis: what can ECRs do about it? link: http://blogs.plos.org/thestudentblog/2018/04/27/dealing-with-the-reproducibility-crisis-what-can-ecrs-do-about-it/ date: 2018-04-27 00:00:00 tags: [popular news] description: | Unless you’ve been living under a rock (no judgment, by the way), I’m sure you’ve heard about the reproducibility crisis in scientific research. In 2016, two posts on this blog covered what the main causes of irreproducibility are and what can be done, and how we can reform scientific publishing to value integrity. To briefly recap, a study published in PLOS Biology noted that half of preclinical research is not reproducible. The estimated price tag on this irreproducibility is alarming—a whopping \$28 billion. In my opinion, however, the most troubling cost of this crisis is its impact on public trust in science. --- title: | Computational Reproducibility at Exascale 2017 (CRE2017) link: https://sc17.supercomputing.org/presentation/?id=wksp144&sess=sess132 date: 2018-04-24 00:00:00 tags: [reproducibility conference] description: | Reproducibility is an important concern in all areas of computation. As such, computational reproducibility is receiving increasing interest from a variety of parties who are concerned with different aspects of computational reproducibility.
Computational reproducibility encompasses several concerns including the sharing of code and data, as well as reproducible numerical results which may depend on operating system, tools, levels of parallelism, and numerical effects. In addition, the publication of reproducible computational results motivates a host of computational reproducibility concerns that arise from the fundamental notion of reproducibility of scientific results that has normally been restricted to experimental science. This workshop combines the Numerical Reproducibility at Exascale Workshops (conducted in 2015 and 2016 at SC) and the panel on Reproducibility held at SC16 (originally a BOF at SC15) to address several different issues in reproducibility that arise when computing at exascale. The workshop will include issues of numerical reproducibility as well as approaches and best practices to sharing and running code. --- title: | Reproducible scientific paper/project link: https://cral.univ-lyon1.fr/labo/perso/mohammad.akhlaghi//pdf/reproducible-paper.pdf date: 2018-04-18 00:00:00 tags: [reproducibility talk] description: | A presentation by Mohammad Akhlaghi from the Centre de Recherche Astrophysique de Lyon (CRAL) on reproducibility through using Make. --- title: | The state of reproducibility in the computational geosciences link: https://eartharxiv.org/kzu8e/ date: 2018-04-16 00:00:00 tags: [reproducible paper] description: | Figures are essential outputs of computational geoscientific research, e.g. maps and time series showing the results of spatiotemporal analyses. They also play a key role in open reproducible research, where public access is provided to paper, data, and source code to enable reproduction of the reported results. This scientific ideal is rarely practiced as studies, e.g. in biology have shown. In this article, we report on a series of studies to evaluate open reproducible research in the geosciences from the perspectives of authors and readers. 
First, we asked geoscientists what they understand by open reproducible research and what hinders its realisation. We found that there is disagreement amongst authors and that a lack of openness impedes adoption by authors and readers alike. However, reproducible research also includes the ability to achieve the same results, which requires not only accessible but also executable source code. Hence, to further examine the reader’s perspective, we searched for open access papers from the geosciences that have code/data attached (in R) and executed the analysis. We encountered several technical issues while executing the code and found differences between the original and reproduced figures. Based on these findings, we propose guidelines for authors to address these issues. --- title: | Estimating the Reproducibility of Experimental Philosophy link: https://psyarxiv.com/sxdah date: 2018-04-16 00:00:00 tags: [reproducible paper] description: | For scientific theories grounded in empirical data, replicability is a core principle, for at least two reasons. First, unless we are willing to let scientific theories rest on the authority of a small number of researchers, empirical studies should be replicable, in the sense that their methods and procedures should be detailed enough for someone else to conduct the same study. Second, for empirical results to provide a solid foundation for scientific theorizing, they should also be replicable, in the sense that most attempts at replicating the original study that produced them would yield similar results. The XPhi Replicability Project is primarily concerned with replicability in the second sense, that is: the replicability of results. In the past year, several projects have shed doubt on the replicability of key findings in psychology, and most notably social psychology.
Because the methods of experimental philosophy have often been close to the ones used in social psychology, it is only natural to wonder to what extent the results on which experimental philosophers ground their theories are replicable. The aim of the XPhi Replicability Project is precisely to reach a reliable estimate of the replicability of empirical results in experimental philosophy. To this end, several research teams across the world will replicate around 40 studies in experimental philosophy, some among the most cited, others drawn at random. The results of the project will be published in a special issue of the Review of Philosophy and Psychology dedicated to the topic of replicability in cognitive science. --- title: | Developer Interaction Traces backed by IDE Screen Recordings from Think aloud Sessions link: http://swat.polymtl.ca/~foutsekh/docs/MSR-Aiko-Trace.pdf date: 2018-04-10 00:00:00 tags: [reproducible paper, reproducibility infrastructure] description: | There are two well-known difficulties in testing and interpreting methodologies for mining developer interaction traces: first, the lack of enough large datasets needed by mining or machine learning approaches to provide reliable results; and second, the lack of "ground truth" or empirical evidence that can be used to triangulate the results, or to verify their accuracy and correctness. Moreover, relying solely on interaction traces limits our ability to take into account contextual factors that can affect the applicability of mining techniques in other contexts, and hinders our ability to fully understand the mechanics behind observed phenomena. The data presented in this paper attempts to alleviate these challenges by providing 600+ hours of developer interaction traces, of which 26+ hours are backed with video recordings of the IDE screen and developer’s comments.
This data set is relevant to researchers interested in investigating program comprehension, and those who are developing techniques for interaction traces analysis and mining. --- title: | Reproducibility does not imply, innovation speeds up, and epistemic diversity optimizes discovery of truth in a model-centric meta-scientific framework link: https://arxiv.org/pdf/1803.10118.pdf date: 2018-04-01 00:00:00 tags: [reproducible paper] description: | Theoretical work on reproducibility of scientific claims has hitherto focused on hypothesis testing as the desired mode of statistical inference. Focusing on hypothesis testing, however, poses a challenge to identify salient properties of the scientific process related to reproducibility, especially for fields that progress by building, comparing, selecting, and re-building models. We build a model-centric meta-scientific framework in which scientific discovery progresses by confirming models proposed in idealized experiments. In a temporal stochastic process of scientific discovery, we define scientists with diverse research strategies who search the true model generating the data. When there is no replication in the system, the structure of scientific discovery is a particularly simple Markov chain. We analyze the effect of diversity of research strategies in the scientific community and the complexity of the true model on the time spent at each model, the mean first time to hit the true model and staying with the true model, and the rate of reproducibility given a true model. Inclusion of replication in the system breaks the Markov property and fundamentally alters the structure of scientific discovery. In this case, we analyze aforementioned properties of scientific discovery by an agent-based model. 
In our system, the seeming paradox of scientific progress despite irreproducibility persists even in the absence of questionable research practices and incentive structures, as the rate of reproducibility and scientific discovery of the truth are uncorrelated. We explain this seeming paradox by a combination of research strategies in the population and the state of truth. Further, we find that innovation speeds up the discovery of truth by making otherwise inaccessible, possibly true models visible to the scientific population. We also show that epistemic diversity in the scientific population optimizes across a range of desirable properties of scientific discovery. --- title: | archivist: Boost the reproducibility of your research link: http://smarterpoland.pl/index.php/2017/12/boost-the-reproducibility-of-your-research-with-archivist/?utm_content=buffer7c1ab&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer date: 2018-04-01 00:00:00 tags: [reproducibility infrastructure] description: | The safest solution would be to store copies of every object ever created during the data analysis. All forks, wrong paths, everything. Along with detailed information about which functions, with which parameters, were used to generate each result. Something like the ultimate TimeMachine or GitHub for R objects. With such detailed information, every analysis would be auditable and replicable. Right now, the full tracking of all created objects is not possible without deep changes in the R interpreter. The archivist package is a lightweight version of such a solution.
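The mechanism described above, caching every intermediate object together with a record of the call that produced it, can be sketched in a few lines. This is a minimal illustration in Python, not archivist's actual API (archivist is an R package); the `store`/`restore` names and the `repo` directory are hypothetical:

```python
import hashlib
import json
import pickle
from pathlib import Path

REPO = Path("repo")  # hypothetical local repository of archived objects

def store(obj, fn_name, params):
    """Archive an object under a content hash, alongside a provenance
    record saying which function, with which parameters, produced it."""
    REPO.mkdir(exist_ok=True)
    payload = pickle.dumps(obj)
    digest = hashlib.sha256(payload).hexdigest()[:12]  # short content hash
    (REPO / f"{digest}.pkl").write_bytes(payload)
    meta = {"fn": fn_name, "params": params}
    (REPO / f"{digest}.json").write_text(json.dumps(meta))
    return digest

def restore(digest):
    """Load an archived object and its provenance record back by hash."""
    obj = pickle.loads((REPO / f"{digest}.pkl").read_bytes())
    meta = json.loads((REPO / f"{digest}.json").read_text())
    return obj, meta

# Each result of an analysis is archived with its generating call,
# so any intermediate object can later be audited or re-used:
h = store([1.5, 2.5], fn_name="mean_by_group", params={"group": "speed"})
data, meta = restore(h)
```

Keying objects by a content hash means identical results collapse to one entry, and the JSON sidecar makes the provenance human-readable without unpickling anything.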
--- title: | Questionable Research Practices in Ecology and Evolution link: https://osf.io/ajyqg date: 2018-03-26 00:00:00 tags: [reproducible paper] description: | We surveyed 807 researchers (494 ecologists and 313 evolutionary biologists) about their use of Questionable Research Practices (QRPs), including cherry picking statistically significant results, p hacking, and hypothesising after the results are known (HARKing). We also asked them to estimate the proportion of their colleagues that use each of these QRPs. Several of the QRPs were prevalent within the ecology and evolution research community. Across the two groups, we found 64% of surveyed researchers reported they had at least once failed to report results because they were not statistically significant (cherry picking); 42% had collected more data after inspecting whether results were statistically significant (a form of p hacking) and 51% had reported an unexpected finding as though it had been hypothesised from the start (HARKing). Such practices have been directly implicated in the low rates of reproducible results uncovered by recent large scale replication studies in psychology and other disciplines. The rates of QRPs found in this study are comparable with the rates seen in psychology, indicating that the reproducibility problems discovered in psychology are also likely to be present in ecology and evolution. --- title: | Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition link: https://osf.io/preprints/bitss/39cfb/ date: 2018-03-26 00:00:00 tags: [reproducible paper] description: | Access to research data is a critical feature of an efficient, progressive, and ultimately self-correcting scientific ecosystem. But the extent to which in-principle benefits of data sharing are realized in practice is unclear. 
Crucially, it is largely unknown whether published findings can be reproduced by repeating reported analyses upon shared data ("analytic reproducibility"). To investigate, we conducted an observational evaluation of a mandatory open data policy introduced at the journal Cognition. Interrupted time-series analyses indicated a substantial post-policy increase in data availability statements (104/417, 25% pre-policy to 136/174, 78% post-policy), and data that were in-principle reusable (23/104, 22% pre-policy to 85/136, 62% post-policy). However, for 35 articles with in-principle reusable data, the analytic reproducibility of target outcomes related to key findings was poor: 11 (31%) cases were reproducible without author assistance, 11 (31%) cases were reproducible only with author assistance, and 13 (37%) cases were not fully reproducible despite author assistance. Importantly, original conclusions did not appear to be seriously impacted. Mandatory open data policies can increase the frequency and quality of data sharing. However, suboptimal data curation, unclear analysis specification, and reporting errors can impede analytic reproducibility, undermining the utility of data sharing and the credibility of scientific findings. --- title: | The Scientific Filesystem (SCIF) link: https://academic.oup.com/gigascience/advance-article/doi/10.1093/gigascience/giy023/4931737 date: 2018-03-19 00:00:00 tags: [reproducible paper] description: | Here we present the Scientific Filesystem (SCIF), an organizational format that supports exposure of executables and metadata for discoverability of scientific applications. The format includes a known filesystem structure, a definition for a set of environment variables describing it, and functions for generation of the variables and interaction with the libraries, metadata, and executables located within.
SCIF makes it easy to expose metadata, multiple environments, installation steps, files, and entrypoints to render scientific applications consistent, modular, and discoverable. A SCIF can be installed on a traditional host or in a container technology such as Docker or Singularity. We will start by reviewing the background and rationale for the Scientific Filesystem, followed by an overview of the specification, and the different levels of internal modules (“apps”) that the organizational format affords. Finally, we demonstrate that SCIF is useful by implementing and discussing several use cases that improve user interaction and understanding of scientific applications. SCIF is released along with a client and integration in the Singularity 2.4 software to quickly install and interact with Scientific Filesystems. When used inside of a reproducible container, a Scientific Filesystem is a recipe for reproducibility and introspection of the functions and users that it serves. --- title: | An empirical analysis of journal policy effectiveness for computational reproducibility link: http://www.pnas.org/content/pnas/115/11/2584.full.pdf date: 2018-03-19 00:00:00 tags: [reproducible paper] description: | A key component of scientific communication is sufficient information for other researchers in the field to reproduce published findings. For computational and data-enabled research, this has often been interpreted to mean making available the raw data from which results were generated, the computer code that generated the findings, and any additional information needed such as workflows and input parameters. Many journals are revising author guidelines to include data and code availability. This work evaluates the effectiveness of journal policy that requires the data and code necessary for reproducibility be made available postpublication by the authors upon request. 
We assess the effectiveness of such a policy by (i) requesting data and code from authors and (ii) attempting replication of the published findings. We chose a random sample of 204 scientific papers published in the journal Science after the implementation of their policy in February 2011. We found that we were able to obtain artifacts from 44% of our sample and were able to reproduce the findings for 26%. We find this policy—author remission of data and code postpublication upon request—an improvement over no policy, but currently insufficient for reproducibility. --- title: | Reproducibility in Cancer Biology: The who, where and how of fusobacteria and colon cancer link: https://elifesciences.org/articles/28434 date: 2018-03-19 00:00:00 tags: [reproducible paper] description: | The association between the bacterium Fusobacterium nucleatum and human colon cancer is more complicated than it first appeared. --- title: | A Windows-Based Framework for Enhancing Scalability and Reproducibility of Large-scale Research Data link: http://www.asee-se.org/proceedings/ASEE2018/papers2018/100.pdf date: 2018-03-19 00:00:00 tags: [reproducible paper] description: | Graduate and undergraduate students involved in research projects that generate or analyze extensive datasets use several software applications for data input and processing subject to guidelines for ensuring data quality and availability. Data management guidelines are based on existing practices of the associated academic or funding institutions and may be automated to minimize human error and maintenance overhead. This paper presents a framework for automating data management processes, and it details the flow of data from generation/acquisition through processing to the output of final reports. It is designed to adapt to changing requirements and limit overhead costs. 
The paper also presents a representative case study applying the framework to the finite element characterization of the magnetically coupled linear variable reluctance motor. It utilizes modern, widely available scripting tools, particularly Windows PowerShell®, to automate workflows. This task requires generating motor characteristics for several thousand operating conditions using finite element analysis. --- title: | Utilizing Provenance in Reusable Research Objects link: http://www.mdpi.com/2227-9709/5/1/14/htm date: 2018-03-13 00:00:00 tags: [reproducible paper] description: | Science is conducted collaboratively, often requiring the sharing of knowledge about computational experiments. When experiments include only datasets, they can be shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers (DOIs). An experiment, however, seldom includes only datasets, but more often includes software, its past execution, provenance, and associated documentation. The Research Object has recently emerged as a comprehensive and systematic method for aggregation and identification of diverse elements of computational experiments. While a necessary method, mere aggregation is not sufficient for the sharing of computational experiments. Other users must be able to easily recompute on these shared research objects. Computational provenance is often the key to enable such reuse. In this paper, we show how reusable research objects can utilize provenance to correctly repeat a previous reference execution, to construct a subset of a research object for partial reuse, and to reuse existing contents of a research object for modified reuse. We describe two methods to summarize provenance that aid in understanding the contents and past executions of a research object.
The first method obtains a process-view by collapsing low-level system information, and the second method obtains a summary graph by grouping related nodes and edges with the goal of obtaining a graph view similar to the application workflow. Through detailed experiments, we show the efficacy and efficiency of our algorithms. --- title: | Provenance and the Different Flavors of Computational Reproducibility link: http://sites.computer.org/debull/A18mar/A18MAR-CD.pdf#page=17 date: 2018-03-13 00:00:00 tags: [reproducible paper] description: | While reproducibility has been a requirement in natural sciences for centuries, computational experiments have not followed the same standard. Often, there is insufficient information to reproduce computational results described in publications, and in the recent past, this has led to many retractions. Although scientists are aware of the numerous benefits of reproducibility, the perceived amount of work to make results reproducible is a significant disincentive. Fortunately, much of the information needed to reproduce an experiment can be obtained by systematically capturing its provenance. In this paper, we give an overview of different types of provenance and how they can be used to support reproducibility. We also describe a representative set of provenance tools and approaches that make it easy to create reproducible experiments. --- title: | Re-Thinking Reproducibility as a Criterion for Research Quality link: http://philsci-archive.pitt.edu/14352/1/Reproducibility_2018_SL.pdf date: 2018-02-11 00:00:00 tags: [reproducible paper] description: | A heated debate surrounds the significance of reproducibility as an indicator for research quality and reliability, with many commentators linking a "crisis of reproducibility" to the rise of fraudulent, careless and unreliable practices of knowledge production.
Through the analysis of discourse and practices across research fields, I point out that reproducibility is not only interpreted in different ways, but also serves a variety of epistemic functions depending on the research at hand. Given such variation, I argue that the uncritical pursuit of reproducibility as an overarching epistemic value is misleading and potentially damaging to scientific advancement. Requirements for reproducibility, however they are interpreted, are one of many available means to secure reliable research outcomes. Furthermore, there are cases where the focus on enhancing reproducibility turns out not to foster high-quality research. Scientific communities and Open Science advocates should learn from inferential reasoning from irreproducible data, and promote incentives for all researchers to explicitly and publicly discuss (1) their methodological commitments, (2) the ways in which they learn from mistakes and problems in everyday practice, and (3) the strategies they use to choose which research component of any project needs to be preserved in the long term, and how. --- title: | EnosStack: A LAMP-like stack for the experimenter link: https://hal.inria.fr/hal-01689726/document date: 2018-01-29 00:00:00 tags: [reproducible paper] description: | Reproducibility and repeatability dramatically increase the value of scientific experiments, but remain two challenging goals for experimenters. Similar to the LAMP stack that considerably eased web developers' lives, in this paper we advocate the need for an analogous software stack to help experimenters produce reproducible research. We propose EnosStack, an open source software stack especially designed for reproducible scientific experiments. EnosStack makes it easy to describe experimental workflows meant to be re-used, while abstracting the underlying infrastructure running them.
Being able to switch experiments from a local to a real testbed deployment greatly lowers code development and validation time. We describe the abstractions that have driven its design, before presenting a real experiment we deployed on Grid'5000 to illustrate its usefulness. We also provide all the experiment code, data and results to the community. --- title: | Building capacity to encourage research reproducibility and #MakeResearchTrue link: http://jmla.mlanet.org/ojs/jmla/article/view/273/584 date: 2018-01-14 00:00:00 tags: [reproducible paper] description: | In this case study, the authors present one library’s work to help increase awareness of reproducibility and to build capacity for our institution to improve reproducibility of ongoing and future research. --- title: | The Reproducibility Crisis and Academic Libraries link: http://crl.acrl.org/index.php/crl/article/view/16846/18452 date: 2018-01-08 00:00:00 tags: [reproducible paper] description: | In recent years, evidence has emerged from disciplines ranging from biology to economics that many scientific studies are not reproducible. This evidence has led to declarations in both the scientific and lay press that science is experiencing a “reproducibility crisis” and that this crisis has significant impacts on both science and society, including misdirected effort, funding, and policy implemented on the basis of irreproducible research. In many cases, academic libraries are the natural organizations to lead efforts to implement recommendations from journals, funders, and societies to improve research reproducibility. In this editorial, we introduce the reproducibility crisis, define reproducibility and replicability, and then discuss how academic libraries can lead institutional support for reproducible research.
--- title: | Scientific replication in the study of social animals link: https://psyarxiv.com/gsz85/ date: 2018-01-06 00:00:00 tags: [reproducible paper] description: | This chapter is written to help undergraduate students better understand the role of replication in psychology and how it applies to the study of social behavior. We briefly review various replication initiatives in psychology and the events that preceded our renewed focus on replication. We then discuss challenges in interpreting the low rate of replication in psychology, especially social psychology. Finally, we stress the need for better methods and theories to learn the right lessons when replications fail. --- title: | An Open Solution for Urgent Problems: Increasing Research Quality, Reproducibility, & Diversity link: https://vtechworks.lib.vt.edu/handle/10919/80970 date: 2017-12-11 00:00:00 tags: [reproducibility talk] description: | Jeffrey Spies, Ph.D., is the Co-Founder and Chief Technology Officer of the non-profit Center for Open Science. In this presentation, Dr. Spies discusses motivations, values, and common experiences of researchers and scholars in research and publication processes. Spies explores biases toward confirmatory research to the exclusion of exploratory research, funding and reward incentives that conflict with scholarly values, and, costs of delayed research publication -- as measured in human lives. This critical approach to ethics and values in research and publication begs the questions “Where would we be if this [publishing] system were a little more reproducible, a little more efficient?” and asks for an examination of values as revealed by our practice; are we implying that some lives matter more than others? Spies discusses how open output [open access] and open workflow policies and practices assist scholars in aligning their scholarly practices more closely to their scholarly values. 
For more information: Center for Open Science: https://cos.io Open badges: https://cos.io/our-services/open-science-badges Open Science Framework: https://cos.io/our-products/open-science-framework PrePrint servers: https://cos.io/our-products/osf-preprints/ Registered Reports: https://cos.io/rr Transparency and Openness Promotion Guidelines: https://cos.io/our-services/top-guidelines --- title: | Utilising Semantic Web Ontologies To publish Experimental Workflows link: https://pdfs.semanticscholar.org/d480/3552b1e460c2acaf3848ad360db186257a61.pdf date: 2017-11-16 00:00:00 tags: [reproducible paper] description: | Reproducibility in experiments is necessary to verify claims and to reuse prior work in experiments that advance research. However, the traditional model of publication validates research claims through peer-review without taking reproducibility into account. Workflows encapsulate experiment descriptions and components and are suitable for representing reproducibility. Additionally, they can be published alongside traditional patterns as a form of documentation for the experiment which can be combined with linked open data. For reproducibility utilising published datasets, it is necessary to declare the conditions or restrictions for permissible reuse. In this paper, we take a look at the state of workflow reproducibility through a browser based tool and a corresponding study to identify how workflows might be combined with traditional forms of documentation and publication. We also discuss the licensing aspects for data in workflows and how it can be annotated using linked open data ontologies. --- title: | THE DISMAL SCIENCE REMAINS DISMAL, SAY SCIENTISTS link: https://www.wired.com/story/econ-statbias-study/ date: 2017-11-14 00:00:00 tags: [popular news] description: | The paper inhales more than 6,700 individual pieces of research, all meta-analyses that themselves encompass 64,076 estimates of economic outcomes. 
That’s right: It’s a meta-meta-analysis. And in this case, Doucouliagos never meta-analyzed something he didn’t dislike. Of the fields covered in this corpus, half were statistically underpowered—the studies couldn’t show the effect they said they did. And most of the ones that were powerful enough overestimated the size of the effect they purported to show. Economics has a profound effect on policymaking and understanding human behavior. For a science, this is, frankly, dismal. --- title: | The reproducibility challenge – what researchers need link: http://septentrio.uit.no/index.php/SCS/article/view/4257 date: 2017-11-14 00:00:00 tags: [reproducibility guidelines] description: | Within the Open Science discussions, the current call for “reproducibility” comes from the rising awareness that results as presented in research papers are not as easily reproducible as expected, or were even contradicted by some reproduction efforts. In this context, transparency and openness are seen as key components to facilitate good scientific practices, as well as scientific discovery. As a result, many funding agencies now require the deposit of research data sets, institutions improve the training on the application of statistical methods, and journals begin to mandate a high level of detail on the methods and materials used. How can researchers be supported and encouraged to provide that level of transparency? An important component is the underlying research data, which is currently often only partly available within the article. At Elsevier we have therefore been working on journal data guidelines which clearly explain to researchers when and how they are expected to make their research data available. Simultaneously, we have also developed the corresponding infrastructure to make it as easy as possible for researchers to share their data in a way that is appropriate in their field.
To ensure researchers get credit for the work they do on managing and sharing data, all our journals support data citation in line with the FORCE11 data citation principles – a key step in the direction of ensuring that we address the lack of credits and incentives which emerged from the Open Data analysis (Open Data - the Researcher Perspective https://www.elsevier.com/about/open-science/research-data/open-data-report ) recently carried out by Elsevier together with CWTS. Finally, the presentation will also touch upon a number of initiatives to ensure the reproducibility of software, protocols and methods. With STAR methods, for instance, methods are submitted in a Structured, Transparent, Accessible Reporting format; this approach promotes rigor and robustness, and makes reporting easier for the author and replication easier for the reader. --- title: | LHC PARAMETER REPRODUCIBILITY link: https://inspirehep.net/record/1635348/files/1635044_45-52.pdf date: 2017-11-14 00:00:00 tags: [reproducible paper] description: | This document reviews the stability of the main LHC operational parameters, namely orbit, tune, coupling and chromaticity. The analysis will be based on the LSA settings, measured parameters and real-time trims. The focus will be set on ramp and high energy reproducibility as they are more difficult to assess and correct on a daily basis for certain parameters like chromaticity and coupling. The reproducibility of the machine in collision will be analysed in detail, in particular the beam offsets at the IPs, since the ever decreasing beam sizes at the IPs make beam steering at the IP more and more delicate. --- title: | Code and Data for the Social Sciences: A Practitioner’s Guide link: https://web.stanford.edu/~gentzkow/research/CodeAndData.pdf date: 2017-11-13 00:00:00 tags: [reproducibility guidelines] description: | This handbook is about translating insights from experts in code and data into practical terms for empirical social scientists.
We are not ourselves software engineers, database managers, or computer scientists, and we don’t presume to contribute anything to those disciplines. If this handbook accomplishes something, we hope it will be to help other social scientists realize that there are better ways to work. Much of the time, when you are solving problems with code and data, you are solving problems that have been solved before, better, and on a larger scale. Recognizing that will let you spend less time wrestling with your RA’s messy code, and more time on the research problems that got you interested in the first place. --- title: | Irreproducible data: Problems and solutions for psychiatry link: http://journals.sagepub.com/doi/full/10.1177/0004867417741558 date: 2017-11-13 00:00:00 tags: [reproducibility report] description: | Growing pressure in Australia to translate pre-clinical and clinical research into improving treatment outcomes (https://www.nhmrc.gov.au/research/research-translation-0) means that concerns about the irreproducibility of published data slowing research translation (Collins and Tabak, 2014) must be addressed. --- title: | Science retracts paper after Nobel laureate’s lab can’t replicate results link: http://retractionwatch.com/2017/10/26/science-retracts-paper-nobel-laureates-lab-cant-replicate-results/ date: 2017-10-27 00:00:00 tags: [news article] description: | In January, Bruce Beutler, an immunologist at University of Texas Southwestern Medical Center and winner of the 2011 Nobel Prize in Physiology or Medicine, emailed Science editor-in-chief Jeremy Berg to report that attempts to replicate the findings in "MAVS, cGAS, and endogenous retroviruses in T-independent B cell responses" had weakened his confidence in original results. The paper had found that virus-like elements in the human genome play an important role in the immune system’s response to pathogens. 
Although Beutler and several co-authors requested retraction right off the bat, the journal discovered that two co-authors disagreed, which Berg told us drew out the retraction process. In an attempt to resolve the situation, the journal waited for Beutler’s lab to perform another replication attempt. Those findings were inconclusive and the dissenting authors continued to push back against retraction. --- title: | Negative Results: A Crucial Piece of the Scientific Puzzle link: http://blogs.plos.org/everyone/2017/10/26/negative-results-a-crucial-piece-of-the-scientific-puzzle/ date: 2017-10-27 00:00:00 tags: [popular news] description: | Scientific advance relies on transparency, rigour and reproducibility. At PLOS ONE we have always supported the publication of rigorous research, in all its forms, positive or negative, as showcased in our earlier Missing Pieces Collection. In this 10th Anniversary Collection, A Decade of Missing Pieces Senior Editor Alejandra Clark revisits this important theme and highlights a decade of null and negative results, replication studies and studies refuting previously published work. --- title: | Tackling the reproducibility crisis requires universal standards link: https://www.timeshighereducation.com/opinion/tackling-reproducibility-crisis-requires-universal-standards#survey-answer date: 2017-10-26 00:00:00 tags: [news article] description: | In order for research methods to be consistent, accessible and reproducible, we need universal, widely understood standards for research that all scientists adhere to. NPL has been responsible for maintaining fundamental standards and units for more than 100 years and is now engaged in pioneering work to create a set of “gold standards” for all scientific methodologies, materials, analyses and protocols, based on exhaustive testing at a large number of laboratories, in tandem with both industry and national and international standardisation organisations. 
--- title: | A survey on provenance: What for? What form? What from? link: https://link.springer.com/article/10.1007/s00778-017-0486-1 date: 2017-10-21 00:00:00 tags: [reproducible paper] description: | Provenance refers to any information describing the production process of an end product, which can be anything from a piece of digital data to a physical object. While this survey focuses on the former type of end product, this definition still leaves room for many different interpretations of and approaches to provenance. These are typically motivated by different application domains for provenance (e.g., accountability, reproducibility, process debugging) and varying technical requirements such as runtime, scalability, or privacy. As a result, we observe a wide variety of provenance types and provenance-generating methods. This survey provides an overview of the research field of provenance, focusing on what provenance is used for (what for?), what types of provenance have been defined and captured for the different applications (what form?), and which resources and system requirements impact the choice of deploying a particular provenance solution (what from?). For each of these three key questions, we provide a classification and review the state of the art for each class. We conclude with a summary and possible future research challenges. --- title: | Jug: Software for parallel reproducible computation in Python link: http://luispedro.org/files/JugPaper.pdf date: 2017-10-21 00:00:00 tags: [reproducible paper] description: | As computational pipelines become a bigger part of science, it is important to ensure that the results are reproducible, a concern which has come to the fore in recent years. All developed software should be able to be run automatically without any user intervention. 
In addition to being valuable to the wider community, which may wish to reproduce or extend a published analysis, reproducible research practices allow for better control over the project by the original authors themselves. For example, keeping a non-executable record of parameters and command line arguments leads to error-prone analysis and opens up the possibility that, when the results are to be written up for publication, the researcher will no longer be able to even completely describe the process that led to them. For large projects, the use of multiple computational cores (either in a multi-core machine or distributed across a compute cluster) is necessary to obtain results in a useful time frame. Furthermore, it is often the case that, as the project evolves, it becomes necessary to save intermediate results while down-stream analyses are designed (or redesigned) and implemented. Under many frameworks, this makes maintaining a single point of entry for the computation increasingly difficult. Jug is a software framework which addresses these issues by caching intermediate results and distributing the computational work as tasks across a network. Jug is written in Python without the use of compiled modules, is completely cross-platform, and available as free software under the liberal MIT license. --- title: | Utilising Semantic Web Ontologies To Publish Experimental Workflows link: http://ceur-ws.org/Vol-1878/article-06.pdf date: 2017-09-14 00:00:00 tags: [reproducible paper] description: | Reproducibility in experiments is necessary to verify claims and to reuse prior work in experiments that advance research. However, the traditional model of publication validates research claims through peer-review without taking reproducibility into account. Workflows encapsulate experiment descriptions and components and are suitable for representing reproducibility.
Additionally, they can be published alongside traditional patterns as a form of documentation for the experiment which can be combined with linked open data. For reproducibility utilising published datasets, it is necessary to declare the conditions or restrictions for permissible reuse. In this paper, we take a look at the state of workflow reproducibility through a browser based tool and a corresponding study to identify how workflows might be combined with traditional forms of documentation and publication. We also discuss the licensing aspects for data in workflows and how it can be annotated using linked open data ontologies. --- title: | Report on the First IEEE Workshop on the Future of Research Curation and Research Reproducibility link: http://www.ieee.org/publications_standards/publications/ieee_workshops/ieee_reproducibility_workshop_report_final.pdf date: 2017-09-13 00:00:00 tags: [reproducibility report] description: | This report describes perspectives from the Workshop on the Future of Research Curation and Research Reproducibility that was collaboratively sponsored by the U.S. National Science Foundation (NSF) and IEEE (Institute of Electrical and Electronics Engineers) in November 2016. The workshop brought together stakeholders including researchers, funders, and notably, leading science, technology, engineering, and mathematics (STEM) publishers. The overarching objective was a deep dive into new kinds of research products and how the costs of creation and curation of these products can be sustainably borne by the agencies, publishers, and researcher communities that were represented by workshop participants. The purpose of this document is to describe the ideas that participants exchanged on approaches to increasing the value of all research by encouraging the archiving of reusable data sets, curating reusable software, and encouraging broader dialogue within and across disciplinary boundaries.
How should the review and publication processes change to promote reproducibility? What kinds of objects should the curatorial process expand to embrace? What infrastructure is required to preserve the necessary range of objects associated with an experiment? Who will undertake this work? And who will pay for it? These are the questions the workshop was convened to address in presentations, panels, small working groups, and general discussion. --- title: | Dugong: a Docker image, based on Ubuntu Linux, focused on reproducibility and replicability for bioinformatics analyses link: https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btx554/4103398/Dugong-a-Docker-image-based-on-Ubuntu-Linux date: 2017-09-05 00:00:00 tags: [reproducibility infrastructure] description: | Summary: This manuscript introduces and describes Dugong, a Docker image based on Ubuntu 16.04, which automates installation of more than 3500 bioinformatics tools (along with their respective libraries and dependencies), in alternative computational environments. The software operates through a user-friendly XFCE4 graphic interface that allows software management and installation by users not fully familiarized with the Linux command line and provides the Jupyter Notebook to assist in the delivery and exchange of consistent and reproducible protocols and results across laboratories, assisting in the development of open science projects. --- title: | open science, software, reproducibility, Galaxy link: https://f1000research.com/slides/6-1575 date: 2017-09-03 00:00:00 tags: [reproducibility talk] description: | Fairly high-level entry slides presented to the members of intercollegiate graduate program in Bioinformatics and Genomics at The Pennsylvania State University. 
--- title: | Enabling reproducible real-time quantitative PCR research: the RDML package link: https://academic.oup.com/bioinformatics/article-abstract/doi/10.1093/bioinformatics/btx528/4095640/Enabling-reproducible-real-time-quantitative-PCR?redirectedFrom=fulltext date: 2017-09-01 00:00:00 tags: [reproducible paper] description: | Reproducibility, a cornerstone of research, requires defined data formats, which include the set-up and output of experiments. The Real-time PCR Data Markup Language (RDML) is a recommended standard of the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines. Despite the popularity of the RDML format for analysis of qPCR data, handling of RDML files is not yet widely supported in all PCR curve analysis software. Results: This study describes the open source RDML package for the statistical computing language R. RDML is compatible with RDML versions ≤ 1.2 and provides functionality to (i) import RDML data; (ii) extract sample information (e.g., targets, concentration); (iii) transform data to various formats of the R environment; (iv) generate human readable run summaries; and (v) create RDML files from user data. In addition, RDML offers a graphical user interface to read, edit and create RDML files.
--- title: | American Geophysical Union Coalition Receives Grant to Advance Open and FAIR Data Standards in the Earth and Space Sciences link: http://markets.businessinsider.com/news/stocks/American-Geophysical-Union-Coalition-Receives-Grant-to-Advance-Open-and-FAIR-Data-Standards-in-the-Earth-and-Space-Sciences-607529 date: 2017-08-30 00:00:00 tags: [news article] description: | To address this critical need, the Laura and John Arnold Foundation has awarded a grant to a coalition of groups representing the international Earth and space science community, convened by the American Geophysical Union (AGU), to develop standards that will connect researchers, publishers, and data repositories in the Earth and space sciences to enable FAIR (findable, accessible, interoperable, and reusable) data – a concept first developed by Force11.org – on a large scale. This will accelerate scientific discovery and enhance the integrity, transparency, and reproducibility of this data. The resulting set of best practices will include: metadata and identifier standards; data services; common taxonomies; landing pages at repositories to expose the metadata and standard repository information; standard data citation; and standard integration into editorial peer review workflows. --- title: | The Sacred Infrastructure for Computational Research link: http://conference.scipy.org/proceedings/scipy2017/pdfs/klaus_greff.pdf date: 2017-08-29 00:00:00 tags: [reproducible paper] description: | We present a toolchain for computational research consisting of Sacred and two supporting tools. Sacred is an open source Python framework which aims to provide basic infrastructure for running computational experiments independent of the methods and libraries used. Instead, it focuses on solving universal everyday problems, such as managing configurations, reproducing results, and bookkeeping. 
Moreover, it provides an extensible basis for other tools, two of which we present here: Labwatch helps with tuning hyperparameters, and Sacredboard provides a web-dashboard for organizing and analyzing runs and results. --- title: | The Practice of Reproducible Research link: https://books.google.com/books?id=NDEyDwAAQBAJ&lpg=PA41&ots=xB24ENyKLu&dq=ReproZip&lr=lang_en&pg=PR1#v=onepage&q&f=false date: 2017-08-29 00:00:00 tags: [reproducible book] description: | This book contains a collection of 31 case studies of reproducible research workflows, written by academic researchers in the data-intensive sciences. Each case study describes how the author combined specific tools, ideas, and practices in order to complete a real-world research project. Emphasis is placed on the practical aspects of how the author organized his or her research to make it as reproducible as possible. --- title: | The Recomputation Manifesto link: https://arxiv.org/abs/1304.3674 date: 2017-08-25 00:00:00 tags: [reproducible paper] description: | Replication of scientific experiments is critical to the advance of science. Unfortunately, the discipline of Computer Science has never treated replication seriously, even though computers are very good at doing the same thing over and over again. Not only are experiments rarely replicated, they are rarely even replicable in a meaningful way. Scientists are being encouraged to make their source code available, but this is only a small step. Even in the happy event that source code can be built and run successfully, running code is a long way away from being able to replicate the experiment that code was used for. I propose that the discipline of Computer Science must embrace replication of experiments as standard practice. I propose that the only credible technique to make experiments truly replicable is to provide copies of virtual machines in which the experiments are validated to run. 
I propose that tools and repositories should be made available to make this happen. I propose to be one of those who makes it happen. --- title: | noWorkflow: a Tool for Collecting, Analyzing, and Managing Provenance from Python Scripts link: http://www.vldb.org/pvldb/vol10/p1841-pimentel.pdf date: 2017-08-20 20:52:44 tags: [reproducible paper] description: | We present noWorkflow, an open-source tool that systematically and transparently collects provenance from Python scripts, including data about the script execution and how the script evolves over time. During the demo, we will show how noWorkflow collects and manages provenance, as well as how it supports the analysis of computational experiments. We will also encourage attendees to use noWorkflow for their own scripts. --- title: | Using ReproZip for Reproducibility and Library Services link: https://osf.io/5tm8d/ date: 2017-08-18 00:00:00 tags: [reproducible paper, ReproZip] description: | Achieving research reproducibility is challenging in many ways: there are social and cultural obstacles as well as a constantly changing technical landscape that makes replicating and reproducing research difficult. Users face challenges in reproducing research across different operating systems, in using different versions of software across long projects and among collaborations, and in using publicly available work. The dependencies required to reproduce the computational environments in which research happens can be exceptionally hard to track – in many cases, these dependencies are hidden or nested too deeply to discover, and thus impossible to install on a new machine, which means adoption remains low. In this paper, we present ReproZip, an open source tool to help overcome the technical difficulties involved in preserving and replicating research, applications, databases, software, and more. We examine the current use cases of ReproZip, ranging from digital humanities to machine learning. 
We also explore potential library use cases for ReproZip, particularly in digital libraries and archives, liaison librarianship, and other library services. We believe that libraries and archives can leverage ReproZip to deliver more robust reproducibility services, repository services, as well as enhanced discoverability and preservation of research materials, applications, software, and computational environments. --- title: | Science Reproducibility Taxonomy link: https://figshare.com/articles/Science_Reproducibility_Taxonomy/5248273 date: 2017-07-26 20:52:44 tags: [reproducibility talk] description: | Presentation slides for the 2017 Workshop on Reproducibility Taxonomies for Computing and Computational Science --- title: | Promoting transparency and reproducibility in Behavioral Neuroscience: Publishing replications, registered reports, and null results link: https://www.ncbi.nlm.nih.gov/pubmed/28714713 date: 2017-07-17 00:00:00 tags: [reproducible journal] description: | The editors of Behavioral Neuroscience have been discussing several recent developments in the landscape of scientific publishing. The discussion was prompted, in part, by reported issues of reproducibility and concerns about the integrity of the scientific literature. Although enhanced rigor and transparency in science are certainly important, a related issue is that increased competition and focus on novel findings has impeded the extent to which the scientific process is cumulative. We have decided to join the growing number of journals that are adopting new reviewing and publishing practices to address these problems. In addition to our standard research articles, we are pleased to announce 3 new categories of articles: replications, registered reports, and null results. In joining other journals in psychology and related fields to offer these publication types, we hope to promote higher standards of methodological rigor in our science. 
This will ensure that our discoveries are based on sound evidence and that they provide a durable foundation for future progress. --- title: | Sustainable computational science: the ReScience initiative link: https://arxiv.org/pdf/1707.04393.pdf date: 2017-07-14 06:14:54 tags: [reproducible journal] description: | Computer science offers a large set of tools for prototyping, writing, running, testing, validating, sharing and reproducing results; however, computational science lags behind. In the best case, authors may provide their source code as a compressed archive and they may feel confident their research is reproducible. But this is not exactly true. Jonathan Buckheit and David Donoho proposed more than two decades ago that an article about computational results is advertising, not scholarship. The actual scholarship is the full software environment, code, and data that produced the result. This implies new workflows, in particular in peer review. Existing journals have been slow to adapt: source code is rarely requested, and hardly ever actually executed to check that it produces the results advertised in the article. ReScience is a peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research can be replicated from its description. To achieve this goal, the whole publishing chain is radically different from that of other traditional scientific journals. 
ReScience resides on GitHub, where each new implementation of a computational study is made available together with comments, explanations, and software tests. --- title: | Here's the three-pronged approach we're using in our own research to tackle the reproducibility issue link: https://theconversation.com/heres-the-three-pronged-approach-were-using-in-our-own-research-to-tackle-the-reproducibility-issue-80997 date: 2017-07-20 00:00:00 tags: [news article] description: | A big part of this problem has to do with what’s been called a “reproducibility crisis” in science – many studies, if run a second time, don’t come up with the same results. Scientists are worried about this situation, and high-profile international research journals have raised the alarm, too, calling on researchers to put more effort into ensuring their results can be reproduced, rather than only striving for splashy, one-off outcomes. Concerns about irreproducible results in science resonate outside the ivory tower, as well, because a lot of this research translates into information that affects our everyday lives. --- title: | Reproducibility Librarianship link: http://digitalcommons.du.edu/collaborativelibrarianship/vol9/iss2/4/ date: 2017-07-14 00:00:00 tags: [reproducible paper, ReproZip] description: | Over the past few years, research reproducibility has been increasingly highlighted as a multifaceted challenge across many disciplines. There are socio-cultural obstacles as well as a constantly changing technical landscape that make replicating and reproducing research extremely difficult. Researchers face challenges in reproducing research across different operating systems and different versions of software, to name just a few of the many technical barriers. The prioritization of citation counts and journal prestige has undermined incentives to make research reproducible. 
While libraries have been building support around research data management and digital scholarship, reproducibility is an emerging area that has yet to be systematically addressed. To respond to this, New York University (NYU) created the position of Librarian for Research Data Management and Reproducibility (RDM & R), a dual appointment between the Center for Data Science (CDS) and the Division of Libraries. This report will outline the role of the RDM & R librarian, paying close attention to the collaboration between the CDS and Libraries to bring reproducible research practices into the norm. --- title: | The Reproducibility Of Research And The Misinterpretation Of P Values link: http://www.biorxiv.org/content/early/2017/07/13/144337 date: 2017-07-14 00:00:00 tags: [reproducible paper] description: | We wish to answer this question: If you observe a "significant" P value after doing a single unbiased experiment, what is the probability that your result is a false positive? The weak evidence provided by P values between 0.01 and 0.05 is explored by exact calculations of false positive rates. When you observe P = 0.05, the odds in favour of there being a real effect (given by the likelihood ratio) are about 3:1. This is far weaker evidence than the odds of 19 to 1 that might, wrongly, be inferred from the P value. And if you want to limit the false positive rate to 5%, you would have to assume that you were 87% sure that there was a real effect before the experiment was done. If you observe P = 0.001 in a well-powered experiment, it gives a likelihood ratio of almost 100:1 odds on there being a real effect. That would usually be regarded as conclusive. But the false positive rate would still be 8% if the prior probability of a real effect was only 0.1. And, in this case, if you wanted to achieve a false positive rate of 5% you would need to observe P = 0.00045. 
It is recommended that P values should be supplemented by specifying the prior probability that would be needed to produce a specified (e.g. 5%) false positive rate. It may also be helpful to specify the minimum false positive rate associated with the observed P value. It is also recommended that the terms "significant" and "non-significant" never be used. Despite decades of warnings, many areas of science still insist on labelling a result of P < 0.05 as "significant". This practice must account for a substantial part of the lack of reproducibility in some areas of science. And this is before you get to the many other well-known problems, like multiple comparisons, lack of randomisation and P-hacking. Science is endangered by statistical misunderstanding, and by university presidents and research funders who impose perverse incentives on scientists. --- title: | Reproducible high energy physics analyses link: https://zenodo.org/record/819983#.WVPDl8alndR date: 2017-06-30 00:00:00 tags: [reproducibility talk] description: | Presentation on analysis preservation and reusability at #C4RR in Cambridge. --- title: | Cancer studies pass reproducibility test link: https://www.sciencemag.org/news/2017/06/cancer-studies-pass-reproducibility-test date: 2017-06-27 00:00:00 tags: [news article, replication study] description: | A high-profile project aiming to test reproducibility in cancer biology has released a second batch of results, and this time the news is good: Most of the experiments from two key cancer papers could be repeated. The latest replication studies, which appear today in eLife, come on top of five published in January that delivered a mixed message about whether high-impact cancer research can be reproduced. Taken together, however, results from the completed studies are “encouraging,” says Sean Morrison of the University of Texas Southwestern Medical Center in Dallas, an eLife editor. 
Overall, he adds, independent labs have now “reproduced substantial aspects” of the original experiments in four of five replication efforts that have produced clear results. --- title: | Reproducibility in Machine Learning-Based Studies: An Example of Text Mining link: https://openreview.net/pdf?id=By4l2PbQ- date: 2017-06-20 00:00:00 tags: [reproducible paper] description: | Reproducibility is an essential requirement for computational studies including those based on machine learning techniques. However, many machine learning studies are either not reproducible or are difficult to reproduce. In this paper, we consider what information about text mining studies is crucial to successful reproduction of such studies. We identify a set of factors that affect reproducibility based on our experience of attempting to reproduce six studies proposing text mining techniques for the automation of the citation screening stage in the systematic review process. Subsequently, the reproducibility of 30 studies was evaluated based on the presence or otherwise of information relating to the factors. While the studies provide useful reports of their results, they lack information on access to the dataset in the form and order as used in the original study (as against raw data), the software environment used, randomization control and the implementation of proposed techniques. In order to increase the chances of being reproduced, researchers should ensure that details about and/or access to information about these factors are provided in their reports. --- title: | trackr: A Framework for Enhancing Discoverability and Reproducibility of Data Visualizations and Other Artifacts in R link: https://arxiv.org/pdf/1706.04440.pdf date: 2017-06-15 00:00:00 tags: [reproducible paper] description: | Research is an incremental, iterative process, with new results relying and building upon previous ones. 
Scientists need to find, retrieve, understand, and verify results in order to confidently extend them, even when the results are their own. We present the trackr framework for organizing, automatically annotating, discovering, and retrieving results. We identify sources of automatically extractable metadata for computational results, and we define an extensible system for organizing, annotating, and searching for results based on these and other metadata. We present an open-source implementation of these concepts for plots, computational artifacts, and woven dynamic reports generated in the R statistical computing language. --- title: | The Challenges of Reproducibility In Data Scarce Fields link: https://www.dataone.org/webinars/challenges-reproducibility-data-scarce-fields date: 2017-05-31 00:00:00 tags: [reproducibility talk] description: | A webinar on the challenges of reproducibility in data scarce fields. --- title: | RCE Podcast Looks at Reproducibility of Scientific Results link: http://insidehpc.com/2017/05/rce-podcast-looks-reproducibility-scientific-results/ date: 2017-05-31 00:00:00 tags: [news article, reproducibility talk] description: | In this RCE Podcast, Brock Palen and Jeff Squyres discuss Reproducible Neuroscience with Chris Gorgolewski from Stanford. "In recent years there has been increasing concern about the reproducibility of scientific results. Because scientific research represents a major public investment and is the basis for many decisions that we make in medicine and society, it is essential that we can trust the results. Our goal is to provide researchers with tools to do better science. Our starting point is in the field of neuroimaging, because that’s the domain where our expertise lies." 
--- title: | Open Research/Open Data Forum: Transparency, Sharing, and Reproducibility in Scholarship link: https://vtechworks.lib.vt.edu/handle/10919/77859 date: 2017-05-31 00:00:00 tags: [reproducibility talk] description: | Join our panelists for a discussion on challenges and opportunities related to sharing and using open data in research, including meeting funder and journal guidelines. --- title: | Improving the Reproducibility of Scientific Applications with Execution Environment Specification link: http://ccl.cse.nd.edu/research/papers/meng-thesis.pdf date: 2017-05-31 00:00:00 tags: [reproducibility thesis] description: | This work makes its contribution by demonstrating the importance of execution environments for the reproducibility of scientific applications and differentiating execution environment specifications, which should be lightweight, persistent and deployable, from various tools used to create execution environments, which may experience frequent changes due to technological evolution. It proposes two preservation approaches and prototypes for the purposes of both result verification and research extension, and provides recommendations on how to build reproducible scientific applications from the start. --- title: | Improving research through reproducibility link: https://www.continuum.umn.edu/2017/05/improving-research-reproducibility/ date: 2017-05-19 00:00:00 tags: [news article] description: | The University of Minnesota Libraries addressed this issue head-on this year by launching the reproducibility portal in an effort to help faculty and others on campus improve their research practices. The portal is a collaboration that includes Liberal Arts Technology and Information Services (LATIS) and the Minnesota Supercomputing Institute (MSI). 
--- title: | PyConUK 2016: Creating a reproducible more secure python application link: https://www.youtube.com/watch?v=N10FFI_hF8s date: 2017-05-08 00:00:00 tags: [ReproZip] description: | Introduces the Python environment wrapper and packaging tools virtualenv & pip. Shows how you can stay up to date by using requires.io for egg security and update checking. Covers Fabric, a Python deployment tool, and wider systems and workflow replication with Vagrant and ReproZip. If time allows, touches upon test-driven development and adding Travis to your project. --- title: | PresQT Workshop 1: Reprozip Preview link: https://www.youtube.com/watch?v=Nq2_ZANs_ho&index=11&list=PL4eHrOoPCRQu4KvurwodIY-pulM0TVMQD date: 2017-05-08 00:00:00 tags: [ReproZip] description: | A preview of ReproZip by Vicky Steeves at the PresQT Workshop May 1, 2017 at the University of Notre Dame. --- title: | Progress toward openness, transparency, and reproducibility in cognitive neuroscience link: https://osf.io/5veew/ date: 2017-04-20 00:00:00 tags: [reproducible paper] description: | Accumulating evidence suggests that many findings in psychological science and cognitive neuroscience may prove difficult to reproduce; statistical power in brain imaging studies is low, and has not improved recently; software errors in common analysis tools are common, and can go undetected for many years; and, a few large scale studies notwithstanding, open sharing of data, code, and materials remains the rare exception. At the same time, there is a renewed focus on reproducibility, transparency, and openness as essential core values in cognitive neuroscience. The emergence and rapid growth of data archives, meta-analytic tools, software pipelines, and research groups devoted to improved methodology reflects this new sensibility. 
We review evidence that the field has begun to embrace new open research practices, and illustrate how these can begin to address problems of reproducibility, statistical power, and transparency in ways that will ultimately accelerate discovery. --- title: | Repeat: A Framework to Assess Empirical Reproducibility in Biomedical Research link: https://osf.io/4np66/ date: 2017-04-20 00:00:00 tags: [reproducible paper] description: | Background: The reproducibility of research is essential to rigorous science, yet significant concerns about the reliability and verifiability of biomedical research have been recently highlighted. Ongoing efforts across several domains of science and policy are working to clarify the fundamental characteristics of reproducibility and to enhance the transparency and accessibility of research. Methods: The aim of the present work is to develop an assessment tool operationalizing key concepts of research transparency in the biomedical domain, specifically for secondary biomedical data research using electronic health record data. The tool (RepeAT) was developed through a multi-phase process that involved coding and extracting recommendations and practices for improving reproducibility from publications and reports across the biomedical and statistical sciences, field testing the instrument, and refining variables. Results: RepeAT includes 103 unique variables grouped into five categories (research design and aim, database and data collection methods, data mining and data cleaning, data analysis, data sharing and documentation). Preliminary results in manually processing 40 scientific manuscripts indicate components of the proposed framework with strong inter-rater reliability, as well as directions for further research and refinement of RepeAT. Conclusions: The use of RepeAT may allow the biomedical community to have a better understanding of the current practices of research transparency and accessibility among principal investigators. 
Common adoption of RepeAT may improve reporting of research practices and the availability of research outputs. Additionally, use of RepeAT will facilitate comparisons of research transparency and accessibility across domains and institutions. --- title: | Video can make science more open, transparent, robust, and reproducible link: https://osf.io/3kvp7/ date: 2017-04-20 00:00:00 tags: [reproducible paper] description: | Amidst the recent flood of concerns about transparency and reproducibility in the behavioral and clinical sciences, we suggest a simple, inexpensive, easy-to-implement, and uniquely powerful tool to improve the reproducibility of scientific research and accelerate progress—video recordings of experimental procedures. Widespread use of video for documenting procedures could make moot disagreements about whether empirical replications truly reproduced the original experimental conditions. We call on researchers, funders, and journals to make commonplace the collection and open sharing of video-recorded procedures. --- title: | Need to find a replication partner, or collaborator? There’s an online platform for that link: http://retractionwatch.com/2017/04/19/need-find-replication-partner-collaborator-theres-online-platform/ date: 2017-04-19 00:00:00 tags: [replication study] description: | Do researchers need a new "Craigslist?" We were recently alerted to a new online platform called StudySwap by one of its creators, who said it was partially inspired by one of our posts. The platform creates an "online marketplace" that previous researchers have called for, connecting scientists with willing partners – such as a team looking for someone to replicate its results, and vice versa. As co-creators Christopher Chartier at Ashland University and Randy McCarthy at Northern Illinois University tell us, having a place where researchers can find each other more efficiently "is in everyone’s best interest." 
--- title: | Ethical and legal implications of the methodological crisis in neuroimaging link: https://osf.io/2gehg/ date: 2017-04-12 00:00:00 tags: [reproducible paper] description: | Currently, many scientific fields such as psychology or biomedicine face a methodological crisis concerning the reproducibility, replicability and validity of their research. In neuroimaging, similar methodological concerns have taken hold of the field and researchers are working frantically towards finding solutions for the methodological problems specific to neuroimaging. This paper examines some ethical and legal implications of this methodological crisis in neuroimaging. With respect to ethical challenges, the paper discusses the impact of flawed methods in neuroimaging research in cognitive and clinical neuroscience, particularly with respect to faulty brain-based models of human cognition, behavior and personality. Specifically examined is whether such faulty models, when they are applied to neurological or psychiatric diseases, could put patients at risk and whether this places special obligations upon researchers using neuroimaging. In the legal domain, the actual use of neuroimaging as evidence in U.S. courtrooms is surveyed, followed by an examination of ways the methodological problems may create challenges for the criminal justice system. Finally, the paper reviews and promotes some promising ideas and initiatives from within the neuroimaging community for addressing the methodological problems. --- title: | Developing Standards for Data Citation and Attribution for Reproducible Research in Linguistics link: https://sites.google.com/a/hawaii.edu/data-citation/ date: 2017-04-08 00:00:00 tags: [reproducibility guidelines] description: | While linguists have always relied on language data, they have not always facilitated access to those data. 
Linguistic publications typically include short excerpts from data sets, ordinarily consisting of fewer than five words, and often without citation. Where citations are provided, the connection to the data set is usually only vaguely identified. An excerpt might be given a citation which refers to the name of the text from which it was extracted, but in practice the reader has no way to access that text. That is, in spite of the potential generated by recent shifts in the field, a great deal of linguistic research created today is not reproducible, either in principle or in practice. The workshops and panel presentation will facilitate development of standards for the curation and citation of linguistics data that are responsive to these changing conditions and shift the field of linguistics toward a more scientific, data-driven model which results in reproducible research. --- title: | Open for Comments: Linguistics Data Interest Group Charter Statement link: https://www.rd-alliance.org/group/linguistics-data-interest-group/case-statement/linguistics-data-interest-group-charter date: 2017-04-08 00:00:00 tags: [reproducibility guidelines] description: | Data are fundamental to the field of linguistics. Examples drawn from natural languages provide a foundation for claims about the nature of human language, and validation of these linguistic claims relies crucially on these supporting data. Yet, while linguists have always relied on language data, they have not always facilitated access to those data. Publications typically include only short excerpts from data sets, and where citations are provided, the connections to the data sets are usually only vaguely identified. 
At the same time, the field of linguistics has generally viewed the value of data without accompanying analysis with some degree of skepticism, and thus linguists have murky benchmarks for evaluating the creation, curation, and sharing of data sets in hiring, tenure and promotion decisions. This disconnect between linguistics publications and their supporting data results in much linguistic research being unreproducible, either in principle or in practice. Without reproducibility, linguistic claims cannot be readily validated or tested, rendering their scientific value moot. In order to facilitate the development of reproducible research in linguistics, the Linguistics Data Interest Group plans to promote discipline-wide adoption of common standards for data citation and attribution. In our parlance, citation refers to the practice of identifying the source of linguistic data, and attribution refers to mechanisms for assessing the intellectual and academic value of data citations. --- title: | Reproducible research in the Python ecosystem: a reality check link: http://blog.khinsen.net/posts/2017/04/06/reproducible-research-in-the-python-ecosystem-a-reality-check/ date: 2017-04-06 00:00:00 tags: [reproducibility study] description: | In summary, my little experiment has shown that reproducibility of Python scripts requires preserving the original environment, which fortunately is not so difficult over a time span of four years, at least if everything you need is part of the Anaconda distribution. I am not sure I would have had the patience to reinstall everything from source, given an earlier bad experience. The purely computational part of my code was even surprisingly robust under updates in its dependencies. But the plotting code wasn’t, as matplotlib has introduced backwards-incompatible changes in a widely used function. Clearly the matplotlib team prepared this carefully, introducing a deprecation warning before introducing the breaking change. 
For properly maintained client code, this can probably be dealt with. --- title: | Structuring supplemental materials in support of reproducibility link: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1205-3 date: 2017-04-07 00:00:00 tags: [reproducible paper] description: | Supplements are increasingly important to the scientific record, particularly in genomics. However, they are often underutilized. Optimally, supplements should make results findable, accessible, interoperable, and reusable (i.e., “FAIR”). Moreover, properly off-loading to them the data and detail in a paper could make the main text more readable. We propose a hierarchical organization for supplements, with some parts paralleling and “shadowing” the main text and other elements branching off from it, and we suggest a specific formatting to make this structure explicit. Furthermore, sections of the supplement could be presented in multiple scientific “dialects”, including machine-readable and lay-friendly formats. --- title: | AI buzzwords explained: scientific workflows link: https://dl.acm.org/citation.cfm?id=3067683 date: 2017-03-23 00:00:00 tags: [reproducible paper] description: | The reproducibility of scientific experiments is crucial for corroborating, consolidating and reusing new scientific discoveries. However, the constant pressure for publishing results (Fanelli, 2010) has removed reproducibility from the agenda of many researchers: in a recent survey published in Nature (with more than 1500 scientists), over 70% of the participants acknowledge having failed to reproduce the work of another colleague at some point in time (Baker, 2016). Analyses from psychology and cancer biology show reproducibility rates below 40% and 10% respectively (Collaboration, 2015) (Begley & Lee, 2012). 
As a consequence, retractions of publications have occurred in recent years in several disciplines (Marcus & Oransky, 2014) (Rockoff, 2015), and the general public is now skeptical about scientific studies on topics like pesticides, depression drugs or flu pandemics (American, 2010). --- title: | The role of the IACUC in ensuring research reproducibility link: https://www.ncbi.nlm.nih.gov/pubmed/28328872 date: 2017-03-22 00:00:00 tags: [reproducible paper] description: | There is a "village" of people impacting research reproducibility, such as funding panels, the IACUC and its support staff, institutional leaders, investigators, veterinarians, animal facilities, and professional journals. IACUCs can contribute to research reproducibility by ensuring that reviews of animal use requests, program self-assessments and post-approval monitoring programs are sufficiently thorough, the animal model is appropriate for testing the hypothesis, animal care and use is conducted in a manner that is compliant with external and institutional requirements, and extraneous variables are minimized. The persons comprising the village also must have a shared vision that guards against reproducibility problems while simultaneously avoiding being viewed as a burden to research. This review analyzes and discusses aspects of the IACUC's "must do" and "can do" activities that impact the ability of a study to be reproduced. We believe that the IACUC, with support from and when working synergistically with other entities in the village, can contribute to minimizing unintended research variables and strengthen research reproducibility. --- title: | Taking on chemistry's reproducibility problem link: https://www.chemistryworld.com/news/taking-on-chemistrys-reproducibility-problem/3006991.article date: 2017-03-21 00:00:00 tags: [news article] description: | Not a week passes without reproducibility in science – or the lack of it – hitting the headlines. 
Although much of the criticism is directed at the biomedical sciences or psychology, many of the same problems also pervade the chemical sciences. --- title: | A very simple, re-executable neuroimaging publication link: https://f1000research.com/articles/6-124/v1 date: 2017-03-19 00:00:00 tags: [reproducible paper] description: | Reproducible research is a key element of the scientific process. Re-executability of neuroimaging workflows that lead to the conclusions arrived at in the literature has not yet been sufficiently addressed and adopted by the neuroimaging community. In this paper, we document a set of procedures, which include supplemental additions to a manuscript, that unambiguously define the data, workflow, execution environment and results of a neuroimaging analysis, in order to generate a verifiable re-executable publication. Re-executability provides a starting point for examination of the generalizability and reproducibility of a given finding. --- title: | Reproducibility of computational workflows is automated using continuous analysis link: http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.3780.html date: 2017-03-19 00:00:00 tags: [reproducible paper] description: | Replication, validation and extension of experiments are crucial for scientific progress. Computational experiments are scriptable and should be easy to reproduce. However, computational analyses are designed and run in a specific computing environment, which may be difficult or impossible to match using written instructions. We report the development of continuous analysis, a workflow that enables reproducible computational analyses. Continuous analysis combines Docker, a container technology akin to virtual machines, with continuous integration, a software development technique, to automatically rerun a computational analysis whenever updates or improvements are made to source code or data. This enables researchers to reproduce results without contacting the study authors. 
Continuous analysis allows reviewers, editors or readers to verify reproducibility without manually downloading and rerunning code and can provide an audit trail for analyses of data that cannot be shared. --- title: | Federating heterogeneous datasets to enhance data sharing and experiment reproducibility link: http://proceedings.spiedigitallibrary.org/proceeding.aspx?articleid=2612413 date: 2017-03-19 00:00:00 tags: [reproducible paper] description: | Recent studies have demonstrated the difficulty of replicating scientific findings and/or experiments published in the past.1 The effects seen in the replicated experiments were smaller than previously reported. Some of the explanations for these findings include the complexity of the experimental design and the pressure on researchers to report positive findings. The International Committee of Medical Journal Editors (ICMJE) suggests that every study considered for publication must submit a plan to share the de-identified patient data no later than 6 months after publication. There is a growing demand to enhance the management of clinical data, facilitate data sharing across institutions and also to keep track of the data from previous experiments. The ultimate goal is to assure the reproducibility of experiments in the future. This paper describes Shiny-tooth, a web-based application created to improve clinical data acquisition during the clinical trial, as well as federation of such data and of morphological data derived from medical images. Currently, this application is being used to store clinical data from an osteoarthritis (OA) study. This work is submitted to the SPIE Biomedical Applications in Molecular, Structural, and Functional Imaging conference.
--- title: | Reproducibility and Practical Adoption of GEOBIA with Open-Source Software in Docker Containers link: http://www.mdpi.com/2072-4292/9/3/290 date: 2017-03-18 00:00:00 tags: [reproducible paper] description: | Geographic Object-Based Image Analysis (GEOBIA) mostly uses proprietary software, but the interest in Free and Open-Source Software (FOSS) for GEOBIA is growing. This interest stems not only from cost savings, but also from benefits concerning reproducibility and collaboration. Technical challenges hamper practical reproducibility, especially when multiple software packages are required to conduct an analysis. In this study, we use containerization to package a GEOBIA workflow in a well-defined FOSS environment. We explore the approach using two software stacks to perform an exemplary analysis detecting destruction of buildings in bi-temporal images of a conflict area. The analysis combines feature extraction techniques with segmentation and object-based analysis to detect changes using automatically-defined local reference values and to distinguish disappeared buildings from non-target structures. The resulting workflow is published as FOSS comprising both the model and data in a ready-to-use Docker image and a user interface for interaction with the containerized workflow. The presented solution advances GEOBIA in the following aspects: higher transparency of methodology; easier reuse and adaptation of workflows; better transferability between operating systems; complete description of the software environment; and easy application of workflows by image analysis experts and non-experts. As a result, it promotes not only the reproducibility of GEOBIA, but also its practical adoption.
--- title: | Reproducibility in biomarker research and clinical development: a global challenge link: http://www.futuremedicine.com/doi/full/10.2217/bmm-2017-0024 date: 2017-03-18 00:00:00 tags: [reproducible paper] description: | According to a recent survey conducted by the journal Nature, a large percentage of scientists agree that we live in times of irreproducibility of research results [1]. They believe that much of what is published just cannot be trusted. While the results of the survey may be biased toward respondents with an interest in the area of reproducibility, a concern is recognizable. Goodman et al. discriminate between different aspects of reproducibility and dissect the term into ‘material reproducibility’ (provision of sufficient information to enable repetition of the procedures), ‘results reproducibility’ (obtaining the same results from an independent study; formerly termed ‘replicability’) and ‘inferential reproducibility’ (drawing the same conclusions from separate studies) [2]. The validity of data is threatened by many issues, among others by poor utility of public information, poor protocols and design, lack of standard analytical and clinical practices and knowledge, conflict of interest and other biases, as well as publication strategy. --- title: | The science 'reproducibility crisis' – and what can be done about it link: https://theconversation.com/the-science-reproducibility-crisis-and-what-can-be-done-about-it-74198 date: 2017-03-16 00:00:00 tags: [news article] description: | Reproducibility is the idea that an experiment can be repeated by another scientist and they will get the same result. It is important to show that the claims of any experiment are true and for them to be useful for any further research. However, science appears to have an issue with reproducibility. A survey by Nature revealed that 52% of researchers believed there was a "significant reproducibility crisis" and 38% said there was a "slight crisis".
We asked three experts how they think the situation could be improved. --- title: | Research team presents a molecular switch so far unmatched in its reproducibility link: https://phys.org/news/2017-03-team-molecular-unmatched.html date: 2017-03-13 00:00:00 tags: [news article] description: | The theoretical physicists Junior Professor Fabian Pauly and his postdoc Dr. Safa G. Bahoosh, together with a team of experimental physicists and chemists, have now succeeded in demonstrating a reliable and reproducible single-molecule switch. The basis for this switch is a specifically synthesized molecule with special properties. This is an important step towards realising fundamental ideas of molecular electronics. The results were published in the online journal Nature Communications on 9 March 2017. --- title: | Addressing reproducibility in single-laboratory phenotyping experiments [Supplementary Materials] link: https://figshare.com/projects/Supplementary_Materials_for_Addressing_reproducibility_in_single-laboratory_phenotyping_experiments_/19483 date: 2017-03-13 00:00:00 tags: [reproducible paper] description: | Supplementary Materials for the article, including datasets for the analyses, the analyses reports, R scripts to reproduce them and the article's figures data. --- title: | Preprint: Transparency, Reproducibility, and the Credibility of Economics Research link: https://osf.io/c4uwk/ date: 2017-03-04 00:00:00 tags: [case studies, reproducible paper] description: | There is growing interest in enhancing research transparency and reproducibility in economics and other scientific fields. We survey existing work on these topics within economics, and discuss the evidence suggesting that publication bias, inability to replicate, and specification searching remain widespread in the discipline.
We next discuss recent progress in this area, including through improved research design, study registration and pre-analysis plans, disclosure standards, and open sharing of data and materials, drawing on experiences in both economics and other social sciences. We discuss areas where consensus is emerging on new practices, as well as approaches that remain controversial, and speculate about the most effective ways to make economics research more credible in the future. --- title: | Reproducible Data Analysis in Jupyter link: https://jakevdp.github.io/blog/2017/03/03/reproducible-data-analysis-in-jupyter/ date: 2017-03-03 00:00:00 tags: [case studies, reproducibility guidelines, reproducibility infrastructure] description: | Jupyter notebooks provide a useful environment for interactive exploration of data. A common question I get, though, is how you can progress from this nonlinear, interactive, trial-and-error style of exploration to a more linear and reproducible analysis based on organized, packaged, and tested code. This series of videos presents a case study in how I personally approach reproducible data analysis within the Jupyter notebook. --- title: | Publishing a reproducible paper link: https://figshare.com/articles/Publishing_a_reproducible_paper/4720996 date: 2017-03-03 00:00:00 tags: [reproducibility talk] description: | Adolescence is a period of human brain growth and high incidence of mental health disorders. In 2016 the Neuroscience in Psychiatry Network published a study of adolescent brain development which showed that the hubs of the structural connectome are late developing and are found in association cortex (https://doi.org/10.1073/pnas.1601745113). Furthermore these regions are enriched for genes related to schizophrenia. In this presentation Dr Kirstie Whitaker will demonstrate how this work is supported by open data and analysis code, and that the results replicate in two independent cohorts of teenagers. 
She will encourage Brainhack-Global participants to take steps towards ensuring that their work meets these standards for open and reproducible science in 2017 and beyond. --- title: | HESML: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset link: http://www.sciencedirect.com/science/article/pii/S0306437916303246 date: 2017-02-27 00:00:00 tags: [reproducible paper, ReproZip] description: | This work is a detailed companion reproducibility paper of the methods and experiments proposed in three previous works by Lastra-Díaz and García-Serrano, which introduce a set of reproducible experiments on word similarity based on HESML and ReproZip with the aim of exactly reproducing the experimental surveys in the aforementioned works. This work introduces a new representation model for taxonomies called PosetHERep, and a Java software library called Half-Edge Semantic Measures Library (HESML) based on it, which implements most ontology-based semantic similarity measures and Information Content (IC) models based on WordNet reported in the literature. --- title: | Reproducibility of biomedical research – The importance of editorial vigilance link: http://www.sciencedirect.com/science/article/pii/S2214753517300049 date: 2017-02-27 00:00:00 tags: [reproducible paper] description: | Many journal editors are failing to implement their own authors’ instructions, resulting in the publication of many articles that do not meet basic standards of transparency, employ unsuitable data analysis methods and report overly optimistic conclusions. This problem is particularly acute where quantitative measurements are made; it results in the publication of papers that lack scientific rigor and contributes to the concerns with regard to the reproducibility of biomedical research.
This hampers research areas such as biomarker identification, as reproducing all but the most striking changes is challenging and translation to patient care rare. --- title: | Reproducibility Data: The SAPA Personality Inventory: An empirically-derived, hierarchically-organized self-report personality assessment model link: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/Y72Z1J date: 2017-02-27 00:00:00 tags: [reproducible paper] description: | Unlike most other SAPA datasets available on Dataverse, these data are specifically tied to the reproducible manuscript entitled "The SAPA Personality Inventory: An empirically-derived, hierarchically-organized self-report personality assessment model." Most of these files are images that should be downloaded and organized in the same location as the source .Rnw file. A few files contain data that have already been processed (and could be independently re-created using code in the .Rnw file) - these are included to shorten the processing time needed to reproduce the original document. The raw data files for most of the analyses are stored in 3 separate locations, 1 for each of the 3 samples. These are: Exploratory sample - doi:10.7910/DVN/SD7SVE Replication sample - doi:10.7910/DVN/3LFNJZ Confirmatory sample - doi:10.7910/DVN/I8I3D3. If you have any questions about reproducing the file, please first consult the instructions in the Preface of the PDF version. Note that the .Rnw version of the file includes many annotations that are not visible in the PDF version (https://sapa-project.org/research/SPI/SPIdevelopment.pdf) and which may also be useful. If you still have questions, feel free to email me directly. Note that it is unlikely that I will be able to help with technical issues that do not relate to R, Knitr, Sweave, and LaTeX.
--- title: | Most scientists 'can't replicate studies by their peers' link: http://www.bbc.com/news/science-environment-39054778 date: 2017-02-24 00:00:00 tags: [news article] description: | Science is facing a "reproducibility crisis" where more than two-thirds of researchers have tried and failed to reproduce another scientist's experiments, research suggests. This is frustrating clinicians and drug developers who want solid foundations of pre-clinical research to build upon. From his lab at the University of Virginia's Centre for Open Science, immunologist Dr Tim Errington runs The Reproducibility Project, which attempted to repeat the findings reported in five landmark cancer studies. --- title: | When Evidence Says No, but Doctors Say Yes link: https://www.theatlantic.com/health/archive/2017/02/when-evidence-says-no-but-doctors-say-yes/517368/ date: 2017-02-24 00:00:00 tags: [popular news] description: | According to Vinay Prasad, an oncologist and one of the authors of the Mayo Clinic Proceedings paper, medicine is quick to adopt practices based on shaky evidence but slow to drop them once they’ve been blown up by solid proof. As a young doctor, Prasad had an experience that left him determined to banish ineffective procedures. He was the medical resident on a team caring for a middle-aged woman with stable chest pain. She underwent a stent procedure and suffered a stroke, resulting in brain damage. Prasad, now at the Oregon Health and Sciences University, still winces slightly when he talks about it. University of Chicago professor and physician Adam Cifu had a similar experience. Cifu had spent several years convincing newly postmenopausal patients to go on hormone therapy for heart health—a treatment that at the millennium accounted for 90 million annual prescriptions—only to then see a well-designed trial show no heart benefit and perhaps even a risk of harm. "I had to basically run back all those decisions with women," he says. 
"And, boy, that really sticks with you, when you have patients saying, 'But I thought you said this was the right thing.'" So he and Prasad coauthored a 2015 book, Ending Medical Reversal, a call to raise the evidence bar for adopting new medical standards. "We have a culture where we reward discovery; we don’t reward replication," Prasad says, referring to the process of retesting initial scientific findings to make sure they’re valid. --- title: | Encouraging Progress toward Reproducibility Reported link: http://www.genengnews.com/gen-news-highlights/encouraging-progress-toward-reproducibility-reported/81253910 date: 2017-02-22 00:00:00 tags: [news article] description: | At AAAS 2017, a pair of panel discussions addressed the reproducibility crisis in science, particularly biomedical science, and suggested that it is manageable, provided stakeholders continue to demonstrate a commitment to quality. One panel, led by Leonard P. Freedman, Ph.D., president of Global Biological Standards Institute (GBSI), was comprehensive. It prescribed a range of initiatives. --- title: | Figuring out a handshake link: http://kn-x.com/static/PWFeb17forum.pdf date: 2017-02-01 00:00:00 tags: [reproducibility guidelines] description: | Similarities between incentives in science and incentives in finance suggest a solution to crises in both. Published in the Feb 2017 print edition of Physics World magazine (physicsworld.com). --- title: | How to run a lab for reproducible research link: https://figshare.com/articles/How_to_run_a_lab_for_reproducible_research/4676170 date: 2017-02-21 00:00:00 tags: [reproducibility talk] description: | As a principal investigator, how do you run your lab for reproducibility? I submit the following action areas: commitment, transparency and open science, onboarding, collaboration, community and leadership. Make a public commitment to reproducible research—what this means for you could differ from others, but an essential core is common to all. 
Transparency is an essential value, and embracing open science is the best route to realize it. Onboarding every lab member with a deliberate group “syllabus” for reproducibility sets the expectations high. What is your list of must-read literature on reproducible research? I can share mine with you: my lab members helped to make it. For collaborating efficiently and building community, we take inspiration from the open-source world. We adopt its technology platforms to work on software and to communicate, openly and collaboratively. Key to the open-source culture is to give credit—give lots of credit for every contribution: code, documentation, tests, issue reports! The tools and methods require training, but running a lab for reproducibility is your decision. Start here–>commitment. --- title: | GBSI reports encouraging progress toward improved research reproducibility by year 2020 link: https://phys.org/news/2017-02-gbsi-year.html date: 2017-02-19 00:00:00 tags: [news article] description: | One year after the Global Biological Standards Institute (GBSI) issued its Reproducibility2020 challenge and action plan for the biomedical research community, the organization reports encouraging progress toward the goal to significantly improve the quality of preclinical biological research by year 2020. "Reproducibility2020 Report: Progress and Priorities," posted today on bioRxiv, identifies action and impact that has been achieved by the life science research community and outlines priorities going forward. The report is the first comprehensive review of the steps being taken to improve reproducibility since the issue became more widely known in 2012. 
--- title: | Supporting accessibility and reproducibility in language research in the Alveo virtual laboratory link: http://www.sciencedirect.com/science/article/pii/S0885230816302583 date: 2017-02-18 00:00:00 tags: [reproducible paper] description: | Reproducibility is an important part of scientific research and studies published in speech and language research usually make some attempt at ensuring that the work reported could be reproduced by other researchers. This paper looks at the current practice in the field relating to the citation and availability of both data and software methods. It is common to use widely available shared datasets in this field which helps to ensure that studies can be reproduced; however a brief survey of recent papers shows a wide range of styles of citation of data only some of which clearly identify the exact data used in the study. Similarly, practices in describing and sharing software artefacts vary considerably from detailed descriptions of algorithms to linked repositories. The Alveo Virtual Laboratory is a web based platform to support research based on collections of text, speech and video. Alveo provides a central repository for language data and provides a set of services for discovery and analysis of data. We argue that some of the features of the Alveo platform may make it easier for researchers to share their data more precisely and cite the exact software tools used to develop published results. Alveo makes use of ideas developed in other areas of science and we discuss these and how they can be applied to speech and language research. 
--- title: | New Tools for Content Innovation and Data Sharing: Enhancing Reproducibility and Rigor in Biomechanics Research link: http://www.sciencedirect.com/science/article/pii/S0021929017300763 date: 2017-02-18 00:00:00 tags: [reproducible paper] description: | We are currently in one of the most exciting times for science and engineering as we witness unprecedented growth in computational and experimental capabilities to generate new data and models. To facilitate data and model sharing, and to enhance reproducibility and rigor in biomechanics research, the Journal of Biomechanics has introduced a number of tools for Content Innovation to allow presentation, sharing, and archiving of methods, models, and data in our articles. The tools include an Interactive Plot Viewer, 3D Geometric Shape and Model Viewer, Virtual Microscope, Interactive MATLAB Figure Viewer, and Audioslides. Authors are highly encouraged to make use of these in upcoming journal submissions. --- title: | Advancing meta-research through data sharing and transparency link: http://blogs.biomedcentral.com/on-medicine/2017/02/15/advancing-meta-research-through-data-sharing-and-transparency/ date: 2017-02-15 00:00:00 tags: [popular news] description: | A study published today in Systematic Reviews compares two concurrent systematic reviews from the Medtronic-Yale partnership that established the Yale Open Data Access (YODA) Project, which offered a unique opportunity to study meta-research reproducibility and to test models of data sharing. --- title: | Using Docker Containers to Extend Reproducibility Architecture for the NASA Earth Exchange (NEX) link: https://ntrs.nasa.gov/search.jsp?R=20170001275 date: 2017-02-11 00:00:00 tags: [reproducible paper] description: | NASA Earth Exchange (NEX) is a data, supercomputing and knowledge collaboratory that houses NASA satellite, climate and ancillary data where a focused community can come together to address large-scale challenges in Earth sciences.
As NEX has been growing into a petabyte-size platform for analysis, experiments and data production, it has been increasingly important to enable users to easily retrace their steps, identify what datasets were produced by which process chains, and give them the ability to readily reproduce their results. This can be a tedious and difficult task even for a small project, but is almost impossible on large processing pipelines. We have developed an initial reproducibility and knowledge capture solution for the NEX; however, if users want to move the code to another system, whether it is their home institution cluster, laptop or the cloud, they have to find, build and install all the required dependencies that would run their code. This can be a very tedious and tricky process and is a big impediment to moving code to data and reproducibility outside the original system. The NEX team has tried to assist users who wanted to move their code into OpenNEX on the Amazon cloud by creating custom virtual machines with all the software and dependencies installed, but this, while solving some of the issues, creates a new bottleneck that requires the NEX team to be involved with any new request, updates to virtual machines and general maintenance support. In this presentation, we will describe a solution that integrates NEX and Docker to bridge the gap in code-to-data migration. The core of the solution is semi-automatic conversion of science codes, tools and services that are already tracked and described in the NEX provenance system, to Docker - an open-source Linux container software. Docker is available on most computer platforms, easy to install and capable of seamlessly creating and/or executing any application packaged in the appropriate format.
We believe this is an important step towards seamless process deployment in heterogeneous environments that will enhance community access to NASA data and tools in a scalable way, promote software reuse, and improve reproducibility of scientific results. --- title: | Facilitating reproducible research by investigating computational metadata link: http://ieeexplore.ieee.org/abstract/document/7840958/?reload=true date: 2017-02-11 00:00:00 tags: [reproducible paper] description: | Computational workflows consist of a series of steps in which data is generated, manipulated, analysed and transformed. Researchers use tools and techniques to capture the provenance associated with the data to aid reproducibility. The metadata collected not only helps in reproducing the computation but also aids in comparing the original and reproduced computations. In this paper, we present an approach, "Why-Diff", to analyse the difference between two related computations by changing the artifacts and how the existing tools "YesWorkflow" and "NoWorkflow" record the changed artifacts. --- title: | Challenges of archiving and preserving born-digital news applications link: http://journals.sagepub.com/doi/abs/10.1177/0340035216686355 date: 2017-02-11 00:00:00 tags: [reproducible paper] description: | Born-digital news content is increasingly becoming the format of the first draft of history. Archiving and preserving this history is of paramount importance to the future of scholarly research, but many technical, legal, financial, and logistical challenges stand in the way of these efforts. This is especially true for news applications, or custom-built websites that comprise some of the most sophisticated journalism stories today, such as the “Dollars for Docs” project by ProPublica. 
Many news applications are standalone pieces of software that query a database, and this significant subset of apps cannot be archived in the same way as text-based news stories, or fully captured by web archiving tools such as Archive-It. As such, they are currently disappearing. This paper will outline the various challenges facing the archiving and preservation of born-digital news applications, as well as outline suggestions for how to approach this important work. --- title: | Computational Analysis of Lifespan Experiment Reproducibility link: http://biorxiv.org/content/early/2017/02/09/107417.article-info date: 2017-02-09 00:00:00 tags: [reproducible paper] description: | Independent reproducibility is essential to the generation of scientific knowledge. Optimizing experimental protocols to ensure reproducibility is an important aspect of scientific work. Genetic or pharmacological lifespan extensions are generally small compared to the inherent variability in mean lifespan even in isogenic populations housed under identical conditions. This variability makes reproducible detection of small but real effects experimentally challenging. In this study, we aimed to determine the reproducibility of C. elegans lifespan measurements under ideal conditions, in the absence of methodological errors or environmental or genetic background influences. To accomplish this, we generated a parametric model of C. elegans lifespan based on data collected from 5,026 wild-type N2 animals. We use this model to predict how different experimental practices, effect sizes, number of animals, and how different ‘shapes’ of survival curves affect the ability to reproduce real longevity effects. We find that the chances of reproducing real but small effects are exceedingly low and would require substantially more animals than are commonly used. 
Our results indicate that many lifespan studies are underpowered to detect reported changes and that, as a consequence, stochastic variation alone can account for many failures to reproduce longevity results. As a remedy, we provide power of detection tables that can be used as guidelines to plan experiments with statistical power to reliably detect real changes in lifespan and limit spurious false positive results. These considerations will improve best practices in designing lifespan experiments to increase reproducibility. --- title: | Using the Nextflow framework for reproducible in-silico omics analyses across clusters and clouds link: https://peerj.com/preprints/2796.pdf date: 2017-02-08 00:00:00 tags: [reproducible paper, reproducibility infrastructure] description: | Reproducibility has become one of biology’s most pressing issues. This impasse has been fueled by the combined reliance on increasingly complex data analysis methods and the exponential growth of biological datasets. When considering the installation, deployment and maintenance of bioinformatic pipelines, an even more challenging picture emerges due to the lack of community standards. The effect of limited standards on reproducibility is amplified by the very diverse range of computational platforms and configurations on which these applications are expected to be applied (workstations, clusters, HPC, clouds, etc.). With no established standard at any level, diversity cannot be taken for granted. --- title: | Is software reproducibility possible and practical? link: https://danielskatzblog.wordpress.com/2017/02/07/is-software-reproducibility-possible-and-practical/ date: 2017-02-07 00:00:00 tags: [reproducibility talk] description: | This blog is based on part of a talk I gave in January 2017, and the thinking behind it, in turn, is based on my view of a series of recent talks and blogs, and how they might be fit together.
The short summary is that general software reproducibility is hard at best, and may not be practical except in special cases. --- title: | Reproducibility analysis of the scientific workflows link: http://lib.uni-obuda.hu/sites/lib.uni-obuda.hu/files/BanatiAnna_ertekezes2016.pdf date: 2017-02-09 00:00:00 tags: [reproducible paper] description: | In this dissertation I deal with the requirements and the analysis of reproducibility. I set out methods based on provenance data to handle or eliminate the unavailable or changing descriptors in order to be able to reproduce an otherwise non-reproducible scientific workflow. In this way I intend to support the scientific community in designing and creating reproducible scientific workflows. --- title: | Transparent Toxicology: Towards improved reproducibility and data reusability link: http://www.biospectrumasia.com/biospectrum/opinion/224708/transparent-toxicology-towards-improved-reproducibility-reusability date: 2017-02-08 00:00:00 tags: [news article] description: | The concept of reproducibility is one of the foundations of scientific practice and the bedrock by which scientific validity can be established. However, the extent to which reproducibility is being achieved in the sciences is currently under question. Several studies have shown that much peer-reviewed scientific literature is not reproducible. One crucial contributor to the obstruction of reproducibility is the lack of transparency of original data and methods. Reproducibility, the ability of scientific results and conclusions to be replicated by independent parties, potentially using different tools and approaches, can only be achieved if data and methods are fully disclosed.
--- title: | Video: Singularity – Containers for Science, Reproducibility, and HPC link: http://insidehpc.com/2017/02/video-singularity-containers-science-reproducibility-hpc/ date: 2017-02-07 00:00:00 tags: [reproducibility infrastructure] description: | Explore how Singularity liberates non-privileged users and host resources (such as interconnects, resource managers, file systems, accelerators …) allowing users to take full control to set up and run in their native environments. This talk explores how Singularity combines software packaging models with minimalistic containers to create very lightweight application bundles which can be simply executed and contained completely within their environment or be used to interact directly with the host file systems at native speeds. A Singularity application bundle can be as simple as a single binary application or as complicated as an entire workflow and is as flexible as you will need. --- title: | Data Science Environments partners publish reproducibility book link: http://escience.washington.edu/new-reproducibility-book-published date: 2017-02-04 00:00:00 tags: [news article] description: | Researchers from the UW’s eScience Institute, New York University Center for Data Science and Berkeley Institute for Data Science (BIDS) have authored a new book titled The Practice of Reproducible Research. Representatives from the three universities, all Moore-Sloan Data Science Environments partners, joined on January 27, 2017, at a symposium hosted by BIDS. There, speakers discussed the book’s content, including case studies, lessons learned and the potential future of reproducible research practices.
--- title: | BIDS Apps: Improving ease of use, accessibility, and reproducibility of neuroimaging data analysis methods link: http://biorxiv.org/content/early/2017/01/29/079145 date: 2017-02-02 00:00:00 tags: [reproducible paper] description: | The rate of progress in human neurosciences is limited by the inability to easily apply a wide range of analysis methods to the plethora of different datasets acquired in labs around the world. In this work, we introduce a framework for creating, testing, versioning and archiving portable applications for analyzing neuroimaging data organized and described in compliance with the Brain Imaging Data Structure (BIDS). The portability of these applications (BIDS Apps) is achieved by using container technologies that encapsulate all binary and other dependencies in one convenient package. BIDS Apps run on all three major operating systems with no need for complex setup and configuration, and thanks to the richness of the BIDS standard, they require little manual user input. Previous containerized data processing solutions were limited to single-user environments and not compatible with most multi-tenant High Performance Computing systems. BIDS Apps overcome this limitation by taking advantage of the Singularity container technology. As a proof of concept, this work is accompanied by 22 ready-to-use BIDS Apps, packaging a diverse set of commonly used neuroimaging algorithms. --- title: | JoVE Builds on Ten Years of Making Science Clearer, More Reproducible link: http://www.prweb.com/releases/2017/01/prweb14012037.htm date: 2017-02-02 00:00:00 tags: [news article] description: | JoVE, the leading creator and publisher of video solutions that increase productivity in scientific research and education, today announced 2017 plans to mark the Company’s 10th anniversary. This year-long initiative will include the introduction of new Engineering and the Physical Sciences Collections within JoVE Science Education.
JoVE will launch ten major initiatives, including a new JoVE Unlimited pricing formula and an enhanced web experience, and will establish a number of grants to advance scientific research and education. --- title: | BOOK LAUNCH: The Practice of Reproducible Research link: https://events.berkeley.edu/?event_ID=106379&date=2017-01-27&tab=academic date: 2017-01-26 00:00:00 tags: [reproducibility report] description: | This symposium will serve as the launch event for our new open, online book, titled The Practice of Reproducible Research. The book contains a collection of 31 case studies in reproducible research practices written by scientists and engineers working in the data-intensive sciences. Each case study presents the specific approach that the author used to achieve reproducibility in a real-world research project, including a discussion of the overall project workflow, major challenges, and key tools and practices used to increase the reproducibility of the research. --- title: | Reproducibility in cancer biology: Making sense of replications link: https://elifesciences.org/content/6/e23383 date: 2017-01-24 00:00:00 tags: [replication study] description: | The first results from the Reproducibility Project: Cancer Biology suggest that there is scope for improving reproducibility in pre-clinical cancer research. --- title: | Reproducibility in cancer biology: Mixed outcomes for computational predictions link: https://elifesciences.org/content/6/e22661 date: 2017-01-24 00:00:00 tags: [replication study] description: | Experimental efforts to validate the output of a computational model that predicts new uses for existing drugs highlight the inherently complex nature of cancer biology.
--- title: | Cancer scientists are having trouble replicating groundbreaking research link: http://www.vox.com/science-and-health/2017/1/23/14324326/replication-science-is-hard date: 2017-01-23 00:00:00 tags: [popular news] description: | Take the latest findings from the large-scale Reproducibility Project: Cancer Biology. Here, researchers focused on reproducing experiments from the highest-impact papers about cancer biology published from 2010 to 2012. They shared their results in five papers in the journal eLife last week — and not one of their replications definitively confirmed the original results. The findings echoed those of another landmark reproducibility project, which, like the cancer biology project, came from the Center for Open Science. This time, the researchers replicated major psychology studies — and only 36 percent of them confirmed the original conclusions. --- title: | Why Should Scientific Results Be Reproducible? link: http://www.pbs.org/wgbh/nova/next/body/reproducibility-explainer/ date: 2017-01-19 00:00:00 tags: [popular news] description: | Since 2005, when Stanford University professor John Ioannidis published his paper “Why Most Published Research Findings Are False” in PLOS Medicine, reports have been mounting of studies that are false, misleading, and/or irreproducible. Two major pharmaceutical companies each took a sample of “landmark” cancer biology papers and were able to validate the findings of only 6% and 11%, respectively. A similar attempt to validate 70 potential drug targets for treating amyotrophic lateral sclerosis in mice came up with zero positive results. In psychology, an effort to replicate 100 peer-reviewed studies successfully reproduced the results for only 39. While most replication efforts have focused on biomedicine, health, and psychology, a recent survey of over 1,500 scientists from various fields suggests that the problem is widespread.
What originally began as a rumor among scientists has become a heated debate garnering national attention. The assertion that many published scientific studies cannot be reproduced has been covered in nearly every major newspaper, featured in TED talks, and discussed on televised late night talk shows. --- title: | Enabling Reproducibility for Small and Large Scale Research Data Sets link: http://www.dlib.org/dlib/january17/proell/01proell.html date: 2017-01-18 00:00:00 tags: [reproducible paper] description: | A large portion of scientific results is based on analysing and processing research data. In order for an eScience experiment to be reproducible, we need to be able to identify precisely the data set which was used in a study. Considering evolving data sources this can be a challenge, as studies often use subsets which have been extracted from a potentially large parent data set. Exporting and storing subsets in multiple versions does not scale to large numbers of data sets. To tackle this challenge, the RDA Working Group on Data Citation has developed a framework and provides a set of recommendations, which allow identifying precise subsets of evolving data sources based on versioned data and timestamped queries. In this work, we describe how this method can be applied in small scale research data scenarios and how it can be implemented in large scale data facilities having access to sophisticated data infrastructure. We describe how the RDA approach improves the reproducibility of eScience experiments and we provide an overview of existing pilots and use cases in small and large scale settings.
--- title: | Cancer reproducibility project releases first results link: http://www.nature.com/news/cancer-reproducibility-project-releases-first-results-1.21304 date: 2017-01-18 00:00:00 tags: [reproducible paper, news article] description: | The Reproducibility Project: Cancer Biology launched in 2013 as an ambitious effort to scrutinize key findings in 50 cancer papers published in Nature, Science, Cell and other high-impact journals. It aims to determine what fraction of influential cancer biology studies are probably sound — a pressing question for the field. In 2012, researchers at the biotechnology firm Amgen in Thousand Oaks, California, announced that they had failed to replicate 47 of 53 landmark cancer papers. That was widely reported, but Amgen has not identified the studies involved. --- title: | A Survey of Current Reproducibility Practices in Linguistics Publications link: https://scholarspace.manoa.hawaii.edu/bitstream/10125/43567/1/Poster_Gawne_Berez-Kroeker_Kelly_Heston.pdf date: 2017-01-18 00:00:00 tags: [reproducibility report] description: | This project considers the role of reproducibility in increasing verification and accountability in linguistic research. An analysis of over 370 journal articles, dissertations, and grammars from a ten-year span is taken as a sample of current practices in the field. These are critiqued on the basis of transparency of data source, data collection methods, analysis, and storage. While we find examples of transparent reporting, much of the surveyed research does not include key metadata, methodological information, or citations that are resolvable to the data on which the analyses are based. This has implications for reproducibility and hence accountability, hallmarks of social science research which are currently under-represented in linguistic research.
--- title: | Opening the Publication Process with Executable Research Compendia link: https://doi.org/10.1045/january2017-nuest date: 2017-01-16 00:00:00 tags: [reproducible paper] description: | A strong movement towards openness has seized science. Open data and methods, open source software, Open Access, open reviews, and open research platforms provide the legal and technical solutions to new forms of research and publishing. However, publishing reproducible research is still not common practice. Reasons include a lack of incentives and a missing standardized infrastructure for providing research material such as data sets and source code together with a scientific paper. Therefore we first study fundamentals and existing approaches. On that basis, our key contributions are the identification of core requirements of authors, readers, publishers, curators, as well as preservationists and the subsequent description of an executable research compendium (ERC). It is the main component of a publication process providing a new way to publish and access computational research. ERCs provide a new standardisable packaging mechanism which combines data, software, text, and a user interface description. We discuss the potential of ERCs and their challenges in the context of user requirements and the established publication processes. We conclude that ERCs provide a novel potential to find, explore, reuse, and archive computer-based research. --- title: | Supporting Data Reproducibility at NCI Using the Provenance Capture System link: http://www.dlib.org/dlib/january17/wang/01wang.html date: 2017-01-16 00:00:00 tags: [reproducible paper] description: | Scientific research is published in journals so that the research community is able to share knowledge and results, verify hypotheses, contribute evidence-based opinions and promote discussion. 
However, it is hard to fully understand, let alone reproduce, the results if the complex data manipulation that was undertaken to obtain the results is not clearly explained and/or the final data used is not available. Furthermore, the scale of research data assets has now exponentially increased to the point that even when available, it can be difficult to store and use these data assets. In this paper, we describe the solution we have implemented at the National Computational Infrastructure (NCI) whereby researchers can capture workflows, using a standards-based provenance representation. This provenance information, combined with access to the original dataset and other related information systems, allows datasets to be regenerated as needed, which simultaneously addresses both result reproducibility and storage issues. --- title: | A manifesto for reproducible science link: http://www.nature.com/articles/s41562-016-0021 date: 2017-01-10 00:00:00 tags: [reproducible paper, reproducibility report] description: | Improving the reliability and efficiency of scientific research will increase the credibility of the published scientific literature and accelerate discovery. Here we argue for the adoption of measures to optimize key elements of the scientific process: methods, reporting and dissemination, reproducibility, evaluation and incentives. There is some evidence from both simulations and empirical studies supporting the likely effectiveness of these measures, but their broad adoption by researchers, institutions, funders and journals will require iterative evaluation and improvement. We discuss the goals of these measures, and how they can be implemented, in the hope that this will facilitate action toward improving the transparency, reproducibility and efficiency of scientific research. --- title: | Scientific papers need better feedback systems.
Here's why link: http://www.wired.co.uk/article/science-academic-papers-review date: 2017-01-06 00:00:00 tags: [news article] description: | Somewhere between 65 and 90 per cent of biomedical literature is considered non-reproducible. This means that if you try to reproduce an experiment described in a given paper, 65 to 90 per cent of the time you won't get the same findings. We call this the reproducibility crisis. The issue became live thanks to a study by Glenn Begley, who ran the oncology department at Amgen, a pharmaceutical company. In 2011, Begley decided to try to reproduce findings in 53 foundational papers in oncology: highly cited papers published in the top journals. He was unable to reproduce 47 of them - 89 per cent. --- title: | Leveraging Statistical Methods to Improve Validity and Reproducibility of Research Findings link: http://jamanetwork.com/journals/jamapsychiatry/article-abstract/2594382 date: 2016-12-28 00:00:00 tags: [reproducibility guidelines] description: | Scientific discoveries have the profound opportunity to impact the lives of patients. They can lead to advances in medical decision making when the findings are correct, or mislead when not. We owe it to our peers, funding sources, and patients to take every precaution against false conclusions, and to communicate our discoveries with accuracy, precision, and clarity. With the National Institutes of Health’s new focus on rigor and reproducibility, scientists are returning attention to the ideas of validity and reliability. At JAMA Psychiatry, we seek to publish science that leverages the power of statistics and contributes discoveries that are reproducible and valid. Toward that end, I provide guidelines for using statistical methods: the essentials, good practices, and advanced methods. 
--- title: | Lack of reproducibility triggers retractions of Nature Materials articles link: http://retractionwatch.com/2016/12/28/lack-reproducibility-triggers-retractions-nature-materials-articles/ date: 2016-12-28 00:00:00 tags: [news article] description: | The authors of a highly cited 2015 paper in Nature Materials have retracted it, after being unable to reproduce some of the key findings. We’ve seen this kind of thing before, from another Nature journal, although in one case the News & Views article only earned a warning notice. --- title: | Transparency, Reproducibility, and the Credibility of Economics Research link: https://www.nber.org/papers/w22989 date: 2016-12-22 00:00:00 tags: [reproducibility report] description: | There is growing interest in enhancing research transparency and reproducibility in economics and other scientific fields. We survey existing work on these topics within economics, and discuss the evidence suggesting that publication bias, inability to replicate, and specification searching remain widespread in the discipline. We next discuss recent progress in this area, including through improved research design, study registration and pre-analysis plans, disclosure standards, and open sharing of data and materials, drawing on experiences in both economics and other social sciences. We discuss areas where consensus is emerging on new practices, as well as approaches that remain controversial, and speculate about the most effective ways to make economics research more credible in the future. --- title: | The State of Reproducibility: 16 Advances from 2016 link: http://www.jove.com/blog/2016/12/21/the-state-of-reproducibility-16-advances-from-2016 date: 2016-12-21 00:00:00 tags: [popular news] description: | 2016 saw a tremendous amount of discussion and development on the subject of scientific reproducibility. Were you able to keep up? If not, check out this list of 16 sources from 2016 to get you up to date for the new year! 
The reproducibility crisis in science refers to the difficulty scientists have faced in reproducing or replicating results from previously published scientific experiments. Although this crisis has existed in the scientific community for a very long time, it gained much more visibility in the past few years. The terms “reproducibility crisis” and “replicability crisis” were coined in the early 2010s due to the growing awareness of the problem. --- title: | Introduction to the special issue on recentering science: Replication, robustness, and reproducibility in psychophysiology link: http://onlinelibrary.wiley.com/doi/10.1111/psyp.12787/full date: 2016-12-20 00:00:00 tags: [reproducible journal] description: | In recent years, the psychological and behavioral sciences have increased efforts to strengthen methodological practices and publication standards, with the ultimate goal of enhancing the value and reproducibility of published reports. These issues are especially important in the multidisciplinary field of psychophysiology, which yields rich and complex data sets with a large number of observations. In addition, the technological tools and analysis methods available in the field of psychophysiology are continually evolving, widening the array of techniques and approaches available to researchers. This special issue presents articles detailing rigorous and systematic evaluations of tasks, measures, materials, analysis approaches, and statistical practices in a variety of subdisciplines of psychophysiology. These articles highlight challenges in conducting and interpreting psychophysiological research and provide data-driven, evidence-based recommendations for overcoming those challenges to produce robust, reproducible results in the field of psychophysiology.
--- title: | Ensuring Reproducibility in Computational Processes: Automating Data Identification/Citation and Process Documentation link: http://riuma.uma.es/xmlui/handle/10630/12605 date: 2016-12-19 00:00:00 tags: [reproducible paper] description: | In this talk I will review a few examples of reproducibility challenges in computational environments and discuss their potential effects. Based on discussions in a recent Dagstuhl seminar we will identify different types of reproducibility. Here, we will focus specifically on what we gain from them, rather than seeing them merely as means to an end. We subsequently will address two core challenges impacting reproducibility, namely (1) understanding and automatically capturing process context and provenance information, and (2) approaches allowing us to deal with dynamically evolving data sets relying on recommendations of the Research Data Alliance (RDA). The goal is to raise awareness of reproducibility challenges and show how these can be addressed with minimal impact on the researchers via research infrastructures offering appropriate services. --- title: | Enabling access to reproducible research link: http://www.ecs.soton.ac.uk/news/4972 date: 2016-12-19 00:00:00 tags: [news article] description: | A team of Web and Internet Science (WAIS) researchers, from Electronics and Computer Science at Southampton, has been working with statistical colleagues at the Centre for Multilevel Modelling, University of Bristol, to develop new software technology that allows UK students and young researchers to access reproducible statistical research.
--- title: | Research transparency depends on sharing computational tools, says John Ioannidis link: http://scopeblog.stanford.edu/2016/12/15/research-transparency-depends-on-sharing-computational-tools-says-john-ioannidis/ date: 2016-12-15 00:00:00 tags: [reproducible paper] description: | A team of scientists including Stanford’s John Ioannidis, MD, DSc, has proposed a set of principles to improve the transparency and reproducibility of computational methods used in all areas of research. The group’s summary of those principles, known as the Reproducibility Enhancement Principles, was published recently in a paper in Science. --- title: | Enhancing reproducibility for computational methods link: http://science.sciencemag.org/content/354/6317/1240.summary date: 2016-12-13 00:00:00 tags: [reproducible paper] description: | Over the past two decades, computational methods have radically changed the ability of researchers from all areas of scholarship to process and analyze data and to simulate complex systems. But with these advances come challenges that are contributing to broader concerns over irreproducibility in the scholarly literature, among them the lack of transparency in disclosure of computational methods. Current reporting methods are often uneven, incomplete, and still evolving. We present a novel set of Reproducibility Enhancement Principles (REP) targeting disclosure challenges involving computation. These recommendations, which build upon more general proposals from the Transparency and Openness Promotion (TOP) guidelines (1) and recommendations for field data (2), emerged from workshop discussions among funding agencies, publishers and journal editors, industry participants, and researchers representing a broad range of domains. Although some of these actions may be aspirational, we believe it is important to recognize and move toward ameliorating irreproducibility in computational research. 
--- title: | Weekend reads: A flawed paper makes it into Nature; is science in big trouble?; a reproducibility crisis history link: http://retractionwatch.com/2016/12/10/weekend-reads-flawed-paper-makes-nature-science-big-trouble-reproducibility-crisis-history/ date: 2016-12-11 00:00:00 tags: [popular news] description: | The week at Retraction Watch featured a refreshingly honest retraction, and a big win for PubPeer. Here’s what was happening elsewhere. --- title: | Could Critical Incident Reporting Fix Preclinical Research? link: http://www.the-scientist.com/?articles.view/articleNo/47707/title/Could-Critical-Incident-Reporting-Fix-Preclinical-Research-/ date: 2016-12-11 00:00:00 tags: [news article] description: | Scientists propose a modified critical incident reporting system to help combat the reproducibility crisis. When Dirnagl first considered that his lab might benefit from a formal incident reporting system, he was surprised to find that no such system existed for biomedical researchers. Other high-stakes fields, from clinical medicine to nuclear power research, have long had such systems in place, but for the preclinical space, "we had to create one, because there’s nothing like it," Dirnagl said. But once Dirnagl and colleagues introduced an anonymous, online system, people began submitting reports. At meetings, the team would discuss what had gone wrong and strategize how to fix it. After a short while, Dirnagl said, his team began voluntarily filing virtually all reports with their signatures on them. --- title: | The Researchers’ View of Scientific Rigor—Survey on the Conduct and Reporting of In Vivo Research link: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0165999 date: 2016-12-02 00:00:00 tags: [reproducible paper] description: | Reproducibility in animal research is alarmingly low, and a lack of scientific rigor has been proposed as a major cause.
Systematic reviews found low reporting rates of measures against risks of bias (e.g., randomization, blinding), and a correlation between low reporting rates and overstated treatment effects. Reporting rates of measures against bias are thus used as a proxy measure for scientific rigor, and reporting guidelines (e.g., ARRIVE) have become a major weapon in the fight against risks of bias in animal research. Surprisingly, animal scientists have never been asked about their use of measures against risks of bias and how they report these in publications. Whether poor reporting reflects poor use of such measures, and whether reporting guidelines may effectively reduce risks of bias has therefore remained elusive. To address these questions, we asked in vivo researchers about their use and reporting of measures against risks of bias and examined how self-reports relate to reporting rates obtained through systematic reviews. An online survey was sent out to all registered in vivo researchers in Switzerland (N = 1891) and was complemented by personal interviews with five representative in vivo researchers to facilitate interpretation of the survey results. Return rate was 28% (N = 530), of which 302 participants (16%) returned fully completed questionnaires that were used for further analysis. --- title: | ReproZip in the Journal of Open Source Software link: http://joss.theoj.org/papers/b578b171263c73f64dfb9d040ca80fe0 date: 2016-12-02 00:00:00 tags: [reproducible paper, ReproZip] description: | ReproZip (Rampin et al. 2014) is a tool aimed at simplifying the process of creating reproducible experiments. After finishing an experiment, writing a website, constructing a database, or creating an interactive environment, users can run ReproZip to create reproducible packages, archival snapshots, and an easy way for reviewers to validate their work. 
--- title: | Authorization of Animal Experiments Is Based on Confidence Rather than Evidence of Scientific Rigor link: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2000598 date: 2016-12-05 00:00:00 tags: [reproducible paper] description: | Accumulating evidence indicates high risk of bias in preclinical animal research, questioning the scientific validity and reproducibility of published research findings. Systematic reviews found low rates of reporting of measures against risks of bias in the published literature (e.g., randomization, blinding, sample size calculation) and a correlation between low reporting rates and inflated treatment effects. That most animal research undergoes peer review or ethical review would offer the possibility to detect risks of bias at an earlier stage, before the research has been conducted. --- title: | Reproducibility Crisis Timeline: Milestones in Tackling Research Reliability link: http://blogs.plos.org/absolutely-maybe/2016/12/05/reproducibility-crisis-timeline-milestones-in-tackling-research-reliability/ date: 2016-12-05 00:00:00 tags: [news article] description: | It’s not a new story, although "the reproducibility crisis" may seem to be. For life sciences, I think it started in the late 1950s. Problems caused in clinical research burst into the open in a very public way then. But before we get to that, what is "research reproducibility"? It’s a euphemism for unreliable research or research reporting. Steve Goodman and colleagues (2016) say 3 dimensions of science that affect reliability are at play: Methods reproducibility – enough detail available to enable a study to be repeated; Results reproducibility – the findings are replicated by others; Inferential reproducibility – similar conclusions are drawn about results, which brings statistics and interpretation squarely into the mix. There is a lot of history behind each of those. 
Here are some of the milestones in awareness and proposed solutions that stick out for me. --- title: | NIH-Wide Policy Doubles Down on Scientific Rigor and Reproducibility link: https://www.psychologicalscience.org/observer/nih-wide-policy-doubles-down-on-scientific-rigor-and-reproducibility date: 2016-12-02 00:00:00 tags: [news article] description: | The US National Institutes of Health (NIH) is now assessing all research grant submissions based on the rigor and transparency of the proposed research plans. Previously, efforts to strengthen scientific practices had been undertaken by individual institutes, beginning in 2011 with the National Institute on Aging, which partnered with APS and the NIH Office of Behavioral and Social Science Research to begin a conversation about improving reproducibility across science. These early efforts were noted and encouraged by Congress. Now, the entire agency has committed to this important goal: NIH's 2016–2020 strategic plan announces, "NIH will take the lead in promoting new approaches toward enhancing the rigor of experimental design, analysis, and reporting." --- title: | Reproducibility and Validity of a Food Frequency Questionnaire Designed to Assess Diet in Children Aged 4-5 Years link: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0167338 date: 2016-11-30 00:00:00 tags: [reproducible paper] description: | The food frequency questionnaire (FFQ) is the most efficient and cost-effective method to investigate the relationship between usual diet and disease in epidemiologic studies. Although FFQs have been validated in many adult populations worldwide, valid FFQs for preschool children remain scarce. The aim of this study was to evaluate the reproducibility and validity of a semi-quantitative FFQ designed for children aged 4 to 5 years.
--- title: | KDD 2017 Research Papers New Reproducibility Policy link: http://www.kdd.org/kdd2017/calls/view/kdd-2017-call-for-research-papers date: 2016-11-30 00:00:00 tags: [reproducible paper, reproducibility conference] description: | Reproducibility: Submitted papers will be assessed based on their novelty, technical quality, potential impact, insightfulness, depth, clarity, and reproducibility. Authors are strongly encouraged to make their code and data publicly available whenever possible. Algorithms and resources used in a paper should be described as completely as possible to allow reproducibility. This includes experimental methodology, empirical evaluations, and results. The reproducibility factor will play an important role in the assessment of each submission. --- title: | Replication in computing education research: researcher attitudes and experiences link: http://dl.acm.org/citation.cfm?id=2999554 date: 2016-11-29 00:00:00 tags: [reproducibility study] description: | Replicability is a core principle of the scientific method. However, several scientific disciplines have suffered crises in confidence caused, in large part, by attitudes toward replication. This work reports on the value the computing education research community associates with studies that aim to replicate, reproduce or repeat earlier research. The results were obtained from a survey of 73 computing education researchers. An analysis of the responses confirms that researchers in our field hold many of the same biases as those in other fields experiencing a crisis in replication. In particular, researchers agree that original works - novel works that report new phenomena - have more impact and are more prestigious. They also agree that originality is an important criterion for accepting a paper, making such work more likely to be published.
Furthermore, while the respondents agree that published work should be verifiable, they doubt this standard is widely met in the computing education field and are not eager to perform the work of verifying others' work themselves. --- title: | Reproducible research: Stripe’s approach to data science link: https://stripe.com/blog/reproducible-research date: 2016-11-29 00:00:00 tags: [case studies, reproducible paper] description: | When people talk about their data infrastructure, they tend to focus on the technologies: Hadoop, Scalding, Impala, and the like. However, we’ve found that just as important as the technologies themselves are the principles that guide their use. We’d like to share our experience with one such principle that we’ve found particularly useful: reproducibility. We’ll talk about our motivation for focusing on reproducibility, how we’re using Jupyter Notebooks as our core tool, and the workflow we’ve developed around Jupyter to operationalize our approach. --- title: | Reproducible Risk Assessment link: http://onlinelibrary.wiley.com/doi/10.1111/risa.12730/full date: 2016-11-20 00:00:00 tags: [reproducible journal] description: | Reproducible research is a concept that has emerged in data and computationally intensive sciences in which the code used to conduct all analyses, including generation of publication quality figures, is directly available, and preferably in an open-source manner. This perspective outlines the processes and attributes, and illustrates the execution of reproducible research via a simple exposure assessment of air pollutants in metropolitan Philadelphia. --- title: | Student teams take on synbio reproducibility problem link: http://blogs.plos.org/synbio/2016/11/18/student-teams-take-on-synbio-reproducibility-problem/ date: 2016-11-19 00:00:00 tags: [news article] description: | Well, over the last two years iGEM teams around the world have been working to find out just how reproducible fluorescent protein measurements are.
They distributed testing plasmids and compared results across labs, measurement instruments, genetic parts, and E. coli strains. It’s a thorough two-year study of interlab variability, and the results are out in PLOS ONE, “Reproducibility of Fluorescent Expression from Engineered Biological Constructs in E. coli”. --- title: | NIH Request for Information on Strategies for NIH Data Management, Sharing, and Citation link: http://osp.od.nih.gov/content/nih-request-information-strategies-nih-data-management-sharing-and-citation date: 2016-11-19 00:00:00 tags: [reproducibility guidelines] description: | This Request for Information (RFI) seeks public comments on data management and sharing strategies and priorities in order to consider: (1) how digital scientific data generated from NIH-funded research should be managed, and to the fullest extent possible, made publicly available; and, (2) how to set standards for citing shared data and software. Response to this RFI is voluntary. Responders are free to address any or all of the items in Sections I and II, delineated below, or any other relevant topics respondents recognize as important for NIH to consider. Respondents should not feel compelled to address all items. Instructions on how to respond to this RFI are provided in "Concluding Comments." --- title: | Linux Foundation Backs Reproducible Builds Effort for Secure Software link: http://www.eweek.com/security/linux-foundation-back-reproducible-builds-effort-for-secure-software.html date: 2016-11-15 00:00:00 tags: [reproducibility infrastructure] description: | In an effort to help open-source software developers build more secure software, the Linux Foundation is doubling down on its efforts to help the reproducible builds project. Among the most basic and often most difficult aspects of software development is making sure that the software end-users get is the same software that developers actually built.
"Reproducible builds are a set of software development practices that create a verifiable path from human readable source code to the binary code used by computers," the Reproducible Builds project explains. --- title: | From old York to New York: PASIG 2016 link: http://digital-archiving.blogspot.co.uk/2016/11/pasig-made-me-think.html date: 2016-11-04 00:00:00 tags: [reproducibility conference, ReproZip, reproducibility infrastructure] description: | One of the most valuable talks of the day for me was from Fernando Chirigati from New York University. He introduced us to a useful new tool called ReproZip. He made the point that the computational environment is as important as the data itself for the reproducibility of research data. This could include information about libraries used, environment variables and options. You cannot expect your depositors to find or document all of the dependencies (or your future users to install them). What ReproZip does is package up all the necessary dependencies along with the data itself. This package can then be archived and re-used in the future. ReproZip can also be used to unpack and re-use the data in the future. I can see a very real use case for this for researchers within our institution. --- title: | Reward, reproducibility and recognition in research – the case for going Open link: http://septentrio.uit.no/index.php/SCS/article/view/4036 date: 2016-11-05 00:00:00 tags: [open access] description: | The advent of the internet has meant that scholarly communication has changed immeasurably over the past two decades but in some ways it has hardly changed at all. The coin of the realm in research remains the publication of novel results in a high-impact journal – despite known issues with the Journal Impact Factor. This elusive goal has led to many problems in the research process: from hyperauthorship to high levels of retractions, reproducibility problems and 'cherry picking' of results.
The veracity of the academic record is increasingly being brought into question. An additional problem is that this static reward system binds us to the current publishing regime, preventing any real progress in terms of widespread open access or even the adoption of novel publishing opportunities. But there is a possible solution. Increased calls to open research up and provide a greater level of transparency have started to yield real, practical solutions. This talk will cover the problems we currently face and describe some of the innovations that might offer a way forward. --- title: | Scientific Data Science and the Case for Open Access link: https://arxiv.org/pdf/1611.00097.pdf date: 2016-11-05 00:00:00 tags: [data science, open access] description: | "Open access" has become a central theme of journal reform in academic publishing. In this article, I examine the consequences of an important technological loophole in which publishers can claim to be adhering to the principles of open access by releasing articles in proprietary or “locked” formats that cannot be processed by automated tools, whereby even simple copying and pasting of text is disabled. These restrictions will prevent the development of an important infrastructural element of a modern research enterprise, namely, scientific data science, or the use of data analytic techniques to conduct meta-analyses and investigations into the scientific corpus. I give a brief history of the open access movement, discuss novel journalistic practices, and give an overview of data-driven investigation of the scientific corpus. I argue that, particularly in an era where the veracity of many research studies has been called into question, scientific data science should be one of the key motivations for open access publishing.
The enormous benefits of unrestricted access to the research literature should prompt scholars from all disciplines to reject publishing models whereby articles are released in proprietary formats or are otherwise restricted from being processed by automated tools as part of a data science pipeline. --- title: | The research data reproducibility problem solicits a 21st century solution link: http://ojs.whioce.com/index.php/apm/article/viewFile/53/50 date: 2016-11-05 00:00:00 tags: [reproducibility report] description: | Reproducibility is a hallmark of scientific efforts. Estimates indicate that the lack of reproducibility of data ranges from 50% to 90% among published research reports. The inability to reproduce major findings of published data confounds new discoveries and, importantly, results in the wastage of limited resources in the futile effort to build on these published reports. This poses a challenge to the research community to change the way we approach reproducibility by developing new tools to help improve the reliability of the methods and materials we use in our trade. --- title: | Capturing the "Whole Tale" of Computational Research: Reproducibility in Computing Environments link: https://arxiv.org/pdf/1610.09958.pdf date: 2016-11-05 00:00:00 tags: [reproducibility infrastructure] description: | We present an overview of the recently funded "Merging Science and Cyberinfrastructure Pathways: The Whole Tale" project (NSF award #1541450). Our approach has two nested goals: 1) deliver an environment that enables researchers to create a complete narrative of the research process including exposure of the data-to-publication lifecycle, and 2) systematically and persistently link research publications to their associated digital scholarly objects such as the data, code, and workflows. To enable this, WholeTale will create an environment where researchers can collaborate on data, workspaces, and workflows and then publish them for future adoption or modification.
Published data and applications will be consumed either directly by users using the Whole Tale environment or can be integrated into existing or future domain Science Gateways. --- title: | A Reproducibility Reading List link: http://software-carpentry.org/blog/2016/11/reproducibility-reading-list.html date: 2016-11-02 00:00:00 tags: [reproducibility bibliography] description: | Prof. Lorena Barba has just posted a reading list for reproducible research that includes ten key papers to understand reproducibility. --- title: | Reinventing the Methods Journal: Increasing Reproducibility with Video Journals link: http://docs.lib.purdue.edu/atg/vol25/iss5/9/ date: 2016-11-01 00:00:00 tags: [reproducible journal] description: | The way science journals present research must be rehabilitated or risk becoming obsolete, causing foreseeable negative consequences to research funding and productivity. Researchers are dealing with ever-increasing complexities, and as techniques and solutions become more involved, so too does the task of describing them. Unfortunately, simply explaining a technique with text does not always paint a clear enough picture. Scientific publishing has followed essentially the same model since the original scientific journal was published in the mid-seventeenth century. Thanks to advances in technology, we have seen some minor improvements such as the addition of color printing and better dissemination and search functionality through online cataloging. But what has actually changed? In truth, not all that much. Articles are still published as text-heavy tomes with the occasional photograph or chart to demonstrate a point.
--- title: | A Framework for Scientific Workflow Reproducibility in the Cloud link: https://www.researchgate.net/profile/Rawaa_Qasha/publication/307905445_A_Framework_for_Scientific_Workflow_Reproducibility_in_the_Cloud/links/57ecf52c08ae92eb4d2689d0.pdf date: 2016-10-18 00:00:00 tags: [reproducibility infrastructure, ReproZip] description: | Workflow is a well-established means by which to capture scientific methods in an abstract graph of interrelated processing tasks. The reproducibility of scientific workflows is therefore fundamental to reproducible e-Science. However, the ability to record all the required details so as to make a workflow fully reproducible is a long-standing problem that is very difficult to solve. In this paper, we introduce an approach that integrates system description, source control, container management and automatic deployment techniques to facilitate workflow reproducibility. We have developed a framework that leverages this integration to support workflow execution, re-execution and reproducibility in the cloud and in a personal computing environment. We demonstrate the effectiveness of our approach by examining various aspects of repeatability and reproducibility on real scientific workflows. The framework allows workflow and task images to be captured automatically, which improves not only repeatability but also runtime performance. It also gives workflows portability across different cloud environments. Finally, the framework can also track changes in the development of tasks and workflows to protect them from unintentional failures.
--- title: | Reproducibility and research misconduct: time for radical reform link: http://onlinelibrary.wiley.com/doi/10.1111/imj.13206/full date: 2016-10-18 00:00:00 tags: [reproducible paper] description: | We know now that much health and medical research which is published in peer-reviewed journals is wrong,[1] and consequently much is unable to be replicated.[2-4] This is due in part to poor research practice, biases in publication, and simply a pressure to publish in order to ‘survive’. Cognitive biases that unreasonably wed us to our hypotheses and results are to blame.[5] Strongly embedded in our culture of health and medical research is the natural selection of poor science practice, driven by the dependence for survival on high rates of publication in academic life. It is a classic form of cultural evolution along Darwinian lines.[6, 7] Do not think that even publications in the most illustrious medical journals are immune from these problems: the COMPare project[8] reveals that more than 85% of large randomised controlled trials deviate seriously from the plan registered before the trial began. An average of more than five new outcome measures was secretly added to each publication, and a similar number of nominated outcomes were silently omitted. It is hardly far-fetched to propose that this drive to publish is contributing to the growth in the number of papers retracted from the literature for dubious conduct[9] along with the increasing number of cases of research misconduct.
--- title: | A University Symposium: Promoting Credibility, Reproducibility and Integrity in Research link: http://evpr.columbia.edu/content/PCRI date: 2016-10-17 00:00:00 tags: [reproducibility conference] description: | Columbia University and other New York City research institutions, including NYU, are hosting a one-day symposium on December 9, 2016 to showcase a robust discussion of reproducibility and research integrity among leading experts, high-profile journal editors, funders and researchers. This program will reveal the "inside story" of how issues are handled by institutions, journals and federal agencies and offer strategies for responding to challenges in these areas. The stimulating and provocative program is for researchers at all stages of their careers. --- title: | What is Replication Crisis? And what can be done to fix it? link: http://www.popsci.com/what-is-replication-crisis date: 2016-10-16 00:00:00 tags: [popular news] description: | Psychology has a replication problem. Since 2010, scientists conducting replications of hundreds of studies have discovered that a dismal amount of published results can be reproduced. This realization by psychologists has come to be known as the "replication crisis". For me, this story all started with ego-depletion, and the comics I had drawn about it in 2014. The idea is that your self-control is a resource that can be diminished with use. When you think about all the times you've been slowly worn down by temptation, it seems obvious. When I drew the comics, there had been new research pointing to blood sugar levels as the fount of self-control from which we all drew. It also made sense—people get cranky when they're hungry. We even made up a word for it. We call it being "hangry".
--- title: | Reproducibility and transparency in biomedical sciences link: http://onlinelibrary.wiley.com/doi/10.1111/odi.12588/full date: 2016-10-15 00:00:00 tags: [reproducible paper] description: | The biomedical research sciences are currently facing a challenge highlighted in several recent publications: concerns about the rigor and reproducibility of studies published in the scientific literature. Research progress is strongly dependent on published work. Basic science researchers build on their own prior work and the published findings of other researchers. This work becomes the foundation for preclinical and clinical research aimed at developing innovative new diagnostic tools and disease therapies. At each of the stages of research, scientific rigor and reproducibility are critical, and the financial and ethical stakes rise as drug development research moves through these stages. --- title: | Introduction: The Challenge of Reproducibility link: http://www.annualreviews.org/doi/full/10.1146/annurev-cb-32-100316-100001 date: 2016-10-15 00:00:00 tags: [reproducibility report] description: | Science progresses by an iterative process whereby discoveries build upon a foundation of established facts and principles. The integrity of the advancement of knowledge depends crucially on the reliability and reproducibility of our published results. Although mistakes and falsification of results have always been an unfortunate part of the process, most viewed scientific research as self-correcting; the incorrect results and conclusions would inevitably be challenged and replaced with more reliable information. But what happens if the process is corrupted by systematic errors brought about by the misapplication of statistics, the use of unreliable reagents and inappropriate cell models, and the pressure to publish in the most selective venues?
We may be facing this scenario now in areas of biomedical science in which claims have been made that a majority of the most important work in, for example, cancer biology is not reproducible in the hands of drug companies that would seek to rely on the biomedical literature for opportunities in drug discovery. --- title: | A Year of Reproducibility Initiatives: The Replication Revolution Forges Ahead link: http://www.psychologicalscience.org/publications/observer/2014/july-august-14/a-year-of-reproducibility-initiatives-the-replication-revolution-forges-ahead.html date: 2016-10-14 00:00:00 tags: [news article] description: | Adhering faithfully to the scientific method is at the very heart of psychological inquiry. It requires scientists to be passionately dispassionate, to be intensely interested in scientific questions but not wedded to the answers. It asks that scientists not personally identify with their past work or theories — even those that bear their names — so that science as a whole can inch ever closer to illuminating elusive truths. That compliance isn’t so easy. But those who champion the so-called replication revolution in psychological science believe that it is possible — with the right structural reforms and personal incentives. --- title: | The hard road to reproducibility link: http://science.sciencemag.org/content/354/6308/142 date: 2016-10-07 00:00:00 tags: [popular news] description: | Early in my Ph.D. studies, my supervisor assigned me the task of running computer code written by a previous student who was graduated and gone. It was hell. I had to sort through many different versions of the code, saved in folders with a mysterious numbering scheme. There was no documentation and scarcely an explanatory comment in the code itself. It took me at least a year to run the code reliably, and more to get results that reproduced those in my predecessor's thesis. Now that I run my own lab, I make sure that my students don't have to go through that. 
--- title: | Scientific Misconduct: The Elephant in the Lab. A Response to Parker et al. link: http://www.cell.com/trends/ecology-evolution/abstract/S0169-5347(16)30159-8 date: 2016-10-07 00:00:00 tags: [reproducible paper] description: | In a recent Opinion article, Parker et al. [1] highlight a range of important issues and provide tangible solutions to improve transparency in ecology and evolution (E&E). We agree wholeheartedly with their points and encourage the E&E community to heed their advice. However, a key issue remains conspicuously unaddressed: Parker et al. assume that ‘deliberate dishonesty’ is rare in E&E, yet evidence suggests that occurrences of scientific misconduct (i.e., data fabrication, falsification, and/or plagiarism) are disturbingly common in the life sciences [2]. --- title: | Most computational hydrology is not reproducible, so is it really science? link: http://onlinelibrary.wiley.com/doi/10.1002/2016WR019285/full date: 2016-10-07 00:00:00 tags: [reproducible paper] description: | Reproducibility is a foundational principle in scientific research. Yet in computational hydrology, the code and data that actually produce published results are not regularly made available, inhibiting the ability of the community to reproduce and verify previous findings. In order to overcome this problem, we recommend that re-useable code and formal workflows, which unambiguously reproduce published scientific results, are made available for the community alongside data, so that we can verify previous findings and build directly from previous work. In cases where reproducing large-scale hydrologic studies is computationally very expensive and time-consuming, new processes are required to ensure scientific rigour.
Such changes will strongly improve the transparency of hydrological research, and thus provide a more credible foundation for scientific advancement and policy support. --- title: | Reproducibility and replicability of rodent phenotyping in preclinical studies link: http://biorxiv.org/content/early/2016/10/05/079350 date: 2016-10-06 00:00:00 tags: [reproducible paper] description: | The scientific community is increasingly concerned with cases of published "discoveries" that are not replicated in further studies. The field of mouse phenotyping was one of the first to raise this concern, and to relate it to other complicated methodological issues: the complex interaction between genotype and environment; the definitions of behavioral constructs; and the use of the mouse as a model animal for human health and disease mechanisms. In January 2015, researchers from various disciplines including genetics, behavior genetics, neuroscience, ethology, statistics and bioinformatics gathered at Tel Aviv University to discuss these issues. The general consensus presented here was that the issue is prevalent and of concern, and should be addressed at the statistical, methodological and policy levels, but is not so severe as to call into question the validity and the usefulness of the field as a whole. Well-organized community efforts, coupled with improved data and metadata sharing, were agreed by all to have a key role to play in identifying specific problems, as well as in promoting effective solutions. As replicability is related to validity and may also affect the generalizability and translation of findings, the implications of the present discussion reach far beyond the issue of replicability of mouse phenotypes and may be highly relevant throughout biomedical research. --- title: | Repeat After Me: Why can't anyone replicate the scientific studies from those eye-grabbing headlines?
link: https://thenib.com/repeat-after-me?t=default date: 2016-10-06 00:00:00 tags: [popular news] description: | A comic illustrating the complexities and history of research reproducibility. --- title: | Incentivizing Reproducibility link: http://cacm.acm.org/magazines/2016/10/207757-incentivizing-reproducibility/fulltext date: 2016-10-06 00:00:00 tags: [reproducible journal] description: | A scientific result is not truly established until it is independently confirmed. This is one of the tenets of experimental science. Yet, we have seen a rash of recent headlines about experimental results that could not be reproduced. In the biomedical field, efforts to reproduce results of academic research by drug companies have had less than a 50% success rate, resulting in billions of dollars in wasted effort. In most cases the cause is not intentional fraud, but rather sloppy research protocols and faulty statistical analysis. Nevertheless, this has led to both a loss in public confidence in the scientific enterprise and some serious soul searching within certain fields. Publishers have begun to take the lead in insisting on more careful reporting and review, as well as facilitating government open science initiatives mandating sharing of research data and code. To support efforts of this type, the ACM Publications Board recently approved a new policy on Result and Artifact Review and Badging. This policy defines two badges ACM will use to highlight papers that have undergone independent verification. Results Replicated is applied when the paper's main results have been replicated using artifacts provided by the author, or Results Reproduced if done completely independently.
--- title: | Reproducibility: Seek out stronger science link: http://www.nature.com/nature/journal/v537/n7622/full/nj7622-703a.html date: 2016-10-05 00:00:00 tags: [news article] description: | When graduate student Alyssa Ward took a science-policy internship, she expected to learn about policy — not to unearth gaps in her biomedical training. She was compiling a bibliography about the reproducibility of experiments, and one of the papers, a meta-analysis, found that scientists routinely fail to explain how they choose the number of samples to use in a study. "My surprise was not about the omission — it was because I had no clue how, or when, to calculate sample size," Ward says. Nor had she ever been taught about major categories of experimental design, or the limitations of P values. (Although they can help to judge the strength of scientific evidence, P values do not — as many think — estimate the likelihood that a hypothesis is true.) --- title: | The Solution to Science's Replication Crisis link: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2835131 date: 2016-10-03 00:00:00 tags: [reproducible paper] description: | The solution to science's replication crisis is a new ecosystem in which scientists sell what they learn from their research. In each pairwise transaction, the information seller makes (loses) money if he turns out to be correct (incorrect). Responsibility for the determination of correctness is delegated, with appropriate incentives, to the information purchaser. Each transaction is brokered by a central exchange, which holds money from the anonymous information buyer and anonymous information seller in escrow, and which enforces a set of incentives facilitating the transfer of useful, bluntly honest information from the seller to the buyer. This new ecosystem, capitalist science, directly addresses socialist science's replication crisis by explicitly rewarding accuracy and penalizing inaccuracy. 
--- title: | Strategies to Increase Rigor and Reproducibility of Data in Manuscripts: Reply to Héroux link: http://jn.physiology.org/content/116/3/1538 date: 2016-10-01 00:00:00 tags: [reproducible paper] description: | A number of proactive steps are underway to improve the rigor and reproducibility of the data reported in the Journal of Neurophysiology. The American Physiological Society's Publications Committee is currently devising implementation plans for the following recommendations from editors of the Society's journals. --- title: | Reproducibility in wireless experimentation: need, challenges, and approaches link: http://dl.acm.org/citation.cfm?id=2984738 date: 2016-10-01 00:00:00 tags: [reproducible paper] description: | Wireless networks are the key enabling technology of the mobile revolution. However, experimental mobile and wireless research is still hindered by the lack of a solid framework to adequately evaluate the performance of a wide variety of techniques and protocols proposed by the community. In this talk, I will motivate the need for experimental reproducibility as a necessary aspect of healthy progress, as accepted in other communities. I will illustrate how other research communities went through similar processes. I will then present the unique challenges of mobile and wireless experimentation, and discuss approaches, past, current, and future, to address these challenges. Finally, I will discuss how reproducibility extends to mobile and wireless security research. --- title: | Validate your antibodies to improve reproducibility? Easier said than done link: http://www.sciencemag.org/news/2016/09/validate-your-antibodies-improve-reproducibility-easier-said-done date: 2016-09-29 00:00:00 tags: [news article] description: | It seems like the most elementary of research principles: Make sure the cells and reagents in your experiment are what they claim to be and behave as expected.
But when it comes to antibodies—the immune proteins used in all kinds of experiments to tag a molecule of interest in a sample—that validation process is not straightforward. Research antibodies from commercial vendors are often screened and optimized for narrow experimental conditions, which means they may not work as advertised for many scientists. Indeed, problems with antibodies are thought to have led many drug developers astray and generated a host of misleading or irreproducible scientific results. --- title: | An International Inter-Laboratory Digital PCR Study Demonstrates High Reproducibility for the Measurement of a Rare Sequence Variant link: http://biorxiv.org/content/early/2016/09/28/077917 date: 2016-09-29 00:00:00 tags: [reproducible paper] description: | This study tested the claim that digital PCR (dPCR) can offer highly reproducible quantitative measurements in disparate labs. Twenty-one laboratories measured four blinded samples containing different quantities of a KRAS fragment encoding G12D, an important genetic marker for guiding therapy of certain cancers. This marker is challenging to quantify reproducibly using qPCR or NGS due to the presence of competing wild type sequences and the need for calibration. Using dPCR, eighteen laboratories were able to quantify the G12D marker within 12% of each other in all samples. Three laboratories appeared to measure consistently outlying results; however, proper application of a follow-up analysis recommendation rectified their data. Our findings show that dPCR has demonstrable reproducibility across a large number of laboratories without calibration and could enable the reproducible application of molecular stratification to guide therapy, and potentially for molecular diagnostics.
--- title: | Reproducibility of Search Strategies Is Poor in Systematic Reviews Published in High-Impact Pediatrics, Cardiology and Surgery Journals: A Cross-Sectional Study link: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0163309 date: 2016-09-27 00:00:00 tags: [reproducibility study] description: | A high-quality search strategy is considered an essential component of systematic reviews, but many do not contain reproducible search strategies. It is unclear if low reproducibility spans medical disciplines, is affected by librarian/search specialist involvement or has improved with increased awareness of reporting guidelines. --- title: | A Reproducibility Study of Information Retrieval Models link: http://dl.acm.org/citation.cfm?id=2970415 date: 2016-09-27 00:00:00 tags: [reproducibility study] description: | Developing effective information retrieval models has been a long-standing challenge in Information Retrieval (IR), and significant progress has been made over the years. With the increasing number of developed retrieval functions and the release of new data collections, it becomes more difficult, if not impossible, to compare a new retrieval function with all existing retrieval functions over all available data collections. To tackle this problem, this paper describes our efforts on constructing a platform that aims to improve the reproducibility of IR research and facilitate the evaluation and comparison of retrieval functions. --- title: | Reproducibility: Harness passion of private fossil owners link: http://www.nature.com/nature/journal/v537/n7620/full/537307a.html date: 2016-09-20 00:00:00 tags: [news article] description: | Reproducing palaeontological results depends on unrestricted access to fossils described in the literature, allowing others to re-examine or reinterpret them. Museums have policies and protocols for keeping materials in the public trust, but accessibility to privately owned fossil collections can be a problem.
--- title: | What do we mean by "reproducibility"? link: http://www.stats.org/what-do-we-mean-by-reproducibility/ date: 2016-09-19 00:00:00 tags: [news article] description: | There’s been a lot of discussion across many scientific fields about the "reproducibility crisis" in the past few years. Hundreds of psychologists attempted to redo 100 studies as part of the Reproducibility Project in Psychology, and claimed that fewer than half of the replication attempts succeeded. In Biomedicine, a study from the biotech firm Amgen tried to re-create results of 53 "landmark" preclinical cancer studies, and only got the same results for six of them. Amid a growing concern about research reliability, funders including the National Institutes of Health (NIH) have called for a greater effort to make research reproducible through transparent reporting of the methods researchers use to conduct their investigations. --- title: | Research Antibody Reproducibility link: http://www.genengnews.com/gen-articles/research-antibody-reproducibility/5833/ date: 2016-09-15 00:00:00 tags: [news article] description: | The ongoing dialogue has included the role of improperly validated research reagents, such as antibodies, with blame falling at the feet of reagent vendors, researchers, and journals. This article will highlight how the lack of consistent research on antibody validation has contributed to the reproducibility crisis and the role of vendors from Cell Signaling Technology’s (CST) perspective in making research more robust and reproducible. --- title: | Reproducibility: Respect your cells! link: http://www.nature.com/nature/journal/v537/n7620/full/537433a.html date: 2016-09-14 00:00:00 tags: [news article] description: | Numerous variables can torpedo attempts to replicate cell experiments, from the batch of serum to the shape of growth plates. But there are ways to ensure reliability. 
--- title: | Never Waste a Good Crisis: Confronting Reproducibility in Translational Research link: https://doi.org/10.1016/j.cmet.2016.08.006 date: 2016-09-14 00:00:00 tags: [news article] description: | The lack of reproducibility of preclinical experimentation has implications for sustaining trust in and ensuring the viability and funding of the academic research enterprise. Here I identify problematic behaviors and practices and suggest solutions to enhance reproducibility in translational research. --- title: | Why scientists must share their research code link: http://www.nature.com/news/why-scientists-must-share-their-research-code-1.20504 date: 2016-09-13 00:00:00 tags: [news article] description: | Many scientists worry over the reproducibility of wet-lab experiments, but data scientist Victoria Stodden's focus is on how to validate computational research: analyses that can involve thousands of lines of code and complex data sets. Beginning this month, Stodden — who works at the University of Illinois at Urbana-Champaign — becomes one of three ‘reproducibility editors’ appointed to look over code and data sets submitted by authors to the Applications and Case Studies (ACS) section of the Journal of the American Statistical Association (JASA). Other journals including Nature have established guidelines for accommodating data requests after publication, but they rarely consider the availability of code and data during the review of a manuscript. JASA ACS will now insist that — with a few exceptions for privacy — authors submit this information as a condition of publication. --- title: | Reprowd: Crowdsourced Data Processing Made Reproducible link: http://arxiv.org/pdf/1609.00791.pdf date: 2016-09-10 00:00:00 tags: [reproducible paper, ReproZip] description: | Crowdsourcing is a multidisciplinary research area including disciplines like artificial intelligence, human-computer interaction, database, and social science. 
One of the main objectives of AAAI HCOMP conferences is to bring together researchers from different fields and provide them opportunities to exchange ideas and share new research results. To facilitate cooperation across disciplines, reproducibility is a crucial factor, but unfortunately it has not gotten enough attention in the HCOMP community. --- title: | Vive la Petite Différence! Exploiting Small Differences for Gender Attribution of Short Texts link: http://link.springer.com/chapter/10.1007/978-3-319-45510-5_7 date: 2016-09-08 00:00:00 tags: [reproducible paper] description: | This article describes a series of experiments on gender attribution of Polish texts. The research was conducted on the publicly available corpus called "He Said She Said", consisting of a large number of short texts from the Polish version of Common Crawl. As opposed to other experiments on gender attribution, this research takes on a task of classifying relatively short texts, authored by many different people. For the sake of this work, the original "He Said She Said" corpus was filtered in order to eliminate noise and apparent errors in the training data. In the next step, various machine learning algorithms were developed in order to achieve better classification accuracy. Interestingly, the results of the experiments presented in this paper are fully reproducible, as all the source codes were deposited in the open platform Gonito.net. Gonito.net allows for defining machine learning tasks to be tackled by multiple researchers and provides the researchers with easy access to each other’s results. 
--- title: | Conducting Reproducible Research with Umbrella: Tracking, Creating, and Preserving Execution Environments link: http://ccl.cse.nd.edu/research/papers/umbrella-escience-2016.pdf date: 2016-09-08 00:00:00 tags: [reproducibility infrastructure] description: | Publishing scientific results without the detailed execution environments describing how the results were collected makes it difficult or even impossible for the reader to reproduce the work. However, the configurations of the execution environments are too complex to be described easily by authors. To solve this problem, we propose a framework facilitating the conduct of reproducible research by tracking, creating, and preserving the comprehensive execution environments with Umbrella. The framework includes a lightweight, persistent and deployable execution environment specification, an execution engine which creates the specified execution environments, and an archiver which archives an execution environment into persistent storage services like Amazon S3 and Open Science Framework (OSF). The execution engine utilizes sandbox techniques like virtual machines (VMs), Linux containers and user-space tracers to create an execution environment, and allows common dependencies like base OS images to be shared by sandboxes for different applications. We evaluate our framework by utilizing it to reproduce three scientific applications from epidemiology, scene rendering, and high energy physics. We evaluate the time and space overhead of reproducing these applications, and the effectiveness of the chosen archive unit and mounting mechanism for allowing different applications to share dependencies. Our results show that these applications can be reproduced using different sandbox techniques successfully and efficiently, even though the overhead and performance vary slightly. 
--- title: | PRUNE: A Preserving Run Environment for Reproducible Scientific Computing link: http://ccl.cse.nd.edu/research/papers/prune-escience-2016.pdf date: 2016-09-08 00:00:00 tags: [reproducibility infrastructure] description: | Computing as a whole suffers from a crisis of reproducibility. Programs executed in one context are astonishingly hard to reproduce in another context, resulting in wasted effort by people and general distrust of results produced by computers. The root of the problem lies in the fact that every program has implicit dependencies on data and execution environment which are rarely understood by the end user. To address this problem, we present PRUNE, the Preserving Run Environment. In PRUNE, every task to be executed is wrapped in a functional interface and coupled with a strictly defined environment. The task is then executed by PRUNE rather than the user to ensure reproducibility. As a scientific workflow evolves in PRUNE, a growing but immutable tree of derived data is created. The provenance of every item in the system can be precisely described, facilitating sharing and modification between collaborating researchers, along with efficient management of limited storage space. We present the user interface and the initial prototype of PRUNE, and demonstrate its application in matching records and comparing surnames in U.S. Censuses. --- title: | Moving Towards Model Reproducibility and Reusability link: http://www.ncbi.nlm.nih.gov/pubmed/27576241 date: 2016-09-05 00:00:00 tags: [reproducibility study] description: | This commentary provides a brief history of the U.S. funding initiatives associated with promoting multiscale modeling of the physiome since 2003. An effort led in the United States is the Interagency Modeling and Analysis Group (IMAG) Multiscale Modeling Consortium (MSM). 
Though IMAG and the MSM have generated much interest in developing MSM models of the physiome, challenges associated with model and data sharing in biomedical, biological and behavioral systems still exist. Since 2013, the IEEE EMBS Technical Committee on Computational Biology and the Physiome (CBaP TC) has supported discussions on promoting model reproducibility through publication. This Special Issue on Model Sharing and Reproducibility is a realization of the CBaP TC discussions. Though open questions remain on how we can further facilitate model reproducibility, accessibility and reuse by the worldwide community for different biomedical domain applications, this special issue provides a unique demonstration of both the challenges and opportunities for publishing reproducible computational models. --- title: | Proposal for first validating antibody specificity strategies to publish in Nature Methods link: http://www.eurekalert.org/pub_releases/2016-09/gh-pff083016.php date: 2016-09-05 00:00:00 tags: [reproducibility study] description: | The International Working Group on Antibody Validation (IWGAV), an independent group of international scientists with diverse research interests in the field of protein biology, today announced the publication of initial strategies developed to address a critical unmet need for antibody specificity, functionality and reproducibility in the online issue of Nature Methods. The IWGAV is the first initiative of its size and scope to establish strategic recommendations for antibody validation for both antibody producers and users. Thermo Fisher Scientific, the world leader in serving science, provided financial support to the IWGAV in 2015 to spearhead the development of industry standards and help combat the common challenges associated with antibody specificity and reproducibility. 
--- title: | A Framework for Improving the Quality of Research in the Biological Sciences link: http://mbio.asm.org/content/7/4/e01256-16.full date: 2016-09-02 00:00:00 tags: [reproducibility study] description: | The American Academy of Microbiology convened a colloquium to discuss problems in the biological sciences, with emphasis on identifying mechanisms to improve the quality of research. Participants from various disciplines made six recommendations: (i) design rigorous and comprehensive evaluation criteria to recognize and reward high-quality scientific research; (ii) require universal training in good scientific practices, appropriate statistical usage, and responsible research practices for scientists at all levels, with training content regularly updated and presented by qualified scientists; (iii) establish open data at the time of publication as the standard operating procedure throughout the scientific enterprise; (iv) encourage scientific journals to publish negative data that meet methodologic standards of quality; (v) agree upon common criteria among scientific journals for retraction of published papers, to provide consistency and transparency; and (vi) strengthen research integrity oversight and training. These recommendations constitute an actionable framework that, in combination, could improve the quality of biological research. --- title: | Reproducibility and Variation of Diffusion Measures in the Squirrel Monkey Brain, In Vivo and Ex Vivo link: http://www.mrijournal.com/article/S0730-725X(16)30120-5/abstract