# Reproducible Bioinformatics Research

![](../images/reproducibility.jpeg)

## What does research reproducibility mean?

According to the *U.S. National Science Foundation (NSF) subcommittee on replicability in science*:
>reproducibility refers to the ability of a researcher to duplicate the results of a prior study using the same materials as were used by the original investigator. That is, a second researcher might use the same raw data to build the same analysis files and implement the same statistical analysis in an attempt to yield the same results...Reproducibility is a minimum necessary condition for a finding to be believable and informative.
{cite:p}`goodmanWhatDoesResearch2016c`

**Why does this matter?** If you discovered something interesting in your data, it is critical that the methods you used can be repeated exactly as you performed them. If you and others cannot replicate and reproduce your results, then you cannot be sure that your results are correct. 

<!-- TODO: add definition of replicability? -->


## The reproducibility crisis 

As the rate of scientific publication has increased dramatically over the last 50 years, so have concerns that a growing share of scientific findings cannot be reproduced, a problem termed the "reproducibility crisis". In 2016, *Nature* published a survey of 1576 scientists, who were asked about their experiences with research reproducibility {cite:p}`baker500ScientistsLift2016`.

> More than 70% of researchers have tried and failed to reproduce another scientist's experiments, and more than half have failed to reproduce their own experiments (Figure 1).


| 1. The problem                     | 2. The contributing factors              |
|------------------------------------|------------------------------------------|
| ![](../images/failedreproduce.jpg) | ![](../images/irreproduciblefactors.jpg) |


When asked about the factors that contribute to irreproducible research, the scientists surveyed identified several that relate to data analysis, such as poor analysis and unavailable methods, code, or raw data, among others. As bioinformaticians, we can help address these issues by making our code and data available to others, and by using best practices in our work.

## Factors limiting bioinformatics reproducibility

1. Code availability 
2. Code versioning
3. Software and hardware dependencies
4. Analysis logic
5. Analysis execution
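
As one illustration of the third factor, software and hardware dependencies can be written down alongside the results of an analysis. The sketch below is a minimal example using only the Python standard library; the function name `describe_environment` and the package names passed to it are hypothetical placeholders, not part of any particular workflow.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Return a dictionary describing the software and hardware in use."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # "numpy" and "pandas" are placeholder package names for illustration.
    print(json.dumps(describe_environment(["numpy", "pandas"]), indent=2))
```

Saving this output next to the analysis results gives a later reader a starting point for rebuilding a comparable environment, although it does not by itself guarantee identical results.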


## Proposed bioinformatics reproducibility standards

{cite:t}`heilReproducibilityStandardsMachine2021` articulated some standards for reproducible bioinformatics research, which address key factors limiting reproducibility. These standards are summarized in the table below.


|                                                    | Bronze | Silver | Gold |
|----------------------------------------------------|--------|--------|------|
| Data published and downloadable                    |   x    |   x    |  x   |
| Models published and downloadable                  |   x    |   x    |  x   |
| Source code published and downloadable             |   x    |   x    |  x   |
| Dependencies set up in a single command            |        |   x    |  x   |
| Key analysis details recorded                      |        |   x    |  x   |
| Analysis components set to deterministic           |        |   x    |  x   |
| Entire analysis reproducible with a single command |        |        |  x   |
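
Two of the silver-level rows, recording key analysis details and making analysis components deterministic, can be illustrated with a small sketch. The example below assumes a made-up subsampling step; the function names, the read IDs, and the seed value are all hypothetical and chosen only for illustration.

```python
import json
import random


def subsample_reads(read_ids, n, seed):
    """Pick n read IDs at random; the same seed always gives the same picks."""
    rng = random.Random(seed)  # local generator, unaffected by global state
    return rng.sample(read_ids, n)


def run_analysis(seed=42, n=3):
    """Run the toy analysis and return its result plus a record of the run."""
    read_ids = [f"read_{i}" for i in range(10)]  # made-up input for illustration
    chosen = subsample_reads(read_ids, n, seed)
    # Record the details someone would need in order to repeat this run.
    record = {"seed": seed, "n_subsampled": n, "n_input": len(read_ids)}
    return chosen, record


if __name__ == "__main__":
    first, record = run_analysis()
    second, _ = run_analysis()
    print(first == second)  # both runs use the same seed, so the picks match
    print(json.dumps(record))
```

Passing the seed explicitly, rather than relying on a global random state, keeps the random step repeatable even when other code in the same session also draws random numbers.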


## Conda 

## Git/GitHub

## Jupyter notebooks

## Docker

## Snakemake

### Bibliography

```{bibliography}
```