#
<figure style="width: 100%">
  <img src="i/DL1_text.png" style="width: 100%"/>
  <figcaption>Credit: Discord software - Midjourney bot</figcaption>
</figure>

## The story behind our journey towards reproducibility

<figure style="float: right; width: 320px; height: 320px; margin: 10px">
  <img src="i/3computer.png"/>
  <figcaption>Credit: Discord software - Midjourney bot</figcaption>
</figure>

Have you ever faced the challenge of running your code on different computers, only to find that it doesn't work as expected? Or have you found yourself struggling to keep track of changes you made to your code, wishing you had a better way to collaborate with your team? Then what comes next might offer you some practical advice to help you tackle these challenges. The learning process behind our work _Making deep learning algorithms reproducible: the devil is in the details_ has given us valuable insights into how to (try to) ensure the reproducibility of scientific work. This notebook is the "story behind the work" component of a tripartite structure that includes two additional documents: (a) a [Jupyter notebook](http://c100-159.cloud.gwdg.de:9009/lab/tree/notebooks/main.ipynb?token=7cf55c2887d81e8ea8da627112d0753e4b4fc79345f121fc) that introduces the problem and the data and allows users to run all code chunks and visualise the results; and (b) detailed instructions ([readme file](https://github.com/dsvanidze/replicability#making-deep-learning-algorithms-reproducible-the-devil-is-in-the-details)) to reproduce the work.
 
This text provides links with practical coding examples and illustrations to prepare you for the challenges that you are likely to encounter on your own path towards reproducibility. Together, these three acts should offer you a broad picture of the thought process and the technical implementation required to make your work reproducible, even if you use very complex statistical methods or algorithms such as deep learning. The main protagonist, Davit, currently pursuing a Master's degree in Economics at the London School of Economics (LSE), will take us on a journey towards making his master's thesis reproducible. We hope that his story will offer not only useful insights for your work, but also a captivating and enlightening read.


## Reproducibility barriers and tips to tackle them

<figure style="float: left; width: 450px; height: 450px; margin: 20px">
  <img src="i/students.png"/>
  <figcaption>Credit: Discord software - Midjourney bot</figcaption>
</figure>

### More power, please!
The focus of my undergraduate thesis was to apply deep learning models to spatial data to better understand the initial spread of Covid-19 in China. Initially, I gathered all the data and started working on my own computer. After I built the algorithms to train on the data &mdash; adapting deep learning algorithms to spatial data was pretty challenging too, but that's another story &mdash; my first challenge to reproducibility was computational. I realised that training models on my local computer was taking far too long, so I needed a faster solution to be able to submit my thesis on time. I had little choice but to train the models on more powerful computers. Fortunately, I had access to the university server to train the algorithms. I still generated the results on my local computer, since producing maps and tables was not as demanding as training deep learning algorithms.

### Bloody paths
But I soon encountered another issue. The [paths](https://en.wikipedia.org/wiki/Path_(computing)) pointing to the location of the algorithms were hardcoded. As my code became longer, I lost track of the path names linked to the algorithms that were generating the results. This mistake, which would have been very easy to correct had it been pointed out earlier, led to incorrect results. Such an error, as minor as it may sound, is unacceptable in science, where results may have enormous implications, especially in areas such as public health where decisions can have an important impact on human lives. And wrong results do not necessarily appear wrong to an audience, especially if they do not diverge from the findings of the literature. This is where I realised that my code is a fundamental pillar of all my empirical work. How can someone trust my work if they are not able to verify it?
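
One common way to reduce this kind of path mistake is to build every location from a single project root and to fail loudly when a file is missing. The snippet below is only a minimal sketch of that idea: the folder name `models`, the `.pt` extension, the model name `lstm_baseline`, and the helper names `model_path` and `require_file` are hypothetical and are not taken from our project.

```python
from pathlib import Path


def model_path(project_root: Path, model_name: str) -> Path:
    """Build the location of a saved model relative to the project root.

    Hypothetical layout: <project root>/models/<model name>.pt
    """
    return project_root / "models" / f"{model_name}.pt"


def require_file(path: Path) -> Path:
    """Return the path unchanged, or fail loudly if the file is missing."""
    if not path.is_file():
        raise FileNotFoundError(f"Expected a file at {path}")
    return path


# In a script, the root can be derived from the script's own location
# (for example Path(__file__).resolve().parent) instead of being typed by hand.
# Here we use the current working directory so the snippet runs anywhere.
root = Path.cwd()
print(model_path(root, "lstm_baseline"))
```

Because every path goes through one function, moving the project to another computer only changes the root, and calling `require_file` before loading a model turns a silently wrong path into an immediate error.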

### Solving compatibility chaos with Docker
So what was the solution to these initial problems? One might think that it would be easy to copy the code and run it from one computer to another, but it turned out to be a real headache. The different operating systems on my local computer &mdash; OK I agree, not all of you run both Windows and Linux on your laptop, but this is pretty common for people working in computer science &mdash; and on the university server caused many compatibility issues and errors. The university server was running Ubuntu, a Linux distribution, which was not compatible with my macOS-based code editor. Moreover, it did not support the Python programming language and all the deep learning packages I needed for the project in the same way as my macOS computer did. Solving these issues one by one was very time-consuming.

As a remedy, I used Docker containers, which allowed me to create a virtual environment with all the necessary packages and dependencies installed. I could then run this environment on different hardware and use the computational power of that hardware, which also allows anyone to reproduce the project on their local computer or server. To get started with Docker, I first had to install it on my local computer. The installation process is straightforward and the Docker website provides step-by-step instructions for different operating systems. Here are the links in case you want to try it out: the [installation guide for macOS](https://docs.docker.com/docker-for-mac/install/) and for [Ubuntu](https://docs.docker.com/engine/install/ubuntu/). I found the Docker website to be very helpful, with a lot of resources and tutorials available. Once Docker was installed, it was very easy to create virtual environments for my project and work with my code, libraries, and packages without any compatibility issues. Not only did Docker containers save me a lot of time and effort, they also made it easier for others to reproduce my work.
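
To see why two machines behave differently, it can help to record what is actually installed on each of them. The following is a minimal sketch using only the Python standard library; the function name `describe_environment` and the package names in the example call are assumptions chosen for illustration, so replace them with the packages your own project uses.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Collect the interpreter, OS, and package versions of this machine."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # the package is not installed here
    return {
        "python": platform.python_version(),
        "os": platform.system(),
        "os_release": platform.release(),
        "packages": versions,
    }


# The package names below are only examples.
snapshot = describe_environment(["numpy", "pandas"])
print(json.dumps(snapshot, indent=2))
```

Running this on a laptop and on a server and comparing the two outputs shows the mismatches (Python version, missing or differing packages) that a container is meant to remove by fixing them once for everyone.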

### Why does nobody check your code?
But even with Docker, I still faced another challenge: making the verification of my code feasible. When you submit your work to your boss, or a paper to an academic journal, have you ever wondered whether the recipients will really check your code? What makes you believe that they truly want to do it and that they can do it? It is very likely that in most cases, those receiving your work will not check the code. And the reason might not be (only) the readers' ill will, lack of time, or excess of trust in you. Instead, it might be their inability to run your code efficiently (fast enough) and without having to debug it first. So people often rely on the assumption that the code is correct without any evidence that this is the case. Without being carefully checked, your code may hide important mistakes which may remain unnoticed, and this can have various consequences, more or less dramatic, depending on each individual case.

### Jupyter, King of the gods notebook
<figure style="float: right; width: 420px; height: 420px; margin: 10px">
  <img src="i/jupyter.png"/>
  <figcaption>Credit: Discord software - Midjourney bot</figcaption>
</figure>
How can we make the verification process feasible and as simple as possible? I wanted to make sure that my supervisors could appreciate the work I had done for my undergraduate thesis. I put a lot of effort into making my code readable, efficient, and free of bugs (or at least this is what I was hoping for). The only way for me to ensure this was to facilitate their work by reducing all possible barriers to reproducibility. I found that Jupyter Notebooks were a good choice to increase the chance that they would check my code. When a notebook is hosted online, this web application allows you to edit your code in your preferred browser and access it from anywhere without any installation required. As long as you have an internet connection you are fine. This text is written in a Jupyter notebook, which is indeed very convenient. You can find the Jupyter Notebook that showcases our work (including the code and results) via this [link](http://c100-159.cloud.gwdg.de:9009/doc/tree/notebooks/main.ipynb).



Jupyter Notebooks combine markdown text, code, and visualizations, so I could create a complete narrative of my work that was easy to understand and follow. Because you can organize the folder and all files, including all code, results, and visualizations, in one location, it is easy for your supervisor (or anyone interested in your work) to quickly understand the processes behind the work that has been produced. They can also handle large datasets. Another important aspect is accessibility. Jupyter Notebooks can be hosted on a cloud-based platform such as GitHub. If so, the work can be easily shared using a simple link. Readers can see exactly what I did, how I did it, and what my results were.

Another important note is that Jupyter Notebooks are currently free for everyone. This is a huge advantage because it means that anyone can start using them, regardless of their technical skills or budget. Whether you are a beginner or an experienced data scientist, Jupyter Notebooks can help you make your work accessible and reproducible. When I was a novice, Jupyter Notebooks allowed me to run various experiments on my code and iterate quickly. Because I could combine text, code, and visualizations, I was able to easily try different approaches to solving a problem and see the results almost in real time. I had the opportunity to test different ideas and get feedback from others without having to spend a lot of time writing and debugging code. I would like to invite you to try Jupyter Notebooks, especially if you want to collaborate with others. By sharing your notebook, you can get feedback and see changes from your collaborators on projects in real time.

### Reproducibility barriers are more common than you may think
The problems I faced, and that you are likely to have experienced too, are representative of a much more global issue that has caused a lot of concern in the scientific community. It is often referred to as the "reproducibility crisis" in science$^{1, 2}$. Researchers need to be able to reproduce and replicate findings to make sure that scientific progress is actually happening$^{3-5}$. An important study found that around one third of social science studies published in top journals like *Nature* and *Science* between 2010 and 2015 couldn't be reproduced$^4$. This does not necessarily mean that research that cannot be reproduced automatically leads to wrong results. However, one cannot be sure about the validity of its findings. This is indeed quite alarming!

It is important to reach a common understanding of what "replicability" and "reproducibility" mean in our context. Here, we use the definitions suggested by the US National Science Foundation (2015)$^6$. Reproducibility means the ability to duplicate the results using the same materials as the original investigator. This can be straightforward, but the use of complex algorithms makes the process trickier. Replicability, in contrast, means being able to duplicate the results of a study by following the same procedures but using new data. This shows the important role of reproducibility: to check whether some findings may be generalized to other datasets or contexts, the first step is to be at least able to reproduce the work using the same materials (data, code, etc.). So reproducibility, which is the focus of our work here, is extremely important and a fundamental pillar of science.
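
To make "the same results from the same materials" concrete, one simple check is to compute a fingerprint of the results and compare it across runs. The snippet below is only an illustrative sketch: `run_analysis` is a stand-in for a real analysis, the toy numbers are invented for the example, and hashing a JSON dump is just one possible way of comparing outputs.

```python
import hashlib
import json


def run_analysis(data):
    """Stand-in for a real analysis: the mean of each named column."""
    return {name: sum(values) / len(values) for name, values in data.items()}


def fingerprint(results):
    """Hash the results so that two runs can be compared at a glance."""
    encoded = json.dumps(results, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Toy data invented for this example.
data = {"cases": [12, 30, 45], "population": [1000, 2500, 4000]}

original = fingerprint(run_analysis(data))
rerun = fingerprint(run_analysis(data))
print(original == rerun)  # True: same data and same code give the same results
```

A reproducibility check asks whether `original` and `rerun` match when someone else runs the same code on the same data. A replicability check would instead apply `run_analysis` to new data and ask whether the conclusions still hold.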

### Shouldn't reproducibility be easy?
<figure style="float: left; width: 475px; height: 475px; margin: 20px">
  <img src="i/DL_bkg.png"/>
  <figcaption>Credit: Discord software - Midjourney bot</figcaption>
</figure>

You might think that getting identical results should be easy with a given dataset and clear methods. In principle, you are right. If the data are available and the procedure to reach the conclusions is clear and well detailed, there should not be any reason for others to fail to reproduce the results. Well, this can be true for studies using standard statistical procedures applied to datasets where all operations are carried out within a statistical software. But is it true in general?

You have certainly already observed in your daily life that even when you follow exactly the same procedure, some differences in the output may occur and you cannot be sure what causes them. This happens frequently if you bake bread from scratch. Tiny changes in the room temperature or in the quality of the flour may require you to adjust the quantity of the ingredients. Otherwise, your bread may look different and, in the worst case, may not rise at all.

Something similar happens when we follow a procedure (recipe) that describes the steps required to apply an artificial intelligence (AI) algorithm to some data. If you look at the four images (_left_), the faces look very similar. However, if you look carefully, you can see important differences. Look again, more carefully, and you will see that they are actually very different! These images are the outcome of deep learning algorithms that generate images using the exact same instruction from the user (a few keywords). Tiny amounts of randomness in the algorithms lead to variations in the generated images. Why should it matter to us? When similar algorithms are applied in a context where evidence-based decisions may have potential social, economic, or environmental repercussions (for example, computer simulations that guide policy makers in epidemiology), changes in the results can lead to wrong conclusions, and ill-advised decisions may have catastrophic consequences. Reproducibility is therefore crucial in some areas, and making it feasible requires thought and a careful implementation of various procedures. Below you will learn a few tips to ensure that your work can be reproduced even when dealing with very complex algorithms.


### Version control with Git and GitHub
This has certainly happened to you several times: you lost your work because you forgot to save it, or you erased some files, lost track of their previous versions, and could not find them anymore. Or perhaps worse, you lost an entire folder to a failed hard disk without having copies on a secure online storage platform. Keeping a history of changes and copies of your work is indeed very important. Also, when working in a team, you may need to collaborate with others without the risk of losing any work. This is where version control comes in.

[Git](https://git-scm.com/) is a version control system that allows you to keep track of changes to your code over time. [GitHub](https://github.com/) is a web-based platform that provides a central repository for storing and sharing code. With Git and GitHub, I was able to version my code, collaborate with others, and avoid the risk of losing any work. I really didn't want to work several years on my dissertation and take the risk of losing it.

Git and GitHub are also great for reproducibility. By sharing your code via these platforms, others can access your work, verify it, and reproduce your results without the risk of changing or, worse, destroying your work or part of it. It also makes it easy for others to build on your work if they want to further develop your research. You can also use Git and GitHub to share or promote your results across a wider community. The ability to easily store and share your code also makes it easier to keep track of the different versions of your code and to see how your work has evolved over time. You should definitely give it a try!

### Randomness is really everywhere, watch out!

One of the challenges with machine learning and deep learning algorithms is that their results can be influenced by the inherent randomness of the algorithm. This means that even if you run the same code multiple times, you may get different results. While this is not necessarily a bad thing &mdash; randomness helps to mitigate overfitting and to make predictions generalize &mdash; it also represents an additional barrier to reproducibility. If you cannot get the same results using the same data and methods, then you might have good reasons not to trust the findings. There are many elements of your analysis in which randomness may lead to different results. For example, in a classification or regression setting, you split your data into training and testing sets (the training set is used to estimate the model (hyper)parameters; the testing set is used to compute the performance of the model), and this split is usually implemented as a random selection of rows of your data. In principle, you might therefore end up with different datasets each time you split your data into training and testing sets.
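
The random split described above can be made repeatable by drawing the row indices from a random number generator that is started from a fixed number (a seed). The snippet below is a minimal sketch of that idea using NumPy; the function name `train_test_split_indices`, the 30% test share, and the seed values are assumptions chosen for illustration, not the split used in our project.

```python
import numpy as np


def train_test_split_indices(n_rows, test_fraction, seed):
    """Return (train_idx, test_idx) from a shuffled range of row indices."""
    rng = np.random.default_rng(seed)   # generator local to this function
    shuffled = rng.permutation(n_rows)  # random ordering of 0..n_rows-1
    n_test = int(round(n_rows * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train_a, test_a = train_test_split_indices(10, 0.3, seed=42)
train_b, test_b = train_test_split_indices(10, 0.3, seed=42)
train_c, test_c = train_test_split_indices(10, 0.3, seed=7)

print(np.array_equal(test_a, test_b))  # True: same seed, same split
print(len(train_a), len(test_a))       # 7 3
print(test_a, test_c)                  # a different seed generally gives a different split
```

With the seed written down next to the results, anyone can recover exactly the same training and testing rows.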

But this aspect of randomness in machine learning or deep learning algorithms is relatively well known. Randomness may hide in other parts of the code, however. Here we show an example of how randomness can affect the results. In the code below we first set the _seed value_ to 0 using `np.random.seed(0)`. The `random.seed()` function from the package `numpy` (abbreviated `np`) initialises the state of the random number generator, so that it produces an identical sequence of random numbers regardless of the machine you use, and for any number of executions. The argument of the function in parentheses (the `seed value`) determines the starting state of the generator. If no seed value is provided, the generator is typically initialised from an unpredictable source, such as entropy supplied by the operating system or the current system time. In the example below, we generate two random arrays `arr1` and `arr2` using `np.random.rand(3,2)`. Note that the values `3,2` indicate that we want random values for an array that has `3` rows and `2` columns.

In [None]:
import numpy as np

#Set the seed number e.g. to 0
np.random.seed(0)
# Generate random array
arr1 = np.random.rand(3,2)
print("Array 1:")
print(arr1)

#Set the seed number as before to get the same results
np.random.seed(0)
# Generate another random array
arr2 = np.random.rand(3,2)
print("\nArray 2:")
print(arr2)

Run the code above and have a look. If you run it multiple times, the values of `arr1` and `arr2` should remain identical. If this is not the case, check that the `seed value` is set to `0` in lines 4 and 11. These identical results are possible because we set the `seed value` to `0`, which ensures that the random number generator produces the same sequence of numbers each time the code is run. Now, let's look at what happens if we remove the line `np.random.seed(0)`:

In [9]:
# Generate random array
arr1 = np.random.rand(3,2)
print("Array 1:")
print(arr1)

# Generate another random array
arr2 = np.random.rand(3,2)
print("\nArray 2:")
print(arr2)

Array 1:
[[0.43758721 0.891773  ]
 [0.96366276 0.38344152]
 [0.79172504 0.52889492]]

Array 2:
[[0.56804456 0.92559664]
 [0.07103606 0.0871293 ]
 [0.0202184  0.83261985]]


Here, the values of `arr1` and `arr2` will be different each time we run the code because the `seed value` was not set, so the generator starts from a different state on each run. This short code demonstrates how the `seed value` affects the randomness in your code. Therefore, unless randomness is required, it is important to set the `seed value` to ensure reproducibility. I also find it helpful to document the `seed value` I used in my code so that I can easily reproduce my findings in the future. If you are currently working on some code that involves random number generators, it might be worth checking your code and making all necessary changes. In our work (see Code chunk 9 in the [Jupyter notebook](http://c100-159.cloud.gwdg.de:9009/lab/tree/notebooks/main.ipynb?token=7cf55c2887d81e8ea8da627112d0753e4b4fc79345f121fc)) we set the `seed value` in a general way, using a framework (config) so that our code always uses the same seed to train our algorithm.
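
To give an idea of what setting the seed "in a general way" can look like, here is a minimal sketch in which a single config object holds the seed and one function applies it everywhere. It is not the code from our notebook: the names `ExperimentConfig` and `set_global_seed` are hypothetical, and the sketch only covers Python's built-in generator and NumPy.

```python
import random
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical config object; the field name `seed` is our own choice."""
    seed: int = 0


def set_global_seed(config: ExperimentConfig) -> None:
    """Seed the random number generators used in this sketch."""
    random.seed(config.seed)     # Python's built-in generator
    np.random.seed(config.seed)  # NumPy's global generator
    # Deep learning frameworks keep their own generators and need to be
    # seeded separately through their own seeding functions.


config = ExperimentConfig(seed=0)

set_global_seed(config)
first = (random.random(), np.random.rand(3, 2))

set_global_seed(config)
second = (random.random(), np.random.rand(3, 2))

print(first[0] == second[0])                # True
print(np.array_equal(first[1], second[1]))  # True
```

Keeping the seed in one place means it is documented once, and changing it for a sensitivity check requires editing a single value rather than hunting through the code.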

# Conclusion
I hope you've enjoyed joining us in our quest for reproducibility. We have shown why reproducibility matters and what can be done to reach it. We have introduced a few important topics and tips that you are likely to encounter during your own path to make your work reproducible. In sum we have mentioned:

- The need for version control, using for example Git and GitHub, which allows you to keep track of changes in your code and collaborate with others efficiently
- Operating system compatibility issues, which can be solved by using Docker containers for a consistent computing environment
- The convenience of Jupyter Notebooks for code editing, which is particularly useful for data science and work using deep learning because of its ability to include text and code in the same document and make the work accessible to everyone with an internet connection
- The importance of setting the seed values in random number generators to ensure reproducibility
- The limits of these tips: some studies cannot be reproduced due to their inherent randomness, so what we have learned cannot be applied in all cases. However, when applicable, we believe that reproducibility can be achieved when we use the right tools.

To summarize all elements that are associated with the reproducibility of our study, we have implemented a system (illustrated in Figure 1) composed of four main components (A - D). We use (A) the version control system Git and its hosting service GitHub, which enable a team to share code with peers, efficiently track and synchronize code changes between local and server machines, and reset the project to a working state in case of breaking changes.

![Figure 1](../data/workflow/docker-workflow.jpg "Figure 1")

Now, let's take a moment to reflect on why reproducibility is so crucial in work using deep learning algorithms. In practice, these algorithms can be used to make decisions that affect people's lives. They may be used to make (or help make) medical diagnoses, financial predictions, and criminal justice assessments. In such cases, it is essential that the results are reliable and that the methods used to obtain them are verifiable. Otherwise, hidden errors could lead to wrong conclusions and have serious consequences. Reproducibility is also crucial in research. By making your work accessible and verifiable, you allow others to build on it, just as you can build on the efforts of others, and advances in your field become more efficient. It also helps mitigate the risk of duplicated work, which may save time and resources.

In conclusion, we would like to emphasize that reproducibility is crucial and must be upheld in all scientific endeavours, as long as it remains feasible. With the right tools and processes at your disposal, it is possible to ensure that your work is verifiable, which is a necessary (although not sufficient) step towards replicability. Whether you are working in a research environment or in an industry setting, these principles are essential for advancing the field and building trust between modelers and society.

# References

1. Roger D Peng. Reproducible research in computational science. _Science_, 334(6060): 1226–1227, 2011.
2. John P A Ioannidis, Sander Greenland, Mark A Hlatky, Muin J Khoury, Malcolm R Macleod, David Moher, Kenneth F Schulz, and Robert Tibshirani. Increasing value and reducing waste in research design, conduct, and analysis. _The Lancet_, 383(9912):166–175, 2014.
3. Open Science Collaboration. Estimating the reproducibility of psychological science. _Science_, 349(6251):aac4716, 2015.
4. Colin F Camerer, Anna Dreber, Felix Holzmeister, Teck-Hua Ho, Jürgen Huber, Magnus Johannesson, Michael Kirchler, Gideon Nave, Brian A Nosek, Thomas Pfeiffer, et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. _Nature Human Behaviour_, 2018.
5. Monya Baker. Reproducibility crisis. _Nature_, 2016.
6. Kenneth Bollen, JT Cacioppo, RM Kaplan, JA Krosnick, and JL Olds. Social, behavioral, and economic sciences perspectives on robust and reliable science: Report of the Subcommittee on Replicability in Science, Advisory Committee to the US National Science Foundation Directorate for Social, Behavioral, and Economic Sciences. US National Science Foundation, 2015.