# Environment reproducibility
  
Software or environment reproducibility is sometimes seen as a trade-off against developer friendliness:
  
  
```mermaid
flowchart LR
    R[More reproducibility] <--> M[Middle ground];
    M <--> D[More developer friendliness];
```
  
This is not always the case. In many cases a lack of reproducibility makes development harder:
  
1. What if you mess up your development environment? Can you recreate it?
2. What if you need to move your training code to another system? Does the move take a huge amount of effort, or can you do it quickly?
3. What if your colleagues want to help with the project? Can they recreate your environment?
4. What if a reviewer of your paper asks how your code can be run to reproduce the results?
  
Usually it is a good idea to keep reproducibility in mind throughout development and to tighten the reproducibility requirements the closer you get to releasing the results.
  
So in practice the flowchart looks more like this:
  
```mermaid
flowchart LR
    D[Start recording the environment while developing] --> C[Record changes to the environment while developing];
    C --> R[Lock down versions towards releases];
```
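
A low-effort way to start recording the environment while developing is to save a list of the installed packages and their versions. The sketch below uses the standard library module `importlib.metadata` to produce `name==version` lines, roughly what `pip freeze` prints. The function name `snapshot_environment` is hypothetical, and the sketch does not treat editable installs or packages installed from URLs or local paths specially.

```python
from importlib import metadata


def snapshot_environment() -> list[str]:
    """Return the installed distributions as sorted 'name==version' lines.

    A rough stand-in for `pip freeze`: editable installs and packages
    installed from URLs or local paths are not treated specially.
    """
    # A set, because the same distribution can be visible on several sys.path entries
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    # Redirect the output to a file, e.g. `python snapshot.py > requirements-dev.txt`
    print("\n".join(snapshot_environment()))
```

Such a snapshot usually contains far more packages than the project needs directly, so it is most useful as a development record; towards a release you would pin the packages the project actually depends on.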
  
In machine learning you will most likely be developing in Python, and the Python ecosystem offers multiple ways of recording an environment.
  
  ## Environment specifications
  
  ### requirements.txt
  
PyPI (the Python Package Index) hosts a huge number of Python packages, which you'll commonly install with `pip`. For PyPI packages you'll usually want to record the requirements in a [requirements.txt](https://pip.pypa.io/en/stable/reference/requirements-file-format/) file.
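
Exact pins (`name==version`) in a requirements file can also be checked against the active environment. The sketch below is a deliberately simplified parser: it only understands exact `==` pins, skips options such as `-r`, strips comments, extras and environment markers, and compares version strings literally instead of following the full requirements file format or PEP 440 version normalization (the `packaging` library handles that properly). `parse_pins` and `check_pins` are hypothetical helper names.

```python
from importlib import metadata


def parse_pins(text: str) -> dict[str, str]:
    """Collect exact pins ('name==version') from requirements.txt content."""
    pins = {}
    for raw in text.splitlines():
        # Drop comments and environment markers (everything after ';')
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or line.startswith("-") or "==" not in line:
            continue  # skip blank lines, options like -r, and non-exact specifiers
        name, version = line.split("==", 1)
        name = name.split("[", 1)[0].strip()  # drop extras: pkg[extra] -> pkg
        pins[name] = version.strip()
    return pins


def check_pins(pins: dict[str, str]) -> list[str]:
    """Describe every pin that does not match the installed version."""
    problems = []
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (wanted {wanted})")
            continue
        if installed != wanted:  # literal string comparison, no PEP 440 normalization
            problems.append(f"{name}: installed {installed}, wanted {wanted}")
    return problems


if __name__ == "__main__":
    example = "numpy==1.26.4  # exact pin\nscipy>=1.11\n"
    for problem in check_pins(parse_pins(example)):
        print(problem)
```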
  
  ### pyproject.toml
  
When developing your own Python package you'll usually want to use [pyproject.toml](https://packaging.python.org/en/latest/guides/writing-pyproject-toml/). It is a configuration file that specifies your package's name, version and requirements.
  
  ### environment.yml
  
`environment.yml` is a Conda environment specification. Conda is a package manager originally developed by Anaconda Inc., but nowadays people often install their Conda packages from `conda-forge`, a massive open source software repository that provides not only Python packages but also libraries and other programs.
  
  ## Other tools for creating reproducible environments
  
### Apptainer / Singularity

Apptainer (originally called Singularity) is a tool for creating and running containers on shared systems such as HPC clusters. On CSC and LUMI systems, a tool called Tykky is used to create these containers when you want to install Python environments.

  ### Docker
  
Docker is software for creating and launching containers. Nowadays many web applications run in containers. However, Docker usually requires root privileges to run, so it is not common on shared systems like HPC clusters.
  
  Both Apptainer and Docker are very useful when you want to create an exact replica of your environment.


## More information

- [Python for SciComp - dependencies](https://aaltoscicomp.github.io/python-for-scicomp/dependencies/)
- [Tykky](https://docs.csc.fi/computing/containers/tykky/)