<H1>Reproducible research: package managers and virtual environments</H1>

## Virtual environments are isolated environments for compatible software packages

Let's say you need to analyse a metagenomic dataset (DNA originating from different organisms). You want to start with taxonomic profiling, i.e. determining which organisms are in the dataset. You discover the tool [Metaxa](https://www.ncbi.nlm.nih.gov/pubmed/21674231), one of many tools for this task. You find the [download page](https://microbiology.se/software/metaxa2/) and the [manual](https://microbiology.se/publ/metaxa2_users_guide_2.2.pdf) with detailed installation instructions. The installation sounds complicated: it requires several steps, and the tool depends on a number of other programs, such as HMMER 3.1. However, you have HMMER 3.0 installed, which you need for other scripts and tools. If you install HMMER 3.1, these other tools may stop working. What should you do? This problem can be addressed using **virtual environments**.

>Applications will sometimes need a specific version of a library, because the application may require that a particular bug has been fixed or the application may be written using an obsolete version of the library’s interface.
>This means it may not be possible for one environment to meet the requirements of every application. If application A needs version 1.0 of a particular package, but application B needs version 2.0, then the requirements are in conflict and installing either version 1.0 or 2.0 will leave one application unable to run.
>The solution for this problem is to create a **virtual environment**, which is a self-contained directory tree that contains a number of **compatible packages**. Different applications can then use different virtual environments. (based on [Python docs](https://docs.python.org/3/tutorial/venv.html))

There are different tools for creating virtual environments.

- For Python packages, the official Python tutorial recommends the standard library module `venv` ([Python docs](https://docs.python.org/3/tutorial/venv.html), [realpython.com](https://realpython.com/python-virtual-environments-a-primer/)), which is related to the <a href="https://virtualenv.pypa.io/en/latest/">virtualenv</a> project ([Stackoverflow](https://stackoverflow.com/questions/41573587/what-is-the-difference-between-venv-pyvenv-pyenv-virtualenv-virtualenvwrappe)). One advantage of `venv` is that the module is part of the standard library, and is therefore included with all Python installations.
- However, we already know one tool that can create virtual environments: conda (<a href="https://en.wikipedia.org/wiki/Conda_(package_manager)">Wikipedia</a>). One advantage is that conda is **also** a package manager. Unlike the default Python package manager pip, it can install not only Python packages, but also other packages.
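As a minimal sketch of the first option, the standard library's `venv` module can also be called from Python itself. The directory name `demo-env` is a hypothetical placeholder, and `with_pip=False` is chosen here only so that the example runs quickly and offline.

```python
import tempfile
import venv
from pathlib import Path


def create_demo_env(parent_dir):
    """Create a virtual environment called 'demo-env' (hypothetical name) in parent_dir."""
    env_dir = Path(parent_dir) / "demo-env"
    # with_pip=False skips bootstrapping pip, so nothing is downloaded or installed
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_demo_env(tmp)
    # Every venv contains a pyvenv.cfg file that records the base interpreter
    has_cfg = (env_dir / "pyvenv.cfg").is_file()
    print(has_cfg)
```

On the command line, the corresponding call is `python -m venv demo-env`. The environment is just a directory tree, which is why it can be deleted and recreated without affecting the rest of the system.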

<p class="more">Note: If you work in a multi-user environment (e.g. on a shared server), you typically don't install software yourself, but you ask the system administrator to do it.</p>

### Package managers simplify software installation

There is another issue: the more software you install manually, the more time-consuming it becomes and the harder it is to keep track of. Unless necessary, it's preferable to use a **package manager** instead. Nowadays, they exist for all platforms ([Wikipedia](https://en.wikipedia.org/wiki/List_of_software_package_management_systems)).

However, the [repositories](https://packages.ubuntu.com/) of the **system package manager** ([Ubuntu](https://ubuntu.com/server/docs/package-management): `dpkg`/`apt`) probably don't provide the Metaxa package. Usually, system package managers provide only generic and widely used software. Luckily, there is another package manager, which is focused on scientific software and has the additional advantage of creating virtual environments: [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html). It is both a **package manager and an environment manager**, and it is **language-agnostic** (not only for Python).

---
<div class="more">🔭 [Additional information]</div>

### What is a software package?

A package (as in "package manager" and "package management system") is a **distribution** of software and data in archive files. Packages also contain metadata, such as the software's name, description of its purpose, version number, vendor, checksum, and a list of dependencies necessary for the software to run properly. ([Wikipedia](https://en.wikipedia.org/wiki/Package_manager))

**What is a Python package**? You learned that a Python package is a folder with Python modules and an additional `__init__.py` file ([Python tutorial](https://docs.python.org/3/tutorial/modules.html#packages)). However, when people talk about "installing a Python package", what they mean is a **distribution package**: An archive file that contains Python packages, modules, and other resource files that are used to distribute a software release. The archive file is what an end-user will download from the internet and install. A distribution package is more commonly referred to with the single word “package”. ([packaging.python.org](https://packaging.python.org/en/latest/glossary/#term-Distribution-Package))

>It’s important to note that the term “package” in this context is being used to describe a bundle of software to be installed (i.e. as a synonym for a distribution). It does not refer to the kind of package that you import in your Python source code (i.e. a container of modules). ([packaging.python.org](https://packaging.python.org/en/latest/tutorials/installing-packages/))

**How to use a Python package?** Typically, Python modules are either executed as a script (standalone program), or imported into other Python programs as a library. Therefore, after the installation of a Python package, you will either have one (or more) new executables that you can run from the command line, or you will be able to import new modules within Python, according to the documentation.
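To see the distribution packages that are installed for the running interpreter, the standard library module `importlib.metadata` (available from Python 3.8) can be queried. The following is a small sketch; the helper name `installed_distributions` is made up for this example, and the output depends entirely on your own installation.

```python
from importlib import metadata


def installed_distributions():
    """Return a sorted list of (name, version) for installed distribution packages."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with incomplete metadata
            found.add((name, version))
    return sorted(found)


# Show only the first few entries to keep the output short
for name, version in installed_distributions()[:5]:
    print(name, version)
```

The names listed here are distribution names (what you pass to an installer), which are not always identical to the module names you use in an `import` statement.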

**What is a conda package**? Conda was originally developed to solve package management challenges faced by data scientists. The conda package format is identical across platforms and operating systems. A conda package is a compressed tarball file (`.tar.bz2`) or `.conda` file that contains system libraries, Python or other modules, executable programs and other components, metadata and additional files like sample data ([docs.conda.io](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/packages.html)).

***

### Searching and installing conda packages

`conda` has several subcommands for different purposes. The command [`conda search`](https://docs.conda.io/projects/conda/en/latest/commands/search.html) searches for packages. It looks in the default channels, or channels from the `.condarc` configuration file ([conda docs](https://docs.conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html)).

The package `anaconda-client` provides the command `anaconda`, which searches across **all channels** and all platforms; this is often more useful. For example, if you know the name of a package, but not in which channel(s) (or by which users) it was published, you can search across all channels and platforms using `anaconda search packagename`. This is equivalent to searching the Anaconda repository via the browser ([anaconda.org](https://anaconda.org/anaconda/repo)). The result tells you which [conda channel](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html) provides the package. For Metaxa, it's the [bioconda channel](https://bioconda.github.io/).

<p class="more">Channels are the locations where the conda software packages are stored. Building the packages requires <a href="https://docs.conda.io/projects/conda-build/en/latest/concepts/recipe.html">package recipes</a>, which are text files that tell a special conda tool (<a href="https://docs.conda.io/projects/conda-build/en/latest/user-guide/tutorials/build-pkgs.html">conda-build</a>) how to build a package. For example, the recipes for bioconda packages are hosted in a <a href="https://github.com/bioconda/bioconda-recipes">GitHub repo</a>. The packages themselves (software) are hosted on the <a href="https://anaconda.org/anaconda/repo">Anaconda</a> webpage (<a href="https://docs.anaconda.com/anacondaorg/faq/#what-is-anaconda-inc">FAQ</a>: What is Anaconda, Inc.?).</p>

Here is how you can install Metaxa using `conda`:

1. To find all available metaxa packages for conda, run `anaconda search metaxa`, or use [Anaconda online search](https://anaconda.org/anaconda/repo).
1. As metaxa has several version-restricted dependencies, it may be better to [create a new environment](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) (e.g. call it "metaxa"): `conda create --name metaxa`. Activate it with `conda activate metaxa`.
1. The command `conda install metaxa` will then download/install the required packages. If the channel with the package is not default or in your `.condarc` file, you need to specify [this channel](https://docs.conda.io/projects/conda/en/latest/commands/install.html#Channel%20Customization) using the `-c` switch: `conda install -c bioconda metaxa`

The separate virtual environment allows you to switch quickly between the metaxa environment (with HMMER 3.1 and other compatible software) and your regular working environment.
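To check from within Python which environment is currently active, one option is to look at the environment variables that `conda activate` sets (`CONDA_DEFAULT_ENV` and `CONDA_PREFIX`) together with `sys.prefix`. This is only a sketch; the function name `describe_active_environment` is invented for illustration.

```python
import os
import sys


def describe_active_environment(environ=None):
    """Summarise which interpreter and conda environment are currently in use."""
    environ = os.environ if environ is None else environ
    return {
        # conda sets these two variables when an environment is activated
        "conda_env_name": environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": environ.get("CONDA_PREFIX"),
        # sys.prefix is the root directory of the running Python installation
        "python_prefix": sys.prefix,
        "python_executable": sys.executable,
    }


for key, value in describe_active_environment().items():
    print(f"{key}: {value}")
```

If no conda environment is active, the two conda entries are `None`. After `conda activate metaxa`, the name should be `metaxa` and the prefix should point to that environment's directory.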

<div class="exercise">TODO:
    <ol>
        <li>What command displays information about the current conda install?
        <li>What is a conda environment? (<a href="https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html">Hint</a>)
        <li>What happens when the environment is activated? (<a href="https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#activating-an-environment">Hint</a>)
        <li>Inspect the bioconda recipe of the metaxa package. Which additional software will be installed?
        <li>Create a <a href="https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html">conda environment</a> and activate it.
        <li><a href="https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-pkgs.html#installing-packages">Install</a> a conda package in the new environment.
        <li>Download the <a href="https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html">conda cheat sheet</a> and save it with your other cheat sheets.
        <li> Add the bioconda and conda-forge channels to your conda configuration according to the <a href="https://bioconda.github.io/#usage">bioconda docs</a>. Now you will be able to install packages from the bioconda channel without the <code>-c</code> switch. Inspect your <a href="https://docs.conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html">conda configuration</a>, which is written to the <code>~/.condarc</code> file.
    </ol>
</div>            

<div class="selftest">📝 Self-test:
    <ol class="selftest">
        <li>What is the purpose of virtual environments?
        <li>What is conda?
    </ol>
</div>

## The Python ecosystem

When following different internet tutorials for software installations, you can easily end up with a situation like in the cartoon below. You want to prevent this, if possible. Knowing the differences and sticking to a few best practices can help.

<div class="lecture-figure">
    <img src=https://imgs.xkcd.com/comics/python_environment.png width="350">
    <a class="img-caption" href="https://snarky.ca/deconstructing-xkcd-com-1987/">The Python ecosystem can be complicated (xkcd.com/1987)</a>
</div>

### conda vs. Anaconda

Differences between conda and Anaconda ([reddit](https://www.reddit.com/r/learnpython/comments/9e7ww8/what_are_some_positives_and_negatives_of_using/)): Conda is a command-line tool, which is both a package manager and an environment manager. Anaconda is a Python distribution that includes:

- a Python interpreter (the same thing that you can download from python.org),
- the conda package manager (popular package manager for scientific software, and the main selling point of Anaconda),
- many data science related 3rd party packages (that you could also install later via conda or otherwise).

Many people prefer the [miniconda](https://docs.conda.io/en/latest/miniconda.html) distribution, which includes only Python, conda and a few other essential packages like pip. You can perform all required installations later in custom conda environments.

<div class="lecture-figure">
    <img src=https://protostar.space/wp-content/uploads/2019/04/conda-root-and-additional-environments.jpeg width="600">
    <a class="img-caption" href="https://protostar.space/why-you-need-python-environments-and-how-to-manage-them-with-conda">Conda base environment and additional environments</a>
</div>

### conda vs. pip

Differences between conda and <a href="https://en.wikipedia.org/wiki/Pip_(package_manager)">pip</a> ([Stackoverflow](https://stackoverflow.com/questions/20994716/what-is-the-difference-between-pip-and-conda), [anaconda.com](https://www.anaconda.com/understanding-conda-and-pip/)):

Pip is Python's package manager, and is used to install **Python packages from PyPI**, the [Python package index](https://en.wikipedia.org/wiki/Python_Package_Index). Unlike conda, it installs only Python packages. Older versions of pip **lacked proper dependency resolution**, i.e. they didn't check if installing a new package would introduce conflicts (break things); since version 20.3, pip has a dependency resolver, but it still considers only Python packages and cannot manage non-Python dependencies such as system libraries. Pip itself is written in Python and thus requires that Python is already installed.

Conda can install **any kind of package**, not only Python packages. It provides **dependency resolution** across all of them, i.e. it makes sure that all packages installed via conda in one environment are compatible. Also, it can be used to create separate **virtual environments**, where different sets of packages are installed, and to switch easily between them. It is focused on scientific software, and for such software usually offers a larger and more up-to-date package selection than your system package manager.

Conda can install packages from different repositories (called [channels](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-channels.html)). Anaconda provides a number of "default" channels, but any user can create a channel, and there are, in fact, many community-driven channels. E.g., the [bioconda channel](https://bioconda.github.io/#usage) has a large collection of bioinformatics-related packages. ([Examples](https://stackoverflow.com/a/54150817/) of channel-configuration commands for your conda installation.)

It's recommended to install Python packages using conda whenever possible, and only use pip if the package is unavailable through conda, or if the conda version is too outdated ([conda.io](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-pkgs.html#installing-non-conda-packages)).

### pip vs. setup.py

Sometimes, Python packages are provided not via the PyPI repository, but directly as a `tar.gz` archive, containing a `setup.py` file ([Stackoverflow](https://stackoverflow.com/questions/1471994/what-is-setup-py)). How are these packages installed?

- Installation with: `pip install packagename.tar.gz` or `pip install .` ([Stackoverflow](https://stackoverflow.com/a/36014474/)); you don't even need to unpack the `.tar.gz` archive, `pip` does everything
- Do not use the "old" way of installing with `python setup.py install`; it's outdated ([Stackoverflow](https://stackoverflow.com/a/15731459/))
- It's highly recommended to perform pip installations **only in virtual environments**, like a conda environment
    - Preferably not in the base environment; in case the environment breaks, it can be simply deleted
    - Double-check that the conda environment-specific pip is used: `which pip`; should NOT be system-wide pip (`/usr/bin/pip`, or similar)
    - More information: [Stackoverflow](https://stackoverflow.com/questions/41060382/using-pip-to-install-packages-to-anaconda-environment), [Stackoverflow](https://stackoverflow.com/questions/35245401/combining-conda-environment-yml-with-pip-requirements-txt)
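The `which pip` check can also be sketched in Python by comparing the location of the `pip` found on `PATH` with the root directory of the running interpreter (`sys.prefix`). The function name below is hypothetical, and the comparison is deliberately simple.

```python
import shutil
import sys
from pathlib import Path


def pip_belongs_to_current_env(pip_path=None):
    """Return True if the given pip (default: first pip on PATH) lies inside sys.prefix."""
    pip_path = shutil.which("pip") if pip_path is None else pip_path
    if pip_path is None:
        return False  # no pip on PATH at all
    prefix = Path(sys.prefix).resolve()
    # Compare the pip script's location with the environment root directory
    return prefix in Path(pip_path).resolve().parents


print(pip_belongs_to_current_env())
```

A related habit that avoids the question altogether is to call pip as `python -m pip install packagename`, which always uses the pip that belongs to the interpreter named `python` in the active environment.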

<br>

### Bottom line

* Install software (packages) preferably via **package managers**
  * a package manager keeps track of everything it did, and can easily update/uninstall software later
  * use the system package manager for general-purpose packages (unless they are too outdated for your needs)
* Install packages that are not general-purpose preferably **in virtual environments**, via conda (or pip, or manually)
  * installing in virtual environments is safer than system-wide: if something goes wrong, you can delete and recreate the environment
  * you should NOT have to use `sudo` for installing custom packages (unless you know exactly why you need it)
  * conda is useful for all Python-related packages and, generally, all data science-related packages
* Virtual environments: **use conda** instead of virtualenv (conda made virtualenv pretty much [obsolete](https://stackoverflow.com/questions/34398676/does-conda-replace-the-need-for-virtualenv/34398795))
  * install conda via Anaconda or Miniconda distribution ([Stackoverflow](https://stackoverflow.com/a/45421527/))
  * don't modify the conda base environment, use custom environments instead (if something goes wrong, you can simply delete and recreate the custom environment; if the base environment breaks, you might need to re-install conda; luckily, uninstalling is [easy](https://docs.anaconda.com/anaconda/install/uninstall/))
* Python packages: use **conda over pip** whenever possible ([Stackoverflow](https://stackoverflow.com/questions/20994716/what-is-the-difference-between-pip-and-conda))
  * if a conda package is not available, install it via **pip** (do NOT use easy_install: [Stackoverflow](https://stackoverflow.com/questions/3220404/why-use-pip-over-easy-install))
  * do NOT perform system-wide installations with **pip**; use ONLY [environment-specific pip](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#using-pip-in-an-environment) ([Stackoverflow](https://stackoverflow.com/a/54715875/))
  * note that conda can't update or remove pip-installed packages, even though it can "see" them ([Stackoverflow](https://stackoverflow.com/a/33694864/))
* Do not simply follow every internet tutorial you find, or your system may become a mess, and hard to fix in case of problems
* Note: conda is actively developed and does have bugs. You should be prepared to use their issue tracker ([Wikipedia](https://en.wikipedia.org/wiki/Bug_tracking_system)): [conda issues](https://github.com/conda/conda/issues)

Further reading:

- [Conda: myths and misconceptions (2016)](https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/)

<div class="exercise">TODO:
    <ol>
        <li>Compare the <a href="https://biopython.org/">Biopython</a> versions in the <a href="https://packages.ubuntu.com/">Ubuntu repositories</a> (also see <a href="https://help.ubuntu.com/community/Repositories/Ubuntu">here</a>), <a href="https://docs.conda.io/projects/conda/en/latest/commands/search.html">conda channels</a> (or <a href="https://anaconda.org/anaconda/repo">here</a>) and the <a href="https://pypi.org/">PyPI repository</a>.
        <li>The <a href="https://biopython.org/wiki/Download">Biopython documentation</a> suggests installing Biopython via <code>pip</code>. Discuss if this is a good idea and which of <a href="https://biopython.org/wiki/Packages">these</a> options you prefer.
        <li>Which command updates all conda packages in the current environment? Does it also update the pip-installed packages in this environment?
        <li>Create two additional conda environments with Python 2 and Python 3. Activate each and check the Python version.
        <li>Modify the <code>.bashrc</code> file so that a conda environment is automatically activated in a new shell.
        <li>Activate a conda environment. Install a conda package of your choice, e.g. the <a href="https://en.wikipedia.org/wiki/Poppler_(software)">Poppler library</a> (<a href="https://anaconda.org/conda-forge/poppler">Anaconda link</a>). It contains the <code>pdftotext</code> tool, which we might use later.
        <li>Which <code>pip</code> is active within the environment? Install a Python (non-conda) package of your choice in this environment (e.g. <a href="https://pypi.org/project/pythonds/">pythonds</a>). Where is it located after installation?
        <li>One reply in <a href="https://www.reddit.com/r/Python/comments/w564g0/can_anyone_explain_the_differences_of_conda_vs_pip/">this reddit post</a> says: "The thing that a lot of people do not understand is that conda and pip are NOT mutually exclusive. In fact, you are supposed to use them together." Explain what this means.
        <li>Where is the conda executable located?
        <li>Where are conda environments located?
        <li>What is homebrew?
        <li>Is it possible to install software in a virtual environment without a package manager? (<a href="https://stackoverflow.com/questions/47799803/installing-from-source-within-anaconda-environment">hint</a>)
        <li>Where does conda store its configuration/information about environments?
    </ol>
</div>

<div class="selftest">📝 Self-test:
    <ol class="selftest">
        <li>What is the difference between Anaconda and conda?
        <li>What is the difference between conda and pip?
        <li>What is the difference between system-wide pip, and a conda environment-specific pip?
    </ol>
</div>

## Sharing with others: Distributing Python code

Think about what can happen if you write a useful Python script, maybe as part of a data analysis pipeline, and want to share it with others.

- You wrote it using features from Python 3.8, but other people will try to run it on Python 3.7, and the script won't work.
- You imported a library like `numpy`, but other people don't have it installed, and the script won't work.
- Or, maybe, they have `numpy-1.16.0`, while your script used `numpy-1.15.4`. One might expect that the newer `numpy` correctly handles code written using the older `numpy`, but sometimes it doesn't, and the script will break with confusing error messages.
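One defensive measure is to let a script check its own requirements before doing any work. The sketch below reports a too-old Python version and missing packages; the function name `check_requirements` and the minimum version are example choices. Because `importlib.metadata` itself requires Python 3.8, the version check is only meaningful for thresholds above that.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 8)  # example value taken from the scenario above


def check_requirements(required_packages, min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    for name in required_packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"package '{name}' is not installed")
        else:
            print(f"{name} {version} found")
    return problems


print(check_requirements(["numpy"]))
```

Such a check produces a clear message instead of a confusing traceback, but it does not replace a properly specified environment.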

Solving these kinds of problems is crucial for any software development and for reproducible computational projects. The central question is how to distribute the software (your code) to other people and avoid the "Well, it worked on my machine" problem. Here are some suggestions:

1. **Defined environment**. Many people use a dedicated conda environment for each new project ([haveagreatdata.com](https://haveagreatdata.com/posts/data-science-python-dependency-management/)). The project scripts are distributed together with the environment specifications, so that others can recreate an environment with all the packages needed to run the code. This is a simple but effective approach, and often used for scientific projects. (Problems can still arise if you relied on additional packages like system libraries that were installed otherwise, e.g. via `apt`.) The basic commands for this approach are:
    - `conda env export > environment.yml` → export environment specifications (also lists pip-installed packages in this environment) ([Conda docs](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#sharing-an-environment), [tdhopper.com](https://tdhopper.com/blog/my-python-environment-workflow-with-conda))
    - `conda env create -f environment.yml` → recreate environment from specifications file ([Conda docs](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file))
1. **Packaging**. If the software becomes more complex than a few simple scripts, this is what "packaging" and "deployment" is about ([stackexchange.com](https://softwareengineering.stackexchange.com/a/308342)). You can build a `pip`-installable [Python package](https://packaging.python.org/tutorials/packaging-projects/) based on `setup.py`, which is the build script for [setuptools](https://packaging.python.org/guides/distributing-packages-using-setuptools/). You can distribute the package via [PyPI](https://medium.com/@joel.barmettler/how-to-upload-your-python-package-to-pypi-65edc5fe9c56), or as a `tar.gz` archive, e.g. from a GitHub repository ([example1](https://www.freecodecamp.org/news/how-to-use-github-as-a-pypi-server-1c3b0d07db2/), [example2](https://dev.to/rf_schubert/how-to-create-a-pip-package-and-host-on-private-github-repo-58pa)). While building the package, you list all its dependencies, so you don't need to provide additional information to the user later. You can even provide your Python package as a conda package.
1. **Freezing** or **containers**. Other options include [PyInstaller](https://realpython.com/pyinstaller-python/) (this approach is called [freezing](https://docs.python-guide.org/shipping/freezing/)) and Docker (virtualization/containers).
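To make the first suggestion more concrete, the sketch below generates the text of a minimal, hand-written `environment.yml` with the three top-level keys `name`, `channels` and `dependencies`. The project name, channel list and package pins are hypothetical, and a file produced by `conda env export` is usually much more detailed.

```python
def render_environment_yml(name, channels, dependencies):
    """Render a minimal conda environment specification as YAML text."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


spec = render_environment_yml(
    name="my-project",                     # hypothetical project name
    channels=["conda-forge", "bioconda"],  # example channels
    dependencies=["python=3.8", "numpy"],  # example requirements
)
print(spec)
```

Saving this text as `environment.yml` next to the project scripts is what allows others to recreate the environment with `conda env create -f environment.yml`.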

Also see [Ten simple rules for developing usable software in computational biology: Rule 8](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005265#sec009). Example:

- [Wang and Ma'ayan, 2016](https://dx.doi.org/10.12688%2Ff1000research.9110.1). An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study: [github.com](https://github.com/MaayanLab/Zika-RNAseq-Pipeline)

<div class="exercise">TODO:
    <ol>
        <li><a href="https://www.reddit.com/r/Python/comments/tx73j8/why_and_how_to_use_conda/">This reddit post</a> asks "Why use conda? In which situation is it the best option?" What is your reply?
        <li><a href="https://stackoverflow.com/questions/60755957/install-r-packages-using-conda-via-an-environment-yml-file/">This Stackoverflow post</a> asks if and how R packages (usually installed from CRAN) should be included in a conda <code>environment.yml</code> file. What are your ideas about this?
        <li>How would you deal with R packages that are not available via conda but required for your project?
    </ol>
</div>

### Python packaging with pip and conda

This is an advanced topic, and probably not (yet) very important for you. If you are interested in more details, the links below may be useful.

Python packages:

- [python.org: Python Packaging User Guide](https://packaging.python.org/) + [model project](https://github.com/pypa/sampleproject)
- [The Hitchhiker's Guide to Python: Packaging Your Code](https://docs.python-guide.org/shipping/packaging/)
- [YouTube: Inside the Cheeseshop: How Python Packaging Works - PyCon 2018](https://www.youtube.com/watch?v=AQsZsgJ30AE)
- [medium.com: A Simple Guide for Python Packaging](https://medium.com/little-big-engineering/lets-talk-about-python-packaging-6d84b81f1bb5)
- [Build Your First pip Package](https://dzone.com/articles/executable-package-pip-install)

Conda packages:

- [conda-docs: Building conda packages tutorials](https://docs.conda.io/projects/conda-build/en/latest/user-guide/tutorials/index.html)
- [Bioconda-docs: Contributing Bioconda recipes](https://bioconda.github.io/contributor/index.html)

### Software in containers: Docker & co

Another approach to distributing production-ready environments (with all required apps and data files) is to use virtual machines or containers. A popular solution is Docker.

>A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably in every computing environment. Containerized software will always run the same, regardless of the infrastructure. (based on [Docker docs](https://www.docker.com/resources/what-container))

<a href="https://en.wikipedia.org/wiki/Docker_(software)">Docker</a> is free software that uses [OS-level virtualization](https://en.wikipedia.org/wiki/List_of_Linux_containers) to execute programs in isolated virtual environments. A container image is a standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings. Essentially, a Docker container behaves like a [lightweight virtual machine](https://en.wikipedia.org/wiki/OS-level_virtualization) that shares the kernel of the host system, and it facilitates the deployment of software with complex dependencies in isolated environments. This is especially helpful if your software has many dependencies; however, it also introduces additional complexity ([Nüst et al. 2020](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008316)). Other container systems may be even better suited for data analysis, like <a href="https://apptainer.org/">Apptainer</a> (formerly Singularity, see e.g. <a href="https://pythonspeed.com/articles/containers-filesystem-data-processing/">pythonspeed.com</a> for a very brief comparison to Docker).

<div class="lecture-figure">
    <a href="https://www.docker.com/blog/containers-replacing-virtual-machines/"><img src="../_images/VM-vs-containers.png" width="550"></a>
    <a class="img-caption" href="https://www.ibm.com/cloud/blog/containers-vs-vms">Containers make use of the underlying OS capabilities (Linux kernel), and are therefore more lightweight than virtual machines</a>
</div>

Recommended watching:

- "Learn Docker in 12 Minutes 🐳" by Jake Wright, 2016 ([YouTube](https://www.youtube.com/watch?v=YFl2mCHdv24))
- "Docker Tutorial: A Brief Introduction to Docker Virtualization" by Dark Zebra, 2014 ([YouTube](https://www.youtube.com/watch?v=umJYDAYxZys))
- "Docker Concepts Introduction" by Engineer Man, 2019 ([YouTube](https://www.youtube.com/watch?v=6aBsjT5HoGY))
- "Docker vs Virtual Machine | simply explained || Docker Tutorial 6" by TechWorld with Nana, 2019 ([YouTube](https://www.youtube.com/watch?v=5GanJdbHlAA)) (and other videos of this excellent video series)

Containers became so popular that additional software was written to define and manage groups of containers, running on different hosts (computers). One prominent example is Kubernetes ([redhat.com](https://www.redhat.com/en/topics/containers/what-is-kubernetes)).

### Portable package formats

A more lightweight alternative is to use portable package formats such as Flatpak, AppImage and Snap. They are fairly new, and it remains to be seen if they will gain popularity, but they are options that you should keep in mind for distributing software.

- AppImage, Flatpak and Snap in comparison ([cstan.io](https://cstan.io/?p=13084&lang=en))
- Flatpak vs Snaps vs AppImage vs Packages ([dev.to](https://dev.to/bearlike/flatpak-vs-snaps-vs-appimage-vs-packages-linux-packaging-formats-compared-3nhl))

<div class="selftest">📝 Self-test:
    <ol class="selftest">
        <li>What is the recommended way to share the code necessary to reproduce your scientific project with others and make sure that it works as expected? (<a href="https://tdhopper.com/blog/my-python-environment-workflow-with-conda">Hint</a>)
        <li>What are other possibilities to distribute your Python code?
    </ol>
</div>


---