# Python Environments

![xkcd comic on Python environments](https://imgs.xkcd.com/comics/python_environment.png)

## **Note: _This notebook will not work as designed unless run from our [JupyterHub](https://europa.hpc.nrel.gov) instance_**

## Table of Contents
* [System Python](#system-python)
* [User Environments](#user-environments)
    * [`.local` User Environment](#Local-/home-install,-a.k.a-~/.local)
    * [Python Virtual Environments](#Python-Virtual-Environments)
    * [Anaconda Environments](#Anaconda-Environments)
* [The Right Tool for the Job](#The-Right-Tool-for-the-Job)

---

## System Python
The system Python environment is used by the operating system's root user for package managers and system utilities. It is built on long-term stable releases, so it lacks bleeding-edge package versions and offers few packages beyond the standard library. Some systems still use Python 2 as the system Python as of this workshop.

Without loading any modules, the `python` you are referencing is the system installation, which does not offer much freedom to non-admin users. **You are likely not an admin user and do not have write privileges to the system Python directories, because changes there could compromise the integrity of the system.**
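
To check which interpreter a session is actually using, and whether its package directory is writable by the current user, something like the following minimal sketch may help. The function name `describe_interpreter` is hypothetical and only standard-library calls are used; the values it reports will differ from system to system.

```python
import os
import sys
import sysconfig


def describe_interpreter():
    """Report which Python is running and whether its site-packages is writable."""
    # "purelib" is the directory where pure-Python packages get installed
    site_packages = sysconfig.get_paths()["purelib"]
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "site_packages": site_packages,
        # os.access checks the current user's permissions on the directory
        "writable": os.access(site_packages, os.W_OK),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key:>14}: {value}")
```

If `writable` comes back `False`, a plain `pip install` into that interpreter can be expected to fail with a permissions error.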

In [19]:
%%bash -l
module purge

# Default Python on normal Eagle nodes is system python
echo "Default Python paths on Eagle login/compute nodes:"
ssh -T el1 <<EOF
which \
    python  \
    python2 \
    python3
EOF
echo

# Default Python on JupyterHub is a Jupyter installation
echo "Default Python paths on Europa:"
which \
    python  \
    python2 \
    python3

Default Python paths on Eagle login/compute nodes:
/usr/bin/python
/usr/bin/python2
/usr/bin/python3

Default Python paths on Europa:
/opt/jupyter/anaconda3/bin/python
/bin/python2
/opt/jupyter/anaconda3/bin/python3


For the sake of demonstration, let's try to install a library and notice that permission is denied:

In [20]:
# Timeout prevents pip from hanging forever
!timeout 5 \
    pip install colorz

Collecting colorz
  Using cached https://files.pythonhosted.org/packages/6b/38/1dcc0641bfbf8edf1d3310a879cde418ff8e86c13f121c8d018940a992aa/colorz-1.0.3-py2.py3-none-any.whl
Installing collected packages: colorz
Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/opt/jupyter/anaconda3/lib/python3.7/site-packages/colorz.py'
Consider using the `--user` option or check the permissions.


---

## User Environments

Obviously, a Python environment where we can't manage dependencies is not of much use. Since the HPC cluster is a shared system by design, it is important to provide tools for each user to manage their own Python environments. Otherwise there would be rampant compatibility issues and version mismatches from many users trying to install the dependencies they need.

*Enter Python virtual environments*

Luckily, there is a plethora of options for users to manage their own virtual installations of Python in directories that they (*and usually **only** they*) have write privileges in. Let's demonstrate these methods:

### Local `/home` install, a.k.a `~/.local`

Perhaps you noticed that the error message above, from trying to install a package into the default Python, suggests using the `--user` flag. This creates a Python installation hierarchy in `$HOME/.local`, a hidden directory in your `/home` where you certainly do (or at least *should*) have write privileges while logged in. However, this is a single Python environment: if you need libraries that require different versions of the same dependency, this method starts to show its flaws.
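
The interpreter itself can report where `--user` installs end up and which file a given import would resolve to. This is a minimal sketch using only the standard-library `site` and `importlib` modules; the helper names `user_install_locations` and `module_origin` are hypothetical.

```python
import importlib.util
import site


def user_install_locations():
    """Return the per-user install base and site-packages directory."""
    return {
        "user_base": site.getuserbase(),          # typically ~/.local on Linux
        "user_site": site.getusersitepackages(),  # where `pip install --user` puts modules
        "enabled": site.ENABLE_USER_SITE,         # falsy if the user site is not in use
    }


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    for key, value in user_install_locations().items():
        print(f"{key}: {value}")
    print("json is imported from:", module_origin("json"))
```

Comparing `module_origin(...)` against `user_site` is one way to tell whether a package is coming from `~/.local` or from the environment you believe is active.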

Let's see it in action:

In [30]:
%%bash
PATH="$PATH:$HOME/.local/bin" # this is so our shell can find .local executables

check_if_executable_exists() {
    local path="$(command -v "$1")"
    if [ -z "$path" ]
        then
            echo "No executables found named \"$1\""
        else
            echo "\"$1\" found in $path"
    fi
    echo # Newline for readability in output
}

# There probably isn't any executable named this before installing
check_if_executable_exists colorz

# Timeout prevents pip from hanging forever
timeout 5 \
    pip install --user \
        --no-warn-script-location \
        colorz \
    | tail -1 # We don't need the whole `pip` output
        
echo # Newline for readability in output
    
# Now this should print the path of the executable in $HOME/.local/bin
check_if_executable_exists colorz

timeout 5 \
    pip uninstall --yes --quiet colorz # Don't keep `colorz` unless you want to

No executables found named "colorz"

Successfully installed colorz-1.0.3

"colorz" found in /home/mbartlet/.local/bin/colorz



### Python Virtual Environments

Virtual environments allow you to create a virtual Python installation in any directory. Usually, they are created alongside the project source code they accompany. Virtual environments give you the liberty of cleanly managing collections of dependencies with specific versions.

*Note that earlier versions of Python relied on the third-party `virtualenv` package to create and manage virtual environments. Since Python 3.3 the standard library has included the `venv` module, which covers most use cases without installing anything extra.*

We can create a virtual environment to accomplish the same as above:

In [22]:
%%bash -l

TMP_DIR=$(mktemp -d) # Create a temporary directory
VENV_NAME="demo_virtual_environment"

cd $TMP_DIR
python3 -m venv  $VENV_NAME # Create a virtual environment

printf "Path before activating virtual environment:\n\t$(which python3)\n\n"

source  $VENV_NAME/bin/activate # Activate our virtual environment

printf "Path after activating virtual environment:\n\t$(which python3)\n\n"

timeout 10 \
    pip install --quiet colorz # Install a package in the virtual environment

printf "\`colorz\` installed in virtual environment:\n\t$(which colorz)\n\n"

deactivate # Deactivate the virtual environment

printf "Path after deactivating virtual environment:\n\t$(which python3)\n\n"

rm -rf $TMP_DIR # Destroy the demo virtual environment for cleanliness

Path before activating virtual environment:
	/opt/jupyter/anaconda3/bin/python3

Path after activating virtual environment:
	/tmp/tmp.SEwHHmJKIb/demo_virtual_environment/bin/python3

`colorz` installed in virtual environment:
	/tmp/tmp.SEwHHmJKIb/demo_virtual_environment/bin/colorz

Path after deactivating virtual environment:
	/opt/jupyter/anaconda3/bin/python3
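
The creation step shown above can also be driven from Python through the standard-library `venv` module. The sketch below is illustrative only: `create_demo_venv` and `in_virtual_environment` are hypothetical helper names, and `with_pip=False` is chosen here just to keep the example fast and offline (the `python3 -m venv` command bootstraps `pip` by default).

```python
import os
import sys
import tempfile
import venv
from pathlib import Path


def create_demo_venv(parent_dir, name="demo_virtual_environment"):
    """Create a virtual environment under parent_dir and return its path."""
    env_dir = Path(parent_dir) / name
    # with_pip=False skips bootstrapping pip into the new environment
    venv.create(env_dir, with_pip=False)
    return env_dir


def in_virtual_environment():
    """True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        env_dir = create_demo_venv(tmp_dir)
        bin_dir = env_dir / ("Scripts" if os.name == "nt" else "bin")
        print("Created:", env_dir)
        print("Has pyvenv.cfg:", (env_dir / "pyvenv.cfg").is_file())
        print("Executables directory:", bin_dir)
    print("Running inside a virtual environment:", in_virtual_environment())
```

The `sys.prefix != sys.base_prefix` comparison is a quick way for a script to tell whether it was launched from an activated `venv` environment.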



### Anaconda Environments

Virtual environments are well and good, but you are limited to the version of Python that the environment was created with. Bleeding-edge versions of packages may start to require the latest Python language features, and known-stable versions of modules may not be compatible with later versions of Python. This is part of Anaconda's niche, as it allows you to manage entire installations of many common data science software packages (not just limited to Python).

On NREL HPC, Anaconda is accessible as an optional software module through Lmod. Here is how to load the Anaconda software module so your shell is able to use the installation:

In [23]:
%%bash -l

# Simply loading the Anaconda module will change your Python environment

module purge # Make sure no other modules conflict 

printf "Python path before loading Anaconda:\n\t$(which python)\n\n"

module load conda # Load Anaconda, named "conda" after its executable name.

printf "Python path after loading Anaconda:\n\t$(which python)\n\n"

Python path before loading Anaconda:
	/opt/jupyter/anaconda3/bin/python

Python path after loading Anaconda:
	/nopt/nrel/apps/anaconda/5.3/bin/python
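
The change in the reported path is consistent with the module placing its own `bin` directory earlier on `PATH`; that is an assumption about how the module is configured, not something the output proves. To see every candidate in lookup order rather than only the first hit, a small sketch like this can help (`find_all_on_path` is a hypothetical helper):

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            # Empty entries are skipped here for simplicity
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    for candidate in find_all_on_path("python"):
        print(candidate)
```

The first entry printed is the one `which python` would report; later entries are shadowed by it.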



Now let's repeat what we were able to accomplish using the `venv` module, but using an Anaconda environment to install a specific version of Python:

In [31]:
# This block doesn't use the %%bash magic so output from conda can be observed in real-time

ENV_NAME="conda_demo_env"

import os
os.environ['DEMO_ENV_NAME']=ENV_NAME # Export variable to cells further down

# Make sure the environment doesn't exist already
!conda-env remove --name $ENV_NAME

!conda create --use-local --yes --name $ENV_NAME python=3.5

Collecting package metadata: done
Solving environment: done


  current version: 4.6.14
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/mbartlet/.conda/envs/conda_demo_env

  added / updated specs:
    - python=3.5


The following NEW packages will be INSTALLED:

  _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-main
  bzip2              conda-forge/linux-64::bzip2-1.0.8-h516909a_1
  ca-certificates    conda-forge/linux-64::ca-certificates-2019.11.28-hecc5488_0
  certifi            conda-forge/linux-64::certifi-2018.8.24-py35_1001
  libffi             conda-forge/linux-64::libffi-3.2.1-he1b5a44_1006
  libgcc-ng          conda-forge/linux-64::libgcc-ng-9.2.0-hdf63c60_0
  libstdcxx-ng       conda-forge/linux-64::libstdcxx-ng-9.2.0-hdf63c60_0
  ncurses            conda-forge/linux-64::ncurses-6.1-hf484d3e_1002
  openssl            conda-forge/linux-64::openssl-1.0.2t-h14c

In [32]:
%%bash -l

module purge ; module load conda
ENV_NAME=$DEMO_ENV_NAME       # Exported from previous cell

printf "Before activating conda environment:\n"
printf "\tPython version: $(python --version 2>&1)\n"
printf "\tPython path: $(which python)\n"
printf "\tpip path: $(which pip)\n\n"

source activate $ENV_NAME

pip install --quiet colorz

printf "After activating conda environment:\n"
printf "\tPython version: $(python --version)\n"
printf "\tPython path: $(which python)\n"
printf "\tpip path: $(which pip)\n"
printf "\tcolorz path: $(which colorz)\n\n"

conda deactivate

conda-env remove --yes --quiet --name $ENV_NAME &>/dev/null

Before activating conda environment:
	Python version: Python 2.7.15 :: Anaconda, Inc.
	Python path: /nopt/nrel/apps/anaconda/5.3/bin/python
	pip path: /nopt/nrel/apps/anaconda/5.3/bin/pip

After activating conda environment:
	Python version: Python 3.5.5
	Python path: /home/mbartlet/.conda/envs/conda_demo_env/bin/python
	pip path: /home/mbartlet/.conda/envs/conda_demo_env/bin/pip
	colorz path: /home/mbartlet/.conda/envs/conda_demo_env/bin/colorz



Here we see that after activating our conda environment, we now have a Python 3.5 interpreter as requested in the `conda create` command.

Anaconda provides many other features to create, update, clone, and export environments, such as creating environments from an easily shareable file, duplicating existing environments, or even completely modifying the underlying Python installation of an existing environment. For full details, consult the [Anaconda docs on environments](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).
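
A script that depends on a particular interpreter version can verify this at startup instead of failing later in a confusing way. The following is a minimal sketch; `require_python` and its parameters are hypothetical names introduced for illustration.

```python
import sys


def require_python(minimum, maximum_exclusive=None, version_info=None):
    """Raise RuntimeError unless the interpreter version is in the accepted range.

    `version_info` defaults to the running interpreter and can be overridden
    to check a version tuple such as (3, 5, 5) without switching environments.
    """
    current = tuple((version_info or sys.version_info)[:3])
    if current < tuple(minimum):
        raise RuntimeError(f"Python {current} is older than required {tuple(minimum)}")
    if maximum_exclusive is not None and current >= tuple(maximum_exclusive):
        raise RuntimeError(f"Python {current} is not below {tuple(maximum_exclusive)}")
    return current


if __name__ == "__main__":
    # The 3.5.5 interpreter from the demo environment satisfies a (3, 5) minimum
    print(require_python((3, 5), version_info=(3, 5, 5)))
```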

---

## The Right Tool for the Job

Here are our recommendations for managing user environments on NREL HPC systems:

* Avoid using the `--user` installation, even for dependencies you are likely to use very frequently such as numpy and pandas. Many users are not aware of how the `--user` flag operates, and it can create ambiguity about which environment is hosting the packages and modules found by the Python interpreter.

* If your workflow uses Python exclusively and modestly, you should prefer virtual environments created with the `venv` module over fully-loaded Anaconda environments. A frequent issue observed by support staff is that HPC users exhaust their limited `/home` storage quotas simply by having a handful of Anaconda environments. There are techniques to minimize the footprint of your conda environments, but simplicity over versatility is one of the tenets of [The Zen of Python](https://www.python.org/dev/peps/pep-0020/#id3).

* Where Anaconda shines is managing data-science software stacks. If you need to switch between various versions of Python, R, Julia, etc., that is where the convenience of conda becomes obvious. Conda may also handle other external dependencies at the operating system level, such as automatically installing CUDA and cuDNN libraries when asked to install TensorFlow. If your needs end up transcending a single Python installation, that is when Anaconda is most likely to be a convenience.
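
To get a rough idea of how much of a `/home` quota your environments occupy, the sizes of the directories under `~/.conda/envs` (the location shown in the `conda create` output earlier) can be summed. This is a sketch under stated assumptions: the helper names are hypothetical, and it adds up apparent file sizes, so it may overstate real disk usage when conda hard-links files from its package cache.

```python
import os
from pathlib import Path


def directory_size_bytes(root):
    """Sum the apparent sizes of all regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            file_path = os.path.join(dirpath, filename)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def environment_sizes(envs_dir):
    """Map each environment directory name under envs_dir to its size in bytes."""
    envs_dir = Path(envs_dir).expanduser()
    if not envs_dir.is_dir():
        return {}
    return {
        child.name: directory_size_bytes(child)
        for child in sorted(envs_dir.iterdir())
        if child.is_dir()
    }


if __name__ == "__main__":
    for name, size in environment_sizes("~/.conda/envs").items():
        print(f"{name}: {size / 1024**2:.1f} MiB")
```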

Below is a comparison between various management commands mentioned above as it is presented in the [Anaconda docs](https://conda.io/projects/conda/en/latest/commands.html#conda-vs-pip-vs-virtualenv-commands).



| Task | Conda package and environment manager command | Pip package manager command | Virtualenv environment manager command |
|------|-----------------------------------------------|-----------------------------|----------------------------------------|
| Install a package | `conda install $PACKAGE_NAME` | `pip install $PACKAGE_NAME` | X |
| Update a package | `conda update --name $ENVIRONMENT_NAME $PACKAGE_NAME` | `pip install --upgrade $PACKAGE_NAME` | X |
| Update package manager | `conda update conda` | Linux/macOS: `pip install -U pip` Win: `python -m pip install -U pip` | X |
| Uninstall a package | `conda remove --name $ENVIRONMENT_NAME $PACKAGE_NAME` | `pip uninstall $PACKAGE_NAME` | X |
| Create an environment | `conda create --name $ENVIRONMENT_NAME python` | X | `cd $ENV_BASE_DIR; virtualenv $ENVIRONMENT_NAME` |
| Activate an environment | `conda activate $ENVIRONMENT_NAME`* | X | `source $ENV_BASE_DIR/$ENVIRONMENT_NAME/bin/activate` |
| Deactivate an environment | `conda deactivate` | X | `deactivate` |
| Search available packages | `conda search $SEARCH_TERM` | `pip search $SEARCH_TERM` | X |
| Install package from specific source | `conda install --channel $URL $PACKAGE_NAME` | `pip install --index-url $URL $PACKAGE_NAME` | X |
| List installed packages | `conda list --name $ENVIRONMENT_NAME` | `pip list` | X |
| Create requirements file | `conda list --export` | `pip freeze` | X |
| List all environments | `conda info --envs` | X | Install virtualenv wrapper, then `lsvirtualenv` |
| Install other package manager | `conda install pip` | `pip install conda` | X |
| Install Python | `conda install python=x.x` | X | X |
| Update Python | `conda update python`* | X | X |