# Python Packages and Environments

## What to use Where, and Why

### MOAD Group Software Discussion
### ??-Nov-2020

This notebook can be viewed as a slideshow by using the
[RISE](https://rise.readthedocs.io/en/stable/index.html)
slideshow extension for Jupyter.

*Note: RISE only works with `jupyter notebook`, not with `jupyter lab` :-(*

If you are working in an up-to-date clone of the
[UBC-MOAD/PythonNotes repo](https://github.com/UBC-MOAD/PythonNotes),
you can run the slideshow locally.
To do so:
* create a conda environment containing `jupyter` and `rise` with:
```bash
conda env create -f PythonNotes/pkgs-envs/environment.yaml
```
* activate that environment and start `jupyter notebook`
* open `PythonNotes/pkgs-envs/PythonPkgsEnvsSlides.ipynb`
* use `Alt+r` or the `Enter/Exit RISE Slideshow` toolbar button to start/stop the slideshow mode
* use `Space` and `Shift+Space` to navigate forward and backward through the slide cells

* What is a Python package?
* What is a Python environment?

* 2 Python package managers: `conda` and `pip`
* 2 Python environment managers: `conda env` and `virtualenv`
* 4 ways to install Python packages:
    * `conda install ...`
    * `pip install ...`
    * `pip install -e ...`
    * `pip install --user ...`

* What to use Where, and Why

# Python Ecosystem

* Interpreter (https://docs.python.org/3/reference/index.html)
    * The reference implementation, CPython, is written in C

* Standard Library
    * Built-in Functions, Built-in Constants, Built-in Exceptions
      (https://docs.python.org/3/library/index.html)
        * Also written in C

    * Python modules that you `import`

    * Included with the language

* Community Developed Packages
    * Collections of Python modules that we install so that we can `import` things from them

# Python Modules

* Python code in a `.py` file
* Usually function definitions (`def`) and class definitions (`class`)
* All of a module's top-level code (including the `def` and `class` statements themselves) is executed the first time the module is imported
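A quick way to see this in action is the sketch below. It writes a throwaway module (the name `moad_import_demo` and the helper `import_twice()` are made up for illustration) and imports it twice: the top-level code runs once, and the second import just returns the cached module object from `sys.modules`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module name, chosen so it won't collide with anything installed
MODULE_NAME = "moad_import_demo"

MODULE_SOURCE = '''
print("running top-level code")
CALLS = []

def greet(name):
    # Defining a function runs the def statement, not the function body
    return f"Hello, {name}!"

CALLS.append("imported")
'''


def import_twice():
    """Import a freshly written module twice and return its CALLS list."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / f"{MODULE_NAME}.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            first = importlib.import_module(MODULE_NAME)   # executes the file
            second = importlib.import_module(MODULE_NAME)  # cached, not re-executed
            assert first is second
            return list(first.CALLS)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)


print(import_twice())  # "running top-level code" is printed only once
```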

# Python Packages

* Collection of Python modules with some metadata
* Mechanism for distributing Python code between users
* Obtained from package channels (conda-forge), 
  or indices (PyPI) on the Internet,
  or code repository clones (GitHub, Bitbucket, GitLab, ...)
  
### Aside

[Docs about how MOAD Python packages are structured, and why](https://ubc-moad-docs.readthedocs.io/en/latest/python_packaging/pkg_structure.html)

## Packages Give Us 2 Features/Challenges 

* Can include Python extensions written in C
    * NumPy, SciPy, netCDF4, ...
    * Compiler(s), libraries, build tools, etc. are required to install from source code
* Packages can depend on other packages; e.g. PackageA requires that PackageB is also installed in order to work
    * Leads to a web of dependencies
    * Need to construct and solve a graph to satisfy package version constraints
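To make the "web of dependencies" concrete, here is a toy, brute-force version resolver. Everything in it is hypothetical: `PackageA`, `PackageB`, `PackageC`, their integer versions, and `resolve()` itself. It tries every combination of versions, newest first, and returns the first one in which the requested constraints and every package's dependency constraints all hold. The resolvers in pip and conda use much smarter search than this.

```python
from itertools import product

# Hypothetical package index: (name, version) -> {dependency: allowed versions}
INDEX = {
    ("PackageA", 1): {"PackageB": {1}},
    ("PackageA", 2): {"PackageB": {2, 3}},
    ("PackageB", 1): {},
    ("PackageB", 2): {"PackageC": {1}},
    ("PackageB", 3): {"PackageC": {2}},
    ("PackageC", 1): {},
    ("PackageC", 2): {},
}


def resolve(requested, index):
    """Brute-force search for one version of every package in the index
    that satisfies the requested constraints and all dependency constraints.

    Tries newer versions first; returns {name: version} or None.
    """
    versions = {}
    for name, version in index:
        versions.setdefault(name, []).append(version)
    names = sorted(versions)
    candidates = [sorted(versions[name], reverse=True) for name in names]

    for combo in product(*candidates):
        pins = dict(zip(names, combo))
        # The user's own requests must be satisfied...
        if any(pins[name] not in allowed for name, allowed in requested.items()):
            continue
        # ...and so must the dependencies of every pinned package
        if all(
            pins[dep] in allowed
            for name, version in pins.items()
            for dep, allowed in index[(name, version)].items()
        ):
            return pins
    return None


print(resolve({"PackageA": {1, 2}}, INDEX))
# {'PackageA': 2, 'PackageB': 3, 'PackageC': 2}
print(resolve({"PackageB": {3}, "PackageC": {1}}, INDEX))
# None: PackageB 3 needs PackageC 2, which conflicts with the request
```

Even with 3 packages there are 2 × 3 × 2 = 12 combinations to check, and that count multiplies with every package added. Unlike a real installer, this sketch also pins every package in the index, even ones nothing asked for.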

# Package Managers

Because `sys.path.append(...)` doesn't scale 😱

* Download packages we want to install,
  and their dependencies,
  from the Internet
* Store the files where Python can find them
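The "where Python can find them" part can be checked from Python itself. This sketch (the helper name `install_location()` is hypothetical) asks `sysconfig` for the directory that pure-Python packages are installed into for the running interpreter, which is normally where pip puts them too, and whether that directory is on `sys.path`. That is what lets installed packages be imported without anyone calling `sys.path.append(...)`.

```python
import sys
import sysconfig


def install_location():
    """Return (directory where pure-Python packages get installed for this
    interpreter, whether that directory is on sys.path)."""
    purelib = sysconfig.get_paths()["purelib"]
    return purelib, purelib in sys.path


location, importable = install_location()
print(sys.executable)
print("installs pure-Python packages into:", location)
print("that directory is on sys.path:", importable)  # normally True
```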

## pip

## conda

#### Others, past and present

## pip

* **p**ip **i**nstalls **p**ackages
* But only from source code
    * Until recently, when "wheels" were introduced
* Naive dependency resolver
    * Until very recently: "new resolver" in pip=20.2.4 released 2020-10-16
* Package isolation is a separate (but highly recommended) story
    * `pip install` ... permission denied
    * ~`sudo pip install`~ 😱

## conda

Scientific Python community,
led by Travis Oliphant in 2012,
couldn't wait for the Python Packaging Authority (PyPA)'s plans for pip and wheels to come to fruition

## conda

* Packages are built by maintainers, not users
    * Solves the build problem for extensions
    * Allows installation of binary packages that aren't even Python; e.g. gfortran
    * Allows installation of different versions of Python itself
* Meta-packages; e.g. anaconda
* Dependency resolver looks at packages being installed *and* packages already installed
    * But dependency resolution is still a hard problem...
* Implicitly uses environments to isolate collections of packages
* `pip` can be used inside `conda`-managed environments

# Python Environments

### Directory tree

* In user file space where Python packages are installed
* Isolates those packages from the system Python packages, avoiding:
    * `pip install` ... permission denied
    * ~`sudo pip install`~ 😱
    * Breaking your operating system by overwriting Python packages it installed
    
### PATH environment variable manipulation

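* Ensure that the operating system finds the environment's `python` (and `pip`, and other command-line tools) before the ones in the usual operating system places, so that the packages installed in the environment are the ones that get imported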
* Maybe other environment variables too
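Both ingredients can be seen with the standard library's `venv` module, used here as a stand-in for `virtualenv` (they create similar directory trees). This sketch builds a bare environment in a temporary directory, then imitates the `PATH` half of activation by searching the environment's `bin/` directory first. The helper names `activated_python()` and `env_prefix()` are hypothetical, and real `activate` scripts also set a few other variables (e.g. `VIRTUAL_ENV`).

```python
import os
import shutil
import subprocess
import tempfile
import venv
from pathlib import Path


def activated_python(env_dir: Path) -> str:
    """Mimic the PATH part of activating an environment: put the environment's
    bin directory (Scripts on Windows) first on PATH and look up `python`."""
    bin_dir = env_dir / ("Scripts" if os.name == "nt" else "bin")
    search_path = str(bin_dir) + os.pathsep + os.environ.get("PATH", "")
    found = shutil.which("python", path=search_path)
    if found is None:
        raise FileNotFoundError(f"no python found in {bin_dir}")
    return found


def env_prefix(env_dir: Path) -> str:
    """Create a bare environment in env_dir and return the sys.prefix
    reported by the python that 'activation' finds."""
    # with_pip=False keeps this fast; symlinks mirror `python3 -m venv` on POSIX
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    result = subprocess.run(
        [activated_python(env_dir), "-c", "import sys; print(sys.prefix)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo-env"
    prefix = env_prefix(env_dir)
    # The environment's python reports the environment directory as its prefix
    print(prefix, os.path.samefile(prefix, env_dir))
```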

# Installing to the User Site

`python3 -m pip install --user package`

* Limited usefulness
* Really only for packages that provide a command-line interface
  (i.e. not really about being able to `import` from the package)
* Installs files into a hidden tree in your home directory
    * Typically `~/.local/`, but `python3 -m site --user-base` will say for sure
* Also need to ensure that `~/.local/bin` is near the front of your `PATH`
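A small check for that last requirement: this sketch asks `site` for the user base (the same value `python3 -m site --user-base` prints) and tests whether its `bin/` directory is on `PATH`. It assumes the Linux/macOS layout, where user-installed scripts go into `<user base>/bin`. The helper names are hypothetical, and it only checks that the directory is present, not that it is near the front.

```python
import os
import site
from typing import Optional


def user_scripts_dir() -> str:
    """Where `pip install --user` puts command-line scripts
    (assumes the Linux/macOS layout: <user base>/bin)."""
    return os.path.join(site.getuserbase(), "bin")


def is_on_path(directory: str, path: Optional[str] = None) -> bool:
    """True if directory appears in a PATH-style string
    (default: this process's PATH)."""
    if path is None:
        path = os.environ.get("PATH", "")
    wanted = os.path.normpath(os.path.expanduser(directory))
    return any(
        os.path.normpath(os.path.expanduser(entry)) == wanted
        for entry in path.split(os.pathsep)
        if entry
    )


scripts = user_scripts_dir()
print(scripts, "is on PATH" if is_on_path(scripts) else "is NOT on PATH")
```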

# Installing to the User Site

We use this on HPC machines to install packages like `NEMO-Cmd`, `SalishSeaCmd`, and `MOHID-Cmd` from our Git clones:

`python3 -m pip install --user -e $PROJECT/$USER/MEOPAR/NEMO-Cmd/`

It lets you run `nemo run`, `salishsea run`, or `mohid run` without having to activate a more sophisticated Python environment.

Relies on:

* Already having done `module load python/3.8.2`
* Having `$HOME/.local/bin` in your `PATH`

# Installing to the User Site

### Docs

https://packaging.python.org/tutorials/installing-packages/#installing-to-the-user-site

# Aside 1: `python3 -m pip` ???

The `-m` option on `python3` means:

    Search sys.path for the named module and execute its contents
    
This ensures that the `pip` (or other package module) that you run is the one associated with the currently active Python environment.
If there is no environment active,
it ensures that you are using the Python 3 from the HPC module you loaded.

Getting things installed in the wrong environment is one of the biggest pain points of using environments.
This avoids that.
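One way to spot a mismatch is to compare the `pip` that the current interpreter would import with the bare `pip` command that `PATH` finds. This sketch uses `importlib.util.find_spec()` and `shutil.which()`; the helper name `module_origin()` is hypothetical. If the last two printed lines point into different environments, a bare `pip install` would put packages somewhere this interpreter can't import them from.

```python
import importlib.util
import shutil
import sys
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Where *this* interpreter would import `name` from (the package's
    __init__.py for a package), or None if it isn't importable here."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print("interpreter:          ", sys.executable)
print("python3 -m pip uses:  ", module_origin("pip"))
print("bare `pip` on PATH is:", shutil.which("pip"))
```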
    
    
### Docs
https://docs.python.org/3/using/cmdline.html#cmdoption-m

# Aside 2: `python3 -m pip install -e` ???

The `-e` option (short for `--editable`) on `pip install` means:

    Install the package using symbolic links,
    such that it’s available on sys.path, 
    yet can still be edited directly from its source files.
    
We use this for our group-developed packages.
It makes getting updates into our installed packages (usually) a simple `git pull` in the package's repository clone directory.
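Classic setuptools "develop mode" editable installs work by adding the clone's directory to `sys.path` through a `.pth` file in site-packages; newer build backends may use other mechanisms. This sketch imitates that approach in a temporary directory with `site.addsitedir()`. The module name `moad_editable_demo`, the `clone` and `site-packages` directories, and `edit_and_reload()` are all hypothetical. Editing the source file (standing in for a `git pull`) changes what gets imported, with no reinstall.

```python
import importlib
import site
import sys
import tempfile
from pathlib import Path

# Hypothetical module name standing in for one of our packages
NAME = "moad_editable_demo"


def edit_and_reload():
    """Return the value of VERSION before and after editing the source file."""
    saved_path = list(sys.path)
    saved_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # avoid stale .pyc files in this quick demo
    try:
        with tempfile.TemporaryDirectory() as tmp:
            clone = Path(tmp) / "clone"             # stands in for a Git clone
            site_dir = Path(tmp) / "site-packages"  # stands in for the env's site-packages
            clone.mkdir()
            site_dir.mkdir()
            source = clone / f"{NAME}.py"
            source.write_text("VERSION = 1\n")
            # Each line of a .pth file naming an existing directory is added to sys.path
            (site_dir / f"{NAME}.pth").write_text(f"{clone}\n")
            site.addsitedir(str(site_dir))

            module = importlib.import_module(NAME)
            before = module.VERSION
            # "git pull" changes the source file; nothing is reinstalled
            source.write_text("VERSION = 2  # updated upstream\n")
            importlib.reload(module)
            return before, module.VERSION
    finally:
        sys.path[:] = saved_path
        sys.modules.pop(NAME, None)
        sys.dont_write_bytecode = saved_flag


print(edit_and_reload())  # (1, 2)
```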

# Aside 2: `python3 -m pip install -e`

Editable installs avoid:

* me having to build releases of our packages and upload them to a package repository
* me having to decide when the changes that have happened warrant building a release
* you having to wait for me to do those things
* you having to install the new release into your environment to get access to the changes

# Progress Check

* What is a Python package?
* What is a Python environment?

* 2 Python package managers: `conda` and `pip`
* 2 Python environment managers: `conda env` and `virtualenv`
* 4 ways to install Python packages:
    * `conda install ...`
    * `pip install ...`
    * `pip install -e ...`
    * `pip install --user ...`

* What to use Where, and Why

## `conda env` and `virtualenv` Environments

### Directory tree

* In user file space where Python packages are installed
* Isolates those packages from the system Python packages
    
### PATH environment variable manipulation

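* Ensure that the operating system finds the environment's `python` (and `pip`, and other command-line tools) before the ones in the usual operating system places, so that the packages installed in the environment are the ones that get imported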
* Maybe other environment variables too

# `conda env`

The first choice!

Use them everywhere (except on Compute Canada HPC clusters):
* on your laptop
* on Waterhole workstations and `salish`
* on cloud VMs

If you have `anaconda` installed,
you have `conda env`.

If you are starting from scratch,
use [Miniconda](https://docs.conda.io/en/latest/miniconda.html)
to get the `conda` package manager and `conda env` environment manager
without the hundreds of packages in the `anaconda` meta-package.

# `python3 -m virtualenv`

Use them on Compute Canada HPC clusters.

* Compute Canada have built wheels for many (but not all)
  of the scientific Python packages.
  In some cases those take advantage of specific features of
  the HPC architecture.
* Their docs explicitly ask users not to install `anaconda` on the clusters.

`module load python/3.8.2` (or whatever is the latest version)
includes `virtualenv` and `pip`.
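Since it's easy to lose track of which kind of environment a given `python` belongs to, here is a best-guess sketch. The function name `describe_environment()` is hypothetical, and it relies on two assumptions: `virtualenv` (version 20 or later) and `venv` environments make `sys.prefix` differ from `sys.base_prefix`, and `conda activate` sets the `CONDA_PREFIX` environment variable.

```python
import os
import sys
from typing import Mapping


def describe_environment(environ: Mapping[str, str] = os.environ,
                         prefix: str = sys.prefix,
                         base_prefix: str = sys.base_prefix) -> str:
    """Best-guess label for the environment an interpreter is running in."""
    if prefix != base_prefix:
        # virtualenv/venv environments point sys.prefix at the environment
        return f"virtualenv/venv at {prefix}"
    conda_prefix = environ.get("CONDA_PREFIX")
    if conda_prefix and (
        os.path.normcase(os.path.abspath(conda_prefix))
        == os.path.normcase(os.path.abspath(prefix))
    ):
        return f"conda env at {prefix}"
    return f"no environment detected; using the Python at {prefix}"


print(describe_environment())
```

Passing `environ`, `prefix`, and `base_prefix` explicitly makes the function easy to try out with made-up values, e.g. `describe_environment({}, "/home/me/venvs/x", "/opt/python")`.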