# Modules and Packaging

Two halves to this presentation:

1. Modules (how they work, python import logic) and virtualenv
2. Packaging (setup.py)

## Modules

  - what is a module
  - what is a package
  - importing
  - python path
  - virtualenv/interpreters
  - entry-points

## Packaging (setup.py)

* what resources to trust and what not to trust
* projects to avoid and projects to look out for
* creating a package
* releasing as open source
* beyond pure python

# Modules

## What is a module

Two things:
* .py file (or possibly something else, if there's compiled code or other magic going on).
* A namespace (an object which holds variables, functions, classes or even other modules).

They are related, but changing one doesn't affect the other (we'll come to reloading in a bit).
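As a rough illustration of the "namespace" half, the sketch below pokes at the standard-library `math` module and then builds a module object with no file behind it. The name `scratch` is made up for this example.

```python
import math
import types

# A module is an ordinary object of type ModuleType ...
is_module = isinstance(math, types.ModuleType)

# ... and its namespace is a plain dictionary mapping names to objects.
namespace = vars(math)
print(type(namespace).__name__)
print("sqrt" in namespace)

# The file a module came from is recorded separately, and may be missing
# entirely (for example when the module is compiled into the interpreter).
print(getattr(math, "__file__", "<no file>"))

# A namespace can exist with no file at all ("scratch" is a made-up name).
scratch = types.ModuleType("scratch")
scratch.answer = 42
print(scratch.answer)
```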

## A special module: `__main__`

* `__main__` is the name allocated to the first python file run
* `python example.py` -> `example.py` is `__main__`
* This is how
```python
if __name__ == '__main__':
    main()
```
works
* See https://docs.python.org/3/library/__main__.html#module-__main__

## What is a package

* Module which contains other modules
* A directory with a `__init__.py` file (usually)
* Not the thing we distribute code as, it just has the same name :(
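A minimal sketch of the directory-plus-`__init__.py` layout, built in a temporary directory so it can be run anywhere. The names `demo_pkg` and `helpers` are hypothetical; the thing to notice is that a package is a module that also has a `__path__`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Lay out:  demo_pkg/__init__.py  and  demo_pkg/helpers.py
    pkg_dir = Path(tmp) / "demo_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "helpers.py").write_text("VALUE = 1\n")

    # Temporarily make the directory importable (for the demo only).
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        helpers = importlib.import_module("demo_pkg.helpers")
    finally:
        sys.path.remove(tmp)

# Packages carry a __path__ listing where their submodules live;
# plain modules do not.
package_has_path = hasattr(demo_pkg, "__path__")
module_has_path = hasattr(helpers, "__path__")
print(package_has_path, module_has_path)
```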

## Namespace packages :(

* A way of sharing one top-level package name between different groups (e.g. `scikits.learn`, `scikits.image`)
* Too many different, incompatible ways, causing too much breakage
* Packages renamed (`scikits.learn` -> `sklearn`, `scikits.image` -> `skimage`)
* Avoid making new ones

## Importing

* See https://docs.python.org/3/reference/import.html and https://docs.python.org/3/library/importlib.html for the full details of how importing works (a few parts are surprising).


* Two types of imports:
  1. Absolute
  2. Relative
* Absolute: `import numpy.fft`
* Relative: `from . import my_module`
* **Note:** On Python 2, `import numpy` will import `numpy.py` in the current directory if it exists (use `from __future__ import absolute_import` to fix this)
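To make the relative form concrete, here is a small sketch that writes a throwaway package whose `physics` module reaches its sibling with `from . import ...`, then loads it with an absolute import. All of the names (`demo_rel_pkg`, `constants`, `physics`) are invented for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_rel_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "constants.py").write_text("SPEED_OF_LIGHT = 299792458\n")
    # physics.py uses relative imports to reach its sibling module.
    (pkg_dir / "physics.py").write_text(
        "from . import constants\n"
        "from .constants import SPEED_OF_LIGHT\n"
    )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # From outside the package we use the absolute, dotted name.
        physics = importlib.import_module("demo_rel_pkg.physics")
    finally:
        sys.path.remove(tmp)

# The relative import resolved to the fully qualified sibling module.
print(physics.constants.__name__)
print(physics.SPEED_OF_LIGHT)
```

Relative imports only work inside a package; running `physics.py` directly as a script would fail, because there is no parent package for `.` to refer to.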

## Star imports

* `from numpy import *` binds every name listed in `numpy.__all__`; if `__all__` is not defined, it binds every name that does not start with an underscore (this can be quite slow)
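The effect of `__all__` can be checked with a small experiment. This is only a sketch: the helper `star_import_names` and the two throwaway module names are made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def star_import_names(module_source, module_name):
    """Write a throwaway module, star-import it, and return the names bound."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / f"{module_name}.py").write_text(module_source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            namespace = {}
            exec(f"from {module_name} import *", namespace)
        finally:
            sys.path.remove(tmp)
    # exec() adds __builtins__ to the namespace; ignore it.
    return sorted(name for name in namespace if name != "__builtins__")


body = "public = 1\nalso_public = 2\n_private = 3\n"

# With __all__, only the listed names are imported.
with_all = star_import_names("__all__ = ['public']\n" + body, "demo_star_with_all")
print(with_all)      # ['public']

# Without __all__, everything not starting with an underscore is imported.
without_all = star_import_names(body, "demo_star_without_all")
print(without_all)   # ['also_public', 'public']
```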

## Renaming imports

* Can use `as` to set name of import (to avoid conflicts)
* Can use it multiple times: `from numpy import sin as np_sin, cos as np_cos`
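A short sketch of using `as` to avoid a name clash. It uses the standard-library `math` and `cmath` modules rather than numpy so that it runs without extra installs; both define a function called `sqrt`.

```python
# Both modules export "sqrt", so give each import its own name.
from math import sqrt as real_sqrt
from cmath import sqrt as complex_sqrt

# Several renames can share one import statement.
from math import sin as m_sin, cos as m_cos

print(real_sqrt(4.0))      # 2.0
print(complex_sqrt(-4))    # 2j
total = m_sin(0.0) + m_cos(0.0)
print(total)               # 1.0
```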

## (Aside) Reloading

* You can try reloading code via `importlib.reload` (or other reloading code), however:
  - Most compiled modules aren't set up to handle reloading, you may get random segfaults (or corruption of data)
  - Some modules are quite dynamic, and will not reload correctly
* Best to save work to a file (e.g. csv, fits, HDF5; pickle is also useful, but assumes the modules do not change), and then rerun the code.
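One way reloading goes wrong, even in pure Python, is that objects created before the reload keep pointing at the old code. The sketch below shows this with a throwaway module; `demo_reload_mod` and `Thing` are invented names.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp) / "demo_reload_mod.py"
    module_file.write_text("class Thing:\n    version = 1\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import demo_reload_mod
        old_instance = demo_reload_mod.Thing()

        # Simulate editing the file, then reload the module in place.
        module_file.write_text("class Thing:\n    version = 2  # edited\n")
        importlib.reload(demo_reload_mod)
    finally:
        sys.path.remove(tmp)

# The module now holds the new class ...
new_version = demo_reload_mod.Thing.version
# ... but the instance made earlier still belongs to the old class.
old_version = old_instance.version
still_matches = isinstance(old_instance, demo_reload_mod.Thing)
print(new_version, old_version, still_matches)   # 2 1 False
```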

## The python path (`sys.path`)

* The python path is like `$PATH` in the shell: it sets what order different places should be looked at for a module.
* The python path on my system is currently:

Running `import sys; print(sys.path)` in IPython prints:

['/home/aragilar/Projects/Uni/PhD/ADACS-AAO-PythonII', '/home/aragilar/.virtualenvs/tmp-83bfef26f6db874/lib/python37.zip', '/home/aragilar/.virtualenvs/tmp-83bfef26f6db874/lib/python3.7', '/home/aragilar/.virtualenvs/tmp-83bfef26f6db874/lib/python3.7/lib-dynload', '/usr/lib/python3.7', '', '/home/aragilar/.virtualenvs/tmp-83bfef26f6db874/lib/python3.7/site-packages', '/home/aragilar/.virtualenvs/tmp-83bfef26f6db874/lib/python3.7/site-packages/IPython/extensions', '/home/aragilar/.ipython']


* Note that `''` shows that the current directory is on the path

* You can configure the python path via `$PYTHONPATH`, but don't: there are better ways, and a stray `$PYTHONPATH` is one of the first things to check when modules are not importing correctly.
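When an import picks up the wrong file, it can help to ask Python where it would load a module from without actually importing it. A minimal sketch follows; the helper name `where_is` is made up, while `importlib.util.find_spec` is the standard-library call doing the work.

```python
import importlib.util
import os
import sys


def where_is(module_name):
    """Return the location a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


# Entries are searched in this order; the first match wins.
for position, entry in enumerate(sys.path):
    print(position, repr(entry))

print(where_is("json"))
print(where_is("no_such_module_xyz_12345"))

# Worth checking when imports misbehave: is PYTHONPATH set at all?
print(os.environ.get("PYTHONPATH", "<PYTHONPATH not set>"))
```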

## Virtualenv and Interpreters

* When python starts, it has to work out where the standard library and other packages are. To do this it looks for a `site.py` (or a `pyvenv.cfg` on Python 3.3 and later), which contains the right information to load the rest of python.
* Virtualenv is a package which does the right set of things to make this work, especially on older (pre Python 3.3) versions where this is harder.
* Conda environments work in a similar way
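A quick way to see this from inside Python is to compare `sys.prefix` with `sys.base_prefix`. The sketch below is a heuristic only: the helper name is made up, and it does not detect conda environments, where the two prefixes are normally the same.

```python
import sys
from pathlib import Path


def in_virtual_environment():
    """Best-effort check for a venv/virtualenv (not conda)."""
    # Python 3.3+ sets base_prefix; very old virtualenv releases set
    # real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix or hasattr(sys, "real_prefix")


print("interpreter:", sys.executable)
print("prefix:     ", sys.prefix)
print("in a venv:  ", in_virtual_environment())

# In a Python 3.3+ virtual environment this file sits at the top of the prefix.
config_file = Path(sys.prefix) / "pyvenv.cfg"
print("pyvenv.cfg present:", config_file.exists())
```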

## Why use virtualenv (or conda environments)?

* These projects allow you to experiment with newer versions of packages
* Different projects may require older numpy/astropy versions
* Keeps everything clean -> easy reinstalls.

## Using virtualenv — Demo time!

## (Aside) `python -m pip`

* One common source of issues with installs is where `python` and `pip` are associated with different python interpreters (you can try to identify this via `which python` and `which pip`).
* It is strongly recommended (by the Python Packaging Authority - PyPA) to use `python -m pip` (or `python3 -m pip`).
* Don't use `pip3`, it's more likely than `pip` to be pointing to the wrong place (as `pip3` is being phased out).
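The same check can be done from Python, and scripts that need to call pip can build the command from `sys.executable` so that it always targets the running interpreter. This is a sketch; `describe_interpreters` and `pip_command` are made-up helper names, and nothing is actually installed.

```python
import shutil
import sys


def describe_interpreters():
    """Report which executables the shell would find, next to the running one."""
    return {
        "running python": sys.executable,
        "python on PATH": shutil.which("python"),
        "pip on PATH": shutil.which("pip"),
    }


def pip_command(*args):
    """Build a pip invocation tied to the interpreter running this code."""
    return [sys.executable, "-m", "pip", *args]


for label, path in describe_interpreters().items():
    print(f"{label}: {path}")

# Pass this list to subprocess.run() to actually run it.
print(pip_command("install", "--upgrade", "setuptools"))
```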

## Useful tools

* [virtualenvwrapper](https://virtualenvwrapper.readthedocs.io/en/latest/) helps with managing lots of virtualenvs, and can create temporary ones for testing with one command
* [tox](https://tox.readthedocs.io/en/latest/) makes it easy to test with lots of different python versions/configurations at once

## (Extra) Entry Points and Plugins

* It's possible to use the python import system to load data stored in a package
* Or create wrappers around python functions to call them from shell
* `pkg_resources` (part of setuptools) was the old way of doing this
* [`importlib.resources`](https://docs.python.org/3/library/importlib.html#module-importlib.resources) is its replacement
* There are other projects like [pluggy](https://pypi.org/project/pluggy/), which make it easier to build a plugin system
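As a sketch of loading data stored in a package, the example below creates a throwaway package containing a text file and reads it back through `importlib.resources`. It assumes Python 3.9 or later (for `importlib.resources.files`); the package and file names are invented.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A package with a data file sitting next to its code.
    pkg_dir = Path(tmp) / "demo_data_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "greeting.txt").write_text("hello from inside the package\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Look the file up by package name, not by filesystem path.
        resource = importlib.resources.files("demo_data_pkg").joinpath("greeting.txt")
        greeting = resource.read_text()
    finally:
        sys.path.remove(tmp)

print(greeting.strip())
```

Looking the file up by package name means the code keeps working wherever the package ends up being installed.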

# Packaging

## First, a warning

* Unlike many areas of python, searching on the internet is going to give you incorrect information
  - Information is likely out-of-date, not applicable, or just wrong
  - See https://stackoverflow.com/a/4806227/1306020 - this once was the correct answer (with infographics!), now it is almost completely wrong

* Python is used by many different groups (webdev, science, sysadmin, education, VFX/CGI, desktop applications, ...), all of which have different needs.
  - This has created a long legacy of projects which claim to solve everything, only for the projects to die a few years later.

* After many years of different resources contradicting each other, [packaging.python.org](https://packaging.python.org/) is the official place for looking up how to install, package and distribute python code. See what it says first, rather than google.

## Projects to avoid

* Some projects which, whilst they sound nice, will likely cause problems down the road (come speak to me if you're curious why):
  - pipenv
  - poetry
  - hatch

## Projects to look out for

* I'm not ready to recommend [flit](https://flit.readthedocs.io/en/latest/) yet, but it avoids the scope creep of the other projects, and so should be nicer than the current system. Maybe go ahead and try it?

## Creating a python project (and packaging it)

* [packaging.python.org](https://packaging.python.org/) has a full tutorial on how to package your code, with all the options explained, but I'm going to quickly go over the template I use to set up my projects, with versioning handled by git
* You will need a single directory which will hold everything related to the project (code, documentation, tests), and make this a git repository


* Inside this directory, create a setup.py file, a setup.cfg file, and a src directory
* The src directory will hold your code (we use an src directory to avoid some fun with python paths)

Paste the following into the setup.py file (or see the template next to this file):
```
import setuptools
import versioneer

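# Assumption: the long description lives in README.md next to setup.py;
# point this at whichever file your project actually uses.
filename = "README.md"

with open(filename, 'r') as f:
    long_description = f.read()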

setuptools.setup(
    cmdclass=versioneer.get_cmdclass(),
    name = "package-name",
    version = versioneer.get_version(),
    author = "",
    author_email = "",
    description = "",
    long_description = long_description,
    packages = setuptools.find_packages('src'),
    package_dir = {'': 'src'},
    install_requires = [],
    python_requires = '',
    license = "",
    keywords = "",
    url = "",
    project_urls={
        'Documentation': '',
        'Source': '',
        'Tracker': '',
    },
    classifiers=[
    ],
)
```

(Fill in the blanks that apply to your project; most of the fields are optional.)

Paste the following into the `setup.cfg` file (replacing `your_package_name`):
```
[versioneer]
VCS = git
style = pep440
versionfile_source = src/your_package_name/_version.py
versionfile_build = your_package_name/_version.py
tag_prefix = v
```

Finally, run `python -m pip install -U versioneer setuptools`, and then `versioneer install`. We've now set up everything.

## Sharing your code

* To package up your code, run `python setup.py sdist`. This will create a new directory `dist/`, which will contain a tar file which you can send people.
* You can install your code for testing by running `python -m pip install .` in the directory with the `setup.py` file, or by running `python -m pip install dist/my-project-0.1.tar.gz` (or similar).

* You can also make the code public by uploading it to the [Python Package Index (PyPI or the cheeseshop)](https://pypi.org)
* You want to use twine (get it via `python -m pip install twine`) to upload the file
  - `twine upload dist/project-0.1.tar.gz` or similar will work - the most up to date and secure instructions are [here](https://packaging.python.org/tutorials/packaging-projects/#uploading-the-distribution-archives).

* Note that before you go uploading code to PyPI, make sure you [choose a license to release it under](https://choosealicense.com/), so that people can use it. You may need to speak to your supervisor or other people if they have contributed code to the project, when choosing a license.

## Additional topics

## Generating scripts

* Here's how to generate a script from a function (make sure it has no required arguments though!)
```
    entry_points = {
        'console_scripts': [
            "scriptname = package_name.module_name:function_name",
        ],
    }
```
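The function named in the entry point is called with no arguments, so it has to fetch any command-line options itself. A minimal sketch of such a function, assuming `argparse` for option handling (the `--name` option is invented for the example):

```python
import argparse


def function_name(argv=None):
    """Entry point: callable with no arguments, as console_scripts requires."""
    parser = argparse.ArgumentParser(prog="scriptname")
    parser.add_argument("--name", default="world", help="who to greet")
    # With argv=None, argparse reads sys.argv[1:], which is what happens
    # when the installed script is run from the shell.
    args = parser.parse_args(argv)
    print(f"Hello, {args.name}!")
    return 0  # used as the process exit status by the generated script


# Passing a list makes the function easy to try out without a shell.
exit_status = function_name(["--name", "ADACS"])
```

Keeping `argv` as an optional parameter means the same function works as an installed script and can be called directly in tests.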


## Beyond pure python

* Covering how to use non-python code as part of a python package is beyond the scope of this workshop; come see Ellert or me for more details. Here are some links to get you started:
  - [c-extensions](https://packaging.python.org/guides/packaging-binary-extensions/)
  - [cython](https://cython.org)
  - [cffi](https://cffi.readthedocs.io/en/latest/)/[ctypes](https://docs.python.org/3/library/ctypes.html)
  - [numba](https://numba.pydata.org/)
  - [f2py and numpy](https://docs.scipy.org/doc/numpy/f2py/)

# Questions