# From modules to packages

Note: We will be installing a couple of packages (`pytest`, `build`, `twine`) in this lesson, so it's best to do all this inside a new conda environment, not your system python or base conda environment. **Always use the latest pip, no exceptions!**

In [None]:
%%bash

pip install --upgrade pip
pip install pytest build twine

The folder `00-walrus-package/` has the bare-minimum folder structure that we can respectably call a `package`. 

- It has the package source (of course).
- Our package has a version.
- Our package comes with tests.

In [None]:
%%bash

tree 00-walrus-package

Our package is called `walrus`. All its source is in a `src/` folder, and its tests are in the `tests/` folder. (It will become clear later on why we have a `src/walrus/` folder and not just a `walrus/` folder.)

**What we want to hand over to our users is this entire `00-walrus-package/` folder, so they have our package source, as well as the tests to verify that our package is working correctly.**

So, how can we hand this over (*create a distribution*)?

- Zip up the `00-walrus-package/` folder and email it/put it up on the server.
- Point users to our GitHub page so they can clone our repo.
- Make everything (*the distribution archive*) available at a common marketplace: the Python Package Index (PyPI).

The last option seems most attractive (`pip install walrus` is as simple as it can get). **In addition, there are some HUGE advantages to it**:

- For complicated projects involving C/C++ code, we can do all the preparatory work (compilation, testing etc.) ahead of time, so that `pip install walrus` just works on every operating system and Python version we have built for.
- For any other packages that our code depends on (*dependencies*, e.g. numpy/pandas/..), a `pip install walrus` will automatically install those dependencies for the user, and they don't have to worry about any of those details.

# Doing things (mostly) manually

In what follows, we'll be doing things largely manually so we can inspect files under the hood and understand what's going on. There are (arguably) easier ways to develop/upload packages, which we'll see soon.

## Testing our package under development

Our *tests* currently just ensure that our package has a version, so users can run `import walrus; print(walrus.__version__)`. Let's make sure our tests pass.

In [None]:
%%bash

cd 00-walrus-package/
pytest tests

### What happened?

Of course python cannot find our package because it is buried inside the `src/` folder. Based on what we've learnt so far, let's see what we can do to fix the situation: (we'll be using `!` to run bash commands as a shortcut).

In [None]:
!cd 00-walrus-package/ && pytest tests

In [None]:
!cd 00-walrus-package/ && python -c "import sys; print(sys.path)"

In [None]:
!cd 00-walrus-package/ && PYTHONPATH=src python -c "import sys; print(sys.path)"

In [None]:
!cd 00-walrus-package/ && PYTHONPATH=src pytest tests
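
The effect of `PYTHONPATH` seen above can be reproduced in isolation. The sketch below builds a throwaway project with the same `src/` layout and tries the import in a fresh interpreter, once without and once with `src` on the path. The package name `walrus_demo` is a hypothetical stand-in, chosen only so the demo cannot clash with anything already installed.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def try_import(project_dir, extra_path=None):
    """Import walrus_demo in a fresh interpreter started inside project_dir."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)  # start from a clean slate
    if extra_path is not None:
        env["PYTHONPATH"] = str(extra_path)
    return subprocess.run(
        [sys.executable, "-c", "import walrus_demo; print(walrus_demo.__version__)"],
        cwd=project_dir,
        env=env,
        capture_output=True,
        text=True,
    )


with tempfile.TemporaryDirectory() as tmp:
    # Recreate the layout: <project>/src/walrus_demo/__init__.py
    project = Path(tmp)
    package = project / "src" / "walrus_demo"
    package.mkdir(parents=True)
    (package / "__init__.py").write_text('__version__ = "0.1.0"\n')

    without_src = try_import(project)
    with_src = try_import(project, extra_path=project / "src")

print("without PYTHONPATH:", "found" if without_src.returncode == 0 else "not found")
print("with PYTHONPATH=src:", with_src.stdout.strip())
```

The first import fails because only the project folder itself is on `sys.path`, and the package lives one level down inside `src/`.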

## Building our package

We'll be dealing with the `01-walrus-package/` folder from now on. Here we've added a `walrus.math.add_one` function, and added a test for it. We've also added three files - `pyproject.toml`, `setup.cfg`, and a `README.md`.

In [None]:
%%bash

cd 01-walrus-package
tree

You may have seen other examples of packages involving a `setup.py`. Forget about those - `pyproject.toml` is the latest and greatest way of doing things (see PEP 517 and PEP 518).

Essentially, `pyproject.toml` tells build tools (such as `pip` and `build`) to use a package called `setuptools` to package the project. `setuptools` in turn looks at `setup.cfg` for some metadata about our package (name, version, where to find the source code etc). The `README.md` is for a longer description of our project that we've created just to be good citizens (`setuptools` likes it if we have it).
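
To make the division of labour concrete, here is a rough sketch of what the two files might contain, parsed with the standard library. The file contents are assumptions modelled on a typical `setuptools` project with a `src/` layout (the distribution name `edu-princeton-netid-walrus` is a placeholder); the actual files in `01-walrus-package/` may differ.

```python
import configparser

try:
    import tomllib  # in the standard library from Python 3.11 onwards
except ModuleNotFoundError:
    tomllib = None

# Assumed contents of pyproject.toml: which build backend to use.
PYPROJECT_TOML = """\
[build-system]
requires = ["setuptools>=42", "wheel"]
build-backend = "setuptools.build_meta"
"""

# Assumed contents of setup.cfg: metadata and where the source lives.
SETUP_CFG = """\
[metadata]
name = edu-princeton-netid-walrus
version = 0.1.0
long_description = file: README.md
long_description_content_type = text/markdown

[options]
package_dir =
    = src
packages = find:

[options.packages.find]
where = src
"""

if tomllib is not None:
    build_system = tomllib.loads(PYPROJECT_TOML)["build-system"]
    print("build backend:", build_system["build-backend"])

config = configparser.ConfigParser()
config.read_string(SETUP_CFG)
print("name:         ", config["metadata"]["name"])
print("version:      ", config["metadata"]["version"])
print("source folder:", config["options.packages.find"]["where"])
```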

Go ahead and modify the package name/author/author_email/url in `setup.cfg`. *Why do you think I made the package name so complex instead of just using `walrus`*?

Internally, the build tool (`build` in the cell below; `pip` does the same when installing from source) creates an isolated virtual environment, installs `setuptools` into it, and hands over control to it. `setuptools` builds our package, and creates two things for us (*artifacts*):

- The *source distribution* (`sdist`) - a `.tar.gz` file that represents all our source code.
- The *binary wheel* (or just *wheel*) - a `.whl` file. This is a pre-built (and, for projects with C/C++ code, compiled) version, one for each platform and Python version on which we ran the whole build process.

Since in our case we have nothing but pure Python code (so no platform-specific stuff), the above two are mostly interchangeable. Note also that the wheel file is called `<name>-<version>-py3-none-any.whl` to indicate this platform/python-version free nature of our wheel.
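
The wheel file name can be taken apart mechanically. The helper below is a minimal sketch (`parse_wheel_filename` is a made-up name, not part of any packaging library) that follows the `<name>-<version>-<python tag>-<abi tag>-<platform tag>.whl` convention, with an optional build tag after the version. The example file name is an assumption; note that hyphens in the distribution name become underscores in the file name.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError(f"unexpected wheel file name: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,      # py3: any Python 3 interpreter
        "abi": abi_tag,            # none: no compiled extension ABI needed
        "platform": platform_tag,  # any: not tied to an operating system
    }


info = parse_wheel_filename("edu_princeton_netid_walrus-0.1.0-py3-none-any.whl")
for key, value in info.items():
    print(f"{key:>8}: {value}")
```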

**It is important that we always generate the *sdist***, so that our users (or rather `pip` running on their machines), can generate the `.whl` files on their machines, if they need to. The more `.whl` files we generate on our end and make available, the better of course.

For a much more detailed writeup on all this, see [An Overview of Packaging for Python](https://packaging.python.org/overview/)

In [None]:
%%bash

cd 01-walrus-package
python -m build .
tree

Let's open up these `.tar.gz` and `.whl` files and see what's actually inside them.

There's a `PKG-INFO` file, an `.egg-info` folder etc. Nothing too complex, but not something that we would like to create by hand. Remember, **`setuptools` (and other newer, fancier ways of doing things like `poetry` or `flit`) make it easier to generate these sdists and wheel files, but in the end are doing the same thing.** Which one you choose is entirely a matter of taste.
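
If you'd rather peek inside the archives without unpacking them, the standard library is enough: an sdist is a gzipped tarball and a wheel is an ordinary zip file. A minimal sketch, assuming the artifacts were built into `01-walrus-package/dist/` (`list_archive` is a hypothetical helper name):

```python
import tarfile
import zipfile
from pathlib import Path


def list_archive(path):
    """Return the sorted member names of an sdist (.tar.gz) or wheel (.whl)."""
    path = Path(path)
    if path.suffix == ".whl":
        with zipfile.ZipFile(path) as wheel:  # a wheel is a zip archive
            return sorted(wheel.namelist())
    if path.name.endswith(".tar.gz"):
        with tarfile.open(path, "r:gz") as sdist:
            return sorted(sdist.getnames())
    raise ValueError(f"not an sdist or wheel: {path}")


dist_dir = Path("01-walrus-package/dist")  # assumed location of the artifacts
if dist_dir.is_dir():
    archives = sorted(dist_dir.glob("*.tar.gz")) + sorted(dist_dir.glob("*.whl"))
    for archive in archives:
        print(archive.name)
        for member in list_archive(archive):
            print("   ", member)
```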

## Installing our built package

Now that we have files in the `dist/` folder, we can install them using `pip`. Try it out!
    
1. `pip install <path/to/sdist>`
2. Uninstall the package we installed above. (Hint: you could use `import walrus; print(walrus.__path__)` to see where it's installed, and `pip freeze` to see what the package is called so we can use `pip uninstall ..`)
3. `pip install <path/to/whl>`
    
Notice how installing from the wheel (step 3) runs much faster than installing from the sdist (step 1)? Why?

(In fact, we could just have done `pip install .` to install our package instead of `python -m build .`, but we wanted to see what's really happening here.)

## Testing our installed package

Now that we have installed our package, let's see if our tests pass:

In [None]:
%%bash

cd 01-walrus-package/
pytest tests

This brings us to why we had a `src/` folder inside our source in the first place - before installing the package, `pytest tests` would have failed, but now it passes. In general, **we would like to test the behavior of the package that eventually gets installed, not what we're developing (certainly our users would be concerned about the behavior of the installed package)**. Having a `src/` folder prevents us from *accidentally* picking up our source folder during development, unless we're very explicit about it (by specifying `PYTHONPATH` etc.).
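
One way to confirm which copy of a package Python would actually pick up is to ask the import system where it finds the module. A small sketch (`module_origin` is a hypothetical helper name); for an installed `walrus` we would expect a path inside `site-packages`, not inside `01-walrus-package/src/`:

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints None if no package called walrus can be found on sys.path.
print("walrus comes from:", module_origin("walrus"))
```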

## Making the package available on PyPI

Now that the distribution artifacts are built in `dist/` (and tested!), we can upload them to [PyPI](https://pypi.org/) using `twine`. We'll have to create an account on PyPI first of course.

In [None]:
%%bash

twine upload dist/*

### Potential Gotchas

1. We wisely named our package `edu-princeton-<netid>-walrus` to avoid conflicts with any existing packages that might already be on PyPI, but this was just to get around uploading it to PyPI. What if our users wanted to use our walrus package as well as the one that is already available on PyPI? Which one would `import walrus` use?

      In general all hell will break loose in such situations, and the user experience will depend on the order in which the packages were installed. This is almost never a real problem though, as long as package names are unique and wisely chosen. **You will almost always have the two names (the name used on PyPI and the name used in `import`) match exactly.** We just chose different names to illustrate the point. In the rare cases where we *do* want to support names that are already out there (e.g. when writing extensions for packages that others have written), we use something called *namespace packages*. More details at [Packaging namespace packages](https://packaging.python.org/guides/packaging-namespace-packages/).
      
2. We had to specify our package version twice! (once in `__init__.py`, and again in `setup.cfg`).

    The former is to allow users to easily inquire `walrus.__version__` programmatically (which has become a convention by now). The latter is to let `pip` do its job properly (for example, if we say `pip install edu-princeton-vineetb-walrus==0.1.0`). Having to specify the version in two places is inconvenient and error-prone for sure. We'll see ways to get around this soon.
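
Until then, a quick sanity check can at least tell us when the two versions have drifted apart. The sketch below reads the version that `pip` recorded at install time (via `importlib.metadata`) and the module's own `__version__`. The helper `installed_versions` and the distribution name `edu-princeton-netid-walrus` are placeholders; substitute the name from your own `setup.cfg`.

```python
import importlib
from importlib import metadata


def installed_versions(dist_name, module_name):
    """Return (version recorded by pip, module.__version__); None where unavailable."""
    try:
        dist_version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        dist_version = None  # distribution is not installed
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        module_version = None  # module cannot be imported
    else:
        module_version = getattr(module, "__version__", None)
    return dist_version, module_version


dist_version, module_version = installed_versions("edu-princeton-netid-walrus", "walrus")
print("version known to pip:", dist_version)
print("walrus.__version__:  ", module_version)
```

If the package is installed and both values are present, they should be identical; a mismatch means one of the two places was not updated.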

# Using `poetry` to automate everything

Coming soon..