(03:How-to-package-a-Python)=
# How to package a Python
<hr>

To start this book, we will first develop an entire example Python package from beginning to end. The aim of this chapter is to provide a simple, high-level overview of the key steps involved in developing a Python package. Later chapters will explore each of these steps in more detail.

## partypy: simulate attendance at your party!

The example package we are going to create in this chapter will help us simulate guest attendance at an event - we'll call it `partypy`. Have you ever planned a party, a wedding, a conference, or any other kind of event and wondered how many of the invited guests will actually show up? Knowing this can be helpful for organising things like seating, catering, and gifts at your event. One way to estimate how many guests will attend is by using simulations. The general idea is to assign a "probability of attendance" to each invited guest and then run virtual versions of the event (simulations) in which we model each guest's attendance based on the probability we assigned them. We can repeat this process as many times as we like to generate many estimates of how many guests will attend.

We'll explore this concept further as we progress through the chapter, but below is an example of what the `partypy` package we are going to build can do. We'll first load in an example guest list of 100 guests, each with their own probability of attendance:

```{note}
For this book, we assume readers have basic familiarity with popular Python packages like `numpy` and `pandas`.
```

```{prompt} python >>> auto
>>> import pandas as pd
>>> guest_list = pd.read_csv("example-guest-list.csv")
>>> guest_list
```

```python
               Name  Probability of attendance
0    Donovan Willis                       0.70
1   Jocelyn Navarro                       0.70
2     Houston Stein                       0.90
3    Carlos Mullins                       0.50
4    Bridger Pruitt                       0.70
..              ...                        ...
95   Maddox Santana                       0.50
96    Ariel Proctor                       0.50
97       Pedro Hull                       0.90
98  Janessa Collins                       0.95
99   Kendrick Burke                       0.30
```

We'll now use `partypy` to run 500 simulations of a party and plot the results in a histogram:

```{prompt} python >>> auto
>>> from partypy.simulate import simulate_party
>>> from partypy.plotting import plot_simulation
>>> results = simulate_party(guest_list["Probability of attendance"], simulations=500)
>>> print(f"Average guests: {results['Total guests'].mean()}")
>>> plot_simulation(results)
```

```{figure} images/altair-plot-1.png
---
width: 50%
name: 03-altair-plot-1a
alt: Histogram of simulation results.
---
Histogram of simulation results.
```

## Package structure

The first thing we need to do to develop our `partypy` package is create an appropriate directory structure. Without getting too technical, a Python package is just a collection of Python modules. A module is a file with a *.py* extension that contains Python definitions and statements such as functions, classes, variables or executable statements. The code you wish to easily reuse and/or share as part of your package will be contained within your package's modules. Along with these Python modules, packages typically include additional files for documentation, tests, and other metadata that together define a self-contained, shareable, and interpretable piece of software.
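
To make the idea of a module concrete, here is a minimal sketch that writes a tiny module to a temporary directory and then imports it. The module name `greetings` and its `greet()` function are hypothetical, invented only for this illustration; they are not part of `partypy`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents: one variable and one function.
MODULE_SOURCE = '''
DEFAULT_NAME = "world"


def greet(name=DEFAULT_NAME):
    """Return a friendly greeting."""
    return f"Hello, {name}!"
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    # A module is just a .py file on disk.
    Path(tmp_dir, "greetings.py").write_text(MODULE_SOURCE)

    # Make the directory importable, then import the module by name.
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    try:
        greetings = importlib.import_module("greetings")
    finally:
        sys.path.remove(tmp_dir)

print(greetings.greet())         # uses the module-level default name
print(greetings.greet("Monty"))  # uses the name we pass in
```

A package bundles one or more files like this together with the metadata needed to install them, so that no manual changes to `sys.path` are required.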

We'll discuss modules and Python package structure in more detail in **Chapter 4: {ref}`04:Package-structure-and-distribution`**. While you can create your Python package structure from scratch if you know what you're doing, it's typically much easier to use a pre-made template to set up your package structure - that's what we'll do here. We will use the Python package `cookiecutter` (which you installed back in **Chapter 2: {ref}`02:System-setup`**) to quickly create our package structure for us.

The `cookiecutter` package is a tool for populating a file and directory structure from a pre-made template. People have developed and open-sourced many different `cookiecutter` templates for different projects, such as for creating Python packages, R packages, websites, etc. You can find these templates by, for example, searching online repositories on [GitHub.com](https://www.github.com). We have developed our own `cookiecutter` [template](https://github.com/UBC-MDS/cookiecutter-ubc-mds) for creating Python packages to supplement this book. To use the `cookiecutter` template to set up the structure of our Python package, open up a terminal, change into the directory where you want your package to live and run the line of code below:

```{prompt} bash \$ auto
$ cookiecutter https://github.com/UBC-MDS/cookiecutter-ubc-mds.git
```

You will be prompted to provide information that will help customize the project and pre-populate files with information. Below is an example of how to respond to these prompts (default values for each attribute are shown in square brackets and hitting enter without entering any text will accept the default value). In this tutorial we will be calling our package `partypy`. However, we will eventually be publishing our package to Python's main package index, [PyPI](https://pypi.org/), and package names on PyPI must be unique. As a result, **if you plan to follow along with this tutorial you should choose a unique name for your package**. Something like `partypy_[your initials]` might be appropriate, but you can always check if a particular name is already taken by visiting PyPI and searching for that name.

```console
author_name [Monty Python]: Tomas Beuzen
github_username [mpython]: TomasBeuzen
project_name [My Python package]: partypy
project_slug [partypy]: 
project_short_description [A package for doing great things!]: Simulate attendance at your party!
version [0.1.0]: 
python_version [3.9]: 
Select open_source_license:
1 - MIT
2 - Apache License 2.0
3 - GNU General Public License v3.0
4 - Creative Commons Attribution 4.0
5 - None
Choose from 1, 2, 3, 4, 5 [1]: 
Select include_github_actions:
1 - no
2 - build
3 - build+deploy
Choose from 1, 2, 3 [1]: 
```

```{attention}
Most of the options above are fairly self-explanatory but you'll learn more about each one as you make your way through this book. If you're unsure of what value to enter, just follow our lead above.

It's worth noting that in the example above we chose not to include any GitHub Actions files in our initial directory structure. GitHub Actions can help automate the building, testing and deployment of your Python package. We'll explore these topics in more detail in **Chapter 8: {ref}`08:Continuous-integration-and-deployment`**.
```

After responding to the `cookiecutter` prompts, we now have a new directory called `partypy`, with the following structure:

```
partypy
├── .gitignore
├── .readthedocs.yml
├── CHANGELOG.rst
├── CONDUCT.rst
├── CONTRIBUTING.rst
├── docs
│   ├── make.bat
│   ├── Makefile
│   ├── requirements.txt
│   └── source
│       ├── changelog.rst
│       ├── conduct.rst
│       ├── conf.py
│       ├── contributing.rst
│       ├── index.rst
│       ├── installation.rst
│       └── usage.ipynb
├── LICENSE
├── pyproject.toml
├── README.md
├── src
│   └── partypy
│       ├── __init__.py
│       └── partypy.py
└── tests
    ├── __init__.py
    └── test_partypy.py
```

This simple step has given us a file and directory structure suitable for building a fully-featured Python package. While there are quite a few files here, at this point we only need to worry about a few of these to get a working package together (we'll explore the others in later chapters). Specifically, we'll be working on:

1. `pyproject.toml`: the file that defines our project's metadata and dependencies and how it will eventually be built and distributed;
2. `src/partypy/partypy.py`: the file where we will write the Python functions that our package will distribute;
3. `tests/test_partypy.py`: the file where we will write tests to ensure that our package's functions work as we expect; and,
4. `docs/`: the directory where we will write and build documentation for our package.
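
If you'd like to confirm that the template produced the files we'll be working on, a short sketch like the one below can help. The helper name `missing_paths()` is hypothetical, and the relative paths assume you kept the package name `partypy`; adjust them if you chose a different name.

```python
from pathlib import Path

# Paths (relative to the project root) that this chapter works on.
# Assumes the package is named "partypy".
EXPECTED = [
    "pyproject.toml",
    "src/partypy/partypy.py",
    "tests/test_partypy.py",
    "docs",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [path for path in expected if not (root / path).exists()]


if __name__ == "__main__":
    # Run from the directory that contains the new "partypy" folder.
    print(missing_paths("partypy"))
```

An empty list means that every file and directory listed above was found.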

## Putting your project under version control

Before continuing to develop our package, it is good practice to put the project under local and remote version control, to better track changes to the project over time and to facilitate collaboration (if desired). The tools we recommend using for this are Git & GitHub (which we set up in **Chapter 2: {ref}`02:System-setup`**).

```{note}
For this book, we assume readers have [basic Git skills](https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository).
```

### Set up local version control

To set up local version control from a terminal, enter the root `partypy` directory, and initialize the project as a repository to be tracked by Git using:

```{prompt} bash \$ auto
$ cd partypy
$ git init
```

```console
Initialized empty Git repository in /Users/tbeuzen/partypy/.git/
```

Next, we need to tell Git which files to track (which will be all of them at this point) and then commit these changes locally:

```{prompt} bash \$ auto
$ git add .
$ git commit -m "initial package setup"
```

```console
[master (root-commit) 8b4edcb] initial package setup
 19 files changed, 722 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 .readthedocs.yml
 create mode 100755 CONDUCT.rst
 create mode 100755 CONTRIBUTING.rst
 ...
 create mode 100644 src/partypy/__init__.py
 create mode 100644 src/partypy/partypy.py
 create mode 100644 tests/__init__.py
 create mode 100644 tests/test_partypy.py
```

### Set up remote version control

Now that we have set up our local version control, let's create a repository on [GitHub.com](https://github.com/) and set that as the remote version control home for this project. Head over to [GitHub.com](https://www.github.com) and create a new repository as demonstrated in the image below:

```{figure} images/set-up-github-1.png
---
width: 100%
name: 03-set-up-github-1
alt: Creating a new repository in GitHub.
---
Creating a new repository in GitHub.
```

To follow along with this tutorial, select the following options when setting up your GitHub repository: 

1. Give the GitHub repository the same name as your Python package and give it a short description;
2. Make the GitHub repository public; and,
3. **Do not** initialize the GitHub.com repository with a README file (we've already created our own README using `cookiecutter`).

```{figure} images/set-up-github-2.png
---
width: 100%
name: 03-github-2
alt: Setting up a new repository in GitHub.
---
Setting up a new repository in GitHub.
```

Next, copy the remote link to your repository and then use the commands below to link your local repository with the remote repository, and push your project to GitHub:

```{figure} images/set-up-github-3.png
---
width: 100%
name: 03-github-3
alt: Instructions on how to link local and remote repositories.
---
Instructions on how to link local and remote repositories.
```

```{prompt} bash \$ auto
$ git remote add origin git@github.com:TomasBeuzen/partypy.git
$ git branch -M main
$ git push -u origin main
```

```console
Enumerating objects: 24, done.
Counting objects: 100% (24/24), done.
Delta compression using up to 8 threads
Compressing objects: 100% (18/18), done.
Writing objects: 100% (24/24), 9.76 KiB | 3.25 MiB/s, done.
Total 24 (delta 0), reused 0 (delta 0)
To github.com:TomasBeuzen/partypy.git
 * [new branch]      main -> main
Branch 'main' set up to track remote branch 'main' from 'origin'.
```

```{note}
Adapt the commands above to use your own GitHub username and the name of your Python package.

Further, the example above uses SSH authentication with GitHub which we recommend setting up. SSH is useful for connecting to GitHub without having to supply your username and password every time. If you're interested in setting up SSH, take a look at the [GitHub documentation](https://docs.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh). If you don't have SSH authentication set up, HTTPS authentication works as well and would require the use of the following url in place of the one shown above to set the remote: `https://github.com/TomasBeuzen/partypy.git`. 
```

## Creating a virtual environment

Before we get started writing the Python code for our package, it is good practice to set up a virtual environment for our project. Recall that a virtual environment will help isolate our package and its dependencies from other software installed on our computer. There are several options available when it comes to creating and managing virtual environments but `conda` (which we installed back in **Chapter 2: {ref}`02:System-setup`**) is a simple, commonly-used, and effective tool for managing virtual environments.

To use `conda` to create and activate a new virtual environment called `partypy` that includes Python 3.9, run the following in your terminal:

```{prompt} bash \$ auto
$ conda create --name partypy python=3.9 -y
```

To use this new environment for developing and installing software, we should "activate" the environment:

```{prompt} bash \$ auto
$ conda activate partypy
```

By default, `conda` will add a prefix, `(partypy)` in this case, to your terminal prompt to indicate which environment you are working in. Anytime you wish to work on this package, you should activate this environment using the command above.
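
If you're ever unsure which environment a Python session is using, you can also check from within Python itself. The sketch below relies on `sys.prefix`, `sys.executable`, and the `CONDA_DEFAULT_ENV` environment variable that `conda activate` sets; the helper name `describe_environment()` is our own invention for this example.

```python
import os
import sys


def describe_environment():
    """Collect a few facts about the Python environment in use."""
    return {
        # Name of the active conda environment, or None if the variable is unset.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Root directory of the current Python installation/environment.
        "prefix": sys.prefix,
        # Full path of the interpreter that is running this code.
        "executable": sys.executable,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

With the `partypy` environment active, we would expect `conda_env` to read `partypy` and `prefix` to point to that environment's directory.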

## Adding dependencies

Let's review the steps we've taken so far:
1. Set up our Python package structure using `cookiecutter`;
2. Put our project under local and remote version control using Git and GitHub; and,
3. Created a virtual environment called `partypy` for our project.

We're now ready to start writing the code for our package. Often, you'll know what other packages your package will depend on before even writing any code. For example, our `partypy` package is going to leverage the `numpy` and `pandas` packages. Thus, before we get started we should install these dependencies and record them as part of our package's metadata. It's fine if you don't know what dependencies your package will have in advance; you will be able to add new dependencies as you need them using the same workflow shown below (we'll also do this later in the chapter). We will use the command `poetry add` to add the `numpy` and `pandas` dependencies to our package now. This command will install packages into the current environment and update the `[tool.poetry.dependencies]` section of the `pyproject.toml` file, which currently only lists Python as a project dependency:

```toml
[tool.poetry.dependencies]
python = "^3.9"
```

Let's add `numpy` and `pandas` as dependencies now by running the following in the terminal:

```{prompt} bash \$ auto
$ poetry add numpy pandas
```

```console
Using version ^1.20.2 for numpy
Using version ^1.2.4 for pandas

Updating dependencies
Resolving dependencies... (0.2s)

Writing lock file

Package operations: 5 installs, 0 updates, 0 removals

  • Installing six (1.15.0)
  • Installing numpy (1.20.2)
  • Installing python-dateutil (2.8.1)
  • Installing pytz (2021.1)
  • Installing pandas (1.2.4)
```

Now if we view our `pyproject.toml` file we see that `numpy` and `pandas` are listed as dependencies:

```toml
[tool.poetry.dependencies]
python = "^3.9"
numpy = "^1.20.2"
pandas = "^1.2.4"
```
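
The `^` in front of each version is a caret constraint: it allows updates that do not change the left-most non-zero part of the version, so `^1.20.2` means `>=1.20.2,<2.0.0`. The sketch below illustrates that rule for plain three-part versions only. The helpers `parse()`, `caret_bounds()`, and `satisfies_caret()` are hypothetical, written for this example, and are not part of `poetry`.

```python
def parse(version):
    """Turn a 'major.minor.patch' string into a tuple of three ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_bounds(version):
    """Return (lower, upper) bounds for a caret constraint such as ^1.20.2.

    The lower bound is inclusive and the upper bound is exclusive.
    """
    major, minor, patch = parse(version)
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return (major, minor, patch), upper


def satisfies_caret(candidate, constraint):
    """Check whether `candidate` falls inside the caret range of `constraint`."""
    lower, upper = caret_bounds(constraint)
    return lower <= parse(candidate) < upper


print(caret_bounds("1.20.2"))               # ((1, 20, 2), (2, 0, 0))
print(satisfies_caret("1.21.0", "1.20.2"))  # True
print(satisfies_caret("2.0.0", "1.20.2"))   # False
```

In other words, a constraint like `numpy = "^1.20.2"` accepts later 1.x releases of `numpy` but not a 2.0.0 release.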

Running `poetry add` actually changed two files, `pyproject.toml` (which we showed above and which records the dependencies of our project) and `poetry.lock` (a record of all the packages and exact versions of them that `poetry` downloaded for this project). These changes are important for our package, so let's commit them to local and remote version control:

```{prompt} bash \$ auto
$ git add pyproject.toml poetry.lock
$ git commit -m "add numpy and pandas as dependencies"
$ git push
```

```{note}
For readers who have used `requirements.txt` before with `pip` or `environment.yaml` with `conda`, you can think of `poetry.lock` as the `poetry` equivalent of those files.
```

## Your first package code

We're now ready to write some Python code for our package! Recall that the package we want to create will estimate guest attendance at a party using simulations. The core idea is to assign a "probability of attendance" to each guest invited to the party and then simulate their attendance as a Bernoulli random variable. You can think of this as modelling each guest's attendance by flipping a coin with two sides, "won't attend" and "attend", but we can specify the probability of the coin landing on "attend". We can also flip the coin as many times as we like (i.e., run as many simulations as we like).
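
To make the weighted-coin analogy concrete, here is a minimal sketch that uses only the standard library's `random` module rather than `numpy`. The helper `flip_attendance()` is hypothetical and exists only for this illustration.

```python
import random


def flip_attendance(probability, rng):
    """Return 1 ("attend") with the given probability, otherwise 0."""
    # rng.random() is uniform on [0, 1), so the comparison is True
    # with probability `probability`.
    return 1 if rng.random() < probability else 0


rng = random.Random(2021)  # seeded so the run is repeatable
flips = [flip_attendance(0.9, rng) for _ in range(10_000)]

# The share of 1's should be close to the probability we specified.
print(sum(flips) / len(flips))
```

The more times we flip the coin, the closer the share of "attend" outcomes tends to get to the probability we chose.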

We can run a Bernoulli simulation using the `binomial` function in the `numpy` library, with the argument `n=1` (for the statistically inclined, a Bernoulli random variable is the same as a Binomial random variable with a single trial). As an example, imagine we have a guest that we believe will attend our party with a probability of 0.9 (90%). We can simulate the attendance of that guest by first opening up an interactive Python interpreter:

```{prompt} bash \$ auto
$ python
```

Then running the following code:

```{prompt} python >>> auto
>>> import numpy as np
>>> np.random.binomial(n=1, p=0.9)
```

```python
1
```

A value of `1` indicates the guest attended the party and a value of `0` indicates the guest did not attend the party. If you run the above code several times, you will see many `1`'s and a `0` every now and then. Rather than just re-running our code, we can repeat our simulation more efficiently using the `size` argument of the `binomial()` function. Let's run it 10 times:

```{prompt} python >>> auto
>>> simulations = 10
>>> probability = 0.9
>>> results = np.random.binomial(n=1, p=probability, size=simulations)
>>> results
```

```python
array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
```

So our guest attended six of our simulated parties (there are six `1`'s). Now imagine we have three guests that we believe will attend our party with probabilities 0.3, 0.5, 0.9. We can simulate each guest's attendance in 10 simulations using the following code:

```{prompt} python >>> auto
>>> probability = [0.3, 0.5, 0.9]
>>> results = np.random.binomial(n=1, p=probability, size=(simulations, len(probability)))
>>> results
```

```python
array([[0, 1, 1],
       [0, 1, 0],
       [1, 0, 1],
       [0, 1, 1],
       [0, 0, 1],
       [0, 0, 1],
       [0, 1, 1],
       [0, 1, 1],
       [1, 1, 1],
       [1, 0, 1]])
```

The above array represents 10 simulations of three guests invited to our party: each row is one simulation and each column is one guest. We typically want to know how many total guests attended each simulated party, so we should take the sum of each simulation:

```{prompt} python >>> auto
>>> results.sum(axis=1)
```

```python
array([2, 1, 2, 2, 1, 1, 2, 2, 3, 2])
```
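
If the `axis=1` argument is unfamiliar, it may help to see the same calculation in plain Python. In the sketch below, each inner list is one simulation (one row of the array shown earlier) and each position within it is one guest; the variable name `attendance` is ours, chosen so that it does not overwrite `results` in your session.

```python
# The 10 x 3 array of simulated attendance from above, as nested lists.
attendance = [
    [0, 1, 1],
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
    [1, 1, 1],
    [1, 0, 1],
]

# Equivalent of summing along axis=1: total guests in each simulation.
guests_per_simulation = [sum(simulation) for simulation in attendance]
print(guests_per_simulation)  # [2, 1, 2, 2, 1, 1, 2, 2, 3, 2]

# Equivalent of summing along axis=0: simulations attended by each guest.
parties_per_guest = [sum(guest) for guest in zip(*attendance)]
print(parties_per_guest)  # [3, 6, 9]
```

Summing along `axis=1` collapses the guest dimension and leaves one total per simulation, which is what we want here.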

It would be nice to display this information in a clean, tabular format. We'll use a `pandas` dataframe for that:

```{prompt} python >>> auto
>>> import pandas as pd
>>> (pd.DataFrame({"Total guests": results.sum(axis=1),
                   "Simulation": range(1, simulations + 1)})
       .set_index("Simulation")
    )
```

```python
            Total guests
Simulation              
1                      2
2                      1
3                      2
4                      2
5                      1
6                      1
7                      2
8                      2
9                      3
10                     2
```

We now have a nice way to run simulations of guest attendance at a party! But we don't want to have to re-run all that code every time we want to run some simulations. Let's turn the code into a function called `simulate_party()` and execute it in our interactive Python session:

```{note}
This book assumes you know how to write and document functions in Python. To learn more about this see [Think Python, Chapter 3: Functions](http://greenteapress.com/thinkpython/html/thinkpython004.html) by Allen Downey.
```

```{prompt} python >>> auto
>>> def simulate_party(p, simulations = 500):
        """Simulate guest attendance at a party.

        The attendance of each guest is treated as a Bernoulli random variable
        with probability of attendance `p`. The total number of attending guests
        is summed up for each simulation.

        Parameters
        ----------
        p : array_like of floats
            Probability of attendance for each guest, >= 0 and <= 1.
        simulations : int, optional
            Number of simulations to run. By default, 500.

        Returns
        -------
        pandas.DataFrame
            DataFrame with total number of guests per simulation. 

        Examples
        --------
        >>> simulate_party([0.1, 0.5, 0.9], simulations=5)
                    Total guests
        Simulation              
        1                      2
        2                      1
        3                      2
        4                      2
        5                      1
        """
        result = np.random.binomial(n=1, p=p, size=(simulations, len(p))).sum(axis=1)
        return pd.DataFrame(
            {"Total guests": result, "Simulation": range(1, simulations + 1)}
        ).set_index("Simulation")
```

We can now use the function as follows:

```{prompt} python >>> auto
>>> results = simulate_party(p=[0.3, 0.5, 0.9], simulations = 10)
>>> results
```

```python
            Total guests
Simulation              
1                      2
2                      1
3                      2
4                      2
5                      1
6                      1
7                      2
8                      2
9                      3
10                     2
```

At this point, if you quit from the Python interactive session, the function we defined above will be lost and you will have to define it again in new sessions. The whole idea of a Python package is that we can store Python code, like our `simulate_party()` function, in an installable package, that will allow us, or others, to reuse the code at will in any project without having to rewrite it.

So, let's now add the `simulate_party()` function to our `partypy` package. Where should we put it? Let's review the structure of our Python project:

```
partypy
├── .gitignore
├── .readthedocs.yml
├── CHANGELOG.rst
├── CONDUCT.rst
├── CONTRIBUTING.rst
├── docs
├── LICENSE
├── pyproject.toml
├── README.md
├── src
│   └── partypy
│       ├── __init__.py
│       └── partypy.py
└── tests
```

All the code that we would like the user to run as part of our package should live inside the `src` directory. We'll discuss the layout of this package, including the `src` directory, more in **Chapter 4: {ref}`04:Package-structure-and-distribution`**. For a relatively small package with just a few functions, we would house them inside a single Python module (i.e., a `.py` file). Our template project directory structure already created and named such a module for us: `src/partypy/partypy.py`. Let's save our function there. Because our function depends on `numpy` and `pandas`, we should also be sure to import them at the top of the file. Here's what `src/partypy/partypy.py` should now look like:

```python
import numpy as np
import pandas as pd


def simulate_party(p, simulations=500):
    """Simulate guest attendance at a party.

    The attendance of each guest is treated as a Bernoulli random variable
    with probability of attendance `p`. The total number of attending guests
    is summed up for each simulation.

    Parameters
    ----------
    p : array_like of floats
        Probability of attendance for each guest, >= 0 and <= 1.
    simulations : int, optional
        Number of simulations to run. By default, 500.

    Returns
    -------
    pandas.DataFrame
        DataFrame with total number of guests per simulation.

    Examples
    --------
    >>> simulate_party([0.1, 0.5, 0.9], simulations=5)
                Total guests
    Simulation              
    1                      2
    2                      2
    3                      2
    4                      2
    5                      2
    """
    result = np.random.binomial(n=1, p=p, size=(simulations, len(p))).sum(
        axis=1
    )
    return pd.DataFrame(
        {"Total guests": result, "Simulation": range(1, simulations + 1)}
    ).set_index("Simulation")

```

## Test drive your package code

As stated earlier, the whole point of creating a package is so that we can easily reuse our code in any new Python project or interactive session. To test drive our `partypy` package, we can install it in our environment using `poetry install` from the root package directory:

```{prompt} bash \$ auto
$ poetry install
```

```console
Installing dependencies from lock file

No dependencies to install or update

Installing the current project: partypy (0.1.0)
```

```{note}
The above command will install `partypy` and its dependencies in the current virtual environment. Recall that we are working in the `partypy` environment which we activated by running `conda activate partypy` in the terminal.
```

Now, inside the root project directory we can open an interactive Python session:

```{prompt} bash \$ auto
$ python
```

Then import and use our `simulate_party` function from our `partypy` module with the following code:

```{prompt} python >>> auto
>>> from partypy.partypy import simulate_party
>>> simulate_party([0.1, 0.5, 0.9], simulations=5)
```

```python
            Total guests
Simulation              
1                      1
2                      2
3                      2
4                      2
5                      2
```

```{note}
The above syntax is telling Python to import the function `simulate_party` from the `partypy` module of the `partypy` package. There are various other ways to import code from Python modules, which we'll explore more in **Chapter 4: {ref}`04:Package-structure-and-distribution`**.
```
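
As a small preview of those other import styles, the sketch below reaches the same function in three equivalent ways. So that it runs without `partypy` installed, it uses the standard library's `email` package, its `utils` module, and the `parseaddr()` function as stand-ins for our package, module, and function.

```python
# 1. Import the function directly from the module (the style used above).
from email.utils import parseaddr

# 2. Import the module from the package and reach the function through it.
from email import utils

# 3. Import the module by its full dotted name, here with an alias.
import email.utils as email_utils

# All three names refer to the very same function object.
print(parseaddr is utils.parseaddr is email_utils.parseaddr)

print(parseaddr("Monty Python <monty@example.com>"))
```

For our package, the analogous forms would be `from partypy.partypy import simulate_party`, `from partypy import partypy`, and `import partypy.partypy`.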

Looks like everything is working! In the next section, we'll add some additional code and functionality to our package.

## Your second package code

For very simple packages, you may choose to add all your code into `partypy.py`. But more complex packages will benefit from better compartmentalisation and organisation of code into multiple, logical modules. To illustrate this point, we are going to add a plotting function to our `partypy` package which will plot a histogram of the simulation results output from our `simulate_party()` function. The code and workflow for creating a visualization is quite different to the simulation code we wrote previously, and it makes sense to create a new module to house the visualization code of our package. To that end, we're now going to rename `partypy.py` to `simulate.py` and create a new module called `plotting.py` such that our package will now comprise two modules, each containing code for a distinct purpose:
1. `src/partypy/plotting.py`: contains code related to producing visualizations; and,
2. `src/partypy/simulate.py`: contains code related to running simulations.

With those changes, here's the structure of our Python project:

```
partypy
├── .gitignore
├── .readthedocs.yml
├── CHANGELOG.rst
├── CONDUCT.rst
├── CONTRIBUTING.rst
├── docs
├── LICENSE
├── pyproject.toml
├── README.md
├── src
│   └── partypy
│       ├── __init__.py
│       ├── plotting.py
│       └── simulate.py
└── tests
```

We'll be using the `altair` library to make our visualization (but you could of course use any visualization library you like). Let's first add `altair` as a dependency of our package, along with `altair_viewer`, which we'll use to display charts from a plain Python session:

```{prompt} bash \$ auto
$ poetry add altair altair_viewer
```

Open up an interactive Python session, and try out the following code to produce a visualization. There is an [example list of guests](https://github.com/UBC-MDS/py-pkgs/tree/master/py-pkgs/data/example-guest-list.csv) in the GitHub repository housing this book which we'll load in for this demonstration:

```{prompt} bash \$ auto
$ python
```

```{prompt} python >>> auto
>>> import pandas as pd
>>> guest_list = pd.read_csv("example-guest-list.csv")
>>> guest_list
```

```python
               Name  Probability of attendance
0    Donovan Willis                       0.70
1   Jocelyn Navarro                       0.70
2     Houston Stein                       0.90
3    Carlos Mullins                       0.50
4    Bridger Pruitt                       0.70
..              ...                        ...
95   Maddox Santana                       0.50
96    Ariel Proctor                       0.50
97       Pedro Hull                       0.90
98  Janessa Collins                       0.95
99   Kendrick Burke                       0.30
```

```{prompt} python >>> auto
>>> import altair as alt
>>> from partypy.simulate import simulate_party
>>> results = simulate_party(guest_list["Probability of attendance"], simulations=500)
>>> histogram = (
        alt.Chart(results)
        .mark_bar()
        .encode(
            x=alt.X(
                "Total guests",
            ),
            y="count()",
            tooltip="count()",
        )
    )
>>> histogram.show()
```

```{figure} images/altair-plot-1.png
---
width: 50%
name: 03-altair-plot-1b
alt: Histogram of simulation results.
---
Histogram of simulation results.
```

To add this plotting functionality to our package, we can add the following code to `plotting.py`:

```python
import altair as alt


def plot_simulation(results):
    """Plot a histogram of simulation results.

    Parameters
    ----------
    results : pandas.DataFrame
        DataFrame of simulation results from `partypy.simulate.simulate_party()`.

    Returns
    -------
    altair.Chart
        Histogram of simulation results.

    Examples
    --------
    >>> from partypy.simulate import simulate_party
    >>> from partypy.plotting import plot_simulation
    >>> results = simulate_party([0.1, 0.5, 0.9])
    >>> plot_simulation(results)
    alt.Chart(...)
    """

    histogram = (
        alt.Chart(results)
        .mark_bar()
        .encode(
            x=alt.X(
                "Total guests",
                bin=alt.Bin(maxbins=30),
                axis=alt.Axis(format=".0f"),
            ),
            y="count()",
            tooltip="count()",
        )
    )

    return histogram

```

Let's make sure everything is working by first installing our updated package using `poetry install`:

```{prompt} bash \$ auto
$ poetry install
```

Open an interactive Python session:

```{prompt} bash \$ auto
$ python
```

Then import and use our package's functions with the following code:

```{prompt} python >>> auto
>>> from partypy.simulate import simulate_party
>>> from partypy.plotting import plot_simulation
>>> results = simulate_party([0.1, 0.5, 0.9], simulations=20)
>>> histogram = plot_simulation(results)
>>> histogram
```

```python
alt.Chart(...)
```

```{note}
Altair requires a JavaScript frontend to display charts. Notebook environments like Jupyter Notebook, JupyterLab, and Zeppelin combine a Python backend with a JavaScript frontend, so they can display Altair charts out-of-the-box. But when working in the Python interpreter from the command line, we need to explicitly call the `.show()` method, which will display our chart in the browser: `histogram.show()`.
```

Now that we have a working package, you can exit your Python session and commit the changes to local and remote version control. We'll use the shorthand `git add .` here to stage all our changed files before committing:

```{prompt} bash \$ auto
$ git add .
$ git commit -m "first working version of partypy"
$ git push
```

## Writing tests

At this point we have a package, `partypy`, which we can install locally in any environment and use in any project we wish. But to make our package robust and to ensure it does in fact do what it is supposed to do, we should write some formal unit tests. We'll discuss testing in detail in **Chapter 5: {ref}`05:Testing`**, but will go over the key steps here. In Python packages, tests typically live inside the `tests` directory, in a file called `test_<module_name>.py`. Thus for the `partypy` package this is `tests/test_partypy.py`. Let's add the unit tests below (`test_version()`, `test_simulate_party()`, and `test_plot_simulation()`) for our `partypy` package to `tests/test_partypy.py` now. The tests themselves are fairly self-explanatory - take a moment to look at the code and convince yourself that these tests are testing some of the expected behaviour of our `partypy` package:

```python
from partypy import __version__
from partypy.simulate import simulate_party
from partypy.plotting import plot_simulation
import pandas as pd
import altair as alt


def test_version():
    assert __version__ == "0.1.0"


def test_simulate_party():
    assert isinstance(simulate_party([0]), pd.DataFrame)
    assert simulate_party([0], 10)["Total guests"].sum() == 0
    assert simulate_party([1], 10)["Total guests"].sum() == 10


def test_plot_simulation():
    results = simulate_party([0])
    plot = plot_simulation(results)
    assert isinstance(plot, alt.Chart)
    assert plot.mark == "bar"
    assert plot.data["Total guests"].sum() == 0
```
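To see why the boundary cases in `test_simulate_party()` can be asserted exactly even though the simulation is random, here is a minimal sketch of the simulation idea using only the standard library. It is not the implementation of `partypy` (which returns a `pandas` DataFrame); the function name `simulate_totals` and its `seed` argument are hypothetical and exist only for illustration:

```python
import random


def simulate_totals(probabilities, simulations=1, seed=None):
    """Return the number of attending guests for each simulated party."""
    rng = random.Random(seed)  # passing a seed makes runs reproducible
    totals = []
    for _ in range(simulations):
        # rng.random() lies in [0.0, 1.0), so a guest with p=0 never
        # attends and a guest with p=1 always attends
        attending = sum(rng.random() < p for p in probabilities)
        totals.append(attending)
    return totals


print(sum(simulate_totals([0], 10)))  # 0
print(sum(simulate_totals([1], 10)))  # 10
```

Probabilities of exactly 0 and 1 give deterministic results, which is what makes them convenient inputs for unit tests of otherwise random code.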

While we could run our test functions by starting a Python session and importing and running them manually, it is much more efficient to automate the testing workflow. One way we can do this is to use the `pytest` package. A single call to `pytest` from the root of a project will, by default, look for files named `test_*.py` or `*_test.py` (such as those in our `tests` directory), import them, and then call all functions whose names are prefixed with `test`.

To try this out, we first add `pytest` as a development dependency via `poetry`:

```{prompt} bash \$ auto
$ poetry add --dev pytest
```

A development dependency is a package that is not required by a user to use your package, but is required for development purposes. The `--dev` flag in the above command marks `pytest` as a development dependency, rather than a dependency that our package's functions need at run time. If you look in `pyproject.toml` you will see that `pytest` was added under the `[tool.poetry.dev-dependencies]` section as opposed to the `[tool.poetry.dependencies]` section.

To run the above tests, we simply type the following in a terminal from our root package directory:

```{prompt} bash \$ auto
$ pytest
```

```console
============================= test session starts ==============================
platform darwin -- Python 3.9.2, pytest-6.2.3, py-1.10.0, pluggy-0.13.1
rootdir: /Users/tomasbeuzen/GitHub/py-pkgs/partypy
collected 0 items / 1 error                                                    

==================================== ERRORS ====================================
____________________ ERROR collecting tests/test_partypy.py ____________________
ImportError while importing test module '~/partypy/tests/test_partypy.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
~/envs/partypy/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_partypy.py:2: in <module>
    from partypy.simulate import simulate_party
E   ModuleNotFoundError: No module named 'partypy.simulate'
=========================== short test summary info ============================
ERROR tests/test_partypy.py
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
=============================== 1 error in 0.04s ===============================
```

Why did we get an error? Well, these tests are run against the installed version of our package. We previously modified our package by including some plotting code and separating things out into separate modules, but we haven't re-installed it yet! Let's re-install our package and run our tests:

```{prompt} bash \$ auto
$ poetry install
$ pytest
```

```console
============================= test session starts ==============================
platform darwin -- Python 3.9.2, pytest-6.2.3, py-1.10.0, pluggy-0.13.1
rootdir: /Users/tomasbeuzen/GitHub/py-pkgs/partypy
collected 3 items                                                              

tests/test_partypy.py ...                                                [100%]

============================== 3 passed in 0.26s ===============================
```

`pytest` reports `3 passed` and no errors, indicating that all of our tests passed! This suggests that the code we wrote is correct (at least according to our test specifications). We'll explore writing tests in more detail in **Chapter 5: {ref}`05:Testing`**. For now, let's put our tests under local and remote version control:

```{prompt} bash \$ auto
$ git add pyproject.toml poetry.lock tests/test_partypy.py
$ git commit -m "add unit tests for partypy"
$ git push
```
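The `ModuleNotFoundError` we hit earlier came from the gap between the source code on disk and the version of the package installed in the environment. One way to check what the current environment can actually import is `importlib.util.find_spec` from the standard library. The helper name `is_importable` below is hypothetical, and standard-library module names are used so that the sketch runs anywhere:

```python
import importlib.util


def is_importable(module_name):
    """Return True if `module_name` can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name cannot be found
        return False


print(is_importable("json.decoder"))         # True
print(is_importable("json.no_such_module"))  # False
```

In our package environment, we would expect `is_importable("partypy.simulate")` to return `False` before re-installing the package and `True` afterwards.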

## Package documentation

### Rendering documentation locally

For the users of your code (including your future self) it is important to have readable and accessible documentation describing how to install your package, and how to use the code within it. We'll discuss documentation in detail in **Chapter 6: {ref}`06:Documentation`**, but for now, we will demonstrate the practical steps required to build your package's documentation.

Currently, the most commonly used tool in the Python packaging ecosystem for making documentation is `sphinx`. It is a "documentation generator" that translates a set of plain-text source files into various output formats (such as HTML or PDF). The default plain-text format used in `sphinx` is [reStructuredText](https://docutils.sourceforge.io/rst.html) which is an easy-to-use, many-featured markup language (meaning that it defines how the document should appear once it's processed for display). We'll discuss reStructuredText files (with a .rst extension) further in **Chapter 6: {ref}`06:Documentation`**. 

Writing quality documentation can sometimes take longer than writing the code that you're documenting - but the effort is worth it. The aim here is to provide an overview and introduction to generating documentation, which you can build upon, if required, for future projects. When we used `cookiecutter` to set up our Python package structure, a `docs` directory was automatically created for us and filled with some basic documentation files:

```
partypy
├── .gitignore
├── .readthedocs.yml
├── CHANGELOG.rst
├── CONDUCT.rst
├── CONTRIBUTING.rst
├── docs
│   ├── make.bat
│   ├── Makefile
│   ├── requirements.txt
│   └── source
│       ├── changelog.rst
│       ├── conduct.rst
│       ├── conf.py
│       ├── contributing.rst
│       ├── index.rst
│       ├── installation.rst
│       └── usage.ipynb
├── LICENSE
├── pyproject.toml
├── README.md
├── src
└── tests
```

There are many ways to structure your documentation and we'll discuss some of these, and the files above, in more detail in **Chapter 6: {ref}`06:Documentation`**, but the structure shown above is common and intuitive. Briefly:
- `make.bat` and `Makefile` contain commands needed to build our documentation;
- `requirements.txt` lists the documentation-specific dependencies that we'll need when it comes to hosting our documentation online later in this chapter;
- The `source` directory contains the actual files that will make up our documentation. Our documentation will be composed of `.rst` files and `.ipynb` (notebook) files. The notebooks contain Python code, which a `sphinx` extension will help us execute, capturing the output and rendering it into our documentation.

These template files provided by the `cookiecutter` have already been formatted and pre-populated with your package's information, so even if you aren't familiar with `.rst`, it should be fairly straightforward to see how to modify them if desired. At this point, you'll likely want to edit these files to add more information about your package, but for the purpose of this tutorial we'll generate our documentation as is. To help us render all our individual documentation files into a single, coherent, easy-to-access document, we first need to install some new development dependencies:
- `sphinx`: the core `sphinx` package;
- `nbsphinx` and `ipykernel`: packages required to help us render notebooks into our documentation;
- `sphinx-autoapi`: package that will help us extract docstrings from our code and render them into our documentation; and,
- `sphinx-rtd-theme`: a custom theme for styling the way our documentation will look.

```{prompt} bash \$ auto
$ poetry add --dev sphinx nbsphinx ipykernel sphinx-autoapi sphinx-rtd-theme
```

It is typical to render documentation to `.html` for easy viewing and for sharing online. To do that, run the following:

```{prompt} bash \$ auto
$ make html -C docs
```

```{note}
Note that to use `sphinx` extensions like we are doing here, we usually have to add them to the list `extensions = [...]` in the `conf.py` file in the `docs/source` directory. However, the `cookiecutter` template already took care of this for us.
```
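Because `conf.py` is an ordinary Python file, enabling extensions is just a matter of assigning a few variables. The sketch below shows roughly what the relevant lines might look like for the packages we installed above. The file generated by the `cookiecutter` may differ, and the `autoapi_dirs` path in particular is an assumption based on our directory layout:

```python
# Sketch of the relevant part of docs/source/conf.py.
# The generated file may differ; the autoapi path below is an assumption.
extensions = [
    "nbsphinx",           # render Jupyter notebooks such as usage.ipynb
    "autoapi.extension",  # sphinx-autoapi: build API pages from docstrings
]

# Tell sphinx-autoapi where the package source lives, relative to conf.py
autoapi_dirs = ["../../src"]

# Use the Read the Docs theme provided by sphinx-rtd-theme
html_theme = "sphinx_rtd_theme"
```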

If we now look inside our `docs` directory we see a new directory `_build/html` which contains the rendered `.html` files. We can open `_build/html/index.html` to view our documentation:

```{figure} images/documentation-1.png
---
width: 100%
name: 03-documentation-1
alt: The rendered documentation homepage.
---
The rendered documentation homepage.
```

The `sphinx-autoapi` extension extracted the docstrings within each module and rendered them into our documentation. You can find them by clicking "API Reference". For example, here are the functions and docstrings extracted from the `partypy.plotting` module (note there is currently only one function and docstring in this module):

```{figure} images/documentation-2.png
---
width: 100%
name: 03-documentation-2
alt: Documentation for the partypy plotting module.
---
Documentation for the partypy plotting module.
```
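Documentation tools can pick up docstrings because a docstring is simply the first string literal in a function body, and Python keeps it attached to the function. The sketch below uses a hypothetical function, `count_attendees`, with a NumPy-style docstring and reads the docstring back with `inspect.getdoc`. It does not describe how `sphinx-autoapi` works internally; it only shows that docstrings are machine-readable:

```python
import inspect


def count_attendees(attendance):
    """Count how many guests attended.

    Parameters
    ----------
    attendance : list of bool
        One entry per guest, True if the guest attended.

    Returns
    -------
    int
        Number of guests who attended.
    """
    return sum(attendance)


# inspect.getdoc returns the docstring with its indentation cleaned up
doc = inspect.getdoc(count_attendees)
print(doc.splitlines()[0])  # Count how many guests attended.
```

The more complete and consistent your docstrings are, the more useful the automatically generated API reference will be.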

You can easily and efficiently make beautiful and insightful documentation with `sphinx` and its ecosystem of extensions. We'll discuss this more in **Chapter 6: {ref}`06:Documentation`**, but for now let's commit our work to local and remote version control:

```{prompt} bash \$ auto
$ git add .
$ git commit -m "generated and rendered docs for local viewing"
$ git push
```

### Rendering documentation online

If you intend to share your package with others, it will be useful to make your documentation accessible online. There are various ways to do this, but one of the most common and easiest ways is to link our GitHub repository to [Read the Docs](https://readthedocs.org/) - a service for automating the building, versioning, and hosting of documentation. To do this (at the time of writing):

1. Visit <https://readthedocs.org/> and click on "Sign up";
2. Select "Sign up with GitHub";
3. Click "Import a Project";
4. Click "Import Manually";
5. Fill in the project details by:
    1. Providing your package name (e.g., `partypy`);
    2. Providing the GitHub repository URL (e.g., `https://github.com/TomasBeuzen/partypy`); and,
    3. Specifying the default branch as `main`.
6. Click "Next" and then "Build version".

After following the steps above, your documentation should be successfully built by [Read the Docs](https://readthedocs.org/) and you should be able to access it via the "View Docs" button on the build page, or from the link that the `cookiecutter` created for you at the top of the `README.md` file in your GitHub repository. For example, the documentation for our package is now available at <https://partypy.readthedocs.io/en/latest/>. This documentation will be automatically re-built each time you push changes to the specified default branch (`main` for us) of your GitHub repository.

```{note}
The `.readthedocs.yml` file that `cookiecutter` created for us in the root directory of our Python package contains some basic configuration settings for how Read the Docs should build our documentation. Importantly, this file tells Read the Docs that our documentation depends on packages specified in `docs/requirements.txt`.
```

## Building and publishing your package

### TestPyPI

Python packages are generally shared via the [Python Package Index (PyPI)](https://pypi.org/). However, we typically do a "dry run" and check that everything works as expected by submitting to [TestPyPI](https://test.pypi.org/) first. `poetry` has a `publish` command which we can use to do this; however, its default behaviour is to publish to PyPI. So we need to add TestPyPI to the list of repositories `poetry` knows about via:

```{prompt} bash \$ auto
$ poetry config repositories.test-pypi https://test.pypi.org/legacy/
```

Before we send our package to TestPyPI, we will first build it into source and wheel distributions (wheels being the preferred package format on PyPI and something we'll discuss further in **Chapter 4: {ref}`04:Package-structure-and-distribution`**) using `poetry build`:

```{prompt} bash \$ auto
$ poetry build
```

After running this command, you'll notice a new directory in your package called `dist`:

```
partypy
├── .gitignore
├── .readthedocs.yml
├── CHANGELOG.rst
├── CONDUCT.rst
├── CONTRIBUTING.rst
├── dist
│   ├── partypy-0.1.0-py3-none-any.whl
│   └── partypy-0.1.0.tar.gz
├── docs
├── LICENSE
├── pyproject.toml
├── README.md
├── src
└── tests
```

Those two new files are the "built" versions of your package which can be easily distributed and installed by others. To publish to TestPyPI we can use `poetry publish` (you will be prompted for your TestPyPI username and password; sign up if you have not already done so):

```{prompt} bash \$ auto
$ poetry publish -r test-pypi
```

```console
Username: TomasBeuzen
Password: 
Publishing partypy (0.1.0) to test-pypi
 - Uploading partypy-0.1.0-py3-none-any.whl 100%
 - Uploading partypy-0.1.0.tar.gz 100%
```

```{note}
It is recommended to use API tokens when uploading packages to PyPI rather than a username and password. You can read more about that in the [PyPI documentation](https://pypi.org/help/#apitoken).
```
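As a small aside on the files in `dist`, the name of a wheel file follows a standard pattern, `{name}-{version}-{python tag}-{abi tag}-{platform tag}.whl`, with an optional build tag after the version. The sketch below splits our wheel's filename into those parts; the helper name `parse_wheel_filename` is hypothetical and the parsing is deliberately simplified:

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    # An optional build tag, when present, sits between the version
    # and the Python tag, so the last three parts are always the tags
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


info = parse_wheel_filename("partypy-0.1.0-py3-none-any.whl")
print(info["python_tag"], info["abi_tag"], info["platform_tag"])  # py3 none any
```

The tags `py3`, `none` and `any` indicate a pure-Python wheel: it targets Python 3, has no ABI requirement, and is not tied to a particular platform.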

Now we should be able to visit our package on TestPyPI (for example, the URL for our package is: <https://test.pypi.org/project/partypy/>) and install it from there using `pip` via:

```{prompt} bash \$ auto
$ pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple partypy
```

```{note}
By default `pip install` will search PyPI for the named package. However, we want to search TestPyPI because that is where we uploaded our package. The argument `--index-url` points `pip` to the TestPyPI index. However, our package `partypy` depends on some packages, like `pandas`, which can't be found on TestPyPI (they are hosted only on PyPI). So, we need to use the `--extra-index-url` argument to also point `pip` to PyPI so that it can pull any necessary dependencies of `partypy` from there.
```

### PyPI

If you're happy to officially share your package with the world, you can publish to PyPI by simply typing:

```{prompt} bash \$ auto
$ poetry publish
```

Your package will then be available on PyPI (e.g., <https://pypi.org/project/partypy/>) and can be installed with `pip`:

```{prompt} bash \$ auto
$ pip install partypy
```
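After installing from either index, one quick way to confirm which version ended up in your environment is to query the installed package metadata with `importlib.metadata` from the standard library (Python 3.8+). The helper name `installed_version` is hypothetical:

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed version of partypy if there is one, otherwise None
print(installed_version("partypy"))
```

If the printed version is not the one you just published, the environment is most likely still using an older install of the package.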

## Summary and next steps

This chapter provided a practical overview of the key steps required to generate a fully-featured Python package. In the following chapters we'll explore and expand upon each of these steps in more detail. For those intending to share and collaborate on their package with others, a key workflow we have yet to discuss is CI/CD, that is, setting up automated pipelines for running tests, building documentation, and building and deploying your package. Such pipelines are an essential part of open-source software and allow you to efficiently collaborate on software with others while maintaining package standards and functionality.

Before moving on to the next chapter, let's summarise the steps we took to develop a Python package in this chapter:

1. Create package structure using a `cookiecutter` template:
    ```{prompt} bash \$ auto
    $ cookiecutter https://github.com/UBC-MDS/cookiecutter-ubc-mds.git
    ```
2. Create and activate a virtual environment using `conda`:
    ```{prompt} bash \$ auto
    $ conda create --name <your-env-name> python=3.9 -y
    $ conda activate <your-env-name>
    ```
3. Add package dependencies:
    ```{prompt} bash \$ auto
    $ poetry add <packages>
    ```
4. Write package code in the `src/` directory.
5. (Optional) Write tests in the `tests/` directory. Add `pytest` as a development dependency, install the package, and run the tests:
    ```{prompt} bash \$ auto
    $ poetry add --dev pytest
    $ poetry install
    $ pytest
    ```
6. (Optional) Create documentation source files and render locally:
    ```{prompt} bash \$ auto
    $ cd docs
    $ make html
    ```
7. (Optional) Host documentation online with [Read the Docs](https://readthedocs.org/).
8. (Optional) Build package for distribution and publish to [TestPyPI](https://test.pypi.org/):
    ```{prompt} bash \$ auto
    $ poetry config repositories.test-pypi https://test.pypi.org/legacy/
    $ poetry build
    $ poetry publish -r test-pypi
    ```
9. (Optional) Install package from TestPyPI to ensure everything is working as expected:
    ```{prompt} bash \$ auto
    $ pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple partypy
    ```
10. (Optional) Publish to [PyPI](https://pypi.org/):
    ```{prompt} bash \$ auto
    $ poetry publish
    ```
11. Your package can now be installed by anyone:
    ```{prompt} bash \$ auto
    $ pip install <your-package-name>
    ```
    
The above workflow uses a particular suite of tools (e.g., `conda`, `poetry`, `sphinx`, etc.) to develop a Python package. While there are other tools that can be used to help build Python packages, the aim of this book is to give an intuitive and simple introduction to Python packaging using modern, popular tools, and this has influenced our selection of tools in this chapter and book. However, the concepts and workflow discussed here remain relevant to the Python packaging ecosystem, regardless of the tools you use to develop your Python packages.