# Developing Python Packages

## Introduction

These are my notes for DataCamp's course [_Developing Python Packages_](https://www.datacamp.com/courses/developing-python-packages).

This course is presented by James Fulton, Climate Informatics Researcher. Collaborators are Amy Peterson and Maggie Matsui.

Prerequisites:

- Introduction to Shell
- [Writing Functions in Python](../Writing%20Functions%20in%20Python/Writing%20Functions%20in%20Python.ipynb)

This course is part of these tracks:

- Data Scientist Professional with Python
- Python Programmer

There are no downloadable data sets for this course.

## Versions

This notebook was created using Python 3.11.2.


## Imports
Imports are collected here for convenience and clarity.

In [None]:
import numpy as np
import scipy

import textanalysis.textanalysis

## From Loose Code to Local Package

### Starting a Package

#### Why Build a Package?

Build a package to:
- make your code easier to use
- avoid copying and pasting code
- keep your functions up to date
- give your code to others

#### Course Content

This course involves building a full package. The course covers:

- file layout
- structuring imports
- making the package installable
- adding licenses and READMEs
- style and unit tests for a high quality package
- registering and publishing your package to PyPI (the Python Package Index)
- using package templates

#### Scripts, Modules, and Packages

| Term | Description |
| :--- | :--- |
| script | a Python file that is run like `python myscript.py` and is designed to do one set of tasks |
| package | a directory of Python code files to be imported (e.g., `numpy`); all of the code is related and works together |
| subpackage | a smaller package inside a package (e.g., `numpy.random`, `numpy.linalg`) |
| module | a Python file inside a package which stores package code; each module stores some of the package code |
| library | either a package or a collection of packages (e.g., the Python standard library, which includes packages such as `math`, `os`, or `datetime`) |

#### Directory Tree of a Package

This is an example of a directory as used in this course:

    mysimplepackage/
    |-- simplemodule.py
    |-- __init__.py

- This directory, `mysimplepackage`, is an example of the simplest kind of Python package
- `simplemodule.py` contains all of the package code
- `__init__.py` marks this directory as a Python package

Initially, the `__init__.py` file is completely empty, but later in the course this file will be used to structure the package imports.
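
As a quick check of the layout above, the following sketch builds `mysimplepackage` in a temporary directory and imports its module. The function `greet` and the temporary location are assumptions made for illustration; they are not part of the course.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build the package in a throwaway directory (an assumption for this demo).
root = Path(tempfile.mkdtemp())
pkg = root / "mysimplepackage"
pkg.mkdir()

# The empty __init__.py marks the directory as a package.
(pkg / "__init__.py").write_text("")

# simplemodule.py holds the package code; greet() is a hypothetical function.
(pkg / "simplemodule.py").write_text(
    "def greet(name):\n"
    "    return f'Hello, {name}!'\n"
)

# Make the parent directory importable, then import the module.
sys.path.insert(0, str(root))
simplemodule = importlib.import_module("mysimplepackage.simplemodule")

message = simplemodule.greet("world")
print(message)
```

The key point is that the directory containing `mysimplepackage/` must be on `sys.path` (or be the current directory) for the import to succeed.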

#### Subpackages

The directory tree for a package can contain subdirectories, as in this example, where `preprocessing` and `regression` are subdirectories of `mysklearn`:

    mysklearn/
    |-- __init__.py
    |-- preprocessing
    |   |-- __init__.py
    |   |-- normalize.py
    |   |-- standardize.py
    |-- regression
    |   |-- __init__.py
    |   |-- regression.py
    |-- utils.py

Each subpackage has its own `__init__.py` file. Use subpackages to organize your code, placing related functions and classes in the same module, and related modules in the same subpackage.
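
To make the naming rules concrete, here is a small sketch that recreates the `mysklearn` tree as empty files in a temporary directory and classifies each entry. It assumes only the rule stated above: a directory with an `__init__.py` file is a package or subpackage, and every other `.py` file is a module.

```python
import tempfile
from pathlib import Path

# Recreate the mysklearn layout shown above as empty files.
root = Path(tempfile.mkdtemp())
files = [
    "mysklearn/__init__.py",
    "mysklearn/preprocessing/__init__.py",
    "mysklearn/preprocessing/normalize.py",
    "mysklearn/preprocessing/standardize.py",
    "mysklearn/regression/__init__.py",
    "mysklearn/regression/regression.py",
    "mysklearn/utils.py",
]
for name in files:
    path = root / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()

top = root / "mysklearn"

# A directory below the top level that contains __init__.py is a subpackage.
subpackages = sorted(
    p.parent.relative_to(top).as_posix()
    for p in top.rglob("__init__.py")
    if p.parent != top
)

# Every other .py file is a module.
modules = sorted(
    p.relative_to(top).as_posix()
    for p in top.rglob("*.py")
    if p.name != "__init__.py"
)

print("subpackages:", subpackages)
print("modules:", modules)
```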

#### Modules, Packages, and Subpackages (Exercise)

Name the different parts of this package directory tree:

    directory1/
    |-- __init__.py
    |-- directory2
    |   |-- __init__.py
    |   |-- file1.py
    |-- file2.py

- Module
    - file1.py
    - file2.py
- Package
    - directory1
- Subpackage
    - directory2

#### From Script to Package (Exercise)

Start with this code and convert it to a generalized function you can use on any text file for any list of search words. This will be the first function in a new library (module or package). This code comes from the course [Writing Functions in Python](../Writing%20Functions%20in%20Python/Writing%20Functions%20in%20Python.ipynb).
```python
# Open the text file
with open('alice.txt') as file:
    text = file.read()

n = 0
for word in text.split():
    # Count the number of times the words in the list appear
    if word.lower() in ['cat', 'cats']:
        n += 1

print('Lewis Carroll uses the word "cat" {} times'.format(n))
```
- Step 1: Create a new directory called `textanalysis` for your package.
- Step 2: Create `__init__.py` and `textanalysis.py` modules inside `textanalysis`.
- Step 3: Copy the code from `myscript.py` into `textanalysis.py`.
- Step 4: Modify `textanalysis.py` to create the function `count_words(filepath, words_list)`, which opens the text file `filepath` and returns the number of times the words in `words_list` appear.

The file `textanalysis/__init__.py` was empty.

This is the code for file `textanalysis/textanalysis.py`:

```python
def count_words(filepath, words_list):
    with open(filepath) as file:
        text = file.read()
    
    n = 0
    for word in text.split():
        # Count the number of times the words in the list appear.
        if word.lower() in words_list:
            n += 1
    return n
```
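
The sketch below exercises `count_words` on a small file so its behavior is easy to verify by hand. The sample sentence and file name are made up for this example; the function body is copied from above.

```python
import tempfile
from pathlib import Path


def count_words(filepath, words_list):
    with open(filepath) as file:
        text = file.read()

    n = 0
    for word in text.split():
        # Count the number of times the words in the list appear.
        if word.lower() in words_list:
            n += 1
    return n


# Write a small sample file (hypothetical text, not the course data).
sample = Path(tempfile.mkdtemp()) / "sample.txt"
sample.write_text("The Cat sat. Two cats ran, but the cat, alone, slept.\n")

count = count_words(sample, ["cat", "cats"])
print(count)
```

Because `text.split()` splits only on whitespace, the token `cat,` keeps its comma and is not counted, so this prints 2 rather than 3. Counts produced by this function are therefore a lower bound when words are followed by punctuation.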

#### Putting Your Package to Work (Exercise)

Create a script `newscript.py` that uses the `textanalysis` package. I couldn't find a way to get the contents of the file `hotel-reviews.txt`, since commands didn't work in the course's terminal window when I used the Safari web browser, so I was unable to recreate this example.

```python
# File myscript.py
from textanalysis.textanalysis import count_words

# Count the number of positive words
nb_positive_words = count_words("hotel-reviews.txt", ["good", "great"])

# Count the number of negative words
nb_negative_words = count_words("hotel-reviews.txt", ["bad", "awful"])

print("{} positive words.".format(nb_positive_words))
print("{} negative words.".format(nb_negative_words))
```

The result was:

    $ python3 newscript.py
    18816 positive words.
    1706 negative words.

For the course, this was the directory tree at this point:

    project/
    |-- hotel-reviews.txt
    |-- myscript.py
    |-- textanalysis/
    |   |-- __init__.py
    |   |-- textanalysis.py

#### Testing the Package (Extra)

I created a script, `count_cat.py`, to test the module.

```python
import textanalysis.textanalysis

print("Counting the word 'cat' in the 'alice.txt' file...")
count = textanalysis.textanalysis.count_words("alice.txt", ["cat", "cats"])
print("Count:", count)
```

I copied the `alice.txt` file from the ../Writing\ Python\ Functions/ directory.

```shell
cp ../Writing\ Python\ Functions/alice.txt .
```

I executed the `count_cat.py` script.

    $ python count_cat.py
    Counting the word 'cat' in the 'alice.txt' file...
    Count: 24

This was my directory tree so far:

    Developing\ Python\ Packages/
    |-- alice.txt
    |-- count_cat.py
    |-- textanalysis/
    |   |-- __init__.py
    |   |-- textanalysis.py

In [None]:
# Execute the code in this notebook.
print("Counting the word 'cat' in the 'alice.txt' file...")
count = textanalysis.textanalysis.count_words("alice.txt", ["cat", "cats"])
print("Count:", count)

### Documentation

#### Why Include Documentation?

Writing documentation helps your users use your code.

Document each:
- function
- class
- class method

Users can access the documentation for your package using the `help` function. Here is an example of obtaining documentation for the numpy package:

    $ python
    Python 3.11.2 (main, Mar 24 2023, 09:03:37) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    
    >>> import numpy as np
    
    >>> help(np)
    NAME
        numpy

    DESCRIPTION
        NumPy
        =====

        Provides
          1. An array object of arbitrary homogeneous items
    ...

    >>> help(np.ndarray)
    class ndarray(builtins.object)
     |  ndarray(shape, dtype=float, buffer=None, offset=0,
     |          strides=None, order=None)
     |
     |  An array object represents a multidimensional, homogeneous array
     |  of fixed-size items.  An associated data-type object describes the
     |  format of each element in the array (its byte-order, how many bytes it
     |  occupies in memory, whether it is an integer, a floating point number,
     |  or something else, etc.)
    ...
    
    >>> help(np.sum)
    sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
        Sum of array elements over a given axis.

        Parameters
        ----------
        a : array_like
            Elements to sum.
        axis : None or int or tuple of ints, optional
            Axis or axes along which a sum is performed.  The default,
    ...
    
    >>> x = np.array([1, 2, 3, 4])
    
    >>> help(x.mean)
    Help on built-in function mean:

    mean(...) method of numpy.ndarray instance
        a.mean(axis=None, dtype=None, out=None, keepdims=False, *, where=True)

        Returns the average of the array elements along given axis.

        Refer to `numpy.mean` for full documentation.
    ...

#### Function Documentation

The first line of the docstring states what the function does, phrased as an imperative.

```python
def count_words(filepath, words_list):
    """
    Count the total number of times these words appear.
    
    The count is performed on a text file at the given location.
    
    [explain what filepath and words_list are]
    
    [what is returned]
    """
```

#### Documentation Style

There are several standard documentation styles for Python, including Google style, NumPy style, reStructuredText style, and Epytext style. (Personally, I prefer NumPy style.) Be consistent with which style you use.

This course uses NumPy style, which is more verbose but good for documenting complex functions. NumPy style is used in scientific Python packages such as `numpy`, `scipy`, `pandas`, `sklearn`, `matplotlib`, `dask`, etc.

#### NumPy Documentation Style

NumPy documentation style is described here: https://numpydoc.readthedocs.io/en/latest/format.html.

Here is an example from the scipy package.

    >>> import scipy
    
    >>> help(scipy.percentile)
    Help on function percentile in module numpy:

    percentile(a, q, axis=None, out=None, overwrite_input=False, method='linear', keepdims=False, *, interpolation=None)
        Compute the q-th percentile of the data along the specified axis.

        Returns the q-th percentile(s) of the array elements.

        Parameters
        ----------
        a : array_like
            Input array or object that can be converted to an array.
        q : array_like of float
            Percentile or sequence of percentiles to compute, which must be between
            0 and 100 inclusive.
        axis : {int, tuple of int, None}, optional
            Axis or axes along which the percentiles are computed. The
            default is to compute the percentile(s) along a flattened
            version of the array.
    ...
        Returns
        -------
        percentile : scalar or ndarray
            If `q` is a single percentile and `axis=None`, then the result
            is a scalar. If multiple percentiles are given, first axis of
            the result corresponds to the percentiles. The other axes are
    ...

NumPy style underlines each section heading with hyphens. Parameters are listed as shown above: the parameter name, a space, a colon, a space, and the parameter's data type, where `array_like` means a `numpy.ndarray` object, a list, or a nested list. Below this, on an indented line, describe the parameter. If a parameter accepts several types, as shown above for `axis`, or only a fixed set of valid values, list them all. Indicate whether a parameter is optional.

The next section lists the returned values and their types in order. This is formatted just like the parameters list.

You can provide additional sections for `Raises`, `See Also`, `Notes`, `References`, `Examples`, etc.
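
As an illustration of these sections, here is a sketch of a complete NumPy-style docstring, including `Raises` and `Examples`. The function `scale_values` is hypothetical and exists only to carry the docstring; it does not come from the course or from any library.

```python
import inspect


def scale_values(values, factor=1.0):
    """
    Multiply each value by a constant factor.

    Parameters
    ----------
    values : list of float
        Numbers to scale.
    factor : float, optional
        Multiplier applied to every value. (Default value = 1.0)

    Returns
    -------
    list of float
        The scaled values, in the same order as `values`.

    Raises
    ------
    ValueError
        If `values` is empty.

    Examples
    --------
    >>> scale_values([1.0, 2.0], factor=3.0)
    [3.0, 6.0]
    """
    if not values:
        raise ValueError("values must not be empty")
    return [v * factor for v in values]


# inspect.getdoc returns the docstring with its indentation cleaned up.
doc = inspect.getdoc(scale_values)
print(doc.splitlines()[0])
```

Calling `help(scale_values)` in an interactive session displays the same docstring, laid out much like the `help(np.sum)` output shown earlier.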

#### Documentation Templates and Style Translation

`pyment` is a tool that can be used to generate docstrings and change the styles of docstrings. See https://github.com/dadadel/pyment for the source code and examples of how to use the tool. `pyment` is run from the terminal and generates many styles.

For example, to create docstrings for our `textanalysis.py` module (see above), the command is:

```shell
pyment -w -o numpydoc textanalysis/textanalysis.py
```

- `-w` : overwrite file
- `-o numpydoc` : output the documentation in NumPy style

#### Trying Out Pyment (Extra)

I installed `pyment` in the virtual environment used for this repository. This installed Pyment-0.4.0.dev0. (The repository owner and package maintainer has not worked on this package for two years.)

```shell
cd ~/src/conradhalling/datacamp
source venv/bin/activate
pip install git+https://github.com/dadadel/pyment.git
```

Since I had committed the `textanalysis` package to my Git repository, I could restore the `textanalysis/textanalysis.py` file if necessary.

I executed `pyment` on the `textanalysis/textanalysis.py` file.

```shell
cd ~/src/conradhalling/datacamp/Developing\ Python\ Packages/
pyment -w -o numpydoc textanalysis/textanalysis.py
```

When I wanted to restore the original file, I used this command:

```shell
git restore textanalysis/textanalysis.py
```

This was the `textanalysis/textanalysis.py` file before:

```python
def count_words(filepath, words_list):
    with open(filepath) as file:
        text = file.read()

    n = 0
    for word in text.split():
        # Count the number of times the words in the list appear.
        if word.lower() in words_list:
            n += 1
    return n
```

This was the file after:

```python
def count_words(filepath, words_list):
    """

    Parameters
    ----------
    filepath :

    words_list :


    Returns
    -------

    """
    with open(filepath) as file:
        text = file.read()

    n = 0
    for word in text.split():
        # Count the number of times the words in the list appear.
        if word.lower() in words_list:
            n += 1
    return n
```

The file was ready for me to add the missing information. This was the file when I was finished.

```python
def count_words(filepath, words_list):
    """
    Count the number of words from words_list in the specified file.

    Parameters
    ----------
    filepath : str
        input file path
        
    words_list : list of str
        list of words to count in the file

    Returns
    -------
    int
        total count of all words found in the file
    """
    with open(filepath) as file:
        text = file.read()

    n = 0
    for word in text.split():
        # Count the number of times the words in the list appear.
        if word.lower() in words_list:
            n += 1
    return n
```

#### Package, Subpackage, and Module Documentation

Package documentation is placed at the top of the package's `__init__.py` file. In the course's example, the file was `mysklearn/__init__.py`.

```python
"""
Linear Regression for Python
============================

mysklearn is a complete package for implementing
linear regression in python.
"""
```

Subpackage documentation is placed at the top of the subpackage's `__init__.py` file. In the course's example, the file was `mysklearn/preprocessing/__init__.py`.

```python
"""
A subpackage for standard preprocessing operations.
"""
```

Module documentation is written at the top of the module's file. In the course's example, the file was `mysklearn/preprocessing/normalize.py`.

```python
"""
A module for normalizing data.
"""
```
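
The sketch below shows where these docstrings end up at run time: Python stores the first string in each file as the `__doc__` attribute of the package, subpackage, or module, which is what `help` displays. It writes a cut-down `mysklearn` to a temporary directory, which is an assumption made so the example is self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
sub = root / "mysklearn" / "preprocessing"
sub.mkdir(parents=True)

# The docstring must be the first statement in each file.
(root / "mysklearn" / "__init__.py").write_text(
    '"""\nLinear Regression for Python\n"""\n'
)
(sub / "__init__.py").write_text(
    '"""\nA subpackage for standard preprocessing operations.\n"""\n'
)
(sub / "normalize.py").write_text(
    '"""\nA module for normalizing data.\n"""\n'
)

# Forget any earlier import so rerunning this cell loads the files above.
for name in [m for m in sys.modules if m.split(".")[0] == "mysklearn"]:
    del sys.modules[name]

sys.path.insert(0, str(root))
normalize = importlib.import_module("mysklearn.preprocessing.normalize")

# Importing the module also imported its parent package and subpackage.
package = sys.modules["mysklearn"]
subpackage = sys.modules["mysklearn.preprocessing"]

for obj in (package, subpackage, normalize):
    print(obj.__name__, "->", obj.__doc__.strip())
```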

#### Writing Function Documentation with pyment (Exercise)

In the course's terminal window, use `pyment` to create NumPy style documentation for the file `impyrial/length/core.py`.

The command was:

```shell
pyment -o numpydoc -w impyrial/length/core.py
```

This was the file before conversion.

```python
INCHES_PER_FOOT = 12.0  # 12 inches in a foot
INCHES_PER_YARD = INCHES_PER_FOOT * 3.0  # 3 feet in a yard

UNITS = ("in", "ft", "yd")


def inches_to_feet(x, reverse=False):
    if reverse:
        return x * INCHES_PER_FOOT
    else:
        return x / INCHES_PER_FOOT
```

This was the file after conversion:

```python
INCHES_PER_FOOT = 12.0  # 12 inches in a foot
INCHES_PER_YARD = INCHES_PER_FOOT * 3.0  # 3 feet in a yard

UNITS = ("in", "ft", "yd")


def inches_to_feet(x, reverse=False):
    """

    Parameters
    ----------
    x :
        
    reverse :
         (Default value = False)

    Returns
    -------

    """
    if reverse:
        return x * INCHES_PER_FOOT
    else:
        return x / INCHES_PER_FOOT
```

#### Writing Function Documentation with pyment II (Exercise)

Complete the documentation after running `pyment` to create the documentation template.

This was the file after editing.

```python
INCHES_PER_FOOT = 12.0  # 12 inches in a foot
INCHES_PER_YARD = INCHES_PER_FOOT * 3.0  # 3 feet in a yard

UNITS = ("in", "ft", "yd")


def inches_to_feet(x, reverse=False):
    """
    Convert lengths between inches and feet.

    Parameters
    ----------
    x : numpy.ndarray
        Lengths in inches, or in feet if `reverse` is True

    reverse : bool, optional
        If true this function converts from feet to inches
        instead of the default behavior of inches to feet.
        (Default value = False)

    Returns
    -------
    numpy.ndarray
        Converted lengths
    """
    if reverse:
        return x * INCHES_PER_FOOT
    else:
        return x / INCHES_PER_FOOT
```
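
A short usage sketch for `inches_to_feet` follows. It uses plain floats so it runs without NumPy; this is a choice made for the example, not something the course does.

```python
INCHES_PER_FOOT = 12.0  # 12 inches in a foot


def inches_to_feet(x, reverse=False):
    """Convert lengths between inches and feet."""
    if reverse:
        return x * INCHES_PER_FOOT
    else:
        return x / INCHES_PER_FOOT


feet = inches_to_feet(30.0)                 # inches -> feet
inches = inches_to_feet(2.5, reverse=True)  # feet -> inches
print(feet, inches)
```

Since the function only multiplies or divides, passing a `numpy.ndarray` converts every element at once, which is why the docstring gives `numpy.ndarray` as the type.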

#### Package and Module Documentation

Add package documentation to `impyrial/__init__.py`.

```python
"""
impyrial
========

A package for converting between imperial
measurements of length and weight.
"""
```

Add subpackage documentation to `impyrial/length/__init__.py`.

```python
"""
impyrial.length
===============

Length conversion between imperial units.
"""
```

Add module documentation to `impyrial/length/core.py`.

```python
"""
Conversions between inches and
larger imperial length units.
"""
```

### Structuring Imports

#### Without Package Imports

So far, with the example `mysklearn` package, since the package has no internal imports yet, you have to explicitly import the subpackage module. This is the package tree for the example:

    mysklearn/
    |-- __init__.py
    |-- preprocessing
    |   |-- __init__.py
    |   |-- normalize.py
    |   |-- standardize.py
    |-- regression
    |   |-- __init__.py
    |   |-- regression.py
    |-- utils.py


Here's how you can import module `mysklearn/preprocessing/normalize`:

```python
import mysklearn.preprocessing.normalize
```

#### Importing Subpackages into Packages

In the package, we want to import the modules into the subpackages and the subpackages into the package. The goal is to give access to the package, its subpackages, and their modules with a single import:

```python
import mysklearn
```

In `mysklearn/__init__.py`, add the following code:

```python
from mysklearn import preprocessing # subpackage
from mysklearn import regression    # subpackage
from mysklearn import utils         # module
```

In `mysklearn/preprocessing/__init__.py`, add the following code to import the modules:

```python
from mysklearn.preprocessing import normalize   # module
from mysklearn.preprocessing import standardize # module
```

In `mysklearn/regression/__init__.py`, add the following code to import the module:

```python
from mysklearn.regression import regression # module
```

Now importing the package provides access to the objects in the modules, but the names can grow long.

```python
help(mysklearn.preprocessing.normalize.normalize_data)
```
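
To see the effect of these `__init__.py` imports, the sketch below writes the package to a temporary directory and then uses a single import. The body of `normalize_data` is a placeholder invented for this example; the course's implementation is not shown in these notes.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# File contents keyed by path; the function body is a placeholder.
files = {
    "mysklearn/__init__.py": (
        "from mysklearn import preprocessing\n"
        "from mysklearn import regression\n"
        "from mysklearn import utils\n"
    ),
    "mysklearn/utils.py": "",
    "mysklearn/preprocessing/__init__.py": (
        "from mysklearn.preprocessing import normalize\n"
        "from mysklearn.preprocessing import standardize\n"
    ),
    "mysklearn/preprocessing/normalize.py": (
        "def normalize_data(x):\n"
        "    lo, hi = min(x), max(x)\n"
        "    return [(v - lo) / (hi - lo) for v in x]\n"
    ),
    "mysklearn/preprocessing/standardize.py": "",
    "mysklearn/regression/__init__.py": (
        "from mysklearn.regression import regression\n"
    ),
    "mysklearn/regression/regression.py": "",
}
for name, content in files.items():
    path = root / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)

# Forget any earlier import so rerunning this cell loads the files above.
for name in [m for m in sys.modules if m.split(".")[0] == "mysklearn"]:
    del sys.modules[name]

sys.path.insert(0, str(root))
mysklearn = importlib.import_module("mysklearn")

# One import now reaches all the way down to the function.
result = mysklearn.preprocessing.normalize.normalize_data([0, 5, 10])
print(result)
```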

#### Import Function into Subpackage

You can avoid the long paths to module functions by importing the function into the subpackage. For example, in `mysklearn/preprocessing/__init__.py`, you can use this import:

```python
from mysklearn.preprocessing.normalize import normalize_data
```

The relative import would be:

```python
from .normalize import normalize_data
```

The function is now available as `mysklearn.preprocessing.normalize_data`.

#### Importing Between Sibling Modules

Suppose you have some functions in `mysklearn.preprocessing.funcs` that are used by `mysklearn.preprocessing.normalize`.

    mysklearn/
    |-- __init__.py
    |-- preprocessing
    |   |-- __init__.py
    |   |-- normalize.py
    |   |-- funcs.py
    |   |-- standardize.py
    |-- regression
    |   |-- __init__.py
    |   |-- regression.py
    |-- utils.py

Modify `mysklearn/preprocessing/normalize.py` to add:

```python
from mysklearn.preprocessing.funcs import (mymax, mymin)
```

The relative import would be:

```python
from .funcs import mymax, mymin
```

#### Importing Between Modules Far Apart

Suppose a custom exception, `MyException`, is defined in `mysklearn/utils.py`, and this exception is used by many modules in the `mysklearn` package. In each module, add this import:

```python
from mysklearn.utils import MyException
```

The relative import from `normalize.py`, `standardize.py`, and `regression.py` would be:

```python
from ..utils import MyException
```

#### Relative Import Cheat Sheet

From current directory, import `module`:

```python
from . import module
```

From one directory up, import `module`:

```python
from .. import module
```

From `module` in current directory, import `function`:

```python
from .module import function
```

From subpackage one directory up, from `module` in that subpackage, import `function`:

```python
from ..subpackage.module import function
```
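
The following sketch puts several of these relative imports to work in one small package written to a temporary directory. The names `MyException`, `mymax`, `mymin`, and `normalize_data` come from the examples above, but their bodies are assumptions made so the code runs.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

files = {
    # Top-level package: import from the current directory.
    "mysklearn/__init__.py": (
        "from . import preprocessing\n"
        "from . import utils\n"
    ),
    "mysklearn/utils.py": (
        "class MyException(Exception):\n"
        "    pass\n"
    ),
    # Subpackage: import a function from a sibling module.
    "mysklearn/preprocessing/__init__.py": (
        "from .normalize import normalize_data\n"
    ),
    "mysklearn/preprocessing/funcs.py": (
        "def mymax(x):\n"
        "    return max(x)\n"
        "\n"
        "def mymin(x):\n"
        "    return min(x)\n"
    ),
    # Module: one import from a sibling, one from a directory up.
    "mysklearn/preprocessing/normalize.py": (
        "from ..utils import MyException\n"
        "from .funcs import mymax, mymin\n"
        "\n"
        "def normalize_data(x):\n"
        "    lo, hi = mymin(x), mymax(x)\n"
        "    if lo == hi:\n"
        "        raise MyException('all values are equal')\n"
        "    return [(v - lo) / (hi - lo) for v in x]\n"
    ),
}
for name, content in files.items():
    path = root / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)

# Forget any earlier import so rerunning this cell loads the files above.
for name in [m for m in sys.modules if m.split(".")[0] == "mysklearn"]:
    del sys.modules[name]

sys.path.insert(0, str(root))
mysklearn = importlib.import_module("mysklearn")

scaled = mysklearn.preprocessing.normalize_data([2, 4, 6])
print(scaled)

try:
    mysklearn.preprocessing.normalize_data([3, 3])
except mysklearn.utils.MyException as exc:
    caught = str(exc)
    print("Caught:", caught)
```

Relative imports only work inside a package. Running `normalize.py` directly as a script would fail, because Python would not know which package the leading dots refer to.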

#### Sibling Imports (Exercise)

I typed in the code files for the following tree and made the modifications.

    impyrial
    |-- __init__.py
    |-- utils.py
    |-- length
    |   |-- __init__.py
    |   |-- api.py
    |   |-- core.py

In [None]:
# This is my version of the example_script.py code.
import impyrial.length.api

result = impyrial.length.api.convert_unit(10, "in", "yd")
print(result)

#### Importing from Parents (Exercise)

Import the `check_units` function from the `utils.py` module at the top of your package. A `utils` module is usually used for small, often unrelated, pieces of code, none of which is large enough to justify its own module.

Modify `example_script.py` to call `convert_unit` with an invalid unit "lb".

In [None]:
# This is the equivalent of the revised example_script.py file.
# I have chosen to show complete paths for the functions I call.
import impyrial.length.api

print("The following line should run:")
result1 = impyrial.length.api.convert_unit(10, "in", "yd")
print(result1)

print("The following line should cause an error:")
try:
    result2 = impyrial.length.api.convert_unit(10, "lb", "yd")
    print(result2)
except ValueError as exc:
    print("Caught exception:", exc)
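
These notes do not include the course's `utils.py` or `api.py`, so the following is only a guess at their shape, collapsed into a single file. The signature of `check_units`, the `_INCHES_PER_UNIT` lookup table, and the error message are all assumptions; the only behavior taken from the notes is that an invalid unit raises `ValueError`.

```python
INCHES_PER_FOOT = 12.0  # 12 inches in a foot
INCHES_PER_YARD = INCHES_PER_FOOT * 3.0  # 3 feet in a yard

UNITS = ("in", "ft", "yd")

# Hypothetical lookup table: how many inches make up one of each unit.
_INCHES_PER_UNIT = {"in": 1.0, "ft": INCHES_PER_FOOT, "yd": INCHES_PER_YARD}


def check_units(from_unit, to_unit):
    """Raise ValueError if either unit is not in UNITS (assumed behavior)."""
    for unit in (from_unit, to_unit):
        if unit not in UNITS:
            raise ValueError(f"{unit} is not a valid unit; choose from {UNITS}")


def convert_unit(x, from_unit, to_unit):
    """Convert x between two imperial length units (assumed behavior)."""
    check_units(from_unit, to_unit)
    inches = x * _INCHES_PER_UNIT[from_unit]
    return inches / _INCHES_PER_UNIT[to_unit]


result = convert_unit(6, "ft", "yd")
print(result)

try:
    convert_unit(10, "lb", "yd")
except ValueError as exc:
    error_message = str(exc)
    print("Caught exception:", error_message)
```

In the real package, `check_units` lives in `impyrial/utils.py` and would be imported into `impyrial/length/api.py` with `from ..utils import check_units`.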

#### Exposing Functions to Users (Exercise)

Currently, the only function you want to make easily available to users is the `convert_unit` function inside the module `impyrial.length.api`.

Write import statements so that the package can be imported and used like this:

```python
import impyrial

result = impyrial.length.convert_unit(6, "ft", "yd")
```

This required modifying `impyrial/length/__init__.py` to import the `convert_unit` function from the `impyrial.length.api` module using a relative import.

```python
from .api import convert_unit
```

This also required modifying `impyrial/__init__.py` to import the `impyrial.length` subpackage, using a relative import.

```python
from . import length
```

In [None]:
# This is the equivalent of the example_script.py from the course.
import impyrial

result = impyrial.length.convert_unit(10, 'in', 'yd')
print(result)

You've now got a fully functional package! The first import statement imported `convert_unit` into `length`, and the second one imported `length` into `impyrial`. Now users can access key functions of the package easily.

## Install Your Package from Anywhere

## Increasing Your Package Quality

## Rapid Package Development