In [70]:
import numpy as np
import pandas as pd

# Developing, packaging, and distributing a python package

In this workshop, we will go through the life cycle of developing, packaging, and distributing a python module. In so doing, we will cover multiple advanced python topics. You will have seen some of these before, but here we want to emphasise elements of good software development practice.

* Functions
* Classes
* Data I/O
* Modules
* Creating a package
* Documentation
* Distributing a package
* Testing

## Strings

Okay, you have seen strings before, but let's check you know about f-strings. If we want to print a message with a value inside, we can do that using Python 3's f-strings:

In [14]:
pi = 3.14159265359
print(f"pi is equal to {pi}")

pi is equal to 3.14159265359


Note the `f` in front of the string: this makes it an f-string. You can also specify a format, for example to print only 2 decimal places:

In [15]:
print(f"pi is equal to {pi:0.2f}")

pi is equal to 3.14


Or, if you have a big number and want to use exponent notation:

In [18]:
big_number = 18947598428945.945
print(f"My big number is {big_number:0.4g}")

My big number is 1.895e+13
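The part after the colon inside the braces is a format specification. As a brief sketch of a few other common specifications (the variable names `fixed`, `scientific`, `grouped`, and `padded` are only illustrative):

```python
pi = 3.14159265359
big_number = 18947598428945.945

fixed = f"{pi:0.3f}"               # fixed-point with 3 decimal places
scientific = f"{big_number:0.2e}"  # scientific notation with 2 decimal places
grouped = f"{big_number:,.0f}"     # thousands separators, no decimal places
padded = f"{pi:>10.2f}"            # right-aligned in a field 10 characters wide

print(fixed)
print(scientific)
print(grouped)
print(repr(padded))  # repr() makes the leading spaces visible
```

The `g` format used above switches between fixed-point and exponent notation depending on the size of the number, whereas `f` and `e` always use one or the other.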


## Dictionaries

Dictionaries are a useful way to store data indexed by meaningful keys. You can create them in two different ways:

In [67]:
sun = dict(
    mass=2e30,  # kg
    radius=7e8  # m
)
earth = {
    "mass": 6e24,  # kg
    "radius": 6.4e6  # m
}

sun["mass"] > earth["mass"]

True
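Beyond looking up a single key, dictionaries are often extended after creation and looped over. A minimal sketch, using an illustrative `planet` dictionary that is not part of the cells above:

```python
planet = {"mass": 6e24}  # kg, approximate value for illustration

# Add a new key after the dictionary has been created
planet["name"] = "Earth"

# .get() returns a default instead of raising KeyError for a missing key
moons = planet.get("moons", "unknown")

# .items() yields (key, value) pairs in insertion order
lines = [f"{key}: {value}" for key, value in planet.items()]

print(moons)
print(lines)
```

Using `planet["moons"]` directly would raise a `KeyError`, so `.get()` is a convenient choice when a key may be absent.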

## Functions and Errors

By now, you will no doubt have seen functions in python, but let's give an example anyway:

In [3]:
def my_function(x):
    return x.split("@")[0]

This is a function, but it is a badly written function. As a user, I have no idea what it does, what the inputs are, or what it should return! Let's fix that by adding a `docstring`:

In [22]:
def get_username(email_address):
    """ Returns the username from an email address
     
    Parameters
    ----------
    email_address: str
        The user's email address, e.g. user123@rhul.ac.uk
        
    Returns
    -------
    username: str
        The user's username
        
    Examples
    --------
    >>> get_username("user123@rhul.ac.uk")
    'user123'
    
    """
    return email_address.split("@")[0]

Okay, that is better: I changed the function name, and the docstring now tells me what to do with the function (even giving a nice example!). But the function itself is still a bit fragile. What happens if the user gives it an `int` instead?

In [23]:
get_username(123)

AttributeError: 'int' object has no attribute 'split'

Oh, that isn't very useful: the message talks about `split`, not about email addresses. Let's improve things by telling the user what they did wrong:

In [24]:
def get_username(email_address):
    """ Returns the username from an email address
     
    Parameters
    ----------
    email_address: str
        The user's email address, e.g. user123@rhul.ac.uk
        
    Returns
    -------
    username: str
        The user's username
        
    Examples
    --------
    >>> get_username("user123@rhul.ac.uk")
    'user123'
    
    """
    if isinstance(email_address, str) and "@" in email_address:
        return email_address.split("@")[0]
    else:
        raise ValueError(f"The input {email_address} is not a valid email address")

Okay, let's check that works as expected

In [21]:
get_username("user123@rhul.ac.uk")

'user123'

In [25]:
get_username(123)

ValueError: The input 123 is not a valid email address

Okay, that is much better. 
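Raising a specific error also lets the calling code decide how to react. The sketch below repeats the function (without its docstring, for brevity) and wraps it in a hypothetical helper, `safe_get_username`, which is not part of the workshop code and simply returns a default when the input is invalid:

```python
def get_username(email_address):
    """ Returns the username from an email address """
    if isinstance(email_address, str) and "@" in email_address:
        return email_address.split("@")[0]
    else:
        raise ValueError(f"The input {email_address} is not a valid email address")


def safe_get_username(email_address, default=None):
    """ Hypothetical helper: return `default` instead of raising on bad input """
    try:
        return get_username(email_address)
    except ValueError as error:
        # Report the problem, then carry on with the default value
        print(f"Skipping invalid input: {error}")
        return default


inputs = ["user123@rhul.ac.uk", 123, "not-an-email"]
usernames = [safe_get_username(item) for item in inputs]
print(usernames)
```

Catching only `ValueError`, rather than using a bare `except`, means that unrelated bugs still surface as errors.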

Perhaps the most fun one can have with functions is recursion, that is, writing a function which calls itself. For example, imagine we have a nested data structure holding the names of students in a school. It is a dictionary of dictionaries where the top level is the year group and the next level is the class name (they use animal names for each class). It might look something like:

In [57]:
roster = dict(
    year_1=dict(
        tigers=["greg", "surabhi", "jamil", "nicolo"],
        elephants=["kayan", "casper", "emily"]
    ),
    year_2=dict(
        gazelle=["robert", "woody", "charlotte"],
        elephants=["robyn", "rory"]
    )
)

How can we count the total number of students? We can use a recursive function!

In [60]:
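def count_students(inputs, total=0):
    """ Recursively count the names stored in nested dictionaries of lists """
    if isinstance(inputs, dict):
        # A dictionary may hold further dictionaries or lists: recurse into each value
        for key in inputs:
            total = count_students(inputs[key], total)
    elif isinstance(inputs, list):
        # A list holds the student names themselves: add its length to the total
        total += len(inputs)
    return total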
        
count_students(roster)

12

The nice thing is, we don't need to know in advance how deeply nested the dictionary is. If teachers add sub-classes, then it will still work.
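To check that claim, here is a small sketch in which one class has been split into two sub-groups. The `nested_roster` layout and the group names are made up for illustration:

```python
def count_students(inputs, total=0):
    # Same logic as above: recurse into dictionaries, count the entries of lists
    if isinstance(inputs, dict):
        for key in inputs:
            total = count_students(inputs[key], total)
    elif isinstance(inputs, list):
        total += len(inputs)
    return total


nested_roster = dict(
    year_1=dict(
        tigers=dict(  # this class now has an extra level of nesting
            group_a=["greg", "surabhi"],
            group_b=["jamil", "nicolo"],
        ),
        elephants=["kayan", "casper", "emily"],
    ),
)

n_students = count_students(nested_roster)
print(n_students)
```

The function never needs to be told about the extra level: each dictionary it meets is handled by another call to itself.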

<div class="alert alert-block alert-danger">
<b>Challenge:</b> Write a recursive function to calculate the factorial function $f(N) = N\times(N-1)\times(N-2)\ldots1$
</div>

## Classes

Classes are a powerful way to tie together data and the methods which act on that data. They also enable *inheritance*, that is, the idea that we can build classes from other classes, inheriting their methods. As an example, here is a set of classes which add dimensional units to python floats.

In [45]:
class Unit(float):
    def __init__(self, value, si_base):
        """ Generic base class for units
        
        This base class should not be used directly, but all Unit classes should inherit from it
        
        Parameters
        ----------
        value: float
            The value of the float
        si_base: str
            The si base unit of the quantity
        """
        self.value = value
        self.si_base = si_base
        
    def __str__(self):
        """ When str() is called on the instance, return a string with units attached """
        return f"{self.value} [{self.units}]"
    
    @property
    def units(self):
        """ A units property, returns the si_base unit string
        
        Example
        -------
        >>> x = Unit(value=2, si_base="m")
        >>> x.units
        'm'
        """
        return self.si_base
    
    def __add__(u1, u2):
        """ Method to add two Unit instances together
        
        Note: this does not check that the units are the same!
        """
        return u1.__class__(u1.value + u2.value)
    
    def __sub__(u1, u2):
        """ Method to subtract one Unit instance from another
        
        Note: this does not check that the units are the same!
        """
        return u1.__class__(u1.value - u2.value)
    
    def __mul__(u1, u2):
        """ Method to multiply Unit instances together"""
        derived_si_base = f"{u1.si_base}*{u2.si_base}"
        return DerivedUnit(value=u1.value * u2.value, si_base=derived_si_base)
    
    def __truediv__(u1, u2):
        """ Method to divide Unit instances"""
        derived_si_base = f"{u1.si_base}/{u2.si_base}"
        return DerivedUnit(value=u1.value / u2.value, si_base=derived_si_base)
    

class DerivedUnit(Unit):
    def __init__(self, value, si_base):
        """ A class for derived units, e.g. the product or quotient of two Unit instances """
        super().__init__(value, si_base)
        
class Distance(Unit):
    def __init__(self, value):
        """ SI units for distance """
        super().__init__(value, si_base="m")

        
class Time(Unit):
    def __init__(self, value):
        """ SI units for time """
        super().__init__(value, si_base="s")      

Okay, there is a lot going on in that cell. Skim over the documentation then take a look at the examples below.

We can define a distance and get a representation including units

In [46]:
x = Distance(10) 
x_as_a_string = str(x)
x_as_a_string

'10 [m]'

This conversion gets done automatically when we `print` the variable as well

In [47]:
print(x)

10 [m]


We can also combine quantities together

In [48]:
x = Distance(10) 
t = Time(3)

print(x / t)

3.3333333333333335 [m/s]


Note this is a fairly limited implementation. The `Unit` classes above have one crucial issue: I can add distances and times together!

In [53]:
print(x + t)

13 [m]


<div class="alert alert-block alert-danger">
<b>Challenge:</b> Add a ValueError to the `Unit` class so that an error is raised when you try to add quantities together which don't have the same units.
</div>

There are many nice packages which implement units (for example [units](https://pypi.org/project/units/)). These include all sorts of clever features, but it is still fun to implement things from scratch to see how they work.
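To see the inheritance pattern used by `Distance` and `Time` in isolation, here is a stripped-down sketch. The `Quantity` and `Mass` classes are hypothetical stand-ins, not part of the code above, and they deliberately leave out the arithmetic methods:

```python
class Quantity:
    """ Hypothetical, minimal stand-in for the Unit base class """

    def __init__(self, value, si_base):
        self.value = value
        self.si_base = si_base

    def __str__(self):
        return f"{self.value} [{self.si_base}]"


class Mass(Quantity):
    """ Child class: fixes the unit and inherits __str__ from Quantity """

    def __init__(self, value):
        # super() calls the parent's __init__, so the attributes are set in one place
        super().__init__(value, si_base="kg")


m = Mass(5)
print(m)
print(isinstance(m, Quantity))
```

`Mass` defines no `__str__` of its own, so `print(m)` falls back to the method inherited from `Quantity`.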

## Data

In computational physics, we frequently need to store data. This might be the output of a simulation, or data from a telescope/collider. Inside a python program, it is often a good idea to collect data into a class. For example, let's say we have a time series of data recorded from a measurement device. We could store it in a class like this:

In [83]:
class TimeSeries(object):
    def __init__(self, times, data):
        """ An object to store data 
        
        Parameters
        ----------
        times: array
            The array of times
        data: array
            The array of data
        """
        self.times = times
        self.data = data
        
        
x = np.linspace(0, 1, 5)
y = np.sin(x)
timeseries = TimeSeries(x, y)

<div class="alert alert-block alert-danger">
<b>Challenge:</b> Add a `plot` method to the TimeSeries class which plots the data
</div>


Now, you may want to store the time series data to disk for later analysis, or perhaps to publish it. There are many ways to do this, and you should choose the best method for the problem. Here we give a quick overview of some common read/write formats, implemented as functions which accept a `TimeSeries` instance.

In [84]:
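def use_print_to_write_a_csv(timeseries, filename):
    """ Write the times and data of a TimeSeries to a two-column CSV file """
    # "w" creates the file, or truncates it if it already exists
    with open(filename, "w") as file:
        for t, d in zip(timeseries.times, timeseries.data):
            # print() adds the newline at the end of each row
            print(f"{t},{d}", file=file)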
        
use_print_to_write_a_csv(timeseries, "use_print_to_write_a_csv_example.csv")

In [85]:
! cat use_print_to_write_a_csv_example.csv

0.0,0.0
0.25,0.24740395925452294
0.5,0.479425538604203
0.75,0.6816387600233341
1.0,0.8414709848078965
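Writing is only half of the job: the file also needs to be read back. The sketch below uses only the standard library, with a small stand-in data set of plain lists rather than a `TimeSeries` instance. The function names `read_csv_columns` and `write_json` and the file names are illustrative assumptions, and JSON is shown as one possible alternative format:

```python
import csv
import json
import os
import tempfile


def read_csv_columns(filename):
    """ Read a two-column CSV of (time, value) rows into two lists of floats """
    times, data = [], []
    with open(filename, newline="") as file:
        for t, d in csv.reader(file):
            times.append(float(t))
            data.append(float(d))
    return times, data


def write_json(times, data, filename):
    """ Write the two columns to a JSON file under labelled keys """
    with open(filename, "w") as file:
        json.dump({"times": list(times), "data": list(data)}, file, indent=2)


# Small stand-in data set
times = [0.0, 0.5, 1.0]
data = [0.0, 0.25, 1.0]

with tempfile.TemporaryDirectory() as tmpdir:
    # Write a CSV in the same way as above, then read it back
    csv_name = os.path.join(tmpdir, "example.csv")
    with open(csv_name, "w") as file:
        for t, d in zip(times, data):
            print(f"{t},{d}", file=file)
    times_back, data_back = read_csv_columns(csv_name)

    # Write the same data as JSON and load it again
    json_name = os.path.join(tmpdir, "example.json")
    write_json(times, data, json_name)
    with open(json_name) as file:
        loaded = json.load(file)

print(times_back, data_back)
print(loaded)
```

Unlike the bare CSV, the JSON file records which column is which, at the cost of a more verbose file.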


## Modules

## Packages

## Versioning

## Documentation

## Distributing a package

## Testing

## Distributed development

## Continuous Integration

Aut