# Good practice for coding

Now that we know how to control the development of our code, we need to think about how to write the code itself.

The main goal of this section is to help you write code that is readable and maintainable. There is a well-known adage: "The older code gets, the more it starts to stink". When we start projects it is always tempting to just get on with it. Initially we know what we are doing, and while we are starting out the code is fairly short and usually only does one or two things. The problems only emerge a year or two down the line. By then you will have modified your routines several times and added multiple new features. The code will be long and messy, and will likely contain redundant routines you are keeping "just in case". Most likely there will be multiple sections that only work properly if you get the input just right. Now editing the code has become a nightmare. The code is too large to keep in your head, so you can't quite remember exactly how every part works and which parts depend on each other. Hence when you edit it you tend to introduce bugs that are then hard to track down. You may also wish to share the code with a collaborator so they can use it for some new project, but then you have to spend a significant part of every day explaining how it works and which bits they can edit safely. The following ideas are all about stopping this from happening.

There are four main things we will focus on:
- Documentation
- Modularity
- Prototyping
- Unit Testing

The first two ensure the readability of your code, so that others (and you) can follow what it does. One of the great strengths of Python is that its syntax closely follows normal English, so it is easily readable; your code should be the same. Prototyping is about planning and testing your approach before writing the full code, and the last is about automating bug checking, so that when you change something you can easily check the code still does everything it should.

## Documentation

The first, and most important, topic in this section is documentation.  When you write code you should always add comments that explain exactly what it is that you are doing and why.  The goal for good documentation should be this:

<b>"If I die tomorrow, no-one should care"</b>

In essence, your code should be so easy to follow that anyone could read and understand it without any help from you. Let's look at functions first.

Python has something called the 'docstring', which is a string literal placed immediately after the `def` line; it is what IPython and Jupyter display when you type `functionname?` (the built-in `help(functionname)` shows it too). It should be a short description of the function and how to use it. Here is a popular format (the NumPy style):

In [None]:
import math

def harmonic_log(n,x):
    """
    Harmonic Logarithm
    
    Calculate and return the order 1 harmonic logarithm
    
    Parameters
    ----------
    n: int
        the degree, must be positive
    x: float
        must be positive
        
    Returns
    ----------
    float
        the harmonic logarithm of x with degree n
    """
    hn = sum([1.0/i for i in range(1,n+1)])
    return x**n*(math.log(x) - hn)

harmonic_log?  
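To see that a docstring is data attached to the function, while a comment is not, you can inspect it directly. This is a small illustrative sketch (`add_one` is a made-up function); `inspect.getdoc` is the standard-library function that returns a docstring with any indentation cleaned up, much like what `help()` shows.

```python
import inspect

def add_one(x):
    """Return x plus one."""
    # This comment helps readers of the source, but Python discards it.
    return x + 1

# The docstring is stored on the function object itself...
print(add_one.__doc__)
# ...and inspect.getdoc returns it with any indentation cleaned up.
print(inspect.getdoc(add_one))
```

Because the docstring lives on the object, tools such as `help()`, IPython's `?` and documentation generators can all find it, which is why usage information belongs there rather than in a comment.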

You should always do this when creating your own functions. Use complete sentences to describe the function, what its inputs should be, and what it returns. Docstrings should exist for both functions and modules; for packages they can be included in the `__init__.py` file. The module/package docstring should describe its purpose and list all the classes, exceptions and functions that are exported from it. As an example, here is the docstring from NumPy's `__init__.py` (taken from an older NumPy release, so some of the subpackages and links it mentions have since changed):

In [None]:
"""
NumPy
=====

Provides
  1. An array object of arbitrary homogeneous items
  2. Fast mathematical operations over arrays
  3. Linear Algebra, Fourier Transforms, Random Number Generation

How to use the documentation
----------------------------
Documentation is available in two forms: docstrings provided
with the code, and a loose standing reference guide, available from
`the NumPy homepage <http://www.scipy.org>`_.

We recommend exploring the docstrings using
`IPython <http://ipython.scipy.org>`_, an advanced Python shell with
TAB-completion and introspection capabilities.  See below for further
instructions.

The docstring examples assume that `numpy` has been imported as `np`::

  >>> import numpy as np

Code snippets are indicated by three greater-than signs::

  >>> x = 42
  >>> x = x + 1

Use the built-in ``help`` function to view a function's docstring::

  >>> help(np.sort)
  ... # doctest: +SKIP

For some objects, ``np.info(obj)`` may provide additional help.  This is
particularly true if you see the line "Help on ufunc object:" at the top
of the help() page.  Ufuncs are implemented in C, not Python, for speed.
The native Python help() does not know how to view their help, but our
np.info() function does.

To search for documents containing a keyword, do::

  >>> np.lookfor('keyword')
  ... # doctest: +SKIP

General-purpose documents like a glossary and help on the basic concepts
of numpy are available under the ``doc`` sub-module::

  >>> from numpy import doc
  >>> help(doc)
  ... # doctest: +SKIP

Available subpackages
---------------------
doc
    Topical documentation on broadcasting, indexing, etc.
lib
    Basic functions used by several sub-packages.
random
    Core Random Tools
linalg
    Core Linear Algebra Tools
fft
    Core FFT routines
polynomial
    Polynomial tools
testing
    NumPy testing tools
f2py
    Fortran to Python Interface Generator.
distutils
    Enhancements to distutils with support for
    Fortran compilers support and more.

Utilities
---------
test
    Run numpy unittests
show_config
    Show numpy build configuration
dual
    Overwrite certain functions with high-performance Scipy tools
matlib
    Make everything matrices.
__version__
    NumPy version string

Viewing documentation using IPython
-----------------------------------
Start IPython with the NumPy profile (``ipython -p numpy``), which will
import `numpy` under the alias `np`.  Then, use the ``cpaste`` command to
paste examples into the shell.  To see which functions are available in
`numpy`, type ``np.<TAB>`` (where ``<TAB>`` refers to the TAB key), or use
``np.*cos*?<ENTER>`` (where ``<ENTER>`` refers to the ENTER key) to narrow
down the list.  To view the docstring for a function, use
``np.cos?<ENTER>`` (to view the docstring) and ``np.cos??<ENTER>`` (to view
the source code).

Copies vs. in-place operation
-----------------------------
Most of the functions in `numpy` return a copy of the array argument
(e.g., `np.sort`).  In-place versions of these functions are often
available as array methods, i.e. ``x = np.array([1,2,3]); x.sort()``.
Exceptions to this rule are documented.
"""

Next, you should also include comments describing what each section of code does; again, use complete sentences. Comments begin with `#` and should generally go on the line before the code they describe. For the above function we could have:

In [None]:
import math

def harmonic_log(n,x):
    """
    Harmonic Logarithm
    
    Calculate and return the order 1 harmonic logarithm
    
    Parameters
    ----------
    n: int
        the degree, must be positive
    x: float
        must be positive
        
    Returns
    ----------
    float
        the harmonic logarithm of x with degree n
    """
    
    # Calculate the harmonic number H_n = 1 + 1/2 + ... + 1/n
    hn = sum([1.0/i for i in range(1,n+1)])
    
    # Return the harmonic logarithm
    return x**n*(math.log(x) - hn)

Docstrings should explain how to use functions and modules, and comments should explain what the code does. Don't be afraid to be verbose: your goal is to make it as simple as possible for someone unfamiliar with the code to understand it on first reading.


## Modularity

The next thing you should do to make your code readable is to break it up into discrete blocks, each of which executes one 'idea', by moving detailed calculations into functions and modules. We met this idea earlier when we first introduced modules. This makes the code more readable, as you can see its structure without getting bogged down in detail. It also generally makes the code more stable and easier to edit, as you can work on discrete blocks in isolation. Finally, it allows code reuse. Once you have written a module that, say, reads and processes your particular data set, you can use it everywhere, and if you then find a bug in it you only have to fix it in one place.

It is tricky to write a guide to how you should modularise your code, but here are three main rules to guide you:

1. Each routine should be no more than one screen long.
2. Any code that you will use in more than one place should be moved to a separate routine.
3. Routines should only have one main task.

The first makes your code more readable and makes it harder to get 'lost'. It also makes editing safer, as each part is understandable on its own. The second not only saves time but means that if you want to change something you only need to do it once. The third limits how much else you might break when you edit a routine, since each one does only one job. All of these rules can be broken if doing so makes your code simpler or clearer. There is also a risk that if code is very heavily modularised, it can be hard to find where the calculations actually happen.
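As a sketch of these rules, here is a small, made-up processing task split into routines that each do one job. The function names (`moving_average`, `standardise`, `clean_signal`) and the smoothing task are illustrative assumptions, not part of the course code. The top-level routine reads almost like a sentence, and each helper can be checked or edited on its own.

```python
import numpy as np

def moving_average(y, window):
    """Return the running mean of y over `window` neighbouring points.

    Only positions where the window fits entirely are kept, so the
    result has len(y) - window + 1 entries.
    """
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode='valid')

def standardise(y):
    """Shift and scale y to zero mean and unit standard deviation."""
    return (y - np.mean(y)) / np.std(y)

def clean_signal(y, window=5):
    """Smooth and then standardise a 1D signal.

    This top-level routine only says *what* happens; the details of
    each step live in the small functions above.
    """
    smoothed = moving_average(y, window)
    return standardise(smoothed)

# Usage: a noisy sine wave (made-up data, only for illustration).
rng = np.random.default_rng(42)
t = np.linspace(0, 4 * np.pi, 200)
y = np.sin(t) + 0.3 * rng.standard_normal(t.size)
cleaned = clean_signal(y, window=9)
print(cleaned.shape, cleaned.mean(), cleaned.std())
```

If you later decide to smooth differently, only `moving_average` changes; every caller of `clean_signal` picks up the fix automatically.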

## Prototyping

One other important coding practice is prototyping. Prototyping is simply deciding what you are going to do, then checking that your ideas work, before launching into writing the detailed code for your problem. The temptation is always to "just get to work" and start coding. The problem with this approach is that once you start coding in a particular direction, when you meet issues it can be hard to go back and fix your original bad decisions rather than just finding a workaround for the particular issue. By analogy, suppose you start building a car but discover halfway through that where you want to drive is very muddy. So you go back and add bigger, grippier tyres, but then you find the car doesn't have the torque to turn them. To solve this you install a low-range gearbox, but then the engine can't handle it, so you add a second engine to boost power... At the end of this you end up with a weird mish-mash of a car that keeps breaking, when really you should have gone back to the start and built a tractor instead.

The best place to start prototyping is with pen and paper. As almost all the programming we are going to do is related to data processing, the first thing to do is to think about what data objects you are going to need, starting from the input data and the final results you want to output. Next you have to decide how to get between the two in the most efficient manner and what data objects you will need on the way; in this way you slowly build up a plan for the code. Here is a list of things you should consider when you prototype:

1. What is the size of the problem? Do I need to worry about memory/parallelisation/architecture constraints for this code?
2. What does my input data look like and what data do I want to output?
3. What logical operations do I need to perform on the data?
4. Can I map the operations onto standard optimised routines from numpy/scipy/etc...?
5. What temporary data objects will I need to create?
6. What is a sensible naming convention for my data objects/variables? (more important than it sounds)
7. How should I modularise this? What might I want to reuse from here in future codes?
8. What should the inputs and outputs of each module be?

Try to be reasonably detailed: the goal is to identify roadblocks in your method, and if they were obvious from a rough plan you would already have thought of them.

The next step is to make small models to test parts of the code and make sure our ideas work. Our tests should check that the method works, that it is accurate, and what the memory and CPU requirements will be for the full problem size. Not everything needs this kind of check, but you should try to do it for any section you are unsure about.
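A minimal sketch of checking CPU and memory needs on a small model before running the full problem. Here `prototype_step` is a hypothetical stand-in (a dense matrix product) for whatever expensive step you are unsure about; you would swap in your own calculation. Timing a few small sizes shows how the cost grows (roughly as $n^3$ for a dense matrix product, though very short timings are noisy), and the memory of a float64 array can be predicted exactly from its shape.

```python
import time
import numpy as np

def prototype_step(n):
    """Hypothetical stand-in for the expensive part of a prototype:
    build an n x n array of random numbers and square it as a matrix."""
    a = np.random.default_rng(0).random((n, n))
    return a @ a

def measure(n):
    """Return (seconds taken, bytes in the result) for problem size n."""
    start = time.perf_counter()
    result = prototype_step(n)
    elapsed = time.perf_counter() - start
    return elapsed, result.nbytes

# Time a few small sizes to see how the cost grows with n.
for n in (100, 200, 400):
    seconds, nbytes = measure(n)
    print(f"n={n:4d}: {seconds:.4f} s, result uses {nbytes / 1e6:.2f} MB")

# A float64 n x n array needs 8 * n**2 bytes, so the memory for the
# full problem size can be predicted without ever running it.
n_full = 20000
print(f"n={n_full}: result would need {8 * n_full**2 / 1e9:.1f} GB")
```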

So far this has been very theoretical, so let's try a worked example to see the ideas in action.

### Example
Suppose you want to calculate a multidimensional sum over some combination of functions, $X^i_{l}$, but with some particular constraints:

$E_{i,j,k} = \sum_{l_1}\sum_{l_2}\sum_{l_3} \left( \int_{-1}^{1} d\mu P_{l_1}(\mu) P_{l_2}(\mu) P_{l_3}(\mu)  \right) X^i_{l_1} X^j_{l_2} X^k_{l_3} $

Here, in the triple sum, the integral over Legendre polynomials enforces that $l_1+l_2+l_3$ is even (parity) and that no $l_i$ is larger than the sum of the other two (conservation of angular momentum). The integral representation lets us refactor the problem as:

$E_{i,j,k} = \int_{-1}^{1} d\mu Y^i(\mu)Y^j(\mu)Y^k(\mu)$

$Y^i(\mu) = \sum_l P_{l}(\mu)  X^i_{l}$
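Spelling out the step: because $X^i_l$ does not depend on $\mu$ and every sum is finite, the integral can be moved outside the triple sum, which then factorises into three single sums, one each for $i$, $j$ and $k$:

$E_{i,j,k} = \int_{-1}^{1} d\mu \left(\sum_{l_1} P_{l_1}(\mu) X^i_{l_1}\right)\left(\sum_{l_2} P_{l_2}(\mu) X^j_{l_2}\right)\left(\sum_{l_3} P_{l_3}(\mu) X^k_{l_3}\right) = \int_{-1}^{1} d\mu\, Y^i(\mu)Y^j(\mu)Y^k(\mu)$

This is also where the saving comes from: with $l$ taking $l_{max}$ values, the original form needs of order $l_{max}^3$ terms for each $(i,j,k)$, while the refactored form needs one pass over the integration points per triple once the $Y^i(\mu)$ have been computed.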

So our inputs are $X^i_{l}$, and we want to output $E_{i,j,k}$. As an example, let's take $X^i_l$ to be Chebyshev polynomials of the first kind, $T_i(l)$, which we will generate by recursion. We could just as easily read them from a file or generate them some other way; the particular choice is not important.

![](Plots/Proto1.png)

The next step is to fill in the operations: a sum and an integration.

![](Plots/Proto2.png)

How will we do the operations? The sum is easy: it is a matrix multiplication, so we can use numpy's `np.dot`. We will also need to generate the Legendre polynomials in a matrix, and recursion is best for this. For the integral we want to stick with the array form, so the trapezoidal rule sounds like a pretty good guess as a first step.
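The claim that the sum over $l$ is a matrix product can be checked directly. This sketch assumes the layout used later in the notebook, with one polynomial per row: `X[i, l]` holds $X^i_l$ and `P[l, m]` holds $P_l(\mu_m)$ at sample point $\mu_m$. Random numbers stand in for both arrays, since only the bookkeeping is being tested.

```python
import numpy as np

rng = np.random.default_rng(0)
imax, lmax, npts = 3, 5, 7
X = rng.standard_normal((imax, lmax))   # stand-in for the coefficients X[i, l]
P = rng.standard_normal((lmax, npts))   # stand-in for P[l, m] = P_l(mu_m)

# Explicit loops: Y[i, m] = sum over l of X[i, l] * P[l, m]
Y_loops = np.zeros((imax, npts))
for i in range(imax):
    for m in range(npts):
        for l in range(lmax):
            Y_loops[i, m] += X[i, l] * P[l, m]

# The same sum written as a single matrix product.
Y_dot = np.dot(X, P)

print(np.allclose(Y_loops, Y_dot))
```

The matrix product hands the inner loop to NumPy's optimised routines, which is exactly the kind of mapping prototyping question 4 asks about.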

![](Plots/Proto3.png)

The only tricky part is the integral over $\mu$: we need it to vanish both when $l_1+l_2+l_3$ is odd and when one $l_i$ is larger than the sum of the other two. This cancellation has to happen <b>after</b> we sum over $l$, so there is a real chance of numerical issues here. Let's do a quick check:

In [None]:
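# Test: integrate products of three Legendre polynomials with the trapezoidal rule
import numpy as np
import scipy.integrate as sp

# mu is the number of sample points; lmax is the number of Legendre polynomials (l = 0..lmax-1)
mu = 50
lmax = 5
P = np.zeros((lmax,mu), dtype='float')

# Evenly spaced sample points on [-1, 1]
X = np.linspace(-1,1,mu)

# Build P_l(X) using the Legendre recursion
P[0,:]=1.0
P[1,:]=X
for n in range(2,lmax):
    P[n,:] = ((2*n-1)*X*P[n-1,:] - (n-1)*P[n-2,:])/n

# Products whose exact integrals are given in the comments
PPP1 = P[2,:]*P[2,:]*P[2,:] # 4/35
PPP2 = P[0,:]*P[1,:]*P[3,:] # 0 (3 is larger than 0 + 1)
PPP3 = P[0,:]*P[1,:]*P[2,:] # 0 (0 + 1 + 2 is odd)
PPP4 = P[2,:]*P[4,:]*P[4,:] # 40/693

# sp.trapz has been removed from recent SciPy versions; sp.trapezoid replaces it
print(sp.trapezoid(PPP1,X),' vs ',4/35)
print(sp.trapezoid(PPP2,X),' vs ',0)
print(sp.trapezoid(PPP3,X),' vs ',0)
print(sp.trapezoid(PPP4,X),' vs ',40/693)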

Hmm, even with $l$ as low as 4, using 50 points gives pretty poor accuracy. We could increase the number of points further, but as it's a polynomial integral we should probably use Gauss-Legendre quadrature instead: an $n$-point rule is exact for polynomials of degree up to $2n-1$, so we need only $(\text{degree}+1)/2$ points, rounded up. Here the highest degree is $2+4+4=10$, so that is only 6 points (much better than 50). Let's check it works:

In [None]:
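import numpy as np

lmax = 5
# Number of Gauss-Legendre points, chosen so products of three P_l are integrated exactly
npts = 3*lmax//2+1

# Quadrature nodes X and weights W on [-1, 1]
X, W = np.polynomial.legendre.leggauss(npts)
P = np.zeros((lmax,npts), dtype='float')

# Same Legendre recursion as before, now evaluated at the quadrature nodes
P[0,:]=1.0
P[1,:]=X
for n in range(2,lmax):
    P[n,:] = ((2*n-1)*X*P[n-1,:] - (n-1)*P[n-2,:])/n

# Multiplying by the weights W turns each integral into a plain sum over the nodes
PPP1 = W[:]*P[2,:]*P[2,:]*P[2,:] # 4/35
PPP2 = W[:]*P[0,:]*P[1,:]*P[3,:] # 0
PPP3 = W[:]*P[0,:]*P[1,:]*P[2,:] # 0
PPP4 = W[:]*P[2,:]*P[4,:]*P[4,:] # 40/693
print(np.sum(PPP1),' vs ',4/35)
print(np.sum(PPP2),' vs ',0)
print(np.sum(PPP3),' vs ',0)
print(np.sum(PPP4),' vs ',40/693)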
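The rule behind the choice of `npts` can be checked on plain powers of $x$, whose exact integrals over $[-1,1]$ are $2/(k+1)$ for even $k$ and $0$ for odd $k$. The helper names below are illustrative. A 3-point rule should be exact up to degree $2\cdot3-1=5$, so it handles $x^4$ but not $x^6$. The last lines compare the minimum number of points for the main calculation, where products of three $P_l$ with $l \le l_{max}-1$ reach degree $3(l_{max}-1)$, with the notebook's `3*lmax//2+1`, which is never smaller, so it is safe.

```python
import numpy as np

def gauss_integral_of_power(k, npts):
    """Integrate x**k over [-1, 1] with npts-point Gauss-Legendre quadrature."""
    x, w = np.polynomial.legendre.leggauss(npts)
    return np.sum(w * x**k)

def exact_integral_of_power(k):
    """Exact value of the integral of x**k over [-1, 1] (k >= 0)."""
    return 2.0 / (k + 1) if k % 2 == 0 else 0.0

def points_needed(degree):
    """Smallest npts with 2*npts - 1 >= degree, i.e. ceil((degree + 1) / 2)."""
    return (degree + 2) // 2

# 3 points are exact up to degree 5: x**4 is fine...
print(gauss_integral_of_power(4, 3), exact_integral_of_power(4))
# ...but x**6 (degree 6 > 5) is not.
print(gauss_integral_of_power(6, 3), exact_integral_of_power(6))

# Minimum points for the main calculation vs the notebook's choice.
lmax = 50
print(points_needed(3 * (lmax - 1)), 3 * lmax // 2 + 1)
```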

That looks much better (and faster)! Now our plan looks like this:

![](Plots/Proto4.png)

Now we can write our code:

In [None]:
import numpy as np
from itertools import combinations_with_replacement
lmax = 50
imax = 5
npts = 3*lmax//2+1

X, W = np.polynomial.legendre.leggauss(npts)
P = np.zeros((lmax,npts), dtype='float')
C = np.zeros((imax,lmax), dtype='float')


P[0,:]=1.0
P[1,:]=X
for n in range(2,lmax):
    P[n,:] = ((2*n-1)*X*P[n-1,:] - (n-1)*P[n-2,:])/n


L = np.linspace(-1,1,lmax)

C[0,:]=1.0
C[1,:]=L
for n in range(2,imax):
    C[n,:] = 2*L*C[n-1,:] - C[n-2,:]

T = np.dot(C,P)

comb = combinations_with_replacement(range(imax),3)

for i,j,k in comb:
    z = np.sum(W[:]*T[i,:]*T[j,:]*T[k,:])
    print("({},{},{}) produces {:5g}".format(i,j,k,z))
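Before trusting the refactored calculation, it is worth checking it against the original triple sum on a problem small enough that the slow route is cheap. This sketch uses random stand-in coefficients for $X^i_l$ (an assumption made only for the test), a hypothetical helper `legendre_table` that repeats the recursion above, and `np.einsum` to write both routes compactly. The array `I` holds $\int_{-1}^{1} d\mu\, P_{l_1}P_{l_2}P_{l_3}$, so its entries also show the parity and triangle selection rules directly.

```python
import numpy as np

def legendre_table(x, lmax):
    """Hypothetical helper: row n holds P_n(x) for n = 0..lmax-1 (same recursion as above)."""
    P = np.zeros((lmax, len(x)))
    P[0] = 1.0
    P[1] = x
    for n in range(2, lmax):
        P[n] = ((2*n - 1) * x * P[n-1] - (n - 1) * P[n-2]) / n
    return P

lmax, imax = 6, 3
npts = 3*lmax//2 + 1
mu, w = np.polynomial.legendre.leggauss(npts)
P = legendre_table(mu, lmax)

# Random stand-in coefficients X^i_l; only used to compare the two routes.
coeffs = np.random.default_rng(1).standard_normal((imax, lmax))

# Refactored route: Y^i(mu) = sum_l X^i_l P_l(mu), then one quadrature per triple.
Y = np.dot(coeffs, P)
E_fast = np.einsum('m,im,jm,km->ijk', w, Y, Y, Y)

# Original route: integrals of three Legendre polynomials, then the triple sum over l.
I = np.einsum('m,am,bm,cm->abc', w, P, P, P)
E_slow = np.einsum('abc,ia,jb,kc->ijk', I, coeffs, coeffs, coeffs)

print(np.allclose(E_fast, E_slow))
print(I[2, 2, 2], 4/35)          # integral of P_2 P_2 P_2
print(I[0, 1, 2], I[0, 1, 3])    # both vanish by the selection rules
```

A check like this is cheap to keep around and rerun whenever the main code changes.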

It's pretty short, so there is not much need for functions; however, we may want to move our Legendre and Chebyshev polynomial generators to a module so we can use them elsewhere. Some documentation would also be great.

Here is the final module file:

In [None]:
"""
This module contains functions to create arrays of common polynomials.

Contains:
----------------------------------------
    legendre_array
        Creates an array of the Legendre polynomials
    chebyshev_array
        Creates an array of the Chebyshev polynomials (first kind)
----------------------------------------
        
        
All functions take two arguments:
----------------------------------------
    X
        The array of X points to calculate the polynomials on
    lmax
        The number of polynomials to include (degrees 0 to lmax-1)
----------------------------------------

Calculation is done by recursion.

Written by James Fergusson: J.Fergusson@DAMTP.cam.ac.uk 
"""

import numpy as np

def legendre_array(X,lmax):
    """
    Calculates an array of Legendre polynomials
    with shape (lmax, len(X))
    
    Parameters
    ----------
    X: array(float)
        points to calculate the polynomials on
    lmax: int
        the number of polynomials to compute (degrees 0 to lmax-1),
        must be at least 2
        
    Returns
    ----------
    array(float)
        A table of the Legendre polynomials; row n holds P_n(X)
    """
    
    # Initialise the array
    P = np.zeros((lmax,len(X)), dtype='float')
    
    # Set P_0 and P_1
    P[0,:]=1.0
    P[1,:]=X
    
    # Calculate the rest via the recursion relation
    for n in range(2,lmax):
        P[n,:] = ((2*n-1)*X*P[n-1,:] - (n-1)*P[n-2,:])/n
    
    # Return the array of Legendre polynomials
    return P

def chebyshev_array(X,lmax):
    """
    Calculates an array of Chebyshev polynomials (first kind)
    with shape (lmax, len(X))
    
    Parameters
    ----------
    X: array(float)
        points to calculate the polynomials on
    lmax: int
        the number of polynomials to compute (degrees 0 to lmax-1),
        must be at least 2
        
    Returns
    ----------
    array(float)
        A table of the Chebyshev polynomials; row n holds T_n(X)
    """
    
    # Initialise the array
    C = np.zeros((lmax,len(X)), dtype='float')
    
    # Set C_0 and C_1
    C[0,:]=1.0
    C[1,:]=X
    
    # Calculate the rest via the recursion relation
    for n in range(2,lmax):
        C[n,:] = 2*X*C[n-1,:] - C[n-2,:]
    
    # Return the array of Chebyshev polynomials
    return C
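A quick way to unit-test this module is to compare its recursions against NumPy's own tables, `np.polynomial.legendre.legvander` and `np.polynomial.chebyshev.chebvander`. So that the sketch runs on its own, it copies the two recursions into standalone functions (`legendre_rows` and `chebyshev_rows`, hypothetical names); in practice you would import `legendre_array` and `chebyshev_array` from the module instead. It also answers prototyping question 4 for this problem: NumPy already provides these tables.

```python
import numpy as np

def legendre_rows(x, lmax):
    """Same recursion as legendre_array: row n holds P_n(x), n = 0..lmax-1."""
    P = np.zeros((lmax, len(x)))
    P[0] = 1.0
    P[1] = x
    for n in range(2, lmax):
        P[n] = ((2*n - 1) * x * P[n-1] - (n - 1) * P[n-2]) / n
    return P

def chebyshev_rows(x, lmax):
    """Same recursion as chebyshev_array: row n holds T_n(x), n = 0..lmax-1."""
    C = np.zeros((lmax, len(x)))
    C[0] = 1.0
    C[1] = x
    for n in range(2, lmax):
        C[n] = 2 * x * C[n-1] - C[n-2]
    return C

x = np.linspace(-1, 1, 21)
lmax = 8

# legvander/chebvander return shape (len(x), lmax), with column n holding the
# degree-n polynomial, so transpose to match the (lmax, len(x)) layout above.
P_ref = np.polynomial.legendre.legvander(x, lmax - 1).T
C_ref = np.polynomial.chebyshev.chebvander(x, lmax - 1).T

print(np.allclose(legendre_rows(x, lmax), P_ref))
print(np.allclose(chebyshev_rows(x, lmax), C_ref))
```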
    


And here is the main code:

In [None]:
r"""
 This code computes the value of the sum:
     $E_{i,j,k} = \sum_{l_1}\sum_{l_2}\sum_{l_3}
         \left( \int_{-1}^{1} d\mu P_{l_1}(\mu) P_{l_2}(\mu)
         P_{l_3}(\mu)\right) X^i_{l_1} X^j_{l_2} X^k_{l_3} $

 The code takes some set of $X^i_{l}$ for i in [0, imax-1]
 and calculates $E_{i,j,k}$ for every unordered triple (i, j, k)
 drawn from [0, imax-1], repeats allowed.

 The code works by reordering the integration and sum to form:
     $Y^i(\mu) = \sum_l P_{l}(\mu)  X^i_{l}$
     $E_{i,j,k} = \int_{-1}^{1} d\mu Y^i(\mu)Y^j(\mu)Y^k(\mu)$

 The first is computed as a matrix product.
 The second is computed using Gauss-Legendre quadrature,
     as it is exact for polynomial integrands of high enough degree.

 Written by James Fergusson: J.Fergusson@DAMTP.cam.ac.uk
"""

# Load modules:
import numpy as np
from itertools import combinations_with_replacement

# legendre_array and chebyshev_array come from the polynomial module above.
# (In this notebook they are defined in the previous cell; if the module is
# saved as a file, import them from it here.)

# Set maximums for l and i.
lmax = 50
imax = 5

# Compute the number of points required for the Gauss-Legendre
# quadrature to be exact for a given lmax.
npts = 3*lmax//2+1

# Calculate Gauss-Legendre quadrature points, X, and weights, W.
X, W = np.polynomial.legendre.leggauss(npts)

# Create the array of Legendre polynomials
P = legendre_array(X,lmax)

# Create Chebyshev polynomials
# This could be changed to accept a general function or read from disk
# in future versions
L = np.linspace(-1,1,lmax)
C = chebyshev_array(L,imax)

# Compute the sum over l using a matrix product
Y = np.dot(C,P)

# Construct an iterator over possible triples of i
comb = combinations_with_replacement(range(imax),3)

# Calculate the integral (a weighted sum over the quadrature points)
# for each triple and print the result
for i,j,k in comb:
    z = np.sum(W[:]*Y[i,:]*Y[j,:]*Y[k,:])
    print("({},{},{}) produces {:5g}".format(i,j,k,z))

### Exercise

Now it's your turn. I want you to create a programme that can take the noisy data in the files `Data/period?.txt`, extract the periodic features, and stack them up on a plot. You should plan your strategy on paper, then test and implement it with proper documentation. Once you have finished, you will swap places with someone else and try to understand each other's code without any explanation. The result should take the left-hand side of each figure below and turn it into the right-hand side:

![](Plots/Period1.png)
![](Plots/Period2.png)
![](Plots/Period3.png)
![](Plots/Period4.png)

Finally, I will note that when you are starting out with Python it can be helpful to use "style checkers", which check whether your code conforms to the standard style guide for Python (PEP 8). A widely used one is `pylint`, and you can learn about it here:

https://pylint.readthedocs.io/en/latest/index.html

It analyses Python files and suggests stylistic changes that can help make the code clearer.