In [82]:
import os
os.getcwd()
#os.chdir(os.getcwd() + "/data")
os.chdir("/Users/Alexa/Desktop/swc-python/data")

-----


# 29 October 2018

## Python Functions

The syntax is similar to `if`/`else` statements and `for` loops: a colon ends the line that gives the function name and argument names, and the function body is indented. `def` is the keyword, just as `for` and `if` are the keywords for loops and conditionals respectively.

```python
def function_name(arguments):
    """docstring here: to explain what the function does"""
    command indented
    indented command
    return value # It'll return None if there's no return statement
```
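
As a small runnable illustration of the template above (the names `greet` and `shout` are made up for this sketch), a function without a `return` statement gives back `None`:

```python
def greet(name):
    """Print a greeting; there is no return statement."""
    print("hello,", name)

def shout(name):
    """Return the name in upper case."""
    return name.upper()

result = greet("world")   # prints the greeting
print(result)             # None, because greet has no return statement
loud = shout("world")
print(loud)               # WORLD
```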

Arguments can be named and can have default values. In this example, the arguments are `arg1` and `arg2`:

In [3]:
def foo(arg1, arg2=0):
  """
  Return arg1 - 1 + arg2.
  arg2 is optional, 0 by default.
  good practice: include examples.
  Examples:

  >>> foo(5)
  4
  >>> foo(5,8)
  12
  >>> foo(5, arg2=2)
  6
  """
  assert type(arg1)==int, "error message: here arg1 should be an integer"
  res = arg1 - 1 + arg2
  return res

It's not unusual for a function to be "mostly" its documentation. (Of course! You want the code to be reusable, right?)

In the docstring, at least do the following:
- write about the input: explain your arguments, what each one is and does, any defaults
- write about the output
- document any assumptions the function is making (e.g. libraries you're using)
- include examples

The triple-quote syntax isn't unique to Python: Julia uses it, too. (Julia also allows Markdown syntax within docstrings.)

Example:

In [1]:
import re

def startswithi(str):
    """Return True if the input string starts with 'i', False otherwise.
    Require that the "re" was imported beforehand.

    notes:
    - the double and single quotes inside my triple double-quoted docstring
    - in my text here the indentation adds 4 spaces on each line.
      Those are ignored because it's a triple set of quotes.

    Example:

    >>> startswithi("hohoho")
    False
    """
    return(bool(re.search(r'^i', str)))

help(startswithi) # or ?startswithi in interactive session
print(startswithi("iamcecile"))
print(startswithi("hohoho"))

Help on function startswithi in module __main__:

startswithi(str)
    Return True if the input string starts with 'i', False otherwise.
    Require that the "re" was imported beforehand.
    
    notes:
    - the double and single quotes inside my triple double-quoted docstring
    - in my text here the indentation adds 4 spaces on each line.
      Those are ignored because it's a triple set of quotes.
    
    Example:
    
    >>> startswithi("hohoho")
    False

True
False


### Key principle: Break programming problems down into small parts.

- Write functions: if you start to copy-paste your code, don't. You need to write a function.
- Functions make your code easier to debug and easier to read.
- Use meaningful names for functions and for variables.

Why?
- It's easier for humans to read
- Individual pieces can be reused later

Example where we re-use our functions:

In [7]:
def fahr_to_kelvin(temp):
    return ((temp - 32) * (5/9)) + 273.15

print('freezing point of water:', fahr_to_kelvin(32))
print('boiling point of water:', fahr_to_kelvin(212))

def kelvin_to_celsius(temp_k):
    return temp_k - 273.15

print('absolute zero in Celsius:', kelvin_to_celsius(0.0))

def fahr_to_celsius(temp_f):
    temp_k = fahr_to_kelvin(temp_f) # Functions within functions.
    result = kelvin_to_celsius(temp_k)
    return result

print('freezing point of water in Celsius:', fahr_to_celsius(32.0))

freezing point of water: 273.15
boiling point of water: 373.15
absolute zero in Celsius: -273.15
freezing point of water in Celsius: 0.0


Example of breaking a task down into functions:

In [32]:
import numpy
import glob
import matplotlib
filenames = glob.glob('inflammation*.csv')

def analyze_all():
    # Breaks down the analysis task into 3 parts: analyze an item, detect problems, and loop to the next one.
    for f in filenames[:3]:
        print(f)
        analyze(f)
        detect_problems(f)

def analyze(filename):
    data = numpy.loadtxt(fname=filename, delimiter=',')
    # commands to make the figure for one data file

def detect_problems(filename):
    data = numpy.loadtxt(fname=filename, delimiter=',')
    if (numpy.max(data, axis=0)[0] == 0 and
        numpy.max(data, axis=0)[20] == 20):
        print('Suspicious looking maxima!')
    elif numpy.sum(numpy.min(data, axis=0)) == 0:
        print('Minima add up to zero!')
    else:
        print('Seems OK!')

analyze_all()

Starting with `analyze_all()` shows you the big-picture purpose of the script; that very first function shows you that for each file, I want to 1) print its name, 2) analyze it, and 3) look for problems. What those three tasks actually *entail* is then explained in the functions that follow.

----

## in-class exercise: binomial coefficients

learning goals (recall [best practices](notes0906-bestpractices.html)):

- write functions, divide a large problem into smaller problems
- use optional arguments and default argument values
- use loops and `if` statements
- use docstring to document functions
- check for assumptions
- test code automatically
- create a module: use the script to be run on the command line, or as a module

Calculating binomial coefficients is not easy numerically.
The number of ways to choose k elements among n is
`choose(n,k) = n! / (k! (n-k)!)`
where factorial n: `n! = 1*2*...*n` becomes very big very fast.
But many terms cancel each other in "n choose k",
and it is a lot easier numerically to calculate the log factorial:
`log(n!) = log(1) + ... + log(n)`.
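
To see why the log helps, here is a minimal sketch (the name `log_factorial_sum` is hypothetical, not part of the exercise). Python integers can hold `200!` exactly, but the value is too large to convert to a float, while its log is a modest number. `math.lgamma(n + 1)` also equals `log(n!)` and is used here only as a cross-check.

```python
import math

def log_factorial_sum(n):
    """Return log(n!) as log(1) + ... + log(n), using the natural log."""
    return sum(math.log(i) for i in range(1, n + 1))

n = 200
try:
    as_float = float(math.factorial(n))   # 200! is too large for a float
except OverflowError:
    as_float = None
    print("200! does not fit in a float")

log_value = log_factorial_sum(n)
print(log_value)                                     # a few hundred, easy to handle
print(math.isclose(log_value, math.lgamma(n + 1)))   # cross-check with lgamma
```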

1. Write a function "logfactorial" that calculates
   `log(n!)` for any integer `n>0`.  
   Hint: use `math.log()` to calculate the log of a single value,
   and use a loop to iterate over `i` and get the `log(i)` values.
2. Add a *docstring*
3. Add *checks* on the input `n`
4. Add *tests* as examples inside the docstring.
   For the tests to be used, add a section using the **doctest** module.
5. Add an *optional argument* `k` to calculate
   `log(n!/k!) = log((k+1)*...*n) = log(k+1) + ... + log(n)`,
   with default `k=0`. Add an associated test.
6. Write a function "choose" to calculate the log of the binomial
   `log(choose(n,k))` for any integers `n>=0` and `0 <= k <= n`.
   Start with the docstring and with a test.
   Recall that `choose(n,k) = n!/(k! (n-k)!)`, so
   `log(choose(n,k)) = log(n!/k!) - log((n-k)!)`
   and you can use the function from step 5 twice.
7. Add an optional argument to this `choose` function,
   to return either the binomial coefficient itself (as an integer)
   or its log (as a `float`).
   Make the function return the binomial coefficient by default, not its log.
8. Add a docstring for the module itself

In [21]:
import math

def logfactorial(n, k=0):
    """Calculates the log factorial of n (number of possibilities), i.e.
    log(n!) = log(1) + log(2) + ... + log(n), using the natural log.
    n must be a non-negative integer.
    If a second argument k (number of outcomes) is provided (default: k=0), the function
    instead calculates log(n!/k!) = log(k+1) + log(k+2) + ... + log(n). k must also be
    a non-negative integer. Returns 0 if k >= n.
    Examples:
    >>> round(logfactorial(3), 5)
    1.79176
    >>> round(logfactorial(5,2), 5)
    4.09434
    >>> logfactorial(5,5)
    0
    >>> logfactorial(5,6)
    0
    """
    assert type(n)==int and type(k)==int, "Error: input must be an integer."
    # n = 0 is allowed: log(0!) = log(1) = 0, which choose(n, n) relies on
    assert n >= 0 and k >= 0, "Error: input must not be negative."
    logfac = 0
    if k > n:
        logfac = 0
    else:
        for step in range(k+1, n+1):
            logfac += math.log(step)
    return logfac

def choose(n,k,logoutput=False):
    """Calculates the binomial coefficient of (n,k), i.e. 'n choose k'. Provide a boolean
    third argument to specify whether to return the binomial coefficient itself
    (logoutput=False, the default) or its natural log (logoutput=True).
    Examples:
    >>> choose(5,1)
    5
    >>> choose(5,2)
    10
    >>> choose(9,5)
    126
    >>> round(choose(5,1,True), 6)
    1.609438
    >>> choose(5,0)
    1
    """
    assert type(n) == int and type(k) == int, "Error: n and k must both be integers."
    assert n >= 0, "Error: n must not be negative."
    assert 0 <= k and k <= n, "Error: k must be between zero and n."
    nik = n - k
    logcoeff = logfactorial(n,k) - logfactorial(nik)
    if logoutput:
        return logcoeff
    else:
        # math.exp uses the exact base e rather than a truncated 2.71828
        return round(math.exp(logcoeff))

In [29]:
print(round(math.log(3) + math.log(2) + math.log(1), 5))
print(type(1))

print("logfactorial(5,2) =", round(logfactorial(5,2), 5))
print("choose(5,2) =", choose(5,2))
print("choose(9,5) =", choose(9,5))
print(round(math.log(5),5))

1.79176
<class 'int'>
logfactorial(5,2) = 4.09434
choose(5,2) = 10
choose(9,5) = 126
1.60944
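
One way to gain confidence in the log-based approach is to compare it with an exact integer result. The sketch below assumes Python 3.8 or later, where `math.comb` is available; `log_choose` is a hypothetical stand-in for the exercise's functions, written compactly so that the block runs on its own.

```python
import math

def log_choose(n, k):
    """Return log(choose(n, k)) as log(n!/k!) - log((n-k)!), natural log."""
    log_n_over_k = sum(math.log(i) for i in range(k + 1, n + 1))
    log_n_minus_k = sum(math.log(i) for i in range(1, n - k + 1))
    return log_n_over_k - log_n_minus_k

for n, k in [(5, 2), (9, 5), (5, 5), (30, 15)]:
    from_logs = round(math.exp(log_choose(n, k)))   # back from the log scale
    exact = math.comb(n, k)                         # exact integer arithmetic
    print(n, k, from_logs, exact)
```

Because the log-based value goes through floating-point numbers, rounding it back to an integer is only reliable while the coefficient is small enough to be represented accurately as a float.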


## python scripts

- to use functions in a script `binomial.py` inside a python session
  as a **module**:

  `import binomial`

  then use function `foo` as `binomial.foo()`, `help(binomial)`, etc.
  If your script is not in the directory that python is in,
  add the script path to the list of paths that python knows about:
  `import sys` then `sys.path.append("path/to/script")`.

  * special predefined variables: try
    `binomial.__name__` and `binomial.__file__` after importing the module
  * documentation for the module: add a docstring at the beginning,
    after the shebang line if you have one

- to **run the script** from the command line, first
  put what should be run inside a test:

  ```python
  if __name__ == '__main__':
       command1 # things to do if script called
       command2 # from the command line.
  ```

  then do `python binomial.py` to run the script.

- to run it with `./binomial.py` or simply `binomial.py`,
  change the file permission to let you execute the file, e.g. with `chmod u+x`,
  and add the "shebang" at the beginning of the file:

  `#!/usr/bin/env python`

note: `env` is a shell command. `env python` finds the path to the python program and runs it. The shebang line has to give an absolute path, and the path to `env` is almost always `/usr/bin/env`, so this line makes your script portable to other users who might not have the same path to python as you.
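
The module mechanics described above can be tried without touching your own files. This is a minimal sketch: it writes a throwaway module named `binomial_demo.py` (a made-up name standing in for `binomial.py`) to a temporary directory, adds that directory to `sys.path`, and imports it.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny module to a temporary directory (a stand-in for binomial.py).
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "binomial_demo.py")
with open(module_path, "w") as f:
    f.write('"""demo module with one function"""\n')
    f.write("def foo():\n")
    f.write("    return __name__\n")

sys.path.append(module_dir)      # tell python where to look for the script
importlib.invalidate_caches()    # make sure the new file is noticed
binomial_demo = importlib.import_module("binomial_demo")

print(binomial_demo.__name__)    # the module name, not '__main__'
print(binomial_demo.__file__)    # path to the file that was imported
print(binomial_demo.__doc__)     # the module docstring
print(binomial_demo.foo())       # inside the module, __name__ is the module name
```

When the same file is run directly with `python binomial_demo.py`, `__name__` is `'__main__'` instead, which is what the `if __name__ == '__main__':` test relies on.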

## script arguments

script name and arguments are captured in the list `sys.argv`
after you `import sys`, but use the
[argparse](https://docs.python.org/dev/howto/argparse.html) library instead.

In [None]:
#!/usr/bin/env python
"""module with very cool functions to say 'hi'"""

import argparse
# use an Argument Parser object to handle script arguments
parser = argparse.ArgumentParser()
parser.add_argument("-n", type=int, default=0, help="number of times to say hi (default: 0)")
parser.add_argument("-l", "--long", action="store_true", help="whether to say 'hi' the long way")
parser.add_argument("-g", "--greetings", type=str, help="greeting message, like a name")
parser.add_argument("--test", action="store_true", help="tests the module and quits")
args = parser.parse_args()
hi = "Howdy" if args.long else "Hi"

# test argument problems early:
if not args.test and __name__ == '__main__':
    if args.n<0:
        raise Exception("argument -n must be 0 or positive")
    # no error if file imported as module

def print_greetings(extra_greetings, n=args.n):
    """
    print individualized greeting. example:
    >>> print_greetings("have a good day", 0)
    have a good day, you.
    """
    s = ""
    for i in range(0,n):
        s += hi + ", "
    if extra_greetings:
        s += extra_greetings + ", "
    s += args.greetings if args.greetings else "you"
    s += "."
    print(s)

def runTests():
    print("testing the module...")
    if args.n:
        print("ignoring n for testing purposes")
    import doctest
    doctest.testmod()
    print("done with tests.")

if __name__ == '__main__':
    if args.test:
        runTests()
    else:
        print_greetings("")

we could save the example above in a file `example.py` and use it in various ways
from the shell:
```shell
./example.py --help
./example.py --test
./example.py -n=1 --long -g=cecile
./example.py -n 1 --long -g cecile
```
or within python:
```python
import example
help(example)
example.print_greetings("happy halloween", 3)
```
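
To experiment with the parser without going through the shell, `parse_args` also accepts an explicit list of strings. The sketch below rebuilds three of the arguments from the example; the `default=0` for `-n` is an assumption of this sketch, so that `n` is always an integer.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-n", type=int, default=0, help="number of times to say hi")
parser.add_argument("-l", "--long", action="store_true", help="say 'hi' the long way")
parser.add_argument("-g", "--greetings", type=str, help="greeting message, like a name")

# Equivalent to: ./example.py -n 2 --long -g cecile
args = parser.parse_args(["-n", "2", "--long", "-g", "cecile"])
print(args.n, args.long, args.greetings)

# With no arguments, the defaults apply.
defaults = parser.parse_args([])
print(defaults.n, defaults.long, defaults.greetings)
```

An option declared without a `default` (like `-g` here) is set to `None` when it is not given, so any code that uses it needs to handle that case.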

## test python code automatically

- test each function in your code,
  run *all* tests each time you change your code.
- big thing: new features often break older functions.
- each time you fix a bug: add a new test,
  for the situation in which the bug appeared
- there are many modules for automatic testing; one is `doctest`.

first, add examples to the docstring of each function:

In [12]:
def choose(n, k):
    """returns the binomial coefficient.
    Examples:

    >>> choose(5,3)
    10
    """
    # function body that does the calculations

second, call `doctest.testmod()`, for example when the
file is run as a script:

In [13]:
if __name__ == '__main__':
    import doctest
    doctest.testmod()

**********************************************************************
File "__main__", line 5, in __main__.choose
Failed example:
    choose(5,3)
Expected:
    10
Got nothing
**********************************************************************
1 items had failures:
   1 of   1 in __main__.choose
***Test Failed*** 1 failures.
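
The failure above is expected: the body of `choose` is only a comment, so the function returns `None` and doctest reports "Got nothing". For comparison, here is a hedged sketch of a version whose examples pass. It uses an exact integer product rather than the log-based approach from the exercise, and it runs only the examples in this one docstring through `doctest.DocTestParser` and `doctest.DocTestRunner` instead of `doctest.testmod()`.

```python
import doctest

def choose(n, k):
    """Return the binomial coefficient 'n choose k'.
    Examples:

    >>> choose(5, 3)
    10
    >>> choose(5, 0)
    1
    """
    result = 1
    for i in range(1, k + 1):
        # multiply then divide: each intermediate result is itself a
        # binomial coefficient, so the integer division is exact
        result = result * (n - k + i) // i
    return result

# Run only the examples found in choose's docstring.
test = doctest.DocTestParser().get_doctest(
    choose.__doc__, {"choose": choose}, "choose", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results)   # counts of failed and attempted examples
```

When all examples pass, doctest prints nothing on its own, so printing the returned counts is a simple way to confirm that the tests actually ran.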
