# 🌎 GPGN268 - Geophysical Data Analysis
- **Instructor:** Bia Villas Boas  
- **TA:** Seunghoo Kim

## Lecture 12: Functions, documentation and unit testing.

#### 🎯 Learning Objectives from this Lecture:
- Define a function that takes parameters.
- Return a value from a function.
- Test and debug a function.
- Set default values for function parameters.
- Explain why we should divide programs into small, single-purpose functions.

## Background
At this point, we’ve written code to draw some interesting features in our meteorological data. But our code is getting pretty long and complicated. What if we had thousands of datasets and wanted to generate a figure for every single one? And what if we want to use that code again, on a different dataset or at a different point in our program? Copying and pasting it is going to make our code very long and very repetitive, very quickly.

In [1]:
# Import libraries
import numpy as np
import matplotlib.pyplot as plt

Suppose we wanted to check if there are any invalid measurements in our meteorological files. We could do something like this:

In [2]:
# Read tmax and print statistics
tmax = np.loadtxt(fname='../assignments/intro-python/data/meteo_denver_tmax_2000_2022.txt',
                  delimiter='\t')

print('Maximum:', tmax.max())
print('Minimum:', tmax.min())
print('Mean:', tmax.mean())
print('Standard deviation:', tmax.std())

Maximum: 96.7
Minimum: 30.6
Mean: 67.9909420289855
Standard deviation: 16.614984435991314


In [3]:
# Read precip and print statistics
precip = np.loadtxt(fname='../assignments/intro-python/data/meteo_denver_precip_2000_2022.txt',
                    delimiter='\t')

print('Maximum:', precip.max())
print('Minimum:', precip.min())
print('Mean:', precip.mean())
print('Standard deviation:', precip.std())

Maximum: 6.69
Minimum: -999.0
Mean: -6.003659420289854
Standard deviation: 84.84519883429958


Copying and pasting the code for each file is not only repetitive but also prone to errors and typos. We’d like a way to package our code so that it is easier to reuse, and Python provides for this by letting us define things called ‘functions’, a shorthand way of re-executing longer pieces of code.

Functions can help you to both eliminate repetition and improve efficiency in your code through **modularity**.

> Modularity means that code is separated into independent units that can be reused and even be combined to complete a longer chain of tasks.

## Functions

Let’s start by defining a function `fahr_to_celsius` that converts temperatures from Fahrenheit to Celsius:

In [4]:
def fahr_to_celsius(temp):
    return ((temp - 32) * (5/9))

<center><img src="https://swcarpentry.github.io/python-novice-inflammation/fig/python-function.svg"/></center>

The function definition opens with the keyword `def`, followed by the name of the function (`fahr_to_celsius`) and a parenthesized list of parameter names (`temp`). The body of the function, that is, the statements that are executed when it runs, is indented below the definition line. The body concludes with a `return` keyword followed by the return value.

When we call the function, the values we pass to it are assigned to those variables so that we can use them inside the function. Inside the function, we use a return statement to send a result back to whoever asked for it.

Let’s try running our function.

In [5]:
fahr_to_celsius(32)

0.0

This command calls our function with 32 as the input and returns the result of the conversion.

In fact, calling our own function is no different from calling any other function, such as `print`, and we can combine functions like below:

In [6]:
print('freezing point of water:', fahr_to_celsius(32), 'C')
print('boiling point of water:', fahr_to_celsius(212), 'C')

freezing point of water: 0.0 C
boiling point of water: 100.0 C


We’ve successfully called the function that we defined, and we have access to the value that we returned.
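Because the body of `fahr_to_celsius` uses only arithmetic, the same function should also work element-wise on a NumPy array, such as a column of daily temperatures. A minimal sketch, using made-up sample values rather than the course data:

```python
import numpy as np

def fahr_to_celsius(temp):
    return (temp - 32) * (5 / 9)

# Hypothetical sample temperatures in Fahrenheit (not from the course files)
sample_fahr = np.array([32.0, 50.0, 212.0])

# The arithmetic is applied to every element of the array at once
sample_celsius = fahr_to_celsius(sample_fahr)
print(sample_celsius)
```

The function itself did not change; only the type of the argument did.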

### Combining functions

Now that we’ve seen how to turn Fahrenheit into Celsius, we can also write a function to turn Celsius into Kelvin:

In [7]:
def celsius_to_kelvin(temp_c):
    return temp_c + 273.15

print('freezing point of water in Kelvin:', celsius_to_kelvin(0.))

freezing point of water in Kelvin: 273.15


What about converting Fahrenheit to Kelvin? We could write out the formula, but we don’t need to. Instead, we can combine the two functions we have already created:

In [8]:
def fahr_to_kelvin(temp_f):
    temp_c = fahr_to_celsius(temp_f)
    temp_k = celsius_to_kelvin(temp_c)
    return temp_k

print('boiling point of water in Kelvin:', fahr_to_kelvin(212.0))

boiling point of water in Kelvin: 373.15


This is our first taste of how larger programs are built: we define basic operations, then combine them in ever-larger chunks to get the effect we want. Real-life functions will usually be larger than the ones shown here — typically half a dozen to a few dozen lines — but they shouldn’t ever be much longer than that, or the next person who reads it won’t be able to understand what’s going on.

### Variable Scope

In composing our temperature conversion functions, we created variables inside of those functions: `temp`, `temp_c`, `temp_f`, and `temp_k`. We refer to these variables as local variables because they no longer exist once the function is done executing. If we try to access their values outside of the function, we will encounter an error:

In [10]:
print('Again, temperature in Kelvin was:', temp_k)

NameError: name 'temp_k' is not defined

If you want to reuse the temperature in Kelvin after you have calculated it with `fahr_to_kelvin`, you can store the result of the function call in a variable:

In [11]:
temp_kelvin = fahr_to_kelvin(212.0)
print('temperature in Kelvin was:', temp_kelvin)

temperature in Kelvin was: 373.15


The variable `temp_kelvin`, being defined outside any function, is said to be **global**. Inside a function, one can read the value of such global variables:

In [12]:
def print_temperatures():
    print('temperature in Fahrenheit was:', temp_fahr)
    print('temperature in Kelvin was:', temp_kelvin)

temp_fahr = 212.0
temp_kelvin = fahr_to_kelvin(temp_fahr)

print_temperatures()

temperature in Fahrenheit was: 212.0
temperature in Kelvin was: 373.15
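
Reading a global variable inside a function works, but assigning to the same name inside a function creates a new local variable and leaves the global one untouched. A small sketch of that behavior (the function name `reset_temperature` is made up for illustration):

```python
temp_kelvin = 373.15  # global variable

def reset_temperature():
    # This assignment creates a local variable named temp_kelvin;
    # it does not modify the global variable with the same name.
    temp_kelvin = 0.0
    return temp_kelvin

local_value = reset_temperature()
print('value returned by the function:', local_value)
print('global temp_kelvin afterwards:', temp_kelvin)
```

This is one reason to pass values in as parameters and hand results back with `return`, rather than relying on global variables.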


### Writing a function for our own data
Let's define a function to print the statistics of our data:

In [13]:
def print_stats(data):
    print('###### Statistics ######')
    print('Maximum:', data.max())
    print('Minimum:', data.min())
    print('Mean:', data.mean())
    print('Standard deviation:', data.std())

In [14]:
print_stats(precip)

###### Statistics ######
Maximum: 6.69
Minimum: -999.0
Mean: -6.003659420289854
Standard deviation: 84.84519883429958


Note that `return` and `print` are not interchangeable. `print` is a Python function that prints data to the screen. It lets us, the users, see the data. A `return` statement, on the other hand, makes data visible to the program.
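
One way to see the difference is to store what each kind of function hands back. The sketch below uses a hypothetical `compute_stats` helper and made-up sample values; a shortened `print_stats` is repeated so the example runs on its own.

```python
import numpy as np

def print_stats(data):
    # Only shows text on the screen; there is no return statement
    print('Maximum:', data.max())
    print('Minimum:', data.min())

def compute_stats(data):
    # Hypothetical helper: returns the values so the program can reuse them
    return data.max(), data.min()

sample = np.array([1.0, 2.0, 3.0])  # made-up values

printed = print_stats(sample)               # a function without return gives None
data_max, data_min = compute_stats(sample)  # returned values can be stored

print('print_stats returned:', printed)
print('range of the data:', data_max - data_min)
```

Only the returned values can be used in later calculations, such as the range computed on the last line.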

> Note that we named our function `print_stats`. By giving our functions human-readable names, we can more easily read and understand what is happening in our code. 
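
The learning objectives mention default values for function parameters, and the precipitation minimum of -999.0 printed above suggests a natural use for one. The sketch below assumes that -999.0 is a flag for missing measurements (the data files are not documented here, so this is a guess) and uses a hypothetical helper, `select_valid`, with that flag as the default value of a parameter.

```python
import numpy as np

def select_valid(data, missing_value=-999.0):
    """Return the entries of data that are not equal to missing_value.

    missing_value defaults to -999.0, which is assumed (not confirmed)
    to be the flag used for invalid measurements.
    """
    return data[data != missing_value]

sample = np.array([0.0, 0.25, -999.0, 1.5])  # made-up values

# Uses the default flag of -999.0
print(select_valid(sample))

# Overrides the default by naming the parameter
print(select_valid(sample, missing_value=0.0))
```

Callers who are happy with the default can leave the argument out, while others can override it by name.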

## Testing and Documenting

Once we start putting things in functions so that we can re-use them, we need to start documenting and testing to see if those functions are working correctly. To see how to do this, let’s write a function that converts from centimeters to inches.

In [15]:
def centimeters_to_inches(cm):
    inches = cm / 2.54    
    return inches

### Documenting
In Python, functions should also contain a docstring, or a multi-line documentation comment, that provides details about the function, including the specifics of the input parameters and the returns (e.g. type of objects, additional description) and any other important documentation about how to use the function. Docstrings look something like this:

In [16]:
def centimeters_to_inches(cm):
    """Docstrings should include a short description of the function here 
    
    Then you skip a line and write a detailed description 
    as well as identify the parameters (inputs) that the function 
    can take and the return (output) provided by the function,
    as shown below. 
    
    Parameters
    ----------
    input : type
        Description of input.
    
    Returns
    ------
    output : type
        Description of output.
    
    Notes
    -----
    Add notes here.
    
    Examples
    --------
    Add examples here
    
    """
    
    inches = cm / 2.54    
    
    return inches

Note that a docstring is not required for the function to work in Python. However, good documentation will save you time in the future when you need to use this code again, and it also helps others understand how they can use your function.

> You can learn more about docstrings in the [PEP 257 guidelines](https://peps.python.org/pep-0257/).

### Testing
Before applying the function directly to our data, we could try this on a couple of reference values to make sure the function is working properly.

In [17]:
print('Zero is zero:', centimeters_to_inches(0))

Zero is zero: 0.0


In [18]:
print('2.54 cm is:', centimeters_to_inches(2.54), 'in')

2.54 cm is: 1.0 in


We saw in a previous lecture that booleans can be interpreted as `True` or `False` or as integers, where `True == 1` and `False == 0`. So if you accidentally passed a boolean as an input to your function, what would happen?

In [19]:
centimeters_to_inches(True)

0.39370078740157477

We would prefer that our function failed and warned us that the input doesn't seem right. Designing functions that catch errors before they cause problems is called **defensive programming**.

#### Assertions

The first step toward getting the right answers from our programs is to assume that mistakes will happen and to guard against them. This is called defensive programming, and the most common way to do it is to add assertions to our code so that it checks itself as it runs. An assertion is simply a statement that something must be true at a certain point in a program. When Python sees one, it evaluates the assertion’s condition. If it’s true, Python does nothing, but if it’s false, Python halts the program immediately and prints the error message if one is provided. For example, we might want our function to break if the input is not a float:

In [20]:
def centimeters_to_inches(cm):
    assert (type(cm)==float), "Input should be a float"
    inches = cm / 2.54    
    return inches

In [21]:
centimeters_to_inches(10)

AssertionError: Input should be a float

For this case, our `centimeters_to_inches` function should work with floats, integers, or arrays, so let's implement that:

In [22]:
# The backslash in the assert statement below tells Python
# that we're breaking a line. This allows writing the code in a more
# readable way (instead of having a very long statement in a single line)

def centimeters_to_inches(cm):
    assert (type(cm)==float) or (type(cm)==int) or (type(cm)==np.ndarray),\
    "Input should be a float, integer, or numpy array"
    inches = cm / 2.54    
    return inches

Now, let's test:

In [23]:
# Testing with float
centimeters_to_inches(10.1)

3.976377952755905

In [24]:
# Testing with integer
centimeters_to_inches(1)

0.39370078740157477

In [25]:
# Testing with array
a = np.array([10.1, 1])
centimeters_to_inches(a)

array([3.97637795, 0.39370079])

In [26]:
centimeters_to_inches(True)

AssertionError: Input should be a float, integer, or numpy array
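
The boolean is rejected because the assertion compares exact types with `type(...) ==`. The sketch below shows why `isinstance` alone would not reject it, and one side effect of the exact comparison: NumPy scalars, such as the value returned by `tmax.max()`, have their own types and would be expected to fail the assertion as well. The helper `is_valid_length` is a hypothetical alternative, not part of the lecture code.

```python
import numpy as np

print(isinstance(True, int))            # True: bool is a subclass of int
print(type(True) == int)                # False: the exact type is bool
print(type(np.float64(1.0)) == float)   # False: the exact type is np.float64

def is_valid_length(value):
    """Hypothetical helper: accept numbers and arrays, reject booleans."""
    # Reject booleans first, because isinstance(True, int) is True
    if isinstance(value, (bool, np.bool_)):
        return False
    return isinstance(value, (int, float, np.integer, np.floating, np.ndarray))

print(is_valid_length(np.float64(1.0)))
print(is_valid_length(True))
```

Which check is appropriate depends on the inputs the function is meant to support. Note that this sketch does not inspect the contents of an array, so an array of booleans would still be accepted.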

Perfect! Now our function works for floats, integers, and arrays, but breaks if a boolean is given as input. Now that we know how our function should behave, it's time to improve the documentation:

In [27]:
def centimeters_to_inches(cm):
    """Convert from centimeters to inches.
    
    Takes a given input in centimeters and returns the 
    equivalent distance or collection of distances in inches.
    
    Parameters
    ----------
    cm : ndarray or int or float
        Input data.
    
    Returns
    ------
    inches : ndarray or scalar
        Distance or collection of distances in inches.
    """
    assert (type(cm)==float) or (type(cm)==int) or (type(cm)==np.ndarray),\
    "Input should be a float, integer, or numpy array"
    inches = cm / 2.54    
    return inches

Since this function is fairly simple, we don't need to add notes or examples. As you start writing more complex functions, this extra information will be very helpful. We can now access the docstring of our function by running:

In [28]:
centimeters_to_inches?

Signature: centimeters_to_inches(cm)
Docstring:
Convert from centimeters to inches.

Takes a given input in centimeters and returns the 
equivalent distance or collection of distances in inches.

Parameters
----------
cm : ndarray or int or float
    Input data.

Returns
------
inches : ndarray or scalar
    Distance or collection of distances in inches.
File:      /var/folders/f3/f5vjyxnd4vd2qfk05qn6ss740000gq/T/ipykernel_96201/2252703902.py
Type:      function

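The `?` syntax is an IPython and Jupyter feature. In a plain Python script or interpreter, the same docstring should be reachable through the built-in `help` function or the `__doc__` attribute. A minimal sketch with a shortened docstring, which also fills in the optional Examples section:

```python
def centimeters_to_inches(cm):
    """Convert from centimeters to inches.

    Examples
    --------
    >>> centimeters_to_inches(2.54)
    1.0
    """
    return cm / 2.54

# Formatted help text, including the signature
help(centimeters_to_inches)

# The raw docstring, stored as a string on the function object
print(centimeters_to_inches.__doc__)
```
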
#### Types of assertions
Programs like the Firefox browser are full of assertions: 10–20% of the code they contain is there to check that the other 80–90% is working correctly. Broadly speaking, assertions fall into three categories:

- A precondition is something that must be true at the start of a function in order for it to work correctly.

- A postcondition is something that the function guarantees is true when it finishes.

- An invariant is something that is always true at a particular point inside a piece of code.

For example, suppose we are representing rectangles using a list of four coordinates [x0, y0, x1, y1], representing the lower left and upper right corners of the rectangle. In order to do some calculations, we need to normalize the rectangle so that the lower left corner is at the origin and the longest side is 1.0 units long. This function does that, and it checks that its input is correctly formatted and that its result makes sense:

In [29]:
def normalize_rectangle(rect):

    assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
    x0, y0, x1, y1 = rect
    assert x0 < x1, 'Invalid X coordinates'
    assert y0 < y1, 'Invalid Y coordinates'

    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        scaled = float(dx) / dy
        upper_x, upper_y = 1.0, scaled
    else:
        scaled = float(dx) / dy
        upper_x, upper_y = scaled, 1.0

    assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
    assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'

    return (0, 0, upper_x, upper_y)

The preconditions on lines 3, 5, and 6 catch invalid inputs:

In [30]:
# missing the fourth coordinate
normalize_rectangle([0.0, 1.0, 2.0])

AssertionError: Rectangles must contain 4 coordinates

In [31]:
# X axis is inverted
normalize_rectangle([4.0, 2.0, 1.0, 5.0]) 

AssertionError: Invalid X coordinates

The postconditions on lines 17 and 18 help us catch bugs by telling us when our calculations might have been incorrect. For example, if we normalize a rectangle that is taller than it is wide, everything seems OK:

In [32]:
normalize_rectangle([0.0, 0.0, 1.0, 5.0])

(0, 0, 0.2, 1.0)

but if we normalize one that’s wider than it is tall, the assertion is triggered:

In [33]:
normalize_rectangle([0.0, 0.0, 5.0, 1.0])

AssertionError: Calculated upper Y coordinate invalid

Re-reading our function, we realize that line 11 should divide `dy` by `dx` rather than `dx` by `dy`. If we had left out the assertion at the end of the function, we would have created and returned something that had the right shape as a valid answer, but wasn’t. Detecting and debugging that would almost certainly have taken more time in the long run than writing the assertion.

But assertions aren’t just about catching errors: they also help people understand programs. Each assertion gives the person reading the program a chance to check (consciously or otherwise) that their understanding matches what the code is doing.

Most good programmers follow two rules when adding assertions to their code. 

1. **Fail early, fail often**. The greater the distance between when and where an error occurs and when it’s noticed, the harder the error will be to debug, so good code catches mistakes as early as possible.

2. **Turn bugs into assertions or tests**. Whenever you fix a bug, write an assertion that catches the mistake should you make it again. If you made a mistake in a piece of code, the odds are good that you have made other mistakes nearby, or will make the same mistake (or a related one) the next time you change it. Writing assertions to check that you haven’t regressed (i.e., haven’t re-introduced an old problem) can save a lot of time in the long run, and helps to warn people who are reading the code (including your future self) that this bit is tricky.
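
Applying the second rule to the rectangle example could look like the sketch below. It repeats `normalize_rectangle` with the division corrected as described above (`dy` divided by `dx` for wide rectangles) and adds a hypothetical test function, `test_normalize_rectangle`, that keeps the wide-rectangle case that exposed the bug.

```python
def normalize_rectangle(rect):
    """Normalize a rectangle so the lower left corner is at the origin
    and the longest side is 1.0 units long."""
    assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
    x0, y0, x1, y1 = rect
    assert x0 < x1, 'Invalid X coordinates'
    assert y0 < y1, 'Invalid Y coordinates'

    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        # Corrected: for a wide rectangle, scale the height by the width
        scaled = float(dy) / dx
        upper_x, upper_y = 1.0, scaled
    else:
        scaled = float(dx) / dy
        upper_x, upper_y = scaled, 1.0

    assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
    assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'

    return (0, 0, upper_x, upper_y)

def test_normalize_rectangle():
    # Tall rectangle: this case already worked before the fix
    assert normalize_rectangle([0.0, 0.0, 1.0, 5.0]) == (0, 0, 0.2, 1.0)
    # Wide rectangle: the case that triggered the assertion before the fix
    assert normalize_rectangle([0.0, 0.0, 5.0, 1.0]) == (0, 0, 1.0, 0.2)
    print('Passed all tests')

test_normalize_rectangle()
```

If a later change re-introduced the same mistake, the second assertion in the test would fail right away.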

### Unit testing

Finally, as functions get more complex and we need more tests, it is good practice to write another function that runs the tests and reports whether the function has passed all of them. For example, for our `centimeters_to_inches` function we can package some checks in a separate test function:

In [34]:
def test_centimeters_to_inches():
    assert centimeters_to_inches(0) == 0
    assert centimeters_to_inches(2.54) == 1
    print("Passed all tests")

In [35]:
test_centimeters_to_inches()

Passed all tests
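
The test function above can be extended in the same spirit. The sketch below is one possible version, not a definitive test suite: it compares floating-point results with a tolerance using `np.isclose` and `np.testing.assert_allclose`, and it also checks that a boolean input is rejected. The function under test is repeated so the example runs on its own.

```python
import numpy as np

def centimeters_to_inches(cm):
    assert (type(cm) == float) or (type(cm) == int) or (type(cm) == np.ndarray), \
        "Input should be a float, integer, or numpy array"
    return cm / 2.54

def test_centimeters_to_inches():
    # Reference values
    assert centimeters_to_inches(0) == 0
    assert np.isclose(centimeters_to_inches(10.1), 3.976377952755905)

    # Arrays are compared element by element, within a small tolerance
    np.testing.assert_allclose(
        centimeters_to_inches(np.array([2.54, 5.08])), [1.0, 2.0])

    # A boolean input should trigger the assertion inside the function
    try:
        centimeters_to_inches(True)
    except AssertionError:
        pass
    else:
        raise AssertionError("Boolean input was not rejected")

    print("Passed all tests")

test_centimeters_to_inches()
```

Comparing with a tolerance avoids tests that fail only because of floating-point rounding.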
