# Introduction to Python (part 2)

## What is a library?

Research programming is all about using libraries: tools that other people have written and shared with the community, and that do many cool things.
The Python keyword for loading someone else’s library is `import`.

In [51]:
import geopy  # A python library for investigating geographic information. https://pypi.org/project/geopy/


ModuleNotFoundError: No module named 'geopy'

Now, if you try to follow along with this example in a Jupyter notebook, you’ll probably find that you just got an error message.
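
The error means that Python could not find `geopy` in the current environment. Third-party libraries usually have to be installed first, for example with `pip install geopy`. As a minimal sketch, you can check whether a module is available before importing it. The helper name `is_installed` is made up for this illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python, so it should always be found.
print(is_installed("json"))

# Whether "geopy" is found depends on what is installed in your environment.
print(is_installed("geopy"))
```

If the second line prints `False`, install the library and run the import again.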

## Writing functions

Defining functions which put together code to make a more complex task seem simple from the outside is the most important thing in programming. 

Imagine you want to compute the mean of a list of numbers **AND** there's no library that can do this for you. (If a library exists, it is almost always better to rely on it: other people will have used it before and caught many of the bugs.)

✏️ **Exercises:** Given a list of numbers, for instance [1,4,5,6,7] - how would you compute the mean without a computer?

A good starting point for defining a function is to think about how to convert the process of doing the operation "manually" into a series of well-defined steps. Let's see an example below.

In [52]:
def compute_mean(numbers):
    mean = sum(numbers)/len(numbers)
    return mean

This is the basic structure of a function: you have a function name (`compute_mean`), an input (`numbers`), a series of operations, and an output (`mean`). However, this function is also **a bit difficult to read**, so let's go through the different components together.

`sum` and `len` are built-in Python functions. This means they are always available; you don't need to import them. You can see the documentation of `sum` [here](https://docs.python.org/3/library/functions.html#sum).

In [64]:
numbers = [1,4,5,6,7]
numbers_sum = sum(numbers)
print(numbers_sum)

23


In [65]:
number_of_elements = len(numbers)
print(number_of_elements)

5


In [66]:
mean = numbers_sum / number_of_elements
print(mean)

4.6


✏️ **Exercises:** What is the mean of [10,124,65,86,7,98,6,54,112,13,87] ?

A function saves you from repeating the same commands over and over in your code: you define a specific operation once (for instance `compute_mean`) and can then use it whenever needed. This also helps you avoid introducing bugs through copy-paste mistakes.

However, there are two problems with functions:
1. they need to be well documented, otherwise it will be difficult to read them (even by you in a few weeks!)
2. they can have bugs too! So you need to be careful

## Documenting functions

A Python docstring is a documentation string. When you call the built-in help() function on a Python function for instance, you see its documentation. This documentation is specified by the docstring at the beginning of the definition.

In [67]:
# help() prints a function's documentation, so others can quickly understand how to use it
help(sum)

Help on built-in function sum in module builtins:

sum(iterable, /, start=0)
    Return the sum of a 'start' value (default: 0) plus an iterable of numbers
    
    When the iterable is empty, return the start value.
    This function is intended specifically for use with numeric values and may
    reject non-numeric types.
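
As a small illustration of where this text comes from: the docstring is stored on the function itself, in its `__doc__` attribute, and `help()` displays it. The function `greet` below is a hypothetical example, not part of this lesson's code.

```python
def greet(name):
    """Return a short greeting for the given name."""
    return "Hello, " + name


# The docstring is the first string literal in the function body.
print(greet.__doc__)

# Built-in functions carry docstrings too.
print(sum.__doc__)
```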



In [75]:
def compute_mean(numbers):
    """ compute the mean, given a list of numbers """
    mean = sum(numbers)/len(numbers)
    return mean

OK, this looks better, but the input, output and operation are still hard to read. One way of documenting functions is to follow the [Google style for docstrings](https://google.github.io/styleguide/pyguide.html). Here's an example:

In [76]:
def compute_mean(numbers):
    """ compute the mean, given a list of numbers 
    
    Args:
        numbers: List of integers
    
    Returns:
        The mean of the values contained in the list
    """
    mean = sum(numbers)/len(numbers)
    return mean

help(compute_mean)

Help on function compute_mean in module __main__:

compute_mean(numbers)
    compute the mean, given a list of numbers 
    
    Args:
        numbers: List of integers
    
    Returns:
        The mean of the values contained in the list



Additionally, you can annotate the types of the arguments and of the return value:

In [77]:
def compute_mean(numbers:list)-> float:
    """ compute the mean, given a list of numbers 
    
    Args:
        numbers: List of integers
    
    Returns:
        The mean of the values contained in the list
    """
    mean = sum(numbers)/len(numbers)
    return mean

help(compute_mean)

Help on function compute_mean in module __main__:

compute_mean(numbers: list) -> float
    compute the mean, given a list of numbers 
    
    Args:
        numbers: List of integers
    
    Returns:
        The mean of the values contained in the list
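
Note that Python does not check these type hints when the code runs; they are documentation for readers and for external tools such as type checkers. The sketch below, which repeats the function in a shortened form so it is self-contained, shows that the hints are stored in `__annotations__` and that passing a tuple instead of a list still works.

```python
def compute_mean(numbers: list) -> float:
    """Compute the mean, given a list of numbers."""
    return sum(numbers) / len(numbers)


# The hints are stored on the function, but not enforced at call time.
print(compute_mean.__annotations__)

# A tuple is not a list, yet the call succeeds because sum() and len() accept it.
print(compute_mean((1, 4, 5, 6, 7)))  # prints 4.6
```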



Finally, you can make the variable names clearer, or even break the computation down into separate steps to make it more readable:

In [78]:
def compute_mean(list_of_numbers:list)-> float:
    """ compute the mean, given a list of numbers 
    
    Args:
        list_of_numbers: List of numbers (either integers or floats)
    
    Returns:
        The mean of the values contained in the list
    """
    sum_list = sum(list_of_numbers)
    length_list = len(list_of_numbers)
    mean = sum_list / length_list
    return mean

In [79]:
# to use the function

list_a = [1,4,5,6,7]
list_b = [10,124,65,86,7,98,6,54,112,13,87]

mean_a = compute_mean(list_a)
mean_b = compute_mean(list_b)

print(mean_a)
print(mean_b)

4.6
60.18181818181818
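
In practice you would not need to write this function yourself: Python's standard library includes a `statistics` module with a `mean` function. As a quick cross-check (the function is repeated here in a shortened form so the snippet runs on its own), both approaches should agree on the two lists above.

```python
import math
import statistics


def compute_mean(list_of_numbers: list) -> float:
    """Compute the mean, given a list of numbers."""
    return sum(list_of_numbers) / len(list_of_numbers)


list_a = [1, 4, 5, 6, 7]
list_b = [10, 124, 65, 86, 7, 98, 6, 54, 112, 13, 87]

for numbers in (list_a, list_b):
    ours = compute_mean(numbers)
    reference = statistics.mean(numbers)
    # Compare with a tolerance, since floating-point results can differ in the last digits.
    print(ours, reference, math.isclose(ours, reference))
```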


## Testing Functions

Functions are really useful tools when writing code. They allow your scripts to be more concise and modular. However, functions can easily introduce bugs into your code. Let's look at the following example:

In [80]:
def compute_mean(list_of_numbers:list)-> float:
    """ compute the mean, given a list of numbers 
    
    Args:
        list_of_numbers: List of integers
    
    Returns:
        The mean of the values contained in the list
    """
    sum_list = sum(list_of_numbers)
    length_list = len(list_of_numbers)
    mean = sum_list / length_list
    return length_list

mean_a = compute_mean(list_a)
print(mean_a)

5


In this example the function is well documented and, as expected, returns a number. However, by mistake we are returning the length of the list instead of the computed mean. Bugs like this one are very easy to make and hard to spot when you have a fairly complex pipeline.

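To make sure their functions work correctly, people often test them quickly by hand right after implementing them, or try to spot errors by looking at the final output of the code. However, both approaches have problems: quick manual checks are not repeated when the code later changes, and an error in the final output is hard to trace back to the function that caused it.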

So, what are best practices in testing your code?

A good starting point is to define some specific cases where you test your function and you know what output it should give, for instance:


In [81]:
# In Python, the assert statement lets execution continue if the given condition evaluates to True.
# If the condition evaluates to False, it raises an AssertionError with the specified error message.

assert compute_mean([1,4,5,6,7]) == 4.6,'The mean is not correct'

AssertionError: The mean is not correct

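Here the assertion fails because `compute_mean` is still the buggy version defined above, which returns the length of the list (5) instead of the mean (4.6). Every time you write a function, define a series of assert statements, each checking that a known input produces a specific, known outcome.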
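
For example, a small set of checks for a correct version of `compute_mean` might look like the sketch below. The specific test cases are our own choices for illustration; `math.isclose` is used where floating-point rounding could make an exact `==` comparison fail.

```python
import math


def compute_mean(list_of_numbers: list) -> float:
    """Compute the mean, given a list of numbers."""
    return sum(list_of_numbers) / len(list_of_numbers)


# Cases with a known answer, worked out by hand.
assert compute_mean([1, 4, 5, 6, 7]) == 4.6, 'The mean is not correct'
assert compute_mean([2, 2, 2]) == 2, 'The mean of identical values should be that value'
assert compute_mean([-1, 1]) == 0, 'Negative numbers are not handled correctly'
assert compute_mean([5]) == 5, 'A single-element list should return that element'

# Floats: compare with a tolerance rather than exact equality.
assert math.isclose(compute_mean([0.1, 0.2, 0.3]), 0.2), 'The mean of floats is not correct'

print('All checks passed')
```

If any of these checks fails, Python stops at that line and shows the corresponding message, which points you directly at the case that broke.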

Another important thing to test is that the input is what you expect. For instance, our function expects a list of numbers (either integers or floats).

What happens if the list contains a string? Or if the input list is empty? Or if the input is not a list at all?
Let's see!

In [82]:
compute_mean([1,4,"0.555",6,7])

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [83]:
compute_mean([])

ZeroDivisionError: division by zero

In [84]:
compute_mean(5)

TypeError: 'int' object is not iterable

As you can see, the code crashes for different reasons. Instead, we should assert that the input is what we expect and report a clear message when it is not:

In [None]:
def compute_mean(list_of_numbers:list)-> float:
    """ compute the mean, given a list of numbers 
    
    Args:
        list_of_numbers: List of numbers (either integers or floats)
    
    Returns:
        The mean of the values contained in the list
    """

    assert type(list_of_numbers) is list, 'The input is not a list'
    assert len(list_of_numbers) >0, 'The input list is empty'

    sum_list = sum(list_of_numbers)
    length_list = len(list_of_numbers)
    mean = sum_list / length_list
    return mean
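
To see the effect, the sketch below repeats the function in a shortened form and calls it with two of the problematic inputs from above. The failures now come with the messages we wrote ourselves. The helper `describe_failure` is a hypothetical name introduced only for this demonstration.

```python
def compute_mean(list_of_numbers: list) -> float:
    """Compute the mean, given a list of numbers (integers or floats)."""
    assert type(list_of_numbers) is list, 'The input is not a list'
    assert len(list_of_numbers) > 0, 'The input list is empty'
    return sum(list_of_numbers) / len(list_of_numbers)


def describe_failure(bad_input) -> str:
    """Call compute_mean and return the assertion message, or 'no error'."""
    try:
        compute_mean(bad_input)
    except AssertionError as error:
        return str(error)
    return 'no error'


print(describe_failure([]))  # prints: The input list is empty
print(describe_failure(5))   # prints: The input is not a list
print(describe_failure([1, 4, 5, 6, 7]))  # prints: no error
```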

✏️ **Exercises:** Add an assert to test that all elements in the list are integers or floats.