# Functions and Function Calls

In the previous notebooks, we have already used a Python concept called **functions**.

Python provides many built-in functions (free stuff to play with), which are usually indicated by the **round parentheses** at the end of a statement.

For example:

In [2]:
type("42")

str

In Python lingo, this statement is referred to as "calling a function". Actually, many things are happening in this very short code example.
- We call a function with the **name** `type`.
- We pass a value or variable as an **argument** to this function (the string `"42"`).
- The function **returns** the type of the argument (`str`).

Functions usually perform an action on a given input and return the result of this transformation (not always, but let's not always complicate matters, please).
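
As a small illustration of the "not always" case, the sketch below compares `type`, which returns a result, with `print`, which performs an action (showing text on screen) and returns `None`. The variable names `returned_by_type` and `returned_by_print` are made up for this example.

```python
# type() returns a value that we can store in a variable
returned_by_type = type("42")

# print() shows its argument on screen, but hands back None
returned_by_print = print("42")

print(returned_by_type)
print(returned_by_print)
```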

A common example are math functions.

In [7]:
import math
math.log10(100) # the base-10 logarithm of 100

2.0

In [8]:
math.pow(10,2) # 10 to the power of 2

100.0

Python also provides many built-in functions for strings. 

**Exercise**: Can you describe the `len` function?

In [10]:
len("pneumonoultramicroscopicsilicovolcanoconiosis")

45


We will discuss string functions (often referred to as string methods) in one of the next notebooks.

An important feature of functions is called "composition", i.e. one expression can appear as part of another. A function call can appear as an argument to another function.

**Exercise**: Can you tell me what happens here?

In [12]:
word = "pneumonoultramicroscopicsilicovolcanoconiosis"
math.log10(len(word))

1.6532125137753437
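
One way to read a composed call is to unpack it into separate steps. The sketch below does this for the cell above; the intermediate names `word_length`, `step_by_step` and `composed` are only illustrative.

```python
import math

word = "pneumonoultramicroscopicsilicovolcanoconiosis"

# Step by step: store the intermediate result first
word_length = len(word)
step_by_step = math.log10(word_length)

# Composed: the result of len() is passed directly to math.log10()
composed = math.log10(len(word))

# Both routes give the same number
print(step_by_step == composed)
```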

## Writing your own functions

These built-in functions are part of Python's basic toolkit. We will discuss these tools in more detail later.

These existing functions will take you far, but on many occasions you will have to forge your own toolkit by writing functions yourself. This is a critical building block of programming and will also figure prominently when you design your research process.

Why write your own functions?

Imagine you want to write a programme that checks if your text is longer than 5 characters.

In [14]:
word = 'Supercalifragilisticexpialidocious'
if len(word) > 5:
    print('Yes')
else:
    print('No')

Yes


Imagine you want to repeat this operation for two words. A simple solution would be to copy the code.

In [16]:
word_1 = 'Supercalifragilisticexpialidocious'
if len(word_1) > 5:
    print(True)
else:
    print(False)
    
word_2 = 'Yo!'
if len(word_2) > 5:
    print(True)
else:
    print(False)

True
False


But what if we have thousands of words, and we would like to do similar checks elsewhere in our notebook? A better solution is to package this piece of code as a function.

In the code cell below we define a new function with the name `check_length`. It takes one argument and checks if this argument consists of more than five characters. If so, the function returns `True`, otherwise `False`.

In [17]:
def check_length(word):
    if len(word) > 5:
        return True
    else:
        return False

Now we can repeat these checks with fewer lines of code.

In [19]:
print(check_length('Supercalifragilisticexpialidocious'))
print(check_length('Yo'))

True
False
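
The same function also scales to many words. As a minimal sketch (the list `words` and the variable `results` are made up for illustration, and the `for` loop is used here without further explanation), we can apply `check_length` to every item in a list:

```python
def check_length(word):
    if len(word) > 5:
        return True
    else:
        return False

words = ['Supercalifragilisticexpialidocious', 'Yo!', 'Python', 'text']

# Call the function once per word and collect the answers
results = []
for word in words:
    results.append(check_length(word))

print(results)
```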


## When to write functions?

With functions you can group a sequence of statements (and execute them with one function call).

A common example in text mining is called text preprocessing, where we transform an input text into a format that makes it more amenable to computational analysis. We discuss this in more detail later, but a simple preprocessing step could be:

- Check if the text is longer than 5 characters
- If so, return the text in lowercased form
- Otherwise, return an empty string

In [24]:
def preprocess_text(text):
    if len(text) > 5:
        lowercased = text.lower()
        return lowercased
    else:
        return ''

In [25]:
text = 'SupercalifragilisticexpialidociouS'
preprocess_text(text)

'supercalifragilisticexpialidocious'

In [26]:
text = 'Yo!'
preprocess_text(text)

''


When you write code, you will notice that certain sequences of statements form a unit or operation that you want to repeat and reuse. 

Wrapping them in a function will make your code more concise and less error-prone (again, this approach is much better than copy-pasting).


For example, by creating a function for preprocessing text we are sure that each incoming document will be transformed in exactly the same way.
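
A minimal sketch of this idea: the same `preprocess_text` function is applied to every item in a small, made-up list called `documents`, so each text goes through identical steps.

```python
def preprocess_text(text):
    if len(text) > 5:
        lowercased = text.lower()
        return lowercased
    else:
        return ''

# Hypothetical example documents
documents = ['The Quick Brown Fox', 'Yo!', 'TEXT MINING']

# Every document is transformed by the same function
processed = []
for document in documents:
    processed.append(preprocess_text(document))

print(processed)
```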


## How to write functions?

Function definitions follow a general syntax:

```python
def name(parameters):
    statements
```

- Name: the name used to call the function. You can use any name except those that belong to the small category of Python keywords.
- Parameters: the information you want to pass to the function.
- Statements: the sequence of statements that perform an operation (on the input parameters).
- Functions often end with a `return` statement.
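
To connect these parts to actual code, here is a small annotated sketch. The function `count_characters` is a hypothetical example, not one of Python's built-in functions.

```python
def count_characters(text):            # name: count_characters, parameter: text
    number_of_characters = len(text)   # statement: operate on the parameter
    return number_of_characters        # return statement: hand back the result

print(count_characters("corpus"))
```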

Let's try to build a function that computes the average length of the texts in a corpus.

Imagine we collected the number of words in five documents:
```python
doc_length = [10,44,52,16,97] 
```


✏️ **Exercise:** 

Given this list of numbers - how would you compute the mean without a computer?

## Computing the mean in Python

To compute the mean we need to sum all values and divide this number by the number of elements.
We can easily perform this sequence of statements using Python's built-in `sum` and `len` functions.

In [31]:
numbers = [1,4,5,6,7]
numbers_sum = sum(numbers)
print (numbers_sum)

23


In [32]:
number_of_elements = len(numbers)
print (number_of_elements)

5


In [33]:
mean = numbers_sum / number_of_elements
print (mean)

4.6


✏️ **Exercise:** 

In Python code you can compute the mean by combining the already existing `sum` and `len` functions.
```python
mean = sum(numbers)/len(numbers)
```

Can you write a function
- with the name "compute_length"
- it takes one argument called `list_of_numbers`
- it computes the mean and saves it in a variable `mean_length`
- returns the results of this operation
- try it on different examples, e.g. `[1,3,4]` or `[3,0,9,12]`

A good starting point for defining a function is to think about how to convert the process of doing an operation "manually" into a series of defined steps. Let's see an example below.

Remember: A function prevents you from repeating (in the sense of copy-pasting) the same statements over and over in your code.

Once you have defined a specific operation (for instance `compute_mean`) you'll be able to use it over and over, whenever needed.

Composing code based on clear and meaningful units (i.e. functions) will make debugging easier as well. 

However, there are two problems with functions:
1. they need to be well documented, otherwise it will be difficult to read them (even by you in a few weeks!)
2. they can have bugs too! So you need to be careful and check if they are correct at a semantic level.

## Documenting functions

A Python docstring is a documentation string. When you call the built-in help() function on a Python function for instance, you see its documentation. This documentation is specified by the docstring at the beginning of the definition.

In [5]:
### this description is the way a function is documented, so others can quickly understand how to use it
help(sum)

Help on built-in function sum in module builtins:

sum(iterable, /, start=0)
    Return the sum of a 'start' value (default: 0) plus an iterable of numbers
    
    When the iterable is empty, return the start value.
    This function is intended specifically for use with numeric values and may
    reject non-numeric types.



In [6]:
def compute_mean(numbers):
    """ compute the mean, given a list of numbers """
    mean = sum(numbers)/len(numbers)
    return mean

OK, this looks better, but the input, output and operation are still hard to read. One way of documenting functions is to follow the [Google style for docstrings](https://google.github.io/styleguide/pyguide.html). Here's an example:

In [None]:
def compute_mean(numbers):
    """ compute the mean, given a list of numbers 
    
    Args:
        numbers: List of integers
    
    Returns:
        The mean of the values contained in the list
    """
    mean = sum(numbers)/len(numbers)
    return mean

help(compute_mean)

Additionally, you can write down the types of the arguments and of the return value:

In [None]:
def compute_mean(numbers:list)-> float:
    """ compute the mean, given a list of numbers 
    
    Args:
        numbers: List of integers
    
    Returns:
        The mean of the values contained in the list
    """
    mean = sum(numbers)/len(numbers)
    return mean

help(compute_mean)
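
Note that these type hints serve as documentation: Python stores them on the function but does not check them when the function is called. The sketch below illustrates this with a shortened version of `compute_mean`; passing a tuple instead of a list is just an example chosen to show that the call still runs.

```python
def compute_mean(numbers: list) -> float:
    """Compute the mean, given a list of numbers."""
    return sum(numbers) / len(numbers)

# The hints are stored on the function object
print(compute_mean.__annotations__)

# A tuple is not a list, yet the call works: hints are not enforced
print(compute_mean((2, 4, 6)))
```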

Finally, you can make the variable names clearer, or even break down each step to make the function more readable:

In [None]:
def compute_mean(list_of_numbers:list)-> float:
    """ compute the mean, given a list of numbers 
    
    Args:
        list_of_numbers: List of numbers (either integers or floats)
    
    Returns:
        The mean of the values contained in the list
    """
    sum_list = sum(list_of_numbers)
    length_list = len(list_of_numbers)
    mean = sum_list / length_list
    return mean

In [None]:
# to use the function

list_a = [1,4,5,6,7]
list_b = [10,124,65,86,7,98,6,54,112,13,87]

mean_a = compute_mean(list_a)
mean_b = compute_mean(list_b)

print (mean_a)
print (mean_b)

## Testing Functions

Functions are really useful tools when writing code. They allow your scripts to be more concise and modular. However, functions can easily add bugs to your code. Let's look at the following example:

In [None]:
def compute_mean(list_of_numbers:list)-> float:
    """ compute the mean, given a list of numbers 
    
    Args:
        list_of_numbers: List of integers
    
    Returns:
        The mean of the values contained in the list
    """
    sum_list = sum(list_of_numbers)
    length_list = len(list_of_numbers)
    mean = sum_list / length_list
    return length_list

mean_a = compute_mean(list_a)
print (mean_a)

In this example the function is well documented and, as expected, returns a number. However, by mistake we are returning the length of the list instead of the computed mean. Bugs like this one are very easy to make and hard to spot when you have a fairly complex pipeline.

To make sure their functions work correctly, people often test them quickly right after implementing them, or try to spot errors by looking at the final output of the code. However, both of these approaches come with problems of their own.

So, what are best practices in testing your code?

A good starting point is to define some specific cases where you test your function and you know what output it should give, for instance:


In [None]:
# In Python, the assert statement lets execution continue if the given condition evaluates to True.
# If the condition evaluates to False, it raises an AssertionError with the specified error message.

assert compute_mean([1,4,5,6,7]) == 4.6,'The mean is not correct'

Every time you write a function, define a series of assert statements, each checking that a given input produces a specific, known outcome.
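
A minimal sketch of such a series of checks is shown below, using a shortened but correct `compute_mean`. The test cases are illustrative choices, not an exhaustive list. Because results involving floats can differ in the last digits, the final check uses `math.isclose` instead of `==`.

```python
import math

def compute_mean(list_of_numbers):
    return sum(list_of_numbers) / len(list_of_numbers)

# Each assert pairs an input with the output we know it should give
assert compute_mean([1, 4, 5, 6, 7]) == 4.6, 'The mean is not correct'
assert compute_mean([5]) == 5, 'A single value should be its own mean'
assert compute_mean([-2, 2]) == 0, 'Negative values are not handled correctly'

# Compare floats with a tolerance rather than exact equality
assert math.isclose(compute_mean([0.1, 0.2]), 0.15), 'Float mean is off'

print('All checks passed')
```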

Another important thing to test is that the input is what you expect. For instance, our function expects a list of numbers (either integers or floats).

What happens if the list contains a string? Or if the input list is empty? Or if the input is not a list?
Let's see!

In [None]:
compute_mean([1,4,"0.555",6,7])

In [None]:
compute_mean([])

In [None]:
compute_mean(5)
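
To compare the three failures side by side, the sketch below runs the same calls inside `try`/`except` (a construct used here only for illustration) and prints the kind of error each one raises. The names `problem_inputs` and `error_names` are made up for this example.

```python
def compute_mean(list_of_numbers):
    return sum(list_of_numbers) / len(list_of_numbers)

problem_inputs = [[1, 4, "0.555", 6, 7], [], 5]
error_names = []

for problem_input in problem_inputs:
    try:
        compute_mean(problem_input)
    except Exception as error:
        # Record and show the kind of error and its message
        error_names.append(type(error).__name__)
        print(type(error).__name__, '-', error)
```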

As you can see, the code crashes for different reasons. Instead, we should assert that the input is what we expect, or return a message.

In [None]:
def compute_mean(list_of_numbers:list)-> float:
    """ compute the mean, given a list of numbers 
    
    Args:
        list_of_numbers: List of numbers (either integers or floats)
    
    Returns:
        The mean of the values contained in the list
    """

    assert type(list_of_numbers) is list, 'The input is not a list'
    assert len(list_of_numbers) >0, 'The input list is empty'

    sum_list = sum(list_of_numbers)
    length_list = len(list_of_numbers)
    mean = sum_list / length_list
    return mean
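
With these checks in place, bad input fails early with a readable message. The sketch below repeats a shortened version of the function and collects the messages for two of the problem cases; `try`/`except` and the list `messages` are only used here to show the result.

```python
def compute_mean(list_of_numbers):
    assert type(list_of_numbers) is list, 'The input is not a list'
    assert len(list_of_numbers) > 0, 'The input list is empty'
    return sum(list_of_numbers) / len(list_of_numbers)

messages = []
for problem_input in [[], 5]:
    try:
        compute_mean(problem_input)
    except AssertionError as error:
        # The text after the comma in the assert becomes the error message
        messages.append(str(error))
        print('AssertionError:', error)
```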

✏️ **Exercise:** 

Add an assert to test that all elements in the list are integers or floats.

✏️ **Exercise:** 

Write a function that, given a list of names (containing duplicates), returns a list of names without duplicates. Make sure to include documentation and to define `assert` statements that check the input and the correctness of the output.

Example of input:

In [None]:
names = ["Mark","Paula","Paul","Fede","Mariona","Kaspar","Paul","Thomas","Thomas","Mark","Thomas"]