# 7PAVITPR: Introduction to Statistical Programming
# Python practical 7

_Angus Roberts<br/>
Department of Biostatistics and Health Informatics<br/>
Institute of Psychiatry, Psychology and Neuroscience<br/>
King's College London<br/>_



# Defining functions
So far our code has not been very modular. If we wanted to reuse some piece of code, we would have to cut and paste it. It is better to break our code into smaller pieces in order to:

- reflect the component parts of the problem
- make code available for reuse

One way to modularise and reuse code is by defining our own functions, just as in R. There are other ways, which we will mention briefly later.

Here is an example of a function definition, for the F1 score. This is the harmonic mean of precision and recall - a metric used in machine learning:

In [1]:
# Compute F1 score from precision and recall

def f1_score(precision, recall):
    """Return the F1 score, given a precision and a recall"""
    f1 = (2 * precision * recall)/(precision + recall)
    
    return f1


And here is some code that uses our new function:

In [2]:
p = [0.76, 0.87, 0.79]           # A list of precisions
r = [0.55, 0.40, 0.59]           # A list of recalls
f1 = []                          # An empty list to put the F1s in

for i, j in zip(p, r):           # Iterate over each pair of precision and recall
    f1.append(f1_score(i, j))    # Add the F1 to the end of the list of F1s
    
print(f1)                        # Take a look at the answers

[0.6381679389312978, 0.5480314960629922, 0.6755072463768117]


The function definition is made up of the following parts:

- Above the function definition is a comment which explains the function to programmers reading the code
- The function starts with the `def` keyword
- This is followed by the name that will be used to call the function, and its arguments in parentheses, then a colon
- The function statements follow on the next lines, indented
- The first line after `def ...` is an optional _document string_ or _docstring_. This is used by automated tools to compile documentation of code
- At the end of the function, the `return` keyword precedes one or more values to return to the code that called the function
- If no `return` is given, or if no values follow it, then `None` will be returned
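
The last three points can be sketched as follows. The function names `min_max`, `no_return` and `bare_return` are made up for this illustration and are not part of the practical. The sketch shows a function returning two values, two functions that return `None`, and the docstring being read back from the function.

```python
def min_max(values):
    """Return the smallest and largest members of a list"""
    return min(values), max(values)      # two values come back as a tuple

def no_return(values):
    """Has no return statement, so returns None"""
    total = sum(values)                  # computed, but never returned

def bare_return(values):
    """Has a return with no value, so also returns None"""
    return

lowest, highest = min_max([3, 1, 2])     # unpack the two returned values
print(lowest, highest)
print(no_return([3, 1, 2]))
print(bare_return([3, 1, 2]))
print(min_max.__doc__)                   # the docstring is stored on the function
```
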

## <font color=green>💬 Discussion point</font>

Why might you want to write a function that returns `None`? Can you think of any built-in functions that return `None`?

## <font color=green>❓ Question</font>

Write a function that
- takes a list of numbers as an argument
- returns a new list that has members which are:
  - the same as the argument list, in the same order 
  - except any member with a value of zero is replaced with the mean of the non-zero members
  
For example,

`replace_zeros([2, 4, 0, 2, 3, 0, 4])`

should return:

`[2, 4, 3, 2, 3, 3, 4]`

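(because 3 is the mean of the non-zero members, `[2, 4, 2, 3, 4]`). Note that Python's `/` operator always returns a float, so the replaced values will print as `3.0`.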

## <font color=green>⌨️ Your answer</font>


In [7]:
# Complete the following code, replacing the text where necessary

def replace_zeros(values):
    '''Takes a list of numbers, and returns a copy with any zero values replaced
    by the mean of the non-zero values'''

    # first iterate over the list to find the mean of the non-zero values
    
    total = 0                 # initialise a sum and count of non-zeros
    count = 0
    
    # iterate over the list argument, summing and counting non-zero items
    for v in values:
        if v:
            total = total + v
            count = count + 1
    
    # mean of the non-zero values
    mean = total / count

    # this is the list we will return
    new_list = []
    
    # iterate over the list argument, putting its values in new_list
    # when they are non-zero, and putting the mean in new_list when they are zero
    for v in values:
        if v:
            new_list.append(v)
        else:
            new_list.append(mean)
    
    # return the new list
    return new_list

# Some code to test it out
print(replace_zeros([2, 4, 0, 2, 3, 0, 4]))
print(replace_zeros([12.5, 14.6, 0, 17.3, 14.5, 0, 0]))

[2, 4, 3.0, 2, 3, 3.0, 4]
[12.5, 14.6, 14.725000000000001, 17.3, 14.5, 14.725000000000001, 14.725000000000001]
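
The answer above assumes the list contains at least one non-zero value. If it does not, `count` is 0 and the division raises a `ZeroDivisionError`. One possible way to guard against this is sketched below. The name `replace_zeros_safe` and the decision to return an unchanged copy are assumptions made for illustration; the question does not say what should happen in this case.

```python
def replace_zeros_safe(values):
    '''Hypothetical variant of replace_zeros that copes with lists
    containing no non-zero values'''
    total = 0
    count = 0
    for v in values:
        if v:
            total = total + v
            count = count + 1

    # Assumption: with nothing to average, return an unchanged copy
    if count == 0:
        return list(values)

    mean = total / count

    new_list = []
    for v in values:
        if v:
            new_list.append(v)
        else:
            new_list.append(mean)
    return new_list

print(replace_zeros_safe([0, 0, 0]))
print(replace_zeros_safe([]))
print(replace_zeros_safe([2, 4, 0, 2, 3, 0, 4]))
```
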


### More complex function definitions

There are several additional features available when defining functions. We cover two in the code below. There are others, less often used. Read the code, and run it.

In [10]:
# Find the mean of values in a list.
# Optionally, replace None values with some specified value

def mean(values, include_none=True, replace_none_with=0):
    '''Compute the mean of list members, handling None values as specified'''
    total = 0                     # initialise our variables
    count = 0
    
    for val in values:
        if val is not None:
            total += val            # this is shorthand for total = total + val
            count += 1
        elif include_none:          # the value is None, check whether we should deal with it
            total += replace_none_with
            count += 1
        else:                       # this "else" isn't needed; including it shows we have not forgotten the case
            pass                    # this means do nothing
         
    return total / count


# Some test assertions. Checks that results are what they should be
# Prints True for each test that passes

print( mean([2,3,4,None]) == 2.25 )
print( mean([2,3,4,None], False) == 3.0 )
print( mean([2,3,4,None], include_none=False) == 3.0 )
print( mean(replace_none_with=2.5, values=[2,3,4,None]) == 2.875 )
            
        
    

True
True
True
True



- __Positional arguments__ We can refer to arguments by virtue of their position in the function call. This is shown in the first two tests.
- __Keyword arguments__ We can refer to arguments with an explicit keyword, of the form _keyword=value_, as shown in the 3rd and 4th tests.
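- __Default arguments__ We can provide a default value for an argument by specifying it in the function definition, as with `include_none=True` and `replace_none_with=0`. The default is used when the call does not supply that argument.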

Note that you can mix positional and keyword arguments, but the positional ones must come first.
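
The ways of passing arguments can be sketched with a small function. The function `scale` and its parameters are hypothetical and used only for illustration. The last part passes a call as a string to the built-in `compile()` so that the example still runs, since a positional argument after a keyword argument is a syntax error.

```python
def scale(value, factor=2, offset=0):
    """Hypothetical function: multiply value by factor, then add offset"""
    return value * factor + offset

a = scale(10)                               # positional only, defaults for the rest
b = scale(10, 3)                            # two positional arguments
c = scale(10, offset=1)                     # positional first, then keyword
d = scale(offset=1, factor=3, value=10)     # all keywords, in any order
print(a, b, c, d)

# Python rejects a positional argument after a keyword one before running the code.
# We compile the call from a string here only so that this example still runs.
try:
    compile("scale(factor=3, 10)", "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)
```
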

The code shows some additional features that we have not seen before:

- `+=` operator: `x += y` is shorthand for `x = x + y`
- `pass` is a statement that does nothing, to be used when a statement is required but no action is needed
- __Assertions__ a simple way of testing that the code works as expected. In this case, we print out `True` if our test passes, `False` otherwise. (Python also has a dedicated `assert` statement, which we do not use here.)
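
As a complement to the printed tests, here is a minimal sketch of the same idea using Python's built-in `assert` statement and `math.isclose()` from the standard library. The `f1_score` function is repeated so that the example is self-contained. The expected value `0.638168` is a rounded figure chosen for this illustration.

```python
import math

def f1_score(precision, recall):
    """Return the F1 score, given a precision and a recall"""
    return (2 * precision * recall) / (precision + recall)

# An assert statement does nothing when its condition is true
assert f1_score(0.5, 0.5) == 0.5

# Floating point results rarely match a hand-typed decimal exactly,
# so compare within a tolerance rather than with ==
assert math.isclose(f1_score(0.76, 0.55), 0.638168, rel_tol=1e-5)

# When the condition is false, assert raises an AssertionError
try:
    assert f1_score(0.5, 0.5) == 1.0, "F1 of 0.5 and 0.5 is not 1.0"
    failed = False
except AssertionError as err:
    failed = True
    print("Test failed:", err)
```

Unlike `print(... == ...)`, a failing `assert` stops the program unless the error is caught, which makes a broken test harder to overlook.
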



## <font color=green>❓ Question</font>

The F1 measure described above is a specialisation of the F measure, which is defined as:

`f_score = ( (1 + (beta * beta) ) * precision * recall ) / ( (beta * beta * precision) + recall )` 

where `beta` defines the relative weight given to precision and recall in the metric. `beta=1` for the F1 measure.
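
To see why `beta=1` gives the F1 score, substitute it into the formula. Writing $P$ for precision and $R$ for recall (symbols introduced here only for brevity):

$$
F_\beta = \frac{(1 + \beta^2)\, P R}{\beta^2 P + R}
\qquad\Longrightarrow\qquad
F_1 = \frac{(1 + 1)\, P R}{P + R} = \frac{2 P R}{P + R}
$$

This matches the `f1_score` calculation used earlier. Values of `beta` greater than 1 give more weight to recall, and values less than 1 give more weight to precision.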

Write an `f_score()` function with arguments `precision` and `recall`, and an argument `beta` that defaults to 1. Test it by passing the arguments both by position and by keyword, in different orders, with and without the default.

## <font color=green>⌨️ Your answer</font>

In [12]:
# Write your answer below, replacing the text where necessary

def f_score(precision, recall, beta=1):
    '''The generalised F score'''
    f = ((1+beta**2) * precision * recall ) / ((beta**2 * precision) + recall)
    
    return f

# Some test assertions - you should get four Trues

print( round( f_score(70,  80), 3 ) == 74.667 )
print( round( f_score(70,  80, 1), 3 ) == 74.667 )
print( round( f_score(70,  80, 2), 3 ) == 77.778 )
print( round( f_score(beta=2, recall=80, precision=70), 3 ) == 77.778 )

True
True
True
True
