<img src="https://github.com/Center-for-Health-Data-Science/PythonTsunami/blob/spring2022/figures/HeaDS_logo_large_withTitle.png?raw=1" width="300">



<img src="https://github.com/Center-for-Health-Data-Science/PythonTsunami/blob/spring2022/figures/tsunami_logo.PNG?raw=1" width="600">


# Generalizing Code with Functions


*prepared by [Katarina Nastou](https://www.cpr.ku.dk/staff/?pure=en/persons/672471) and [Rita Colaço](https://www.cpr.ku.dk/staff/?id=621366&vis=medarbejder) and edited by [Henrike Zschach](https://kunet.ku.dk/oevrige/telefonbog/Telefonbogsdetaljer?upname=pnv719)*

## Objectives

* Describe what a function is and how it is useful
* Understand how to handle input to functions
* Understand how to use keyword arguments when calling functions
* Explain exactly what the return keyword does and some of the side effects when using it
* Understand how scope works in a function


## What is a function?

You can think of functions as 'mini-programs' that take input and do something to it. Often this will result in an output (but not always):

**Input -> [Function processes input] (-> Output)**

You have probably already used functions such as `print()` and `int()`. We call these functions _built-in_ since they already exist in python.

The function `print` for example takes an input and evaluates it, then displays the result in the console.

In [None]:
my_int = 23
print(my_int)


23


You can see that the input is evaluated because you can give `print` an expression as well, and the computed result is displayed:

In [None]:
print(my_int + 5)

28


Functions can also be **wrapped** inside other functions!

Here we ask for the type of the variable `my_int` by giving it as input to the `type` function. The output will be a type object that describes which variable type `my_int` is. We can then print that information by giving it as input to the `print` function.

However, we can also do this more directly and without an intermediate variable by wrapping `print` around `type`. In that case, the output of `type` will directly be given to `print` as an input!

In [None]:
type_of_var = type(my_int)
print(type_of_var)

print(type(my_int))

<class 'int'>
<class 'int'>


## User defined functions

We can define our own functions using the keyword `def`! Here we have a user defined function:

In [None]:
#define the function
def say_hi():
    print('Hi!')

A function definition always starts with the keyword `def`, followed by the name of the function, a pair of parentheses and a colon. The code inside a function must be indented, just like the code inside a loop. This is how python knows which code is part of the function, since we don't wrap brackets around it.

You execute the function by writing its name followed by a pair of parentheses.

In [None]:
#execute/call the function
say_hi()

Hi!


Why use functions?

There are several motivations behind using functions:

* simplification
* generalization
* repurposing

Function names are **references** to the block of code we assign to them:

```python
def say_hi():
    print('Hi!')
```

When we call `say_hi()`, the code inside, `print('Hi!')`, will be executed. This is useful because we can put many lines of code that perform a complex task into a function and execute them repeatedly by calling the name of the function.

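As a small sketch of this idea, the hypothetical function `print_course_banner` below (it is not part of the original material) bundles several lines of code under one name, so that they can be executed again with a single call:

```python
# Hypothetical example: several lines of code bundled under one name
def print_course_banner():
    print('=' * 20)
    print('Python Tsunami')
    print('=' * 20)

# Each call executes all three lines inside the function
print_course_banner()
print_course_banner()
```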



## Input to functions

In the example above we haven't passed any input to the function. In fact we haven't defined any input in the `def` line, so this function does not accept any input and **will always do the same**!

Try it out:

In [None]:
# will give an error:
say_hi(5)

Of course a function like that isn't terribly useful. We want to be able to pass information into the function which the function will then process for us.

You define what kinds of input the function will accept in the `def` line. You also define what the input variable will be called inside the function (here it is `my_name`).

In [None]:
def say_hi_to_me(my_name):
    print('Hi '+ my_name + '!')

In [None]:
say_hi_to_me('Henrike')
say_hi_to_me('John')

Hi Henrike!
Hi John!


You can define as many input parameters as you want:

In [None]:
def say_hi_many(my_name, times):
    for i in range(times):
        print('Hi '+ my_name + '!')

In [None]:
say_hi_many('Mary', 5)

Hi Mary!
Hi Mary!
Hi Mary!
Hi Mary!
Hi Mary!


## Parameters vs Arguments

Formally, the input you give to a function when you **execute** it is called an __argument__. It is the actual information we want the function to work on.

```python
say_hi_to_me('John')
```

In the above example, `'John'` is the argument. Any value passed to a function is an [argument](https://docs.python.org/3/glossary.html#term-argument).




What are __parameters__?

[Parameters](https://docs.python.org/3/glossary.html#term-parameter) are the names that we want to use inside the function. They appear in the function definition:

```python
def say_hi_to_me(my_name):
    print('Hi '+ my_name + '!')
```

So in the code example below, `my_name` is the parameter and `'John'` is the argument I pass to that parameter when I call the function:


In [None]:
def say_hi_to_me(my_name):
    print('Hi '+ my_name + '!')

In [None]:
say_hi_to_me('John')

Hi John!


You can also think of the argument as being the actual value and the parameter as being its alias inside the function.

Some more examples:
https://docs.python.org/3/faq/programming.html#faq-argument-vs-parameter

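To make the distinction concrete, here is a short sketch in which the parameter stays the same while the argument changes with every call. The variable `visitor` is made up for this illustration:

```python
def say_hi_to_me(my_name):
    print('Hi ' + my_name + '!')

# The parameter is always my_name; the argument is whatever we pass in.
say_hi_to_me('John')             # argument: a string literal

visitor = 'Mary'                 # hypothetical variable, for illustration only
say_hi_to_me(visitor)            # argument: the value stored in visitor
say_hi_to_me('Dr. ' + visitor)   # argument: the result of an expression
```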
## Exercise 1 (5 mins)

#### My first function

Write a function that takes a number as input (don't worry about checking for variable type right now!) and calculates the square root of this number. Print the result.

You can use `math.sqrt()` for the calculation. Remember to `import math`.

Test it with different inputs. What happens when you provide no input?

In [None]:
# your code here

In [None]:
# test the function

In the example of the function you just wrote, what are the parameters and what are the arguments?

## Keyword arguments

In functions with several parameters, how does the function decide which supplied argument should be assigned to which parameter?

In [None]:
def say_hi_many(my_name, times):
    for i in range(times):
        print('Hi '+ my_name + '!')

In [None]:
say_hi_many('Mary', 5)

Hi Mary!
Hi Mary!
Hi Mary!
Hi Mary!
Hi Mary!


In [None]:
#What will happen if I switch them around?
say_hi_many(5, 'Mary')



By default, parameters are positional-or-keyword. When you pass arguments by position, the first argument given will be assigned to the first parameter defined, the second argument to the second parameter, and so on.

In this example:

```python
def say_hi_many(my_name, times):
    for i in range(times):
        print('Hi '+ my_name + '!')
```

If you call `say_hi_many(5, 'Mary')` the parameter `my_name` will take the value 5 and the parameter `times` will take the value 'Mary' since they are specified in that order in the function definition.

You can avoid this behavior by explicitly stating which argument belongs to which parameter with the `=` sign:


In [None]:
say_hi_many(times=5, my_name='Mary')

Hi Mary!
Hi Mary!
Hi Mary!
Hi Mary!
Hi Mary!


We call this a **keyword argument**. Passing arguments like this is often **cleaner**, since others may not know in which order the parameters were listed in the `def` line.

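Positional and keyword arguments can also be mixed in one call, as long as the positional ones come first. The sketch below reuses `say_hi_many` from above and also shows what happens when a keyword does not match any parameter name:

```python
def say_hi_many(my_name, times):
    for i in range(times):
        print('Hi ' + my_name + '!')

# Positional argument first, then a keyword argument: allowed
say_hi_many('Mary', times=2)

# A keyword argument followed by a positional one is a SyntaxError,
# so this line stays commented out:
# say_hi_many(my_name='Mary', 2)

# A keyword that matches no parameter name raises a TypeError
try:
    say_hi_many(name='Mary', times=2)
except TypeError as error:
    print('TypeError:', error)
```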
## Default arguments

When we define an input parameter in our function we cannot call the function without passing an argument to that parameter. This can be circumvented by using `=` to define a default value the parameter should take if no input was given.

Default arguments can also be called optional arguments since there is no need to specify values for them when you call a function.

In [None]:
def say_hi_to_me(my_name = 'John Doe'):
    print('Hi '+ my_name + '!')

In [None]:
# test it:

say_hi_to_me()
say_hi_to_me('Henrike')

Hi John Doe!
Hi Henrike!


This is not the same as passing a keyword argument.

* When you **define** a function and use an `=` you are setting a **default parameter**
* When you **call** a function and use an `=` you are passing a **keyword argument**

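Both uses of `=` can appear together. The sketch below is a hypothetical variant of `say_hi_many` in which only `times` has a default value:

```python
# Hypothetical variant: my_name is required, times has a default value
def say_hi_many(my_name, times=1):
    for i in range(times):
        print('Hi ' + my_name + '!')

say_hi_many('Mary')            # times falls back to its default, 1
say_hi_many('Mary', 2)         # default overridden by position
say_hi_many('Mary', times=3)   # default overridden with a keyword argument

# Parameters without a default must come first in the def line:
# def say_hi_many(times=1, my_name): would be a SyntaxError
```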
## Exercise 2 (10 mins)

Write a function that calculates and prints BMI (body mass index), which is defined as:

$BMI = \frac{m_{kg}}{{h_m}^2} $

with:

$m_{kg}$ - mass in kilogram

$h_m$ - height in meters

You can calculate powers in python with the `**` operator, for example `my_number**2` for the second power (squaring `my_number`).

Supply default values for both parameters.

## Input: A note on data types

The observant participant might have noticed that at no point did we tell python whether the input we are expecting is a string, a number, a boolean or any other defined type.

Consider:
```python
def say_hi_many(my_name, times):
    for i in range(times):
        print('Hi '+ my_name + '!')
```

`times` is clearly meant to be an integer, the number of times the function should say hi. `my_name` is clearly supposed to be a string.

Why do we not define them so?





**Because python does [duck typing](https://www.askpython.com/python/oops/duck-typing):**

"*If it walks like a duck, and it quacks like a duck, then it is probably a duck.*"

What does this mean in practice?

Duck typing gives more importance to what can be **done** to an object than what the object **is**.

Consider the following example:

In [None]:
my_name = 'Henrike'
my_list = [1, 2, 'Hi', 'Pizza', 3.41]
my_dict = {'Monday': 'work', 'Tuesday': 'work', 'Friday': 'beers!'}

print(type(my_name))
print(type(my_list))
print(type(my_dict))


<class 'str'>
<class 'list'>
<class 'dict'>


These three objects are obviously of different types and asking for their `type` confirms this. However, I can call `len` on each of them because even though they are different objects they all meaningfully have a length.

In [None]:
print(len(my_name))
print(len(my_list))
print(len(my_dict))

7
5
3

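The same idea carries over to our own functions. As a minimal sketch, the hypothetical function `print_length` below never asks what type its input is. It works for anything that supports `len` and fails for anything that does not:

```python
# Hypothetical helper: accepts any object that has a length
def print_length(thing):
    print('Length:', len(thing))

print_length('Henrike')             # a string
print_length([1, 2, 'Hi'])          # a list
print_length({'Monday': 'work'})    # a dict

# An integer has no length, so the same call raises a TypeError
try:
    print_length(23)
except TypeError as error:
    print('TypeError:', error)
```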

If your heart desires further insight into how objects actually work in python I really recommend this post:

https://nedbatchelder.com/text/names.html


In python we do not define the type of a variable when we declare it. Instead, if we really need to know we can conduct a type test as shown below.

See also:
https://stackoverflow.com/questions/43233535/explicitly-define-datatype-in-python-function

In [None]:
#test whether we get the data types we need:
def say_hi_many(my_name, times):
    if type(times) == int and type(my_name) == str:
        for i in range(times):
            print('Hi '+ my_name + '!')
    else:
        print('Incorrect input types!')

In [None]:
say_hi_many(4, 'a')

Incorrect input types!

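A common alternative to comparing `type(...)` with `==` is the built-in `isinstance()`. The version below is a sketch of that alternative, not the approach used in the cell above. Note that `isinstance()` also accepts subclasses, so a boolean such as `True` would pass the integer check:

```python
def say_hi_many(my_name, times):
    # isinstance() is True if the object is of the given type (or a subclass of it)
    if isinstance(my_name, str) and isinstance(times, int):
        for i in range(times):
            print('Hi ' + my_name + '!')
    else:
        print('Incorrect input types!')

say_hi_many('Mary', 2)   # correct types: greets twice
say_hi_many(4, 'a')      # swapped types: prints the message instead
```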

## Scope

Variables created in functions are scoped inside that function! Consider our example from the start:



In [None]:
def say_hi_to_me(my_name):
    print('Hi '+ my_name + '!')

say_hi_to_me('Henrike')
print(my_name)  # NameError, unless a global variable called my_name was defined earlier

The variable `my_name` only exists inside the function. Or more correctly, it is only **defined** within the scope of the function. The global scope has never heard of this variable!

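Here is a small sketch of the same effect with a name that is only ever created inside a function. The function `make_greeting` and the variable `greeting` are made up for this illustration:

```python
def make_greeting():
    # greeting is created inside the function, so it is local to it
    greeting = 'Hi from inside!'
    print(greeting)

make_greeting()

# Outside the function the name is unknown and looking it up fails
try:
    print(greeting)
except NameError as error:
    print('NameError:', error)
```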
## The `return` statement

So far we've been using `print` to make our functions show us their results. However, if you think back to the beginning of the lecture, we said that functions can produce output.

This output is the **return value**. We define return values with the `return` statement and capture them with the assignment operator `=`:

```python
def my_function(input1, input2):
    result = input1 + input2
    return result

my_output_var = my_function(1, 2)
```

In [None]:
def add_three(my_int):
    return my_int + 3

In [None]:
my_result = add_three(5)
print(my_result)

8


The reason we want to use `return` instead of just printing the result is that we can keep working with the return value. It is captured in a variable.

In truth all functions in python have a return value, even if you do not specify `return`. In that case, they will return the `None` object.

In [None]:
def say_hi_to_me(my_name = 'John Doe'):
    print('Hi '+ my_name + '!')

my_result = say_hi_to_me('Henrike')
print(my_result)


Hi Henrike!
None

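The difference becomes clearer when the two styles are put side by side. In the sketch below, `print_three_more` is a hypothetical print-only counterpart to `add_three`:

```python
def print_three_more(my_int):
    print(my_int + 3)      # shows the result, but the function returns None

def add_three(my_int):
    return my_int + 3      # hands the result back to the caller

printed = print_three_more(5)    # prints 8
returned = add_three(5)          # prints nothing

print(printed)                   # None
print(returned * 2)              # 16: the return value can be used in calculations
print(add_three(add_three(1)))   # 7: return values can be passed on to other calls
```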

When Python encounters a `return` statement, it exits the function **immediately**, and passes the value on the right hand side to the calling context. Code that follows after the `return` statement is not executed!

In [None]:
def add_three(my_int):
    return my_int + 3
    print('Yo!')

In [None]:
my_result = add_three(5)
print(my_result)

8


The same function can incorporate several return statements:

In [None]:
from datetime import datetime

def are_you_working():
    #Thursday is day 3, Friday is day 4
    if datetime.today().weekday() == 3 or datetime.today().weekday() == 4:
        return 'No, am learning python!'
    else:
        return 'Yes, working hard.'

print(are_you_working())


Yes, working hard.


But you can only ever `return` once! In other words, you can never reach both return statements. The first one that is reached will end the function, no matter what comes after.

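Because the first `return` that is reached ends the function, several return statements can also be written one after another without `else`. The function `describe_sign` below is a hypothetical example of this pattern:

```python
# Hypothetical example: each call reaches exactly one return statement
def describe_sign(number):
    if number < 0:
        return 'negative'
    if number == 0:
        return 'zero'
    # only reached if neither return above was executed
    return 'positive'

print(describe_sign(-2))
print(describe_sign(0))
print(describe_sign(7))
```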
Then how can you return more than one result? Well, you can only encounter one return statement, but the object you return can be complex, e.g. a list, a dict or a tuple. In the examples above we're returning strings. The example below returns a list:

In [None]:
def fun_with_strings(my_string):
    my_list = [my_string.upper(), my_string.lower(), my_string[::-1]]
    return my_list

In [None]:
fun_with_strings('Hello World!')

['HELLO WORLD!', 'hello world!', '!dlroW olleH']

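A tuple works in the same way. As a sketch, the variant of `fun_with_strings` below returns three comma-separated values, which python packs into a single tuple. The caller can unpack that tuple straight into separate variables (the names `loud`, `quiet` and `backwards` are made up for this illustration):

```python
def fun_with_strings(my_string):
    # the comma-separated values are packed into one tuple
    return my_string.upper(), my_string.lower(), my_string[::-1]

# unpack the three elements of the returned tuple into three variables
loud, quiet, backwards = fun_with_strings('Hello World!')

print(loud)
print(quiet)
print(backwards)
```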
## Exercise 3 (5 mins)

A common return mistake is returning too early in a loop.
The following function sums all the even numbers in a list.

What is wrong here? Create some test data like `my_list = [1,2,3,4,5,6]` and test the function by calling it on the list: `sum_even_numbers(my_list)`. Do you think the result is correct?
How can you change the code to get the correct result?

In [None]:
def sum_even_numbers(list_of_numbers):
    total = 0
    for number in list_of_numbers:
        if number % 2 == 0:
            total += number
        return total

## Group Exercise (20 mins)

Now, look again at our code from yesterday for making a PCA and producing a biplot. Abstract it where it makes sense and wrap it inside two functions.

The first function should:

* take in the path to/filename of the dataset to read in, a PCA object, and perhaps other parameters you may want to tweak
* return a dataframe of at least the two first PCs and the outcome variable.

If you find it easier, you can also choose to create a PCA object inside the function and return both the PCA object and the dataframe. You will need them both to make the biplot.

Then write a second function that:

* takes in the PCA object, the dataframe returned from the first function, and perhaps other parameters you may want to tweak
* returns the biplot

