<a target="_blank" href="https://colab.research.google.com/github/JLDC/Data-Science-Fundamentals/blob/master/notebooks/09_functions.ipynb">
    <img src="https://i.ibb.co/2P3SLwK/colab.png"  style="padding-bottom:5px;" />Open this notebook in Google Colab
</a>

___

# Introduction to Functions
___
If you went through the introductory notebook, you should already have a rough understanding of what a function is and why it can be useful. In this notebook we will take some more time to extend your knowledge of functions, introducing *lambda functions*, *argument unpacking*, and *vectorization*.

___
## Basics

Before we dive into the more complicated and useful concepts, let's review some function basics. I'll try not to bore you too much and I won't reiterate the things we did in the introductory notebook.

Because you'll mostly work with `pandas` dataframes and `numpy` arrays, it's a good idea to start learning about functions using these objects directly.

In [None]:
# Import necessary packages
import pandas as pd
import numpy as np

In [None]:
# Let's create a very simple function which adds up two elements and returns the output
def add_two_elements(a, b):
    return a + b

In [None]:
# Check that the function behaves correctly
add_two_elements(1, 2)

Notice that because string addition is defined in Python, this function also works with strings.

In [None]:
# Add two strings
add_two_elements("Hello ", "world!")

Of course, this behavior is not limited to numbers and strings. It works for any objects that can be added together, e.g., the columns of a `pandas` dataframe or two `numpy` matrices.

In [None]:
# Create a dataframe with columns A and B filled with random numbers
df = pd.DataFrame({"A": np.random.randn(5), "B": np.random.randn(5)})
df # View the dataframe

In [None]:
# Add a third column C, with the sum of A and B
df["C"] = add_two_elements(df["A"], df["B"]) # ⚠️ Of course, df["C"] = df["A"] + df["B"] is shorter!
df # Show the result

In [None]:
# Create two 3x3 matrices of random integers between 0 and 10
mat1 = np.random.randint(0, 10, (3, 3))
mat2 = np.random.randint(0, 10, (3, 3))

In [None]:
pd.DataFrame(mat1) # View the first matrix using pandas for nicer display

In [None]:
pd.DataFrame(mat2) # View the second matrix using pandas for nicer display

In [None]:
pd.DataFrame(add_two_elements(mat1, mat2)) # View their sum

___
### Optional arguments
A function can also have *optional* arguments, i.e., arguments that we can choose to pass or not. These are easy to create: we just set the argument to a default value when defining the function. Look at the following example.

In [None]:
# Notice that we have now set the default value of b equal to 3
def add_two_elements_or_add_3(a, b=3):
    return a + b

In [None]:
# ... but this gives the same result as before?
add_two_elements_or_add_3(1, 2)

In [None]:
# ... unless we don't specify b, then it takes the default value, 3
add_two_elements_or_add_3(1)
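
Optional arguments can also be set by name, which helps once a function has more than one default. The following is a minimal sketch; `scale_and_shift` and its parameters are made up for illustration and are not used elsewhere in this notebook.

```python
# Hypothetical helper with one required and two optional arguments
def scale_and_shift(x, scale=2, shift=0):
    return x * scale + shift

only_required = scale_and_shift(5)               # uses scale=2 and shift=0
override_by_position = scale_and_shift(5, 3)     # the second position fills scale
override_by_name = scale_and_shift(5, shift=1)   # keeps scale=2, changes only shift

print(only_required, override_by_position, override_by_name)
```

Passing `shift=1` by name lets us skip `scale` entirely, which would not be possible using positions alone.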

___
### Unpacking arguments
Sometimes (often) programmers are a lazy bunch, so many programming languages have a few tricks that make it easier to write code. Think about list comprehensions from the introductory notebook: they are not really a new or necessary functionality, since we could achieve the same output using a loop, but they feel so much nicer once you get used to them. In programming, we refer to this as **syntactic sugar**, i.e., something that makes the syntax much nicer.

Unpacking arguments is a neat little bit of syntactic sugar which can be used with Python functions. As the following illustrates, the idea is that we can pass a list of arguments instead of specifying every single argument.

In [None]:
# Say we have two inputs and we want to pass them to a function, the normal way of doing it is
input1 = 1
input2 = 2
add_two_elements(input1, input2)

In [None]:
# If our inputs are in a list, we can also do it by accessing each element in the list
inputs = [1, 2]
add_two_elements(inputs[0], inputs[1])

In [None]:
# But using argument unpacking, we can achieve a much nicer result
inputs = [1, 2]
add_two_elements(*inputs)

Notice what happened: we simply had a list of inputs and put the `*` operator in front of it when passing it to our function. Python then spreads the elements of the list over the function's arguments, in order. This does not achieve anything we could not have achieved otherwise (except nicer code), but it can be quite useful. Here's a real-life example.

In [None]:
# Say we want to create a sequence between a lower bound and an upper bound
lower_bound = -10
upper_bound = 10
seq = np.linspace(lower_bound, upper_bound, num = 5)
seq

In [None]:
# ... or we can just keep the bounds in the same single variable
bounds = [-10, 10]
seq = np.linspace(*bounds, num=5)
seq

In [None]:
# Or perhaps you have a list of strings you want to print after each other
string_list = ["Unpacking", "arguments", "is", "not", "a", "must", "but", "it",
               "makes", "everything", "so", "much", "nicer!"]
# print(string_list)
print(*string_list)

Much better than
```python
print(string_list[0], string_list[1], string_list[2], string_list[3], ...)
```

Don't you think?
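
The same idea also works with dictionaries: a double star `**` unpacks the keys as argument names and the values as argument values. This goes slightly beyond the examples above, so treat it as an optional sketch. The variable names `named_inputs` and `linspace_args` are made up for illustration, and `add_two_elements` is redefined so the block runs on its own.

```python
import numpy as np

def add_two_elements(a, b):
    return a + b

# The dictionary keys must match the argument names of the function
named_inputs = {"a": 1, "b": 2}
total = add_two_elements(**named_inputs)   # same as add_two_elements(a=1, b=2)

# Works with library functions too, here with the arguments of np.linspace
linspace_args = {"start": -10, "stop": 10, "num": 5}
seq = np.linspace(**linspace_args)

print(total)
print(seq)
```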

### 🙀 🤯 Scope of a variable

If you remember only one thing from this notebook, it should be **vectorization** below... but if you remember only two things, then the concept of a variable's **scope** is the second most important thing! This can be particularly confusing if you don't have any programming background, so be sure to ask if there is anything you don't understand!

First, consider the following examples:

In [None]:
# First example
x = 10
def mysum1(a, b):
    print(f"x is {x}")
    return a + b

In [None]:
mysum1(1, 2) # Run the example

In [None]:
# Second example
x = 10
def mysum2(x, y):
    print(f"x is {x}")
    return x + y

In [None]:
mysum2(1, 2) # Run the example

Wait. What happened?! Is `x=1` or `x=10`? Isn't it the same function? I'm confused! 🥴

Well, let's break it down. First, let's show the current value of `x`.

In [None]:
x # Display the value of x

So, `x` is still `10`. But `mysum2` tells me it is `1`, so is `mysum2` wrong? Well, no: both functions are right. The catch is that `x` is a variable name **used in the inputs to `mysum2`**. Once a variable name is used as an input to a function, any outside variable with that exact same name is ignored within the function.

This makes sense. Imagine if the function instead reused the variable defined before: suddenly `mysum2(1, 2)` would return `12`! That would be quite crazy!
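
To make this concrete, here is a small sketch (the function name `shadow_x` is made up for illustration). Even if we change `x` inside the function, the `x` defined outside is left untouched.

```python
x = 10

def shadow_x(x):
    # Inside the function, x refers to the input, not to the outer x
    x = x + 1
    return x

inside = shadow_x(1)

print(f"Returned by the function: {inside}")
print(f"Outer x afterwards: {x}")
```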

Variable scope is an important topic in all of programming, and different programming languages handle it differently. Python takes a bit of a *laissez-faire* approach, so you almost never run into an error... which is sometimes good, but can also be very dangerous when you make mistakes! Some stricter languages will not let you define something like `mysum1` above; instead, they will complain that you can't use `x` because it was never defined in `mysum1`, i.e., they treat every single function like a little self-contained script. **This is not the case for Python, and because of this, we need to be careful!**

So what's the main takeaway about variable scope? Well, for the time being, the takeaway is the following: **Avoid using variables that were defined outside of your function as much as possible!**
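
Why is this dangerous? The sketch below uses a made-up function, `shift_by_offset`, which silently depends on an outside variable `offset`. The very same call gives a different result once that outside variable changes somewhere else in the notebook.

```python
offset = 10

def shift_by_offset(value):
    # offset is not an input, so Python looks it up outside the function
    return value + offset

before = shift_by_offset(1)

offset = 100   # someone changes the outside variable later on...
after = shift_by_offset(1)   # ...and the same call now behaves differently

print(before, after)
```

In a long notebook where cells can be run in any order, this kind of hidden dependency is easy to miss.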

There is **always** a cleaner way of doing it, for instance, if we really needed to print some value of another variable `x` as we do in `mysum1`, we could simply write:

In [None]:
def mysum3(a, b, x):
    print(f"x is {x}")
    return a + b

In [None]:
# And now we pass the value x should have directly as input, try running
# mysum3(1, 2), you will see that it won't work
mysum3(1, 2, -3)

#### ➡️ ✏️ Task 1
Let's start by writing our first function. 

The [body mass index (BMI)](https://en.wikipedia.org/wiki/Body_mass_index) is a measure derived from the height and weight of a person. It is defined as

$$\text{BMI} = \frac{\text{weight in kg}}{(\text{height in m})^2},$$

i.e., the weight of a person in kilograms, divided by their height in meters squared. The World Health Organization defines the following categories of BMI:

|Category|BMI (kg/m²)|
|:--|:-:|
|Underweight|$<18.5$|
|Normal|$[18.5, 25)$|
|Overweight|$[25, 30)$|
|Obese|$\geq 30$|

You are working as a data science consultant for a company doing health analytics. They want to categorize the BMI of a person given their weight and height. Luckily, now that you have learned about functions, you know how to do just that!

Create a function called `bmi_category(height_in_cm, weight_in_kg)` which does the following:
+ Take as input the height in centimeters and the weight in kilograms.
+ Compute the BMI according to the above formula.
+ Output a string with the BMI category according to the table above.

In [None]:
# Enter your code below


___
## Lambda functions
Python has a second, more cryptic, way of writing functions: so-called *lambda functions*. A lambda function is just a way of writing a function as a single expression on one line of code, instead of using multiple lines as with `def`.

As of now, this might sound neither complicated nor useful, but it can be. It's good to know about it because I will be using it here and there, and, typically, it's very useful when trying to apply short functions to `numpy` arrays and `pandas` dataframes. Lambda functions can seem strange and unnecessarily complex at first, but in the end **they are nothing but a different way of writing functions.**

In [None]:
# Create a function that adds 10 to any number
def add_10_normal(x):
    return x + 10

In [None]:
# Test out the function
add_10_normal(50)

In [None]:
# Write the same function but as a lambda function
add_10_lambda = lambda x: x + 10
add_10_lambda(50)

To reiterate with another example: for a function that sums two numbers, we can either write:

```python
def myfunction(input1, input2):
    return input1 + input2
```

or

```python
myfunction = lambda input1, input2: input1 + input2
```
The first version is typically easier to understand when you are new to programming, but the second version will be very useful, in particular when coupled with vectorization as we will see later.
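
As a small preview of where lambda functions come in handy, here is a sketch in which a short function is passed directly to another function without naming it first. The column name `A_plus_10` and the list `words` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# Apply a short function to every element of a column
df["A_plus_10"] = df["A"].apply(lambda v: v + 10)

# Use a lambda function to tell sorted() what to sort by (here: word length)
words = ["pandas", "np", "lambda"]
by_length = sorted(words, key=lambda w: len(w))

print(df)
print(by_length)
```

For something as simple as adding 10, `df["A"] + 10` is of course shorter; the lambda version pays off when the operation has no built-in shortcut.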
<br/><br/>

#### ➡️ ✏️ Task 2

Write a lambda function that does the following. As an input, it takes a (numpy) array or a column from a pandas dataframe. It then calculates the difference between the sum and the mean of the input. Test your function with a simple numpy array and a simple dataframe with a single column.


In [None]:
# Enter your code below

fancyfun = lambda x: x.sum() - x.mean()
a = np.random.randn(5)
fancyfun(a)


In [None]:

# Test with a single-column dataframe (the result is a Series with one value per column)
df = pd.DataFrame({"A": np.random.randn(5)})
fancyfun(df)