# Functions

Every program can be thought of as a series of transformations. We start with input data, and end up with our output. Every intermediate step is also just some transformation from an input to an output. In mathematics, we call the machinery that performs this transformation a *function*.

Consider the function below:

$$f: x, y \rightarrow x + y$$

The function $f$ takes two input arguments, $x$ and $y$, and returns a single value: the sum of $x$ and $y$. Programming languages like Python have a *function* construct as well.

In [2]:
def add(x: int, y: int) -> int:
    """
    Function takes two integers and returns their sum,
    also an integer
    """
    result: int = x + y
    return result

add(3, 2)

5

### Anatomy of a Function

The function `add()` has not only the mathematical analogue, but also all of the syntax that is typical of functions in Python.

Our function declaration begins with...
```python
def
```

It is a keyword that tells Python that we are about to **def**ine a function. This is a sufficiently critical role that we are not permitted to use `def` for anything else.

Next we include the function name...

```python
def add
```

Just as we saw with individual values, our labels provide us with a means of abstraction. If we trust the way that `add()` is implemented, *we no longer need to concern ourselves with how it works*. It is just a black box that reliably delivers the sum of the inputs. This is one way in which programming languages help us control complexity. Now that we have the label `add`, we can lift our reasoning about the task to the semantic level instead of remaining mired in low-level implementation. Need to get the sum of two integers? You have a tool that is intuitively called `add` for this purpose.

Then comes the parameter list...

```python
def add(x: int, y: int)
```

We know that a function is a mapping of inputs to outputs, but how do we manipulate these inputs? How many are there? What properties do they have? Our parameter list holds all of this information. There are two inputs that we can manipulate in the body of the function by using their labels, `x` and `y`. We can use them as we would integers ... because they are integers.

What are we getting out of this thing?

```python
def add(x: int, y: int) -> int:
```

We don't know yet what the output value is called inside the function, if it even has a name. That's OK because the name it takes inside the function will not be relevant outside of the function (more on that later). We do know that the output value, whatever it's called, is annotated as an integer. This is very important information because it tells us what we can do to or with the value that comes out of the function.
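One caveat is worth a quick sketch. Python records type annotations but does not check them when the function runs, so `-> int` is a promise made by the author rather than a guarantee made by the interpreter. The example below redefines `add()` without its docstring for brevity; the string inputs are purely illustrative and not part of the original notebook.

```python
def add(x: int, y: int) -> int:
    result: int = x + y
    return result

# The annotations are stored on the function object...
print(add.__annotations__)

# ...but they are not enforced at runtime: strings pass straight through `+`.
print(add("3", "2"))  # prints 32 (string concatenation, not the integer 5)
```

A static type checker such as mypy would flag the second call; the interpreter itself does not.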

The full line, including the declaration, name, parameter list, and output type, is known as the *function signature*. In Python it is punctuated with a `:` to mark the end. In the majority of cases, the function signature is the only thing we need to know about the function. It tells you what goes in and out, how to call it, and hopefully has a label that tells you something about what it does.
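As a small illustration of how much the signature alone carries, the standard library's `inspect.signature` can read it back from the function object. This is only a sketch; `add()` is redefined here without its docstring so the block runs on its own.

```python
import inspect

def add(x: int, y: int) -> int:
    result: int = x + y
    return result

sig = inspect.signature(add)
print(sig)                    # prints (x: int, y: int) -> int
print(list(sig.parameters))   # prints ['x', 'y']
print(sig.return_annotation)  # prints <class 'int'>
```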

Next comes the *docstring*...

```python
def add(x: int, y: int) -> int:
    """
    Function takes two integers and returns their sum,
    also an integer
    """
```

The docstring provides users of the function with more information about the function. In this case, it's a bit redundant because it just tells the user information that can already be gleaned from the function signature. That said, this is a very simple function. In more complicated cases, clarification on how the function works, pitfalls to look out for, or assumptions that were baked into the design of the function would be relevant information to include in the docstring.
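The docstring is more than a comment: Python stores it on the function object, which is how tools like `help()` find it. A minimal sketch, using the same `add()` as above:

```python
import inspect

def add(x: int, y: int) -> int:
    """
    Function takes two integers and returns their sum,
    also an integer
    """
    result: int = x + y
    return result

# The raw text lives in `add.__doc__`; `inspect.getdoc` returns it
# with the leading indentation and surrounding blank lines removed.
print(inspect.getdoc(add))
```

Calling `help(add)` in an interactive session displays the same text alongside the signature.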

Inside the body, which sits under the function signature in indented fashion, note that we assign the label `result` to the sum of `x` and `y`. This label, `result`, is *only* relevant inside of the function body. We'll address why when we discuss scoping. 

```python
def add(x: int, y: int) -> int:
    """
    Function takes two integers and returns their sum,
    also an integer
    """
    result: int = x + y
```
For now, we can simply recognize `result` as being attached to the value we wish to pass out of the function. 

Remember, the goal was not to just throw some integers into a function. We wanted to get their sum back, so we *must* explicitly return the value back to the calling scope (wherever we invoke the `add()` function). Enter another keyword: `return`.

```python
def add(x: int, y: int) -> int:
    """
    Function takes two integers and returns their sum,
    also an integer
    """
    result: int = x + y
    return result
```

Just like we saw with `def`, the role of `return` is so critical that we do not allow it to be used for anything else. All `return` does is end the function call and pass a value from inside the function to the outside.
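To see why the explicit `return` matters, compare `add()` with a hypothetical variant that computes the sum but never returns it. The name `add_without_return` is made up for this sketch. A Python function that finishes without hitting a `return` statement hands back `None`.

```python
def add_without_return(x: int, y: int) -> None:
    # The sum is computed, but the label `result` never leaves the function.
    result: int = x + y

def add(x: int, y: int) -> int:
    result: int = x + y
    return result

print(add_without_return(3, 2))  # prints None
print(add(3, 2))                 # prints 5
```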

### Lexical Scoping

This is a very scary sounding term, but the concept is quite intuitive. A good understanding of scoping will save you a lot of unnecessary headaches. Let's work our way up to it.

The first thing to note is that functions are *first-class objects* in Python. For our purposes, you can just think of this as meaning you can define functions pretty much anywhere, including inside of other functions.

In [3]:
def sum_is_even(x: int, y: int) -> bool:
    """
    Returns True if the sum of the inputs is even,
    and False if the sum is odd.
    """
    
    def is_even(z: int) -> bool:
        return z % 2 == 0
    
    my_sum: int = add(x, y)
    my_sum_is_even: bool = is_even(my_sum)
    return my_sum_is_even

sum_is_even(3, 7)

True

We have three functions firing here:

1. `sum_is_even()` is the primary function we have just defined.
2. `add()` is a function we have already defined elsewhere, and we use it in the body of `sum_is_even()`.
3. `is_even()` is a function we defined *inside of the body* of `sum_is_even()`.

Utter madness. How are we going to sort all of this out? Lexical scoping!

Forgive me for playing a little fast and loose with the definition of scope, but I think the following provides the right kind of intuition. You can think of a scope as the environment of relevance, and relevance means you can reference a given variable. As a practical matter, function scopes in Python can be identified via indentation: a function's scope is the indented block beneath its signature (indented `if` and `for` blocks, by contrast, do not create scopes of their own). The function scope of `sum_is_even()` is everything below the function signature right up to and including the `return` statement. The function scope of `is_even()` is just one line long: its `return` statement. In other words, there are two scopes in `sum_is_even()`: the scope corresponding to the body of `sum_is_even()` and the scope of `is_even()` which it encloses.

Note that the function scope of `sum_is_even()` is *itself enclosed by another scope*! This is the ubiquitous global scope that contains all other scopes. Many languages associate the program's entry point with a `main()` function. Python does not require one: the top level of the script or notebook being run plays that role, and Python names that module `__main__`. You can be forgiven for not realizing this scope was there since we never defined it explicitly. For brevity, we will refer to it as `main()` below.

In a nutshell, there are three rules:

1. A label must be assigned upstream of its first use, as in literally higher up in the document.
2. Labels assigned in scopes that enclose the current scope are available for use.
3. Labels assigned in an enclosed scope defined inside of the current scope are not available for use.

Said differently, you can go "up" to get stuff (earlier in the program or to an enclosing scope), but you can't go "down" (later in a program or to an enclosed scope). 
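The three rules can be sketched with a pair of nested functions. All of the names here (`scale`, `outer`, `inner`, `offset`, `doubled`, `not_yet_assigned`) are hypothetical and chosen only for illustration.

```python
# Rule 1: using a label before it has been assigned fails.
try:
    print(not_yet_assigned)
except NameError as e:
    print("Rule 1:", e)
not_yet_assigned: int = 1

scale: int = 2  # assigned in the outermost scope

def outer(x: int) -> int:
    offset: int = 10  # assigned in outer()'s scope

    def inner(y: int) -> int:
        # Rule 2: inner() can go "up" to read `offset` and `scale`.
        doubled: int = y * scale
        return doubled + offset

    return inner(x)

print(outer(4))  # prints 18

# Rule 3: we cannot go "down" to reach a label assigned inside outer().
try:
    print(offset)
except NameError as e:
    print("Rule 3:", e)
```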

Cool, cool, cool. Why do we care? For one, lexical scoping sorts out the thicket of functions defined above. Both `add()` and `sum_is_even()` were defined in the current scope (i.e. `main()`) so we can use them without incident. However, `is_even()` was defined in an enclosed scope (i.e. inside `sum_is_even()`, which is enclosed by `main()`), so what happens if we try to use that?

In [5]:
try:
    is_even(2)
except NameError as e:
    print(e)

name 'is_even' is not defined


`is_even()` is apparently not defined despite the fact that we did define it. Even in this simple example, this seems odd. When programs get more complicated, this behavior can be downright baffling if you don't intuitively understand scoping. The key is this: from the perspective of `main()`, `is_even()` does not exist because it is *local* to `sum_is_even()`.

The corollary, of course, is that nothing defined in an enclosed scope is freely available to the enclosing scope. Not just functions, but also plain old values, and classes (which are beyond the scope of this notebook). The only exceptions are those values which are passed back out via the `return` statement.

With that, we have a more robust way to understand the function of *parameter lists* and the *return* statement. They are portals between hierarchical scopes:

+ The *parameter list* is the means by which we can formally pass variables from an enclosing scope into a scope it encloses.
  + This is not the only way, because you could just reference the variables in the enclosed scope. But, it is usefully explicit.
+ The *return* statement is the *only* means by which you can pass values from an enclosed scope back to the enclosing scope.
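The two ways of getting a value into a function can be put side by side. This is a sketch with made-up names (`tax_rate` and the two `price_with_tax_*` functions are not part of the notebook): one function receives the rate through its parameter list, the other silently reaches into the enclosing scope.

```python
tax_rate: float = 0.25  # assigned in the enclosing (outermost) scope

def price_with_tax_explicit(price: float, rate: float) -> float:
    # `rate` arrives through the parameter list, so the dependency is visible
    # in the signature.
    return price * (1 + rate)

def price_with_tax_implicit(price: float) -> float:
    # `tax_rate` is looked up in the enclosing scope; the signature does not
    # reveal that this function depends on it.
    return price * (1 + tax_rate)

print(price_with_tax_explicit(100.0, tax_rate))  # prints 125.0
print(price_with_tax_implicit(100.0))            # prints 125.0
```

Both give the same answer, but only the first tells the caller everything the result depends on.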

## Composition

Let us revisit our function `add()`.

In [6]:
add(1, 2)

3

When we called `add()`, it evaluated the expression contained within the body and returned the result. So, is the function call operationally distinguishable from the returned value?

In [7]:
add(1, 2) == 3

True

It would appear that everywhere we want to use `3`, we could use `add(1, 2)`, and vice versa. They are operationally equivalent from a value perspective. So, what happens if we pass function calls as arguments to another function?

In [8]:
sum_is_even(add(1, 2), add(4, 3))

True

It seems the computer did not spontaneously combust. Why did this work?

Remember that a function is just a mapping from inputs to outputs. Python evaluates the inner calls first, so `sum_is_even()` simply receives `3` and `7`. So long as the outer function can map the values returned by the inner calls to a new output, it will do so reliably. This is called *composition*. Again, there is a mathematical analog. Suppose we have two functions:

\begin{align}
    f: x &\rightarrow x \times 2 \\
    g: x &\rightarrow x + 10
\end{align}

We could define a new function $h$ as their composition.

$$h: x \rightarrow (g \circ f)(x) \iff x \rightarrow (x \times 2) + 10$$

This idea is a very powerful one, because it reduces the space of functions we must write by allowing us to leverage abstraction. That is, if we have well designed generic functions that do one specific thing well, we can compose them together to achieve our objective without reinventing the wheel.
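The mathematical example translates almost directly into Python. The helper `compose` below is a hypothetical name introduced for this sketch; it builds $g \circ f$ out of any two single-argument functions.

```python
from typing import Callable

def f(x: float) -> float:
    return x * 2

def g(x: float) -> float:
    return x + 10

def compose(outer: Callable[[float], float],
            inner: Callable[[float], float]) -> Callable[[float], float]:
    """Return a new function that applies `inner` first, then `outer`."""
    def composed(x: float) -> float:
        return outer(inner(x))
    return composed

h = compose(g, f)  # h(x) = g(f(x)) = (x * 2) + 10

print(h(3))     # prints 16
print(g(f(3)))  # prints 16, the same composition written out by hand
```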

### Case Study: `map()`

Often we are faced with a situation in which we want to perform an operation on all of the values in a `list`. Let's say we want to run a regression and our data includes prices. We don't want to use raw prices directly, but the logarithm of prices. How do we go about converting all of our prices to their logged form? Enter `map()`.

At the end of the day, we have two things we want to do:

1. We want to contact every value in the `list`.
2. When we contact a value, we want to log transform it.

The second one is easy enough.

In [16]:
import numpy as np
from typing import List, Callable

def log_my_value(x: float) -> float:
    return np.log(x)

log_my_value(1000)

6.907755278982137

This whole "contact every value" business feels a bit abstract. What do we do when we get there? The key is, we don't have to care. We can leave that part unspecified and rely on composition to get us there when the time comes. *Note that `Callable` just means function. A `Callable[[int, int], int]` takes two integer values and returns an integer*.
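To make the `Callable[[int, int], int]` annotation concrete, here is a minimal sketch of a function that accepts any two-integer operation as a parameter. The name `apply_binary` is hypothetical; `add()` is the function defined earlier, repeated so the block runs on its own.

```python
from typing import Callable

def add(x: int, y: int) -> int:
    return x + y

def apply_binary(op: Callable[[int, int], int], a: int, b: int) -> int:
    # `op` is left unspecified here; the caller decides which function it is.
    return op(a, b)

print(apply_binary(add, 3, 2))  # prints 5
print(apply_binary(max, 3, 2))  # prints 3
```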

In [20]:
def my_map(f: Callable[[float], float], vals: List[float]) -> List[float]:
    out: List[float] = []
    for elem in vals:
        out.append(f(elem))
    return out

underlying_vals: List[float] = np.linspace(0.5, 5.5, 11).tolist()  # convert the NumPy array to a plain list of floats
vals_to_be_logged: List[float] = my_map(np.exp, underlying_vals)
logged_vals: List[float] = my_map(log_my_value, vals_to_be_logged)

print("Underlying Values: ", underlying_vals)
print("Exponentiated: ", vals_to_be_logged)
print("Logged: ", logged_vals)

Underlying Values:  [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
Exponentiated:  [1.6487212707001282, 2.718281828459045, 4.4816890703380645, 7.38905609893065, 12.182493960703473, 20.085536923187668, 33.11545195869231, 54.598150033144236, 90.01713130052181, 148.4131591025766, 244.69193226422038]
Logged:  [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]


Take a moment to appreciate how general `my_map` is. The type hints mention `float`, but nothing in the body depends on that, so we can use it on any list with values of any type. As long as we pass a function that can deal with one instance of the original type, `my_map` will work. That's the abstraction that composition buys us.
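As a sketch of that generality, the version below swaps the `float` hints for type variables (`A` and `B`, an assumption of this example rather than something from the notebook) and runs the same loop over a list of strings. It also shows Python's built-in `map()`, which does the same job but produces its results lazily.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")  # type of the input elements
B = TypeVar("B")  # type of the output elements

def my_map(f: Callable[[A], B], vals: List[A]) -> List[B]:
    out: List[B] = []
    for elem in vals:
        out.append(f(elem))
    return out

words: List[str] = ["map", "applies", "functions"]

print(my_map(len, words))        # prints [3, 7, 9]
print(my_map(str.upper, words))  # prints ['MAP', 'APPLIES', 'FUNCTIONS']

# The built-in map() returns a lazy iterator, so we wrap it in list()
# to see the values.
print(list(map(len, words)))     # prints [3, 7, 9]
```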