<img src="../img/Dolan.png" width="180px" align="right">

# **Lesson 4: Functions**
_Parameterizing code for reuse_

## **Learning Objectives**

### Theory / Be able to explain ...
- How functions encapsulate logic into reusable components
- The Python Standard Library of built-in functions
- The difference between defining a function and calling it
- Function arguments vs function parameters
- Default parameter values
- Positional vs named arguments
- Void functions, short-circuiting, and `None`
- Failing Forward

### Skills / Know how to  ...
- Define and call functions
- Import modules (with functions, data types, constants, etc.) from libraries
- Use positional and named arguments when calling a function
- Use short-circuiting to simplify functional logic
- Guard against bugs with short-circuiting 

---

## **Abstraction and the DRY Principle**
> "A designer knows that he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away." -- Antoine de Saint-Exupery

While many novice programmers assume that the best programs have the most lines of code and the most features, it's usually quite the opposite. With every line of code you are potentially introducing a bug! Keeping it simple and lean is always best. By extension, the best programmers are, as Joel Spolsky once observed, "lazy but smart." They tend to talk in simple, grammatically correct sentences, enunciating every word, perhaps even pronouncing the "t" in "often." They are the most likely to wear tee-shirts and jeans to work because they are comfortable and what you wear doesn't define who you are anyway. There is a kind of geeky elegance in all of this that many others, especially their bosses but sometimes their spouses, tend to miss ... until crunch time, when it really matters. Then suddenly everybody is waiting for the programmers to squash every bug and nobody cares what they wear. While nobody is advocating that you wear a tee-shirt to your next staff meeting, there is something to be learned from watching programmers do their best work. 

In its highest forms, programming is about creating _elegant_ code that just works. A skilled programmer has the rare ability to **abstract the essential from the concrete.** Given a block of code that does X, they will winnow it down to its bare logical essence and then parameterize it (i.e., with variables) so it can be reused over and over again, even in novel situations that never occurred to anybody before. They _are_ both lazy and smart. 

In many cases they will do this abstraction process without being asked. They call it the **DRY ("Don't Repeat Yourself") principle**. After you have done something the second or third time, it begins to become worth your time to see that it gets done right every time, with a minimum of effort. That, ultimately, is the essence of programming (pun intended). 

This lesson is about **functions** that **encapsulate** logic into reusable components. We will start with functions that come built into Python and then define some of our own.   

---
## **What's a Function?**
**A function is a named block of statements** that can be **called** as needed. If the function requires data to do its work, then we can supply input **arguments** when we call the function. Often, but not always, a function **returns** a result (output) of the computation.

We have already seen a few function calls:

In [None]:
type(32)

int

In [None]:
print("Go Stags!")

Go Stags!


In [None]:
int(42.0)

42

In each case the pattern is the same:
```python
function_name(argument)
```
The expectation is that the function call will return a value. In mathematical terms we say that a function is a mapping from a domain (set of inputs) to a range (set of outputs). Given some number of arguments, the function performs a calculation and returns the result. 

Python 3 ships with dozens of built-in functions in its **standard library**:

> ![Std Lib Functions](https://github.com/christopherhuntley/BUAN5405-lessons/raw/master/img/L4_standard_library_functions.png)

We can classify them into several categories:
- **types:** `bool()`, `int()`, `float()`, `complex()`, `str()`, `list()`, `dict()`, `tuple()`,`set()`, `frozenset()`,`type()`
- **math/logic:** `all()`,`any()`,`bin()`,`oct()`,`hex()`,`abs()`,`round()`,`max()`, `min()`, `pow()`, `sum()`
- **strings/sequences:** `ascii()`, `chr()`, `hash()`, `format()`, `len()`, `range()`, `iter()`, `filter()`, `enumerate()`, `slice()`, `sorted()`
- **text I/O:** `input()`, `print()`, `repr()`
- **files:** `open()` 
- **plumbing:** `bytes()`, `bytearray()`,`callable()`,`classmethod()`, `locals()`, `dir()`,`setattr()`, `getattr()`, `delattr()`, `compile()`, `eval()`,`exec()`, ...
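
As a quick illustration, here is a small sketch that calls a few of the built-ins listed above. The variable names and the sample data are made up for this example.

```python
# A few built-in functions in action; no imports are needed.
scores = [88, 92.5, 79]             # hypothetical data for illustration

count = len(scores)                 # number of items in the list
highest = max(scores)               # largest item
total = sum(scores)                 # 88 + 92.5 + 79
average = round(total / count, 1)   # mean, rounded to one decimal place
label = str(count) + " scores"      # convert the int to str before concatenating

print(label, "| max:", highest, "| avg:", average)
```

Each call follows the same `function_name(argument)` pattern, and each one returns a value that we can store in a variable or pass along to another function.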

In addition to these functions that come pre-loaded, there are many more things that can be **imported** from the standard library. 

In [None]:
import math
print(math.pi)

import random
print(random.random())
print(random.random())

3.141592653589793
0.9636885289926522
0.4041246611761464


`math` and `random` are **modules** that bundle together collections of functions, constants, and other reusable components that we can use in our code. Once imported, we use "dot notation" to indicate which module contains a given function. Thus, `math.pi` is a constant (`pi`) found in the `math` module. Similarly, `random.random()` is the function `random()` in the `random` module. We use it to generate pseudo-random numbers that are at least 0 and less than 1. 

We can use the same mechanism to import and use components from third-party libraries as well.  
```python
import pandas as pd
```
You will see the code above a lot in your data science classes, basically at the top of every notebook. It imports the pandas library used for managing (sometimes impossibly huge) datasets, supplying a shorthand **alias** `pd` to save us from typing `pandas` over and over again. (We programmers really are a lazy bunch.)
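
The same `import ... as ...` syntax works for any module. Here is a small sketch that uses the standard library's `statistics` module so that it runs without installing anything. The alias `stats` and the sample data are arbitrary choices for this example, not a course convention.

```python
import math
import statistics as stats   # "stats" is an alias chosen for this example

data = [2, 4, 4, 6]          # made-up sample data

print(stats.mean(data))      # dot notation: mean() lives in the statistics module
print(math.sqrt(16))         # sqrt() lives in the math module
print(math.floor(math.pi))   # constants and functions can be combined
```

Once the alias is set, `stats.mean()` and `statistics.mean()` would refer to the same function, but only the alias is defined as a name in our code.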

---
## **Function Definitions & Calls**

Before a function can be called, it has to be defined. For that we use a `def` statement:
```python
def function_name( parameters ):
    function_body
```

where
- `function_name` follows exactly the same rules as variable names
- `parameters` **declares** (names) a list of zero or more variables (_parameters_) that receive the values **passed** as arguments in a function call
- `function_body` is a block of statements to be executed one after the other

Consider, for example, the following code, which includes one function definition and two function calls: 

In [None]:
def go_team(school):
    if school == "Fairfield":
        return "Go Stags!"
    else:
        return "Go home!"

print(go_team("Marist"))
print(go_team("Fairfield"))
print(go_team)

Go home!
Go Stags!
<function go_team at 0x110002320>


- The `def` statement has to precede the first function call. If we were to move the statement `print(go_team("Marist"))` to the top of the code before running the cell the first time then we would get `NameError: name 'go_team' is not defined`. 
- The `school` parameter is used inside the function body like any other variable. However, without more work on our part, `school` is a **local variable** that does not exist outside the function body. Its value is lost once the function is done. (Note: that's a feature, not a bug; forgetting things we no longer need to know clears memory for remembering new things.)
- `return` statements are used to tell the function to **terminate execution** and (optionally) what to **_pass back_** to the **_caller_**. Yes, it is possible for a function to return nothing. We'll come back to that in a bit.
- Each function call is considered to be independent of the others. The function body gets a fresh instance of the `school` parameter to work with each time the function is called. 
- Finally, **a function definition has no effect until the function is called.** Without the parentheses to indicate that we are calling it, it is just a software object like anything else. It's [Schrödinger's cat](https://en.wikipedia.org/wiki/Schr%C3%B6dinger%27s_cat), waiting for us to open the box to see if it is alive or dead. If we call a function that hasn't been defined yet then we get an error. 
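
The points above can be sketched with a small hypothetical function, `shout()`, which is not part of the lesson's own code. The sketch assumes a fresh session in which neither `shout` nor `text` has been defined yet.

```python
# Calling a function before its def statement has run raises NameError.
try:
    shout("hello")              # shout() does not exist yet
except NameError as err:
    print("Too early:", err)

def shout(text):                # defining the function runs none of its body
    return text.upper() + "!"

print(callable(shout))          # without parentheses, shout is just an object
result = shout("hello")         # the parentheses trigger the call
print(result)

# The parameter `text` is local to shout(); it does not exist out here.
try:
    print(text)
except NameError as err:
    print("Not visible outside:", err)
```

The `try`/`except` blocks are only there so the cell keeps running after each error; we have not covered them yet, so treat them as scaffolding.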

### **Lexical Scope: Parameters and Arguments**
You may have noticed that we seem to be somewhat inconsistent with what we call the inputs to our functions. **Function calls supply input _arguments_** but **function definitions declare input _parameters_.** If both are inputs why have two names? That's because they are not actually the same thing at all. An argument is any Python expression, while a parameter is a local variable that is assigned the _value_ of the argument. Arguments get evaluated as part of the function call and then assigned to parameters just before executing the function body. 

So, for example, consider the following function definition and function call: 
```python
def double_value(x):
  return x*2

double_value(1+1)
```

The parameter `x` is used only inside the function body. Meanwhile, the argument `1+1` is evaluated to `2` just before setting the value of `x` inside the function. 
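
To see that an argument can be any expression, here is a short sketch that reuses `double_value()` with a few different arguments. The variable `n` is introduced just for this example.

```python
def double_value(x):
    return x * 2

n = 5
print(double_value(1 + 1))            # 1 + 1 is evaluated first, so x is 2
print(double_value(n))                # n is evaluated to 5, so x is 5
print(double_value(double_value(n)))  # the inner call runs first, then the outer one
print(double_value("ab"))             # * also repeats strings
```

In every case the function body only ever sees the resulting value in `x`, never the expression that produced it.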

_**Now for a more advanced explanation ...**_ (Skim if you like and read the TL;DR at the bottom.)

This distinction between what is "inside" a function definition and what is "outside" the function definition is called **lexical scope**. Each variable exists in a **namespace** (a.k.a. "scope") within which no two variables can have the same name. However, what about code written by two different people? How can we be sure that library code written by person X many years ago does not have a variable with the same name as the one we are setting right now? We can't! So, we instead say that every variable has a scope within which it is defined. These scopes nest inside of each other, like [Matryoshka dolls](https://en.wikipedia.org/wiki/Matryoshka_doll). A module (program) can define variables and functions. Functions inside the module can define variables and _even more functions_.

The result is a hierarchy of scopes, with lower level scopes nested inside of higher level scopes. Recall dot notation, which we learned about in the section on modules? That's how we refer to names that live in another module's namespace:
- _`module`.`function`_ and _`module`.`variable`_ are how we refer to a function or a variable in an imported module.
- _`module`.`submodule`.`function`_ is how we refer to a function in a module nested inside another one (e.g., `os.path.join()`).
- Variables defined inside a function are different. There is no _`function`.`variable`_ syntax for reaching a function's local variables from the outside; they exist only while the function is running.
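
Here is a minimal sketch of how separate namespaces keep identical names from colliding. It uses the standard library's `math` and `cmath` modules; the crude module-level `pi` is defined only for the sake of the example.

```python
import math
import cmath   # complex-number counterpart of math, also in the standard library

pi = 3         # our own (deliberately crude) variable named pi

print(pi)               # our module's pi
print(math.pi)          # the math module's pi, untouched by ours
print(math.sqrt(4))     # math's sqrt() works on real numbers
print(cmath.sqrt(-4))   # cmath's sqrt() has the same name but lives in another namespace
```

Three different things are named `pi` or `sqrt` here, yet none of them overwrites the others because each name is looked up in its own namespace.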

Most of the time we can ignore the whole namespace concept but of course there are times when it matters. One such time is when calling a function, where data inside the function (parameters) are separate from data outside the function (arguments). Consider the following example:


In [None]:
def add_two_numbers(a,b):
  print("\n--- inner add_two_numbers FUNCTION scope ---")
  print("param a =",a)
  print("param b =",b)
  print("Returning", a+b)
  return a + b


a = 1
b = 2
print("--- outer MODULE Scope ---")
print("var a =",a)
print("var b =",b)
print("about to call add_two_numbers()")
print("\n--- outer MODULE Scope --\nFunction returned", add_two_numbers(b,a)) # note: b is passed before a

--- outer MODULE Scope ---
var a = 1
var b = 2
about to call add_two_numbers()

--- inner add_two_numbers FUNCTION scope ---
param a = 2
param b = 1
Returning 3

--- outer MODULE Scope --
Function returned 3


Here's a step-by-step trace, in the order (and scope) in which each line of code is executed.
- In _MODULE_ scope:
  - lines 1-6: define the function `add_two_numbers()` 
  - lines 9-10: initialize `a`=1 and `b`=2
  - line 15: calls the function `add_two_numbers()`:
    - the arguments `b` and `a` are evaluated to get the values 2 and 1
    - the values 2 and 1 are passed into the function as data
- In `add_two_numbers()` _FUNCTION_ scope:
  - line 1: the values 2 and 1 are assigned to the parameters `a` and `b`, which act like variables within the `add_two_numbers()` function scope. 
  - the names `a` and `b` within the function take precedence over the names `a` and `b` outside the function. In other words, within the scope of `add_two_numbers()`, the names `a` and `b` are always the parameters, not the variables set outside the function. 
  - line 6: the function returns the sum of `a` and `b` (i.e., 3), after which the scope returns to the enclosing module 
- Back in _MODULE_ scope (upon returning from the function call in line 15):
  - the value returned by the function (3) is used by the print statement.

**TL;DR:** Function _parameters_ are what we call the data _inside the function scope_ (the middle bullets), while function _arguments_ are how we refer to the same data when _calling_ the function from the enclosing module scope (the outer bullets). 
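
One practical consequence, sketched with a hypothetical `add_bonus()` function: assigning a new value to a parameter inside a function does not change a variable of the same name outside of it.

```python
def add_bonus(score):
    score = score + 10       # reassigns the local parameter only
    return score

score = 80                   # module-scope variable that happens to share the name
new_score = add_bonus(score)

print(score)                 # the outer variable still holds its original value
print(new_score)             # the value passed back by return
```

The only way the new value reaches the module scope here is through `return`.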




### **Default Values and Named Arguments**
We can also make some parameters optional by supplying **default values**:

In [None]:
# default values can be assigned in the parameter list
def go_team(school = "Fairfield"):
    if school == "Fairfield":
        return "Go Stags!"
    else:
        return "Go home!"

print(go_team("Marist"))
print(go_team("Fairfield"))
print(go_team())

Go home!
Go Stags!
Go Stags!


Here we have set "Fairfield" as the default value for `school`. If the school is not specified in the call then the function assumes you meant "Fairfield". 

So far we have only considered functions that have one parameter. When defining a function with multiple parameters, the parameters are declared in a particular order, with parameters having a default value (e.g., `school` in the above example) always listed _after_ those without them. When calling the function, inputs supplied without a name are said to be **positional arguments** because they are matched to the parameters in the order of the parameter declarations. Inputs supplied with `name=value` syntax are **named arguments** (the Python documentation calls them _keyword arguments_). Required inputs (whose parameters lack defaults) are typically passed by position, while optional inputs (with default values) are typically passed by name. 

In [None]:
def go_team(sport, school="Fairfield", gender_modifier=""):
    if school != "Fairfield":
        return "Go home!"
    
    gender_modifier_padded = gender_modifier + " " if gender_modifier else ""

    return "Go "+ gender_modifier_padded + "Stags " + sport +"!"  

print(go_team("Tennis", school="Marist"))
print(go_team("Lacrosse", gender_modifier="Lady"))
print(go_team("Tennis","Fairfield"))
print(go_team("Basketball", gender_modifier="Lady", school = "Fairfield"))
print(go_team("Basketball"))
print(go_team())   # Error because sport is not optional

Go home!
Go Lady Stags Lacrosse!
Go Stags Tennis!
Go Lady Stags Basketball!
Go Stags Basketball!


TypeError: go_team() missing 1 required positional argument: 'sport'

A few observations (which may take a few passes for you to process):

- The function (intentionally) short-circuits itself, returning "Go home!" for any `school` except "Fairfield". Everything after that is Fairfield-specific.
- Named arguments (`school` and `gender_modifier`) can appear in any order as long as they appear after the positional arguments (`sport`).
- We can also treat named arguments like positional arguments (without `name =` syntax) as long as they appear in the order the parameters were declared.
- We can use the parameters like any other variables (and even change their values).
- However, if we omit the `sport` argument then we get an error. An argument is always required for any parameter that lacks a default value. 
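
The same rules can be seen in a smaller sketch. The function `make_label()` and its parameters are hypothetical, chosen only to show the different calling styles side by side.

```python
def make_label(name, units="kg", precision=1):
    # name is required; units and precision have default values
    return name + " (" + units + ", " + str(precision) + " dp)"

a = make_label("mass")                               # defaults for both optional parameters
b = make_label("mass", "lb")                         # units passed by position
c = make_label("mass", precision=3)                  # skip units, set precision by name
d = make_label(precision=0, units="g", name="mass")  # everything by name, in any order

for label in (a, b, c, d):
    print(label)

# Leaving out an argument for a parameter with no default raises TypeError.
try:
    make_label(units="g")
except TypeError as err:
    print("TypeError:", err)
```

Passing optional inputs by name, as in `c`, is usually the most readable choice because the call site says which setting is being changed.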

---
## **Fruitful vs Void**
A function that returns a value is said to be **fruitful**. One that doesn't is a **void function**, which some languages call _subroutines_ or _procedures_. A void function is called not to perform a calculation but to carry out an **action** ("side effect") like printing to the screen or writing to a file. Notice that the following does not have a `return` statement. 

In [None]:
def print_go_team(sport, school="Fairfield", gender_modifier=""):
    print(go_team(sport,school,gender_modifier))

print_go_team("Tennis")
print("-----")
print(print_go_team("Tennis"))
print("-----")
print(type(print_go_team("Tennis")))

Go Stags Tennis!
-----
Go Stags Tennis!
None
-----
Go Stags Tennis!
<class 'NoneType'>


Notes:
- A void function returns (terminates) at the bottom of the function body unless it is short-circuited with a `return` (by itself, without any value) somewhere before that.
- Even a void function actually returns a value. It's just that the value is always `None` with data type `NoneType`.  
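
A bare `return` is a handy way to guard against bad input. The sketch below uses a hypothetical `print_average()` function that short-circuits before it can divide by zero.

```python
def print_average(values):
    if not values:               # guard: an empty list would cause division by zero
        print("No data")
        return                   # bare return: stop here and pass back None
    print(sum(values) / len(values))

print_average([2, 4, 6])         # reaches the bottom of the function body
outcome = print_average([])      # stopped early by the guard
print(outcome is None)           # either way, the caller gets None
```

Putting the guard at the top keeps the "normal" logic unindented and easy to read, which is the same trick `go_team()` uses with its early "Go home!" return.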

> **Heads Up: `print()` is not fruitful.** It does not return a useful value (only `None`). Instead it displays a value on a given output device.  
> For example, consider the following code:
>
> ```python
> type(print('Hi'))
> ```
>
> If we run that in a code cell then _two_ things occur:
> - the text 'Hi' is printed to the screen
> - the return type of the `print()` call is identified as `NoneType` (i.e., nothingness itself)
> 
> Since we are using Python to process data (not just `print()` it), we almost always want our functions to be fruitful. In fact, avoid using `print()` for much of anything except debugging. **Whatever you do, don't ever confuse `return` with `print()`.** They really are very different things. 
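
Here is a short sketch of that difference, using two hypothetical functions (`area_printed()` and `area_returned()`). Only the version with `return` gives the caller a value it can keep working with.

```python
def area_printed(w, h):
    print(w * h)               # displays the result, then returns None

def area_returned(w, h):
    return w * h               # passes the result back to the caller

shown = area_printed(3, 4)     # the number appears on screen but is not kept
kept = area_returned(3, 4)     # nothing appears on screen; the value is stored

print(shown)                   # None
print(kept * 2)                # the returned value can be used in further calculations
```

Trying `shown * 2` instead would raise a `TypeError`, because `None` cannot be multiplied.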

## **Failing Forward**
We'll close this lesson with a general comment about the nature of programming. Programming is one of those weird professions where it is best to learn from mistakes. **Nothing ever works at first.** Even if it seems to work, always assume that there are bugs in the code that you haven't discovered yet. 

In Lesson 1 we learned about the Edit / Run / Test cycle:
1. Edit some code. 
2. Run it.
3. Test to see that it worked. If not, identify the mistake. 
4. Go back to step 1.

In the hands of a novice, the work mostly happens in step 1. However, the learning (i.e., the thing that makes you a better programmer) happens in **step 3**. You discover that "whenever we do X, this bad thing happens so let's try something else" and that's when you get creative and smart. Until then it's just pounding keys on the keyboard. You don't **know** if your code works or how it is going to fail. So, why not get through steps 1 and 2 as quickly as you can, with **the least amount of code you can possibly test**, and then linger a bit in step 3 to see what happened and plan for step 1 again? You'll ultimately go through more cycles but it will eventually take less time and be more likely to succeed. 

All of this leads to a general strategy that we might call **Failing Forward**:
- Instead of trying to write your code all at once, **write code in tiny testable chunks** that will eventually build up into working code that does what you need it to do.
- Use the tests you create as the model for all the things you've learned about how things can fail. **When a new kind of failure happens, embrace it** just as much as you would a nifty new coding tool or library.
- Not all code requires an automated test suite. However, **if you bother to write unit tests, write the tests first**. Your first code after that should fail every test. Then the tests become your TODO list going forward. Keep rewriting the code and re-running the tests until the code passes every test. 
- When writing code that is hard, **start with whatever is most likely to fail**. After all, if you can't solve that then the rest of the code is irrelevant anyway. 
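
As a rough sketch of the "tests first" idea, the cell below writes the checks before the function they exercise. Both `run_tests()` and `is_passing()` are hypothetical names, and the `assert` statement (which raises an error when its condition is false) is used here as the simplest possible test tool.

```python
# Step 1: write the tests first. is_passing() should report whether
# a score meets a cutoff of 60 (an assumption made for this example).
def run_tests():
    assert is_passing(60) is True      # boundary case: the most likely to fail
    assert is_passing(59.9) is False
    assert is_passing(100) is True
    return "all tests passed"

# Step 2: write the smallest code that could pass, then run the tests.
def is_passing(score, cutoff=60):
    return score >= cutoff

print(run_tests())
```

If `run_tests()` were called before `is_passing()` was defined, it would fail with a `NameError`, which is exactly the "fail every test first" starting point described above.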



---
## **Before you go ... Save your notebook to be sure it is up to date.**

---
> ## Every Tee Shirt Has a Story
> ABOUT THE CAMEL CODE     
> This shirt is working code that is also art. The front of the shirt displays the commands needed to run the [Camel Code](https://www.perlmonks.org/?node_id=45213) displayed on the back. The code prints out the camel itself reduced 50% in size. I'm not a Perl aficionado but even I have to say that this is the most elegant text hack I've ever seen. So, I bought a tee shirt. You may wonder about the camel ... That's the image on O'Reilly's unofficial Perl manual. 

![L4 Tee Front](../Photos/L04_TeeFront.jpeg)
![L4 Tee Back](../Photos/L04_TeeBack.jpeg)

## Copyright &copy; 2020 Christopher Huntley. All rights reserved. 