<img src="https://github.com/christopherhuntley/BUAN5405-docs/blob/master/Slides/img/Dolan.png?raw=true" style="width:180px; float:right">

# Lesson 4: Functions
_Parameterizing code for reuse_

# Learning Objectives

## Theory / Be able to explain ...
- How functions encapsulate logic into reusable components
- The Python Standard Library of built-in functions
- The difference between defining a function and calling it
- Function arguments vs function parameters
- Default parameter values
- Positional vs named arguments
- Void functions, short-circuiting, and `None`

## Skills / Know how to  ...
- Define and call functions
- Import modules (with functions, data types, constants, etc.) from libraries
- Use positional and named arguments when calling a function
- Use short-circuiting to simplify functional logic
- Guard against bugs with short-circuiting 
- Create nameless lambda functions within calculations 

**What follows is adapted from Chapter 4 of the _Python For Everybody_ book. If you have not read it, then please do so before continuing on.**

## Abstraction and the DRY Principle
> "A designer knows that he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away." -- Antoine de Saint-Exupery

While many novice programmers assume that the best programs have the most lines of code and the most features, it's usually quite the opposite. With every line of code you are potentially introducing a bug! Keeping it simple and lean is always best. By extension, the best programmers are, as Joel Spolsky once observed, "lazy but smart." They tend to talk in simple, grammatically correct sentences, enunciating every word, perhaps even pronouncing the "t" in "often." They are the most likely to wear tee-shirts and jeans to work because they are comfortable and what you wear doesn't define who you are anyway. There is a kind of geeky elegance in that that many others, especially their bosses but sometimes their spouses, tend to miss ... until crunch time when it really matters. Then suddenly everybody is waiting for the programmers to squash every bug and nobody cares what they wear. While nobody is advocating you wear a tee-shirt to your next staff meeting, there is something to be learned from watching programmers do their best work. 

In its highest forms, programming is about creating _elegant_ code that just works. A skilled programmer has the rare ability to **abstract the essential from the concrete.** Given a block of code that does X, they will winnow it down to its bare logical essence and then parameterize it (i.e., with variables) so it can be reused over and over again, even in novel situations that never occurred to anybody before. They _are_ both lazy and smart. 

In many cases they will do this abstraction process without being asked. They call it the **DRY ("Don't Repeat Yourself") principle**. After you have done something the second or third time, it begins to become worth your time to see that it gets done right every time, with a minimum of effort. That, ultimately, is the essence of programming (pun intended). 

This lesson is about **functions** that **encapsulate** logic into reusable components. We will start with functions that come built into Python and then define some of our own.   

## What's a Function?
**A function is a named block of statements** that can be **called** as needed. If the function requires data to do its work, then the function call can supply input **arguments**. Often, but not always, a function may **return** a result (output) of the computation.

We have already seen a few function calls:

In [1]:
type(32)

int

In [2]:
print("Go Stags!")

Go Stags!


In [3]:
int(42.0)

42

In each case the pattern is the same:
```python
function_name(argument)
```
The expectation is that the function call will return a value. In mathematical terms we say that a function is a mapping from a domain (set of inputs) to a range (set of outputs). Given one or more arguments, the function performs a calculation and returns the result. 

Python 3 ships with dozens of built-in functions in its **standard library**:

> ![](img/L4_standard_library_functions.png)

We can classify them into several categories:
- types: `bool()`, `int()`, `float()`, `complex()`, `str()`, `list()`, `dict()`, `tuple()`, `set()`, `frozenset()`, `type()`
- math/logic: `all()`, `any()`, `bin()`, `oct()`, `hex()`, `abs()`, `round()`, `max()`, `min()`, `pow()`, `sum()`
- strings/sequences: `ascii()`, `chr()`, `hash()`, `format()`, `len()`, `range()`, `iter()`, `filter()`, `enumerate()`, `slice()`, `sorted()`
- text I/O: `input()`, `print()`, `repr()`
- files: `open()`
- plumbing: `bytes()`, `bytearray()`, `callable()`, `classmethod()`, `locals()`, `dir()`, `setattr()`, `getattr()`, `delattr()`, `compile()`, `eval()`, `exec()`, ...
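
To make the call pattern concrete, here is a small sketch that uses a handful of the built-ins listed above. The `scores` list and the other variable names are invented for this illustration; they do not come from the lesson.

```python
# Hypothetical sample data, invented for this illustration
scores = [88, 92.5, -3, 71]

largest = max(scores)        # math/logic: the largest item
total = sum(scores)          # math/logic: the sum of all items
count = len(scores)          # sequences: the number of items
ordered = sorted(scores)     # sequences: a new list in ascending order
magnitude = abs(-3)          # math/logic: absolute value
as_text = str(count)         # types: convert an int to a string

print(largest, total, count)
print(ordered)
print(magnitude, as_text)
```

Every call follows the same shape: the function name, then the arguments in parentheses, with the returned value assigned to a variable.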

In addition to these functions that come pre-loaded, there are many more things that can be **imported** from the standard library. 

In [4]:
import math
print(math.pi)

import random
print(random.random())
print(random.random())

3.141592653589793
0.013784543583697961
0.9395157561609665


`math` and `random` are **modules** that bundle together collections of functions, constants, and other reusable components that we can use in our code. Once a module is imported, we use "dot notation" to tell Python which module to look in. The reference `math.pi` is to a constant called `pi` in the `math` module. Similarly, `random.random()` is the function `random()` in the `random` module. We use it to generate pseudo-random numbers between 0 and 1. 
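
As a further sketch of dot notation, the cell below calls a few other functions from the same two modules. The seed value is an arbitrary choice made for this example, so that rerunning the cell repeats the same pseudo-random sequence.

```python
import math
import random

print(math.sqrt(16))          # square root function from the math module
print(math.floor(2.7))        # round down to the nearest integer

random.seed(5405)             # arbitrary seed (an assumption for this example)
roll = random.randint(1, 6)   # random integer from 1 to 6, both ends included
fraction = random.random()    # random float, at least 0.0 and less than 1.0
print(roll, fraction)
```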

We can use the same mechanism to import and use components from third-party libraries as well.  
```python
import pandas as pd
```
You will see the code above a lot in your data science classes, basically at the top of every notebook, including in the final project for this course. It imports the pandas library used for managing (sometimes impossibly huge) datasets, supplying a shorthand **alias** `pd` to save us from typing `pandas` over and over again. (We programmers really are a lazy bunch.)
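
The same aliasing syntax works for any module. The sketch below uses only standard library modules so that it runs without installing anything; the alias `stats` and the `radii` data are made up for this example.

```python
import statistics as stats     # alias: type stats instead of statistics
from math import pi, sqrt      # import specific names; no "math." prefix needed

radii = [1, 2, 3]              # hypothetical sample data
mean_radius = stats.mean(radii)
area = pi * mean_radius ** 2   # area of a circle with the mean radius
print(mean_radius, area, sqrt(area))
```

The `from ... import ...` form is a second way to save typing: it brings individual names into our code so that no module prefix is needed at all.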

## Function Definitions & Calls

Before a function can be called, it has to be defined. For that we use a `def` statement:
```python
def function_name( parameters ):
    function_body
```

where
- `function_name` follows exactly the same rules as variable names
- `parameters` **declares** (names) a list of zero or more variables that can be **passed** as arguments in a function call
- `function_body` is a block of statements to be executed one after the other

Consider, for example, the following code, which includes one function definition and two function calls: 

In [5]:
def go_team(school):
    if school == "Fairfield":
        return "Go Stags!"
    else:
        return "Go home!"

print(go_team("Marist"))
print(go_team("Fairfield"))
print(go_team)

Go home!
Go Stags!
<function go_team at 0x103fa63b0>


- The `def` statement has to precede the first function call. If we were to move the statement `print(go_team("Marist"))` to the top of the code before running the cell the first time then we would get `NameError: name 'go_team' is not defined`. 
- The `school` parameter is used inside the function body like any other variable. However, without more work on our part, `school` is a **local variable** that does not exist outside the function body. Its value is lost once the function is done.
- `return` statements are used to tell the function to **terminate execution** and (optionally) what to **_pass back_** to the **_caller_**. Yes, it is possible for a function to return nothing. We'll come back to that in a bit.
- Each function call is considered to be independent of the others. The function body gets a fresh instance of the `school` parameter to work with. 
- Finally, **a function definition has no effect unless it is called.** Without the parentheses to indicate that we are calling it, it is just a software object like anything else. It's [Schrödinger's cat](https://en.wikipedia.org/wiki/Schr%C3%B6dinger%27s_cat), waiting for us to open the box to see if it is alive or dead.
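
Two of the points above, that a function is just an object until it is called and that parameters are local, can be checked with a short sketch. The name `cheer` and the flag `school_is_visible` are invented for this illustration, and the sketch assumes no variable named `school` has been defined elsewhere in the notebook.

```python
def go_team(school):
    if school == "Fairfield":
        return "Go Stags!"
    return "Go home!"

# A function is an object, so another name can refer to it.
cheer = go_team                # no parentheses: nothing is called here
message = cheer("Fairfield")   # parentheses: now the body runs
print(message)

# The parameter is local: it does not exist outside the function body.
try:
    print(school)
    school_is_visible = True
except NameError:
    school_is_visible = False
print(school_is_visible)
```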

### Parameters and Arguments
You may have noticed that we seem to be somewhat inconsistent with what we call the inputs to our functions. Function calls supply input _arguments_ but function definitions declare input _parameters_. If both are inputs why have two names? That's because they are not actually the same thing at all. An argument is any Python expression, while a parameter is a local variable that is assigned the _value_ of the argument. It's the difference between `1+1` and `2`. We have to evaluate `1+1` in order to get the value `2`. 
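
A minimal sketch of the distinction: in each call below the argument is an expression, but the parameter only ever receives the finished value. The function `describe` is hypothetical and exists only for this example.

```python
def describe(value):
    # value is the parameter: it holds the result, not the expression
    return "received " + repr(value)

print(describe(1 + 1))             # the argument 1 + 1 is evaluated first
print(describe("Go " + "Stags!"))  # so is this string concatenation
print(describe(len("Stags")))      # and so is this nested function call
```

Inside `describe` there is no way to tell whether the caller wrote `2` or `1 + 1`.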

It is also possible that we might make some parameters optional by supplying **default values**:

In [6]:
def go_team(school = "Fairfield"):
    if school == "Fairfield":
        return "Go Stags!"
    else:
        return "Go home!"

print(go_team("Marist"))
print(go_team("Fairfield"))
print(go_team())

Go home!
Go Stags!
Go Stags!


Here we have set "Fairfield" as the default value for `school`. If no school is specified in the call then the function assumes you meant "Fairfield". 

So far we have only considered functions that have one parameter. When defining a function with multiple parameters, the parameters are declared in a particular order, with parameters having a default value (e.g., `school` above) always listed _after_ those without them. When calling the function, the inputs that are required (because the corresponding parameters lack defaults) are **positional arguments** because they have to be listed in the same order as the parameter declarations. The optional inputs (with default values) are **named arguments**. 

In [7]:
def go_team(sport, school="Fairfield", gender_modifier=""):
    if school != "Fairfield":
        return "Go home!"
    
    gender_modifier_padded = gender_modifier + " " if gender_modifier else ""

    return "Go "+ gender_modifier_padded + "Stags " + sport +"!"  

print(go_team("Tennis", school="Marist"))
print(go_team("Lacrosse", gender_modifier="Lady"))
print(go_team("Tennis","Fairfield"))
print(go_team("Basketball", gender_modifier="Lady", school = "Fairfield"))
print(go_team("Basketball"))

Go home!
Go Lady Stags Lacrosse!
Go Stags Tennis!
Go Lady Stags Basketball!
Go Stags Basketball!


A few observations (which may take a few passes for you to process):

- The function short-circuits itself, returning "Go home!" for any `school` except "Fairfield". Everything after that is Fairfield-specific.
- Named arguments (`school` and `gender_modifier`) can appear in any order as long as they appear after the positional arguments (`sport`).
- We can also treat named arguments like positional arguments (without `name =` syntax) as long as they appear in the order the parameters were declared.
- We can use the parameters like any other variables (and even change their values).
- However, if we omit the `sport` argument then we get an error. Positional arguments are always required. 

In [8]:
print(go_team(gender_modifier="Lady"))

TypeError: go_team() missing 1 required positional argument: 'sport'
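
The ordering rules can also be checked without running any calls. The sketch below uses the built-in `compile()` to ask Python whether a piece of source text is syntactically acceptable; the helper `compiles` is a hypothetical name written for this example.

```python
def compiles(source):
    """Return True if Python accepts the source text, without running it."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# Definitions: parameters with defaults must come last.
print(compiles('def f(sport, school="Fairfield"): pass'))
print(compiles('def f(school="Fairfield", sport): pass'))

# Calls: named arguments must come after positional ones.
print(compiles('go_team("Tennis", school="Marist")'))
print(compiles('go_team(school="Marist", "Tennis")'))
```

The second and fourth lines are rejected with a `SyntaxError` before any code runs, which is different from the `TypeError` above that only appears when the call is actually made.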

## Void Functions
A function that returns a value is said to be _fruitful_. One that doesn't is a **void function**, which some languages call _subroutines_ or _procedures_. A void function is called not to perform a calculation but to carry out an **action** ("side effect") like printing to the screen or writing to a file. 

In [9]:
def print_go_team(sport, school="Fairfield", gender_modifier=""):
    print(go_team(sport,school,gender_modifier))

print_go_team("Tennis")
print("-----")
print(print_go_team("Tennis"))
print("-----")
print(type(print_go_team("Tennis")))

Go Stags Tennis!
-----
Go Stags Tennis!
None
-----
Go Stags Tennis!
<class 'NoneType'>


Notes:
- A void function returns at the bottom of the function body unless it is short-circuited with a `return` (by itself, without any value) somewhere before that.
- Even a void function actually returns a value. It's just that the value is always `None` with data type `NoneType`.  
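
Both notes can be seen in one short sketch. The function `print_cheer` is a hypothetical example, not part of the lesson code.

```python
def print_cheer(school):
    if school != "Fairfield":
        return                  # bare return: leave early, passing back None
    print("Go Stags!")          # side effect; no value is returned

early = print_cheer("Marist")       # prints nothing
normal = print_cheer("Fairfield")   # prints the cheer
print(early is None, normal is None)
```

Whether the function leaves through the bare `return` or runs off the bottom of the body, the caller gets `None` back.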

## Pro Tips
### Sanitizing Inputs
Sanitizing is a defensive programming technique intended to avoid system crashes and security vulnerabilities. Forgetting to **sanitize** inputs before doing a potentially dangerous calculation is a common security bug. (Actually, in a language like C without memory protection, every calculation was dangerous, so I guess that says something about how insecure paleolithic code can be.)

In practice, the easiest way to sanitize function inputs is to check for invalid input parameters at the top of the function, **before** it does anything else. Let's go back to an example from the Defensive Programming section in Lesson 3, except this time packaged as a function:

In [10]:
def warning_ratio_gt_1(x,y):
    if (y/x > 1):
        print("Warning! y/x > 1")

This function crashes when called with `x` equal to 0:

In [11]:
warning_ratio_gt_1(0,1)

ZeroDivisionError: division by zero

We can of course fix that with a boolean guard. 

In [12]:
def warning_ratio_gt_1(x,y):
    if (x !=0 and y/x > 1):
        print("Warning! y/x > 1")
warning_ratio_gt_1(0,1)

Looks good, right? Not exactly. It crashes if `x` or `y` isn't a number:

In [13]:
warning_ratio_gt_1("1",1)

TypeError: unsupported operand type(s) for /: 'int' and 'str'

The solution is to sanitize `x` and `y` with a conversion to floats. 

In [14]:
def warning_ratio_gt_1(x,y):
    denominator = float(x)
    numerator = float(y)
    if (denominator != 0 and numerator/denominator > 1):
        print("Warning! y/x > 1")
warning_ratio_gt_1("1",1)

But then how about this?

In [15]:
print(warning_ratio_gt_1("one",1))

ValueError: could not convert string to float: 'one'

Oops. Now we can't just let this one go with a guard or a quick type conversion. We might either i) **raise** a more meaningful `ValueError` or ii) print out a warning message. Either way, we'll need to catch the error before it crashes the code. For that, we'll use the `try ... except` statement described in section 3.7 of the Py4E book. (Go back and reread it to be sure about how this works.) The following code prints out a warning message instead of raising an error.

In [16]:
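def warning_ratio_gt_1(x,y):
    try:
        denominator = float(x)
        numerator = float(y)
    except (TypeError, ValueError):
        # float() could not convert one of the inputs
        print("Warning: Both arguments must be numbers")
        return
    
    # Keep the guard so that a zero denominator cannot crash the division
    if (denominator != 0 and numerator/denominator > 1):
        print("Warning! y/x > 1")
warning_ratio_gt_1("one",1)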
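
For comparison, here is a sketch of option i), raising a more meaningful `ValueError` rather than printing a warning. The function name `ratio_exceeds_one` and the wording of the message are assumptions made for this example, and unlike the lesson's function it returns `True` or `False` instead of printing.

```python
def ratio_exceeds_one(x, y):
    """Hypothetical variant that returns True/False instead of printing."""
    try:
        denominator = float(x)
        numerator = float(y)
    except (TypeError, ValueError):
        # Replace the low-level message with one that names our inputs
        raise ValueError(f"x and y must be numbers, got x={x!r} and y={y!r}") from None
    return denominator != 0 and numerator / denominator > 1

print(ratio_exceeds_one("1", 2))
try:
    ratio_exceeds_one("one", 1)
except ValueError as error:
    print("Caught:", error)
```

Raising leaves the decision about what to do next with the caller, who can catch the error or let it stop the program.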



### Unit Testing
While testing is often seen as more engineering than data science, there are times when we will want to be 100% sure our code is free of any bugs that might throw off our analysis. In practice the more critical and potentially complex a calculation is, the more important it is to encapsulate it into a function that we can test with lots of different inputs. That is the essence of test-driven development with **unit tests**. 

A unit test is a call for which we know the correct response. If the response does not match what we expect then the test fails. The set of all tests for the code being tested is called the **test suite**. Python has built-in support for unit testing and, with a little help, so does Jupyter. The code below will install and load the ipython-unittest extension (plugin) for Jupyter. Run it to see what it does. 

In [None]:
# Run this cell to install the ipython-unittest library. You can clear the cell output after running. 
!pip install ipython_unittest
!pip install jupyter_dojo
%load_ext ipython_unittest

Let's try this out on the code we just debugged, starting with the first buggy version.

In [18]:
def warning_ratio_gt_1(x,y):
    if y/x > 1:
        print("Warning! y/x > 1")

The test suite is in the cell below. Run it to see the function fail. 

In [19]:
%%unittest
"division by 0"
warning_ratio_gt_1(0,1) == True 
"not a number"
warning_ratio_gt_1("0","1") == True



Fail

EE
ERROR: test_division_by_0 (__main__.JupyterTest)
division by 0
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Cell Tests", line 2, in test_division_by_0
    if y/x > 1:
ZeroDivisionError: division by zero

ERROR: test_not_a_number (__main__.JupyterTest)
not a number
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Cell Tests", line 4, in test_not_a_number
    if y/x > 1:
TypeError: unsupported operand type(s) for /: 'str' and 'str'

----------------------------------------------------------------------
Ran 2 tests in 0.002s

FAILED (errors=2)


<unittest.runner.TextTestResult run=2 errors=2 failures=0>

The function failed as expected. **More importantly, we know how and why it failed. The tests told us!** 

Now let's add in our boolean guard `x != 0 and ...` to the conditional, add sanitizing code to the front of the body, and rerun the tests. 

In [20]:
def warning_ratio_gt_1(x,y):
    try:
        x = float(x)
        y = float(y)
    except (TypeError, ValueError):
        print("Warning: Both arguments must be numbers")
        return
    
    if x!= 0 and y/x > 1:
        print("Warning! y/x > 1")

In [21]:
%%unittest
"division by 0"
warning_ratio_gt_1(0,1) == True 
"not a number"
warning_ratio_gt_1("0","1") == True



Success

..
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK


<unittest.runner.TextTestResult run=2 errors=0 failures=0>

Voila. We now know that our code works for the conditions that could crash it. If we ever change the code we can rerun the tests to be sure it works. Or, if we ever get a crash we didn't expect, we can add a new test and debug the code until it passes again. It's like basic hygiene. It's always a good idea to wash your hands before dinner. It likely won't kill you if you don't, but ... you'd rather not find out the hard way, right? 
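
The same idea can be sketched with Python's built-in `unittest` module, which needs no extension. The function `ratio_gt_1` below is a hypothetical fruitful variant of the lesson's function (it returns `True` or `False` rather than printing), so that each test has a value to check.

```python
import unittest

def ratio_gt_1(x, y):
    """Hypothetical variant: True when y/x > 1, False otherwise."""
    try:
        x = float(x)
        y = float(y)
    except (TypeError, ValueError):
        return False
    return x != 0 and y / x > 1

class RatioTests(unittest.TestCase):
    def test_division_by_zero(self):
        self.assertFalse(ratio_gt_1(0, 1))

    def test_not_a_number(self):
        self.assertFalse(ratio_gt_1("one", 1))

    def test_ratio_above_one(self):
        self.assertTrue(ratio_gt_1(2, 3))

# Build and run the suite by hand so this also works inside a notebook cell
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RatioTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Each `test_` method is one unit test, and the class as a whole plays the role of the test suite.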

## Failing Forward
We'll close this lesson with a general comment about the nature of programming. Programming is one of those weird professions where it is best to learn from mistakes. **Nothing ever works at first.** Even if it seems to work, always assume that there are bugs in the code that you haven't discovered yet. 

In Lesson 1 we learned about the Edit/Run/Test cycle:
1. Edit some code. 
2. Run it.
3. Test to see that it worked. If not, identify the mistake. 
4. Go back to step 1.

In the hands of a novice, the work mostly happens in step 1. However, the learning (i.e., the thing that makes you a better programmer) happens in **step 3**. You discover that "whenever we do X, this bad thing happens so let's try something else" and that's when you get creative and smart. Until then it's just pounding keys on the keyboard. You don't **know** if your code works or how it is going to fail. So, why not get through steps 1 and 2 as quickly as you can, with **the least amount of code you can possibly test**, and then linger a bit in step 3 to see what happened and plan for step 1 again? You'll ultimately go through more cycles but it will eventually take less time and be more likely to succeed. 

All of this leads to a general strategy that we might call **Failing Forward**:
- Instead of trying to write your code all at once, **write code in tiny testable chunks** that will eventually build up into working code that does what you need it to do.
- Use the tests you create as the model for all the things you've learned about how things can fail. **When a new kind of failure happens, embrace it** just as much as you would a nifty new coding tool or library.
- Not all code requires an automated test suite. However, **if you bother to write unit tests, write the tests first**. Your first code after that should fail every test. Then the tests become your TODO list going forward. 
- When writing code that is hard, **start with whatever is most likely to fail**. After all, if you can't solve that then the rest of the code is irrelevant anyway. 
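
The "write the tests first" advice can be sketched in plain Python. Everything here is hypothetical and invented for this illustration: the `safe_ratio` function, its stub, the `cases` list, and the `count_failures` helper.

```python
# Step 1: write the tests first. Each case pairs the arguments with the expected result.
cases = [
    ((2, 4), 2.0),        # ordinary ratio y/x
    ((0, 1), None),       # division by zero should be refused
    (("one", 1), None),   # non-numeric input should be refused
]

def count_failures(func, cases):
    """Run every case and count how many fail (a crash counts as a failure)."""
    failures = 0
    for args, expected in cases:
        try:
            passed = func(*args) == expected
        except Exception:
            passed = False
        if not passed:
            failures += 1
    return failures

# Step 2: a stub that does nothing useful yet fails every test.
def safe_ratio_stub(x, y):
    raise NotImplementedError

# Step 3: write just enough code to make the tests pass.
def safe_ratio(x, y):
    try:
        x = float(x)
        y = float(y)
    except (TypeError, ValueError):
        return None
    if x == 0:
        return None
    return y / x

print(count_failures(safe_ratio_stub, cases))  # every case fails at first
print(count_failures(safe_ratio, cases))       # none fail once the code is written
```

The failure count is the TODO list: work is done when it reaches zero, and any new kind of crash becomes a new entry in `cases`.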



## Exercise

Repackage your `waist2hip_ratio` calculation from Lesson 3 as a function with the following requirements:
- Name the function `w2h_ratio` with three parameters: `waist_inches`, `hip_inches`, `gender`
- The function should return a string with the pattern "For a G with waist W and hip H the w2h ratio is R with shape S.", where G, W, H, R, and S are placeholders for the actual values. Match the wording and punctuation exactly, because the unit tests compare the whole string. (Be sure to use Pythonic names for any variables, however.)
- Round the ratio R to 2 decimal places. 
- The shape S is either "Apple" or "Pear".
- Do not use `input()` to get data. Let that be passed into the function via arguments. 
- `waist_inches` and `hip_inches` can be given as anything that can be converted to a `float`.
- You will need to sanitize your inputs before doing the calculations. The waist and hip measurements should be in inches. The gender needs to be either "M" or "F". 
- If `waist_inches` or `hip_inches` is invalid (for example non-numeric, zero, or negative, as in the unit tests) then return the string "w2h_ratio: Invalid measurement(s)". You may want a `try ... except` statement to prevent runtime errors. 
- If the gender is not "M" or "F" then return the string "w2h_ratio: Unknown gender"

Run (and rerun and rerun and ...) the **unit tests** below to check your work. **Do not modify the tests.** Instead, craft your code to pass them all. The `%%unittest` output will indicate which tests failed.  

In [None]:
# YOUR CODE HERE

**Unit Tests**

In [None]:
# Run this cell once to install the ipython-unittest library for this notebook session. You can clear the cell output after running. 
!pip install ipython_unittest
!pip install jupyter_dojo
%load_ext ipython_unittest

In [None]:
%%unittest
"function signature"
assert w2h_ratio(waist_inches=36,hip_inches=44,gender="M") == 'For a M with waist 36.0 and hip 44.0 the w2h ratio is 0.82 with shape Pear.'
"divide by zero"
assert w2h_ratio(36,0,"M") == "w2h_ratio: Invalid measurement(s)"
"negative waist"
assert w2h_ratio(-36,44,"M") == "w2h_ratio: Invalid measurement(s)"
"negative hip"
assert w2h_ratio(36,-44,"M") == "w2h_ratio: Invalid measurement(s)"
"non-numerical waist"
assert w2h_ratio("36 and a half",44,"M") == "w2h_ratio: Invalid measurement(s)"
"non-numerical hip"
assert w2h_ratio("36","44 and a half","M") == "w2h_ratio: Invalid measurement(s)"
"numerical strings"
assert w2h_ratio("36","44","M") == 'For a M with waist 36.0 and hip 44.0 the w2h ratio is 0.82 with shape Pear.'
"male Apple shape"
assert w2h_ratio("36","38","M") == 'For a M with waist 36.0 and hip 38.0 the w2h ratio is 0.95 with shape Apple.'
"male Pear shape"
assert w2h_ratio("36","42","M") == 'For a M with waist 36.0 and hip 42.0 the w2h ratio is 0.86 with shape Pear.'
"female Apple shape"
assert w2h_ratio("33","36","F") == 'For a F with waist 33.0 and hip 36.0 the w2h ratio is 0.92 with shape Apple.'
"female Pear shape"
assert w2h_ratio("33","44","F") == 'For a F with waist 33.0 and hip 44.0 the w2h ratio is 0.75 with shape Pear.'
"unknown gender"
assert w2h_ratio(36,44,"B") == "w2h_ratio: Unknown gender"