# Functions and Apply

Welcome to lab 11! This week, we'll learn about functions, and how we can combine them with tables using the `apply` method.

First, set up the tests and imports by running the cell below.

In [None]:
import numpy as np
from datascience import *

# These lines set up graphing capabilities.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', FutureWarning)
Table.interactive_plots()

import otter
grader = otter.Notebook()

## 1. Functions and CEO Incomes

Let's start with a real data analysis task.  We'll look at the 2015 compensation of CEOs at the 100 largest companies in California.  The data were compiled for a Los Angeles Times analysis [here](http://spreadsheets.latimes.com/california-ceo-compensation/), and ultimately came from [filings](https://www.sec.gov/answers/proxyhtf.htm) mandated by the SEC from all publicly-traded companies.  Two companies have two CEOs, so there are 102 CEOs in the dataset.

We've copied the data in raw form from the LA Times page into a file called `raw_compensation.csv`.  (The page notes that all dollar amounts are in millions of dollars.)

In [None]:
raw_compensation = Table.read_table('raw_compensation.csv')
raw_compensation

**Question 1.** We want to compute the average of the CEOs' pay. Try running the cell below.

In [None]:
np.average(raw_compensation.column("Total Pay"))

You should see an error. Let's examine why this error occurred by looking at the values in the "Total Pay" column. Check out the first value in the column. What data type is it? Assign the value's data type to `total_pay_type`. If you're not sure, you may want to use the `type` function.

In [None]:
# Recall that type(1) == int, type("a") == str, et cetera

total_pay_type = ...
total_pay_type

In [None]:
grader.check('q1_1')

**Question 2.** You should have found that the values in the "Total Pay" column are strings (text). It doesn't make sense to take the average of text values, so we need to convert them to numbers first.

So, let's do this first with an example. Extract the first value in the "Total Pay" column.  It should be Mark Hurd's pay in 2015, in *millions* of dollars.  Call it `mark_hurd_pay_string`.

In [None]:
mark_hurd_pay_string = ...
mark_hurd_pay_string

In [None]:
grader.check('q1_2')

**Question 3.** Convert `mark_hurd_pay_string` to a number in *dollars*.  

The string method `strip` will be useful for removing the dollar sign; it returns a copy of a string with a specified character removed from the start and end.  For example, the value of `"100%".strip("%")` is the string `"100"`.

You'll also need the function `float`, which converts a string that looks like a number to an actual number, a float. Lastly, remember that the **answer should be in dollars, not millions of dollars,** so you should do some arithmetic.

*Hint:* Remember your data types! They matter in converting values like this. What is the order of steps we should take?
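As a small illustration of how `strip` and `float` fit together, here is a sketch that turns a percent string into a proportion. The value `"85%"` and the names `percent_string`, `digits_only`, and `proportion` are made up for this example and are not part of the lab's data.

```python
# Hypothetical example value; not taken from the CEO dataset.
percent_string = "85%"

# strip returns a new string with the given character removed from both ends.
digits_only = percent_string.strip("%")   # still a string

# float turns the numeric-looking string into a number we can do arithmetic on.
proportion = float(digits_only) / 100

print(type(digits_only), type(proportion))
print(proportion)
```

Notice that the arithmetic only works after the conversion to a float; dividing the string itself by 100 would raise an error.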

In [None]:
mark_hurd_pay = ...
mark_hurd_pay

In [None]:
grader.check('q1_3')

To compute the average pay, we need to do this process of converting a string of millions of dollars into a float in dollars for every CEO.  But that looks like it would involve copying this code 102 times.

This is where functions come in.  First, we'll define a new function, giving a name to the expression that converts "total pay" strings to numeric values.  Later in this lab, we'll see the payoff: we can call that function on every pay string in the dataset at once.

**Question 4.** **Copy the expression** you used to compute `mark_hurd_pay` as the `return` expression of the function below, but replace the specific `mark_hurd_pay_string` with the generic `pay_string` name specified in the first line of the `def` statement. 

When defining a function, the argument (in this case `pay_string`) is a **generic variable**. When we call the function, it's like setting that variable equal to the value in the function call.

It's similar to this process: f(x) = x^2

f(2) = f(x=2) = 2^2 
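The same idea can be sketched in Python. The function name `square` is only an illustration and is not used elsewhere in this lab.

```python
def square(x):
    """Returns x squared; x is the generic variable."""
    return x ** 2

# Calling square(2) behaves as if x were set to 2 inside the body.
print(square(2))
print(square(x=2))
```

Both calls produce the same value, because in each case `x` stands for 2 while the body runs.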

In [None]:
def convert_pay_string_to_number(pay_string):
    """Converts a pay string like '$100' (in millions of dollars) to a number (in dollars)."""
    return ...

In [None]:
grader.check('q1_4')

Running that cell doesn't convert any particular pay string. Instead, it creates a function called `convert_pay_string_to_number` that can convert any string with the right format to a number representing dollars. That function, `convert_pay_string_to_number`, now has a value in your notebook, similar to any other function we've seen before like `max`, `sum`, or `min`.

We can call our function just like we call the built-in functions we've seen. It takes one argument, a string, and it returns a number.

In [None]:
convert_pay_string_to_number('$42')

In [None]:
convert_pay_string_to_number(mark_hurd_pay_string)

In [None]:
# We can also compute Safra Catz's pay in the same way, but we'll need to get her pay first. 
safra_catz_pay_string = raw_compensation.where("Name", are.containing("Safra")).column("Total Pay").item(0)
convert_pay_string_to_number(safra_catz_pay_string)

What have we gained?  Well, without the function, we'd have to copy the code that converted Mark Hurd's pay string and swap in another CEO's pay string each time we wanted to do a conversion. Now, instead, we can just call a function whose name says exactly what it's doing.

Soon, we'll see how to apply this function to every pay string in a single expression. First, let's write some more functions.

## 2. Defining functions

Let's write a very simple function that converts a proportion to a percentage by multiplying it by 100.  For example, the value of `to_percentage(.5)` should be the number 50.  (No percent sign.)

A function definition has a few parts.

##### `def`
It always starts with `def` (short for **def**ine):

    def

##### Name
Next comes the name of the function.  Let's call our function `to_percentage`.
    
    def to_percentage

##### Signature
Next comes something called the *signature* of the function.  This tells Python how many arguments your function should have, and what names you'll use to refer to those arguments in the function's code.  `to_percentage` should take one argument, and we'll call that argument `proportion` since it should be a proportion. This argument, when used in the body of the function, is going to be called a *generic variable*, since calling the function with an argument will use the value of the argument as the value of the generic variable.

    def to_percentage(proportion)

We put a colon after the signature to tell Python it's over.

    def to_percentage(proportion):

##### Documentation
Functions can do complicated things, so you should write an explanation of what your function does.  For small functions, this is less important, but it's a good habit to learn from the start.  Conventionally, Python functions are documented by writing a triple-quoted string:

    def to_percentage(proportion):
        """Converts a proportion to a percentage."""
    
    
##### Body
Now we start writing code that runs when the function is called.  This is called the *body* of the function.  We can write anything we could write anywhere else.  First, let's give a name to the number we multiply a proportion by to get a percentage.

    def to_percentage(proportion):
        """Converts a proportion to a percentage."""
        factor = 100

##### `return`
The special instruction `return` in a function's body tells Python to make the value of the function call equal to whatever comes right after `return`.  We want the value of `to_percentage(.5)` to be the proportion .5 times the factor 100, so we write:

    def to_percentage(proportion):
        """Converts a proportion to a percentage."""
        factor = 100
        return proportion * factor

**Question 1.** Define `to_percentage` in the cell below.  Call your function to convert the proportion .2 to a percentage.  Name that percentage `twenty_percent`. Do **not** just hard code `twenty_percent` to the corresponding percentage for the proportion 0.2.

In [None]:
# define the to_percentage function below, replacing the ellipses:
def ...(...):
    """..."""
    factor = ...
    return ...

twenty_percent = ...
twenty_percent

In [None]:
grader.check('q2_1')

Like the built-in functions, you can use named values as arguments to your function.

**Question 2.** Use `to_percentage` again to convert the proportion named `a_proportion` (defined below) to a percentage called `a_percentage`.

*Note:* You don't need to define `to_percentage` again!  Just like other named things, functions stick around after you define them.

In [None]:
a_proportion = 2**(.5) / 2
a_percentage = ...
a_percentage

In [None]:
grader.check('q2_2')

Here's something important about functions: the names assigned within a function body are only accessible within the function body. Once the function has returned, those names are gone.  So even though you defined `factor = 100` inside `to_percentage` above and then called `to_percentage`, you cannot refer to `factor` anywhere except inside the body of `to_percentage`:

In [None]:
# You should see an error when you run this.  (If you don't, you might
# have defined factor somewhere above.)
factor

As we've seen with the built-in functions, functions can also take strings (or arrays, or tables) as arguments, and they can return those things, too.

**Question 3.** Define a function called `disemvowel`.  It should take a single string as its argument.  (You can call that argument whatever you want.)  It should return a copy of that string, but with all the characters that are vowels removed.  (In English, the vowels are the characters "a", "e", "i", "o", and "u".)

For example: "go bears" would become "g brs".

*Hint:* To remove all the "a"s from a string, you can use `that_string.replace("a", "")`.  And you can call `replace` multiple times by chaining method calls, as we have done with tables in the past.
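Here is a sketch of chaining `replace` calls on an unrelated task: stripping punctuation from a phone number. The string `phone` is a made-up example value.

```python
# Hypothetical example string, chosen so it doesn't overlap with the exercise.
phone = "(510) 555-0100"

# Each replace returns a new string, so the next replace can be
# called directly on its result.
digits = phone.replace("(", "").replace(")", "").replace("-", "").replace(" ", "")
print(digits)
```

The original string is never modified; each call in the chain produces a fresh string.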

In [None]:
def disemvowel(a_string):
    ...
    ...

# An example call to your function.  (It's often helpful to run
# an example call from time to time while you're writing a function,
# to see how it currently works.)
disemvowel("Can you read this without vowels?")

In [None]:
grader.check('q2_3')

##### Calls on calls on calls
Just as you write a series of lines to build up a complex computation, it's useful to define a series of small functions that build on each other.  Since you can write any code inside a function's body, you can call other functions you've written.

If a function is like a recipe, defining a function in terms of other functions is like having a recipe for cake telling you to follow another recipe to make the frosting, and another to make the sprinkles.  This makes the cake recipe shorter and clearer, and it avoids having a bunch of duplicated frosting recipes.  It's a foundation of productive programming.
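As a sketch of one function built on another, consider temperature conversion. The names `to_fahrenheit` and `fahrenheit_range` are invented for this illustration.

```python
def to_fahrenheit(celsius):
    """Converts a temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32

def fahrenheit_range(low_celsius, high_celsius):
    """The gap between two Celsius temperatures, measured in Fahrenheit."""
    # Reuse to_fahrenheit instead of repeating the conversion formula.
    return to_fahrenheit(high_celsius) - to_fahrenheit(low_celsius)

print(fahrenheit_range(0, 100))
```

If the conversion formula ever needed fixing, it would only have to change in one place.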

For example, suppose you want to count the number of characters *that aren't vowels* in a piece of text.  One way to do that is to remove all the vowels and count the size of the remaining string.

**Question 4.** Write a function called `num_non_vowels`.  It should take a string as its argument and return a number.  The number should be the number of characters in the argument string that aren't vowels.

*Hint:* The function `len` takes a string as its argument and returns the number of characters in it.

In [None]:
def num_non_vowels(a_string):
    """The number of characters in a string, minus the vowels."""
    ...

In [None]:
grader.check('q2_4')

**One note:** Functions can also encapsulate code that *does things* rather than just computing values. For example, if you call `print` inside a function, and then call that function, something will get printed. You don't necessarily need a `return` statement at the end of your function, but if you do not include one, the function call evaluates to `None`, and any value computed inside the function can't be used afterward.
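A short sketch of the difference between printing and returning follows. The function names `print_double` and `return_double` are made up for this example.

```python
def print_double(x):
    """Displays twice x, but has no return statement."""
    print(x * 2)

def return_double(x):
    """Returns twice x without displaying anything."""
    return x * 2

printed = print_double(5)    # something is displayed, but the call's value is None
returned = return_double(5)  # nothing is displayed, but the value is kept

print(printed)
print(returned)
```

Only `returned` holds a number you can keep computing with; `printed` is `None`.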

## 3. `apply`ing functions

Defining a function is a lot like giving a name to a value with `=`.  In fact, a function is a value just like the number 1 or the text "the"!

For example, we can make a new name for the built-in function `max` if we want:

In [None]:
our_name_for_max = max
our_name_for_max(2, 6)

The old name for `max` is still around:

In [None]:
max(2, 6)

Try just writing `max` or `our_name_for_max` (or the name of any other function) in a cell, and run that cell.  Python will print out a (very brief) description of the function.

In [None]:
max

Why is this useful?  Since functions are just values, it's possible to pass them as arguments to other functions.  Here's a simple but not-so-practical example: we can make an array of functions.

In [None]:
make_array(max, np.average, are.equal_to)

Here's a more practical example: the table method `apply`.

`apply` calls a function many times, once on *each* element in a column of a table.  It produces an array of the results.  Here we use `apply` to convert every CEO's pay to a number, using the function you defined:

In [None]:
raw_compensation.apply(convert_pay_string_to_number, "Total Pay")

Here's an illustration of what that did:

<img src="apply.png"/>

Note that we didn't write something like `convert_pay_string_to_number()` or `convert_pay_string_to_number("Total Pay")`.  The job of `apply` is to call the function we give it, so instead of calling `convert_pay_string_to_number` ourselves, we just write its name as an argument to `apply`.
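To make the idea of passing a function by name concrete, here is a plain-Python sketch. The helper `apply_to_each` is a hypothetical stand-in written for this illustration; it is not how the `datascience` library implements `apply`, and it works on a list rather than a table column.

```python
def apply_to_each(function, values):
    """Hypothetical stand-in for apply: calls function once per value."""
    results = []
    for value in values:
        # The call happens here, inside the helper, not where we pass the name.
        results.append(function(value))
    return results

def exclaim(word):
    """Adds an exclamation mark to a word."""
    return word + "!"

words = ["go", "bears"]
# We pass the name exclaim, with no parentheses after it.
shouted = apply_to_each(exclaim, words)
print(shouted)
```

Writing `apply_to_each(exclaim(), words)` instead would try to call `exclaim` immediately with no argument and fail before the helper ever ran.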

**Question 2.** Using `apply` and your function `convert_pay_string_to_number`, convert all of the pay strings in the "Total Pay" column of `raw_compensation` to an array of floats in number of dollars. Assign this value to `converted_pay_array`, and then make a table called `compensation` that's a copy of `raw_compensation` with one more column called "Total Pay (\$)".

*Note:* `tbl.with_column` will replace an existing column in the table if the new column has the same label as the existing column.

In [None]:
converted_pay_array = ...
compensation = raw_compensation.with_column(
    "Total Pay ($)",
    ...
)
compensation

In [None]:
grader.check('q3_2')

What makes `apply` so convenient is that it handles the iteration for us. Without it, we would need to write a `for` loop, which we've seen before but don't need here. The loop below gives the same output as `apply` above, but takes more code:

In [None]:
# Start with an empty array and append one converted value per pay string.
converted_pay_array_loop = make_array()
for pay_string in raw_compensation.column("Total Pay"):
    value = convert_pay_string_to_number(pay_string)
    converted_pay_array_loop = np.append(converted_pay_array_loop, value)
    
converted_pay_array_loop

Now that we have the pay in numbers, we can compute things about them.

**Question 3.** Compute the average total pay of the CEOs in the dataset.

In [None]:
average_total_pay = ...
average_total_pay

In [None]:
grader.check('q3_3')

## Submission

You're done with this lab!

To submit this notebook, please download it as a .ipynb file and submit it to Gradescope. You can do so by navigating to the toolbar at the top of this page and clicking File > Download as... > Notebook (.ipynb). Then, go to our class's Gradescope page [here](https://www.gradescope.com/courses/136698) and upload your file under "Lab 11."

To check your work for all autograded questions, run the cell below. 

It's fine to submit multiple times, but we will only grade the final notebook you submit for each assignment. Make sure you pass all tests to receive credit.

In [None]:
# For your convenience, you can run this cell to run all the tests at once!
grader.check_all()