# Functions

In [None]:
from datascience import *
from cs104 import *
import numpy as np

%matplotlib inline

## 1. Overlaid Histograms

Here's a dataset of adults and the heights of their parents. 

In [None]:
heights_original = Table.read_table('data/galton.csv')
heights_original.show(5)

Let's focus on the female adult children first.

In [None]:
heights = heights_original.where('gender', 'female').select('father', 'mother', 'childHeight')
heights = heights.relabeled('childHeight', 'daughter')
heights

In [None]:
plot = heights.hist('daughter')
plot.set_xlabel("Daughter height (inches)")

What are some common heights we can see from the histogram above? 

A concentration around 63 inches (5'3").

In [None]:
plot = heights.hist('mother')
plot.set_xlabel("Mother height (inches)")

Recall that we can use *overlaid histograms* to compare the distributions of two variables that have the same units.  (*Note:* We've seen how to group by a column containing a categorical variable.  It's also possible to just call `hist` with multiple column names to produce an overlaid histogram.)

In [None]:
plot = heights.hist('daughter', 'mother')
plot.set_xlabel('Height (inches)')

In [None]:
plot = heights.hist()
plot.set_xlabel('Height (inches)')

We can specify the bins ourselves to control their width and number.

In [None]:
our_bins = np.arange(55, 81, 1)
print("bins=", our_bins)
plot = heights.hist(bins=our_bins)
plot.set_xlabel('Height (inches)')
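
To see what `np.arange(55, 81, 1)` hands to `hist`, here is a minimal sketch. It assumes, as with NumPy's own `np.histogram`, that the array lists bin *edges*, so 26 edges describe 25 one-inch bins. The names `edges` and `num_bins` are illustrative only.

```python
import numpy as np

# Bin edges from 55 up to (but not including) 81, in steps of 1 inch.
edges = np.arange(55, 81, 1)

# N edges delimit N - 1 bins: 55 to 56, 56 to 57, ..., 79 to 80.
num_bins = len(edges) - 1

print("first edge:", edges[0], "last edge:", edges[-1], "bins:", num_bins)
```

Note that the stop value 81 is excluded, so the last edge is 80.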

Let's now take a different slice of the original data.

In [None]:
heights_original.show(3)

In [None]:
heights = heights_original.select('father', 'mother', 'childHeight').relabeled('childHeight', 'child')
heights

In [None]:
plot = heights.hist(bins=np.arange(55,80,2))
plot.set_xlabel('Height (inches)')

**Question:** Why is the maximum height of a bar for child smaller than that for mother or father?

**A:** The child heights have a larger spread because they include both male and female children, so the same total area is distributed over more bins.
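
One way to check this reasoning: when a histogram is drawn on the density scale (assumed here to be what `hist` does by default), the bar areas always total 1, so a wider spread forces shorter bars. The sketch below uses NumPy's `np.histogram` with `density=True` on two small made-up samples, not the Galton data; `narrow` and `wide` are illustrative names.

```python
import numpy as np

# Made-up heights in inches, for illustration only (not the Galton data).
narrow = np.array([63, 63, 64, 64, 64, 64, 65, 65])   # one tight cluster
wide = np.array([61, 63, 64, 64, 65, 67, 69, 70])     # same size, more spread out

edges = np.arange(60, 74, 2)   # 2-inch bins with edges 60, 62, ..., 72
width = 2

# density=True rescales the counts so that the bar areas sum to 1.
narrow_density, _ = np.histogram(narrow, bins=edges, density=True)
wide_density, _ = np.histogram(wide, bins=edges, density=True)

print("total area:", (narrow_density * width).sum(), (wide_density * width).sum())
print("tallest bar:", narrow_density.max(), wide_density.max())
```

Both samples have the same number of values and the same total area, yet the tallest bar of the spread-out sample is lower.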

## 2. Functions

We use functions all the time.  They do computation for us without us describing every single step.  That saves us time, since we don't have to write the code, and lets us perform those operations without even caring how they are implemented.  For example, we have an idea of how we'd take the maximum of a list of numbers, but we can just call Python's `max` function without explicitly describing how it works.

Can we do the same for other computations?  Yes!  It's a core principle of programming: define functions for tasks you do often so you never have to repeat writing the code.

### Defining and calling our own functions

In [None]:
def double(x):
    """ Double x """
    return 2*x

In [None]:
double(5)

In [None]:
double(double(5))

Scoping: a parameter is only "visible" inside the function definition.

In [None]:
x #should throw an error

In [None]:
double(5/4)

In [None]:
y = 5
double(y/4)

In [None]:
x # we still can't access the parameter

In [None]:
x = 1.5
double(x)

In [None]:
x
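
The cells above can be summarized in one sketch: a parameter named `x` exists only during a call, and a global variable that happens to share its name is left untouched. The function `add_one` is hypothetical, chosen just for this illustration.

```python
def add_one(x):
    """Return x + 1, reassigning the parameter along the way."""
    x = x + 1      # rebinds the parameter x, which is local to this call
    return x

x = 10                 # a global variable that happens to share the name
result = add_one(x)    # inside the call, the parameter starts out as 10

print(result)   # value computed inside the function
print(x)        # the global x is unchanged
```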

What happens if I double an array?

In [None]:
double(make_array(3,4,5))

What happens if I double a string?

In [None]:
double("string")

In [None]:
5*"string"
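
`double` never checks what kind of value it receives; it simply evaluates `2*x`, and the meaning of `*` depends on the type of `x`. A short sketch, using a NumPy array in place of `make_array` (which is assumed to return one):

```python
import numpy as np

def double(x):
    """Double x."""
    return 2 * x

number = double(5)                     # arithmetic: 10
text = double("ha")                    # string repetition: "haha"
array = double(np.array([3, 4, 5]))    # elementwise arithmetic on each entry

print(number, text, array)
```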

### More functions

**Think-pair-share**: 
1. What is this function below doing? 
2. How would you rewrite this function (the name of the function, the docstring, the parameters) in order to make it more clear? 

In [None]:
def f(s):
    total = sum(s)
    return np.round(s / total * 100, 2)

**A: Always** use meaningful names.

In [None]:
def percents(counts):
    """Convert the counts to percents out of the total."""
    total = sum(counts)
    return np.round(counts / total * 100, 2)

Note that `total` is a local variable in our definition.

In [None]:
f(make_array(2, 4, 8, 6, 10))

In [None]:
percents(make_array(2, 4, 8, 6, 10))

Remember scoping here too!

In [None]:
total
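
Local variables follow the same rule as parameters. The sketch below, which assumes no global named `total` has been defined, catches the resulting `NameError` instead of letting the cell fail; `total_is_visible` is an illustrative name.

```python
import numpy as np

def percents(counts):
    """Convert the counts to percents out of the total."""
    total = sum(counts)    # total is created fresh on each call
    return np.round(counts / total * 100, 2)

result = percents(np.array([2, 4, 8, 6, 10]))

try:
    total                  # no global named total was ever defined
    total_is_visible = True
except NameError:
    total_is_visible = False

print(result, total_is_visible)
```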

### Accessing *global* variables

*Global* variables are defined outside any function -- we've been using them all along.  Inside your functions, you can access global variables that you have defined elsewhere in the notebook.  **Always** define globals before the functions that use them to avoid confusion and surprising results when you rerun your whole notebook!

In [None]:
heights.show(3)

In [None]:
def children_under_height(height):
    """Proportion of children in our data set that are no taller than the given height."""    
    return heights.where("child", are.below_or_equal_to(height)).num_rows / heights.num_rows

In [None]:
children_under_height(65)
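
The order of definitions matters because a function looks up a global name each time it is *called*, not when it is defined. A minimal sketch, with a plain list standing in for the `heights` table; `child_heights` and `proportion_at_most` are hypothetical names and the values are made up.

```python
# A plain list standing in for the 'child' column (made-up values).
child_heights = [60, 64, 66, 70]

def proportion_at_most(height):
    """Proportion of child_heights that are no taller than height."""
    count = sum(1 for h in child_heights if h <= height)
    return count / len(child_heights)

before = proportion_at_most(65)      # 2 of 4 values are <= 65

# Reassigning the global changes what the function sees on its next call.
child_heights = [60, 61, 62, 70]
after = proportion_at_most(65)       # now 3 of 4 values are <= 65

print(before, after)
```

The same call returns different answers depending on what the global holds at that moment, which is why rerunning cells out of order can be surprising.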

### Functions with more than one parameter

We can define functions with more than one parameter.

In [None]:
#original function
def percents(counts):
    """Convert the counts to percents out of the total."""
    total = sum(counts)
    return np.round(counts / total * 100, 2)

In [None]:
#function with two parameters
def percents_two_params(counts, decimals_to_round):
    """Convert the counts to percents out of the total."""
    total = sum(counts)
    return np.round(counts / total * 100, decimals_to_round)

In [None]:
counts = make_array(2, 4, 8, 6, 10)

In [None]:
percents(counts)

In [None]:
percents_two_params(counts, 2)

In [None]:
percents_two_params(counts, 1)

In [None]:
percents_two_params(counts, 0)

In [None]:
percents_two_params(counts, 3)

Let's write a function that, given the unique id of an observation (a row), gives us the value of a particular column.

In [None]:
heights_id = heights.with_columns('id', np.arange(heights.num_rows))
heights_id.show(5)

In [None]:
def find_a_value(table, observation_id, column_name):
    """Value in column_name for the row of table whose 'id' equals observation_id."""
    return table.where('id', are.equal_to(observation_id)).column(column_name).item(0)

In [None]:
find_a_value(heights_id, 2, 'mother')

In [None]:
find_a_value(heights_id, 200, 'mother')

Great! Now we can keep using a function *we wrote* throughout this class to speed up our work, in the same way we use functions built into Python, e.g. `max`, or methods from the datascience package, e.g. `.take()`.

## 3. Apply

There are times we want to perform mathematical operations on columns of a table but can't use array broadcasting...

In [None]:
min(make_array(70, 73, 69), 72) #should be an error

In [None]:
def cut_off_at_72(x):
    """The smaller of x and 72"""
    return min(x, 72)

In [None]:
cut_off_at_72(62)

In [None]:
cut_off_at_72(72)

In [None]:
cut_off_at_72(78)

The table `apply` method applies a function to every entry in a column.

In [None]:
heights

In [None]:
heights.hist('child')

In [None]:
cut_off = heights.apply(cut_off_at_72, 'child')
height2 = heights.with_columns('child', cut_off)

In [None]:
height2.hist('child')
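
Conceptually, `apply` calls the function once per entry and collects the results in an array. The sketch below imitates that behavior with a list comprehension on made-up values rather than calling the datascience library itself; `child` and `one_at_a_time` are illustrative names.

```python
import numpy as np

def cut_off_at_72(x):
    """The smaller of x and 72"""
    return min(x, 72)

child = np.array([70, 73, 69, 78])    # made-up heights in inches

# One call per entry, which is what apply is assumed to do for us.
one_at_a_time = np.array([cut_off_at_72(h) for h in child])

print(one_at_a_time)
```

Each call receives a single number, so the built-in `min` works even though it fails on the whole array.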

Just as we did with variables, we can look at functions and their types. In Python, `help` prints out the docstring of a function.

In [None]:
cut_off_at_72

In [None]:
type(cut_off_at_72)

In [None]:
help(cut_off_at_72)
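
The text that `help` displays comes from the docstring, which Python stores on the function object in its `__doc__` attribute. A quick sketch; the variable names `name`, `kind`, and `doc` are illustrative.

```python
def cut_off_at_72(x):
    """The smaller of x and 72"""
    return min(x, 72)

# A function is a value like any other: it has a name, a type, and a docstring.
name = cut_off_at_72.__name__
kind = type(cut_off_at_72).__name__
doc = cut_off_at_72.__doc__

print(name, kind, doc)
```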

### Apply with multiple columns

In [None]:
heights.show(6)

In [None]:
parent_max = heights.apply(max, 'mother', 'father')
parent_max.take(np.arange(0, 6))

In [None]:
def average(x, y):
    """Compute the average of two values"""
    return (x+y)/2

In [None]:
parent_avg = heights.apply(average, 'mother', 'father')
parent_avg.take(np.arange(0, 6))
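
With several column names, `apply` passes one value from each column, row by row, as separate arguments to the function. The sketch imitates that with `zip` over two made-up lists instead of the real table; the values are for illustration only.

```python
def average(x, y):
    """Compute the average of two values"""
    return (x + y) / 2

# Made-up parent heights in inches, one entry per row.
mother = [67.0, 66.5, 64.0]
father = [78.5, 75.5, 75.0]

# Row by row: average(67.0, 78.5), average(66.5, 75.5), ...
parent_avg = [average(m, f) for m, f in zip(mother, father)]
parent_max = [max(m, f) for m, f in zip(mother, father)]

print(parent_avg, parent_max)
```

The order of the column names matters in general, because the first column supplies the first argument and the second column supplies the second.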