# Lecture 10: Functions #

<h2>
<ul>
    <li><b> Introduce functions </b></li>
    <li><b> Apply functions to tables </b></li>
    <li><b> If time, we'll revisit Groups and delve deeper into their use. </b></li>
</ul>
</h2>

In [None]:
from datascience import *
import numpy as np
import warnings
warnings.filterwarnings("ignore")

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
plots.rcParams["patch.force_edgecolor"] = True

#The following allows porting images into a Markdown window
from IPython.display import Image

## Defining Functions ##  

### Example: ###   
### **Create a function that takes a numerical input and triples it:** $\textsf{triple}(x)=3\,x$ ###

In [None]:
def triple(x):
    return 3 * x

In [None]:
triple(3)

**We can also assign a value to a name, and call the function on the name:**

In [None]:
num = 4

In [None]:
triple(num)

In [None]:
triple(num * 5)

## The Anatomy of a Function ##  

<h3>
    
```python
def functionname(Arguments_Parameters_Expressions_or_Values):     
      return return_expression
``` 
</h3>

![title](function-anatomy.png)
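
As a concrete instance of the template above, here is a minimal sketch with each part labeled. The function name `double_plus_one` and its body are made up for illustration and are not part of the lecture's code.

```python
def double_plus_one(x):          # def keyword, function name, one parameter
    """Return 2 * x + 1."""      # optional docstring describing the function
    doubled = 2 * x              # body: the name 'doubled' exists only inside the function
    return doubled + 1           # return expression: the value of the call

result = double_plus_one(5)
print(result)
```

The call `double_plus_one(5)` binds `x` to 5, runs the body, and evaluates to whatever follows `return`.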


## Functions are Type-Agnostic  ## 

<h3> Example:  String Input </h3>

In [None]:
triple('ha')

<h3> Example: Array Input </h3>

In [None]:
np.arange(4)

<h3> Feed the array above into our function <span style='font-family:sans-serif'> <font color='blue'> triple </font> </span> to see what is produced: </h3>

In [None]:
triple(np.arange(4))
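
The function body never checks its input; it just evaluates `3 * x`, and what `*` means depends on the type of `x`. The sketch below repeats `triple` so it runs on its own and compares four kinds of input. The list case is an extra illustration that is not in the cells above.

```python
import numpy as np

def triple(x):
    return 3 * x

print(triple(5))              # number: ordinary multiplication
print(triple('ha'))           # string: repetition
print(triple([1, 2]))         # list: repetition, not element-wise multiplication
print(triple(np.arange(4)))   # array: element-wise multiplication
```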

<h2> If we're not careful about what we feed into a function, there's no telling <i>what</i> it will produce, if anything.  

For example, try to run the <span style='font-family:sans-serif'> <font color='blue'> pow </font> </span> function on the string $\textsf{'ha'}$.  In particular, try $\textsf{'ha'}^3$.
    
    
</h2>

In [None]:
pow('ha',3)  #Recall that pow(a,b) is the same as a ** b
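
Raising a string to a power is not defined, so the call above stops with a `TypeError`. One possible way to handle such input is sketched below, assuming we would rather get `None` back than have the cell stop. The helper `safe_pow` is hypothetical and is not part of the lecture's code.

```python
def safe_pow(base, exponent):
    """Hypothetical helper: return base ** exponent, or None if the types don't support it."""
    try:
        return pow(base, exponent)
    except TypeError as err:
        print('Cannot compute power:', err)
        return None

print(safe_pow(2, 3))      # numbers work as usual
print(safe_pow('ha', 3))   # strings do not support **, so we get None
```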

<h2> Discussion </h2>
<h3> 
<ul>
    <li><b> What does the following function do?</b></li>     
    <li><b> What type of input does it take?</b></li>      
    <li><b> What type of output does it produce? </b></li>    
    <li><b> What's a good name for the function?</b></li>
</ul>
</h3>
<h2>
    
```python
def f(s):     
      return np.round(s / sum(s) * 100, 2)
``` 
</h2>

In [None]:
def percent_of_total(s):
    return np.round(s / sum(s) * 100, 2)

In [None]:
first_four=make_array(1,2,3,4)
first_four

In [None]:
sum(first_four)

In [None]:
percent_of_total(first_four)

In [None]:
percent_of_total(make_array(1, 213, 38))
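
A quick sanity check on `percent_of_total`: the entries of its output should add up to roughly 100, with small deviations caused by rounding to two decimal places. The sketch below uses plain NumPy arrays in place of `make_array`, which is an assumption made only to keep the example self-contained.

```python
import numpy as np

def percent_of_total(s):
    return np.round(s / sum(s) * 100, 2)

counts = np.array([1, 2, 3, 4])
percents = percent_of_total(counts)
print(percents)        # each entry as a percentage of the total
print(sum(percents))   # close to 100, up to rounding
```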

<h2> Functions Can Take Multiple Arguments </h2>

<h3> Example: Calculate the Hypotenuse Length of a Right Triangle </h3>


<h2>
Pythagoras's Theorem: If $x$ and $y$ denote the lengths of the right-angle sides, then the hypotenuse length $h$ satisfies:

$$ h^2 = x^2 + y^2 \qquad \text{which implies}\qquad \hspace{20 pt} h = \sqrt{ x^2 + y^2 } $$

</h2>

In [None]:
def hypotenuse(x,y):
    hypot_squared = (x ** 2 + y ** 2)  #Square of the hypotenuse's length
    hypot = hypot_squared ** 0.5       #Take the square root to compute the hypotenuse's length
    return hypot

In [None]:
hypotenuse(1, 2)

In [None]:
hypotenuse(3, 4)

<h2> We could've typed the body all in one line:</h2>

In [None]:
def hypotenuse(x,y):
    return (x ** 2 + y ** 2) ** 0.5   #Not as readable as the first version

In [None]:
hypotenuse(9, 12)

In [None]:
hypotenuse(3, 4)

![title](345triangle.png)
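
Python's standard library already provides `math.hypot`, which computes the same quantity. As a rough check, the sketch below compares our function against it on the triangles used above.

```python
import math

def hypotenuse(x, y):
    return (x ** 2 + y ** 2) ** 0.5

for x, y in [(3, 4), (9, 12), (1, 2)]:
    ours = hypotenuse(x, y)
    builtin = math.hypot(x, y)
    # isclose allows for tiny floating-point differences
    print(x, y, ours, builtin, math.isclose(ours, builtin))
```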

<h3>
Example: A function that takes the year of birth of a person and produces their age in years.
</h3>

In [None]:
def age(year):
    age = 2021 - year
    return age

In [None]:
age(1942)

<h3> Now add some bells and whistles:  
    
    Take a person's name and year of birth (two arguments).  
    
    Produce a sentence that states how old they are. 

</h3>

In [None]:
def name_and_age(name, year):
    return name + ' is ' + str(age(year)) + ' years old.'

In [None]:
name_and_age('Joe', 1942)

<h3> Q. What happens if we don't convert <tt>age(year)</tt> to a string?</h3>
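
One way to explore the question is to try it. In the sketch below, the two `name_and_age_...` variants are hypothetical names introduced for illustration: the first skips `str`, and the second uses an f-string, which converts the number for us.

```python
def age(year):
    return 2021 - year   # 2021 is the reference year used in this lecture

def name_and_age_unconverted(name, year):
    # Hypothetical variant that skips str(); adding a string and an int is not defined
    return name + ' is ' + age(year) + ' years old.'

def name_and_age_fstring(name, year):
    # Hypothetical variant: the f-string converts age(year) to text
    return f'{name} is {age(year)} years old.'

caught = None
try:
    name_and_age_unconverted('Joe', 1942)
except TypeError as err:
    caught = err
    print('TypeError:', err)

print(name_and_age_fstring('Joe', 1942))
```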

<h2> Apply </h2>

SLIDE

<h3>
    
```python
table_name.apply(function_name, 'column_label(s)')
```
</h3>
   

In [None]:
ages = Table().with_columns(
    'Person', make_array('Jim', 'Pam', 'Michael', 'Creed'),
    'Birth Year', make_array(1985, 1988, 1967, 1904)
)
ages

In [None]:
def cap_at_1980(x):
    return min(x, 1980)

In [None]:
cap_at_1980(1975)

In [None]:
cap_at_1980(1991)

In [None]:
ages.apply(cap_at_1980, 'Birth Year')

In [None]:
def name_and_age(name, year):
    age = 2021 - year
    return name + ' is ' + str(age)

In [None]:
ages.apply(name_and_age, 'Person', 'Birth Year')
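
A rough mental model of `apply`: call the function once per row, feeding it the values from the named columns in the order given. The sketch below mimics that with plain Python lists. `apply_sketch` is a hypothetical stand-in, not the library's implementation, and it returns a list where the real method returns an array.

```python
def apply_sketch(function, *columns):
    """Hypothetical stand-in for Table.apply: call function once per row."""
    return [function(*row) for row in zip(*columns)]

people = ['Jim', 'Pam', 'Michael', 'Creed']
birth_years = [1985, 1988, 1967, 1904]

def cap_at_1980(x):
    return min(x, 1980)

def name_and_age(name, year):
    return name + ' is ' + str(2021 - year)

print(apply_sketch(cap_at_1980, birth_years))            # one column, one argument
print(apply_sketch(name_and_age, people, birth_years))   # two columns, two arguments
```

Because each row's values are passed positionally, the columns must be listed in the same order as the function's parameters.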

Everything above this cell has been ported over from Lec 09

In [None]:
def age(year):
    age = 2021 - year   #Same reference year as the earlier definition of age
    return age

In [None]:
age(1942)

In [None]:
def name_and_age(name, year):
    return name + ' is ' + str(age(year))

In [None]:
name_and_age('Joe', 1942)

## Apply

In [None]:
staff = Table().with_columns(
    'Employee', make_array('Jim', 'Dwight', 'Michael', 'Creed'),
    'Birth Year', make_array(1985, 1988, 1967, 1904)
)
staff

In [None]:
def greeting(person):
    return 'Dunder Mifflin, this is ' + person

In [None]:
greeting('Pam')

In [None]:
greeting('Erin')

<h3> Now apply to the whole table</h3>

In [None]:
staff.apply(greeting, 'Employee')

<h3> Now use a function that takes more than one argument </h3>

In [None]:
staff.apply(name_and_age, 'Employee', 'Birth Year') #The order in which we specify the columns is important. 

SLIDE: Sir Francis Galton

## Prediction ##

In [None]:
galton = Table.read_table('galton.csv')
galton

<h3>
    <ul>
    <b><li> <tt> 'family'</tt>: Index Number of the Family (categorical)</li></b>
    <b><li> <tt> 'father'</tt>: Father's Height (inches) </li></b> 
    <b><li> <tt> 'mother'</tt>: Mother's Height  (inches) </li></b> 
    <b><li> <tt> 'midparentHeight'</tt>: Weighted Average of the Parents' Heights (inches) </li></b> 
    <b><li> <tt> 'children'</tt>: Number of Children in the Family  </li></b> 
    <b><li> <tt> 'childNum'</tt>: Birth Order of the Child (e.g., 2 means second child)  </li></b> 
    <b><li> <tt> 'gender'</tt>: Gender of the Child (<tt>'male'</tt> or <tt>'female'</tt>)  </li></b> 
    <b><li> <tt> 'childHeight'</tt>: Height of the Child (inches) in Adulthood </li></b>    
    </ul> 
</h3>    

<h2> Let's explore the relationship between <tt>midparentHeight</tt> and <tt>childHeight</tt>. </h2>

<h3> What's a good type of plot for this purpose? </h3>

In [None]:
galton.scatter('midparentHeight', 'childHeight')

<h3>
    Suppose we know the <tt>midparentHeight</tt> is 68 inches for a particular family.  
    
    What can we say about the child's height once they reach adulthood?
</h3>

<h3> Let's look at other families whose <tt>midparentHeight</tt> is near 68 inches. </h3>

In [None]:
galton.scatter('midparentHeight', 'childHeight')
#The following lines plot vertical reference lines; lw denotes line width
plots.plot([67.5, 67.5], [50, 85], color='purple', lw=2)
plots.plot([68.5, 68.5], [50, 85], color='orange', lw=2);

<h3> Now let's calculate the Average Height of the people within the region bounded by the vertical lines. </h3>

<h3> Grab only the rows where midparentHeight is in the interval   
$$67.5 \leq \texttt{midparentHeight} < 68.5.$$

</h3>

In [None]:
# Grab only the rows where midparentHeight is in the interval 67.5 <= midparentHeight < 68.5
nearby = galton.where('midparentHeight', are.between(67.5, 68.5))  

<h3> Now take the average height of the children in such families </h3>

<h2> New Function Alert: <tt>mean</tt></h2>

In [None]:
#Take the average 
nearby_mean = nearby.column('childHeight').mean()
nearby_mean
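
The predicate `are.between(67.5, 68.5)` keeps values that are at least 67.5 and strictly less than 68.5. The sketch below reproduces the two steps (filter, then average) in plain Python. The data are made-up pairs, not rows from `galton.csv`.

```python
from statistics import mean

# Made-up (midparentHeight, childHeight) pairs, NOT rows from galton.csv
families = [(67.4, 66.0), (67.5, 68.0), (68.0, 65.0), (68.4, 70.0), (68.5, 71.0)]

low, high = 67.5, 68.5
# Keep rows with low <= midparentHeight < high, matching are.between
nearby_children = [child for midparent, child in families if low <= midparent < high]
nearby_mean = mean(nearby_children)
print(nearby_children, nearby_mean)
```

Note that the family at exactly 67.5 is kept while the one at exactly 68.5 is dropped.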

<h3> Now add this average to the plot above.</h3>

In [None]:
galton.scatter('midparentHeight', 'childHeight')
plots.plot([67.5, 67.5], [50, 85], color='purple', lw=2)
plots.plot([68.5, 68.5], [50, 85], color='orange', lw=2)
#We add a single red dot of size s=70 at (68, nearby_mean), superimposed on the original plot.
plots.scatter(68, nearby_mean, color='red', s=70);  # s=70 specifies the size of the dot

<h3> Now let's generalize ...</h3>

In [None]:
def predict(midparentHeight):
    nearby = galton.where('midparentHeight', are.between(midparentHeight - 1/2, midparentHeight + 1/2))
    return nearby.column('childHeight').mean()

In [None]:
predict(68)

In [None]:
predict(70)

In [None]:
predict(73)
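
One thing to watch for: if no family has a `midparentHeight` within half an inch of the input, there is nothing to average. The sketch below is a hypothetical variant on plain NumPy arrays with made-up data; it assumes we would like `None` back in that case. The name `predict_sketch` and the `width` parameter are introduced here for illustration only.

```python
import numpy as np

# Made-up heights in inches, NOT rows from galton.csv
midparent = np.array([66.0, 67.8, 68.2, 70.1, 70.4])
child = np.array([64.0, 67.0, 69.0, 71.0, 72.0])

def predict_sketch(h, width=1):
    """Hypothetical version of predict on plain arrays; returns None if no rows are nearby."""
    in_window = (midparent >= h - width / 2) & (midparent < h + width / 2)
    if not in_window.any():
        return None
    return child[in_window].mean()

print(predict_sketch(68))   # averages the children whose midparent is 67.8 or 68.2
print(predict_sketch(80))   # no nearby families
```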

<h3> Now <tt>apply</tt> our <tt>predict</tt> function to the entire table.</h3>

In [None]:
predicted_heights = galton.apply(predict, 'midparentHeight')
predicted_heights

<h3> Now add the predicted values as a new column to our table.</h3>

In [None]:
galton = galton.with_column('predictedHeight', predicted_heights)
galton

<h3>Q. Do you notice any patterns in the <tt>predictedHeight</tt> column above?

<ul> 
    <b><li> Siblings get the same predicted value. </li></b>    
    <b><li> Our function tends to overestimate female heights and underestimate male heights. </li></b>    
</ul>

</h3>



In [None]:
galton.select(
    'midparentHeight', 'childHeight', 'predictedHeight').scatter('midparentHeight')

<h3> When we call <tt>scatter</tt> with only one argument, it plots every other column in the table against the column named in that argument.
    
So, here we plot <tt>childHeight</tt> and <tt>predictedHeight</tt> (yellow) against <tt>midparentHeight</tt> (horizontal axis).
    
</h3>

<h2> Prediction Accuracy: How good are our predictions? </h2>

<h3> Write a function that computes the difference between two values.</h3>

In [None]:
def difference(x, y):
    return x - y

<h3> Now <tt>apply</tt> to the <tt>predictedHeight</tt> and <tt>childHeight</tt> columns.</h3>

In [None]:
pred_errors = galton.apply(difference, 'predictedHeight', 'childHeight')
pred_errors

<h3> Add the errors array as a new column to our table.</h3>

In [None]:
galton = galton.with_column('errors',pred_errors)
galton

<h3> Now create a histogram of the errors.</h3>

In [None]:
galton.hist('errors')
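
Besides looking at the histogram, we could summarize the errors with a single number. The sketch below uses made-up values, not the Galton data, and shows two possible summaries; which one to prefer is a judgment call that this lecture does not settle.

```python
import numpy as np

# Made-up values in inches, NOT taken from galton.csv
predicted = np.array([66.5, 67.0, 68.0, 69.5])
actual = np.array([65.0, 69.0, 68.0, 70.5])

errors = predicted - actual               # same convention as difference(predicted, actual)
mean_error = errors.mean()                # signed: positive and negative errors cancel
mean_abs_error = np.abs(errors).mean()    # typical size of an error, ignoring sign
print(errors, mean_error, mean_abs_error)
```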

<h3> Make a histogram of the errors, grouped by gender.</h3>

In [None]:
galton.hist('errors', group='gender')

# Discussion Question

In [None]:
def predict_smarter(h, g):
    nearby = galton.where('midparentHeight', are.between(h - 1/2, h + 1/2))
    nearby_same_gender = nearby.where('gender', g)
    return nearby_same_gender.column('childHeight').mean()

In [None]:
predict_smarter(68, 'female')

In [None]:
predict_smarter(68, 'male')

In [None]:
smarter_predicted_heights = galton.apply(predict_smarter, 'midparentHeight', 'gender')
galton = galton.with_column('smartPredictedHeight', smarter_predicted_heights)

In [None]:
smarter_pred_errs = galton.apply(difference, 'smartPredictedHeight', 'childHeight')  #Predicted minus actual, same sign convention as 'errors'
galton = galton.with_column('smartErrors', smarter_pred_errs)

In [None]:
galton.hist('smartErrors', group='gender')

## Grouping by One Column ##

In [None]:
cones = Table.read_table('cones.csv')

In [None]:
cones

In [None]:
cones.group('Flavor')

In [None]:
cones.drop('Color').group('Flavor', np.average)

In [None]:
cones.drop('Color').group('Flavor', min)
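
A rough picture of what grouping does: put each row's value into a pile labeled by its group, then reduce every pile with the collecting function. The sketch below does this with a dictionary. The rows are made up, not the contents of `cones.csv`, and `group_sketch` is a hypothetical stand-in rather than the library's implementation.

```python
from statistics import mean

# Made-up rows, NOT the contents of cones.csv
flavors = ['strawberry', 'chocolate', 'chocolate', 'strawberry', 'chocolate']
prices = [3.50, 4.75, 5.25, 5.00, 5.00]

def group_sketch(keys, values, collect):
    """Hypothetical stand-in for group: gather values per key, then collect each pile."""
    piles = {}
    for key, value in zip(keys, values):
        piles.setdefault(key, []).append(value)
    return {key: collect(pile) for key, pile in piles.items()}

print(group_sketch(flavors, prices, len))    # counts per flavor
print(group_sketch(flavors, prices, mean))   # average price per flavor
print(group_sketch(flavors, prices, min))    # cheapest price per flavor
```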

## Grouping By One Column: Welcome Survey ##

In [None]:
survey = Table.read_table('welcome_survey_v1.csv')

In [None]:
by_extra = survey.group('Extraversion', np.average)
by_extra

In [None]:
by_extra.select(0,1,2).plot('Extraversion') # Drop the categorical columns

In [None]:
by_extra.select(0,2).plot('Extraversion')

## Lists

In [None]:
[1, 5, 'hello', 5.0]

In [None]:
[1, 5, 'hello', 5.0, make_array(1,2,3)]

## Grouping by Two Columns ##

In [None]:
survey.group(['Handedness','Sleep position']).show()
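
Grouping by two columns counts each distinct pair of values. As a minimal sketch of the same idea in plain Python, we can count pairs with `collections.Counter`. The answers below are made up, not rows from `welcome_survey_v1.csv`.

```python
from collections import Counter

# Made-up survey answers, NOT rows from welcome_survey_v1.csv
handedness = ['Right', 'Left', 'Right', 'Right', 'Left']
sleep_position = ['Back', 'Side', 'Side', 'Back', 'Side']

# Each (handedness, sleep position) pair is one group
pair_counts = Counter(zip(handedness, sleep_position))
for (hand, position), count in sorted(pair_counts.items()):
    print(hand, position, count)
```

Pairs that never occur, such as left-handed back sleepers in this made-up data, simply do not appear in the result.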

## Pivot Tables

In [None]:
survey.pivot('Sleep position', 'Handedness')

In [None]:
survey.pivot('Sleep position', 'Handedness', values='Extraversion', collect=np.average)
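
A pivot table holds the same information as a two-column group, rearranged into a grid: one row per value of the second argument (`'Handedness'`) and one column per value of the first (`'Sleep position'`). The sketch below builds such a grid of counts from made-up answers. It is a rough illustration, not the library's implementation.

```python
from collections import Counter

# Made-up survey answers, NOT rows from welcome_survey_v1.csv
handedness = ['Right', 'Left', 'Right', 'Right', 'Left']
sleep_position = ['Back', 'Side', 'Side', 'Back', 'Side']

counts = Counter(zip(handedness, sleep_position))
row_labels = sorted(set(handedness))          # one row per Handedness value
column_labels = sorted(set(sleep_position))   # one column per Sleep position value

# A Counter reports 0 for pairs that never occur, which fills the empty cells
grid = {row: [counts[(row, col)] for col in column_labels] for row in row_labels}

print('Handedness', column_labels)
for row in row_labels:
    print(row, grid[row])
```

Unlike the grouped version, the grid shows combinations that never occur as explicit zeros.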

In [None]:
survey.group('Handedness', np.average)