# Lecture 13 – Functions (Part I)

## Data 6, Summer 2022

In [None]:
from datascience import *

## Motivation

We've seen a few built-in Python functions so far.

In [None]:
int('-14') # Evaluates to -14

In [None]:
abs(-14) # Evaluates to 14

In [None]:
max(-14, 15) # Evaluates to 15

In [None]:
print('zoology') # Prints 'zoology', evaluates to None

We don't currently have a good way to prevent our code from getting repetitive. For example, if we want to determine whether or not different students are ready to graduate:

In [None]:
units_1 = 104
year_1 = 'sophomore'
ready_to_graduate_1 = (year_1 == 'senior') and (units_1 >= 120)
ready_to_graduate_1

In [None]:
units_2 = 121
year_2 = 'senior'
ready_to_graduate_2 = (year_2 == 'senior') and (units_2 >= 120)
ready_to_graduate_2

In [None]:
units_3 = 125
year_3 = 'junior'
ready_to_graduate_3 = (year_3 == 'senior') and (units_3 >= 120)
ready_to_graduate_3

## Functions
Here's a better solution:

In [None]:
... # Write a function called `ready_to_graduate` which returns True 
    # if a student is ready to graduate and False otherwise

In [None]:
ready_to_graduate(year_1, units_1)

In [None]:
ready_to_graduate(year_2, units_2)

In [None]:
ready_to_graduate(year_3, units_3)

By using a function, we only had to write out the logic once, and could easily call it any number of times.
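
One possible way to write `ready_to_graduate` is sketched here. It assumes the same rule used in the repeated cells earlier (senior standing and at least 120 units); the parameter names `year` and `units` are our own choice, not something the lecture fixes.

```python
def ready_to_graduate(year, units):
    # Same rule as the repeated cells: senior standing AND at least 120 units.
    return (year == 'senior') and (units >= 120)

# The three students from the motivation section.
print(ready_to_graduate('sophomore', 104))  # False
print(ready_to_graduate('senior', 121))     # True
print(ready_to_graduate('junior', 125))     # False
```

If the graduation rule ever changes, only the body of this one function needs to be edited.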

Other function examples:

In [None]:
# This function has one parameter, x.
# When we call the function, the value we pass in
# as an argument will replace x in the computation.

def triple(x):
    return x*3

In [None]:
triple(15)

In [None]:
triple(-1.0)

In [None]:
# Functions can have zero parameters!
def always_true():
    return True

# The body of a function can be
# longer than one line.
def pythagorean(a, b):
    c_squared = a**2 + b**2
    return c_squared**0.5

In [None]:
always_true()

The body of a function **must** be indented. If not, Python will raise an error.

In [None]:
# Good
def square(x):
    return x**2

In [None]:
# Bad
def square(x):
return x**2

### Quick Check 1

Suppose we define the function `mystery` as follows:

In [None]:
def mystery(t):
    return t + '0'

What will the values of `alpha`, `beta`, and `charlie` be after running the following code?

In [None]:
alpha = mystery('19')

In [None]:
beta = mystery(19)

In [None]:
charlie = mystery('1' + '9')

In [None]:
alpha

In [None]:
beta

In [None]:
charlie

## Parameter Scope and Return Values

Functions can use names that are defined within that function. They can also use names defined in the 'global' frame. But if you try to use a name that was defined inside a function _outside_ of that function, Python will raise an error.

### Function Scope

In [None]:
def eat(zebra):
    return 'ate ' + zebra

In [None]:
eat('lionel')

In [None]:
zebra

Notice that we can't use the name `zebra` outside of the `eat` function because it hasn't been defined in that scope.
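
To see this without stopping the notebook, here is a small sketch that catches the error. The `try`/`except` syntax and the variable name `found` are additions for illustration only; the sketch assumes no global variable named `zebra` exists.

```python
def eat(zebra):
    # `zebra` is a parameter, so it only exists while `eat` is running.
    return 'ate ' + zebra

result = eat('lionel')

try:
    zebra          # looked up in the global frame, where it was never defined
    found = True
except NameError:
    found = False  # Python raised NameError: name 'zebra' is not defined

print(result)
print(found)
```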

However, functions _can_ use names defined outside of the function. Pay attention to what happens when the function `half` is called.

In [None]:
N = 15
def half(N):
    return N/2

In [None]:
half(0)

In [None]:
half(12)

In [None]:
half(N)

In [None]:
N = 15
def addN(x):
    return x + N

In [None]:
addN(0)

In [None]:
addN(3)
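
The two functions behave differently because of where `N` comes from. The sketch below repeats both definitions side by side, with comments describing which `N` each one uses:

```python
N = 15

def half(N):
    # The parameter N shadows the global N inside this function,
    # so the result depends only on the argument that was passed in.
    return N / 2

def addN(x):
    # There is no local N here, so Python looks N up in the global frame.
    return x + N

print(half(12))  # 6.0
print(half(N))   # 7.5, because the global N (15) is passed in as the argument
print(addN(3))   # 18
```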

Recall the `triple` function from earlier in the notebook.

In [None]:
triple(15)

Remember that when calling a function, Python first evaluates the operands to determine the arguments. If evaluating an operand causes an error, Python will raise that error before the function is ever called.

In [None]:
triple(1/0)

And if you give a function more (or fewer) arguments than it expects, Python will also raise an error.

In [None]:
triple(3, 4)
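
The two failures above are different kinds of errors. As a rough sketch (the `try`/`except` syntax and the names `first_error` and `second_error` are additions for illustration), we can record the name of each error instead of stopping:

```python
def triple(x):
    return x * 3

try:
    triple(1 / 0)      # 1 / 0 fails while evaluating the argument
except ZeroDivisionError as err:
    first_error = type(err).__name__

try:
    triple(3, 4)       # triple expects exactly one argument
except TypeError as err:
    second_error = type(err).__name__

print(first_error)
print(second_error)
```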

Some functions, like `print` or `min`, can take an arbitrary number of arguments.

In [None]:
print('my', 'name', 'is', 300)
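
A short sketch of the same idea with `min`, plus the optional `sep` argument of `print`, which controls what goes between the printed values (the variable names here are illustrative):

```python
smallest_of_two = min(4, 2)
smallest_of_five = min(4, 2, 9, -1, 7)

print(smallest_of_two)                    # 2
print(smallest_of_five)                   # -1
print('my', 'name', 'is', 300)            # my name is 300
print('my', 'name', 'is', 300, sep='-')   # my-name-is-300
```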

### Returning

Functions are _not_ required to return anything. If you don't specify a return value, the default value is `None`. `print` is an example of a function that doesn't return anything.

In [None]:
def add_and_print(a, b):
    total = a + b
    print(total)

In [None]:
total = add_and_print(3, 4)

In [None]:
total

In [None]:
print(total)
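
Printing a value and returning a value are different things. The sketch below contrasts `add_and_print` with a hypothetical `add_and_return` (a name introduced here only for comparison):

```python
def add_and_print(a, b):
    total = a + b
    print(total)   # displays the sum, but the function returns None

def add_and_return(a, b):
    return a + b   # hands the sum back to the caller

printed = add_and_print(3, 4)    # prints 7
returned = add_and_return(3, 4)  # prints nothing

print(printed is None)  # True
print(returned)         # 7
```

Only the returned value can be stored in a name and used in later computations.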

Notice that once a `return` statement runs, the function stops immediately; any later lines in the body are never executed.

In [None]:
def odd(n):
    return n % 2 == 1
    print('this will never be printed!')

In [None]:
odd(15)

In [None]:
odd(2)

### Quick Check 2

What is the value of `total` after running this code?

In [None]:
total = 3
def square_and_cube(a, b):
    return a**2 + total**b
total = square_and_cube(1, 2)

In [None]:
total

## String Methods

Methods are functions that are called on objects. The syntax for calling a method is `obj.method()`, where `obj` is the object you're calling the method on. This is called 'dot notation'.

We have already seen table methods. Strings also have their own methods, including `str.upper()`, `str.lower()`, and `str.replace()`.

In [None]:
'ian'.upper()

In [None]:
s = 'JuNiOR12'
s.upper()

In [None]:
s.lower()

In [None]:
s.replace('i', 'iii')
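
One detail worth noticing: string methods return a new string and leave the original unchanged. A quick sketch using the same `s` (the other variable names are illustrative):

```python
s = 'JuNiOR12'

shouted = s.upper()               # 'JUNIOR12'
quiet = s.lower()                 # 'junior12'
stretched = s.replace('i', 'iii') # 'JuNiiiOR12' (only the lowercase 'i' matches)

print(shouted, quiet, stretched)
print(s)                          # still 'JuNiOR12'
```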

Another useful string method is `str.split(separator)`, which splits the string `str` into a list of smaller strings, breaking it wherever the `separator` string appears. For example:

In [None]:
my_string = 'James is awesome.'
... # Split `my_string` into three different strings

In [None]:
joshs_string = 'No, Josh is awesome.'
... # Split `joshs_string` into two different strings
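
One way these two splits could be done is sketched here; the choice of separators and the names `words` and `parts` are our own:

```python
my_string = 'James is awesome.'
words = my_string.split(' ')       # split on single spaces
print(words)                       # ['James', 'is', 'awesome.']

joshs_string = 'No, Josh is awesome.'
parts = joshs_string.split(', ')   # split on a comma followed by a space
print(parts)                       # ['No', 'Josh is awesome.']
```

Note that the separator itself does not appear in any of the resulting strings.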

The 'opposite' of `str.split(separator)` is `str.join(arr_of_strings)`. This method joins the strings in the list or array `arr_of_strings` into a single string, with `str` placed between each pair of neighboring strings.

In [None]:
hello = 'hello'
world = 'world!'
... # Join `hello` and `world` to create a single string 'hello, world!'

In [None]:
james_strings = 'James is awesome'.split()
print(james_strings)
... # Join the elements of `james_strings` together
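
A minimal sketch of both joins, assuming we want a comma and a space between `hello` and `world`, and a single space between the words in `james_strings` (the names `greeting` and `rejoined` are illustrative):

```python
hello = 'hello'
world = 'world!'
greeting = ', '.join([hello, world])  # the separator string comes first
print(greeting)                       # hello, world!

james_strings = 'James is awesome'.split()
rejoined = ' '.join(james_strings)
print(rejoined)                       # James is awesome
```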

## Demo

Let's load in data from Wikipedia on countries around the world. Our original data is fairly messy, so we will need to clean it ourselves.

In [None]:
from datascience import *
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import numpy as np

data = Table.read_table('data/countries.csv')
data = data.take(np.arange(0, data.num_rows - 1))
data = data.relabeled('Country(or dependent territory)', 'Country') \
           .relabeled('% of world', '%') \
           .relabeled('Source(official or UN)', 'Source')
data = data.with_columns(
    'Country', data.apply(lambda s: s[:s.index('[')].lower() if '[' in s else s.lower(), 'Country'))

def first_letter(s):
    return s[0]

def last_letter(s):
    return s[-1]

In [None]:
data

Let's look at the `'Population'` column.

In [None]:
# ignore
china_pop = data.column('Population').take(0)

In [None]:
china_pop

We want these numbers to be integers, so that we can do arithmetic with them or plot them. However, right now they are not.

Let's write a function that takes in a string with that format, and returns the corresponding integer. But first, proof that the `int` function doesn't work here (it doesn't like the commas):

In [None]:
int(china_pop)

In [None]:
china_pop

In [None]:
... # Write a function `clean_population_string` to remove the commas from a numerical string and convert to an int

In [None]:
china_pop_clean = clean_population_string(china_pop)
china_pop_clean

Cool!
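
One possible `clean_population_string` is sketched below. It assumes the input contains only digits and comma separators; `sample_pop` is a made-up value, not a number from the dataset.

```python
def clean_population_string(pop_string):
    # Remove the comma separators, then convert what is left to an int.
    return int(pop_string.replace(',', ''))

sample_pop = '1,234,567'  # hypothetical value for illustration
print(clean_population_string(sample_pop))  # 1234567
```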

Using techniques we haven't yet learned, we can apply this function to every element of the `'Population'` column, so that when we visualize it, things work.

In [None]:
# Just run this cell
data = data.with_columns('Population', data.apply(clean_population_string, 'Population'))

In [None]:
data

The `'%'` column is also a little fishy.

In [None]:
china_pct = data.column('%').take(0)
china_pct

Percentages should be floats, but here they're strings.

Let's suppose we want to have the **proportion of the total global population that lives in a given country** as a column in our table. Proportions are decimals/fractions between 0 and 1. We can do this two ways:
- write a function, similar to `clean_population_string`, that correctly extracts the proportion we need
- calculate this by hand using all of the values in `'Population'`

Let's do... both!

In [None]:
... # Write a function `clean_pct_string` to convert a percent string into a decimal

In [None]:
clean_pct_string(china_pct)
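
A possible `clean_pct_string`, as a sketch only. It assumes each entry looks like a number followed by a percent sign (for example `'25%'`); the actual format of the `'%'` column may differ, and `sample_pct` is a made-up value.

```python
def clean_pct_string(pct_string):
    # Drop the percent sign, convert to a float, then rescale to a proportion.
    return float(pct_string.replace('%', '')) / 100

sample_pct = '25%'  # hypothetical value for illustration
print(clean_pct_string(sample_pct))  # 0.25
```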

Nice! The other way requires adding together all of the values in the `'Population'` column. We haven't covered how to do that just yet, so ignore the code for it and assume it does what it should.

In [None]:
total_population = data.column('Population').sum()
total_population

Assume this is the total population of the world. How would you calculate the proportion of people living in one country?

In [None]:
... # Create a function `compute_proportion` to compute the proportion of people living in the inputted country

In [None]:
china_pop_clean

In [None]:
compute_proportion(china_pop_clean)
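
One way `compute_proportion` might look is sketched below. The function reads `total_population` from the global frame; the value assigned here is a toy stand-in so the sketch runs on its own, not the real total from the table.

```python
total_population = 8_000  # hypothetical stand-in for the real column sum

def compute_proportion(country_population):
    # Proportion = part / whole; `total_population` is a global name.
    return country_population / total_population

print(compute_proportion(2_000))  # 0.25
```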

Pretty close to `clean_pct_string(china_pct)`. The difference is likely due to some countries not being included in one column or the other.

Hopefully this gives you a glimpse of the power of functions!