In [14]:
from datascience import *
import matplotlib
matplotlib.use('Agg')
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
import numpy as np

### Functions and Tables ###

We are building up a useful inventory of techniques for identifying patterns and themes in a data set by using functions already available in Python. We will now explore a core feature of the Python programming language: function definition.

We have used functions extensively already in this text, but never defined a function of our own. The purpose of defining a function is to give a name to a computational process that may be applied multiple times. There are many situations in computing that require repeated computation. For example, it is often the case that we want to perform the same manipulation on every value in a column of a table.

### Defining a Function ###

A function is defined in Python using a `def` statement, which is a multi-line statement that begins with a header line giving the name of the function and names for the arguments of the function. The header line ends with a colon. The rest of the `def` statement, called the body, must be indented below the header.

A function expresses a relationship between its inputs (called arguments) and its outputs (called return values). The number of arguments required to call a function is the number of names that appear within parentheses in the `def` statement header. The values that are returned depend on the body. 
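To see the rule about argument counts in action, here is a small sketch using a hypothetical two-argument function `add_numbers` (not part of the original text). Because its header names two arguments, a call that supplies only one is rejected with a `TypeError` before the body runs; the exact wording of the error message may differ between Python versions.

```python
# Hypothetical two-argument function: its header names a and b,
# so every call must supply exactly two arguments.
def add_numbers(a, b):
    return a + b

print(add_numbers(3, 4))  # two arguments: the call succeeds

message = None
try:
    add_numbers(3)  # only one argument: Python rejects the call
except TypeError as err:
    message = str(err)  # keep the error text so we can inspect it

print("TypeError:", message)
```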

Whenever a function is called, its body is executed. Whenever a return statement within the body is executed, the call to the function completes and the value of the return expression is returned as the value of the function call.
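A quick way to confirm that executing `return` ends the call is to put a statement after it and check whether that statement ever runs. The function `triple_traced` and the list `steps` below are hypothetical names used only for this illustration.

```python
steps = []  # records which statements in the body actually ran

def triple_traced(x):
    """Triple x, recording each step that executes (hypothetical example)."""
    steps.append("before return")
    return 3 * x
    steps.append("after return")  # never runs: the call has already completed

value = triple_traced(14)
print(value)  # 42
print(steps)  # ['before return']
```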

The definition of the `double` function below simply doubles a number.

In [15]:
# Our first function definition

def double(x):
    """ Double x """
    return 2*x

The primary difference between defining a `double` function and simply evaluating its return expression, `2*x`, is that when a function is defined, its return expression is *not* immediately evaluated. It cannot be, because the value for `x` is not yet defined. Instead, the return expression is evaluated whenever this `double` function is *called*, which is done by writing the name `double` followed by parentheses that contain an expression for its argument.
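One way to convince yourself that a definition does not evaluate its body is to define a function whose body uses a name that has no value yet. In this sketch, `triple_y` and `y` are hypothetical names: the `def` statement succeeds, the first call fails with a `NameError`, and once `y` is assigned, a later call succeeds because the body is evaluated anew on every call.

```python
# Defining this function succeeds even though y has no value yet,
# because the body is not evaluated until the function is called.
def triple_y():
    return 3 * y

first_error = None
try:
    triple_y()  # now the body runs and y is looked up, but it is undefined
except NameError as err:
    first_error = err

print("First call raised:", repr(first_error))

y = 5
print(triple_y())  # 15: the body is evaluated again on this call
```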

In [16]:
double(17)

34

In [17]:
double(-0.6/4)

-0.3

The two expressions above are both *call expressions*. In the second one, the value of `-0.6/4` is computed and then passed as the argument named `x` to the `double` function.

When the `double` function is called in this way, its body is executed. The body of `double` has only a single line:

`return 2*x` 

Executing this *`return` statement* completes execution of the `double` function's body and computes the value of the call expression.

The same result is computed by passing a named value as an argument. The `double` function does not know or care how its argument is computed or stored; its only job is to execute its own body using the arguments passed to it.

In [18]:
any_name = 42
double(any_name)

84
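Since `double` only sees the value of its argument, the argument can be any expression: a name, an arithmetic expression, or even another call to `double`. In each case the argument expression is evaluated first, and then the body runs with `x` bound to the result.

```python
def double(x):
    """Double x."""
    return 2 * x

any_name = 42

a = double(any_name)          # a name: 84
b = double(any_name + 8)      # an arithmetic expression: 100
c = double(double(any_name))  # the inner call runs first: 168
print(a, b, c)
```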

Calling a function involves executing its body: everything that's indented after the first line. The names in parentheses in the first line refer to the argument values provided in the call. For example, in the `def` statement for `double`, `x` is the name for the argument that is passed to `double`, such as 3 in the call `double(3)`.

Only statements in the body of `double` can refer to this `x`. The technical terminology is that `x` has local scope, which means that the name `x` only refers to the argument of `double` while the body of `double` is being executed.

Therefore the name `x` isn't recognized outside the body of the function, even though we have called `double` in the cells above.

In [1]:
x

NameError: name 'x' is not defined
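Local scope also works in the other direction: a name `x` defined outside the function is not affected by calls to `double`. In this sketch, the outer `x` is given an arbitrary string value; during the call, `x` inside the body refers to the argument, and afterward the outer `x` is unchanged.

```python
def double(x):
    """Double x."""
    return 2 * x

x = "unrelated"      # a name x defined outside the function
result = double(10)  # inside the body, x refers to the argument 10
print(result)        # 20
print(x)             # still 'unrelated': the call did not change the outer x
```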

**Docstrings.** A well-composed function has a name that evokes its behavior, as well as a *docstring* — a description of its behavior and expectations about its arguments. The docstring can also show example calls to the function, where the call is preceded by `>>>`.

A docstring can be any string that immediately follows the header line of a `def` statement. Docstrings are typically defined using triple quotation marks at the start and end, which allows the string to span multiple lines. The first line is conventionally a complete but short description of the function, while following lines provide further guidance to future users of the function.

Here is a definition of a `percent` function that takes two arguments. The definition includes a docstring.

In [20]:
# A function with more than one argument

def percent(x, total):
    
    """Convert x to a percentage of total, 
    by dividing x by total and then multiplying by 100; then
    round the result to two decimal places.
    
    >>> percent(4, 16)
    25.0
    >>> percent(1, 6)
    16.67
    """
    return round((x/total)*100, 2)

In [21]:
percent(33, 200)

16.5
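The `>>>` lines in a docstring follow the format used by Python's standard-library `doctest` module, which can run those examples and compare the results against the expected output written beneath them. A minimal sketch, assuming you want to check only `percent` rather than a whole module, uses `doctest.DocTestFinder` to collect the examples and `doctest.DocTestRunner` to run them:

```python
import doctest

def percent(x, total):
    """Convert x to a percentage of total,
    by dividing x by total and then multiplying by 100; then
    round the result to two decimal places.

    >>> percent(4, 16)
    25.0
    >>> percent(1, 6)
    16.67
    """
    return round((x/total)*100, 2)

# Collect the >>> examples from percent's docstring and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(percent, "percent", globs={"percent": percent}):
    runner.run(test)  # prints details only if an example fails

# summarize() returns a (failed, attempted) named tuple.
results = runner.summarize(verbose=False)
print(results)
```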

Contrast the function `percent` defined above with the function `percents` defined below. The latter takes an array as its argument, and converts all the numbers in the array to percents out of the total of the values in the array. The percents are all rounded to two decimal places, this time replacing `round` by `np.round` because the argument is an array and not a number.

In [22]:
def percents(array_x):
    
    """ Convert the values in array_x to percents 
    out of the total of array_x"""
    
    total = array_x.sum()
    return np.round((array_x/total)*100, 2)

The function `percents` returns an array of percents that add up to 100 apart from rounding.
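The phrase "apart from rounding" matters when the true percents do not round evenly. In this sketch, `np.array` stands in for `make_array` so that it runs with NumPy alone; three equal values each become 33.33, so the rounded percents add up to about 99.99 rather than exactly 100.

```python
import numpy as np

def percents(array_x):
    """Convert the values in array_x to percents out of the total of array_x."""
    total = array_x.sum()
    return np.round((array_x/total)*100, 2)

# np.array is used here in place of datascience's make_array.
equal_parts = percents(np.array([1, 1, 1]))
print(equal_parts)        # each value is 33.33
print(equal_parts.sum())  # about 99.99, not exactly 100, because of rounding
```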

In [23]:
some_array = make_array(7, 10, 4)
percents(some_array)

array([ 33.33,  47.62,  19.05])

Functions are called by placing argument expressions in parentheses after the function name. Any function that is defined in isolation is called in this way. You have also seen examples of functions that are called using dot notation, such as `some_table.sort(some_label)`, but functions that you define will always be called using the function name first, passing in all of its arguments.
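Both calling styles can appear side by side. In this sketch, which uses `np.array` in place of `make_array` so it runs with NumPy alone, `sum` is a method of the array and is called with dot notation, while `percents`, which we defined ourselves, is called by name with the array passed in parentheses.

```python
import numpy as np

def percents(array_x):
    """Convert the values in array_x to percents out of the total of array_x."""
    total = array_x.sum()  # method call with dot notation, inside our function
    return np.round((array_x/total)*100, 2)

some_array = np.array([7, 10, 4])  # stands in for make_array(7, 10, 4)

total = some_array.sum()       # method: the array comes before the dot
shares = percents(some_array)  # our function: name first, argument in parentheses
print(total, shares)
```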