## Creating Functions

Functions are packaged routines that can be called over and over again. Functions also take parameters as input (e.g., a data set) and produce an output. For example, `myarray.mean()` takes the array `myarray` as input and produces the array's mean as output. It's as simple as that.

In this lesson,
we'll learn how to write a function
so that we can repeat several operations with a single command. This will save you loads of time during your analysis, for example when performing the same routines (plotting, calculating statistics) on several columns or data sets.

#### Objectives

*   Define a function that takes parameters.
*   Return a value from a function.
*   Test and debug a function.
*   Explain what a call stack is, and trace changes to the call stack as functions are called.
*   Set default values for function parameters.
*   Explain why we should divide programs into small, single-purpose functions.

### Defining a Function

Let's start by defining a function `fahr_to_kelvin` that converts temperatures from Fahrenheit to Kelvin. 

Functions are declared (defined) using the keyword `def`, followed by the function name (`fahr_to_kelvin`), followed by the parameters (the names in parentheses, here `temp`). This function takes as input a temperature (in degrees Fahrenheit) and returns as output the equivalent temperature in Kelvin. The keyword `return` indicates the value that gets returned.

Note the indentation. As in for-loops and if-then-else conditions, everything *inside* the function is indented by the same amount (four spaces is the usual convention), including the `return` statement, which here is the last line of the function. (There are cases where a function can have multiple `return` statements. We will figure that out later.)


In [None]:
def fahr_to_kelvin(temp):
    return ((temp - 32) * (5/9)) + 273.15
print( 'freezing point of water:', fahr_to_kelvin(32))
print( 'boiling point of water:', fahr_to_kelvin(212))

Yay! (By the way, this simple program would have had a bug in Python 2, where `5/9` performs integer division and evaluates to 0. We're using Python 3, so no worries. If you keep programming you'll have lots more opportunities to encounter bugs.)
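The difference between the two kinds of division can be seen directly. This is a minimal sketch that uses Python 3's `//` operator to imitate what `/` did for two integers in Python 2; the function name `fahr_to_kelvin_floor` is made up for this illustration.

```python
# In Python 3, / always performs true division, so 5 / 9 is a fraction.
print('5 / 9  =', 5 / 9)

# // is floor division; for two integers it drops the fractional part,
# which is what / did for two integers in Python 2.
print('5 // 9 =', 5 // 9)

def fahr_to_kelvin_floor(temp):
    # Hypothetical buggy variant: 5 // 9 is 0, so temp has no effect.
    return ((temp - 32) * (5 // 9)) + 273.15

print('boiling point with floor division:', fahr_to_kelvin_floor(212))
```

With floor division every input gives the same result, which is why the Python 2 version of the conversion would have been wrong.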

### Composing Functions

Now that we know how to turn Fahrenheit into Kelvin,
it's easy to turn Kelvin into Celsius:

In [None]:
def kelvin_to_celsius(temp):
    return temp - 273.15

print( 'absolute zero in Celsius:', kelvin_to_celsius(0.0) )

What about converting Fahrenheit to Celsius?
We could write out the formula,
but we don't need to.
Instead,
we can [compose](./gloss.html#function-composition) the two functions we have already created:

In [None]:
def fahr_to_celsius(temp):
    temp_k = fahr_to_kelvin(temp)
    result = kelvin_to_celsius(temp_k)
    return result

print( 'freezing point of water in Celsius:', fahr_to_celsius(32.0))

This is our first taste of how larger programs are built:
we define basic operations,
then combine them in ever-larger chunks to get the effect we want.
Real-life functions will usually be larger than the ones shown here&mdash;typically half a dozen to a few dozen lines&mdash;but
they shouldn't ever be much longer than that,
or the next person who reads them won't be able to understand what's going on.
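One way to gain confidence in a composed function is to compare it with the formula written out by hand. The sketch below repeats the three functions from this section and adds a hypothetical `fahr_to_celsius_direct` for comparison. The two results are compared with a small tolerance because floating-point arithmetic can introduce tiny rounding differences.

```python
def fahr_to_kelvin(temp):
    return ((temp - 32) * (5 / 9)) + 273.15

def kelvin_to_celsius(temp):
    return temp - 273.15

def fahr_to_celsius(temp):
    # Composition: the output of one function is the input of the next.
    return kelvin_to_celsius(fahr_to_kelvin(temp))

def fahr_to_celsius_direct(temp):
    # Hypothetical helper: the conversion formula written out in one step.
    return (temp - 32) * (5 / 9)

for fahr in [32.0, 98.6, 212.0]:
    composed = fahr_to_celsius(fahr)
    direct = fahr_to_celsius_direct(fahr)
    # The last value printed is True when the two results agree.
    print(fahr, composed, direct, abs(composed - direct) < 1e-9)
```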

#### Challenges

1.  "Adding" two strings produces their concatenation:
    `'a' + 'b'` is `'ab'`.
    Write a function called `fence` that takes two parameters called `original` and `wrapper`
    and returns a new string that has the wrapper character at the beginning and end of the original:

    ~~~python
    print( fence('name', '*'))
    *name*
    ~~~



In [None]:

print(fence('name','*'))

2.  If the variable `s` refers to a string,
    then `s[0]` is the string's first character
    and `s[-1]` is its last.
    Write a function called `outer`
    that returns a string made up of just the first and last characters of its input:

    ~~~python
    print( outer('helium') )
    hm
    ~~~

In [None]:

print( outer('helium') )


# Additional material: Testing and Documenting

Once we start putting things in functions so that we can re-use them,
we need to start testing that those functions are working correctly.
To see how to do this,
let's write a function to center a dataset around a particular value:

In [None]:
def center(data, desired):
    return (data - data.mean()) + desired

We could test this on our actual data,
but since we don't know what the values ought to be,
it will be hard to tell if the result was correct.
Instead,
let's use NumPy to create a matrix of 0's
and then center that around 3:

In [None]:
import numpy
z = numpy.zeros((2,2))
print( center(z, 3) )

That looks right,
so let's try `center` on some real data. Make sure that you have uploaded 'inflammation-01.csv' to the same place as the Notebook so that it can find it. Don't worry about `numpy` at the moment - we'll cover that later. The first line is reading in all the data from the CSV file, and putting it into an array.

In [None]:
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
print( center(data, 0) )

It's hard to tell from the default output whether the result is correct,
but there are a few simple tests that will reassure us.

(Again - don't worry about the statistics, we'll cover them later - but come back and have a look when you know what mean and standard deviation are!)

In [None]:
print( 'original min, mean, and max are:', data.min(), data.mean(), data.max())
centered = center(data, 0)
print( 'min, mean, and max of centered data are:', centered.min(), centered.mean(), centered.max())

That seems almost right:
the original mean was about 6.1,
so the lower bound from zero is now about -6.1.
The mean of the centered data isn't quite zero&mdash;we'll explore why not in the challenges&mdash;but it's pretty close.
We can even go further and check that the standard deviation hasn't changed:

In [None]:
print( 'std dev before and after:', data.std(), centered.std() )

Those values look the same,
but we probably wouldn't notice if they were different in the sixth decimal place.
Let's do this instead:

In [None]:
print( 'difference in standard deviations before and after:', data.std() - centered.std() )

Again,
the difference is very small.
It's still possible that our function is wrong,
but it seems unlikely enough that we should probably get back to doing our analysis.
We have one more task first, though:
we should write some [documentation](./gloss.html#documentation) for our function
to remind ourselves later what it's for and how to use it.
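Before turning to documentation, note that the manual checks above can also be written as assertions, so that Python reports a problem instead of relying on us to spot it. This is a minimal sketch, assuming NumPy is installed; the function name `test_center` and the small sample array are made up for illustration, and `numpy.allclose` is used because floating-point results are rarely exactly equal.

```python
import numpy

def center(data, desired):
    return (data - data.mean()) + desired

def test_center():
    # Centering an array of zeros around 3 should give all 3s.
    zeros = numpy.zeros((2, 2))
    assert numpy.allclose(center(zeros, 3), 3.0)

    # Made-up sample values standing in for real data.
    sample = numpy.array([[0.0, 1.0, 3.0], [2.0, 8.0, 4.0]])
    centered = center(sample, 0)
    # The mean should be (almost) zero and the spread unchanged.
    assert numpy.allclose(centered.mean(), 0.0)
    assert numpy.allclose(centered.std(), sample.std())

test_center()
print('all center tests passed')
```

If any assertion fails, Python stops with an `AssertionError` that points at the failing check.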

The usual way to put documentation in software is to add [comments](./gloss.html#comment) like this:

In [None]:
# center(data, desired): return a new array containing the original data centered around the desired value.
def center(data, desired):
    return (data - data.mean()) + desired

There's a better way, though.
If the first thing in a function is a string that isn't assigned to a variable,
that string is attached to the function as its documentation:

In [None]:
def center(data, desired):
    '''Return a new array containing the original data centered around the desired value.'''
    return (data - data.mean()) + desired

This is better because we can now ask Python's built-in help system to show us the documentation for the function:

In [None]:
help(center)

A string like this is called a [docstring](./gloss.html#docstring).
We don't need to use triple quotes when we write one,
but if we do,
we can break the string across multiple lines:

In [None]:
def center(data, desired):
    '''Return a new array containing the original data centered around the desired value.
    Example: center([1, 2, 3], 0) => [-1, 0, 1]'''
    return (data - data.mean()) + desired

help(center)
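`help` gets its text from the function itself: the docstring is stored in the function's `__doc__` attribute. A short sketch follows; `no_docstring` is a hypothetical function included only for comparison, and `center` is defined but never called, so NumPy is not needed here.

```python
def center(data, desired):
    '''Return a new array containing the original data centered around the desired value.
    Example: center([1, 2, 3], 0) => [-1, 0, 1]'''
    return (data - data.mean()) + desired

# The docstring is stored on the function object itself.
print(center.__doc__)

# A comment placed above a function is not attached to it.
# no_docstring is a hypothetical function used only for this comparison.
def no_docstring(data):
    return data

print(no_docstring.__doc__)   # None
```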

#### Challenges

1.  Write a function called `analyze` that takes a filename as a parameter
    and prints out the minimum, maximum and mean values of the data in that file.
    `analyze('inflammation-01.csv')` should print these values for the first data set,
    while `analyze('inflammation-02.csv')` should print the corresponding values for the second data set.
    Be sure to give your function a docstring.

2.  More advanced: Write a function `rescale` that takes an array as input
    and returns a corresponding array of values scaled to lie in the range 0.0 to 1.0.
    (If $L$ and $H$ are the lowest and highest values in the original array,
    then the replacement for a value $v$ should be $(v-L) / (H-L)$.)
    Be sure to give the function a docstring. Note: this is a bit complicated
    because we haven't learnt about NumPy yet. If you want to take a NumPy array `a` and
    subtract 5 from every element, you can just write `a - 5`; no loop is needed.

3.  More advanced: Run the commands `help(numpy.arange)` and `help(numpy.linspace)`
    to see how to use these functions to generate regularly-spaced values,
    then use those values to test your `rescale` function.

In [None]:
def analyze(filename):
    # Your code here
    pass  # placeholder so that the cell runs; replace it with your code

analyze("inflammation-01.csv")

In [None]:
def rescale(input): 
    # Your code here

raw_data = numpy.loadtxt(fname="inflammation-01.csv", delimiter=',')
print(raw_data)
print(f'Min: {raw_data.min()}, Mean:{raw_data.mean()}, Max: {raw_data.max()}')
scaled_data = rescale(raw_data)
print(scaled_data)
print(f'Min: {scaled_data.min()}, Mean:{scaled_data.mean()}, Max: {scaled_data.max()}')


### Defining Defaults

We have passed parameters to functions in two ways:
directly, as in `center(z, 3)`,
and by name, as in `numpy.loadtxt(fname='something.csv', delimiter=',')`.
In fact,
we can pass the filename to `loadtxt` without the `fname=`:

In [None]:
numpy.loadtxt('inflammation-01.csv', delimiter=',')

but we still need to say `delimiter=`:

In [None]:
numpy.loadtxt('inflammation-01.csv', ',')

To understand what's going on,
and make our own functions easier to use,
let's re-define our `center` function like this:

In [None]:
def center(data, desired=0.0):
    '''Return a new array containing the original data centered around the desired value (0 by default).
    Example: center([1, 2, 3], 0) => [-1, 0, 1]'''
    return (data - data.mean()) + desired

The key change is that the second parameter is now written `desired=0.0` instead of just `desired`.
If we call the function with two arguments,
it works as it did before:

In [None]:
test_data = numpy.zeros((2, 2))
print( center(test_data, 3) )

But we can also now call it with just one parameter,
in which case `desired` is automatically assigned the [default value](./gloss.html#default-parameter-value) of 0.0:

In [None]:
more_data = 5 + numpy.zeros((2, 2))
print( 'data before centering:\n', more_data )
print( 'centered data:\n', center(more_data) )

This is handy:
if we usually want a function to work one way,
but occasionally need it to do something else,
we can allow people to pass a parameter when they need to
but provide a default to make the normal case easier.
The example below shows how Python matches values to parameters:

In [None]:
def display(a=1, b=2, c=3):
    print( 'a:', a, 'b:', b, 'c:', c)

print( 'no parameters:' )
display()
print( 'one parameter:')
display(55)
print( 'two parameters:')
display(55, 66)

As this example shows,
parameters are matched up from left to right,
and any that haven't been given a value explicitly get their default value.
We can override this behavior by naming the value as we pass it in:

In [None]:
print( 'only setting the value of c' )
display(c=77)
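Positional and named values can also be mixed in one call, as long as the positional ones come first. A small sketch, using a hypothetical `describe` function that mirrors `display` but returns its text instead of printing it:

```python
def describe(a=1, b=2, c=3):
    # Hypothetical variant of display that returns the text.
    return 'a: {} b: {} c: {}'.format(a, b, c)

# 55 is matched to a by position; c is set by name; b keeps its default.
print(describe(55, c=77))

# Named values can be given in any order.
print(describe(c=77, a=55))

# A positional value may not follow a named one. Python rejects this
# before running the code, so it is shown here only as a comment:
# describe(a=55, 66)   # SyntaxError
```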

With that in hand,
let's look at the help for `numpy.loadtxt`:

In [None]:
help(numpy.loadtxt)

There's a lot of information here,
but the most important part is the first couple of lines:

~~~python
loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None,
        unpack=False, ndmin=0)
~~~

This tells us that `loadtxt` has one parameter called `fname` that doesn't have a default value,
and, in the version shown here, eight others that do (the exact list depends on your NumPy version).
If we call the function like this:

~~~python
numpy.loadtxt('inflammation-01.csv', ',')
~~~

then the filename is assigned to `fname` (which is what we want),
but the delimiter string `','` is assigned to `dtype` rather than `delimiter`,
because `dtype` is the second parameter in the list.
That's why we don't have to provide `fname=` for the filename,
but *do* have to provide `delimiter=` for the second parameter.
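The same information that `help` prints can be examined from code. As a sketch, the standard-library function `inspect.signature` lists each parameter of our own `center` function together with its default, if it has one. The function is only inspected here, never called, so NumPy is not needed.

```python
import inspect

def center(data, desired=0.0):
    '''Return a new array containing the original data centered around the desired value (0 by default).'''
    return (data - data.mean()) + desired

signature = inspect.signature(center)
print(signature)

for name, parameter in signature.parameters.items():
    if parameter.default is inspect.Parameter.empty:
        print(name, 'has no default, so a value must always be passed')
    else:
        print(name, 'defaults to', parameter.default)
```

Parameters without a default, like `data` here or `fname` in `loadtxt`, are the ones a caller must always supply.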

#### Challenges

1.  Rewrite the `rescale` function so that it scales data to lie between 0.0 and 1.0 by default,
    but will allow the caller to specify lower and upper bounds if they want.
    Compare your implementation to your neighbor's:
    do the two functions always behave the same way?

#### Key Points

*   Define a function using `def name(...params...)`.
*   The body of a function must be indented.
*   Call a function using `name(...values...)`.
*   Numbers are stored as integers or floating-point numbers.
*   Integer division (`//` in Python 3) produces the whole part of the answer (not the fractional part).
*   Each time a function is called, a new stack frame is created on the [call stack](./gloss.html#call-stack) to hold its parameters and local variables.
*   Python looks for variables in the current stack frame before looking for them at the top level.
*   Use `help(thing)` to view help for something.
*   Put docstrings in functions to provide help for that function.
*   Specify default values for parameters when defining a function using `name=value` in the parameter list.
*   Parameters can be passed by matching based on name, by position, or by omitting them (in which case the default value is used).
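
The two key points about stack frames can be seen in a short sketch. The names `offset`, `shift_global`, and `shift_local` are made up for this illustration.

```python
offset = 10   # defined at the top level

def shift_global(value):
    # No local variable called offset exists in this function's frame,
    # so Python falls back to the top-level one.
    return value + offset

def shift_local(value):
    # This offset lives in the function's own stack frame and hides
    # the top-level variable while the function runs.
    offset = 1
    return value + offset

print(shift_global(5))   # 15
print(shift_local(5))    # 6
print(offset)            # 10: the top-level variable is unchanged
```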