### CS102/CS103

Prof. Götz Pfeiffer<br />
School of Mathematics, Statistics and Applied Mathematics<br />
NUI Galway

# Lecture 7: Functions

Functions wrap up well-defined pieces
of functionality into re-usable boxes.  Functions have names by which they can be called,
and they provide natural places for the input, processing and output (I, P and O) parts
of a typical program.  We have already used functions
like `print()`, `input()` and `math.sqrt()`.  Here we will discuss
how to define your own functions, as in the examples of the
`chaos()` and `convert()` functions from previous lectures.

## `jupyter` worksheets

Before that, a few words on the Computer Labs.

1. If anything is unclear, let us (me or the demonstrators) know.

2. Be patient with the software.
https://try.jupyter.org is back online.
If you can, complete this week's assignment before Friday, 5pm.
If you cannot, let me know before Friday.
We are also working on a more stable local solution.

3. Alternatively, download your own copy of the `jupyter` software
as part of the `anaconda` distribution at
https://www.anaconda.com/download/ (available for `linux`, `macos` and
`windows`).

4. When submitting your completed notebook, first select
`Kernel > Restart & Run All`.  This executes all `python` cells
in the notebook, and it should not produce any errors.

5. Finally, make sure to submit the notebook (`*.ipynb`)
version, not a `pdf` or `html` file.

##  A Function Example

Suppose that you are given a random DNA sequence (as a `python` string),
and you want to determine its AT content, that is
the proportion of letters `A` and `T` in the given sequence.

A brief **analysis** of the problem suggests that,
if you know the length $l$ of the DNA string,
the number $a$ of letters `A` in the string
and the number $t$ of letters `T` in the string,
the formula
$$c = \frac{a + t}{l}$$
yields the desired answer.

A program that
solves the problem of determining the AT content of a given
DNA sequence would have as its **specification**
a DNA sequence as its **input** and the AT content of that 
sequence as its **output**.

You might remember (or have researched this on the internet)
that the function `len()` can determine the length of a string,
and that the method `count()` can be applied to determine the
number of occurrences of a letter in a string.
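
As a quick check of these two tools, here is a small sketch on a short made-up sequence. The string `'GATTACA'` and the variable names `seq`, `length`, `num_a` and `num_t` are illustrative assumptions, not part of the exercise.

```python
# a short, made-up sequence, used only for illustration
seq = 'GATTACA'

length = len(seq)        # number of letters in the string
num_a = seq.count('A')   # number of occurrences of the letter 'A'
num_t = seq.count('T')   # number of occurrences of the letter 'T'

print(length, num_a, num_t)
```

Here the string has 7 letters, of which 3 are `A` and 2 are `T`.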

This then suggests the following design of a program that
solves the problem of determining the AT content of a given
DNA sequence.

1. **Input** the DNA sequence as a string, call it `dna`.

2. Use `len()` and `count()` in an **expression**
that computes the AT content, which is then **assigned**
to a variable called `at`.

3. **Output** the value of `at`.

In the **minimal IPO format** from this week's practical, an **implementation** of this design looks as follows:

In [1]:
# this program determines the AT content of a DNA sequence.
dna = 'TAGTADDAAGDDAGTAGTTGADGDGTDGDADGGDGD'      # (I)nput
at = (dna.count('A') + dna.count('T')) / len(dna) # (P)rocess
at                                                # (O)utput

0.3888888888888889

Chances are that if you have done this once, there are AT contents of other
DNA sequences waiting to be determined.  To process another one,
you can go back to the cell above and replace its
`dna` string with a new one.  Here, for better readability,
we use a copy of that cell, with a different value for `dna`.
The rest remains the same.

In [2]:
# this program determines the AT content of a DNA sequence.
dna = 'TDTGGTDDTTDDATAGGDDATTGAADTDGGDGTADG'      # (I)nput
at = (dna.count('A') + dna.count('T')) / len(dna) # (P)rocess
at                                                # (O)utput

0.4444444444444444

## From IPO to Functions

Functions are a powerful tool for structuring programs. 
For starters, a function provides dedicated spaces for the
I, P, O and other parts of a program.  The function version 
of the AT content calculation,  called `at_content`, say,
looks as follows.

In [3]:
def at_content(dna):
    "this function computes the AT content of a DNA sequence"
    at = (dna.count('A') + dna.count('T')) / len(dna)
    return at

at_content('TAGTADDAAGDDAGTAGTTGADGDGTDGDADGGDGD')

0.3888888888888889

This cell contains **two statements**: a **function definition**
and a **function call**.  We focus on the function definition.  Its
general form is
```
def <name>(<parameters>):
    <body>
```
and consists of a **heading**, from the keyword `def` to the colon (`:`),
and a **body**, which is a sequence of statements.

### (I)nput

Our function heading
```python
def at_content(dna):
```
assigns the **name** `at_content`
to the function to be defined.

It also declares an **input parameter** `dna`,
to be used in the program with whatever
value is given to the function call.
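
To illustrate how the parameter picks up its value, here is a minimal sketch. The sequence `'GATTACA'` and the variable names `sample`, `first` and `second` are made up for this illustration; the function is the one defined above.

```python
def at_content(dna):
    "this function computes the AT content of a DNA sequence"
    at = (dna.count('A') + dna.count('T')) / len(dna)
    return at

# the argument can be a literal string ...
first = at_content('GATTACA')

# ... or a variable; its name need not match the parameter name `dna`
sample = 'GATTACA'
second = at_content(sample)

# both calls give the parameter `dna` the same value
print(first == second)
```

In both calls, the name `dna` inside the function body refers to the string `'GATTACA'`, so the two results agree.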

### (P)rocess

The Processing part of the program is literally the 
same as in the previous version:
```python
    at = (dna.count('A') + dna.count('T')) / len(dna)
```
except that,
as part of the function body, it is **indented**
by four spaces.

### (O)utput

The last statement in the function body
```python
    return at
```
is a **return statement**.
The value returned here becomes the value
of the **function call expression**
```python
at_content('TAGTADDAAGDDAGTAGTTGADGDGTDGDADGGDGD')
```
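
Since a function call is an expression, its value can be assigned to a variable or used inside a larger expression. The following sketch shows both; the variable names `fraction` and `percent` and the conversion to a percentage are assumptions made for illustration.

```python
def at_content(dna):
    "this function computes the AT content of a DNA sequence"
    at = (dna.count('A') + dna.count('T')) / len(dna)
    return at

dna = 'TAGTADDAAGDDAGTAGTTGADGDGTDGDADGGDGD'

# assign the returned value to a variable
fraction = at_content(dna)

# use the returned value inside a larger expression
percent = round(100 * at_content(dna), 1)

print(fraction)
print(percent)
```

The first line printed is the AT content as a fraction, the second is the same quantity as a percentage, rounded to one decimal place.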

### Documenting

The function even provides a space to
document its purpose.  This is a so-called
**docstring**,
and it appears as the first statement of the function body.
This documentation can be accessed with the built-in `help()` function:

In [4]:
help(at_content)

Help on function at_content in module __main__:

at_content(dna)
    this function computes the AT content of a DNA sequence
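
As a side note, `python` also stores this text in the function's `__doc__` attribute, so it can be read directly as a string. A minimal sketch, using the function as defined above:

```python
def at_content(dna):
    "this function computes the AT content of a DNA sequence"
    at = (dna.count('A') + dna.count('T')) / len(dna)
    return at

# the docstring is available as an ordinary string
print(at_content.__doc__)
```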



To find the AT content of another DNA string, you simply call the 
function again, with the new DNA string as **argument**.

In [5]:
at_content('TDTGGTDDTTDDATAGGDDATTGAADTDGGDGTADG')

0.4444444444444444

The function call captures the **input** as arguments inside
the parentheses, refers to the named function 
for the **processing** of the input data, and
then receives as its value the **output** of the function body.
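
Once the function is defined, processing many sequences takes one call per sequence. The sketch below uses a list and a `for` loop, which may not have been covered yet; the names `sequences` and `results` are assumptions made for illustration.

```python
def at_content(dna):
    "this function computes the AT content of a DNA sequence"
    at = (dna.count('A') + dna.count('T')) / len(dna)
    return at

# the two DNA sequences used earlier in this lecture
sequences = [
    'TAGTADDAAGDDAGTAGTTGADGDGTDGDADGGDGD',
    'TDTGGTDDTTDDATAGGDDATTGAADTDGGDGTADG',
]

results = []
for dna in sequences:
    # one function call per sequence; the body is written only once
    results.append(at_content(dna))

print(results)
```

The list `results` then holds the same two values that were computed above, one cell at a time.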

## Another Example

Here is the program that computes the average of two exam scores,
in minimal IPO format:

In [6]:
# this program computes the average of two exam scores.
score1, score2 = 99, 96
average = (score1 + score2) / 2
average

97.5

This becomes the following **function**, illustrating how
a function can have **more than one input parameter**.

In [7]:
def average2(score1, score2):
    "This function computes the average of two exam scores"
    average = (score1 + score2) / 2
    return average

average2(99, 96)

97.5

In [8]:
average2(12, 35)

23.5

In [9]:
help(average2)

Help on function average2 in module __main__:

average2(score1, score2)
    This function computes the average of two exam scores



## Summary: Functions

* A `python` program can (and usually does) contain **many functions**.

* A function can be considered a **subprogram** - a small program inside a larger one.

* Once defined, a function can be **invoked** (or called) later,
possibly many times.

* Functions provide **structure** in a program, making it easier
to write, read, understand, and debug.

* Functions can make a large program smaller, by reducing
the amount of repetition.

* Functions appear in **two different roles**:
as a function definition and as a function call.

* A **function definition** is a **statement** that
names and defines the parts of a **new** function.

* A **function call** is an **expression**
that causes (the statements in the body of) a **known** function
to be executed.

* A function call takes **arguments** as input
for the named **parameters** which form part of the
function definition.

* The **function body** processes the input data
and uses the parameter names to refer to them.

* A **return statement** in the function body
determines the **value** of the function call expression.
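
The points above can be traced in a short annotated sketch, based on the `average2` function from this lecture. The variable name `result` is an assumption made for illustration.

```python
# function definition: a statement that names and defines a new function
def average2(score1, score2):          # score1 and score2 are the parameters
    "This function computes the average of two exam scores"
    average = (score1 + score2) / 2    # the body refers to the input by parameter name
    return average                     # this value becomes the value of the call

# function call: an expression; 99 and 96 are the arguments
result = average2(99, 96)
print(result)
```

The definition alone computes nothing; only the call in the second-to-last line causes the statements in the body to be executed.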