Modified from _Introduction to Programming in the Biological Sciences Bootcamp_ © 2022 Justin Bois (Caltech). Download the original notebook [here](https://justinbois.github.io/bootcamp/2023/lessons/l07_intro_to_functions.html).

# Lesson 7: Introduction to functions

<hr>

### The objective of this lesson is to describe functions in Python:
1. Scope
2. Defining a function
3. Calling a function
4. Return values
5. Developing a function step by step (i.e., start simple)

A **function** is a key element in writing programs. You can think of a function in a computing language in much the same way you think of a mathematical function. The function takes in **arguments**, performs some operation based on the identities of the arguments, and then **returns** a result. For example, the mathematical function

\begin{align}
f(x, y) = \frac{x}{y}
\end{align}

takes arguments $x$ and $y$ and then returns the ratio between the two, $x/y$. In this lesson, we will learn how to construct functions in Python. 

First, here is some code that calculates a ratio and prints it:

In [1]:
x = 2
y = 4
result = x / y
print(result)

0.5


## Basic function syntax

For our first example, we will translate the above function into Python. A function is **defined** using the `def` keyword. This is best seen by example.

In [2]:
def ratio(x, y):
    """The ratio of `x` to `y`."""
    return x / y

Following the `def` keyword is a **function signature**, which gives the function's name and its arguments. Just like in mathematics, the arguments are separated by commas and enclosed in parentheses. The signature ends with a colon, and the indented block below the `def` line is the body of the function. As soon as the indentation returns to the level of `def`, the function definition is complete.

Immediately following the function signature is the **doc string** (short for documentation string), a brief description of the function. If the first statement in the function body is a string, Python treats it as the doc string. Doc strings are usually written in triple quotes, since they often span multiple lines.

Doc strings are more than just comments in your code: the doc string is what the built-in function `help()` displays when someone wants to learn more about your function. For example:

In [3]:
help(ratio)

Help on function ratio in module __main__:

ratio(x, y)
    The ratio of `x` to `y`.



They are also printed out when you use the `?` in a Jupyter notebook or JupyterLab console.

In [4]:
ratio?

[0;31mSignature:[0m [0mratio[0m[0;34m([0m[0mx[0m[0;34m,[0m [0my[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m The ratio of `x` to `y`.
[0;31mFile:[0m      ~/school/BME_JumpStart_Drafts/Day 1/<ipython-input-2-ad32d3e3b4db>
[0;31mType:[0m      function


You are free to type whatever you like in doc strings, or even omit them, but you should always have a doc string with some information about what your function is doing. True, this example of a function is kind of silly, since it is easier to type `x / y` than `ratio(x, y)`, but it is still good form to have a doc string. This is worth saying explicitly.

<div style="color: dodgerblue; text-align: center; font-weight: bold;">

All functions should have doc strings.
    
</div>

In the next line of the function, we see a **return** keyword. Whatever is after the **return** statement is, you guessed it, returned by the function. Any code after the **return** is *not* executed because the function has already returned!
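The fact that nothing after `return` runs is what makes an "early return" useful. In this minimal sketch, the hypothetical function `first_gc_index()` stops scanning a sequence as soon as it finds a G or C:

```python
def first_gc_index(seq):
    """Return the index of the first G or C in `seq`, or None if there is none."""
    for i, bp in enumerate(seq):
        if bp == 'G' or bp == 'C':
            # `return` exits the function right away, so the rest of the
            # loop (and everything after it) is skipped
            return i

    # Only reached if the loop finished without finding a G or C
    return None


print(first_gc_index('TTGTTAG'))  # 2
print(first_gc_index('TTAA'))     # None
```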

### Calling a function

Now that we have defined our function, we can **call** it. You call a function by typing the function name followed by the arguments in parentheses. Functions are always called with parentheses.

Since we're working in a Jupyter notebook, the value of the last expression in a cell is displayed as output even without calling `print()`:

In [5]:
# call the function; Jupyter displays the return value
ratio(5, 4)

1.25

In [6]:
# call the function and explicitly print the return value
print(ratio(4, 2))

2.0


In [7]:
# assign the return value to a new variable and print the return value
my_ratio = ratio(90.0, 8.4)
print(my_ratio)

10.714285714285714


### Functions need not have arguments

A function does not need arguments. As a silly example, let's consider a function that just returns 42 every time. Since its result never depends on any input, we can define it without arguments.

In [8]:
def answer_to_the_ultimate_question_of_life_the_universe_and_everything():
    """Simpler program than Deep Thought's, I bet."""
    return 42

We still need the opening and closing parentheses after the function name in the definition. Similarly, even though the function has no arguments, we still have to call it with parentheses.

In [9]:
answer_to_the_ultimate_question_of_life_the_universe_and_everything()

42
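Leaving off the parentheses does not call the function; the name on its own refers to the function object. A short sketch shows the difference (the variable names `result` and `func` are just for illustration):

```python
def answer_to_the_ultimate_question_of_life_the_universe_and_everything():
    """Simpler program than Deep Thought's, I bet."""
    return 42


# With parentheses: the function is called and we get its return value
result = answer_to_the_ultimate_question_of_life_the_universe_and_everything()
print(result)  # 42

# Without parentheses: nothing is called; `func` is the function object itself
func = answer_to_the_ultimate_question_of_life_the_universe_and_everything
print(callable(func))  # True
print(func())          # 42
```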

### Functions need not return anything

Just like they do not necessarily need arguments, functions also do not need to return anything. If a function does not have a `return` statement (or it is never encountered in the execution of the function), the function runs to completion and returns `None` by default. `None` is a special Python keyword which basically means "nothing." For example, a function could simply print something to the screen.

In [10]:
def think_too_much():
    """Express Caesar's skepticism about Cassius"""
    print("""Yond Cassius has a lean and hungry look,
He thinks too much; such men are dangerous.""")

We call this function like any other, and we can show that the value it returns is `None`.

In [11]:
return_val = think_too_much()

# Print a blank line
print()

# Print the return value
print(return_val)

Yond Cassius has a lean and hungry look,
He thinks too much; such men are dangerous.

None
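The same holds when a function does contain a `return` statement that simply isn't reached. In this sketch, the hypothetical function `describe_base()` only returns a value for G or C; for any other base it runs to the end and returns `None`:

```python
def describe_base(bp):
    """Return 'strong' for G or C; otherwise fall through and return None."""
    if bp == 'G' or bp == 'C':
        return 'strong'
    # No return statement is reached for other bases, so None is returned


print(describe_base('G'))  # strong
print(describe_base('A'))  # None
```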


## Built-in functions in Python

The Python programming language has several built-in functions. We have already encountered `print()`, `id()`, `ord()`, `len()`, `range()`, `enumerate()`, `zip()`, and `reversed()`, in addition to type conversions such as `list()`. The complete set of **built-in functions** can be found [here](https://docs.python.org/3/library/functions.html). A word of warning about naming your own functions and variables:

<div style="color: dodgerblue; text-align: center; font-weight: bold;">

Never define a function or variable with the same name as a built-in function.
    
</div>
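To see why, here is a sketch with two hypothetical functions. In the first, a local variable named `len` hides the built-in `len()`, so the later call fails; the second uses a different name and works:

```python
def total_length_buggy(seqs):
    """Intended to sum the lengths of sequences, but shadows len()."""
    len = 0                    # BAD: the local name `len` hides the built-in
    for s in seqs:
        len = len + len(s)     # `len` is an int here, so calling it fails
    return len


def total_length(seqs):
    """Sum the lengths of sequences, using a non-conflicting name."""
    total = 0
    for s in seqs:
        total = total + len(s)  # `len` still refers to the built-in function
    return total


try:
    total_length_buggy(['ATG', 'GATTACA'])
except TypeError as err:
    print('TypeError:', err)   # 'int' object is not callable

print(total_length(['ATG', 'GATTACA']))  # 10
```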

Additionally, Python has **keywords** (such as `def`, `for`, `in`, `if`, `True`, `None`, etc.), many of which we have already encountered. A complete list of them is [here](https://docs.python.org/3/reference/lexical_analysis.html#keywords). The interpreter will raise a `SyntaxError` if you try to use a keyword as the name of a function or variable.
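Python's standard library can tell you whether a name is already taken. This sketch uses `iskeyword()` from the `keyword` module and checks the `builtins` module for a built-in of the same name; `classify_name()` is a hypothetical helper written for illustration:

```python
import builtins
import keyword


def classify_name(name):
    """Classify a candidate name for a function or variable."""
    if keyword.iskeyword(name):
        return 'keyword (cannot be used as a name)'
    if hasattr(builtins, name):
        return 'built-in (legal, but shadows it)'
    return 'ok'


for name in ['gc_content', 'len', 'def', 'None']:
    print(name, '->', classify_name(name))
```

If you want to see every keyword at once, `keyword.kwlist` holds the full list.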

Here's a fun example of using a Python built-in:

In [14]:
help(help)

Help on _Helper in module _sitebuiltins object:

class _Helper(builtins.object)
 |  Define the builtin 'help'.
 |  
 |  This is a wrapper around pydoc.help that provides a helpful message
 |  when 'help' is typed at the Python interactive prompt.
 |  
 |  Calling help() at the Python prompt starts an interactive help session.
 |  Calling help(thing) prints help for the python object 'thing'.
 |  
 |  Methods defined here:
 |  
 |  __call__(self, *args, **kwds)
 |      Call self as a function.
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)



## Example function: GC content

Below is an example we saw earlier on how to use a `for` loop to calculate the GC content of a sequence. 

Write a function that takes a sequence as input and returns its GC content as a fraction. Call your function on `seq` and verify that you get the same result.

In [10]:
seq = 'TTGTTAGTGATGGCTAAATATAG'

# GC content computed with a for loop
gc_count = 0
for bp in seq:
    if bp == 'G' or bp == 'C':
        gc_count = gc_count + 1

gc_content = gc_count / len(seq)
gc_content

0.30434782608695654

In [11]:
# Your solution here
def calc_gc_content(seq):
    """ Calculate the GC content given a sequence string """
    gc_count = 0
    for bp in seq:
        if bp == 'G' or bp == 'C':
            gc_count = gc_count + 1

    gc_content = gc_count / len(seq)
    return gc_content

calc_gc_content(seq=seq)

0.30434782608695654
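As a cross-check, Python strings also have a `count()` method that counts non-overlapping occurrences of a substring. A sketch using it (the name `calc_gc_content_count()` is hypothetical) gives the same answer without an explicit loop:

```python
def calc_gc_content_count(seq):
    """Calculate the GC content of a sequence string using str.count()."""
    gc_count = seq.count('G') + seq.count('C')
    return gc_count / len(seq)


seq = 'TTGTTAGTGATGGCTAAATATAG'
print(calc_gc_content_count(seq))  # 0.30434782608695654
```

Like the loop version, this only counts uppercase `G` and `C`.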

## Example: reverse complement

A reverse complement is what it sounds like: given a DNA sequence, reverse it and replace each base with its complement. For example,

`AATG`

is reversed to 

`GTAA`

and the complement of the reversed sequence is

`CATT`

You could do these operations in either order. Knowing the reverse complement is important for a variety of reasons, for example when designing primers for a PCR experiment.
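To convince yourself that the order doesn't matter, here is a quick sketch. It uses a hypothetical dictionary `COMPLEMENT` as a lookup table and string slicing (`[::-1]`) to reverse, which is a different approach from the one the exercise below asks for:

```python
# Hypothetical lookup table mapping each DNA base to its complement
COMPLEMENT = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}


def complement(seq):
    """Complement each base of `seq` without reversing it."""
    return ''.join(COMPLEMENT[base] for base in seq)


def reverse(seq):
    """Reverse `seq` using slicing with a step of -1."""
    return seq[::-1]


seq = 'AATG'
print(reverse(seq))              # GTAA
print(complement(reverse(seq)))  # CATT (reverse first, then complement)
print(reverse(complement(seq)))  # CATT (complement first, then reverse)
```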

Write a function `reverse_complement(sequence)` that computes the reverse complement of a sequence. Do the following when writing this function:
1. Write another function `complement_base(base)` that returns a complement for a single base. 
2. Call `complement_base()` inside of `reverse_complement()`

In [4]:
# your solution here:
def complement_base(base):
    """Returns the Watson-Crick complement of a base."""
    if base in 'Aa':
        return 'T'
    elif base in 'Tt':
        return 'A'
    elif base in 'Gg':
        return 'C'
    else:
        return 'G'


def reverse_complement(seq):
    """Compute reverse complement of a sequence."""
    # Initialize reverse complement
    rev_seq = ''
    
    # Loop through the sequence backward, appending each complementary base
    for base in reversed(seq):
        rev_seq += complement_base(base)
        
    return rev_seq

Note that we do not have error checking here, which we should definitely do, but we'll cover that in a future lesson. For now, let's test it to see if it works.

In [5]:
reverse_complement('GCAGTTGCA')

'TGCAACTGC'

Note that `complement_base()` and `reverse_complement()` are compartmentalized. If another function needs the complement of a base, the necessary function already exists. In programming, the idea of compartmentalizing your code so it is reusable is called _modularity_. If your code is well designed, you'll be able to mix and match the modules of code you write for future use.

A related programming practice is to write DRY code: Don't Repeat Yourself. If you find yourself writing the same block of code over and over, it's a good idea to refactor it into a function.
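As a small illustration, suppose we want the GC content of several sequences. Rather than copying the counting loop once per sequence, we can call the `calc_gc_content()` function from earlier (repeated here so the block runs on its own) inside a loop; the sequences are made up for illustration:

```python
def calc_gc_content(seq):
    """Calculate the GC content given a sequence string."""
    gc_count = 0
    for bp in seq:
        if bp == 'G' or bp == 'C':
            gc_count = gc_count + 1
    return gc_count / len(seq)


# One function call per sequence instead of one copied loop per sequence;
# a fix to calc_gc_content() now applies everywhere at once
seqs = ['ATGC', 'GGGG', 'ATAT']
for s in seqs:
    print(s, calc_gc_content(s))
```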

## Example: reverse complement continued
`reverse_complement()` looks good, but we might want to write yet another function to display the template strand (from 5$'$ to 3$'$) above its reverse complement (from 3$'$ to 5$'$). This makes the result easy to check by eye.

Write a new function that uses `reverse_complement()` to display the original sequence on one line, a line of pipe characters (`|`) below it to indicate base pairing, and, on a third line, the reverse complement written 3$'$ to 5$'$ so that each base lines up with its partner. For example:

```python
seq = 'GCAGTTGCA'
display_complements(seq)
```
Will display: 
```
GCAGTTGCA
|||||||||
CGTCAACGT
```


In [15]:
# Your solution here:
def display_complements(seq):
    """Print sequence above its reverse complement."""
    # Compute the reverse complement
    rev_comp = reverse_complement(seq)
    
    # Print template
    print(seq)
    
    # Print "base pairs"
    for base in seq:
        print('|', end='')
    
    # Print final newline character after base pairs
    print()
            
    # Print reverse complement
    for base in reversed(rev_comp):
        print(base, end='')
        
    # Print final newline character
    print()

Let's call this function and display the input sequence and the reverse complement returned by the function.

In [16]:
seq = 'GCAGTTGCA'
display_complements(seq)

GCAGTTGCA
|||||||||
CGTCAACGT


OK, now it's easy to see that the result is correct! This example demonstrates an important programming principle regarding functions: we used three functions to compute and display the reverse complement.

1. `complement_base()` gives the Watson-Crick complement of a given base.
2. `reverse_complement()` computes the reverse complement.
3. `display_complements()` displays the sequence and the reverse complement.

We could very well have written a single function to compute the reverse complement with the `if` statements included within the `for` loop.  Instead, we split this larger operation up into smaller functions. This is an example of **modular** programming, in which the desired functionality is split up into small, independent, interchangeable modules. This is a very, very important concept.

<div style="color: dodgerblue; text-align: center; font-weight: bold;">

Write small functions that do single, simple tasks.
    
</div>
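For comparison, here is a more compact sketch of the same display. It reuses the DNA versions of `complement_base()` and `reverse_complement()` from above (repeated so the block runs on its own) and adds a hypothetical helper, `complements_text()`, that builds the three lines as a single string instead of printing them piece by piece:

```python
def complement_base(base):
    """Returns the Watson-Crick complement of a base."""
    if base in 'Aa':
        return 'T'
    elif base in 'Tt':
        return 'A'
    elif base in 'Gg':
        return 'C'
    else:
        return 'G'


def reverse_complement(seq):
    """Compute reverse complement of a sequence."""
    rev_seq = ''
    for base in reversed(seq):
        rev_seq += complement_base(base)
    return rev_seq


def complements_text(seq):
    """Return the sequence, pairing bars, and aligned complement as one string."""
    rev_comp = reverse_complement(seq)
    # '|' * n repeats the pipe n times; [::-1] reverses a string, so the
    # complement is written 3' to 5' and lines up with the template
    return '\n'.join([seq, '|' * len(seq), rev_comp[::-1]])


print(complements_text('GCAGTTGCA'))
```

Returning a string rather than printing keeps the formatting logic separate from the output, which also makes the function easy to test: you can compare its result with an expected string.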


## Pause and think about testing

Let's pause for a moment and think about what the `complement_base()` and `reverse_complement()` functions do. They perform a well-defined operation on string inputs. If we're doing some bioinformatics, we might use these functions over and over again. We should therefore thoroughly **test** the functions. For example, we should test that `reverse_complement('GCAGTTGCA')` returns `'TGCAACTGC'`. For now, we will proceed without writing tests, but we will soon cover **test-driven development**, in which your functions are built around tests. Until then, I will tell you this: **If your functions are not thoroughly tested, you are entering a world of pain. A world of pain.** Test your functions.
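As a first taste of what testing can look like, here is a minimal sketch using plain `assert` statements, which raise an `AssertionError` if their condition is false. The test function names are hypothetical, and the DNA versions of the functions are repeated so the block runs on its own:

```python
def complement_base(base):
    """Returns the Watson-Crick complement of a base."""
    if base in 'Aa':
        return 'T'
    elif base in 'Tt':
        return 'A'
    elif base in 'Gg':
        return 'C'
    else:
        return 'G'


def reverse_complement(seq):
    """Compute reverse complement of a sequence."""
    rev_seq = ''
    for base in reversed(seq):
        rev_seq += complement_base(base)
    return rev_seq


def test_complement_base():
    """Check every uppercase base against its expected partner."""
    for base, expected in [('A', 'T'), ('T', 'A'), ('G', 'C'), ('C', 'G')]:
        assert complement_base(base) == expected


def test_reverse_complement():
    """Check known answers and a property of the reverse complement."""
    assert reverse_complement('GCAGTTGCA') == 'TGCAACTGC'
    assert reverse_complement('AATG') == 'CATT'
    # Taking the reverse complement twice should give back the original
    assert reverse_complement(reverse_complement('GCAGTTGCA')) == 'GCAGTTGCA'
    # An empty sequence has an empty reverse complement
    assert reverse_complement('') == ''


test_complement_base()
test_reverse_complement()
print('All tests passed.')
```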

## Keyword arguments

Now let's say that instead of the reverse DNA complement, we want the reverse RNA complement. We could write a separate RNA version of `complement_base()`, but better yet, let's modify the existing function so that it handles both.

In [17]:
def complement_base(base, material='DNA'):
    """Returns the Watson-Crick complement of a base."""
    if base in 'Aa':
        if material == 'DNA':
            return 'T'
        elif material == 'RNA':
            return 'U'
    elif base in 'TtUu':
        return 'A'
    elif base in 'Gg':
        return 'C'
    else:
        return 'G'
    
def reverse_complement(seq, material='DNA'):
    """Compute reverse complement of a sequence."""
    # Initialize reverse complement
    rev_seq = ''
    
    # Loop through the sequence backward, appending each complementary base
    for base in reversed(seq):
        rev_seq += complement_base(base, material=material)
        
    return rev_seq

We have added a **named keyword argument**, also known as a **named kwarg**. The syntax for a named kwarg is

    kwarg_name=default_value
    
in the `def` clause of the function definition. In this case, we say that the default material is DNA, but we could call the function with another material (RNA). Conveniently, when you call the function and omit the kwargs, they take on the default value within the function. So, if we wanted to use the default material of DNA, we don't have to do anything different in the function call.

In [18]:
reverse_complement('GCAGTTGCA')

'TGCAACTGC'

But, if we want RNA, we can use the kwarg. We use the same syntax to call it that we did when defining it.

In [19]:
reverse_complement('GCAGTTGCA', material='RNA')

'UGCAACUGC'
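A parameter with a default value can also be passed by position, and when every argument is passed by keyword, their order no longer matters. A short sketch, reusing the RNA-aware functions defined above (repeated so the block runs on its own):

```python
def complement_base(base, material='DNA'):
    """Returns the Watson-Crick complement of a base."""
    if base in 'Aa':
        if material == 'DNA':
            return 'T'
        elif material == 'RNA':
            return 'U'
    elif base in 'TtUu':
        return 'A'
    elif base in 'Gg':
        return 'C'
    else:
        return 'G'


def reverse_complement(seq, material='DNA'):
    """Compute reverse complement of a sequence."""
    rev_seq = ''
    for base in reversed(seq):
        rev_seq += complement_base(base, material=material)
    return rev_seq


# All three calls pass material='RNA' to the same function
print(reverse_complement('GCAGTTGCA', 'RNA'))               # by position
print(reverse_complement('GCAGTTGCA', material='RNA'))      # by keyword
print(reverse_complement(material='RNA', seq='GCAGTTGCA'))  # keywords, any order
```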

# Extra content: on your own

## Calling a function with a splat

Python offers another convenient way to call functions. Say a function takes three arguments, `a`, `b`, and `c`, taken to be the sides of a triangle, and determines whether or not the triangle is a right triangle; i.e., it checks whether $a^2 + b^2 = c^2$, where $c$ is the longest side.

In [20]:
def is_almost_right(a, b, c):
    """
    Checks to see if a triangle with side lengths
    `a`, `b`, and `c` is right.
    """
    # Use sorted(), which gives a sorted list
    a, b, c = sorted([a, b, c])
    
    # Check to see if it is almost a right triangle
    if abs(a**2 + b**2 - c**2) < 1e-12:
        return True
    else:
        return False

Remember our warning from before: avoid exact equality checks with `float`s. We therefore just check whether the Pythagorean theorem *almost* holds, to within a tolerance of $10^{-12}$. The function works as expected.

In [21]:
is_almost_right(13, 5, 12)

True

In [22]:
is_almost_right(1, 1, 1.4)

False
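The tolerance is needed because most decimal fractions cannot be stored exactly as `float`s, so arithmetic that is exact on paper can be off in the last digits. A quick check:

```python
x = 0.1 + 0.2

print(x)                     # 0.30000000000000004
print(x == 0.3)              # False: exact equality fails
print(abs(x - 0.3) < 1e-12)  # True: equal to within a small tolerance
```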

Now, let's say we had a tuple with the triangle side lengths in it.

In [22]:
side_lengths = (13, 5, 12)

We can pass the elements of the tuple as separate arguments by putting a `*` in front of it. A `*` used this way is called an **unpacking operator**, and some programmers refer to it as a "splat."

In [23]:
is_almost_right(*side_lengths)

True

This can be very convenient, and we will definitely use this feature later in the bootcamp when we do some string formatting.
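The splat is not limited to tuples. In this sketch, `is_almost_right()` is restated (with the `if`/`else` condensed into a single `return` of the same comparison) and called with a list; the second call shows what happens if you forget the `*`:

```python
def is_almost_right(a, b, c):
    """
    Checks to see if a triangle with side lengths
    `a`, `b`, and `c` is right.
    """
    a, b, c = sorted([a, b, c])
    return abs(a**2 + b**2 - c**2) < 1e-12


# The splat works on a list just as it does on a tuple
side_lengths = [3.0, 4.0, 5.0]
print(is_almost_right(*side_lengths))  # True

# Without the *, the whole list is passed as the single argument `a`,
# so Python complains that `b` and `c` are missing
try:
    is_almost_right(side_lengths)
except TypeError as err:
    print('TypeError:', err)
```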

## Topics not in this tutorial
1. Anonymous (a.k.a. lambda) functions
2. Function 'args' vs 'kwargs'
3. Default arguments that are lists (danger!) 
4. Functions as variables

## More exercises:

1. Modify `is_almost_right()` to take an `error` argument that has a default value.
    1. Experiment with the default value (what value is 'good enough'?).
    2. Think about how you might determine programmatically what error value is good enough. We'll do plotting later, so describe what you would want to plot to identify a good value.
    3. Think about (or implement) some tests for the function.
2. Modify the `calc_gc_content` function from above to check that the input is:
    1. A string
    2. All uppercase or lowercase (you choose which to do checks against)
    3. Contains only valid characters