# Functions

Functions are a key component of programming. We have already used functions several times in this course. Some examples include `len()`, `range()`, and `max()`. In this section we will learn how to write our own functions. When programming, it often becomes important to conduct the same set of operations many times. Wrapping up these operations into a function allows us to write that code only once, then call the function each time that we want to conduct that particular set of operations. It also isolates that chunk of code into a single place, which really helps when we have to debug our code.

### Defining functions

To define a function, we follow a standard Python syntax. Say we want to create a function that adds two integers together. First, we give it a name, such as `add_these`. The definition starts with the keyword `def`, followed by the function name, a pair of parentheses, and a colon:

```
def add_these():
```

In this case, we wish to add two integers, so the function needs to accept some arguments. We list names for them inside the parentheses. Whatever values the caller passes to the function are automatically assigned to the variables `a` and `b`, **in that order**. Passing too many or too few arguments results in an error (a `TypeError`).

```
def add_these(a, b):
```

Now we need to conduct the actual operation. In Python, everything that belongs to the body of the function must be indented beneath the `def` line.

```
def add_these(a, b):
    the_sum = a + b
```

Now we have the sum assigned to the variable `the_sum`. However, functions are blocks of code that are meant to stand alone. This also means that the variables that we set *within* the function are not accessible *outside* of the function. In order to send information out of the function, we must use a `return` statement. In this example, we will return `the_sum`.

Let's go ahead and add the `return` statement to our function in this notebook:
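
To make the point about scope concrete, here is a minimal sketch. It uses a hypothetical function, `multiply_these`, and a hypothetical local variable, `the_product` (neither is part of the lesson's code), and shows that a name created inside a function cannot be read from outside it, while the returned value can.

```python
def multiply_these(a, b):
    # the_product is local: it exists only while the function is running
    the_product = a * b
    return the_product

# The returned value is available because we assign it to a new variable
result = multiply_these(2, 3)
print(result)

# Reading the local name from outside the function raises a NameError
try:
    print(the_product)
except NameError as err:
    scope_error = str(err)
    print("NameError:", scope_error)
```

The only way the value leaves the function here is through `return`; the name `the_product` itself never becomes visible to the rest of the notebook.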

In [2]:
def add_these(a, b):
    the_sum = a + b
    return the_sum

In order to access the sum, we call the function with some arguments (the integers that we want to add) and assign the value it returns to a variable.

In [3]:
my_sum = add_these(1, 5)
print(my_sum)

6


You can imagine that a function like this could be pretty useful in a large program. Anytime that you need to add things together, you can simply invoke `add_these()`. If you discover that you've done something wrong in your function, you only have to correct a single block of code.
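
As a small illustration of reuse and of the argument rules described earlier, the sketch below calls `add_these()` several times and then calls it with the wrong number of arguments. The helper `subtract_these` and the sample number pairs are assumptions made up for this example; subtraction is used only because, unlike addition, it makes the effect of argument order visible.

```python
def add_these(a, b):
    the_sum = a + b
    return the_sum

def subtract_these(a, b):
    # Hypothetical helper: the first argument becomes a, the second becomes b
    return a - b

# Swapping the arguments changes which value is assigned to a and to b
print(subtract_these(10, 3))  # 7
print(subtract_these(3, 10))  # -7

# The same function can be called as many times as we need
totals = []
for pair in [(1, 5), (2, 2), (10, -4)]:
    totals.append(add_these(pair[0], pair[1]))
print(totals)  # [6, 4, 6]

# Too few or too many arguments raises a TypeError
errors_seen = 0
for bad_args in [(1,), (1, 2, 3)]:
    try:
        add_these(*bad_args)
    except TypeError as err:
        errors_seen += 1
        print("TypeError:", err)
```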

Let's use an example from our homework. Imagine that you are writing a program in which you need to estimate the GC content of many different sequences. To do this, we can enclose some of the code from our homework in a function that estimates GC content. Here is an example:

In [1]:
def estimate_gc_content(sequence):
    upper_sequence = sequence.upper()
    gc = 0
    at = 0
    for nuc in upper_sequence:
        if nuc == "C" or nuc == "G":
            gc += 1
        elif nuc == "A" or nuc == "T":
            at += 1
        elif nuc == "\n":
            pass
        else:
            print("Warning: Uh oh, it looks like you have a character that is not A, C, T, or G." +
                  " This program only estimates GC content based on those characters. We" +
                  " will move on, but check your file to make sure your results are consistent.")
    gc_cont = (gc/(at + gc))
    return gc_cont
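
One way to check a function like `estimate_gc_content` is to try it on short sequences whose GC content can be worked out by hand. The sketch below does this with a compact, hypothetical variant called `gc_fraction`, which counts characters with `str.count` instead of a loop. It is an illustration under stated assumptions rather than a replacement: it silently ignores any character that is not A, C, G, or T, and returning `None` for a blank line is a choice made here to avoid dividing by zero.

```python
def gc_fraction(sequence):
    # Hypothetical variant: remove surrounding whitespace and normalize case
    seq = sequence.strip().upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        # Nothing to count (for example, a blank line): avoid dividing by zero
        return None
    return gc / (gc + at)

print(gc_fraction("GGCCAATT"))     # 0.5 (4 of 8 bases are G or C)
print(gc_fraction("ggccaatt\n"))   # 0.5 (case and newline do not matter)
print(gc_fraction("AAAT"))         # 0.0
print(gc_fraction("\n"))           # None
```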

Now that we have a nice, enclosed function to estimate GC content, our homework might look like this:

In [2]:
seq_file = open("sequences.txt")
gc_cont_list = []
for line in seq_file.readlines():
    gc_cont = estimate_gc_content(line)
    gc_cont_list.append(gc_cont)
max_gc = max(gc_cont_list)
print("Your maximum GC content is: " + str(max_gc) + "!")
seq_file.close()

Your maximum GC content is: 0.6!


Now let's see what happens if we open a file that contains invalid nucleotides.

In [3]:
seq_file = open("sequences_with_errors.txt")
gc_cont_list = []
for line in seq_file.readlines():
    gc_cont = estimate_gc_content(line)
    gc_cont_list.append(gc_cont)
max_gc = max(gc_cont_list)
print("Your maximum GC content is: " + str(max_gc) + "!")
seq_file.close()

Your maximum GC content is: 0.6!


Running this code prints a warning for every invalid character it encounters, and it isn't clear whether we can trust the answer. A better method might be to raise an error that stops the program. To do this, we can use the `raise` statement. In this case, we will raise a `ValueError`. Here is an updated function with these changes:

In [7]:
def estimate_gc_content(sequence):
    upper_sequence = sequence.upper()
    gc = 0
    at = 0
    for nuc in upper_sequence:
        if nuc == "C" or nuc == "G":
            gc += 1
        elif nuc == "A" or nuc == "T":
            at += 1
        elif nuc == "\n":
            pass
        else:
            raise ValueError("Sorry, it looks like you have a nucleotide that is not valid, please fix your data.")
    gc_cont = (gc/(at + gc))
    return gc_cont

Now let's try running the sequence with errors again:

In [8]:
seq_file = open("sequences_with_errors.txt")
gc_cont_list = []
for line in seq_file.readlines():
    gc_cont = estimate_gc_content(line)
    gc_cont_list.append(gc_cont)
max_gc = max(gc_cont_list)
print("Your maximum GC content is: " + str(max_gc) + "!")
seq_file.close()

ValueError: Sorry, it looks like you have a nucleotide that is not valid, please fix your data.

Now the output is quite a bit more useful: the program stops at the first invalid character with a clear message, and we don't receive a dubious answer.
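
Because the function now raises a `ValueError`, the code that calls it can also decide how to respond. The sketch below is one possible approach, not part of the homework: it uses `try`/`except` to record which lines are invalid and keeps processing the rest. The function is a condensed version of the one above with a shortened error message, and the list `lines` is made-up data standing in for the contents of a sequence file.

```python
def estimate_gc_content(sequence):
    # Condensed version of the function above, with a shortened message
    gc = 0
    at = 0
    for nuc in sequence.upper():
        if nuc in "CG":
            gc += 1
        elif nuc in "AT":
            at += 1
        elif nuc == "\n":
            pass
        else:
            raise ValueError("Invalid nucleotide: " + repr(nuc))
    return gc / (at + gc)

# Made-up lines standing in for the contents of a sequence file
lines = ["ACGT\n", "GGXC\n", "GGCA\n"]

gc_cont_list = []
bad_lines = []
for line_number, line in enumerate(lines, start=1):
    try:
        gc_cont_list.append(estimate_gc_content(line))
    except ValueError as err:
        # Record where the problem is instead of stopping the whole run
        bad_lines.append(line_number)
        print("Line " + str(line_number) + ": " + str(err))

print(gc_cont_list)  # [0.5, 0.75]
print(bad_lines)     # [2]
```

Whether to stop at the first bad line or to skip it and continue depends on the analysis. Stopping is the safer default when every sequence has to be valid.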