# Functions
In this notebook, we'll encounter functions. We'll ultimately put some of our knowledge from the past few sessions together to write a program that can compute the GC content of a nucleotide sequence.

### **Outline**
* Definition of a function
* Anatomy of a function
* Standard syntax of a function
* Write a simple function
* Example use case of a function: Calculate the GC content of a nucleotide sequence
* Breakout room: Write a flexible function


### At the end of this notebook, you'll be able to:
* [Write a simple function](#Functions)
* [Use these tools to calculate the GC content of a DNA string](#GCcontent)

<hr>

## Definition of a function

#### A block of reusable code that only executes when it is called

<div style="background: ghostwhite; font-size: 20px; padding: 10px; border: 1px solid lightgray; margin: 10px;">
  <b>Input -> Do something (usually to input) -> Output</b>
</div>

<a id="Functions"></a>

## Anatomy of a function

![image](https://datascienceparichay.com/wp-content/uploads/2020/08/python-function-anatomy2-1024x576.png.webp)


### Additional notes about functions
* <b>Functions should do one thing and do it well</b>
* Function names use snake_case -> **my_function_name**
* Functions return `None` by default when they have no `return` statement
* Python has many built-in functions: see the documentation at https://docs.python.org/3/library/functions.html
* We can add a docstring to document a function by writing a string wrapped in `'''` (or `"""`) as the first statement of the function body, right after the `def` line. This text is displayed when you call `help(function)`.
* Functions can have many, many lines!
* Functions can call other functions.
* A **program** is one or more functions that work together.
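
A few of the notes above can be seen in a short sketch. The function names `double`, `quadruple`, and `announce` are made up for illustration and are not used elsewhere in this notebook.

```python
def double(x):
    '''Return twice the value of x.'''
    return 2 * x


def quadruple(x):
    '''Return four times x by calling double() twice.'''
    # A function can call another function
    return double(double(x))


def announce(x):
    '''Print x; this function has no return statement.'''
    print('The value is', x)


total = quadruple(3)
print(total)              # 12

result = announce(total)  # prints: The value is 12
print(result)             # None, because announce() never returns a value

help(double)              # shows the signature and the docstring
```

Each function here does one small job, and `quadruple` builds on `double` rather than repeating its code.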


Example: a simple function to square a number

```
    def square(x):        
        x_squared = x**2
        return x_squared
```

In [None]:
# Define the function here
def square(x):
    x_squared = x**2
    return x_squared

In [None]:
# Run the function here
number = square(3)
print(number)

9


### You can return the result of an expression directly, without storing it in a variable first

In [None]:
def square(x):
    return x**2

number_two = square(4)
print(number_two)

16


<a id="GCcontent"></a>

## Writing a program to count GC content

*DNA Refresher*: Nucleic acids contain all of the information to build our cells.
- In deoxyribonucleic acid (DNA) there are four different bases: adenine (A), cytosine (C), guanine (G), and thymine (T).
- The sequence of a nucleic acid polymer is defined by the order of these bases, which we can represent with a string of A's, C's, G's, and T's.
- A bonds to T, and C bonds to G
- [GC Content](https://en.wikipedia.org/wiki/GC-content) is a useful way to characterize DNA.

Below, there is a function to calculate the GC content of a DNA string of length 4. It may include a few elements that we haven't discussed, but can you see what it's doing?

In [None]:
# Write our function

def gc_content_4(DNA):
    """
    Initialize a counter,
    add one to the counter for each of the four bases that is G or C,
    return counter / 4 (the length of the DNA sequence)
    """

    counter = 0
    if DNA[0] == 'G' or DNA[0] == 'C':
        counter += 1
    if DNA[1] == 'G' or DNA[1] == 'C':
        counter += 1
    if DNA[2] == 'G' or DNA[2] == 'C':
        counter += 1
    if DNA[3] == 'G' or DNA[3] == 'C':
        counter += 1

    return counter / 4

In [None]:
# Call our function
gc_content_4('ATGC')

0.5

The `gc_content_4` function uses **conditional statements** to test whether a given nucleotide in the sequence is equal to either a G or a C. In other words, it is doing a **value comparison**. If either condition is met, it increments a **counter**.

**Question**: Why can't we write `DNA[0] == 'C' or 'G'`?

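**Answer:** Python requires you to be explicit on both sides of `or`. It reads `DNA[0] == 'C' or 'G'` as `(DNA[0] == 'C') or 'G'`, and because the non-empty string `'G'` always counts as true, the condition would be met for every nucleotide.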
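
A minimal sketch of this behavior, using a hypothetical variable `base` in place of `DNA[0]`:

```python
base = 'A'

# Parsed as (base == 'C') or 'G'. The comparison is False,
# so `or` hands back its second operand, the string 'G'.
wrong = base == 'C' or 'G'
print(wrong)          # G
print(bool(wrong))    # True, even though base is 'A'

# Spelling out both comparisons gives the intended result
right = base == 'C' or base == 'G'
print(right)          # False

# A membership test is another way to write the same check
also_right = base in ('G', 'C')
print(also_right)     # False
```

Inside an `if` statement, the first form would therefore count every base as a G or C.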

## Breakout: Creating a flexible function
If we put in a string that's not length 4, what happens?

><b>Task:</b> Write a function (<code>gc_content_3_4</code>) that will work with strings of length 3 or 4.
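
One way to explore the question above is to call `gc_content_4` with a string that is too short and one that is too long. The sketch below repeats the function so that it runs on its own; the test strings are arbitrary examples.

```python
def gc_content_4(DNA):
    '''Return the fraction of G and C among the first four bases.'''
    counter = 0
    if DNA[0] == 'G' or DNA[0] == 'C':
        counter += 1
    if DNA[1] == 'G' or DNA[1] == 'C':
        counter += 1
    if DNA[2] == 'G' or DNA[2] == 'C':
        counter += 1
    if DNA[3] == 'G' or DNA[3] == 'C':
        counter += 1
    return counter / 4


# Too short: there is no DNA[3], so Python raises an IndexError
try:
    gc_content_4('ATG')
except IndexError as error:
    print('Too short:', error)

# Too long: no error, but only the first four bases are examined,
# so the answer is 0.5 even though 6 of the 8 bases are G or C
print(gc_content_4('ATGCGGGG'))
```

The short string fails loudly, while the long string fails silently by returning a misleading number, which is arguably the more dangerous of the two.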

#### With if/elif statement

In [None]:
def gc_content_3_4(DNA):
    """
    Initialize a counter,
    add one to the counter for each of the first three bases that is G or C,
    return counter / 3 if the sequence has length 3,
    otherwise test the fourth base and return counter / 4
    """

    counter = 0
    if DNA[0] == 'G' or DNA[0] == 'C':
        counter += 1
    if DNA[1] == 'G' or DNA[1] == 'C':
        counter += 1
    if DNA[2] == 'G' or DNA[2] == 'C':
        counter += 1

    if len(DNA) == 3:
        return counter / 3
    elif DNA[3] == 'G' or DNA[3] == 'C':
        counter += 1

    return counter / 4

seq_3 = 'ATG'
seq_4 = 'CTGC'
gc_content_seq3 = gc_content_3_4(seq_3)
gc_content_seq4 = gc_content_3_4(seq_4)

print(gc_content_seq3, gc_content_seq4)


0.3333333333333333 0.75


#### With if/else statement

In [None]:
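def gc_content_3_4_else(DNA):
    """
    Initialize a counter,
    add one to the counter for each of the first three bases that is G or C,
    return counter / 3 if the sequence has length 3,
    otherwise test the fourth base and return counter / 4
    """

    counter = 0
    if DNA[0] == 'G' or DNA[0] == 'C':
        counter += 1
    if DNA[1] == 'G' or DNA[1] == 'C':
        counter += 1
    if DNA[2] == 'G' or DNA[2] == 'C':
        counter += 1

    if len(DNA) == 3:
        return counter / 3
    else:
        # The comparison must sit inside its own if statement;
        # otherwise the counter is incremented for any fourth base
        if DNA[3] == 'G' or DNA[3] == 'C':
            counter += 1

    return counter / 4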

gc_content_seq3_else = gc_content_3_4_else(seq_3)
gc_content_seq4_else = gc_content_3_4_else(seq_4)

print(gc_content_seq3_else, gc_content_seq4_else)

0.3333333333333333 0.75


#### With for loop, to process DNA sequence of any length

In [None]:
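def gc_content(DNA):
    """
    Return the fraction of bases in DNA that are G or C.
    Works for a sequence of any length greater than zero.
    """
    counter = 0
    for i in range(len(DNA)):
        if DNA[i] == 'G' or DNA[i] == 'C':
            counter += 1
    return counter / len(DNA)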


seq = 'ATGCGTGCAGACT'

gc_con = gc_content(seq)
print(gc_con)


0.5384615384615384
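
As a variation on the loop above, Python can also step through the characters of a string directly, without an index. The sketch below is one possible alternative, not a replacement for the function above; the name `gc_content_direct` is made up for this example.

```python
def gc_content_direct(DNA):
    '''Return the fraction of bases in DNA that are G or C.'''
    counter = 0
    for base in DNA:              # base takes each character in turn
        if base in ('G', 'C'):    # membership test instead of two comparisons
            counter += 1
    return counter / len(DNA)


seq = 'ATGCGTGCAGACT'
print(gc_content_direct(seq))     # 0.5384615384615384

# The built-in string method count() gives the same answer
print((seq.count('G') + seq.count('C')) / len(seq))
```

Both versions assume an uppercase, non-empty sequence: lowercase letters are not counted, and an empty string raises a `ZeroDivisionError`.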


## Additional Resources
* <a href="https://merely-useful.github.io/py/py-dev-development.html">Merely Useful Functions</a>
* <a href="https://www.python-course.eu/python3_functions.php">Python Course: Functions</a>
* <a href="https://swcarpentry.github.io/python-novice-plotting/17-conditionals/">Software Carpentry: Conditionals</a>

## About this notebook
* This notebook is largely derived from UCSD COGS18 Materials, created by Tom Donoghue & Shannon Ellis, as well as exercises in [*Computing for Biologists*](https://www.cambridge.org/highereducation/books/computing-for-biologists/5B08EEEE2AE8A602113A8F225E89F5FD#overview).