# Functions

* Functions embody two important principles in programming: modularization (breaking a program down into discrete chunks) and reuse (running the same code repeatedly without rewriting it). I tend to think of this, generally, as encapsulation, but encapsulation has the connotation of OOP, which we haven't covered yet.

* the inventor of the "subroutine" (function): https://en.wikipedia.org/wiki/Kathleen_Antonelli

* We've already seen some built-in methods that are associated with a particular data type, as well as some built-in functions.
    * e.g. the string methods .upper() and .lower(), and the functions len() and str()
    
* We've also seen built-in functions that work more generally and might even work differently depending on the arguments that they are given
    * range()
        * list(range(6)) or list(range(0,6)) results in: [0,1,2,3,4,5] (in Python 3, range() by itself returns a range object rather than a list)
        * list(range(1,6,3)) results in: [1,4] since it starts at 1, stops before 6, and takes steps of size 3
    * type() – returns the type of the data that is given as an argument
    * float() – converts an integer (or a numeric string) to a float
    * round() – rounds floats
    * max() – takes any number of arguments and returns the largest one. Strings are compared character by character rather than numerically, so it is most intuitive with floats and integers
    * min() – opposite of max()
    * abs() – absolute value, but only takes one argument
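
The built-ins listed above can be tried out in a few lines. This is a minimal sketch assuming Python 3; the variable names are only for illustration.

```python
# Methods belong to a data type and are called with a dot;
# built-in functions are called on their own.
name = "drosophila"
shouted = name.upper()            # string method
name_length = len(name)           # built-in function

# range() produces a sequence of integers; list() shows the values
first_six = list(range(6))        # same as list(range(0, 6))
stepped = list(range(1, 6, 3))    # start at 1, stop before 6, step of 3

print(shouted, name_length)
print(first_six, stepped)
print(type(3.5))                  # the type of the argument
print(float(7))                   # integer converted to a float
print(round(3.14159, 2))          # rounded to 2 decimal places
print(max(4, 9.5, 2), min(4, 9.5, 2))
print(abs(-12))
```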

Here is the link to more built in functions: 
https://docs.python.org/3/library/functions.html

## Modules: 

* Python (and most other languages) has built-in functions that are for general use
* Modules and libraries (we’ll learn about them later) are a way to provide discipline-specific functions
    * Sometimes, we will need to write a particular function for our own specific research needs
    
* We can import modules (we saw this in an earlier example) that contain functions and variables or we can create our own:

<div class="alert alert-block alert-warning">
		
        
        import math
        # you can also import ALL functions from a module by using the asterisk symbol: from math import *

        print(math.sqrt(100))
        
 <div class="alert alert-block alert-warning">
    
    
        # or, equivalently, we can bring in just one function from a module like so:

        from math import sqrt
        print(sqrt(100))
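
The import styles can be compared side by side. This is a small sketch; the alias `m` and the `root_*` variable names are only illustrative choices. Note that `from math import *` is usually avoided in larger programs, because the imported names can silently overwrite names you have already defined.

```python
import math                  # access names with the module prefix
from math import sqrt        # bring in a single name
import math as m             # give the module a shorter alias

root_a = math.sqrt(100)
root_b = sqrt(100)
root_c = m.sqrt(100)

# all three calls run the same function
print(root_a, root_b, root_c)
```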


## Creating our OWN FUNCTIONS offers a number of advantages: 
* Allows us to re-use a block of code many times
    * Reusable piece of code; allows for repeatability
* If we need to change the code, we only have to do it in the one function (instead of each time it is called in a program)
* Encapsulation (an important programming idea): splitting code into functions, little chunks that we can work on independently
    * benefits of modularization
    * Can use the block of code in other programs

### Parts of a function: 
1. Header
    * def keyword, followed by the function name (the same naming rules as for variables apply)
    * parameters – the names, located between (), that receive the arguments passed into the function
        * a function can take no parameters at all, Ex: def hello_world():
        * a variable number of arguments can be accepted with *args
    * : (a colon ends the header)
2. (optional) comment
    * explanation of the function following #
3. Body
    * The procedures of the function, indented four spaces

**Example:**
<div class="alert alert-block alert-warning">
    
    def function_name(parameters):   # or nothing at all between the ()
        # body: one or more indented statements
        # ...
        return something
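
To make the template concrete, here is a hedged sketch with the three parts labelled. The functions `gc_fraction` and `total_length` are hypothetical examples made up for illustration; they are not used elsewhere in these notes.

```python
def gc_fraction(dna):                  # 1. header: def, name, parameters, colon
    # 2. comment: return the fraction of G and C in a DNA string
    dna = dna.upper()                  # 3. body, indented four spaces
    gc_count = dna.count("G") + dna.count("C")
    return gc_count / len(dna)

def total_length(*sequences):          # *args-style header: any number of arguments
    # add up the lengths of all the sequences that were passed in
    total = 0
    for seq in sequences:
        total += len(seq)
    return total

print(gc_fraction("atgc"))
print(total_length("ATG", "CC", "T"))
```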


* NB: There is a BIG difference between defining a function and calling it

    * After defining a function, you must call it in order for it to be executed
    
    * **Functions return to where they are called from**
    
    * return prompts Python to exit the function and hand any values on the return line back to the point of the call, where they can be assigned to a variable. It is easy to forget the return line, in which case Python supplies “None” for you
    
    * To summarise: **the calling code suspends execution at the point of the call, the body of the function is executed, and control is returned to just after the point where the function was called.**
    
    * **Scoping rules** apply to functions: variables created within the body of a function cannot be used outside of the function or you will get a NameError. (Loops are different: in Python a loop does not create its own scope, so a variable created in a loop still exists after the loop ends.)
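
The points above can be seen in a short sketch. The function names are hypothetical, and the `try`/`except` block is used only so that the NameError is reported instead of stopping the program.

```python
def double(number):
    doubled = number * 2      # 'doubled' is local to the function
    return doubled

def double_no_return(number):
    doubled = number * 2      # the result is computed but never returned

# Nothing has run yet: the two functions above are only defined.
result = double(21)                 # the call runs the body; control returns here
missing = double_no_return(21)      # no return statement, so we get None
print(result, missing)

# 'doubled' only existed inside the functions
try:
    print(doubled)
except NameError as error:
    print("NameError:", error)
```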

In [6]:
# here you are defining the function
# This function raises a given number to a given power

#you are passing in two arguments to the function when you call the function from the main
# part of the program
def power_to(base,exponent):
    res = base**exponent
    print("%d to the power of %d is %d." % (base, exponent, res))
    print("here is the return statement to orient me")
    return res

#here you are calling the function
result_main=2
result_again=power_to(37,result_main)

# Why does this raise an exception? 
#print(res)
# This error should help you with scope. 

print(result_main)
print(result_again)

37 to the power of 2 is 1369.
here is the return statement to orient me
2
1369


### Revisiting scope 

* Scope is the region of the program from which a variable can be referenced. It is the hierarchical order in which namespaces are searched to find an object. **L.E.G.B.** is the acronym for how scope officially works: Local, Enclosing, Global and Built-in. Local names can only be 'reached' from inside their own function; Enclosing names belong to an outer function and are accessed from inside a nested function; Global names are defined in the main body of the program and can be read anywhere in it; Built-in names are the ones Python provides automatically (functions such as len() and print()). Keywords such as 'def' are reserved words rather than names, so they are not looked up at all.

* A function cannot see the local variables of another function. To share a value between functions, *pass it as an argument/parameter to the function itself* (or return it). A function can read variables defined in the main body of the program, but relying on that makes programs harder to follow, so pass in what the function needs.

* Local variables (within the function) cease to exist once the function is completed: they are released from memory once the function is finished.

* Potential Confusion Warning: We saw the idea of scope when we discussed loops. Keep in mind that variables that are passed into a function don’t change their value outside of the function (ie. in the main body of the program)…usually. This is because most of the values that we are interested in using are immutable, such as numbers, strings and tuples: the function receives a reference to the object, but an immutable object cannot be altered, so reassigning the parameter inside the function only rebinds the local name. However, we were also talking about *iterating over lists* and, confusingly, lists work a little bit differently because they are mutable: a list holds references (pointers) to its elements rather than the data itself. Thus, a list can be manipulated from within a loop (called 'changed in place') and keep that change once the loop is finished. In the same way, mutable data types such as lists CAN HAVE THEIR CONTENTS CHANGED by a function, and back in the calling code **the list will remain changed.** This is not true of immutable values.
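
The L.E.G.B. lookup order can be traced with one name defined at three levels. This is a minimal sketch with hypothetical function names; nested functions are used here only to show the Enclosing level.

```python
label = "global"              # G: defined in the main body of the program

def outer():
    label = "enclosing"       # E: local to outer(), enclosing for inner()

    def inner():
        label = "local"       # L: local to inner()
        return label

    return inner(), label

def reads_global():
    return label              # no local 'label', so Python finds the global one

def uses_builtin(items):
    return len(items)         # B: 'len' is found in the built-in namespace

print(outer())
print(reads_global())
print(uses_builtin([7, 6, 5]))
print(label)                  # the global value was never changed
```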

In [12]:
# functions usually don't change the values of passed parameters
# however, if you are using lists - which contain pointers rather 
# than actual values, functions that are called CAN change the value
# of elements of the list. Note: this example is stolen from CSC161

def change_num_by_two(num):
    #num = 2*num
    num *= 2
    print("Inside the function. The value of num is: ")
    print(num)
    return num

mynum = 7
mynum2 = change_num_by_two(mynum)

print("after being sent through the change_by_two function, the value of mynum2 set in the program is: ")
print(mynum2)
print("but the original value of mynum set in the program is still: ")
print(mynum)
print("--------------")

Inside the function. The value of num is: 
14
after being sent through the change_by_two function, the value of mynum2 set in the program is: 
14
but the original value of mynum set in the program is still: 
7
--------------


In [13]:
#compare this to passing a list

# note: you should put all of your function definitions at the top 
# of your program 

def change(my_list):
    if(len(my_list)>0):
        print("did I get here?")
        my_list[0]=my_list[0]*2
    else:
        print("there is nothing in this list")
        
mylist=[7,6,5,4,3]
empty_list=[]
print(mylist)
change(mylist)
change(empty_list)
print(mylist[0])
print(mylist)

[7, 6, 5, 4, 3]
did I get here?
there is nothing in this list
14
[14, 6, 5, 4, 3]


#### We'll learn about it in an entire lecture dedicated just to it but, as a placeholder for the future: functions can also call themselves!!

* recursive functions. How cool is that? 

## Functions can call *other* functions
We’ll see more sophisticated versions of this fact in Module 3 lectures (there is an entire lecture based on recursion)

I really want you to work through these examples and make sure you understand where the program is calling/running in the code. PythonTutor will probably help with this visualization. 

<div class="alert alert-block alert-warning">
example: 
    
    def one_good_turn(n):
       return n + 1   

    def deserves_another(n):
       return one_good_turn(n) + 2
    
    print(deserves_another(10))

In [3]:
def one_good_turn(n):
    return n + 1   

def deserves_another(n):
    #return one_good_turn(n+2)
    return one_good_turn(n) + 2

#main body of the program    
print(deserves_another(10)) #calls the function deserves_another

13


In [15]:
def ran(x):
    x=42
    return 2

def DAP(x):
    # here the second function, ran() is called from the middle of the first function
    y=ran(x)
    print(x)
    print("_"*10)
    print("y:")
    print(y)
    z=x+y
    print("z:")
    print(z)
    return z

print("Call the DAP function which, in turn, calls the ran function")
print(DAP(1))

Call the DAP function which, in turn, calls the ran function
1
__________
y:
2
z:
3
3


In [16]:
# Another example just because I enjoy them so much!
def cube(number):
    return number**3
    
def by_three(number):
    if number%3 ==0:
        return cube(number)
    else:
        return False

print(by_three(27))

19683


# Discussion thread example: 
_____________________
1. Let's assume that we have a long DNA sequence in which each symbol is random and equally likely to be an A, C, T, or G. We'd like to determine the average number of symbols until we see the first occurrence of a start codon "ATG".  We'll break this into parts: 

    a. First, write a function in pseudocode called lengthToATG() that takes no input, generates a random letter from "A", "T", "C", or "G", and then, if the last three letters generated are "ATG", returns the total number of symbols that were generated. Otherwise, it repeats the process by generating the next random letter. 

    b. Compute the average number of symbols to the first ATG by creating another function that takes lengthToATG as an argument and runs it for *n* trials to see when an ATG is created (n=10 or 100 or 1000). 

## Other Miscellaneous facts about functions
* Functions don’t necessarily have to return a value
    * If there is no return, they will return None
* Functions don’t need to take arguments
* Functions don’t need to take arguments in a particular order if you include keywords
* Function arguments can have defaults 
* Functions can use *args, a parameter that **allows a flexible number of items to be passed in when the function is called**
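
Each of the facts in this list can be checked in a few lines. This is a sketch with hypothetical functions (`greet`, `describe`, `longest`) invented for illustration.

```python
def greet():                                       # takes no arguments
    print("hello")                                 # prints, but returns nothing

def describe(species, gene="unknown", copies=1):   # two parameters have defaults
    return "%s: %d copies of %s" % (species, copies, gene)

def longest(*sequences):                           # accepts any number of arguments
    return max(len(seq) for seq in sequences)

nothing = greet()
print(nothing)                                     # None, since greet() has no return

print(describe("D. yakuba"))                       # defaults fill in the rest
print(describe(copies=2, gene="p53", species="D. simulans"))   # keywords, any order
print(longest("ATG", "ATGCC", "A"))
```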

## Calling a function
Before calling a function, you need to know:
* What value/type does it return?
* What type and number of arguments does it take?
* Sometimes you also need to know the order of the arguments
    1. Keyword arguments:
        * Allow us to call functions with the arguments in whatever order we like
    2. Defaults:
        * Allow us to specify default values for arguments


In [4]:
# Slightly more challenging example and will look suspiciously similar to a problem set
# question. In this case, we don't care if the A and the T are neighbours, we just care
# about the total percentage of A and T in the sequence since it can be informative 
# about mutation rates: 
# https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1001115

# I fixed this by adding round() to the print statement. 
def get_at_content(dna,sig_dig=2):
    # notice that this function takes two parameters and one of them
    # (sig_dig) has a default value, so it can be left out of the call
    length=len(dna)
    #print(length)
    a_count=dna.upper().count("A")
    t_count=dna.upper().count("T")
    at_content=(a_count+t_count)/length
    # the built-in function round() takes a number and a number of decimal places
    # remember not to mix tabs and spaces: pick either tabs or 4 spaces but not both
    # or python will raise an indentation error exception
    # NOTE: the result is printed but never returned, so this function
    # returns None (that is where the None in the output comes from)
    print(round(at_content,sig_dig))

# main part of the program
my_at_content=get_at_content("ATATATATACGGGGGGGGG")
# no sig dig have been specified so it gives you the default of 2 
print(str(my_at_content))
print("_______")
print(get_at_content("aactgtagcga",5))


0.47
None
_______
0.54545
None


In [30]:
# ------------------------------------------------------------
# SAME EXAMPLE AS ABOVE BUT WITH A default for both arguments
def get_at_content(dna="ATATATATACGGGGGGGGG",sig_dig=2):
    # notice that this function takes two parameters and that
    # both of them now have default values
    length=len(dna)
    print(length)
    a_count=dna.upper().count("A")
    t_count=dna.upper().count("T")
    at_content=(a_count+t_count)/length
      #built in function round takes argument and number of decimal points 
      #remember not to mix tabs and spaces, pick either tabs or 4 spaces but not both
      #or python will raise an indentation error exception
    print(at_content)
    return round(at_content,sig_dig)

# Since both arguments have defaults, you don't have to provide an argument
my_at_content=get_at_content()
# no sig dig have been specified so it gives you the default of 2 
print(str(my_at_content))
print("_______")
# IN FACT BY USING KEY WORDS, YOU DON'T EVEN HAVE TO PROVIDE THE ARGUMENTS IN THE SAME ORDER!
print(get_at_content(sig_dig=5,dna="aactgtagcga"))

19
0.47368421052631576
0.47
_______
11
0.5454545454545454
0.54545


In [33]:
# this was an attempt to bring together a bunch of ideas that we have seen already. 
# you have also seen some of this code in a previous lecture. 
# it is a slightly more complicated program that calls a bunch of functions on a file object
# I used it with a file called data.csv which has a couple of lines of sequence data 
# in it.  

# print_all() prints the entire contents of the file object it is given
def print_all(f):
    print(f.read())

def rewind(f):
    #What do you think the .seek method of file objects does? 
    f.seek(0)
    
# print_a_line() prints a line number followed by the next line of the file
def print_a_line(line_count,f):
    print(line_count,f.readline())

# create a file object called current_file:   
current_file=open('data.csv')

print("First let's print the whole file: \n")
print_all(current_file)
print("Now let's rewind this file like a tape. \n")

#call the rewind function and use our FILE OBJECT as the argument to the function
rewind(current_file)

print("Let's print these three lines: ")

# in the following commands, we COULD BUNDLE THEM MORE EFFICIENTLY AS A LOOP!!
current_line=1
print_a_line(current_line,current_file)

current_line=current_line+1
print_a_line(current_line,current_file)

current_line=current_line+1
print_a_line(current_line,current_file)

# ALWAYS CLOSE OUR FILE OBJECT!
current_file.close()

First let's print the whole file: 

Drosophila melanogaster,atatatatatcgcgtatatatacgactatatgcattaattatagcatatcgatatatatatcgatattatatcgcattatacgcgcgtaattatatcgcgtaattacga,kdy647,264
Drosophila melanogaster,actgtgacgtgtactgtacgactatcgatacgtagtactgatcgctactgtaatgcatccatgctgacgtatctaagt,jdg766,185
Drosophila simulans,atcgatcatgtcgatcgatgatgcatccgactatcgtcgatcgtgatcgatcgatcgatcatcgatcgatgtcgatcatgtcgatatcgt,kdy533,485
Drosophila yakuba,cgcgcgctcgcgcatacggcctaatgcgcgcgctagcgatgc,hdt739,85
Drosophila ananassae,ttacgatcgatcgatcgatcgatcgtcgatcgtcgatgctacatcgatcatcatcggattagtcacatcgatcgatcatcgactgatcgtcgatcgtagatgctgacatcgatagca,hdu045,356
Drosophila ananassae,gcatcgatcgatcgcggcgcatcgatcgcgatcatcgatcatacgcgtcatatctatacgtcactgccgcgcgtatctacgcgatgactagctagact,teg436,222
Now let's rewind this file like a tape. 

Let's print these three lines: 
1 Drosophila melanogaster,atatatatatcgcgtatatatacgactatatgcattaattatagcatatcgatatatatatcgatattatatcgcattatacgcgcgtaattatatcgcgtaattacga,kdy647,264

2 Drosoph
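
The comment in the cell above notes that the three repeated calls could be bundled into a loop. A hedged sketch of that idea follows. It writes a small stand-in file with made-up rows so that it runs on its own (the real data.csv is not assumed to be available), and its `print_a_line` is a slight variant that also returns the line it read.

```python
import os
import tempfile

def print_a_line(line_count, f):
    # read the next line, print it with its number, and hand it back
    line = f.readline()
    print(line_count, line.rstrip("\n"))
    return line

# Hypothetical stand-in for data.csv so the sketch is self-contained
rows = ["species_a,atat,id1,10\n", "species_b,cgcg,id2,20\n", "species_c,ttaa,id3,30\n"]
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.writelines(rows)
    path = tmp.name

lines_read = []
with open(path) as current_file:          # the file is closed automatically
    for current_line in range(1, 4):      # replaces three copies of the same call
        lines_read.append(print_a_line(current_line, current_file))

os.remove(path)                           # tidy up the stand-in file
```

Opening the file in a `with` block is one way to make sure it gets closed even if something goes wrong partway through; calling `.close()` explicitly, as in the cell above, works too.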

In [20]:
# this function should be SOME - not significantly - help for PS 
def get_at_content(dna="ATCG",sig_dig=1):
    length=len(dna)
    print("here i am in the function")
    print(length)
    a_count=dna.count("A")
    t_count=dna.count("T")
    c_count=dna.count("C")
    g_count=dna.count("G")
    at_content=(a_count+t_count)/length
    print("a and t content")
    print(at_content)
    return round(at_content,sig_dig)

# main body of program begins here
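filename="dna5.txt"
#creating file object
f=open(filename)
# interact with file object using methods
contents=f.read().rstrip("\n")
# always close the file object once we have read from it
f.close()
print("Here we are in the main body of the program")
print(contents)
# if the file spans several lines, remove the internal newlines too;
# otherwise they would be counted in the sequence length
sequence=contents.replace("\n","")
print(get_at_content(sequence,4))
print("~~~~~~~~~~~~~~~~~~~~")
print("Default values: ",get_at_content())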

Here we are in the main body of the program
AAAATCGCTAATATCCATGCTCACGCTATCACCTCGGTTCCGCTTTTGGCGATGTGGGTACGCTTGCCGGCGGGGCTGCC
GCAGGCTGTTGTAATACACTTACCGGCTCATTCTGGCTTTCAGCGTTTTCATGCAAAATGGCCTGTTCCGCCTGTTTGTT
TAACACCAGCAGGCTTATACGGCGGTTGATGGCGTCATCAGGACCGCGATCGCTCAGCCGCATCGTGGCGGCCATGCCAA
CCACCCGTAATACTTTTCCGTTATCCAGCCCGCCAGCGACCAGTTCGCGACGAGAGGCATTGGCGCGATCGGCGGATAAC
TCCCAGTTGCTATAGCCTTTTTCGCCGTTCGCGTAGGGAAAGTCATCGGTATGGCCGGCCAGGCTAATGCGATTAGGTAT
ACCGTTTAACACTGGCGCAATCGCACGCAGGATATCGCGCATATACGGCTCAACTTCGGCGCTGCCGGTTTTAAACATCG
GGCGGTTCTGGCTGTCGATAATCTGGATGCGCAACCCCTCCTGAACTAAATCAATTTTCAGATGCGGACGTAACGCGCGC
AGTTTGGGATCGGATTCGATCAGTTGATCCAGATCGCCGCGCAGTTTGTTTAAGCGACTCTGCTCCATCCGTTTTTTCAG
CTCGTCGATATTCGGCTGCTTTTCCACTTCACCCTGCTGTTGGGTGTAATCATCGCCGCCGCCTGGTATCGGGCTCTCGC
TATTGGCAATCCGATTCCCCCCCGTTACCGCGGTGGCCAACGGCGTACGAAAATATTCGGCAATCTGAATTAATTCTTTA
GGGCTGGAGATGGAAATCAGCCACATCACCAGAAAAAAAGCCATCATCGCCGTCATAAAATCGGCGTAGGCAATTTTCCA
GGAACCGTGCGCCCCGCCGCCGTGCGGTTTGTGCCTGCGGCGTTTTACGACGACAATGGGATGAG

# Functions can take lists as input
* Functions can take lists as arguments
    * You can pass a list to a function the same way you pass any argument
    * You will need to use one of the two formats: function_name(listname) or function_name([list items])

<div class="alert alert-block alert-warning">
example: 
   
    def fizz_count(x):
    #Define a function that counts the number of times "fizz" is in the list
        count=0
        for item in x:
            if item=="fizz":
                count=count+1
        return count

    fizz_count(["fizz","cat","fizz"])

In [27]:
def fizz_count(x):
#Define a function that counts the number of times "fizz" is in the list
    count=0
    for item in x:
        if item=="fizz":
            count=count+1
    return count

# we can create a list with a name
fizzy_list=["fizz","cat","fizz"]
# pass list name into the function
print(fizz_count(fizzy_list))
# or we can directly pass a list into the function
print(fizz_count(["fizz","cat","fizz"]))

2
2


# Functions can also take multiple lists as input

<div class="alert alert-block alert-warning">
example: 
   
    m = [1, 2, 3]
    n = [4, 5, 6]

    def join_lists(x,y):
        return x+y
    
    print(join_lists(m, n))
    # You want this to print [1, 2, 3, 4, 5, 6]


In [22]:
# Functions can also take multiple lists as input 
def join_lists(x,y):
    return x+y

m = [1, 2, 3]
n = [4, 5, 6]

print(join_lists(m, n))

[1, 2, 3, 4, 5, 6]


In [32]:
# Functions can also take a FLEXIBLE number of arguments. Note that the important part of 
# *args name isn't actually arg, it is *.
# https://realpython.com/python-kwargs-and-args/
def my_sum(*args):
    result = 0
    # Iterating over the Python args tuple
    for x in args:
        result += x
    return result

print("How about with FOUR arguments provided: ")
print(my_sum(1,2,3,4))
print("What about with only two arguments provided: ")
print(my_sum(1,2))

How about with FOUR arguments provided: 
10
What about with only two arguments provided: 
3


An example illustrating a simple function wherein I sneak in some Bayes' theorem because I love Bayes' and also because it harbours a philosophy that is fundamentally important to computing: being able to update a prior probability/expectation based on new information. 

--------------------------------------

#### BAYES IS SUPER IMPORTANT IN COMPUTING (especially with biological data): 
----------------------------------------
* Functionally: it allows you to update your probability when you encounter new information or new evidence. 

Often we see Bayes' in the context of sensitivity and specificity of tests, like the tests for a certain disease. (https://en.wikipedia.org/wiki/Sensitivity_and_specificity)

Example: 

Let's imagine that you are looking for an early indicator of a type of cancer. This cancer, call it cancer "A", is found in 1.5% of a population. Your research group has found a biomarker that is present in 80% of sufferers of this cancer. 

Most people (and, yes, sadly this does include physicians, who have been surveyed about this repeatedly and still mostly get it wrong) would conclude: "80%?! That's great. That means if there is a positive test for this biomarker, the individual has an 80% chance of having cancer "A"". 

However, THAT IS DEFINITELY NOT the correct interpretation. 

Is this surprising to you? 

In order to get to the correct interpretation, we need to know the answer to another question: 

**How many individuals without the disease would get a positive result for this biomarker?**

Let's plug in some numbers to illustrate the problem: 
--------------------------------------------------------------

False positive rate = 4%
True negative rate = 96%

-------------------------

True positive rate = 80%
False negative rate = 20%

-------------------------

So, this test will only fail to identify 20 out of 100 sufferers, and it will only misdiagnose 4 out of 100 healthy individuals. Still sounds good, eh? 

Ah, but no. 

Let's use round numbers: We have a population of 10,000 individuals. We expect 1.5% of them to suffer from cancer A, which means about 150 individuals have cancer A. Of these 150 sufferers, 120 will - accurately - get a positive test for the biomarker (based on the 80% true positive rate). 

But 9850 people in this population don't have cancer "A". And, with just a 4% false positive rate (which is quite low), we still expect 394 of them to test positive! 

If you are a patient who has received a positive result, you will want to know "what is the chance that I actually have cancer A?". 

You have to **update** your probability. You walked into the clinic with a 1.5% chance of having cancer A, but now - with a positive test result - you have updated your probability to approximately 23%. This percentage comes from the fact that there are 514 positive individuals overall (120 true positives and 394 false positives), and 120/514 is approximately 0.23.  

Notably, this is not 80%, though!
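
The same update can be written as Bayes' theorem, using only the numbers given above (the symbol "+" for a positive test is notation introduced here):

$$
P(\text{cancer} \mid +) = \frac{P(+ \mid \text{cancer})\,P(\text{cancer})}{P(+ \mid \text{cancer})\,P(\text{cancer}) + P(+ \mid \text{no cancer})\,P(\text{no cancer})} = \frac{0.80 \times 0.015}{0.80 \times 0.015 + 0.04 \times 0.985} = \frac{0.012}{0.0514} \approx 0.233
$$

Multiplying the numerator and denominator by the population of 10,000 gives the head count from the text: 120 / (120 + 394) = 120 / 514.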

(* note: these numbers are based on the CA-125 biomarker used to detect ovarian cancer: the lifetime risk for women of developing ovarian cancer is 1.34%, and the test has a false positive rate of 4%)

In [None]:
def biomarker(pDisease, pPosDisease, pPosNoDisease):
    pNoDisease =1.0-pDisease
    pPos=pPosDisease*pDisease+pPosNoDisease*pNoDisease
    return round((pPosDisease*pDisease)/pPos,3)

pDisease=0.015
pPosDisease=0.8
pPosNoDisease=0.04
print("Probability(disease|positive result)= ", biomarker(pDisease, pPosDisease, pPosNoDisease))

In [None]:
# A simple and generalizable Bayesian function with Python String format
def bayes(outComeA,outComeB,pB,pAGivenB,pAGivenNotB):
    pNotB=1.0-pB
    pA=pAGivenB*pB+pAGivenNotB*pNotB
    pBGivenA=(pAGivenB*pB)/pA
    return 'p (%s|%s) = %.2f' %(outComeB,outComeA,pBGivenA)

#main part of program
geneName='TP53 Tumor protein p53 [Homo sapiens]'
geneID=7157
matchProbability=98.64756341
#print("The gene to be analyzed is: %s" % geneName)
print("The gene number is: %d: %s"%(geneID,geneName))

outComeA="Positive test result"
outComeB="has disease"
pB=0.015
pAGivenB=0.8
pAGivenNotB=0.04
print(bayes(outComeA,outComeB,pB,pAGivenB,pAGivenNotB))