# Summary
1. functions
   1.  syntax - indentation, top of cell
   2.  return - if there is no return statement, the function still returns None
   3.  arguments - ordered (positional), unordered (keyword), default values, variable number of arguments (*args)
   4.  calling - in order to be evaluated, you must call them from the body of the program
   5.  functions can call other functions and even themselves (recursion)
  
2. Bundling functions so they can be used by multiple programs
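
The points under item 1 can be seen in a few lines. This is only a minimal sketch; the function names (`greet`, `no_return`, `total`, `factorial`) are made up for illustration and are not part of the course code.

```python
def greet(name, greeting="Hello"):
    # 'greeting' has a default value, so callers may leave it out
    return greeting + ", " + name + "!"

def no_return(x):
    # the product is computed but never returned, so the call evaluates to None
    x * 2

def total(*args):
    # *args collects any number of positional arguments into a tuple
    return sum(args)

def factorial(n):
    # recursion: the function calls itself until it reaches the base case
    if n <= 1:
        return 1
    return n * factorial(n - 1)

print(greet("Ada"))                      # positional argument, default greeting
print(greet(greeting="Hi", name="Ada"))  # keyword arguments can come in any order
print(no_return(3))                      # shows None
print(total(1, 2, 3))
print(factorial(5))
```

None of these functions does anything until it is called from the body of the program, which is what the `print` lines at the bottom do.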

In [None]:
# here is an example that uses a module named random. It is part of the standard library, so it comes with Python.
import random

print("Lucky numbers! Three numbers will be generated")
print("If one of them is 5, you lose.")

count = 0
while count <3:
    num=random.randint(1,6)
    print(num)
    if num == 5:
        print("Sorry, you lose!")
        break # the break means the else clause will be skipped if the if condition is met
    count+=1
else:
    print("you win!")

print("See you again.")

In [None]:
# this is a bit fussy, but instead of importing the entire random module, you can import only ONE function from it
from random import randint
#since you are importing a specific function, randint, you don't need to call it as a method of random like you did above:
#  num=random.randint(1,6). You can just call randint(lower number,upper number) directly.
# Generates a number from 1 through 10 inclusive
random_number = randint(1, 10)

guesses_left = 3
# Start your game!
while guesses_left>0:
    #cheating for the purpose of troubleshooting
    #print(random_number)
    guess=int(input("Your guess: "))
    if guess==random_number:
        print("You win!")
        #break exits the loop immediately, so the else clause below is skipped
        break
    guesses_left-=1
else:
    print("You lose")
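
Both games above rely on the `else` clause of a `while` loop, which runs only when the loop ends without hitting `break`. The sketch below swaps the random rolls for a fixed list so the behaviour is repeatable; the function name `play` and its `rolls` argument are invented for this illustration.

```python
def play(rolls):
    """Return "lose" if any roll is a 5, otherwise "win"."""
    count = 0
    while count < len(rolls):
        if rolls[count] == 5:
            result = "lose"
            break            # leaving through break skips the else clause
        count += 1
    else:
        result = "win"       # reached only when the loop finished without a break
    return result

print(play([1, 2, 3]))
print(play([1, 5, 3]))
```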

## Modularization: Tools when you need them, they don’t automatically load
* Collection of specialized functions, data types
    * More efficient
    * Allows for code re-use
    * Acts as documentation for other programmers (or you) who read your program later
* Library – a module which contains groups of related functions (we will see this in R)
* In Module 5B, when we are dealing with regular expressions, we will see two particular modules that need to be imported (re, os)

* We can import modules (we saw this in an earlier example) that contain functions and variables or we can create our own:

<div class="alert alert-block alert-warning">
        
        import math
        print(math.sqrt(100))

You can also import ALL functions from a module at once by using the asterisk symbol, *.
        
 <div class="alert alert-block alert-warning">
    
    
or, equivalently, we can bring in just one function from a module like so:
        
		from math import sqrt
		print(sqrt(100))

# Modules

* Python (like most other languages) has built-in functions that are for general use
* Modules and libraries (we’ll learn about them later) are a way to address discipline-centric functions
    * Sometimes, we will need to write a particular function for our own specific research needs
* Come with documentation
* The documentation should give you a list of names in the namespace
* To get list of names in namespace:

>import my_module

>print(dir(my_module))

* Conventions:
    * Uppercase names: constants
    * `_names` <- are internal use only
    * `__names__` <- special meaning
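
The `dir()` call and the underscore conventions above can be combined to separate the names meant for you from the special ones. This sketch uses the standard `math` module in place of `my_module`; the variable names `public` and `special` are just for illustration.

```python
import math

names = dir(math)

# names without a leading underscore are the ones intended for everyday use
public = [n for n in names if not n.startswith("_")]

# names wrapped in double underscores have a special meaning to Python
special = [n for n in names if n.startswith("__") and n.endswith("__")]

print(public[:5])
print(special)
```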

## Namespace
* A namespace is the scope in which a name (of a function or variable) is defined
* Modules have their own namespace (the names of the functions that belong to that particular module)

>from my_module import print (<-- this imports the print function from my_module, which then replaces the built-in print function in your program)

>from my_module import * (<-- this imports all functions from my_module)
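
To see the effect on a real module, here is a small sketch that uses `pow` from the standard `math` module in place of the hypothetical `my_module.print`. After the `from ... import`, the bare name `pow` refers to the module's version; the built-in one is still reachable through the standard `builtins` module.

```python
import builtins
import math
from math import pow   # rebinds the name 'pow' in the current namespace

print(pow(2, 3))            # math's version, always a float: 8.0
print(math.pow(2, 3))       # the same function, reached through the module's namespace
print(builtins.pow(2, 3))   # the built-in version, reached explicitly: 8
```

This is why `import my_module` followed by `my_module.name` is usually the safer habit: nothing in your own namespace gets replaced without you noticing.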

# How to create a module
* create a new file (easiest if it is in the same folder as the notebook or program that will import it, but that isn't necessary)
* save it with a .py extension (replace .txt with .py)
* write the functions in the module

Here is a simple example (modified from previous module example) that we are going to cut and paste into a module:

------------------------------
```
def AT_amount(dna):
    length = len(dna)
    A_count = dna.upper().count("A")
    T_count = dna.upper().count("T")
    AT_count=(A_count + T_count)/length
    return AT_count
```
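
If you would rather create the module file from Python instead of cutting and pasting, the same steps can be sketched as below. The file name `at_tools.py` and the scratch folder are assumptions made for this example (the course file is called `Module4B.py`); `sys.path` is the list of folders Python searches when it sees an `import`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

module_source = '''
def AT_amount(dna):
    length = len(dna)
    A_count = dna.upper().count("A")
    T_count = dna.upper().count("T")
    return (A_count + T_count) / length
'''

# write the module into a scratch folder (hypothetical file name: at_tools.py)
module_dir = Path(tempfile.mkdtemp())
(module_dir / "at_tools.py").write_text(module_source)

# tell Python where to look, then import the module by its name without ".py"
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
at_tools = importlib.import_module("at_tools")

print(round(at_tools.AT_amount("ATGATTA"), 3))
```

When the module file sits in the same folder as your notebook, none of the `sys.path` handling is needed and a plain `import` is enough.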

In [None]:
# We are going to use the AT_amount function by importing the file that contains it as a module
# We import the module by its file name without the ".py" extension
# Module4B.py

import Module4B
# The built-in function dir() lists the names defined in Module4B.py. Only one of them
# (AT_amount) is a function we wrote; the names surrounded by double underscores are
# attributes that Python adds to every module. We'll worry about some of that when we
# create objects of our own (in the object-oriented programming section).
print(dir(Module4B))

x=Module4B.AT_amount("ATGATTA")
# we could even use the built in round function that we learned about previously.
print("*******")
print(round(x,3))

['AT_amount', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__']
*******
0.857


In [None]:
# https://learnpythonthehardway.org/book/ex25.html
# We are creating a module with the code in this cell which we will then import into our program below.
def break_words(stuff):
#This function will break up words for us.
#the split method splits on whatever is specified - if nothing is specified, just whitespace
    words=stuff.split()
    return words

def sort_words(words):
#sorts the words.
#NOTE: SORTED WILL RETURN A LIST THAT IS SORTED BUT IT WON'T MODIFY THE ORIGINAL LIST
# WHICH IS WHAT SORT() DOES!
    return sorted(words)

def print_first_words(words):
#Prints the first word after popping it off
    word=words.pop(0)
    print(word)

def print_last_words(words):
#Prints the last word after popping it off
    word=words.pop(-1)
    print(word)

def sort_sentence(sentence):
#Takes in a full sentence and returns the sorted words.
    #a function calling another function
    words=break_words(sentence)
    return sort_words(words)

def print_first_and_last(sentence):
#Prints the first and the last words of a sentence
#a function calling another function
    words=break_words(sentence)
    print_first_words(words)
    print_last_words(words)

def print_first_and_last_sorted(sentence):
#sorts the words and then prints the first and last.
#a function calling another function
    words=sort_sentence(sentence)
    print_first_words(words)
    print_last_words(words)


In [None]:
# We are going to import the program above as a module
# in order to do that, we refer to it by its file name only (no .py ending!)

import Module4Be_from_LPTHW

sentence = "There is grandeur in this view of life"
words = (Module4Be_from_LPTHW.break_words(sentence))
print(words)
sorted_words = Module4Be_from_LPTHW.sort_words(words)
print(sorted_words)
Module4Be_from_LPTHW.print_first_words(words)
Module4Be_from_LPTHW.print_last_words(words)
print(sorted_words)


['There', 'is', 'grandeur', 'in', 'this', 'view', 'of', 'life']
['There', 'grandeur', 'in', 'is', 'life', 'of', 'this', 'view']
There
life
['There', 'grandeur', 'in', 'is', 'life', 'of', 'this', 'view']
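
The output above shows `sorted_words` unchanged on both prints even though two words were popped from `words` in between. A short sketch of why, using only built-in list behaviour (the variable names are for illustration):

```python
words = ["There", "is", "grandeur", "in", "this", "view", "of", "life"]

new_list = sorted(words)   # returns a new sorted list; 'words' is left untouched
print(words[0], new_list[1])

first = words.pop(0)       # pop removes the item from the list it is called on
print(first, len(words))

result = words.sort()      # sorts 'words' in place and returns None
print(result, words)
```

Capital letters sort before lowercase ones, which is why "There" comes first in the sorted list.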


In [None]:
# we can also import ONLY specific functions from a module
# note the slightly different format
from Module4Be_from_LPTHW import break_words

sentence = "There is grandeur in this view of life"
# note you don't need to call the function as Module4Be_from_LPTHW.break_words(), you can just call it as break_words()
words = break_words(sentence)
print(words)

['There', 'is', 'grandeur', 'in', 'this', 'view', 'of', 'life']


This is a **simple** simulator of mutation which will clarify the type of large state space that MCMC addresses efficiently. There is a bit more to MCMC than is given in this program, but it is a start. We will start with a simple four-nucleotide sequence, "CTAG" (you can, of course, start with any four-nucleotide sequence), and mutate it a specified number of times. We will use an extremely simplified mutation matrix (a picture illustrating the model is given below) in which, among other simplifying assumptions, the probabilities are the same for all nucleotides: a transition has probability 0.1, each of the two possible transversions has probability 0.05, and no mutation has probability 0.8.

To refresh your molecular evolution knowledge: a transversion is when a purine mutates to a pyrimidine or vice versa (e.g. A-->C, A-->T or C-->G); a transition is when a purine mutates to the other purine (A-->G or G-->A) or a pyrimidine mutates to the other pyrimidine (T-->C or C-->T).

There is a question on one of your problem sets where I ask you to re-write this mess of a simulator in a much more elegant/efficient manner.

Here is the breakdown of strategy:
1. The start sequence is specified. In this case, we are running the string "CTAG", but I should be able to substitute any sequence of *any length*.
2. A function takes the starting sequence and the number of steps as arguments and, using conditions, mutates the sequence n times. Note that in this model, a mutation event can be a transition, a transversion or no mutation at all. In fact, out of 1000000 events, you can expect a mutation to occur about 200000 times (20%).
3. We will want to sample the mutated sequence at different points in our simulator. This is partly to troubleshoot and debug, but also because we often want to do this in MCMC to test whether the equilibrium state has been reached (that doesn't really work under this particular model since the nucleotides are considered completely independent, but most models are more complex, so it is a good habit to incorporate into this kind of code).

Here is the function that lays out one round of where the mutations happen. We can call the function with a specified *nsteps*.

In [None]:
# Check out how you can determine how fast code is with a magic command.
# Note that %time on a line by itself times only an empty statement; to time a
# whole cell, put %%time on the first line of the cell instead.
%time
# Note: magic commands only work in IPython/Jupyter, so depending on your setup you may need the timeit module.
# You will see at the bottom of this cell, where I call the function, that I have surrounded the call
# with a start variable and an end variable. This should behave the same way on Windows and macOS.
# ---------------------------------------
# This is a terrible and inefficient program and I want YOU to do BETTER!

import timeit
# We import all the functions in the random module so that we don't have to use the
# random.randint() syntax; we can just use randint() instead.
from random import *

# This is a poorly written script and it doesn't do what I want it to do. Improve on it!
def mutation(current_seq,nsteps):
    while(nsteps > 1):
        if(nsteps==1 or nsteps==10 or nsteps==1000 or nsteps==10000):
            print(current_seq)
            print(nsteps)
        # one of the loci between site 0 and 3
        #print("************")
        current_loci = randint(0,3)
        #print(current_loci)
        # pick a number between 1 and 20 to determine if that site mutates or not
        current_mut_event = randint(1,20)
        #print("~~~~~~~~")
        #print(current_mut_event)
    # No MUTATION
        if(current_mut_event<=16):
            pass
    #MUTATION
        else:
            if(current_mut_event==17 or current_mut_event==18):  #TRANSITION MUTATIONS
      #replace the nt with a transition mutation
                if(current_seq[current_loci] == "A"):
                    current_seq[current_loci]= "G"

                if(current_seq[current_loci] == "G"):
                    current_seq[current_loci] = "A"

                if(current_seq[current_loci] == "T"):
                    current_seq[current_loci] = "C"

                if(current_seq[current_loci]== "C"):
                    current_seq[current_loci] = "T"

            else: #transversions are here
                if(current_mut_event==19): #TRANSVERSION 1
                    #print("What is the current_seq and current_loci? transversion 1")
                    #print(current_seq)
                    #print(current_loci)
                    if(current_seq[current_loci]=="A" or current_seq[current_loci]=="G"):
                        current_seq[current_loci]= "T"
                    else:
                        current_seq[current_loci] ="A"
                   # print("Am I where I am supposed to be in transversion 1?")
                    #print(current_seq[current_loci])

            #TRANSVERSION 2
                else:
                    #print("What is the current_seq and current_loci? transversion 2")
                    #print(current_seq)
                    #print(current_loci)
                    if(current_seq[current_loci]=="A" or current_seq[current_loci]=="G"):
                        current_seq[current_loci] = "C"
                    else:
                        current_seq[current_loci] = "G"
        #print("there can be only one")
        #print(current_seq)
        nsteps = nsteps-1
        #print(nsteps)
    return current_seq

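# Now we need to call this inefficient excuse of a function and assign the results to a variable.
# timeit.default_timer() reads the clock; timeit.timeit() with no arguments would
# instead time an empty statement and return that duration.
start=timeit.default_timer()
test = mutation(["C","T","A","G"],10000)
end=timeit.default_timer()
print(test)
print(end - start)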

CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.96 µs
['C', 'T', 'A', 'G']
10000
['A', 'A', 'G', 'A']
1000
['C', 'T', 'C', 'A']
10
['T', 'G', 'A', 'A']
0.0005913340137340128
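
For comparison, the probability model described above (no mutation 0.8, transition 0.1, each of the two transversions 0.05) can also be written with lookup tables instead of nested conditions. This is only a sketch of one possible direction, not the expected problem-set answer: the names `TRANSITION`, `TRANSVERSIONS`, `mutate_site` and `simulate` are invented here, and a seeded `random.Random` is an added assumption so that runs are repeatable.

```python
import random

# the single transition partner of each nucleotide
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
# the two possible transversion targets of each nucleotide
TRANSVERSIONS = {"A": ("T", "C"), "G": ("T", "C"), "C": ("A", "G"), "T": ("A", "G")}

def mutate_site(nt, rng):
    """Apply one mutation event to a single nucleotide."""
    draw = rng.random()
    if draw < 0.8:                 # no mutation
        return nt
    if draw < 0.9:                 # transition
        return TRANSITION[nt]
    if draw < 0.95:                # first transversion
        return TRANSVERSIONS[nt][0]
    return TRANSVERSIONS[nt][1]    # second transversion

def simulate(seq, nsteps, seed=None):
    """Mutate a sequence of any length nsteps times; return it and the number of changes."""
    rng = random.Random(seed)
    seq = list(seq)                # work on a copy so the caller's data is untouched
    changes = 0
    for _ in range(nsteps):
        site = rng.randrange(len(seq))
        new_nt = mutate_site(seq[site], rng)
        if new_nt != seq[site]:
            changes += 1
        seq[site] = new_nt
    return "".join(seq), changes

final_seq, changes = simulate("CTAG", 100000, seed=1)
print(final_seq, changes / 100000)
```

The printed fraction should land close to the 20% mutation rate quoted in the strategy list, which makes a handy sanity check for any rewrite of the simulator.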


## Assertions - debugging
* So far, in order to debug, we have relied on two general techniques:
    1. print statements
    2. Error messages (along with liberal use of Google)
* There are two other, more 'official' ways:
    1. try/except
    2. Assertions
        
* Assertions are a way of testing a function to ensure that it results in the expected value (are functions working as intended?)

    * Assertions let us test functions… *especially ones that we have modified* (to ensure that we haven’t introduced any new errors)
    
* If an assertion turns out to be false, the program stops and raises an AssertionError

* Useful also as documentation – by including a collection of assertion tests alongside a function, we can show exactly what output is expected

* With the example that we have been using so far, we might run the following short code where we know what the function should give us:

> **assert get_at_content("ATGC")==0.5**

* If the assertion turns out to be false, the program will stop and it will raise an AssertionError exception
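
A minimal sketch of both outcomes follows. It assumes `get_at_content` behaves like the `AT_amount` function shown earlier (the body here is written for this example), and it wraps the failing assertion in try/except so that the cell keeps running.

```python
def get_at_content(dna):
    """Fraction of the sequence made up of A and T (case-insensitive)."""
    dna = dna.upper()
    return (dna.count("A") + dna.count("T")) / len(dna)

# true assertions pass silently
assert get_at_content("ATGC") == 0.5
assert get_at_content("atgc") == 0.5

# a false assertion raises AssertionError; the text after the comma is an optional message
try:
    assert get_at_content("ATGC") == 0.9, "expected an AT content of 0.9"
except AssertionError as err:
    print("Assertion failed:", err)
```

Without the try/except, the program would stop at the failing line and nothing after it would run.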

Here is Rear Admiral Dr. Grace Murray Hopper:

<div class="alert alert-block alert-info">
<img src="http://image.slidesharecdn.com/FromAbacustoiPhonetoCrestaTVshare-090220174115-phpapp02/95/from-abacus-to-i-phone-to-cresta-tv-share-40-728.jpg?cb=1235151982" alt="Alt text that describes the graphic" title="Title text" />


In [None]:
# Here is an example of a function that can take a variable number of arguments by using *args
def biggest_number(*args):
    print("~~~~~~~~")
    #print(max(args))
    return max(args)

# we pass in 5 items
big = biggest_number(-10,-5,5,10,100)
# we pass in 3 items
big2 = biggest_number(1,4,10000)
# these should be true
assert big ==100
assert big2 == 10000
# this is also true; change the 100 to some other number to see an AssertionError
assert biggest_number(-10,-5,5,10,100)==100
# this is true as well, but if the assertion above failed we would never find out, since the program stops at the first failing assertion
assert biggest_number(-20,-5,5,10)==10

~~~~~~~~
~~~~~~~~
~~~~~~~~
~~~~~~~~


Class Questions:
----------------
1. Here is a 20 amino acid long protein sequence:
MSRSLLLRFLLFLLLLPPLP

a.	Write a function that takes two arguments, a protein sequence and an amino acid residue code (one letter that represents the amino acid; if you have no idea what I am talking about, then go here: https://en.wikipedia.org/wiki/Amino_acid), and returns the percentage of the protein that the amino acid makes up. The following assertions (which should all be true) need to be included in your code to test your function:

	assert my_function("MSRSLLLRFLLFLLLLPPLP","M")==5
	assert my_function("MSRSLLLRFLLFLLLLPPLP","r")==10
	assert my_function("msrslllrfllfllllpplp","L")==50
	assert my_function("MSRSLLLRFLLFLLLLPPLP","Y")==0

b.	Modify the function in part a so that it accepts a list of amino acid residues rather than a single one. Your function should pass the following assertions:
  
	assert my_function("MSRSLLLRFLLFLLLLPPLP",["M"])==5
	assert my_function("MSRSLLLRFLLFLLLLPPLP",["M","L"])==55
	assert my_function("MSRSLLLRFLLFLLLLPPLP",["F","S","L"])==70
	assert my_function("MSRSLLLRFLLFLLLLPPLP")==65

Note that you will need to give the second argument a default value in order to pass the last assertion (and that value should be ["F","M","L"]).

2. Write a function that bundles your board game board (we created a board game board previously, but now we want to bundle it into a function to be called).

