# Python: Functions and Modules

Materials by: John Blischak, Anthony Scopatz, and other Software Carpentry instructors (Joshua R. Smith, Milad Fatenejad, Katy Huff, Tommy Guy and many more)

Computers are very useful for doing the same operation over and over. When you know you will perform the same operation many times, it is best to encapsulate that code in a function or a method. Programming functions are related to mathematical functions, e.g. $f(x)$, and it is helpful to think of them as abstract operators that produce output given some input. Let's look at some examples to solidify this concept.

# Built-in string methods
The base Python distribution comes with many useful functions. When a function belongs to a specific type of data (lists, strings, dictionaries, etc.) and is called on a value of that type, it is called a method. I will cover some of the basic string methods since they are very useful for reading data into Python.

In [1]:
# Find the start codon of a gene
dna = 'CTGTTGACATGCATTCACGCTACGCTAGCT'
dna.find('ATG')

8
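The value returned by `find` is a zero-based index, and `find` signals "not found" with `-1` rather than an error. A minimal sketch of both cases, reusing the sequence above (the search string `'UUU'` is just an illustrative pattern that does not occur in it; `print(...)` with a single argument is used so the code runs under Python 2 and 3):

```python
dna = 'CTGTTGACATGCATTCACGCTACGCTAGCT'

start = dna.find('ATG')      # index of the first match, counting from 0
missing = dna.find('UUU')    # find returns -1 when there is no match

print(start)
print(missing)
print(dna[start:start + 3])  # slicing at the returned index recovers the codon
```

Because `-1` is also a valid index in Python (the last character), it is worth checking for it before using the result to slice.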

In [3]:
str.split?

In [4]:
# parsing a line from a comma-delimited file
lotto_numbers = '4,8,15,16,23,42\n'
lotto_numbers = lotto_numbers.strip()
print lotto_numbers.split(',')

['4', '8', '15', '16', '23', '42']
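Note that `split` returns a list of strings, not numbers. If the values are needed for arithmetic, each piece has to be converted first. A small sketch, assuming every field is a valid integer (the names `fields` and `numbers` are illustrative):

```python
lotto_numbers = '4,8,15,16,23,42\n'
fields = lotto_numbers.strip().split(',')   # list of strings

numbers = []
for field in fields:
    numbers.append(int(field))              # convert each string to an integer

print(numbers)
print(sum(numbers))
```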


In [5]:
str.replace?

In [6]:
question = '%H%ow%z%d%@d%z%th%ez$%@p%ste%rzb%ur%nz%$%@szt%on%gue%?%'
question = question.replace('%', '')
question = question.replace('@', 'i')
question = question.replace('$', 'h')
question = question.replace('z', ' ')
print question

How did the hipster burn his tongue?


In [7]:
answer = '=H=&!dr=a=nk!c=~ff&&!be=f~r&=!i=t!w=as!c=~~l.='
print answer.replace('=', '').replace('&', 'e').replace('~', 'o').replace('!', ' ')

He drank coffee before it was cool.


## Short Exercise: Calculate GC content of DNA

Because guanine (G) binds to cytosine (C) with a different strength than adenine (A) binds to thymine (T), among many other differences, it is often useful to know what fraction of a DNA sequence consists of G's and C's. Go to the [string method section](http://docs.python.org/2/library/string.html) of the Python documentation and find the string method that will allow you to calculate this fraction.

In [17]:
# Calculate the fraction of G's and C's in this DNA sequence
seq1 = 'ACGTACGTAGCTAGTAGCTACGTAGCTACGTA'
num_G = seq1.count('G')
num_C = seq1.count('C')
total = float(len(seq1))
gc = (num_G + num_C) / total

print num_G, num_C, total, gc


gc = (seq1.count('G') + seq1.count('C'))/float(len(seq1))

print gc

8 7 32.0 0.46875
0.46875


In [10]:
str.count?

Check your work:

In [15]:
round(gc, ndigits = 2) == .47

True

# Creating your own functions!
When there is not an available function to perform a task, you can write your own functions.  The simplest functions have the  following format in Python:

    def <function name>():
        <function body>

In [18]:
def do_nothing():
    s = "I don't do much"

However, this often isn't very useful since we haven't returned any values from this function. Note that if you don't return anything from a function in Python, it implicitly returns the special `None` singleton. To return values that you computed locally in the function body, use the **return** keyword.

    def <function name>():
        <function body>
        return <local variable 1>    

In [19]:
def square(x):
    sqr = x * x
    return sqr

You use a function by writing its name followed by parentheses `()`, after the function has been defined. This is known as **calling** the function. If the function requires arguments, their values go inside the parentheses.

In [22]:
result = square(4)
print result


print square(2)

16
4
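The implicit `None` mentioned earlier can be checked directly by calling both kinds of function and storing the results. A minimal sketch that redefines the two functions so it is self-contained (the names `nothing` and `squared` are illustrative):

```python
def do_nothing():
    s = "I don't do much"   # local variable, discarded when the function ends


def square(x):
    sqr = x * x
    return sqr


nothing = do_nothing()   # no return statement, so the call evaluates to None
squared = square(3)      # the returned value is bound to a name

print(nothing is None)
print(squared)
```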


Like mathematical functions, you can compose a function with other functions or with itself!

In [23]:
print square(square(2))

16


Functions may be defined such that they have multiple arguments or multiple return values:

    def <function name>(<arg1>, <arg2>, ...):
        <function body>
        return <var1> , <var2>, ...
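As a sketch of the template above, the hypothetical function `min_max` below takes two arguments and returns two values. Python packs the returned values into a tuple, which the caller can unpack into separate names:

```python
def min_max(first, second):
    """Return the smaller and the larger of two values, in that order."""
    if first <= second:
        return first, second
    return second, first


smaller, larger = min_max(7, 3)   # the returned tuple is unpacked into two names
pair = min_max(7, 3)              # without unpacking, the result is a single tuple

print(smaller)
print(larger)
print(pair)
```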

In [24]:
def hello(time, name):
    """Print a nice message. Time and name should both be strings.
    Example: hello('morning', 'Software Carpentry')
    """
    print 'Good ' + time + ', ' + name + '!'

In [25]:
hello?

In [28]:
hello('Billy', 'morning')

Good Billy, morning!
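The odd greeting above comes from argument order: Python matches arguments to parameters by position, so `'Billy'` was used as `time`. One way to guard against this is to name the arguments in the call. The sketch below uses a hypothetical variant, `greeting`, that returns the message instead of printing it, so the results can be compared:

```python
def greeting(time, name):
    """Build the same message as hello, but return it instead of printing."""
    return 'Good ' + time + ', ' + name + '!'


swapped = greeting('Billy', 'morning')             # matched by position: time='Billy'
ordered = greeting('morning', 'Billy')             # positions match the definition
by_name = greeting(name='Billy', time='morning')   # keywords make the order irrelevant

print(swapped)
print(ordered)
print(by_name)
```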


The description right below the function name is called a docstring. For best practices on composing docstrings, read [PEP 257 -- Docstring Conventions](http://www.python.org/dev/peps/pep-0257/).

## Short exercise: Write a function to calculate GC content of DNA
Make a function that calculates the GC content of a given DNA sequence. For the more advanced participants, make your function able to handle sequences of mixed case (see the third test case).

In [31]:
def calculate_gc(x):
    """Calculates the GC content of DNA sequence x.
    x: a string composed only of A's, T's, G's, and C's."""
    x = x.upper()
    num_G = x.count('G')
    num_C = x.count('C')
    total = float(len(x))
    gc = (num_G + num_C) / total
    return gc

Check your work:

In [32]:
print round(calculate_gc('ATGC'), ndigits = 2) == 0.50
print round(calculate_gc('AGCGTCGTCAGTCGT'), ndigits = 2) == 0.60
print round(calculate_gc('ATaGtTCaAGcTCgATtGaATaGgTAaCt'), ndigits = 2) == 0.34

True
True
True


In [33]:
str.upper?

# Modules
Python has a lot of useful data types and functions built into the language, some of which you have already seen. For a full list, you can type `dir(__builtins__)`. However, there are even more functions stored in modules. An example is the sine function, which is stored in the math module. In order to access mathematical functions, like sin, we need to `import` the math module. Let's take a look at a simple example:

In [34]:
print sin(3) # Error! Python doesn't know what sin is...yet

NameError: name 'sin' is not defined

In [35]:
import math # Import the math module
math.sin(3)

0.1411200080598672

In [36]:
dir(math) # See a list of everything in the math module

['__doc__',
 '__file__',
 '__name__',
 '__package__',
 'acos',
 'acosh',
 'asin',
 'asinh',
 'atan',
 'atan2',
 'atanh',
 'ceil',
 'copysign',
 'cos',
 'cosh',
 'degrees',
 'e',
 'erf',
 'erfc',
 'exp',
 'expm1',
 'fabs',
 'factorial',
 'floor',
 'fmod',
 'frexp',
 'fsum',
 'gamma',
 'hypot',
 'isinf',
 'isnan',
 'ldexp',
 'lgamma',
 'log',
 'log10',
 'log1p',
 'modf',
 'pi',
 'pow',
 'radians',
 'sin',
 'sinh',
 'sqrt',
 'tan',
 'tanh',
 'trunc']

In [37]:
help(math) # Get help information for the math module

Help on module math:

NAME
    math

FILE
    /Users/rowellw/anaconda/lib/python2.7/lib-dynload/math.so

MODULE DOCS
    http://docs.python.org/library/math

DESCRIPTION
    This module is always available.  It provides access to the
    mathematical functions defined by the C standard.

FUNCTIONS
    acos(...)
        acos(x)
        
        Return the arc cosine (measured in radians) of x.
    
    acosh(...)
        acosh(x)
        
        Return the hyperbolic arc cosine (measured in radians) of x.
    
    asin(...)
        asin(x)
        
        Return the arc sine (measured in radians) of x.
    
    asinh(...)
        asinh(x)
        
        Return the hyperbolic arc sine (measured in radians) of x.
    
    atan(...)
        atan(x)
        
        Return the arc tangent (measured in radians) of x.
    
    atan2(...)
        atan2(y, x)
        
        Return the arc tangent (measured in radians) of y/x.
        Unlike atan(y/x), the signs of both x and y are conside

It is not very difficult to use modules - you just have to know the module name and import it. There are a few variations on the import statement that can be used to make your life easier. Let's take a look at an example:

In [38]:
from math import *  # import everything from math into the global namespace (A BAD IDEA IN GENERAL)
print sin(3)        # notice that we don't need to type math.sin anymore
print tan(3)        # the tangent function was also in math, so we can use that too

0.14112000806
-0.142546543074
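One reason the star import is discouraged is that it rebinds every public name of the module in your namespace, including names that already mean something. `pow` is one such name: it exists both as a built-in and in `math`, and the two behave differently. The sketch below does not perform the star import; it only compares the two functions through the `math.` prefix to show what the bare name `pow` would silently change to:

```python
import math

builtin_result = pow(2, 3)       # built-in pow keeps integers as integers
math_result = math.pow(2, 3)     # math.pow always works in floating point

print(builtin_result)
print(math_result)
print(pow is math.pow)           # two different functions sharing one name
```

After `from math import *`, a later call to `pow` would quietly use the `math` version, which is hard to spot when reading the code.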


In [39]:
# Clear everything from IPython
%reset

Once deleted, variables cannot be recovered. Proceed (y/[n])? y


In [1]:
from math import sin  # Import just sin from the math module. This is a good idea.
print sin(3)          # We can use sin because we just imported it
print tan(3)          # Error: We only imported sin - not tan

0.14112000806


NameError: name 'tan' is not defined

In [2]:
# Clear everything
%reset

Once deleted, variables cannot be recovered. Proceed (y/[n])? y


In [1]:
import math as m      # Same as import math, except we are renaming the module m
print m.sin(3)        # This is really handy if you have module names that are long

0.14112000806


## The General Problem
![xkcd](http://imgs.xkcd.com/comics/the_general_problem.png "I find that when someone's taking time to do something right in the present, they're a perfectionist with no ability to prioritize, whereas when someone took time to do something right in the past, they're a master artisan of great foresight.")
From [xkcd](http://www.xkcd.com)

Now that you can write your own functions, you too will experience the dilemma of deciding whether to spend the extra time to make your code more general, and therefore more easily reused in the future.

# Longer exercise: Reading Cochlear implant data into Python
For this exercise we will return to the cochlear implant data first introduced in the section on the shell. In order to analyze the data, we need to import the data into Python. Furthermore, since this is something that would have to be done many times, we will write a function to do this. As before, beginners should aim to complete Part 1 and more advanced participants should try to complete Part 2 and Part 3 as well.

## Part 1: View the contents of the file from within Python
Write a function `view_cochlear` that will open the file and print out each line. The only input to the function should be the name of the file as a string. 
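As a reminder of the general pattern (not a solution to the exercise), the sketch below opens a file and loops over its lines. It writes a small temporary file first so that it is self-contained; the file contents are a hypothetical stand-in, not the cochlear implant data:

```python
import os
import tempfile

# Hypothetical stand-in file; the real data files are not reproduced here.
handle = tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False)
handle.write('first line\nsecond line\n')
handle.close()

lines_seen = []
with open(handle.name) as infile:
    for line in infile:              # a file object yields one line per iteration
        lines_seen.append(line)
        print(line.rstrip('\n'))     # drop the trailing newline to avoid blank lines

os.remove(handle.name)               # clean up the temporary file
```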

In [None]:
def view_cochlear(filename):
    """Write your docstring here.
    """
       

Test it out:

In [None]:
view_cochlear('/home/swc/boot-camps/shell/data/alexander/data_216.DATA')

In [None]:
view_cochlear('/home/swc/boot-camps/shell/data/Lawrence/Data0525')

## Part 2: 
Adapt your function above to exclude the first line using the flow control techniques we learned in the last lesson. The first line is just `#` (but don't forget to remove the `'\n'`).

In [None]:
def view_cochlear(filename):
    """Write your docstring here.
    """
    

Test it out:

In [None]:
view_cochlear('/home/swc/boot-camps/shell/data/alexander/data_216.DATA')

In [None]:
view_cochlear('/home/swc/boot-camps/shell/data/Lawrence/Data0525')

## Part 3:
Adapt your function above to return a dictionary containing the contents of the file. Split each line of the file by a colon followed by a space (': '). The first half of the string should be the key of the dictionary, and the second half should be the value of the dictionary.
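The splitting step can be tried on a single line before writing the whole function. In the sketch below, only the key name `Subject` comes from the checks in this lesson; the value `example-subject` and the names `parts` and `record` are made up for illustration:

```python
# Hypothetical line; the value after the colon is invented for this sketch.
line = 'Subject: example-subject\n'

parts = line.strip().split(': ')   # remove the newline, then split on colon + space
key = parts[0]
value = parts[1]

record = {}
record[key] = value                # store the pair in a dictionary

print(record['Subject'])
```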

In [None]:
def save_cochlear(filename):
    """Write your docstring here.
    """

Check your work:

In [None]:
data_216 = save_cochlear("/home/swc/boot-camps/shell/data/alexander/data_216.DATA")
print data_216["Subject"]

In [None]:
Data0525 = save_cochlear("/home/swc/boot-camps/shell/data/Lawrence/Data0525")
print Data0525["CI type"]

# Bonus Exercise: Transcribe DNA to RNA
### Motivation:
During transcription, an enzyme called RNA Polymerase reads the DNA sequence and creates a complementary RNA sequence. Furthermore, RNA has the nucleotide uracil (U) instead of thymine (T). 
### Task:
Write a function that mimics transcription. The input argument is a string that contains the letters A, T, G, and C. Create a new string following these rules: 

* Convert A to U

* Convert T to A

* Convert G to C

* Convert C to G

Hint: You can iterate through a string using a for loop similarly to how you loop through a list.
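To illustrate the hint without giving away the exercise, the sketch below loops over a string one character at a time and builds a new string in which every letter is doubled. The doubling rule and the names `word` and `doubled` are purely illustrative:

```python
word = 'ATGC'

doubled = ''
for letter in word:                      # each pass through the loop sees one character
    doubled = doubled + letter + letter  # strings grow by concatenation

print(doubled)
```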

In [None]:
def transcribe(seq):
    """Write your docstring here.
    """

Check your work:

In [None]:
transcribe('ATGC') == 'UACG'

In [None]:
transcribe('ATGCAGTCAGTGCAGTCAGT') == 'UACGUCAGUCACGUCAGUCA'