# BMI565: Bioinformatics Programming & Scripting

#### (C) 2015 Michael Mooney (mooneymi@ohsu.edu)

## Week 2: Functions, Generators, Object Oriented Programming, and Modules

1. Code Organization / Modularity
2. Functions
    - Function Parameters
    - Variable Scope
3. Generators
4. Object-oriented Programming (OOP)
    - Class Definitions
    - Inheritance, Polymorphism, Encapsulation
    - The `Exception` Class
5. Python Modules
    - Installing Modules
    - Creating Your Own Modules

#### Requirements

- Python 2.7
- Data Files
    - `annot_test.txt`
- Miscellaneous Files
    - `variable_scope.jpg`
    - `cell.jpg`

## Code Organization / Modularity

Modularity is a key concept in a top-down problem solving strategy. In programming, modularity refers to breaking code into small pieces that can (essentially) function independently. The advantages of modular code are:

1. Makes code easier to read/interpret
2. Makes code easier to maintain/debug
3. Makes code easier to reuse
4. Can sometimes improve performance

In Python (and many other languages), functions and classes (OOP) are the main tools for modularizing code.

<b>Functional Programming</b>: Create data structures and perform operations on the data using functions. Functions can manipulate input and return that changed input as output, but they do not maintain an internal state. 

<b>Object-oriented Programming</b>: Create data structures (objects) that have associated functions (methods). The objects maintain an internal state, which is manipulated by methods.

## Functions

Functions are executable blocks of code with specific inputs and (optionally) outputs. A function's inputs are specified by parameters in the function definition. Outputs are specified by the return value in the function definition. An example function definition is given below:

    def my_function(parameter1, parameter2):
        """
        Docstring explaining function
        """
        
        ...function code...
        
        return output


A function is called as follows:

    my_function(x, y)

\*Note: The number of arguments (x and y above) and their order are important.
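
To make the note about argument number and order concrete, here is a minimal sketch. The `describe_region` function is a hypothetical helper invented for this illustration; it is not part of the course files.

```python
def describe_region(chrom, start):
    """Return a short label for a genomic position (hypothetical helper)."""
    return "%s:%d" % (chrom, start)

# Arguments are matched to parameters by position
label = describe_region("chr1", 123)

# Swapping the order binds the wrong values to the parameters
try:
    describe_region(123, "chr1")
    swapped_ok = True
except TypeError:
    # "%d" cannot format the string "chr1"
    swapped_ok = False

# Passing too few arguments also raises a TypeError
try:
    describe_region("chr1")
    too_few_ok = True
except TypeError:
    too_few_ok = False
```

Only the first call succeeds; the other two fail because the values no longer line up with the parameters the function expects.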

In [1]:
def count_A(seq):
    """
    This function counts the number of As in a DNA sequence.
    """
    count = 0
    for base in seq:
        if base == "A":
            count = count + 1
    
    return count

In [2]:
seq = "ATAATAAGATGCGCGCGCGCGCTTATGCGCGCGCA"
count_A(seq)

8

### Function Parameters

When calling a function you can specify the arguments by parameter name (keyword arguments). In this case, the order is not important.

In [3]:
def foo(x, y, message):
    print message
    return x**y

foo(message="3 raised to the 4th power is:", x=3, y=4)

3 raised to the 4th power is:


81

You can also specify <b>default values</b> for parameters in the function definition. Parameters given default values must come after other parameters.

In [4]:
import random
def rolldice(num_rolls=1, sides=6):
    rolls = [random.randint(1,sides) for n in range(num_rolls)]
    return rolls
rolldice()

[2]

In [5]:
rolldice(3)

[2, 5, 6]

In [6]:
rolldice(3,20)

[20, 2, 17]
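
Default values can also be mixed with required parameters, as long as the required ones come first. The sketch below uses a hypothetical `gc_content` function (an assumption for illustration only) with one required parameter and two optional ones.

```python
def gc_content(seq, as_percent=False, digits=2):
    """Return the GC fraction of a DNA sequence (hypothetical helper)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    frac = float(gc) / len(seq)
    if as_percent:
        frac = frac * 100
    return round(frac, digits)

# Only the required argument: both defaults are used
default_call = gc_content("ATGC")

# Override the first default by position
percent_call = gc_content("ATGC", True)

# Override only the second default by name
rounded_call = gc_content("ATGCG", digits=1)
```

Naming the argument (`digits=1`) makes it possible to skip `as_percent` and still change a later default.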

#### Flexible Function Parameters

There are two ways to accept additional (not explicitly defined) arguments in a function definition. The `*parameters` notation collects any extra positional arguments provided in the function call into a tuple. The `**parameters` notation collects extra keyword arguments into a dictionary; in this case, the arguments must be specified as `key=value` pairs.

In [7]:
## Example of a function allowing additional parameters
def arg_tuple(message, *parameters):
    print "Function Message: %s" % message
    for p in parameters:
        print p

arg_tuple("Hello, world!", 1, 2, 3)

Function Message: Hello, world!
1
2
3


In [8]:
arg_tuple("Hello, world!")

Function Message: Hello, world!


In [9]:
## Another example of a function allowing additional parameters
def arg_dict(seq, **parameters):
    print seq
    for k, v in parameters.items():
        print str(k)+": "+str(v)

arg_dict(seq, type="DNA", location="chr1:123-4956")

ATAATAAGATGCGCGCGCGCGCTTATGCGCGCGCA
type: DNA
location: chr1:123-4956


In [10]:
param_dict = {'type':'DNA', 'loc':"chr1:123-456"}
arg_dict(seq, **param_dict)

ATAATAAGATGCGCGCGCGCGCTTATGCGCGCGCA
loc: chr1:123-456
type: DNA
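
Both notations can appear in the same definition, with `*` before `**`. The following is a small sketch; the `summarize` function and its `sep` option are hypothetical names chosen for this example.

```python
def summarize(label, *values, **options):
    """Join any number of values into one labelled string."""
    # Look up an optional keyword argument, with a fallback
    sep = options.get("sep", ", ")
    return label + ": " + sep.join(str(v) for v in values)

# Extra positional arguments end up in the 'values' tuple
plain = summarize("counts", 1, 2, 3)

# Extra keyword arguments end up in the 'options' dictionary
piped = summarize("counts", 1, 2, sep="|")

# A list can be unpacked into positional arguments with *
vals = [4, 5]
unpacked = summarize("x", *vals)
```

Unpacking a list with `*vals` mirrors the `**param_dict` call shown earlier for dictionaries.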


### Variable Scope

Variable scope defines where in your code a variable can be accessed. 

- Local Scope
    - Variables defined within a function definition are only accessible within that function
- Global Scope
    - Variables defined outside a function are accessible both outside and within a function
    
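\**Assigning to a global variable's name inside a function creates a new local variable instead of changing the global one. To change a global variable within a function, declare it with the `global` statement (use sparingly).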

<img src="variable_scope.jpg" align="left" width="200" />

#### An Example:

In [11]:
## Define a global variable 'a'
a = 42

## Define a function with a local variable 'a'
def foo2():
    #global a 
    a = 13
    print a

## Will the global variable 'a' be changed?
print a
foo2()
print a

42
13
42
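
For comparison, here is a short sketch of what happens when the `global` statement is used. The names `counter`, `increment_local`, and `increment_global` are made up for this illustration.

```python
counter = 0

def increment_local():
    # Assignment creates a new local 'counter'; the global is untouched
    counter = 1
    return counter

def increment_global():
    # 'global' makes the assignment apply to the module-level variable
    global counter
    counter = counter + 1
    return counter

increment_local()
after_local = counter    # still 0

increment_global()
after_global = counter   # now 1
```

Because a function that rebinds globals is harder to reason about, returning the new value and assigning it at the call site is usually the cleaner choice.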


## Generators

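Generators are a special type of function that returns an iterator. Generators contain `yield` statements rather than `return` statements: each call to the iterator's `.next()` method (or the built-in `next()` function) runs the function body until the next `yield` and hands back that value. When the body finishes, a `StopIteration` exception is raised.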

Generators can be useful for: 
- Memory-intensive applications
- Parallelizing code
- Artificial intelligence applications where we want to dynamically search a parameter space

#### A Simple Example:

In [12]:
def abc():
    yield "a"
    yield "b"
    yield "c"

gen = abc()
gen.next()

'a'

In [13]:
gen.next()

'b'

In [14]:
gen.next()

'c'

In [15]:
gen.next()

StopIteration: 

In [16]:
letters = [l for l in abc()]
letters

['a', 'b', 'c']

#### Prime Number Example:

In [17]:
## Define a function that determines if a number is prime
def isprime(n):
    for i in range(2,n):
        if n % i == 0:
            return False
    return True

## Define a generator that yields prime numbers <= n
## By using a generator, the list of primes is never stored in memory all at once
## (in Python 2, range() still builds the list of candidates; xrange() would avoid that)
def getprime(n):
    for i in range(2,n+1):
        if isprime(i):
            yield i

prime_gen = getprime(10)
prime_gen.next()

2

In [18]:
prime_gen.next()

3

In [19]:
[n for n in getprime(11)]

[2, 3, 5, 7, 11]
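
The same pattern applies to sequence data. As a sketch, the hypothetical `kmers` generator below yields overlapping substrings one at a time instead of building the whole list first. It uses the built-in `next()` function, which is available in Python 2.6+ and Python 3, in place of the `.next()` method.

```python
def kmers(seq, k):
    """Yield each overlapping substring of length k, one at a time."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

gen = kmers("ATGCG", 3)

# Ask for a single value
first = next(gen)

# Consume whatever is left
rest = list(gen)

# An exhausted generator raises StopIteration; a default avoids that
leftover = next(gen, None)
```

Once a generator is exhausted it cannot be rewound; call `kmers()` again to get a fresh iterator.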

## Object Oriented Programming (OOP)

<b>Class</b>: A data structure consisting of data fields and methods.

<b>Object</b>: An instantiated version of a class (an instance).

<img src="cell.jpg" align="center" width="400" />

Let's say we want to write a program to simulate a cell:

A <b>functional programmer</b> would:
1. Create data structures to represent the cell and its macromolecules
    - Data: DNA, mRNA, protein
2. Create functions to perform basic cellular tasks
    - Functions: DNA replication, transcription, translation, etc.

    dna = "ATGCGCTAAATTCA"
    mrna = transcribe(dna)
    protein = translate(mrna)

An <b>object-oriented programmer</b> would:
1. Create a class to represent the cell's macromolecules and methods associated with each
    - Data: DNA, mRNA, protein
    - Methods: DNA replication, transcription, translation, etc.

    myCell = Cell()
    myCell.dna = "ATGCGCTAAATTCA"
    myCell.transcribe()
    myCell.translate()
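
The two styles can be sketched as runnable code. This is only an illustration under assumptions: translation is left out, and the complement table is a stand-in for real transcription logic.

```python
# Assumed base-pairing table (template strand -> mRNA)
TRANSCRIPTION_TABLE = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Functional style: data goes in, new data comes out, no state is kept
def transcribe(dna):
    return "".join(TRANSCRIPTION_TABLE[base] for base in dna)

# Object-oriented style: the object stores its own state
class Cell(object):
    def __init__(self, dna):
        self.dna = dna
        self.mrna = None

    def transcribe(self):
        # The method updates the object's internal state
        self.mrna = "".join(TRANSCRIPTION_TABLE[base] for base in self.dna)

dna = "ATGCGCTAAATTCA"
mrna = transcribe(dna)

my_cell = Cell(dna)
my_cell.transcribe()
```

In the functional version the result lives in a separate variable (`mrna`); in the object-oriented version it is stored on the object (`my_cell.mrna`).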

### Class Definitions

    class Book:
        title = "NA"
        author = "NA"
        
        def __init__(self, t=title, a=author):
            if t is not None:
                self.title = t
            if a is not None:
                self.author = a
        
        def set_pages(self, pages):
            self.num_pages = pages

<b>Definitions</b>:
- `title` and `author` are class attributes (variables associated with the class)
- `self` refers to the object itself, and should be a parameter in all class methods
- `self.title`, `self.author`, and `self.num_pages` are object attributes (variables associated with an object of type Book)
- `__init__` is a special method that is executed when an object is created
- `set_pages()` is a method

In [20]:
## Define the Book class
class Book:
    title = "NA"
    author = "NA"

    def __init__(self, t=title, a=author):
        self.title = t
        self.author = a

    def set_pages(self, pages):
        self.num_pages = pages

In [21]:
## Create a Book object
scifi_book = Book()
print scifi_book.title

NA


In [22]:
scifi_book = Book("Ender's Game", "Orson Scott Card")
print scifi_book.title
print scifi_book.author

Ender's Game
Orson Scott Card


In [23]:
scifi_book.set_pages(324)
print scifi_book.num_pages

324


#### Variable Scope

Rules for variable scope apply in class definitions the same as for function definitions. However, `self` is used to indicate object attributes, which will be preserved outside of methods. Any variables not declared using `self` are treated as local variables and will not be accessible outside a class method.

    class Book:
        title = "NA"
        author = "NA"
        
        def __init__(self, t=title, a=author):
            language = "English"
            self.title = t
            self.author = a
        
        def set_pages(self, pages):
            self.num_pages = pages

The `language` variable has local scope within the `__init__` method and will be lost after the object is created.
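
A quick way to confirm this is to check which attributes exist on the object after it is created. The sketch below uses a trimmed-down version of the `Book` class and the built-in `hasattr()` function.

```python
class Book(object):
    def __init__(self, t="NA", a="NA"):
        language = "English"   # local variable: discarded when __init__ returns
        self.title = t         # object attribute: kept on the object
        self.author = a

book = Book("Ender's Game", "Orson Scott Card")

has_title = hasattr(book, "title")
has_language = hasattr(book, "language")
```

Here `has_title` is `True` and `has_language` is `False`; writing `self.language = "English"` instead would keep the value on the object.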

In [24]:
## An example of a class for DNA sequences
class Sequence:
    TranscriptionTable = {"A":"U","T":"A","C":"G","G":"C"}
    def __init__(self, seqstring):
        assert len(set(seqstring.upper()) - set(['A', 'T', 'C', 'G'])) == 0, "Not a DNA sequence!"
        self.seqstring = seqstring.upper()
    def transcribe(self):
        tt = ""
        for x in self.seqstring:
            tt += self.TranscriptionTable[x]
        return tt

seqObj = Sequence(seq)
seqObj.transcribe()

'UAUUAUUCUACGCGCGCGCGCGAAUACGCGCGCGU'

In [25]:
seqObj.TranscriptionTable

{'A': 'U', 'C': 'G', 'G': 'C', 'T': 'A'}

In [26]:
Sequence.TranscriptionTable

{'A': 'U', 'C': 'G', 'G': 'C', 'T': 'A'}

In [27]:
seq

'ATAATAAGATGCGCGCGCGCGCTTATGCGCGCGCA'

### Inheritance, Polymorphism, Encapsulation

Inheritance, polymorphism (a.k.a. overloading), and encapsulation are three important features of the object-oriented programming paradigm.

1. Inheritance:
    - The ability to create a new class that automatically includes (inherits) the features of another class
    - The new class can be modified to represent a more specific phenomenon
    - Example: a mammal class can be modified to create a dog class
    - Advantage: ability to reuse and tailor classes
2. Polymorphism (a.k.a. Overloading):
    - The ability of a method to behave differently depending on the object type
    - Example: the `+` operator behaves differently for strings and numeric data types
    - Advantage: ability to create a common interface
3. Encapsulation:
    - The ability to make variables and methods private, so they are not easily modified
    - There is limited support for this in Python (name mangling)
    - Advantage: ability to simplify code and prevent misuse

#### Inheritance

Inheritance allows us to create a new class, a <b>derived class</b>, that retains all methods and attributes from a <b>base class</b>. We can specify new class variables and methods that override those in the base class.

In [28]:
## Define a new class that inherits from the Sequence class
class Plasmid(Sequence):
    abResDict = {"Tet":"CTAGCAT", "Amp":"CACTACTG"}

pbr322 = Plasmid("ttctcatgtttgacagctta")
pbr322.seqstring

'TTCTCATGTTTGACAGCTTA'

In [29]:
pbr322.transcribe()

'AAGAGUACAAACUGUCGAAU'

In [30]:
## Add attributes and a method to the Plasmid class
class Plasmid(Sequence):
    abResDict = {"Tet":"CTAGCAT", "Amp":"CACTACTG"}
    def __init__(self, seqstring, iname):
        self.insert_name = iname
        Sequence.__init__(self, seqstring)
    def abRes(self, ab):
        if self.abResDict[ab] in self.seqstring:
            return True
        else:
            return False

pbr322 = Plasmid("ttctcatgtttgacagctta", "Akt1")
pbr322.seqstring

'TTCTCATGTTTGACAGCTTA'

In [31]:
pbr322.abRes('Tet')

False
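
The cells above add new attributes and methods; a derived class can also override a method that the base class already defines. The sketch below uses a simplified `Sequence` class and a hypothetical `describe()` method that is not part of the class defined earlier.

```python
class Sequence(object):
    TranscriptionTable = {"A": "U", "T": "A", "C": "G", "G": "C"}

    def __init__(self, seqstring):
        self.seqstring = seqstring.upper()

    def transcribe(self):
        return "".join(self.TranscriptionTable[x] for x in self.seqstring)

    def describe(self):
        return "Sequence of length %d" % len(self.seqstring)

class Plasmid(Sequence):
    def describe(self):
        # Override: call the base-class version, then extend its result
        return "Plasmid: " + Sequence.describe(self)

p = Plasmid("ttct")
description = p.describe()     # uses the overriding method
transcript = p.transcribe()    # inherited unchanged from Sequence
```

Calling `Sequence.describe(self)` explicitly follows the same pattern as the `Sequence.__init__(self, seqstring)` call in the `Plasmid` constructor.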

#### Polymorphism

Polymorphism allows us to modify common operators and methods to behave in a specific way depending on object type. This means that we can create customized classes that still behave as one would expect a Python object to behave. We can do this using special class methods. 

#### Special Methods
<table align="left">
<tr><td style="text-align:center"><b>Method</b></td><td><b>Description</b></td></tr>
<tr><td style="text-align:center"><code>__init__()</code></td><td>Executed every time an object is created</td></tr>
<tr><td style="text-align:center"><code>__repr__()</code></td><td>Returns a string that is a unique representation of the object; result of <code>repr(object)</code></td></tr>
<tr><td style="text-align:center"><code>__str__()</code></td><td>Returns a string to be displayed when the object is printed; result of <code>str(object)</code></td></tr>
<tr><td style="text-align:center"><code>__len__()</code></td><td>Returns a length value for the object; result of <code>len(object)</code></td></tr>
<tr><td style="text-align:center"><code>__add__()</code></td><td>Specifies the behavior of the <code>+</code> operator</td></tr>
</table>
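
As an illustration of these special methods, the sketch below defines a simplified `Sequence` class (without the validation used earlier) that supports `repr()`, `str()`, `len()`, and the `+` operator. The choice to make `+` concatenate two sequences is an assumption made for this example.

```python
class Sequence(object):
    def __init__(self, seqstring):
        self.seqstring = seqstring.upper()

    def __repr__(self):
        # Unambiguous representation, useful for debugging
        return "Sequence(%r)" % self.seqstring

    def __str__(self):
        # Friendly representation, used by print and str()
        return self.seqstring

    def __len__(self):
        return len(self.seqstring)

    def __add__(self, other):
        # '+' returns a new Sequence holding both strings joined
        return Sequence(self.seqstring + other.seqstring)

a = Sequence("atg")
b = Sequence("ccc")
combined = a + b
```

With these methods in place, `len(combined)` is 6, `str(combined)` is `'ATGCCC'`, and `repr(combined)` is `"Sequence('ATGCCC')"`.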

#### Encapsulation

Encapsulation allows us to keep portions of code (variables, methods) private, meaning they are not directly viewable or usable from outside the class. This allows us to control how others use our code. In Python, name "mangling" allows us to make variables and methods <i>less</i> visible. To create a mangled variable or method, place two underscores before the variable or method name.

    __MangledVar = 13
    
    def __MangledMethod(self, msg):
        print "Secret Message: " + str(msg)


Mangled variables or methods can still be accessed, but the class name must be included.

    object._ClassName__MangledVar

In [32]:
## Define a class with a mangled attribute
class MangledMsg:
    __version = 1.0
    def __init__(self, msg):
        self.msg = msg

mang_obj = MangledMsg("Hello, world!")
print mang_obj._MangledMsg__version

1.0


In [33]:
print mang_obj.__version

AttributeError: MangledMsg instance has no attribute '__version'

### The `Exception` Class

An exception is a signal that an error has occurred. We'll talk about exceptions more when we discuss error checking and code testing.

#### Creating Custom Exceptions

We can create a custom exception class by inheriting from the base `Exception` class.

In [34]:
## A custom exception to be raised when we encounter
## an odd number
class NotEvenNumberError(Exception):
    def __init__(self, num):
        self.num = num
    def __str__(self):
        return str(self.num) + " is not an even number!"

In [35]:
## Use the raise statement to activate your exception
num = 3
if num % 2 != 0:
    raise NotEvenNumberError(num)

NotEvenNumberError: 3 is not an even number!
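
A raised exception stops the program unless it is caught. As a brief sketch, a custom exception can be handled with `try`/`except` like any built-in one. The `halve` function is a hypothetical example written for this illustration.

```python
class NotEvenNumberError(Exception):
    def __init__(self, num):
        self.num = num

    def __str__(self):
        return str(self.num) + " is not an even number!"

def halve(num):
    """Return num / 2, refusing odd numbers (hypothetical helper)."""
    if num % 2 != 0:
        raise NotEvenNumberError(num)
    return num // 2

try:
    result = halve(3)
except NotEvenNumberError as err:
    # The exception object carries the offending value
    message = str(err)
    result = None
```

Catching the specific class, rather than a bare `except:`, lets unrelated errors surface normally.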

## Python Modules

Modules are files containing code that can be easily imported and re-used. There are a number of built-in modules that are included with the standard Python installation. We've seen some of these already, such as `csv`, `re`, `random`, etc.

    import os
    os.listdir(".")  ## lists files in the current directory
    os.getcwd()  ## returns the current working directory
    
    import random as ran ## use 'as' to rename a module
    ran.randint(0,10)
    
We can also import specific functions or variables that can be called without the module name.

    from random import randint
    randint(1,6)

### Installing Modules

Installing modules using the `pip` package manager:
    
    pip install <package-name>

To install a package from source, first download and decompress the source package. Then change to the source directory and execute the following command:
    
    python setup.py install
    
A list of available packages: [http://pypi.python.org/pypi](http://pypi.python.org/pypi)


### Creating Your Own Modules

A module is simply a `.py` file containing the functions, classes, and variables we define. We would import our module as follows:

    import my_module  ## imports my_module.py

Make sure that your module (`my_module.py`) is located in the same directory as your program or in a directory on Python's module search path (`sys.path`).

    import sys
    print sys.path
    sys.path.append("/home/mooneymi/mymodules")
    
** Caution: Don't name your module the same as common Python modules (e.g. sys, csv, re, random). This can lead to faulty code.

Note that importing a module with the `import` statement will load it only once per session in the Python interpreter; repeated imports reuse the already loaded module. Use the built-in `reload()` function (Python 2) if you have made changes to the module and want to re-import it.

    reload(my_module)
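
The whole workflow can be sketched end to end. To keep the example self-contained it writes a tiny module into a temporary directory; in practice the file would be created in an editor. The module name `my_seq_tools` is hypothetical.

```python
import os
import sys
import tempfile

# Write a tiny module file into a temporary directory
# (a stand-in for a personal modules directory)
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "my_seq_tools.py")
with open(module_path, "w") as handle:
    handle.write("def count_A(seq):\n")
    handle.write("    return seq.count('A')\n")

# Make the directory visible to the import system, then import as usual
sys.path.append(module_dir)
import my_seq_tools

n_A = my_seq_tools.count_A("ATAATAAG")
```

After the import, functions in the file are reached through the module name, exactly as with built-in modules such as `random`.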

#### Why create modules?

- Portability of tools
- Code organization
- Specify default values/parameters
- Sharing code with others (dissemination, publication)
    - Packages: "collections" of modules (allows modules to be grouped under a common name)
    - [http://pypi.python.org/pypi](http://pypi.python.org/pypi)

#### Execution as a Main Program

Although modules are designed to contain code extensions (i.e., functions that can be called by another program), in certain circumstances we might want to execute the module itself as a stand-alone program:

    python my_module.py

At the end of the `my_module.py` module we would place the following code:

    if __name__ == "__main__":
        ## code to run when the module is executed from the command line,
        ## but not when it is imported (no else branch is needed)
        pass


Names surrounded by double underscores (such as `__name__`) are special variables set by Python. The `__name__` variable contains the module name when the file is imported, and the string `"__main__"` when the file is executed directly, so a program can use it to determine how it was started.
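
A common layout, sketched below, puts the stand-alone behavior in a `main()` function and calls it only from the `__name__` check. The `gc_count` function and the `main()` name are assumptions for this example; `main` is a widespread convention, not a requirement.

```python
def gc_count(seq):
    """Count G and C bases in a DNA sequence (hypothetical helper)."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")

def main():
    # Reached only when the file is run directly, e.g. 'python my_module.py'
    print(gc_count("ATGCGC"))

if __name__ == "__main__":
    main()
```

When the file is imported, `gc_count()` becomes available to the importing program and nothing is printed.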

## In-Class Exercises

In [None]:
## Exercise 1.
## Write a function that parses the microarray annotation file (as you did last week).
## The function should take one argument (the path to the file) and should return a list 
## of probe IDs and a list of gene symbols.
##

## Exercise 2. 
## Create a module that contains the above function.
## Import the module and use the function to assign the two lists to variables.
##


## References

- <u>Python for Bioinformatics</u>, Sebastian Bassi, CRC Press (2010)
- <u>Python Essential Reference</u>, David Beazley, 4th Edition, Addison‐Wesley (2008)
- [http://docs.python.org/](http://docs.python.org/)
- [https://docs.python.org/2/howto/functional.html](https://docs.python.org/2/howto/functional.html)
- [https://docs.python.org/2/tutorial/errors.html#user-defined-exceptions](https://docs.python.org/2/tutorial/errors.html#user-defined-exceptions)

#### Last Updated: 22-Sep-2015