[![Py4Life](https://raw.githubusercontent.com/Py4Life/TAU2015/gh-pages/img/Py4Life-logo-small.png)](http://py4life.github.io/TAU2015/)
## Lecture 5 - 15.4.2015
### Last update: 23.3.2015
### Tel-Aviv University / 0411-3122 / Spring 2015

# Testing & Debugging

> Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. - _B. W. Kernighan and P. J. Plauger, [The Elements of Programming Style](http://www.amazon.com/gp/product/0070342075?ie=UTF8&tag=catv-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=0070342075)_

> Code that cannot be tested is flawed.

> Why do we never have time to do it right, but always have time to do it over?

> Fast, good, cheap: pick any two. - _[Project management triangle](http://en.wikipedia.org/wiki/Project_management_triangle)_

![bugs](http://assets.nydailynews.com/polopoly_fs/1.1064084!/img/httpImage/image.jpg_gen/derivatives/landscape_635/bugs01-web.jpg)

# Bug categories

## Errors

`SyntaxError`: Illegal Python code. This error is raised when Python parses the program, before any of it runs.

In [3]:
x = . 5

SyntaxError: invalid syntax (<ipython-input-3-572ce7711194>, line 1)

Often the error is precisely indicated, as above, but sometimes you have to search for the error on the previous line.
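To see that a `SyntaxError` is raised before any line executes, the sketch below passes a two-line program to the built-in `compile` function, which parses code without running it. The `source` string and the `<example>` file name are made up for this illustration.

```python
# A two-line program: the first line is fine, the second has a syntax error.
source = "print('this line never runs')\nx = . 5\n"

try:
    # compile() only parses the code; it does not execute it.
    compile(source, "<example>", "exec")
    error_line = None
except SyntaxError as err:
    # The error object records the line on which parsing failed.
    error_line = err.lineno
    print("SyntaxError reported on line", error_line)
```

The first line is never printed, because the whole program is rejected during parsing.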

`IndentationError`: A line in the code has bad indentation.

In [15]:
a = 7
 b = 5

IndentationError: unexpected indent (<ipython-input-15-a9531be39a36>, line 2)

This can be tricky, because sometimes the indentation looks fine but Python still complains. This usually happens when tabs and spaces are mixed: a block indented with spaces contains a line indented with a tab, or vice versa.
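A minimal sketch of the tabs-versus-spaces problem, assuming Python 3: the two indented lines below look aligned in many editors, but one uses a tab and the other uses eight spaces. The `source` string is made up for this illustration.

```python
# Line 2 is indented with one tab, line 3 with eight spaces.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    # compile() parses the code without running it.
    compile(source, "<example>", "exec")
    error_name = None
except IndentationError as err:
    # TabError is a subclass of IndentationError.
    error_name = type(err).__name__
    print(error_name, "on line", err.lineno)
```

Configuring the editor to insert spaces when the Tab key is pressed avoids this class of error.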

The next errors are _runtime_ errors: they appear only while the program runs. They can therefore be elusive (they don't always appear) because they depend on variable values and program flow.

`NameError`: A name (variable, function, module) is not defined.

In [5]:
b = a + 2

NameError: name 'a' is not defined

Look at the last of the lines starting with `File` to see where in the program the error occurs. The most common reasons for a `NameError` are

- a misspelled name,
- a variable that is not initialized,
- a function that you have forgotten to define,
- a module that is not imported.

`TypeError`: An object of wrong type is used in an operation.

In [6]:
n = 1
x = '2'
product = (1.0/(n+1))*(x/(1.0+x))**(n+1)

TypeError: unsupported operand type(s) for +: 'float' and 'str'

Print out objects and their types (here: `print(x, type(x), n, type(n))`), and you will most likely get a surprise. The reason for a `TypeError` is often far away from the line where the `TypeError` occurs.
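The diagnostic print suggested above can be sketched as follows. The fix shown (converting `x` with `float`) is an assumption about what the programmer intended; in a real program the string usually comes from user input or a file.

```python
n = 1
x = '2'  # a string, for example one read from a file

# Diagnostic print: reveals that x is a str, not a number.
print(x, type(x), n, type(n))

# Assumed fix: convert the string to a float before doing arithmetic.
x = float(x)
product = (1.0/(n+1))*(x/(1.0+x))**(n+1)
print(product)
```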

`ValueError`: An object has an illegal value.

In [12]:
import math
z = -1
math.sqrt(z)

ValueError: math domain error

Print out the value of objects that can be involved in the error (here: `print(z)`).
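Once the offending value is known, there are two common ways to deal with it, sketched below: check the value before the call, or attempt the call and handle the error. Returning `None` for a negative input is an assumed convention for this example, not the only reasonable choice.

```python
import math

z = -1
print(z)  # diagnostic print: shows the offending value

# Option 1: check the value before calling math.sqrt.
if z >= 0:
    root = math.sqrt(z)
else:
    root = None  # assumed convention: None means "no real square root"

# Option 2: attempt the call and handle the error if it occurs.
try:
    root2 = math.sqrt(z)
except ValueError:
    root2 = None

print(root, root2)
```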

`IndexError`: An index in a list, tuple, string, or array is out of range (most often too large).

In [13]:
values = [1,27,33,46,52]
n = 0
for i in range(len(values)):
    n += values[i+1]

IndexError: list index out of range

Print out the length of the list, and the index if it involves a variable (here: `print(len(values), i)`).
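A sketch of the diagnostic print together with a corrected loop, assuming the intent was to sum all the elements of `values`:

```python
values = [1, 27, 33, 46, 52]
n = 0
for i in range(len(values)):
    # Diagnostic print: the largest valid index is len(values) - 1.
    print(len(values), i)
    n += values[i]  # values[i+1] would fail on the last iteration
print(n)
```

Iterating directly over the elements (`for v in values:`) avoids index arithmetic altogether.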

## Exercise 1 - Errors

Let's fix the following bugs. Each notebook cell has a single program with at least one bug that may either cause an error or cause the program to give the wrong answer.

Make the code work.

[This presentation](http://hplgit.github.io/teamods/debugging/._debug003.html) has a listing of common Python errors and some hints on how to debug them.

In [1]:
x = '7'
y = 8
z = x + y
print(z)

TypeError: Can't convert 'int' object to str implicitly

In [2]:
x = 1
y = 0
while x < 4:
    y += x
print(y)

KeyboardInterrupt: 

In [3]:
switch = 'on'
if switch = 'off':
    print('go home')

SyntaxError: invalid syntax (<ipython-input-3-7d0bba41f18f>, line 2)

In [4]:
range()

TypeError: range expected 1 arguments, got 0

In [5]:
range(2.5)

TypeError: 'float' object cannot be interpreted as an integer

In [6]:
range(2,3,0)

ValueError: range() arg 3 must not be zero

In [7]:
counter = 0
while counter < 5:
    print('hello')
    counter += 1
while counter < 5:
    print('bye')
    counter += 1

hello
hello
hello
hello
hello


## Logical bugs

Some bugs don't cause errors. These are risky because we can easily miss them. For example, consider this function, which computes the [sum of a geometric series](http://en.wikipedia.org/wiki/Geometric_series#Formula):

In [16]:
def geosum(a, r):
    return a/(1 - r)

This works well for some values, causes errors for other values, and gives incorrect answers for yet other values:

In [30]:
print("Correct:")
print(geosum(1,0), 1)
print(geosum(1,0.5), 2)
print(geosum(0,0.5), 0)
print(geosum(0,2), 0)

print("Incorrect:")
print(geosum(1,2), "\u221e")
print(geosum(-1,2), "-\u221e")
print(geosum(2,-1), "NaN")

print("Error:")
print(geosum(1,1))

Correct:
1.0 1
2.0 2
0.0 0
-0.0 0
Incorrect:
-1.0 ∞
1.0 -∞
1.0 NaN
Error:


ZeroDivisionError: division by zero

For this kind of bug we have to write **tests**.

The simplest way to do this is with [`assert`](https://docs.python.org/3/reference/simple_stmts.html#the-assert-statement). `assert` evaluates an expression and, if it is false, raises an `AssertionError`. You can also attach a message explaining the failed assertion:

In [33]:
assert geosum(1,0) == 1, "Bad value"
assert geosum(1,0.5) == 2, "Bad value"
assert geosum(0,0.5) == 0, "Bad value"
assert geosum(0,2) == 0, "Bad value"

assert geosum(1,2) is None, "Bad value"
assert geosum(-1,2) is None, "Bad value"
assert geosum(2,-1) is None, "Bad value"
assert geosum(1,1) is None, "Bad value"

AssertionError: Bad value

Let's fix the function:

In [36]:
def geosum(a, r):
    if a == 0:
        return 0.0 # always return same type 
    elif abs(r) >= 1:
        return None
    return a/(1 - r)

In [37]:
assert geosum(1,0) == 1, "Bad value"
assert geosum(1,0.5) == 2, "Bad value"
assert geosum(0,0.5) == 0, "Bad value"
assert geosum(0,2) == 0, "Bad value"

assert geosum(1,2) is None, "Bad value"
assert geosum(-1,2) is None, "Bad value"
assert geosum(2,-1) is None, "Bad value"
assert geosum(1,1) is None, "Bad value"
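One caveat when asserting on floating-point results: exact comparison with `==` can fail because of rounding. The sketch below repeats the fixed `geosum` so that it runs on its own, and compares with a tolerance using `math.isclose` from the standard library. The test value `r = 0.9` is chosen for illustration.

```python
import math

def geosum(a, r):
    if a == 0:
        return 0.0
    elif abs(r) >= 1:
        return None
    return a/(1 - r)

# 1 - 0.9 is not exactly 0.1 in floating point, so the result may differ
# from 10 in the last digits; compare with a tolerance instead of ==.
result = geosum(1, 0.9)
print(result)
assert math.isclose(result, 10), "Bad value"
```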

There are more sophisticated ways to write tests. 
The [unittest](https://docs.python.org/3/library/unittest.html) module is a good starting point.
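As a rough sketch of what the same checks might look like with `unittest`: the class name `GeosumTest` and the grouping into three test methods are choices made for this example. The fixed `geosum` is repeated so that the block runs on its own.

```python
import unittest

def geosum(a, r):
    if a == 0:
        return 0.0
    elif abs(r) >= 1:
        return None
    return a/(1 - r)

class GeosumTest(unittest.TestCase):
    def test_converging_series(self):
        self.assertEqual(geosum(1, 0), 1)
        self.assertEqual(geosum(1, 0.5), 2)

    def test_zero_first_term(self):
        self.assertEqual(geosum(0, 0.5), 0)
        self.assertEqual(geosum(0, 2), 0)

    def test_diverging_series(self):
        for a, r in [(1, 2), (-1, 2), (2, -1), (1, 1)]:
            self.assertIsNone(geosum(a, r))

# Run the tests without exiting the interpreter (convenient in a notebook).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(GeosumTest)
result = unittest.TextTestRunner().run(suite)
```

Unlike a sequence of bare `assert` statements, the test runner keeps going after a failure and reports every failing test.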

## Exercise 2 - test

Below is a function that calculates the length of the longest side (the hypotenuse) of a right triangle, given the lengths of the other two sides, using the [Pythagorean theorem](http://en.wikipedia.org/wiki/Pythagorean_theorem):

$$
a^2 + b^2 = c^2
$$

![Pythagorean theorem](http://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/Pythagorean.svg/265px-Pythagorean.svg.png)

In [38]:
def pythagoras(a,b):
    return math.sqrt(a**2 + b**2)

Write a series of assertions to test the function.

In [39]:
# Your code goes here

## Live debugging - find genes in a sequence

We will write a program that looks for genes (or open reading frames) in a DNA sequence.

How do we go about it?

1. What _exactly_ do we want?

2. What is the input and output?

3. Example

4. Algorithm - one function to check if a sequence is a gene; one function to look for gene candidates in a sequence

5. Implementation

#### 1. What _exactly_ do we want?

We want to find all genes in a sequence, including overlapping genes, but only on the sequence, not on its complement.

A "legal" gene contains only the bases A, C, G, and T, starts with a start codon (ATG), ends with a stop codon (TAG, TGA, or TAA), has a length that is a multiple of three, and contains no other stop codon in the same reading frame (that is, at an offset from the start that is a multiple of three).

#### 2. What is the input and output?

The input will be a string.

The output will be a list of strings.

#### 3. Example

Our example is GCCGTTTGTACTCCATTCCA**ATGAGGTCGCTTC|ATGTCAGCGAGTTTTAA**CGTGGTTCTTCGCTG**A|TGTGCTGTATATGA**.

This is a good example because it is long enough to be non-trivial, short enough to be readable, and contains both overlapping genes and genes in at least two different reading frames.

- Input: `GCCGTTTGTACTCCATTCCAATGAGGTCGCTTCATGTCAGCGAGTTTTAACGTGGTTCTTCGCTGATGTGCTGTATATGA`
- Output: `['ATGAGGTCGCTTCATGTCAGCGAGTTTTAA', 'ATGTCAGCGAGTTTTAACGTGGTTCTTCGCTGA', 'ATGTGCTGTATATGA']`
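The expected output can be checked by hand with string slicing. In the sketch below, the `positions` list holds (start, end) index pairs that were located manually for this example; it is not produced by any algorithm.

```python
seq = 'GCCGTTTGTACTCCATTCCAATGAGGTCGCTTCATGTCAGCGAGTTTTAACGTGGTTCTTCGCTGATGTGCTGTATATGA'
expected = ['ATGAGGTCGCTTCATGTCAGCGAGTTTTAA',
            'ATGTCAGCGAGTTTTAACGTGGTTCTTCGCTGA',
            'ATGTGCTGTATATGA']

# (start, end) index pairs, located by hand for this example
positions = [(20, 50), (33, 66), (65, 80)]

frames = []
for (begin, end), gene in zip(positions, expected):
    assert seq[begin:end] == gene, "Slice does not match the expected gene"
    frames.append(begin % 3)  # reading frame of the gene's start codon
    print(begin, end, begin % 3, len(gene))
```

The index ranges overlap (the first two genes share a long stretch, the last two share a single base), and the start positions fall in two different reading frames.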

#### 4. Algorithm

Here is a skeleton of our program with a **test case**, which, of course, fails for now:

In [27]:
def is_gene(sequence):
    # check if sequence is a gene
    return False

def find_genes(sequence):
    # find all genes in sequence
    return []

seq = 'GCCGTTTGTACTCCATTCCAATGAGGTCGCTTCATGTCAGCGAGTTTTAACGTGGTTCTTCGCTGATGTGCTGTATATGA'
genes = find_genes(seq)
assert len(genes) == 3, "Found %d genes, expected 3" % len(genes)
print("Success")

AssertionError: Found 0 genes, expected 3

#### 5. Implementation

Here is the code that implements the program.

Now we must test and debug!

In [28]:
bases = "ACGT"
start = "ATG"
stops = ["TAg","TGG","TAA"]

def is_gene(sequence):
    if len(seqeunce) < 5: # check minimum length 
    return False
    if len(sequence) % 3 ! 0: # check length divides by 3
    return False
    if sequence[1:3] != start: # check start codon
    return False
    # check stop codon
    if sequence[-3:] not in stops: 
    return False
    # check only legal characters
    for c in sequence: 
        if c not in bases:
        return False
    # check no stop codons in the middle 
    for i in range(0, len(sequence) - 3, 3): 
        if sequence[i:i+3] in stops:
        return False
    return "True"

def find_genes(sequence):
    start_idx = []
    for i in range(len(sequence)):
        if sequence[i,i+3] == start:
          start_idx.append(i)    
    stop_idx = []
    for i in range(len(sequence)):
        if sequence[i,i+3] == stops:
          stop_idx.append(i)    
    for i == start_idx:
    for j == stop_idx:
        if j <= i and j-i % 3 == 0:
          gene = sequence[i,j+3]
          if is_gene(genes):
              genes.append(genes)
    return gene

seq = 'GCCGTTTGTACTCCATTCCAATGAGGTCGCTTCATGTCAGCGAGTTTTAACGTGGTTCTTCGCTGATGTGCTGTATATGA'
genes = find_genes(seq)
assert len(genes) == 3, "Found %d genes, expected 3" % len(genes)
print("Success")

IndentationError: expected an indented block (<ipython-input-28-6b0a913f0f22>, line 7)

### Let's debug!

We will use several strategies:

- adding diagnostic `print` statements to see variable values
- narrowing down problems by reducing and fixing variable values
- tests using assertions
- commenting out code

In [29]:
bases = "ACGT"
start = "ATG"
stops = ["TAG","TGA","TAA"]

def is_gene(sequence):
    if len(sequence) < 6: # check minimum length 
        return False
    if len(sequence) % 3 != 0: # check length divides by 3
        return False
    if sequence[:3] != start: # check start codon
        return False
    # check stop codon
    if sequence[-3:] not in stops: 
        return False
    # check only legal characters
    for c in sequence: 
        if c not in bases:
            return False
    # check no stop codons in the middle (every codon between the first and the last)
    for i in range(3, len(sequence) - 3, 3): 
        if sequence[i:i+3] in stops:
            return False
    return True

def find_genes(sequence):
    start = "ATG"
    stops = ["TAG","TGA","TAA"]
    start_idx = []
    for i in range(len(sequence) - 2):
        if sequence[i:i+3] == start:
            start_idx.append(i)
    stop_idx = []
    for i in range(len(sequence) - 2):
        if sequence[i:i+3] in stops:
            stop_idx.append(i)
    genes = []
    for i in start_idx:
        for j in stop_idx:
            if j > i and (j-i)%3==0:
                gene = sequence[i:j+3]
                if is_gene(gene):
                    genes.append(gene)
    return genes

seq = 'GCCGTTTGTACTCCATTCCAATGAGGTCGCTTCATGTCAGCGAGTTTTAACGTGGTTCTTCGCTGATGTGCTGTATATGA'
genes = find_genes(seq)
assert len(genes) == 3, "Found %d genes, expected 3" % len(genes)
print("Success")

Success
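A single end-to-end test can pass even when a helper function mishandles edge cases, so it helps to test `is_gene` on small hand-made inputs as well. The sketch below uses `is_gene_compact`, a hypothetical condensed variant written only for this example so that the block runs on its own; the same assertions can be pointed at `is_gene`.

```python
bases = "ACGT"
start = "ATG"
stops = ["TAG", "TGA", "TAA"]

def is_gene_compact(sequence):
    # Hypothetical condensed variant of is_gene, written for this sketch.
    if len(sequence) < 6 or len(sequence) % 3 != 0:
        return False
    if any(c not in bases for c in sequence):
        return False
    # split the sequence into consecutive codons
    codons = [sequence[i:i+3] for i in range(0, len(sequence), 3)]
    if codons[0] != start or codons[-1] not in stops:
        return False
    # no stop codon anywhere between the first and the last codon
    return all(codon not in stops for codon in codons[1:-1])

assert is_gene_compact("ATGTAA")            # shortest possible gene
assert is_gene_compact("ATGAGGTGA")
assert not is_gene_compact("ATGTA")         # too short
assert not is_gene_compact("ATGAGGTTAA")    # length not a multiple of 3
assert not is_gene_compact("CTGAGGTAA")     # no start codon
assert not is_gene_compact("ATGAGGTAC")     # no stop codon
assert not is_gene_compact("ATGANGTAA")     # illegal character
assert not is_gene_compact("ATGTAAAGGTGA")  # in-frame stop in the middle
assert not is_gene_compact("ATGTAATAA")     # stop codon right before the last codon
print("Success")
```

The last case is worth keeping: an off-by-one error in the range of the "stop codons in the middle" loop would let it through.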


### Summary

This outline can be changed according to the problem but the basic idea is: 

- understand the problem
- find examples to use as tests
- design an algorithm
- write the test
- implement the algorithm
- test and debug until test succeeds

This example follows the outline of an [example](http://hplgit.github.io/teamods/debugging/._debug004.html) by Hans Petter Langtangen. The problem itself is borrowed from a [Python for engineers exam](http://www.cs.tau.ac.il/courses/pyProg/1415a/exams/PyProg1415a_moedA_solution.pdf).

## Exercise - *Sabotage* and protein mass

Here's a nice little program that calculates the mass of a protein given the amino acid sequence of the protein.

In [48]:
with open("aa_weights.txt") as f:
    weights = {}
    for line in f:
        aa,w = line.strip().split()
        w = float(w)
        weights[aa] = w
print(weights)

{'M': 131.04049, 'I': 113.08406, 'A': 71.03711, 'E': 129.04259, 'V': 99.06841, 'D': 115.02694, 'F': 147.06841, 'N': 114.04293, 'K': 128.09496, 'P': 97.05276, 'T': 101.04768, 'C': 103.00919, 'Y': 163.06333, 'W': 186.07931, 'Q': 128.05858, 'G': 57.02146, 'R': 156.10111, 'H': 137.05891, 'L': 113.08406, 'S': 87.03203}


In [49]:
def protein_mass(sequence):
    mass = 0
    for aa in sequence:
        if aa not in weights:
            raise ValueError("Input sequence contains an illegal aa: %s" % aa)
        mass += weights[aa]
    return mass

In [50]:
seq = 'SKADYEK'
assert round(protein_mass(seq), 3) == 821.392
print("Success")

Success
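A single assertion leaves much of the program untested, including the error path. The sketch below adds a few more checks; to keep it runnable without `aa_weights.txt`, it hard-codes a subset of the weights printed above, and the test sequences `'SK'` and `'SKX'` are made up for this example.

```python
import math

# Subset of the weights table, hard-coded so the sketch runs without the file.
weights = {'S': 87.03203, 'K': 128.09496, 'A': 71.03711,
           'D': 115.02694, 'Y': 163.06333, 'E': 129.04259}

def protein_mass(sequence):
    mass = 0
    for aa in sequence:
        if aa not in weights:
            raise ValueError("Input sequence contains an illegal aa: %s" % aa)
        mass += weights[aa]
    return mass

# An empty sequence has no mass.
assert protein_mass('') == 0
# A short sequence whose mass is easy to add up by hand.
assert math.isclose(protein_mass('SK'), 87.03203 + 128.09496)

# An illegal amino acid should raise a ValueError.
try:
    protein_mass('SKX')
    raised = False
except ValueError:
    raised = True
assert raised, "Expected a ValueError for an illegal amino acid"
print("Success")
```

The more behaviours the assertions cover, the harder it becomes to hide a bug that goes unnoticed.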


Open the notebook on your computer and sabotage the program by hiding exactly 5 bugs in the code.

Now, change seats with a partner and find the bugs that your partner hid in the code.

The protein mass problem appears in [Rosalind](http://rosalind.info/problems/prtm/). 
The *Sabotage* exercise is borrowed from a post in the [Teach Computing](https://teachcomputing.wordpress.com/2013/11/23/sabotage-teach-debugging-by-stealth/) blog by [Alan O'Donohoe](https://twitter.com/teknoteacher).

# References

- [Debugging in Python](http://hplgit.github.io/teamods/debugging/debug.html) by Hans Petter Langtangen. Some of the material here is borrowed from or influenced by this wonderful resource. Check it out for more debugging tips, examples and methods.

## Fin
This notebook is part of the _Python Programming for Life Sciences Graduate Students_ course given at Tel-Aviv University, Spring 2015.

The notebook was written using [Python](http://python.org/) 3.4.1 and [IPython](http://ipython.org/) 3.0 (download from [PyZo](http://www.pyzo.org/downloads.html)).

The code is available at https://github.com/Py4Life/TAU2015/blob/master/lecture5.ipynb.

The notebook can be viewed online at http://nbviewer.ipython.org/github/Py4Life/TAU2015/blob/master/lecture5.ipynb.

The notebook is also available as a PDF at https://github.com/Py4Life/TAU2015/blob/master/lecture5.pdf?raw=true.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

![Python logo](https://www.python.org/static/community_logos/python-logo.png)