## Back up your exercises
Rename or manually copy your changed exercise files to another directory.

## Update your Course Materials!
<br><br><br><br><br><br><br><br><br>

# Week 06 - Generators and Error Handling

<div class="topics">
    <div style="padding-left: 15px;">
        This lecture will cover:
        
        <ul>
            <li>Projects</li>
            <li>Generators - A special kind of Function</li>
            <li>Debugging in Python</li>
            <li>Error Handling</li>
            <ul>
                <li><em>A program rarely works as expected.</em></li>
                <li>Working with <em>Exceptions</em></li>
            </ul>
        </ul>
    </div>
</div>

# Projects
    12. Nov - Project work
    19. Nov - Project work
    26. Nov - Project work
     3. Dec - Project work
    18. Dec - Hand in project
    
Every project consists of the following parts.

1. The **program code** itself. The code should be well **commented** so it is possible to follow your thinking. The **major data structures** should also be explained (structure and purpose). The code must be handed in as a **plain text file**.
2. A document that **describes the algorithm** that you implemented, with **strengths and weaknesses** if any, and the expected **input data format**. The document/report should preferably be a PDF or Word document. Some projects are naturally heavy in theory, others have a more practical approach, and the report is expected to reflect that to some extent.
3. Any **data files** if relevant.
<br><br><br><br><br><br><br><br><br>

Choose either a project from the list below or come up with your own idea (e.g. from your PhD or Master's project).

<b><big>The projects</big></b> (from http://www.cbs.dtu.dk/courses/27619/projects.php)
<ol>
<li><a href="http://www.cbs.dtu.dk/courses/27619/project01.php">Random sequence generator</a>

</li>

<li><a href="http://www.cbs.dtu.dk/courses/27619/project02.php">Text mining MEDLINE abstracts</a>

</li>

<li><a href="http://www.cbs.dtu.dk/courses/27619/project03.php">Shortest path in graph</a>

</li>


<li><a href="http://www.cbs.dtu.dk/courses/27619/project05.php">K-means clustering</a>

</li>

<li><a href="http://www.cbs.dtu.dk/courses/27619/project06.php">Analysis of sorting</a>

</li>

<li><a href="http://www.cbs.dtu.dk/courses/27619/project07.php">Data mining in NCBI databases</a>

</li>

<li><a href="http://www.cbs.dtu.dk/courses/27619/project09.php">Score sequence data with a PSSM</a>

</li>

<li><a href="http://www.cbs.dtu.dk/courses/27619/project08.php">Searching for signals/motifs in sequences</a>

</li>

<li><a href="http://www.cbs.dtu.dk/courses/27619/project10.php">Data analysis</a>

</li>

<li><a href="http://www.cbs.dtu.dk/courses/27619/project11.php">Sudoku</a>

</li>

<li><a href="http://www.cbs.dtu.dk/courses/27619/project16.php">k-nearest neighbor (k-NN) continuous variable estimation</a>

</li>

<li><a href="http://www.cbs.dtu.dk/courses/27619/project17.php">Read trimmer for Next-Generation-Sequencing data</a>

</li>

<li><a href="http://www.cbs.dtu.dk/courses/27619/project12.php">QT clustering</a>

</li>

<li><a href="http://www.cbs.dtu.dk/courses/27619/project13.php">Artificial Neural Network</a>

</li>

<li><a href="http://www.cbs.dtu.dk/courses/27619/project14.php">Smith-Waterman alignment</a>

</li>

<li><a href="http://www.cbs.dtu.dk/courses/27619/project15.php">Gibbs sampling</a>

</li>

</ol>

## Generators - A Special Kind of Function

Normal functions perform some action and return the result to the point where they were called. If the result of the function is really huge (a big list or dictionary), this can cause problems. 

Take for example a function that reads a file and returns all the records of this file in a dictionary. If the file is too big (like several times the available memory), the resulting data structure will not fit in memory. 


This is a common problem in the field of bioinformatics. If the idea is to process each record, we need a function that returns one record at a time. **A normal function can't do that because it doesn't keep a state**, so each time it is executed, it has to process all the data again. 

To do this, Python offers us **generators**.

>**Generators** are a special kind of function that keeps its internal state between calls.

In these functions we can use 

    yield

instead of ```return```. The nice thing about ```yield``` is that it can be used inside a loop: each time the generator is asked for its next value, it resumes from where it left off.
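
As a minimal sketch of this behaviour, the hypothetical generator `countdown` below (it is not part of the course material) hands back one number per request and remembers where it stopped. It is written so that it should run under both Python 2 and Python 3.

```python
def countdown(n):
    """Yield n, n-1, ..., 1, one value per request."""
    while n > 0:
        yield n      # hand back one value and pause here
        n -= 1       # execution resumes on this line at the next request


counter = countdown(3)   # nothing runs yet; we only get a generator object
first = next(counter)    # runs until the first yield
second = next(counter)   # resumes after the yield, loops, yields again
rest = list(counter)     # a loop or list() consumes whatever is left

print(first)
print(second)
print(rest)
```

Note that calling `countdown(3)` does not execute the function body; the body only runs, piece by piece, as values are requested.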


As an example let's look at **```range()```** and **```xrange()```**. By the look of things, it seems they are doing the same thing:

In [None]:
for i in range(5):
    print i, 

In [None]:
for i in xrange(5):
    print i,

However, ```range()``` is a function that builds a complete list, while ```xrange()``` behaves like a **generator** and produces its values one at a time. 

```range(5)``` produces a list ```[0,1,2,3,4]```:

In [None]:
range(5)

but ```xrange(5)``` does seemingly nothing:

In [None]:
xrange(5)

Looping over either one is about equally fast:

In [None]:
%timeit for x in range(10000000): pass

In [None]:
%timeit for x in xrange(10000000): pass

However, the memory consumption is far smaller for generators, because they never hold all the values in memory at once.
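
One rough way to see the difference is to compare the size of a full list with the size of a generator over the same numbers. The sketch below is an illustration only: it uses a generator expression instead of ```xrange()``` so that it should run under both Python 2 and Python 3, and `sys.getsizeof` reports only the size of the container object itself, not of the numbers it refers to.

```python
import sys

n = 1000000

as_list = list(range(n))               # all n numbers exist in memory at once
as_generator = (x for x in range(n))   # numbers are produced only on demand

list_size = sys.getsizeof(as_list)            # grows with n
generator_size = sys.getsizeof(as_generator)  # small and independent of n

print(list_size)
print(generator_size)
print(list_size > generator_size)
```

The exact byte counts depend on the Python version and platform, so only the comparison between the two is meaningful here.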

<br><br><br><img src="../pix/play2.jpg">
### Creating a generator to square x

Write a generator function that yields the successive squares of ```x```: first ```x**2```, then the square of that result, and so on.

Here are the instructions:

<ol>
<li>Use an infinite <code>while</code> loop</li>
<li>Update <code>x</code> to be <code>x**2</code> in each iteration</li>
<li>Yield <code>x</code></li>
</ol>


In [None]:
def squared(x):
    """Yield the series of repeatedly squared numbers, starting with x**2.
    squared(2) would yield
        4
        16
        256
        ...
    squared(10) would yield
        100
        10000
        100000000
        ...
    """
    # finish the function
 


Test the function by running the code blocks below:

In [None]:
square = squared(2)
print square.next()


In [None]:
square = squared(10)
print next ( square )
print next ( square )
print next ( square )
print next ( square )

# Debugging in Python

Tips for Debugging (from learnpythonthehardway.org)

1. Do not use a "debugger." A debugger is like doing a full-body scan on a sick person. You do not get any specific useful information, and you find a whole lot of information that doesn't help and is just confusing.
1. The best way to debug a program is to use **`print`** to print out the values of variables at points in the program to see where they go wrong.
1. Make sure parts of your programs work as you work on them. Do not write massive files of code before you try to run them. Code a little, run a little, fix a little.
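
As a small illustration of the second tip, the sketch below adds a temporary debugging line to a hypothetical function `gc_content` (invented for this example) so that the intermediate values can be checked before trusting the result. It should run under both Python 2 and Python 3.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    gc_count = 0
    for base in sequence:
        if base in "GC":
            gc_count += 1
    # Temporary debugging output: inspect the intermediate values
    print("DEBUG gc_count=%d length=%d" % (gc_count, len(sequence)))
    return gc_count / float(len(sequence))


result = gc_content("ATGCGC")
print(result)
```

Once the function behaves as expected, the debugging line can be removed or commented out.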



## Error Handling

>"A program rarely works as expected"
-- Some wise bloke

So far we have made lots of programs without error checking. 

#### A program with no error checking
This program reads a file (myfile.csv) separated by tabs and looks for a number that is found in the first column of the first line. This value is multiplied by 0.2 and the result is written to another file (otherfile.csv).

*Strategy:* 

In [None]:
f = open("myfile.csv")
line = f.readline()
f.close()
value = line.split("\t")[0]
f = open("otherfile.csv", "w")
f.write( str( int(value) * 0.2 ) )
f.close()

#### Possible errors:

**File doesn't exist**

**No permission to create the file called "otherfile.csv"**

**There are no tabs in the file** (the first "column" is then the whole line, which usually cannot be converted to a number)

All of the errors above cause the program to stop. 

>It is not professional to show the end user system errors like these.

#### A program with Error Handling 

*Strategy:* **L**ook **B**efore **Y**ou **L**eap (LBYL)


In [None]:
import os
iname = raw_input("Enter input filename: ")
oname = raw_input("Enter output filename: ")
# Build the output path portably; cwd + oname would lack a path separator
opath = os.path.join(os.getcwd(), oname)
if os.path.exists(iname):
    fh = open(iname)
    line = fh.readline()
    fh.close()
    if "\t" in line:
        value = line.split('\t')[0]
        # Check that we are allowed to write in the output directory
        if os.access(os.path.dirname(opath), os.W_OK):
            if value.isdigit():
                # Only open the output file once we know we can write a result
                fw = open(opath, "w")
                fw.write(str(int(value) * .2))
                fw.close()
            else:
                print("It can't be converted to int")
        else:
            print("Output file is not writable")
    else:
        print("There is no TAB. Check the input file")
else:
    print("The file '" + iname + "' doesn't exist")

This way of handling errors is not very nice! The code is difficult to read and maintain because the error checks are mixed in with the actual instructions. 

For this reason modern programming languages include a specific system for handling *exceptional conditions*. 

## Exception Handling: ```try``` and ```except```

In Python we use

    try

to wrap the code we want to execute, and

    except

to wrap the code that is executed if an error occurs in the ```try``` block. 

#### Let's try it

In [None]:
print 6/0

In [None]:
try:
    # CODE THAT MAY PRODUCE AN ERROR
    print 6/0
except ZeroDivisionError:
    # CODE THAT SHOULD BE EXECUTED INSTEAD WHEN THE ERROR OCCURS
    print "Houston, we have a problem"

#### A Program with *Exception Handling*

*Strategy:* It's **E**asier to **A**sk **F**orgiveness than **P**ermission (EAFP)

In [None]:
import os

### CODE THAT WILL ALWAYS WORK
iname = raw_input("Enter input filename: ")
oname = raw_input("Enter output filename: ")
# Build the output path portably; cwd + oname would lack a path separator
opath = os.path.join(os.getcwd(), oname)

### CODE THAT MAY PRODUCE AN ERROR ###
try:
    fh = open(iname)
    line = fh.readline()
    fh.close()
    # str.index raises ValueError("substring not found") if there is no tab
    value = line[:line.index('\t')]
    result = str(int(value) * .2)
    fw = open(opath, "w")
    fw.write(result)
    fw.close()

### ERROR HANDLING ###
except IOError as err:
    if err.errno == 13:
        print("Can't write to outfile.")
    elif err.errno == 2:
        print("File does not exist")
except ValueError as err:
    if "substring not found" in str(err):
        print("There is no tab")
    elif "invalid literal for int" in str(err):
        print("The value can't be converted to int")
except:
    print("Some unknown error occurred!?")
else:
    print("Thank you! Everything went OK.")

> "You can make it foolproof, but you can't make it damnfoolproof" -- Naeser's law

#### Provoking Exceptions

Say we have a function that calculates the average of a sequence of numbers:

In [None]:
def avg(numbers):
    return sum(numbers) / float(len(numbers))  # float() avoids integer division

A function like this will have problems with an empty list:

In [None]:
avg( [] )

The error doesn't tell us that it was caused by an empty list. It would be more helpful if the error pointed this out. For this we can **raise** an exception:

In [None]:
def avg(numbers):
    if not numbers:
        raise ValueError( "The list provided is empty!" )
    return sum(numbers) / float(len(numbers))  # float() avoids integer division

In [None]:
avg( [] )

Now instead of a ZeroDivisionError we get a ValueError that points the user to the cause of the problem. 
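
Code that calls ```avg()``` can then catch this exception and read its message. The following is a minimal sketch, assuming the ```avg()``` definition above (repeated here with `float()` so that the block is self-contained). It should run under both Python 2 and Python 3.

```python
def avg(numbers):
    if not numbers:
        raise ValueError("The list provided is empty!")
    return sum(numbers) / float(len(numbers))


try:
    print(avg([]))
except ValueError as err:
    # str(err) gives the message that was passed to ValueError
    message = str(err)
    print("Could not compute the average: " + message)

print(avg([1, 2, 3, 4]))
```

Because the check raises before the division is attempted, the caller never sees the less informative ZeroDivisionError.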

<br><br><br><img src="../pix/book.gif" width=50px> Required reading for next week: 
* Python for Bioinformatics by S. Bassi - Chapter 8, Object Orienting Programming


<br><br><br>
<img src="../pix/exercise.png">
<br><br><br>



In [None]:
from IPython.core.display import HTML


def css_styling():
    styles = open("../styles/custom.css", "r").read()
    return HTML(styles)
css_styling()