## *for* loops

We have already seen one kind of loop: **while**. It has a test for truth and a code block. For as long as the test returns **True**, the code block will repeat.

The **for** loop allows you to run the same block of code on every member of a set of objects. Let's start with strings and characters (this works with lists, too, and, soon, lines of text in a file). If you apply a **for** loop to a string, it will perform the action you choose on each character in that string:

In [None]:
sequence = 'TGTGAATGTA'
 
#iterate over string
for base in sequence:
    print base

Note that you are creating a new variable within the for loop (in this case *base*). The code within the loop is run multiple times, with the value of *base* being successively set to each character within the *sequence*. Thus, the following code is equivalent:

In [None]:
base = sequence[0]
print base

base = sequence[1]
print base

base = sequence[2]
print base
# etc

Just as we can index a string and loop over its characters, we can index a list and loop over its elements.

In [None]:
string_of_lazy = "The quick brown fox jumped over the lazy graduate student"
list_of_lazy = string_of_lazy.split()
for lazy_word in list_of_lazy:
    print (lazy_word)

Ok, cute, but useless. Let's do something a little more interesting. How many times does the letter "e" show up?

In [None]:
string_of_lazy = "The quick brown fox jumped over the lazy graduate student"
#string_of_lazy = "The quick brown fox jumped over the lazy dog."
e_count = 0
for lazy_char in string_of_lazy:
    if lazy_char == "e":
        e_count += 1 # Woah
print "I found {} es!".format(e_count)
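
Counting characters is common enough that Python strings already provide a `count` method. The sketch below wraps the loop above in a function (the name `count_char` is only illustrative, not part of the original notebook) and checks the result against the built-in. It uses the parenthesized `print(...)` form so that it runs under both Python 2 and Python 3.

```python
def count_char(text, target):
    # Count how many times the single character `target` appears in `text`
    total = 0
    for char in text:
        if char == target:
            total += 1
    return total

string_of_lazy = "The quick brown fox jumped over the lazy graduate student"
e_count = count_char(string_of_lazy, "e")

# str.count is the built-in equivalent; the two should agree
print(e_count == string_of_lazy.count("e"))
```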

The `+=` in that loop is a shorthand for those who are too lazy to type the full assignment:

In [None]:
a = 0
a = a + 1

There is a whole host of these shorthand (augmented assignment) operators you'll love:

In [3]:
a = 0
print a
a += 1
print a
a -= 2
print a
a /= 3.0
print a
a += 9
print a

0
1
-1
-0.333333333333
8.66666666667
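
The same shorthand exists for the other arithmetic operators, and `+=` also works on strings. A short sketch with made-up values:

```python
a = 5
a *= 4     # same as a = a * 4
a %= 6     # same as a = a % 6 (remainder after dividing by 6)
a **= 3    # same as a = a ** 3
print(a)

# The shorthand also works on strings
seq = 'ATG'
seq += 'TAA'   # same as seq = seq + 'TAA'
print(seq)
```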


---
### Iterable Objects 

One difference between Python and our hypothetical undergrad is that our undergrad knows that a genome contains genes, and that when we say "For each gene..." they should ignore small non-coding RNAs or other non-genic features. Python doesn't know what is in the object you're looping over. The loop variable (*base* in the example above) could be named whatever we wanted; the name is purely descriptive. Under the hood *sequence* is a string, and a string is a sequence of characters. In Python, any object that contains (or can produce) other objects one at a time is called **iterable**, and these are the only types of objects that can be used in a **for** loop. Objects like integers hold only a single value, so they are not **iterable**.
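
To make the distinction concrete, the sketch below loops over a string and a list, then shows that trying to loop over an integer raises a `TypeError`. The `try`/`except` is only there so the example runs to completion, and the variable names and values are illustrative.

```python
collected = []

for base in 'TGA':            # a string yields its characters
    collected.append(base)

for gene in ['eve', 'hb']:    # a list yields its elements
    collected.append(gene)

print(collected)

try:
    for digit in 42:          # an integer is not iterable
        collected.append(digit)
except TypeError as error:
    error_message = str(error)
    print(error_message)
```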

---


Just like an **if** statement, if you want to do more than one thing inside the loop, you can start a new block of indented lines after the colon, and then when you're done with the code you want to run every time, go back to the original indentation:

In [None]:
sequence = 'TGTGAATGTA'

AT = 0. # Review: why am I using '0.' instead of '0'?
GC = 0.

for base in sequence:
    if base == 'G' or base == 'C':
        GC += 1
    else:
        AT += 1
        
print AT
print GC

GC_content = (GC / (AT + GC)) * 100.

print "This sequence is {}% GC".format(GC_content)
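
One way to reuse this calculation is to wrap it in a function. This is a minimal sketch, assuming the sequence is non-empty and contains only the letters A, C, G and T; the name `gc_content` is illustrative.

```python
def gc_content(sequence):
    # Return the percentage of bases in `sequence` that are G or C
    gc = 0.   # start from floats, as in the cell above
    at = 0.
    for base in sequence:
        if base == 'G' or base == 'C':
            gc += 1
        else:
            at += 1
    return (gc / (at + gc)) * 100.

print(gc_content('TGTGAATGTA'))
```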

## *while* loops

Similar to a **for** loop, a **while** loop allows you to run the same block of code repeatedly. Whereas a **for** loop runs once for every object within an **iterable**, a **while** loop keeps running *for as long as some condition holds*. Returning to our example:

    "While you're here, for each protein in the genome, if the protein is located in the membrane, tell me that protein's expression level."

Here, the condition is "you're here", which is a boolean statement in that it is either true or false. Again, Python's syntax is almost the same:

```python
while youre_here:
    work()
    report_results()
    
    if time == 17:
        youre_here = False # The undergrad goes home at 5pm (17h)
```

Again, the **while** statement ends with a colon, and the instructions within the loop are indented. This undergrad will continue working and reporting results as long as they are in lab (*youre_here* equals **True**). Notice the **if** test at the end of the loop that sends the student home at 5pm. Unless you want your undergrad to work forever, you need to make sure that the conditional statement is eventually not **True**.

**Very Important:** Make sure the conditional statement between **while** and the colon *will* be **False** at some point! If you get caught in an infinite loop, press CTRL+C to interrupt your program.
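
A runnable version of the undergrad loop needs something that changes on every pass, so that the condition can eventually fail. This is a minimal sketch, assuming a made-up `hour` counter stands in for the clock and the work itself is simply recorded in a list:

```python
hour = 9                 # hypothetical start of the working day
youre_here = True
hours_worked = []

while youre_here:
    hours_worked.append(hour)   # stand-in for work() and report_results()
    hour += 1                   # time passes on every pass through the loop

    if hour == 17:
        youre_here = False      # the undergrad goes home at 5pm (17h)

print(hours_worked)
```

Without the `hour += 1` line, the test `hour == 17` could never succeed and the loop would run forever.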

Let's look at some simple examples using strings, since they're the **iterable** we know best at this point.

In [None]:
def prefixes(word):
    # Print all the prefixes of a word
    while word:
        print word
        word = word[:-1]
    
prefixes('banana')

Remember that empty strings evaluate as **False** and non-empty strings as **True**. So here our conditional is the string *word* itself, and every time through the loop we remove the last character from the string until the string is empty.
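
The built-in `bool` function shows how Python treats a value when it is used as a condition. The small sketch below checks both cases and counts how many passes the loop makes before the string runs out:

```python
print(bool('banana'))   # non-empty string
print(bool(''))         # empty string

# Count the passes through the loop before `word` is empty
word = 'banana'
passes = 0
while word:
    passes += 1
    word = word[:-1]    # drop the last character

print(passes)
```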

We can add an **if** statement to turn this function into a search function to tell us if our word starts with a given prefix.

In [None]:
def starts_with(word, prefix):
    # Return True if word starts with prefix
    while word:
        if word == prefix:
            return True
        else:
            word = word[:-1]
    
    return False # The prefix was never found

print starts_with('banana', 'ban')
print starts_with('apple', 'ora')
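
Python strings also have a built-in `startswith` method, which gives a convenient way to check a hand-written version. The sketch below repeats the function so that it runs on its own, then compares the two on the same inputs:

```python
def starts_with(word, prefix):
    # Return True if word starts with prefix
    while word:
        if word == prefix:
            return True
        word = word[:-1]   # drop the last character and try again
    return False

# Each line prints True if the loop version agrees with the built-in
print(starts_with('banana', 'ban') == 'banana'.startswith('ban'))
print(starts_with('apple', 'ora') == 'apple'.startswith('ora'))
```

The two do differ for an empty prefix: `'banana'.startswith('')` is **True**, while the loop version returns **False**, because the loop stops before *word* becomes empty.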

Now that we have a function that can tell us if a string starts with a given prefix, we can use it to find the start of an ORF (ORFs start with "ATG") in a DNA sequence. We use the same strategy in reverse: instead of chopping characters off the end, we chop them off the front and check each successive suffix of the sequence until we find one that starts with "ATG".

In [None]:
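def find_orf_start(seq):
    # Return the part of seq starting at the first 'ATG', or None if there is none
    while seq:
        if starts_with(seq, 'ATG'):
            return seq
        else:
            seq = seq[1:] # drop the first base and try the next suffix

    return None # no start codon was found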

sequence = ('TGAATCATCCCCTTAAGAGAAGACCCGAAG' +
            'TTATTATAGGGAAGGGCAGAAATGACCACC' +
            'CTCTCATCTCGCTAGTCCACTTGACACCTC' +
            'TTAGTTCATGACGACGTGAGTCGTTCCTAA')
            
print find_orf_start(sequence)
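
The built-in `find` method of strings returns the index of the first occurrence of a substring (or -1 if there is none), so it can serve as a cross-check on the loop above. This sketch uses the same sequence; the names `start` and `orf_start` are illustrative.

```python
sequence = ('TGAATCATCCCCTTAAGAGAAGACCCGAAG' +
            'TTATTATAGGGAAGGGCAGAAATGACCACC' +
            'CTCTCATCTCGCTAGTCCACTTGACACCTC' +
            'TTAGTTCATGACGACGTGAGTCGTTCCTAA')

start = sequence.find('ATG')   # index of the first 'ATG', or -1 if absent
if start == -1:
    orf_start = None
else:
    orf_start = sequence[start:]   # everything from the start codon onwards

print(start)
print(orf_start)
```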

And just for a satisfying sense of completion, let's find the end of the ORF by searching it for stop codons. This time we will use an index *i*, since we want to search from the start of the string without chewing up the string as we search.

In [None]:
def ends_with(word, suffix):
    # Return True if word ends with suffix
    while word:
        if word == suffix:
            return True
        else:
            word = word[1:]
    
    return False

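def find_orf_end(seq):
    # Return seq up to and including the first in-frame stop codon,
    # or None if there is no in-frame stop codon
    i = 3
    while i <= len(seq): # '<=' so a stop codon at the very end is checked too
        prefix = seq[:i]
        if (ends_with(prefix, 'TAG') or
            ends_with(prefix, 'TAA') or
            ends_with(prefix, 'TGA')):
            return prefix
        else:
            i += 3 # need to search each codon at a time, not each base

    return None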

sequence = ('TGAATCATCCCCTTAAGAGAAGACCCGAAG' +
            'TTATTATAGGGAAGGGCAGAAATGACCACC' +
            'CTCTCATCTCGCTAGTCCACTTGACACCTC' +
            'TTAGTTCATGACGACGTGAGTCGTTCCTAA')
            
sequence = find_orf_start(sequence)
sequence = find_orf_end(sequence)
print sequence
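
Another way to walk through an ORF one codon at a time is a **for** loop over `range` with a step of 3. This is a sketch of a different technique from the **while** loop above, not a replacement for it. The names `STOP_CODONS` and `find_stop_index` are illustrative, and the ORF string is the start of the ORF found in the example sequence.

```python
STOP_CODONS = ['TAG', 'TAA', 'TGA']

def find_stop_index(orf):
    # Return the index of the first in-frame stop codon, or -1 if there is none
    for i in range(0, len(orf) - 2, 3):   # i = 0, 3, 6, ... one codon per step
        codon = orf[i:i + 3]
        if codon in STOP_CODONS:
            return i
    return -1

orf = 'ATGACCACCCTCTCATCTCGCTAGTCCACTTGA'
stop = find_stop_index(orf)
print(orf[:stop + 3])   # the ORF up to and including the stop codon
```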