## Common Loop Tricks

### Looping a given number of times with *range()*

Loop 5 times using **while**

In [1]:
i=0
while i < 5:
    print i
    i += 1

0
1
2
3
4


Loop 5 times using **for** and **range()**

In [2]:
# This is way better than the while method, above.

for x in range(5):
    print x

0
1
2
3
4


What is **range** doing? The **range** function returns another type of **iterable**. As you can see, *range(x)* returns an iterable that contains all the integers from 0 up to, but not including, *x* in order, so *range(5)* gives 0 through 4. You don't even have to use these integers, which is handy when you just want to run some code *x* times.

In [3]:
for x in range(5):
    print 'hello'

hello
hello
hello
hello
hello


This is a good moment to practice reading the Python documentation. Go [here](https://docs.python.org/2/library/functions.html#range) and read the entry on the **range** function. You may not be able to fully understand it yet, but hopefully you can see that **range** has a couple of optional arguments, *start* and *step*, that let you start counting from a number other than 0 and count in steps other than 1.

In [4]:
for x in range(5, 12, 2):
    print x

5
7
9
11


You can even count backwards by making *step* negative.

In [5]:
for x in range(12, 5, -2):
    print x

12
10
8
6
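
To see all three forms of **range** side by side, the sketch below wraps each call in *list()* so the values are visible at once. It uses single-argument *print()* calls so that it should run under both Python 2 (used in this tutorial) and Python 3; the variable names are only for illustration.

```python
# Wrap range() in list() so the values can be inspected directly.
# (In Python 2 range() already returns a list; list() makes the
# same code work in Python 3 as well.)
counting_up = list(range(5))               # stop only
odd_steps = list(range(5, 12, 2))          # start, stop, step
counting_down = list(range(12, 5, -2))     # negative step
wrong_direction = list(range(5, 12, -2))   # step points away from stop

print(counting_up)      # [0, 1, 2, 3, 4]
print(odd_steps)        # [5, 7, 9, 11]
print(counting_down)    # [12, 10, 8, 6]
print(wrong_direction)  # []
```

Note that the *stop* value itself is never included, and a *step* that points away from *stop* gives an empty result rather than an error.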


### Looping with *enumerate()*

Let's say we have a sequence with some Ns in it, and we want to find the positions of all the Ns within the sequence. We can use a **for** loop, much as we did above to find the GC content of a sequence, but this time we will need to remember the indexes of the bases as we loop through the sequence.

Instead of looping through the bases themselves, we actually want to loop through the indexes of the bases (since we can use these indexes to access the bases themselves). We can generate all the indexes by combining the **range** and **len** functions:

In [6]:
sequence = "ATCGATNGCTANCGTAGCNT"

print range(len(sequence))

for i in range(len(sequence)):
    if sequence[i] == "N":
        print i

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
6
11
18


This kind of problem is so common, in fact, that Python has a built-in alternative to make this prettier and faster, **enumerate()**.

**enumerate()** takes an **iterable** as an argument and returns another **iterable** that yields both the index of each item and the item itself. Python is actually able to assign values to more than one variable at once (thanks to something called "tuples" that we'll learn about later), so you can capture both outputs of **enumerate** in two variables. To see how this works, try:

In [7]:
x, y, z = 10, 15, -3

print x
print y
print z

10
15
-3


In [8]:
# *prettier* method using enumerate()
seq = "ATCGATNGCTANCGTAGCNT"

for i, base in enumerate(seq):
    if base == "N":
        print i

6
11
18
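
As a small sketch of what **enumerate** produces, the code below turns its output into a list of (index, item) pairs and then collects the N positions into a list instead of printing them. The names *pairs* and *n_positions* are made up for this illustration.

```python
seq = "ATCGATNGCTANCGTAGCNT"

# Each item produced by enumerate() is an (index, item) pair.
# Only the first four bases are used here to keep the output short.
pairs = list(enumerate(seq[:4]))
print(pairs)        # [(0, 'A'), (1, 'T'), (2, 'C'), (3, 'G')]

# Collect the index of every N rather than printing it.
n_positions = []
for i, base in enumerate(seq):
    if base == "N":
        n_positions.append(i)

print(n_positions)  # [6, 11, 18]
```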


## Escaping loops

Occasionally, you might want to get out of a loop before its condition becomes false (with a **while** loop) or before you've gone through every element (with a **for** loop). You can modify the default flow of the loop using **break** and **continue**. The keyword **break** ends the loop right where you are, while the keyword **continue** skips the rest of the current iteration and goes back to the top of the loop (re-checking the condition if it's a **while** loop, or bringing in the next object from the **iterable** if it's a **for** loop).

In [9]:
while True:
    break
    print 'this will not print'
    
for x in range(2):
    print 'this will print twice'
    continue
    print 'this will not print'

this will print twice
this will print twice
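
A slightly more practical sketch of the same two keywords: the loop below skips lower case bases with **continue** and stops completely at the first N with **break**. The sequence and the variable name *kept* are invented for this illustration.

```python
seq = "ATcgGCNTTA"

kept = ""
for base in seq:
    if base.islower():
        continue   # skip this base and move on to the next one
    if base == "N":
        break      # stop looping entirely
    kept += base

print(kept)        # ATGC
```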


Let's look at another approach to finding and printing an ORF. We will also use our newly learned **enumerate** function to loop through the indexes of the sequence, and **range** to count by 3s.

In [10]:
def find_orf(seq):
    # First, find the position of the start codon
    for start, base in enumerate(seq):
        codon = seq[start:start+3]
        if codon == 'ATG':
            break

    # At this point, start is equal to the index of the first ATG.
    # Or, if ATG was not found, start is equal to the last index,
    # the loop below finds no stop codon, and we return the empty
    # slice seq[start:start], which is ''.
            
    # Then, find the position of the first stop codon after the start
    for stop in range(start, len(seq), 3):
        codon = seq[stop:stop+3]
        if (codon == 'TAG' or
            codon == 'TAA' or
            codon == 'TGA'):
            stop += 3 # include the stop codon
            break
    
    return seq[start:stop]

sequence = ('TGAATCATCCCCTTAAGAGAAGACCCGAAG' +
            'TTATTATAGGGAAGGGCAGAAATGACCACC' +
            'CTCTCATCTCGCTAGTCCACTTGACACCTC' +
            'TTAGTTCATGACGACGTGAGTCGTTCCTAA')

print find_orf(sequence)
print find_orf('GGGGGGGGGGGGGGGG')

ATGACCACCCTCTCATCTCGCTAG
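
One thing to be aware of: when a start codon is found but no in-frame stop codon follows it, the function above returns a truncated sequence rather than ''. One possible way to make both failure cases explicit is sketched below. *find_complete_orf* is a hypothetical name, and returning '' for an incomplete ORF is an assumption of this sketch rather than part of the original function. It uses **return** inside the loop, which also leaves the loop early, in place of **break**.

```python
def find_complete_orf(seq):
    # str.find returns the index of the first ATG, or -1 if there is none.
    start = seq.find('ATG')
    if start == -1:
        return ''

    # Step through the sequence one codon at a time, in frame with the start.
    for pos in range(start, len(seq), 3):
        codon = seq[pos:pos+3]
        if codon in ('TAG', 'TAA', 'TGA'):
            return seq[start:pos+3]  # include the stop codon

    # Ran off the end of the sequence without meeting a stop codon.
    return ''

print(find_complete_orf('CCATGAAATAGCC'))   # ATGAAATAG
print(find_complete_orf('ATGAAACCC') == '')  # True: no stop codon
print(find_complete_orf('GGGGGGGG') == '')   # True: no start codon
```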



The utility of **continue** can be harder to see, and using it is mostly a stylistic choice. Suppose we want to calculate the GC content of the non-repetitive regions of a sequence. Repeats in sequences are commonly annotated with lower case letters, so we want to ignore any lower case bases.

In [11]:
def highQ_GC(seq):
    GC = 0.
    AT = 0.
    
    for base in seq:
        if base.islower():
            continue
        
        if (base == 'G' or base == 'C'):
            GC += 1
        else:
            AT += 1
            
    return (GC / (GC + AT)) * 100

sequence = ('TGAATCATCCCCTTAAGAGAAGACCCGAAG' +
            'TTATTAtagggaagggcagaaatGACCACC' +
            'CTCTCATCTCGCTAGTCCACTTGACACCTC' +
            'TTAGTTCATGAcgacgtgagTCGTTCCTAA')

print highQ_GC(sequence)

44.6808510638


This could also be written with an **else** clause, like:

```python
if base.islower():
    pass
else:
    if (base == 'G' or base == 'C'):
...
```

or just by inverting the condition, like:

```python
if base.isupper():
    if (base == 'G' or base == 'C'):
...
```
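
Written out in full, the inverted-condition variant might look like the sketch below. *highQ_GC_nested* is a hypothetical name used here only to distinguish it from the **continue** version; for sequences made up of plain DNA letters it is meant to return the same value.

```python
def highQ_GC_nested(seq):
    GC = 0.
    AT = 0.

    for base in seq:
        # The counting code now sits one indentation level deeper.
        # Note: for characters that are not letters (such as '-'),
        # isupper() and "not islower()" disagree; for plain DNA
        # letters they behave the same.
        if base.isupper():
            if base == 'G' or base == 'C':
                GC += 1
            else:
                AT += 1

    return (GC / (GC + AT)) * 100

sequence = ('TGAATCATCCCCTTAAGAGAAGACCCGAAG' +
            'TTATTAtagggaagggcagaaatGACCACC' +
            'CTCTCATCTCGCTAGTCCACTTGACACCTC' +
            'TTAGTTCATGAcgacgtgagTCGTTCCTAA')

print(highQ_GC_nested(sequence))
```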

Why would we ever choose to use **continue**, then? Stylistically, **continue** can make your code easier to read because it makes your code *flatter*. With **if**-**else** statements all the code to be conditionally run must be indented one more level, while with **if ...: continue** we can keep our current level of indentation. This can make a big difference in readability when you have many nested statements and a series of complicated conditions to meet. We can see this in the next example.

This next example also shows us breaking the "Very Important" rule about **while** loops we just taught you. Using **break**, some while loops are designed such that the control condition at the top of the loop is never **False**!

In [12]:
# A calculator program to tell you if numbers are prime.

while True:
    number = raw_input("Number to test: ")
    
    # Quit if nothing is entered.
    if number == '':
        break
    
    number = int(number)
    
    # Prime numbers are >1 by definition.
    # If a number <= 1 is entered, stop and start over.
    if number <= 1:
        print 'Please enter a number greater than 1'
        continue
    
    prime = True
    for x in range(2, number):
        # Use modulo to test if x is a divisor of number
        # if so, the number is not prime, stop the search
        if number % x == 0:
            print 'Not prime,', x, 'is a factor'
            prime = False
            break
    
    if prime:
        print number, 'is prime!'

Number to test: -7
Please enter a number greater than 1
Number to test: 4
Not prime, 2 is a factor
Number to test: 5
5 is prime!
Number to test: 


In this example there are two loops. The outer **while** loop will run until the user enters a blank input, but will otherwise keep asking the user for numbers to test. If the number entered is <=1, we don't even bother checking for divisors, and **continue** sends the loop back to the **while** logical expression.

If the user enters a number >1, we assume the number is prime, then use the inner **for** loop to check every integer from 2 up to, but not including, *number* to see if it's a divisor. If we find a divisor, we know that *number* is not prime, so we set *prime* to **False**, then use **break** to stop checking the rest of the integers. Lastly, if *prime* is still set to **True**, we report that the number is prime.
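
The inner loop can also be pulled out into a function, which makes it possible to try the logic without typing numbers at a prompt. This is only a sketch: *is_prime* is a hypothetical helper that is not part of the calculator above, and it keeps the same simple strategy of testing every integer from 2 up to *number* - 1.

```python
def is_prime(number):
    # Prime numbers are >1 by definition.
    if number <= 1:
        return False

    prime = True
    for x in range(2, number):
        # Use modulo to test if x is a divisor of number.
        if number % x == 0:
            prime = False
            break   # one divisor is enough, stop searching
    return prime

for n in [-7, 4, 5]:
    print(is_prime(n))   # False, False, True (one per line)
```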