# Loops

In this notebook, we'll learn about a very useful programming skill: writing a loop. We'll use loops to run several computations on our DNA string and to make changes to it.

## At the end of this notebook, you'll be able to:
* [Write a `for` loop to iterate over elements in an object](#For)
* Use a `counter` in a loop
* [Use a `for` loop to count GC content and CCAAT boxes in a DNA string](#GCcontent)
* [Write `while` loops and implement `continue` and `break`](#while)

<hr>

### Reminder: Membership Operators
Before, we saw that we can use `in` and `not in` to test for membership. These membership operators check whether a value is found in a sequence, such as a string or a list, and they return booleans.

Today, we'll use this special keyword `in` as an operator in our loops as well.
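
As a quick refresher, here is a minimal sketch of `in` and `not in` applied to a string and a list. The names `sequence` and `bases` are made up for this illustration. Note that for strings, `in` also matches substrings that are longer than one character.

```python
# Membership tests return booleans for strings and lists alike
sequence = 'ATGCAA'               # hypothetical example string
bases = ['A', 'T', 'G', 'C']      # hypothetical example list

has_c = 'C' in sequence           # a single character
has_gca = 'GCA' in sequence       # a longer substring also works for strings
u_missing = 'U' not in bases      # `not in` flips the answer

print(has_c, has_gca, u_missing)
```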

In [1]:
my_dna = 'ATGCAA'
print('C' in my_dna)

True


While this is a useful way to check for one nucleotide, it doesn't help us *count* how many there are. We could also chain many `if`/`elif`/`else` statements, but this quickly gets cumbersome with long strings of DNA.

To count GC content (or do many other repetitive tasks), we'll turn to **loops**.

<a id="For"></a>

## For loops

Loops can be written in multiple ways. First, we'll tackle **for loops**.

The `for` loop iterates over the elements of the supplied object (for example, a list, a range, or a string) and executes the indented block once for each element. It has the basic structure below:

```
for element in object:
    (do something)
```

We can use `for` loops for various types of objects. Let's look at each of these in turn.

### Looping through a string

In [1]:
# Looping through a string
counter = 0

for character in 'hello':
    print(character)
    counter = counter+1

h
e
l
l
o
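
The loop above updates `counter` on every pass but never displays it. Below is a minimal sketch of the same idea that prints the total once the loop has finished. The variable name `word` is only for illustration.

```python
word = 'hello'                    # illustrative input
counter = 0                       # start counting from zero

for character in word:
    counter = counter + 1         # one more character seen

print(counter)                    # same value as len(word)
```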


### Looping through a list
Looping through a list works similarly. 

In [6]:
# Looping through a list
shopping_list = ['Eggs','Oranges','Avocados','Dragon Fruit','Lychee','Loquats']
for number, itm in enumerate(shopping_list, start=1):   # pair each item with its position, counting from 1
    print(str(number)+'.',itm)

1. Eggs
2. Oranges
3. Avocados
4. Dragon Fruit
5. Lychee
6. Loquats


### Looping over Dictionaries
We can also use `for` loops to iterate over key-value pairs of a dictionary. What do you think the cell below will output?

In [7]:
params = {"parameter1" : 1.0,
          "parameter2" : 2.0,
          "parameter3" : 3.0,}

for key, value in params.items():
    print(key + " = " + str(value))

parameter1 = 1.0
parameter2 = 2.0
parameter3 = 3.0
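
Besides `.items()`, we can loop over just the keys or just the values. The sketch below uses a made-up dictionary, `base_counts`, to show both.

```python
base_counts = {'A': 3, 'C': 1, 'G': 1, 'T': 1}   # hypothetical nucleotide counts

# Looping over the dictionary itself gives its keys
keys_seen = []
for base in base_counts:
    keys_seen.append(base)

# .values() gives only the values
total = 0
for count in base_counts.values():
    total = total + count

print(keys_seen, total)
```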


### Looping with Range
Sometimes it is useful to have access to the indices of the values when iterating over a **range** of numbers. We can use the `enumerate` function for this:

In [8]:
for idx, x in enumerate(range(-3,3)):
    print(idx, x)

0 -3
1 -2
2 -1
3 0
4 1
5 2
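
`range` can also be used on its own. The sketch below, with illustrative variable names, shows the optional step argument and the common pattern of looping over the indices of a string with `range(len(...))`.

```python
# range(start, stop, step): stop is excluded
evens = []
for n in range(0, 10, 2):
    evens.append(n)

# range(len(...)) gives every valid index of a string
dna = 'ATGC'                      # illustrative sequence
indexed = []
for i in range(len(dna)):
    indexed.append((i, dna[i]))

print(evens)
print(indexed)
```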


### List Comprehensions
Finally, list comprehensions are a very compact way to create lists using a `for` loop. **It is not necessary to use these**, and they're not very intuitive to look at, but they do help condense lines of code.

In [12]:
# Create a list of values where x is squared for 0, 1, 2, 3, and 4.

list_1 = [x**2 for x in range(0,5)]

print(list_1)

[0, 1, 4, 9, 16]
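
If the compact syntax is hard to read, it may help to see it next to an ordinary loop. This sketch builds the same list both ways. The names `squares_loop` and `squares_comp` are illustrative.

```python
# The long way: start with an empty list and append inside a loop
squares_loop = []
for x in range(0, 5):
    squares_loop.append(x**2)

# The compact way: the same loop folded into one line
squares_comp = [x**2 for x in range(0, 5)]

print(squares_loop == squares_comp)
```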


> **Test your knowledge!** How many values will be printed from this for loop before it *first* prints “The tea is too hot!”?

In [13]:
temperatures = [114, 115, 116, 117, 118]

for temp in temperatures: 
    print(temp)
    
    if temp > 115:
        print('The tea is too hot!')

114
115
116
The tea is too hot!
117
The tea is too hot!
118
The tea is too hot!


<div class="alert alert-success"><b>Loop Task #1:</b> Write the Python code for <a href="https://i.redd.it/q9qc6bjzqr741.png">the image below</a>.</div>

![](https://i.redd.it/q9qc6bjzqr741.png)

In [14]:
# Write your code here
txt1 = 'You SOB I\'m in'
txt2 = 'You SOB I\'m out'

for i in range(4):
    if i<3:
        print('i =',i,txt1)
    else:
        print('i =',i,txt2)
        

i = 0 You SOB I'm in
i = 1 You SOB I'm in
i = 2 You SOB I'm in
i = 3 You SOB I'm out


<div class="alert alert-success"><b>Loop Task #2</b>: 
    
1. Rewrite the temperature code above using `range` instead of a list.
    
2. Package this code up into a function (`check_temp`) that takes `temp_range` as an argument.
    </div>

In [16]:
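# Write your code here
def check_temp(temp_range):
    '''Prints each temperature and warns when the tea is too hot'''
    for temp in temp_range:
        print(temp)
        if temp>=116:
            print('The tea is too hot!')

check_temp(range(114,119))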


114
115
116
The tea is too hot!
117
The tea is too hot!
118
The tea is too hot!


<a id="GCcontent"></a>

## Computing GC content with a loop

Let's get back to our question of computing GC content. We can write a simple function to loop over a string, as follows:

In [18]:
# Define function
def length(my_string):                     # Define the function 
    '''Computes length of input string'''  # Include a doc string
    counter = 0                            # Initialize the counter
    for character in my_string:            # First, assign the item in my_string to "character"
        print(character)                   # Then, print character
        counter = counter+1                # Then, increment the counter and go back to next item in my_string.
    return counter                         # Return the final counter, once there are no more items in my_string

In [19]:
# Run function
length('GGCAT')

G
G
C
A
T


5

This function counts the length of the string, but doesn't count the GC content.

<div class="alert alert-success"><b>Task</b>: Let's put it all together! Write a function (<code>computeGCcontent</code>) that can compute the GC content of a DNA string, regardless of the length.</div>

In [24]:
# Write your function here
def computeGCcontent(dna):
    ''' Compute GC content of a DNA string of any length '''
    if len(dna)>0:
        counter = 0
        for c in dna:
            if c=='C' or c=='G':
                counter += 1

        counter/=len(dna)
        return counter
    else:
        return 'Error: empty string'
        

computeGCcontent('')

'Error: empty string'
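
One way to sanity-check a loop-based GC function is to compare it with the string method `count`. The helper below, `gc_fraction`, is a hypothetical name and is not part of this notebook. It assumes uppercase input, and returning `0.0` for an empty string is a choice made only for this sketch.

```python
def gc_fraction(dna):
    '''Hypothetical cross-check: GC fraction via str.count (assumes uppercase input)'''
    if len(dna) == 0:
        return 0.0                # choice made for this sketch only
    return (dna.count('G') + dna.count('C')) / len(dna)

print(gc_fraction('GGCAT'))       # 3 of the 5 bases are G or C
```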

## Converting single items in a string
We can also use `for` loops to convert items in the string. You may recall that RNA has the same nucleotides as DNA except it uses uracil (U) instead of thymine (T). So, the DNA string 'TTACG' would be 'UUACG' in RNA. 

> <b>Task</b>: Write a function called <code>DNAtoRNA</code> that takes in a string <code>DNA</code> and converts it to RNA. A skeleton of this function is already below.

In [34]:
def DNAtoRNA(DNA):
    RNA = '' # Create an empty RNA string
    
    for nuc in DNA:
        if nuc == 'T':
            RNA = RNA + 'U'
        else:
            RNA += nuc

    return RNA

RNA = DNAtoRNA('ACTTTGCGCGCACACACATTT')
print(RNA)


ACUUUGCGCGCACACACAUUU


As it turns out, there's a useful **method** of strings that does this exact same thing -- in one line of code! 
> **Task**: Do a Google search to figure out how to replace specific items in a string, and try it below. Hint: search "replace items in string Python".

In [35]:
# Test the string method here
RNA = 'ACTTTGCGCGCACACACATTT'.replace('T','U')
print(RNA)
RNA = 'ACTTTGCGCGCACACACATTT'.replace('T','')
print(RNA)

ACUUUGCGCGCACACACAUUU
ACGCGCGCACACACA


## Finding CCAAT boxes

As an extension of our GC content task, here we'll count the number of “CAT” boxes (CCAAT) in a string of DNA. The “CAT” box generally appears near the spot where transcription begins, so it's a useful thing to identify.

Above, we wrote a function that will loop over a string, but if we need to look at multiple letters at a time, what we *really* need is one that will loop over the indices in that string.

In [53]:
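# First draft of code
def countCCAAT(DNA):
    '''Counts the CCAAT boxes in a DNA string, ignoring case'''
    DNA = DNA.upper()                                # Normalize case so 'ccaat' also matches
    counter = 0
    target  = 'CCAAT'
    for index in range(len(DNA)-len(target)+1):      # +1 so the window ending on the last base is checked
        if DNA[index:(index+len(target))] == target:
            counter+=1
    return counter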

count_result = countCCAAT('CCAATACCAaTTTTTGCGCGCACACACATTT')
print(count_result)

2


> **Task**: Modify the skeleton above by using the index to slice our DNA. For example, the first slice of 5 nucleotides would be `DNA[0:5]`, the second would be `DNA[1:6]`. We can do this programmatically!
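
To see what those slices look like, here is a minimal sketch that collects every 5-letter window of a short, made-up sequence. The names `dna`, `window`, and `slices` are illustrative. The `+ 1` in the range makes sure the final window, which ends on the last character, is included.

```python
dna = 'CCAATG'                    # hypothetical short sequence
window = 5                        # length of each slice

slices = []
for index in range(len(dna) - window + 1):
    slices.append(dna[index:index + window])   # dna[0:5], then dna[1:6]

print(slices)
```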

### Functions can call other functions!

What if instead of one DNA string, we'd like to count the CCAAT boxes for multiple DNA strings? We'd essentially like to run our `countCCAAT` function multiple times. You can see how we would implement this below. In this case, it becomes a **helper** function. When `countCCAAT` is called, Python passes it the current DNA string and waits for it to return something.

In [60]:
def multicountCCAAT(DNA_list):
    '''Prints the # of occurrences of CCAAT in each string in the given DNA list'''
    
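    for DNA in DNA_list:
        print(DNA.upper())          # Show the sequence in uppercase, as countCCAAT compares it
        print(countCCAAT(DNA))      # Then show its CCAAT count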

In [59]:
# Test our function
DNA_list = ['CCAATACCAaTTTTTGCGCCACATTT','CCAATACACACATTT','CCAATACGCACACACATTT']
multicountCCAAT(DNA_list)

CCAATACCAATTTTTGCGCCACATTT
2
CCAATACACACATTT
1
CCAATACGCACACACATTT
1


Another way to implement this idea would be with **nested loops**.

In [61]:
def multicountCCAAT_nested(DNA_list):
    '''Prints the # of occurrences of CCAAT in each string in the given DNA list'''
    
    for DNA in DNA_list:                        # The outer loop works through each DNA string
        counter = 0                             # For each string, reset the counter
        for index in range(len(DNA)):           # For each item in the string
            if DNA[index:index+5] == 'CCAAT':   # Check whether there is a CCAAT box
                counter = counter +1            # If so, increment the counter
        print(DNA,counter)                      # When you're done with that DNA string, print the counter

In [62]:
# Test our nested function
DNA_list = ['CCAATACCAaTTTTTGCGCCACATTT','CCAATACACACATTT','CCAATACGCACACACATTT']
multicountCCAAT_nested(DNA_list)

CCAATACCAaTTTTTGCGCCACATTT 1
CCAATACACACATTT 1
CCAATACGCACACACATTT 1


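Although the nested function works too, it is not as **modular** as the first one. Note that it also skips the case normalization done in `countCCAAT`, so it misses the mixed-case box in the first string. **Modular** design is where one function calls another function for help, and is generally a good practice when building more and more complex programs.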

<a id="while"></a>

## Other kinds of loops & related statements

### While loops
The `while` statement lets you keep running a loop as long as a condition is true.

While loops always have the structure

```
while condition:
    # Loop contents
```

In [63]:
message = 'still working'
counter = 0

while counter < 5:
    print(message)
    
    counter = counter + 1
    
print("done")

still working
still working
still working
still working
still working
done


In [64]:
# Another example

number = -5

while number < 0:
    print(number)
    number = number + 1  # must have code to make condition evaluate as False at some point

-5
-4
-3
-2
-1


> **Test your knowledge!** How many temperature values will be output from this `while` loop before “The tea is cool enough.” is printed?

In [65]:
temperature = 115
 
while temperature > 112: 
    print(temperature)
    temperature = temperature - 1
    
print('The tea is cool enough.')

115
114
113
The tea is cool enough.


### Continue statement
<code>continue</code> is a special keyword that skips the rest of the current iteration and jumps ahead to the next iteration of a loop.

In [None]:
lst = [0, 1, 2, 3]

for item in lst:
    
    if item == 2:
        continue
    
    print(item)
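
The same pattern can be applied to a DNA string. In this sketch, with a made-up sequence `reads`, we skip any ambiguous `N` base and keep everything else.

```python
reads = 'ATNNGC'                  # hypothetical sequence with ambiguous 'N' bases
cleaned = ''

for base in reads:
    if base == 'N':
        continue                  # skip the line below for this base
    cleaned = cleaned + base      # only reached for non-N bases

print(cleaned)
```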

### Break statement
`break` is a special keyword that exits a loop immediately. It's useful when you'd like to stop running a loop once you've hit a certain threshold or value.

In [None]:
lst = [0, 1, 2, 3]

for item in lst:
        
    if item == 2:    
        break
    
    print(item)
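
As a rough sketch of `break` on a DNA string, the loop below looks for the first `T` in a made-up sequence and stops as soon as it finds one. The names `dna` and `first_t` are illustrative, and `-1` is used here to mean "not found".

```python
dna = 'GGATCCAATG'                # hypothetical sequence
first_t = -1                      # -1 means no 'T' has been found

for index in range(len(dna)):
    if dna[index] == 'T':
        first_t = index           # remember where we found it
        break                     # no need to look at the rest

print(first_t)
```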

<img src="https://media2.giphy.com/media/kKLoC2AdoTftSgzR0O/source.gif" alt="russiandollgif" width="200" height="200">

## Interrupting a kernel
Writing loops is sometimes tricky, and one error in your code might leave you trapped in an infinite loop.


In [None]:
a = 1

while a == 1:
    print('AAAHHHH')

_What happens if you get stuck in a forever loop and can't escape?_

If you see `In [*]:` to the left of a cell, that cell is still running. Similarly, if the circle next to Python 3 is filled, the kernel is busy.

Use <b>Kernel > Interrupt</b> to stop the notebook. You can also use <b>Cell > Current Outputs > Clear</b> to clear your slate. 

It can be useful to clear and re-launch the kernel. You can do this from the 'Kernel' drop-down menu at the top, optionally also clearing all outputs. Note that this will erase any variables that are stored in memory.
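
While you are still debugging a `while` loop, one possible safeguard is to cap the number of iterations so that a mistake cannot run forever. The sketch below is only an illustration. The limit `max_iterations` and its value are arbitrary assumptions.

```python
max_iterations = 1000             # hypothetical safety limit
iterations = 0
a = 1

while a == 1:                     # this condition never becomes False on its own
    iterations = iterations + 1
    if iterations >= max_iterations:
        print('Stopping: hit the safety limit')
        break                     # escape instead of looping forever

print(iterations)
```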

<hr>

## Additional loops practice

1. Write a function `count_odd()` containing a loop that will **add** all the odd numbers for the input range together.

2. Write a function `count_vowels()` containing a loop that will loop through all the letters in `my_name` (the input parameter) and count all the vowels in your name.

3. Write a function `create_dictionary` that takes two input lists `lst_1` and `lst_2`. Inside the function, join the two lists to form a dictionary `joined_dictionary` where the first element in `lst_1` is the first key in `joined_dictionary`, the first element in `lst_2` is the first value in `joined_dictionary`, and so on. Then return `joined_dictionary` as the output.

<hr>

## About this notebook
This notebook is largely inspired by [*Computing for Biologists*](https://www.cambridge.org/highereducation/books/computing-for-biologists/5B08EEEE2AE8A602113A8F225E89F5FD#overview) by Libeskind-Hadas & Bush as well as [UCSD COGS18 Materials](https://cogs18.github.io/materials/09-Loops.html), created by Tom Donoghue & Shannon Ellis.

Want to run this notebook as a slideshow? If you have Python (or Anaconda), follow <a href="http://www.blog.pythonlibrary.org/2018/09/25/creating-presentations-with-jupyter-notebook/">these instructions</a> to set up your computer with the RISE plugin.