# Iterators and loops in Python

In this section, we'll cover:

1. How to write a [`for` loop](#for-loop) in Python
2. Use the [`range()` function](#range-operator) to generate a sequence of numbers
   - [Problem - count down](#problem---count-down)
3. Use `len()` to [compute the length of an object](#use-len-to-compute-the-length)
4. Use the [`enumerate()` function](#enumerate-the-iterable)
5. Interrupt the loop with [`continue` and `break`](#continue-and-break)
   - [Problem - Oh no! I don't like that](#problem---oh-no-i-dont-like-that)
6. The difference between a `for` loop and a [`while` loop](#while-loop)
   - [Problem - trap of the while loop](#problem---trap-of-the-while-loop)


## For loop

Here is a typical `for` loop in bash:

In [1]:
%%bash

for number in 1 2 3 4 5; do
    echo $number;
done

1
2
3
4
5


Again, the `for` loop in Python is simplified a bit.
- We don't need `do`, `done`, or the dollar sign (`$`) to reference a variable.
- A colon (`:`) is required at the end of the `for` statement.
- The loop body is defined by its indentation.

The above bash `for` loop can be rewritten in Python as below:

In [2]:
for number in [1, 2, 3, 4, 5]:
    print(number)

1
2
3
4
5


As demonstrated in the example above, the `for` loop iterates through an *iterable*, which returns one item (assigned to the variable `number`) for each iteration.

Below are more valid *iterables* in Python.

A list of strings is an iterable. It works just like a list of numbers, but returns a string (a fruit in this case) instead of a number for each round.

In [3]:
fruit_list = ["apple", "banana", "coconut"]

for fruit in fruit_list:
    print(fruit)

apple
banana
coconut


A tuple (very similar to a list, but immutable) is an iterable. It also returns one item for each round.

In [4]:
numerical_tuple = (1, 2, 3, 4, 5)

for number in numerical_tuple:
    print(number)

1
2
3
4
5


A string is also an iterable. It returns one character for each round.

In [5]:
dna = "ATTGGC"

for nt in dna:
    print(nt)

A
T
T
G
G
C
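
To make the idea of an *iterable* a little more concrete, here is a minimal sketch of what a `for` loop roughly does behind the scenes, using the built-in functions `iter()` and `next()`. The variable names `iterator`, `first`, `second` and `remaining` are only illustrative.

```python
# A for loop first asks the iterable for an iterator, then keeps
# requesting the next item until the iterator is exhausted.
dna = "ATTGGC"

iterator = iter(dna)      # roughly what `for` does before the first round
first = next(iterator)    # the item for round 1
second = next(iterator)   # the item for round 2
print(first, second)

# The items that have not been requested yet are still in the iterator.
remaining = list(iterator)
print(remaining)
```

You rarely need to call `iter()` and `next()` yourself; the `for` loop handles this for you.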


### Range operator
The function [`range()`](https://docs.python.org/3/library/functions.html#func-range) creates a sequence of numbers (very similar to `seq` in UNIX).

In [6]:
for number in range(5):
    print(number)

0
1
2
3
4


In fact, the `range()` function can take more than one argument:

- if only one argument is given:
    - `range(stop)`
- if two or three arguments are given:
    - `range(start, stop)` or `range(start, stop, step)`, where `step` defaults to 1

In [7]:
# This is equivalent to the example above
# 0      1      2      3      4      5
# print  print  print  print  print  stop
for number in range(0, 5):
    print(number)

0
1
2
3
4


You can count from 1 instead:

In [8]:
# 1      2      3      4      5
# print  print  print  print  stop
for number in range(1, 5):
    print(number)

1
2
3
4


You can count by twos:

In [9]:
# 0      1     2      3     4
# print  skip  print  skip  print
for number in range(0, 5, 2):
    print(number)

0
2
4


You can also count backward:

In [10]:
for number in range(10, 0, -1):
    print(number)

10
9
8
7
6
5
4
3
2
1
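
As a quick way to check what a `range()` call produces without writing a loop, the sketch below wraps each range in `list()`. It also illustrates that `stop` is excluded in both directions, and that a step pointing away from `stop` gives an empty sequence. The variable names are only illustrative.

```python
# list() materializes a range so the whole sequence can be inspected at once.
count_up = list(range(1, 5))          # stop (5) is excluded
count_down = list(range(10, 0, -1))   # stop (0) is excluded here too
wrong_sign = list(range(0, 5, -1))    # a negative step can never reach 5

print(count_up)     # [1, 2, 3, 4]
print(count_down)   # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(wrong_sign)   # []
```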


#### Problem - count down

How would you count down by threes from 180 to 150 (inclusive)?

In [11]:
# Please write and test your code in this cell
for i in range(180, 149, -3):
    print(i)

180
177
174
171
168
165
162
159
156
153
150


### Use `len` to compute the length

You can use `len` to compute the length of an object.

In [12]:
dna = "ATTGGC"
dna

'ATTGGC'

In [13]:
# There are 6 characters in this string.
len(dna)

6

In [14]:
taxonomic_ranks = ["domain", "kingdom", "phylum", "class", "order", "family", "genus", "species"]
taxonomic_ranks

['domain', 'kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']

In [15]:
# There are 8 items in the list.
len(taxonomic_ranks)

8
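
`len()` is not limited to strings and lists. The short sketch below applies it to a tuple and a `range`, and shows that an object without a length, such as a plain integer, raises a `TypeError`. The variable `message` is only illustrative.

```python
numerical_tuple = (1, 2, 3, 4, 5)

print(len(numerical_tuple))   # 5 items in the tuple
print(len(range(10)))         # 10 numbers, from 0 to 9

# A single integer has no length, so len() raises a TypeError.
try:
    len(42)
except TypeError as error:
    message = str(error)
    print("TypeError:", message)
```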

## Enumerate the iterable

By combining `len()` and `range()`, we can generate an index for each round of the loop.

(An index can be used to get a particular value from a list. **Indexing starts from 0, not from 1.** We'll introduce more uses of it in the data structure lecture.)

In [16]:
taxonomic_ranks[0]

'domain'

In [17]:
for idx in range(len(taxonomic_ranks)):
    print(idx, taxonomic_ranks[idx])

0 domain
1 kingdom
2 phylum
3 class
4 order
5 family
6 genus
7 species


An alternative is to use `enumerate()`.

The function `enumerate()` yields two objects for each round: the index and the value. It is convenient and makes the code more readable.

In [18]:
for idx, rank in enumerate(taxonomic_ranks):
    print(idx, rank)

0 domain
1 kingdom
2 phylum
3 class
4 order
5 family
6 genus
7 species
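
If you would rather have the numbering start from 1 (for display purposes, for example), `enumerate()` accepts an optional `start` argument. The sketch below is one way to use it; the `numbered` list is only there to collect the pairs for inspection.

```python
taxonomic_ranks = ["domain", "kingdom", "phylum", "class",
                   "order", "family", "genus", "species"]

numbered = []
for position, rank in enumerate(taxonomic_ranks, start=1):
    numbered.append((position, rank))   # keep the pair for later inspection
    print(position, rank)
```

Note that `start` only changes the counter. The list itself is still indexed from 0, so `taxonomic_ranks[position]` would be off by one.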


## Continue and break

Typically, all the code in the body of a `for` loop is executed in every iteration.
If you want to skip part of it, you can use `continue`.

In the example below, we combine it with an `if` statement to skip the rest of the round when `number == 7`.
Note that the code *before* `continue` is still executed, but the code *after* `continue` is skipped for that particular iteration.

In [19]:
for number in range(10):
    print("The number is ...")  # This will be run
    if number == 7:
        continue
    print(number)               # This will be skipped when number == 7

The number is ...
0
The number is ...
1
The number is ...
2
The number is ...
3
The number is ...
4
The number is ...
5
The number is ...
6
The number is ...
The number is ...
8
The number is ...
9
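
The condition that triggers `continue` does not have to be a single value. As a small sketch, the loop below skips every odd number and collects the even ones. The list `even_numbers` and the use of `append()` are only for illustration.

```python
even_numbers = []
for number in range(10):
    if number % 2 == 1:          # the remainder is 1 for odd numbers
        continue                 # skip the rest of this round
    even_numbers.append(number)  # only reached for even numbers

print(even_numbers)   # [0, 2, 4, 6, 8]
```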


#### Problem - Oh no! I don't like that

We have a list of fruits! Can you iterate through the list and call `print("I like", fruit)` for each round?

However, I don't really like **dragon fruit** and **pomegranate**, so please use `continue` to skip them.

In [20]:
fruit_list = ["apple", "banana", "coconut", "dragon fruit", "grape", "watermelon", "dragon fruit", "pomegranate"]
fruit_list

['apple',
 'banana',
 'coconut',
 'dragon fruit',
 'grape',
 'watermelon',
 'dragon fruit',
 'pomegranate']

In [21]:
# Please write and test your code in this cell
for fruit in fruit_list:
    if fruit == "dragon fruit" or fruit == "pomegranate":
        continue
    print("I like", fruit)

I like apple
I like banana
I like coconut
I like grape
I like watermelon


Continuing the example from the problem above, suppose I really hate dragon fruit and stop eating as soon as I see one.

In this case, you can use `break` to jump out of the `for` loop. As soon as the program reaches a `break`, it exits the loop immediately.

In [22]:
fruit_list

['apple',
 'banana',
 'coconut',
 'dragon fruit',
 'grape',
 'watermelon',
 'dragon fruit',
 'pomegranate']

In [23]:
for fruit in fruit_list:
    if fruit == "dragon fruit":
        break
    print("I love " + fruit)

print("Aghhhhh!!!")

I love apple
I love banana
I love coconut
Aghhhhh!!!
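
`continue` and `break` can also be used in the same loop. The sketch below uses a shorter, made-up fruit list (`small_fruit_list`, not the one above) in which a pomegranate appears before the first dragon fruit, so both statements are triggered.

```python
# A hypothetical list, chosen so that both continue and break are reached.
small_fruit_list = ["apple", "pomegranate", "banana", "dragon fruit", "grape"]

eaten = []
for fruit in small_fruit_list:
    if fruit == "pomegranate":
        continue          # skip this fruit, but keep going
    if fruit == "dragon fruit":
        break             # stop the whole loop
    eaten.append(fruit)

print(eaten)   # ['apple', 'banana']
```

The grape is never reached, because `break` ends the loop before that round starts.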


## `while` loop

As shown previously, a `for` loop iterates over an iterable whose length is typically *already known* beforehand.
Once the iteration reaches the end of the iterable, the loop exits.

In [24]:
taxonomic_ranks = ["domain", "kingdom", "phylum", "class", "order", "family", "genus", "species"]
taxonomic_ranks

['domain', 'kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']

In [25]:
for rank in taxonomic_ranks:
    print(rank)
print("The iteration has stopped")

domain
kingdom
phylum
class
order
family
genus
species
The iteration has stopped


The `while` loop, on the other hand, will keep iterating while the condition is True.

In [26]:
x = 0
while x <= 10:
#     condition
    print(x)
    x = x + 1

0
1
2
3
4
5
6
7
8
9
10
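
A `while` loop is most useful when the number of rounds is not known in advance. As a rough sketch, the loop below keeps halving a number until it is no longer greater than 1 and counts how many rounds that takes. The names `value` and `steps` and the starting number 100 are arbitrary choices for illustration.

```python
value = 100
steps = 0
while value > 1:          # keep going as long as this is True
    value = value / 2     # the condition depends on this update
    steps = steps + 1

print(steps, value)   # 7 0.78125
```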


So be careful when using a `while` loop. If the condition is not set up properly, it is very easy to get trapped in an infinite loop.

In [None]:
x = 0
while x <= 10:
    print(x)
    x + 1       # I forgot to overwrite the x
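
If you want to experiment with a loop like this without freezing your notebook, one possible safeguard is to cap the number of rounds yourself. The sketch below keeps the same bug on purpose and adds a counter; `max_iterations` and `iterations` are hypothetical names, and the limit of 100 is arbitrary.

```python
max_iterations = 100   # hypothetical safety limit, chosen arbitrarily
iterations = 0

x = 0
while x <= 10:
    x + 1                          # same bug as above: x is never updated
    iterations = iterations + 1
    if iterations >= max_iterations:
        print("Stopped after", iterations, "rounds; x is still", x)
        break                      # leave the loop by force
```

The real fix is still to update the variable (`x = x + 1`); the cap only keeps a mistake from running forever.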

Just like in a `for` loop, you can use `continue` to skip the rest of an iteration and `break` to exit the loop.

In [28]:
x = 0
while x <= 10:
    print(x)
    if x == 7:
        break
    x = x + 1

0
1
2
3
4
5
6
7


#### Problem - trap of the while loop

Can you modify the previous `while` loop to use `continue` to skip the `print` call when `x == 7`?

(The output should be 0 1 2 3 4 5 6 8 9 10)

In [29]:
# Not using continue; using pass instead
x = 0
while x <= 10:
    if x == 7:
        pass
    else:
        print(x)
    x = x + 1

0
1
2
3
4
5
6
8
9
10


In [35]:
# If you insist on using continue, you have to place the increment
# (x = x + 1) before the continue; otherwise x stays at 7 and the loop
# runs forever. Because the increment now also comes before the print
# call, we print x - 1 instead of x. The if statement must also check
# for x == 8, since that is the round in which 8 - 1 = 7 would
# otherwise be printed.

x = 0
while x <= 10:
    x = x + 1
    if x == 8:
        continue
    print(x - 1)
    

0
1
2
3
4
5
6
8
9
10


#### Challenge - Find the ORFs

Below is a DNA sequence without introns that contains 3 open reading frames (ORFs). The start codon of each ORF is shown in bold.

---
dna = CC**ATG**CGGTG**ATG**CCACTAGGCG**ATG**TATAACTGATTAAAA

---

Can you use the skills we have learned so far to write code that prints out all the ORFs?

You will also need to detect the stop codons TAA, TAG, and TGA. Here is a hint for that, since we didn't really cover this part in class.

You can use `in` to check whether a value is contained in an iterable. So one way to detect a stop codon is:
```
if codon in ["TAA", "TAG", "TGA"]:
    pass  # do something here
```
where `codon` is a 3-character string.

The expected output is:<br>
ATGCGG<br>
ATGCCACTAGGCGATGTA<br>
ATGTATAAC<br>

In [None]:
dna = "CCATGCGGTGATGCCACTAGGCGATGTATAACTGATTAAAA"