# Week 2 Tutorial - Control Flow 

This week, we will cover the following topics:

1. Loops and Statements 
    * If statement
    * For loop
    * While loop
2. Loop Control Statements
    * Continue statement 
    * Break statement 
    * Pass statement 

## Loops and Statements

### If Statement

In Python, the if statement is a conditional statement that allows you to execute a block of code only if a certain condition is True. 

It follows the general syntax:

In [3]:
if condition:
    # code block to be executed if condition is True

(Syntax template only: run as-is, this cell raises `SyntaxError: incomplete input`, because a comment on its own is not a code block.)

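The condition in the if statement is an expression that Python evaluates as either True or False. Comparisons such as ```x > 0``` produce a boolean value directly. Python will also accept a number or a string as a condition, for example ```if 5:``` or ```if "apple and banana":```, because non-zero numbers and non-empty strings count as True. Such conditions are always True, so they rarely make sense; write an explicit comparison instead.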

* If the condition is True, the code block will be __executed__.
* If the condition is False, the code block will be __skipped__.
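Python decides whether a condition counts as True or False using the same rules as the built-in bool() function. The short sketch below shows those rules on a few arbitrary values; the variable names are made up for illustration.

```python
# Ask Python how it would judge each value if it were used as a condition.
number = bool(5)                   # non-zero numbers count as True
zero = bool(0)                     # zero counts as False
text = bool("apple and banana")    # non-empty strings count as True
empty_text = bool("")              # the empty string counts as False
comparison = bool(5 > 3)           # a comparison is already a boolean

print(number, zero, text, empty_text, comparison)
# True False True False True
```

In practice an explicit comparison such as `x > 0` is easier to read than relying on these conversion rules.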

__A simple example of If Statement:__

In [8]:
x = 5

if x > 0:  
    print("x is positive.")

x is positive.


Above, when x is greater than 0, the code prints "x is positive."; when x is not greater than 0, nothing is printed because Python skips the indented line.

__if...elif...else for multiple conditions:__

The general syntax:

In [1]:
if condition_1:
    # code block to be executed if condition_1 is True
elif condition_2:
    # code block to be executed if condition_1 is False and condition_2 is True
else:
    # code block to be executed when none of the conditions above is True

(Syntax template only: run as-is, this cell raises `IndentationError: expected an indented block after 'if' statement on line 1`, because a comment on its own is not a code block.)

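You can use as many elif branches as you need. Python checks the conditions from top to bottom and runs only the first branch whose condition is True, so the order of the branches matters. For example: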

In [2]:
# Example of if-elif-else statement to determine grade
score = 78

if score >= 85:
    grade = 'HD'  # High Distinction
elif score >= 75:
    grade = 'D'  # Distinction
elif score >= 65:
    grade = 'C'  # Credit
elif score >= 50:
    grade = 'P'  # Pass
else:
    grade = 'F'  # Fail

print("Score:", score)
print("Grade:", grade)

Score: 78
Grade: D


### For Loop

Computing is mostly about doing the same thing again and again in an automated fashion. An example task that we might want to repeat is printing each character in a DNA sequence on a line of its own. 

In [8]:
DNAseq = 'atgtataacattggccataccccgtatacccatgcgaaccatattggccattaa'

One way to do this would be using a series of print statements: 

In [2]:
print(DNAseq[0])
print(DNAseq[1])
print(DNAseq[2])
print(DNAseq[3])
print(DNAseq[4])

a
t
g
t
a


**Python Indexing:**

Above, we used square brackets to access the values at different positions in the string: ```DNAseq[0]``` is the first value in the string, ```DNAseq[1]``` is the second, and so on. Notice that the first value has index 0 rather than 1. This is because Python indexing starts from 0, as it does in many other programming languages.

Starting from 0 is a convention shared by C, C++, Java, and many other languages. The idea behind it is that an index works as an offset from the start of the sequence, which makes memory addresses easy to calculate in low-level programming.

Okay, let's get back to printing our bases.

In [9]:
print(DNAseq[5])
print(DNAseq[6])
print(DNAseq[7])
print(DNAseq[8])
print(DNAseq[9])

t
a
a
c
a


I'm getting a bit tired, so let me check how many bases there are to print.

In [10]:
len(DNAseq)

54

The len() function returns the length of a string. The sequence is 54 bases long, so to print all of them I would have to write the print() function 54 times.

Obviously this is not a good way to solve the problem because:
* What if I need to print a sequence with 20K bases? 
* What if I have multiple sequences I need to print?

A better approach is to use a **for loop** in Python.

__What is a for loop? And what does it do?__

A for loop is a control flow statement in Python that is used to iterate over a sequence of values or a collection, and execute a block of code once for each element in the sequence. It allows you to repeatedly perform a set of tasks or operations on each element in the sequence without having to write repetitive code. 

The basic idea behind a for loop is to iterate over each element in an iterable, such as a list, tuple, string, or dictionary, and perform some action for each element. The loop automatically handles the iteration process, moving from one element to the next until all elements have been processed.

__The syntax of a for loop is:__

In [7]:
for variable in iterable:
    # Code block to be executed

(Syntax template only: run as-is, this cell raises `SyntaxError: incomplete input`, because the loop body contains only a comment.)

__A simple example of for loop:__

In this example, we create a list called "numbers" that contains the integers from 1 to 5, and then use a for loop to print each of them.

In [9]:
numbers = [1, 2, 3, 4, 5] # create a list 

# use for loop to print every element in the list 
for num in numbers:
    print(num)

1
2
3
4
5
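The same loop form works on the other iterables mentioned above. The sketch below uses made-up example data (`codon`, `coordinates`, and `base_pairs` are hypothetical names, not variables from this tutorial) to show what the loop variable receives in each case.

```python
# Hypothetical example data, chosen only to illustrate the loop behaviour
codon = "atg"                        # string: the loop visits each character
coordinates = (3, 7)                 # tuple: the loop visits each item
base_pairs = {"a": "t", "g": "c"}    # dictionary: the loop visits each key

for letter in codon:
    print(letter)

for value in coordinates:
    print(value)

for key in base_pairs:
    # use the key to look up its value
    print(key, "pairs with", base_pairs[key])
```

Note that looping over a dictionary gives you its keys; the value is looked up with square brackets.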


__Use a for loop to print all the bases in the object "DNAseq":__

We defined the variable "DNAseq" earlier in this Jupyter Notebook, so we don't need to define it again; it is still stored in the current session. If you close the notebook or restart the kernel, all the variables you created in the session are lost, and you will have to run those cells again to recreate them.

In [22]:
for base in DNAseq:
    print(base)

a
t
g
t
a
t
a
a
c
a
t
t
g
g
c
c
a
t
a
c
c
c
c
g
t
a
t
a
c
c
c
a
t
g
c
g
a
a
c
c
a
t
a
t
t
g
g
c
c
a
t
t
a
a


In [12]:
# create two more variables for us to practice 
DNAseq2 = 'ccgtatacccatgcgaacatggcgaaagaaagctttgcgagcacctaa'
DNAseq3 = 'ccgt'

__Use a for loop to print all the bases in the object "DNAseq2":__

In [21]:
for base in DNAseq2:
    print(base)

c
c
g
t
a
t
a
c
c
c
a
t
g
c
g
a
a
c
a
t
g
g
c
g
a
a
a
g
a
a
a
g
c
t
t
t
g
c
g
a
g
c
a
c
c
t
a
a


__Use a for loop to print all the bases in the object "DNAseq3":__

In [14]:
for base in DNAseq3:
    print(base)

c
c
g
t


We can see that no matter how long our sequence is, we can always use two lines of code to perform the same task. 

Unlike many other languages, there is no command to start/end a loop (e.g. do/done in **bash**). What is indented after the for statement belongs to the loop.

__Exercise 01: Write a loop that counts the number of bases in your DNA sequence (variable DNAseq)__

Here is a reference example:

In [15]:
# how many fruits are in my basket?

basket = ["apple", "banana", "peach", "nectarine", "apple_2", "pineapple"]
number_of_fruits = 0

for fruit in basket:
    number_of_fruits = number_of_fruits + 1

print("There are", number_of_fruits, "fruits in my basket.")

There are 6 fruits in my basket.


In [16]:
# write your solution here:



__Exercise 02: Now make it print the DNA sequence too. Like "There are 3 bases in the sequence atg."__

In [18]:
# write your solution here:



Note that if you pass the pieces to print() separated by commas, the output has a space between the sequence and the full stop. That is because print() puts a space between its arguments by default.

To change this default, use the "sep" argument to specify the separator. Below, we set sep="" so that nothing is inserted and the pieces are joined directly. You then need to include the spaces in your strings yourself; otherwise the words will run together.

In [16]:
# number_of_bases is the counter from your Exercise 01 solution
print('There are ', number_of_bases, ' letters in the sequence ', DNAseq, '.', sep="")

There are 54 letters in the sequence atgtataacattggccataccccgtatacccatgcgaaccatattggccattaa.
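To see the effect of "sep" side by side, here is a small self-contained sketch. It uses a short hypothetical sequence `seq` (the three bases from the exercise text) and counts its bases with a loop first, so it does not depend on any earlier cell.

```python
seq = 'atg'  # hypothetical short sequence, not one of the tutorial's variables

# count the bases with a loop
number_of_bases = 0
for base in seq:
    number_of_bases = number_of_bases + 1

# default behaviour: print() puts one space between all its arguments
print('There are', number_of_bases, 'bases in the sequence', seq, '.')
# There are 3 bases in the sequence atg .

# sep="": nothing is inserted, so the spaces must be inside the strings
print('There are ', number_of_bases, ' bases in the sequence ', seq, '.', sep="")
# There are 3 bases in the sequence atg.
```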


But finding the length of a string is such a common operation that Python has a built-in function to do it called __len()__:

In [21]:
len(DNAseq)

54

len() is much faster than any function we could write ourselves, and much easier to read than a two-line loop; it will also give us the length of many other things that we haven’t met yet, so we should always use it when we can.
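As a quick sketch of that last point, len() can be called on several other kinds of objects that appear later in this tutorial. The example values below are made up for illustration.

```python
string_length = len('atg')                  # characters in a string
list_length = len(['atg', 'tat', 'aac'])    # items in a list
dict_length = len({'a': 't', 'g': 'c'})     # key:value pairs in a dictionary

print(string_length, list_length, dict_length)
# 3 3 2
```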

### Iterate Over Numbers in For Loop 

In all the exercises so far we looped over everything from beginning to end, but we do not always want to run our commands on all of the data.

__What if I want to run the commands on half of the list? What if I want to run the commands on every other element?__

Python has a built-in function called __range()__ that produces a sequence of numbers. We can loop through these numbers and use them as indexes to get the elements we want. By default the sequence starts at 0, increases by 1, and stops before the specified number.

Syntax: range(start, stop, step), where start and step are optional

In [1]:
for i in range(3): # starting from 0 and increments by 1 until 3 (exclusive)
    print(i)

0
1
2


In [27]:
for i in range(2,5): # starting from 2 and increments by 1 until 5 (exclusive)
    print(i)

2
3
4


In [28]:
for i in range(10,30,5): # starting from 10 and increments by 5 until 30 (exclusive)
    print(i)

10
15
20
25


__Exercise 03: Write a loop that prints all the even numbers in the range between 1 and 10 (inclusive)__

In [None]:
# write your code here:



We can make it more generic and use what we have learned before:

In [6]:
start, end = 1, 10

for i in range(start, end + 1):
    if i % 2 == 0:
        print(i)

2
4
6
8
10


Here, we used __if statement__ and __modulo operation__ to check if the number is even and only print out those that are. 

__Iterate over indexes:__

Iterating over numbers in a for loop is very useful when you use the numbers as indexes. You can then run a block of code on only some of the items in an object, rather than looping over everything from beginning to end.

Before that, we need to learn the concept of subscripting.

__Subscripting in Python:__

In Python, subscripting means accessing individual elements or slices of a sequence or collection, such as strings, lists, tuples, or arrays, using square brackets [] with an index or slice notation. It lets you extract specific elements of any sequence, and modify them if the sequence is mutable (a list, for example). Worked examples using DNAseq follow the While Loop section below.

### While Loop

A while loop allows you to repeatedly execute a block of code as long as a certain condition is True. 

The general syntax of a while loop in Python is as follows:

In [2]:
while condition:
    # Code to be executed

(Syntax template only: run as-is, this cell raises `SyntaxError: incomplete input`, because the loop body contains only a comment.)

__A simple example of While Loop:__

In [3]:
count = 1

while count <= 5:
    print(count)
    count += 1

1
2
3
4
5


__A biological example of While Loop:__

We can use a while loop to find the start codon (ATG) in a DNA sequence.

In [6]:
# Define the DNA sequence
dna_sequence = "TTAGCTATGACATGTAGGCTAGCTAG"

# Initialize the index variable
index = 0

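# Use a while loop to search for the start codon
while index < len(dna_sequence):
    codon = dna_sequence[index:index+3]  # the 3 bases starting at index
    if codon == "ATG":
        print("Start codon found at index:", index)
        break  # stop searching as soon as the first ATG is found
    index += 1  # otherwise move one base to the right
else:
    # this block runs only if the loop finished without hitting break
    print("Start codon not found in the given DNA sequence.")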

Start codon found at index: 6
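The loop above combines two features that are easy to miss: break leaves the loop immediately, and the else block attached to a while loop runs only when the loop ends without a break. The sketch below runs the same search on two made-up sequences, one with a start codon and one without, so both paths can be seen. The names `sequences` and `found_at` are hypothetical.

```python
# Two hypothetical sequences: the first contains ATG, the second does not
sequences = ["TTAGCTATGACA", "TTAGCTAGG"]
found_at = []

for dna in sequences:
    index = 0
    while index < len(dna):
        if dna[index:index + 3] == "ATG":
            found_at.append(index)
            break               # leave the while loop; its else block is skipped
        index += 1
    else:
        found_at.append(None)   # reached only when no break happened

print(found_at)  # [6, None]
```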


In [19]:
DNAseq

'atgtataacattggccataccccgtatacccatgcgaaccatattggccattaa'

In [25]:
# access the element at index 0
DNAseq[0] 

'a'

In [26]:
# access the element at index 5
DNAseq[5] 

't'

In [27]:
# access the elements from index 0 to 3 (exclusive)
DNAseq[0:3] 

'atg'

__Note:__ when slicing a sequence with square brackets, the start index is always inclusive and the end index is always exclusive.

So for `DNAseq[0:3]`, it includes index 0, 1, 2 but not 3.

In [28]:
# access the elements from the beginning to index 6 (exclusive)
DNAseq[:6]

'atgtat'

In [29]:
# access the elements from index 2 (inclusive) to the end
DNAseq[2:]

'gtataacattggccataccccgtatacccatgcgaaccatattggccattaa'
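Slices also accept an optional third number, a step, which works like the step of range(). Negative indexes count from the end of the sequence, and a step of -1 walks through it backwards. Here is a minimal sketch on a short hypothetical sequence (`seq` holds the first nine bases of DNAseq so the cell is self-contained).

```python
seq = 'atgtataac'  # hypothetical short sequence (first 9 bases of DNAseq)

last_base = seq[-1]        # negative indexes count from the end
last_codon = seq[-3:]      # the final three bases
every_third = seq[::3]     # indexes 0, 3, 6
reversed_seq = seq[::-1]   # step -1 reads the sequence backwards

print(last_base)      # c
print(last_codon)     # aac
print(every_third)    # ata
print(reversed_seq)   # caatatgta
```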

You can use subscripting to print the first base of every codon.

In [30]:
for i in range(0, len(DNAseq), 3):
    print(DNAseq[i])

a
t
a
a
g
c
a
c
t
a
c
g
a
c
a
g
c
t


For the above loop, consider the question:
* Which indexes are being used in the loop? Obviously, not all the indexes in the variable DNAseq.
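One way to check which indexes a range() produces is to convert it to a list and print it. The sketch below does this for a short hypothetical sequence (`seq` is not one of the tutorial's variables); the same approach works on DNAseq.

```python
seq = 'atgtataac'  # hypothetical 9-base sequence

# turn the range into a list so the indexes can be inspected
indexes = list(range(0, len(seq), 3))
print(indexes)  # [0, 3, 6]

# print each index next to the base found at that position
for i in indexes:
    print(i, seq[i])
```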

### Exercise

Write a loop that prints all the codons (non-overlapping 3-mers) of our variable DNAseq.

In [None]:
# write your code here:




In [31]:
# solution
for i in range(0, len(DNAseq), 3):
    print(DNAseq[i:i+3])

atg
tat
aac
att
ggc
cat
acc
ccg
tat
acc
cat
gcg
aac
cat
att
ggc
cat
taa


In [32]:
# this one also prints the start index of the codon
for i in range(0, len(DNAseq), 3):
    print(i,DNAseq[i:i+3])

0 atg
3 tat
6 aac
9 att
12 ggc
15 cat
18 acc
21 ccg
24 tat
27 acc
30 cat
33 gcg
36 aac
39 cat
42 att
45 ggc
48 cat
51 taa


## Control flow - more ways to affect the order in which statements run

So far we have used the for loop, the while loop, and the if statement mostly on their own.

Control flow statements become more useful when they are combined, because together they decide which statements run and how often.

For example, we can put an if statement inside a for loop to count how often the base 'a' occurs in our variable DNAseq.

In [4]:
a_count = 0

for base in DNAseq:
    if base == 'a':
        a_count += 1
        
print('We have the following base counts:')
print('a:', a_count)    

We have the following base counts:
a: 17


__Exercise: What are the counts of each regular base [a, t, c, g] in DNAseq?__

In [None]:
# write your code here: 



In [5]:
a_count = 0
t_count = 0
c_count = 0
g_count = 0

for base in DNAseq:
    if base == 'a':
        a_count += 1
    elif base == 't':
        t_count += 1
    elif base == 'c':
        c_count += 1
    elif base == "g":
        g_count += 1
    else:
        print(base, 'is not a regular base [a, t, c, g]')
        
print('We have the following base counts:')
print('a:', a_count)
print('t:', t_count)
print('c:', c_count)
print('g:', g_count)   

We have the following base counts:
a: 17
t: 14
c: 15
g: 8
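The outline for this week also lists the loop control statements continue, break, and pass. The sketch below shows what each one does inside a for loop. It uses a made-up sequence in which 'n' marks an unknown base; the names `seq`, `known_bases`, and `handled` are hypothetical.

```python
seq = "atgnncgtaaggg"  # hypothetical sequence; 'n' marks an unknown base

# continue and break
known_bases = 0
for i in range(len(seq)):
    if seq[i] == "n":
        continue          # skip the rest of this iteration, go to the next i
    if seq[i:i + 3] == "taa":
        break             # stop the whole loop at the first 'taa'
    known_bases += 1

print(known_bases)  # 5

# pass: a placeholder that does nothing
handled = 0
for base in "atn":
    if base == "n":
        pass              # nothing happens here; execution carries on below
    handled += 1

print(handled)  # 3
```

Unlike continue, pass does not skip anything: the line after the if block still runs, which is why all three bases are counted in the second loop.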


## Data structures

### Lists, tuples, and sets

Python has a set of standard 'containers' in which you can store information. Each data structure has its own purpose and advantages.


### Lists

Lists are ordered sequences of elements. Each element or value that is inside of a list is called an item. Just as strings are defined as characters between quotes, lists are defined by having values between square brackets []. 

For example, the codons within a DNA sequence could be stored in a list of 3-mers.

In [None]:
DNAseq

In [None]:
# initially you can copy and paste the codons over and make a list
DNAseq_codons = ['atg', 'tat', 'aac']

We can also make a list more easily by looping over the string.

In [None]:
DNAseq_codons = []

for i in range(0, len(DNAseq), 3):
    DNAseq_codons.append(DNAseq[i:i+3])
    print(DNAseq[i:i+3])
print(DNAseq_codons)

### Lists are
* subscriptable
* mutable

In [None]:
DNAseq_codons[4]

In [None]:
DNAseq_codons[4] = 'aat'

In [None]:
DNAseq_codons

### Sets

Sets are unordered collections of items in which each item occurs only once, so a set has no duplicated elements.

In [None]:
print(DNAseq_codons)
type(DNAseq_codons)

In [None]:
set(DNAseq_codons)

In [None]:
set(DNAseq_codons)[0]  # raises TypeError: 'set' object is not subscriptable

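### Sets are
* not subscriptable (there is no item "number 0" because a set has no order)
* mutable (items can be added and removed, but each item appears only once)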

Because sets cannot have multiple occurrences of the same element, it makes sets highly useful to efficiently remove duplicate values from a list and to perform common math operations like unions and intersections.
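A minimal sketch of these uses, on a made-up list of codons that contains duplicates (`codons` and `other` are hypothetical names). Because sets have no order, the results are passed through sorted() so that the printed output is predictable.

```python
codons = ['atg', 'tat', 'aac', 'tat', 'atg']  # hypothetical list with duplicates

# converting to a set removes the duplicates
unique_codons = set(codons)
print(len(codons), len(unique_codons))  # 5 3

other = {'aac', 'ggc'}  # a second, hypothetical set

union = sorted(unique_codons | other)          # items in either set
intersection = sorted(unique_codons & other)   # items in both sets

print(union)          # ['aac', 'atg', 'ggc', 'tat']
print(intersection)   # ['aac']
```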

### Dictionaries

Dictionaries are containers of key:value pairs where each key can only occur once. Since Python 3.7 a dictionary remembers the order in which its keys were inserted, but items are looked up by key, not by position.

They are one of the most useful Python data structures. Looking up elements in a dictionary is really fast and there are lots of built-in methods to use and manipulate dictionaries.
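The sketch below walks through a few common dictionary operations on a small made-up dictionary (`base_names` is a hypothetical name): adding a pair, looking up a key, testing whether a key exists, and looping over the pairs.

```python
base_names = {'a': 'adenine', 't': 'thymine'}  # hypothetical example dictionary

base_names['g'] = 'guanine'             # add a new key:value pair

print(base_names['a'])                  # adenine
print('c' in base_names)                # False: 'c' is not a key yet
print(base_names.get('c', 'unknown'))   # unknown: get() returns a default instead of an error

# items() gives each key together with its value
for base, name in base_names.items():
    print(base, '->', name)
```

Looking up a missing key with square brackets, such as `base_names['c']`, raises a KeyError, which is why get() is handy when a key may be absent.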

We can use a dictionary to reverse complement a DNA sequence.

In [None]:
base_pair_dict = {'a' : 't', 
                  't' : 'a', 
                  'g' : 'c', 
                  'c' : 'g'}

In [None]:
base_pair_dict

In [None]:
base_pair_dict['g']

In [None]:
# so now reverse complement our DNA sequence

reverse_comp_DNAseq = ''

for base in DNAseq[::-1]:
    paired_base = base_pair_dict[base]
    reverse_comp_DNAseq += paired_base
    
print(reverse_comp_DNAseq, 'is the reverse complement of', DNAseq)

### Exercise: decode the hidden message in the DNA sequences

Use the coding table dictionary to decode the hidden message!  
Hint: you can convert a string to upper case with the string method x.upper(), where x is your string variable.


In [None]:
'agt'.upper()

In [None]:
coding_table_dict = { 
        'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M', 
        'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T', 
        'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K', 
        'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',                  
        'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L', 
        'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P', 
        'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q', 
        'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R', 
        'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V', 
        'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A', 
        'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E', 
        'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G', 
        'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S', 
        'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L', 
        'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_', 
        'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W', 
    } 

### Solution

In [None]:
hidden_message = ''

for i in range(0, len(DNAseq), 3):
    codon = DNAseq[i:i+3]
    amino_acid = coding_table_dict[codon.upper()]
    hidden_message = hidden_message + amino_acid

print("This is the hidden message in", DNAseq, ':\n', hidden_message)

In [None]:
hidden_message = ''

for i in range(0, len(DNAseq2), 3):
    codon = DNAseq2[i:i+3]
    amino_acid = coding_table_dict[codon.upper()]
    hidden_message = hidden_message + amino_acid

print("This is the hidden message in", DNAseq2, ':\n',hidden_message)