# Workbook C: Conditional tests and loops

Before starting this exercise you should work through

* First part of *Chapter 3 Logic, Control Flow and Filtering*
  stopping before the video "Filtering Pandas DataFrame",
* First part of  *Chapter 4 Loops*
  stopping before the video "Looping Data Structures, part 1",

from the
[DataCamp online course: Intermediate Python for Data Science](https://www.datacamp.com/courses/intermediate-python-for-data-science).

We will then use what you have learnt there to explore how biological
data can be handled in Python.

> ### Reminder: saving your work
>
> As you work through the workbook it is important to save your work regularly. Notice that once you have made changes, the top line of the Jupyter window will warn you in small text that there are `(unsaved changes)`. To save your work in this notebook, either select the menu item `File` `Save` or hit the save button:
> 
> <img src="./images/save_button.png"/>
>
> 
> ### Reminder: getting help 
> Please see the page:
> [Help with programming](https://canvas.anglia.ac.uk/courses/1490/pages/help-with-programming)
> on ARU Canvas.

## Comparison operators

Revise the video *Comparison Operators* from *Chapter 3 Logic, Control Flow and Filtering*, [DataCamp online course: Intermediate Python for Data Science](https://www.datacamp.com/courses/intermediate-python-for-data-science).


**Comparators**

| Comparator  | meaning               | 
| :--------:  | :----------------:    |
|      <      | strictly less than    |
|      <=     | less than or equal    |
|      >      | strictly greater than |
|      >=     | greater than or equal |
|     ==      | equal |
|     !=      | not equal |

We can use these comparators for tests.

For instance, to test whether `num_rabbits` is less than 1000:

In [None]:
# Instruction: run this cell to see an example test of whether num_rabbits is less than 1000:
num_rabbits = 1024
print(num_rabbits < 1000)

**Your turn** Now practice using these comparators.

In [None]:
numb_chromosomes = 34
base_1 = 'G'
base_2 = 'A'
base_3 = 'T'
mammalian = True
# Instruction: write a Python expression, wrapped in a print function, to check whether:

# * numb_chromosomes is less than 50
### your line here! 

# * numb_chromosomes is greater than or equal to 34
### your line here! 

# * base_1 is equal to base_2
### your line here! 

# * base_2 is not equal to base_3
### your line here! 

# * base_2 is greater than base_3
### your line here! 

# * mammalian is True
### your line here! 

Note that there are some subtleties involved in these comparisons. 
Strings are compared character by character, and each character is ordered by its 
[ASCII](https://en.wikipedia.org/wiki/ASCII) 
value:

In [None]:
# Instruction: run this cell to see how string comparison works in Python
print("'a' < 'b' =", 'a' < 'b')
print("'cat' < 'car' =", 'cat' < 'car')
print("'car' < 'carrot' =", 'car' < 'carrot')
print("capital letters occur before lower case letters so 'A' < 'a' =", 'A' < 'a')
print("digits occur before upper case letters so '1' < 'A' =", '1' < 'A')
print("comparing numbers as strings can lead to unexpected results! '10' < '7' =", '10' < '7')
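
The ordering in the cell above follows each character's numeric code, which the built-in `ord()` function returns. The sketch below (the variable names `as_strings` and `as_numbers` are illustrative only) prints a few codes and shows that converting digit strings with `int()` gives the numeric comparison you would usually want:

```python
# Character codes decide the order of single characters
for char in ['1', 'A', 'a']:
    print(char, ord(char))

# Strings are compared character by character, so '10' < '7' because '1' < '7'
as_strings = '10' < '7'
# Converting to integers first compares the numeric values instead
as_numbers = int('10') < int('7')
print(as_strings, as_numbers)
```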

A second subtlety is that it is poor practice to use the equality operator on booleans - they are either True or False to start with:

In [None]:
# Instruction: run this cell to see how booleans should be used in Python
animal = 'rabbit'
mammalian = True

if mammalian == True:  # not good practice
    print(animal + ' is a mammal')
    
if mammalian:  # better
    print(animal + ' is a mammal! (better)')

## Making choices with  `if, elif, else `  

The real use of comparison operators is to make choices in programs. For instance, one
might want to write a script to check whether gene expression data is different for mammals 
compared to fish, or for big animals compared to small ones.

As you have seen in the video *if, elif, else* from *Chapter 3 Logic, Control Flow and Filtering*, [DataCamp online course: Intermediate Python for Data Science](https://www.datacamp.com/courses/intermediate-python-for-data-science), the `if condition:` statement can be used to apply a test.

The `if` statement is used as follows:
```
if condition:
    expression1
    expression2
expression3
```
Notice that
* there is a colon `:` at the end of the `if` line - this is essential.
* `expression1` and `expression2` are indented by 4 spaces compared to the line above. (It is usual to use 4 spaces for the indentation, but 2 or 3 spaces are sometimes used. It is possible to use the tab character but this is really bad practice and should be avoided!)
* this means that if `condition` is True the script will run `expression1`, followed by `expression2`, before going on to `expression3`.
* but if `condition` is False the script will ignore `expression1` and `expression2`, going straight to `expression3`.
* there must be at least one indented expression following an `if` condition. Here we have chosen to have two but you can have as many as you want.

Let's look at an example:

In [None]:
# Instruction: run this cell to see the if statement in Python
HEAVY = 100.
TALL = 1.8
weight = 105.
if weight > HEAVY: # notice the colon :
    print('is heavy', end='')  # notice indentation by 4 spaces
    print(' (as weight', weight, 'kg is greater than', HEAVY, 'kg)')
height = 1.05 # this statement happens regardless of the condition
if height > TALL:
    print('is tall', end='')
    print(' (as height', height, 'm is greater than', TALL, 'm)')
print(height*weight)

A natural extension of the `if` statement is `else:`. This is used as follows:
```
if condition:
    expression1
    expression2
else:
    expression3
    expression4
expression5
```
Notice that 
* `else` is followed by a colon
* if `condition` is True the script will run `expression1`, followed by `expression2`, before going on to `expression5`.
* but if `condition` is False the script will run `expression3`, followed by `expression4`, before going on to `expression5`.
* Once again there must be at least one indented expression following an `else:`


In [None]:
# Instruction: run this cell to see the if .. else statement in Python
HEAVY = 100.
weight = 78.
if weight >= HEAVY:
    is_heavy = True
else:
    is_heavy = False
print('is_heavy: ' + str(is_heavy))
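
Because a comparison already evaluates to `True` or `False`, the same result can also be stored directly, without an `if ... else` block. A minimal sketch using the same values as the cell above:

```python
HEAVY = 100.
weight = 78.
# the comparison itself produces the boolean value
is_heavy = weight >= HEAVY
print('is_heavy: ' + str(is_heavy))
```

The `if ... else` form is still the one to use when the two branches need to do more than set a single boolean.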

In [None]:
# Instruction: write Python expressions for the tests below

numb_chromosomes = 34
base_a = 'A'
base_b = 'A'
base_c = 'U' 
animal_a = 'spider'
mammalian_a = False
animal_b = 'kangaroo'
mammalian_b = True

# if numb_chromosomes is less than or equal to 50 print out
# 'low chromosome number' otherwise print 'high chromosome number'
### your lines here! 

# if base_a is equal to base_b print out
# 'base a and b same' otherwise print 'base a and b different'
### your lines here! 

# if base_a is equal to base_c print out
# 'base a and c same' otherwise print 'base a and c different'
### your lines here! 

# if mammalian_a is True print out '***** is a mammal'
# otherwise print out '***** is NOT a mammal' where ***** is the
# animal name
### your lines here! 

The `elif` construction takes this further. It is used as follows:
```
if condition1:
    expression1
elif condition2:
    expression2
else:
    expression3
expression4
```
* Once again notice that the colon : is required at the end of the `elif`
* if `condition1` is True then `expression1` is evaluated.
* but if `condition1` is False then `condition2` is evaluated and if this is True then `expression2` is invoked.
* if none of the conditions is True the `else:` takes effect and `expression3` occurs.
* you can have as many elif tests as you wish.
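
As an illustration of the pattern above, the sketch below uses a made-up variable `base` and an illustrative label `nucleic_acid` (neither is part of the exercises). It relies on the fact that T normally occurs only in DNA and U only in RNA, while A, C and G occur in both:

```python
base = 'U'  # change this line to try other bases

if base == 'T':
    nucleic_acid = 'DNA'
elif base == 'U':  # only tested when the first condition is False
    nucleic_acid = 'RNA'
else:  # reached when neither condition is True
    nucleic_acid = 'DNA or RNA'

print('base ' + base + ' suggests ' + nucleic_acid)
```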

if, elif, else constructions can be useful in categorizing data.

As an example, suppose we have a dataset with a float `e_level` representing expression level. We want to divide the data into three classes:

| `e_level`           | `e_class`           |
| :-----------------: | :-----------------: |
| `e_level` ≤ 25      | `'low expression'`  |
| 25 < `e_level` ≤ 75 | `'mid expression'`  |
| `e_level` > 75      | `'high expression'` |


In [None]:
# Instruction: write Python expressions to classify expression level 
# according to table above
LOW_EXPRESS = 25.
HIGH_EXPRESS = 75.

# change next line to test your code
e_level = 55.

# Instruction: modify the next line to classify the data
e_class = 'needs to be defined'

print('expression level {}: classification "{}"'.format(e_level, e_class))

**Note**: instead of hard coding particular values in the `if elif else` block we have set constants `LOW_EXPRESS` and `HIGH_EXPRESS`. By convention, constants are Python variables written in upper case to indicate that they should not be changed.

## Combining booleans with the operators `and`, `or` & `not`

Revise the video *Boolean Operators* (you can skip the NumPy section) from Chapter 3 Logic, Control Flow and Filtering, DataCamp online course: Intermediate Python for Data Science.

The `and` and `or` operators are useful for combining comparison operators.

For instance:

In [None]:
# Instruction: run this cell to see how and and or operators can be used in Python
HEAVY = 90.
TALL = 1.8
for _ in range(5):  # prompt 5 times
    weight = float(input('Enter your weight (in kg): '))
    height = float(input('Enter your height (in m): '))
    if weight > HEAVY and height > TALL: 
        print('both heavy and tall')
    if weight > HEAVY or height > TALL:
        print('either heavy or tall.')
    else:
        print('not heavy or tall')

Q: Can you change the previous code so as to not print "either heavy or tall" for people who are both heavy and tall?

A particular use of `and` and `or` operators is to test whether variables are in a numeric range.

For instance, to test whether `n_chromosomes` is in the range 10 to 40 (inclusive):

In [None]:
# Instruction: run this cell to see how to test for variable in numeric range
for _ in range(5):  # prompt 5 times
    n_chromosomes = float(input('Enter number of chromosomes: '))
    if n_chromosomes >= 10 and n_chromosomes <= 40:
        print('n_chromosomes is in the range 10 to 40 (inclusive)')
    else:
        print('n_chromosomes NOT in the range 10 to 40 (inclusive)')
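
Python also allows comparisons to be chained, so the same range test can be written in the form used in mathematics. A short sketch (here `n_chromosomes` is set to a fixed value rather than read with `input()`, so that the code runs without prompting; `in_range` is an illustrative name):

```python
n_chromosomes = 34
# 10 <= n_chromosomes <= 40 means the same as
# n_chromosomes >= 10 and n_chromosomes <= 40
in_range = 10 <= n_chromosomes <= 40
print('in the range 10 to 40 (inclusive):', in_range)
```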

The `not` operator is useful for inverting a logical value, for instance:

In [None]:
is_tall = False
if not is_tall:
    print('short')

In [None]:
age = 21.25
fish = False
mammalian = True
# Instruction: write a Python expression, wrapped in a print function, to check whether:

# * age is greater than 18.0 and less than 65.
### your line here! 

# * age is less than 18.0 or greater than 65.
### your line here! 

# * either fish or mammalian are True
### your line here! 

# * both fish and mammalian are False
### your line here! 

# * age is greater than 21. and mammalian is True
### your line here! 

## `for` loops


You should have already worked through 
* First part of  *Chapter 4 Loops*
  stopping before the video "Looping Data Structures, part 1",

from the
[DataCamp online course: Intermediate Python for Data Science](https://www.datacamp.com/courses/intermediate-python-for-data-science). Make sure you look at the *for loop* video.

We will then use what you have learnt there to explore how biological
data can be handled in Python.

The for loop can be used to go through lists:

In [None]:
# Instruction: run this cell to see for loops working through lists
n_snips = [7, 8, 23, 21]
for n_snip in n_snips:
    print(n_snip)
for species in ['human', 'mouse', 'whale', 'cabbage']: # note hard coded lists
    print(species)

It can also be used to go through the characters in a string:

In [None]:
# Instruction: run this cell to see a for loop working through a sequence
peptide = 'CNCALRCDEC'
for aa in peptide:
    print(aa)

As you have seen in the video if you want to have access to the index while going through a list you can use the enumerate function:

In [None]:
# Instruction: run this cell to see enumerate working
peptide = 'CNCALRCDEC'
for index, aa in enumerate(peptide):
    print(index, aa)

If you want to do an operation a given number of times, use the `range` function:

In [None]:
# Instruction: run this cell to see range used for counting
for i in range(5):
    print(i)

This leads to a poor way of working through a list or string, that you might sometimes see:

In [None]:
# Instruction: poor way to work through a sequence
sequence = 'GCGCGGATGCGAG'
for i in range(len(sequence)):
    base = sequence[i]
    print(i, base)

The enumerate function is clearer than this.
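
Positions in biological sequences are usually counted from 1 rather than 0. `enumerate` accepts an optional `start` argument for this. A small sketch with the same sequence (the list `positions` is only there to record the numbers used):

```python
sequence = 'GCGCGGATGCGAG'
positions = []
# start=1 makes the counter begin at 1 instead of 0
for position, base in enumerate(sequence, start=1):
    positions.append(position)
    print(position, base)
```

Note that `position` is then no longer a valid index into the string: `sequence[position - 1]` gives the matching base.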

The `for` loop can be combined with conditionals to count things, for instance:

In [None]:
# Instruction: run this cell to see a for loop counting the Cs in a sequence
peptide = 'CNCALRCDEC'
n_c = 0
n_total = 0
for aa in peptide:
    n_total = n_total + 1
    if aa == 'C':
        n_c += 1
print('sequence length ' + str(n_total) + 
      ' and contains ' + str(n_c) + ' Cs')

Now it is your turn. 

In [None]:
# Instruction: use a for loop to work out percentage GC in: 
sequence = 'GCGCGGATGCGAGTTTT'
### your code here

You already know another way to find the % GC content for the sequence using the `.count()` method and `len()` function. Does it produce the same answer?

In [None]:
# Instruction: use .count()/len() to work out percentage GC in: 
sequence = 'GCGCGGATGCGAGTTTT'
### your code here

Programmers spend more of their time reading existing code than writing new code. Which version of the % GC procedure would be easier to understand?

## Homework C: Conditional tests and loops
Now go on and use the conditional tests and loops explored here to complete [homework C](./ex_C_homework.ipynb).