    
<h1><center>Conditional Tests</center></h1>


**Conditionals** are used when you require a program to look at some data and make a decision to either execute or skip a sequence of statements.

<br>

```ruby
if expression:
    statements
```
    

<br>

_**Example 1.**_
 
```ruby
if humans_can_be_inspiring: 
    give humans a chance
```

The condition above is **True**, so `give humans a chance` will execute (run).
<br>
<br>


_**Example 2.**_
```ruby
if all_politicians_can_be_trusted:
    give them your money
```
<br> The condition above is **False**, so `give them your money` will **NOT** execute.
<br> 

               


If the <span style="color:blue">expression</span>  is true, the statements are executed; otherwise, they are skipped.


Code that instructs the computer to scan data and decide whether it satisfies your requirements is called a ***condition***.

A condition produces a ***True*** or ***False*** answer.

For example, all the lines of code <u>**BELOW**</u> use ***operators*** (e.g. `==`, `!=`, `in`, `not in`) that evaluate to either **True** or **False** when the code is executed.


<div class="alert alert-block alert-success">
    
> Some of the most commonly used ***Comparison operators*** are:

1. Greater than and less than:  `>` and `<`
<br>

2. "More than **or** equal to" and "Less than **or** equal to":  `>=` and `<=`
<br>


3. Equals/equivalent to:  `==` _(This will PASS for two clones which are genetically the same)_
<br>

4. Not equal to: `!=`


5. Are two objects **the same**: `is` _(This will FAIL for two clones, even though they are genetically the same, because they are still not the same sample/individual)_
<br>

6. Is a value **in** some data type, such as a string or list: `in`
<br>


> Will the statements below return a **True** or a **False**?

In [None]:
print(3 == 5) # Read as: "3 is equal to/equivalent to 5. True or False?"
print(3 > 5)
print(3 <= 5)
print(len("ATGC") > 5) 
print("GAATTC".count("T") > 1)

print("V" in ["V", "W", "L"])
print("S" not in ["V", "W", "L"])

# Some of the methods you worked with before, actually also worked as `conditions`
print("ATGCTT".startswith("ATG"))
print("ATGCTT".endswith("TTT"))
print("ATGCTT".isupper())
print("ATGCTT".islower())
print(not "ATGCTT".islower())
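The difference between `==` and `is` from the list above can be seen with two lists that hold the same items. This is only a small sketch; the variable names `clone_a`, `clone_b` and `same_individual` are made up for the illustration.

```python
# Two separate lists with identical contents (the "clones")
clone_a = ["A", "T", "G", "C"]
clone_b = ["A", "T", "G", "C"]

# A second name that points to the very same list object as clone_a
same_individual = clone_a

print(clone_a == clone_b)          # True: the contents are equal
print(clone_a is clone_b)          # False: they are two different objects
print(clone_a is same_individual)  # True: both names refer to one object
```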

<div class="alert alert-block alert-danger">

>Note that ***True*** and ***False*** are built-in keywords in Python and not strings. They therefore:
1. can be printed without using quotation marks
2. ***cannot*** be chosen as variable names

In [None]:
print(True)
print(False)

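> In contrast, printing an unassigned name or assigning a value to `True` produces an error, while the lowercase name `true` is just an ordinary variable: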


In [None]:
print(dna_name) # an unassigned, non built-in variable name

In [None]:
True = "string"

In [None]:
true = "string" #Python is case-sensitive

print(true)

<h2><center>if Statements </center></h2>


How to use **if Statements**

> We write the word **if**, followed by a **condition**, and end the first line with a **colon**.


In [None]:
# Note that because the line containing the if statement ends with
# a colon, the body of the if statement is indented

number = 125 # assign the value 125 to the variable `number`

if number < 100:
    print("gene is highly expressed") # indented, so it only runs when the condition is True

# This line is not indented, so it is outside the if statement and always runs
print("the statement was false")

    
> But this is rarely how we would use a conditional. Usually we have too much data to look through manually, so look at the example below.

> We can write a conditional inside a loop that goes through a list but only prints the accession numbers that start with **"a"**.

In [None]:
accessions = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72', 'ck46', 'df01','ap93', 'tc77', 'op30', 'yl01', 'aa42', 'fj11', 'px55', 'am23', 'su87', 'ar64']

for accession in accessions:
    if accession.startswith('a'):
        print(accession)


In [None]:
accessions = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72', 'ck46', 'df01','ap93', 'tc77', 'op30', 'yl01', 'aa42', 'fj11', 'px55', 'am23', 'su87', 'ar64']

for position, accession in enumerate(accessions): # you can use `enumerate` to indicate the position/index of the item
    if accession.startswith('a'):
        print(accession, position + 1) # we add +1 since indexing starts at 0
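
As an alternative to adding 1 by hand, `enumerate` accepts a `start` argument that sets the first counter value. A minimal sketch, using a shortened version of the list above:

```python
accessions = ['ab56', 'bh84', 'hv76', 'ay93']

# start=1 makes the counter begin at 1 instead of 0
for position, accession in enumerate(accessions, start=1):
    if accession.startswith('a'):
        print(accession, position)
```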



> Remember from Loops that the variable name "accession" is user-determined. You could use any other name, such as "acc" or "I_dont_care", but we usually choose a variable name that is meaningful.

> The variable you use takes on the value of each item in the list, sequentially as it moves through the loop. 

> So at the very start of the for loop, when we execute the code, accession == 'ab56'. 

> The second time around, accession == 'bh84'....and at the end of the loop, it takes on the last value in the list, 'ar64'.

> Also, note that here there are two levels of indentation. One for the ***for loop*** and one for the ***if statement***. 

    
> You can use many different conditions and Python can handle any number of indentation levels. However, the general rule of thumb is ***"If you find yourself writing a piece of code that requires more than three levels of indentation, it's generally an indication that that piece of code should be turned into a function."*** This is usually to protect the programmer from confusion and errors.
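
As a rough illustration of that rule of thumb, the filtering logic could be moved into a small function so that the code calling it stays shallow. The function name `accessions_starting_with` is made up for this sketch and is not part of the original material.

```python
def accessions_starting_with(accessions, prefix):
    """Return a list of the accessions that start with the given prefix."""
    matches = []
    for accession in accessions:
        if accession.startswith(prefix):
            matches.append(accession)
    return matches

accessions = ['ab56', 'bh84', 'hv76', 'ay93']

# The loop that uses the function needs only one level of indentation
for accession in accessions_starting_with(accessions, 'a'):
    print(accession)
```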

<h2><center>Nested Loops</center></h2>

> A **nested loop** is a loop within a loop.

> You must be careful to understand how a nested loop is executed by your script

In [None]:
names = ["Homer", "Bart", "Bob"]
activities = ["drinks coffee", "uses his imagination", "fools himself"]

for name in names:
    for activity in activities:
        print(name, activity) # Homer has to finish all the activities before Bart or Bob gets a go
        
    


<h2><center>Homer drinks coffee</center></h2>
<img src="https://media.giphy.com/media/l2Je9NYUAiqQEBOog/giphy.gif" width="300" height="300" />
<br>

<h2><center>Homer uses his imagination</center></h2>
<img src="https://media.giphy.com/media/DK3nPt4gDanRK/giphy.gif" width="300" height="300" />
<br>

<h2><center>Homer fools himself</center></h2>
<img src="https://media.giphy.com/media/4KkSbPnZ5Skec/giphy.gif" width="300" height="300" />

<br>
 <h2><center>Bart does NOT get coffee until Homer has completed his activities</center></h2>
<img src="https://media.giphy.com/media/xT5LMPeYybFm1ryZ6o/giphy.gif" width="300" height="300" />


> So the code above reads:
> 1. Execute the **first `for loop`** for the list called **`names`**
> 2. Let the variable called **`name`** take on the first item in the list **`names`** - which is **Homer**
> 3. Move to the **second `for loop`** for the list called **`activities`**
> 4. Let the variable called **`activity`** take on the first item in the list called **`activities`**, which is **`drinks coffee`**
> 5. <span style="color:red"> **DO NOT GO BACK TO THE FIRST FOR LOOP**</span> and change what is stored in the variable **`name`** from Homer to Bart until Homer has executed everything in the second **`for loop`**
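
The order described in the steps above can be made visible by printing a message each time the outer loop moves on. This is a small sketch with shortened lists; the `order` list is only added here to record the sequence of pairs.

```python
names = ["Homer", "Bart"]
activities = ["drinks coffee", "fools himself"]

order = []  # records each (name, activity) pair in the order it is visited

for name in names:
    print("Outer loop: name is now", name)
    for activity in activities:
        # The inner loop runs through ALL activities before name changes
        print("    Inner loop:", name, activity)
        order.append((name, activity))
```

Every activity is paired with Homer before the outer loop hands over to Bart.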


In [None]:
letters = ["a", "b", "e", "o"]
words = ["bounce", "run", "flight", "car", "roll", "role"]

for letter in letters:
    for word in words:
        if letter in word:
            print (letter, word)

In [None]:
snps = ['rs0016','rs222','rs333','rs1589','rs47847','rs747474']
entries = ['ID;SNP;position;gene', '1;rs111;2015;tp53',' 2;rs222;3069;tp53',' 3;rs33349;85476;tp53',' 4;rs444;102365;tp53']
        
for snp in snps:
    for entry in entries:
        if entry.startswith("ID"):
            continue # skip the entry
        if snp in entry:
            print (snp, entry)
            
            

<div class="alert alert-block alert-danger">

> As previously stated, this is not the best way to do it.

> It's better to split each entry on the semicolon and use `==` to check the SNP column exactly.

> Otherwise you risk an entry like **rs33349** passing the test, because **rs333** is contained in that SNP name.

In [52]:
snps = ['rs0016','rs222','rs333','rs1589','rs47847','rs747474']
entries = ['ID;SNP;position;gene', '1;rs111;2015;tp53',' 2;rs222;3069;tp53',' 3;rs33349;85476;tp53',' 4;rs444;102365;tp53']
# the entry for rs33349 contains the text rs333, but with an exact comparison it no longer passes

for snp in snps:
    for entry in entries:
        
        if entry.startswith("ID"):
            continue # skip the entry
        column = entry.split(";")
        if snp == column[1]:
            print (snp, entry)
        

rs222  2;rs222;3069;tp53
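
To see why the two approaches differ, here is a small sketch that applies both tests to the single entry that causes the problem:

```python
snp = 'rs333'
entry = ' 3;rs33349;85476;tp53'

# Substring test: True, because the text 'rs333' occurs inside 'rs33349'
print(snp in entry)

# Exact test on the SNP column only: False, because 'rs333' is not 'rs33349'
column = entry.split(";")
print(snp == column[1])
```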


<h2><center>else Statement</center></h2>

> Sometimes we also want access to the data that does not satisfy our conditions. For example, print all values that are unique to one file, but print the rest to another file, because you might want to use the other data for another study

> For this we use the ***else clause*** after the end of the body of an "if statement"

```ruby
if expression:
    statements1
else:
    statements2
```

> The else statement ***doesn't have any condition*** of its own – rather, the else statement body is ***executed when*** the ***if statement*** to which it's attached ***is not executed***



In [None]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72', 'ck46', 'df01','ap93', 'tc77', 'op30', 'yl01', 'aa42', 'fj11', 'px55', 'am23', 'su87', 'ar64']
for accession in accs:
    if accession.startswith('a'):
        print(accession)
    else:
        print ("I do not start with 'a': ", accession)

> But usually we would separate what we need into separate files

In [None]:
# Name and open the two files I require
accessions_of_interest = open("accessions_of_interest.txt", "w")
leftover_accessions = open("leftover_accessions.txt", "w")

accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72', 'ck46', 'df01','ap93', 'tc77', 'op30', 'yl01', 'aa42', 'fj11', 'px55', 'am23', 'su87', 'ar64']
for accession in accs:
    if accession.startswith('a'):
        accessions_of_interest.write(accession + "\n") # I need to include the new lines myself
    else:
        leftover_accessions.write(accession + "\n")

# Close both files so that everything is written to disk
accessions_of_interest.close()
leftover_accessions.close()
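
An alternative sketch of the same idea uses a `with` statement, which closes the files automatically when its block ends. The file names are the same as in the cell above and the list is shortened for the example.

```python
accs = ['ab56', 'bh84', 'hv76', 'ay93']

# Both files are closed automatically when the with block ends
with open("accessions_of_interest.txt", "w") as accessions_of_interest, \
     open("leftover_accessions.txt", "w") as leftover_accessions:
    for accession in accs:
        if accession.startswith('a'):
            accessions_of_interest.write(accession + "\n")
        else:
            leftover_accessions.write(accession + "\n")
```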


<div class="alert alert-block alert-danger">

> Notice how there are multiple indentation levels as before, but that the ***if*** and ***else statements*** are at the ***same level***.

<h2><center>elif Statement</center></h2>


> But sometimes you need to test for multiple conditions

> The ***elif*** statement allows us to accomplish just that

```ruby
if expression1:
    statements1
elif expression2:
    statements2
elif expression3:
    statements3
    ( . . . any number of additional elif clauses)
else: (optional)
    statements
```
    
   

The expressions in the `if` clause and **each** of the `elif` clauses are evaluated **in order**, until one is true. <br>The statements belonging to the first expression that is true are executed and the rest of the conditional is skipped. 
<br>If none of the expressions are true and there is an `else` clause, the statements of the `else` clause get executed.
<br> You may have multiple `elif` statements and no `else` statement. 
<br> <span style="color:red"> That is, an `else` statement is **not** required.
    </span>
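
The "in order, until one is true" rule matters when more than one expression could be true. In the sketch below (the variable names and cut-off values are invented for the illustration), the value 250 satisfies both the `if` and the `elif` expression, but only the first matching branch runs.

```python
expression_level = 250

if expression_level > 100:
    label = "high"
elif expression_level > 50:
    # Also true for 250, but never reached because the if branch already ran
    label = "medium"
else:
    label = "low"

print(label)
```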
   

In [None]:
# Name and open the three files I require
accessions_start_a = open("accessions_start_a.txt", "w")
accessions_start_b = open("accessions_start_b.txt", "w")
leftover_accessions = open("leftover_accessions.txt", "w")

accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72', 'ck46', 'df01','ap93', 'tc77', 'op30', 'yl01', 'aa42', 'fj11', 'px55', 'am23', 'su87', 'ar64']
for accession in accs:
    if accession.startswith('a'):
        accessions_start_a.write(accession + "\n") # I need to include the new lines myself
    elif accession.startswith('b'):
        accessions_start_b.write(accession + "\n")
    else:
        leftover_accessions.write(accession + "\n")

# Close all three files so that everything is written to disk
accessions_start_a.close()
accessions_start_b.close()
leftover_accessions.close()

<div class="alert alert-block alert-success">
    
> This is a reminder that the `if`, `elif` and `else` statements are all **directly below** each other

> ***elif*** can be used multiple times and your code will still have just two levels of indentation (one for the "for loop" and one for "if, elif and else")

> The code is also still very readable

<h2><center>while Loops</center></h2>


> A `while loop` repeats an instruction as long as a particular condition is true.

> As soon as the condition is no longer true, it exits the while loop (stops)

> E.g. throw me a piece of candy until there is no more candy in the packet.

> You can use the condition set by the `while loop` to determine when to exit the loop

> A while statement provides ***flexibility*** that you don’t get with a "for statement". 

> But a `for loop` ends automatically once it has gone through all the items in the sequence

> A `while loop` may go on forever and freeze your computer, if you don't tell it when to stop

> When working with the while statement, you must perform ***three tasks***:
1. Create the environment/starting point for the condition (such as setting Sum to 0).
2. State the condition that must be satisfied within the while statement (such as Sum < 5).
3. Update the condition as needed to ensure that the loop eventually ends (such as adding Sum+=1 to the while code block).

In [None]:
Sum = 2


while Sum < 5:
    #if....... 
    print(Sum)
    Sum = Sum + 1


> Why is this not printing 5?

Because when the Sum variable equals 5, the condition reads:

**while 5 < 5:**

This is **False**, because 5 is not less than 5.

So the loop exits without executing anything in its block.

The print statement is inside that block, so 5 is never printed.
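
The candy example mentioned earlier can be written the same way. This is only a sketch; the variable name `candies_in_packet` and the starting value are made up, and the comments mark the three tasks.

```python
candies_in_packet = 3  # 1. create the starting point

while candies_in_packet > 0:  # 2. state the condition
    print("Throwing a piece of candy!")
    candies_in_packet = candies_in_packet - 1  # 3. update so the loop can end

print("Candy left in the packet:", candies_in_packet)
```

If the update line were left out, the condition would stay true and the loop would never stop.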


<h2><center>Complex conditions</center></h2> 


> You may also need to filter based on several conditions.

> Imagine we want to go through our list of accessions and print out only the ones that start with "a" and end with "3".

<div class="alert alert-block alert-success">

> **Boolean operators:** ***and***, ***or*** and ***not***


> Complex conditions using ***and***

> We can use **and** to join two conditions, producing a complex condition that will be true only if ***BOTH*** of the two conditions are ***true***

In [None]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72', 'ck46', 'df01','ap93', 'tc77', 'op30', 'yl01', 'aa42', 'fj11', 'px55', 'am23', 'su87', 'ar64']
for accession in accs:
    if accession.startswith('a') and accession.endswith('3'):
        print(accession)

> Complex conditions using ***or***

> We can also use **or** to join two conditions, producing a complex condition that will be true if ***either*** of the two conditions is ***true***:

In [None]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72', 'ck46', 'df01','ap93', 'tc77', 'op30', 'yl01', 'aa42', 'fj11', 'px55', 'am23', 'su87', 'ar64']
for accession in accs:
    if accession.startswith('a') or accession.startswith('b'):
        print(accession)


> Complex conditions using ***and*** & ***or***

In [None]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72', 'ck46', 'df01','ap93', 'tc77', 'op30', 'yl01', 'aa42', 'fj11', 'px55', 'am23', 'su87', 'ar64']
for accession in accs:
    if (accession.startswith('a') or accession.startswith('b')) and accession.endswith('4'):
        print(accession)

<div class="alert alert-block alert-danger">

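> Note the ***brackets*** around the first part. They make sure that the condition after the ***and*** must be met in addition to either of the two conditions inside the brackets. Without the brackets, ***and*** would be evaluated before ***or***, so the `endswith('4')` test would only apply to accessions starting with "b".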
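
The effect of the brackets can be checked on a shortened list. In this sketch the two result lists (`with_brackets` and `without_brackets`) are names invented for the comparison.

```python
accs = ['ab56', 'bh84', 'ar64', 'bd72']

with_brackets = []
without_brackets = []

for accession in accs:
    # (starts with a OR starts with b) AND ends with 4
    if (accession.startswith('a') or accession.startswith('b')) and accession.endswith('4'):
        with_brackets.append(accession)
    # Read by Python as: starts with a OR (starts with b AND ends with 4)
    if accession.startswith('a') or accession.startswith('b') and accession.endswith('4'):
        without_brackets.append(accession)

print(with_brackets)     # ['bh84', 'ar64']
print(without_brackets)  # ['ab56', 'bh84', 'ar64']
```

Without the brackets, 'ab56' slips through even though it does not end with "4".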



> Negating conditions with ***not*** 

In [None]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72', 'ck46', 'df01','ap93', 'tc77', 'op30', 'yl01', 'aa42', 'fj11', 'px55', 'am23', 'su87', 'ar64']
for accession in accs:
    if accession.startswith('a') and not accession.endswith('6'):
        print(accession)


> Using ***continue*** and ***break***

In [None]:
for x in range(11):
    if x == 2:
        continue # skip 2 and go immediately to the next iteration
    if x == 7:
        break # quit the loop entirely at 7 & don't execute what follows
    print (x)

# This prints 0, 1, 3, 4, 5 and 6: it starts at 0, skips 2 and stops at 7.

In [None]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72', 'ck46', 'df01','ap93', 'tc77', 'op30', 'yl01', 'aa42', 'fj11', 'px55', 'am23', 'su87', 'ar64']
for accession in accs:
    if accession.startswith('a'):
        continue # skip
    if accession.startswith("p"):
        break # stop
        
    print(accession)

<div class="alert alert-block alert-info">
    
Writing a ***function*** that uses ***conditional tests***


> Let's write a function that returns **True** if the AT content of a DNA sequence is more than 0.65 and **False** if it is not

>  Recall that we start a function by first writing the ***logic*** and ***testing*** it

In [None]:
dna = "ATTATCTACTA"

length_dna = len(dna)
a_count = dna.upper().count('A')
t_count = dna.upper().count('T')
at_content = (a_count + t_count) / length_dna

if at_content > 0.65:
    print(True)
else:
    print(False)


> Now turn this into a function by ***naming*** it, adding the proper ***indentation*** and the ***return*** statement

In [None]:
def is_at_rich(dna):
    length = len(dna)
    a_count = dna.upper().count('A')
    t_count = dna.upper().count('T')
    at_content = (a_count + t_count) / length
    if at_content > 0.65:
        return True
    else:
        return False



> Now call the function with **is_at_rich(**dna**)**

In [None]:
print(is_at_rich("ATTATCTACTA"))
print(is_at_rich("CGGCAGCGCT"))


> Or we can just make it more concise 


In [None]:
def is_at_rich(dna):
    length = len(dna)
    a_count = dna.upper().count('A')
    t_count = dna.upper().count('T')
    at_content = (a_count + t_count) / length
    return at_content > 0.65 # returns True if at_content is greater than 0.65, otherwise False

print(is_at_rich("CGGCAGCGCT"))


    
<h2><center>Exercises</center></h2>



In the chapter_6 folder in the exercises download, you'll find a text file called data.csv, containing some made-up data for a number of genes. 
<br>Each line contains the following fields for a single gene in this order: species name, sequence, gene
name, expression level. 
<br>The fields are separated by commas (hence the name of the file – csv stands for Comma Separated Values). 
<br>Think of it as a representation of a table in a spreadsheet – each line is a row, and each field in a line is a column. 
<br>All the exercises for this chapter use the data read from this file.
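
Before working with the whole file, it may help to see how a single line is taken apart. The line below is made up for this sketch and simply follows the field order described above; the real file may differ in detail.

```python
# A made-up line in the format: species name, sequence, gene name, expression level
line = "Drosophila melanogaster,atatatcgcgtaattacga,kdy647,264\n"

line = line.strip()       # remove the newline at the end of the line
column = line.split(",")  # cut the line into its four fields

species = column[0]
sequence = column[1]
gene_name = column[2]
expression = int(column[3])  # convert the text "264" into a number

print(species, gene_name, len(sequence), expression)
```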

<br>

**Exercise 1:** 
<br>_Print out the gene names for all genes belonging to **Drosophila melanogaster** or **Drosophila simulans**._

In [None]:
# gene name is in column 3

#open_file = open("/home/tracey/Desktop/Studygroup/Python/Python_for_Biologists_exercises/data.csv")

open_file = open("data.csv")

for line in open_file:
    line = line.strip()
    column = line.split(",")
    species = column[0]
    gene_name = column[2]
    if species == "Drosophila melanogaster" or species == "Drosophila simulans":
        print (species,gene_name)      

<br>

**Exercise 2:**
<br>**_Length range_**
<br>Print out the gene names for all genes between 90 and 110 bases long.


In [None]:
# sequences/bases are in column 2
#open_file = open("/home/tracey/Desktop/Studygroup/Python/Python_for_Biologists_exercises/data.csv")
open_file = open("data.csv")

for line in open_file:
    column = line.split(",")
    sequence = column[1]
    gene_name = column[2]
    sequence_length = len(sequence)
    if 90  < sequence_length < 110: 
#     if sequence_length  > 90  and sequence_length < 110: 
        print (gene_name, sequence_length)

    

<br>

**Exercise 3**
<br> **_AT content_**
<br>Print out the **gene names** for all genes with **AT content less than 0.5** and **expression levels greater than 200**.

<br>Example of file
<br>###########################################################
<br>Drosophila melanogaster, atatatcgcgtaattacga,	kdy647,	264
<br>###########################################################

In [None]:
open_file = open("/home/tracey/Desktop/Studygroup/Python/Python_for_Biologists_exercises/data.csv")

def AT_content(dna):
    dna = dna.lower()
    sequence_length = len(dna)
    a_content = dna.count("a")
    t_content = dna.count("t")
    at_content = round(((a_content + t_content) / sequence_length), 2)
    return at_content

for line in open_file:
    line = line.strip()
    column = line.split(",")
    gene_names = column [2]
    sequence = column [1]
    expression = int(column[-1])
    if expression > 200 and AT_content(sequence) < 0.5:
        print (gene_names, expression, AT_content(sequence))

# my_dna_content = AT_content("aaattt")

# print(my_dna_content)


<br>

**Exercise 4** (alternative answer to Exercise 3, with **NO function**)

<br>

In [1]:
#open_file = open("/home/tracey/Desktop/Studygroup/Python/Python_for_Biologists_exercises/data.csv")


open_file = open("data.csv")
for line in open_file:
    line = line.strip()
    column = line.split(",")
    gene_names = column[2]
    sequence = column[1].lower()
    expression = int(column[-1])
    sequence_length = len(sequence)
    a_content = sequence.count("a")
    t_content = sequence.count("t")
    at_content = round(((a_content + t_content) / sequence_length), 2)
#     print(at_content, gene_names, expression)
    if at_content < 0.5 and expression > 200:
        print(gene_names, expression, at_content)

<br>

**Exercise 5**
<br> **_Complex condition_**
<br> Print out the gene names for all genes whose name begins with **"k"** or **"h" except**
<br> those belonging to **Drosophila melanogaster**.

In [None]:
#open_file = open("/home/tracey/Desktop/Studygroup/Python/Python_for_Biologists_exercises/data.csv")

open_file = open("data.csv")

for line in open_file:
    line = line.strip()
    column = line.split(",") # split the line first, then pick out the columns
    species = column[0]
    gene_names = column[2]
    if (gene_names.startswith("k") or gene_names.startswith("h")) and species != "Drosophila melanogaster":
        print(species, gene_names)

<br>

**Exercise 6**
<br>**_High low medium_**
<br>For each gene, print out a message giving the gene name and say whether its **AT content** is **high** (greater than 0.65), **low** (less than 0.45) or **medium** (between 0.45 and 0.65).

In [None]:
#open_file = open("/home/tracey/Desktop/Studygroup/Python/Python_for_Biologists_exercises/data.csv")

open_file = open("data.csv")

def AT_content(dna):
    dna = dna.lower()
    sequence_length = len(dna)
    a_content = dna.count("a")
    t_content = dna.count("t")
    at_content = (a_content + t_content) / sequence_length
    return at_content

for line in open_file:
    line = line.strip()
    column = line.split(",") # split the line first, then pick out the columns
    species = column[0]
    gene_names = column[2]
    sequence = column[1]
    expression = int(column[-1])
    if AT_content(sequence) > 0.65:
        print(gene_names, "AT_content is high")
    elif 0.45 <= AT_content(sequence) <= 0.65:
        print(gene_names, "AT_content is medium")
    elif AT_content(sequence) < 0.45:
        print(gene_names, "AT_content is low")
        
        
# saying whether its AT content is high (greater than 0.65), 
#low (less than 0.45) or 
# medium (between 0.45 and 0.65).

<br>

**Exercise 6** (alternative answer)
<br>

In [None]:
# open_file = open("/home/tracey/Desktop/Studygroup/Python/Python_for_Biologists_exercises/data.csv")
open_file = open("data.csv")

def AT_content(dna):
    dna = dna.lower()
    sequence_length = len(dna)
    a_content = dna.count("a")
    t_content = dna.count("t")
    at_content = (a_content + t_content) / sequence_length
    return at_content

for line in open_file:
    line = line.strip()
    column = line.split(",") # split the line first, then pick out the columns
    species = column[0]
    gene_names = column[2]
    sequence = column[1]
    expression = int(column[-1])
    if AT_content(sequence)> 0.65:
        print(gene_names, "AT_content is high")
    elif AT_content(sequence) < 0.45:
        print(gene_names, "AT_content is low")
    else:
        print(gene_names, "AT_content is medium")
        
        
# saying whether its AT content is high (greater than 0.65), 
#low (less than 0.45) or 
# medium (between 0.45 and 0.65).