# Making decisions (if, else and elif)

One essential part of any programming language is "conditional statements": the ability to choose when and how to execute a block of code. In Python, the `if` statement is responsible for this:

In [1]:
```python
pval = 0.02
if pval < 0.05:
    print("It's significant")
```

```
It's significant
```


You can also add an `else` block to do something when the condition is not true:

In [2]:
```python
pval = 0.07
if pval < 0.05:
    print("It's significant")
else:
    print("It's not significant")
```

```
It's not significant
```
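
The title of this section also mentions `elif` ("else if"), which lets you test several conditions in turn; Python runs only the first branch whose condition is true. Here is a minimal sketch. The 0.01 cutoff, the label wording and the `describe_pval` function name are illustrative assumptions, not part of the analysis above:

```python
def describe_pval(pval):
    """Classify a p-value using illustrative (assumed) cutoffs."""
    if pval < 0.01:
        return "highly significant"
    elif pval < 0.05:
        # only reached when pval >= 0.01
        return "significant"
    else:
        return "not significant"

for pval in [0.002, 0.02, 0.07]:
    print(pval, describe_pval(pval))
```

Because the branches are checked in order, 0.002 is labelled "highly significant" and never reaches the second test.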


As before, Software Carpentry has a [nice lesson on conditionals](https://swcarpentry.github.io/python-novice-gapminder/17-conditionals/), so we'll focus on biological examples here.

Conditional statements are often used inside a for loop to treat each element differently. Let's use the sequencing data as an example. The IDs of these sequences are random letters, so let's imagine we were interested only in reads whose ID starts with a lowercase letter. We can use the `islower()` string method to test whether the first letter of the ID is lowercase, then print only those IDs:

In [3]:
```python
from Bio import SeqIO
reads = SeqIO.parse("../first_task/reads.fq", "fastq")
for seq in reads:
    first_letter = seq.id[0]
    if first_letter.islower():
        print(seq.id)
```

```
agoPnkEt
vMOiXKcw
vIBqXDxb
igcJNbKn
hbVLHucP
```


Using `else`, we could print all the IDs, noting whether each one starts with a lowercase letter:

In [4]:
```python
from Bio import SeqIO
reads = SeqIO.parse("../first_task/reads.fq", "fastq")
for seq in reads:
    first_letter = seq.id[0]
    if first_letter.islower():
        print(seq.id, "starts with lowercase")
    else:
        print(seq.id, "starts with uppercase")
```

```
agoPnkEt starts with lowercase
vMOiXKcw starts with lowercase
LISzqTNF starts with uppercase
vIBqXDxb starts with lowercase
QevuyjfB starts with uppercase
WdkVXRjQ starts with uppercase
igcJNbKn starts with lowercase
ZFbfxWsl starts with uppercase
hbVLHucP starts with lowercase
JpRwMsVW starts with uppercase
```
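
One subtlety: `else` catches everything that is not lowercase, not just uppercase letters. `str.islower()` also returns `False` for digits and punctuation, so an ID beginning with a number would be reported as "starts with uppercase". The IDs here are all letters, so this doesn't bite, but the sketch below shows how `elif` with `isupper()` can separate the three cases. The helper `first_letter_case` and the IDs `"9abc"` and `"_xyz"` are made up for illustration:

```python
def first_letter_case(seq_id):
    """Describe the first character of a read ID (hypothetical helper)."""
    first_letter = seq_id[0]
    if first_letter.islower():
        return "lowercase"
    elif first_letter.isupper():
        return "uppercase"
    else:
        # digits, punctuation, etc. are neither lower- nor uppercase
        return "not a letter"

# two IDs from the output above plus two made-up ones
for seq_id in ["agoPnkEt", "LISzqTNF", "9abc", "_xyz"]:
    print(seq_id, first_letter_case(seq_id))
```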


## Counting a condition

One very useful way to use conditionals is to count the number of times something occurs. In Epichloe, we are interested in the GC-content of sequences, so how might we find the GC-content of the reference sequence in this repo? A `SeqRecord` object is iterable (i.e. we can use a for loop to traverse it), so we can loop over the sequence and add 1 to a counter every time we see a "C" or "G" base.

In [5]:
```python
ref = SeqIO.read("../first_task/reference.fna", "fasta")
# initialize the counter before the loop so there is something to refer to later
counter = 0

for base in ref:
    if base == "C":
        counter = counter + 1
    elif base == "G":  # elif means "else if"; see the Software Carpentry lesson
        counter = counter + 1

print(counter, "total G/C bases(", counter/len(ref) * 100, "% )")
```

```
72 total G/C bases( 72.0 % )
```
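
The loop above only counts uppercase `"C"` and `"G"`. Some FASTA files use lowercase letters (for example, for soft-masked repeats), and those bases would be silently skipped. A minimal sketch on a made-up string shows one way to guard against that: convert each base to uppercase and test membership with `in`, so a single condition covers both bases. The `gc_content` function is a hypothetical helper, not part of Biopython:

```python
def gc_content(seq):
    """Percentage of G/C bases in seq, ignoring case (hypothetical helper)."""
    counter = 0
    for base in seq:
        # upper() makes "c"/"g" count too; the tuple test replaces the if/elif pair
        if base.upper() in ("C", "G"):
            counter += 1
    return counter / len(seq) * 100

example = "ATGCgcAT"  # made-up sequence with some lowercase bases
print(gc_content(example))
```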


### Extra for experts

The code above works, but there are a few tricks we can use to make it a bit nicer. For one, the special operator `+=` increments (that is, adds to) a number. There is also a nicer way to format strings than the print statement at the end: an f-string, which fills in the values written inside curly braces. If you really want to go deep, can you re-implement this with `collections.Counter`?

In [6]:
```python
counter = 0

for base in ref:
    if base == "C":
        counter += 1
    elif base == "G":
        counter += 1

print(f'{counter} total G/C bases ({counter/len(ref) * 100}%)')
```

```
72 total G/C bases (72.0%)
```
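
For the `collections.Counter` challenge, one possible answer is sketched below on a short made-up sequence; with the real data you could pass `ref.seq` instead. `Counter` tallies every base in a single pass, and looking up a base that never occurred returns 0, so no `if` is needed:

```python
from collections import Counter

seq = "GGCATCGCGA"  # made-up example; substitute your own sequence
base_counts = Counter(seq)
# missing keys count as zero, so this works even if a base is absent
gc = base_counts["G"] + base_counts["C"]
print(f'{gc} total G/C bases ({gc / len(seq) * 100}%)')
```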


## Finding the index of a condition

Sometimes you want to know _which_ elements in a list or sequence meet some criterion, so you can match them up to indices in another list or object. Imagine you had two lists, one with labels for four samples and another with the sizes of those samples _in the same order_:

In [7]:
```python
samples = ["A", "B", "C", "D"]
sizes =   [100, 125, 300,  40]
```

How might you select the samples with size > 100? One classic way is to initialize a counter that you increment at every step of the for loop:

In [8]:
```python
i = 0
for sample_size in sizes:
    if sample_size > 100:
        print(samples[i], ":", sample_size)
    i += 1
```

```
B : 125
C : 300
```


That works, but there are a few more "pythonic" ways to get the same result. `enumerate` gives you the counter for free:

In [9]:
```python
for pair in enumerate(sizes):
    print(pair)
```

```
(0, 100)
(1, 125)
(2, 300)
(3, 40)
```


Because iterating over the object produced by `enumerate` gives us tuples of length two, we can give each element its own name in the for loop, separating the names with a comma:

In [10]:
```python
for index, sample_size in enumerate(sizes):
    print(index, ":", sample_size)
```

```
0 : 100
1 : 125
2 : 300
3 : 40
```
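
`enumerate` starts counting at 0 by default, which matches Python's list indices. If you want human-friendly numbering for printing, it also accepts a `start` argument; just remember that those numbers are then no longer valid indices into `samples`. A short sketch:

```python
sizes = [100, 125, 300, 40]

# start=1 numbers the samples 1, 2, 3, 4 instead of 0, 1, 2, 3
for number, sample_size in enumerate(sizes, start=1):
    print("sample", number, ":", sample_size)
```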


Finally, we can use the index to look up the sample name:

In [11]:
```python
for index, sample_size in enumerate(sizes):
    if sample_size > 100:
        print(samples[index], ":", sample_size)
```

```
B : 125
C : 300
```
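
If you want to keep the matching indices rather than just print them, you can collect them in a list and later use them to look up entries in any list that shares the same order. A sketch using a list comprehension, which packs the `for` loop and the `if` test into one expression (the name `big_indices` is just illustrative):

```python
samples = ["A", "B", "C", "D"]
sizes = [100, 125, 300, 40]

# indices of samples with size > 100
big_indices = [index for index, sample_size in enumerate(sizes) if sample_size > 100]
print(big_indices)

# the same indices select the matching sample names
print([samples[index] for index in big_indices])
```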


### Skipping the index 

The even more pythonic thing to do is never keep track of the index in the first place. The function `zip` takes two or more iterables and "zips" them together into tuples, pairing up the elements at the same position:

In [12]:
```python
for pair in zip(samples, sizes):
    print(pair)
```

```
('A', 100)
('B', 125)
('C', 300)
('D', 40)
```


This means you can easily print only the large samples:

In [13]:
```python
for sample, sample_size in zip(samples, sizes):
    if sample_size > 100:
        print(sample, ":", sample_size)
```

```
B : 125
C : 300
```
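
Two things are worth knowing when zipping lists together. First, `zip` stops at the end of the shortest input, so if the lists ever get out of step the extra items are silently dropped (Python 3.10 and later accept `strict=True` to raise an error instead). Second, when the names are unique, `dict(zip(...))` turns the paired lists into a lookup table. A short sketch; `size_of` and `large_samples` are illustrative names:

```python
samples = ["A", "B", "C", "D"]
sizes = [100, 125, 300, 40]

# map each sample name to its size
size_of = dict(zip(samples, sizes))
print(size_of["C"])

# a list comprehension filters the zipped pairs in one line
large_samples = [sample for sample, sample_size in zip(samples, sizes) if sample_size > 100]
print(large_samples)

# zip stops at the shortest input: sample "D" is dropped here
print(list(zip(samples, sizes[:3])))
```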
