# Lecture Review 2-23-16

## Python Syntax

Unlike C, Perl, or Java, Python doesn't use curly braces to mark code blocks. Python uses indentation (leading whitespace) instead. It doesn't matter how many spaces or tabs you indent with, as long as every line in a block is indented the same way; PEP 8, Python's style guide, recommends four spaces. You are not going to have fun if you mix tabs and spaces.

* Whitespace is used in `if` statements, `for` and `while` loops, function definitions, and other places.

```python
if 10 % 2 == 0:
    print "Even"
else:
    print "Odd"
```

```
Even
```


* If you have nested code blocks, each block needs to be indented one level deeper than the block that contains it.

* This won't work, because the `if` statement isn't indented inside the `for` loop:

```python
nums = [1, 2, 3, 4]
for nu in nums:
if nu % 2 == 0:
    print "Even"
```

```
IndentationError: expected an indented block (<ipython-input-7-f574e51fd60c>, line 3)
```

* This won't work either, because the `print` isn't indented inside the `if`:

```python
nums = [1, 2, 3, 4]
for nu in nums:
    if nu % 2 == 0:
    print "Even"
```

```
IndentationError: expected an indented block (<ipython-input-8-5897152d9f6b>, line 4)
```

* When everything is properly indented, Python is happy:

```python
nums = [1, 2, 3, 4]
for nu in nums:
    if nu % 2 == 0:
        print "Even"
```

```
Even
Even
```
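If you want to check indentation without running anything, one option is to hand the source text to Python's built-in `compile()` function, which raises `IndentationError` for the mistakes shown above. This is a sketch; the helper name `indentation_ok` is hypothetical, and the examples use an assignment instead of `print` so they parse under both Python 2 and Python 3.

```python
def indentation_ok(source):
    """Return True if `source` compiles, False if it has an indentation error.

    Hypothetical helper for illustration only.
    """
    try:
        # compile() parses the code without running it
        compile(source, '<example>', 'exec')
    except IndentationError:
        return False
    return True


good = "for nu in [1, 2, 3, 4]:\n    if nu % 2 == 0:\n        x = nu\n"
bad_if = "for nu in [1, 2, 3, 4]:\nif nu % 2 == 0:\n    x = nu\n"
bad_body = "for nu in [1, 2, 3, 4]:\n    if nu % 2 == 0:\n    x = nu\n"

print(indentation_ok(good))      # True
print(indentation_ok(bad_if))    # False
print(indentation_ok(bad_body))  # False
```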


## `if` Statements

* `if` statements are used to test whether an expression is true or not, and then act on it.

```python
age = 25
if age >= 18:
    print "You can vote"
```

```
You can vote
```


* `else` statements are used in combination with `if` statements. The `else` block only executes if the condition of the `if` statement it is paired with is false.

```python
age = 17
if age >= 18:
    print "You can vote"
else:
    print "No voting for you"
```

```
No voting for you
```


* `elif` statements are also used in combination with `if` statements. An `elif` acts as a second `if` statement whose block only runs if its own condition is true and the conditions of all the preceding `if` and `elif` statements are false. `elif` stands for else-if.

```python
age = 10
if age >= 18 and age < 21:
    print "You can vote but not drink"
elif age >= 18 and age >= 21:
    print "You can vote and drink"
else:
    print "No voting or drinking"
```

```
No voting or drinking
```


```python
age = 18
if age >= 18 and age < 21:
    print "You can vote but not drink"
elif age >= 18 and age >= 21:
    print "You can vote and drink"
else:
    print "No voting or drinking"
```

```
You can vote but not drink
```


```python
age = 32
if age >= 18 and age < 21:
    print "You can vote but not drink"
elif age >= 18 and age >= 21:
    print "You can vote and drink"
else:
    print "No voting or drinking"
```

```
You can vote and drink
```
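Since anyone who is at least 21 is also at least 18, the `age >= 18 and age >= 21` test can be shortened to `age >= 21`. Python also lets you chain comparisons, so `18 <= age < 21` means the same thing as `age >= 18 and age < 21`. Here is a sketch of the same logic wrapped in a hypothetical `voting_status` function (functions are covered further down) that returns the message instead of printing it:

```python
def voting_status(age):
    """Hypothetical helper: describe what someone of this age may do."""
    # 18 <= age < 21 is a chained comparison: age >= 18 and age < 21
    if 18 <= age < 21:
        return "You can vote but not drink"
    elif age >= 21:
        return "You can vote and drink"
    else:
        return "No voting or drinking"


for age in [10, 18, 32]:
    print(voting_status(age))
```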


## `while` Loops

`while` loops are used to execute code repeatedly for as long as a condition is true. They aren't used as much in Python as their more popular cousin, the `for` loop.

```python
tasks = ['edit paper', 'do homework', 'sleep']
while len(tasks) > 0:
    task_to_do = tasks.pop(0)
    print task_to_do
```

```
edit paper
do homework
sleep
```
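`while` loops earn their keep when you don't know in advance how many times the loop has to run. As a sketch (the function name `halvings` is made up for illustration), this counts how many times a number can be halved with integer division before it gets down to 1:

```python
def halvings(n):
    """Hypothetical helper: count how many times n can be integer-halved before reaching 1."""
    count = 0
    while n > 1:
        n = n // 2   # // is integer (floor) division
        count += 1
    return count


print(halvings(8))   # 3  (8 -> 4 -> 2 -> 1)
print(halvings(10))  # 3  (10 -> 5 -> 2 -> 1)
```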


* `while` loops also have an optional `else` clause, which gets executed if the loop exits normally (that is, because its condition became false):

```python
tasks = ['edit paper', 'do homework', 'sleep']
while len(tasks) > 0:
    task_to_do = tasks.pop(0)
    print task_to_do
else:
    print "All done"
```

```
edit paper
do homework
sleep
All done
```


* So if the `else` clause is executed when a loop exits normally, how does a loop exit abnormally?

## Break and Continue

`break` and `continue` act on loops. `break` dumps you out of the loop entirely, while `continue` skips the rest of the current iteration and jumps to the next one.

```python
tasks = ['edit paper', 'do homework', 'sleep']
while len(tasks) > 0:
    task_to_do = tasks.pop(0)
    if task_to_do == 'do homework':
        print "Too tired"
        continue
    print task_to_do
else:
    print "All done"
```

```
edit paper
Too tired
sleep
All done
```


* While `continue` can be used to skip a single iteration, `break` can be used to jump out of the loop completely. Leaving through `break` is how a loop exits abnormally, so the `else` clause doesn't run:

```python
tasks = ['edit paper', 'do homework', 'sleep']
while len(tasks) > 0:
    task_to_do = tasks.pop(0)
    if task_to_do == 'edit paper':
        print "Need to write paper first"
        break
    print task_to_do
else:
    print "All done"
```

```
Need to write paper first
```
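The `break`/`else` combination makes a handy search pattern: `break` when you find what you're looking for, and let the `else` clause handle the case where the loop ran out without finding it. A sketch, using a hypothetical `find_first_even` helper:

```python
def find_first_even(numbers):
    """Hypothetical helper: return the first even number, or None if there isn't one."""
    i = 0
    while i < len(numbers):
        if numbers[i] % 2 == 0:
            break          # found one: leave the loop and skip the else
        i += 1
    else:
        return None        # loop ended normally, so nothing was found
    return numbers[i]


print(find_first_even([1, 3, 4, 6]))  # 4
print(find_first_even([1, 3, 5]))     # None
```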


## `for` loops

`for` loops are used to execute code once for each item in a sequence, such as a list. The `for`...`in`... syntax is the most common way to write `for` loops in Python:

```python
species = ['Loxodonta africana', 'Callithrix jacchus', 'Rattus norvegicus']
for animal in species:
    print animal
```

```
Loxodonta africana
Callithrix jacchus
Rattus norvegicus
```
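The `for`...`in`... syntax works on more than lists; a string, for example, is looped over one character at a time. A sketch that counts the G and C bases in a DNA sequence (the `gc_count` name and the example sequence are made up for illustration):

```python
def gc_count(sequence):
    """Hypothetical helper: count the G and C characters in a DNA sequence string."""
    count = 0
    for base in sequence:      # one character per iteration
        if base == 'G' or base == 'C':
            count += 1
    return count


print(gc_count("ATGCGCTA"))  # 4
```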


* Just like `while` loops, `for` loops have an optional `else` clause, which runs if the loop finishes without hitting a `break`:

```python
species = ['Loxodonta africana', 'Callithrix jacchus', 'Rattus norvegicus']
for animal in species:
    print animal
else:
    print "No break statement"
```

```
Loxodonta africana
Callithrix jacchus
Rattus norvegicus
No break statement
```
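`break` and `continue` work in `for` loops too, and both can appear in the same loop. A sketch (the `sum_evens_until_negative` name is hypothetical) that adds up the even numbers, skips odd ones with `continue`, and stops at the first negative number with `break`:

```python
def sum_evens_until_negative(numbers):
    """Hypothetical helper: add up the even numbers, stopping at the first negative one."""
    total = 0
    for n in numbers:
        if n < 0:
            break        # leave the loop entirely
        if n % 2 != 0:
            continue     # skip odd numbers and move on to the next item
        total += n
    return total


print(sum_evens_until_negative([2, 3, 4, -1, 6]))  # 6
```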


* And just like `while` loops, `for` loops can contain `break` and `continue` statements, which act exactly the same way as they do in `while` loops.

* The `for`...`in`... syntax isn't limited to lists. You can also use it on dictionaries, where it loops over the keys:

```python
species = {'Loxodonta africana': 12000, 'Callithrix jacchus': 0.5}
for animal in species:
    print animal + '\t' + str(species[animal])
```

```
Loxodonta africana	12000
Callithrix jacchus	0.5
```


* Side note: you need the `str()` around `species[animal]` because the value is a numeric type, and Python won't concatenate a number with a string:

```python
species = {'Loxodonta africana': 12000, 'Callithrix jacchus': 0.5}
for animal in species:
    print animal + '\t' + species[animal]
```

```
TypeError: cannot concatenate 'str' and 'int' objects
```
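Two tools avoid the manual `str()` call: a dictionary's `.items()` method hands you each key and value together, and `str.format` converts numbers to text for you. A sketch (the `format_weights` name is hypothetical); the keys are sorted because plain dictionaries in Python 2 don't keep their items in any particular order:

```python
def format_weights(weights):
    """Hypothetical helper: return one 'name<TAB>weight' line per animal, sorted by name."""
    lines = []
    for animal, weight in sorted(weights.items()):
        # format() converts the number to a string, so no str() is needed
        lines.append('{}\t{}'.format(animal, weight))
    return lines


species = {'Loxodonta africana': 12000, 'Callithrix jacchus': 0.5}
for line in format_weights(species):
    print(line)
```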

## Nested loops

* You can put loops inside loops. If you want to do something to every line in a file, and then do something to every element of every line, you could use a nested loop structure. The inner loop runs all the way through on every pass of the outer loop:

```python
numbers = [1, 2, 3]
for number in numbers:
    for number2 in numbers:
        print number * number2
```

```
1
2
3
2
4
6
3
6
9
```
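Here is a sketch of the "every element of every line" idea from above. To keep it self-contained it takes a list of tab-separated strings rather than an open file (an assumption; looping over a file object gives you its lines the same way), and the `row_sums` name is made up:

```python
def row_sums(lines):
    """Hypothetical helper: sum the tab-separated integers on each line."""
    sums = []
    for line in lines:                          # outer loop: one line at a time
        total = 0
        for field in line.strip().split('\t'):  # inner loop: each element of that line
            total += int(field)
        sums.append(total)
    return sums


print(row_sums(["1\t2\t3", "4\t5\t6"]))  # [6, 15]
```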


## `range` function

Python also has a `range` function that you can use in `for` loops. `range(5)` counts from 0 up to, but not including, 5. If the results look odd, remember that Python starts counting from 0.

```python
for i in range(5):
    print i * 2
```

```
0
2
4
6
8
```


* You can also give `range` an optional starting point. The stopping point is still excluded:

```python
for i in range(2, 5):
    print i * 2
```

```
4
6
8
```


* `range` also takes an optional step argument, which defaults to 1:

```python
for i in range(0, 10, 2):
    print i * 2
```

```
0
4
8
12
16
```
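To see exactly which numbers `range` produces, you can turn it into a list. (In Python 2 `range` already returns a list, so `list()` just copies it; in Python 3 it returns a lazy range object.) The step can also be negative, which counts down, and the stop value is left out in every case:

```python
# range(start, stop, step): start is included, stop is not
first_five = list(range(5))         # [0, 1, 2, 3, 4]
from_two = list(range(2, 5))        # [2, 3, 4]
by_twos = list(range(0, 10, 2))     # [0, 2, 4, 6, 8]
countdown = list(range(10, 0, -3))  # [10, 7, 4, 1]
```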


* You *could* use `range` to iterate through a list while keeping count

```python
species = ['Loxodonta africana', 'Callithrix jacchus', 'Rattus norvegicus']
for i in range(len(species)):
    print str(i) + '\t' + species[i]
```

```
0	Loxodonta africana
1	Callithrix jacchus
2	Rattus norvegicus
```


* This isn't very *Pythonic*. There is a better way to do it using `enumerate`, which loops over a list and gives you the index *and* the element on each iteration:

```python
species = ['Loxodonta africana', 'Callithrix jacchus', 'Rattus norvegicus']
for i, animal in enumerate(species):
    print str(i) + '\t' + animal
```

```
0	Loxodonta africana
1	Callithrix jacchus
2	Rattus norvegicus
```
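`enumerate` also takes an optional start value as its second argument, which is handy when you want numbering to begin at 1 instead of 0. A sketch with a hypothetical `numbered` helper:

```python
def numbered(items):
    """Hypothetical helper: return 'N<TAB>item' strings, numbering from 1."""
    lines = []
    for i, item in enumerate(items, 1):   # second argument: the first index to use
        lines.append(str(i) + '\t' + item)
    return lines


species = ['Loxodonta africana', 'Callithrix jacchus', 'Rattus norvegicus']
for line in numbered(species):
    print(line)
```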


## Functions

Functions are chunks of reusable code. If you ever find yourself copying and pasting code multiple times, ask yourself "Could I turn this into a function?"

The syntax for a function is pretty simple:
```python
def function_name(arguments):
    code_to_execute
    return result
```

If we wanted to create a function that does exponentiation, we could write something like this:

```python
def expon(base, exponent):
    return base ** exponent
```

We can now use our `expon` function in our code:

```python
expon(2, 2)
```

```
4
```

```python
expon(4, 3)
```

```
64
```
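A function can also hand back more than one value by returning a tuple, which the caller can unpack into separate variables; the Gene Expression Matrix script at the end of these notes does this with `parse_genes`. A sketch with a hypothetical `quotient_and_remainder` function (Python's built-in `divmod` does the same job):

```python
def quotient_and_remainder(dividend, divisor):
    """Hypothetical helper: return the integer quotient and the remainder."""
    # Returning two values separated by a comma returns a tuple
    return dividend // divisor, dividend % divisor


q, r = quotient_and_remainder(17, 5)   # unpack the tuple into two names
print(q)  # 3
print(r)  # 2
```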

We could also create a function that says hello:

```python
def greet(name):
    greeting = "Hello " + name
    return greeting
```

```python
greet("Andrew")
```

```
'Hello Andrew'
```

If we are bored with just "Hello", we could refactor our function to take any greeting as an argument

```python
def greet(greeting, name):
    greeting = greeting + " " + name
    return greeting
```

```python
greet("Bonjour", "Francois")
```

```
'Bonjour Francois'
```
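Rather than choosing between the two versions, you can give `greeting` a default value, so callers only pass it when they want something other than "Hello". Parameters with defaults have to come after the ones without, so this hypothetical variant takes `name` first:

```python
def greet(name, greeting="Hello"):
    # greeting falls back to "Hello" when the caller leaves it out
    return greeting + " " + name


print(greet("Andrew"))                        # Hello Andrew
print(greet("Francois", greeting="Bonjour"))  # Bonjour Francois
```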

## Polymorphism

Every value in Python has a type, and this type determines what operations can be performed on the value.
This also means that the same operation can have different results on different types.
For example, `+` on numbers is addition, while `+` on strings is concatenation. This is polymorphism.

This is also why `1+"frog"` doesn't work:

```python
1 + "frog"
```

```
TypeError: unsupported operand type(s) for +: 'int' and 'str'
```

Python checks the types of both operands to decide which operation to perform. There is no `+` defined between an int and a string, so rather than guess whether you meant addition or concatenation, Python raises a `TypeError`.

When we get to Object Oriented Programming, and you start creating your own classes, we will talk about polymorphism some more, and you'll see how it works under the hood.
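Polymorphism also means a single function can work on several types, as long as each type supports the operations the function uses. A sketch with a made-up `double` function, plus one way to fix `1+"frog"` by converting the number first:

```python
def double(value):
    """Hypothetical helper: add a value to itself."""
    # + means addition for numbers and concatenation for strings and lists
    return value + value


print(double(21))       # 42
print(double("frog"))   # frogfrog
print(double([1, 2]))   # [1, 2, 1, 2]

# Make the types match explicitly instead of mixing int and str
print(str(1) + "frog")  # 1frog
```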

## "Real World" Examples 

### Covariate generator

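```python
from __future__ import print_function
import sys

# Usage: python <script> <input file>
# Each input line holds a sample ID and a gender, separated by a tab.
filename = sys.argv[1]

id_output = "ID"
gender_output = "gender"

with open(filename, 'r') as f:
    for line in f:
        # sample_id rather than id, so the built-in id() isn't shadowed
        sample_id, gender = line.strip().split('\t')
        # Keep only the last '-'-separated part of the ID
        sample_id = sample_id.split('-')[-1]
        # Code gender numerically; any other value is passed through unchanged
        if gender == "MALE":
            gender = 0
        elif gender == "FEMALE":
            gender = 1
        id_output += "\t" + sample_id
        gender_output += "\t" + str(gender)

# One row of IDs, then one row of gender codes
print(id_output)
print(gender_output)
```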
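A sketch of the same logic as a function that takes a list of lines instead of a filename, so it can be tried without an input file. It swaps the `if`/`elif` chain for a dictionary lookup and builds each row with `'\t'.join`. The `covariate_rows` name, the `GENDER_CODES` dictionary, and the sample IDs in the example are hypothetical:

```python
# Hypothetical mapping from the gender labels in the input to numeric codes
GENDER_CODES = {"MALE": 0, "FEMALE": 1}


def covariate_rows(lines):
    """Hypothetical helper: return the ID row and the gender row as tab-separated strings."""
    ids = ["ID"]
    genders = ["gender"]
    for line in lines:
        sample_id, gender = line.strip().split('\t')
        ids.append(sample_id.split('-')[-1])
        # .get() falls back to the original text for unrecognized values
        genders.append(str(GENDER_CODES.get(gender, gender)))
    return '\t'.join(ids), '\t'.join(genders)


id_row, gender_row = covariate_rows(["sample-A-0001\tMALE", "sample-B-0002\tFEMALE"])
print(id_row)
print(gender_row)
```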

### Gene Expression Matrix

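```python
from __future__ import print_function
import sys
import glob

def create_header(files):
    """Build the header row: 'ID' followed by one sample ID per file."""
    header = 'ID\t'

    if len(files) == 0:
        # Report the problem on stderr and exit with a non-zero status
        print('No files found in the glob', file=sys.stderr)
        sys.exit(1)

    for filename in files:
        # Assumes the sample ID is the third '-'-separated field of the filename
        sample_id = filename.split('-')[2]
        header = header + sample_id + '\t'
    header = header.rstrip()
    return header

def parse_genes(files):
    """Collect the expression values for every gene across all files.

    Returns a dict mapping gene ID to a tab-separated string of values,
    plus a list of gene IDs in the order they were first seen.
    """
    gene_data = {}
    gene_ids = []
    for counter, data in enumerate(files):
        # 'with' closes each file as soon as it has been read
        with open(data, 'r') as f:
            genes = f.read().split('\n')
        for gene in genes:
            # Skip lines starting with 'gene' or '?', and blank lines
            if not (gene.startswith('gene') or gene.startswith('?')) and gene != '':
                gene_id, value = gene.split('\t')
                orig_id = gene_id
                # Keep only the part of the ID before the first '|'
                gene_id = gene_id.split('|')[0]
                if orig_id in gene_data:
                    # This gene is stored under its full ID
                    gene_data[orig_id] = gene_data[orig_id] + '\t' + value
                elif gene_id in gene_data and counter != 0:
                    gene_data[gene_id] = gene_data[gene_id] + '\t' + value
                elif gene_id in gene_data and counter == 0:
                    # Short ID already taken in the first file: keep the full ID
                    gene_data[orig_id] = value
                    gene_ids.append(orig_id)
                else:
                    gene_data[gene_id] = value
                    gene_ids.append(gene_id)

    return gene_data, gene_ids

def main():
    # The search argument needs to be quoted so the shell doesn't expand it
    search = sys.argv[1]
    files = glob.glob(search)

    output = create_header(files)

    gene_data, gene_ids = parse_genes(files)

    print(output)
    for gene in gene_ids:
        print(gene + '\t' + gene_data[gene])

if __name__ == '__main__':
    main()
```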
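`create_header` builds its row by adding a tab after every ID and then stripping the trailing one. Collecting the fields in a list and joining them with `'\t'.join` avoids that cleanup step. A sketch that keeps the script's assumption that the sample ID is the third `-`-separated field of each filename; the `header_from_filenames` name and the example filenames are made up:

```python
def header_from_filenames(filenames):
    """Hypothetical helper: build 'ID<TAB>sample1<TAB>sample2...' from filenames."""
    fields = ['ID']
    for filename in filenames:
        # Same assumption as create_header: third '-'-separated field is the sample ID
        fields.append(filename.split('-')[2])
    return '\t'.join(fields)


header = header_from_filenames(["proj-x-S01-counts.txt", "proj-x-S02-counts.txt"])
print(header)
```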