[![Py4Life](https://raw.githubusercontent.com/Py4Life/TAU2015/gh-pages/img/Py4Life-logo-small.png)](http://py4life.github.io/TAU2015/)
## Lecture 3 - 16.3.2016     -     Dictionary & Functions
### Last update: 18.2.2016
### Tel-Aviv University / 0411-3122 / Spring 2016

## Previously on Py4Life

- Python
- Variables
- Operators
- strings
- lists
- Flow control: conditions (if,elif,else) and loops (for,while)

## In today's episode
- Dictionaries
- Functions
- Arguments
- Scopes

## Dictionaries
### Reminder - Lists
A list is a data structure used to store collections of elements (int, float, str etc.) in an <u>__ordered__</u> way.

In [1]:
organisms = ['Pan troglodytes', 'Gallus gallus', 'Xenopus laevis', 'Vipera palaestinae']

We access elements of lists by using their _index_:

In [2]:
print(organisms[0])
print(organisms[2])

Pan troglodytes
Xenopus laevis


We can iterate over the elements of a list using a _for_ loop:

In [4]:
for elem in organisms:
    print(elem)

Pan troglodytes
Gallus gallus
Xenopus laevis
Vipera palaestinae


__Dictionaries__ are another data structure used to store collections of elements, but their elements are accessed through a _key_ rather than a position. A key can be any immutable object, such as a string, an integer or a float. Each key is associated with a _value_.

### Format:
- We define a dictionary using `{ }` brackets.
- Each element in the dictionary is a pair of `key : value`
- We separate the elements using `,`

### Defining dictionaries:

In [5]:
organisms_classes = {'Pan troglodytes': 'Mammalia', 'Gallus gallus': 'Aves', 'Xenopus laevis': 'Amphibia', 'Vipera palaestinae': 'Reptilia'}

In this dictionary, the _keys_ are the organisms and the _values_ are the class of each organism. Both are of type `str`.

Another example would be a dictionary representing the number of observations of various species:

In [6]:
observations = {'Equus zebra': 143,
                'Hippopotamus amphibius': 27,
                'Giraffa camelopardalis': 71,
                'Panthera leo': 112}

Here, the keys are of type `str` and the values are of type `int`. Any other combination could be used.
![safari](https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTL-7qevLJk3T6miuV7zzlxGqH0WFyEv8ejFqw7444QMEElRvkblg)
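
To make "any other combination" concrete, here is a minimal sketch with a few key and value types. All names and numbers are made up for illustration, and the `try`/`except` construct used to show the error is not covered in this lecture.

```python
# Keys can be strings, numbers or tuples; values can be any object.
# All names and numbers below are hypothetical.
gene_lengths = {'geneA': 1200, 'geneB': 845}            # str -> int
samples_by_site = {1: ['s1', 's2'], 2: ['s3']}          # int -> list
positions = {('chr1', 1500): 'A', ('chr2', 42): 'G'}    # tuple -> str

print(samples_by_site[1])
print(positions[('chr2', 42)])

# A list is mutable, so Python refuses to use it as a key.
try:
    bad = {['chr1', 1500]: 'A'}
except TypeError as error:
    print('TypeError:', error)
```

The last part should report a `TypeError`, because lists can change after they are created and therefore cannot serve as keys.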

### Accessing dictionary records
Accessing a dictionary record is similar to what we did with lists, only this time we'll call a <u>_key_</u> instead of an _index_:

In [8]:
print(organisms_classes['Pan troglodytes'])
print(organisms_classes['Gallus gallus'])

Mammalia
Aves


We can get the dictionary's keys or values (as list-like objects):

In [50]:
print(organisms_classes.keys())
print(organisms_classes.values())

dict_keys(['Xenopus laevis', 'Vipera palaestinae', 'Pan troglodytes', 'Gallus gallus', 'Danio rerio'])
dict_values(['Amphibia', 'Reptilia', 'Mammals', 'Aves', 'Actinopterygii'])
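
The objects returned by `keys()` and `values()` are not real lists, so they cannot be indexed directly. A small sketch, assuming we want ordinary lists, converts them with `list()`; the shortened `observations` dictionary is redefined here only to keep the example self-contained.

```python
observations = {'Equus zebra': 143, 'Panthera leo': 112}

# Convert the key and value views into regular lists
species = list(observations.keys())
counts = list(observations.values())

print(species)
print(sum(counts))         # total number of observations
print(len(observations))   # number of records in the dictionary
```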


### Changing and adding records
We can change the dictionary by simply assigning a new value to a key.

In [9]:
organisms_classes['Pan troglodytes'] = 'Mammals'
print(organisms_classes['Pan troglodytes'])

Mammals


Similarly, we can use this syntax to add new records: 

In [10]:
organisms_classes['Danio rerio'] = 'Actinopterygii'
print(organisms_classes['Danio rerio'])

Actinopterygii


__Note__: A dictionary may not contain multiple records with the same _key_, but it may contain many keys with the same _value_.
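
The note above can be checked with a short sketch. The wrong class for 'Gallus gallus' and the extra species are included only for illustration.

```python
# Writing the same key twice keeps only the last value.
classes = {'Gallus gallus': 'Reptilia', 'Gallus gallus': 'Aves'}
print(len(classes))
print(classes['Gallus gallus'])

# Different keys may share the same value.
classes['Struthio camelus'] = 'Aves'
print(classes)
```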

### Looping through dictionaries
Remember the __for__ loop and how we used it to loop on lists?

In [11]:
for organism in organisms:
    print(organism)

Pan troglodytes
Gallus gallus
Xenopus laevis
Vipera palaestinae


Well, it also works on dictionaries! The for loop simply iterates over the _keys_ of the dictionary.

In [13]:
organisms_classes = {'Pan troglodytes': 'Mammals', 'Gallus gallus': 'Aves', 'Xenopus laevis': 'Amphibia', 'Vipera palaestinae': 'Reptilia', 'Danio rerio' : 'Actinopterygii'}
for organism in organisms_classes:
    print(organism, 'belongs to the', organisms_classes[organism], 'class.')

Xenopus laevis belongs to the Amphibia class.
Vipera palaestinae belongs to the Reptilia class.
Pan troglodytes belongs to the Mammals class.
Gallus gallus belongs to the Aves class.
Danio rerio belongs to the Actinopterygii class.


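Notice that the items are not printed in their original order. In the Python version used here (3.5), dictionaries do not keep insertion order; since Python 3.7 they do, so newer versions print the items in the order they were added.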
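
As a possible alternative to looking up each value inside the loop, the `items()` method gives the key and the value together, and `sorted()` gives the keys in alphabetical order regardless of the Python version. Neither is used elsewhere in this lecture; this is only a sketch on a shortened dictionary.

```python
organisms_classes = {'Pan troglodytes': 'Mammalia', 'Gallus gallus': 'Aves'}

# items() yields (key, value) pairs, unpacked here into two loop variables
for organism, organism_class in organisms_classes.items():
    print(organism, 'belongs to the', organism_class, 'class.')

# sorted() returns a list of the keys in alphabetical order
for organism in sorted(organisms_classes):
    print(organism)
```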

We can even change values while looping:

In [14]:
for animal in observations:
    if observations[animal] > 50:
        observations[animal] = True
    else:
        observations[animal] = False
print(observations)

{'Panthera leo': True, 'Giraffa camelopardalis': True, 'Equus zebra': True, 'Hippopotamus amphibius': False}
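
Changing values while looping is safe, but adding or removing keys inside the loop makes Python stop with a `RuntimeError`. One common workaround, sketched here with the `del` statement (not covered in this lecture), is to loop over a copy of the keys.

```python
observations = {'Equus zebra': 143,
                'Hippopotamus amphibius': 27,
                'Giraffa camelopardalis': 71,
                'Panthera leo': 112}

# list(observations) is a separate copy of the keys,
# so the dictionary itself can shrink during the loop.
for animal in list(observations):
    if observations[animal] <= 50:
        del observations[animal]

print(observations)
```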


### Is it in the dictionary?
We can check if a __key__ is in the dictionary using the `in` operator, which can also serve as the condition of an _if_ statement:

In [15]:
'Vipera palaestinae' in organisms_classes

True

In [16]:
'Bos taurus' in organisms_classes

False

In [17]:
new_organism = ['Vipera palaestinae', 'Bos taurus']
for organism in new_organism:
    if organism in organisms_classes:
        print(organism, 'belongs to the', organisms_classes[organism], 'class.')
    else:
        print(organism, 'not found in dictionary.')

Vipera palaestinae belongs to the Reptilia class.
Bos taurus not found in dictionary.
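
Another way to handle missing keys, shown here only as a sketch, is the dictionary's `get()` method, which returns a fallback value instead of failing. The fallback string 'unknown' is an arbitrary choice.

```python
organisms_classes = {'Vipera palaestinae': 'Reptilia', 'Gallus gallus': 'Aves'}

# get(key, default) returns the value if the key exists, otherwise the default
print(organisms_classes.get('Vipera palaestinae', 'unknown'))
print(organisms_classes.get('Bos taurus', 'unknown'))

# Without a default, get() returns None for a missing key
print(organisms_classes.get('Bos taurus'))
```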


## <span style="color:blue">Class exercise 3A</span>

1) Create a dictionary with the keys 'Name','Address' and 'Phone', insert your details as values, and use the dictionary to print a sentence such as "My name is James Watson, I live in Cambridge and my phone number is 12345678"

In [19]:
# Create dictionary
details_dict = {'Name': 'James Watson', 'Address': 'Cambridge', 'Phone': '12345678'}

# print sentence
print("My name is",details_dict['Name'],", I live in",details_dict['Address'],", my phone number is",details_dict['Phone'])

My name is James Watson , I live in Cambridge , my phone number is 12345678


2) The code below creates a dictionary (named codon_table) where the keys are codons and the values are the corresponding amino acids. Use the dictionary to translate the codons in the list (named seq_list) and print out the resulting sequence of amino acids.
* Hint: to print without creating a newline, use `print("your print", end='')`

In [20]:
# Create codons dictionary
bases = ['t', 'c', 'a', 'g']
codons = [a+b+c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))

# Sequence list
seq_list = ["atg","caa","ggc","ata","tca","tgg","cga","agg","cct","taa"]

# iterate on list and translate
for codon in seq_list:
    print(codon_table[codon], end='')

MQGISWRRP*

## What's a function?
### In mathematics:
A function is like a _machine_, that (usually) takes a number, performs some mathematical process and returns another number.  
For example, the function $f(x) = 2x + 6$  
When the function takes 3 (that is, x = 3), it returns 2*3 + 6 = 12  
And in Python:

In [21]:
x = 3
y = 2*x + 6
print(y)

12


### In computer science
A function is a piece of code that performs some process. Like the mathematical concept, a function receives _inputs_ and returns _outputs_.  
We _define_ functions with the __def__ command.  
The general syntax is:  

In [22]:
def function_name(input1, input2, input3):
    # some processes
    output = None  # placeholder for the computed result
    return output

In [23]:
def linear1(x):
    y = 2*x + 6
    return y

Once a function is defined, we can call it whenever we need it (i.e. multiple times), with different inputs.

In [24]:
result1 = linear1(3)
print(result1)

12


In [25]:
result2 = linear1(7)
print(result2)

20


In [27]:
for i in range(8):
    print(linear1(i))

6
8
10
12
14
16
18
20


A function may have more than one input, and they can also be other types of variables.  
For example, the following function receives a __list__ of sequences and concatenates a given sequence __string__ to each sequence in the list. It then returns the new list.

In [28]:
def concat_to_sequences(sequence_list, sequence_to_concat):
    new_list = []
    for seq in sequence_list:
        new_list.append(seq + sequence_to_concat)
    return new_list

In [29]:
my_sequences = ['AGTTAGAGTTA', 'TTACCAGTG', 'GGCAACTTTAGG']
new_sequences = concat_to_sequences(my_sequences, 'GGG')
print(my_sequences)
print(new_sequences)

['AGTTAGAGTTA', 'TTACCAGTG', 'GGCAACTTTAGG']
['AGTTAGAGTTAGGG', 'TTACCAGTGGGG', 'GGCAACTTTAGGGGG']


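The names listed in a function's definition are called __parameters__ (or formal variables), and the values passed in when the function is called are its __Arguments__. The two terms are often used interchangeably.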
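
Inputs can be passed by position, as above, or by name. The sketch below repeats `concat_to_sequences` with one change that is not part of the lecture's version: a default value of 'GGG' for the second input, which is an assumption made only to show how defaults behave.

```python
def concat_to_sequences(sequence_list, sequence_to_concat='GGG'):
    # 'GGG' is used whenever the caller gives no second input
    new_list = []
    for seq in sequence_list:
        new_list.append(seq + sequence_to_concat)
    return new_list

my_sequences = ['AGTT', 'TTAC']

# Inputs matched by position
print(concat_to_sequences(my_sequences, 'AAA'))
# Inputs matched by name, so their order does not matter
print(concat_to_sequences(sequence_to_concat='CC', sequence_list=my_sequences))
# Second input omitted, so the default is used
print(concat_to_sequences(my_sequences))
```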

### Why do we need functions?
So why bother? Can't we just write code as we did so far and avoid all that functions mess?  
Functions are good for (at least) three reasons:
* <u>Prevent code duplication</u> - if we perform the same process multiple times, we don't have to write it again every time. We just call the function, which makes the code shorter and more readable and helps avoid errors.
* <u>Modularity</u> - Your code can easily be separated to small components, which can be reused and recombined.
* <u>Abstraction</u> - separating a complex task into smaller and more simple tasks.
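
To illustrate the first point, here is a small sketch in which one function replaces three copies of the same counting loop. The function `gc_content` is a made-up example, not part of the lecture, and it assumes a non-empty, upper-case DNA string.

```python
def gc_content(sequence):
    # Count the G and C bases, then divide by the sequence length.
    # Assumes the sequence is not empty.
    gc_count = 0
    for base in sequence:
        if base == 'G' or base == 'C':
            gc_count += 1
    return gc_count / len(sequence)

# The same process is applied to every sequence with a single call
for seq in ['AGTTAGAGTTA', 'TTACCAGTG', 'GGCAACTTTAGG']:
    print(seq, gc_content(seq))
```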

### A biological example

Now, let's use some of the stuff we've learned to write a function that finds the reverse complement of a given sequence. Let's start by finding the complement.

In [30]:
def complement(sequence):
    transcript_dict = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    complement = ''
    for base in sequence:
        complement += transcript_dict[base]
    return complement

In [31]:
my_dna = 'ACGCTATTAGAGGGCGAGAAGCTAGAGGA'
my_complement = complement(my_dna)
print(my_complement)

TGCGATAATCTCCCGCTCTTCGATCTCCT


Now, let's write another function that reverses a given sequence.

In [32]:
def reverse_sequence(sequence):
    reversed_seq = ''
    seq_as_list = list(sequence)
    for base in reversed(seq_as_list):
        reversed_seq += base
    return reversed_seq

In [33]:
my_reverse_complement = reverse_sequence(my_complement)
print(my_reverse_complement)

TCCTCTAGCTTCTCGCCCTCTAATAGCGT
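
The same result can probably be reached more compactly with string slicing, using a step of -1 to walk the string backwards. The function name `reverse_sequence_sliced` is chosen here only to avoid clashing with the version above.

```python
def reverse_sequence_sliced(sequence):
    # A slice with step -1 goes through the string from end to start
    return sequence[::-1]

my_complement = 'TGCGATAATCTCCCGCTCTTCGATCTCCT'
print(reverse_sequence_sliced(my_complement))
```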


We can call functions _from within_ a function, thereby wrapping the two functions we have in a third function.

In [34]:
def reverse_complement(sequence):
    complement_seq = complement(sequence)
    reverse_complement = reverse_sequence(complement_seq)
    return reverse_complement

In [35]:
print(reverse_complement(my_dna))

TCCTCTAGCTTCTCGCCCTCTAATAGCGT


Functions don't __have__ to return anything. Sometimes they just print stuff to the screen or to a file (next lesson). For example, we can take the function we created above and simply replace 'return' with 'print':

In [36]:
def print_reverse_complement(sequence):
    complement_seq = complement(sequence)
    reverse_complement = reverse_sequence(complement_seq)
    print(reverse_complement)

In [37]:
a = print_reverse_complement(my_dna)
print(a)

TCCTCTAGCTTCTCGCCCTCTAATAGCGT
None


So, what's the difference between __return__ and __print__???  
As the names suggest, while __print__ just prints the output of the function, __return__ returns a value that can be stored within a variable. The difference is especially noticeable when the output is not a string (e.g. list, dictionary etc). Even if the output is a string, __return__ lets you further manipulate the output, while __print__ does not. 

In [38]:
my_reverse_complement = reverse_complement(my_dna)
final_sequence = "ATG" + my_reverse_complement + "TAA"
print(final_sequence)

ATGTCCTCTAGCTTCTCGCCCTCTAATAGCGTTAA
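
The same holds when the output is not a string. In the sketch below, a hypothetical function `count_bases` returns a dictionary, which the calling code can then keep working with. It assumes the sequence contains only upper-case A, C, G and T.

```python
def count_bases(sequence):
    # Start every base at zero, then count each occurrence
    counts = {'A': 0, 'C': 0, 'G': 0, 'T': 0}
    for base in sequence:
        counts[base] += 1
    return counts

base_counts = count_bases('ACGCTATTAGAGG')
print(base_counts)
# The returned dictionary can be used in further calculations
print(base_counts['G'] + base_counts['C'])
```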


### Documenting your functions
It is considered good practice to add documentation to functions you write - what do they do, what's their input and output etc. It becomes very useful once you have lots of code that you want to reuse. If you document your functions, you won't have to read the whole code when you need them again.  
Documenting functions is done by adding a '_docstring_' right under the definition line. It is enclosed by """. For example:

In [39]:
def reverse_complement(sequence):
    """
    Receives a string of DNA sequence and returns a string of its reverse complement
    """
    complement_seq = complement(sequence)
    reverse_complement = reverse_sequence(complement_seq)
    return reverse_complement

You can easily access the documentation of a function using the `help()` command.

In [40]:
help(reverse_complement)

Help on function reverse_complement in module __main__:

reverse_complement(sequence)
    Receives a string of DNA sequence and returns a string of its reverse complement



In [41]:
help(len)

Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.
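
A docstring can span several lines, for example one line for the input and one for the output. The text `help()` displays is also stored on the function itself, in its `__doc__` attribute. The function `gc_count` below is a made-up example.

```python
def gc_count(sequence):
    """
    Receives a string of DNA sequence (upper-case A, C, G, T).
    Returns the number of G and C bases as an int.
    """
    return sequence.count('G') + sequence.count('C')

# The docstring is available as plain text
print(gc_count.__doc__)
print(gc_count('GGCATT'))
```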



## <span style="color:blue">Class exercise 3B</span>

1) Define a function that receives two sequence strings and __returns__ (not prints) the first five bases of the longer sequence. Test your function on the given sequences. Document your function.
* Hint: recall the _len()_ function and string slicing.

In [42]:
# define function
def first_5_longer_sequence(seq1,seq2):
    if len(seq1) > len(seq2):
        return seq1[:5]
    else:
        return seq2[:5]
    
# Test function
sequence1 = "aggtctcggatataggcgcgatattta"
sequence2 = "ttaagccacgcttcggatta"
first_5 = first_5_longer_sequence(sequence1, sequence2)
print(first_5)

aggtc


2) Define a function that receives a sequence string and __returns__ a __list__ of the 1st, 3rd, 5th, 7th bases etc. Test your function on the given sequence.

In [43]:
# define function
def odd_bases(seq):
    odd_bases_list = []
    for i in range(len(seq)):
        if i % 2 == 0:
            odd_bases_list.append(seq[i])    
    return odd_bases_list

# or another option
def odd_bases(seq):
    odd_bases_list = list(seq[::2])   
    return odd_bases_list

# Test function
odd_bases_list = odd_bases("aggtctcggatataggcgcgatattta")
print(odd_bases_list)

['a', 'g', 'c', 'c', 'g', 't', 't', 'g', 'c', 'c', 'a', 'a', 't', 'a']


### Built-in functions

In fact, we've used functions before, without defining them first. For example: print(), type(), int(), len() etc. These functions are provided courtesy of the Python developers. It is strongly advised not to overwrite built-in functions with your own functions. That is, don't do:

In [None]:
def len(lst):
    ...  # your own implementation would go here

Just use another name...  
We can acquire more functions written by others by __importing__ them into our code. We'll do that in the next lesson.
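
To see why reusing a built-in name is risky, here is a small sketch in which the name `len` is reused inside a function, so the damage stays contained. The function `count_items` is hypothetical, and the `try`/`except` construct that catches the error is not covered in this lecture.

```python
def count_items(items):
    len = 5              # inside this function, 'len' now means the number 5
    return len(items)    # fails: a number cannot be called like a function

try:
    count_items(['a', 'b', 'c'])
except TypeError as error:
    print('TypeError:', error)

# Outside the function, the built-in len() still works
print(len(['a', 'b', 'c']))
```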

## Scopes

Assume we have the following function, which calculates the hypotenuse given the two legs of a right triangle. (Remember Pythagoras' theorem?)

In [44]:
def pythagoras(a,b):
    hypo_square = a**2 + b**2
    hypo = hypo_square**0.5

And now we want to run our function on the sides _a_ = 3 and _b_ = 5. So we do:

In [45]:
pythagoras(3,5)
print(hypo)

NameError: name 'hypo' is not defined

__What happened to our result???__  
The answer is _Scope_!  
The variable _hypo_ 'lives' only as long as the function is running. In other words, it exists only within the _scope_ of the function, and so do _a, b_ and _hypo_square_!  
Printing _hypo_ from _within_ the function does work:

In [46]:
def pythagoras(a,b):
    hypo_square = a**2 + b**2
    hypo = hypo_square**0.5
    print(hypo)
pythagoras(3,5)

5.830951894845301


Or even better, we can use the __return__ statement to get the result. Like this:

In [47]:
def pythagoras(a,b):
    hypo_square = a**2 + b**2
    hypo = hypo_square**0.5
    return hypo

result = pythagoras(3,5)
print(result)

5.830951894845301
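
Scope also works in the other direction: a variable created inside a function does not overwrite a variable with the same name outside it. The sketch below assumes a variable named `hypo` already exists outside the function, and uses the sides 3 and 4 to get a round result.

```python
hypo = 'outside'   # a variable defined outside the function

def pythagoras(a, b):
    # This 'hypo' is a separate, local variable
    hypo = (a**2 + b**2)**0.5
    return hypo

result = pythagoras(3, 4)
print(result)   # 5.0
print(hypo)     # still 'outside'
```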


For visualizing, you can try:  
http://pythontutor.com/visualize.html#mode=edit  
Make sure you choose `render all objects on the heap` in the middle selection window.

## Fin
This notebook is part of the _Python Programming for Life Sciences Graduate Students_ course given at Tel-Aviv University, Spring 2016.

The notebook was written using [Python](http://python.org/) 3.5.1 and [IPython](http://ipython.org/) 2.1.0 (download from [ANACONDA](https://www.continuum.io/downloads)).

The code is available at https://github.com//Py4Life/TAU2016/blob/master/lecture3.ipynb.

The notebook can be viewed online at http://nbviewer.ipython.org//Py4Life/TAU2016/Py4Life/blob/master/lecture3.ipynb.

The notebook is also available as a PDF at https://github.com//Py4Life/TAU2016/blob/master/lecture3.pdf?raw=true.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

![Python logo](https://www.python.org/static/community_logos/python-logo.png)