# Tutorial 3: Modules, files and dictionaries

## Returning values from functions

As well as taking input arguments, functions can also **return** information to the part of the program from which they were called.

Test the `add` function below and try out some modified versions:

	def add(num1, num2):
		num3 = num1 + num2
		return num3

	num = add(3, 4)
	print(num)

This prints:

	7
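Here are two variations worth trying. First, a function can return several values at once, which Python packs into a tuple. Second, a function with no `return` statement gives back `None`. The function names below are made up for illustration:

```python
def add_and_multiply(num1, num2):
    """Return both the sum and the product of two numbers."""
    return num1 + num2, num1 * num2

# The returned tuple can be unpacked into two variables
total, product = add_and_multiply(3, 4)
print(total, product)    # 7 12

def no_return(num1, num2):
    num3 = num1 + num2   # calculated but never returned

print(no_return(3, 4))   # None
```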


## Modules

Modules, or libraries, are common to most programming languages, including Perl, C++ and Java.

A module is a collection of code that provides particular functions you can include in your own code. Modules are essentially programs whose functions can be called from your own program.

Python comes with a large standard library of modules. Many more can be installed separately, including Biopython and PyCogent. Modules are loaded with the `import` statement.

One of the most commonly used is the `sys` module, which contains system-specific functionality. It is needed for some features that other languages often provide by default, such as reading command line arguments.
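Here is a minimal sketch of how an imported module is used. It imports the standard `math` and `sys` modules and reaches their contents with `module.name` notation. `math` is used here only as a familiar example:

```python
import math
import sys

# Functions in a module are accessed as module_name.function_name
root = math.sqrt(16)
print(root)                      # 4.0

# sys exposes information about the running interpreter
print(sys.version_info.major)    # the major Python version number
```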


## Command line arguments

When writing a Python script you often want to pass values into it from the command line (command line arguments). In many languages these are available by default, but in Python you need to import the `sys` module:

	import sys

Arguments can then be included when you run the script. For example, if the script is called test.py:

	python test.py arg1 arg2 arg3

To read them into the script, use `sys.argv`. This is a list of strings: `sys.argv[0]` is the script name, and the arguments follow from `sys.argv[1]` onwards:

	import sys

	a1 = sys.argv[1]
	a2 = sys.argv[2]
	a3 = sys.argv[3]

Now the variable `a1` holds the string `"arg1"`, and so on. Every argument is read in as a string, so convert it with `int()` or `float()` before doing arithmetic.

You can avoid writing `sys.` each time by importing `argv` directly:

	from sys import argv

Or import every name from the module:

	from sys import *

Both forms work for any module. With either of them, the rest of the code becomes simply:

	a1 = argv[1]
	a2 = argv[2]
	a3 = argv[3]
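If fewer arguments are given than the script expects, indexing `sys.argv` raises an `IndexError`. One way to guard against this is to check `len()` first. The sketch below uses a hypothetical helper, `read_args`, and a plain list standing in for `sys.argv` so it can run anywhere:

```python
import sys

def read_args(argv):
    """Return the three arguments after the script name, or exit with a usage message."""
    # argv[0] is the script name, so three arguments means len(argv) == 4
    if len(argv) < 4:
        sys.exit("Usage: python test.py arg1 arg2 arg3")
    return argv[1], argv[2], argv[3]

# In a real script you would call read_args(sys.argv); here a list stands in
# for the command line "python test.py 5 hello 2.5"
a1, a2, a3 = read_args(["test.py", "5", "hello", "2.5"])
print(a1, a2, a3)      # 5 hello 2.5
print(int(a1) * 2)     # 10 (the string "5" must be converted before arithmetic)
```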

## Exercise 1

Write a script that takes 2 numbers from the command line, multiplies them together and prints the result. 

The script should pass the numbers to a function `multiply` and print out the returned result.

## File handling

Basic file handling in Python is very straightforward.

To write to a file:

	out_file = open("test.txt","w") # "w" for "write"
	out_file.write("This Text is going to out file\nSome more text\n") 
	out_file.close() 

To read a file: 

	in_file = open("test.txt","r")   # "r" for "read"
	text = in_file.read() 
	in_file.close() 
	print(text)
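As a quick check that writing and reading fit together, this sketch does both in one go. It writes into a fresh temporary directory (an assumption made here so that no existing `test.txt` is overwritten) and confirms that `read()` returns the text unchanged, newlines included:

```python
import os
import tempfile

# Write into a fresh temporary directory so no existing test.txt is overwritten
path = os.path.join(tempfile.mkdtemp(), "test.txt")

out_file = open(path, "w")
out_file.write("This Text is going to out file\nSome more text\n")
out_file.close()

in_file = open(path, "r")
text = in_file.read()
in_file.close()

# read() returns the whole file as one string; repr() makes the newlines visible
print(repr(text))
```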
    
The file can be read one line at a time using a `while` loop:

	in_file = open(filename) # "r" is optional as it is the default mode
	in_line = in_file.readline()
	while in_line != "":
		print(in_line)                # print the current line first...
		in_line = in_file.readline()  # ...then read the next one
	in_file.close()
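Why doesn't a blank line end this loop early? The sketch below writes a small file (a hypothetical `blank.txt` in a temporary directory) whose middle line is blank. It then collects what `readline()` returns on each pass instead of printing it:

```python
import os
import tempfile

# Hypothetical three-line file whose middle line is blank
path = os.path.join(tempfile.mkdtemp(), "blank.txt")
out_file = open(path, "w")
out_file.write("first\n\nthird\n")
out_file.close()

in_file = open(path)
lines_seen = []
in_line = in_file.readline()
while in_line != "":
    lines_seen.append(in_line)   # store the line instead of printing it
    in_line = in_file.readline()
in_file.close()

print(lines_seen)   # ['first\n', '\n', 'third\n']
```

The blank line comes back as `"\n"`, and only the end of the file produces `""`.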

The `while` loop continues until the end of the file is reached, at which point `readline()` returns an empty string. A blank line is not an empty string, because it still contains the newline character `\n`. The loop therefore only stops at the end of the file.

The `readline()` method reads the file one line at a time. An alternative is to read the entire contents at once with `read()`:

	in_file = open(filename) 
	contents = in_file.read() 
	print(contents)

	in_file.close() 

The file contents can also be read into a list of lines with `readlines()` and iterated over in a `for` loop:

	in_file = open(filename) 
	for line in in_file.readlines():
		print(line) 
	in_file.close() 

A file object can also be iterated over directly, one line at a time, so you don't need `readlines()` inside a `for` loop:

	in_file = open(filename) 
	for line in in_file: 
		print(line) 
	in_file.close()

Python provides a string method called `strip()`, which removes whitespace, including newlines, from both ends of a string. Its variants `rstrip()` and `lstrip()` strip only the right or the left end. The `rstrip()` method can be used to remove the trailing newline from each line:

	in_file = open(filename) 
	for line in in_file: 
		strip_line = line.rstrip()
		print(strip_line) 
	in_file.close()


Or more simply:

	in_file = open(filename) 
	for line in in_file: 
		print(line.rstrip())
	in_file.close()


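NOTE: An alternative to `rstrip()` is `line = line[:-1]`, which removes the last character whatever it is. Take care: if the last line of a file has no trailing newline, this removes a real character instead.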
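The differences between these options are easiest to see with `repr()`, which shows whitespace explicitly. This also explains the extra blank lines in the earlier loops: each `line` already ends in `\n`, and `print()` adds another. A small sketch using made-up strings:

```python
line = "  atg  \n"    # made-up line: spaces at both ends plus a newline

print(repr(line.strip()))      # 'atg'
print(repr(line.rstrip()))     # '  atg'
print(repr(line.lstrip()))     # 'atg  \n'
print(repr(line[:-1]))         # '  atg  '  (only the final character removed)

last_line = "ttt"              # final line of a file with no trailing newline
print(repr(last_line[:-1]))    # 'tt'  (a real character is lost)
print(repr(last_line.rstrip()))  # 'ttt'
```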


## Exercise 2

Write a script that opens a text file and prints the contents to screen.

Try using the various options to read the contents of the file.

Modify the script to write the contents of a file to a second file. Do this twice, the first time as an exact copy and the second removing all of the newlines.

You can use the http://teaching.bc.ic.ac.uk/msc/ipython-files/exercisefiles/entamoeba.txt file for this exercise.


## Dictionaries

Dictionaries are like lists, but instead of using numbers as their index they can use almost any immutable value, such as a string or a number. The index is called a **key**, and the element associated with it is called the **value**.

Dictionaries associate a key with a value, for example, associating codons with their amino acids:

	'ttt' => 'F'
	'tta' => 'L'
	etc 

Dictionaries enable this type of data to be stored and handled.

A dictionary is **created** in a similar manner to a list, but with key/value pairs and using curly braces:

	codons = {'ttt':'F', 'tta':'L', 'gga':'G'}    # Note curly braces

Items can also be **added** individually:

	codons['aac'] = 'N'    # Note square brackets

A key can be **searched** for with `in` (the older `has_key()` method was removed in Python 3):

	if 'aac' in codons:
		print(codons['aac'])

All keys or values can be **retrieved** and converted to a list:

	keys = list(codons.keys())
	keys.sort() 
	for x in keys: 
		print(x)
		print(codons[x])  # Prints the value for this key

Or:

	values = list(codons.values())
	for x in values: 
		print(x)


You can use `del` to **remove** a key/value pair from a dictionary. 

	del codons['aac']


For example:

	codons = {'ttt':'F', 'tta':'L', 'gga':'G'}

	keys = list(codons.keys())
	keys.sort()
	for x in keys:
		print(x)
		print(codons[x])

This prints:

	gga
	G
	tta
	L
	ttt
	F
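A more compact alternative to the `keys()`-and-index pattern is to use `in` together with the `get()` and `items()` methods. The sketch below uses the same codon dictionary. The default `'X'` passed to `get()` is an arbitrary choice for illustration:

```python
codons = {'ttt': 'F', 'tta': 'L', 'gga': 'G'}
codons['aac'] = 'N'                  # add a new key/value pair

print('aac' in codons)               # True
print(codons.get('ccc', 'X'))        # X  (key missing, so the default is returned)

# items() gives (key, value) pairs; sorted() orders them by key
for codon, amino_acid in sorted(codons.items()):
    print(codon, amino_acid)

del codons['aac']                    # remove the pair again
print('aac' in codons)               # False
```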


## Exercise 3

Write a program that creates a dictionary to expand a three-letter abbreviation for the months of the year. The key for each entry should be the abbreviation and the value the full name of the month. For example:

	Jan -> January
	Feb -> February
	etc …

Test your dictionary with a for loop that iterates over the keys and prints out the key and value.

Modify your program to convert a string of the form "25th Jan 2016" to the form "25th January 2016".


## Try/Except

Python provides a way to catch errors (exceptions) so that they do not crash the script. For example:

	print("Type -1 to exit")
	number = 1 
	while number != -1: 
		number = int(input("Enter a number: "))
		print("You entered: ", number)

The program prompts for a number and prints it out. It keeps prompting until the exit condition is reached, i.e. until -1 is entered.

`input()` returns the typed text as a string, and `int()` converts it to an integer.

There is a problem with the previous code. If text other than a whole number is entered, for example “2d”, `int()` raises a `ValueError` and the script crashes.

This can be prevented using try/except:

	print("Type -1 to exit")
	number = 1
	while number != -1:
		try:
			number = int(input("Enter a number: "))
			print("You entered: ", number)
		except ValueError:
			print("That was not a number.")

Now when something like “2d” is entered it prints “That was not a number” and continues without exiting.

Several kinds of error can be caught by using more than one `except` clause if necessary.
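Here is a minimal sketch with two `except` clauses, each handling a different error. The function `safe_divide` is hypothetical. It converts two strings to integers and divides them, so it can fail either on the conversion (`ValueError`) or on the division (`ZeroDivisionError`):

```python
def safe_divide(a_text, b_text):
    """Convert two strings to integers and divide them, reporting any error."""
    try:
        return int(a_text) / int(b_text)
    except ValueError:
        print("That was not a number.")
    except ZeroDivisionError:
        print("Cannot divide by zero.")
    return None   # only reached if one of the errors above was caught

print(safe_divide("10", "4"))   # 2.5
print(safe_divide("2d", "4"))   # prints the ValueError message, then None
print(safe_divide("10", "0"))   # prints the ZeroDivisionError message, then None
```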

Test the try/except code below, entering both numbers and text such as “2d”:

	print("Type -1 to exit")
	number = 1
	while number != -1:
		try:
			number = int(input("Enter a number: "))
			print("You entered: ", number)
		except ValueError:
			print("That was not a number.")

## Exercise 4

The objective of this exercise is to write your own sequence translation script.
There are two files required to complete this exercise:

1) A sample nucleotide sequence in FASTA format:

http://teaching.bc.ic.ac.uk/msc/ipython-files/exercisefiles/sequence.txt

2) A file of codon translations:

http://teaching.bc.ic.ac.uk/msc/ipython-files/exercisefiles/codons.txt

The format of the codons file is the codon on one line, the amino acid on the next:

    ttt
    F
    ttc
    F
    tta
    L
    
etc
   
Complete the script below to translate the sequence in the FASTA file **in the first forward reading frame only**. We will use Biopython's `SeqIO.read()` function to load the FASTA file, but you should write your own code to translate the DNA sequence.


	from Bio import SeqIO

	# Load the DNA sequence from "sequence.txt" using Biopython
	fname = "sequence.txt"
	dna = str(SeqIO.read(fname, "fasta").seq)
	print(dna)

	# Initialise a dictionary to store codons
	codons = {}

	# Read in the codons and amino acids from "codons.txt" and store them in
	# the dictionary.

	# Initialise your translation
	trans = ""

	# Work through the sequence with a step size of 3, taking 3 nucleotides
	# each time
	for count in range(0, len(dna), 3):
		# Take the next codon and append its translation to trans
		pass  # replace this line with your code

	# Print out your translation


## Further reading

From Think Python 2nd edition by Allen B. Downey

[Chapter 11](http://www.greenteapress.com/thinkpython2/html/thinkpython2012.html): all about dictionaries.

[Chapter 14](http://www.greenteapress.com/thinkpython2/html/thinkpython2015.html): all about files.




## Further exercise (optional)

The solution to exercise 4 only translates the DNA sequence in 1 frame whereas ideally it should translate in all 6 frames.

*Modify your script* to translate the sequence in all 6 reading frames. You may find it beneficial to use a function in this version.

To translate the reverse strand, you need to reverse the sequence and also complement it (swap 'a' with 't' and 'c' with 'g'). One way to do the complement with `replace()` is to use an intermediate letter:

First convert 'a' to a temporary letter, 'x', then convert 't' to 'a'.
Now you can convert 'x' to 't'.

Without the intermediate 'x', any 'a' converted to 't' would just be converted back again.


	seq = 'attcg'
	print("original    :", seq)
	seq = seq.replace('a', 'x')
	seq = seq.replace('t', 'a')
	seq = seq.replace('x', 't')
	print("after a<->t :", seq)

Now do the same for 'c' and 'g'.
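As an alternative to the intermediate-letter trick (not the approach the exercise asks for, just another option), the built-in `str.maketrans()` and `str.translate()` methods swap all four bases in one step. Slicing with a step of `-1` then reverses the result. This sketch handles lowercase bases only:

```python
seq = 'attcg'

# str.maketrans builds a table mapping each base to its complement,
# so no intermediate letter is needed
complement_table = str.maketrans('acgt', 'tgca')
complement = seq.translate(complement_table)
print("complement        :", complement)   # taagc

# A slice with step -1 reverses the string
rev_comp = complement[::-1]
print("reverse complement:", rev_comp)     # cgaat
```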