# ReCAP Python 

In this notebook you'll find a recap of the most important Python programming essentials. 

## Variable types, and assigning values to variables

Assign a value to a variable, like this:

In [None]:
answer = 42
type(answer)

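Note that `type()` is a built-in function. Python has a relatively small number of built-in functions, such as `len()` and `open()`. (In Python 2, `print` is a statement rather than a function; it became the built-in function `print()` in Python 3.)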

<a href=https://en.wikipedia.org/wiki/42_(number)#The_Hitchhiker.27s_Guide_to_the_Galaxy>42</a> , really....? 🙄

To verify that the value of the variable `answer` is indeed 42:

In [None]:
answer

In [None]:
GC_perc = 36.5
type(GC_perc)

In [None]:
DNA = 'ATGCAG'
type(DNA)

In [None]:
DNA = u'AGGCAG'
type(DNA)

OK, this last one was just for good measure: the `u` prefix marks a Unicode string in Python 2. In Python 3 every text string is already Unicode, so no explicit conversion is needed.

## Python operations
Python has a complete set of mathematical and logical operators, such as `+, -, *, **, /, //` and many more. Although these operations seem completely straightforward, they do have some context-dependent behaviour.

In [None]:
# let's create a few variables
my_int1 = 4
my_int2 = 5
my_float = 4.0
my_string = 'abc'


In [None]:
# addition
my_int1 + my_int2

In [None]:
# addition; What is different from the previous addition?
my_int2 + my_float

In [None]:
# multiplication
my_int1 * my_int2

In [None]:
# exponentiation
my_int1 ** my_int2

In [None]:
# division; note the Python 2-specific behaviour: dividing two ints truncates, so 5 / 4 gives 1
my_int2 / my_int1

In [None]:
# division; 
my_int2 / my_float

In [None]:
# floor division
my_int2 // my_float

In [None]:
# multiplication; string context
my_string * my_int1
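
The context dependence of division can be sidestepped by being explicit. The following is a minimal sketch (the variable names are illustrative and not part of the notebook) that should give the same results under Python 2 and Python 3: converting one operand to a float forces true division, while `//` always floors.

```python
# Illustrative names, chosen only for this sketch.
numerator = 5
denominator = 4

true_quotient = float(numerator) / denominator   # true division: 1.25
floor_quotient = numerator // denominator        # floor division: 1
remainder = numerator % denominator              # remainder: 1

# Floor division rounds towards minus infinity, not towards zero.
negative_floor = -numerator // denominator       # -2
```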


## Objects, such as variables, have built-in methods

The number of built-in functions in Python is limited because the preferred way of doing things with objects is to use object-specific methods. This has a few great advantages. One of the most important: whatever kind of object you are working with, whether a simple one such as a string or something more complex, you can just ask it: "What can I do with you?" The built-in `dir()` function answers that question.

In [None]:
DNA = 'ATGCAG'
dir(DNA)

In [None]:
DNA = 'ATGCAG'
DNA.lower()

In [None]:
DNA

In [None]:
DNA.count('A')
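
The output of `dir()` also lists many internal names that start with an underscore. As a small sketch, the list can be filtered down to the public methods; the variable name `public_methods` is just an illustration.

```python
DNA = 'ATGCAG'

# Keep only the names that do not start with an underscore.
public_methods = [name for name in dir(DNA) if not name.startswith('_')]

# 'count' and 'lower', used above, are among them.
has_count = 'count' in public_methods
has_lower = 'lower' in public_methods
```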

## Printing strings, variables


In [None]:
DNA = 'ATGCAGA'
GC_perc = 42.9
print DNA
print GC_perc

In [None]:
print 'The DNA sequence %s has GC content %f' % (DNA, GC_perc)

In [None]:
print 'The DNA sequence %s has GC content %.1f' % (DNA, GC_perc)
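
The `%` operator is not the only way to build such a string. As an alternative sketch, the string method `.format()` does the same job; the form used here works in Python 2.7 and Python 3, and the name `message` is only illustrative.

```python
DNA = 'ATGCAGA'
GC_perc = 42.9

# Each {} is replaced by the next argument; :.1f keeps one decimal place.
message = 'The DNA sequence {} has GC content {:.1f}'.format(DNA, GC_perc)
print(message)
```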

## Indexes and slicing of strings

In [None]:
DNA = 'ATGCAG'
DNA[0]

In [None]:
DNA[0:3]

In [None]:
DNA[2:]

In [None]:
DNA[::-1]

In [None]:
DNA[1] = 'U' # raises a TypeError: strings are 'immutable'!
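
Because a string cannot be changed in place, the usual workaround is to build a new string. A minimal sketch using slicing and concatenation (the name `new_DNA` is illustrative):

```python
DNA = 'ATGCAG'

# Everything before index 1, then the new character, then everything after index 1.
new_DNA = DNA[:1] + 'U' + DNA[2:]

# The original string is left untouched.
print(DNA)
print(new_DNA)
```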

## Lists

In [None]:
DNA = 'ATGCAC'
list(DNA)

In [None]:
DNA_list = list(DNA)
type(DNA_list)

In [None]:
DNA_list[1] = 'U' # This works! lists are 'mutable', i.e. can be changed in-place
DNA = ''.join(DNA_list) # join on empty string. '.join' is a string method
type(DNA)

In [None]:
DNA

In [None]:
codons = ['ATG','CAC']
type(codons)

In [None]:
codons[1]

In [None]:
codons.insert(1,'GGG') # again, lists are 'mutable'
codons

In [None]:
del codons[1]
# This would also work:
# codons = codons[:1] + codons[2:]
# or
# codons.pop(1)
codons

In [None]:
codons[0] = 'AUG'
codons

A few words on tuples: tuples look like lists and behave much like lists, but they are immutable, i.e. they cannot be changed in place.

In [None]:
codons = ('ATG','CAC')
type(codons)

In [None]:
codons[0] = 'AUG' # raises a TypeError: tuples are 'immutable'

## Dictionaries
Of all the built-in types that Python provides (well, of the ones that you are likely to use on a regular basis), dictionaries are probably the most challenging to learn.
The key concept is the lookup nature of a dictionary: each value is stored under a unique key, and you retrieve the value by that key, much like looking up a word in a real dictionary.

In [None]:
codon_table = {'ATG':'M','CAC':'H'}
type(codon_table)

In [None]:
len(codon_table)

In [None]:
codon_table['ATG']

In [None]:
codon_table['TAG'] = '*'
codon_table

In [None]:
len(codon_table)

In [None]:
codon_table.keys()

In [None]:
codon_table.values()

In [None]:
codon_table['UUU'] = 'F'
len(codon_table)
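
Looking up a key that is not in the dictionary raises a `KeyError`. A short sketch of two ways to guard against that: the `in` test and the `.get()` method with a default value (the `'?'` placeholder is an arbitrary choice for this example).

```python
codon_table = {'ATG': 'M', 'CAC': 'H', 'TAG': '*'}

# Membership test: is this codon a key in the table?
known = 'ATG' in codon_table       # True
unknown = 'GGG' in codon_table     # False

# .get() returns the default instead of raising a KeyError.
amino_acid = codon_table.get('GGG', '?')
```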

## Logic and tests
True or False?

In [None]:
DNA = 'AAGT'
len(DNA)

In [None]:
len(DNA) < 4

In [None]:
if len(DNA) < 4:
    print 'DNA strand %s has fewer than 4 bases' % (DNA)

In [None]:
if len(DNA) < 4:
    print 'DNA strand %s has fewer than 4 bases' % (DNA)
else: 
    print 'DNA strand %s has 4 or more bases' % (DNA)
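
Tests can be combined with `and`, `or` and `not`, and further branches can be added with `elif`. A minimal sketch; the length thresholds and the `label` variable are arbitrary choices for illustration.

```python
DNA = 'AAGT'

if len(DNA) < 4:
    label = 'short'
elif len(DNA) >= 4 and len(DNA) < 10:
    label = 'medium'
else:
    label = 'long'

# 'in' tests whether a substring occurs in a string; 'not' inverts the result.
has_start_codon = 'ATG' in DNA
no_start_codon = not has_start_codon
```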

## Loops

In [None]:
DNA = 'ATGCAC'
for nuc in DNA:
    print nuc

In [None]:
for i in range(len(DNA)):
    print DNA[i]

In [None]:
for i in range(len(DNA)):
    print i,DNA[i]

In [None]:
i = 0
while i < len(DNA):
    print DNA[i]
    i += 1

In [None]:
for c in codon_table.keys():
    print "%s: %s" % (c, codon_table[c])

In [None]:
for c in sorted(codon_table.keys()):
    print "%s: %s" % (c, codon_table[c])
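
Loops, slicing and dictionaries combine naturally. As a sketch, the loop below walks through a DNA string three bases at a time and looks up each codon. It uses a tiny three-entry table, not the full genetic code, and the names `protein` and `start` are illustrative.

```python
codon_table = {'ATG': 'M', 'CAC': 'H', 'TAG': '*'}
DNA = 'ATGCACTAG'

protein = ''
# range(start, stop, step) yields the start positions 0, 3 and 6.
for start in range(0, len(DNA), 3):
    codon = DNA[start:start + 3]
    protein = protein + codon_table[codon]

print(protein)
```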

## Reading files

In [None]:
%%bash
ls -l

In [None]:
codon_table = dict()
len(codon_table)

In [None]:
codon_file = open("codon_table.txt", 'r') # creates a filehandle object

In [None]:
for line in codon_file:
    line = line.strip('\n')
    (codon,aa) = line.split('\t')
    codon_table[codon] = aa
    
codon_file.close()
codon_table

In [None]:
codon_file = open("codon_table.txt", 'r')
cf_RNA = open("RNA_codons.txt",'w')
for line in codon_file:
    line = line.strip('\n')
    (codon,aa) = line.split('\t')
    c = codon.replace('T','U') # note, codon is string: immutable, no in-place replacement!
    cf_RNA.write("%s\t%s\n" % (c, aa)) # write to file
    print "%s\t%s" % (c, aa) # print to screen
    
codon_file.close()
cf_RNA.close()
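
A `with` block is a common alternative to calling `.close()` yourself: the file is closed automatically when the block ends, even if an error occurs. The sketch below is self-contained, so it first writes a small two-line example file into a temporary directory; the file name `example_codons.txt` and its contents are made up for illustration.

```python
import os
import tempfile

# Create a small tab-separated example file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'example_codons.txt')
with open(path, 'w') as out:
    out.write('ATG\tM\n')
    out.write('CAC\tH\n')

# Read it back into a dictionary.
example_table = {}
with open(path, 'r') as codon_file:
    for line in codon_file:
        codon, aa = line.strip('\n').split('\t')
        example_table[codon] = aa

# The file handle was closed when the with block ended.
file_is_closed = codon_file.closed

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(tmp_dir)
```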

## Functions

In [None]:
def calc_GC_content(DNA):
    GC = DNA.count("G") + DNA.count("C")
    GC_content = 1.0 * GC/len(DNA) # multiplying by 1.0 forces float division; in Python 2, int / int would truncate
    return GC_content

In [None]:
def transcribe(DNA):
    return DNA.replace('T','U')

In [None]:
# main
DNA = "ATGCACTAG"
GC_content = calc_GC_content(DNA)
RNA = transcribe(DNA)

print '%.2f' % GC_content
print RNA
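
Functions can combine everything covered so far: methods, dictionaries, loops and slicing. As one more sketch in the same style, here is a hypothetical `reverse_complement` function (not part of the original notebook) that assumes the input contains only the upper-case letters A, C, G and T.

```python
def reverse_complement(DNA):
    """Return the reverse complement of an upper-case DNA string."""
    complement = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    result = ''
    # DNA[::-1] walks through the string from the last base to the first.
    for nuc in DNA[::-1]:
        result = result + complement[nuc]
    return result

rev_comp = reverse_complement('ATGCACTAG')
print(rev_comp)
```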

## Importing and using modules

In [None]:
import re
dir(re)

In [None]:
re.sub(r'(\wll)', 'all', 'hello, hello!', 1) # raw string (r'...') keeps the backslash intact; the final 1 replaces only the first match

In [None]:
import random
random.random()

In [None]:
somelist = ['a','b','c','d','e']
random.choice(somelist)

Creating a random DNA string:

In [None]:
bases = ['A','C','G','T']
dna = ''
for i in range(40):
    dna = dna + random.choice(bases)
print dna
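
Two optional refinements, shown as a sketch: `random.seed()` makes the random sequence reproducible, and `''.join()` builds the string in one step. The seed value 1 and the helper name `random_dna` are arbitrary choices for this example.

```python
import random

bases = ['A', 'C', 'G', 'T']

def random_dna(length):
    """Return a random DNA string of the given length."""
    return ''.join(random.choice(bases) for i in range(length))

random.seed(1)           # fix the starting point of the random generator
first = random_dna(40)

random.seed(1)           # same seed, so the same sequence again
second = random_dna(40)
```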

The `sys` module lets a Python script capture command-line arguments: `argv[1]` is the first argument, `argv[2]` the second, and so on. This is similar to the way shell scripts use `$1` and `$2`.

In [None]:
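import re
from sys import argv

pattern = argv[1]   # first command-line argument: the regular expression
filename = argv[2]  # second command-line argument: the file to search

with open(filename) as infile:  # the file is closed automatically
    for line in infile:
        if re.search(pattern, line):
            print line,  # trailing comma: the line already ends with a newline

# Usage, after saving this cell as grep.py:
# python grep.py ATG codon_table.txt
# ATG M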
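
Inside a notebook, `argv` does not hold a pattern and a file name, so the matching logic is easier to try out when it lives in a function. A sketch, with a hypothetical helper `grep_lines` and made-up sample lines:

```python
import re

def grep_lines(pattern, lines):
    """Return the lines that contain a match for the regular expression."""
    matches = []
    for line in lines:
        if re.search(pattern, line):
            matches.append(line)
    return matches

sample_lines = ['ATG\tM\n', 'CAC\tH\n', 'TAG\t*\n']
hits = grep_lines('ATG', sample_lines)
```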