## Save your Exercise notebook from last week
Use the Rename command in the File menu.


## Update your Course Materials!

#### For Windows: 
Open the Git Shell icon (<b>not the blue one</b>).
<img src="files/images/gitshell.png"/>
Type:
<code>
cd cbs-python
git checkout -f master
git pull origin master
</code>
#### For Mac and Linux:
Open a terminal. Navigate to the course directory (wherever you placed it):
<br><code>cd ~/Documents/Courses/cbs-python</code>
<br>Now update the folder using<br>
<code>git checkout -f master
git pull origin master</code>
<br><br><br><br><br><br>

# Data types in Python

<div class="topics" style="padding-left: 30px;padding-top: 60px;">
        This lecture will cover:
             <ul>
                <li>Numbers</li>
                <li>Booleans</li>
                <li>Sequences</li>
                <li>Dictionaries</li>
            </ul>
</div>

## Numbers
Like most other computer languages, Python has several types of numbers. Number data types store numeric values. They are immutable data types, which means that changing the value of a number results in a newly created object rather than a modified one.


In [None]:
A = 10
B = 2
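A small sketch of what immutability means in practice: an assignment rebinds a name to a new number object instead of modifying the old one. The lowercase names are illustrative only, and `print` is written with parentheses so the snippet runs under both Python 2 and Python 3.

```python
a = 10
b = a        # b refers to the same number object as a
a += 1       # creates a new object (11) and rebinds the name a to it
print(a)     # 11
print(b)     # 10, b still refers to the original object
```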

<ul style="width: 600px;">
    <li><b>integers:</b> often called just ints, are positive or negative whole numbers with no decimal point. </li>
    <li><b>long:</b> are integers of unlimited size, written like integers and followed by an uppercase or lowercase L</li>
    <li><b>floats:</b> represent real numbers and are written with a decimal point dividing the integer and fractional parts. Floats may also be in scientific notation, with E or e indicating the power of 10 (2.5E2 = 2.5 x 10<sup>2</sup> = 250)</li>
    <li><b>complex:</b> are of the form <cb>a + bJ</cb>, where a and b are floats and J (or j) is the imaginary unit, the square root of -1. Complex numbers are not used much in Python programming.</li>
</ul>


In [None]:

A = 10      # int
B = 20131L  # long
C = 3.14    # float
D = 1 + 3.14j   # complex


## Arithmetic

Python supports all the standard arithmetical operations on numerical types, and mostly uses a similar syntax to several other computer languages:


In [None]:
x = 3.14159
y = 2.71828

x + y # addition
x - y # subtraction
x * y # multiplication
x / y # division
x // y # floored division
x % y # modulus - remainder of x/y
x ** y # exponentiation
pow(x, y) # another way to do exponentiation

print ((x + 1) - y) * 4

You can mix (some) types in arithmetic expressions, and Python will apply rules to decide the type of the result.

In [None]:
13 + 5.0
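As a brief illustration of those rules, the built-in `type()` function shows what an expression evaluates to. The variable name `mixed` is just an example.

```python
mixed = 13 + 5.0                # int + float
print(mixed)                    # 18.0
print(type(mixed) is float)     # True, the int was promoted to a float
print(type(2 * 3) is int)       # True, int with int stays an int
```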

You can force python to use a particular type by casting an expression explicitly, using helpfully named functions: float, int, str etc.

In [None]:
float(3) + float(13)

In [None]:
int(3.14159) + 1

Division in Python sometimes trips up new (and experienced!) programmers. If you divide two integers, you will only get an integer result. If you want a floating point result, you should explicitly cast at least one of the arguments to a float.

In [None]:
print "3/4:", 3/4

In [None]:
print "3.0/4:", float(3)/4
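The integer division described above is the Python 2 behaviour; in Python 3 the `/` operator returns a float even for two integers. The following sketch only uses forms that give the same result in both versions.

```python
print(3 // 4)          # 0, floored division discards the fractional part
print(float(3) / 4)    # 0.75, one operand is a float
print(3 / 4.0)         # 0.75, a float literal works as well
print(7 // 2)          # 3
print(-7 // 2)         # -4, floored division rounds towards negative infinity
print(7 % 2)           # 1, the remainder
```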

There are a few shortcut assignment statements that make modifying a variable in place faster to type.

In [None]:
x = 3
x += 1 # equivalent to x = x + 1
print x

In [None]:
y = 10
y *= x # equivalent to y = y * x
print y

<br><br><br><img src="../pix/play2.jpg">

Calculate the Matthews correlation coefficient (MCC) for the following kinase predictions:<br>
True positives (TP) = 184 <br>
True negatives (TN) = 161 <br>
False positives (FP) = 5 <br>
False negatives (FN) = 13 <br>
<img src="../pix/mcc2.png" width=250px>


## Booleans

Boolean values represent truth or falsehood, as used in logical operations, for example. Not surprisingly, there are only two values, and in Python they are called True and False.

In [None]:
a = True
b = False
print b
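c = ""  # an empty string counts as false in a condition
if c: print "yes"  # so nothing is printed here
    
False, "", 0  # examples of values that count as false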
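To check how a value behaves in a condition, it can be passed to the built-in `bool()` function. This is a minimal sketch with a few arbitrary example values.

```python
print(bool(""))        # False, empty string
print(bool("text"))    # True, non-empty string
print(bool(0))         # False, zero
print(bool(42))        # True, any non-zero number
print(bool([]))        # False, empty list
```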

<br><br><br><img src="../pix/play2.jpg">
Try to understand the following examples of Boolean expressions:

In [None]:
a = 5
b = 6
a == 10

In [None]:
print "Test 1:", a == 5
print "Test 2:", a == 7
print "Test 3:", a == 5 and b == 6
print "Test 4:", a == 5 and b == 5
print "Test 5:", a == 6 or b == 6
print "Test 6:", not (a == 6 and b == 6)
print "Test 7:", not a == 6 and b == 6
print "Test 8:", a == 5 and b > a
print "Test 9:", a == 5 and (not ("testing" == "testing" or "Python" == "Fun"))

<br><br><br>

## Sequences

In Python the word *sequence* refers to an ordered collection of items. We will take a look at the following sequence types:
<ul>
    <li>Strings</li>
    <li>Lists</li>
    <li>Tuples</li>
</ul>

### Strings
Strings are amongst the most popular types in Python. We can create them simply by enclosing characters in quotes. Python treats single quotes the same as double quotes.

In [None]:
print "This is a string"
print 'This is a string'
print '''This is a string'''
print """This is a string"""

In [None]:
print "A single quote (') inside double quotes"
print 'Here we have "double quotes" inside single quotes'

print """Strings are amongst the most popular types in Python. We can create them simply by enclosing characters in quotes. 
Strings are amongst the most popular types in Python."""

<br><br><img src="../pix/play2.jpg"><i>Put the following protein annotation result in a text string:</i>

Phospholipase A2 "basic" or Orotidine-5'-phosphate decarboxylase

In [None]:
annotation = ""

<br><br><br>

#### Modifying strings

You can apply some of the same "math" operations on strings:

In [None]:
A = "ACGTGA"
B = "TATAA"

In [None]:
A + B

In [None]:
B * 5

You can access a single character from a string using brackets containing the *index* of the string character:

In [None]:
A[2]

In [None]:
C = "Python"
C[-1]

<center>
<img src="http://www.python-course.eu/images/string_indices.gif" />
</center>

In [None]:
C[6] # Will give an index error

You can check how long a string is using ```len()```

In [None]:
len(C)

Strings are immutable -- once we have created one, we cannot change it.

In [None]:
A = "acgctAGACGT"
print A.upper() # Returns the uppercase string
print A

In [None]:
print A
A[2] = "W" # Raises a TypeError: strings do not support item assignment

Our string A is unchanged...

To actually change the string we have to make a copy of it. 

In [None]:
B = A.upper()
print "A is", A
print "B is", B

Or reassign the result to the name A to replace the original string.

In [None]:
A = A.upper()
print A
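The following sketch makes the failed assignment visible without stopping the program, and then builds a modified copy instead. It uses `try`/`except` and slicing (slicing is covered further down); the names `seq` and `changed` are illustrative.

```python
seq = "acgctAGACGT"
try:
    seq[2] = "W"                      # strings do not support item assignment
except TypeError as error:
    print("TypeError: " + str(error))

changed = seq[:2] + "W" + seq[3:]     # build a new string around position 2
print(changed)   # acWctAGACGT
print(seq)       # acgctAGACGT, the original is untouched
```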

Some other string manipulations:

In [None]:
B = "a,bunch,of,words"
print B.split(",bu")

In [None]:
B = ["a", "bunch", "of", "words", "in", "a", "list", "object"]
print " ".join(B)
print "_".join(B)
print ",".join(B)
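As a small sketch, `split()` and `join()` can be seen as inverse operations when the same separator is used. The variable name `words` is only an example.

```python
words = "a,bunch,of,words".split(",")
print(words)                                    # ['a', 'bunch', 'of', 'words']
print("-".join(words))                          # a-bunch-of-words
print(",".join(words) == "a,bunch,of,words")    # True, the round trip restores the string
```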



<br><br><img src="../pix/play2.jpg">
Convert the following annotation into a hyphen (-) separated text. (compare split() and split(' '))

In [None]:
annotation = """The status, quality, and expansion of the NIH full-length cDNA
RT   project: the Mammalian Gene Collection (MGC)."""
annotation.split?

<br><br><br>
#### Slicing


The "slice" syntax is a great way to refer to substrings of strings. Slicing consists of two indexes, separated by a colon ":"

<code>
 -6  -5  -4  -3  -2  -1
+---+---+---+---+---+---+
| P | y | t | h | o | n |
+---+---+---+---+---+---+
  0   1   2   3   4   5  
</code>


In [None]:
seq = "Python"
print seq[2:4]
print seq[2:-2]
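A few more slices on the same string may help. The start index is included, the end index is excluded, and either one can be left out. This is a sketch using the indices from the diagram above.

```python
seq = "Python"
print(seq[2:4])     # th, positions 2 and 3 (position 4 is excluded)
print(seq[:2])      # Py, from the beginning
print(seq[2:])      # thon, to the end
print(seq[-2:])     # on, the last two characters
print(seq[:-2])     # Pyth, everything except the last two
print(len(seq[1:4]) == 4 - 1)   # True, a slice has end minus start characters
```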


<br><br><img src="../pix/play2.jpg">

From an Illumina sequencing run, the quality scores indicate that the first 7 bases and the last 15 bases of all sequences are rather bad and need to be trimmed off.
With slicing remove the bad nucleotides from the following sequence.

In [None]:
seq = "TTCAATATTAAGCTTGGCATTTAAAGTCTTTAGGATTGACTGAAACTGTTCAAAAAAGGATAAAAGCTTAAAATCATATTTATCGGAGCCAATTTCTTTA"


<br><br><br>
#### Python string formatting with the % operator

<br>
<code>Your name is John !</code><br><br>
Do you remember last week's irritation about the space between the name and the "!"?

In [None]:
name = "John"
print "Your name is", name, "!"

In [None]:
name = "Thomas"
print "Your name is %s!" % name

In [None]:
age = 26
print "Your name is %s, and you are %d years old!" % (name, age)

In [None]:
from math import pi

print "pi = %.10f" % pi
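The format codes also accept a width and a precision. The sketch below shows a few common combinations, plus the `str.format()` method as an alternative spelling of the same idea; the values are the ones used above.

```python
name = "Thomas"
age = 26
print("%-10s|%5d|" % (name, age))   # Thomas    |   26|  (left- and right-aligned)
print("%.2f" % 3.14159)             # 3.14, two decimals
print("%05.1f" % 2.5)               # 002.5, zero-padded to width 5
print("%d%%" % 95)                  # 95%, a literal percent sign is written %%
print("Your name is {0}, and you are {1} years old!".format(name, age))
```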

<br><br><br>
### Lists

#### Making lists

Lists are like arrays in other programming languages. A list is essentially an ordered collection of data items. Unlike strings (sequences of characters), lists are collections that you can actually change.

Let's say you want to make a list of your movies:

<ul>
    <li>The Meaning of Life</li>
    <li>The Life of Brian</li>
    <li>The Holy Grail</li>
</ul>

In Python, this is done using brackets **[ ]**:

In [None]:
movies = ["The Holy Grail", "The Life of Brian", "The Meaning of Life"]

#### Reading from lists

In [None]:
print movies

We can also use these brackets afterwards to access just a single item in the list:

In [None]:
print movies[0]
print movies[1]
print movies[2]

You can check how many items are in the movie collection using <cb>len()</cb>:

In [None]:
len(movies)

#### Adding more data to the list

If you want to **add** just another item to the list (appending) you can do this by

In [None]:
movies.append("Die Hard")
print movies

To **insert** an item into a specific position of the list you can use

In [None]:
movies.insert(3, "Twilight")
print movies

Appending and inserting only work with a single item. You can also **extend** the list with another list using

In [None]:
newmovies = ["The Dark Knight Rises", "Django Unchained", "The Avengers", 300]
mynewlist = movies.extend?

In [None]:
mynewlist = movies.extend(newmovies) # extend() changes movies in place and returns None

In [None]:
print mynewlist
print movies
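The difference between `append()`, `extend()` and `+` is easy to mix up. The sketch below uses its own small lists (`films` and `more` are illustrative names) so that it does not disturb the `movies` list.

```python
films = ["The Holy Grail", "The Life of Brian"]
more = ["Die Hard", 300]

result = films.extend(more)   # adds each item of more to films, in place
print(result)                 # None, extend() does not return the list
print(len(films))             # 4

films.append(more)            # adds the whole list as one single item
print(len(films))             # 5
print(films[-1])              # ['Die Hard', 300]

combined = ["The Holy Grail"] + more   # + builds a new list instead
print(combined)               # ['The Holy Grail', 'Die Hard', 300]
```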

#### Deleting items from list

Of course you can **remove** an element again from the collection of favorite movies by either <cb>movies.pop(3)</cb> or <cb>movies.remove("Twilight")</cb>

In [None]:
movies.remove("Twilight")
print movies

Using <cb>movies.pop()</cb> without any parameter removes the last element. The function <cb>.pop()</cb> also returns the element it removed:

In [None]:
movies.pop()

In [None]:
print movies

In [None]:
del movies[2] # also works
print movies
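A short sketch comparing the removal methods on a separate example list (`films` is an illustrative name). Note that `remove()` raises a `ValueError` when the item is not in the list.

```python
films = ["The Holy Grail", "Twilight", "The Life of Brian", "Die Hard"]

last = films.pop()            # removes and returns the last item
print(last)                   # Die Hard
second = films.pop(1)         # removes and returns the item at index 1
print(second)                 # Twilight

try:
    films.remove("Twilight")  # already gone, so this raises ValueError
except ValueError:
    print("Twilight is not in the list")

print(films)                  # ['The Holy Grail', 'The Life of Brian']
```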

<img src="../pix/play2.jpg">Make a list that includes four common organisms, such as 'Giant Squid' and 'Killer Rabbit'.<br>
Use the <i>list.index()</i> function to find the index of one organism in your list.<br>
Use the <i>in</i> operator to show that this organism is in your list.<br>
Use the <i>append()</i> function to add a new cute animal to your list.<br>
Use the <i>insert()</i> function to add a new predator at the beginning of the list.<br>
Use a loop to show all the organisms in your list.<br>
(<font size="tiny"><i>adapted from http://introtopython.org/</i></font>)

<br><br><br>

### Tuples

Tuples are like lists - but once you have created one, you can't change it. 

In [None]:
mytuple = (1,2,3,4)

Since they have a **fixed size** you generally use them as structures for some data, like coordinates (x,y). 

**Example**: Say you go for a walk at DTU campus and note your GPS coordinates at any instant in a tuple (x,y) called <cb>current_coordinate</cb>. While walking you record your journey in a list called <cb>journey</cb>, so you can trace your route at a later point.

In [None]:
journey = [] # empty list
while walking:
    current_coordinate = (x,y)
    journey.append(current_coordinate)
# This code won't run, because we don't know the coordinates "x" and "y" and when to stop "walking".
# But it illustrates the idea of tuples used as coordinates and lists used as records.

After a short walk your <cb>journey</cb> would look something like this:

In [None]:
[
   (55.787159, 12.518814),
   (55.787033, 12.519661),
   (55.786761, 12.521485),
   (55.787582, 12.522022),
   (55.787889, 12.520208),
   (55.787159, 12.518814)
];

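This cannot be done the other way around: a tuple could not serve as the <cb>journey</cb> record, because nothing can be appended to a tuple once it has been created.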

Tuples are in general more memory efficient than lists, but accessing items in lists and tuples is equally fast.
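A runnable variant of the walk example might look like the sketch below. The `readings` list is a hypothetical stand-in for the GPS device, filled with the first coordinates from the journey shown above.

```python
# Hypothetical GPS readings standing in for the walk described above.
readings = [(55.787159, 12.518814), (55.787033, 12.519661), (55.786761, 12.521485)]

journey = []                          # a list can grow while we walk
for x, y in readings:                 # each tuple is unpacked into x and y
    current_coordinate = (x, y)
    journey.append(current_coordinate)

latitude, longitude = journey[0]      # unpacking also works on a stored tuple
print(latitude)                       # 55.787159
print(len(journey))                   # 3

try:
    journey[0][0] = 0.0               # tuples do not support item assignment
except TypeError:
    print("a tuple cannot be changed")
```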

## Dictionaries

A dictionary is a special type of data structure in Python. It holds pairs of a Key and a Value. A Value is addressed not by a number, as in lists, but by its Key. (In other languages this data structure is also called a hash table.) You make dictionaries using the <b>{ }</b> brackets:

In [None]:
IUPAC = {'A': 'Ala', 'C': 'Cys', 'E': 'Glu'}
IUPAC['E']

You access each Value using its Key, e.g. <cb>IUPAC['E']</cb>:

In [None]:
print "C stands for the amino acid", IUPAC['C']
aa = 'A'
print aa, "stands for the amino acid", IUPAC[aa]

Another way of creating it is:

In [None]:
IUPAC = dict(A='Ala', C='Cys', E='Glu')
print IUPAC

The dict() function can also be used to convert sequences to dictionaries:

In [None]:
rgb = [('red','ff0000'), ('green','00ff00'), ('blue','0000ff')]
colors = dict(rgb)
colors['green']
#print colors

keys = colors.keys()
keys.sort()
print keys

colors.items()
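Dictionaries can be changed after creation. The sketch below works on a separate dictionary (`palette` is an illustrative name) and also shows `sorted()`, which returns the sorted keys directly. The `keys.sort()` form above relies on `keys()` returning a list, which is the Python 2 behaviour.

```python
palette = dict([('red', 'ff0000'), ('green', '00ff00'), ('blue', '0000ff')])

palette['black'] = '000000'             # add a new key/value pair
palette['red'] = 'FF0000'               # overwrite an existing value
print(palette['red'])                   # FF0000
print(palette.get('pink', 'unknown'))   # unknown, get() avoids a KeyError
print(sorted(palette))                  # ['black', 'blue', 'green', 'red']

del palette['black']                    # remove a pair by its key
print(len(palette))                     # 3
```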

## Running through items in a collection

Python has a very non-typical way of running through items in a collection (items in a list, characters in a string or keys in a dictionary).

It goes like this:

In [None]:
for color in colors:
    print color, colors[color]

The recipe is basically a **for** line ending with a colon. Everything that follows with an **indentation** will be evaluated for each item in the collection.

In [None]:
for color in colors:
    print "The color <%s> is %s" % (colors[color], color)

print "Last color was", color

In [None]:
for key, value in colors.items():
    print key, value
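The same recipe works for strings and lists. The sketch below also uses the built-in `enumerate()`, which hands out the position together with each item; the example data is arbitrary.

```python
for base in "ACG":                      # characters in a string
    print(base)

for index, movie in enumerate(["The Holy Grail", "The Life of Brian"]):
    print("%d: %s" % (index, movie))    # 0: The Holy Grail, then 1: The Life of Brian

total = 0
for number in [1, 2, 3, 4]:             # items in a list
    total += number
print(total)                            # 10
```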

<br><br><br>

## Membership testing

There are two kinds of membership tests in Python

    in

and 

    not in

They are used to test if an item is a member of a collection. For example:

In [None]:
2 in [1,2,3,4]

In [None]:
2 not in [1,3,5,7]

In [None]:
print "green" in colors
print "darkgreen" in colors
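For dictionaries, the test looks at the keys only. This sketch, using a small example dictionary with the illustrative name `rgb_codes`, shows how to test the values as well, and that `in` on a string is a substring test.

```python
rgb_codes = {'red': 'ff0000', 'green': '00ff00'}
print('red' in rgb_codes)                  # True, 'red' is a key
print('ff0000' in rgb_codes)               # False, values are not searched
print('ff0000' in rgb_codes.values())      # True
print("GAC" in "UAGCCGACGUGA")             # True, works for substrings too
```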

In [None]:
"T" in "UAGCCGACGUGA"

## List Comprehension

Instead of defining a list by *enumeration*:

In [None]:
A = [0,1,2,3,4]
[3*x for x in A]

you can assign it by *comprehension*, e.g. 
<center>
<img src="files/images/eq_1.gif" />
</center>

which gives
<center>
<img src="files/images/eq_2.gif" />
</center>


In Python this is written as:

    B = [expression for item in collection]

In [None]:
B = [3*x for x in A]
print B

In [None]:
[IUPAC[x] for x in ["C", "A", "A", "E"]]
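A comprehension can also filter items with a trailing `if`. The sketch below shows this next to the equivalent explicit loop; the names `numbers`, `squares` and `even` are illustrative.

```python
numbers = [0, 1, 2, 3, 4]

squares = [x ** 2 for x in numbers]
print(squares)                         # [0, 1, 4, 9, 16]

even = [x for x in numbers if x % 2 == 0]   # keep only items passing the test
print(even)                            # [0, 2, 4]

# The same list built with an explicit for loop
squares_loop = []
for x in numbers:
    squares_loop.append(x ** 2)
print(squares == squares_loop)         # True
```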

<img src="../pix/play2.jpg">Use the following IUPAC dictionary together with a list comprehension (or a for loop) to translate the ATP synthetase subunit 8 from *Architeuthis dux* into the 3-letter amino acid code sequence separated by '-'.


In [None]:
IUPAC = {'*': 'Ter', 'A': 'Ala', 'C': 'Cys', 'E': 'Glu', 'D': 'Asp',
         'G': 'Gly', 'F': 'Phe', 'I': 'Ile', 'H': 'His', 'K': 'Lys',
         'M': 'Met', 'L': 'Leu', 'N': 'Asn', 'Q': 'Gln', 'P': 'Pro',
         'S': 'Ser', 'R': 'Arg', 'T': 'Thr', 'W': 'Trp', 'V': 'Val',
         'Y': 'Tyr', 'X': 'Xaa'}
ATP8 = "MPQLSPINWLFLFIMFWSIMILNTSIMWWNTNNLYMINKTPKTSSNISYKW*"


<br><br><br>

## Conversion

Conversion between data types in Python is called *type casting*. This is done using functions like 

    int()
    float()
    str()
    list()
    dict()

A common conversion is from string to number:

In [None]:
x = "12.4"
y = "-2"
print x + y

In [None]:
print float(x) + int(y)

Or the other way around

In [None]:
x = 12.4
y = -2

In [None]:
print str(x) + str(y)

Example of using list()

In [None]:
myStr = "ABCDEFG"
print list(myStr)
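Not every conversion succeeds. As a sketch of the common cases, `int()` truncates a float towards zero but refuses a string that contains a decimal point, so such a string has to go through `float()` first.

```python
print(int("42") + 1)           # 43
print(int(float("12.4")))      # 12, the fractional part is dropped
print(int(-2.7))               # -2, truncation is towards zero
print(str(12.4) + "!")         # 12.4!
print(list("ACG"))             # ['A', 'C', 'G']

try:
    int("12.4")                # not a valid integer string
except ValueError:
    print("int() cannot parse '12.4' directly")
```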

<br><br><br>

<img src="../pix/book.gif" width=50px> Required reading for next week: 
* Python for Bioinformatics by S. Bassi - Chapter 4

<br><br><br>
<img src="../pix/exercise.png">
<br><br><br>


In [1]:
from IPython.core.display import HTML


def css_styling():
    styles = open("../styles/custom.css", "r").read()
    return HTML(styles)
css_styling()
