# Bioinformatics Introduction to Python 3  
Adapted from CSE 30 (Fall '19) taught by Luca de Alfaro for a bioinformatics audience. <br>
Copyright Luca de Alfaro, 2018-19.  CC-BY-NC License.

The following is only a brief overview of Python basics. Learning a programming language is like learning a spoken language: there is a specific vocabulary to pick up, and you should expect to improve only through practice and repetition. I encourage everyone to explore online resources, as well as the following pages, for learning Python for bioinformatics.

Additional learning resources:
* [Rosalind](http://rosalind.info/problems/list-view/) - Practice bioinformatics problems
* [Python Documentation](https://docs.python.org/3.8/tutorial/index.html) - Python 3.8 Official Documentation

## Print, and file input/output

In [1]:
# The print() function takes any number of arguments,
# and prints them with intervening spaces:
print("I have", 3, 'chicken')

I have 3 chicken


In [2]:
# We can also use {} and .format() to specify a string with {} holes, 
# which are filled by the arguments of .format(): 
print("I have {} chicken".format(3))
print("{} is divisible by {}".format(10, 2))

I have 3 chicken
10 is divisible by 2


In [3]:
# You can also use formatting options to specify the number of 
# digits to print for floating point numbers, etc. 
print("A gazillion is {:.2f}% less than a bazillion".format(15.67980))

A gazillion is 15.68% less than a bazillion
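
The same formatting can also be written with f-strings (available since Python 3.6), which place the expression directly inside the braces. The following is a small sketch; the variable names `count` and `percent` are made up for illustration.

```python
# f-strings evaluate the expression inside {} and insert the result.
count = 3
percent = 15.67980

line1 = f"I have {count} chicken"
# Format specifiers such as :.2f work the same way as with .format().
line2 = f"A gazillion is {percent:.2f}% less than a bazillion"

print(line1)
print(line2)
```

Both styles produce the same strings as the `.format()` calls above; which one to use is largely a matter of taste.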


## Strings

In [4]:
# Strings can be built using either ' or " as delimiters.
# If you use ', the string can contain " inside, and vice versa.
s = 'A string'
t = "It's nice to be able to choose the delimiters"

In [5]:
# You can use + to concatenate strings:
print(s + " " + t)

A string It's nice to be able to choose the delimiters


In [6]:
# You can split a string according to spaces:
l = t.split()
print(l)

["It's", 'nice', 'to', 'be', 'able', 'to', 'choose', 'the', 'delimiters']


In [7]:
# Or you can split it according to any character:
t.split('a')

["It's nice to be ", 'ble to choose the delimiters']

In [8]:
# You can also put back a string you have split, using .join()
# Yes, it's weird; had I invented the .join operation, I would 
# have defined it as an operation of lists (rather than strings),
# so that one would write l.join(' ') rather than ' '.join(l). 
# But once you learn it, you get used to it. 
' '.join(l)

"It's nice to be able to choose the delimiters"

In [9]:
# A string can also be addressed as if it were a list of its characters,
# using indexing and slicing:
t[10:]

'to be able to choose the delimiters'
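
Slicing is especially handy for sequence data. As a rough sketch (the sequence `dna` below is a made-up example), slices can pick out codons, and an optional third number sets the step:

```python
dna = "ATGGAACTT"  # hypothetical example sequence

first_codon = dna[:3]     # characters 0, 1, 2
last_codon = dna[-3:]     # the last three characters
every_third = dna[::3]    # characters at positions 0, 3, 6
reversed_dna = dna[::-1]  # a step of -1 walks the string backwards

print(first_codon, last_codon, every_third, reversed_dna)
```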

## Conditionals

In [10]:
# You can build boolean expressions with the usual relational operators
# <, <=, >, >=, == (double equal sign tests for equality), 
# and != ("!" represents not)
3 < 4

True
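
Boolean expressions like the one above are mostly used in `if` statements. Here is a minimal sketch of `if` / `elif` / `else`; the value `gc_fraction` and the 0.4 and 0.6 thresholds are arbitrary choices for illustration, not standard cutoffs.

```python
gc_fraction = 0.62  # hypothetical GC content of some sequence

# Exactly one of the three branches runs.
if gc_fraction > 0.6:
    label = "GC-rich"
elif gc_fraction < 0.4:
    label = "AT-rich"
else:
    label = "balanced"

# Conditions can be combined with and, or, and not.
is_extreme = gc_fraction > 0.6 or gc_fraction < 0.4

print(label, is_extreme)
```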

## Integers, floats, strings, booleans, and ... None!

In [11]:
# In python there are numbers, which can be integer or float. 
x = 1 # int
y = 1. # float

In [12]:
# You can add, subtract, and multiply numbers; the result is an integer
# if and only if both operands are integers.
print(x + y)
print(x + 1)

2.0
2


In [13]:
# In Python 3, division between integers generates a float.
# This remedies a long-standing "trap" in Python 2, where 1 / 2 = 0, because 
# division between integers returned an integer: the quotient. 
x/2

0.5

In [14]:
# In Python 3, integer division (quotient) is written //, 
# and remainder is written %
print (7 // 3) 
print (7 % 3)

2
1


In [15]:
# There are also strings in Python.  They can be delimited with either " or '.
s = 'A string'
t = "It's nice to be able to choose the delimiters"
print(t)

It's nice to be able to choose the delimiters


In [16]:
# The other basic data type is booleans.  They are 'True' and 'False'.
b = True
print(not b)

False


In [17]:
# Relational operators, obviously enough, have boolean result:
print(4 < 8)
print(4 == 8)

True
False


In [18]:
# There's a special value in Python that means, no value.  It's called None. 
c = None
print(c)

None


In [19]:
# The operators +, -, *, /, can also be used with the following shorthand:
x = 2
x = x + 1
x += 1 # Same as above
print(x)
x *= 3 
print(x)

4
12


# Data Structures
Lists, tuples, sets, and dictionaries are data structures built into Python. <br>
Official Documentation: https://docs.python.org/3/tutorial/datastructures.html#



## Lists and tuples

### Lists

In [20]:
# Lists are one of the basic data types in Python.
l = ['taa', 'tag', 'tga']
print(l)

# List elements are accessed by index, and indices start at 0:
print(l[0])
print(l[1])

['taa', 'tag', 'tga']
taa
tag


### List slicing

In [21]:
# You can 'slice' (yeah, that's a technical term) the beginning and end of a list:
l = ['taa', 'tag', 'tga', 'uag', 'uaa', 'uga']
l[:3] # Till element 3, excluded

['taa', 'tag', 'tga']

In [22]:
l[3:] # From element 3 onwards

['uag', 'uaa', 'uga']

In [23]:
print(l[1:3]) # From element 1 included, to element 3 excluded

['tag', 'tga']


In [24]:
# If you use negative numbers, they count backwards from the end
# of the list.  It's weird, but very useful.
l[-1] # This is the last element

'uga'

### List operations

In [25]:
# You append an element to a list like so:
l.append('atg')
l

['taa', 'tag', 'tga', 'uag', 'uaa', 'uga', 'atg']

In [26]:
# You can concatenate two lists with +.  This creates a new list; l is unchanged.
l + ['aug']

['taa', 'tag', 'tga', 'uag', 'uaa', 'uga', 'atg', 'aug']

In [27]:
# There are many more list operations.  Among them:
# You can 'pop' (retrieve, and remove) an element in any position:
x = l.pop(3)
print(x)
print(l)

uag
['taa', 'tag', 'tga', 'uaa', 'uga', 'atg']


In [28]:
# You can reverse a list in place (l itself is modified):
l.reverse()
l

['atg', 'uga', 'uaa', 'tga', 'tag', 'taa']

In [29]:
# And you can sort the list (the sort command has options; see Python docs).
l = l + ['aug']
l.sort()
l

['atg', 'aug', 'taa', 'tag', 'tga', 'uaa', 'uga']

In [30]:
# Strings have methods such as .upper() and .lower(), which return
# a new string with the case changed:
print("atg".upper())
print("TAG".lower())

ATG
tag
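
To apply a string method to every element of a list, one common option is a list comprehension. This is a small sketch; the list `codons` is a made-up example and is separate from the list `l` used above.

```python
codons = ['atg', 'taa', 'tag']  # hypothetical example list

# Build a new list by calling .upper() on each element in turn.
upper_codons = [c.upper() for c in codons]

print(upper_codons)
```

The original list is left untouched; the comprehension builds a new one.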


In [31]:
# You can get the length of a string, or a list, with the len() function.
print(len(l))

7


See the [official documentation](https://docs.python.org/3.7/tutorial/datastructures.html) for more list functions.

### Tuples

In [32]:
# Tuples are kind of like lists, except they are immutable.
# Here are two points in 2-D.
p1 = (1., 2.)
p2 = (3.1, 3.2)
# The useful thing with tuples is that they are easy to take apart.
# Whereas a beginner would write
x = p1[0]
y = p1[1]
print (x, y)


1.0 2.0


In [33]:
# anyone with a bit of Python experience would instead write: 
x, y = p1
print(x, y)

1.0 2.0


In [34]:
# Of course, the above works only if the tuple of variables on the left hand side
# is the same length as the tuple on the right hand side!
import traceback
try:
    x, y, z = p2 # p2 has only two elements
except ValueError:
    print(traceback.format_exc())

Traceback (most recent call last):
  File "<ipython-input-34-68d824c166f0>", line 5, in <module>
    x, y, z = p2 # p2 has only two elements
ValueError: not enough values to unpack (expected 3, got 2)



## Sets

In [35]:
# Sets are data structures that represent... sets.  Sets are like lists, 
# except that they cannot have repeated elements.
s = set() # {} would be a dictionary
print(s)

set()


In [36]:
set1 = {'cat', 'dog'}
set2 = {'bird', 'mouse', 'cat'}
set3 = {'dog', 'cat'}

In [37]:
# We can take union, intersection, and difference of sets:
print(set1 | set2) # union
print(set1 & set2) # intersection
print(set1 - set2) # difference

{'cat', 'bird', 'dog', 'mouse'}
{'cat'}
{'dog'}


In [38]:
# Set equality is defined as element-wise equality
# (order does not matter)
set1 == set3

True

In [39]:
# We can add elements to a set... 
set1.add('duck')
print(set1)
set1.add('dog')
print(set1)
# ... and as you can see, sets really have no repeated elements,
# so if you add a dog to a set containing already a dog, 
# nothing changes.

{'cat', 'dog', 'duck'}
{'cat', 'dog', 'duck'}


In [40]:
# And we can test membership using "in", just like for lists.
print('cat' in set1)
print('opossum' in set1)

True
False


In [41]:
# A quick way to remove duplicates from a list is to turn it 
# into a set, then back into a list.  This loses the ordering though,
# as sets do not preserve the order of the elements of the lists
# from which they were created:
l = ['a', 'b', 'c', 'g', 'c', 'd', 'f', 'g']
l_uniq = list(set(l))
l_uniq


['a', 'c', 'b', 'g', 'd', 'f']
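
If the original order matters, one alternative (a sketch, relying on the fact that dictionaries keep insertion order in Python 3.7 and later) is to go through a dictionary instead of a set:

```python
letters = ['a', 'b', 'c', 'g', 'c', 'd', 'f', 'g']

# dict.fromkeys() keeps the first occurrence of each element, in order;
# list() then collects the dictionary keys.
ordered_uniq = list(dict.fromkeys(letters))

print(ordered_uniq)
```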

## Dictionaries

[Dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) are a powerful data structure provided in Python, and they are quite useful in bioinformatics. The classic example is a dictionary translating codons to amino acids.

In [42]:
# Dictionaries in Python are essentially maps from keys to values: each key maps
# to exactly one value, although several keys may share the same value.
# Or if you are in CS, they are like hash tables.  In fact, it turns out they are hash tables.
# Except you don't need to worry about their implementation. 
# Enough said, let's define one.
codon_count = {'atg': 4, 'taa': 0, 'tag': 2, 'tga': 0}

In [43]:
# You can also build a dictionary like this:
d = dict(atg=4, taa=4, tag=2, tga=0)
# Of course, this only works when the keys are valid Python identifiers...
d

{'atg': 4, 'taa': 4, 'tag': 2, 'tga': 0}

In [44]:
# Dictionaries can be indexed with [] notation like list indexing, 
# except they are indexed by their "keys", not by integers.
codon_count['atg']

4

In [45]:
# If you are not sure whether a key is in the dictionary, you can use .get() 
# rather than []:
print(codon_count.get('atg'))
print(codon_count.get('aaa'))

4
None
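
`.get()` also accepts a second argument, a default to return when the key is missing. A minimal sketch, using a separate made-up dictionary `counts` so the example stands on its own:

```python
counts = {'atg': 4, 'taa': 0, 'tag': 2, 'tga': 0}  # hypothetical counts

# With a default of 0, a missing codon reads as "seen zero times".
missing = counts.get('aaa', 0)

# This makes "add one to the count, creating the entry if needed" a one-liner.
counts['aaa'] = counts.get('aaa', 0) + 1

print(missing, counts['aaa'])
```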


In [46]:
# You can check whether a key is in a dictionary with the 'in' operator:
print("atg" in codon_count)
print("aaa" in codon_count)

True
False


In [47]:
# Let's define a dictionary mapping RNA codons to amino acids ('-' marks a stop codon):
rnaCodonTable = {
        # RNA codon table
        # U
        'UUU': 'F', 'UCU': 'S', 'UAU': 'Y', 'UGU': 'C',  # UxU
        'UUC': 'F', 'UCC': 'S', 'UAC': 'Y', 'UGC': 'C',  # UxC
        'UUA': 'L', 'UCA': 'S', 'UAA': '-', 'UGA': '-',  # UxA
        'UUG': 'L', 'UCG': 'S', 'UAG': '-', 'UGG': 'W',  # UxG
        # C
        'CUU': 'L', 'CCU': 'P', 'CAU': 'H', 'CGU': 'R',  # CxU
        'CUC': 'L', 'CCC': 'P', 'CAC': 'H', 'CGC': 'R',  # CxC
        'CUA': 'L', 'CCA': 'P', 'CAA': 'Q', 'CGA': 'R',  # CxA
        'CUG': 'L', 'CCG': 'P', 'CAG': 'Q', 'CGG': 'R',  # CxG
        # A
        'AUU': 'I', 'ACU': 'T', 'AAU': 'N', 'AGU': 'S',  # AxU
        'AUC': 'I', 'ACC': 'T', 'AAC': 'N', 'AGC': 'S',  # AxC
        'AUA': 'I', 'ACA': 'T', 'AAA': 'K', 'AGA': 'R',  # AxA
        'AUG': 'M', 'ACG': 'T', 'AAG': 'K', 'AGG': 'R',  # AxG
        # G
        'GUU': 'V', 'GCU': 'A', 'GAU': 'D', 'GGU': 'G',  # GxU
        'GUC': 'V', 'GCC': 'A', 'GAC': 'D', 'GGC': 'G',  # GxC
        'GUA': 'V', 'GCA': 'A', 'GAA': 'E', 'GGA': 'G',  # GxA
        'GUG': 'V', 'GCG': 'A', 'GAG': 'E', 'GGG': 'G'   # GxG
    }

print('RNA codon GCU translates to amino acid: ',rnaCodonTable['GCU'])

RNA codon GCU translates to amino acid:  A


### Dictionary keys, values, and key-value pairs

In [48]:
# You can ask for the list of keys of a dictionary:
codon_count.keys()

dict_keys(['atg', 'taa', 'tag', 'tga'])

In [49]:
# What's that dict_keys(...) thing?  It turns out that in Python 3,
# keys() returns a _view_ over the dictionary keys.  The view is dynamically
# updated to reflect changes in the underlying dictionary:
codons = codon_count.keys()
print(codons)
codon_count['gac'] = 6 # add a new key; the view picks it up automatically
print(codons)
print(codon_count)

dict_keys(['atg', 'taa', 'tag', 'tga'])
dict_keys(['atg', 'taa', 'tag', 'tga', 'gac'])
{'atg': 4, 'taa': 0, 'tag': 2, 'tga': 0, 'gac': 6}


## Iteration
https://docs.python.org/3.8/reference/compound_stmts.html#

For and while loops

In [50]:
# In old poor languages like Fortran and C, when you iterate, you have to 
# have a counter, increment it, and all that stuff.  Yeech. 
# Not so in Python.  You iterate over something that is iterable, that is, 
# that has (or can produce) a sequence of elements.  Like... a list! 
my_words = "I like to eat pizza with anchovies, I actually do!".split()

for w in my_words:
    print("My word is:", w)

My word is: I
My word is: like
My word is: to
My word is: eat
My word is: pizza
My word is: with
My word is: anchovies,
My word is: I
My word is: actually
My word is: do!


In [51]:
# You can also iterate over pairs, consisting of the element index and the list element:
for i, w in enumerate(my_words):
    print("The word number", i, "is:", w)

The word number 0 is: I
The word number 1 is: like
The word number 2 is: to
The word number 3 is: eat
The word number 4 is: pizza
The word number 5 is: with
The word number 6 is: anchovies,
The word number 7 is: I
The word number 8 is: actually
The word number 9 is: do!


In [52]:
# If you get tired of iteration, you can break out of it: 
for w in my_words:
    print(w)
    if w.startswith('anchovies'):
        print("   Indeed, they are delicious, no need to say more!")
        break

I
like
to
eat
pizza
with
anchovies,
   Indeed, they are delicious, no need to say more!


In [53]:
# And if you need to iterate over indices, like you used to do in C? 
# Well, you just use range(), which produces the indices 0, 1, ..., n-1 on demand:
for i in range(10):
    print("My integer is:", i)

My integer is: 0
My integer is: 1
My integer is: 2
My integer is: 3
My integer is: 4
My integer is: 5
My integer is: 6
My integer is: 7
My integer is: 8
My integer is: 9


In [54]:
# Note that you can also iterate on list slices:
for w in my_words[:5]:
    print(w)

I
like
to
eat
pizza


Oh, by the way, did you notice that we are using indentation rather than those pesky { }?  Some people think it's silly, a throwback to Fortran and punched cards.  I think it's brilliant.  See, in C or Java you have two things: the real structure of the code (indicated by braces) and the illustrated structure (indicated by indentation).  The problem is that sometimes indentation and braces differ, and when they do, the visual indication is misleading.  In Python, the visual indication is also the structural one, and is always truthful. 

I am sure you prefer this to a language where there is only structure and no visuals! 

In [55]:
# If you have a dictionary, you can iterate on it like this. 
# On keys only (because .keys() returns a view over the keys):
for codon in codon_count.keys():
    print ("Codon:", codon)


Codon: atg
Codon: taa
Codon: tag
Codon: tga
Codon: gac
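
Dictionaries also provide `.values()` and `.items()`, the latter yielding key-value pairs that can be unpacked directly in the `for` statement. A short sketch with a made-up dictionary `counts`:

```python
counts = {'atg': 4, 'taa': 0, 'tag': 2}  # hypothetical counts

# .items() yields (key, value) tuples, unpacked here into codon and n.
for codon, n in counts.items():
    print("Codon:", codon, "count:", n)

# .values() gives a view over the values only.
total = sum(counts.values())
print("Total:", total)
```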


In [56]:
# There is also a while statement, which will execute as long as
# a condition is True.
#
# https://docs.python.org/3.8/reference/compound_stmts.html#while

x = 3.
while x > 1.1:
    print(x)
    x = x / 1.6
print("The final result is:", x)

3.0
1.875
1.171875
The final result is: 0.732421875


In [57]:
x = 0
while x <= 10:
    if x % 2 == 0:
        print(x)
    x += 1

0
2
4
6
8
10


## Functions
https://docs.python.org/3.8/tutorial/controlflow.html#defining-functions

In [58]:
def addone(x):
    return x + 1

addone(4)

5

In [59]:
# Ok, one more argument!  Let's test our CS skill! 
def add_one_to_prod(x, y):
    """This function adds one to the product of x and y,
    and this is how you are supposed to document what a 
    function does."""
    p = x * y
    return p + 1

In [60]:
add_one_to_prod(2, 3)

7

In [61]:
# One of the very nice things about Python is that functions can have 
# optional arguments, which have a default value.
def incadd(x, d=1):
    return x + d

print(incadd(3, d=4))
print(incadd(3))

7
4


In [62]:
# Often, the optional argument has default value None. 
# Functions, btw, can be passed around just as regular values.  
# Let's try this.  Let us define a function g that squares a number.
def g(x):
    return x * x

def f(x, h=None):
    """Adds 1 to x, then applies modifier function h if any,
    and returns the result."""
    y = x + 1
    return y if h is None else h(y)

print(f(2))
print(f(2, h=g))

3
9
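
Passing functions as values is common in everyday Python. For instance, the built-in `sorted()` takes a `key` function that decides the sort order. In this sketch, `gc_count` and the list `seqs` are made up for illustration.

```python
def gc_count(seq):
    """Returns the number of G and C characters in seq."""
    return seq.count('G') + seq.count('C')

seqs = ['ATAT', 'GGCC', 'ATGC']  # hypothetical sequences

# sorted() calls gc_count on each element and orders by the results.
ranked = sorted(seqs, key=gc_count)

print(ranked)
```

Note that `gc_count` is passed without parentheses: we hand over the function itself rather than the result of calling it.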


## Importing modules

In [63]:
# Python libraries are organized in modules. 
# You need to import them before using them. 
import math
math.sqrt(3.)

1.7320508075688772

In [64]:
# If you like, you can also import individual functions from libraries.
from math import sqrt as square_root
square_root(2.)

1.4142135623730951

One of the things that makes Python great is the huge set of modules that are available for it.  You can look at https://docs.python.org/3/library/ for information about the Python standard library, but there is a very large number of modules besides the standard library.  The general rule is, before you try to implement something, look at whether there is a module available that does (part of) what you want to do.

## Classes
Classes are an integral part of object oriented programming and BME 160. Please read the official documentation and any additional online resources to understand them:

https://docs.python.org/3.8/tutorial/classes.html <br>
https://www.w3schools.com/python/python_classes.asp <br>
https://www.geeksforgeeks.org/python-classes-and-objects/ <br>

In [65]:
# Here is a simple standard class.
class Product(object):
    
    def __init__(self, name, price=0., quantity=0):
        """In the initializer, you should define the values that each object
        has.  Here, 'self' means, the object."""
        self.name = name 
        self.price = price
        self.quantity = quantity
        
    def __repr__(self):
        """Represents a class element in a reasonable way.
        Note the format statement below to help produce a string."""
        return "Hello, I am a {} and cost ${}; you have {} of me".format(
            self.name, self.price, self.quantity
        )
            
    
    def inflation(self, x):
        """Increases the price by a factor x.
        Note how self is always the first argument of methods; otherwise,
        you would not know to which object to apply the operations."""
        self.price *= x
        
    def value(self):
        """Total value of products of this type."""
        return self.price * self.quantity
        

In [66]:
# Let's make a list of products.
cart = [
    Product('Pear', price=1.99, quantity=10),
    Product('Apple', price=0.99, quantity=15),
    Product('Onion', price=1.49, quantity=57)
]

In [67]:
# We can print it; the representation is given by __repr__. 
for p in cart:
    print(p)

Hello, I am a Pear and cost $1.99; you have 10 of me
Hello, I am a Apple and cost $0.99; you have 15 of me
Hello, I am a Onion and cost $1.49; you have 57 of me


In [68]:
# What if you buy more apples?  
# The proper way would be to define a buy method, and write 
# something like p.buy(10) to buy 10 more.  But in Python, there is 
# nothing to prevent you from accessing object variables directly.

def double_the_cart(c):
    for p in c:
        p.quantity *= 2
        
double_the_cart(cart)

def print_cart(c):
    for p in c:
        print(p)
        
print_cart(cart)

Hello, I am a Pear and cost $1.99; you have 20 of me
Hello, I am a Apple and cost $0.99; you have 30 of me
Hello, I am a Onion and cost $1.49; you have 114 of me
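
For comparison, here is roughly what the `buy` method mentioned above could look like. This is only a sketch: `StockedProduct` is a hypothetical, trimmed-down stand-in for `Product`, and rejecting negative amounts is an assumption about the desired behavior.

```python
class StockedProduct:
    """Hypothetical, trimmed-down variant of Product with a buy method."""

    def __init__(self, name, quantity=0):
        self.name = name
        self.quantity = quantity

    def buy(self, n):
        """Adds n units to the quantity; negative amounts are rejected."""
        if n < 0:
            raise ValueError("n must be non-negative")
        self.quantity += n


apples = StockedProduct('Apple', quantity=15)
apples.buy(10)
print(apples.quantity)
```

Routing changes through a method gives the class one place to validate its own data, which direct attribute access cannot do.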


# Bioinformatics Example
### Central Dogma
Transcribe a DNA sequence into RNA, then translate it into a protein.

In [69]:
rnaCodonTable = {
        # RNA codon table
        # U
        'UUU': 'F', 'UCU': 'S', 'UAU': 'Y', 'UGU': 'C',  # UxU
        'UUC': 'F', 'UCC': 'S', 'UAC': 'Y', 'UGC': 'C',  # UxC
        'UUA': 'L', 'UCA': 'S', 'UAA': '-', 'UGA': '-',  # UxA
        'UUG': 'L', 'UCG': 'S', 'UAG': '-', 'UGG': 'W',  # UxG
        # C
        'CUU': 'L', 'CCU': 'P', 'CAU': 'H', 'CGU': 'R',  # CxU
        'CUC': 'L', 'CCC': 'P', 'CAC': 'H', 'CGC': 'R',  # CxC
        'CUA': 'L', 'CCA': 'P', 'CAA': 'Q', 'CGA': 'R',  # CxA
        'CUG': 'L', 'CCG': 'P', 'CAG': 'Q', 'CGG': 'R',  # CxG
        # A
        'AUU': 'I', 'ACU': 'T', 'AAU': 'N', 'AGU': 'S',  # AxU
        'AUC': 'I', 'ACC': 'T', 'AAC': 'N', 'AGC': 'S',  # AxC
        'AUA': 'I', 'ACA': 'T', 'AAA': 'K', 'AGA': 'R',  # AxA
        'AUG': 'M', 'ACG': 'T', 'AAG': 'K', 'AGG': 'R',  # AxG
        # G
        'GUU': 'V', 'GCU': 'A', 'GAU': 'D', 'GGU': 'G',  # GxU
        'GUC': 'V', 'GCC': 'A', 'GAC': 'D', 'GGC': 'G',  # GxC
        'GUA': 'V', 'GCA': 'A', 'GAA': 'E', 'GGA': 'G',  # GxA
        'GUG': 'V', 'GCG': 'A', 'GAG': 'E', 'GGG': 'G'   # GxG
    }

# Example input: ATGGAACTTGACTACGTAAATTAGT
dnaSeq = input('DNA sequence: ') # prompt the user to input a DNA sequence
rnaSeq = dnaSeq.upper().replace('T','U') # uppercase the sequence and convert to RNA

for i in range(0, len(rnaSeq), 3):               # split the RNA sequence to get
    codon = rnaSeq[i:i+3]                        # non-overlapping codons
    print(rnaCodonTable.get(codon, ''), end='')  # print the protein using the rnaCodonTable 
                                                 #    (invalid or incomplete codons print nothing)

DNA sequence: ATGGAACTTGACTACGTAAATTAGT
MELDYVN-
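
The loop above can also be wrapped in a function, which makes it reusable and easy to test. The sketch below is one possible variant, not the only way to do it: the name `translate` is made up, `partial_table` is a small subset of the codon table (just enough for the example sequence), and stopping at the first stop codon is a design choice that differs from the cell above, which prints `-` and keeps going.

```python
# Subset of the RNA codon table, just enough for the example sequence.
partial_table = {
    'AUG': 'M', 'GAA': 'E', 'CUU': 'L', 'GAC': 'D',
    'UAC': 'Y', 'GUA': 'V', 'AAU': 'N',
    'UAA': '-', 'UAG': '-', 'UGA': '-',  # stop codons
}

def translate(dna_seq, table):
    """Translates a DNA string into a protein string,
    stopping at the first stop codon."""
    rna = dna_seq.upper().replace('T', 'U')
    protein = []
    # len(rna) - 2 ensures only complete codons are visited.
    for i in range(0, len(rna) - 2, 3):
        amino_acid = table.get(rna[i:i+3], '')  # unknown codons are skipped
        if amino_acid == '-':
            break
        protein.append(amino_acid)
    return ''.join(protein)

print(translate('ATGGAACTTGACTACGTAAATTAGT', partial_table))
```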

# Bioinformatics Practice Questions

The following are several bioinformatics-oriented practice problems. I highly recommend [BME 160](https://courses.soe.ucsc.edu/courses/bme160) for those interested in a bioinformatics programming class and [Rosalind](http://rosalind.info/problems/locations/) as a great resource to get started with more practice problems.

### Counting DNA Nucleotides
Count the number of A, T, C, and G nucleotides in a string input. This is the first problem in [Rosalind](http://rosalind.info/problems/dna/).


In [None]:
pass

In [70]:
# One method to do the above question. There are many different ways to get the same result.

seq = 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC'
print(seq.count("A"), seq.count("G"), seq.count("C"), seq.count("T"))

20 17 12 21
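
As one of those other ways, the standard library's `collections.Counter` tallies every character in a single pass. A brief sketch using the same sequence:

```python
from collections import Counter

seq = 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC'

# Counter maps each character to the number of times it occurs.
counts = Counter(seq)

print(counts['A'], counts['G'], counts['C'], counts['T'])
```

A `Counter` behaves like a dictionary, except that looking up a missing key returns 0 instead of raising an error.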


### DNA to RNA
Convert a DNA input to RNA

Example:
ATCGCG -> AUCGCG

In [None]:
pass

### Reverse Complement of DNA
Return the reverse complement of a DNA input

In [None]:
pass

### Unique Codons
Identify the count of unique codons in a DNA input

In [None]:
pass