## Basic Bits Python 

#### Who is this for?
* Python beginners

## Contents:
* Comments and variables
* Loops
* Functions
* Dictionaries
* Make a DNA translator

### What is this??

This is a **Jupyter Notebook**, an interactive way of coding in Python. You might be looking at this via **Google Colab**, which hosts Jupyter notebooks remotely (and gives access to more computing power than a laptop 👍)

Notebooks are particularly well suited to data analysis because they can combine text **like this**, graphs, and the code that makes your analysis replicable.

This tutorial is a quick look at some basic elements of Python ✅ but is not a comprehensive guide ❌ The best guide is the [**Python Data Science Handbook - Jake Vanderplas**](https://jakevdp.github.io/PythonDataScienceHandbook/index.html) 🌟🌟🌟🌟🌟 (free ebook)

# Comments

In [1]:
# This is a Python cell
# All the code here is run when you press ctrl+enter/shift+enter/the 'play' button,
# and the output is presented below
# Anything after a '#' is a comment and is ignored by the interpreter

'''
A triple-quoted string that isn't assigned to anything
is another common way of writing notes over
several lines (strictly speaking it's a string, not a comment)
'''


# Have a play with this 'for loop'

for i in [1,2,3]: # for each item in the list [1,2,3]
    # do something
    print(i*2)

2
4
6


# Variables


Just like in maths, variables in Python are names (handles) for data.

Here are a few examples of the fundamental data types stored in variables:

In [2]:
a = 10   # this is an integer (int)
b = 9.02984  # this is a floating point number (float) - a number with a decimal point
# floats and integers are stored differently on the computer
c = 'this is a string'
d = True # this is a bool (boolean), which can be either True or False (always with a capital first letter)

list_of_variables = [a,b,c,d] # this is a list

# lists are 'iterable', which means that you can loop through them like this:

for i in list_of_variables: 
        
    print(i,'    ',type(i))
    
    #'i' is just a variable that changes each time,
    # you can use any letter or word though
    # you can print multiple variables in a print statement
    # the 'type()' function returns what type of thing the variable is

10      <class 'int'>
9.02984      <class 'float'>
this is a string      <class 'str'>
True      <class 'bool'>
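The comment in the cell above says floats and integers are stored differently. A small illustration (not part of the original notebook): equal-looking values can have different types, and many decimal fractions can only be approximated as floats, so it's safer to compare floats with `math.isclose()` than with `==`.

```python
import math

a = 10      # int
b = 10.0    # float

print(a == b)              # True: the values compare equal
print(type(a), type(b))    # <class 'int'> <class 'float'>

# Floats are stored in binary, so many decimal fractions are only approximated
print(0.1 + 0.2)           # 0.30000000000000004
print(0.1 + 0.2 == 0.3)    # False

# When comparing floats, check that they are close rather than exactly equal
print(math.isclose(0.1 + 0.2, 0.3))  # True
```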


Fairly complex objects, such as neural networks, are ultimately built on top of simple values like these.

# Loops
```for``` loops are useful for loads of things, like running the same operation over and over. They iterate through iterable objects, such as lists.

If I don't have a list to iterate through, I can use the ```range()``` function, which generates integers from a start value up to, *but not including*, a stop value. If no start value is given, it's assumed to be 0.

Loops are a mainstay of automating repetitive tasks, for example: processing every file in a folder.
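As a sketch of the 'every file in a folder' idea, the standard-library `pathlib` module can list the files and a `for` loop handles each one in turn. The function name `process_folder`, the folder name `my_data` and the choice of `.txt` files are hypothetical placeholders, and the example wraps the loop in a function (covered in the Functions section below) so it can be reused.

```python
from pathlib import Path


def process_folder(folder):
    """Print the name and line count of every .txt file in `folder`.
    Returns how many files were processed."""
    n_files = 0
    # glob('*.txt') finds every file ending in .txt directly inside the folder;
    # sorted() makes the order predictable
    for path in sorted(Path(folder).glob('*.txt')):
        n_lines = len(path.read_text().splitlines())
        print(path.name, n_lines)
        n_files = n_files + 1
    return n_files


# Example use ('my_data' is a hypothetical folder name):
# process_folder('my_data')
```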

In [3]:
for i in range(1,10):
    print(i)

1
2
3
4
5
6
7
8
9
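To see exactly which numbers `range()` produces, you can wrap it in `list()`. Notice that the stop value is never included, which is why the cell above prints 1 to 9 rather than 1 to 10:

```python
print(list(range(5)))         # [0, 1, 2, 3, 4]  (start defaults to 0)
print(list(range(1, 10)))     # [1, 2, 3, 4, 5, 6, 7, 8, 9]  (10 is not included)
print(list(range(0, 10, 3)))  # [0, 3, 6, 9]  (a third argument sets the step)
```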


# Functions

Functions are very useful. They can be written to do complicated operations, but for now I'll write one based on a dilution calculation that we'd use in the lab.

I want a function that calculates what volume of solution A to add to solution B
to get a final concentration $C2$.
I've rearranged the equation:

\begin{equation*}
C1 \times V1=C2 \times V2
 \end{equation*}
 into:
 
 \begin{equation*}
 V1=\frac{C2 \times V2}{C1}
 \end{equation*}
 
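where $V1$ is the volume of solution A to add to $V2$ of solution B, $C1$ is the concentration of A (the stock), and $C2$ is the final concentration of A that you want.
Strictly, $V2$ should be the final total volume, but using the volume of B is a good approximation when $V1$ is much smaller than $V2$.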
 

 Functions are defined like this:

```
def f():
    ...
    # do something
``` 

  where ```f``` is the name of the function, and the ```()``` optionally contain the function's parameters (placeholders for its input variables). Even if your function doesn't take any inputs, you still need to include the parentheses.

  You'll also need to remember the colon after the parentheses and to indent the body of the function. Most text editors will indent automatically for you.
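A minimal sketch of both cases (the function names are made up for illustration): one function with no parameters, which still needs its empty parentheses, and one with two parameters that returns a value. The triple-quoted string on the first line of each body is a *docstring*, the same kind of string shown in the Comments section; `help()` displays it.

```python
def say_hello():
    """Print a greeting. Takes no inputs, but the () are still required."""
    print('Hello!')


def add_numbers(x, y):
    """Return the sum of x and y."""
    total = x + y
    return total


say_hello()                 # prints: Hello!
result = add_numbers(2, 3)
print(result)               # prints: 5
```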

In [4]:
def WhatVolumeDoIAdd(c1, c2, v2): # this no-space + capitalisation naming style is called 'camel case'
    # The interpreter recognises this as a function because it's defined after 'def'.
    # Also important are the brackets, which (optionally) contain the parameters
    # passed into the function. Don't forget the colon after the brackets and to
    # indent all the code within the function
    
    # here's how I'm calculating v1, from V1 = (C2 x V2) / C1:
    v1 = (c2 * v2) / c1
    
    # and I'm having the function return v1
    # Functions don't have to return values, but lots do
    return v1

#### Time to test the function!
Let's say I need the concentration of δ-aminolevulinic acid (δ-ALA) in my cell culture broth to be 100 µM.
* $C2$ = 100 µM (target concentration of δ-ALA)
* $C1$ = 0.45 M (my stock solution of δ-ALA)
* $V2$ = 500 ml (volume of cell culture)

In [5]:
# First I have to normalise all the units. I'll put everything into µM and µl
c1 = 0.45 * 1e6 # M to µM. 1e6 is Python's scientific notation for 1*10^6 (10e6 would mean 10*10^6)
c2 = 100 # µM
v2 = 500 * 1e3 # ml to µl

v1 = WhatVolumeDoIAdd(c1, c2, v2)

print(v1, 'µl') # for the 'µ' symbol, use AltGr+m

111.11111111111111 µl


In [6]:
# I can round the answer like this:

print(round(v1, 2), 'µl') # round() takes the number to be rounded, then the number of decimals. I did 2.

111.11 µl
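Working the formula through by hand is a useful check on the code's answer. With everything in µM and µl ($C1 = 0.45\ \text{M} = 450\,000\ \mu\text{M}$, $V2 = 500\ \text{ml} = 500\,000\ \mu\text{l}$):

\begin{equation*}
V1 = \frac{C2 \times V2}{C1} = \frac{100\ \mu\text{M} \times 500\,000\ \mu\text{l}}{450\,000\ \mu\text{M}} \approx 111.11\ \mu\text{l}
\end{equation*}

which is about 0.11 ml of stock.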


# Dictionaries

Dictionaries are recognisable by their ```{}``` 'curly' braces. They work as a lookup table: each value is stored under a key, and you retrieve the value by giving its key. We'll put them into action here with a DNA translator.
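Before the real thing, here's a minimal sketch of the basic dictionary operations using a made-up table of stock solutions (the names and concentrations are hypothetical, for illustration only):

```python
# Hypothetical stock solutions: names (keys) mapped to concentrations in M (values)
stocks = {'IPTG': 1.0, 'arabinose': 0.5}

print(stocks['IPTG'])              # look up a value by its key: 1.0

stocks['glucose'] = 2.0            # add a new key:value pair
stocks['IPTG'] = 0.1               # assigning to an existing key replaces its value

print('glucose' in stocks)         # True: 'in' checks the keys
print(stocks.get('ampicillin'))    # None: .get() doesn't crash on a missing key
print(stocks.get('ampicillin', 0)) # 0: .get() can also return a default you choose

# Looping over a dictionary gives you its keys, in the order they were added
for name in stocks:
    print(name, stocks[name])
```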

In [7]:
# Here's the open reading frame for P450 BM3 in a pET15b plasmid:
# a '\' at the very end of a line lets the string continue on the next line;
# the backslash and the line break are not included in the string itself
seq = 'atgacaattaaagaaatgcctcagccaaaaacgtttggagagcttaaaaatttaccgttattaaacacagataaaccggttcaagctttg\
atgaaaattgcggatgaattaggagaaatctttaaattcgaggcgcctggccgtgtaacgcgctacttatcaagtcagcgtctaattaaagaagcatg\
cgatgaatcacgctttgataaaaacttaagtcaagcgcttaaatttgtacgtgattttGCAggagacgggttaTTTacaagctggacgcatgaaaaaa\
attggaaaaaagcgcataatatcttacttccaagcttcagtcagcaggcaatgaaaggctatcatgcgatgatggtcgatatcgccgtgcagcttgtt\
caaaagtgggagcgtctaaatgcagatgagcatattgaggtaccggaagacatgacacgtttaacgcttgatacaattggtctttgcggctttaacta\
tcgctttaacagcttttaccgagatcagcctcatccatttattacaagtatggtccgtgcactggatgaagcaatgaacaagctgcagcgagcaaatc\
cagacgacccagcttatgatgaaaacaagcgccagtttcaagaagatatcaaggtgatgaacgacctagtagataaaattattgcagatcgcaaagca\
agcggtgaacaaagcgatgatttattaacgcacatgctaaacggaaaagatccagaaacgggtgagccgcttgatgacgagaacattcgctatcaaatt\
attacattcttaattgcgggacacgaaacaactagtggtcttttatcatttgcgctgtatttcttagtgaaaaatccacatgtattacaaaaagcagca\
gaagaagcagcacgagttctagtagatcctgttccaagctacaaacaagtcaaacagcttaaatatgtcggcatggtcttaaacgaagcgctgcgctta\
tggccaactgctcctgcgttttccctatatgcaaaagaagatacggtgcttggaggagaatatcctttagaaaaaggcgacgaactaatggttctgatt\
cctcagcttcaccgtgataaaacaatttggggagacgatgtggaagagttccgtccagagcgttttgaaaatccaagtgcgattccgcagcatgcgttt\
aaaccgtttggaaacggtcagcgtgcgtgtatcggtcagcagttcgctcttcatgaagcaacgctggtcctaggtatgatgctaaaacactttgacttt\
gaagatcatacaaactacgagctggatattaaagaaactttaacgttaaaacctgaaggctttgtggtaaaagcaaaatcgaaaaaaattccgcttggc\
ggtattccttcacctagcactgaacagtctgctaaaaaagtacgcaaaaagggctgctaaca'

In [8]:
# Here's a dictionary from the wild. Entries are separated by commas and each one
# is a key:value pair. Access the contents like this:
# contents = DictionaryName[key]
# where 'key' is usually a string or a number (keys must be hashable: strings,
# numbers and tuples work, lists don't)

CodonTable = {'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
        'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
        'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
        'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
        'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
        'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
        'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
        'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
        'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
        'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
        'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
        'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
        'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
        'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
        'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
        'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}

# Access some amino acids
CodonTable['GGG'] # return (and print) the amino acid corresponding to 'GGG'

'G'
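Dictionary keys are matched exactly, including upper and lower case. A cut-down codon table (only three entries, for illustration) shows what happens with a lowercase codon, and how `.upper()` or `.get()` avoid a crash:

```python
mini_table = {'ATG': 'M', 'GGG': 'G', 'TAA': '_'}   # cut-down version of CodonTable

codon = 'ggg'                      # lowercase, like the raw sequence
print(mini_table.get(codon))       # None: 'ggg' is not a key ('GGG' is)
print(mini_table[codon.upper()])   # G

try:
    mini_table[codon]              # square brackets raise an error for a missing key
except KeyError as err:
    print('KeyError:', err)        # KeyError: 'ggg'
```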

In [9]:
# The keys in the dictionary are upper case, so I'll need to change the case of either
# the keys or the sequence

seq  = seq.upper() # seq is now updated to be uppercase
seq

'ATGACAATTAAAGAAATGCCTCAGCCAAAAACGTTTGGAGAGCTTAAAAATTTACCGTTATTAAACACAGATAAACCGGTTCAAGCTTTGATGAAAATTGCGGATGAATTAGGAGAAATCTTTAAATTCGAGGCGCCTGGCCGTGTAACGCGCTACTTATCAAGTCAGCGTCTAATTAAAGAAGCATGCGATGAATCACGCTTTGATAAAAACTTAAGTCAAGCGCTTAAATTTGTACGTGATTTTGCAGGAGACGGGTTATTTACAAGCTGGACGCATGAAAAAAATTGGAAAAAAGCGCATAATATCTTACTTCCAAGCTTCAGTCAGCAGGCAATGAAAGGCTATCATGCGATGATGGTCGATATCGCCGTGCAGCTTGTTCAAAAGTGGGAGCGTCTAAATGCAGATGAGCATATTGAGGTACCGGAAGACATGACACGTTTAACGCTTGATACAATTGGTCTTTGCGGCTTTAACTATCGCTTTAACAGCTTTTACCGAGATCAGCCTCATCCATTTATTACAAGTATGGTCCGTGCACTGGATGAAGCAATGAACAAGCTGCAGCGAGCAAATCCAGACGACCCAGCTTATGATGAAAACAAGCGCCAGTTTCAAGAAGATATCAAGGTGATGAACGACCTAGTAGATAAAATTATTGCAGATCGCAAAGCAAGCGGTGAACAAAGCGATGATTTATTAACGCACATGCTAAACGGAAAAGATCCAGAAACGGGTGAGCCGCTTGATGACGAGAACATTCGCTATCAAATTATTACATTCTTAATTGCGGGACACGAAACAACTAGTGGTCTTTTATCATTTGCGCTGTATTTCTTAGTGAAAAATCCACATGTATTACAAAAAGCAGCAGAAGAAGCAGCACGAGTTCTAGTAGATCCTGTTCCAAGCTACAAACAAGTCAAACAGCTTAAATATGTCGGCATGGTCTTAAACGAAGCGCTGCGCTTATGGCCAACTGCTCCTGCGTTTTCC

# Exercise!

The aim of the exercise is to translate the DNA sequence in ```seq``` to amino acids. Luckily ```seq``` is in frame, so there's no need to shift around the reading frame. There are multiple ways to approach this. My instinct would be to:
* Divide ```seq``` into its codon triplets
* Look up every codon in the ```CodonTable``` for the corresponding amino acid with a ```for``` loop (and store the translation somewhere)
* Clean up the output so that we finish with a single string

Google is allowed! Part of programming is knowing what to google ;)


Extra points: 
* Make these steps into a reusable function
* Re-use the function on your ORF of choice



In [10]:
# to get you started, here's a snippet for splitting seq into triplets:

triplets = [seq[i:i+3] for i in range(0,len(seq),3)]
# This is a for loop stuffed inside a list, which comes in useful very often.
# It returns the triplets of seq in the form of a list

# The syntax for these loops (called 'list comprehensions') is:
# [do_something(i) for i in iterable]
# i can be any variable name, the operation can be just about anything
# and the iterable can be any iterable, including but not limited to: lists, range() objects and dictionaries

# seq[i:i+3] selects the characters at positions i, i+1 and i+2 (the stop position
# i+3 is not included). Square brackets are used to retrieve items from strings and
# lists, and the colon lets us select a range (a 'slice') between two positions
# try running these to get the hang of addressing strings:
# seq[0] is the first letter of the string (Python starts counting from zero)
# seq[-1] selects the last letter of the string
# seq[10:15] selects the characters at positions 10 to 14, i.e. the 11th to 15th letters

# List items are accessed in the same way as string characters

# range() returns an object that's ready to generate all the numbers specified, and
# generates them when we iterate through it.
# range(0,len(seq),3) will generate every third number from 0 up to (but not including) len(seq).
# range() takes the 'arguments': start (default = 0), stop, step (default = 1)

triplets[0:10] # Select the first 10 triplets to display

['ATG', 'ACA', 'ATT', 'AAA', 'GAA', 'ATG', 'CCT', 'CAG', 'CCA', 'AAA']
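The indexing rules in the comments above are easiest to check on a short string. In this small sketch, the 9-letter string is just an example, and the list is called `short_triplets` so that running it in the notebook doesn't overwrite `triplets`. It also shows that the list comprehension builds the same list as an ordinary `for` loop with `.append()`:

```python
s = 'ATGACAATT'

print(s[0])      # A    (the first character is at index 0)
print(s[-1])     # T    (negative indices count from the end)
print(s[0:3])    # ATG  (indices 0, 1 and 2; index 3 is excluded)
print(s[3:6])    # ACA

# The list comprehension pattern from the cell above...
short_triplets = [s[i:i+3] for i in range(0, len(s), 3)]

# ...builds the same list as this ordinary for loop:
triplets_loop = []
for i in range(0, len(s), 3):
    triplets_loop.append(s[i:i+3])

print(short_triplets)                    # ['ATG', 'ACA', 'ATT']
print(short_triplets == triplets_loop)   # True
```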

In [11]:
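translation = []   # an empty list to collect the amino acids

for i in triplets:
    if len(i) == 3:             # skip a leftover partial codon at the end, if there is one
        aa = CodonTable[i]      # look up the amino acid for this codon
        translation.append(aa)  # add it to the end of the list
print(translation[0:10])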

['M', 'T', 'I', 'K', 'E', 'M', 'P', 'Q', 'P', 'K']


In [12]:
translation_string = ''.join(translation)

print(translation_string)

MTIKEMPQPKTFGELKNLPLLNTDKPVQALMKIADELGEIFKFEAPGRVTRYLSSQRLIKEACDESRFDKNLSQALKFVRDFAGDGLFTSWTHEKNWKKAHNILLPSFSQQAMKGYHAMMVDIAVQLVQKWERLNADEHIEVPEDMTRLTLDTIGLCGFNYRFNSFYRDQPHPFITSMVRALDEAMNKLQRANPDDPAYDENKRQFQEDIKVMNDLVDKIIADRKASGEQSDDLLTHMLNGKDPETGEPLDDENIRYQIITFLIAGHETTSGLLSFALYFLVKNPHVLQKAAEEAARVLVDPVPSYKQVKQLKYVGMVLNEALRLWPTAPAFSLYAKEDTVLGGEYPLEKGDELMVLIPQLHRDKTIWGDDVEEFRPERFENPSAIPQHAFKPFGNGQRACIGQQFALHEATLVLGMMLKHFDFEDHTNYELDIKETLTLKPEGFVVKAKSKKIPLGGIPSPSTEQSAKKVRKKGC_
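For the 'extra points', here's one possible sketch of a reusable function. The name `translate`, passing the codon table in as a parameter, and ignoring an incomplete final codon are my own choices rather than the only way to do it. To keep the block runnable on its own, it builds the standard codon table from two strings instead of retyping all 64 entries.

```python
# Build the standard codon table compactly: codons listed in TCAG order,
# with '_' marking stop codons (the same 64 entries as CodonTable above)
bases = 'TCAG'
codons = [a + b + c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
standard_table = dict(zip(codons, amino_acids))


def translate(dna, codon_table):
    """Translate an in-frame DNA sequence into a one-letter amino-acid string.

    dna: the DNA sequence (any case), assumed to start at the first codon
    codon_table: a dict mapping upper-case codons to amino acids, like CodonTable
    Any incomplete codon left over at the end is ignored.
    """
    dna = dna.upper()
    protein = []
    # stop 2 short of the end so that every slice is a full 3-letter codon
    for i in range(0, len(dna) - 2, 3):
        protein.append(codon_table[dna[i:i+3]])
    return ''.join(protein)


print(translate('atgacaattaaagaaTAA', standard_table))   # MTIKE_
```

In this notebook, `translate(seq, CodonTable)` should give the same string as `translation_string`. Passing the table in as a parameter means you could swap in a different genetic code without editing the function.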
