## Basic Bits Python 

#### Who is this for?
* Python beginners

## Contents:
* Comments and variables
* Loops
* Functions
* Dictionaries
* Make a DNA translator

### What is this??

This is a **Jupyter Notebook**, which is an interactive way of coding in Python. You might be looking at this via [**Google Colab**](https://drive.google.com/open?id=1bYnD5GUBWw8L84dS5k8hMHGde4iIghBg), which hosts Jupyter notebooks remotely (and gives access to more computing power than a laptop 👍).

Notebooks are particularly well suited to data analysis because they can combine text **like this**, graphs, and the code that makes your analysis replicable.

This tutorial is a quick look at some basic elements of Python ✅ but is not a comprehensive guide ❌. The best guide is the [**Python Data Science Handbook - Jake Vanderplas**](https://jakevdp.github.io/PythonDataScienceHandbook/index.html) 🌟🌟🌟🌟🌟 (free ebook).

# Comments

In [1]:
# This is a python cell
# All the code run here (ctrl+enter/shift+enter/the 'play' button)
# will be run and the output presented below
# These lines behind the '#' are comments and will be ignored by the interpreter

'''
This is another way of making comments
over
several lines
'''

'\nThis is another way of making comments\nover\nseveral lines\n'

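Jupyter notebooks automatically display the value of the last expression in a cell as the cell output:
```python
Out [1]:
```

In this case, the multi-line string has been displayed. Strictly speaking, a triple-quoted string is a string value rather than a true comment, which is why it shows up in the output. The ```\n``` character is the ASCII newline character.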
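To see where those ```\n``` characters come from, here is a minimal sketch comparing how the notebook displays a string with how ```print``` shows it. The variable names (```note```, ```shown_by_notebook```, ```n_newlines```) are my own, chosen just for this illustration.

```python
# A triple-quoted string is an ordinary string value
note = '''
This is another way of making comments
over
several lines
'''

# repr() gives the form the notebook displays, with '\n' written out
shown_by_notebook = repr(note)
print(shown_by_notebook)

# print() turns each '\n' into an actual line break
print(note)

# Count the newline characters stored in the string
n_newlines = note.count('\n')
print(n_newlines)
```

So the displayed output and the printed output are the same string, just shown in two different ways.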

### Loops
Loops are instructions that are repeated. For example, you may want to apply an operation to every file in a folder, or to every number in a sequence. The format of a ```for loop``` is:
```python
for i in <insert_iterable_object_here>: # the colon is essential
    # do something
    print(i)
    custom_function(i)
```
* An iterable object is a sequence of ```things```, e.g. the list ```[1,2,3,4,5]``` (more on lists later)
* ```i``` can be replaced by any valid variable name. Its value is updated to the next item in the ```iterable``` object in each iteration.
* ```custom_function``` represents a function that will be performed on ```i```. The syntax for calling a function is ```<function_name>(variables...)```; more on that later too.

In [2]:
for i in [1,2,3]: # for each item in the list [1,2,3]
    print(i*2)

2
4
6
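
As a further sketch of the same idea, the loop variable doesn't have to be called ```i```, and lists aren't the only iterable objects. The names ```base```, ```number``` and ```total``` below are just ones I picked for this example.

```python
# A string is iterable too: the loop visits one character at a time
for base in 'ATG':
    print(base)

# A loop can build up a result step by step
total = 0
for number in [1, 2, 3]:
    total = total + number    # 'number' holds the next item on each pass
print(total)
```

A descriptive name like ```base``` or ```number``` usually makes a loop easier to read than a single letter.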


# Variables and ```types```


Just like in maths, variables in Python are handles for data.

Every variable has a ```data type```, which tells the ```interpreter``` how to process operations on it. Below are a few examples of the simplest data types stored in variables.

In [3]:
a = 10   # this is an integer (int)
b = 9.02984  # this is a floating point number (float) - a number with a decimal point
# floats and integers are stored differently on the computer
c = 'this is a string'
d = True # this is a bool (boolean), which can be either True or False (always with a capital first letter)

list_of_variables = [a,b,c,d] # this is a list

# lists are 'iterable', which means that you can loop through them like this:

for i in list_of_variables: 
        
    print(i,'    ',type(i))
    
    #'i' is just a variable that changes each time,
    # you can use any letter or word though
    # you can print multiple variables in a print statement
    # the 'type()' function returns what type of thing the variable is

10      <class 'int'>
9.02984      <class 'float'>
this is a string      <class 'str'>
True      <class 'bool'>


These are simple variables on which you can build arbitrarily complex objects, from publication quality figures to molecular dynamics simulations to artificial neural networks.
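
Data types also decide what happens when values are combined or converted. Here's a small sketch using the built-in ```str()```, ```int()``` and ```isinstance()``` functions; the variable names are only for illustration.

```python
a = 10         # int
b = 9.02984    # float

# Mixing an int and a float gives a float
total = a + b
print(type(total))

# Division with '/' always returns a float, even for a whole result
half = a / 2
print(half, type(half))

# Conversion functions build a new value of the requested type
as_text = str(a)         # the string '10'
as_number = int('42')    # the integer 42
as_whole = int(b)        # 9: int() cuts off the decimals, it does not round
print(as_text, as_number, as_whole)

# isinstance() checks whether a value has a given type
print(isinstance(b, float))
```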

## Functions

 Functions are useful. They can be written to perform complex operations, or simple ones. Here, we'll write a function that helps us calculate how to make a dilution.
 
 I want a function that calculates what volume of liquid A to add to liquid B
 to get a final concentration of C.
I've rearranged the equation :

\begin{equation*}
C1 \times V1=C2 \times V2
 \end{equation*}
 into:
 
 \begin{equation*}
 V1=\frac{C2 \times V2}{C1}
 \end{equation*}
 
where $V1$ is the volume of solution A to add to $V2$ of solution B. 
$C1$ is the concentration of A (the stock) and $C2$ is the target final concentration of A in B.
 

 Functions are defined like this:

```python
def f(): # 'def', then the function name, then parentheses (any parameters go inside), then a colon
    ... # indented code; the '...' is a placeholder that doesn't do anything
    # do something
``` 

where ```def``` specifies that this is a function, ```f``` is the name of the function and ```()``` optionally contains placeholders for variables (parameters). Even if your function takes no parameters, you'll still need to include the parentheses. The code follows the ```:``` on the next lines, indented to set it apart from the rest of the script.

Functions can optionally ```return``` an output, which can be anything from a ```float``` to a web page:
 
  ```python
def f(x):
    return x**2
```
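
Defining a function doesn't run it; it only runs when it is called. Here's a minimal sketch of calling ```f``` and using what it returns. The second function ```g``` is a hypothetical one I've added to show what happens without ```return```.

```python
def f(x):
    return x**2

result = f(3)         # the value 3 is passed in as x
print(result)

# The returned value can be used like any other value
print(f(2) + f(4))

# A function without 'return' gives back None
def g(x):
    x**2

print(g(3))
```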

## Common Mistakes:
### Missing colon
```python
def f(x)
    return x**2
```

### No indentation
```python
def f(x):
return x**2
```

### No Brackets

```python
def f:
    return x**2
```
 ### Not enough arguments
 ```python
def f(x):
    return x**2

f()
```

```
TypeError: f() missing 1 required positional argument: 'x'
```

### Too many arguments
```python
def f(x):
    return x**2

f(1,3)
```

```
TypeError: f() takes 1 positional argument but 2 were given
```

### Invalid type
```python
def f(x):
    return x**2
f('a')
```

```
TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'
```
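
The last three mistakes happen when the function is *called*, so they can be caught while the code runs. Below is a rough sketch using ```try``` / ```except```, which isn't covered elsewhere in this tutorial; the ```caught``` list is just my own bookkeeping, and the exact wording of the messages may differ between Python versions. The first three mistakes are syntax errors, which stop the cell before any of it runs, so they can't be caught this way.

```python
def f(x):
    return x**2

caught = []   # names of the errors we run into

try:
    f()                      # not enough arguments
except TypeError as error:
    caught.append(type(error).__name__)
    print('Caught:', error)

try:
    f(1, 3)                  # too many arguments
except TypeError as error:
    caught.append(type(error).__name__)
    print('Caught:', error)

try:
    f('a')                   # a string can't be squared
except TypeError as error:
    caught.append(type(error).__name__)
    print('Caught:', error)

print(f(4))                  # a correct call still works
```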

In [4]:
def WhatVolumeDoIAdd(c1,c2,v2):

    # here's how I'm calculating v1 (from V1 = C2 * V2 / C1):
    v1=(c2*v2)/c1
    # and returning it
    return v1

#### Time to test the function!
Let's say I need to make the concentration of δ-aminolevulinic acid (δ-ALA) in my cell culture broth 100 µM. 
* $C2$ = 100 µM (target conc of δ-ALA)
* $C1$ = 0.45 M (my stock solution of δ-ALA)
* $V2$ = 500 ml (volume of cell culture)

In [5]:
# First I have to normalize all the units. I'll put everything into uM and ul
c1 = 0.45 * 1e6 # M to uM. 1e6 is python's scientific notation for 1*10^6
c2 = 100 # uM
v2 = 500 * 1e3 # ml to ul

v1 = WhatVolumeDoIAdd(c1 ,c2, v2)

print(v1, 'µl') # for the 'µ' symbol, use AltGr+m

111.11111111111111 µl


In [6]:
# I can round the answer like this:

print(round(v1, 2), 'µl') # round() takes the number to be rounded, then the number of decimals. I did 2.

111.11 µl
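
A quick way to gain confidence in a function like this is to plug the answer back into the original equation. The sketch below is one way to do that, assuming all concentrations share one unit and all volumes share another. The name ```dilution_volume``` and the check on ```c1``` are my own additions, and ```math.isclose``` is used because floats rarely compare exactly equal.

```python
import math

def dilution_volume(c1, c2, v2):
    """Return V1 from C1 * V1 = C2 * V2."""
    if c1 <= 0:
        raise ValueError('stock concentration c1 must be positive')
    return (c2 * v2) / c1

# Same quantities as the test above, converted to uM and ul
c1 = 0.45 * 1e6    # 0.45 M stock, in uM
c2 = 100           # target concentration, in uM
v2 = 500 * 1e3     # 500 ml of culture, in ul

v1 = dilution_volume(c1, c2, v2)

# Sanity check: both sides of C1 * V1 = C2 * V2 should agree
print(math.isclose(c1 * v1, c2 * v2))
print(round(v1, 2), 'µl')
```

If the first line printed is ```False```, either the formula or the unit conversion has gone wrong somewhere.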


# Dictionaries

Dictionaries are recognisable for their ```{}``` 'curly' braces. They function as a lookup system, where you can store and access lots of variables under one roof. We'll put them into action here with a DNA translator.
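
Before the full codon table, here's a minimal sketch of the basic dictionary operations on a tiny example. The dictionary ```amino_acids``` and its contents are made up for illustration.

```python
# A small dictionary mapping one-letter amino acid codes to names
amino_acids = {'M': 'methionine', 'G': 'glycine', 'W': 'tryptophan'}

print(amino_acids['M'])        # look up a value by its key

amino_acids['K'] = 'lysine'    # add a new key:value pair
print(len(amino_acids))        # number of entries

print('G' in amino_acids)      # check whether a key exists

# Loop over the keys and values together
for code, name in amino_acids.items():
    print(code, name)
```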

In [7]:
# Here's the open reading frame for P450 BM3 in a pET15b plasmid:
# the '\' is an 'escape character' which lets me put in a line break mid-string 
# without upsetting the interpreter
seq = 'atgacaattaaagaaatgcctcagccaaaaacgtttggagagcttaaaaatttaccgttattaaacacagataa\
accggttcaagctttgatgaaaattgcggatgaattaggagaaatctttaaattcgaggcgcctggccgtgtaacgcgcta\
cttatcaagtcagcgtctaattaaagaagcatgcgatgaatcacgctttgataaaaacttaagtcaagcgcttaaatttgt\
acgtgattttGCAggagacgggttaTTTacaagctggacgcatgaaaaaaattggaaaaaagcgcataatatcttacttcc\
aagcttcagtcagcaggcaatgaaaggctatcatgcgatgatggtcgatatcgccgtgcagcttgttcaaaagtgggagcg\
tctaaatgcagatgagcatattgaggtaccggaagacatgacacgtttaacgcttgatacaattggtctttgcggctttaa\
ctatcgctttaacagcttttaccgagatcagcctcatccatttattacaagtatggtccgtgcactggatgaagcaatgaa\
caagctgcagcgagcaaatccagacgacccagcttatgatgaaaacaagcgccagtttcaagaagatatcaaggtgatgaa\
cgacctagtagataaaattattgcagatcgcaaagcaagcggtgaacaaagcgatgatttattaacgcacatgctaaacgg\
aaaagatccagaaacgggtgagccgcttgatgacgagaacattcgctatcaaattattacattcttaattgcgggacacga\
aacaactagtggtcttttatcatttgcgctgtatttcttagtgaaaaatccacatgtattacaaaaagcagcagaagaagc\
agcacgagttctagtagatcctgttccaagctacaaacaagtcaaacagcttaaatatgtcggcatggtcttaaacgaagc\
gctgcgcttatggccaactgctcctgcgttttccctatatgcaaaagaagatacggtgcttggaggagaatatcctttaga\
aaaaggcgacgaactaatggttctgattcctcagcttcaccgtgataaaacaatttggggagacgatgtggaagagttccg\
tccagagcgttttgaaaatccaagtgcgattccgcagcatgcgtttaaaccgtttggaaacggtcagcgtgcgtgtatcgg\
tcagcagttcgctcttcatgaagcaacgctggtcctaggtatgatgctaaaacactttgactttgaagatcatacaaacta\
cgagctggatattaaagaaactttaacgttaaaacctgaaggctttgtggtaaaagcaaaatcgaaaaaaattccgcttgg\
cggtattccttcacctagcactgaacagtctgctaaaaaagtacgcaaaaagggctgctaaca'

In [8]:
# Here's a dictionary that functions as a codon table. 
# Each entry is separated by a comma and is a key:value pair,
# like this a:b. Access the contents like this:
# contents = DictionaryName[key] 
# where 'key' is usually a number or a string.

CodonTable = {'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
        'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
        'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
        'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
        'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
        'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
        'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
        'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
        'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
        'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
        'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
        'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
        'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
        'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
        'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
        'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}

# Access some amino acids
CodonTable['GGG'] # return (and print) the amino acid corresponding to 'GGG'

'G'
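
Looking up a key that isn't in the dictionary raises a ```KeyError```. One possible way around that is the dictionary's ```.get()``` method, which returns a fallback value instead. In this sketch ```mini_table``` is a hypothetical three-entry excerpt of the codon table, and ```'X'``` is a marker I chose for "unknown".

```python
# Hypothetical three-entry excerpt of the codon table, for illustration only
mini_table = {'ATG': 'M', 'GGG': 'G', 'TAA': '_'}

known = mini_table['GGG']              # direct lookup of an existing key
print(known)

# mini_table['NNN'] would raise KeyError because 'NNN' is not a key.
# .get() returns the fallback value instead of raising an error:
unknown = mini_table.get('NNN', 'X')
print(unknown)

# For a key that does exist, .get() behaves like a normal lookup
print(mini_table.get('ATG', 'X'))
```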

In [9]:
# The keys in the dictionary are upper case, so I'll need to change the case of either
# the keys or the sequence.
# I'll use .upper(), a built-in method that can be called on strings

seq  = seq.upper() # seq is now updated to be uppercase
seq

'ATGACAATTAAAGAAATGCCTCAGCCAAAAACGTTTGGAGAGCTTAAAAATTTACCGTTATTAAACACAGATAAACCGGTTCAAGCTTTGATGAAAATTGCGGATGAATTAGGAGAAATCTTTAAATTCGAGGCGCCTGGCCGTGTAACGCGCTACTTATCAAGTCAGCGTCTAATTAAAGAAGCATGCGATGAATCACGCTTTGATAAAAACTTAAGTCAAGCGCTTAAATTTGTACGTGATTTTGCAGGAGACGGGTTATTTACAAGCTGGACGCATGAAAAAAATTGGAAAAAAGCGCATAATATCTTACTTCCAAGCTTCAGTCAGCAGGCAATGAAAGGCTATCATGCGATGATGGTCGATATCGCCGTGCAGCTTGTTCAAAAGTGGGAGCGTCTAAATGCAGATGAGCATATTGAGGTACCGGAAGACATGACACGTTTAACGCTTGATACAATTGGTCTTTGCGGCTTTAACTATCGCTTTAACAGCTTTTACCGAGATCAGCCTCATCCATTTATTACAAGTATGGTCCGTGCACTGGATGAAGCAATGAACAAGCTGCAGCGAGCAAATCCAGACGACCCAGCTTATGATGAAAACAAGCGCCAGTTTCAAGAAGATATCAAGGTGATGAACGACCTAGTAGATAAAATTATTGCAGATCGCAAAGCAAGCGGTGAACAAAGCGATGATTTATTAACGCACATGCTAAACGGAAAAGATCCAGAAACGGGTGAGCCGCTTGATGACGAGAACATTCGCTATCAAATTATTACATTCTTAATTGCGGGACACGAAACAACTAGTGGTCTTTTATCATTTGCGCTGTATTTCTTAGTGAAAAATCCACATGTATTACAAAAAGCAGCAGAAGAAGCAGCACGAGTTCTAGTAGATCCTGTTCCAAGCTACAAACAAGTCAAACAGCTTAAATATGTCGGCATGGTCTTAAACGAAGCGCTGCGCTTATGGCCAACTGCTCCTGCGTTTTCC

### ```range(start, stop, step)```

The ```range``` function is built in to Python, which means we don't need to write it or import it ourselves. It generates an ```iterable``` sequence of ```integers``` from ```start``` up to ```stop``` (inclusive of ```start```, exclusive of ```stop```). ```start``` has a default value of ```0```, and ```step``` is ```1``` by default, so we can get away with typing:
```python
range(10)
```
for ```start = 0```, ```stop = 10``` and ```step = 1```. You'll notice that the ```range``` function doesn't return a list of numbers straight away, because it's **"lazy"**: it only generates the numbers when they are needed, like when it's iterated over in a loop, or when we call:
```python
In  [1]: list(range(10))
Out [1]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```
Notice that it generates numbers up to, but not including 10. 

If we need to generate every $n^{th}$ number, then we set the ```step``` parameter to $n$. ```range``` interprets our parameters based on their order, so we'll explicitly set ```start``` to 0, ```stop``` to 10 and ```step``` to 2.
```python
In  [2]: list(range(0,10,2))
Out [2]: [0, 2, 4, 6, 8]
```
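
Here's a short sketch of that laziness in action, plus a negative ```step``` for counting down. The variable name ```numbers``` is just for this example.

```python
numbers = range(0, 10, 2)

print(numbers)          # shows the range object itself, not the numbers
print(list(numbers))    # list() forces the numbers to be generated
print(len(numbers))     # a range knows its length without making a list

# A negative step counts down; 'stop' is still excluded
countdown = list(range(10, 0, -3))
print(countdown)
```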

In [17]:
triplet_list = []
for i in range(0,len(seq),3):
    triplet = seq[i:i+3]
    triplet_list.append(triplet)

triplet_list[0:10] # Select the first 10 triplets to display

['ATG', 'ACA', 'ATT', 'AAA', 'GAA', 'ATG', 'CCT', 'CAG', 'CCA', 'AAA']
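
The same splitting can also be written as a list comprehension, which is a compact way of building a list from a loop. This is only an alternative sketch, not a replacement for the loop above; ```short_seq``` is a made-up 11-base sequence so that the last piece is incomplete.

```python
short_seq = 'ATGACAATTAA'   # hypothetical example, length not a multiple of 3

# Slice out three letters at a time, starting at 0, 3, 6, ...
codons = [short_seq[i:i+3] for i in range(0, len(short_seq), 3)]
print(codons)

# Keep only the complete codons
complete = [codon for codon in codons if len(codon) == 3]
print(complete)
```

Slicing past the end of a string doesn't raise an error; it just returns a shorter piece, which is why the leftover ```'AA'``` appears in the first list.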

In [18]:
translation = []

for i in triplet_list:
    if len(i) == 3:
        aa = CodonTable[i]
        translation.append(aa)
print(translation[0:10])

['M', 'T', 'I', 'K', 'E', 'M', 'P', 'Q', 'P', 'K']


In [19]:
translation_string = ''.join(translation)

print(translation_string)

MTIKEMPQPKTFGELKNLPLLNTDKPVQALMKIADELGEIFKFEAPGRVTRYLSSQRLIKEACDESRFDKNLSQALKFVRDFAGDGLFTSWTHEKNWKKAHNILLPSFSQQAMKGYHAMMVDIAVQLVQKWERLNADEHIEVPEDMTRLTLDTIGLCGFNYRFNSFYRDQPHPFITSMVRALDEAMNKLQRANPDDPAYDENKRQFQEDIKVMNDLVDKIIADRKASGEQSDDLLTHMLNGKDPETGEPLDDENIRYQIITFLIAGHETTSGLLSFALYFLVKNPHVLQKAAEEAARVLVDPVPSYKQVKQLKYVGMVLNEALRLWPTAPAFSLYAKEDTVLGGEYPLEKGDELMVLIPQLHRDKTIWGDDVEEFRPERFENPSAIPQHAFKPFGNGQRACIGQQFALHEATLVLGMMLKHFDFEDHTNYELDIKETLTLKPEGFVVKAKSKKIPLGGIPSPSTEQSAKKVRKKGC_
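
To reuse these steps on other sequences, they can be wrapped up in a single function. The sketch below is one possible way to do it, under a few assumptions of my own: the name ```translate```, the ```stop_symbol``` parameter, and the choice to stop at the first stop codon rather than keeping the ```_``` in the output. ```mini_table``` is a hypothetical excerpt of the codon table, just big enough for the example.

```python
def translate(dna, codon_table, stop_symbol='_'):
    """Translate dna codon by codon, stopping at the first stop codon."""
    dna = dna.upper()                        # the table keys are upper case
    protein = []
    for i in range(0, len(dna) - 2, 3):      # visits complete triplets only
        aa = codon_table[dna[i:i+3]]
        if aa == stop_symbol:
            break                            # assumption: stop at a stop codon
        protein.append(aa)
    return ''.join(protein)

# Hypothetical excerpt of the codon table, enough for this short example
mini_table = {'ATG': 'M', 'ACA': 'T', 'ATT': 'I', 'AAA': 'K', 'TAA': '_'}

peptide = translate('atgacaattaaataagg', mini_table)
print(peptide)
```

With the full ```CodonTable``` from earlier, the same function should work on ```seq``` as well.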
