# Python: the basics

## Using Jupyter notebooks: a quick tour

***Insert -> Insert Cell Below***

Type Python code in the cell, eg:

```
print("Hello Jupyter !")
```

***Shift-Enter*** to run the contents of the cell

When the text to the left of the cell reads `In [*]` (an asterisk rather than a number), the cell is still running. It's usually best to wait until one cell has finished before running the next.

In [156]:
print("Hello Jupyter !")

Hello Jupyter !


In Jupyter, just typing the name of a variable in the cell prints its representation:

In [157]:
message = "Hello again !"
message

'Hello again !'

In [158]:
# A 'hash' symbol denotes a comment
# This is a comment. Anything after the 'hash' symbol on the line is ignored by the Python interpreter

print("No comment")  # comment

No comment


## Variables and data types
### Integers, floats, strings

In [159]:
a = 5

In [160]:
a

5

In [161]:
type(a)

int

Adding a decimal point creates a `float`

In [162]:
b = 5.0

In [163]:
b

5.0

In [164]:
type(b)

float

`int` and `float` are collectively called 'numeric' types

(There are other numeric types too, such as `complex` for complex numbers. Hexadecimal is not a separate type: a literal like `0xff` is simply an `int`, and `hex()` returns a string.)

## Challenge

What is the type of the variable `letters` defined below ?

`letters = "ABACBS"`

In [291]:
letters = "ABACBS"
type(letters)

str

### Strings

In [280]:
some_words = "Python3 strings are Unicode (UTF-8) ❤❤❤ 😸 蛇"

In [281]:
some_words

'Python3 strings are Unicode (UTF-8) ❤❤❤ 😸 蛇'

In [282]:
type(some_words)

str

In [283]:
more_words = 'You can use "single" quotes'
more_words

'You can use "single" quotes'

In [284]:
triple_quoted_multiline = """In the last years of the nineteenth century,
human affairs were being watched from the timeless worlds of space.
Nobody would have believed that we were being scrutinized as a ....

.. etc ..
"""

print(triple_quoted_multiline)

In the last years of the nineteenth century,
human affairs were being watched from the timeless worlds of space.
Nobody would have believed that we were being scrutinized as a ....

.. etc ..



In [298]:
# You can substitute variables into a string like this.
# The variables listed after the string replace each `{0}`, `{1}` etc, in order

formatted = "{0} and BTW, did I mention that {1}".format(more_words, some_words)
print(formatted)

# The example above is 'new-style' string formatting. 
# You may also see 'old-style' (C-style) string formatting in examples, which looks like: 

oldskool = "%s and BTW, did I mention that %s" % (more_words, some_words)

# There are lots of fancy ways to format numbers in strings (eg number of decimal places, scientific notation)
# that we won't go into today. See: https://pyformat.info/

You can use "single" quotes and BTW, did I mention that Python3 strings are Unicode (UTF-8) ❤❤❤ 😸 蛇
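
Since Python 3.6 there is also a third option, the f-string, which puts the expression directly inside the braces. The short sketch below is an aside rather than part of the original session; the names `name` and `ratio` are made up for the example. It also shows one or two of the number format codes mentioned above.

```python
# f-strings (Python 3.6+): prefix the string with 'f' and put expressions in {}
name = "Jupyter"   # hypothetical example values
ratio = 2 / 3

greeting = f"Hello {name} !"
two_places = f"{ratio:.2f}"               # fixed-point, two decimal places
scientific = "{:.3e}".format(12345.678)   # the same format codes work with .format()

print(greeting)
print(two_places)
print(scientific)
```

This prints `Hello Jupyter !`, `0.67` and `1.235e+04`.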


## Operators

`+`  `-`  `*`  `/`  `%`  `**`  `//`  

`+=`  `*=`  `-=`  `/=`

In [170]:
# int + int = int
a = 5
a + 1

6

In [171]:
# float + int = float
b = 5.0
b + 1

6.0

In [172]:
a + b

10.0

In [173]:
some_words = "Python3 strings are Unicode (UTF-8) ❤❤❤ 😸 蛇"
a = 16
a + some_words

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [307]:
str(a) + " " + some_words

'16 Python3 strings are Unicode (UTF-8) ❤❤❤ 😸 蛇'

In [308]:
# Multiplication
a * 10

160

In [309]:
# Division
a / 2

8.0

In [310]:
# Power
a**2

256

In [311]:
# Modulus - divide as whole numbers and return the remainder
a % 2

0

In [312]:
# Shorthand: operators with assignment
a += 1
a

17
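
The operator list at the top of this section also includes `//` and the other assignment shorthands, which the cells above don't exercise. A minimal sketch, using a made-up value `n`:

```python
n = 17   # hypothetical example value

print(n / 5)    # true division always gives a float
print(n // 5)   # floor division rounds down to a whole number
print(n % 5)    # remainder left over from the floor division
print(-n // 5)  # rounds towards negative infinity, not towards zero

total = 10
total -= 4      # same as total = total - 4
total *= 3      # same as total = total * 3
total /= 2      # same as total = total / 2 (the result is a float)
print(total)
```

The five lines printed are `3.4`, `3`, `2`, `-4` and `9.0`.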

## Lists and sequence types

### Lists

In [313]:
numbers = [2, 4, 6, 8, 10]
numbers

[2, 4, 6, 8, 10]

In [314]:
len(numbers)

5

In [315]:
# Lists can contain multiple data types
mixed_list = ["asdf", 2, 3.142]
mixed_list

['asdf', 2, 3.142]

In [316]:
list_of_lists = [mixed_list, numbers, ['a', 'b', 'c']]
list_of_lists

[['asdf', 2, 3.142], [2, 4, 6, 8, 10], ['a', 'b', 'c']]

In [317]:
numbers[0]

2

In [318]:
numbers[3]

8

In [319]:
numbers[3] = numbers[3] * 100
numbers

[2, 4, 6, 800, 10]

In [320]:
numbers.append(12)
numbers

[2, 4, 6, 800, 10, 12]

In [321]:
numbers.extend([14, 16, 18])
numbers

[2, 4, 6, 800, 10, 12, 14, 16, 18]

In [322]:
# The '+' operator joins lists like list.extend(), but returns a new list and leaves 'numbers' unchanged
numbers + [100, 200, 300, 400]

[2, 4, 6, 800, 10, 12, 14, 16, 18, 100, 200, 300, 400]

### Tuples

In [323]:
tuples_are_immutable = ("bar", 100, 200, "foo")
tuples_are_immutable

('bar', 100, 200, 'foo')

In [324]:
tuples_are_immutable[1]

100

In [325]:
tuples_are_immutable[1] = 666

TypeError: 'tuple' object does not support item assignment

### Sets

In [326]:
unique_items = set([1, 1, 2, 2, 3, 4, 1, 2, 3, 4])
# or curly brackets
# unique_items = {1, 1, 2, 2, 3, 4, 1, 2, 3, 4}
unique_items

{1, 2, 3, 4}
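
Beyond removing duplicates, sets support fast membership tests and the usual set algebra. A small sketch, with two made-up sets `first` and `second`:

```python
first = {1, 2, 3, 4}    # hypothetical example data
second = {3, 4, 5}

print(3 in first)       # membership test
print(first | second)   # union: items in either set
print(first & second)   # intersection: items in both sets
print(first - second)   # difference: items only in 'first'

first.add(4)            # adding an item that is already present changes nothing
print(len(first))
```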

### Slicing

In [327]:
numbers = [2, 4, 6, 8, 10, 12]

# list[start:end]
# start is inclusive, end isn't

numbers[0:3]

[2, 4, 6]

In [328]:
numbers[4:7]

[10, 12]

In [329]:
numbers[:3] # omitting start implies 0 (the very start)

[2, 4, 6]

In [330]:
numbers[3:] # omitting end means to the very end eg len(numbers)

[8, 10, 12]

In [331]:
numbers[-1:] # negative indices count back from the end of the list

[12]

In [332]:
numbers[:-1]

[2, 4, 6, 8, 10]

In [333]:
# you can also specify a step size
# list[start:end:step]

numbers[0:6:2]

[2, 6, 10]
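
Negative indices and steps can be combined with the slice syntax above. The sketch below uses a separate list, `evens`, holding the same values as `numbers` so that it runs on its own:

```python
evens = [2, 4, 6, 8, 10, 12]   # same values as 'numbers' above

print(evens[-1])     # a single negative index: the last item
print(evens[-2:])    # the last two items
print(evens[::2])    # start and end omitted: every second item
print(evens[::-1])   # a negative step walks backwards, giving a reversed copy
```

This prints `12`, `[10, 12]`, `[2, 6, 10]` and `[12, 10, 8, 6, 4, 2]`.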

In [334]:
# [:] is a shorthand for copying a list.
# Equivalent to:
# n_copy = list(numbers)

n_copy = numbers[:]
n_copy

[2, 4, 6, 8, 10, 12]

In [335]:
# Changing the copy leaves the original list untouched
n_copy[3] = 800
n_copy

[2, 4, 6, 800, 10, 12]

In [336]:
numbers

[2, 4, 6, 8, 10, 12]

## Challenge

Given the list: `['banana', 'cherry', 'strawberry', 'orange']`

Return a list of just the red fruits.

In [337]:
fruits = ['banana', 'cherry', 'strawberry', 'orange']
red_ones = fruits[1:3]
red_ones

['cherry', 'strawberry']

### Dictionaries

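Dictionaries store a mapping of key-value pairs. Since Python 3.7 they preserve insertion order; in older versions (which produced some of the output shown here) the order was arbitrary.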

Other programming languages might call this a 'hash', 'hashtable' or 'hashmap'.

In [338]:
pairs = {'Apple': 1, 'Orange': 2, 'Pear': 4}
pairs

{'Apple': 1, 'Orange': 2, 'Pear': 4}

In [339]:
pairs['Orange']

2

In [340]:
pairs['Orange'] = 16
pairs

{'Apple': 1, 'Orange': 16, 'Pear': 4}

In [341]:
pairs.items()
# list(pairs.items())

dict_items([('Pear', 4), ('Orange', 16), ('Apple', 1)])

In [342]:
pairs.values()
# list(pairs.values())

dict_values([4, 16, 1])

In [343]:
pairs.keys()
# list(pairs.keys())

dict_keys(['Pear', 'Orange', 'Apple'])

In [344]:
len(pairs)

3

In [345]:
dict_of_dicts = {'first': {1:2, 2: 4, 4: 8, 8: 16}, 'second': {'a': 2.2, 'b': 4.4}}
dict_of_dicts

{'first': {1: 2, 2: 4, 4: 8, 8: 16}, 'second': {'a': 2.2, 'b': 4.4}}
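
A few more everyday dictionary operations, sketched with a made-up dictionary `stock` so the example stands alone:

```python
stock = {'Apple': 1, 'Orange': 2}   # hypothetical example data

print('Apple' in stock)       # 'in' tests the keys, not the values
print(stock.get('Kiwi'))      # .get() returns None instead of raising KeyError
print(stock.get('Kiwi', 0))   # ... or a default of your choice

stock['Kiwi'] = 8             # assigning to a new key adds it
del stock['Apple']            # del removes a key and its value
print(stock)
```

Looking up a missing key with square brackets (`stock['Plum']`) raises a `KeyError`, which is why `.get()` is handy when a key may be absent.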

## Functions

Functions wrap up reusable pieces of code, following the *DRY* (Don't Repeat Yourself) principle

Significant whitespace: the body of the function is indicated by indenting by 4 spaces

*(We also use these indented blocks for if/else, for and while statements .. later !)*

`return` statements immediately return a value (or `None` if no value is given)

Any code in the function after the `return` statement does not get executed.

In [346]:
def square(x):
    return x**2

def hyphenate(a, b):
    return a + '-' + b
    print("We will never get here")

print(square(16), hyphenate('python', 'esque'))

256 python-esque


### Indentation and whitespace

* Python uses spaces at the start of a line to indicate a 'block' of code.
* A new block of code should be indented by **four** spaces.

* For a function, all the indented code is part of the function.
* This also applies to loops like `for` and `while` and conditionals like `if`

(Indenting/dedenting by four spaces in Python is equivalent to opening **{** and closing **}** curly brackets in languages like Java, Javascript, C, C++, C# etc)

(You can technically use tab characters, but please don't. The official Python style guide prefers spaces https://www.python.org/dev/peps/pep-0008/).
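
The rules above can be sketched with a small function containing a loop and a conditional. The function name `describe` is made up for illustration; the comments mark which block each line belongs to.

```python
def describe(values):
    # Four spaces: everything at this level belongs to the function
    labels = []
    for value in values:
        # Eight spaces: inside the 'for' loop, inside the function
        if value % 2 == 0:
            # Twelve spaces: inside the 'if', inside the loop
            labels.append('even')
        else:
            labels.append('odd')
    # Back to four spaces: the loop has ended, but we are still in the function
    return labels

# No indentation: the function definition has ended
print(describe([1, 2, 3]))
```

This prints `['odd', 'even', 'odd']`.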

In [347]:
# Functions can return multiple values (just return a tuple and unpack it)
def lengths(a, b, c):
    return len(a), len(b), len(c)

x, y, z = lengths("long", "longer", "LONGEREST")
print(x, y, z)

4 6 9


In [348]:
def split_at(seq, residue='K'):
    """
    Takes a protein sequence (as a string) and splits it at each K residue,
    or the residue specified in the `residue` keyword argument. Split point
    residue is discarded.
    
    Returns a list of strings.
    """
    return seq.split(residue)

split_at('MILKGROGDRINKPINEAPPLE')

['MIL', 'GROGDRIN', 'PINEAPPLE']

In [349]:
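# The previous example isn't a good proteolytic digest since the 'K' is removed
# For the record ... here's a better version that is more like a real tryptic digest
# (still simplified: regex matches can't overlap, so adjacent cut sites such as 'KR' are only cut once)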
import re

def digest(seq, cut_regex=r'[KR][^P]'):
    """
    Takes a protein sequence (as a string) and splits it after 
    each K or R residue, except if followed by a P.
    
    Returns a list of strings.
    """
    cut_indices = list(re.finditer(cut_regex, seq.upper()))
    peptides = []
    i = 0
    for j in cut_indices:
        peptides.append(seq[i:j.start()+1])
        i = j.start()+1
    peptides.append(seq[i:])
    return peptides

digest('MILKGROGDRINKPINEAPPLE')

['MILK', 'GR', 'OGDR', 'INKPINEAPPLE']

In [350]:
digest('MILKYGROGFPCE', cut_regex=r'[WYF][^P]')

['MILKY', 'GROGFPCE']

In [351]:
# Functions can have an indeterminate number of arguments and keyword arguments using * and **
import math

def vector_magnitude(x, y, *args, **kwargs):
    
    # print(args)    # args is a tuple
    # print(kwargs)  # kwargs is a dictionary
    
    scale = kwargs.get('scale', 1)
    
    vector = [x,y] + list(args)
    return math.sqrt(sum(v**2 for v in vector)) * scale

In [352]:
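# 'm' isn't a keyword that vector_magnitude looks for: it lands in kwargs and is ignored,
# so 'scale' keeps its default of 1. Pass scale=2 to double the result.
print(vector_magnitude(1, 2, 4, 8, m=2))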

9.219544457292887
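
The `*` and `**` syntax also works in the other direction, unpacking a list or dictionary into arguments at the call site. A minimal sketch, using a hypothetical helper `describe_call` that just reports what it received:

```python
def describe_call(*args, **kwargs):
    """Hypothetical helper: return the positional and keyword arguments received."""
    return args, kwargs

coords = [1, 2, 4, 8]
options = {'scale': 2}

# Equivalent to writing describe_call(1, 2, 4, 8, scale=2)
received_args, received_kwargs = describe_call(*coords, **options)
print(received_args)     # a tuple
print(received_kwargs)   # a dictionary
```

This prints `(1, 2, 4, 8)` and `{'scale': 2}`.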


In [353]:
# One subtle gotcha ... mutable default arguments might not work the way you expect
def add_nls(seq, extra_residues=['K', 'K', 'R', 'K']):
    """
    Adds a nuclear localisation signal to the C-terminal end of
    the provided sequence (string). The signal sequences can be customized 
    with the extra_residues keyword argument (a list of residues).
    
    Returns a string.
    """

    suffixed = seq + ''.join(extra_residues)
    # Here we modify the extra_residues list
    extra_residues.append('-')
    
    return suffixed

add_nls("MILKGROG")

'MILKGROGKKRK'

In [354]:
add_nls("MILKGROG")

'MILKGROGKKRK-'

In [355]:
# The default list is created only once, when the function is defined, and is then shared by every call !!
add_nls("MILKGROG")

'MILKGROGKKRK--'

In [356]:
# The safe way to define default values when they are mutable types (lists, dicts)
def add_nls(seq, extra_residues=None):
    
    # Set the keyword arg default inside the function
    if extra_residues is None:
        extra_residues = ['K', 'K', 'R', 'K']
        
    suffixed = seq + ''.join(extra_residues)
    # Here we modify the extra_residues list
    extra_residues.append('-')
    
    return suffixed

print(add_nls("MILKGROG"), add_nls("MILKGROG"))

MILKGROGKKRK MILKGROGKKRK
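
To see where the shared default lives, you can inspect a function's `__defaults__` attribute. The sketch below uses a made-up function, `remember`, that deliberately repeats the risky pattern:

```python
def remember(item, seen=[]):
    """Hypothetical function with a (deliberately risky) mutable default."""
    seen.append(item)
    return seen

print(remember.__defaults__)   # the default list, created once at definition time
remember('a')
remember('b')
print(remember.__defaults__)   # the very same list object has grown
```

This prints `([],)` and then `(['a', 'b'],)`: both calls appended to the one list stored on the function.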


In [357]:
# Names can refer to functions - functions can be passed into functions
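
As a small sketch of that idea (the helpers `apply_twice` and `double` are made up for illustration), a function can be given a second name, handed to another function, and called from there:

```python
def apply_twice(func, value):
    """Hypothetical helper: call func on value, then on the result."""
    return func(func(value))

def double(x):
    return x * 2

shout = str.upper   # a second name for an existing function
print(apply_twice(double, 5))
print(shout('quiet'))

# Built-ins such as sorted() accept a function through the 'key' argument
print(sorted(['pear', 'fig', 'banana'], key=len))
```

This prints `20`, `QUIET` and `['fig', 'pear', 'banana']`.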

In [358]:
# Passing the function 'square' into map() applies it to every item in the list
list(map(square, [1, 2, 3, 4]))

[1, 4, 9, 16]

In [359]:
# The equivalent, using a lambda: a throw-away, one-line function
list(map(lambda x: x**2, [1,2,3,4]))

[1, 4, 9, 16]

In [360]:
# Lambdas are usually used as 'anonymous functions', but they can be named
# just like functions defined with 'def'
sqr = lambda x: x**2
sqr(10)

100

## Conditionals

In [361]:
a = 10
b = 0
a > 1

True

In [362]:
if a > 1:
    print("a is greater than one")

a is greater than one


In [363]:
word = 'Bird'

# Note: Double equals for a conditional vs single equals for assignment !
if word == 'Bird':
    print('Bird is the word.')
    
if word != 'Girt':
    print('The word is not girt.')

Bird is the word.
The word is not girt.


In [364]:
if 'ird' in word:
    print("'ird' is in Bird.")
    
letters = ['B', 'i', 'r', 'd']
if 'i' in letters:
    print("'i' is in letters.")

'ird' is in Bird.
'i' is in letters.


*Protip*: Long lines can be split across two or more using a backslash ('\')

This can make your code more readable.

There should be nothing after the backslash, including whitespace.

Try to keep lines to 79 characters or fewer for a PEP-8 style bonus.

In [365]:
if 'I' not in 'team' and \
   'I' not in 'TEAM':
    print("There is no 'I' in team (or TEAM).")

There is no 'I' in team (or TEAM).
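
An alternative that PEP 8 generally prefers is to wrap the expression in parentheses, since Python lets anything inside brackets run over several lines without a backslash. A minimal sketch, with a made-up variable `word`:

```python
word = 'team'   # hypothetical example value

# No backslash needed: the open parenthesis lets the condition span two lines
if ('I' not in word and
        'I' not in word.upper()):
    message = "There is no 'I' in team (or TEAM)."
else:
    message = "Found an 'I'."

print(message)
```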


In [366]:
# Boolean logic
# True and True => True
a > 1 and b <= 0

True

In [367]:
# True or False => True
a > 1 or b > 1

True

In [368]:
if a > 100:
    print("a is greater than one hundred")
elif a > 50:
    print("a is greater than fifty but less than one hundred")
else:
    print("a is less than fifty")
    
# Before Python 3.10 there was no case/switch statement, so you'll mostly see if/elif/elif/else.
# Python 3.10 added 'match'/'case', which can serve a similar purpose.

a is less than fifty


In [369]:
# Truthiness
if a:
    print("A non-zero int is truthy")

if not (a - 10):
    print("The int 0 is 'falsey' ... not False => True !")

if '' or [] or () or dict():
    print("We will never see this since an empty string, list, tuple and dict are all 'falsey'")
    
if "    ":
    print("A non-empty string, even whitespace, is 'truthy'")

A non-zero int is truthy
The int 0 is 'falsey' ... not False => True !
A non-empty string, even whitespace, is 'truthy'


## Loops

In [370]:
def line():
    print('-'*78)

A `for` loop works on any iterable: sequence types, generators and iterators

(this includes lists, tuples and strings, as well as dictionaries, which are iterable but not sequences)

In [371]:
for letter in "ABCD..meh":
    print(letter)

A
B
C
D
.
.
m
e
h


In [372]:
ts = [('Z', 99), ('Y', 98), ('X', 97)]

for t in ts:
    print(t)
    
# using tuple unpacking
for m, n in ts:
    print(m, n)

('Z', 99)
('Y', 98)
('X', 97)
Z 99
Y 98
X 97


In [373]:
# Looping over dictionary.items()
# (the output order shown comes from an older Python; since 3.7 dicts keep insertion order)
d = {'A': 1, 'B': 2, 'C': 3}

for item in d.items():
    # print(type(item))
    print(item)

('C', 3)
('A', 1)
('B', 2)


In [374]:
for k, v in d.items():
    print(k, v)

C 3
A 1
B 2
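
Three built-ins come up constantly alongside `for` loops: `range()`, `enumerate()` and `zip()`. A short sketch with made-up lists `labels` and `counts`:

```python
labels = ['A', 'B', 'C']   # hypothetical example data
counts = [10, 20, 30]

squares = []
for i in range(3):                          # range(3) yields 0, 1, 2
    squares.append(i ** 2)

numbered = []
for index, label in enumerate(labels):      # position and item together
    numbered.append((index, label))

paired = []
for label, count in zip(labels, counts):    # walk two lists in step
    paired.append((label, count))

print(squares)
print(numbered)
print(paired)
```

Both `enumerate()` and `zip()` produce tuples, so the tuple unpacking shown earlier (`for m, n in ts`) applies here too.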


`while` loops keep looping while their condition is true:

```
while some_condition:
    do_stuff()
```

Note: If the condition for your `while` loop never becomes `False`, the loop will run forever (in Jupyter you can do *Kernel -> Interrupt* to break out of the infinite loop).

In [375]:
a = 0
while a < 16:
    print(a, end=' ')
    a += 1

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 

`break` immediately exits a loop

`continue` immediately starts the next iteration of the loop

Any code inside the loop after a `break` or `continue` is skipped.

In [376]:
a = 0
while True:
    a += 1
    
    if a > 16:
        break
        print('We will never see this.')
    
    if a % 2:
        continue
        print('We will also never see this.')
        
    print(a, end=' ')

2 4 6 8 10 12 14 16 

### List comprehensions

List comprehensions are a shorthand way to loop over a list (or any other iterable), modify the items and create a new list.

In [377]:
# Instead of doing
new_list = []
for i in range(0,11):
    new_list.append(i**2)

new_list

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [378]:
# Use a list comprehension instead
new_list = [i**2 for i in range(0,11)]
new_list

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [379]:
# You can also `filter` values using an if statement inside the list comprehension
new_list = [i**2 for i in range(0,11) if i < 4]
new_list

[0, 1, 4, 9]
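
The same syntax extends to dictionaries and sets, and dropping the brackets inside a function call gives a generator expression that never builds the intermediate list. A brief sketch (variable names are illustrative):

```python
squares_by_number = {i: i**2 for i in range(4)}   # dict comprehension
remainders = {i % 3 for i in range(10)}           # set comprehension
total = sum(i**2 for i in range(4))               # generator expression

print(squares_by_number)
print(remainders)
print(total)
```

This prints `{0: 0, 1: 1, 2: 4, 3: 9}`, `{0, 1, 2}` and `14`.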

***End part 1. Stand up and stretch for a moment.***

In [380]:
# A brief look at classes. No detail, except to show:
#     attributes, instance and class variables, private __underscore naming
#     methods
#     __init__ constructor (including best practice on mutable args)
# Not covered:
#     Inheritance, super
#     Mixins / multiple inheritance

class ProteinSequence:
    """A class representing a peptide or protein sequence."""
    
    # These are 'class variables', the value is shared by every instance of the class
    pos_aa = ['R', 'K', 'H']
    neg_aa = ['D', 'E']
    
    def __init__(self, seq):
        # This defines an 'instance variable' with a value unique to the instance
        self.seq = seq
        self.__private_number = 42

    def bogus_charge(self):
        """
        Calculates the overall charge of the peptide, assuming full protonation of 
        basic residues and full deprotonation of acidic residues.
        
        Returns an integer value.
        """
        pos_charge = sum([self.seq.count(aa) for aa in self.pos_aa])
        neg_charge = sum([self.seq.count(aa) for aa in self.neg_aa])
        return pos_charge - neg_charge
    
    def get_number(self):
        return self.__private_number

s = ProteinSequence('MKNVLREDEDD')
s.bogus_charge()

-3

In [381]:
# Creating a new instance doesn't affect the original instance 's'
# (eg, seq is an instance variable, not a class variable)
t = ProteinSequence('EEEEEEEEEEEE')
s.seq

'MKNVLREDEDD'

In [382]:
# Instance variables are public by default - you can just reassign them
s.seq = 'MKNVLRERKKR'
print(s.seq, s.bogus_charge())

MKNVLRERKKR 5


In [383]:
# Modifications to class variables are reflected for every instance (you'd rarely do this)
ProteinSequence.pos_aa = ['H']
print(t.pos_aa, s.pos_aa)

['H'] ['H']


In [384]:
# Assigning to a 'class variable' via an instance creates an instance variable that hides the class variable
t.pos_aa = ['X']
print(t.pos_aa, s.pos_aa)

['X'] ['H']


In [385]:
ProteinSequence.pos_aa = ['Y']
print(t.pos_aa, s.pos_aa)

['X'] ['Y']


In [386]:
# Double underscore variables are 'private' and aren't easily accessible from outside the class
t.__private_number

AttributeError: 'ProteinSequence' object has no attribute '__private_number'

In [387]:
# But are accessible via self inside the namespace of the class
t.get_number()

42

In [388]:
# Note how I said 'aren't easily accessible' - here's how Python name-mangles private variables
dir(t)

['_ProteinSequence__private_number',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'bogus_charge',
 'get_number',
 'neg_aa',
 'pos_aa',
 'seq']

In [389]:
t._ProteinSequence__private_number

42

## Variables are names that reference a value
For more, see: https://nedbatchelder.com/text/names.html

In [390]:
# Variables are names referencing values, like tags that can be moved around
a = 1
b = 2
b = a
a = 3
# 'b' still references '1'
b

1

In [391]:
# Each slot in a list is a reference to a value
# Assigning an additional name to a list doesn't copy the list - so two names can point to a single 'shared' list

a = [1, 2, 3]
b = [4, 5, 6]
# The names 'b' and 'c' can be reassigned to both point to the same list as 'a'
b = a
c = a
print("a: %s\nb: %s\nc: %s" % (a, b, c))

a: [1, 2, 3]
b: [1, 2, 3]
c: [1, 2, 3]


In [392]:
# We can reassign the list that 'a' references - which won't change the list that 'b' and 'c' reference
a = [7, 8, 9]
print("a: %s\nb: %s\nc: %s" % (a, b, c))

a: [7, 8, 9]
b: [1, 2, 3]
c: [1, 2, 3]


In [393]:
# Since 'b' and 'c' point to the same list, if we modify an element via the 'c' variable name,
# this change is reflected in 'b' - after all, they reference the same list.
c[0] = 1000
print(b)

[1000, 2, 3]


In [394]:
# If we assign a list element to another list, that element (c[1]) references the list 'a' (not a copy of 'a' !)
c[1] = a
c

[1000, [7, 8, 9], 3]

In [395]:
# So changing 'a' is reflected in the nested list that 'c' references
a[0] = 2000
c

[1000, [2000, 8, 9], 3]
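
You can check whether two names point at the same object with the `is` operator, whereas `==` only compares contents. A self-contained sketch with made-up names `first`, `alias` and `clone`:

```python
first = [1, 2, 3]
alias = first          # another name for the same list
clone = list(first)    # a new list with the same contents

print(alias is first)  # True: same object
print(clone is first)  # False: a different object ...
print(clone == first)  # True: ... with equal contents

alias.append(4)
print(first)           # changed, because 'alias' and 'first' share one list
print(clone)           # unchanged
```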

In [396]:
# The same principle applies to dictionaries
dict_of_dicts['second'] = pairs
dict_of_dicts

{'first': {1: 2, 2: 4, 4: 8, 8: 16},
 'second': {'Apple': 1, 'Orange': 16, 'Pear': 4}}

In [397]:
# If we add a key/value pair to 'pairs', 
# we can see the change via dict_of_dicts since the 'second' key references the same dict as 'pairs'
pairs['Plum'] = 64
dict_of_dicts

{'first': {1: 2, 2: 4, 4: 8, 8: 16},
 'second': {'Apple': 1, 'Orange': 16, 'Pear': 4, 'Plum': 64}}

### Pass by reference / pass by value ... argh ....

In [398]:
def f(x):
    x = 3.142

In [399]:
# Assigning to a parameter inside a function only rebinds the local name 'x',
# so to the caller it looks as if the function received a 'copy'
z = 1
f(z)
z

# z isn't changed by being passed into the function; only the local name x
# inside the function is pointed at a new value

1

In [400]:
def g(a_list):
    a_list = [6, 7, 8]
    
z = [1, 2, 3]
g(z)
z

# If a local variable inside a function is completely reassigned, the original
# data passed in ('z') is left unchanged.

[1, 2, 3]

In [401]:
def mutator(a_list):
    a_list[0] = 999
    
z = [1, 2, 3]
mutator(z)
z

# However, if we don't completely reassign `a_list` but modify its elements, `z` is mutated since each slot
# in the list is a reference to the same underlying data.

# This can be confusing. We need the 'labels on data' diagram to make it clearer.

[999, 2, 3]
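
If a function needs to change a list without affecting the caller, one common approach is to copy it first. The sketch below uses a hypothetical `safe_mutator` and also shows the limit of a shallow copy, using `copy.deepcopy` from the standard library:

```python
import copy

def safe_mutator(a_list):
    """Hypothetical variant of mutator() that leaves its argument alone."""
    result = list(a_list)   # shallow copy: a new outer list
    result[0] = 999
    return result

original = [1, 2, 3]
changed = safe_mutator(original)
print(original, changed)

# A shallow copy still shares any nested lists; copy.deepcopy() copies those too
nested = [[1, 2], [3, 4]]
shallow = list(nested)
deep = copy.deepcopy(nested)
nested[0][0] = 'X'
print(shallow[0][0], deep[0][0])
```

This prints `[1, 2, 3] [999, 2, 3]` and then `X 1`: the shallow copy sees the change to the inner list, the deep copy does not.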

## Neat features

In [402]:
# Context managers: 'with'
# Iterators / Generators
# Decorators
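
The three features named above are only listed, not demonstrated, so here is a brief taste of each. This is an illustrative sketch; the names `countdown`, `shout` and `greet` are made up, and `io.StringIO` stands in for a real file so the example runs anywhere.

```python
import functools
import io

# Context manager: 'with' closes the file-like object when the block ends
with io.StringIO("line one\nline two\n") as handle:
    lines = [line.strip() for line in handle]

# Generator: 'yield' hands back one value at a time, computed on demand
def countdown(n):
    while n > 0:
        yield n
        n -= 1

# Decorator: a function that wraps another function
def shout(func):
    @functools.wraps(func)   # keeps the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@shout
def greet(name):
    return "hello " + name

print(lines)
print(list(countdown(3)))
print(greet("jupyter"))
```

This prints `['line one', 'line two']`, `[3, 2, 1]` and `HELLO JUPYTER`.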