# Python

Python is a modern, object-oriented programming language.  Its main strengths are its readability, its large collection of optimised libraries and its interactivity, which make it an excellent place to begin learning to code.  It is fast becoming the default language for machine learning and data science and, as such, is one of the skills employers most often ask of data scientists.  Here we give a simple introduction to the language to get you started.  You should also look through *A Whirlwind Tour of Python* here: https://github.com/jakevdp/WhirlwindTourOfPython


## Installation

Python should already exist on your department machine (depending on your department) and be easily accessible, though you may need to load some modules first.  See your local computer officer or web pages for more info.  You will also want to check the version available (`python --version`).  We will be working in *Python 3*, which has some significant changes from *Python 2*.

To get it on your laptop or home computer there are several options, but using the `conda` package and environment manager is probably the easiest.  Just go to https://conda.io/miniconda.html and install Miniconda.  You could instead install Anaconda from https://www.anaconda.com/download/, which includes everything in Miniconda plus a bunch of other packages for all sorts of other things (and languages); it even comes with a graphical installer and a graphical package manager if you would rather not use the terminal.  The downside is that it's quite large (3 GB), and you can install any of Anaconda's packages with Miniconda later on anyway if you want them.

Once installed you need to create an *environment*:

In [None]:
conda create --name myfirstenvironment

This will create an *environment* called `myfirstenvironment` (you should think of a better name).  An environment is just a container for all the bits for a project.  This keeps them separate from your system so it's harder to break things.

To access this environment you type (on conda 4.4 or later, `conda activate myfirstenvironment` also works on every platform)

In [None]:
activate myfirstenvironment (Win)
source activate myfirstenvironment (Mac, Linux)

To leave the environment you type (or `conda deactivate` on conda 4.4 or later):

In [None]:
deactivate (Win)
source deactivate (Mac, Linux)

Documentation about *conda*  is available on the website: https://conda.io/docs/index.html.  For now you don't need to know much more than this.

## Using python

There are a few ways to use Python.  The easiest is with Jupyter notebooks (like this one!).  You will have to install the package in your environment with conda,

conda install jupyter

then you can launch jupyter with:

In [None]:
jupyter notebook

which brings up a window in your web browser where you can create a notebook.  The terminal then shows a log of all the background stuff happening in Jupyter; you can get your prompt back by running it in the background (sticking a `&` at the end).  You can then just type in code and run it in the window, a lot like in Mathematica, e.g.:

In [None]:
1+2

Jupyter is good for fiddling about, creating plots and writing code with notes attached, as it can handle basic LaTeX for formulas, e.g.

$M_l(a,b,c) = \int dx j_l(a x) j_l(b x) j_l(c x)$

and it can even export the notebook as .tex, .html and .pdf for sharing or using in papers and talks.  But for larger projects you will probably want to use IPython, which you start by typing

In [None]:
$ ipython
Python 3.7.0 (default, Jun 28 2018, 07:39:16) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: 

This brings up an interactive prompt where you can type and run code.  You will probably want to run it in one window, with an editor holding your code files (.py) in another.

Finally, once you have code you can just run it on the command line with:

In [None]:
python mywonderfullprogramme.py

However, as we will see, you can do this in IPython too, so there isn't much advantage to it unless you want it to happen in the background.

In this course I would recommend you work in IPython, as this is where you will most likely end up working the most and it sets you up to run your code in batch (a fancy word for running in the background).  Feel free to use Jupyter instead if you want; you can export your notebook as a .py file, but the result often needs tidying before it runs as a standalone script.

With that, let's begin to code!


## Data types

The very first thing we need to learn before we can code is: what objects does the language support?

The simple types in python are:

In [None]:
a = 1          # Integer
b = 1.0        # Floating point number, usually referred to as a 'float'
c = 1e0        # Again a float but in scientific (exponent) notation
d = 1.0 + 2.0j # Complex numbers, which follow the engineering convention and use a 'j' (boo!)
e = True       # Boolean which just means True or False
f = 'Hello'    # Strings which just means text
g = None       # None type, a special type which just means no type or 'null'

Python changes types dynamically in the background, so you can do the following without any problem

In [None]:
x = 1
x = 'abc'
x = True
x = 5e12

This also protects you from some gotchas in explicitly typed languages, like overrunning the allowed range of, say, a 32-bit integer (which runs from -2,147,483,648 to 2,147,483,647, so two billion plus one billion ends up negative).  Python just allocates more and more memory to cope with this.
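
As a quick check of this point, here is a minimal sketch comparing Python's integers with a fixed 32-bit integer.  The 32-bit behaviour is imitated with `ctypes.c_int32` from the standard library, which is my choice for illustration and not something used elsewhere in this course.

```python
import ctypes

# Python integers grow as needed, so this is just three billion
big = 2_000_000_000 + 1_000_000_000
print(big)

# Forcing the same value into 32 bits wraps it around to a negative number
wrapped = ctypes.c_int32(big).value
print(wrapped)

# Far beyond any fixed-width integer, and still exact
huge = 2**100
print(huge)
```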

There are all the usual standard operations for use with numbers

In [None]:
x = 5
y = 2
print(x + y)  # Addition
print(x - y)  # Subtraction
print(x * y)  # Multiplication
print(x / y)  # Division
print(x // y) # Floor division
print(x % y)  # Modulus
print(x ** y) # Exponentiation

Note: exponentiation is NOT `^`, which is bitwise XOR.  Bitwise operators {`&`,`|`,`^`,`<<`,`>>`,`~`} operate directly on the binary representation of integers.  They are mostly used for very low-level data processing, and I have never seen, nor can imagine, a use for them on plain integers in data science, so you can forget them for now.

All the operations have a short version for self assignment as it's so common. So instead of `x = x + y`  we can use `x += y`.  All of the operators support this.
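
Here is a small sketch of the shorthand in action.  The starting value and the sequence of operations are arbitrary choices for illustration; the comments give the value of `x` after each line.

```python
x = 5
x += 2    # same as x = x + 2,  x is now 7
x *= 3    # same as x = x * 3,  x is now 21
x //= 4   # same as x = x // 4, x is now 5
x **= 2   # same as x = x ** 2, x is now 25
x %= 7    # same as x = x % 7,  x is now 4
print(x)
```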

There are a couple of things to watch out for.  Firstly, division always returns a float, even when both inputs are integers.  If you want an integer you can cast the result with `int` or use floor division.

In [None]:
print(4/2)
print(4//2)
print(int(4/2))

And, as with all computing, operations on floats are NOT exact

In [None]:
print(0.1 + 0.2 == 0.3)

So tests for floats should always use a tolerance, eg:

In [None]:
abs(0.1 + 0.2 - 0.3) < 1e-10
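
The standard library also provides `math.isclose` for this kind of test.  The sketch below is one way to use it, not the only one; note that its default tolerance is relative, so comparisons against zero need an explicit `abs_tol` (the value `1e-9` here is just an illustrative choice).

```python
import math

# Relative tolerance (default 1e-09) handles the usual rounding error
close_sum = math.isclose(0.1 + 0.2, 0.3)

# A relative tolerance is useless when comparing with zero...
close_to_zero = math.isclose(1e-12, 0.0)

# ...so supply an absolute tolerance in that case
close_to_zero_abs = math.isclose(1e-12, 0.0, abs_tol=1e-9)

print(close_sum, close_to_zero, close_to_zero_abs)
```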

\+ and \* even work as you would expect with strings

In [None]:
a = 'hello '
b = 'world'
print(a+b)
print(3*a)

But not any of the other operations.  For operations on strings we instead have lots of functions.  How do we find out information on them?


## Getting help

Rather than list all possible functions for every data type I will instead show you how to find the information for yourself.  Firstly, as with all coding, googling is always a good idea and the Stack Overflow website is particularly helpful.

However, IPython and Jupyter both have something called tab completion (which we already met in our section on the terminal) which is very handy here (and just for typing code in general).   Type the following and then push `tab` (cells share variables, so this one already knows that `a = 'hello '`)

In [None]:
a.

This brings up a list of all possible functions that can be applied to the string object `a`.  We should quickly pause and make note of the structure of Python.  Python is an 'object-oriented' language, meaning that it is organised around objects rather than actions.  This shows up mainly in that when we want to perform an action on an object `a`, rather than typing `action(a)` we type `a.action()`, with the `.` binding the action to the object.  There will still be lots of times when we do end up writing `action(a)`, for functions that can take multiple inputs, but the operations that explicitly work on `a` will all be of the format `a.thing()`.  Now back to where we were.

Say we want the first one, `capitalize`.  Try pushing tab after each of the following

In [None]:
a.c
a.ca
a.cap

The last will auto complete to `a.capitalize`.  We can likely guess what it will do but python has another excellent feature, `?`, which provides documentation on what commands do.  Try

In [None]:
a.capitalize?

You can even go further and use `??`, which brings up the documentation and the Python code behind the command.  This is mostly useful for your own functions, as most built-in Python functions are not written in Python, so for them it only brings up the documentation, just like `?`.

Now to use it we must type

In [None]:
a.capitalize()

These two features make python particularly easy to work with as you can easily find out all possible operations and how to use them from the command line as you code!
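
Tab completion and `?` only exist inside IPython and Jupyter.  In a plain Python script or interpreter, the built-ins `dir` and the `__doc__` attribute give roughly the same information; the sketch below assumes the string `a` from earlier and filters out the internal names that start with an underscore.

```python
a = 'hello '

# dir() lists every attribute of the object; drop the internal ones
methods = [name for name in dir(a) if not name.startswith('_')]
print(methods)

# The documentation string that `a.capitalize?` would show
print(a.capitalize.__doc__)

# And the call itself
capitalized = a.capitalize()
print(capitalized)
```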

Finally, one function which is useful for strings is `.format`, which allows you to replace parts of strings denoted by `{}` with formatted input.  It has lots of options but here are some examples

In [None]:
str1 = 'Pi is: {:.2f}'.format(3.14159265)
str2 = '{}, {}, {}'.format('a', 'b', 'c')
str3 = 'The complex number {0} is formed from the real part {0.real} and the imaginary part {0.imag}.'.format(d)
str4 = 'Coordinates: {latitude}, {longitude}'.format(latitude='37.24N', longitude='-115.81W')
str5 = 'list element 0: {0[0]};  list element 3: {0[3]}'.format([1,2,3,'cat',True,3.1415])
str6 = '{:,}'.format(1234567890)
print(str1)
print(str2)
print(str3)
print(str4)
print(str5)
print(str6)
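
From Python 3.6 onwards the same format specifications can also be written inline using f-strings, which are not covered in the text above.  A minimal sketch, with `pi` and `d` defined here purely for illustration:

```python
pi = 3.14159265
d = 1.0 + 2.0j

# The expression goes before the colon, the format specification after it
fstr1 = f'Pi is: {pi:.2f}'
fstr2 = f'The complex number {d} has real part {d.real} and imaginary part {d.imag}.'
fstr3 = f'{1234567890:,}'

print(fstr1)
print(fstr2)
print(fstr3)
```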

### Exercise

1. Calculate sin(0.1) using its Taylor expansion
2. Print the result as a descriptive string stating the order expanded to and value to 5 decimal places

## Data structures

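There are also four built-in data structures in Python.  (The cell below uses the type names `list`, `tuple`, `dict` and `set` as variable names to keep things short; this hides the built-in types of the same name, so avoid it in real code.)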

In [None]:
list = [1,2,3,'cat',True,3.1415]     
tuple = (1,2,3,'cat',True,3.1415)    
dict = {'one':1,'two':2,'three':3}   
set = {1,2,3,4,5}                    
print(list)
print(tuple)
print(dict)
print(set)

Lists are just that: lists of objects of any type, including other lists.  Tuples are the same but they are immutable, meaning you can't change them once created.  Why would you want something immutable?  They are normally used for returning results from functions, like:

x = 0.125
x.as_integer_ratio()

Lists however you can change as much as you want.  List indexing is `0` based meaning that `0` gives the first entry

In [None]:
list[0]

and `3` will give the fourth element.

In [None]:
list[3]

You can also start from the other end using negative numbers

In [None]:
print(list[-1])
print(list[-3])

You can pick out sections using `:`

In [None]:
list[2:4]

Leaving out the first or last part means 'from the start' or 'to the end' respectively

In [None]:
print(list[:3])
print(list[3:])

You can even walk through with a given step size

In [None]:
print(list[::3])
print(list[1:5:2])
print(list[::-1])

You can of course both change the entries and add more

In [None]:
list[3] = 'dog'
list.append(100)
print(list)

One other useful list function is `len(list)` which gives the length of the list. Try `list.<tab>` to see what else you can do.

In [None]:
list.

Strings behave like lists of characters, so all the same indexing works for them too.  However, strings are immutable like tuples, so the following doesn't work (it raises a `TypeError`)

In [None]:
a[2] = 's'

One final thing is that while you can have lists of lists these are NOT arrays.  Even if the following looks OK

In [None]:
mat = [[1,2,3],[4,5,6],[7,8,9]]
mat[1][2]

You can see that more complicated commands go a bit wrong (`:` just means 'all')

In [None]:
print(mat[1][:])
print(mat[:][1])

Any ideas what is happening? We will see later a much better way of handling arrays when we use the package `numpy`
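
One way to see what is going on: `mat[:]` is just a copy of the whole outer list, so `mat[:][1]` picks row 1 from that copy rather than column 1.  The sketch below shows this, and pulls out a column by hand using a list comprehension (introduced later in this section of the course).

```python
mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

row = mat[1]                    # the second row
same_row = mat[:][1]            # mat[:] copies the outer list, then [1] takes the second row again
column = [r[1] for r in mat]    # second entry of every row, i.e. the second column

print(row)
print(same_row)
print(column)
```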

Dictionaries are just like lists but their contents are indexed by their key (bit before the colon) rather than position

In [None]:
dict['two']

You can add new entries like so

In [None]:
dict['four'] = 4
print(dict)

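Dictionaries find entries by hashing the key rather than by scanning through positions, which makes them efficient to search.  Since Python 3.7 they remember the order in which entries were added, but you still cannot index them by position.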

Sets are like lists but the items are unordered and unique.  You have the usual operators you would expect 

In [None]:
set2 = {4,5,6,7,8,9}
print(set & set2) # Intersection
print(set | set2) # Union
print(set - set2) # Difference
print(set ^ set2) # Symmetric difference

Finally there is one sneaky trap that comes with data structures, which is visible in the following:

In [None]:
list1 = [1,'cat',3.1415927,dict]
list2 = list1
print(list2)
list1[1] = 'BOO!'
print(list2)

This is because the data objects we define are just addresses (or pointers) which label a location in memory. The detail of what is happening for list is actually rather complicated.  In creating `list1` the computer does the following (this is actually a simplification but will do for here):

![title](Plots/ListMemory.jpg)

Then when we assign `list2 = list1` all that happens is that `list2` is assigned to `addr1` too.  Then when we change element 1 via `list1[1] = 'BOO!'` we change `addr3` to `addr6` which will store the new string `'BOO!'` (`addr3` will now be released and cleaned up by python's garbage collector).  When we print list2 it goes to `addr1` and outputs what it finds which includes the new second element.  

This approach makes lists very flexible so they can store anything at all, even functions.  The idea of just pointing list2 to list1 saves memory.  If you want a copy of list1 assigned to list2 you have to do this: 

In [None]:
list2 = list1.copy()
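
One caveat worth knowing: `.copy()` only copies the outer list, so any mutable objects stored inside it (such as a dictionary) are still shared.  The sketch below contrasts it with `copy.deepcopy` from the standard library, which is not used elsewhere in this section; the names `inner`, `shallow` and `deep` are just for illustration.

```python
import copy

inner = {'one': 1}
list1 = [1, 'cat', inner]

shallow = list1.copy()          # new outer list, same inner dictionary
deep = copy.deepcopy(list1)     # new outer list and a new inner dictionary

list1[1] = 'BOO!'               # only affects list1
inner['two'] = 2                # affects every list that shares this dictionary

print(list1)
print(shallow)
print(deep)
```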

## Control Flow

Now we come to the two basic building blocks of all code: commands that perform a test and commands that perform a loop.

First the control structure for performing a test is `if`. The format is:

`if` 'condition' `:`

`*spaces*` 'commands'

`elif` 'condition' `:`

`*spaces*` 'commands'

`else` `:`

`*spaces*` 'commands'

The condition is anything that evaluates to a boolean value (`True` or `False`) and all the comparison operators are:



In [None]:
a == b # Equal
a != b # Not Equal
a < b  # less than
a > b  # greater than
a <= b # less or equal
a >= b # greater or equal

You can also use numbers, where 0 is `False` and all others are `True`, and strings and lists, where empty ones are `False` and populated ones are `True`.  `None` is always `False`.
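
These rules can be checked directly with `bool`, which converts any object to the value an `if` would see.  The list of test values below is an arbitrary selection for illustration.

```python
values = [0, 1, -2.5, '', 'text', [], [0], None]

# bool() applies the same rule that `if` uses
truthiness = [bool(v) for v in values]

for value, truth in zip(values, truthiness):
    print(repr(value), 'is treated as', truth)
```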

Here is an example which makes it pretty clear

In [None]:
x = 20

if x==0:
    print('x is zero')
elif x>0:
    print('x is +ve')
elif x<0:
    print('x is -ve')
else:
    print("Don't Know")

Note that in the last `print` we have used double quotes `"` so we can use single quotes `'` (an apostrophe) inside without an error.

This is the first time we see the python syntax in action.  Here are the basics:

- \# denote comments. Everything on the line after them is ignored
- indentation indicates a code block and always follows a `:` on the preceding line. Think of it like 'brackets'. The amount of indentation is up to you as long as you are consistent but 4 spaces is the convention.
- Spacing inside lines is irrelevant.
- Lines are terminated by a newline but can be continued by adding a `\` at the end or by wrapping the expression in parentheses, `()`
- Multiple statements can be on the same line if you use `;` between them.

Here are examples of the last two

In [None]:
x=1; y=2
z = x + \
y
print(z)
z = (x + x +
     y + y)
print(z)

The second control structure, loops, is much more varied.  You have two basic commands `for` and `while`.  `for` has the format:

`for` 'variable' `in` 'iterator object' `:`

`*spaces*` 'do a bunch of stuff'

While has the format:

`while` 'condition' `:`

`*spaces*` 'do a bunch of stuff'

Let's see some examples:

In [None]:
list1 = [1,2,3,4,5]

print('For loop')
for item in list1:
    print(item)

print('\nWhile loop')
x=1
while x<6:
    print(x)
    x+=1
    


`for` loops are safer than `while` loops as they will generally end; if we had forgotten the `x+=1` in the while loop it would go on forever.  (`\n` is a special character meaning newline; the other main one is `\t`, meaning tab.)

The 'iterator object' part is actually pretty rich, with a lot of possibilities.  An iterator is anything which can generate a sequence of objects one at a time.  Its value is that the full list never needs to be created, which saves memory.  Two of the most common are `range` and `enumerate`.  `range` just gives you numbers in order and `enumerate` returns the index and the item for each element of a list.

In [None]:
list1 = ['apple','pear','banana','mango']

print('enumerate')
for index, item in enumerate(list1):
    print(index,':',item)

print('\nrange1')
for n in range(5):
    print(n) 

print('\nrange2')
for n in range(5,10):
    print(n)  

print('\nrange3')  
for n in range(0,12,3):
    print(n) 
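
The memory saving from iterators can be seen with `sys.getsizeof`, which reports the size of an object in bytes.  This is a rough sketch only: the exact numbers depend on your Python version, and for the list it counts just the container, not the integers stored in it.

```python
import sys

lazy = range(1_000_000)        # a recipe for the numbers, not the numbers themselves
full = [n for n in lazy]       # the same numbers stored explicitly

lazy_size = sys.getsizeof(lazy)
full_size = sys.getsizeof(full)

print('range object:', lazy_size, 'bytes')
print('explicit list:', full_size, 'bytes')
```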

Loops can be fine-tuned with two further commands, `continue` and `break`.  `continue` sends you to the next iteration, ignoring the rest of the loop.  `break` ends the loop.  Here is an example with both:

In [None]:
n=0
while True:
    n+=1
    if n>30:
        break    
    if n%3==0:
        continue
    print(n)

Sometimes you might want to iterate over multiple lists at once.  Here you can use the function `zip` to join the lists together:

In [None]:
xvec = [1,2,3,5,7,8,9]
yvec = [7,4,6,7,6,4,6]

for x,y in zip(xvec,yvec):
    print(x,y)

zip is useful for joining data, but it is also its own inverse:

In [None]:
xvec = [1,2,3,5,7,8,9]
yvec = [7,4,6,7,6,4,6]

z1 = zip(xvec,yvec)
print(*z1)
z1 = zip(xvec,yvec)
z2 = zip(*z1)
print(*z2)

This contains some subtleties.  We have to use `*z1` in print because `z1` is an iterator that produces the pairs on demand rather than an actual list (to save memory); the `*` unpacks it, passing each pair to `print` as a separate argument.  Also, iterators like this can only be used once (we'll see why later), so after printing `*z1` we have to define it again before creating `z2`.
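
A short sketch of both points, using shortened versions of the vectors above: the second attempt to unpack the same `zip` object yields nothing, and applying `zip` to the unpacked pairs recovers the original sequences (as tuples).

```python
xvec = [1, 2, 3]
yvec = [7, 4, 6]

pairs = zip(xvec, yvec)
first_pass = [*pairs]     # consumes the iterator
second_pass = [*pairs]    # nothing left the second time

# zip applied to the unpacked pairs undoes the original zip
xs, ys = zip(*first_pass)

print(first_pass)
print(second_pass)
print(xs, ys)
```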

## Functions

Now that we have the ability to write basic code we need a way to create functions for frequently used operations.   There are two types of functions: explicit ones and anonymous ones.  First, here is how to define functions explicitly:

In [None]:
def add2(x,y):
    return x+2*y

print(add2(1,2))
print(add2('abc','def'))
print(add2([1,2],[3,4]))

Note that in the functions you don't define types for the input so here numbers, strings and lists are all fine as inputs.

You can also define default arguments in functions like this

In [None]:
def fibonacci(N, a=0, b=1):
    terms = []  # avoid naming this `list`, which would hide the built-in type
    while len(terms) < N:
        a,b = b,a+b
        terms.append(a)
    return terms

print(fibonacci(5))
print(fibonacci(5,1))
print(fibonacci(5,2,3))

Note: the part `a,b = b,a+b` means compute the pair `b,a+b` then assign them to the pair `a,b`.
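
The fact that the whole right-hand side is computed before anything is assigned matters.  The sketch below compares the paired assignment with doing the two assignments one after the other; the names `a2` and `b2` are only there to keep the two versions apart.

```python
# Paired assignment: b and a+b are both computed from the OLD values
a, b = 0, 1
a, b = b, a + b
print(a, b)

# One at a time: the second line already sees the NEW value of a2
a2, b2 = 0, 1
a2 = b2
b2 = a2 + b2
print(a2, b2)

# The same trick swaps two variables without a temporary
a, b = b, a
```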

Sometimes you may want a function that can handle an unknown number of arguments, like computing statistics of numbers.  For this you can use the special case `*args` which takes multiple arguments:

In [None]:
def h_mean(*args):
    x = 0e0
    for item in args:
        x+= 1e0 / item
    x = len(args)/x
    
    return x

print(h_mean(5))
print(h_mean(5,6))
print(h_mean(5,6,7,8,9))   

You can do the same with dictionaries of unknown length.  These are passed as keyword arguments and are collected with `**kwargs` (any name after the `**` works; the example below uses `**planets`).  For example, checking a list of planets to see which is closest to a particular size:

In [None]:
def planet_check(radius,**planets):
    x = 1e100
    for key, val in planets.items():
        if abs(radius-val)<x:
            x=abs(radius-val)
            best = key
    print('The planet which is closest in size is '+best)
    return None

planets = {'Mercury':2440,'Venus':6052,'Earth':6378,'Mars':3396,'Jupiter':71492,'Saturn':60268,'Uranus':25559,'Neptune':24764}

planet_check(5000,**planets)
planet_check(10000,**planets)
planet_check(20000,**planets)
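
To make the mechanics explicit, here is a small sketch with a hypothetical function `describe` (not part of the course code) that accepts both kinds of variable argument.  Inside the function the positional arguments arrive as a tuple and the keyword arguments as a dictionary; keyword arguments can be typed out directly as well as unpacked from a dictionary with `**`.

```python
def describe(*args, **kwargs):
    # args is a tuple of the positional arguments, kwargs a dict of the keyword ones
    return type(args).__name__, type(kwargs).__name__, len(args), sorted(kwargs)

# Keywords written out directly
direct = describe(1, 2, 3, Earth=6378, Mars=3396)

# The same call with the keywords unpacked from a dictionary
sizes = {'Earth': 6378, 'Mars': 3396}
unpacked = describe(1, 2, 3, **sizes)

print(direct)
print(unpacked)
```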

Sometimes we might want to use a small function once in a throw-away manner.  For these instances there are anonymous or 'lambda' functions.  These are mostly used as input to other functions.  The format is:

`lambda` 'input stuff' `:` 'output' 

Here are some examples

In [None]:
test1 = lambda x : x**2 + x + 1
test2 = lambda x,y,z : x*y+z
test3 = lambda x: x%3==0

print(test1(5))
print(test2(1,2,3))
print(test3(7))

One good example is non-trivial sorting, say for absolute size:

In [None]:
numbers = [-1,-2,-3,-4,-5,2,3,4,5,6,7,8]
numbers.sort()
print(numbers)
numbers = sorted(numbers,key=lambda x: abs(x))
print(numbers)
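
The `key` argument accepts any function of one item, so an existing function can be passed by name and a lambda is only needed when nothing suitable exists.  A brief sketch with a made-up list of words:

```python
words = ['banana', 'fig', 'apple', 'kiwi']

# An existing function can be used directly as the key
by_length = sorted(words, key=len)

# A lambda for a rule with no ready-made function: sort by the last letter
by_last_letter = sorted(words, key=lambda w: w[-1])

print(by_length)
print(by_last_letter)
```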

### Exercises

3. Construct a function which returns a list of prime numbers less than a given integer, N
4. Construct a function which returns a list of the first N terms in the Recaman's sequence (http://mathworld.wolfram.com/RecamansSequence.html)
5. Compute a list of the numbers which appear in both of the lists above (the primes and the Recaman terms) when they are both N items long

## Comprehensions and Generators

Python contains some neat ways of creating lists.  The first is a 'list comprehension': a way to create a list from a rule rather than listing the objects directly.

Its format is the following:

`[`'object to go in list' `for` 'input' `in` 'list/generator' `if` 'condition'  `]`

Here are some examples to show you what it looks like:

In [None]:
print( [x**2 for x in range(10)]                            ) # Squares
print( [x for x in range(10) if x%2==0]                     ) # Evens
print( [x if x%2 else -x for x in range(10)]                ) # Evens -ve Odds +ve
print( [x**3 for x in [1,2,4,8]]                            ) # Cubes from a list
print( [(x,y) for x in range(5) for y in range(5) if x+y<4] ) # Create pairs with sum<4
print( [x**2 for x in [x**2 for x in range(10)]]            ) # Pointlessly complicated 4th power

Generators are very similar to list comprehensions but with two key differences:
- List comprehensions create the list explicitly; generators are just recipes for creating lists.  This allows you to loop over the items without the memory overhead of actually creating the list.
- Generators are single use: once you use them they are gone.  This does mean you can pause them halfway and then restart where you left off.

A generator looks just like a list comprehension but with `()` rather than `[]`.  If we want the explicit list we can call `list()` (if you assigned to the name `list` earlier in this session, run `del list` first to get the built-in back).

In [None]:
list1 = [x**2 for x in range(10)]
gen1  = (x**2 for x in range(10))
print(type(list1))
print(type(gen1))
print(list1)
print(gen1)
print(list(gen1))

Here is an example of how you can pause them

In [None]:
gen1  = (x**2 for x in range(10))

for n in gen1:
    print(n)
    if n>20:
        break

print('*have a quick lie down*')

for n in gen1:
    print(n)


But you can't use them twice!

In [None]:
gen1  = (x**2 for x in range(10))

for n in gen1:
    print(n)

print('And again?')

for n in gen1:
    print(n)
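
Under the hood a `for` loop just asks the generator for its next value until none are left.  You can do the same by hand with the built-in `next`; the sketch below also uses its optional second argument, which is returned once the generator is exhausted instead of raising a `StopIteration` error.

```python
gen1 = (x**2 for x in range(3))

first = next(gen1)
second = next(gen1)
third = next(gen1)

# Nothing left: the default value is returned instead of an error
fourth = next(gen1, 'finished')

print(first, second, third, fourth)
```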

Some generators you may want to use may be too complex for a single line.  In this case we can create a generator function.  This is just a normal function but with the command `yield` in it.  `yield` can appear in the function multiple times; whenever execution reaches one, the function hands back the value next to `yield` and pauses there until the next value is requested.  Here is an example:

In [None]:
def gen1():
    a = 'who'
    b = 'let'
    c = 'the'
    d = 'dogs'
    e = 'out,'
    for i in range(3):
        yield a
        yield b
        yield c
        yield d
        yield e
        for j in range(3):
            yield a
        yield '?\n'

G1 = gen1()
print(*G1)
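
Because a generator function pauses at each `yield`, it can even describe a sequence that never ends.  As a sketch, here is a generator version of the earlier Fibonacci function (the name `fibonacci_gen` is my own), with `itertools.islice` from the standard library used to take only the first few terms.

```python
from itertools import islice

def fibonacci_gen(a=0, b=1):
    # Loops forever, but only does work when a value is requested
    while True:
        a, b = b, a + b
        yield a

# Take just the first five terms from the endless sequence
first_five = [*islice(fibonacci_gen(), 5)]
print(first_five)
```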
    

### Exercises

6. Create a list of all pairs of factors (as tuples) of 362880 using list comprehension.
7. Write a generator function for a random walk, step size 1, which is equally likely to go up or down.  End the generator when you have a total displacement of 10 steps (you will need a random number generator like `random.randint(a,b)`, which gives a random integer between a and b inclusive; you will need the line `import random` in order to use it)

## Magic commands

IPython also has a number of 'magic' commands, which are preceded by a % or %%.  These are for use in the interpreter (IPython or Jupyter) to do useful things outside of coding.  There are loads of them (try `%lsmagic` for a list, or `%magic` for the full documentation) but here are some of my favorites:

In [None]:
%whos # list of all assigned variables
%history -n 1-4 # list commands from prompts 1-4
%run filename.py # runs the python script filename.py
%timeit # times one line of code
%%timeit # times multiple lines of code
%debug # opens a debugger where an exception was raised
%rerun # rerun previously entered commands (can specify range)
%reset # delete all variables and definitions
%save # save some lines to a specified file

Play around with them, we will come back to these when we get to the section on performance.