# SWC bootcamp

### Reasons to use Python:
* **Can be up to 10x faster** than MATLAB, depending on the workload
* Easy interface with C code through Cython
* Highly object-oriented
* Many packages designed for data science (numpy, scipy, matplotlib)
* Free, no license

## 1. Basic arithmetic

### Integer division

In [None]:
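a = 1/2          # int / int truncates in Python 2, so a is 0
b = 1/2.         # one float operand is enough to get 0.5
c = float(1)/2   # explicit conversion also gives 0.5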
print a
print b
print c
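
The truncation shown above is specific to `/` applied to two integers in Python 2. As a small sketch (not part of the original notebook), the `//` operator makes floor division explicit and gives the same results under Python 2 and 3:

```python
# Floor division with // rounds towards negative infinity.
q_int = 7 // 2        # 3
q_neg = -7 // 2       # -4, not -3
q_float = 7.0 // 2    # 3.0, the result stays a float
remainder = 7 % 2     # 1, the matching remainder

print(q_int)
print(q_neg)
print(q_float)
print(remainder)
```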

### Power

In [None]:
print 2**3
print 2**3.

### The `+=` operation

In [None]:
a = 1
print a
a += 1
print a


In [None]:
a =+ 1  # careful: this is a = (+1), not a += 1
print a

### Result of the last executed statement

In [None]:
print _

In [None]:
1+1

In [None]:
print _

### `math` module and `dir()`

In [None]:
import math
print dir(math)

In [None]:
print math.e
print math.exp(123)
help(math.trunc)

If you don't want to use the `math.` prefix, you can do

In [None]:
from math import *
print e, pi

## 2. Types

### Numerical types
There are three numerical types:
   * integers
   * floating point numbers
   * complex floating point numbers
   
We can find the type of an object using the `type()` function, similar to `class` in MATLAB.

#### Integers

In [None]:
an_integer = 123
a_float = float(an_integer)
print type(an_integer)
print type(a_float)

#### Floating-point numbers

In [None]:
a_string = "123.123"
a_float = float(a_string)
print a_float

#### Complex numbers

In [None]:
c = 1 + 3j
c_real = c.real
c_imag = c.imag
print c
print c_real
print c_imag
print type(c)
print c.conjugate()

import cmath
print cmath.sqrt(-1)

### Sequences
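Common sequence types include strings, lists and tuples. Dictionaries are mappings rather than sequences, but are covered here as well:
* String
* List
* Tuple
* Dictionary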

#### Strings
A string is an immutable sequence of characters.

In [None]:
a_string = '123'
an_integer = int(a_string)
another_string = str(an_integer)
print an_integer
print another_string
a_string[0] = '2'  # raises TypeError: strings are immutable

In [None]:
a_string = 'The Great Gatsby'
print a_string[1]
print a_string[1:2]

print a_string[:3]
print a_string[0:3]

print a_string[-6:]


String concatenation is very simple in Python:

In [None]:
s1 = 'The Great'
s2 = ' Gatsby'
s3 = s1 + s2
print s3

It is often useful to embed data in a string to make the output more readable:

In [None]:
print 'this is a long pi ', math.pi
print 'this is a short pi %10.3f' % math.pi

The format specifier `%W.Df` prints a float with a total width of `W` characters and `D` digits after the decimal point; the printed digits are right-aligned within that width.

Other useful examples:

In [None]:
print " pi      = %10.2f" % math.pi
print " 10*e    = %10.5f" % (math.e * 10)
print " 100*pi  = %10.3e" % (math.pi * 100.)
print " int(pi) = %10d" % math.pi
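
The same width and precision can also be written with `str.format`. The following is a small sketch for comparison, not part of the original notebook; the variable names are illustrative only.

```python
import math

# Same width/precision idea as %10.3f, written with str.format.
old_style = 'pi = %10.3f' % math.pi
new_style = 'pi = {:10.3f}'.format(math.pi)
print(old_style)
print(new_style)

# '<' left-aligns the number within the 10-character field.
left = '[{:<10.3f}]'.format(math.pi)
print(left)
```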

#### Lists

Creating a list:

In [None]:
string_list = ['The', 'Great', 'Gatsby']

a_range = range(10)
print a_range

a_range = range(0,10,2)
print a_range

a_list = [1]*3
print a_list

Slicing:

In [None]:
print string_list[0]
print string_list[0:1]

print string_list[-1]
print string_list[-1:]

Lists can be nested:

In [None]:
a_nested_list = [1,[1,2,3],2,3]
print a_nested_list[1][2]

Appending to a list takes (amortized) constant time, so in Python there is usually no need to preallocate memory.

In [None]:
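# Avoid naming a variable `list`: it would shadow the built-in list() used later
growing_list = []
for i in range(3):
    growing_list.append(i)
print growing_list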

#### Tuples
Tuples behave very similarly to lists except that they are immutable. A common rule of thumb is that tuples have *structure* while lists have *order*, but in practice the distinction rarely matters much.

In [None]:
a_tuple = 1,2,3
a_list  = [1,2,3]
a_list[0] = -1
print a_list
a_tuple[0]=-1  # raises TypeError: tuples are immutable

Tuples and lists do not have to contain data of the same type

In [None]:
a_tuple = (1,'2',3.0)
print a_tuple

In [None]:
another_tuple = tuple(a_list)
another_list = list(a_tuple)
print another_tuple
print another_list

Tuples can be used to assign several variables at the same time

In [None]:
a,b = 1,2
print a, b
a, b = b, a
print a,b

#### Useful functions for lists


In [None]:
print len(string_list)

s1 = '_'.join(string_list)
print s1

s2 = s1.split('_')
print s2

### Dictionaries

Dictionaries are very useful when data, functions or objects have a semantic meaning. Dicts are unordered, so they cannot be indexed by position; values are looked up by key instead.

There are many ways to build a dictionary. The `zip()` function can be very useful

In [None]:
d = {}
d['age'] = 18
d['name'] = 'Kevin'
d2 = dict(age = 18, name='Kevin')
d3 = dict([('name','Kevin'),('age',18)])
d4 = dict(zip(['name','age'],['Kevin',18] ))
print d
print d2
print d3
print d4

#### Some useful functions with dicts

In [None]:
print d.keys()
print d.values()
print d.has_key('1')
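
As a small sketch (not from the original notebook), membership can also be tested with the `in` operator, and `get()` looks up a key without raising `KeyError` when it is missing. The key `'height'` and the fallback value `180` are made up for illustration.

```python
d = dict(name='Kevin', age=18)

has_name = 'name' in d                  # True
has_height = 'height' in d              # False
height = d.get('height')                # None, the key is missing
height_default = d.get('height', 180)   # hypothetical fallback value

print(has_name)
print(has_height)
print(height)
print(height_default)
```

Unlike `has_key()`, the `in` operator also works in Python 3.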

### Comprehensions

In [None]:
a = [i**3+2**2+4 for i in range(10)]
print a

In [None]:
d = dict( [(str(k),k**3+2**2+4) for k in range(10)] )
print d

In [None]:
nested = [ [1,2,3],[4,5,6] ]
single = [v for l in nested for v in l]
print single

### Loops and conditions

To iterate through a list:

    for v in a_list:
    for i in range(len(a_list)):
    for i, v in enumerate(a_list):
    
Imagine that we have a list of numbers that represent different measures of a classifier's performance

In [None]:
data = [0.9, 0.1, 0.34, 0.123]
for d in data:
    print d 

In [None]:
names = ['acc', 'err', 'dp', 'loss']
data = [0.9, 0.1, 0.34, 0.123]
for i in range(len(data)):
    print names[i] + ' is ' + str(data[i])

In [None]:
for i, v in enumerate(data):
    print names[i]+ ' is ' + str(v)
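
When two lists are walked in parallel, `zip` avoids indexing altogether. This is a small sketch using the same `names` and `data` lists as above; the `lines` list is only there to collect the output.

```python
names = ['acc', 'err', 'dp', 'loss']
data = [0.9, 0.1, 0.34, 0.123]

# zip pairs up the i-th elements of both lists.
lines = []
for name, value in zip(names, data):
    lines.append(name + ' is ' + str(value))

for line in lines:
    print(line)
```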

To iterate through a dict:

    for k in d:
    for k in d.keys():
    for v in d.values():
    for k, v in d.iteritems():
    
Now imagine that the same performance measures are stored in a dict, keyed by their names

In [None]:
d = dict(zip(names, data))
for k in d.keys():
    print k, 'is', d[k]

In [None]:
d = dict(zip(names, data))
for k, v in d.iteritems():
    print k, 'is', v
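
Because dicts are unordered, the loops above may print the entries in any order. A minimal sketch of one way to get a reproducible order is to iterate over the sorted keys; the `ordered` list is an illustrative name.

```python
names = ['acc', 'err', 'dp', 'loss']
data = [0.9, 0.1, 0.34, 0.123]
d = dict(zip(names, data))

# sorted(d) returns the keys in alphabetical order.
ordered = [(k, d[k]) for k in sorted(d)]
for k, v in ordered:
    print(k + ' is ' + str(v))
```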

Examples of the `while` loop

In [None]:
counter = 0
while counter < 5:
    counter += 1
    print counter


In [None]:
from random import random
finished = False
while not finished:
    r = random()
    if r < 0.1:
        finished = True
    print r

## Assignments, Identities and equalities
In Python, one needs to be careful about when copies are made. Assignment with `=` never copies an object; it only binds a name to it. With numbers this goes unnoticed because numbers are immutable, but with a mutable container such as a list the difference becomes visible.

In [None]:
x = y = z = 1
print x,y,z

In [None]:
x += 1
print x,y,z

In [None]:
a = [1,2,3]
b = a
a[1] = 10
print b

When a value is assigned to a variable, Python creates the object and a name that refers to it. In essence, `x` is not `1` but a reference to it. One can think of `x` as a label attached to the object `1`.

Assigning one variable to another never copies the object; both names then refer to the same object. Numbers are immutable, so `x += 1` cannot modify the object `1`: it rebinds `x` to a different object and leaves `y` and `z` untouched, which makes it look as if the number had been copied. Lists, dicts and many other objects are mutable, so a change made in place through one name (`a[1] = 10`) is visible through every other name that refers to the same object (`b`).

Not sure what this means? Let's try the `id()` function. It returns an identifier of the object that a variable refers to (in CPython, its memory address).

In [None]:
print id(x), id(y), id(z)
print id(a), id(b)

One of the correct ways to copy a list is through slicing with `:`

In [None]:
a = [1,2,3]
a_copy = a[:]
a_copy[1] = 10
print a, a_copy
print id(a), id(a_copy)

If two variables share the same id and the object is mutable, then any change made in place through one variable is visible through the other, because both point to the same object.

**One should be very careful when constructing a container of objects.** If a list is built by repetition with `*` and its elements are mutable, every slot refers to the same object, not to separate copies!

In [None]:
list_container = [[]]*3
list_container[0].append(1)
print list_container

list_container_2 = []
for i in range(3):
    list_container_2.append([])
list_container_2[0].append(1)
print list_container_2
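
A list comprehension is a compact alternative to the explicit loop: it evaluates `[]` once per iteration, so each slot gets its own list. The sketch below contrasts it with repetition; the names `shared` and `independent` are illustrative.

```python
shared = [[]] * 3                      # three references to ONE list
independent = [[] for _ in range(3)]   # three separate lists

shared[0].append(1)
independent[0].append(1)

print(shared)        # [[1], [1], [1]]
print(independent)   # [[1], [], []]

# Identity checks confirm what happened.
same_object = shared[0] is shared[1]
distinct_objects = independent[0] is not independent[1]
print(same_object)
print(distinct_objects)
```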

From the above, we learned that variables (names) can point to different objects (stored internally). Equality and identity between variables are therefore different things in Python.
* Equality (`==`): the references point to **objects that have the same value**
* Identity (`is`): the references point to the **same object in memory** (same `id`)

#### Numerical

In [None]:
x = y = z = 1
print id(x), id(y), id(z)
print x == y == z
print x is y is z

But when you rebind a name to the same value again, it points to the original object again (CPython caches small integers, so there is only one object `1`).

In [None]:
x = 2
print id(x), id(y), id(z)
x = 1
print id(x), id(y), id(z)

#### Other objects

In [None]:
a = [1,2,3]
b = a
a_copy = a[:]
print a == b == a_copy
print a is b
print b is a_copy

A general way of creating a copy is through the `copy` module, but it can be slower than slicing

In [None]:
import copy
%timeit a_copy = copy.copy(a)
%timeit a_copy = a[:]
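
Both `copy.copy(a)` and `a[:]` make a *shallow* copy: the outer list is new, but nested objects are still shared. As a small sketch of the difference, `copy.deepcopy` also copies the nested lists. The variable names are illustrative.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)      # new outer list, same inner lists
deep = copy.deepcopy(nested)     # inner lists are copied as well

nested[0].append(99)             # modify an inner list in place

print(shallow)   # [[1, 2, 99], [3, 4]]
print(deep)      # [[1, 2], [3, 4]]
```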

## Functions

We are going to introduce two types of functions: standard functions and lambda (inline) functions.

In [None]:
def times2(a):
    '''double the input
    b = times2(a) is equivalent to b = 2.0*a
    '''
    return a*2.0
print times2(2)

In [None]:
help(times2)

Below is a simple recursive definition of a function that returns the $a$'th number in the Fibonacci sequence

In [None]:
def Fib(a):
    """compute the a'th number in the Fibonacci sequence"""
    if a == 0:
        return 0
    elif a == 1:
        return 1
    else:
        return Fib(a-1) + Fib(a-2)
print Fib(5)
print [Fib(i) for i in range(10)]
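
The recursive definition recomputes the same values many times, so its running time grows exponentially with `a`. As a sketch of an alternative (the name `fib_iter` is not from the original notebook), a loop that keeps only the last two numbers gives the same results in linear time:

```python
def fib_iter(n):
    """Return the n'th Fibonacci number using a loop (n >= 0)."""
    previous, current = 0, 1
    for _ in range(n):
        # Shift the window one step along the sequence.
        previous, current = current, previous + current
    return previous

print(fib_iter(5))
print([fib_iter(i) for i in range(10)])
```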

We can also define functions with more than one output

In [None]:
def left_right(a):
    return a-1, a+1
left, right = left_right(2)
print left, right

print left_right(2)[0]
left, _ = left_right(2)
print left

In [None]:
times3 = lambda x: x * 3.0
times3(2)

Python has some built-in functional tools that are very useful.
* map
* filter
* reduce

#### `map`
The `map` function `lst2 = map(f, s)` applies a function `f` to all elements in a sequence `s`. The return
value `lst2` of `map` is a list and has the same length as `s`

In [None]:
import string
map( string.capitalize , ['banana', 'apple' , 'orange'])

In [None]:
map( lambda x : x ** 2 , range (10) )

`map` and list comprehensions look very similar; see below, and the many other sources that explain their differences. Some general observations:
* list comprehensions and `map` are usually faster than explicit `for` loops
* the `lambda` passed to `map` has its own scope, whereas in Python 2 the loop variable of a list comprehension leaks into the enclosing scope, so `map` is usually safer
* list comprehensions can be slightly faster than `map`

In [None]:
print map(lambda x,y: x+y, [1,2,3],[4,5,6])
print [x + y for x, y in zip([1,2,3],[4,5,6])]

In [None]:
%timeit y = [ x ** 2 for x in range (10) ]
%timeit y = map( lambda x : x ** 2 , range (10))

#### `filter`

The `filter` function `lst2 = filter(f, s)` applies the function `f` to all elements in a sequence `s`. The function `f` should return `True` or `False`. The return value `lst2` is a list containing only those elements `si` of the sequence `s` for which `f(si)` returns `True`.

In [None]:
def greater_than_5 ( x ):
    # the comparison already evaluates to True or False
    return x > 5
filter(greater_than_5 , range(11))

filter (lambda x : x > 5 , range(11))
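
The same selection can be written as a list comprehension with an `if` clause. This is a small sketch for comparison; `list(...)` around `filter` is only needed so that the example also runs under Python 3, where `filter` returns an iterator.

```python
numbers = range(11)

with_filter = list(filter(lambda x: x > 5, numbers))
with_comprehension = [x for x in numbers if x > 5]

print(with_filter)
print(with_comprehension)
```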

In [None]:
known_names = [ 'kevin' , 'wittawat']
filter(lambda name: name in known_names, ['smith', 'kevin'])
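
#### `reduce`

`reduce` is listed above but not demonstrated, so here is a minimal sketch. `reduce(f, s)` combines the elements of `s` from left to right with a two-argument function `f`, producing a single value. It is a built-in in Python 2; importing it from `functools` works in both Python 2.6+ and 3.

```python
from functools import reduce

# ((1 + 2) + 3) + 4
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4])

# The optional third argument is the starting value of the accumulator.
product = reduce(lambda acc, x: acc * x, [1, 2, 3, 4], 1)

print(total)
print(product)
```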

## File I/O

### Check for existence

In [None]:
import os
print os.path.exists('my_first_file.txt')
print os.path.isfile('my_first_file.txt')

### Write to and read from a text file

In [None]:
f = open('my_first_file.txt', 'w')
f.write('Hello world!')
f.close()

!type my_first_file.txt

In [None]:
f = open('my_first_file.txt', 'r')
line = f.readline()
print line
f.close()

In [None]:
f = open('animals.txt', 'w')
for animal in ["Animal\tFood","Sloth\tLeaves", "Chicken\tCorn", "Ant_eater\tAnts", \
               "Penguin\tFish", "Armadillo\tIce_cream"]:
    f.write("%s\n" % animal)
f.close()
!type animals.txt

In [None]:
f = open("animals.txt", "r")
lines = f.readlines()
print lines
print len(lines)
f.close()

Because `readlines()` first reads the entire file into memory, this can be slow or infeasible for large files.

Processing each line in a file is such a common operation that Python provides the following simple syntax:


In [None]:
f = open("animals.txt", "r")
for line in f:
    print(line.strip())
f.close()

It is very easy to forget to close a file, so prefer opening files with a `with` statement, which closes the file automatically at the end of the block.

In [None]:
with open("animals.txt", "r") as infile:
    for line in infile:
        print(line.rstrip())
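
A quick way to convince yourself that `with` really closes the file is to check the `closed` attribute afterwards. The sketch below writes to a temporary directory; the file name `demo.txt` is made up for illustration.

```python
import os
import tempfile

# Hypothetical file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w') as outfile:
    outfile.write('Hello world!\n')
    closed_inside = outfile.closed    # False: still open inside the block

closed_after = outfile.closed         # True: closed when the block ends

print(closed_inside)
print(closed_after)
```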

In [None]:
with open("animals.txt", "r") as infile:
    with open("animals.csv", "w") as outfile:
        for line in infile:
            outfile.write(",".join(line.split()))
            outfile.write("\n")  # text mode translates "\n" to the platform line ending

# Read the csv file back; rstrip() removes the trailing newline before printing
with open("animals.csv", "r") as infile:
    for line in infile:
        print(line.rstrip())

### Multiple files, the `glob` module

When dealing with multiple files, one often collects the file paths in a list and iterates over them. Examples of such files are the logs of many experiments that share the same file extension and structure and need to be aggregated for data processing.

In [None]:
import glob
text_files = glob.glob("*.txt")
for t in text_files:
    print(t)
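
As a sketch of the aggregation use case, the example below creates three hypothetical log files in a temporary directory (the `run_*.log` names and their contents are made up), then collects one number from each file found by `glob`.

```python
import glob
import os
import shutil
import tempfile

# Create three hypothetical experiment logs, each holding one number.
log_dir = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(log_dir, 'run_%d.log' % i), 'w') as f:
        f.write('%d\n' % (i * 10))

# glob returns paths in arbitrary order, so sort them first.
values = []
for path in sorted(glob.glob(os.path.join(log_dir, '*.log'))):
    with open(path, 'r') as f:
        values.append(int(f.read().strip()))

print(values)
shutil.rmtree(log_dir)  # clean up the temporary directory
```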

### Pickle

The `pickle` module and its more efficient `cPickle` version provide two functions, `dump()` and `load()`, that write and read almost arbitrary Python objects (some objects, for example those wrapping C resources such as open files, cannot be pickled)



In [None]:
from cPickle import dump, load
l = ["a", "list", "with", "stuff", [42, 23, 3.14], True]
with open("my_list.pkl", "wb") as f:  # pickle files should be opened in binary mode
    dump(l, f)
with open("my_list.pkl", "rb") as f:
    l = load(f)
l
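
Pickling does not have to go through a file. As a small sketch, `dumps()` and `loads()` convert an object to and from a byte string in memory; the `record` dict is made up for illustration, and plain `pickle` is used so that the example also runs under Python 3, which has no `cPickle` module.

```python
import pickle

record = {'name': 'Kevin', 'scores': [0.9, 0.1]}

# Serialise to bytes with the most compact protocol available, then restore.
payload = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)

print(restored == record)   # equal in value
print(restored is record)   # but a different object
```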