# Chapter 0 - Appendix



## Python essentials - from the Appendix

Some basic stuff

In [3]:
import numpy as np
import pandas as pd

In [16]:
array = range(10)
less, greater = [], []
for x in array:
    if x < 5:
        less.append(x)
    else:
        greater.append(x)

less, greater

([0, 1, 2, 3, 4], [5, 6, 7, 8, 9])

Assignment does not copy data: it binds another name to the same object.  
If you modify the object through one name, the change is visible through the other:
    

In [18]:
a = [1,2,3]
b = a
a.append(4)
b

[1, 2, 3, 4]

Functions receive references to their arguments, not copies, so a function that mutates a list changes the caller's list:

In [20]:
def append_element(some_list, element):
    some_list.append(element)
data = [1,2,3]
append_element(data, 4)
data

[1, 2, 3, 4]
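Mutating an argument and *rebinding* the parameter name behave differently: only mutation is visible to the caller. A minimal sketch; `rebind_list` is a hypothetical helper written only to show the contrast:

```python
def append_element(some_list, element):
    some_list.append(element)   # mutates the object the caller passed in

def rebind_list(some_list):
    some_list = [0]             # rebinds the local name only; the caller's list is untouched
    return some_list

data = [1, 2, 3]
append_element(data, 4)
rebind_list(data)
print(data)  # [1, 2, 3, 4]
```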

Strongly typed: every object has a specific type (class)

In [22]:
a = 4.5
b = 2
print 'a is %s, b is %s' % (type(a), type(b))
a/b 

a is <type 'float'>, b is <type 'int'>


2.25

Test whether an object is of a specific type with `isinstance(object, type_or_tuple_of_types)`:

In [24]:
isinstance(a, float)

True

In [26]:
isinstance(a, (float, str, int))

True

Attributes: access them dynamically with the `getattr`, `hasattr`, and `setattr` functions.  
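A short sketch of the three functions; the `Point` class is hypothetical and exists only for illustration:

```python
class Point(object):
    """Hypothetical class used only to illustrate attribute access."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
print(getattr(p, 'x'))             # 1, same as p.x
print(hasattr(p, 'z'))             # False
setattr(p, 'z', 3)                 # same as p.z = 3
print(getattr(p, 'z'))             # 3
print(getattr(p, 'w', 'missing'))  # the default is returned for a missing attribute
```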

`iter` returns an iterator over an object and raises `TypeError` if the object is not iterable, which gives a simple iterability test:

In [27]:
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError: # not iterable
        return False

In [34]:
isiterable([1,2,3]), isiterable('a'), isiterable(5)

(True, True, False)

A function to convert an iterable object into a list (lists are returned unchanged):

In [38]:
def to_list(x):
    if not isinstance(x, list) and isiterable(x):
        x = list(x)
    return x

Identity comparisons: `is` and `is not` check whether two names refer to the same object (`list(a)` creates a new list):

In [39]:
a = [1,2,3]
b = a
c = list(a)

In [40]:
a is b

True

In [41]:
a is not c

True

`a ^ b`: for booleans, True if exactly one of `a` and `b` is True; for integers, bitwise exclusive-or
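Both uses of `^` in a few lines:

```python
# With booleans, ^ is True when exactly one operand is True
print(True ^ False)  # True
print(True ^ True)   # False

# With integers, ^ works bit by bit: 0b101 ^ 0b011 == 0b110
print(5 ^ 3)         # 6
```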

Strings and tuples are immutable.  
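Assigning to an item of a tuple or a string raises `TypeError`; to "change" one, you build a new object. A minimal illustration:

```python
t = (1, 2, 3)
s = 'python'

for obj in (t, s):
    try:
        obj[0] = obj[0]   # item assignment on an immutable sequence
    except TypeError:
        print('%r cannot be modified in place' % (obj,))

# "Changing" an immutable object really means building a new one
t2 = (10,) + t[1:]
s2 = 'P' + s[1:]
print('%s %s' % (t2, s2))  # (10, 2, 3) Python
```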
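In Python 2, dividing two integers performs floor division, so you sometimes need to convert one operand to `float` or import true division from `__future__`: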

In [43]:
3 / 2

1

In [44]:
3 / float(2)

1.5

In [46]:
from __future__ import division
3 / 2

1.5

Some strings:

In [47]:
a = 'python'
list(a)

['p', 'y', 't', 'h', 'o', 'n']

In [48]:
a[:3]

'pyt'

The backslash `\` is the escape character, used to specify special characters such as `\n` (newline).  
Prefixing a string with `r` makes it a raw string, in which backslashes are interpreted as-is:

In [53]:
s = r'this\has\no\special\characters'
s

'this\\has\\no\\special\\characters'
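Comparing lengths shows what the `r` prefix changes: in a normal string `\n` is one character, in a raw string it is two.

```python
normal = 'a\nb'   # \n is an escape sequence: a single newline character
raw = r'a\nb'     # raw string: the backslash and the 'n' are kept as two characters
print(len(normal))     # 3
print(len(raw))        # 4
print(raw == 'a\\nb')  # True: '\\' is an escaped backslash
```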

String templates with the `%` operator:

In [54]:
template = '%.2f %s are worth $%d'
# number with 2 decimal places, string, and integer
template % (4.5560, 'Argentine Pesos', 1)
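Applied to these arguments, the template produces `'4.56 Argentine Pesos are worth $1'`. For comparison, the `str.format` method (a separate formatting mechanism, not the `%` operator) can express the same layout; a sketch:

```python
template = '%.2f %s are worth $%d'
old_style = template % (4.5560, 'Argentine Pesos', 1)

# Same layout with str.format: {0:.2f} -> 2 decimals, {1} -> str, {2:d} -> int
new_style = '{0:.2f} {1} are worth ${2:d}'.format(4.5560, 'Argentine Pesos', 1)

print(old_style)               # 4.56 Argentine Pesos are worth $1
print(old_style == new_style)  # True
```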

Datetime:

In [59]:
from datetime import datetime, date, time
dt = datetime(2011, 10, 29, 20, 30, 21)
dt.day, dt.minute

(29, 30)

Extract the date and time parts of a `datetime` object:

In [62]:
dt.date(), dt.time()

(datetime.date(2011, 10, 29), datetime.time(20, 30, 21))

Use the `strftime` method to format a `datetime` as a string:

In [63]:
dt.strftime('%m/%d/%Y %H:%M')

'10/29/2011 20:30'

`strptime` does the opposite, parsing a string into a `datetime`:

In [65]:
datetime.strptime('20091031', '%Y%m%d')

datetime.datetime(2009, 10, 31, 0, 0)

Subtracting one `datetime` from another produces a `timedelta`, which can be added back to a `datetime`:

In [70]:
dt.replace(minute=0, second=0)  # returns a new datetime; dt itself is unchanged
dt2 = datetime(2011, 11, 15, 22, 30)
delta = dt2 - dt
delta

datetime.timedelta(17, 7179)

In [71]:
dt + delta

datetime.datetime(2011, 11, 15, 22, 30)
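Because `datetime` objects are immutable, `replace` returns a new object, and a `timedelta` exposes its parts as `days` and `seconds`. A short sketch using the same dates as above:

```python
from datetime import datetime, timedelta

dt = datetime(2011, 10, 29, 20, 30, 21)
dt2 = datetime(2011, 11, 15, 22, 30)

# replace returns a NEW datetime; dt itself is unchanged
truncated = dt.replace(minute=0, second=0)
print(truncated)  # 2011-10-29 20:00:00
print(dt)         # 2011-10-29 20:30:21

delta = dt2 - dt
print('%d days, %d seconds' % (delta.days, delta.seconds))  # 17 days, 7179 seconds

# A timedelta can also be constructed directly and added to a datetime
print(dt + timedelta(hours=1))  # 2011-10-29 21:30:21
```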

Control flow:  
+ `if`, `elif`, `else`
+ `for`, `continue`, `break`
+ `pass`
+ `try`, `except`, `else`, `finally`
+ `range`, `xrange` (Python 2: `xrange` produces values one at a time instead of building a full list, which saves memory)
+ ternary expression: `value = true-expr if condition else false-expr`
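A small sketch that exercises most of these keywords in one place; `parse_floats` and its `'STOP'` sentinel are made up for illustration:

```python
def parse_floats(items):
    """Parse items as floats, skipping bad ones and stopping at 'STOP'."""
    results = []
    attempts = 0
    for item in items:
        if item == 'STOP':
            break                  # leave the loop entirely
        elif item == '':
            continue               # skip empty strings without trying to parse them
        try:
            value = float(item)
        except ValueError:
            pass                   # unparseable item: ignore it
        else:
            results.append(value)  # runs only if no exception was raised
        finally:
            attempts += 1          # runs whether or not an exception occurred
    return results, attempts

print(parse_floats(['1.5', 'abc', '', '2', 'STOP', '3']))  # ([1.5, 2.0], 3)
```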


In [72]:
x = 5
'Non-negative' if x >= 0 else 'Negative'

'Non-negative'

## Data structures and sequences

**Tuple** - 1D, fixed length, immutable

In [75]:
tup = 4,5,6
tup

(4, 5, 6)

In [76]:
nested_tup = (4,5,6), (7,8)
nested_tup

((4, 5, 6), (7, 8))

Any sequence or iterator can be converted into a tuple with `tuple`:

In [77]:
tuple([2,0,4])

(2, 0, 4)

In [78]:
tuple('string')

('s', 't', 'r', 'i', 'n', 'g')

**Unpack tuples** - assigning a tuple to a tuple-like expression of variables unpacks it, including nested tuples

In [81]:
tup = 4,5,(6,7)
a, b, (c,d) = tup
c

6

Unpacking is useful for iterating over a sequence of tuples or lists (it is also handy for returning multiple values from a function):

In [82]:
seq = [(1,2,3), (4,5,6), (7,8,9)]
for a,b,c in seq:
    print (a+b)*c

9
54
135


`tuple.count(value)` counts the occurrences of a specific value:

In [83]:
a = (1,2,3,4,5,4,2,3,1,2,3,4,2,1)
a.count(2)

4

**List** - 1D, variable length, mutable

In [None]:
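lst = [3, 1, 2]
lst.append(4)        # add a value at the end
lst.remove(1)        # remove the first occurrence of a value
lst.insert(0, 5)     # insert a value at a position
lst.pop(1)           # remove and return the value at a position
2 in lst             # membership test (a linear scan, slower than for a dict or set)
lst + [6, 7]         # concatenation builds a new list
lst.extend([6, 7])   # extend in place; cheaper than repeated concatenation
lst.sort(key=abs)    # sort in place, ordering by a key function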

Binary search with the `bisect` module (the list must already be sorted):  
`bisect.bisect`: finds the position where an element should be inserted into a sorted list to keep it sorted  
`bisect.insort`: inserts the element at that position

In [84]:
import bisect
c = [1,2,2,2,3,4,7]
bisect.bisect(c,2)

4

In [86]:
bisect.bisect(c,5)

6

In [88]:
bisect.insort(c,5)
c

[1, 2, 2, 2, 3, 4, 5, 7]
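`bisect.bisect` places a new element to the right of any equal elements; the module also provides `bisect_left`, which places it to the left. A short comparison on the same starting list:

```python
import bisect

c = [1, 2, 2, 2, 3, 4, 7]
# bisect.bisect is bisect_right: the position AFTER any equal elements
print(bisect.bisect(c, 2))       # 4
# bisect_left: the position BEFORE any equal elements
print(bisect.bisect_left(c, 2))  # 1

# These functions assume c is already sorted; they do not check it
bisect.insort(c, 5)
print(c)  # [1, 2, 2, 2, 3, 4, 5, 7]
```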

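Slicing with `seq[start:stop:step]`:

In [None]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]    # [2, 3, 7, 5]: from index 1 up to, but not including, index 5
seq[::2]    # [7, 3, 5, 0]: every other element
seq[::-1]   # [1, 0, 6, 5, 7, 3, 2, 7]: a step of -1 reverses the list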

Some built-in functions: 
+ `enumerate`: returns (position, value) pairs while iterating  

In [89]:
list_1 = ['aaa', 'bbb', 'ccc']
for i, value in enumerate(list_1):
    print i, value

0 aaa
1 bbb
2 ccc


In [91]:
mapping = dict((v,i) for i,v in enumerate(list_1))
mapping

{'aaa': 0, 'bbb': 1, 'ccc': 2}

+ `sorted`: returns a new sorted list from any sequence; works for both numbers and strings

In [92]:
sorted([1,4,5,2,4,7,3])

[1, 2, 3, 4, 4, 5, 7]
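Unlike `list.sort`, `sorted` returns a new list and leaves its input alone; it accepts the same `key` function, plus `reverse`:

```python
words = ['ccc', 'a', 'bb']
print(sorted(words))                         # ['a', 'bb', 'ccc']
print(sorted(words, key=len, reverse=True))  # ['ccc', 'bb', 'a']
print(words)                                 # unchanged: ['ccc', 'a', 'bb']
```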

In [93]:
sorted('things are getting interesting')

[' ',
 ' ',
 ' ',
 'a',
 'e',
 'e',
 'e',
 'e',
 'g',
 'g',
 'g',
 'g',
 'h',
 'i',
 'i',
 'i',
 'i',
 'n',
 'n',
 'n',
 'n',
 'r',
 'r',
 's',
 's',
 't',
 't',
 't',
 't',
 't']

Combine `sorted` with `set` to get a sorted list of unique elements:

In [94]:
sorted(set('things are getting interesting'))

[' ', 'a', 'e', 'g', 'h', 'i', 'n', 'r', 's', 't']

+ `zip`: pairs up the elements of sequences to create a list of tuples;  
it takes any number of sequences, and the number of tuples it produces is set by the shortest one

In [98]:
seq1 = [1,2,3,4]; seq2 = ['a', 'b', 'c', 'd']
zip(seq1, seq2)

[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
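When the inputs have different lengths, `zip` stops at the end of the shortest one:

```python
seq1 = [1, 2, 3, 4]
seq3 = [False, True]   # a shorter sequence

# list() is needed in Python 3, where zip is lazy; it is harmless in Python 2
pairs = list(zip(seq1, seq3))
print(pairs)  # [(1, False), (2, True)]
```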

In [102]:
for i, (a, b) in enumerate(zip(seq1, seq2)):
    print ('%d: %d, %s' % (i, a, b))

0: 1, a
1: 2, b
2: 3, c
3: 4, d


`zip` can also "unzip" a list of pairs: `zip(*zipped)` passes each pair as a separate argument:

In [106]:
zipped = zip(seq1, seq2)
a, b = zip(*zipped)
a, b

((1, 2, 3, 4), ('a', 'b', 'c', 'd'))

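+ `reversed`: iterates over a sequence in reverse order (it returns an iterator, so wrap it in `list` to see the values)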

In [109]:
list(reversed(a))

[4, 3, 2, 1]

**Dict** : key-value pairs, mutable, keys must be hashable
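Whether a value can serve as a key depends on whether it is hashable: immutable values such as tuples work, while lists raise `TypeError`. A minimal sketch:

```python
hash((1, 2, (2, 3)))        # works: a tuple of hashable values is hashable

d = {}
d[(1, 2)] = 'tuple key'
try:
    d[[1, 2]] = 'list key'  # lists are mutable and therefore unhashable
except TypeError:
    print('a list cannot be used as a dict key')

d[tuple([1, 2])] = 'list converted to a tuple first'  # same key as (1, 2)
print(d)  # {(1, 2): 'list converted to a tuple first'}
```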

In [111]:
empty_dict = {}
d1 = {'a': 123, 'b':[1,2,3,4]}
d1

{'a': 123, 'b': [1, 2, 3, 4]}

In [113]:
d1[7] = 'seven'
d1

{7: 'seven', 'a': 123, 'b': [1, 2, 3, 4]}

Entries can be deleted with `del d[key]` or `d.pop(key)` (which also returns the value)

In [118]:
d1[5] = 'five'
d1['dummy'] = '10101'
d1

{5: 'five', 7: 'seven', 'a': 123, 'b': [1, 2, 3, 4], 'dummy': '10101'}

In [119]:
del d1['dummy']
d1

{5: 'five', 7: 'seven', 'a': 123, 'b': [1, 2, 3, 4]}

In [125]:
d1.pop(5)    # returns 'five' (not echoed: only the last expression is displayed)
d1.pop('a')

123

In [126]:
d1

{7: 'seven', 'b': [1, 2, 3, 4]}

In [128]:
d1.keys(), d1.values()

(['b', 7], [[1, 2, 3, 4], 'seven'])

One dict can be merged into another with `dict.update`:

In [130]:
d1.update({'c':'ccc', 'd':'ddd'})
d1

{7: 'seven', 'b': [1, 2, 3, 4], 'c': 'ccc', 'd': 'ddd'}
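`update` modifies the dict in place, and any key that already exists has its value replaced by the one passed in:

```python
d1 = {7: 'seven', 'b': [1, 2, 3, 4]}
d1.update({'b': 'new value', 'c': 'ccc'})  # 'b' already exists, so its value is replaced
print(d1['b'])  # new value
print(d1['c'])  # ccc
```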

In [132]:
mapping = {}
key_list = ['a', 'b', 'c', 'd']
value_list = range(4)
for key, value in zip(key_list, value_list):
    mapping[key] = value
mapping

{'a': 0, 'b': 1, 'c': 2, 'd': 3}
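Since `dict` accepts any sequence of (key, value) pairs, the loop above can be collapsed into one call on the output of `zip`:

```python
key_list = ['a', 'b', 'c', 'd']
value_list = range(4)

# dict() builds a mapping from (key, value) pairs
mapping = dict(zip(key_list, value_list))
print(mapping == {'a': 0, 'b': 1, 'c': 2, 'd': 3})  # True
```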

`d.get(key, default)` returns the value for `key`, or `default` if the key is not in the dict:

In [139]:
mapping.get('a', 'NA'), mapping.get('z', 'NA')

(0, 'NA')

In [152]:
words = ['abc', 'bca', 'abb', 'cba', 'bca']
by_letter = {}
for word in words:
    letter = word[0]
    # print letter, word
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)
by_letter

Use **`defaultdict`** to make things easier: 

In [154]:
from collections import defaultdict
d = defaultdict(list)
for word in words: d[word[0]].append(word)
d

The `defaultdict` constructor takes any function that returns the default value; for a default of 4, pass a lambda that returns 4:

In [174]:
four = defaultdict(lambda: 4)
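Looking up a missing key calls that function and stores the result. Passing a type works too: `int()` returns 0, which makes `defaultdict(int)` convenient for counting. A short sketch:

```python
from collections import defaultdict

four = defaultdict(lambda: 4)
print(four['missing'])   # 4: the key is created with the default value

counts = defaultdict(int)
for letter in 'banana':
    counts[letter] += 1  # missing letters start at int() == 0
print(dict(counts) == {'b': 1, 'a': 3, 'n': 2})  # True
```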

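**`.setdefault`** also helps: it returns the value for a key, first inserting a default if the key is missing

In [172]:
by_letter = {}
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
by_letter

{'a': ['abc', 'abb'], 'b': ['bca', 'bca'], 'c': ['cba']}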

**Set** : unordered, unique elements

In [178]:
a = {1,2,3,4,5}
b = {3,4,5,6,7}
a^b # symmetric difference: elements in exactly one of the sets

{1, 2, 6, 7}

In [179]:
{1,2,3}.issubset(a)

True

In [180]:
a.issuperset({1,2,3})

True

In [181]:
a.isdisjoint(b)

False

Other useful set methods: `s.add(element)`, `s.remove(element)`, and `s.difference(other)`.
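A runnable sketch of these methods, together with the union (`|`) and intersection (`&`) operators; results are sorted before printing only to make the output order predictable:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7}

s = set(a)            # copy, so a is left untouched
s.add(6)              # add an element in place
s.remove(1)           # remove an element (raises KeyError if absent)

print(sorted(s))                # [2, 3, 4, 5, 6]
print(sorted(s.difference(b)))  # [2], same as s - b
print(sorted(a | b))            # union: [1, 2, 3, 4, 5, 6, 7]
print(sorted(a & b))            # intersection: [3, 4, 5]
```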

## List comprehensions
List comprehension:  `[expr for val in collection if condition]`  
Dict comprehension:  `dict_comp = {key-expr: value-expr for value in collection if condition}`  
Set comprehension:  `set_comp = {expr for value in collection if condition}`
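Each comprehension is shorthand for a filtered loop; the two forms below build the same list:

```python
strings = ['a', 'ab', 'abc', 'abcd', 'abcde']

# List comprehension ...
upper_comp = [s.upper() for s in strings if len(s) >= 2]

# ... and the equivalent filtered loop
upper_loop = []
for s in strings:
    if len(s) >= 2:
        upper_loop.append(s.upper())

print(upper_comp == upper_loop)  # True
print(upper_comp)                # ['AB', 'ABC', 'ABCD', 'ABCDE']
```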

In [184]:
# list comp
strings = ['a', 'ab', 'abc', 'abcd', 'abcde']
[s for s in strings if len(s) >= 2]

['ab', 'abc', 'abcd', 'abcde']

In [186]:
# set comp
unique_length = {len(x) for x in strings}
unique_length

{1, 2, 3, 4, 5}

In [188]:
# dict comp
loc_mapping = {val: index for index, val in enumerate(strings)}
loc_mapping

{'a': 0, 'ab': 1, 'abc': 2, 'abcd': 3, 'abcde': 4}

In [191]:
# similar to previous cell
loc_mapping = dict((value, index) for index, value in enumerate(strings))
loc_mapping

{'a': 0, 'ab': 1, 'abc': 2, 'abcd': 3, 'abcde': 4}

Nested list comp example:

In [192]:
all_data = [['Tom', 'Billy', 'Andrew'], ['Susie', 'Casey', 'Ana']]

In [198]:
# names containing at least one 'a' or 'A': a double loop
names_with_a = [name for names in all_data for name in names if 'a' in name.lower()]
names_with_a

['Andrew', 'Casey', 'Ana']

In [200]:
some_tuples = [(1,2,3),(4,5,6),(7,8,9)]
flattened = [x for tup in some_tuples for x in tup]
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

The `for` clauses appear in the same order as they would in the equivalent nested `for` loops.
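Written out as nested loops, the flattening comprehension keeps its `for` clauses in the same order, outer first:

```python
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

flattened = []
for tup in some_tuples:   # first `for` clause -> outer loop
    for x in tup:         # second `for` clause -> inner loop
        flattened.append(x)

print(flattened == [x for tup in some_tuples for x in tup])  # True
```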

In [201]:
# this is a list comp containing list comp, which is totally different from above
[[x for x in tup] for tup in some_tuples]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

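## Functions
+ define with `def` and return values with `return`; if a function ends without `return`, it returns `None`  
+ variables assigned within a function live in its local namespace
+ returning multiple values actually returns a single tuple, which can be unpacked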
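A sketch of these points; `min_max` and `no_return` are hypothetical helpers written for illustration:

```python
def min_max(values):
    """Hypothetical helper: return the smallest and the largest value."""
    return min(values), max(values)   # really returns one tuple

def no_return():
    x = 1                             # x exists only in this local namespace

result = min_max([3, 1, 4, 1, 5])
print(result)               # (1, 5)
low, high = result          # unpack the tuple into two names
print(no_return() is None)  # True
```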

In [225]:
import re # regular expression
def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!?#]', '', value)
        value = value.title()
        result.append(value)
    return result

In [226]:
strings = ['    Alabama', 'Georgia!', 'FloRIDa', 'sounth     Carolina#']
clean_strings(strings)

['Alabama', 'Georgia', 'Florida', 'Sounth     Carolina']

In [227]:
def remove_punctuation(value):
    return re.sub('[!?#]', '', value)

In [228]:
clean_ops = [str.strip, remove_punctuation, str.title]

In [229]:
def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result


In [212]:
clean_strings(strings, clean_ops)

['Alabama', 'Georgia', 'Florida', 'Sounth     Carolina']

+ `map` applies a function to every element of a sequence (in Python 2 it returns a list)

In [220]:
map(remove_punctuation, strings)

['    Alabama', 'Georgia', 'FloRIDa', 'sounth     Carolina']
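A list comprehension gives the same result as `map`. In Python 3, `map` returns a lazy iterator rather than a list, which is why this sketch wraps it in `list`:

```python
import re

def remove_punctuation(value):
    return re.sub('[!?#]', '', value)

strings = ['    Alabama', 'Georgia!', 'FloRIDa', 'sounth     Carolina#']

mapped = list(map(remove_punctuation, strings))
comprehension = [remove_punctuation(s) for s in strings]
print(mapped == comprehension)  # True
```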

+ Anonymous functions (`lambda`): here a lambda serves as the sort key, ordering strings by their number of distinct letters

In [223]:
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']
strings.sort(key=lambda x: len(set(x)))
strings

['aaaa', 'foo', 'abab', 'bar', 'card']

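+ Closures: functions returned by other functions; the inner function keeps access to the enclosing function's variables even after that function has returned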

In [230]:
def make_watcher():
    have_seen = {}
    
    def has_been_seen(x):
        if x in have_seen:
            return True
        else:
            have_seen[x] = True
            return False
    return has_been_seen

In [231]:
watcher = make_watcher()
vals = [5,6,1,5,6,3,1,5]

In [232]:
[watcher(x) for x in vals]

[False, False, False, True, True, False, True, True]

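Each call to `make_watcher` creates a fresh `have_seen` dict. The inner function `has_been_seen` keeps a reference to that dict even after `make_watcher` returns, so it remembers every value it has been called with: the first time a value appears it is recorded and `False` is returned, and later calls with the same value return `True`.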

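A closure can also keep state that changes between calls. Python 2 closures cannot rebind a variable of the enclosing function, so the count lives inside a list that the inner function mutates:

In [233]:
def make_counter():
    count = [0]          # a one-element list, so the inner function can modify it
    def counter():
        count[0] += 1    # mutate the list instead of rebinding the name
        return count[0]
    return counter
counter = make_counter()
counter(), counter()

(1, 2)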
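Python 3 added the `nonlocal` statement, which lets the inner function rebind the enclosing variable directly, so the list trick is no longer needed. This sketch is Python 3 only (it is a syntax error in Python 2):

```python
def make_counter():
    count = 0
    def counter():
        nonlocal count   # Python 3 only: rebind the enclosing function's variable
        count += 1
        return count
    return counter

counter = make_counter()
print(counter(), counter())  # 1 2
```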

Yet another example: a closure that captures a format template and a field width:

In [235]:
def format_and_pad(template, space):
    def formatter(x):
        return (template % x).rjust(space)
    return formatter


In [237]:
fmt = format_and_pad('%.4f', 15)
fmt(2.1234567)

'         2.1235'

## Extended call syntax: \*args, \*\*kwargs
The function receives the positional arguments as a tuple `args` and the keyword arguments as a dict `kwargs`:

In [239]:
def hi(f, *args, **kwargs):
    print 'args is', args
    print 'kwargs is', kwargs
    print("Hello! Now I'm going to call %s" % f)
    return f(*args, **kwargs)
def g(x, y, z=1):
    return (x+y)/z

In [240]:
hi(g, 1,2,z=5)

args is (1, 2)
kwargs is {'z': 5}
Hello! Now I'm going to call <function g at 0x000000000A7516D8>


0.6

In [242]:
hi(g, 1,2,6)

args is (1, 2, 6)
kwargs is {}
Hello! Now I'm going to call <function g at 0x000000000A7516D8>


0.5

## Currying: partial argument application
The `partial` function in `functools` fixes some of a function's arguments and returns a new function that takes the rest:

In [244]:
from functools import partial
def add_numbers(x,y):
    return x+y
add_five = partial(add_numbers, 5)
add_five(6)

11
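The same currying can be written by hand with a lambda, and `partial` can also fix keyword arguments. A short comparison:

```python
from functools import partial

def add_numbers(x, y):
    return x + y

add_five = partial(add_numbers, 5)               # fixes x=5
add_five_by_hand = lambda y: add_numbers(5, y)   # the same currying, written out
add_ten_to = partial(add_numbers, y=10)          # keyword arguments can be fixed too

print(add_five(6) == add_five_by_hand(6) == 11)  # True
print(add_ten_to(1))                             # 11
```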

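## Generators
A generator is a simple way to construct a new iterable object.  
Use `yield` instead of `return` in a function to create a generator; it produces a sequence of values lazily, one at a time, only when they are requested. Iterating over a dict with `iter` first shows the iterator protocol that generators build on: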

In [248]:
some_dict = {'a':1, 'b':2, 'c':3}
dict_iterator = iter(some_dict)
dict_iterator

<dictionary-keyiterator at 0xa765ef8>

In [249]:
list(dict_iterator)

['a', 'c', 'b']

In [254]:
def squares(n=10):
    print 'Generating squares from 1 to %d' % (n**2)
    for i in xrange(1, n+1):
        yield i**2
gen = squares()
gen

<generator object squares at 0x000000000A7A2AF8>

In [255]:
for x in gen:
    print x,

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100
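Calling `next` on a generator resumes its body until the next `yield`; once the body finishes, `StopIteration` is raised. A generator expression gives the same laziness without a `def`. A sketch (this `squares` omits the print statement of the version above):

```python
def squares(n=10):
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)
print(next(gen))  # 1: the body runs only up to the first yield
print(next(gen))  # 4
print(next(gen))  # 9
try:
    next(gen)     # the body has finished
except StopIteration:
    print('generator exhausted')

# A generator expression: values are produced one at a time for sum()
total = sum(x ** 2 for x in range(1, 11))
print(total)      # 385
```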
