# The Collections Module

The `collections` module contains a number of not-quite-builtin collection types that are nonetheless used very frequently.

In [1]:
import collections

## namedtuple

Tuples are nice, but it's sometimes hard to remember the meaning of each position. `namedtuple` lets you refer to tuple elements by name as well as by position.

In [2]:
Point = collections.namedtuple('Point', 'x y')
pt = Point(2, 3)
pt

Point(x=2, y=3)

You can retrieve values from a `namedtuple` by index or by name:

In [3]:
print 'by index:', pt[0]
print 'by name:', pt.x

by index: 2
by name: 2


You can also easily convert between a `dict` and a `namedtuple`:

In [4]:
dict(pt._asdict())

{'x': 2, 'y': 3}

In [12]:
dct = {'x': 5, 'y': 9}
Point(**dct)    # equivalent to Point(x=5, y=9)

Point(x=5, y=9)

In [9]:
Point.mro()

[__main__.Point, tuple, object]
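Since a `namedtuple` is an ordinary immutable tuple underneath, a few of its other helpers may be worth a quick sketch. The `Point` type mirrors the one above; `_fields`, `_replace`, and `_make` are part of the standard `namedtuple` API, while the sample values are made up for illustration. The block is written to run on Python 3 as well as the Python 2 used in this notebook.

```python
import collections

Point = collections.namedtuple('Point', 'x y')
pt = Point(2, 3)

# Field names are stored on the class, in positional order
fields = Point._fields

# namedtuples unpack like ordinary tuples
x, y = pt

# They are immutable: _replace returns a new instance instead of mutating
moved = pt._replace(x=10)

# _make builds an instance from any iterable of the right length
from_list = Point._make([7, 8])
```

Note that `pt` itself is unchanged after `_replace`; only `moved` carries the new `x`.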

## OrderedDict

`OrderedDict` provides a dict-like object that remembers the order in which its keys were inserted (in the Python 2 used in this notebook, the order of keys and values in a plain `dict` is arbitrary).

In [13]:
od = collections.OrderedDict()
od['first'] = 1
od['second'] = 2
od['third'] = 3
od

OrderedDict([('first', 1), ('second', 2), ('third', 3)])

`namedtuple._asdict()` actually returns an `OrderedDict` since `namedtuple`s are, in fact, ordered:

In [14]:
pt._asdict()

OrderedDict([('x', 2), ('y', 3)])
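As a rough sketch of what "remembering order" means in practice, the following block reuses the `od` example from above. The variable names are illustrative only; the behaviours shown (stable position on update, order-sensitive equality, `popitem(last=False)`) are standard `OrderedDict` features.

```python
import collections

od = collections.OrderedDict()
od['first'] = 1
od['second'] = 2
od['third'] = 3

# Iteration follows insertion order
keys = list(od.keys())

# Re-assigning an existing key keeps its original position
od['first'] = 100
keys_after_update = list(od.keys())

# Comparing two OrderedDicts is order-sensitive
other = collections.OrderedDict([('second', 2), ('first', 100), ('third', 3)])
order_matters = (od != other)

# popitem(last=False) removes the oldest entry (FIFO behaviour)
oldest = od.popitem(last=False)
```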

## defaultdict

`defaultdict` provides a `dict` subclass that is never missing a key. To use it, you supply a "default factory" function. When you look up a missing key, the object calls that function with no arguments, stores the result under the key, and returns it:

In [15]:
def default_factory():
    return 'NotFound'

dd = collections.defaultdict(default_factory)
dd['x'] = 1
dd['y'] = 2
dd

defaultdict(<function __main__.default_factory>, {'x': 1, 'y': 2})

In [16]:
dd['z']

'NotFound'

In [17]:
dd

defaultdict(<function __main__.default_factory>,
            {'x': 1, 'y': 2, 'z': 'NotFound'})

In [20]:
seq = range(10)
even_odd = collections.Counter()
# A plain dict (even_odd = {}) would raise KeyError on the first += 1
for x in seq:
    if x % 2 == 0:
        even_odd['even'] += 1
    else:
        even_odd['odd'] += 1
even_odd

Counter({'even': 5, 'odd': 5})

`defaultdict` is often useful when performing aggregations. For instance, we might have a list of names and phone numbers, and want to collect the phone numbers for an individual:

In [18]:
raw_data = [
    ('Rick', '111-222-3333'),
    ('Kelby', '444-555-6666'),
    ('Rick', '777-888-9999')
]
grouped = collections.defaultdict(list)    # list() returns an empty list

In [19]:
for name, number in raw_data:
    grouped[name].append(number)
print grouped

defaultdict(<type 'list'>, {'Kelby': ['444-555-6666'], 'Rick': ['111-222-3333', '777-888-9999']})
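To make the benefit concrete, here is a small comparison sketch using the same `raw_data` as above. The `plain`, `has_unknown`, and `looked_up` names are introduced only for this illustration. It also shows one detail that is easy to miss: only indexing with `[]` triggers the default factory.

```python
import collections

raw_data = [
    ('Rick', '111-222-3333'),
    ('Kelby', '444-555-6666'),
    ('Rick', '777-888-9999'),
]

# With a plain dict, each key has to be initialised before appending
plain = {}
for name, number in raw_data:
    plain.setdefault(name, []).append(number)

# defaultdict(list) performs that initialisation automatically
grouped = collections.defaultdict(list)
for name, number in raw_data:
    grouped[name].append(number)

# Membership tests and .get() do NOT call the default factory,
# so they do not insert the missing key
has_unknown = 'Nobody' in grouped
looked_up = grouped.get('Nobody')
```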


# Functional Programming

In `defaultdict`, we saw an example of passing a function as a parameter to another function. In Python, functions are *first-class objects*, meaning that you can use them wherever you can use other objects. Using a function as a "factory" parameter for `defaultdict` is one example.

Python provides three useful builtin functions (`map`, `filter`, and `reduce`) for functional programming, and one keyword, `lambda`.

## lambda

The `lambda` keyword allows you to define simple, single-expression functions as an expression:

In [21]:
double_me = lambda x: x * 2
double_me(6)

12

`lambda` is especially useful when used as a parameter to a function:

In [22]:
dd = collections.defaultdict(lambda: 'NotFound')
dd['x'] = 1
dd['y'] = 2
dd

defaultdict(<function __main__.<lambda>>, {'x': 1, 'y': 2})

In [23]:
dd['z']

'NotFound'
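Another common place to pass a `lambda` is the `key` argument of `sorted()`. The sketch below uses made-up `(name, age)` pairs; the `people` list and the result names are assumptions for illustration only.

```python
# Made-up (name, age) pairs for illustration
people = [('Rick', 42), ('Kelby', 35), ('Ana', 51)]

# Sort by the second element of each tuple (the age)
by_age = sorted(people, key=lambda person: person[1])

# Sort by the length of the name, longest first
by_name_length = sorted(people, key=lambda person: len(person[0]), reverse=True)
```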

## map()

The `map()` builtin applies a function to each element of a sequence and returns a list of the results:

In [26]:
my_list = range(5)
my_list

[0, 1, 2, 3, 4]

In [27]:
map(lambda x: x**2, my_list)

[0, 1, 4, 9, 16]

We can also apply `map` to multiple sequences with a multi-parameter function:

In [29]:
def adder(x, y):
    return x + y

In [32]:
map(adder, my_list, my_list)

[0, 2, 4, 6, 8]
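Both `map()` calls above can also be written as list comprehensions, which many Python programmers find easier to read. The following is a sketch of that equivalence; the result names are illustrative. The `list(...)` wrappers are there because on Python 3 `range()` and `map()` return lazy objects rather than lists.

```python
my_list = list(range(5))

# list() makes the result a list on Python 3 too, where map() is lazy
squares_map = list(map(lambda x: x ** 2, my_list))

# Equivalent list comprehension
squares_comp = [x ** 2 for x in my_list]

# For two sequences, zip() plays the role of map()'s extra arguments
sums_comp = [x + y for x, y in zip(my_list, my_list)]
```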

## filter()

The `filter()` builtin allows us to return only elements of a list that match a certain predicate function:

In [33]:
filter(lambda x: x % 2 == 0, my_list)    # Keep only the even numbers

[0, 2, 4]

In [34]:
map(lambda x: x % 2 == 0, my_list)    # By contrast, map() returns the predicate's result for every element

[True, False, True, False, True]
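As with `map()`, a `filter()` call has a direct list-comprehension equivalent. The sketch below shows both forms side by side, plus the special case of passing `None` as the predicate. The variable names and the mixed sample list are made up for illustration.

```python
my_list = list(range(5))

# filter() keeps the elements for which the predicate is true
evens_filter = list(filter(lambda x: x % 2 == 0, my_list))

# Equivalent list comprehension with an if clause
evens_comp = [x for x in my_list if x % 2 == 0]

# Passing None as the predicate keeps only the truthy elements
truthy = list(filter(None, [0, 1, '', 'a', None, [], [2]]))
```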

## reduce()

The `reduce()` builtin allows us to apply a "reduction" operation that uses a function to combine elements of a sequence into a single value. For instance, we could compute the sum of a list using `reduce()` as follows:

In [35]:
reduce(lambda acc, val: acc + val, my_list)

10

In [37]:
lst = [
    [1, 2],
    [3, 4],
    [5, 6]
]

In [39]:
reduce(lambda acc, val: acc + val, lst)

[1, 2, 3, 4, 5, 6]
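`reduce()` also accepts an optional third argument, an initial value for the accumulator. The sketch below illustrates it, and records each call to show the left-to-right order of the reduction. The `traced_add` helper and the `steps` list are hypothetical names used only here. Importing `reduce` from `functools` works on Python 2.6+ and is required on Python 3, where it is no longer a builtin.

```python
from functools import reduce

my_list = list(range(5))

# Without an initial value, reduce() starts from the first element
total = reduce(lambda acc, val: acc + val, my_list)

# With an initial value, the accumulator starts there instead
total_from_100 = reduce(lambda acc, val: acc + val, my_list, 100)

# The initial value also makes empty sequences safe
empty_total = reduce(lambda acc, val: acc + val, [], 0)

# Recording each step shows the order in which elements are combined
steps = []

def traced_add(acc, val):
    steps.append((acc, val))
    return acc + val

traced_total = reduce(traced_add, [1, 2, 3, 4])
```

Without the initial value, reducing an empty sequence raises a `TypeError`.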

# The operator module

While `lambda` is handy, sometimes it's verbose. For times like this, we can use the `operator` module, which provides functions representing Python built-in operators (e.g. `operator.add` for `+`). We could re-write the example above as follows:

In [42]:
from operator import add
reduce(add, my_list)

10

Combining these ideas, we could then define a `dot_product` function using `map` and `reduce` as follows:

In [43]:
from operator import add, mul

def dot_product(xs, ys):
    return reduce(add, map(mul, xs, ys))

dot_product([1, 2, 3], [4, 5, 6])

32

(Please don't do this, however, as `reduce(add...)` is much slower than the builtin `sum()`, and `numpy` has built-in dot products anyway.)
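For comparison, here is a sketch of the `sum()`-based version suggested above, together with `operator.itemgetter`, another helper from the same module that often replaces a small `lambda`. The `pairs` data and result names are made up for illustration.

```python
from operator import itemgetter, mul

def dot_product(xs, ys):
    # sum() replaces reduce(add, ...); map(mul, xs, ys) multiplies pairwise
    return sum(map(mul, xs, ys))

result = dot_product([1, 2, 3], [4, 5, 6])

# itemgetter(1) behaves like lambda pair: pair[1]
pairs = [('a', 3), ('b', 1), ('c', 2)]
by_second = sorted(pairs, key=itemgetter(1))
```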

# Exercise 1

- Create a counter with a `defaultdict` by setting the `default_factory` to `int`. Use your counter to count the number of times each letter appears in this sentence: 
    - `a quick brown fox jumps over the lazy dog`



In [17]:
from collections import defaultdict
ctr = defaultdict(int)

mystr = 'a quick brown fox jumps over the lazy dog'

# A plain loop is clearer than calling map() only for its side effects
for letter in mystr:
    ctr[letter] += 1
    
print ctr

defaultdict(<type 'int'>, {' ': 8, 'a': 2, 'c': 1, 'b': 1, 'e': 2, 'd': 1, 'g': 1, 'f': 1, 'i': 1, 'h': 1, 'k': 1, 'j': 1, 'm': 1, 'l': 1, 'o': 4, 'n': 1, 'q': 1, 'p': 1, 's': 1, 'r': 2, 'u': 2, 't': 1, 'w': 1, 'v': 1, 'y': 1, 'x': 1, 'z': 1})
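The `collections.Counter` class used earlier does this kind of counting directly. As a sketch of how the two approaches line up (the `counts` and `top_two` names are introduced only for this illustration):

```python
from collections import Counter, defaultdict

mystr = 'a quick brown fox jumps over the lazy dog'

# Counting by hand with defaultdict(int)
ctr = defaultdict(int)
for letter in mystr:
    ctr[letter] += 1

# Counter does the same counting in a single call
counts = Counter(mystr)

# most_common(n) returns the n highest counts, largest first
top_two = counts.most_common(2)
```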


# Functional Closures and Decorators

Python has a feature known as *lexical scoping*. This means that when a function references a name that is not local to the function, it attempts to resolve that name where the function was initially *defined*. A simple example is when using global names:


In [48]:
x = 5
def print_x():
    print x
    
print_x()

5


A more interesting case is when you define a function *within* another function. In this case, Python will search each enclosing function for the name being referenced, starting from the inside. Using this feature, we can make a "function factory" that returns functions with certain values "bound" to where the function was defined. We call such a function a **closure**. For instance:

In [49]:
def make_adder(x):
    def adder(y):
        return x + y
    return adder
add5 = make_adder(5)
add6 = make_adder(6)

In [50]:
add5(10)

15

In [51]:
add6(12)

18
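To see where the "bound" value actually lives, and to show that a closure can also carry state between calls, consider the following sketch. `make_counter` and its inner names are hypothetical helpers, not part of the original notebook. A one-element list holds the count so that the code also works on Python 2, which has no `nonlocal` statement.

```python
def make_adder(x):
    def adder(y):
        return x + y
    return adder

add5 = make_adder(5)

# The captured value lives in a "cell" attached to the inner function
captured = add5.__closure__[0].cell_contents

# Closures can also hold mutable state between calls
def make_counter():
    count = [0]          # list used as a mutable container
    def increment():
        count[0] += 1
        return count[0]
    return increment

counter_a = make_counter()
counter_b = make_counter()

# Each closure has its own independent state
calls = [counter_a(), counter_a(), counter_b()]
```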

## Function wrappers and decorators

A specific case where closures are frequently seen is in building *function wrappers*. For instance, we may wish to log each invocation of a function:

In [53]:
def logging(f):
    def wrapper(*args, **kwargs):
        print 'Calling %r(%r, %r)' % (f, args, kwargs)
        return f(*args, **kwargs)
    return wrapper

logging_add5 = logging(add5)
logging_add5(4)

Calling <function adder at 0x1099ffc08>((4,), {})


9

This case is so common that it has its own term (*decorator*), and its own syntax. Suppose we had defined our logging decorator before another function that we wanted to wrap:

In [59]:
def mul(x, y):
    return x * y

wrapped_mul = logging(mul)
print wrapped_mul(x=2, y=3)

Calling <function mul at 0x1099ff9b0>((), {'y': 3, 'x': 2})
6


In [54]:
def wrapped_function():
    print 'Calling wrapped function'
    
wrapped_function = logging(wrapped_function)

wrapped_function()

Calling <function wrapped_function at 0x1099ff398>((), {})
Calling wrapped function


A "nicer" way to write the above is to use the *decorator syntax*:

In [55]:
@logging
def wrapped_function():
    print 'Calling wrapped function'
    
wrapped_function()

Calling <function wrapped_function at 0x1099fff50>((), {})
Calling wrapped function


In [None]:
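# A decorator that takes arguments...
@foo(1,2)
def bar():
    ...

# ...is shorthand for calling foo(1,2) first and applying the result to bar:
def bar():
    ...
deco = foo(1,2)
bar = deco(bar)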
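The equivalence sketched above can be made concrete with a runnable example. The `repeat` decorator factory and the `greet` functions are hypothetical, introduced only to show the three nested levels at work: the factory receives the arguments, the decorator receives the function, and the wrapper runs on every call.

```python
def repeat(times):
    # Outer level: receives the decorator's arguments
    def decorator(f):
        # Middle level: receives the function being decorated
        def wrapper(*args, **kwargs):
            # Inner level: runs each time the decorated function is called
            return [f(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator

# Using the decorator syntax
@repeat(3)
def greet(name):
    return 'hello ' + name

# The same thing written out by hand
def greet_plain(name):
    return 'hello ' + name

deco = repeat(3)
greet_by_hand = deco(greet_plain)
```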

One drawback of a plain wrapper is that the decorated function loses its own name and docstring:

In [60]:
@logging
def myfun():
    '''This is a docstring'''
    print 'hi'
    

In [61]:
help(myfun)

Help on function wrapper in module __main__:

wrapper(*args, **kwargs)



## functools.wraps

The Python standard library module `functools` provides a number of useful functions for functional programming. One of these is the `@wraps` decorator. It is useful when defining decorators, because it copies the wrapped function's name, docstring, and other metadata onto the wrapper:

In [67]:
from functools import wraps

def logging_message(message):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            'Wrapper docstring'
            print '%s: %r(%r, %r)' % (message, f, args, kwargs)
            return f(*args, **kwargs)
        return wrapper
    return decorator



In [68]:
@logging_message('Calling it now!')
def func():
    'Func docstring'
    print 'Running it now!'
    
print func()
    
print func
print func.__doc__
print help(func)

Calling it now!: <function func at 0x109a66230>((), {})
Running it now!
None
<function func at 0x109a662a8>
Func docstring
Help on function func in module __main__:

func(*args, **kwargs)
    Func docstring

None


In [50]:
func()

Calling it now!: <function func at 0x1040a2aa0>((), {})
Running it now!


In [91]:
def deco_factory(a):
    print 'Entering deco_factory with', a
    def decorator(func):
        print 'Entering decorator with', func
        @wraps(func)
        def wrapper(x):
            print 'Calling wrapper with', a, func, x
            result = func(x)
            print 'Returning', result, 'from wrapper'
            return result    # without this, the decorated function would always return None
        return wrapper
    return decorator

In [92]:
@deco_factory('avalue')
def func(x):
    print 'Calling func with', x
    return x * 2

Entering deco_factory with avalue
Entering decorator with <function func at 0x109a66b90>


In [93]:
func

<function __main__.func>

In [94]:
func(2)

Calling wrapper with avalue <function func at 0x109a66b90> 2
Calling func with 2
Returning 4 from wrapper


4


### Exercises:
- Create a counter with a `defaultdict` by setting the `default_factory` to `int`. Use your counter to count the number of times each letter appears in this sentence: 
    - `a quick brown fox jumps over the lazy dog`


- Create a function called `printer` that takes a string and prints it. Then create a wrapper that will print the number of times each letter appears in the string passed in to `printer`, followed by the string.


- Use the wrapper as a decorator on your `printer` function. 

In [29]:
from functools import wraps
from collections import Counter
def print_letters(func):
    @wraps(func)
    def wrapped(astring):
        ctr = Counter()
        for letter in astring:    # avoid shadowing the builtin chr()
            ctr[letter] += 1
        print ctr
        return func(astring)
    return wrapped


@print_letters
def printer(astring):
    print astring
    
printer("This is vinod gupta")
    

Counter({' ': 3, 'i': 3, 's': 2, 'a': 1, 'd': 1, 'g': 1, 'h': 1, 'o': 1, 'n': 1, 'p': 1, 'u': 1, 'T': 1, 'v': 1, 't': 1})
This is vinod gupta


In [96]:
def decorator(func):
    def wrapper(astring):
        c = Counter()
        for letter in astring:
            c[letter] += 1
        print c
        return func(astring)
    return wrapper

In [97]:
@decorator
def printer(astring):
    print astring

In [98]:
printer('a quick brown fox jumps over the lazy dog')

Counter({' ': 8, 'o': 4, 'a': 2, 'e': 2, 'r': 2, 'u': 2, 'c': 1, 'b': 1, 'd': 1, 'g': 1, 'f': 1, 'i': 1, 'h': 1, 'k': 1, 'j': 1, 'm': 1, 'l': 1, 'n': 1, 'q': 1, 'p': 1, 's': 1, 't': 1, 'w': 1, 'v': 1, 'y': 1, 'x': 1, 'z': 1})
a quick brown fox jumps over the lazy dog


In [53]:
def counter(sentence='a quick brown fox jumps over the lazy dog'):
    count = collections.defaultdict(int)

    for letter in sentence.lower().replace(' ', ''):
        count[letter] += 1

    for i in count.items():
        print i
counter('Hello, there')

('e', 3)
('h', 2)
('l', 2)
('o', 1)
(',', 1)
('r', 1)
('t', 1)


In [54]:
# Wrapper counter
def printer(text):
    print text


def count(f):
    def wrapper(*args, **kwargs):
        sentence = args[0]
        sentence = sentence.lower().replace(' ', '')
        counter = collections.defaultdict(int)

        for letter in sentence:
            counter[letter] += 1

        for l, c in counter.items():
            print '{}: {}'.format(l, c)
        return f(*args, **kwargs)
    return wrapper


In [55]:
wrapped_print = count(printer)

In [56]:
wrapped_print('message')

a: 1
e: 2
s: 2
m: 1
g: 1
message


In [57]:
@count
def printer(text):
    print text


In [58]:
printer('message')

a: 1
e: 2
s: 2
m: 1
g: 1
message
