Rediscovering the stdlib

The collections module

from collections import (
    deque, defaultdict, namedtuple,
    OrderedDict, Counter
)

About me

Baptiste Mispelon (bmispelon)

I've been using python for over 5 years.

Currently doing web development with django at M2BPO.

About this presentation

No advanced python knowledge required.

Lots of small code examples.

Available online.

Introduction

Python has "batteries included" philosophy,

Introduction

Python has "batteries included" philosophy,

but they are easy to miss if you don't know about them.

What is this talk about?

The :navy:`collections` module.

5 specialized container datatypes.

ABC (not presented today).

1: collections.deque

List-like container with fast appends and pops on either end.

Introduced in python 2.4.

Short for "doubled-ended queue".

Time Complexity

Operation	list	deque
append
pop (right)
prepend
pop (left)

Time Complexity

Operation	list	deque
append	O(1)
pop (right)
prepend
pop (left)

Time Complexity

Operation	list	deque
append	O(1)
pop (right)	O(1)
prepend
pop (left)

Time Complexity

Operation	list	deque
append	O(1)
pop (right)	O(1)
prepend	O(n)
pop (left)

Time Complexity

Operation	list	deque
append	O(1)
pop (right)	O(1)
prepend	O(n)
pop (left)	O(n)

Time Complexity

Operation	list	deque
append	O(1)	O(1)
pop (right)	O(1)	O(1)
prepend	O(n)	O(1)
pop (left)	O(n)	O(1)

Initialization

deque(iterable, maxlen=None)

Both parameters are optional.

Iterable: list, str, dict, generator, ...

Initialization

deque() # ?

Initialization

deque() # deque([])

Initialization

deque() # deque([])
deque('abc') # ?

Initialization

deque() # deque([])
deque('abc') # deque(['a', 'b', 'c'])

Initialization

deque() # deque([])
deque('abc') # deque(['a', 'b', 'c'])
deque(xrange(1000, 0, -1), maxlen=3)
# ?

Initialization

deque() # deque([])
deque('abc') # deque(['a', 'b', 'c'])
deque(xrange(1000, 0, -1), maxlen=3)
# deque([3, 2, 1], maxlen=3)

Practical Example

Print the last 20 lines of a file.

$ tail -n 20 some_file.txt

Approach 1: Naive

with open('some_file.txt') as f:
    last = list(f)[-20:] ###
print "\n".join(last)

Approach 2: More Robust

last = []
with open('some_file.txt') as f:
    for line in f:
        last.append(line)
        if len(last) > 20:
            last.pop(0) ###
print "\n".join(last)

Approach 3: deque

with open('some_file.txt') as f:
    last = deque(f, maxlen=20) ###
print "\n".join(last)

2: collections.defaultdict

Dict subclass that calls a factory function to supply missing values.

Introduced in python 2.5.

Initialization

# defaultdict(factory, *a, **kw)
defaultdict(f)
defaultdict(f, {'foo': 'bar'})
defaultdict(f, [('foo', 'bar')])
defaultdict(f, foo='bar')

What's a factory?

Any callable without required arguments:

functions (named or anonymous)
classes
instance methods

Factory examples:

bool
int
float
complex
str
list
dict

Factory examples:

bool()	`False`
int
float
complex
str
list
dict

Factory examples:

bool()	`False`
int()	`0`
float
complex
str
list
dict

Factory examples:

bool()	`False`
int()	`0`
float()	`0.0`
complex
str
list
dict

Factory examples:

bool()	`False`
int()	`0`
float()	`0.0`
complex()	`0j`
str
list
dict

Factory examples:

bool()	`False`
int()	`0`
float()	`0.0`
complex()	`0j`
str()	`''`
list
dict

Factory examples:

bool()	`False`
int()	`0`
float()	`0.0`
complex()	`0j`
str()	`''`
list()	`[]`
dict

Factory examples:

bool()	`False`
int()	`0`
float()	`0.0`
complex()	`0j`
str()	`''`
list()	`[]`
dict()	`{}`

Missing Keys

# Regular dict:
rd = dict(foo='bar')
rd['foo']     # ?

Missing Keys

# Regular dict:
rd = dict(foo='bar')
rd['foo']     # 'bar'

Missing Keys

# Regular dict:
rd = dict(foo='bar')
rd['foo']     # 'bar'
rd['missing'] # ?

Missing Keys

# Regular dict:
rd = dict(foo='bar')
rd['foo']     # 'bar'
rd['missing'] # KeyError

Missing Keys

factory = lambda: 'X'
dd = defaultdict(factory, foo='bar')
dd['foo']     # ?

Missing Keys

factory = lambda: 'X'
dd = defaultdict(factory, foo='bar')
dd['foo']     # 'bar'

Missing Keys

factory = lambda: 'X'
dd = defaultdict(factory, foo='bar')
dd['foo']     # 'bar'
dd['missing'] # ?

Missing Keys

factory = lambda: 'X'
dd = defaultdict(factory, foo='bar')
dd['foo']     # 'bar'
dd['missing'] # 'X'

Practical Example

Given a list of payments (date, amount),

we want to get a mapping of {date: [amounts]}

Approach 1: naive

d = {}
for date, amount in L:
    if date not in L:
        d[date] = []
    d[date].append(amount)

Approach 2: Improved

d = {}
for date, amount in L:
    l = d.setdefault(date, [])
    l.append(amount)

Approach 3: defaultdict

d = defaultdict(list)
for date, amount in L:
    d[date].append(amount)

3: collections.namedtuple

Factory function for creating tuple subclasses with named fields.

Introduced in python 2.6.

What is a named tuple?

It's a subclass of tuple whose attributes can also be accessed by name (not just by position).

It uses as much memory as a regular tuple.

Like a tuple, it's immutable.

Initialization

# cls = namedtuple(name, fields)

Initialization

# cls = namedtuple(name, fields)
namedtuple('Point', ['x', 'y', 'z'])

Initialization

# cls = namedtuple(name, fields)
namedtuple('Point', ['x', 'y', 'z'])
namedtuple('Point', 'x y z')

Initialization

# cls = namedtuple(name, fields)
namedtuple('Point', ['x', 'y', 'z'])
namedtuple('Point', 'x y z')
namedtuple('Point', 'x,y,z')

Initialization

# cls = namedtuple(name, fields)
namedtuple('Point', ['x', 'y', 'z'])
namedtuple('Point', 'x y z')
namedtuple('Point', 'x,y,z')
# Creates **classes**,
#     not **instances**.

Creation

Point = namedtuple('Point', 'x y z')

Creation

Point = namedtuple('Point', 'x y z')
p = Point(23, 10, 85)

Creation

Point = namedtuple('Point', 'x y z')
p = Point(23, 10, 85)
p = Point(x=23, y=10, z=85)

Creation

Point = namedtuple('Point', 'x y z')
p = Point(23, 10, 85)
p = Point(x=23, y=10, z=85)
p = Point(y=10, x=23, z=85)

Accessing Fields

p = Point(x=23, y=10, z=85)

Accessing Fields

p = Point(x=23, y=10, z=85)
p[0] # ?

Accessing Fields

p = Point(x=23, y=10, z=85)
p[0] # 23

Accessing Fields

p = Point(x=23, y=10, z=85)
p[0] # 23
p.z  # ?

Accessing Fields

p = Point(x=23, y=10, z=85)
p[0] # 23
p.z  # 85

Attributes And Methods

p = Point(x=23, y=10, z=85)
p._fields # ?

Attributes And Methods

p = Point(x=23, y=10, z=85)
p._fields # ['x', 'y', 'z']

Attributes And Methods

p1 = Point(x=23, y=10, z=85)

Attributes And Methods

p1 = Point(x=23, y=10, z=85)
p2 = p1._replace(z=56)

Attributes And Methods

p1 = Point(x=23, y=10, z=85)
p2 = p1._replace(z=56)
tuple(p2) # ?

Attributes And Methods

p1 = Point(x=23, y=10, z=85)
p2 = p1._replace(z=56)
tuple(p2) # (23, 10, 56)

Attributes And Methods

p = Point(x=23, y=10, z=85)
p._asdict()
# ?

Attributes And Methods

p = Point(x=23, y=10, z=85)
p._asdict()
# {'x': 23, 'y': 10, 'z': 85}

Attributes And Methods

p = Point(x=23, y=10, z=85)
p._asdict()
# {'x': 23, 'y': 10, 'z': 85}
# Actually an OrderedDict instance

4: collections.OrderedDict

Dict subclass that remembers the order entries were added.

Introduced in python 2.7.

Initialization

Identical to dict.

OrderedDict(mapping)
OrderedDict(iterable)
OrderedDict(**kwargs)

dict Has No Order

d = {}
for char in 'abc':
    d[char] = None
print ''.join(d) # ?

dict Has No Order

d = {}
for char in 'abc':
    d[char] = None
print ''.join(d) # 'acb'

dict Has No Order

d = {}
for char in 'abc':
    d[char] = None
print ''.join(d) # 'acb'
# Actually,
# it depends on python's version.

dict Has No Order

d = {}
for char in 'abc':
    d[char] = None
print ''.join(d) # 'acb'
# Actually,
# it depends on python's version.
# And it's also affected by -R flag.

But ...

Order is consistent between two iterations if no items have been added or deleted.

zip(d.keys(), d.values())
# Same as d.items()

Shortcut (fromkeys)

d = {}
for char in 'abc':
    d[char] = None

Shortcut (fromkeys)

d = {}
for char in 'abc':
    d[char] = None
# Can be written as:
d = dict.fromkeys('abc')

Bringing Back Order

OrderDict are ordered by insertion order:

d = OrderedDict.fromkeys('abc')
''.join(d) # ?

Bringing Back Order

OrderDict are ordered by insertion order:

d = OrderedDict.fromkeys('abc')
''.join(d) # 'abc'

Keys are still unique!

d = OrderedDict.fromkeys('abcda')
''.join(d) # ?

Keys are still unique!

d = OrderedDict.fromkeys('abcda')
''.join(d) # 'bcda'

Comparisons

od1 = OrderedDict.fromkeys('abc')
od2 = OrderedDict.fromkeys('cba')
rd1 = dict.fromkeys('abc')
rd2 = dict.fromkeys('cba')

rd1 == rd2 # ?

Comparisons

od1 = OrderedDict.fromkeys('abc')
od2 = OrderedDict.fromkeys('cba')
rd1 = dict.fromkeys('abc')
rd2 = dict.fromkeys('cba')

rd1 == rd2 # True

Comparisons

od1 = OrderedDict.fromkeys('abc')
od2 = OrderedDict.fromkeys('cba')
rd1 = dict.fromkeys('abc')
rd2 = dict.fromkeys('cba')

rd1 == rd2 # True
od1 == od2 # ?

Comparisons

od1 = OrderedDict.fromkeys('abc')
od2 = OrderedDict.fromkeys('cba')
rd1 = dict.fromkeys('abc')
rd2 = dict.fromkeys('cba')

rd1 == rd2 # True
od1 == od2 # False

Comparisons

od1 = OrderedDict.fromkeys('abc')
od2 = OrderedDict.fromkeys('cba')
rd1 = dict.fromkeys('abc')
rd2 = dict.fromkeys('cba')

rd1 == rd2 # True
od1 == od2 # False
od1 == rd2 # ?

Comparisons

od1 = OrderedDict.fromkeys('abc')
od2 = OrderedDict.fromkeys('cba')
rd1 = dict.fromkeys('abc')
rd2 = dict.fromkeys('cba')

rd1 == rd2 # True
od1 == od2 # False
od1 == rd2 # True

5: collections.Counter

Dict subclass for counting hashable objects.

Introduced in python 2.7.

5: collections.Counter

Basically a defaultdict(int).

With some useful methods for counting stuff.

Initialization

Counter(iterable)

Iterable: list, str, dict, generator, ...

Counting Characters

# First approach, with a plain dict:
counter = {}
for letter in 'ababac':
    if letter not in counter:
        counter[letter] = 0
    counter[letter] += 1
counter['a'] # 3

Counting Characters

# Third approach, with a defaultdict
counter = defaultdict(int)
for letter in 'ababac':
    counter[letter] += 1
counter['a'] # 3

Counting Characters

# Finally, with a Counter
counter = Counter('ababac')
counter['a'] # 3

Counter Is A dict!

counter = Counter('aaabbc')
counter['a'] # ?

Counter Is A dict!

counter = Counter('aaabbc')
counter['a'] # 3

Counter Is A dict!

counter = Counter('aaabbc')
counter['a'] # 3
counter['d'] # ?

Counter Is A dict!

counter = Counter('aaabbc')
counter['a'] # 3
counter['d'] # 0

Counter Is A dict!

counter = Counter('aaabbc')
sum(counter.values()) # ?

Counter Is A dict!

counter = Counter('aaabbc')
sum(counter.values()) # 6
# Total number of elements

Counter Is A dict!

counter = Counter('aaabbc')
len(counter) # ?

Counter Is A dict!

counter = Counter('aaabbc')
len(counter) # 3
# Total number of unique elements:

Counter Is A dict!

counter = Counter('aaabbc')
counter.keys()
# ?

Counter Is A dict!

counter = Counter('aaabbc')
counter.keys()
# ['a', 'b', 'c']
# List of unique elements

Counter.elements()

# List elements, with repetition
c = Counter('ababac')
c.elements()
# ?

Counter.elements()

# List elements, with repetition
c = Counter('ababac')
c.elements()
# ['a', 'a', 'a', 'b', 'b', 'c']

Counter.elements()

# List elements, with repetition
c = Counter('ababac')
c.elements()
# ['a', 'a', 'a', 'b', 'b', 'c']
# The order is arbitrary!

Counter.most_common()

# List (element, count)
c = Counter('aaabbc')
c.most_common()
# ?

Counter.most_common()

# List (element, count)
c = Counter('aaabbc')
c.most_common()
# [('a', 3), ('b', 2), ('c', 1)]

Counter.most_common(N)

c = Counter('aaabbc')
c.most_common(2)
# ?

Counter.most_common(N)

c = Counter('aaabbc')
c.most_common(2)
# [('a', 3), ('b', 2)]

Counter.update()

# Add elements
c = Counter('aaabb')
c.update(a=2, b=1)
c # ?

Counter.update()

# Add elements
c = Counter('aaabb')
c.update(a=2, b=1)
c # Counter({'a': 5, 'b': 3})

Counter.subtract()

# subtract elements
c = Counter('aaabb')
c.subtract(a=2, b=1)
c # ?

Counter.subtract()

# subtract elements
c = Counter('aaabb')
c.subtract(a=2, b=1)
c # Counter({'a': 1, 'b': 1})

Arithmetic Operations

c1 = Counter('aaaabb')
c2 = Counter('aabbb')
c1 + c2 # ?

Arithmetic Operations

c1 = Counter('aaaabb')
c2 = Counter('aabbb')
c1 + c2 # Counter({'a': 6, 'b': 5})

Arithmetic Operations

c1 = Counter('aaaabb')
c2 = Counter('aabbb')
c1 + c2 # Counter({'a': 6, 'b': 5})
c1 - c2 # ?

Arithmetic Operations

c1 = Counter('aaaabb')
c2 = Counter('aabbb')
c1 + c2 # Counter({'a': 6, 'b': 5})
c1 - c2 # Counter({'a': 2})

Arithmetic Operations

c1 = Counter('aaaabb')
c2 = Counter('aabbb')
c1 + c2 # Counter({'a': 6, 'b': 5})
c1 - c2 # Counter({'a': 2})
c1 & c2 # ?

Arithmetic Operations

c1 = Counter('aaaabb')
c2 = Counter('aabbb')
c1 + c2 # Counter({'a': 6, 'b': 5})
c1 - c2 # Counter({'a': 2})
c1 & c2 # Counter({'a': 2, 'b': 2})

Arithmetic Operations

c1 = Counter('aaaabb')
c2 = Counter('aabbb')
c1 + c2 # Counter({'a': 6, 'b': 5})
c1 - c2 # Counter({'a': 2})
c1 & c2 # Counter({'a': 2, 'b': 2})
c1 | c2 # ?

Arithmetic Operations

c1 = Counter('aaaabb')
c2 = Counter('aabbb')
c1 + c2 # Counter({'a': 6, 'b': 5})
c1 - c2 # Counter({'a': 2})
c1 & c2 # Counter({'a': 2, 'b': 2})
c1 | c2 # Counter({'a': 4, 'b': 3})

Conclusion (TL;DL)

deque: list with fast operations on both sides;
defaultdict: dict with default values;
namedtuple: tuple with named attributes;
OrderedDict: ordered dict;
Counter: counts things.

Questions?

bmispelon@gmail.com
github.com/bmispelon
twitter: @bmispelon
IRC: bmispelon on Freenode (#python, #django)

Bonus: Recipes

Formatting With Defaults

d = dict(y='yes', no='no')
'%(y)s %(n)s' % d
# ?

Formatting With Defaults

d = dict(y='yes', n='no')
'%(y)s %(n)s' % d
# 'yes no'

Formatting With Defaults

d = dict(y='yes')
'%(y)s %(n)s' % d
# ?

Formatting With Defaults

d = dict(y='yes')
'%(y)s %(n)s' % d
# KeyError: 'n'

Formatting With Defaults

factory = lambda: 'no'
d = defaultdict(factory, y='yes')
'%(y)s %(n)s' % d
# ?

Formatting With Defaults

factory = lambda: 'no'
d = defaultdict(factory, y='yes')
'%(y)s %(n)s' % d
# 'yes no'

Auto-vivifying Tree

# don't try this at home!
def factory():
    return defaultdict(factory)
d = defaultdict(factory)

d['a']['b']['c'] = 42

Counting Letters

s = 'a,b.a8a?b'
Counter(c for c in s
        if c.isalpha())
# ?

Counting Letters

s = 'a,b.a8a?b'
Counter(c for c in s
        if c.isalpha())
# Counter({'a': 3, 'b': 2})

Counting Letters (CI)

s = 'a,b.A8A?B'
Counter(c.lower() for c in s
        if c.isalpha())
# ?

Counting Letters (CI)

s = 'a,b.A8A?B'
Counter(c.lower() for c in s
        if c.isalpha())
# Counter({'a': 3, 'b': 2})

Couting words

text = 'foo bar foo'
Counter(text.split())
# ?

Couting words

text = 'foo bar foo'
Counter(text.split())
# Counter({'foo': 2, 'bar': 1})

Questions?

bmispelon@gmail.com
github.com/bmispelon
twitter: @bmispelon
IRC: bmispelon on Freenode (#python, #django)

Files

20130227_stdlib_collections.rst

Latest commit

History

20130227_stdlib_collections.rst

File metadata and controls

Rediscovering the stdlib

About me

About this presentation

Introduction

Introduction

What is this talk about?

1: collections.deque

1: collections.deque

Time Complexity

Time Complexity

Time Complexity

Time Complexity

Time Complexity

Time Complexity

Initialization

Initialization

Initialization

Initialization

Initialization

Initialization

Initialization

Practical Example

Approach 1: Naive

Approach 2: More Robust

Approach 3: deque

2: collections.defaultdict

2: collections.defaultdict

Initialization

What's a factory?

Factory examples:

Factory examples:

Factory examples:

Factory examples:

Factory examples:

Factory examples:

Factory examples:

Factory examples:

Missing Keys

Missing Keys

Missing Keys

Missing Keys

Missing Keys

Missing Keys

Missing Keys

Missing Keys

Practical Example

Approach 1: naive

Approach 2: Improved

Approach 3: defaultdict

3: collections.namedtuple

3: collections.namedtuple

What is a named tuple?

Initialization

Initialization

Initialization

Initialization

Initialization

Creation

Creation

Creation

Creation

Accessing Fields

Accessing Fields

Accessing Fields

Accessing Fields

Accessing Fields

Attributes And Methods

Attributes And Methods

Attributes And Methods

Attributes And Methods

Attributes And Methods

Attributes And Methods

Attributes And Methods

Attributes And Methods