### Doing things with Python collections

The built-in `dict`, `set`, and `list` types, along with the `collections` module, provide useful utilities for transforming your data. This section walks through some common operations:

* Create a dict from two iterables
* Remove duplicates from a list
* Most common in an iterable
* Least common in an iterable
* Counts in an iterable
* Remove duplicates from a list while maintaining insertion order
* Sort a dictionary by keys
* Sort a dictionary by values
* Group values by key


In Python 2, and in Python 3 before version 3.7, the built-in `dict` makes no guarantees about the order of its keys. If you need the keys to stay in insertion order, use `collections.OrderedDict`.

In [4]:
from collections import OrderedDict
# Pass a list of pairs: in Python 2, keyword arguments arrive in arbitrary order.
od = OrderedDict([("a", 1), ("b", 2)])
print od
print od["a"]

OrderedDict([('a', 1), ('b', 2)])
1
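
To make the ordering behaviour concrete, here is a small sketch. The names `first` and `second` are illustrative and not part of the original notebook, and the block is written so that it should also run under Python 3. It shows that an `OrderedDict` iterates in insertion order, and that comparing two `OrderedDict`s takes order into account while comparing against a plain `dict` does not.

```python
from collections import OrderedDict

# Build from lists of pairs so the insertion order is explicit.
first = OrderedDict([("a", 1), ("b", 2)])
second = OrderedDict([("b", 2), ("a", 1)])

keys_in_order = list(first.keys())      # keys come back in insertion order
same_as_ordered = first == second       # order matters between two OrderedDicts
same_as_plain = first == dict(second)   # order is ignored against a plain dict

print(keys_in_order)
print(same_as_ordered)
print(same_as_plain)
```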


Remove duplicates from a list (a `set` does not preserve the original order)

In [9]:
list(set([1, 5, 2, 3, 3, 1, 5]))

[1, 2, 3, 5]

Remove duplicates from a list while maintaining insertion order

In [10]:
list(OrderedDict.fromkeys([1, 5, 2, 3, 3, 1, 5]))

[1, 5, 2, 3]
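
An alternative to `OrderedDict.fromkeys` is to track the items already seen in a `set`. The sketch below is one possible way to do it; `unique_in_order` is a hypothetical helper name, not a standard library function, and it assumes the items are hashable.

```python
def unique_in_order(items):
    """Yield each item the first time it appears, preserving order."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


deduped = list(unique_in_order([1, 5, 2, 3, 3, 1, 5]))
print(deduped)
```

Because it is a generator, this version can also consume an input lazily without first building an intermediate dictionary.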

Create a dict from two iterables. For example, a substitution cipher needs a numeric representation of each letter.

In [7]:
import string
letters = zip(string.ascii_lowercase, range(len(string.ascii_lowercase)))
lettersdict = OrderedDict(letters)
print lettersdict
print lettersdict["z"]

OrderedDict([('a', 0), ('b', 1), ('c', 2), ('d', 3), ('e', 4), ('f', 5), ('g', 6), ('h', 7), ('i', 8), ('j', 9), ('k', 10), ('l', 11), ('m', 12), ('n', 13), ('o', 14), ('p', 15), ('q', 16), ('r', 17), ('s', 18), ('t', 19), ('u', 20), ('v', 21), ('w', 22), ('x', 23), ('y', 24), ('z', 25)])
25
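
When ordering does not matter, the same `zip` pattern works with a plain `dict`. The following sketch, with illustrative names such as `letter_to_index` and `index_to_letter`, builds the mapping in both directions and uses it for a toy round trip. It is only meant to show the idea, not a real cipher.

```python
import string

letters = string.ascii_lowercase

# zip pairs up the two iterables; dict consumes the resulting pairs.
letter_to_index = dict(zip(letters, range(len(letters))))

# The reverse mapping, built with a dict comprehension.
index_to_letter = {index: letter for letter, index in letter_to_index.items()}

# Toy round trip: letters to numbers and back again.
encoded = [letter_to_index[ch] for ch in "cab"]
decoded = "".join(index_to_letter[i] for i in encoded)

print(encoded)
print(decoded)
```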


---
`Counter` can be used to count the occurrences of values in an iterable.
* Most common in iterable
* Least common in iterable
* Counts in iterable

In [14]:
from collections import Counter
import random
list_of_ints = [random.randint(0, 100) for _ in range(1000)]
counts = Counter(list_of_ints)
print "Most Common: {} with {} occurrences".format(*counts.most_common(1)[0])
# most_common() with no argument returns every element, most frequent first
print "Least Common: {} with {} occurrences".format(*counts.most_common()[-1])

Most Common: 20 with 17 occurrences
Least Common: 75 with 2 occurrences
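
Since the example above uses random data, a small deterministic sketch may make the slicing easier to follow. The input list is made up for illustration. `most_common(n)` returns the `n` most frequent elements, and slicing the full result backwards gives the least frequent ones.

```python
from collections import Counter

# 3 appears three times, 2 twice, 1 once: no ties, so the order is unambiguous.
counts = Counter([3, 1, 3, 2, 3, 2])

top_two = counts.most_common(2)             # two most frequent (value, count) pairs
bottom_two = counts.most_common()[:-3:-1]   # two least frequent, rarest first

print(top_two)
print(bottom_two)
```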


What is the count of any element in an iterable?

In [16]:
counts = Counter([1, 1, 2, 3, 5, 5, 5, 1, 2])
print "count of 1: {}".format(counts[1])
print "count of 100: {}".format(counts[100])

count of 1: 3
count of 100: 0
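
One detail worth spelling out: looking up a missing element returns `0` but does not add that element to the `Counter`. The sketch below reuses the list from the cell above and also shows `update`, which adds further observations to the existing counts. The variable names are illustrative.

```python
from collections import Counter

counts = Counter([1, 1, 2, 3, 5, 5, 5, 1, 2])

missing = counts[100]          # 0 rather than a KeyError
still_absent = 100 in counts   # the lookup above did not insert the key

# update() adds to the existing counts instead of replacing them.
counts.update([5, 5, 100])
total = sum(counts.values())   # total number of observations so far

print(missing)
print(still_absent)
print(counts[5])
print(total)
```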


Using `OrderedDict` you can

* Sort a dictionary by keys
* Sort a dictionary by values

This is useful if you later want to get the `.items()` out in sorted order.

In [21]:
letters = zip(string.ascii_lowercase, [random.randint(0, 100) for _ in range(len(string.ascii_lowercase))])
sorted_by_key = OrderedDict(sorted(letters, key=lambda t: t[0]))
print sorted_by_key

OrderedDict([('a', 71), ('b', 33), ('c', 87), ('d', 58), ('e', 75), ('f', 16), ('g', 81), ('h', 48), ('i', 83), ('j', 1), ('k', 5), ('l', 12), ('m', 72), ('n', 13), ('o', 85), ('p', 58), ('q', 12), ('r', 8), ('s', 46), ('t', 28), ('u', 34), ('v', 44), ('w', 82), ('x', 56), ('y', 38), ('z', 86)])


In [23]:
sorted_by_value = OrderedDict(sorted(letters, key=lambda t: t[1]))
print sorted_by_value

OrderedDict([('j', 1), ('k', 5), ('r', 8), ('l', 12), ('q', 12), ('n', 13), ('f', 16), ('t', 28), ('b', 33), ('u', 34), ('y', 38), ('v', 44), ('s', 46), ('h', 48), ('x', 56), ('d', 58), ('p', 58), ('a', 71), ('m', 72), ('e', 75), ('g', 81), ('w', 82), ('i', 83), ('o', 85), ('z', 86), ('c', 87)])
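
The same approach extends to descending order. This sketch starts from a small hypothetical dictionary called `scores` and uses `operator.itemgetter` in place of the lambda, which is a matter of taste rather than a requirement.

```python
from collections import OrderedDict
from operator import itemgetter

# Hypothetical data, small enough to check by eye.
scores = {"a": 71, "b": 33, "c": 87, "d": 58}

# itemgetter(1) picks the value out of each (key, value) pair;
# reverse=True puts the largest value first.
by_value_desc = OrderedDict(sorted(scores.items(), key=itemgetter(1), reverse=True))

top_key = next(iter(by_value_desc))  # first key is the one with the highest value

print(list(by_value_desc.items()))
print(top_key)
```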


Group values by key

In [29]:
from collections import defaultdict
dict_with_lists = defaultdict(list)
movies = (("Nolan", "Memento"), ("Nolan", "Inception"),
          ("James Cameron", "Avatar"), ("Satyajit Ray", "pather panchali"),
          ("Satyajit Ray", "Devi"),)
for k, v in movies:
    dict_with_lists[k].append(v)
print dict_with_lists


defaultdict(<type 'list'>, {'Nolan': ['Memento', 'Inception'], 'James Cameron': ['Avatar'], 'Satyajit Ray': ['pather panchali', 'Devi']})
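
For comparison, the same grouping can be sketched with `itertools.groupby`. This is an alternative technique, not the approach used in the cell above. `groupby` only merges adjacent items, so the input has to be sorted by the grouping key first.

```python
from itertools import groupby
from operator import itemgetter

movies = (("Nolan", "Memento"), ("Nolan", "Inception"),
          ("James Cameron", "Avatar"), ("Satyajit Ray", "pather panchali"),
          ("Satyajit Ray", "Devi"))

# Sort by director so that equal keys sit next to each other.
# sorted() is stable, so titles keep their original relative order.
ordered = sorted(movies, key=itemgetter(0))

grouped = {director: [title for _, title in pairs]
           for director, pairs in groupby(ordered, key=itemgetter(0))}

print(grouped)
```

The `defaultdict(list)` version needs no sorting and handles unsorted input in a single pass, which is why it is often the simpler choice.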
