### Doing things with Python collections

The built-in `dict`, `set`, and `list` types, together with the `collections` module, provide useful utilities for transforming your data. This section walks through some common operations:

* Create a dict from two iterables
* Remove duplicates from a list
* Most common element in an iterable
* Least common element in an iterable
* Counts of elements in an iterable
* Remove duplicates from a list while maintaining insertion order
* Sort a dictionary by keys
* Sort a dictionary by values
* Group values by key


Before Python 3.7, the built-in `dict` makes no guarantees about the order of its keys. If you need keys to keep their insertion order, use `collections.OrderedDict`.

In [1]:
from collections import OrderedDict
od = OrderedDict()
# Keys are remembered in the order they were first inserted.
od.update({"a": 1})
od.update({"b": 2})
print(od)
print(od["a"])

OrderedDict([('a', 1), ('b', 2)])
1
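
As a small sketch of what "keeping order" means in practice (the variable names below are illustrative and not part of the original notebook): re-assigning an existing key leaves its position unchanged, while deleting and re-inserting a key moves it to the end.

```python
from collections import OrderedDict

od = OrderedDict()
od["a"] = 1
od["b"] = 2
od["c"] = 3

# Re-assigning an existing key keeps its original position.
od["a"] = 10
keys_after_update = list(od.keys())

# Deleting and re-inserting a key moves it to the end.
del od["b"]
od["b"] = 2
keys_after_reinsert = list(od.keys())

# popitem(last=False) removes and returns the oldest entry.
first_key, first_value = od.popitem(last=False)
```

If the position of a key matters to you, prefer updating it in place over deleting and re-adding it.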


Remove duplicates from a list (converting to a `set` does not preserve the original order)

In [2]:
list(set([1, 5, 2, 3, 3, 1, 5]))

[1, 2, 3, 5]

Remove duplicates from a list while maintaining insertion order

In [3]:
list(OrderedDict.fromkeys([1, 5, 2, 3, 3, 1, 5]))

[1, 5, 2, 3]
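
An alternative sketch of the same idea, assuming the items are hashable: a generator that remembers what it has already seen in a `set`. The helper name `unique_in_order` is hypothetical. Unlike the `OrderedDict.fromkeys` one-liner, it yields items lazily, so it can also be used on long or streaming inputs.

```python
def unique_in_order(items):
    """Yield each item the first time it appears, skipping later repeats."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item

deduped = list(unique_in_order([1, 5, 2, 3, 3, 1, 5]))
```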

Create a dict from two iterables   
For example, a substitution cipher needs a numeric representation of each letter.

In [4]:
import string
# Pair each lowercase ASCII letter with its position in the alphabet.
letters = zip(string.ascii_lowercase, range(len(string.ascii_lowercase)))
lettersdict = OrderedDict(letters)
print(lettersdict)
print(lettersdict["z"])

OrderedDict([('a', 0), ('b', 1), ('c', 2), ('d', 3), ('e', 4), ('f', 5), ('g', 6), ('h', 7), ('i', 8), ('j', 9), ('k', 10), ('l', 11), ('m', 12), ('n', 13), ('o', 14), ('p', 15), ('q', 16), ('r', 17), ('s', 18), ('t', 19), ('u', 20), ('v', 21), ('w', 22), ('x', 23), ('y', 24), ('z', 25)])
25
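
To connect this back to the cipher example, here is a minimal sketch that uses such a mapping to turn a word into numbers and back. The helper names `encode` and `decode` are hypothetical, and the snippet assumes the input contains only lowercase ASCII letters.

```python
import string

alphabet = string.ascii_lowercase
# letter -> number, built from two iterables with zip
letter_to_number = dict(zip(alphabet, range(len(alphabet))))
# number -> letter, the reverse mapping
number_to_letter = dict(zip(range(len(alphabet)), alphabet))

def encode(word):
    """Translate a lowercase word into a list of letter positions."""
    return [letter_to_number[ch] for ch in word]

def decode(numbers):
    """Translate a list of letter positions back into a word."""
    return "".join(number_to_letter[n] for n in numbers)

encoded = encode("cab")
decoded = decode(encoded)
```

Building the reverse mapping once up front keeps decoding a plain dictionary lookup rather than a search through the alphabet.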


---
`Counter` can be used to count the occurrences of values in an iterable.
* Most common element in an iterable
* Least common element in an iterable
* Counts of elements in an iterable

In [5]:
from collections import Counter
import random
list_of_ints = [random.randint(0, 100) for _ in range(1000)]
counts = Counter(list_of_ints)
print("Most Common: {} with {} occurrences".format(*counts.most_common(1)[0]))
print("Least Common: {} with {} occurrences".format(*counts.most_common()[-1]))

Most Common: 61 with 23 occurrences
Least Common: 68 with 3 occurrences
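
Because the data above is random, the exact numbers change on every run. The following is a deterministic sketch of the same idea, using a made-up string as input: `most_common(n)` returns the `n` most frequent `(value, count)` pairs, and slicing the full list from the end gives the least frequent ones.

```python
from collections import Counter

counts = Counter("mississippi")

# The two most frequent letters, as (letter, count) pairs.
top_two = counts.most_common(2)

# most_common() with no argument returns every pair, most frequent first,
# so the last element is the least frequent one.
least_common = counts.most_common()[-1]

# Reverse slice: the two least frequent pairs, rarest first.
bottom_two = counts.most_common()[:-3:-1]
```

Note that when several values share the same count, the order among them should not be relied on across Python versions.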


Look up the count of any element in an iterable (elements that never occur have a count of 0)

In [6]:
counts = Counter([1, 1, 2, 3, 5, 5, 5, 1, 2])
print("count of 1: {}".format(counts[1]))
print("count of 100: {}".format(counts[100]))

count of 1: 3
count of 100: 0
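
One detail worth spelling out: looking up a missing element on a `Counter` returns 0 without adding that element, whereas a `defaultdict(int)` stores the key as a side effect of the lookup. A short sketch comparing the two (variable names are illustrative):

```python
from collections import Counter, defaultdict

counts = Counter([1, 1, 2, 3, 5, 5, 5, 1, 2])
missing_count = counts[100]       # lookup of an element that never occurred
counter_has_key = 100 in counts   # the lookup did not store the key

dd = defaultdict(int)
for n in [1, 1, 2]:
    dd[n] += 1
dd_missing_count = dd[100]        # same lookup on a defaultdict
dd_has_key = 100 in dd            # here the lookup created the key
```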


Using `OrderedDict` you can

* Sort a dictionary by keys
* Sort a dictionary by values

This is useful if you later want to get the `.items()` out in sorted order.

In [7]:
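# list() keeps the pairs reusable in later cells; on Python 3, zip returns a one-shot iterator.
letters = list(zip(string.ascii_lowercase, [random.randint(0, 100) for _ in range(len(string.ascii_lowercase))]))
sorted_by_key = OrderedDict(sorted(letters, key=lambda t: t[0]))
print(sorted_by_key)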

OrderedDict([('a', 74), ('b', 95), ('c', 61), ('d', 91), ('e', 33), ('f', 96), ('g', 18), ('h', 52), ('i', 63), ('j', 70), ('k', 83), ('l', 90), ('m', 63), ('n', 90), ('o', 52), ('p', 96), ('q', 27), ('r', 33), ('s', 79), ('t', 67), ('u', 34), ('v', 51), ('w', 98), ('x', 82), ('y', 65), ('z', 99)])


In [8]:
sorted_by_value = OrderedDict(sorted(letters, key=lambda t: t[1]))
print(sorted_by_value)

OrderedDict([('g', 18), ('q', 27), ('e', 33), ('r', 33), ('u', 34), ('v', 51), ('h', 52), ('o', 52), ('c', 61), ('i', 63), ('m', 63), ('y', 65), ('t', 67), ('j', 70), ('a', 74), ('s', 79), ('x', 82), ('k', 83), ('l', 90), ('n', 90), ('d', 91), ('b', 95), ('f', 96), ('p', 96), ('w', 98), ('z', 99)])
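
The same pattern extends to descending order and to tie-breaking. The sketch below uses a small made-up `scores` dictionary; `operator.itemgetter(1)` is an equivalent alternative to the `lambda t: t[1]` used above.

```python
from collections import OrderedDict
from operator import itemgetter

scores = {"a": 74, "b": 95, "c": 61, "d": 91}

# itemgetter(1) picks the value out of each (key, value) pair;
# reverse=True puts the largest value first.
by_value_desc = OrderedDict(
    sorted(scores.items(), key=itemgetter(1), reverse=True)
)

# Returning a tuple sorts by several fields at once:
# first by value descending, then by key ascending to break ties.
by_value_then_key = OrderedDict(
    sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
)
```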


Group values by key: collect every value that shares a key into one list

In [9]:
from collections import defaultdict
# A missing key starts out as an empty list, so we can append right away.
dict_with_lists = defaultdict(list)
movies = (("Nolan", "Memento"), ("Nolan", "Inception"),
          ("James Cameron", "Avatar"), ("Satyajit Ray", "pather panchali"),
          ("Satyajit Ray", "Devi"),)
for k, v in movies:
    dict_with_lists[k].append(v)
print(dict_with_lists)


defaultdict(<type 'list'>, {'Nolan': ['Memento', 'Inception'], 'James Cameron': ['Avatar'], 'Satyajit Ray': ['pather panchali', 'Devi']})
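
For comparison, here is a sketch of the same grouping with a plain `dict` and `setdefault`, reusing the movie data from the cell above. The names `grouped` and `film_count` are illustrative. This variant may be preferable when you do not want later lookups of unknown keys to silently create empty lists.

```python
movies = (("Nolan", "Memento"), ("Nolan", "Inception"),
          ("James Cameron", "Avatar"), ("Satyajit Ray", "pather panchali"),
          ("Satyajit Ray", "Devi"),)

grouped = {}
for director, title in movies:
    # setdefault returns the existing list for this director, or stores
    # and returns a new empty list the first time the director is seen.
    grouped.setdefault(director, []).append(title)

# Once grouped, per-key summaries are a one-liner.
film_count = {director: len(titles) for director, titles in grouped.items()}
```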
