# Collections package

## 1. Counter

Let's say we need to count the number of times each element appears in a list.
One easy way to do this is to iterate over the list:


In [1]:
food = {"ham" : "yes", "egg" : "yes", "bread" : "no" }

In [2]:
for f in food:
    print(f)

ham
egg
bread


In [4]:
counter = dict()

foods = ['soy', 'dairy', 'gluten', 'soy', 'mango', 'dairy', 'mango']

for k in foods:
    if k not in counter:
        counter[k] = 1
    else:
        counter[k] += 1

In [5]:
counter

{'soy': 2, 'dairy': 2, 'gluten': 1, 'mango': 2}

### The standard-library collections module already provides a class for this, called Counter

### It supports three forms of initialization:

- with a sequence of items, which is the main use of Counter,
- with a dictionary containing keys and counts,
- with keyword arguments mapping string names to counts.

### 1.1 We import the library first:

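Built-in functions such as len(), range(), type() and sum() are predefined and already available without an import. Counter is not a built-in, so the collections module has to be imported: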

In [6]:
import collections

In [8]:
cobject=collections.Counter(['soy', 'dairy', 'gluten', 'soy', 'mango', 'dairy', 'mango'])

print(cobject)

Counter({'soy': 2, 'dairy': 2, 'mango': 2, 'gluten': 1})


### Or you can create a Counter object from an existing dictionary, as below:

In [9]:
dict1={'a':2, 'b':3, 'c':1}

In [10]:
counter2= collections.Counter(dict1)
print(counter2)

Counter({'b': 3, 'a': 2, 'c': 1})


### Now, why would someone do that?

### Well let me demonstrate:

In [14]:
type(dict1)

dict

In [15]:
type(counter2)

collections.Counter

In [11]:
dict1['d']

KeyError: 'd'

### A dictionary raises a KeyError if the key doesn't exist in it, but a Counter object won't:

In [12]:
counter2['d']

0

### It gives you zero. This behaviour is very useful in many cases.
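
One detail worth checking is that looking up a missing key in a Counter returns 0 without adding that key. A minimal sketch, where the name counts is only illustrative:

```python
from collections import Counter

counts = Counter({'a': 2, 'b': 3, 'c': 1})

# Looking up a missing key returns 0 ...
print(counts['d'])    # 0

# ... and does not add the key to the Counter.
print('d' in counts)  # False
print(len(counts))    # 3
```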

### The third way to initialize a Counter uses keyword arguments:

In [16]:
counter3 = collections.Counter(a=2, b=3, c=1)

In [17]:
print(counter3)

Counter({'b': 3, 'a': 2, 'c': 1})


### 1.2 We can use a Counter object to get the most common elements:

In [18]:
dict1

{'a': 2, 'b': 3, 'c': 1}

In [19]:
c = collections.Counter(['soy', 'dairy', 'gluten', 'soy', 'mango', 'dairy', 'mango', 'mango'])
c.most_common(2)

[('mango', 3), ('soy', 2)]

In [None]:
c
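
To get only the single most frequent element (the maximum), one option is to index the result of most_common(1). The sketch below reuses the list from the cell above; top_item and top_count are illustrative names.

```python
from collections import Counter

c = Counter(['soy', 'dairy', 'gluten', 'soy', 'mango', 'dairy', 'mango', 'mango'])

# most_common(1) returns a one-element list of (item, count) pairs.
top_item, top_count = c.most_common(1)[0]
print(top_item, top_count)  # mango 3

# With no argument, most_common() lists every item from most to least frequent.
print(c.most_common())
```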

## 2. defaultdict
A defaultdict works exactly like a normal dict, but it is initialized with a function (“default”) that takes no arguments and provides the default value for a nonexistent key.

When a default function is set, looking up a missing key with d[key] does not raise a KeyError. Instead, the key is inserted with the value returned by the default function.
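
A small sketch of when the default function is and is not called. The names d and plain are illustrative; list is used as the default function here only as an example.

```python
from collections import defaultdict

# The default function is any zero-argument callable; list() returns [].
d = defaultdict(list)

# Indexing a missing key calls the function and stores the result.
d['x'].append(1)
print(dict(d))        # {'x': [1]}

# .get() does not call the function, so the key is not created.
print(d.get('y'))     # None
print('y' in d)       # False

# Without a default function, a defaultdict behaves like a plain dict.
plain = defaultdict()
try:
    plain['missing']
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'missing'
```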

### As we saw with Counter, a missing key gives a count of 0. But what if we want missing keys in our dictionary to be initialized with some other value instead of 0?

In [20]:
### method 1
import collections

collections.defaultdict()

In [21]:
###### method 2
from collections import defaultdict

In [22]:
from collections import *

In [23]:
dict1

{'a': 2, 'b': 3, 'c': 1}

In [25]:
newDict = defaultdict(lambda : 'yes',dict1)

In [27]:
newDict['f']

'yes'

In [29]:
newDict

defaultdict(<function __main__.<lambda>()>,
            {'a': 2, 'b': 3, 'c': 1, 'f': 'yes'})

In [28]:
dict1['f']

KeyError: 'f'

### The way we define a defaultdict is as shown below:

In [30]:
counter = defaultdict(lambda : 'yes')

In [31]:
counter

defaultdict(<function __main__.<lambda>()>, {})

### Looking up a missing key inserts it with the default value, and we assign values as key-value pairs just as we did with a normal dictionary:

In [32]:
counter['chocolate']

'yes'

In [33]:
counter

defaultdict(<function __main__.<lambda>()>, {'chocolate': 'yes'})

In [34]:
counter['chocolate']='no'

counter['vanilla']='no'

In [36]:
d = dict()
d[[1,2]] = 'hi'

TypeError: unhashable type: 'list'

### Now if you print the counter object, you will see a function attached to it along with the items. This is the function that supplies the default value:

In [37]:
print(counter)

defaultdict(<function <lambda> at 0x7f259d4f4950>, {'chocolate': 'no', 'vanilla': 'no'})


### And items() returns the stored key-value pairs:

In [38]:
print(counter.items())

dict_items([('chocolate', 'no'), ('vanilla', 'no')])


### Now if you look up a key that you never initialized before, it will give you the default value and store it:

In [39]:
counter["butterscotch"]

'yes'

In [40]:
counter.items()

dict_items([('chocolate', 'no'), ('vanilla', 'no'), ('butterscotch', 'yes')])

## Why even use defaultdict()?

### An example to illustrate why we use defaultdict:

In [None]:
food_list = 'spam spam spam spam spam spam eggs spam'.split()
food_list

In [None]:
food_count = dict()

In [None]:
for food in food_list:
    food_count[food] += 1 # raises KeyError: 'spam', because the key does not exist yet

### Now if we define the dictionary as a defaultdict with int as the default function, missing keys take the default value 0:

In [None]:
food_count = defaultdict(int)

In [None]:
# food_count = defaultdict(lambda: 2)  # any other default can be supplied this way (here the default is 2)

In [None]:
for food in food_list:
    food_count[food] += 1 # increment element's value by 1

In [None]:
food_count['eggs']
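
For comparison, here is a sketch that runs the counting loop end to end and checks the result against Counter. The grouping part, with the hypothetical name by_length, is an added illustration of another common use and is not part of the original example.

```python
from collections import Counter, defaultdict

food_list = 'spam spam spam spam spam spam eggs spam'.split()

# Counting: missing keys start at int() == 0, so += 1 always works.
food_count = defaultdict(int)
for food in food_list:
    food_count[food] += 1
print(dict(food_count))                              # {'spam': 7, 'eggs': 1}
print(dict(food_count) == dict(Counter(food_list)))  # True

# Grouping: missing keys start as an empty list.
by_length = defaultdict(list)
for word in ['ham', 'egg', 'bread', 'soy']:
    by_length[len(word)].append(word)
print(dict(by_length))  # {3: ['ham', 'egg', 'soy'], 5: ['bread']}
```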

## 3. OrderedDict
**An OrderedDict is a dictionary subclass that remembers the order in which its contents are added.**

### Note: from Python 3.7 onwards, every regular dictionary retains the order of insertion. If you are using an older version, OrderedDict might come in handy.

In [41]:
from collections import OrderedDict
print ('Regular dictionary:')
d={}
d['c'] = 'C'
d['a'] = 'P'
d['b'] = 'F'

# for k, v in d.items():
#     print (k, v)
d

Regular dictionary:


{'c': 'C', 'a': 'P', 'b': 'F'}

In [43]:
dict1

{'a': 2, 'b': 3, 'c': 1}

In [44]:
# sorted()
OrderedDict(d)

OrderedDict([('c', 'C'), ('a', 'P'), ('b', 'F')])

In [42]:
print ('\nOrderedDict:')
d = OrderedDict()
d['c'] = 'C'
d['a'] = 'P'
d['b'] = 'F'
d


OrderedDict:


OrderedDict([('c', 'C'), ('a', 'P'), ('b', 'F')])

Before Python 3.7, the language did not guarantee that a regular dict keeps insertion order (CPython 3.6 did so only as an implementation detail), so iterating over it could produce the items in an arbitrary order. In an OrderedDict, by contrast, the order in which the items are inserted is remembered and used when creating an iterator.
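
Even on recent Python versions, OrderedDict still differs from a regular dict in a few ways. The sketch below shows two of them, order-sensitive equality and the move_to_end() method; the names a and b are illustrative.

```python
from collections import OrderedDict

a = OrderedDict([('c', 'C'), ('a', 'P'), ('b', 'F')])
b = OrderedDict([('a', 'P'), ('b', 'F'), ('c', 'C')])

# Two OrderedDicts are equal only if their items are in the same order ...
print(a == b)              # False
# ... whereas plain dicts ignore order when comparing.
print(dict(a) == dict(b))  # True

# move_to_end() moves an existing key to the last position.
a.move_to_end('c')
print(list(a))             # ['a', 'b', 'c']
print(a == b)              # True
```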

## 4. zip()
### Iterator functions for efficient looping


In [54]:
l1 = ['x','y','z']
l2 = ['23','44','67']

In [48]:
lis_new = []
for ind in range(len(l1)):
    lis_new.append((l1[ind],l2[ind]))
lis_new

[('x', '23'), ('y', '44'), ('z', '67')]

In [49]:
list(zip(l1,l2))

[('x', '23'), ('y', '44'), ('z', '67')]

In [50]:
# l1 = [1, 2, 3, 4, 5, 6]
# l2 = [2, 3, 4, 5, 6, 7]

In [59]:
z = list(zip(l1, l2))

In [60]:
for k in z:
    print(k)

('x', '23')
('y', '44')
('z', '67')


In [61]:
for k in z:
    print(k)

('x', '23')
('y', '44')
('z', '67')


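### zip() itself returns a lazy iterator (a zip object), which behaves much like a generator. Above we wrapped it in list(), which is why it could be looped over more than once. A simpler explanation of generators comes in the next part of today's class.

### To work with the zipped values, either iterate over the zip object directly or convert it into a list: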

In [None]:
list(zip(l1,l2))

### Or like this:

In [None]:
for a,b in z:
    print(a,b)

### Here's a thing about iterators such as zip objects and generators: they produce their values only once and are then exhausted. So if z holds a bare zip object (z = zip(l1, l2), without list()), running the same loop again won't print anything:

In [None]:
for a,b in z:
    print(a,b)
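
A minimal sketch of this one-shot behaviour, assuming z is rebuilt as a bare zip object rather than the list created earlier. The names first_pass and second_pass are illustrative.

```python
l1 = ['x', 'y', 'z']
l2 = ['23', '44', '67']

z = zip(l1, l2)  # a lazy zip object, not a list

first_pass = [pair for pair in z]
second_pass = [pair for pair in z]  # z is already exhausted here

print(first_pass)   # [('x', '23'), ('y', '44'), ('z', '67')]
print(second_pass)  # []
```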

### What if the lists are of unequal lengths?

In [62]:
l1 = [1, 2, 3, 4, 5, 6 , 8, 9]
l2 = [2, 3, 4, 5, 6, 7]

In [63]:
z = zip(l1, l2)

In [64]:
print(list(z))

[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]


### If the lists have unequal lengths, zip stops at the shortest one, so the extra elements of the longer list are dropped:
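
If the extra elements should be kept, the standard library's itertools.zip_longest is one alternative. It is not used elsewhere in this notebook, so treat the sketch below as an aside; padded is an illustrative name.

```python
from itertools import zip_longest

l1 = [1, 2, 3, 4, 5, 6, 8, 9]
l2 = [2, 3, 4, 5, 6, 7]

# zip() stops at the shorter input: 6 pairs.
print(len(list(zip(l1, l2))))  # 6

# zip_longest() pads the shorter input with fillvalue (None by default).
padded = list(zip_longest(l1, l2, fillvalue=None))
print(padded[-2:])             # [(8, None), (9, None)]
```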