## Sets and Dictionaries (aka Hash Tables)
Sets and dictionaries are ideal data structures when your data has no intrinsic order but each item has a unique object (the key) that can be used to reference it. Sets consist only of keys, while dictionaries consist of key-value pairs. Internally, a key is hashed and the hash value is used to compute an index into the underlying hash table. This way, search and insertion take **`O(1)`** time on average, provided a good hash function and collision resolution strategy are used. Sets, which store (and index) only keys, are ideal for performing set operations such as unions, intersections and differences, and for answering questions like "how many unique elements are contained".
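
To make the hashing step concrete, here is a minimal sketch of a toy hash set that resolves collisions by chaining keys in per-bucket lists. The class `ToyHashSet` and its bucket count are hypothetical and only for illustration; CPython's built-in `set` and `dict` use a different scheme (open addressing) and resize as they grow.

```python
class ToyHashSet:
    """Toy set backed by a fixed number of buckets (separate chaining)."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        # hash() maps the key to an integer; the modulo folds it into a bucket index.
        return hash(key) % len(self.buckets)

    def add(self, key):
        bucket = self.buckets[self._index(key)]
        if key not in bucket:      # colliding keys share a bucket
            bucket.append(key)

    def __contains__(self, key):
        # Only one bucket is scanned, not the whole table.
        return key in self.buckets[self._index(key)]

    def __len__(self):
        return sum(len(b) for b in self.buckets)


s = ToyHashSet()
for country in ["Spain", "Greece", "Italy", "Greece"]:
    s.add(country)
print(len(s))          # 3
print("Italy" in s)    # True
print("France" in s)   # False
```

Because the toy table never grows, lookups stay close to `O(1)` only while the buckets remain short.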

### Sets
Syntax: `s={a, b, c}`
-  `s.add(x)`: add element `x` to the set
-  `s.discard(x)`: remove element `x` from the set if it exists (alternatively, `s.remove(x)` raises a `KeyError` if `x` is not in `s`)
-  `len(s)`: number of elements in set s (cardinality)
-  `x in s`: test x for membership in s
-  `x not in s`: test x for non-membership in s
-  `s.issubset(t)`	or `s <= t`:	test whether every element in s is in t
-  `s.issuperset(t)` or	`s >= t`:	test whether every element in t is in s
-  `s.union(t)`	or `s | t`:	new set with elements from both s and t
-  `s.intersection(t)` or	`s & t`:	new set with elements common to s and t
-  `s.difference(t)` or `s - t`:	new set with elements in s but not in t
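
As a quick illustration of the operations listed above, here is a minimal sketch. The country names are only sample data, and `sorted()` is used when printing because sets have no guaranteed order.

```python
s = {"Italy", "Greece"}
t = {"Spain", "Greece", "Italy", "France"}

s.add("Italy")            # already present, so the set is unchanged
s.discard("England")      # absent: discard() silently does nothing
try:
    s.remove("England")   # absent: remove() raises KeyError
    removed = True
except KeyError:
    removed = False

print(len(s))                    # 2
print(s <= t)                    # True
print(sorted(s | t))             # ['France', 'Greece', 'Italy', 'Spain']
print(sorted(s & t))             # ['Greece', 'Italy']
print(sorted(t - s))             # ['France', 'Spain']
print(len(set("mississippi")))   # 4 unique characters
```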

### Dictionaries
Syntax: `d={a:data1, b:data2, c:data3}`
- `d[k]` or `d.get(k)`: get the value associated with key k (`d[k]` raises a `KeyError` if k is missing, while `d.get(k)` returns `None`)
- `d.update({k:v})`: adds the pair k:v if k is not in d, or updates the value to v if k is already in d
- `d.pop(k)`: remove the item with key k and return its value
- `len(d)`: return the number of key-value pairs in the dictionary
- `for k in d.keys():` iterate over keys
- `for value in d.values():` iterate over values
- `for key, value in d.items():` iterate over both (`d.iteritems()` in Python 2)
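
The dictionary operations above can be sketched as follows, assuming Python 3. The keys and values are arbitrary sample data.

```python
d = {"a": 1, "b": 2, "c": 3}

print(d["a"])            # 1
print(d.get("z"))        # None: get() does not raise for a missing key
print(d.get("z", 0))     # 0: an explicit default can be supplied
d["d"] = 4               # same effect as d.update({"d": 4})
d.update({"a": 10})      # key exists, so its value is overwritten
popped = d.pop("b")      # removes the key and returns its value
print(popped)            # 2
print(len(d))            # 3

for key, value in d.items():
    print(key, value)
```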

### Exercises
#### Given two lists, A and B, of unique strings, determine if A is a subset of B

In [1]:
def check_subset(A, B):
    # A is a subset of B when every element of set(A) is also in set(B)
    return set(A) <= set(B)

In [2]:
l1=["Spain","Greece","Italy","France","Portugal"]
l2=["Italy","Greece"]
check_subset(l2, l1)

True

In [3]:
l2.append("England")
l2

['Italy', 'Greece', 'England']

In [4]:
check_subset(l2, l1)

False

A slightly lower-level implementation that does not use Python's built-in subset operator:

In [5]:
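def check_subset2(A, B):
    s1=set(A)
    s2=set(B)
    counter=0
    for i in s1:
        # look up in the set s2 (O(1) on average) rather than in the list B
        if i in s2: counter=counter+1
    # A is a subset of B only if every element of s1 was found in s2
    return counter==len(s1)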

In [6]:
check_subset2(l2, l1)

False

In [7]:
l2.remove("England")

In [8]:
check_subset2(l2, l1)

True

#### Write code to remove duplicates from an unsorted linked list

In [9]:
%run LinkedList.ipynb

In [10]:
alist=MyLinkedList()
for i in range(8):
    alist.insert(i,i)
alist.insert(2,8)
alist.insert(6,2)
alist.printList()

0 1 6 2 3 4 5 6 7 2 size= 10


In [11]:
alist.printList()
s=set()   # values seen so far
node=alist.head
i=0       # index of node within alist
while node is not None:
    v=node.val
    if v in s:
        alist.delete(i)   # duplicate: delete it, the next node now sits at index i
        alist.printList()
    else:
        s.add(v)
        i+=1 #only gets increased in the non-delete case, see comment below
    node=node.next

0 1 6 2 3 4 5 6 7 2 size= 10
0 1 6 2 3 4 5 7 2 size= 9
0 1 6 2 3 4 5 7 size= 8


Note that when deleting from a linked list on the fly while iterating over it, the index `i` that emulates the iterator must not be incremented after a delete. For example, if `i=3` at `[...,7,4,5,...]` and we delete 7, the list becomes `[...,4,5,...]` and 4 is now at position `i=3` in place of 7. So, in order to continue iterating without skipping 4, `i` needs to remain 3. If we overlook this, the index `i` gets out of sync with the underlying pointer-level iteration on the list, which can delete the wrong node or cause exceptions.
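
A pointer-only variant sidesteps the index bookkeeping altogether. The sketch below uses a hypothetical minimal `Node` class rather than the `MyLinkedList` class loaded from `LinkedList.ipynb`, whose internals are not shown here. The helper names `remove_duplicates` and `to_list` are likewise introduced only for illustration.

```python
class Node:
    """Hypothetical minimal node; not the MyLinkedList class used above."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def remove_duplicates(head):
    """Unlink every node whose value was already seen; returns the head."""
    seen = set()
    prev = None
    node = head
    while node is not None:
        if node.val in seen:
            prev.next = node.next   # bypass the duplicate; prev stays put
        else:
            seen.add(node.val)
            prev = node             # advance prev only when the node is kept
        node = node.next
    return head


def to_list(head):
    """Collect the node values into a Python list for easy printing."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


head = None
for v in reversed([0, 1, 6, 2, 3, 4, 5, 6, 7, 2]):
    head = Node(v, head)
print(to_list(remove_duplicates(head)))   # [0, 1, 6, 2, 3, 4, 5, 7]
```

Here `prev` plays the role that the frozen index `i` plays above: it advances only when the current node is kept.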

#### You are given a 2D array with 2 columns: the first is the product id and the second is the number of items sold on a given day. Return a summary 2D array of 2 columns with the total items sold over the period for each (unique) product id

In [12]:
a=[[224,3],
   [225,5],
   [224,2],
   [225,1],
   [225,8],
   [224,1],
   [223,12],
   [223,1],
   [224,3]]
len(a)
a[0][0]

224

In [19]:
d={}
for product_id, sold in a:
    # d.get(k, 0) returns 0 for a product id that has not been seen yet
    d[product_id]=d.get(product_id, 0)+sold
print(d)
alist=[]
for key, value in d.items():
    alist.append([key,value])
print(alist)

{224: 9, 225: 14, 223: 13}
[[224, 9], [225, 14], [223, 13]]
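
The same aggregation can also be sketched with `collections.defaultdict` from the standard library, which supplies the zero default automatically. The names `sales`, `totals` and `summary` are introduced here for illustration, and the printed order assumes Python 3.7+ (insertion-ordered dictionaries).

```python
from collections import defaultdict

sales = [[224, 3], [225, 5], [224, 2], [225, 1], [225, 8],
         [224, 1], [223, 12], [223, 1], [224, 3]]

totals = defaultdict(int)          # missing keys start at int() == 0
for product_id, sold in sales:
    totals[product_id] += sold

summary = [[product_id, total] for product_id, total in totals.items()]
print(summary)   # [[224, 9], [225, 14], [223, 13]]
```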


#### Write code to detect if a string is a permutation of a palindrome

In [20]:
def palperm(astring):
    # count the occurrences of each character (case-sensitive)
    d={}
    for k in astring:
        d[k]=d.get(k, 0)+1
    # a palindrome allows at most one character with an odd count
    c=0
    for k in d:
        if d[k]%2==1: c+=1
        if c>1: return False
    return True

In [21]:
palperm("Hello")

False

In [22]:
palperm("HhHhelleo")

True
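
For comparison, the counting step can be delegated to `collections.Counter` from the standard library. This is a minimal sketch of the same odd-count test; the function name `palperm_counter` is hypothetical, and the check is case-sensitive like the version above.

```python
from collections import Counter


def palperm_counter(astring):
    # A string can be rearranged into a palindrome only if at most one
    # character occurs an odd number of times.
    odd = sum(1 for n in Counter(astring).values() if n % 2 == 1)
    return odd <= 1


print(palperm_counter("Hello"))      # False
print(palperm_counter("HhHhelleo"))  # True
```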

#### Given an English word in the form of a string, find all valid anagrams