## Huffman Codes

In [46]:
## get some data into the notebook
with open('Downloads/algo2huffman.txt') as f:
    numSymbols = f.readline()
    weights = f.readlines()

In [47]:
numSymbols

'1000\n'

In [48]:
weights[0]

'7540662\n'

In [49]:
n = int(numSymbols.strip('\n'))

In [50]:
weights = [int(w.strip('\n')) for w in weights]

In [51]:
sum(weights) / n

4990911.37

In [52]:
print(min(weights), max(weights))

1873 9979223


Your task in this problem is to run the Huffman coding algorithm from lecture on this data set. What is the maximum length of a codeword in the resulting Huffman code?
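As an independent cross-check on a hand-rolled heap, the same quantity can be computed with the standard library's `heapq`. This is a minimal sketch, not the lecture's implementation: the function name `huffman_code_lengths` is hypothetical, and each heap entry is assumed to be a `(weight, min_depth, max_depth)` tuple so that no explicit tree needs to be built. When weights tie, different tie-breaking rules can give different (equally optimal) trees, so the depths may vary on inputs with repeated weights.

```python
import heapq

def huffman_code_lengths(weights):
    """Return (shortest, longest) codeword length of a Huffman code.

    Hypothetical helper for illustration. Each heap entry is a tuple
    (total weight, min leaf depth, max leaf depth) of a subtree.
    """
    if len(weights) < 2:
        return (0, 0)  # nothing to merge, so no bits are needed
    heap = [(w, 0, 0) for w in weights]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Repeatedly merge the two lightest subtrees
        w1, lo1, hi1 = heapq.heappop(heap)
        w2, lo2, hi2 = heapq.heappop(heap)
        # Every leaf in the merged tree sits one level deeper than before
        heapq.heappush(heap, (w1 + w2, min(lo1, lo2) + 1, max(hi1, hi2) + 1))
    _, shortest, longest = heap[0]
    return (shortest, longest)

print(huffman_code_lengths([5, 9, 12, 13, 16, 45]))  # (1, 4)
```

On the small example, the symbol of weight 45 gets a 1-bit codeword and the symbols of weight 5 and 9 get 4-bit codewords.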

In [53]:
weights = sorted(weights)

In [54]:
weights[:5]

[1873, 12710, 37164, 40882, 57802]

In [55]:
## A utility to swap two array items in place
def swap(arr, i, j):
    arr[i], arr[j] = arr[j], arr[i]

In [56]:
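def bubbleUp(array, newIndex):
    # restore the min-heap property after an item has been appended at newIndex
    while newIndex > 0:
        oldIndex = (newIndex + 1) // 2 - 1  # parent of newIndex
        if array[oldIndex] < array[newIndex]:
            return  # parent is already smaller, so the heap property holds
        swap(array, newIndex, oldIndex)
        newIndex = oldIndex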

In [57]:
def bubbleDown(array, newIndex=0):
    # default newIndex to 0 for when the item to bubble down has just been swapped from end of array to start
    leftChild = (newIndex + 1) * 2 - 1 # left child of newIndex
    while leftChild < len(array):
        minChild = leftChild + 1  # right child of newIndex
        if minChild == len(array):  # rare case where the bubbleDown has reached a final, left child without sibling
            if array[newIndex] < array[leftChild]: return
            else:
                swap(array, leftChild, newIndex)
                return
        if array[leftChild] < array[minChild]:
            minChild = leftChild
        if array[newIndex] < array[minChild]:
            return
        swap(array, newIndex, minChild)
        newIndex = minChild
        leftChild = (newIndex + 1) * 2 - 1

In [30]:
def getMin(minHeap):
    h = len(minHeap)
    swap(minHeap, 0, h-1)
    minNode = minHeap.pop()
    bubbleDown(minHeap)  ## sift down on the heap that was passed in, not the global heap
    return minNode

In [25]:
def merge(t1, t2):
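    '''
    Each argument is a 3-tuple describing a (possibly merged) subtree: its total weight,
    the min number of bits from its root to a leaf, and the max number of bits from its root to a leaf.
    Returns the 3-tuple for the tree obtained by merging the two subtrees under a new root.
    '''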
    return (t1[0] + t2[0], min(t1[1], t2[1]) + 1, max(t1[2], t2[2]) + 1)

In [26]:
merge((14, 0, 0), (25, 1, 2))

(39, 1, 3)

In [58]:
# Use a min heap to keep track of merged symbols
weights = [(w, 0, 0) for w in weights]   ## add min and max merges of node as 2nd and 3rd elements of tuples
heap = [merge(weights[0], weights[1])]   ## keep track of merged weights and its min and max number of merges in this minHeap
i = 2   ## iterate through remaining weight list using this index
while i < n:  ## merge all individual weights into nodes in a heap
    ## Find the 2 lightest nodes, min1 and min2, either at index i of weights or at top of minHeap of merged nodes
    if weights[i] < heap[0]:
        min1 = weights[i]
        i += 1
    else:
        min1 = getMin(heap)
    if not heap:  ## just for when there was only 1 node left on heap
        min2 = weights[i]
        i += 1
    elif i == n:  # just for when min1 was the last element in weights
        min2 = getMin(heap)
    elif weights[i] < heap[0]:
        min2 = weights[i]
        i += 1
    else:
        min2 = getMin(heap)
        
    heap.append(merge(min1, min2))
    bubbleUp(heap, len(heap)-1)
    
while len(heap) > 1:   ## after individual weights have all been merged, continue merging heap nodes
    min1 = getMin(heap)
    min2 = getMin(heap)
    heap.append(merge(min1, min2))
    bubbleUp(heap, len(heap)-1)
        
    

In [59]:
heap   ## 2nd and 3rd elements of the one remaining tuple (min and max codeword lengths) should be the assignment answers

[(4990911370, 9, 19)]

## Now for some Max-Weight Independent Set Dynamic Programming action

In [60]:
# Get the full nodepath into the notebook
with open('Downloads/algo2maxWeightIS.txt') as f:
    n = int(f.readline().strip('\n'))
    nodes = f.readlines()

In [61]:
n

1000

In [62]:
nodes[0]  #remember to index this as 1 in the list, as per the question definition, or subtract one from question indices

'4962786\n'

Your task in this problem is to run the dynamic programming algorithm (and the reconstruction procedure) from lecture on this data set. The question is: of the vertices 1, 2, 3, 4, 17, 117, 517, and 997, which ones belong to the maximum-weight independent set? (By "vertex 1" we mean the first vertex of the graph---there is no vertex 0.) In the box below, enter an 8-bit string, where the ith bit should be 1 if the ith of these 8 vertices is in the maximum-weight independent set, and 0 otherwise.
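The recurrence and the backward reconstruction pass can be sketched in one self-contained function. This is a minimal sketch under stated assumptions: the name `max_weight_independent_set` is hypothetical, weights are assumed to be positive, and vertices are numbered from 1 as in the question. Here `best[i]` is the maximum weight achievable using only the first `i` vertices of the path, with `best[0] = 0` for the empty prefix.

```python
def max_weight_independent_set(path_weights):
    """Return the 1-based vertex numbers of a max-weight independent set
    of a path graph. Hypothetical helper; assumes positive weights."""
    n = len(path_weights)
    if n == 0:
        return set()
    # best[i] = max weight of an independent set within the first i vertices
    best = [0] * (n + 1)
    best[1] = path_weights[0]
    for i in range(2, n + 1):
        # Either skip vertex i, or take it together with the optimum for i-2
        best[i] = max(best[i - 1], best[i - 2] + path_weights[i - 1])
    # Walk backwards: vertex i was taken exactly when taking it beat skipping it
    chosen = set()
    i = n
    while i >= 1:
        if best[i] > best[i - 1]:
            chosen.add(i)
            i -= 2  # the neighbour i-1 cannot also be in the set
        else:
            i -= 1
    return chosen

print(sorted(max_weight_independent_set([1, 4, 5, 4])))  # [2, 4]
```

In the small example, taking vertices 2 and 4 gives weight 8, which beats vertices 1 and 3 (weight 6).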

In [65]:
nodes = [int(node.strip('\n')) for node in nodes]

In [66]:
nodes[:4]

[4962786, 6395702, 5601590, 3803402]

In [67]:
best = [nodes[0], max(nodes[0], nodes[1])]  ## get the best list started for easier indexing in main loop
for i in range(2,n):
    best.append(max(nodes[i] + best[i-2], best[i-1]))

In [68]:
print(best[-4:])
print(nodes[-4:])

[2947394128, 2948442421, 2950698717, 2955353732]
[7546051, 8594344, 3304589, 6911311]


In [71]:
## Reconstruct the independent set by walking back through best
answer = []
i = n-1  ## final index
while i > 0:
    if best[i] > best[i-1]:   # i-th element was included
        answer.append(i)
        i -= 2   # need to skip the previous element by definition of independent set
    else:
        i -= 1
if i==0:
    answer.append(i)   ## 1-th element was not included, so include 0-th and terminate

In [72]:
len(answer)

459

In [73]:
print(answer[-4:])
print(answer[:4])

[7, 4, 2, 0]
[999, 997, 994, 992]


In [77]:
select = [1, 2, 3, 4, 17, 117, 517, 997]  ## vertices named in the assignment question (1-indexed)
select = [s-1 in answer for s in select]  ## answer holds 0-based indices, hence s-1
select   ## hopefully the assignment answer


[True, False, True, False, False, True, True, False]
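The question asks for an 8-bit string rather than a list of booleans. A small sketch of the conversion, with the list above copied in literally so the snippet stands alone (the names `membership` and `bits` are illustrative):

```python
# Membership flags for vertices 1, 2, 3, 4, 17, 117, 517, 997, copied from the output above
membership = [True, False, True, False, False, True, True, False]

# Map each flag to '1' or '0' and join them in order
bits = ''.join('1' if m else '0' for m in membership)
print(bits)  # 10100110
```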