In [1]:
# setup
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})


# CMPS 2200
# Introduction to Algorithms

## Greedy Algorithms - Huffman Coding


### How to implement Huffman Coding? -  Using Priority Queues

The *priority queue* is a data structure, typically implemented as a tree, that matches well with greedy algorithms since it allows for efficient **insertions**, **removals** and **updates** of items.


For simplicity, we'll assume that we are always seeking the minimum-value element from the priority queue. The priority queue data structure needs to support some basic operations:

- *deleteMin*: Identify the element with minimum value and remove it. 

- *insert(x, s)*: insert a new element $x$ with initial value $s$.
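
As a minimal sketch of these two operations, Python's standard `heapq` module can maintain a min-heap inside a plain list. The wrapper names `insert` and `delete_min` are introduced here for illustration only; they are not part of `heapq`.

```python
import heapq

def insert(pq, x, s):
    """Insert element x with value (priority) s."""
    heapq.heappush(pq, (s, x))

def delete_min(pq):
    """Remove and return the (value, element) pair with the minimum value."""
    return heapq.heappop(pq)

pq = []
insert(pq, 'A', 9)
insert(pq, 'D', 1)
insert(pq, 'C', 2)
smallest = delete_min(pq)   # the pair with the smallest value: (1, 'D')
```

Storing `(value, element)` pairs lets the tuple ordering pick the minimum by value first.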

 


### The Heap Property

The *min-heap property* for a tree states that the value at every node is less than or equal to the values at its children. This means that the root of a tree with the heap property is always the minimum element. So for a binary tree:

<img src="heap_property_fixed_examples.jpg" width="70%">

Notice that a **binary heap** is less restrictive than a **binary search tree** since the left and right subtrees can be swapped.

> Maintaining the heap property upon insertion or deletion requires time proportional to the depth of the tree: we restore the property by swapping the modified element upward (toward the root) or downward (toward the leaves) along a single path.


> This leads to $O(\log n)$ work per operation.
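
The upward swapping can be sketched as follows, assuming the usual array layout of an almost-complete binary heap in which the parent of index `i` is `(i - 1) // 2`. The helper names `sift_up`, `heap_insert`, and `is_min_heap` are illustrative choices, not from a library.

```python
def sift_up(heap, i):
    """Swap heap[i] upward until its parent is no larger; return the swap count."""
    swaps = 0
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:
            break
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent
        swaps += 1
    return swaps

def heap_insert(heap, value):
    """Append value as the last leaf, then restore the heap property."""
    heap.append(value)
    return sift_up(heap, len(heap) - 1)

def is_min_heap(heap):
    """Check that every non-root entry is at least as large as its parent."""
    return all(heap[(i - 1) // 2] <= heap[i] for i in range(1, len(heap)))

heap = []
max_swaps = 0
for value in [9, 4, 2, 1, 7, 3, 8, 0]:
    max_swaps = max(max_swaps, heap_insert(heap, value))
```

Each insertion follows one leaf-to-root path, so the number of swaps never exceeds the depth of the tree, which is $\lfloor \lg n \rfloor$ for an almost-complete tree with $n$ nodes.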

For Huffman coding, we need to efficiently retrieve the two nodes with the smallest frequencies at each step.


1. Initialize a min-heap with character frequencies $f(\sigma)$


![huffman-heap-2.png](huffman-heap-2.png)


Then, repeat:

2. Call `deleteMin` twice to get the two least frequent nodes $x$ and $y$
3. Create a new node $z$ with frequency $f(x) + f(y)$
4. Make $x$ and $y$ children of $z$ in the tree.
5. Call `insert` to add $z$ to the heap

How many times will this repeat if $|\Sigma| = n$?

<br><br>

<span style="color:red">**Question**:</span> What is work/span of this algorithm? What is the recurrence of work/span?


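Each iteration removes two nodes from the heap and inserts one, so the number of nodes drops by 1 per iteration. Starting from $n = |\Sigma|$ nodes and stopping when a single node remains, the loop repeats $n - 1$ times.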

Each iteration makes 2 calls to `deleteMin` and one call to `insert`, each costing $O(\lg n)$, so one iteration costs at most about $3 \lg n$, which is $O(\lg n)$.

Thus, total work is $O(n \lg n)$. 

We unfortunately have not exposed any parallelism in this algorithm, so the span is also $O(n \lg n)$.
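
The heap-based procedure can be sketched with the standard `heapq` module. This is one possible rendering under stated assumptions: leaves are represented by single-character strings, internal nodes by `(left, right)` tuples, and a running counter breaks frequency ties so the heap never has to compare tree nodes. The names `build_huffman_tree` and `code_lengths` are illustrative.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(freq):
    """Return (root, merges); a leaf is a character, an internal node a (left, right) pair."""
    tiebreak = count()  # unique counter so the heap never compares tree nodes
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    merges = 0
    while len(heap) > 1:
        fx, _, x = heapq.heappop(heap)   # deleteMin
        fy, _, y = heapq.heappop(heap)   # deleteMin
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))  # insert z
        merges += 1
    return heap[0][2], merges

def code_lengths(node, depth=0):
    """Map each character to its depth in the tree, i.e. its codeword length."""
    if isinstance(node, str):
        return {node: depth}
    left, right = node
    lengths = code_lengths(left, depth + 1)
    lengths.update(code_lengths(right, depth + 1))
    return lengths

freq = Counter("AAAAAAAAABCBDBCB")   # A: 9, B: 4, C: 2, D: 1
root, merges = build_huffman_tree(freq)
lengths = code_lengths(root)
```

For this input the loop performs 3 merges ($n - 1$ with $n = 4$), and the resulting codeword lengths are 1, 2, 3, 3 for A, B, C, D.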

In [1]:
import math, queue
from collections import Counter

## we can also do `map`, `reduce`
D = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'B', 'D', 'B', 'C', 'B']
cnt = Counter()
for c in D:
    cnt[c] += 1

## print each frequency per key
for c in cnt.keys():
    print('The character is', c, 'with its frequency as', cnt[c])


The character is A with its frequency as 9
The character is B with its frequency as 4
The character is C with its frequency as 2
The character is D with its frequency as 1


In [2]:
class TreeNode(object):
    # we assume data is a tuple (frequency, character)
    def __init__(self, left=None, right=None, data=None):
        self.left = left
        self.right = right
        self.data = data
    def __lt__(self, other):
        return(self.data < other.data)
    def children(self):
        return((self.left, self.right))


p = queue.PriorityQueue()
# construct heap from frequencies, the initial items should be
# the leaves of the final tree
for c in cnt.keys():
    p.put(TreeNode(None,None,(cnt[c], c)))

## print the priority queue (note: get() removes each item, so this loop empties p)
for i in range(p.qsize()):    
    print(p.qsize())
    print(p.get().data)

4
(1, 'D')
3
(2, 'C')
2
(4, 'B')
1
(9, 'A')


### Meldable Heaps

We'll look at an alternative to maintaining the shape property. We will still use binary trees and maintain the heap property, but will not require them to be almost-complete. We will use the observation that heap operations only require the ability to combine, or **meld**, heaps efficiently:

- **deleteMin** needs to delete the root of the tree, and then somehow meld the left and right subtrees.

![heap-meld-1.png](heap-meld-1.png)

To delete the minimum value, remove the root and meld the two subtrees.

![heap-meld-2.png](heap-meld-2.png)


- **insert** is just the melding of the current tree and a singleton tree.

<br>

- With the **meld** operation, we can construct a priority queue in parallel:

> `val pq = Seq.reduce Q.meld Q.empty (Seq.map Q.singleton S)`

<br>

Suppose we wish to meld two heaps $A$ and $B$, where the root of $A$ is smaller than the root of $B$. To create a single tree $C$ from $A$ and $B$, we need to decide on the root.

> Let $r_A$, the root of $A$ and the smaller of the two roots, be the new root. What do we do with the left and right subtrees $L_A$ and $R_A$ of $r_A$, and with $B$?

If we keep $L_A$ as the left subtree of $r_A$, we can meld $R_A$ and $B$ and make the result the right subtree of $r_A$.

<img src="meld_schematic.jpg" width="70%">

<img src = "example_heap_meld.jpg" width="50%">

This defines a recursive procedure for melding two heaps:

<img src = "naive_meldable_heap_spec.png" width="50%">


This is a well-defined procedure for melding two heaps, but as we can see in this example, the melded tree may end up with a very long right "spine". In the worst case, a meld takes $\Theta(|A|+|B|)$ work!
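
The long right spine is easy to reproduce. The following is a minimal sketch of the naive meld described above, using a hypothetical `Node` class (the spec in the figure may differ in detail): the smaller root is kept along with its left subtree, and the recursion continues on the right.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def meld(a, b):
    """Naive meld: keep the smaller root and its left subtree, recurse on the right."""
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a            # make a the heap with the smaller root
    return Node(a.key, a.left, meld(a.right, b))

def right_spine_length(node):
    """Number of nodes on the path that follows right children from node."""
    length = 0
    while node is not None:
        length += 1
        node = node.right
    return length

# Inserting increasing keys one at a time melds each singleton onto the right spine.
heap = None
for key in range(1, 9):
    heap = meld(heap, Node(key))
spine = right_spine_length(heap)
```

Here all 8 nodes end up on the right spine, so the next meld may have to walk the entire heap.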

### Leftist Heaps

To address this imbalance in our approach, we can do some bookkeeping and use our flexibility in choosing how to orient subtrees left to right. 

We will ensure that, at every node, the tree is **at least as deep on the left** as on the right, where depth is measured along the right spine defined below.

- The **right spine** of a binary tree is the path obtained by starting at the root and repeatedly following right children.

- Let $rank(x)$ be the length of the right spine starting at $x$.

- Let $L(x)$ be the left child of $x$ and $R(x)$ the right child of $x$.

#### A **leftist heap** has the property that for any node $x$ in the heap: $rank(L(x)) \geq rank(R(x))$.
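
A small sketch of these definitions, assuming the convention that rank counts nodes and that an empty tree has rank 0. The `Node` class and the checker `is_leftist` are illustrative names.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def rank(x):
    """Length of the right spine starting at x; an empty tree has rank 0."""
    return 0 if x is None else 1 + rank(x.right)

def is_leftist(x):
    """Check rank(L(x)) >= rank(R(x)) at every node of the tree rooted at x."""
    if x is None:
        return True
    return (rank(x.left) >= rank(x.right)
            and is_leftist(x.left) and is_leftist(x.right))

left_chain = Node(1, Node(2, Node(3)))               # deep on the left only
right_chain = Node(1, None, Node(2, None, Node(3)))  # deep on the right only
```

The left chain satisfies the leftist property with rank 1, while the right chain has rank 3 and violates it at the root.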

<br>

Of course, a leftist heap could be imbalanced on the left:

<img src="leftist_heap-unbalanced.jpg" width="20%">

This is okay, since meld only traverses the right spine of a tree.


Fortunately, keeping the $rank$ measure at every node will allow us to essentially balance the heap. The key idea is that since melding only recurses right, if we use meld to insert elements into the heap it will "balance out" the left bias of the leftist property.

#### The work of deletion and insertion is $O(\log n)$.

<img src="cost_comp.png" width="60%"> 

<img src = "leftist_meldable_heap_spec.png" width="45%">

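We store a rank at each node of the leftist heap. Whenever we create a new node, we place the higher-rank subtree on the left and set the node's rank to one more than the rank of its right child, which guarantees that the leftist property holds.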
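
One way to render this in Python is sketched below. It is an interpretation of the description above rather than a transcription of the spec in the figure; the names `LeftistNode`, `make_leftist`, `insert`, and `delete_min` are assumptions, and `functools.reduce` is sequential, so it only imitates the shape of the parallel `Seq.reduce` construction.

```python
from functools import reduce

class LeftistNode:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.rank = 1 + rank(right)   # right-spine length, stored at the node

def rank(x):
    """Stored rank of x; an empty tree has rank 0."""
    return 0 if x is None else x.rank

def make_leftist(key, a, b):
    """Build a node whose higher-rank subtree is placed on the left."""
    if rank(a) < rank(b):
        a, b = b, a
    return LeftistNode(key, a, b)

def meld(a, b):
    """Meld two leftist heaps, recursing only along right spines."""
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a                   # make a the heap with the smaller root
    return make_leftist(a.key, a.left, meld(a.right, b))

def insert(heap, key):
    return meld(heap, LeftistNode(key))

def delete_min(heap):
    """Return (minimum key, remaining heap)."""
    return heap.key, meld(heap.left, heap.right)

keys = [5, 3, 8, 1, 7, 2, 6, 4]
heap = reduce(meld, (LeftistNode(k) for k in keys), None)
built_rank = heap.rank                # right-spine length of the built heap

drained = []
while heap is not None:
    key, heap = delete_min(heap)
    drained.append(key)
```

Draining the heap returns the keys in sorted order. Because a leftist heap of rank $r$ has at least $2^r - 1$ nodes, the right spine of a heap with $n$ nodes has length at most $\lg(n + 1)$, which is what bounds each meld.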