```java
// run this cell to prevent Jupyter from displaying the null output cell
com.twosigma.beakerx.kernel.Kernel.showNullExecutionResult = false;
```

<a id="notebook_id"></a>
# Binary heaps

A priority queue can be implemented using a self-balancing binary search tree, in which case the `insert`, `max`, and `removeMax` operations have $O(\log n)$ complexity.

**Exercise 1** Suppose you have $n$ key-value pairs. What is the complexity of building a binary search tree by adding the $n$ pairs to an empty tree?

A binary heap is an array representation of a complete binary tree. Recall the definition of a complete binary tree:

> In a complete binary tree of height $h$, all levels except possibly level $h - 1$ are completely full. If the bottom-most 
> level $h - 1$ is not full, then its nodes are filled in from left to right. The following figure illustrates a complete binary tree:

![Complete binary tree](../resources/images/pq/complete_tree.png)

In a binary heap, the nodes of the tree are stored in an array where the root node is the first element of the array. The remaining nodes are stored in the array in breadth-first order. For example, the binary tree shown above would be stored in an array as follows:

![Array representation](../resources/images/pq/array_rep.png)

Given the index of a node, it is easy to find the children and parent of the node:

```
leftChild(index):
    return 2 * index + 1
```

```
rightChild(index):
    return 2 * index + 2
```

```
parent(index):
    return floor((index - 1) / 2)
```
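The three index formulas translate directly into Java, the language of this notebook's kernel. The sketch below uses a hypothetical helper class `HeapIndex`. It uses `Math.floorDiv` for `parent` so that, like the pseudocode's `floor`, `parent(0)` evaluates to `-1`; Java's `/` operator truncates towards zero, so `(0 - 1) / 2` would evaluate to `0` instead.

```java
// Hypothetical helper class: index arithmetic for a binary heap stored in a
// zero-based array in breadth-first order.
final class HeapIndex {
    private HeapIndex() {}

    // index of the left child of the node at index i
    static int leftChild(int i) {
        return 2 * i + 1;
    }

    // index of the right child of the node at index i
    static int rightChild(int i) {
        return 2 * i + 2;
    }

    // index of the parent of the node at index i; floorDiv rounds towards
    // negative infinity, so parent(0) is -1 (the root has no parent)
    static int parent(int i) {
        return Math.floorDiv(i - 1, 2);
    }

    public static void main(String[] args) {
        // the children of node 3 are nodes 7 and 8, and both report 3 as their parent
        System.out.println(leftChild(3) + " " + rightChild(3));   // prints "7 8"
        System.out.println(parent(7) + " " + parent(8));          // prints "3 3"
    }
}
```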

**Exercise 1** Prove that the algorithms for computing the indexes of the left child, right child, and parent are correct. *Hint:* Use induction.

## The heap property

Heaps satisfy the *heap property*: the value stored in each non-root node is less than or equal to the value stored in its parent node.

(Note: The heap property as stated describes a max-heap. If instead the value of each non-root node is greater than or equal to the value of its parent node, then we get a min-heap.)
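The heap property can be checked mechanically by comparing every non-root element with its parent. The following is a minimal Java sketch; the class `HeapCheck`, its method names, and the explicit `size` parameter (the heap may occupy only a prefix of the array) are assumptions made for illustration.

```java
// Hypothetical helper class: checks the heap property for the first `size`
// elements of an array that stores a complete binary tree in breadth-first order.
final class HeapCheck {
    private HeapCheck() {}

    // max-heap: every non-root value is <= the value of its parent
    static boolean isMaxHeap(int[] arr, int size) {
        for (int i = 1; i < size; i++) {
            int parent = (i - 1) / 2;   // i >= 1, so truncating division equals floor
            if (arr[i] > arr[parent]) {
                return false;
            }
        }
        return true;
    }

    // min-heap: every non-root value is >= the value of its parent
    static boolean isMinHeap(int[] arr, int size) {
        for (int i = 1; i < size; i++) {
            int parent = (i - 1) / 2;
            if (arr[i] < arr[parent]) {
                return false;
            }
        }
        return true;
    }
}
```

For example, `HeapCheck.isMaxHeap(new int[] {9, 8, 5, 1, 3, 2}, 6)` returns `true`.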

**Exercise 2** Verify that the tree illustrated above satisfies the heap property.

The heap property implies that the largest value in a heap is stored in the root node, and that every node in the subtree rooted at a node holds a value less than or equal to the value stored in that node.

**Exercise 3** What is the minimum number of values in a heap of height $h$?

**Exercise 4** What is the maximum number of values in a heap of height $h$?

**Exercise 5** What is the height of a heap having $n$ elements?

**Exercise 6** Does an array of size 1 represent a heap?

**Exercise 7** If an array is sorted from largest to smallest value, does it represent a heap?

**Exercise 8** How many different heaps can be made by swapping exactly two values in the array shown above?

## Inserting an element

Inserting a value $v$ into a heap is simple. We add a new leaf node to the tree in the next available breadth-first position and set a reference $n$ to refer to the node. If $n$ is the root, or if $v$ is less than or equal to the value stored in the parent $p$ of $n$, then we store $v$ in $n$ and we are done. Otherwise, we copy the value stored in $p$ into $n$ and then set $n = p$. We repeat this process of copying parent values down until $n$ is the root or the parent of $n$ holds a value greater than or equal to $v$; finally, we store $v$ in $n$. This process is illustrated in the following figures, where the value 85 is inserted into an existing heap.

A new node $n$ is created in the next available leaf position. The parent of $n$ holds the value 27, which is less than 85.

![](../resources/images/pq/insert-1.png)

The value 27 is copied from the parent node into $n$, and $n$ is set to refer to its parent node. The new parent of $n$ holds 74, which is less than 85.

![](../resources/images/pq/insert-2.png)

The value 74 is copied from the parent node into $n$, and $n$ is set to refer to its parent node. The new parent of $n$ holds 93, which is greater than 85, so the copying stops.

![](../resources/images/pq/insert-3.png)

The value 85 is now inserted into the tree by copying the value into $n$.

![](../resources/images/pq/insert-4.png)

The algorithm for inserting a value $v$ into an array $arr$ representing a binary heap is shown below:

```
insert(v, arr)
    i = heap size
    pi = parent(i)
    // copy smaller parent values down until v fits
    while i > 0 and arr[pi] < v {
        arr[i] = arr[pi]
        i = pi
        pi = parent(i)
    }
    arr[i] = v
    increase heap size by 1
```
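A minimal Java sketch of `insert` for a max-heap of `int` values follows. Like the pseudocode, it copies parent values down rather than swapping. The class `IntMaxHeap`, its initial capacity of 4, and its doubling growth policy when the array is full are assumptions made here for illustration, not taken from the text.

```java
import java.util.Arrays;

// Hypothetical max-heap of ints illustrating the insert algorithm.
final class IntMaxHeap {
    private int[] arr = new int[4];   // assumption: initial capacity of 4
    private int size = 0;             // number of elements currently in the heap

    void insert(int v) {
        if (size == arr.length) {
            // assumption: double the capacity when the array is full
            arr = Arrays.copyOf(arr, 2 * arr.length);
        }
        int i = size;                 // index of the new leaf
        // while the parent holds a smaller value, copy that value down one level;
        // i > 0 is tested first, so the root's (nonexistent) parent is never read
        while (i > 0 && arr[(i - 1) / 2] < v) {
            int pi = (i - 1) / 2;     // i > 0, so truncating division equals floor
            arr[i] = arr[pi];
            i = pi;
        }
        arr[i] = v;                   // v is now at the root or below a parent >= v
        size++;
    }

    int size() {
        return size;
    }

    // copy of the heap's elements in array (breadth-first) order
    int[] toArray() {
        return Arrays.copyOf(arr, size);
    }

    public static void main(String[] args) {
        IntMaxHeap heap = new IntMaxHeap();
        for (int v : new int[] {5, 3, 8, 1, 9, 2}) {
            heap.insert(v);
        }
        System.out.println(Arrays.toString(heap.toArray()));   // prints "[9, 8, 5, 1, 3, 2]"
    }
}
```

Inserting 5, 3, 8, 1, 9, 2 in that order leaves the array as `[9, 8, 5, 1, 3, 2]`; inserting 9 copies 3 and then 8 down one level each before 9 is stored at the root.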

**Exercise 9** The `insert` algorithm does not specify what happens if the array is full. For a re-sizable heap, the array needs to be re-allocated to grow in size and the heap values need to be copied into the re-allocated array. If the full array has size equal to $n$, what is the size of the re-allocated array if the heap can always represent a perfect binary tree?

**Exercise 10** What is the worst-case big-$O$ complexity of `insert`?

## Getting the maximum value

The maximum value in a heap is located in the root node, which is stored in the first element of the array.

```
max(arr)
    if heap size == 0 {
        error
    }
    return arr[0]
```

## Getting and removing the maximum value

Removing the maximum value from a heap is conceptually equivalent to removing the root node of the tree. Instead of literally removing the root node, we replace the value in the root node with the value stored in the last leaf node (the last element of the heap in the array) and then remove that leaf node.
Consider removing the maximum value in the following heap:

![](../resources/images/pq/remove-1.png)

The value 27 in the last leaf node is copied into the root node:

![](../resources/images/pq/remove-2.png)

We now need to restore the heap property because the child nodes of 27 have values that are greater than 27.

To restore the heap property, we begin by swapping the value in the root node with the larger of its children's values; in this case, we swap 27 and 85.

![](../resources/images/pq/remove-3.png)

The heap property is still not restored because the child nodes of 27 have values that are greater than 27.

We again swap 27 with the larger of its children's values; in this case, we swap 27 and 74.

![](../resources/images/pq/remove-4.png)

The node holding 27 now has no children, so the heap property is restored.

The process of restoring the heap property by "sinking" a value down the heap is called *heapify*. Starting at index $i$ of an array $arr$ that represents a heap, a recursive algorithm for heapify is:

```
heapify(arr, i)
    // indexes of left and right children of arr[i]
    ileft = leftChild(i)
    iright = rightChild(i)
    
    // find maximum value between arr[i] and its children
    imax = i
    sz = heap size
    if ileft < sz and arr[ileft] > arr[imax] {
        imax = ileft
    }
    if iright < sz and arr[iright] > arr[imax] {
        imax = iright
    }
    
    // if needed swap arr[i] and arr[imax] and recurse
    if imax != i {
        swap arr[i] and arr[imax]
        heapify(arr, imax)
    }
```

The heapify procedure described above assumes that each child node of `arr[i]` is the root of a sub-heap.
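Putting `max`, `removeMax`, and heapify together, a minimal Java sketch for a max-heap of `int` values might look like the following. The class name `HeapRemoveMax` is hypothetical, and its constructor assumes (without checking) that the array passed to it already satisfies the heap property. `heapify` is recursive, mirroring the pseudocode.

```java
import java.util.NoSuchElementException;

// Hypothetical max-heap of ints illustrating max, removeMax, and heapify.
final class HeapRemoveMax {
    private final int[] arr;
    private int size;

    // assumption: values already satisfies the max-heap property
    HeapRemoveMax(int[] values) {
        arr = values.clone();
        size = values.length;
    }

    int size() {
        return size;
    }

    // the maximum is stored at the root, arr[0]
    int max() {
        if (size == 0) {
            throw new NoSuchElementException("heap is empty");
        }
        return arr[0];
    }

    int removeMax() {
        int result = max();           // throws if the heap is empty
        arr[0] = arr[size - 1];       // copy the last leaf's value into the root
        size--;                       // remove the last leaf
        heapify(0);                   // sink the copied value to restore the heap property
        return result;
    }

    // recursive heapify: assumes the children of arr[i] are roots of sub-heaps
    private void heapify(int i) {
        int ileft = 2 * i + 1;
        int iright = 2 * i + 2;
        int imax = i;
        if (ileft < size && arr[ileft] > arr[imax]) {
            imax = ileft;
        }
        if (iright < size && arr[iright] > arr[imax]) {
            imax = iright;
        }
        if (imax != i) {
            int tmp = arr[i];
            arr[i] = arr[imax];
            arr[imax] = tmp;
            heapify(imax);
        }
    }

    public static void main(String[] args) {
        HeapRemoveMax heap = new HeapRemoveMax(new int[] {9, 8, 5, 1, 3, 2});
        StringBuilder out = new StringBuilder();
        while (heap.size() > 0) {
            out.append(heap.removeMax()).append(' ');
        }
        System.out.println(out.toString().trim());   // prints "9 8 5 3 2 1"
    }
}
```

Starting from the heap `[9, 8, 5, 1, 3, 2]`, repeated calls to `removeMax` return the values in decreasing order.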

The complexity of heapify can be derived as follows. All of the operations before the recursive call are constant-time operations. The recursive call operates on a child of `arr[i]`, and that child is the root of a heap containing at most a constant fraction of the $n$ nodes in the heap rooted at `arr[i]`. Therefore, the recurrence relation describing the running time of heapify is:

$$T(n) \leq T(n / b) + \Theta(1)\quad \text{for some constant}\ b > 1$$

By the master theorem (here $a = 1$ and $f(n) = \Theta(1) = \Theta(n^{\log_b 1})$), $T(n)$ is in $O(\log n)$. The recurrence can also be solved by repeated substitution.
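A sketch of the repeated substitution, writing $c$ for the constant hidden in the $\Theta(1)$ term (a symbol introduced here) and ignoring floors:

$$
\begin{align}
T(n) & \leq T(n / b) + c\\
& \leq T(n / b^2) + 2c\\
& \leq \cdots\\
& \leq T(n / b^k) + kc
\end{align}
$$

Choosing $k = \log_b n$ makes $n / b^k = 1$, so $T(n) \leq T(1) + c \log_b n$. Because $b$ is a constant, $\log_b n = \log n / \log b$ differs from $\log n$ only by a constant factor, and therefore $T(n)$ is in $O(\log n)$.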

**Exercise 11** In a complete binary tree $t$ having $n$ nodes, what is the maximum number of nodes in a subtree rooted at a child of the root of $t$? Express your answer as some multiple of $n$.

## Building a heap from an array of values

Suppose that we have an array `arr` containing $n$ elements. Then the elements `arr[n / 2]` through `arr[n - 1]` can be considered the leaves of a binary heap (where `n / 2` is computed using truncating division). Because a leaf node is by itself a heap, both children of `arr[n / 2 - 1]` are roots of heaps, so it is safe to call `heapify(arr, n / 2 - 1)`. If we work backwards from `i = n / 2 - 1` to `i = 0`, then every call to `heapify(arr, i)` is made when the children of `arr[i]` are already roots of heaps, and when the loop finishes the whole array is a heap.

```
buildHeap(arr)
    n = length of arr
    heap size = n
    for i = (n / 2 - 1) down to 0 {
        heapify(arr, i)
    }
```
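A minimal Java sketch of buildHeap follows. The class name `BuildHeap` is hypothetical, and its `heapify` is an iterative (loop-based) variant of the recursive pseudocode: it performs the same swaps but replaces the tail-recursive call with a loop. The heap size is passed explicitly.

```java
// Hypothetical class: converts an int array into a max-heap in place.
final class BuildHeap {
    private BuildHeap() {}

    static void buildHeap(int[] arr) {
        int n = arr.length;
        // arr[n / 2] .. arr[n - 1] are leaves; start at the last internal node
        for (int i = n / 2 - 1; i >= 0; i--) {
            heapify(arr, n, i);
        }
    }

    // iterative heapify: sink arr[i] until neither child is larger
    static void heapify(int[] arr, int size, int i) {
        while (true) {
            int ileft = 2 * i + 1;
            int iright = ileft + 1;
            int imax = i;
            if (ileft < size && arr[ileft] > arr[imax]) {
                imax = ileft;
            }
            if (iright < size && arr[iright] > arr[imax]) {
                imax = iright;
            }
            if (imax == i) {
                return;               // heap property holds at i
            }
            int tmp = arr[i];
            arr[i] = arr[imax];
            arr[imax] = tmp;
            i = imax;                 // continue sinking from the child's position
        }
    }
}
```

Applied to `[1, 2, 3, 4, 5, 6, 7]`, this sketch produces `[7, 5, 6, 4, 2, 1, 3]`.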

A loose upper bound on the complexity of buildHeap is easy to derive: each call to heapify is in $O(\log n)$ and there are $O(n)$ such calls; therefore, the complexity is at most $O(n \log n)$.

A weakness of the previous analysis is that most of the calls to heapify operate on heaps of small height. In a heap having $n$ elements, there are at most $\text{ceil}(n\ /\ 2^{h+1})$ nodes of height $h$, and node heights range from $0$ to $\text{floor}(\log n)$ (logarithms are base 2). Here $\text{ceil}$ is the ceiling operator (round up towards positive infinity) and $\text{floor}$ is the floor operator (round down towards negative infinity). The complexity of heapify when called on a node of height $h$ is $O(h)$, and we require at most

$$\sum_{h = 0}^{\text{floor}(\log n)} \text{ceil}(n\ /\ 2^{h+1})$$

calls to heapify. The complexity is thus:

$$
\begin{align}
\sum_{h = 0}^{\text{floor}(\log n)} \text{ceil}(n\ /\ 2^{h+1}) O(h) & = 
O\left(\sum_{h = 0}^{\text{floor}(\log n)} \text{ceil}(n\ /\ 2^{h+1}) h \right)\\
& = O\left(\frac{n}{2} \sum_{h = 0}^{\text{floor}(\log n)} \frac{h}{2^{h}} \right)
\end{align}
$$

The sum satisfies the following inequality:

$$\sum_{h = 0}^{\text{floor}(\log n)} \frac{h}{2^{h}} < \sum_{h = 0}^\infty \frac{h}{2^{h}}$$

The sum on the right-hand side of the inequality can be evaluated using the following identity, which holds for $|x| < 1$:

$$\sum_{k = 0}^\infty kx^k = \frac{x}{(1 - x)^2}$$

Substituting $x = 1 / 2$ gives:

$$\sum_{h = 0}^\infty \frac{h}{2^{h}} = \frac{1/2}{(1 - 1/2)^2} = 2$$

Thus

$$
\begin{align}
O\left(\frac{n}{2} \sum_{h = 0}^{\text{floor}(\log n)} \frac{h}{2^{h}} \right) & =
O\left(\frac{n}{2} \sum_{h = 0}^{\infty} \frac{h}{2^{h}} \right)\\
& = O\left(\frac{n}{2} 2 \right)\\
& = O(n)
\end{align}
$$

In other words, the complexity of building a heap from an unordered array is in $O(n)$.
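The $O(n)$ bound can also be probed empirically. The sketch below (hypothetical class `BuildHeapCost`) runs buildHeap on random arrays of increasing size and counts swaps. A call to heapify on a node of height $h$ performs at most $h$ swaps, so if the analysis above is right, the ratio of swaps to $n$ should stay roughly constant as $n$ grows rather than grow like $\log n$. This is a sanity check, not a proof, and the exact counts depend on the random seed.

```java
import java.util.Random;

// Hypothetical class: counts the swaps buildHeap performs on random input.
final class BuildHeapCost {
    private BuildHeapCost() {}

    // builds a max-heap in place and returns the number of swaps performed
    static long buildHeapCountingSwaps(int[] arr) {
        long swaps = 0;
        int n = arr.length;
        for (int i = n / 2 - 1; i >= 0; i--) {
            int j = i;                // sink arr[i] (iterative heapify)
            while (true) {
                int ileft = 2 * j + 1;
                int iright = ileft + 1;
                int imax = j;
                if (ileft < n && arr[ileft] > arr[imax]) {
                    imax = ileft;
                }
                if (iright < n && arr[iright] > arr[imax]) {
                    imax = iright;
                }
                if (imax == j) {
                    break;
                }
                int tmp = arr[j];
                arr[j] = arr[imax];
                arr[imax] = tmp;
                swaps++;
                j = imax;
            }
        }
        return swaps;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);  // fixed seed so runs are repeatable
        for (int n = 1_000; n <= 1_000_000; n *= 10) {
            int[] arr = new int[n];
            for (int k = 0; k < n; k++) {
                arr[k] = rng.nextInt();
            }
            long swaps = buildHeapCountingSwaps(arr);
            System.out.printf("n = %,d  swaps = %,d  swaps/n = %.3f%n",
                    n, swaps, (double) swaps / n);
        }
    }
}
```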

**Exercise 12** Show that there are at most $\text{ceil}(n\ /\ 2^{h+1})$ nodes of height $h$ in a binary heap of $n$ elements.

## Sorting using a heap

The heapsort algorithm sorts an array `arr` of $n$ elements by first building a heap using buildHeap. After building the heap, the maximum element of `arr` is at the root of the heap (i.e., it is the first element of `arr`). We move the maximum element to its correct sorted position by swapping `arr[0]` and `arr[n - 1]`. We then decrease the size of the heap by 1 so that the last element of the array is not processed again. We restore the heap property by calling heapify on the root, which moves the second largest element of `arr` to the root of the heap. If we repeat this process for the remaining elements of the heap, then the elements of `arr` will be arranged in ascending sorted order.

```
heapSort(arr)
    buildHeap(arr)
    for i = (length of arr - 1) down to 1 {
        // move the current maximum to its sorted position
        swap arr[0] and arr[i]
        // arr[i] is no longer part of the heap
        decrease heap size by 1
        heapify(arr, 0)
    }
```
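A minimal in-place Java sketch of heapSort for `int` arrays, assuming ascending order is wanted. The class name `HeapSort` is hypothetical. Instead of a separate heap-size variable, the loop index `end` serves as the heap size after each swap, and `heapify` (recursive, as in the pseudocode) takes the heap size as a parameter.

```java
import java.util.Arrays;

// Hypothetical class: in-place heapsort of an int array into ascending order.
final class HeapSort {
    private HeapSort() {}

    static void heapSort(int[] arr) {
        // buildHeap: heapify each internal node, from the last one up to the root
        for (int i = arr.length / 2 - 1; i >= 0; i--) {
            heapify(arr, arr.length, i);
        }
        // arr[0 .. end] is the heap; arr[end + 1 ..] already holds the largest values in order
        for (int end = arr.length - 1; end >= 1; end--) {
            swap(arr, 0, end);        // move the current maximum to its sorted position
            heapify(arr, end, 0);     // the heap now has size end; restore the heap property
        }
    }

    // recursive heapify on the heap occupying arr[0 .. size - 1]
    private static void heapify(int[] arr, int size, int i) {
        int ileft = 2 * i + 1;
        int iright = 2 * i + 2;
        int imax = i;
        if (ileft < size && arr[ileft] > arr[imax]) {
            imax = ileft;
        }
        if (iright < size && arr[iright] > arr[imax]) {
            imax = iright;
        }
        if (imax != i) {
            swap(arr, i, imax);
            heapify(arr, size, imax);
        }
    }

    private static void swap(int[] arr, int a, int b) {
        int tmp = arr[a];
        arr[a] = arr[b];
        arr[b] = tmp;
    }

    public static void main(String[] args) {
        int[] data = {27, 93, 74, 85, 12, 40};
        heapSort(data);
        System.out.println(Arrays.toString(data));   // prints "[12, 27, 40, 74, 85, 93]"
    }
}
```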

buildHeap has complexity $O(n)$, and there are $n - 1$ calls to heapify, each of which has complexity $O(\log n)$. Therefore, the complexity of heapSort is in

$$ O(n) + (n - 1) O(\log n)$$

which is in $O(n \log n)$.

## Summary

The following table summarizes the worst-case complexity of the main heap operations:

| operation | complexity |
| :- | :-: |
| `max` | $O(1)$ |
| `removeMax`<br /> via `heapify` | $O(\log n)$ |
| `insert` | $O(\log n)$ |
| `buildHeap` | $O(n)$ |
| `heapSort` | $O(n \log n)$ |