In [1]:
import math
import logging
FORMAT = '[%(name)s:%(levelname)s]  %(message)s'
logging.basicConfig(level=logging.DEBUG, format=FORMAT)
logger = logging.getLogger('dbg')

def dprint(s):
    logger.debug(s)

def iprint(s):
    logger.info(s)

logger.setLevel(logging.INFO)

## Heapsort and Binary Heaps

Efficient in-place sorting using a *binary heap* - **not** a stable sort (does not preserve input order of identical keys).

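Asymptotic complexity is better than quicksort's in the worst case ($O(n \log n)$ versus $O(n^2)$) and of the same order on average. The array itself occupies $\Theta(n)$ storage; the sort needs only $O(1)$ auxiliary space when the sift-down step is written iteratively.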

| Complexity | Average Case  | Worst Case |
| ---------- | ----------  | ---------- |
| Sort     | $O(n \log n)$ | $O(n \log n)$ |

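Applications include the Linux kernel, which uses heapsort to avoid quicksort's $O(n^2)$ worst case. The hybrid *introsort* starts with quicksort and falls back to heapsort when the recursion becomes too deep.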

### Binary Heaps (Max and Min)

A Binary Max-Heap is a binary tree satisfying the following properties:

* **Shape:** - The binary tree is *complete*: all levels except the *last* one are **full**, and the last is filled **left to right**.
* **Max-Heap:** - The key of each node is $\geq$ the keys of its children.
* (**Min-Heap:**) - The key of each node is $\leq$ the keys of its children.

<img src="media/maxheap.png" alt="drawing" width="450"/>

> N.B. the word 'binary' is often dropped from 'binary heap'

A heap of `heap_size` nodes is stored as a contiguous array of length $n \geq$ `heap_size`. This allows all relationships to be specified with **indices**.

For the node at index $i$:

* **Parent** has index $ \lfloor (i-1)/2 \rfloor$
* **Left Child** has index $ 2i+1 $
* **Right Child** has index $ 2i+2 $
* A[$i$] $\leq$ A[parent($i$)] for $1 \leq i <$ `heap_size` - Satisfies Max Heap Property
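
The index arithmetic above can be sketched directly in Python. This is a minimal illustration rather than a reference implementation; `is_max_heap` is a hypothetical helper introduced here to check the last bullet, and indices are assumed to be 0-based.

```python
def parent(i):
    """Index of the parent of node i (0-based); only meaningful for i >= 1."""
    return (i - 1) // 2

def left_child(i):
    return 2 * i + 1

def right_child(i):
    return 2 * i + 2

def is_max_heap(A, heap_size=None):
    """Check A[i] <= A[parent(i)] for every non-root index below heap_size."""
    if heap_size is None:
        heap_size = len(A)
    return all(A[i] <= A[parent(i)] for i in range(1, heap_size))

print(is_max_heap([6, 3, 5, 0, 2, 4]))  # True
print(is_max_heap([0, 2, 5, 3, 6, 4]))  # False: A[1] = 2 exceeds its parent A[0] = 0
```

Because the check only compares each node with its parent, it runs in $O(n)$ time.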

<img src="media/maxheaparray.png" alt="drawing" width="450"/>

#### Binary Heap Height

The height of a heap with $n$ keys is $\lfloor \log n \rfloor$ (logarithms are base 2), or $\Theta(\log n)$, as the heap is a complete binary tree.

> **n.b. a single item tree (just the root) has height 0**

* A heap of height $h$ has at least $2^h$ nodes, reached when the lowest level has only 1 node.
* A heap of height $h$ has at most $2^{h+1} - 1$ nodes, reached when the lowest level is full.
*  $2^h \leq n \leq 2^{h+1} - 1$ implies  $h = \lfloor \log n \rfloor$
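
As a quick sanity check of these bounds, the sketch below computes the height for a few sizes and asserts that $2^h \leq n \leq 2^{h+1} - 1$ holds. `heap_height` is a hypothetical helper name; it relies on `int.bit_length()`, which for $n \geq 1$ equals $\lfloor \log_2 n \rfloor + 1$.

```python
def heap_height(n):
    """Height of a heap with n >= 1 keys, i.e. floor(log2(n))."""
    return n.bit_length() - 1

for n in (1, 2, 3, 6, 7, 8):
    h = heap_height(n)
    # the bounds 2^h <= n <= 2^(h+1) - 1 pin down h uniquely
    assert 2**h <= n <= 2**(h + 1) - 1
    print(f"n={n} -> height {h}")
```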




#### Fixing Max Heap Violations (Max Heapify)

To maintain the max-heap property (the key of each node is $\geq$ the keys of its children), we have to be able to fix violations by 'bubbling' violating keys down the heap.

This is achieved with the `max_heapify()` function, which runs in $O(h)$ time when passed the index of a suspected violation. It assumes the subtrees rooted at the left and right children of that index are both valid max-heaps.

1. find the largest key among the node at index i and its left and right children
2. if the largest key belongs to a child, swap it with the node at index i and rerun the checks at that child's index
3. else the node at index i is already $\geq$ both children, exit.

#### Building a Max Heap in Linear Time

A max heap can be built from an unsorted array in **linear time**. Simply call the violation repair on the left half of the array (the non-leaf nodes), in place, working from the last non-leaf node back to the root. Each leaf is by definition a valid max-heap.

In [3]:
def left_child(i):
    return 2*i + 1

def right_child(i):
    return 2*i + 2

def max_heapify(A, heap_size, i):
    """ recursive violation solver for a max heap A"""
    left = left_child(i)
    right = right_child(i)
    max_i = i

    if left < heap_size and A[left] > A[max_i]:
        max_i = left
    if right < heap_size and A[right] > A[max_i]:
        max_i = right
    if max_i != i:
        A[i], A[max_i] = A[max_i], A[i]
        max_heapify(A, heap_size, max_i)

def build_mh(A):
    size = len(A)
    for i in range(size//2 - 1, -1, -1):
        print(f"Checking Index {i}")
        max_heapify(A, size, i)
    return A


A = [0,2,5,3,6,4]
print(build_mh(A))

Checking Index 2
Checking Index 1
Checking Index 0
[6, 3, 5, 0, 2, 4]
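
The recursion in `max_heapify()` is a tail call, so it can be replaced by a loop. The sketch below is one possible iterative variant, not the notebook's own code; `max_heapify_iter` and `build_max_heap` are names made up for this example. It avoids the call stack, which is what keeps auxiliary space at $O(1)$.

```python
def max_heapify_iter(A, heap_size, i):
    """Iterative sift-down: same swaps as the recursive version, no call stack."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        max_i = i
        if left < heap_size and A[left] > A[max_i]:
            max_i = left
        if right < heap_size and A[right] > A[max_i]:
            max_i = right
        if max_i == i:
            return  # node i is >= both children, nothing left to fix
        A[i], A[max_i] = A[max_i], A[i]
        i = max_i  # continue at the child that received the smaller key

def build_max_heap(A):
    """Bottom-up construction over the non-leaf nodes, in place."""
    for i in range(len(A) // 2 - 1, -1, -1):
        max_heapify_iter(A, len(A), i)
    return A

print(build_max_heap([0, 2, 5, 3, 6, 4]))  # [6, 3, 5, 0, 2, 4]
```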


#### Linear Complexity

n.b. **CLRS** refers to the textbook *Introduction to Algorithms* (Cormen, Leiserson, Rivest and Stein).

<img src="media/heapconstruct.png" alt="drawing" width="750"/>

Let the total cost of building a max heap be described such that:

$C_T = \sum_{h}$ (node count at height $h$) $\times$ (cost of `max_heapify()` at height $h$), where $h$ ranges over every height in the heap and the height of a leaf node is 0.

1. $n$ keys $\Rightarrow$ height $= \lfloor \log n \rfloor$
2. A heap with $n$ keys has at most $\lceil n / 2^{h+1} \rceil$ nodes at height $h$
3. `max_heapify()` $= O(h)$

$C_T = \sum_{h=0}^{\lfloor \log n \rfloor} \lceil \frac{n}{2^{h+1}} \rceil \times O(h)$

$\quad \quad \leq \sum_{h=0}^{\lfloor \log n \rfloor} \frac{n}{2^{h}} \times O(h) = O\left( \sum_{h=0}^{\lfloor \log n \rfloor} \frac{n}{2^{h}} h \right)$

$\quad \quad \leq O\left( n \times \sum_{h=0}^{\infty} \frac{h}{2^{h}} \right)$

$\quad \quad \leq O( n \times 2) = O(n)$

Hence the total cost is bounded by a fixed constant multiple of $n$, implying $O(n)$.
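
The two facts the derivation leans on can be checked numerically. The sketch below is an illustration under stated assumptions, not part of CLRS: `node_height` and `sum_of_heights` are hypothetical helpers. The first walks the leftmost path below a node (the longest path in a complete tree), and the second adds up the heights of all nodes, which bounds the total number of swaps `max_heapify()` can make while building the heap.

```python
def node_height(i, n):
    """Height of node i in an n-key heap: edges on the leftmost path to a leaf."""
    h = 0
    while 2 * i + 1 < n:
        i = 2 * i + 1
        h += 1
    return h

def sum_of_heights(n):
    """Upper bound on the swaps made by bottom-up heap construction."""
    return sum(node_height(i, n) for i in range(n))

# Partial sum of h / 2^h; the infinite series converges to 2.
series = sum(h / 2**h for h in range(60))
print(f"sum of h/2^h over h < 60: {series:.6f}")

for n in (6, 100, 10_000):
    total = sum_of_heights(n)
    assert total < n  # linear bound: never more than n swaps in total
    print(f"n={n}: sum of node heights = {total}")
```

For the six-key example used earlier the sum of heights is 4 (root at height 2, two nodes at height 1), comfortably below $n = 6$.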

#### Heap Sort

Heap sort completes the following iteratively, starting with a valid max-heap:

1. Swap the first item (the current maximum) with the last item of the heap
2. Decrease the last item index, so the maximum stays in its final position
3. Shuffle the new first item down the tree until it is in the right spot

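Complexity is $O(n \log n)$: building the heap costs $O(n)$, followed by $n - 1$ calls to `max_heapify()` at $O(\log n)$ each.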

In [1]:
def heapsort(A): # Floyd
    build_mh(A)
    print(f"Heap: {A}")
    heap_size = len(A)
    while heap_size > 1:
        A[heap_size - 1], A[0] = A[0], A[heap_size - 1]
        heap_size -= 1
        print(f"New: {A}")
        print(f"Shuffle: {A[0]}")
        max_heapify(A, heap_size, 0)

A = [0, 2, 5, 3, 6, 4]  # same input as the build_mh example; A must be defined before the call
heapsort(A)
print(A)

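
The claim that heapsort is not stable can be seen on a two-element input. The sketch below is a self-contained variant written for this illustration; `sift_down` and `heapsort_by` are hypothetical names, and the `key` parameter is an assumption added so that records with equal keys can be told apart.

```python
def sift_down(A, heap_size, i, key):
    """Iterative max-heap repair comparing items by key(item)."""
    while True:
        left, right, max_i = 2 * i + 1, 2 * i + 2, i
        if left < heap_size and key(A[left]) > key(A[max_i]):
            max_i = left
        if right < heap_size and key(A[right]) > key(A[max_i]):
            max_i = right
        if max_i == i:
            return
        A[i], A[max_i] = A[max_i], A[i]
        i = max_i

def heapsort_by(A, key):
    """In-place heapsort, ascending by key."""
    n = len(A)
    for i in range(n // 2 - 1, -1, -1):  # build the max-heap
        sift_down(A, n, i, key)
    for end in range(n - 1, 0, -1):      # move the max behind the shrinking heap
        A[0], A[end] = A[end], A[0]
        sift_down(A, end, 0, key)
    return A

records = [(1, "a"), (1, "b")]  # equal keys, 'a' comes first in the input
print(heapsort_by(records, key=lambda r: r[0]))  # [(1, 'b'), (1, 'a')]
```

The two records already form a valid heap, so the first swap of root and last item reverses them and nothing restores the input order.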

Heaps are often used to implement **priority queues**, an abstract data type containing key-value pairs. A max priority queue supports the methods `get_max()`, `pop_max()`, `insert()` and `increase_key()`, the last of which raises the key (priority) of an existing pair. Priority queues are used in schedulers, bandwidth managers and graph algorithms.

A max priority queue implementation is below (`insert()` is not implemented):

In [26]:
def el(key, value):
    return {"key": key, "value": value}

def eli(keys, values):
    return [{"key": key, "value": val} for key, val in zip(keys, values)]

class MaxPQ():
    """ takes A, a list of key, value pairs"""
    def __init__(self, A) -> None:
        self.A = A
        self.size = len(A)
        self.build_mh()

    def __repr__(self) -> str:
        s = ""
        for i in self.A:
            s += f" {i['key']}:{i['value']},"
        return "[" + s[:-1] + " ]"

    def left_child(self, i):
        return 2*i + 1

    def right_child(self, i):
        return 2*i + 2
    
    def parent(self,i):
        return (i-1) // 2

    def max_heapify(self, i):
        left = self.left_child(i)
        right = self.right_child(i)
        max_i = i

        if left < self.size and self.A[left]["key"] > self.A[max_i]["key"]:
            max_i = left
        if right < self.size and self.A[right]["key"] > self.A[max_i]["key"]:
            max_i = right
        if max_i != i:
            self.A[i], self.A[max_i] = self.A[max_i], self.A[i]
            self.max_heapify(max_i)

    def build_mh(self):
        for i in range(self.size//2 - 1, -1, -1):
            self.max_heapify(i)

    def increase_key(self, plus, value):
        # locate the pair by value with a basic linear scan, O(n)
        for i in range(self.size):
            if self.A[i]["value"] == value:
                break
        else:
            # no break: the value is not in the queue
            raise KeyError(value)

        # increase key
        self.A[i]["key"] += plus

        # move up parents until the increase_key is valid
        while i > 0 and self.A[i]["key"] > self.A[self.parent(i)]["key"]:
            # swap with parent
            self.A[i], self.A[self.parent(i)] = self.A[self.parent(i)], self.A[i]
            i = self.parent(i)

    def get_max(self):
        return self.A[0]["value"]
    
    def pop_max(self):
        mv = self.get_max()
        last = self.A.pop()  # remove the last leaf
        self.size -= 1
        if self.size > 0:
            # move the last leaf to the root and sift it down
            self.A[0] = last
            self.max_heapify(0)
        return mv


mpq = MaxPQ(eli([0,2,5,3,6,4],["M", "L", "C", "A", "B", "K"]))
print(mpq)
mpq.increase_key(5, "L")
print(mpq)
print(f"Max V: {mpq.pop_max()}")
print(mpq)


[ 6:B, 3:A, 5:C, 0:M, 2:L, 4:K ]
[ 7:L, 6:B, 5:C, 0:M, 3:A, 4:K ]
Max V: L
[ 6:B, 4:K, 5:C, 0:M, 3:A ]
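
The `insert()` operation mentioned earlier is not part of the class above. One way it might look is sketched here as a standalone function on the same list-of-dicts layout; `pq_insert` is a hypothetical name, and the approach (append, then sift up, mirroring the loop in `increase_key()`) is an assumption about how it would be added.

```python
def pq_insert(A, key, value):
    """Append a new pair, then sift it up while its key exceeds its parent's."""
    A.append({"key": key, "value": value})
    i = len(A) - 1
    while i > 0 and A[i]["key"] > A[(i - 1) // 2]["key"]:
        p = (i - 1) // 2
        A[i], A[p] = A[p], A[i]  # swap with parent
        i = p

heap = []
for k, v in [(3, "A"), (6, "B"), (5, "C")]:
    pq_insert(heap, k, v)
print([f"{e['key']}:{e['value']}" for e in heap])  # ['6:B', '3:A', '5:C']
```

Each insertion climbs at most the height of the heap, so it costs $O(\log n)$. Inside `MaxPQ` the same logic would also need to increment `self.size`.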
