# 4. Heaps and Heap Sort

- **Priority Queue - Data Structure**
    - maintains a set S of elements, where each element is associated with a key
    - motivates understanding of Heap (data structure)
    - operations on Priority Queue:
        - insert(S, x): insert element x into set S
        - max(S): return the element of S with the largest key
        - extract-max(S): returns the element of S with the largest key AND removes it from S
            - models servicing the highest-priority element in the queue
        - increase_key(S, x, k): increase the value of element x's key to the new value k (in the set S)
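
The priority queue operations listed above can be sketched with Python's standard `heapq` module. This is only an illustration, not the implementation these notes build toward: `heapq` is a min-heap, so the sketch negates keys to get max-first behavior, it assumes numeric keys, and it leaves out `increase_key`. The class name `MaxPriorityQueue` and the sample tasks are hypothetical.

```python
import heapq


class MaxPriorityQueue:
    """Max-priority queue sketch on top of heapq (a min-heap) using negated keys."""

    def __init__(self):
        self._heap = []  # stores (-key, element) pairs

    def insert(self, x, key):
        # Negating the key makes the largest key the smallest tuple.
        heapq.heappush(self._heap, (-key, x))

    def max(self):
        # The root of the heap holds the element with the largest key.
        _neg_key, x = self._heap[0]
        return x

    def extract_max(self):
        # Remove and return the element with the largest key.
        _neg_key, x = heapq.heappop(self._heap)
        return x

    def __len__(self):
        return len(self._heap)


pq = MaxPriorityQueue()
for name, key in [("write report", 2), ("fix outage", 9), ("reply to email", 5)]:
    pq.insert(name, key)

print(pq.max())          # fix outage
print(pq.extract_max())  # fix outage
print(pq.extract_max())  # reply to email
```
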
- **Heap - Data Structure**
    - implementation of a priority queue
    - an array visualized as a nearly complete binary tree
        - Ex) [16 14 10 8 7 9 3 2 4 1]
        - index 1 is the root of the tree --> (note: 1-based index, not 0-based)
        - indices 2 and 3 are the root's children
        - keep filling each level from L -> R, where every node has at most 2 children
        - <img src = "screenshots/img_2.png" width = 300>
    - Heap as a tree:
        - root of tree: first element (i = 1)
        - parent(i) = floor(i/2)
        - left(i) = 2i (index of left child)
        - right(i) = 2i + 1 (index of right child)
        - Height of binary heap is O(log(n))
    - Max-Heap property:
        - key of a node is >= the keys of its children
    - Min-Heap property:
        - key of a node is <= the keys of its children
    - On a Max-Heap, max(...) is trivial: the largest key is at the root A[1], so it is returned in O(1)
    - We could conduct sorting on a heap by continually running extract-max(...) and we would get a list of keys in decreasing order.
        - We would want to maintain Max-Heap property as we do this.
    - Question to consider: How do we build a Max-Heap from an unsorted array?
    - _HEAP OPERATIONS/ METHODS_
        - heap_size(A): returns the number of elements in heap
        - build_max_heap(array): produces a max-heap from an arbitrary/ unordered array
            - build_max_heap(A): --> pseudocode
                - for i = floor(n/2) down to 1: --> A[floor(n/2)+1 ... n] are all leaves, so floor(n/2) is the last node that has a child
                    - do max_heapify(A, i)
            - **COMPLEXITY**: O(n). The simple bound is O(nlog(n)) (n/2 calls, each costing at most O(log(n))), but a tighter analysis gives O(n):
                - 1. Observe that max_heapify takes O(1) for nodes that are one level above the leaves, and in general O(l) time for nodes that are l levels above the leaves.
                - 2. There are about n/4 (+/- 1) nodes at level 1, n/8 at level 2, ..., and 1 node at level log(n) (the root node)
                    - the more work a node needs, the fewer such nodes there are
                - 3. Total amount of work in the for loop can be summed as:
                    - n/4 (1 x c) + n/8 (2 x c) + n/16 (3 x c) + ... + 1 (log(n) x c)
                    - set n/4 = 2^k to factor the sum as c x 2^k x [1/2^0 + 2/2^1 + 3/2^2 + ... + (k+1)/2^k]
                    - <img src = "screenshots/img_3.png" width = 400>
                    - the expression in the brackets is bounded by a constant (the infinite series of terms (i+1)/2^i converges to 4), and c is also a constant, so the total is O(2^k) = O(n)
        - max_heapify(A, i): correct a single violation of the heap property in a subtree's root.
            - A = array, i = index
            - takes a subtree whose only max-heap violation is at its root i and fixes that violation
            - called repeatedly, bottom-up, at different levels in order to build a max-heap
            - max_heapify(A, i): --> pseudocode
                - l = left(i)
                - r = right(i)
                - if (l <= heap_size(A) and A[l] > A[i])
                    - then largest = l, else largest = i
                - if (r <= heap_size(A) and A[r] > A[largest])
                    - then largest = r
                - if largest != i
                    - then exchange A[i] and A[largest]
                    - max_heapify(A, largest)
            - KEY ASSUMPTION:
                - Assume that the trees rooted at left(i) and right(i) are max-heaps. If they are not, max_heapify does not guarantee a max-heap rooted at i.
                - leaves of the tree are trivially max-heaps (they have no children)
            - **COMPLEXITY**: _O(log(n))_
                - the tree is nearly complete, so its height is O(log(n)), and each recursive call moves the violation down one level
                - it is not O(n) because a single call to max_heapify follows only one root-to-leaf path
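
As a small check of the index arithmetic and the max-heap property described above, the sketch below uses the example array from these notes. To keep the 1-based convention in Python, it assumes an unused slot at index 0; the helper `is_max_heap` is a hypothetical name introduced only for this illustration.

```python
def parent(i):
    return i // 2  # floor(i / 2)


def left(i):
    return 2 * i


def right(i):
    return 2 * i + 1


def is_max_heap(A, n):
    """Check the max-heap property on A[1..n] (A[0] is an unused placeholder)."""
    # Every non-root node must have a key no larger than its parent's key.
    return all(A[parent(i)] >= A[i] for i in range(2, n + 1))


keys = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
A = [None] + keys  # shift so that the root sits at index 1
n = len(keys)

print(A[1])                     # 16 (root)
print(A[left(2)], A[right(2)])  # 8 7 (children of node 2, whose key is 14)
print(A[parent(10)])            # 7 (parent of node 10, whose key is 1)
print(is_max_heap(A, n))        # True
```
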
- **Heap Sort from an Array Steps**:
    - 1. Convert A[1...n] to a max-heap
    - 2. Find maximum element A[1]
    - 3. Swap elements A[n] with A[1]
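        - now the maximum element is at the end of the array, i.e. in its final sorted position
        - the swap may also create a violation at the root, which is exactly the case max_heapify can fix
    - 4. Discard node n from heap by decrementing heap-size
    - 5. New root may violate max-heap property, but its children are still roots of max-heaps (the heap was valid before the swap), so we can run max_heapify(A, 1)
    - 6. Repeat from step 2 until the heap is empty; n iterations at O(log(n)) each give O(nlog(n)) overall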
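
The steps above can be sketched as an in-place sort that tracks the heap size explicitly instead of removing elements from the list. This is one possible variant under stated assumptions: it uses 0-based indices (children of `i` at `2i + 1` and `2i + 2`), an iterative `max_heapify`, and an explicit `heap_size` parameter; the function name `heap_sort_in_place` is hypothetical.

```python
def max_heapify(A, i, heap_size):
    """Sift A[i] down within A[0:heap_size], assuming both subtrees are max-heaps."""
    while True:
        l, r = 2 * i + 1, 2 * i + 2
        largest = i
        if l < heap_size and A[l] > A[largest]:
            largest = l
        if r < heap_size and A[r] > A[largest]:
            largest = r
        if largest == i:
            return  # no violation left at this node
        A[i], A[largest] = A[largest], A[i]
        i = largest  # continue in the subtree that received the smaller key


def heap_sort_in_place(A):
    n = len(A)
    # Step 1: build a max-heap bottom-up, starting at the last internal node.
    for i in range(n // 2 - 1, -1, -1):
        max_heapify(A, i, n)
    # Steps 2-5, repeated: swap the max to the end, shrink the heap, repair the root.
    for heap_size in range(n, 1, -1):
        A[0], A[heap_size - 1] = A[heap_size - 1], A[0]
        max_heapify(A, 0, heap_size - 1)
    return A


data = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
print(heap_sort_in_place(data))  # [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
```

Because each maximum is swapped to the end of the shrinking heap, the array ends up in increasing order without any extra list.
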

In [3]:
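# 0-based indexing (Python lists), unlike the 1-based notes above:
# the children of index i are 2i + 1 and 2i + 2.
def left(i):
    return 2 * i + 1


def right(i):
    return 2 * i + 2


def max_heapify(A, i):
    """Fix a single max-heap violation at index i, assuming both subtrees are max-heaps."""
    l = left(i)
    r = right(i)

    # Pick the largest among A[i] and whichever of its children exist.
    if l < len(A) and A[l] > A[i]:
        largest = l
    else:
        largest = i
    if r < len(A) and A[r] > A[largest]:
        largest = r
    # Swap downward and keep fixing in the affected subtree.
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest)


def build_max_heap(A):
    # Indices len(A)//2 ... len(A)-1 are leaves, so start at the last internal node.
    for i in range(len(A) // 2 - 1, -1, -1):
        max_heapify(A, i)


def heap_sort(A):
    """Return the keys of A in decreasing order. Note: A is emptied in the process."""
    build_max_heap(A)
    sorted_list = []
    for _ in range(len(A)):
        # Move the max to the end, remove it, then repair the root.
        A[-1], A[0] = A[0], A[-1]
        sorted_list.append(A.pop())
        max_heapify(A, 0)
    return sorted_list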
    

A = [7, 16, 14, 2, 19, 8, 11, 3, 1, 12, 6, 15, 9, 5, 4, 18, 20, 13, 10, 17]
print(heap_sort(A))

[20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
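
The O(n) bound for build_max_heap from the notes can also be checked empirically. The sketch below counts swaps as a rough proxy for the work done; the function name `build_max_heap_counting`, the input sizes, and the random inputs are assumptions made only for this illustration. A node at height h can be swapped down at most h times, so the total number of swaps should stay below n.

```python
import random


def build_max_heap_counting(A):
    """Build a max-heap in place (0-based indices) and return the number of swaps."""
    n = len(A)
    swaps = 0
    for i in range(n // 2 - 1, -1, -1):
        j = i
        while True:
            l, r, largest = 2 * j + 1, 2 * j + 2, j
            if l < n and A[l] > A[largest]:
                largest = l
            if r < n and A[r] > A[largest]:
                largest = r
            if largest == j:
                break
            A[j], A[largest] = A[largest], A[j]
            swaps += 1
            j = largest
    return swaps


random.seed(0)
results = []
for n in [10, 100, 1_000, 10_000]:
    A = [random.random() for _ in range(n)]
    swaps = build_max_heap_counting(A)
    results.append((n, swaps))
    # The swaps-per-element ratio stays below 1 as n grows.
    print(n, swaps, swaps / n)
```
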
