# Heap Data Structure
In this notebook we will look at the heap data structure and come up with a simple implementation in Python.

We have seen queues and stacks before, which support FIFO (First In First Out) and LIFO (Last In First Out) ordering of their elements, respectively. They are used in BFS and DFS traversal of graphs and trees, respectively. We will now look at another special-purpose data structure, the heap, which has its own typical use cases.

When choosing a data structure, it is important to think about which operations we will perform frequently. For example, in Dijkstra's algorithm, one part goes through all vertices and edges, which costs $\theta(m + n)$. However, finding the next vertex with the lowest Dijkstra score requires $\theta(n)$, and thus the entire algorithm has complexity $\theta((m + n) \cdot n)$, which is quadratic. Imagine we had a way to pick this next vertex in $\theta(\log(n))$: the algorithm's complexity would then be $\theta((m + n) \cdot \log(n))$, which is far faster than quadratic. Because choosing the next vertex is a frequent operation in Dijkstra's algorithm, making it faster makes the entire algorithm faster.

With this in mind, let's define the heap data structure.
***
A heap lets us maintain the minimum (or maximum) value of an evolving set of objects.

The key here is the word evolving. Finding the minimum or maximum of a fixed set of values can be done in linear time. However, maintaining the minimum (or maximum) of an evolving stream of objects while supporting the two operations Extract-Min (or Extract-Max) and Insert is not straightforward. We might think of keeping the numbers sorted, or of keeping them in an unordered list, so let's look at the time complexity of the two operations in each case.

- Case 1: Keeping a sorted array

    - Extract-Min: The time complexity of extracting the minimum value from a sorted array is $\theta(1)$.
    - Insert: Sorting the initial list of numbers requires $\theta(n \cdot \log(n))$, and each subsequent insert requires linear time $\theta(n)$, because existing elements must be shifted to make room for the new one.
- Case 2: Keeping an unordered linked list

    - Extract-Min: Scanning the unordered linked list to extract the minimum takes $\theta(n)$.
    - Insert: This is straightforward; we just add the object to the end of the linked list in $\theta(1)$.

As we can see, both options have a linear-time operation for either Insert or Extract-Min. What we need is a data structure that allows both operations to be performed much faster than linear time.
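
The two naive options can be sketched in a few lines of Python. This is only an illustration: the class names `SortedArrayQueue` and `UnsortedListQueue` are hypothetical, a Python list stands in for the linked list of Case 2, and the sorted array is kept in descending order (an assumption made here so that the minimum sits at the end and can be removed in constant time).

```python
class SortedArrayQueue:
    """Case 1 (hypothetical name): items kept sorted in descending order."""

    def __init__(self, items=()):
        self._items = sorted(items, reverse=True)  # initial sort: O(n log n)

    def insert(self, x):
        # Binary search finds the slot in O(log n) ...
        lo, hi = 0, len(self._items)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._items[mid] > x:
                lo = mid + 1
            else:
                hi = mid
        # ... but list.insert shifts the elements after it, so this is O(n).
        self._items.insert(lo, x)

    def extract_min(self):
        return self._items.pop()  # O(1): the minimum is the last element


class UnsortedListQueue:
    """Case 2 (hypothetical name): items kept in arrival order."""

    def __init__(self, items=()):
        self._items = list(items)

    def insert(self, x):
        self._items.append(x)  # O(1) amortized

    def extract_min(self):
        # Linear scan for the index of the minimum: O(n).
        idx = min(range(len(self._items)), key=self._items.__getitem__)
        # Swap it to the end so that removing it does not shift anything.
        self._items[idx], self._items[-1] = self._items[-1], self._items[idx]
        return self._items.pop()


for cls in (SortedArrayQueue, UnsortedListQueue):
    q = cls([4, 9, 2])
    q.insert(1)
    q.insert(7)
    print(cls.__name__, [q.extract_min() for _ in range(5)])
```

Both classes return the elements in the same order; they differ only in which of the two operations pays the linear cost.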

The heap data structure gives us the following running time guarantees:

| Operation | Complexity |
| --- | --- |
| Insert | $\theta(\log(n))$ |
| Extract-Min | $\theta(\log(n))$ |
| Find-Min | $\theta(1)$ |
| Delete | $\theta(\log(n))$ |
| Heapify | $\theta(n)$ |

A naive implementation of Find-Min simply extracts the minimum and inserts it back, which takes $\theta(\log(n))$ time, but we will implement the data structure so that Find-Min runs in constant time.

Similarly, a naive Heapify can simply sort the input in $\theta(n \cdot \log(n))$ time (or perform Insert on all n elements), but we will see how to heapify an unordered array in linear time.
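
To make these operations concrete before building our own heap, here is a short sketch that uses Python's standard `heapq` module as a stand-in. It is not the implementation this notebook develops. `heapq` works on a plain list and has no public function for deleting an arbitrary element, so Delete is not shown.

```python
import heapq

data = [5, 3, 8, 1, 9, 2]

heap = list(data)
heapq.heapify(heap)           # Heapify: rearranges the list in place in O(n)

smallest = heap[0]            # Find-Min: the minimum is always at index 0, O(1)
heapq.heappush(heap, 0)       # Insert: O(log n)
first = heapq.heappop(heap)   # Extract-Min: O(log n), removes the new minimum 0
second = heapq.heappop(heap)  # Extract-Min again, removes 1

print(smallest, first, second)  # 1 0 1
```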

Before we implement this data structure, let's look at a very good use of it. Let's start with the selection sort algorithm, which we implement below.



```python
def selectionSort(arr):
    """Sort arr in place in ascending order using selection sort."""
    print('Before Sorting', arr)
    for i in range(len(arr) - 1):
        # Find the index of the smallest element in arr[i:].
        minidx = i
        for j in range(i + 1, len(arr)):
            if arr[minidx] > arr[j]:
                minidx = j
        # Move that element into position i if it is not already there.
        if minidx != i:
            arr[i], arr[minidx] = arr[minidx], arr[i]
    print('After Sorting', arr)

selectionSort([2, 4, 1, 6, 9, 7, 3, 5, 8])
```

```
Before Sorting [2, 4, 1, 6, 9, 7, 3, 5, 8]
After Sorting [1, 2, 3, 4, 5, 6, 7, 8, 9]
```


As we see above, for each index i selection sort scans all elements after i to find the minimum of the remaining elements, and swaps it into position i if it is smaller than the element already there. The first iteration therefore makes n - 1 comparisons, and each subsequent iteration makes one fewer than the previous one. The total number of comparisons for an array of size n is (n - 1) + (n - 2) + ... + 1 = $\frac{n(n - 1)}{2}$, which is $\theta(n^2)$.
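
The comparison count can be checked empirically. The sketch below is an instrumented copy of the loop in `selectionSort`; the function name `count_selection_sort_comparisons` is hypothetical. With these loop bounds the inner loop runs n - 1 times on the first pass, so the total is $\frac{n(n - 1)}{2}$ regardless of the input order.

```python
def count_selection_sort_comparisons(arr):
    """Selection-sort a copy of arr and return the number of comparisons made."""
    arr = list(arr)
    comparisons = 0
    for i in range(len(arr) - 1):
        minidx = i
        for j in range(i + 1, len(arr)):
            comparisons += 1  # one comparison per inner-loop step
            if arr[minidx] > arr[j]:
                minidx = j
        arr[i], arr[minidx] = arr[minidx], arr[i]
    return comparisons

for n in (1, 2, 9, 100):
    count = count_selection_sort_comparisons(range(n, 0, -1))
    print(n, count, n * (n - 1) // 2)  # the last two columns agree
```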

As we can see, the most frequent operation is finding the minimum starting at a given index. This is a good use for a heap: we first heapify the array in linear time and then extract the minimum element n times at $\theta(\log(n))$ each, giving a total time complexity of $\theta(n \cdot \log(n))$.
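
This heapify-then-extract idea can be sketched with Python's standard `heapq` module, borrowed here because the notebook has not built its own heap yet. The function name `heap_sort` is hypothetical, and the sketch returns a new sorted list instead of sorting in place.

```python
import heapq

def heap_sort(arr):
    """Return the elements of arr in ascending order using a min-heap."""
    heap = list(arr)     # copy so the caller's list is left untouched
    heapq.heapify(heap)  # build the heap in O(n)
    # n Extract-Min operations at O(log n) each: O(n log n) in total.
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort([2, 4, 1, 6, 9, 7, 3, 5, 8]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```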

We also know that no comparison-based sorting algorithm can do better than $\theta(n \cdot \log(n))$ in the worst case. This means that a comparison-based heap with linear-time Heapify cannot perform Extract-Min faster than $\theta(\log(n))$: otherwise the heap-based sort described above would beat the $\theta(n \cdot \log(n))$ bound, which is not possible.
***
Quiz 10.1

The answer is (b), $\theta(n \cdot \log(n))$.
***
One application of heaps is median maintenance. The goal of this problem is to report the median of a stream of numbers as each new number arrives. Finding the median of a static list of numbers is not difficult. However, doing so efficiently for a stream of numbers requires two heaps. Let us write a Python implementation of this problem. Since we haven't implemented heaps ourselves yet, we will use `heapq`, the heap module from the Python standard library.