# EC2202 Heaps

**Disclaimer.**
These code examples are based on:
1. [KAIST CS206 (Professor Otfried Cheong)](https://otfried.org/courses/cs206/)
2. [LeetCode](https://leetcode.com/)
3. [GeeksForGeeks](https://practice.geeksforgeeks.org/)
4. Coding Interviews

In [None]:
import doctest

## Implementing Priority Queues

### Simple Python List (Slow)

In [None]:
# Priority Queue with Python list

class PriorityQueue():
  def __init__(self):
    self._data = []
    
  def insert(self, x):
    self._data.append(x)
    
  def findmin(self):
    if len(self._data) == 0:
      raise ValueError("empty priority queue")
    return min(self._data)  # O(N)

  # 'ppp' exercise
  # find the min. value and its position,
  # then delete the min. value
  def deletemin(self):
    if len(self._data) == 0:
      raise ValueError("empty priority queue")

    # sol. #1: two passes over the list (min, then index), still O(N)
    # min_val = min(self._data)
    # min_idx = self._data.index(min_val)

    # sol. #2: a single pass that tracks the minimum and its position
    i = 0
    x = self._data[0]
    for j in range(1, len(self._data)):
      if self._data[j] < x:
        x = self._data[j]
        i = j
    self._data.pop(i)  # pop(i) shifts the later elements, so still O(N)
    return x

### Sorted Python List

In [None]:
# Priority Queue with sorted Python list
# (last element in the list is smallest)

class PriorityQueue():
  def __init__(self):
    self._data = []
    
  def insert(self, x):
    i = 0
    while i < len(self._data) and x < self._data[i]:
      i += 1
    self._data.insert(i, x)
    
  def findmin(self):
    if len(self._data) == 0:
      raise ValueError("empty priority queue")
    return self._data[-1]

  def deletemin(self):
    if len(self._data) == 0:
      raise ValueError("empty priority queue")
    return self._data.pop()

### Binary Heap

Of course a complete binary tree can be implemented as a linked data structure, with Node objects that store references to their parent and children. 

However, because the tree is complete, we can actually store it very compactly in an array or Python list. To do so, simply number the nodes of the tree from top to bottom, left to right, starting with one for the root, so that the nodes are numbered from 1 to n.

We observe that the left child of node i has index 2i and the right child has index 2i + 1. The parent of node i has index ⌊i/2⌋. We simply store node i in slot i of an array, and we can move from a node to its parent or its children by doing simple arithmetic on the index. (We start numbering from 1 for convenience, since the arithmetic is simpler. This wastes slot 0 of the array, which is not a big problem, and it could be avoided by changing the numbering scheme slightly.)
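
The index arithmetic can be illustrated with a few small helpers. This is only a sketch: the function names are hypothetical (the course code inlines the arithmetic), and the 0-based variant is one assumed way of changing the numbering scheme so that slot 0 is not wasted.

```python
# Hypothetical helper names, used only for illustration.

def parent(i):
    """Index of the parent of node i (1-based numbering, i >= 2)."""
    return i // 2

def left_child(i):
    """Index of the left child of node i (1-based numbering)."""
    return 2 * i

def right_child(i):
    """Index of the right child of node i (1-based numbering)."""
    return 2 * i + 1

# Assumed 0-based variant: the root lives in slot 0.
def parent0(i):
    return (i - 1) // 2

def children0(i):
    return 2 * i + 1, 2 * i + 2

# Node 5 sits below node 2, and node 2 has children 4 and 5.
print(parent(5), left_child(2), right_child(2))  # 2 4 5
print(parent0(4), children0(1))                  # 1 (3, 4)
```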

In [None]:
# Priority Queue with binary heap
#        A(1)
#       /    \
#     B(2)   C(3)
#    /   \    /
#  D(4) E(5) F(6)

# if a node has index p,
# then its left child has index 2 * p
# and its right child has index 2 * p + 1

class PriorityQueue():
  DEFAULT_CAPACITY = 100

  def __init__(self):
    self._data = [ None ] * PriorityQueue.DEFAULT_CAPACITY
    self._size = 0
      
  def __len__(self):
    return self._size

  def findmin(self):
    if self._size == 0:
      raise ValueError("empty priority queue")
    return self._data[1]  # O(1)

  def insert(self, x):
    if self._size + 1 == len(self._data):
      # double size of the array storing the data
      self._data.extend( [ None ] * len(self._data) )
    self._size += 1
    hole = self._size
    
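    # bubble up: stop at the root (hole == 1), which has no parent;
    # without this check the loop would compare x against the unused slot 0
    while hole > 1 and x < self._data[hole // 2]:
      # move the parent down into the hole
      self._data[hole] = self._data[hole // 2]
      hole //= 2  # inspect the next parent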
    self._data[hole] = x
    
  def deletemin(self):
    min_item = self.findmin()         # raises error if empty
    self._data[1] = self._data[self._size]
    self._size -= 1
    self._bubble_down(1)
    return min_item

  # 'ppp' exercise
  def _bubble_down(self, i):
    value = self._data[i]
    hole = i
    child = self._smaller_child(hole, value)
    while child != 0:
      self._data[hole] = self._data[child]
      hole = child
      child = self._smaller_child(hole, value)
    self._data[hole] = value

  # 'ppp' exercise
  # Is one child smaller than element's value?
  # Returns the index of the smaller child, or 0 if no child is smaller
  def _smaller_child(self, index, value):
    child = 2 * index
    if child <= self._size:
      if child != self._size and self._data[child + 1] < self._data[child]:
        child += 1
      if self._data[child] < value:
        return child
    return 0
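
One way to sanity-check a heap like the one above is to verify the heap-order property directly on the stored array. The sketch below assumes the same layout (slot 0 unused, items in slots 1 to size). `is_min_heap` is a hypothetical helper and not part of the course code.

```python
def is_min_heap(data, size):
    """Return True if data[1..size] satisfies the min-heap order property."""
    # Every node other than the root must be no smaller than its parent.
    for i in range(2, size + 1):
        if data[i] < data[i // 2]:
            return False
    return True

print(is_min_heap([None, 1, 3, 2, 7, 4], 5))  # True
print(is_min_heap([None, 5, 3, 2], 3))        # False
```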

## Application of Heap

### Default Heap from Python

In [None]:
import heapq  # min heap

heap = []
items = [4, 1, 7, 9, 3]
for item in items:
  heapq.heappush(heap, item)

min_val = heapq.heappop(heap)  # removes and returns the smallest item, heap[0]

rand_list = [4, 1, 7, 9, 3]
heapq.heapify(rand_list)  # rearrange the list into heap order, in place
print(rand_list)
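
Since `heapq` is a min heap, a max heap has to be simulated. A common workaround, sketched here under the assumption that the items are numbers, is to store negated values so that the largest item becomes the smallest key.

```python
import heapq

items = [4, 1, 7, 9, 3]

# Store negated values so the largest item becomes the smallest key.
max_heap = [-item for item in items]
heapq.heapify(max_heap)

largest = -heapq.heappop(max_heap)
print(largest)       # 9
print(-max_heap[0])  # 7, the next largest, read without removing it
```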

### K-th largest element

**Approach 1**
1. Construct a Max Heap.
2. Add all elements into the Max Heap.
3. Delete the top element (using pop() or poll(), for instance).
4. Repeat Step 3 until K elements have been removed; the last one removed is the K-th largest element.
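
Approach 1 can be sketched with `heapq` as follows. Because `heapq` is a min heap, this sketch negates the values to simulate a max heap. The function name `kth_largest_max_heap` is hypothetical, and the code assumes 1 <= k <= len(nums).

```python
import heapq

def kth_largest_max_heap(nums, k):
    """Approach 1: pop from a (simulated) max heap k times."""
    max_heap = [-num for num in nums]  # negate to simulate a max heap
    heapq.heapify(max_heap)
    result = None
    for _ in range(k):
        result = -heapq.heappop(max_heap)
    return result

print(kth_largest_max_heap([4, 1, 7, 3, 8, 5], 2))  # 7
```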

**Approach 2**
1. Construct a Min Heap with size K.
2. Add elements to the Min Heap one by one.
3. When there are K elements in the “Min Heap”, compare the current element with the top element of the Heap:
  - If the current element is not larger than the top element of the Heap, drop it and proceed to the next element.
  - If the current element is larger than the Heap’s top element, delete the Heap’s top element, and add the current element to the “Min Heap”.
4. Repeat Steps 2 and 3 until all elements have been iterated.
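
Approach 2 can be sketched in the same way. The heap never grows beyond K items, so its top is always the K-th largest element seen so far. The function name `kth_largest_min_heap` is hypothetical, and the code assumes 1 <= k <= len(nums).

```python
import heapq

def kth_largest_min_heap(nums, k):
    """Approach 2: keep the k largest items seen so far in a min heap."""
    heap = []
    for num in nums:
        if len(heap) < k:
            heapq.heappush(heap, num)
        elif num > heap[0]:
            # Pop the current top and push num in a single operation.
            heapq.heapreplace(heap, num)
        # Otherwise num is not larger than the top, so it is dropped.
    return heap[0]

print(kth_largest_min_heap([4, 1, 7, 3, 8, 5], 2))  # 7
```

Compared with Approach 1, this uses O(K) extra space instead of O(N), which may matter when K is much smaller than the input.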

In [None]:
import heapq

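# Mirror image of Approach 1: a min heap yields the k-th smallest element.
def kth_smallest(nums, k):
  heap = []
  for num in nums:
    heapq.heappush(heap, num)  # insert
  # alternative: copy nums and call heapq.heapify on the copy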

  kth_min = None
  for _ in range(k):
    kth_min = heapq.heappop(heap)  # deletemin
  return kth_min

print(kth_smallest([4, 1, 7, 3, 8, 5], 3))
print(heapq.nlargest(4, [4, 1, 7, 3, 8, 5]))
print(heapq.nsmallest(3, [4, 1, 7, 3, 8, 5]))

4
[8, 7, 5, 4]
[1, 3, 4]


### Heap Sort

1. We put the objects inside the array into heap order.
2. We then remove them one by one using deletemin.

In [None]:
import heapq

# 'ppp' exercise
def heap_sort(nums):
  '''
  >>> heap_sort([4, 1, 7, 3, 8, 5])
  [1, 3, 4, 5, 7, 8]
  '''
  # step 1: put all items into a min heap
  heap = []
  for num in nums:
    heapq.heappush(heap, num)

  # step 2: remove them one by one; deletemin yields ascending order
  sorted_nums = []
  while heap:
    sorted_nums.append(heapq.heappop(heap))
  return sorted_nums

print(heap_sort([4, 1, 7, 3, 8, 5]))

[1, 3, 4, 5, 7, 8]
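
As a variation on the two steps above, step 1 can also be done with `heapq.heapify`, which puts a whole list into heap order at once. This is a sketch only; the name `heap_sort_heapify` is hypothetical, and the function copies the input so that the caller's list is left untouched.

```python
import heapq

def heap_sort_heapify(nums):
    """Heap sort using heapify for step 1 and repeated heappop for step 2."""
    heap = list(nums)    # copy, so the input list is not modified
    heapq.heapify(heap)  # step 1: put the items into heap order
    # step 2: remove the minimum once per item
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort_heapify([4, 1, 7, 3, 8, 5]))  # [1, 3, 4, 5, 7, 8]
```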
