# Chapter 7: Binary Heap

## Trees in Arrays
The previous lectures stored binary trees using pointer-like structures, in which each node contains references to its children.
If the tree is a `complete binary tree`, there is a useful `array-based` alternative.  

**Definition**. A binary tree is complete if every level, except possibly the last, is completely filled, and all the leaves on the last level are placed as far to the left as possible.

A complete binary tree is one that can be obtained by filling the nodes starting with the root, and then each next level in turn, always from the left, until one runs out of nodes. 
Complete binary trees always have minimal height for their size $n$, namely $\lfloor \log_2 n \rfloor$, and are always perfectly balanced.

- Storing the nodes of a complete binary tree level by level in an array `P`:
    - the root is at index $0$
    - $\text{left}(i) = 2i+1$
    - $\text{right}(i) = 2i+2$
    - $\text{parent}(i) = \lfloor \frac{i-1}{2} \rfloor$
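The index formulas can be checked with a short sketch. The function names mirror `left`, `right`, and `parent` above; the `height` helper is added here for illustration and uses `int.bit_length()` to compute $\lfloor \log_2 n \rfloor$ without floating-point logarithms.

```python
def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

def parent(i):
    # integer floor division gives floor((i - 1) / 2); the root (i = 0) has no parent
    return (i - 1) // 2

def height(n):
    # height of a complete binary tree with n >= 1 nodes: floor(log2 n)
    return n.bit_length() - 1

# level-order array for the complete tree      a
#                                            /   \
#                                           b     c
#                                          / \   /
#                                         d   e f
P = ["a", "b", "c", "d", "e", "f"]
print(P[left(1)], P[right(1)])  # children of "b": d e
print(P[parent(5)])             # parent of "f": c
print(height(len(P)))           # 2
```

Because every node's children are computed from its index, no child pointers need to be stored.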

- Storing an arbitrary binary tree as an array can be inefficient:
    - if the tree is not complete, space must be reserved in the array for every possible node in the tree
    - for a binary search tree, insertion or deletion may involve shifting large portions of the array
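To see how much space an incomplete tree can waste, consider a degenerate tree in which every node hangs off its parent's right child. The hypothetical helper below follows $\text{right}(i) = 2i+2$ from the root: a chain of $n$ nodes needs $2^n - 1$ array slots although only $n$ of them are used.

```python
def right_chain_slots(n):
    # follow right(i) = 2i + 2 from the root for n - 1 steps
    i = 0
    for _ in range(n - 1):
        i = 2 * i + 2
    return i + 1  # number of array slots needed to reach index i

for n in (1, 3, 10):
    print(n, right_chain_slots(n))  # 1 1, then 3 7, then 10 1023
```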




## Heaps

The (binary) heap data structure is an array object that we can view as a nearly complete binary tree.
  - Each node of the tree corresponds to an element of the array.
  - The tree is completely filled on all levels except possibly the lowest, which is filled from the left up to a point.
  - An array `A` that represents a heap is an object with two attributes: `A.length`, which (as usual) gives the number of elements in the array, and `A.size`, which represents how many elements in the heap are stored within array `A`.
  - Given an index `i`, we can easily calculate its parent, left child, and right child from the way the nodes are stored.

Max-heap:
  - `max-heap property`: for every node $i$ except the root, $A[\text{parent}(i)] \ge A[i]$
  - a `max-heap` is an array satisfying the max-heap property at all nodes
  - a **min-heap** is symmetric: for every node $i$ except the root, $A[\text{parent}(i)] \le A[i]$
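Both properties can be checked directly from the definition with a one-line scan over the non-root nodes. This is a sketch; `is_max_heap` and `is_min_heap` are hypothetical helper names, and arrays are 0-indexed as above.

```python
def is_max_heap(A):
    # max-heap property: A[parent(i)] >= A[i] for every node i except the root
    return all(A[(i - 1) // 2] >= A[i] for i in range(1, len(A)))

def is_min_heap(A):
    # symmetric min-heap property: A[parent(i)] <= A[i]
    return all(A[(i - 1) // 2] <= A[i] for i in range(1, len(A)))

print(is_max_heap([16, 14, 10, 8, 7, 9, 3, 2, 4, 1]))  # True
print(is_max_heap([16, 4, 10, 14, 7]))                 # False: parent 4 < child 14
print(is_min_heap([1, 2, 3, 4]))                       # True
```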



### Heap Operations

```python
class P:
    """A max-heap stored in the array `A`.

    `length` is the number of slots in `A`; `size` is how many of them
    currently hold heap elements (A[0] .. A[size - 1]).
    """

    def __init__(self, n=0):
        self.length = n
        self.size = 0
        self.A = [0] * n

    def left(self, i):
        return 2 * i + 1

    def right(self, i):
        return 2 * i + 2

    def parent(self, i):
        # floor((i - 1) / 2); only meaningful for i >= 1 (the root has no parent)
        return (i - 1) // 2

    # The height of the whole tree is floor(log2(size)); node i sits at depth floor(log2(i + 1)).
    # Because the last level is filled from the left, the height of node i equals
    # the length of the leftmost downward path starting at i.
    def getHeight(self, i):
        height = 0
        while self.left(i) < self.size:
            i = self.left(i)
            height += 1
        return height

    # Maintain the max-heap property at node i of heap P, top to bottom, in O(log n) time:
    #   - find the largest of A[i] and its children that lie inside the heap
    #   - if a child j holds a larger key, swap A[i] with A[j] and recursively heapifyDown(j)
    def heapifyDown(self, i):
        left, right = self.left(i), self.right(i)
        largest = i
        if left < self.size and self.A[left] > self.A[largest]:
            largest = left
        if right < self.size and self.A[right] > self.A[largest]:
            largest = right
        if largest != i:
            self.A[i], self.A[largest] = self.A[largest], self.A[i]
            self.heapifyDown(largest)

    # Maintain the max-heap property at node i of heap P, bottom to top, in O(log n) time:
    #   - if i is not the root and A[i] > A[parent(i)], swap them and recursively heapifyUp(parent(i))
    def heapifyUp(self, i):
        parent = self.parent(i)
        if i > 0 and self.A[i] > self.A[parent]:
            self.A[i], self.A[parent] = self.A[parent], self.A[i]
            self.heapifyUp(parent)

    # Insert an item with key k in O(log n) time:
    #   - place k right after the last heap element, at index size (grow A if it is full)
    #   - heapifyUp from that index
    def insert(self, key):
        if self.size == self.length:
            self.A.append(key)
            self.length += 1
        else:
            self.A[self.size] = key
        self.size += 1
        self.heapifyUp(self.size - 1)

    # Return the max of a max-heap in O(1) time: it is at the root
    def getMax(self):
        if self.size == 0:
            raise IndexError("getMax from an empty heap")
        return self.A[0]

    # Remove and return the last heap element in O(1) time.
    # The key stays in slot A[size] but is no longer part of the heap.
    def pop(self):
        self.size -= 1
        return self.A[self.size]

    # deleteMax:
    #   We can only cheaply delete the last element of an array, but the max of a max-heap is at the root.
    #   Removing the first element of a dynamic array takes O(n) time; swapping gives O(log n):
    #     - swap the max at the root (i = 0) with the last heap item (i = size - 1), then remove the last item
    #     - this decreases the heap size by 1
    #     - heapifyDown(0) after swapping to restore the max-heap property
    #     - return the deleted max
    def deleteMax(self):
        if self.size == 0:
            raise IndexError("deleteMax from an empty heap")
        self.A[0], self.A[self.size - 1] = self.A[self.size - 1], self.A[0]
        max_key = self.pop()
        self.heapifyDown(0)
        return max_key
```

### Heap Sort

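HEAPSORT: `heapsort(A)`:
- an in-place sorting algorithm that runs in $O(n \log n)$ time. Note that `merge_sort` is also $O(n \log n)$ but requires $O(n)$ additional space.
- algorithm
  - build a max-heap from `A` by calling `heapifyDown(i)` for `i` from $\lfloor n/2 \rfloor - 1$ down to $0$ (the leaves are already heaps)
  - repeat $n - 1$ times: `deleteMax(A)`, which swaps the current max into the slot just past the shrinking heap
  - afterwards, `A` holds its keys in non-decreasing order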
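A minimal standalone sketch of the steps above on a plain Python list. `sift_down` is a hypothetical local helper: the iterative form of `heapifyDown`, restricted to the first `size` slots so that already-sorted keys at the end are left alone. Building the heap bottom-up costs $O(n)$ in total, and each of the $n - 1$ extractions costs $O(\log n)$.

```python
def heapsort(A):
    """Sort list A in place in non-decreasing order using a max-heap."""
    n = len(A)

    def sift_down(i, size):
        # iterative heapifyDown: only slots 0 .. size - 1 belong to the heap
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            largest = i
            if left < size and A[left] > A[largest]:
                largest = left
            if right < size and A[right] > A[largest]:
                largest = right
            if largest == i:
                return
            A[i], A[largest] = A[largest], A[i]
            i = largest

    # 1. build a max-heap bottom-up, starting from the last internal node
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)
    # 2. deleteMax n - 1 times: move the root just past the shrinking heap
    for end in range(n - 1, 0, -1):
        A[0], A[end] = A[end], A[0]
        sift_down(0, end)
    return A

print(heapsort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```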
   

     
## Priority Queue

- One of the most popular applications of a heap is as an efficient `priority queue`.
    - A priority queue is a data structure for maintaining a set `S` of elements, each with an associated value called a key.
    - When we use a heap to implement a priority queue, we often need to store a handle to the corresponding application object in each heap element.
        The exact makeup of the handle (such as a pointer or an integer) depends on the application.
        Similarly, we need to store a handle to the corresponding heap element in each application object.
    - Operations
        - `get_max(A)` -> get the max of a max-heap `A` in $O(1)$
            - return $A[0]$
        - `delete_max(A)` -> same as `deleteMax` in a max-heap, $O(\log n)$
        - `insert(A, k)` -> insert an item with key $k$ into the heap `A` in $O(\log n)$
            - append the item with key $k$ to the end of the heap: $A[n] = k$, where $n$ is the current heap size (0-indexed)
            - `max_heapify_up(A, n)`
        - `delete_root()` -> to use a binary heap as a priority queue, we regularly need to delete the root, i.e., remove the node with the highest priority.
            - this is the same as `delete_max()`
        - `delete(A, k)` -> delete an item with key $k$ from the heap `A` in $O(\log n)$, given a handle to its index $i$ (without a handle, finding $k$ takes $O(n)$)
            - swap $A[i]$ with the last heap element and decrease the heap size by 1
            - `max_heapify_up(A, i)` if the moved key is larger than its parent, otherwise `max_heapify_down(A, i)`
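In Python, a practical priority queue is usually built on the standard `heapq` module rather than a hand-written heap. Its classic functions (`heappush`, `heappop`) maintain a min-heap, so this sketch stores negated priorities to get max-heap behaviour. The class name `MaxPriorityQueue` and its method names are illustrative assumptions; each `(-priority, counter, item)` entry plays the role of a heap element holding a handle to an application object, and the counter breaks ties so the items themselves are never compared.

```python
import heapq
import itertools


class MaxPriorityQueue:
    """Illustrative max-priority queue on top of heapq's min-heap."""

    def __init__(self):
        self._heap = []                    # entries: (-priority, insertion number, item)
        self._counter = itertools.count()  # tie-breaker: equal priorities leave in insertion order

    def insert(self, item, priority):
        # O(log n): negate the priority so the largest one sits at the min-heap root
        heapq.heappush(self._heap, (-priority, next(self._counter), item))

    def get_max(self):
        # O(1): the root holds the highest priority
        neg_priority, _, item = self._heap[0]
        return item, -neg_priority

    def delete_max(self):
        # O(log n): pop the root and restore the heap property
        neg_priority, _, item = heapq.heappop(self._heap)
        return item, -neg_priority

    def __len__(self):
        return len(self._heap)


pq = MaxPriorityQueue()
pq.insert("write report", 2)
pq.insert("fix bug", 5)
pq.insert("reply email", 1)
print(pq.get_max())  # ('fix bug', 5)
while pq:
    print(pq.delete_max())  # ('fix bug', 5), then ('write report', 2), then ('reply email', 1)
```

The arbitrary `delete(A, k)` operation is not covered here: `heapq` exposes no public delete-at-index function, so it would need the swap-and-reheapify steps described above on a hand-written heap.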
        

## Problems

### 1. Selecting K Elements from an Array to Maximize a Goal

Questions of this type typically share a general statement:
- each element has two attributes, e.g., given as two arrays
- choose `k` elements so that some goal is maximized/minimized; this is usually a subsequence problem, not a subarray/substring problem

Because the selection order does not matter, we can sort the elements by one attribute: while scanning in sorted order, the current element fixes the max/min of that attribute, and a `k`-length heap keeps the best choices for the other attribute.


LeetCode problems:
- [2542: maximum subsequence score](https://leetcode.com/problems/maximum-subsequence-score)
- [0502: IPO](https://leetcode.com/problems/ipo/)
- [0857: minimum cost to hire k workers](https://leetcode.com/problems/minimum-cost-to-hire-k-workers/)
- [2813. Maximum Elegance of a K-Length Subsequence](https://leetcode.com/problems/maximum-elegance-of-a-k-length-subsequence)


The following template may be used:

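```python
# Template: `sorted_array`, `k`, and the `...` steps are problem-specific placeholders
import heapq

sorted_array = ...  # elements sorted by the attribute that fixes the current max/min
heap = []           # k-length heap keeping the best candidates for the other attribute
state = 0           # aggregate over the elements in the heap, e.g. their sum

for a, b in sorted_array:
    # update the current k-window state, then push to the heap
    ...

    # get the current k-window score
    if len(heap) == k:
        # calculate the current max/min value, then pop the heap
        ...
```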

**2542: maximum subsequence score**


You are given two 0-indexed integer arrays nums1 and nums2 of equal length n and a positive integer k. You must choose a subsequence of indices from nums1 of length k.

For chosen indices i0, i1, ..., ik - 1, your score is defined as:

The sum of the selected elements from nums1 multiplied with the minimum of the selected elements from nums2.
It can be defined simply as: `(nums1[i0] + nums1[i1] + ... + nums1[ik - 1]) * min(nums2[i0], nums2[i1], ..., nums2[ik - 1])`.
Return the maximum possible score.

A subsequence of indices of an array is a set that can be derived from the set {0, 1, ..., n-1} by deleting some or no elements.

```
Example 1:

Input: nums1 = [1,3,3,2], nums2 = [2,1,3,4], k = 3
Output: 12
Explanation: 
The four possible subsequence scores are:
- We choose the indices 0, 1, and 2 with score = (1+3+3) * min(2,1,3) = 7.
- We choose the indices 0, 1, and 3 with score = (1+3+2) * min(2,1,4) = 6. 
- We choose the indices 0, 2, and 3 with score = (1+3+2) * min(2,3,4) = 12. 
- We choose the indices 1, 2, and 3 with score = (3+3+2) * min(1,3,4) = 8.
Therefore, we return the max score, which is 12.
Example 2:

Input: nums1 = [4,2,3,1,1], nums2 = [7,5,10,9,6], k = 1
Output: 30
Explanation: 
Choosing index 2 is optimal: nums1[2] * nums2[2] = 3 * 10 = 30 is the maximum possible score.
```

**Algorithm:**
- sort both arrays together by `nums2` in non-increasing order
- traverse the sorted pairs:
    - because of the sort order, the current `nums2` value is the minimum among all pairs seen so far (including the current one)
    - to maximize the score, we need to maximize the sum of the `k` selected elements of `nums1`, which can be maintained with a `k-length` min-heap that keeps the `k` largest `nums1` values seen so far

```python
import heapq

def maxScore(nums1, nums2, k):
    res = 0
    prefix = 0   # the current sum of the top-k elements in nums1
    min_hp = []  # a min-heap that stores the top-k elements in nums1

    for b, a in sorted(zip(nums2, nums1), reverse=True):
        prefix += a
        heapq.heappush(min_hp, a)

        if len(min_hp) == k:
            res = max(res, prefix * b)
            prefix -= heapq.heappop(min_hp)

    return res

# tests
nums1 = [1, 3, 3, 2]
nums2 = [2, 1, 3, 4]
k = 3
print(maxScore(nums1, nums2, k)) # expects 12

nums1 = [4, 2, 3, 1, 1]
nums2 = [7, 5, 10, 9, 6]
k = 1
print(maxScore(nums1, nums2, k)) # expects 30
```

```
12
30
```
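To sanity-check the heap solution on small inputs, a brute-force reference can try every `k`-subset of indices directly from the score definition. This is a sketch for testing only (it runs in $O\left(\binom{n}{k} \cdot k\right)$ time); `max_score_brute_force` is a hypothetical helper name.

```python
from itertools import combinations

def max_score_brute_force(nums1, nums2, k):
    # score every k-subset of indices: sum of nums1 picks times min of nums2 picks
    return max(
        sum(nums1[i] for i in idx) * min(nums2[i] for i in idx)
        for idx in combinations(range(len(nums1)), k)
    )

print(max_score_brute_force([1, 3, 3, 2], [2, 1, 3, 4], 3))         # 12
print(max_score_brute_force([4, 2, 3, 1, 1], [7, 5, 10, 9, 6], 1))  # 30
```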


**502: IPO**

You are given n projects where the ith project has a pure profit profits[i] and a minimum capital of capital[i] is needed to start it.

Initially, you have w capital. When you finish a project, you will obtain its pure profit and the profit will be added to your total capital.

Pick a list of at most k distinct projects from given projects to maximize your final capital, and return the final maximized capital.

```
Example 1:

Input: k = 2, w = 0, profits = [1,2,3], capital = [0,1,1]
Output: 4
Explanation: Since your initial capital is 0, you can only start the project indexed 0.
After finishing it you will obtain profit 1 and your capital becomes 1.
With capital 1, you can either start the project indexed 1 or the project indexed 2.
Since you can choose at most 2 projects, you need to finish the project indexed 2 to get the maximum capital.
Therefore, output the final maximized capital, which is 0 + 1 + 3 = 4.
Example 2:

Input: k = 3, w = 0, profits = [1,2,3], capital = [0,1,2]
Output: 6
```


**Algorithm**

We cannot pick arbitrary `k` projects, because each project requires a minimum capital to start.

```
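- sort the projects by `capital` in increasing order
- greedily choose the most profitable project that the current capital allows
    - for i in range(k)
        - push every newly affordable project (capital <= w) onto a max-heap of profits
        - if the heap is empty, stop: no remaining project is affordable
        - pop the most profitable project and add its profit to w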
```

TODO: can we maintain a k-length heap here?
- It seems not, because each choice depends on the previous choices (the profit they bring determines which projects become affordable).

```python
def IPO(k, w, profits, capital):
    # TODO: implement the greedy algorithm above
    pass
```
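A minimal sketch of the greedy algorithm described above, under the hypothetical name `find_maximized_capital`. Projects are sorted by required capital, a pointer `i` marks how many have become affordable, and `heapq` (a min-heap) holds negated profits so that the most profitable affordable project is always at the root.

```python
import heapq

def find_maximized_capital(k, w, profits, capital):
    # projects sorted by the capital they require, cheapest first
    projects = sorted(zip(capital, profits))
    max_heap = []  # negated profits of projects we can currently afford
    i = 0
    for _ in range(k):
        # push every project that the current capital w now affords
        while i < len(projects) and projects[i][0] <= w:
            heapq.heappush(max_heap, -projects[i][1])
            i += 1
        if not max_heap:  # nothing affordable: capital cannot grow any further
            break
        w -= heapq.heappop(max_heap)  # finish the most profitable project (popped value is negative)
    return w

print(find_maximized_capital(2, 0, [1, 2, 3], [0, 1, 1]))  # 4
print(find_maximized_capital(3, 0, [1, 2, 3], [0, 1, 2]))  # 6
```

Each project is pushed and popped at most once, so the running time is $O(n \log n)$ for the sort plus $O(k \log n)$ for the heap operations.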

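**857: Minimum Cost to Hire K Workers**


**Algorithm**

- Paying a group of `k` workers costs $\left(\sum q\right) \cdot \max(w / q)$ over the group: every worker is paid in proportion to quality and nobody can be paid below their minimum wage, so the group's pay rate is its largest `wage / quality` ratio.
- Sort workers by `wage / quality` ratio in increasing order; while scanning, the current ratio is the largest among all workers seen so far.
- Maintain a `k-length` max-heap of qualities so that the group always consists of the `k` smallest qualities seen so far, which minimizes the cost at the current ratio.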


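```python
import heapq

def mincostToHireWorkers(quality, wage, k):
    # (wage / quality ratio, quality) pairs, sorted by ratio in increasing order
    pay_to_quality = sorted((w / q, q) for w, q in zip(wage, quality))

    min_cost = float('inf')
    sum_quality = 0  # sum of the qualities currently in the heap
    max_heap = []    # heapq is a min-heap, so qualities are stored negated

    for ratio, q in pay_to_quality:
        sum_quality += q
        heapq.heappush(max_heap, -q)

        if len(max_heap) == k:
            # `ratio` is the largest ratio in the group, so it sets the pay rate
            min_cost = min(min_cost, sum_quality * ratio)
            sum_quality += heapq.heappop(max_heap)  # drop the largest quality (popped value is negative)

    return min_cost

# test
quality = [10, 20, 5]
wage = [70, 50, 30]
k = 2
print(mincostToHireWorkers(quality, wage, k)) # expects 105.0

quality = [3, 1, 10, 10, 1]
wage = [4, 8, 2, 2, 7]
k = 3
print(mincostToHireWorkers(quality, wage, k)) # expects about 30.66667 (= 23 * 4/3)
```

```
105.0
30.666666666666664
```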
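As with 2542, a brute-force reference is handy for checking the heap solution on small inputs. This sketch (hypothetical name `min_cost_brute_force`) applies the cost formula above to every `k`-subset of workers; floating-point results should be compared with a tolerance.

```python
from itertools import combinations

def min_cost_brute_force(quality, wage, k):
    # cost of a group = (largest wage/quality ratio in the group) * (sum of its qualities)
    best = float("inf")
    for idx in combinations(range(len(quality)), k):
        rate = max(wage[i] / quality[i] for i in idx)
        best = min(best, rate * sum(quality[i] for i in idx))
    return best

print(min_cost_brute_force([10, 20, 5], [70, 50, 30], 2))  # 105.0
print(min_cost_brute_force([3, 1, 10, 10, 1], [4, 8, 2, 2, 7], 3))
```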
