# Heaps
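A heap is a specialized binary tree: it is complete (every level is full except possibly the last, which is filled from the left), and its keys must satisfy the heap property.

For example, in a max-heap the key at each node is at least as great as the keys stored at its children.

A heap is usually stored in an array: if a parent is at index i, its children are at indices 2i+1 and 2i+2.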
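The index arithmetic can be checked with a small sketch. The helper name `is_max_heap` is hypothetical and only serves to illustrate the parent/child layout; it is not part of any library.

```python
def is_max_heap(keys):
    """Return True if keys, read as an array-backed binary tree, is a max-heap."""
    n = len(keys)
    for i in range(n):
        # Children of the node at index i live at 2i+1 and 2i+2 (if they exist)
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and keys[i] < keys[child]:
                return False
    return True

print(is_max_heap([9, 7, 8, 3, 5]))   # True
print(is_max_heap([9, 7, 8, 3, 10]))  # False: 10 at index 4 exceeds its parent 7 at index 1
```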

**insertion**: O(logn)

**lookup of max element**: O(1)

**deletion of max element**: O(logn)
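The three max-heap operations can be tried out with `heapq`. Since `heappush` and `heappop` maintain a min-heap, this sketch stores negated keys, which is one common workaround and assumes the keys are numbers.

```python
import heapq

max_heap = []
for key in [4, 9, 1, 7]:
    heapq.heappush(max_heap, -key)   # insertion: O(log n)

largest = -max_heap[0]               # lookup of max: O(1)
removed = -heapq.heappop(max_heap)   # deletion of max: O(log n)
next_largest = -max_heap[0]
print(largest, removed, next_largest)  # 9 9 7
```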

## Example Use Case
From a stream of strings we need the k longest ones. In a stream we can't go back to re-read a value.
As we process the input we keep track of the k longest strings seen so far. When a longer string has to be added, the string to evict is the shortest one we hold, so a min-heap is effective.

In [1]:
import heapq
import itertools

def top_k(k, stream):
    # Use a single iterator so the first k entries are not processed twice
    stream = iter(stream)
    # Entries are compared by their lengths
    min_heap = [(len(s), s) for s in itertools.islice(stream, k)]
    heapq.heapify(min_heap)
    for next_string in stream:
        # Push the new string, then evict the shortest of the k + 1 candidates
        heapq.heappushpop(min_heap, (len(next_string), next_string))
    return [p[1] for p in heapq.nsmallest(k, min_heap)]
top_k(3, ["ehll","s","dsaa","ew"])

['ew', 'dsaa', 'ehll']

## Note
- Use a **heap** when all you care about is the largest or smallest elements and you don't need fast lookup, delete or search of arbitrary elements.
- For k-largest use a min-heap, and for k-smallest use a max-heap.

## Heap Operations
- **heapq.heapify(L)**: transforms the elements of L into a heap in place.
- **heapq.nlargest(k, L)** | **heapq.nsmallest(k, L)**: returns the k largest | k smallest elements of L.
- **heapq.heappush(h, e)**: pushes a new element e onto heap h.
- **heapq.heappop(h)**: pops the smallest element from the heap.
- **heapq.heappushpop(h, a)**: pushes a onto the heap, then pops and returns the smallest element.
- **e = h[0]**: returns the smallest element on the heap without popping it.
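
The operations listed above can be exercised together in a short session. The input values are arbitrary and chosen only for illustration.

```python
import heapq

h = [5, 1, 8, 3]
heapq.heapify(h)                     # h is now a min-heap, in place
heapq.heappush(h, 0)
smallest = h[0]                      # peek without popping
popped = heapq.heappop(h)            # removes 0
returned = heapq.heappushpop(h, 9)   # pushes 9, then pops the smallest (1)
print(smallest, popped, returned)    # 0 0 1
print(heapq.nlargest(2, h))          # [9, 8]
print(heapq.nsmallest(2, h))         # [3, 5]
```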

# Merge Sorted Files
Write a program that takes as input a set of sorted sequences and computes the union of these sequences as a sorted sequence.

e.g. if the input is [3,5,7], [0,6] and [0,6,28], then the output should be [0,0,3,5,6,6,7,28]

In [2]:
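def merge_sorted_files(sorted_files):
    # Naive version: ignores the fact that the inputs are already sorted
    res = []
    for sorted_file in sorted_files:
        res += sorted_file
    # heapify only establishes heap order; the list itself is not sorted,
    # so pop the elements one by one to read them in sorted order
    heapq.heapify(res)
    return [heapq.heappop(res) for _ in range(len(res))]
merge_sorted_files([[3,5,7],[0,6],[0,6,28]])

[0, 0, 3, 5, 6, 6, 7, 28]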

### Approach
Create an iterator for each sorted array and push the first element of each onto a min-heap as a pair (element, array_index).
While the min-heap is not empty:
 - pop the smallest entry and note which array it came from.
 - call next() on that array's iterator to get its next element.
 - if there is one, push it onto the heap.

In [3]:
def merge_sorted_files(sorted_files):
    min_heap = []
    sorted_arrays = [iter(x) for x in sorted_files]
    
    # pushing first element from every sorted array on heap
    for i, array_iter in enumerate(sorted_arrays):
        first_element = next(array_iter, None)
        if first_element is not None:
            heapq.heappush(min_heap, (first_element, i))
    
    res = []
    while min_heap:
        smallest_elem, smallest_arr_i = heapq.heappop(min_heap)
        res.append(smallest_elem)
        smallest_arr_iter = sorted_arrays[smallest_arr_i]
        next_smallest_elem = next(smallest_arr_iter, None)
        if next_smallest_elem is not None:
            heapq.heappush(min_heap, (next_smallest_elem, smallest_arr_i))
    return res
merge_sorted_files([[3,5,7],[0,6],[0,6,28]])

[0, 0, 3, 5, 6, 6, 7, 28]

In [4]:
def merge_sorted_arrays(sorted_arrays):
    return list(heapq.merge(*sorted_arrays))

Let k be the number of input sequences and n the total number of elements. The min-heap never holds more than k entries, so extract-min and insert each take O(log k) time.
Hence we can do the merge in O(nlogk) time. Space complexity is O(k) beyond the space needed for the result.
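
The O(k) space bound relies on holding only one pending element per input. As a rough illustration, `heapq.merge` returns a lazy iterator, so it can even merge unbounded sorted inputs; the generators `evens` and `odds` below are made up for this sketch.

```python
import heapq
from itertools import islice

def evens():
    n = 0
    while True:
        yield n
        n += 2

def odds():
    n = 1
    while True:
        yield n
        n += 2

# Only the first six merged values are ever computed
first_six = list(islice(heapq.merge(evens(), odds()), 6))
print(first_six)  # [0, 1, 2, 3, 4, 5]
```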

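# Sort an increasing-decreasing array
The input alternates between increasing and decreasing runs, e.g. [57,131,493,294,221,339,418,452,442,190]. Split it into runs, reverse the decreasing ones so that every run is sorted, and merge the runs with merge_sorted_files.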

In [5]:
def sort_inc_dec_arr(arr):
    A = []
    INC, DEC = range(2)
    ls, curr = [arr[0]], INC
    for i in range(1, len(arr)):
        if arr[i] >= arr[i-1] and curr == DEC:
            A.append(ls[::-1])
            ls, curr = [arr[i]], INC
        elif arr[i] <= arr[i-1] and curr == INC:
            A.append(ls)
            ls, curr = [arr[i]], DEC
        else:
            ls.append(arr[i])
    # Reverse the final run only if it is decreasing
    A.append(ls if curr == INC else ls[::-1])
    return merge_sorted_files(A)
sort_inc_dec_arr([57,131,493,294,221,339,418,452,442,190])

[57, 131, 190, 221, 294, 339, 418, 442, 452, 493]

In [6]:
def sort_inc_dec_arr(arr):
    A = []
    INC, DEC = range(2)
    ls, curr = [arr[0]], INC
    for i in range(1, len(arr)+1):
        if (i == len(arr) # we need not append last list separately after loop
            or (arr[i] >= arr[i-1] and curr == DEC)
            or (arr[i] <= arr[i-1] and curr == INC)
        ):
            A.append(ls if curr == INC else ls[::-1])
            ls = [arr[i]] if i != len(arr) else None
            curr = INC if curr == DEC else DEC
        else:
            ls.append(arr[i])
    return merge_sorted_files(A)
sort_inc_dec_arr([57,131,493,294,221,339,418,452,442,190])

[57, 131, 190, 221, 294, 339, 418, 442, 452, 493]

In [7]:
def sort_inc_dec_arr(arr):
    A = []
    INC, DEC = range(2)
    start_idx, curr = 0, INC
    for i in range(1, len(arr)+1):
        if (i == len(arr)
            or (arr[i] <= arr[i-1] and curr == INC)
            or (arr[i] >= arr[i-1] and curr == DEC)
           ):
            A.append(arr[start_idx:i] if curr == INC 
                     else arr[i-1:start_idx-1:-1])
            start_idx = i
            curr = INC if curr == DEC else DEC
    return merge_sorted_files(A)
sort_inc_dec_arr([57,131,493,294,221,339,418,452,442,190])

[57, 131, 190, 221, 294, 339, 418, 442, 452, 493]

Time complexity: O(nlogk), where k is the number of increasing and decreasing runs.

# Sort an almost sorted array
Write a program which takes as input a very long sequence of numbers and prints the numbers in sorted order. Each element is at most k away from its correctly sorted position.

In [8]:
def sort_approx_sorted_array(A, k):
    min_heap = A[:k+1]
    heapq.heapify(min_heap)
    res = []
    for i in range(k+1, len(A)):
        res.append(heapq.heappushpop(min_heap, A[i]))
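    # Drain the heap in sorted order; the heap's underlying list is not sorted
    while min_heap:
        res.append(heapq.heappop(min_heap))
    return res
sort_approx_sorted_array([3,-1,2,6,4,5,8], 2)

[-1, 2, 3, 4, 5, 6, 8]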

In [9]:
def sort_approx_sorted_array(A, k):
    from itertools import islice
    min_heap, res = [], []
    for i in islice(A, k):
        heapq.heappush(min_heap, i)
    for i in A[k:]:
        res.append(heapq.heappushpop(min_heap, i))
    while min_heap:
        res.append(heapq.heappop(min_heap))
    return res
sort_approx_sorted_array([3,-1,2,6,4,5,8], 2)

[-1, 2, 3, 4, 5, 6, 8]
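
Because the problem speaks of a very long sequence, a generator may fit better than building a result list. The following is a minimal sketch under that assumption; the name `sort_approx_sorted_stream` is hypothetical, and the input can be any iterable rather than a list.

```python
import heapq
from itertools import islice

def sort_approx_sorted_stream(stream, k):
    """Yield the elements of a k-sorted iterable in sorted order, using O(k) extra space."""
    it = iter(stream)
    min_heap = list(islice(it, k))   # first k elements
    heapq.heapify(min_heap)
    for x in it:
        # The heap plus x covers k + 1 candidates, which must include the next output
        yield heapq.heappushpop(min_heap, x)
    while min_heap:
        yield heapq.heappop(min_heap)

print(list(sort_approx_sorted_stream(iter([3, -1, 2, 6, 4, 5, 8]), 2)))
# [-1, 2, 3, 4, 5, 6, 8]
```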

# Compute K closest stars
Consider a coordinate system for the Milky Way in which the Earth is at (0,0,0). Model stars as points, and assume distances are in light years. Compute the k stars closest to Earth.

Time: O(nlogk) Space: O(k)

In [10]:
class Star:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
        
    @property
    def distance(self):
        return ((self.x)**2 + (self.y)**2 + (self.z)**2)**0.5
    def __lt__(self, rhs):
        return self.distance < rhs.distance
    def __repr__(self):
        return f"star({self.x},{self.y},{self.z})"

def k_closest_stars(A, k):
    from itertools import islice
    max_heap = []
    for star in islice(A, k):
        heapq.heappush(max_heap, (-star.distance, star))
    for star in A[k:]:
        heapq.heappushpop(max_heap, (-star.distance, star))
    return [star[1] for star in max_heap]
A = [Star(1,1,1), Star(3,3,3), Star(-4,-4,-4), Star(6,6,6)]
k_closest_stars(A, 2)

[star(3,3,3), star(1,1,1)]
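
When the whole collection fits in memory, `heapq.nsmallest` with a `key` function gives a shorter alternative. This sketch assumes stars are plain (x, y, z) tuples instead of the Star class above, and compares squared distances, which preserves the ordering without taking a square root. The function name `k_closest_points` is hypothetical.

```python
import heapq

def k_closest_points(points, k):
    """Return the k points nearest the origin, closest first."""
    return heapq.nsmallest(k, points, key=lambda p: p[0]**2 + p[1]**2 + p[2]**2)

points = [(1, 1, 1), (3, 3, 3), (-4, -4, -4), (6, 6, 6)]
print(k_closest_points(points, 2))  # [(1, 1, 1), (3, 3, 3)]
```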

# Compute the median of online data
Compute the running median of a sequence of numbers. The sequence is presented in a streaming fashion - we cannot back up to read an earlier value.

We can take advantage of previous computations. The median splits the numbers seen so far into a smaller half and a larger half, so the next median is always the largest element of the smaller half, the smallest element of the larger half, or the average of the two. We keep two heaps: a min-heap for the larger half and a max-heap for the smaller half.

Each new element is first pushed onto the min-heap; the smallest element of the min-heap is then popped and pushed onto the max-heap. Left alone, this would send one element to the max-heap at every step, so we rebalance: if the max-heap is larger than the min-heap, pop the largest element from the max-heap and push it onto the min-heap.

For example, the first element goes into the min-heap, moves to the max-heap, and then moves back because the max-heap is now larger; the median is that element itself. The second element goes into the min-heap, and the smaller of the two moves to the max-heap. Both heaps now have the same size, which means an even number of elements, so the median is the average of the two heap tops. The third element follows the same path: afterwards the min-heap holds one more element than the max-heap, so with an odd number of elements the median is the top of the min-heap. And so on.

In [11]:
def median_online_data(stream):
    min_heap, max_heap = [], []
    result = []
    for x in stream:
        heapq.heappush(max_heap, -heapq.heappushpop(min_heap, x))
        if len(max_heap) > len(min_heap):
            heapq.heappush(min_heap, -heapq.heappop(max_heap))
        result.append(0.5 * (min_heap[0] + (-max_heap[0]))
                      if len(min_heap) == len(max_heap)
                      else min_heap[0]
                     )
    return result

In [18]:
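def online_median(A):
    min_heap, max_heap = [], []
    ans = []
    for x in A:
        # Route x through the max-heap (smaller half, stored negated) so that
        # every element of max_heap stays <= every element of min_heap
        heapq.heappush(min_heap, -heapq.heappushpop(max_heap, -x))
        if len(min_heap) > len(max_heap) + 1:
            heapq.heappush(max_heap, -heapq.heappop(min_heap))
        if len(min_heap) == len(max_heap):
            ans.append((min_heap[0] + (-max_heap[0])) / 2)
        else:
            ans.append(min_heap[0])
    return ans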
online_median([1,0,3,5,2,0,1])

[1, 0.5, 1, 2.0, 2, 1.5, 1]
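
A quick way to gain confidence in a two-heap running median is to compare it with a brute-force computation on random data. The sketch below restates the two-heap idea under the hypothetical name `running_median` so that the block runs on its own, and checks it against `statistics.median` applied to every prefix.

```python
import heapq
import random
import statistics

def running_median(stream):
    """Two-heap running median, restated here so this block is self-contained."""
    min_heap, max_heap = [], []   # larger half, smaller half (stored negated)
    result = []
    for x in stream:
        heapq.heappush(max_heap, -heapq.heappushpop(min_heap, x))
        if len(max_heap) > len(min_heap):
            heapq.heappush(min_heap, -heapq.heappop(max_heap))
        if len(min_heap) == len(max_heap):
            result.append((min_heap[0] - max_heap[0]) / 2)
        else:
            result.append(min_heap[0])
    return result

random.seed(0)
data = [random.randint(-50, 50) for _ in range(200)]
# Brute force: recompute the median of every prefix from scratch
expected = [statistics.median(data[:i + 1]) for i in range(len(data))]
assert running_median(data) == expected
print(running_median([5, 1, 0]))  # [5, 3.0, 1]
```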