# Heaps

**Question 10.1**: Merge sorted files

In [None]:
"""Book solution, with review comments.

Time complexity: O(n log k), where n is the total number of elements and
    k is the number of input sequences; each extract-min and insert on
    the heap takes O(log k) time.
Space complexity: O(k) beyond the space needed to write the result array.
"""

import heapq
from typing import List, Tuple

def merge_sorted_arrays(sorted_arrays: List[List[int]]) -> List[int]:
    min_heap: List[Tuple[int, int]] = []
    # Builds one iterator per array in sorted_arrays. Each iterator remembers
    #   its own position, so we never have to concatenate and re-sort the
    #   arrays, which would take more time.
    sorted_arrays_iters = [iter(x) for x in sorted_arrays]
    
    # Puts first element from each iterator in min_heap.
    for i, it in enumerate(sorted_arrays_iters):
        first_element = next(it, None)
        if first_element is not None:
            heapq.heappush(min_heap, (first_element, i))
    # Only the first element of each array is needed at this point: the
    # arrays are already sorted, so each array's first element is its smallest.
    
    # With the heap built, repeat until the heap is empty:
    #   pop the minimum entry and append its value to result
    #   advance the iterator of the array that entry came from
    #   push that array's next element (if any) back onto the heap
    result = []
    while min_heap:
        smallest_entry, smallest_array_i = heapq.heappop(min_heap)
        smallest_array_iter = sorted_arrays_iters[smallest_array_i]
        result.append(smallest_entry)
        next_element = next(smallest_array_iter, None)
        if next_element is not None:
            heapq.heappush(min_heap, (next_element, smallest_array_i))
    return result


# Pythonic solution: heapq.merge() is a function that accepts multiple sorted inputs.
def merge_sorted_arrays_pythonic(sorted_arrays):
    return list(heapq.merge(*sorted_arrays))
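
Since the question is about merging sorted *files*, it may help to see that `heapq.merge()` returns a lazy iterator rather than a list. The sketch below is only an illustration under assumptions: `count_by` is a hypothetical stand-in for a sorted file that is too large to load, modeled here as an endless ascending stream.

```python
import heapq
from itertools import islice
from typing import Iterator


def count_by(start: int, step: int) -> Iterator[int]:
    """Hypothetical stand-in for a sorted file: an endless ascending stream."""
    value = start
    while True:
        yield value
        value += step


# heapq.merge pulls from each input only as needed, so it can merge inputs
# that are too large (here: infinite) to materialize as lists.
merged = heapq.merge(count_by(0, 3), count_by(1, 5))
first_six = list(islice(merged, 6))
print(first_six)  # [0, 1, 3, 6, 6, 9]
```

Wrapping the call in `list(...)`, as `merge_sorted_arrays_pythonic` does, is fine when the merged output fits in memory; for real files, iterating over the result directly keeps memory use proportional to the number of inputs.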

**Question 10.4**: Compute the *k* closest stars

Consider a coordinate system for the Milky Way in which Earth is at (0, 0, 0). Stars are modeled as points, and distances are in light years. The Milky Way consists of approximately 10^12 stars.

How would you compute the k stars which are closest to Earth?

*Hint*: Suppose you know the k closest stars among the first n stars. If the (n+1)th star is to be added to the set of k closest stars, which element in that set should be evicted?

To keep track of k elements while always having quick access to the extreme one, we use a heap. Because we want to keep the k stars closest to Earth, we use a max-heap keyed on distance, so the farthest of the current candidates sits at the top and is the one to evict.
For every new star, we compute its distance, push it onto the heap, and then pop the farthest star, so the heap always holds exactly k stars.

- Time: O(n log k), where n is the total number of stars examined and k is the number of stars kept in the heap
- Space: O(k) 
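
With around 10^12 stars, the input will not fit in memory, so the eviction rule above is worth seeing on a stream. The following is a minimal sketch on plain distances rather than stars; the function name `k_smallest_stream` and the sample values are made up for illustration. It negates each value because `heapq` only provides a min-heap.

```python
import heapq
from typing import Iterable, List


def k_smallest_stream(values: Iterable[float], k: int) -> List[float]:
    """Return the k smallest values seen, using O(k) extra space."""
    # Stores negated values, so max_heap[0] is -(largest value kept).
    max_heap: List[float] = []
    for value in values:
        if len(max_heap) < k:
            heapq.heappush(max_heap, -value)
        else:
            # Push the newcomer, then evict the largest of the k + 1 candidates.
            heapq.heappushpop(max_heap, -value)
    return sorted(-v for v in max_heap)


distances = [9.0, 2.5, 7.0, 1.0, 8.0, 3.0]
closest_three = k_smallest_stream(iter(distances), 3)
print(closest_three)  # [1.0, 2.5, 3.0]
```

Only one value from the stream is held at a time besides the heap, which is where the O(k) space bound comes from.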

In [1]:
import heapq
import math

def k_closest_stars(k: int, stars: list[tuple[int, int, int]]) -> list[tuple[float, int]]:
    # Assumes there are at least k stars.
    max_heap: list[tuple[float, int]] = []
    earth = [0, 0, 0]
    for i in range(k):
        d = math.dist(earth, stars[i]) * -1
        heapq.heappush(max_heap, (d, i))
        
    # We now have a heap of the first k stars, which are the closest k seen so far.
    # Now we check the rest of the stars to see if any of them are closer than the
    # current farthest star in the heap.
    for i in range(k, len(stars)):
        d = math.dist(earth, stars[i]) * -1
        heapq.heappushpop(max_heap, (d, i))
    
    # The heap now holds the k closest stars.
    # Each entry in max_heap is (-distance from Earth, index i in stars).
    return max_heap

The book solution uses the same max-heap idea, but it models each star as a `Star` class that exposes its distance and defines `__lt__`, so heap entries carry the star itself rather than an index into the list.

The code is below:

In [4]:
class Star:
    def __init__(self, x: float, y: float, z: float) -> None:
        self.x, self.y, self.z = x, y, z
        
    @property
    def distance(self) -> float:
        return math.sqrt(self.x**2 + self.y**2 + self.z**2)
    
    def __lt__(self, rhs: 'Star') -> bool:
        return self.distance < rhs.distance
    


def find_closest_k_stars(stars: list[Star], k: int) -> list[Star]:
    # max_heap to store the closest k stars seen so far.
    max_heap: list[tuple[float, Star]] = []
    for star in stars:
        # Add each star to the max-heap. If the max-heap size exceeds k, remove
        # the maximum element from the max-heap
        # As python has only min-heap, insert tuple (negative of distance, star)
        # to sort in reverse distance order.
        heapq.heappush(max_heap, (-star.distance, star))
        if len(max_heap) == k + 1:
            heapq.heappop(max_heap)
            
    # heapq.nlargest orders the (-distance, star) entries in descending order,
    # which yields the stars sorted from closest to farthest.
    return [s[1] for s in heapq.nlargest(k, max_heap)]
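
As with the merge question, the standard library offers a shortcut. This is a sketch, not the book's method: it uses `heapq.nsmallest()` with a `key` function, and it assumes stars are plain `(x, y, z)` tuples. The `Point` alias, `k_closest_points`, and the sample data are made up for illustration, and `math.dist` needs Python 3.8 or later.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # hypothetical alias for a star's (x, y, z)


def k_closest_points(points: List[Point], k: int) -> List[Point]:
    """Return the k points nearest the origin, closest first."""
    origin = (0.0, 0.0, 0.0)
    return heapq.nsmallest(k, points, key=lambda p: math.dist(origin, p))


sample = [(3, 4, 0), (1, 0, 0), (0, 0, 10), (0, 2, 0)]
closest_two = k_closest_points(sample, 2)
print(closest_two)  # [(1, 0, 0), (0, 2, 0)]
```

`heapq.nsmallest()` accepts any iterable, so the same call works on a generator of stars when the full list cannot be held in memory.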