# Elements of Programming Interviews

# Heaps

A heap is a specialized binary tree. Specifically:

1. It is a **complete binary tree**\*.
2. The keys must satisfy the **heap property**.
    * **Max-heap**: The key at each node is at least as **great** as the keys stored at its children. 
    * **Min-heap**: The key at each node is at least as **small** as the keys stored at its children.

\*A complete binary tree is a binary tree in which every level is fully filled, except perhaps the last. Whatever part of the last level is filled is filled from left to right.

## Implemented as an array

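A complete binary tree can be stored level by level in an array, with the root at index $0$. The children of the node at index $i$ are at indices $2i+1$ and $2i+2$, and the parent of the node at index $i > 0$ is at index $\lfloor (i-1)/2 \rfloor$.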
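The index arithmetic above can be sketched in a few lines. The helper names `parent`, `children`, and `is_min_heap` are illustrative only and are not part of any library; the sketch assumes a 0-based array.

```python
def parent(i):
    """Index of the parent of node i (only meaningful for i > 0)."""
    return (i - 1) // 2

def children(i, n):
    """Indices of the children of node i that exist in an array of length n."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < n]

def is_min_heap(a):
    """True if every key is <= the keys of its children (min-heap property)."""
    n = len(a)
    return all(a[i] <= a[c] for i in range(n) for c in children(i, n))

print(children(1, 6))                   # [3, 4]
print(parent(4))                        # 1
print(is_min_heap([1, 3, 2, 7, 4, 5]))  # True
print(is_min_heap([1, 3, 2, 0, 4, 5]))  # False: 3 is greater than its child 0
```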

## Time Complexities

| Operation | Time complexity |
| --------- | --------------- |
| Insertion | $O(\log n)$ |
| Lookup for max (max-heap) or min (min-heap) | $O(1)$ |
| Delete max (max-heap) or min (min-heap) | $O(\log n)$ |
| Searching for arbitrary keys | $O(n)$ |
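To see where the logarithmic bounds come from, here is a minimal sketch of insertion and delete-min on an array-backed min-heap. It is an illustration under the 0-based layout described earlier, not how *heapq* is implemented internally; the names `heap_insert` and `heap_extract_min` are made up for this example. Each operation walks a single root-to-leaf path, whose length is at most the height of the tree, roughly $\log_2 n$.

```python
def heap_insert(heap, key):
    """Append key at the end, then sift it up while it is smaller than its parent."""
    heap.append(key)
    i = len(heap) - 1
    while i > 0:
        p = (i - 1) // 2
        if heap[p] <= heap[i]:
            break
        heap[p], heap[i] = heap[i], heap[p]
        i = p

def heap_extract_min(heap):
    """Remove and return the root; raises IndexError on an empty heap."""
    smallest = heap[0]
    last = heap.pop()
    if heap:
        # Move the last key to the root and sift it down.
        heap[0] = last
        i, n = 0, len(heap)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            m = i
            if left < n and heap[left] < heap[m]:
                m = left
            if right < n and heap[right] < heap[m]:
                m = right
            if m == i:
                break
            heap[i], heap[m] = heap[m], heap[i]
            i = m
    return smallest

heap = []
for key in [5, 3, 8, 1, 9, 2]:
    heap_insert(heap, key)
smallest_key = heap[0]  # O(1) lookup of the minimum
drained = [heap_extract_min(heap) for _ in range(6)]
print(smallest_key)  # 1
print(drained)       # [1, 2, 3, 5, 8, 9]
```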


## Note

* Use a heap when **all you care about** is the **largest** or **smallest** elements, and you **do not need** to support fast lookup, delete, or search operations for arbitrary elements.

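* A heap is a good choice when you need to compute the $k$ **largest or $k$ smallest** elements in a collection. For the former, use a min-heap; for the latter, use a max-heap. The root is then the weakest of the $k$ candidates kept so far, which is exactly the one to evict when a better element arrives.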
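As a rough illustration of the second case, the sketch below keeps the $k$ smallest numbers with a max-heap. Because *heapq* only offers a min-heap, the max-heap is simulated by storing negated values; the function name `k_smallest` is hypothetical, and the approach assumes numeric input.

```python
import heapq

def k_smallest(k, numbers):
    """Return the k smallest numbers in ascending order (illustrative helper)."""
    if k <= 0:
        return []
    max_heap = []  # holds negated values, so -max_heap[0] is the largest value kept
    for x in numbers:
        if len(max_heap) < k:
            heapq.heappush(max_heap, -x)
        else:
            # Push x and evict the largest of the k + 1 candidates.
            heapq.heappushpop(max_heap, -x)
    return sorted(-v for v in max_heap)

result = k_smallest(3, [7, 2, 9, 4, 1, 8])
print(result)  # [1, 2, 4]
```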


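**Problem**: Write a program that takes a sequence of strings presented in "streaming" fashion: you cannot back up to read an earlier value. The program must compute the $k$ longest strings in the sequence.

**Solution**: A min-heap (not a max-heap!) is the right data structure for this application, since it supports efficient find-min, remove-min, and insert. The heap holds the $k$ longest strings seen so far, with the shortest of them at the root; each new string is pushed and the shortest is popped, so processing $n$ strings takes $O(n \log k)$ time.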

```python
import heapq
import itertools

def top_k1(k, stream):
    # Entries are (length, string) pairs, so they are compared by length first.
    min_heap = [(len(s), s) for s in itertools.islice(stream, k)]
    heapq.heapify(min_heap)
    for next_string in stream:
        # Push next_string and pop the shortest string in min_heap.
        heapq.heappushpop(min_heap, (len(next_string), next_string))
    return [p[1] for p in heapq.nsmallest(k, min_heap)]

stream = (word for word in ['bed', 'breakfast', 'lunch', 'dinner', 'restaurant', 'coffee', 'tea', 'a', 'is'])
print(top_k1(4, stream))
```

Output:

```
['coffee', 'dinner', 'breakfast', 'restaurant']
```


**Notice** that in the code above *stream* is a generator, not a sequence. A generator cannot be sliced, and *itertools.islice(stream, k)* consumes its first $k$ elements, so the loop in *top_k1* iterates over *stream* itself and picks up at element $k+1$. If we were given a sequence (for example a list) instead of streaming data, we would iterate over *sequence[k:]*. See below for the difference.

```python
import heapq

def top_k2(k, sequence):
    # Entries are (length, string) pairs, so they are compared by length first.
    min_heap = [(len(s), s) for s in sequence[:k]]
    heapq.heapify(min_heap)
    for next_string in sequence[k:]:
        # Push next_string and pop the shortest string in min_heap.
        heapq.heappushpop(min_heap, (len(next_string), next_string))
    return [p[1] for p in heapq.nsmallest(k, min_heap)]

sequence = ['bed', 'breakfast', 'lunch', 'dinner', 'restaurant', 'coffee', 'tea', 'a', 'is']
print(top_k2(4, sequence))
```

Output:

```
['coffee', 'dinner', 'breakfast', 'restaurant']
```


## *heapq* module

**Important**: heapq only provides min-heap functionality. To get the effect of a max-heap on integers or floats, insert their negations and negate each value again when you read or pop it.

* **heapq.heapify(L)**: Transforms the elements in L into a heap in-place.
* **heapq.nlargest(k, L) (heapq.nsmallest(k, L))**: Returns a list of the $k$ largest (smallest) elements in L, sorted in descending (ascending) order.
* **heapq.heappush(h, e)**: Pushes a new element on the heap.
* **heapq.heappop(h)**: Pops the smallest element from the heap.
* **heapq.heappushpop(h, a)**: Pushes a on the heap and then pops and returns the smallest element.
* **e = h[0]**: Reads the smallest element on the heap without popping it.
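The calls listed above can be tried out on a small list. This is just a usage sketch with made-up data; the last part shows the negation trick for simulating a max-heap.

```python
import heapq

h = [5, 1, 8, 3]
heapq.heapify(h)                      # h is now a min-heap, rearranged in place
peeked = h[0]                         # smallest element, not removed
heapq.heappush(h, 0)
popped = heapq.heappop(h)             # removes and returns the smallest element
pushed_popped = heapq.heappushpop(h, 4)  # pushes 4, then pops the smallest
two_largest = heapq.nlargest(2, h)
two_smallest = heapq.nsmallest(2, h)

print(peeked)         # 1
print(popped)         # 0
print(pushed_popped)  # 1
print(two_largest)    # [8, 5]
print(two_smallest)   # [3, 4]

# Max-heap effect: store negated values and negate again on the way out.
max_heap = [-x for x in [5, 1, 8, 3]]
heapq.heapify(max_heap)
largest = -heapq.heappop(max_heap)
next_largest = -max_heap[0]
print(largest)        # 8
print(next_largest)   # 5
```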

