# Problem Statement



### K largest from an array of size n in n log k

Given an unsorted array of n elements, return the k largest elements.
- Simplistic, obvious approach? Sort the array: O(n log n).
- You are asked to do it in O(n log k).
- You will not get full marks if your code's runtime is not O(n log k).
- To avoid any misunderstanding about the parameters, here is the function's signature. Fill in the blanks.

def largest_k(a,k=1):
    """a is an array of n elements. We return an array of the k largest."""
    return b
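For reference, the simplistic O(n log n) approach mentioned above can be sketched as follows. The name `largest_k_by_sorting` is hypothetical and is not part of the required signature; it serves only as a baseline to compare against.

```python
def largest_k_by_sorting(a, k=1):
    """Baseline: sort a copy in descending order and keep the first k. O(n log n)."""
    if k <= 0:
        return []
    return sorted(a, reverse=True)[:k]


print(largest_k_by_sorting([5, 1, 9, 3, 7, 8], k=3))  # → [9, 8, 7]
```

This sorts all n elements even when k is tiny, which is exactly the cost the O(n log k) requirement is meant to avoid.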

## General Idea

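We maintain a min-heap of at most 'k' elements and decide, for each a[i], whether it belongs in the heap. The heap is stored 1-indexed, so its smallest element is always at position 1 (position 0 holds the current heap size). While the heap holds fewer than 'k' elements, we insert a[i] unconditionally. Once the heap is full, we compare a[i] with the element at position 1: if a[i] is larger, it replaces that element and the heap is repaired; otherwise a[i] is skipped. After the scan, the heap contains the largest 'k' elements of the array, and we return them.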
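As a compact illustration of this idea, here is a minimal sketch that uses Python's standard `heapq` module in place of a hand-written heap. This is a swapped-in technique, not the implementation developed in this notebook: `heapq` is 0-indexed, so the smallest element sits at `heap[0]` rather than at position 1. The name `largest_k_sketch` is hypothetical.

```python
import heapq


def largest_k_sketch(a, k=1):
    """Return the k largest elements of a, largest first (illustrative sketch)."""
    if k <= 0:
        return []
    heap = []  # min-heap; heap[0] is the smallest element kept so far
    for x in a:
        if len(heap) < k:
            # heap not full yet: always insert
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # heap full and x beats the current minimum: replace it
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)


print(largest_k_sketch([5, 1, 9, 3, 7, 8], k=3))  # → [9, 8, 7]
```

The final `sorted` call touches only the at most k elements left in the heap, so it costs O(k log k) and does not affect the overall bound.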

## Code 

In [1]:
# functions for min-heap to work


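# Create an empty heap with capacity n: a[0] stores the current size,
# and the elements are stored in a[1..n]
def new_heap(n):
    return [0]+[0]*n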


# Inserting element e into min-heap a at the end
def insert(a, e):
    a[0] = a[0] + 1
    a[a[0]] = e
    min_heap_fix_up(a,a[0])


# Fix up from position i to restore min-heap property of heap a
def min_heap_fix_up(a, i):
    while i > 1:
        p = i // 2
        # determines if heap is max-heap or min-heap
        #       \/
        if a[p] > a[i]:
            a[p],a[i] = a[i],a[p]
            i = p
        else:
            return


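# remove and return the top (smallest) element, then fix the rest of the heap
def extract_first(a):
    # save the top, move the last element to the top, and shrink the heap by one
    e,a[1],a[0] = a[1],a[a[0]],a[0]-1
    min_heap_fix_down(a,1)
    # clear the slot that is no longer part of the heap
    a[a[0]+1]=0
    return e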


# starting from i, go down and fix the heap by swapping parent and child
def min_heap_fix_down(a, i):
    while 2*i <= a[0]:
        c = 2*i
        if c+1 <= a[0]:
            # switch to the smaller of the two children
            if a[c+1] < a[c]:
                c = c+1
        # if the child is smaller then swap with parent
        if a[i] > a[c]:
            a[i],a[c] = a[c],a[i]
            i = c
        else:
            return

In [2]:
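def largest_k_heap(a,k=1):
    # clamp k to [0, len(a)] so we never index an empty heap
    # or extract more elements than were inserted
    k = max(0, min(k, len(a)))
    if k == 0:
        return []

    # create heap and add elements to heap
    h = new_heap(k)
    for i in range(len(a)):
        # if the heap is full, replace its minimum when a[i] is larger
        if h[0] == k:
            if a[i] > h[1]:
                h[1] = a[i]
                min_heap_fix_down(h, 1)

        # if the heap is not full
        else:
            insert(h, a[i])

    # extract the minimum k times, filling b from the back,
    # so that b ends up sorted from largest to smallest
    b = [None]*k
    for j in range(k-1, -1, -1):
        b[j] = extract_first(h)

    return b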

## Tests

In [3]:
import random
test_results = []
for _ in range(1000):
    a = random.sample(range(1, 100), random.randint(5, 20))
    k = random.randint(1, len(a))
    # the result must be exactly the k largest values, from largest to smallest
    test_results.append(largest_k_heap(a, k) == sorted(a, reverse=True)[:k])
print(all(test_results))

True


## Proof of Correctness

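The for loop visits every element of the array 'a' exactly once, and each element is either placed in the min-heap of capacity 'k' or skipped. Throughout the loop the following invariant holds: after the first i elements have been processed, the heap contains the largest min(i, k) of them.

While the heap holds fewer than 'k' elements, the element is always inserted at the end of the heap and moved up until the min-heap property is restored. Once the heap is full, the element is compared with the smallest value in the heap (the value at position 1). If it is smaller than or equal to that value, at least 'k' elements seen so far are at least as large, so it is skipped. If it is strictly larger, it replaces the smallest value and is moved down until the min-heap property is restored. The evicted value cannot be among the 'k' largest, because the 'k' values now in the heap are all at least as large.

After the for loop, the largest 'k' elements are in the heap, and they can be read from smallest to largest by extracting the first element over and over until the heap is empty.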
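The argument above rests on one invariant: after each step, the heap holds the largest elements of the prefix seen so far. The sketch below checks that invariant empirically on random inputs with duplicates. It uses the standard `heapq` module as a stand-in for the hand-written heap, and `check_invariant` is a hypothetical helper. Passing these checks is evidence for the invariant, not a proof of it.

```python
import heapq
import random


def check_invariant(a, k):
    """Scan a, asserting after each step that the heap holds the top-k of the prefix."""
    heap = []
    for i, x in enumerate(a, start=1):
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)
        # invariant: the heap holds the largest min(i, k) elements of a[:i]
        expected = sorted(a[:i])[-min(i, k):]
        assert sorted(heap) == expected
    return sorted(heap, reverse=True)


random.seed(0)  # fixed seed only so the run is repeatable
for _ in range(200):
    # values drawn from a small range, so duplicates are likely
    a = [random.randint(1, 20) for _ in range(random.randint(1, 15))]
    k = random.randint(1, len(a))
    check_invariant(a, k)
print("invariant held on all runs")
```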

## Runtime

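The heap is a complete binary tree, so its height is about log2(size). In the worst case, inserting an element moves it all the way from the bottom to the top, and replacing the smallest element moves the new value all the way from the top to the bottom. Either way the cost is proportional to the height. Since the heap never holds more than 'k' elements, each of these operations takes O(log k).

We go through the entire array 'a' of length 'n' and, in the worst case, every element triggers an insertion or a replacement, which costs O(n log k) in total. Reading out the result takes 'k' extractions at O(log k) each, i.e. O(k log k), which is at most O(n log k) because k <= n. The final runtime is therefore
O(n log(k)).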
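One way to sanity-check this bound is to count comparisons rather than measure wall-clock time. The sketch below is an illustration under stated assumptions: it uses the standard `heapq` module instead of the heap written above, the `Counted` wrapper and `count_scan_comparisons` are hypothetical helpers, and only the scanning phase is counted. The input is sorted in ascending order so that every element triggers an insertion or a replacement, which is the worst case described above.

```python
import heapq
import math


class Counted:
    """Wraps a value and counts every '<' comparison made on it (hypothetical helper)."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_scan_comparisons(values, k):
    """Run the size-k min-heap scan and return the number of comparisons used."""
    Counted.comparisons = 0
    heap = []
    for x in map(Counted, values):
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif heap[0] < x:  # x beats the current minimum
            heapq.heapreplace(heap, x)
    return Counted.comparisons


n = 10_000
values = list(range(n))  # ascending input: every element enters the heap
for k in (1, 10, 100, 1000):
    used = count_scan_comparisons(values, k)
    print(f"k={k:5d}  comparisons={used:7d}  n*log2(k)={n * math.log2(k):10.0f}")
```

The counts should stay within a small constant factor of n*log2(k), plus one comparison per element for the check against the minimum. That is what O(n log k) predicts. The exact constants depend on how `heapq` is implemented.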