# Problem Statement



### K largest from an array of size n in n log k

Given an (unsorted) array of n elements, return the largest k elements.
- Simplistic, obvious approach? Sort the array and take the first k elements: O(n log n).
- You are asked to do it in O(n log k).
- You will not get full marks if your code's runtime is not O(n log k).
- To clarify any misunderstanding about the parameters, here is the function's signature. Fill in the blanks.

def largest_k(a,k=1):
    """a is an array of n elements. We return an array of the k largest."""
    return b

## General Idea

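We create a min-heap of size k. While the heap holds fewer than k elements, every element a[i] is inserted. Once the heap is full, a[i] replaces the smallest element in the heap only if a[i] is larger than it (an element equal to the smallest would not change the result). The smallest element of a min-heap is its root, which this implementation stores at position 1; position 0 holds the current size of the heap. So we compare every element in the array with the heap at position 1, and either discard it or place it in the heap, depending on the comparison and the size of the heap at the time.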
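The idea above can be sketched with Python's standard `heapq` module. This is only an illustration under stated assumptions, not the hand-written heap used in this notebook: `largest_k_heapq` is a hypothetical name, and `heapq` keeps the root at index 0 rather than at position 1.

```python
import heapq

def largest_k_heapq(a, k=1):
    """Return the k largest elements of 'a' in descending order."""
    if k <= 0:
        return []
    heap = []  # min-heap; heap[0] is always the smallest kept element
    for x in a:
        if len(heap) < k:
            heapq.heappush(heap, x)      # heap not full yet: O(log k)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)   # drop the smallest, add x: O(log k)
    return sorted(heap, reverse=True)    # O(k log k)

print(largest_k_heapq([18, 5, 7, 10, 3, 6], 4))  # [18, 10, 7, 6]
```

If the array has fewer than k elements, this sketch simply returns all of them.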

## Code 

In [6]:
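# O(n k)
def largest_k_linear(a,k=1):
    # 'b' holds the largest elements seen so far, in descending order
    b = []

    # goes through the entire array 'a' in O(n)
    for x in a:
        # goes through the array 'b' in worst case O(k)
        for j in range(len(b)):
            if x > b[j]:
                b.insert(j, x)
                break
        else:
            # x is not larger than any element of 'b', so it goes at the end
            b.append(x)
        # keep only the k largest
        b = b[0:k]
    return b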

In [7]:
# O(n log n) + O(k) = O(n log n)
def largest_k_sorted(a, k=1):
    # can be assumed to sort in O(n log n)
    # sorted() returns a new list; a.sort() sorts in place and returns None
    a = sorted(a, reverse=True)
    # can be assumed to retrieve first k elements in O(k)
    return a[0:k]

In [15]:
# a is an array of n elements. We return an array of the k largest.
# for every element in a, insert into sorted (descending) array b of at most k elements.
# the insert position is found using binary search
def largest_k_binary(a,k=1):
    b = []

    # goes through the entire array 'a' in O(n)
    for x in a:
        l, r = 0, len(b)

        # finds the first position in 'b' holding an element smaller than x in O(log k)
        while l < r:
            m = l + (r-l)//2

            # code to cut the searching space in half every time
            if b[m] >= x:
                l = m+1
            else:
                r = m

        # list.insert shifts elements, so this step is O(k) and not O(log k)
        b.insert(l, x)
        # drop the smallest element if 'b' grew past k
        if len(b) > k:
            b.pop()

    return b
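For comparison, the same "keep a sorted list of at most k elements" idea can be sketched with the standard `bisect` module. `largest_k_bisect` is a hypothetical name, and the list is kept in ascending order here because `bisect` assumes ascending input. The binary search is O(log k), but inserting into a Python list is still O(k), so this sketch is O(n k) overall.

```python
import bisect

def largest_k_bisect(a, k=1):
    """Return the k largest elements of 'a' in descending order."""
    b = []  # ascending; b[0] is the smallest element kept so far
    for x in a:
        bisect.insort(b, x)   # binary search for the slot, then O(k) insert
        if len(b) > k:
            b.pop(0)          # drop the smallest element
    return b[::-1]

print(largest_k_bisect([18, 5, 7, 10, 3, 6], 4))  # [18, 10, 7, 6]
```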

In [48]:
# functions for min-heap to work
# layout: a[0] stores the number of elements, the elements themselves are at a[1..a[0]]
def new_heap(n):
    return [0]+[0]*n

# Inserting element e into min-heap a at the end
def insert(a, e):
    a[0] = a[0] + 1
    a[a[0]] = e
    min_heap_fix_up(a,a[0])


# Fix up from position i to restore min-heap property of heap a
def min_heap_fix_up(a, i):
    while i > 1:
        p = i // 2
        # determines if heap is max-heap or min-heap
        #       \/
        if a[p] > a[i]:
            a[p],a[i] = a[i],a[p]
            i = p
        else:
            return


# remove the top element and fix the rest of the heap from that point
def extract_first(a):
    e,a[1],a[0] = a[1],a[a[0]],a[0]-1
    min_heap_fix_down(a,1)
    a[a[0]+1]=0
    return e


# starting from i, go down and fix the heap by swapping parent and child
def min_heap_fix_down(a, i):
    while 2*i <= a[0]:
        c = 2*i
        if c+1 <= a[0]:
            # switch to the smaller of the two children
            if a[c+1] < a[c]:
                c = c+1
        # if the child is smaller then swap with parent
        if a[i] > a[c]:
            a[i],a[c] = a[c],a[i]
            i = c
        else:
            return
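The helpers above rely on a 1-indexed layout: the size sits at index 0, and the parent of position i is at position i // 2. A small checker for that layout, useful when debugging, could look like the sketch below. `is_min_heap` is a hypothetical helper and not part of the original notebook.

```python
def is_min_heap(a):
    """Check the min-heap property for a heap stored as [size, e1, e2, ...]."""
    size = a[0]
    for i in range(2, size + 1):
        # the parent of position i lives at position i // 2
        if a[i // 2] > a[i]:
            return False
    return True

print(is_min_heap([4, 3, 5, 7, 10]))  # True
print(is_min_heap([4, 5, 3, 7, 10]))  # False
```

Slots beyond position `a[0]` are ignored, which matches how the helpers treat unused entries.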

In [45]:
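def largest_k_heap(a,k=1):
    # nothing to return for a non-positive k
    if k <= 0:
        return []

    # create heap and add elements to heap
    h = new_heap(k)
    for i in range(len(a)):
        # if the heap is full
        if h[0] == k:
            # replace the smallest element only if a[i] is larger
            if a[i] > h[1]:
                h[1] = a[i]
                min_heap_fix_down(h, 1)

        # if the heap is not full
        else:
            insert(h, a[i])

    # return largest k by returning the heap sorted
    # (the heap holds fewer than k elements if len(a) < k)
    size = h[0]
    b = [None]*size
    for j in range(size-1, -1, -1):
        b[j] = extract_first(h)

    return b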

## Tests

In [47]:
v = [18,5,7,10,3,6]


largest_k_heap(v, 4)

[18, 10, 7, 6]

In [16]:
import random
test_results = []
for _ in range(100):
    a = random.sample(range(1, 100), random.randint(5, 20))
    k = random.randint(1, len(a))
    test_results.append(all(x in sorted(a)[-k:] for x in largest_k_binary(a, k)))
print(all(test_results))

True


In [42]:
import random
def generate_array(n):
    a = []
    for _ in range(n):
        a.append(random.randint(0,50))
    return a

In [46]:
a = generate_array(10)
print(a)
print(largest_k_heap(a, 3))

[6, 33, 40, 23, 33, 38, 50, 31, 21, 7]
[50, 40, 38]
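A membership test such as `x in sorted(a)[-k:]` can miss errors when the array contains duplicates, so a stricter randomized check compares the whole result with a sorted reference. The sketch below is one way to do that. `check_largest_k` is a hypothetical helper, and `heapq.nlargest` is used only as a stand-in implementation to keep the example runnable; any of the functions in this notebook could be passed instead.

```python
import heapq
import random

def check_largest_k(func, trials=200):
    """Compare func(a, k) with a sorted reference on random inputs."""
    for _ in range(trials):
        n = random.randint(1, 20)
        a = [random.randint(0, 50) for _ in range(n)]
        k = random.randint(1, n)
        expected = sorted(a, reverse=True)[:k]
        # order-insensitive comparison that still counts duplicates
        if sorted(func(a, k), reverse=True) != expected:
            return False
    return True

# stand-in implementation, used only to make this sketch runnable
print(check_largest_k(lambda a, k: heapq.nlargest(k, a)))
```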


## Proof of Correctness

## Runtime

Inserting an element into a heap takes O(log(len(heap))) time. Since the heap never holds more than 'k' elements, each insertion or replacement takes O(log k). In the worst case every one of the 'n' elements of 'a' enters the heap, which costs O(n log k). Extracting the k results at the end adds O(k log k), and since k <= n the whole runtime is O(n log k).
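As a rough empirical check of this bound, the sketch below counts the comparisons made while scanning the array. It is an illustration under stated assumptions: it uses the standard `heapq` module rather than the helpers above, `Counted` and `count_comparisons` are hypothetical names, and the final extraction of the k results is not counted.

```python
import heapq
import math
import random

class Counted:
    """Wraps a value and counts every '<' comparison made on it."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_comparisons(a, k):
    """Run the size-k min-heap selection on 'a' and return the comparison count."""
    Counted.comparisons = 0
    heap = []
    for item in map(Counted, a):
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif heap[0] < item:
            heapq.heapreplace(heap, item)
    return Counted.comparisons


random.seed(0)  # arbitrary seed, only for repeatability
n, k = 10_000, 16
data = [random.random() for _ in range(n)]
used = count_comparisons(data, k)
print(used, "comparisons; n * log2(k) =", int(n * math.log2(k)))
```

The count should stay within a small constant factor of n log2(k), and for random input it is usually far lower because most elements are rejected after a single comparison with the root.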