# CSPB 3104 Assignment 3:

***
# Instructions

This assignment is to be completed as a Python 3 notebook.

The questions provided below will ask you to either write code or
write answers in the form of markdown.

A Markdown syntax guide is available here: [click here](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)

Using markdown, you can typeset formulae using LaTeX.
This way you can write nice, readable answers with formulae like this:

The algorithm runs in time $\Theta\left(n^{2.1\log_2(\log_2( n \log^*(n)))}\right)$, 
where $\log^*(n)$ is the _iterated logarithm_ function.

__Double click anywhere on this box to find out how your instructor typeset it. Press Shift+Enter to go back.__

***

## Question 1

Answer the following questions about heaps.

__1(a)__ Write down an algorithm to find the third smallest element in a minheap with more than $3$ elements. You may write pseudocode or an English description of the algorithm's steps. What is the running time complexity on a heap of size $n$? *Assume all elements in the heap are distinct.*






The smallest element of a min-heap is its root. The second smallest must be one of the root's two children, because every other node lies below one of those children and is therefore larger than it. Note that the second smallest is not necessarily the left child; it is whichever child of the root is smaller. The third smallest is then the minimum of the root's other child and the (at most two) children of the second smallest; every remaining node lies below one of these candidates and so is larger. Namely:

```
// Heap with more than three elements, all distinct
FindThirdSmallest(root):
    // Root is always the smallest; with more than 3 elements both children exist
    left = root->left;   // Left child of root
    right = root->right; // Right child of root

    // The second smallest is the smaller child of the root
    if (left->data < right->data) {
        second = left;
        other = right;
    }
    else {
        second = right;
        other = left;
    }

    // Candidates for third smallest: the other child of the root
    // and the children of the second smallest (if they exist)
    third = other->data;
    if (second->left != NULL and second->left->data < third) {
        third = second->left->data;
    }
    if (second->right != NULL and second->right->data < third) {
        third = second->right->data;
    }

    return third;
```

Because the algorithm only inspects a constant number of nodes near the root and performs a constant number of comparisons, regardless of the heap size, the time complexity of this algorithm for a heap of size `n` is going to be

\begin{equation}
\color{blue}{\mathcal{O}(1)}.
\end{equation}
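
As a cross-check of the reasoning, here is a minimal Python sketch for an array-backed min-heap. It assumes 0-based indexing (children of index `i` at `2*i + 1` and `2*i + 2`), which differs from the pointer-style pseudocode; the function name `third_smallest` is hypothetical and only used for illustration.

```python
def third_smallest(heap):
    """Return the third smallest value of a 0-indexed array min-heap of distinct values."""
    if len(heap) < 3:
        raise ValueError("need at least three elements")
    # The second smallest is the smaller child of the root (indices 1 and 2).
    second, other = (1, 2) if heap[1] < heap[2] else (2, 1)
    # Candidates: the root's other child plus the children of the second smallest.
    candidates = [heap[other]]
    for child in (2 * second + 1, 2 * second + 2):
        if child < len(heap):
            candidates.append(heap[child])
    return min(candidates)


print(third_smallest([1, 2, 5, 3, 4, 6, 7]))  # 3 (a grandchild of the root)
print(third_smallest([1, 2, 3, 4, 5, 6, 7]))  # 3 (the root's other child)
```

At most four values are examined no matter how large the heap is, which matches the constant running time stated above.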

__1(b)__ We wish to find the largest element in a min-heap represented by array $A[1], \ldots, A[n]$. Show using a series of examples for $n=7$ that any element starting from $A[\lceil{\frac{n}{2}}\rceil], \ldots, A[n]$ can be the largest element. Your answer should be in the form of 4 min heaps.

In the context of this problem, when $n = 7$, $\lceil \frac{n}{2} \rceil = 4$, so the positions $A[4], \ldots, A[7]$ are exactly the leaf nodes of the heap. The largest element must be a leaf, because every internal node has at least one child that is larger than it. What remains to show is that each of the four leaves can hold the maximum. Consider the following four min-heaps:

- `[1,2,3,7,6,5,4]` - Indexing from 1, the max value of this min heap would then be at `A[4] = 7`.
- `[2,3,4,7,8,6,5]` - Indexing from 1, the max value of this min heap would then be at `A[5] = 8`.
- `[3,4,5,7,6,9,8]` - Indexing from 1, the max value of this min heap would then be at `A[6] = 9`.
- `[4,5,6,8,7,9,10]` - Indexing from 1, the max value of this min heap would then be at `A[7] = 10`.

From the above, we can see that the max value can reside anywhere in between $A[\lceil{\frac{n}{2}}\rceil], \ldots, A[n]$, where we start the indexing of our heap with 1.
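
The four examples can be verified mechanically. The following is a small Python sketch, not part of the required answer; the helper name `is_min_heap` is hypothetical. It stores each heap in a 0-indexed Python list and converts the position of the maximum back to the 1-based indexing used in the question.

```python
def is_min_heap(a):
    """Check the min-heap property for a heap stored in a 0-indexed list."""
    # In 0-based terms, the parent of position i is (i - 1) // 2.
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))


examples = [
    [1, 2, 3, 7, 6, 5, 4],
    [2, 3, 4, 7, 8, 6, 5],
    [3, 4, 5, 7, 6, 9, 8],
    [4, 5, 6, 8, 7, 9, 10],
]

for heap in examples:
    max_position = heap.index(max(heap)) + 1  # convert to 1-based position
    print(heap, "valid heap:", is_min_heap(heap), "max at A[%d]" % max_position)
```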

***
## Question 2

Suppose you have an array __A__ of *n* distinct elements.

The following pseudocode finds the k biggest values of __A__:

```
Biggest(A, k): // returns an array of the k biggest values of A
        mergesort(A)  
        return A[n-k, n]
 ```
 
__2(a)__ What is the complexity of the above algorithm and why?



The `mergesort` call takes $\Theta(n\log{(n)})$ time in the best, average, and worst case. The only other line returns a slice of $k \le n$ elements, which costs at most $\mathcal{O}(n)$ and is dominated by the sort. The time complexity of this algorithm is therefore that of `mergesort`, namely

\begin{equation*}
\color{blue}{\mathcal{O}(n \log{(n)})}.
\end{equation*}
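
For illustration, the pseudocode can be sketched in Python as follows. This is an assumption-laden sketch: it uses Python's built-in `sorted` (Timsort rather than mergesort, but also $\mathcal{O}(n \log n)$ in the worst case), and the function name `biggest` is hypothetical.

```python
def biggest(values, k):
    """Return the k biggest values of `values` in ascending order."""
    k = max(0, min(k, len(values)))   # clamp k to a valid range
    ordered = sorted(values)          # O(n log n): dominates the running time
    return ordered[len(ordered) - k:]  # O(k) slice of the last k elements


print(biggest([0, 5, 1, 3, 4], 3))  # [3, 4, 5]
```

Note that the result comes back in sorted order, not in the original order of the input.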

__2(b)__ Now suppose that the order of the array was important.  Design and implement an algorithm that returns an array of the k largest elements of __A__ in their original order, and it should run in $\Theta(nk)$ time.

For example, BiggestInOrder([0,5,1,3,4], 3) should return [5,3,4].

In [17]:
""" BiggestInOrder - Returns the k largest elements in their original order from array A
    Input:
        A - Array of values
        k - Number of largest elements to return
    Algorithm:
        * Initialize an empty list 'selectedIdx' to store the indices of the largest values
        * Initialize an empty list 'ret' to store the final return values in original order
        * Create a list 'markedVal' of the same length as A, initialized with False, to mark selected elements
        * Iterate k times to find the k largest elements:
            * Initialize 'maxVal' to negative infinity and 'maxIdx' to -1 at the start of each iteration
            * Iterate over the elements of A:
                * If the current element is not marked and is greater than 'maxVal', update 'maxVal' and 'maxIdx'
            * If a valid 'maxIdx' is found, mark it in 'markedVal'
            * If no valid 'maxIdx' is found (all elements are marked), break the loop
        * After finding the k largest elements, iterate over 'markedVal' to collect the indices of selected elements
        * Iterate over 'selectedIdx' to append the corresponding elements from A to 'ret' in their original order
    Output:
        ret - The k largest elements from A in their original order
"""
def BiggestInOrder(A, k):
    # YOUR CODE HERE
    selectedIdx = []
    ret = []
    markedVal = [False] * len(A)
    # Iterate k times
    for j in range(k):
        maxVal = float('-inf')
        maxIdx = -1
        # Iterate over length of A
        for i in range(len(A)):
            # Not marked and greater than max
            if (not markedVal[i] and A[i] > maxVal):
                maxVal = A[i]
                maxIdx = i
        # Valid max index
        if (maxIdx != -1):
            markedVal[maxIdx] = True
        else:
            break
    # Selected indices
    for i, selected in enumerate(markedVal):
        if selected:
            selectedIdx.append(i)
    # Select in order
    for i in selectedIdx:
        ret.append(A[i])
    return ret

__2(c)__ If we don't care about the original ordering, then we can use a heap to design an algorithm that runs faster than the one in part (b).  Design and implement an algorithm that returns an array of the k largest elements of __A__ using a heap.

In [18]:
""" BiggestOutOfOrder - Returns the k largest elements from array A, potentially out of order
    Input:
        A - Array of values
        k - Number of largest elements to return
    Algorithm:
        * Copy the first k elements of A into 'ret' and convert it into a min-heap
            * This heap will keep track of the k largest elements found so far
        * Iterate through the elements of A starting from the k+1-th element to the end:
            * Compare the current element with the smallest element in the heap (ret[0])
            * If the current element is larger than the smallest element in the heap:
                * Remove the smallest element from the heap
                * Add the current element to the heap, maintaining the heap property
            * This ensures that the heap always contains the k largest elements seen so far
        * After iterating through all elements of A, the heap 'ret' contains the k largest elements
    Output:
        ret - A list of the k largest elements from A, not guaranteed to be in the original order
"""
def BiggestOutOfOrder(A, k):
    # YOUR CODE HERE
    # Min heap data structure
    def leftChild(j):
        return 2 * j + 1
    def rightChild(j):
        return 2 * j + 2
    def heappush(heap, item):
        heap.append(item)
        siftUp(heap, len(heap) - 1)
    def heappop(heap):
        lastelt = heap.pop()
        if heap:
            returnitem = heap[0]
            heap[0] = lastelt
            minHeapify(heap, 0)
        else:
            returnitem = lastelt
        return returnitem
    def siftUp(heap, idx):
        while idx > 0:
            parent = (idx - 1) // 2
            if heap[idx] < heap[parent]:
                heap[idx], heap[parent] = heap[parent], heap[idx]
                idx = parent
            else:
                break
    def minHeapify(A, j):
        l = leftChild(j)
        r = rightChild(j)
        smallest = j
        if l < len(A) and A[l] < A[j]:
            smallest = l
        if r < len(A) and A[r] < A[smallest]:
            smallest = r
        if smallest != j:
            A[j], A[smallest] = A[smallest], A[j]
            minHeapify(A, smallest)
    def buildMinHeap(A):
        n = (len(A) // 2) - 1
        for j in range(n, -1, -1):
            minHeapify(A, j)
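    # Solution
    # Nothing to select for a non-positive k (also avoids indexing an empty heap)
    if k <= 0:
        return []
    # Build a min-heap from the first k elements
    ret = A[:k]
    buildMinHeap(ret)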
    # Process (n - k) remaining of A
    for i in range(k, len(A)):
        if A[i] > ret[0]:
            heappop(ret)
            heappush(ret, A[i])
    return ret

__2(d)__  What is the complexity of your algorithm for part (c)?

First, the algorithm builds a min-heap from the first $k$ elements, which contributes $\mathcal{O}(k)$ to the runtime. Each of the remaining $n - k$ elements triggers at most one pop and one push on a heap of size $k$, so the heap operations in the rest of the algorithm cost $(n - k) \cdot \mathcal{O}(\log{(k)})$. Adding these together gives a runtime complexity of $\mathcal{O}(k + (n - k)\log{(k)})$. Since $k \le n$, for $k \ge 2$ this simplifies to

$$
\color{blue}{\mathcal{O}(n\log{(k)})}.
$$

This is asymptotically faster than the $\Theta(nk)$ runtime of the algorithm in part (b).
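
The same cost structure can be seen in a compact sketch built on Python's standard `heapq` module instead of the hand-written heap. This is an alternative illustration, not the submitted solution; the function name `k_largest_with_heapq` is hypothetical.

```python
import heapq


def k_largest_with_heapq(values, k):
    """Return the k largest values (in heap order) using a size-k min-heap."""
    if k <= 0:
        return []
    heap = list(values[:k])
    heapq.heapify(heap)                 # O(k) to build the initial heap
    for x in values[k:]:                # n - k remaining elements
        if x > heap[0]:
            heapq.heapreplace(heap, x)  # O(log k): pop the minimum, push x
    return heap


print(sorted(k_largest_with_heapq([0, 5, 1, 3, 4], 3)))  # [3, 4, 5]
```

The `heapify` call accounts for the $\mathcal{O}(k)$ term, and the loop accounts for the $(n - k)\log{(k)}$ term.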

---
## Testing your solutions -- Do not edit code beyond this point

In [19]:
from random import sample, randint
def testBiggestInOrder(n_tests, test_size):
    n_passed = 0
    n_failed = 0
    for i in range(0, n_tests):
        a = sample( range(-10 * n_tests,  10 * n_tests ), test_size)
        k = randint(1, len(a))
        kbiggest = BiggestInOrder(a.copy(), k)
        if len(kbiggest) != k:
            if n_failed < 10:
                print(' Code returns the wrong sized array!')
            n_failed += 1
            continue
        if sorted(kbiggest) != sorted(a)[-k:]:
            if n_failed < 10:
                print(' Code did not return the ', k, ' biggest elements!')
                print(' Code returned ', sorted(kbiggest), ' but we wanted ', sorted(a)[-k:], ' of ', a)
            n_failed +=1
            continue
        currIndex = 0
        inOrder = True
        for j in range(0, len(kbiggest)):
            for l in range(currIndex, len(a)):
                if kbiggest[j] == a[l]:
                    currIndex = l
                    break
                if l == len(a) - 1:
                    inOrder = False
        if inOrder == False:
            if n_failed < 10:
                print(' Code failed for input: ', a, 'returned : ', kbiggest, 'last correct index: ', currIndex)
        else:
            n_passed = n_passed + 1

    return n_passed

n_tests = 10000
n_passed = testBiggestInOrder(10000, 10)
print(' num tests  = ', n_tests)
print(' num passed = ', n_passed)

 num tests  =  10000
 num passed =  10000


In [20]:
from random import sample, randint
def testBiggestOutOfOrder(n_tests, test_size):
    n_passed = 0
    n_failed = 0
    for i in range(0, n_tests):
        a = sample( range(-10 * n_tests,  10 * n_tests ), test_size)
        k = randint(1, len(a))
        kbiggest = BiggestOutOfOrder(a.copy(), k)
        if len(kbiggest) != k:
            if n_failed < 10:
                print(' Code returns the wrong sized array!')
            n_failed += 1
            continue
        if sorted(kbiggest) != sorted(a)[-k:]:
            if n_failed < 10:
                print(' Code did not return the ', k, ' biggest elements!')
                print(' Code returned ', sorted(kbiggest), ' but we wanted ', sorted(a)[-k:], 'where a is', a)
            n_failed += 1
            continue
        n_passed = n_passed + 1
    return n_passed

n_tests = 10000
n_passed = testBiggestOutOfOrder(10000, 10)
print(' num tests  = ', n_tests)
print(' num passed = ', n_passed)

 num tests  =  10000
 num passed =  10000
