## Algorithmic Design Homework 2

### Exercise 1
Let H be a `Min-Heap` containing $n$ integer keys and let $k$ be an integer value. Solve the following exercises by using the procedures seen during the course lessons:

a) Write the pseudo-code of an in-place procedure `RetrieveMax(H)` to
efficiently return the maximum value in H without deleting it and
evaluate its complexity.  

In a Min-Heap every node satisfies $parent(p) \leq p$, so the maximum must be one of the leaves; there are $\lceil n/2 \rceil$ of them, where $n$ is the heap size. With indices starting from 0, the first leaf is at index $\lfloor n/2 \rfloor$.  

The time complexity of this algorithm is $\Theta(\lceil n/2 \rceil) = \Theta(n)$, because it has to find the maximum of an unordered array of $\lceil n/2 \rceil$ elements.

In [None]:
# Pseudocode of exercise 1.a

def find_max(a):
    # Finds the maximum value of an unordered array a
    current_max = a[0]
    for element in a:
        if element > current_max:
            current_max = element
    
    return current_max

def RetrieveMax(H):
    return find_max(H[floor(H.size/2):])
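
A runnable version of the same idea, as a minimal sketch: it assumes the heap is stored in a plain 0-indexed Python list (built here with the standard `heapq` module) and scans the leaves by index, so no copy of the array is made. The name `retrieve_max` is only illustrative.

```python
import heapq

def retrieve_max(heap):
    """Return the maximum key of a non-empty binary min-heap stored in a list."""
    n = len(heap)
    # In a 0-indexed array heap the leaves occupy indices floor(n/2) .. n-1.
    best = heap[n // 2]
    for i in range(n // 2 + 1, n):
        if heap[i] > best:
            best = heap[i]
    return best

keys = [7, 3, 9, 1, 12, 5, 8, 4]
heapq.heapify(keys)            # rearranges keys into a valid min-heap in place
print(retrieve_max(keys))      # the maximum always sits in one of the leaves
```

The loop touches only the $\lceil n/2 \rceil$ leaves, which matches the $\Theta(n)$ bound derived above.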

b) Write the pseudo-code of an in-place procedure `DeleteMax(H)` to efficiently delete the maximum value from H and evaluate its complexity.  

To delete the maximum element of a heap we must first find the index of the maximum value, which takes time $\Theta(\lceil n/2 \rceil)$; then we delete that element by shifting all the elements after it left by one position, which takes time $\Theta(\lceil n/2 \rceil)$ in the worst case and $\Theta(1)$ in the best case.  
So the time complexity of this operation is $\Theta(n)$: even if we are lucky and the maximum element is at the last index `H.size - 1`, we must still find it, which takes time $\Theta(n)$.

In [None]:
# Pseudocode of exercise 1.b (to be redone)

def find_max_index(a, start, end):
    # Finds the index of the maximum value of an unordered array a
    max_index = start
    current_max = a[max_index]
    
    for i in range(start+1, end):
        if a[i] > current_max:
            current_max = a[i]
            max_index = i
    
    return max_index

def DeleteMax(H):
    # Find the maximum index
    max_index = find_max_index(H, floor(H.size/2), H.size)
    
    # Shift the array left by one position past the maximum
    for i in range(max_index, H.size-1):
        H[i] = H[i+1]
    H.size -= 1
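
One caveat about the shifting step: moving the leaves one position to the left changes their parents, so the array is not guaranteed to remain a min-heap afterwards. For instance, shifting after removing 7 from `[1, 5, 2, 6, 7, 3, 4]` leaves 3 as a child of 5. A possible alternative, sketched below under the assumption that the heap lives in a 0-indexed Python list, overwrites the maximum with the last element and then moves that element up while it is smaller than its parent. The name `delete_max` is illustrative, and the overall cost stays $\Theta(n)$ because of the search over the leaves.

```python
def delete_max(heap):
    """Remove and return the maximum key of a non-empty list-based min-heap."""
    n = len(heap)
    # The maximum is among the leaves, i.e. indices n//2 .. n-1.
    max_index = n // 2
    for i in range(n // 2 + 1, n):
        if heap[i] > heap[max_index]:
            max_index = i
    max_value = heap[max_index]

    last = heap.pop()                      # drop the last leaf
    if max_index < len(heap):
        # Put the former last element in the hole, then restore the heap
        # property by moving it up while it is smaller than its parent.
        i = max_index
        heap[i] = last
        while i > 0 and heap[(i - 1) // 2] > heap[i]:
            parent = (i - 1) // 2
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
    return max_value

H = [1, 5, 2, 6, 7, 3, 4]                  # a valid min-heap
print(delete_max(H), H)
```

No downward fix is needed because the hole is at a leaf position, so the moved element has no children to compare with.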

c) Provide a working example for the worst case scenario of the procedure `DeleteMax(H)` (see Exercise 1b) on a heap $H$ consisting in 8 nodes and simulate the execution of the function itself.

With 8 nodes we have $\lceil n/2 \rceil = 4$ leaves and the first leaf is at index $\lfloor n/2 \rfloor = 4$. The algorithm first finds the index of the maximum value among $H$\[4:8\], which always takes the same time. ...

### Exercise 2

Let $A$ be an array of n integer values (i.e., the values belong to $\mathbb Z$). Consider the problem of computing a vector $B$ such that, for all $i \in [1, n], B[i]$ stores the number of elements smaller than $A[i]$ in $A[i + 1, ... , n]$. More formally:
$$ B[i] = |\{z \in [i + 1, n]| A[z] < A[i]\}|$$

a) Evaluate the array B corresponding to $A = [2, -7, 8, 3, -5, -5, 9, 1, 12, 4]$.  

In this case we have $B=[4,0,5,3,0,0,2,0,1,0]$

b) Write the pseudo-code of an algorithm belonging to $O(n^2)$ to solve the problem. Prove the asymptotic complexity of the proposed solution and its correctness.  

The most straightforward method is to test the condition directly for each element of A. The pseudocode of this algorithm is:

In [1]:
# Pseudocode of exercise 2.b

def compute_B(A):
    n = len(A)
    B = []
    
    for i in range(n):
        count = 0
        for z in range(i+1, n):
            if A[z] < A[i]:
                count += 1
        B.append(count)
    
    return B

A = [2, -7, 8, 3, -5, -5, 9, 1, 12, 4]
compute_B(A)

[4, 0, 5, 3, 0, 0, 2, 0, 1, 0]

The time complexity of this algorithm is given by the formula:
$$\sum_{i=1}^n\sum_{z=i+1}^n \Theta(1) = \alpha \sum_{i=1}^n (n-i) = \alpha \sum_{j=0}^{n-1} j = \frac{\alpha}{2} n(n-1) = \Theta(n^2)$$

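where I used $\Theta(1) = \alpha$ and the substitution $j=n-i$.

The algorithm is correct because, for each $i$, the inner loop visits every index $z \in [i+1, n]$ exactly once and increments the counter exactly when $A[z] < A[i]$, which is the definition of $B[i]$.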
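
As a quick sanity check of the sum, the following sketch counts how many times the inner-loop comparison runs and compares the result with $n(n-1)/2$. The function name `count_comparisons` is made up for this check.

```python
def count_comparisons(n):
    """Count the executions of the inner-loop test for an input of size n."""
    comparisons = 0
    for i in range(n):
        for z in range(i + 1, n):
            comparisons += 1
    return comparisons

for n in (1, 5, 10, 100):
    # The count matches the closed form n(n-1)/2 obtained from the sum
    assert count_comparisons(n) == n * (n - 1) // 2
```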

c) Assuming that there is only a constant number of values in $A$ different from 0, write an efficient algorithm to solve the problem, evaluate its complexity and correctness.  



In [None]:
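def compute_B(A):
    n = len(A)
    # Indices of the non-zero values: by assumption there are only O(1) of them
    nonzero = [i for i in range(n) if A[i] != 0]

    # zeros_after[i] and negatives_after[i] count the zeros and the negative
    # values in A[i+1:], computed with a single right-to-left pass in Theta(n)
    zeros_after = [0] * n
    negatives_after = [0] * n
    for i in range(n - 2, -1, -1):
        zeros_after[i] = zeros_after[i+1] + (A[i+1] == 0)
        negatives_after[i] = negatives_after[i+1] + (A[i+1] < 0)

    B = []
    for i in range(n):
        if A[i] == 0:
            # Only the negative values that follow are smaller than 0
            count = negatives_after[i]
        else:
            # Compare A[i] with the O(1) non-zero values that follow it...
            count = sum(1 for z in nonzero if z > i and A[z] < A[i])
            # ...and add the zeros that follow if A[i] is positive
            if A[i] > 0:
                count += zeros_after[i]
        B.append(count)

    # Theta(n) overall: O(1) work per zero, O(1) comparisons per non-zero value
    return B

A = [2, -7, 8, 3, -5, -5, 9, 1, 12, 4]
compute_B(A)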

### Exercise 3

Let $T$ be a Red-Black Tree

a) Give the definition of Red-Black Trees  

Red-Black Trees are binary search trees satisfying the following properties:
1. Every node has a color associated with it, which is either red or black.
2. The tree's root is black.
3. Every leaf is a black NIL node.
4. If a node is red, then both its children are black.
5. For each node, all the branches from the node to descendant leaves contain the same number of black nodes.
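
The five properties can also be read as a checklist. The sketch below is one possible way to verify them on a concrete tree; the `Node` class, the string colors and the use of `None` for the black NIL leaves are all assumptions made for this example, and the search-tree ordering of the keys is not checked.

```python
class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color        # "red" or "black"
        self.left = left          # None plays the role of a black NIL leaf
        self.right = right

def check_subtree(x):
    """Return the number of black nodes on every path from x down to NIL
    (counting x and the NIL leaf), or None if property 1, 4 or 5 fails."""
    if x is None:
        return 1                                  # property 3: NIL is black
    if x.color not in ("red", "black"):
        return None                               # property 1
    if x.color == "red":
        for child in (x.left, x.right):
            if child is not None and child.color == "red":
                return None                       # property 4
    left = check_subtree(x.left)
    right = check_subtree(x.right)
    if left is None or right is None or left != right:
        return None                               # property 5
    return left + (x.color == "black")

def is_red_black(root):
    if root is not None and root.color != "black":
        return False                              # property 2
    return check_subtree(root) is not None

good = Node(2, "black", Node(1, "red"), Node(3, "red"))
bad = Node(2, "black", Node(1, "red", Node(0, "red")), Node(3, "black"))
print(is_red_black(good), is_red_black(bad))
```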

b) Write the pseudo-code of an efficient procedure to compute the height of $T$. Prove its correctness and evaluate its asymptotic complexity.

In general the height of a Red-Black Tree is bounded from above and below, namely $\log(n+1) \leq h \leq 2\log(n+1)$, but there is no closed formula for its exact value. A naive approach to this problem would be to perform an exhaustive search across all root-to-leaf paths. In a perfectly balanced Red-Black Tree this takes $O(2^h)$, because at each step we must choose to go either left or right and we make this decision $h$ times; the paths correspond to the distinct sequences of 0s and 1s of length $h$, where 0 = right and 1 = left.  

A better approach comes from noticing that, if I know the heights $h_l$ and $h_r$ of the left and right children of a node $x$, I can compute the height of $x$ as $h(x) = \max(h_l, h_r) + 1 = \max(h(x.left), h(x.right)) + 1$. This gives a recursive formula for the height of any node $x$, and hence of the root of $T$. The base case of the recursion is a NIL node, whose height is set to -1 because it lies "below" a leaf node, whose height is 0 by definition.  

To prove the correctness of this algorithm we proceed by induction. If $x$ is a leaf, the algorithm computes $\max(-1, -1) + 1 = 0$, which is the correct result. Now suppose the algorithm works for the left and right children of a node $x$ (a NIL child returns -1); then $height(x) = \max(height(x.left), height(x.right)) + 1 = \max(h_l, h_r) + 1$, which is correct because the longest branch from $x$ extends the longest branch of one of its children by one edge. As for the complexity, the algorithm visits every node exactly once and does a constant amount of work per node, so the complexity is $\Theta(n)$, where $n$ is the number of nodes in the tree.

In [None]:
# Pseudocode of exercise 3.b

def height(x):
    if x is NIL:
        return -1
    
    return max(height(x.left), height(x.right)) + 1
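
To see the recursion at work, here is a minimal runnable sketch. The `Node` class is a hypothetical stand-in, `None` plays the role of NIL, and colors are left out because the height does not depend on them.

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

def height(x):
    # A missing child (NIL) sits "below" a leaf, hence height -1
    if x is None:
        return -1
    return max(height(x.left), height(x.right)) + 1

# Root with a two-level left branch and a single right leaf
root = Node(left=Node(left=Node()), right=Node())
print(height(root))   # the longest branch, root -> left -> left, has 2 edges
```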

c) Write the pseudo-code of an efficient procedure to compute the black-height of $T$. Prove its correctness and evaluate its asymptotic complexity.  

The black-height of a node $x$ is the number of black nodes on any path from $x$ down to a leaf. To compute it efficiently we can simply walk down the tree until we reach a leaf, keeping track of the number of black nodes encountered so far. The correctness of this algorithm follows from property 5 of Red-Black Trees: it doesn't matter which path we choose, because all paths contain the same number of black nodes.  
The time complexity of this algorithm is $\Theta(h) = \Theta(\log(n))$: the loop runs once per node on a single root-to-leaf path, whose length is at most $h$ and, by properties 4 and 5, at least about $h/2$, and Red-Black Trees satisfy $\log(n+1) \leq h \leq 2\log(n+1)$.

In [None]:
# Pseudocode of exercise 3.c

def black_height(x):
    height = 0
    while x is not NIL:
        height += x.color=="black"
        x = x.left
        
    return height
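
The claim that the chosen path does not matter can be checked on a small tree. In this sketch the `Node` class, the string colors and the extra `side` parameter are assumptions added for illustration; `None` stands in for NIL and, as in the pseudocode above, is not counted.

```python
class Node:
    def __init__(self, color, left=None, right=None):
        self.color = color
        self.left = left
        self.right = right

def black_height(x, side="left"):
    """Count the black nodes met while always descending on the given side."""
    count = 0
    while x is not None:
        count += x.color == "black"
        x = getattr(x, side)
    return count

# Black root, black left child, red right child with two black children
root = Node("black",
            Node("black"),
            Node("red", Node("black"), Node("black")))
print(black_height(root, "left"), black_height(root, "right"))
```

Both calls return the same count, as property 5 requires for a valid Red-Black Tree.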

### Exercise 4

Let $(a_1, b_1 ), ..., (a_n, b_n)$ be n pairs of integer values. They are lexicographically sorted if, for all $i \in [1, n-1]$, the following conditions hold:
* $a_i \leq a_{i+1}$ ;
* $a_i = a_{i+1}$ implies that $b_i \leq b_{i+1}$. 

Consider the problem of lexicographically sorting n pairs of integer values.

a) Suggest the opportune data structure to handle the pairs, write the pseudo-code of an efficient algorithm to solve the sorting problem and compute the complexity of the proposed procedure;  

The choice of the data structure depends on what we have to do with the pairs. If the pairs must be lexicographically sorted at all times and new pairs are added or removed frequently, a Red-Black Tree could be a good data structure, because it always keeps the pairs ordered and supports insertion, deletion and search of a pair in $O(\log(n))$ time.  

However, if pairs are not added or deleted after initialization, a simple array of pairs could be a good data structure. If the values must be sorted, or if we want to perform several find operations, we can sort the array once at initialization; this takes $O(n\log(n))$ time with a general-purpose sorting algorithm, the same as inserting $n$ values into a Red-Black Tree. Furthermore, this simple data structure saves space, because we don't need to store the pointers to the left and right children and the color of each node.
I will consider the latter data structure in the following exercises, because a sorting operation is never required on a Red-Black Tree: the tree is sorted by default.  



In [None]:
# Pseudocode of exercise 4.a

def 
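
One possible way to sort an array of pairs in $O(n\log(n))$ time is merge sort with a lexicographic comparison. The sketch below is only an illustration of that choice, not necessarily the intended solution; the names `precedes_or_equal` and `merge_sort_pairs` are made up, and the pairs are assumed to be Python tuples.

```python
def precedes_or_equal(p, q):
    """True if pair p comes before, or ties with, pair q lexicographically."""
    return p[0] < q[0] or (p[0] == q[0] and p[1] <= q[1])

def merge_sort_pairs(pairs):
    """Return a new list with the pairs in lexicographic order."""
    if len(pairs) <= 1:
        return list(pairs)
    mid = len(pairs) // 2
    left = merge_sort_pairs(pairs[:mid])
    right = merge_sort_pairs(pairs[mid:])

    # Merge the two sorted halves in linear time
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if precedes_or_equal(left[i], right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

pairs = [(3, 1), (1, 2), (3, 0), (1, 1), (2, 5)]
print(merge_sort_pairs(pairs))
```

Each level of the recursion does $\Theta(n)$ work in the merge step and there are $\Theta(\log(n))$ levels, which gives the $O(n\log(n))$ bound mentioned above.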

b) Assume that there exists a natural value $k$, constant with respect to $n$, such that $a_i \in [1, k]$ for all $i \in [1, n]$. Is there an algorithm more efficient than the one proposed as solution of Exercise 4a? If this is the case, describe it and compute its complexity, otherwise, motivate the answer.

c) Assume that the condition of Exercise 4b holds and that there exists a natural value $h$, constant with respect to $n$, such that $b_i \in [1, h]$ for all $i \in [1, n]$. Is there an algorithm to solve the sorting problem more efficient than the one proposed as solution for Exercise 4a? If this is the case, describe it and compute its complexity, otherwise, motivate the answer.

### Exercise 5
Consider the select algorithm. During the lessons, we explicitly assumed
that the input array does not contain duplicate values.


a) Why is this assumption necessary? How does relaxing this condition affect the algorithm?

b) Write the pseudo-code of an algorithm that enhances the one seen during the lessons and evaluate its complexity.