# Quiz Week 3

**Question 1)** Let $0 < \alpha < 0.5$ be some constant (independent of the input array length n). Recall the Partition subroutine employed by the QuickSort algorithm, as explained in lecture. 

What is the probability that, with a randomly chosen pivot element, the Partition subroutine produces a split in which the size of the smaller of the two subarrays is ≥ $\alpha$ times the size of the original array?

Options: 
- $1 - 2\alpha$
- $\alpha$
- $1 - \alpha$
- $2 - 2\alpha$

To solve this, first let's recall how partitions work in QuickSort. A pivot partitions the array into 2 sections, as shown in the following picture:
<img src="Week3_img_1_p1.png">
Note that, with a pivot chosen uniformly at random (as we can assume is the case here), each element has a probability of $\frac{1}{n}$ of being the pivot.

So, basically the quiz is asking for the probability that the smaller of the two partitions has a size of at least $\alpha n$, for a given $\alpha$ in the ]0,0.5[ range.

To view this more easily, let's take a (valid) value for $\alpha$, say $\alpha=0.4$. Writing *pivot* for the position of the pivot in the sorted array, and looking first at the case where the left partition is the smaller one, we are asked for:

$$ Prob(\alpha*n <= pivot < 0.5*n) = Prob(0.4*n <= pivot < 0.5*n)$$

Notice that the upper limit needs to be $0.5n$ for the left partition to be the smaller of the two.

Now, since we have a uniform distribution for the value of the pivot, we can write this as:

$$Prob(\alpha n <= pivot < 0.5n) = \sum_{j=\alpha n}^{0.5n - 1}\frac{1}{n} = (0.5n - \alpha n)\frac{1}{n}=(0.5-\alpha)$$

Now, we just need to notice that we care about the smaller partition, which can be either the left one (pivot below $\frac{n}{2}$) or the right one (pivot above $\frac{n}{2}$). By symmetry both cases have the same probability, so to get the "complete" probability of the sought-after scenario we need to multiply the value by 2:

$$Actual\_Prob(\alpha n <= smaller\_partition < 0.5n) = 2(0.5-\alpha) = 1 - 2\alpha$$
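As a sanity check, the counting argument above can be reproduced exactly for a concrete array length. The sketch below assumes n distinct elements and a pivot whose rank is uniform over 1..n. The function name `balanced_split_probability` is made up for this illustration, and $\alpha$ is stored as a `Fraction` to avoid floating-point rounding in the comparison.

```python
from fractions import Fraction

def balanced_split_probability(n, alpha):
    """Exact probability that a uniformly random pivot leaves both
    subarrays with at least alpha * n elements (n distinct elements)."""
    good = 0
    for rank in range(1, n + 1):           # rank of the pivot in sorted order
        left, right = rank - 1, n - rank   # sizes of the two subarrays
        if min(left, right) >= alpha * n:
            good += 1
    return Fraction(good, n)

alpha = Fraction(2, 5)  # alpha = 0.4, kept exact
for n in (10, 100, 1000):
    print(n, float(balanced_split_probability(n, alpha)), float(1 - 2 * alpha))
```

When $\alpha n$ is not an integer the counted probability differs slightly from $1 - 2\alpha$, and the gap shrinks as n grows.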

**Question 2)** Now assume that you achieve the approximately balanced splits above in every recursive call --- that is, assume that whenever a recursive call is given an array of length *k*, then each of its two recursive calls is passed a subarray with length between $\alpha k$ and $(1-\alpha)k$ (where $\alpha$ is a fixed constant strictly between 0 and .5). How many recursive calls can occur before you hit the base case? Equivalently, which levels of the recursion tree can contain leaves? Express your answer as a range of possible numbers d, from the minimum to the maximum number of recursive calls that might be needed.

Options: 
-  $ -\frac{log(n)}{log(\alpha)} <= d <= -\frac{log(n)}{log(1-\alpha)}$
-  $ 0 <= d <= -\frac{log(n)}{log(\alpha)} $
-  $ -\frac{log(n)}{log(1-\alpha)} <= d <= -\frac{log(n)}{log(\alpha)}$
-  $ -\frac{log(n)}{log(1-2\alpha)} <= d <= -\frac{log(n)}{log(1-\alpha)}$

To solve this problem it's probably easiest to recall the recursion tree with a 'shrinkage factor' of 2:

<img src="recursive_tree.png">

We can see that this tree needs $log_2(n)$ levels to arrive at the leaf level (when all subproblems have just one element).

In the tree example, we are splitting the array into 2 equal halves, so each subproblem is a 0.5 fraction of the previous one. In the quiz problem, each subarray has a size between $\alpha$ and $(1-\alpha)$ times the size of the previous one, so the extreme 'shrinkage factors' are $\frac{1}{\alpha}$ and $\frac{1}{1-\alpha}$.

This means that the number of levels to reach the base case will be between $log_{\frac{1}{\alpha}}(n)$ (following the branch that always shrinks the fastest) and $log_{\frac{1}{1-\alpha}}(n)$ (following the branch that always shrinks the slowest). From log properties, we know that:

$log_{\frac{1}{\alpha}}(n) = - \frac{log(n)}{log\alpha}$, and\
$log_{\frac{1}{1-\alpha}}(n) = - \frac{log(n)}{log(1-\alpha)}$

In [9]:
# We also know that 0 < alpha < 0.5, so we can evaluate which value is larger to properly define the range

import numpy as np

alpha = 0.4
print(f"-1/log(alpha): {-1/np.log(alpha)}\n-1/log(1-alpha): {-1/np.log(1-alpha)}")

-1/log(alpha): 1.0913566679372915
-1/log(1-alpha): 1.9576151889712174


Since $-\frac{1}{log(\alpha)} < -\frac{1}{log(1-\alpha)}$ for any $0 < \alpha < 0.5$, the answer to the problem is:

$$ -\frac{log(n)}{log(\alpha)} <= d <= -\frac{log(n)}{log(1-\alpha)}$$
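To make the range concrete, here is a small sketch that compares the two bounds with a direct count of how many times a size can be multiplied by $\alpha$ (or by $1-\alpha$) before it drops to 1 or below. The helper names `depth_bounds` and `calls_until_base_case` are assumptions made for this illustration, and subarray sizes are treated as real numbers rather than integers.

```python
import math

def depth_bounds(n, alpha):
    """Closed-form bounds from the derivation above."""
    shallowest = -math.log(n) / math.log(alpha)
    deepest = -math.log(n) / math.log(1 - alpha)
    return shallowest, deepest

def calls_until_base_case(n, fraction):
    """Count how many times the size is multiplied by `fraction`
    before it becomes <= 1 (sizes treated as real numbers)."""
    size, depth = float(n), 0
    while size > 1:
        size *= fraction
        depth += 1
    return depth

n, alpha = 10**6, 0.4
shallowest, deepest = depth_bounds(n, alpha)
print(f"bounds: {shallowest:.2f} <= d <= {deepest:.2f}")
print("always-alpha branch:    ", calls_until_base_case(n, alpha))
print("always-(1-alpha) branch:", calls_until_base_case(n, 1 - alpha))
```

The counted depths are the closed-form bounds rounded up to the next integer, since a branch cannot stop partway through a level.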

**Question 3)** Define the recursion depth of QuickSort to be the maximum number of successive recursive calls before it hits the base case --- equivalently, the number of the last level of the corresponding recursion tree. Note that the recursion depth is a random variable, which depends on which pivots get chosen. What is the minimum-possible and maximum-possible recursion depth of QuickSort, respectively?

Options: 
- Minimum: $\Theta(log(n))$; Maximum: $\Theta(n)$
- Minimum: $\Theta(log(n))$; Maximum: $\Theta(nlog(n))$
- Minimum: $\Theta(1)$; Maximum: $\Theta(n)$
- Minimum: $\Theta(\sqrt n)$; Maximum: $\Theta(n)$

To solve this problem we need to recall that the depth of the QuickSort algorithm depends on the selection of the pivot (which is made at random). We have 2 extreme scenarios:

- **Best case Scenario**: when the problem is split evenly for **every partition** the number of recursions is minimised and has an order of $\Theta(log(n))$ as we saw in the previous problem.
- **Worst case Scenario**: when the pivot is always one of the extremes of the subarray (always either its smallest or its largest element). This causes the partitions to always have (n-1) elements (with n being the number of elements of the previous partition) and 0 elements (since the only other element is the pivot). Thus, each recursive call only reduces the size of the array by 1, maximising the number of recursions and making it a $\Theta(n)$ type of recursion.

With this, we can see that the correct answer is the first option: 

- Minimum: $\Theta(log(n))$; Maximum: $\Theta(n)$
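The two extremes can be observed with a small experiment. This is a minimal sketch, not the in-place Partition from lecture: it partitions with list comprehensions, assumes distinct elements, and uses deterministic pivot rules (`median_pivot` and the built-in `min`) only to force the best and worst cases. The names `quicksort_depth` and `median_pivot` are made up for this illustration.

```python
def quicksort_depth(items, choose_pivot):
    """Return the deepest recursion level reached when sorting `items`
    (distinct values) with the given pivot rule. Level 0 is the first call."""
    if len(items) <= 1:
        return 0
    pivot = choose_pivot(items)
    left = [x for x in items if x < pivot]
    right = [x for x in items if x > pivot]
    return 1 + max(quicksort_depth(left, choose_pivot),
                   quicksort_depth(right, choose_pivot))

def median_pivot(items):
    """Best case: always split the subarray as evenly as possible."""
    return sorted(items)[len(items) // 2]

n = 255  # kept small so the worst case stays below Python's recursion limit
data = list(range(n))
print("median pivot depth:  ", quicksort_depth(data, median_pivot))
print("smallest pivot depth:", quicksort_depth(data, min))
```

With n = 255 the median rule reaches depth 7 (about $log_2(n)$), while always picking the smallest element reaches depth 254 (n - 1).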

**Question 4)**: Consider a group of k people.  Assume that each person's birthday is drawn uniformly at random from the 365 possibilities.  (And ignore leap years.)  What is the smallest value of k such that the expected number of pairs of distinct people with the same birthday is at least one?

[Hint: define an indicator random variable for each ordered pair of people.  Use linearity of expectation.]

Options: 
- 28
- 366
- 20
- 23
- 27

This is an interesting problem. Notice that, in order to have a 100% probability of having at least two people with the same birthday, we would need k=366 (since, in the worst case, the first 365 people could all have different birthdays).

However, this problem is asking a different question. We are asked to consider a group of 'k' people and, with them, form as many pairs as possible. With these pairs, we need to find the minimum number of people 'k' for which the **expected number of pairs** of distinct people with the same birthday is at least one. The key here is the "expected number", which means that we **do not need to be certain** of having a pair of people with the same birthday, we just need to have **on average** at least one pair of people with the same birthday.

To begin, let's see how many pairs of people we can form with 'k' people (which we can easily calculate using combinations):

$$ {k\choose2} = C(k,2) = \frac{k!}{(k-2)! 2!} = \frac{k(k-1)}{2}$$ 

Now, recall that the probability that a randomly chosen person has a given birthday is $\frac{1}{365}$ (omitting leap years). So, for every pair of people we form, the second person in the pair has a probability of $\frac{1}{365}$ of having the same birthday as the first person in the pair. If Xj is an indicator variable that equals 1 when the j-th pair shares a birthday and 0 otherwise, then $E[Xj] = \frac{1}{365}$.

With all this, to determine the expected number of pairs sharing a birthday among all the possible pairs we can form with 'k' people, we simply sum the expectation over all the possible pairs (using linearity of expectation) and require the result to be at least 1. That is:

$$ E\ [\sum_{j=1}^{\frac{k(k-1)}{2}} Xj\ ]= \sum_{j=1}^{\frac{k(k-1)}{2}} E[Xj] = \sum_{j=1}^{\frac{k(k-1)}{2}}\frac{1}{365} = \frac{k(k-1)}{2*365} >= 1$$

This is equivalent to $k(k-1) >= 730$. Since $27*26 = 702 < 730$ and $28*27 = 756 >= 730$, the least possible integer for which this is true is:

$$k=28$$
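The same search can be done by brute force. The sketch below evaluates the expected number of matching pairs, $\frac{k(k-1)}{2*365}$, exactly with `Fraction` and stops at the first k for which it reaches 1. The function name `expected_matching_pairs` is an assumption made for this illustration.

```python
from fractions import Fraction
from itertools import count

def expected_matching_pairs(k, days=365):
    """Expected number of pairs sharing a birthday: C(k, 2) * (1 / days)."""
    return Fraction(k * (k - 1), 2 * days)

# First k whose expected number of matching pairs is at least one
smallest_k = next(k for k in count(1) if expected_matching_pairs(k) >= 1)

print("smallest k:", smallest_k)
print("E[pairs] for k=27:", float(expected_matching_pairs(27)))
print("E[pairs] for k=28:", float(expected_matching_pairs(28)))
```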

**Question 5)**: Let X1, X2, X3 denote the outcomes of three rolls of a six-sided die.  (I.e., each $X_i$ is uniformly distributed among 1,2,3,4,5,6 and by assumption they are independent).

Let Y denote the product of X1 and X2 and Z the product of X2 and X3. Which of the following statements is correct?

Options: 
- Y and Z are **not independent**, and $E[Y*Z] \neq E[Y]*E[Z]$
- Y and Z are **independent**, and $E[Y*Z] = E[Y]*E[Z]$
- Y and Z are **not independent**, and $E[Y*Z] = E[Y]*E[Z]$
- Y and Z are **independent**, and $E[Y*Z] \neq E[Y]*E[Z]$

To solve this problem, we can see that, since X1, X2, and X3 are independent of each other, we can say that:

$E[Y] = E[X1*X2] = E[X1]*E[X2]= \sum_{j=1}^{6} \frac{j}{6} * \sum_{i=1}^{6} \frac{i}{6}= 3.5*3.5 = 12.25$

Likewise, for E[Z] we have: \
$E[Z] = E[X2*X3] = E[X2]*E[X3]= \sum_{j=1}^{6} \frac{j}{6} * \sum_{i=1}^{6} \frac{i}{6}= 3.5*3.5 = 12.25$

Let's verify this for E[Y] (and thus also for E[Z]) by expanding the double sum:

E[Y] = $E[X1*X2] = \sum_{j=1}^6 \sum_{i=1}^6 X_{j1} * X_{i2} * p(X_{j1}) * p(X_{i2})$ \
&emsp; &emsp; $= 1*(1+2+3+4+5+6)*\frac{1}{6}*\frac{1}{6} + 2*(1+2+3+4+5+6)*\frac{1}{6}*\frac{1}{6} + ... + 6*(1+2+3+4+5+6)*\frac{1}{6}*\frac{1}{6}$\
&emsp; &emsp; $= (1+2+3+4+5+6)*(1+2+3+4+5+6)*\frac{1}{6}*\frac{1}{6} = 21*21*\frac{1}{36} = \frac{441}{36}= 12.25$

With this we've verified that $E[Y] = E[X1*X2] = E[X1]*E[X2]$, as expected for independent variables. Note that the independence of X1 and X2 is given by assumption; this equality alone would not prove it. It's easy to see that an equivalent approach can be used to compute E[Z], since X2 and X3 are also independent.


Now that we have E[Y] and E[Z], we can evaluate $E[Y*Z]$ and compare it with $E[Y]*E[Z]$:

$E[Y*Z] = E[X1*X2*X2*X3] = E[X1*X2^2*X3] $

Notice that Y and Z use the same value of X2, so we only need to roll the second die once to have the values of Y and Z. With this in mind, it is easy to see that the number of possible outcomes we have is $6*6*6 = 216$.

The following picture can help understand the combinations described above (notice that the image represents all the combinations for just one value of X1): 

<img src="Week_3_img_2_p5.png">

Now, we can see that the expected value of Y*Z can be represented using the following equation:

$E[Y*Z]= E[X1*X2^2*X3] = \sum_{i=1}^6 \sum_{j=1}^6 \sum_{k=1}^6 X_{1i}*X_{2j}^2*X_{3k}*P(X1=X_{1i})*P(X2=X_{2j})*P(X3=X_{3k})$ \
&emsp; &emsp; $=\frac{1}{216}\sum_{i=1}^6 \sum_{j=1}^6 \sum_{k=1}^6 X_{1i}X_{2j}^2X_{3k} = \frac{40131}{216} = 185.79$

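Since $E[Y]*E[Z] = 12.25*12.25 = 150.06$, this shows that $E[Y*Z] \neq E[Y]*E[Z]$. Independent variables would always satisfy $E[Y*Z] = E[Y]*E[Z]$, so Y and Z are not independent (the first option).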

In [45]:
sum_poss_values = 0
for i in range(1,7):
    for j in range(1,7):
        for k in range(1,7):
            sum_poss_values += i*j*j*k
#print(sum_poss_values)

exp_value = sum_poss_values/np.power(6,3)

print("E[Y*Z] = {:.2f}".format(exp_value))

E[Y*Z] = 185.79
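The comparison can also be done exactly, together with a direct check of the definition of independence. This sketch enumerates the 216 equally likely outcomes with `itertools.product` and uses `Fraction` to avoid rounding. The event $Y=1, Z=1$ is just one convenient example chosen for this illustration, since it only happens when all three dice show 1.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=3))  # all (x1, x2, x3) triples
p = Fraction(1, len(outcomes))                   # each triple has probability 1/216

e_y = sum(p * x1 * x2 for x1, x2, x3 in outcomes)
e_z = sum(p * x2 * x3 for x1, x2, x3 in outcomes)
e_yz = sum(p * (x1 * x2) * (x2 * x3) for x1, x2, x3 in outcomes)

# Independence would require P(Y=1, Z=1) == P(Y=1) * P(Z=1)
p_y1 = sum(p for x1, x2, x3 in outcomes if x1 * x2 == 1)
p_z1 = sum(p for x1, x2, x3 in outcomes if x2 * x3 == 1)
p_both = sum(p for x1, x2, x3 in outcomes if x1 * x2 == 1 and x2 * x3 == 1)

print("E[Y]*E[Z] =", e_y * e_z, " E[Y*Z] =", e_yz)
print("P(Y=1)*P(Z=1) =", p_y1 * p_z1, " P(Y=1, Z=1) =", p_both)
```

Both comparisons disagree, which matches the conclusion that Y and Z are not independent.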
