# CSCI 3104 Assignment 1:

***
# Instructions

This assignment is to be completed as a python3 notebook.  When you upload, please upload the completed notebook (ipynb file).

The questions  provided  below will ask you to either write code or 
write answers in the form of markdown.

 Markdown syntax guide is here: [click here](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)

Using markdown you can typeset formulae using latex.
This way you can write nice readable answers with formulae like thus:

The algorithm runs in time $\Theta\left(n^{2.1\log_2(\log_2( n \log^*(n)))}\right)$, 
where $\log^*(n)$ is the _iterated logarithm_.

__Double click anywhere on this box to find out how your instructor typeset it. Press Shift+Enter to go back.__

***

## Question 1 : Everyday Algorithms.

Write down a ten-sentence description of an algorithm that drives an important
piece of technology that you encounter in your everyday life.
Research on the Internet to find out what sort of algorithms can solve the 
core problem behind the technology and how it works.

__Examples:__ 
> 1. Auto-complete feature when typing text messages on my iPhone, 
> 2. My email service automatically tags dates/times so that I could insert 
them in my calendar, or 
> 3. My python IDE automatically takes me to a function's code from a call site.

Encryption algorithms are among the most important algorithms we rely on in everyday life. They transform data so that it can be transferred safely between users. Without encryption, one's data would be at serious risk of being intercepted and read.

One of the most common encryption algorithms is RSA, named after its inventors Rivest, Shamir, and Adleman. RSA uses a pair of keys: a public key and a private key. The public key is built from the product of two large prime numbers, and encryption and decryption rely on operations such as FME (Fast Modular Exponentiation). RSA is considered very safe because recovering the private key requires factoring that product, and no efficient method for factoring such large numbers is known. With sufficiently large keys, the estimated runtime to break the encryption can be on the order of the age of the Earth or longer!
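To make the roles of the two primes and fast modular exponentiation concrete, here is a minimal toy sketch. The tiny primes, the public exponent `e = 17`, and the helper name `mod_exp` are illustrative assumptions; real RSA uses primes hundreds of digits long together with padding, both of which this sketch omits.

```python
def mod_exp(base, exponent, modulus):
    """Fast modular exponentiation by repeated squaring."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                      # current low bit of the exponent is set
            result = (result * base) % modulus
        base = (base * base) % modulus        # square for the next bit
        exponent >>= 1
    return result

# Toy key generation (assumed small primes, for illustration only)
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)      # Euler's totient of n
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

message = 65
ciphertext = mod_exp(message, e, n)    # encrypt with the public key (e, n)
recovered = mod_exp(ciphertext, d, n)  # decrypt with the private key (d, n)
print(ciphertext, recovered)
```

An attacker who could factor `n` back into `p` and `q` could recompute `d`, which is why the primes must be large in practice.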

Encryption algorithms like RSA solve the problem of safely transmitting data between users by using powerful mathematical theorems and procedures. Safely transmitting data is an important part of society and something we do every day. Without it, important information could be intercepted and used in ways its owners never intended.

***
## Question 2(a): Insert into a sorted array.

Write a python3 function `insert_into(a, j)` that given sorted array `a` and a number `j`,  returns a new array that includes the contents of the original array and the newly inserted element `j`, so that the returned array is also sorted. 

__You are not allowed to use inbuilt routines in python such as sort. You should also avoid using the list insert method.__



In [55]:
# Answer 2(a): IMPLEMENT HERE. Each time you edit, do not forget to type shift+enter
""" insert_into - Inserts a value into an already sorted list
    Input:
        a - Sorted array (possibly empty)
        j - Value that is to be inserted into the sorted array
    Algorithm:
        * Scan the array for the first index whose value is greater than j
        * If no such value exists, use the length of the array as the index
        * Build a new array from the elements before that index, then j,
          then the remaining elements
    Output:
        A new sorted array containing the elements of a and the value j
"""
def insert_into(a, j):
    # YOUR CODE HERE
    # Default position is the end of the list (also covers the empty list)
    pos = len(a)
    # Find the first index whose value is greater than j
    for i in range(len(a)):
        if j < a[i]:
            pos = i
            break
    # Build a new list without list.insert and without modifying a
    result = []
    for i in range(pos):
        result.append(a[i])
    result.append(j)
    for i in range(pos, len(a)):
        result.append(a[i])
    return result


# Press Shift enter when you are done with your code

## Question 2(b): Running time of your insertion routine
For an input of size $n$, how much time does your routine take to run in the worst case? You can use big-theta notation $\Theta$ for your answer.

For an input of size $n$, the routine scans the array until it finds the first value greater than `j`, and the elements after that position must then be shifted or copied to make room for `j`. The worst case for the scan occurs when the value to be inserted is greater than or equal to the largest value in the list, which requires $n$ comparisons, so the worst case runtime is $\Theta(n)$. The best case for the scan is a single comparison, when the value to be inserted is less than the smallest value in the list, but moving the remaining $n$ elements still takes time proportional to $n$, so the routine as a whole does not run in constant time.
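The best and worst cases of the scan can be checked with a small sketch that counts only the `j < a[i]` comparisons. The helper name `count_comparisons` is hypothetical, and the cost of shifting or copying elements is deliberately not counted here.

```python
def count_comparisons(a, j):
    """Count the comparisons the scan makes before it stops (hypothetical helper)."""
    comparisons = 0
    for value in a:
        comparisons += 1
        if j < value:      # same test the insertion routine uses
            break
    return comparisons

a = list(range(1, 11))               # sorted array 1..10, so n = 10
best = count_comparisons(a, 0)       # j is smaller than every element
worst = count_comparisons(a, 10)     # j equals the largest element
print(best, worst)
```

Note that a value equal to the smallest element does not stop the scan at the first index, because the test is a strict `<`.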

***
## Question 3: Tournaments

A tennis tournament has $n$ participants who must play matches in rounds to determine a winner. For instance if $n = 100$, then the first round has $50$ matches, the second round has $25$ matches and so on. If the number of players left in a round is odd, then one lucky player is chosen at random to move to the next round without contest.

Do not use asymptotic ($O, \Omega, \Theta$) notations in your answer below. Provide exact numbers as much as possible.

1. Write down a formula involving $n$ for the total number of rounds played? (*Hint*: You should try some values for $n$, before you attempt to derive a formula ).

2. Show that the total number of matches played cannot exceed $n$. (*Hint*: write down a series summation for the total number of matches played and use known facts about summation of geometric series ).

3. How many matches does the overall winner need to play in the _best case_?

4. How many matches does the overall winner need to play in the _worst case_?

5. Assume that every player has a unique hidden talent score so that for any match, the higher talent score is always going to win. 
Is the winner always guaranteed to be the highest talent score? Is the runner up (i.e, the person who lost to the winner in the final match) always the second highest talent score? If your answer is no, illustrate using a counter-example.

6. From the answer in 5, design a scheme to identify the second highest talent score among the $n$ participants. Your scheme may select some players and schedule extra matches. (*Hint*: Look up the term _repechage_ in olympics sports)


### 3.1 Solution

To solve this problem, we need to count how many rounds are required until there are only 2 contestants left. Take for instance a tournament of 8 contestants. Round 1 there are 8 contestants, round 2 there are 4 contestants, and then round 3 there are 2 contestants. So for a tournament of 8 contestants, it only requires 3 rounds.

Diving into this deeper, to find the number of rounds required for a given number of contestants, we need to solve $2^{\text{Rounds}} = n$. We will refer to the number of rounds required to be played as $\rho$. We then have:

$$
\begin{align*}
2^{\rho} & = n & \text{(Premise)} \\
\log_{2}{(2^{\rho})} & = \log_{2}{(n)} & \text{(Take } \log_{2} \text{ of both sides)} \\
\rho & = \log_{2}{(n)}. & \text{(Rules of Logarithms)}
\end{align*}
$$

Now, the only problem with our final expression is that when $\log_{2}{(n)}$ is not an integer, we need one more operation to get an integer number of rounds. In our original problem of $n = 100$, it takes 7 rounds to crown a winner. Namely:

$$
\text{Round 1: } n = 100, \text{ Round 2: } n = 50, \text{ Round 3: } n = 25, \text{ Round 4: } n = 13, \text{ Round 5: } n = 7, \text{ Round 6: } n = 4, \text{ Round 7: } n = 2.
$$

This means that when $n = 100$, we require 7 rounds to crown a winner. Going back to our expression, all we need to do is take the ceiling of its value, because $\log_{2}{(100)} \approx 6.64$ and $\lceil 6.64 \rceil = 7$. Namely:

$$
\color{blue} \rho = \lceil \log_{2}{(n)} \rceil.
$$
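The formula above can be checked against a direct simulation. This is a minimal sketch; the function name `simulate_rounds` is an assumption, and a round with an odd number of players is modeled as the winners plus one bye.

```python
import math

def simulate_rounds(n):
    """Count rounds until one player remains, halving (rounded up) each round."""
    rounds = 0
    players = n
    while players > 1:
        players = (players + 1) // 2   # winners, plus one bye if players is odd
        rounds += 1
    return rounds

for n in [2, 8, 100, 1000]:
    print(n, simulate_rounds(n), math.ceil(math.log2(n)))
```

For very large $n$, floating-point rounding in `math.log2` can be avoided with the integer expression `(n - 1).bit_length()`, which equals $\lceil \log_{2}{(n)} \rceil$ for $n \geq 1$.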

### 3.2 Solution

For this part of the problem we want to show that the total number of matches, which we will call $M$, cannot exceed $n$. In every round, the number of players that are left is effectively cut in half. When $n$ is a power of 2, the number of matches that will occur for a tournament is:

$$
M = \frac{n}{2} + \frac{n}{4} + \frac{n}{8} + \dots
$$

According to [Khan Academy](https://www.khanacademy.org/math/algebra2/x2ec2f6f830c9fb89:poly-factor/x2ec2f6f830c9fb89:geo-series/v/deriving-formula-for-sum-of-finite-geometric-series#:~:text=A%20finite%20geometric%20series%20can,number%20of%20terms%20being%20summed.), the sum of the first $k$ terms of a geometric series can be represented as:

$$
S_{k} = \frac{a(1 - r^{k})}{(1 - r)}
$$

where $a$ is the first term of the series and $r$ is the common ratio. In our case, $a = n / 2$, $r = 1 / 2$, and $k$ is the number of rounds. Because $0 < r^{k} < 1$, the factor $(1 - r^{k})$ is less than 1, so for every $k$,

$$
\frac{a(1 - r^{k})}{(1 - r)} < \frac{a}{1 - r}.
$$

If we now plug in our values for $a$ and $r$ we have

$$
\color{blue} M < \frac{n / 2}{1 - 1 / 2} = \frac{n / 2}{1 / 2} = n.
$$

Thus, the number of matches cannot exceed the number of contestants in a tournament. When $n$ is not a power of 2 the per-round counts are rounded, but the bound still holds: every match eliminates exactly one player and $n - 1$ players must be eliminated, so exactly $n - 1$ matches are played.
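The bound of $n$ on the total number of matches can also be checked by simulation. In this minimal sketch, `count_matches` is a hypothetical helper that plays `players // 2` matches per round and lets the odd player out advance on a bye.

```python
def count_matches(n):
    """Total matches in a single-elimination tournament with n players."""
    matches = 0
    players = n
    while players > 1:
        played = players // 2   # one lucky player sits out when players is odd
        matches += played
        players -= played       # each match eliminates exactly one player
    return matches

for n in [2, 8, 100, 1000]:
    print(n, count_matches(n))
```

Each match removes exactly one player, so the simulated total always comes out one below the number of contestants.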

### 3.3 Solution

If we assume that a contestant who has already been advanced without playing cannot be selected to skip a round again, then the winner can skip at most one round. Provided at least one round has an odd number of players, the best case scenario for the winner of the contest is going to be

$$
\color{blue} \rho_{\text{Winner}} = \lceil \log_{2}{(n)} \rceil - 1.
$$

The winner would play one fewer match than the number of rounds in the tournament. If $n$ is a power of 2, no round has an odd number of players, so nobody skips a round and the winner must play in all $\log_{2}{(n)}$ rounds.

### 3.4 Solution

If the winner of the contest is never the lucky player advanced from a round with an odd number of players, they must play in every round, so the worst case scenario for the number of matches played by the winner is going to be

$$
\color{blue} \rho_{\text{Winner}} = \lceil \log_{2}{(n)} \rceil.
$$
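The best and worst cases from 3.3 and 3.4 can be computed together. This is a sketch under the assumption stated in 3.3 that a player can be advanced without playing at most once; the function name `winner_match_range` is hypothetical.

```python
def winner_match_range(n):
    """Return (best, worst) number of matches the overall winner plays.

    Assumes a player can skip at most one round, as in 3.3.
    """
    rounds = 0
    odd_round_seen = False
    players = n
    while players > 1:
        if players % 2 == 1:
            odd_round_seen = True      # someone skips this round
        players = (players + 1) // 2
        rounds += 1
    best = rounds - 1 if odd_round_seen else rounds
    return best, rounds

print(winner_match_range(100))
print(winner_match_range(8))
```

When $n$ is a power of 2 no round is odd, so the two values coincide.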

### 3.5 Solution

The short answer to this question is, the winner will **always** have the highest hidden talent score, and it is **not the case** that the runner up will always have the second highest hidden talent score.

Take for example a tournament with 4 contestants. Contestant 1 has the highest, contestant 2 has the second highest, contestant 3 has the third highest, and contestant 4 has the fourth highest hidden talent score. If player 1 and player 2 play each other and player 3 and player 4 play each other in the first round, then player 1 and player 3 advance to the finals. The winner of the final round would then be player 1 because they have the highest hidden talent score. In this case, the runner up would be player 3 and they do not have the second highest hidden talent score amongst all contestants.

This counterexample illustrates why seeded tournaments in professional sports typically pair the highest seeds against the lowest seeds in each round, in the hope that the final round will consist of the teams / contestants with the highest skills / wins.
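The four-contestant counterexample can be replayed in code. This is a minimal sketch: the talent scores are made-up numbers, `play_bracket` is a hypothetical helper that pairs neighbours in the listed order, and giving the last player the bye in an odd round is an arbitrary choice.

```python
def play_bracket(talents):
    """Play a single-elimination bracket in listed order.

    Returns (winner, runner_up), where runner_up is the loser of the final.
    """
    players = list(talents)
    runner_up = None
    while len(players) > 1:
        is_final = len(players) == 2
        next_round = []
        for i in range(0, len(players) - 1, 2):
            a, b = players[i], players[i + 1]
            next_round.append(max(a, b))     # higher talent always wins
            if is_final:
                runner_up = min(a, b)
        if len(players) % 2 == 1:
            next_round.append(players[-1])   # bye for the unpaired player
        players = next_round
    return players[0], runner_up

# Contestants 1..4 with assumed scores; 1 plays 2 and 3 plays 4 in round one
scores = [90, 80, 70, 60]
winner, runner_up = play_bracket(scores)
print(winner, runner_up)
```

The runner-up here is the third-highest score, because the second-highest contestant met the eventual winner in the first round.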

### 3.6 Solution

The simplest scheme to make sure that the runner up in the contest is the contestant with the second highest hidden talent score is to create a scheme that is analogous to how professional sports teams schedule playoffs in their leagues. The scheme to achieve this outcome is the following:

1. Create a numerical score for the hidden talent of each contestant.

2. Put all the hidden talent scores of each contestant into a list / array.

3. Sort the list in descending order. (Highest talent is the 1 seed, lowest talent is the $n^{th}$ seed.)

4. Create matches for each round such that the match ups are $\text{contestant}[i] \text{ vs. } \text{contestant}[ (n - 1) - i]$ where $n$ is the number of players in a round and $i$ is the index of the player in the list.

    - If $n$ is odd, the highest seed is given a bye and will not participate in the current round. After each round, re-calculate the number $n$.

    - Sort the remaining contestants in descending order again.

    - Re-create the seeding for the new number of contestants in the current round.

This scheme ensures that the top two contestants will play each other in the final round. The winner will be the original one seed and the runner up will be the two seed, so the runner up will always have the second highest hidden talent / skill.
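The seeding scheme can be simulated to check that the top two seeds always meet in the final. This sketch assumes the talent scores are known to the organizer so they can be sorted, and that in an odd round the remaining players are paired highest against lowest after the top seed takes the bye; `seeded_tournament` is a hypothetical name.

```python
def seeded_tournament(talents):
    """Return (winner, runner_up) when every round is re-seeded highest vs. lowest."""
    seeds = sorted(talents, reverse=True)
    if len(seeds) < 2:
        raise ValueError("need at least two contestants")
    while len(seeds) > 2:
        bye = []
        if len(seeds) % 2 == 1:
            bye, seeds = seeds[:1], seeds[1:]   # highest seed sits out this round
        m = len(seeds)
        # Pair index i with index (m - 1) - i; the higher talent advances
        winners = [max(seeds[i], seeds[m - 1 - i]) for i in range(m // 2)]
        seeds = sorted(bye + winners, reverse=True)
    return seeds[0], seeds[1]                   # final: top seed beats the other

print(seeded_tournament([60, 90, 70, 80, 50]))
```

Because the higher seed wins every match, the top two seeds are never paired before only two players remain.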

***
## Autograder for question 2(a): Do not edit code below. 

In [57]:
## DO NOT EDIT TESTING CODE FOR YOUR ANSWER ABOVE
# Press shift enter to test your code. Ensure that your code has been saved first by pressing shift+enter on the previous cell.
from IPython.display import display, HTML
def test_insert():
    failed = False
    test_cases = [ # (Input Array, Inserted Number, Expected Output)
            ([1,3,6,8,10], 4 , [1,3,4,6,8,10]),
            ([1,1,1,1,3,3,5,5,7,7], 1, [1,1,1,1,1,3,3,5,5,7,7]),
            ([-10,9,15,18,35,44], 47, [-10, 9, 15, 18, 35, 44, 47]),
            ([], 10, [10]),
            ([-10, 9, 10, 20, 35], -20, [-20, -10, 9, 10, 20, 35]),
            ([0,0,0,0,0,0], 0, [0,0,0,0,0,0,0])]
    for (test_array, j, expected_output) in test_cases:
        obtained_output = insert_into(test_array, j)
        if obtained_output != expected_output:
            s1 = '<font color=\"red\"> Failed - test case: Inputs: a=' + str(test_array)+ ' j=' + str(j)
            s2 = '  <b> Expected Output: </b> ' + str(expected_output) + ' Your code output: ' + str(obtained_output) + ' </font>'
            display(HTML(s1+s2))
            failed = True
            
    if failed:
        display(HTML('<font color="red"> One or more tests failed. </font>'))
    else:
        display(HTML('<font color="green"> All tests succeeded! </font>'))
test_insert()