# Complexity

Complexity is an attempt to measure the efficiency of an algorithm: how the resources it needs grow with the size of its input.

Let's imagine the following scenario: we have a list of integers and we wish to check whether there are any duplicates in the list.

One idea is to use the following algorithm:

<ul style="list-style-type:none">
    <li>ALGORITHM dup1</li>
    <ul style="list-style-type:none">
        <li><strong>Input:</strong> $L$ a list of integers, possibly empty</li>
        <li><strong>Output:</strong> True or False, is there any duplicated value?</li>
        <li>Let $n$ be the length of $L$</li>
        <li><strong>if</strong> $n < 2$ <strong>then</strong></li>
        <ul style="list-style-type:none">
            <li><strong>return</strong> False</li>
        </ul>
        <li><strong>end if</strong></li>
        <li><strong>for</strong> $i$ in $0 \ldots n - 2$ <strong>do</strong></li>
        <ul style="list-style-type:none">
            <li><strong>for</strong> $j$ in $i + 1 \ldots n - 1$ <strong>do</strong></li>
            <ul style="list-style-type:none">
                <li><strong>if</strong> $L_i = L_j$ <strong>then</strong></li>
                <ul style="list-style-type:none">
                    <li><strong>return</strong> True</li>
                </ul>
                <li><strong>end if</strong></li>
            </ul>
            <li><strong>end for</strong></li>
        </ul>
        <li><strong>end for</strong></li>
        <li><strong>return</strong> False</li>
    </ul>
</ul>

In [20]:
from random import choices, seed
seed(123)

In [12]:
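def dup1(L):
    '''First attempt at finding duplicates
    Input:  L, a list of integers, possibly empty
    Output: True or False, is there any duplicated value?'''
    n = len(L)
    # Test every pair (i, j) with i < j, following ALGORITHM dup1.
    # For n < 2 the outer range is empty, so we fall through to False.
    for i in range(n - 1):
        for j in range(i + 1, n):
            if L[i] == L[j]:
                return True
    return False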

Is it effective?

In [21]:
L = choices(range(10000), k=100)

In [23]:
dup1(L)

## Q: Scenarios?

How do you evaluate the efficiency of the algorithm? What scenarios do you consider?
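One way to explore this question is to count comparisons instead of measuring seconds. The sketch below is only an illustration: `count_comparisons` is a hypothetical helper (not part of the original notebook) that runs the same double loop as ALGORITHM dup1 and reports how many times $L_i = L_j$ is tested. The two input lists are assumed examples of a best case (the duplicate is found at the very first test) and a worst case (no duplicate at all, so every pair is tested).

```python
def count_comparisons(L):
    """Hypothetical helper: run the dup1 double loop and
    return (answer, number of equality tests performed)."""
    comparisons = 0
    n = len(L)
    for i in range(n - 1):
        for j in range(i + 1, n):
            comparisons += 1
            if L[i] == L[j]:
                return True, comparisons
    return False, comparisons


n = 100
best = [7, 7] + list(range(100, 100 + n - 2))  # duplicate at positions 0 and 1
worst = list(range(n))                         # all values distinct

print(count_comparisons(best))   # (True, 1)
print(count_comparisons(worst))  # (False, 4950), i.e. n * (n - 1) / 2
```

In the worst case the count is $n (n - 1) / 2$, which grows quadratically with $n$.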

## ‘Big O’ notation

We are only interested in trends.

* $4 n^2 + n + 5$ behaves like $4 n^2$ for large $n$, so we say it trends like $n^2$
* Operations are implemented differently depending on the language and processor, so constant factors tell us little

Possible outcomes:

* $O(1)$: the algorithm runs in constant time, however big the dataset is (e.g., in the worst-case scenario)
* $O(n)$ scales linearly with the size $n$ of the input
* $O(n \log n)$ is quasilinear
* $O(m + n)$ is kinda linear (in two input sizes, $m$ and $n$)
* $O(n^2)$ is quadratic
* $O(m n)$ is kinda quadratic (in two input sizes, $m$ and $n$)
* $O(n!)$ is really bad news (except for cryptographic systems)
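To get a feel for how differently these classes grow, the sketch below tabulates a few of them for small $n$. The helper `growth` is a hypothetical name introduced for this illustration, and the logarithm is taken in base 2 as an assumption (the base only changes a constant factor).

```python
from math import factorial, log2


def growth(n):
    """Value of each growth function at n (illustrative only)."""
    return {
        "1": 1,
        "n": n,
        "n log n": n * log2(n),
        "n^2": n**2,
        "n!": factorial(n),
    }


for n in (4, 8, 16):
    print(n, growth(n))
```

Already at $n = 16$ the factorial dwarfs every other entry, which is why $O(n!)$ algorithms are only usable on tiny inputs.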

## Other approaches

* If the language has a ‘set’ data structure, we can test whether the ‘set’ and the ‘list’ have the same number of elements
* If we know that the integers are chosen in a given range (say, $0 \leq i < 365$), then we can answer in $O(1)$: any list longer than 365 must contain a duplicate, and a shorter list takes a bounded amount of work
* We can sort the list in $O(n \log n)$ and then check whether any neighbouring values are equal
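The three ideas above can be sketched in Python as follows. The function names `dup_set`, `dup_sorted` and `dup_bounded`, and the parameter `upper`, are hypothetical names chosen for this illustration. `dup_bounded` assumes that every value satisfies `0 <= value < upper`; it does not check this.

```python
def dup_set(L):
    """Duplicates exist exactly when the set is smaller than the list."""
    return len(set(L)) != len(L)


def dup_sorted(L):
    """Sort a copy, then compare each value with its right-hand neighbour."""
    S = sorted(L)
    return any(a == b for a, b in zip(S, S[1:]))


def dup_bounded(L, upper=365):
    """ASSUMPTION: every value in L satisfies 0 <= value < upper."""
    if len(L) > upper:
        return True         # more values than distinct possibilities
    seen = [False] * upper  # one flag per possible value
    for value in L:
        if seen[value]:
            return True
        seen[value] = True
    return False


sample = [3, 1, 4, 1, 5]
print(dup_set(sample), dup_sorted(sample), dup_bounded(sample))  # True True True
```

Note the trade-off: `dup_set` and `dup_sorted` both build a second collection as large as the input, and `dup_bounded` allocates `upper` flags regardless of the input size.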

## Generalisation

* **Time complexity**  How does the algorithm scale as a function of the input size in terms of computation time?
* **Space complexity** How does the algorithm scale as a function of the input size in terms of memory space?