<a id="notebook_id"></a>
# Computational complexity

Computational complexity is often the criterion used to select one data structure over another. A common example is choosing between an array-based list and a linked list implementation. Quoting from [the official Java tutorials](https://docs.oracle.com/javase/tutorial/collections/implementations/list.html):

> There are two general-purpose List implementations — ArrayList and LinkedList. Most of the time, you'll probably use ArrayList, which offers constant-time positional access and is just plain fast. It does not have to allocate a node object for each element in the List, and it can take advantage of System.arraycopy when it has to move multiple elements at the same time.

> If you frequently add elements to the beginning of the List or iterate over the List to delete elements from its interior, you should consider using LinkedList. These operations require constant-time in a LinkedList and linear-time in an ArrayList. But you pay a big price in performance. Positional access requires linear-time in a LinkedList and constant-time in an ArrayList. Furthermore, the constant factor for LinkedList is much worse. If you think you want to use a LinkedList, measure the performance of your application with both LinkedList and ArrayList before making your choice; ArrayList is usually faster.

Space complexity is another criterion that is used to choose between different data structures.

## Counting operations

Before discussing computational complexity, we need to clarify which operations can be completed in constant time. We assume that the following fundamental operations take constant time:

- arithmetic operators `+`, `-`, `*`, `/`, `%` involving integer or floating-point values
- comparison operators `>`, `>=`, `<`, `<=`, `==`, `!=` involving integer or floating-point values
- comparison operators `==`, `!=` involving boolean values
- comparison operators `==`, `!=` involving reference values
- the act of branching in an `if` statement 
    - evaluating the branch condition is not necessarily a constant time operation
- array access
- declaring a variable
- the act of assigning a value to a variable
    - evaluating the right-hand side of an assignment statement is not necessarily a constant time operation
- `break`, `continue` statements
- the act of returning a value from a method
    - evaluating the value returned in a `return` statement is not necessarily a constant time operation
- the act of calling a method
- allocating a constant amount of memory (using `new` in Java)
    - if the array or object size depends on the problem size then this is not a constant time operation
    
An example of a branch condition in an `if` statement that is not a constant time operation is:

```java
// t is a list of characters
if (t.contains('a')) { /* ... */ }
```

because `contains` potentially tests all $n$ elements of the list for equality to `'a'`.
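
A rough way to see the linear cost is to write the scan out by hand and count the equality tests. The sketch below is a hypothetical stand-in for `contains`, not the actual `java.util` implementation; the class name `ContainsSketch` and the `comparisons` counter are introduced here for illustration only.

```java
import java.util.List;

class ContainsSketch {
    // Number of equality tests performed by the most recent call to contains.
    static int comparisons = 0;

    // Hypothetical stand-in for List.contains: scans from the front
    // and stops at the first element equal to target.
    static boolean contains(List<Character> t, char target) {
        comparisons = 0;
        for (char c : t) {
            comparisons++;
            if (c == target) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Character> t = List.of('x', 'a', 'y', 'z');
        boolean foundA = contains(t, 'a');
        System.out.println(foundA + " after " + comparisons + " comparisons");
        boolean foundQ = contains(t, 'q');
        System.out.println(foundQ + " after " + comparisons + " comparisons");
    }
}
```

With `'a'` at index 1 the scan stops after 2 comparisons, while the search for the absent character `'q'` examines all 4 elements. In the worst case the number of comparisons equals the length of the list.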

An example of evaluating the right-hand side of an assignment statement that is not a constant time operation is:

```java
// t is a list of characters
boolean b = t.contains('a');
```

for the same reason as the previous example.

An example of evaluating the value returned in a `return` statement that is not a constant time operation is:

```java
// arr is an array
return Arrays.copyOf(arr, arr.length);
```

because `copyOf` must copy the $n$ elements of `arr`.

An example of creating a new object that is not a constant time operation is:

```java
// s is a list
List<Integer> t = new ArrayList<>(s);
```

because the `ArrayList` constructor copies the $n$ references stored in `s`.

## Timing functions

To determine the timing function for an algorithm we count the number of fundamental operations as a function of the size of the input. When counting operations, we usually count only the operations that involve the actual data, ignoring constant time operations such as loop control computations, branching statements, and index computations (although we cannot ignore such operations if their cost depends on the input size).

Consider a simple loop that sums the elements of an array `arr` of length `n`:

```
// ALG1                           NUMBER OF OPERATIONS
sum = 0;                          1 assignment
for i = 0 to (n - 1)
    sum = sum + arr[i]            2 * n (1 sum, 1 assignment repeated n times)
```

Here we are using pseudocode instead of Java. Notice that we do not count the operations required to manage the loop (although there is nothing wrong if we do include these counts).

The timing function for the algorithm is $T_\text{ALG1}(n) = 2n + 1$.
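
The count can be checked empirically by instrumenting a Java version of the loop. This is a minimal sketch: the class name `Alg1Count` and the `ops` counter are assumptions made for illustration, and the counter charges exactly the operations listed in the pseudocode (loop management is not counted).

```java
class Alg1Count {
    // Operations charged by the most recent call to sum.
    static long ops = 0;

    static int sum(int[] arr) {
        ops = 0;
        int sum = 0;
        ops++;                       // 1 assignment
        for (int i = 0; i < arr.length; i++) {
            sum = sum + arr[i];
            ops += 2;                // 1 addition + 1 assignment per iteration
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 32; n *= 2) {
            sum(new int[n]);
            System.out.println("n = " + n + ", ops = " + ops + ", 2n + 1 = " + (2 * n + 1));
        }
    }
}
```

For every array length the counter should agree with $2n + 1$.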

Consider a slightly more complicated algorithm:

```
// ALG2                           NUMBER OF OPERATIONS
min = +infinity                   1 assignment
index = -1;                       1 assignment
for i = 0 to (n - 1)
    val = arr[i];                 n (1 assignment repeated n times)
    if (val < min)                n (1 comparison repeated n times)
        min = val                 n (1 assignment repeated n times)
        index = i                 n (1 assignment repeated n times)
    else
        min = min                 n (1 assignment repeated n times)
        index = index             n (1 assignment repeated n times)
```

The `else` branch of the `if` statement does nothing useful, but including it simplifies the count for now because the `if` statement then requires the same number of operations regardless of which branch runs. Only one branch runs during each loop iteration, so the timing function of the algorithm is $T_\text{ALG2}(n) = 4n + 2$.
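
A Java rendering of the same idea is sketched below, assuming that `min` starts at positive infinity and that the loop visits all $n$ elements. The class name `Alg2Count` and the `ops` counter are illustrative assumptions; the two assignments are charged on every iteration, mirroring the padded `else` branch of the pseudocode.

```java
class Alg2Count {
    // Operations charged by the most recent call to indexOfMin.
    static long ops = 0;

    // Returns the index of the smallest element, or -1 for an empty array.
    static int indexOfMin(double[] arr) {
        ops = 0;
        double min = Double.POSITIVE_INFINITY;
        int index = -1;
        ops += 2;                    // 2 initial assignments
        for (int i = 0; i < arr.length; i++) {
            double val = arr[i];
            ops++;                   // 1 assignment
            ops++;                   // 1 comparison
            if (val < min) {
                min = val;
                index = i;
            }
            ops += 2;                // 2 assignments, charged on either branch
        }
        return index;
    }

    public static void main(String[] args) {
        double[] arr = {3.0, 1.0, 2.0};
        System.out.println("index = " + indexOfMin(arr) + ", ops = " + ops);
    }
}
```

For an array of length 3 the counter reaches $4 \cdot 3 + 2 = 14$, whichever branch is taken on each iteration.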

Now consider a third algorithm that prints an $n \times n$ pattern of alternating stars and dashes:

```
// ALG3                           NUMBER OF OPERATIONS
for i = 0 to (n - 1)
    for j = 0 to (n - 1)
        if (i + j) % 2 == 0
            print "*"             n^2 print operations
        else
            print "-"             n^2 print operations
    print "\n"                    n print operations
```

Here we assume that `print` does not move to the next line and that `print "\n"` prints a newline character. Only one branch of the `if` statement runs during each iteration of the inner loop, so the timing function for the algorithm is $T_\text{ALG3}(n) = n^2 + n$.
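
The following sketch builds the pattern in a `StringBuilder` instead of printing directly, so that the number of print operations can be counted and the result inspected. The class name `Alg3Count` and the `prints` counter are assumptions made for illustration.

```java
class Alg3Count {
    // Print operations charged by the most recent call to pattern.
    static long prints = 0;

    // Builds the n-by-n pattern; each append stands for one print operation.
    static String pattern(int n) {
        prints = 0;
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                out.append((i + j) % 2 == 0 ? '*' : '-');
                prints++;            // one of the n^2 star or dash prints
            }
            out.append('\n');
            prints++;                // one of the n newline prints
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(pattern(3));
        System.out.println("prints = " + prints);
    }
}
```

For $n = 3$ the pattern has three rows (`*-*`, `-*-`, `*-*`) and the counter reaches $3^2 + 3 = 12$.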


## Growth rate of a timing function

We would like to use timing functions to compare the computational complexity of algorithms, but the functions are inexact because we do not count every operation and we do not know how long each constant time operation takes to execute. Instead of examining the timing functions directly, we examine their growth rates.

The following table contains the values of $T_\text{ALG1}(n)$, $T_\text{ALG2}(n)$, and $T_\text{ALG3}(n)$ for values of $n$ that repeatedly double in magnitude:

| $n$ | `ALG1` | `ALG2` | `ALG3` |
| :- | :- | :- | :- |
| 1 | 3 | 6 | 2 |
| 2 | 5 | 10 | 6 |
| 4 | 9 | 18 | 20 |
| 8 | 17 | 34 | 72 |
| 16 | 33 | 66 | 272 |
| 32 | 65 | 130 | 1056 |
| ... |  |  |  |

The ratios of successive rows are:

- `ALG1`: $\frac{5}{3}$, $\frac{9}{5}$, $\frac{17}{9}$, $\frac{33}{17}$, $\frac{65}{33}$
- `ALG2`: $\frac{10}{6}$, $\frac{18}{10}$, $\frac{34}{18}$, $\frac{66}{34}$, $\frac{130}{66}$
- `ALG3`: $\frac{6}{2}$, $\frac{20}{6}$, $\frac{72}{20}$, $\frac{272}{72}$, $\frac{1056}{272}$

and it appears that the ratios $\frac{T_\text{ALG1}(2n)}{T_\text{ALG1}(n)}$,
$\frac{T_\text{ALG2}(2n)}{T_\text{ALG2}(n)}$, and
$\frac{T_\text{ALG3}(2n)}{T_\text{ALG3}(n)}$ converge to $2$, $2$, and $4$ as $n$ approaches infinity.

Instead of doubling the input size $n$ for each row of the table, what happens if we triple the input size? If you performed the necessary calculations you would see that the ratios
$\frac{T_\text{ALG1}(3n)}{T_\text{ALG1}(n)}$,
$\frac{T_\text{ALG2}(3n)}{T_\text{ALG2}(n)}$, and
$\frac{T_\text{ALG3}(3n)}{T_\text{ALG3}(n)}$ converge to $3$, $3$, and $9$ as $n$ approaches infinity.

For any positive integer value $k$ we can show that:

- $\frac{T_\text{ALG1}(kn)}{T_\text{ALG1}(n)} \leq k$


- $\frac{T_\text{ALG2}(kn)}{T_\text{ALG2}(n)} \leq k$


- $\frac{T_\text{ALG3}(kn)}{T_\text{ALG3}(n)} \leq k^2$
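
The ratios can also be explored numerically. The sketch below evaluates the three timing functions from the text for a chosen scaling factor `k`; the class name `GrowthRatios` and the method names `t1`, `t2`, `t3` are assumptions made for illustration. Numerical output is evidence of the trend, not a proof of the bounds.

```java
class GrowthRatios {
    // Timing functions taken from the text.
    static long t1(long n) { return 2 * n + 1; }
    static long t2(long n) { return 4 * n + 2; }
    static long t3(long n) { return n * n + n; }

    public static void main(String[] args) {
        long k = 3; // scaling factor; try other positive integers
        for (long n = 1; n <= 1_000_000; n *= 10) {
            System.out.printf("n = %7d   ALG1 %.5f   ALG2 %.5f   ALG3 %.5f%n",
                    n,
                    (double) t1(k * n) / t1(n),
                    (double) t2(k * n) / t2(n),
                    (double) t3(k * n) / t3(n));
        }
    }
}
```

As $n$ grows, the printed ratios should approach $k$, $k$, and $k^2$ from below.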


## Big-O notation

From your previous courses you should have been introduced to $O$-notation (big-O notation). For a given function $g(n)$ we say that $f(n)$ is an element of $O(g(n))$ if there are positive constants $c$ and $n_0$ such that

$$
| f(n) | \leq c | g(n) | \ \text{for all} \ n \geq n_0
$$

In the analysis of algorithms, $f(n)$ is a timing function (that should not take on negative values) and $g(n)$ is positive for positive values of $n$. With these assumptions in mind, some authors say that $f(n)$ is an element of $O(g(n))$ if there are positive constants $c$ and $n_0$ such that

$$
0 \leq f(n) \leq c g(n)  \ \text{for all} \ n \geq n_0
$$

How are the ratios described in the previous section related to $O$-notation? 

Consider the function $T_{\text{ALG1}}(n) = 2n + 1$. Suppose that there are some positive values $c$ and $n_0$ such that $T_{\text{ALG1}}(n_0) \leq cn_0$. Now consider $T_{\text{ALG1}}(kn_0)$ for some positive integer $k$. We know that

$$\begin{split}
\frac{T_\text{ALG1}(kn_0)}{T_\text{ALG1}(n_0)}  & \leq k \\
T_\text{ALG1}(kn_0) & \leq k T_\text{ALG1}(n_0) \\ 
& \leq kcn_0
\end{split}$$

Now substitute $n = kn_0$ to get

$$T_\text{ALG1}(n) \leq cn$$

Can we find values of $c$ and $n_0$ such that the inequality holds for all $n \geq n_0$? Yes, if we choose $c = 3$ and $n_0 = 1$ the inequality is satisfied, which implies

$T_\text{ALG1}(n) = 2n + 1$ is an element of $O(n)$.

What about $T_{\text{ALG3}}(n) = n^2 + n$? Suppose that there are some positive values $c$ and $n_0$ such that $T_{\text{ALG3}}(n_0) \leq cn_0$. Now consider $T_{\text{ALG3}}(kn_0)$ for some positive integer $k$. We know that

$$\begin{split}
\frac{T_\text{ALG3}(kn_0)}{T_\text{ALG3}(n_0)}  & \leq k^2 \\
T_\text{ALG3}(kn_0) & \leq k^2 T_\text{ALG3}(n_0) \\ 
& \leq k^2cn_0
\end{split}$$

Now substitute $n = kn_0$ and $k = \frac{n}{n_0}$ to get

$$\begin{split}
T_\text{ALG3}(n) & \leq kc(kn_0) \\
& = \frac{n}{n_0}cn \\
& = \frac{c}{n_0}n^2 \\
& = d n^2
\end{split}$$

where $d = \frac{c}{n_0}$ is a constant (because $c$ and $n_0$ are constants). Choosing $d = 2$ and $n_0 = 1$ ensures that the inequality is satisfied for all $n \geq n_0$, which implies

$T_{\text{ALG3}}(n) = n^2 + n$ is an element of $O(n^2)$.
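
The chosen constants can be sanity-checked by testing the defining inequality over a finite range of $n$. The helper below is a hypothetical sketch (the class `BigOWitness` and the method `holds` are not from any library); passing the check is evidence only, because it says nothing about values of $n$ beyond `limit`.

```java
import java.util.function.LongUnaryOperator;

class BigOWitness {
    // Returns true if f(n) <= c * g(n) for every integer n in [n0, limit].
    static boolean holds(LongUnaryOperator f, LongUnaryOperator g,
                         long c, long n0, long limit) {
        for (long n = n0; n <= limit; n++) {
            if (f.applyAsLong(n) > c * g.applyAsLong(n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long limit = 100_000;
        // T_ALG1(n) = 2n + 1 against 3n, with n0 = 1
        System.out.println(holds(n -> 2 * n + 1, n -> n, 3, 1, limit));
        // T_ALG3(n) = n^2 + n against 2n^2, with n0 = 1
        System.out.println(holds(n -> n * n + n, n -> n * n, 2, 1, limit));
        // T_ALG3(n) = n^2 + n against 1000n, with n0 = 1
        System.out.println(holds(n -> n * n + n, n -> n, 1000, 1, limit));
    }
}
```

The first two checks succeed on the tested range. The third fails once $n$ reaches $1000$, which shows only that this particular choice of $c$ does not work for a linear bound.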

For a polynomial timing function $T(n)$ it is easy to prove that $T(n)$ is in $O(n^m)$ where $m$ is the degree of the polynomial.

**Theorem** A degree-$m$ polynomial $f(n) = a_m n^m + a_{m-1} n^{m-1} + ... + a_1 n + a_0$ is an element of $O(n^m)$.

**Proof** For $n \geq 1$

$$\begin{split}
f(n) & = a_m n^m + a_{m-1} n^{m-1} + ... + a_1 n + a_0 \\
& \leq |a_m| n^m + |a_{m-1}| n^{m-1} + ... + |a_1| n + |a_0| \\
& = n^m \left(|a_m| + |a_{m-1}| / n  + ... + |a_1| / n^{m-1} + |a_0| / n^m \right) \\
& \leq n^m \left(|a_m| + |a_{m-1}|  + ... + |a_1| + |a_0| \right)
\end{split}$$

The inequality holds for $c = \left(|a_m| + |a_{m-1}|  + ... + |a_1| + |a_0| \right)$ and $n_0 = 1$ and therefore $f(n)$ is an element of $O(n^m)$.
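
As a concrete instance of the theorem (the polynomial below is chosen here purely for illustration), take $f(n) = 3n^2 - 5n + 2$, so that $m = 2$ and $c = |3| + |-5| + |2| = 10$. For $n \geq 1$

$$\begin{split}
f(n) & = 3n^2 - 5n + 2 \\
& \leq 3n^2 + 5n + 2 \\
& = n^2 \left(3 + 5 / n + 2 / n^2 \right) \\
& \leq n^2 \left(3 + 5 + 2 \right) \\
& = 10 n^2
\end{split}$$

so $f(n)$ is an element of $O(n^2)$ with $c = 10$ and $n_0 = 1$. The constant is far from tight, since $f(n) \leq 3n^2$ already holds for $n \geq 1$, but the definition only requires that some suitable $c$ exists.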

### Significance of $O$-notation

What is the significance of $f(n)$ being in $O(g(n))$? For large values of $n$, the growth rate of $f(n)$ is no greater than the growth rate of $g(n)$. In other words, the growth rate of $g(n)$ is an upper-bound on the growth rate of $f(n)$.

## Common $O$-notation classes

| Dominant term | Big-$O$ class | Description | Example algorithm |
| :- | :- | :- | :- |
| $c$ | $O(1)$ | constant time | array access |
| $c \log n$ | $O(\log n)$ | logarithmic time | binary search of a sorted array |
| $c n$ | $O(n)$ | linear time | linear search of an unsorted array |
| $c n \log n$ | $O(n \log n)$ | linearithmic time | merge sort (asymptotically optimal comparison-based sort) |
| $c n^2$ | $O(n^2)$ | quadratic time | selection sort, matrix-vector multiplication |
| $c n^3$ | $O(n^3)$ | cubic time | naive matrix multiplication |
| $c k^n$ | $O(k^n)$ | exponential time | exhaustive search of an $n$-digit combination lock |
| $c n!$ | $O(n!)$ | factorial time | Bogosort |
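
To make the gap between two of these classes concrete, the sketch below counts how many array elements a linear search and a binary search examine when the target is absent. The class name `SearchCounts` and the `probes` counter are assumptions made for illustration; the searches are written by hand rather than taken from a library.

```java
class SearchCounts {
    // Array elements examined by the most recent search.
    static int probes = 0;

    // Linear search: returns the index of target, or -1 if absent.
    static int linearSearch(int[] arr, int target) {
        probes = 0;
        for (int i = 0; i < arr.length; i++) {
            probes++;
            if (arr[i] == target) {
                return i;
            }
        }
        return -1;
    }

    // Binary search on an array sorted in ascending order.
    static int binarySearch(int[] sorted, int target) {
        probes = 0;
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            probes++;
            if (sorted[mid] == target) {
                return mid;
            } else if (sorted[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 1_000_000; n *= 10) {
            int[] arr = new int[n];
            for (int i = 0; i < n; i++) {
                arr[i] = i;          // sorted values 0 .. n - 1
            }
            linearSearch(arr, -1);   // -1 is absent, so every element is examined
            int linear = probes;
            binarySearch(arr, -1);
            System.out.println("n = " + n + ": linear " + linear + ", binary " + probes);
        }
    }
}
```

Each tenfold increase in $n$ multiplies the linear count by ten but adds only a few probes to the binary count, which is the behaviour the $O(n)$ and $O(\log n)$ rows describe.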

## Exercises

1. Prove that the ratio $\frac{T_\text{ALG1}(kn)}{T_\text{ALG1}(n)} \leq k$ for all positive integers $k$.

2. Prove that the ratio $\frac{T_\text{ALG2}(kn)}{T_\text{ALG2}(n)} \leq k$ for all positive integers $k$.

3. Prove that the ratio $\frac{T_\text{ALG3}(kn)}{T_\text{ALG3}(n)} \leq k^2$ for all positive integers $k$.

4. Prove that if $f_1(n)$ and $f_2(n)$ are both elements of $O(g(n))$ then $f_1(n) + f_2(n)$ is also an element of $O(g(n))$.

5. Prove that if $f_1(n)$ is an element of $O(g_1(n))$ and $f_2(n)$ is an element of $O(g_2(n))$ then $f_1(n) + f_2(n)$ is an element of $O(\text{max}(g_1(n), g_2(n)))$.

6. Prove that $T_{\text{ALG3}}$ is *not* an element of $O(n)$.

7. Prove that all constant valued functions $f(n) = c_f$ are elements of $O(1)$.