# Big O

In [17]:
#include <iostream>
using namespace std;

There are often many ways to solve a problem. 

How do you decide which algorithm to use?

- Fastest?
- Smallest? (uses least memory)
- Simplest? (easiest for humans to understand, modify, fix, etc.)

## How do you know how much time/space an algorithm will use?

- Empirical measurement
  - e.g. `%%timeit` 

- Analysis
  - e.g. look at the code and do some math

```c++
double compute_average(int n, double a[]) {
    double sum = 0;
    for (int i = 0; i < n; i++) {
        sum += a[i];
    }
    double mean = sum / n;
    return mean;
}
```

Assume each statement takes one unit of time. 

Define a function $T(n)$ that describes how many units of time this function takes.

$T(n) = 3n + 5$
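
One way to sanity-check this formula is to tally the operations by hand. The sketch below is not part of the original function: `count_steps` is a hypothetical helper that mirrors the control flow of `compute_average` and charges one unit for each initialization, comparison, increment, and statement. That counting convention is an assumption; a different convention changes the constants but not the linear growth.

```c++
// Hypothetical helper: tallies the unit-cost operations that
// compute_average(n, a) performs for an array of length n.
// Assumed convention: each initialization, comparison, increment,
// and statement costs exactly one unit.
long count_steps(int n) {
    long steps = 0;
    steps++;                  // double sum = 0;
    steps++;                  // int i = 0;
    for (int i = 0; ; i++) {
        steps++;              // i < n (evaluated n + 1 times)
        if (!(i < n)) break;
        steps++;              // sum += a[i];
        steps++;              // i++
    }
    steps++;                  // double mean = sum / n;
    steps++;                  // return mean;
    return steps;
}
```

Under this convention `count_steps(10)` gives 35, which matches $3n + 5$.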

As $n$ gets bigger, the total runtime gets bigger. 

In this example, the total runtime scales **linearly**. 

If $n$ were REALLY large, would you care about the $5$?

In [2]:
#include <cmath>

In [10]:
double n = 10;
3*n + 5

35.000000

In [11]:
n = 10000000;
3*n + 5

30000005.

$T(n) = n^2 + 5n + 2$

In this example, the runtime of the algorithm scales **quadratically**.

If $n$ were REALLY large, would you care about the $5n + 2$?

In [12]:
double n = 10;
pow(n, 2) + 5*n + 2

152.00000

In [14]:
double n = 10000000;
pow(n, 2) + 5*n + 2

1.0000005e+14

In [15]:
pow(n,2)

1.0000000e+14

$T(n) = 2^n + 5n^3 + 2n^2$

In this example, the runtime of the algorithm scales **exponentially**. 

If $n$ were REALLY large, would you care about the $5n^3 + 2n^2$?

In [16]:
5*pow(n,3) + 2*pow(n,2)

5.0000002e+21

In [7]:
pow(2,n)

inf

## Key idea

> **When considering the performance of an algorithm, the largest term dominates**
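
To put a number on "dominates", the following sketch measures what fraction of the quadratic example $T(n) = n^2 + 5n + 2$ comes from the $n^2$ term alone. The function name `dominant_share` is made up for this illustration.

```c++
// Fraction of T(n) = n^2 + 5n + 2 contributed by the n^2 term alone.
double dominant_share(double n) {
    double dominant = n * n;
    double total = n * n + 5 * n + 2;
    return dominant / total;
}
```

For $n = 10$ the share is about 0.66 (100 out of 152). For $n = 10000000$ it is within a millionth of 1, so the lower-order terms are effectively invisible.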

## Big O Notation

In computer science, we talk about the "order" of an algorithm as describing the dominating term in the algorithm's performance.

$T(n) = n^2 + n + 5$

We would say that this algorithm is $O(n^2)$ because it is dominated by the $n^2$ term.

$T_1(n) = n + 1000$

$T_2(n) = n^2 + 1$

Which algorithm has better runtime performance?

If $n$ is small, then no one cares. We built computers to handle situations with big $n$.

When $n$ is BIG, which algorithm is better?

What is the order of $T_1$?

What is the order of $T_2$?

$O(T_1) = O(n)$

$O(T_2) = O(n^2)$
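
A quick way to see where "small $n$" ends is to search for the point at which the linear algorithm becomes cheaper. This is only a sketch: `T1`, `T2`, and `crossover` are illustrative names that encode the two formulas above as plain functions.

```c++
// Running-time models from the text.
long T1(long n) { return n + 1000; }
long T2(long n) { return n * n + 1; }

// Smallest n >= 1 at which T1 becomes strictly cheaper than T2.
long crossover() {
    long n = 1;
    while (T1(n) >= T2(n)) {
        n++;
    }
    return n;
}
```

Here `crossover()` returns 33. Below that the quadratic algorithm is actually cheaper; from there on the linear one wins by a widening margin.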

Another way of thinking about big-O is that it describes the rate of growth.

As $n$ grows, how quickly does the algorithm's runtime grow?

$O(1)$ has **constant** runtime (no matter how big the input is, the result is computed in constant time).

$O(n)$ grows **linearly**. 

$O(\log n)$ grows **logarithmically**.

$O(n \log n)$ we say "grows with **n log n**".

$O(n^2)$ grows **quadratically**.

$O(n^2)$, $O(n^3)$, and $O(n^6)$ have **polynomial** growth. 

$O(2^n)$ has **exponential** growth. 

$O(n!)$ has **factorial** growth.

```c++
void foo(int n) {
    // Brilliant algorithm implemented here...
}
```

Assume `foo(n)` takes 10 seconds for $n = 10$:

- If `foo` is $O(n)$, about how long will `foo` take for $n = 11$? For $n = 20$?
- If `foo` is $O(n^2)$, about how long will `foo` take for $n = 11$? For $n = 20$?
- If `foo` is $O(2^n)$, about how long will `foo` take for $n = 11$? For $n = 20$?
- If `foo` is $O(\log n)$, about how long will `foo` take for $n = 11$? For $n = 20$?
- If `foo` is $O(n\log n )$, about how long will `foo` take for $n = 11$? For $n = 20$?
- If `foo` is $O(n!)$, about how long will `foo` take for $n = 11$? For $n = 20$?

- If `foo` is $O(n)$, about how long will `foo` take for $n = 11$? For $n = 20$?
  - 11, 20
  
  
- If `foo` is $O(n^2)$, about how long will `foo` take for $n = 11$? For $n = 20$?
  - 12.1, 40
  
  
- If `foo` is $O(2^n)$, about how long will `foo` take for $n = 11$? For $n = 20$?
  - 20, 10240
  
  
- If `foo` is $O(\log n)$, about how long will `foo` take for $n = 11$? For $n = 20$?
  - 10.4, 13.0
  
  
- If `foo` is $O(n \log n )$, about how long will `foo` take for $n = 11$? For $n = 20$?
  - 11.5, 26.0
  
  
- If `foo` is $O(n!)$, about how long will `foo` take for $n = 11$? For $n = 20$?
  - 110, 6.7e+12  (yes, that's 6.7 *TRILLION*)

#### $n^2$

In [31]:
(10 / pow(10,2)) * pow(11, 2)

12.100000

In [32]:
(10 / pow(10,2)) * pow(20, 2)

40.000000

#### $2^n$

In [34]:
(10 / pow(2,10)) * pow(2,11)

20.000000

In [33]:
(10 / pow(2,10)) * pow(2,20)

10240.000

#### $\log n$

In [4]:
(10 / log2(10)) * log2(11)

10.413927

In [5]:
(10 / log2(10)) * log2(20)

13.010300

#### $n \log n$

In [6]:
(10 / (10 * log2(10))) * 11 * log2(11)

11.455320

In [7]:
(10 / (10 * log2(10))) * 20 * log2(20)

26.020600

#### $n!$

In [1]:
double factorial(double n) {
    double result = 1;  // 0! and 1! are both 1
    for (int i = 2; i <= n; i++) {
        result *= i;
    }
    return result;
}

In [3]:
(10 / (factorial(10))) * factorial(11)

110.00000

In [4]:
(10 / (factorial(10))) * factorial(20)

6.7044257e+12

## Key Ideas

> **If your algorithm runs in exponential time...you lose.**

> **If your algorithm runs in factorial time... ... ... ... ... ... ... .........**

## Key Ideas

> **$O(1)$ is unstoppable! (but typically rare).**

> **$O(\log n)$ is FANTASTIC.**

> **$O(n)$ is very good.**

> **$O(n \log n)$ is pretty decent (especially when compared to alternate algorithms).**

> **$O(\text{polynomial})$ is...better than exponential. 😬**

## Definition of $O$

$T(n)$ is $O(f(n))$ if there exist two positive constants $c$ and $n_0$ such that:

$$
T(n) \le c \cdot f(n) \;\; \forall n \ge n_0
$$
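
As a concrete instance, take the earlier $T(n) = 3n + 5$ and the candidate $f(n) = n$. The sketch below scans a finite range to test a proposed pair of constants. The names `T`, `f`, and `bound_holds` are illustrative, and a finite scan is evidence rather than a proof.

```c++
// T(n) = 3n + 5 and the candidate bound f(n) = n.
long T(long n) { return 3 * n + 5; }
long f(long n) { return n; }

// Checks T(n) <= c * f(n) for every n in [n0, limit].
// A finite scan like this is evidence, not a proof.
bool bound_holds(long c, long n0, long limit) {
    for (long n = n0; n <= limit; n++) {
        if (T(n) > c * f(n)) {
            return false;
        }
    }
    return true;
}
```

`bound_holds(4, 5, 100000)` is true because $3n + 5 \le 4n$ exactly when $n \ge 5$, so $c = 4$ with a threshold of 5 shows that $3n + 5$ is $O(n)$. With $c = 3$ no threshold works, since $3n + 5 > 3n$ for every $n$.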

## How do I determine the big-O from $T(n)$?

Essentially:

1. Keep the dominant term/drop low-order terms
2. Drop the constant coefficients

$5n^3 + 2n^2 + 7n + 1000$ => $O(n^3)$ 

### Bad big-O styles
- $O(n+2)$
- $O(n^2 + n)$
- $O(2n)$

## How do you determine the Big O of an algorithm?

Simple statements have constant time.

```c++
int small = 1 + 1;
```

A block of simple statements has constant time.

```c++
int a = 1;
int b = 2;
int c = a + b;
```

To calculate the big-O of a loop:

- Determine the big-O of the loop body
- Multiply that by the number of times the loop runs

```c++
for (int i = 0; i < n; i++) {
    a = a + i;
}
```

- Constant time body => $O(1)$
- $n$ times through the loop => $O(n)$

```c++
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        a = a + i*j;
    }
}
```

- Constant time body => $O(1)$
- $n$ times through $j$ loop => $O(n)$
- $n$ times through $i$ loop => $O(n^2)$
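
The multiplication rule can be checked empirically by counting how often the innermost statement runs. A minimal sketch, with `count_nested` as an illustrative name:

```c++
// Counts how many times the innermost statement of the
// doubly nested loop executes for a given n.
long count_nested(int n) {
    long count = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            count++;  // stands in for the O(1) loop body
        }
    }
    return count;
}
```

Doubling $n$ from 10 to 20 quadruples the count from 100 to 400, which is the signature of quadratic growth.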

```c++
for (int i = 0; i < 76; i++) {
    a = a + i;
}
```

- Constant time body => $O(1)$
- 76 times through the loop => $O(1)$

What is the big-O of a sequence of statements?

```c++
for (int i = 0; i < n; i++) {
    a = a + i;
}
for (int i = 0; i < n; i++) {
    a = a + i;
}
```

You add the runtimes of consecutive statements.

$n + n = 2n$ => $O(n)$

What is the runtime of conditional statements?

```c++
if (x == 0) {
    for (int i = 0; i < n; i++) {
        a = a + x*i;
    }
} else {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            a = a + x*i*j;
        }
    }
}
```

- Find the big-O of each branch => $O(n)$, $O(n^2)$
- Take the larger of the two => $O(n^2)$
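
Big-O describes the worst case here, so the slower branch sets the bound. The sketch below counts loop-body executions for each branch, assuming the inner loop of the `else` branch runs $j$ from 0 to $n$; `count_branch` is an illustrative name.

```c++
// Counts loop-body executions for the if/else shown above.
long count_branch(int x, int n) {
    long count = 0;
    if (x == 0) {
        // single loop: n executions
        for (int i = 0; i < n; i++) {
            count++;
        }
    } else {
        // nested loops: n * n executions
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                count++;
            }
        }
    }
    return count;
}
```

With $n = 100$, the `x == 0` branch does 100 units of work and the other branch does 10000, so the bound that covers every input is $O(n^2)$.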

## Find the big-O for each set of statements

1.
```c++
for ( int i = 0; i < n; i++ )
  sum++;
```


2.
```c++
for ( int i = 0; i < n; i += 2 )
  sum++;
```


3.
```c++
for ( int i = 0; i < n; i++ ) {
  for ( int j = 0; j < n; j++ )
    sum++;
  for ( int j = 0; j < n; j++ )
    sum++;
}
```


4.
```c++
for ( int i = 0; i < n; i++ )
  sum++;
for ( int j = 0; j < n; j++ )
  sum++;
```


5.
```c++
for ( int i = 0; i < n; i++ )
  for ( int j = 0; j < n*n; j++ )
    sum++;
```


6.
```c++
for ( int i = 0; i < n; i++ )
  for ( int j = 0; j < i; j++ )
    sum++;
```


7.
```c++
for ( int i = 0; i < n; i++ )
  for ( int j = 0; j < n*n; j++ )
    for ( int k = 0; k < j; k++ )
      sum++;
```

1. $O(n)$
2. $O(n)$
3. $O(n^2)$
4. $O(n)$
5. $O(n^3)$
6. $O(n^2)$
7. $O(n^5)$

```c++
for ( int i = 0; i < n; i++ )
  for ( int j = 0; j < i; j++ )
    sum++;
```

The inner loop doesn't repeat $n$ times. However, as $n$ increases, the number of times that loop runs increases.

```
-
--
---
----
-----
------
```

The actual number of statements (i.e. $T(n)$) is $\approx \frac{n^2}{2}$.

So the big-O is $O(n^2)$.
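
Counting the iterations directly confirms the estimate. In this sketch (`count_triangle` is an illustrative name) the exact total is $0 + 1 + \dots + (n-1) = \frac{n(n-1)}{2}$.

```c++
// Counts inner-loop iterations when the inner bound depends on i.
long count_triangle(int n) {
    long count = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < i; j++) {
            count++;
        }
    }
    return count;
}
```

`count_triangle(1000)` returns 499500, very close to $\frac{1000^2}{2} = 500000$.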

What is the big-O of this algorithm?

```c++
for (i = 0; i < n-1; i++) {
  small = i;
  for (j = i+1; j < n; j++)
    if (A[j] < A[small])
      small = j;
  temp = A[small];
  A[small] = A[i];
  A[i] = temp;
}
```

$O(n^2)$
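
The fragment above is a selection sort. Below is a self-contained sketch of the same idea that also counts comparisons, assuming `small` starts at `i` on each pass (the usual formulation). The function name and the use of `std::vector` are choices made for this illustration.

```c++
#include <utility>
#include <vector>

// Selection sort that also reports how many element comparisons it made.
long selection_sort(std::vector<int>& A) {
    long comparisons = 0;
    int n = static_cast<int>(A.size());
    for (int i = 0; i < n - 1; i++) {
        int small = i;  // index of the smallest value seen so far
        for (int j = i + 1; j < n; j++) {
            comparisons++;
            if (A[j] < A[small]) {
                small = j;
            }
        }
        std::swap(A[i], A[small]);  // move the smallest value into position i
    }
    return comparisons;
}
```

The inner loop has the same triangular shape as before, so the comparison count is $\frac{n(n-1)}{2}$ regardless of the input order; for 5 elements that is 10.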

What is the big-O of this algorithm?

```c++
int maxSum = 0;

for( int i = 0; i < n; i++ )
  for( int j = i; j < n; j++ ) {
    int thisSum = 0;
    for( int k = i; k <= j; k++ )
      thisSum += a[ k ];
    for( int k = i; k <= j; k++ )
      thisSum += 10 + a[ k ];
    if( thisSum > maxSum ) {
      maxSum = thisSum;
      seqStart = i;
      seqEnd = j;
    }
  }
for( int k = i; k <= j; k++ )
  thisSum += a[ k ];
return maxSum;
```

$O(n^3)$

## Logarithms

If 

$b^x = k$

then 

$\log_b k = x$

So $10^3 = 1000$, and $\log_{10} 1000 = 3$

So $2^5 = 32$, and $\log_{2} 32 = 5$

What are the following values?

$\log_{10} 10000$

$\log_2 16 $

What is the *base* of each logarithm?

What is an easy way to estimate $\log_{10}$ of a number?

- Count the number of digits (the true value is at most one less)

What is an easy way to estimate $\log_{2}$ of a number?

- Count the number of bits (i.e. binary digits)
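
Both estimates can be computed with repeated integer division. This is a minimal sketch for positive integers; `count_digits` and `count_bits` are illustrative names.

```c++
// Number of decimal digits in a positive integer.
int count_digits(long n) {
    int digits = 0;
    while (n > 0) {
        n /= 10;
        digits++;
    }
    return digits;
}

// Number of binary digits (bits) in a positive integer.
int count_bits(long n) {
    int bits = 0;
    while (n > 0) {
        n /= 2;
        bits++;
    }
    return bits;
}
```

For 100000, `count_digits` gives 6 and `count_bits` gives 17, while $\log_{10}$ is 5 and $\log_2$ is about 16.6. Each count overshoots the logarithm by at most one, which is plenty accurate for big-O reasoning.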

Remember: $\log(a \cdot b) = \log(a) + \log(b)$

How does $\log n$ compare to $n$ for large values of $n$?

In [5]:
log2(100000)

16.609640

In [6]:
log10(100000)

5.0000000

In [7]:
log2(10) / log10(10)

3.3219281

In [8]:
log2(1000) / log10(1000)

3.3219281

Changing the logarithm base only scales the result by a constant factor.
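
The constant ratio seen above is the change-of-base identity $\log_b x = \frac{\log_2 x}{\log_2 b}$. A small sketch, with `log_base` as an illustrative helper built on `std::log2` from `<cmath>`:

```c++
#include <cmath>

// Logarithm of x in an arbitrary base, via the change-of-base identity
// log_b(x) = log2(x) / log2(b).
double log_base(double x, double base) {
    return std::log2(x) / std::log2(base);
}
```

`log_base(1000, 10)` is 3 up to floating-point rounding, and `log2(x) / log10(x)` equals `log2(10)` (about 3.32) for any $x > 1$.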

So...does the base matter in big-O notation?

## Logs in Big-O

If you start with $X = 1$, how many times can you double $X$
before $X$ becomes greater than or equal to $N$?

If you start with $X = N$, how many times can you cut $X$ in half
before $X$ becomes less than or equal to $1$?

When you are analyzing code to find the big-O,
when do you use a logarithm in the big-O formula?
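
Both questions can be answered by simply counting. The sketch below uses integer division for the halving case, which is an assumption about how "cut in half" is meant; the function names are illustrative.

```c++
// How many doublings take X from 1 to at least n.
int count_doublings(long n) {
    int steps = 0;
    for (long x = 1; x < n; x *= 2) {
        steps++;
    }
    return steps;
}

// How many halvings (integer division) take X from n down to at most 1.
int count_halvings(long n) {
    int steps = 0;
    for (long x = n; x > 1; x /= 2) {
        steps++;
    }
    return steps;
}
```

For $n = 1000000$ these return 20 and 19, both within one of $\log_2 1000000 \approx 19.9$.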

1.
```c++
for ( int x = 1; x < n; x *= 2 )
  sum++;
```

2.
```c++
for ( int i = 0; i < n; i++ )
  for ( int x = n; x > 1; x /= 2 )
    sum++;
```

3.
```c++
for ( int x = n; x > 1; x /= 10 )
  sum++;
```

4.
```c++
for ( int x = 1; x < n; x *= 10 )
  for ( int i = 0; i < n; i++ )
    sum++;
```

1. $O(\log n)$
2. $O(n \log n)$
3. $O(\log n)$
4. $O(n \log n)$

### What is the time-complexity (big-O) of the binary search algorithm?

- In each iteration I compute an index, retrieve a value, compare it, and compute new min/max indexes (all $O(1)$)
- How many times do I perform an iteration?
  - Each time I iterate, the range shrinks to half its previous size
  - How many times can I halve the range before I reach 1?
    - $\log n$
- so total work is $O(1) \cdot O(\log n) \rightarrow O(\log n)$
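
To make the argument concrete, here is one possible iterative binary search with an iteration counter added. It is a sketch, not necessarily the version used elsewhere in the course: the name `binary_search_counted` and the `iterations` output parameter are assumptions made for this example, and the input must already be sorted in ascending order.

```c++
#include <vector>

// Returns the index of target in the sorted vector a, or -1 if absent.
// On return, iterations holds how many times the loop body ran.
int binary_search_counted(const std::vector<int>& a, int target, int& iterations) {
    iterations = 0;
    int lo = 0;
    int hi = static_cast<int>(a.size()) - 1;
    while (lo <= hi) {
        iterations++;
        int mid = lo + (hi - lo) / 2;  // midpoint without overflowing lo + hi
        if (a[mid] == target) {
            return mid;
        }
        if (a[mid] < target) {
            lo = mid + 1;  // discard the lower half
        } else {
            hi = mid - 1;  // discard the upper half
        }
    }
    return -1;
}
```

On a sorted vector of 1024 elements the loop never runs more than 11 times ($\log_2 1024 + 1$), whereas a linear scan could need 1024 steps.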

## Logarithm and Big-O Key Ideas

- $O(\log n)$ is WAY better than $O(n)$
- $O(n \log n)$ is slower than $O(n)$, but WAY better than quadratic (or higher-degree polynomial) performance.
- Logarithms show up in big-O when you halve the problem space with each iteration
  - Each iteration through a loop deals with half as many items as before
  - Each recursive function call gets half the current collection
  - The ratio doesn't really matter (half, third, fourth, etc.)