# Big O Notation

## Introduction
>Software engineers spend a significant amount of time improving the efficiency of code: increasing the __algorithm speed__ and decreasing the space complexity. The demand for efficiency has led to the development of different methods and the normalisation of such methods to ensure uniform analysis conditions for every algorithm.

This efficiency measurement is known as __asymptotic analysis__, and it provides information on the computational cost (or complexity) of an algorithm as a function of different parameters, including the input size.


## Time Complexity

Imagine a simple `for` loop that iterates through `n` elements:
`for i in range(n)`.




![](images/BigO_1.gif)

Now, imagine a nested `for` loop:
```
for i in range(n):
    for j in range(n):
        pass  # constant-time work goes here
```

![](images/BigO_2.gif)



The first graph indicates that the run time of the code increases proportionally to the number of inputs.

The second graph indicates that the run time of the code increases proportionally to the square of the number of inputs.

There is a shorthand for writing this: Big O notation, which gives an upper bound on how the run time grows with the input size and is usually quoted for the worst-case scenario (WCS).

It is written in the form `O(n`<sup>`2`</sup>`)`, where n is the number of inputs and n<sup>2</sup> describes how the run time grows with n. Thus, for the first and second cases, we have `O(n)` and `O(n`<sup>`2`</sup>`)`, respectively.
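
To make the two growth rates concrete, the following minimal sketch counts how many times each loop body runs rather than measuring wall-clock time. The helper names `count_single_loop` and `count_nested_loop` are illustrative and are not part of the original material.

```python
def count_single_loop(n):
    """Count the iterations of a single loop over n elements."""
    steps = 0
    for i in range(n):
        steps += 1  # one unit of work per element
    return steps


def count_nested_loop(n):
    """Count the iterations of a loop nested inside another loop."""
    steps = 0
    for i in range(n):
        for j in range(n):
            steps += 1  # one unit of work per (i, j) pair
    return steps


for n in (10, 100, 1000):
    print(n, count_single_loop(n), count_nested_loop(n))
```

Multiplying `n` by 10 multiplies the first count by 10 and the second by 100, which matches the linear and quadratic growth described above.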

### WCS 
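Here, we describe the WCS for a `for` loop. Consider a case where we are attempting to find a specific number in a list by checking each element in turn. The WCS would be finding the number at the very end of the list, or not finding it at all, because every element must be checked.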
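
As a rough illustration of the best and worst cases for such a search, the sketch below counts the comparisons made by a simple linear search. The function `linear_search` and the sample list are assumptions made for this example.

```python
def linear_search(items, target):
    """Return (index, comparisons); index is -1 if target is absent."""
    comparisons = 0
    for index, item in enumerate(items):
        comparisons += 1
        if item == target:
            return index, comparisons
    return -1, comparisons


numbers = list(range(1, 11))  # [1, 2, ..., 10]
print(linear_search(numbers, 1))   # best case: (0, 1)
print(linear_search(numbers, 10))  # worst case: (9, 10)
print(linear_search(numbers, 42))  # also worst case: (-1, 10)
```

In the worst case the number of comparisons equals the length of the list, which is why the search is described as `O(n)`.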

_Similarly, big Ω describes a lower bound (often associated with the best-case scenario), and big Θ describes a tight bound, where the upper and lower bounds match. However, we will focus on big O here._

### Rules


Having learnt how to define the Big O notation, we will explore some rules to follow when calculating the Big O of an algorithm.


#### Rule 1

If a function performs a constant-time sequence of steps `n` times, then its run time is `O(n)`. We have already seen this when iterating through a list.



#### Rule 2

If one function takes f(n) steps to run and another takes g(m) steps, and they run one after the other, then the complexities are added to give O(f(n) + g(m)), for example, O(n + m).

Consider the example:

```
for x in range(n):
    pass  # do something
for y in range(m):
    pass  # do something
```

The resulting time complexity will be O(n+m).
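
A small sketch of this rule, counting loop iterations for two loops that run one after the other. The function name `count_sequential` is illustrative.

```python
def count_sequential(n, m):
    """Count the iterations of two loops that run one after the other."""
    steps = 0
    for x in range(n):
        steps += 1  # work done in the first loop
    for y in range(m):
        steps += 1  # work done in the second loop
    return steps


print(count_sequential(1000, 10))    # 1010
print(count_sequential(1000, 5000))  # 6000
```

The total is always `n + m`, so neither input size can be ignored unless one is known to be much larger than the other.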

#### Rule 3
 
If one function takes f(n) steps to run and another takes g(m) steps, and g(m) grows faster than f(n), then the resulting complexity is O(g(m)). In summary, when adding complexities, the non-dominant terms are excluded.

Consider the example: 

```
for x in range(n):
    pass  # do something

for k in range(n):
    pass  # do something
    for j in range(n):
        pass  # do something
```

Here, one would expect the time complexity to be O(n + n<sup>2</sup>); however, notice that for considerably large values of n, the n<sup>2</sup> term will dominate the n term. Therefore, we can remove the non-dominant term, resulting in a time complexity of O(n<sup>2</sup>).
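
To see why the smaller term can be dropped, the sketch below compares `n + n**2` with `n**2` alone as `n` grows. The helper `dominance_ratio` is a hypothetical name used only for this illustration.

```python
def dominance_ratio(n):
    """Return (n + n^2) / n^2, which approaches 1 as n grows."""
    return (n + n**2) / n**2


for n in (10, 100, 1000, 10_000):
    print(f"n={n:>6}  n + n^2 = {n + n**2:>11}  ratio to n^2 = {dominance_ratio(n):.4f}")
```

The ratio moves ever closer to 1, meaning the linear term contributes a vanishing share of the total work.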

> As a rule of thumb, if there are two functions: f(n) and g(n), and the algorithm is of the form, 'do this, then that', add the complexities.

#### Rule 4

If there are two functions: f(n) and g(n), and the algorithm takes f(n) steps and for every step, another g(n) steps are taken, then multiply the complexities.

Consider the aforementioned nested loop as an example:
```
for k in range(n):
    pass  # do something
    for j in range(n):
        pass  # do something
```

The resultant complexity will be O(n * n) or O(n<sup>2</sup>).

> As a rule of thumb, if an algorithm is of the form, 'Do this every time you do that', multiply the complexities.

#### Rule 5

For rule 5, we exclude the constants. Big O describes how the processing time grows with the size of your input (in the WCS), not the exact number of operations. Consider the example:

```
for x in range(n):
    y = 3 * x
```

This will result in a complexity of O(n), not O(3n). This is because a constant factor does not change how the run time scales: doubling `n` doubles the run time in both cases.
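
The following sketch illustrates the point by counting operations for a loop that does one unit of work per element and one that does three. Both function names are assumptions made for this example.

```python
def steps_one_op(n):
    """Count operations when each element costs one unit of work."""
    steps = 0
    for x in range(n):
        steps += 1
    return steps


def steps_three_ops(n):
    """Count operations when each element costs three units of work."""
    steps = 0
    for x in range(n):
        steps += 3
    return steps


for n in (1000, 2000, 4000):
    print(n, steps_one_op(n), steps_three_ops(n))
```

Both counts double each time `n` doubles, so both loops scale in the same way even though one always does three times as much work.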

### Example: Fibonacci Big O
Let us consider a classical example: Recursive Fibonacci vs Loop Fibonacci. 

```
def recur_fibo(n):
    # Base cases: fib(0) = 0 and fib(1) = 1
    if n <= 1:
        return n
    # Each call makes two further recursive calls
    return recur_fibo(n - 1) + recur_fibo(n - 2)


def while_fib(n):
    # Handle the base cases, for which the loop below would never run
    if n <= 1:
        return n
    n1, n2 = 0, 1
    count = 0
    while count < n - 1:
        nth = n1 + n2
        # update values
        n1 = n2
        n2 = nth
        count += 1
    return nth
```

Now, we time each function.

```
import time

n = 35

# perf_counter is a high-resolution clock intended for timing code
t_0 = time.perf_counter()
print(recur_fibo(n))
print(f'Recursive Fib took {time.perf_counter() - t_0} s')

t_0 = time.perf_counter()
print(while_fib(n))
print(f'While Fib took {time.perf_counter() - t_0} s')
```

To improve your understanding, consider determining the big O of each algorithm.
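
One possible way to start, sketched here under the assumption that counting work is easier than timing it, is to count the function calls made by the recursive version and the loop iterations made by the iterative version. The helpers `count_recursive_calls` and `count_loop_iterations` are hypothetical and mirror the two functions above.

```python
def count_recursive_calls(n):
    """Return how many calls a naive recursive Fibonacci makes for n."""
    calls = 0

    def fib(k):
        nonlocal calls
        calls += 1
        if k <= 1:
            return k
        return fib(k - 1) + fib(k - 2)

    fib(n)
    return calls


def count_loop_iterations(n):
    """Return how many loop iterations an iterative Fibonacci needs for n."""
    iterations = 0
    n1, n2 = 0, 1
    for _ in range(n - 1):
        n1, n2 = n2, n1 + n2
        iterations += 1
    return iterations


for n in (5, 10, 20):
    print(n, count_recursive_calls(n), count_loop_iterations(n))
```

The iteration count grows in step with `n`, whereas the call count grows far more quickly, which is a useful hint when working out each big O.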

## Space Complexity



Here, we explore space complexity, which can also be expressed using big O. In essence, it describes how the amount of memory a program needs grows with the size of its input.

Thus, `O(n)` space indicates that the memory used grows in proportion to the number of inputs, for example, when one value is stored per processed input.

Iterating through a list, dictionary, set or tuple has linear time complexity (`O(n)`). However, checking whether a specific item is present is much faster in dictionaries and sets, which use hashing and take `O(1)` time on average, whereas a list must be scanned element by element.

```
my_dict = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4}
my_list = ['One', 'Two', 'Three', 'Four']
```

With only four items, we will not observe a difference when checking if, for example, 'Three' exists; however, for large inputs, the lookup in a dictionary is considerably faster than in a list.
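
A rough way to observe this is to time a membership check on a large list and on a dictionary with the same keys. This is a minimal sketch: the size `n`, the variable names and the number of repetitions are arbitrary choices, and the exact timings will vary between machines.

```python
from timeit import timeit

n = 100_000
big_list = list(range(n))
big_dict = dict.fromkeys(big_list)  # keys 0..n-1, every value is None
missing = -1  # an absent value forces the list to scan every element

# Repeat each membership check 100 times and report the total time
list_time = timeit(lambda: missing in big_list, number=100)
dict_time = timeit(lambda: missing in big_dict, number=100)

print(f"list lookup: {list_time:.6f} s for 100 checks")
print(f"dict lookup: {dict_time:.6f} s for 100 checks")
```

The list check has to walk through all 100,000 elements each time, while the dictionary check uses the hash of the key, so the gap widens as `n` grows.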

### Generators: a solution for space complexity

Storing all the values in a list takes `O(n)` space. Thus, large datasets require a correspondingly large amount of memory.

As an alternative, a generator produces its values one at a time and therefore takes `O(1)` space (provided that its values are not appended to another list).

```
from sys import getsizeof

def my_gen(n):
    for i in range(n):
        yield i

y = my_gen(20)
s = next(y)
```

```
print(s)
# Output: 0
```


Run the following cell multiple times to confirm that the generator is working.

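```
next(y)
# Example output after several runs: 4
```

```
print(getsizeof(y))
```

Observe that the generator object stores only its current state rather than all 20 items, so its size is small.

```
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```

```
print(getsizeof(my_list))
```

Although iterating through `my_list` has the same time complexity, the list must hold references to all 20 items in memory at once, so its size grows with the number of items. Note that `getsizeof` reports only the size of the list object itself, not the size of the integers it refers to.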
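
The difference becomes clearer when the number of items changes. The sketch below, which assumes CPython and uses a hypothetical generator named `count_up_to`, prints the reported sizes for several values of `n`; the exact byte counts depend on the Python version.

```python
from sys import getsizeof


def count_up_to(n):
    """Yield 0, 1, ..., n - 1 one value at a time."""
    for i in range(n):
        yield i


for n in (10, 1_000, 100_000):
    gen_size = getsizeof(count_up_to(n))
    list_size = getsizeof(list(range(n)))
    print(f"n={n:>7}  generator: {gen_size} bytes  list: {list_size} bytes")
```

The generator's size stays the same for every `n`, while the list's size keeps growing.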

## More Big O Possibilities

Big O can show many complexities, and as we progress in this unit, we will explore more of them.

![](images/BigO_3.jpg)

![](images/BigO_4.png)

The graph above covers small input sizes, where an algorithm with a worse big O can still come out ahead. For example, O(n^2) appears better than O(n) at the beginning. On the graph, however, this is only true when the input size is less than 1. In real code, the crossover can occur at a larger input size, because the actual run time also depends on the cost of each operation.
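
As a purely hypothetical illustration of such a crossover, suppose a linear algorithm spends 100 units of work per element while a quadratic one spends 1 unit per pair of elements. The cost functions below are invented for this sketch and do not describe any particular algorithm.

```python
def linear_cost(n):
    """Hypothetical O(n) algorithm with an expensive step (100 units each)."""
    return 100 * n


def quadratic_cost(n):
    """Hypothetical O(n^2) algorithm with a cheap step (1 unit each)."""
    return n * n


for n in (10, 50, 100, 200, 1000):
    if quadratic_cost(n) < linear_cost(n):
        cheaper = "quadratic"
    elif quadratic_cost(n) > linear_cost(n):
        cheaper = "linear"
    else:
        cheaper = "tie"
    print(f"n={n:>5}  linear={linear_cost(n):>7}  quadratic={quadratic_cost(n):>8}  cheaper: {cheaper}")
```

Under these assumed costs, the quadratic algorithm is cheaper for inputs below 100 elements and the linear one is cheaper above that, which is why big O is most informative for large inputs.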

> Complexity varies with the algorithm. Complexities should be compared when they are employed for the same purpose.

## Structures for Improving Complexity

Before proceeding, it is worth mentioning that Python offers many data structures that can save significant time and space. Many of these structures are in the standard-library module `collections`. To demonstrate, we will work with `defaultdict`. We will attempt to determine the element with the highest occurrence frequency in a list or another structure.

```
def get_number_with_highest_count(counts: dict) -> int:
    max_count = 0
    for number, count in counts.items():
        if count > max_count:
            max_count = count
            number_with_highest_count = number
    return number_with_highest_count


def most_frequent(numbers: list) -> int:
    counts = {}
    for number in numbers:
        if number in counts:
            counts[number] += 1
        else:
            counts[number] = 1

    return get_number_with_highest_count(counts)
```

Although this solution works, an improvement in conciseness can be achieved using `defaultdict`, which will save you the work of checking if an element is already in the dictionary.

```
from collections import defaultdict


def get_number_with_highest_count(counts: dict) -> int:
    max_count = 0
    for number, count in counts.items():
        if count > max_count:
            max_count = count
            number_with_highest_count = number
    return number_with_highest_count


def most_frequent(numbers: list) -> int:
    # If the key does not exist, `defaultdict(int)` creates it with the default value 0
    counts = defaultdict(int)
    for number in numbers:
        counts[number] += 1

    return get_number_with_highest_count(counts)
```

Although this code is more concise than the former, there is room for improvement. The same `collections` library has a class named `Counter`, which, as the name suggests, counts the number of times an element appears:

```
from collections import Counter

my_list = [1, 1, 2, 3, 4, 5, 6, 8, 8, 9, 1, 11, 1, 1, 14, 15, 16, 17, 18, 19]

counts = Counter(my_list)
print(counts)
# Output:
# Counter({1: 5, 8: 2, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 9: 1, 11: 1, 14: 1, 15: 1, 16: 1, 17: 1, 18: 1, 19: 1})
```


Thus, we can implement a Counter inside our code, as follows:

```
from collections import Counter


def get_number_with_highest_count(counts):
    max_count = 0
    for number, count in counts.items():
        if count > max_count:
            max_count = count
            number_with_highest_count = number
    return number_with_highest_count


def most_frequent(numbers):
    counts = Counter(numbers)
    return get_number_with_highest_count(counts)
```

Now, we test:

```
get_number_with_highest_count(counts)
# Output: 1
```

It appears that the code cannot be improved further. However, on paying close attention to the `get_number_with_highest_count` function, we could simply use Python's built-in `max` function. The problem is that, by default, applying `max` to a dictionary returns the largest key, not the key with the largest value. If you apply it to the values instead, you will obtain the largest value; however, you will not know the key to which it corresponds.

As a solution, we can apply a lambda function. Recall that functions such as `sorted`, `min` and `max` accept a `key` argument, which defines the rule used to compare the elements and, here, determines which one is the maximum.

```
max(counts.values())
# Output: 5
```

```
from collections import Counter


def get_number_with_highest_count(counts):
    '''
    Get a dict and return the key corresponding to the highest value
    '''
    return max(
        counts,
        key=lambda number: counts[number] # maximise by value
    )

def most_frequent(numbers):
    counts = Counter(numbers)
    return get_number_with_highest_count(counts)
```

Let us do a final check:

```
most_frequent(my_list)
# Output: 1
```

Everything works properly, achieved with relatively few lines of code.
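
As a further alternative, not used in the walkthrough above, `Counter` also provides a `most_common` method that returns elements ordered by count. The sketch below shows how it could replace the helper function. It assumes a non-empty input, since indexing the result of `most_common(1)` fails on an empty list.

```python
from collections import Counter

my_list = [1, 1, 2, 3, 4, 5, 6, 8, 8, 9, 1, 11, 1, 1, 14, 15, 16, 17, 18, 19]


def most_frequent(numbers):
    """Return the most common element using Counter.most_common."""
    # most_common(1) returns a list holding one (element, count) pair
    element, count = Counter(numbers).most_common(1)[0]
    return element


print(most_frequent(my_list))  # 1
```

Whether this is preferable to the `max` version is largely a matter of taste; both make a single pass over the counts.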

These examples serve as demonstrations and are by no means a measure of the difficulty of applying these techniques. Bear in mind that your understanding of these techniques will improve with time and practice. Eventually, you will be able to detect when your code can be improved. Additionally, you may have observed that we used two new classes, which you may not be familiar with. Since there are many Python libraries, it is important to develop your research skills so that you can take on new problems and tackle errors.

## Tools for Measuring Complexity

We have learnt that time and space complexity improvements are essential. Conveniently, to ascertain that you are on the right path, Python offers ways to measure both of them.

### timeit

`timeit` is a module that can be used in the CLI or in your code.

#### In the command line

```
!python -m timeit "total=sum(range(1000))"
```

#### In code

```
from timeit import timeit

# Run the statement 5000 times; timeit returns the total time in seconds
result = timeit(stmt='total=sum(range(1000))', number=5000)
print(result/5000)  # average time per run
```

This tool can be utilised to determine the run time of code snippets.
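
`timeit` can also be used to check empirically how run time scales with input size. The sketch below times a single loop and a nested loop for a few values of `n`. The function names and the chosen sizes are assumptions for this example, and the measured times will differ between machines.

```python
from timeit import timeit


def sum_linear(n):
    """Single loop: the work grows in proportion to n."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_pairs(n):
    """Nested loop: the work grows in proportion to n squared."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += 1
    return total


for n in (100, 200, 400):
    t_linear = timeit(lambda: sum_linear(n), number=20)
    t_quadratic = timeit(lambda: sum_pairs(n), number=20)
    print(f"n={n:>4}  linear: {t_linear:.5f} s  quadratic: {t_quadratic:.5f} s")
```

Each time `n` doubles, the first timing should roughly double and the second should roughly quadruple, although measurement noise can blur the pattern for small inputs.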

### CPU profiling

You can evaluate more metrics from your code, such as the number of times that each piece of code ran, the time spent on each part, and the time spent on each call. To check, download the following file, and run the following cell.

```
!wget https://aicore-files.s3.amazonaws.com/Foundations/Software_Engineering/cpu.py
```

```
!python -m cProfile --sort tottime cpu.py
```

As can be observed, information is provided on everything, including all imports and calls.

This will help you identify bottlenecks in your code and optimise your algorithms.
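
The profiler can also be driven from within Python code. The following minimal sketch profiles a deliberately slow function and prints the five entries with the highest total time. The function `slow_sum` is a made-up example and is unrelated to the `cpu.py` file above.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately quadratic: the inner loop runs n times per outer step."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += 1
    return total


# Record profiling data only while the function of interest runs
profiler = cProfile.Profile()
profiler.enable()
slow_sum(300)
profiler.disable()

# Collect the report in a string, sorted by total time, top 5 entries
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("tottime").print_stats(5)
report = stream.getvalue()
print(report)
```

Profiling only the section of interest keeps the report short, which can make the slowest function easier to spot than in a whole-script profile.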

## Conclusion

At this point, you should have a good understanding of
- big O notation.
- time complexity and the rules to follow.
- space complexity and generators.
- how to measure and improve complexity.

