# Module 4 - Intro to Algorithms

An algorithm is a finite sequence of well-defined steps that, when followed exactly, solves a specific problem.

## Complexity

### Examples, Measuring

There are usually multiple ways to solve the same problem. In coding, complexity refers to computational complexity: how the time (or memory) a computer needs to run an algorithm grows with the size of the input.

**Example**: Sum of all integers from 1 to N

In [24]:
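def sumOfN(n):
    # Add the integers 1..n one at a time (n additions).
    total = 0  # 'total' avoids shadowing the built-in sum()
    for i in range(1, n + 1):
        total = total + i
    return total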

In [25]:
print (sumOfN(10))

55


We can measure how long a function takes with the `time` module:

In [26]:
import time

def sumOfN2(n):
    start = time.time()

    sum = 0
    for i in range (1,n+1):
        sum = sum + i
    
    end = time.time()

    return sum, end-start

In [27]:
print(sumOfN2(10000))
print(sumOfN2(100000))
print(sumOfN2(1000000))

#Increases linearly

(50005000, 0.0)
(5000050000, 0.003499269485473633)
(500000500000, 0.03959941864013672)
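The `0.0` reading for the smallest input suggests the clock was too coarse to register such a short run. A minimal sketch of the same measurement, assuming the standard library's `time.perf_counter()` (a higher-resolution clock intended for timing); `timed_sum` is a hypothetical name, not part of the original notebook:

```python
import time

def timed_sum(n):
    """Sum 1..n with a loop and report the elapsed time in seconds."""
    start = time.perf_counter()  # high-resolution clock intended for timing
    total = 0
    for i in range(1, n + 1):
        total += i
    elapsed = time.perf_counter() - start
    return total, elapsed

# Timings vary by machine, so no expected values are shown here.
for n in (10_000, 100_000, 1_000_000):
    total, elapsed = timed_sum(n)
    print(f"n={n:>9,}  sum={total}  time={elapsed:.6f}s")
```

For repeated, more careful measurements, the `timeit` module is usually a better fit than a single start/stop pair.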


Optimizing an algorithm can reduce its processing time dramatically. The closed-form formula n(n+1)/2 is a faster way to calculate the sum of 1 to N:

In [28]:
def sumOfN3(n):
    start = time.time()
    sum = n * (n + 1) // 2  # integer division keeps the result an exact int
    end = time.time()
    return sum, end-start

In [29]:
print(sumOfN3(10000))
print(sumOfN3(100000))
print(sumOfN3(1000000))

#Does not increase, runs in constant time!

(50005000, 0.0)
(5000050000, 0.0)
(500000500000, 0.0)


### Big-O Notation: `O( f(n) )`

- Measures time and space complexity of an algorithm
- Asymptotic complexity as n grows large

If an algorithm's cost is g(n), then g(n) is O(f(n)) if there exist a positive constant M and a threshold n<sub>0</sub> such that |g(n)| <= M|f(n)| for all n >= n<sub>0</sub>. In other words, beyond some input size, g never grows faster than a constant multiple of f.
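As a worked instance of the definition above, take the hypothetical cost function g(n) = 3n<sup>2</sup> + 5n + 2 and f(n) = n<sup>2</sup>. The constants M = 4 and n<sub>0</sub> = 6 are chosen for illustration: 3n<sup>2</sup> + 5n + 2 <= 4n<sup>2</sup> exactly when n<sup>2</sup> - 5n - 2 >= 0, which holds for every integer n >= 6. The sketch below only spot-checks a finite range; the algebra is what proves the bound.

```python
def g(n):
    """Hypothetical operation count of some algorithm."""
    return 3 * n**2 + 5 * n + 2

def f(n):
    """Candidate growth function."""
    return n**2

M, n0 = 4, 6  # illustrative constants, see the algebra in the lead-in

# Spot-check |g(n)| <= M * |f(n)| for n0 <= n < 10,000.
holds = all(abs(g(n)) <= M * abs(f(n)) for n in range(n0, 10_000))
print(holds)                    # True
print(g(5) <= M * f(5))         # False: the bound does not hold yet at n = 5
```

So g(n) is O(n<sup>2</sup>): the lower-order terms 5n + 2 are absorbed by the constant M once n is large enough.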


Variables:
- n: number of items in the input set
- f: function describing how the number of operations grows with n

Calculation steps:

1. Break the algorithm/function into individual operations
2. Calculate the Big-O of each operation
3. Add up the Big-O of each operation
4. Remove constants
5. Find the highest-order term: this will be the Big-O of the overall algorithm/function

n<sup>3</sup> > n<sup>2</sup> > n*log(n) > n > log(n) > 1

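By convention in computer science, a logarithm written without a base is assumed to be base 2, because many algorithms repeatedly halve their input. In Big-O notation the base does not matter, since logarithms of different bases differ only by a constant factor.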

O(1):
- runs in constant time
- not dependent on the size of the input data
- time required to run is the same regardless of input size
- addition, subtraction, and most basic lookups

O(n):
- runs in linear time
- the number of operations increases linearly with the size of the input data
- the operation is performed for *each item* in an input
- `for` loops; removing or inserting at the front of a list with `pop(0)` or `insert(0, item)` (`.shift()` and `.unshift()` in JavaScript)

O(n<sup>2</sup>):
- runs in quadratic time
- performs a linear operation for *each item* in an input, so the work grows with the square of the input size
- nested loops (since O(n*n) = O(n<sup>2</sup>))

Once you have more than about a million data points, quadratic or higher-order algorithms are generally not practical.

If O(n) code took 1 hour to process n = 1000, then with the same cost per operation:

- O(n log n) would take about 10 hours (log<sub>2</sub> 1000 ≈ 10)
- O(n<sup>2</sup>) would take 1,000 hours
- O(n<sup>3</sup>) would take 1,000,000 hours
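The figures above can be reproduced with a short calculation. This is a rough sketch that assumes every algorithm spends the same time per basic operation, which real code rarely does; `scaled_hours` is a hypothetical helper.

```python
import math

def scaled_hours(n, base_hours=1.0):
    """Estimate run times relative to an O(n) algorithm that takes base_hours."""
    per_op = base_hours / n  # hours per basic operation (assumed identical for all)
    return {
        "O(n)": per_op * n,
        "O(n log n)": per_op * n * math.log2(n),
        "O(n^2)": per_op * n**2,
        "O(n^3)": per_op * n**3,
    }

for name, hours in scaled_hours(1000).items():
    print(f"{name:<11} {hours:>14,.1f} hours")
```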

#### `O(n`<sup>`2`</sup>`)`

In [30]:
def example1(n):
    test = 0
    for i in range(n):
        for j in range(n):
            test = test+i*j
    return test

#### `O(n)`

In [31]:
def example2(n):
    test = 0
    for i in range(n):
        test = test+1

    for j in range(n):
        test = test - 1
    return test

#### `O(log`<sub>`2`</sub>`(n))`

The loop variable is halved on each iteration, so the loop runs about log<sub>2</sub>(n) times.

In [32]:
def example3(n):
    i = n
    while i > 0:
        k = 2+2
        i = i//2
    return k
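To see the logarithmic growth directly, the iterations of the halving loop can be counted. A minimal sketch; `count_halvings` is a hypothetical helper that mirrors the loop in `example3`:

```python
def count_halvings(n):
    """Count how many times the halving loop runs for a positive integer n."""
    steps = 0
    i = n
    while i > 0:
        i = i // 2
        steps += 1
    return steps

for n in (8, 1024, 1_000_000):
    print(n, count_halvings(n))
```

For n = 1024 the loop runs 11 times, that is floor(log<sub>2</sub> n) + 1. Multiplying the input by roughly a thousand (1024 to 1,000,000) adds only 9 more iterations.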

## Anagram Example

An anagram is a word or phrase formed by rearranging the letters of another word or phrase.

The task is to check whether two strings are anagrams of each other. Strings of different lengths cannot be anagrams, so we assume both inputs have the same length.

### Brute Force

1. Select first letter from string 1
2. Check whether the letter exists in string 2
- -- if it does exist, delete the letter from string 2, keep going
- -- if it does not exist, STOP! Can't be anagrams
3. Select the next letter from string 1, and repeat step 2.

In [33]:
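def anagramSolution1(s1,s2):
    if len(s1) != len(s2):  # different lengths can never be anagrams
        return False
    alist = list(s2)  # mutable copy so matched letters can be crossed off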
    pos1 = 0
    stillOK = True
    while pos1 < len(s1) and stillOK:
        pos2 = 0
        found = False
        while pos2 < len(alist) and not found:
            if s1[pos1] == alist[pos2]:
                found = True
            else:
                pos2 = pos2+1
        if found:
            alist[pos2] = None
        else:
            stillOK = False
        pos1 = pos1+1
    return stillOK

In [34]:
print(anagramSolution1('abcd','dcba'))

True


This solution is O(n<sup>2</sup>): each of the n characters in s1 may require scanning up to n characters of s2.

### Sort and Compare

We sort both input strings and then compare them position by position.

In [35]:
def anagramSolution2(s1,s2):
    if len(s1) != len(s2):  # avoids an IndexError and false matches
        return False
    alist1 = list(s1)
    alist2 = list(s2)
    alist1.sort()
    alist2.sort()
    pos = 0
    matches = True
    while pos < len(s1) and matches:
        if alist1[pos] == alist2[pos]:
            pos = pos+1
        else:
            matches = False
    return matches

In [36]:
print(anagramSolution2('abcde','edcba'))

True


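Sorting takes O(n log n) on average.

This solution performs two sorts and one linear comparison pass: n log n + n log n + n.

Dropping constants and lower-order terms, this solution is O(n log n).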
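The same sort-and-compare idea can be written more compactly with the built-in `sorted()`, which returns a sorted list of characters for each string. A minimal sketch; `is_anagram_sorted` is a hypothetical name:

```python
def is_anagram_sorted(s1, s2):
    """Return True if s1 and s2 contain the same characters with the same counts."""
    # Comparing two lists with == checks lengths and every position in turn.
    return sorted(s1) == sorted(s2)

print(is_anagram_sorted('abcde', 'edcba'))  # True
print(is_anagram_sorted('abc', 'abd'))      # False
```

The complexity is unchanged: the two sorts dominate, so this is still O(n log n).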

### "Worst Case"

The worst-case approach would be to generate every possible rearrangement of s1 and compare each one with s2.

A string of n characters has n! orderings, so iterating over all of them makes this approach at least O(n!).
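For illustration only, the exhaustive approach could be sketched with `itertools.permutations` from the standard library. `is_anagram_exhaustive` is a hypothetical name, and this is only usable for very short strings:

```python
from itertools import permutations

def is_anagram_exhaustive(s1, s2):
    """Try every ordering of s1 until one equals s2 (factorial time)."""
    target = tuple(s2)  # permutations() yields tuples of characters
    return any(candidate == target for candidate in permutations(s1))

print(is_anagram_exhaustive('abc', 'cab'))  # True
print(is_anagram_exhaustive('abc', 'abd'))  # False
```

A 10-character string already has 3,628,800 orderings, which is why this approach is only a thought experiment.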

### "Best Case": Binned Count

In [37]:
def anagramSolution4(s1,s2):
    c1 = [0]*26  # one counter per lowercase letter 'a'..'z' (other characters are not supported)
    c2 = [0]*26
    for i in range(len(s1)):
        pos = ord(s1[i])-ord('a')
        c1[pos] = c1[pos]+1
    for i in range(len(s2)):
        pos = ord(s2[i])-ord('a')
        c2[pos] = c2[pos]+1
    j = 0
    stillOK = True
    while j<26 and stillOK:
        if c1[j]==c2[j]:
            j=j+1
        else:
            stillOK = False
    return stillOK

In [38]:
print(anagramSolution4('apple','pleap'))

True


This solution takes n + n + 26 steps: one pass over each string plus a fixed 26 comparisons.

This solution is O(n).
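The fixed 26-slot count list only works for lowercase a-z. A variant of the same counting idea, sketched here with `collections.Counter` from the standard library, handles arbitrary characters; `is_anagram_counted` is a hypothetical name:

```python
from collections import Counter

def is_anagram_counted(s1, s2):
    """Compare per-character counts; one linear pass over each string."""
    return Counter(s1) == Counter(s2)

print(is_anagram_counted('apple', 'pleap'))  # True
print(is_anagram_counted('Apple', 'pleap'))  # False: 'A' and 'a' are different characters
```

Each `Counter` is built in a single pass, so the approach remains O(n) on average.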

## Complexity of Python Lists

### Complexity of List Creation: Examples

#### Concatenations

In [39]:
def test1():
    l = []
    for i in range(1000):
        l = l + [i]
    return l

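Concatenating with `+` builds a new list, copying every element of both operands. Adding a list of length k to a list of length n therefore costs O(n + k).

The for loop runs n times, and each pass copies the whole list built so far, which grows by one element each time.

Overall, this method is O(n<sup>2</sup>).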
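One way to see the cost of building a list by repeated concatenation is to count how many elements get written. In CPython, `l + [i]` allocates a new list and copies both operands into it, so each pass writes the whole list again. A minimal sketch; `elements_copied_by_concat` is a hypothetical helper:

```python
def elements_copied_by_concat(n):
    """Total elements written when a list is built with l = l + [i]."""
    copied = 0
    l = []
    for i in range(n):
        l = l + [i]       # allocates a new list holding len(l) + 1 elements
        copied += len(l)  # every element of the new list was written
    return copied

print(elements_copied_by_concat(1000))  # 500500
```

The total is 1 + 2 + ... + n = n(n+1)/2 writes, so the loop as a whole grows quadratically even though it only has n iterations.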

#### Appending

In [40]:
def test2():
    l = []
    for i in range(1000):
        l.append(i)  # append() modifies the list in place and returns None
    return l

Appending only adds an item to the end of the list without iterating over it, so each call is O(1) (amortized).

The for loop runs n times, so overall this method is O(n).

#### List Comprehension

In [41]:
def test3():
    l = [i for i in range(1000)]
    return l

Creating a `range` object is O(1). The list comprehension behaves like a for loop that appends, visiting each element once as the list is built.

It is O(n).

#### List with Range

In [42]:
def test4():
    l = list(range(1000))
    return l

Calling `list(iterable)` has a complexity of O(len(iterable)). Since creating the `range` object itself is O(1), this solution is O(n).

### `pop()` vs `pop(0)`

`pop()` runs in constant time.

`pop(0)` runs in linear time, because removing the first element means *every remaining item must shift down one index*.
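When items must be removed from the front repeatedly, one common alternative is `collections.deque`, whose `popleft()` is O(1). This is a sketch of the trade-off, not something the notes above prescribe; both function names are hypothetical:

```python
from collections import deque

def drain_list_front(items):
    """Remove items from the front of a list: each pop(0) shifts the rest."""
    remaining = list(items)
    out = []
    while remaining:
        out.append(remaining.pop(0))  # O(n) per call
    return out

def drain_deque_front(items):
    """Remove items from the front of a deque: no shifting needed."""
    queue = deque(items)
    out = []
    while queue:
        out.append(queue.popleft())   # O(1) per call
    return out

print(drain_list_front(range(5)) == drain_deque_front(range(5)))  # True
```

Both functions produce the same result; draining n items is O(n<sup>2</sup>) with the list and O(n) with the deque.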

### Summary Table

| Operation        | Complexity |
|------------------|------------|
| Index[]          | O(1)       |
| Index Assignment | O(1)       |
| Append           | O(1)       |
| pop()            | O(1)       |
| pop(i)           | O(n)       |
| insert(i,item)   | O(n)       |
| remove(item)     | O(n)       |
| sort()           | O(nlogn)   |
| concatenate (+)  | O(n + k)   |
| contains(in)     | O(n)       |

## Complexity of Python Dictionaries

### Indexing: Dictionary vs List

A list must iterate over its items to see whether a value exists, so checking membership in a list is O(n).

In contrast, dictionaries use hashing to find matches in constant time on average. Checking whether a key exists in a dictionary is O(1).
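The difference can be observed with the standard library's `timeit` module. A rough sketch with arbitrary sizes; actual timings depend on the machine, so none are shown:

```python
import timeit

n = 100_000
as_list = list(range(n))
as_dict = dict.fromkeys(as_list)  # same values, stored as dictionary keys
missing = -1  # worst case for the list: every element is checked

list_time = timeit.timeit(lambda: missing in as_list, number=20)
dict_time = timeit.timeit(lambda: missing in as_dict, number=20)
print(f"list: {list_time:.5f}s  dict: {dict_time:.5f}s")
```

Increasing `n` should make the list lookup slower in proportion, while the dictionary lookup should stay roughly flat.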

### Summary Table

| Operation   | Complexity |
|-------------|------------|
| Copy        | O(n)       |
| get item    | O(1)       |
| set item    | O(1)       |
| delete item | O(1)       |
| iteration   | O(n)       |

## Complexity of Sorting

Sorting is the process of placing elements of a collection in an order.

Any algorithm that compares items and arranges them in a specific order is a sorting algorithm.

Many, many sorting algorithms have been developed and analyzed. Sorting a large number of items can take a substantial amount of computing resources.

Sorting relies on two basic operations:

1. Compare two values to determine which should come first
- the number of comparisons determines the complexity
2. Put items in the right order
- move to a specific position
- swap two items

### Bubble Sort

One of the simplest sorting algorithms that exists.

Compares adjacent items and exchanges those that are out of order.

In each pass, the largest (or smallest) item will move to the end of the list.

This results in n(n-1)/2 comparisons. Overall, this is O(n<sup>2</sup>)!

It is also less efficient than other O(n<sup>2</sup>) algorithms because of the large number of exchanges.

#### Improvements?

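If the list is already sorted, a pass makes no exchanges. Bubble sort can therefore stop as soon as it completes a pass without any exchanges, which takes only (n-1) comparisons on a sorted list.

Best case O(n); worst case still n(n-1)/2 comparisons, which is O(n<sup>2</sup>).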
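A minimal sketch of bubble sort with the early-exit improvement. The comparison counter is added here only to make the best and worst cases visible; `bubble_sort` is a hypothetical name and this is one of several ways to write it:

```python
def bubble_sort(items):
    """Return (sorted copy, number of comparisons) using early-exit bubble sort."""
    data = list(items)  # work on a copy so the input is left untouched
    comparisons = 0
    # After each pass the largest remaining item sits at index 'end'.
    for end in range(len(data) - 1, 0, -1):
        swapped = False
        for i in range(end):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:  # a pass with no exchanges means the list is sorted
            break
    return data, comparisons

print(bubble_sort([1, 2, 3, 4, 5]))  # ([1, 2, 3, 4, 5], 4)
print(bubble_sort([5, 4, 3, 2, 1]))  # ([1, 2, 3, 4, 5], 10)
```

With n = 5, the sorted input needs n-1 = 4 comparisons and the reversed input needs n(n-1)/2 = 10.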

### Merge Sort

A "divide and conquer" strategy: the list is split into sublists, and pairs of sorted sublists are repeatedly merged into a single sorted list.

1. Make each item a 'sublist' of its own
- -- N steps
2. At each pass, merge pairs of 'sublists' into sorted lists
- -- The merges in one pass take at most N comparisons in total
3. Repeat until all sublists are merged
- -- Results in log n "passes", with at most n comparisons in each pass

Each "row" (pass) costs about n comparisons, and there are at most log n "rows" in the process.

Overall, this is O(n log n) in complexity, making it one of the most efficient methods for sorting large data sets.

However, it needs additional memory to hold the sorted sublists while merging.
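The passes described above can be sketched as a bottom-up merge sort. This is an illustrative version under the assumption that items are comparable with `<=`; `merge` and `merge_sort` are hypothetical names, and the more common recursive (top-down) formulation has the same complexity:

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two
    merged.extend(right[j:])  # still has items left
    return merged

def merge_sort(items):
    """Bottom-up merge sort: start with 1-item sublists, merge pairs per pass."""
    sublists = [[x] for x in items]  # step 1: each item is its own sublist
    while len(sublists) > 1:         # one loop iteration = one pass
        next_pass = []
        for k in range(0, len(sublists), 2):
            if k + 1 < len(sublists):
                next_pass.append(merge(sublists[k], sublists[k + 1]))
            else:
                next_pass.append(sublists[k])  # odd one out carries over
        sublists = next_pass
    return sublists[0] if sublists else []

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

The new lists created on every pass are the additional memory cost mentioned above.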