# _10. Understanding Program Efficiency, Part 1_

Notebook follows along with the [tenth video](https://www.youtube.com/watch?v=o9nW0uBqvEo&t=6s) in MIT's 6.0001 Introduction to Computer Science and Programming in Python, Fall 2016.

### _Want to Understand Efficiency of Programs_

- how can we reason about an algorithm in order to predict the amount of time it will need to solve a problem of a particular size?
- how can we relate choices in algorithm design to the time efficiency of the resulting algorithm?
    - are there fundamental limits on the amount of time we will need to solve a particular problem?
- computers are fast and getting faster - so maybe efficient programs don't matter?
    - but data sets can be very large (e.g., in 2014, Google served 30,000,000,000,000 pages, covering 100,000,000 GB - how long to search brute force?)
    - thus, simple solutions may simply not scale with size in an acceptable manner
- how can we decide which option for a program is the most efficient?


### _How To Evaluate Efficiency of Programs_

- measure with a timer
- count the operations
- abstract notion of order of growth
    - will argue that this is the most appropriate way of assessing the impact of choices of algorithm in solving a problem; and in measuring the inherent difficulty in solving a problem


### _Timing a program_

- use `time` module
- recall that importing a module brings its functions and classes into your own file

In [2]:
import time

def c_to_f(c):
    return c*9/5 + 32

# start clock
t0 = time.perf_counter()  # time.clock() was removed in Python 3.8
# call function
c_to_f(100000)
# stop clock
t1 = time.perf_counter() - t0
print(f't = {t1}')


t = 3.799999999998249e-05


### _Timing Programs is Inconsistent_

- GOAL: to evaluate different algorithms
- running time varies between algorithms
- running time varies between implementations
- running time varies between computers
- running time is not predictable based on small inputs
- time varies for different inputs, but we cannot really express a relationship between inputs and time
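
One way to see this inconsistency is to time the same call several times. The sketch below assumes `time.perf_counter` is available (Python 3.3+) and uses a hypothetical helper, `time_calls`, that is not part of the lecture. The printed values will differ between runs and between machines.

```python
import time

def c_to_f(c):
    return c * 9 / 5 + 32

def time_calls(func, arg, repeats=5):
    """Return a list of elapsed times (in seconds) for repeated calls of func(arg)."""
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        func(arg)
        timings.append(time.perf_counter() - t0)
    return timings

# hypothetical helper: same function, same input, timed five times
timings = time_calls(c_to_f, 100000)
for t in timings:
    print(f't = {t}')
```

Even with identical code and input, the five measurements rarely agree, which is why a timer alone is a weak basis for comparing algorithms.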

### _Counting Operations_

- assume these steps take **constant time**
    - mathematical operations
    - comparisons
    - assignments
    - accessing objects in memory
- then count the number of operations executed as a function of the size of the input


In [0]:
def c_to_f(c):
    return c*9.0/5 + 32 # 3 operations

def mysum(x):
    total = 0 # 1 operation
    for i in range(x+1): # one operation per iteration, looping x+1 times
        total += i # two operations
    return total

# mysum --> about 1 + 3x operations (range(x+1) actually loops x+1 times)
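
The count for `mysum` can be checked by instrumenting the loop. This is a minimal sketch using the counting convention from the notes (one operation for each assignment of the loop variable, two for each `+=`); `count_mysum_ops` is a hypothetical name. Because `range(x+1)` runs `x + 1` times, the exact count under this convention is `3x + 4`, which is still linear in `x` like the `1 + 3x` quoted above.

```python
def count_mysum_ops(x):
    """Count the operations mysum(x) performs, using the notes' convention."""
    ops = 1                  # total = 0
    for _ in range(x + 1):
        ops += 1             # assigning the next value to the loop variable
        ops += 2             # the addition and the assignment in the += statement
    return ops

for x in (10, 100, 1000):
    print(x, count_mysum_ops(x))
```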

### _Counting Operations is Better, But Still..._

- GOAL: to evaluate algorithms
- count depends on algorithm
- count depends on implementations
- count independent of computers
- no clear definition of which operations to count
- count varies for different inputs, but we can come up with a relationship between inputs and counts

### _Different Inputs Change How the Program Runs_

In [0]:
# a function that searches for an element in a list
def search_for_elmt(L, e):
    for i in L:
        if i == e:
            return True
    return False

- at most, 3 steps within loop
- when `e` is not in the list, the loop goes through the entire list before returning `False`

### _Best, Average, Worst Cases_

- suppose you are given a list `L` of some length `len(L)`
- best case: minimum running time over all possible inputs of a given size `len(L)`
    - constant for `search_for_elmt`
    - first element in any list
- average case: average running time over all possible inputs of a given size `len(L)`
    - practical measure
- worst case: maximum running time over all possible inputs of a given size `len(L)`
    - linear in length of list for `search_for_elmt`
    - must search entire list and not find it
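
The three cases can be made concrete by counting how many elements the search examines. This is a sketch only: `count_search_steps` is a hypothetical variant of `search_for_elmt`, and the average here assumes every element of the list is an equally likely target.

```python
def count_search_steps(L, e):
    """Return (found, number of elements examined) for a linear search."""
    examined = 0
    for i in L:
        examined += 1
        if i == e:
            return True, examined
    return False, examined

L = list(range(1, 101))   # 1, 2, ..., 100

best = count_search_steps(L, 1)      # target is the first element
worst = count_search_steps(L, -1)    # target is absent
# average over every target that is present in the list
average = sum(count_search_steps(L, e)[1] for e in L) / len(L)

print(best, worst, average)
```

For this list of 100 elements the best case examines 1 element, the worst case examines all 100, and the average over present targets is 50.5.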

### _Exact Steps vs O()_

In [0]:
def fact_iter(n):
    '''assumes n an int >= 0'''
    answer = 1 # one operation
    while n > 1: # test n
        answer *= n # two operations
        n -= 1 # two operations
    return answer

- computes factorial
- number of steps: `5n + 2`
- worst case asymptotic complexity
    - ignore additive constants
    - ignore multiplicative constants
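
To see why the constants can be ignored, the sketch below takes the `5n + 2` step count from the notes at face value (the helper name `steps` is hypothetical) and checks what happens when `n` doubles.

```python
def steps(n):
    """Step count quoted in the notes for fact_iter: 5n + 2."""
    return 5 * n + 2

for n in (10, 100, 1000, 10000):
    ratio = steps(2 * n) / steps(n)
    print(n, steps(n), round(ratio, 4))
```

The ratio approaches 2 as `n` grows: doubling the input roughly doubles the work, regardless of the 5 and the 2. That is the behavior `O(n)` captures.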

### _Simplification Examples_

- drop constants and multiplicative factors
- focus on dominant terms
- polynomial

```
O(n**2)     : n**2 + 2n + 2
O(n**2)     : n**2 + 100000n + 3**1000
O(n)        : log(n) + n + 4
O(n log n)  : 0.0001*n*log(n) + 300n
O(3**n)     : 2*n**30 + 3**n
```
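
A quick numeric check of two of these simplifications: divide each full expression by its claimed dominant term and watch the ratio settle near 1. The function names below are hypothetical, and the natural logarithm is an assumption (the base does not affect the order of growth).

```python
import math

def quadratic_example(n):
    return n**2 + 2*n + 2           # simplified above to O(n**2)

def linear_example(n):
    return math.log(n) + n + 4      # simplified above to O(n)

for n in (10, 1000, 100000):
    print(n, quadratic_example(n) / n**2, linear_example(n) / n)
```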

### _Types of Orders of Growth_

- constant
- linear
- quadratic
- logarithmic
- n log n
- exponential

### _Analyzing Programs and their Complexity_

- combine complexity classes
    - analyze statements inside functions
    - apply some rules, focus on dominant term
- Law of Addition for `O()`:
    - used with sequential statements
    - `O(f(n)) + O(g(n))` is `O(f(n) + g(n))`

In [0]:
# for example 
for i in range(n): # O(n)
    print('a')
for j in range(n * n): # O(n * n)
    print('b')

- is `O(n) + O(n * n)` = `O(n + n**2)` = `O(n**2)`
    - because the two loops run one after the other, their costs add, and the `n**2` term dominates
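
Counting iterations makes the difference between sequential and nested loops visible. In this sketch (both function names are hypothetical), sequential loops add their costs, while a loop nested inside another multiplies them. In both cases the total is `O(n**2)`.

```python
def sequential_count(n):
    """Two loops one after the other: iteration counts add."""
    count = 0
    for i in range(n):
        count += 1
    for j in range(n * n):
        count += 1
    return count          # n + n**2

def nested_count(n):
    """One loop inside another: iteration counts multiply."""
    count = 0
    for i in range(n):
        for j in range(n):
            count += 1
    return count          # n * n

for n in (10, 100):
    print(n, sequential_count(n), nested_count(n))
```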

### _Complexity Classes_

- `O(1)` denotes constant running time
- `O(log n)` denotes logarithmic running time
- `O(n)` denotes linear running time
- `O(n log n)` denotes log-linear running time
- `O(n**c)` denotes polynomial running time (c is a constant)
- `O(c**n)` denotes exponential running time (c is a constant being raised to a power based on size of input)

- aim for an algorithm that is as high up this list as possible!
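
The gap between these classes widens quickly as `n` grows. The sketch below tabulates one representative function per class. Choosing `c = 2` for both the polynomial and the exponential case, and base 2 for the logarithm, are assumptions made only for illustration.

```python
import math

# one representative function per complexity class (c = 2 is an assumption)
growth = {
    'O(1)':       lambda n: 1,
    'O(log n)':   lambda n: math.log2(n),
    'O(n)':       lambda n: n,
    'O(n log n)': lambda n: n * math.log2(n),
    'O(n**2)':    lambda n: n**2,
    'O(2**n)':    lambda n: 2**n,
}

for n in (10, 20, 40):
    print(f'n = {n}')
    for name, func in growth.items():
        print(f'    {name:<11} {func(n):,.0f}')
```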

### _Linear Search on Unsorted List_

In [0]:
def linear_search(L, e):
    found = False
    for i in range(len(L)):
        if e == L[i]:
            found = True # returning True here would speed up the average case, but would not change the worst case
    return found

# must look through all elements to decide it's not there

- `O(len(L)) for the loop * O(1) to test if e == L[i]`
- `O(1 + 4n + 1) = O(4n + 2) = O(n)`

### _Linear Search on Sorted List_

In [0]:
def search(L, e):
    for i in range(len(L)):
        if L[i] == e:
            return True
        if L[i] > e:
            return False
    return False

- only need to look until reaching a number greater than `e`
- `O(len(L)) for the loop * O(1) to test if e == L[i]`
- overall complexity is **O(n) - where n is len(L)**
- Note: order of growth is same, though run time may differ for two search methods
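
The difference in run time, despite the same order of growth, shows up when counting the elements each search examines. This is a sketch with hypothetical counting helpers that mirror `linear_search` and `search`; the list must be sorted for the second one to be valid.

```python
def count_unsorted(L, e):
    """Elements examined by linear_search, which always scans the whole list."""
    examined = 0
    for i in range(len(L)):
        examined += 1     # linear_search keeps going even after a match
    return examined

def count_sorted(L, e):
    """Elements examined by search, which stops at the first element >= e."""
    examined = 0
    for i in range(len(L)):
        examined += 1
        if L[i] == e or L[i] > e:
            break
    return examined

L = list(range(0, 200, 2))    # sorted even numbers 0..198, length 100
print(count_unsorted(L, 51), count_sorted(L, 51))     # target absent, mid-range
print(count_unsorted(L, 999), count_sorted(L, 999))   # target larger than every element
```

For an absent mid-range target the sorted search stops early, but when the target is larger than every element both searches examine the whole list, so the worst case stays linear.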

### _Linear Complexity_

- searching a list in sequence to see if an element is present
- add characters of a string, assumed to be composed of decimal digits

In [0]:
def add_digits(s):
    val = 0
    for c in s:
        val += int(c)
    return val

- complexity often depends on number of iterations

In [0]:
def fact_iter(n):
    prod = 1
    for i in range(1, n+1):
        prod *= i
    return prod

- number of times around loop is n
- number of operations inside loop is constant (in this case, 3 - set i, multiply, set prod)
    - `O(1 + 3n + 1) = O(3n + 2) = O(n)`

### _Quadratic Complexity_

- determine if one list is a subset of a second, i.e., every element of the first appears in the second (assumes no duplicates)

In [0]:
def is_subset(L1, L2):
    for e1 in L1:
        matched = False
        for e2 in L2:
            if e1 == e2:
                matched = True
                break
        if not matched:
            return False
    return True

- outer loop executed `len(L1)` times
- each iteration will execute inner loop up to `len(L2)` times, with constant number of operations
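
Counting the `==` comparisons shows the quadratic growth directly. The sketch below uses a hypothetical instrumented copy, `is_subset_counted`, and an input chosen so that elements early in `L1` are matched late in `L2`.

```python
def is_subset_counted(L1, L2):
    """Same logic as is_subset, also returning the number of == comparisons."""
    comparisons = 0
    for e1 in L1:
        matched = False
        for e2 in L2:
            comparisons += 1
            if e1 == e2:
                matched = True
                break
        if not matched:
            return False, comparisons
    return True, comparisons

n = 50
L1 = list(range(n))
L2 = list(reversed(range(n)))   # same elements as L1, in reverse order
print(is_subset_counted(L1, L2))
```

With `n = 50` this input needs `50 + 49 + ... + 1 = 1275` comparisons, i.e. `n(n+1)/2`, which is on the order of `n**2`.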

In [0]:
# find intersection of two lists, return a list with each element appearing only once
def intersect(L1, L2):
    tmp = []
    for e1 in L1:
        for e2 in L2:
            if e1 == e2:
                tmp.append(e1)
    res = []
    for e in tmp:
        if e not in res:
            res.append(e)
    return res

- first nested loop takes `len(L1) * len(L2)` steps
- second loop takes at most `len(L1)` steps
    - determining if an element is in `res` might itself take `len(L1)` steps
- nested loops usually lead to quadratic behavior
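
As a usage check, the sketch below runs a hypothetical instrumented copy of `intersect` that also reports how many `e1 == e2` tests the nested loop performs, first on a small input with duplicates and then on lists of doubling length.

```python
def intersect_counted(L1, L2):
    """Same logic as intersect, also returning the number of e1 == e2 tests."""
    comparisons = 0
    tmp = []
    for e1 in L1:
        for e2 in L2:
            comparisons += 1
            if e1 == e2:
                tmp.append(e1)
    res = []
    for e in tmp:
        if e not in res:
            res.append(e)
    return res, comparisons

print(intersect_counted([1, 2, 2, 3], [2, 3, 4]))

for n in (100, 200, 400):
    _, comparisons = intersect_counted(list(range(n)), list(range(n)))
    print(n, comparisons)
```

Each time `n` doubles, the number of comparisons quadruples (10000, 40000, 160000), which is the signature of quadratic growth.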