# Why is my code slow?

## Outline

-   Caching, Memoization, and Vectorization

-   Parallel Computing

-   Faster Implementations versus Faster Algorithms

# Caching, Memoization, and Vectorization

## Caching

-   *Caching* refers to storing things for later use

    -   Your browser probably does this by temporarily storing page
        details on your local disk

    -   Faster, reduces server load

    -   Other examples include 3D rendering and saving common database
        queries

-   However, caching usually uses extra space in exchange for faster
    running times

-   This is the *space-time trade-off*: an algorithm accepts increased
    space usage to get a faster running time

## Memoization

-   *Memoization* refers to storing the results of function calls for
    later use

    -   Specific method of caching

-   This is useful for methods with a lot of repeated computations

-   For instance, in our recursive Fibonacci number function,
    `fib(12)` is called by both `fib(13)` and `fib(14)`

    -   And `fib(3)` is called many, many times

-   $F(5) = F(4) + F(3) = F(3) + F(2) + F(2) + F(1)$, which already
    computes the subproblem $F(2)$ twice
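To make the repetition concrete, the sketch below counts how often each argument is seen by a naive recursive Fibonacci function. The names `fib_counted` and `call_counts` are illustrative and not part of the original notebook.

```python
from collections import Counter

call_counts = Counter()  # n -> number of times fib_counted(n) was called


def fib_counted(n):
    """Naive recursive Fibonacci that records every call."""
    call_counts[n] += 1
    if n <= 1:
        return n
    return fib_counted(n - 1) + fib_counted(n - 2)


result = fib_counted(13)
print(result)                     # 233
print(call_counts[12])            # 1
print(call_counts[3])             # 89
print(sum(call_counts.values()))  # 753
```

A single call to `fib_counted(13)` triggers 753 calls in total, 89 of them for the argument 3 alone.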

## How Memoization Works

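-   Since we store the results, each distinct value is only computed
    once, making the time complexity $O(n)$, much better than $O(2^n)$
    [1]

-   Memoization can also help with the maximum recursion depth error,
    but only once the cache is warm: recursion stops as soon as it
    reaches a stored value, while the first call to `fib(n)` on an
    empty cache still recurses about $n$ levels deep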
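The linear behaviour can be checked with a small sketch that records which values are actually computed. The helper `make_fib` and the list `computed` are assumptions made for this illustration.

```python
def make_fib():
    """Build a memoized Fibonacci function plus a log of computed values."""
    cache = {0: 0, 1: 1}
    computed = []  # every n whose value had to be computed (cache miss)

    def fib(n):
        if n not in cache:
            computed.append(n)
            cache[n] = fib(n - 1) + fib(n - 2)
        return cache[n]

    return fib, computed


fib_memo, computed = make_fib()
print(fib_memo(30))   # 832040
print(len(computed))  # 29
```

Only the 29 values from 2 to 30 are computed, once each, which is where the $O(n)$ bound comes from.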

## Memoization Python

[1] From Bhargava chapter 8

In [1]:
def fib_original(n):
    if n <= 1:
        return n
    else:
        return fib_original(n-1) + fib_original(n-2)

In [5]:
fib_original(40) # takes about 30 seconds: O(2^n) calls

102334155

In [6]:
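cache = {0: 0, 1: 1} # base cases are stored up front

def fib(n):
  # Reuse a stored result when we have one
  if n in cache:
    return cache[n]
  else:
    # Otherwise compute it once and store it for later calls
    cache[n] = fib(n - 1) + fib(n - 2)
    return cache[n]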

In [8]:
fib(40) # Does not take 30 seconds

102334155

In [12]:
fib(3000) # Also fast: O(n)

410615886307971260333568378719267105220125108637369252408885430926905584274113403731330491660850044560830036835706942274588569362145476502674373045446852160486606292497360503469773453733196887405847255290082049086907512622059054542195889758031109222670849274793859539133318371244795543147611073276240066737934085191731810993201706776838934766764778739502174470268627820918553842225858306408301661862900358266857238210235802504351951472997919676524004784236376453347268364152648346245840573214241419937917242918602639810097866942392015404620153818671425739835074851396421139982713640679581178458198658692285968043243656709796000

For the base cases, we replace the calls to `fib(0)` and `fib(1)` with
lookups in the dictionary.
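One caveat with the dictionary version: on a fresh cache, `fib(3000)` recurses roughly 3000 levels deep. That can exceed the interpreter's recursion limit, which is typically 1000 in a plain CPython session (some notebook environments raise it). A possible workaround, sketched here with the hypothetical helper `fib_in_steps`, is to warm the cache in chunks.

```python
cache = {0: 0, 1: 1}


def fib(n):
    """Memoized Fibonacci, same idea as the dictionary version above."""
    if n not in cache:
        cache[n] = fib(n - 1) + fib(n - 2)
    return cache[n]


def fib_in_steps(n, step=500):
    """Warm the cache in chunks so no call recurses much deeper than `step`."""
    for k in range(step, n, step):
        fib(k)  # each call stops recursing once it reaches a cached value
    return fib(n)


value = fib_in_steps(3000)
print(len(str(value)))  # 627 digits
```

An iterative loop would avoid recursion altogether. This sketch keeps the recursive form only to stay close to the code above.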

## Memoization Python

-   We can use the `functools` library, which is included in the
    standard library (no pip install needed!)

    -   `functools` does memoization for you!

-   We can use the `@cache` decorator, but the cached dictionary can
    grow to massive sizes

-   Instead, `@lru_cache(maxsize=n)` keeps only the `n` most recently
    used results and evicts the least recently used (LRU) one when full

-   Alternatively, we can use `joblib` to store the memoized results in
    a file
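As a small sketch of the unbounded option, the example below applies `@cache` (available in `functools` from Python 3.9) and inspects the statistics that the decorated function exposes through `cache_info()`. The function name `fib_cached` is illustrative.

```python
from functools import cache


@cache  # unbounded: every distinct argument stays stored
def fib_cached(n):
    if n <= 1:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)


print(fib_cached(30))  # 832040
info = fib_cached.cache_info()
print(info.misses, info.hits, info.currsize)  # 31 28 31
```

The 31 misses are the values from 0 to 30, each computed once. `currsize` shows that all of them remain in memory, which is the growth the `@cache` bullet warns about.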

## Memoization Python

In [3]:
from functools import lru_cache

@lru_cache(maxsize=3)
def fib_rec(n):
  if n == 0 or n == 1:
    return n
  else:
    return fib_rec(n-1) + fib_rec(n-2)

In [21]:
fib_rec(300)

222232244629420445529739893461909967206666939096499764990979600

In [33]:
@lru_cache(maxsize=5)
def x(n):
    return n

In [34]:
x(40)

40
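To see what `maxsize` does, this sketch uses a deliberately tiny cache and a made-up function `square`, then reads the hit and miss counts from `cache_info()`.

```python
from functools import lru_cache


@lru_cache(maxsize=2)  # room for only two results
def square(n):
    return n * n


# 1 and 2 are misses, 1 is a hit, 3 evicts 2, 2 evicts 1, 3 is a hit
for n in [1, 2, 1, 3, 2, 3]:
    square(n)

info = square.cache_info()
print(info.hits, info.misses, info.currsize)  # 2 4 2
```

The second call with argument 2 is a miss even though it was computed before, because it was the least recently used entry when 3 arrived.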

In [38]:
from cache_em_all import Cachable

In [None]:
import numpy as np

np.array([1, 2, 3, 4, 5]).max() # vectorized maximum of the array

## Vectorized Operations

-   *Vectorization* is a technique of implementing array operations
    without explicit Python `for` loops

-   We use functions defined by various modules that are highly
    optimized for the specific problem

-   NumPy provides a lot of functions that are vectorized and are
    faster than for loops

    -   Array add/subtract/multiply/divide by scalar

    -   Sum of array

    -   Max/min of array

-   Keep this in mind for some ML processes that are iterative, such as
    gradient descent
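A minimal sketch of the operations listed above, assuming NumPy is installed. It compares an explicit loop with the equivalent vectorized expression and then calls the built-in reductions.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Loop version: multiply each element by 2, one Python iteration at a time
doubled_loop = np.zeros(len(x))
for i in range(len(x)):
    doubled_loop[i] = x[i] * 2

# Vectorized versions: one call each, the loop runs inside NumPy
doubled_vec = x * 2   # array times scalar
total = x.sum()       # sum of array
largest = x.max()     # max of array
smallest = x.min()    # min of array

print(np.array_equal(doubled_loop, doubled_vec))  # True
print(total, largest, smallest)                   # 15.0 5.0 1.0
```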

## Why Vectorized Operations Work

-   Python is an interpreted, dynamically typed language. There is no
    ahead-of-time compilation to machine code, so types are checked
    while the program runs

-   C, for instance, is optimized at the compiler level (before
    execution) to speed up your code

-   Thus, NumPy implements arrays in C, which speeds things up

-   The other reason vectorization works is parallelization: the
    compiled loops can often process several array elements at once
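A rough way to observe the difference is to time a Python-level loop against the corresponding NumPy call. This is only a sketch: the array size and repeat count are arbitrary choices, and the measured times depend on the machine, so no expected output is shown.

```python
import timeit

import numpy as np

n = 100_000
x = np.arange(n, dtype=np.float64)


def loop_sum(values):
    """Add the elements one by one in the Python interpreter."""
    total = 0.0
    for v in values:
        total += v
    return total


# Run each version 5 times and report the total elapsed time
loop_time = timeit.timeit(lambda: loop_sum(x), number=5)
vec_time = timeit.timeit(lambda: x.sum(), number=5)
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```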

# Parallel Computing

## Parallelization

Compare the following two code chunks. What are their running times?

In [4]:
@lru_cache(maxsize=10) # O(n)
def fib(n):
  if n <= 1:
    return n
  else:
    return fib(n - 1) + fib(n - 2)

## Parallelization

In [5]:
import numpy as np

def add_one(n, x): # O(n)
  y = np.zeros(n)
  for i in range(n):
    y[i] = x[i] + 1

  return y

In [None]:
# Example input:   n = 5, x = [1, 2, 3, 4, 5]
# Expected output: [2, 3, 4, 5, 6]


In [None]:
# joblib

## Parallelization

-   Both are $O(n)$, but the second code chunk can be done in *parallel*
    because the $n$ computations are independent.

-   Each Fibonacci number depends on the previous two values, so those
    computations cannot run independently

-   The requirements for code to be parallelized and to be vectorized
    are similar, but not the same

-   The Numba library can help with parallelizing your code

-   Note that *parallel* here means the computation runs on several
    cores of one machine, whereas *distributed* means the computation
    is shared across many machines
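Because each element can be handled independently, the work can be split into chunks and handed to separate processes. The sketch below uses the standard library's `concurrent.futures`, not Numba or joblib. The helper names `add_one_chunk`, `split`, and `parallel_add_one` are made up for this illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def add_one_chunk(chunk):
    """Work done by one process: add 1 to every value in its chunk."""
    return [value + 1 for value in chunk]


def split(x, parts):
    """Cut x into at most `parts` contiguous chunks of similar size."""
    size = max(1, -(-len(x) // parts))  # ceiling division, at least 1
    return [x[i:i + size] for i in range(0, len(x), size)]


def parallel_add_one(x, workers=2):
    """Process the chunks in separate worker processes, then rejoin them."""
    chunks = split(x, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(add_one_chunk, chunks)  # keeps chunk order
    return [value for chunk in results for value in chunk]


if __name__ == "__main__":
    print(parallel_add_one([1, 2, 3, 4, 5]))  # [2, 3, 4, 5, 6]
```

For an input this small, starting the processes costs far more than the arithmetic. The point is only that no chunk needs a result from another chunk.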


From
[leetcode](https://leetcode.com/problems/best-time-to-buy-and-sell-stock-ii/description/)

# Faster Implementations versus Faster Algorithms

## Faster Implementations versus Faster Algorithms

-   There are two ways we can speed up our code

    -   Use a faster algorithm, such as dynamic programming instead of
        brute force. Algorithms are concerned with the approach to the
        problem

    -   Use a faster implementation, such as vectorization instead of
        loops

-   It is useful to think about these separately when developing a
    program, then combine them to create a super-fast approach!
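The distinction can be illustrated with a toy problem that is not from the slides: summing the integers from 1 to $n$. The three function names below are illustrative.

```python
def sum_loop(n):
    """Baseline: explicit Python loop, O(n)."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_builtin(n):
    """Faster implementation: still O(n), but in CPython the loop runs in C."""
    return sum(range(1, n + 1))


def sum_formula(n):
    """Faster algorithm: closed form n(n+1)/2, a constant number of operations."""
    return n * (n + 1) // 2


print(sum_loop(1000), sum_builtin(1000), sum_formula(1000))  # 500500 500500 500500
```

`sum_builtin` keeps the same approach and only changes how the loop is executed, while `sum_formula` changes the approach itself.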

# Recommended Problems and References

## Recommended Problems and Readings

-   Cormen: Chapter 34 on NP-Completeness (highly optional)

-   Bhargava: Chapter 8 exercises

    -   8.1 - 8.8

-   Vectorize the second code chunk in the Parallelization section

-   [Find the longest palindrome from a
    string](https://leetcode.com/problems/longest-palindrome/) Hint: use
    a greedy algorithm

-   [Computing Pascal’s
    triangle](https://leetcode.com/problems/pascals-triangle/) Hint: use
    dynamic programming

## References

-   Bhargava, A. Y. (2016). *Grokking algorithms: An illustrated guide
    for programmers and other curious people.* Manning. Chapter 1.

-   Cormen, T. H. (Ed.). (2009). *Introduction to algorithms* (3rd ed.).
    MIT Press. Chapters 1 and 3.