# Analysis of Algorithms

These notes focus on **running time**: how many times must a discrete operation be performed to produce a result?

Reasons to analyze algorithms:

- Predict performance
- Compare algorithms
- Provide guarantees
- Understand theoretical basis

The primary practical reason is to avoid performance bugs.

To find out whether your program can solve a large practical input, use the **scientific method** to understand its performance:

- **Observe** some feature of the natural world
- **Hypothesize** a model that is consistent with the observations
- **Predict** events using the hypothesis
- **Verify** the predictions by making further observations
- **Validate** by repeating until the hypothesis and observations agree

Experiments must be **reproducible** and hypotheses must be **falsifiable**.

## Observations

The first step is to make some observations about the running time of the programs.

1. Run empirical analysis: time how long a program takes to run
2. Analyze the data: plot the running time $T(N)$ versus input size $N$
    - Usually plot $\lg T(N)$ versus $\lg N$ and check the slope
    - Regression analysis fits a straight line through the log-log data. The **power law** states that $T(N) = aN^b$, where $b$ is the slope of that line. Example:

$$
y = mx + c \\
\lg T(N) = b \, \lg N + c \\
T(N) = a \, N^b \text{ where } a = 2^c
$$
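As a rough illustration of the regression step, the sketch below fits a least-squares line through $(\lg N, \lg T(N))$ points and recovers $a$ and $b$. The function name `fit_power_law` and the timings are assumptions made for this example: the data is generated from a made-up cubic model rather than measured from a real program.

```python
import math

def fit_power_law(sizes, times):
    """Fit lg T = b * lg N + c by least squares; return (a, b) with a = 2**c."""
    xs = [math.log2(n) for n in sizes]
    ys = [math.log2(t) for t in times]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope of the log-log line
    c = mean_y - b * mean_x    # intercept of the log-log line
    return 2 ** c, b

# Hypothetical timings generated from T(N) = 1e-10 * N^3 (not real measurements).
sizes = [1000, 2000, 4000, 8000]
times = [1e-10 * n ** 3 for n in sizes]

a, b = fit_power_law(sizes, times)
print(f"a = {a:.3g}, b = {b:.3f}")
```

With real measurements the points will not sit exactly on a line, so the fitted $b$ is only an estimate of the exponent.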

The running time of most algorithms follows some form of the power law.

The system you run the experiments on makes a difference in some respects. System-independent effects are the **algorithm** and the **input data**, which determine the exponent $b$ in the power law (the slope of the line on a log-log plot). System-dependent effects include **hardware** (CPU, memory, cache), **software** (compiler, interpreter, garbage collector), and the **system** (operating system, network, other applications). Together with the algorithm and input data, these determine the constant $a$ in the power law.
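One way to see this split between $a$ and $b$: under the power law, $T(2N)/T(N) = 2^b$, so the constant $a$ cancels when you compare running times at two input sizes. The sketch below is only an illustration; `doubling_exponent` and the two "machines" are hypothetical, and their constants are made up rather than measured.

```python
import math

def doubling_exponent(measure, n):
    """Estimate b from T(2N) / T(N) = 2**b; the constant a cancels in the ratio."""
    return math.log2(measure(2 * n) / measure(n))

# Two hypothetical machines running the same cubic algorithm:
# same exponent b = 3, different constants a (made-up values).
def slow_machine(n):
    return 5e-9 * n ** 3

def fast_machine(n):
    return 2e-10 * n ** 3

b_slow = doubling_exponent(slow_machine, 1000)
b_fast = doubling_exponent(fast_machine, 1000)
print(b_slow, b_fast)
```

Both estimates come out at about 3, even though the absolute running times differ by a factor of 25.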

## Mathematical Models

Observing the behavior of an algorithm helps to predict performance, but to understand what the algorithm is doing you need mathematical models.

The **total running time** is the sum of cost $\times$ frequency over all operations. You need to analyze the program to determine the set of operations. The cost depends on the machine, the compiler, etc. The frequency depends on the algorithm and the input data.
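A minimal sketch of that sum, assuming a single loop over an array of $N$ elements. The operation names, the per-operation costs, and the frequency counts are all made up for illustration; real costs would have to be measured on a particular machine.

```python
def total_running_time(cost, frequency):
    """Sum cost * frequency over every operation in the frequency table."""
    return sum(cost[op] * frequency[op] for op in frequency)

# Hypothetical per-operation costs in nanoseconds (machine dependent).
cost_ns = {"array access": 3.0, "comparison": 1.0, "increment": 1.5}

def loop_frequencies(n):
    """Assumed operation counts for one pass over an array of n elements."""
    return {"array access": n, "comparison": n, "increment": n}

total_ns = total_running_time(cost_ns, loop_frequencies(1000))
print(total_ns)
```

The cost table is the machine-dependent part; the frequency table is the part fixed by the algorithm and its input.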

In the early days of computing, the time each operation took (integer addition, floating-point addition, etc.) was listed for a given computer. On modern machines, you would run experiments to measure these costs if you really wanted to know.

One simplification is to pick some basic operation and use its count as a proxy for running time. For example, count the array accesses in a nested loop that checks every pair of integers in an array to see whether any pair sums to zero.
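The sketch below counts array accesses for a brute-force pair check. It assumes the loop visits each unordered pair once and counts how many pairs sum to zero; the function name `count_zero_pairs` and the sample array are made up for this example.

```python
def count_zero_pairs(a):
    """Return (number of pairs summing to zero, number of array accesses)."""
    n = len(a)
    pairs = 0
    accesses = 0
    for i in range(n):
        for j in range(i + 1, n):
            accesses += 2          # one read of a[i] and one read of a[j]
            if a[i] + a[j] == 0:
                pairs += 1
    return pairs, accesses

pairs, accesses = count_zero_pairs([30, -40, -20, -10, 40, 0, 10, 5])
print(pairs, accesses)  # 2 pairs, 56 array accesses
```

Each of the $N(N-1)/2$ pairs costs two accesses, so the count is $N(N-1)$, which is 56 for $N = 8$.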

Another simplification is to use **tilde notation**, which ignores the lower-order terms in the cost formulas you derive. For example, $\frac{1}{6} N^3 + 20N + 16$ would simplify to $\sim \frac{1}{6}N^3$.
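To check numerically why dropping the lower-order terms is reasonable, the sketch below compares the exact formula with its tilde approximation. The ratio of the two should approach 1 as $N$ grows. The function names are chosen for this example only.

```python
def exact_cost(n):
    """The full cost formula: N^3 / 6 + 20 N + 16."""
    return n ** 3 / 6 + 20 * n + 16

def tilde_cost(n):
    """The tilde approximation, which keeps only the leading term: N^3 / 6."""
    return n ** 3 / 6

# Ratio of exact to approximate cost for growing N.
ratios = {n: exact_cost(n) / tilde_cost(n) for n in (10, 100, 1000, 10000)}
for n, ratio in ratios.items():
    print(n, ratio)
```

For small $N$ the lower-order terms still matter (the ratio is above 2 at $N = 10$), but by $N = 10000$ the two values agree to within about one part in a million.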

