# Background

## AMD vs Intel
* In 2006, AMD had a larger market share (+5%). In 2007, Intel abruptly took the lead (+14%)
* Different architectures provide optimal solutions to certain problems
    * AMD for bitcoin mining
    * NVIDIA for gaming

## Introduction
Everyone uses the MIPS64 instruction set, so it's worth knowing.

### Task of the Computer Designer
* Coordinate many levels of abstraction
* Under a rapidly changing set of forces
* Design, measure, evaluate

# Quantitative Principles

### *Make the Common Case Fast*
* eg. ketchup bottle is upside down so that ketchup is always ready

* Compilers (eg. GCC) output different instructions depending on their era. The instruction sequences they choose reflect the hardware trends of the time

> Two engineers argue. One wants a FP unit that improves division 40x. Another wants a general FP unit that improves everything by 1.5x. Which is better?
* Application dependent

### *Take Advantage of Parallelism*
* **TLP**: Thread level: independent programs run on different processors
* **DLP**: Data level: multiple instances of the same program on different input data
* **ILP**: Instruction level: different instructions may be run simultaneously
* Different parts in a digital circuit can be parallel during a clock cycle (eg. processes in VHDL architecture, pipelining)

### *Principle of Locality*
Real programs are not completely random; programs have structure and purpose. Many are quite predictable.

* eg. programs spend 90% of the time executing 10% of code

### *Bang for Buck (!/\$)*

### *Amdahl's Law*
Gene Amdahl argued that it was very important to focus on a single processor/core since there is unparallelizable code that will always be the bottleneck.

$$ Speedup = \frac{ExTime_{old}}{ExTime_{new}} = \frac{1}{(1 - Frac_{enh}) + \frac{Frac_{enh}}{Speedup_{enh}}} $$
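
One consequence is worth spelling out with the same symbols as the formula above: even if the enhanced part became infinitely fast, the unenhanced fraction still caps the overall speedup. A short sketch of that limit:

$$ \lim_{Speedup_{enh} \to \infty} \frac{1}{(1 - Frac_{enh}) + \frac{Frac_{enh}}{Speedup_{enh}}} = \frac{1}{1 - Frac_{enh}} $$

As an illustrative assumption (the 90% figure is not from the lecture), if $Frac_{enh} = 0.9$ then the overall speedup can never exceed $\frac{1}{1 - 0.9} = 10$, no matter how large $Speedup_{enh}$ gets.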

---

# Example: Floating Point (FP) Square Root (FPSQR)

* 20% of ExTime due to FPSQR
* 50% of ExTime due to **all** FP operations

Two options:
* Speed up FPSQR by a factor of 10
* Speed up all FP by a factor of 1.6

Result:
* FPSQR = 1.22 overall speedup
* FP = 1.23 overall speedup
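
The two results above can be checked by plugging the numbers into Amdahl's Law. The following is a minimal sketch; the function name `amdahl_speedup` and its parameter names are made up for illustration and mirror $Frac_{enh}$ and $Speedup_{enh}$ from the formula.

```python
def amdahl_speedup(frac_enh: float, speedup_enh: float) -> float:
    """Overall speedup when frac_enh of the execution time is sped up by speedup_enh."""
    return 1.0 / ((1.0 - frac_enh) + frac_enh / speedup_enh)


# Option 1: FPSQR is 20% of execution time and becomes 10x faster.
fpsqr = amdahl_speedup(frac_enh=0.20, speedup_enh=10.0)

# Option 2: all FP operations are 50% of execution time and become 1.6x faster.
all_fp = amdahl_speedup(frac_enh=0.50, speedup_enh=1.6)

print(f"FPSQR option:  {fpsqr:.2f}x overall")   # 1.22x
print(f"All-FP option: {all_fp:.2f}x overall")  # 1.23x
```

The modest improvement to the common case (all FP) slightly beats the large improvement to the rarer case (FPSQR).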

---

*Tuesday 13 September*

---

# Simulation

> Does simulation accuracy limit innovation?

If simulations were cycle accurate, they would take too long to run, and we would not be able to collect results.


# Metrics of Performance

Stack level | What it cares about
---|---
Application | Answers/month, ops/sec
Programming language | programs/time
Compiler |
**ISA** | MIPS, MFLOP/s (millions of instr./s, millions of floating point ops/s)
Datapath control | gigabytes per second
Function units, Transistors, wires/pins | cycles per second

* As architects, we might optimize our chip so that certain instructions run faster, or prioritize MIPS over MFLOPS.

## Definitions
* **Performance**: units of things per second. Bigger is better
    * There are exceptions. For example, Google might achieve high aggregate throughput (requests/sec) while each individual request still takes too long. In that case the metric we care about is latency, where smaller is better.
    * $performance(x) = \frac{1}{executionTime(x)}$
    * $speedup = n = \frac{performance(x)}{performance(y)} = \frac{executionTime(y)}{executionTime(x)}$
    * Alternative marketing definitions:
        * $perf = \frac{instructions}{second}$
        * $perf = FLOPS$
        * $perf = GHz$
        * A chip may be clocked faster, but with a poor architecture it can still perform worse than a slower-clocked chip
        * These definitions are only meaningful when comparing chips with the same architecture
* **Benchmarks**: Programs which evaluate performance
    * Real applications (eg. weather simulations)
        * May take too long to run a full program
    * Kernels
        * Small key pieces from real programs (eg. linpack)
        * Kernel may not be a perfect representative of a program
    * Toy benchmarks
        * Sieve of Eratosthenes, Puzzle, Quicksort
        * Leave these for APSC160/CPSC260; most applications don't spend most of their time running one specific algorithm
    * Synthetic Benchmarks
        * Programs that no customer will ever run
        * Poor for sales, since you're not actually building what somebody wants

## Benchmark Suites
A collection of applications used to measure the performance of a computer

Different companies may have different benchmark suites because they're targeting specific goals
* What do you want out of a computer?
* What programs do you run most often?

eg. [SPECint CPU2006](www.spec.org)
* 12 applications (gzip, gcc, perl, + other more exotic ones)
* Benchmarks update as software improves

### Comparing and Summarizing Performance
Program | Computer A | Computer B | Computer C
---|---|---|---
P1 | 1s | 10s | 20s
P2 | 1000s | 100s | 20s
 
 The fastest computer depends on which program matters for you. **No computer is *fastest overall*.**
 
 Typically, large enough companies will get a bunch of samples and run their own benchmarks (eg. Oracle)
 
 You can summarize performance using average execution time:
 $$Average Exec Time = \frac{1}{n}\sum_{i=1}^{n}Time_i$$
 
 What if you don't run programs 1 and 2 the same number of times?
 
 * Use weighted Execution Time
 
 $$Weighted Exec Time = \sum_{i=1}^{n}Weight_i \times Time_i$$
 
 * Use Geometric Mean
     * Execution Time Ratio is the *speedup* for benchmark *i*
     * This is the most common one that people use
 
 $$Geometric Mean = \sqrt[n]{\prod_{i=1}^{n}ExecutionTimeRatio_i}$$
 
 $$\frac{GeometricMean(X_1, X_2, \ldots, X_n)}{GeometricMean(Y_1, Y_2, \ldots, Y_n)} = GeometricMean(\frac{X_1}{Y_1}, \frac{X_2}{Y_2}, \ldots, \frac{X_n}{Y_n})$$
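
The identity above follows directly from the definition of the geometric mean. A short derivation, assuming all $X_i$ and $Y_i$ are positive:

$$\frac{\sqrt[n]{\prod_{i=1}^{n} X_i}}{\sqrt[n]{\prod_{i=1}^{n} Y_i}} = \sqrt[n]{\frac{\prod_{i=1}^{n} X_i}{\prod_{i=1}^{n} Y_i}} = \sqrt[n]{\prod_{i=1}^{n} \frac{X_i}{Y_i}}$$

This is why the choice of reference machine cancels out when two machines are compared by geometric mean.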
 
### Example: Weighted Execution Time
* The "fastest" computer is determined by the weighting of program mix
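
A minimal sketch of this effect, using the execution times from the comparison table. The three program mixes and the helper names (`weighted_exec_time`, `fastest`) are hypothetical, chosen only to show how the winner can change with the weights.

```python
# Execution times in seconds from the comparison table: (P1, P2) per computer.
times = {
    "A": (1.0, 1000.0),
    "B": (10.0, 100.0),
    "C": (20.0, 20.0),
}


def weighted_exec_time(weights, exec_times):
    """Sum of weight_i * time_i; the weights are expected to sum to 1."""
    return sum(w * t for w, t in zip(weights, exec_times))


def fastest(weights):
    """Name of the computer with the lowest weighted execution time."""
    return min(times, key=lambda name: weighted_exec_time(weights, times[name]))


# Hypothetical program mixes: (weight of P1, weight of P2).
mixes = {
    "equal": (0.5, 0.5),
    "mostly P1": (0.9, 0.1),
    "almost all P1": (0.999, 0.001),
}

for label, weights in mixes.items():
    print(f"{label}: fastest is {fastest(weights)}")
```

With these assumed mixes, each of the three computers comes out "fastest" once: C for equal weights, B when P1 is 90% of the mix, and A when P1 is 99.9% of the mix.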

### Example: Arithmetic Mean vs Geometric Mean
* Arithmetic mean of normalized times still lets you "cook" the numbers to make a chosen computer look faster: the result depends on which "base machine" all speedups are normalized to
* Geometric mean will always pick the same machine regardless of the "base machine"
    * Drawback: geo mean doesn't predict execution time
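
The base-machine effect can be sketched with the execution times from the comparison table. The helper names below are made up for illustration, and `math.prod` assumes Python 3.8 or newer. Each computer's times are normalized to a base machine, then summarized with each mean; lower is better.

```python
import math

# Execution times in seconds from the comparison table: (P1, P2) per computer.
times = {
    "A": (1.0, 1000.0),
    "B": (10.0, 100.0),
    "C": (20.0, 20.0),
}


def normalized(name, base):
    """Execution time ratios of `name` relative to the base machine."""
    return [t / b for t, b in zip(times[name], times[base])]


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    return math.prod(xs) ** (1.0 / len(xs))


def best(mean, base):
    """Computer with the lowest mean normalized execution time."""
    return min(times, key=lambda name: mean(normalized(name, base)))


for base in times:
    print(
        f"base {base}: arithmetic picks {best(arithmetic_mean, base)}, "
        f"geometric picks {best(geometric_mean, base)}"
    )
```

For this data, the arithmetic mean of normalized times picks whichever machine is used as the base, while the geometric mean picks C every time.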
    
> *What are we comparing when we use "Average Performance?"*
> * One computer versus another computer across a set of different programs

### Harmonic Mean

$$Harmonic Mean = \frac{n}{\sum_{i=1}^{n}\frac{1}{ExecutionTimeRatio_i}}$$

* Gives more weight to the smaller values

Mathematical relationship:
$$harmonic \le geometric \le arithmetic$$
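
The ordering can be spot-checked numerically. This is a minimal sketch using Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or newer); the ratios are hypothetical values, not measurements, and must all be positive.

```python
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical execution time ratios for three benchmarks (all positive).
ratios = [0.5, 2.0, 4.0]

h = harmonic_mean(ratios)
g = geometric_mean(ratios)
a = mean(ratios)

print(f"harmonic={h:.3f} geometric={g:.3f} arithmetic={a:.3f}")

# The three means coincide only when all values are equal.
assert h <= g <= a
```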

### SPEC Benchmark Evolution
* A suite of benchmarks
* Changes of included programs reflect changes in computer usage at the time
* Uses geometric mean speedup

# Power