## Agile Hardware Design
***
# Power + Design Space Exploration

<img src="../resource/logo.svg" alt="agile hardware design logo" style="float:right"/>

## Prof. Scott Beamer
### sbeamer@ucsc.edu

## [CSE 228A](https://classes.soe.ucsc.edu/cse228a/Spring25/)

## Plan for Today

* Power & DVFS
* Design Space Exploration

## Why Power Matters for Chip Design

* Is often the _biggest_ constraint for many applications
* _Power_ drawn over time consumes _energy_ (energy = power × time)
  * Energy consumption affects battery life or device size for mobile
  * Energy also costs money
* Power consumed produces _heat_ which must be cooled
  * Needs enough thermal capacity to handle peak
  * Might need to _throttle_ the device if the peak lasts too long
* Peak power draw determines power supply capacity needed (cost & size)
* Average power draw sets energy consumption


## Designers' Ability to Impact Power

### Dynamic Power $P \approx \alpha C V^2 f$
* $\alpha$ = _activity factor_
* $C$ = capacitance
* $V$ = voltage (note squared above)
* $f$ = frequency

### Ways designers can reduce the terms above
* _activity_ - put idle things to sleep to reduce activity factor $\alpha$
* _area_ - smaller design will have less capacitance $C$ to charge/discharge
* _frequency (& voltage)_ - shorten the critical path and run _slower_; the extra timing slack permits a lower $V$
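A quick numeric sketch of $P \approx \alpha C V^2 f$. The activity factor, capacitance, voltage, and frequency values are hypothetical, chosen only to show how each knob scales power; it assumes the lower frequency is what permits the lower voltage.

```python
def dynamic_power(alpha: float, c_farads: float, v_volts: float, f_hz: float) -> float:
    """Dynamic power in watts: P ~= alpha * C * V^2 * f."""
    return alpha * c_farads * v_volts ** 2 * f_hz

# Hypothetical baseline: alpha = 0.1, 1 nF switched capacitance, 1.0 V, 1 GHz
base = dynamic_power(0.1, 1e-9, 1.0, 1e9)
# Halving activity (e.g. putting idle logic to sleep) halves dynamic power
half_activity = dynamic_power(0.05, 1e-9, 1.0, 1e9)
# Scaling V and f together by 0.8 cuts power by roughly 0.8^3
low_vf = dynamic_power(0.1, 1e-9, 0.8, 0.8e9)

print(f"baseline:       {base:.3f} W")
print(f"half activity:  {half_activity:.3f} W")
print(f"0.8 V, 0.8 GHz: {low_vf:.4f} W ({low_vf / base:.1%} of baseline)")
```

Because $V$ appears squared, lowering voltage alongside frequency gives a cubic reduction (about 51% of baseline here) rather than the linear one from frequency alone.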

## Turning Things Off To Save Power

### Power Gating
  * Power off entire portions of the design until they are needed again
  * Can incur latency penalty to turn back on, but saves most power

### Clock Gating
  * Turn off clock to registers when their contents don't matter
  * The clock toggles continuously otherwise, so gating it can yield significant power savings
  * CAD tools can often do this automatically _if_ register uses a _write enable_
    * Add write enable to registers for when their value is a "Don't Care"
      * e.g. use `RegEnable` in Chisel
    * Most beneficial when register is bigger (more bits) to amortize overhead
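A toy model of why register width matters for clock gating. It assumes a hypothetical per-flop clock energy `e_bit` and a fixed per-cycle cost `e_gate` for the gating cell (arbitrary units), and ignores datapath power entirely; the numbers are illustrative, not measured.

```python
def clock_energy_per_cycle(width_bits: int, duty: float, gated: bool,
                           e_bit: float = 1.0, e_gate: float = 4.0) -> float:
    """Toy clock energy per cycle (arbitrary units).

    width_bits: number of flip-flops in the register
    duty:       fraction of cycles the write enable is asserted
    e_bit:      hypothetical clock energy per flop per clocked cycle
    e_gate:     hypothetical fixed per-cycle cost of the gating cell
    """
    if not gated:
        return width_bits * e_bit              # every flop sees every clock edge
    return width_bits * e_bit * duty + e_gate  # flops clocked only when enabled

for width in (2, 8, 64):
    ungated = clock_energy_per_cycle(width, 0.25, gated=False)
    gated = clock_energy_per_cycle(width, 0.25, gated=True)
    print(f"{width:3d} bits: ungated={ungated:5.1f} gated={gated:5.1f} "
          f"saving={1 - gated / ungated:+.0%}")
```

Under these assumptions, gating a 2-bit register costs more than it saves, while a 64-bit register saves most of its clock energy because the fixed overhead is amortized across many flops.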

## Power - Time Tradeoffs

<img src="images/power-tradeoffs.svg" alt="power tradeoffs" style="width:80%;margin-left:auto;margin-right:auto"/>

## Going Slow to Save Energy

* Start with a correct circuit and a performance goal
  * Set voltage & frequency to most efficiently meet target
  * Sometimes called "Crawl to deadline"

* Reducing voltage slows a circuit, but also saves power
  * Energy savings can outweigh performance loss (remember $V^2$)

* _**Dynamic Voltage & Frequency Scaling**_ (DVFS) - change frequency & voltage at runtime to meet changing performance needs
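A sketch of "crawl to deadline": since $E = P \cdot t = \alpha C V^2 f \cdot (\text{cycles}/f) = \alpha C V^2 \cdot \text{cycles}$, dynamic energy for a fixed task depends on $V^2$ but not on $f$, so the best choice is the lowest-voltage operating point that still meets the deadline. The operating-point table, $\alpha$, and $C$ are hypothetical, and only dynamic energy is counted.

```python
# Hypothetical DVFS operating points: (voltage in V, max frequency in Hz)
OPERATING_POINTS = [(0.6, 0.4e9), (0.7, 0.7e9), (0.8, 1.0e9), (1.0, 1.5e9)]

def dynamic_energy(alpha, c_farads, v_volts, cycles):
    """Dynamic energy for a task: (alpha*C*V^2*f) * (cycles/f) = alpha*C*V^2*cycles."""
    return alpha * c_farads * v_volts ** 2 * cycles

def crawl_to_deadline(cycles, deadline_s, alpha=0.1, c_farads=1e-9):
    """Pick the lowest-voltage operating point that still finishes by the deadline."""
    feasible = [(v, f) for v, f in OPERATING_POINTS if cycles / f <= deadline_s]
    if not feasible:
        raise ValueError("no operating point meets the deadline")
    v, f = min(feasible)  # lowest voltage -> lowest dynamic energy for this task
    return v, f, dynamic_energy(alpha, c_farads, v, cycles)

v, f, energy = crawl_to_deadline(cycles=5e8, deadline_s=1.0)
print(f"run at {v} V / {f / 1e9:.1f} GHz, dynamic energy ~= {energy:.4f} J")
```

A DVFS controller would repeat this selection at runtime as the workload or deadline changes.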

## Going Fast to Save Energy

* Reducing execution time (without increasing power too much) will save energy

* Complete task as fast as possible, and then go to sleep
  * Often called "Race to halt"

* In practice, designs often do both "Race to halt" and "Crawl to deadline"
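A back-of-the-envelope comparison over a fixed time window: energy is run power times run time plus sleep power times the remaining time. All power and time values here are hypothetical, so which strategy wins depends entirely on them.

```python
def window_energy(p_run_w, t_run_s, p_sleep_w, window_s):
    """Energy (J) over a fixed window: run for t_run_s, then sleep for the rest."""
    if t_run_s > window_s:
        raise ValueError("task misses the deadline")
    return p_run_w * t_run_s + p_sleep_w * (window_s - t_run_s)

# Hypothetical numbers for a 1-second window
race = window_energy(p_run_w=2.0, t_run_s=0.2, p_sleep_w=0.05, window_s=1.0)   # fast, then sleep
crawl = window_energy(p_run_w=0.5, t_run_s=1.0, p_sleep_w=0.05, window_s=1.0)  # slow, never sleeps
print(f"race to halt: {race:.2f} J, crawl to deadline: {crawl:.2f} J")
```

With these values racing wins (0.44 J vs 0.50 J), but raising the assumed run power to 3 W makes racing cost 0.64 J, so crawling wins instead.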

## Design Space Exploration Motivation

### How do you pick the right design?

### What metrics should you consider?

### How do you optimize for multiple metrics?

## Common Design Space Metrics

* Power - average (energy efficiency) & peak (thermals & delivery)
* Performance - latency vs throughput
* Area - die area, IO pins, other components (cost)
* Usability
* Security
* Manufacturability
* Testability
* Fault tolerance
* Reusability
* Sustainability

## Design Space Parameters

* Generators make it easy to consider even more parameters (and settings)

* External parameters (how component behaves & what it does)

* Internal parameters (generator microarchitectures)
  * Parallelism
  * Buffer/queue/field sizes
  * Approach (different algorithms)
  * Topologies
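A small sketch of how quickly a design space grows: the number of design points is the product of the number of settings for each parameter. The parameter names and settings below are hypothetical stand-ins for a generator's internal parameters.

```python
from itertools import product

# Hypothetical internal parameters of a generator and the settings to try
PARAMS = {
    "lanes": [1, 2, 4, 8],             # parallelism
    "queue_depth": [2, 4, 8, 16, 32],  # buffer/queue sizes
    "algorithm": ["A", "B"],           # approach
    "topology": ["ring", "mesh", "crossbar"],
}

def design_points(params):
    """Yield every combination of parameter settings as a dict."""
    names = list(params)
    for values in product(*(params[n] for n in names)):
        yield dict(zip(names, values))

points = list(design_points(PARAMS))
print(len(points))  # 4 * 5 * 2 * 3 = 120
```

Adding one more parameter with ten settings would multiply this to 1200 points, which is why pruning matters.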

## Taming a Design Space

#### Evaluate Metrics
* Is a metric necessary to this application, or can it be removed?
* Even if necessary, can it be turned into a constraint (e.g. power < 1W)?
* Do the remaining metrics have a clear precedence order, or do they require _tradeoffs_?

#### Identify parameters and prune early
* Are all parameters (and all of their settings) independent?
* Are there some that should be matched?
  * e.g. producer throughput = consumer throughput?
* Are some parameters more impactful than others?
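A sketch of early pruning: matching producer and consumer throughput and turning power into a constraint (power < 1W) shrinks the space before any expensive evaluation. The rates, clock options, and the linear power model `est_power_w` are all hypothetical.

```python
from itertools import product

# Hypothetical parameters: producer/consumer throughput (items/cycle) and clock
PRODUCER_RATES = [1, 2, 4, 8]
CONSUMER_RATES = [1, 2, 4, 8]
CLOCK_MHZ = [250, 500, 1000]

def est_power_w(rate, mhz):
    """Hypothetical power model: grows linearly with parallelism and frequency."""
    return 0.05 * rate * mhz / 250

full = list(product(PRODUCER_RATES, CONSUMER_RATES, CLOCK_MHZ))
# Prune: match producer and consumer throughput, and treat power as a constraint
pruned = [(p, c, mhz) for p, c, mhz in full
          if p == c and est_power_w(p, mhz) < 1.0]
print(len(full), len(pruned))  # 48 11
```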

## Exploring a Design Space

* Solution & approach are very application dependent
* Can be formalized as a non-convex optimization problem
* Exhaustive search (brute force) is typically intractable, at least for the full design space

#### Typical methods
* Often involve some amount of human guidance
* Classic algorithms - branch and bound, dynamic programming, randomized search
* More sophisticated algorithms - genetic/evolutionary algorithms, machine learning, ...
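A minimal randomized-search sketch: sample design points, evaluate each, and keep the best seen. The design space and the `cost` function are hypothetical; in practice each evaluation might be a synthesis run plus simulation, which is why sampling only part of the space is attractive.

```python
import random

# Hypothetical design space and cost function (lower is better)
LANES = [1, 2, 4, 8, 16]
DEPTHS = [2, 4, 8, 16, 32, 64]

def cost(lanes, depth):
    """Stand-in for an expensive evaluation: weighted runtime plus area."""
    runtime = 1000 / lanes + 200 / depth
    area = 10 * lanes + depth
    return runtime + area

def random_search(n_samples, seed=0):
    """Evaluate n_samples randomly chosen design points; return (best_point, best_cost)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        point = (rng.choice(LANES), rng.choice(DEPTHS))
        c = cost(*point)
        if best is None or c < best[1]:
            best = (point, c)
    return best

print(random_search(n_samples=10))
```

The fixed seed makes runs repeatable; more samples trade evaluation time for a better chance of finding the optimum.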

#### What about using models to more quickly evaluate points (and design space)?
* Can be helpful, but also at the mercy of model accuracy
* Generators make trying out options cheaper, so models are less necessary
* It is hard to perfectly model/predict the right sizing/design in advance, so the _agile approach_ of trying out options will give a better result

## Example - Matrix Multiplication Design Space Exploration (1/2)

#### External Parameters
* Problem size (i.e. matrix dimensions, size flexibility)
* Performance knobs
* Interface details - pipelined?

#### Internal Parameters
* Architecture - 1D, 2D, systolic, other?
* Buffers/scratchpads/caches/off-chip memory sizes & organizations
* Parallelism - number of ALUs

#### Metrics
* Power & Area
* Performance - throughput or latency
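A first-order analytic model relating the parallelism parameter to performance: an $n \times n$ by $n \times n$ multiply needs $n^3$ multiply-accumulates. The sketch assumes (optimistically) one MAC per ALU per cycle, perfect utilization, and no memory stalls.

```python
import math

def matmul_cycles(n, num_alus):
    """Compute-bound cycle estimate for an n x n by n x n matrix multiply.

    Assumes each ALU completes one multiply-accumulate per cycle with perfect
    utilization and no memory stalls (optimistic, for a coarse sweep).
    """
    macs = n ** 3
    return math.ceil(macs / num_alus)

for alus in (1, 16, 64, 256):
    print(f"{alus:4d} ALUs: {matmul_cycles(64, alus):8d} cycles")
```

Real designs fall short of this bound once memory bandwidth or buffer sizes limit how busy the ALUs stay, which is exactly what the other internal parameters control.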

## Example - Matrix Multiplication Design Space Exploration (2/2)

#### Hypothetical Process
1. Specify problem details (external parameters).
2. Identify resources available (ASIC/FPGA, off-chip memory, etc.)
3. Use analytic models to define the architectural space at a coarse granularity. For example, can everything fit on chip, or is off-chip memory needed?
4. Implement the most promising architecture. Get it working. Set up the evaluation infrastructure.
5. Sweep parameters to see tradeoffs for that architecture.
6. With insights learned, consider minor changes to that architecture or even new architectures.
7. Repeat steps 4-6 until done.
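Step 3's "can everything fit on chip?" question can be answered with a one-line analytic model. This sketch assumes all three $n \times n$ matrices (A, B, and C) are held on chip at once, and uses a hypothetical 1 MiB scratchpad with 4-byte elements.

```python
def matmul_footprint_bytes(n, bytes_per_elem):
    """Storage for A, B, and C when all three n x n matrices are held at once."""
    return 3 * n * n * bytes_per_elem

def fits_on_chip(n, bytes_per_elem, sram_bytes):
    return matmul_footprint_bytes(n, bytes_per_elem) <= sram_bytes

SRAM_BYTES = 1 << 20  # hypothetical 1 MiB of on-chip scratchpad
for n in (128, 256, 512):
    print(n, matmul_footprint_bytes(n, 4), fits_on_chip(n, 4, SRAM_BYTES))
```

Under these assumptions, 256 x 256 fits but 512 x 512 does not, so larger problems would need tiling with off-chip memory.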

## Exploring a Tradeoff with a Pareto Frontier

* _**Pareto Optimal**_ - can't improve a metric without worsening another metric
* _**Pareto Frontier**_ - set of points that are Pareto optimal

<img src="images/pareto.svg" alt="pareto tradeoffs" style="width:55%;margin-left:auto;margin-right:auto"/>
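A direct sketch of the definitions above for two metrics where lower is better: keep every point that no other point dominates. The (area, latency) design points are hypothetical, and the quadratic all-pairs check is fine for the modest number of points a sweep produces.

```python
def pareto_frontier(points):
    """Return the points not dominated by any other point (minimizing every metric).

    q dominates p if q is no worse in every metric and strictly better in at least one.
    """
    def dominates(q, p):
        return (all(qi <= pi for qi, pi in zip(q, p))
                and any(qi < pi for qi, pi in zip(q, p)))
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical design points: (area in mm^2, latency in us); lower is better for both
designs = [(1.0, 9.0), (2.0, 5.0), (2.5, 6.0), (4.0, 3.0), (5.0, 3.0), (8.0, 2.5)]
print(sorted(pareto_frontier(designs)))
# [(1.0, 9.0), (2.0, 5.0), (4.0, 3.0), (8.0, 2.5)]
```

Here (2.5, 6.0) is dominated by (2.0, 5.0), and (5.0, 3.0) by (4.0, 3.0); every remaining point improves one metric only by worsening the other.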

## Example Pareto Tradeoffs for Matrix Multiply

<img src="images/aladdin.pdf" alt="GEMM DSE" style="width:70%;margin-left:auto;margin-right:auto"/>

["Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures," Sophia Shao et al., ISCA 2014](https://ieeexplore.ieee.org/abstract/document/6853196)

## Example Pareto Tradeoffs for Core Design (w/ DVFS)

<img src="images/core-tradeoffs.pdf" alt="core tradeoffs" style="width:65%;margin-left:auto;margin-right:auto"/>

["Energy-performance tradeoffs in processor architecture and circuit design: a marginal cost analysis," Omid Azizi et al., ISCA 2010](https://dl.acm.org/doi/abs/10.1145/1816038.1815967)