# Adder Tree / MAC

B11901027 王仁軒

## Energy Efficiency

- TOPS/W drops from 0.65 to 0.33 as the input toggle rate increases
  - Tcycle = 2.6 ns, 2.30 TOPS/mm<sup>2</sup>
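Since 1 TOPS/W equals 10^12 operations per joule, the two efficiency figures can also be read as energy per operation. A minimal sketch of that conversion; the helper name `tops_per_watt_to_pj_per_op` is hypothetical and not from the slides:

```python
def tops_per_watt_to_pj_per_op(tops_per_watt: float) -> float:
    """1 TOPS/W = 1e12 op/J, so energy per operation in pJ is the reciprocal."""
    return 1.0 / tops_per_watt


# Values quoted on the slide: efficiency at the low and high ends of the toggle-rate sweep.
for label, efficiency in [("low toggle rate", 0.65), ("high toggle rate", 0.33)]:
    energy = tops_per_watt_to_pj_per_op(efficiency)
    print(f"{label}: {efficiency} TOPS/W = {energy:.2f} pJ/op")
```

Read this way, the drop from 0.65 to 0.33 TOPS/W corresponds to roughly doubling the energy spent per operation.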



### Power of All Adder Tree Stages

- The power of the first few stages depends strongly on the input toggle rate, while the other stages show no obvious dependence
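This dependence is what the usual first-order model of CMOS dynamic power would suggest. The model below is a textbook approximation, not something measured in these slides, and the symbols $\alpha$ (switching activity), $C_L$ (switched capacitance), $V_{DD}$ (supply voltage) and $f$ (clock frequency) are introduced here only for illustration:

$$
P_{\text{dyn}} \approx \alpha \, C_L \, V_{DD}^{2} \, f
$$

Under this model, the first stages see an $\alpha$ set directly by the input pattern, which would be consistent with their power tracking the input toggle rate.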



#### Input Initial Condition

- The power distribution across all stages is the same as the result for the input pattern with 50% 0's



#### Input Initial Condition

- The proportion of 0's in the first input pattern does not affect the power of the MAC



## Timing diagram?



#### Interleaved FA

- Interleave 28T FAs with 14T FAs
  - 30% smaller than the original design
  - But requires a longer cycle time (2.9 ns)
  - Consumes 18% more power on average



#### Interleaved FA

- Energy Efficiency and Area Efficiency
  - 0.26 TOPS/W, 2.97 TOPS/mm<sup>2</sup>

Interleaved FA structure consumes more power in most stages
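As a rough cross-check, the interleaved numbers can be estimated from the original design's metrics (0.33 TOPS/W, 2.30 TOPS/mm<sup>2</sup>, 2.6 ns) and the three ratios quoted for interleaving. This is only a sketch: it assumes the operations per cycle are unchanged and that the 18% power increase is measured with each design running at its own cycle time. The function name is hypothetical.

```python
BASE_CYCLE_NS = 2.6        # original design
BASE_TOPS_PER_W = 0.33
BASE_TOPS_PER_MM2 = 2.30


def scaled_efficiencies(cycle_ns, area_ratio, power_ratio):
    """Scale the baseline metrics, assuming operations per cycle stay the same."""
    throughput_ratio = BASE_CYCLE_NS / cycle_ns   # longer cycle -> lower throughput
    tops_per_w = BASE_TOPS_PER_W * throughput_ratio / power_ratio
    tops_per_mm2 = BASE_TOPS_PER_MM2 * throughput_ratio / area_ratio
    return tops_per_w, tops_per_mm2


# Interleaved design: 2.9 ns cycle, 30% smaller area, 18% more power.
est_w, est_mm2 = scaled_efficiencies(cycle_ns=2.9, area_ratio=0.70, power_ratio=1.18)
print(f"estimated: {est_w:.2f} TOPS/W, {est_mm2:.2f} TOPS/mm^2")
```

The estimate comes out at about 0.25 TOPS/W and 2.95 TOPS/mm<sup>2</sup>, close to the reported 0.26 and 2.97; the small gap is plausible given that the quoted percentages are rounded.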



# Interleaved FA: Ipeak ↓ but Iavg ↑

(Plot legend: AND, VDD1, VDD2, VDD3, VDD4, VDD5, VDD6)



- High Vth for the first few stages
  - But I don't know how to modify Vth
- Design the first few stages smaller, and the last 2 stages larger and faster
  - Alternative approach
  - May require more area



- Applied to the interleaved design
  - Energy efficiency: 0.26 -> 0.29 TOPS/W
  - Area efficiency: 2.97 -> 2.81 TOPS/mm<sup>2</sup>
- Slight improvement in energy efficiency
  - Area efficiency dropped, but not by much



- Applied to the original design
- Energy efficiency
  - 0.33 -> 0.36 TOPS/W
- Area efficiency
  - 2.30 -> 1.27 TOPS/mm<sup>2</sup>
  - Cannot lower the width any further
  - Since the minimum width is 250 nm, L must be increased to lower the drain current -> more area -> bad result

#### Adder Tree

- Sum 64 1-bit inputs with groups of 2-input adders
  - An FA has 3 degrees of freedom at its inputs, but we only use 2 of them
  - We should use the FA fully
- Carry save adder design
  - Multiple inputs
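The point about using all three FA inputs can be sketched in a few lines: a full adder is a 3:2 compressor, and applying it bitwise to three words gives a carry-save addition with no carry propagation. The function names below are illustrative, not taken from the project.

```python
from itertools import product


def full_adder(a, b, c):
    """1-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def carry_save_add(x, y, z):
    """Compress three integers into two (sum word, carry word) without propagating carries."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


# 3:2 compression holds for every input combination: a + b + c == sum + 2 * carry.
for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    assert a + b + c == s + 2 * carry

# The same identity holds word-wide, so three operands become two in a single FA delay.
s_word, c_word = carry_save_add(13, 7, 22)
assert s_word + c_word == 13 + 7 + 22
```

When an FA is used as a 2-input adder, the third input is tied off and one third of its compression capacity goes unused, which is the inefficiency the bullets describe.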

Inputs to the second stage

Result of the second stage






#### Carry Save Adder Tree

- For a regular 64b to 6b adder tree
  - Requires 120 full adders
  - Ripple carry adders propagate carries in every stage
  - Long critical path
- Carry save adder design
  - Only 67 full adders are used → 64 as CSA
  - Only needs to propagate carries in the last stage
  - Relatively short critical path
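The 120-FA figure can be reproduced by counting one FA per bit in each ripple carry adder of a binary tree. The second function is a hedged illustration of carry-save reduction: it uses a simple greedy column-by-column scheme, which is an assumption of this sketch and not the netlist behind the slides, and it models the final carry-propagate addition with plain integer arithmetic.

```python
from collections import deque
import random


def full_adder(a, b, c):
    """1-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def ripple_tree_fa_count(n_inputs=64):
    """FAs in a binary tree of ripple carry adders (n_inputs assumed a power of 2).

    Stage k has n_inputs / 2**k adders, each k bits wide, one FA per bit.
    """
    total, width, adders = 0, 1, n_inputs // 2
    while adders >= 1:
        total += adders * width
        adders //= 2
        width += 1
    return total


def csa_popcount(bits):
    """Greedy carry-save reduction; returns (sum of bits, FAs used as 3:2 compressors)."""
    columns = {0: deque(bits)}          # bit weight -> bits waiting at that weight
    fa_count, weight = 0, 0
    while weight in columns:
        col = columns[weight]
        while len(col) >= 3:            # compress three bits of equal weight
            s, carry = full_adder(col.popleft(), col.popleft(), col.popleft())
            col.append(s)               # sum stays at this weight
            columns.setdefault(weight + 1, deque()).append(carry)
            fa_count += 1
        weight += 1
    # Final carry-propagate addition, modeled with integer arithmetic.
    total = sum(bit << w for w, col in columns.items() for bit in col)
    return total, fa_count


random.seed(0)
bits = [random.randint(0, 1) for _ in range(64)]
total, csa_fas = csa_popcount(bits)
assert total == sum(bits)
print(ripple_tree_fa_count(64), csa_fas)   # prints: 120 57
```

The 57 here counts only the compressors of this particular greedy scheme and excludes the final adder, so it is not expected to match the 67 FAs (64 as CSA) reported above, which depend on how the actual stages are arranged. The qualitative conclusion is the same: far fewer FAs than the 120 of the ripple carry tree.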

#### CSA Tree Performance

- 0.45 TOPS/W
  - 36% improvement w.r.t. the original design
- 4.33 TOPS/mm<sup>2</sup>
  - 88% improvement w.r.t. the original design
- 2.3 ns cycle time
  - 15% less than the original design



## CSA Tree with interleaving

- 1<sup>st</sup> stage: 14T, 2<sup>nd</sup> stage: 28T, ...
  - But ensure the outputs are driven by 28T FAs
  - 37/67 FAs are replaced by 14T FAs



- 0.43 TOPS/W, 4.47 TOPS/mm<sup>2</sup>





## CSA Tree with inverting FA

- More than better
- Use inverting full adders to remove the redundant inverters in the FA's output stage
  - 28T -> 24T, but 14T remains 14T
  - May require extra inverters in some stages and at the output
  - Driving capability of the FA may drop because there is no buffer at the output
  - Saves some of the time consumed by the INVs, but smaller drain currents need more time to charge nodes
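Dropping the output inverters works because the full adder function is self-dual: complementing all three inputs complements both outputs. The sketch below checks this exhaustively. `inverting_full_adder` is a behavioral model assumed for illustration (an FA whose sum and carry come out complemented), not the transistor-level cell used in the design.

```python
from itertools import product


def full_adder(a, b, c):
    """1-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def inverting_full_adder(a, b, c):
    """Behavioral model of an FA without output inverters: both outputs are complemented."""
    s, carry = full_adder(a, b, c)
    return 1 - s, 1 - carry


for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    # Self-duality: complemented inputs give complemented outputs.
    assert full_adder(1 - a, 1 - b, 1 - c) == (1 - s, 1 - carry)
    # So an inverting FA fed with complemented inputs restores true polarity.
    assert inverting_full_adder(1 - a, 1 - b, 1 - c) == (s, carry)
```

Polarity therefore alternates from stage to stage for free, as long as all three inputs of an FA share the same polarity. Wherever a stage mixes true and complemented signals, or the final output lands on the complemented phase, an explicit inverter is still needed, which matches the caveat in the list above.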



#### Performance of CSA Tree w/ invFA

- 0.55 TOPS/W, 5.14 TOPS/mm<sup>2</sup>
  - 22% improvement in energy efficiency
  - 15% improvement in area efficiency
  - Cycle time = 2.3 ns
- What about the design with interleaving?
  - 0.51 TOPS/W, 5.35 TOPS/mm<sup>2</sup>
  - 19% improvement in energy efficiency
  - 19% improvement in area efficiency
  - Cycle time = 2.9 ns

# Summary of All Designs

| Adder<br>Type | Additional<br>Feature     | Energy Efficiency<br>(TOPS/W) | Area Efficiency<br>(TOPS/mm<sup>2</sup>) | Cycle Time<br>(ns) |
|---------------|---------------------------|-------------------------------|------------------------------------------|--------------------|
| Ripple Carry  | None                      | 0.33                          | 2.30                                     | 2.6                |
| Ripple Carry  | Interleaving              | 0.26                          | 2.97                                     | 2.9                |
| Ripple Carry  | Sizing                    | 0.36                          | 1.27                                     | 2.7                |
| Ripple Carry  | Interleaving<br>Sizing    | 0.29                          | 2.81                                     | 3.0                |
| Carry Save    | None                      | 0.45                          | 4.33                                     | 2.3                |
| Carry Save    | Interleaving              | 0.43                          | 4.47                                     | 2.9                |
| Carry Save    | Inverting                 | 0.55 (+66%)                   | 5.14 (+123%)                             | 2.3                |
| Carry Save    | Interleaving<br>Inverting | 0.51                          | 5.35 (+133%)                             | 2.9                |
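To compare every design against the plain ripple carry baseline, the relative gains can be recomputed from the table. This is a small sketch using the rounded table values, so the results may differ by a point or so from percentages quoted elsewhere on the slides; the names `designs` and `gain_percent` are illustrative.

```python
# (adder type, feature) -> (TOPS/W, TOPS/mm^2, cycle time in ns), copied from the table.
designs = {
    ("Ripple Carry", "None"): (0.33, 2.30, 2.6),
    ("Ripple Carry", "Interleaving"): (0.26, 2.97, 2.9),
    ("Ripple Carry", "Sizing"): (0.36, 1.27, 2.7),
    ("Ripple Carry", "Interleaving + Sizing"): (0.29, 2.81, 3.0),
    ("Carry Save", "None"): (0.45, 4.33, 2.3),
    ("Carry Save", "Interleaving"): (0.43, 4.47, 2.9),
    ("Carry Save", "Inverting"): (0.55, 5.14, 2.3),
    ("Carry Save", "Interleaving + Inverting"): (0.51, 5.35, 2.9),
}


def gain_percent(value, reference):
    """Relative change of value with respect to reference, in percent."""
    return (value / reference - 1.0) * 100.0


base_w, base_mm2, _ = designs[("Ripple Carry", "None")]
for (adder, feature), (tops_w, tops_mm2, cycle_ns) in designs.items():
    print(
        f"{adder:12s} {feature:26s} "
        f"energy {gain_percent(tops_w, base_w):+6.1f}%  "
        f"area {gain_percent(tops_mm2, base_mm2):+6.1f}%  "
        f"cycle {cycle_ns} ns"
    )
```

Swapping the reference to the plain carry save row gives the stage-to-stage gains instead of the gains over the original design.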

#### Summary

- Ripple carry adder tree
  - Longer delay, larger area, and higher power
  - Easy to implement and debug
  - Compatible with input size scaling
- Carry save adder tree
  - Shorter delay, smaller area, and lower power
  - Hard to implement and debug
  - Cannot easily be made compatible with input size scaling (I don't know how to)
- Maybe even better? I tried, but the performance was poor
  - Use abc + yosys to further optimize the adder tree

#### Summary

- Standard (do nothing)
  - Old friends are always reliable
- Interleaving
  - Energy efficiency + Speed <-> Area efficiency
- Vth scaling / transistor sizing
  - As pathetic as my electric circuit design grade
- Inverting Full Adders
  - Increases both energy and area efficiency
  - Very very good idea