# EECS 151/251A Discussion 9

Zhaokai Liu

## Agenda

#### Adders

- Single-bit Full Adder
- Ripple-carry Adder
- Carry-bypass Adder
- Carry-lookahead Adder
- CLA Trees

#### Multipliers

- Array Multiplier w/o and w/ CSA
- Wallace Tree
- Booth Recording
- o Baugh-Wooley Multiplication

## Adders

## Single-bit Full Adder



$$S = A \oplus B \oplus C_i$$

$$S = A \overline{B} \overline{C_i} + \overline{A} B \overline{C_i} + \overline{A} \overline{B} C_i + A B C_i$$

$$C_o = AB + BC_i + AC_i$$

| a b c <sub>i  </sub> | C <sub>i+1</sub> | s  |
|----------------------|------------------|----|
| 000                  | 0                | 0  |
| 001                  | 0                | 1_ |
| 010                  | 0                | 1  |
| 011                  | 1                | 0  |
| 100                  | 0                | 1  |
| 101                  | 1                | 0  |
| 110                  | 1                | 0  |
| 111                  | 1                | 1  |

- A full adder implements a single-bit adder with carry in
- A half adder doesn't have a carry in, but still has a carry out
- The full adder is the primitive used in many adder topologies

### Static CMOS Full Adder

Direct mapping of logic function

→ A better structure: The mirror adder





## Ripple-carry Adder

For a 1-bit added, assume: t\_sum > t\_carrier



What's the critical path here?

## Carry-bypass Adder

N inputs/M bits in each block







What's the critical path here?

M bits

## Carry-select Adder (1/2)

N inputs/M bits in each block



What's the critical path here?

## Carry-select Adder (1/2)

N inputs/M bits in the first stage, P stages



What's the critical path here?

## **Quick Aside: Associativity**

- An operator, #, is associative iff: (a # b) # c = a # (b # c)
- Addition\*, multiplication, AND, OR, XOR, are associative
- Allows for tree computation
- Ex. a + b + c + d + e + f + g + h



- Problem: carry logic not associative -> linear FA chain
- Solution: re-define FAs to generate 2 new signals
  - o **g** (Generate): True if adder is guaranteed to **generate a carry**

$$g_i = a_i \cdot b_i$$

o **p** (Propagate): True if carry-out equals carry-in (**propagate carry-in**)

Both g & p have no dependence on carry-in (c\_i)

- Sum & carry-out of FA defined in terms of these new signals
  - Sum is true if:
    - A single input is true, carry-in is false
    - o Inputs are both 0 or 1, carry-in is true

- Carry-out is true if:
- Carry generate is true
- Propagate is true and carry-in is true

$$c_{i+1} = g_i + p_i \cdot c_i$$

However, sum and carry-out depend on carry-in

- Problem: carry logic not associative -> linear FA chain
- Solution: re-define FAs to generate 2 new signals
  - o **g** (Generate): True if adder is guaranteed to **generate a carry**

$$g_i = a_i \cdot b_i$$

o **p** (Propagate): True if carry-out equals carry-in (**propagate carry-in**)

Both g & p have no dependence on carry-in (c\_i)

- Sum & carry-out of FA defined in terms of these new signals
  - Sum is true if:
    - A single input is true, carry-in is false
    - o Inputs are both 0 or 1, carry-in is true

- Carry-out is true if:
- Carry generate is true
- Propagate is true and carry-in is true

$$c_{i+1} = g_i + p_i \cdot c_i$$

However, sum and carry-out depend on carry-in

#### **CLA: Tree Structure**

- Smallest blocks are modified full adders
- Calculate g and p immediately
- Wait for carry-in to compute sum bit
- Some FAs are required to create carry-out





## **CLA:** Grouping

- Group together adders and create P & G for higher levels of the hierarchy.
  - P = entire group propagates a carry
  - G = entire group generates a carry
  - o P & G can be computed without carry-in
- Carry-in required to generate carry-out
  - $\circ$  P = P\_A•P\_B
  - $\circ$  G = G\_B + G\_A•P\_B
  - $\circ$  C\_{out} = G + C\_{in}-P







#### Parallel Prefix Adder

- Remaining problem: CLA as described still ripples carry through groups in first layer of tree
- Solution: unroll the expression for the carry bit

```
o c_0 = 0 (unsigned)

o c_1 = g_0 + p_0 \cdot c_0 = g_0

o c_2 = g_1 + p_1 \cdot c_1 = g_1 + p_1 g_0

o c_3 = g_2 + p_2 \cdot c_2 = g_2 + p_2 g_1 + p_2 p_1 g_0

o c_4 = g_3 + p_3 \cdot c_3 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0

o Recall p's and g's can be computed in parallel (not dependent on carry-in)

o These operations are associative -> "prefix tree" (parallel) computation!
```

#### Overall flow:

Break into group of bits -> Precalculate P<sub>i</sub>, G<sub>i</sub> in each group -> combine the groups in a tree structure-> calculate carries in parallel -> simple full adder to generate sum

## Prefix Tree Adder Graphs: 3-bit Koggle-Stone adder



## **Prefix Tree Adder Graphs**



Min. critical path & fanout Cost: more logic resources



Tradeoff logic reuse w/ critical path



Most reuse (min. area) Cost: longest critical path





## "A Taxonomy of Parallel Prefix Networks", Harris [2003]



# Multipliers

## **Unsigned Multiplication Example**

- Partial Products can be generated in parallel
- Challenge: improve the addition of partial products





What's the critical path here?

## **Carry-Save Addition**

- When we generate a carry in a given column, add it to the 2 values in the next column.
  - S\_i = A\_i ^ B\_i ^ C\_i
  - C\_{o, i+1} = A\_i B\_i + A\_i C\_i + B\_i C\_i
  - Delay adding carry bits until the end
- Basis of CSA:
  - Takes in a, b, c<sub>in</sub> (multi-bit)
  - Produces a sum and c<sub>out</sub> (multi-bit)
- Benefits:
  - CSAs have no carry ripple => small & fast!
  - Only 1 standard CLA/PPA at end
  - Addition is associative => trees!







## Array Multiplier w/ CSA



## Wallace Tree Multiplier

#### Method to construct Wallace Tree:

- Draw a dot diagram where each column has as many dots as number of partial products
- 2. Group dots in the same column by 2 (half adder) or 3 (full adder)
- Propagate carries and sum by adding one dot in the grouped column and one dot in the next column









## Radix and Multiplication

- Binary multiplication -> N partial products! Can we reduce this?
  - Yes! Let's use a larger radix (think: base)
- E.g. 2 bits at a time (radix 4) -> halve number of partial products

| B Digit | Partial Product | Partial Product (Rewritten) |
|---------|-----------------|-----------------------------|
| 0       | 0*A             | 0                           |
| 1       | 1*A             | A                           |
| 2       | 2*A             | 4*A - 2*A                   |
| 3       | 3*A             | 4*A - A                     |

- Recall: Multiplications by powers of 2 are left shifts
  - Let's use this property!

## **Booth Recoding**

- 4\*A = A << 2
- 2\*A = A << 1
- Recall: radix 4 multiplication => shift left by 2 positions for next partial product
- Therefore, any 4\*A term can be handled in the next partial product!
  - Multiplier looks a 3 (rather than just 2) bits
  - Extra bit is MSB of the previous

| B Digit |     | Partial Product<br>(Rewritten) |
|---------|-----|--------------------------------|
| 0       | 0*A | 0                              |
| 1       | 1*A | A                              |
| 2       | 2*A | 4*A - 2*A                      |
| 3       | 3*A | 4*A - A                        |

## **Booth Recoding**

| B <sub>i+1</sub> | B <sub>i</sub> | B <sub>i-1</sub> | Action  | Comment                                                                                                                |
|------------------|----------------|------------------|---------|------------------------------------------------------------------------------------------------------------------------|
| 0                | 0              | 0                | Add 0   |                                                                                                                        |
| 0                | 0              | 1                | Add A   | Includes +4*A from previous radix 4 digit = +A in this position due to left shift by 2                                 |
| 0                | 1              | 0                | Add A   |                                                                                                                        |
| 0                | 1              | 1                | Add 2*A | Includes +4*A from previous round (+A in this position). *2 is implemented as a left shift by 1                        |
| 1                | 0              | 0                | Sub 2*A | 4*A will be added in when handling next radix 4 digit. *2 is implemented as a left shift by 1                          |
| 1                | 0              | 1                | Sub A   | 4*A will be added in when handling next radix 4 digit. Includes +4*A from previous radix 4 digit (+A in this position) |
| 1                | 1              | 0                | Sub A   | 4*A will be added in when handling next radix 4 digit.                                                                 |
| 1                | 1              | 1                | Add 0   | 4*A will be added in when handling next radix 4 digit. Includes +4*A from previous radix 4 digit (+A in this position) |

## **Booth Recoding Example (Unsigned)**

```
6 * 7
```

• 
$$B_{-1} = 0$$



## Signed Multiplication: Baugh-Wooley

- Recall: 2's complement MSB has negative weight
- Nominally:
  - 1. Subtract last partial product
  - Sign-extend the rest of the partial products
- Recall 2's complement negation: -A = ~A + 1
  - Add ~A + 1 instead! Result: basically same hardware as unsigned mult.
  - o Implementation: invert some bits, insert a 1 left of the first & last partial products