### **Multiplication in binary**

Multiplying in binary follows the same form as in decimal:



- Note that the product P is composed purely of selecting, shifting and adding A. The ith column of B indicates whether or not a shifted version of A is to be selected or not in the ith row of the sum.
- So we can perform multiplication using just full adders and a little logic for selection, in a layout which performs the shifting.



### Тор

#### **Multiplication in decimal**

Starting with an example in decimal:

| 214          |
|--------------|
| x45          |
| 1070         |
| <b>+8560</b> |
| 9630         |

Note that we do  $214 \times 5 = 1070$  and then add to it the result of  $214 \times 4 = 856$  right-shifted by one column.

For each additional column in the second operand, we shift the multiplication of that column with the first operand by another place.

xaaaa bbbb +cccc0 +dddd00 +eeee000 etc...

Developed by: www. steepestascen .com

### Structure for multiplication

This figure shows a four-bit multiplication:



- The AND gate connected to a and b performs the selection for each bit. The diagonal structure of the multiplier implicitly inserts zeros in the appropriate columns and shifts the a operands to the right.
- Note that this structure does not work for signed two's complement!



Note the function of the simple AND gate.

The operation of multiplying 1's and 0's is the same AND 1's and 0's

| A | В | Z |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

 $Z = A \times B$  (where X = M and B = AB) or in Boolean algebra Z = A and B = AB

Hence the AND gate is the bit multiplier. The function of one partial product stage of the multiplier is as shown below.





.com

## Xilinx Virtex-II Pro multiplication

3.31



Picture of Xilinx-II Pro slice (upper half) taken from "Virtex-II Pro Platform FPGAs: Introduction and Overview", DS083-1 (v2.4) January 20, 2003. http://www.xilinx.com

LUT implements the XOR of two ANDs:



The dedicated MULTAND unit is required as the intermediate product **G1G2** cannot be obtained from within the LUT, but is required as an input to MUXCY. The two AND gates perform a one-bit multiply each, and the result is added by the XOR plus the external logic (MUXCY, XORG):

$$S_{out} = C_{IN} \times OD$$
,  $C_{OUT} = DAB + C_{IN} \overline{D}$ 

This structure will perform one cell (see below) of the multiplier.

## Xilinx Virtex-II Pro multiplication (II)

Multiplier cells as above can be chained to do bigger multiplies:



Тор

The first half of the truth table for  $\mathbf{Y}$  and  $\mathbf{C}_{\mathbf{OUT}}$  (from Slide 3.31):

| G1 (B <sub>0</sub> ) | G2 (A <sub>1</sub> ) | G3 (B <sub>1</sub> ) | <b>G4</b> (A <sub>0</sub> ) | D | C <sub>IN</sub> | Y | C <sub>OUT</sub> |
|----------------------|----------------------|----------------------|-----------------------------|---|-----------------|---|------------------|
| 0                    | 0                    | 0                    | 0                           | 0 | 0               | 0 | 0                |
| 0                    | 0                    | 0                    | 1                           | 0 | 0               | 0 | 0                |
| 0                    | 0                    | 1                    | 0                           | 0 | 0               | 0 | 0                |
| 0                    | 0                    | 1                    | 1                           | 1 | 0               | 1 | 0                |
| 0                    | 1                    | 0                    | 0                           | 0 | 0               | 0 | 0                |
| 0                    | 1                    | 0                    | 1                           | 0 | 0               | 0 | 0                |
| 0                    | 1                    | 1                    | 0                           | 0 | 0               | 0 | 0                |
| 0                    | 1                    | 1                    | 1                           | 1 | 0               | 1 | 0                |
| 1                    | 0                    | 0                    | 0                           | 0 | 0               | 0 | 0                |
| 1                    | 0                    | 0                    | 1                           | 0 | 0               | 0 | 0                |
| 1                    | 0                    | 1                    | 0                           | 0 | 0               | 0 | 0                |
| 1                    | 0                    | 1                    | 1                           | 1 | 0               | 1 | 0                |
| 1                    | 1                    | 0                    | 0                           | 1 | 0               | 1 | 0                |
| 1                    | 1                    | 0                    | 1                           | 1 | 0               | 1 | 0                |
| 1                    | 1                    | 1                    | 0                           | 1 | 0               | 1 | 0                |
| 1                    | 1                    | 1                    | 1                           | 0 | 0               | 0 | 1                |

.com

# Xilinx Virtex-II Pro multiplication (VI)

- As we can do one bit of a multiply in a slice, we can do an N-bit by 2-bit multiply in N/2 slices. In the example above, we have 4-bit by 2-bit in 2 slices.
- Perhaps the most important thing to note is that this is very complicated!
- Tools are designed to automate the process of connecting the components within a slice in order to perform efficient operations.
- But it is important to note that the tools aren't infinitely clever, and sometimes we need to bear in mind the structure of the FPGA in order to generate an efficient design.



Тор

The second half of the truth table for  $\mathbf{Y}$  and  $\mathbf{C}_{\mathbf{OUT}}$  (from Slide 3.31):

| G1 (B <sub>0</sub> ) | G2 (A <sub>1</sub> ) | G3 (B <sub>1</sub> ) | G4 (A <sub>0</sub> ) | D | C <sub>IN</sub> | Y | C <sub>OUT</sub> |
|----------------------|----------------------|----------------------|----------------------|---|-----------------|---|------------------|
| 0                    | 0                    | 0                    | 0                    | 0 | 1               | 1 | 0                |
| 0                    | 0                    | 0                    | 1                    | 0 | 1               | 1 | 0                |
| 0                    | 0                    | 1                    | 0                    | 0 | 1               | 1 | 0                |
| 0                    | 0                    | 1                    | 1                    | 1 | 1               | 0 | 1                |
| 0                    | 1                    | 0                    | 0                    | 0 | 1               | 1 | 0                |
| 0                    | 1                    | 0                    | 1                    | 0 | 1               | 1 | 0                |
| 0                    | 1                    | 1                    | 0                    | 0 | 1               | 1 | 0                |
| 0                    | 1                    | 1                    | 1                    | 1 | 1               | 0 | 1                |
| 1                    | 0                    | 0                    | 0                    | 0 | 1               | 1 | 0                |
| 1                    | 0                    | 0                    | 1                    | 0 | 1               | 1 | 0                |
| 1                    | 0                    | 1                    | 0                    | 0 | 1               | 1 | 0                |
| 1                    | 0                    | 1                    | 1                    | 1 | 1               | 0 | 1                |
| 1                    | 1                    | 0                    | 0                    | 1 | 1               | 0 | 1                |
| 1                    | 1                    | 0                    | 1                    | 1 | 1               | 0 | 1                |
| 1                    | 1                    | 1                    | 0                    | 1 | 1               | 0 | 1                |
| 1                    | 1                    | 1                    | 1                    | 0 | 1               | 1 | 1                |

### **ROM-based multipliers**

- Just as logical functions such as XOR can be stored in a LUT as shown for addition, we can use storage-based methods to do other operations.
- By using a ROM, we can store the result of every possible multiplication of two operands.
- The two operands are concatenated to be used as the address by which to access the ROM.
- The value stored at that address is the multiplication result:







There is one serious problem with this technique: as the operand size grows, the ROM size grows exponentially. For two N bit input operands (therefore an 2N bit output operand)  $2N2^{2N}$  bits of storage are required. For example, with 8 bits operands (a fairly reasonable) size, 1Mbit of storage is required - a large quantity. For bigger operands e.g. 16 bits, a huge quantity of storage is required. 16 bit operands require 128Gbits of storage!

Developed by: www. steepestasc

### **Constant ROM-based multipliers**

 Consider a ROM multiplier with 8 bit inputs: 65,536 8-bit locations are required



• If input **B** is constant and B = k only 256 locations are accessed



This constitutes a Constant Coefficient Multiplier (KCM)



In the above example, 8-bit input B is fixed to one value. Which means that in total only 256 out of a total of 65,536 locations are accessed. Therefore, when one of the inputs of the ROM-based multiplier is fixed the size of the required ROM can be reduced.

It is also possible to reduce the memory requirements of this structure if additional knowledge of the constant value is available. For example, if the value of B is 10, the maximum output required for any 8-bit input A will be  $-128 \times 10 = -1280$ , which can be represented with 12 bits.

# **Constant Coefficient Multiplier (KCM)**

- ROM-based multipliers with a constant input
- This reduces the size of the required ROM
- Further reductions in size requirement can be made if there is knowledge of the constant value



### 2's complement Multiplication

 For one negative and one positive operand just remember to sign extend the negative operand.





### Тор

#### 2's complement multiplication (II)

For both operands negative, subtract the last partial product.

We use the trick of inverting (negating and adding 1) the last partial product and adding it rather than subtracting.



Of course, if both operands are positive, just use the unsigned technique!

The difference between signed and unsigned multiplies results in different hardware being necessary. DSP processors typically have separate unsigned and signed multiply instructions.

Developed by: www. steepestascent .com

### 3.38

### **Fixed Point multiplication**

 Fixed point multiplication is no more awkward than integer multiplication:

| 11010.110                |            |
|--------------------------|------------|
| <u>x00101.101</u>        | 26.750     |
| 11.010110                | x5.625     |
| 000.00000<br>1101.011000 | 0.133750   |
| 11010.11000              | 0.535000   |
| 000000.00000             | 16.050000  |
| 1101011.000000           | 133.750000 |
| 00000000.00000           | 150.468750 |
| 0010010110.011110        |            |

 Again we just need to remember to interpret the position of the binary point correctly.



## **On-chip multipliers**

- The Xilinx Virtex-II Pro FPGA has a set of "on-chip" multipliers.
- These are in hardware on the ASIC, not actually in the user FPGA area, and therefore are permanently available, and they use no slices. They also consume less power than a slice-based equivalent.



- A and B are 18-bit input operands, and P is the 36-bit product  $P = A \times B$ .
- Depending upon the particular device, between 12 and 512 of these dedicated multipliers are available.

