# SELF PROJECT Report on

# LUT Optimization for Memory-Based Computation

By: Yashawant Kumar Singh(213070099)

Along with the progressive device scaling, semiconductor memory has become cheaper, faster, and more power-efficient. Embedded memories will have dominating presence in the system on-chips (SoCs), which may exceed 90% of the total SoC content. It has also been found that the transistor packing density of memory components is not only higher but also increasing much faster than those of logic components. Because of these reasons, Memory based computations are rapidly gaining in popularity .

Apart from that, memory-based computing structures are more regular than the multiply- accumulate structures i.e easy to lay out and offer many other advantages, e.g., greater potential for high-throughput and low-latency implementation and less dynamic power consumption. This kind of computation is really helpful for DSP applications which involve multiplication with a fixed set of coefficients.

A conventional lookup-table (LUT)-based multiplier is shown in Fig. 1, where A is a fixed coefficient, and X is an input word to be multiplied with A. Assuming X to be a positive binary number of word length L, there can be  $2^L$  possible values of X, and accordingly, there can be  $2^L$  possible values of product  $C = A \cdot X$ .

$$X \xrightarrow{L} \overbrace{\bigcirc}^{SSL} \underbrace{\bigcirc}_{(2^L \text{ WORDS})} \xrightarrow{AX} \underbrace{\bigcirc}^{(W+L)} AX$$

Thus an LUT of  $2^L$  words, consisting of precomputed product values corresponding to all possible values of X, is conventionally used. The product word  $A \cdot Xi$  is stored at the location Xi for  $0 \le Xi \le 2^L - 1$ , such that if an L-bit binary value of Xi is used as the address for the LUT, then the corresponding product value  $A \cdot Xi$  is available as its output.

In this project, using combined techniques of APC (Antisymmetric Product Coding) and OMS (Odd Multiple Storage) scheme, a reduction in LUT size to one-fourth of the conventional LUT is achieved where APC provides reduction by 2 and OMS provides a further reduction by 2.

## **APC Technique**

 $\label{eq:table_interpolation} \text{TABLE I}$  APC Words for Different Input Values for L=5

| Input, $X$ | product<br>values | Input, X | product<br>values | address $x_3'x_2'x_1'x_0'$ | APC<br>words |
|------------|-------------------|----------|-------------------|----------------------------|--------------|
| 00001      | A                 | 11111    | 31A               | 1 1 1 1                    | 15A          |
| 00010      | 2A                | 11110    | 30A               | 1 1 1 0                    | 14A          |
| 00011      | 3A                | 11101    | 29A               | 1 1 0 1                    | 13A          |
| 00100      | 4A                | 11100    | 28A               | 1 1 0 0                    | 12A          |
| 00101      | 5A                | 11011    | 27A               | 1 0 1 1                    | 11A          |
| 00110      | 6A                | 11010    | 26A               | 1 0 1 0                    | 10A          |
| 00111      | 7A                | 11001    | 25A               | 1 0 0 1                    | 9A           |
| 01000      | 8A                | 11000    | 24A               | 1 0 0 0                    | 8A           |
| 01001      | 9A                | 10111    | 23A               | 0 1 1 1                    | 7A           |
| 01010      | 10A               | 10110    | 22A               | 0 1 1 0                    | 6A           |
| 01011      | 11A               | 10101    | 21A               | 0 1 0 1                    | 5A           |
| 01100      | 12A               | 10100    | 20A               | 0 1 0 0                    | 4A           |
| 01101      | 13A               | 10011    | 19A               | 0 0 1 1                    | 3A           |
| 01110      | 14A               | 10010    | 18 <i>A</i>       | 0 0 1 0                    | 2A           |
| 01111      | 15A               | 10001    | 17A               | 0 0 0 1                    | A            |
| 10000      | 16A               | 10000    | 16A               | 0 0 0 0                    | 0            |

For  $X = (0\ 0\ 0\ 0\ 0)$ , the encoded word to be stored is 16A.

For simplicity of presentation, we assume both X and A to be positive integers. The product words for different values of X for L = 5 are shown in Table I. It may be observed in this table that the input word X on the first column of each row is the two's complement of that on the third column of the same row. In addition, the sum of product values corresponding to these two input values on the same row is 32A.

Let the product values on the second and fourth columns of a row be u and v, respec tively. Since

one can write u = [(u + v)/2 - (v - u)/2] and v = [(u + v)/2 + (v - u)/2], for (u + v) = 32A, we can have

$$u = 16A - \left[\frac{v-u}{2}\right] \quad v = 16A + \left[\frac{v-u}{2}\right].$$

The product values on the second and fourth columns of Table I therefore have a negative mirror symmetry. This behavior of the product words can be used to reduce the LUT size, where, instead of storing u and v, only [(v-u)/2] is stored for a pair of input on a given row. The 4-bit LUT addresses and corresponding coded words are listed on the fifth and sixth columns of the table, respectively. Since the representation of the product is derived from the antisymmetric behavior of the products, we can name it as antisymmetric product code(APC).

4-bit address  $X' = (x_3' x_2' x_1' x_0')$  of the APC word is given by

$$X' = \begin{cases} X_L, & \text{if } x_4 = 1\\ X'_L, & \text{if } x_4 = 0 \end{cases}$$
 (2)

Where  $X_L$  is 2's complement of  $X_L$  and  $X_L$  is the 4 LSBs of X.

Product word =  $16A + (\text{sign value}) \times (\text{APC word})$  (3)

where sign value = 1 for  $x_4 = 1$  and sign value = -1 for  $x_4 = 0$ . The product value for X = (10000) corresponds to APC value "zero," which could be derived by resetting the LUT output, instead of storing that in the LUT.

### **OMS Technique**

 ${\it TABLE~II} \\ {\it OMS-Based Design of the LUT of APC Words for } L=5 \\$ 

| $ \begin{array}{c} \text{input } X' \\ x_3' x_2' x_1' x_0' \end{array} $ | product<br>value | # of<br>shifts | shifted input, $X''$ | stored APC<br>word | address $d_3d_2d_1d_0$ |
|--------------------------------------------------------------------------|------------------|----------------|----------------------|--------------------|------------------------|
| 0 0 0 1                                                                  | A                | 0              |                      |                    |                        |
| 0 0 1 0                                                                  | $2 \times A$     | 1              | 0001                 | P0 = A             | 0 0 0 0                |
| 0 1 0 0                                                                  | $4 \times A$     | 2              | 0001                 |                    |                        |
| 1 0 0 0                                                                  | $8 \times A$     | 3              |                      |                    |                        |
| 0 0 1 1                                                                  | 3A               | 0              |                      |                    |                        |
| 0 1 1 0                                                                  | $2 \times 3A$    | 1              | 0 0 1 1              | P1 = 3A            | 0 0 0 1                |
| 1 1 0 0                                                                  | $4 \times 3A$    | 2              |                      |                    |                        |
| 0 1 0 1                                                                  | 5A               | 0              | 0101                 | P2 = 5A            | 0 0 1 0                |
| 1 0 1 0                                                                  | $2 \times 5A$    | 1              | 0101                 | 1 2 = 011          | 0010                   |
| 0 1 1 1                                                                  | 7A               | 0              | 0 1 1 1              | P3 = 7A            | 0011                   |
| 1 1 1 0                                                                  | $2 \times 7A$    | 1              | 0 1 1 1              | IJ = IA            | 0011                   |
| 1 0 0 1                                                                  | 9A               | 0              | 1001                 | P4 = 9A            | 0 1 0 0                |
| 1 0 1 1                                                                  | 11A              | 0              | 1011                 | P5 = 11A           | 0 1 0 1                |
| 1 1 0 1                                                                  | 13A              | 0              | 1 1 0 1              | P6 = 13A           | 0 1 1 0                |
| 1 1 1 1                                                                  | 15A              | 0              | 1111                 | P7 = 15A           | 0 1 1 1                |

From this table, its clear that it is sufficient to store only the odd multiple APC words as the even multiples can be obtained by left shifting accordingly as given in the table.

Product word = 
$$16A + (\text{sign value}) \times (\text{APC word})$$

We want to have product values as 0 and 32 A as well. So, need to store APC word 16 A. However, if 16A is not derived from A, only a maximum of three left shifts is required to obtain all other even multiples of A.

TABLE III  ${\it Products \ and \ Encoded \ Words \ for \ } X = (00000) \ {\it and \ } (10000)$ 

| $x_4$ |   | $x_2$ |   | $x_0$ | product<br>values | encoded<br>word | stored<br>values | # of<br>shifts | $\begin{array}{c} \text{address} \\ d_3 d_2 d_1 d_0 \end{array}$ |
|-------|---|-------|---|-------|-------------------|-----------------|------------------|----------------|------------------------------------------------------------------|
| 1     | 0 | 0     | 0 | 0     | 16A               | 0               |                  |                |                                                                  |
| 0     | 0 | 0     | 0 | 0     | 0                 | 16A             | 2A               | 3              | 1 0 0 0                                                          |

A maximum of three bit shifts can be implemented by a two-stage logarithmic barrel shifter, but the implementation of four shifts requires a three-stage barrel shifter. Therefore, it would be a more efficient strategy to store 2A for input X = (00000), so that the product 16A can be derived by three arithmetic left shifts. The product values and encoded words for

input words X = (00000) and (10000) are separately shown in Table III. For X = (00000), the desired encoded word 16A is derived by 3-bit left shifts of 2A [stored at address (1000)]. For X = (10000), the APC word "0" is derived by resetting the LUT output, by an active-high RESET signal given by RESET = (x0 + x1 + x2 + x3)" · x4.

So, the possible product values we

can get: 0,A,2A,....,31A.

For the mappings, refer to the Paper.

#### **Results:**



We can observe that the outputs are as expected.