# NUMERIC BASICS

3

One of the most common kinds of information processed by digital systems is numeric information. In this chapter, we will examine various binary codes for unsigned integers, signed integers, fixed-point fractions and floating-point real numbers. For each kind of code, we will describe how some arithmetic operations can be performed. We will also look at combinational circuits that implement arithmetic operations, and discuss trade-offs among different circuits that perform the same operation.

### 3.1 UNSIGNED INTEGERS

In many applications of digital electronics, we deal with signals that only take on nonnegative integer values. Some signals may be representations of real-world information, for example, the temperature set on a thermostat. Other signals may arise as a consequence of the way we organize the digital system, for example, as numeric indices for tables of information stored in the system's memory. In this section, we start with the most common representation for nonnegative integers, then describe arithmetic operations using that representation. We will finish the section by looking at an alternative representation that is used in some systems.

#### 3.1.1 CODING UNSIGNED INTEGERS

We are all familiar with decimal positional representation of numbers. A decimal number such as  $124_{10}$  denotes the sum of 1 hundred, 2 tens and 4 units. We use the subscript notation to specify that the number is to be interpreted as decimal, that is, base 10. The position of each digit in the number determines the power of 10 by which the digit is multiplied, starting with  $10^0$  for the right-most digit,  $10^1$  for the next digit to the left, and increasing by successive powers of ten for further digits from right to left. Thus, we write

$$124_{10} = 1 \times 10^2 + 2 \times 10^1 + 4 \times 10^0$$

In most applications that deal with nonnegative integers, the natural way to represent the numeric values is using *unsigned binary* numbers. Unsigned binary representation works in the same way as decimal representation, except that we only use the binary digits 0 and 1 and we multiply digits by powers of 2 instead of powers of 10. We can represent the same numeric value as  $124_{10}$  in binary by determining the powers of two that sum to the number, namely,

$$124_{10} = 1 \times 2^6 + 1 \times 2^5 + 1 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 0 \times 2^0$$
$$= 1111100_2$$

So, to represent this number in a digital system, we would need seven single-bit signals, each carrying one bit of the binary number. In general, we represent a number x using n bits  $x_{n-1}, x_{n-2}, \ldots, x_0$ , with

$$x = x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \dots + x_02^0$$

EXAMPLE 3.1 What number is represented by the unsigned binary number 101101<sub>2</sub>?

SOLUTION Express the number as a sum of powers of two and calculate the result:

$$\begin{aligned} 101101_2 &= 1 \times 2^5 + 0 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 \\ &= 1 \times 32 + 0 \times 16 + 1 \times 8 + 1 \times 4 + 0 \times 2 + 1 \times 1 \\ &= 45_{10} \end{aligned}$$

Our discussion of binary codes in Section 2.2 applies equally to unsigned binary representation of numbers, since that is just one particular binary code. Thus, given an n-bit unsigned binary code, we can represent  $2^n$  distinct numbers. The smallest number has all 0 bits, representing the number 0, and the largest number has all 1 bits, representing

$$1 \times 2^{n-1} + 1 \times 2^{n-2} + \dots + 1 \times 2^1 + 1 \times 2^0 = 2^n - 1$$

Conversely, if we need to represent numbers between 0 and N-1, we need at least  $\lceil \log_2 N \rceil$  bits for the unsigned binary representation. In computer systems, unsigned binary numbers are typically 8, 16 or 32 bits long, allowing representation of numbers up to 256, over 65,000, and over 4 billion, respectively. However, when we are designing a digital system with no other constraints applied to the number of bits, we would typically choose the smallest number of bits that can represent the range of numbers we expect to encode. There is no reason why this should not be a number of bits other than 8, 16 or 32, such as 5, 17 or 26.

EXAMPLE 3.2 Suppose we are designing a scientific instrument to measure the time interval between two random events very precisely, with a resolution of nanoseconds ( $1ns = 10^{-9}$  seconds). Events may occur as much as a day apart. How many bits are needed to represent the interval as a number of nanoseconds?

SOLUTION There are  $10^9$  nanoseconds per second, and  $60 \times 60 \times 24 = 86,400$  seconds per day, so the largest number we need to allow for is  $8.64 \times 10^{13}$ . The number of bits needed is

$$\lceil \log_2(8.64 \times 10^{13}) \rceil = \left\lceil \frac{\log(8.64 \times 10^{13})}{\log 2} \right\rceil = \lceil 46.296 \dots \rceil = 47$$

So at least 47 bits are needed.

### Unsigned Integers in Verilog

We saw in Section 2.1.3 that we can use vectors to model binary coded data. Since unsigned binary is just one form of binary code, we can use vectors for numeric data also, specifying ranges of index values for nets, variables and ports, and using indexing to refer to individual bits. When we look at arithmetic operations on unsigned integers, we will see how they can be modeled in Verilog as operations on vectors.

EXAMPLE 3.3 Develop a Verilog model of a 4-to-1 multiplexer that selects among four unsigned 6-bit integers.

SOLUTION The module definition is

This is much the same as the multiplexer model that we saw in Section 2.3.2. The input ports a0 through a3 and the output port z are all 6-bit unsigned vectors, indexed from 5 down to 0. We choose this index range so that the index of each bit in a vector corresponds to the power of its binary weight. The input port sel, used to select among the inputs, is also a vector, though we are not interpreting it as representing a number.

#### Octal and Hexadecimal Codes

We have seen that we need at least approximately  $\log_2 N$  bits to represent the number N in unsigned binary form. The same number is represented in decimal with approximately  $\log_{10} N$  digits. Now

$$\log_2 N = \log_{10} N / \log_{10} 2 = \log_{10} N / 0.301 \dots = \log_{10} N \times 3.32 \dots$$

In other words, we need more than three times as many binary digits as decimal digits to represent a given number. While that is not necessarily a problem in terms of the digital system, it is cumbersome and error prone for us to write down and read the long strings of bits required for large numbers. For this reason, we often use *hexadecimal* (base 16) or, less commonly, *octal* (base 8) for those purposes. We will show how these representations work first, then discuss the advantages of using them.

Octal is just another form of positional number system, except that we use the digits 0 through 7 and multiply them by powers of 8 depending on their position. Thus, for example,

$$253_8 = 2 \times 8^2 + 5 \times 8^1 + 3 \times 8^0$$
$$= 2 \times 64 + 5 \times 8 + 3 \times 1$$
$$= 128 + 40 + 3 = 171_{10}$$

More important, for a given octal number, we can factor out powers of two in each digit and so very quickly determine the binary representation of the same number. For example,

$$\begin{split} 253_8 &= 2 \times 8^2 + 5 \times 8^1 + 3 \times 8^0 \\ &= (0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0) \times 8^2 + (1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0) \times 8^1 \\ &\quad + (0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0) \times 8^0 \\ &= (0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0) \times 2^6 + (1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0) \times 2^3 \\ &\quad + (0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0) \times 2^0 \\ &= (0 \times 2^8 + 1 \times 2^7 + 0 \times 2^6) + (1 \times 2^5 + 0 \times 2^4 + 1 \times 2^3) \\ &\quad + (0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0) \\ &= 010101011_2 \end{split}$$

In general, given an octal number, we can replace each digit with the corresponding three binary digits to give the unsigned binary represen-

tation of the number. The three-bit patterns corresponding to the octal digits are

Note that we need to take care when using an octal number for an unsigned binary code if the code is not a multiple of three in length. We need to understand or specify explicitly how long the binary code is and drop unused bits from the left when converting from octal. For example, had we specified that the number 2538 stood for an 8-bit binary number, we would have dropped the left-most bit to get 10101011<sub>2</sub>. If any of the bits we drop from the left are 1 rather than 0, the octal number is greater than the largest number that can be encoded in the given number of bits. Usually, this is considered an error.

We can also work in the reverse direction from an unsigned binary number. We divide the bits in to groups of three, starting from the right, and replace each group with the corresponding octal digit. For example, given the unsigned binary number 11001011, we can convert it to octal as follows:

$$11001011_2 \Rightarrow 11\ 001\ 011 \Rightarrow 313_8$$

Note that in this example, the number of bits is not a multiple of three, so we had to assume a 0 bit on the left. Again, we need to take care that the actual number of bits in the unsigned binary representation is understood or explicitly stated.

Hexadecimal is another form of positional number system, like octal, but based on powers of 16. The only minor problem we encounter is that we need digits with values from 0 through 15. We use the normal digits 0 through 9, but augment them with the letters A through F for the remaining digits. The correspondence is

$$A_{16} = 10_{10}$$
  $B_{16} = 11_{10}$   $C_{16} = 12_{10}$   
 $D_{16} = 13_{10}$   $E_{16} = 14_{10}$   $F_{16} = 15_{10}$ 

Thus, for example,

$$\begin{aligned} 3\text{CE}_{16} &= 3 \times 16^2 + 12 \times 16^1 + 14 \times 16^0 \\ &= 3 \times 256 + 12 \times 16 + 14 \times 1 \\ &= 768 + 192 + 14 = 974_{10} \end{aligned}$$

By similar arguments to those for octal numbers, we can arrive at a quick method for converting between hexadecimal and unsigned binary representations of a number. Whereas for octal, we formed groups of three bits (since  $8 = 2^3$ ), for hexadecimal we form groups of 4 bits (since  $16 = 2^4$ ). The 4-bit patterns corresponding to the hexadecimal digits are

0: 0000 1: 0001 2: 0010 3: 0011 4: 0100 5: 0101 6: 0110 7: 0111 8: 1000 9: 1001 A: 1010 B: 1011 C: 1100 D: 1101 E: 1110 F: 1111 Thus, for example,  $3CE_{16} = 0011 \ 1100 \ 1110_2$ . In the reverse direction:  $11001011_2 \Rightarrow 1100 \ 1011 \Rightarrow CB_{16}$ 

As we mentioned earlier, nearly all computer systems use number representations that are 8, 16 or 32 bits long. Hence, the term *byte* for 8 bits of data has entered the common language. Since these are all multiples of 4 in length and not multiples of 3, hexadecimal is a more natural representation to convert to than octal. (Engineers sometimes use the term *nibble* to refer to 4 bits of data, punning on the fact that a nibble is a small bite.) With hexadecimal in these applications, we don't need to worry about assuming or dropping leading 0 bits. That's why programmers usually deal with hexadecimal and not octal. However, since we, as hardware designers, can select the number of bits that is best for our needs, we may find octal more useful in some cases, particularly if the number of bits is a multiple of 3.

#### 3.1.2 OPERATIONS ON UNSIGNED INTEGERS

Since unsigned integers are binary coded, we can perform on them all of the operations on encoded data described in Section 2.3. A common application is to decode an n-bit unsigned binary number representing the location of information in a memory. The decoder has  $2^n$  control outputs, which we can use to activate a particular memory location. We shall see this in more detail in Chapter 5. We can also use multiplexers in parallel, one per bit of an unsigned binary representation, to choose between multiple sources of numeric data. This was illustrated in Example 3.3. We should also expect to be able to perform arithmetic operations on numbers represented in unsigned binary. However, before we look at that, we will discuss some simpler operations.

### **Resizing Unsigned Integers**

When we write numbers in decimal on paper, we usually don't write any leading insignificant zeros. We just use the least number of digits needed to represent the number. For example, we just write  $123_{10}$ , and not  $0123_{10}$  or  $000123_{10}$ , although all represent the same number. We could do the same in binary, and just write  $10110_2$ , and not  $010110_2$  or  $00010110_2$ . However, in a digital circuit, each bit is implemented by a physical wire, and we choose the number of bits based on the largest value we expect to occur during operation of the circuit. Since wires do not come and go as values change, we normally do write leading insignificant zeros for unsigned binary numbers occurring in a digital circuit.

FIGURE 3.1 Implementation of zero extension in a circuit.

CHAPTER THREE

Recall that the largest value that can be represented with n bits is  $2^n - 1$ . Suppose we have some numeric data x represented with n bits:

$$x = x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \dots + x_02^0$$

However, in order to perform some arithmetic operations, which may result in larger values than  $2^n - 1$ , we need to represent the same value in m bits, where m > n:

$$y = y_{m-1}2^{m-1} + \dots + y_n2^n + y_{n-1}2^{n-1} + y_{n-2}2^{n-2} + \dots + y_02^0$$

Since we want y = x, we can just set  $y_i = x_i$ , for  $i = 0, 1, \ldots, n-1$ , and  $y_i = 0$ , for  $i = n, n+1, \ldots, m-1$ . In other words, we just add leading insignificant 0 bits to the left of the n-bit representation to form the m-bit representation. In terms of circuit implementation, we simply add extra bit signals with their value hard-wired to 0, usually by connecting them to the circuit ground, as shown in Figure 3.1. This technique is called *zero extension*.

We can express zero extension in a Verilog model by concatenating a string of 0 bits to the left of a vector representing an unsigned integer. For example, given nets declared as

```
wire [3:0] x;
wire [7:0] y;
```

We can write the following assignment statement in a module to zero extend the value of x and assign it to y:

```
assign y = \{4'b0000, x\};
```

The notation that we have used here simply joins two vector values together to form a larger vector. For example, if x has the value 1010, the value assigned to y would be 00001010. As a convenience, Verilog

automatically zero extends a literal vector value to the specified size. So we could rewrite the above assignment as

```
assign y = \{4'b0, x\};
```

In this case, Verilog extends the bit value 0 with additional 0 bits to make a total of 4 bits.

Verilog also allows us to perform zero extension implicitly. If we assign an unsigned vector of a smaller size to a vector net or variable of a larger size, the value is implicitly zero extended to the size of the assignment target. For example, we could have written the above assignment simply as

```
assign y = x;
```

in which case the 4-bit value of x would be implicitly zero extended to 8 bits, the size of y. While this might appear to be a more succinct and convenient way to write the assignment, we should be aware that zero extension occurs. Using the vector concatenation operation makes the extension explicit, which better documents our design intent.

The converse operation to zero extension is truncation, in which we reduce the number of bits used to represent a numeric value from m to a smaller size, n. Recall again that the largest value representable in n bits is  $2^n - 1$ . Any m-bit value less than or equal to this value has 0 for all of the left-most m-n bits. So to represent the value in n bits, we simply discard the left-most m-n bits. The problem that might arise is that the value represented in m bits might be larger than  $2^n - 1$ , and so not be representable in n bits. Such a value has at least one of the left-most m-n bits being 1. In most applications where we need to truncate, this situation does not arise, and we can discard the bits with impunity. We only reduce the number of bits when we know that the value must be within the range representable by the smaller number of bits. We might arrive at that conclusion by analyzing the arithmetic operations performed to derive the larger-sized value. In terms of circuit implementation, discarding bits does not mean physically removing anything from the circuit. Rather, we just leave the left-most bits unconnected, as illustrated in Figure 3.2.

An alternative view of truncation of y from m bits to n bits is that it implements the operation  $y \mod 2^n$ . We can demonstrate this as follows:

 $y \mod 2^n$ =  $(y_{m-1}2^{m-1} + \dots + y_n2^n + y_{n-1}2^{n-1} + \dots + y_02^0) \mod 2^n$ 

FIGURE 3.2 Implementation of truncation in a circuit.

CHAPTER THREE

$$= ((y_{m-1}2^{m-n-1} + \dots + y_n2^0)2^n + y_{n-1}2^{n-1} + \dots + y_02^0) \bmod 2^n$$
  
=  $y_{n-1}2^{n-1} + \dots + y_02^0$ 

Thus, if we want to compute  $y \mod 2^n$ , we just truncate y to n bits, regardless of the values of any of the discarded bits.

In a Verilog model, we express truncation of a value by picking out a *part select* of the net or variable representing the value. For example, given nets x and y declared as above, we can write the following assignment statement in a module to truncate the value of y and assign it to x:

assign 
$$x = y[3:0];$$

The range of values in brackets specifies the index positions of the right-most elements that we want to use for the smaller representation. For example, if y has the value 00001110, the value assigned to x would be 1110.

### Addition of Unsigned Integers

The addition operation on unsigned binary integers is analogous to the operation on decimal numbers. We start with the two least significant operand bits and add them to form the least significant sum bit and a carry into the next position. We then repeat until we reach the most significant position, forming the most significant sum bit and the carry out. The difference between doing this in binary and decimal is that, in binary, the sum of the two operand bits and the carry into a position is either 0, 1, 2 or at most 3. Since bits can only be 0 or 1, the case of the sum being 2 means the sum bit is 0 and the carry out is 1, and the case of the sum being 3 means the sum bit is 1 and the carry out is 1.

| 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |   |
|---|---|---|---|---|---|---|---|---|---|---|
|   | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 |
|   | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
|   | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 |

FIGURE 3.3 Unsigned addition with carry out of 0.

| 1 | 1 | 0 | 0 | 1 |   |
|---|---|---|---|---|---|
|   | 0 | 1 | 0 | 0 | 1 |
|   | 1 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 1 | 0 |

FIGURE 3.4 Unsigned addition with carry out of 1.

| $x_i$ | $y_i$ | $c_i$ | $s_i$ | $c_{i+1}$ |
|-------|-------|-------|-------|-----------|
| 0     | 0     | 0     | 0     | 0         |
| 0     | 0     | 1     | 1     | 0         |
| 0     | 1     | 0     | 1     | 0         |
| 0     | 1     | 1     | 0     | 1         |
| 1     | 0     | 0     | 1     | 0         |
| 1     | 0     | 1     | 0     | 1         |
| 1     | 1     | 0     | 0     | 1         |
| 1     | 1     | 1     | 1     | 1         |

**TABLE 3.1** Truth table for sum and carry bits.

EXAMPLE 3.4 Show the addition of the unsigned binary numbers 10101111100<sub>2</sub> and 0011010010<sub>2</sub>.

SOLUTION The addition is shown in Figure 3.3. Here, we have included the carry-out bit from the most significant position. Since it is 0, the result can be represented in the same number of bits as the two operands.

EXAMPLE 3.5 Show the addition of the unsigned binary numbers  $01001_2$  and  $11101_2$ .

SOLUTION The addition is shown in Figure 3.4. Again, we have included the carry out from the most significant position. However, this time it is 1, indicating that the result value cannot be represented in the same number of bits as the operands. If the design in which we are doing this addition requires the result to be five bits long, the carry out of 1 is an error condition. Alternatively, if the design allows us to use an extra bit for the result, we can use the carry-out bit as the extra most significant bit, as indicated in grey. This is the same as if we had zero extended the operands by one bit.

As these examples show, if we need to represent the result in the same number of bits as the operands (a not uncommon case), we can use the carry-out bit from the most significant position to indicate whether an *overflow* condition has occurred. When the bit is 1, the sum bits are incorrect.

Let's now look at how to design a digital circuit to perform addition upon unsigned binary numbers. Such a circuit is called, unsurprisingly, an *adder*. If we consider the method for addition described above, we see that for the least significant position, the sum  $(s_0)$  and carry-out  $(c_1)$  bits are Boolean functions of the two least significant operand bits  $(x_0, y_0)$ . We can express the functions as Boolean equations:

$$s_0 = x_0 \oplus y_0$$
  $c_1 = x_0 \cdot y_0$  (3.1)

A circuit to implement these equations is called a *half adder*, and can be constructed with an XOR gate to produce the sum bit and an AND gate to produce the carry-out bit. The reason it's only half an adder will become clear in a moment.

For the remaining bits, at each position i, the sum  $(s_i)$  and carry-out  $(c_{i+1})$  bits are Boolean functions of the operand  $(x_i, y_i)$  and carry-in  $(c_i)$  bits. The functions are as shown in the truth table in Table 3.1. They can also be expressed as Boolean equations, as follows:

$$s_i = (x_i \oplus y_i) \oplus c_i \tag{3.2}$$

$$c_{i+1} = x_i \cdot y_i + (x_i \oplus y_i) \cdot c_i \tag{3.3}$$

A circuit that implements these equations is called a *full adder*, since we can construct it from two half adders: one to add the two operand bits

and one to add the result of that with the carry-in bit. A small amount of additional logic is needed to form the carry out. However, this form of full adder is largely of historical interest, since constraints that apply in most designs lead to different implementations.

One thing to note about the equations for a full adder is that, if the carry in,  $c_i$ , is 0, the equations simplify to those for a half adder. A consequence is that we can use a full adder for the least significant position instead of a half adder simply by setting the carry-in bit to 0. This allows us to treat all positions uniformly, and will also afford another advantage that we shall see when we get to signed integer addition and subtraction. Thus, a complete structure for an adder for unsigned integers consists of a full adder cell for each bit position, with carry outs chained to carry ins of adjacent positions, as shown in Figure 3.5. (For arithmetic circuits, we usually arrange components left-to-right in order of decreasing significance, to match the left-to-right order of bits of a number. The arrows on the carry connections in Figure 3.5 indicate that carry values flow from right to left, contrary to our usual convention of left-to-right flow.) The carry out of the most significant position can be used as the most significant sum bit if the sum is allowed to be longer than the operands. Otherwise, it can be used as an overflow condition signal.

This kind of adder structure is called a *ripple-carry adder*. We can see why it has this name by considering the flow of information through the structure. At each bit position, the values of the sum and carry outputs depend not only on the two operand bit inputs, but also on the carry from the adjacent less significant position. We can also see this by examining the Boolean equations for the full adder. They form a recurrence relation, so that, ultimately, each sum bit and the final carry-out bit depend on all of the less significant operand bits. When two operand values arrive at the adder inputs, each full adder determines a transient value for its sum and carry-out outputs. However, the full adders have some propagation delay, since they are just logic circuits. Thus, the carry out from the least significant position acts as an input to the next position after the propagation delay, possibly affecting the output of that position. Its carry out, after another propagation delay, may affect the output of the third position. In this way, carry values "ripple" from least significant to most significant position, possibly affecting sum-bit values along the way.



FIGURE 3.5 Structure of an adder for unsigned integers using full adder cells.

In the worst case, the delay from operand values arriving to the sum value settling is the product of each full adder's propagation delay and the number of bits in the unsigned binary representation. If the performance constraints of the application allow for an addition to be done slowly, a ripple-carry adder is a simple and effective adder structure. However, many applications require that arithmetic operations have high performance in order to meet timing constraints. In those cases, we can find alternate adder structures that have less delay, though at the expense of greater circuit area and power consumption.

We will now outline a couple of ways in which we can improve the adder performance over that of a ripple-carry adder. As the basis of our discussion, let's return to Equations 3.2 and 3.3 and to the truth table in Table 3.1. For a given position *i*, we can see the following properties.

▶ If  $x_i$  and  $y_i$  are both 0, then  $c_{i+1} = 0$ , regardless of the value of  $c_i$ . In this case, any carry in to the position is *killed*. We define a signal for this condition:

$$k_i = \overline{x}_i \cdot \overline{y}_i \tag{3.4}$$

▶ If one of  $x_i$  and  $y_i$  is 1 and the other is 0, then  $c_{i+1} = c_i$ . In this case, the carry in is *propagated* to the next position. A signal for this condition is

$$p_i = x_i \oplus y_i \tag{3.5}$$

▶ If  $x_i$  and  $y_i$  are both 1, then  $c_{i+1} = 1$ , regardless of the value of  $c_i$ . In this case, a carry out is *generated* for the next position. We define a signal for this condition:

$$g_i = x_i \cdot y_i \tag{3.6}$$

Substituting Equations 3.5 and 3.6 into Equations 3.2 and 3.3 gives

$$s_i = p_i \oplus c_i \tag{3.7}$$

$$c_{i+1} = g_i + p_i \cdot c_i \tag{3.8}$$

One way in which these reformulated equations help is by exposing a way of determining the carry values at each position more quickly than the ripple-carry method. Note that the  $k_i$ ,  $p_i$  and  $g_i$  signals only depend on the operand bit values at their respective positions, so they can be determined quickly after the operand values arrive at the adder inputs. If a carry is killed or generated at a given position, we don't need to wait for the carry in from less significant positions; we can drive a 0 or 1 carry-out value immediately. On the other hand, if carry is to be propagated, we



FIGURE 3.6 Fast-carry-chain full-adder cells.

can switch the carry in to the carry out very quickly. These observations form the basis for the structure of a *fast-carry-chain adder*, sometimes also called a *Manchester adder*.

Figure 3.6 shows two alternate implementations of the full-adder cell used in such an adder. In the implementation on the left, the box at the top derives the propagate signal, which drives the select input of a multiplexer. If  $p_i$  is 0, then the carry is either generated ( $x_i$  and  $y_i$  are both 1) or killed ( $x_i$  and  $y_i$  are both 0). So either of the input bits can be selected to derive the carry out, without having to wait for the carry in. If  $p_i$  is 1, then the carry out is the same as the carry in. Like the ripple-carry adder, in the worst case, the carry has to propagate from the least significant to the most significant position. However, if the implementation fabric provides fast multiplexers (which many do), the propagation delay along this carry chain is much less than that of a chain of gate circuits based on Equation 3.3. As an example, several FPGA families manufactured by Xilinx include fast-carry chains using multiplexers, allowing fast-carry-chain adders to be implemented.

The full-adder cell shown at the right of Figure 3.6 is very similar. The box at the top derives all of the generate, propagate and kill signals. These are used to drive the control inputs of electronic switches to derive the carry-out bit. If  $g_i$  is 1, the carry-out bit is switched to 1; if  $k_i$  is 1, the carry-out bit is switched to 0; and if  $p_i$  is 1, the carry-out bit is switched from the carry-in input. Again, in the worst case, a carry may have to propagate from the least significant to the most significant position. However, fabrics such as custom or standard-cell ASICs include switch components that have very small propagation delay, allowing fast-carry-chain adders to be implemented in this way.

Another way in which we can use the reformulated equations is to solve Equation 3.8 as a recurrence relation and determine all of the carry



FIGURE 3.7 A 4-bit carry-lookahead adder.

bits at once. Equation 3.8 gives us the equation for  $c_1$  directly. We can substitute this back into Equation 3.8 to get the equation for  $c_2$ :

$$c_2 = g_1 + p_1 \cdot (g_0 + p_0 \cdot c_0) = g_1 + p_1 \cdot g_0 + p_1 \cdot p_0 \cdot c_0$$

We can repeat substitution and similarly get the equations for  $c_3$  and  $c_4$ :

$$c_3 = g_2 + p_2 \cdot g_1 + p_2 \cdot p_1 \cdot g_0 + p_2 \cdot p_1 \cdot p_0 \cdot c_0$$

$$c_4 = g_3 + p_3 \cdot g_2 + p_3 \cdot p_2 \cdot g_1 + p_3 \cdot p_2 \cdot p_1 \cdot g_0 + p_3 \cdot p_2 \cdot p_1 \cdot p_0 \cdot c_0$$

Note that each of these expressions is a function of only  $c_0$  and the operand input bits (since the generate and propagate signals are functions only of the operand bits). This gives us a way to determine the carry bit at each position without having to wait for carries to propagate up from less significant positions. We can then use the carry bit to derive the sum bits according to Equation 3.2. An adder based on this formulation is called a *carry-lookahead adder*. A 4-bit version of such an adder is illustrated in Figure 3.7. Each of the boxes at the top derives the generate and propagate signals for the corresponding bit position. The *carry-lookahead generator* implements the equations shown above to derive the carry signals. These are combined with the propagate signals to derive the sum bits. The trade-off for getting the sum bits faster is the area and power consumed by the carry-lookahead generator circuitry.

We have shown a carry-lookahead generator for 4 bits, since that is about as large as we can practically make it. In principle, we could continue substituting in Equation 3.8 to get further carry bits. However, a more practical approach for wider adders is to use 4-bit carry-lookahead adders for segments of 4 bits, and to use a second level of carry-lookahead generators to derive the carry-in bits for each segment. There are also

other forms of adders that build upon the reformulated expressions to compute carry bits in different ways. The choice among them is a question of making trade-offs among circuit area, power and performance, constrained by the resources available in implementation fabrics. A full discussion of these adder structures is beyond the scope of this book, but there are many references that go into detail.

In all of our discussion of adders so far, we have not yet described how to model them in Verilog. We could simply translate the Boolean expressions in the various forms we have discussed into Verilog. However, doing so would disguise our design intent of adding unsigned binary numbers. In particular, a CAD tool would just try to implement the model as combinational circuitry, and may not readily be able to recognize the opportunity to use any specialized circuit resources, such as fast-carry chains, available in an implementation fabric. A much better approach is to use the addition operator provided by Verilog to operate on vector values. A synthesis CAD tool can then implement the addition operation using the most appropriate form of adder provided by the target fabric to meet design constraints. Alternatively, we could develop a structural model, selecting the most appropriate form of adder from a library of arithmetic components, and verify that the structural model produces the same results as a behavioral model using the addition operator.

### EXAMPLE 3.6 Given the Verilog declaration of three nets:

```
wire [7:0] a, b, s;
```

write a Verilog statement to assign the sum of a and b to s.

SOLUTION The required statement is

```
assign s = a + b;
```

The + operator works on two unsigned values to produce an unsigned result whose length is the larger of the two operands. It does not produce a carry out, so if there is an overflow, it remains undetected.

EXAMPLE 3.7 Revise the statements to produce a carry-out bit, c.

SOLUTION We can do this by zero extending a and b by one extra bit before doing the additions, in order to get a 9-bit result. The carry out is then

the most significant bit of that result, and the 8-bit sum is the remaining bits. We need to declare a net for the 9-bit intermediate result and for the carry bit:

```
wire [8:0] tmp_result;
wire    c;
```

The required statements are

An alternative way of writing these assignments is

```
assign \{c, s\} = \{1'b0, a\} + \{1'b0, b\};
```

In this assignment, the left-hand side is written as a concatenation of the carry bit and sum nets. The bits of the result of addition are assigned to the corresponding bits of the concatenated nets. We can simplify this further, since Verilog has rules that cover implicit extension of expression operands based on the size of the left-hand side of an assignment. If we write

```
assign \{c, s\} = a + b;
```

the Verilog rules determine that the size of the left-hand side is 9 bits, so the values of a and b must be extended to 9 bits. Since they are unsigned values, they are implicitly zero extended, and the result of the addition is also 9 bits long. As we mentioned earlier, while these rules might appear to make the assignment more succinct, we must take care that implicit extensions have the effect we really want. If in doubt, or if we want to make our intent explicit, we can use explicit extension.

The above example shows how we can use vectors when we need to access the individual bits of the binary code. Often, we can raise the level of abstraction in our Verilog model by considering only the numeric aspects of data and not their binary encoding. Verilog allows us to do so using the type integer for numbers. We can declare a variable (but not a net) to be of type integer as follows:

```
integer n;
```

Integer variables are typically 32 bits long, though a Verilog implementation is allowed to use a larger size. The range of values represented by a 32-bit integer includes the unsigned values up to approximately 2 billion. It also includes negative numbers, which we will discuss further in the next section.

EXAMPLE 3.8 Revise the declaration and statement in Example 3.6 to use integer variables instead of vector nets.

The revised declaration is SOLUTION

```
integer a, b, s;
```

Since we are using variables instead of nets, the assignment must be in a procedural block. We replace the assignment statement with the always block:

```
always @*
 s = a + b:
```

The addition expression looks exactly like that in the original assignment. The only difference is that we are not concerned about the size of the variables and are ignoring the possibility of any carry out. A synthesis tool would infer at least a 32-bit adder with no overflow checking, since we have not indicated the actual range of values that can occur. That is one reason why we would not generally use integer types for synthesizable models where the range of values is known to be smaller than 32.

## Subtraction of Unsigned Integers

We can work out how to perform subtraction of unsigned binary integers by following a process similar to that for addition. First, we devise the steps for binary subtraction, bit by bit, analogously to subtraction of decimal digits. Recall that, in decimal, if we subtract a larger digit from a smaller digit, we borrow from the next column. We do the same in binary, borrowing if we subtract 1 from 0.

EXAMPLE 3.9 Show the subtraction of the unsigned binary numbers 10100110<sub>2</sub> and 01001010<sub>2</sub>.

SOLUTION The subtraction is shown in Figure 3.8. Here, we have included the borrow-out bit from the most significant position. Since it is 0, the result can be represented in the same number of bits as the two operands.

```
0 \ 1 \ 0 \ 1 \ 1 \ 0 \ 0 \ 0
x:
      10100110
ν:
    -01001010
d:
      0 1 0 1 1 1 0 0
```

FIGURE 3.8 Unsigned subtraction.

| $x_i$ | $y_i$ | $b_i$ | $d_i$ | $b_{i+1}$ |
|-------|-------|-------|-------|-----------|
| 0     | 0     | 0     | 0     | 0         |
| 0     | 0     | 1     | 1     | 1         |
| 0     | 1     | 0     | 1     | 1         |
| 0     | 1     | 1     | 0     | 1         |
| 1     | 0     | 0     | 1     | 0         |
| 1     | 0     | 1     | 0     | 0         |
| 1     | 1     | 0     | 0     | 0         |
| 1     | 1     | 1     | 1     | 1         |

**TABLE 3.2** Truth table for difference and borrow bits.

Next, we look at how to design a *subtracter* circuit to perform subtraction upon unsigned binary numbers. For the least significant position, the difference  $(d_0)$  and borrow-out  $(b_1)$  bits are Boolean functions of the two least significant operand bits. The Boolean equations are

$$d_0 = x_0 \oplus y_0$$
  $b_1 = \overline{x_0} \cdot y_0$ 

For the remaining bits, at each position i, the difference  $(d_i)$  and borrow-out  $(b_{i+1})$  bits are Boolean functions of the operand  $(x_i, y_i)$  and borrow-in  $(b_i)$  bits, with the truth table shown in Table 3.2. They can also be expressed as Boolean equations, as follows:

$$d_i = (x_i \oplus y_i) \oplus b_i \tag{3.9}$$

$$b_{i+1} = \overline{x_i} \cdot y_i + \overline{(x_i \oplus y_i)} \cdot b_i \tag{3.10}$$

As we did in the case of the adder, we can set the borrow in for the least significant position to 0 and just use Equations 3.9 and 3.10 uniformly for all positions. We could now go ahead and develop circuits for these equations. However, many systems that need a subtracter also need an adder, and choose whether to add or subtract the operands. A little algebraic manipulation will expose a trick that allows us to use the same circuit to perform either addition or subtraction. Notice that the equation for the difference is the same as that for the sum in an adder, and that the equation for the borrow is similar to that for the carry. The trick lies in using the complemented form of the borrow bits. If we do that, we can rewrite the equations as

$$d_i = (x_i \oplus \overline{y_i}) \oplus \overline{b_i} \tag{3.11}$$

$$\overline{b_{i+1}} = x_i \cdot \overline{y_i} + (x_i \oplus \overline{y_i}) \cdot \overline{b_i}$$
(3.12)

Proof of this is left to Exercise 3.27. If we compare these equations with Equations 3.2 and 3.3, we see that they are identical in form, but with  $\bar{y}_i$  replacing  $y_i$  and  $\bar{b}_i$  replacing  $c_i$ . Consequently, we can use an adder circuit to perform subtraction simply by negating each bit of the second operand and using a negated form of borrow. For the least significant position, we set the negated borrow-in bit to 1. We can use the negated borrow out from the most significant position to indicate underflow: if it is 0, indicating a borrow, the true difference is negative, and so cannot be represented as an unsigned integer.

Now let's see how to modify an adder circuit to perform both addition and subtraction. Suppose we have a control signal that is 0 when we want the circuit to perform addition and 1 when we want it to perform subtraction. Since addition requires a 0 value for the least significant carry in and subtraction requires a 1 for the least significant negated borrow in, we can just use the control signal as the carry in/negated borrow in. We could also use the control signal to control an n-bit 2-to-1 multiplexer selecting between the second operand and its negation as the second input to the circuit. However, another part of the trick is to notice that  $y_i \oplus 0 = y_i$  and  $y_i \oplus 1 = \overline{y_i}$ . So we can connect each bit of the second operand to an XOR

FIGURE 3.9 Adapting an adder to perform addition and subtraction.

CHAPTER THREE

gate with the control signal as the other gate input, and connect the gate outputs to the adder. The final circuit for an adder/subtracter is shown in Figure 3.9. The adder can be any of the circuits we described earlier: ripple-carry or optimized for the application's requirements and constraints.

As with Verilog models that perform addition, we normally write models that apply the subtraction operator to vector values, rather than directly implementing the Boolean equations for a subtracter. That way, we can let the synthesis CAD tool decide on an appropriate subtracter circuit to use depending on constraints that apply. Moreover, if the system we are designing performs both addition and subtraction, the tool can decide whether to use separate circuits for the operations, or to share a single adder/subtracter between the operations. Naturally, it can only share the circuit if operations are to be done at different times. We shall see in later chapters how to control sequencing of operations. For now, we will just consider combinational circuits that assume the existence of a control signal for selecting between addition and subtraction operations.

EXAMPLE 3.10 Develop a Verilog behavioral model of an adder/subtracter for 12-bit unsigned binary numbers. The circuit has data inputs x and y, a data output s, a control input mode that is 0 for addition and 1 for subtraction, and an output ovf\_unf that is 1 when an addition overflow or a subtraction underflow occurs.

SOLUTION The module performs the addition and subtraction using the + and – operators on the vector operand values, as follows:

The assignment in the module uses the mode input to choose between addition and subtraction of the operands. Since we want to use the carry-out or borrow-out bit for the ovf\_unf output, we assign to the concatenation of the two outputs using the notation we saw in Example 3.7. Verilog implicitly extends the addition and subtraction operands to match the 13-bit size of the assignment target. The least significant 12 bits of the result are used as the sum or difference output value and the most significant bit as the ovf\_unf value. In the case of addition, the most significant bit is the carry out: 1 for overflow, or 0 otherwise. In the case of subtraction, the most significant bit is the borrow out, not negated: 1 for underflow, or 0 otherwise. Thus, we can use this bit for the ovf\_unf output.

EXAMPLE 3.11 Develop a verification testbench for the adder/subtracter that compares the result with the result of addition or subtraction performed on values of type integer.

SOLUTION The module, test\_add\_sub, has no ports, since it is a self-contained testbench:

```
`timescale 1ns/1ns
module test_add_sub;
  reg [11:0] x, y;
 wire [11:0] s;
 reg
              mode;
             ovf unf;
 wire
 integer x_num, y_num, s_num;
  task apply_test ( input integer x_test, y_test,
                    input
                                  mode test );
   begin
     x = x_test; y = y_test; mode = mode_test;
     #10;
   end
  endtask
  adder_subtracter duv (.x(x), .y(y), .s(s),
                         .mode(mode), .ovf_unf(ovf_unf) );
  initial begin
   apply_test(
                   0,
                         10,
                              0);
                  0,
   apply_test(
                         10,
                              1);
   apply_test(
                 10,
                          0,
                              0);
   apply_test(
                 10,
                          0, 1);
   apply_test(2**11, 2**11,
```

CHAPTER THREE

```
apply_test(2**11, 2**11, 1);
   // ... further test cases
   #10 $finish:
 end
 always @* begin
   x_num = x; y_num = y; s_num = s;
   if (!mode)
     if (x_num + y_num > 2**12-1) begin
       if (!ovf_unf)
         $display("Addition overflow: ovf_unf should be 1");
     end
     else beain
       if (!(!ovf\_unf \&\& s\_num = = x\_num + y\_num))
         $display("Addition result incorrect");
     end
   else
     if (x_num - y_num < 0) begin
       if (!ovf_unf)
         $display("Subtraction underflow: ovf_unf should be 1");
     end
     else begin
       if (!(!ovf\_unf \&\& s\_num = = x\_num - y\_num))
          $display("Subtraction result incorrect");
     end
 end
endmodule
```

The module declares nets and variables to connect to the inputs and outputs of the adder/subtracter instance, duv. The instance is followed by a task to apply individual test cases. The initial block makes successive calls to the task to assign a sequence of input values to the inputs, exercising both addition and subtraction with cases that produce normal results, overflow and underflow. Note the use of the value 2\*\*11, which is the way we write 2<sup>11</sup> in Verilog. The \*\* operator performs exponentiation.

The always block responds to changes of input values to the adder/subtracter, then waits for the adder/subtracter to produce outputs. The block then assigns the unsigned input values to the variables  $x_num$ ,  $y_num$  and  $y_num$  of type integer. The block then checks the value of the mode input. If it is 0, indicating addition, the block checks the numeric sum of the operands. Since it does this using the numeric variables, the result is not limited to the range representable in 12 bits. Hence, the block can compare the true sum with the largest value representable in 12 bits, namely,  $y_num$  1. If the sum is larger, the block verifies that the ovf\_unf output is 1. Otherwise, the block verifies that the ovf\_unf output is 0 and that the sum result is equal to

the computed numeric sum. If mode is 1, indicating subtraction, the block performs similar checks, but compares the numeric difference between the operands with 0.

Note that the condition checks and choices between consequent actions in the always block are written using Verilog *if statements*. Each if statement has the form

```
if ( condition )
statement
else
statement
```

The first statement is performed if the condition is true, and the second statement is performed if the condition is false. The keyword else and the the second statement are optional, and are omitted if there is no action to perform if the condition is false. Since an if statement is just one form of statement, we can nest an if statement within an alternative of an outer if statement. The always block illustrates this: it has an outer if statement, if (!mode) ..., that has nested if statements for each of the alternatives. If we need to perform more than one statement in either alternative, we bracket the group of statements in the keywords begin ... end, as shown in the example model. We also use begin ... end bracketing if a nested if statement omits the else alternative. The bracketing makes it clear that the else belongs to the outer if statement, not the inner if statement.

#### Incrementing and Decrementing Unsigned Integers

There are two further arithmetic operations that we may perform on unsigned binary integers and that are related to addition and subtraction. The *increment* operation involves adding the constant value 1, and the *decrement* operation involves subtracting the constant value 1. These operations arise quite frequently in digital systems, particularly as part of counters, which generate increasing or decreasing sequences of numbers.

A straightforward way to design an increment circuit would be to use an adder with one operand input hard wired to the unsigned binary representation of 1, namely, 0 ... 001. Alternatively, we could hard wire one input to the representation of 0 and the carry in to 1. However, since one input is a constant value, we can simplify the circuit considerably. To see how, let's return to the Boolean equations for an adder, Equations 3.2 and 3.3. If we substitute  $y_i = 0$ , we can simplify to the equations

$$s_i = x_i \oplus c_i$$
  $c_{i+1} = x_i \cdot c_i$ 

 $s_0$ 

 $S_1$ 

S,

FIGURE 3.10 Structure of an incrementer for unsigned integers using half adder cells.

CHAPTER THREE

which are essentially those for a half adder (Equation 3.1 on page 96). In other words, an incrementer can be formed using a chain of half adders, as shown in Figure 3.10. The carry out of the most significant bit can be used for an overflow condition signal. A decrementer can be formed similarly by simplifying the equations for a subtracter with one input hard wired to the representation of 0 and the negated borrow in hard wired to 0.

Note that the incrementer of Figure 3.10 is a ripple-carry circuit, and so has similar delay characteristics to a ripple-carry adder. In the same way that we improved the performance of adders and subtracters, we could improve the performance of incrementers and decrementers, for example, using fast carry chains or carry-lookahead.

In Verilog models, we can express the increment or decrement operation by adding or subtracting the literal value 1 to an operand. For example, given nets declared as

```
wire [15:0] x, s;
```

we could assign the incremented value of x to s with the statement

```
assign s = x + 1;
```

and we could assign the decremented value with the statement

```
assign s = x - 1;
```

Note that the value 1 is a numeric value, represented by Verilog in binary form. The size of the representation is determined by the context. In this example, it is 16 bits, since that is the size of the addition and subtraction operands and the assignment target. Using unsized numeric values like this is a convenient way to make our Verilog models more concise.

FIGURE 3.11 Circuit for an equality comparator.



### Comparison of Unsigned Integers

In some applications, it may be necessary to compare two unsigned binary integers for equality or inequality. Since there is exactly one code word for each numeric value, we can test for equality of two unsigned binary integers by testing whether the corresponding bits of each are the same. When we introduced the XNOR gate in Section 2.1.1, we mentioned that it is also called an equivalence gate, since its output is 1 only when its two inputs are the same. Thus, we can test for equality of two unsigned binary numbers using the circuit of Figure 3.11, called an *equality comparator*. In practice, an AND gate with many inputs is not workable, so we would modify this circuit to better suit the chosen implementation fabric. Better yet, we would express the comparison in a Verilog model and let the synthesis tool choose the most appropriate circuit from its library of cells.

Comparing two unsigned binary integers for inequality (greater than or less than comparison) is somewhat more complicated. To test whether a number x is greater than another number y, we can start by comparing the most significant bits,  $x_{n-1}$  and  $y_{n-1}$ . If  $x_{n-1} > y_{n-1}$ , we know immediately that x > y. Similarly, if  $x_{n-1} < y_{n-1}$ , we know immediately that x < y. In both cases, the final result is completely determined by comparing just the most significant bits. If  $x_{n-1} = y_{n-1}$ , the result depends on the remaining bits, and is true if and only if  $x_{n-2} \dots 0 > y_{n-2} \dots 0$ . We can now apply the same argument recursively, examining the next pair of bits, and, if they are equal, continuing to less significant bits. Note that  $x_i > y_i$  is only true for  $x_i = 1$  and  $y_i = 0$ , that is, if  $x_i \cdot \overline{y_i}$  is true. These considerations lead to the circuit of Figure 3.12, called a *magnitude comparator*. We can use the same circuit to test for less than inequality simply by exchanging the operands at the inputs.

In Verilog, we can express comparison operations on unsigned values using the ==, > and < operators. (Note the distinction between the equality operator, ==, and the assignment operation, =.) We can also use != for "not-equal," <= for "less-than or equal," and >= for "greater-than or equal." All of these operators yield a single-bit 0 or 1



result, which can also be interpreted as a Boolean false or true result, respectively. This is convenient if the comparison occurs in the condition part of an if statement, since a Boolean result is expected in that context. It is also convenient if we want to assign the result to a net or variable, for example:

```
assign gt = x > y;
```

EXAMPLE 3.12 Develop a Verilog model for a thermostat that has two 8-bit unsigned binary inputs representing the target temperature and the actual temperature in degrees Fahrenheit (°F). Assume that both temperatures are above freezing (32°F). The detector has two outputs: one to turn a heater on when the actual temperature is more than 5°F below target, and one to turn a cooler on when the actual temperature is more than 5°F above target.

SOLUTION The module definition is

```
module thermostat ( output
                                heater_on, cooler_on,
                    input [7:0] target, actual );
 assign heater on = actual < target - 5;
 assign cooler_on = actual > target + 5;
endmodule
```

The assignments use the subtraction and addition operators to calculate the thresholds for turning the heater and cooler on. They use the < and > operators for performing the comparisons against the thresholds.

### Scaling by a Constant Power of 2

Before we turn to multiplying unsigned integers in a general way, let's look at the specific case of scaling an unsigned integer by a given constant value that is a power of 2. The simplest case is multiplying by 2. Recall that the value x represented by the n bits  $x_{n-1}, x_{n-2}, \ldots, x_0$  is

$$x = x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \dots + x_02^0$$
 (3.13)

If we multiply both sides by 2, we get

$$2x = x_{n-1}2^n + x_{n-2}2^{n-1} + \dots + x_02^1 + (0)2^0$$

which is an n+1 bit number consisting of the bits of x, shifted left by one position, and a 0 bit appended as the least significant bit. If we are working with fixed-length integers, we can truncate the most significant bit to yield an n-bit number, provided the truncated bit is 0. This operation is called a *logical shift left* by one position. We can take this form of scaling further. To scale by a factor of  $2^k$ , we repeat the scaling-by-2 process k times. That is, we shift the bits left by k positions and append k bits of 0 to the least significant end. If we need to truncate to an n-bit result, the k truncated bits must all be zero; otherwise an overflow has occurred.

Dividing by 2 works similarly. If we divide both sides of Equation 3.13 by 2 we get

$$x/2 = x_{n-1}2^{n-2} + x_{n-2}2^{n-3} + \dots + x_12^0 + x_02^{-1}$$

Since  $2^{-1}$  is the fraction ½, and we are dealing with integers only, we can discard the last term in this equation. The result is an n-1 bit number consisting of the bits of x, except for the least significant bit, shifted right by one position. If we are working with fixed-length integers, we can append a 0 to the most significant end to maintain the value. This operation is called a *logical shift right* by one position.

We can take this further also. To divide by  $2^k$ , we shift the bits right by k positions, discarding the k least significant bits and appending k bits of 0 at the most significant end. If any of the discarded bits were nonzero, the true result of the division is truncated toward 0.

Verilog provides two operators for shifting the bits of an unsigned value. The << operator performs a logical shift left, and the >> operator performs a logical shift right. For example, if the unsigned net or variable s has the value 00010011, representing the value  $19_{10}$ , the Verilog expression

CHAPTER THREE

would yield the value 01001100, representing the value  $76_{10}$ . The expression

would yield the value 00000100, representing the value  $4_{10}$ .

#### Multiplication of Unsigned Integers

The final arithmetic operation on unsigned integers that we shall examine is multiplication. A straightforward approach for multiplying x by y is to expand the product out as follows:

$$xy = x(y_{n-1}2^{n-1} + y_{n-2}2^{n-2} + \dots + y_02^0)$$
  
=  $y_{n-1}x2^{n-1} + y_{n-2}x2^{n-2} + \dots + y_0x2^0$ 

The largest value of the product is the product of the largest values of the operands. For *n*-bit operands, that is

$$(2^{n}-1)(2^{n}-1) = 2^{2n}-2^{n}-2^{n}+1 = 2^{2n}-(2^{n+1}-1)$$

which requires 2n bits to represent. If we provide this many bits for the product, there is no possibility of overflow.

Each of the terms in the expanded product equation is called a partial product, and consists of the product of a bit  $y_i$ , the number x and  $2^i$ . Recall that  $x2^i$  is just the bits of x shifted left by i positions. Also,  $y_i$  is either 0 or 1. If it is 0, the partial product is 0. If it is 1, the partial product is just the shifted version of x. Thus the partial product can be formed by AND-ing each bit of x with  $y_i$  and adding it, shifted i places to the left, into the final product. The addition of the partial products can be performed by a series of adders, as shown in Figure 3.13. This is a basic form of combinational multiplier, so called because it is a combinational circuit (albeit a large one). In Chapter 4, we will look at techniques that allow us to construct a sequential multiplier, in which we add partial products one at a time in successive clock cycles. A sequential multiplier trades off reduced area against time taken to yield the product.

In the multiplier circuit of Figure 3.13, we have not specified what kind of adder to use. We could use any of the adders we discussed earlier, with the choice depending on the performance requirements and area constraints that apply. We could also optimize the circuit by



FIGURE 3.13 A combinational multiplier constructed from adders for partial products.

combining parts of adjacent adders to reduce the overall propagation delay through the structure. However, techniques for doing so are beyond the scope of this book. They are discussed in detail in books cited for further reading in Section 3.6. For our purposes, we will rely on a synthesis CAD tool selecting an appropriate multiplier from the resources available to it.

As with other arithmetic operations on unsigned binary integers, we represent multiplication in Verilog models using an operator on unsigned

values. The result of the \* operator is an unsigned vector whose length is the larger of the operand lengths. If we need the multiplication to be performed with size that is the sum of the operand lengths, in order not to overflow, we must extend the operand values before multiplying them. For example, given the following declarations:

```
wire [ 7:0] x;
wire [13:0] y;
wire [21:0] p;
```

we could assign the product of x and y to p with the following statement:

```
assign p = \{14'b0, x\} * \{8'b0, y\};
```

Alternatively, we could rely on Verilog's implicit zero extension and just write:

```
assign p = x * y;
```

### **Summary of Arithmetic Operations**

In this section, we have examined several arithmetic operations that can be performed on unsigned binary integers, including addition, subtraction and multiplication. We have deliberately avoided division, since it is considerably more complex to implement than the other operations, and arises less frequently in real-world applications. Hence, there are relatively few application-specific digital systems that include circuits for performing division. Division circuits are described in the books cited in Section 3.6.

In our discussion, we focused on addition as a foundational operation and examined a number of adder circuits that trade off between performance and circuit area. This is a recurring theme in digital design, and is well illustrated through consideration of adder circuits. We return to it throughout this book.

For each operation, we also discussed how to represent the operation in Verilog models that use unsigned vectors. This approach allows us to abstract away from the details of the digital circuits that implement the arithmetic operations, relying on synthesis CAD tools to choose appropriate circuits from libraries of cells that can be implemented in

the target fabric. As we shall see when we describe our implementation methodology in more detail, we separate the concerns of specifying the circuit behavior in Verilog and constraining the implementation. We provide speed and area constraints for use by the synthesis tool to determine an appropriate implementation. This approach helps us manage the complexity of designing systems to perform numerical computation.

#### 3.1.3 GRAY CODES

The binary code that we have considered so far in this section is not the only code for unsigned integers, though it is the most natural code to use when we need to perform arithmetic operations. However, it has some disadvantages in other applications. Consider a scenario in which we are to design a system that uses a binary code to represent the angular position of a rotating shaft. A common way to measure the position is with a shaft encoder, illustrated in Figure 3.14. The disk attached to the shaft has a number of concentric bands, each of which has opaque parts and transparent parts. For each band, there is a light emitter and a detector. The detector output is 1 when the light shines through the transparent part of the band and 0 when the light is obscured by the opaque part of the band. The collection of four decoder outputs forms a binary code for the angular position of the shaft.

The pattern of transparency and opacity in the bands on the disk is shown in Figure 3.15, and corresponds to a 4-bit Gray code, in which adjacent code words differ by only one bit. A complete rotation is divided into 16 segments, and between any two adjacent segments, exactly one band changes between transparent and opaque. This prevents any minor error in positioning of the detectors from causing incorrect position codes. Suppose, in contrast, that we used the unsigned binary code of Section 3.1.1 for the angular position. This would give a code word of 0011 for segment 3 and 0100 for segment 4. A minor error in position of the detector for the second band might cause it to sense the change from 0 to 1 before the detectors for the right two bands sense the changes from 1 to 0. This would give a code word of 0111, representing segment 7, for the angular position close to the boundary between segments 3 and 4. It is difficult to manufacture mechanical components with sufficient precision to avoid this kind of error. The Gray code, on the other hand, is much more tolerant of positioning error, and so is widely used in electromechanical components that measure position.

The 4-bit Gray code we have used in this example scenario is listed, along with the corresponding decimal and unsigned binary codes, in Table 3.3. Note how adjacent Gray code words differ in only one bit



FIGURE 3.14 An optical shaft encoder.



FIGURE 3.15 Gray code pattern on a shaft-encoder disk.

| DECIMAL | UNSIGNED<br>BINARY | GRAY CODE |
|---------|--------------------|-----------|
| 0       | 0000               | 0000      |
| 1       | 0001               | 0001      |
| 2       | 0010               | 0011      |
| 3       | 0011               | 0010      |
| 4       | 0100               | 0110      |
| 5       | 0101               | 0111      |
| 6       | 0110               | 0101      |
| 7       | 0111               | 0100      |
| 8       | 1000               | 1100      |
| 9       | 1001               | 1101      |
| 10      | 1010               | 1111      |
| 11      | 1011               | 1110      |
| 12      | 1100               | 1010      |
| 13      | 1101               | 1011      |
| 14      | 1110               | 1001      |
| 15      | 1111               | 1000      |

TABLE 3.3 4-bit Gray code, compared to unsigned binary code.

position, unlike the corresponding unsigned binary code words. This is not the only 4-bit Gray code; there are others that also have the property of single-bit difference between adjacent code words. The code we have used here is generated by the following rules, which allow us to generate an *n*-bit Gray code:

- A 1-bit Gray code has the two code words 0 and 1.
- The first  $2^{n-1}$  code words of an *n*-bit Gray code consist of the code words of an (n-1)-bit Gray code, in order, each with a 0 bit appended as the left-most bit.
- The last  $2^{n-1}$  code words of an *n*-bit Gray code consist of the code words of an (n-1)-bit Gray code, in reverse order, each with a 1 bit appended as the left-most bit.

EXAMPLE 3.13 Develop a Verilog model of a code converter to convert the 4-bit Gray code to a 4-bit unsigned binary integer.

SOLUTION For the both the Gray-code input to the converter and the binary-code output, we use vector ports. The module definition is

```
module gray_converter ( output reg [3:0] numeric_value,
                        input
                                   [3:0] gray value );
 always @*
   case (gray_value)
     4'b0000: numeric_value = 4'b0000;
     4'b0001: numeric_value = 4'b0001;
     4'b0011: numeric value = 4'b0010;
     4'b0010: numeric_value = 4'b0011;
     4'b0110: numeric_value = 4'b0100;
     4'b0111: numeric value = 4'b0101;
     4'b0101: numeric_value = 4'b0110;
     4'b0100: numeric_value = 4'b0111;
     4'b1100: numeric_value = 4'b1000;
     4'b1101: numeric_value = 4'b1001;
     4'b1111: numeric_value = 4'b1010;
     4'b1110: numeric value = 4'b1011;
     4'b1010: numeric_value = 4'b1100;
     4'b1011: numeric_value = 4'b1101;
     4'b1001: numeric_value = 4'b1101;
     4'b1000: numeric value = 4'b1111;
   endcase
endmodule
```

The module's behavior takes the form of a truth table. It uses the Gray-code value to select which unsigned numeric value to assign to the output.

## KNOWLEDGE TEST QUIZ

- 1. How is a number x represented in binary as a sum of powers of 2?
- 2. What range of values can be represented as an *n*-bit unsigned binary number?
- 3. Write a Verilog declaration for a net *x* to represent unsigned numbers in the range 0 to 8191.
- 4. Write the binary number 01011101 in octal and in hexadecimal.
- 5. Resize the unsigned binary number 10010011 to 12 bits and to 6 bits. In each case, does the result correctly represent the same value as the original number?
- 6. Add the two 8-bit unsigned binary numbers 01001010 and 01100000 to get an 8-bit result. Does the addition overflow?
- 7. What distinguishes a ripple-carry adder from a carry-lookahead adder?

CHAPTER THREE

- 8. Write Verilog assignments to add two nets \$1 and \$2 of type wire [15:0] to get a result net \$3 of the same type as \$1 and \$2 and a carry-out net c\_out.
- 9. Perform the 8-bit unsigned binary subtraction 01001010 01100000 to get an 8-bit result. Does the subtraction underflow?
- 10. Given a control signal add/sub, how can we adapt an unsigned adder to perform both addition and subtraction?
- 11. Write a Verilog assignment that compares two unsigned nets a and b and assigns 1 to a net smaller if a < b, or 0 otherwise.
- 12. How is an unsigned binary number multiplied by 16? How is it divided by 16?
- 13. How many bits are required for the product of two *n*-bit unsigned binary numbers?
- 14. Why are Gray codes often used in electromechanical position sensors?

#### 3.2 SIGNED INTEGERS

While many applications deal only with nonnegative integers, there are others that deal with integers that range over both positive and negative values. In this section we will explore a binary code for signed integers and see how to implement operations on these encoded values.

#### 3.2.1 CODING SIGNED INTEGERS

The predominant encoding used in digital systems for signed integers is called 2s complement. It is a special case of radix complement representation in which the radix (the base used for positional representation) is 2. We will refer to the Further Reference books for details of general radix complement representations, and focus our attention here just on 2s complement.

A signed number is represented in 2s-complement form as a weighted sum of powers of two, in a similar way to unsigned binary representation. The difference is that, for an *n*-bit signed number, the weight of the leftmost bit is negative. An *n*-bit number *x* represents the value

$$x = -x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \dots + x_02^0$$
(3.14)

This representation has a number of interesting and useful properties that we will now explore. First, the most negative number that can be represented has  $x_{n-1}=1$  and all other bits 0, giving the value  $-2^{n-1}$ . The most positive number has  $x_{n-1}=0$  and all other bits 1, giving the value  $2^{n-1}-1$ . If  $x_{n-1}$  is 1, the number represented is negative, since the sum of all the positively weighted powers of 2 is less than  $2^{n-1}$ . Thus,  $x_{n-1}$  serves as a sign bit: if it is 1, the number is negative, and if it is 0, the

number is zero or positive. The range of numbers that can be represented is not symmetric about zero, since the negation of  $-2^{n-1}$  is one more than the most positive number that can be represented.

EXAMPLE 3.14 What values are represented by the 8-bit 2s-complement numbers 00110101 and 10110101?

SOLUTION The first number is

$$1 \times 2^5 + 1 \times 2^4 + 1 \times 2^2 + 1 \times 2^0 = 32 + 16 + 4 + 1 = 53$$

The second number is

$$-1 \times 2^7 + 1 \times 2^5 + 1 \times 2^4 + 1 \times 2^2 + 1 \times 2^0 = -128 + 32 + 16 + 4 + 1 = -75$$

While 2s-complement representation for signed integers predominates, there are other forms that are useful in some applications. One form, signed magnitude, is analogous to our conventional decimal representation for signed integers, in which we write a sequence of decimal digits for the magnitude of a number, preceded by a + or - sign to indicate whether the number is positive or negative. In signed magnitude binary representation, we represent a signed number with a sequence of binary digits (bits), preceded by a binary code for the sign of the number. Usually, we would encode a - sign with 1 and a + sign with 0. While some early digital computers used signed magnitude representation, there are a number of disadvantages that make it uncommon in modern digital systems. For this reason, we will not describe in any further detail, and instead refer to the books listed in Section 3.6, Further Reading, for more information.

## Representing Signed Integers in Verilog

We saw in Section 3.1.1 that we can use vectors and built-in arithmetic operators to deal with unsigned integers. For signed integers, we also use vectors, but we include the keyword signed in their declarations, for example:

```
wire signed [ 7:0] a;
reg signed [13:0] b;
```

The arithmetic operators then assume 2s-complement representation, with the sign bit being the left-most bit in a vector and the least significant bit being the right-most bit.

An important point to note is that, even though we might declare nets or variables to be unsigned or signed, the interpretation of the bits of a value depends on the operator being applied and the declaration of the other operand. If both operands to an arithmetic operation are signed, a signed operation is performed. If either or both operations are unsigned, an unsigned operation is performed. If we really want to interpret values that are declared unsigned as representing signed values, we can use the \$signed conversion operation, for example:

```
wire     [11:0] s1;
wire signed [11:0] s2;
...
assign s2 = $signed(s1); // s1 is known to be less than 2**11
```

Similarly, if we want to interpret values declared signed as representing unsigned values, we use the **\$unsigned** conversion operation, for example:

```
assign s1= $unsigned(s2); // s2 is known to be nonnegative
```

We also mentioned the abstract numeric type integer in Section 3.1.1, showing how it can be used for nonnegative numbers. In fact, the integer type represents numbers that can be positive or negative, provided their 2s-complement representation can fit within 32 bits. We can perform arithmetic operations on values of type integer, and we can mix integer with unsigned and signed net and variable values. The type integer is really just a signed variable type whose size is fixed at 32 bits.

### Octal and Hexadecimal Codes for Signed Integers

We saw in Section 3.1.1 that we could use octal or hexadecimal codes for unsigned integers. We can also use octal and hexadecimal for 2s-complement signed integers. However, when we do so, we don't usually think in terms of signed octal or signed hexadecimal numbers. Instead, we just use octal or hexadecimal as a shorthand notation for the vector of bits. We divide the vector into groups of three bits (for octal) or four bits (for hexadecimal) and substitute the corresponding octal or hexadecimal digit for each group.

EXAMPLE 3.15 The 12-bit 2s-complement representation of  $844_{10}$  is 001101001100. Express the bit vector in hexadecimal.

SOLUTION Dividing into groups of four bits, we get 0011 0100 1100. Substituting hexadecimal digits for the 4-bit groups gives  $34C_{16}$ .

EXAMPLE 3.16 The 10-bit 2s-complement representation of -42 is 1111010110. Express the bit vector in octal.

SOLUTION Dividing into groups of three bits, we get 1 111 010 110. Substituting octal digits for the 3-bit groups gives  $1726_8$ . When reading this octal number, we need to understand that it represents 10 bits. The right-most three digits represent 9 bits, and the left-most digit represents just one bit, the sign bit. Since the sign bit is 1, the number is negative, even though the octal number does not include a - sign.

#### 3.2.2 OPERATIONS ON SIGNED INTEGERS

As with unsigned numbers and binary codes in general, we can perform operations on signed integers that don't rely on their numeric interpretation, such as selecting among several encoded numbers using multiplexers. In this section, we will describe operations that relate to the numeric interpretation, such as arithmetic operations. Most of these operations are implemented in a similar way to their counterparts for unsigned integers.

### **Resizing Signed Integers**

The resizing operation on unsigned integers simply involved appending or truncating leading zeros to reach the desired length of representation while maintaining the same numeric value. With 2s-complement numbers, however, the left-most bit is the sign bit, so appending or truncating leading zeros will not work in general. Let's consider the two cases of nonnegative and negative numbers, respectively.

For nonnegative numbers, the sign bit is 0, and the remaining bits constitute the magnitude of the number. In this case, the 2s-complement representation is the same as the unsigned representation, and zero extending it maintains the same value. We can also truncate leading zeros, as we did for unsigned numbers, provided both that none of the truncated bits is 1 and that the left-most bit of the result is 0. Were the left-most bit of the result 1, that would imply a negative result, which would be incorrect. For example, the 8-bit 2s-complement representation of  $41_{10}$  is 00101001. Truncating this to 6 bits would give 101001, which, interpreted as a 2s-complement number, is -23. The problem is that  $41_{10}$  cannot be represented in 6-bit 2s-complement.

For negative numbers, the sign bit is 1. We can extend an n-bit negative number to m bits by appending leading 1 bits. To see that this conserves the negative numeric value, consider the value represented by a negative number x:

$$x = -2^{n-1} + x_{n-2} 2^{n-2} + \dots + x_0 2^0$$
 (3.15)

CHAPTER THREE

Extending this with leading 1 bits gives the 2s-complement number

$$-2^{m-1} + 2^{m-2} + \dots + 2^{n-1} + x_{n-2} + 2^{n-2} + \dots + x_0 + 2^0$$
 (3.16)

We can make use of the following identity:

$$2^{k} = 2^{k-1} + 2^{k-2} + \dots + 2^{0} + 1 \tag{3.17}$$

Expanding the first term in Equation 3.16 using this identity gives

$$\begin{aligned} &-2^{m-2}-\dots-2^{n-1}-2^{n-2}-\dots-2^0-1\\ &+2^{m-2}+\dots+2^{n-1}+x_{n-2}\,2^{n-2}+\dots+x_02^0\\ &=-2^{n-2}-\dots-2^0-1+x_{n-2}\,2^{n-2}+\dots+x_02^0\\ &=-(2^{n-2}+\dots+2^0+1)+x_{n-2}\,2^{n-2}+\dots+x_02^0\\ &=-2^{n-1}+x_{n-2}\,2^{n-2}+\dots+x_02^0=x\end{aligned}$$

We can argue similarly to show that, for a negative number, we can truncate to a smaller length by truncating leading 1 bits, provided the leftmost bit of the result is 1.

In summary, for a 2s-complement signed integer, extending to a greater length involves replicating the sign bit to the left. This is called *sign extension*, and preserves the numeric value, be it positive or negative. A circuit to implement sign extension of an n-bit signal x to an m-bit signal y is shown in Figure 3.16. We can truncate by discarding the left-most bits, provided all of the discarded bits and the resulting sign bit are the same as the original sign bit. The circuit implementation for truncation from m bits to n bits is the same as for truncation of an unsigned value, shown in Figure 3.2, and just involves leaving the left-most m-n bits unconnected. The problem that might arise is that the value represented in m bits might be larger in magnitude than can be represented in n bits. Usually, this situation does not arise, since we only reduce the number of bits when we know that the value must be within the range



FIGURE 3.16 An implementation of sign extension in a circuit.

representable by the smaller number of bits. We might arrive at that conclusion by analyzing the arithmetic operations performed to derive the larger-sized value.

We can express sign extension of a signed value in Verilog using the bit-replication notation to replicate the sign bit. For example given nets declared as

```
wire signed [ 7:0] x;
wire signed [15:0] y;
```

we can write the following assignment to sign extend the value of x and assign it to y:

```
assign y = \{\{8\{x[7]\}\}, x\};
```

The notation  $\{n\{...\}\}$  specifies n replications of the bits inside the inner braces.

Sign extension or truncation of a signed value in a Verilog model also occurs implicitly when we assign the value to a target that is of a different length. For example, we can rewrite the above assignment statement as

```
assign y = x; // x is sign-extended to 16 bits
```

Similarly, we can write the following assignment to truncate the value of y and assign it to x:

```
assign x = y; // y is truncated to 8 bits
```

## **Negating Signed Integers**

Since we can represent both positive and negative numbers using 2s-complement encoding, it makes sense to consider negating a number. The steps needed to perform negation of a number x are first to complement each bit of x (that is, change each 0 to 1 and each 1 to 0), and then to add 1. We can prove that this yields the 2s-complement representation of -x. We need to use the bit identity  $\overline{x_i} = 1 - x_i$  together with the identity in Equation 3.17. The proof is

CHAPTER THREE

$$\overline{x} + 1 = -(1 - x_{n-1})2^{n-1} + (1 - x_{n-2})2^{n-2} + \dots + (1 - x_0)2^0 + 1$$

$$= -2^{n-1} + x_{n-1} 2^{n-1} + 2^{n-2} - x_{n-2} 2^{n-2} + \dots + 2^0 - x_0 2^0 + 1$$

$$= -(-x_{n-1} 2^{n-1} + x_{n-2} 2^{n-2} + \dots + x_0 2^0)$$

$$-2^{n-1} + 2^{n-2} + \dots + 2^0 + 1$$

$$= -x - 2^{n-1} + 2^{n-1} = -x$$

EXAMPLE 3.17 Determine the 8-bit 2s-complement representation of -43.

SOLUTION The 8-bit 2s-complement representation of 43 is 00101011. Complementing this gives 11010100. Adding 1 gives 11010101, which is the required result.

Recall that the range of numbers representable in 2s-complement form is not symmetric about zero. Consider what happens if we try to complement and add 1 to the representation of  $-2^{n-1}$ , which is 100 ... 0. Complementing gives 011 ... 1. Adding 1 to this gives 100 ... 0, which is the negative number we started with. So if we are to negate a 2s-complement number, we need either to sign extend it by one bit to allow for this case, or be sure that the value  $-2^{n-1}$  cannot occur as input.

In Verilog models, we express negation of a signed value with the prefix – operator. For example, to assign the negation of a net x to a net y, we would write:

assign 
$$y = -x$$
;

## Addition of Signed Integers

We can add two 2s-complement numbers x and y using much the same procedure that we used for unsigned binary numbers. The main difference lies in the way we deal with the sign bit, which has a negative weight of  $-2^{n-1}$ . In order to understand how 2s-complement addition works, we can think of each number as the sum of the weighted sign part, which is either 0 or  $-2^{n-1}$ , and a positive offset, which is less than  $2^{n-1}$ . That is,

$$x = -x_{n-1} 2^{n-1} + x_{n-2 \dots 0}$$
  $y = -y_{n-1} 2^{n-1} + y_{n-2 \dots 0}$ 

and

$$x + y = -(x_{n-1} + y_{n-1})2^{n-1} + x_{n-2...0} + y_{n-2...0}$$

FIGURE 3.17 Examples of signed addition. In each case, the addition overflows if the left-most two carry bits differ.

We will do a case analysis of combinations of sign-bit values for the two *n*-bit operands.

First, consider the case of adding two nonnegative numbers. The sign bits are both 0, and can be added to give a result sign bit of 0 with no carry. The bits of the offsets are all positively weighted and can be added using the procedure for unsigned numbers, provided the carry out from position n-2 is 0, as in the first example in Figure 3.17. On the other hand, if the carry out from position n-2 is 1, as in the second example in Figure 3.17, the positive magnitude of the result would be larger than can be represented in n-bit 2s-complement form; that is, it would overflow.

Next, consider the case of adding two negative numbers, with both sign bits being 1. Adding the sign bits gives 0 with a carry out of 1 from the sign position. This corresponds to adding the weighted sign parts to give  $-2^n$ . So we need the sum of the positive offsets to yield a carry out of 1, with weight  $2^{n-1}$ , to add to this to give  $-2^{n-1}$ . We can just add the carry out from the offsets to the sum of the sign bits to give a final sign bit of 1, as in the third example in Figure 3.17. On the other hand, if the sum of the positive offsets yields a carry out of 0, as in the fourth example in Figure 3.17, the result is more negative than can be represented in n-bit 2s-complement form; that is, it would overflow in the negative direction.

Finally, consider the case of adding one positive number (sign bit is 0) and one negative number (sign bit is 1). No overflow can occur in this case. Adding the two sign bits gives 1 with a carry out of 0. This corresponds to adding the weighted sign parts to give  $-2^{n-1}$ . If the sum of the positive offsets is less than  $2^{n-1}$ , the carry out from position n-2 is 0, as in the fifth example in Figure 3.17, and the final result is negative. If the sum of the positive offsets is greater than or equal to  $2^{n-1}$ , the carry out from position n-2 is 1, and the final result is nonnegative, as in the sixth example in Figure 3.17. We can add the carry out from position n-2 into the sign position to give a final sign bit of 0 and a carry out of 1 from the sign position.

So in all cases, we can perform 2s-complement addition using exactly the same process as unsigned addition, including adding the carry out from position n-2 into the sign position. Overflow is indicated when the carry into the sign position is different from the carry out of that position. We have circled these two bits to highlight them in each of the examples in Figure 3.17. It follows that we can use exactly the same circuit to add unsigned numbers or 2s-complement numbers. We use the carry out from the most significant position to indicate overflow for unsigned addition, and the exclusive OR of the carry in and carry out of the most significant position to indicate overflow for signed addition.

In Verilog, we express addition of signed values using the + operator, just as we did for unsigned values. For signed values, if we want to allow for a result that would overflow if represented using the same number of bits as the operands, we can resize the operand values. For example, given the declarations

```
wire signed [11:0] v1, v2;
wire signed [12:0] sum ;
```

we can add the two 12-bit values and get a 13-bit result using the assignment

```
assign sum = {v1[11], v1} + {v2[11], v2};
```

Alternatively, we can rely on Verilog's implicit sign extension, given that the assignment target is 13 bits, and just write:

```
assign sum = v1 + v2;
```

Developing a Verilog model that represents the sum using the same number of bits as the operands and that derives the overflow condition is somewhat more involved. Referring back to our case analysis of the signs of the operands, we see that overflow only occurs if both operands are nonnegative and the carry in to the sign position is 1 (yielding an apparently negative result), or if both operands are negative and the carry in to the sign position is 0 (yielding an apparently nonnegative result). Given this observation and the declarations

```
wire signed [7:0] x, y, z;
wire ovf;
```

we can write the following assignments to derive the required sum and overflow condition bit:

```
assign z = x + y;
assign ovf = ~x[7] & ~y[7] & z[7] | x[7] & y[7] & ~z[7];
```

### **Subtraction of Signed Integers**

Now that we have seen how to perform addition and negation on 2s-complement numbers, subtraction follows from the identity

$$x - y = x + (-y) = x + \overline{y} + 1$$

FIGURE 3.18 An adder/ subtracter for both unsigned and 2s-complement numbers.

This suggests that we can use the same adder/subtracter, shown in Figure 3.9, that we described for unsigned numbers. The revised form that deals with both kinds of numbers, unsigned and 2s-complement, is shown in Figure 3.18. For signed numbers, when the add/sub control input is 0, the *y* operand is passed through the XOR gates unchanged and the carry in to the adder is 0. When the add/sub input is 1, the *y* operand is complemented by the XOR gates, and the carry in is 1. Thus the circuit subtracts by adding to *x* the complement of *y* and 1. Depending on whether the operands are interpreted as unsigned or signed operands, we use one or the other of the overflow condition outputs.

In Verilog, we express subtraction of signed values using the — operator. For signed values, if we want to allow for a result that would overflow if represented as the same number of bits as the operands, we can resize the operand values, as we described for signed addition. Thus, given the declarations

```
wire signed [11:0] v1, v2;
wire signed [12:0] diff;
```

we can calculate the 13-bit difference between the two 12-bit values using the assignment

```
assign diff = \{v1[11], v1\} - \{v2[11], v2\};
```

or in simplified form, relying on Verilog's implicit sign extension,

```
assign diff = v1 - v2;
```

Again, a Verilog model that represents the difference using the same number of bits as the operands and that derives the overflow condition is somewhat more involved. Since x - y is the same as x + (-y), and the sign of -y is the complement of the sign of y (except when y is zero), we can work out the overflow condition by examining sign bits in a way similar to that for addition. We just need to use the logical negation of the sign bit of y in the overflow expression. Thus, for the declarations

```
wire signed [7:0] x, y, z; wire ovf;
```

we can write the following assignments to derive the required difference and overflow condition bit:

```
assign z = x - y;
assign ovf = \sim x[7] \& y[7] \& z[7] | x[7] \& \sim y[7] \& \sim z[7];
```

The case of y being zero is handled correctly by this expression, since in that case, the result z is the same as x, and so the sign of z is the same as the sign of x.

A further case to consider is subtraction of two unsigned numbers to give a signed result, rather than underflowing when the difference is negative. In order to determine the size to use for the result, we can consider the range of possible result values. Suppose we are subtracting n-bit unsigned values. The greatest result arises from subtraction of zero from the greatest unsigned value, giving  $2^n - 1$ . The least (most negative) result arises from subtraction of  $2^n - 1$  from zero, giving  $-2^n + 1$ . This range is encompassed by a result with n+1 bits. So the simplest way to express the subtraction is to zero extend the operands by one bit, treat them as signed, and then apply the signed subtraction operation. In Verilog, given 8-bit operands and a 9-bit result declared as

```
wire [7:0] v1, v2;
wire signed [8:0] diff;
```

we could write the subtraction as

```
assign diff = signed(\{1'b0, v1\}) - signed(\{1'b0, v2\});
```

## Other Arithmetic Operations on Signed Integers

As part of our examination of unsigned integers, we saw that we could use simplified forms of adder and subtracter to implement the increment and decrement operations. The same argument applies to incrementing and decrementing 2s-complement signed integers. However, we won't go into the details here. As with unsigned integers, we can use the + operator in Verilog models to add 1 to a signed value to increment, and use the – operator to subtract 1 to decrement the value.

Comparison of signed integers is also done similarly to comparison of unsigned integers. The main difference arises from the negative weight for the sign bit. Hence, instead of using  $x_{n-1} \cdot \overline{y_{n-1}}$  to compare the most significant bits in the comparator for x > y, we substitute  $\overline{x_{n-1}} \cdot y_{n-1}$  to compare the sign bits. This follows, since a nonnegative number, with a sign bit of 0 is greater than a negative number with a sign bit of 1. We make the corresponding adjustment in a comparator for x < y. The Verilog comparison operators, <, >, <=, and >=, all work on signed values in an analogous way to unsigned integers.

Scaling a signed integer by a constant power of 2 is slightly different for signed integers than for unsigned integers. Multiplying by  $2^k$  involves shifting to the left by k positions and appending k bits of 0 to the least significant end. This is the same logical shift left operation that we say for unsigned numbers. However, if we need to represent the result in the same number of bits as the original unscaled number, we must truncate using the resizing rules for 2s-complement described earlier. Thus, the truncated bits must all be the same as the original sign bit, and the sign of the result must also have that same sign. Dividing by  $2^k$  involves shifting the bits right by k positions, discarding the k least significant bits and appending k copies of the original sign bit at the most significant end. This operation is called an *arithmetic shift right*. It differs from a logical shift right in the replication of the sign bit instead of filling with 0 bits. Proof that these operations correctly implement scaling is left to Exercise 3.54.

In Verilog, we can apply the <<< and >>> operators to signed operands. The <<< operator, like the << operator, performs a logical shift left, but the >>> operator performs an arithmetic shift right. For example, if the signed net or variable s has the value 11110011, representing the value  $-13_{10}$ , the Verilog expression

s <<< 2

would yield the value 11001100, representing the value  $-52_{10}$ . The expression

s >>> 2

would yield the value 111111100, representing the value  $-4_{10}$ .

The final operation that we discussed in the context of unsigned integers was multiplication. Extending the multiplier design that we described there to deal with 2s-complement signed numbers gets quite complicated, since we need to deal with sign extension within partial products. In real designs, signed multipliers are based on transformations of this basic approach to reduce the amount of circuitry required and to improve performance. We will not go into detail here, but refer to the books listed in Section 3.6, Further Reading. In any case, using our design methodology, we can simply express multiplication in Verilog using the \* operator on signed values and let synthesis CAD tools choose an appropriate multiplier circuit to use.

- What is the difference in representation between unsigned binary and 2s-complement signed binary?
- What is the range of values that can be represented using 12-bit 2s-complement signed binary form?
- Write a Verilog declaration for a net that represents a number in the range -512 to 511 in 2s-complement signed form.
- Resize the 2s-complement numbers 01110001 and 11110011 to 12 bits and 6 bits. In each case, does the result correctly represent the same value as the original?
- Negate the 2s-complement signed number 11110010.
- How is a signed adder used to perform signed subtraction?
- 7. How is a 2s-complement signed number multiplied by 16? How is it divided by 16?

#### 3.3 FIXED-POINT NUMBERS

While many applications deal with integer data, there is a growing list of applications that also deal with fractional numeric data. Many such applications involve digital signal processing, in which time-varying analog signals are sampled, converted to a digital representation and subject to numerical operations. For example, most modern audio devices deal with sampled audio signals and perform operations such as filtering, amplification and equalization. The audio samples are approximations to real numbers within a given range. The circuits representing and operating upon the samples need to deal with fractional values, that is, values that lie between integers. In this section, we will introduce the notion of fixedpoint representation of nonintegral values.

#### 3.3.1 CODING FIXED-POINT NUMBERS

Suppose we need to represent numeric values that lie in the range -12.0to +12.0. Since there are an infinite number of real numbers in that range, KNOWLEDGE TEST QUIZ

we cannot represent all of them. Instead, we determine a precision, based on the requirements of our application, and approximate values with a multiple of that precision. For example, if our chosen precision is 0.01, we would round each value to the nearest multiple of 0.01. Thus an original value of 10.23683 would be approximated with a value of 10.24.

When we write decimal numbers in this way, we are extending the positional notation that we described for integers in Section 3.1. We use the decimal point to mark the boundary between digits whose weight is a nonnegative power of 10 and digits whose weight is a negative power of ten. For example, the number  $10.24_{10}$  is

$$10.24_{10} = 1 \times 10^{1} + 0 \times 10^{0} + 2 \times 10^{-1} + 4 \times 10^{-2}$$

We can extend this idea to binary, in which the digits are weighted with powers of 2 and each binary digit (each bit) is 0 or 1. Thus, the binary number 101.01<sub>2</sub> is

$$101.01_2 = 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 + 0 \times 2^{-1} + 1 \times 2^{-2}$$

Since we are dealing with nonintegral numbers, we use negative powers of 2 for the fractional part. We refer to the period dividing the binary number into its integral and fractional parts as the *binary point*.

When we come to implement nonintegral numbers in digital systems, the question arises of how to represent the binary point. The *fixed-point* representation relies on the position of the binary point being implicit. We just represent the bits, as we did for integral values, as a vector with one element per bit position. Thus, the number 101.01<sub>2</sub> could be represented by the bit vector 10101, with the assumption that the binary point lies two places from the right.

EXAMPLE 3.18 What number is represented by the fixed-point binary number 01100010, assuming the binary point is four places from the right?

SOLUTION The number is

$$\begin{aligned} 0110.0010_2 \\ &= 0 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 + 0 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} \\ &+ 0 \times 2^{-4} \\ &= 0 + 4 + 2 + 0 + 0 + 0 + \frac{1}{8} + 0 = 6.125_{10} \end{aligned}$$

In general, we write an *n*-bit unsigned fixed-point number with *m* bits before the assumed binary point and *f* bits after the assumed binary point, where n = m + f. The number *x* represented by the bits  $x_{m-1}, \ldots, x_0, x_{-1}, \ldots, x_{-f}$  is

$$x = -x_{m-1} 2^{m-1} + \dots + x_0 2^0 + x_{-1} 2^{-1} + \dots + x_{-f} 2^{-f}$$

CHAPTER THREE

The smallest number representable using such a code is 0, with a code word of all 0 bits. The largest number representable has a code word of all 1 bits, and represents  $2^m - 2^{-f}$ . In between those bounds, numbers are represented as multiples of the precision,  $2^{-f}$ .

Note that a code with no digits before the assumed binary point is permissible, and indeed, practical. This would correspond to a code with m = 0. In such a code, all of the bits represent the fractional part of the number, so the range is between 0 and  $1 - 2^{-f}$ . We can even go so far as to have the assumed binary point several positions to the left of the left-most bit, that is, for m to be negative. For example, a code with m = -3 and f = 13 would be a 10-bit code with values ranging from 0 to  $2^{-3} - 2^{-13}$  in steps of  $2^{-13}$ , or in decimal, from 0 to  $2^{-13} - 2^{-13}$  in steps of  $2^{-13} - 2^{-13}$ ... in steps of  $2^{-13} - 2^{-13} - 2^{-13}$ ...

Similarly, we can have a fixed-point code with no digits to the right of the binary point, that is, with f = 0. Numbers represented in such a code are, in fact, unsigned integers. If we substitute f = 0 in the expressions for the upper bound and precision, we get an upper bound of  $2^m - 1$  and a precision of 1, as we would expect for integers. Thus, integers are just a special case of fixed-point representation.

We can also use fixed-point representation for signed fractional numbers. We use the same approach as we did for integers, changing the weight of the most significant digit to be negative. This gives us a 2s-complement fixed-point signed representation. In this case, the number x represented with m bits before and f bits after the assumed binary point is

$$x = x_{m-1} 2^{m-1} + ... + x_0 2^0 + x_{-1} 2^{-1} + ... + x_{-f} 2^{-f}$$

The range of numbers represented using this form is from  $-2^{m-1}$  to  $2^{m-1}-2^{-f}$ , with a precision of  $2^{-f}$ . Again, we can have a code with m being zero or negative. Since the left-most bit in a signed fixed-point representation is the sign bit, a code that represents values between -1 and just less than 1 has m=1, with the single bit before the binary point being the sign bit.

EXAMPLE 3.19 What number is represented by the signed fixed-point binary number 111101, assuming the binary point is four places from the right?

SOLUTION The number is

$$\begin{aligned} &11.1101_2 \\ &= -1 \times 2^1 + 1 \times 2^0 + 1 \times 2^{-1} + 1 \times 2^{-2} + 0 \times 2^{-3} + 1 \times 2^{-4} \\ &= -2 + 1 + \frac{1}{2} + \frac{1}{4} + 0 + \frac{1}{16} = -0.1875_{10} \end{aligned}$$

Having described how we can represent fixed-point numbers with a given range and precision, the question arises of determining what

range and precision to use in a given application. The answer is not simple, and depends on the application. In digital signal processing applications, where fixed-point numbers are used to represent samples of analog signals, the range of the representation affects the dynamic range (the ratio of maximum to minimum amplitude) of signals that can be processed, and the precision affects the signal-to-noise ratio (a measure of quality or fidelity) of the system. If the system is to perform arithmetic operations on the fixed-point values to implement some processing algorithm, the precision affects the numerical behavior of the algorithm. The finite precision of the representation means that analog signal values are only represented approximately, thus, there is an inherent error in the representation. Some numerical processing steps can magnify the effect of the error. Also, processing steps might yield intermediate values whose range differs from that of the samples, requiring a greater range, and thus more bits, for their representation. Mathematical analysis of the behavior and sensitivity of numerical computations is beyond the scope of this book. Nonetheless, it is a vital early design step in applications that implement numerical processing procedures. More information is provided in the reference books cited in Section 3.6, Further Reading.

## Fixed-Point Representation in Verilog

We can represent fixed-point numbers in Verilog using vectors. When we use vectors for integers, we have consistently declared them with index values corresponding to the binary weights. We can follow the same convention when declaring vectors representing fixed-point numbers. We specify the left and right index bounds, indicating the power of two for the weights of the most-significant and least-significant bits, respectively. We assume that the binary point is between indices 0 and -1, whether those indices actually occur in a given vector or not.

EXAMPLE 3.20 Write Verilog module declarations for a code converter that has an input representing an unsigned number in the range 0 to 48 with a precision of at least 0.01, and an output representing a signed number in the range -100 to 100 with a precision of at least 0.01.

SOLUTION For the input, we need 6 bits before the binary point, since  $\lceil \log_2 48 \rceil = 6$ . We need a precision that is smaller than 0.01. Since  $\log_2 0.01 \approx -6.64$ , we need 7 bits after the binary point. For the output,  $\lceil \log_2 100 \rceil = 7$ , so we need 7 bits, plus one for the sign bit, giving 8 bits before the binary point. We just need to extend the 6 pre-binary-point input bits with two zero bits to get the 8 pre-binary-point output bits. Since we need the same output precision as the input, we use the same number of bits after the binary point, namely, 7. The module definition is

CHAPTER THREE

In our discussion of integers, we mentioned that Verilog provides the type integer for abstract representation of numbers. Unfortunately, Verilog does not provide a corresponding type for abstract representation of fixed-point numbers. Abstract fixed-point types could, in principle, be included in the language, as has been done in the Ada programming language, for example. While we might hope that abstract fixed-point types might be included in a future version of Verilog as applications become more common, for now, we will just make use of the vector types.

For testbenches in Verilog, however, we can make use of a built-in type real. We can declare a variable (but not a net) to be of this type as follows:

```
real x;
```

Real variables are actually represented using floating-point format, described in Section 3.4. However, we can use them for nonintegral values to be applied to the inputs or checked at the outputs of models using fixed-point representation. Some examples are

```
real     r1, r2;
wire [5:-16] x, y;
wire [8:-14] z;

r1 <= $itor(x)/2**16;
r2 <= r1 / ($itor(y)/2**16);
z <= $rtoi(r2 * 2**14);</pre>
```

The conversion function \$itor used here converts from a vector value, interpreted as an integer, to a real-number value. The scaling is required, since our actual interpretation of the vector is a fixed-point value. The conversion function \$rtoi works in the reverse direction, from a real-number value to a vector interpreted as an integer. Again, scaling is required to take account of our actual interpretation of the vector as a fixed-point value.

#### 3.3.2 OPERATIONS ON FIXED-POINT NUMBERS

We now turn to implementation of arithmetic operations on fixed-point numbers. We have already covered most of what we need in our discussion of arithmetic operations on integers, since fixed-point numbers can be viewed as scaled integers. For example, if x and y are fixed-point numbers with the binary point f positions from the right, then  $x \times 2^f$  and  $y \times 2^f$  are integers represented by the same bit vectors as x and y, respectively. Furthermore,

$$x + y = (x \times 2^f + y \times 2^f)/2^f$$

We know how to add the two integers, and dividing by  $2^f$  simply consists of moving the binary point f places to the left, giving us the result in the same fixed-point format as x and y. Thus, we can use the same kinds of adder circuits for fixed-point numbers as for integers. Similar arguments hold for subtraction, incrementing, decrementing, scaling by constant powers of 2, and resizing.

One issue we need to be aware of is that a design might represent different signals as fixed-point numbers of different lengths or with the binary point in different positions. When we perform operations such as addition or subtraction, we need to ensure that we add or subtract the bits with corresponding binary weights, wherever they occur in a vector. We may need to resize one operand to align it with the other. If we need to add or truncate on the left-hand end of a fixed-point number, the same considerations apply for resizing integers. Thus, in the case of unsigned fixed-point numbers, we add 0 bits to the left to extend the number, and we truncate 0 bits to reduce its size. In the case of 2s-complement signed numbers, we replicate the sign bit to extend the number, and we truncate bits to reduce the number, provided the truncated bits and the resulting sign bit are all the same as the original sign bit. If we need to add or truncate on the righthand end of a number, things are simpler, since the right-most bits all have positive weight. For both unsigned and 2s-complement representations, we add 0 bits to extend and truncate bits to reduce the size.

we add 0 bits to extend and truncate bits to reduce the size.

EXAMPLE 3.2 I Show how to use an adder for two signed fixed-point signals: *a*, with 4 pre-binary-point and 7 post-binary-point bits, and *b*, with 6 pre-binary-point and 4 post-binary-point bits. The result *c* should have 6 pre-binary-point and 4 post-binary-point bits.

SOLUTION The operand *a* needs to be sign extended by two bits on the left-hand end and can be truncated by three bits on the right-hand end. A 10-bit adder is needed, connected as shown in Figure 3.19.



FIGURE 3.19 Alignment of operands for fixed-point addition.

Unfortunately, the Verilog + and - operators applied to vector operands representing fixed-point numbers do not take care of alignment. They

CHAPTER THREE

just perform the operations assuming the right-most bits of the operands are the corresponding least significant bits. If both operands are declared with the same index bounds, the operations are performed correctly for the fixed-point interpretation of the values. If, however, the index bounds are not the same, we need to extend or truncate both ends of the operands to make sure that the assumed binary points align.

EXAMPLE 3.22 Write Verilog declarations and an assignment to perform the addition described in Example 3.21.

SOLUTION The declarations for the nets a, b and c are

```
wire signed [3:-7] a;
wire signed [5:-4] b, c;
```

We could try the following assignment as a first attempt:

```
assign c = a + b;
```

Since a is 11 bits and b is 10 bits, the + operator would sign extend b to 11 bits and perform an 11-bit addition. The implicit binary points would be misaligned by three places. To correct this, we need to sign extend the value of a by 2 bits, and to truncate the 3 least signficant bits of a. We can use a part select to perform the truncation, but the result of a part select is treated as unsigned in Verilog. We can use the \$signed conversion operation to re-interpret it as signed. The following assignment incorporates these corrections:

```
assign c = {{2{a[3]}}, $signed(a[3:-4])} + b;
```

Another related issue to be aware of is the position of the binary point in the result of a multiplication. We can appeal to the way in which we do multiplication of decimals for an analogy. Suppose, for example, that we wish to multiply 23.76 by 3.128. We first multiply the digits without regard to the decimal points to get 7432128. We then add the number of post-decimal digits in the operands, namely, 2 and 3, to get the number of post-decimal digits in the result, namely, 5. Thus the product is 74.32128.

By analogy, multiplying two fixed-point binary numbers with  $m_1$  and  $m_2$  pre-binary-point bits and  $f_1$  and  $f_2$  post-binary-point bits, respectively, gives us a product with  $m_1 + m_2$  pre-binary-point bits and  $f_1 + f_2$  post-binary-point bits. For example, multiplying 1.101<sub>2</sub> by 10.1<sub>2</sub> gives 100.0001<sub>2</sub>. If

KNOWLEDGE TEST QUIZ we are to use the Verilog \* operator to produce a product of this length, we must extend each operand on the left to the final product size.

- 1. How is a nonnegative number *x* represented as a sum of powers of 2 in fixed-point form?
- 2. What range of values can be represented as signed fixed-point numbers with *m* pre-binary-point bits and *f* post-binary-point bits?
- 3. Write a Verilog declaration for a net x, *not* to represent numbers in the range 0.0 to 359.9 with a precision of 0.1.
- 4. Write a Verilog assignment to subtract the value of a net s2 from the value of a net s1, where both are of type wire [7:–7], to get a result net s3 of the same type. No overflow detection is required.
- 5. How many bits are required for the product of two *fixed-point* numbers with 5 pre-binary-point bits and 9 post-binary-point bits?

### 3.4 FLOATING-POINT NUMBERS

The final number representation that we will discuss in this chapter is floating-point, which is another representation for approximating real numbers. They allow for representation of a greater range of numbers than a fixed-point representation with the same number of bits. However, implementation of arithmetic operations is considerably more complex. Indeed, most circuits for floating-point arithmetic are not combinational, since they would otherwise be too complex and reduce overall system performance. Since we have deferred detailed discussion of sequential circuit design to a later chapter, we will not go into circuits for floating-point arithmetic here. For completeness of our survey of numeric representations in this chapter, we will just introduce floating-point format. Unfortunately, Verilog only provides rudimentary features for dealing with floating-point numbers. They are not sufficient for modeling floating-point circuits, so we will not discuss them here.

#### 3.4.1 CODING FLOATING-POINT NUMBERS

Floating-point representation in digital systems is based on the same ideas as scientific notation for decimal numbers. We can write numbers that are very small or very large as the product of a fixed-point decimal fraction and a power of 10. This saves us from writing long strings of leading or trailing zeros and makes the number much easier to read and understand. Examples of numbers expressed in scientific notation are  $6.02214199 \times 10^{23}$  (Avogadro's number) and  $1.60217653 \times 10^{-19}$  (the charge, in Coulombs, of an electron). We call the fractional part before the × sign the *mantissa* and the power to which 10 is raised the *exponent*.

Floating-point representations adopt these ideas, but use binary instead of decimal. The mantissa is expressed as a fixed-point binary number, the base of the exponent is 2, and the exponent is a signed binary number. Within these general guidelines, there are many alternative floating-point representations, and, historically, several have been implemented in computer designs. However, modern general-purpose computers have almost universally adopted a floating-point representation standardized as IEEE Standard 754, the so called IEEE floating-point format. In this section, we will describe this format and formats that differ from it only in the number of bits used for the mantissa and exponent.

A floating-point number is represented as a vector of bits arranged as shown in Figure 3.20. The mantissa is represented using a sign bit, s, located in the left-most bit of the vector, and the unsigned magnitude, located in the right-most m bits of the vector. The exponent is represented using e bits between the sign bit and the mantissa magnitude. The IEEE floating-point standard defines two standard floating-point sizes: 32-bit single precision, with m = 23 bits and e = 8 bits; and 64-bit double precision, with m = 52 bits and e = 11 bits. These are implemented by most computers. However, if we are designing custom digital circuits for specific applications, we need not be constrained to these sizes. We can choose smaller or larger sizes in order to meet the requirements and constraints of the application. After we've explored some more of the details of the way in which numbers are represented, we will see how the sizes of the exponent and mantissa affect the range and precision of numbers represented.

A floating-point number is usually *normalized*, meaning that the magnitude of the mantissa is greater than or equal to  $1.0_{10}$  (that is,  $1.0_2$ ) and less than  $2.0_{10}$  (that is, less than or equal to,  $1.111...1_2$ ), with the exponent being adjusted to give the required value for the number. The mantissa magnitude could be represented as a fixed-point fraction with the binary point located just to the right of the most significant bit. However, as a consequence of normalizing, the most significant bit is always 1. So we can gain an extra bit of precision by not explicitly representing the most significant bit, but assuming that it is 1. This implicit bit in the floating-point format is called the hidden bit. Note that the mantissa is not represented using 2s-complement encoding, even though it is a signed value. The sign/magnitude representation turns out to have several advantages, including simplification of circuits for some arithmetic operations. We won't go into details here.

Similarly, though the exponent is a signed number, it also is not represented in 2s-complement form. Rather, it is represented in excess form. That is, for a given actual exponent value E, we represent it with the e-bit unsigned binary code for  $E + 2^{e-1} - 1$ . The value  $2^{e-1} - 1$  is called the bias, and is chosen so that a symmetric range of positive and negative actual exponent values can be represented. For example, if 5 bits are

|   | e bits | m bits   |
|---|--------|----------|
| s | exp    | mantissa |

FIGURE 3.20 Floating-point format.

used for the exponent, the bias would be  $2^4 - 1 = 15$ , that is,  $01111_2$ . An actual exponent value of 3 would be represented using the 5-bit unsigned binary code for 3 + 15 = 18, that is  $10010_2$ . The reason for using excess coding is that all exponent codes are unsigned. Given the position of the exponent within a floating-point code word, and the fact that numbers with smaller exponents are smaller than numbers with larger exponents (due to normalization), floating-point numbers can be compared using the same hardware as for comparing integers. This is a useful trick for saving cost and execution time in floating-point arithmetic hardware.

Let's now consider the range and precision of values that can be represented using floating-point format. As with fixed-point numbers, the range and precision are important factors that influence the numerical behavior of computations. The range of values is determined by the length of the exponent, since the most positive exponent determines the largest value and the most negative exponent determines the smallest value. The IEEE floating-point format reserves two exponent encodings for special purposes: the largest encoding,  $2^e - 1$ , with all 1 bits; and the smallest encoding, with all 0 bits. We will return to these shortly. Setting them aside, the smallest exponent has an encoding of 1, representing an actual exponent value of  $-2^{e-1} + 2$ . Putting this together with the smallest mantissa magnitude of 1.0 gives us the smallest representable value of  $\pm 1.0 \times 2^{-2^{e-1}+2}$ . The largest exponent has an encoding of  $2^e-2$ , representing an actual exponent value of  $2^{e-1}-1$ . Putting this together with the largest mantissa magnitude of just under 2.0 gives us the largest representable value of just under  $\pm 2.0 \times 2^{2^{e-1}-1}$ , that is,  $\pm 2^{2^{e-1}}$ . For IEEE single-precision format, this corresponds to a range of approximately  $\pm 1.2 \times 10^{-38}$  to  $\pm 3.4 \times 10^{38}$ , and for IEEE double-precision format, a range of approximately  $\pm 2.2 \times 10^{-308}$  to  $\pm 1.8 \times 10^{308}$ . A custom floating-point representation with a 5-bit exponent, on the other hand, would give us a range of approximately  $\pm 6.1 \times 10^{-5}$  to  $\pm 6.6 \times 10^{4}$ .

When considering the precision of floating-point numbers, we usually talk about relative precision, since absolute precision varies with the exponent. The relative precision is determined by the number of bits in the mantissa magnitude. All of the bits are significant, since there are no leading zeros in the mantissa (taking into account the hidden bit). So the relative precision remains the same across the full range of values, and is approximately  $2^{-m}$ . Another way of thinking about precision is to specify the number of significant decimal digits, which is approximately  $m \times \log_{10} 2$ , that is  $m \times 0.3$  digits. For example, IEEE single-precision format gives a precision of approximately 7 decimal digits, and IEEE double-precision format gives approximately 16 decimal digits. A custom format with 16 bits of mantissa magnitude would give a precision of approximately 5 decimal digits.

We can return now to the special exponent encodings that we mentioned above. First, the smallest exponent encoding, all zeros, is used for denormal numbers, in which the hidden bit is 0. The actual exponent is still represented using excess form, and so has a value of  $-2^{e-1}+1$ . Thus, denormal numbers are all smaller in magnitude than the smallest normalized number, though they have fewer significant bits. They allow for gradual underflow in a computation, where the results diminish toward 0.0 once the limit of precision has been reached. This feature of the representation improves the numerical behavior of some algorithms. If all the mantissa bits in a denormal number are 0, we get  $\pm 0.0 \times 2^{-2^{e^{-1}} + 1}$ . Thus, there are two alternate representations for 0.0, one with a sign bit of 0 and the other with a sign bit of 1. The IEEE standard specifies that a zero result in most cases be represented by the nonnegative version, but that in any case, the two versions should be deemed equal.

The other special exponent encoding, all 1s, has two uses. If the mantissa magnitude bits are all 0 (not counting the hidden bit), the number represents an infinite value. The value of the sign bit determines whether it is a positive or negative infinity. Operations that overflow generally yield an infinite result, which is maintained in subsequent computations. This avoids having to check for overflow until completion of a multistep computation, thus improving performance. If the exponent encoding is all 1s and the mantissa magnitude is other than all 0s, the value is said to represent not a number (NaN). NaN results arise from computations such as division of 0 by 0, and can also be maintained through a multistep computation.

In addition to the representation for floating-point numbers, the IEEE standard also specifies how arithmetic operations are to be performed, provides options for specifying how operations are to be rounded, and specifies the conditions under which exceptions may occur. (A system may abort a computation or take recovery action when an exception occurs.) The details are beyond the scope of this book, but can be found in the Further Reading references.

For a given number of bits of representation, floating-point representation can give a larger range of values than fixed-point, albeit at the expense of precision. The choice between floating-point and fixed-point in a given application will depend largely on the range of values that must be represented, both for the input and output signals, as well as for intermediate results during computation. There is also a trade-off with the complexity of circuits needed to perform the computations. Fixed-point circuits are generally simpler, but if significantly more bits are needed to get the required range, the circuits may consume more area. In many cases, the choice will only be made after thorough exploration of the numerical behavior of the computations to be performed and comparison of implementation complexities of alternate representations. This exploration will usually be performed by a system architect early in the development process. The result of the exploration will be a design specification that includes details of number representations to be used within the system. In a circuit that is customized for a particular application, a floating-point representation can use exponent and mantissa sizes other than those defined by the IEEE standard, thus reducing cost and potentially improving performance.

## KNOWLEDGE TEST QUIZ

- 1. Express the number  $4.5_{10}$  in floating-point format with 5 bits of exponent and 12 bits of mantissa magnitude.
- 2. What values are represented by the following bit vectors, interpreted in floating-point format with 4 bits of exponent and 11 bits of mantissa magnitude: 0000000000000000, 011110000000000 and 010001000000000?
- 3. Determine the minimum number of exponent and mantissa bits required to represent a floating-point value in the range −100 to 100 with a precision of at least 4 decimal digits.

# CHAPTER 4

# **Design Examples**

In this chapter, we present several Verilog design examples to illustrate the design of small digital systems. We present the concept of dividing a design into a controller and a data path and using the control circuit to control the sequence of operations in a digital system. We use Verilog to describe a digital system at the behavioral level so that we can simulate the system to test the algorithms used. We also show how designs have to be coded structurally if specific hardware structures are to be generated.

In any design, first, one should understand the problem and the design specifications clearly. If the problem has not been stated clearly, try to get the specification of the design clarified. In real-world designs, if another team or a client company is providing your team with the specifications, getting the design specifications clarified properly can save you a lot of grief later. Good design starts with a clear specification document.

Once the problem has been stated clearly, often designers start thinking about the basic blocks necessary to accomplish what is specified. Designers often think of standard building blocks, such as adders, shift-registers, counters, and the like. Traditional design methodology splits a design into a "data path" and a "controller." The term "data path" refers to the hardware that actually performs the data processing. The controller sends control signals or commands to the data path, as shown in Figure 4-1. The controller can obtain feedback in the form of status signals from the data path.

FIGURE 4-1: Separation of a Design into Data Path and Controller



In the context of a microprocessor, the data path is the ALU (Arithmetic and Logic Unit) that performs the core of the processing. The controller is the control logic that sends appropriate control signals to the data path, instructing it to

perform addition, multiplication, shifting, or whatever action is called for by the instruction. Many users have a tendency to mistakenly consider the term "data path" to be synonymous with the data bus, but "data path" in traditional design terminology refers to the actual data processing unit.

Maintaining a distinction between data path and controller helps in debugging (i.e., finding errors in the design). It also helps while modifying the design. Many modifications can be accomplished by changing only the control path because the same data path can still support the new requirements. The controller can generate the new sequence of control signals to accomplish the functionality of the modified design. Design often involves refining the data path and controller in iterations.

In this chapter, we will discuss various design examples. Several arithmetic and non-arithmetic examples are presented. Non-arithmetic examples include a 7-segment decoder, a traffic light, a scoreboard, and a keypad scanner. Arithmetic circuits such as adders, multipliers, and dividers are also presented.

# 4.1 BCD to 7-Segment Display Decoder

Seven segment displays are often used to display digits in digital counters, watches, and clocks. A digital watch displays time by turning on a combination of the segments on a 7-segment display. For this example, the segments are labeled as follows, and the digits have the forms as indicated in Figure 4-2.

FIGURE 4-2: 7-Segment Display



Let us design a BCD to 7-segment display decoder. BCD stands for "binary coded decimal." In this format, each digit of a decimal number is encoded into 4-bit binary representation. This decoder is a purely combinational circuit; hence, no state machine is involved here. A block diagram of the decoder is shown in Figure 4-3. The decoder for one BCD digit is presented.

FIGURE 4-3: Block Diagram of a BCD to 7-Segment Display Decoder



We will create a behavioral Verilog architectural description of this BCD to 7-segment decoder by using a single process with a case statement to model this

combinational circuit, as in Figure 4-4. The sensitivity list of the process consists of the BCD number (4 bits).

FIGURE 4-4: Behavioral Verilog Code for BCD to 7-Segment Decoder

```
module bcd_seven (bcd, seven);
  input [3:0] bcd;
  output[7:1] seven;
        [7:1] seven;
  req
  always @(bcd)
  begin
    case (bcd)
      4'b0000 : seven = 7'b0111111 ;
      4'b0001 : seven = 7'b0000110 ;
      4'b0010 : seven = 7'b1011011
      4'b0011 : seven = 7'b1001111
      4'b0100 : seven = 7'b1100110
      4'b0101 : seven = 7'b1101101 :
      4'b0110 : seven = 7'b11111101
      4'b0111 : seven = 7'b0000111
      4'b1000 : seven = 7'b1111111
      4'b1001 : seven = 7'b1101111 ;
      default : seven = 7'b0000000
    endcase
  end
endmodule
```

## 4.2 A BCD Adder

In this example, we design a 2-digit BCD adder, which will add two BCD numbers and produce the sum in BCD format. In BCD representation, each decimal digit is encoded into binary. For instance, decimal number 97 will be represented as 1001 0111 in the BCD format, where the first 4 bits represent digit 9 and the next 4 bits represent digit 7. One may note that the BCD representation is different from the binary representation of 97, which is 110001. It takes 8 bits to represent 97 in BCD, whereas the binary representation of 97 (110001) only requires 6 bits. The 4-bit binary combinations 1010, 1011, 1100, 1101, 1110, and 1111 corresponding to hexadecimal numbers A to F are not used in the BCD representation. Since 6 out of 16 representations possible with 4 binary bits are skipped, a BCD number will take more bits than the corresponding binary representation.

When BCD numbers are added, each sum digit should be adjusted to skip the six unused codes. For instance, if 6 is added with 8, the sum is 14 in decimal form. A binary adder would yield 1110, but the lowest digit of the BCD sum should read 4. In order to obtain the correct BCD digit, 6 should be added to the sum whenever it is greater than 9. Figure 4-5 illustrates the hardware that will be required to

```
wire[4:0] S0;
wire[4:0] S1;
wire C;

assign S0 = `Xdig0 + `Ydig0 ;
assign `Zdig0 = (S0 > 9) ? S0[3:0] + 6 : S0[3:0] ;
assign C = (S0 > 9) ? 1'b1 : 1'b0 ;

assign S1 = `Xdig1 + `Ydig1 + C ;
assign `Zdig1 = (S1 > 9) ? S1[3:0] + 6 : S1[3:0] ;
assign `Zdig2 = (S1 > 9) ? 4'b0001 : 4'b0000 ;
endmodule
```

equals 1, Zdig1 equals 3 and Zdig0 equals 5. In Verilog code, the defined name should be used with `(e.g., `Xdig1 or `Zdig2).

During the addition of the second digit, the carry digit from the addition of the  $XDig\theta$  and  $Ydig\theta$  is also added. Thus, the addition of the second digit is accomplished by the statement:

assign 
$$S1 = Xdig1 + Ydig1 + C$$
;

## 4.3 32-Bit Adders

Let us assume that we have to design a 32-bit adder. A simple way to construct an adder is to build a **ripple-carry adder**, as shown in Figure 4-7. In this type of adder, 32 copies of a one-bit full adder are connected in succession to create the 32-bit adder. The carry "ripples" from the least significant bit to the most significant bit. If gate delays are  $t_g$ , a one-bit adder delay is  $2*t_g$  (assuming a sum-of-products expression for sum and carry, and ignoring delay for inverters), and a 32-bit ripple-carry adder will take approximately 64 gate delays. For instance, if gate delays are 1 ns, the maximum frequency at which the 32-bit ripple-carry adder can operate is approximately 16 MHz! This is inadequate for many applications. Hence, designers often resort to faster adders.

FIGURE 4-7: A 32-Bit Ripple-Carry Adder



## Carry Look-Ahead Adders

A popular fast-addition technique is carry look-ahead (CLA) addition. In the carry look-ahead adder, the carry signals are calculated in advance, based on the input signals. For any bit position i, one can see that a carry will be generated if the

corresponding input bits (i.e.,  $A_i$ ,  $B_i$ ) are 1 or if there was a carry-in to that bit and at least one of the input bits are 1. In other words, bit i has carry-out if  $A_i$  and  $B_i$  are 1 (irrespective of carry-in to bit i); bit i also has a carry-out if  $C_i = 1$  and either  $A_i$  or  $B_i$  is 1. Thus, for any stage i, the carry-out is

$$C_{i+1} = A_i B_i + (A_i \oplus B_i) \cdot C_i \tag{4-1}$$

The " $\oplus$ " stands for the exclusive OR operation. Equation 4-1 simply expresses that there is a carry out from a bit position if it **generated** a carry by itself (e.g.,  $A_iB_i = 1$ ) or it simply **propagated** the carry from the lower bit forwarded to it (i.e.,  $(A_i \oplus B_i) \cdot C_i$ ).

Since  $A_iB_i = 1$  indicates that a stage generated a carry, a general **generate (Gi)** function may be written as

$$G_i = A_i B_i \tag{4-2}$$

Similarly, since  $(A_i \oplus B_i)$  indicates whether a stage should propagate the carry it receives from the lower stage, a general **propagate** ( $P_i$ ) function may be written as

$$P_i = A_i \oplus B_i \tag{4-3}$$

Notice that the propagate and generate functions depend only on the input bits and can be realized with one or two gate delays. Since there will be a carry whether one of  $A_i$  or  $B_i$  is 1 or both are 1, one can also write the propagate expression as

$$P_i = A_i + B_i \tag{4-4}$$

where the OR operation is substituted for the XOR operation. Logically this propagate function also results in the correct carry-out; however, traditionally it has been customary to define the propagate function as the XOR; that is, the bit position simply propagates a carry (without generating a carry by itself). Also, typically, the sum signal is expressed as

$$S_i = A_i \oplus B_i \oplus C_i = P_i \oplus C_i \tag{4-5}$$

The expression  $P_i \oplus C_i$  can be used for sum only if  $P_i$  is defined as  $A_i \oplus B_i$ .

The carry-out equation can be rewritten by substituting (4-2) and (4-3) in (4-1) for  $G_i$  and  $P_i$  as:

$$C_{i+1} = G_i + P_i C_i (4-6)$$

In a 4-bit adder, the  $C_i$ s can be generated by repeatedly applying Equation 4-6 as shown here

$$C_1 = G_0 + P_0 C_0 (4-7)$$

$$C_2 = G_1 + P_1 C_1 = G_1 + P_1 G_0 + P_1 P_0 C_0$$
(4-8)

$$C_3 = G_2 + P_2 C_2 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0$$
 (4-9)

$$C_4 = G_3 + P_3C_3 = G_3 + P_3G_2 + P_3P_2G_1 + P_3P_2P_1G_0 + P_3P_2P_1P_0C_0$$
 (4-10)

These carry bits are the look-ahead carry bits. They are expressed in terms of  $P_s$ ,  $G_s$ , and  $C_0$ . Thus, the sum and carry from any stage can be calculated without

$$C_{4} = G_{G} + P_{G}C_{0} \tag{4-11}$$

This is accomplished by computing a group propagate  $(P_G)$  and group generate  $(G_G)$  signal, which is produced by carry look-ahead logic:

$$P_G = P_3 P_2 P_1 P_0 \tag{4-12}$$

$$G_G = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 (4-13)$$

FIGURE 4-8: Block Diagram of a 4-Bit CLA



The disadvantage of the carry look-ahead adder is that the look-ahead carry logic as shown in Equations 4-7 through 4-13, is not simple. It gets quite complicated for more than 4 bits. For that reason, carry look-ahead adders are usually implemented as 4-bit modules and are used in a hierarchical structure to realize adders that have multiples of 4 bits. Figure 4-9 shows the block diagram for a 16-bit carry look-ahead adder. Four carry look-ahead adders, similar to the ones introduced previously, are used. Instead of relying on each 4-bit adder to send its carry-out to the next 4-bit adder, the **block carry look-ahead logic** generates input carry bits to be fed to each 4-bit adder using a group propagate ( $P_{\rm G}$ ) and group generate ( $G_{\rm G}$ ) signal, which is produced by each 4-bit adder. The next level of carry look-ahead logic uses these group propagates/generates and generates the required carry bits in parallel. The propagate for a group is true if all the propagates in that group are true. The generate for a group is true if the MSB generated a carry or if a lower bit generated a carry and every higher bit in the group propagated it.

The group propagate  $P_G$  and generate  $G_G$  will be available after three and four gate delays, respectively (one or two additional delays than the  $P_i$  and  $G_i$  signals,

FIGURE 4-9: Block Diagram of a 16-Bit CLA



respectively). The carry equations for the block carry look-ahead logic are as follows:

$$C_4 = G_{G0} + P_{G0}C_0 (4-14)$$

$$C_8 = G_{G1} + P_{G1}G_{G0} + P_{G1}P_{G0}C_0 (4-15)$$

$$C_{12} = G_{G2} + P_{G2}G_{G1} + P_{G2}P_{G1}G_{G0} + P_{G2}P_{G1}P_{G0}C_{0}$$
 (4-16)

 $C_{16}$ , which is a final carry of 16-bit CLA, will be

$$C_{16} = GG + PG C_0 (4-17)$$

One can derive the propagate (PG) and generate (GG) equation for block carry look-ahead logic in a manner similar to equation 4-12 and 4-13.

Figure 4-10 illustrates the Verilog description of a 4-bit carry look-ahead adder.

FIGURE 4-10: Verilog Description of a 4-Bit Carry Look-Ahead Adder

```
module CLA4 (A, B, Ci, S, Co, PG, GG);
   input[3:0] A;
   input[3:0] B;
   input Ci;
   output[3:0] S;
   output Co:
   output PG;
   output GG;
   wire[3:0] G;
   wire[3:0] P;
   wire[3:1] C;
   CLALogic CarryLogic (G, P, Ci, C, Co, PG, GG);
   GPFullAdder FA0 (A[0], B[0], Ci, G[0], P[0], S[0]);
   GPFullAdder FA1 (A[1], B[1], C[1], G[1], P[1], S[1]);
   GPFullAdder FA2 (A[2], B[2], C[2], G[2], P[2], S[2]);
   GPFullAdder FA3 (A[3], B[3], C[3], G[3], P[3], S[3]);
endmodule
```

```
module CLALogic (G, P, Ci, C, Co, PG, GG);
                       input[3:0] G;
                       input[3:0] P;
                       input Ci;
                       output[3:1] C;
                       output Co:
                      output PG;
                       output GG;
                       wire GG int:
                       wire PG_int;
                       assign C[1] = G[0] | (P[0] & Ci) ;
                       assign C[2] = G[1] | (P[1] & G[0])
                                                                                                                                                                                                                                                                     | (P[1] & P[0] & Ci);
                       assign C[3] = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & F[2] & F
                                                                                                                      P[0] & Ci);
                       assign PG_{int} = P[3] & P[2] & P[1] & P[0] ;
                       assign GG_{int} = G[3] \mid (P[3] \& G[2]) \mid (P[3] \& P[2] \& G[1]) \mid (P[3] \& P[2] \& P[2] \& G[1]) \mid (P[3] \& P[2] \& P[2]
                                                                                                                                    P[1] \& G[0]);
                       assign Co = GG_int | (PG_int & Ci) ;
                       assign PG = PG_int ;
                       assign GG = GG int ;
 endmodule
module GPFullAdder (X, Y, Cin, G, P, Sum);
                       input X;
                       input Y;
                       input Cin;
                       output G:
                       output P;
                       output Sum;
                      wire P_int;
                       assign G = X \& Y:
                       assign P = P_int ;
                       assign P_{int} = X \wedge Y;
                       assign Sum = P_int ^ Cin ;
 endmodule
```

Verilog code for a 16-bit carry look-ahead adder can be developed by instantiating four copies of the 4-bit carry look-ahead adder and one additional copy of the carry look-ahead logic. A 64-bit adder can be built by one more level of block carry look-ahead logic. The delay increases by only two gate delays when the adder size increases from 16 bits to 64 bits. Developing Verilog code for 16- and 64-bit carry look-ahead logic is left as subjects of exercise problems.

Figure 4-11 illustrates behavioral Verilog code for a 32-bit adder using the "+" operator. If this code is synthesized, depending on the tools used and the target technology, an adder with characteristics in between a ripple-carry adder and a fast 2-level adder will be obtained. The various topologies result in different area, power, and delay characteristics.

```
module Adder32 (A, B, Ci, S, Co);

input[31:0] A;
input[31:0] B;
input Ci;
output[31:0] S;
output Co;

wire[32:0] Sum33;

assign Sum33 = A + B + Ci;
assign S = Sum33[31:0];
assign Co = Sum33[32];
endmodule
```

## Example

If gate delays are  $t_g$ , what is the delay of the fastest 32-bit adder? Assume that the amount of hardware consumed is not a constraint. Only speed is important.

**Answer:** One can express each sum bit of a 32-bit adder as a sum-of-products expression of the input bits. There will be 33 such equations, including one for the carry out bit. These equations will be very long, and some of them could include 60+ variables in the product term. Nevertheless, if gates with any number of inputs are available, theoretically a 2-level adder can be made. Although it is not very practical, theoretically, the delay of the fastest adder will be  $2t_g$  if gate delays are  $t_g$ .

## Example

Is ripple-carry adder the smallest 32-bit adder?

**Answer:** A 32-bit ripple-carry adder uses 32 1-bit adders. One could design a 32-bit serial adder using a single 1-bit full adder. The input numbers are shifted into the adder, one bit at a time, and carry output from addition of each pair of bits is saved in a flip-flop and fed back to the next addition. The hardware illustrated in Figure 4-12 accomplishes this. The delay

FIGURE 4-12: A 32-Bit Serial Adder Built from a Single 1-Bit Adder



of adder will be 32 \*  $(2t_p + t_{ff})$ , where  $2t_p$  is the delay of the one-bit full adder and  $t_{ff}$  is the delay of the flip-flop (including set up time). If a flip-flop delay is at least two gate delays, the delay of the 32-bit serial adder will be at least 128t<sub>p</sub>. The adder hardware is simple; however, there is also the control circuitry needed to generate 32 shift signals. The registers storing the operands must have shift capability as well.

Even if you write Verilog code based on data flow equations, as shown in Figure 4-10, that does not guarantee that the synthesizer will produce a carry look-ahead adder with the delay characteristics we have discussed. The software might optimize the synthesis output depending on the specific hardware components available in the target technology. For instance, if you are using an FPGA with fast adder support, the software may map some of the functions into the fast adder circuitry. Depending on the number of FPGA logic blocks and interconnects used, the delays will be different from the manual calculations. The delays of a ripple-carry, carry look-ahead, and serial adder for a gate-based implementation are presented in Table 4-1 for various adder sizes. One can see that the carry look-ahead adder is very attractive for large adders.

TABLE 4-1: Comparison of Ripple-Carry and Carry Look-Ahead Adders

| Adder size | Ripple-carry<br>adder delay | CLA delay           | Serial adder<br>delay |
|------------|-----------------------------|---------------------|-----------------------|
| 4 bit      | 8 t <sub>g</sub>            | 5-6 t <sub>g</sub>  | 16 t <sub>g</sub>     |
| 16 bit     | 32 t <sub>g</sub>           | 7-8 t <sub>g</sub>  | 64 t <sub>g</sub>     |
| 32 bit     | 64 t <sub>g</sub>           | 9-10 t <sub>g</sub> | 128 t <sub>g</sub>    |
| 64 bit     | 128 t <sub>a</sub>          | 9-10 t <sub>a</sub> | 256 t <sub>a</sub>    |

# **Traffic Light Controller**

Let us design a sequential traffic light controller for the intersection of street "A" and street "B." Each street has traffic sensors, which detect the presence of vehicles approaching or stopped at the intersection. Sa = 1 means a vehicle is approaching on street "A," and Sb = 1 means a vehicle is approaching on street "B." Street "A" is a main street and has a green light until a car approaches on "B." Then the lights change, and "B" has a green light. At the end of 50 seconds, the lights change back unless there is a car on street "B" and none on "A," in which case the "B" cycle is extended for 10 additional seconds. If cars continue to arrive on street "B" and no car appears on street "A," "B" continues to have a green light. When "A" is green, it remains green at least 60 seconds, and then the lights change only when a car approaches on "B." Figure 4-13 shows the external connections to the controller. Three of the outputs (Ga, Ya, and Ra) drive the green, yellow, and red lights on FIGURE 4-24: Single Pulser and Synchronizer Circuit



assume the state assignments are  $S_0 = 0$  and  $S_1 = 1$ . In such a case, the Q output of the second flip-flop is synonymous with  $S_1$ , and the Q' output of the second flip-flop is synonymous with  $S_0$ . The equation for the single pulse SP is

$$SP = S_0 \cdot SYNCPRESS$$

It may also be noted that  $S_0 = S_1$ '. Including the 2 flip-flops inside the synchronizing block, three flip-flops can provide debouncing, synchronization, and single-pulsing. If button pushes can be passed through such a circuit, a single pulse that is debounced and synchronized, with respect to the system clock, can be obtained. It is a good practice to feed external push-button signals through such a circuit in order to obtain controlled and predictable operation.

# 4.8 A Shift-and-Add Multiplier

In this section, we will design a multiplier for unsigned binary numbers. When we form the product  $A \boxtimes B$ , the first operand (A) is called the *multiplicand* and the second operand (B) is called the *multiplier*. As illustrated here, binary multiplication requires only shifting and adding. In the following example, we multiply  $13_{10}$  by  $11_{10}$  in binary



Note that each partial product is either the multiplicand (1101) shifted over by the appropriate number of places or zero. Instead of forming all the partial products first and then adding, each new partial product is added in as soon as it is formed, which eliminates the need for adding more than two binary numbers at a time.

Multiplication of two 4-bit numbers requires a 4-bit multiplicand register, a 4-bit multiplier register, a 4-bit full adder, and an 8-bit register for the product. The product register serves as an accumulator to accumulate the sum of the partial products. If the multiplicand were shifted left each time before it was added to the accumulator, as was done in the previous example, an 8-bit adder would be needed. Therefore, it is better to shift the contents of the product register to the right each time, as shown in the block diagram of Figure 4-25.

FIGURE 4-25: Block Diagram for Binary Multiplier



This type of multiplier is sometimes referred to as a serial-parallel multiplier, since the multiplier bits are processed serially, but the addition takes place in parallel. As indicated by the arrows on the diagram, 4 bits from the accumulator (ACC) and 4 bits from the multiplicand register are connected to the adder inputs; the 4 sum bits and the carry output from the adder are connected back to the accumulator. When an add signal (Ad) occurs, the adder outputs are transferred to the accumulator by the next clock pulse, thus causing the multiplicand to be added to the accumulator. An extra bit at the left end of the product register temporarily stores any carry that is generated when the multiplicand is added to the accumulator. When a shift signal (Sh) occurs, all 9 bits of ACC are shifted right by the next clock pulse.

Since the lower 4 bits of the product register are initially unused, we will store the multiplier in this location instead of in a separate register. As each multiplier bit is used, it is shifted out the right end of the register to make room for additional product bits. A shift signal (Sh) causes the contents of the product register (including the multiplier) to be shifted right one place when the next clock pulse occurs. The control circuit puts out the proper sequence of add and shift signals after a start signal (St = 1) has been received. If the current multiplier bit (M) is 1, the multiplicand is added to the accumulator followed by a right shift; if the multiplier bit is 0, the addition is skipped and only the right shift occurs. The multiplication example  $(13 \boxtimes 11)$  is reworked as follows showing the location of the bits in the registers at each clock time:



The control circuit must be designed to output the proper sequence of add and shift signals. Figure 4-26 shows a state graph for the control circuit. In Figure 4-26,  $S_0$  is the reset state, and the circuit stays in  $S_0$  until a start signal (St = 1) is received. This generates a Load signal, which causes the multiplier to be loaded into the lower 4 bits of the accumulator (ACC) and the upper 5 bits of the accumulator to be cleared. In state  $S_1$ , the low-order bit of the multiplier (M) is tested. If M=1, an add signal is generated, and if M=0, a shift signal is generated. Similarly, in states  $S_2$ ,  $S_5$ , and  $S_7$ , the current multiplier bit (M) is tested to determine whether to generate an add or shift signal. A shift signal is always generated at the next clock time following an add signal (states  $S_2$ ,  $S_4$ ,  $S_6$ , and  $S_8$ ). After four shifts have been generated, the control network goes to  $S_0$  and a done signal is generated before returning to  $S_0$ .

FIGURE 4-26: State Graph for Binary Multiplier Control



The behavioral Verilog model (Figure 4-27) corresponds directly to the state graph. Since there are 10 states, we have declared an integer ranging from 0 to 9 for the state signal. The signal ACC represents the 9-bit accumulator output. The statement

#### `define M ACC[0]

allows us to use the name M in place of ACC(0). The notation 1, 3, 5, 7: means when the state is 1 or 3 or 5 or 7, the action that follows occurs. All register operations and state changes take place on the rising edge of the clock. For example, in state 0, if St is 1, the multiplier is loaded into the accumulator at the same time the state changes to 1. The expression {1'b0, ACC[7:4]} + Mcand is used to compute the sum of two 4-bit unsigned vectors to give a 5-bit result. This represents the adder output, which is loaded into ACC at the same time the state counter is incremented. The right shift on ACC is accomplished by loading ACC with 0 concatenated with the upper 8 bits of ACC. The expression {1'b0, ACC[8:1]} could be replaced with ACC >> 1.

```
// This is a behavioral model of a multiplier for unsigned
// binary numbers. It multiplies a 4-bit multiplicand
// by a 4-bit multiplier to give an 8-bit product.
// The maximum number of clock cycles needed for a
// multiply is 10.
`define M ACC[0]
module mult4X4 (Clk, St, Mplier, Mcand, Done, Result);
   input Clk;
   input St;
   input[3:0] Mplier;
   input[3:0] Mcand;
   output Done;
   output[7:0] Result;
   reg[3:0] State;
   reg[8:0] ACC;
   initial
   begin
       State = 0;
      ACC
            = 0;
   end
   always @(posedge Clk)
   begin
          case (State)
             0:
                        begin
                           if (St ==1'b1)
                           begin
                              ACC[8:4] <= 5'b00000;
                              ACC[3:0] <= Mplier ;
                              State <= 1 :
                           end
                        end
             1, 3, 5, 7:
                        begin
                         if (`M == 1'b1)
                         begin
                            ACC[8:4] \leftarrow \{1'b0, ACC[7:4]\} + Mcand;
                              State <= State + 1 ;</pre>
                           end
                           else
                           begin
                             ACC <= \{1'b0, ACC[8:1]\};
```

```
State <= State + 2 ;
                            end
                         end
              2, 4, 6, 8:
                         begin
                            ACC \leftarrow \{1'b0, ACC[8:1]\};
                            State <= State + 1;
                         end
              9:
                         begin
                            State \leftarrow 0;
                         end
          endcase
   end
   assign Done = (State == 9) ? 1'b1 : 1'b0 ;
   assign Result = (State == 9) ? ACC[7:0] : 8'b01010101 ;
endmodule
```

The Done signal should be turned on only in state 9. If we had used the statement State <= 0; Done <= '1'; for the behavior of State = 9, Done would be turned on at the same time State changes to 0. This is too late, since we want Done to turn on when State becomes 9. Therefore, we used a separate concurrent assignment statement. This statement is placed outside the process so that *Done* will be updated whenever *State* changes.

As the state graph for the multiplier indicates, the control performs two functions—generating add or shift signals as needed and counting the number of shifts. If the number of bits is large, it is convenient to divide the control circuit into a counter and an add-shift control, as shown in Figure 4-28(a). First, we will derive a state graph for the add-shift control that tests St and M and outputs the proper sequence of add and shift signals (Figure 4-28(b)). Then we will add a completion signal (K) from the counter that stops the multiplier after the proper number of shifts have been completed. Starting in  $S_0$  in Figure 4-28(b), when a start signal St = 1 is received, a load signal is generated and the circuit goes to state  $S_1$ . Then if M=1, an add signal is generated and the circuit goes to state  $S_2$ ; if M=0, a shift signal is generated and the circuit stays in  $S_1$ . In  $S_2$ , a shift signal is generated since a shift always follows an add. The graph of Figure 4-28(b) will generate the proper sequence of add and shift signals, but it has no provision for stopping the multiplier.

To determine when the multiplication is completed, the counter is incremented each time a shift signal is generated. If the multiplier is n bits, n shifts are required. We will design the counter so that a completion signal (K) is generated after n-1shifts have occurred. When K = 1, the circuit should perform one more addition, if necessary, and then do the final shift. The control operation in Figure 4-28(c) is the same as Figure 4-28(b) as long as K = 0. In state  $S_1$ , if K = 1, we test M as usual. If M = 0, we output the final shift signal and go to the done state  $(S_3)$ ; however, if M = 1, we add before shifting and go to state  $S_2$ . In state  $S_2$ , if K = 1, we output one

FIGURE 4-28: Multiplier Control with Counter



(c) Final state graph for add-shift control

more shift signal and then go to  $S_3$ . The last shift signal will increment the counter to 0 at the same time the add-shift control goes to the done state.

As an example, consider the multiplier of Figure 4-25, but replace the control circuit with Figure 4-28(a). Since n=4, a 2-bit counter is needed to count the four shifts, and K=1 when the counter is in state 3 (11<sub>2</sub>). Table 4-2 shows the operation of the multiplier when 1101 is multiplied by 1011.  $S_0$ ,  $S_1$ ,  $S_2$ , and  $S_3$  represent states of the control circuit (Figure 4-28(c)). The contents of the product register at each step are the same as given on page 233 of this chapter.

TABLE 4-2: Operation of Multiplier Using a Counter

| Time    | State                 | Counter | Product<br>Register | St | М | K | Load | Ad | Sh | Done |
|---------|-----------------------|---------|---------------------|----|---|---|------|----|----|------|
| $t_0$   | S <sub>0</sub>        | 00      | 000000000           | 0  | 0 | 0 | 0    | 0  | 0  | 0    |
| $t_1$   | $S_0$                 | 00      | 000000000           | 1  | 0 | 0 | 1    | 0  | 0  | 0    |
| $t_2$   | S <sub>1</sub>        | 00      | 000001011           | 0  | 1 | 0 | 0    | 1  | 0  | 0    |
| $t_3$   | $S_2$                 | 00      | 011011011           | 0  | 1 | 0 | 0    | 0  | 1  | 0    |
| $t_4$   | S <sub>1</sub>        | 01      | 001101101           | 0  | 1 | 0 | 0    | 1  | 0  | 0    |
| $t_{5}$ | $S_2$                 | 01      | 100111101           | 0  | 1 | 0 | 0    | 0  | 1  | 0    |
| $t_6$   | <b>S</b> <sub>1</sub> | 10      | 010011110           | 0  | 0 | 0 | 0    | 0  | 1  | 0    |
| $t_7$   | S <sub>1</sub>        | 11      | 001001111           | 0  | 1 | 1 | 0    | 1  | 0  | 0    |
| $t_8$   | $S_2$                 | 11      | 100011111           | 0  | 1 | 1 | 0    | 0  | 1  | 0    |
| $t_9$   | $S_3$                 | 00      | 010001111           | 0  | 1 | 0 | 0    | 0  | 0  | 1    |

At time  $t_0$ , the control is reset and waits for a start signal. At time  $t_1$ , the start signal St is 1, and a Load signal is generated. At time  $t_2$ , M=1, so an Ad signal is generated. When the next clock occurs, the output of the adder is loaded into the

accumulator and the control goes to  $S_2$ . At  $t_3$ , an Sh signal is generated, so at the next clock shifting occurs and the counter is incremented. At  $t_4$ , M = 1 so Ad = 1, and the adder output is loaded into the accumulator at the next clock. At  $t_5$  and  $t_6$ , shifting and counting occur. At  $t_7$ , three shifts have occurred and the counter state is 11, so K = 1. Since M = 1, addition occurs and control goes to  $S_2$ . At  $t_8$ , Sh = K = 1, so at the next clock the final shift occurs and the counter is incremented back to state 00. At  $t_0$ , a *Done* signal is generated.

The multiplier design given here can easily be expanded to 8, 16, or more bits simply by increasing the register size and the number of bits in the counter. The add-shift control would remain unchanged.

### **Array Multiplier** 4.9

An array multiplier is a parallel multiplier that generates the partial products in a parallel fashion. The various partial products are added as soon as they are available. Consider the process of multiplication as illustrated in Table 4-3. Two 4-bit unsigned numbers,  $X_3X_2X_1X_0$  and  $Y_3Y_2Y_1Y_0$ , are multiplied to generate a product that is possibly 8 bits. Each of the  $X_iY_i$  product bits can be generated by an AND gate. Each partial product can be added to the previous sum of partial products using a row of adders. The sum output of the first row of adders, which adds the first two partial products, is  $S_{13} S_{12} S_{11} S_{10}$ , and the carry output is  $C_{13} C_{12} C_{11} C_{10}$ . Similar results occur for the other two rows of adders. (We have used the notation  $S_{ii}$  and  $C_{ii}$  to represent the sums and carries from the *i*th row of adders.)

**TABLE 4-3: 4-Bit** Multiplier Partial **Products** 

|                 |                 |                 |                 | $X_3$           | $X_2$           | $X_1$           | $X_0$    | Multiplicand      |
|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|----------|-------------------|
|                 |                 |                 |                 | $Y_3$           | $Y_2$           | $Y_1$           | $Y_0$    | Multiplier        |
|                 |                 |                 |                 | $X_3Y_0$        | $X_2Y_0$        | $X_1Y_0$        | $X_0Y_0$ | partial product 0 |
|                 |                 |                 | $X_3Y_1$        | $X_2Y_1$        | $X_1Y_1$        | $X_0Y_1$        |          | partial product 1 |
|                 |                 |                 | C <sub>12</sub> | C <sub>11</sub> | C <sub>10</sub> |                 | _        | 1st row carries   |
|                 |                 | C <sub>13</sub> | S <sub>13</sub> | S <sub>12</sub> | S <sub>11</sub> | S <sub>10</sub> |          | 1st row sums      |
|                 |                 | $X_3Y_2$        | $X_2Y_2$        | $X_1Y_2$        |                 |                 |          | partial product 2 |
|                 |                 | C <sub>22</sub> | C <sub>21</sub> | C <sub>20</sub> |                 | _               |          | 2nd row carries   |
|                 | $C_{23}$        | S <sub>23</sub> | S <sub>22</sub> | S <sub>21</sub> | S <sub>20</sub> |                 |          | 2nd row sums      |
|                 | $X_3Y_3$        | $X_2Y_3$        | $X_1Y_3$        | $X_0Y_3$        |                 |                 |          | partial product 3 |
|                 | C <sub>32</sub> | C <sub>31</sub> | C <sub>30</sub> |                 | _               |                 |          | 3rd row carries   |
| C <sub>33</sub> | S <sub>33</sub> | S <sub>32</sub> | S <sub>31</sub> | S <sub>30</sub> |                 |                 |          | 3rd row sums      |
| $P_7$           | $P_6$           | $P_5$           | $P_4$           | $P_3$           | $P_2$           | $P_1$           | $P_0$    | final product     |

Figure 4-29 shows the array of AND gates and adders to perform this multiplication. If an adder has three inputs, a full adder (FA) is used, but if an adder has only two inputs, a half-adder (HA) is used. A half-adder is the same as a full adder with one of the inputs set to 0. This multiplier requires 16 AND gates, 8 full adders, FIGURE 4-29: Block
Diagram of 4 

4 Array
Multiplier



and 4 half-adders. After the X and Y inputs have been applied, the carry must propagate along each row of cells, and the sum must propagate from row to row. The time required to complete the multiplication depends primarily on the propagation delay in the adders. The longest path from input to output goes through 8 adders. If  $t_{ad}$  is the worst-case (longest possible) delay through an adder, and  $t_g$  is the longest AND gate delay, then the worst-case time to complete the multiplication is  $8t_{ad} + t_g$ .

In general, an *n*-bit-by-*n*-bit array multiplier would require  $n^2$  AND gates, n(n-2) full adders, and *n* half-adders. So the number of components required increases quadratically. For the serial-parallel multiplier previously designed, the amount of hardware required in addition to the control circuit increases linearly with *n*.

For an  $n \boxtimes n$  array multiplier, the longest path from input to output goes through n adders in the top row, n-I adders in the bottom row and n-3 adders in the middle rows. The corresponding worst-case multiply time is  $(3n-4)t_{ad}+t_g$ . The longest delay in a circuit is called a critical path. The worst-case delay can be improved to  $2nt_{ad}+t_g$  by forwarding carry from each adder to the diagonally lower adder rather than the adder on the left side. When n=4, both expressions are the same; however, for larger values of n, it is beneficial to pass carry diagonally as opposed to rippling it to the left. One may note that this multiplier has no sequential logic or registers.

The shift-and-add multiplier that we previously designed requires 2n clocks to complete the multiply in the worst case, although this can be reduced to n clocks using a technique discussed in the following section. The minimum clock period depends on the propagation delay through the n-bit adder as well as the propagation delay and setup time for the accumulator flip-flops.

#### Verilog Coding

If the topology has to be exactly what the designer wants, one needs to do structural coding. If one made a behavioral model of a multiplier without specifying the

topology, the topology generated by the synthesizer will depend on the synthesis tool. Here, we present a structural model for an array multiplier in Figure 4-30. Full-adder and half-adder modules are created and used as components for the array multiplier. The full adders and half adders are interconnected according to the array multiplier topology. Several instantiation statements are used for this purpose.

FIGURE 4-30: Verilog Code for 4 

4 Array Multiplier

```
module Array_Mult (X, Y, P);
   input[3:0] X;
   input[3:0] Y;
   output[7:0] P;
   wire[3:0] C1;
   wire[3:0] C2:
   wire[3:0] C3:
   wire[3:0] S1;
   wire[3:0] S2;
   wire[3:0] S3;
   wire[3:0] XY0;
   wire[3:0] XY1;
   wire[3:0] XY2;
   wire[3:0] XY3;
   assign XY0[0] = X[0] & Y[0];
   assign XY1[0] = X[0] & Y[1]
   assign XY0[1] = X[1] & Y[0]
   assign XY1[1] = X[1] & Y[1]
   assign XY0[2] = X[2] & Y[0]
   assign XY1[2] = X[2] & Y[1]
   assign XY0[3] = X[3] & Y[0]
   assign XY1[3] = X[3] & Y[1]
   assign XY2[0] = X[0] & Y[2]
   assign XY3[0] = X[0] & Y[3]
   assign XY2[1] = X[1] & Y[2]
   assign XY3[1] = X[1] & Y[3]
   assign XY2[2] = X[2] & Y[2]
   assign XY3[2] = X[2] & Y[3]
   assign XY2[3] = X[3] & Y[2]
   assign XY3[3] = X[3] & Y[3];
   FullAdder FA1 (XY0[2], XY1[1], C1[0], C1[1], S1[1]);
   FullAdder FA2 (XY0[3], XY1[2], C1[1], C1[2], S1[2]);
   FullAdder FA3 (S1[2], XY2[1], C2[0], C2[1], S2[1]);
   FullAdder FA4 (S1[3], XY2[2], C2[1], C2[2], S2[2]);
   FullAdder FA5 (C1[3], XY2[3], C2[2], C2[3], S2[3]);
   FullAdder FA6 (S2[2], XY3[1], C3[0], C3[1], S3[1]);
   FullAdder FA7 (S2[3], XY3[2], C3[1], C3[2], S3[2]);
   FullAdder FA8 (C2[3], XY3[3], C3[2], C3[3], S3[3]);
   HalfAdder HA1 (XY0[1], XY1[0], C1[0], S1[0]);
```

```
HalfAdder HA2 (XY1[3], C1[2], C1[3], S1[3]);
   HalfAdder HA3 (S1[1], XY2[0], C2[0], S2[0]);
   HalfAdder HA4 (S2[1], XY3[0], C3[0], S3[0]);
   assign P[0] = XY0[0];
   assign P[1] = S1[0];
   assign P[2] = S2[0];
   assign P[3] = S3[0]
   assign P[4] = S3[1];
   assign P[5] = S3[2];
   assign P[6] = S3[3];
   assign P[7] = C3[3];
endmodule
// Full Adder and half adder modules
// should be in the project
module FullAdder (X, Y, Cin, Cout, Sum);
   input X;
   input Y;
   input Cin;
   output Cout;
   output Sum;
   assign Sum = X ^ Y ^ Cin ;
   assign Cout = (X \& Y) | (X \& Cin) | (Y \& Cin) ;
endmodule
module HalfAdder (X, Y, Cout, Sum);
   input X:
   input Y;
   output Cout;
   output Sum;
   assign Sum = X ^ Y;
   assign Cout = X \& Y;
endmodule
```

## **4.10** A Signed Integer/Fraction Multiplier

Several algorithms are available for multiplication of signed binary numbers. The following procedure is a straightforward way to carry out the multiplication:

- 1. Complement the multiplier if negative.
- 2. Complement the multiplicand if negative.
- **3.** Multiply the two positive binary numbers.
- **4.** Complement the product if it should be negative.

Although this method is conceptually simple, it requires more hardware and computation time than some of the other available methods.

The next method we describe requires only the ability to complement the multiplicand. Complementation of the multiplier or product is not necessary. Although the method works equally well with integers or fractions, we illustrate the method with fractions, since we will later use this multiplier as part of a multiplier for floating-point numbers. Using 2's complement for negative numbers, we will represent signed binary fractions in the following form:

> 0.101 +5/81.011 -5/8

The digit to the left of the binary point is the sign bit, which is 0 for positive fractions and 1 for negative fractions. In general, the 2's complement of a binary fraction F is  $F^* = 2 - F$ . Thus, -5/8 is represented by 10.000 - 0.101 = 1.011. (This method of defining 2's complement fractions is consistent with the integer case  $(N^* = 2^n -$ N), since moving the binary point n-1 places to the left is equivalent to dividing by  $2^{n-1}$ .) The 2's complement of a fraction can be found by starting at the right end and complementing all the digits to the left of the first 1, the same as for the integer case. The 2's complement fraction 1.000 . . . is a special case. It actually represents the number -1, since the sign bit is negative and the 2's complement of 1.000... is 2-1=1. We cannot represent +1 in this 2's complement fraction system, since 0.111 . . . is the largest positive fraction.

#### **Binary Fixed Point Fractions**

Fixed point numbers are number formats in which the decimal or binary point is at a fixed location. One can have a fixed-point 8-bit number format where the binary point is assumed to be after 4 bits (i.e., 4 bits for the fractional part and 4 bits for the integral part). If the binary point is assumed to be located 2 more bits to the right, there will be 6 bits for the integral part and 2 bits for the fraction. The range and precision of the numbers that can be represented in the different formats depend on the location of the binary point. For instance, if there are 4 bits for the fractional part and 4 bits for the integer, the range, assuming unsigned numbers, is 0.00 to 15.925. If only 2 bits are allowed for the fractional part and 6 bits for the integer, the range increases but the precision reduces. Now, the range would be 0.00 to 63.75, but the fractional part can be specified only as a multiple of 0.25.

Let us say we need to represent -13.45 in a 2's complement fixed-point number representation with 4 fractional bits. To convert any decimal fraction into the binary fraction, one technique is to repeatedly multiply the fractional part (only the fractional part in each intermediate step) by 2. So, starting with 0.45, the repeated multiplication results in

0.90

1.80

**1**.60

| <b>1</b> .20 |  |
|--------------|--|
| 0.40         |  |
| 0.80         |  |
| <b>1</b> .60 |  |
| <b>1</b> .20 |  |

Now, the binary representation can be obtained by considering the digits in bold. An appropriate representation can be obtained depending on the number of bits available (e.g., 0111 if 4 bits are available, 01110011 if 8 bits are available, and so on). The representation for decimal number 13.45 in the fixed point format with 4 binary places will be as follows:

> 13.45: 1101.0111

One may note that the represented number is only an approximation of the actual number. The represented number can be converted back to decimal and seen to be 13.4375 (slightly off from the number we started with). The representation approaches the actual number as more and more binary places are added to the representation.

Negative fractions can be represented in 2's complement form. Let us represent -13.45 in 2's complement form. This cannot be done if we have only four places for the integer. We need to have at least 5 bits for the integer in order to handle the sign. Assuming 5 bits are available for the integer, in a 9 bit format:

| 13.45:         | 01101.0111 |
|----------------|------------|
| 1's complement | 10010.1000 |
| 2's complement | 10010.1001 |

Hence, -13.45 = 10010.1001 in this representation.

When multiplying signed binary numbers, we must consider four cases:

| Multiplicand | Multiplier |
|--------------|------------|
| +            | +          |
| _            | +          |
| +            | _          |
| _            | _          |

When both the multiplicand and the multiplier are positive, standard binary multiplication is used. For example,

| 0.1 1 1                                          | (+7/8)                         | ←          | Multiplicand                                                                                                                                                                                                         |
|--------------------------------------------------|--------------------------------|------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| × 0.1 0 1                                        | (+5/8)                         | ←          | Multiplier                                                                                                                                                                                                           |
| (0. 0 0)0 1 1 1<br>(0.)0 1 1 1<br>0. 1 0 0 0 1 1 | (+7/64)<br>(+7/16)<br>(+35/64) | <b>← ←</b> | <i>Note</i> : The proper representation of the fractional partial products requires extension of the sign bit past the binary point, as indicated in parentheses. (Such extension is not necessary in the hardware.) |

When the multiplicand is negative and the multiplier is positive, the procedure is the same as in the previous case, except that we must extend the sign bit of the multiplicand so that the partial products and final product will have the proper negative sign. For example,

When the multiplier is negative and the multiplicand is positive, we must make a slight change in the multiplication procedure. A negative fraction of the form 1.g has a numeric value -1 + 0.g; for example, 1.011 = -1 + 0.011 = -(1 - 0.011) = -0.101 = -5/8. Thus, when multiplying by a negative fraction of the form 1.g, we treat the fraction part (.g) as a positive fraction, but the sign bit is treated as -1. Hence, multiplication proceeds in the normal way as we multiply by each bit of the fraction and accumulate the partial products. However, when we reach the negative sign bit, we must add in the 2's complement of the multiplicand instead of the multiplicand itself. The following example illustrates this:

When both the multiplicand and the multiplier are negative, the procedure is the same as before. At each step, we must be careful to extend the sign bit of the partial product to preserve the proper negative sign, and at the final step we must add in the 2's complement of the multiplicand, since the sign bit of the multiplier is negative. For example,

| 1.1 0 1          | (-3/8)  |                                 |    |
|------------------|---------|---------------------------------|----|
| × 1.1 0 1        | (-3/8)  |                                 |    |
| (1. 1 1) 1 1 0 1 | (-3/64) | ← <i>Note</i> : Extend sign bit |    |
| (1.)1 1 01       | (-3/16) |                                 |    |
| 1.11 0001        |         |                                 |    |
| 0. 0 1 1         | (+3/8)  | ← Add the 2's complement of the | he |
| 0.001001         | (+9/64) | multiplicand.                   |    |

In summary, the procedure for multiplying signed 2's complement binary fractions is the same as for multiplying positive binary fractions, except that we must be careful to preserve the sign of the partial product at each step, and if the sign of the multiplier is negative, we must complement the multiplicand before adding it in at the last step. The hardware is almost identical to that used for multiplication of positive numbers, except a complementer must be added for the multiplicand.

Figure 4-31 shows the hardware required to multiply two 4-bit fractions (including the sign bit). A 5-bit adder is used so the sign of the sum is not lost due to a carry into the sign bit position. The M input to the control circuit is the currently active bit of the multiplier. Control signal Sh causes the accumulator to shift right one place with sign extension. Ad causes the ADDER output to be loaded into the left 5 bits of the accumulator. The carry out from the last bit of the adder is discarded, since we are doing 2's complement addition. Cm causes the multiplicand (Mcand) to be complemented (1's complement) before it enters the adder inputs. Cm is also connected to the carry input of the adder so that when Cm = 1, the adder adds 1 plus the 1's complement of Mcand to the accumulator, which is equivalent to adding the

FIGURE 4-31: Block Diagram for 2's Complement Multiplier



2's complement of Mcand. Figure 4-32 shows a state graph for the control circuit. Each multiplier bit (M) is tested to determine whether to add and shift or whether to just shift. In state  $S_7$ , M is the sign bit, and if M = 1, the complement of the multiplicand is added to the accumulator.

FIGURE 4-32: State Graph for 2's Complement Multiplier



When the hardware in Figure 4-31 is used, the add and shift operations must be done at two separate clock times. We can speed up the operation of the multiplier by moving the wires from the adder output one position to the right (Figure 4-33) so that the adder output is already shifted over one position when it is loaded into the accumulator. With this arrangement, the add and shift operations can occur at the same clock time, which leads to the control state graph of Figure 4-34. When the multiplication is complete, the product (6 bits plus sign) is in the lower 3 bits

FIGURE 4-33: Block Diagram for Faster Multiplier





of A followed by B. The binary point then is in the middle of the A register. If we wanted it between the left two bits, we would have to shift A and B left one place.

A behavioral Verilog model for this multiplier is shown in Figure 4-35. Shifting the *A* and *B* registers together is accomplished by the sequential statements

```
A <= {A[3], A[3:1]};
B <= {A[0], B[3:1]};
```

FIGURE 4-35: Behavioral Model for 2's Complement Multiplier

```
`define M B[0]
module mult2C (CLK, St, Mplier, Mcand, Product, Done);
  input CLK;
  input St;
  input[3:0] Mplier;
  input[3:0] Mcand;
  output[6:0] Product;
  output Done;
  reg[2:0] State;
  reg[3:0] A;
  reg[3:0] B;
  reg[3:0] addout;
  initial
  begin
      State = 0:
  end
  always @(posedge CLK)
  begin
         case (State)
             0:
                       begin
                          if (St == 1'b1)
                          begin
                              A <= 4'b0000;
                              B <= Mplier ;
```

```
State <= 1;
                             end
                             else
                                 State <= 0;
                         end
              1, 2, 3:
                         begin
                             if (`M == 1'b1)
                             begin
                                 addout = A + Mcand;
                                 A <= {Mcand[3], addout[3:1]};
                                 B <= {addout[0], B[3:1]};
                             end
                             else
                             begin
                                 A \leftarrow \{A[3], A[3:1]\};
                                 B \leftarrow \{A[0], B[3:1]\};
                             State <= State + 1;
                         end
              4:
                         begin
                             if (`M == 1'b1)
                             begin
                                 addout = A + \sim Mcand + 1;
                                 A <= {~Mcand[3], addout[3:1]};
                                 B <= {addout[0], B[3:1]} ;</pre>
                             end
                             else
                             begin
                                 A \leftarrow \{A[3], A[3:1]\};
                                 B \leftarrow \{A[0], B[3:1]\};
                             end
                             State <= 5 ;
                         end
              5:
                         begin
                            State \leftarrow 0;
                         end
       default:
                         begin
                             State \leftarrow 0;
                         end
           endcase
           end
   assign Done = (State == 5) ? 1'b1 : 1'b0 ;
   assign Product = \{A[2:0], B\};
endmodule
```

Although these statements are executed sequentially, A and B are both scheduled to be updated at the same delta time. Therefore, the old value of A[0] is used when computing the new value of B.

A register *addout* has been defined to represent the 5-bit output of the adder. In states 1 through 4, if the current multiplier bit M is 1, then the sign bit of the multiplicand followed by 3 bits of addout are loaded into A. At the same time, the low-order bit of *addout* is loaded into B along with the high-order 3 bits of B. The *Done* signal is turned on when control goes to state 5, and then the new value of the product is outputted.

Before continuing with the design, we will test the behavioral level Verilog code to make sure that the algorithm is correct and consistent with the hardware block diagram. At early stages of testing, we will want a step-by-step printout to verify the internal operations of the multiplier and to aid in debugging, if required. When we think that the multiplier is functioning properly, then we will only want to look at the final product's output so that we can quickly test a large number of cases.

Figure 4-36 shows the command file and test results for multiplying +5/8 by -3/8. A clock is defined with a 20-ns period. The St signal is turned on at 2 ns and turned off one clock period later. By inspection of the state graph, the multiplication requires six clocks, so the run time is set at 120 ns.

FIGURE 4-36: Command File and Simulation Results for (+5/8 by -3/8)

```
// command file to test signed multiplier
add list CLK St State A B Done Product
force St 1 2, 0 22
force CLK 1 0, 0 10 - repeat 20
// (5/8 * -3/8)
force Mcand 0101
force Mplier 1101
run 120
      delta CLK St State
                               Α
                                     В
                                         Done
                                                Product
 ns
  0
               1
                  0
                         0 0000 0000
                                            0
                                                0000000
         +1
  2
         +0
               1
                  1
                         0 0000 0000
                                            0
                                                0000000
              0
 10
         +0
                  1
                         0 0000 0000
                                            0
                                                0000000
 20
         +1
              1
                  1
                         1 0000 1101
                                            0
                                                0000000
 22
         +0
              1
                  0
                         1 0000 1101
                                            0
                                                0000000
 30
         +0
              0
                  0
                         1 0000 1101
                                            0
                                                0000000
 40
         +1
              1
                  0
                         2 0010 1110
                                            0
                                                0000000
 50
         +0
              0
                  0
                         2 0010 1110
                                            0
                                                0000000
                         3 0001 0111
 60
         +1
              1
                  0
                                            0
                                                0000000
 70
              0
                  0
                         3 0001 0111
                                            0
                                                0000000
         +0
 80
               1
                  0
                         4 0011 0011
                                            0
         +1
                                                0000000
 90
               0
                         4 0011 0011
                                            0
         +0
                  0
                                                0000000
100
               1
                         5 1111 0001
                                            1
         +2
                  0
                                                1110001
110
         +0
               0
                  0
                          5 1111 0001
                                            1
                                                1110001
120
               1
                  0
                         0 1111 0001
                                            0
                                                1110001
         +1
```

To thoroughly test the multiplier, we need to test not only the four standard cases (++,+-,-+, and --) but also special cases and limiting cases. Test values for the multiplicand and multiplier should include 0, the largest positive fraction, the most negative fraction, and all 1s. We will write a Verilog test bench to test the multiplier. The **test bench** will provide a sequence of values for the multiplicand and the multiplier. Thus, it provides stimuli to the system under test, the multiplier. The test bench can also check for the correctness of the multiplier output. The multiplier we are testing will be treated as a component and embedded in the test bench program. The signals generated within the test bench are interfaced to the multiplier as shown in Figure 4-37.

FIGURE 4-37: Interface between Multiplier and Its Test Bench



Figure 4-38 shows the Verilog code for the multiplier test bench. The test sequence consists of 11 sets of multiplicands and multipliers, provided in the Mcandarr and Mplierarr arrays. The expected outputs from the multiplier

FIGURE 4-38: Test Bench for Signed Multiplier

```
module testmult ();
   parameter N = 11;
   reg[3:0] Mcandarr[1:N];
   req[3:0] Mplierarr[1:N];
   reg[6:0] Productarr[1:N];
   reg CLK;
   req St:
   wire Done;
   reg[3:0] Mplier;
   reg[3:0] Mcand;
   wire[6:0] Product;
   integer i;
   initial
   beain
      CLK = 1'b1;
      Mcandarr[1] = 4'b0111;
      Mcandarr[2] = 4'b1101;
      Mcandarr[3] = 4'b0101;
      Mcandarr[4] = 4'b1101;
      Mcandarr[5] = 4'b0111;
      Mcandarr[6] = 4'b1000;
```

```
Mcandarr[7] = 4'b0111;
       Mcandarr[8] = 4'b1000:
       Mcandarr[9] = 4'b0000;
       Mcandarr[10] = 4'b1111;
       Mcandarr[11] = 4'b1011;
       Mplierarr[1] = 4'b0101;
       Mplierarr[2] = 4'b0101;
       Mplierarr[3] = 4'b1101;
       Mplierarr[4] = 4'b1101;
       Mplierarr[5] = 4'b0111;
       Mplierarr[6] = 4'b0111;
       Mplierarr[7] = 4'b1000;
       Mplierarr[8] = 4'b1000;
       Mplierarr[9] = 4'b1101;
       Mplierarr[10] = 4'b1111;
       Mplierarr[11] = 4'b0000;
       Productarr[1] = 7'b0100011;
       Productarr[2] = 7'b1110001;
       Productarr[3] = 7'b1110001;
       Productarr[4] = 7'b0001001;
       Productarr[5] = 7'b0110001;
       Productarr[6] = 7'b1001000;
       Productarr[7] = 7'b1001000;
       Productarr[8] = 7'b1000000;
       Productarr[9] = 7'b00000000;
       Productarr[10] = 7'b0000001;
       Productarr[11] = 7'b00000000;
   end
   always
   begin
       #10 CLK <= ~CLK ;
   end
   always @(posedge CLK)
   begin
       for(i = 1; i \le N; i = i + 1)
       begin
          Mcand <= Mcandarr[i] ;</pre>
          Mplier <= Mplierarr[i] ;</pre>
          St <= 1'b1 ;
          @(posedge CLK);
          St \ll 1'b0;
          @(negedge Done);
          if (~(Product == Productarr[i])) //compare with expected answer
              $display("Incorrect Product (error)");
       end
       $display("TEST COMPLETED (ERROR)");
   end
   mult2C mult1 (CLK, St, Mplier, Mcand, Product, Done);
endmodule
```

are provided in another array, the *Productarr*, in order to test the correctness of the multiplier outputs. The test values and results are placed in constant arrays in the Verilog code. The multiplier is instantiated and all signals are mapped with the test sequences.

#### mult2C mult1 (CLK, St, Mplier, Mcand, Product, Done);

The tester also generates the clock and start signal. The for loop reads values from the Mcandarr and Mplierarr arrays and then sets the start signal to 1. After the next clock, the start signal is turned off. Then, the test bench waits for the Done signal. When the trailing edge of Done arrives, the multiplier output is compared against the expected output in the array *Productarr*. An error is reported if the answers do not match. Since the *Done* signal is turned off at the same time the multiplier control goes back to  $S_0$ , the process waits for the falling edge of Done before looping back to supply new values of Mcand and Mplier. One may note that the **multiplier instatiation** is outside the always statement which generates the stimulus. The multiplier constantly receives some set of inputs and generates the corresponding set of outputs.

Figure 4-39 shows the command file and simulator output. We have annotated the simulator output to interpret the test results. The -NOtrigger together with the -Trigger done in the list statement causes the output to be displayed only when the *Done* signal changes. Without the -NOtrigger and -Trigger, the output would be displayed every time any signal on the list changed. All the product outputs are correct, except for the special case of  $-1 \boxtimes -1$  (1.000  $\boxtimes$  1.000), which gives 1.000000 (-1) instead of +1. This occurs because no representation of +1 is possible without adding another bit.

FIGURE 4-39: Command File and Simulation of Signed Multiplier

```
// Command file to test results of signed multiplier
add list -NOtrigger Mplier Mcand Product -Trigger Done
run 1320
       delta
              Mplier Mcand Product Done
  ns
   0
          +0
                XXXX XXXX XXXXXX
   0
          +2
                0101 0111 xxxxxxx
 100
          +2
                0101 0111 0100011
                                      1 \ 5/8 * 7/8 = 35/64
 120
          +2
                     1101 0100011
                0101
                     1101 1110001
                                      1 \ 5/8 \ * \ -3/8 = \ -15/64
 220
          +2
                0101
          +2
                1101 0101 1110001
 240
                                      1 - 3/8 * 5/8 = -15/64
 340
          +2
                1101 0101 1110001
 360
          +2
                1101
                     1101 1110001
 460
          +2
                1101
                     1101 0001001
                                       1 - 3/8 * - 3/8 = 9/64
 480
          +2
                0111 0111 0001001
                                       1 7/8 * 7/8 = 49/64
 580
          +2
                0111 0111 0110001
 600
          +2
                0111
                     1000 0110001
                                       1 7/8 * -1 = -7/8
 700
          +2
                0111
                     1000 1001000
 720
          +2
                1000 0111 1001000
```

```
820
                                         -1 * 7/8 = -7/8
          +2
               1000
                     0111 1001000
 840
          +2
               1000
                     1000 1001000
 940
          +2
               1000
                     1000 1000000
                                          -1 * -1 = -1 (error)
                     0000 1000000
 960
          +2
               1101
         +2
                     0000 0000000
                                      1
                                         -3/8 * 0 = 0
1060
               1101
         +2
                     1111 0000000
1080
               1111
                                         -1/8 * -1/8 = 1/64
1180
         +2
               1111
                     1111 0000001
                                      1
1200
         +2
               0000
                     1011 0000001
                                       1
                                         0 * -3/8 = 0
1300
         +2
               0000
                     1011 0000000
1320
         +2
               0000
                     1011 0000000
```

Next, we refine the Verilog model for the signed multiplier by explicitly defining the control signals and the actions that occur when each control signal is asserted. The Verilog code (Figure 4-40) is organized in a manner similar to the Mealy machine model of Figure 1-17. In the first process, the Nextstate and output control signals are defined for each present State. In the second process, after waiting for the rising edge of the clock, the appropriate registers are updated and the *State* is updated. We can test the Verilog code of Figure 4-40 using the same test file we used previously and verify that we get the same product outputs.

FIGURE 4-40: Model for 2's Complement Multiplier with Control Signals

```
`define M B[0]
// This Verilog model explicitly defines control signals.
module mult2C2 (CLK, St, Mplier, Mcand, Product, Done);
   input CLK;
   input St;
   input[3:0] Mplier;
   input[3:0] Mcand;
   output[6:0] Product;
   output Done;
   reg Done;
   reg[2:0] State;
   reg[2:0] Nextstate;
   reg[3:0] A;
   reg[3:0] B;
   wire[3:0] compout;
   wire[3:0] addout;
   reg AdSh;
   reg Sh;
   reg Load;
   reg Cm;
   always @(State or St or `M)
   begin
      Load = 1'b0;
```

```
AdSh = 1'b0 ;
Sh = 1'b0;
Cm = 1'b0;
Done = 1'b0;
Nextstate = 1'b0;
case (State)
   0:
             begin
                if (St == 1'b1)
                begin
                   Load = 1'b1;
                   Nextstate = 1;
                end
                else
                begin
                   Load = 1'b0;
                   Nextstate = 0;
                end
             end
   1, 2, 3:
             begin
                if (`M == 1'b1)
                begin
                  AdSh = 1'b1 ;
                end
                else
                begin
                   Sh = 1'b1;
                Nextstate = State + 1;
             end
   4:
             begin
                if (`M == 1'b1)
                begin
                   Cm = 1'b1 ;
                   AdSh = 1'b1 ;
                end
                else
                begin
                   Sh = 1'b1 ;
                end
                Nextstate = 5;
             end
   5:
             begin
                Done = 1'b1;
                Nextstate = 0;
             end
default:
```

```
begin
                        Done = 1'b0 :
                        Nextstate = 0:
                     end
       endcase
   end
   assign compout = (Cm == 1'b1) ? ~Mcand : Mcand ;
   assign addout = A + compout + Cm ;
   always @(posedge CLK)
   begin
       if (Load == 1'b1)
       begin
          A \le 4'b0000;
          B <= Mplier ;
       end
       if (AdSh == 1'b1)
       begin
          A <= {compout[3], addout[3:1]};
          B <= {addout[0], B[3:1]};
       end
       if (Sh == 1'b1)
       begin
          A \leftarrow \{A[3], A[3:1]\};
          B \leftarrow \{A[0], B[3:1]\};
       end
       State <= Nextstate ;
   end
   assign Product = \{A[2:0], B\};
endmodule
```

# 4.11 Keypad Scanner

In this example, we design a scanner for a keypad with three columns and four rows as in Figure 4-41. The keypad is wired in matrix form with a switch at the intersection of each row and column. Pressing a key establishes a connection between a row and column. The purpose of the scanner is to determine which key has been pressed and to output a binary number  $N = N_3 N_2 N_1 N_0$ , which corresponds to the key number. For example, pressing key 5 must output 0101, pressing the \* key must output 1010, and pressing the # key must output 1011. When a valid key has been detected, the scanner should output a signal V for one clock time. Assume that only one key