# EECS 151/251A Homework 9

Due Friday, December 2<sup>rd</sup>, 2022 11:59PM

# Problem 1: Excuses, Excuses, Ek-skew-ses ...

Consider the following circuit diagram. R1 and R2 are rising-edge triggered flip-flops. The maximum and minimum combinational delays are listed for each path. You can assume that  $t_{clk-q} = 50ps$ ,  $t_{su} = 75ps$ , and  $t_{hold} = 60ps$ , and the multiplexers have negligible delay. SEL is a single control bit that is stable during any given clock cycle. CLK2 is a clock signal derived from CLK1 with some amount of delay, as indicated in each subpart.



(a) Assuming that the clock source and clock tree are ideal, answer the following questions.



(i) What is the minimum clock period,  $T_{min}$ , that this circuit may operate at correctly?

# Solution: $max(\text{summed }t_{l,max} \text{ along each path}) = max(130 + 100, 150 + 100, 200) = 250ps$ Maximal path is from F1 to F2.

 $T_{min} = t_{clk-q} + max(\text{summed } t_{l,max} \text{ along each path}) + t_{su} = 50 + 250 + 75 = 375ps$ 

(ii) Does this circuit have a hold-time violation?

#### Solution:

Minimal path is clearly from F2 to F1.

$$t_{clk-q} + t_{l,min} = 50 + 60 = 110ps$$

$$t_h = 60ps$$

$$t_h \leq t_{clk\text{-}q} + t_{l,min}$$

Therefore, no hold violation.

(b) Considering the indicated clock skew  $t_{sk}=30ps$  in the clock tree, answer the following questions.



(i) What is the minimum clock period,  $T_{min}$ , that this circuit may operate at correctly?

## Solution:

We must carefully look for the maximal path while considering skew correctly.

F1 to F2:

$$t_{skew} = t_{sk}$$

 $T_{clk} \ge t_{clk-q} + max(\text{summed } t_{l,max} \text{ along each path}) + t_{su} - t_{skew} = 50 + 250 + 75 - 30 = 345ps$ 

F2 to F1:

$$t_{skew} = -t_{sk}$$

$$T_{clk} \ge t_{clk-q} + t_{l,max} + t_{su} - t_{skew} = 50 + 200 + 75 - (-30) = 355ps$$

 $T_{min} = max(345, 355) = 355ps$ 

(ii) What is the hold slack for this circuit?

#### Solution:

We must carefully look for the minimal path while considering skew correctly.

F1 to F2:

 $t_{skew} = t_{sk}$ 

hold slack<sub>1</sub> =  $t_{clk-q} + min(\text{summed } t_{l,min} \text{ along each path}) - (t_h + t_{skew}) = 50 + min(80, 120) - (60 + 30) = 40ps$ 

F2 to F1:

 $t_{skew} = -t_{sk}$ 

hold slack<sub>2</sub> =  $t_{clk-q} + t_{l,min} - (t_h + t_{skew}) = 50 + 60 - (60 + -30) = 80ps$ 

hold slack = min(40, 80) = 40ps

(iii) Does this circuit have a hold-time violation?

#### Solution:

No as we have positive hold slack.

(c) [251A Only] You may now set the value of  $t_{sk}$ . What would you set  $t_{sk}$  to, for the shortest clock period possible? What is that  $T_{min}$ ? Ignore hold-time violations for this part.



#### Solution:

 $T_{min}$  is set by the slowest pipeline stage. Therefore, our goal is to equalize the time taken by each pipeline stage.

 $T_{min,1} = t_{clk-q} + max(\text{summed } t_{l,max} \text{ along each path}) + t_{su} - t_{sk} = 50 + 250 + 75 - t_{sk}$ 

 $T_{min,2} = t_{clk-q} + t_{l,max} + t_{su} + t_{sk} = 50 + 200 + 75 + t_{sk}$ 

 $T_{min,1} = T_{min,2}$ 

 $50 + 250 + 75 - t_{sk} = 50 + 200 + 75 + t_{sk}$ 

 $t_{sk} = \frac{50}{2} = 25ps$ 

 $T_{min} = 50 + 250 + 75 - t_{sk} = 350ps$ 

(d) Considering the indicated clock skew  $t_{sk} = 30ps$  in the clock tree, and cycle-to-cycle jitter at the source  $t_i = \pm 10ps$  answer the following questions.



(i) What is the minimum clock period,  $T_{min}$ , that this circuit may operate at correctly?

#### Solution:

Since we have the same skew as last time, we can consider maximal path and  $T_{min}$  from (b) and add jitter in the worse direction.

Remember  $T_{min}$  considers the case where data is launched at rising clock edge i and received at rising clock edge i + 1, and calculates the longest path between these edges.

In the worst case, clock edge i occurs 10ps later than nominal, and clock edge i+1 occurs 10ps earlier than nominal, giving us  $t_{jitter} = 20ps$ .  $T_{min} = T_{min,b} + t_{jitter} = 355 + 20 = 375ps$ 

(ii) What is the hold slack for this circuit? (Think carefully here, hold time considers the following case: Observing some rising clock edge *i* for a pipeline, the value at an FF's input before *i* needs to be held stable for hold time past *i*. However, at the clock edge, the pipeline starts computing a new value. This value shouldn't reach that FF's input within hold time past the **same** clock edge *i*.)

#### Solution:

Again, since we have the same skew as last time, we can consider minimal path and hold slack from (b) and add jitter in the worse direction.

However, as jitter here is applied to the clock source, both clocks will be in sync. At any given clock edge i, they will see the same jitter delay. Therefore, hold time and hold slack remain unaffected. hold slack = 40ps

(iii) Does this circuit have a hold-time violation?

#### Solution:

No as we have positive hold slack.

# Problem 2: ScRAMbling for Answers

Fresh off the blocks from 151/251A, you've decided to take on the tapeout course. Halfway through the semester, you realize that the ever-important SRAM-based memory blocks on your RV64GC SoC are not behaving correctly. Time to investigate!

- (a) The spec requires a byte-addressable memory with a capacity of 64 words of storage. Additionally, the design of the memory should have a 1:1 aspect ratio. (What is the size of each word for this ISA?)
  - (i) What is the total capacity of your memory in bits?

## Solution:

 $64 \times 64 = 4096 = 4Kb$  or 4Kbits

(ii) How many rows and columns of SRAM cells should this memory have?

# Solution:

For a 1:1 aspect ratio, rows = columns =  $\sqrt{4096} = 64$ 

(iii) What is the total capacity of your memory in bytes?

#### Solution:

 $64 \times 8 = 512B$ 

(iv) How many total address bits do you need for this memory? How many address bits are used by the row-decoder? How many address bits are used by the column-decoder? (Keep in mind that the memory is byte-addressable. Reason about how would access a single byte from such a memory.)

#### Solution:

Since this is byte-addressable memory,  $\log_2 512 = 9$  address bits Each row holds 64 columns = 64b = 8B.  $\log_2 8 = 3$  column decode bits Row decode bits = Total address bits – column decode bits = 9 - 3 = 6

(b) When you design a chip, you must do so for a particular technology - think Intel 16, TSMC 28 and so on. Physical design of SRAMs is tricky because of their large size and heavy density of bit cells. Therefore, SRAMs are generally provided as many fixed-size macros with a process development kit (PDK). To design an SRAM-based memory (that is not the same size as the macros in the PDK), you often have to combine different SRAM macro blocks together.

You are given a 32 x 64 SRAM macro with that supports 1R xor 1W at a time. You need to use multiple instances of this macro to design a 64 x 64-bit memory that supports 2R xor 1W at a time. The diagram below is a possible implementation. Using the following address signals, and possibly indexing into them, fill in the blanks:

- (1) addr1rw address used to read from port 1 or write
- (2) addr2 address used to read from port 2

Assume we follow little-endian convention, and this particular memory block is word-addressable in a larger byte-addressable memory, i.e., the lowest 3 bits of the address are used to find the right byte in the 64 bit (8 byte) word outside of this system.



# Solution:

We have a total of 9 address bits. Since this memory block is word addressable, but the memory system is byte addressable, we ignore addr1w[2:0]. We can use any address bit to divide the address space in half, in this case, we choose the MSB.

- (1) adddr1w[7:3]
- (2) addr2[7:3]
- (3) adddr1w[8]
- (4) adddr2[8]
- (5) adddr1w[8]
- (6) adddr1w[8]

(c) Noticing the high density and power consumption of the SRAMs on your SoC, you consider using 5T bit cells instead of the standard 6T bit cells we saw in discussion. After all, eliminating 1 bit line per column and 1 access transistor per bit cell can be quite beneficial. Let us analyze 5T SRAM cells further.



Figure 1. Circuit diagram of the standard 6T SRAM cell.



Figure 2. Circuit diagram of the traditional 5T SRAM cell.

Here Q := A and  $\overline{Q} := B$ .

To read from a 5T SRAM cell, you pre-charge the bit line to  $V_{DD}$ . Then the word line is held high, to enable the NMOS access transistor. If Q := 0, the bit line will drop voltage. If Q := 1, the bit line will stay high.

(i) In order to guarantee read stability, what constraints exist on the relative strength of the transistors? (Think about what happens when we read a 0.)

Solution: MN1 > MA1

(ii) In order to guarantee writability, what constraints exist on the relative strength of the transistors? (Think about what happens when we write a 1 and when we write a 0.)

# Solution:

Writing a 0: MA1 > MP1Writing a 1: MA1 > MN1

(iii) (True or False) In this design, improving read stability directly worsens writability and vice-versa.

# Solution:

True

Given these contradictory conditions, using 5T SRAM cells reliably requires techniques that revolve around modifying the involved word line voltage, bit line voltage, and supply voltage, when reading v. writing. However, these can reduce speed and drive current of the cell, or require significant peripheral circuitry. All of this makes them difficult to use.

This problem uses the following paper for reference: Yu, Chien-Cheng, and Ming-Chuen Shiau. "Design of High Performance Single-Port 5T SRAM Cell." IOSR Journal of Electrical and Electronics Engineering (IOSR-JEEE) 11.5 (2016): 14.