

# **BSV** Training

Lec\_CRegs

A CReg (Concurrent Register) is a register-like primitive that enables greater concurrency (more rules per clock)

reture (m. pi);
collusion

PEFCATIONATE talentiffs
collusion actionality
referring actionality
referring actionality
referring actionality
referring actionality
referring actionality

tdstendf2F0M(Yn\_dapth) Vn\_attlettelöjatiketddig: Fds and (Trusk Batai' In\_data o blanasi,firak

FF-20(hairf) est gam o

delerate, gamin, deta ) = 0 ? estemul : estemul;

es



#### Enabling greater concurrency

With a primitive module like mkReg, the effect of a rule's Action (\_write) is not visible until the next clock (via \_read), because of its method ordering constraint (<)



For greater rule concurrency, we need another primitive whose method ordering constraints allow an Action's effect to be visible in the same clock.

In BSV, we use a primitive called the *CReg*<sup>1</sup>.

<sup>&</sup>lt;sup>1</sup> "CReg" = Concurrent Register. These are based on the "Ephemeral History Register" which was researched by Daniel Rosenband at MIT in 2004.



#### A motivating example

• Suppose we want to build a two-port, saturating, up/down counter of 4-bit signed integer values, with the following interface:

```
interface UpDownSatCounter_Ifc;
  method ActionValue #(Int #(4)) countA (Int #(4) delta);
  method ActionValue #(Int #(4)) countB (Int #(4) delta);
endinterface
```

- The "two ports" are the two identical methods countA and countB
- A module implementing this interface has internal state holding the current value of the counter (Int #(4) type, so range is -8 to +7)
- When either method is called,
  - The internal state is incremented by delta (range: -8 to +7), but saturates at +7 on overflow and at -8 on underflow
  - The old value of the counter is returned as the result of the method

Note: because of finite precision and saturation, "count" operations are not commutative like in conventional arithmetic; so, the order of these operations matters here!



### An implementation using ordinary registers (v1)

Since both methods do the same thing, we abstract their common behavior into a function fn\_count()

#### **BSV** notes:

- "extend (e)" sign-extends for Int#(n), and zero-extends for Bit#(n) and UInt#(n)
- "truncate (e)" drops MSBs, taking care of sign bits etc.
- The number of bits extended/truncated depends on the input and output type widths



#### A testbench to drive the up/down counter module

```
module mkTest (Empty);
  UpDownSatCounter Ifc ctr <- mkUpDownSatCounter;</pre>
  Reg #(int) step <- mkReg (0);</pre>
  Reg #(Bool) flag0 <- mkReg (False); Reg #(Bool) flag1 <- mkReg (False);</pre>
  function Action count show (Integer rulenum, Bool a not b, Int #(4) delta);
      action
         let x <- (a not b ? ctr.countA (delta) : ctr.countB (delta));</pre>
         $display ("cycle %0d, r%0d: is %0d, count (%0d)", cur cycle, rulenum, x, delta);
      endaction
   endfunction
  // Rules 0-9 are sequential, just testing one method at a time
  rule r0 (step == 0); count show (0, True, 3); step <= 1; endrule</pre>
  rule r1 (step == 1); count show (1, True, 3); step <= 2; endrule
                   ... and similarly, sequentially feed deltas of 3,3, -6,-6,-6, 7, 3,
   // Concurrent execution
  rule r10 (step == 10 && !flag0); count show (10,True, 6); flag0 <= True; endrule
  rule r11 (step == 10 && !flaq1); count show (11,False, -3); flaq1 <= True; endrule
   // Show final value
  rule r12 (step == 10 && flag0 && flag1); count show (12, True, 0); $finish; endrule
endmodule: mkTest
```

In rules 0-9, we call either countA or countB with deltas: 3,3,3,3, -6,-6,-6, 7, 3. The rule conditions and step assignments force them to fire 1 rule per clock (and so it doesn't matter whether we call countA or countB in these rules).

Rules 10 and 11 could potentially fire concurrently (if scheduling permits).

Rule 12 just displays the final counter value and exits.



## Expected behavior and outputs for v1

We expect the following behavior if r10 fires in the clock before r11:



We expect the following behavior if r11 fires in the clock before r10:

• •

same for r0 through r9



#### Actual output for v1

When we compile the program (v1), *bsc* produces the following message:

```
Warning: "Test.bsv", line 16, column 8: (G0010)
Rule "r10" was treated as more urgent than "r11". Conflicts:
    "r10" cannot fire before "r11": calls to ctr.countA vs. ctr.countB
    "r11" cannot fire before "r10": calls to ctr.countB vs. ctr.countA
```

#### This is saying:

- r10 and r11 conflict; they cannot be scheduled in the same clock (countA and countB conflict because they both read and write the "ctr" register inside mkUpDownSatCounter, thus both possible rule orders will violate a "\_read < \_write" ordering constraint)
- *bsc* has chosen to give priority to r10, i.e., if both r10 and r11 are enabled in the same clock, the scheduling logic will allow r10 to fire and will suppress r11 (r11 could fire, and indeed it does, in the next clock, when r10 is no longer enabled)
- Note: you can force the opposite priority by adding a "descending\_urgency" attribute to the module

When we run the program (v1), we see: (per first schedule in previous slide)

```
cycle 1, r0: is 0, count (3)
cycle 2, r1: is 3, count (3)
cycle 3, r2: is 6, count (3)
cycle 4, r3: is 7, count (3)
cycle 5, r4: is 7, count (-6)
cycle 6, r5: is 1, count (-6)
cycle 7, r6: is -5, count (-6)
cycle 8, r7: is -8, count (-6)
cycle 9, r8: is -8, count (7)
cycle 10, r9: is -1, count (3)
cycle 11, r10: is 2, count (6)
cycle 12, r11: is 7, count (-3)
cycle 13, r12: is 4, count (0)
```

2 cycles



### v1 is not really a "2-port" counter

v1 of our mkUpDownSatCounter may be functionally correct, but it's hardly a "2-port" counter!

When we say "2-port", we are making a performance characterization, i.e., we expect both ports to be operable in the same clock.

For this, we need to replace the Reg in mkUpDownSatCounter with an CReg, a different primitive that allows "multiple reads and writes" within a clock.



#### First: specifying the semantics of the two ports

Before we worry about implementations and CRegs, we must first specify the *desired* semantics of the two ports! Specifically:

When both countA and countB are operated in the same clock,

- what should be the final value of the counter?
- what should be the "old" values returned by each method?

In light of the finite precision arithmetic, and the saturating behavior, there is no obvious unique answer! It is a design choice!

In RTL designs, this is typically where you'll see an ad hoc choice made by the designer

- Which is (hopefully!) implemented correctly
- Which is (hopefully!) documented clearly and fully in English text in the datasheet
- Which may contain usage rules the user of the IP must follow, and which therefore need verification

In BSV, method orderings give us a formal and precise way to specific the semantics. By specifying that we want "countA < countB" or "countA > countB", we give precise answers to the above two semantic questions, because when operated in the same clock, there is a well-defined *logical* ordering that specifies the behavior exactly.

Further, bsc always verifies correct usage because it's in the semantics, not ad hoc English.



### CRegs (Concurrent Registers)

A CReg provides a *vector* of standard Reg interfaces that can be operated concurrently:

```
interface CReg #(numeric type n, type t);
  interface Vector #(n, Reg #(t)) ports;
endinterface
```

#### **BSV** notes:

- "Vector" is a standard importable BSV library
- The parameter n is the number of elements in the vector; t is the data type stored in the CReg

The ports of an CReg can be operated concurrently, with the following ordering constraints:

```
ports [0]._read <= ports [0]._write <
ports[1]._read <= ports[1]._write <
ports[2]._read <= ports[2]._write <
... ...
ports[n-1]._read <= ports[n-1]._write</pre>
```

This is the same as the standard register method-ordering constraint

But note that a value written in port 0 can be read concurrently on port 1 (by a logically later rule in the same clock), unlike an ordinary register where a write can only be read in the next clock



### A possible implementation of an CReg

This figure shows a possible circuit implementation of a CReg:



But note, this is not a *definition* of an CReg, it is merely shown to strengthen intuition. It is important, as usual, to keep separate the logical semantics from any implementation semantics. When using CRegs in BSV, one only needs to consider its method-ordering constraints (shown in the previous slide).



## Implementing our counter using CRegs (v2)

```
module mkUpDownSatCounter (UpDownSatCounter Ifc);
                                                                                 Change Reg to
  CReg#(2, Int #(4)) ctr <- mkCReg(0);</pre>
                                                                                 CReq
   function ActionValue #(Int #(4)) fn count (Integer p, Int #(4) delta);
      actionvalue
           // Extend the precision to avoid over/under flows
           Int #(5) new val = extend (ctr.ports [p]) + extend (delta);
           if (new val > 7) ctr.ports [p] <= 7;</pre>
                                                                                  Add CRea port
           else if (new val < -8) ctr.ports [p] <= -8;
           else ctr.ports [p] <= truncate (new val);</pre>
                                                                                  reads and writes
           return ctr.ports [p]; // note: returns old value
      endactionvalue
   endfunction
                                                                  For "countA < countB".
  method countA (Int #(4) delta) = fn_count (0, delta);
  method countB (Int #(4) delta) = fn count (1, delta);
                                                                  To implement "countB < countA",
endmodule
                                                                  change to:
 This is only a slight change to v1:
                                                                         ... = fn count (1, delta);
                                                                         ... = fn count (0, delta);
```

- The internal "ctr" is now a 2-port CReg instead of a Reg
- fn\_count is now parameterized by the CReg port "p" it should use
- countA and countB call this function with ports 0 and 1, respectively, thereby implementing the ordering semantics "countA < countB"</li>



### Behavior and outputs for v2

We expect the following behavior (r10 < r11 in same clock):



When we run the program (v2), we see:

```
cycle 1, r0: is 0, count (3)
cycle 2, r1: is 3, count (3)
cycle 3, r2: is 6, count (3)
cycle 4, r3: is 7, count (3)
cycle 5, r4: is 7, count (-6)
cycle 6, r5: is 1, count (-6)
cycle 7, r6: is -5, count (-6)
cycle 8, r7: is -8, count (-6)
cycle 9, r8: is -8, count (7)
cycle 10, r9: is -1, count (3)
cycle 11, r10: is 2, count (6)
cycle 11, r11: is 7, count (-3)
cycle 12, r12: is 4, count (0)
```

#### A second example

#### Consider a 2-stage CPU pipeline:



Let us focus on the FIFO connecting the stages:

Usually this is just a 1-element FIFO (we call it a "PipelineFIFO").

a.k.a. "pipeline register with interlock" (the interlock is just the extra valid bit that allows the execute stage to stall if there is nothing in the pipeline register, and allows the fetch/decode stage to stall if there is already something in the register which has not been consumed by the execute stage).



### An implementation using ordinary registers (v1)

```
module mkFIFOF1 (FIFOF #(t));
  Reg #(t) rg <- mkRegU; // data storage
  Reg \#(Bit \#(1)) rg count <- mkReg (0); // \# of items in FIFO (0 \text{ or } 1)
  method Bool notEmpty = (rg count == 1);
  method Bool notFull = (rg count == 0);
  method Action enq (t x) if (rg count == 0); // can enq if not full
     rg \le x;
     rg count <= 1;
   endmethod
  method t first () if (rg count == 1); // can see first if not empty
     return ra;
   endmethod
  method Action deq () if (rg count == 1); // can deq if not empty
     rg count <= 0;
  endmethod
  method Action clear;
     rg count <= 0;
  endmethod
endmodule
```

But: enq and {first, deq} could never be concurrent, with mutually exclusive conditions: rg\_count == 0 and rg\_count == 1

Implication → the fetch/decode stage and and the execute stage in the 2-stage CPU pipeline could never execute in the same clock (it isn't really a pipeline!)



### First: specify desired semantics of concurrent methods

Before we worry about implementations, we must first specify the desired *semantics* of concurrency on FIFO methods. In BSV we commonly use the following two kinds of FIFOs:

#### PipelineFIFOs:

- When empty, only enq is enabled
- When full, enq, first and deq are enabled, with: {first,deq} < enq</li>
   i.e., if both methods are enabled, logically it is like {first,deq} followed by enq,
   i.e., data currently in the FIFO is returned for {first,deq}, and new data is enqueued.

#### BypassFIFOs:

- When full, only {first,deq} is enabled
- When empty, enq, first and deq are enabled, with: enq < {first,deq}</li>
   i.e., if both methods are enabled, logically it is like enq followed by {first,deq},
   i.e., the newly enqueued value is "bypassed" through to {first,deq}.



### An implementation of Pipeline FIFOs using CRegs

```
module mkPipelineFIFOF (FIFOF #(t));
                                  <- mkCReqU; // data storage
   CReg # (3, t) crg
   CReg #(3, Bit #(1)) crg count <- mkCReg (0); // # of items in FIFO
  method Bool notEmpty = (crg count.ports[0] == 1);
  method Bool notFull = (crg count.ports[1] == 0);
  method Action eng (t x) if (crg count.ports[1] == 0);
      crq.ports[1] <= x;</pre>
      crg count.ports[1] <= 1;</pre>
   endmethod
  method t first () if (crg count.ports[0] == 1);
      return crq.ports[0];
   endmethod
  method Action deq () if (crg count.ports[0] == 1);
      crg count.ports[0] <= 0;</pre>
   endmethod
   method Action clear;
      crg count.ports[2] <= 0;</pre>
                                      This is only a slight change to v1:
   endmethod
endmodule

    notEmpty, first and deq use CReg port 0
```

- notFull and eng use CReg port 1
- clear uses CReg port 2



### An implementation of BypassFIFOs using CRegs

```
module mkBypassFIFOF (FIFOF #(t));
                                   <- mkCReqU; // data storage
   CReq #(3, t)
                        crq
   CReg #(3, Bit #(1)) crg count <- mkCReg (0); // # of items in FIFO
  method Bool notEmpty = (crg count.ports[1] == 1);
  method Bool notFull = (crg count.ports[0] == 0);
  method Action eng (t x) if (crg count.ports[0] == 0);
      crq.ports[0] <= x;</pre>
      crg count.ports[0] <= 1;</pre>
   endmethod
  method t first () if (crg count.ports[1] == 1);
      return crg.ports[1];
   endmethod
  method Action deq () if (crg count.ports[1] == 1);
      crg count.ports[1] <= 0;</pre>
   endmethod
   method Action clear;
      crg count.ports[2] <= 0;</pre>
                                      This is only a slight change to v1:
   endmethod
endmodule

    notFull and eng use CReg port 0
```

- notEmpty, first and deq use CReg port 1
- clear uses CReg port 2



#### CReg summary

The CReg is a highly concurrent primitive, i.e., it has multiple methods that can be invoked by multiple rules within a clock in a well-defined logical sequential order.

When using a CReg to communicate between rules that you want to be concurrent (i.e., able to fire in the same clock),

- first, be clear about what semantics you want, by thinking about what logical ordering of rules you want
- then, use CRegs to implement that ordering
  - (ascending CReg port indexes directly correspond to ordering)

Note: a design using CRegs will be functionally correct with any schedule, even one rule per clock. In the extreme schedule of one rule per clock, an CReg is exactly equivalent to an ordinary register (using mkReg).

In practice, we more often directly use concurrent library modules like PipelineFIFO and BypassFIFO. If necessary, we use CRegs to implement a concurrent module that is not available in the library.



## Hands-on

BSV-by-Example book: Examples in Chapter 8





# End

