#### 1 Basic

# [atomicc.basic]

#### 1.1 Introduction

[atomicc.intro]


AtomicC is a timed, structural hardware description language for the high-level specification of algorithms to be instantiated directly in hardware. AtomicC extends C++ with support for Guarded Atomic Actions [1, 2, 3]: Bluespec-style[4] modules, rules, interfaces, and methods. AtomicC does not attempt to emulate the behavior of all C++ constructs in hardware; instead, it uses a subset of the C++ language to specify behavioral assignments to state elements.

The language is designed for the construction of **modules** that are correct-by-construction *composable*: validated smaller modules can be aggregated to form a larger validated module with no loss of correctness of the component modules:

- Module interactions are performed with latency-insensitive [5, 6] **method** calls, allowing methods to enforce invocation pre-conditions and transitive support for stalling.
- Module behavioral statements are encapsulated into transactions (**rules**) following ACID semantics [7, 8]:
  - *Atomic*: all enabled rules in all modules execute on every clock cycle.
  - *Consistent*: the compiler synthesizes control signals, allowing rules to fire only when their referenced method invocations (*implicit conditions*) are ready.
  - *Isolated*: all rules executed during a given clock cycle are *sequentially consistent* (SC) [9], guaranteeing each rule executes independently of any other rules executing at the same time [8, Sec. 7.1].
  - *Durable*: all transactions read from and write to state elements in the design.
- An **interface** is a named collection of method signatures, defining the behavior of an abstract data type (ADT) [10]. Modules can declare multiple **interfaces**, each with an explicit name, which gives flexibility in coupling with other modules. Interfaces can be exported (defined in the module) or imported (used in the module, but defined externally), giving flexibility in algorithm representation [11, Sec. 4.1].
- All state elements in the hardware netlist are explicit in the source code of the design. All module data is private to the module, accessible externally only by method invocation.

These features support the reliable reuse of pre-compiled, incrementally validated libraries, improving productivity on large designs.

Like Connectal[12], AtomicC designs may include both hardware and software components, using interfaces to specify hardware/software communication in a type-safe manner. The AtomicC compiler generates the code and transactors to pass arguments between hardware and software.

The AtomicC compiler generates a single Verilog module for each defined AtomicC module. Existing Verilog modules can be called from and can call AtomicC generated modules. Standard Verilog backend tools are used to synthesize the resulting ASIC or FPGA.

The basic building block of AtomicC is the module declaration, made of 3 parts:

- Instantiation of state elements used by the module,
- Interface declarations for interacting with other modules,
- Rules, which group assignment statements and method invocations into atomic transactions.

#### 1.2 Interface Methods

[atomicc.interface]

There are 2 types of methods:

- **Value method functions** provide read-only access to module state elements.
- **Action method procedures** perform write operations on state elements, can take parameters and do not have return values. A compiler generated **valid** signal indicates that the caller wishes to perform the method invocation.

Both value and action methods use a compiler generated **ready** signal to indicate when the callee is available and to stall scheduling of the calling transaction until execution pre-conditions are satisfied.

AtomicC uses valid/ready handshake signalling [13, 14] to invoke action methods, giving both the invoker (master) and the invokee (slave) the ability to control invocation execution timing. The master uses the valid signal of an action method to show when parameter data is available and the operation should be performed. The method invocation succeeds only when both valid and ready are HIGH in the same clock cycle; in TRS notation[1, p. 22], $\pi(M_i) \equiv ready(M_i) \wedge valid(M_i)$.
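As an illustration of the handshake, the following plain C++ sketch models one action method port cycle by cycle. It is neither AtomicC source nor compiler output; the names `MethodPort`, `fires`, and `count_invocations` are hypothetical and exist only for this example. An invocation is counted only in cycles where both signals are high, matching the predicate above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of one action method port during one clock cycle.
struct MethodPort {
    bool valid;  // driven by the caller: parameters are available
    bool ready;  // driven by the callee: the method pre-conditions hold
};

// pi(M) = ready(M) && valid(M): the invocation succeeds in this cycle.
inline bool fires(const MethodPort &m) {
    return m.ready && m.valid;
}

// Count the cycles of a trace in which the invocation actually happens.
inline std::size_t count_invocations(const std::vector<MethodPort> &trace) {
    std::size_t n = 0;
    for (const MethodPort &m : trace)
        if (fires(m))
            ++n;
    return n;
}
```

In a trace where the caller holds valid high while the callee is not ready, no invocation is counted until the first cycle in which ready is also high.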

#### 1.2.1 Scheduling

#### [atomicc.schedule]

In software systems, to guarantee *isolation* in the presence of parallelism, *dynamic allocation*[8, p. 377] of schedules and locking[15, Sec. 11.2] are used. In hardware design with AtomicC, the set of state elements accessed by a transaction, the operations on these state elements and the boolean condition when the transaction is performed are all known at compile time. This allows *static allocation*[8, Sec. 7.3.1] of **schedules** (sequences of transaction execution) and compile time validation of SC. The scheduling algorithm is:

- For each module, rules and methods that overlap usage of state elements (*read set* and *write set*[15, Sec. 10.1.2] [16]) are greedily gathered into schedule sets. Each set will be independently scheduled (since there can be no interactions between sets).
- A constraint graph is a partially-ordered digraph modeling the dependencies within a schedule set:
  - nodes in a constraint graph represent atomic rule and method instances,
  - edges represent write-after-read (WAR) ordering dependency for a specific storage element[17, Sec. 3].
    - In addition, each edge has a symbolic boolean *edge condition* for when the ordering dependency exists: the boolean condition when one rule/method actually reads a given state element and the other actually writes it.
- The transitive closure of these orders on the constraint graph nodes dictates the **schedule** in which each rule must *appear* to execute in order to be considered SC [15, Sec. 11.1]. Of course, since all rules execute in a single cycle, "schedule" does not refer to an actual time sequenced evolution of state, but to a *conceptual* "sub-cycle" ordering.
- For each pair of nodes in the constraint digraph, we define the *node condition* between 2 nodes as the disjunction of the *edge conditions* of all the edges between them (i.e., the condition that *any* of the edges causes a dependency). For each cycle in the digraph, we define the *path condition* as the conjunction of the *node conditions* for all sequential pairs of nodes in the cycle (i.e., the condition that *all* the edges, and hence the cycle, exist).

- When the *path condition* is not identically false, a total ordering of the digraph cannot be guaranteed and the *schedule set* is not SC. In this case, the compiler or linker reports an error, requiring resolution by the user.
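The steps above can be sketched in ordinary C++. This is a simplified illustration under stated assumptions, not the AtomicC compiler's implementation: every edge condition is treated as unconditionally true (so the cycle check is conservative), state elements are identified by name, and `Action`, `schedule_sets`, `constraint_graph`, and `has_cycle` are hypothetical names.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical summary of one rule or method: the state elements it reads and writes.
struct Action {
    std::string name;
    std::set<std::string> reads;
    std::set<std::string> writes;
};

bool intersects(const std::set<std::string> &a, const std::set<std::string> &b) {
    for (const std::string &x : a)
        if (b.count(x)) return true;
    return false;
}

// Two actions overlap when the union of their read and write sets intersects.
bool overlaps(const Action &a, const Action &b) {
    std::set<std::string> ua(a.reads), ub(b.reads);
    ua.insert(a.writes.begin(), a.writes.end());
    ub.insert(b.writes.begin(), b.writes.end());
    return intersects(ua, ub);
}

// Greedily gather actions into schedule sets; returns one set label per action.
std::vector<std::size_t> schedule_sets(const std::vector<Action> &acts) {
    std::vector<std::size_t> label(acts.size());
    for (std::size_t i = 0; i < acts.size(); ++i) label[i] = i;
    for (std::size_t i = 0; i < acts.size(); ++i)
        for (std::size_t j = i + 1; j < acts.size(); ++j)
            if (label[i] != label[j] && overlaps(acts[i], acts[j])) {
                const std::size_t from = label[j], to = label[i];
                for (std::size_t &l : label)
                    if (l == from) l = to;  // merge the two sets
            }
    return label;
}

using Graph = std::vector<std::vector<std::size_t>>;

// Edge i -> j: i reads an element that j writes, so i must appear to execute first.
Graph constraint_graph(const std::vector<Action> &acts) {
    Graph succ(acts.size());
    for (std::size_t i = 0; i < acts.size(); ++i)
        for (std::size_t j = 0; j < acts.size(); ++j)
            if (i != j && intersects(acts[i].reads, acts[j].writes))
                succ[i].push_back(j);
    return succ;
}

// Depth-first search; mark: 0 = unvisited, 1 = on the current path, 2 = done.
bool dfs_finds_cycle(const Graph &g, std::size_t n, std::vector<int> &mark) {
    mark[n] = 1;
    for (std::size_t m : g[n]) {
        if (mark[m] == 1) return true;  // back edge closes a cycle
        if (mark[m] == 0 && dfs_finds_cycle(g, m, mark)) return true;
    }
    mark[n] = 2;
    return false;
}

// True when no total "sub-cycle" ordering of the actions exists.
bool has_cycle(const Graph &g) {
    std::vector<int> mark(g.size(), 0);
    for (std::size_t n = 0; n < g.size(); ++n)
        if (mark[n] == 0 && dfs_finds_cycle(g, n, mark)) return true;
    return false;
}
```

For example, two rules that each read the element the other writes form a two-node cycle, while a rule touching unrelated elements lands in a separate schedule set.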

The compiler can break a cycle under the following conditions:

- if the cycle contains some method M and some rule R, then rewrite the term valid(R) as its conjunction with the term $\neg valid(M)$.

Since AtomicC performs scheduling analysis independently for each declared module, conflicts between method invocations in rules cannot be validated at compile time. Schedule processing for rule method calls is delayed until the "module group binding" stage of linking, where separately compiled AtomicC output is combined and verified for SC scheduling. Errors and conflicts detected at this stage must be repaired in the module source text and recompiled before proceeding.

#### 1.2.2 Previous scheduling work

#### [atomicc.schedprev]

In Rule Composition[3], scheduling is reformulated in terms of rule composition, leading to a succinct discussion of issues involved, including a concise description of the Esposito and Performance Guarantees schedulers. The resulting schedules are quite close to the user-specified scheduling in AtomicC. In contrast to AtomicC, the Bluespec kernel language they use for analysis has a sequential composition operator, creating rules that execute for multiple clock cycles.

The Esposito Scheduler[18, 3] is the standard scheduler generation algorithm in the Bluespec Compiler. It uses a heuristic designed to produce a concrete total ordering of rules.

The Performance Guarantees scheduler[19] was proposed to address issues with intra-cycle data passing.

#### 1.3 Compilation

#### [atomicc.modcomp]

Modules are compiled independently and then combined by "linking", which validates the schedule using header files.

Physical partitioning is used to separate the design into separately synthesized pieces, connected using "long distance" signalling. The pieces are synthesized in parallel and the resulting bitstreams are combined.

AtomicC execution consists of 4 phases:

- compilation: static elaboration followed by Verilog netlist generation,
- linking: binding of multiple modules and verification of inter-module schedule conflicts,
- netlist synthesis,
- hardware execution.

During netlist generation, modules are instantiated by executing their constructors. During this phase, any C++ constructs may be used, but the resulting netlist must only contain synthesizable components.

During netlist compilation, the netlist is analyzed and translated to an intermediate representation and then to Verilog for simulation or synthesis. Alternate translations are possible: to native code via LLVM, to SystemC, to Gallina for formal verification with the Coq Proof Assistant, etc.

#### 1.4 Future work

#### [atomicc.modfuture]

Need to describe multi-cycle rules and pipelining.

Need to have a way to support sequencing of operations.

Need to have a way to support model checking (say, 'module B is a behavioral description of module A'). Show example with diff eqn solver from Sharp thesis.

C block semantics do not correctly process the 2 statements: a = b; b = a; (binding of read values should occur at the beginning of the block, so that it is clear the 2nd assignment refers to the 'previous' value). Thinking again: if we retain C semantics, we have: temp = a; a = b; b = temp;, which gives the correct value mapping.
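The two readings of a = b; b = a; can be compared in ordinary C++. The sketch below is illustrative only; `State`, `step_sequential`, and `step_parallel` are hypothetical names, and `step_parallel` models the "bind read values at the beginning of the block" interpretation rather than any existing AtomicC behavior.

```cpp
// Hypothetical pair of state elements.
struct State {
    int a;
    int b;
};

// Plain C semantics: the second statement sees the value just written to a.
inline State step_sequential(State s) {
    s.a = s.b;
    s.b = s.a;
    return s;
}

// Reads bound at the beginning of the block: both right-hand sides
// refer to the previous state, so the two values are exchanged.
inline State step_parallel(State s) {
    const State prev = s;
    s.a = prev.b;
    s.b = prev.a;
    return s;
}
```

Starting from a = 1, b = 2, the sequential version ends with both elements equal to 2, while the parallel version swaps them.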

Multiple clock domains


# 2 Classes [class]

#### 2.1 Module declaration and definition

[atomicc.module]

A module, defined using the keyword "\_\_module", results in the generation of a corresponding Verilog module in the compilation output file. It includes local state elements, exported interfaces, imported interfaces and rules for clustering operations into atomic transactions.

Modules are independently compiled, even if they exist in the same compilation unit. Rule and interface method scheduling logic is generated as part of the generated module. Scheduling constraints (read set, write set and relation to other scheduled elements) are generated into a metadata file, allowing schedule consistency between modules to be verified by the linker.

[Example:

```
__module Echo {
    EchoRequest request;         // exported interface (defined by this module)
    EchoIndication *indication;  // imported interface (defined by the instantiator of this module)
    bool busy;
    __int(32) itemSay;
    // implementation of method request.say(). Note the guard "if (!busy)".
    void request.say(__int(32) v) if(!busy) {
        itemSay = v;
    }
    void request.saw(__int(16) a, __int(16) b) if(!busy) {
    }
};
```

— end example]

To reference a module from a separate compilation unit, use "\_\_emodule". External module definitions need only specify the exported/imported interfaces.

[Example:

#### 2.2 Module interface definition

#### [atomicc.interface]

An AtomicC interface is essentially an abstract class, similar to a Java interface. All the methods are virtual and no default implementations are provided. AtomicC style uses composition of interfaces (using \_\_connect) rather than inheritance.

The \_\_interface keyword defines a list of methods that are exposed from an object and can be composed as a unit. Instead of using object inheritance to define reusable interfaces, they are defined/exported explicitly by objects, allowing fine-grained specification of interface method visibility.

Methods of a module are translated to value ports for passing the method arguments and a pair of handshaking ports used for scheduling method invocations.

References to an object can only be done through interface methods. State element declarations inside an object (member variables) are private.


[Example:

```
__interface EchoRequest {
    void say(__int(32) v);
    void say2(__int(16) a, __int(16) b);
};
```

— end example]

#### 2.3 Guard clauses on module interface methods [atomicc.guard]

Method definitions in \_\_module declarations have the form:

```
void request.say(__int(32) v) if(!busy) {
   itemSay = v;
   ...
}
```

A rule is ready to fire only if its guard is true and the guards on all methods invoked within the rule are also true.
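The readiness condition for rules and guarded methods can be modelled in plain C++. This is a behavioral sketch under assumptions, not generated code: `EchoModel` and `rule_ready` are hypothetical names, and the elided part of the method body ("...") is not modelled.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical software model of the guarded method request.say().
struct EchoModel {
    bool busy = false;
    std::int32_t itemSay = 0;

    // The guard "if (!busy)".
    bool say_ready() const { return !busy; }

    // Returns true when the invocation took effect; false means it stalled.
    bool say(std::int32_t v) {
        if (!say_ready()) return false;
        itemSay = v;
        return true;
    }
};

// A rule is ready when its own guard and the guards of all methods it invokes hold.
inline bool rule_ready(bool rule_guard, const std::vector<bool> &method_guards) {
    if (!rule_guard) return false;
    for (bool g : method_guards)
        if (!g) return false;
    return true;
}
```

While `busy` is set, a rule that invokes `say` is not ready and the stored value is left unchanged.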

#### 2.4 Connecting exported interfaces to imported references [atomicc.connect]

The \_\_connect statement allows exported interface declarations to be connected with imported interface references between objects within a module declaration.

```
connect-declaration:
    __connect identifier = identifier;
```

[Example:

AtomicC example

```
__interface ExampleRequest {
    void say(__int(32) v);
};

__module A {
    ExampleRequest callIn;
};

__module B {
    ExampleRequest *callOut;
};

__module C {
    A consumer;
    B producer;
    __connect producer.callOut = consumer.callIn;
};
```

— end example]

Comparison with BSV:

- The declaration for 'A' is just like BSV. In BSV, the declaration for B requires the interface instance for 'callOut' be passed in as an interface parameter (forcing a textual ordering to the source code declaration sequence).
- In AtomicC, the interfaces are stitched together outside in any convenient sequence in a location where both the concrete instances for A and B are visible.

#### 2.5 Exporting interfaces from contained objects [atomicc.export]

In a design, there are times when the engineer wishes to declare an object locally, but allow external modules to access specific interfaces of the local object. This is done by declaring an interface to the containing object of compatible type and just 'assigning' the local object's interface to it.

[Example:

```
__module CWrapper {
    A consumer;
    ExampleRequest request = consumer.callIn;
};
```

— end example]

CWrapper just forwards the interface 'request' down into the instance 'consumer'.

#### 2.6 Syntax extension to C++

[atomicc.classsyn]

```
atomicc-class-key:
    __interface
    __emodule
    __module
```

#### 2.7 Exporting interfaces for use by software [atomicc.softif]

In systems that have both hardware and software components, there is a need to marshal/demarshal parameterized method invocations across a hardware bus or network-on-chip (NOC). AtomicC provides this by decorating the interface declarations with the keyword "\_\_software".

The use of the \_\_software keyword causes the following to be performed:

- The generation of serialization/deserialization code for both software and hardware side modules, allowing method invocations to be performed in each direction.
- The generation of header files allowing compilation of software modules that interface with the hardware.
- Integration into a modified Connectal execution framework for the orchestration of requests.
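To give a feel for what marshalling a method invocation involves, here is a minimal C++ sketch. The word layout (a method index followed by packed arguments), the enum `EchoRequestMethod`, and the functions `marshal_say`, `marshal_say2`, and `demarshal_say2` are all assumptions made for illustration; they do not describe the format actually generated by AtomicC or used by Connectal.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical method indices for the EchoRequest interface.
enum class EchoRequestMethod : std::uint32_t { Say = 0, Say2 = 1 };

// Marshal say(v) into a word stream: [method index, v].
inline std::vector<std::uint32_t> marshal_say(std::uint32_t v) {
    return {static_cast<std::uint32_t>(EchoRequestMethod::Say), v};
}

// Marshal say2(a, b): the two 16-bit arguments share one 32-bit word.
inline std::vector<std::uint32_t> marshal_say2(std::uint16_t a, std::uint16_t b) {
    const std::uint32_t packed = (static_cast<std::uint32_t>(a) << 16) | b;
    return {static_cast<std::uint32_t>(EchoRequestMethod::Say2), packed};
}

struct Say2Args {
    std::uint16_t a;
    std::uint16_t b;
};

// Recover the arguments of say2 on the receiving side.
inline Say2Args demarshal_say2(const std::vector<std::uint32_t> &words) {
    const std::uint32_t packed = words.at(1);
    return {static_cast<std::uint16_t>(packed >> 16),
            static_cast<std::uint16_t>(packed & 0xFFFFu)};
}
```

The same packing has to be agreed on by both sides, which is why generating it from the interface declaration avoids hand-written mismatches.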

[Example:

```
    // implementation of method request.say(). Note the guard "if (!busy)".
    void request.say(__int(32) v) if(!busy) {
        itemSay = v;
    }
    void request.saw(__int(16) a, __int(16) b) if(!busy) {
    }
};
```

— end example]

[Example:

```
#include "EchoIndication.h"  // Header file generated by AtomicC
#include "EchoRequest.h"     // Header file generated by AtomicC

class EchoIndication : public EchoIndicationWrapper {
public:
    virtual void heard(uint32_t v) {
        // user code for handling indication
    }
    EchoIndication(unsigned int id, PortalTransportFunctions *item, void *param) :
        EchoIndicationWrapper(id, item, param) {}
};

int main(int argc, const char **argv)
{
    EchoIndication echoIndication(IfcNames_EchoIndicationH2S, &transportMux, &param);
    EchoRequestProxy echoRequestProxy(IfcNames_EchoRequestS2H, &transportMux, &param);
    // user code for sending requests
    echoRequestProxy.say(42);
}
```

— end example]


### 3 Statements

## [stmt.stmt]

#### 3.1 rule

[atomicc.rule]

Rules specify a group of operations that must execute atomically. A rule operates transactionally: when a rule's guard and the guards of all of its method invocations are satisfied, it is ready to fire. It will fire on a clock cycle when it does not conflict with any higher priority rule.

```
rule-statement:
    __rule identifier if-guard_opt compound-statement
```

[Example:

```
__rule respond_rule if (responseAvail) {
    fifo->out.deq();
    ind->heard(fifo->out.first());
}
```

— end example]

#### 3.2 Restrictions on C++ statements

[atomicc.nostmt]

Unlike the serialized execution model of C++, AtomicC supports a fully parallel, single cycle execution of all rules that are able to fire.

Since AtomicC does not generate any extra logic to support sequential execution behavior from language constructs, traditional C++ statements with non-static control flow behavior are not supported.

Examples include:

- Non-constant bound "for" statements. Constant bound "for" statements that can be fully unrolled are supported.
- "do", "while" statements
- Usages of "goto" that result in a cyclic directed graph of execution blocks
- Method and function calls that are not inlinable at compilation time (for example, recursion is prohibited)
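The distinction between constant-bound and data-dependent loops can be seen in ordinary C++. The two functions below are hypothetical illustrations written in standard C++, not AtomicC; they only show which loop shape has a trip count known at compile time.

```cpp
#include <cstdint>

// Constant bound: the trip count (8) is known at compile time, so the loop
// can be fully unrolled into a fixed chain of XOR terms.
inline bool parity8(std::uint8_t x) {
    bool p = false;
    for (int i = 0; i < 8; ++i) {
        const bool bit = ((x >> i) & 1u) != 0;
        p = (p != bit);  // XOR of the bits seen so far
    }
    return p;
}

// Data-dependent trip count: the number of iterations depends on the value
// of x, which is the kind of control flow excluded by the list above.
inline int shifts_until_zero(std::uint8_t x) {
    int n = 0;
    while (x != 0) {
        x >>= 1;
        ++n;
    }
    return n;
}
```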

# 4 Modularization [atomicc.modularization]

#### 4.1 Independent compilation of modules [atomicc.independent]

The design is separated into modules that can export and import interfaces to other modules. Each source language module compiles into a single Verilog module. Modules are independently compiled, depending only on the interface definitions for referenced modules. Referencing modules do not depend on the internal implementation of referenced modules, even if they textually exist in the same compilation unit. Scheduling of rules in a module is performed "inside out", with the resulting schedule dependencies written to a metadata file during compilation.

Exported interfaces can be used in several ways:

- invoked directly by the instantiator of the module,
- forwarded transparently, becoming another exported interface of the instantiating module,
- 'connected' to an 'interface reference' of another module in the instantiating scope.

#### 4.2 Execution control

#### [atomicc.econtrol]

There are 2 common styles for communication of execution control information for a method:

- Asymmetric (ready/enable signalling): A method/rule is invoked by asserting the "enable" signal. This signal can only be asserted if the "ready" signal is asserted, allowing the called module to restrict permissible execution sequences.
- Symmetric (ready/valid signalling): Both caller and callee have "able to be executed" signals. Execution is deemed to take place in each cycle where both "ready" (from the callee) and "valid" (from the caller) are asserted.

Bluespec uses the Asymmetric signalling style, collecting all scheduling control into a central location for analysis/generation. AtomicC uses the Symmetric signalling style, giving modules local control over their allowable execution patterns. Conflicts between local schedules for modules when they are connected together are detected by the linker.

#### 4.3 Linking of groups of modules

#### [atomicc.linker]

To verify that an instantiated group of modules has SC compliant execution characteristics, a linker is used to cross check information from the metadata files for each module.

#### 4.4 Interfacing with verilog modules

[atomicc.verilog]

To reference a module in Verilog, fields can be declared in \_\_interface items.

[Example:

```
    __output __int(1) OUT2;
};
__emodule CONNECTNET2 {
    CNCONNECTNET2 _;
};
```

— end example]

This will allow references/instantiation of an externally defined Verilog module CONNECTNET2 that has 2 'input' ports, IN1 and IN2, as well as 2 'output' ports, OUT1 and OUT2.

#### 4.4.1 Parameterized modules

[atomicc.param]

Verilog modules that have module instantiation parameters can also be declared/referenced.

[Example:

— end example]

This example can be instantiated as:

[Example:

```
__module Test {
    ...
    MMCME2_ADV#(BANDWIDTH="WIDE",CLKFBOUT_MULT_F=1.0) mmcm;
    ...
    Test() {
        __rule initRule {
            mmcm._.CLKFBIN = mmcm._.CLKFBOUT;
        }
    }
}
```

— end example]

#### 4.4.2 Reference syntax

[atomicc.refsyntax]

```
attribute-specifier-seq_opt pin-type_opt decl-specifier-seq_opt member-declarator-list_opt;

pin-type:
    __input
    __output
    __inout
    __parameter
```

[Example:

```
    __input __uint(1) executeMethod;
    __input __uint(1) methodReady;
}
```

— end example]

For '\_\_parameter' items, supported datatypes include: "const char *", "float", "int".

Factoring of interfaces into sub interfaces is also supported.

#### 4.4.3 Clock/reset ports

#### [atomicc.clockReset]

Note that if interface port pins are declared in a module interface declaration, then CLK and nRST are *not* automatically declared/instantiated, since the user needs the flexibility to not require them when interfacing with legacy code.

Note that this also allows arbitrary signals (like the output of clock generators) to be passed to modules as CLK/nRST signals. (For AtomicC generated modules, please note that the default clock/reset signals for a module will always have these names.)

#### 4.4.4 Import tooling

[atomicc.itool]

There is a tool to automate the creation of AtomicC header files from Verilog source files. [Example:

# Annex A (informative) Scheduling Example [lpmSchedule]

#### A.1 Source program

[lpmExample.sw]

Example from [3, Figure 9]:

```
__interface LpmRequest {
    void enter(IPA x);
};
__module Lpm {
    LpmRequest request;
    BufTicket compBuf;
    Fifo1<IPA> inQ;
    FifoB1<ProcessData> fifo;
    PipeIn<IPA> *outQ;
    LpmMemory mem;
    Lpm() {
        __rule recirc if (!p(mem.ifc.resValue())) {
            auto x = mem.ifc.resValue();
            auto y = fifo.out.first();
            mem.ifc.resAccept();
            mem.ifc.req(compute_addr(x, y.state, y.IPA));
            fifo.out.deq();
            fifo.in.enq(ProcessData{y.ticket, y.IPA, y.state + 1});
        };
        __rule exitr if (p(mem.ifc.resValue()) & !__valid(RULE$recirc)) {
            auto x = mem.ifc.resValue();
            auto y = fifo.out.first();
            mem.ifc.resAccept();
            fifo.out.deq();
            outQ->enq(f1(x,y));
        };
        __rule enter if (!__valid(RULE$recirc)) {
            auto x = inQ.out.first();
            auto ticket = compBuf.tickIfc.getTicket();
            compBuf.tickIfc.allocateTicket();
            inQ.out.deq();
            fifo.in.enq(ProcessData{ticket, static_cast<__uint(16)>(__bitsubstr(x, 15, 0)), 0});
            mem.ifc.req(addr(x));
        };
    };
    void request.enter(IPA x) {
        inQ.in.enq(x);
    }
};
```

#### A.2 Verilog output

[lpmExample.verilog]

```
module Lpm (input wire CLK, input wire nRST,
    input wire request$enter__ENA,
    input wire [31:0]request$enter$x,
    output wire request$enter__RDY,
    output wire outQ$enq__ENA,
    output wire [31:0]outQ$enq$v,
    input wire outQ$enq__RDY);
wire [2:0]RULE$recirc__ENA$agg_2e_tmp$state;
wire [15:0]RULE$recirc__ENA$y$IPA;
wire compBuf$tickIfc$allocateTicket__ENA;
wire compBuf$tickIfc$getTicket;
wire compBuf$tickIfc$getTicket__RDY;
wire [22:0]fifo$in$enq$v;
wire fifo$in$enq__ENA;
wire fifo$in$enq__RDY;
wire fifo$out$deq__ENA;
wire fifo$out$deq__RDY;
wire [22:0]fifo$out$first;
wire fifo$out$first__RDY;
wire inQ$in$enq__RDY;
wire inQ$out$deq__ENA;
wire inQ$out$deq__RDY;
wire [31:0]inQ$out$first;
wire inQ$out$first__RDY;
wire [31:0]mem$ifc$req$v;
wire mem$ifc$req__ENA;
wire mem$ifc$req__RDY;
wire mem$ifc$resAccept__ENA;
wire mem$ifc$resAccept__RDY;
wire [31:0]mem$ifc$resValue;
wire mem$ifc$resValue__RDY;
BufTicket compBuf (.CLK(CLK), .nRST(nRST),
    .tickIfc$allocateTicket__ENA(compBuf$tickIfc$allocateTicket__ENA),
    .tickIfc$allocateTicket__RDY(compBuf$tickIfc$allocateTicket__RDY),
    .tickIfc$getTicket(compBuf$tickIfc$getTicket),
    .tickIfc$getTicket__RDY(compBuf$tickIfc$getTicket__RDY));
Fifo1Base#(32) inQ (.CLK(CLK), .nRST(nRST),
    .in$enq__ENA(request$enter__ENA),
    .in$enq$v(request$enter$x),
    .in$enq__RDY(inQ$in$enq__RDY),
    .out$deq__ENA(inQ$out$deq__ENA),
    .out$deq__RDY(inQ$out$deq__RDY),
    .out$first(inQ$out$first),
    .out$first__RDY(inQ$out$first__RDY));
FifoB1Base#(23) fifo (.CLK(CLK), .nRST(nRST),
    .in$enq__ENA(fifo$in$enq__ENA),
    .in$enq$v(fifo$in$enq$v),
    .in$enq__RDY(fifo$in$enq__RDY),
    .out$deq__ENA(fifo$out$deq__ENA),
    .out$deq__RDY(fifo$out$deq__RDY),
    .out$first(fifo$out$first),
    .out$first__RDY(fifo$out$first__RDY));
LpmMemory mem (.CLK(CLK), .nRST(nRST),
    .ifc$req__ENA(mem$ifc$req__ENA),
    .ifc$req$v(mem$ifc$req$v),
    .ifc$req__RDY(mem$ifc$req__RDY),
    .ifc$resAccept__ENA(mem$ifc$resAccept__ENA),
    .ifc$resAccept__RDY(mem$ifc$resAccept__RDY),
    .ifc$resValue(mem$ifc$resValue),
    .ifc$resValue__RDY(mem$ifc$resValue__RDY));
// There are still ERRORs in some of these conditions
assign compBuf$tickIfc$allocateTicket__ENA = ( !( ( mem$ifc$resValue != 32'd1 )
      & mem$ifc$resValue__RDY & fifo$out$first__RDY & mem$ifc$resAccept__RDY
      & mem$ifc$req__RDY & fifo$out$deq__RDY & fifo$in$enq__RDY ) )
 & inQ$out$first__RDY & compBuf$tickIfc$getTicket__RDY & inQ$out$deq__RDY & fifo$in$enq__RDY & mem$ifc$req__RDY;
assign fifo$in$enq$v = ( ( !( mem$ifc$resValue != 32'd1 )
      & mem$ifc$resValue__RDY & fifo$out$first__RDY
      & mem$ifc$resAccept__RDY & mem$ifc$req__RDY
      & fifo$out$deq__RDY & fifo$in$enq__RDY ) )
      & inQ$out$first__RDY & compBuf$tickIfc$getTicket__RDY
      & compBuf$tickIfc$allocateTicket__RDY
      & inQ$out$deq__RDY & fifo$in$enq__RDY & mem$ifc$req__RDY )
      ? { 3'd0 , inQ$out$first[ 15 : 0 ] , compBuf$tickIfc$getTicket } : 23'd0 )
| ( ( mem$ifc$resValue != 32'd1 ) & mem$ifc$resValue__RDY
      & fifo$out$first__RDY & mem$ifc$resAccept__RDY
      & mem$ifc$req__RDY & fifo$out$deq__RDY & fifo$in$enq__RDY )
      ? { RULE$recirc__ENA$agg_2e_tmp$state , fifo$out$first[ 19 : 4 ] , fifo$out$first[ 3 : 0 ] } : 23'd0 );
assign fifo$in$enq__ENA = ( ( !( mem$ifc$resValue != 32'd1 )
      & mem$ifc$resValue__RDY & fifo$out$first__RDY
      & mem$ifc$resAccept__RDY & mem$ifc$req__RDY & fifo$out$deq__RDY ) )
      & inQ$out$first__RDY & compBuf$tickIfc$getTicket__RDY
      & compBuf$tickIfc$allocateTicket__RDY & inQ$out$deq__RDY & mem$ifc$req__RDY )
| ( ( mem$ifc$resValue != 32'd1 ) & mem$ifc$resValue__RDY
      & fifo$out$first__RDY & mem$ifc$resAccept__RDY & mem$ifc$req__RDY & fifo$out$deq__RDY );
assign fifo$out$deq__ENA = ( ( mem$ifc$resValue == 32'd1 )
      & ( !( ( mem$ifc$resValue != 32'd1 ) & mem$ifc$resValue__RDY
      & fifo$out$first__RDY & mem$ifc$resAccept__RDY
      & mem$ifc$req__RDY & fifo$in$enq__RDY ) )
& mem$ifc$resValue__RDY & fifo$out$first__RDY & mem$ifc$resAccept__RDY & outQ$enq__RDY )
    | ( ( mem$ifc$resValue != 32'd1 ) & mem$ifc$resValue__RDY
         & fifo$out$first__RDY & mem$ifc$resAccept__RDY & mem$ifc$req__RDY & fifo$in$enq__RDY );
    assign inQ$out$deq__ENA = ( !( ( mem$ifc$resValue != 32'd1 )
          & mem$ifc$resValue__RDY & fifo$out$first__RDY &
         mem$ifc$resAccept__RDY & mem$ifc$req__RDY & fifo$out$deq__RDY & fifo$in$enq__RDY ) )
    & inQ$out$first__RDY & compBuf$tickIfc$getTicket__RDY
         & compBuf$tickIfc$allocateTicket__RDY & fifo$in$enq__RDY & mem$ifc$req__RDY;
    assign mem$ifc$req$v = ( ( ( !( mem$ifc$resValue != 32'd1 )
         & mem$ifc$resValue__RDY & fifo$out$first__RDY
         & mem$ifc$resAccept__RDY & mem$ifc$req__RDY & fifo$out$deq__RDY & fifo$in$enq__RDY ) )
         & inQ$out$first__RDY & compBuf$tickIfc$getTicket__RDY
         & compBuf$tickIfc$allocateTicket__RDY & inQ$out$deq__RDY
         & fifo$in$enq__RDY & mem$ifc$req__RDY )
          ? ( 32'd0 + inQ$out$first[ 31 : 16 ] inQ$out$first [ 18446744073709551615 ] ) : 32'd0 )
    | ( ( mem$ifc$resValue != 32'd1 ) & mem$ifc$resValue__RDY &
          fifo$out$first__RDY & mem$ifc$resAccept__RDY
          & mem$ifc$req__RDY & fifo$out$deq__RDY & fifo$in$enq__RDY )
          ? ( ( ( mem$ifc$resValue + fifo$out$first[ 22 : 20 ] ) == 1 )
          ? RULE$recirc__ENA$y$IPA[ 15 : 8 ]
          : RULE$recirc__ENA$y$IPA[7:0])
          : 8'd0);
    assign mem$ifc$req__ENA = ( ( ( !( mem$ifc$resValue != 32'd1 )
         & mem$ifc$resValue__RDY & fifo$out$first__RDY
         & mem$ifc$resAccept__RDY & fifo$out$deq__RDY & fifo$in$enq__RDY ) )
          & inQ$out$first__RDY & compBuf$tickIfc$getTicket__RDY
         & compBuf$tickIfc$allocateTicket__RDY & inQ$out$deq__RDY )
          | ( ( mem$ifc$resValue != 32'd1 ) & mem$ifc$resValue__RDY
         & fifo$out$first__RDY & mem$ifc$resAccept__RDY & fifo$out$deq__RDY ) )
    & fifo$in$enq__RDY;
    assign mem$ifc$resAccept__ENA = ( ( mem$ifc$resValue == 32'd1 )
         & ( !( ( mem$ifc$resValue != 32'd1 ) & mem$ifc$resValue__RDY
          & fifo$out$first__RDY & mem$ifc$req__RDY & fifo$out$deq__RDY & fifo$in$enq__RDY ) )
          & mem$ifc$resValue__RDY & fifo$out$first__RDY & fifo$out$deq__RDY & outQ$enq__RDY )
    | ( ( mem$ifc$resValue != 32'd1 ) & mem$ifc$resValue__RDY & fifo$out$first__RDY
          & mem$ifc$req__RDY & fifo$out$deq__RDY & fifo$in$enq__RDY );
    assign outQ$enq$v = mem$ifc$resValue;
    assign outQ$enq__ENA = ( mem$ifc$resValue == 32'd1 )
    & ( !( ( mem$ifc$resValue != 32'd1 ) & mem$ifc$resValue__RDY & fifo$out$first__RDY
          & mem$ifc$resAccept__RDY & mem$ifc$req__RDY & fifo$out$deq__RDY & fifo$in$enq__RDY ) )
   & mem$ifc$resValue__RDY & fifo$out$first__RDY & mem$ifc$resAccept__RDY & fifo$out$deq__RDY;
    assign request$enter__RDY = inQ$in$enq__RDY;
    // Extra assignments, not to output wires
    assign RULE$recirc__ENA$agg_2e_tmp$state = fifo$out$first[ 22 : 20 ] + 3'd1;
    assign RULE$recirc__ENA$y$IPA = fifo$out$first[ 19 : 4 ];
endmodule
```


# Annex B (informative) Introduction for Programmers [introProg]


#### B.1 Software

[introProg.sw]

In software, the core model is the time-multiplexed execution of software threads by one or more central processing units (CPUs). Address arithmetic (pointers and indexing) prevents the compiler from statically determining the read/write storage element sets for a transaction. The programmer is responsible for preventing the interleaved execution of multiple threads accessing a single storage element by decorating the code with library calls that dynamically enforce mutual exclusion (mutex) regions.

In languages like Java, the programmer is able to decorate the storage element declarations to automate calling of these mutex operations.




#### B.2 Hardware

[introProg.hw]

In hardware, the core model is clock-based updates to state elements from a combinational logic net.

Combinational logic = current output is a boolean combination of current inputs

Sequential logic = combinational logic + memory elements (also called finite-state machine)

Synchronous logic = sequential logic + clock



From Hoe[1], the Term Rewriting System representation of this is:

$s' = \text{if } \pi(s) \text{ then } \delta(s) \text{ else } s$
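This update rule can be written as a small C++ function. The sketch is only an illustration of the formula; `step` is a hypothetical name, and the predicate and next-state function are supplied by the caller.

```cpp
// One clocked update: s' = if pi(s) then delta(s) else s.
// Pi is any callable returning bool; Delta is any callable returning the next state.
template <typename S, typename Pi, typename Delta>
S step(const S &s, Pi pi, Delta delta) {
    if (pi(s))
        return delta(s);  // the predicate holds: apply the state transformation
    return s;             // otherwise the state is held
}
```

For instance, with a predicate "s < 3" and a transformation "s + 1", repeated application counts up to 3 and then holds that value on every later cycle.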

Since all hardware elements are independent, all valid source lines in the program text are executed on every cycle. Access to state elements supports neither pointers nor indexing, allowing the compiler to statically determine parallel access transaction conflict sets, allowing the flagging of all combinations where correct operation cannot be guaranteed.

## Bibliography

- [1] J. C. Hoe, "Operation-Centric Hardware Description and Synthesis," Ph.D. dissertation, MIT, Cambridge, MA, 2000.
- [2] J. C. Hoe and Arvind, "Operation-Centric Hardware Description and Synthesis," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 23, no. 9, September 2004.
- [3] N. Dave, Arvind, and M. Pellauer, "Scheduling as rule composition," in *Proceedings of the 5th IEEE/ACM International Conference on Formal Methods and Models for Codesign*, ser. MEMOCODE '07. Washington, DC, USA: IEEE Computer Society, 2007, pp. 51–60.
- [4] Bluespec Inc., http://www.bluespec.com.
- [5] M. C. Ng, K. E. Fleming, M. Vutukuru, S. Gross, Arvind, and H. Balakrishnan, "Airblue: A system for cross-layer wireless protocol development," in *Proceedings of the 6th ACM/IEEE Symposium on Architectures for Networking and Communications Systems*, ser. ANCS '10. New York, NY, USA: ACM, 2010, pp. 4:1–4:11.
- [6] M. Abbas and V. Betz, "Latency insensitive design styles for FPGAs," in *28th International Conference on Field Programmable Logic and Applications, FPL 2018*, Dublin, Ireland, August 27-31, 2018, pp. 360–367.
- [7] R. S. Nikhil, "Formal specification of BSV's elaboration and dynamic semantics," https://github.com/rsnikhil/Bluespec\_BSV\_Formal\_Semantics, 2015.
- [8] J. Gray and A. Reuter, *Transaction Processing: Concepts and Techniques*. Morgan Kaufmann, 1993.
- [9] L. Lamport, "How to make a multiprocessor computer that correctly executes multiprocess programs," *IEEE Trans. Comput.*, vol. 28, no. 9, pp. 690–691, Sep. 1979.
- [10] B. Liskov and S. Zilles, "Programming with abstract data types," in SIGPLAN Notices, 1974, pp. 50–59.
- [11] N. Dave, "Designing a Reorder Buffer in Bluespec," in *Proceedings of MEMOCODE'04*, San Diego, CA, 2004.
- [12] M. King, J. Hicks, and J. Ankcorn, "Software-driven hardware development," in *Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays*. ACM, 2015, pp. 13–22.
- [13] C. Fletcher, "EECS150: Interfaces: 'FIFO' (a.k.a. ready/valid)," https://inst.eecs.berkeley.edu/~cs150/Documents/Interfaces.pdf, 2009.
- [14] ARM Ltd., "AMBA AXI and ACE protocol specification," https://developer.arm.com/docs/ihi0022/d/amba-axi-and-ace-protocol-specification-axi3-axi4-and-axi4-lite-ace-and-ace-lite, 2011.
- [15] M. T. Özsu and P. Valduriez, *Principles of Distributed Database Systems, Third Edition*. Springer, 2011.
- [16] D. Rosenkrantz, R. Stearns, and P. Lewis II, "Consistency and serializability in concurrent database systems," *SIAM Journal on Computing*, vol. 13, no. 3, pp. 508–530, 1984.
- [17] H. W. Cain, M. H. Lipasti, and R. Nair, "Constraint graph analysis of multithreaded programs," in *Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques*, ser. PACT '03. Washington, DC, USA: IEEE Computer Society, 2003, pp. 4–.
- [18] T. Esposito, M. Lis, R. Nanavati, J. Stoy, and J. Schwartz, "System and method for scheduling TRS rules," United States Patent US 133051-0001, February 2005.
- [19] D. L. Rosenband and Arvind, "Hardware Synthesis from Guarded Atomic Actions with Performance Specifications," in *Proceedings of ICCAD'05*, San Jose, CA, 2005.