Expressions

Eyck Jentzsch edited this page Apr 11, 2023 · 46 revisions

General

CoreDSL 2 inherits the expression syntax and operator precedence from the C language. Below, we introduce new operators and adapt the type rules for the arbitrary-precision integer (APInt) types.

Please also note the discussion of the effects on the associativity of APInt operations.

New operators

Bit access

We reuse the normal subscript operator [ ] to denote a single bit access to a scalar value. For a value of width w, index 0 refers to the least-significant bit; index w-1 to the most-significant bit. The index expression must be a constant expression producing an unsigned value strictly less than the width of the scalar value.

The operator always yields an lvalue/rvalue of type unsigned<1>.

Examples:

unsigned<5> x = 5'b10101;
x[1] = x[4]; // x == 5'b10111
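
To make the indexing convention concrete, the following Python sketch models single-bit reads and writes on a plain integer. The helper names get_bit and set_bit are hypothetical and exist only for this illustration; they are not part of CoreDSL.

```python
def get_bit(value: int, index: int, width: int) -> int:
    """Read bit `index` of a `width`-bit unsigned value (index 0 is the LSB)."""
    if not 0 <= index < width:
        raise IndexError(f"bit index {index} out of range for width {width}")
    return (value >> index) & 1


def set_bit(value: int, index: int, bit: int, width: int) -> int:
    """Return `value` with bit `index` replaced by `bit` (a 1-bit value)."""
    if not 0 <= index < width:
        raise IndexError(f"bit index {index} out of range for width {width}")
    cleared = value & ~(1 << index)
    return cleared | ((bit & 1) << index)


x = 0b10101                             # unsigned<5> x = 5'b10101;
x = set_bit(x, 1, get_bit(x, 4, 5), 5)  # x[1] = x[4];
print(format(x, "05b"))                 # prints 10111
```

The range check mirrors the requirement that the index is strictly less than the width; in CoreDSL this is a compile-time constraint, whereas the sketch can only check it at run time.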

Range operator

We introduce a new operator at the same precedence level as the subscript operator, and with the following syntax:

PostfixExpression ::= '[' Expression ':' Expression ']'

The operator represents a range denoted by colon-separated operands from and to. The size of the range must be compile-time evaluable. To that end, the operands must adhere to one of the following cases:

  • Both operands are constant expressions, i.e. they consist only of parameters and literals. Then size = |from - to| + 1.
  • from is an identifier, and to is of the form from ('+'|'-') offset, where offset is a constant expression. Then size = |offset| + 1.
  • to is an identifier, and from is of the form to ('+'|'-') offset, where offset is a constant expression. Then size = |offset| + 1.

The operator is applicable to address space references, and expressions evaluating to a scalar value. The semantics and return types for both variants are detailed below. Note that the syntax can also be used to produce lvalues for assignments.

Ranged address space access

Let AS denote an address space of elements with width w. Then AS[from : to] retrieves and concatenates size-many consecutive elements from the address space, returning a value of type unsigned<size * w>. The element at index from contributes the most significant bits to the result, whereas the element at index to contributes the least significant bits. In other words:

  • If from > to: Little-endian
  • If from < to: Big-endian
  • If from == to: Single-element access, equivalent to AS[from]

If the range contains out-of-bounds elements for the given address space, the result is undefined.

Examples

// architectural state
extern unsigned<8> MEM[33'd1 << 32];

// instruction behavior
unsigned<16> load_halfword_LE = MEM[addr+1 : addr];
unsigned<32> load_word_BE     = MEM[addr : addr+3];
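
The element ordering can be sketched in Python as follows. This is only an illustrative model: the function read_range, the dictionary standing in for MEM, and the sample byte values are assumptions made for the example.

```python
def read_range(mem, frm: int, to: int, elem_width: int = 8) -> int:
    """Concatenate mem[frm] ... mem[to]; mem[frm] supplies the most significant bits."""
    step = 1 if frm <= to else -1
    result = 0
    for index in range(frm, to + step, step):
        # Each further element is appended below the bits collected so far.
        result = (result << elem_width) | mem[index]
    return result


# Hypothetical memory contents: four consecutive bytes starting at addr.
addr = 0x100
mem = {0x100: 0x78, 0x101: 0x56, 0x102: 0x34, 0x103: 0x12}

load_halfword_le = read_range(mem, addr + 1, addr)  # MEM[addr+1 : addr]
load_word_be = read_range(mem, addr, addr + 3)      # MEM[addr : addr+3]
print(hex(load_halfword_le), hex(load_word_be))     # prints 0x5678 0x78563412
```

With from > to, the byte at the lowest address ends up in the least significant position (little-endian); with from < to, it ends up in the most significant position (big-endian).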

Ranged bit access

Let val be a scalar value with width w. Then val[from : to] extracts size-many bits from val and returns an unsigned<size> value. In all cases, val[from] becomes the most significant bit of the result.

  • If from > to: concatenation of bits val[from], ..., val[to]
  • If from < to: concatenation of bits val[from], ..., val[to], i.e. the bit order is reversed
  • If from == to: equivalent to val[from]

If the range contains out-of-bounds bits for the given value, the result is undefined.

Examples:

unsigned<5> x; unsigned<3> y; unsigned<4> z;
x      = 5'b11000;
x[1:0] = x[4:3];   // x == 5'b11011
y      = x[0:2];   // y ==   3'b110
for (x = 31; x > 3; x -= 4)
  z = 32'hDEADBEEF[x:x-3];
  // z == 0xD, 0xE, 0xA, ...
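
The examples above can be reproduced with a small Python model. It assumes, consistent with y == 3'b110 in the example, that val[from] always ends up as the most significant bit of the result. The function name bit_range is hypothetical.

```python
def bit_range(value: int, frm: int, to: int) -> int:
    """Model of val[frm : to]: bit `frm` becomes the MSB of the result."""
    step = 1 if frm <= to else -1
    result = 0
    for index in range(frm, to + step, step):
        result = (result << 1) | ((value >> index) & 1)
    return result


x = 0b11000
# x[1:0] = x[4:3]: replace the two low bits with the two high bits.
x = (x & ~0b11) | bit_range(x, 4, 3)
y = bit_range(x, 0, 2)  # reversed order: x[0], x[1], x[2]
print(format(x, "05b"), format(y, "03b"))  # prints 11011 110

# The loop from the example: x = 31, 27, ..., 7 selects one nibble per step.
nibbles = [bit_range(0xDEADBEEF, hi, hi - 3) for hi in range(31, 3, -4)]
print([hex(n) for n in nibbles])
```

Note that the loop condition x > 3 stops before the lowest nibble is extracted, so the list holds seven values.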

Concatenation

The new concatenation operator :: fits in, precedence-wise, between the bitwise OR and the logical AND operators:

ConcatExpression     ::= InclusiveOrExpression
                      |  ConcatExpression '::' InclusiveOrExpression
LogicalAndExpression ::= ConcatExpression
                      |  LogicalAndExpression '&&' ConcatExpression

Given an application of the operator E1 :: E2, with E1 producing a value of w1 bits and E2 producing a value of w2 bits, the result is a value of type unsigned<w1 + w2>, with the w1 most-significant bits corresponding to E1, and the w2 least-significant bits corresponding to E2.

Examples:

unsigned<32> ieee_minus_one = 1 :: 8'd127 :: 23'b0;
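
As a rough cross-check of the example, the following Python sketch models concatenation on (value, width) pairs and decodes the resulting bit pattern as an IEEE 754 single-precision number. The helper concat is hypothetical, and the literal 1 is assumed to have a width of one bit.

```python
import struct


def concat(*parts):
    """Concatenate (value, width) pairs; the first pair supplies the MSBs."""
    result, total_width = 0, 0
    for value, width in parts:
        result = (result << width) | (value & ((1 << width) - 1))
        total_width += width
    return result, total_width


# 1 :: 8'd127 :: 23'b0  (sign bit, biased exponent, mantissa)
bits, width = concat((1, 1), (127, 8), (0, 23))
as_float = struct.unpack(">f", bits.to_bytes(4, "big"))[0]
print(hex(bits), width, as_float)  # prints 0xbf800000 32 -1.0
```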

Arithmetic type rules

The basic idea is that the result type is chosen, based on the operand type(s), such that no precision or sign information is lost. The operands are converted to that result type first.

We use the following declarations when presenting the type rules:

unsigned<w1> u1; unsigned<w2> u2;
  signed<w1> s1;   signed<w2> s2;

Unary operations

Expression   Result type
-u1          signed<wr>, wr = w1 + 1
-s1          signed<wr>, wr = w1 + 1
~u1          unsigned<wr>, wr = w1
~s1          signed<wr>, wr = w1
!u1          unsigned<1>
!s1          unsigned<1>

Addition

Expression   Result type
u1 + u2      unsigned<wr>, wr = max(w1, w2) + 1
s1 + s2      signed<wr>, wr = max(w1, w2) + 1
s1 + u2      signed<wr>, wr = max(w1, w2 + 1) + 1
u1 + s2      signed<wr>, wr = max(w1 + 1, w2) + 1
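
The claim that these widths never lose information can be checked by brute force. The Python sketch below encodes the addition rules and verifies, for small widths, that the extreme sums fit into the result type. All function names are hypothetical, and a type is modeled as a (signed, width) pair.

```python
from itertools import product


def value_range(signed: bool, width: int):
    """Smallest and largest value representable in the given type."""
    if signed:
        return -(1 << (width - 1)), (1 << (width - 1)) - 1
    return 0, (1 << width) - 1


def add_result_type(signed1: bool, w1: int, signed2: bool, w2: int):
    """Result type of an addition according to the table above."""
    if not signed1 and not signed2:
        return False, max(w1, w2) + 1
    # An unsigned operand needs one extra bit to be represented as signed.
    eff1 = w1 if signed1 else w1 + 1
    eff2 = w2 if signed2 else w2 + 1
    return True, max(eff1, eff2) + 1


def sums_always_fit(max_width: int = 6) -> bool:
    """Check that the extreme sums fit for all widths up to max_width."""
    for s1, s2 in product([False, True], repeat=2):
        for w1, w2 in product(range(1, max_width + 1), repeat=2):
            lo1, hi1 = value_range(s1, w1)
            lo2, hi2 = value_range(s2, w2)
            lo, hi = value_range(*add_result_type(s1, w1, s2, w2))
            if not (lo <= lo1 + lo2 and hi1 + hi2 <= hi):
                return False
    return True


print(add_result_type(True, 8, False, 8))  # prints (True, 10)
print(sums_always_fit())                   # prints True
```

Checking only the extreme values is sufficient here because addition is monotonic in both operands.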

Subtraction

Expression   Result type
u1 - u2      signed<wr>, wr = max(w1 + 1, w2 + 1)
s1 - s2      signed<wr>, wr = max(w1 + 1, w2 + 1)
s1 - u2      signed<wr>, wr = max(w1, w2 + 1) + 1
u1 - s2      signed<wr>, wr = max(w1 + 1, w2) + 1

Multiplication

Expression   Result type
u1 * u2      unsigned<wr>, wr = w1 + w2
s1 * s2      signed<wr>, wr = w1 + w2
s1 * u2      signed<wr>, wr = w1 + w2
u1 * s2      signed<wr>, wr = w1 + w2

Division

Expression   Result type
u1 / u2      unsigned<wr>, wr = w1
s1 / s2      signed<wr>, wr = w1 + 1
s1 / u2      signed<wr>, wr = w1
u1 / s2      signed<wr>, wr = w1 + 1

Modulus/Remainder

Expression   Result type
u1 % u2      unsigned<wr>, wr = min(w1, w2)
s1 % s2      signed<wr>, wr = min(w1, w2)
s1 % u2      signed<wr>, wr = min(w1, w2 + 1)
u1 % s2      unsigned<wr>, wr = min(w1, max(1, w2 - 1))

The % operator is defined (as in C) to satisfy the following formula, where a/b denotes the integer quotient truncated toward zero:

      a = (a/b) * b + a%b
<=> a%b = a - (a/b) * b

As a signed<1> divisor can only be -1 or 0, the only non-error outcome in the last case is 0. To retain consistency with the type rules for the literal 0, the result type is special-cased to unsigned<1>.
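
The following Python sketch illustrates the remainder under the assumption that division truncates toward zero, as it does in C. Python's own // and % operators round toward negative infinity, so the sketch defines hypothetical helpers trunc_div and trunc_rem instead of using them directly.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer quotient truncated toward zero (C semantics)."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


def trunc_rem(a: int, b: int) -> int:
    """Remainder defined by a == trunc_div(a, b) * b + trunc_rem(a, b)."""
    return a - trunc_div(a, b) * b


for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    # The remainder takes the sign of the dividend a.
    print(a, b, trunc_div(a, b), trunc_rem(a, b))

# A signed<1> divisor can only be -1 (0 is an error), so the remainder is 0.
remainders_mod_minus_one = {trunc_rem(a, -1) for a in range(16)}
print(remainders_mod_minus_one)  # prints {0}
```

Under this assumption the remainder is never negative when the dividend is unsigned, which matches the unsigned result type given for u1 % s2.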

Bitwise AND, OR, XOR

Let X be either &, | or ^.

Expression   Result type
u1 X u2      unsigned<wr>, wr = max(w1, w2)
s1 X s2      signed<wr>, wr = max(w1, w2)
s1 X u2      signed<wr>, wr = max(w1, w2)
u1 X s2      signed<wr>, wr = max(w1, w2)

Note that differently-sized operands are allowed and subject to the usual casting rules, mostly for consistency with the other arithmetic operations.

Shift operations

Let X be either << or >>.

Expression   Result type
u1 X u2      unsigned<wr>, wr = w1
s1 X s2      signed<wr>, wr = w1
s1 X u2      signed<wr>, wr = w1
u1 X s2      unsigned<wr>, wr = w1

Right shifts (>>) are arithmetic right shifts if the left operand is signed, and logical right shifts otherwise. The shift amount is not wrapped or truncated: if it exceeds the bit-size of the left operand, all of the left operand's bits are shifted "out of" the result value. If the shift amount is negative, the shift direction is flipped, e.g. x << (-k) is equivalent to x >> k, and vice versa.
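
A possible Python model of these shift rules is sketched below. The helpers wrap and shift are hypothetical; signed values are represented as ordinary (possibly negative) Python integers, and the result always keeps the width and signedness of the left operand.

```python
def wrap(value: int, width: int, signed: bool) -> int:
    """Reduce `value` to `width` bits and reinterpret the bit pattern."""
    value &= (1 << width) - 1
    if signed and value >> (width - 1):
        value -= 1 << width
    return value


def shift(value: int, amount: int, width: int, signed: bool, left: bool) -> int:
    """Shift a `width`-bit value; a negative amount flips the direction."""
    if amount < 0:
        left, amount = not left, -amount
    if left:
        return wrap(value << amount, width, signed)
    # Python's >> is arithmetic for negative ints and logical otherwise.
    return wrap(value >> amount, width, signed)


print(shift(0b1011, 1, 4, signed=False, left=True))    # prints 6
print(shift(-8, 1, 4, signed=True, left=False))        # prints -4
print(shift(0b1011, 8, 4, signed=False, left=False))   # prints 0
print(shift(0b1011, -2, 4, signed=False, left=True))   # prints 2
```

The third call shifts by more than the operand width and yields 0; the fourth call shows a left shift by -2 behaving like a right shift by 2.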

Logical AND, OR

The logical operators produce values of type bool, i.e. unsigned<1>.

Comparisons

The comparison operators ==, !=, <, >, <=, >= are defined for all integer types and always produce a bool (unsigned<1>) value. They perform a comparison based on the value represented by the operands, and do not take the operand types into account. For example, 3'sd1 == 17'b1 shall return true.

Conditional operator

The conditional operator, cond ? trueExpr : falseExpr, expects cond to be an unsigned<1> value. Its result type is the smallest type to which trueExpr and falseExpr can be implicitly cast.

Assignment

For simple assignments x = expr, the casting rules hold.

Combined assignments x op= expr for variable x of type T are valid iff expr is implicitly convertible to T. They are then evaluated as x = (T) (x op (expr)). Note the explicit cast, which means that the result of x op (expr) may be silently truncated.

unsigned<10> x = ...;
x += 10'd1; // OK (unsigned<11> intermediate value is truncated)
x += 11'd1; // Error! Cannot assign an 11-bit value to `x` even without considering the operation
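
The two steps, the convertibility check followed by the truncating cast, can be sketched in Python for the unsigned case. The function compound_add is hypothetical and the check is simplified to comparing widths, which is an assumption about what implicit convertibility means between unsigned types.

```python
def compound_add(x: int, x_width: int, expr: int, expr_width: int) -> int:
    """Model of `x += expr` for unsigned operands of the given widths."""
    if expr_width > x_width:
        # expr is not implicitly convertible to the type of x.
        raise TypeError(f"cannot convert unsigned<{expr_width}> to unsigned<{x_width}>")
    # x + expr has width max(x_width, expr_width) + 1; the cast truncates it.
    return (x + expr) & ((1 << x_width) - 1)


x = 1023                         # largest unsigned<10> value
x = compound_add(x, 10, 1, 10)   # x += 10'd1;
print(x)                         # prints 0

try:
    compound_add(x, 10, 1, 11)   # x += 11'd1;
except TypeError as err:
    print("rejected:", err)
```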

Post/Pre Increment/Decrement

These operators yield the same type as the variable they are applied to. Implicit truncation may occur.

Intrinsics

The following built-in declarations provide auxiliary functions that do not warrant a dedicated syntax.

__static_assert

__static_assert(expr) throws an error if expr cannot be evaluated to a non-zero integer value at compile time. This intrinsic can only be used inside an architectural_state section. The canonical use case is to allow ISA extensions to impose constraints on the elaborated values of parameters declared further up in the hierarchy, e.g.:

InstructionSet BASE {
  architectural_state {
    unsigned int XLEN;
    ...
  }
  ...
}

InstructionSet MY_EXT extends BASE {
  architectural_state {
    // this extension can only be added to 32-bit cores
    __static_assert(XLEN == 32); 
    ...
  }
  ...
}

Core A provides BASE, MY_EXT {
  architectural_state { XLEN = 32; } // OK!
}

Core B provides BASE, MY_EXT {
  architectural_state { XLEN = 64; } // Can't use MY_EXT here.
}

bitsizeof, sizeof

bitsizeof(T) returns the minimum number of bits to store a value of type T, i.e. without considering padding or alignment. For an expression E that evaluates to a value of type T', bitsizeof(E) is defined as bitsizeof(T'). For a type or expression X, sizeof(X) is defined as (bitsizeof(X) + 7) / 8. Both intrinsics return compile-time constants, using the unsigned type with the minimal width required to represent the value.

bitsizeof(signed<17>)                            // = 17
bitsizeof(struct {unsigned<1> b; signed<2> c; }) // =  3
bitsizeof(3 + 4 + 5)                             // =  5
sizeof(unsigned<42>)                             // =  6
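
The arithmetic behind these results can be replayed in Python. The sketch assumes that an unsigned literal has the minimal width needed to represent its value, which is consistent with bitsizeof(3 + 4 + 5) being 5; the helper names are hypothetical.

```python
def sizeof_from_bits(bitsize: int) -> int:
    """sizeof(X) = (bitsizeof(X) + 7) / 8, i.e. the bit size rounded up to bytes."""
    return (bitsize + 7) // 8


def minimal_unsigned_width(value: int) -> int:
    """Width of the smallest unsigned type that can hold `value`."""
    return max(1, value.bit_length())


def unsigned_add_width(w1: int, w2: int) -> int:
    """Width rule for u1 + u2."""
    return max(w1, w2) + 1


# bitsizeof(3 + 4 + 5): the literals are 2, 3 and 3 bits wide.
w_sum = unsigned_add_width(
    unsigned_add_width(minimal_unsigned_width(3), minimal_unsigned_width(4)),
    minimal_unsigned_width(5),
)
struct_bits = 1 + 2  # struct { unsigned<1> b; signed<2> c; }, no padding
print(w_sum, struct_bits, sizeof_from_bits(42))  # prints 5 3 6

# The constant 17 returned by bitsizeof(signed<17>) itself needs 5 bits.
print(minimal_unsigned_width(17))  # prints 5
```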

offsetof and bitoffsetof are reserved for future use.

__encoding_size (DRAFT)

In an instruction's behavior, __encoding_size is a compile-time constant denoting the number of bits in the instruction's encoding: specification, using the unsigned type with the minimal width required to represent the value.

In an always-block, __encoding_size represents the width of the last instruction word that was fetched prior to the execution of the always block. The intrinsic's return type is unsigned<16>.

The canonical example for using __encoding_size is modelling the implicit PC increment for a RISC-V core with support for compressed instructions:

always {
  implicit_pc {
    // to be overridden by branch instructions and other always-blocks
    PC += __encoding_size >> 3;
  }
}
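
The effect of this increment can be sketched in Python as follows. The function next_pc and the 32-bit PC width are assumptions made for the illustration; the encoding sizes of 16 and 32 bits correspond to compressed and regular RISC-V instructions.

```python
def next_pc(pc: int, encoding_size: int, pc_width: int = 32) -> int:
    """Model of `PC += __encoding_size >> 3` for a pc_width-bit PC."""
    increment = encoding_size >> 3  # convert bits to bytes
    # The combined assignment truncates the result to the width of PC.
    return (pc + increment) & ((1 << pc_width) - 1)


pc = 0x1000
pc = next_pc(pc, 16)  # 16-bit compressed instruction: PC advances by 2
pc = next_pc(pc, 32)  # 32-bit instruction: PC advances by 4
print(hex(pc))        # prints 0x1006
```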