In [5]:
from tutorial_utils import magma_to_verilog_string, smt_to_smtlib_string

import hwtypes as hw
import magma as m

import peak
from peak import Peak  # the base class of Peak circuits
from peak import family_closure

In the previous section, we introduced hwtypes and showed how it can be used to meta-program higher-order functions in Python. In this section, we introduce PEak, which extends the expression language of hwtypes.

The high-level goal of PEak is to create a single source of truth which functions as a functional model, a formal specification, and as RTL.

It should be noted that the formal model generated by PEak cannot be used to verify the RTL generated by PEak.  Showing equivalence between the formal model and the RTL would simply show that the PEak compiler's back-ends for SMT and RTL are consistent. Such a check could be useful for finding bugs in the PEak compiler, but not for verifying the RTL.

PEak circuits are defined in a Python class.  Similar to hwtypes, a PEak program can be executed in pure Python, symbolically executed to produce SMT formulas, or used to generate a circuit with Magma.  PEak circuits declare subcomponents in their `__init__` method and define their behavior in their `__call__` method.  The aim is to match the semantics of a "normal" Python program.

In the following example, we demonstrate the type of program we aim to be able to write.  First, we define an `ALU` class which performs either an add or a multiply on two data inputs (`in_0`, `in_1`) and is controlled by a single bit `op`.  Next, we define a `PE` class which contains 3 `ALU`s.  The `PE` has 4 data inputs (`in_0`, ..., `in_3`) and a 3-bit control signal (`ops`), which controls the `ALU`s.

In [2]:
BV = hw.BitVector
DataT = BV[8]
Bit = hw.Bit

class ALU:
    def __call__(self, op: Bit, in_0: DataT, in_1: DataT) -> DataT:
        if op:
            return in_0 + in_1
        else:
            return in_0 * in_1
        
class PE:
    def __init__(self):
        self.alu_0 = ALU()
        self.alu_1 = ALU()
        self.alu_2 = ALU()
    
    def __call__(self, 
                 ops: BV[3],
                 in_0: DataT,
                 in_1: DataT,
                 in_2: DataT,
                 in_3: DataT,
                ) -> DataT:
        res_0 = self.alu_0(ops[0], in_0, in_1)
        res_1 = self.alu_1(ops[1], in_2, in_3)
        return self.alu_2(ops[2], res_0, res_1)

pe = PE()
s =  pe(hw.BitVector[3](0b101), DataT(1), DataT(2), DataT(3), DataT(4))
assert s == (1+2)+(3*4)
print(repr(s))

BitVector[8](15)


The above will work with Python values (as shown); however, as written, this code cannot generate SMT or RTL.  This is because it is still fundamentally a hwtypes program and is hence subject to the restrictions described in the previous section, i.e., an `if` statement can only be evaluated on constant values, and RTL requires a Magma wrapper circuit.  PEak removes these restrictions by compiling `if` statements into `ite`s and automatically generating a wrapper circuit. To invoke the PEak compiler, some boilerplate must be added.  Below, we show how to extend the above example using PEak (code points of interest are labeled with comments `# k`).

In [3]:
@family_closure(peak.family) # 1
def closure(family): # 2
    BV = family.BitVector #
    DataT = BV[8]         # 3
    Bit = family.Bit      #
    
    @family.compile(locals(), globals()) # 4
    class ALU(Peak): # 5
        def __call__(self, op: Bit, in_0: DataT, in_1: DataT) -> DataT: # 6
            if op:
                return in_0 + in_1
            else:
                return in_0 * in_1
            
    @family.compile(locals(), globals()) # 4
    class PE(Peak): # 5
        def __init__(self):
            self.alu_0 = ALU()
            self.alu_1 = ALU()
            self.alu_2 = ALU()

        def __call__(self, 
                     ops: BV[3],  #
                     in_0: DataT, #
                     in_1: DataT, #
                     in_2: DataT, # 6
                     in_3: DataT, #
                    ) -> DataT:   # 
            res_0 = self.alu_0(ops[0], in_0, in_1)
            res_1 = self.alu_1(ops[1], in_2, in_3)
            return self.alu_2(ops[2], res_0, res_1)
    
    return PE

For the moment, let us ignore the `family_closure` decorator (`# 1`).  We explain it last, as understanding its behavior and utility is difficult without first understanding the rest of the code. 

The first piece of boilerplate is the construction of a closure (`# 2`) over the various interpretations (e.g., Python, SMT, Magma).  This closure takes a single argument, which is a *family* object.  A family object provides an implementation of the core `Bit` and `BitVector` types.  It must also provide an implementation of a register type and Abstract Data Types (ADTs).  We explain the latter two later.  Additionally, each family defines a specific compilation flow, as we describe below.

Recall that in Section 1, we defined `bounded_factorial` as follows: 
```Python
PyDataT = hw.BitVector[8]
SmtDataT = hw.SMTBitVector[8]
MagmaDataT = m.Bits[8]

def bounded_factorial(x):
    if not isinstance(x, hw.AbstractBitVector):
        raise TypeError()
    T = type(x)
    ...
```
We needed to dynamically get the type of `x` to allow us to construct constants with the proper type, e.g., `T(1)`.  In a PEak program, we access these type constructors through the family object and hence avoid the need for such dynamic inspection.  In PEak we would write:

```Python
@family_closure(peak.family)
def closure(family):
    T = family.Unsigned[8]
    MAX_UINT = 2**T.size - 1
    
    def inner(x, ctr):
        if ctr == 0:
            return T(1)
        else:
            return (x <= 1).ite(
                T(1),
                x * inner(x - 1, ctr - 1),
            )
        
    def bounded_factorial(x):
        return inner(x, MAX_UINT)
    
    return bounded_factorial
```

This is not a large win in terms of code size, but it is significantly more natural code to write.  Further, one may not have direct access to a value of the desired type, e.g., when casting between signed and unsigned. 
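To make the role of the family object concrete, here is a minimal pure-Python sketch. `ToyUInt8`, `ToyPyFamily`, and `increment` are hypothetical stand-ins invented for illustration, not PEak or hwtypes classes. The point is only that the closure body obtains its type constructor from `family` rather than from `type(x)`.

```python
class ToyUInt8:
    """Hypothetical 8-bit unsigned value (a stand-in, not hwtypes.BitVector)."""

    def __init__(self, value):
        self.value = int(value) % 256  # wrap around like an 8-bit register

    def __add__(self, other):
        return ToyUInt8(self.value + other.value)

    def __eq__(self, other):
        return isinstance(other, ToyUInt8) and self.value == other.value

    def __repr__(self):
        return f"ToyUInt8({self.value})"


class ToyPyFamily:
    """Hypothetical family: a namespace of type constructors."""
    Unsigned8 = ToyUInt8


def closure(family):
    # The constructor comes from the family, so no type(x) inspection is needed.
    T = family.Unsigned8

    def increment(x):
        return x + T(1)

    return increment


increment = closure(ToyPyFamily())
result = increment(ToyUInt8(255))
print(result)  # ToyUInt8(0)
```

Swapping in a different family object would change which constructor `T` refers to without touching the body of `increment`.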

`family.compile` (`# 4`) invokes the PEak compiler, passing the current namespace to the compiler with `locals(), globals()`.  Each family defines its own compilation flow. By having specialized compilation flows, we allow the SMT family to rewrite `if` statements to `ite`s, while the Magma family rewrites the `if`s and wraps the resulting hwtypes program in a circuit.  While the full details of the rewrites used by the SMT and Magma implementations are quite complex, they are fairly straightforward for simple examples.  For example, the body of the `ALU`'s `__call__` method would be rewritten to a hwtypes program semantically equivalent to the following:
```Python
_cond_0 = op
_return_0 = in_0 + in_1  # value of the `if` branch
_return_1 = in_0 * in_1  # value of the `else` branch
return _cond_0.ite(_return_0, _return_1)
```
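The effect of this rewrite can be illustrated with a small pure-Python sketch. `SymBit` is a hypothetical symbolic bit invented here, and expressions are plain strings in an SMT-LIB-like notation; none of this is PEak's or hwtypes' actual implementation. The `if` form fails on a symbolic condition, while the `ite` form computes both branches and builds a select expression.

```python
class SymBit:
    """Hypothetical symbolic bit: it has a name but no concrete truth value."""

    def __init__(self, name):
        self.name = name

    def __bool__(self):
        # Mirrors the restriction described earlier: a symbolic condition
        # cannot drive a Python `if`.
        raise TypeError(f"cannot branch on symbolic bit {self.name!r}")

    def ite(self, t_branch, f_branch):
        # Build an expression instead of choosing a branch.
        return f"(ite {self.name} {t_branch} {f_branch})"


def alu_with_if(op, in_0, in_1):
    if op:
        return f"(bvadd {in_0} {in_1})"
    else:
        return f"(bvmul {in_0} {in_1})"


def alu_with_ite(op, in_0, in_1):
    _cond_0 = op
    _return_0 = f"(bvadd {in_0} {in_1})"
    _return_1 = f"(bvmul {in_0} {in_1})"
    return _cond_0.ite(_return_0, _return_1)


op = SymBit("op")
try:
    alu_with_if(op, "a", "b")
    if_form_ok = True
except TypeError:
    if_form_ok = False

expr = alu_with_ite(op, "a", "b")
print(if_form_ok)  # False
print(expr)        # (ite op (bvadd a b) (bvmul a b))
```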

PEak circuits should inherit from the `Peak` class (`# 5`).  This enables the *PEak protocol*.  The PEak protocol allows types to define how they behave when being read or written.  We will demonstrate a use of this later when we introduce registers.

It is important to note that the type annotations on the `__call__` method (`# 6`) are *not* optional.  The PEak compiler uses the annotations to generate ports in a Magma context.
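As a rough illustration of why the annotations are required, the sketch below derives a port list from the annotations on `__call__` using the standard `inspect` module. `ToyALU` and `ports_from_annotations` are hypothetical names, plain Python types stand in for hwtypes types, and this is not how the PEak compiler is actually implemented.

```python
import inspect


class ToyALU:
    def __call__(self, op: bool, in_0: int, in_1: int) -> int:
        return in_0 + in_1 if op else in_0 * in_1


def ports_from_annotations(cls):
    """Return (inputs, output) derived from the annotations on __call__."""
    sig = inspect.signature(cls.__call__)
    inputs = {}
    for name, param in sig.parameters.items():
        if name == "self":
            continue
        if param.annotation is inspect.Parameter.empty:
            # Without an annotation there is no way to size the port.
            raise TypeError(f"input {name!r} has no type annotation")
        inputs[name] = param.annotation
    if sig.return_annotation is inspect.Signature.empty:
        raise TypeError("__call__ has no return annotation")
    return inputs, sig.return_annotation


inputs, output = ports_from_annotations(ToyALU)
print(inputs)
print(output)
```

A method with a missing annotation would raise `TypeError` here, which is the toy analogue of the compiler being unable to generate a port.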

The notion of family is overloaded in PEak, with the base families for the python, Magma, and SMT implementations being defined in the module `peak.family`.  This module also defines a *family group*; a family group is an object (typically a module) with attributes `PyFamily`, `SMTFamily`, and `MagmaFamily`.  Each of these attributes defines the type of a family within the family group.  The purpose of a family group is to allow uniform access to types whose implementation may differ between interpretations.  While the default family group simply provides type constructors for the primitive types, a family group may provide implementations of complex modules such as memories or floating point units.

The `family_closure` decorator (`# 1`) takes a family group as a parameter.  This parameter associates the decorated closure with a specific family group. This association allows the family closure to provide a shortcut using convenient syntax for calling the closure:

```Python
closure.Py == closure(family_group.PyFamily())
closure.SMT == closure(family_group.SMTFamily())
closure.Magma == closure(family_group.MagmaFamily())
```

This may seem inconsequential, but it is very convenient for programmatic manipulation of PEak programs, as one can invoke the desired interpretation without knowledge of the specific families used.
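One way such shortcuts could be provided is sketched below in pure Python. `toy_family_closure`, `ToyFamilyClosure`, and the toy families are hypothetical and only mimic the behavior described above; PEak's real `family_closure` may be implemented quite differently. The Magma shortcut is omitted for brevity.

```python
class ToyPyFamily:
    name = "py"


class ToySMTFamily:
    name = "smt"


class toy_family_group:
    """Hypothetical family group: any object with these attributes would do."""
    PyFamily = ToyPyFamily
    SMTFamily = ToySMTFamily


class ToyFamilyClosure:
    """Hypothetical wrapper pairing a closure with a family group."""

    def __init__(self, family_group, fn):
        self.family_group = family_group
        self.fn = fn

    def __call__(self, family):
        return self.fn(family)

    @property
    def Py(self):
        return self(self.family_group.PyFamily())

    @property
    def SMT(self):
        return self(self.family_group.SMTFamily())


def toy_family_closure(family_group):
    def decorator(fn):
        return ToyFamilyClosure(family_group, fn)
    return decorator


@toy_family_closure(toy_family_group)
def closure(family):
    # A real closure would return a compiled class; a string keeps this short.
    return f"PE built for the {family.name} family"


print(closure.Py)   # PE built for the py family
print(closure.SMT)  # PE built for the smt family
```

Code that receives `closure` can ask for `closure.SMT` without importing or naming any family class itself.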
  
As a convenience when using the base family group (`peak.family`), one may omit the family group parameter, e.g., the above example could have been written as:

```Python
@family_closure # 1
def closure(family): # 2
    ...
```

Prior to the development of families, developers were forced to manually wrap their code in boilerplate to achieve the same behavior in a much less ergonomic way. Below is a small example of what that looked like.

```Python
COMPILE_TARGET = 'magma'

...

if COMPILE_TARGET == 'python':
    BV = hw.BitVector
elif COMPILE_TARGET == 'magma':
    BV = m.Bits
else:
    assert COMPILE_TARGET == 'smt'
    BV = hw.SMTBitVector

class PE:
    def __call__(self):
        ...  # code that uses BV

if COMPILE_TARGET == 'magma':
    PE = magma_compile(PE) # invoke magma compilation flow
elif COMPILE_TARGET == 'smt':
    PE = smt_compile(PE) # invoke smt compilation flow
```

In the above, the first `if` block performs the functional equivalent of `family.BitVector`.   The second `if` block performs `family.compile`.

We now return to the ALU example to demonstrate how encapsulation can be used to extend an existing module.

In [None]:
@family_closure
def data_t_closure(family):
    BV = family.BitVector
    DataT = BV[8]        
    Bit = family.Bit
    return BV, DataT, Bit

@family_closure
def closure(family):
    BV, DataT, Bit = data_t_closure(family)
    
    @family.compile(locals(), globals())
    class ALU(Peak):
        def __call__(self, op: Bit, in_0: DataT, in_1: DataT) -> DataT:
            if op:
                return in_0 + in_1
            else:
                return in_0 * in_1
            
    return ALU

@family_closure
def ext_closure(family):
    BV, DataT, Bit = data_t_closure(family)
    ALU = closure(family)
    
    @family.compile(locals(), globals())
    class ExtALU(Peak):
        def __init__(self):
            self.alu = ALU()
            
        def __call__(self, op: BV[2], in_0: DataT, in_1: DataT) -> DataT:
            if op[0]:
                in_1 = -in_1
            return self.alu(op[1], in_0, in_1)  # call the instance, not the class
    return ExtALU



As we mentioned previously, PEak supports ADTs. In the following example, we show how enums can be used to build instruction sets in place of raw bitvectors. We believe enums provide a more robust mechanism for defining instruction sets.

In [None]:
class ISA(hw.Enum):
    Add = hw.new_instruction()
    Sub = hw.new_instruction()
    And = hw.new_instruction()
    Or = hw.new_instruction()


@family_closure
def closure(family):
    BV = family.BitVector
    DataT = BV[8]
    Bit = family.Bit

    @family.assemble(locals(), globals())
    class ALU(Peak):
        def __call__(self, op: ISA, in_0: DataT, in_1: DataT) -> DataT:
            if op == ISA.Add:
                return in_0 + in_1
            elif op == ISA.Sub:
                return in_0 - in_1
            elif op == ISA.And:
                return in_0 & in_1
            else:
                return in_0 | in_1

    return ALU

We can compose enums by using a `Sum` type. `Sum` types will be explained in more detail in the next section.

In [None]:
class Arith(hw.Enum):
    Add = hw.new_instruction()
    Sub = hw.new_instruction()


class Bitwise(hw.Enum):
    And = hw.new_instruction()
    Or = hw.new_instruction()


ISA = hw.Sum[Arith, Bitwise]


@family_closure
def lu_fc(family):
    BV = family.BitVector
    DataT = BV[8]

    @family.assemble(locals(), globals())
    class LU(Peak):
        def __call__(self, op: Bitwise, in_0: DataT, in_1: DataT) -> DataT:
            if op == Bitwise.And:
                return in_0 & in_1
            else:
                return in_0 | in_1

    return LU


@family_closure
def au_fc(family):
    BV = family.BitVector
    DataT = BV[8]

    @family.assemble(locals(), globals())
    class AU(Peak):
        def __call__(self, op: Arith, in_0: DataT, in_1: DataT) -> DataT:
            if op == Arith.Add:
                return in_0 + in_1
            else:
                return in_0 - in_1

    return AU


@family_closure
def alu_fc(family):
    BV = family.BitVector
    DataT = BV[8]
    LU_t = lu_fc(family)
    AU_t = au_fc(family)

    @family.assemble(locals(), globals())
    class ALU(Peak):
        def __init__(self):
            self.au = AU_t()
            self.lu = LU_t()

        def __call__(self, op: ISA, in_0: DataT, in_1: DataT) -> DataT:
            if op[Arith].match:
                return self.au(op[Arith].value, in_0, in_1)
            else:
                return self.lu(op[Bitwise].value, in_0, in_1)

    return ALU

None of the examples shown so far include state.  The base families provide two distinct register primitives.  The first uses the same call semantics as other PEak circuits.



In [None]:
@family_closure
def closure(family):
    DataT = family.BitVector[8]
    Bit = family.Bit
    DataRegister = family.gen_register(DataT, 0)
    
    @family.assemble(locals(), globals())
    class PipeLinedIncrementor(Peak):
        def __init__(self):
            self.data_reg = DataRegister()
            
        def __call__(self, stall: Bit, i: DataT) -> DataT:
            o = self.data_reg(i+1, ~stall) # enable the register if it is not stalled
            return o
            
    return PipeLinedIncrementor

pipe = closure.SMT()

SmtDataT = hw.SMTBitVector[8]  # as defined in Section 1
data_0 = SmtDataT(name='data_0')
stall_0 = hw.SMTBit(name='stall_0')
data_out = pipe(stall_0, data_0)

print('cycle 0 output:')
print(smt_to_smtlib_string(data_out))

data_1 = SmtDataT(name='data_1')
stall_1 = hw.SMTBit(name='stall_1')
data_out = pipe(stall_1, data_1)

print('\ncycle 1 output:')
print(smt_to_smtlib_string(data_out))

data_2 = SmtDataT(name='data_2')
stall_2 = hw.SMTBit(name='stall_2')
data_out = pipe(stall_2, data_2)

print('\ncycle 2 output:')
print(smt_to_smtlib_string(data_out))
del data_0
del stall_0
del data_1
del stall_1
del data_2
del stall_2
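The cycle-by-cycle behavior can be easier to follow with concrete values. The sketch below is a pure-Python toy model, assuming that a call-style register returns the value it held before the call and updates only when enabled. `ToyRegister` and `ToyPipelinedIncrementor` are hypothetical names and not PEak's `gen_register`.

```python
class ToyRegister:
    """Hypothetical call-style register with an enable."""

    def __init__(self, init=0):
        self.value = init

    def __call__(self, next_value, enable):
        current = self.value          # output is the value held before this cycle
        if enable:
            self.value = next_value   # update only when enabled
        return current


class ToyPipelinedIncrementor:
    def __init__(self):
        self.data_reg = ToyRegister(0)

    def __call__(self, stall, i):
        # Enable the register if it is not stalled; wrap to 8 bits.
        return self.data_reg((i + 1) % 256, not stall)


pipe = ToyPipelinedIncrementor()
outputs = [
    pipe(False, 10),  # cycle 0: outputs the initial value, stores 11
    pipe(True, 20),   # cycle 1: outputs 11, stalled so 11 is kept
    pipe(False, 30),  # cycle 2: outputs 11, stores 31
    pipe(False, 40),  # cycle 3: outputs 31
]
print(outputs)  # [0, 11, 11, 31]
```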


The observant reader may notice that this syntax does not allow a register's next state to depend on its current state, as there is no way to probe the register's output without setting its input. PEak provides a second syntax for registers to address this.  In this syntax, register reads and writes are performed implicitly by getting or setting the attribute. Registers in this syntax do not have an enable and must be set on all paths (this restriction may not hold in current versions of PEak).

In [None]:
@family_closure
def closure(family):
    DataT = family.BitVector[8]
    Bit = family.Bit
    DataRegister = family.gen_attr_register(DataT, 0)
    max_count = 4
    
    @family.assemble(locals(), globals())
    class Counter(Peak):
        def __init__(self):
            self.data_reg = DataRegister()
            
        def __call__(self, en: Bit) -> DataT:
            prev = self.data_reg
            if en:
                val = prev + 1
                if val < max_count:
                    self.data_reg = val
                else:
                    self.data_reg = DataT(0)
            else:
                self.data_reg = prev
            
            return prev
            
    return Counter


In [None]:
Ctr = closure.SMT
ctr = Ctr()
en = hw.SMTBit(name='en')  # symbolic enable, reused on every cycle
print(ctr(en))
print(ctr(en))
print(ctr(en))
print(ctr(en))
print(ctr(en))

Again, an observant reader may note that the first syntax can be constructed from the second.  The first is provided for legacy reasons and to allow for better synthesis in Magma.  The better synthesis results stem from the use of vendor-provided registers with enables instead of a mux and a register.
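The construction of the first syntax from the second amounts to placing a mux in front of a register that is written every cycle. The pure-Python sketch below shows the idea. `ToyAttrRegister` and `ToyEnabledRegister` are hypothetical names and do not reflect PEak's `gen_attr_register` or `gen_register` implementations.

```python
class ToyAttrRegister:
    """Hypothetical register without an enable: it is written every cycle."""

    def __init__(self, init=0):
        self.value = init


class ToyEnabledRegister:
    """Call-style register with an enable, built from ToyAttrRegister."""

    def __init__(self, init=0):
        self.reg = ToyAttrRegister(init)

    def __call__(self, next_value, enable):
        current = self.reg.value
        # The mux: write back the current value when the register is not enabled.
        self.reg.value = next_value if enable else current
        return current


reg = ToyEnabledRegister(5)
trace = [reg(7, False), reg(8, True), reg(9, False), reg(1, True)]
print(trace)  # [5, 5, 8, 8]
```

In hardware, the conditional expression corresponds to the extra mux that a vendor-provided register with a built-in enable would make unnecessary.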