# Coroutines

Python's async/await coroutines were the initial motivation for **Co**HDL. Coroutines are functions that can suspend and resume their execution. CoHDL translates coroutines into VHDL state machines. This translation is completely deterministic, which makes it possible to describe sequential processes with clock-cycle accuracy.

## async/await

The `async` keyword turns functions into coroutines. Only `async def` functions can use `await` expressions in the function body.

CoHDL distinguishes two types of `await` expressions:

* awaiting primitive expressions

    When the argument of `await` is a Signal/Variable/Temporary, the coroutine's execution is suspended until that argument becomes truthy (non-zero). Each wait takes at least one clock cycle, even if the argument is already true. `await` expressions define wait states; the code between two awaits is executed once, when transitioning from one wait to the next. These primitive awaits are the building blocks for more complex sequential processes.
* awaiting coroutine functions

    When the argument of `await` is itself a coroutine, CoHDL treats that expression much like a normal function call. The function body, which may contain nested `await` expressions, is translated and inlined at the call site.
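
The wait-state behaviour of primitive awaits can be illustrated without CoHDL. The following is a minimal plain-Python sketch, not CoHDL code and not compiler output: a generator stands in for the coroutine, each `yield` stands in for one primitive `await`, and the hypothetical helper `simulate` plays the role of the clock. The names `two_waits`, `simulate`, `inputs` and `outputs` are assumptions made for this illustration only.

```python
def two_waits(inputs, outputs):
    """Plain-Python stand-in for a coroutine with two primitive awaits."""
    yield lambda: inputs["step"]   # models: await step
    outputs["value"] = 1           # runs once, on the transition out of the first wait
    yield lambda: inputs["step"]   # models: await step (a second wait state)
    outputs["value"] = 2           # runs once, on the transition out of the second wait


def simulate(coro_fn, step_trace):
    """Advance the model by at most one wait state per clock cycle."""
    inputs = {"step": False}
    outputs = {"value": 0}
    coro = coro_fn(inputs, outputs)
    condition = next(coro)          # enter the first wait state
    history = []
    for step in step_trace:         # one iteration per rising clock edge
        inputs["step"] = bool(step)
        if condition():
            try:
                condition = next(coro)   # run the code up to the next await
            except StopIteration:
                # End of the coroutine: start over from the first wait state.
                coro = coro_fn(inputs, outputs)
                condition = next(coro)
        history.append(outputs["value"])
    return history


# step is high the whole time, yet each await still consumes one cycle
print(simulate(two_waits, [1, 1, 1]))     # [1, 2, 1]
print(simulate(two_waits, [0, 1, 0, 1]))  # [0, 1, 1, 2]
```

In the first trace the condition is already true in every cycle, but the model still needs one clock edge per await, which mirrors the rule that each wait takes at least one clock cycle.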

In [9]:
from cohdl import Entity, Port, Bit, Unsigned, expr
from cohdl import std

async def coro_fn(step, output):
    await step
    output <<= "01"
    # The argument of await is evaluated before
    # it is awaited. CoHDL provides the builtin
    # expr to await an entire expression.
    # The following statement will block until
    # step becomes false.
    await expr(~step)
    output <<= "10"
    await step

class Counter(Entity):
    clk = Port.input(Bit)
    step = Port.input(Bit)

    output = Port.output(Unsigned[2])

    def architecture(self):
        clk = std.Clock(self.clk)

        @std.sequential(clk)
        async def coroutine_process():
            await self.step
            self.output <<= "00"

            # use the coro_fn coroutine
            # this call will take multiple clock cycles
            await coro_fn(self.step, self.output)
            self.output <<= "11"

print(std.VhdlCompiler.to_string(Counter))

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;


entity Counter is
  port (
    clk : in std_logic;
    step : in std_logic;
    output : out unsigned(1 downto 0)
    );
end Counter;


architecture arch_Counter of Counter is
  function cohdl_bool_to_std_logic(inp: boolean) return std_logic is
  begin
    if inp then
      return('1');
    else
      return('0');
    end if;
  end function cohdl_bool_to_std_logic;
  signal buffer_output : unsigned(1 downto 0);
  type state_coroutine_process is (state_0, state_1, state_2, state_3);
  signal s_coroutine_process : state_coroutine_process := state_0;
begin
  
  -- CONCURRENT BLOCK (buffer assignment)
  output <= buffer_output;
  

  coroutine_process: process(clk)
    variable temp : std_logic;
  begin
    if rising_edge(clk) then
      case s_coroutine_process is
        when state_0 =>
          if step = '1' then
            s_coroutine_process <= state_1;
            buffer_output <= unsigned'("00");
          end 

## while loops

While loops are used to describe repeating sequences of states. Like `await`-expressions, `while`-loops can only be used in `async` functions. The body of a while loop is translated into states, and transitions back to the beginning of the loop are added to all states that reach the end of the loop body.

Each occurrence of `while` starts a new state - even if the condition is false it takes at least one clock cycle to step over the loop.
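
The cost of stepping over a loop can be made concrete with a small hand-written model. This is a plain-Python sketch under stated assumptions, not generated code: it assumes a coroutine of the form "await start, load a counter, `while cnt:` decrement", with `start` held high, and it counts clock edges until the code after the loop runs. The names `State` and `edges_until_done` are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_START = auto()  # models: await start
    LOOP = auto()        # the state opened by `while cnt:`


def edges_until_done(initial_count):
    """Count clock edges until the code after the loop is executed."""
    state = State.WAIT_START
    cnt = 0
    edges = 0
    while True:
        edges += 1
        if state is State.WAIT_START:
            # start is assumed high: load the counter and enter the loop state
            cnt = initial_count
            state = State.LOOP
        elif cnt != 0:
            # loop body, followed by the transition back to the loop start
            cnt -= 1
        else:
            # condition is false: the code after the loop runs in this cycle
            return edges


print(edges_until_done(0))  # 2
print(edges_until_done(3))  # 5
```

With an initial count of zero the loop body never runs, yet the model still spends one clock edge in the loop state before the code after the loop executes.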

In [10]:
from cohdl import Entity, Port, Bit, Unsigned, BitVector, Signal
from cohdl import std

class SerialReceiver(Entity):
    clk = Port.input(Bit)
    
    start = Port.input(Bit)
    input = Port.input(Bit)

    new_output = Port.output(Bit, default=False)
    output = Port.output(BitVector[8])

    def architecture(self):
        clk = std.Clock(self.clk)

        @std.sequential(clk)
        async def coroutine_process():
            await self.start

            cnt = Signal[Unsigned[3]](7)
            buffer = Signal[BitVector[8]]()

            # transition happens in this line
            # the while condition will be evaluated
            # in the next clock cycle
            while cnt:
                buffer[7:1] <<= buffer[6:0]
                buffer[0] <<= self.input
                cnt <<= cnt - 1
                # cohdl inserts a transition back
                # to the loop start in this line
            
            self.output <<= buffer
            self.new_output ^= True

print(std.VhdlCompiler.to_string(SerialReceiver))

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;


entity SerialReceiver is
  port (
    clk : in std_logic;
    start : in std_logic;
    input : in std_logic;
    new_output : out std_logic;
    output : out std_logic_vector(7 downto 0)
    );
end SerialReceiver;


architecture arch_SerialReceiver of SerialReceiver is
  function cohdl_bool_to_std_logic(inp: boolean) return std_logic is
  begin
    if inp then
      return('1');
    else
      return('0');
    end if;
  end function cohdl_bool_to_std_logic;
  signal buffer_new_output : std_logic := '0';
  signal buffer_output : std_logic_vector(7 downto 0);
  type state_coroutine_process is (state_0, state_1);
  signal s_coroutine_process : state_coroutine_process := state_0;
  signal sig : unsigned(2 downto 0);
  signal sig1 : std_logic_vector(7 downto 0);
begin
  
  -- CONCURRENT BLOCK (buffer assignment)
  new_output <= buffer_new_output;
  output <= buffer_output;
  

  coroutine_process: process(clk)
    varia

While loops can also be used as an alternative to await expressions, with the additional ability to customize signal states during the wait period. Awaiting a Signal/Temporary is effectively syntactic sugar for a while loop with an empty body.

In [11]:
from cohdl import Entity, Port, Bit, Unsigned
from cohdl import std

class Counter(Entity):
    clk = Port.input(Bit)
    step = Port.input(Bit)

    output = Port.output(Unsigned[2])

    def architecture(self):
        clk = std.Clock(self.clk)

        @std.sequential(clk)
        async def coroutine_process():
            # wait until step becomes truthy
            # do nothing else
            await self.step

            self.output <<= "00"

            # wait until step becomes truthy
            # do nothing else (equivalent to first await expression)
            while ~self.step:
                pass
            
            self.output <<= "01"

            # wait until step becomes truthy
            # and define the state of output while waiting
            while ~self.step:
                self.output <<= "10"
            
            self.output <<= "11"

print(std.VhdlCompiler.to_string(Counter))

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;


entity Counter is
  port (
    clk : in std_logic;
    step : in std_logic;
    output : out unsigned(1 downto 0)
    );
end Counter;


architecture arch_Counter of Counter is
  function cohdl_bool_to_std_logic(inp: boolean) return std_logic is
  begin
    if inp then
      return('1');
    else
      return('0');
    end if;
  end function cohdl_bool_to_std_logic;
  signal buffer_output : unsigned(1 downto 0);
  type state_coroutine_process is (state_0, state_1, state_2);
  signal s_coroutine_process : state_coroutine_process := state_0;
begin
  
  -- CONCURRENT BLOCK (buffer assignment)
  output <= buffer_output;
  

  coroutine_process: process(clk)
    variable temp : std_logic;
    variable temp1 : boolean;
    variable temp2 : std_logic;
    variable temp3 : boolean;
  begin
    if rising_edge(clk) then
      case s_coroutine_process is
        when state_0 =>
          if step = '1' then
            s_corouti

## break/continue

`await primitive` and while-loops are the only statements that introduce new states. `break` and `continue` do not add states of their own; the compiler implements them by duplicating existing code.

When the compiler encounters a `break` statement, it replaces it with a copy of all code following the while-loop, up to and including the next transition. This has the effect of ignoring the loop condition, immediately performing the actions below the loop and jumping to the next state.

`continue` statements work similarly. The code at the start of the loop, including the condition, is duplicated at the continue-location. This is only possible when there is a transition between the loop-start and the `continue` statement. Otherwise the compiler produces an error, because the duplicated code would contain the `continue` statement itself, leading to infinite recursion. For loop conditions that are only known at runtime, the code immediately after the loop is duplicated as well, in case the condition is false and the loop exits.
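
The effect of duplicating the post-loop code for `break` can be sketched as a hand-written state machine. This is a plain-Python illustration under assumptions, not the VHDL that CoHDL generates: it models a process that waits for `step`, loads a counter with 15, and counts down while writing the counter to `output`. The names `after_loop`, `clock_edge` and `run` are hypothetical, and `step` is held high for simplicity.

```python
def after_loop(sig):
    """Code that follows the while loop, up to and including the next transition."""
    sig["output"] = 0
    sig["state"] = 0   # transition back to the start of the process


def clock_edge(cur, step, break_loop):
    """Compute the signal values after one rising clock edge."""
    nxt = dict(cur)
    if cur["state"] == 0:
        if step:                      # models: await step
            nxt["cnt"] = 15
            nxt["state"] = 1
    elif cur["cnt"] != 0:             # loop condition is true
        nxt["output"] = cur["cnt"]
        nxt["cnt"] = cur["cnt"] - 1
        if break_loop:
            after_loop(nxt)           # `break`: a copy of the post-loop code
    else:
        after_loop(nxt)               # regular loop exit
    return nxt


def run(break_at, cycles):
    """Return the output after each clock edge; break_loop is high in cycle break_at."""
    sig = {"state": 0, "cnt": 0, "output": 0}
    outputs = []
    for cycle in range(cycles):
        sig = clock_edge(sig, step=True, break_loop=(cycle == break_at))
        outputs.append(sig["output"])
    return outputs


print(run(break_at=None, cycles=5))  # [0, 15, 14, 13, 12]
print(run(break_at=3, cycles=6))     # [0, 15, 14, 0, 0, 15]
```

`after_loop` is called from two places, which corresponds to the post-loop code appearing twice after translation. In the cycle where `break_loop` is high, the later assignment to `output` wins, so the output drops to 0 without an extra cycle for re-evaluating the loop condition.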

In [12]:
from cohdl import Entity, Port, Bit, Unsigned, Signal
from cohdl import std

class ExampleBreak(Entity):
    """
    Waits for `step` to become true. Then output the sequence 15,14,13,...1,0.
    When `break_loop` is set, the sequence is stopped early.
    When the sequence is done the output returns to 0.
    """

    clk = Port.input(Bit)
    step = Port.input(Bit)
    break_loop = Port.input(Bit)

    output = Port.output(Unsigned[4])

    def architecture(self):
        clk = std.Clock(self.clk)

        @std.sequential(clk)
        async def coroutine_process(
            cnt = Signal[Unsigned[4]](0)
        ):
            await self.step
            cnt <<= 15

            while cnt:
                self.output <<= cnt
                cnt <<= cnt - 1
            
                if self.break_loop:
                    break
                    # The break statement is replaced with the code below the loop.
                    # Comment and assignment to self.output and the transition appear twice
                    # in the generated VHDL.
            
            std.comment("statement after the while loop")
            self.output <<= 0
            # implicit transition back to the start of coroutine_process

print(std.VhdlCompiler.to_string(ExampleBreak))

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;


entity ExampleBreak is
  port (
    clk : in std_logic;
    step : in std_logic;
    break_loop : in std_logic;
    output : out unsigned(3 downto 0)
    );
end ExampleBreak;


architecture arch_ExampleBreak of ExampleBreak is
  function cohdl_bool_to_std_logic(inp: boolean) return std_logic is
  begin
    if inp then
      return('1');
    else
      return('0');
    end if;
  end function cohdl_bool_to_std_logic;
  signal buffer_output : unsigned(3 downto 0);
  type state_coroutine_process is (state_0, state_1);
  signal s_coroutine_process : state_coroutine_process := state_0;
  signal cnt : unsigned(3 downto 0) := unsigned'("0000");
begin
  
  -- CONCURRENT BLOCK (buffer assignment)
  output <= buffer_output;
  

  coroutine_process: process(clk)
    variable temp : boolean;
    variable temp1 : unsigned(3 downto 0);
    variable temp2 : boolean;
  begin
    if rising_edge(clk) then
      case s_coroutine_proces

In [13]:
from cohdl import Entity, Port, Bit, Unsigned, Signal
from cohdl import std

class ExampleContinue(Entity):
    """
    Waits for `step` to become true. Then output the sequence 15,14,13,...1,0.
    The sequence only advances when `step` is true.
    When `continue_loop` is set, the sequence is generated faster because
    the transition at the end of the loop body is skipped.
    When the sequence is done the output returns to 0.
    """

    clk = Port.input(Bit)
    step = Port.input(Bit)
    continue_loop = Port.input(Bit)

    output = Port.output(Unsigned[4])

    def architecture(self):
        clk = std.Clock(self.clk)

        @std.sequential(clk)
        async def coroutine_process(
            cnt = Signal[Unsigned[4]](0)
        ):
            await self.step
            cnt <<= 15

            while cnt:
                std.comment("statement at start of while loop")
                self.output <<= cnt
                cnt <<= cnt - 1

                await self.step
            
                if self.continue_loop:
                    continue
            
            std.comment("statement after the while loop")
            self.output <<= 0

print(std.VhdlCompiler.to_string(ExampleContinue))

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;


entity ExampleContinue is
  port (
    clk : in std_logic;
    step : in std_logic;
    continue_loop : in std_logic;
    output : out unsigned(3 downto 0)
    );
end ExampleContinue;


architecture arch_ExampleContinue of ExampleContinue is
  function cohdl_bool_to_std_logic(inp: boolean) return std_logic is
  begin
    if inp then
      return('1');
    else
      return('0');
    end if;
  end function cohdl_bool_to_std_logic;
  signal buffer_output : unsigned(3 downto 0);
  type state_coroutine_process is (state_0, state_1, state_2);
  signal s_coroutine_process : state_coroutine_process := state_0;
  signal cnt : unsigned(3 downto 0) := unsigned'("0000");
begin
  
  -- CONCURRENT BLOCK (buffer assignment)
  output <= buffer_output;
  

  coroutine_process: process(clk)
    variable temp : boolean;
    variable temp1 : unsigned(3 downto 0);
    variable temp2 : boolean;
  begin
    if rising_edge(clk) then
     

One application of `continue` statements is in sequential contexts that must return to an initial state before new data is processed. The next code block demonstrates this by implementing a simplified, AXI-like interface.

A transaction starts when the bus master defines `data` and sets `valid` to `1`.
The transaction is acknowledged using the `ready` signal. `ready` can be set before data arrives.

For optimal performance, `ready` must be set while waiting for `valid`. While data is processed, `ready` is set to `0` to stall the interface. Once all data is processed, `ready` should immediately return to the `1` state.
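
The throughput difference between ending the process with a regular transition and returning to the ready state via `continue` can be estimated with a toy model. This is a plain-Python sketch under assumptions, not a description of the generated VHDL: it assumes `valid` is held high, that one state sets `ready` and a second state waits for `valid`, and that a transaction is accepted in every cycle spent in the waiting state. The function names and state labels are hypothetical.

```python
def accepted_without_continue(cycles):
    """Process that restarts through a regular transition after each transaction."""
    state, accepted = "SET_READY", 0
    for _ in range(cycles):
        if state == "SET_READY":
            state = "WAIT_VALID"    # ready is set, then the wait state is entered
        else:
            accepted += 1           # valid is high, the transaction completes
            state = "SET_READY"     # transition back to the start
    return accepted


def accepted_with_continue(cycles):
    """Process that re-arms ready in the same cycle that completes a transaction."""
    state, accepted = "SET_READY", 0
    for _ in range(cycles):
        if state == "SET_READY":
            state = "WAIT_VALID"
        else:
            accepted += 1
            # continue: the loop-start code (set ready) is duplicated here,
            # so the process stays in the wait state
            state = "WAIT_VALID"
    return accepted


print(accepted_without_continue(10))  # 5
print(accepted_with_continue(10))     # 9
```

Under these assumptions the first variant accepts one transaction every two cycles, while the second accepts one per cycle after the initial setup cycle.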

In [14]:
from cohdl import Entity, Port, Bit, BitVector, Signal
from cohdl import std

class ExampleContinue(Entity):
    clk = Port.input(Bit)

    a_ready = Port.output(Bit)
    a_valid = Port.input(Bit)
    a_data = Port.input(BitVector[32])

    b_ready = Port.output(Bit)
    b_valid = Port.input(Bit)
    b_data = Port.input(BitVector[32])

    c_ready = Port.output(Bit)
    c_valid = Port.input(Bit)
    c_data = Port.input(BitVector[32])

    d_ready = Port.output(Bit)
    d_valid = Port.input(Bit)
    d_data = Port.input(BitVector[32])

    a_processed_data = Port.output(BitVector[32])
    b_processed_data = Port.output(BitVector[32])
    c_processed_data = Port.output(BitVector[32])
    d_processed_data = Port.output(BitVector[32])

    def architecture(self):
        clk = std.Clock(self.clk)

        @std.sequential(clk, comment=[
            "Primitive example.",
            "Since the incoming data is processed in a single clock cycle,",
            "the ready signal can be tied to true."
        ])
        def no_delay():
            self.a_ready <<= True

            if self.a_valid:
                self.a_processed_data <<= self.a_data
        
        @std.sequential(clk, comment=[
            "We can also implement the process using an await expression.",
            "The disadvantage of this approach is that we introduce a state transition.",
            "Consequently, one transaction takes two clock cycles."
        ])
        async def simple_coroutine():
            self.b_ready <<= True
            await self.b_valid
            self.b_ready <<= False

            self.b_processed_data <<= self.b_data
        
        @std.sequential(clk, comment=[
            "By wrapping the interface in a while-true-continue loop,",
            "we skip over the transition at the end of the while-body.",
            "This coroutine can process one transaction per clock cycle.",
            "The advantage compared to the first example is that this pattern",
            "scales to more complex designs with data processing that spans",
            "multiple clock cycles."
        ])
        async def using_continue():

            while True:
                self.c_ready <<= True
                await self.c_valid
                self.c_ready <<= False
                self.c_processed_data <<= self.c_data
                continue
        
        async def some_complex_data_processing(inp):
            # Perform some task that takes multiple clock cycles (for example, writing to a slow interface).
            # For simplicity we just wait for some time.
            # std.wait_for is effectively a while-loop that counts to the given value.
            await std.wait_for(5)
            return inp
        
        @std.sequential(clk, comment=[
            "As we have seen in the previous example, we can use the continue statement",
            "to return to the ready state without the usual transition at the end of the loop.",
            "This allows us to define a skeleton implementation of the interface and",
            "reuse it for arbitrary data processing tasks without any",
            "additional clock cycles."
        ])
        async def longer_example():

            while True:
                std.comment("start of loop is duplicated to immediately set ready once the process is done")
                self.d_ready <<= True

                await self.d_valid
                self.d_ready <<= False

                std.comment("data processing starts here")
                self.d_processed_data <<= await some_complex_data_processing(Signal(self.d_data))
                std.comment("data processing is done here")
                continue


print(std.VhdlCompiler.to_string(ExampleContinue))

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;


entity ExampleContinue is
  port (
    clk : in std_logic;
    a_ready : out std_logic;
    a_valid : in std_logic;
    a_data : in std_logic_vector(31 downto 0);
    b_ready : out std_logic;
    b_valid : in std_logic;
    b_data : in std_logic_vector(31 downto 0);
    c_ready : out std_logic;
    c_valid : in std_logic;
    c_data : in std_logic_vector(31 downto 0);
    d_ready : out std_logic;
    d_valid : in std_logic;
    d_data : in std_logic_vector(31 downto 0);
    a_processed_data : out std_logic_vector(31 downto 0);
    b_processed_data : out std_logic_vector(31 downto 0);
    c_processed_data : out std_logic_vector(31 downto 0);
    d_processed_data : out std_logic_vector(31 downto 0)
    );
end ExampleContinue;


architecture arch_ExampleContinue of ExampleContinue is
  function cohdl_bool_to_std_logic(inp: boolean) return std_logic is
  begin
    if inp then
      return('1');
    else
      return('0'

## coroutines and classes

Coroutines can be members of classes. This is used by the SerialTransmitter class to define sequential send logic that operates on the transmit signal of a serial interface. To keep the example simple there is no synchronization logic. The same basic structure could also be used for more complex interfaces such as AXI or Wishbone.

In [15]:
from cohdl import Entity, Port, Bit, Unsigned, BitVector, Signal
from cohdl import std

# The SerialTransmitter class wraps a single bit
# transmit signal and defines a coroutine method
# send, that serializes and sends one byte of data
class SerialTransmitter:
    def __init__(self, tx: Signal[Bit]):
        self._tx = tx
    
    # serialize data and send it via the single bit tx signal
    # this coroutine will return after 8 clock cycles
    async def send(self, data: Signal[BitVector[8]]):
        cnt = Signal[Unsigned[4]](8)
        buffer = Signal(data)

        while cnt:
            cnt <<= cnt - 1
            self._tx <<= buffer[0]
            buffer[6:0] <<= buffer[7:1]


class TransmitterExample(Entity):
    clk = Port.input(Bit)
    reset = Port.input(Bit)

    data = Port.input(BitVector[8])
    tx = Port.output(Bit)

    def architecture(self):
        clk = std.Clock(self.clk)
        reset = std.Reset(self.reset)

        transmitter = SerialTransmitter(self.tx)

        @std.sequential(clk, reset)
        async def proc_use_transmitter():
            # perform the transmitter send operation
            # this coroutine call will take 8 clock cycles
            await transmitter.send(self.data)


print(std.VhdlCompiler.to_string(TransmitterExample))


library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;


entity TransmitterExample is
  port (
    clk : in std_logic;
    reset : in std_logic;
    data : in std_logic_vector(7 downto 0);
    tx : out std_logic
    );
end TransmitterExample;


architecture arch_TransmitterExample of TransmitterExample is
  function cohdl_bool_to_std_logic(inp: boolean) return std_logic is
  begin
    if inp then
      return('1');
    else
      return('0');
    end if;
  end function cohdl_bool_to_std_logic;
  signal buffer_tx : std_logic;
  type state_proc_use_transmitter is (state_0, state_1);
  signal s_proc_use_transmitter : state_proc_use_transmitter := state_0;
  signal sig : unsigned(3 downto 0);
  signal sig1 : std_logic_vector(7 downto 0);
begin
  
  -- CONCURRENT BLOCK (buffer assignment)
  tx <= buffer_tx;
  

  proc_use_transmitter: process(clk)
    variable temp : boolean;
    variable temp1 : unsigned(3 downto 0);
  begin
    if rising_edge(clk) then
      if reset = '1' t

## alternative implementation

The following example shows an alternative implementation of the SerialTransmitter class. It defines a separate sequential context for the serialization process, so the send method is no longer a coroutine. Instead, the send method forwards the given data, alongside a start signal, to the other process.

In [16]:
from cohdl import Entity, Port, Bit, Unsigned, BitVector, Signal
from cohdl import std

# In this version of SerialTransmitter
# the serialization logic is defined in its own
# sequential context. send only transfers data
# to that context and sets a start signal.
class SerialTransmitter:
    def __init__(self, clk, reset, tx):
        # define local signals
        self._start = Signal[bool](False)
        self._data = Signal[BitVector[8]]()
        self._ready = Signal[bool](False)

        @std.sequential(clk, reset)
        async def proc_serial_transmitter():
            # wait for start signal
            while not self._start:
                self._ready <<= True
            self._ready <<= False

            # create a local copy of the data to send
            buffer = Signal(self._data)
            cnt = Signal[Unsigned[4]](8)

            while cnt:
                cnt <<= cnt - 1
                tx.next = buffer[0]
                buffer[6:0] <<= buffer[7:1]
    
    def ready(self):
        return self._ready
    
    # not a coroutine, this method
    # starts the transmit sequence in the parallel
    # sequential context
    def send(self, data):
        # assert that send only gets called when
        # the transmitter is ready
        assert self._ready
        self._data <<= data
        self._start ^= True


class TransmitterExample(Entity):
    clk = Port.input(Bit)
    reset = Port.input(Bit)

    data = Port.input(BitVector[8])
    tx = Port.output(Bit)

    def architecture(self):
        clk = std.Clock(self.clk)
        reset = std.Reset(self.reset)

        transmitter = SerialTransmitter(clk, reset, self.tx)

        @std.sequential(clk, reset)
        async def proc_use_transmitter():
            await transmitter.ready()
            transmitter.send(self.data)
            # this process can do some other work for up to 8 clock cycles
            # before the transmitter becomes ready again


print(std.VhdlCompiler.to_string(TransmitterExample))


library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;


entity TransmitterExample is
  port (
    clk : in std_logic;
    reset : in std_logic;
    data : in std_logic_vector(7 downto 0);
    tx : out std_logic
    );
end TransmitterExample;


architecture arch_TransmitterExample of TransmitterExample is
  function cohdl_bool_to_std_logic(inp: boolean) return std_logic is
  begin
    if inp then
      return('1');
    else
      return('0');
    end if;
  end function cohdl_bool_to_std_logic;
  signal buffer_tx : std_logic;
  type state_proc_serial_transmitter is (state_0, state_1);
  signal s_proc_serial_transmitter : state_proc_serial_transmitter := state_0;
  signal sig : boolean := false;
  signal sig1 : boolean := false;
  signal sig2 : std_logic_vector(7 downto 0);
  signal val : std_logic_vector(7 downto 0);
  signal sig3 : unsigned(3 downto 0);
begin
  
  -- CONCURRENT BLOCK (buffer assignment)
  tx <= buffer_tx;
  

  proc_serial_transmitter: process(clk)
    va