Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconsider simulator interface #228

Open
nmigen-issue-migration opened this issue Sep 23, 2019 · 38 comments
Open

Reconsider simulator interface #228

nmigen-issue-migration opened this issue Sep 23, 2019 · 38 comments

Comments

@nmigen-issue-migration
Copy link

nmigen-issue-migration commented Sep 23, 2019

Issue by whitequark
Monday Sep 23, 2019 at 08:52 GMT
Originally opened as m-labs/nmigen#228


The oMigen simulator had a strange interface where user processes would run after a clock edge, but before synchronous signals assume their new value. Not only it is very hard to use and bug-prone (my simulation testing workflow, for example, involves adding yield until tests pass...), but it also makes certain patterns impossible! For example, let's consider the FIFOInterface.write() simulation function:

def write(self, data):
    """Write method for simulation."""
    assert (yield self.w_rdy)
    yield self.w_data.eq(data)
    yield self.w_en.eq(1)
    yield
    yield self.w_en.eq(0)
    yield

It has to take two clock cycles. This is because, after the first yield, it is not yet possible to read the new value of self.w_rdy; running (yield self.w_rdy) would return the value from the previous cycle. To make sure it's updated, it is necessary to deassert self.w_en, and waste another cycle doing nothing.

Because of this, I've removed these methods from nMigen FIFOInterface in a57b76f.

I think we should look at prior art for cosimulation (cocotb?) and adopt a usage pattern that is easier to use. This is complicated by the fact that it is desirable to run oMigen testbenches unmodified.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by sbourdeauducq
Monday Oct 28, 2019 at 02:53 GMT


The oMigen simulator had a strange interface where user processes would run after a clock edge, but before synchronous signals assume their new value.

The reason for this is to ensure determinism when there are multiple processes on the same clock communicating through signals. The result is independent of the order in which those process are run.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by sbourdeauducq
Monday Oct 28, 2019 at 02:57 GMT


Was this changed already? This prints 1 0:

from nmigen import *
from nmigen.back.pysim import *

class Dummy(Elaboratable):
     def __init__(self):
        self.x = Signal()
        self.y = Signal()

     def elaborate(self, platform):
        m = Module()
        m.d.sync += self.y.eq(self.x)
        return m

if __name__ == "__main__":
     dummy = Dummy()
     with Simulator(dummy) as sim:
        def process():
            yield dummy.x.eq(1)
            print((yield dummy.x))
            yield dummy.x.eq(0)
            print((yield dummy.x))
        sim.add_clock(1e-6)
        sim.add_sync_process(process)
        sim.run()

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by whitequark
Monday Oct 28, 2019 at 02:58 GMT


You aren't using dummy.y at all in this code, no?

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by sbourdeauducq
Monday Oct 28, 2019 at 03:01 GMT


Yes. It is a dummy signal to work around #262.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by whitequark
Monday Oct 28, 2019 at 03:14 GMT


Was this changed already?

I didn't intend to change this. I think the reason you see the behavior you do is that I did not understand the design of the oMigen simulator, so I didn't implement it entirely correctly.

The reason for this is to ensure determinism when there are multiple processes on the same clock communicating through signals. The result is independent of the order in which those process are run.

OK, so this works like VHDL. That's good. Unfortunately, the problem I outlined in this issue stands: not being able to write a single-cycle FIFOInterface.read() is a serious defect.

I've been thinking about this problem and came to the conclusion that there should probably be three kinds of simulator processes:

  • add_sync_process, working like in oMigen, where the Python code works like synchronous logic;
  • add_comb_process, where the Python code works like combinatorial logic (Bikeshed: design for .get_fragment() #11);
  • add_toplevel_process (or something like that), only one of which can be present in a design, and where you could implement single-cycle FIFOInterface.read.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by sbourdeauducq
Monday Oct 28, 2019 at 10:15 GMT


What about an option to run synchronous processes at twice the clock rate?

For example, a synchronous process could yield a full cycle (default), or a high half-cycle, or a low half-cycle.

At the beginning of a cycle, it could yield either a full cycle (default) or a high half-cycle. At the middle of a cycle it could yield only a low half-cycle. Doing otherwise raises an error, which ensures that processes are always "in phase" with the clock.

This is backward compatible, can be deterministic using the VHDL-style behavior, and supports an arbitrary number of processes.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by whitequark
Monday Oct 28, 2019 at 10:33 GMT


The basic idea is sound, thanks. But, what happens if someone uses a negedge clock domain (#185) in the design? There are many valid reasons to do so, and even some FPGA primitives (Xilinx UltraRAM) trigger on both clock edges.

So I think we should change it a bit: rather than use a half cycle, run all "sync" processes at the posedge (or the negedge) as it currently happens, and then run all "testbench" (or whatever they are called) processes simultaneously, a very small amount of time later (however much it takes for combinatorial logic to propagate, and for "comb" processes to run as well).

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by jordens
Monday Oct 28, 2019 at 10:39 GMT


I would want to differentiate that yield (None). Semantically yield currently means to continue the simulation until before the next clock edge: "update sync signals and settle comb signals again". At that point the values for sync signals for the next cycle are known (and can be altered from the process) and the sync signals are about to update.

What you actually want instead is to be able to differentiate between yielding to update the sync signals and yielding to settle the comb signals.
I don't think mapping this to two half cycles is correct since that will most likely clash with support for multiple clock domains: the settling of comb signals does not take the length of a half cycle. That's arbitrary. In reality you have the sync clock cycle and -- semantically different from it -- a delta cycle that just settles comb.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by whitequark
Monday Oct 28, 2019 at 10:44 GMT


In terms of the UI, currently add_process requires submitting explicit wait commands (i.e. Tick("domain"), and add_sync_process(domain=) automatically substitutes yield Tick(domain) for any yield with no arguments. We could split that into yield BeforeTick(domain) (the same as current Tick) and yield AfterTick(domain); in both cases, you can only observe "current" values for all signals and update "next" values for signals, preserving determinism, but all built-in synchronous logic, and any "sync" processes, wait for the BeforeTick event if you do yield with no argument, whereas simulation testbenches wait for AfterTick event in that case.

This solves an important issue where something like FIFOInterface.read cannot just yield because the meaning of yield changes depending on the kind of process it's in. But if it can explicitly do yield AfterTick() (with the domain perhaps substituted by some sort of default), then it would be possible to use it unambiguously.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by jordens
Monday Oct 28, 2019 at 10:44 GMT


And this extends to multiple clock domains accordingly (sync cycles executed in the pattern defined by their timing beat and comb/delta cycles in between). A process should be able to specify what kind of cycle it wants to yield for: not just a bare yield but maybe yield None for comb settling vs yield dut.sync or yield "sync" for the next corresponding clock edge update.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by jordens
Monday Oct 28, 2019 at 10:45 GMT


I don't think BeforeTick/AfterTick is sufficient: there are good reasons to do multiple delta cycles between sync cycles.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by whitequark
Monday Oct 28, 2019 at 10:45 GMT


Can you provide some examples of those reasons?

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by jordens
Monday Oct 28, 2019 at 10:56 GMT


Otherwise yield x.eq(1) could only ever been a sync update. If your process wants to do the equivalent of comb logic, this is limiting. There would not be a way for a process to react to non-trivial comb signal changes from the simulated gateware within the same cycle.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by jordens
Monday Oct 28, 2019 at 11:00 GMT


E.g. if you don't have any synchronous clock domain.
In the simplest case if you want to test whether your module reacts to a signal immediately:

yield stb.eq(1)
yield  # settle comb
assert (yield busy)
yield stb.eq(0)
yield  # settle comb
assert not (yield busy)

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by whitequark
Monday Oct 28, 2019 at 11:04 GMT


If your process wants to do the equivalent of comb logic, this is limiting.

Comb processes are something I wanted to introduce explicitly (that's the second oldest nMigen issue, #11). In my mind, they would require specifying the exact set of signals they will wait on during creation, and then always wait on exactly those signals; that way, they would have no choice but to be well behaved.

However, that would not work for the snippet you show. And I think you are making a very good point with it. I'm inclined to agree that a separate command for comb settling is desirable.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by whitequark
Monday Oct 28, 2019 at 11:08 GMT


Perhaps a trio of yield Tick(domain) (where if the domain is omitted, it is taken from the process), waiting until clock edge; yield Settle(), waiting until comb fixpoint; yield Wait(signals...), waiting until any signal in the sensitivity list changes?

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by jordens
Monday Oct 28, 2019 at 11:16 GMT


Yes. yield Tick(domain) will imply a yield Settle() just before the respective domain update. And now we can debate if yield Tick(domain) should be "settle, update sync, and settle again". Then it would be robustly equivalent to the current behavior. I would not want yield Tick(domain) to be equivalent to just "settle, update" since that in practice is always an unstable and inconsistent transient state.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by jordens
Monday Oct 28, 2019 at 11:19 GMT


The yield Wait(signals) (or Watch() or Trigger()) seems a bit orthogonal since it passively waits for some signal-related event while the others wait for something time-related (even though in the case of the delta cycle this is a very short time).

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by jordens
Monday Oct 28, 2019 at 11:23 GMT


Ah. But Wait() is indeed needed to implement coexistence of combinatorial processes and synchronous clock domains and sync processes. You can't implement it as while not (yield trigger): yield Settle(). That would block.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by jordens
Monday Oct 28, 2019 at 11:24 GMT


Do you even need to differentiate add_sync_process and add_comb_process with the above?

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by whitequark
Monday Oct 28, 2019 at 11:31 GMT


Do you even need to differentiate add_sync_process and add_comb_process with the above?

No, but for that matter, add_sync_process was never strictly a necessity either. Today it is simple syntactic sugar over Tick. (However, without this sugar, you'd have to give a domain name to FIFOInterface.read, which seems suboptimal.)

One option would be to get rid of yield with no arguments completely, and then add a notion of "default domain" to all processes. So doing yield Tick() would use that default domain. It is a bit verbose, but I like its lack of ambiguity.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by whitequark
Monday Oct 28, 2019 at 11:35 GMT


And now we can debate if yield Tick(domain) should be "settle, update sync, and settle again". Then it would be robustly equivalent to the current behavior. I would not want yield Tick(domain) to be equivalent to just "settle, update" since that in practice is always an unstable and inconsistent transient state.

I'm confused by this paragraph. The current behavior is something like "settle, stop just before updating sync, run all processes", not "settle, update, settle". Or I misunderstand you.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by jordens
Monday Oct 28, 2019 at 11:35 GMT


Yes. yield Tick() is better. If there is a default sync clock domain for elaboration then there should be something analogous for processes.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by jordens
Monday Oct 28, 2019 at 11:37 GMT


The current behavior of yield is "update sync, settle comb". You need to have "update sync" in there.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by jordens
Monday Oct 28, 2019 at 11:39 GMT


And "stop just before updating sync" is in fact a NOP because after settling comb nothing would happen until the next sync update.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by whitequark
Monday Oct 28, 2019 at 11:50 GMT


You're right. We want the same thing but I was not describing it precisely.

Thank you for this discussion. I now have a very clear picture of how the next iteration of the simulator should look like, and I am happy with every aspect of it.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by sbourdeauducq
Monday Oct 28, 2019 at 11:57 GMT


what happens if someone uses a negedge clock domain (#185) in the design?

A negedge clock domain is just like another clock domain but with the clock shifted 180 degrees. If simulation supports multiple clock domains (it should) then there is nothing really special about it, I think?
Maybe just rename "high/low half-cycles" to "first/second half-cycles" or similar to avoid confusion.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by whitequark
Monday Oct 28, 2019 at 12:01 GMT


If simulation supports multiple clock domains (it should) then there is nothing really special about it, I think?

There is if there are two clock domains defined to be clocked by different edges of the same signal. Then you have the same problem in a single process that interacts with both.

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by sbourdeauducq
Monday Oct 28, 2019 at 12:04 GMT


Then you have the same problem in a single process that interacts with both.

Not if such processes are not allowed (and I think they should not be allowed: a synchronous process must be associated with a single clock signal and a single sensitivity edge).

@nmigen-issue-migration
Copy link
Author

nmigen-issue-migration commented Oct 28, 2019

Comment by whitequark
Monday Oct 28, 2019 at 12:08 GMT


and I think they should not be allowed: a synchronous process must be associated with a single clock signal and a single sensitivity edge

Among other things this restriction would make it impossible to simulate ResetSynchronizer, which is something that nMigen does today. So it would be a step back.

@andresdemski
Copy link

andresdemski commented Apr 29, 2020

Hello,
I made a simple cxxrtl wrapper where i implemented an interface similar to COCOTB for nMigen.

https://github.com/andresdemski/nmigen-yosim

I know @whitequark is going to add an official cxxrtl support, take this just as an API proposal and don't look at the implementation please (i'm not C++ developer).

Some interesting features of "yosim":

  • rising_edge, falling_edge and timer triggers are implemented
  • configurable sim step
  • vcd support (you can choose which signals you want to dump)
  • fork and join capabilities
  • the scheduller and triggers are in C++

@whitequark
Copy link
Member

whitequark commented Apr 30, 2020

take this just as an API proposal

At the moment, I believe there is no need for a distinct API for nmigen.back.cxxim because when its API matches that of nmigen.back.pysim, it is possible to reuse testbench code across both backends, which not only makes it easier to "upgrade" to cxxsim but also easy to "downgrade" to pysim. So this issue is for improving the API of both backends.

More broadly, the major issue with the existing simulators that I see (the Verilog one, the VHDL one, the cxxsim one too to a slightly lesser degree) is that they offer you a very generic event-driven interface but they're not used for simulating generic event-driven systems--they're used to write testbenches for a single domain of synchronous logic, or sometimes, a few asynchronous domains with a small amount of CDC. That results in races. Verilog processes are inherently nondeterministic by design; VHDL is better if you're simulating logic alone, but it still does break in confusing ways if you're writing a testbench or perhaps a simulation model for a clock gate, where in both cases you might end up having to balance delta cycles.

Adding capabilities like arbitrary edge triggers or fork/join parallelism increases the potential for race conditions. What I'd like to see in nMigen's simulator API instead is (a) a way to inject behavioral models into the simulation that make it easy to write race-free code and (b) a race condition detector, similar to ThreadSanitizer.

By this point I believe I know how to solve (a); cxxrtl builds towards a solution to it by offering black boxes with a well-defined interface that guides the scheduler (the cxxrtl.comb/cxxrtl.sync annotations), and nMigen can go further than cxxrtl did. (b) is, I believe, a (small) open research problem.

@andresdemski
Copy link

andresdemski commented Apr 30, 2020

Is it going to be possible to write simulation model for peripherals like ddr memories, sensors or high speed communication protocols and generate representative waveforms?

I know that you can use IDDR as a blackbox but you won't have a real waveform at the input for an end to end validation.

@whitequark
Copy link
Member

whitequark commented Apr 30, 2020

Is it going to be possible to write simulation model for peripherals like ddr memories, sensors or high speed communication protocols and generate representative waveforms?

Would it be good enough for you to use the vendor simulation models?

@andresdemski
Copy link

andresdemski commented Apr 30, 2020

yep, but the problem is how to generate data in both edges to feed the vendor iddr simulation model.

@whitequark
Copy link
Member

whitequark commented Apr 30, 2020

how to generate data in both edges to feed the vendor iddr simulation model

Ah yes. Both nmigen.back.pysim and CXXRTL actually already support testbench code that can drive signals with true DDR waveforms. The main issue is that it is quite hard to write such testbench code in a race-free way. I think at the moment the only completely reliable solution is to shift the testbench by a further quarter phase, although that can cause other issues.

@rroohhh
Copy link
Contributor

rroohhh commented May 26, 2020

While redoing the simulator interface it should also be considered to make add_clock (or its equivalent) take a frequency to bring it in line with ResourceManager.add_clock_constraint.

@whitequark
Copy link
Member

whitequark commented Jun 28, 2020

Denominating from 0.3 for two reasons:

  • cxxsim is critical for people simulating larger designs with nMigen and we shouldn't let exploratory projects like this issue delay that;
  • migrating to cxxsim from pysim will inevitably expose issues, and it would be a disservice to our downstream to require two migrations at once, since that doubles the issues and obscures the nature of any bugs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants