write_cxxrtl: new backend #1562

whitequark · 2019-12-08T02:43:10Z

The PR is ready for merging; it has been used with medium-size practical designs like Minerva SoCs and performs well.

cxxrtl is a new backend that accepts any valid synthesizable RTLIL and outputs readable, debuggable C++ code with close to 1:1 correspondence to the original RTLIL elements (including processes), which can be compiled and run to simulate the RTLIL design. It implements arbitrary-width arithmetic using template metaprogramming, taking care that the generated machine code is compact and efficient. The chosen implementation strategy emphasizes flexibility and simplicity of implementation over the speed of the generated code (or compilation speed); it uses delta cycles and two-phase commit, so that non-flattened hierarchical designs, separate compilation, multiclock designs, clock dividers, latches, and even logic loops just work. Moreover, arbitrary modules can be replaced with C++ implementations, or C++ blackboxes can be used.
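To make the idea of template-based arbitrary-width arithmetic concrete, here is a minimal sketch of a fixed-width value stored in 32-bit chunks, with addition that carries from chunk to chunk and masks off the unused top bits. The type name `wide_value` is hypothetical and the sketch is much simpler than the backend's actual runtime; it only illustrates the general technique of making the width a template parameter, so the chunk count is a compile-time constant the C++ compiler can optimize around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified fixed-width value (not the backend's real type):
// Bits wide, stored least significant chunk first in 32-bit chunks.
template<size_t Bits>
struct wide_value {
  static_assert(Bits > 0, "zero-width values are not modeled here");
  using chunk_t = uint32_t;
  static constexpr size_t chunk_bits = 32;
  static constexpr size_t chunks = (Bits + chunk_bits - 1) / chunk_bits;
  // Mask keeping only the valid bits of the most significant chunk.
  static constexpr chunk_t top_mask =
      (Bits % chunk_bits == 0) ? ~chunk_t(0)
                               : chunk_t((chunk_t(1) << (Bits % chunk_bits)) - 1);

  chunk_t data[chunks] = {};

  // Addition modulo 2^Bits: add chunk by chunk, carrying into the next chunk.
  wide_value add(const wide_value &other) const {
    wide_value result;
    uint64_t carry = 0;
    for (size_t n = 0; n < chunks; n++) {
      uint64_t sum = uint64_t(data[n]) + other.data[n] + carry;
      result.data[n] = chunk_t(sum);
      carry = sum >> chunk_bits;
    }
    result.data[chunks - 1] &= top_mask;  // drop bits above the declared width
    return result;
  }
};
```

For example, adding 1 to a 65-bit value holding 2^64 - 1 carries through both low chunks into the third one, and adding 1 to a value with all 65 bits set wraps around to zero.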

This backend implements only two-valued logic, i.e. X and Z are not supported as a part of wire values; in other words, X-propagation and tristate inout ports are not supported. For the time being, some coarse cells ($fa, $lcu, $alu, $macc, $lut, $sop, $ff, $fsm), all fine cells, and all formal cells are not implemented.

As an example of the current state of the backend, the command `yosys uart.il -o uart.cc` compiles `uart.il` down to `uart.cc`. A compiled design should be driven by a user-defined `main()` function that could be similar to `minerva_driver.cc`.
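As a rough illustration of what delta cycles with two-phase commit look like from a driver's point of view, the sketch below uses a hand-written stand-in for a generated module: every signal has a current and a next value, `eval()` computes next values from current ones, `commit()` publishes them and reports whether anything changed, and `step()` repeats the two until the design settles. All names (`wire`, `counter_module`, `step`, `simulate`) are hypothetical and are not taken from the generated code; a real driver would run a loop like `simulate` from its own `main()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-phase storage: reads use `curr`, writes go to `next`.
template<typename T>
struct wire {
  T curr{};
  T next{};
  // Publish the pending value; report whether it differed.
  bool commit() {
    if (curr == next) return false;
    curr = next;
    return true;
  }
};

// Hypothetical hand-written stand-in for a generated module:
// a 4-bit counter that increments on each rising edge of clk.
struct counter_module {
  wire<bool> clk;
  wire<uint8_t> count;

  // Compute pending values from committed ones; a rising edge is a
  // committed 0 with a pending 1.
  void eval() {
    if (!clk.curr && clk.next)
      count.next = (count.curr + 1) & 0xf;
  }
  bool commit() {
    bool changed = clk.commit();
    changed = count.commit() || changed;  // commit every wire, no short-circuit
    return changed;
  }
  // Delta cycles: keep evaluating until a commit changes nothing.
  void step() {
    do { eval(); } while (commit());
  }
};

// Hypothetical driver loop: toggle the clock and let the design settle
// after each change.
uint8_t simulate(counter_module &top, int cycles) {
  for (int i = 0; i < cycles; i++) {
    top.clk.next = false;
    top.step();
    top.clk.next = true;
    top.step();
  }
  return top.count.curr;
}
```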

ydnatag · 2019-12-11T19:41:09Z

Hi,
First of all, amazing work!! Thank you for all your effort and contributions!!!

I was developing a Python wrapper for the cxxrtl backend and I found unexpected behavior starting with commit f03c200.

I'm testing a synchronous adder, and the sum is not as expected for values wider than 61 bits.

Debugging:

I've compared the Python values with the chunks and they hold the same values (considering that chunks are 32 bits and Python digits are 30 bits).
If I wait for another rising edge, the adder result is correct.
With commits older than f03c200 it works perfectly without changes.
I'm getting the following warning: Warning: Ignoring module top because it contains processes (run 'proc' command first).

Let me know if I'm doing something wrong or if I can help you.
Thanks
A.

whitequark · 2019-12-12T01:58:16Z

> I was developing a python wrapper for cxxrtl backend

Note that I already have a specific plan for using cxxsim from nmigen and I will proceed with it. If you were planning to upstream your code to nmigen then you are likely doing duplicate work (of course you are free to do so, I am just making sure this is understood).

> i'm testing a sync adder and the result of the sum is not as expected for value's width longer than 61 bits.

Please provide an MCVE with a C++ driver for the simulation, similar to the ones I've demonstrated in the PR description. After that I will fix the bug, improve debugging of this pass, and teach how to use the improved debugging facilities to debug any such issues in the future.

ydnatag · 2019-12-19T22:05:04Z

> I am just making sure this is understood

Completely understood. Let me know if i can contribute with something.

> Please provide an MCVE with a C++ driver for the simulation, similar to the ones I've demonstrated in the PR description. After that I will fix the bug, improve debugging of this pass, and teach how to use the improved debugging facilities to debug any such issues in the future.

I tried to reproduce the same "bug" in C++ but I couldn't. I will try again next week.

nakengelhardt · 2020-01-28T17:01:30Z

@whitequark I'm triaging PRs. What's the current status for this? If you want to take care of follow-up on this yourself, please assign yourself to it, otherwise we'll occasionally check in to ask if there's anything that needs to be done.

whitequark · 2020-01-29T04:13:54Z

@nakengelhardt The PR can in principle be merged in the current condition, but for it to be useful for its original purpose (simulating nmigen-soc designs) it's still necessary to do some changes, so I'd rather keep it open for a bit longer and merge once it's really done. Assigned to myself.

whitequark · 2020-04-03T16:09:12Z

Hierarchical designs can now be converted to C++. (But black boxes are not supported yet.)


After this commit, if NDEBUG is not defined, out-of-bounds accesses cause assertion failures for reads and writes. If NDEBUG is defined, out-of-bounds reads return zeroes, and out-of-bounds writes are ignored. This commit also adds support for memories that start with a non-zero index (`Memory::start_offset` in RTLIL).
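A minimal sketch of the bounds-checking behavior described above, assuming a memory whose valid addresses run from `start_offset` to `start_offset + depth - 1`: with `NDEBUG` undefined the `assert` fires on an out-of-bounds access, and with `NDEBUG` defined reads return zero and writes are dropped. The `offset_memory` type and its methods are hypothetical, not the backend's runtime API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical memory model: valid addresses are
// [start_offset, start_offset + depth).
template<typename T>
struct offset_memory {
  size_t start_offset;
  std::vector<T> cells;

  offset_memory(size_t offset, size_t depth) : start_offset(offset), cells(depth) {}

  bool in_bounds(size_t addr) const {
    return addr >= start_offset && addr - start_offset < cells.size();
  }
  T read(size_t addr) const {
    assert(in_bounds(addr));           // without NDEBUG: fail loudly
    if (!in_bounds(addr)) return T{};  // with NDEBUG: read as zero
    return cells[addr - start_offset];
  }
  void write(size_t addr, const T &value) {
    assert(in_bounds(addr));           // without NDEBUG: fail loudly
    if (!in_bounds(addr)) return;      // with NDEBUG: ignore the write
    cells[addr - start_offset] = value;
  }
};
```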

Hierarchical design simulations are generally much slower, but this comes with a major increase in flexibility:

1. Since the `flatten` pass currently does not support flattening of designs with processes, this is the only way to simulate such designs with cxxrtl.
2. Support for hierarchy paves the way for simulation black boxes, which are necessary for e.g. replacing PHYs with C++ code that integrates with the host system.


cliffordwolf

LGTM

whitequark · 2020-04-10T01:31:43Z

Thanks!

To comment on the future of this backend, it is very much useful and has a high degree of completeness if you're simulating non-techmapped designs, but I have more plans for it. Two things that definitely need to happen are a convenient API for writing C++ blackboxes, and a reflection mechanism for loading generated code from other languages like Python.

However, done and usable is better than perfect and 100% complete, so I merged the PR to let everyone know they can start experimenting with it now.

udif · 2020-04-10T04:19:05Z

What's your reason for going through this route when Verilator already exists?

Ravenslofty · 2020-04-10T10:35:26Z

nMigen generates RTLIL designs. To use Verilator, nMigen needs to put it through Yosys to get Verilog, put it through Verilator to get C++, and then put it through a compiler to get a binary.

But Yosys can do everything Verilator does, so by outputting C++ directly from RTLIL, nMigen skips going through Verilog and all the transformations Verilator makes, which decreases the time it takes to build a simulation model.

udif · 2020-04-10T13:10:39Z

Thanks, it's clearer now.
Is the nMigen flow using Yosys like this?
nMigen -> ILANG frontend -> (optional passes by Yosys) -> cxxrtl backend?
Or does nMigen currently call this backend directly through a Python wrapper?

Ravenslofty · 2020-04-10T13:29:21Z

> Is the nMigen flow using Yosys like this?
> nMigen -> ILANG frontend -> (optional passes by Yosys) -> cxxrtl backend?

Yes.

Ravenslofty · 2020-04-10T13:32:02Z

The useful thing about using RTLIL as a target is that anything that can be converted to RTLIL can be output with write_cxxrtl. That means Verilog, nMigen (via read_ilang) and VHDL (via ghdlsynth) can all be combined into a single simulation.

Neither Verilator nor GHDL can do that.

whitequark · 2020-04-10T14:49:40Z

> What's your reason for going through this route when Verilator already exists?

Verilator is strictly less powerful than cxxrtl. Specifically, cxxrtl can simulate logic with arbitrary feedback paths. This includes clock dividers, ripple counters, D-latches, SR-latches, logic loops and so on. Most importantly, though, it handles benign feedback paths: these occur in fully synchronous designs whose combinatorial paths have to be evaluated more than once because of the way they are translated, and they arise in designs with wires driven by multiple processes (each bit being driven by a single process) and in non-flattened hierarchical designs.
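To illustrate why feedback paths call for more than one evaluation pass, here is a hedged sketch of a two-stage ripple counter in which the second flip-flop is clocked by the first one's output. It assumes a two-phase scheme (committed versus pending values, with each flip-flop remembering the last clock level it saw); the struct and method names are hypothetical and this is not the backend's generated code.

```cpp
#include <cassert>

// Hypothetical two-stage ripple counter with two-phase state. q0 toggles on each
// rising edge of clk; q1 is clocked by q0 and toggles on each falling edge of q0.
struct ripple_counter {
  bool clk = false, q0 = false, q1 = false;                 // committed values
  bool clk_next = false, q0_next = false, q1_next = false;  // pending values
  bool seen_clk = false, seen_q0 = false;  // last clock level each flip-flop saw

  // Compute pending values from committed values only.
  void eval() {
    if (clk && !seen_clk) q0_next = !q0;  // rising edge of clk
    seen_clk = clk;
    if (!q0 && seen_q0) q1_next = !q1;    // falling edge of q0
    seen_q0 = q0;
  }
  bool commit() {
    bool changed = clk != clk_next || q0 != q0_next || q1 != q1_next;
    clk = clk_next;
    q0 = q0_next;
    q1 = q1_next;
    return changed;
  }
  // Run delta cycles until nothing changes; return the number of eval() calls.
  int step() {
    int deltas = 0;
    do { eval(); deltas++; } while (commit());
    return deltas;
  }
  // One clock period; returns the eval() count for the rising edge.
  int tick() {
    clk_next = false;
    step();
    clk_next = true;
    return step();
  }
  int count() const { return (q1 ? 2 : 0) + (q0 ? 1 : 0); }
};
```

In this sketch a rising edge that only toggles `q0` settles after three `eval()` calls, while one that also makes `q0` fall takes four, because `q1` sees its clock edge only after `q0` has been committed.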

Whether having this power (and the associated tradeoffs) is useful to you is something only you can decide. But the fact that my translator provides it and Verilator doesn't makes it inherently valuable, since at the very least it lets you explore approaches impossible with Verilator.

> Is the nMigen flow using Yosys like this?
> nMigen -> ILANG frontend -> (optional passes by Yosys) -> cxxrtl backend?

There's no direct nMigen support yet, but once that's implemented, this is how it will work.

whitequark · 2020-04-10T15:26:11Z

Oh, one more thing on the topic of:

> nMigen -> ILANG frontend -> (optional passes by Yosys) -> cxxrtl backend?

Interestingly, running nearly all combinations of Yosys optimization passes that I've tried makes the generated C++ slower, not faster. This is in line with my hypothesis (which was driving the implementation decisions behind the cxxrtl backend) that keeping the generated code as close to RTLIL as possible (and by extension as close to the original source code as possible) will result in it being structured in a way that can be exploited by C++ compilers, since C++ compilers are (implicitly) tailored for human-written code and this backend, as are most HDL toolchains, is also (implicitly) tailored for human-written code.

It is possible that there are kinds of generated RTLIL that would benefit from some kinds of Yosys optimization passes, so of course the ability to do this is valuable. But at the moment, I am almost exclusively relying on the optimizer in the C++ compiler, not in Yosys.

whitequark force-pushed the write_cxxrtl branch 4 times, most recently from 276ba0a to 1c0b3d0 on December 10, 2019 20:10

whitequark changed the title ~~[RFC] write_cxxrtl: new backend. (WIP)~~ [RFC] write_cxxrtl: new backend Dec 10, 2019

whitequark force-pushed the write_cxxrtl branch from 1c0b3d0 to dc13e72 on December 11, 2019 13:51

whitequark self-assigned this Jan 29, 2020

whitequark force-pushed the write_cxxrtl branch from dc13e72 to 4ef8c4b on February 1, 2020 19:19

whitequark force-pushed the write_cxxrtl branch from 4ef8c4b to e79c3a3 on February 9, 2020 22:33

whitequark mentioned this pull request Feb 16, 2020

Integrate the CXXSim simulator amaranth-lang/amaranth#324


whitequark force-pushed the write_cxxrtl branch 2 times, most recently from 0ff0b53 to 268b4e0 on April 3, 2020 16:07

whitequark force-pushed the write_cxxrtl branch 4 times, most recently from fbe7000 to 6fd5932 on April 5, 2020 02:06

whitequark changed the title ~~[RFC] write_cxxrtl: new backend~~ write_cxxrtl: new backend Apr 5, 2020

whitequark force-pushed the write_cxxrtl branch 4 times, most recently from 90a6da5 to 4a9cae3 on April 6, 2020 05:08

whitequark added 3 commits April 9, 2020 04:08

write_cxxrtl: new backend.

d20e971

This commit adds a basic implementation that isn't very performant but implements most of the planned features.

write_cxxrtl: elide wires for results of comb cells used once.

d6d7273

This results in massive gains in performance, equally massive reduction in compile time, and improved readability.
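The effect of eliding single-use wires can be pictured with a hypothetical pair of hand-written models of `y = (a & b) | c`, where the `$and` result has exactly one consumer. Both compute the same value; the second simply inlines the intermediate expression instead of keeping it in a member. This is only an illustration of the idea, not actual output of the backend.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical models of y = (a & b) | c, where the $and result has one consumer.
struct with_temp_wire {
  uint8_t a = 0, b = 0, c = 0, y = 0;
  uint8_t and_y = 0;  // intermediate wire: a member that every eval() stores
  void eval() {
    and_y = a & b;
    y = and_y | c;
  }
};

struct with_elided_wire {
  uint8_t a = 0, b = 0, c = 0, y = 0;
  void eval() {
    y = (a & b) | c;  // single-use result folded into its only consumer
  }
};
```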

write_cxxrtl: statically schedule comb logic and localize wires.

5157691

This results in further massive gains in performance, modest decrease in compile time, and, for designs without feedback arcs, makes it possible to run eval() once per clock edge in certain conditions.
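The commit message does not spell out the scheduling algorithm. One standard way to order combinational cells statically is a topological sort; the sketch below uses Kahn's algorithm over a hypothetical dependency list in which `deps[i]` holds the cells that drive cell `i`'s inputs. Cells on a feedback arc, or fed by one, never reach in-degree zero and are returned separately; those are the ones that would still need iterative evaluation. This is an assumption about the general approach, not the backend's actual scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical static scheduler using Kahn's algorithm. deps[i] lists the cells
// whose outputs feed cell i, so cell i must be evaluated after all of them.
// Cells on a feedback arc (or downstream of one) never reach in-degree zero;
// they are returned in `unscheduled` instead of in the static order.
std::vector<size_t> schedule(const std::vector<std::vector<size_t>> &deps,
                             std::vector<size_t> &unscheduled) {
  const size_t n = deps.size();
  std::vector<size_t> indegree(n, 0);
  std::vector<std::vector<size_t>> users(n);  // reverse edges: driver -> users
  for (size_t i = 0; i < n; i++) {
    for (size_t d : deps[i]) {
      users[d].push_back(i);
      indegree[i]++;
    }
  }
  std::queue<size_t> ready;  // cells whose drivers are all already scheduled
  for (size_t i = 0; i < n; i++)
    if (indegree[i] == 0) ready.push(i);

  std::vector<size_t> order;
  while (!ready.empty()) {
    size_t cell = ready.front();
    ready.pop();
    order.push_back(cell);
    for (size_t user : users[cell])
      if (--indegree[user] == 0) ready.push(user);
  }
  unscheduled.clear();
  for (size_t i = 0; i < n; i++)
    if (indegree[i] != 0) unscheduled.push_back(i);
  return order;
}
```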

whitequark added 7 commits April 9, 2020 04:08

write_cxxrtl: improve writable memory handling.

01e6850

This commit reduces space and time overhead for writable memories to O(write port count) in both cases; implements handling for write port priorities; and simplifies runtime representation of memories.
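A hedged sketch of deferred writes with port priorities: instead of keeping a shadow copy of the whole array, each write port queues at most one pending write per evaluation, so the extra storage grows with the number of write ports. On commit, writes are applied in ascending priority order, so on an address collision the highest-priority port wins. The `deferred_memory` type is hypothetical, and the "higher priority value wins" rule is assumed from Yosys's memory cell conventions rather than taken from this commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical memory with deferred writes. Each write port queues at most one
// write per evaluation, so pending state is O(write port count), not a copy of
// the whole array.
struct deferred_memory {
  struct pending_write {
    size_t addr;
    uint32_t data;
    int priority;  // assumed: higher value wins on an address collision
  };
  std::vector<uint32_t> cells;
  std::vector<pending_write> pending;

  explicit deferred_memory(size_t depth) : cells(depth) {}

  void queue_write(size_t addr, uint32_t data, int priority) {
    pending.push_back({addr, data, priority});
  }
  // Apply queued writes in ascending priority order, so the highest-priority
  // write to a given address is applied last and takes effect.
  void commit() {
    std::stable_sort(pending.begin(), pending.end(),
                     [](const pending_write &l, const pending_write &r) {
                       return l.priority < r.priority;
                     });
    for (const pending_write &w : pending)
      cells[w.addr] = w.data;
    pending.clear();
  }
};
```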

write_cxxrtl: add support for $slice and $concat cells.

9534b51

write_cxxrtl: add support for $sr cell.

711df56

Also, fix the semantics of SET/CLR inputs of the $dffsr cell, and fix the scheduling of async FF cells to consider ARST/SET/CLR->Q as a forward combinatorial arc.
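For reference, a per-bit model of the `$sr` next-state function, written as a hedged sketch: where CLR is asserted the bit is cleared, otherwise where SET is asserted it is set, otherwise it holds its value. The CLR-over-SET priority and the polarity handling are assumptions based on Yosys's simulation model of `$sr`/`$dffsr`, and the function name is hypothetical; widths other than 32 bits would need masking.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-bit next-state function of an $sr-style cell (32 bits wide).
// Assumed semantics: CLR takes priority over SET; a bit with neither asserted holds.
uint32_t sr_next(uint32_t q, uint32_t set, uint32_t clr,
                 bool set_active_high = true, bool clr_active_high = true) {
  uint32_t pos_set = set_active_high ? set : ~set;  // normalize to active-high
  uint32_t pos_clr = clr_active_high ? clr : ~clr;
  return (q | pos_set) & ~pos_clr;
}
```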

write_cxxrtl: add support for $dlatch and $dlatchsr cells.

753e340

Also, fix codegen for $dffe and $adff.
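A hedged sketch of the next-state behavior of the cells named in this commit, using 32-bit values and active-high controls as assumptions (the real cells have polarity parameters). Unlike the flip-flops, a transparent latch passes `D` through whenever `EN` is high, so its output tracks its input like combinational logic while enabled. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical next-state functions; 32-bit values and active-high controls assumed.

// $dlatch: transparent while EN is high, holds otherwise (no clock edge involved).
uint32_t dlatch_next(uint32_t q, uint32_t d, bool en) {
  return en ? d : q;
}

// $dffe: captures D on a rising clock edge, but only when EN is high.
uint32_t dffe_next(uint32_t q, uint32_t d, bool clk_posedge, bool en) {
  return (clk_posedge && en) ? d : q;
}

// $adff: an asserted asynchronous reset overrides the clock.
uint32_t adff_next(uint32_t q, uint32_t d, bool clk_posedge, bool arst,
                   uint32_t arst_value) {
  if (arst) return arst_value;
  return clk_posedge ? d : q;
}
```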

write_cxxrtl: add basic documentation.

4737f42

whitequark force-pushed the write_cxxrtl branch from 4a9cae3 to 4737f42 on April 9, 2020 04:08

cliffordwolf approved these changes Apr 9, 2020


whitequark merged commit 7c06cb6 into YosysHQ:master Apr 10, 2020

whitequark deleted the write_cxxrtl branch June 4, 2020 06:08
