write_cxxrtl: new backend #1562
Hi, I was developing a Python wrapper for the cxxrtl backend and I found unexpected behavior from commit f03c200. I'm testing a synchronous adder, and the resulting sum is not as expected for values wider than 61 bits. Debugging:
Let me know if I'm doing something wrong or if I can help you.
Note that I already have a specific plan for using cxxsim from nmigen and I will proceed with it. If you were planning to upstream your code to nmigen then you are likely doing duplicate work (of course you are free to do so, I am just making sure this is understood).
Please provide an MCVE with a C++ driver for the simulation, similar to the ones I've demonstrated in the PR description. After that I will fix the bug, improve debugging of this pass, and teach how to use the improved debugging facilities to debug any such issues in the future.
Completely understood. Let me know if I can contribute with something.
I tried to reproduce the same "bug" in C++ but I couldn't. I will try again next week.
@whitequark I'm triaging PRs. What's the current status for this? If you want to take care of follow-up on this yourself, please assign yourself to it, otherwise we'll occasionally check in to ask if there's anything that needs to be done.
@nakengelhardt The PR can in principle be merged in the current condition, but for it to be useful for its original purpose (simulating nmigen-soc designs) it's still necessary to do some changes, so I'd rather keep it open for a bit longer and merge once it's really done. Assigned to myself.
Hierarchical designs can now be converted to C++. (But black boxes are not supported yet.)
This commit adds a basic implementation that isn't very performant but implements most of the planned features.
This results in massive gains in performance, equally massive reduction in compile time, and improved readability.
This results in further massive gains in performance, modest decrease in compile time, and, for designs without feedback arcs, makes it possible to run eval() once per clock edge in certain conditions.
After this commit, if NDEBUG is not defined, out-of-bounds accesses cause assertion failures for reads and writes. If NDEBUG is defined, out-of-bounds reads return zeroes, and out-of-bounds writes are ignored. This commit also adds support for memories that start with a non-zero index (`Memory::start_offset` in RTLIL).
Hierarchical design simulations are generally much slower, but this comes with a major increase in flexibility:
1. Since the `flatten` pass currently does not support flattening of designs with processes, this is the only way to simulate such designs with cxxrtl.
2. Support for hierarchy paves the way for simulation black boxes, which are necessary for e.g. replacing PHYs with C++ code that integrates with the host system.
This commit reduces space and time overhead for writable memories to O(write port count) in both cases; implements handling for write port priorities; and simplifies runtime representation of memories.
Also, fix the semantics of SET/CLR inputs of the $dffsr cell, and fix the scheduling of async FF cells to consider ARST/SET/CLR->Q as a forward combinatorial arc.
Also, fix codegen for $dffe and $adff.
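To illustrate the out-of-bounds memory semantics described in the commit messages above, here is a minimal sketch. `SketchMemory` and its members are hypothetical names made up for this example and are not the cxxrtl runtime types; the only things taken from the commit message are the `NDEBUG` behavior (assert when it is not defined, zero reads and ignored writes when it is) and the idea of a non-zero start offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a simulated memory; NOT the cxxrtl runtime class.
struct SketchMemory {
    std::size_t start_offset;     // first valid address (cf. Memory::start_offset in RTLIL)
    std::vector<uint32_t> cells;  // storage, indexed from zero internally

    SketchMemory(std::size_t start, std::size_t depth)
        : start_offset(start), cells(depth, 0) {}

    bool in_bounds(std::size_t addr) const {
        return addr >= start_offset && addr - start_offset < cells.size();
    }

    uint32_t read(std::size_t addr) const {
        // Fires only when NDEBUG is not defined.
        assert(in_bounds(addr) && "out-of-bounds read");
        // Reached for bad addresses only when NDEBUG is defined: reads yield zero.
        if (!in_bounds(addr)) return 0;
        return cells[addr - start_offset];
    }

    void write(std::size_t addr, uint32_t data) {
        // Fires only when NDEBUG is not defined.
        assert(in_bounds(addr) && "out-of-bounds write");
        // Reached for bad addresses only when NDEBUG is defined: the write is dropped.
        if (!in_bounds(addr)) return;
        cells[addr - start_offset] = data;
    }
};
```

With `SketchMemory m(16, 4)`, addresses 16 through 19 are valid; accessing address 15 or 20 aborts in a debug build and is silently tolerated in a build with `-DNDEBUG`.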
LGTM
Thanks! To comment on the future of this backend, it is very much useful and has a high degree of completeness if you're simulating non-techmapped designs, but I have more plans for it. Two things that definitely need to happen are a convenient API for writing C++ blackboxes, and a reflection mechanism for loading generated code from other languages like Python. However, done and usable is better than perfect and 100% complete, so I merged the PR to let everyone know they can start experimenting with it now.
What's your reason for going this route when Verilator already exists?
nMigen generates RTLIL designs. To use Verilator, nMigen needs to put it through Yosys to get Verilog, put it through Verilator to get C++, and then put it through a compiler to get a binary. But Yosys can do everything Verilator does, so by directly outputting C++ from RTLIL nMigen has skipped going via Verilog plus all the transformations Verilator makes, and the resulting time to build a simulation model has decreased.
Thanks, it's clearer now.
Yes.
The useful thing about using RTLIL as a target is that anything that can be converted to RTLIL can be output with write_cxxrtl. That means Verilog, nMigen (via read_ilang) and VHDL (via ghdlsynth) can all be combined into a single simulation. Neither Verilator nor GHDL can do that.
Verilator is strictly less powerful than cxxrtl. Specifically, cxxrtl can simulate logic with arbitrary feedback paths. This includes clock dividers, ripple counters, D-latches, SR-latches, logic loops and so on, but most importantly, it is useful for benign feedback paths (i.e. in fully synchronous designs where combinatorial paths have to be evaluated more than once because of the way they are translated) that arise in designs with wires driven by multiple processes (each bit being driven by a single process) and non-flattened hierarchical designs. Whether having this power (and the associated tradeoffs) is useful to you is something only you can decide. But the fact that my translator provides it and Verilator doesn't makes it inherently valuable, since at the very least it lets you explore approaches impossible with Verilator.
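As a rough illustration of why feedback paths need more than one evaluation pass, here is a minimal sketch of a 2-bit ripple counter simulated with two-phase wires and a converge-until-stable loop. All names (`Wire`, `RippleCounter`, `tick`) are hypothetical and written for this example only; the code is not generated by, and does not use, the cxxrtl runtime. The second flip-flop is clocked by the output of the first, so its edge can only be seen one delta cycle after the first flip-flop commits.

```cpp
// Hypothetical two-phase wire: eval() writes `next`, commit() publishes it.
template <typename T>
struct Wire {
    T curr{};
    T next{};
    bool commit() {
        bool changed = curr != next;
        curr = next;
        return changed;
    }
};

// 2-bit ripple counter: q0 toggles on rising clk, q1 toggles on falling q0.
struct RippleCounter {
    Wire<bool> clk, q0, q1;
    bool prev_clk = false, prev_q0 = false;  // values seen by the previous eval()

    void eval() {
        if (!prev_clk && clk.curr) q0.next = !q0.curr;  // rising edge of clk
        if (prev_q0 && !q0.curr) q1.next = !q1.curr;    // falling edge of q0
        prev_clk = clk.curr;
        prev_q0 = q0.curr;
    }

    bool commit() {
        bool changed = false;
        changed |= clk.commit();
        changed |= q0.commit();
        changed |= q1.commit();
        return changed;
    }

    // Run delta cycles until a commit changes nothing; returns the number of evals.
    int step() {
        int deltas = 0;
        do {
            eval();
            ++deltas;
        } while (commit());
        return deltas;
    }

    unsigned count() const { return (q1.curr ? 2u : 0u) | (q0.curr ? 1u : 0u); }
};

// Drive one full clock period; returns the delta count of the rising-edge step.
int tick(RippleCounter &rc) {
    rc.clk.next = true;
    int deltas = rc.step();
    rc.clk.next = false;
    rc.step();
    return deltas;
}
```

In this sketch a rising edge that only toggles `q0` settles in three evaluations, while one that also ripples into `q1` needs a fourth. That extra pass is the cost of supporting such feedback arcs.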
There's no direct nMigen support yet, but once that's implemented, this is how it will work.
Oh, one more thing on the topic of:
Interestingly, running nearly all combinations of Yosys optimization passes that I've tried makes the generated C++ slower, not faster. This is in line with my hypothesis (which was driving the implementation decisions behind the cxxrtl backend) that keeping the generated code as close to RTLIL as possible (and by extension as close to the original source code as possible) will result in it being structured in a way that can be exploited by C++ compilers, since C++ compilers are (implicitly) tailored for human-written code and this backend, as are most HDL toolchains, is also (implicitly) tailored for human-written code. It is possible that there are kinds of generated RTLIL that would benefit from some kinds of Yosys optimization passes, so of course the ability to do this is valuable. But at the moment, I am almost exclusively relying on the optimizer in the C++ compiler, not in Yosys.
The PR is ready for merging; it has been used with medium-size practical designs like Minerva SoCs and performs well.
cxxrtl is a new backend that accepts any valid synthesizable RTLIL and outputs readable, debuggable C++ code with close to 1:1 correspondence to the original RTLIL elements (including processes), which may be compiled and run to simulate the RTLIL design. It achieves this by implementing arbitrary-width arithmetic using template metaprogramming, taking care to make sure that the generated machine code is compact and efficient. The chosen implementation strategy places emphasis on flexibility and simplicity of implementation rather than speed of generated code (or compilation speed); it uses delta cycles and two-phase commit so that non-flattened hierarchical designs, separate compilation, multiclock designs, clock dividers, latches, and even logic loops just work. Moreover, arbitrary modules can be replaced with C++ implementations, or C++ blackboxes can be used.
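To give a feel for what arbitrary-width arithmetic via templates can look like, here is a minimal sketch of a fixed-width value type that stores its bits in 32-bit chunks and adds with carry propagation, truncating the result to the declared width. The `Value` template, its members, and the chunk size are assumptions chosen for this illustration; this is not the runtime shipped with the backend.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical Bits-wide unsigned value stored as little-endian 32-bit chunks.
template <std::size_t Bits>
struct Value {
    static_assert(Bits > 0, "width must be non-zero");

    static constexpr std::size_t chunk_bits = 32;
    static constexpr std::size_t chunks = (Bits + chunk_bits - 1) / chunk_bits;
    // Mask selecting the valid bits of the most significant chunk.
    static constexpr uint32_t msb_mask =
        (Bits % chunk_bits == 0)
            ? ~uint32_t(0)
            : (~uint32_t(0) >> (chunk_bits - Bits % chunk_bits));

    std::array<uint32_t, chunks> data{};

    // Addition modulo 2^Bits: ripple the carry from chunk to chunk.
    Value add(const Value &other) const {
        Value result;
        uint64_t carry = 0;
        for (std::size_t i = 0; i < chunks; i++) {
            uint64_t sum = uint64_t(data[i]) + other.data[i] + carry;
            result.data[i] = uint32_t(sum);  // low 32 bits
            carry = sum >> chunk_bits;       // 0 or 1
        }
        result.data[chunks - 1] &= msb_mask;  // drop bits above the declared width
        return result;
    }

    bool operator==(const Value &other) const { return data == other.data; }
};
```

Because the chunk count and mask are compile-time constants, a compiler can fully unroll the loop for each width. A 65-bit value uses three chunks here, so adding 1 to 2^64 - 1 carries into the third chunk rather than wrapping.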
This backend implements only two-valued logic, i.e. X and Z are not supported as a part of wire values; in other words, X-propagation and tristate inout ports are not supported. For the time being, some coarse cells ($fa, $lcu, $alu, $macc, $lut, $sop, $ff, $fsm), all fine cells, and all formal cells are not implemented.
As an example of the current state of the backend, the command
yosys uart.il -o uart.cc
compiles uart.il down to uart.cc. A compiled design should be driven by a user-defined main() function that could be similar to minerva_driver.cc.