write_cxxrtl: new backend #1562

Merged
merged 10 commits into from
Apr 10, 2020

Conversation

whitequark
Member

@whitequark whitequark commented Dec 8, 2019

The PR is ready for merging; it has been used with medium-size practical designs like Minerva SoCs and performs well.

cxxrtl is a new backend that accepts any valid synthesizable RTLIL and outputs readable, debuggable C++ code with close to 1:1 correspondence to the original RTLIL elements (including processes), which may be compiled and run to simulate the RTLIL design. It achieves this by implementing arbitrary-width arithmetic using template metaprogramming, taking care to make sure that the generated machine code is compact and efficient. The chosen implementation strategy emphasizes flexibility and simplicity of implementation over speed of generated code (or compilation speed); it uses delta cycles and two-phase commit so that non-flattened hierarchical designs, separate compilation, multiclock designs, clock dividers, latches, and even logic loops just work. Moreover, arbitrary modules can be replaced with C++ implementations, or C++ blackboxes can be used.
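To illustrate what arbitrary-width arithmetic via templates can look like, here is a minimal sketch. It is not the backend's actual runtime: the `Value` type, its members, and `from_u64` are hypothetical names chosen for this example, and the 32-bit chunk size is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified fixed-width value type (illustration only).
// The width is a template parameter, so the chunk count is a compile-time
// constant and the loops below can be unrolled by the compiler.
template <std::size_t Bits>
struct Value {
    static_assert(Bits > 0, "width must be positive");
    static constexpr std::size_t chunk_bits = 32;
    static constexpr std::size_t chunks = (Bits + chunk_bits - 1) / chunk_bits;
    // Mask for the most significant chunk, so results wrap at exactly Bits.
    static constexpr std::uint32_t msb_mask =
        (Bits % chunk_bits == 0) ? 0xffffffffu
                                 : ((1u << (Bits % chunk_bits)) - 1u);

    std::uint32_t data[chunks] = {};  // data[0] is the least significant chunk

    // Build a value from a 64-bit integer, truncating to Bits.
    static Value from_u64(std::uint64_t x) {
        Value v;
        for (std::size_t i = 0; i < chunks && i < 2; i++)
            v.data[i] = static_cast<std::uint32_t>(x >> (32 * i));
        v.data[chunks - 1] &= msb_mask;
        return v;
    }

    // Ripple-carry addition over chunks, wrapping modulo 2^Bits.
    Value add(const Value &other) const {
        Value result;
        std::uint64_t carry = 0;
        for (std::size_t i = 0; i < chunks; i++) {
            std::uint64_t sum = std::uint64_t(data[i]) + other.data[i] + carry;
            result.data[i] = static_cast<std::uint32_t>(sum);
            carry = sum >> 32;
        }
        result.data[chunks - 1] &= msb_mask;
        return result;
    }

    bool operator==(const Value &other) const {
        for (std::size_t i = 0; i < chunks; i++)
            if (data[i] != other.data[i]) return false;
        return true;
    }
};
```

With this sketch, `Value<65>::from_u64(~0ull).add(Value<65>::from_u64(1))` carries into the third chunk, while the same addition on `Value<64>` wraps to zero.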

This backend implements only two-valued logic, i.e. X and Z are not supported as part of wire values; in other words, X-propagation and tristate inout ports are not supported. For the time being, some coarse cells ($fa, $lcu, $alu, $macc, $lut, $sop, $ff, $fsm), all fine cells, and all formal cells are not implemented.

As an example of the current state of the backend, the command yosys uart.il -o uart.cc compiles uart.il down to uart.cc. A compiled design should be driven by a user-defined main() function that could be similar to minerva_driver.cc.
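The shape of such a driver can be sketched as follows. Since neither the generated uart.cc nor minerva_driver.cc is reproduced in this thread, the sketch uses a hand-written stand-in: `Wire`, `ToyCounter`, and `run_cycles` are hypothetical names, and the `eval()`/`commit()`/`step()` split is an assumption modelled on the two-phase commit and delta cycles mentioned in the description. A real driver would instead include the generated file and instantiate the class it defines.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-phase wire: writes go to `next` and become visible
// in `curr` only when commit() is called.
struct Wire {
    std::uint32_t curr = 0;
    std::uint32_t next = 0;
    bool commit() {
        bool changed = (curr != next);
        curr = next;
        return changed;
    }
};

// Hand-written stand-in for a generated design: a counter that
// increments on every rising edge of clk.
struct ToyCounter {
    Wire clk;    // input, driven by writing clk.next
    Wire count;  // register
    std::uint32_t prev_clk = 0;

    void eval() {
        bool posedge = (prev_clk == 0 && clk.curr == 1);
        prev_clk = clk.curr;
        if (posedge) count.next = count.curr + 1;
    }

    bool commit() {
        bool changed = false;
        changed |= clk.commit();
        changed |= count.commit();
        return changed;
    }

    // Run delta cycles until no wire changes any more.
    void step() {
        do {
            eval();
        } while (commit());
    }
};

// What a user-defined main() would typically do: toggle the clock and
// let the design settle after each edge.
std::uint32_t run_cycles(ToyCounter &top, unsigned cycles) {
    for (unsigned i = 0; i < cycles; i++) {
        top.clk.next = 1;
        top.step();
        top.clk.next = 0;
        top.step();
    }
    return top.count.curr;
}
```

A `main()` for this stand-in would construct a `ToyCounter`, call `run_cycles`, and print or check the outputs it cares about.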

@whitequark whitequark force-pushed the write_cxxrtl branch 4 times, most recently from 276ba0a to 1c0b3d0 Compare December 10, 2019 20:10
@whitequark whitequark changed the title [RFC] write_cxxrtl: new backend. (WIP) [RFC] write_cxxrtl: new backend Dec 10, 2019
@ydnatag

ydnatag commented Dec 11, 2019

Hi,
First of all, amazing work!! Thank you for all your effort and contributions!!!

I was developing a Python wrapper for the cxxrtl backend and I found unexpected behavior starting from commit f03c200.

I'm testing a sync adder and the result of the sum is not as expected for values wider than 61 bits.

Debugging:

  • I've compared the Python values with the chunks and they have the same values (considering that chunks are 32 bits and Python digits are 30 bits).
  • If I wait for another rising edge, the adder result is correct.
  • With commits older than f03c200 it works perfectly without changes.
  • I'm getting the following warning: Warning: Ignoring module top because it contains processes (run 'proc' command first).
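The chunk-versus-digit comparison in the first bullet amounts to repacking one little-endian representation into the other. A minimal sketch, assuming 30-bit digits and 32-bit chunks as stated above (the function name `digits30_to_chunks32` is made up for this example and is not part of any wrapper or of cxxrtl):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Repack little-endian 30-bit digits into little-endian 32-bit chunks
// covering `bits` bits in total. Bits beyond the last chunk are dropped.
std::vector<std::uint32_t> digits30_to_chunks32(
        const std::vector<std::uint32_t> &digits, std::size_t bits) {
    std::size_t nchunks = (bits + 31) / 32;
    std::vector<std::uint32_t> chunks(nchunks, 0);
    for (std::size_t i = 0; i < digits.size(); i++) {
        std::uint64_t digit = digits[i] & 0x3fffffffu;
        std::size_t bitpos = 30 * i;      // position of this digit's bit 0
        std::size_t idx = bitpos / 32;    // chunk holding that bit
        unsigned shift = bitpos % 32;
        // A 30-bit digit shifted by at most 31 fits in 64 bits.
        std::uint64_t shifted = digit << shift;
        if (idx < nchunks)
            chunks[idx] |= static_cast<std::uint32_t>(shifted);
        if (idx + 1 < nchunks)
            chunks[idx + 1] |= static_cast<std::uint32_t>(shifted >> 32);
    }
    return chunks;
}
```

Note that a value needs a third 30-bit digit once it exceeds 60 bits, and a second 32-bit chunk once it exceeds 32 bits, so the two representations split a wide value at different bit positions.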

Let me know if I'm doing something wrong or if I can help you.
Thanks
A.

@whitequark
Member Author

> I was developing a Python wrapper for the cxxrtl backend

Note that I already have a specific plan for using cxxsim from nmigen and I will proceed with it. If you were planning to upstream your code to nmigen then you are likely doing duplicate work (of course you are free to do so, I am just making sure this is understood).

> I'm testing a sync adder and the result of the sum is not as expected for values wider than 61 bits.

Please provide an MCVE with a C++ driver for the simulation, similar to the ones I've demonstrated in the PR description. After that I will fix the bug, improve debugging of this pass, and teach how to use the improved debugging facilities to debug any such issues in the future.

@ydnatag

ydnatag commented Dec 19, 2019

> I am just making sure this is understood

Completely understood. Let me know if I can contribute anything.

> Please provide an MCVE with a C++ driver for the simulation, similar to the ones I've demonstrated in the PR description. After that I will fix the bug, improve debugging of this pass, and teach how to use the improved debugging facilities to debug any such issues in the future.

I tried to reproduce the same "bug" in C++ but I couldn't. I will try again next week.

@nakengelhardt
Member

@whitequark I'm triaging PRs. What's the current status for this? If you want to take care of follow-up on this yourself, please assign yourself to it, otherwise we'll occasionally check in to ask if there's anything that needs to be done.

@whitequark whitequark self-assigned this Jan 29, 2020
@whitequark
Member Author

@nakengelhardt The PR can in principle be merged in its current condition, but for it to be useful for its original purpose (simulating nmigen-soc designs) it's still necessary to make some changes, so I'd rather keep it open for a bit longer and merge once it's really done. Assigned to myself.

@whitequark
Member Author

Hierarchical designs can now be converted to C++. (But black boxes are not supported yet.)

@whitequark whitequark force-pushed the write_cxxrtl branch 4 times, most recently from fbe7000 to 6fd5932 Compare April 5, 2020 02:06
@whitequark whitequark changed the title [RFC] write_cxxrtl: new backend write_cxxrtl: new backend Apr 5, 2020
@whitequark whitequark force-pushed the write_cxxrtl branch 4 times, most recently from 90a6da5 to 4a9cae3 Compare April 6, 2020 05:08
This commit adds a basic implementation that isn't very performant
but implements most of the planned features.

This results in massive gains in performance, equally massive
reduction in compile time, and improved readability.

This results in further massive gains in performance, modest decrease
in compile time, and, for designs without feedback arcs, makes it
possible to run eval() once per clock edge in certain conditions.

After this commit, if NDEBUG is not defined, out-of-bounds accesses
cause assertion failures for reads and writes. If NDEBUG is defined,
out-of-bounds reads return zeroes, and out-of-bounds writes are
ignored.

This commit also adds support for memories that start with a non-zero
index (`Memory::start_offset` in RTLIL).

Hierarchical design simulations are generally much slower, but this
comes with a major increase in flexibility:
 1. Since the `flatten` pass currently does not support flattening
    of designs with processes, this is the only way to simulate such
    designs with cxxrtl.
 2. Support for hierarchy paves the way for simulation black boxes,
    which are necessary for e.g. replacing PHYs with C++ code that
    integrates with the host system.

This commit reduces space and time overhead for writable memories
to O(write port count) in both cases; implements handling for write
port priorities; and simplifies runtime representation of memories.

Also, fix the semantics of SET/CLR inputs of the $dffsr cell, and
fix the scheduling of async FF cells to consider ARST/SET/CLR->Q
as a forward combinatorial arc.

Also, fix codegen for $dffe and $adff.
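The out-of-bounds memory behavior described in the commit messages above (assertion failures without NDEBUG, zero reads and ignored writes with NDEBUG, and a non-zero start index) can be sketched as follows. `ToyMemory` is a hypothetical type written for this illustration; it is not the memory class used by the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical memory whose valid addresses are
// [start_offset, start_offset + depth).
struct ToyMemory {
    std::size_t start_offset;
    std::vector<std::uint32_t> cells;

    ToyMemory(std::size_t offset, std::size_t depth)
        : start_offset(offset), cells(depth, 0) {}

    bool in_bounds(std::size_t addr) const {
        return addr >= start_offset && addr - start_offset < cells.size();
    }

    std::uint32_t read(std::size_t addr) const {
        // Without NDEBUG this aborts on a bad address; with NDEBUG the
        // assert compiles away and the fallback below returns zero.
        assert(in_bounds(addr) && "out-of-bounds read");
        if (!in_bounds(addr)) return 0;
        return cells[addr - start_offset];
    }

    void write(std::size_t addr, std::uint32_t value) {
        // Same split: assertion failure in debug builds, ignored write
        // when NDEBUG is defined.
        assert(in_bounds(addr) && "out-of-bounds write");
        if (!in_bounds(addr)) return;
        cells[addr - start_offset] = value;
    }
};
```

For example, `ToyMemory m(16, 4)` accepts addresses 16 through 19; reading address 20 trips the assertion in a debug build and returns 0 when compiled with `-DNDEBUG`.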
Collaborator

@cliffordwolf cliffordwolf left a comment

LGTM

@whitequark whitequark merged commit 7c06cb6 into YosysHQ:master Apr 10, 2020
@whitequark
Member Author

Thanks!

To comment on the future of this backend: it is already very useful and has a high degree of completeness if you're simulating non-techmapped designs, but I have more plans for it. Two things that definitely need to happen are a convenient API for writing C++ blackboxes, and a reflection mechanism for loading generated code from other languages like Python.

However, done and usable is better than perfect and 100% complete, so I merged the PR to let everyone know they can start experimenting with it now.

@udif
Contributor

udif commented Apr 10, 2020

What's your reason for going this route when Verilator already exists?

@Ravenslofty
Collaborator

Ravenslofty commented Apr 10, 2020

nMigen generates RTLIL designs. To use Verilator, nMigen needs to put it through Yosys to get Verilog, put it through Verilator to get C++, and then put it through a compiler to get a binary.

But Yosys can do everything Verilator does, so by directly outputting C++ from RTLIL nMigen has skipped going via Verilog plus all the transformations Verilator makes, and the resulting time to build a simulation model has decreased.

@udif
Contributor

udif commented Apr 10, 2020

Thanks, it's clearer now.
Is the nMigen flow using Yosys like this?
nMigen -> ILANG frontend -> (optional passes by Yosys) -> cxxrtl backend?
Or does nMigen currently call this backend directly through a Python wrapper?

@Ravenslofty
Collaborator

> Is the nMigen flow using Yosys like this?
> nMigen -> ILANG frontend -> (optional passes by Yosys) -> cxxrtl backend?

Yes.

@Ravenslofty
Collaborator

The useful thing about using RTLIL as a target is that anything that can be converted to RTLIL can be output with write_cxxrtl. That means Verilog, nMigen (via read_ilang) and VHDL (via ghdlsynth) can all be combined into a single simulation.

Neither Verilator nor GHDL can do that.

@whitequark
Member Author

> What's your reason for going this route when Verilator already exists?

Verilator is strictly less powerful than cxxrtl. Specifically, cxxrtl can simulate logic with arbitrary feedback paths. This includes clock dividers, ripple counters, D-latches, SR-latches, logic loops and so on, but most importantly, it is useful for benign feedback paths (i.e. in fully synchronous designs where combinatorial paths have to be evaluated more than once because of the way they are translated) that arise in designs with wires driven by multiple processes (each bit being driven by a single process) and non-flattened hierarchical designs.
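As an illustration of simulating a feedback path by repeated evaluation, here is a toy SR latch built from two cross-coupled NOR gates. It is a sketch only: `ToySrLatch`, `settle`, and the iteration limit are hypothetical and say nothing about how the backend schedules evaluation internally. Both gate outputs are computed from the current values and then updated together, and the loop repeats until nothing changes.

```cpp
#include <cassert>

// Toy SR latch: q = NOR(r, qn), qn = NOR(s, q).
struct ToySrLatch {
    bool s = false, r = false;  // inputs
    bool q = false, qn = true;  // outputs, starting in a valid reset state

    // Re-evaluate both gates until the outputs stop changing.
    // Returns the number of delta cycles that changed an output, or -1
    // if the loop did not settle within max_deltas (e.g. it oscillates).
    int settle(int max_deltas = 16) {
        for (int delta = 0; delta < max_deltas; delta++) {
            // Phase 1: compute next values from current ones only.
            bool q_next = !(r || qn);
            bool qn_next = !(s || q);
            if (q_next == q && qn_next == qn) return delta;
            // Phase 2: make both updates visible at once.
            q = q_next;
            qn = qn_next;
        }
        return -1;
    }
};
```

Asserting `s` and calling `settle()` drives `q` high; the latch then holds that value after `s` is released, which a single evaluation pass over the two gates could not guarantee.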

Whether having this power (and the associated tradeoffs) is useful to you is something only you can decide. But the fact that my translator provides it and Verilator doesn't makes it inherently valuable, since at the very least it lets you explore approaches impossible with Verilator.

> Is the nMigen flow using Yosys like this?
> nMigen -> ILANG frontend -> (optional passes by Yosys) -> cxxrtl backend?

There's no direct nMigen support yet, but once that's implemented, this is how it will work.

@whitequark
Member Author

Oh, one more thing on the topic of:

> nMigen -> ILANG frontend -> (optional passes by Yosys) -> cxxrtl backend?

Interestingly, nearly every combination of Yosys optimization passes that I've tried makes the generated C++ slower, not faster. This is in line with my hypothesis (which drove the implementation decisions behind the cxxrtl backend) that keeping the generated code as close to RTLIL as possible (and by extension as close to the original source code as possible) will result in it being structured in a way that can be exploited by C++ compilers, since C++ compilers are (implicitly) tailored for human-written code and this backend, like most HDL toolchains, is also (implicitly) tailored for human-written code.

It is possible that there are kinds of generated RTLIL that would benefit from some kinds of Yosys optimization passes, so of course the ability to do this is valuable. But at the moment, I am almost exclusively relying on the optimizer in the C++ compiler, not in Yosys.

@whitequark whitequark deleted the write_cxxrtl branch June 4, 2020 06:08
6 participants