Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make CXXRTL testbench ~25% faster #1

Merged
merged 1 commit into from Jul 18, 2021
Merged

Make CXXRTL testbench ~25% faster #1

merged 1 commit into from Jul 18, 2021

Conversation

whitequark
Copy link
Contributor

The splitnets command removes a few feedback arcs.

@whitequark
Copy link
Contributor Author

Actually, by removing prep -flatten -top $(TOP); async2sync; instead of adding splitnets -driver the testbench runs ~40% faster, but I'm not sure why you were doing async2sync in first place.

@Wren6991
Copy link
Owner

Oh, thanks! If this is in response to my tweet, the slow speed in that case is entirely my fault and due to how that simulation was running in lockstep with OpenOCD, but extra speed when free-running is really useful.

For me this patch changes the behaviour of the RTL, making the processor rapidly go off into the weeds. These are the CXXRTL output files I get with/without the patch, from Yosys 0.9+4081 (git sha1 4379375d, clang 10.0.0-4ubuntu1 -fPIC -Os):

hazard3-dut-with-without-splitnets.zip

@whitequark
Copy link
Contributor Author

That's interesting--this seems to be a bug! What about removing prep and async2sync?

@Wren6991
Copy link
Owner

...which made me realise that the older free-running tb doesn't have any bounds checking on instruction addresses, oops. Quick fix is on master now.

Actually, by removing prep -flatten -top $(TOP); async2sync;

this also gives me odd behaviour and processor instantly goes off into the weeds. Odd because this is the command line I'm using the the openocd testbench (test/sim/openocd):

SYNTH_CMD += read_verilog -I ../../../hdl $(shell listfiles tb.f);
SYNTH_CMD += hierarchy -top $(TOP); proc; opt_clean; async2sync;
SYNTH_CMD += write_cxxrtl -g4 dut.cpp

And that runs fine. I think the proc; opt_clean avoided getting some wires with funky names that gtkwave choked on.

The async2sync is historic, I think I saw some reset issues back in the early days of CXXRTL which went away when I converted to sync reset. Perhaps that's not needed any more.

@whitequark
Copy link
Contributor Author

Do you have some binaries I can run for a quick test? I think I know what issue you're hitting but I'd like to check it more carefully first.

@Wren6991
Copy link
Owner

Sure, here are -Og -g binaries for those two dut cpp files, and a hello world binary.

hazard3-hellow-binaries-with-without-splitnets.zip

Invoke as:

$ ./tb-debug-without-splitnets hellow.bin
Hello world from Hazard5 + CXXRTL!
CPU requested halt. Exit code 123
Ran for 496 cycles

@whitequark
Copy link
Contributor Author

Okay, you're hitting YosysHQ/yosys#2780. There is an issue with how values are propagated to combinatorial output ports. This can be confirmed by applying the following patch, which papers over the underlying soundness issue:

diff --git a/test/sim/tb_cxxrtl/tb.cpp b/test/sim/tb_cxxrtl/tb.cpp
index efb2fd6..0ccae7f 100644
--- a/test/sim/tb_cxxrtl/tb.cpp
+++ b/test/sim/tb_cxxrtl/tb.cpp
@@ -129,7 +129,7 @@ int main(int argc, char **argv) {
 		if (dump_waves)
 			vcd.sample(cycle * 2);
 		top.p_clk.set<bool>(true);
-		top.step();
+		top.step(); top.step();
 		// Handle current data phase, then move current address phase to data phase
 		uint32_t rdata = 0;
 		if (bus_trans && bus_write) {

I'll have to investigate how to fix this without losing performance.

@Wren6991
Copy link
Owner

There is an issue with how values are propagated to combinatorial output ports

This more or less explains the nonsensical VCDs I was seeing :) thanks!

This can be confirmed by applying the following patch, which papers over the underlying soundness issue

Confirming: this does indeed fix my symptoms with the splitnets added, and also with prep; async2sync removed

@Wren6991 Wren6991 merged commit 12bf9bb into Wren6991:master Jul 18, 2021
@whitequark
Copy link
Contributor Author

I'd suggest removing prep instead since it gives an even larger speedup, at least once the upstream bug is fixed.

@whitequark whitequark deleted the patch-1 branch July 18, 2021 15:06
@Wren6991
Copy link
Owner

Wren6991 commented Jul 18, 2021

Oops, yeah. Even with the step(); step(); workaround, the prep-less flow is as fast as what I had before, so I'll go with that. I do still need a

hierarchy -top $(TOP); proc; opt_clean;

Else I get unescaped illegal characters in VCD signal names

GTKWave Analyzer v3.3.103 (w)1999-2019 BSI

Near byte 260, $VAR parse error encountered with 'core.alu.\msb$func$/home/luke/proj/hazard3/hdl/arith/hazard3_aluv'

IIRC flattening hierarchy also gets rid of these but makes the VCDs confusing when you have a design with lots of interaction across hierarchy levels. This is probably just hiding the symptom, probably the VCD writer just needs to escape those? I haven't looked at the code (edit; if I look deeper I will raise an issue :) )

@whitequark
Copy link
Contributor Author

IIRC flattening hierarchy also gets rid of these but makes the VCDs confusing when you have a design with lots of interaction across hierarchy levels

Hold on. You should have the exact same VCD from a flattened and non-flattened design; CXXRTL uses the hdlname attribute to ensure that.

@Wren6991
Copy link
Owner

Sorry; when I said flattened I should have said prep -flattened, my bad, I believe it was some of the passes in there like opt_merge that resulted in nets apparently disappearing at some hierarchy levels

I don't think prep was ever recommended, so that one's my fault for sure

@Wren6991
Copy link
Owner

My brain is fried from gdb debugging, sorry to waste your time by being imprecise

@whitequark
Copy link
Contributor Author

It's okay ^^

@whitequark
Copy link
Contributor Author

This is probably just hiding the symptom, probably the VCD writer just needs to escape those?

What's happening is that GTKWave is hitting :, attempting to parse it as a [msb:lsb] slice, and failing. Please enjoy the corresponding code in the GTKWave VCD parser:

https://github.com/gtkwave/gtkwave/blob/bc06f0fabd174d44f50647b09d6f7c702b3b6c40/gtkwave3/src/vcd.c#L455-L608

@whitequark
Copy link
Contributor Author

sigh YosysHQ/yosys#2881

@Wren6991
Copy link
Owner

Ok, so with my hack from YosysHQ/yosys#2883 I am now enjoying my colons in the sideways orientation, thank you very much

Wren6991 added a commit that referenced this pull request Jul 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants