Real Verilog tables from Chisel Vecs #983

Open
schoeberl opened this issue Jan 12, 2019 · 5 comments

@schoeberl

Tables in Chisel are Vecs, but the emitted Verilog is a priority MUX. As a result, Chisel-generated tables end up as LUT logic in an FPGA, although they could be implemented in on-chip memory. The LUT implementation needs more resources and has a lower maximum frequency.

In my tutorials on Chisel I usually mention it as a big advantage that we can use Scala to generate logic tables, e.g., microcode for a state machine, code for a processor, or a lookup table for functions. All those large tables should end up in on-chip RAM in an FPGA.

Therefore, I request that tables (Vecs) generate a case statement that FPGA synthesis tools recognize and map to on-chip ROM.
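
To make the requested output shape concrete, here is a minimal sketch in plain Scala (no Chisel dependency) that renders a table as a Verilog module built around a single `case` statement. The object name `CaseRomGen`, the port names `addr` and `data`, and the zero `default` arm are assumptions made for illustration; this is not what Chisel or FIRRTL emits today. Whether a given synthesis tool maps the result to block memory also depends on the tool and on the surrounding registers.

```scala
object CaseRomGen {
  /** Renders `table` as a Verilog module whose body is one `case` statement.
    * Module layout and port names are hypothetical, chosen for illustration. */
  def emit(name: String, table: Seq[BigInt], dataWidth: Int): String = {
    require(table.nonEmpty, "table must not be empty")
    require(table.forall(v => v >= 0 && v.bitLength <= dataWidth),
      "every entry must be non-negative and fit in dataWidth bits")
    // Address width: enough bits to index every entry (at least 1).
    val addrWidth = math.max(1, BigInt(table.length - 1).bitLength)
    // One case arm per table entry, value printed in hexadecimal.
    val arms = table.zipWithIndex.map { case (v, i) =>
      s"      ${addrWidth}'d${i}: data = ${dataWidth}'h${v.toString(16)};"
    }
    val header = Seq(
      s"module ${name}(",
      s"  input  wire [${addrWidth - 1}:0] addr,",
      s"  output reg  [${dataWidth - 1}:0] data",
      ");",
      "  always @(*) begin",
      "    case (addr)")
    val footer = Seq(
      // Assumed default so that unused addresses do not infer a latch.
      s"      default: data = ${dataWidth}'h0;",
      "    endcase",
      "  end",
      "endmodule")
    (header ++ arms ++ footer).mkString("\n")
  }
}
```

Calling `CaseRomGen.emit("rom", Seq.tabulate(1024)(i => BigInt(i & 0xff)), 8)` would produce a 1024-entry, 8-bit table, the same 1 KB size as the example discussed in this issue.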

I've put up an example of a 1 KB table to illustrate the case:

git clone https://github.com/schoeberl/chisel-playground.git

cd chisel-playground
make table-verilog

and run synthesis with Intel Quartus (project file quartus/issue-table.qpf) for a real Verilog table.

Results:

10 logic elements and 8192 memory bits, fmax 250 MHz (restricted to this)

Generate the Chisel based table with:

make table-chisel

and synthesize:

870 logic elements, no memory bits, fmax 191 MHz

As long as synthesis tools (Quartus, at least) cannot extract a simple lookup table from priority MUX code, I strongly propose that Chisel/FIRRTL emit a case-based table for a Vec.

Cheers,
Martin

@schoeberl

An update on the issue: with Chisel 3.4 the generated code is a bit better than with 3.2, but it is still a priority MUX. This led to the following regression when updating the Patmos processor from Chisel 2 to Chisel 3 and synthesizing for an (old) FPGA:

Chisel 2: fmax 80 MHz, fetch stage 740 LCs
Chisel 3: fmax 67 MHz, fetch stage 3170 LCs

The Verilog generated by Chisel 2 can be mapped to on-chip memory; the Chisel 3 code cannot.

Chisel 2 Verilog is:

  always @(*) case (T20)
    0: T18 = 32'h87c20000;
    1: T18 = 32'h87c40000;
    2: T18 = 32'h3e1000;
    3: T18 = 32'h2402026;
    4: T18 = 32'hf0010000;
...

Chisel 3 Verilog is:

  wire [31:0] _GEN_8 = 9'h1 == addrEven[9:1] ? 32'h20700 : 32'h54; // @[Fetch.scala 100:26 Fetch.scala 100:26]
  wire [31:0] _GEN_9 = 9'h2 == addrEven[9:1] ? 32'h20800 : _GEN_8; // @[Fetch.scala 100:26 Fetch.scala 100:26]
  wire [31:0] _GEN_10 = 9'h3 == addrEven[9:1] ? 32'h2402025 : _GEN_9; // @[Fetch.scala 100:26 Fetch.scala 100:26]
  wire [31:0] _GEN_11 = 9'h4 == addrEven[9:1] ? 32'h87c20000 : _GEN_10; // @[Fetch.scala 100:26 Fetch.scala 100:26]
  wire [31:0] _GEN_12 = 9'h5 == addrEven[9:1] ? 32'h2821085 : _GEN_11; // @[Fetch.scala 100:26 Fetch.scala 100:26]
  wire [31:0] _GEN_13 = 9'h6 == addrEven[9:1] ? 32'h2021062 : _GEN_12; // @[Fetch.scala 100:26 Fetch.scala 100:26]
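
The two listings describe the same function, which is why a case-based emission would not change behavior. The following plain-Scala model is a sketch of that argument: it folds over the table the way the `_GEN_n` chain does and compares the result with a direct indexed read. It assumes that the innermost default of the chain (`32'h54` above) is entry 0 of the table; the object and method names are hypothetical.

```scala
object MuxChainModel {
  /** Models the `_GEN_n` chain: start from entry 0, then let each later
    * index override the previous result when it equals the address. */
  def muxChain(table: IndexedSeq[Long], addr: Int): Long =
    (1 until table.length).foldLeft(table(0)) { (prev, i) =>
      if (i == addr) table(i) else prev
    }

  /** Models the case-style table: a direct indexed read. */
  def lookup(table: IndexedSeq[Long], addr: Int): Long = table(addr)

  /** True when both descriptions agree for every in-range address. */
  def equivalent(table: IndexedSeq[Long]): Boolean =
    table.indices.forall(a => muxChain(table, a) == lookup(table, a))
}
```

Because each comparison in the chain tests a distinct constant, at most one arm can match, so the priority order never matters. The difference is only in how easily a synthesis tool recognizes the structure as a table.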

Cheers,
Martin

@ekiwi

ekiwi commented Jan 13, 2021

With Chisel v3.4 you can initialize memories (emitted as an init block in Verilog).
So you could try to build a ROM like this: https://scastie.scala-lang.org/QmN0vvjwRda0MBdx5YLxmw (see the MemROM)

The one bug that still needs to be fixed is that the init block is omitted when SYNTHESIS is defined (which is the case for some FPGA tools).

In general I would prefer to switch to the read-only Mem implementation for ROMs, since that one is super easy to recognize in the backend, and we could easily adjust the code emission if we find an approach that is superior to a Verilog array with an init block.

@schoeberl

Thanks for your proposal. But it looks a bit like a workaround for a Verilog emission issue. Using a plain Vec for a logic table is more elegant than the preloading of a ROM.

And the same issue pops up when using a switch statement. I have an example of a small ALU using a switch statement that needs twice as many logic cells when coded in Chisel as when coded directly in Verilog. The fmax took a hit as well.

@ekiwi

ekiwi commented Jan 14, 2021

> Using a plain Vec for a logic table is more elegant than the preloading of a ROM.

I think using a Memory to model a ReadOnlyMemory makes a lot of sense. In general, Chisel could have gone the route of trying to infer memories from Vec instead of adding an explicit Mem construct, but it didn't.

I do agree that your ALU should not blow up. We could probably write a compiler pass that detects switch statements and tries to emit them as such.

@schoeberl

schoeberl commented Jan 15, 2021 via email
