cost: add keep_hierarchy pass with max_cost argument #4344

Open
wants to merge 20 commits into
base: main


widlarizer
Collaborator

Flattening modules improves QoR in practice, but it also makes Yosys take much longer to run.

This PR creates cost.cc with linear cost models for almost all internal cell types to estimate the size of a module after techmapping. To get most of these numbers, I used a modified version of the test_cell command, see emil/gather-cell-size.

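This PR also adds the keep_hierarchy pass, which marks all selected modules with the keep_hierarchy attribute. It takes an optional integer -max_cost argument that sets a maximum estimated cost threshold; as the log below shows, modules whose estimated cost exceeds the threshold are the ones that get marked.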

6. Executing keep_hierarchy pass.
Marking top.cpu (module too big: 8880 > 100).
Marking top.cpu.alu (module too big: 2068 > 100).
Marking top.cpu.brancher (module too big: 515 > 100).
Marking top.cpu.datamem (module too big: 1235 > 100).
Marking top.cpu.imm_decoder (module too big: 226 > 100).
Marking top.cpu.regfile (module too big: 4631 > 100).
<suppressed ~635 debug messages>

7. Executing FLATTEN pass (flatten design).
Keeping top.cpu.regfile (found keep_hierarchy attribute).
Keeping top.cpu.imm_decoder (found keep_hierarchy attribute).
Keeping top.cpu.datamem (found keep_hierarchy attribute).
Keeping top.cpu.brancher (found keep_hierarchy attribute).
Keeping top.cpu.alu (found keep_hierarchy attribute).
Deleting now unused module top.cpu.controller.
Deleting now unused module top.cpu.writeback.
<suppressed ~2 debug messages>

Effects on runtime and QoR with OpenROAD: TBD
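
Editor's sketch: the description above combines two ideas, a linear per-cell cost estimate and a threshold test against -max_cost. The C++ fragment below illustrates both under stated assumptions. `Cell`, `coefficient`, `cell_cost`, `module_cost` and `should_keep_hierarchy` are hypothetical names, not the code in kernel/cost.cc. The coefficient 8 for $shift and $shiftx comes from the diff excerpt reviewed later in this thread, and 4 for $shr is implied by the "twice the cost of $shr" remark; which width measure each coefficient multiplies, and the default of 1, are assumptions.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a cell: its type name and one width measure
// (assumed here to be the sum of its port widths).
struct Cell {
    std::string type;
    unsigned int width;
};

// Illustrative coefficients only; the real table lives in kernel/cost.cc.
unsigned int coefficient(const std::string &type)
{
    static const std::map<std::string, unsigned int> coefs = {
        {"$shr", 4}, {"$shift", 8}, {"$shiftx", 8},
    };
    auto it = coefs.find(type);
    return it == coefs.end() ? 1 : it->second; // default of 1 is an assumption
}

// Linear model: estimated cost = coefficient(type) * width measure.
unsigned int cell_cost(const Cell &cell)
{
    return coefficient(cell.type) * cell.width;
}

// A module's estimate is the sum over its cells.
unsigned int module_cost(const std::vector<Cell> &cells)
{
    unsigned int total = 0;
    for (const auto &cell : cells)
        total += cell_cost(cell);
    return total;
}

// Mirrors the "module too big: cost > max_cost" decision seen in the log.
bool should_keep_hierarchy(const std::vector<Cell> &cells, unsigned int max_cost)
{
    return module_cost(cells) > max_cost;
}
```

With a strict comparison, a module whose estimate equals the threshold is still flattened, which matches the `>` printed in the log messages.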

@whitequark
Member

This PR creates cost.cc with linear cost models for almost all internal cell types to estimate the size of a module after techmapping.

I'm really interested in your methodology here--can you explain it in detail?

@widlarizer
Collaborator Author

widlarizer commented Apr 18, 2024

  • squash before merge
  • add mul, div

@widlarizer
Collaborator Author

I'm really interested in your methodology here--can you explain it in detail?

@whitequark Sure: I used test_cell to generate cells of random sizes and to techmap them. Then I parsed the dump and stats to get the gate count and the input/output widths, and stared at plots like this one. I looked for upper bounds on gate count with regard to output port width, the sum of all port widths, and the largest input port width. This worked better than I expected. For cells unsupported by test_cell, I looked at their techmap or semantics. There's a TODO comment for the ones I'm not sure I should expect right before flattening, namely FSM and memory.

image
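
Editor's sketch: the coefficients in this PR were picked by eye from plots, not computed. Purely as an illustration of what "an upper bound on gate count with regard to a port width" means, the fragment below computes the smallest integer coefficient that bounds every sample. `Sample` and `upper_bound_coefficient` are hypothetical names, and the fitting procedure is an assumption, not the author's method.

```cpp
#include <vector>

// One techmapped sample: the chosen width measure and the resulting gate count.
struct Sample {
    unsigned int width; // e.g. output port width
    unsigned int gates; // gate count after techmapping
};

// Smallest integer c such that c * width >= gates for every sample.
unsigned int upper_bound_coefficient(const std::vector<Sample> &samples)
{
    unsigned int coef = 0;
    for (const auto &s : samples) {
        if (s.width == 0)
            continue; // a zero-width sample cannot constrain a purely linear bound
        unsigned int needed = (s.gates + s.width - 1) / s.width; // ceiling division
        if (needed > coef)
            coef = needed;
    }
    return coef;
}
```

A bound fitted this way is only as good as the samples: a cell with a constant offset in its gate count can push the coefficient up sharply at small widths.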

kernel/cost.cc Outdated

static unsigned int y_coef(RTLIL::IdString type)
{
// clang-format off

Member

Ew. This is convincing me more and more that we should not use clang-format...

Collaborator Author

It's possible that it doesn't fit our repository. This really should have been a switch/case, which would have avoided the problem, but ID($...) isn't a constant value. Personally, I'm very used to hitting Ctrl+Shift+I to format the entire file I'm working on in VS Code. For shared files, I intend to get used to formatting modified lines only, which VS Code lets me bind to a shortcut.

Member

You can use type.in(...) which may work better.
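
Editor's sketch: the suggestion is to replace a chain of `type == ID(...) ||` comparisons with a single membership test. The fragment below shows the shape of that pattern with a hypothetical stand-in `IdString` struct so that it compiles on its own; it is not the Yosys class. The returned 8 is taken from the diff excerpt in this thread, and the fallback of 0 is a placeholder for this sketch only.

```cpp
#include <string>

// Hypothetical stand-in for RTLIL::IdString, just enough to show the pattern.
struct IdString {
    std::string name;

    bool operator==(const IdString &other) const { return name == other.name; }

    // Base case: membership in an empty list is false.
    bool in() const { return false; }

    // True if this identifier equals any of the arguments.
    template <typename... Rest>
    bool in(const IdString &first, const Rest &...rest) const
    {
        return *this == first || in(rest...);
    }
};

unsigned int y_coef(const IdString &type)
{
    // One call instead of a chain of == comparisons joined by ||.
    if (type.in(IdString{"$shift"}, IdString{"$shiftx"}))
        return 8;
    return 0; // placeholder: all other cell types are omitted from this sketch
}
```

Each group of cell types then fits on one line, which leaves less for an automatic formatter to rearrange.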

kernel/cost.h Outdated
Comment on lines 61 to 62
{ ID($_DFF_P_), 1 },
{ ID($_DFF_N_), 1 },

Member

We should probably add all of the DFF and latch types here, but I don't know what a reasonable cost estimate for them would be (1 seems off).

Member

We can look at the sky130hd and asap7 areas for those and the other cells

Collaborator Author

I'd say that's out of scope. Want me to make an issue? I added these only because I noticed stat.cc was patching them in on its side, so I consolidated that here.

Member

What is using these costs? Is there any information about where they come from and what they are supposed to model?

Collaborator Author

stat.cc uses a CMOS transistor estimate for these (16). Nothing uses their default gate count estimates. I was using this logic: by being primitives to be techmapped to, they have a default gate count of 1.

Member

by being primitives to be techmapped to, they have a default gate count of 1

I am not sure I follow. Wouldn't that make the cost 1 for all the primitives in the list?

kernel/cost.cc Outdated
} else if (// shift
type == ID($shift) ||
type == ID($shiftx)) {
return 8;

Member

Why is this twice the cost of $shr? I'm very unconvinced that the cost model is sound.

Collaborator Author

I'll inspect some techmapped examples of these gates on Monday

Member

@whitequark left a comment

I'm quite interested in the functionality of this PR, but for me to be convinced that it's a good addition, it needs the following:

  • A definition of what the costs mean, as well as a clear statement of who they are suitable for and who they are not.
  • A clear methodology for calculating the cost, which is not a part of some random script in someone else's repository, but a part of Yosys itself. This includes both:
    • A description of the methodology in prose.
    • An executable that calculates the costs according to it.

@widlarizer
Collaborator Author

An executable that calculates the costs according to it

This is pretty heavy for the sole current use case, which is a heuristic for flattening only modules that aren't huge. For what it's worth, if these were all set to sum*1 except for mul+div, it would probably achieve similar behavior (I can try this out now that I have the OpenROAD benchmarks running locally). I can't anticipate other possible uses, so I used realistic coefficients, which I arrived at by staring at those plots. They aren't meant to capture the tightest possible upper bound perfectly, though. A simple executable might even fail at random to produce a reasonable coefficient, because of constant offsets, when something changes in techmap down the line.

@widlarizer widlarizer requested a review from zachjs as a code owner May 7, 2024 12:39
@widlarizer
Collaborator Author

Sorry for the spam @zachjs, I accidentally committed changes in ast that I had only made while experimenting.

@widlarizer widlarizer removed the request for review from zachjs May 7, 2024 12:44
@widlarizer
Collaborator Author

widlarizer commented May 7, 2024

@whitequark test_cell is now capable of checking whether the cost is a correct post techmap gate count upper bound. This means the coefficients aren't generated programmatically, but at least they are verified, at least for cells covered by test_cell functionality. Use case example: test_cell -noeval -nosat -check_cost -bloat 4 all. I have also split it away from existing default_gate_cost beyond a simple check. Let me know what you think.

  • check why some cells (comparison, $sop) are failing and whether it comes down to a constant offset
  • find a cost model for $lut
  • add a statistics result (the share of each cell type's instances that are larger than estimated, and by how much the worst/typical offender exceeded the estimate, in relative and absolute numbers)

Open questions:

  • Can it be used in testing, given the rather long run time needed to generate a significant number of cells?

@whitequark
Member

test_cell is now capable of checking whether the cost is a correct post techmap gate count upper bound. This means the coefficients aren't generated programmatically, but at least they are verified, at least for cells covered by test_cell functionality.

That is much better! I'll take a closer look a bit later.

@povik
Member

povik commented May 7, 2024

find a cost model for $lut

I propose we supply a conservative upper bound based on the width of the $lut cell alone; that is, we shouldn't bother inspecting the LUT parameter. Usually $lut cells are not entered by the user but are produced only by LUT-mapping passes, so there isn't much of a use case where someone would apply the cost model to them. For that reason, let's go with what's easiest but correct.
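
Editor's sketch: one width-only bound that ignores the LUT parameter follows from treating a k-input LUT as a 2^k-to-1 multiplexer over constant data, which a tree of 2^k - 1 two-input muxes implements. The fragment below computes that count. `lut_cost_upper_bound` is a hypothetical name, the bound is counted in two-input muxes, and how that maps onto the gate units used elsewhere in this PR is left open; nothing here is confirmed to be what the PR adopted.

```cpp
#include <cstdint>
#include <limits>

// Width-only bound for a k-input LUT, counted in two-input muxes:
// a full mux tree over 2^k constant leaves has 2^k - 1 internal nodes.
std::uint64_t lut_cost_upper_bound(unsigned int width)
{
    if (width >= 64)
        return std::numeric_limits<std::uint64_t>::max(); // saturate instead of overflowing
    return (std::uint64_t{1} << width) - 1;
}
```

The bound grows exponentially with width, so it is conservative by design: any LUT wider than a handful of inputs would exceed a small -max_cost on its own.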

@widlarizer
Collaborator Author

Current status and intended usage are shown below. I think I'll have to leave it as is for now: it's good enough as a heuristic, and I should move on to more pressing topics.

test_cell -noeval -nosat -bloat 4 -check_cost \$not \$pos \$neg \$and \$or \$xor \$xnor \$reduce_and \$reduce_or \$reduce_xor \$reduce_xnor \$reduce_bool \$shl \$shr \$sshl \$sshr \$shift \$shiftx \$lt \$le \$eq \$ne \$ge \$gt \$add \$sub \$logic_not \$logic_and \$logic_or
Warning: Cell type $shl failed in 3.0% cases with worst offender being by 12 (120.0%)
Warning: Cell type $sshl failed in 2.0% cases with worst offender being by 11 (110.0%)

test_cell -noeval -nosat -bloat 1 -check_cost \$mux \$bmux \$demux
Warning: Cell type $demux failed in 8.0% cases with worst offender being by 30 (23.4%)

test_cell -noeval -nosat -check_cost -bloat 2 \$mul \$div \$mod \$divfloor \$modfloor 
Warning: Cell type $div failed in 1.0% cases with worst offender being by 66 (6.7%)
Warning: Cell type $modfloor failed in 3.0% cases with worst offender being by 433 (33.8%)

test_cell -noeval -nosat -bloat 2 -check_cost \$alu \$lcu \$fa
Warning: Cell type $alu failed in 20.0% cases with worst offender being by 13 (10.2%)

test_cell -noeval -nosat -noopt -bloat 2 -check_cost \$lut \$sop
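
Editor's sketch: the warnings above report, per cell type, the share of generated cells whose techmapped gate count exceeded the estimate, plus the worst excess in absolute and relative terms. The fragment below shows one way such a summary could be computed. `Check`, `CostStats` and `summarize` are hypothetical names, and taking the percentage relative to the estimate is an assumption about how the relative number is defined.

```cpp
#include <vector>

// One generated cell: the model's estimate and the gate count after techmap.
struct Check {
    unsigned int estimated;
    unsigned int actual;
};

struct CostStats {
    double fail_share_percent = 0.0;   // share of cells exceeding their estimate
    unsigned int worst_excess = 0;     // largest absolute excess
    double worst_excess_percent = 0.0; // that excess relative to its estimate (assumed)
};

CostStats summarize(const std::vector<Check> &checks)
{
    CostStats stats;
    if (checks.empty())
        return stats;

    unsigned int failures = 0;
    for (const auto &c : checks) {
        if (c.actual <= c.estimated)
            continue; // the estimate held as an upper bound
        failures++;
        unsigned int excess = c.actual - c.estimated;
        if (excess > stats.worst_excess) {
            stats.worst_excess = excess;
            stats.worst_excess_percent =
                c.estimated ? 100.0 * excess / c.estimated : 0.0;
        }
    }
    stats.fail_share_percent = 100.0 * failures / checks.size();
    return stats;
}
```

For example, with made-up inputs where one cell is estimated at 10 gates and techmaps to 22, the worst excess is 12, which is 120% of the estimate under this definition.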

@widlarizer widlarizer added the status-needs-review Status: Needs reviewers to move forward label May 24, 2024

4 participants