Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add constant network support to FPGA interchange arch #591

Merged
merged 15 commits into from
Feb 25, 2021

Conversation

litghost
Copy link
Contributor

@litghost litghost commented Feb 19, 2021

This adds initial constant network support (along side changes in chipsalliance/python-fpga-interchange#23). I believe logic needs to be added to combine GND/VCC cells/nets, but this doesn't appear to matter with the initial const wire designs. Once I have an example netlist with multiple GND/VCC cells/nets, I'll add the remaining logic.

@litghost litghost changed the title Add constant network Add constant network support to FPGA interchange arch Feb 19, 2021
@gatecat
Copy link
Member

gatecat commented Feb 19, 2021

Approach here looks good. Some more work might be needed in the future to improve performance of constant routing where you have a single wire with a very large number of pips fanning off it; the backwards routing heuristic in router2 works well for this but doesn't currently deal with congestion which will fall back to regular forwards routing which will have to iterate over a lot of pips.

This was more of a problem for xcup rather than xc7 in my nextpnr-xilinx experience, due to GND constants needing to come from a nearby LUT creating more possibilities for congestion (also xcup devices just being larger). One thing I did was to have a three level "global->row->tile" type structure for constant wires so the router could make targeted progress; but another solution would be to special-case this in the router, for example always using backwards routing for constants and extending the backwards router to work around congestion.

@litghost
Copy link
Contributor Author

the backwards routing heuristic in router2 works well for this but doesn't currently deal with congestion which will fall back to regular forwards routing which will have to iterate over a lot of pips.

My plan is to have the site router push all constants to the site pins, which will address this explicitly. Because the site router ensures a congestion free site routing, and dedicated site constant sources are preferred over routed constant sources, and because of site inverter <-> constant net relationships, handling this in the site router should avoid any congestion.

@litghost
Copy link
Contributor Author

litghost commented Feb 19, 2021

This was more of a problem for xcup rather than xc7 in my nextpnr-xilinx experience, due to GND constants needing to come from a nearby LUT creating more possibilities for congestion (also xcup devices just being larger). One thing I did was to have a three level "global->row->tile" type structure for constant wires so the router could make targeted progress; but another solution would be to special-case this in the router, for example always using backwards routing for constants and extending the backwards router to work around congestion.

Ya, this might cause some problems. The current constant network does not have pseudo site pips to the LUT outputs, though it needs them for US. xc7 graphs are easier in this regard. I think this is a case of "crawl", "walk", "run", e.g. fix it once we can demostrate the issue.

Overall I think the constant network structure is transparent to the arch, and can be tweaked as needed as problems arise.

@litghost litghost force-pushed the add_constant_network branch 4 times, most recently from 94a0495 to cd6d05d Compare February 23, 2021 21:58
@litghost
Copy link
Contributor Author

@gatecat This is ready for review when you get a chance. I've gotten a counter design mostly working, but I need to fix LUT rotation / port sharing before that is ready.

Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Also add debug_test target to debug archcheck.

Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Fixes:
 - Only use map constant pins during routing, and not during placement.
 - Unmapped cell ports have no BEL pins.
 - Fix SiteRouter congestion not taking into account initial expansion.
 - Fix psuedo-site pip output.

Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
delay_t base = 30 * std::min(std::abs(dst_x - src_x), 18) + 10 * std::max(std::abs(dst_x - src_x) - 18, 0) +
60 * std::min(std::abs(dst_y - src_y), 6) + 20 * std::max(std::abs(dst_y - src_y) - 6, 0) + 300;

base = (base * 3) / 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for this lookahead to be more effective, I would also suggest returning a pip delay on the order of 100ps

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

case IN_ROUTING:
NPNR_ASSERT(wire_data.site != -1);
if (wire.tile == src_wire.tile && wire_data.site == src_wire_data.site) {
// Dedicated routing won't have straight loops,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure this is a universal truth? It does seem like a good heuristic, but I'm wondering if there are places where combinations of routethroughs/loopbacks might violate this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd need to see an example of where a straight loop forms a dedicated interconnect (e.g. placement dependent). All the cases I know are site to another site. Do you know of any counter examples?

The good news is if this logic fails on an arch, it will manifest eventually as a routing failure, which will generate a bug report, rather than an optimization failure.

// Do detailed routing check to ensure driver can reach sink.
//
// FIXME: This might be too slow, but it handles a case on
// SLICEL.COUT -> SLICEL.CIN has delta_y = {1, 2}, but the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be a problem when we start inserting relative placement constraints (which should give much better performance than essentially relying on trial and error) as these require fixed dx/dy values. nextpnr-xilinx punted on this but I think a way of specifying the "next" bel/site might make sense as a future API improvement.

Copy link
Contributor Author

@litghost litghost Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that relative placement constraints should be emitted where possible. Something that is tricky here is that the use of dedicated interconnect is not 1-to-1 with placement constraints. Carry chains and DSP chains are good candidates for relative placement constraints, but the IFF/OFF cases are a good example where it should be up to the placer to determine if it wants to use the IFF/OFF. It might make more sense to use a SLICEL/M FF over an IO FF depending on the shape and fanout of the signals. Just because an IO FF is possible doesn't mean we should constraint the solution to that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is an interesting thing to think about. Somewhat related is the dedicated LUT->FF routing which you want to use if you can; but definitely don't want to force all the time.

One idea I had in Nexus was a separate post-place pass that performs low-distance swaps only with the goal of using this dedicated interconnect (currently only LUT-FF, I've been meaning to look at doing similar with the dedicated fast LUT-LUT pattern that Nexus has too): https://github.com/YosysHQ/nextpnr/blob/master/nexus/post_place.cc

A generic version of that pass as a post-place optimisation to opportunistically exploit dedicated routing, by swapping things within a site or between adjacent sites, could definitely improve things from a routeability/timing point of view down the line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is an interesting thing to think about. Somewhat related is the dedicated LUT->FF routing which you want to use if you can; but definitely don't want to force all the time.

Yep, and same story with LUT -> CARRY and CARRY -> FF placement. In general, there needs to be a way to align site local placement to exploit specialized site routing to achieve maximal packing density. Once I debug the FPGA arch site router for correctness, I believe this is going to be an important improvement during HeAP legalization to achieve really good results.

continue;
}

// This net doesn't have a driver, probably not valid?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undriven nets aren't necessarily a problem (for example if you connect a net to an input but don't drive it); in other arches we tend to skip over them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you mean to an output, but don't drive it. A disconnected cell primitive input is a bug in most cases I can think of. It must be either 1/0/signal, rather than "unknown".

continue;
}

for (size_t i = 0; i < bel_data.num_bel_wires; ++i) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mixed signedness compare warning (I'm minded to tighten up the compiler flags so more of these things become CI failures, not sure what your opinion is on that)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please tighten up the flags. We also should emit -Wxxx on local builds. I'm using clang and it is capable of emitting the same warnings as the CI, but that isn't the current default (I believe).

auto downhill_iter = pip_downhill.find(wire);
if(downhill_iter == pip_downhill.end()) {
if(root_wire != wire) {
log_warning("Wire %s never entered the real fabric?\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be a warning or an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a warning right now because the constant merging logic always emits the VCC/GND root, even if unused. Then when walking that root, it won't enter the real fabric. In the long term we can prune away an unused VCC/GND root and make this an error again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair as a temporary thing; from an end-user point of view I don't much like the idea of a warning that isn't immediately end-user-actionable but I imagine it will be long gone by the time this is end-user-read.

@gatecat
Copy link
Member

gatecat commented Feb 24, 2021

my attempt to build the bba locally is failing with:

capnp.lib.capnp.KjException: home/david/nextpnr-master/fpga_interchange/examples/create_bba/build/fpga-interchange-schema/interchange/DeviceResources.capnp:546: failed: Union must have at least two members.

I think this is related to the LUT init stuff based on the line number; and checking out the fpga-interchange-schema commit before that PR was merged seems to work OK.

@litghost
Copy link
Contributor Author

my attempt to build the bba locally is failing with:

capnp.lib.capnp.KjException: home/david/nextpnr-master/fpga_interchange/examples/create_bba/build/fpga-interchange-schema/interchange/DeviceResources.capnp:546: failed: Union must have at least two members.

I think this is related to the LUT init stuff based on the line number; and checking out the fpga-interchange-schema commit before that PR was merged seems to work OK.

Ya, I found this too and need to fix it. Need to stand up a CI on fpga-interchange-schema to catch this kind of stuff.

@litghost
Copy link
Contributor Author

my attempt to build the bba locally is failing with:

capnp.lib.capnp.KjException: home/david/nextpnr-master/fpga_interchange/examples/create_bba/build/fpga-interchange-schema/interchange/DeviceResources.capnp:546: failed: Union must have at least two members.

I think this is related to the LUT init stuff based on the line number; and checking out the fpga-interchange-schema commit before that PR was merged seems to work OK.

I've fixed the bug and stood up a simple CI to find these issues with chipsalliance/fpga-interchange-schema#13

@gatecat gatecat merged commit ab8dfcf into YosysHQ:master Feb 25, 2021
@litghost litghost deleted the add_constant_network branch February 25, 2021 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants