-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add constant network support to FPGA interchange arch #591
Conversation
Approach here looks good. Some more work might be needed in the future to improve performance of constant routing where you have a single wire with a very large number of pips fanning off it; the backwards routing heuristic in router2 works well for this but doesn't currently deal with congestion which will fall back to regular forwards routing which will have to iterate over a lot of pips. This was more of a problem for xcup rather than xc7 in my nextpnr-xilinx experience, due to GND constants needing to come from a nearby LUT creating more possibilities for congestion (also xcup devices just being larger). One thing I did was to have a three level "global->row->tile" type structure for constant wires so the router could make targeted progress; but another solution would be to special-case this in the router, for example always using backwards routing for constants and extending the backwards router to work around congestion. |
My plan is to have the site router push all constants to the site pins, which will address this explicitly. Because the site router ensures a congestion free site routing, and dedicated site constant sources are preferred over routed constant sources, and because of site inverter <-> constant net relationships, handling this in the site router should avoid any congestion. |
Ya, this might cause some problems. The current constant network does not have pseudo site pips to the LUT outputs, though it needs them for US. xc7 graphs are easier in this regard. I think this is a case of "crawl", "walk", "run", e.g. fix it once we can demostrate the issue. Overall I think the constant network structure is transparent to the arch, and can be tweaked as needed as problems arise. |
94a0495
to
cd6d05d
Compare
@gatecat This is ready for review when you get a chance. I've gotten a counter design mostly working, but I need to fix LUT rotation / port sharing before that is ready. |
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Also add debug_test target to debug archcheck. Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
Fixes: - Only use map constant pins during routing, and not during placement. - Unmapped cell ports have no BEL pins. - Fix SiteRouter congestion not taking into account initial expansion. - Fix psuedo-site pip output. Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
cd6d05d
to
a30043c
Compare
delay_t base = 30 * std::min(std::abs(dst_x - src_x), 18) + 10 * std::max(std::abs(dst_x - src_x) - 18, 0) + | ||
60 * std::min(std::abs(dst_y - src_y), 6) + 20 * std::max(std::abs(dst_y - src_y) - 6, 0) + 300; | ||
|
||
base = (base * 3) / 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for this lookahead to be more effective, I would also suggest returning a pip delay on the order of 100ps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
case IN_ROUTING: | ||
NPNR_ASSERT(wire_data.site != -1); | ||
if (wire.tile == src_wire.tile && wire_data.site == src_wire_data.site) { | ||
// Dedicated routing won't have straight loops, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure this is a universal truth? It does seem like a good heuristic, but I'm wondering if there are places where combinations of routethroughs/loopbacks might violate this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd need to see an example of where a straight loop forms a dedicated interconnect (e.g. placement dependent). All the cases I know are site to another site. Do you know of any counter examples?
The good news is if this logic fails on an arch, it will manifest eventually as a routing failure, which will generate a bug report, rather than an optimization failure.
// Do detailed routing check to ensure driver can reach sink. | ||
// | ||
// FIXME: This might be too slow, but it handles a case on | ||
// SLICEL.COUT -> SLICEL.CIN has delta_y = {1, 2}, but the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be a problem when we start inserting relative placement constraints (which should give much better performance than essentially relying on trial and error) as these require fixed dx/dy values. nextpnr-xilinx punted on this but I think a way of specifying the "next" bel/site might make sense as a future API improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that relative placement constraints should be emitted where possible. Something that is tricky here is that the use of dedicated interconnect is not 1-to-1 with placement constraints. Carry chains and DSP chains are good candidates for relative placement constraints, but the IFF/OFF cases are a good example where it should be up to the placer to determine if it wants to use the IFF/OFF. It might make more sense to use a SLICEL/M FF over an IO FF depending on the shape and fanout of the signals. Just because an IO FF is possible doesn't mean we should constraint the solution to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is an interesting thing to think about. Somewhat related is the dedicated LUT->FF routing which you want to use if you can; but definitely don't want to force all the time.
One idea I had in Nexus was a separate post-place pass that performs low-distance swaps only with the goal of using this dedicated interconnect (currently only LUT-FF, I've been meaning to look at doing similar with the dedicated fast LUT-LUT pattern that Nexus has too): https://github.com/YosysHQ/nextpnr/blob/master/nexus/post_place.cc
A generic version of that pass as a post-place optimisation to opportunistically exploit dedicated routing, by swapping things within a site or between adjacent sites, could definitely improve things from a routeability/timing point of view down the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is an interesting thing to think about. Somewhat related is the dedicated LUT->FF routing which you want to use if you can; but definitely don't want to force all the time.
Yep, and same story with LUT -> CARRY and CARRY -> FF placement. In general, there needs to be a way to align site local placement to exploit specialized site routing to achieve maximal packing density. Once I debug the FPGA arch site router for correctness, I believe this is going to be an important improvement during HeAP legalization to achieve really good results.
continue; | ||
} | ||
|
||
// This net doesn't have a driver, probably not valid? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undriven nets aren't necessarily a problem (for example if you connect a net to an input but don't drive it); in other arches we tend to skip over them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you mean to an output, but don't drive it. A disconnected cell primitive input is a bug in most cases I can think of. It must be either 1/0/signal, rather than "unknown".
continue; | ||
} | ||
|
||
for (size_t i = 0; i < bel_data.num_bel_wires; ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mixed signedness compare warning (I'm minded to tighten up the compiler flags so more of these things become CI failures, not sure what your opinion is on that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please tighten up the flags. We also should emit -Wxxx
on local builds. I'm using clang
and it is capable of emitting the same warnings as the CI, but that isn't the current default (I believe).
auto downhill_iter = pip_downhill.find(wire); | ||
if(downhill_iter == pip_downhill.end()) { | ||
if(root_wire != wire) { | ||
log_warning("Wire %s never entered the real fabric?\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be a warning or an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a warning right now because the constant merging logic always emits the VCC/GND root, even if unused. Then when walking that root, it won't enter the real fabric. In the long term we can prune away an unused VCC/GND root and make this an error again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair as a temporary thing; from an end-user point of view I don't much like the idea of a warning that isn't immediately end-user-actionable but I imagine it will be long gone by the time this is end-user-read.
my attempt to build the bba locally is failing with:
I think this is related to the LUT init stuff based on the line number; and checking out the fpga-interchange-schema commit before that PR was merged seems to work OK. |
Ya, I found this too and need to fix it. Need to stand up a CI on |
I've fixed the bug and stood up a simple CI to find these issues with chipsalliance/fpga-interchange-schema#13 |
This adds initial constant network support (along side changes in chipsalliance/python-fpga-interchange#23). I believe logic needs to be added to combine GND/VCC cells/nets, but this doesn't appear to matter with the initial const wire designs. Once I have an example netlist with multiple GND/VCC cells/nets, I'll add the remaining logic.