
Refactor CMake system to enable ROI-less xc7 graphs that share arches. #1143

Merged: 8 commits merged on Nov 13, 2019

Conversation

litghost (Contributor) commented Nov 8, 2019

Previously some xc7 properties were attached to devices, but belonged to
the board. This is now fixed.

Moving the PINMAP to the board is less obvious, but I think late binding
the PINMAP allows for less confusion and more flexibility. This does
result in some data duplication, but not much.

probot-autolabeler added labels: arch-artix7, arch-ice40, type-utils (Nov 8, 2019)
litghost force-pushed the refactor_xc7_cmake branch 5 times, most recently from 3000823 to f0b4c14 on November 9, 2019
litghost (Contributor, Author) commented Nov 11, 2019

Looks like going from A* factor of 1 to 1.2 doesn't consistently work with the ROI-less picosoc. I'll try returning the A* factor to 1.0, and will investigate this week how/why the lookahead is mispredicting.
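To illustrate the effect being described, here is a small sketch (not VPR source code; the costs and numbers are made up) of how an A* factor above 1.0 amplifies lookahead misprediction. The router orders heap expansions by known cost plus the factor times the predicted remaining cost, so an over-predicting lookahead can make the genuinely cheaper path look worse:

```python
def node_priority(backward_cost, predicted_cost, astar_fac):
    """Priority used to pop nodes from the routing heap (lower pops first)."""
    return backward_cost + astar_fac * predicted_cost

# Path A is genuinely cheaper (true remaining cost ~6) but the lookahead
# over-predicts it at 12; path B is predicted accurately at 8.5.
a_fac1 = node_priority(5.0, 12.0, 1.0)   # 17.0 -> A expands first
b_fac1 = node_priority(9.0, 8.5, 1.0)    # 17.5
a_fac12 = node_priority(5.0, 12.0, 1.2)  # 19.4
b_fac12 = node_priority(9.0, 8.5, 1.2)   # 19.2 -> B now expands first
```

With the factor at 1.0 the misprediction is not amplified and path A still expands first; at 1.2 the ordering flips, which is consistent with the inconsistent routing seen here.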

mithro (Contributor) left a comment:


Just one comment.

Review comment on xc7/make/device_define.cmake (resolved).
litghost requested a review from mithro on November 11, 2019
litghost (Contributor, Author) commented:

> Looks like going from A* factor of 1 to 1.2 doesn't consistently work with the ROI-less picosoc. I'll try returning the A* factor to 1.0, and will investigate this week how/why the lookahead is mispredicting.

It looks like only a handful of connections are causing most of the problems; an example breakdown:

Iter 55, took 3269.8, re-routed 173 connections
Longest net, from rr node 2001160, took 1317.119873
Longest connection, from rr node 2001160 to 1971067, took 1317.119873
 Conn #1, from 2001160 to 1971067 took 1317.119873
 Conn #2, from 2841252 to 1970557 took 566.71051
 Conn #3, from 1025215 to 1975204 took 428.923981
 Conn #4, from 1682532 to 1970566 took 402.129486
 Conn #5, from 1827062 to 1972648 took 133.416046
 Conn #6, from 1987438 to 1972119 took 129.629654
 Conn #7, from 1757870 to 1892387 took 111.342422
 Conn #8, from 3031043 to 1980022 took 41.527199
 Conn #9, from 1979875 to 1982646 took 34.066013
 Conn #10, from 1501475 to 1970821 took 27.17885
 Conn #11, from 1925015 to 1980174 took 23.920551
 Conn #12, from 1716794 to 1970550 took 10.346669
 Conn #13, from 1979877 to 1980013 took 7.11358
 Conn #14, from 3063121 to 1970815 took 2.852495
 Conn #15, from 1025215 to 1974424 took 2.358453
 Conn #16, from 1980640 to 1980612 took 2.183119
 Conn #17, from 2841252 to 3008757 took 2.105423
 Conn #18, from 1979877 to 1979863 took 1.910496
 Conn #19, from 1979877 to 1979844 took 1.895307
 Conn #20, from 1732417 to 1983448 took 1.556619
 Conn #21, from 2997436 to 2997425 took 1.542244
 Conn #22, from 1025215 to 1974443 took 1.504647
 Conn #23, from 1979875 to 1981411 took 1.322187
 Conn #24, from 1484555 to 1972794 took 1.068602
Iter 56, took 2004.8, re-routed 147 connections
Longest net, from rr node 2001160, took 1140.169678
Longest connection, from rr node 2001160 to 1971067, took 1140.169678
 Conn #1, from 2001160 to 1971067 took 1140.169678
 Conn #2, from 1025215 to 1975204 took 414.741058
 Conn #3, from 1827062 to 1972648 took 175.234192
 Conn #4, from 1987438 to 1972119 took 144.330338
 Conn #5, from 1979875 to 1982646 took 39.589561
 Conn #6, from 1885280 to 1980162 took 15.359474
 Conn #7, from 878 to 1892702 took 10.964668
 Conn #8, from 878 to 1892420 took 10.849192
 Conn #9, from 878 to 1892718 took 10.651355
 Conn #10, from 1716794 to 1970550 took 10.120596
 Conn #11, from 1716794 to 1970555 took 9.104782
 Conn #12, from 1979877 to 1980013 took 3.608731
 Conn #13, from 1979877 to 1979863 took 2.945437
 Conn #14, from 1979877 to 1979844 took 2.757559
 Conn #15, from 3014343 to 1892368 took 2.387583
 Conn #16, from 3014343 to 1754073 took 1.599001
 Conn #17, from 2997446 to 1983449 took 1.464067
 Conn #18, from 1983473 to 1770564 took 1.368437
 Conn #19, from 1979875 to 1981411 took 1.270326
Iter 57, took 2957.1, re-routed 147 connections
Longest net, from rr node 2001160, took 1677.382812
Longest connection, from rr node 2001160 to 1971067, took 1677.382812
 Conn #1, from 2001160 to 1971067 took 1677.382812
 Conn #2, from 1025215 to 1975204 took 405.368744
 Conn #3, from 3170213 to 1980014 took 368.212677
 Conn #4, from 1827062 to 1972648 took 147.129105
 Conn #5, from 1987438 to 1972119 took 128.772491
 Conn #6, from 3058129 to 1975212 took 124.699562
 Conn #7, from 1979875 to 1982646 took 38.07365
 Conn #8, from 1716794 to 1970550 took 10.980624
 Conn #9, from 3014343 to 1892682 took 10.962498
 Conn #10, from 1979877 to 1980013 took 5.456196
 Conn #11, from 3058129 to 2857001 took 4.710027
 Conn #12, from 1980640 to 1980612 took 3.757384
 Conn #13, from 1748989 to 1748955 took 2.493656
 Conn #14, from 1748989 to 1970810 took 2.084272
 Conn #15, from 1983473 to 1486627 took 2.03596
 Conn #16, from 3058129 to 1973263 took 1.978131
 Conn #17, from 3058129 to 1548521 took 1.939974
 Conn #18, from 3014343 to 1892368 took 1.922059
 Conn #19, from 3014343 to 1754073 took 1.610636
 Conn #20, from 1732417 to 1983448 took 1.541999
 Conn #21, from 3047323 to 1748961 took 1.20365
 Conn #22, from 1484555 to 1972794 took 1.029835
 Conn #23, from 1983473 to 1697925 took 1.01546
Iter 58, took 2808.4, re-routed 65 connections
Longest net, from rr node 2001160, took 1831.375
Longest connection, from rr node 2001160 to 1971067, took 1831.375
 Conn #1, from 2001160 to 1971067 took 1831.375
 Conn #2, from 1025215 to 1975204 took 291.001709
 Conn #3, from 1827062 to 1972648 took 219.058334
 Conn #4, from 1987438 to 1972119 took 122.890549
 Conn #5, from 1528937 to 1974444 took 120.361565
 Conn #6, from 1528937 to 1468244 took 50.575531
 Conn #7, from 2836972 to 1972486 took 48.110752
 Conn #8, from 2978074 to 1980175 took 43.730225
 Conn #9, from 1528937 to 2837721 took 26.026325
 Conn #10, from 1716794 to 1970550 took 12.165534
 Conn #11, from 1716794 to 1970555 took 9.478533
 Conn #12, from 1528937 to 3084970 took 7.44284
 Conn #13, from 1689724 to 1980015 took 7.008536
 Conn #14, from 1528937 to 1768625 took 3.332038
 Conn #15, from 1528937 to 1025362 took 1.98669
 Conn #16, from 1528937 to 1761097 took 1.680012
 Conn #17, from 1484555 to 1972794 took 1.146072
 Conn #18, from 1528937 to 1175125 took 1.120285
Iter 59, took 2980.9, re-routed 56 connections
Longest net, from rr node 2001160, took 1692.658081
Longest connection, from rr node 2001160 to 1971067, took 1692.658081
 Conn #1, from 2001160 to 1971067 took 1692.658081
 Conn #2, from 2841252 to 1970557 took 368.760437
 Conn #3, from 1827062 to 1972648 took 236.767273
 Conn #4, from 1549226 to 1974418 took 221.925507
 Conn #5, from 1987438 to 1972119 took 129.36409
 Conn #6, from 1528937 to 1974444 took 87.313309
 Conn #7, from 3062830 to 1975223 took 55.239761
 Conn #8, from 1528937 to 1468262 took 35.208908
 Conn #9, from 1528937 to 1468244 took 35.009323
 Conn #10, from 1982232 to 1980163 took 33.474491
 Conn #11, from 2978074 to 1980175 took 30.544062
 Conn #12, from 1885280 to 1980162 took 20.154942
 Conn #13, from 1716794 to 1970550 took 10.197528
 Conn #14, from 1980640 to 1980612 took 3.294786
 Conn #15, from 1528937 to 1175125 took 2.384524
 Conn #16, from 1549226 to 1025186 took 2.110419
 Conn #17, from 1528937 to 1025362 took 1.96857
 Conn #18, from 2841252 to 1557202 took 1.159231
 Conn #19, from 1528937 to 1463807 took 1.076947
 Conn #20, from 1484555 to 1972794 took 1.046694
 Conn #21, from 2841252 to 1689844 took 1.021666
Iter 60, took 4266.3, re-routed 157 connections
Longest net, from rr node 1178890, took 1983.454346
Longest connection, from rr node 1178890 to 1982958, took 1983.454346
 Conn #1, from 1178890 to 1982958 took 1983.454346
 Conn #2, from 2001160 to 1971067 took 1709.618042
 Conn #3, from 1520723 to 1972909 took 203.362106
 Conn #4, from 1528937 to 1974444 took 125.199272
 Conn #5, from 1979875 to 1982646 took 41.219334
 Conn #6, from 1481287 to 1982643 took 38.050304
 Conn #7, from 1979871 to 1980168 took 33.476837
 Conn #8, from 1520723 to 1915166 took 25.352516
 Conn #9, from 1885280 to 1980162 took 22.013687
 Conn #10, from 1980635 to 1970829 took 20.022779
 Conn #11, from 1688619 to 1982945 took 17.965288
 Conn #12, from 1528937 to 1870375 took 5.413357
 Conn #13, from 1520723 to 789779 took 3.576865
 Conn #14, from 1520723 to 1026238 took 3.224516
 Conn #15, from 1979877 to 1980013 took 2.892133
 Conn #16, from 1520723 to 1485354 took 2.70949
 Conn #17, from 1528937 to 3084970 took 2.502971
 Conn #18, from 1528937 to 1025362 took 1.85784
 Conn #19, from 3014343 to 1892368 took 1.443727
 Conn #20, from 3014343 to 1754073 took 1.366749
 Conn #21, from 3047323 to 1748961 took 1.216168
 Conn #22, from 1983473 to 1486627 took 1.208369
 Conn #23, from 2997446 to 1983449 took 1.208232
 Conn #24, from 1528937 to 1761097 took 1.13201
 Conn #25, from 1486634 to 3047291 took 1.094791
 Conn #26, from 1983473 to 1983459 took 1.011126
Saw 60 iterations

I'm going to pick one sink rr node (1971067) and try to figure out what is going wrong.
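The per-connection breakdowns above are easy to mine mechanically. Here is a short sketch (the line format is assumed from the log excerpt in this thread, not from any VPR-provided parser) that pulls out the slowest connections so a sink like 1971067 can be identified programmatically:

```python
import re

# Matches lines like: " Conn #1, from 2001160 to 1971067 took 1317.119873"
CONN_RE = re.compile(r"Conn #(\d+), from (\d+) to (\d+) took ([\d.]+)")

def worst_connections(log_text, top_n=3):
    """Return the top_n (src_rr, sink_rr, seconds) tuples, slowest first."""
    conns = [(int(m.group(2)), int(m.group(3)), float(m.group(4)))
             for m in CONN_RE.finditer(log_text)]
    return sorted(conns, key=lambda c: -c[2])[:top_n]

sample = """
 Conn #1, from 2001160 to 1971067 took 1317.119873
 Conn #2, from 2841252 to 1970557 took 566.71051
 Conn #3, from 1025215 to 1975204 took 428.923981
"""
print(worst_connections(sample, top_n=1))
```

Run over the full log, this makes it obvious that the 2001160 to 1971067 connection dominates every iteration.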

This path should only be used with explicit BUFH buffers or during
clock routing from a BUFG.
probot-autolabeler added the lang-python label (Nov 12, 2019)
acomodi (Contributor) left a comment:


LGTM; the xc7 build is not yet finished, though. I guess that's still due to the routing issues with picosoc.

litghost (Contributor, Author) commented Nov 12, 2019

> LGTM; the xc7 build is not yet finished, though. I guess that's still due to the routing issues with picosoc.

Yep, I'm still working on how to get picosoc to route better.

acomodi (Contributor) commented Nov 13, 2019

I have noticed that some clock signals are being routed (IMO) incorrectly when running the OSERDES test on the full 50T. Instead of following the dedicated clock routes, some of the clocks are routed through logic, traversing several INT tiles before reaching their destination.
I am not sure whether this is related to the high routing run-time of picosoc, but it is definitely behavior that should be avoided.
I think the best way to force VPR to route clock-related signals only through the clock tree is to delete the connections between the INT GCLK inputs and the GFAN pips, so that a clock signal cannot jump from one INT to another over non-dedicated routes.

Another reason to avoid this behavior is that, if we allow clock nets to be routed between INT tiles, some clock signals may cross two different clock regions without going through any clock buffer.
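The proposed fix amounts to an edge filter applied while emitting the routing graph. A minimal sketch, assuming the rr-graph generation step sees pips as (source wire, destination wire) pairs and using illustrative wire names matching the prefixes discussed in this thread:

```python
def keep_edge(src_wire, dst_wire):
    """Reject GCLK -> GFAN pips; keep every other edge.

    Dropping these pips means a clock on the dedicated network cannot
    escape into general interconnect by fanning out through GFAN.
    """
    if src_wire.startswith("GCLK") and dst_wire.startswith("GFAN"):
        return False
    return True

edges = [
    ("GCLK_L_B10_EAST", "GFAN0"),   # clock escaping into general fabric: drop
    ("GCLK_L_B10_EAST", "CLK_L0"),  # legitimate leaf clock input: keep
    ("LOGIC_OUTS_L0", "GFAN0"),     # general interconnect edge: keep
]
filtered = [e for e in edges if keep_edge(*e)]
```

The filter is deliberately one-directional: general-interconnect sources can still reach GFAN, only the dedicated clock wires lose that fanout.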

litghost (Contributor, Author) commented:

> I am not sure whether this could be an issue related to the high routing run-time of picosoc

This is not the issue.

> [the best way to let] VPR route clock-related signals only using the clock tree is to delete the connections between the INT GCLK inputs to the GFAN pips, so that the clock signal will not be able to jump from one INT to another using non-dedicated routes.

I've fixed this behavior via 82ea6c1. This prevents the graph from having a path from the general interconnect to the BUFH, so general interconnect signals cannot use the clock network.

acomodi (Contributor) commented Nov 13, 2019

> I've fixed this behavior via 82ea6c1. This prevents the graph from having a path from the general interconnect to the BUFH, so general interconnect signals cannot use the clock network.

I had already locally applied 82ea6c1, but the problem is that clock signals coming from BUFHCEs can still be routed through GFANs in the interconnects.

Example of a route produced by VPR:

BUFHCE.OUTPUT --> CLK_HROW_CK_BUFHCLK_L9
CLK_HROW_CK_BUFHCLK_L9 --> HCLK_LEAF_CLK_B_TOPL4
HCLK_LEAF_CLK_B_TOPL4 --> GCLK_L_B10_EAST
GCLK_L_B10_EAST --> GFAN0

From GFAN0 the signal was then spread among several more INT tiles before arriving at its destination.
Now I need to double-check whether this was actually intended by the design (e.g. assign ce = clkdiv_r && !CLKDIV), meaning that VPR correctly routed the clock signal.
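A check like the one described can be automated by scanning a flattened route trace (wire-to-wire hops, like the example above) for clock wires fanning into GFAN pips. This is a sketch with hypothetical input structure, not a real VPR .route parser:

```python
def find_clock_leaks(hops):
    """hops: list of (src_wire, dst_wire) pairs along a routed net.

    Flags hops where a wire on the dedicated clock network (GCLK /
    BUFHCLK / HCLK prefixes, as in the trace above) enters a GFAN pip,
    i.e. a clock leaking into general interconnect.
    """
    return [(s, d) for (s, d) in hops
            if any(tag in s for tag in ("GCLK", "BUFHCLK", "HCLK"))
            and d.startswith("GFAN")]

# The route from the comment above, as hop pairs.
route = [
    ("BUFHCE.OUTPUT", "CLK_HROW_CK_BUFHCLK_L9"),
    ("CLK_HROW_CK_BUFHCLK_L9", "HCLK_LEAF_CLK_B_TOPL4"),
    ("HCLK_LEAF_CLK_B_TOPL4", "GCLK_L_B10_EAST"),
    ("GCLK_L_B10_EAST", "GFAN0"),
]
print(find_clock_leaks(route))
```

On this trace only the final GCLK_L_B10_EAST to GFAN0 hop is flagged, matching the hand analysis above.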

litghost merged commit a2fd433 into f4pga:master on Nov 13, 2019
litghost deleted the refactor_xc7_cmake branch on November 13, 2019 23:28