
"target-spec.ini" missing "reg/ff"? #33

Closed
eeriecl opened this issue Dec 13, 2021 · 12 comments

Comments

@eeriecl

eeriecl commented Dec 13, 2021

I see that "target-spec.ini" has the following configuration, but it curiously lacks a "reg/ff" entry. Why?

[specification]
frequency=100MHz
dsp=220
bram=280
lut=13300
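To make the question concrete, here is a minimal Python sketch that reads the [specification] section above with the standard-library `configparser` and lists the resource budgets it exposes. It is only an illustration, not how ScaleHLS itself loads the file; the `load_spec` helper is hypothetical, and stripping a trailing "MHz" from `frequency` is an assumption based on this one example.

```python
import configparser

# The [specification] section quoted above, inlined so the sketch runs standalone.
SPEC_TEXT = """
[specification]
frequency=100MHz
dsp=220
bram=280
lut=13300
"""

def load_spec(text: str) -> dict:
    """Hypothetical helper: parse a target-spec.ini-style string into typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    spec = {}
    for key, value in parser["specification"].items():
        if key == "frequency":
            # Assumption: the frequency is always written in MHz, as in the example.
            spec[key] = float(value.removesuffix("MHz"))
        else:
            # Every other entry is treated as an integer resource budget.
            spec[key] = int(value)
    return spec

spec = load_spec(SPEC_TEXT)
print(spec)          # {'frequency': 100.0, 'dsp': 220, 'bram': 280, 'lut': 13300}
print("ff" in spec)  # False: there is no FF/register budget in the file
```

`str.removesuffix` needs Python 3.9 or later.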
@zslwyuan

zslwyuan commented Dec 13, 2021

Hi @eeriecl, I guess the reason is that the number of LUTs (i.e., 5-input LUTs) is the same as the number of FFs on Xilinx UltraScale/UltraScale+ devices.

@eeriecl
Author

eeriecl commented Dec 14, 2021

Absolutely not~
For Xilinx devices from the 45 nm generation onward, the FF:LUT ratio is 2:1, but that is only the "pre-allocated" hardware resource ratio. Furthermore, it is not the actual ratio used by any particular application.
Also, there is no LUT-5; there are only LUT-4 and LUT-6.
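Under the 2:1 FF:LUT ratio stated above, an FF capacity could in principle be derived from the existing `lut=13300` entry instead of being listed separately. A minimal sketch follows; the helper name and the idea of deriving FFs from LUTs are illustrative assumptions, not part of ScaleHLS. As the comment notes, this yields the fabric's capacity, not the ratio a specific design will actually consume.

```python
# Assumption: the 2:1 FF:LUT fabric ratio described in the comment above.
FF_PER_LUT = 2

def ff_capacity_from_luts(lut_budget: int, ff_per_lut: int = FF_PER_LUT) -> int:
    """Hypothetical helper: FFs the fabric provides alongside `lut_budget` LUTs.

    This is a capacity bound only; real designs rarely use FFs and LUTs
    in exactly this proportion.
    """
    if lut_budget < 0:
        raise ValueError("LUT budget must be non-negative")
    return lut_budget * ff_per_lut

print(ff_capacity_from_luts(13300))  # 26600
```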

@hanchenye
Collaborator

Hi @eeriecl, the QoR estimator currently only supports estimating DSP and BRAM utilization.
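Put differently, only the `dsp` and `bram` budgets take part in the check. A hypothetical Python sketch of such a check is shown below; the function names and estimate numbers are made up for illustration, and this is not the ScaleHLS code. It shows that the `lut` entry has no effect on the result.

```python
# Budgets from the [specification] section quoted in the issue.
BUDGETS = {"dsp": 220, "bram": 280, "lut": 13300}

# Resources the QoR estimator reports, per the reply above.
ESTIMATED_RESOURCES = ("dsp", "bram")

def utilization(estimates: dict, budgets: dict = BUDGETS) -> dict:
    """Return used/available for each estimated resource; other keys are ignored."""
    return {r: estimates[r] / budgets[r] for r in ESTIMATED_RESOURCES}

def fits(estimates: dict, budgets: dict = BUDGETS) -> bool:
    """A design fits if no estimated resource exceeds 100% of its budget."""
    return all(u <= 1.0 for u in utilization(estimates, budgets).values())

# Hypothetical estimates: the huge LUT number is never looked at.
print(fits({"dsp": 110, "bram": 140, "lut": 10**9}))  # True
print(fits({"dsp": 110, "bram": 300}))                # False (BRAM over budget)
```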

@eeriecl
Author

eeriecl commented Dec 14, 2021

So why is there a LUT specification/estimation?

@hanchenye
Collaborator

I don't think the QoR estimator consumes the LUT specification or reports LUT utilization. LUT estimation is still an open topic: several existing papers have investigated different approaches to improving the accuracy of Vivado HLS's LUT estimates, but because of optimizations in the synthesis stage of downstream tools such as Vivado, the estimates are still not very accurate :( We are also slowly improving the QoR estimator, but if you are interested in adding a LUT estimation feature, please let me know how I can help!

@Oxygen-Chu

Oxygen-Chu commented Dec 15, 2021

What about feeding the code generated by ScaleHLS forward to Vitis HLS, running high-level synthesis, and then back-annotating the ScaleHLS report?
That way the ScaleHLS estimator is not needed, and the report is relatively accurate.

@hanchenye
Collaborator

hanchenye commented Dec 15, 2021

Hi @Oxygen-Chu, this is definitely possible! Actually another paper from our group is triggering Vitis HLS to evaluate discovered design points. The pro of this approach is that Vitis HLS is more accurate and comprehensive. However, the con is that Vitis HLS takes at least 1-2 minutes to compile each design point (some complicated ones need more), while our estimator only needs seconds.

We stated this in the ScaleHLS paper and made the trade-off here. However, I believe triggering Vitis HLS to guide the design space exploration of ScaleHLS is a valuable topic to investigate (worth a new paper) 😄
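One way to combine both sides of this trade-off, sketched here purely as an illustration (it is not the method of ScaleHLS or of the other paper mentioned), is to rank every design point with the fast estimator and spend the slow, more accurate tool runs only on a shortlist. `fast_estimate` and `slow_evaluate` are hypothetical placeholders; in a real flow `slow_evaluate` might wrap one Vitis HLS run per point. Lower scores are assumed to be better (e.g., latency).

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

P = TypeVar("P")  # a design point, e.g. a tuple of tile sizes (hypothetical)

def hybrid_dse(points: Sequence[P],
               fast_estimate: Callable[[P], float],
               slow_evaluate: Callable[[P], float],
               top_k: int) -> List[Tuple[P, float]]:
    """Rank all points with a cheap estimator, then re-score only the
    `top_k` best-ranked ones with an expensive evaluator."""
    # Cheap pass over the whole design space (seconds per point or less).
    shortlist = sorted(points, key=fast_estimate)[:top_k]
    # Expensive pass over the shortlist only (minutes per point in practice).
    results = [(p, slow_evaluate(p)) for p in shortlist]
    return sorted(results, key=lambda r: r[1])

# Toy usage: the "fast" model prefers 16, the "slow" ground truth prefers 20.
best = hybrid_dse(range(1, 65), lambda t: abs(t - 16), lambda t: abs(t - 20), top_k=8)
print(best[0])  # (19, 1)
```

The toy run shows the usual risk of this scheme: the true optimum (20) falls just outside the shortlist because the fast estimator ranks it ninth, so `top_k` trades accuracy against tool time.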

@Oxygen-Chu

Oxygen-Chu commented Dec 15, 2021

I've tried ScaleHLS, AutoSA, AutoBridge, the Merlin Compiler, and some other polyhedral compilers in recent months, and found huge room for improvement.
Simply put, letting a particular HLS tool work alone is far less accurate than co-working with Vitis HLS. You mention 1-2 minutes of waiting, but I think it's absolutely worth it, since Vitis HLS knows the FPGA architecture far better than any home-made estimator so far.

@hanchenye
Collaborator

As we stated in the ScaleHLS paper:

The RTL generation downstream tools, such as Vivado HLS, can take minutes to hours to complete the compilation and to report the synthesis results, which (1) limits the total number of design points that can be evaluated during DSE, thus results in sub-optimal solutions and (2) significantly increases the DSE time to up to tens of hours.

Take a GEMM32 kernel as an example: ScaleHLS can open a design space containing about 5,000 design points. A 1-2 minute wait per design point means the DSE can explore no more than 60 of them per hour, which can easily lead to an incomplete DSE or a long DSE time. We have some ongoing HLS projects in MLIR, such as CIRCT-HLS, which will help reduce the compilation time to RTL in the future, making it more feasible to use in the DSE.
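The figures in this reply can be checked with a quick back-of-the-envelope calculation. The ~5,000-point space and the 1-2 minute compile time come from the comment; the `dse_budget` helper is hypothetical and assumes one design point is compiled at a time.

```python
def dse_budget(num_points: int, minutes_per_point: float):
    """Return (points explored per hour, hours to exhaust the space),
    assuming design points are compiled one after another."""
    points_per_hour = 60.0 / minutes_per_point
    hours_total = num_points / points_per_hour
    return points_per_hour, hours_total

for minutes in (1.0, 2.0):
    rate, hours = dse_budget(5000, minutes)
    print(f"{minutes:.0f} min/point: {rate:.0f} points/hour, {hours:.1f} h for 5000 points")
# 1 min/point: 60 points/hour, 83.3 h for 5000 points
# 2 min/point: 30 points/hour, 166.7 h for 5000 points
```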

@Oxygen-Chu

The DSE engine can use multiple cores and threads, since each design point in the search space can be evaluated completely independently.
And CPUs, DDR memory, and SSDs are becoming ever more powerful and cheaper.
My workstation has a Xeon W-3375, 1 TB of DDR4, and a 4 TB 980 PRO, which fully supports multi-task space searching.
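A minimal sketch of the parallel fan-out described here, assuming each design point can be evaluated by an independent call. The `evaluate` callback is hypothetical (in practice it might launch one HLS run as an external process, which is why threads suffice), and the wall-clock helper assumes ideal linear speedup, an optimistic bound that ignores memory, disk I/O, and contention between concurrent tool runs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")
R = TypeVar("R")

def evaluate_parallel(points: Sequence[P],
                      evaluate: Callable[[P], R],
                      max_workers: int) -> List[R]:
    """Evaluate independent design points concurrently.

    Threads are enough when `evaluate` mostly waits on an external process;
    results are returned in the same order as `points`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate, points))

def ideal_wall_clock_hours(num_points: int, minutes_per_point: float, workers: int) -> float:
    """Wall-clock time assuming perfect linear speedup across `workers`."""
    return num_points * minutes_per_point / workers / 60.0

# Toy evaluator standing in for a real tool run.
print(evaluate_parallel(range(8), lambda p: p * p, max_workers=4))
# [0, 1, 4, 9, 16, 25, 36, 49]
print(round(ideal_wall_clock_hours(5000, 1.0, workers=16), 1))  # 5.2
```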

@hanchenye
Collaborator

@Oxygen-Chu Please see my earlier post:

This is definitely possible! Actually another paper from our group is triggering Vitis HLS to evaluate discovered design points ... I believe triggering Vitis HLS to guide the design space exploration of ScaleHLS is a valuable topic to investigate.

We did enable multi-threading in the recent development of that project (which has not been open-sourced). As I mentioned, I believe cooperating with Vitis HLS is a valuable direction. Please try this approach if you are interested, and I'd love to chat more about it.

@Oxygen-Chu

OK, I'd like to try more after solving the issues ticketed by eeriecl (actually me :-> ).
If you update anything, please let me know.


4 participants