
xilinx::pipeline not providing performance #136

Open
ShanoToni opened this issue Sep 17, 2021 · 6 comments
Comments

@ShanoToni

Hello,
I am interested in the performance provided by the xilinx::pipeline and xilinx::dataflow annotations in the pipeline test single_task_vector_add_drt_dataflow_func_local_pipeline. I modified it to produce timing data for the kernel execution, and I am comparing it against the same test with the pipeline and dataflow annotations disabled. Both versions of the test seem to perform with the same timing.
Is this due to some missing support for pipelining on the platform I am targeting?
I used:

  • The origin/sycl/unified/master branch with the df17dc0 commit.
  • Xilinx Zynq UltraScale+ MPSoC ZCU106.
  • Vitis 2020.2
  • XRT 2.8.0
@lforg37
Contributor

lforg37 commented Sep 17, 2021

Hello,

I think that on this example (I assume you mean the test single_task_vector_add.cpp?), timing will probably not change much between the pipelined and non-pipelined kernel: the body of the loop is very simple, so even without pipelining it should not cost more than a few cycles, and since there are also few iterations, the timing is probably dominated by the kernel setup time.

Besides, it is possible that the backend chooses to pipeline the kernel even without the annotation.

You can check in the report vxx_comp_report/kernel_name/hls_report/kernel_name_csynth.rpt whether the kernel has been pipelined.

Lastly, the xilinx::pipeline annotation has undergone modifications recently. These modifications have not landed on sycl/unified/master yet and can be tested on the sycl/unified/next branch.

If the test is indeed single_task_vector_add.cpp, I don't think there is anything to gain by using the dataflow annotation.

@ShanoToni
Author

Hello,
Thank you for the quick reply; this is indeed the test I ran.
Is it safe to assume that it is also too simple to produce a timing difference?

@lforg37
Contributor

lforg37 commented Sep 22, 2021

Hello,

Thank you for the clarification.
On a second read of the setup you say you are using, I am curious to know how exactly you managed to compile the test for the ZCU106?

For the timing, you can find in the report I pointed to the exact kernel execution time (latency in cycles × clock period).
You can compare it with the measured execution time to get an idea of the overhead incurred by setting up the system.

@ShanoToni
Author

Hello,

Apologies for the delayed reply.

To get the test compiling, I used the setup described in the docs. I used the sysroot, includes, and libs from the PetaLinux project sysroot for the board. I also cross-compiled libsycl.so and libpi_opencl.so for aarch64 so they would run on the board's architecture. I encountered some functionality in sycl/source/detail/platform_util.cpp that is unsupported on aarch64; it did not seem important for the tests, so the #define body for aarch64 and ARM was temporarily removed while cross-compiling the libraries.

I wanted to ask some follow up questions regarding the pipelining if possible.

If the xilinx backend can perform loop pipelining on its own, what is the purpose of adding the annotation?
Also, I wanted to ask for some more information regarding _ssdm_op_SpecDataflowPipeline. In what way does this function annotate the dataflow? Is it something recognised by the xilinx compiler to enable the dataflow optimization?

Thank you.

@lforg37
Contributor

lforg37 commented Sep 27, 2021

Hello,

If the xilinx backend can perform loop pipelining what is the purpose of adding the annotation?

Using the annotation, you can force the backend to pipeline a code section, or forbid it from doing so, instead of relying on the backend heuristic to decide whether the code should be pipelined.

Also I wanted to ask for some more information regarding _ssdm_op_SpecDataflowPipeline? In what way does this function annotate the dataflow? Is it something recognised by the xilinx compiler to enable the dataflow optimization?

I think it is a built-in recognised by the HLS backend.
A PR (#132) is ongoing to change the annotation format to something similar to what is done for pipelining.

In order to help keep issues tidy, can you check in the HLS report that the pipeline decoration was taken into account? That way we can either fix the pipeline annotation or close the issue.
Feel free to open a discussion for non-issue-related topics.

@keryell
Member

keryell commented Oct 5, 2021

This is quite interesting work!
We have not tried SYCL on MPSoC for years because, when we tried, the whole software stack was too old for SYCL and it was a pain to get even minimal things working.
But now that the latest Ubuntu is supported (https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/1413611532/Canonical+Ubuntu), it looks like we should come back to it.
It would be great to update our current SYCL recipe with yours, so other people can benefit from your experience! :-)
At least you should try the latest Vitis 2021.1 and the latest XRT from the master branch.
#132 should land soon and improve the performance.
The examples you looked at from https://github.com/triSYCL/triSYCL/blob/master/tests/pipeline are very old, targeting our first implementation, with a lot of workarounds due to its limitations. We have since switched to the oneAPI DPC++ SYCL device compiler, so you should look at the examples in https://github.com/triSYCL/sycl/tree/sycl/unified/master/sycl/test/on-device/xocc instead.
Feel free to contribute examples too. :-)
Thanks.
