Build issue with RHEL8 #192

Open
jyoung3131 opened this issue Jul 15, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@jyoung3131

Describe the bug
I am trying to build triSYCL for Red Hat 8, and I am running into a link error: libpi_opencl requires libstdc++fs, but it doesn't seem to be linked in by the standard CMake build. Is there a workaround I can use?

To Reproduce

export SYCL_HOME=$PWD
python3 $SYCL_HOME/llvm/buildbot/configure.py
python3 $SYCL_HOME/llvm/buildbot/compile.py
  1. Build output from the compile step:
[1798/1998] Linking CXX shared library lib/libpi_opencl.so
FAILED: lib/libpi_opencl.so
: && /usr/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-class-memaccess -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -Wall -Wextra -Wno-deprecated-declarations -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete -shared -Wl,-soname,libpi_opencl.so -o lib/libpi_opencl.so tools/sycl/plugins/opencl/CMakeFiles/pi_opencl.dir/pi_opencl.cpp.o  -Wl,-rpath,/net/netscratch/trisycl/rhel8_flubber5/llvm/build/lib:  lib/libOpenCL.so.1.2  -Wl,--version-script=//netscratch/trisycl/rhel8_flubber5/llvm/sycl/plugins/ld-version-script.txt  -lgcc_s  -lgcc  -ldl && :
/opt/rh/gcc-toolset-11/root/usr/bin/ld: tools/sycl/plugins/opencl/CMakeFiles/pi_opencl.dir/pi_opencl.cpp.o: in function `terminate_xsimk()':
pi_opencl.cpp:(.text._Z15terminate_xsimkv[_Z15terminate_xsimkv]+0x143): undefined reference to `std::filesystem::__cxx11::path::_M_split_cmpts()'
/opt/rh/gcc-toolset-11/root/usr/bin/ld: pi_opencl.cpp:(.text._Z15terminate_xsimkv[_Z15terminate_xsimkv]+0x156): undefined reference to `std::filesystem::__cxx11::directory_iterator::directory_iterator(std::filesystem::__cxx11::path const&, std::filesystem::directory_options, std::error_code*)'
  2. The compile step (python3 $SYCL_HOME/llvm/buildbot/compile.py) does not finish, and it's not clear where to add the appropriate library to the build. Technically, target_link_libraries(<target> stdc++fs) could be added somewhere in the build.
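For reference, the undefined symbols above can be checked outside the triSYCL build with a small probe like the one below. It is a hypothetical sketch, not part of the triSYCL sources: `count_entries` is an illustrative name, and it simply makes the same kind of `std::filesystem` calls (building a `path` and constructing a `directory_iterator`) that `terminate_xsimk()` makes according to the linker log. With GCC 8, `std::filesystem` lives in the separate `libstdc++fs` archive, so such code only links when `-lstdc++fs` is added; GCC 9.1 and later normally provide it in the main libstdc++. Since the log shows the same failure with gcc-toolset-11 on RHEL 8, the useful check is to build the probe with the exact toolchain the buildbot scripts pick up.

```cpp
// Hypothetical link probe (not part of triSYCL): makes the same kind of
// std::filesystem calls that terminate_xsimk() makes per the linker log.
//   g++ -std=c++17 probe.cpp              # GCC 9.1+ normally links as-is
//   g++ -std=c++17 probe.cpp -lstdc++fs   # required with GCC 8
#include <cstddef>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Count the entries in `dir`; stops at the first error (0 if it cannot be opened).
std::size_t count_entries(const fs::path& dir) {
  std::error_code ec;
  std::size_t n = 0;
  // The error_code overloads report failures instead of throwing.
  for (fs::directory_iterator it(dir, ec), end; !ec && it != end;
       it.increment(ec))
    ++n;
  return n;
}
```

For example, call `count_entries(".")` from a `main`: if linking fails with the same `undefined reference` errors and succeeds once `-lstdc++fs` is appended, the missing archive is the likely cause.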

Environment (please complete the following information):

  • OS: Linux, RHEL 8
  • Target device and vendor: Xilinx Alveo U280
  • DPC++ version: N/A
  • Dependencies version: N/A
  • XRT version: 2.13.0 (built from Git repo)
  • GCC: same result with GCC 8.5.0 and GCC 11; the libstdc++-devel package is installed.

Additional context

jyoung3131 added the bug label on Jul 15, 2022
@Ralender
Contributor

I spent some time looking into what to do with terminate_xsimk.
We know it doesn't catch all the cases we would like it to catch, and the solution that stopped our issues was to run a background cleanup process, even if it is a horrible approach. Here is what I propose:

  • remove terminate_xsimk.hpp and its uses, thus removing the current issue.
  • add the source of the background process I have running on our test machines.
  • explain this mess to users in the documentation.

@keryell: what do you think?

@keryell
Member

keryell commented Jul 24, 2022

Yes, it is possible to make this hack, a work-around for the Xsim control mess in XRT, optional.
But on the other hand, it might just be a simple fix in the CMake configuration.
@jyoung3131 Otherwise, is it possible to use the more modern gcc-toolset-12-gcc instead of gcc-toolset-11-gcc on your machine?

@jyoung3131
Author

Hi all - I'll try to test here with GCC 12, although I'll need to install it via Spack since RHEL hasn't yet released a toolset version of it.

I am not quite sure what terminate_xsimk does, so a brief explanation in the documentation might help users know whether it is needed or not.

@Ralender
Contributor

The short explanation is that the simulator used for hw_emu is often not cleaned up by XRT after it is used, and the simulator also has memory leaks. This means that, if left as is and given enough time, it will eat up all the RAM, and the RAM is consumed faster for each additional xsim process that is running. What terminate_xsimk does is clean up the xsim processes when the SYCL application terminates, to make sure we don't leave any xsim process behind.

But this doesn't solve everything, because if the SYCL application is killed by an uncatchable signal (for example, while debugging), terminate_xsimk is not executed. So what we have on our machines is a background process that periodically cleans up xsim processes that are not in use. This background process has prevented any xsim-related issues on our machines for a few months now. terminate_xsimk was my older attempt at solving this issue.

So I am not sure it is worth having terminate_xsimk at all, since it is not sufficient to solve the issue it is trying to solve. Instead, I propose to provide the sources of my background process and ask people having issues to use it. Note that it takes a long time for a single xsim process to eat all the RAM, so if you are either rebooting frequently or not using hw_emu much, you will likely not experience the issue.
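Neither the terminate_xsimk code nor the background process source is shown in this thread, so the following is only a minimal sketch of the cleanup idea described above, under explicit assumptions: the simulator shows up in `/proc/<pid>/stat` with the command name `xsimk` (inferred from the function name), the at-exit cleanup targets `xsimk` processes whose parent is the application itself, and "not in use" for the background reaper is approximated as "orphaned", i.e. re-parented to PID 1. The names `ProcStat`, `parse_stat`, `find_processes`, and `terminate_all` are hypothetical.

```cpp
// Hypothetical sketch of xsim cleanup via /proc; NOT the triSYCL source.
// Assumptions: the simulator appears in /proc/<pid>/stat with comm "xsimk",
// and "not in use" is approximated by "orphaned" (parent PID 1).
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <fstream>
#include <signal.h>
#include <sstream>
#include <string>
#include <sys/types.h>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Fields of interest from a "pid (comm) state ppid ..." stat line.
struct ProcStat {
  pid_t pid = 0;
  std::string comm;
  pid_t ppid = 0;
};

// comm may itself contain spaces or parentheses, so split on the first '('
// and the last ')'. Returns false on malformed input.
bool parse_stat(const std::string& line, ProcStat& out) {
  const auto open = line.find('(');
  const auto close = line.rfind(')');
  if (open == std::string::npos || close == std::string::npos || close < open)
    return false;
  std::istringstream head(line.substr(0, open));
  std::istringstream tail(line.substr(close + 1));
  long pid = 0, ppid = 0;
  char state = 0;
  if (!(head >> pid) || !(tail >> state >> ppid)) return false;
  out.pid = static_cast<pid_t>(pid);
  out.comm = line.substr(open + 1, close - open - 1);
  out.ppid = static_cast<pid_t>(ppid);
  return true;
}

// PIDs under proc_root (normally "/proc") whose comm is `name` and parent is `parent`.
std::vector<pid_t> find_processes(const fs::path& proc_root,
                                  const std::string& name, pid_t parent) {
  std::vector<pid_t> pids;
  std::error_code ec;
  for (fs::directory_iterator it(proc_root, ec), end; !ec && it != end;
       it.increment(ec)) {
    const std::string entry = it->path().filename().string();
    // Only all-digit directory names are processes ("self", "sys", ... are not).
    const bool numeric = !entry.empty() &&
        std::all_of(entry.begin(), entry.end(),
                    [](unsigned char c) { return std::isdigit(c) != 0; });
    if (!numeric) continue;
    // A process may exit between listing and reading; a failed read is skipped.
    std::ifstream in(it->path() / "stat");
    std::string line;
    ProcStat st;
    if (std::getline(in, line) && parse_stat(line, st) && st.comm == name &&
        st.ppid == parent)
      pids.push_back(st.pid);
  }
  return pids;
}

// Send SIGTERM to each PID; errors (e.g. already exited) are ignored.
void terminate_all(const std::vector<pid_t>& pids) {
  for (pid_t p : pids) ::kill(p, SIGTERM);
}
```

Under these assumptions, the at-exit variant would call `terminate_all(find_processes("/proc", "xsimk", getpid()))` from a `std::atexit` handler, and a background reaper would periodically call `terminate_all(find_processes("/proc", "xsimk", 1))`. The at-exit variant cannot run when the application is killed by an uncatchable signal, which is the gap a separate background process covers. On systems with a child subreaper (for example a `systemd --user` session), orphans may be re-parented to that process rather than to PID 1, so the "orphaned" test would need adjusting.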

@jyoung3131
Author

Hi all - I was able to compile through by using a Spack-installed GCC 12 to build this toolchain. RHEL 8 doesn't have GCC 12 as a toolset yet, so I ran the following to install it:

. share/spack/setup-env.sh
spack install gcc@12.1.0
spack load gcc@12.1.0

I'm still struggling with compiling some of the sample codes, but I think I at least have a working version of triSYCL LLVM on RHEL8 to test against.

@Ralender - I'm ok with closing this issue unless you want to keep it open to keep track of xsimk issues.

@keryell
Member

keryell commented Aug 6, 2022

Nice to see you were able to make progress.
What kind of issues do you have with the sample codes?
Perhaps the issues are on our side too...

@jyoung3131
Author

jyoung3131 commented Oct 4, 2022

Hi @keryell and @Ralender - apologies for the long delay. I had to reorganize my thoughts and switch to some other projects. I can say definitively that the compiler builds correctly, but I've been unable to get a full sample to compile all the way through on RHEL8.

I've worked through several errors, some of which I might file as separate issues (for example, I had to comment out some lines in sycl_vxx.py, which may or may not actually be a bug). The one error I can say I'm puzzled by happened a while back and seems to be related to the --sp flag for targeting an HBM bank on the U280:

Time (s): cpu = 00:00:09 ; elapsed = 00:00:07 . Memory (MB): peak = 2017.672 ; gain = 0.000 ; free physical = 93011 ; free virtual = 209148
INFO: [SYSTEM_LINK 82-51] Create system connectivity graph
INFO: [SYSTEM_LINK 82-102] Applying explicit connections to the system connectivity graph: /tmp/parallel_for-e3dd21b6427ykk/vxx_link_tmp/link/sys_link/cfgraph/cfgen_cfgraph.xml
INFO: [SYSTEM_LINK 82-38] [17:33:16] cfgen started: /net/tools/reconfig/xilinx/Vitis/2021.1/Vitis/2021.1/bin/cfgen  -sp r_to_stIZ4mainE3addEE_L02L5nrB_1.m_axi_gmem:DDR[0] -sp r_to_stIZ4mainE3addEE_L02L5nrB_1._arg_:DDR[0] -dmclkid 0 -r /tmp/parallel_for-e3dd21b6427ykk/vxx_link_tmp/link/sys_link/_sysl/.cdb/xd_ip_db.xml -o /tmp/parallel_for-e3dd21b6427ykk/vxx_link_tmp/link/sys_link/cfgraph/cfgen_cfgraph.xml
INFO: [CFGEN 83-0] Kernel Specs:
INFO: [CFGEN 83-0]   kernel: r_to_stIZ4mainE3addEE_L02L5nrB, num: 1  {r_to_stIZ4mainE3addEE_L02L5nrB_1}
INFO: [CFGEN 83-0] Port Specs:
INFO: [CFGEN 83-0]   kernel: r_to_stIZ4mainE3addEE_L02L5nrB_1, k_port: m_axi_gmem, sptag: DDR[0]
INFO: [CFGEN 83-0]   kernel: r_to_stIZ4mainE3addEE_L02L5nrB_1, k_port: _arg_, sptag: DDR[0]
ERROR: [CFGEN 83-2287] --sp tag applied with an invalid sp tag: DDR[0]
ERROR: [CFGEN 83-2287] --sp tag applied with an invalid sp tag: DDR[0]
ERROR: [CFGEN 83-2297] Please consult platforminfo <platform.xpfm path> for sptag information
ERROR: [CFGEN 83-2298] Exiting due to previous error
ERROR: [SYSTEM_LINK 82-36] [17:33:20] cfgen failed
Time (s): cpu = 00:00:04 ; elapsed = 00:00:04 . Memory (MB): peak = 2017.672 ; gain = 0.000 ; free physical = 93010 ; free virtual = 209146
ERROR: [SYSTEM_LINK 82-62] Error generating design file for /tmp/parallel_for-e3dd21b6427ykk/vxx_link_tmp/link/sys_link/cfgraph/cfgen_cfgraph.xml, command: /net/tools/reconfig/xilinx/Vitis/2021.1/Vitis/2021.1/bin/cfgen  -sp r_to_stIZ4mainE3addEE_L02L5nrB_1.m_axi_gmem:DDR[0] -sp r_to_stIZ4mainE3addEE_L02L5nrB_1._arg_:DDR[0] -dmclkid 0 -r /tmp/parallel_for-e3dd21b6427ykk/vxx_link_tmp/link/sys_link/_sysl/.cdb/xd_ip_db.xml -o /tmp/parallel_for-e3dd21b6427ykk/vxx_link_tmp/link/sys_link/cfgraph/cfgen_cfgraph.xml
ERROR: [SYSTEM_LINK 82-96] Error applying explicit connections to the system connectivity graph
ERROR: [SYSTEM_LINK 82-79] Unable to create system connectivity graph
INFO: [v++ 60-1442] [17:33:20] Run run_link: Step system_link: Failed
Time (s): cpu = 00:00:15 ; elapsed = 00:00:13 . Memory (MB): peak = 1909.562 ; gain = 0.000 ; free physical = 93046 ; free virtual = 209183
ERROR: [v++ 60-661] v++ link run 'run_link' failed
ERROR: [v++ 60-626] Kernel link failed to complete
ERROR: [v++ 60-703] Failed to finish linking
INFO: [v++ 60-1653] Closing dispatch client.
Vitis linkage stage failed
Output /tmp/parallel_for-e3dd21b6427ykk/parallel_for-e3dd21.xclbin was not properly produced by previous commands
clang-14: error: sycl-link-vxx command failed with exit code 255 (use -v to see invocation)
clang version 14.0.0 (https://github.com/triSYCL/sycl.git 5d6a7a8bb914f38c173fbca29858a1c40f846f91)
Target: x86_64-unknown-linux-gnu

I did want to follow up and ask: is there a recommended version of Vitis that we should use? We have many of them installed locally; I've tested with 2021.1 and 2022.1 recently. I think we also recently acquired a non-HBM Alveo board (U250), if that's a better platform to test against.

@Ralender
Contributor

Ralender commented Oct 5, 2022

About the --sp error: the log shows that v++ was asked to place buffers into DDR banks on a card that doesn't have any DDR banks. This used to be the default behavior until #177.
So:


3 participants