Multiple failures in NONLTO, CLANG and ASAN Unit Tests and RelVals due to PluginNotFound
#44821
cms-bot internal usage |
A new Issue was created by @aandvalenzuela. @smuzaffar, @makortel, @rappoccio, @antoniovilela, @Dr15Jones, @sextonkennedy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
The duplicate dictionary checker for those failing IBs says the
|
at the end of build phase we do run
|
https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_14_1_X/master/scram-project-build.file#L228 is where we run |
one can reproduce the crash on cmsdev4X nodes by starting
|
note that |
I wonder if we have a 'one definition violation'. Maybe valgrind could spot a problem? |
Running valgrind only showed "Invalid read of size 8" in the But that destructor comes from |
assign heterogeneous |
(jumping into the rabbit hole with @Dr15Jones) So the
Our
and seems to be the only CMSSW shared object having the One thing to note on the ROCm setup is that (as far as I can tell) we are taking the binaries from AMD's RHEL8 RPMs. I would assume those were built with the system GCC against the system libstdc++, which seems to be 8 (or at least |
The ROCm libraries get loaded by |
The
|
Disassembling things, the instructions of It seems like we have an ODR violation from trying to mix libraries that were built with (very) different versions of libstdc++, and thus if we need to keep the rocprofiler, we'd have to build it ourselves. |
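One rough way to check the suspicion above is to compare the highest `GLIBCXX_*` symbol version that each shared object references. The following is only a sketch: the function names are hypothetical, and it scans the raw bytes of the file for version tags instead of parsing the ELF version tables properly. A difference between two libraries only hints at a libstdc++ mismatch; identical values do not rule out an ODR violation.

```python
import re
from typing import Optional, Tuple

# Version tags such as "GLIBCXX_3.4" or "GLIBCXX_3.4.25" are stored as
# plain strings in the dynamic string table of a shared object.
_GLIBCXX = re.compile(rb"GLIBCXX_(\d+)\.(\d+)(?:\.(\d+))?")


def max_glibcxx(blob: bytes) -> Optional[Tuple[int, ...]]:
    """Return the highest GLIBCXX version tag found in blob, or None."""
    versions = [
        tuple(int(part or 0) for part in match.groups())
        for match in _GLIBCXX.finditer(blob)
    ]
    return max(versions) if versions else None


def max_glibcxx_in_file(path: str) -> Optional[Tuple[int, ...]]:
    """Hypothetical helper: apply max_glibcxx to a file on disk."""
    with open(path, "rb") as handle:
        return max_glibcxx(handle.read())
```

Calling `max_glibcxx_in_file` on the ROCm library and on a CMSSW-built library, and comparing the two tuples, would show whether they were linked against noticeably different libstdc++ releases.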
I'm not particularly interested in keeping rocprofiler (and in fact we did not have it until now). Unfortunately it seems to be a dependency of
|
I have opened cms-sw/cmsdist#9153 and #44824 to revert the ROCm update |
Adding here cms-sw/cmsdist#9143 (comment)
The trend continued: in CMSSW_14_1_X_2024-04-23-2300 the NONLTO and CLANG IBs failed, but none of the others. |
#44838 fixes |
Thanks @makortel, I have tested it for NONLTO and confirm that
|
For LTO builds (where dd4hep is also built with LTO flags)
So maybe that is why the LTO-enabled IBs are not failing. |
Hello,
There are multiple failures in the NONLTO, CLANG and ASAN IBs (both in Unit Tests and RelVals) in the latest IBs (CMSSW_14_1_[FLAVOR]_X_2024-04-22-2300) reporting:
There are other variants of the exception, for example:
CondCore/SiPixelPlugins:
CondCore/CondDB:
I am not sure if it is related, but we had a ROCm update yesterday in #44777 and the ROCm device builds fine (see log).
However, there was a similar issue in the past reported at cmssw#40680 and related to a ROCm update in which the missing plugins were not properly registered in the .edmplugincache file.
Thanks,
Andrea
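To check the earlier failure mode (a plugin missing from the `.edmplugincache` file), a small script can search the cache for a plugin name. This is a minimal sketch under one stated assumption: that each line of the cache is whitespace-separated and the plugin name appears as one of its fields. The real file format may differ, and the function and plugin names below are hypothetical.

```python
from typing import Iterable, List


def find_plugin_entries(lines: Iterable[str], plugin_name: str) -> List[str]:
    """Return the cache lines that list plugin_name as a whole field.

    Assumes whitespace-separated fields per line (an assumption, not a
    documented format).
    """
    return [
        line.rstrip("\n")
        for line in lines
        if plugin_name in line.split()
    ]


# Usage with made-up cache content; a real check would pass an open file.
example_cache = [
    "libA.so FooPlugin CategoryX\n",
    "libB.so BarPlugin CategoryY\n",
]
matches = find_plugin_entries(example_cache, "BarPlugin")
```

An empty result for a plugin that the job tries to load would point at the registration problem, while a non-empty result suggests the cache entry exists and the failure lies elsewhere.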