Workflow crashes (L1TTwinMuxProducer:simTwinMuxDigis) #21059
Comments
A new Issue was created by @davidlt . @davidlange6, @Dr15Jones, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
@bbockelm looks like some of the valgrind errors are from |
assign l1, core |
New categories assigned: core,l1 @Dr15Jones,@smuzaffar,@mulhearn,@rekovic you have been requested to review this Pull request/Issue and eventually sign? Thanks |
Last night I bumped xrootd to the tip of stable-4.7.x (4.7.1 is due to be released soon), and that seems to resolve the xrootd issues reported by valgrind. The other L1 issues (reported by valgrind and the stack smashing detector) are still present. |
According to ASan, we start damaging the stack as early as the 2nd event. The report is below. This one was executed with
|
|
Below are the details of another bug (it might be related to the first one reported by valgrind). It looks like we rely on memory that is allocated and then freed by the XML parser. In other words, we keep using the XML parser's internal data after the parser has released it.
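To illustrate the general pattern (not the actual CMSSW code), here is a minimal sketch of one way to avoid depending on parser-owned memory: copy the value into a std::string while the parser still owns it. ToyParser, rawValue(), release() and copyValue() are all hypothetical names invented for this example; they are not part of any real XML library.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for an XML parser that owns the text it hands out.
class ToyParser {
public:
  explicit ToyParser(const std::string& text) : buffer_(new std::string(text)) {}

  // Returns a pointer into parser-owned memory.
  // The pointer is valid only until release() is called.
  const char* rawValue() const { return buffer_ ? buffer_->c_str() : nullptr; }

  // Frees the parser's internal buffer, as a real parser might on cleanup.
  void release() { buffer_.reset(); }

private:
  std::unique_ptr<std::string> buffer_;
};

// Take an owning copy of the value instead of keeping the raw pointer,
// so later reads do not touch memory the parser may already have freed.
std::string copyValue(const ToyParser& parser) {
  const char* raw = parser.rawValue();
  return raw ? std::string(raw) : std::string();
}
```

With this approach, a caller can do `std::string value = copyValue(parser);` before `parser.release();` and keep using `value` safely afterwards. Holding on to the result of rawValue() across release() would be the kind of use-after-free that ASan and valgrind flag.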
|
Looks like in the reported line |
Here are the details for the next one. It seems to be the same issue as the previous one.
|
There are also instances of undefined behavior:
I looked at the first
I can confirm that |
@Dr15Jones - looking at the stack traces and the contents of the ticket @davidlt filed, I agree with his diagnosis with respect to the Xrootd issues. It appears that CMSSW is using the xrootd API correctly and we triggered a bug in the client. |
@davidlt I think the problem is with the definition of the index for
It looks like the intent was for hit[0] to correspond to the bx of min_rpc_bx.
If that is the case, then the bug is in line 255, which should be changed to have
If one looks at other lines using
we can see that they did attempt to shift the index of hit[] by two, which is -min_rpc_bx.
So in addition to fixing lines 255 and 256, I would suggest changing
to declare those values as const.
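For readers following the indexing argument, here is a minimal sketch of the idea, not the actual RPCHitCleaner code. It assumes min_rpc_bx is -2 (implied by "shift the index of hit[] by two which is -min_rpc_bx"). The value max_rpc_bx = 2, the array size, and the helper names bxToIndex and recordHit are assumptions made purely for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Window limits declared as compile-time constants so they cannot be
// modified by accident. max_rpc_bx = 2 is an assumption for this sketch.
constexpr int min_rpc_bx = -2;
constexpr int max_rpc_bx = 2;
constexpr std::size_t n_bx = max_rpc_bx - min_rpc_bx + 1;

// Map a signed bunch crossing to a zero-based index, so that
// hit[0] corresponds to bx == min_rpc_bx.
constexpr std::size_t bxToIndex(int bx) {
  return static_cast<std::size_t>(bx - min_rpc_bx);
}

// Hypothetical helper: count one hit in the given bx.
// Values outside the window are rejected instead of written out of bounds.
bool recordHit(std::array<int, n_bx>& hit, int bx) {
  if (bx < min_rpc_bx || bx > max_rpc_bx) {
    return false;
  }
  assert(bxToIndex(bx) < hit.size());
  ++hit[bxToIndex(bx)];
  return true;
}
```

Indexing with the raw bx (for example hit[-2]) would write before the start of the array, which is the kind of access that can corrupt neighbouring stack objects. Routing every access through a single shift function keeps the offset consistent across all uses.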
|
I'll try to make the changes I proposed and create a pull request if it works. |
#21151 corrects the problems with RPCHitCleaner and RPCtoDTTranslator |
Could L1 look into bugs related to |
+core |
There are a number of workflows crashing for slc7_aarch64_gcc700. A quick look at one of them, 136.7321 (step2_L1REPACK_HLT.py), pointed to an issue in L1Trigger/L1TTwinMux/src/RPCHitCleaner.cc:92. vcluster_size.size() was returning an abnormally high value (it jumped from 8 to 209594323). I suspected stack smashing and thus recompiled with:
The smashed stack was confirmed on aarch64 and x86_64.
From aarch64:
From x86_64:
Valgrind on x86_64 showed even more issues: