Fix for crash in gpuClusterChargeCut.h after "warning too many clusters ..." #35829
Conversation
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35829/26191
Code checks have found code style and quality issues which could be resolved by applying the following patch(es):
Thanks for the fix. should not suffer from this defect.
Does this mean that this PR is not needed? (assuming we sort out the unexpected diffs in the #35598 DQM plots)
I'm checking further. It seems that there can be other issues with that event, which has a huge occupancy in one module.
The clusterizer is nominally limited to 6000 pixels, but in reality the maximum is 16 × blocksize, and blocksize is now 256 (so 4096); in the event in question there are 4340 pixels in the module.
This commit (or similar) is required as well.
Besides the bugs (which should not be there), I would like to understand what kind of events these are: what is causing such an anomalous occupancy in just one module?
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35829/26213
A new Pull Request was created by @czangela for master. It involves the following packages:
@jpata, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks. cms-bot commands are listed here.
I will close this PR since #35835 contains the necessary fixes. |
This PR may be useful as a backport.
PR description:
This issue occurs when running on some TTbar data generated with Run3 end-of-2024 conditions. The full dataset can be found at https://cmsweb.cern.ch/das/request?instance=prod/phys03&input=file+dataset%3D%2FTTbar%2Ftsusa-TTbar_14TeV_GSDR_2024_121X_mcRun3_2024_realistic_v8_10k-58ae893e2544746328b272cd68e5d2f1%2FUSER, and there is a trimmed file in `/afs/cern.ch/work/a/aczirkos/public/gpu_charge_crash_10_25/` containing the crashing event in `step2_2.root`; running on this should be enough to reproduce the error.

In case there are too many clusters in a module, the local reconstruction fails in `gpuClusterChargeCut.h` with the following error (there is a full log at `/afs/cern.ch/work/a/aczirkos/public/gpu_charge_crash_10_25/log`).

This is due to an indexing bug: we set `clusterId[i] = invalidModuleId` first, and then use this value to index the `newClusId` array, which has size `maxNumClustersPerModules` (and `invalidModuleId` > `maxNumClustersPerModules`).

This PR fixes this error.
PR validation:
This is a short recipe for reproducing the error and checking that this PR fixes it.
On a GPU-equipped machine:
If this PR is a backport, please specify the original PR and why you need to backport that PR:
kindly pinging @tsusa @connorpa