ASAN/segmentation faults in gen::HydjetHadronizer::get_particles() #39350

Open
dan131riley opened this issue Sep 8, 2022 · 12 comments

@dan131riley

We're getting segmentation faults in some IB workflows in gen::HydjetHadronizer::get_particles(), and also ASAN is seeing a heap buffer overflow. I'll include the ASAN report from WF 159.03 since it's more informative. Log is:

https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/raw/el8_amd64_gcc10/CMSSW_12_6_ASAN_X_2022-09-07-1100/pyRelValMatrixLogs/run/159.03_HydjetQ_MinBias_5020GeV_2021_ppReco+HydjetQ_MinBias_5020GeV_2021_ppReco+DIGIHI2021PPRECO+RAWPRIMESIMHI18+RECOHI2022PROD+MINIHI2022PROD/step1_HydjetQ_MinBias_5020GeV_2021_ppReco+HydjetQ_MinBias_5020GeV_2021_ppReco+DIGIHI2021PPRECO+RAWPRIMESIMHI18+RECOHI2022PROD+MINIHI2022PROD.log

Edited ASAN report:

=================================================================
==25585==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x611000dce858 at pc 0x2b529509ae88 bp 0x2b52b9cd8f60 sp 0x2b52b9cd8f58
WRITE of size 4 at 0x611000dce858 thread T4
    #0 0x2b529509ae87 in gen::HydjetHadronizer::get_particles(HepMC::GenEvent*) (/cvmfs/cms-ib.cern.ch/nweek-02749/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_ASAN_X_2022-09-07-1100/lib/el8_amd64_gcc10/libGeneratorInterfaceHydjetInterface.so+0x18e87)
    #1 0x2b529509c274 in gen::HydjetHadronizer::generatePartonsAndHadronize() (/cvmfs/cms-ib.cern.ch/nweek-02749/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_ASAN_X_2022-09-07-1100/lib/el8_amd64_gcc10/libGeneratorInterfaceHydjetInterface.so+0x1a274)
    #2 0x2b5294fc4761 in edm::GeneratorFilter<gen::HydjetHadronizer, gen::ExternalDecayDriver>::filter(edm::Event&, edm::EventSetup const&) (/cvmfs/cms-ib.cern.ch/nweek-02749/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_ASAN_X_2022-09-07-1100/lib/el8_amd64_gcc10/pluginGeneratorInterfaceHydjetInterfacePlugins.so+0x2f761)

0x611000dce858 is located 0 bytes to the right of 216-byte region [0x611000dce780,0x611000dce858)
allocated by thread T4 here:
    #0 0x2b527d6d6607 in operator new(unsigned long) ../../../../libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x2b5295099f9d in gen::HydjetHadronizer::get_particles(HepMC::GenEvent*) (/cvmfs/cms-ib.cern.ch/nweek-02749/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_ASAN_X_2022-09-07-1100/lib/el8_amd64_gcc10/libGeneratorInterfaceHydjetInterface.so+0x17f9d)
    #2 0x2b529509c274 in gen::HydjetHadronizer::generatePartonsAndHadronize() (/cvmfs/cms-ib.cern.ch/nweek-02749/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_ASAN_X_2022-09-07-1100/lib/el8_amd64_gcc10/libGeneratorInterfaceHydjetInterface.so+0x1a274)
    #3 0x2b5294fc4761 in edm::GeneratorFilter<gen::HydjetHadronizer, gen::ExternalDecayDriver>::filter(edm::Event&, edm::EventSetup const&) (/cvmfs/cms-ib.cern.ch/nweek-02749/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_ASAN_X_2022-09-07-1100/lib/el8_amd64_gcc10/pluginGeneratorInterfaceHydjetInterfacePlugins.so+0x2f761)

SUMMARY: AddressSanitizer: heap-buffer-overflow (/cvmfs/cms-ib.cern.ch/nweek-02749/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_ASAN_X_2022-09-07-1100/lib/el8_amd64_gcc10/libGeneratorInterfaceHydjetInterface.so+0x18e87) in gen::HydjetHadronizer::get_particles(HepMC::GenEvent*)
Shadow bytes around the buggy address:
  0x0c22801b1cb0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c22801b1cc0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c22801b1cd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c22801b1ce0: 00 00 00 fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c22801b1cf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c22801b1d00: 00 00 00 00 00 00 00 00 00 00 00[fa]fa fa fa fa
  0x0c22801b1d10: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c22801b1d20: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c22801b1d30: fd fd fd fd fd fa fa fa fa fa fa fa fa fa fa fa
  0x0c22801b1d40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c22801b1d50: 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==25585==ABORTING
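
The numbers in the report above can be cross-checked with a little arithmetic. The sketch below uses only addresses copied from the report; the shadow offset `0x7fff8000` is the default ASAN mapping on x86_64 Linux, which is assumed (not confirmed) to apply to this build. The constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Addresses copied from the ASAN report.
constexpr std::uint64_t kRegionBegin = 0x611000dce780;  // start of the heap region
constexpr std::uint64_t kRegionEnd = 0x611000dce858;    // one past its last byte
constexpr std::uint64_t kFaultAddr = 0x611000dce858;    // target of the 4-byte WRITE

// Assumption: default ASAN shadow mapping on x86_64 Linux,
// shadow = (address >> 3) + 0x7fff8000.
constexpr std::uint64_t kShadowOffset = 0x7fff8000;

constexpr std::uint64_t shadowAddress(std::uint64_t addr) {
  return (addr >> 3) + kShadowOffset;
}

constexpr std::uint64_t regionSize() { return kRegionEnd - kRegionBegin; }
constexpr std::uint64_t bytesPastEnd() { return kFaultAddr - kRegionEnd; }

// The region is 216 bytes and the write starts exactly at its end.
static_assert(regionSize() == 216, "region size matches the report");
static_assert(bytesPastEnd() == 0, "write begins 0 bytes to the right");

// One shadow byte covers 8 application bytes, so 216 bytes need 27 shadow bytes.
static_assert(regionSize() / 8 == 27, "27 addressable shadow bytes");

// The faulting address maps into the row marked "=>" at column 0xb.
static_assert(shadowAddress(kFaultAddr) == 0x0c22801b1d0b, "shadow byte of the fault");
```

This agrees with the shadow dump: the 27 addressable `00` bytes are the 16 in row `0x0c22801b1cf0` plus the first 11 in row `0x0c22801b1d00`, and the bracketed `[fa]` redzone byte sits at offset 11 in that row. In other words, the write lands on the first element past the end of the allocation.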
@cmsbuild
Contributor

cmsbuild commented Sep 8, 2022

A new Issue was created by @dan131riley Dan Riley.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor

makortel commented Sep 8, 2022

assign generators

@cmsbuild
Contributor

cmsbuild commented Sep 8, 2022

New categories assigned: generators

@mkirsano,@menglu21,@alberto-sanchez,@SiewYan,@GurpreetSinghChahal,@Saptaparna you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

makortel commented Sep 8, 2022

Seems that the segfault was first seen in CMSSW_12_6_X_2022-09-06-1100

@dan131riley
Author

Also seen in UBSAN

/pool/condor/dir_2153/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/6dda80287ded6cb30cbcf21c830565ab/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-09-16-1100/src/GeneratorInterface/HydjetInterface/src/HydjetHadronizer.cc:392:57: runtime error: member call on null pointer of type 'struct GenParticle'
    #0 0x2ac0bac2cf4d  (/cvmfs/cms-ib.cern.ch/nweek-02750/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-09-16-1100/lib/el8_amd64_gcc11/libGeneratorInterfaceHydjetInterface.so+0x62f4d)
    #1 0x2ac0bac2f9aa  (/cvmfs/cms-ib.cern.ch/nweek-02750/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-09-16-1100/lib/el8_amd64_gcc11/libGeneratorInterfaceHydjetInterface.so+0x659aa)
    #2 0x2ac0baac8008  (/cvmfs/cms-ib.cern.ch/nweek-02750/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-09-16-1100/lib/el8_amd64_gcc11/pluginGeneratorInterfaceHydjetInterfacePlugins.so+0xc7008)

@perrotta
Contributor

@cms-sw/generators-l2 the segfault is still there in the IBs: could you please let us know the status of the investigation into this issue? When do you plan to provide a fix?

@perrotta
Contributor

urgent
(well, we are preparing a release for HI, and this has to be fixed at some point)

@makortel
Contributor

@Dr15Jones presumably #39717 should have fixed this issue, right?

@Dr15Jones
Contributor

@makortel hopefully, but I was unable to test it. We should see by the end of today’s IB.

@smuzaffar
Contributor

smuzaffar commented Oct 19, 2022

The last ASAN build shows good results: only 6 tests fail after #39717 was merged, compared to 10 failures before. But I still see this error:

#4  0x00002afda39c5016 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/nweek-02755/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_ASAN_X_2022-10-18-2300/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#5  <signal handler called>
#6  0x00002afdaef49802 in gen::HydjetHadronizer::get_particles(HepMC::GenEvent*) () from /cvmfs/cms-ib.cern.ch/nweek-02755/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_ASAN_X_2022-10-18-2300/lib/el8_amd64_gcc11/libGeneratorInterfaceHydjetInterface.so
#7  0x00002afdaef4b195 in gen::HydjetHadronizer::generatePartonsAndHadronize() () from /cvmfs/cms-ib.cern.ch/nweek-02755/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_ASAN_X_2022-10-18-2300/lib/el8_amd64_gcc11/libGeneratorInterfaceHydjetInterface.so
#8  0x00002afdaee75713 in edm::GeneratorFilter<gen::HydjetHadronizer, gen::ExternalDecayDriver>::filter(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/nweek-02755/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_ASAN_X_2022-10-18-2300/lib/el8_amd64_gcc11/pluginGeneratorInterfaceHydjetInterfacePlugins.so
#9  0x00002afd98b91111 in edm::one::EDFilterBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02755/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_ASAN_X_2022-10-18-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so
#10 0x00002afd98b3d1b5 in edm::WorkerT<edm::one::EDFilterBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02755/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_ASAN_X_2022-10-18-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so
#11 0x00002afd987fbf05 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/nweek-02755/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_ASAN_X_2022-10-18-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so

@dan131riley
Author

#39717 fixed the ASAN buffer overflow, but we're still getting segfaults. UBSAN reports:

CMSSW_12_6_UBSAN_X_2022-10-24-1100/src/GeneratorInterface/HydjetInterface/src/HydjetHadronizer.cc:395:57: runtime error: member call on null pointer of type 'struct GenParticle'

HepMC::GenParticle* mother = primary_particle.at(mid);
HepMC::GenVertex* prods = build_hyjet_vertex(ihy, isub);
if (!mother) {
  mother = particle[mid];
  primary_particle[mid] = mother;
}
HepMC::GenVertex* prod_vertex = mother->end_vertex();
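
In the snippet above, `mother` is presumably still null after the fallback whenever `particle[mid]` is itself null, and `mother->end_vertex()` is then the member call UBSAN complains about. A minimal sketch of a guarded lookup follows. It is not the change in #39784; `Vertex`, `Particle`, `lookupMother`, and `motherEndVertex` are hypothetical stand-ins for the HepMC types and the surrounding code, and whether returning null is the right recovery is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for HepMC::GenVertex and HepMC::GenParticle.
struct Vertex {};
struct Particle {
  Vertex* endVertex = nullptr;
  Vertex* end_vertex() const { return endVertex; }
};

// Same lookup order as the snippet: primary_particle first, then particle.
// Returns nullptr instead of dereferencing when no mother is available.
Particle* lookupMother(std::vector<Particle*>& primary_particle,
                       const std::vector<Particle*>& particle,
                       std::size_t mid) {
  // Also reject indices outside either container.
  if (mid >= primary_particle.size() || mid >= particle.size()) {
    return nullptr;
  }
  Particle* mother = primary_particle[mid];
  if (!mother) {
    mother = particle[mid];
    primary_particle[mid] = mother;
  }
  return mother;
}

// Hypothetical caller: only call end_vertex() once the pointer is known to be valid.
Vertex* motherEndVertex(std::vector<Particle*>& primary_particle,
                        const std::vector<Particle*>& particle,
                        std::size_t mid) {
  Particle* mother = lookupMother(primary_particle, particle, mid);
  return mother ? mother->end_vertex() : nullptr;
}
```

A guard like this only turns the crash into a detectable condition. If a null mother means the event record is inconsistent, the real fix is more likely in how the mother index or the particle array is filled.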

@Dr15Jones
Contributor

@dan131riley this is supposed to be fixed with #39784 which is still waiting on review.
