Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

danarest plugin crashes on os8 with memory error #635

Closed
nsjarvis opened this issue Mar 17, 2022 · 12 comments
Closed

danarest plugin crashes on os8 with memory error #635

nsjarvis opened this issue Mar 17, 2022 · 12 comments

Comments

@nsjarvis
Copy link
Contributor

The files necessary to reproduce this are in
/work/halld/njarvis/bug_danarest - use the script run.sh in that directory
It works on ifarm (rhel7) but not on jlabl5 (rhel8)

The screen output from jlabl5 is in /work/halld/njarvis/bug_danarest/os8.out.

Key excerpts from the error messages are

double free or corruption (fasttop)   
./run.sh: line 5: 627067 Aborted                 (core dumped) hd_root etapipi-100.hddm -PPLUGINS=danarest -PNTHREADS=8
The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum https://root.cern.ch/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at https://root.cern.ch/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#5  DEventSourceHDDM::Extract_DMCReaction (this=0x7fc4dc001a40, record=0x7fc4dcb02290, factory=0x7fc4b84f2690, tag=..., loop=<optimized out>) at /u/group/halld/Softwar
e/builds/Linux_RHEL8-x86_64-gcc8.5.0/halld_recon/halld_recon-4.35.0/Linux_RHEL8-x86_64-gcc8.5.0/include/HDDM/hddm_s.hpp:9251
#6  0x0000000001083329 in DEventSourceHDDM::GetObjects (this=0x7fc4dc001a40, event=..., factory=0x7fc4b84f2690) at /usr/include/c++/8/bits/basic_string.h:940
#7  0x000000000081f49b in jana::JEvent::GetObjects<DMCReaction> (this=this
entry=0x7fc4b8000b68, t=std::vector of length 0, capacity 0, factory=factory
entry=0x7fc4b84f2690) at /u/group/halld/Software/builds/Linux_RHEL8-x86_64-gcc8.5.0/jana/jana_0.8.2^ccdb168/Linux_RHEL8-x86_64-gcc8.5.0/include/JANA/JEvent.h:84
#8  0x000000000081f899 in jana::JEventLoop::GetFromSource<DMCReaction> (factory=0x7fc4b84f2690, t=std::vector of length 0, capacity 0, this=0x7fc4b8000b60) at /u/group
/halld/Software/builds/Linux_RHEL8-x86_64-gcc8.5.0/jana/jana_0.8.2^ccdb168/Linux_RHEL8-x86_64-gcc8.5.0/include/JANA/JEventLoop.h:463
#9  jana::JEventLoop::GetFromFactory<DMCReaction> (this=this
entry=0x7fc4b8000b60, t=std::vector of length 0, capacity 0, tag=tag
entry=0x1203c36 "", data_source=
0x7fc4bfff7280: 4257240736, allow_deftag=allow_deftag
entry=true) at /u/group/halld/Software/builds/Linux_RHEL8-x86_64-gcc8.5.0/jana/jana_0.8.2^ccdb168/Linux_RHEL8-x86_64-gcc8.5.0/include/JANA/JEventLoop.h:427
#10 0x000000000081fa6e in jana::JEventLoop::Get<DMCReaction> (this=this
entry=0x7fc4b8000b60, t=std::vector of length 0, capacity 0, tag=tag
entry=0x1203c36 "", allow_deftag=allow_deftag
entry=true) at /usr/include/c++/8/ext/new_allocator.h:86
#11 0x000000000116d5df in DEventWriterREST::Write_RESTEvent (this=this
entry=0x7fc4b857f300, locEventLoop=locEventLoop
entry=0x7fc4b8000b60, locOutputFileNameSubString="") at libraries/HDDM/DEventWriterREST.cc:70
#12 0x00007fc4e41887b8 in JEventProcessor_danarest::evnt (this=<optimized out>, locEventLoop=0x7fc4b8000b60, eventnumber=<optimized out>) at /usr/include/c++/8/ext/new
_allocator.h:79
#13 0x00000000011a742f in jana::JEventLoop::OneEvent (this=0x7fc4b8000b60) at src/JANA/JEventLoop.cc:693
#14 0x00000000011a7a44 in jana::JEventLoop::Loop (this=this
entry=0x7fc4b8000b60) at src/JANA/JEventLoop.cc:496
#15 0x0000000001184bf9 in LaunchThread (arg=0x7ffc19aaa110) at src/JANA/JApplication.cc:1382
#16 0x00007fc4fd0c017a in start_thread () from /lib64/libpthread.so.0
#17 0x00007fc4fcdefdc3 in clone () from /lib64/libc.so.6
@aaust
Copy link
Contributor

aaust commented Aug 31, 2022

I can confirm your observation. I do not recommend running with 8 threads on jlabl5.

---- edit: found a mistake on my part

@nsjarvis
Copy link
Contributor Author

Did it run correctly single-threaded? The idea seems familiar but I cannot remember for certain, it was so long ago.

@aaust
Copy link
Contributor

aaust commented Dec 14, 2022

Valgrind does not tell me much more:

==667513== 1 errors in context 4 of 127:
==667513== Invalid read of size 4
==667513==    at 0x1104721: DEventSourceHDDM::Extract_DMCReaction(hddm_s::HDDM*, jana::JFactory<DMCReaction>*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, jana::JEventLoop*) (in /home/aaustreg/Software/gluex_top/halld_recon/halld_recon-4.39.0/Linux_RHEL8-x86_64-gcc8.5.0/bin/hd_root)
==667513==    by 0x1105959: DEventSourceHDDM::GetObjects(jana::JEvent&, jana::JFactory_base*) (in /home/aaustreg/Software/gluex_top/halld_recon/halld_recon-4.39.0/Linux_RHEL8-x86_64-gcc8.5.0/bin/hd_root)
==667513==    by 0x8024BA: jerror_t jana::JEvent::GetObjects<DMCReaction>(std::vector<DMCReaction const*, std::allocator<DMCReaction const*> >&, jana::JFactory_base*) (in /home/aaustreg/Software/gluex_top/halld_recon/halld_recon-4.39.0/Linux_RHEL8-x86_64-gcc8.5.0/bin/hd_root)
==667513==    by 0x8028B8: jana::JFactory<DMCReaction>* jana::JEventLoop::GetFromFactory<DMCReaction>(std::vector<DMCReaction const*, std::allocator<DMCReaction const*> >&, char const*, jana::JEventLoop::data_source_t&, bool) (in /home/aaustreg/Software/gluex_top/halld_recon/halld_recon-4.39.0/Linux_RHEL8-x86_64-gcc8.5.0/bin/hd_root)
==667513==    by 0x802A88: jana::JFactory<DMCReaction>* jana::JEventLoop::Get<DMCReaction>(std::vector<DMCReaction const*, std::allocator<DMCReaction const*> >&, char const*, bool) (in /home/aaustreg/Software/gluex_top/halld_recon/halld_recon-4.39.0/Linux_RHEL8-x86_64-gcc8.5.0/bin/hd_root)
==667513==    by 0x1258702: DEventWriterREST::Write_RESTEvent(jana::JEventLoop*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) const (in /home/aaustreg/Software/gluex_top/halld_recon/halld_recon-4.39.0/Linux_RHEL8-x86_64-gcc8.5.0/bin/hd_root)
==667513==    by 0x27E36356: JEventProcessor_danarest::evnt(jana::JEventLoop*, unsigned long) (in /home/aaustreg/Software/gluex_top/halld_recon/halld_recon-4.39.0/Linux_RHEL8-x86_64-gcc8.5.0/plugins/danarest.so)
==667513==    by 0x128C70E: jana::JEventLoop::OneEvent() (JEventLoop.cc:693)
==667513==    by 0x128CD23: jana::JEventLoop::Loop() (JEventLoop.cc:496)
==667513==    by 0x1269A48: LaunchThread(void*) (JApplication.cc:1382)
==667513==    by 0xD1821C9: start_thread (in /usr/lib64/libpthread-2.28.so)
==667513==    by 0xDF06E72: clone (in /usr/lib64/libc-2.28.so)
==667513==  Address 0x1c is not stack'd, malloc'd or (recently) free'd

@aaust
Copy link
Contributor

aaust commented Dec 14, 2022

This is a hint:
/include/HDDM/hddm_s.hpp:9251

This looks for the properties of the target

inline Properties &Target::getProperties() {
   return m_properties_link.front();
}

but this field is not present in the input
<properties charge="int" mass="float"/>

@aaust
Copy link
Contributor

aaust commented Dec 14, 2022

And indeed, skipping over this line prevents the crash:
https://github.com/JeffersonLab/halld_recon/blob/master/src/libraries/HDDM/DEventSourceHDDM.cc#L1159

@aaust
Copy link
Contributor

aaust commented Dec 14, 2022

Apparently, calling std::vector::front on an empty container causes undefined behavior. Since this is used everywhere in hddm-cpp, I want to hand this problem off to @rjones30

@rjones30
Copy link
Contributor

rjones30 commented Dec 14, 2022 via email

@rjones30
Copy link
Contributor

rjones30 commented Dec 14, 2022

Hello Alex and all,

The properties tag under "target" is not optional in hddm_s. You don't have to specify a <target> but if you do then you must give it <properties>. This is clear from the hddm template, as shown below. If the tag is optional then it has a minoccurs="0" attribute to indicate you can leave it out without breaking the model. Please mark this issue as resolved. I will check to see if it needs to be opened in hdgeant or hdgeant4 to make sure the outputs are compliant.
-Richard Jones

 <target minOccurs="0" type="Particle_t">
     <momentum E="float" px="float" py="float" pz="float"/>
     <polarization minOccurs="0" Px="float" Py="float" Pz="float"/>
     <properties charge="int" mass="float"/>
  </target>

@aaust
Copy link
Contributor

aaust commented Dec 14, 2022

Ok, that makes sense. I can also confirm that the generator gen_amp and its derivatives fill the field for the target.

@nsjarvis Where did the input file (etapipi-100.hddm) come from? Can you please make sure it is recent?

@rjones30
Copy link
Contributor

Nevertheless, we can make our lives easier by making the reader more robust against inadvertent omissions that make little difference to the analysis like this. I have posted a PR named fix_hddm_read_crashes_rtj that I think should solve this issue. Please test and if it works, approve. -Richard

@nsjarvis
Copy link
Contributor Author

Thank you for fixing this.

@nsjarvis
Copy link
Contributor Author

PS I checked with a real data file (hd_rawdata_073070_001.evio) and it runs over that no problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants