RecoTracker/MkFitCore: Valgrind Massif/Memcheck "Invalid free() / delete / delete[] / realloc()" at end of job. #42700

Open
gartung opened this issue Sep 1, 2023 · 8 comments

Comments

@gartung
Member

gartung commented Sep 1, 2023

Running workflow 11834.21 in IB CMSSW_13_3_X_2023-08-30-1100 on el8 produces these messages from Valgrind at the end of the job.

01-Sep-2023 11:31:56 CEST  Closed file file:step2.root                                                                                                              
>>> processed 10 events                                                                                                                                             
==16446== Conditional jump or move depends on uninitialised value(s)                                                                                                
==16446==    at 0x872F3AE: bool rml::internal::isLargeObject<(rml::internal::MemoryOrigin)0>(void*) [clone .part.0] [clone .lto_priv.0] (frontend.cpp:2504)         
==16446==    by 0x87332A5: UnknownInlinedFun (frontend.cpp:2492)                                                                                                    
==16446==    by 0x87332A5: UnknownInlinedFun (frontend.cpp:2637)                                                                                                    
==16446==    by 0x87332A5: UnknownInlinedFun (frontend.cpp:2663)                                                                                                    
==16446==    by 0x87332A5: scalable_free (frontend.cpp:2943)                                                                                                        
==16446==    by 0x4E6F013: tbb::detail::d1::concurrent_unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, tbb::detail::d1::tbb_allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::~concurrent_unordered_map() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02800/el8_amd64_gcc11/cms/cmssw/CMSSW_13_3_X_2023-08-27-0000/lib/el8_amd64_gcc11/libFWCoreUtilities.so)
==16446==    by 0x72D026B: __run_exit_handlers (in /usr/lib64/libc-2.28.so)                                                                                       
==16446==    by 0x72D039F: exit (in /usr/lib64/libc-2.28.so)                                                                                         
==16446==    by 0x72B9D8B: (below main) (in /usr/lib64/libc-2.28.so)                                                                                
==16446==                                                                                                                                           
==16446== Conditional jump or move depends on uninitialised value(s)                                                                                
==16446==    at 0x87332A8: UnknownInlinedFun (frontend.cpp:2637)                                                                                    
==16446==    by 0x87332A8: UnknownInlinedFun (frontend.cpp:2663)                                                                                    
==16446==    by 0x87332A8: scalable_free (frontend.cpp:2943)    
==16446==    by 0x4E6F013: tbb::detail::d1::concurrent_unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, tbb::detail::d1::tbb_allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::~concurrent_unordered_map() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02800/el8_amd64_gcc11/cms/cmssw/CMSSW_13_3_X_2023-08-27-0000/lib/el8_amd64_gcc11/libFWCoreUtilities.so)
==16446==    by 0x72D026B: __run_exit_handlers (in /usr/lib64/libc-2.28.so)                                                                                   
==16446==    by 0x72D039F: exit (in /usr/lib64/libc-2.28.so)              
==16446==    by 0x72B9D8B: (below main) (in /usr/lib64/libc-2.28.so)                                                                                 
==16446==                                                                 
==16446== Conditional jump or move depends on uninitialised value(s)
==16446==    at 0x872F3C2: bool rml::internal::isLargeObject<(rml::internal::MemoryOrigin)0>(void*) [clone .part.0] [clone .lto_priv.0] (frontend.cpp:2503)       
==16446==    by 0x87332A5: UnknownInlinedFun (frontend.cpp:2492)    
==16446==    by 0x87332A5: UnknownInlinedFun (frontend.cpp:2637)                                                                                           
==16446==    by 0x87332A5: UnknownInlinedFun (frontend.cpp:2663)             
==16446==    by 0x87332A5: scalable_free (frontend.cpp:2943)    
==16446==    by 0x4E6F013: tbb::detail::d1::concurrent_unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, tbb::detail::d1::tbb_allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::~concurrent_unordered_map() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02800/el8_amd64_gcc11/cms/cmssw/CMSSW_13_3_X_2023-08-27-0000/lib/el8_amd64_gcc11/libFWCoreUtilities.so)
==16446==    by 0x72D026B: __run_exit_handlers (in /usr/lib64/libc-2.28.so)                                                                                   
==16446==    by 0x72D039F: exit (in /usr/lib64/libc-2.28.so)                                                                                                  
==16446==    by 0x72B9D8B: (below main) (in /usr/lib64/libc-2.28.so)                                                                                     
==16446==                                                                                                                                                  

==16446== Invalid free() / delete / delete[] / realloc()
==16446==    at 0x403CF6C: free (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02800/el8_amd64_gcc11/external/valgrind/3.17.0-7bfcd2b5e4f162fb4b127c18285f46f6/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==16446==    by 0x40E5A141: mkfit::ExecutionContext::~ExecutionContext() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02800/el8_amd64_gcc11/cms/cmssw/CMSSW_13_3_X_2023-08-27-0000/lib/el8_amd64_gcc11/libRecoTrackerMkFitCore.so)
==16446==    by 0x72D026B: __run_exit_handlers (in /usr/lib64/libc-2.28.so)                                                                                                           
==16446==    by 0x72D039F: exit (in /usr/lib64/libc-2.28.so)                                                                                                                          
==16446==    by 0x72B9D8B: (below main) (in /usr/lib64/libc-2.28.so)                                                                                                                  
==16446==  Address 0x8a0f000 is in a rw- anonymous segment                                                                                                                            
==16446==                                                                                                                                                                             
==16446== Invalid free() / delete / delete[] / realloc()                                                                                                                              
==16446==    at 0x403CF6C: free (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02800/el8_amd64_gcc11/external/valgrind/3.17.0-7bfcd2b5e4f162fb4b127c18285f46f6/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==16446==    by 0x40E56F40: mkfit::Pool<mkfit::MkFitter>::~Pool() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02800/el8_amd64_gcc11/cms/cmssw/CMSSW_13_3_X_2023-08-27-0000/lib/el8_amd64_gcc11/libRecoTrackerMkFitCore.so)
==16446==    by 0x40E5A2B4: mkfit::ExecutionContext::~ExecutionContext() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02800/el8_amd64_gcc11/cms/cmssw/CMSSW_13_3_X_2023-08-27-0000/lib/el8_amd64_gcc11/libRecoTrackerMkFitCore.so)
==16446==    by 0x72D026B: __run_exit_handlers (in /usr/lib64/libc-2.28.so)                                                                                                           
==16446==    by 0x72D039F: exit (in /usr/lib64/libc-2.28.so)                               
==16446==    by 0x72B9D8B: (below main) (in /usr/lib64/libc-2.28.so)                       
==16446==  Address 0x8a09340 is in a rw- anonymous segment                                 
==16446==                                    
==16446== Invalid free() / delete / delete[] / realloc()                                   
==16446==    at 0x403CF6C: free (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02800/el8_amd64_gcc11/external/valgrind/3.17.0-7bfcd2b5e4f162fb4b127c18285f46f6/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==16446==    by 0x40E5A381: mkfit::ExecutionContext::~ExecutionContext() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02800/el8_amd64_gcc11/cms/cmssw/CMSSW_13_3_X_2023-08-27-0000/lib/el8_amd64_gcc11/libRecoTrackerMkFitCore.so)
==16446==    by 0x72D026B: __run_exit_handlers (in /usr/lib64/libc-2.28.so)                                                                                                           
==16446==    by 0x72D039F: exit (in /usr/lib64/libc-2.28.so)                               
==16446==    by 0x72B9D8B: (below main) (in /usr/lib64/libc-2.28.so)                       
==16446==  Address 0x8a08000 is in a rw- anonymous segment                                 
==16446==                                    
==16446==                                    
==16446== HEAP SUMMARY:                      
==16446==     in use at exit: 318,731,535 bytes in 774,786 blocks                          
==16446==   total heap usage: 1,007,130,306 allocs, 1,006,355,523 frees, 212,626,963,744 bytes allocated                                                                              
==16446==                                    
==16446== LEAK SUMMARY:                      
==16446==    definitely lost: 108,280 bytes in 1,175 blocks                                
==16446==    indirectly lost: 1,978,624 bytes in 30,176 blocks                             
==16446==      possibly lost: 6,069,903 bytes in 146,735 blocks                            
==16446==    still reachable: 310,574,728 bytes in 596,700 blocks                          
==16446==                       of which reachable via heuristic:                          
==16446==                         newarray           : 283,232 bytes in 603 blocks                                                                                                    
==16446==                         multipleinheritance: 168 bytes in 2 blocks                                                                                                          
==16446==         suppressed: 0 bytes in 0 blocks                                          
==16446== Rerun with --leak-check=full to see details of leaked memory                     
==16446==                                    
==16446== Use --track-origins=yes to see where uninitialised values come from                                                                                                         
==16446== For lists of detected and suppressed errors, rerun with: -s                      
==16446== ERROR SUMMARY: 126130 errors from 341 contexts (suppressed: 2 from 1)                                                                                                       
@cmsbuild
Contributor

cmsbuild commented Sep 1, 2023

A new Issue was created by @gartung Patrick Gartung.

@Dr15Jones, @rappoccio, @smuzaffar, @makortel, @sextonkennedy, @antoniovilela can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@gartung gartung changed the title RecoTracker/MkFitCore: Valgrind Massif/Memcheck gives "Invalid free() / delete / delete[] / realloc()" RecoTracker/MkFitCore: Valgrind Massif/Memcheck gives "Invalid free() / delete / delete[] / realloc()" at end of job. Sep 1, 2023
@gartung gartung changed the title RecoTracker/MkFitCore: Valgrind Massif/Memcheck gives "Invalid free() / delete / delete[] / realloc()" at end of job. RecoTracker/MkFitCore: Valgrind Massif/Memcheck "Invalid free() / delete / delete[] / realloc()" at end of job. Sep 1, 2023
@slava77
Contributor

slava77 commented Sep 1, 2023

type tracking

@slava77
Contributor

slava77 commented Sep 1, 2023

@osschar

@makortel
Contributor

makortel commented Sep 1, 2023

assign reconstruction

@cmsbuild
Contributor

cmsbuild commented Sep 1, 2023

New categories assigned: reconstruction

@clacaputo,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

makortel commented Sep 1, 2023

Probably more for the record, the "invalid free" was mentioned also in #40733 (comment)

@dan131riley

The invalid frees happen while global destructors run at exit, a phase we know has other issues. It would be best to destroy the mkfit::ExecutionContext before the framework calls exit(). I'll take a look at making that change.
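
For readers following along, here is a minimal sketch of the general idea: give a heavyweight context an owner with an explicit teardown step that runs before the process reaches exit(), instead of relying on a global destructor. Everything in it (`DemoContext`, `ContextHolder`, `run_job`) is hypothetical and only illustrates the pattern; it is not the MkFitCore or framework API, and it does not claim to be the eventual fix.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a heavyweight context object.
// This is NOT the real mkfit::ExecutionContext.
class DemoContext {
public:
  explicit DemoContext(std::vector<std::string>& log) : log_(log) {
    log_.push_back("context created");
  }
  ~DemoContext() { log_.push_back("context destroyed"); }

private:
  std::vector<std::string>& log_;
};

// Hypothetical owner with an explicit lifetime. A file-scope object would
// instead be destroyed inside exit(), after other libraries may already
// have torn down state that its destructor depends on.
class ContextHolder {
public:
  void create(std::vector<std::string>& log) {
    ctx_ = std::make_unique<DemoContext>(log);
  }
  void destroy() { ctx_.reset(); }  // safe to call more than once
  bool alive() const { return ctx_ != nullptr; }

private:
  std::unique_ptr<DemoContext> ctx_;
};

// Simulated job: the context is released at a well-defined point,
// before the step that stands in for the framework calling exit().
std::vector<std::string> run_job() {
  std::vector<std::string> log;
  ContextHolder holder;
  holder.create(log);
  log.push_back("events processed");
  holder.destroy();  // explicit teardown while everything is still valid
  assert(!holder.alive());
  log.push_back("framework exit");
  return log;
}
```

Calling `run_job()` from a `main` records the destruction before the simulated exit step. In a real framework the `destroy()` call would presumably live in an end-of-job hook, which is an assumption here rather than something stated in this thread.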

6 participants