Workflow PromptReco_Run381067_JetMET1 error in CMSSW_14_0_7 #45089

mpresill · 2024-05-29T08:58:16Z

Dear all,
As reported in
cms talk
we have a paused job for the workflow PromptReco_Run381067_JetMET1 in Run 381067, with the following error:

28-May-2024 14:59:57 UTC  Closed file root://eoscms.cern.ch//eos/cms/tier0/store/data/Run2024E/JetMET1/RAW/v1/000/381/067/00000/e186cde3-4166-4946-9343-f0c908376153.root?eos.app=cmst0
----- Begin Fatal Exception 28-May-2024 15:00:07 UTC-----------------------
An exception of category 'FatalRootError' occurred while
   [0] Calling EventProcessor::runToCompletion (which does almost everything after beginJob and before endJob)
   Additional Info:
      [a] Fatal Root Error: @SUB=TBufferFile::WriteByteCount
bytecount too large (more than 1073741822)

tarball with all the logs are here:
/afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024E/FatalRootError/ByteCountTooLarge/vocms013.cern.ch-3587397-3-log.tar.gz
full stack trace can be found here: /afs/cern.ch/user/m/mpresill/public/ORM_May29/CMSSW_14_0_7/src/job/WMTaskSpace/cmsRun1/cmsRun1-stdout_original.log
I was not able to reproduce the error offline so far, but I am now testing the full sets of events locally.

Matteo (ORM)

The text was updated successfully, but these errors were encountered:

cmsbuild · 2024-05-29T08:58:37Z

cms-bot internal usage

cmsbuild · 2024-05-29T08:58:37Z

A new Issue was created by @mpresill.

@makortel, @Dr15Jones, @sextonkennedy, @antoniovilela, @rappoccio, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

francescobrivio · 2024-05-29T09:01:32Z

assign core

cmsbuild · 2024-05-29T09:01:51Z

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

makortel · 2024-05-29T20:28:02Z

I looked at the log of the failure whose exception was in the issue description. The exception is reported after the input file was closed, and the job ultimately dies in a segfault with the following stack trace

Thread 1 (Thread 0x153d50106640 (LWP 1030) "cmsRun"):
#3  0x0000153d4a5c2720 in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x0000153d50937a63 in TDirectoryFile::WriteKeys() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#6  0x0000153d509348bc in TDirectoryFile::SaveSelf(bool) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#7  0x0000153d50932d27 in TDirectoryFile::Save() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#8  0x0000153d50932bfc in TDirectoryFile::Close(char const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#9  0x0000153d509506d6 in TFile::Close(char const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#10 0x0000153d4a4d013e in TStorageFactoryFile::~TStorageFactoryFile() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libIOPoolTFileAdaptor.so
#11 0x0000153d4a4d0169 in TStorageFactoryFile::~TStorageFactoryFile() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libIOPoolTFileAdaptor.so
#12 0x0000153cab9c0a97 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libIOPoolOutput.so
#13 0x0000153cab9c38e5 in edm::RootOutputFile::~RootOutputFile() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libIOPoolOutput.so
#14 0x0000153cab9c4deb in edm::PoolOutputModule::~PoolOutputModule() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libIOPoolOutput.so
#15 0x0000153cab9c51b8 in virtual thunk to edm::PoolOutputModule::~PoolOutputModule() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libIOPoolOutput.so
#16 0x0000153cab9fa6f3 in std::_Sp_counted_deleter<edm::one::OutputModuleBase*, std::default_delete<edm::one::OutputModuleBase>, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/pluginIOPoolOutputPlugins.so
#17 0x0000153cab9fa811 in std::_Sp_counted_ptr_inplace<edm::maker::ModuleHolderT<edm::one::OutputModuleBase>, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/pluginIOPoolOutputPlugins.so
#18 0x0000153d5132a2a7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#19 0x0000153d513eccb2 in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > >, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > > >::_M_erase(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > >*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#20 0x0000153d513f6d10 in std::_Sp_counted_ptr_inplace<edm::ModuleRegistry, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#21 0x0000153d5132a2a7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#22 0x0000153d51354749 in edm::Schedule::~Schedule() [clone .lto_priv.0] () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#23 0x0000153d51346df7 in edm::EventProcessor::~EventProcessor() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#24 0x0000000000405891 in (anonymous namespace)::EventProcessorWithSentry::~EventProcessorWithSentry() ()
#25 0x0000000000405199 in main ()

Symptoms are the same as in #40132 (comment) .

Was any of the jobs re-tried in Tier0?

makortel · 2024-05-29T20:28:08Z

FYI @pcanal

pcanal · 2024-05-29T20:58:34Z

That is strange. It reports problem with the TFile object itself. I.e. one potential cause is that the TFile was already closed (or is being closed by another thread).

mpresill · 2024-05-29T21:35:07Z

I looked at the log of the failure whose exception was in the issue description. The exception is reported after the input file was closed, and the job ultimately dies in a segfault with the following stack trace

Thread 1 (Thread 0x153d50106640 (LWP 1030) "cmsRun"):
#3  0x0000153d4a5c2720 in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x0000153d50937a63 in TDirectoryFile::WriteKeys() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#6  0x0000153d509348bc in TDirectoryFile::SaveSelf(bool) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#7  0x0000153d50932d27 in TDirectoryFile::Save() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#8  0x0000153d50932bfc in TDirectoryFile::Close(char const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#9  0x0000153d509506d6 in TFile::Close(char const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/external/el8_amd64_gcc12/lib/libRIO.so
#10 0x0000153d4a4d013e in TStorageFactoryFile::~TStorageFactoryFile() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libIOPoolTFileAdaptor.so
#11 0x0000153d4a4d0169 in TStorageFactoryFile::~TStorageFactoryFile() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libIOPoolTFileAdaptor.so
#12 0x0000153cab9c0a97 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libIOPoolOutput.so
#13 0x0000153cab9c38e5 in edm::RootOutputFile::~RootOutputFile() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libIOPoolOutput.so
#14 0x0000153cab9c4deb in edm::PoolOutputModule::~PoolOutputModule() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libIOPoolOutput.so
#15 0x0000153cab9c51b8 in virtual thunk to edm::PoolOutputModule::~PoolOutputModule() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libIOPoolOutput.so
#16 0x0000153cab9fa6f3 in std::_Sp_counted_deleter<edm::one::OutputModuleBase*, std::default_delete<edm::one::OutputModuleBase>, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/pluginIOPoolOutputPlugins.so
#17 0x0000153cab9fa811 in std::_Sp_counted_ptr_inplace<edm::maker::ModuleHolderT<edm::one::OutputModuleBase>, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/pluginIOPoolOutputPlugins.so
#18 0x0000153d5132a2a7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#19 0x0000153d513eccb2 in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > >, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > > >::_M_erase(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > >*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#20 0x0000153d513f6d10 in std::_Sp_counted_ptr_inplace<edm::ModuleRegistry, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#21 0x0000153d5132a2a7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#22 0x0000153d51354749 in edm::Schedule::~Schedule() [clone .lto_priv.0] () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#23 0x0000153d51346df7 in edm::EventProcessor::~EventProcessor() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libFWCoreFramework.so
#24 0x0000000000405891 in (anonymous namespace)::EventProcessorWithSentry::~EventProcessorWithSentry() ()
#25 0x0000000000405199 in main ()

Symptoms are the same as in #40132 (comment) .

Was any of the jobs re-tried in Tier0?

It was not retried yet. Should this be re-tried?

makortel · 2024-05-30T12:54:42Z

Was any of the jobs re-tried in Tier0?

It was not retried yet. Should this be re-tried?

Probably not worth it. I tested the job locally (with a local input file), and it fails in the same way.

makortel · 2024-05-30T13:22:48Z

I noticed many printouts from PileupJetIdProducer in the log (being responsible of 767 MB of the 844 MB of the cmsRun1-stdout.log). I opened a separate issue #45099 about that.

mpresill · 2024-05-31T07:24:42Z

FYI, there is new failed job for the same run and workflow PromptReco_Run381067_JetMET0 with same error.

tarball with error log for this:
/afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024E/FatalRootError/ByteCountTooLarge/vocms013.cern.ch-3587397-3-log.tar.gz

makortel · 2024-06-01T00:50:21Z

Following #40132 (comment) I tested 14_0_7 with the backport of that (#40132 (comment)), but the behavior was the same

01-Jun-2024 00:21:10 CEST  Closed file file:e186cde3-4166-4946-9343-f0c908376153.root
----- Begin Fatal Exception 01-Jun-2024 00:21:20 CEST-----------------------
An exception of category 'FatalRootError' occurred while
   [0] Calling EventProcessor::runToCompletion (which does almost everything after beginJob and before endJob)
   Additional Info:
      [a] Fatal Root Error: @SUB=TBufferFile::WriteByteCount
bytecount too large (more than 1073741822)

----- End Fatal Exception -------------------------------------------------

<cut>

Thread 1 (Thread 0x7f98df012680 (LWP 316619) "cmsRun"):
#3  0x00007f98dab6a720 in sig_dostack_then_abort () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007f98e1e0a48a in TDirectoryFile::WriteKeys (this=0x7f98caf31580) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/io/io/src/TDirectoryFile.cxx:2187
#6  0x00007f98e1e08228 in TDirectoryFile::SaveSelf (this=0x7f98caf31580, force=false) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/io/io/src/TDirectoryFile.cxx:1634
#7  0x00007f98e1e07c3c in TDirectoryFile::Save (this=0x7f98caf31580) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/io/io/src/TDirectoryFile.cxx:1552
#8  0x00007f98e1e03c59 in TDirectoryFile::Close (this=0x7f98caf31580, option=0x7f98da1a6a26 "") at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/io/io/src/TDirectoryFile.cxx:568
#9  0x00007f98e1e1d640 in TFile::Close (this=0x7f98caf31580, option=0x7f98da1a6a26 "") at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/io/io/src/TFile.cxx:970
#10 0x00007f98da1a413e in TStorageFactoryFile::~TStorageFactoryFile() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libIOPoolTFileAdaptor.so
#11 0x00007f98da1a4169 in TStorageFactoryFile::~TStorageFactoryFile() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libIOPoolTFileAdaptor.so
#12 0x00007f9840175a97 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libIOPoolOutput.so
#13 0x00007f98401788e5 in edm::RootOutputFile::~RootOutputFile() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libIOPoolOutput.so
#14 0x00007f9840179deb in edm::PoolOutputModule::~PoolOutputModule() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libIOPoolOutput.so
#15 0x00007f984017a1b8 in virtual thunk to edm::PoolOutputModule::~PoolOutputModule() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libIOPoolOutput.so
#16 0x00007f98401af6f3 in std::_Sp_counted_deleter<edm::one::OutputModuleBase*, std::default_delete<edm::one::OutputModuleBase>, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/pluginIOPoolOutputPlugins.so
#17 0x00007f98401af811 in std::_Sp_counted_ptr_inplace<edm::maker::ModuleHolderT<edm::one::OutputModuleBase>, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/pluginIOPoolOutputPlugins.so
#18 0x00007f98e2bc72a7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libFWCoreFramework.so
#19 0x00007f98e2c89cb2 in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > >, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > > >::_M_erase(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > >*) () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libFWCoreFramework.so
#20 0x00007f98e2c93d10 in std::_Sp_counted_ptr_inplace<edm::ModuleRegistry, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libFWCoreFramework.so
#21 0x00007f98e2bc72a7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libFWCoreFramework.so
#22 0x00007f98e2bf1749 in edm::Schedule::~Schedule() [clone .lto_priv.0] () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libFWCoreFramework.so
#23 0x00007f98e2be3df7 in edm::EventProcessor::~EventProcessor() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libFWCoreFramework.so
#24 0x0000000000405891 in (anonymous namespace)::EventProcessorWithSentry::~EventProcessorWithSentry() ()
#25 0x0000000000405199 in main ()

ferdymercury · 2024-06-01T07:49:08Z

The first error in each job is the one reported by the (CMSSW) exception message, i.e.
* `Request to expand to a negative size, likely due to an integer overflow: 0x8001588e for a max of 0x7ffffffe.`

Please note that root-project/root#14627 should only solve the bug [1] from #40132 (comment): (request to expand to a negative size).

The exception "TBufferFile::WriteByteCount bytecount too large (more than 1073741822)", ie bug [2] in the other issue, is not a bug, but rather a wanted exception, which tells you that your TFile contains a key that exceeds the maximum allowed size of 1 GB (root-project/root#6734)

Workarounds would be to address root-project/root#6734, or to make your object a bit smaller, or to reroot the exception with a custom error handler so that it does not throw a fatal error.

makortel · 2024-06-03T15:29:38Z

type root

makortel · 2024-06-03T15:34:37Z

to reroot the exception with a custom error handler so that it does not throw a fatal error.

What exactly would the ROOT state be then at the point where it issues the error message bytecount too large but the execution would continue without the conversion of the error message into an exception?

ferdymercury · 2024-06-03T15:41:14Z

I guess it would just skip saving that too-big-object into the TFile, and continue with the rest of objects. But it's just a guess, best thing would be try it out with a simple reproducer.
Maybe trying to save a TH2 with a lot of bins in X and Y does the trick for the reproducer.

makortel · 2024-06-04T14:09:51Z

FWIW, here is a stack trace to the where the exception gets thrown

(gdb) where
#0  0x00007ffff55222f1 in __cxxabiv1::__cxa_throw (obj=0x7ffff1885280, tinfo=0x7ffff79a6610 <typeinfo for edm::Exception>, dest=0x7ffff7970020 <edm::Exception::~Exception()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:81
#1  0x00007fffefb2c979 in (anonymous namespace)::RootErrorHandlerImpl(int, char const*, char const*) [clone .cold] () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2  0x00007ffff690bc54 in ErrorHandler(Int_t, const char *, const char *, typedef __va_list_tag __va_list_tag *) (level=3000, location=0x7ffff4001f3c "TBufferFile::WriteByteCount", fmt=0x7ffff7026d48 "bytecount too large (more than %d)", ap=0x7fffffff2218)
    at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/core/foundation/src/TError.cxx:148
#3  0x00007ffff6848126 in TObject::DoError (this=0x7fffe568ce80, level=3000, location=0x7ffff7026d6b "WriteByteCount", fmt=0x7ffff7026d48 "bytecount too large (more than %d)", va=0x7fffffff2218) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/core/base/src/TObject.cxx:943
#4  0x00007ffff684838f in TObject::Error (this=0x7fffe568ce80, location=0x7ffff7026d6b "WriteByteCount", fmt=0x7ffff7026d48 "bytecount too large (more than %d)") at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/core/base/src/TObject.cxx:980
#5  0x00007ffff6d898b7 in TBufferFile::SetByteCount (this=0x7fffe568ce80, cntpos=243, packInVersion=true) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/io/io/src/TBufferFile.cxx:349
#6  0x00007ffff68db51b in TObjArray::Streamer (this=0x7fffea5b7758, b=...) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/core/cont/src/TObjArray.cxx:489
#7  0x00007ffff692f31a in TClass::StreamerTObjectInitialized (pThis=0x7fffee723c00, object=0x7fffea5b7758, b=...) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/core/meta/src/TClass.cxx:6817
#8  0x00007ffff784270b in TClass::Streamer (this=0x7fffee723c00, obj=0x7fffea5b7758, b=..., onfile_class=0x0) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/core/meta/inc/TClass.h:610
#9  0x00007ffff6d8e832 in TBufferFile::WriteFastArray (this=0x7fffe568ce80, start=0x7fffea5b7758, cl=0x7fffee723c00, n=1, streamer=0x0) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/io/io/src/TBufferFile.cxx:2354
#10 0x00007ffff700808c in TStreamerInfo::WriteBufferAux<char**> (this=0x7fff9de73a00, b=..., arr=@0x7fffffff2f30: 0x7fffffff2f28, compinfo=0x7fff9e4397c8, first=0, last=1, narr=1, eoffset=0, arrayMode=0) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/io/io/src/TStreamerInfoWriteBuffer.cxx:628
#11 0x00007ffff6e7e814 in TStreamerInfoActions::GenericWriteAction (buf=..., addr=0x7fffea5b7600, config=0x7fff9e4397b0) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/io/io/src/TStreamerInfoActions.cxx:202
#12 0x00007ffff6d94a07 in TStreamerInfoActions::TConfiguredAction::operator() (this=0x7fffbab24100, buffer=..., object=0x7fffea5b7600) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/io/io/inc/TStreamerInfoActions.h:123
#13 0x00007ffff6d927ec in TBufferFile::ApplySequence (this=0x7fffe568ce80, sequence=..., obj=0x7fffea5b7600) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/io/io/src/TBufferFile.cxx:3679
#14 0x00007ffff6d925d4 in TBufferFile::WriteClassBuffer (this=0x7fffe568ce80, cl=0x7fff9df9ff00, pointer=0x7fffea5b7600) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/io/io/src/TBufferFile.cxx:3648
#15 0x00007ffff78c3650 in TTree::Streamer (this=0x7fffea5b7600, b=...) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/tree/tree/src/TTree.cxx:9623
#16 0x00007ffff6e4c82f in TKey::TKey (this=0x7fffde449940, obj=0x7fffea5b7600, name=0x7fffea5b7619 "Events", bufsize=265751, motherDir=0x7fffdff2d980) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/io/io/src/TKey.cxx:249
#17 0x00007ffff6e11b27 in TFile::CreateKey (this=0x7fffdff2d980, mother=0x7fffdff2d980, obj=0x7fffea5b7600, name=0x7fffea5b7619 "Events", bufsize=265751) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/io/io/src/TFile.cxx:1031
#18 0x00007ffff6dfd754 in TDirectoryFile::WriteTObject (this=0x7fffdff2d980, obj=0x7fffea5b7600, name=0x0, option=0x7ffff78e0f89 "", bufsize=0) at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/io/io/src/TDirectoryFile.cxx:1965
#19 0x00007ffff78aae26 in TTree::AutoSave (this=0x7fffea5b7600, option=0x7fff551533e8 "FlushBaskets") at /build/muz/140xRoot/w/BUILD/el8_amd64_gcc12/lcg/root/6.30.03-055353a34b1a9cd5e46334f7a05af86c/root-6.30.03/tree/tree/src/TTree.cxx:1522
#20 0x00007fff551452b9 in edm::RootOutputFile::finishEndFile() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libIOPoolOutput.so
#21 0x00007fff55130a55 in edm::PoolOutputModule::finishEndFile() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libIOPoolOutput.so
#22 0x00007fff55130ba3 in edm::PoolOutputModule::reallyCloseFile() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libIOPoolOutput.so
#23 0x00007ffff7c848e5 in edm::Schedule::closeOutputFiles() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libFWCoreFramework.so
#24 0x00007ffff7bda60d in edm::EventProcessor::closeOutputFiles() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libFWCoreFramework.so
#25 0x00007ffff7bdfc9a in edm::EventProcessor::runToCompletion() () from /build/muz/140xRoot/w/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7_ROOT15687/lib/el8_amd64_gcc12/libFWCoreFramework.so
#26 0x00000000004074ef in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#27 0x00007ffff5d719ad in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre1-el8_amd64_gcc12/build/CMSSW_14_1_0_pre1-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-c3903c50b52342174dbd3a52854a6e6d/tbb-v2021.9.0/src/tbb/arena.cpp:688
#28 0x0000000000408ed2 in main::{lambda()#1}::operator()() const ()
#29 0x000000000040517c in main ()

How could we get more context on what is causing the "too big object"?

ferdymercury · 2024-06-04T14:25:32Z

There seems to be an object or class that is stored in your TTree, whose Streamer is too big, at some point when AutoSave is called.

For example, if I do this:

TFile a("/tmp/big.root","RECREATE");
TH2D h("h", "h", 100000, 0, 10, 100000, 0, 10);
h.Write();

I will get a similar crash as yours, though not exactly through the same path. If you'd try to store this histogram in the TTree directly and call AutoSave, you might get closer to what you are seeing.

So maybe, you would need to do a tree->Print() or to revise what branches and classes you have stored in this tree. One of them has a Streamer that stores too much data in it.

makortel · 2024-06-04T14:38:01Z

One detail to be noted about the output are sizes of the output files at the time the job died

16M     SKIMStreamEXODelayedJetMET.root
8.8M    SKIMStreamEXODisappTrk.root
57G     SKIMStreamEXOHighMET.root
6.6G    SKIMStreamEXOSoftDisplacedVertices.root
1.3G    SKIMStreamJetHTJetPlusHOFilter.root
1.3M    SKIMStreamLogErrorMonitor.root
545M    SKIMStreamLogError.root
56K     SKIMStreamTeVJet.root
572M    write_ALCARECO.root
6.0G    write_AOD.root
21M     write_DQMIO.root
829M    write_MINIAOD.root
64M     write_NANOAOD.root

57 GB of SKIMStreamEXOHighMET.root is highly suspicious (@cms-sw/pdmv-l2)

makortel · 2024-06-04T14:52:11Z

Inspecting SKIMStreamEXOHighMET.root it seems the job was writing that one when it failed

[mkortela@cmsdev42 /build/mkortela/debug/issue45089/CMSSW_14_0_7/src]$ root -l -n SKIMStreamEXOHighMET.root
root [0]
Attaching file SKIMStreamEXOHighMET.root as _file0...
Warning in <TFile::Init>: file SKIMStreamEXOHighMET.root probably not closed, trying to recover
Info in <TFile::Recover>: SKIMStreamEXOHighMET.root, recovered key TTree:MetaData at address 61136680973
Info in <TFile::Recover>: SKIMStreamEXOHighMET.root, recovered key TTree:ParameterSets at address 61136746357
Info in <TFile::Recover>: SKIMStreamEXOHighMET.root, recovered key TTree:Parentage at address 61136752149
Warning in <TFile::Init>: successfully recovered 3 keys
(TFile *) 0x4a153f0
root [1] .ls
TFile**         SKIMStreamEXOHighMET.root
 TFile*         SKIMStreamEXOHighMET.root
  KEY: TTree    MetaData;1
  KEY: TTree    ParameterSets;1
  KEY: TTree    Parentage;1
root [2]

makortel · 2024-06-05T01:30:17Z

57 GB of SKIMStreamEXOHighMET.root is highly suspicious

The SKIMStreamEXOHighMET selects 16084 events in the job, so the average event size is 3.6 MB. Is this really justified? @cms-sw/pdmv-l2

makortel · 2024-06-06T17:50:56Z

you would need to do a tree->Print()

I tried to call TTree::Print() a little bit before the TTree::AutoSave() is called, but that leads to the error message (that we turn to an exception) as well

----- Begin Fatal Exception 06-Jun-2024 08:03:20 CEST-----------------------
An exception of category 'FatalRootError' occurred while
   [0] Calling RootOutputFile::finishEndFile() while closing SKIMStreamEXOHighMET.root
   Additional Info:
      [a] While calling Print() for Events
      [b] Fatal Root Error: @SUB=TBufferFile::WriteByteCount
bytecount too large (more than 1073741822)

----- End Fatal Exception -------------------------------------------------

This is as far as I got (until July). I have the input file and the job configuration on cmsdev42 in /build/mkortela/debug/issue45089/CMSSW_14_0_7/src in case that would be of help.

pcanal · 2024-06-06T18:50:02Z

How could we get more context on what is causing the "too big object"?

The stack trace seems to imply that it is the TTree itself that is too large. This could happens in two cases
(a) (unlikely) Some large basket(s) are somehow not flush properly and store with the TTree
(b) (more likely) The TTree reached the maximum numbers of baskets in total (each baskets cost the storage of its size and location). (50 millions-ish lead go over the 1GB threshold)
(c) something I am missing :)

makortel · 2024-07-10T21:10:36Z

(b) (more likely) The TTree reached the maximum numbers of baskets in total (each baskets cost the storage of its size and location). (50 millions-ish lead go over the 1GB threshold)

Is there a way to find out the number of baskets? Or would there be some other way to confirm (or disprove) this case?

mmusich · 2024-07-24T06:41:14Z

There is now a second Tier0 job with similar symptoms.
We have a paused job for the workflow PromptReco_Run383367_JetMET0 in Run 383367, with the following error:

"An exception of category 'FatalRootError' occurred while [0] Calling RootOutputFile::finishEndFile() while closing SKIMStreamEXOHighMET.root Additional Info: [a] While calling writeTree() for Events [b] Fatal Root Error: @SUB=TBufferFile::WriteByteCount bytecount too large (more than 1073741822)"

see link for more details.

germanfgv · 2024-07-24T06:48:55Z

There is now a second Tier0 job with similar symptoms. We have a paused job for the workflow PromptReco_Run383367_JetMET0 in Run 383367, with the following error:
"An exception of category 'FatalRootError' occurred while [0] Calling RootOutputFile::finishEndFile() while closing SKIMStreamEXOHighMET.root Additional Info: [a] While calling writeTree() for Events [b] Fatal Root Error: @SUB=TBufferFile::WriteByteCount bytecount too large (more than 1073741822)"
see link for more details.

You can find the tarball of this job here:

/eos/user/c/cmst0/public/PausedJobs/Run2024F/ByteCountTooLarge/job_5131750/vocms0313.cern.ch-5131750-3-log.tar.gz

makortel · 2024-07-24T14:33:26Z

57 GB of SKIMStreamEXOHighMET.root is highly suspicious

The SKIMStreamEXOHighMET selects 16084 events in the job, so the average event size is 3.6 MB. Is this really justified? @cms-sw/pdmv-l2

Maybe time to remind about this question.

mmusich · 2024-07-24T14:41:10Z

Maybe time to remind about this question.

tagging also @cms-sw/ppd-l2 and @youyingli

malbouis · 2024-07-24T17:41:40Z

Thanks @makortel and @mmusich . We (PPD) will follow this up with PdmV/DDT and EXO.

davidlange6 · 2024-07-24T17:50:27Z

lots of skims are raw-reco outputs. This is one of them.. some other skims are quite a bit larger than this one on average in 2024.

makortel · 2024-07-24T18:09:27Z

In case of Run 381067 Lumi 335 this skim selected ~60 % of the events of the JetMET1 PD.

davidlange6 · 2024-07-24T18:44:24Z

So far in 2024, its 120TB compared to about 900 TB of raw data in JetMET1 and JetMET0

davidlange6 · 2024-07-24T18:50:16Z

381067/335 appears to be the end of a set of rather anomalous set of lumi sections

anpicci · 2024-07-30T10:05:17Z

FYI, we failed the jobs reported by @mmusich in according with T0

malbouis · 2024-08-05T11:53:34Z

Hi @makortel , to quickly answer your question in #45089 (comment)
When looking at the sizes of those skims for the past years/Eras, we can see:

skim from 2022F: /JetMET/Run2022G-EXOHighMET-PromptReco-v1/RAW-RECO (3.8TB) ==> # of events: 766776 ==> 4.96 M / event
skim from 2023D: /JetMET0/Run2023D-EXOHighMET-PromptReco-v1/RAW-RECO (4.9TB) ==> # of events: 1047355 ==> 4.67 MB / event
skim from 2024F: /JetMET0/Run2024F-EXOHighMET-PromptReco-v1/RAW-RECO ==> # of events: 7010213 (34.9TB) ==> 4.97 MB per event

So I would naively conclude that the 3.6M per event that you mention above is acceptable for that skim, or at least in accordance to what we've seen in the past (assuming I didn't make any mistakes). :-)

I don't know if it was mentioned before, but just to include it here that this EXOHighMET skim was first introduced in 2022, this is the original PR: #37749

makortel · 2024-08-05T16:18:42Z

Thanks @malbouis. My next question is then if the rate (or acceptance fraction, ~60 %) of the skim is along expectation.

Although @davidlange6 already wrote in #45089 (comment)

381067/335 appears to be the end of a set of rather anomalous set of lumi sections

that I understand as there was something anomalous in the data that resulted in that high acceptance fraction. If this is the case, I would not invest effort in trying to to make writing of this large files to succeed. In this case, perhaps some protections would make sense so that the job wouldn't fail for the entire lumi? Would it be feasible to understand these conditions from the physics side and improve the filtering of this skim? I'm not sure if we could do something reliably at the framework level.

mmusich · 2024-08-05T16:27:42Z

Would it be feasible to understand these conditions from the physics side and improve the filtering of this skim? I'm not sure if we could do something reliably at the framework level.

tagging @afrankenthal as original author of #37749

makortel · 2024-08-05T16:44:40Z

assign pdmv

cmsbuild · 2024-08-05T16:44:56Z

New categories assigned: pdmv

@AdrianoDee,@sunilUIET,@miquork,@kskovpen you have been requested to review this Pull request/Issue and eventually sign? Thanks

youyingli · 2024-08-06T09:19:44Z

57 GB of SKIMStreamEXOHighMET.root is highly suspicious

The SKIMStreamEXOHighMET selects 16084 events in the job, so the average event size is 3.6 MB. Is this really justified? @cms-sw/pdmv-l2

Hi, I also looked at Run 2 EXOHighMET as /MET/Run2018D-HighMET-PromptReco-v2/RAW-RECO with 29.4TB and 7906297 events and the value is approximately 3.72 MB/event. So the 3.6 MB for that file should not be an issue. I'm not sure why many events are accumulated into a single file with a size of 50+ GB without any splitting. For DDT, we will contact EXO PAG and check if this skim is still needed or if they can add additional filters in the trigger part or more stringent selections in this skim.

afrankenthal · 2024-08-06T21:53:37Z

Hello, as @youyingli said I also don't understand the technical details of why the events are not getting properly split. This skim is a common EXO skim serving multiple analyses, so I'd guess it's very much still needed. Maybe there are filters we can implement without losing too much information, though. Needs to be discussed in EXO.

jeyserma · 2024-08-19T09:15:53Z

We had 3 more paused jobs over the weekend. The tarballs can be found here:

/eos/user/c/cmst0/public/PausedJobs/Run2024G/ByteCountTooLarge/job_673494/vocms0314.cern.ch-673494-3-log.tar.gz
/eos/user/c/cmst0/public/PausedJobs/Run2024G/ByteCountTooLarge/job_673495/vocms0314.cern.ch-673495-3-log.tar.gz
/eos/user/c/cmst0/public/PausedJobs/Run2024G/ByteCountTooLarge/job_673917/vocms0314.cern.ch-673917-3-log.tar.gz

We also copied the RAW input files here:

/eos/user/c/cmst0/public/PausedJobs/Run2024G/PromptReco_Run384382_JetMET0/
/eos/user/c/cmst0/public/PausedJobs/Run2024G/PromptReco_Run384382_JetMET1/

cmsbuild added the pending-assignment label May 29, 2024

cmsbuild added core-pending pending-signatures and removed pending-assignment labels May 29, 2024

makortel mentioned this issue May 30, 2024

Noisy PileupJetIdProducer in Prompt reconstruction #45099

Closed

makortel mentioned this issue May 30, 2024

ROOT write errors in prompt reconstruction #40132

Closed

cmsbuild added the root label Jun 3, 2024

This was referenced Jun 5, 2024

Add context for exceptions being thrown via RootOutputFile::finishEndFile() #45141

Merged

[14_0_X] Add context for exceptions being thrown via RootOutputFile::finishEndFile() #45142

Merged

cmsbuild added the pdmv-pending label Aug 5, 2024

Workflow PromptReco_Run381067_JetMET1 error in CMSSW_14_0_7 #45089

Workflow PromptReco_Run381067_JetMET1 error in CMSSW_14_0_7 #45089

Comments

mpresill commented May 29, 2024

cmsbuild commented May 29, 2024 • edited Loading

cmsbuild commented May 29, 2024

francescobrivio commented May 29, 2024

cmsbuild commented May 29, 2024

makortel commented May 29, 2024

makortel commented May 29, 2024

pcanal commented May 29, 2024

mpresill commented May 29, 2024

makortel commented May 30, 2024

makortel commented May 30, 2024

mpresill commented May 31, 2024 • edited Loading

makortel commented Jun 1, 2024

ferdymercury commented Jun 1, 2024 • edited Loading

makortel commented Jun 3, 2024

makortel commented Jun 3, 2024

ferdymercury commented Jun 3, 2024 • edited Loading

makortel commented Jun 4, 2024

ferdymercury commented Jun 4, 2024 • edited Loading

makortel commented Jun 4, 2024

makortel commented Jun 4, 2024

makortel commented Jun 5, 2024

makortel commented Jun 6, 2024

pcanal commented Jun 6, 2024

makortel commented Jul 10, 2024

mmusich commented Jul 24, 2024

germanfgv commented Jul 24, 2024

makortel commented Jul 24, 2024

mmusich commented Jul 24, 2024

malbouis commented Jul 24, 2024

davidlange6 commented Jul 24, 2024

makortel commented Jul 24, 2024

davidlange6 commented Jul 24, 2024

davidlange6 commented Jul 24, 2024

anpicci commented Jul 30, 2024

malbouis commented Aug 5, 2024 • edited Loading

makortel commented Aug 5, 2024

mmusich commented Aug 5, 2024

makortel commented Aug 5, 2024

cmsbuild commented Aug 5, 2024

youyingli commented Aug 6, 2024

afrankenthal commented Aug 6, 2024

jeyserma commented Aug 19, 2024

cmsbuild commented May 29, 2024 •

edited

Loading

mpresill commented May 31, 2024 •

edited

Loading

ferdymercury commented Jun 1, 2024 •

edited

Loading

ferdymercury commented Jun 3, 2024 •

edited

Loading

ferdymercury commented Jun 4, 2024 •

edited

Loading

malbouis commented Aug 5, 2024 •

edited

Loading