Adding 2018A data relvals #23196

Merged: 2 commits merged into cms-sw:master on May 18, 2018

@fabozzi fabozzi commented May 13, 2018

This PR is a follow-up to the discussion in issue #23130.

We chose run 315489 for the 2018A data relvals.

The relval on the EGamma PD replaces the relvals on DoubleEG, SinglePhoton and SingleElectron used for previous eras.

A 2018A workflow is also included in the short matrix.

@cmsbuild

The code-checks are being triggered in jenkins.

fabozzi commented May 13, 2018

please test

cmsbuild commented May 13, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/27924/console Started: 2018/05/13 18:58

@cmsbuild

A new Pull Request was created by @fabozzi for master.

It involves the following packages:

Configuration/PyReleaseValidation

@GurpreetSinghChahal, @cmsbuild, @prebello, @kpedro88, @fabozzi can you please review it and eventually sign? Thanks.
@makortel, @felicepantaleo, @Martin-Grunewald this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@cmsbuild

-1

Tested at: 1b0b14e

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-23196/27924/summary.html

I found the following errors while testing this PR

Failed tests: RelVals

  • RelVals:

When I ran the RelVals I found an error in the following workflows:
136.85 step3

runTheMatrix-results/136.85_RunEGamma2018A+RunEGamma2018A+HLTDR2_2018+RECODR2_2018reHLT_skimEGamma_Prompt_L1TEgDQM+HARVEST2018_L1TEgDQM/step3_RunEGamma2018A+RunEGamma2018A+HLTDR2_2018+RECODR2_2018reHLT_skimEGamma_Prompt_L1TEgDQM+HARVEST2018_L1TEgDQM.log
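
As a reading aid, the workflow directory name above can be split into its workflow number and step labels. This is a minimal sketch, not part of cms-bot or runTheMatrix: the helper `split_workflow_name` is hypothetical, and it assumes that the first label after the number is the workflow name and that the remaining labels correspond to step1, step2, and so on.

```python
from typing import List, Tuple


def split_workflow_name(dirname: str) -> Tuple[str, List[str]]:
    """Split '<number>_<label>+<label>+...' into the number and its labels.

    Hypothetical helper for illustration only.
    """
    number, _, labels = dirname.partition("_")
    return number, labels.split("+")


name = (
    "136.85_RunEGamma2018A+RunEGamma2018A+HLTDR2_2018"
    "+RECODR2_2018reHLT_skimEGamma_Prompt_L1TEgDQM+HARVEST2018_L1TEgDQM"
)
number, labels = split_workflow_name(name)

# Assumption: labels[0] is the workflow name, labels[1:] are step1, step2, ...
steps = labels[1:]
step3 = steps[2]
print(number, step3)
```

Under that assumption, the failing step3 would be the reconstruction and DQM step, which is where a DQM-related crash would be expected to show up.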

@cmsbuild

Comparison not run due to runTheMatrix errors (RelVals and Igprof tests were also skipped)

arunhep commented May 14, 2018

After more careful checks on the global tags, it seems that we are consistently loading the correct L1T menu everywhere, and the HLT key is also correct.
The problem appears to come from the L1TDQM sequence for EGamma.
@fabozzi is testing whether step3 of the workflow runs without crashing when this DQM sequence is removed.

fabozzi commented May 15, 2018

Dear all, the issue is not in the L1TDQM sequence for EGamma, since I got the same crash even without it. We also spotted a suspicious message in the DIGI step related to the "L1REPACK:Full" sequence, so I have contacted the L1T experts to have a look at it (no reply yet).

@fabiocos

@fabozzi the reproducible crash happens in the HCAL DQM:

++++++++ starting: processing event for module: stream = 0 label = 'digiPhase1Task' id = 887

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Thread 3 (Thread 0x7f6113acf700 (LWP 6940)):
#0 0x0000003989a0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f61790712cc in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>) at /mnt/build/davidlt/gcc630/b/BUILD/slc6_amd64_gcc630/external/gcc/6.3.0/gcc-tags_gcc_6_3_0_release-243837/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:864
#2 std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#3 0x00007f6113fe9f2d in Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_2_X_2018-05-15-2300/external/slc6_amd64_gcc630/lib/libtensorflow_framework.so
#4 0x00007f6113fea999 in Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_2_X_2018-05-15-2300/external/slc6_amd64_gcc630/lib/libtensorflow_framework.so
#5 0x00007f6113fe8507 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_2_X_2018-05-15-2300/external/slc6_amd64_gcc630/lib/libtensorflow_framework.so
#6 0x00007f6179076c2f in std::execute_native_thread_routine (__p=0x7f611eb9e850) at ../../../../../libstdc++-v3/src/c++11/thread.cc:83
#7 0x0000003989a07aa1 in start_thread () from /lib64/libpthread.so.0
#8 0x00000039896e8bcd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f61507a3700 (LWP 6793)):
#0 0x0000003989a0f37d in waitpid () from /lib64/libpthread.so.0
#1 0x00007f61602c25f7 in edm::service::cmssw_stacktrace_fork() () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_2_X_2018-05-15-2300/lib/slc6_amd64_gcc630/pluginFWCoreServicesPlugins.so
#2 0x00007f61602c3255 in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_2_X_2018-05-15-2300/lib/slc6_amd64_gcc630/pluginFWCoreServicesPlugins.so
#3 0x00007f6179076c2f in std::execute_native_thread_routine (__p=0x7f616432eb80) at ../../../../../libstdc++-v3/src/c++11/thread.cc:83
#4 0x0000003989a07aa1 in start_thread () from /lib64/libpthread.so.0
#5 0x00000039896e8bcd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f6178d273c0 (LWP 6744)):
#0 0x00000039896df383 in poll () from /lib64/libc.so.6
#1 0x00007f61602c2a34 in full_read.constprop () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_2_X_2018-05-15-2300/lib/slc6_amd64_gcc630/pluginFWCoreServicesPlugins.so
#2 0x00007f61602c332a in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_2_X_2018-05-15-2300/lib/slc6_amd64_gcc630/pluginFWCoreServicesPlugins.so
#3 0x00007f61602c4426 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_2_X_2018-05-15-2300/lib/slc6_amd64_gcc630/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00007f615e9177e0 in MonitorElement::Fill(double) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_2_X_2018-05-15-2300/lib/slc6_amd64_gcc630/libDQMServicesCore.so
#6 0x00007f6122a6163b in hcaldqm::Container1D::fill(HcalDetId const&, double) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_2_X_2018-05-15-2300/lib/slc6_amd64_gcc630/libDQMHcalCommon.so
#7 0x00007f6122c302cd in DigiPhase1Task::_process(edm::Event const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_2_X_2018-05-15-2300/lib/slc6_amd64_gcc630/pluginDQMHcalTasksAuto.so
#8 0x00007f6122a6e7f7 in hcaldqm::DQTask::analyze(edm::Event const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_2_X_2018-05-15-2300/lib/slc6_amd64_gcc630/libDQMHcalCommon.so
#9 0x00007f617ad32631 in edm::one::EDProducerBase::doEvent(edm::EventPrincipal const&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc630/cms/cmssw/CMSSW_10_2_X_2018-05-15-2300/lib/slc6_amd64_gcc630/libFWCoreFramework.so

The L1T message should be understood; it comes from the master sequences:

https://github.com/cms-sw/cmssw/blob/master/Configuration/StandardSequences/python/SimL1EmulatorRepack_Full_cff.py#L13

@thomreis could you please clarify whether the message should be updated, or the code should be fixed?

In any case, it is useful to check the other workflows as well, to see whether there are other issues.

fabozzi commented May 17, 2018

please test with #23221

cmsbuild commented May 17, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/28005/console Started: 2018/05/17 12:07

@cmsbuild

Comparison job queued.

@cmsbuild

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-23196/28005/summary.html

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /build/cmsbld/jenkins/workspace/compare-root-files-short-matrix/results/JR-comparison/PR-23196/136.85_RunEGamma2018A+RunEGamma2018A+HLTDR2_2018+RECODR2_2018reHLT_skimEGamma_Prompt_L1TEgDQM+HARVEST2018_L1TEgDQM

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 30
  • DQMHistoTests: Total histograms compared: 2740553
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2740369
  • DQMHistoTests: Total skipped: 183
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (29 files compared)
  • Checked 124 log files, 14 edm output root files, 30 DQM output files
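
The DQMHistoTests counts above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming that failures, nulls, successes and skipped histograms together account for every compared histogram; the dictionary keys are illustrative names, not fields of the comparison tool.

```python
# Counts copied from the comparison summary; key names are illustrative.
summary = {
    "compared": 2740553,
    "failures": 1,
    "nulls": 0,
    "successes": 2740369,
    "skipped": 183,
    "missing": 0,
}

# Assumption: these four categories partition the compared histograms.
accounted = (
    summary["failures"]
    + summary["nulls"]
    + summary["successes"]
    + summary["skipped"]
)
consistent = accounted == summary["compared"]

# Fraction of compared histograms that failed the test.
failure_fraction = summary["failures"] / summary["compared"]
print(consistent, f"{failure_fraction:.2e}")
```

With these numbers the categories do sum to the total, so the single failure is the only histogram that differs among those that were actually tested.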

@kpedro88

+1

@fabiocos

@fabozzi @prebello it looks like this can move forward as soon as the HCAL DQM fix #23221 is merged

fabozzi commented May 18, 2018

@fabiocos OK, thanks!

fabozzi commented May 18, 2018

+1

@cmsbuild

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos

+1

This PR adds one 2018 test workflow to the short matrix by default; we need to discuss whether some further review of the short matrix is useful.

@cmsbuild cmsbuild merged commit 95ec6b8 into cms-sw:master May 18, 2018