add hltTracksMerged monitoring (both DQM and Validation) #19594
Conversation
add HLTSiPixelMonitoring_cff.py as well
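For readers unfamiliar with these _cff fragments: such a monitoring configuration typically clones an existing DQM track-monitoring analyzer onto the HLT track collection and books its histograms under an HLT folder. The sketch below only illustrates that pattern and is not the actual content of this PR; the TrackMon base module, the hltMergedTracks and hltOnlineBeamSpot labels, and the folder name are assumptions.

```python
# Illustrative sketch only (assumed names), not the configuration added by this PR:
# a typical HLT track-monitoring _cff clones the offline TrackingMonitor onto the
# HLT merged-track collection and books its histograms under an HLT DQM folder.
import FWCore.ParameterSet.Config as cms
from DQM.TrackingMonitor.TrackingMonitor_cfi import TrackMon

hltMergedTracksMonitor = TrackMon.clone(
    TrackProducer = cms.InputTag("hltMergedTracks"),    # assumed HLT track collection
    beamSpot      = cms.InputTag("hltOnlineBeamSpot"),  # assumed online beam spot
    FolderName    = cms.string("HLT/Tracking/hltMergedTracks")
)

hltTracksMonitoringSequence = cms.Sequence(hltMergedTracksMonitor)
```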
@cmsbuild, please test
The tests are being triggered in jenkins.
A new Pull Request was created by @mtosi (mia tosi) for CMSSW_9_2_X. It involves the following packages: DQM/HLTEvF. @cmsbuild, @vazzolini, @kmaeshima, @dmitrijus, @Martin-Grunewald, @silviodonato, @fwyzard, @vanbesien, @davidlange6 can you please review it and eventually sign? Thanks. cms-bot commands are listed here
+1 The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
Comparison job queued.
Comparison is ready. Comparison Summary:
+1
+1
This pull request is fully signed and it will be integrated in one of the next CMSSW_9_2_X IBs (tests are also fine) once validation in the development release cycle CMSSW_9_3_X is complete. This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @smuzaffar
-1 Tested at: abd1f59. The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic: You can see the results of the tests here: I found the following errors while testing this PR. Failed tests: AddOn
I found errors in the following addon tests: The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
Comparison job queued.
@smuzaffar Could you please have a look at the reporting of the above addOnTests error? The above message is not very helpful in indicating which of the addOnTests has failed; this used to be indicated in the past.
@Martin-Grunewald, looking at https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-19594/21600/addOnTests.log, I see that it is not complete, which means the addOnTests timed out after 1.5 hours. This test ran on a large machine (24 cores and 48 GB RAM), so I guess there was some slowness in reading/accessing data.
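For context, the timeout described here works like a wall-clock budget enforced from outside the test suite: the runner starts addOnTests.py as a child process and kills it when the budget runs out, which is why the log simply stops instead of ending with a summary. A generic sketch of that pattern, assuming the 1.5 h budget and the addOnTests.py -j option mentioned in this thread; this is not the actual cms-bot/Jenkins implementation.

```python
# Generic illustration of an external wall-clock limit on the add-on tests
# (not the actual cms-bot/Jenkins code; 1.5 h and "-j N" are taken from this thread).
import subprocess

TIMEOUT_SECONDS = int(1.5 * 3600)  # 1.5-hour budget
N_JOBS = 24                        # one job per core on the 24-core test machine

try:
    result = subprocess.run(["addOnTests.py", "-j", str(N_JOBS)], timeout=TIMEOUT_SECONDS)
    print("addOnTests finished with exit code", result.returncode)
except subprocess.TimeoutExpired:
    # The child process is killed here, so addOnTests.log ends abruptly,
    # exactly the "not complete" log observed above.
    print("addOnTests killed after", TIMEOUT_SECONDS, "seconds")
```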
@smuzaffar Since this seems to happen more often lately, could you please perhaps increase the timeout to 2 h?
1.5 h x 24 CPUs is already too much for PR tests.
It's odd, as all the timestamps I see are within 15 minutes. There is this crash - is it then hanging? Perhaps the add-on test scripts need better exit-code and processing-time reporting?
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-19594/21600/addOnTests/hlt_data_PRef/cmsDriver.py_RelVal_-s_HLT:PRef,RAW2DIGI,L1Reco,RECO_--data_--scenario=pp_-n_10_--conditions_auto:run2_data_PRef_--relval_9000,50_--datatier_RAW-HLT-.log
The full machine for one PR - or shared with other PR tests?
I do not buy that stack trace as a run-time error, as I cannot reproduce it offline. I rather think it is the stack trace due to the external signal cutting the job when the time goes above 1.5 h.
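To illustrate this point: when a harness kills a job on timeout, the job can still print a stack trace on its way out, so the log looks like a crash even though nothing failed inside the job itself. A generic Python sketch of that mechanism follows; this is not cmsRun's actual signal handling, and the signal and handler names are assumptions.

```python
# Generic sketch: a long-running job that dumps a stack trace when the test
# harness sends it a termination signal. Not cmsRun's actual signal handling.
import signal
import sys
import time
import traceback

def dump_stack_and_exit(signum, frame):
    # This is what ends up in the log: a stack trace caused by the external
    # signal, not by a genuine run-time error in the job.
    print("received signal %d, dumping stack:" % signum, file=sys.stderr)
    traceback.print_stack(frame, file=sys.stderr)
    sys.exit(128 + signum)

signal.signal(signal.SIGTERM, dump_stack_and_exit)

while True:        # stand-in for a long-running test step
    time.sleep(1)
```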
1 machine, 1.5 hours, 24 cores for the add-on tests of one PR. Then of course there are the other tests run for each PR, but those are separate and come on top of the 36 core-hours here.
Hmm, OK, but are all CPUs fully used? I.e., how many addOnTests are done in parallel? It can be at most
@Martin-Grunewald, when a PR test starts it occupies the full machine. We run N parallel jobs, where N is the number of cores available on that machine.
It might be; indeed this job ran for >1 hour on 4 cores, which seems like the problem.
Hi @smuzaffar, the addOnTests seem to have their own choice of ncores=4. You might want to reduce the N parallel jobs to N/4 if you can. (It's not the issue causing the problems here, however.)
@davidlange6, if you are referring to https://github.com/cms-sw/cmssw/blob/master/Utilities/ReleaseScripts/scripts/addOnTests.py#L265, then do not worry about that, as we run addOnTests.py -j N and I do not think each addOn job runs in threaded mode, so we should be fine running N jobs.
From looking through the logs, most are running 4 threads (maybe all).
@davidlange6, I see that now. I do not think all release cycles use 4 threads for addOn tests, so just blindly running N/4 processes will under-utilize the resources.
True, yes. Anyway, it's not really a problem here, I think.
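The exchange above reduces to simple scheduling arithmetic: launching one add-on test per core while some tests themselves use 4 threads oversubscribes the machine by up to a factor of 4, while a blanket N/4 jobs leaves cores idle whenever tests are single-threaded. A toy calculation follows; only the 24 cores and the 4 threads per test come from this thread, the rest is illustrative.

```python
# Toy illustration of the jobs-vs-threads trade-off discussed above.
def concurrent_threads(parallel_jobs, threads_per_job):
    """Number of threads competing for CPUs while the tests run."""
    return parallel_jobs * threads_per_job

cores = 24  # the 24-core PR-test machine

# Current behaviour: one job per core, 4 threads each -> 96 threads on 24 cores (4x oversubscribed).
print(concurrent_threads(parallel_jobs=cores, threads_per_job=4))

# Suggested N/4: matches the machine when every test uses 4 threads ...
print(concurrent_threads(parallel_jobs=cores // 4, threads_per_job=4))   # 24 threads

# ... but leaves most cores idle when tests are single-threaded, which is why
# a blanket N/4 was considered wasteful above.
print(concurrent_threads(parallel_jobs=cores // 4, threads_per_job=1))   # 6 threads
```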
Comparison is ready. Comparison Summary:
merge
Thanks for merging this! One question: when do we plan to have a new 9_2_X release? Without this PR, we are not able to monitor the HLT tracking in the data we are taking right now.
back-port of #19591