add hltTracksMerged monitoring (both DQM and Validation) #19594
Conversation
add HLTSiPixelMonitoring_cff.py as well
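For readers unfamiliar with these _cff fragments: such a monitoring configuration typically clones an existing DQM track-monitoring analyzer onto the HLT track collection and books its histograms under an HLT folder. The sketch below only illustrates that pattern and is not the actual content of this PR; the TrackMon base module, the hltMergedTracks and hltOnlineBeamSpot labels, and the folder name are assumptions.

```python
# Illustrative sketch only (assumed names), not the configuration added by this PR:
# a typical HLT track-monitoring _cff clones the offline TrackingMonitor onto the
# HLT merged-track collection and books its histograms under an HLT DQM folder.
import FWCore.ParameterSet.Config as cms
from DQM.TrackingMonitor.TrackingMonitor_cfi import TrackMon

hltMergedTracksMonitor = TrackMon.clone(
    TrackProducer = cms.InputTag("hltMergedTracks"),    # assumed HLT track collection
    beamSpot      = cms.InputTag("hltOnlineBeamSpot"),  # assumed online beam spot
    FolderName    = cms.string("HLT/Tracking/hltMergedTracks")
)

hltTracksMonitoringSequence = cms.Sequence(hltMergedTracksMonitor)
```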
@cmsbuild, please test
The tests are being triggered in jenkins.
A new Pull Request was created by @mtosi (mia tosi) for CMSSW_9_2_X. It involves the following packages: DQM/HLTEvF. @cmsbuild, @vazzolini, @kmaeshima, @dmitrijus, @Martin-Grunewald, @silviodonato, @fwyzard, @vanbesien, @davidlange6 can you please review it and eventually sign? Thanks. cms-bot commands are listed here
+1 The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
Comparison job queued.
Comparison is ready. Comparison Summary:
+1
+1
This pull request is fully signed and it will be integrated in one of the next CMSSW_9_2_X IBs (tests are also fine) once validation in the development release cycle CMSSW_9_3_X is complete. This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @smuzaffar
-1 Tested at: abd1f59. The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic: You can see the results of the tests here: I found the following errors while testing this PR. Failed tests: AddOn
I found errors in the following addon tests: The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
Comparison job queued.
@smuzaffar Could you please have a look at the reporting of the above addOnTests error? The above message is not very helpful in indicating which of the addOnTests has failed; this used to be indicated in the past.
@Martin-Grunewald, looking at https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-19594/21600/addOnTests.log, I see that it is not complete, which means the addOnTests timed out after 1.5 hours. This test ran on a large machine (24 cores and 48 GB RAM), so I guess there was some slowness in reading/accessing data.
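For context, the timeout described here works like a wall-clock budget enforced from outside the test suite: the runner starts addOnTests.py as a child process and kills it when the budget runs out, which is why the log simply stops instead of ending with a summary. A generic sketch of that pattern, assuming the 1.5 h budget and the addOnTests.py -j option mentioned in this thread; this is not the actual cms-bot/Jenkins implementation.

```python
# Generic illustration of an external wall-clock limit on the add-on tests
# (not the actual cms-bot/Jenkins code; 1.5 h and "-j N" are taken from this thread).
import subprocess

TIMEOUT_SECONDS = int(1.5 * 3600)  # 1.5-hour budget
N_JOBS = 24                        # one job per core on the 24-core test machine

try:
    result = subprocess.run(["addOnTests.py", "-j", str(N_JOBS)], timeout=TIMEOUT_SECONDS)
    print("addOnTests finished with exit code", result.returncode)
except subprocess.TimeoutExpired:
    # The child process is killed here, so addOnTests.log ends abruptly,
    # exactly the "not complete" log observed above.
    print("addOnTests killed after", TIMEOUT_SECONDS, "seconds")
```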
@smuzaffar Since this seems to happen more often lately, could you please perhaps increase the timeout to 2 h?
1.5 h x 24 CPUs is already too much for PR tests.
It's odd, as all the timestamps I see are within 15 minutes. There is this crash - is it then hanging? Perhaps the add-on test scripts need better exit-code and processing-time reporting?
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-19594/21600/addOnTests/hlt_data_PRef/cmsDriver.py_RelVal_-s_HLT:PRef,RAW2DIGI,L1Reco,RECO_--data_--scenario=pp_-n_10_--conditions_auto:run2_data_PRef_--relval_9000,50_--datatier_RAW-HLT-.log
The full machine for one PR - or shared with other PR tests?
I do not buy that stack trace as a run-time error, as I cannot reproduce it offline. I rather think it is the stack trace due to the external signal cutting the job when the time goes above 1.5 h.
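To illustrate this point: when a harness kills a job on timeout, the job can still print a stack trace on its way out, so the log looks like a crash even though nothing failed inside the job itself. A generic Python sketch of that mechanism follows; this is not cmsRun's actual signal handling, and the signal and handler names are assumptions.

```python
# Generic sketch: a long-running job that dumps a stack trace when the test
# harness sends it a termination signal. Not cmsRun's actual signal handling.
import signal
import sys
import time
import traceback

def dump_stack_and_exit(signum, frame):
    # This is what ends up in the log: a stack trace caused by the external
    # signal, not by a genuine run-time error in the job.
    print("received signal %d, dumping stack:" % signum, file=sys.stderr)
    traceback.print_stack(frame, file=sys.stderr)
    sys.exit(128 + signum)

signal.signal(signal.SIGTERM, dump_stack_and_exit)

while True:        # stand-in for a long-running test step
    time.sleep(1)
```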
1 machine, 1.5 hours, 24 cores for the add-on tests of one PR. Then of course there are the other tests run for each PR, but those are separate and come on top of the 36 core-hours here.
Hmm, OK, but are all CPUs fully used? I.e., how many addOnTests are done in parallel? It can be at most
@Martin-Grunewald, when a PR test starts it occupies the full machine. We run N parallel jobs, where N is the number of cores available on that machine.
It might be; indeed this job ran for >1 hour on 4 cores, which seems like the problem.
Hi @smuzaffar, the addOnTests seem to have their own choice of ncores=4. You might want to reduce the N parallel jobs to N/4 if you can. (It's not the issue causing the problems here, however.)
@davidlange6, if you are referring to https://github.com/cms-sw/cmssw/blob/master/Utilities/ReleaseScripts/scripts/addOnTests.py#L265, then do not worry about that, as we run addOnTests.py -j N and I do not think each addOn job runs in threaded mode, so we should be fine running N jobs.
From looking through the logs, most are running 4 threads (maybe all).
@davidlange6, I see that now. I do not think all release cycles use 4 threads for addOn tests, so just blindly running N/4 processes will under-utilize the resources.
True, yes. Anyway, it's not really a problem here, I think.
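The exchange above reduces to simple scheduling arithmetic: launching one add-on test per core while some tests themselves use 4 threads oversubscribes the machine by up to a factor of 4, while a blanket N/4 jobs leaves cores idle whenever tests are single-threaded. A toy calculation follows; only the 24 cores and the 4 threads per test come from this thread, the rest is illustrative.

```python
# Toy illustration of the jobs-vs-threads trade-off discussed above.
def concurrent_threads(parallel_jobs, threads_per_job):
    """Number of threads competing for CPUs while the tests run."""
    return parallel_jobs * threads_per_job

cores = 24  # the 24-core PR-test machine

# Current behaviour: one job per core, 4 threads each -> 96 threads on 24 cores (4x oversubscribed).
print(concurrent_threads(parallel_jobs=cores, threads_per_job=4))

# Suggested N/4: matches the machine when every test uses 4 threads ...
print(concurrent_threads(parallel_jobs=cores // 4, threads_per_job=4))   # 24 threads

# ... but leaves most cores idle when tests are single-threaded, which is why
# a blanket N/4 was considered wasteful above.
print(concurrent_threads(parallel_jobs=cores // 4, threads_per_job=1))   # 6 threads
```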
Comparison is ready. Comparison Summary:
merge
Thanks for merging this! One question: when do we plan to have a new 9_2_X release? Without this PR, we are not able to monitor the HLT tracking in the data we are taking right now.
back-port of #19591