Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DQM: new DQMFileSaver #28588

Merged
merged 11 commits into from Jan 13, 2020
Merged

DQM: new DQMFileSaver #28588

merged 11 commits into from Jan 13, 2020

Conversation

schneiml
Copy link
Contributor

@schneiml schneiml commented Dec 9, 2019

PR description:

In offline, the legacy TDirectory files for the DQMGUI are written by the DQMFileSaver module. While reading the code and its configurations, it became apparent that this module is not actually used in online or HLT (any more?), even though most of the code is dedicated to these cases. (Online and HLT use DQMServices/FileIO now.)

So, I stripped down this module to prove the hypothesis.

In addition, this PR contains the new TDirectory writing code by @andrius-k, as a new independent class. It is still linked into the DQMStore and can be called as edm::Service<DQMStore>()->save(...). I also updated the Online output modules to use this helper (DQMFileSaverOnline) or contain the writing code (DQMFileSaverPB). Consequently, DQMStore itself does not contain file writing code any more. There is still a huge amount of file reading code, which is largely useless and to be removed next.

PR validation:

This PR needs testing in online DQM (not done yet) and ideally also in HLT (or a solid confirmation that the module is never loaded in HLT). The changes to the ProtoBuf writer need to be tested in the HLT/fastHadd/online DQM chain. None of this is done so far. I'll try to provide a test setup in another PR.

This PR needs testing in Tier0 Express processing, where the runIsComplete logic is used.

The main point of the PR is to test

  • That only a single mode of this module is used in offline, to be confirmed by the IB tests; (as far as it seems confirmed)
  • That the new TDirectory writing code works; (confirmed)
  • Which unit tests depend on behavior removed in this PR. (none, it seems)

[I suspect some tests to fail (assuming these tests are run in the PR tests, not sure about that), and some flaws in the TDirectory writing code (it does not support QTests yet, for example).] Edit: all fixed.

Using commit 70fc23c from andrius-k:dqm-new-dqmstore-on-CMSSW_11_0_0_pre5.
Includes changes to make the coe compile and run with the old DQMStore.
It looks like this was never actually used in online DQM or HLT in run2.
There are plenty of threading-related code paths, but DQMServices/FileIO
seems to have taken over at some point.
@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 9, 2019

The code-checks are being triggered in jenkins.

@schneiml
Copy link
Contributor Author

schneiml commented Dec 9, 2019

please test

I did remember to run code-format, this time!

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 9, 2019

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28588/13085

  • This PR adds an extra 24KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 9, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/3885/console Started: 2019/12/09 20:08

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 9, 2019

A new Pull Request was created by @schneiml (Marcel Schneider) for master.

It involves the following packages:

DQMServices/Components

@andrius-k, @kmaeshima, @schneiml, @cmsbuild, @jfernan2, @fioriNTU can you please review it and eventually sign? Thanks.
@barvic this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 9, 2019

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 9, 2019

Comparison job queued.

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 9, 2019

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-071e70/3885/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 34
  • DQMHistoTests: Total histograms compared: 2800084
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2799742
  • DQMHistoTests: Total skipped: 341
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 119367.177 KiB( 33 files compared)
  • DQMHistoSizes: changed ( 1000.0,... ): 524.656 KiB Tracking/TrackParameters
  • DQMHistoSizes: changed ( 1000.0,... ): -81.534 KiB RPC/AllHits
  • DQMHistoSizes: changed ( 1000.0,... ): -78.862 KiB RPC/Muon
  • DQMHistoSizes: changed ( 1000.0,... ): 72.000 KiB Muons/globalMuons
  • DQMHistoSizes: changed ( 1000.0,... ): 54.984 KiB OfflinePV/Resolution
  • DQMHistoSizes: changed ( 1000.0,... ): 42.240 KiB Btag/TagCorrelation_pfCombinedCvsBJetTags_vs_pfCombinedCvsLJetTags_GLOBAL
  • DQMHistoSizes: changed ( 1000.0,... ): 42.240 KiB Btag/TagCorrelation_pfCombinedCvsBJetTags_vs_pfCombinedCvsLJetTags_ETA_1v4-2v4
  • DQMHistoSizes: changed ( 1000.0,... ): 42.240 KiB Btag/TagCorrelation_pfCombinedCvsBJetTags_vs_pfCombinedCvsLJetTags_ETA_0-1v4
  • DQMHistoSizes: changed ( 1000.0,... ): 27.686 KiB Egamma/gedPhotonAnalyzer
  • DQMHistoSizes: changed ( 1000.0,... ): 27.686 KiB Egamma/stdPhotonAnalyzer
  • DQMHistoSizes: changed ( 1000.0 ): ...
  • Checked 147 log files, 16 edm output root files, 34 DQM output files

@schneiml
Copy link
Contributor Author

schneiml commented Dec 10, 2019

I am surprised that no tests failed, even though #28407 (comment) indicated that the old whiteRabbit DQM tests run in the PR tests.

The comparison looks much more messy than expected though, we'll need to fix a bunch of things it seems.

Edit: I see

  • Superfluous quotes on strings
  • Rounding differences with floats (not sure if we'll be able to resolve these)
  • The missing QTests (I assume these are the 1000's of missing items)
  • Edit²: actually the majority of missing items are efficiency flag markers.

I don't see what the DQMHistoSizes comparison is doing here though, I assume it is just confused by something. I don't see 100's of KB of new histograms in the bin-to-bin comparison.

@cmsbuild cmsbuild modified the milestones: CMSSW_11_1_X, CMSSW_11_0_X Dec 10, 2019
@cmsbuild
Copy link
Contributor

Pull request #28588 was updated. @andrius-k, @kmaeshima, @schneiml, @cmsbuild, @jfernan2, @fioriNTU can you please check and sign again.

@Sam-Harper
Copy link
Contributor

@Martin-Grunewald , thanks, I have alerted FOG and we can see if anybody is around to do a quick hilton test

@schneiml
Copy link
Contributor Author

@Sam-Harper if you get to it, can you provide a sample of the streamDQMHistograms output stream? This is what we need to be able to read on online DQM side, and while I can't reproduce this currently (the files seem to be missing a "streamer" header -- is that because I used the fakeFilterUnitMode or does some DAQ mechanism add that later?), looking at the files themselves would be useful...

@fabiocos
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 16, 2019

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/3998/console Started: 2019/12/16 18:14

@cmsbuild
Copy link
Contributor

+1
Tested at: b3259d5
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-071e70/3998/summary.html
CMSSW: CMSSW_11_1_X_2019-12-16-1100
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-071e70/3998/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 34
  • DQMHistoTests: Total histograms compared: 2798405
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2798063
  • DQMHistoTests: Total skipped: 341
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 33 files compared)
  • Checked 147 log files, 16 edm output root files, 34 DQM output files

@missirol
Copy link
Contributor

@Sam-Harper if you get to it, can you provide a sample of the streamDQMHistograms output stream? This is what we need to be able to read on online DQM side, and while I can't reproduce this currently (the files seem to be missing a "streamer" header -- is that because I used the fakeFilterUnitMode or does some DAQ mechanism add that later?), looking at the files themselves would be useful...

as suggested by @Sam-Harper , we are trying to test this PR on a Hilton machine (in my understanding, we would then be able to provide a sample of the streamDQMHistograms output stream), but at the moment we are having issues installing a recent CMSSW release there. I'm still trying, but given the limited availability of people this week, this might get delayed to the new year.

@schneiml
Copy link
Contributor Author

@missirol thanks for the update! I'll try to pull some information out of the DQM online playback system in parallel.

@schneiml
Copy link
Contributor Author

For the record: waiting for changes&validation at HLT.

@schneiml
Copy link
Contributor Author

schneiml commented Jan 6, 2020

@missirol ran some tests but there were unrelated problems. Still waiting for HLT validation.

@schneiml
Copy link
Contributor Author

The test by @missirol last week looks ok, so this is good to go. (Sorry, forgot to update this PR in time)

@schneiml
Copy link
Contributor Author

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @silviodonato, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@silviodonato
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit e3e4a3c into cms-sw:master Jan 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants