Add DQM FakeBeamMonitor plugin and clients - backport to 11_1_X #30696

Merged

francescobrivio
Contributor

PR description:

This PR is the backport of #30690 to CMSSW_11_1_X

@cmsbuild
Contributor

A new Pull Request was created by @francescobrivio for CMSSW_11_1_X.

It involves the following packages:

DQM/BeamMonitor
DQM/Integration

@andrius-k, @kmaeshima, @schneiml, @cmsbuild, @jfernan2, @fioriNTU can you please review it and eventually sign? Thanks.
@threus, @batinkov, @battibass this is something you requested to watch as well.
@silviodonato, @dpiparo you are the release manager for this.

cms-bot commands are listed here

@andrius-k

please test

@cmsbuild
Contributor

cmsbuild commented Jul 15, 2020

The tests are being triggered in jenkins.

@andrius-k

Hi @francescobrivio
We're testing this PR in the DQM playback system atm and it looks fine. Would you like us to run the newly introduced client?

@francescobrivio
Contributor Author

> Hi @francescobrivio
> We're testing this PR in the DQM playback system atm and it looks fine. Would you like us to run the newly introduced client?

Hi @andrius-k
I think the idea was to test them during this MWGR.
Simone @gennai is coordinating this test and I think he wanted to try this tomorrow or Friday.
So maybe he can comment better.

@gennai
Contributor

gennai commented Jul 15, 2020

So this is a fake client which produces a random beamspot value to test the population of the DB (this will be the new workflow for Run3). In normal conditions we do not need it, but for this test we cannot use the standard client, since with cosmic data it will always give 0,0,0 as the BS position. So we would like to use this fake module ONLY during the test we plan to perform tomorrow or Friday, depending on Run Coordination.

@andrius-k
Copy link

Ok, I understand. Just keep in mind that if you want to verify there are no configuration or any other unexpected errors we can run this client in the playback system for a few minutes.

@cmsbuild
Contributor

+1
Tested at: 78157fb
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8b1468/7960/summary.html
CMSSW: CMSSW_11_1_X_2020-07-14-2300
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Contributor

Comparison job queued.

@andrius-k

Hi @gennai, I tried running the new client in the DQM playback system and it crashes with the following error in the configuration:

> /cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0715_CMSSW_11_1_0_patch2_30678_30696/src/DQM/Integration/python/clients/beamfake_dqm_sourceclient-live_cfg.py(388)<module>()
-> if (not process.runType.getRunType() == process.runType.hi_run):
(Pdb) 
----- Begin Fatal Exception 15-Jul-2020 13:13:21 CEST-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named /cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0715_CMSSW_11_1_0_patch2_30678_30696/src/DQM/Integration/python/clients/beamfake_dqm_sourceclient-live_cfg.py
Exception Message:
 unknown python problem occurred.
BdbQuit: 

At:
  /opt/offline/slc7_amd64_gcc820/external/python/2.7.15-bcolbf2/lib/python2.7/bdb.py(68): dispatch_line
  /opt/offline/slc7_amd64_gcc820/external/python/2.7.15-bcolbf2/lib/python2.7/bdb.py(49): trace_dispatch
  /cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0715_CMSSW_11_1_0_patch2_30678_30696/src/DQM/Integration/python/clients/beamfake_dqm_sourceclient-live_cfg.py(388): <module>

----- End Fatal Exception -------------------------------------------------

Could you please confirm that this crash won't happen during the upcoming MWGR?
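A note on reading the log above: a `(Pdb)` prompt followed by `BdbQuit` raised from `bdb.py` `dispatch_line` is what Python produces when a debugger breakpoint is reached and the debugger then quits, for example because no interactive input is available. Whether a leftover breakpoint in the config is the actual cause here is an assumption and is not confirmed anywhere in this thread. A minimal standalone sketch of that mechanism (the function names are hypothetical, and an empty input stream stands in for a job with no terminal):

```python
import bdb
import io
import pdb


def config_with_leftover_breakpoint():
    """Hypothetical stand-in for a config file that still contains a breakpoint."""
    # An empty stdin models a batch job: nobody can type debugger commands.
    debugger = pdb.Pdb(stdin=io.StringIO(""), stdout=io.StringIO(), readrc=False)
    debugger.set_trace()
    # The debugger stops here, reads end-of-file instead of a command, and quits.
    run_type = "cosmic_run"
    return run_type


def breakpoint_aborts():
    """Return True if running the 'config' is aborted by bdb.BdbQuit."""
    try:
        config_with_leftover_breakpoint()
    except bdb.BdbQuit:
        return True
    return False


print("aborted with BdbQuit:", breakpoint_aborts())
```

If this is what happened, checking the config for a stray `pdb.set_trace()` near line 388 would be a cheap first step before trying to reproduce the crash.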

@gennai
Contributor

gennai commented Jul 15, 2020

@andrius-k it depends on the reason for the crash :-) Did you set noDB=True while running? Another reason could be a missing last_lumi.txt file. @francescobrivio can you try to reproduce the crash?

@francescobrivio
Contributor Author

I've not seen this crash before, but I can try to reproduce it. @andrius-k what was the command that you used to launch the client?

@andrius-k

@francescobrivio the command used to launch the client was this (from the log):
-- starting process: ['cmsRun', '/cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0715_CMSSW_11_1_0_patch2_30678_30696/src/DQM/Integration/python/clients/beamfake_dqm_sourceclient-live_cfg.py', 'runInputDir=/fff/BU0/ramdisk', 'runNumber=500462', 'runkey=cosmic_run'] --

Production system launches DQM Online clients in the same way.
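For anyone reproducing this outside the playback system, the launch line above splits into the executable, the config path, and three `key=value` options. A minimal sketch in plain Python of how those options break down; `parse_client_args` is a hypothetical helper for illustration only, not part of CMSSW or `cmsRun`, and the real client configs do their own option parsing:

```python
def parse_client_args(tokens):
    """Split 'key=value' tokens into a dict; reject anything without a key and '='."""
    options = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError("expected key=value, got %r" % token)
        options[key] = value
    return options


# The launch line exactly as quoted from the playback log.
launch = [
    "cmsRun",
    "/cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0715_CMSSW_11_1_0_patch2_30678_30696"
    "/src/DQM/Integration/python/clients/beamfake_dqm_sourceclient-live_cfg.py",
    "runInputDir=/fff/BU0/ramdisk",
    "runNumber=500462",
    "runkey=cosmic_run",
]

executable, config_path = launch[0], launch[1]
options = parse_client_args(launch[2:])

for name in sorted(options):
    print(name, "=", options[name])
```

Note that the launch line carries no option such as `noDB`, so whatever default the config defines would apply in playback.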

@gennai Keep in mind that our playback and production systems are very similar, so a crash in playback is a strong suggestion of a crash in prod.

@gennai
Contributor

gennai commented Jul 15, 2020

@andrius-k point taken. Consider that the test we would like to make will need some real-time interaction: it may require copying a db.key file locally to where the job is running in order to access the database, and we must make sure that the URL from which the LS number is taken is working. We will need to coordinate a bit and maybe set up a Zoom meeting. The test could last just 10-20 mins if everything works.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8b1468/7960/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 36
  • DQMHistoTests: Total histograms compared: 2780792
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2780741
  • DQMHistoTests: Total skipped: 50
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 35 files compared)
  • Checked 152 log files, 16 edm output root files, 36 DQM output files
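As a quick consistency check on the DQMHistoTests numbers above, the four outcome counts can be summed and compared with the total. This sketch assumes the outcomes (failures, nulls, successes, skipped) are meant to partition the compared histograms, which the bot output does not state explicitly; the dictionary keys are illustrative names, not cms-bot fields:

```python
# Counts copied from the comparison summary.
summary = {
    "total_compared": 2780792,
    "failures": 1,
    "nulls": 0,
    "successes": 2780741,
    "skipped": 50,
}


def unaccounted(counts):
    """Histograms in the total that are not covered by any outcome category."""
    outcomes = (
        counts["failures"]
        + counts["nulls"]
        + counts["successes"]
        + counts["skipped"]
    )
    return counts["total_compared"] - outcomes


print("unaccounted histograms:", unaccounted(summary))
print("failure fraction: %.2e" % (summary["failures"] / summary["total_compared"]))
```

A non-zero result from `unaccounted` would suggest a category missing from the summary rather than a problem with the PR itself.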

@andrius-k

@gennai cool, reach out to DQM DOC (me) whenever you're ready to make a test and we can set up a meeting!

@cmsbuild
Contributor

cmsbuild commented Jul 28, 2020

The tests are being triggered in jenkins.

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_11_1_X IBs after it passes the integration tests and once validation in the development release cycle CMSSW_11_2_X is complete. This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@silviodonato
Contributor

hold
waiting for the fix in 11_2_X

@cmsbuild
Contributor

Pull request has been put on hold by @silviodonato
They need to issue an unhold command to remove the hold state or L1 can unhold it for all

@cmsbuild
Contributor

+1
Tested at: 47b48bd
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8b1468/8354/summary.html
CMSSW: CMSSW_11_1_X_2020-07-28-1100
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8b1468/8354/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 36
  • DQMHistoTests: Total histograms compared: 2780792
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2780741
  • DQMHistoTests: Total skipped: 50
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 35 files compared)
  • Checked 152 log files, 16 edm output root files, 36 DQM output files

@jfernan2
Contributor

+1

@francescobrivio
Contributor Author

> hold
> waiting for the fix in 11_2_X

Hi @silviodonato,
the fix in 11_2_X has been merged (#30943) and DQM has approved.
Is there anything else to do to unhold and merge this PR?

@silviodonato
Contributor

unhold
@francescobrivio thanks for the reminder

@cmsbuild cmsbuild removed the hold label Aug 3, 2020
@cmsbuild
Contributor

cmsbuild commented Aug 3, 2020

This pull request is fully signed and it will be integrated in one of the next CMSSW_11_1_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_11_2_X is complete. This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@silviodonato
Contributor

+1

@cmsbuild cmsbuild merged commit 93a134c into cms-sw:CMSSW_11_1_X Aug 3, 2020