BeamSpotOnline updates - backport to 11_2_X #32415

francescobrivio

PR description:

This is the backport of #32408 to CMSSW_11_2_X

@cmsbuild

cmsbuild commented Dec 8, 2020

A new Pull Request was created by @francescobrivio for CMSSW_11_2_X.

It involves the following packages:

DQM/BeamMonitor
DQM/Integration

@andrius-k, @kmaeshima, @ErnestaP, @cmsbuild, @jfernan2, @fioriNTU can you please review it and eventually sign? Thanks.
@threus, @batinkov, @battibass this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@jfernan2

jfernan2 commented Dec 8, 2020

please test

@cmsbuild

cmsbuild commented Dec 8, 2020

+1
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9fe196/11420/summary.html
CMSSW: CMSSW_11_2_X_2020-12-08-1100
SCRAM_ARCH: slc7_amd64_gcc900

@silviodonato

backport of #32408

@cmsbuild

cmsbuild commented Dec 8, 2020

Comparison results are now available
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9fe196/11420/summary.html
CMSSW: CMSSW_11_2_X_2020-12-08-1100
SCRAM_ARCH: slc7_amd64_gcc900

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 2529593
  • DQMHistoTests: Total failures: 12
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2529558
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 34 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 148 log files, 37 edm output root files, 35 DQM output files

@jfernan2

@francescobrivio we (@ErnestaP) have tried this PR at DQM Online P5 playback and we get several crashes. Details are:
Run: 338628
Release: CMSSW_11_2_0_pre11
PRs on top: 31684,32334,32415
GTs: 111X_dataRun3_Express_v4 and 111X_dataRun3_HLT_v (no 112X GT yet afaik)

  1. beam: it crashed right at the beginning (1st LS); from the 2nd LS it worked for a while, then crashed again:

----- Begin Fatal Exception 14-Dec-2020 11:19:43 CET-----------------------
An exception of category 'ConditionDatabase' occurred while
[0] Processing global end LuminosityBlock run: 338628 luminosityBlock: 1
[1] Calling method for module BeamMonitor/'dqmBeamMonitor'
Exception Message:
Required Key File "/tmp/.cms_cond/db.key" is missing or unreadable. from DecodingKey::init
----- End Fatal Exception -------------------------------------------------

  2. Beamhltfake: the same behaviour, the 1st LS crashed, from the 2nd LS it worked for a while, then crashed again:

----- Begin Fatal Exception 14-Dec-2020 11:29:16 CET-----------------------
An exception of category 'ConditionDatabase' occurred while
[0] Processing global end LuminosityBlock run: 338628 luminosityBlock: 1
[1] Calling method for module FakeBeamMonitor/'dqmBeamMonitor'
Exception Message:
Failure while saving log on database:Could not insert a new row in the table ( CORAL : "ITableDataEditor::insertRow" from "CORAL/RelationalPlugins/oracle" ) from Logger::saveOnDb
----- End Fatal Exception -------------------------------------------------

  3. BeamFake crashed:

----- Begin Fatal Exception 14-Dec-2020 11:33:20 CET-----------------------
An exception of category 'ConditionDatabase' occurred while
[0] Processing global end LuminosityBlock run: 338628 luminosityBlock: 12
[1] Calling method for module FakeBeamMonitor/'dqmFakeBeamMonitor'
Exception Message:
Failure while saving log on database:Could not insert a new row in the table ( CORAL : "ITableDataEditor::insertRow" from "CORAL/RelationalPlugins/oracle" ) from Logger::saveOnDb
----- End Fatal Exception -------------------------------------------------
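
The first exception reports that the key file is "missing or unreadable". As a rough illustration only, the kind of pre-flight check that separates those cases could look like the sketch below. The function name `key_file_problem` and its arguments are hypothetical and are not part of CMSSW or CORAL; the sketch only inspects the file system and does not reproduce what `DecodingKey::init` does internally.

```python
import os


def key_file_problem(auth_dir, key_name="db.key"):
    """Describe why the key file cannot be used, or return None if it is readable.

    `auth_dir` is the directory expected to hold the key file; both the
    function and its parameters are illustrative assumptions.
    """
    key_path = os.path.join(auth_dir, key_name)
    if not os.path.isdir(auth_dir):
        # The directory itself is absent (the "non-existing folder" case).
        return f"directory {auth_dir} does not exist"
    if not os.path.isfile(key_path):
        return f"key file {key_path} is missing"
    if not os.access(key_path, os.R_OK):
        return f"key file {key_path} is not readable"
    return None
```

Calling `key_file_problem("/tmp/.cms_cond")` on the playback machine would, under these assumptions, tell a wrong directory apart from a missing or unreadable file.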

@jfernan2

@francescobrivio can you please test on your side? At least the first error above is known from the past: it tries to write to a non-existent folder. It was fixed a long time ago, but it seems to be back again.

@francescobrivio

> @francescobrivio can you please test on your side? At least the first error above is known from the past: it tries to write to a non-existent folder. It was fixed a long time ago, but it seems to be back again.

Errors 2 and 3 should be due to the fact that the new names of the tags and of the jobName (after the fix for running on the playback system) are not registered in the DB. I will get in touch with the DB experts later in the day and let you know.

Error 1 occurs because the db.key file is not accessible. The authentication paths in the DBParameters of the GT module and of the PoolDBSource module conflict; I will try to debug this as well.
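
To make the suspected conflict concrete, here is a minimal sketch that groups module labels by their authenticationPath and flags a mismatch. It models each module's parameters as plain Python dicts rather than real CMSSW `cms.PSet` objects, and the module labels and path values are hypothetical placeholders, not the ones used in this PR.

```python
def authentication_paths(modules):
    """Map each distinct authenticationPath to the module labels that use it.

    `modules` is a dict of label -> parameter dict; a missing DBParameters
    block or missing authenticationPath is treated as the empty string.
    """
    by_path = {}
    for label, params in modules.items():
        path = params.get("DBParameters", {}).get("authenticationPath", "")
        by_path.setdefault(path, []).append(label)
    return by_path


def has_conflict(modules):
    """True when the modules do not all share one authenticationPath."""
    return len(authentication_paths(modules)) > 1


# Hypothetical stand-ins for two modules with differing settings.
example = {
    "moduleA": {"DBParameters": {"authenticationPath": "/path/a"}},
    "moduleB": {"DBParameters": {"authenticationPath": "/path/b"}},
}
```

With `example` above, `has_conflict(example)` is true and `authentication_paths(example)` shows which label points at which directory, which is the information needed to align the settings.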

@jfernan2

backport of #32408

@cmsbuild

Pull request #32415 was updated. @andrius-k, @kmaeshima, @ErnestaP, @cmsbuild, @jfernan2, @fioriNTU can you please check and sign again.

@francescobrivio

@jfernan2 @ErnestaP commit 8e3ca49 should fix error 1. Sorry, I had missed one authenticationPath, which was producing the wrong path (/tmp/.cms_cond/db.key).

For errors 2 and 3, I got in touch with the DB experts; I will let you know as soon as the new tags are added to condDB and the test can be repeated.

@francescobrivio

@jfernan2 @ErnestaP the new records have been uploaded to condDB by the DB experts, so errors 2 and 3 should be fixed.
Could you repeat the tests at DQM Online P5 playback?

@jfernan2

Thanks, @francescobrivio.
@ErnestaP tested it at P5 successfully.
I will run the tests here for completeness, but I believe it is fine.

@jfernan2

please test

@jfernan2

@francescobrivio I don't understand why the unit test for beam_dqm is failing now, since the forward-port PR ran fine and it is identical.
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9fe196/11684/unitTests/src/DQM/Integration/test/TestDQMOnlineClient-beam_dqm_sourceclient/testing.log

Do you have any clue?

@francescobrivio

> @francescobrivio I don't understand why the unit test for beam_dqm is failing now, since the forward-port PR ran fine and it is identical.
> https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9fe196/11684/unitTests/src/DQM/Integration/test/TestDQMOnlineClient-beam_dqm_sourceclient/testing.log
>
> Do you have any clue?

It's still the same issue reported in #31896: the test fails intermittently. As far as I understand, it is related to the Coral message logger. @ggovi has just implemented a fix; we tested it offline and it seems to work. He will create a PR asap.

@jfernan2

OK, so I can try to relaunch the test and see if it passes this time

@cmsbuild

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9fe196/11684/summary.html
CMSSW: CMSSW_11_2_X_2020-12-15-1100/slc7_amd64_gcc900

Unit Tests

I found errors in the following unit tests:

---> test TestDQMOnlineClient-beam_dqm_sourceclient had ERRORS

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 2529593
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2529570
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 34 files compared)
  • Checked 148 log files, 37 edm output root files, 35 DQM output files

@jfernan2

please test
Let's try again

@cmsbuild

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9fe196/11705/summary.html
CMSSW: CMSSW_11_2_X_2020-12-15-1100/slc7_amd64_gcc900

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 2529593
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2529564
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 34 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 148 log files, 37 edm output root files, 35 DQM output files

@jfernan2

+1

@cmsbuild

This pull request is fully signed and it will be integrated in one of the next CMSSW_11_2_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_11_3_X is complete. This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@qliphy

qliphy commented Dec 16, 2020

+1

@cmsbuild cmsbuild merged commit 51383fd into cms-sw:CMSSW_11_2_X Dec 16, 2020