TestDQMOnlineClient-beamhlt_dqm_sourceclient silently fails in IBs #43108

Closed
mmusich opened this issue Oct 25, 2023 · 16 comments · Fixed by #45231

Comments

@mmusich
Contributor

mmusich commented Oct 25, 2023

Casually looking at the logs of the DQM/Integration unit tests, I noticed that the test TestDQMOnlineClient-beamhlt_dqm_sourceclient fails silently:

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc12/CMSSW_13_3_X_2023-10-24-2300/unitTestLogs/DQM/Integration#/2700-2700

the exception message is:

25-Oct-2023 03:52:39 CEST  Can't deserialize event or registry data: An exception of category 'FatalRootError' occurred.
   Additional Info:
      [a] Fatal Root Error: @SUB=TBufferFile::ReadClassBuffer
Could not find the StreamerInfo for version 3 of the class edm::Wrapper<FEDRawDataCollection>, object skipped at offset 386

but then:

---> test TestDQMOnlineClient-beamhlt_dqm_sourceclient succeeded

The error message looks reminiscent of #41348, but perhaps core experts can chime in here.
Certainly it's worrisome that the test fails without being registered.
Additionally I think the streamer files used in input are outdated (see also cms-data/DQM-Integration#3).
Cc: @francescobrivio

@cmsbuild
Contributor

A new Issue was created by @mmusich Marco Musich.

@makortel, @Dr15Jones, @sextonkennedy, @smuzaffar, @rappoccio, @antoniovilela can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@francescobrivio
Contributor

Thanks Marco for opening the issue!

Additionally I think the streamer files used in input are outdated (see also cms-data/DQM-Integration#3).
Cc: @francescobrivio

Regarding this: I'm about to open PRs to update the streamer files (cms-data) and the unitTest (cmssw) which, btw, works smoothly with the new streamer files.

@mmusich
Contributor Author

mmusich commented Oct 25, 2023

I'm about to open PRs to update the streamer files (cms-data) and the unitTest (cmssw) which, btw, works smoothly with the new streamer files.

I think we should first strive to understand why the failure is not caught before sweeping it under the carpet.

@francescobrivio
Contributor

I'm about to open PRs to update the streamer files (cms-data) and the unitTest (cmssw) which, btw, works smoothly with the new streamer files.

I think we should first strive to understand why the failure is not caught before sweeping it under the carpet.

Sure I agree!
My idea was to have the unitTest run twice for the moment:

  • with new streamer files
  • with old streamer files --> so we can keep debugging this issue

@francescobrivio
Contributor

In #43110 I did exactly this:

  • updated the TestDQMOnlineClient-beamhlt_dqm_sourceclient unitTest to use the new streamer files
  • added TestDQMOnlineClient-beamhlt_dqm_sourceclient-legacy which is exactly identical to the old one, i.e. it shows the FatalRootError exception, but the unitTest is deemed successful by the bot

@mmusich
Contributor Author

mmusich commented Oct 25, 2023

Actually, it looks like DQMStreamerReader is expected to catch the exception, log it, and carry on without failing the job:

if (std::filesystem::exists(p)) {
  try {
    openFileImp_(currentLumi);
    return true;
  } catch (const cms::Exception& e) {
    fiterator_.logFileAction(std::string("Can't deserialize registry data (in open file): ") + e.what(), p);
    fiterator_.logLumiState(currentLumi, "error: data file corrupted");
    closeFileImp_("data file corrupted");
    return false;
  }
} else {

@makortel
Contributor

assign dqm

@cmsbuild
Contributor

New categories assigned: dqm

@rvenditti,@syuvivida,@tjavaid,@nothingface0,@antoniovagnerini you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

The error message looks reminiscent of #41348, but perhaps core experts can chime in here.

Streamer files do not support schema evolution.

@mmusich
Contributor Author

mmusich commented Oct 25, 2023

Streamer files do not support schema evolution.

good point... which makes the unit test based on streamer files rather brittle...

@francescobrivio
Contributor

So the only way out of this is to keep updating (frequently) the streamer files used in the unitTests?

@mmusich
Contributor Author

mmusich commented Oct 26, 2023

So the only way out of this is to keep updating (frequently) the streamer files used in the unitTests?

I don't know how frequently backward compatibility is broken, but I guess a few times per year? In any case this is aggravated by the fact that we cannot catch it by looking at the unit test results in the IB logs since, as pointed out, the test just fails silently.

@smuzaffar
Contributor

Can we add a configuration flag to fail instead of silently ignoring the exception? The flag should be disabled by default, and only the unit tests would enable it.
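
A rough sketch of what such a flag could look like. Everything here is an assumption for illustration: `openNextFile` is a stand-in helper rather than the real DQMStreamerReader code, `failOnError` is a hypothetical flag name, and `std::exception` replaces `cms::Exception` so that the snippet compiles outside CMSSW.

```cpp
#include <exception>
#include <functional>
#include <iostream>

// Hypothetical stand-in for the reader's "open file" step (not the real
// DQMStreamerReader API). `failOnError` is an illustrative flag name; its
// default keeps the current behaviour of logging and skipping the file.
bool openNextFile(const std::function<void()>& openFileImp, bool failOnError = false) {
  try {
    openFileImp();
    return true;
  } catch (const std::exception& e) {
    std::cerr << "Can't deserialize registry data (in open file): " << e.what() << '\n';
    if (failOnError) {
      // Unit tests: rethrow so the failure propagates and the job exits non-zero.
      throw;
    }
    // Default: report the problem, skip the file and keep running.
    return false;
  }
}
```

With the default, a corrupted or incompatible file is only logged and the call returns false; with the flag set to true, the same exception reaches the caller, so a unit test would be reported as failed.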

@mmusich
Contributor Author

mmusich commented Oct 26, 2023

Can we add a configuration flag to fail instead of silently ignoring the exception? The flag should be disabled by default, and only the unit tests would enable it.

that's fine by me, if @cms-sw/dqm-l2 agree.

@mmusich
Contributor Author

mmusich commented Jun 15, 2024

This is fixed in #45231 + cms-data/DQM-Integration#8

@cmsbuild
Contributor

cms-bot internal usage
