Problem in the 94X backport of HcalDetId protections for calibration events #23744
Comments
A new Issue was created by @fabiocos Fabio Cossutti. @davidlange6, @Dr15Jones, @smuzaffar, @fabiocos can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
@bsunanda FYI The failing workflows are not present in 8_0_X, are we sure that the problem does not affect also that release? |
I shall take a look and let you know. I thought it was tested for the failed lumi blocks.
|
@fabiocos, I guess that DetId fix may have "provoked" something (undesirable consequence) which until now was kind of hidden behind improper treatment of Calib.channels Id and it may need a fix in HCAL DQM as well. |
Hi @fabiocos , could I ask specifically which workflows are failing? I tried a few (2016 data), but they apparently ran successfully. |
@DryRun cmsrel CMSSW_9_4_X_2018-07-08-0000 |
From runTheMatrix.py -s I see both 2016-related wf's |
@fabiocos Thank you, Fabio. Unfortunately, in my case (lxplus) all the tests you've listed failed at step2 with similar file access errors: And I don't see RAW for the aforementioned sample: Hopefully David (@DryRun) will have a better chance... |
Thanks @abdoulline and @fabiocos. 136.731 and 136.7611 were the ones I tried successfully previously, and unfortunately I see the same file access errors as @abdoulline for most of the others. On EOS, I do see something for |
CMSSW_9_4_X_2018-07-08-0000 + 136.778 = OK... 136.778_RunZeroBias2016H+RunZeroBias2016H+HLTDR2_2016+RECODR2_2016reHLT_Prompt+H |
@abdoulline indeed, I reverted the crashing PR in CMSSW_9_4_X_2018-07-04-2300, as you can see in the IB history, so an earlier build should be used. Alternatively, just merge #23688 on top of today's IB and try... |
Thanks for the suggestions. I was using CMSSW_9_4_X_2018-07-03-2300, but it was crashing yesterday due to the cvmfs issues. It works today, and I was able to locate the crash. I think the crash can be fixed with c3cdccb (requires subdet==HcalEndcap for QIE11 digis; also, f8aeac9 could be included, which does the same thing for HBHE digis. See https://github.com/cms-sw/cmssw/commits/8d539d44ecea86fea7f16929f65101103fb077d4/DQM/HcalTasks/plugins/DigiTask.cc). However, there are 7 commits to DigiTask.cc in between 9_4_X and c3cdccb. Could we cherry-pick just the relevant commits, or do we have to take all the commits in order? |
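For readers who have not opened the linked commits, the kind of guard described above (process a QIE11 digi only if its subdetector is HcalEndcap) can be illustrated as follows. This is a Python sketch of the idea only, not the C++ code in DigiTask.cc: the `Digi` tuple, `select_endcap_digis`, and the numeric enum values are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a digi; the real code works on C++ data frames.
Digi = namedtuple("Digi", ["raw_id", "subdet"])

# Assumed to mirror the HcalSubdetector enum values (not stated in this thread).
HCAL_ENDCAP = 2
HCAL_OTHER = 7


def select_endcap_digis(digis):
    """Keep only digis whose subdetector is HcalEndcap."""
    return [d for d in digis if d.subdet == HCAL_ENDCAP]


digis = [
    Digi(0x1, HCAL_ENDCAP),
    Digi(0x2, HCAL_OTHER),   # e.g. a calibration channel, skipped below
    Digi(0x3, HCAL_ENDCAP),
]

for digi in select_endcap_digis(digis):
    # Conditions lookups and histogram filling would only happen here,
    # so ids outside the endcap never reach the conditions container.
    print(hex(digi.raw_id))
```

With such a guard in place, ids outside the endcap would never be passed to the conditions lookup that throws in the reported crash.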
David, the two fixing snippets you've linked - they contain just several lines of code. |
@DryRun David? |
Hi @abdoulline, thanks for the reminder. I made a backport of the two commits at #23808, which should fix this crash. |
Thank you, David.
Then, once it is merged, Sunanda's PR #23688 (reverted) can be re-included...
|
The problem looks fixed now |
The merge of #23688 has caused a list of reproducible failures in 2016 workflows in CMSSW_9_4_X_2018-07-03-2300, within DQM, with the exception:
----- Begin Fatal Exception 04-Jul-2018 10:09:31 CEST-----------------------
An exception of category 'Conditions not found' occurred while
[0] Processing Event run: 283877 lumi: 17 event: 27631378 stream: 1
[1] Running path 'dqmoffline_step'
[2] Calling method for module DigiTask/'digiTask'
Exception Message:
Unavailable Conditions of type HcalQIEData for cell (0x4e280440) (CastorRadFacility 1 / 2 / 0)
----- End Fatal Exception -------------------------------------------------
triggered from
https://cmssdt.cern.ch/lxr/source/CondFormats/CastorObjects/interface/CastorCondObjectContainer.h#0090
as used in the module https://cmssdt.cern.ch/lxr/source/DQM/HcalTasks/python/DigiTask.py, as far as I can see.
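As a quick cross-check, the raw id printed in the exception (0x4e280440) can be unpacked by hand. The sketch below assumes the usual DetId packing (detector in bits 28-31, subdetector in bits 25-27) and assumes enum values mirroring DetId::Detector and HcalSubdetector; none of this is stated in the thread, so treat the layout and the names as assumptions.

```python
# Assumed DetId bit layout (not stated in this thread):
#   bits 28-31: detector, bits 25-27: subdetector.
DETECTOR_SHIFT = 28
DETECTOR_MASK = 0xF
SUBDET_SHIFT = 25
SUBDET_MASK = 0x7

# Assumed enum values, mirroring DetId::Detector and HcalSubdetector.
DETECTOR_HCAL = 4
HCAL_SUBDETECTORS = {
    0: "HcalEmpty",
    1: "HcalBarrel",
    2: "HcalEndcap",
    3: "HcalOuter",
    4: "HcalForward",
    5: "HcalTriggerTower",
    7: "HcalOther",
}


def decode_raw_id(raw_id):
    """Return (detector, subdetector) extracted from a 32-bit raw id."""
    det = (raw_id >> DETECTOR_SHIFT) & DETECTOR_MASK
    subdet = (raw_id >> SUBDET_SHIFT) & SUBDET_MASK
    return det, subdet


det, subdet = decode_raw_id(0x4E280440)
print(det, subdet, HCAL_SUBDETECTORS.get(subdet, "unknown"))
# prints: 4 7 HcalOther
```

Under these assumptions the id belongs to HCAL but to none of HB/HE/HO/HF, which would be consistent with a calibration channel reaching a conditions lookup that has no entry for it.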