
vector<l1t::Muon> read too many bytes: 187 instead of 171 #32133

Closed
silviodonato opened this issue Nov 13, 2020 · 15 comments

Comments

@silviodonato
Contributor

silviodonato commented Nov 13, 2020

From https://hypernews.cern.ch/HyperNews/CMS/get/tier0-Ops/2187/1/1/1.html and
https://docs.google.com/document/d/1LWs-87dWMTlIj2VhmVZLcPhqZEGZK1Zkv1fRLeBbO1E/edit?pli=1#

In CMSSW_11_1_5, we get

----- Begin Fatal Exception 13-Nov-2020 14:52:39 CET-----------------------
An exception of category 'FatalRootError' occurred while
   [0] Calling InputSource::getNextItemType
   Additional Info:
      [a] Fatal Root Error: @SUB=TBufferFile::CheckByteCount
object of class vector<l1t::Muon> read too many bytes: 187 instead of 171

----- End Fatal Exception -------------------------------------------------

You can reproduce the error with:

cmsRun /afs/cern.ch/user/s/sdonato/AFSwork/public/L1issue/PSetPklAsPythonFile.py
@cmsbuild
Contributor

A new Issue was created by @silviodonato Silvio Donato.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@silviodonato
Contributor Author

assign l1

@cmsbuild
Contributor

New categories assigned: l1

@rekovic, @jmduarte, you have been requested to review this Pull Request/Issue and eventually sign it. Thanks

@silviodonato
Contributor Author

cc @boudoul @amassiro

@silviodonato
Contributor Author

I can confirm that CMSSW_11_1_4 works without error. The error appears when running with CMSSW_11_1_4 + #31679.

@makortel
Contributor

The ioread in the #31679 11_1_X backport has a slight difference w.r.t. the original 11_2_X PR #31608 (red: #31608, green: #31679):

-  <ioread sourceClass="l1t::RegionalMuonCand" targetClass="l1t::RegionalMuonCand" version="[-10]"  source="int m_hwPt2" target="m_hwPtUnconstrained">
+  <ioread sourceClass="l1t::RegionalMuonCand" targetClass="l1t::RegionalMuonCand" version="[10]"  source="int m_hwPt2" target="m_hwPtUnconstrained">
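
For context (this is my reading of ROOT's I/O customization rule syntax, so treat it as a hedged note rather than a definitive statement): the version attribute selects which on-disk class versions a rule applies to, and a bracketed value with a leading dash denotes a range, so the two lines above are not equivalent:

```xml
<!-- "[10]": applies only to objects persisted with class version 10 -->
<ioread sourceClass="l1t::RegionalMuonCand" targetClass="l1t::RegionalMuonCand"
        version="[10]" source="int m_hwPt2" target="m_hwPtUnconstrained">

<!-- "[-10]": a range, i.e. all class versions up to and including 10 -->
<ioread sourceClass="l1t::RegionalMuonCand" targetClass="l1t::RegionalMuonCand"
        version="[-10]" source="int m_hwPt2" target="m_hwPtUnconstrained">
```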

@silviodonato
Contributor Author

> The ioread in the #31679 11_1_X backport has a slight difference w.r.t. the original 11_2_X PR #31608 (red: #31608, green: #31679):
>
> -  <ioread sourceClass="l1t::RegionalMuonCand" targetClass="l1t::RegionalMuonCand" version="[-10]"  source="int m_hwPt2" target="m_hwPtUnconstrained">
> +  <ioread sourceClass="l1t::RegionalMuonCand" targetClass="l1t::RegionalMuonCand" version="[10]"  source="int m_hwPt2" target="m_hwPtUnconstrained">

Ah! We need to backport #31848. Let's see if it actually solves the problem.

@boudoul
Contributor

boudoul commented Nov 13, 2020

Thanks! Adding @dinyar as watcher

@dinyar
Contributor

dinyar commented Nov 13, 2020

Hi all,

Thanks for looking into this! Regarding the above, I think it's a red herring unless I'm overlooking something. #31608 didn't contain @silviodonato's fix, but #31679 does. We in fact want [10] and not [-10]. Also, the data format affected by this is RegionalMuonCand, not Muon. I'll try to reproduce locally.

Cheers,
Dinyar

@silviodonato
Contributor Author

Ah, no, it was already backported with cms-l1t-offline@c3734a8,

so #31679 is the backport of #31608 + #31848

indeed we have

<ioread sourceClass="l1t::RegionalMuonCand" targetClass="l1t::RegionalMuonCand" version="[10]"  source="int m_hwPt2" target="m_hwPtUnconstrained">

both in
https://github.com/cms-sw/cmssw/blob/CMSSW_11_1_5/DataFormats/L1TMuon/src/classes_def.xml#L8
and in
https://github.com/cms-sw/cmssw/blob/master/DataFormats/L1TMuon/src/classes_def.xml#L8

@makortel
Contributor

Ah, right, sorry for the noise.

@silviodonato
Contributor Author

I've investigated the problem a bit. From the HLT menu used (hltGetConfiguration run:337973 > hlt.py) I found that ALCAPHYSIM and NanoDST are the only streams storing hltGtStage2Digis. ALCAPHYSIM triggered no data, so we get no error in that case (see AlCa_EcalPhiSym_v4 in https://cmsoms.cern.ch/cms/triggers/hlt_report?cms_run=337973).
This is why we get the error only in NanoDST.

The failing process converts the .dat files into RAW .root files. In the case of NanoDST, the RAW data format contains:

FEDRawDataCollection        "hltFEDSelector"     ""          "HLT"     
edm::TriggerResults         "TriggerResults"     ""          "HLT" 
BXVector<l1t::Muon>         "hltGtStage2Digis"   "Muon"      "HLT"     
BXVector<l1t::Tau>          "hltGtStage2Digis"   "Tau"       "HLT"     
[...]

Since we have changed the l1t::Muon data format, the expected byte size of l1t::Muon objects in the .dat file has changed, causing the error. I expect that if we convert from .root to .dat and then back to .root in CMSSW_11_1_5, everything should work. In other words, I do not expect any crash during the MWGR. But I'll let @cms-sw/daq-l2 comment further on this.
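
As an illustration of the mismatch (a minimal sketch in plain Python, not ROOT code, with hypothetical layouts): when a class gains a persistent member, a reader assuming the new layout expects more bytes per object than the old writer produced, which is what TBufferFile::CheckByteCount flags:

```python
# Illustrative sketch only (plain Python, not ROOT): why a reader that
# assumes a NEW, larger class layout over-reads records written with the
# OLD layout -- the analogue of "read too many bytes: 187 instead of 171".
import struct

OLD_FMT = "<ii"   # hypothetical old layout: two int fields
NEW_FMT = "<iii"  # hypothetical new layout: one extra int
                  # (analogous to adding m_hwPtUnconstrained)

# A producer using the old layout writes records back to back:
buf = b"".join(struct.pack(OLD_FMT, i, 2 * i) for i in range(3))

old_size = struct.calcsize(OLD_FMT)  # 8 bytes per record actually written
new_size = struct.calcsize(NEW_FMT)  # 12 bytes per record expected by reader

# The reader's per-record expectation exceeds what was written, so a
# byte-count consistency check on the buffer fails.
assert new_size > old_size
print(f"written per record: {old_size}, expected per record: {new_size}")
```

Round-tripping the data entirely within one release (one layout) avoids the mismatch, which matches the expectation above.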

@smorovic
Contributor

smorovic commented Nov 13, 2020

@silviodonato
I just tested your assumption (/afs/cern.ch/user/s/smorovic/public/test-l1), and it indeed works if the .dat is converted to .root with 11_1_4 and then, using 11_1_5, back to .dat.
Thus we shouldn't have crashes with HLT and Tier0 both using 11_1_5 in the MWGR.

Other consumers of .dat files are calibrations and DQM, but they don't use hltGtStage2Digis.

@silviodonato
Contributor Author

Thanks @smorovic. So we can close this issue.
FYI @gennai

@boudoul
Contributor

boudoul commented Nov 16, 2020

Thanks @silviodonato and @smorovic for the investigations!

@silviodonato, could you please propagate the conclusion to the Tier0 HyperNews thread, answering the reported issue from the replay? This is important to keep Tier0 informed. Thank you.
