[PPC][ARM] Unit test in EventFilter/Utilities failing #32091

Closed
mrodozov opened this issue Nov 10, 2020 · 12 comments

Comments

@mrodozov
Contributor

The unit test BUFU_TEST in EventFilter/Utilities is failing in both the ARM and PowerPC IBs with the following:

Running test with index JSONs
Failure cmsRun unittest_FU.py runNumber=101 fffBaseDir=/build/cmsbld/jenkins_c/workspace/ib-run-qa/CMSSW_11_2_X_2020-11-09-2300/tmp/slc7_aarch64_gcc820/results_cmsbld24650: status 65

----- Error -----

%MSG-i ThreadStreamSetup:  (NoModuleName) 10-Nov-2020 00:51:03 CET pre-events
setting # threads 2
setting # streams 2
%MSG
%MSG-e FedRawDataInputSource::FedRawDataInputSource:   FedRawDataInputSource:source@sourceConstruction 10-Nov-2020 00:51:07 CET  pre-events
Intel crc32c checksum computation unavailable
%MSG
%MSG-s ModulesSynchingOnLumis:  AfterModConstruction 10-Nov-2020 00:51:27 CET  pre-events
The following modules require synchronizing on LuminosityBlock boundaries:
  ShmStreamConsumer streamC
%MSG
%MSG-e HLTConfigProvider:  HLTriggerJSONMonitoring:hltJson@beginRun  10-Nov-2020 00:51:27 CET Run: 101
Falling back to ProcessName-only init using ProcessName 'HLT' !
%MSG
%MSG-e HLTConfigProvider:  HLTriggerJSONMonitoring:hltJson@beginRun  10-Nov-2020 00:51:27 CET Run: 101
 Process name 'HLT' not found in registry!
%MSG
%MSG-e HLTriggerJSONMonitoring:  HLTriggerJSONMonitoring:hltJson@beginRun  10-Nov-2020 00:51:27 CET Run: 101
HLTConfigProvider initialization failed!
%MSG
----- Begin Fatal Exception 10-Nov-2020 00:51:30 CET-----------------------
An exception of category 'FedRawDataInputSource::getNextEvent' occurred while
   [0] Calling InputSource::getNextItemType
Exception Message:
Premature end of input file while reading event header
----- End Fatal Exception -------------------------------------------------
status = 16640

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_aarch64_gcc820/CMSSW_11_2_X_2020-11-09-2300/unitTestLogs/EventFilter/Utilities#/

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_ppc64le_gcc820/CMSSW_11_2_X_2020-11-06-2300/unitTestLogs/EventFilter/Utilities#/50-50

@cmsbuild
Contributor

A new Issue was created by @mrodozov Mircho Rodozov.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor

assign daq

@cmsbuild
Contributor

New categories assigned: daq

@smorovic,@emeschi you have been requested to review this Pull request/Issue and eventually sign? Thanks

@smorovic
Contributor

Endianness is the same as on x86_64, if I remember correctly. My suspicion would be an alignment difference with respect to x86_64, so that files aren't correctly written or read (or both) on these two platforms.
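
As an illustration of the alignment hypothesis, here is a minimal sketch of how a layout difference could be caught at compile time. `DemoEventHeader` and `readHeader` are hypothetical names invented for this example, and the field layout is not the real FRD header. The idea is only that `static_assert` on `sizeof` and `offsetof` fails the build on any platform where padding differs from what the file format expects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical on-disk header; NOT the real FRD event header layout.
struct DemoEventHeader {
  std::uint16_t version;
  std::uint16_t flags;
  std::uint32_t run;
  std::uint32_t lumi;
  std::uint32_t event;
};

// If a compiler or architecture inserted different padding, these would
// fail at compile time instead of producing unreadable files at run time.
static_assert(sizeof(DemoEventHeader) == 16, "unexpected padding in header");
static_assert(offsetof(DemoEventHeader, run) == 4, "unexpected offset of run");
static_assert(offsetof(DemoEventHeader, event) == 12, "unexpected offset of event");

// Copies a header out of a raw buffer. Returns false when fewer bytes are
// available than the header needs (a "premature end" condition).
inline bool readHeader(const unsigned char* buf, std::size_t len, DemoEventHeader& out) {
  assert(buf != nullptr);
  if (len < sizeof(DemoEventHeader)) {
    return false;
  }
  std::memcpy(&out, buf, sizeof(DemoEventHeader));
  return true;
}
```

Using `std::memcpy` rather than casting the buffer pointer avoids misaligned reads, which some non-x86 architectures handle less forgivingly.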

We don't currently use ARM or PPC for the DAQ/HLT workflow (we don't even have hardware for development and testing), so we can't currently support it.
I'd tend to disable the unit test for non-x86_64 architectures. From what I see, this could be done in the bash script (checking uname -m before exit 0). Any suggestions are welcome.

@makortel
Contributor

> I'd tend to disable the unit test for non-x86_64 architectures. From what I see, this could be done in the bash script (checking uname -m before exit 0). Any suggestions are welcome.

It could be clearer to disable the test for non-x86 architectures in the BuildFile. The syntax should be the following (@smuzaffar please correct if necessary):

<ifarchitecture name="_amd64_">
  <bin file="RunBUFU_t.cpp" name="BUFU_TEST">
    <flags TEST_RUNNER_ARGS="/bin/bash EventFilter/Utilities/test RunBUFU.sh"/>
  </bin>
</ifarchitecture>

The same unit test appears to fail in the same way in the CLANG IBs as well (on x86):
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc820/CMSSW_11_2_CLANG_X_2020-11-10-2300/unitTestLogs/EventFilter/Utilities#/38-38

@smorovic
Contributor

@makortel OK, so it's not architecture-specific. I'll investigate and try to reproduce it on my side, starting from the master branch.

@mrodozov
Contributor Author

@smorovic you can use the CLANG IB: fix the problem against that build, and then we can run the test for ARM and PPC from here.

scram list CLANG
scram p CMSSW_11_2_CLANG_X_2020-11-10-2300 # or more recent if any

Sorry if this is redundant :/

@smorovic
Contributor

Thanks @mrodozov, that was a useful suggestion.
I just tried it and reproduced the problem with that IB (CMSSW_11_2_CLANG_X_2020-11-10-2300) on lxplus.
What is interesting is that the same tests run fine with the baseline build CMSSW_11_2_X_2020-11-10-2300.

My suspicion is a recent patch that changed the raw data event header format (the error is about that header):
#31543
I'll dig further to try to narrow it down.

@smorovic
Contributor

I found the cause. In IOPool/Streamer/interface/FRDEventMessage.h, the FRDHeaderVersionSize array has size 6 (valid indices 0-5).

But EventFilter/Utilities/src/FedRawDataInputSource.cc indexes it in a few places as:
FRDHeaderVersionSize[detectedFRDversion_]
In this case the version is 6, so the access goes past the end of the array.

There are also checks in a few places making sure that the version is <= 6. Instead, I'll define the maximum version as a constant in the header file and also use it to set the std::array length of FRDHeaderVersionSize, so that there is a compile-time check.

I'll prepare PRs for 11_2_X/master as well as 11_1_X.
Production HLT is not affected, as it uses version 5 (version 6 isn't used anywhere yet, so this is not critical).
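
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from the actual PR: `kMaxHeaderVersion`, `kHeaderSizeByVersion`, and `headerSizeForVersion` are hypothetical names, and the sizes in the table are placeholder values. The point is that a single constant drives both the array length and the run-time range check, so the two cannot drift apart when a new version is added.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical names; the sizes below are placeholders, not real header sizes.
constexpr std::size_t kMaxHeaderVersion = 6;

// One entry per version 0..kMaxHeaderVersion. Supplying more initializers
// than the declared length is a compile error.
constexpr std::array<std::uint32_t, kMaxHeaderVersion + 1> kHeaderSizeByVersion{
    {0, 8, 16, 20, 24, 28, 32}};

// std::get is bounds-checked at compile time; this also catches a forgotten
// (zero-filled) entry for the newest version.
static_assert(std::get<kMaxHeaderVersion>(kHeaderSizeByVersion) != 0,
              "missing size entry for the newest header version");

// Looks up the header size, rejecting versions the table does not cover.
inline std::uint32_t headerSizeForVersion(std::uint32_t version) {
  if (version > kMaxHeaderVersion) {
    throw std::out_of_range("unsupported header version " + std::to_string(version));
  }
  assert(version < kHeaderSizeByVersion.size());  // invariant implied by the check above
  return kHeaderSizeByVersion[version];
}
```

With the table sized 6 and a version of 6, the original pattern reads one element past the end, which is undefined behavior and may appear to work on one compiler while failing on another.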

@smorovic
Contributor

I created the PR: #32116
Could you also test it on ARM and PPC?

An 11_1_X backport will follow later.

@mrodozov
Contributor Author

This is fixed on ARM, PPC, and Clang. Closing. Thanks @smorovic, that was fast :)

@smorovic
Contributor

You are welcome, thanks for reporting.
