Reorganize Alignment/OfflineValidation unit tests #40567

Merged
merged 3 commits into from Jan 19, 2023

Conversation

mmusich
Contributor

@mmusich mmusich commented Jan 19, 2023

resolves #40566

PR description:

  • de-clutter Alignment/OfflineValidation/test by moving the testing bash scripts into a dedicated sub-folder (4d3cda7)
  • reduce the verbosity of the geometry comparison plotter (6cfc315)
  • reduce the number of events in GeneralTrackAnalyser unit test (87d68c4)

PR validation:

Run scram b runtests use-ibeos

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

N/A

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40567/33801

  • This PR adds an extra 32KB to repository

@cmsbuild
Contributor

A new Pull Request was created by @mmusich (Marco Musich) for master.

It involves the following packages:

  • Alignment/OfflineValidation (alca)

@malbouis, @yuanchao, @cmsbuild, @saumyaphor4252, @francescobrivio, @ChrisMisan, @tvami can you please review it and eventually sign? Thanks.
@mmusich, @adewit, @tocheng, @tlampen this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@mmusich
Contributor Author

mmusich commented Jan 19, 2023

@cmsbuild, please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-63e948/30074/summary.html
COMMIT: 87d68c4
CMSSW: CMSSW_13_0_X_2023-01-18-2300/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40567/30074/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 12 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3555479
  • DQMHistoTests: Total failures: 12
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3555445
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 211 log files, 162 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@tvami
Contributor

tvami commented Jan 19, 2023

+alca

  • unit tests pass (from a new location, with new settings)

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@rappoccio
Contributor

+1

Technical reorganization.

@cmsbuild cmsbuild merged commit 1584331 into cms-sw:master Jan 19, 2023
@mmusich mmusich deleted the reorganizeOfflineValidationUnitTests branch January 19, 2023 16:29
@aandvalenzuela
Contributor

Hello @mmusich,
Although this PR decreased the number of events, the test is still timing out on non-amd64 architectures. Do you suspect any reason why it would take longer to run on aarch64 or ppc64le? I am running the tests locally to identify where it could be hanging or taking longer in processing. Thank you!

@mmusich
Contributor Author

mmusich commented Jan 24, 2023

Do you suspect any reason why it would take longer to run on aarch64 or ppc64le?

no.

I am running the tests locally to identify where it could be hanging or taking longer in processing.

please go ahead and feel free to post in original issue #40566

@mmusich
Contributor Author

mmusich commented Jan 24, 2023

@aandvalenzuela

by the way how is one supposed to work with those archs? On lxplus8:

$ setenv SCRAM_ARCH el8_aarch64_gcc11
$ cmsrel CMSSW_13_0_X_2023-01-23-2300
$ cd CMSSW_13_0_X_2023-01-23-2300/src/
$ cmsenv
$ echo $CMSSW_RELEASE_BASE/
/cvmfs/cms-ib.cern.ch/sw/aarch64/week1/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_0_X_2023-01-23-2300/
$ git cms-addpkg Alignment/OfflineValidation
/cvmfs/cms-ib.cern.ch/sw/aarch64/week1/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_0_X_2023-01-23-2300/external/el8_aarch64_gcc11/bin/git: Exec format error. Binary file not executable.

...

@smuzaffar
Contributor

smuzaffar commented Jan 24, 2023

@mmusich , please use lxplus8-arm nodes for aarch64 arch. Also there is lxplus9-arm where you can run cmssw-el8 to get el8 env.
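The "Exec format error" quoted earlier in the thread is what one typically sees when binaries built for one CPU architecture are executed on a host of another. A minimal sketch of a pre-flight check follows, assuming that SCRAM_ARCH has the layout `<os>_<cpu>_<compiler>` and that SCRAM's `amd64` corresponds to `x86_64` in `uname -m` output. The helper names `scram_arch_machine` and `check_arch` are hypothetical and not part of SCRAM or CMSSW.

```shell
#!/bin/bash
# Hypothetical helper (not part of SCRAM): extract the CPU field from a
# SCRAM_ARCH string, assumed to look like <os>_<cpu>_<compiler>, and
# translate it to the name printed by `uname -m`.
scram_arch_machine() {
  local cpu
  cpu=$(echo "$1" | cut -d_ -f2)
  case "$cpu" in
    amd64) echo "x86_64" ;;  # assumption: SCRAM's amd64 is uname's x86_64
    *)     echo "$cpu" ;;    # aarch64 and ppc64le are passed through
  esac
}

# Hypothetical helper: compare the architecture requested in SCRAM_ARCH
# with the host architecture (second argument, defaulting to `uname -m`).
check_arch() {
  local want have
  want=$(scram_arch_machine "$1")
  have=${2:-$(uname -m)}
  if [ "$want" = "$have" ]; then
    echo "OK: host is $have"
  else
    echo "MISMATCH: SCRAM_ARCH wants $want but host is $have"
    return 1
  fi
}
```

Calling `check_arch el8_aarch64_gcc11` on an x86_64 login node would report a mismatch before any release area is created.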

@mmusich
Contributor Author

mmusich commented Jan 25, 2023

please use lxplus8-arm nodes for aarch64 arch.

thanks. For some reason scram b runtests_GeneralTrackAnalyser doesn't work in that release (CMSSW_13_0_X_2023-01-23-2300), as it yields:

scram b runtests_GeneralTrackAnalyser
>> Local Products Rules ..... started
>> Local Products Rules ..... done
gmake: Nothing to be done for 'runtests_GeneralTrackAnalyser'.

despite the test being defined here:

<bin file="testAlignmentOfflineValidation.cpp" name="GeneralTrackAnalyser">
<flags TEST_RUNNER_ARGS=" /bin/bash Alignment/OfflineValidation/test testingScripts/test_unitGeneralTrackAnalyser.sh"/>
<use name="FWCore/Utilities"/>
</bin>

on the other hand, I can run the bash script locally without problems.

#! /bin/bash
function die { echo $1: status $2 ; exit $2; }
echo "TESTING Alignment/OfflineValidation ..."
cmsRun ${LOCAL_TEST_DIR}/test_all_cfg.py || die "Failure running test_all_cfg.py" $?
cmsRun ${LOCAL_TEST_DIR}/test_all_Phase2_cfg.py || die "Failure running test_all_Phase2_cfg.py" $?
cmsRun ${LOCAL_TEST_DIR}/inspectData_cfg.py unitTest=True || die "Failure running inspectData_cfg.py" $?

I noticed some warnings (seemingly unrelated) in the execution of the second command:

25-Jan-2023 10:12:28 CET  Initiating request to open file root://eoscms.cern.ch//eos/cms/store/relval/CMSSW_12_5_3/RelValMinBias_14TeV/GEN-SIM-RECO/125X_mcRun4_realistic_v5_2026D88PU-v1/2590000/22e22ae6-a353-4f2e-815e-cc5efee37af9.root
In file included from DataFormatsL1TrackTrigger_xr dictionary payload:76:
In file included from /cvmfs/cms-ib.cern.ch/sw/aarch64/week1/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_0_X_2023-01-23-2300/src/DataFormats/L1TrackTrigger/interface/TTTypes.h:19:
In file included from /cvmfs/cms-ib.cern.ch/sw/aarch64/week1/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_0_X_2023-01-23-2300/src/DataFormats/L1TrackTrigger/interface/TTTrack.h:18:
In file included from /cvmfs/cms-ib.cern.ch/sw/aarch64/week1/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_0_X_2023-01-23-2300/src/DataFormats/L1TrackTrigger/interface/TTTrack_TrackWord.h:20:
In file included from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02769/el8_aarch64_gcc11/external/hls/2019.08-fd724004387c2a6770dc3517446d30d9/include/ap_int.h:20:
In file included from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02769/el8_aarch64_gcc11/external/hls/2019.08-fd724004387c2a6770dc3517446d30d9/include/ap_common.h:252:
/cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02769/el8_aarch64_gcc11/external/hls/2019.08-fd724004387c2a6770dc3517446d30d9/include/etc/ap_private.h:1535:34: warning: format specifies type 'unsigned long long *' but the argument has type 'uint64_t *' (aka 'unsigned long *') [-Wformat]
        sscanf(strStart, "%llo", &tmpVAL);
                          ~~~~   ^~~~~~~
                          %lo
/cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02769/el8_aarch64_gcc11/external/hls/2019.08-fd724004387c2a6770dc3517446d30d9/include/ap_int_base.h:354:13: note: in instantiation of member function 'ap_private<64, false, true>::fromString' requested here
    Base::V.fromString(s, length, rd);
            ^
/cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02769/el8_aarch64_gcc11/external/hls/2019.08-fd724004387c2a6770dc3517446d30d9/include/ap_int.h:275:51: note: in instantiation of member function 'ap_int_base<64, false>::ap_int_base' requested here
  INLINE ap_uint(const char* s, signed char rd) : Base(s, rd) {}
                                                  ^
/cvmfs/cms-ib.cern.ch/sw/aarch64/week1/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_0_X_2023-01-23-2300/src/DataFormats/L1Trigger/interface/VertexWord.h:143:43: note: in instantiation of member function 'ap_uint<64>::ap_uint' requested here
    vtxword_t vertexWord() const { return vtxword_t(vertexWord_.to_string().c_str(), 2); }
                                          ^
In file included from DataFormatsL1TrackTrigger_xr dictionary payload:76:
In file included from /cvmfs/cms-ib.cern.ch/sw/aarch64/week1/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_0_X_2023-01-23-2300/src/DataFormats/L1TrackTrigger/interface/TTTypes.h:19:
In file included from /cvmfs/cms-ib.cern.ch/sw/aarch64/week1/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_0_X_2023-01-23-2300/src/DataFormats/L1TrackTrigger/interface/TTTrack.h:18:
In file included from /cvmfs/cms-ib.cern.ch/sw/aarch64/week1/el8_aarch64_gcc11/cms/cmssw-patch/CMSSW_13_0_X_2023-01-23-2300/src/DataFormats/L1TrackTrigger/interface/TTTrack_TrackWord.h:20:
In file included from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02769/el8_aarch64_gcc11/external/hls/2019.08-fd724004387c2a6770dc3517446d30d9/include/ap_int.h:20:
In file included from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02769/el8_aarch64_gcc11/external/hls/2019.08-fd724004387c2a6770dc3517446d30d9/include/ap_common.h:252:
/cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02769/el8_aarch64_gcc11/external/hls/2019.08-fd724004387c2a6770dc3517446d30d9/include/etc/ap_private.h:1546:34: warning: format specifies type 'unsigned long long *' but the argument has type 'uint64_t *' (aka 'unsigned long *') [-Wformat]
        sscanf(strStart, "%llu", &tmpVAL);
                          ~~~~   ^~~~~~~
                          %lu
/cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02769/el8_aarch64_gcc11/external/hls/2019.08-fd724004387c2a6770dc3517446d30d9/include/etc/ap_private.h:1557:34: warning: format specifies type 'unsigned long long *' but the argument has type 'uint64_t *' (aka 'unsigned long *') [-Wformat]
        sscanf(strStart, "%llx", &tmpVAL);
                          ~~~~   ^~~~~~~
                          %lx

which are also present in the build logs: https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_ppc64le_gcc11/CMSSW_13_0_X_2023-01-24-2300/unitTestLogs/Alignment/OfflineValidation#/1788

@smuzaffar
Contributor

@mmusich , please try CMSSW_13_0_X_2023-01-24-2300 or above IB (where cms-sw/cmsdist#8262 has been integrated) and scram b runtests_GeneralTrackAnalyser should work there.

@mmusich
Contributor Author

mmusich commented Jan 25, 2023

(where cms-sw/cmsdist#8262 has been integrated)

ah, great.

@aandvalenzuela
Contributor

For me, it is the second cmsRun command:

cmsRun ${LOCAL_TEST_DIR}/test_all_Phase2_cfg.py || die "Failure running test_all_Phase2_cfg.py" $?
the one that now takes around 80 min.
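
Per-step timings like the one quoted above can be collected by wrapping each command of the test script in a small timing function. The following is only a sketch: `run_timed` is a hypothetical helper, not something defined in the CMSSW test scripts, and it relies on bash's built-in `SECONDS` counter, so the resolution is one second.

```shell
#!/bin/bash
# Hypothetical helper: run a command, then print its exit status and the
# wall-clock time it took (whole seconds, via bash's SECONDS variable).
run_timed() {
  local label=$1; shift
  local start=$SECONDS
  "$@"
  local status=$?
  echo "[$label] exit status $status after $((SECONDS - start)) s"
  return "$status"
}

# Possible use inside the unit-test script (needs a CMSSW environment):
# run_timed phase1 cmsRun "${LOCAL_TEST_DIR}/test_all_cfg.py"
# run_timed phase2 cmsRun "${LOCAL_TEST_DIR}/test_all_Phase2_cfg.py"
```

Because the wrapper returns the status of the wrapped command, it can be combined with the existing `|| die ...` pattern unchanged.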

@mmusich
Contributor Author

mmusich commented Jan 25, 2023

the one that now takes around 80 min.

I cannot confirm. In my setup the test ends in about 20 minutes.
Also, the test logs point to a segfault, not a timeout.

@aandvalenzuela
Contributor

I think the external termination request comes from our side if the test takes more than 90 min.

@mmusich
Contributor Author

mmusich commented Jan 25, 2023

I think the external termination request comes from our side if the test takes more than 90 min.

I see, it would be useful to make that clearer from the test logs.
On the other hand, I am still puzzled, because in my local checkout it takes much less time than that (and it also seems to work fine in the other builds). In any case, we can try to reduce the number of events processed and see if that fixes it, though I suspect it's a symptom of something deeper.
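
As an illustration of how a timeout could be told apart from a crash in the logs, here is a minimal sketch built on GNU coreutils `timeout`. It is not how the CMS test infrastructure actually enforces its limit. `describe_status` and `run_with_limit` are hypothetical names, and the sketch relies on two conventions: `timeout` exits with status 124 when the limit is hit, and the shell reports 128 plus the signal number for a process killed by a signal (139 for SIGSEGV).

```shell
#!/bin/bash
# Hypothetical helper: turn an exit status into a human-readable message.
describe_status() {
  local status=$1
  if [ "$status" -eq 0 ]; then
    echo "success"
  elif [ "$status" -eq 124 ]; then
    echo "timed out (stopped by timeout)"
  elif [ "$status" -gt 128 ]; then
    echo "terminated by signal $((status - 128))"
  else
    echo "failed with exit status $status"
  fi
}

# Hypothetical helper: run_with_limit <duration> <command...>
# Runs the command under GNU `timeout` and logs why it stopped.
run_with_limit() {
  local limit=$1; shift
  local status=0
  timeout "$limit" "$@" || status=$?
  echo "$*: $(describe_status "$status")"
  return "$status"
}

# Possible use (the 90m value mirrors the limit mentioned in the thread):
# run_with_limit 90m cmsRun "${LOCAL_TEST_DIR}/test_all_Phase2_cfg.py"
```

One caveat of this approach is that a command which itself exits with status 124 would be misreported as a timeout.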
