hltIntegrationTests tests failing randomly in IBs #37598

Closed
missirol opened this issue Apr 17, 2022 · 23 comments

Comments

@missirol
Contributor

missirol commented Apr 17, 2022

In recent IBs, there have been seemingly-random failures of the HLT-Validation tests, e.g.

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-04-09-1100/slc7_amd64_gcc10/HLT_Integration_PIon_MC.log
https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_3_X_2022-04-11-1100/slc7_amd64_gcc10/HLT_Integration_PIon_MC.log
https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-04-11-2300/slc7_amd64_gcc10/HLT_Integration_PIon_MC.log
https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-04-16-1100/slc7_amd64_gcc10/HLT_Integration_PIon_MC.log
https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-04-16-1100/slc7_amd64_gcc10/HLT_Integration_PRef_MC.log

First occurrences of the issues were briefly discussed in

#37304 (comment)
#37524 (comment)

The cause of the issue is unclear. So far it has not been reproducible locally, and in IBs it seems to show up at random times. TSG also routinely runs these executables manually (i.e. not via IBs) during development, and I have yet to encounter this issue locally.

The error messages point to a failure to correctly download the HLT configuration from the database, via the hltListPaths call and/or the hltGetConfiguration call made as part of the executable hltIntegrationTests.

Examples:

  1. this error [1] suggests that the hlt.py dumped via hltGetConfiguration was not a valid Python config;
  2. this error [2] suggests that downloading the menu inside hltListPaths failed, and that the ensuing call to hltGetConfiguration failed as well, causing an error from hltCompareResults (which reads as input the invalid Python config returned by hltGetConfiguration).
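
One way to catch both failure modes early would be to check the dumped text before handing it to downstream tools. The sketch below is only an illustration, not the logic of hltIntegrationTests: the helper name `binds_cms` is hypothetical, and it assumes that a valid dump binds the name `cms` through a module-level import (for instance `import FWCore.ParameterSet.Config as cms`), which is consistent with the `NameError: name 'cms' is not defined` in the logs.

```python
import ast


def binds_cms(config_text):
    """Return True if the text parses as Python and imports something as `cms`.

    Hypothetical helper: it only inspects module-level imports and never
    executes the configuration.
    """
    try:
        tree = ast.parse(config_text)
    except SyntaxError:
        # Not Python at all, e.g. an error message written in place of the menu.
        return False
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b as cms` has asname "cms"; `import cms` has name "cms".
                if (alias.asname or alias.name) == "cms":
                    return True
    return False
```

A caller could run this on the content of hlt.py and stop (or retry the download) when it returns False, instead of letting every per-path job fail later with the same NameError.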

To my knowledge, the issue started to appear after the integration of #37283 (and its backport to 12_3_X) [3]. That PR updated hltListPaths, possibly making it a bit slower; on the other hand, it did not touch hltGetConfiguration in any way. Curiously, the error has so far shown up only for the PIon and PRef HLT menus, which are the two smallest menus being tested (so their download from the database is generally much quicker than for other menus).

Given its non-reproducibility, it's unclear (to me) how to tackle this.

Could this be somehow an issue related to how these tests are run in IBs? (and/or how the database is queried in that case? are there any timeouts of any kind?)

[1]

stty: standard input: Inappropriate ioctl for device
Will run 6 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
HLT menu: hltGetConfiguration /dev/CMSSW_12_3_0/PIon/V67 --full --offline --mc --input file:../RelVal_Raw_PIon_MC.root --unprescale --process TEST20220416171904 --max-events 100 --globaltag=auto:run3_mc_PIon --type=PIon
Traceback (most recent call last):
  File "/pool/condor/dir_150973/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-04-16-1100/bin/slc7_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 10, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02728/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_4_X_2022-04-15-1100/bin/slc7_amd64_gcc10/edmConfigDump", line 25, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 10, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined
Preparing single-path configurations
Running...
    full menu dump
make: *** [.makefile:17: hlt.done] Error 90
    Status_OnCPU
    Status_OnGPU
    HLTriggerFirstPath
    HLT_Physics_v7
make: *** [.makefile:23: Status_OnGPU.done] Error 90
    HLT_Random_v3
make: *** [.makefile:23: Status_OnCPU.done] Error 90
    HLT_ZeroBias_v6
make: *** [.makefile:23: HLT_Physics_v7.done] Error 90
make: Target 'Status_OnCPU' not remade because of errors.
make: Target 'Status_OnGPU' not remade because of errors.
make: Target 'HLT_Physics_v7' not remade because of errors.
make: *** [.makefile:23: HLTriggerFirstPath.done] Error 90
make: Target 'HLTriggerFirstPath' not remade because of errors.
make: *** [.makefile:23: HLT_Random_v3.done] Error 90
make: Target 'HLT_Random_v3' not remade because of errors.
make: *** [.makefile:23: HLT_ZeroBias_v6.done] Error 90
make: Target 'HLT_ZeroBias_v6' not remade because of errors.
Comparing the results of running each path by itself with those from the full menu
ERROR: Execution of the full HLT menu failed.
Please check the contents of 'hlt.log' for details.
exit status: 1
done

[2]

stty: standard input: Inappropriate ioctl for device
Traceback (most recent call last):
  File "/pool/condor/dir_150973/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-04-16-1100/bin/slc7_amd64_gcc10/hltListPaths", line 190, in <module>
    paths = getPathList(config)
  File "/pool/condor/dir_150973/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-04-16-1100/bin/slc7_amd64_gcc10/hltListPaths", line 32, in getPathList
    raise Exception(f'query did not return a valid HLT menu:\n query="{cmdline}"')
Exception: query did not return a valid HLT menu:
 query="hltConfigFromDB --run3 --v3 --configName /dev/CMSSW_12_3_0/PRef/V67 --noedsources --noes --noservices"
Will run 0 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
HLT menu: hltGetConfiguration /dev/CMSSW_12_3_0/PRef/V67 --full --offline --mc --input file:../RelVal_Raw_PRef_MC.root --unprescale --process TEST20220416171920 --max-events 100 --globaltag=auto:run3_mc_PRef --type=PRef
Traceback (most recent call last):
  File "/pool/condor/dir_150973/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-04-16-1100/bin/slc7_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 10, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02728/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_4_X_2022-04-15-1100/bin/slc7_amd64_gcc10/edmConfigDump", line 25, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 10, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined
Preparing single-path configurations
Running...
    full menu dump
make: *** [.makefile:17: hlt.done] Error 90
    full menu dump
make: *** [.makefile:17: hlt.done] Error 90
make: Target 'all' not remade because of errors.
Comparing the results of running each path by itself with those from the full menu
ERROR: Execution of the full HLT menu failed.
Please check the contents of 'hlt.log' for details.
exit status: 1
done

[3] Reverting #37283 in full is not a good option, because that PR introduced functionalities needed to test the latest HLT menus.

@cmsbuild
Contributor

A new Issue was created by @missirol Marino Missiroli.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor

assign core, hlt

@cmsbuild
Contributor

New categories assigned: core,hlt

@missirol,@Dr15Jones,@smuzaffar,@makortel,@Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

I remember seeing this kind of error

NameError: name 'cms' is not defined

recently in other tests too (was unable to find those now though). I wonder if this could be e.g. a CVMFS issue on a worker node?

@missirol
Contributor Author

Just noting here another occurrence of the issue in CMSSW_12_3_X_2022-04-25-2300.

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_3_X_2022-04-25-2300/slc7_amd64_gcc10/runIB.log

02:28:48 hltIntegrationTests /dev/CMSSW_12_3_0/HIon/V72 -d HLT_Integration_HIon_MC -i file:../RelVal_Raw_HIon_MC.root -n 100 -j 4 --mc -x --globaltag=auto:run3_mc_HIon -x --type=HIon >& HLT_Integration_HIon_MC.log
2.097u 1.186s 0:05.99 54.5%    0+0k 2373016+976io 49958pf+0w
02:28:54 exit status: 1

02:28:54 hltIntegrationTests /dev/CMSSW_12_3_0/PIon/V72 -d HLT_Integration_PIon_MC -i file:../RelVal_Raw_PIon_MC.root -n 100 -j 4 --mc -x --globaltag=auto:run3_mc_PIon -x --type=PIon >& HLT_Integration_PIon_MC.log
2.131u 1.264s 0:10.79 31.4%    0+0k 4129304+1000io 21167pf+0w
02:29:05 exit status: 1

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_3_X_2022-04-25-2300/slc7_amd64_gcc10/HLT_Integration_HIon_MC.log

stty: standard input: Inappropriate ioctl for device
Traceback (most recent call last):
  File "/pool/condor/dir_222511/jenkins/workspace/ib-run-HLT/CMSSW_12_3_X_2022-04-25-2300/bin/slc7_amd64_gcc10/hltListPaths", line 190, in <module>
    paths = getPathList(config)
  File "/pool/condor/dir_222511/jenkins/workspace/ib-run-HLT/CMSSW_12_3_X_2022-04-25-2300/bin/slc7_amd64_gcc10/hltListPaths", line 32, in getPathList
    raise Exception(f'query did not return a valid HLT menu:\n query="{cmdline}"')
Exception: query did not return a valid HLT menu:
 query="hltConfigFromDB --run3 --v3 --configName /dev/CMSSW_12_3_0/HIon/V72 --noedsources --noes --noservices"
Will run 0 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
HLT menu: hltGetConfiguration /dev/CMSSW_12_3_0/HIon/V72 --full --offline --mc --input file:../RelVal_Raw_HIon_MC.root --unprescale --process TEST20220426022851 --max-events 100 --globaltag=auto:run3_mc_HIon --type=HIon
Traceback (most recent call last):
  File "/pool/condor/dir_222511/jenkins/workspace/ib-run-HLT/CMSSW_12_3_X_2022-04-25-2300/bin/slc7_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 10, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02730/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-04-24-0000/bin/slc7_amd64_gcc10/edmConfigDump", line 25, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 10, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined
Preparing single-path configurations
Running...
    full menu dump
make: *** [.makefile:17: hlt.done] Error 90
    full menu dump
make: *** [.makefile:17: hlt.done] Error 90
make: Target 'all' not remade because of errors.
Comparing the results of running each path by itself with those from the full menu
ERROR: Execution of the full HLT menu failed.
Please check the contents of 'hlt.log' for details.
exit status: 1
done

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_3_X_2022-04-25-2300/slc7_amd64_gcc10/HLT_Integration_PIon_MC.log

stty: standard input: Inappropriate ioctl for device
Traceback (most recent call last):
  File "/pool/condor/dir_222511/jenkins/workspace/ib-run-HLT/CMSSW_12_3_X_2022-04-25-2300/bin/slc7_amd64_gcc10/hltListPaths", line 190, in <module>
    paths = getPathList(config)
  File "/pool/condor/dir_222511/jenkins/workspace/ib-run-HLT/CMSSW_12_3_X_2022-04-25-2300/bin/slc7_amd64_gcc10/hltListPaths", line 32, in getPathList
    raise Exception(f'query did not return a valid HLT menu:\n query="{cmdline}"')
Exception: query did not return a valid HLT menu:
 query="hltConfigFromDB --run3 --v3 --configName /dev/CMSSW_12_3_0/PIon/V72 --noedsources --noes --noservices"
Will run 0 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
HLT menu: hltGetConfiguration /dev/CMSSW_12_3_0/PIon/V72 --full --offline --mc --input file:../RelVal_Raw_PIon_MC.root --unprescale --process TEST20220426022856 --max-events 100 --globaltag=auto:run3_mc_PIon --type=PIon
Traceback (most recent call last):
  File "/pool/condor/dir_222511/jenkins/workspace/ib-run-HLT/CMSSW_12_3_X_2022-04-25-2300/bin/slc7_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 10, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02730/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_3_X_2022-04-24-0000/bin/slc7_amd64_gcc10/edmConfigDump", line 25, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 10, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined
Preparing single-path configurations
Running...
    full menu dump
make: *** [.makefile:17: hlt.done] Error 90
    full menu dump
make: *** [.makefile:17: hlt.done] Error 90
make: Target 'all' not remade because of errors.
Comparing the results of running each path by itself with those from the full menu
ERROR: Execution of the full HLT menu failed.
Please check the contents of 'hlt.log' for details.
exit status: 1
done

@missirol
Contributor Author

Another occurrence of the issue in CMSSW_12_3_X_2022-04-27-1100.

Errors are similar to #37598 (comment). Example:

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_3_X_2022-04-27-1100/slc7_amd64_gcc10/runIB.log

18:27:12 hltIntegrationTests /dev/CMSSW_12_3_0/PIon/V72 -d HLT_Integration_PIon_MC -i file:../RelVal_Raw_PIon_MC.root -n 100 -j 4 --mc -x --globaltag=auto:run3_mc_PIon -x --type=PIon >& HLT_Integration_PIon_MC.log
11.007u 2.286s 0:38.04 34.9%	0+0k 1986120+920io 52369pf+0w
18:27:50 exit status: 1

18:27:50 hltIntegrationTests /dev/CMSSW_12_3_0/PRef/V72 -d HLT_Integration_PRef_MC -i file:../RelVal_Raw_PRef_MC.root -n 100 -j 4 --mc -x --globaltag=auto:run3_mc_PRef -x --type=PRef >& HLT_Integration_PRef_MC.log
1.996u 0.622s 0:03.61 72.2%	0+0k 524288+944io 1778pf+0w
18:27:54 exit status: 1
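
Failures of this kind tend to finish within seconds, so scanning runIB.log for short, failed runs can help spot them. The following is a rough sketch that assumes the three-line layout of the excerpts above (command line, `time` summary, exit status); the function names `elapsed_seconds` and `summarize` are hypothetical and not part of any CMSSW tool.

```python
import re

# Patterns for the three kinds of lines seen in the runIB.log excerpts.
CMD_RE = re.compile(r"^\d\d:\d\d:\d\d hltIntegrationTests (\S+) ")
TIME_RE = re.compile(r"^[\d.]+u [\d.]+s ([\d:.]+) ")
EXIT_RE = re.compile(r"^\d\d:\d\d:\d\d exit status: (\d+)")


def elapsed_seconds(text):
    """Convert 'ss.ss', 'm:ss.ss' or 'h:mm:ss.ss' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def summarize(log_text):
    """Return a list of (menu, elapsed seconds, exit status) tuples."""
    results = []
    menu = elapsed = None
    for line in log_text.splitlines():
        line = line.strip()
        if m := CMD_RE.match(line):
            menu, elapsed = m.group(1), None
        elif menu and (m := TIME_RE.match(line)):
            elapsed = elapsed_seconds(m.group(1))
        elif menu and (m := EXIT_RE.match(line)):
            results.append((menu, elapsed, int(m.group(1))))
            menu = None
    return results
```

Filtering the result for a non-zero exit status and a small elapsed time would single out runs such as the PRef one above, which ended after 3.61 seconds.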

@missirol
Contributor Author

missirol commented May 3, 2022

Another occurrence of the issue in CMSSW_12_3_X_2022-05-02-2300. Errors are similar to #37598 (comment).

In the last 10 days, the issue has continued to appear in 12_3_X IBs, but not in 12_4_X IBs (maybe it is just a coincidence). The HLT menus in those releases are the same. Is there anything different in how IBs run for 12_3_X and 12_4_X? (generic question, but I'm trying to figure out if something could explain the apparent lack of issues in recent 12_4_X IBs)

@missirol
Contributor Author

Another occurrence of the issue in CMSSW_12_4_X_2022-05-13-2300. Errors are similar to #37598 (comment), but this time only for the HIon menu.

Regarding my earlier question ("Is there anything different in how IBs run for 12_3_X and 12_4_X?"): this latest failure was in 12_4_X (master), which suggests that 12_3_X and 12_4_X IBs do not differ as far as this particular problem is concerned.

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-05-13-2300/slc7_amd64_gcc10/runIB.log

[..]
03:01:32 hltIntegrationTests /dev/CMSSW_12_3_0/GRun/V79 -d HLT_Integration_GRun_MC -i file:../RelVal_Raw_GRun_MC.root -n 100 -j 4 --mc -x --globaltag=auto:run3_mc_GRun -x --type=GRun >& HLT_Integration_GRun_MC.log
25416.330u 6401.524s 3:37:51.43 243.4%	0+0k 1954842544+1449568io 8562759pf+0w
06:39:23 exit status: 0

06:39:23 hltIntegrationTests /dev/CMSSW_12_3_0/HIon/V79 -d HLT_Integration_HIon_MC -i file:../RelVal_Raw_HIon_MC.root -n 100 -j 4 --mc -x --globaltag=auto:run3_mc_HIon -x --type=HIon >& HLT_Integration_HIon_MC.log
109.351u 51.588s 2:45.39 97.3%	0+0k 30597312+12816io 315924pf+0w
06:42:09 exit status: 1

06:42:09 hltIntegrationTests /dev/CMSSW_12_3_0/PIon/V79 -d HLT_Integration_PIon_MC -i file:../RelVal_Raw_PIon_MC.root -n 100 -j 4 --mc -x --globaltag=auto:run3_mc_PIon -x --type=PIon >& HLT_Integration_PIon_MC.log
105.738u 15.689s 1:35.85 126.6%	0+0k 13181608+73944io 74766pf+0w
06:43:45 exit status: 0

06:43:45 hltIntegrationTests /dev/CMSSW_12_3_0/PRef/V79 -d HLT_Integration_PRef_MC -i file:../RelVal_Raw_PRef_MC.root -n 100 -j 4 --mc -x --globaltag=auto:run3_mc_PRef -x --type=PRef >& HLT_Integration_PRef_MC.log
2582.033u 907.994s 51:46.90 112.3%	0+0k 97426000+429728io 462013pf+0w
07:35:32 exit status: 0
[..]

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-05-13-2300/slc7_amd64_gcc10/HLT_Integration_HIon_MC.log

stty: standard input: Inappropriate ioctl for device
Will run 429 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
HLT menu: hltGetConfiguration /dev/CMSSW_12_3_0/HIon/V79 --full --offline --mc --input file:../RelVal_Raw_HIon_MC.root --unprescale --process TEST20220514064003 --max-events 100 --globaltag=auto:run3_mc_HIon --type=HIon
Traceback (most recent call last):
  File "/pool/condor/dir_18534/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-05-13-2300/bin/slc7_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 10, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_4_X_2022-05-13-2300/bin/slc7_amd64_gcc10/edmConfigDump", line 26, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 10, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined
[..]

@missirol
Contributor Author

Another occurrence of the issue in CMSSW_12_4_X_2022-05-17-2300.

This intermittent issue keeps appearing, so it might be useful to start thinking about a way to solve it via software (e.g. retrying the query).
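
As a rough illustration of what retrying the query could look like, here is a minimal sketch. It is not the fix that was adopted: the helper `run_with_retries`, its parameters and the linear back-off are all assumptions made for the example, and `is_valid` stands for whatever check decides that the returned text is a usable HLT menu.

```python
import subprocess
import time


def run_with_retries(cmd, is_valid, attempts=3, delay=5.0, runner=subprocess.run):
    """Run `cmd` until it exits with 0 and `is_valid(stdout)` is true.

    Hypothetical helper: `runner` is injectable so that the retry logic
    can be exercised without touching the database.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        proc = runner(cmd, capture_output=True, text=True)
        if proc.returncode == 0 and is_valid(proc.stdout):
            return proc.stdout
        last_error = f"attempt {attempt}: exit code {proc.returncode}"
        if attempt < attempts:
            # Linear back-off: wait a little longer after each failure.
            time.sleep(delay * attempt)
    raise RuntimeError(
        f"command failed after {attempts} attempts ({last_error}): {cmd}"
    )
```

The output is validated in addition to the exit code because, in the logs above, the symptom is a dump that is not a valid menu rather than an obvious crash of the query itself.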

@missirol
Contributor Author

Other occurrences of this issue in
CMSSW_12_5_X_2022-05-23-2300
CMSSW_12_5_X_2022-05-27-1100

@missirol
Contributor Author

The problem hasn't shown up in the IBs of the last ten days or so.

I don't know why; I just wonder if anything related to the DB (and/or the queries to it) has changed.

@missirol
Contributor Author

As far as I can see, this problem has not re-appeared, so something must have improved. :)

@missirol
Contributor Author

The issue re-appeared in CMSSW_12_4_X_2022-08-12-1100.

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-08-12-1100/el8_amd64_gcc10/HLT_Integration_PIon_DATA.log

stty: 'standard input': Inappropriate ioctl for device
Will run 6 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
HLT menu: hltGetConfiguration /dev/CMSSW_12_4_0/PIon/V94 --full --offline --data --input file:../RelVal_Raw_PIon_DATA.root --unprescale --process TEST20220812172737 --max-events 100 --globaltag=auto:run3_hlt_PIon --type=PIon
Traceback (most recent call last):
  File "/pool/condor/dir_39524/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-08-12-1100/bin/el8_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 10, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02745/el8_amd64_gcc10/cms/cmssw/CMSSW_12_4_X_2022-08-11-2300/bin/el8_amd64_gcc10/edmConfigDump", line 26, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 10, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined
Preparing single-path configurations
Running...
	full menu dump
make: *** [.makefile:17: hlt.done] Error 90
	HLTriggerFirstPath
	Status_OnGPU
	HLT_Physics_v8
	Status_OnCPU
make: *** [.makefile:23: HLT_Physics_v8.done] Error 90
make: *** [.makefile:23: HLTriggerFirstPath.done] Error 90
	HLT_Random_v3
	HLT_ZeroBias_v7
make: *** [.makefile:23: Status_OnCPU.done] Error 90
make: Target 'HLTriggerFirstPath' not remade because of errors.
make: Target 'Status_OnCPU' not remade because of errors.
make: Target 'HLT_Physics_v8' not remade because of errors.
make: *** [.makefile:23: Status_OnGPU.done] Error 90
make: Target 'Status_OnGPU' not remade because of errors.
make: *** [.makefile:23: HLT_Random_v3.done] Error 90
make: *** [.makefile:23: HLT_ZeroBias_v7.done] Error 90
make: Target 'HLT_Random_v3' not remade because of errors.
make: Target 'HLT_ZeroBias_v7' not remade because of errors.
Comparing the results of running each path by itself with those from the full menu
ERROR: Execution of the full HLT menu failed.
Please check the contents of 'hlt.log' for details.
exit status: 1
done

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-08-12-1100/el8_amd64_gcc10/HLT_Integration_PRef_DATA.log

stty: 'standard input': Inappropriate ioctl for device
Traceback (most recent call last):
  File "/pool/condor/dir_39524/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-08-12-1100/bin/el8_amd64_gcc10/hltListPaths", line 190, in <module>
    paths = getPathList(config)
  File "/pool/condor/dir_39524/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-08-12-1100/bin/el8_amd64_gcc10/hltListPaths", line 32, in getPathList
    raise Exception(f'query did not return a valid HLT menu:\n query="{cmdline}"')
Exception: query did not return a valid HLT menu:
 query="hltConfigFromDB --run3 --v3 --configName /dev/CMSSW_12_4_0/PRef/V94 --noedsources --noes --noservices"
Will run 0 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
HLT menu: hltGetConfiguration /dev/CMSSW_12_4_0/PRef/V94 --full --offline --data --input file:../RelVal_Raw_PRef_DATA.root --unprescale --process TEST20220812172745 --max-events 100 --globaltag=auto:run3_hlt_PRef --type=PRef
Traceback (most recent call last):
  File "/pool/condor/dir_39524/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-08-12-1100/bin/el8_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 10, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02745/el8_amd64_gcc10/cms/cmssw/CMSSW_12_4_X_2022-08-11-2300/bin/el8_amd64_gcc10/edmConfigDump", line 26, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 10, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined
Preparing single-path configurations
Running...
	full menu dump
make: *** [.makefile:17: hlt.done] Error 90
	full menu dump
make: *** [.makefile:17: hlt.done] Error 90
make: Target 'all' not remade because of errors.
Comparing the results of running each path by itself with those from the full menu
ERROR: Execution of the full HLT menu failed.
Please check the contents of 'hlt.log' for details.
exit status: 1
done

@missirol
Contributor Author

Another instance of the issue was in CMSSW_12_5_X_2022-08-17-1100. The errors are virtually identical to #37598 (comment).

@missirol
Contributor Author

Another instance of the issue was in CMSSW_12_5_X_2022-08-24-2300.

https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_5_X_2022-08-24-2300/el8_amd64_gcc10/HLT_Integration_HIon_DATA.log

stty: 'standard input': Inappropriate ioctl for device
Traceback (most recent call last):
  File "/pool/condor/dir_60795/jenkins/workspace/ib-run-HLT/CMSSW_12_5_X_2022-08-24-2300/bin/el8_amd64_gcc10/hltListPaths", line 190, in <module>
    paths = getPathList(config)
  File "/pool/condor/dir_60795/jenkins/workspace/ib-run-HLT/CMSSW_12_5_X_2022-08-24-2300/bin/el8_amd64_gcc10/hltListPaths", line 32, in getPathList
    raise Exception(f'query did not return a valid HLT menu:\n query="{cmdline}"')
Exception: query did not return a valid HLT menu:
 query="hltConfigFromDB --run3 --v3 --configName /dev/CMSSW_12_4_0/HIon/V110 --noedsources --noes --noservices"
Will run 0 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
HLT menu: hltGetConfiguration /dev/CMSSW_12_4_0/HIon/V110 --full --offline --data --input file:../RelVal_Raw_HIon_DATA.root --unprescale --process TEST20220825045934 --max-events 100 --globaltag=auto:run3_hlt_HIon --type=HIon
Traceback (most recent call last):
  File "/pool/condor/dir_60795/jenkins/workspace/ib-run-HLT/CMSSW_12_5_X_2022-08-24-2300/bin/el8_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 5, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/week1/el8_amd64_gcc10/cms/cmssw/CMSSW_12_5_X_2022-08-24-2300/bin/el8_amd64_gcc10/edmConfigDump", line 26, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 5, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined
Preparing single-path configurations
Running...
	full menu dump
make: *** [.makefile:17: hlt.done] Error 90
	full menu dump
make: *** [.makefile:17: hlt.done] Error 90
make: Target 'all' not remade because of errors.
Comparing the results of running each path by itself with those from the full menu
ERROR: Execution of the full HLT menu failed.
Please check the contents of 'hlt.log' for details.
exit status: 1
done

@missirol
Contributor Author

missirol commented Sep 4, 2022

@missirol
Contributor Author

missirol commented Oct 9, 2022

@missirol
Contributor Author

Another instance of this issue was in CMSSW_12_4_X_2022-10-11-1100, albeit with a somewhat new error message [*].

(I know I sound like a broken record; I just mean to highlight that the issue persists; when there are less urgent matters, I will try to come up with a solution, e.g. #39345 (comment); ETA: EOY).

[*] https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-10-11-1100/el8_amd64_gcc10/HLT_Integration_GRun_MC.log

stty: 'standard input': Inappropriate ioctl for device
Will run 674 HLT paths over 100 events, with 4 jobs in parallel
Extracting full menu dump
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/cvmfs/cms-ib.cern.ch/week0/el8_amd64_gcc10/external/python3/3.9.6-67e5cf5b4952101922f1d4c8474baa39/lib/python3.9/http/client.py", line 1349, in getresponse
    response.begin()
  File "/cvmfs/cms-ib.cern.ch/week0/el8_amd64_gcc10/external/python3/3.9.6-67e5cf5b4952101922f1d4c8474baa39/lib/python3.9/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/cvmfs/cms-ib.cern.ch/week0/el8_amd64_gcc10/external/python3/3.9.6-67e5cf5b4952101922f1d4c8474baa39/lib/python3.9/http/client.py", line 285, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-requests/2.26.0-0d6433445dfa3a94b84d1ce98b51f46e/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-urllib3/1.26.6-504ee060441080cce4ff715292ff47ca/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/cvmfs/cms-ib.cern.ch/week0/el8_amd64_gcc10/external/python3/3.9.6-67e5cf5b4952101922f1d4c8474baa39/lib/python3.9/http/client.py", line 1349, in getresponse
    response.begin()
  File "/cvmfs/cms-ib.cern.ch/week0/el8_amd64_gcc10/external/python3/3.9.6-67e5cf5b4952101922f1d4c8474baa39/lib/python3.9/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/cvmfs/cms-ib.cern.ch/week0/el8_amd64_gcc10/external/python3/3.9.6-67e5cf5b4952101922f1d4c8474baa39/lib/python3.9/http/client.py", line 285, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/pool/condor/dir_35162/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-10-11-1100/bin/el8_amd64_gcc10/hltGetConfiguration", line 251, in <module>
    print(confdb.HLTProcess(config).dump())
  File "/pool/condor/dir_35162/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-10-11-1100/python/HLTrigger/Configuration/Tools/confdb.py", line 53, in __init__
    self.converter = OfflineConverter(version = self.config.menu.version, database = self.config.menu.database, proxy = self.config.proxy, proxyHost = self.config.proxy_host, proxyPort = self.config.proxy_port)
  File "/pool/condor/dir_35162/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-10-11-1100/python/HLTrigger/Configuration/Tools/confdbOfflineConverter.py", line 131, in __init__
    version_website = requests.get(self.baseUrl+"/../confdb.version").text
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-requests/2.26.0-0d6433445dfa3a94b84d1ce98b51f46e/lib/python3.9/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-requests/2.26.0-0d6433445dfa3a94b84d1ce98b51f46e/lib/python3.9/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-requests/2.26.0-0d6433445dfa3a94b84d1ce98b51f46e/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-requests/2.26.0-0d6433445dfa3a94b84d1ce98b51f46e/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/external/py3-requests/2.26.0-0d6433445dfa3a94b84d1ce98b51f46e/lib/python3.9/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
HLT menu: hltGetConfiguration /dev/CMSSW_12_4_0/GRun/V145 --full --offline --mc --input file:../RelVal_Raw_GRun_MC.root --unprescale --process TEST20221011111238 --max-events 100 --globaltag=auto:run3_mc_GRun --type=GRun
Traceback (most recent call last):
  File "/pool/condor/dir_35162/jenkins/workspace/ib-run-HLT/CMSSW_12_4_X_2022-10-11-1100/bin/el8_amd64_gcc10/hltCheckPrescaleModules", line 25, in <module>
    exec(open(name).read(), globals(), menu.__dict__)
  File "<string>", line 4, in <module>
NameError: name 'cms' is not defined
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/nweek-02754/el8_amd64_gcc10/cms/cmssw/CMSSW_12_4_X_2022-10-09-0000/bin/el8_amd64_gcc10/edmConfigDump", line 26, in <module>
    loader.exec_module(mod)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "hlt.py", line 4, in <module>
    process.hltTriggerSummaryAOD = cms.EDProducer( "TriggerSummaryProducerAOD",
NameError: name 'cms' is not defined
Preparing single-path configurations
Running...
	full menu dump
make: *** [.makefile:17: hlt.done] Error 90
	HLT_AK8PFJet360_TrimMass30_v20
	Status_OnGPU
	Status_OnCPU
	HLTriggerFirstPath
make: *** [.makefile:23: Status_OnCPU.done] Error 90
	HLT_AK8PFJet380_TrimMass30_v13

[..]

Comparing the results of running each path by itself with those from the full menu
ERROR: Execution of the full HLT menu failed.
Please check the contents of 'hlt.log' for details.
exit status: 1
done

@missirol
Contributor Author

missirol commented Nov 2, 2022

+hlt

I will try to come up with a solution, e.g. #39345 (comment); ETA: end of the year.

#40004 and its backports have removed queries to ConfDB in IB tests. This should, by construction, remove occurrences of this issue for 12_4_X and higher, so I'm signing this.

Having said that, the root cause of these failures (see also #39345) still escapes me. The symptom is a failure to download configurations from ConfDB (usually only some of them within a given IB), which leads to invalid cfg files. The nodes running the IB tests don't have /afs access, so the ConfDB .jar files are downloaded locally, but it's unclear (to me) whether this is part of the issue. The code seems to account for multiple downloads happening simultaneously, but I didn't try to stress-test this.
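For tests that still need to query ConfDB, one possible mitigation (not what #40004 does, and not something that exists in `confdbOfflineConverter.py` as far as this thread shows) would be to retry the download and reject an obviously incomplete result before it is written to `hlt.py`. The snippet below is only a minimal sketch, assuming the failures are transient. The names `fetch_with_retry` and `looks_like_cfg` are hypothetical, and `fetch` stands in for whatever call performs a single download.

```python
import time


def looks_like_cfg(text):
    """Cheap sanity check (an assumption, not a full validation):
    a usable cfg dump is expected to import the `cms` namespace."""
    return "import FWCore.ParameterSet.Config as cms" in text


def fetch_with_retry(fetch, attempts=3, base_delay=1.0,
                     is_valid=bool, sleep=time.sleep):
    """Call fetch() until it returns a valid result or attempts run out.

    fetch    : zero-argument callable performing one download (hypothetical)
    is_valid : predicate on the result; an invalid result is retried too
    sleep    : injectable for testing, defaults to time.sleep
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            result = fetch()
            if is_valid(result):
                return result
            last_error = ValueError("download returned an invalid result")
        except OSError as err:
            # The built-in ConnectionError is a subclass of OSError, and
            # requests.exceptions.ConnectionError derives from it as well.
            last_error = err
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"download failed after {attempts} attempts") from last_error
```

With something like this, a `RemoteDisconnected` on the first attempt would be retried instead of leaving behind a cfg file in which `cms` is undefined. It would not explain the root cause, only make the tests less sensitive to it.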

@makortel
Contributor

+core

Although in the end there wasn't much (anything?) for core.

@makortel
Contributor

@cmsbuild, please close

@cmsbuild
Contributor

This issue is fully signed and ready to be closed.
