`hltIntegrationTests` tests failing randomly in IBs #37598
Comments
A new Issue was created by @missirol Marino Missiroli. @Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign core, hlt |
New categories assigned: core,hlt @missirol,@Dr15Jones,@smuzaffar,@makortel,@Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks |
I remember seeing this kind of error
recently in other tests too (I was unable to find those now, though). I wonder if this could be e.g. a CVMFS issue on a worker node? |
Just noting here another occurrence of the issue in
|
Another occurrence of the issue in Errors are similar to #37598 (comment). Example:
|
Another occurrence of the issue in In the last 10 days, the issue has continued to appear in |
Another occurrence of the issue in
This latest failure was in
|
Another occurrence of the issue in This intermittent issue keeps appearing, so it might be useful to start thinking about a way to solve it via software (e.g. retrying the query). |
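A minimal sketch of the retry idea mentioned above, assuming the ConfDB query is issued by running a command-line tool and that a failed download shows up either as a non-zero exit code or as unusable output. The helper name `fetch_with_retries`, its parameters, and the linear back-off are hypothetical; none of them is part of `hltIntegrationTests` or of any existing CMSSW script.

```python
import subprocess
import time


def fetch_with_retries(cmd, is_valid, attempts=3, delay=5.0, timeout=None):
    """Run `cmd` until it exits with 0 and `is_valid(stdout)` is true.

    Returns the captured stdout of the first good attempt and raises
    RuntimeError if every attempt fails. All names here are illustrative.
    """
    last_error = "no attempt made"
    for attempt in range(1, attempts + 1):
        try:
            proc = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            last_error = f"attempt {attempt}: timed out"
        else:
            if proc.returncode == 0 and is_valid(proc.stdout):
                return proc.stdout
            last_error = (
                f"attempt {attempt}: exit code {proc.returncode} "
                "or output rejected by the validity check"
            )
        if attempt < attempts:
            # Linear back-off: wait a bit longer after each failed attempt.
            time.sleep(delay * attempt)
    raise RuntimeError(
        f"command failed after {attempts} attempts ({last_error})"
    )
```

Here `cmd` could be the `hltGetConfiguration` invocation already used by the test, and `is_valid` a check that the dumped text is usable; the exact arguments are left out because they depend on the menu being tested. A retry of this kind would only mask the intermittent failure, not explain it.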
Other occurrences of this issue in |
The problem hasn't shown up in the IBs of the last ten days, or so. I don't know why; I just wonder if anything related to the DB (and/or the queries to it) has changed. |
As far as I can see, this problem has not re-appeared, so something must have improved. :) |
The issue re-appeared in
|
Another instance of the issue was in |
Another instance of the issue was in
|
Another instance of the issue was in |
Another instance of this issue was in |
Another instance of this issue was in (I know I sound like a broken record; I just mean to highlight that the issue persists; when there are less urgent matters, I will try to come up with a solution, e.g. #39345 (comment); ETA: EOY).
|
Another instance of this issue was in |
+hlt
#40004 and its backports have removed queries to Having said that, the root cause of these failures (see also #39345) still escapes me. The symptom is a failure in downloading configurations from ConfDB (only some of them usually, during the same IB), which leads to invalid cfg files. The nodes running tests in IB don't have |
+core Although in the end there wasn't much (anything?) for core. |
@cmsbuild, please close |
This issue is fully signed and ready to be closed. |
In recent IBs, there have been seemingly-random failures of the HLT-Validation tests, e.g.
https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-04-09-1100/slc7_amd64_gcc10/HLT_Integration_PIon_MC.log
https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_3_X_2022-04-11-1100/slc7_amd64_gcc10/HLT_Integration_PIon_MC.log
https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-04-11-2300/slc7_amd64_gcc10/HLT_Integration_PIon_MC.log
https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-04-16-1100/slc7_amd64_gcc10/HLT_Integration_PIon_MC.log
https://cmssdt.cern.ch/SDT/jenkins-artifacts/HLT-Validation/CMSSW_12_4_X_2022-04-16-1100/slc7_amd64_gcc10/HLT_Integration_PRef_MC.log
The first occurrences of the issue were briefly discussed in
#37304 (comment)
#37524 (comment)
The cause of the issue is unclear. It does not appear to be reproducible locally, and it seems to show up in IBs at random times. TSG also routinely runs these executables manually (i.e. not via IBs) during development, but I have yet to encounter this issue locally.
The error messages point to a failure to correctly download the HLT config file from the database, via the `hltListPaths` call here and/or the `hltGetConfiguration` call here, as part of the executable `hltIntegrationTests`.
Examples:
`hlt.py` dumped via `hltGetConfiguration` was not a valid python config;
`hltListPaths` failed, and then the ensuing call to `hltGetConfiguration` failed as well, causing an error from `hltCompareResults` (which read as input the invalid python config returned by `hltGetConfiguration`).
To my knowledge, the issue started to appear after the integration of #37283 (and its backport to `12_3_X`) [3]. That PR updated `hltListPaths`, making it maybe a bit slower; on the other hand, it did not update `hltGetConfiguration` in any way. Curiously, the error has so far shown up only for the PIon and PRef HLT menus, which are the two smallest menus being tested (so their download from the database is generally much quicker compared to other menus).
Given its non-reproducibility, it's unclear (to me) how to tackle this.
Could this be somehow an issue related to how these tests are run in IBs? (and/or how the database is queried in that case? are there any timeouts of any kind?)
[1]
[2]
[3] Reverting #37283 in full is not a good option, because that PR introduced functionalities needed to test the latest HLT menus.
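One of the symptoms described above is an `hlt.py` dump that is not a valid python config. As a rough sketch, such a file could be rejected early with a syntax-only check, before it is handed to later steps. The function name `config_syntax_error` is hypothetical, and the check assumes that a broken download yields text that is empty or fails to parse; a truncated file that still happens to parse would not be caught.

```python
def config_syntax_error(source, filename="hlt.py"):
    """Return None if `source` parses as Python, else a short error message.

    Only the syntax is checked: nothing is imported or executed, so this
    does not need a CMSSW environment. Names are illustrative.
    """
    if not source.strip():
        return f"{filename}: empty configuration"
    try:
        compile(source, filename, "exec")
    except (SyntaxError, ValueError) as exc:
        # ValueError covers e.g. null bytes on older Python versions.
        return f"{filename}: {exc}"
    return None
```

A caller could treat any non-`None` return value as a failed download and stop (or retry) at that point, rather than letting the invalid file surface later as an error in a different tool.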