Throw a clear exception if OutputModule SelectEvents refers to a non-existent Path or Process #44767

Merged: 2 commits merged on May 1, 2024

makortel
Contributor

PR description:

This was already the case for a non-existent Path of the present process. A non-existent Path of an earlier Process, or a non-existent Process, led to a confusing exception message, and in some cases the job got stuck. One example of such a stuck job had an unrelated module in front of the OutputModule in the same EndPath.
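To illustrate the intended behavior, here is a minimal sketch in plain Python. It is a toy model, not the framework code: `ConfigurationError`, `validate_select_events`, and the `paths_by_process` mapping are hypothetical names, and the `pathName:processName` form of a SelectEvents entry is assumed here. The point is only that every entry is checked up front and a failure names the offending Path or Process.

```python
# Toy model only: this is NOT the CMSSW framework code or its API.
class ConfigurationError(Exception):
    """Stand-in for a framework 'Configuration' exception (hypothetical name)."""


def validate_select_events(select_events, paths_by_process, current_process):
    """Check every SelectEvents entry before any event is processed.

    select_events: list of 'pathName' or 'pathName:processName' strings.
    paths_by_process: dict mapping a process name to the set of its Path names.
    current_process: process assumed when an entry has no ':processName' part.
    """
    for entry in select_events:
        path, sep, process = entry.partition(":")
        if not sep:
            # No explicit process given: the entry refers to the present process.
            process = current_process
        if process not in paths_by_process:
            raise ConfigurationError(
                f"SelectEvents entry '{entry}' refers to unknown process '{process}'"
            )
        if path not in paths_by_process[process]:
            raise ConfigurationError(
                f"SelectEvents entry '{entry}' refers to unknown path '{path}' "
                f"in process '{process}'"
            )
```

With such a check, a misspelled Path of an earlier Process or a misspelled Process name would fail in the same clear way as a misspelled Path of the present process.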

Resolves #44743
Resolves #44744
Resolves cms-sw/framework-team#886

PR validation:

Added unit tests pass

@cmsbuild
Contributor

cmsbuild commented Apr 17, 2024

cms-bot internal usage

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44767/39985

  • This PR adds an extra 32KB to repository

@cmsbuild
Contributor

A new Pull Request was created by @makortel for master.

It involves the following packages:

  • FWCore/Framework (core)

@makortel, @smuzaffar, @Dr15Jones, @cmsbuild can you please review it and eventually sign? Thanks.
@missirol, @wddgit this is something you requested to watch as well.
@sextonkennedy, @rappoccio, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

@makortel
Contributor Author

@cmsbuild, please test

@makortel
Contributor Author

@Dr15Jones Please review

@cmsbuild
Contributor

-1

Failed Tests: RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b3c1e7/38906/summary.html
COMMIT: 503657b
CMSSW: CMSSW_14_1_X_2024-04-17-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/44767/38906/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

----- Begin Fatal Exception 17-Apr-2024 22:44:09 CEST-----------------------
An exception of category 'Configuration' occurred while
   [0] Processing  Event run: 165121 lumi: 62 event: 23609118 stream: 0
   [1] Running path'ALCARECOStreamLumiPixelsMinBiasOutPath'
   [2] Calling OutputModule prePrefetchSelection()
Exception Message:
EventSelector::init, An OutputModule is using SelectEvents
to request a trigger name that does not exist
The unknown trigger name is: pathALCARECOLumiPixelsMinBias
----- End Fatal Exception -------------------------------------------------

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44767/40013

…existent Path or Process

This was already the case for a non-existent Path of the present
process. A non-existent Path of an earlier Process, or a
non-existent Process, led to a confusing exception message, and in
some cases the job got stuck. One example of such a stuck job had
an unrelated module in front of the OutputModule in the same EndPath.
Path::runNextWorkerAsync() pretty much assumes the runWorkerAsync()
doesn't throw. runNextWorkerAsync() needs to schedule all eligible
workers for the rest of the system to behave correctly.
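
The scheduling concern in the commit message above can be sketched with a toy loop in plain Python. This is not the CMSSW implementation: `run_next_workers` and the callables it receives are hypothetical stand-ins for runNextWorkerAsync() and runWorkerAsync(), and how the real fix propagates the exception is not shown here. The sketch only demonstrates that a throwing worker must not prevent the remaining eligible workers from being scheduled.

```python
# Toy model only: names echo the commit message, but this is not CMSSW code.
def run_next_workers(workers):
    """Attempt to schedule every eligible worker, even if one of them throws.

    workers: list of zero-argument callables standing in for runWorkerAsync().
    Returns the number of scheduled workers when none of them failed.
    Otherwise raises the first captured exception, but only after every
    worker has been attempted.
    """
    first_error = None
    scheduled = 0
    for run_worker in workers:
        try:
            run_worker()
            scheduled += 1
        except Exception as exc:
            # Keep looping so that later workers are not silently skipped.
            if first_error is None:
                first_error = exc
    if first_error is not None:
        raise first_error
    return scheduled
```

If the loop instead stopped at the first exception, the workers after the failing one would never run, which matches the stuck-job symptom described for a module placed in front of the OutputModule.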
@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44767/40014

@makortel
Contributor Author

Is full_cmssw advisable here?

We can certainly do that, thanks for the suggestion. In any case, I was waiting for #44787 to be merged before launching the next round of tests (there is not much point in testing before that).

@makortel
Contributor Author

test parameters:

  • full_cmssw = true

@makortel
Contributor Author

@cmsbuild, please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b3c1e7/39180/summary.html
COMMIT: 27a4db4
CMSSW: CMSSW_14_1_X_2024-04-30-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/44767/39180/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

There are some workflows for which there are errors in the baseline:
29661.402 step 3
29661.403 step 3
29834.402 step 2
29834.403 step 2
The results for the comparisons for these workflows could be incomplete.
This most likely means that the IB has errors in the RelVals. The error does NOT come from this pull request.

@makortel
Contributor Author

There are some workflows for which there are errors in the baseline:
29661.402 step 3
29661.403 step 3
29834.402 step 2
29834.403 step 2

@smuzaffar These extra workflows were not run as part of the tests of this PR (or as part of IBs AFAICT). Are the failures reported in the test summary above perhaps because some other PR used the same baseline, and those extra workflows were requested for the tests of that PR?

@makortel
Contributor Author

Seems like this PR would be good to go. @Dr15Jones could you take a last look?

@smuzaffar
Contributor

Are the failures reported in the test summary above perhaps because some other PR used the same baseline, and those extra workflows were requested for the tests of that PR?

Yes, it looks like these extra workflows were requested by #44874 (comment). The bot should have ignored these workflow failures for the tests of this PR. I will look into this.

@makortel
Contributor Author

makortel commented May 1, 2024

+core

@cmsbuild
Contributor

cmsbuild commented May 1, 2024

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @rappoccio, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

@rappoccio
Contributor

+1

@smuzaffar
Contributor

Yes, it looks like these extra workflows were requested by #44874 (comment). The bot should have ignored these workflow failures for the tests of this PR. I will look into this.

cms-sw/cms-bot#2221 should fix this for future PR tests.
