New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add protection for lhe length #9184
Conversation
…ive of matrix element generator error and can cause worse problems downstream)
+1 |
This pull request is fully signed and it will be integrated in one of the next CMSSW_7_5_X IBs unless changes or unless it breaks tests. This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @nclopezo, @smuzaffar |
(this would be useful to have in pre5 so that it can go through gen relvals, given that this will need to be backported to 71x in the near future) |
This pull request is fully signed and it will be integrated in one of the next CMSSW_7_5_X IBs unless changes (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @nclopezo, @smuzaffar |
This pull request is fully signed and it will be integrated in one of the next CMSSW_7_5_X IBs unless changes (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @nclopezo, @smuzaffar |
I'm not excited by this change - especially if the idea is that it can be exercised in production. Is there no way to pass this information up to the source to say "stop, there is nothing else to do"? |
Hi David, The problem here was that sometimes madgraph produces an lhe file with a different number of events. This is an error condition on the madgraph side, and we want those jobs to fail. This protection could also be added in the external lhe script, but adding it directly in the code also protects against some corruption or error condition in the lhe parsing which results in the wrong number of events being read. In general for all such cases the physics content of the output is suspect, and we want the job to fail rather than writing out a valid output with fewer events. |
@bendavid - is the lhe script protection fix expected soon? |
+1 |
No, the idea was that if the protection is added here, then no protection is needed in the script. The problem with adding the protection in the script is that this requires using grep/awk to count xml tags from the ascii file, which is ugly, and error prone. Having it in the ExternalLHEProducer means that the counting is done properly using the xml parser in LHEReader. |
(the protection actually WAS present in older versions of the script used for previous madgraph versions in run 1, but was at first inadvertently not included in the new scripts for run2/cvmfs/generic madgraph/powheg/etc usage) |
sorry - but then we discover only in gen-sim production?
|
Add protection for lhe length
No. Whether the protection is in the script or in the externallheproducer , both will trigger a failure at the wmlhe step. The present situation does cause failures in the gen-sim step, since the wmlhe step succeeds, but produces an edm file where some events are missing the lheevent product. This was only seen in production because it is a low rate and process-dependent failure in Madgraph. |
ok - thanks
|
So from the framework perspective, this now just looks like a runtime failure of a module which is definitely something we support. |
Explicitly check that lhe contains exactly the requested number of events, and throw an exception otherwise.
Without this check/protection, a too-short lhe file would result in the job silently producing a file where the additional edm events at the end were missing the LHEEvent product, leading to GEN-SIM failures downstream in later jobs which are difficult to debug and recover.
see issue discussed at
https://hypernews.cern.ch/HyperNews/CMS/get/dataopsrequests/6807/1.html