Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add protection for lhe length #9184

Merged
merged 1 commit into from May 22, 2015

Conversation

bendavid
Copy link
Contributor

Explicitly check that lhe contains exactly the requested number of events, and throw an exception otherwise.

Without this check/protection, a too-short lhe file would result in the job silently producing a file where the additional edm events at the end were missing the LHEEvent product, leading to GEN-SIM failures downstream in later jobs which are difficult to debug and recover.
see issue discussed at
https://hypernews.cern.ch/HyperNews/CMS/get/dataopsrequests/6807/1.html

…ive of matrix element generator error and can cause worse problems downstream)
@bendavid
Copy link
Contributor Author

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_5_X IBs unless changes or unless it breaks tests. This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @nclopezo, @smuzaffar

@bendavid
Copy link
Contributor Author

(this would be useful to have in pre5 so that it can go through gen relvals, given that this will need to be backported to 71x in the near future)

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_5_X IBs unless changes (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @nclopezo, @smuzaffar

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_5_X IBs unless changes (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @davidlange6, @nclopezo, @smuzaffar

@davidlange6
Copy link
Contributor

I'm not excited by this change - especially if the idea is that it can be exercised in production. Is there no way to pass this information up to the source to say "stop, there is nothing else to do"?

@Dr15Jones

@bendavid
Copy link
Contributor Author

Hi David,
Let me explain a bit further. In all production configs, and also from cmsDriver, the generated configurations always have process.maxEvents.input and process.externalLHEProducer.nEvents set to the same value.

The problem here was that sometimes madgraph produces an lhe file with a different number of events. This is an error condition on the madgraph side, and we want those jobs to fail.

This protection could also be added in the external lhe script, but adding it directly in the code also protects against some corruption or error condition in the lhe parsing which results in the wrong number of events being read.

In general for all such cases the physics content of the output is suspect, and we want the job to fail rather than writing out a valid output with fewer events.
(and the current behaviour of writing a file with the LHEEventProduct missing for some events is definitely bad as well)

@davidlange6
Copy link
Contributor

@bendavid - is the lhe script protection fix expected soon?

@davidlange6
Copy link
Contributor

+1

@bendavid
Copy link
Contributor Author

No, the idea was that if the protection is added here, then no protection is needed in the script.

The problem with adding the protection in the script is that this requires using grep/awk to count xml tags from the ascii file, which is ugly, and error prone. Having it in the ExternalLHEProducer means that the counting is done properly using the xml parser in LHEReader.

@bendavid
Copy link
Contributor Author

(the protection actually WAS present in older versions of the script used for previous madgraph versions in run 1, but was at first inadvertently not included in the new scripts for run2/cvmfs/generic madgraph/powheg/etc usage)

@davidlange6
Copy link
Contributor

sorry - but then we discover only in gen-sim production?

On May 22, 2015, at 2:47 PM, Josh Bendavid notifications@github.com wrote:

No, the idea was that if the protection is added here, then no protection is needed in the script.

The problem with adding the protection in the script is that this requires using grep/awk to count xml tags from the ascii file, which is ugly, and error prone. Having it in the ExternalLHEProducer means that the counting is done properly using the xml parser in LHEReader.


Reply to this email directly or view it on GitHub.

cmsbuild added a commit that referenced this pull request May 22, 2015
@cmsbuild cmsbuild merged commit a96732c into cms-sw:CMSSW_7_5_X May 22, 2015
@bendavid
Copy link
Contributor Author

No. Whether the protection is in the script or in the externallheproducer , both will trigger a failure at the wmlhe step.

The present situation does cause failures in the gen-sim step, since the wmlhe step succeeds, but produces an edm file where some events are missing the lheevent product.

This was only seen in production because it is a low rate and process-dependent failure in Madgraph.

@davidlange6
Copy link
Contributor

ok - thanks

On May 22, 2015, at 2:58 PM, Josh Bendavid notifications@github.com wrote:

No. Whether the protection is in the script or in the externallheproducer , both will trigger a failure at the wmlhe step.

The present situation does cause failures in the gen-sim step, since the wmlhe step succeeds, but produces an edm file where some events are missing the lheevent product.

This was only seen in production because it is a low rate and process-dependent failure in Madgraph.


Reply to this email directly or view it on GitHub.

@Dr15Jones
Copy link
Contributor

So from the framework perspective, this now just looks like a runtime failure of a module which is definitely something we support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants