Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug affecting SubProcesses #3846

Merged
merged 3 commits into from May 16, 2014
Merged

Conversation

wddgit
Copy link
Contributor

@wddgit wddgit commented May 13, 2014

Fix a bug affecting SubProcesses. It affects
them when input files are being merged and
the input files have differing BranchIDLists
and the SubProcess produces something.
An attempt to use a Ref or Ptr in the SubProcess
or any subsequent job using its output can result
in a failed assert which aborts the job. The
BranchListIndexes which are used to reference
Ref's become corrupt.

Fix a bug affecting SubProcesses. It affects
them when input files are being merged and
the input files have differing BranchIDLists
and the SubProcess produces something.
An attempt to use a Ref or Ptr in the SubProcess
or any subsequent job using its output can result
in a failed assert which aborts the job. The
BranchListIndexes which are used to reference
Ref's become corrupt.
@cmsbuild
Copy link
Contributor

A new Pull Request was created by @wddgit (W. David Dagenhart) for CMSSW_7_1_X.

Fix bug affecting SubProcesses

It involves the following packages:

DataFormats/Provenance
FWCore/Framework
FWCore/Integration
IOPool/Input

@cmsbuild, @Degano, @Dr15Jones, @ktf, @nclopezo can you please review it and eventually sign? Thanks.
@wmtan this is something you requested to watch as well.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.
@nclopezo, @ktf you are the release manager for this.
You can merge this pull request by typing 'merge' in the first line of your comment.

@Dr15Jones
Copy link
Contributor

Does this include a unit test which would have caught this problem?

@wddgit
Copy link
Contributor Author

wddgit commented May 13, 2014

Yes. I extended an existing unit test. That one almost caught it except that the SubProcess was not producing anything.

@Dr15Jones
Copy link
Contributor

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_1_X IBs unless changes or unless it breaks tests. @nclopezo, @ktf can you please take care of it?

@cmsbuild
Copy link
Contributor

-1
I ran the usual tests and I found errors in the following unit tests:

---> test TestPoolOutput had ERRORS

you can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3846/1503/summary.html

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_1_X IBs unless changes (but tests are reportedly failing). @nclopezo, @ktf can you please take care of it?

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_7_1_X IBs unless changes (but tests are reportedly failing). @nclopezo, @ktf can you please take care of it?

@Dr15Jones
Copy link
Contributor

@nclopezo @ktf I can't see how this test could have failed because of this pull request. It looks like while this test was running, something deleted a file which was created earlier in the test (the log even says the file was made).
Please rerun the tests.

@cmsbuild
Copy link
Contributor

-1
I ran the usual tests and I found errors in the following unit tests:

---> test TestSecondaryInput had ERRORS
---> test TestPoolOutput had ERRORS

you can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3846/1531/summary.html

@cmsbuild
Copy link
Contributor

@ktf
Copy link
Contributor

ktf commented May 15, 2014

Mmmm… rerunning gave the same errors, which are not expected. Can you try merging this topic and actually running scram b runtests in your own area?

Unit test fixes unrelated to the primary commit
that were probably causing the jenkins unit
tests to fail. I think some cleanup mechanism
deletes the files created in early steps which
are needed in later steps of the shell script.
@cmsbuild
Copy link
Contributor

Pull request #3846 was updated. @cmsbuild, @Degano, @Dr15Jones, @ktf, @nclopezo can you please check and sign again.

@wddgit
Copy link
Contributor Author

wddgit commented May 15, 2014

Hi Giuio. I already ran the unit tests in my own area, before I submitted the pull request and just a couple minutes ago. They pass. I think what is happening is unrelated to my pull request. The failing unit tests run a shell script that invokes cmsRun multiple times, then the later steps use files created in the earlier steps. It appears some cleanup mechanism is deleted the files. Most of our test shell scripts start with pushd ${LOCAL_TMP_DIR} and end with popd. The two scripts that just failed do not so I added that in and suspect it avoids the problem. I do not know why they started failing now.

@wddgit
Copy link
Contributor Author

wddgit commented May 15, 2014

The other thing to add is that the reason I think these failures are unrelated to the pull request is that almost all the changes I made are in code used only when there is a SubProcess and the failing tests do not have SubProcess's. Anything is possible, but it is very unlikely .... Lets see if the Jenkins tests pass now.

@cmsbuild
Copy link
Contributor

-1
When I ran the RelVals I found an error in the following worklfows:
4.22 step2

runTheMatrix-results/4.22_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC/step2_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC.log

you can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3846/1542/summary.html

@wddgit
Copy link
Contributor Author

wddgit commented May 15, 2014

Another failure unrelated to the pull request. The relval 4.22 is having a problem with the das query returning no files or something like that. It could not be related to the modifications in the pull request.

@cmsbuild
Copy link
Contributor

-1
I ran the usual tests and I found errors in the following unit tests:

---> test TestFWCoreIntegrationStandalone had ERRORS

you can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3846/1556/summary.html

@cmsbuild
Copy link
Contributor

Here is a fix for the standalone unit test failure.
Probably this does not help, but there was some
question whether CPPUNIT would run the two subtests
in the right order and this fix guarantees that.

I suspect the actual cause of the recent errors is
that the tests are being run on a machine with more cores
or that is faster and the cleanup is occurring earlier,
too early. Files that are still being used are getting
cleaned up earlier than before.

None of this has anything to do with the changes
in the original pull request
ktf added a commit that referenced this pull request May 16, 2014
Core -- Fix bug affecting SubProcesses
@ktf ktf merged commit 802a6b2 into cms-sw:CMSSW_7_1_X May 16, 2014
@ktf
Copy link
Contributor

ktf commented May 16, 2014

Merging this. We'll see tomorrow if the same issue is triggered in IBs relvals or not.

@wddgit wddgit deleted the subProcessBugFix branch August 27, 2014 19:49
davidlange6 pushed a commit to davidlange6/cmssw that referenced this pull request Mar 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants