Fix bug affecting SubProcesses #3846
Fix a bug affecting SubProcesses. The bug appears when input files are being merged, the input files have differing BranchIDLists, and the SubProcess produces something. In that case, an attempt to use a Ref or Ptr in the SubProcess, or in any subsequent job using its output, can result in a failed assert which aborts the job. The BranchListIndexes, which are used to reference Refs, become corrupt.
A new Pull Request was created by @wddgit (W. David Dagenhart) for CMSSW_7_1_X.
Fix bug affecting SubProcesses
It involves the following packages:
DataFormats/Provenance
@cmsbuild, @Degano, @Dr15Jones, @ktf, @nclopezo can you please review it and eventually sign? Thanks.
Does this include a unit test which would have caught this problem?
Yes. I extended an existing unit test. That one almost caught it except that the SubProcess was not producing anything.
+1
-1 ---> test TestPoolOutput had ERRORS you can see the results of the tests here:
-1 ---> test TestSecondaryInput had ERRORS you can see the results of the tests here:
Mmmm… rerunning gave the same errors, which are not expected. Can you try merging this topic and actually running scram b runtests in your own area?
Unit test fixes unrelated to the primary commit. They address problems that were probably causing the Jenkins unit tests to fail. I think some cleanup mechanism deletes the files created in early steps which are needed in later steps of the shell script.
Hi Giulio. I already ran the unit tests in my own area, both before I submitted the pull request and again just a couple of minutes ago. They pass. I think what is happening is unrelated to my pull request. The failing unit tests run a shell script that invokes cmsRun multiple times, and the later steps use files created in the earlier steps. It appears some cleanup mechanism is deleting the files. Most of our test shell scripts start with pushd ${LOCAL_TMP_DIR} and end with popd. The two scripts that just failed do not, so I added that and suspect it avoids the problem. I do not know why they started failing now.
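For illustration, the pushd/popd pattern described above might look roughly like the sketch below. This is not one of the actual CMSSW test scripts: the `echo` and `cat` steps are hypothetical stand-ins for successive cmsRun invocations, the file name `step1_out.txt` and the `die` helper are made up for the example, and the fallback to `mktemp -d` is only an assumption so the sketch runs outside the test environment, where LOCAL_TMP_DIR would normally already be set.

```shell
#!/bin/bash
# Hypothetical sketch of a multi-step test script; not an actual CMSSW test.
# LOCAL_TMP_DIR is assumed to be provided by the test environment;
# fall back to a fresh temporary directory so the sketch runs standalone.
LOCAL_TMP_DIR="${LOCAL_TMP_DIR:-$(mktemp -d)}"

# Illustrative helper: report a failure and exit with the given status.
die() { echo "$1: status $2"; exit "$2"; }

start_dir="$PWD"

# Run every step inside LOCAL_TMP_DIR so files written by early steps
# stay where later steps expect to find them.
pushd "${LOCAL_TMP_DIR}" > /dev/null || die "pushd failed" $?

# Step 1: stand-in for a cmsRun job that writes an output file.
echo "step1 output" > step1_out.txt || die "step1 failed" $?

# Step 2: stand-in for a later cmsRun job that reads the step 1 file.
step2_result="$(cat step1_out.txt)" || die "step2 failed" $?

# Return to the directory the script was started from.
popd > /dev/null || die "popd failed" $?
```

The point of the pattern is only that all intermediate files live in one known directory for the whole script, and that the script leaves the caller's working directory unchanged.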
The other thing to add is why I think these failures are unrelated to the pull request: almost all the changes I made are in code used only when there is a SubProcess, and the failing tests do not have SubProcesses. Anything is possible, but it is very unlikely. Let's see if the Jenkins tests pass now.
-1 runTheMatrix-results/4.22_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC/step2_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC.log you can see the results of the tests here:
Another failure unrelated to the pull request. The relval 4.22 is having a problem with the das query returning no files or something like that. It could not be related to the modifications in the pull request.
-1 ---> test TestFWCoreIntegrationStandalone had ERRORS you can see the results of the tests here:
Here is a fix for the standalone unit test failure. Probably this does not help, but there was some question whether CPPUNIT would run the two subtests in the right order, and this fix guarantees that. I suspect the actual cause of the recent errors is that the tests are being run on a machine that is faster or has more cores, so the cleanup occurs too early and files that are still being used get removed. None of this has anything to do with the changes in the original pull request.
Core -- Fix bug affecting SubProcesses
Merging this. We'll see tomorrow if the same issue is triggered in IBs relvals or not.