Use SerialTaskQueues to protect shared resources #15620

Merged
merged 9 commits into from Aug 27, 2016

Conversation

Dr15Jones
Contributor

Replaced the use of std::recursive_mutex with SerialTaskQueues when serializing modules to protect a shared resource. This allows the framework to schedule a TBB task only when that resource is free, thereby freeing up threads to do other work while the resource is in use.
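
To illustrate the idea in the paragraph above, here is a minimal sketch in standard C++ of a serial queue that never makes the pushing thread wait for the resource. It is not the CMSSW SerialTaskQueue and does not use TBB; `MiniSerialQueue` and `maxConcurrencyObserved` are hypothetical names invented for this example, and the "whoever finds the queue idle drains it" policy is an assumption chosen to keep the sketch short.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a serial task queue; NOT the CMSSW implementation.
class MiniSerialQueue {
public:
  // Enqueue a task. The caller never waits for the resource: if another
  // thread is already draining the queue, the task is left for that thread.
  void push(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push_back(std::move(task));
      if (draining_) return;  // resource busy: caller is free to do other work
      draining_ = true;       // resource free: this thread becomes the drainer
    }
    drain();
  }

private:
  // Runs queued tasks one at a time until the queue is empty.
  // Limitation of the sketch: a task that throws leaves draining_ set.
  void drain() {
    for (;;) {
      std::function<void()> next;
      {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) {
          draining_ = false;
          return;
        }
        next = std::move(tasks_.front());
        tasks_.pop_front();
      }
      next();  // executed outside the lock; only the drainer reaches this line
    }
  }

  std::mutex mutex_;
  std::deque<std::function<void()>> tasks_;
  bool draining_ = false;
};

// Pushes tasks from several threads and returns the largest number of tasks
// seen running at the same time, or -1 if some task did not run.
int maxConcurrencyObserved(int nThreads, int tasksPerThread) {
  MiniSerialQueue queue;
  std::atomic<int> active{0};
  std::atomic<int> maxActive{0};
  std::atomic<int> completed{0};

  std::vector<std::thread> threads;
  for (int t = 0; t < nThreads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < tasksPerThread; ++i) {
        queue.push([&] {
          int now = ++active;
          int prev = maxActive.load();
          while (now > prev && !maxActive.compare_exchange_weak(prev, now)) {
          }
          --active;
          ++completed;
        });
      }
    });
  }
  for (auto& th : threads) th.join();
  return completed.load() == nThreads * tasksPerThread ? maxActive.load() : -1;
}
```

Calling `maxConcurrencyObserved(4, 200)` should return 1: every task runs, and no two tasks overlap, even though no pushing thread ever blocks on a mutex held for the duration of a task.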

This satisfies phase 1.75 of the threaded framework development plan.

Unit test checks that no modules are run concurrently in a job where we only have legacy and one modules using shared resources.
-added move based functions
-pushAndWait will rethrow an exception thrown in the lambda
-added numberOfQueues() to be used by SharedResourcesAcquirer
Switched from using std::mutex to SerialTaskQueueChain to handle serialization when acquiring a shared resource.
The SharedResourcesAcquirer now primarily holds a SerialTaskQueueChain and only holds mutexes for sharing between the Source and the DelayedReader. SharedResourcesAcquirers are built to always contain at least one SerialTaskQueueChain, since we no longer have a per-module mutex and instead depend on having at least one queue.
This initial implementation uses the SerialTaskQueueChain in a synchronous fashion. Future changes will move to asynchronous usage.
We also need to ignore the case where the result returns a very small number, i.e. one with 'e-' in its value.
The decrementing of the ref count on the waiting task must be done after the exception is set in order to avoid the case where the wait_for_all is released and we return from pushAndWait before the value is assigned to the now gone away stack variable.
No longer do a SerialTaskQueueChain::pushAndWait when we are requested to run a legacy or one module as part of a data prefetch. Now the task can return immediately and the module will be run once its task reaches the head of the queue. This required moving the use of the SerialTaskQueueChain to the task launched after the prefetch has completed.
The mutexes were only used for an InputSource. Now the InputSource gets an explicit mutex along with its SharedResourcesAcquirer.
Now the signal is emitted right after the post prefetch task has started rather than just right before we are going to run the module.
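
The commit note above about decrementing the ref count only after the exception is set describes an ordering rule: the worker must store the exception into the caller's stack variable before it releases the waiting caller. The following is a rough sketch of that ordering using only standard C++ threads rather than TBB. `runAndWait` and `demoRethrow` are hypothetical names, and the `done` flag is an assumed stand-in for the waiting task's ref count.

```cpp
#include <cassert>
#include <condition_variable>
#include <exception>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper: runs `work` on another thread, waits for it to finish,
// and rethrows in the caller any exception that `work` threw.
void runAndWait(const std::function<void()>& work) {
  std::exception_ptr error;  // stack variable owned by the waiting caller
  std::mutex m;
  std::condition_variable cv;
  bool done = false;  // assumed stand-in for the waiting task's ref count

  std::thread worker([&] {
    try {
      work();
    } catch (...) {
      error = std::current_exception();  // 1) record the exception first
    }
    std::lock_guard<std::mutex> lock(m);
    done = true;      // 2) only then release the waiter
    cv.notify_one();  // notify while holding the lock so cv is still alive
  });

  {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [&] { return done; });
  }
  worker.join();
  if (error) std::rethrow_exception(error);
}

// Returns the message of the exception that crossed the thread boundary.
std::string demoRethrow() {
  try {
    runAndWait([] { throw std::runtime_error("module failed"); });
  } catch (const std::runtime_error& e) {
    return e.what();
  }
  return "no exception";
}
```

In this sketch `worker.join()` keeps the caller's stack frame alive in any case. The ordering is shown because, per the note above, the TBB-based pushAndWait can return as soon as the wait is released, at which point the stack variable is gone.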
@cmsbuild
Contributor

A new Pull Request was created by @Dr15Jones (Chris Jones) for CMSSW_8_1_ROOT64_X.

It involves the following packages:

FWCore/Concurrency
FWCore/Framework
IOPool/Input

@cmsbuild, @smuzaffar, @Dr15Jones can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @wddgit, @wmtan this is something you requested to watch as well.
@slava77, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

@Dr15Jones
Contributor Author

@davidlange6 @smuzaffar Like last time, this is a potentially breaking change so I'd like it tested in CMSSW_8_1_DEVEL_X.

@Dr15Jones
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Aug 26, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/14752/console

@smuzaffar
Contributor

This time cms-bot/jenkins picked up the right release, i.e. CMSSW_8_1_DEVEL_X_2016-08-25-2300, to test this PR.

@Dr15Jones
Contributor Author

Nice!

@Dr15Jones
Contributor Author

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_8_1_ROOT64_X IBs after it passes the integration tests.

@cmsbuild
Contributor

-1

Tested at: 56ad3c2

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-15620/14752/summary.html

I found the following errors while testing this PR

Failed tests: UnitTests

  • Unit Tests:

I found errors in the following unit tests:

---> test TestFWCoreIntegrationStandalone had ERRORS

@Dr15Jones
Contributor Author

I've been unable to isolate how this change has affected this unit test. It would help further the testing of this change to merge it into _DEVEL, assuming the usual pull request RelVals ran OK.

@Dr15Jones
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Aug 26, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/14765/console

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_8_1_ROOT64_X IBs after it passes the integration tests.

@cmsbuild
Contributor

-1

Tested at: 0da6f16

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-15620/14765/summary.html

I found the following errors while testing this PR

Failed tests: RelVals

  • RelVals:

When I ran the RelVals I found an error in the following workflows:
1000.0 step1

DAS Error

@cmsbuild
Contributor

-1

Tested at: 56ad3c2

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-15620/14760/summary.html

I found the following errors while testing this PR

Failed tests: RelVals

  • RelVals:

When I ran the RelVals I found an error in the following workflows:
4.22 step1

DAS Error

@Dr15Jones
Contributor Author

The failed test is not related to this pull request.

@Dr15Jones
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Aug 27, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/14770/console

@cmsbuild
Contributor

-1

Tested at: 0da6f16

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-15620/14770/summary.html

I found the following errors while testing this PR

Failed tests: UnitTests

  • Unit Tests:

I found errors in the following unit tests:

---> test TestFWCoreFrameworkGlobalStreamOne had ERRORS
---> test TestFWCoreIntegrationStandalone had ERRORS

@Dr15Jones
Contributor Author

Both test failures are deficiencies in the tests themselves and are not caused by this pull request. TestFWCoreFrameworkGlobalStreamOne failed because it has an artificial limit on one of its measurements that just happened to be exceeded. Looks like I still need to tune that cut. TestFWCoreIntegrationStandalone sometimes fails the same way in the IB.

Neither of these should keep this change from merging into _DEVEL.

@smuzaffar smuzaffar merged commit 841c7d6 into cms-sw:CMSSW_8_1_ROOT64_X Aug 27, 2016
@Dr15Jones
Contributor Author

@smuzaffar Thanks!

@Dr15Jones Dr15Jones deleted the runModuleAsync branch September 12, 2016 14:48