Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefetch data products from Event concurrently #15433

Merged
merged 22 commits into from Aug 11, 2016

Conversation

Dr15Jones
Copy link
Contributor

Use the consumes information to allow prefetching to asynchronously retrieve the data products before running the module.

Dr15Jones and others added 22 commits August 10, 2016 10:22
In order to avoid the possibility of deadlocks occurring while a legacy or one module is doing a delayed get while running, we will disallow such activities. Once the full implementation of multiple threads per event is implemented, this restriction will be lifted.
To simplify the initial implementation of allowing multiple threads for one event, we require all 'mayConsumes' to be prefetched.
The Event based module signals are handled elsewhere.
Once we allow asynchronous data fetching from the source for one event we will need to using a WaitingTaskList to handle starting tasks which are waiting for the data.
-changed function names for prefetching to include Async and pass in a callback task as argument.
-separated prefetching and running modules into two different functions in Worker
-made sure all unit tests still pass
exceptionContext was previously a standalone templated function. The template was being created for each OccurenceTraits. This was changed to just be a static non-templated function of Worker.
Previously one of the function arguments to ProductResolverBase::resolveProduct was used as a return value to denote if the request was 'ambiguous'. Such an argument is problematic for an asynchronous call. Instead, the return value of the function was changed to accommodate also carrying that information.
Cache the decision of what ProductResolver to call after the first call each transition to NoProcessProductResolver. This will allow prefetching to trigger the real work and then when the data is later requested in the module we will not redo the work.
Ultimately shared resources will be handled via SerialTaskQueues but that transition can be done gradually. First start by using a SerialTaskQueue for all requests going to the Source.
This is only a starting point for asynchronous prefetches used to checkpoint that the code which has been written passes all framework unit tests.
Items which are still under work:
-Modules on Paths request asynchronous prefetches but then wait for them to be resolved.
-UnscheduledProductResolvers will run their modules synchronously if that is the first request for the data product. If other requests come while running the module, those requests will be done asynchronously.
-Mutexes are still used to synchronize across Streams.
Spawn a task in the case where prefetchAsync is called for the first time. This eventually should be changed to properly handle using the WaitingTaskQueue in the Worker as well as separating prefetching the products for the Worker from running of the module via the Worker.
The BusyWaitIntProducer was developed in order to have a test module actually take up some time and CPU cycles.
Updated the unit test output comparison files to account for the changed order of calling unscheduled modules. The order was changed because asynchronous prefetching runs the modules in FILO order instead of FIFO as was done before. The two orderings are equivalent since unscheduled doesn't care about the order.
Handle the worker state transitions as well as state accounting in a thread safe manner.
Switch to using std::exception_ptr for caching any exceptions.
Moved the logic that decides if an exception should be rethrown into a separate function. This decreases the size of the templated Worker::doWork function and setups to allow that code to be used for the asynchronous processing function.
When running a module in unscheduled mode, have the UnscheduledProductResolver start a task to do any needed prefetching of data. Once all the prefetching has been completed, have the last task launch the task which will run the module.
The Service system now emits a signal before and after a module has done prefetching.
The Tracer service was updated to use this signal.
When a TBB task started by the framework starts up, we need to make sure the ServiceRegistry is properly setup to allow Services to be visible.
Now that we can use more TBB threads than Streams in the system, we need to make sure that all of the threads are known to ROOT. This is done by creating 1 task for each thread and then waiting until all tasks have done their work before proceeding.
Fixed the case where a module asks for data with the same label as
the module but from an earlier process via the skipCurrentProcess
mechanism.
CMS_THREAD_GUARD should be on the line for the variable being guarded and the argument should be the variable doing the guarding. The previous version of the code had those reversed.
@cmsbuild
Copy link
Contributor

A new Pull Request was created by @Dr15Jones (Chris Jones) for CMSSW_8_1_X.

It involves the following packages:

FWCore/Framework
FWCore/Integration
FWCore/ServiceRegistry
FWCore/Services

@cmsbuild, @smuzaffar, @Dr15Jones, @davidlange6 can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @wddgit, @wmtan this is something you requested to watch as well.
@slava77, @smuzaffar you are the release manager for this.

cms-bot commands are list here #13028

@Dr15Jones
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 11, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/14478/console

@Dr15Jones
Copy link
Contributor Author

+1
This was tested in CMSSW_8_1_DEVEL_X

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_8_1_X IBs after it passes the integration tests. This pull request requires discussion in the ORP meeting before it's merged. @slava77, @davidlange6, @smuzaffar

@davidlange6 davidlange6 merged commit 4740b57 into cms-sw:CMSSW_8_1_X Aug 11, 2016
@cmsbuild
Copy link
Contributor

-1

Tested at: 516d54a

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-15433/14478/summary.html

I found follow errors while testing this PR

Failed tests: UnitTests

  • Unit Tests:

I found errors in the following unit tests:

---> test testNestedArrays had ERRORS
---> test testORA_9 had ERRORS
---> test testNamedReference had ERRORS
---> test testContainerLock had ERRORS
---> test testNameService had ERRORS
---> test testORA_0 had ERRORS
---> test testORA_4 had ERRORS
---> test testORABasicException had ERRORS
---> test testORA_7 had ERRORS
---> test testORA_5 had ERRORS
---> test testORA_1 had ERRORS
---> test testORA_6 had ERRORS
---> test testORA_11 had ERRORS
---> test testORA_10 had ERRORS
---> test testORAInit had ERRORS
---> test testORAIO had ERRORS
---> test testORA_3 had ERRORS
---> test testORA_12 had ERRORS
---> test testORAUtility had ERRORS
---> test testORA_2 had ERRORS
---> test testORA_8 had ERRORS
---> test testJetMETCorrectionsType1MET had ERRORS

@cmsbuild
Copy link
Contributor

dan131riley pushed a commit to dan131riley/cmssw that referenced this pull request Aug 13, 2016
@Dr15Jones Dr15Jones deleted the prefetchConcurrently branch August 15, 2016 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants