New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Asynchronous Prefetch #15367
Asynchronous Prefetch #15367
Conversation
In order to avoid the possibility of deadlocks occurring while a legacy or one module is doing a delayed get while running, we will disallow such activities. Once the full implementation of multiple threads per event is implemented, this restriction will be lifted.
To simplify the initial implementation of allowing multiple threads for one event, we require all 'mayConsumes' to be prefetched.
The Event based module signals are handled elsewhere.
Once we allow asynchronous data fetching from the source for one event we will need to using a WaitingTaskList to handle starting tasks which are waiting for the data.
-changed function names for prefetching to include Async and pass in a callback task as argument. -separated prefetching and running modules into two different functions in Worker -made sure all unit tests still pass
exceptionContext was previously a standalone templated function. The template was being created for each OccurenceTraits. This was changed to just be a static non-templated function of Worker.
Previously one of the function arguments to ProductResolverBase::resolveProduct was used as a return value to denote if the request was 'ambiguous'. Such an argument is problematic for an asynchronous call. Instead, the return value of the function was changed to accommodate also carrying that information.
Cache the decision of what ProductResolver to call after the first call each transition to NoProcessProductResolver. This will allow prefetching to trigger the real work and then when the data is later requested in the module we will not redo the work.
Ultimately shared resources will be handled via SerialTaskQueues but that transition can be done gradually. First start by using a SerialTaskQueue for all requests going to the Source.
This is only a starting point for asynchronous prefetches used to checkpoint that the code which has been written passes all framework unit tests. Items which are still under work: -Modules on Paths request asynchronous prefetches but then wait for them to be resolved. -UnscheduledProductResolvers will run their modules synchronously if that is the first request for the data product. If other requests come while running the module, those requests will be done asynchronously. -Mutexes are still used to synchronize across Streams.
Spawn a task in the case where prefetchAsync is called for the first time. This eventually should be changed to properly handle using the WaitingTaskQueue in the Worker as well as separating prefetching the products for the Worker from running of the module via the Worker.
The BusyWaitIntProducer was developed in order to have a test module actually take up some time and CPU cycles.
Updated the unit test output comparison files to account for the changed order of calling unscheduled modules. The order was changed because asynchronous prefetching runs the modules in FILO order instead of FIFO as was done before. The two orderings are equivalent since unscheduled doesn't care about the order.
Handle the worker state transitions as well as state accounting in a thread safe manner. Switch to using std::exception_ptr for caching any exceptions.
Moved the logic that decides if an exception should be rethrown into a separate function. This decreases the size of the templated Worker::doWork function and setups to allow that code to be used for the asynchronous processing function.
When running a module in unscheduled mode, have the UnscheduledProductResolver start a task to do any needed prefetching of data. Once all the prefetching has been completed, have the last task launch the task which will run the module.
The Service system now emits a signal before and after a module has done prefetching. The Tracer service was updated to use this signal.
A new Pull Request was created by @Dr15Jones (Chris Jones) for CMSSW_8_1_ROOT64_X. It involves the following packages: FWCore/Framework @cmsbuild, @smuzaffar, @Dr15Jones can you please review it and eventually sign? Thanks. cms-bot commands are list here #13028 |
@davidlange6 as agreed upon, this will first go into _DEVEL |
please test |
-1 |
This pull request is fully signed and it will be integrated in one of the next CMSSW_8_1_ROOT64_X IBs (but tests are reportedly failing). |
@Dr15Jones , for now I have started the jenkins jobs by hand and will look in tot he job configuration next week to fix it (jenkins should be able to correctly find the base release e.g. CMSSW_8_1_DEVEL_X_... for this) |
-1 Tested at: 7d532b0 You can see the results of the tests here: I found follow errors while testing this PR Failed tests: UnitTests
I found errors in the following unit tests: ---> test runtestRecoEgammaElectronIdentification had ERRORS |
The unit test failures are not caused by this pull request.
|
@davidlange6 as far as I can tell, this is good to go. |
please test testing cms-bot changes to correctly find the release for devel IBs |
The tests are being triggered in jenkins. |
This is 'phase 1.5' of the threaded framework redesign. This change allows all prefetching of data to be done asynchronously within an event. This allows unscheduled modules to run in parallel. The code still processes scheduled modules and paths in sequentially.