Prefetch data products from Event concurrently #15433
To avoid possible deadlocks while a legacy or edm::one module is doing a delayed get during its execution, we will disallow such delayed gets. Once multiple threads per event are fully supported, this restriction will be lifted.
To simplify the initial implementation of allowing multiple threads for one event, we require all 'mayConsumes' to be prefetched.
The Event based module signals are handled elsewhere.
Once we allow asynchronous data fetching from the source for one event, we will need to use a WaitingTaskList to handle starting the tasks that are waiting for the data.
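The idea behind a waiting-task list can be sketched as follows: work registered before the data is available is held back, and everything held is released once the data arrives. This is a minimal, synchronous sketch; `SimpleWaitingList`, `add`, and `doneWaiting` are hypothetical names chosen for illustration, and the real framework class spawns TBB tasks rather than calling functions directly.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a waiting-task list (not the CMSSW
// edm::WaitingTaskList API). Tasks added before the data is ready are held;
// tasks added afterwards run immediately.
class SimpleWaitingList {
public:
  void add(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (!done_) {
        waiting_.push_back(std::move(task));  // data not ready: hold the task
        return;
      }
    }
    task();  // data already available: run right away
  }

  // Mark the data as available and run every task that was waiting for it.
  void doneWaiting() {
    std::vector<std::function<void()>> toRun;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
      toRun.swap(waiting_);
    }
    // Run outside the lock so a task may safely call add() again.
    for (auto& t : toRun) {
      t();
    }
  }

private:
  std::mutex mutex_;
  bool done_ = false;
  std::vector<std::function<void()>> waiting_;
};
```

A module's run step would be added to such a list before the source delivers the data; the source's completion then calls `doneWaiting()`.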
- Changed the function names for prefetching to include Async and to pass in a callback task as an argument.
- Separated prefetching and running modules into two different functions in Worker.
- Made sure all unit tests still pass.
exceptionContext was previously a standalone templated function, so a separate instantiation was created for each OccurrenceTraits. It was changed to a static non-templated function of Worker.
Previously, one of the arguments to ProductResolverBase::resolveProduct was used as an output value to denote whether the request was 'ambiguous'. Such an output argument is problematic for an asynchronous call. Instead, the function's return value was changed so that it also carries that information.
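The change can be illustrated with a small result type that bundles the found product and the ambiguity flag, so nothing has to be written back through a reference argument that may no longer exist when an asynchronous call completes. This is a sketch only: `Product`, `Resolution`, and the free function `resolveProduct` here are hypothetical stand-ins, not the actual CMSSW signatures, and the "more than one candidate means ambiguous" rule is an assumption made for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical product type, for illustration only.
struct Product {
  std::string label;
};

// Carries both the result and the "ambiguous" flag in the return value,
// replacing an output parameter such as `bool& ambiguous`.
class Resolution {
public:
  explicit Resolution(const Product* p) : Resolution(p, false) {}
  static Resolution makeAmbiguous() { return Resolution(nullptr, true); }

  const Product* data() const { return data_; }
  bool isAmbiguous() const { return ambiguous_; }

private:
  Resolution(const Product* p, bool ambiguous) : data_(p), ambiguous_(ambiguous) {}
  const Product* data_;
  bool ambiguous_;
};

// Hypothetical resolver: assumes more than one matching candidate is ambiguous.
Resolution resolveProduct(const std::vector<const Product*>& candidates) {
  if (candidates.size() > 1) {
    return Resolution::makeAmbiguous();
  }
  return Resolution(candidates.empty() ? nullptr : candidates.front());
}
```

Because the whole answer travels in one value, it can be handed unchanged to a callback task once an asynchronous resolution finishes.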
In NoProcessProductResolver, cache the decision of which ProductResolver to call after the first call in each transition. This allows prefetching to trigger the real work, so that when the module later requests the data the work is not redone.
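A minimal sketch of the caching idea, assuming the decision can be represented as an index into a list of candidate resolvers and reset at the start of each transition. `CachingForwarder`, its methods, and the "search from the last candidate" rule are hypothetical illustrations, not the framework's code.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical forwarding resolver: picks one of several candidates (e.g. one
// per process) and caches the choice until the next transition.
class CachingForwarder {
public:
  static constexpr int kUnknown = -1;  // no decision cached yet
  static constexpr int kNone = -2;     // searched, nothing has data

  explicit CachingForwarder(std::vector<std::function<bool()>> hasData)
      : hasData_(std::move(hasData)) {}

  int choose() {
    int cached = cachedIndex_.load();
    if (cached != kUnknown) {
      return cached;  // reuse the decision made by an earlier call (e.g. prefetch)
    }
    ++searches_;
    // Assumption for the sketch: later candidates take precedence.
    int found = kNone;
    for (int i = static_cast<int>(hasData_.size()) - 1; i >= 0; --i) {
      if (hasData_[i]()) {
        found = i;
        break;
      }
    }
    // Two threads racing here compute the same answer, so storing it twice is benign.
    cachedIndex_.store(found);
    return found;
  }

  // Called at the start of each new transition (e.g. a new Event).
  void resetForNewTransition() { cachedIndex_.store(kUnknown); }

  int searches() const { return searches_.load(); }

private:
  std::vector<std::function<bool()>> hasData_;
  std::atomic<int> cachedIndex_{kUnknown};
  std::atomic<int> searches_{0};
};
```

The prefetch performs the search once; the module's later request in the same transition hits the cached index.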
Ultimately, shared resources will be handled via SerialTaskQueues, but that transition can be done gradually. As a first step, use a SerialTaskQueue for all requests going to the Source.
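The guarantee a serial task queue provides (tasks pushed from any thread run one at a time, in order) can be sketched without a dedicated thread: whichever caller finds the queue idle drains it, and other callers just enqueue. `SimpleSerialQueue` is a hypothetical name; the real edm::SerialTaskQueue is built on TBB tasks, so treat this only as an illustration of the semantics.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical serial queue: at most one task runs at a time, in push order.
class SimpleSerialQueue {
public:
  void push(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push_back(std::move(task));
      if (running_) {
        return;  // the current runner will pick this task up
      }
      running_ = true;  // this caller becomes the runner
    }
    drain();
  }

private:
  void drain() {
    for (;;) {
      std::function<void()> next;
      {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) {
          running_ = false;
          return;
        }
        next = std::move(tasks_.front());
        tasks_.pop_front();
      }
      next();  // run outside the lock so other threads can keep enqueuing
    }
  }

  std::mutex mutex_;
  std::deque<std::function<void()>> tasks_;
  bool running_ = false;
};
```

Routing every request for the Source through a single such queue serializes access to it without a blocking mutex held across the whole read.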
This is only a starting point for asynchronous prefetches; it checkpoints that the code written so far passes all framework unit tests. Items still being worked on:
- Modules on Paths request asynchronous prefetches but then wait for them to be resolved.
- UnscheduledProductResolvers will run their modules synchronously if that is the first request for the data product. If other requests come in while the module is running, those requests will be handled asynchronously.
- Mutexes are still used to synchronize across Streams.
Spawn a task when prefetchAsync is called for the first time. This should eventually be changed to properly use the WaitingTaskQueue in the Worker and to separate prefetching the products for the Worker from running the module via the Worker.
The BusyWaitIntProducer was developed in order to have a test module actually take up some time and CPU cycles.
Updated the unit test output comparison files to account for the changed order of calling unscheduled modules. The order changed because asynchronous prefetching runs the modules in FILO order instead of the FIFO order used before. The two orderings are equivalent because unscheduled execution does not depend on the order in which the modules run.
Handle the worker state transitions as well as state accounting in a thread safe manner. Switch to using std::exception_ptr for caching any exceptions.
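One way to make such state transitions thread safe is a compare-and-swap on an atomic state, with any exception stored as a `std::exception_ptr` so that later callers rethrow the same error. The sketch below assumes a simple Ready/Running/Pass/Exception state set; `SketchWorker` and its members are hypothetical and do not reproduce the CMSSW Worker.

```cpp
#include <atomic>
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>

// Hypothetical worker with thread-safe state handling.
class SketchWorker {
public:
  enum class State { Ready, Running, Pass, Exception };

  // Only the thread that wins Ready -> Running executes `work`.
  void doWork(const std::function<void()>& work) {
    State expected = State::Ready;
    if (!state_.compare_exchange_strong(expected, State::Running)) {
      if (expected == State::Exception) {
        std::rethrow_exception(cached_);  // report the same failure again
      }
      // Already done, or running elsewhere; a real framework would wait here.
      return;
    }
    try {
      work();
      ++timesRun_;
      state_.store(State::Pass);
    } catch (...) {
      cached_ = std::current_exception();  // written before the state is published
      state_.store(State::Exception);
      throw;
    }
  }

  void reset() {
    cached_ = nullptr;
    state_.store(State::Ready);
  }
  State state() const { return state_.load(); }
  int timesRun() const { return timesRun_.load(); }

private:
  std::atomic<State> state_{State::Ready};
  std::exception_ptr cached_;
  std::atomic<int> timesRun_{0};
};
```

Caching a `std::exception_ptr` instead of a copy of a particular exception type preserves the original dynamic type when it is rethrown.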
Moved the logic that decides whether an exception should be rethrown into a separate function. This reduces the size of the templated Worker::doWork function and makes it possible to reuse that code in the asynchronous processing function.
When running a module in unscheduled mode, have the UnscheduledProductResolver start a task to do any needed prefetching of data. Once all the prefetching has been completed, have the last task launch the task which will run the module.
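The "last prefetch launches the module" pattern can be sketched with a shared atomic counter: each asynchronous prefetch receives a completion callback, and the callback that brings the counter to zero runs the module. `prefetchThenRun`, `Done`, and `AsyncPrefetch` are hypothetical names; completion callbacks here stand in for the TBB tasks the framework would use.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical types: a prefetch is started with a callback to invoke when done.
using Done = std::function<void()>;
using AsyncPrefetch = std::function<void(Done)>;

void prefetchThenRun(const std::vector<AsyncPrefetch>& prefetches, Done runModule) {
  if (prefetches.empty()) {
    runModule();  // nothing to wait for
    return;
  }
  // Shared state outlives this function, since callbacks may fire later.
  auto remaining = std::make_shared<std::atomic<std::size_t>>(prefetches.size());
  auto module = std::make_shared<Done>(std::move(runModule));
  for (const auto& prefetch : prefetches) {
    prefetch([remaining, module] {
      // fetch_sub returns the previous value: 1 means this was the last prefetch.
      if (remaining->fetch_sub(1) == 1) {
        (*module)();
      }
    });
  }
}
```

The completions may arrive in any order and on any thread; exactly one of them observes the final decrement and starts the module.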
The Service system now emits a signal before and after a module has done prefetching. The Tracer service was updated to use this signal.
When a TBB task started by the framework starts up, we need to make sure the ServiceRegistry is properly set up so that Services are visible.
Now that we can use more TBB threads than Streams in the system, we need to make sure that all of the threads are known to ROOT. This is done by creating 1 task for each thread and then waiting until all tasks have done their work before proceeding.
Fixed the case where a module asks for data with the same label as the module but from an earlier process via the skipCurrentProcess mechanism.
CMS_THREAD_GUARD should be on the line for the variable being guarded and the argument should be the variable doing the guarding. The previous version of the code had those reversed.
A new Pull Request was created by @Dr15Jones (Chris Jones) for CMSSW_8_1_X. It involves the following packages: FWCore/Framework @cmsbuild, @smuzaffar, @Dr15Jones, @davidlange6 can you please review it and eventually sign? Thanks. cms-bot commands are listed here #13028
please test
The tests are being triggered in jenkins.
+1
This pull request is fully signed and it will be integrated in one of the next CMSSW_8_1_X IBs after it passes the integration tests. This pull request requires discussion in the ORP meeting before it's merged. @slava77, @davidlange6, @smuzaffar
-1 Tested at: 516d54a You can see the results of the tests here: I found the following errors while testing this PR. Failed tests: UnitTests
I found errors in the following unit tests: ---> test testNestedArrays had ERRORS
Use the consumes information to allow prefetching to asynchronously retrieve the data products before running the module.