Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asynchronous Prefetch #15367

Merged

Conversation

Dr15Jones
Copy link
Contributor

This is 'phase 1.5' of the threaded framework redesign. This change allows all prefetching of data to be done asynchronously within an event. This allows unscheduled modules to run in parallel. The code still processes scheduled modules and paths in sequentially.

In order to avoid the possibility of deadlocks occurring while a legacy or one module is doing a delayed get while running, we will disallow such activities. Once the full implementation of multiple threads per event is implemented, this restriction will be lifted.
To simplify the initial implementation of allowing multiple threads for one event, we require all 'mayConsumes' to be prefetched.
The Event based module signals are handled elsewhere.
Once we allow asynchronous data fetching from the source for one event we will need to using a WaitingTaskList to handle starting tasks which are waiting for the data.
-changed function names for prefetching to include Async and pass in a callback task as argument.
-separated prefetching and running modules into two different functions in Worker
-made sure all unit tests still pass
exceptionContext was previously a standalone templated function. The template was being created for each OccurenceTraits. This was changed to just be a static non-templated function of Worker.
Previously one of the function arguments to ProductResolverBase::resolveProduct was used as a return value to denote if the request was 'ambiguous'. Such an argument is problematic for an asynchronous call. Instead, the return value of the function was changed to accommodate also carrying that information.
Cache the decision of what ProductResolver to call after the first call each transition to NoProcessProductResolver. This will allow prefetching to trigger the real work and then when the data is later requested in the module we will not redo the work.
Ultimately shared resources will be handled via SerialTaskQueues but that transition can be done gradually. First start by using a SerialTaskQueue for all requests going to the Source.
This is only a starting point for asynchronous prefetches used to checkpoint that the code which has been written passes all framework unit tests.
Items which are still under work:
-Modules on Paths request asynchronous prefetches but then wait for them to be resolved.
-UnscheduledProductResolvers will run their modules synchronously if that is the first request for the data product. If other requests come while running the module, those requests will be done asynchronously.
-Mutexes are still used to synchronize across Streams.
Spawn a task in the case where prefetchAsync is called for the first time. This eventually should be changed to properly handle using the WaitingTaskQueue in the Worker as well as separating prefetching the products for the Worker from running of the module via the Worker.
The BusyWaitIntProducer was developed in order to have a test module actually take up some time and CPU cycles.
Updated the unit test output comparison files to account for the changed order of calling unscheduled modules. The order was changed because asynchronous prefetching runs the modules in FILO order instead of FIFO as was done before. The two orderings are equivalent since unscheduled doesn't care about the order.
Handle the worker state transitions as well as state accounting in a thread safe manner.
Switch to using std::exception_ptr for caching any exceptions.
Moved the logic that decides if an exception should be rethrown into a separate function. This decreases the size of the templated Worker::doWork function and setups to allow that code to be used for the asynchronous processing function.
When running a module in unscheduled mode, have the UnscheduledProductResolver start a task to do any needed prefetching of data. Once all the prefetching has been completed, have the last task launch the task which will run the module.
The Service system now emits a signal before and after a module has done prefetching.
The Tracer service was updated to use this signal.
@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 4, 2016

A new Pull Request was created by @Dr15Jones (Chris Jones) for CMSSW_8_1_ROOT64_X.

It involves the following packages:

FWCore/Framework
FWCore/Integration
FWCore/ServiceRegistry
FWCore/Services

@cmsbuild, @smuzaffar, @Dr15Jones can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @wddgit, @wmtan this is something you requested to watch as well.
@slava77, @smuzaffar you are the release manager for this.

cms-bot commands are list here #13028

@Dr15Jones
Copy link
Contributor Author

@davidlange6 as agreed upon, this will first go into _DEVEL

@Dr15Jones
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 4, 2016

-1
Tested at: UNKNOWN
I was not able to find a release to test this PR. See the Jenkins logs for more details.

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 6, 2016

This pull request is fully signed and it will be integrated in one of the next CMSSW_8_1_ROOT64_X IBs (but tests are reportedly failing).

@smuzaffar
Copy link
Contributor

@Dr15Jones , for now I have started the jenkins jobs by hand and will look in tot he job configuration next week to fix it (jenkins should be able to correctly find the base release e.g. CMSSW_8_1_DEVEL_X_... for this)

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 6, 2016

-1

Tested at: 7d532b0

You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-15367/14398/summary.html

I found follow errors while testing this PR

Failed tests: UnitTests

  • Unit Tests:

I found errors in the following unit tests:

---> test runtestRecoEgammaElectronIdentification had ERRORS
---> test testNameService had ERRORS
---> test testORA_0 had ERRORS
---> test testNamedReference had ERRORS
---> test testORA_7 had ERRORS
---> test testORA_1 had ERRORS
---> test testNestedArrays had ERRORS
---> test testORABasicException had ERRORS
---> test testORA_9 had ERRORS
---> test testContainerLock had ERRORS
---> test testORA_4 had ERRORS
---> test testORAIO had ERRORS
---> test testORA_5 had ERRORS
---> test testORA_12 had ERRORS
---> test testORA_10 had ERRORS
---> test testORA_11 had ERRORS
---> test testORAUtility had ERRORS
---> test testORA_6 had ERRORS
---> test testORA_3 had ERRORS
---> test testORAInit had ERRORS
---> test testORA_2 had ERRORS
---> test testORA_8 had ERRORS

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 6, 2016

@Dr15Jones
Copy link
Contributor Author

The unit test failures are not caused by this pull request.

  • runtestRecoEgammaElectronIdentification failed becuase it could not find its data file via xrootd
  • all the others ones fail becuase the tests could not make a direct connection to the oracle database

@Dr15Jones
Copy link
Contributor Author

@davidlange6 as far as I can tell, this is good to go.

@davidlange6 davidlange6 merged commit 39927b3 into cms-sw:CMSSW_8_1_ROOT64_X Aug 7, 2016
@smuzaffar
Copy link
Contributor

please test

testing cms-bot changes to correctly find the release for devel IBs

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 8, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/14422/console

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 8, 2016

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 8, 2016

@Dr15Jones Dr15Jones deleted the prefetchConcurrently_DEVEL branch August 15, 2016 14:24
@Dr15Jones Dr15Jones restored the prefetchConcurrently_DEVEL branch August 15, 2016 14:24
@Dr15Jones Dr15Jones deleted the prefetchConcurrently_DEVEL branch August 15, 2016 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants