Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrument cmssw for nvidia nvprof/nvvm (10.1.x) #22009

Merged

Conversation

fwyzard
Copy link
Contributor

@fwyzard fwyzard commented Jan 29, 2018

Instrument cmssw for nvprof/nvvm

  • implement a Service to provide CMSSW-aware annotations to nvprof/nvvm
  • highlight selected modules and delayed modules
  • update for CMSSW 10.1.x

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@fwyzard
Copy link
Contributor Author

fwyzard commented Jan 29, 2018

Update #21946 for compatibility with #21872 .

@fwyzard
Copy link
Contributor Author

fwyzard commented Jan 29, 2018

@cmsbuild, please test.

@fwyzard
Copy link
Contributor Author

fwyzard commented Jan 29, 2018

type new-feature

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @fwyzard (Andrea Bocci) for master.

It involves the following packages:

HeterogeneousCore/CUDAServices

The following packages do not have a category, yet:

HeterogeneousCore/CUDAServices
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@cmsbuild can you please review it and eventually sign? Thanks.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@davidlange6
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 2, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/25875/console Started: 2018/02/02 18:31

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 2, 2018

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 2, 2018

Comparison job queued.

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 2, 2018

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-22009/25875/summary.html

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /build/cmsbld/jenkins/workspace/compare-root-files-short-matrix/results/JR-comparison/PR-22009/1325.7_TTbar_13_94XNanoAODINPUT+TTbar_13_94XNanoAODINPUT+NANOEDMMC2017

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 26
  • DQMHistoTests: Total histograms compared: 2465763
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2465593
  • DQMHistoTests: Total skipped: 169
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 1.09000000007 KiB( 22 files compared)
  • Checked 110 log files, 9 edm output root files, 26 DQM output files

@fabiocos
Copy link
Contributor

fabiocos commented Feb 5, 2018

@fwyzard could you please point to the new packaged needed? At present this PR adds capabilities without affecting any production workflow as far as I can see, right?

@fwyzard
Copy link
Contributor Author

fwyzard commented Feb 5, 2018

Indeed, this adds a new service that allows to see the cmsRun transitions (runs, lumis, events, modules, etc.) in the NVIDIA profiler.
While useful on its own, it is even more so once we start offloading some of the computations to the GPUs.

Since it adds a brand new service, no existing workflows should be affected.

The new package was agreed with @makortel , to keep the dependency on CUDA dependency out of the FWCore/Services package.

@makortel
Copy link
Contributor

makortel commented Feb 5, 2018

The package is something we came up to suggest for "core heterogeneous computing" components (for some "thinking out loud" see #21946 (comment)).

@fabiocos
Copy link
Contributor

fabiocos commented Feb 7, 2018

@Dr15Jones @smuzaffar before integrating this code (which is ok for me) we need to decide:

  1. is it ok to open a completely new package? For the time being it is just one class, for which a new package seems an overkill. Probably this is parallel to FWCore, and I understand this will be expanded in time (see the above comment by @makortel).

  2. if it is reasonable, we need to assign it to a category and have a responsible to sign it. It could just be Core Software, as we already do for several Utilities, or we could have a new category and a dedicated responsible for it. What is your preference?

@Dr15Jones
Copy link
Contributor

@fabiocos I'd be OK with Utilities/CUDAServices. So we would have not have to create a new top level Subsystem. This does belong in its own package since this has a unique external dependency.

@fwyzard
Copy link
Contributor Author

fwyzard commented Feb 7, 2018

I don't mind the package, but I'd rather people agree before I move it again.

Does this mean the next accelerator service will go under Utilities instead of HeterogenousCore ?

@Dr15Jones
Copy link
Contributor

It does not matter to me if we reuse Utilities or have a new HeterogenousCore. Do we anticipate many more items in HeterogenousCore? Utilities already contains various storage system adaptors which customize cmsRun for different sites. One could see accelerators as something similar.

@makortel
Copy link
Contributor

makortel commented Feb 7, 2018

The current prototype (which I'd like to go through at least one round of testing before integration to master) foresees in addition to this PR

  • CUDAService for general checks of CUDA device availability and possibly resource (e.g. CUDA stream, memory) management, in the same package as the NVProfilerService of this PR
    • For additional device types we foresee a service per device type
  • AcceleratorService for CPU+heterogeneous algorithm scheduling, should go to separate package than the per-device-type services (but uses them)
    • It could be that the result of prototyping is something different than a service, nevertheless the component should live somewhere
  • HeterogeneousProduct, a transient event data format to keep record which devices have a copy of the data and to automatically transfer the data to CPU when needed. Better be in a package separate from any of above.
    • Could also go under DataFormats. If there, should it go to DataFormats/Common or to a separate package? At least for now the it does not bring in any special dependencies.

(I don't have strong opinions where to place things as long as there is some consistency)

@fwyzard
Copy link
Contributor Author

fwyzard commented Feb 8, 2018

By the way, we will likely need also a place for CUDA helper headers / libraries.

@fwyzard
Copy link
Contributor Author

fwyzard commented Feb 9, 2018

please agree on something, or just merge ?

@fwyzard
Copy link
Contributor Author

fwyzard commented Feb 9, 2018

@fabiocos , what do I need to do to get the same preferential treatment as the framework changes, and have my PRs merged within 12 hours instead of 12 days ?

@fabiocos
Copy link
Contributor

@Dr15Jones @smuzaffar given the explanations, I think we can move forward with the present proposal, and put it for the time being under the Core category. Please make an alternative category suggestion if you disagree.

@fabiocos
Copy link
Contributor

+1

@fabiocos
Copy link
Contributor

merge

@cmsbuild cmsbuild merged commit 0b9e505 into cms-sw:master Feb 10, 2018
@fwyzard fwyzard deleted the instrument_CMSSW_for_Nvidia_nvprof_101x branch February 16, 2018 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants