shared memory support for SonicTriton #33801

Merged
merged 10 commits into from Jun 18, 2021

kpedro88
Contributor

PR description:

  • Added allocate() function to streamline creation of input vectors
  • Expanded unit testing: now two tests, one for CPU and one for GPU (each tests both gRPC and shared memory)
  • Shared memory is now used by default for the local fallback server (either CPU or GPU)
  • Added useSharedMemory client config parameter to disable this for a specific algorithm/producer
  • Shared memory regions are reused from one event to the next and reallocated only if the existing region is too small. This reuse is necessary to achieve the performance improvements; otherwise, the high cost of reallocating (e.g. every event) dwarfs any gain
  • Documentation updated accordingly
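
The region reuse policy described above can be illustrated with a minimal sketch. The class name `ReusableRegion` and its methods are hypothetical and are not taken from the PR; a plain heap buffer stands in for the shared memory region so that the example runs anywhere. The only point shown is the grow-only logic: a new allocation happens only when the requested size exceeds the current capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical stand-in for a shared memory region (assumption: not the PR's class).
// A heap buffer is used instead of real shared memory to keep the sketch portable.
class ReusableRegion {
public:
  // Ensure at least `bytes` of capacity; returns true if a reallocation happened.
  bool reserve(std::size_t bytes) {
    if (bytes <= capacity_)
      return false;  // existing region is large enough: reuse it
    buffer_ = std::make_unique<unsigned char[]>(bytes);
    capacity_ = bytes;
    ++allocations_;
    return true;
  }

  // Copy one event's input into the region, growing it only if needed.
  void write(const void* data, std::size_t bytes) {
    assert(data != nullptr || bytes == 0);
    reserve(bytes);
    if (bytes > 0)
      std::memcpy(buffer_.get(), data, bytes);
    size_ = bytes;
  }

  std::size_t size() const { return size_; }
  std::size_t capacity() const { return capacity_; }
  unsigned allocations() const { return allocations_; }

private:
  std::unique_ptr<unsigned char[]> buffer_;
  std::size_t capacity_ = 0;
  std::size_t size_ = 0;
  unsigned allocations_ = 0;
};
```

With this policy, a large event followed by many smaller ones costs a single allocation, which is the behaviour the last bullet relies on for its performance argument.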

PR validation:

Confirmed that the outputs are identical whether using gRPC or shared memory, and that the unit tests pass as expected.

Tested performance of the two example models (ResNet50 and Graph Attention Network):

  • CPU shared memory: 3-12% faster (client-side), 3-4% faster (server-side)
  • GPU shared memory: 15-20% faster (client-side), 10-45% faster (server-side)

The performance improvements depend on the amount of data being transferred, as well as the size of the model (which controls how long the inference takes).

The impact of these latency decreases on throughput will be tested in the near future, once realistic workflows are prepared.

Technical details: this branch is squashed from several previous development branches.

Requires: cms-sw/cmsdist#6929

@kpedro88
Contributor Author

test parameters
pull_request = cms-sw/cmsdist#6929

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-33801/22787

  • This PR adds an extra 44KB to repository

@cmsbuild
Contributor

A new Pull Request was created by @kpedro88 (Kevin Pedro) for master.

It involves the following packages:

HeterogeneousCore/SonicCore
HeterogeneousCore/SonicTriton

@makortel, @cmsbuild, @fwyzard can you please review it and eventually sign? Thanks.
@makortel, @riga, @rovere this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@kpedro88
Contributor Author

please test

@makortel
Contributor

makortel commented May 21, 2021

The term "shared memory" tends to be somewhat overloaded, could you clarify what exactly it means in this context?

@makortel
Contributor

> The term "shared memory" tends to be somewhat overloaded, could you clarify what exactly it means in this context?

Ok, I suppose this is the memory that can be shared between processes.

@cmsbuild
Contributor

-1

Failed Tests: HeaderConsistency
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8660a4/15222/summary.html
COMMIT: 25f67dd
CMSSW: CMSSW_12_0_X_2021-05-20-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/33801/15222/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 37
  • DQMHistoTests: Total histograms compared: 2650486
  • DQMHistoTests: Total failures: 12
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2650451
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 36 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 155 log files, 37 edm output root files, 37 DQM output files
  • TriggerResults: no differences found

@kpedro88
Contributor Author

@makortel yes, it's shared memory for inter-process communication. For CPU, it uses /dev/shm, while for GPU, it uses cudaIpcMemHandle_t.
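
For readers unfamiliar with the CPU mechanism mentioned here, the following is a minimal sketch of POSIX shared memory, which on Linux is backed by files under /dev/shm. The function names `mapSharedRegion` and `releaseSharedRegion` are hypothetical and are not from the PR; only `shm_open`, `ftruncate`, `mmap`, `munmap`, and `shm_unlink` are real POSIX calls. The GPU path based on `cudaIpcMemHandle_t` is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Create (or open) a named POSIX shared memory object and map it read/write.
// On Linux the object appears as a file under /dev/shm.
// Hypothetical helper for illustration; returns nullptr on failure.
void* mapSharedRegion(const std::string& name, std::size_t bytes) {
  assert(!name.empty() && name.front() == '/');
  int fd = shm_open(name.c_str(), O_CREAT | O_RDWR, 0600);
  if (fd < 0)
    return nullptr;
  if (ftruncate(fd, static_cast<off_t>(bytes)) != 0) {
    close(fd);
    return nullptr;
  }
  void* addr = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);  // the mapping stays valid after the descriptor is closed
  return addr == MAP_FAILED ? nullptr : addr;
}

// Unmap the region and remove the named object.
void releaseSharedRegion(const std::string& name, void* addr, std::size_t bytes) {
  munmap(addr, bytes);
  shm_unlink(name.c_str());
}
```

A second process that maps the same name sees the same bytes, so the payload no longer has to be serialized into the gRPC message. On older glibc versions, linking may require `-lrt`.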

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-33801/22801

  • This PR adds an extra 44KB to repository

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-33801/23249

  • This PR adds an extra 84KB to repository

  • Found files with invalid states:

    • HeterogeneousCore/SonicTriton/interface/grpc_client_gpu.h:

@cmsbuild
Contributor

Pull request #33801 was updated. @makortel, @cmsbuild, @fwyzard can you please check and sign again.

@kpedro88
Contributor Author

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8660a4/15887/summary.html
COMMIT: d92c6b4
CMSSW: CMSSW_12_0_X_2021-06-10-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/33801/15887/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8660a4/15887/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8660a4/15887/git-merge-result

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 38
  • DQMHistoTests: Total histograms compared: 2862520
  • DQMHistoTests: Total failures: 7
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2862491
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 37 files compared)
  • Checked 160 log files, 37 edm output root files, 38 DQM output files
  • TriggerResults: no differences found

@kpedro88
Contributor Author

@makortel any further review?

@@ -21,19 +23,22 @@ namespace cms {
   const char* cmd,
   const char* error,
   const char* message,
-  const char* description = nullptr) {
+  std::string_view description = std::string_view()) {
Contributor

@fwyzard Do you see any potential problems in using std::string_view here? (All relevant compilers for CUDA should have supported C++17 for some time already, right?)

Contributor

Sorry for the delay, the TDR has kept me fully busy the last few days (...).

My main concern was what happens in the vast majority of the cases, when no description is passed. Both @makortel and I have made some checks on godbolt, and it looks like the compiler should optimise the std::string_view away in that case.

As for C++17 vs earlier versions of the standard: yes, CUDA 11 fully supports C++17, so no problem there either.
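
To make the discussion concrete, here is a minimal sketch of the pattern under review: an optional trailing argument declared as `std::string_view` with an empty default instead of `const char*` with `nullptr`. The function `formatError` is hypothetical and is not the function changed in this PR; it only shows how the default case and the explicit case behave.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (assumption: not the PR's function) that appends an
// optional description to an error message. The default argument is an empty
// view, so callers that pass nothing take the `empty()` branch.
std::string formatError(const char* cmd,
                        const char* message,
                        std::string_view description = std::string_view()) {
  assert(cmd != nullptr && message != nullptr);
  std::string out = std::string(cmd) + ": " + message;
  if (!description.empty()) {
    out += " (";
    out.append(description);  // C++17 overload accepting string_view-like types
    out += ")";
  }
  return out;
}
```

One practical advantage of `std::string_view` over `const char*` is that callers can pass a string literal, a `std::string`, or a substring without an extra copy or a call to `c_str()`, and the callee tests `empty()` rather than comparing against `nullptr`.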

@kpedro88
Contributor Author

@makortel @fwyzard I'm happy to address any further review comments, but if the review is finished, I would like to get this merged so that other ongoing developments can rebase on top of it.

@fwyzard
Contributor

fwyzard commented Jun 17, 2021

+heterogeneous

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@qliphy
Contributor

qliphy commented Jun 18, 2021

+1

6 participants