SonicTriton tests with Singularity #31616

Merged: 7 commits, Oct 2, 2020


kpedro88
Contributor

@kpedro88 kpedro88 commented Sep 29, 2020

PR description:

SonicTriton tests can now run on either Singularity or Docker. This makes the test setup more accessible (since Docker typically requires superuser permission). An augmented Docker image with PyTorch libraries included is now hosted on DockerHub at https://hub.docker.com/repository/docker/fastml/triton-torchgeo, while the Singularity version of that image is automatically generated and hosted on /cvmfs/unpacked.cern.ch thanks to https://gitlab.cern.ch/unpacked/sync.

A unified script triton is introduced to handle the server for both the Docker and Singularity cases. It also handles using GPU instead of CPU, verbosity, waiting for the server to actually start, and other details. The documentation is updated accordingly.
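As a rough illustration of two of the jobs described above (choosing between Docker and Singularity, and waiting for the server to actually start), here is a minimal Python sketch. It is not the triton script from this PR: the function names pick_runtime and wait_until_ready, the default of Singularity, and the timeout values are all assumptions made for the example, and the readiness check is left as a caller-supplied function because the script's real check is not shown here.

```python
import shutil
import time


def pick_runtime(use_docker=False, which=shutil.which):
    """Return the container runtime name, or raise if it is not on PATH.

    Defaulting to Singularity is an assumption made for this sketch.
    """
    name = "docker" if use_docker else "singularity"
    if which(name) is None:
        raise RuntimeError(f"{name} not found on PATH")
    return name


def wait_until_ready(is_ready, timeout=60.0, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or the timeout expires.

    Returns True if the server became ready, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Here is_ready can be any zero-argument callable, for example one that probes the server's port. Passing clock and sleep as parameters keeps the polling loop easy to exercise without real delays.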

(This PR also fixes some alarmingly fast link rot in the tests... hopefully Nvidia is done renaming the triton inference server repository.)

PR validation:

Tested on several different machines, and had some other users test as well.

@cmsbuild
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-31616/18681

  • This PR adds an extra 20KB to repository

@cmsbuild
Contributor

A new Pull Request was created by @kpedro88 (Kevin Pedro) for master.

It involves the following packages:

HeterogeneousCore/SonicTriton

@makortel, @cmsbuild, @fwyzard can you please review it and eventually sign? Thanks.
@makortel, @rovere this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@kpedro88
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Sep 29, 2020

The tests are being triggered in jenkins.

@cmsbuild
Contributor

+1
Tested at: 576f4d7
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fb2074/9649/summary.html
CMSSW: CMSSW_11_2_X_2020-09-29-1100
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Contributor

Comparison job queued.

@cmsbuild
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fb2074/9649/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 2542225
  • DQMHistoTests: Total failures: 13
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2542190
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 34 files compared)
  • Checked 149 log files, 22 edm output root files, 35 DQM output files

@kpedro88
Contributor Author

kpedro88 commented Oct 2, 2020

@makortel @fwyzard any comments?

@makortel
Contributor

makortel commented Oct 2, 2020

The fetch_model.sh is not run by PR or IB tests, right?

@kpedro88
Contributor Author

kpedro88 commented Oct 2, 2020

No, not currently. (At some point, we'll try to introduce a unit test using the local CPU server, but this is not ready at present.)

@makortel
Contributor

makortel commented Oct 2, 2020

+1

@cmsbuild
Contributor

cmsbuild commented Oct 2, 2020

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@silviodonato
Contributor

+1

@cmsbuild cmsbuild merged commit bdc0699 into cms-sw:master Oct 2, 2020
4 participants