SonicTriton tests with Singularity #31616
The code-checks are being triggered in jenkins.
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-31616/18681
A new Pull Request was created by @kpedro88 (Kevin Pedro) for master. It involves the following packages: HeterogeneousCore/SonicTriton. @makortel, @cmsbuild, @fwyzard can you please review it and eventually sign? Thanks. cms-bot commands are listed here
please test
The tests are being triggered in jenkins.
+1
Comparison job queued.
Comparison is ready Comparison Summary:
The |
No, not currently. (At some point, we'll try to introduce a unit test using the local CPU server, but this is not ready at present.)
+1
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)
+1 |
PR description:
SonicTriton tests can now run on either Singularity or Docker. This makes the test setup more accessible (since Docker typically requires superuser permission). An augmented Docker image with PyTorch libraries included is now hosted on DockerHub at https://hub.docker.com/repository/docker/fastml/triton-torchgeo, while the Singularity version of that image is automatically generated and hosted on `/cvmfs/unpacked.cern.ch` thanks to https://gitlab.cern.ch/unpacked/sync.
A unified script `triton` is introduced to handle the server for both the Docker and Singularity cases. It also handles using GPU instead of CPU, verbosity, waiting for the server to actually start, and other details. The documentation is updated accordingly.
(This PR also fixes some alarmingly fast link rot in the tests... hopefully Nvidia is done renaming the triton inference server repository.)
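For readers unfamiliar with the two chores mentioned above (picking a container runtime and waiting for the server to actually start), here is a minimal Python sketch of the general idea. It is not the `triton` script from this PR: the helper names `choose_runtime` and `wait_for_port` are hypothetical, the Singularity-first preference is an assumption, and polling a TCP port is only one plausible way to detect that a server is up.

```python
import shutil
import socket
import time


def choose_runtime(which=shutil.which):
    """Return the first container runtime found on PATH.

    Hypothetical helper: prefers Singularity (no superuser needed),
    then falls back to Docker. `which` is injectable for testing.
    """
    for name in ("singularity", "docker"):
        if which(name) is not None:
            return name
    raise RuntimeError("neither singularity nor docker found on PATH")


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once the port accepts a connection, or False if
    `timeout` seconds pass first. This only shows that something is
    listening, not that the server has finished loading its models.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: server not ready yet.
            time.sleep(interval)
    return False
```

A caller would start the container in the background and then call something like `wait_for_port("localhost", 8001)` before launching the client job; the port number here is an assumption and depends on how the server is configured.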
PR validation:
Tested on several different machines, and had some other users test as well.