Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TensorRT packages #178397

Merged
merged 3 commits into from
Aug 5, 2022
Merged

Add TensorRT packages #178397

merged 3 commits into from
Aug 5, 2022

Conversation

aidalgol
Copy link
Contributor

@aidalgol aidalgol commented Jun 21, 2022

Description of changes

Add derivations for TensorRT, a high-performance deep learning interface SDK from NVIDIA, which is at this point non-redistributable. Uses different upstream tarballs depending on the current CUDA version, similar to the cudnn derivations.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.11 Release Notes (or backporting 22.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

Copy link
Member

@SuperSandro2000 SuperSandro2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing meta

pkgs/development/python-modules/tensorrt/default.nix Outdated Show resolved Hide resolved
pkgs/development/python-modules/tensorrt/default.nix Outdated Show resolved Hide resolved
pkgs/development/python-modules/tensorrt/default.nix Outdated Show resolved Hide resolved
pkgs/applications/science/math/tensorrt/default.nix Outdated Show resolved Hide resolved
pkgs/applications/science/math/tensorrt/default.nix Outdated Show resolved Hide resolved
pkgs/applications/science/math/tensorrt/default.nix Outdated Show resolved Hide resolved
@aidalgol
Copy link
Contributor Author

aidalgol commented Jun 21, 2022

@SuperSandro2000 I appreciate the quick feedback, but this is nowhere ready for review. I created this draft PR at the suggestion of @SomeoneSerge, after much back and forth in the Nix CUDA Maintainers Matrix room, to ease collaboration efforts on this package. I am aware of the code quality issues, but I am trying to get this to a state where the derivations will build and the python module finds the necessary libraries at runtime. Once this actually works, I will bring the nix code in line with nixpkgs conventions, and parameterise the version similar to the mathematica package.

@aidalgol aidalgol requested review from a team and removed request for a team June 21, 2022 10:24
@aidalgol aidalgol self-assigned this Jun 21, 2022
@SomeoneSerge
Copy link
Contributor

SomeoneSerge commented Jun 21, 2022

Good! Now let's expose the expression e.g. as an atttribute, e.g. cudaPackages.tensorrt
You'd need to write an extension for cudaPackages namespace.
It's going to be a function of the form final: prev: { tensorrt = final.callPackage ... { }; } where final an instance of cudaPackages.
You'll append it to this list:

composedExtension = composeManyExtensions [

For now, I think you could define it right there, next to cutensorExtension (which you can use as a template, by the way)

Also, as Sandro suggests, it won't hurt to start bringing things into a shape right away: let's add a meta attribute that would declare the unfree license, limit the platforms to (I suppose) x86_64-linux, and provide a link to some changelog s.t. it's easier to track updates at least manually; let's move the version into a let ... in that we can reuse it when forming the filenames

I'd also suggest that you run nixpkgs-fmt prior to commits if you're not doing that yet

@aidalgol
Copy link
Contributor Author

@SomeoneSerge With my latest changes, you should now be able to reproduce the error I was describing on matrix (from before creating this draft PR) by running NIXPKGS_ALLOW_UNFREE=1 nix-build -A python39Packages.tensorrt. I'm not sure whether the checkPhase I have written is appropriate for normal regression testing once we have this working, but it makes testing changes at this stage easier.

@SuperSandro2000
Copy link
Member

If you want to collaborate on very WIP things I would suggest to work in a PR or branch in your fork.

@SomeoneSerge
Copy link
Contributor

@SuperSandro2000 thank you. It explicitly was my suggestion to @aidalgol that we move the discussion from matrix straight into a draft PR in nixpkgs that we can iterate faster. This is a messy non-redist package, which means there's some trivia about manually gluing pieces together to handle in the beginning

With that done, we'll mark the draft as "ready for review" and tag you

Cheers

@aidalgol
Copy link
Contributor Author

@SomeoneSerge With your suggested changes, I am back to where I was before moving this to a nixpkgs branch.

$ NIX_PATH=nixpkgs=$PWD NIXPKGS_ALLOW_UNFREE=1 nix-shell -p 'python310.withPackages (pypkgs: [ pypkgs.tensorrt ])'
$ python3 -c 'import tensorrt; assert tensorrt.Builder(tensorrt.Logger())'
[06/23/2022-12:09:18] [TRT] [E] 6: [libLoader.h::DynamicLibrary::49] Error Code 6: Internal Error (Unable to load library: libnvinfer_builder_resource.so.8.4.0)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: pybind11::init(): factory function returned nullptr

From running the same python3 command under strace, something appears to be looking for libnvinfer_builder_resource.so.8.4.0 in the wrong places:

openat(AT_FDCWD, "/run/opengl-driver/lib/libnvinfer_builder_resource.so.8.4.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/wk3xam5pd0a0jwirggg5zpbjzw8zzaf3-gcc-11.3.0-lib/lib/libnvinfer_builder_resource.so.8.4.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/fz33c1mfi2krpg1lwzizfw28kj705yg0-glibc-2.34-210/lib/libnvinfer_builder_resource.so.8.4.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

I have no idea what is looking for it here, though.

Copy link
Member

@samuela samuela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for showing these packages some love @aidalgol! It's def a step forward.

I'm afraid I may not have enough context to be very useful here, since I don't use TensorRT personally. I just have a few notes re the cudaPackages migration that we've been working on.

Add derivation for TensorRT 8, a high-performance deep learning interface SDK
from NVIDIA, which is at this point non-redistributable.  The current version
aldo requires CUDA 11, so this is left out of the cudaPackages_10* scopes.
@aidalgol aidalgol changed the title Draft: Add TensorRT packages Add TensorRT packages Jun 24, 2022
@aidalgol aidalgol requested a review from a team June 24, 2022 01:06
@aidalgol aidalgol marked this pull request as ready for review June 24, 2022 01:06
Refactor derivation to pick the version that supports the current CUDA version.
Based on the implementation of the same concept in the cudnn derivation.
Hash mismatch for cudnn_8_3_2 discovered when building
cudaPackages_10_2.tensorrt, which depends on this version of cudnn.
Copy link
Member

@samuela samuela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not part of the NVIDIA dev program, so I don't have a means to test building this but the changes LGTM.

Any remaining hold ups on merging this?

@aidalgol
Copy link
Contributor Author

aidalgol commented Aug 5, 2022

Any remaining hold ups on merging this?

Nothing on my end. Just waiting for my last set of changes to be reviewed.

@SuperSandro2000 SuperSandro2000 merged commit 12a7360 into NixOS:master Aug 5, 2022
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-install-a-specific-version-of-cuda-and-cudnn/21725/4

@aidalgol aidalgol deleted the tensorrt branch May 12, 2023 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants