python3.pkgs.mmcv: init at 1.7.1 #219815
Conversation
Force-pushed from c4ba1d7 to 567a01e
buildPythonPackage rec {
  pname = "mmcv";
  version = "1.7.0";
1.7.1 was released on Jan 3rd, but I wouldn't want to block this MR because of it
That's 3 months ago. If we introduce a new package and it is a trivial update, we should do that right away unless there is a reason not to
updated.
Force-pushed from 03ff9a1 to 3e58cc7
The last commit message doesn't meet the guidelines;
since this is an init PR, I would just squash all the commits.
LDFLAGS = lib.optionalString cudaSupport "-L${cudatoolkit.lib}/lib";

meta = with lib; {
  description = "MMCV is a foundational library for computer vision research";
As per the contribution guidelines, the description should not start with or contain the name of the package.
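For reference, a minimal sketch of a meta block that follows this guideline (the homepage and license values are assumptions to be checked against upstream, not taken from this PR):

```nix
meta = with lib; {
  # Guideline: don't start with or repeat the package name.
  description = "Foundational library for computer vision research";
  homepage = "https://github.com/open-mmlab/mmcv";  # assumed upstream repo
  license = licenses.asl20;                         # assumed; verify against the LICENSE file
};
```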
fixed
Force-pushed from 61a17c0 to c81b004
hash = "sha256-EVu6D6rTeebTKFCMNIbgQpvBS52TKk3vy2ReReJ9VQE=";
};

torch-deps = if cudaSupport then [ torch-bin torchvision-bin ] else [ torch torchvision ];
I don't think the use of torch-bin and torchvision-bin is desirable here. We have torch{With,Without}Cuda, and torch itself can have CUDA enabled or not depending on config.enableCuda or by overriding torch.
I might use torch and torchvision, and let the user override them in a flake file.
If someone might do that, they can always overwrite the torch input to this package
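As a sketch of what that override could look like on the user's side (hypothetical attribute names; it assumes mmcv takes torch and torchvision as inputs, as in this PR):

```nix
# Hypothetical override: swap in the prebuilt binary wheels instead of the
# source-built torch/torchvision, e.g. to avoid a long CUDA compile.
mmcv-bin = python3Packages.mmcv.override {
  torch = python3Packages.torch-bin;
  torchvision = python3Packages.torchvision-bin;
};
```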
fixed.
, pythonOlder
, cudaSupport ? false
, cudatoolkit
, torchCudaArchList ? "Kepler"
I don't think we should have a single package depending on cuda arch settings like this. We already have e.g. config.cudaCapabilities for configuring what architectures cuda-enabled packages are built for.
I will try to use torch/default.nix as an example
done
torch-deps = if cudaSupport then [ torch-bin torchvision-bin ] else [ torch torchvision ];
nativeBuildInputs = [ ninja which ]
  ++ lib.lists.optionals cudaSupport [ cudatoolkit.lib cudatoolkit.out cudatoolkit.cc ] ++ torch-deps;
It shouldn't be necessary for torch/torchvision to be in both nativeBuildInputs and propagatedBuildInputs.
@bcdarwin torch is needed at runtime. Not sure there is another way to do it without propagatedBuildInputs.
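A sketch of the split being suggested (illustrative, not the final derivation): runtime Python dependencies belong in propagatedBuildInputs so they land on the PYTHONPATH of dependents, while build-only tools stay in nativeBuildInputs:

```nix
# Build-time tools only: not needed by consumers of mmcv.
nativeBuildInputs = [ ninja which ];
# Needed at import time, so propagate to anything that depends on mmcv.
propagatedBuildInputs = [ torch torchvision ];
```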
, pytest
, pythonOlder
, cudaSupport ? false
, cudatoolkit
Not that I'd consider this a blocker for this PR, but for future work consider using cudaPackages over cudatoolkit. cudaPackages uses redistributables from NVIDIA and allows for much smaller closures because you choose which libraries you need. It also makes it easier to maintain because we then have greater visibility into what the package is using.
For an example, take a look at
nixpkgs/pkgs/development/python-modules/torchvision/default.nix
Lines 17 to 40 in 7f194c7
inherit (torch) cudaCapabilities cudaPackages cudaSupport;
inherit (cudaPackages) cudatoolkit cudaFlags cudaVersion;

# NOTE: torchvision doesn't use cudnn; torch does!
# For this reason it is not included.
cuda-common-redist = with cudaPackages; [
  cuda_cccl # <thrust/*>
  libcublas # cublas_v2.h
  libcusolver # cusolverDn.h
  libcusparse # cusparse.h
];

cuda-native-redist = symlinkJoin {
  name = "cuda-native-redist-${cudaVersion}";
  paths = with cudaPackages; [
    cuda_cudart # cuda_runtime.h
    cuda_nvcc
  ] ++ cuda-common-redist;
};

cuda-redist = symlinkJoin {
  name = "cuda-redist-${cudaVersion}";
  paths = cuda-common-redist;
};
I break the dependencies into those needed only at build time, run time, or both (common). Unfortunately, because most packages expect all the CUDA libraries to be in a single directory, we need to use symlinkJoin to group them together.
Also, make sure you're specifying the right C/C++ compilers if you're building with CUDA support, like here: #218265. (Torchvision doesn't yet do that; I've got an open issue here: #221898.)
The compiler provided by stdenv isn't necessarily supported by NVCC, and it's possible to get odd symbol errors during linking if the other derivations were built with a different compiler or language standard. That's why it's important to use backendStdenv.cc.
Torchvision and Torch will probably have other things you might want to look at. If you have any questions, don't hesitate to reach out to me or the other CUDA maintainers on Matrix: https://matrix.to/#/#cuda:nixos.org.
Hey @ConnorBaker, thanks a lot for the advice and examples, and thanks @bcdarwin for raising this. I am thinking about using torch.cudaCapabilities as well, but I think there is a valid use case for people who want to override with torch-bin (which doesn't require compilation) but which doesn't have a passthru for cudaCapabilities. Would you suggest that I copy the same pattern from torch/default.nix to get cudaCapabilities?
As for the odd symbol error, I am already getting that with opencv4, as it is compiled with stdenv.cc gcc 12, and cudatoolkit uses gcc 11. I will give it another go.
Also, with the following nix-shell, torchvision now fails to build with CUDA on latest master @ConnorBaker. I assume gcc 12.2 is from stdenv:
with import ../nixpkgs {
  config.allowUnfree = true;
};
let
  mypython =
    let
      packageOverrides = self: super: {
        # packageOverrides affects all packages that use the resulting packages as deps
        torch = super.torch.override { magma = magma-cuda; cudaSupport = true; };
        # opencv4 = super.opencv4.override {
        #   enableBlas = false;
        # };
      };
    in
    python3.override { inherit packageOverrides; self = mypython; };
in
let
  pythonEnv = mypython.withPackages (ps: with ps; [ torchvision ]);
in
mkShell {
  packages = [ pythonEnv ];
}
raise RuntimeError(
RuntimeError: The current installed version of g++ (12.2.0) is greater than the maximum required version by CUDA 11.7 (11.5.0). Please make sure to use an adequate version of g++ (>=6.0.0, <=11.5.0).
/nix/store/c3f4jdwzn8fm9lp72m91ffw524bakp6v-stdenv-linux/setup: line 1593: pop_var_context: head of shell_variables not a function context
but I think there is a valid use case for people who want to override with torch-bin (doesn't require compilation), but doesn't have passthru for cudaCapabilities.
IIRC, newer torch binaries use CUDA redistributables from PyPi. If they haven't moved over to that yet, then their builder is still using bash scripts to manually copy libraries into their binary :(
There's a hardcoded set of supported capabilities for each version of PyTorch. You can find them here: https://github.com/pytorch/pytorch/blob/49444c3e546bf240bed24a101e747422d1f8a0ee/torch/utils/cpp_extension.py#L1751-L1752.
Because it's a binary distribution and not "built" with Nixpkgs, the CUDA capabilities supported by the binary aren't tied to what we specify in our config.
For that reason I'd recommend you use the same logic torch/default.nix has. (Those versions are actually sourced from that same file!)
as for the odd symbol error, I am already getting that with opencv4, as it is compiled with stdenv.cc gcc 12, and cudatoolkit uses gcc 11. I will give it another go.
I have a PR open which should help with that: #221370.
i assume gcc 12.2 is from stdenv
That's right, it is :(
I know that error well. It's from here: https://github.com/pytorch/pytorch/blob/49444c3e546bf240bed24a101e747422d1f8a0ee/torch/utils/cpp_extension.py#L415. (Related discussion on whether checks like that should be removed when included in Nixpkgs: #221564 (comment).)
That's odd that you're getting it though, since the torchvision derivation should correctly set the CC/CXX environment variables to the right compiler:
nixpkgs/pkgs/development/python-modules/torchvision/default.nix
Lines 65 to 72 in e400f93
# NOTE: We essentially override the compilers provided by stdenv because we don't have a hook
# for cudaPackages to swap in compilers supported by NVCC.
+ lib.optionalString cudaSupport ''
  export CC=${backendStdenv.cc}/bin/cc
  export CXX=${backendStdenv.cc}/bin/c++
  export TORCH_CUDA_ARCH_LIST="${lib.concatStringsSep ";" cudaCapabilities}"
  export FORCE_CUDA=1
'';
With respect to your nix file can you try doing something like the following instead? Substitute whatever capability is appropriate for your device (though using just one does make it a lot faster to rebuild stuff from source!).
with import ../nixpkgs {
  config = {
    allowUnfree = true;
    cudaSupport = true;
    cudaCapabilities = [ "8.6" ];
  };
};
I think it was choosing the wrong compiler because cudaSupport wasn't explicitly set to true.
Let me know how that works!
@ConnorBaker with your config, I fixed the issue. thanks a bunch.
@ConnorBaker, your MR on opencv fixed my symbol error.
Yay! Now it just needs to get merged (somewhat difficult given the large number of rebuilds)!
# reason for not using pytestCheckHook is similar to what
# is already mentioned in pkgs/development/python-modules/typed-ast
# test_cnn and test_ops really require GPUs to be useful.
# some of the tests take an exceedingly long time.
# the rest of the tests are disabled due to the sandbox env.
This comment does not justify not using pytestCheckHook at all
I tried to do

preCheck = ''
  PYTHONPATH=$out/${python.sitePackages}:$PYTHONPATH
'';

which seems to work for a lot of packages with C extensions, but it didn't work for me.
Another common way is to rm the conflicting source directory, but pytestCheckHook just invokes pytest as you do; it only builds the arguments to it.
had to find that out the hard way. lol. will add disabled tests like you suggested.
done
pytest --ignore=tests/test_cnn \
  --ignore=tests/test_ops \
  --ignore=tests/test_fileclient.py \
  --ignore=tests/test_load_model_zoo.py \
  --ignore=tests/test_runner/test_checkpoint.py \
  --ignore=tests/test_video/test_processing.py \
  --ignore=tests/test_utils/test_hub.py \
  --ignore=tests/test_video/test_reader.py
Please convert this into disabledTestFiles
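A sketch of that conversion, assuming pytestCheckHook's disabledTestPaths attribute (the hook turns each entry into an --ignore flag for pytest); the paths are the ones from the phase above:

```nix
disabledTestPaths = [
  "tests/test_cnn"              # really requires a GPU
  "tests/test_ops"              # really requires a GPU
  "tests/test_fileclient.py"
  "tests/test_load_model_zoo.py"
  "tests/test_runner/test_checkpoint.py"
  "tests/test_video/test_processing.py"
  "tests/test_utils/test_hub.py"
  "tests/test_video/test_reader.py"
];
```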
done
nativeCheckInputs = [ pytestCheckHook ];

checkInputs = [ lmdb onnx onnxruntime scipy pyturbojpeg tifffile ];
These should almost certainly be moved to nativeCheckInputs.
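A sketch of the suggested change, with the test-only dependencies alongside the hook (same inputs as in the diff above, just moved):

```nix
# All of these are only needed to run the test suite, so they belong
# in nativeCheckInputs together with the pytest hook.
nativeCheckInputs = [
  pytestCheckHook
  lmdb
  onnx
  onnxruntime
  scipy
  pyturbojpeg
  tifffile
];
```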
Done
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/tweag-nix-dev-update-46/26872/1
Force-pushed from eba6834 to 9275745
inherit (torch) cudaCapabilities cudaPackages cudaSupport;
inherit (cudaPackages) backendStdenv cudaVersion;

# for cuda support, we are waiting on opencv: misc CUDA-related updates and fixes; add enableLto #221370
This comment is now out of date.
It was merged into NixOS staging, not master, so it's still relevant.
It's currently in staging-next: https://nixpk.gs/pr-tracker.html?pr=221370
I would suggest just leaving this open until that is merged, which will be very soon.
#221370 has been merged into master. @SuperSandro2000
So you can resolve the comment now?
removed.
LGTM
Btw, nixpkgs-review has a …
@ofborg build python310Packages.mmcv
Description of changes

Things done
- sandbox = true set in nix.conf? (See Nix manual)
- nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
- Tested execution of all binary files (usually in ./result/bin/)