Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python3.pkgs.mmcv: init at 1.7.1 #219815

Merged
merged 1 commit into from Apr 30, 2023
Merged

Conversation

benxiao
Copy link
Contributor

@benxiao benxiao commented Mar 6, 2023

Description of changes
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.


buildPythonPackage rec {
pname = "mmcv";
version = "1.7.0";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.7.1 was released on Jan 3rd, but I wouldn't want to block this MR because of it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's 3 months ago. If we introduce a new package and it is a trivial update, we should do that right away unless there is a reason not to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

@benxiao benxiao requested a review from kirillrdy March 10, 2023 05:28
@benxiao benxiao force-pushed the add-mmcv branch 2 times, most recently from 03ff9a1 to 3e58cc7 Compare March 10, 2023 05:36
Copy link
Member

@kirillrdy kirillrdy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last commit message doesnt meet guidelines,

since this is an init PR, I would just squash all the commits

pkgs/development/python-modules/mmcv/default.nix Outdated Show resolved Hide resolved
LDFLAGS = lib.optionalString cudaSupport "-L${cudatoolkit.lib}/lib";

meta = with lib; {
description = "MMCV is a foundational library for computer vision research";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per contribution guidelines, description should not start or contain name of the package

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

hash = "sha256-EVu6D6rTeebTKFCMNIbgQpvBS52TKk3vy2ReReJ9VQE=";
};

torch-deps = if cudaSupport then [ torch-bin torchvision-bin ] else [ torch torchvision ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the use of torch-bin and torchvision-bin is desirable here. We have torch{With,Without}Cuda and also torch itself can have cuda enabled or not depending on config.enableCuda or overriding torch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might use torch torch vision, and let the user overwrite it in a flake file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone might do that, they can always overwrite the torch input to this package

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

, pythonOlder
, cudaSupport ? false
, cudatoolkit
, torchCudaArchList ? "Kepler"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should have a single package depending on cuda arch settings like this. We already have e.g. config.cudaCapabilities for configuring what architectures cuda-enabled packages are built for.

Copy link
Contributor Author

@benxiao benxiao Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to use torch/default.nix as an example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


torch-deps = if cudaSupport then [ torch-bin torchvision-bin ] else [ torch torchvision ];
nativeBuildInputs = [ ninja which ]
++ lib.lists.optionals cudaSupport [ cudatoolkit.lib cudatoolkit.out cudatoolkit.cc ] ++ torch-deps;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it shouldn't be necessarily for torch/torchvision to be in nativeBuildInputs and propagatedBuildInputs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bcdarwin torch is needed at runtime. not sure there is another way to do it without propagatedBuildInputs

, pytest
, pythonOlder
, cudaSupport ? false
, cudatoolkit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I'd consider this a blocker for this PR, but for future work consider using cudaPackages over cudatoolkit. cudaPackages uses redistributables from NVIDIA and allows for much smaller closures because you choose which libraries you need. It also makes it easier to maintain because we then have greater visibility into what the package is using.

For an example, take a look at

inherit (torch) cudaCapabilities cudaPackages cudaSupport;
inherit (cudaPackages) cudatoolkit cudaFlags cudaVersion;
# NOTE: torchvision doesn't use cudnn; torch does!
# For this reason it is not included.
cuda-common-redist = with cudaPackages; [
cuda_cccl # <thrust/*>
libcublas # cublas_v2.h
libcusolver # cusolverDn.h
libcusparse # cusparse.h
];
cuda-native-redist = symlinkJoin {
name = "cuda-native-redist-${cudaVersion}";
paths = with cudaPackages; [
cuda_cudart # cuda_runtime.h
cuda_nvcc
] ++ cuda-common-redist;
};
cuda-redist = symlinkJoin {
name = "cuda-redist-${cudaVersion}";
paths = cuda-common-redist;
};
.

I break the dependencies into those needed only at build time, run time, or both (common). Unfortunately, because most packages expect all the CUDA libraries to be in a single directly, we need to use symlinkJoin to group them together.

Also, make sure you're specifying the right C/C++ compilers if you're building with CUDA support, like here: #218265. (Torchvision doesn't yet do that; I've got an open issue here: #221898.)

The compiler provided by stdenv isn't necessarily supported by NVCC, and it's possible to get odd symbol errors during linking if the other derivations were built with a different compiler or language standard. That's why it's important to use backendStdenv.cc.

Torchvision and Torch will probably have other things you might want to look at. If you have any questions, don't hesitate to reach out to me or the other CUDA maintainers on Matrix: https://matrix.to/#/#cuda:nixos.org.

Copy link
Contributor Author

@benxiao benxiao Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @ConnorBaker , thanks a lot for the advices and examples. thanks @bcdarwin for raising this. I am thinking about using torch.cudaCapabilities as well. but I think there is a valid usecase for people who want to overwrite with torch-bin (doesn't require compilation), but doesn't have passthru for cudaCapabilities. would you suggest that I copy the same pattern from torch/default.nix to get cudaCababilities?

as for the odd symbol error, I am already getting that with opencv4, as it is compiled with stdenv.cc gcc 12, and cudatoolkit uses gcc 11. I will give it another go.

Copy link
Contributor Author

@benxiao benxiao Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also. with following nix-shell, the torchvision now fails to build with cuda on latest master @ConnorBaker. i assume gcc 12.2 is from stdenv

with import ../nixpkgs {
  config.allowUnfree = true;
 };
let
  mypthon =
    let
      packageOverrides = self: super: {
        # packagesOverride, will affect all packages that uses the resulted packages as deps
        torch = super.torch.override{ magma = magma-cuda; cudaSupport = true; };
        # opencv4 = super.opencv4.override {
        #   enableBlas = false;
        # };
      };
    in
    python3.override { inherit packageOverrides; self = mypthon; };
in
let
  pythonEnv = mypthon.withPackages (ps: with ps; [ torchvision ]);
in
mkShell {
  packages = [ pythonEnv  ];
}
    


 raise RuntimeError(
RuntimeError: The current installed version of g++ (12.2.0) is greater than the maximum required version by CUDA 11.7 (11.5.0). Please make sure to use an adequate version of g++ (>=6.0.0, <=11.5.0).
/nix/store/c3f4jdwzn8fm9lp72m91ffw524bakp6v-stdenv-linux/setup: line 1593: pop_var_context: head of shell_variables not a function context
        

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I think there is a valid usecase for people who want to overwrite with torch-bin (doesn't require compilation), but doesn't have passthru for cudaCapabilities.

IIRC, newer torch binaries use CUDA redistributables from PyPi. If they haven't moved over to that yet, then their builder is still using bash scripts to manually copy libraries into their binary :(

There's a hardcoded set of supported capabilities for each version of PyTorch. You can find them here: https://github.com/pytorch/pytorch/blob/49444c3e546bf240bed24a101e747422d1f8a0ee/torch/utils/cpp_extension.py#L1751-L1752.

Because it's a binary distributions and not "built" with Nixpkgs, the CUDA capabilities supported by the binary aren't tied to what we specify in our config.

For that reason I'd recommend you use the same logic torch/default.nix has. (Those versions are actually sourced from that same file!)

as for the odd symbol error, I am already getting that with opencv4, as it is compiled with stdenv.cc gcc 12, and cudatoolkit uses gcc 11. I will give it another go.

I have a PR open which should help with that: #221370.

i assume gcc 12.2 is from stdenv

That's right, it is :(

I know that error well. It's from here: https://github.com/pytorch/pytorch/blob/49444c3e546bf240bed24a101e747422d1f8a0ee/torch/utils/cpp_extension.py#L415. (Related discussion on whether checks like that should be removed when included in Nixpkgs: #221564 (comment).)

That's odd that you're getting it though, since the torchvision derivation should correctly set the CC/CXX environment variables to the right compiler:

# NOTE: We essentially override the compilers provided by stdenv because we don't have a hook
# for cudaPackages to swap in compilers supported by NVCC.
+ lib.optionalString cudaSupport ''
export CC=${backendStdenv.cc}/bin/cc
export CXX=${backendStdenv.cc}/bin/c++
export TORCH_CUDA_ARCH_LIST="${lib.concatStringsSep ";" cudaCapabilities}"
export FORCE_CUDA=1
'';
.

With respect to your nix file can you try doing something like the following instead? Substitute whatever capability is appropriate for your device (though using just one does make it a lot faster to rebuild stuff from source!).

with import ../nixpkgs {
  config = {
    allowUnfree = true;
    cudaSupport = true;
    cudaCapabilities = [ "8.6" ];
  };
};

I think it was choosing the wrong compiler because cudaSupport wasn't explicitly set to true.

Let me know how that works!

Copy link
Contributor Author

@benxiao benxiao Mar 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ConnorBaker with your config, I fixed the issue. thanks a bunch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ConnorBaker, your MR on opencv fixed my symbol error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay! Now it just needs to get merged (somewhat difficult given the large number of rebuilds)!

Comment on lines 50 to 83
# reason for not using pytestCheckHook is similiar to what
# is already mentioned in pkgs/development/python-modules/typed-ast
# test_cnn test_ops really requires gpus to be useful.
# some of the tests take exceedingly long time.
# the rest of the tests are disabled due to sandbox env.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment does not justify not using pytestCheckHook at all

Suggested change
# reason for not using pytestCheckHook is similiar to what
# is already mentioned in pkgs/development/python-modules/typed-ast
# test_cnn test_ops really requires gpus to be useful.
# some of the tests take exceedingly long time.
# the rest of the tests are disabled due to sandbox env.

Copy link
Contributor Author

@benxiao benxiao Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do

  preCheck = ''
    PYTHONPATH=$out/${python.sitePackages}:$PYTHONPATH
  ''

which seems to work with a lot of packages with c extension, but it didn't work for me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another common way is to rm the the conflicting source directory but pytestCheckHook does just invoke pytest as you do, it only builds the arguments to it.

Copy link
Contributor Author

@benxiao benxiao Apr 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had to find that out the hard way. lol. will add disabled tests like you suggested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pkgs/development/python-modules/mmcv/default.nix Outdated Show resolved Hide resolved
Comment on lines 56 to 90
pytest --ignore=tests/test_cnn \
--ignore=tests/test_ops \
--ignore=tests/test_fileclient.py \
--ignore=tests/test_load_model_zoo.py \
--ignore=tests/test_runner/test_checkpoint.py \
--ignore=tests/test_video/test_processing.py \
--ignore=tests/test_utils/test_hub.py \
--ignore=tests/test_video/test_reader.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please convert this into disabledTestFiles

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

hash = "sha256-EVu6D6rTeebTKFCMNIbgQpvBS52TKk3vy2ReReJ9VQE=";
};

torch-deps = if cudaSupport then [ torch-bin torchvision-bin ] else [ torch torchvision ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone might do that, they can always overwrite the torch input to this package


buildPythonPackage rec {
pname = "mmcv";
version = "1.7.0";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's 3 months ago. If we introduce a new package and it is a trivial update, we should do that right away unless there is a reason not to


nativeCheckInputs = [ pytestCheckHook ];

checkInputs = [ lmdb onnx onnxruntime scipy pyturbojpeg tifffile ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should almost certainly be moved to nativeCheckInputs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-46/26872/1

inherit (torch) cudaCapabilities cudaPackages cudaSupport;
inherit (cudaPackages) backendStdenv cudaVersion;

# for cuda support, we are waiting on opencv: misc CUDA-related updates and fixes; add enableLto #221370
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is now out of date.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was merged in Nixos:Staging not master. so its still relevant

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's currently in staging-next https://nixpk.gs/pr-tracker.html?pr=221370

I would suggest to just leave this open until that is merged which will be very soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#221370 has been merged into master. @SuperSandro2000

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you can resolve the comment now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

@bcdarwin
Copy link
Member

bcdarwin commented Apr 6, 2023

Results of nixpkgs-review on x86_64-NixOS:

2 packages built:
python310Packages.mmcv python310Packages.mmcv.dist

Copy link
Member

@bcdarwin bcdarwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SomeoneSerge
Copy link
Contributor

Results of nixpkgs-review on x86_64-NixOS:

Btw, nixpkgs-review has a --post-result option that uses gh cli to post a comment containing the exact command you used to run it. Makes it easier to reproduce the results

@SuperSandro2000
Copy link
Member

@ofborg build python310Packages.mmcv

@SuperSandro2000 SuperSandro2000 merged commit cfb4d61 into NixOS:master Apr 30, 2023
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

8 participants