python3Packages.torch: Fix performance problem on darwin #220017
Conversation
Awesome! So, was it CoreML and MLCompute that made the difference, compared to the previous run?

Disabling mkldnn makes the big difference.

This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/prs-ready-for-review/3032/1917
```diff
@@ -1,6 +1,6 @@
 { stdenv, lib, fetchFromGitHub, fetchpatch, buildPythonPackage, python,
   cudaSupport ? false, cudaPackages, magma,
-  mklDnnSupport ? true, useSystemNccl ? true,
+  mklDnnSupport ? stdenv.isLinux, useSystemNccl ? true,
```
Hmm, this poses the question of whether we should touch `x86_64-darwin`, if we were only looking at `aarch64-darwin` so far.

- Does `x86_64-darwin` lose performance too, with mkldnn on?
- If there's an Apple-specific library that provides the same functionality as mkldnn, and that pytorch ends up consuming in your build, we should probably always use it.
I'm looking at `wheel-py3_10-cpu-build/10_Build PyTorch binary.txt` from one of the x86 darwin actions upstream; they seem to have these flags set:

```
-- USE_MKL : ON
-- USE_MKLDNN : ON
-- USE_MKLDNN_ACL : OFF
-- USE_MKLDNN_CBLAS : OFF
```
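For the record, those switches come straight from the cmake configure summary, so anyone with a local build log can pull them out with a quick filter. A small sketch (the log file name and contents below are stand-ins, not the real upstream log):

```shell
# Stand-in for the cmake summary section of a PyTorch build log
# (a real log has many more lines; this file name is invented for the demo).
cat > build-summary.log <<'EOF'
-- USE_MKL : ON
-- USE_MKLDNN : ON
-- USE_MKLDNN_ACL : OFF
-- USE_MKLDNN_CBLAS : OFF
EOF
# Filter just the MKLDNN-related switches.
grep 'USE_MKLDNN' build-summary.log
```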
Ultimately, if we can manually check that, with the rest of your change merged, `mklDnnSupport = false` doesn't harm performance on `x86_64-darwin`, I think we're good to keep `isLinux`.
Also, let's add a comment referring to this PR and explaining that MKLDNN negatively impacts performance on M1
Yes, please add a comment, otherwise this gets forgotten again.
> Also, let's add a comment referring to this PR and explaining that MKLDNN negatively impacts performance on M1
Done in the new push.
To anyone possessing an `x86_64-darwin` machine:

```console
❯ cat << EOF > pianotrans-tests.nix
let
  # github:azuwis/nixpkgs/62b4df51ee3c01968ac2ac30831b1f989ab917f9
  nixpkgs = builtins.getFlake "github:azuwis/nixpkgs/torch";
  pkgs = import nixpkgs { system = "x86_64-darwin"; };
  mkPy = mklDnnSupport: pkgs.python3.override {
    packageOverrides = self: super: {
      torch = super.torch.override { inherit mklDnnSupport; };
    };
  };
in
{
  mklDnnOn = pkgs.pianotrans.override { python3 = mkPy true; };
  mklDnnOff = pkgs.pianotrans.override { python3 = mkPy false; };
}
EOF
❯ nix eval -f pianotrans-tests.nix mklDnnOff.outPath
"/nix/store/6zlk47rgzv6vck640vl5qi0fqmi4n18c-pianotrans-1.0.1"
❯ nix eval -f pianotrans-tests.nix mklDnnOn.outPath
"/nix/store/58mxcypwkn2idk89d6z9as66pm28j08h-pianotrans-1.0.1"
❯ wget https://github.com/azuwis/pianotrans/raw/master/test/cut_liszt.opus
❯ nix shell -f pianotrans-tests.nix mklDnnOn --command pianotrans cut_liszt.opus
...
Transcribe time: XX.XXX s
...
❯ nix shell -f pianotrans-tests.nix mklDnnOff --command pianotrans cut_liszt.opus
...
Transcribe time: YY.YYY s
...
```
I've made a GitHub action to test x86_64-darwin: https://github.com/azuwis/pianotrans/actions/runs/4373108055/jobs/7650862384

The results:

- Torch-bin:
- Enable mklDnn, disable Accelerate.framework (current nixpkgs):
- Disable mklDnn, enable Accelerate.framework:
- Enable mklDnn, enable Accelerate.framework:

We should enable mklDnn and enable Accelerate.framework for x86_64-darwin, but the performance of torch is still not as good as torch-bin.
@azuwis amazing! To finish the comparison against
It negatively impacts performance; disabling it is also what the official pytorch build does. In my test on a MacBook Pro M1 2020, using [pianotrans][1] to transcribe [cut_liszt.opus][2], transcription time:

- Enable mkldnn and disable Accelerate.framework: ~88s
- Disable mkldnn and disable Accelerate.framework: ~21s
- Disable mkldnn and enable Accelerate.framework: ~9s

The final result is close to using torch-bin. See also NixOS#219104

[1]: https://github.com/azuwis/pianotrans
[2]: https://github.com/azuwis/pianotrans/raw/master/test/cut_liszt.opus
Please review the new changes.
I've tried overriding blas in https://github.com/azuwis/pianotrans/blob/61b1e61b9eaea4f926e9bf4f265bdb826e1f11da/flake.nix#L15-L17, but the build failed on both x86_64-linux and x86_64-darwin: https://github.com/azuwis/pianotrans/actions/runs/4382622983/jobs/7671881870
Turns out I can't see the workflow logs if I'm not logged in in the browser. But I see that you override blas globally. Just in case the build fails before torch (in some of the dependencies): it could be sufficient for the test to pass the mkl blas only to torch (we did that on x86 Linux in the linked issue once, and it worked).
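A sketch of what scoping the override to torch alone might look like, assuming torch still exposes a `blas` argument (as in the linked x86 Linux experiment); untested:

```nix
# Hypothetical overlay: hand mkl only to torch, leaving the global
# blas attribute untouched so unrelated dependencies keep building.
final: prev: {
  python3 = prev.python3.override {
    packageOverrides = pyFinal: pyPrev: {
      torch = pyPrev.torch.override { blas = prev.mkl; };
    };
  };
}
```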
@wegank Can you please have a look at this PR? I think it's ready.
Ok, AFAIU
@ofborg build python3Packages.torch |
@SomeoneSerge I can confirm the performance difference between torch and torch-bin on Linux is mkl. I did a test on x86_64-linux:

- Torch:
- Torch-bin:
- Torch with mkl by override:
- Torch with mkl by using LD_LIBRARY_PATH:

The LD_LIBRARY_PATH method is mentioned in nixpkgs/doc/using/overlays.chapter.md, lines 112 to 116 in abb2ade.

Is there a way to use torch with mkl without recompiling?

Another thing I noticed is that mkl in nixpkgs is not distributed because it's unfree. Yes, mkl is unfree, but redistribution is allowed according to https://www.intel.com/content/www/us/en/developer/articles/tool/onemkl-license-faq.html.

Debian also redistributes mkl (https://packages.debian.org/source/sid/intel-mkl, in the non-free section). I wonder if nixpkgs can also redistribute mkl and make using torch a better experience.
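For reference, the LD_LIBRARY_PATH recipe in the overlays chapter looks roughly like this (reproduced approximately; the doc demonstrates it with octave rather than torch):

```console
# Put mkl's lib dir first on the dynamic linker's search path, so the
# BLAS/LAPACK sonames resolve to mkl at run time without a rebuild.
❯ LD_LIBRARY_PATH=$(nix-build -A mkl)/lib${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH nix-shell -p octave --run octave
```

This only works for packages that link the BLAS/LAPACK interface dynamically, which is exactly the doubt raised in the next comment about torch linking bits of MKL statically.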
I would assume torch links bits of MKL statically, and there are probably more differences.
Yes... One (but not the sole) nitpick with redistribution is that we use patchelf, which technically modifies the binaries. For what people say about this, cf. e.g. https://discourse.nixos.org/t/petition-to-build-and-cache-unfree-packages-on-cache-nixos-org/17440
Description of changes
Enable Accelerate.framework, disable mkldnn.
In my test on a MacBook Pro M1 2020, using pianotrans to transcribe cut_liszt.opus, transcription time:

- Without this patch: ~88s
- Disable mkldnn: ~21s
- Disable mkldnn and enable Accelerate.framework: ~9s
The final result is close to using torch-bin.
See also #219104
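For anyone wanting to reproduce the comparison after this lands, flipping the flag back should be a one-line override (sketch; the argument name is taken from the diff above):

```nix
# Hypothetical: re-enable mkldnn on darwin to reproduce the slow path.
python3Packages.torch.override { mklDnnSupport = true; }
```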
Things done

- `sandbox = true` set in `nix.conf`? (See Nix manual)
- Tested with `nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"`. Note: all changes have to be committed; also see nixpkgs-review usage.
- Tested basic functionality of all binary files (usually in `./result/bin/`)