SIGILL in pytorch #115425
Comments
The problem is that some of the Hydra machines are Opteron 6100 machines that do not even support SSE4.1. I don't think it is feasible to compile PyTorch for a pre-SSE4.1 baseline, since at least one of its dependencies (oneDNN) does not support pre-SSE4.2 machines (or at least it did not the last time I tried to fix SIGILL in the oneDNN tests). Also, it would be pretty detrimental for machine learning libraries to compile them and their dependencies against such an old baseline. (Of course, many of the libraries dynamically select kernels based on the instruction set.) I think the preferred solution would be to be able to exclude these Hydra builders based on the platform. I have recently submitted a PR to Nix to add the new platform levels defined in the x86_64 ELF ABI as extra platforms, but I haven't had time yet to look at how this could be adopted in nixpkgs.
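To make the platform-level idea concrete, here is a minimal sketch of how a machine's x86-64 level could be derived from its CPU flags. It assumes the flag names that Linux prints in /proc/cpuinfo and a simplified subset of the features each level requires; the function names `x86_64_level` and `read_cpu_flags` are made up for this illustration and are not part of Nix or nixpkgs.

```python
# Simplified feature sets per x86-64 microarchitecture level, written with
# the flag names Linux uses in /proc/cpuinfo (e.g. "pni" is SSE3, "abm"
# covers LZCNT). Assumption: this is an illustrative subset, not a
# normative copy of the psABI tables.
LEVEL_FLAGS = [
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags):
    """Return the highest level whose required flags are all present.

    Levels are cumulative, so the search stops at the first level that
    is not fully supported. Any x86_64 CPU is at least level 1.
    """
    level = 1
    for candidate, required in LEVEL_FLAGS:
        if not required <= flags:
            break
        level = candidate
    return level


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flag set of the first CPU listed (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()
```

On a Linux builder, `x86_64_level(read_cpu_flags())` would give the level to compare against whatever baseline a package requires. A CPU that lacks `ssse3`, `sse4_1` and `sse4_2` comes out at level 1, which is the situation described above for the Opteron machines.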
Due to NixOS/infra#146 being solved, this should be fixed now.
This is still relevant today, as wendy is yet again failing with SIGILL on pytorch in release-21.05.
This compiles in usually about 2h15m with a 2-core build, but about 10m on a big-parallel machine.
Generally it's nice when stuff works even on older HW, but Opteron machines surely won't be coming back to hydra.nixos.org anymore.
Wendy and Ike are long gone, so closing this issue was overdue.
Describe the bug
wendy cannot build pytorch-lightning on staging-20.09 because of a SIGILL (illegal instruction). This indicates that it, or a dependency, produces a binary that contains instructions that are not available on wendy. It could be caused by a wrong compiler flag or some assembly code that runs unconditionally.
See https://hydra.nixos.org/build/138503219/nixlog/1
pytorch-metric-learning ends with a similar failure: https://hydra.nixos.org/build/138492596/nixlog/1
Perhaps a good next step is to check if pytorch itself is binary reproducible on different hardware.
To Reproduce
Build python38Packages.pytorch-lightning on wendy or similar hardware, nixpkgs da85159
Expected behavior
Build succeeds.
Notify maintainers
@tbenst
@bcdarwin
@danieldk
@teh
@thoughtpolice
@tscholak