Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zig 0.10.1: SIGILL on SSE4a instruction insertq #214356

Open
jshholland opened this issue Feb 3, 2023 · 16 comments
Open

zig 0.10.1: SIGILL on SSE4a instruction insertq #214356

jshholland opened this issue Feb 3, 2023 · 16 comments

Comments

@jshholland
Copy link
Contributor

jshholland commented Feb 3, 2023

Describe the bug

Using Zig version 0.10.1 from 006c3bd, certain operations (zig build, zig run, zig fmt on some files, possibly others) crash with SIGILL. A bit of debugging reveals it's the SSE4a-only instruction insertq (unavailable on Intel chips according to wikipedia).

Steps To Reproduce

Steps to reproduce the behavior:
1.

$ cat > hello.zig << EOF
const std = @import("std");

pub fn main() anyerror!void {
    std.debug.print("Hello, world!", .{});
}
EOF
  1. $ nix run github:NixOS/nixpkgs/006c3bd4dd2f5d1d2094047f307cbf9e2b73d9c5#zig -- run hello.zig

Expected behavior

Hello, world!

Actual behaviour

zsh: illegal hardware instruction (core dumped)  nix run github:NixOS/nixpkgs/006c3bd4dd2f5d1d2094047f307cbf9e2b73d9c5#zig -- 

Screenshots

screenshot-20230203-115428

Additional context

Presumably happening because Zig 0.10 is the first to include the self-hosted compiler, which compiles with -march=native by default. See also #185665.

$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 142
model name	: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
stepping	: 12
microcode	: 0xde
cpu MHz		: 2112.003
cache size	: 8192 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves arat md_clear flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb
bogomips	: 4224.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 45 bits physical, 48 bits virtual
power management:
<snip>

Notify maintainers

@aiotter @andrewrk @AndersonTorres @strager

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

$ nix-shell -p nix-info --run "nix-info -m"
this path will be fetched (0.00 MiB download, 0.00 MiB unpacked):
  /nix/store/rmhwi0jcya8f87gzk2jjdwv4hifmmbb4-nix-info
copying path '/nix/store/rmhwi0jcya8f87gzk2jjdwv4hifmmbb4-nix-info' from 'https://cache.nixos.org'...
 - system: `"x86_64-linux"`
 - host os: `Linux 6.1.7, NixOS, 23.05 (Stoat)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.12.0`
 - channels(root): `"nixos-21.11pre295280.fa0326ce523"`
 - channels(josh): `""`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
@strager
Copy link
Contributor

strager commented Feb 3, 2023

Thanks for reporting! I am investigating this.

@strager
Copy link
Contributor

strager commented Feb 3, 2023

On my Zen 3 machine, Nix' Zig works fine. I can reproduce the bug by using qemu:

$ nix --extra-experimental-features flakes --extra-experimental-features nix-command run github:NixOS/nixpkgs/006c3bd4dd2f5d1d2094047f307cbf9e2b73d9c5#zig -- build-exe -O ReleaseFast hello.zig  
$ qemu-x86_64-static -cpu Broadwell-v2 ./hello
qemu: uncaught target signal 4 (Illegal instruction) - core dumped
zsh: illegal hardware instruction (core dumped)  qemu-x86_64-static -cpu Broadwell-v2 ./hello

Adding -mcpu baseline to the zig build-exe command makes it run without error under qemu.

Now to figure out a good fix...

@strager
Copy link
Contributor

strager commented Feb 4, 2023

Proposed fix: #214440

@strager strager self-assigned this Feb 4, 2023
@sagehane
Copy link
Contributor

sagehane commented Feb 4, 2023

I think I hit this too. I'm on NixOS Unstable and I keep getting Illegal instruction (core dumped) when trying to compile Zig projects with 0.10.1.

On my laptop with an Intel processor (i5-1135G7), lscpu | grep sse4 doesn't seem to show matches for sse4a.

Trying to build a package like (river.override { xwaylandSupport = false; }) gives me the following build log:

@nix { "action": "setPhase", "phase": "unpackPhase" }
unpacking sources
unpacking source archive /nix/store/5qlxxp9vmfi64bwc3xbk57mzg90yzwy2-source
source root is source
@nix { "action": "setPhase", "phase": "patchPhase" }
patching sources
@nix { "action": "setPhase", "phase": "buildPhase" }
building
no Makefile or custom buildPhase, doing nothing
@nix { "action": "setPhase", "phase": "installPhase" }
installing
/nix/store/3yfs41f4b60jya2gk6xikx4s97zsxjr0-stdenv-linux/setup: line 1574:    22 Illegal instruction     (core dumped) zig build -Drelease-safe -Dcpu=baseline -Dman-pages --prefix $out install

The substitute from the repo works fine.

@edrex
Copy link
Contributor

edrex commented Feb 7, 2023

I am seeing SIGILL running zig build on #215151 with zig 0.10.1 in nixpkgs on an i7-7500U (7th gen Kaby Lake). Build succeeds with upstream release build from https://ziglang.org/download/. Is this the same bug?

@edrex
Copy link
Contributor

edrex commented Feb 7, 2023

(same as sagehane)

@sagehane
Copy link
Contributor

sagehane commented Feb 7, 2023

I did talk to @ifreund over River's IRC room and he did say this:

guessing that the real problem there is nix not setting -DZIG_TARGET_MCPU=baseline when building zig

I presume this would be a different from the approach given in #214440?
If my understanding is correct, that PR attempts to solve it by passing -mcpu to zig build-exe while this would make it so that the Zig binary itself is less platform-dependent?

@strager
Copy link
Contributor

strager commented Feb 7, 2023

I presume this would be a different from the approach given in #214440?

Yes, -DZIG_TARGET_MCPU=baseline is a different approach which would only affect the compiler, not things the compiler builds. (I.e. the compiler would still nondeterministic.)

If my understanding is correct, that PR attempts to solve it by passing -mcpu to zig build-exe while this would make it so that the Zig binary itself is less platform-dependent?

#214440 affects the Zig compiler itself too.

@ifreund
Copy link

ifreund commented Feb 8, 2023

Regardless of how the discussion in #214440 ends up, a patch that passes -DZIG_TARGET_MCPU=baseline when building the zig compiler would fix the originally reported immediate issue here.

I don't use or contribute to nixos personally, but from my outside perspective it seems sensible to merge that simple fix now while the discussion on the longer term direction to take on #214440 continues. I personally think some kind of helper function to automatically pass -Dcpu=baseline is the better solution there fwiw.

Changing the defaults compared to upstream zig for all users of nix seems very heavy handed and will cause confusion long term.

@AndersonTorres
Copy link
Member

Sometimes it is useful to use a "raw" zig.exe, as in ordinary development, whereas a full injected zig with "patches" for determinism is useful for Nixpkgs.

@edrex
Copy link
Contributor

edrex commented Feb 8, 2023

Thanks @ifreund. It seems that is what is missing. I will test this when i have my laptop open next.

@andrewrk
Copy link
Member

andrewrk commented Feb 9, 2023

Hi all, I'm away at a conference and do not have time to give this issue my full and complete attention, but I wanted to note that upstream is willing to add a feature or enhancement to make this situation easier for Nix, as well as other package distributions, to deal with.

In particular we have this upstream issue: ziglang/zig#14281
It is focused a bit on trying to make sure the language package manager will not interfere with system package manager, but perhaps it is also related to helping make sure the desired target triple is used (baseline in this case).

Perhaps, for example, it could be a flag passed to zig build that nix always provides, that would change the default target to baseline instead of host. Something like --system-distribution-mode. My idea behind this flag would be that it disallows impurities of any kind - networking, host target information, or anything of the sort. This way, instead of a patch that tricks Zig into doing something useful, instead Nix could communicate to Zig what it wants, and Zig could fully and intentionally cooperate.

Let us discuss this possible feature, if it is amenable, and then I will go implement it upstream.

winterqt added a commit to winterqt/nixpkgs that referenced this issue Feb 17, 2023
26b9a2f changes Zig 0.10 to build the
compiler (notably *not* its outputs, at least not by default) with
its baseline CPU target, but we should ideally do it for both versions
to increase reproducibility, as well as increase the number of users who
are able to use Hydra-provided Zig binaries.

This also adds a comment above the flag in 0.10, to explain why we're adding
the flag, as we do with the RPATH one.

See NixOS#214356 and NixOS#185665
for further context.
@winterqt
Copy link
Member

winterqt commented Feb 17, 2023

The issue discussed here (that the compiler builds themselves cannot be executed) has been fixed for Zig 0.10 and 0.9 in #215994 and #216736 respectively, but I'm going to be leaving this issue open (for now) due to the ongoing conversations. (We could also just move to #185665, shrug.)

davidak pushed a commit to hesiod/nixpkgs that referenced this issue Feb 18, 2023
26b9a2f changes Zig 0.10 to build the
compiler (notably *not* its outputs, at least not by default) with
its baseline CPU target, but we should ideally do it for both versions
to increase reproducibility, as well as increase the number of users who
are able to use Hydra-provided Zig binaries.

This also adds a comment above the flag in 0.10, to explain why we're adding
the flag, as we do with the RPATH one.

See NixOS#214356 and NixOS#185665
for further context.
@edrex
Copy link
Contributor

edrex commented Feb 18, 2023

Fixed by #215994, and #216736 fixed equivalent issue with zig_0_9. Confirm fixed and close?

@winterqt
Copy link
Member

@edrex Please read my above comment :) unless you think we should move to the other issue?

@edrex
Copy link
Contributor

edrex commented Feb 18, 2023

I think the reported issue here has been fixed, and #185665 sums up the remaining issue quite well, so +1 to close once fix is verified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants