Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Multiple SIGILL occurences with nightly opt-level=1 #1808

Closed
ishitatsuyuki opened this issue Jul 19, 2019 · 19 comments

Comments

@ishitatsuyuki
Copy link

commented Jul 19, 2019

Description

Due to recent changes in rustc regarding the interaction between mem::uninitialized and ManuallyDrop (basically the later no longer being a union), many crates are now directly having the code path nullified by LLVM, getting illegal instruction because they are treated as dead code.

Your Environment

rustc 1.38.0-nightly (78ca1bda3 2019-07-08)
binary: rustc
commit-hash: 78ca1bda3522b14bc0336bc01dd1d49fdba2cda7
commit-date: 2019-07-08
host: x86_64-unknown-linux-gnu
release: 1.38.0-nightly
LLVM version: 8.0
[profile.dev]
opt-level = 1

Causes found so far

@RalfJung

This comment has been minimized.

Copy link

commented Jul 19, 2019

basically the later no longer being a union

Notice that this change happened a long time ago with rust-lang/rust#52711.

@RalfJung

This comment has been minimized.

Copy link

commented Jul 19, 2019

The recent change that likely is causing the new issues is rust-lang/rust#62150. All this does is choose a different strategy for how to implement mem::uninitialized, but one that does exactly the same. However, it leads to slightly different paths being taken in the optimizer, so if code has UB, its behavior can change. This is unavoidable as a part of compiler development.

@RalfJung

This comment has been minimized.

Copy link

commented Jul 19, 2019

Also see my long response at rust-lang/rust#52898 (comment).

@OvermindDL1

This comment has been minimized.

Copy link

commented Jul 19, 2019

Essentially, if this is causing SIGILL's then the code already was UB. By the definition of UB the compiler is allowed to do anything and it should not be surprising for breakages to happen. This is why unsafe code is bad, because it can leak UB out into safe code if not made absolutely perfectly and introducing UB is always, always bad.

@ishitatsuyuki

This comment has been minimized.

Copy link
Author

commented Jul 20, 2019

Essentially, if this is causing SIGILL's then the code already was UB.

IMO the compiler team did pretty poorly with the changes this time. Please keep in mind that mem::uninitialized has been basically impossible to not cause UB, but people had no alternative at the point they wrote such code: MaybeUninit was only stabilized in the latest stable release. Unsafe code has been a field that lacked proper backward compatibility for a long time, because it isn't clear whether a codegen change would lead to large breakage like this.

This is why unsafe code is bad

FFI always involves unsafe, I think at this point your opinions are not relevant to the heart of these issues.

@RalfJung

This comment has been minimized.

Copy link

commented Jul 20, 2019

Unsafe code has been a field that lacked proper backward compatibility for a long time, because it isn't clear whether a codegen change would lead to large breakage like this.

Unsafe code that causes undefined behavior lacks backwards compatibility. We do not commit to supporting buggy programs. There is no feasible way to do this (seemingly harmless changes almost anywhere in the compiler, including things like changing the order in which functions are lowered to LLVM, can have arbitrary effects in programs with UB) and there exists no language with undefined behavior that provides such compatibility. To quote from my other post:

UB is a contract between the programmer and the compiler (and also, somehow, a contract between the programmer and the compiler developers), and if the programmer does not fulfill their side of the contract, the other party will not fulfill their side of the contract either.

So can we maybe stop the useless finger pointing and try to do something constructive?

If you think that helps, you can suggest temporarily reverting rust-lang/rust#62150. That might give people some more time to fix the problems in this code that cause the undefined behavior. There is precedence for this kind of action with the unwind situation. But the code needs to be fixed -- if we cannot agree on the basic responsibilities around UB, I do not think we can have a constructive discussion.

At the same time, I do agree that the situation around mem::uninitialized is a mess and it took us way too long to fix that. That function is a mistake that was made with Rust 1.0, and it took around two years to fully understand how big the mistake is and another two years to fix it. This is what I meant by a "language bug" in the other post: there should have been UB-free ways to do certain things, but there were not. Unfortunately, due to the nature of UB, sometimes programmers find ways to do things that seem to work, but only really "work" due to a happy accident in what the optimizer does and does not do. Whenever that happens, we have to find a balance between giving library authors the possibility to fix their code, and being able to still develop the compiler. The first step here is done; with MaybeUninit finally being stable it is possible for library authors to fix their code.

But requiring Rust 1.36 is not the only possible short-term fix. While it is true that mem::uninitialized causes UB almost any way it is used, there is a big difference between something like mem::uninitialized::<u8>() (which is UB to keep the options open in rust-lang/unsafe-code-guidelines#71); and mem::uninitialized::<&T>() which is UB because LLVM knows that this is garbage data and LLVM also knows that &T can never be NULL (&T has a "niche", it has some impossible values), and together that is under some conditions turned into a trap (SIGILL).

For the vast majority of FFI usecases, the types involved are "C types" that do not have a niche. mem::uninitialized is still UB "in theory" for many types without a niche, but I think we can keep it working "in practice" for some more time to give people time to fix their code. The issue is, some niches are surprising: For example, the mistake in x11-rs was/is to use fn() types in FFI, which are not the same as C function pointer types because they may not be NULL. (It also has to be aligned, but the alignment requirement is trivial on all currently supported platforms.) If you have a function pointer in C, you have to use Option<fn> in Rust. This is documented and also mentioned in the Nomicon but does not seem to be widely known. (Option<fn> still may have a niche, for platforms with non-trivial alignment requirements for function pointers, but currently we support no such platform, so this is "good enough" for temporary solutions until libraries can switch to MaybeUninit.)

The situation is much better with mem::zeroed(), so unless benchmarks show that zero-initializing is too expensive I strongly urge you to use that -- zeroed actually has an agreed-upon well-defined semantics. With zeroed, a type consisting of only raw pointers, integers, Option<&[mut] T> and Option<fn> can be constructed without UB, so this is covered by our stability guarantee.

With all of that out of the way -- the first step, then, to fix a library like this is to find out where it calls mem::zeroed or mem::uninitialized on a type with a niche. Does anyone know?

@ishitatsuyuki

This comment has been minimized.

Copy link
Author

commented Jul 20, 2019

I'm sorry for being emotional. (It blocked me from working on the codebase for several days.) Your detailed writeup was much appreciated.

Back to the path to fixing this issue, I want to mention that specs have already removed the offending code in 0.15, so upgrading it is also an option. It seems that there are several changes that are non-trivial to adopt, though.

@RalfJung

This comment has been minimized.

Copy link

commented Jul 20, 2019

I'm sorry for being emotional. (It blocked me from working on the codebase for several days.) Your detailed writeup was much appreciated.

I feel emotional about this as well, so I think we can both agree to read each others posts with some extra amount of empathy. :)

I want to mention that specs have already removed the offending code in 0.15, so upgrading it is also an option.

Ah, I missed that! Could you link to the patch? Your post above links to the released tag (0.14.3).

So the problem basically is this dependency which incurs a transitive dependency on a broken outdated crossbeam? But that's already a 0.14 so updating to 0.14.3 should be easy? Or is the link wrong and 0.14.3 is not fixed (you mentioned 0.15 in your latest post)?

@ishitatsuyuki

This comment has been minimized.

Copy link
Author

commented Jul 20, 2019

The common module of specs was removed with this PR. After that comes with the port to crossbeam-queue (which is sound).

I cherry-picked the later, and additionally modified the obsolete common module to use crossbeam-queue as well - but tests are left broken (probably cannot be fixed as-is), and due to that it's unlikely that it would make an official release for specs.

@jojolepro

This comment has been minimized.

Copy link
Member

commented Jul 20, 2019

We are currently in the process of updating specs. You can see the current pull request here: #1780
If anyone wants to help with it, let me know.

@ishitatsuyuki

This comment has been minimized.

Copy link
Author

commented Jul 25, 2019

#1822 plus #1780 should fix everything except x11-rs.

@ishitatsuyuki

This comment has been minimized.

Copy link
Author

commented Aug 16, 2019

Closing as this issue is resolved, by both compiler changes and dependency changes for Amethyst.

@RalfJung

This comment has been minimized.

Copy link

commented Aug 16, 2019

So, has a new version of x11-rs been released and the dependency been bumped in Amethyst?

@ishitatsuyuki

This comment has been minimized.

Copy link
Author

commented Aug 16, 2019

New x11-rs is released, the bump isn't (it's an indirect dependency with winit).

@RalfJung

This comment has been minimized.

Copy link

commented Aug 16, 2019

Might be worth keeping this open until the bump happened?

Basically, I want to know when we can be sure that re-introducing the reverted patch in rustc will not affect amethyst.

@ishitatsuyuki

This comment has been minimized.

Copy link
Author

commented Aug 16, 2019

I’m not sure that justifies a bump for a transitive dependency, but reopening anyway.

@ishitatsuyuki ishitatsuyuki reopened this Aug 16, 2019

@jojolepro

This comment has been minimized.

Copy link
Member

commented Aug 16, 2019

image

Since it was pushed as a patch update, a simple cargo update will make old installs use the new x11-dl version. I think this can now be closed :)

@RalfJung

This comment has been minimized.

Copy link

commented Aug 16, 2019

Oh, I wasn't aware that a mere cargo update would be enough. That seems fair then, thanks!

I am not sure if there is precedent for raising the minimal required version to make sure users get the patch. But I guess once stuff breaks when rustc changes again, we can just tell people to cargo update. ;)

@jojolepro

This comment has been minimized.

Copy link
Member

commented Aug 16, 2019

I am not sure if there is precedent for raising the minimal required version to make sure users get the patch.

Would need winit to release a new version that requires x11 2.18.4, and then us to release a new version of amethyst targeting this new version of winit.

Its easier to just tell people to run cargo update than to delay everything until those two releases happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.