Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reinterpret segfault #21216

Closed
quinnj opened this issue Mar 29, 2017 · 46 comments · Fixed by #21425
Closed

reinterpret segfault #21216

quinnj opened this issue Mar 29, 2017 · 46 comments · Fixed by #21425
Labels
compiler:codegen Generation of LLVM IR and native code kind:bug Indicates an unexpected problem or unintended behavior
Milestone

Comments

@quinnj
Copy link
Member

quinnj commented Mar 29, 2017

The following behavior only seems to exist on linux current master (1dd10aa). Discovered in the DecFP.jl package (current master), travis logs here. Repro:

julia> using DecFP

julia> Base.bitcast(UInt128, Dec128(1.5))
0x303e000000000000000000000000000f

julia> reinterpret(UInt128, Dec128(1.5))

signal (11): Segmentation fault
while loading no file, in expression starting on line 0
reinterpret at ./essentials.jl:0
unknown function (ip: 0x7f71512b8a25)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:2234
do_call at /home/centos/buildbot/slave/package_tarball64/build/src/interpreter.c:75
eval at /home/centos/buildbot/slave/package_tarball64/build/src/interpreter.c:242
jl_interpret_toplevel_expr at /home/centos/buildbot/slave/package_tarball64/build/src/interpreter.c:34
jl_toplevel_eval_flex at /home/centos/buildbot/slave/package_tarball64/build/src/toplevel.c:577
jl_toplevel_eval_in at /home/centos/buildbot/slave/package_tarball64/build/src/builtins.c:484
eval at ./boot.jl:235
unknown function (ip: 0x7f736cdb766f)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:2234
eval_user_input at ./REPL.jl:66
unknown function (ip: 0x7f7151278b76)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:2234
macro expansion at ./REPL.jl:97 [inlined]
#1 at ./event.jl:73
unknown function (ip: 0x7f7151275caf)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:2234
jl_apply at /home/centos/buildbot/slave/package_tarball64/build/src/julia.h:1416 [inlined]
start_task at /home/centos/buildbot/slave/package_tarball64/build/src/task.c:261
unknown function (ip: 0xffffffffffffffff)
Allocations: 3683352 (Pool: 3681747; Big: 1605); GC: 5
Segmentation fault

As far as I can tell, reinterpret(UInt128, x) just calls Base.bitcast, so I'm not sure why one segfaults and the other seems to work just fine.

@yuyichao
Copy link
Contributor

Seems to be an alignment issue for fp128. Base.bitcast is an intrisic and not a normal function and that's the difference.

@vchuravy
Copy link
Sponsor Member

vchuravy commented Mar 29, 2017

Seems to be an alignment issue for fp128.

You mean alignment issue for primitive type 128? We shouldn't lower anything to Float128 as of yet.

@yuyichao
Copy link
Contributor

I mean fp128. I assume Dec128 is a floating point type.

@vchuravy
Copy link
Sponsor Member

Dec128 is defined as primitive type Dec128 <: DecimalFloatingPoint 128 end https://github.com/stevengj/DecFP.jl/blob/0a99fbeac8f8ec6fdfb3a96945c3cd014b53ef1c/src/DecFP.jl#L44

@yuyichao
Copy link
Contributor

So it is a floating point type.

@quinnj
Copy link
Member Author

quinnj commented Mar 29, 2017

Is it because DecimalFloatingPoint <: AbstractFloat? Does bitcast somehow take that into account?

@yuyichao
Copy link
Contributor

Is it because DecimalFloatingPoint <: AbstractFloat?

Yes.

Does bitcast somehow take that into account?

What do you mean "take that into account"? The main difference is likely that one is compiled and the other is not.

@JeffBezanson JeffBezanson added kind:bug Indicates an unexpected problem or unintended behavior compiler:codegen Generation of LLVM IR and native code labels Mar 29, 2017
@stevengj
Copy link
Member

Possibly related problem, observed on MacOS:

julia> collect(v for i in 1:1, v in zeros(Dec64, 1))
1×1 Array{DecFP.Dec64,2}:
 +0E+0

julia> collect(v for i in 1:1, v in zeros(Dec128, 1))
1×1 Array{DecFP.Dec128,2}:
 +4639814192E-6176

Something about accessing a 128-bit primitive type (formerly bitstype) seems wonky.

@stevengj
Copy link
Member

stevengj commented Mar 29, 2017

Here is a minimal example that exhibits the problem on MacOS, without requiring the DecFP module:

julia> abstract type DecimalFloatingPoint <: AbstractFloat end

julia> primitive type Dec128 <: DecimalFloatingPoint 128 end

julia> collect(v for i in 1:1, v in [reinterpret(Dec128, UInt128(0))])
1×1 Array{Dec128,2}:
 Dec128(0x000000011428beb0ffffffffffffffff)

(The output varies, as you might expect: it is clearly reading memory it is not supposed to. @rdeits saw a segfault from the same code, so I think this is probably the same as originally reported above.)

@stevengj
Copy link
Member

In Julia 0.5, there is a different error:

abstract DecimalFloatingPoint <: AbstractFloat
bitstype 128 Dec128 <: DecimalFloatingPoint
collect(v for i in 1:1, v in [reinterpret(Dec128, UInt128(0))])

gives

ERROR: BoundsError: attempt to access 1-element Array{Dec128,1} at index [2]
 in prod_next at ./iterator.jl:527 [inlined]
 in next at ./iterator.jl:538 [inlined]
 in next at ./generator.jl:25 [inlined]
 in collect_to!(::Array{Dec128,2}, ::Base.Generator{Base.Prod2{UnitRange{Int64},Array{Dec128,1}},##1#2}, ::Int64, ::Tuple{Int64,Int64,Nullable{Dec128},Bool}) at ./array.jl:340
 in collect(::Base.Generator{Base.Prod2{UnitRange{Int64},Array{Dec128,1}},##1#2}) at ./array.jl:308

@yuyichao
Copy link
Contributor

0.6 behavior is probably dup of #18019, 0.5 behavior is probably related to #20078

@stevengj
Copy link
Member

@yuyichao, how can the 0.6 behavior be a dup of a closed issue?

@yuyichao
Copy link
Contributor

Because both are round trip reinterpret through a floating point type.

@stevengj
Copy link
Member

stevengj commented Mar 29, 2017

It is a user-defined floating-point type that should be treated as just an opaque bitbucket, no different from primitive type Foo 128 end (which apparently works fine). The ABI should not be involved.

Regarding the 0.5 issue, it goes away for bitstype 64 Dec64 <: DecimalFloatingPoint, so it seems unrelated to #20078.

@yuyichao
Copy link
Contributor

yuyichao commented Mar 29, 2017

User defined floating point primitive types are mapped to machine floating point types. You can use struct Dec128 <: AbstractFloat x::UInt128 end if you don't want this behavior.

@Keno
Copy link
Member

Keno commented Mar 29, 2017

primitive types are special if they are <: FloatingPoint. Bit buckets should use opaque structures.

@Keno
Copy link
Member

Keno commented Mar 29, 2017

We could consider changing this and introducing something like abstract MachineFloatingPoint <: FloatingPoint, but there isn't really much of a point to user-defined primitive types anyway.

@rdeits
Copy link
Contributor

rdeits commented Mar 29, 2017

It's not clear to me how #18019 explains:

  | | |_| | | | (_| |  |  Version 0.6.0-pre.alpha.265 (2017-03-27 13:16 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit f65b8aba9a (2 days old master)
|__/                   |  x86_64-apple-darwin13.4.0

julia> using DecFP

julia> x = zeros(Dec128, 1);

julia> collect(v for i in 1:1, v in x)
[1]    32448 segmentation fault  julia-0.6

@stevengj
Copy link
Member

Julia exports primitive type, and my understanding is that it was precisely for use in memory-efficient bitbuckets, to save you the trouble of defining struct fields that you don't actually want to access. If we define this feature, then it shouldn't be broken, and having it arbitrarily corrupt some bitbuckets depending on their supertype seems broken to me.

abstract MachineFloatingPoint <: FloatingPoint seems like a reasonable solution.

@yuyichao
Copy link
Contributor

Re MachineFloatingPoint, we can change this bahavior when we can have Float128 in Base. (so just treat Float(16|32|64|128) special). For now using a struct should be easy enough....

#18019 is about round trip bitcast. The segfault is related to alignment of fp128 as I mentioned in my very first comment....

@stevengj
Copy link
Member

Why can't we do MachineFloatingPoint now? Someone defining an IEEE Float128 type in a package can still use it.

@Keno
Copy link
Member

Keno commented Mar 29, 2017

Because primitive types aren't useful for what you're trying to do. The whole notion of primitive type is a bit broken, because outside the standard ones, LLVM doesn't really support them. Probably what we should have is this is the LLVM type this corresponds to option.

@stevengj
Copy link
Member

If the whole notion is broken, why do we have them?

@Keno
Copy link
Member

Keno commented Mar 29, 2017

Because we need some way to bootstrap.

@Keno
Copy link
Member

Keno commented Mar 29, 2017

More fundamentally though, the only thing you can do with a primitive type is pass it to intrinsics. The fact that you can pass them to ccall at all is a bit of an accident, because the C ABI does very much care about the identity of a primitive type, not just how large it is.

@stevengj
Copy link
Member

So they shouldn't be documented? Currently, the manual says:

A primitive type is a concrete type whose data consists of plain old bits. Classic examples of primitive types are integers and floating-point values. Unlike most languages, Julia lets you declare your own primitive types, rather than providing only a fixed set of built-in ones.

To be honest, I find this much nicer than declaring structs with dummy members. I don't see why the concept is "broken" just because LLVM doesn't directly support this — you can always map a primitive type internally to a struct with dummy members that pad to the requested width.

Given that we document and export this feature, having it be broken for subtypes of FloatingPoint (because they aren't treated as "plain old bits") still seems bad to me.

@stevengj
Copy link
Member

stevengj commented Mar 29, 2017

Can't we just

  1. use MachineFloatingPoint (or HardwareFloatingPoint) for primitive types that are supposed to map to machine fp registers and ABI
  2. document that any other primitive type is effectively equivalent to UIntXX in terms of storage and ABI
  3. throw an error for any bit width that is currently not supported reliably — support more multiple-of-8 bit widths (≠ 8,16,32,64,128), possibly in the future if this doesn't work now, by making them equivalent to an anonymous struct?

?

@Keno
Copy link
Member

Keno commented Mar 29, 2017

julia> struct DecFP64
       x::UInt64
       DecFP64(x) = new(conversion(x))
       Base.reinterpret(::Type{DecFP64}, x::UInt64) = new(x)
       end

julia> reinterpret(DecFP64, UInt64(0))
DecFP64(0x0000000000000000)

julia> DecFP64(0)
ERROR: UndefVarError: conversion not defined
Stacktrace:
 [1] DecFP64(::Int64) at ./REPL[8]:3

@stevengj
Copy link
Member

stevengj commented Mar 29, 2017

Thanks @Keno, that works, and I'll do that anyway since DecFP still supports 0.5.

But I still would prefer something like my 3-step suggestion above, with the goal of making primitive useful for bitbucket types, and more convenient than anonymous structs.

@Keno
Copy link
Member

Keno commented Mar 29, 2017

They aren't more convenient though. The fact that reinterpret works is mostly an accident. If you want to do anything with them you have to pass them to intrinsics. For convenience, a macro would do fine.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Mar 29, 2017

I would like to think that all of the work I've put into defining the ccall ABI wasn't too much of an implementation accident. stevengj is correct that they should (including per the ccall document) get handled as opaque bitstypes. We don't need MachineFloatingPoint, the types Float64 and Float32 are explicitly called out when needed in the ABI specification as being the only types compatible with double and float (respectively).

The question how they get aligned and laid out in a struct is a bit of a different matter. We have an explicit VecElement wrapper type specifically to override the default alignment. It's rather annoying that C doesn't treat all primitive types the same, since it is ambiguous which C compiler's non-standard extension we should attempt to be compatible with when we define IEEEFloat128. As Keno said, we should probably require an annotation. We now even have potential syntax for it :)

primitive type IEEEFloat128 <: AbstractFloat
    alignment::128
end

(although currently the parser doesn't read that correctly)

@yuyichao
Copy link
Contributor

Alignment isn't the only issue though. fp128 has different calling convention even when they have the same alignment.

@Keno
Copy link
Member

Keno commented Mar 29, 2017

Yes, we could expand the primitive type definition to specify all the various properties (alignment, memory size, loaded size, kind (integer, floating point, double double, long fp)), but that's not what it does at the moment, and at the moment I can't think of any cases where a user can meaningfully use this.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Mar 29, 2017

fp128 is also not part of the platform ABI, since it isn't part of the C standard. It's true that you can use non-portable compiler extensions, but that's not usually relevant and shouldn't form part of that library's API.

@yuyichao
Copy link
Contributor

fp128 is part of the platform ABI on some platforms (e.g. AArch64) so we should be able to express that.

@yuyichao
Copy link
Contributor

(i.e. it's not less portable than VecElement ABIs)

@stevengj
Copy link
Member

Just express it by MachineFloatingPoint.

@JeffBezanson
Copy link
Sponsor Member

I'm certain this can be made to do something other than segfault. Can we lower everything except the few special cases in the C ABI to llvm integer types?

@yuyichao
Copy link
Contributor

The representation issue unrelated to the segfault. We always need to have a type to represent fp128 and that type is currently having the wrong alignment.

@JeffBezanson JeffBezanson added this to the 0.6.x milestone Apr 4, 2017
vtjnash added a commit that referenced this issue Apr 17, 2017
avoids casting away floating point bits until used in a floating-point operation
also ensures that alignment is consistent in the LLVM struct representation as in the Julia struct representation computation

fix #21216
vtjnash added a commit that referenced this issue Apr 18, 2017
avoids casting away floating point bits until used in a floating-point operation
also ensures that alignment is consistent in the LLVM struct representation as in the Julia struct representation computation

fix #21216
vtjnash added a commit that referenced this issue Apr 18, 2017
ensures that alignment is consistent in the LLVM struct representation as in the Julia struct representation computation

also helps ensure that the floating point NaN payload bits won't accidentally get discarded for non-IEEE-754 subtypes of AbstractFloat

fix #21216
vtjnash added a commit that referenced this issue Apr 19, 2017
ensures that alignment is consistent in the LLVM struct representation as in the Julia struct representation computation

also helps ensure that the floating point NaN payload bits won't accidentally get discarded for non-IEEE-754 subtypes of AbstractFloat

fix #21216
rofinn pushed a commit to invenia/julia that referenced this issue Apr 21, 2017
ensures that alignment is consistent in the LLVM struct representation as in the Julia struct representation computation

also helps ensure that the floating point NaN payload bits won't accidentally get discarded for non-IEEE-754 subtypes of AbstractFloat

fix JuliaLang#21216
tkelman pushed a commit that referenced this issue May 3, 2017
ensures that alignment is consistent in the LLVM struct representation as in the Julia struct representation computation

also helps ensure that the floating point NaN payload bits won't accidentally get discarded for non-IEEE-754 subtypes of AbstractFloat

fix #21216

(cherry picked from commit ece60d0)
ref #21425
tkelman pushed a commit that referenced this issue May 3, 2017
ensures that alignment is consistent in the LLVM struct representation as in the Julia struct representation computation

also helps ensure that the floating point NaN payload bits won't accidentally get discarded for non-IEEE-754 subtypes of AbstractFloat

fix #21216

(cherry picked from commit ece60d0)
ref #21425

resolve an ambiguity from the AbstractFloat codegen test type
tkelman pushed a commit that referenced this issue May 4, 2017
ensures that alignment is consistent in the LLVM struct representation as in the Julia struct representation computation

also helps ensure that the floating point NaN payload bits won't accidentally get discarded for non-IEEE-754 subtypes of AbstractFloat

fix #21216

(cherry picked from commit ece60d0)
ref #21425

resolve an ambiguity from the AbstractFloat codegen test type

add 0 third argument to jl_subtype for release-0.5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code kind:bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants