reinterpret segfault #21216

quinnj · 2017-03-29T03:44:00Z

The following behavior only seems to exist on linux current master (1dd10aa). Discovered in the DecFP.jl package (current master), travis logs here. Repro:

julia> using DecFP

julia> Base.bitcast(UInt128, Dec128(1.5))
0x303e000000000000000000000000000f

julia> reinterpret(UInt128, Dec128(1.5))

signal (11): Segmentation fault
while loading no file, in expression starting on line 0
reinterpret at ./essentials.jl:0
unknown function (ip: 0x7f71512b8a25)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:2234
do_call at /home/centos/buildbot/slave/package_tarball64/build/src/interpreter.c:75
eval at /home/centos/buildbot/slave/package_tarball64/build/src/interpreter.c:242
jl_interpret_toplevel_expr at /home/centos/buildbot/slave/package_tarball64/build/src/interpreter.c:34
jl_toplevel_eval_flex at /home/centos/buildbot/slave/package_tarball64/build/src/toplevel.c:577
jl_toplevel_eval_in at /home/centos/buildbot/slave/package_tarball64/build/src/builtins.c:484
eval at ./boot.jl:235
unknown function (ip: 0x7f736cdb766f)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:2234
eval_user_input at ./REPL.jl:66
unknown function (ip: 0x7f7151278b76)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:2234
macro expansion at ./REPL.jl:97 [inlined]
#1 at ./event.jl:73
unknown function (ip: 0x7f7151275caf)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:2234
jl_apply at /home/centos/buildbot/slave/package_tarball64/build/src/julia.h:1416 [inlined]
start_task at /home/centos/buildbot/slave/package_tarball64/build/src/task.c:261
unknown function (ip: 0xffffffffffffffff)
Allocations: 3683352 (Pool: 3681747; Big: 1605); GC: 5
Segmentation fault

As far as I can tell, reinterpret(UInt128, x) just calls Base.bitcast, so I'm not sure why one segfaults and the other seems to work just fine.

The text was updated successfully, but these errors were encountered:

yuyichao · 2017-03-29T04:10:25Z

Seems to be an alignment issue for fp128. Base.bitcast is an intrisic and not a normal function and that's the difference.

vchuravy · 2017-03-29T13:11:30Z

Seems to be an alignment issue for fp128.

You mean alignment issue for primitive type 128? We shouldn't lower anything to Float128 as of yet.

yuyichao · 2017-03-29T13:17:55Z

I mean fp128. I assume Dec128 is a floating point type.

vchuravy · 2017-03-29T13:29:23Z

Dec128 is defined as primitive type Dec128 <: DecimalFloatingPoint 128 end https://github.com/stevengj/DecFP.jl/blob/0a99fbeac8f8ec6fdfb3a96945c3cd014b53ef1c/src/DecFP.jl#L44

yuyichao · 2017-03-29T13:40:50Z

So it is a floating point type.

quinnj · 2017-03-29T13:57:53Z

Is it because DecimalFloatingPoint <: AbstractFloat? Does bitcast somehow take that into account?

yuyichao · 2017-03-29T14:17:37Z

Is it because DecimalFloatingPoint <: AbstractFloat?

Yes.

Does bitcast somehow take that into account?

What do you mean "take that into account"? The main difference is likely that one is compiled and the other is not.

stevengj · 2017-03-29T18:58:39Z

Possibly related problem, observed on MacOS:

julia> collect(v for i in 1:1, v in zeros(Dec64, 1))
1×1 Array{DecFP.Dec64,2}:
 +0E+0

julia> collect(v for i in 1:1, v in zeros(Dec128, 1))
1×1 Array{DecFP.Dec128,2}:
 +4639814192E-6176

Something about accessing a 128-bit primitive type (formerly bitstype) seems wonky.

stevengj · 2017-03-29T19:02:23Z

Here is a minimal example that exhibits the problem on MacOS, without requiring the DecFP module:

julia> abstract type DecimalFloatingPoint <: AbstractFloat end

julia> primitive type Dec128 <: DecimalFloatingPoint 128 end

julia> collect(v for i in 1:1, v in [reinterpret(Dec128, UInt128(0))])
1×1 Array{Dec128,2}:
 Dec128(0x000000011428beb0ffffffffffffffff)

(The output varies, as you might expect: it is clearly reading memory it is not supposed to. @rdeits saw a segfault from the same code, so I think this is probably the same as originally reported above.)

stevengj · 2017-03-29T19:06:09Z

In Julia 0.5, there is a different error:

abstract DecimalFloatingPoint <: AbstractFloat
bitstype 128 Dec128 <: DecimalFloatingPoint
collect(v for i in 1:1, v in [reinterpret(Dec128, UInt128(0))])

gives

ERROR: BoundsError: attempt to access 1-element Array{Dec128,1} at index [2]
 in prod_next at ./iterator.jl:527 [inlined]
 in next at ./iterator.jl:538 [inlined]
 in next at ./generator.jl:25 [inlined]
 in collect_to!(::Array{Dec128,2}, ::Base.Generator{Base.Prod2{UnitRange{Int64},Array{Dec128,1}},##1#2}, ::Int64, ::Tuple{Int64,Int64,Nullable{Dec128},Bool}) at ./array.jl:340
 in collect(::Base.Generator{Base.Prod2{UnitRange{Int64},Array{Dec128,1}},##1#2}) at ./array.jl:308

yuyichao · 2017-03-29T19:09:13Z

0.6 behavior is probably dup of #18019, 0.5 behavior is probably related to #20078

stevengj · 2017-03-29T19:10:42Z

@yuyichao, how can the 0.6 behavior be a dup of a closed issue?

yuyichao · 2017-03-29T19:11:39Z

Because both are round trip reinterpret through a floating point type.

stevengj · 2017-03-29T19:13:40Z

It is a user-defined floating-point type that should be treated as just an opaque bitbucket, no different from primitive type Foo 128 end (which apparently works fine). The ABI should not be involved.

Regarding the 0.5 issue, it goes away for bitstype 64 Dec64 <: DecimalFloatingPoint, so it seems unrelated to #20078.

yuyichao · 2017-03-29T19:14:53Z

User defined floating point primitive types are mapped to machine floating point types. You can use struct Dec128 <: AbstractFloat x::UInt128 end if you don't want this behavior.

Keno · 2017-03-29T19:15:15Z

primitive types are special if they are <: FloatingPoint. Bit buckets should use opaque structures.

Keno · 2017-03-29T19:16:21Z

We could consider changing this and introducing something like abstract MachineFloatingPoint <: FloatingPoint, but there isn't really much of a point to user-defined primitive types anyway.

rdeits · 2017-03-29T19:16:54Z

It's not clear to me how #18019 explains:

  | | |_| | | | (_| |  |  Version 0.6.0-pre.alpha.265 (2017-03-27 13:16 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit f65b8aba9a (2 days old master)
|__/                   |  x86_64-apple-darwin13.4.0

julia> using DecFP

julia> x = zeros(Dec128, 1);

julia> collect(v for i in 1:1, v in x)
[1]    32448 segmentation fault  julia-0.6

stevengj · 2017-03-29T19:19:38Z

Julia exports primitive type, and my understanding is that it was precisely for use in memory-efficient bitbuckets, to save you the trouble of defining struct fields that you don't actually want to access. If we define this feature, then it shouldn't be broken, and having it arbitrarily corrupt some bitbuckets depending on their supertype seems broken to me.

abstract MachineFloatingPoint <: FloatingPoint seems like a reasonable solution.

yuyichao · 2017-03-29T19:19:40Z

Re MachineFloatingPoint, we can change this bahavior when we can have Float128 in Base. (so just treat Float(16|32|64|128) special). For now using a struct should be easy enough....

#18019 is about round trip bitcast. The segfault is related to alignment of fp128 as I mentioned in my very first comment....

stevengj · 2017-03-29T19:20:52Z

Why can't we do MachineFloatingPoint now? Someone defining an IEEE Float128 type in a package can still use it.

Keno · 2017-03-29T19:22:34Z

Because primitive types aren't useful for what you're trying to do. The whole notion of primitive type is a bit broken, because outside the standard ones, LLVM doesn't really support them. Probably what we should have is this is the LLVM type this corresponds to option.

stevengj · 2017-03-29T19:23:31Z

If the whole notion is broken, why do we have them?

Keno · 2017-03-29T19:23:48Z

Because we need some way to bootstrap.

Keno · 2017-03-29T19:26:30Z

More fundamentally though, the only thing you can do with a primitive type is pass it to intrinsics. The fact that you can pass them to ccall at all is a bit of an accident, because the C ABI does very much care about the identity of a primitive type, not just how large it is.

stevengj · 2017-03-29T19:27:58Z

So they shouldn't be documented? Currently, the manual says:

A primitive type is a concrete type whose data consists of plain old bits. Classic examples of primitive types are integers and floating-point values. Unlike most languages, Julia lets you declare your own primitive types, rather than providing only a fixed set of built-in ones.

To be honest, I find this much nicer than declaring structs with dummy members. I don't see why the concept is "broken" just because LLVM doesn't directly support this — you can always map a primitive type internally to a struct with dummy members that pad to the requested width.

Given that we document and export this feature, having it be broken for subtypes of FloatingPoint (because they aren't treated as "plain old bits") still seems bad to me.

stevengj · 2017-03-29T19:51:08Z

Can't we just

use MachineFloatingPoint (or HardwareFloatingPoint) for primitive types that are supposed to map to machine fp registers and ABI
document that any other primitive type is effectively equivalent to UIntXX in terms of storage and ABI
throw an error for any bit width that is currently not supported reliably — support more multiple-of-8 bit widths (≠ 8,16,32,64,128), possibly in the future if this doesn't work now, by making them equivalent to an anonymous struct?

?

Keno · 2017-03-29T19:56:24Z

julia> struct DecFP64
       x::UInt64
       DecFP64(x) = new(conversion(x))
       Base.reinterpret(::Type{DecFP64}, x::UInt64) = new(x)
       end

julia> reinterpret(DecFP64, UInt64(0))
DecFP64(0x0000000000000000)

julia> DecFP64(0)
ERROR: UndefVarError: conversion not defined
Stacktrace:
 [1] DecFP64(::Int64) at ./REPL[8]:3

… in JuliaLang/julia#21216

stevengj · 2017-03-29T20:24:44Z

Thanks @Keno, that works, and I'll do that anyway since DecFP still supports 0.5.

But I still would prefer something like my 3-step suggestion above, with the goal of making primitive useful for bitbucket types, and more convenient than anonymous structs.

Keno · 2017-03-29T20:26:40Z

They aren't more convenient though. The fact that reinterpret works is mostly an accident. If you want to do anything with them you have to pass them to intrinsics. For convenience, a macro would do fine.

vtjnash · 2017-03-29T21:08:30Z

I would like to think that all of the work I've put into defining the ccall ABI wasn't too much of an implementation accident. stevengj is correct that they should (including per the ccall document) get handled as opaque bitstypes. We don't need MachineFloatingPoint, the types Float64 and Float32 are explicitly called out when needed in the ABI specification as being the only types compatible with double and float (respectively).

The question how they get aligned and laid out in a struct is a bit of a different matter. We have an explicit VecElement wrapper type specifically to override the default alignment. It's rather annoying that C doesn't treat all primitive types the same, since it is ambiguous which C compiler's non-standard extension we should attempt to be compatible with when we define IEEEFloat128. As Keno said, we should probably require an annotation. We now even have potential syntax for it :)

primitive type IEEEFloat128 <: AbstractFloat
    alignment::128
end

(although currently the parser doesn't read that correctly)

yuyichao · 2017-03-29T21:30:14Z

Alignment isn't the only issue though. fp128 has different calling convention even when they have the same alignment.

Keno · 2017-03-29T21:32:12Z

Yes, we could expand the primitive type definition to specify all the various properties (alignment, memory size, loaded size, kind (integer, floating point, double double, long fp)), but that's not what it does at the moment, and at the moment I can't think of any cases where a user can meaningfully use this.

vtjnash · 2017-03-29T21:56:30Z

fp128 is also not part of the platform ABI, since it isn't part of the C standard. It's true that you can use non-portable compiler extensions, but that's not usually relevant and shouldn't form part of that library's API.

yuyichao · 2017-03-29T22:27:14Z

fp128 is part of the platform ABI on some platforms (e.g. AArch64) so we should be able to express that.

yuyichao · 2017-03-29T22:28:12Z

(i.e. it's not less portable than VecElement ABIs)

stevengj · 2017-03-29T22:40:27Z

Just express it by MachineFloatingPoint.

JeffBezanson · 2017-03-30T02:23:36Z

I'm certain this can be made to do something other than segfault. Can we lower everything except the few special cases in the C ABI to llvm integer types?

yuyichao · 2017-03-30T02:54:24Z

The representation issue unrelated to the segfault. We always need to have a type to represent fp128 and that type is currently having the wrong alignment.

avoids casting away floating point bits until used in a floating-point operation also ensures that alignment is consistent in the LLVM struct representation as in the Julia struct representation computation fix #21216

ensures that alignment is consistent in the LLVM struct representation as in the Julia struct representation computation also helps ensure that the floating point NaN payload bits won't accidentally get discarded for non-IEEE-754 subtypes of AbstractFloat fix #21216

ensures that alignment is consistent in the LLVM struct representation as in the Julia struct representation computation also helps ensure that the floating point NaN payload bits won't accidentally get discarded for non-IEEE-754 subtypes of AbstractFloat fix JuliaLang#21216

ensures that alignment is consistent in the LLVM struct representation as in the Julia struct representation computation also helps ensure that the floating point NaN payload bits won't accidentally get discarded for non-IEEE-754 subtypes of AbstractFloat fix #21216 (cherry picked from commit ece60d0) ref #21425

ensures that alignment is consistent in the LLVM struct representation as in the Julia struct representation computation also helps ensure that the floating point NaN payload bits won't accidentally get discarded for non-IEEE-754 subtypes of AbstractFloat fix #21216 (cherry picked from commit ece60d0) ref #21425 resolve an ambiguity from the AbstractFloat codegen test type

ensures that alignment is consistent in the LLVM struct representation as in the Julia struct representation computation also helps ensure that the floating point NaN payload bits won't accidentally get discarded for non-IEEE-754 subtypes of AbstractFloat fix #21216 (cherry picked from commit ece60d0) ref #21425 resolve an ambiguity from the AbstractFloat codegen test type add 0 third argument to jl_subtype for release-0.5

quinnj mentioned this issue Mar 29, 2017

Dec128 Julia codegen bug JuliaMath/DecFP.jl#37

Closed

JeffBezanson added kind:bug Indicates an unexpected problem or unintended behavior compiler:codegen Generation of LLVM IR and native code labels Mar 29, 2017

rdeits mentioned this issue Mar 29, 2017

BoundsError in transpose(::Vector{Dec128}) JuliaMath/DecFP.jl#38

Closed

stevengj added a commit to JuliaMath/DecFP.jl that referenced this issue Mar 29, 2017

use immutable/struct wrapper rather than primitive type, as suggested…

381066b

… in JuliaLang/julia#21216

This was referenced Mar 30, 2017

Towards Julia 0.6 compatibility JuliaRobotics/RigidBodyDynamics.jl#197

Merged

Illegal instruction with 0.6 #21243

Closed

JeffBezanson added this to the 0.6.x milestone Apr 4, 2017

vtjnash mentioned this issue Apr 17, 2017

correct the codegen of floating point subtypes #21425

Merged

vtjnash closed this as completed in #21425 Apr 20, 2017

simonbyrne mentioned this issue Feb 18, 2019

Float128 calling convention and alignment JuliaMath/Quadmath.jl#4

Closed

This was referenced Mar 15, 2019

Upgrade to latest release (v2.2) JuliaMath/DecFP.jl#90

Closed

ccall with user defined type on ARM #31333

Closed

reinterpret segfault #21216

reinterpret segfault #21216

Comments

quinnj commented Mar 29, 2017 • edited Loading

yuyichao commented Mar 29, 2017

vchuravy commented Mar 29, 2017 • edited Loading

yuyichao commented Mar 29, 2017

vchuravy commented Mar 29, 2017

yuyichao commented Mar 29, 2017

quinnj commented Mar 29, 2017

yuyichao commented Mar 29, 2017

stevengj commented Mar 29, 2017

stevengj commented Mar 29, 2017 • edited Loading

stevengj commented Mar 29, 2017

yuyichao commented Mar 29, 2017

stevengj commented Mar 29, 2017

yuyichao commented Mar 29, 2017

stevengj commented Mar 29, 2017 • edited Loading

yuyichao commented Mar 29, 2017 • edited Loading

Keno commented Mar 29, 2017

Keno commented Mar 29, 2017

rdeits commented Mar 29, 2017

stevengj commented Mar 29, 2017

yuyichao commented Mar 29, 2017

stevengj commented Mar 29, 2017

Keno commented Mar 29, 2017 • edited Loading

stevengj commented Mar 29, 2017

Keno commented Mar 29, 2017

Keno commented Mar 29, 2017

stevengj commented Mar 29, 2017

stevengj commented Mar 29, 2017 • edited Loading

Keno commented Mar 29, 2017

stevengj commented Mar 29, 2017 • edited Loading

Keno commented Mar 29, 2017

vtjnash commented Mar 29, 2017

yuyichao commented Mar 29, 2017

Keno commented Mar 29, 2017

vtjnash commented Mar 29, 2017

yuyichao commented Mar 29, 2017

yuyichao commented Mar 29, 2017

stevengj commented Mar 29, 2017

JeffBezanson commented Mar 30, 2017

yuyichao commented Mar 30, 2017

quinnj commented Mar 29, 2017 •

edited

Loading

vchuravy commented Mar 29, 2017 •

edited

Loading

stevengj commented Mar 29, 2017 •

edited

Loading

stevengj commented Mar 29, 2017 •

edited

Loading

yuyichao commented Mar 29, 2017 •

edited

Loading

Keno commented Mar 29, 2017 •

edited

Loading

stevengj commented Mar 29, 2017 •

edited

Loading

stevengj commented Mar 29, 2017 •

edited

Loading