
Reinterpret with NTuples and types has bad effects, maybe unnecessarily #52047

Open
gbaraldi opened this issue Nov 6, 2023 · 6 comments
Labels
compiler:effects effect analysis

Comments

@gbaraldi
Member

gbaraldi commented Nov 6, 2023

It calls into _reinterpret, which does an unsafe_convert, which Julia's effect analysis does not like. As a result, code that compiles down to very simple instructions gets neither inlined nor const-propagated.

@gbaraldi gbaraldi changed the title Reinterpret with NTuples and types has really bad effects Reinterpret with NTuples and types has bad effects, maybe unnecessarily Nov 6, 2023
@Seelengrab
Contributor

Did you mean to link to some piece of code here?

@Seelengrab Seelengrab added the compiler:effects effect analysis label Nov 10, 2023
@nsajko
Contributor

nsajko commented Nov 10, 2023

> some piece of code

julia> f(n::UInt32) = reinterpret(NTuple{4,UInt8}, n)
f (generic function with 1 method)

julia> f(0x12345678)
(0x78, 0x56, 0x34, 0x12)

julia> Base.infer_effects(f, Tuple{UInt32})
(!c,!e,!n,!t,!s,!m,!u)′

julia> versioninfo()
Julia Version 1.11.0-DEV.856
Commit e7345b89fd4 (2023-11-07 02:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × AMD Ryzen 3 5300U with Radeon Graphics
  WORD_SIZE: 64
  LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
  Threads: 11 on 8 virtual cores
Environment:
  JULIA_NUM_PRECOMPILE_TASKS = 3
  JULIA_PKG_PRECOMPILE_AUTO = 0
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 8

This could just be implemented with a couple of cheap shifts, so I guess Julia should use that approach, too?
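A minimal sketch of that shift-based approach (the helper name shiftsplit is hypothetical, not part of Base); on a little-endian host it matches the reinterpret result above:

```julia
# Hypothetical shift-based alternative to reinterpret for this case.
# Extracts byte i (1-based, least significant first) with cheap shifts and masks.
shiftsplit(n::UInt32) = ntuple(i -> (n >> (8 * (i - 1))) % UInt8, Val(4))

shiftsplit(0x12345678)  # (0x78, 0x56, 0x34, 0x12) on a little-endian host
```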

@nsajko
Contributor

nsajko commented Nov 12, 2023

On second thought, fixing this might not be worth it; it might even be preferable to leave it as is, because reinterpreting tuples as primitive integers (or vice versa) is subtly error-prone:

  1. In almost all cases, the result depends on the host endianness.
  2. In the case of heterogeneous tuples, the result may depend on the padding, possibly involving undefined behavior, ref ReinterpretArray and nested structure padding #29605, ReinterpretedArray undefined behavior when accessing padding bytes #41071.

So I think fixing this would actually encourage bugs.
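To illustrate point 1, a small sketch: on an opposite-endian host the same reinterpret would yield the bytes in the reverse order, which can be simulated with bswap:

```julia
x  = 0x12345678
t  = reinterpret(NTuple{4,UInt8}, x)         # byte order follows the host
tb = reinterpret(NTuple{4,UInt8}, bswap(x))  # what an opposite-endian host would see

# The two results are byte-reversed relative to each other on any host.
tb == reverse(t)  # true
```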

Furthermore, IMO, using reinterpret on non-primitive types should perhaps be forbidden in Julia v2.

@oscardssmith
Member

> the result depends on the host endianness

This doesn't matter that much because at this point everything is little-endian.

@mikmoore
Contributor

mikmoore commented Jan 3, 2024

Here is a benchmark comparing reinterpret against a manual LLVM version (modeled after actual code in Random/src/XoshiroSimd.jl) in the v1.10.0 release:

julia> using BenchmarkTools

julia> @eval tupcast(::Type{NTuple{8,UInt32}}, x::NTuple{4,UInt64}) = map(z->z.value, Core.Intrinsics.llvmcall("%res = bitcast <4 x i64> %0 to <8 x i32>\nret <8 x i32> %res", NTuple{8,Core.VecElement{UInt32}}, Tuple{NTuple{4,Core.VecElement{UInt64}}}, map(Core.VecElement,x)));

julia> x = [ntuple(_->rand(UInt64), Val(4)) for _ in 1:2^10]; y = similar(x, NTuple{8,UInt32});

julia> @btime $y .= reinterpret.(NTuple{8,UInt32}, $x);
  2.400 μs (0 allocations: 0 bytes)

julia> @btime $y .= tupcast.(NTuple{8,UInt32}, $x);
  1.180 μs (0 allocations: 0 bytes)

julia> z = similar(x);

julia> @btime $z .= reinterpret.(NTuple{4,UInt64}, $x); # reinterpret is a no-op
  2.311 μs (0 allocations: 0 bytes)

julia> @btime $z .= $x; # compare to copy
  534.896 ns (0 allocations: 0 bytes)

Note that tupcast achieves 2x the throughput on my x86 machine, relative to reinterpret. Also notice that the no-op case reinterpret(::Type{T}, x::T) where T is causing a 4x slowdown relative to a direct copy. From the @code_native, it appears that reinterpret is indeed not being inlined, as suggested in the original issue.


I've been interested in this functionality for SIMD generation of random numbers requiring further computations. In that case, all the bits of the input tuple are already random so I'm happy to reinterpret them without regard to endianness etc.

@N5N3
Member

N5N3 commented Jan 4, 2024

A simple fix for this struct_subpadding case is replacing the memcpy with an unsafe_load, just like #43955. The effect inference would still be bad, but at least the compiler would be willing to inline it in this simple case.
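A hedged sketch of what such an unsafe_load-based path could look like at user level (an illustration only, with a hypothetical helper name, not the actual change proposed for Base):

```julia
# Illustration only: load the bytes of an isbits value through a pointer,
# avoiding the memcpy path. load_reinterpret is a made-up helper, not Base code.
function load_reinterpret(::Type{NTuple{4,UInt8}}, n::UInt32)
    r = Ref(n)
    GC.@preserve r begin
        p = Base.unsafe_convert(Ptr{UInt32}, r)
        unsafe_load(Ptr{NTuple{4,UInt8}}(p))
    end
end
```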


6 participants