Make dataids and mightalias API #51753

Open
jakobnissen opened this issue Oct 18, 2023 · 2 comments
Labels
design Design of APIs or of the language itself domain:arrays [a, r, r, a, y, s] domain:broadcast Applying a function over a collection kind:feature Indicates new feature / enhancement requests performance Must go faster

Comments

@jakobnissen
Contributor

Famously, the Julia docs state:

The only interfaces that are stable with respect to SemVer of julia version are the Julia Base and standard libraries interfaces described in the documentation and not marked as unstable (e.g., experimental and internal)

Base.dataids and Base.mightalias are neither documented nor exported. However, the docstring of dataids states:

Custom arrays that would like to opt-in to aliasing detection of their component parts can specialize this method to return the concatenation of the dataids of their component parts

Custom arrays, of course, should not extend internal Base methods that are subject to change or deletion. So either the encouragement to extend dataids should be removed, or dataids and mightalias should be documented.

However, see #50820 : I believe the current implementation of mightalias is a huge footgun. So if it's not possible to actually make this work reliably, maybe it's better to have this be explicitly marked internal, to avoid confusion.
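
For illustration, the specialization that the docstring encourages might look like the sketch below. `Concat` is a hypothetical wrapper type invented for this example, and `Base.dataids` and `Base.mightalias` are the internal, unexported functions this issue is about, so the code is not guaranteed to keep working across Julia versions.

```julia
# Hypothetical wrapper type, for illustration only: a vector made of two parts.
struct Concat{T} <: AbstractVector{T}
    head::Vector{T}
    tail::Vector{T}
end

Base.size(c::Concat) = (length(c.head) + length(c.tail),)
Base.getindex(c::Concat, i::Int) =
    i <= length(c.head) ? c.head[i] : c.tail[i - length(c.head)]

# Internal, unexported API (assumption: it keeps its current meaning).
# Return the concatenation of the dataids of the component parts.
Base.dataids(c::Concat) = (Base.dataids(c.head)..., Base.dataids(c.tail)...)

a = [1.0, 2.0]
b = [3.0]
c = Concat(a, b)

shares_with_a = Base.mightalias(c, a)          # `c` wraps `a`
shares_with_fresh = Base.mightalias(c, [4.0])  # unrelated memory
```

With the specialization in place, aliasing detection can see through the wrapper to the arrays it holds.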

@brenhinkeller brenhinkeller added the kind:feature Indicates new feature / enhancement requests label Oct 20, 2023
@jishnub
Contributor

jishnub commented Dec 7, 2023

Another one to consider is Base.unaliascopy, which is now recommended in an error message on nightly. E.g.:

ArgumentError: an array of type `MyLazyArray` shares memory with another argument
  and must make a preventative copy of itself in order to maintain consistent semantics,
  but `copy(::MyLazyArray{Float64, 1})` returns a new array of type `Vector{Float64}`.
  To fix, implement:
      `Base.unaliascopy(A::MyLazyArray)::typeof(A)`
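
A minimal sketch of the method the error message asks for, assuming a hypothetical definition of `MyLazyArray` (the real type is not shown in this thread). Note that `Base.unaliascopy` is also internal and unexported:

```julia
# Hypothetical definition; the thread does not show the real `MyLazyArray`.
struct MyLazyArray{T,N} <: AbstractArray{T,N}
    parent::Array{T,N}
end

Base.size(A::MyLazyArray) = size(A.parent)
# "Lazy" here only means that the negation is computed on access.
Base.getindex(A::MyLazyArray{T,N}, I::Vararg{Int,N}) where {T,N} = -A.parent[I...]

# Internal, unexported API: copy the wrapped storage, but return the same
# wrapper type rather than a plain `Array`, as the error message requests.
Base.unaliascopy(A::MyLazyArray) = MyLazyArray(copy(A.parent))

A = MyLazyArray([1.0, 2.0, 3.0])
B = Base.unaliascopy(A)
```

Here `B` has the same type and contents as `A` but is backed by freshly copied memory.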

@nsajko
Contributor

nsajko commented May 4, 2024

The fact that dataids isn't public and thus mustn't be overloaded by packages (without pinning specific Julia versions in Project.toml) causes over 2x slowdowns in the following example:

julia> using Chairmarks

julia> typeof(x)
FixedSizeVector{FixedSizeVector{FixedSizeVector{FixedSizeVector{FixedSizeVector{Float64, Memory{Float64}}, Memory{FixedSizeVector{Float64, Memory{Float64}}}}, Memory{FixedSizeVector{FixedSizeVector{Float64, Memory{Float64}}, Memory{FixedSizeVector{Float64, Memory{Float64}}}}}}, Memory{FixedSizeVector{FixedSizeVector{FixedSizeVector{Float64, Memory{Float64}}, Memory{FixedSizeVector{Float64, Memory{Float64}}}}, Memory{FixedSizeVector{FixedSizeVector{Float64, Memory{Float64}}, Memory{FixedSizeVector{Float64, Memory{Float64}}}}}}}}, Memory{FixedSizeVector{FixedSizeVector{FixedSizeVector{FixedSizeVector{Float64, Memory{Float64}}, Memory{FixedSizeVector{Float64, Memory{Float64}}}}, Memory{FixedSizeVector{FixedSizeVector{Float64, Memory{Float64}}, Memory{FixedSizeVector{Float64, Memory{Float64}}}}}}, Memory{FixedSizeVector{FixedSizeVector{FixedSizeVector{Float64, Memory{Float64}}, Memory{FixedSizeVector{Float64, Memory{Float64}}}}, Memory{FixedSizeVector{FixedSizeVector{Float64, Memory{Float64}}, Memory{FixedSizeVector{Float64, Memory{Float64}}}}}}}}}} (alias for FixedSizeArray{FixedSizeArray{FixedSizeArray{FixedSizeArray{FixedSizeArray{Float64, 1, GenericMemory{:not_atomic, Float64, Core.AddrSpace{Core}(0x00)}}, 1, GenericMemory{:not_atomic, FixedSizeArray{Float64, 1, GenericMemory{:not_atomic, Float64, Core.AddrSpace{Core}(0x00)}}, Core.AddrSpace{Core}(0x00)}}, 1, GenericMemory{:not_atomic, FixedSizeArray{FixedSizeArray{Float64, 1, GenericMemory{:not_atomic, Float64, Core.AddrSpace{Core}(0x00)}}, 1, GenericMemory{:not_atomic, FixedSizeArray{Float64, 1, GenericMemory{:not_atomic, Float64, Core.AddrSpace{Core}(0x00)}}, Core.AddrSpace{Core}(0x00)}}, Core.AddrSpace{Core}(0x00)}}, 1, GenericMemory{:not_atomic, FixedSizeArray{FixedSizeArray{FixedSizeArray{Float64, 1, GenericMemory{:not_atomic, Float64, Core.AddrSpace{Core}(0x00)}}, 1, GenericMemory{:not_atomic, FixedSizeArray{Float64, 1, GenericMemory{:not_atomic, Float64, Core.AddrSpace{Core}(0x00)}}, 
Core.AddrSpace{Core}(0x00)}}, 1, GenericMemory{:not_atomic, FixedSizeArray{FixedSizeArray{Float64, 1, GenericMemory{:not_atomic, Float64, Core.AddrSpace{Core}(0x00)}}, 1, GenericMemory{:not_atomic, FixedSizeArray{Float64, 1, GenericMemory{:not_atomic, Float64, Core.AddrSpace{Core}(0x00)}}, Core.AddrSpace{Core}(0x00)}}, Core.AddrSpace{Core}(0x00)}}, Core.AddrSpace{Core}(0x00)}}, 1, GenericMemory{:not_atomic, FixedSizeArray{FixedSizeArray{FixedSizeArray{FixedSizeArray{Float64, 1, GenericMemory{:not_atomic, Float64, Core.AddrSpace{Core}(0x00)}}, 1, GenericMemory{:not_atomic, FixedSizeArray{Float64, 1, GenericMemory{:not_atomic, Float64, Core.AddrSpace{Core}(0x00)}}, Core.AddrSpace{Core}(0x00)}}, 1, GenericMemory{:not_atomic, FixedSizeArray{FixedSizeArray{Float64, 1, GenericMemory{:not_atomic, Float64, Core.AddrSpace{Core}(0x00)}}, 1, GenericMemory{:not_atomic, FixedSizeArray{Float64, 1, GenericMemory{:not_atomic, Float64, Core.AddrSpace{Core}(0x00)}}, Core.AddrSpace{Core}(0x00)}}, Core.AddrSpace{Core}(0x00)}}, 1, GenericMemory{:not_atomic, FixedSizeArray{FixedSizeArray{FixedSizeArray{Float64, 1, GenericMemory{:not_atomic, Float64, Core.AddrSpace{Core}(0x00)}}, 1, GenericMemory{:not_atomic, FixedSizeArray{Float64, 1, GenericMemory{:not_atomic, Float64, Core.AddrSpace{Core}(0x00)}}, Core.AddrSpace{Core}(0x00)}}, 1, GenericMemory{:not_atomic, FixedSizeArray{FixedSizeArray{Float64, 1, GenericMemory{:not_atomic, Float64, Core.AddrSpace{Core}(0x00)}}, 1, GenericMemory{:not_atomic, FixedSizeArray{Float64, 1, GenericMemory{:not_atomic, Float64, Core.AddrSpace{Core}(0x00)}}, Core.AddrSpace{Core}(0x00)}}, Core.AddrSpace{Core}(0x00)}}, Core.AddrSpace{Core}(0x00)}}, Core.AddrSpace{Core}(0x00)}})

julia> typeof(y)
Vector{Vector{Vector{Vector{Vector{Float64}}}}} (alias for Array{Array{Array{Array{Array{Float64, 1}, 1}, 1}, 1}, 1})

julia> @b x g
4.808 s (22222000 allocs: 2.483 GiB, 22.44% gc time, without a warmup)

julia> @b y g
2.357 s (44444000 allocs: 2.980 GiB, 25.67% gc time, without a warmup)

julia> Base.dataids(a::FixedSizeArray) = Base.dataids(a.mem)

julia> @b x g
2.094 s (22222000 allocs: 2.483 GiB, 22.26% gc time, without a warmup)

julia> versioninfo()
Julia Version 1.12.0-DEV.460
Commit 9d59ecc66fd (2024-05-03 17:04 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × AMD Ryzen 3 5300U with Radeon Graphics
  WORD_SIZE: 64
  LLVM: libLLVM-17.0.6 (ORCJIT, znver2)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)

Interpretation: without overloading dataids, g is more than two times slower for FixedSizeVector than for Vector, but when dataids gets overloaded g becomes faster for FixedSizeVector than for Vector. Profiling shows that, before overloading, dataids for FixedSizeVector spends almost all of its time in objectid.

FixedSizeVector is from the experimental package FixedSizeArrays.jl, @giordano, xref JuliaArrays/FixedSizeArrays.jl#53.

The function g is implemented like so:

f(a, b) = a + b
f(a) = f(a, a)
function g(a, n = 2000)
  T = typeof(a)::Type
  for _ in Base.OneTo(n)
    a = f(a)::T
  end
  a
end

@nsajko nsajko added performance Must go faster design Design of APIs or of the language itself labels May 4, 2024
@nsajko nsajko added the domain:broadcast Applying a function over a collection label May 5, 2024
4 participants