Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fused Broadcasting kills constant folding? #39151

Open
oxinabox opened this issue Jan 8, 2021 · 4 comments
Open

Fused Broadcasting kills constant folding? #39151

oxinabox opened this issue Jan 8, 2021 · 4 comments
Labels
broadcast Applying a function over a collection performance Must go faster

Comments

@oxinabox
Copy link
Contributor

oxinabox commented Jan 8, 2021

Consider two implementations of the same function

julia> function foo2(T, c, f)
       d = T.(c)
       return sum(f .* d)
       end
foo2 (generic function with 2 methods)

julia> bar2(T, c, f) = sum(f .* T.(c))
bar2 (generic function with 3 methods)

bar2 seems like it is just better,
it avoids creating the intermidairy variable, so allows the broadcase to fuse,
and indeed with a plain vector it is:

julia> @btime foo2(Float32, [0.1,0.2], [1.0,2.0]);
  473.802 ns (9 allocations: 512 bytes)

julia> @btime bar2(Float32, [0.1,0.2], [1.0,2.0]);
  369.920 ns (8 allocations: 416 bytes)

But with a StaticArray foo2 is much better.

julia> @btime foo2(Float32, @SVector[0.1,0.2], @SVector[1.0,2.0]);
  0.031 ns (0 allocations: 0 bytes)

julia> @btime bar2(Float32, @SVector[0.1,0.2], @SVector[1.0,2.0]);
  946.536 ns (22 allocations: 528 bytes)

I think what is happening is that something about foo2 is constant fold-able for StaticArrays.
and that bar2 isn't.

Those number were taken on 1.5., but i saw similar on master

@mbauman mbauman added broadcast Applying a function over a collection performance Must go faster labels Jan 8, 2021
@mbauman
Copy link
Member

mbauman commented Jan 8, 2021

I believe this all stems from the fact that we use the mutable Ref{T} to make types broadcastable — combined (perhaps, depending on how minimal your real example is) with the fact that you don't force specialization on your T within your own function. We really should come back to #18379 and use a non-mutable wrapper here internally.

@Seelengrab
Copy link
Contributor

Seelengrab commented Jan 8, 2021

For reference, there's also #35675 where some bikeshedding on the name of an immutable variant of Ref took place.

From reading those, it seems like a lot of people agree that this would be useful, but no name has been decided on and that's why it's stalled?

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Jan 10, 2021

As @MasonProtter mentioned on Slack, instead of Ref(x) for some scalar x in a broadcast expression (x,) works better:

Ref is a mutable container and can block constant propagation. In many cases, this isn’t a big deal, but I think an immutable transparent container like a single argument Tuple is better.

With that information, I suggested using

macro nodot(x)
    :(($x,))
end

@MasonProtter
Copy link
Contributor

MasonProtter commented Jan 10, 2021

Unfortunately, that would be inappropriate for the broadcast internals themselves because single element tuples a 1 dimensional containers instead of zero dimensional like Ref.

There’s some discussion of alternatives here #35778

vtjnash added a commit that referenced this issue Nov 23, 2021
N5N3 pushed a commit to N5N3/julia that referenced this issue Nov 30, 2021
This removes the dependence on inlining for performance, so we also
remove `@inline`, since it can harm performance.

make Some type a zero-dim broadcast container (e.g. a scalar)

Replaces JuliaLang#35778
Replaces JuliaLang#39184
Fixes JuliaLang#39151
Refs JuliaLang#35675
Refs JuliaLang#43200
N5N3 pushed a commit to N5N3/julia that referenced this issue Dec 1, 2021
This removes the dependence on inlining for performance, so we also
remove `@inline`, since it can harm performance.

make Some type a zero-dim broadcast container (e.g. a scalar)

Replaces JuliaLang#35778
Replaces JuliaLang#39184
Fixes JuliaLang#39151
Refs JuliaLang#35675
Refs JuliaLang#43200
N5N3 pushed a commit to N5N3/julia that referenced this issue Dec 1, 2021
This removes the dependence on inlining for performance, so we also
remove `@inline`, since it can harm performance.

make Some type a zero-dim broadcast container (e.g. a scalar)

Replaces JuliaLang#35778
Replaces JuliaLang#39184
Fixes JuliaLang#39151
Refs JuliaLang#35675
Refs JuliaLang#43200
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
broadcast Applying a function over a collection performance Must go faster
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants