Document the efficiency of union splitting for large unions #44131

Open · wants to merge 3 commits into base: master

@j-fu j-fu commented Feb 11, 2022

  • Benchmarks (https://discourse.julialang.org/t/avoiding-vectors-of-abstract-types/61883/15) show that, probably since "RFC: inference: remove union-split limit for linear signatures" #37378 (1.6.0), union splitting is efficient for large unions.
  • The union splitting example proposed here has jldoctests that work for unions of 5 types.
  • Information on efficiently handling collections of different types is scattered across Discourse threads, including macro-based solutions for implementing manual dispatch, which may not be necessary because the functionality they cover is mostly available in Julia itself.
Author

j-fu commented Feb 15, 2022

Note: I figured out that this does not seem to work when dispatching on two unions. Maybe this is what "linear" means in #37378... This needs to be documented as well.

@filchristou
Sponsor Contributor

Hello, please also have a look at this post:
https://discourse.julialang.org/t/avoiding-vectors-of-abstract-types/61883/19?u=filchristou

Union splitting seems to hit a real bottleneck when it is applied to structs whose fields need to be accessed.

Sponsor Member

@timholy timholy left a comment

There's real value here, but it probably needs to be integrated with other performance tips. This seems to be an expanded version of https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-abstract-container. In particular, I think illustrating the manual-dispatch alternative would be a worthwhile addition. However, I would strive for as much brevity as possible, and if a more "demo"/discursive style is needed, then perhaps link to an external blog post?

@@ -851,6 +851,113 @@ or thousands of variants compiled for it. Each of these increases the size of th
code, the length of internal lists of methods, etc. Excess enthusiasm for values-as-parameters
can easily waste enormous resources.

## ["Unionize" collections](@id unionize-collections)

When working e.g. with agent based models or finite elements with varying element geometries, a common pattern is the occurrence of collections (e.g. Vectors) of objects of different types on which one wants to perform certain actions depending on their type. By default, the element type of a vector of objects of different struct types is a common supertype, often `Any`. For dispatch -- choosing the right method of a function to be applied -- the compiler needs

Sponsor Member

When working e.g with agent based models or finite elements with varying element geometries

I use collections of heterogeneous object types all the time and never work with either agent-based models or finite elements

Member

I will second Tim's comment, and also mention that many (most?) readers won't know what "agent based models" or "finite elements with varying element geometries" means, so this is a barrier for understanding the point.

Author

Thank you for the hints, and the further explanations on Discourse. I tend to agree with them and will work on an update after I have learned more from @timholy about the different facets of dispatch.

Contributor

@Tortar Tortar Feb 5, 2024

FWIW, in Agents.jl you can see a comparison of the same operation with multiple types vs. only one: https://github.com/JuliaDynamics/Agents.jl/blob/main/test/performance/variable_agent_types_simple_dynamics.jl (results at the end of the file). From 4 types on, the impact is big.

## ["Unionize" collections](@id unionize-collections)

When working e.g. with agent based models or finite elements with varying element geometries, a common pattern is the occurrence of collections (e.g. Vectors) of objects of different types on which one wants to perform certain actions depending on their type. By default, the element type of a vector of objects of different struct types is a common supertype, often `Any`. For dispatch -- choosing the right method of a function to be applied -- the compiler needs
to assume that new matching types can be added after compilation. Thus arises the need for expensive [dynamic dispatch](https://discourse.julialang.org/t/dynamic-dispatch/6963/2) at runtime.

Sponsor Member

It's not just the ability to add new methods/types: if we specialized on all potential calls in the known world, compilation would never finish.
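
To make the element-type point concrete, here is a minimal sketch. The `Circle` and `Square` types and the `area` function are hypothetical and not taken from the PR; the sketch only shows how the element type of a heterogeneous vector differs with and without a `Union` annotation, and makes no claim about timings.

```julia
# Hypothetical types for illustration; they are not part of the PR.
struct Circle
    r::Float64
end

struct Square
    a::Float64
end

area(c::Circle) = pi * c.r^2
area(s::Square) = s.a^2

# Without an annotation, the literal gets the common supertype of its
# elements; for two unrelated structs that is `Any`.
shapes_any = [Circle(1.0), Square(2.0)]

# Annotating the element type as a Union of the concrete types gives the
# compiler a closed, finite set of possibilities to reason about.
shapes_union = Union{Circle, Square}[Circle(1.0), Square(2.0)]

function total_area(shapes)
    s = 0.0
    for x in shapes
        s += area(x)   # which method runs depends on the runtime type of x
    end
    return s
end
```

Both vectors give the same result from `total_area`; what differs is how much the compiler knows about `x` inside the loop.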


struct T1 end

f(::T1,x)=1x

Sponsor Member

The x parameter seems superfluous and adds complexity

```


With __"manual dispatch"__, each time `c` is passed as a function argument, the preceding `isa` test tells the compiler the type of `c`, so it can choose the proper method of `f` at compile time, resulting in significant savings at runtime:

Sponsor Member

Worth noting these manual-dispatch blocks help only when T is a concrete type. A block like

if isa(c, AbstractVector)
    s += f(c)
elseif isa(c, AbstractDict)
    s += f(c)
...
end

can actually hurt performance (though it can occasionally protect you from invalidation).
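
For contrast, a minimal sketch of a manual-dispatch block over concrete types. It reuses the names `T1` and `f` from the diff and drops the extra `x` argument as suggested above; `T2` and `sum_manual` are assumptions added for illustration.

```julia
struct T1 end
struct T2 end   # hypothetical second type, added for illustration

f(::T1) = 1
f(::T2) = 2

function sum_manual(collection)
    s = 0
    for c in collection
        if c isa T1
            s += f(c)   # in this branch c is known to be a T1
        elseif c isa T2
            s += f(c)   # in this branch c is known to be a T2
        else
            s += f(c)   # fallback for any other type: ordinary dispatch
        end
    end
    return s
end

sum_manual(Any[T1(), T2(), T1()])
```

Because `T1` and `T2` are concrete, each `isa` test pins down the exact type of `c` within its branch, which is the situation the comment above describes as the helpful one.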

@@ -531,8 +531,7 @@ ERROR: TypeError: in typeassert, expected Union{Int64, AbstractString}, got a va

The compilers for many languages have an internal union construct for reasoning about types; Julia
simply exposes it to the programmer. The Julia compiler is able to generate efficient code in the
-presence of `Union` types with a small number of types [^1], by generating specialized code
-in separate branches for each possible type.
+presence of `Union` types, by generating specialized code in separate branches for each possible type.

Sponsor Member

While the limit has been lifted for specific situations, as you discovered it's not completely gone. Maybe best to acknowledge there are still limits?
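
As a small illustration of the kind of code this paragraph is about, the sketch below iterates over a vector whose element type is a two-member `Union`. The names `half` and `sum_halves` are hypothetical. Whether the compiler actually splits the union into per-type branches depends on the Julia version and on the call signature, as discussed in this thread; the inferred code can be inspected with `@code_warntype`.

```julia
# Hypothetical helper with one method per member of the union.
half(x::Int) = x ÷ 2
half(x::Float64) = x / 2

function sum_halves(xs::Vector{Union{Int, Float64}})
    s = 0.0
    for x in xs
        # x is either an Int or a Float64; with union splitting the compiler
        # can emit one branch per type instead of a dynamic dispatch.
        s += half(x)
    end
    return s
end

xs = Union{Int, Float64}[4, 3.0, 10]
sum_halves(xs)
```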

Contributor

Tortar commented Jul 10, 2024

It seems there are no longer performance drops with many types in a Union in 1.11! The example in https://discourse.julialang.org/t/avoiding-vectors-of-abstract-types/61883/19?u=filchristou and many other examples I tried show no dynamic dispatch, even with many types. In my opinion, it's probably worth adding this section now.
