
It is too hard to reason about when Julia specializes arguments #51423

Open
jakobnissen opened this issue Sep 22, 2023 · 4 comments
Labels
performance Must go faster

Comments

@jakobnissen
Contributor

jakobnissen commented Sep 22, 2023

Background

Julia users quickly learn that Julia specializes code for the types of every argument, and that every function is its own type.
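For example (a quick REPL check; output elided since printing varies across Julia versions):

```julia
julia> isodd isa Function
true

julia> typeof(isodd) === typeof(iseven)  # every function has its own singleton type
false
```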
However, invariably, users will try to write a function like this:

function foo(f, v)
    v2 = map(f, v)
    y = 0
    for i in v2
        y += i
    end
    y
end

And find that it allocates like crazy:

julia> x = collect(1:1_000_000);

julia> @time foo(isodd, x);
  0.084467 seconds (2.00 M allocations: 31.448 MiB)

This is extraordinarily unintuitive, but of course it is a well-known issue.

Strangely, there are various ways you can regain specialization - for example, by using f in the body of the function:

function foo(f, v)
    v2 = map(f, v)
    y = f(first(v)) - f(first(v)) # zero, but uses f
    for i in v2
        y += i
    end
    y
end

You can also enable specialization by annotating the argument as f::F with a where F type parameter.
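For instance, a sketch of the same function with specialization forced via a type parameter:

```julia
# Annotating the argument with a type parameter forces specialization on f
function foo(f::F, v) where {F}
    v2 = map(f, v)
    y = 0
    for i in v2
        y += i
    end
    y
end
```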
And you can disable specialization using @nospecialize - but look at what happens if you do these together:

function foo(@nospecialize(f::F), v) where F
    v2 = map(f, v)
    y = 0
    for i in v2
        y += i
    end
    y
end

julia> @time foo(isodd, x);
  0.001473 seconds (3 allocations: 976.625 KiB)

@nospecialize is not respected. Things get even stranger when you apply @nospecialize to the whole function:

julia> @nospecialize function foo(f::F, v) where F
           v2 = map(f, v)
           y = 0
           for i in v2
               y += i
           end
           y
       end

julia> @time foo(isodd, x);
ERROR: UndefVarError: `foo` not defined

Oops - @nospecialize silently deleted the function there.

It seems very hard to understand when Julia specializes and when it does not.
Luckily, we have code introspection tooling like @code_warntype - but of course, this doesn't work in this case and actively misleads the investigator as to what is wrong (a known issue: #32834).

Furthermore, it's quite unclear what kind of specialization exactly is blocked by @nospecialize - as a user, the mental model of @nospecialize is presumably: "Julia just treats this @nospecialize(x::Integer) as an Integer, nothing else". But this isn't the case, and the actual semantics of @nospecialize is kind of complex, see #41931.
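As an illustration, one way to inspect which specializations actually exist for a method is Base.specializations. Note this is internal API (an assumption here: it is available on Julia 1.10 or later, and both its availability and printing may change):

```julia
# Hypothetical inspection sketch, using the internal Base.specializations
foo(f, v) = map(f, v)

foo(isodd, [1, 2, 3])  # trigger compilation

m = only(methods(foo))
# Lists the MethodInstances compiled for `foo`; whether an entry mentions
# `typeof(isodd)` reveals whether Julia specialized on the function argument.
collect(Base.specializations(m))
```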

In short: it is very hard to predict when Julia specializes, and the tools for investigating it can actively mislead.

This issue

While each of the individual issues linked above is already known, I think the overall issue really is this: what can we expect a Julia user to intuit or learn about specialization?

If specialization is critical for fast code, then users must be able to have their code specialize, which means they must be able to have a working understanding of when it specializes. I argue that it is currently so hard to understand what is happening that you need to be a Julia compiler developer, or have years of experience with Julia, to really understand it.

If specialization is considered a compiler implementation detail, then it is unacceptable that Julia chooses to despecialize code seemingly arbitrarily.

Possible solutions

Perhaps the simplest solution would be: Julia always specializes, and always respects @nospecialize.

Perhaps effects could be used to achieve this: in the original function in this post, map(f, v) could be made to specialize because Julia would know that the result of the map is used in the rest of the function. This way, lack of specialization would never affect type stability and could therefore be more of a compiler implementation detail, with the semantics being that Julia always specializes. This would also make #32834 less critical to fix, and would enable us to remove the corresponding section from the Julia performance tips.

@mbauman
Sponsor Member

mbauman commented Sep 22, 2023

Related (maybe even a duplicate?): #11339

@brenhinkeller brenhinkeller added the performance Must go faster label Sep 22, 2023
@andrewjradcliffe
Contributor

andrewjradcliffe commented Sep 27, 2023

Not mentioned above, but there is another place where specialization (or lack thereof) matters: type inference that depends on the recursion depth permitted by the Julia compiler. That is, recursive types which, with some coaxing (without using internals), the Julia compiler will infer. The pain begins when the compiler heuristics change, invalidating whatever scheme permitted inference to work when the code was written.

The need to avoid combinatorial explosion is quite real with type inference; thus, your humble author understands why such heuristics must be in place. Unfortunately, this is the reason I avoid writing Julia code, as I find that I must rewrite it a few months later, despite the fact that I am not doing anything which involves language internals. It very well could be a case of user error, i.e. such code should simply not be written.

@jakobnissen
Contributor Author

I would argue that this is not a matter of specialization: in the situation you describe, it is not that Julia chooses to despecialize; it simply cannot specialize recursive functions in general, since inference would otherwise never terminate.
In contrast, in the examples above, Julia could absolutely specialize.

@andrewjradcliffe
Contributor

Under specific conditions, this repository exhibited said problem, or, at least it did at the time of writing (tests were performed using master at August 2022, which was 1.9); here are some concrete examples.

To explain what is going on there: it is a sequence of function calls, each of which is wrapped in a type; one can (and should) view it as a singly-linked list of static size, expressed in the type domain. The list is used as follows: unwrap a node, apply the function, then proceed to the next node, stopping (via dispatch on the Nothing type) when the next node is nothing. The list is just a parametric type which, when a concrete instantiation is used, is consumed via recursive calls; each call leads to a function invocation on a new type (if the initial list had N elements, the next call sees a list of N-1 elements).
What is it used for? Re-constructing trees from various tabular representations.
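A minimal sketch of such a type-domain list (hypothetical names, not the repository's actual code):

```julia
# A singly-linked list of functions, expressed in the type domain
struct Node{F, T}
    f::F     # the function stored at this node
    next::T  # either another Node or nothing
end

# Consume the list recursively: apply this node's function, recurse on the rest.
# Each call sees a new, smaller type, so inference must recurse too.
apply(n::Node, x) = apply(n.next, n.f(x))
apply(::Nothing, x) = x

list = Node(x -> x + 1, Node(x -> 2x, nothing))
apply(list, 3)  # (3 + 1) * 2 = 8
```

Whether inference succeeds on apply for a long list depends on the compiler's recursion heuristics, which is exactly the fragility described above.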
