Precompilation of DataFrames #1502

Closed
bkamins opened this issue Sep 5, 2018 · 23 comments
Labels: non-breaking, performance

@bkamins (Member) commented Sep 5, 2018

I think it would be good to put some of the most common cases of functions used in DataFrames.jl into the module to force their precompilation, as this would improve the user experience (and DataFrames.jl is probably one of the packages that new users start with). Any opinions on whether we should do it and how? (In the worst case I could manually list the cases I think are relevant and code it by hand.)

@nalimilan (Member):

Good idea. We should also do this for dependencies, in particular CategoricalArrays. A reasonable way of doing that would be to run the full test suite under SnoopCompile.jl.

@quinnj (Member) commented Sep 5, 2018

Honestly, I haven't found precompile statements to be very helpful (or at least worth the effort to generate them). The problem is that precompile still doesn't generate machine code, so it's really just the lowering and type-inference cost that you're avoiding. What has been more useful is to include a couple of simple calls in the top-level __init__ function that get called every time the module is loaded. Yes, it adds a little extra time to the initial module loading, but IMO it can be worth it to pay the penalty there instead of the first time a user calls a function.
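For illustration, a minimal sketch of that approach, assuming a hypothetical wrapper module (the module name and the warm-up calls are illustrative, not actual DataFrames.jl code):

module MyTablesApp

using DataFrames

# Run a tiny, representative workload so the corresponding methods are
# compiled when the module is loaded rather than at the first user call.
function _warmup()
    df = DataFrame(a=[1, 2], b=[3.0, 4.0])
    first(df, 1)
    describe(df)
    return nothing
end

__init__() = _warmup()

end # module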

@nalimilan (Member):

Given how big the DataFrames API is, it doesn't really sound possible to call all functions from __init__. At least setting up precompilation will be useful in preparation for a future Julia version where native code will be stored.

@KristofferC (Contributor):

+1 for precompile statements not helping much unless they are paired with sysimg building (and then they help tremendously, cf. the Julia REPL).

@nalimilan (Member):

While discussing the h2oai benchmarks, it appeared that it would be useful to provide a function or script to compile functions for all common types. That would allow reporting times net of the initial compilation cost, which is generally only paid once, and in particular would allow separating it from the cost of specializing some functions on the particular operation at hand. See discussion at h2oai/db-benchmark#69.

I guess an easy way to do that would be to copy the output of SnoopCompile to an unexported function, which could be called by the h2oai benchmark, and possibly by users who want to put that in their .juliarc.
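A sketch of what that unexported function might look like (the two precompile signatures here are illustrative placeholders, not actual SnoopCompile output):

# Hypothetical unexported helper; users or benchmark scripts would call
# DataFrames._precompile_() explicitly, e.g. from startup.jl.
function _precompile_()
    precompile(Tuple{typeof(groupby), DataFrame, Symbol})
    precompile(Tuple{typeof(combine), GroupedDataFrame{DataFrame}, Pair{Symbol, typeof(sum)}})
    return nothing
end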

@bkamins (Member, Author) commented Jan 23, 2019

I think it would be worth doing, at least to understand what should be compiled and how long that compilation takes. Then people who want to build a custom sysimg instead of using .juliarc (which is now ~/.julia/config/startup.jl, right?) would also have an easier life.

@nalimilan (Member):

Right. Let's try using SnoopCompile while running the full test suite.

@bkamins (Member, Author) commented Mar 23, 2020

+1 for precompile statements not helping much unless they are paired with sysimg building

@KristofferC Is this still true for Julia 1.4?

Here is a comparison (I am annotating the by function with @nospecialize; not sure if this matters).

Without precompile:

julia> @time using DataFrames
  0.924471 seconds (1.39 M allocations: 88.058 MiB)

julia> df = DataFrame(rand(10,4));

julia> @time by(df, :x1, [:x2, :x3] => +);
  4.523086 seconds (15.01 M allocations: 759.657 MiB, 8.18% gc time)

With precompile:

julia> @time using DataFrames
  1.311571 seconds (1.60 M allocations: 103.156 MiB)

julia> df = DataFrame(rand(10,4));

julia> @time by(df, :x1, [:x2, :x3] => +);
  3.059164 seconds (8.85 M allocations: 458.088 MiB, 2.41% gc time)

So using precompile increases load time by ~0.4 s but decreases the time to first result by ~1.4 s, saving about 1 second in total.

@KristofferC (Contributor):

It might have changed or I didn't have a correct assessment earlier. I know Plots.jl has added some and it seems to help there. So if the numbers say it helps, I say go for it.

@bkamins (Member, Author) commented Mar 24, 2020

@nalimilan - so I will put all the new @nospecialize changes in #2158 and will think about what can be proposed for combining @nospecialize and precompile in this issue (I am not yet sure how they interact in practice and have to investigate).

@bkamins (Member, Author) commented May 13, 2020

@nalimilan - I have checked what we have now. We could handle most of the compilation latency after the package is loaded if we included the following in __init__:

let
    df = DataFrame(a=1, b=2)
    g = groupby(df, :a)
    combine(g, :a => sum)                      # grouping/aggregation path
    innerjoin(df, df, on=:a, makeunique=true)  # join path
    unstack(stack(df, :a))                     # reshaping path
end;

It would make the package startup time grow from 0.9 s to 5.5 s. So for now this is certainly prohibitive, but maybe once we separate out DataFramesBase.jl it could actually be acceptable to have such code in DataFrames.jl? It is not unusual for other packages to load in 5-10 seconds (not that users like this, of course).

(We could alternatively use precompile statements, but I think it is simplest just to run the functions.)

@KristofferC (Contributor) commented May 13, 2020

I really dislike packages running unnecessary code at startup. What if I only use the DataFrames functionality sporadically in my session? Why add 5 extra seconds of completely avoidable latency for me? It can also introduce invalidations that cause other things to be recompiled.

Unconditionally shifting eventual compilation cost to package load time seems like a very bad trade-off.

Ref: JuliaData/CSV.jl#325.

@bkamins (Member, Author) commented May 13, 2020

@KristofferC - sure.

BTW, are you aware of the current state of the work towards better caching of compiled code between Julia sessions?

@nalimilan (Member):

I agree it doesn't sound like a good trade-off to impose 5 s on every load to make things faster afterwards. And we never know which functions users will call, so we'd have to do that for all functions...

Would adding these to precompilation give some speedup? With SnoopCompile it should be easy to transform a script into a series of precompile statements.
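For reference, that workflow would look roughly like this (a sketch based on the SnoopCompile 1.x API; the script name and output path are placeholders):

using SnoopCompile

# Record inference times for everything compiled while the script runs;
# tmin filters out methods whose inference is cheap.
inf_timing = @snoopi tmin=0.01 include("typical_workload.jl")

# Group the results by package and emit precompile() statements that can
# be included from the package sources.
pc = SnoopCompile.parcel(inf_timing)
SnoopCompile.write("/tmp/precompile", pc)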

@bkamins (Member, Author) commented Sep 23, 2020

As discussed with @nalimilan - for now we can add an unexported DataFrames.compile() (or something similar) to DataFrames.jl so that users can opt in to compiling the most common methods (it would be used in the H2O benchmarks, for example).

Also, this function could be used by users in combination with PackageCompiler.jl.

@nalimilan - I am assigning you to this issue just to make sure it is kept track of, as this is an old issue. I hope it is OK with you.
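As a sketch, such an opt-in function could simply run a small representative workload; the name compile and the body below are hypothetical, mirroring the __init__ example discussed above:

# Hypothetical opt-in entry point; nothing runs unless the user calls it.
function compile()
    df = DataFrame(a=1, b=2)
    g = groupby(df, :a)
    combine(g, :a => sum)
    innerjoin(df, df, on=:a, makeunique=true)
    unstack(stack(df, :a))
    return nothing
end

A benchmark script would then call DataFrames.compile() once before the timed runs.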

bkamins added the non-breaking label on Sep 23, 2020
@nalimilan (Member):

For reference, taking the radical approach of precompiling everything that is used by the tests (#2456, which takes more than 3 minutes!), here are the timings I get in a fresh session:

julia> @time using DataFrames
  3.241466 seconds (2.29 M allocations: 187.625 MiB, 1.72% gc time)

julia> df = DataFrame(x=[1, 1, 2, 2], y=rand(4));

julia> @time combine(groupby(df, :x), :y => sum);
  3.899736 seconds (4.45 M allocations: 236.128 MiB, 8.39% gc time)

As opposed to current master:

julia> @time using DataFrames
  0.981128 seconds (1.23 M allocations: 81.321 MiB)

julia> df = DataFrame(x=[1, 1, 2, 2], y=rand(4));

julia> @time combine(groupby(df, :x), :y => sum);
  6.365692 seconds (16.59 M allocations: 850.439 MiB, 7.69% gc time)

So it looks like precompilation can make a difference, but at the cost of increasing the load time. Though clearly we don't want to precompile so many things. Maybe we could identify a few priorities which would give a reasonable tradeoff.

@bkamins (Member, Author) commented Sep 27, 2020

Thank you for working on it. Clearly there is too much in the precompilation (e.g. there are many very specific precompile statements that refer to concrete column names like :a, :b, etc.). How would you prefer to approach pruning the proposed list?
In general I think we should precompile the methods related to core functionality: constructors, groupby, combine, sort, etc.

There is also the issue of maintaining the precompile statements: when we change functionality, the precompile body should change too. Should we regenerate it every time we make a release?

@nalimilan (Member):

If we keep track of the code needed to generate the precompile statements, we can run it for each new major release without too much work. It's not the end of the world if it's not completely up to date though.

One way to compile only the major functions is to use @snoopi tmin=0.1 include("test/runtests.jl"). With that threshold I get about 200 precompile statements, which make precompilation take 30s at the first using DataFrames (versus 6s on master). This gives interesting speedups already (for some reason this one is even a bit faster than with the full precompilation):

julia> @time using DataFrames
  1.832080 seconds (1.73 M allocations: 126.133 MiB)

julia> df = DataFrame(x=[1, 1, 2, 2], y=rand(4));

julia> @time combine(groupby(df, :x), :y => sum);
  3.611778 seconds (5.33 M allocations: 281.904 MiB, 7.17% gc time)

We could use a higher threshold -- but with tmin=0.5 I get only ~5 statements.

@bkamins (Member, Author) commented Sep 27, 2020

Actually, for the H2O benchmark I think it would even make sense to build a custom Julia image (I guess this is OK under the H2O rules, as a user would normally do this).
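A sketch of what that could look like with PackageCompiler.jl (the sysimage path and workload file name are placeholders; kwargs per the PackageCompiler 1.x API):

using PackageCompiler

# Bake DataFrames (plus whatever the workload file exercises) into a custom
# system image, then start Julia with `julia --sysimage df_sysimage.so`.
create_sysimage([:DataFrames];
                sysimage_path="df_sysimage.so",
                precompile_execution_file="h2o_workload.jl")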

@nalimilan (Member):

Actually, for the H2O benchmark I think it would even make sense to build a custom Julia image (I guess this is OK under the H2O rules, as a user would normally do this).

That would be more complex for a limited gain, wouldn't it?

@bkamins (Member, Author) commented Sep 27, 2020

It would, but it would "kind of" show the real recommended deployment pattern.

@KristofferC (Contributor):

Often it isn't too bad to write the statements manually. Having a few well targeted statements usually gets you quite a long way.

@nalimilan (Member):

Fixed by #2456.
