Reduce invalidations when loading JuliaData packages #47889

Merged (3 commits) on Dec 15, 2022

@timholy commented Dec 13, 2022

This fixes some invalidations that hinder both CSV (@quinnj) and DataFrames (@bkamins and @nalimilan). Both packages were benchmarked in the discussion of #47184 and @giordano noted that DataFrames had a large load-time regression.

This PR, on top of #47184, together with JuliaLang/Pkg.jl#3275 delivers an unqualified gain in the upcoming Julia 1.9 (workloads are defined in detail farther below):

| Task | Julia-1.8 | PR 47184 | 47184 + this PR |
|---|---|---|---|
| using CSV | 0.72 | 0.64 | 0.44 |
| CSV.File(...) | 7.60 | 2.43 | 1.59 |
| using DataFrames... | 1.40 | 4.82 | 0.95 |
| DataFrames TTFX | 9.89 | 7.24 | 3.44 |

The substantial load-time penalty on "1.9" with just #47184 is explained by the fact that Base.require is among the invalidated targets, and therefore has to be recompiled while DataFrames is being loaded. This PR fixes that.
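
For readers unfamiliar with invalidation, here is a minimal sketch of the mechanism using only Base. The function names `label` and `first_label` are hypothetical and are not part of this PR. Code compiled against the only existing method of a function becomes stale once a more specific method is defined, which is roughly what happens to `Base.require` when a package adds methods during loading.

```julia
# Hypothetical names for illustration; not taken from Base or from this PR.
label(x) = "generic"

# `items` has element type Any, so inference cannot know the concrete type
# passed to `label`. With a single method defined, the compiler may still
# resolve the call to that one method.
first_label(items::Vector{Any}) = label(items[1])

items = Any[1]
before = first_label(items)   # compiles `first_label` against the one existing method

# A more specific method arrives later, as it might when a package is loaded.
# Compiled code that relied on `label` having a single method is now stale
# and must be recompiled before it runs again.
label(x::Int) = "integer"

after = first_label(items)

println(before)
println(after)
```

The second call picks up the new method, so the earlier compiled code cannot be reused. When the stale code sits on the package-loading path, that recompilation cost shows up as load time.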

Here are the workloads:

  • using CSV: @time using CSV

  • CSV.File(...): @time @eval CSV.File(joinpath(pkgdir(CSV), "test", "testfiles", "precompile.csv"))

  • using DataFrames...: @time begin using PooledArrays: PooledArrays, PooledArray; using DataFrames, Statistics; end

  • DataFrames TTFX: uses the precompile workload.

CC @vchuravy, @vtjnash

@bkamins, one thing I also noted is that loading both DataFrames and CSV (in either order; it shouldn't matter which comes first) invalidates some of the code in DataFrames. Happy to consult with you about fixing it if you need help. precompile_blockers seems useful in this context, as it led me directly to some DataFrames code that wasn't very inferrable.

@timholy added the compiler:latency (Compiler latency) and backport 1.9 (Change should be backported to release-1.9) labels Dec 13, 2022
@@ -830,7 +830,7 @@ julia> hex2bytes(a)
"""
function hex2bytes end

-hex2bytes(s) = hex2bytes!(Vector{UInt8}(undef, length(s) >> 1), s)
+hex2bytes(s) = hex2bytes!(Vector{UInt8}(undef, length(s)::Int >> 1), s)

@timholy (author) commented on this change:

Possibly controversial. However, JuliaHub does not list an InfiniteStrings package (there is an InfiniteArrays package).
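
A rough sketch of the trade-off behind this change, using hypothetical helpers (`half_length_loose`, `half_length_asserted`, and the `BigLength` type are made up for illustration). The `::Int` assertion gives inference a concrete type to work with even when the argument type is unknown, at the cost of throwing for any type whose `length` is not an `Int`.

```julia
# Hypothetical helpers mirroring the two versions of the expression in the diff.
half_length_loose(s) = length(s) >> 1
half_length_asserted(s) = length(s)::Int >> 1

# Ask inference for the return type when the argument type is unknown (Any).
loose = only(Base.return_types(half_length_loose, (Any,)))
asserted = only(Base.return_types(half_length_asserted, (Any,)))
println("without assertion: ", loose)
println("with assertion:    ", asserted)

# Hypothetical type whose length is a BigInt rather than an Int.
struct BigLength end
Base.length(::BigLength) = big(10)

# The unasserted version still works for such a type...
loose_result = half_length_loose(BigLength())

# ...while the asserted version throws a TypeError.
threw = try
    half_length_asserted(BigLength())
    false
catch err
    err isa TypeError
end
println("asserted version threw TypeError: ", threw)
```

This is why the change is flagged as possibly controversial: it is only safe as long as no string-like type returns a non-`Int` length.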

base/array.jl (review thread resolved)

@quinnj left a comment

Woohoo; thanks @timholy!

@timholy added the status:merge me label (PR is reviewed. Merge when all tests are passing) Dec 14, 2022
@aviatesk

The doctest failure seems to come from changes in this PR?

timholy commented Dec 15, 2022

Hmm, passes for me locally.

timholy commented Dec 15, 2022

@timholy merged commit e84634e into master Dec 15, 2022
@timholy deleted the teh/invs_data branch December 15, 2022 09:39
@giordano commented:

> The doctest failure seems to come from changes in this PR?

They're failing only on the new AWS runners. I'm not sure those are good doctests if results can differ slightly between CPUs.
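
As a small illustration of why printed floating-point output can be fragile, here is a sketch showing that the same sum evaluated in a different order differs in its last bits. Whether evaluation order is what actually differed on the AWS runners is an assumption; the thread does not identify the cause.

```julia
# The same three numbers, summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# Exact equality is sensitive to the last bits of the result...
println(left_to_right == right_to_left)

# ...while an approximate comparison is not.
println(isapprox(left_to_right, right_to_left))
```

A doctest that prints every digit of such a result will only pass where the rounding happens to match, whereas one that compares with `isapprox` (or prints a rounded value) is less dependent on the machine.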

@giordano removed the status:merge me label (PR is reviewed. Merge when all tests are passing) Dec 15, 2022
KristofferC pushed a commit that referenced this pull request Dec 16, 2022
@KristofferC removed the backport 1.9 label (Change should be backported to release-1.9) Dec 27, 2022