add precompile runs for xparse and xparse2 #108

Merged
merged 5 commits into from Feb 11, 2022


@rafaqz (Contributor) commented Feb 3, 2022

This reduces the time-to-first-X (TTFX) of CSV.jl by about 12 seconds in these benchmarks: the first CSV.File call drops from about 17.5 s to about 5.9 s.

Current timing with main branches of Parsers.jl and CSV.jl:

julia> @time using CSV
  3.086746 seconds (6.63 M allocations: 375.584 MiB, 7.03% gc time, 88.55% compilation time)

julia> const input="""
       time,ping,label
       1,25.7,x
       2,31.8,y
       """
"time,ping,label\n1,25.7,x\n2,31.8,y\n"

julia> io = IOBuffer(input)
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=34, maxsize=Inf, ptr=1, mark=-1)

julia> @time file = CSV.File(io)
 17.450644 seconds (48.24 M allocations: 1.982 GiB, 6.00% gc time, 99.98% compilation time)
2-element CSV.File:
 CSV.Row: (time = 1, ping = 25.7, label = "x")
 CSV.Row: (time = 2, ping = 31.8, label = "y")

With this PR and main of CSV.jl:

julia> @time using CSV
  3.027161 seconds (6.59 M allocations: 385.136 MiB, 7.53% gc time, 90.56% compilation time)

julia> const input="""
       time,ping,label
       1,25.7,x
       2,31.8,y
       """
"time,ping,label\n1,25.7,x\n2,31.8,y\n"

julia> io = IOBuffer(input)
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=34, maxsize=Inf, ptr=1, mark=-1)

julia> @time file = CSV.File(io)
  5.894205 seconds (12.28 M allocations: 656.183 MiB, 4.75% gc time, 99.94% compilation time)
2-element CSV.File:
 CSV.Row: (time = 1, ping = 25.7, label = "x")
 CSV.Row: (time = 2, ping = 31.8, label = "y")

With this PR and the corresponding PR to CSV.jl:

julia> @time using CSV
  0.871332 seconds (2.21 M allocations: 136.379 MiB, 5.89% gc time, 78.47% compilation time)

julia> const input="""
       time,ping,label
       1,25.7,x
       2,31.8,y
       """
"time,ping,label\n1,25.7,x\n2,31.8,y\n"

julia> io = IOBuffer(input)
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=34, maxsize=Inf, ptr=1, mark=-1)

julia> @time file = CSV.File(io)
  5.757817 seconds (14.37 M allocations: 780.821 MiB, 4.84% gc time, 99.97% compilation time)
2-element CSV.File:
 CSV.Row: (time = 1, ping = 25.7, label = "x")
 CSV.Row: (time = 2, ping = 31.8, label = "y")
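The "precompile runs" in this PR follow a common pattern: call representative functions while the package is being precompiled, so the work they trigger can be cached with the package instead of being redone on first use (the TTFX measured above). A minimal sketch, assuming a hypothetical ToyParsers module with a toyparse function (neither is part of Parsers.jl); the loop over target types and over codeunits/Vector buffers loosely mirrors the loops in this PR's src/precompile.jl:

```julia
# Hypothetical stand-in for a parsing package; none of these names come from Parsers.jl.
module ToyParsers

# Generic entry point: parse a number of type T from a byte buffer.
function toyparse(::Type{T}, buf::AbstractVector{UInt8}) where {T<:Union{Int32,Int64,Float64}}
    return parse(T, String(collect(buf)))
end

# The "precompile run": call the concrete signatures that downstream users
# are expected to hit, for both kinds of buffer.
function _precompile_()
    val = "123"
    for T in (Int32, Int64, Float64), buf in (codeunits(val), Vector(codeunits(val)))
        toyparse(T, buf)
    end
    return nothing
end

# Executed when the module is loaded; when this module lives in a package,
# that happens during precompilation, so the work above can be cached.
_precompile_()

end # module
```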

@rafaqz changed the title from "add precompile statements for xparse" to "add precompile runs for xparse" on Feb 4, 2022
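The retitling from "precompile statements" to "precompile runs" reflects a choice between two approaches. For contrast with a run, a statement-based approach lists exact signatures with Base's precompile function; roughly, it covers only what inference can reach from each listed signature, whereas a run also compiles callees that are discovered only when the code actually executes. A minimal sketch, using a hypothetical toytryparse helper rather than the Parsers.jl API:

```julia
# Hypothetical tryparse-style helper (not Parsers.tryparse): returns `nothing`
# instead of throwing on bad input.
toytryparse(::Type{T}, s::AbstractString) where {T<:Real} = tryparse(T, s)

# "Precompile statements": ask for specific signatures to be compiled without
# executing them. `precompile` returns true if the request succeeded.
const STATEMENTS_OK = all((
    precompile(toytryparse, (Type{Int64}, String)),
    precompile(toytryparse, (Type{Float64}, String)),
))
```

Statements compile exactly what is listed; runs compile whatever the workload exercises, which is why a run tends to catch more of the call graph.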
@rafaqz (Contributor, Author) commented Feb 5, 2022

Added more precompile runs for xparse2 and tryparse.

This shaves multiple seconds off TTFX for packages depending on JSON.jl, like Blink.jl, Interact.jl, etc. But precompiling in JSON.jl still has an effect, although it is mostly precompiling Parsers.jl (see JuliaIO/JSON.jl#337), so there could be more to add here.

This PR should probably be quite thorough. Parsers.jl has 2200 dependents, so I'm guessing it is responsible for a significant fraction of TTFX across the whole Julia ecosystem.

We could also rewrite some of the long, type-unstable methods that are causing this compile time. The abstractly typed fields of the objects here may contribute.
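To illustrate the point about abstractly typed fields: a field declared as Any (or another abstract type) hides its concrete type from inference, so every method that reads it becomes type-unstable, while a type parameter keeps it concrete. A minimal sketch with hypothetical LooseOptions and TightOptions structs (not Parsers.Options):

```julia
# Hypothetical option structs (not Parsers.Options), contrasting an
# abstractly typed field with a parametric one.
struct LooseOptions
    delim::Any        # abstract: code reading this field cannot infer its type
    quoted::Bool
end

struct TightOptions{D}
    delim::D          # parametric: concrete once an instance is constructed
    quoted::Bool
end

# Any method that reads `delim` inherits the field's (in)stability.
getdelim(o) = o.delim
```

With these definitions, Base.return_types(getdelim, (LooseOptions,)) reports Any, while the same query for TightOptions{UInt8} reports UInt8.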

@rafaqz changed the title from "add precompile runs for xparse" to "add precompile runs for xparse and xparse2" on Feb 5, 2022
@rafaqz force-pushed the precompile branch 2 times, most recently from 55a50de to ed2540a (February 5, 2022 12:40)
codecov bot commented Feb 6, 2022

Codecov Report

Merging #108 (29f9985) into main (8bd7f84) will decrease coverage by 0.79%.
The diff coverage is 0.00%.


@@            Coverage Diff             @@
##             main     #108      +/-   ##
==========================================
- Coverage   87.57%   86.77%   -0.80%     
==========================================
  Files           9        9              
  Lines        2309     2321      +12     
==========================================
- Hits         2022     2014       -8     
- Misses        287      307      +20     
Impacted Files Coverage Δ
src/precompile.jl 11.76% <0.00%> (+11.76%) ⬆️
src/bools.jl 97.61% <0.00%> (-1.59%) ⬇️
src/utils.jl 89.28% <0.00%> (-1.54%) ⬇️
src/floats.jl 91.52% <0.00%> (-0.88%) ⬇️
src/Parsers.jl 93.87% <0.00%> (-0.38%) ⬇️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8bd7f84...29f9985.

@ufechner7

So we need a test that calls the function precompile() to make codecov happy ...

@oscardssmith (Contributor)

we could also just accept the code coverage regression...

@rafaqz (Contributor, Author) commented Feb 8, 2022

Maybe it will look better if I add it. It is making sure all those types at least run...

Edit: added _precompile_() to the tests
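Calling the precompile workload from the test suite is a small addition. A minimal sketch, assuming a hypothetical ToyPkg module whose _precompile_() function stands in for the one added in this PR:

```julia
using Test

# Hypothetical package module with a precompile workload.
module ToyPkg
toyparse(::Type{T}, s::AbstractString) where {T<:Real} = parse(T, s)

function _precompile_()
    for T in (Int32, Int64, Float64)
        toyparse(T, "123")
    end
    return nothing
end
end # module

# Calling the workload from the tests checks that every signature in it still
# runs, and lets coverage tools count those lines as executed.
@testset "precompile workload" begin
    @test ToyPkg._precompile_() === nothing
end
```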

@rafaqz force-pushed the precompile branch 2 times, most recently from 5cbe60d to 9595a55 (February 8, 2022 13:54)
Review thread on src/precompile.jl (outdated, resolved)
Review thread on src/precompile.jl (outdated, resolved)
pos = 1
val = "a"
len = length(val)
for T in (Char, String), buf in (codeunits(val), Vector(codeunits(val)))
Member

Should InlineStrings.jl precompilation also be included here?

Member

I wouldn't think they could be included here, since they're in a separate package (InlineStrings.jl), but we could do similar precompilation over there.

Contributor Author

The dependency goes the other way. Should we add a similar precompile block over there?

Contributor Author

I don't actually get much improvement from precompilation in InlineStrings.jl. Precompiling for all types only takes half a second there.

Member

What do you mean by "Precompiling for all types only takes half a second there"? Precompiling all InlineString types? I wouldn't be too surprised by that, since the heavy lifting would be handled by precompiling String in the Parsers.jl package.

@rafaqz (Contributor, Author), Feb 8, 2022

Yes, all the InlineString types. Even without String in the Parsers.jl precompilation, precompiling them adds under a second to InlineStrings.jl's precompile time.

It seems there is something about the numerical methods in Parsers.jl that takes a lot longer to compile.
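Because InlineStrings.jl depends on Parsers.jl and not the other way round, a workload for InlineStrings types would have to live in InlineStrings.jl, the only side that can name both. A minimal sketch with two hypothetical modules standing in for the upstream parser and the downstream type-owning package (Tag is an illustrative type, not an InlineStrings type):

```julia
# Hypothetical upstream parser package: generic over the output type T.
module Upstream
toyparse(::Type{T}, s::AbstractString) where {T} = T(s)
end # module

# Hypothetical downstream package that owns a string-like type and depends on
# Upstream. Upstream cannot name `Tag`, so the workload has to live here.
module Downstream
import ..Upstream

struct Tag
    s::String
end

function _precompile_()
    # Exercise the upstream generic with this package's own type.
    Upstream.toyparse(Tag, "abc")
    return nothing
end

_precompile_()
end # module
```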

@rafaqz (Contributor, Author) commented Feb 9, 2022

This should be ready to go. Can we change the GitHub Actions settings here so that CI runs for everyone except first-time contributors?

Review thread on src/precompile.jl (outdated, resolved)
if !(T === String)
Parsers.xparse(T, buf, pos, len, options, T)
end
Parsers.xparse(T, buf, pos, len, options, Any)
Collaborator

Why is this method important to precompile? Is it because it is used in CSV.jl (as _parseany)?

Contributor Author

Yep, for _parseany.
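On why the Any variant is worth precompiling separately: when a trailing type argument is a method type parameter, each distinct value of it gets its own compiled specialization, so warming up the call with T does not cover the call with Any. A minimal sketch, assuming a hypothetical toyxparse whose third argument selects the result type; this is an assumption for illustration, not a description of Parsers.xparse:

```julia
# Hypothetical parser with a trailing "result type" argument S, loosely
# modeled on the T-versus-Any calls in the hunk above; not Parsers.xparse.
function toyxparse(::Type{T}, s::AbstractString, ::Type{S}) where {T<:Real,S}
    x = parse(T, s)
    return convert(S, x)::S
end

# Because S is a method type parameter, S = T and S = Any are compiled as
# separate specializations, so the workload calls both.
function toy_workload()
    for T in (Int64, Float64)
        toyxparse(T, "1", T)    # concrete result type
        toyxparse(T, "1", Any)  # the kind of request a generic caller makes
    end
    return nothing
end
```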

@quinnj (Member) left a comment

Thanks @rafaqz!

@quinnj merged commit e7fe3e7 into JuliaData:main on Feb 11, 2022
@oscardssmith (Contributor)

I think this might have broken CSV.

julia> using CSV
[ Info: Precompiling CSV [336ed68f-0bac-5ca0-87d4-7b16caf5d00b]
ERROR: LoadError: ccall method definition: argument 1 type doesn't correspond to a C type
Stacktrace:
 [1] top-level scope
   @ ~/.julia/packages/SentinelArrays/p1IoM/src/SentinelArrays.jl:209
 [2] include
   @ ./Base.jl:421 [inlined]
 [3] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::String)
   @ Base ./loading.jl:1399
 [4] top-level scope
   @ stdin:1
in expression starting at /home/oscardssmith/.julia/packages/SentinelArrays/p1IoM/src/SentinelArrays.jl:1
in expression starting at stdin:

@giordano

@rafaqz deleted the precompile branch (February 11, 2022 09:36)
@rafaqz (Contributor, Author) commented Feb 11, 2022

The ccall isn't actually from this PR; it was there before. An issue might be a better place to track this.

pos = 1
val = "123"
len = length(val)
for T in (String, Int32, Int64, Float64, BigFloat, Dates.Date, Dates.DateTime, Bool)
Member

@rafaqz (and, as an FYI, @nickrobinson251): no idea why, but including the Dates.Date and Dates.DateTime precompiles in Parsers.jl is causing JuliaData/CSV.jl#981, i.e. precompiling CSV.jl on Windows completely fails. I'm going to revert those two type precompiles for now, and we can figure out what to do afterwards.

Contributor

can you only disable them for Windows?
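One way to answer that question, sketched under assumptions: build the type list at runtime and drop the Date/DateTime entries only when Sys.iswindows() is true. The precompile_types function is hypothetical; its type list is copied from the hunk above:

```julia
using Dates

# Hypothetical: assemble the workload's type list at runtime so the
# Date/DateTime entries, which this thread reports broke CSV.jl precompilation
# on Windows, are skipped there while other platforms keep them.
function precompile_types()
    types = Any[String, Int32, Int64, Float64, BigFloat, Bool]
    if !Sys.iswindows()
        append!(types, (Dates.Date, Dates.DateTime))
    end
    return types
end
```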

Contributor Author

It's pretty weird that that happens. We could also rewrite a few of these methods to reduce compilation overhead, which may fix the problem.

@quinnj mentioned this pull request on Mar 28, 2022
@CarloLucibello

It would be great to have this back again; it was a big quality-of-life improvement.

@quinnj (Member) commented Apr 7, 2022

First we have to figure out why it causes segfaults during precompilation on Windows with Julia 1.7.

@rafaqz (Contributor, Author) commented Apr 7, 2022

We could just rewrite those methods to reduce compilation time and see if that fixes it.
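One common way to cut compilation time, offered only as a sketch of the kind of rewrite being discussed (not what Parsers.jl actually did), is to move work that does not depend on the target type into a separate helper, so it is compiled once per buffer type rather than once per target type as well. digit_span and toyparse_split are hypothetical names:

```julia
# Type-independent work: find the index range of leading ASCII digits. It does
# not mention T, so it is compiled once per buffer type, not per target type.
function digit_span(buf::AbstractVector{UInt8})
    stop = firstindex(buf) - 1
    for i in eachindex(buf)
        UInt8('0') <= buf[i] <= UInt8('9') || break
        stop = i
    end
    return firstindex(buf):stop
end

# Thin per-type wrapper: only this small accumulation loop is specialized on T.
# (No overflow checking; this is an illustration, not a robust parser.)
function toyparse_split(::Type{T}, buf::AbstractVector{UInt8}) where {T<:Integer}
    r = digit_span(buf)
    isempty(r) && return nothing
    n = zero(T)
    for i in r
        n = n * T(10) + T(buf[i] - UInt8('0'))
    end
    return n
end
```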

9 participants