Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warnings during precompilation on v1.10-beta1: waiting for IO to finish #50873

Closed
nickrobinson251 opened this issue Aug 10, 2023 · 27 comments
Closed
Labels
compiler:precompilation Precompilation of modules domain:error messages Better, more actionable error messages

Comments

@nickrobinson251
Copy link
Contributor

When precompiling a large private package, on v1.10-beta1, i see these warnings:

FilePathsBase Waiting for background task / IO / timer.
[pid 18736] waiting for IO to finish:
 TYPE[FD/PID]       @UV_HANDLE_T->DATA
 timer              @0x600002cbed00->0x10da90ee0
Mocking Waiting for background task / IO / timer.
[pid 18734] waiting for IO to finish:
 TYPE[FD/PID]       @UV_HANDLE_T->DATA
 timer              @0x600000c43c00->0x10d71c1f0

Precompilation continues and completes successfully, but i do not know how to fix these warning...

  • Is this pointing to some issue within Julia (or Pkg)? Or is it an issue in these packages?
  • If this is pointing to an issue in these packages, please could we have a clearer warning message? (i.e. how do i go fix them?)
@vtjnash
Copy link
Sponsor Member

vtjnash commented Aug 10, 2023

It is not a warning, just an informational statement that the package is using sleep in some place that is slowing down precompilation. It may not be easy to find the offending package, since Julia doesn't really know why the package called sleep, just that it is now stuck waiting for it to complete before it can get back to doing useful work.

@nickrobinson251
Copy link
Contributor Author

thanks for the explanation!

So perhaps more of a user-experience question... why are we showing this to every user if it's not problematic or (realistically) actionable? Right now it's a kinda gnarly/scary looking thing

Could we just not log this info? or at least make it opt-in? or dependent on how long we've been waiting? or...?

@timholy
Copy link
Sponsor Member

timholy commented Aug 10, 2023

The same warning can be issued for precompilation job that will block forever unless fixed: comonicon/Comonicon.jl#249

@brenhinkeller brenhinkeller added compiler:precompilation Precompilation of modules domain:error messages Better, more actionable error messages labels Aug 10, 2023
@brenhinkeller
Copy link
Sponsor Contributor

brenhinkeller commented Aug 10, 2023

FWIW, it gets called a warning on-screen -- also seems to be associated with this glitch of the progress bar:

Screenshot 2023-08-10 at 12 47 24 PM

@ViralBShah
Copy link
Member

I have seen that warning on a few occasions, and I was wondering that unless the system wants me to do something about it, why show it?

@timholy
Copy link
Sponsor Member

timholy commented Aug 11, 2023

In some cases things are only being held open transiently. Could we delay showing the warning a bit longer?

@vtjnash
Copy link
Sponsor Member

vtjnash commented Aug 13, 2023

That they are held open at all is usually at least a small mistake, since the program should be closing them explicitly when done with them.

@sloede
Copy link
Contributor

sloede commented Aug 18, 2023

I'm getting these warnings on a regular basis as well, including with Julia v1.10.0-beta2. E.g., from pkg> add Trixi OrdinaryDiffEq, I get

Precompiling project...
  194 dependencies successfully precompiled in 92 seconds. 10 already precompiled.
  3 dependencies had warnings during precompilation:
┌ SummationByPartsOperatorsDiffEqCallbacksExt [82d7d246-636b-5c42-a4f7-7a8a453087a7]
│  [pid 720185] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   timer              @0x107c040->0x7f8e2e0e9060
└
┌ LinearMapsStatisticsExt [364b1850-867c-5a65-9f2f-642d093ba2f6]
│  [pid 704422] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   timer              @0x261b550->0x7f6d92ae8f10
└
┌ ExponentialUtilities [d4d017d3-3776-5f7e-afef-a10c40355c18]
│  [pid 712737] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   timer              @0x955b000->0x7f9e61c56230
└

Is there a problem with Trixi.jl or OrdinaryDiffEq.j? Can the user do something about it?

  • If yes, could we improve the messages such that they are more helpful?
  • If not, could we just not show them anymore (they look scary!)?

@vtjnash
Copy link
Sponsor Member

vtjnash commented Aug 18, 2023

Yes, it means one of their dependencies is using a Timer to handle some sort of event, and that is quite possibly some sort of bug in those packages (e.g. a datarace somewhere) that may need to be fixed.

@sloede
Copy link
Contributor

sloede commented Aug 19, 2023

OK, thanks. Could you please clarify: Which package needs fixing? The one reported, the one that was originally installed, or one of the dependencies of the reported packages?

@timholy
Copy link
Sponsor Member

timholy commented Aug 19, 2023

This is far more difficult to debug now than it should be, but see #50914 (which contains both documentation on how to fix now as well as a speculative change that might give us tools to make it somewhat easier).

@sloede
Copy link
Contributor

sloede commented Aug 21, 2023

Thanks for the hint!

I'd still argue that before v1.10 is released, it would be great if the warning message were downgraded to an info or made less scary looking, and would be more helpful on what to do/expect as a user.

@timholy
Copy link
Sponsor Member

timholy commented Aug 22, 2023

Clearly we have to do something to make it better before releasing 1.10. But other than pointing to long documentation, no one seems to have any ideas about how to make this substantially easier to deal with. If you have any ideas, please let us know.

@nickrobinson251
Copy link
Contributor Author

I agree it's not at all clear what to do here...

Please can we add this issue to the v1.10 milestone so we're sure to do something ?

@nickrobinson251
Copy link
Contributor Author

In the absence of a good positive suggestion... How bad do we think it would be to just not show this info at all (til we have a plan to make it useful)?

Are we ruling that out because it could be critical info in cases where precompile hangs?
If so, how difficult would it be to require a flag be passed for this info to be shown (so it's still recoverable when debugging hangs)?

@timholy
Copy link
Sponsor Member

timholy commented Aug 22, 2023

Please can we add this issue to the v1.10 milestone so we're sure to do something ?

Sorry, I should have re-posted a link to #50914 (I knew @sloede was following both discussions, and I had linked that PR earlier in the discussion).

How bad do we think it would be to just not show this info at all (til we have a plan to make it useful)?...Are we ruling that out because it could be critical info in cases where precompile hangs?

Yeah, while it shows "spuriously" for some packages (due to transient blockage), for others it would block forever, and doing that silently would be bad.

I do think, though, we should put a little more effort into figuring out whether the warning on transient blockage can be addressed somehow. @vtjnash, from #50873 (comment) are you saying that even a bit of "cleanup spillover" is a bug? I do have the sense that this warning is being issued for a number of packages that don't "hang" but for which perhaps task cleanup hasn't quite completed by the time the module closes.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Aug 22, 2023

If task cleanup didn't happen, then it means the project didn't remember to wait or close some resource properly. That is not always a bug. It is more often likely to be a bug if it occurs during precompile, since that should be a fairly simple case. Relatedly, this is also the premise of #33248, which would propose making that an error for all objects at all times (which is partly why I am not a huge proponent of that proposal however).

@timholy
Copy link
Sponsor Member

timholy commented Aug 22, 2023

For a concrete case, my (sometimes faulty) memory suggests I've seen that warning, issued once, despite https://github.com/JuliaGizmos/GtkObservables.jl/blob/af20a698aceb4ae0bdc9729bddc56c2dcc6339fe/src/precompile.jl#L168-L170. But I couldn't reproduce that behavior in two tries I just gave this (that said, GtkObservables did just migrate to Gtk4, and it's possible that change improved behavior).

In any event, I interpret the text in the OP here as suggesting that all these packages precompiled successfully, but that the warning was issued (once) regardless. Without knowing how careful those packages are being about closing resources, should we interpret any case in which the warning is issued once but then resolves itself as possibly an over-eager warning system? Specifically, should we increase the delay at

uv_timer_start(&wait_empty_worker, wait_empty_func, 10, 15000);
to something larger than 10ms?

timholy added a commit that referenced this issue Aug 22, 2023
This increases the delay slightly before checking whether all I/O
tasks have finished. This may reduce the number of "spurious" warnings
during precompilation.

Fixes #50873
@vtjnash
Copy link
Sponsor Member

vtjnash commented Aug 22, 2023

10ms is the start of the interval for running GC after the program exited and began sleeping, so it should be short

@timholy
Copy link
Sponsor Member

timholy commented Aug 22, 2023

xref #51003

@Seelengrab
Copy link
Contributor

This has come up on discourse: https://discourse.julialang.org/t/waiting-for-io-to-finish-1-10-beta2/103240

Since a user of any of these packages can't realistically do anything about them, how about only showing that in CI? Or at least drastically increasing the amount of time until the warning is shown, with an additional note that this is something the package in question needs to fix, not the end user who installs the package?

@KronosTheLate
Copy link
Contributor

KronosTheLate commented Oct 2, 2023

I have also run into this a number of times on 1.10. Below is a screenshot of an offender:
image

Should I open an issue with the relevant package? For completeness, the precompilation stayed on this step for a few minutes, which feels like longer than it should take.

The full printed error when precompilation finnished:
1 dependency had warnings during precompilation:
┌ LanguageServer [2b0e0bc5-e4fd-59b4-8912-456d1b03d8d7]
│  [pid 4011] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   pipe[11]           @0x314a550->0x7fa4fc91a680
│   process[4022]      @0x29ea2a0->0x7fa4fc8f20e0
│  
│  [pid 4011] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   timer              @0x6fe1750->0x7fa4fd843f40
│   pipe[11]           @0x492b940->0x7fa5287a6a40
│   pipe[19]           @0x28a6120->0x7fa50b6535d0
│   pipe[16]           @0x4a92b70->0x7fa50b6538f0
│   process[4241]      @0x703d9a0->0x7fa525860a90
│  
│  [pid 4011] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   timer              @0x6fe1750->0x7fa4fd843f40
│   pipe[19]           @0x28a6120->0x7fa50b6535d0
│   pipe[16]           @0x4a92b70->0x7fa50b6538f0
│   process[4241]      @0x703d9a0->0x7fa525860a90
│  
│  [pid 4011] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   pipe[19]           @0x28a6120->0x7fa50b6535d0
│   pipe[16]           @0x4a92b70->0x7fa50b6538f0
│   process[4241]      @0x703d9a0->0x7fa525860a90
│  
│  [pid 4011] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   pipe[19]           @0x28a6120->0x7fa50b6535d0
│   pipe[16]           @0x4a92b70->0x7fa50b6538f0
│   process[4241]      @0x703d9a0->0x7fa525860a90
│  
│  [pid 4011] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   pipe[19]           @0x28a6120->0x7fa50b6535d0
│   pipe[16]           @0x4a92b70->0x7fa50b6538f0
│   process[4241]      @0x703d9a0->0x7fa525860a90
│  
│  [pid 4011] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   pipe[19]           @0x28a6120->0x7fa50b6535d0
│   pipe[16]           @0x4a92b70->0x7fa50b6538f0
│   process[4241]      @0x703d9a0->0x7fa525860a90
│  
│  [pid 4011] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   pipe[19]           @0x28a6120->0x7fa50b6535d0
│   pipe[16]           @0x4a92b70->0x7fa50b6538f0
│   process[4241]      @0x703d9a0->0x7fa525860a90
│  
│  [pid 4011] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   pipe[19]           @0x28a6120->0x7fa50b6535d0
│   pipe[16]           @0x4a92b70->0x7fa50b6538f0
│   process[4241]      @0x703d9a0->0x7fa525860a90
│  
│  [pid 4011] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   pipe[19]           @0x28a6120->0x7fa50b6535d0
│   pipe[16]           @0x4a92b70->0x7fa50b6538f0
│   process[4241]      @0x703d9a0->0x7fa525860a90
│  
│  [pid 4011] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   pipe[19]           @0x28a6120->0x7fa50b6535d0
│   pipe[16]           @0x4a92b70->0x7fa50b6538f0
│   process[4241]      @0x703d9a0->0x7fa525860a90
│  
│  [pid 4011] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   pipe[19]           @0x28a6120->0x7fa50b6535d0
│   pipe[16]           @0x4a92b70->0x7fa50b6538f0
│   process[4241]      @0x703d9a0->0x7fa525860a90
└  

@baggepinnen
Copy link
Contributor

A package which precompiles fine on julia v1.9 gets stuck in what seems like an infinite loop of

[pid 60775] waiting for IO to finish:
 TYPE[FD/PID]       @UV_HANDLE_T->DATA
 timer              @0x1c4ebd0->0x7eff20da2da0

on Julia v1.10 beta 3. The screenshot below was taken after about 5 minutes of precompilation, while on julia v1.9 the whole thing is done in seconds
image

If I terminate the precompilation with ctrl-C, I get

^C Interrupted: Exiting precompilation...===========>]  150/151

[60775] signal (2): Interrupt
in expression starting at none:0
epoll_wait at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
uv__io_poll at /workspace/srcdir/libuv/src/unix/epoll.c:236
uv_run at /workspace/srcdir/libuv/src/unix/core.c:400
ijl_task_get_next at /cache/build/default-amdci5-7/julialang/julia-release-1-dot-10/src/partr.c:470

  3 dependencies had output during precompilation:
┌ GeometryBasics
│  WARNING: method definition for centered at /home/fredrikb/.julia/packages/GeometryBasics/pjtu1/src/primitives/rectangles.jl:515 declares type variable N but does not use it.
│  WARNING: method definition for volume at /home/fredrikb/.julia/packages/GeometryBasics/pjtu1/src/meshes.jl:209 declares type variable FT but does not use it.
│  WARNING: method definition for volume at /home/fredrikb/.julia/packages/GeometryBasics/pjtu1/src/meshes.jl:209 declares type variable VT but does not use it.
│  WARNING: method definition for volume at /home/fredrikb/.julia/packages/GeometryBasics/pjtu1/src/meshes.jl:221 declares type variable FT but does not use it.
│  WARNING: method definition for volume at /home/fredrikb/.julia/packages/GeometryBasics/pjtu1/src/meshes.jl:221 declares type variable VT but does not use it.
└  
┌ FilePathsBase
│  [pid 56893] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   timer              @0x26ff250->0x7fbfcb594790
└  
┌ HomeAssistantBot
│  [Output was shown above]
└  

(HomeAssistantBot) pkg> poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
_trywait at ./asyncevent.jl:138
wait at ./asyncevent.jl:155 [inlined]
sleep at ./asyncevent.jl:240
jfptr_sleep_75641.1 at /home/fredrikb/.julia/juliaup/julia-1.10.0-beta3+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-7/julialang/julia-release-1-dot-10/src/gf.c:2892 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-7/julialang/julia-release-1-dot-10/src/gf.c:3074
scheduler at /home/fredrikb/.julia/packages/JobSchedulers/7Gjsg/src/scheduler.jl:307
#45 at /home/fredrikb/.julia/packages/JobSchedulers/7Gjsg/src/control.jl:7
unknown function (ip: 0x7eff56399642)
_jl_invoke at /cache/build/default-amdci5-7/julialang/julia-release-1-dot-10/src/gf.c:2892 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-7/julialang/julia-release-1-dot-10/src/gf.c:3074
jl_apply at /cache/build/default-amdci5-7/julialang/julia-release-1-dot-10/src/julia.h:1976 [inlined]
start_task at /cache/build/default-amdci5-7/julialang/julia-release-1-dot-10/src/task.c:1238
unknown function (ip: (nil))
Allocations: 12267301 (Pool: 12255140; Big: 12161); GC: 42

Since the package works fine on julia v1.9, I assume that something has broken in the precompilation in v1.10. Is there some debug info I can gather that can help figuring the problem out?

@IanButterworth
Copy link
Sponsor Member

Can you see if this happens on master?

@baggepinnen
Copy link
Contributor

@IanButterworth
It happens also on a fresh

Version 1.11.0-DEV.650 (2023-10-14)
Commit b050af1e9ea (0 days old master)

julia> versioninfo()
Julia Version 1.11.0-DEV.650
Commit b050af1e9ea (2023-10-14 11:24 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × AMD Ryzen 9 5900X 12-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
  Threads: 17 on 24 virtual cores
Environment:
  JULIA_PKG_PRECOMPILE_AUTO = 1
  JULIA_NUM_PRECOMPILE_TASKS = 22
  JULIA_PKG_USE_CLI_GIT = true

@IanButterworth
Copy link
Sponsor Member

Thanks. The issue is most likely real then. Typically it will be because a dependency has an __init__() that starts a timer but doesn't close it.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Oct 25, 2023

Fixed by #51847

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:precompilation Precompilation of modules domain:error messages Better, more actionable error messages
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants