download fails with FDWatcher: bad file descriptor (EBADF) #197

@kleinschmidt

On Julia 1.7.3, I've found that downloads sometimes fail with the following error:

UNHANDLED TASK ERROR: IOError: FDWatcher: bad file descriptor (EBADF)
Stacktrace:
 [1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
   @ Base ./task.jl:812
 [2] wait()
   @ Base ./task.jl:872
 [3] wait(c::Base.GenericCondition{Base.Threads.SpinLock})
   @ Base ./condition.jl:123
 [4] wait(fdw::FileWatching._FDWatcher; readable::Bool, writable::Bool)
   @ FileWatching /usr/local/julia/share/julia/stdlib/v1.7/FileWatching/src/FileWatching.jl:533
 [5] wait
   @ /usr/local/julia/share/julia/stdlib/v1.7/FileWatching/src/FileWatching.jl:504 [inlined]
 [6] macro expansion
   @ /usr/local/julia/share/julia/stdlib/v1.7/Downloads/src/Curl/Multi.jl:166 [inlined]

The line this points to in the Downloads.jl source is

events = try wait(watcher)

"Sometimes" here means "after millions of S3 requests over multiple days of runtime, with retry wrapped around the actual request-making code" (retry with the default settings, i.e. the default ExponentialBackOff schedule, which allows a single retry). When this error occurred, it occurred multiple times, on multiple different pods (which by design access different S3 URIs, but all in the same region), so I'm wondering if it is somehow related to the "connection pool corruption" issue with AWS.jl. Another possibly relevant bit of context: the code that makes the requests is doing an asyncmap over dozens (<100) of small S3 GET requests.
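For anyone trying to picture the request pattern, here is a rough sketch of its shape. This is not the code from the failing job: `fetch_all` and `flaky_fetch` are hypothetical names, the URIs are made up, and the real S3 GET is replaced by a stand-in that fails once per URI so the snippet runs without network access. It only shows retry with default settings wrapped around the per-request function, driven by asyncmap.

```julia
# Sketch only: `fetch_all` and `flaky_fetch` are hypothetical names,
# not taken from the job that hit the EBADF error.

# Run `fetch` over all URIs concurrently, wrapping it in `retry` with
# default settings. The default `delays = ExponentialBackOff()` has n = 1,
# so each URI gets at most one retry after a failure.
function fetch_all(fetch, uris; ntasks = 64)
    fetch_with_retry = retry(fetch)
    return asyncmap(fetch_with_retry, uris; ntasks = ntasks)
end

# Stand-in for a real S3 GET: fails the first time a URI is seen,
# then succeeds on the retry.
seen = Set{String}()
function flaky_fetch(uri)
    if !(uri in seen)
        push!(seen, uri)
        error("simulated transient failure for $uri")
    end
    return "payload:" * uri
end

# Made-up URIs, a few dozen small requests as in the description above.
uris = ["s3://example-bucket/key-$i" for i in 1:20]
results = fetch_all(flaky_fetch, uris)
```

In the sketch every simulated failure is thrown inside the function that retry wraps, so the single retry recovers it.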

I'm afraid this happened in a long-running job that I can't interact with directly, and I don't have a reprex that I can share, but I wanted to open an issue in case someone else has seen this or has advice on how to debug it or on what other information would be useful!
