Make sleep(n) and Timer creation more MT-safe #32174

Merged
7 commits merged on Jun 4, 2019

jpsamaroo
Member

This restores some of the locking removed by 93e3d28 (which @JeffBezanson said was removed because it was probably incorrect), but this time the calls to uv_update_time and uv_timer_start are made within a single lock/unlock of the global libuv lock (instead of two separate lock/unlock pairs).

This still hangs every so often on my system with JULIA_NUM_THREADS=8 when running a slightly longer version of the test laid out in #32152, which I'd like to have this PR fix once I figure out the root cause. I also think this PR should have at least one test to ensure highly-contended calls to sleep or Timer don't cause segfaults or hangs; I'll add a simple test to this PR soon to get something on the table.

(Eventually) fixes #32152

@JeffBezanson
Sponsor Member

uv_timer_init needs to be inside the lock too since it inserts into a queue in the event loop object.
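
For readers following the locking discussion, here is a minimal sketch of the pattern being asked for: queue insertion, clock update, and timer start all happen inside one critical section. It does not use libuv. The `sketch_*` names, the pthread mutex standing in for the global libuv lock, and the counter standing in for the loop clock are all assumptions made for illustration.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a timer and an event loop; not libuv types. */
typedef struct sketch_timer {
    uint64_t due;
    struct sketch_timer *next;
} sketch_timer_t;

typedef struct {
    pthread_mutex_t lock;   /* plays the role of the global libuv lock */
    uint64_t now;           /* cached loop time */
    sketch_timer_t *timers; /* queue owned by the loop */
} sketch_loop_t;

#define SKETCH_PER_THREAD 1000
#define SKETCH_MAX_THREADS 64

/* "init", "update time", and "start" share ONE lock/unlock pair. */
void sketch_timer_create(sketch_loop_t *loop, sketch_timer_t *t, uint64_t timeout)
{
    pthread_mutex_lock(&loop->lock);
    t->next = loop->timers;        /* init: insert into the loop's queue */
    loop->timers = t;
    loop->now++;                   /* update time: refresh the cached clock */
    t->due = loop->now + timeout;  /* start: deadline relative to the fresh clock */
    pthread_mutex_unlock(&loop->lock);
}

static void *sketch_worker(void *arg)
{
    sketch_loop_t *loop = arg;
    for (int i = 0; i < SKETCH_PER_THREAD; i++) {
        sketch_timer_t *t = malloc(sizeof *t);
        if (t == NULL)
            abort();
        sketch_timer_create(loop, t, 10);
    }
    return NULL;
}

/* Create timers from several threads at once; return how many were queued. */
size_t sketch_contended_create(int nthreads)
{
    sketch_loop_t loop = { .now = 0, .timers = NULL };
    pthread_t tids[SKETCH_MAX_THREADS];
    size_t queued = 0;

    if (nthreads > SKETCH_MAX_THREADS)
        nthreads = SKETCH_MAX_THREADS;
    pthread_mutex_init(&loop.lock, NULL);
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[i], NULL, sketch_worker, &loop) != 0)
            abort();
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);

    /* Walk the queue: a lost insertion would show up as a short count. */
    while (loop.timers != NULL) {
        sketch_timer_t *next = loop.timers->next;
        free(loop.timers);
        loop.timers = next;
        queued++;
    }
    pthread_mutex_destroy(&loop.lock);
    return queued;
}
```

If the queue insertion were moved outside the lock, concurrent callers could overwrite each other's `next` links and the walked count could come up short, which is the kind of corruption the comment above is guarding against.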

@JeffBezanson JeffBezanson added the domain:multithreading Base.Threads and related functionality label May 29, 2019
src/jl_uv.c Outdated
return err;
}

jl_uv_associate_julia_struct((uv_handle_t *)handle, (jl_value_t *)handle);
Sponsor Member

One of these things is not like the other: you can't cast the same pointer to both a libuv handle and a Julia value.

Member Author

Sorry, I'm getting slightly confused about what's what. In the following two locations we already seem to pass objects that (to my uneducated eyes) end up with essentially the same casts as the one you pointed out:

err = ccall(:uv_timer_init, Cint, (Ptr{Cvoid}, Ptr{Cvoid}), eventloop(), this)

julia/base/asyncevent.jl

Lines 102 to 103 in 4b0c8e7

ccall(:uv_timer_start, Cint, (Ptr{Cvoid}, Ptr{Cvoid}, UInt64, UInt64),
this, uv_jl_timercb::Ptr{Cvoid},

Those two libuv calls don't expect a jl_value_t*, even though that's what we're doing there, right? I assume it works because this.handle is the first field in the Timer struct so the pointers are the same anyway?
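
The "first field" guess above leans on a C rule: a pointer to a struct and a pointer to its first member compare equal. A minimal sketch of that rule alone, using hypothetical `sketch_*` types rather than the real libuv or Julia layouts, is below. It also shows the case where the first field is a pointer to the handle, where the addresses differ. Which case applies to Timer is not settled in this thread.

```c
#include <stddef.h>

/* Hypothetical handle type; not a real libuv struct. */
typedef struct {
    int kind;
} sketch_handle_t;

/* Case 1: the handle is embedded as the first member. */
typedef struct {
    sketch_handle_t handle;
    int extra;
} sketch_embedded_t;

/* Case 2: the first member is only a pointer to a handle stored elsewhere. */
typedef struct {
    sketch_handle_t *handle;
    int extra;
} sketch_indirect_t;

/* Returns 1: a struct and its embedded first member share an address. */
int sketch_embedded_aliases(const sketch_embedded_t *w)
{
    return (const void *)w == (const void *)&w->handle;
}

/* Returns 0 for a separately stored handle: the wrapper's address is the
 * address of the pointer field, not of the handle it points to. */
int sketch_indirect_aliases(const sketch_indirect_t *w)
{
    return (const void *)w == (const void *)w->handle;
}
```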

src/jl_uv.c Outdated
int err = uv_timer_init(loop, handle);
if (err) {
// TODO: this codepath is currently not tested
free(handle);
Sponsor Member

Missing JL_UV_UNLOCK on this codepath. Also, it's probably better to keep the free call in Julia.
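
To make the missing-unlock point concrete, here is a small sketch contrasting an early return that leaves the lock held with a single-exit version that always releases it. A pthread mutex stands in for the lock taken by JL_UV_LOCK, and every `sketch_*` name is hypothetical. Cleanup is left to the caller, in line with the suggestion to keep the free call on the Julia side.

```c
#include <pthread.h>

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical init step that fails on request, to exercise the error path. */
static int sketch_init(int should_fail)
{
    return should_fail ? -1 : 0;
}

/* Buggy shape: the early return skips the unlock. */
int sketch_create_leaky(int should_fail)
{
    pthread_mutex_lock(&sketch_lock);
    int err = sketch_init(should_fail);
    if (err)
        return err; /* BUG: lock is still held here */
    pthread_mutex_unlock(&sketch_lock);
    return 0;
}

/* Fixed shape: one exit, so the lock is released on every path.
 * The caller owns any allocation and frees it when err is non-zero. */
int sketch_create_fixed(int should_fail)
{
    pthread_mutex_lock(&sketch_lock);
    int err = sketch_init(should_fail);
    /* further setup would go here, guarded by `if (!err)` */
    pthread_mutex_unlock(&sketch_lock);
    return err;
}

/* Returns 1 if the lock is currently free, and leaves it free. */
int sketch_lock_is_free(void)
{
    if (pthread_mutex_trylock(&sketch_lock) != 0)
        return 0;
    pthread_mutex_unlock(&sketch_lock);
    return 1;
}
```

With the leaky version, the next caller that takes the lock after a failed init would block forever, which would look like a hang rather than an error.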

Member Author

Hopefully my latest commit addresses both of these properly 😄

base/asyncevent.jl Outdated
src/jl_uv.c Outdated
{
JL_UV_LOCK();
int err = uv_timer_init(loop, uvtimer);
if (err) {
Sponsor Member

this actually can't fail, so we can just delete the error checking code (or make it an abort)


jl_uv_associate_julia_struct((uv_handle_t*)uvtimer, jltimer);
uv_update_time(loop);
err = uv_timer_start(uvtimer, cb, timeout, repeat);
Sponsor Member

similarly, this can't actually fail either, we can just abort if it returns non-zero

@JeffBezanson JeffBezanson reopened this Jun 1, 2019
@JeffBezanson JeffBezanson reopened this Jun 3, 2019
@JeffBezanson JeffBezanson changed the title [WIP] Make sleep(n) and Timer creation more MT-safe Make sleep(n) and Timer creation more MT-safe Jun 3, 2019
@JeffBezanson JeffBezanson merged commit b6f1be5 into JuliaLang:master Jun 4, 2019
@alhirzel
Contributor

alhirzel commented Jan 22, 2020

I believe I encountered one of the hanging conditions mentioned at the top of this PR, and wanted to note for others that Libc.systemsleep can be used instead (as mentioned in #14494) when a busysleep is needed, e.g. when emulating a task scheduler.

Successfully merging this pull request may close these issues.

sleep(n) sometimes segfaults in a threaded loop
4 participants