Make sleep(n) and Timer creation more MT-safe #32174

Merged
7 commits merged into JuliaLang:master on Jun 4, 2019

Contributor

jpsamaroo commented May 28, 2019

This restores some of the locking removed by 93e3d28 (which @JeffBezanson said was removed because it was probably incorrect). This time, the calls to uv_update_time and uv_timer_start happen inside a single lock/unlock of the global libuv lock instead of two separate lock/unlock pairs.

This still hangs every so often on my system with JULIA_NUM_THREADS=8 when running a slightly longer version of the test laid out in #32152. I'd like this PR to fix that hang once I figure out the root cause. I also think this PR should include at least one test ensuring that highly contended calls to sleep or Timer don't cause segfaults or hangs; I'll add a simple test soon to get something on the table.

(Eventually) fixes #32152
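
For illustration, a contention test of the kind described above might look roughly like the following sketch. It is not the test that was added in this PR: `hammer_timers` and its parameters are hypothetical names, and the sketch assumes Julia 1.3 or later (for `Threads.@spawn`) started with several threads, e.g. JULIA_NUM_THREADS=8.

```julia
using Base.Threads: Atomic, atomic_add!, @spawn

# Hypothetical stress helper (an assumption, not the PR's actual test).
# Spawns `ntasks` tasks that each create and wait on timers `iters` times.
function hammer_timers(ntasks::Integer, iters::Integer; dt::Real = 0.001)
    finished = Atomic{Int}(0)
    @sync for _ in 1:ntasks
        @spawn begin
            for _ in 1:iters
                sleep(dt)        # the sleep(n) path
                t = Timer(dt)    # explicit Timer creation
                wait(t)
                close(t)
            end
            atomic_add!(finished, 1)
        end
    end
    return finished[]            # number of tasks that ran to completion
end

done = hammer_timers(32, 10)
```

A hang shows up as the call never returning, so a real test would likely need an external timeout as well.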

Member

JeffBezanson commented May 29, 2019

uv_timer_init needs to be inside the lock too since it inserts into a queue in the event loop object.

src/jl_uv.c Outdated
return err;
}

jl_uv_associate_julia_struct((uv_handle_t *)handle, (jl_value_t *)handle);


vtjnash May 29, 2019

Member

One of these things is not like the other: you can't cast the same handle to both a libuv type and a Julia type.


jpsamaroo May 29, 2019

Author Contributor

Sorry, I'm getting slightly confused about what's what. In the following two locations we already seem to be passing objects that (to my uneducated eyes) end up with casts essentially the same as the one you pointed out:

err = ccall(:uv_timer_init, Cint, (Ptr{Cvoid}, Ptr{Cvoid}), eventloop(), this)

julia/base/asyncevent.jl

Lines 102 to 103 in 4b0c8e7

ccall(:uv_timer_start, Cint, (Ptr{Cvoid}, Ptr{Cvoid}, UInt64, UInt64),
this, uv_jl_timercb::Ptr{Cvoid},

Those two libuv calls don't expect a jl_value_t*, even though that's what we're passing there, right? I assume it works because this.handle is the first field in the Timer struct, so the pointers are the same anyway?
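
As general background on the question above (the thread itself does not settle it): `ccall` passes each argument through `Base.cconvert` and then `Base.unsafe_convert`, so a wrapper type can define an `unsafe_convert` method that hands C its stored handle rather than the address of the Julia object. Whether Timer relies on this at these call sites is an assumption here. The sketch below uses a hypothetical `FakeHandle` type and libc's `memset` only to show the mechanism.

```julia
# Hypothetical wrapper type; `handle` stands in for a libuv handle pointer.
mutable struct FakeHandle
    handle::Ptr{Cvoid}
end

# Tell ccall how to turn a FakeHandle into the raw pointer it wraps.
Base.unsafe_convert(::Type{Ptr{Cvoid}}, h::FakeHandle) = h.handle

buf = Libc.malloc(16)            # C-owned memory, so the GC never moves or frees it
h = FakeHandle(buf)

# memset receives h.handle (via unsafe_convert), not the address of `h` itself.
ret = ccall(:memset, Ptr{Cvoid}, (Ptr{Cvoid}, Cint, Csize_t), h, 0x41, 16)

first_byte = unsafe_load(Ptr{UInt8}(buf))   # read back what memset wrote
Libc.free(buf)
```

With a conversion method like this, the C side only ever sees the stored pointer, so the field order of the Julia struct would not matter.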

src/jl_uv.c Outdated
int err = uv_timer_init(loop, handle);
if (err) {
// TODO: this codepath is currently not tested
free(handle);


vtjnash May 29, 2019

Member

Missing JL_UV_UNLOCK on this codepath. Also, it's probably better to keep the free call in Julia.


jpsamaroo May 29, 2019

Author Contributor

Hopefully my latest commit addresses both of these properly 😄
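
The missing-unlock problem flagged in this thread is a general pattern: any early return between a lock and its unlock leaves the lock held. The actual fix is in C and specific to JL_UV_LOCK/JL_UV_UNLOCK. As a rough Julia analogue only, the two shapes look like the sketch below, where `LOOP_LOCK`, `ACTIVE`, `start_leaky`, and `start_safe` are all hypothetical stand-ins rather than anything in Base.

```julia
# Hypothetical stand-ins: LOOP_LOCK plays the role of the global libuv lock,
# ACTIVE the role of the event loop's handle queue.
const LOOP_LOCK = ReentrantLock()
const ACTIVE = Int[]

# Buggy shape: the early return leaves LOOP_LOCK held.
function start_leaky(id::Int)
    lock(LOOP_LOCK)
    if id < 0
        return -1              # error path returns without unlocking
    end
    push!(ACTIVE, id)
    unlock(LOOP_LOCK)
    return 0
end

# Fixed shape: try/finally releases the lock on every exit path.
function start_safe(id::Int)
    lock(LOOP_LOCK)
    try
        id < 0 && return -1    # error path; `finally` still runs
        push!(ACTIVE, id)
        return 0
    finally
        unlock(LOOP_LOCK)
    end
end
```

After `start_safe(-1)` the lock is free again, whereas after `start_leaky(-1)` it stays held until someone unlocks it explicitly.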

base/asyncevent.jl Outdated
src/jl_uv.c Outdated
{
JL_UV_LOCK();
int err = uv_timer_init(loop, uvtimer);
if (err) {


vtjnash May 31, 2019

Member

this actually can't fail, so we can just delete the error checking code (or make it an abort)


jl_uv_associate_julia_struct((uv_handle_t*)uvtimer, jltimer);
uv_update_time(loop);
err = uv_timer_start(uvtimer, cb, timeout, repeat);


vtjnash May 31, 2019

Member

similarly, this can't actually fail either, we can just abort if it returns non-zero

@JeffBezanson JeffBezanson reopened this Jun 1, 2019
@JeffBezanson JeffBezanson reopened this Jun 3, 2019
@JeffBezanson JeffBezanson changed the title [WIP] Make sleep(n) and Timer creation more MT-safe Make sleep(n) and Timer creation more MT-safe Jun 3, 2019
@JeffBezanson JeffBezanson merged commit b6f1be5 into JuliaLang:master Jun 4, 2019
10 of 12 checks passed
buildbot/analyzegc_linux64 Run complete
buildbot/package_macos64 Run complete
buildbot/package_linux32 Run complete
buildbot/package_linux64 Run complete
buildbot/package_win32 Run complete
buildbot/package_win64 Run complete
buildbot/tester_linux32 Run complete
buildbot/tester_linux64 Run complete
buildbot/tester_win32 Run complete
buildbot/tester_win64 Run complete
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

alhirzel commented Jan 22, 2020

I believe I encountered one of the hanging conditions mentioned at the top of this PR, and wanted to note for others that Libc.systemsleep can be used instead (as mentioned in #14494) when a busysleep is needed, e.g. when emulating a task scheduler.
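
A minimal sketch of that workaround, assuming the goal is simply to block the current thread for a fixed time. As far as I know, `Libc.systemsleep` calls the operating system's sleep routine directly, so it blocks the calling OS thread and does not yield to other Julia tasks while it waits.

```julia
t0 = time_ns()
Libc.systemsleep(0.05)                 # block this thread for roughly 50 ms
elapsed = (time_ns() - t0) / 1e9       # elapsed wall-clock time in seconds
println("slept for about ", round(elapsed; digits = 3), " s")
```

Because no other task can run on that thread during the call, this suits the scheduler-emulation case mentioned above better than general-purpose waiting.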
