New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ck_ec: event count with OS-assisted blocking #133

Merged
merged 1 commit into from Dec 3, 2018

Conversation

Projects
None yet
2 participants
@pkhuong
Copy link
Contributor

pkhuong commented Oct 28, 2018

I finally convinced myself that this thing works, and will always do so reliably.

It's x86-TSO only, so I had to introduce some weirdness in the makefiles.

TODO: update docs for new vtable
TODO: test the timeval internal utils
TODO: make the regression tests work on x86
TODONT: final style check
TODONT: man page
TODO: benchmarks
TODO: fix build on !linux x86oids, done, except for validate.

To discuss: how do we feel about the type generic macros.

I'm gonna leave the man pages for later ;)

@pkhuong pkhuong force-pushed the ck_ec branch from 67a69e7 to 65a3bae Oct 28, 2018

@pkhuong pkhuong requested a review from sbahra Oct 28, 2018

@pkhuong pkhuong force-pushed the ck_ec branch 7 times, most recently from 5a269a7 to 1352f8a Oct 28, 2018

@pkhuong

This comment has been minimized.

Copy link
Contributor

pkhuong commented Oct 29, 2018

pkhuong@2delilah ~/ck/r/c/benchmark ♪ [ pwd && make check ] ± ck_ec@10ce30… ⌚ 18:20:05
/home/pkhuong/ck/regressions/ck_ec/benchmark
./ck_ec 24 1
SP ec32
SP ec32_value: 3
SP ec32_wait fast: 4
SP ec32_wait timeout: 21
SP ec32_inc: 8
SP ec32_inc slow: 710
SP ec32_add: 8
SP ec32_add slow: 705

MP ec32
MP ec32_value: 2
MP ec32_wait fast: 4
MP ec32_wait timeout: 20
MP ec32_inc: 32
MP ec32_inc slow: 730
MP ec32_add: 32
MP ec32_add slow: 725

SP ec64
SP ec64_value: 3
SP ec64_wait fast: 4
SP ec64_wait timeout: 20
SP ec64_inc: 8
SP ec64_inc slow: 710
SP ec64_add: 8
SP ec64_add slow: 699

MP ec64
MP ec64_value: 1
MP ec64_wait fast: 4
MP ec64_wait timeout: 21
MP ec64_inc: 31
MP ec64_inc slow: 725
MP ec64_add: 32
MP ec64_add slow: 742
1:21.75s total, 75.03s user, 6.70s system, 99% cpu

@pkhuong pkhuong force-pushed the ck_ec branch 2 times, most recently from ab6db1e to abb9030 Oct 29, 2018

@pkhuong pkhuong changed the title ck_ec: futex-backed event count for linux/x86{,_64} ck_ec: event count with OS-assisted blocking Oct 30, 2018

@pkhuong pkhuong force-pushed the ck_ec branch 4 times, most recently from 22ab421 to 8bdc2ea Oct 30, 2018

@sbahra sbahra requested a review from cognet Oct 30, 2018

@pkhuong pkhuong force-pushed the ck_ec branch from d21374f to 0f5a248 Nov 4, 2018

@pkhuong

This comment has been minimized.

Copy link
Contributor

pkhuong commented Nov 4, 2018

Packed the predicate's argument list in a struct for more ABI stability.

@pkhuong pkhuong force-pushed the ck_ec branch 4 times, most recently from 90d1cf5 to 62d393d Nov 4, 2018

@sbahra

This comment has been minimized.

Copy link
Contributor

sbahra commented Nov 7, 2018

@pkhuong This is great work! Look forward to merging it in. I see a few things building:

prop_test_wakeup.c:17:12: warning: initialization of ‘void (*)(const struct ck_ec_wait_state *, const uint32_t *, uint32_t,  const struct timespec *)’ {aka ‘void (*)(const struct ck_ec_wait_state *, const unsigned int *, unsigned int,  const struct timespec *)’} from incompatible pointer type ‘void (*)(const struct ck_ec_ops *, const uint32_t *, uint32_t,  const struct timespec *)’ {aka ‘void (*)(const struct ck_ec_ops *, const unsigned int *, unsigned int,  const struct timespec *)’} [-Wincompatible-pointer-types]
prop_test_wakeup.c:17:12: note: (near initialization for ‘test_ops.wait32’)
prop_test_wakeup.c:18:12: warning: initialization of ‘void (*)(const struct ck_ec_wait_state *, const uint64_t *, uint64_t,  const struct timespec *)’ {aka ‘void (*)(const struct ck_ec_wait_state *, const long unsigned int *, long unsigned int,  const struct timespec *)’} from incompatible pointer type ‘void (*)(const struct ck_ec_ops *, const uint64_t *, uint64_t,  const struct timespec *)’ {aka ‘void (*)(const struct ck_ec_ops *, const long unsigned int *, long unsigned int,  const struct timespec *)’} [-Wincompatible-pointer-types]
  .wait64 = wait64

Less important, but putting on radar just in case it matters:

../../../include/ck_ec.h:815:23: warning: inlining failed in call to ‘ck_ec64_add.isra.6’: call is unlikely and code size would grow [-Winline]
 CK_CC_INLINE uint64_t ck_ec64_add(struct ck_ec64 *ec```

@pkhuong pkhuong force-pushed the ck_ec branch from 62d393d to dab40f3 Nov 9, 2018

@pkhuong

This comment has been minimized.

Copy link
Contributor

pkhuong commented Nov 9, 2018

Fixed the type mismatch. Tried to get inlining smarter wrt dispatching, but I think we want to let the compiler keep ck_ec_add out of line when it makes sense.

@pkhuong pkhuong force-pushed the ck_ec branch from dab40f3 to fe9f41a Nov 9, 2018

@pkhuong

This comment has been minimized.

Copy link
Contributor

pkhuong commented Nov 13, 2018

  • Assign copyright to Google LLC
@sbahra

This comment has been minimized.

Copy link
Contributor

sbahra commented Nov 29, 2018

@pkhuong - Any other follow-up commits or should I start process of merging down?

ck_ec: event count with optimistic OS-level blocking
ck_ec implements 32- and (on 64 bit platforms) 64- bit event
counts. Event counts let us easily integrate OS-level blocking (e.g.,
futexes) in lock-free protocols. Waking up waiters only locks in the
OS kernel, and does not happen at all when no waiter is blocked.

Waiters only block conditionally, if the event count's value is
still equal to some prior value.

ck_ec supports multiple producers (wakers) and consumers (waiters),
and, on x86-TSO, has a more efficient specialisation for single
producer mode. In the latter mode, the overhead compared to a version
counter is on the order of 2-3 cycles and 1-2 instructions, in the
fast path. The slow path, when there are threads blocked on the event
count, consists of one additional atomic instruction and a futex
syscall.

Similarly, the fast path for consumers, when an update comes quickly,
has no overhead compared to spinning on a read-only counter. After
a few thousand cycles, consumers (waiters) enter the slow path with
one atomic instruction and a few blocking syscalls.

The single-producer specialisation requires the x86-TSO memory model,
x86's non-atomic read-modify-write instructions, and, ideally a
futex-like OS abstraction. On !x86/x86_64 platforms, single producer
increments fall back to the multiple producer code path.

Fixes #79

@pkhuong pkhuong force-pushed the ck_ec branch from fe9f41a to 2b754ac Dec 1, 2018

@sbahra

This comment has been minimized.

Copy link
Contributor

sbahra commented Dec 3, 2018

Merging in, I expect additional iteration in master, treating as experimental. Thanks Paul!

@sbahra sbahra merged commit a16642f into master Dec 3, 2018

2 of 3 checks passed

continuous-integration/travis-ci/push The Travis CI build could not complete due to an error
Details
continuous-integration/tea the build was successful
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment