MFC r329276,r329451,r330294,r330414,r330415,r330418,r331109,r332394,r332398,

    r333831:

    rwlock: diff-reduction of runlock compared to sx sunlock

==

    Undo LOCK_PROFILING pessimisation after r313454 and r313455

    With the option compiled into the kernel, both sx and rw shared ops would
    always go to the slow path, which added avoidable overhead even when the
    facility was disabled at run time.

    Furthermore, the increased time spent doing uncontested shared lock
    acquires was bogusly added to the total wait time, somewhat skewing the
    results.

    Restore the old behaviour of taking the slow path only when profiling is
    enabled.

    This change is a no-op for kernels without LOCK_PROFILING (which is the
    default).
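
    As a rough illustration only (hypothetical names, not the actual sx/rw
    fast-path macros), the restored shape checks a runtime switch before
    diverting to the profiled slow path, so a profiling-capable kernel with
    profiling turned off keeps the cheap uncontested acquire:

        /*
         * Sketch in C; lock_prof_enabled, slock_try() and slock_hard() are
         * made-up stand-ins for the real fast/slow path split.
         */
        static inline void
        shared_lock(struct some_lock *lk)
        {
                if (__predict_true(!lock_prof_enabled && slock_try(lk)))
                        return;         /* uncontested, no profiling overhead */
                slock_hard(lk);         /* contended and/or profiled path */
        }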

==

    sx: fix adaptive spinning broken in r327397

    The condition was flipped.

    In particular, heavy multithreaded kernel builds on zfs started suffering
    due to nested sx locks.

    For instance, make -s -j 128 buildkernel:

    before: 3326.67s user 1269.62s system 6981% cpu 1:05.84 total
    after: 3365.55s user 911.27s system 6871% cpu 1:02.24 total
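
    A hedged sketch of the class of bug (made-up helpers, not the sx code):
    adaptive spinning is only worthwhile while the exclusive owner is running
    on a CPU, so an inverted predicate makes the waiter give up exactly when
    spinning would have paid off:

        /* Simplified C; xlock_try() and owner_running() are hypothetical. */
        for (;;) {
                if (xlock_try(lk))
                        return;         /* got the lock */
                if (!owner_running(lk))
                        break;          /* owner off CPU: fall back to sleeping */
                cpu_spinwait();         /* owner is running: keep spinning */
        }
        /* ... blocking path (turnstile/sleepq) follows here ... */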

==

    locks: fix a corner case in r327399

    If there were exactly rowner_retries/asx_retries (by default: 10) transitions
    between read and write state and the waiters still did not get the lock, the
    next owner -> reader transition would result in the code correctly falling
    back to turnstile/sleepq, where it would incorrectly think it was waiting
    for a writer and decide to leave the turnstile/sleepq to loop back. From
    that point on it would keep taking turnstile/sleepq trips until the lock
    was released.

    The bug sometimes manifested itself in stalls during -j 128 package builds.

    Refactor the code to fix the bug; while here, remove some of the gratuitous
    differences between rw and sx locks.

==

    sx: don't do an atomic op in upgrade if it cannot succeed

    The code already pays the cost of reading the lock to obtain the waiters
    flag. Checking whether there is more than one reader is not a problem and
    avoids dirtying the cache line.

    This also fixes a small corner case: if waiters were to show up between
    reading the flag and upgrading the lock, the operation would fail even
    though it should not. No correctness change here though.
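
    A minimal userland C11 sketch of the idea (hypothetical lock-word layout,
    not the sx implementation): the lock word is already in hand, so verify
    that we are the sole reader before attempting the compare-and-swap that
    would otherwise dirty the cache line only to fail:

        #include <stdatomic.h>
        #include <stdbool.h>
        #include <stdint.h>

        #define WR_BIT          ((uintptr_t)1)  /* assumed: bit 0 = write-locked */
        #define RD_COUNT(v)     ((v) >> 1)      /* assumed: reader count above it */

        static bool
        try_upgrade(_Atomic uintptr_t *lk)
        {
                uintptr_t v = atomic_load_explicit(lk, memory_order_relaxed);

                if (RD_COUNT(v) != 1)
                        return (false); /* other readers: the CAS cannot succeed */
                /* Sole reader: swap the freshly read value for write mode. */
                return (atomic_compare_exchange_strong(lk, &v, WR_BIT));
        }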

==

    mtx: tidy up recursion handling in thread lock

    Normally, after grabbing the lock it has to be verified that we got the
    right one to begin with. However, if we are recursing, the lock cannot
    have changed, so the check can be avoided. In particular this avoids a
    lock re-read in the non-recursing case when the lock turns out to have
    changed.

    While here, avoid an irq trip if this happens.

==

    locks: slightly depessimize lockstat

    The slow path is always taken when lockstat is enabled. This induces
    rdtsc (or other) calls to get the cycle count even when there was no
    contention.

    Still go to the slow path to not mess with the fast path, but avoid
    the heavy lifting unless necessary.

    This reduces sys and real time during -j 80 buildkernel:
    before: 3651.84s user 1105.59s system 5394% cpu 1:28.18 total
    after: 3685.99s user 975.74s system 5450% cpu 1:25.53 total
    disabled: 3697.96s user 411.13s system 5261% cpu 1:18.10 total

    Note that this is still a significant hit compared to the disabled case.

    LOCK_PROFILING results are not affected.
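
    Condensed from the kern_mutex.c hunk below (indentation restored, comment
    added): when lockstat is enabled but the lock turns out to be free, it is
    taken immediately and control jumps straight to the lockstat bookkeeping,
    so the cycle counting and the rest of the contention machinery only run
    when they can matter.

        #ifdef KDTRACE_HOOKS
                if (LOCKSTAT_PROFILE_ENABLED(adaptive__acquire)) {
                        while (v == MTX_UNOWNED) {
                                if (_mtx_obtain_lock_fetch(m, &v, tid))
                                        goto out_lockstat; /* uncontested: record & return */
                        }
                        doing_lockprof = 1;
                        all_time -= lockstat_nsecs(&m->lock_object);
                }
        #endif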

==

    rw: whack avoidable re-reads in try_upgrade

==

    locks: extend speculative spin waiting for readers to drain

    Now that 10 years have passed since the original limit of 10000 was
    committed, bump it a little bit.

    Spinning waiting for writers is semi-informed in the sense that we always
    know if the owner is running and base the decision to spin on that.
    However, no such information is provided for read-locking. In particular
    this means that it is possible for a write-spinner to completely waste cpu
    time waiting for the lock to be released, while the reader holding it was
    preempted and is now waiting for the spinner to go off cpu.

    Nonetheless, in the majority of cases it is an improvement to spin instead
    of instantly giving up and going to sleep.

    The current approach is pretty simple: snatch the number of current readers
    and perform that many pauses before checking again. The total number of
    pauses to execute is limited to 10k. If the lock is still not free by
    that time, go to sleep.

    Given the previously noted problem of not knowing whether spinning makes
    any sense to begin with, the new limit has to remain rather conservative.
    But at the very least it should also be related to the machine. Waiting
    for writers uses parameters selected based on the number of activated
    hardware threads. The upper limit of pause instructions to be executed
    in between re-reads of the lock is typically 16384 or 32768. That value
    was selected as the limit of total spins. The lower bound is kept at the
    already present 10000 so as not to change it for smaller machines.

    Bumping the limit reduces system time by a few % during benchmarks like
    buildworld, buildkernel and others. Tested on 2- and 4-socket machines
    (Broadwell, Skylake).

    Figuring out how to make a more informed decision while not pessimizing
    the fast path is left as an exercise for the reader.
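
    A simplified sketch of the loop described above (READERS() is a made-up
    accessor and the cap is passed in; the real code lives in the rw/sx
    hard-lock paths and derives its limit as described):

        static bool
        spin_for_readers(volatile uintptr_t *lk, u_long max_pauses)
        {
                u_long done;
                uintptr_t v;

                for (done = 0; done < max_pauses;) {
                        v = *lk;
                        if (READERS(v) == 0)
                                return (true);  /* drained: go try the lock */
                        /* Pause once per observed reader, staying under the cap. */
                        for (uintptr_t i = 0; i < READERS(v) &&
                            done < max_pauses; i++, done++)
                                cpu_spinwait();
                }
                return (false);                 /* still read-locked: go to sleep */
        }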

==

    fix uninitialized variable warning in reader locks

Approved by:	re (marius)
mjguzik committed May 31, 2018
1 parent 2330477 commit 8b9af5c
Showing 4 changed files with 235 additions and 167 deletions.
79 changes: 49 additions & 30 deletions sys/kern/kern_mutex.c
@@ -486,8 +486,25 @@ __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t v)
#if defined(KDTRACE_HOOKS) || defined(LOCK_PROFILING)
int doing_lockprof;
#endif

td = curthread;
tid = (uintptr_t)td;
m = mtxlock2mtx(c);

#ifdef KDTRACE_HOOKS
if (LOCKSTAT_PROFILE_ENABLED(adaptive__acquire)) {
while (v == MTX_UNOWNED) {
if (_mtx_obtain_lock_fetch(m, &v, tid))
goto out_lockstat;
}
doing_lockprof = 1;
all_time -= lockstat_nsecs(&m->lock_object);
}
#endif
#ifdef LOCK_PROFILING
doing_lockprof = 1;
#endif

if (SCHEDULER_STOPPED_TD(td))
return;

@@ -496,7 +513,7 @@ __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t v)
#elif defined(KDTRACE_HOOKS)
lock_delay_arg_init(&lda, NULL);
#endif
m = mtxlock2mtx(c);

if (__predict_false(v == MTX_UNOWNED))
v = MTX_READ_VALUE(m);

@@ -527,13 +544,6 @@ __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t v)
CTR4(KTR_LOCK,
"_mtx_lock_sleep: %s contested (lock=%p) at %s:%d",
m->lock_object.lo_name, (void *)m->mtx_lock, file, line);
#ifdef LOCK_PROFILING
doing_lockprof = 1;
#elif defined(KDTRACE_HOOKS)
doing_lockprof = lockstat_enabled;
if (__predict_false(doing_lockprof))
all_time -= lockstat_nsecs(&m->lock_object);
#endif

for (;;) {
if (v == MTX_UNOWNED) {
@@ -637,10 +647,6 @@ __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t v)
#endif
#ifdef KDTRACE_HOOKS
all_time += lockstat_nsecs(&m->lock_object);
#endif
LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(adaptive__acquire, m, contested,
waittime, file, line);
#ifdef KDTRACE_HOOKS
if (sleep_time)
LOCKSTAT_RECORD1(adaptive__block, m, sleep_time);

@@ -649,7 +655,10 @@ __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t v)
*/
if (lda.spin_cnt > sleep_cnt)
LOCKSTAT_RECORD1(adaptive__spin, m, all_time - sleep_time);
out_lockstat:
#endif
LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(adaptive__acquire, m, contested,
waittime, file, line);
}

#ifdef SMP
@@ -685,6 +694,20 @@ _mtx_lock_spin_cookie(volatile uintptr_t *c, uintptr_t v)
tid = (uintptr_t)curthread;
m = mtxlock2mtx(c);

#ifdef KDTRACE_HOOKS
if (LOCKSTAT_PROFILE_ENABLED(adaptive__acquire)) {
while (v == MTX_UNOWNED) {
if (_mtx_obtain_lock_fetch(m, &v, tid))
goto out_lockstat;
}
doing_lockprof = 1;
spin_time -= lockstat_nsecs(&m->lock_object);
}
#endif
#ifdef LOCK_PROFILING
doing_lockprof = 1;
#endif

if (__predict_false(v == MTX_UNOWNED))
v = MTX_READ_VALUE(m);

@@ -707,13 +730,7 @@ _mtx_lock_spin_cookie(volatile uintptr_t *c, uintptr_t v)
PMC_SOFT_CALL( , , lock, failed);
#endif
lock_profile_obtain_lock_failed(&m->lock_object, &contested, &waittime);
#ifdef LOCK_PROFILING
doing_lockprof = 1;
#elif defined(KDTRACE_HOOKS)
doing_lockprof = lockstat_enabled;
if (__predict_false(doing_lockprof))
spin_time -= lockstat_nsecs(&m->lock_object);
#endif

for (;;) {
if (v == MTX_UNOWNED) {
if (_mtx_obtain_lock_fetch(m, &v, tid))
@@ -744,13 +761,12 @@ _mtx_lock_spin_cookie(volatile uintptr_t *c, uintptr_t v)
#endif
#ifdef KDTRACE_HOOKS
spin_time += lockstat_nsecs(&m->lock_object);
#endif
LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(spin__acquire, m,
contested, waittime, file, line);
#ifdef KDTRACE_HOOKS
if (lda.spin_cnt != 0)
LOCKSTAT_RECORD1(spin__spin, m, spin_time);
out_lockstat:
#endif
LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(spin__acquire, m,
contested, waittime, file, line);
}
#endif /* SMP */

@@ -806,10 +822,8 @@ _thread_lock(struct thread *td)
WITNESS_LOCK(&m->lock_object, LOP_EXCLUSIVE, file, line);
return;
}
if (m->mtx_recurse != 0)
m->mtx_recurse--;
else
_mtx_release_lock_quick(m);
MPASS(m->mtx_recurse == 0);
_mtx_release_lock_quick(m);
slowpath_unlocked:
spinlock_exit();
slowpath_noirq:
@@ -863,9 +877,10 @@ thread_lock_flags_(struct thread *td, int opts, const char *file, int line)
if (__predict_false(doing_lockprof))
spin_time -= lockstat_nsecs(&td->td_lock->lock_object);
#endif
spinlock_enter();

for (;;) {
retry:
spinlock_enter();
m = td->td_lock;
thread_lock_validate(m, opts, file, line);
v = MTX_READ_VALUE(m);
@@ -877,6 +892,7 @@ thread_lock_flags_(struct thread *td, int opts, const char *file, int line)
}
if (v == tid) {
m->mtx_recurse++;
MPASS(m == td->td_lock);
break;
}
lock_profile_obtain_lock_failed(&m->lock_object,
@@ -889,15 +905,18 @@ thread_lock_flags_(struct thread *td, int opts, const char *file, int line)
} else {
_mtx_lock_indefinite_check(m, &lda);
}
if (m != td->td_lock)
if (m != td->td_lock) {
spinlock_enter();
goto retry;
}
v = MTX_READ_VALUE(m);
} while (v != MTX_UNOWNED);
spinlock_enter();
}
if (m == td->td_lock)
break;
__mtx_unlock_spin(m); /* does spinlock_exit() */
MPASS(m->mtx_recurse == 0);
_mtx_release_lock_quick(m);
}
LOCK_LOG_LOCK("LOCK", &m->lock_object, opts, m->mtx_recurse, file,
line);
