Commit b35c591

Author: Jan Lindström
MDEV-6376: InnoDB: Assertion failure in thread 139995225970432
in file buf0mtflu.cc line 570. Analysis: This is a real timing bug. We should take the mutex before we try to send the shutdown messages. That makes sure that threads doing an unfinished flush (which have acquired this mutex) have time to finish their work before we add shutdown messages to the work queue. Currently, we just add the shutdown messages to the work queue, while the flush code assumes that the queue holds a constant number of items to be processed, which leads to the assertion.
1 parent 36e86ba commit b35c591
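
The locking rule described in the commit message can be sketched outside InnoDB as follows. This is a minimal illustration, not the code from buf0mtflu.cc: `Task`, `WorkQueue`, `FlushPool`, `flush_batch` and `shutdown` are hypothetical names, and `std::mutex` stands in for `os_fast_mutex`. It assumes a flush request holds one mutex from the moment it sends its work items until it has collected all completions, so the shutdown path can only enqueue exit items once the queue is empty.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the InnoDB types; none of these names are from the commit.
enum class Task { Flush, Exit };

// Minimal blocking queue, playing the role of the work queue.
class WorkQueue {
public:
    void push(Task t) {
        {
            std::lock_guard<std::mutex> guard(mtx_);
            items_.push(t);
        }
        cv_.notify_one();
    }
    Task pop() {  // blocks until an item is available
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return !items_.empty(); });
        Task t = items_.front();
        items_.pop();
        return t;
    }
    bool empty() {
        std::lock_guard<std::mutex> guard(mtx_);
        return items_.empty();
    }
private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<Task> items_;
};

class FlushPool {
public:
    explicit FlushPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            workers_.emplace_back([this] { worker(); });
        }
    }
    ~FlushPool() { shutdown(); }

    // Flush request: holds batch_mtx_ until every item it sent is processed.
    std::size_t flush_batch() {
        std::lock_guard<std::mutex> guard(batch_mtx_);
        if (stopped_) return 0;
        const std::size_t n = workers_.size();
        for (std::size_t i = 0; i < n; ++i) work_.push(Task::Flush);
        std::size_t completed = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (done_.pop() == Task::Flush) ++completed;
        }
        return completed;  // equals n as long as nobody else enqueued items
    }

    // Shutdown: takes the same mutex before sending one exit item per worker.
    std::size_t shutdown() {
        std::size_t n = 0;
        {
            std::lock_guard<std::mutex> guard(batch_mtx_);
            if (stopped_) return 0;
            stopped_ = true;
            assert(work_.empty());  // no flush request is in flight here
            n = workers_.size();
            for (std::size_t i = 0; i < n; ++i) work_.push(Task::Exit);
        }
        std::size_t acks = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (done_.pop() == Task::Exit) ++acks;
        }
        for (auto& t : workers_) t.join();
        return acks;
    }

private:
    void worker() {
        for (;;) {
            Task t = work_.pop();
            done_.push(t);  // report completion of this item
            if (t == Task::Exit) return;
        }
    }

    std::mutex batch_mtx_;  // plays the role of mtflush_mtx
    bool stopped_ = false;
    WorkQueue work_;
    WorkQueue done_;
    std::vector<std::thread> workers_;
};
```

With a pool of four workers, `flush_batch()` returns 4 for every request that starts before `shutdown()` and 0 afterwards, even when both are called from different threads. Without the shared mutex, exit items could land in the queue in the middle of a batch, and the batch would collect an exit acknowledgement in place of a flush completion, which is the kind of count mismatch the assertion guards against.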

2 files changed, +31 -0 lines changed

storage/innobase/buf/buf0mtflu.cc

Lines changed: 17 additions & 0 deletions
@@ -378,6 +378,20 @@ buf_mtflu_io_thread_exit(void)
 	fprintf(stderr, "InnoDB: [Note]: Signal mtflush_io_threads to exit [%lu]\n",
 		srv_mtflush_threads);
 
+	/* This lock is to safequard against timing bug: flush request take
+	this mutex before sending work items to be processed by flush
+	threads. Inside flush thread we assume that work queue contains only
+	a constant number of items. Thus, we may not install new work items
+	below before all previous ones are processed. This mutex is released
+	by flush request after all work items sent to flush threads have
+	been processed. Thus, we can get this mutex if and only if work
+	queue is empty. */
+
+	os_fast_mutex_lock(&mtflush_mtx);
+
+	/* Make sure the work queue is empty */
+	ut_a(ib_wqueue_is_empty(mtflush_io->wq));
+
 	/* Send one exit work item/thread */
 	for (i=0; i < srv_mtflush_threads; i++) {
 		work_item[i].tsk = MT_WRK_NONE;
@@ -399,6 +413,9 @@ buf_mtflu_io_thread_exit(void)
 
 	ut_a(ib_wqueue_is_empty(mtflush_io->wq));
 
+	/* Requests sent */
+	os_fast_mutex_unlock(&mtflush_mtx);
+
 	/* Collect all work done items */
 	for (i=0; i < srv_mtflush_threads;) {
 		wrk_t* work_item = NULL;

storage/xtradb/buf/buf0mtflu.cc

Lines changed: 14 additions & 0 deletions
@@ -385,6 +385,17 @@ buf_mtflu_io_thread_exit(void)
 	fprintf(stderr, "InnoDB: [Note]: Signal mtflush_io_threads to exit [%lu]\n",
 		srv_mtflush_threads);
 
+	/* This lock is to safequard against timing bug: flush request take
+	this mutex before sending work items to be processed by flush
+	threads. Inside flush thread we assume that work queue contains only
+	a constant number of items. Thus, we may not install new work items
+	below before all previous ones are processed. This mutex is released
+	by flush request after all work items sent to flush threads have
+	been processed. Thus, we can get this mutex if and only if work
+	queue is empty. */
+
+	os_fast_mutex_lock(&mtflush_mtx);
+
 	/* Send one exit work item/thread */
 	for (i=0; i < srv_mtflush_threads; i++) {
 		work_item[i].tsk = MT_WRK_NONE;
@@ -406,6 +417,9 @@ buf_mtflu_io_thread_exit(void)
 
 	ut_a(ib_wqueue_is_empty(mtflush_io->wq));
 
+	/* Requests sent */
+	os_fast_mutex_unlock(&mtflush_mtx);
+
 	/* Collect all work done items */
 	for (i=0; i < srv_mtflush_threads;) {
 		wrk_t* work_item = NULL;
