Commit 53cf978

Xiaoguang Wang authored and tytso committed
jbd2: fix deadlock while checkpoint thread waits commit thread to finish
This issue was found when I tried to put checkpoint work in a separate thread,
the deadlock below happened:

         Thread1                                |   Thread2
__jbd2_log_wait_for_space                       |
jbd2_log_do_checkpoint (hold j_checkpoint_mutex)|
  if (jh->b_transaction != NULL)                |
    ...                                         |
    jbd2_log_start_commit(journal, tid);        |jbd2_update_log_tail
                                                |  will lock j_checkpoint_mutex,
                                                |  but will be blocked here.
                                                |
    jbd2_log_wait_commit(journal, tid);         |
    wait_event(journal->j_wait_done_commit,     |
     !tid_gt(tid, journal->j_commit_sequence)); |
     ...                                        |wake_up(j_wait_done_commit)
  }                                             |

then deadlock occurs, Thread1 will never be woken up.

To fix this issue, drop j_checkpoint_mutex in jbd2_log_do_checkpoint()
when we are going to wait for transaction commit.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
1 parent 8fdd60f commit 53cf978
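
The locking pattern described in the commit message can be illustrated outside the kernel. The following is a minimal user-space sketch using POSIX threads, not the jbd2 code itself: struct toy_journal, toy_commit_thread(), toy_checkpoint() and toy_run() are hypothetical names invented for this illustration, and the transaction IDs are compared without the wraparound handling that tid_gt() provides. It assumes one checkpointing thread and one commit thread. The checkpointing side releases the checkpoint mutex before it waits for the commit, so the commit side can take that mutex and then signal completion.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the journal fields named in the commit. */
struct toy_journal {
	pthread_mutex_t checkpoint_mutex; /* plays the role of j_checkpoint_mutex */
	pthread_mutex_t state_lock;       /* protects the fields below */
	pthread_cond_t commit_requested;  /* wakes the commit thread */
	pthread_cond_t done_commit;       /* plays the role of j_wait_done_commit */
	unsigned int commit_request;      /* newest tid asked to commit */
	unsigned int commit_sequence;     /* newest tid already committed */
};

/* Commit side: needs the checkpoint mutex before it can report completion. */
static void *toy_commit_thread(void *arg)
{
	struct toy_journal *j = arg;
	unsigned int tid;

	pthread_mutex_lock(&j->state_lock);
	while (j->commit_request == j->commit_sequence)
		pthread_cond_wait(&j->commit_requested, &j->state_lock);
	tid = j->commit_request;
	pthread_mutex_unlock(&j->state_lock);

	/* Analogue of the log-tail update: blocks while the mutex is held. */
	pthread_mutex_lock(&j->checkpoint_mutex);
	pthread_mutex_unlock(&j->checkpoint_mutex);

	pthread_mutex_lock(&j->state_lock);
	j->commit_sequence = tid;
	pthread_cond_broadcast(&j->done_commit);
	pthread_mutex_unlock(&j->state_lock);
	return NULL;
}

/* Checkpoint side: request a commit, drop the mutex, wait, retake it. */
static unsigned int toy_checkpoint(struct toy_journal *j, unsigned int tid)
{
	unsigned int seen;

	pthread_mutex_lock(&j->checkpoint_mutex);

	pthread_mutex_lock(&j->state_lock);
	j->commit_request = tid;
	pthread_cond_signal(&j->commit_requested);
	pthread_mutex_unlock(&j->state_lock);

	/* Waiting with checkpoint_mutex still held would deadlock here. */
	pthread_mutex_unlock(&j->checkpoint_mutex);

	pthread_mutex_lock(&j->state_lock);
	while (j->commit_sequence < tid)
		pthread_cond_wait(&j->done_commit, &j->state_lock);
	seen = j->commit_sequence;
	pthread_mutex_unlock(&j->state_lock);

	/* Retake the mutex; real code must rescan its lists from the start. */
	pthread_mutex_lock(&j->checkpoint_mutex);
	pthread_mutex_unlock(&j->checkpoint_mutex);
	return seen;
}

/* Runs one commit thread against one checkpoint call for tid 1. */
unsigned int toy_run(void)
{
	struct toy_journal j = { .commit_request = 0, .commit_sequence = 0 };
	pthread_t commit;
	unsigned int seen;
	int err;

	pthread_mutex_init(&j.checkpoint_mutex, NULL);
	pthread_mutex_init(&j.state_lock, NULL);
	pthread_cond_init(&j.commit_requested, NULL);
	pthread_cond_init(&j.done_commit, NULL);

	err = pthread_create(&commit, NULL, toy_commit_thread, &j);
	assert(err == 0);
	seen = toy_checkpoint(&j, 1);
	pthread_join(commit, NULL);
	return seen;
}
```

Calling toy_run() from a main() function and building with -pthread should return once the commit thread has published tid 1. If the pthread_mutex_unlock() call before the wait in toy_checkpoint() is removed, the two threads block each other in the same way as Thread1 and Thread2 in the commit message.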

2 files changed: +16 -3 lines changed

fs/jbd2/checkpoint.c

Lines changed: 15 additions & 2 deletions
@@ -113,7 +113,7 @@ void __jbd2_log_wait_for_space(journal_t *journal)
 	nblocks = jbd2_space_needed(journal);
 	while (jbd2_log_space_left(journal) < nblocks) {
 		write_unlock(&journal->j_state_lock);
-		mutex_lock(&journal->j_checkpoint_mutex);
+		mutex_lock_io(&journal->j_checkpoint_mutex);
 
 		/*
 		 * Test again, another process may have checkpointed while we
@@ -276,9 +276,22 @@ int jbd2_log_do_checkpoint(journal_t *journal)
 		"JBD2: %s: Waiting for Godot: block %llu\n",
 		journal->j_devname, (unsigned long long) bh->b_blocknr);
 
+			if (batch_count)
+				__flush_batch(journal, &batch_count);
 			jbd2_log_start_commit(journal, tid);
+			/*
+			 * jbd2_journal_commit_transaction() may want
+			 * to take the checkpoint_mutex if JBD2_FLUSHED
+			 * is set, jbd2_update_log_tail() called by
+			 * jbd2_journal_commit_transaction() may also take
+			 * checkpoint_mutex. So we need to temporarily
+			 * drop it.
+			 */
+			mutex_unlock(&journal->j_checkpoint_mutex);
 			jbd2_log_wait_commit(journal, tid);
-			goto retry;
+			mutex_lock_io(&journal->j_checkpoint_mutex);
+			spin_lock(&journal->j_list_lock);
+			goto restart;
 		}
 		if (!buffer_dirty(bh)) {
 			if (unlikely(buffer_write_io_error(bh)) && !result)

fs/jbd2/journal.c

Lines changed: 1 addition & 1 deletion
@@ -2067,7 +2067,7 @@ int jbd2_journal_wipe(journal_t *journal, int write)
 	err = jbd2_journal_skip_recovery(journal);
 	if (write) {
 		/* Lock to make assertions happy... */
-		mutex_lock(&journal->j_checkpoint_mutex);
+		mutex_lock_io(&journal->j_checkpoint_mutex);
 		jbd2_mark_journal_empty(journal, REQ_SYNC | REQ_FUA);
 		mutex_unlock(&journal->j_checkpoint_mutex);
 	}
