scst: park async LUN-replace cleanup until async_lun_replace clears#365

Merged
lnocturno merged 1 commit into SCST-project:master from bmeagherix:fix_async_lun_replace
May 12, 2026
Conversation

@bmeagherix
Contributor

@bmeagherix bmeagherix commented May 12, 2026

Follow-up to commit a4a55aa ("scst: add async_lun_replace to defer tgt_dev cleanup after LUN replace"), which added the async_lun_replace knob to move the slow drain of old tgt_devs off the LUN-replace management write path.

That defers the drain. It does not defer the free - the asynchronous worker still acquires scst_mutex to call scst_free_tgt_dev, and that function's first action, scst_clear_reservation -> scst_dlm_res_lock, does a DLM round-trip. When the peer node has just died and has not yet been evicted from the lockspace, that round-trip stalls in scst_dlm_lock_wait. With scst_mutex held by the stalled worker, every subsequent LUN-replace management write queues behind it.

When async_lun_replace=1, scst_acg_repl_lun() now parks the deferred cleanup of old tgt_devs on a list instead of scheduling it on the workqueue immediately. Writing 0 to the async_lun_replace sysfs knob releases the parked work in a batch.
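The park-and-release behaviour described above can be modelled in plain userspace C. This is a minimal sketch, not the SCST code: the struct and function names (`lun_replace_state`, `replace_lun`, `async_lun_replace_set`) are hypothetical, there is no locking or workqueue, and "cleanup" is reduced to a counter so that the ordering is easy to follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the deferred cleanup of one old tgt_dev. */
struct parked_work {
	struct parked_work *next;
	int tgt_dev_id;
};

/* Hypothetical model state; the real patch keeps its own list and flag. */
struct lun_replace_state {
	bool async_enabled;        /* models the async_lun_replace knob */
	struct parked_work *head;  /* oldest parked item */
	struct parked_work *tail;  /* newest parked item */
	int parked;                /* items currently on the list */
	int cleaned;               /* items whose cleanup has run */
};

static void state_init(struct lun_replace_state *s)
{
	assert(s != NULL);
	s->async_enabled = false;
	s->head = s->tail = NULL;
	s->parked = 0;
	s->cleaned = 0;
}

/* Models the cleanup itself (in SCST this is where the free would happen). */
static void run_cleanup(struct lun_replace_state *s, struct parked_work *w)
{
	s->cleaned++;
	free(w);
}

/* Models a LUN replace: park the old tgt_dev's cleanup if the knob is set,
 * otherwise run it right away. Returns 0 on success, -1 on allocation failure. */
static int replace_lun(struct lun_replace_state *s, int old_tgt_dev_id)
{
	struct parked_work *w = malloc(sizeof(*w));

	if (!w)
		return -1;
	w->next = NULL;
	w->tgt_dev_id = old_tgt_dev_id;

	if (!s->async_enabled) {
		run_cleanup(s, w);
		return 0;
	}
	if (s->tail)
		s->tail->next = w;
	else
		s->head = w;
	s->tail = w;
	s->parked++;
	return 0;
}

/* Models a write to the knob. Writing false detaches the whole parked list
 * and runs it as one batch; returns the number of items released. */
static int async_lun_replace_set(struct lun_replace_state *s, bool enable)
{
	struct parked_work *batch, *next;
	int released = 0;

	s->async_enabled = enable;
	if (enable)
		return 0;

	batch = s->head;
	s->head = s->tail = NULL;
	s->parked = 0;
	for (; batch; batch = next) {
		next = batch->next;
		run_cleanup(s, batch);
		released++;
	}
	return released;
}
```

In this model, two replaces issued while the knob is set leave two items parked and none cleaned; clearing the knob then releases both in one pass. Detaching the list before walking it is a deliberate choice in the sketch: new items parked concurrently would not be mixed into a batch that is already being released.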

This lets the orchestrating layer hold cleanup until any cluster coordination it depends on (e.g. DLM peer eviction during HA failover) has completed.
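From the orchestrating layer's side, holding and releasing cleanup comes down to two writes to the knob. The helper below is a sketch under assumptions: the function name `set_async_lun_replace` is hypothetical, and the knob's sysfs path is not given in this description, so it is passed in by the caller rather than hard-coded.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write "1\n" or "0\n" to the knob file at `path`.
 * The path is supplied by the caller because the knob's location is an
 * assumption here. Returns 0 on success or a negative errno value. */
static int set_async_lun_replace(const char *path, int enable)
{
	const char buf[2] = { enable ? '1' : '0', '\n' };
	ssize_t n;
	int fd, ret = 0;

	assert(path != NULL);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, buf, sizeof(buf));
	if (n < 0)
		ret = -errno;
	else if ((size_t)n != sizeof(buf))
		ret = -EIO;  /* short write: treat as an I/O error */

	if (close(fd) < 0 && ret == 0)
		ret = -errno;
	return ret;
}
```

An orchestrator would call this with `enable = 1` before starting failover, perform its LUN replaces, wait until its own cluster checks (for example, confirmation that the dead peer has been evicted from the DLM lockspace) have passed, and then call it with `enable = 0` to release the parked cleanup.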

Module unload calls scst_async_lun_replace_set(false) as a safety net.

@bmeagherix bmeagherix force-pushed the fix_async_lun_replace branch from 530bf97 to ab95509 Compare May 12, 2026 15:37
@bmeagherix bmeagherix force-pushed the fix_async_lun_replace branch from ab95509 to 0ad984b Compare May 12, 2026 15:45
@lnocturno
Contributor

Hi Brian,

Thank you for the patch!

Gleb

@lnocturno lnocturno merged commit c259c7a into SCST-project:master May 12, 2026
51 of 52 checks passed
@bmeagherix bmeagherix deleted the fix_async_lun_replace branch May 14, 2026 20:33