
Commit e8c7d14

mcgrof authored and axboe committed
block: revert back to synchronous request_queue removal
Commit dc9edc4 ("block: Fix a blk_exit_rl() regression"), merged in v4.12, moved the work behind blk_release_queue() into a workqueue after a splat floated around which indicated that some work in blk_release_queue() could sleep in blk_exit_rl(). This splat would be possible when a driver called blk_put_queue() or blk_cleanup_queue() (which calls blk_put_queue() as its final call) from an atomic context. blk_put_queue() decrements the refcount for the request_queue kobject, and upon reaching 0 blk_release_queue() is called. Although blk_exit_rl() has since been removed by commit db6d995 ("block: remove request_list code") in v5.0, we reserve the right to be able to sleep within blk_release_queue() context.

The last reference to the request_queue must not be dropped from atomic context. *When* the last reference to the request_queue is dropped varies, so let's take the opportunity to document when that is expected to happen, and also document the context of the related calls as best as possible, so we can avoid future issues and with the hope that the synchronous request_queue removal sticks.

We revert back to synchronous request_queue removal because asynchronous removal creates a regression with expected userspace interaction with several drivers. An example is removing a loop device: one uses ioctls from userspace to do so, and upon a successful return one expects the device to be removed. Likewise, if one races to add another device, the new one may not be added because the old one is still being removed. This was expected behavior before, and it now fails because the device is still present and busy. Moving to asynchronous request_queue removal could have broken many scripts which relied on the removal having completed if there was no error. Document this expectation as well so that this doesn't regress userspace again.

Using asynchronous request_queue removal has, however, helped us find other bugs.
In the future we can test what could break with this arrangement by enabling CONFIG_DEBUG_KOBJECT_RELEASE.

While at it, update the docs with the context expectations for the request_queue / gendisk refcount decrement, and make these expectations explicit by using might_sleep().

Fixes: dc9edc4 ("block: Fix a blk_exit_rl() regression")
Suggested-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: yu kuai <yukuai3@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
1 parent 763b589 commit e8c7d14


4 files changed, 47 additions, 23 deletions


block/blk-core.c

Lines changed: 8 additions & 0 deletions
@@ -327,6 +327,9 @@ EXPORT_SYMBOL_GPL(blk_clear_pm_only);
  *
  * Decrements the refcount of the request_queue kobject. When this reaches 0
  * we'll have blk_release_queue() called.
+ *
+ * Context: Any context, but the last reference must not be dropped from
+ * atomic context.
  */
 void blk_put_queue(struct request_queue *q)
 {
@@ -359,9 +362,14 @@ EXPORT_SYMBOL_GPL(blk_set_queue_dying);
  *
  * Mark @q DYING, drain all pending requests, mark @q DEAD, destroy and
  * put it. All future requests will be failed immediately with -ENODEV.
+ *
+ * Context: can sleep
  */
 void blk_cleanup_queue(struct request_queue *q)
 {
+	/* cannot be called from atomic context */
+	might_sleep();
+
 	WARN_ON_ONCE(blk_queue_registered(q));
 
 	/* mark @q DYING, no new request or merges will be allowed afterwards */
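The "Context:" rule added to blk_put_queue() says a put may happen anywhere, but the put that drops the count to zero runs blk_release_queue(), which may sleep. Below is a minimal userspace sketch of that rule. It assumes a hypothetical atomic_depth counter in place of the kernel's real atomic-context tracking, and a might_sleep_model() that only records a violation; the kernel's might_sleep() is a debugging annotation that can warn when CONFIG_DEBUG_ATOMIC_SLEEP is enabled. None of the *_model names are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for "atomic context" (spinlock held, IRQs off, ...).
 * The kernel tracks this differently; this counter is illustration only.
 */
static int atomic_depth;
static bool sleep_violation;

/* Rough analogue of might_sleep(): record a complaint if called while "atomic". */
static void might_sleep_model(void)
{
	if (atomic_depth > 0) {
		fprintf(stderr, "model: sleeping function called from atomic context\n");
		sleep_violation = true;
	}
}

struct queue_model {
	int refcount;
	bool released;
};

/* Stands in for blk_release_queue(): may sleep, so it checks its context. */
static void release_queue_model(struct queue_model *q)
{
	might_sleep_model();
	q->released = true;
}

static void get_queue_model(struct queue_model *q)
{
	q->refcount++;
}

/* Stands in for blk_put_queue(): only the final put reaches the release path. */
static void put_queue_model(struct queue_model *q)
{
	if (--q->refcount == 0)
		release_queue_model(q);
}

/* A non-final put while "atomic" is fine; the final put happens sleepable. */
static bool scenario_ok(void)
{
	struct queue_model q = { .refcount = 1, .released = false };

	sleep_violation = false;
	get_queue_model(&q);          /* refcount 2 */
	atomic_depth++;               /* e.g. inside a spinlocked section */
	put_queue_model(&q);          /* refcount 1: no release, no complaint */
	atomic_depth--;
	put_queue_model(&q);          /* refcount 0: release in sleepable context */
	return q.released && !sleep_violation;
}

/* Dropping the last reference while "atomic" is exactly what gets flagged. */
static bool scenario_last_put_in_atomic(void)
{
	struct queue_model q = { .refcount = 1, .released = false };

	sleep_violation = false;
	atomic_depth++;
	put_queue_model(&q);
	atomic_depth--;
	return sleep_violation;
}
```

Both helpers return true: in scenario_ok() the put made while "atomic" is not the last one, so nothing that may sleep runs there; only scenario_last_put_in_atomic() drops the final reference in atomic context.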

block/blk-sysfs.c

Lines changed: 22 additions & 21 deletions
@@ -873,22 +873,32 @@ static void blk_exit_queue(struct request_queue *q)
 	bdi_put(q->backing_dev_info);
 }
 
-
 /**
- * __blk_release_queue - release a request queue
- * @work: pointer to the release_work member of the request queue to be released
+ * blk_release_queue - releases all allocated resources of the request_queue
+ * @kobj: pointer to a kobject, whose container is a request_queue
+ *
+ * This function releases all allocated resources of the request queue.
+ *
+ * The struct request_queue refcount is incremented with blk_get_queue() and
+ * decremented with blk_put_queue(). Once the refcount reaches 0 this function
+ * is called.
+ *
+ * For drivers that have a request_queue on a gendisk and added with
+ * __device_add_disk() the refcount to request_queue will reach 0 with
+ * the last put_disk() called by the driver. For drivers which don't use
+ * __device_add_disk() this happens with blk_cleanup_queue().
  *
- * Description:
- * This function is called when a block device is being unregistered. The
- * process of releasing a request queue starts with blk_cleanup_queue, which
- * set the appropriate flags and then calls blk_put_queue, that decrements
- * the reference counter of the request queue. Once the reference counter
- * of the request queue reaches zero, blk_release_queue is called to release
- * all allocated resources of the request queue.
+ * Drivers exist which depend on the release of the request_queue to be
+ * synchronous, it should not be deferred.
+ *
+ * Context: can sleep
  */
-static void __blk_release_queue(struct work_struct *work)
+static void blk_release_queue(struct kobject *kobj)
 {
-	struct request_queue *q = container_of(work, typeof(*q), release_work);
+	struct request_queue *q =
+		container_of(kobj, struct request_queue, kobj);
+
+	might_sleep();
 
 	if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags))
 		blk_stat_remove_callback(q, q->poll_cb);
@@ -917,15 +927,6 @@ static void __blk_release_queue(struct work_struct *work)
 	call_rcu(&q->rcu_head, blk_free_queue_rcu);
 }
 
-static void blk_release_queue(struct kobject *kobj)
-{
-	struct request_queue *q =
-		container_of(kobj, struct request_queue, kobj);
-
-	INIT_WORK(&q->release_work, __blk_release_queue);
-	schedule_work(&q->release_work);
-}
-
 static const struct sysfs_ops queue_sysfs_ops = {
 	.show = queue_attr_show,
 	.store = queue_attr_store,
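This hunk drops the INIT_WORK()/schedule_work() indirection, so teardown now finishes before the final put returns. Below is a minimal userspace sketch of why that matters for the loop-device scenario in the commit message. It assumes a hypothetical single-slot pending_release pointer in place of a real workqueue: with deferred release, the device is still "present" right after the remove path returns; with synchronous release, it is already gone. The *_model names and run_pending_work() are illustrative only, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model: teardown runs when the last reference is dropped. */
struct dev_model {
	int refcount;
	bool present;   /* what userspace would observe after the remove call */
	bool deferred;  /* true: mimic the old INIT_WORK()+schedule_work() path */
};

/* Single-slot stand-in for a workqueue; not the kernel workqueue API. */
static struct dev_model *pending_release;

static void release_model(struct dev_model *d)
{
	d->present = false;
}

static void put_model(struct dev_model *d)
{
	if (--d->refcount != 0)
		return;
	if (d->deferred)
		pending_release = d;  /* teardown happens "later", after we return */
	else
		release_model(d);     /* teardown completes before we return */
}

/* Plays the role of the worker that eventually runs deferred releases. */
static void run_pending_work(void)
{
	if (pending_release != NULL) {
		release_model(pending_release);
		pending_release = NULL;
	}
}

/* Is the device gone as soon as the (successful) remove path returns? */
static bool gone_after_remove(bool deferred)
{
	struct dev_model d = { .refcount = 1, .present = true, .deferred = deferred };
	bool gone;

	put_model(&d);            /* final put, as a remove ioctl might trigger */
	gone = !d.present;        /* what a script checking right away would see */
	run_pending_work();       /* drain deferred work before d leaves scope */
	return gone;
}
```

gone_after_remove(false) returns true and gone_after_remove(true) returns false, which is the script-visible difference the commit message describes.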

block/genhd.c

Lines changed: 17 additions & 0 deletions
@@ -889,12 +889,19 @@ static void invalidate_partition(struct gendisk *disk, int partno)
  * The final removal of the struct gendisk happens when its refcount reaches 0
  * with put_disk(), which should be called after del_gendisk(), if
  * __device_add_disk() was used.
+ *
+ * Drivers exist which depend on the release of the gendisk to be synchronous,
+ * it should not be deferred.
+ *
+ * Context: can sleep
  */
 void del_gendisk(struct gendisk *disk)
 {
 	struct disk_part_iter piter;
 	struct hd_struct *part;
 
+	might_sleep();
+
 	blk_integrity_del(disk);
 	disk_del_events(disk);
 

@@ -1548,11 +1555,15 @@ int disk_expand_part_tbl(struct gendisk *disk, int partno)
  * drivers we also call blk_put_queue() for them, and we expect the
  * request_queue refcount to reach 0 at this point, and so the request_queue
  * will also be freed prior to the disk.
+ *
+ * Context: can sleep
  */
 static void disk_release(struct device *dev)
 {
 	struct gendisk *disk = dev_to_disk(dev);
 
+	might_sleep();
+
 	blk_free_devt(dev->devt);
 	disk_release_events(disk);
 	kfree(disk->random);
@@ -1797,6 +1808,9 @@ EXPORT_SYMBOL(get_disk_and_module);
  *
  * This decrements the refcount for the struct gendisk. When this reaches 0
  * we'll have disk_release() called.
+ *
+ * Context: Any context, but the last reference must not be dropped from
+ * atomic context.
  */
 void put_disk(struct gendisk *disk)
 {
@@ -1811,6 +1825,9 @@ EXPORT_SYMBOL(put_disk);
  *
  * This is a counterpart of get_disk_and_module() and thus also of
  * get_gendisk().
+ *
+ * Context: Any context, but the last reference must not be dropped from
+ * atomic context.
  */
 void put_disk_and_module(struct gendisk *disk)
 {
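The disk_release() comment expects the request_queue to be freed before the disk. Below is a minimal userspace sketch of that ordering. It assumes a common teardown sequence in which the driver first drops its own queue reference (as the final blk_put_queue() inside blk_cleanup_queue() would), and the gendisk then holds the last queue reference until the final put_disk(). All *_model names and the string log are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Records release events in order; a hypothetical stand-in for tracing. */
static char release_log[64];

static void log_event(const char *what)
{
	strncat(release_log, what, sizeof(release_log) - strlen(release_log) - 1);
}

struct queue_model { int refcount; };
struct disk_model { int refcount; struct queue_model *queue; };

/* Rough analogue of blk_put_queue(): the final put frees the queue. */
static void put_queue_model(struct queue_model *q)
{
	if (--q->refcount == 0)
		log_event("queue;");
}

/* Rough analogue of disk_release(): drop the disk's queue ref, then free the disk. */
static void disk_release_model(struct disk_model *disk)
{
	put_queue_model(disk->queue);
	log_event("disk;");
}

/* Rough analogue of put_disk(). */
static void put_disk_model(struct disk_model *disk)
{
	if (--disk->refcount == 0)
		disk_release_model(disk);
}

/* Assumed driver teardown: queue refs are held by the driver and by the disk. */
static const char *teardown_order(void)
{
	struct queue_model q = { .refcount = 2 };        /* driver + gendisk */
	struct disk_model disk = { .refcount = 1, .queue = &q };

	release_log[0] = '\0';
	/* del_gendisk() would run first; it is not modeled here */
	put_queue_model(&q);       /* driver's put, as blk_cleanup_queue() ends with */
	put_disk_model(&disk);     /* last put_disk(): queue freed, then disk */
	return release_log;
}
```

teardown_order() returns "queue;disk;", matching the expectation in the disk_release() comment that the request_queue is freed prior to the disk.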

include/linux/blkdev.h

Lines changed: 0 additions & 2 deletions
@@ -584,8 +584,6 @@ struct request_queue {
 
 	size_t cmd_size;
 
-	struct work_struct release_work;
-
 #define BLK_MAX_WRITE_HINTS 5
 	u64 write_hints[BLK_MAX_WRITE_HINTS];
 };
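With release_work gone from struct request_queue, the release path recovers the queue only from its embedded kobj member, via container_of() as in the blk_release_queue() hunk for block/blk-sysfs.c. Below is a minimal userspace sketch of that pointer arithmetic using offsetof(). container_of_model and the *_model structs are hypothetical simplifications, and the kernel's own container_of() also performs type checking that this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace version of the container_of() idea: member pointer minus member offset. */
#define container_of_model(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct kobject_model {
	int refcount;
};

struct request_queue_model {
	size_t cmd_size;
	struct kobject_model kobj;  /* embedded member, not a pointer */
};

/* Same shape as the start of blk_release_queue(struct kobject *kobj). */
static struct request_queue_model *queue_from_kobj(struct kobject_model *kobj)
{
	return container_of_model(kobj, struct request_queue_model, kobj);
}

/* Round-trip: take the embedded kobject's address, recover the queue. */
static int roundtrip_ok(void)
{
	struct request_queue_model rq = { .cmd_size = 42, .kobj = { .refcount = 1 } };
	struct request_queue_model *back = queue_from_kobj(&rq.kobj);

	return back == &rq && back->cmd_size == 42;
}
```

The same arithmetic is what let the removed __blk_release_queue() recover the queue from its release_work member; only the member name differed.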
