Damien-Le-Moal…
Commits on Aug 12, 2021
-
doc: Fix typo in request queue sysfs documentation
Fix a typo (are -> as) in the introduction paragraph of Documentation/block/queue-sysfs.rst. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
-
doc: document sysfs queue/cranges attributes
Update the file Documentation/block/queue-sysfs.rst to add a description of the device queue sysfs entries related to concurrent sector ranges (e.g. concurrent positioning ranges for multi-actuator hard-disks). While at it, also fix a typo in this file's introduction paragraph. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
libata: support concurrent positioning ranges log
Add support to discover if an ATA device supports the Concurrent Positioning Ranges Log (address 0x47), indicating that the device is capable of seeking to multiple different locations in parallel using multiple actuators serving different LBA ranges. Also add support to translate the concurrent positioning ranges log into its equivalent Concurrent Positioning Ranges VPD page B9h in libata-scsi.c. The format of the Concurrent Positioning Ranges Log is defined in ACS-5 r9. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
scsi: sd: add concurrent positioning ranges support
Add the sd_read_cpr() function to the sd scsi disk driver to discover if a device has multiple concurrent positioning ranges (i.e. multiple actuators on an HDD). This new function is called from sd_revalidate_disk() and uses the block layer functions blk_alloc_cranges() and blk_queue_set_cranges() to set a device's cranges according to the information retrieved from VPD page B9h, if the device supports it. The format of the Concurrent Positioning Ranges VPD page B9h is defined in section 6.6.6 of SBC-5. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de>
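The kind of parsing described above can be sketched in userspace. This is only an illustration, not the driver code: the layout used here (a 64-byte header with a big-endian page length in bytes 2-3, followed by 32-byte range descriptors carrying the range number in byte 0, the starting LBA in bytes 8-15 and the number of LBAs in bytes 16-23) is an assumption that should be checked against SBC-5, and `parse_cpr_vpd()` and `build_cpr_vpd()` are hypothetical helper names.

```python
import struct

HEADER_LEN = 64  # assumed size of the page header (check SBC-5)
DESC_LEN = 32    # assumed size of one range descriptor (check SBC-5)


def parse_cpr_vpd(page: bytes):
    """Return a list of (range_number, start_lba, nr_lbas) tuples."""
    if len(page) < 4 or page[1] != 0xB9:
        raise ValueError("not a B9h VPD page")
    # Bytes 2-3: big-endian page length, not counting the first 4 bytes.
    (page_len,) = struct.unpack_from(">H", page, 2)
    total = page_len + 4
    # Need the full header plus at least one descriptor.
    if total > len(page) or total < HEADER_LEN + DESC_LEN:
        raise ValueError("page too short")
    nr_ranges = (total - HEADER_LEN) // DESC_LEN
    ranges = []
    for i in range(nr_ranges):
        off = HEADER_LEN + i * DESC_LEN
        number = page[off]
        start_lba, nr_lbas = struct.unpack_from(">QQ", page, off + 8)
        ranges.append((number, start_lba, nr_lbas))
    return ranges


def build_cpr_vpd(ranges):
    """Build a synthetic page for experiments; not a real device response."""
    body = bytearray(HEADER_LEN - 4)  # remainder of the header, zero-filled
    for number, start_lba, nr_lbas in ranges:
        desc = bytearray(DESC_LEN)
        desc[0] = number
        struct.pack_into(">QQ", desc, 8, start_lba, nr_lbas)
        body += desc
    return bytes([0x00, 0xB9]) + struct.pack(">H", len(body)) + bytes(body)
```

A round trip through `build_cpr_vpd()` and `parse_cpr_vpd()` is a quick way to check that the offsets agree with each other, though it says nothing about whether they match a real device.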
-
block: Add concurrent positioning ranges support
The Concurrent Positioning Ranges VPD page (for SCSI) and Log (for ATA) contain parameters describing the number of sets of contiguous LBAs that can be served independently by a single LUN multi-actuator disk. This patch provides the blk_queue_set_cranges() function allowing a device driver to signal to the block layer that a disk has multiple actuators, each one serving a contiguous range of sectors. To describe the set of sector ranges representing the different actuators of a device, the data type struct blk_cranges is introduced.

For a device with multiple actuators, a struct blk_cranges is attached to the device request queue by the disk_set_cranges() function. The function disk_alloc_cranges() is provided for drivers to allocate this structure.

The blk_cranges structure contains kobjects (struct kobject) to register with sysfs the set of sector ranges defined by a device. On initial device scan, this registration is done from blk_register_queue() using the block layer internal function disk_register_cranges(). If a driver calls disk_set_cranges() for a registered queue, e.g. when a device is revalidated, disk_set_cranges() will execute disk_register_cranges() to update the queue sysfs attribute files.

The sysfs file structure created starts from the cranges sub-directory and contains the start sector and number of sectors served by an actuator, with the information for each actuator grouped in one directory per actuator. E.g. for a dual actuator drive, we have:

    $ tree /sys/block/sdk/queue/cranges/
    /sys/block/sdk/queue/cranges/
    |-- 0
    |   |-- nr_sectors
    |   `-- sector
    `-- 1
        |-- nr_sectors
        `-- sector

For a regular single actuator device, the cranges directory does not exist.

Device revalidation may lead to changes to this structure and to the attribute values. When manipulated, the queue sysfs_lock and sysfs_dir_lock are held for atomicity, similarly to how the blk-mq and elevator sysfs queue sub-directories are protected.
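As an illustration of how userspace might consume the directory layout described above, here is a minimal sketch. It assumes each attribute file holds a single decimal integer; `read_cranges()` and `actuator_for()` are hypothetical helper names, not part of the patch.

```python
import os


def read_cranges(queue_dir):
    """Return [(index, first_sector, nr_sectors), ...] sorted by index.

    An empty list means there is no cranges directory, which the commit
    message describes as the regular single actuator case.
    """
    base = os.path.join(queue_dir, "cranges")
    if not os.path.isdir(base):
        return []
    ranges = []
    for name in os.listdir(base):
        if not name.isdigit():
            continue  # only the per-actuator numbered directories matter
        with open(os.path.join(base, name, "sector")) as f:
            first = int(f.read())
        with open(os.path.join(base, name, "nr_sectors")) as f:
            nr = int(f.read())
        ranges.append((int(name), first, nr))
    return sorted(ranges)


def actuator_for(ranges, sector):
    """Return the index of the range containing sector, or None."""
    for index, first, nr in ranges:
        if first <= sector < first + nr:
            return index
    return None
```

For the example device above, `read_cranges("/sys/block/sdk/queue")` would be the call to make. Because revalidation may change the directory, a caller should expect the result to go stale.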
The code related to the management of cranges is added in the new file block/blk-cranges.c. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
-
Merge branch 'for-5.15/block' into for-next
* for-5.15/block: block: move some macros to blkdev.h
-
block: move some macros to blkdev.h
Move them (PAGE_SECTORS_SHIFT, PAGE_SECTORS and SECTOR_MASK) to the generic header file to remove redundancy. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Link: https://lore.kernel.org/r/20210721025315.1729118-1-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commits on Aug 11, 2021
-
Merge branch 'io_uring-bio-cache.4' into for-next
* io_uring-bio-cache.4:
  block: enable use of bio allocation cache
  io_uring: enable use of bio alloc cache
  block: clear BIO_PERCPU_CACHE flag if polling isn't supported
  bio: add allocation cache abstraction
  fs: add kiocb alloc cache flag
  bio: optimize initialization of a bio
-
block: enable use of bio allocation cache
Initialize the bio_set used for IO with per-cpu bio caching enabled, and use the new bio_alloc_kiocb() helper to dip into that cache. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: enable use of bio alloc cache
Mark polled IO as being safe for dipping into the bio allocation cache, in case the targeted bio_set has it enabled. This brings an IOPOLL gen2 Optane QD=128 workload from ~3.0M IOPS to ~3.3M IOPS. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
block: clear BIO_PERCPU_CACHE flag if polling isn't supported
The bio alloc cache relies on the fact that a polled bio will complete in process context. Clear the cacheable flag if we disable polling for a given bio. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
bio: add allocation cache abstraction
Add a per-cpu bio_set cache for bio allocations, enabling us to quickly recycle them instead of going through the slab allocator. This cache isn't IRQ safe, and hence is only really suitable for polled IO. It is very simple: it keeps a count of bios in the cache, and maintains a max of 512 with a slack of 64. If we get above max + slack, we drop slack number of bios. Signed-off-by: Jens Axboe <axboe@kernel.dk>
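The max/slack policy can be illustrated with a toy model. This is not the kernel implementation: `BioCacheModel` and its fields are hypothetical, plain Python objects stand in for bios, and there is no per-cpu aspect; only the pruning rule (above max + slack, drop slack entries) is taken from the description above.

```python
class BioCacheModel:
    """Toy model of the cache pruning policy; not the kernel code."""

    MAX = 512   # target maximum number of cached entries
    SLACK = 64  # how far above MAX the cache may grow before pruning

    def __init__(self):
        self.free_list = []   # recycled objects, most recent last
        self.slab_allocs = 0  # how often we fell back to a fresh allocation

    def alloc(self):
        """Reuse a cached object if there is one, else allocate a new one."""
        if self.free_list:
            return self.free_list.pop()
        self.slab_allocs += 1
        return object()

    def put(self, bio):
        """Return an object to the cache, pruning if it grew too large."""
        self.free_list.append(bio)
        if len(self.free_list) > self.MAX + self.SLACK:
            # Above max + slack: drop SLACK entries in one go.
            del self.free_list[-self.SLACK:]
```

Pruning in batches of slack entries, rather than one entry at a time, means a cache hovering around its limit does not pay the release cost on every put.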
-
fs: add kiocb alloc cache flag
If this kiocb can safely use the polled bio allocation cache, then this flag must be set. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
bio: optimize initialization of a bio
The memset() used is measurably slower in targeted benchmarks. Get rid of it and fill in the bio manually, in a separate helper. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Merge branch 'for-5.15/io_uring' into for-next
* for-5.15/io_uring: (47 commits)
  io_uring: optimise hot path of ltimeout prep
  io_uring: skip request refcounting
  io_uring: remove submission references
  io_uring: remove req_ref_sub_and_test()
  io_uring: move req_ref_get() and friends
  io_uring: remove IRQ aspect of io_ring_ctx completion lock
  io_uring: run regular file completions from task_work
  io_uring: run linked timeouts from task_work
  io_uring: run timeouts from task_work
  io_uring: remove file batch-get optimisation
  io_uring: clean up tctx_task_work()
  io_uring: inline io_poll_remove_waitqs
  io_uring: remove extra argument for overflow flush
  io_uring: inline struct io_comp_state
  io_uring: use inflight_entry instead of compl.list
  io_uring: remove redundant args from cache_free
  io_uring: cache __io_free_req()'d requests
  io_uring: move io_fallback_req_func()
  io_uring: optimise putting task struct
  io_uring: drop exec checks from io_req_task_submit
  ...
-
io_uring: optimise hot path of ltimeout prep
io_prep_linked_timeout() grew too heavy and the compiler now refuses to inline the function. Help it by splitting it in two and annotating with inline. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/560636717a32e9513724f09b9ecaace942dde4d4.1628705069.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: skip request refcounting
As submission references are gone, there is only one initial reference left. Instead of actually doing atomic refcounting, add a flag indicating whether we're going to take more refs or do any other sync magic. The flag should be set before the request may get used in parallel. Together with the previous patch it saves 2 refcount atomics per request for IOPOLL and IRQ completions, and 1 atomic per req for inline completions, with some exceptions. In particular, currently, there are three cases when refcounting has to be enabled:
- Polling, including apoll, because double poll entries take a ref. Might get relaxed in the near future.
- Link timeouts, enabled for both the timeout and the request it's bound to, because they work in parallel and we need to synchronise to cancel one of them on completion.
- When a request gets in io-wq, because it doesn't hold uring_lock and we need guarantees of submission references.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8b204b6c5f6643062270a1913d6d3a7f8f795fd9.1628705069.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
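The idea of skipping refcounting until a flag says otherwise can be modelled in a few lines. This is a sketch under stated assumptions: the `Request` class, its method names and the `atomics` counter are hypothetical and only count where an atomic operation would happen; it is not the io_uring code.

```python
class Request:
    """Toy request: one implicit reference unless refcounting is enabled."""

    def __init__(self):
        self.refcounted = False  # models the flag from the commit message
        self.refs = 0            # only meaningful once refcounted is set
        self.atomics = 0         # counts simulated atomic operations
        self.freed = False

    def enable_refcount(self, nr=1):
        """Switch to real refcounting, starting with nr references.

        Must happen before the request can be used in parallel.
        """
        if not self.refcounted:
            self.refcounted = True
            self.refs = nr

    def get(self):
        if not self.refcounted:
            raise RuntimeError("refcounting must be enabled first")
        self.atomics += 1
        self.refs += 1

    def put(self):
        if not self.refcounted:
            # Fast path: the single implicit reference, no atomic needed.
            self.freed = True
            return
        self.atomics += 1
        self.refs -= 1
        if self.refs == 0:
            self.freed = True
```

An inline completion takes the fast path and pays no atomics, while a case such as a linked timeout would call `enable_refcount(2)` up front and pay one atomic per put.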
-
io_uring: remove submission references
Requests are by default given two references, submission and completion. Completion references are straightforward: they represent request ownership and are put when a request is completed or so. Submission references are a bit trickier. They're needed when io_issue_sqe() has gone deep into the submission stack (e.g. in fs, block, drivers, etc.): the request may have been given away for concurrent execution or already completed, and the code unwinding back to io_issue_sqe() may be accessing some pieces of our requests, e.g. file or iov. Now, we prevent such async/in-depth completions by pushing requests through task_work. Punting to io-wq is also done through task_works, apart from a couple of cases with a pretty well known context. So, there are two cases:
1) io_issue_sqe() from the task context and protected by ->uring_lock. Requests either return back to io_uring or are handed to task_work, which won't be executed because we're currently controlling that task. So, we can be sure that requests are staying alive all the time and we don't need submission references to pin them.
2) io_issue_sqe() from io-wq, which doesn't hold the mutex. The role of submission reference is played by io-wq reference, which is put by io_wq_submit_work(). Hence, it should be fine.
Considering that, we can carefully kill the submission reference. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6b68f1c763229a590f2a27148aee77767a8d7750.1628705069.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: remove req_ref_sub_and_test()
Soon, we won't need to put several references at once. Remove req_ref_sub_and_test() and the @nr argument from io_put_req_deferred(), and put the rest of the references by hand. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1868c7554108bff9194fb5757e77be23fadf7fc0.1628705069.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: move req_ref_get() and friends
Move all request refcount helpers to avoid forward declarations in the future. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/89fd36f6f3fe5b733dfe4546c24725eee40df605.1628705069.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: remove IRQ aspect of io_ring_ctx completion lock
We have no hard/soft IRQ users of this lock left, so remove any IRQ disabling/saving and restoring when grabbing this lock. This is straightforward with no users entering with IRQs disabled anymore; the only thing to look out for is the waitqueue poll head lock, which nests inside the completion lock. That needs IRQs disabled, and hence we have to do that now instead of relying on the outer lock doing so. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: run regular file completions from task_work
This is in preparation for making the completion lock work outside of hard/soft IRQ context. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: run linked timeouts from task_work
This is in preparation for making the completion lock work outside of hard/soft IRQ context. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: run timeouts from task_work
This is in preparation for making the completion lock work outside of hard/soft IRQ context. Add a timeout_lock to handle the ordering of timeout completions or cancelations with the timeouts actually triggering. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: remove file batch-get optimisation
For requests with non-fixed files, instead of grabbing just one reference, we grab as many as there are requests left, so the following requests using the same file can take it without atomics. However, it's not all win. If there is one request in the middle not using files or having a fixed file, we'll need to put back the leftover references. Even worse, if an application submits requests dealing with different files, it will do a put for each new request, so doubling the number of atomics needed. Also, even if not used, it still takes some cycles in the submission path. If a file is used many times, it rather makes sense to pre-register it; if not, we may fall into the described pitfall. So, this optimisation is a matter of use case. Go with the simplest code-wise way and remove it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
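The trade-off can be made concrete by counting atomics in a simplified model. The assumptions are mine, not the commit's: every get or put, whatever its count, costs one atomic; a request with no file or a fixed file is represented by `None`; completion-side puts are ignored; and the function names are hypothetical.

```python
def atomics_without_batching(files):
    """One atomic get per request that uses a non-fixed file."""
    return sum(1 for f in files if f is not None)


def atomics_with_batching(files):
    """Batch-get model: grab references for all remaining requests at once."""
    atomics = 0
    cached_file = None       # file the cached references belong to
    cached_refs = 0          # references grabbed but not yet consumed
    remaining = len(files)   # requests left in the batch, including this one
    for f in files:
        if f is not None:
            if cached_refs and f == cached_file:
                cached_refs -= 1      # reuse a cached reference, no atomic
            else:
                if cached_refs:
                    atomics += 1      # put back the unused references
                atomics += 1          # get `remaining` references at once
                cached_file = f
                cached_refs = remaining - 1
        remaining -= 1
    if cached_refs:
        atomics += 1                  # final put of leftover references
    return atomics
```

In this model eight requests on one file cost 1 atomic instead of 8, four requests on four different files cost 7 instead of 4, and `["a", None, "a"]` costs 2 either way, which matches the "not all win" argument above.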
-
io_uring: clean up tctx_task_work()
After recent fixes, tctx_task_work() always does proper spinlocking before looking into ->task_list, so now we don't need atomics for ->task_state. Replace it with a non-atomic task_running protected by the critical section. Tidy it up, combine the two separate blocks with spinlocking, and always try to splice in there, so we do less locking when new requests are arriving during the function execution. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> [axboe: fix missing ->task_running reset on task_work_add() failure] Signed-off-by: Jens Axboe <axboe@kernel.dk>
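The pattern described above, a plain flag that is only touched inside the critical section plus splicing the whole list under one lock hold, can be sketched as follows. This is a userspace analogy, not the io_uring code: `TaskWorkModel`, the `wakeups` counter and the use of `threading.Lock` in place of a spinlock are all assumptions made for illustration.

```python
import threading


class TaskWorkModel:
    """Toy model of a work list drained by a single worker."""

    def __init__(self):
        self.lock = threading.Lock()  # stands in for the spinlock
        self.task_list = []           # pending work, protected by lock
        self.task_running = False     # plain bool, protected by lock
        self.wakeups = 0              # times the worker had to be scheduled

    def add(self, work):
        """Queue work; schedule the worker only if it is not running."""
        with self.lock:
            self.task_list.append(work)
            if not self.task_running:
                self.task_running = True
                self.wakeups += 1     # the real code would kick the task here

    def run(self):
        """Drain the list, splicing everything pending per lock hold."""
        results = []
        while True:
            with self.lock:
                batch, self.task_list = self.task_list, []
                if not batch:
                    # Nothing left: clear the flag in the same critical
                    # section so a concurrent add() schedules us again.
                    self.task_running = False
                    return results
            for work in batch:        # run the batch without holding the lock
                results.append(work())
```

Because the flag is checked and cleared under the same lock that protects the list, work added while `run()` is executing is picked up by the next splice without a second wakeup.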
Commits on Aug 10, 2021
-
io_uring: inline io_poll_remove_waitqs
Inline io_poll_remove_waitqs() into its only user and clean it up. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2f1a91a19ffcd591531dc4c61e2f11c64a2d6a6d.1628536684.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: remove extra argument for overflow flush
Unlike __io_cqring_overflow_flush(), nobody does forced flushing with io_cqring_overflow_flush(), so remove the argument from it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7594f869ca41b7cfb5a35a3c7c2d402242834e9e.1628536684.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: inline struct io_comp_state
Inline struct io_comp_state into struct io_submit_state. They are already tightly coupled, and with their mixed responsibilities, keeping them separate only brings confusion. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e55bba77426b399e3a2e54e3c6c267c6a0fc4b57.1628536684.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: use inflight_entry instead of compl.list
req->compl.list is used to cache freed requests, and so can't overlap in time with req->inflight_entry. So, use inflight_entry to link requests and remove compl.list. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e430e79d22d70a190d718831bda7bfed1daf8976.1628536684.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: remove redundant args from cache_free
We don't use the @tsk argument of io_req_cache_free(), so remove it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6a28b4a58ee0aaf0db98e2179b9c9f06f9b0cca1.1628536684.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: cache __io_free_req()'d requests
Don't kfree requests in __io_free_req() but put them back into the internal request cache. That makes allocations more sustainable and will be used for refcounting optimisations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9f4950fbe7771c8d41799366d0a3a08ac3040236.1628536684.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: move io_fallback_req_func()
Move io_fallback_req_func() to kill yet another forward declaration. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d0a8f9d9a0057ed761d6237167d51c9378798d2d.1628536684.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: optimise putting task struct
We cache all the references to task + tctx, so if io_put_task() is called by the corresponding task itself, we can save on atomics and return the refs right back into the cache. It's beneficial for all inline completions, and also iopolling, when polling and submissions are done by the same task, including SQPOLL|IOPOLL. Note: io_uring_cancel_generic() can return refs to the cache as well, so those should be flushed in the loop for tctx_inflight() to work right. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6fe9646b3cb70e46aca1f58426776e368c8926b3.1628471125.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
io_uring: drop exec checks from io_req_task_submit
In case of on-exec io_uring cancellations, tasks already wait for all submitted requests to get completed/cancelled, so we don't need to check for ->in_execve separately. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/be8707049f10df9d20ca03dc4ca3316239b5e8e0.1628471125.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>