Damien-Le-Moal…

Commits on Aug 12, 2021

  1. doc: Fix typo in request queue sysfs documentation

    Fix a typo (are -> as) in the introduction paragraph of
    Documentation/block/queue-sysfs.rst.
    
    Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
    damien-lemoal authored and intel-lab-lkp committed Aug 12, 2021
  2. doc: document sysfs queue/cranges attributes

    Update the file Documentation/block/queue-sysfs.rst to add a description
    of the device queue sysfs entries related to concurrent sector ranges
    (e.g. concurrent positioning ranges for multi-actuator hard-disks).
    
    While at it, also fix a typo in this file's introduction paragraph.
    
    Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    damien-lemoal authored and intel-lab-lkp committed Aug 12, 2021
  3. libata: support concurrent positioning ranges log

    Add support to discover if an ATA device supports the Concurrent
    Positioning Ranges Log (address 0x47), indicating that the device is
    capable of seeking to multiple different locations in parallel using
    multiple actuators serving different LBA ranges.
    
    Also add support to translate the concurrent positioning ranges log
    into its equivalent Concurrent Positioning Ranges VPD page B9h in
    libata-scsi.c.
    
    The format of the Concurrent Positioning Ranges Log is defined in ACS-5
    r9.
    
    Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    damien-lemoal authored and intel-lab-lkp committed Aug 12, 2021
  4. scsi: sd: add concurrent positioning ranges support

    Add the sd_read_cpr() function to the sd scsi disk driver to discover
    if a device has multiple concurrent positioning ranges (i.e. multiple
    actuators on an HDD). This new function is called from
    sd_revalidate_disk() and uses the block layer functions
    blk_alloc_cranges() and blk_queue_set_cranges() to set a device's
    cranges according to the information retrieved from VPD page B9h,
    if the device supports it.
    
    The format of the Concurrent Positioning Ranges VPD page B9h is defined
    in section 6.6.6 of SBC-5.
    
    Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    damien-lemoal authored and intel-lab-lkp committed Aug 12, 2021
  5. block: Add concurrent positioning ranges support

    The Concurrent Positioning Ranges VPD page (for SCSI) and Log (for ATA)
    contain parameters describing the number of sets of contiguous LBAs that
    can be served independently by a single LUN multi-actuator disk. This
    patch provides the blk_queue_set_cranges() function allowing a device
    driver to signal to the block layer that a disk has multiple actuators,
    each one serving a contiguous range of sectors. To describe the set
    of sector ranges representing the different actuators of a device, the
    data type struct blk_cranges is introduced.
    
    For a device with multiple actuators, a struct blk_cranges is attached
    to the device request queue by the disk_set_cranges() function. The
    function disk_alloc_cranges() is provided for drivers to allocate this
    structure.
    
    The blk_cranges structure contains kobjects (struct kobject) to register
    with sysfs the set of sector ranges defined by a device. On initial
    device scan, this registration is done from blk_register_queue() using
    the block layer internal function disk_register_cranges(). If a driver
    calls disk_set_cranges() for a registered queue, e.g. when a device
    is revalidated, disk_set_cranges() will execute disk_register_cranges()
    to update the queue sysfs attribute files.
    
    The sysfs file structure created starts from the cranges sub-directory
    and contains the start sector and number of sectors served by an
    actuator, with the information for each actuator grouped in one
    directory per actuator. E.g. for a dual actuator drive, we have:
    
    $ tree /sys/block/sdk/queue/cranges/
    /sys/block/sdk/queue/cranges/
    |-- 0
    |   |-- nr_sectors
    |   `-- sector
    `-- 1
        |-- nr_sectors
        `-- sector
    
    For a regular single actuator device, the cranges directory does not
    exist.
    
    Device revalidation may lead to changes to this structure and to the
    attribute values. When manipulated, the queue sysfs_lock and
    sysfs_dir_lock are held for atomicity, similarly to how the blk-mq and
    elevator sysfs queue sub-directories are protected.
    
    The code related to the management of cranges is added in the new
    file block/blk-cranges.c.
    
    Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
    damien-lemoal authored and intel-lab-lkp committed Aug 12, 2021
  6. Merge branch 'for-5.15/block' into for-next

    * for-5.15/block:
      block: move some macros to blkdev.h
    axboe committed Aug 12, 2021
  7. block: move some macros to blkdev.h

    Move them (PAGE_SECTORS_SHIFT, PAGE_SECTORS and SECTOR_MASK) to the
    generic header file to remove redundancy.
    
    Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
    Link: https://lore.kernel.org/r/20210721025315.1729118-1-guoqing.jiang@linux.dev
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Guoqing Jiang authored and axboe committed Aug 12, 2021
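
As an illustration of item 5 above ("block: Add concurrent positioning ranges support"), here is a minimal user-space sketch of how a set of per-actuator sector ranges could be modelled and searched. It is not the kernel's struct blk_cranges: the names struct crange and cranges_find() are hypothetical, and the two fields simply mirror the "sector" and "nr_sectors" sysfs attributes shown in the tree output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of one concurrent positioning range. The fields
 * mirror the "sector" and "nr_sectors" sysfs attributes; this is not
 * the kernel data structure.
 */
struct crange {
    uint64_t sector;     /* first sector served by this actuator */
    uint64_t nr_sectors; /* number of sectors in the range */
};

/*
 * Return the index of the range containing @sector, or -1 if no range
 * does. The subtraction form avoids overflow of sector + nr_sectors.
 */
int cranges_find(const struct crange *r, size_t nr, uint64_t sector)
{
    assert(r != NULL || nr == 0);

    for (size_t i = 0; i < nr; i++) {
        if (sector >= r[i].sector &&
            sector - r[i].sector < r[i].nr_sectors)
            return (int)i;
    }
    return -1;
}
```

For a dual-actuator drive, a caller would fill two struct crange entries from the values read under queue/cranges/0 and queue/cranges/1 and pass them with nr = 2. The returned index then names the directory of the actuator serving that sector.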

Commits on Aug 11, 2021

  1. Merge branch 'io_uring-bio-cache.4' into for-next

    * io_uring-bio-cache.4:
      block: enable use of bio allocation cache
      io_uring: enable use of bio alloc cache
      block: clear BIO_PERCPU_CACHE flag if polling isn't supported
      bio: add allocation cache abstraction
      fs: add kiocb alloc cache flag
      bio: optimize initialization of a bio
    axboe committed Aug 11, 2021
  2. block: enable use of bio allocation cache

    Initialize the bio_set used for IO with per-cpu bio caching enabled,
    and use the new bio_alloc_kiocb() helper to dip into that cache.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Aug 11, 2021
  3. io_uring: enable use of bio alloc cache

    Mark polled IO as being safe for dipping into the bio allocation
    cache, in case the targeted bio_set has it enabled.
    
    This brings an IOPOLL gen2 Optane QD=128 workload from ~3.0M IOPS to
    ~3.3M IOPS.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Aug 11, 2021
  4. block: clear BIO_PERCPU_CACHE flag if polling isn't supported

    The bio alloc cache relies on the fact that a polled bio will complete
    in process context. Clear the cacheable flag if we disable polling
    for a given bio.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Aug 11, 2021
  5. bio: add allocation cache abstraction

    Add a per-cpu bio_set cache for bio allocations, enabling us to quickly
    recycle them instead of going through the slab allocator. This cache
    isn't IRQ safe, and hence is only really suitable for polled IO.
    
    Very simple: it keeps a count of bios in the cache, and maintains a max
    of 512 with a slack of 64. If we get above max + slack, we drop slack
    number of bios.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Aug 11, 2021
  6. fs: add kiocb alloc cache flag

    If this kiocb can safely use the polled bio allocation cache, then this
    flag must be set.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Aug 11, 2021
  7. bio: optimize initialization of a bio

    The memset() used is measurably slower in targeted benchmarks. Get rid
    of it and fill in the bio manually, in a separate helper.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Aug 11, 2021
  8. Merge branch 'for-5.15/io_uring' into for-next

    * for-5.15/io_uring: (47 commits)
      io_uring: optimise hot path of ltimeout prep
      io_uring: skip request refcounting
      io_uring: remove submission references
      io_uring: remove req_ref_sub_and_test()
      io_uring: move req_ref_get() and friends
      io_uring: remove IRQ aspect of io_ring_ctx completion lock
      io_uring: run regular file completions from task_work
      io_uring: run linked timeouts from task_work
      io_uring: run timeouts from task_work
      io_uring: remove file batch-get optimisation
      io_uring: clean up tctx_task_work()
      io_uring: inline io_poll_remove_waitqs
      io_uring: remove extra argument for overflow flush
      io_uring: inline struct io_comp_state
      io_uring: use inflight_entry instead of compl.list
      io_uring: remove redundant args from cache_free
      io_uring: cache __io_free_req()'d requests
      io_uring: move io_fallback_req_func()
      io_uring: optimise putting task struct
      io_uring: drop exec checks from io_req_task_submit
      ...
    axboe committed Aug 11, 2021
  9. io_uring: optimise hot path of ltimeout prep

    io_prep_linked_timeout() grew too heavy and the compiler now refuses to
    inline the function. Help it by splitting it in two and annotating with
    inline.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/560636717a32e9513724f09b9ecaace942dde4d4.1628705069.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Aug 11, 2021
  10. io_uring: skip request refcounting

    As submission references are gone, there is only one initial reference
    left. Instead of actually doing atomic refcounting, add a flag
    indicating whether we're going to take more refs or do any other sync
    magic. The flag should be set before the request may get used in
    parallel.
    
    Together with the previous patch it saves 2 refcount atomics per request
    for IOPOLL and IRQ completions, and 1 atomic per req for inline
    completions, with some exceptions. In particular, there are currently
    three cases where refcounting has to be enabled:
    - Polling, including apoll, because double poll entries take a ref.
      Might get relaxed in the near future.
    - Link timeouts, enabled for both the timeout and the request it's
      bound to, because they work in parallel and we need to synchronise
      to cancel one of them on completion.
    - When a request gets in io-wq, because it doesn't hold uring_lock and
      we need guarantees of submission references.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/8b204b6c5f6643062270a1913d6d3a7f8f795fd9.1628705069.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Aug 11, 2021
  11. io_uring: remove submission references

    Requests are by default given two references, submission and
    completion. Completion references are straightforward: they represent
    request ownership and are put when a request is completed or so.
    Submission references are a bit trickier. They're needed when
    io_issue_sqe() goes deep into the submission stack (e.g. in fs,
    block, drivers, etc.): the request may have been given away for
    concurrent execution or already completed, while the code unwinding
    back to io_issue_sqe() may still be accessing some pieces of the
    request, e.g. file or iov.
    
    Now, we prevent such async/in-depth completions by pushing requests
    through task_work. Punting to io-wq is also done through task_works,
    apart from a couple of cases with a pretty well known context. So,
    there are two cases:
    1) io_issue_sqe() from the task context and protected by ->uring_lock.
    Requests either return to io_uring or are handed to task_work, which
    won't be executed because we're currently controlling that task. So,
    we can be sure that requests are staying alive all the time and we don't
    need submission references to pin them.
    
    2) io_issue_sqe() from io-wq, which doesn't hold the mutex. The role of
    submission reference is played by io-wq reference, which is put by
    io_wq_submit_work(). Hence, it should be fine.
    
    Considering that, we can carefully kill the submission reference.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/6b68f1c763229a590f2a27148aee77767a8d7750.1628705069.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Aug 11, 2021
  12. io_uring: remove req_ref_sub_and_test()

    Soon, we won't need to put several references at once. Remove
    req_ref_sub_and_test() and the @nr argument from io_put_req_deferred(),
    and put the rest of the references by hand.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/1868c7554108bff9194fb5757e77be23fadf7fc0.1628705069.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Aug 11, 2021
  13. io_uring: move req_ref_get() and friends

    Move all request refcount helpers to avoid forward declarations in the
    future.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/89fd36f6f3fe5b733dfe4546c24725eee40df605.1628705069.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Aug 11, 2021
  14. io_uring: remove IRQ aspect of io_ring_ctx completion lock

    We have no hard/soft IRQ users of this lock left, remove any IRQ
    disabling/saving and restoring when grabbing this lock.
    
    This is straightforward with no users entering with IRQs disabled
    anymore. The only thing to look out for is the waitqueue poll head
    lock which nests inside the completion lock. That needs IRQs disabled,
    and hence we have to do that now instead of relying on the outer lock
    doing so.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Aug 11, 2021
  15. io_uring: run regular file completions from task_work

    This is in preparation for making the completion lock work outside of
    hard/soft IRQ context.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Aug 11, 2021
  16. io_uring: run linked timeouts from task_work

    This is in preparation for making the completion lock work outside of
    hard/soft IRQ context.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Aug 11, 2021
  17. io_uring: run timeouts from task_work

    This is in preparation for making the completion lock work outside of
    hard/soft IRQ context.
    
    Add a timeout_lock to handle the ordering of timeout completions or
    cancelations with the timeouts actually triggering.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Aug 11, 2021
  18. io_uring: remove file batch-get optimisation

    For requests with non-fixed files, instead of grabbing just one
    reference, we grab as many as there are remaining requests, so the
    following requests using the same file can take it without atomics.
    
    However, it's not all win. If there is one request in the middle
    not using files or having a fixed file, we'll need to put back the
    remaining references. Even worse, if an application submits requests
    dealing with different files, it will do a put for each new request,
    doubling the number of atomics needed. Also, even if not used, it
    still takes some cycles in the submission path.
    
    If a file is used many times, it makes more sense to pre-register it;
    if not, we may fall into the described pitfall. So, this optimisation
    is a matter of use case. Go with the simplest code-wise way and remove it.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Aug 11, 2021
  19. io_uring: clean up tctx_task_work()

    After recent fixes, tctx_task_work() always does proper spinlocking
    before looking into ->task_list, so now we don't need atomics for
    ->task_state. Replace it with a non-atomic task_running protected by
    the critical section.
    
    Tidy it up, combine the two separate blocks with spinlocking, and always
    try to splice in there, so we do less locking when new requests arrive
    during the function execution.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    [axboe: fix missing ->task_running reset on task_work_add() failure]
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Aug 11, 2021
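
The "max + slack" trimming policy described in item 5 above ("bio: add allocation cache abstraction") can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel code. The names struct obj, struct obj_cache, cache_get(), cache_put() and cache_prune() are hypothetical, malloc()/free() stand in for the slab allocator, and the per-cpu aspect is left out. Only the values 512 and 64 come from the commit message.

```c
#include <assert.h>
#include <stdlib.h>

#define CACHE_MAX   512 /* from the commit message */
#define CACHE_SLACK 64  /* from the commit message */

/* Hypothetical cached object; a real bio carries far more state. */
struct obj {
    struct obj *next;
};

/* Hypothetical cache: a singly linked free list plus a count. */
struct obj_cache {
    struct obj *free_list;
    unsigned int nr;
};

/* Take an object from the cache, falling back to malloc() when empty. */
struct obj *cache_get(struct obj_cache *c)
{
    struct obj *o = c->free_list;

    if (o) {
        c->free_list = o->next;
        c->nr--;
        return o;
    }
    return malloc(sizeof(*o));
}

/* Release up to @nr cached objects back to the allocator. */
void cache_prune(struct obj_cache *c, unsigned int nr)
{
    while (nr-- && c->free_list) {
        struct obj *o = c->free_list;

        c->free_list = o->next;
        c->nr--;
        free(o);
    }
}

/* Return an object to the cache; trim by the slack once above max + slack. */
void cache_put(struct obj_cache *c, struct obj *o)
{
    assert(o != NULL);
    o->next = c->free_list;
    c->free_list = o;
    if (++c->nr > CACHE_MAX + CACHE_SLACK)
        cache_prune(c, CACHE_SLACK);
}
```

Trimming in batches of the slack size, rather than freeing one object per put once the limit is reached, keeps the common put path to a list push and a counter increment. Like the cache described in the commit, this model has no locking, so it is only safe when all gets and puts happen in one context.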

Commits on Aug 10, 2021

  1. io_uring: inline io_poll_remove_waitqs

    Inline io_poll_remove_waitqs() into its only user and clean it up.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/2f1a91a19ffcd591531dc4c61e2f11c64a2d6a6d.1628536684.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Aug 10, 2021
  2. io_uring: remove extra argument for overflow flush

    Unlike __io_cqring_overflow_flush(), nobody does forced flushing with
    io_cqring_overflow_flush(), so remove the argument from it.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/7594f869ca41b7cfb5a35a3c7c2d402242834e9e.1628536684.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Aug 10, 2021
  3. io_uring: inline struct io_comp_state

    Inline struct io_comp_state into struct io_submit_state. They are
    already tightly coupled, and with their mixed responsibilities,
    keeping them separate only brings confusion.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/e55bba77426b399e3a2e54e3c6c267c6a0fc4b57.1628536684.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Aug 10, 2021
  4. io_uring: use inflight_entry instead of compl.list

    req->compl.list is used to cache freed requests, and so can't overlap in
    time with req->inflight_entry. So, use inflight_entry to link requests
    and remove compl.list.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/e430e79d22d70a190d718831bda7bfed1daf8976.1628536684.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Aug 10, 2021
  5. io_uring: remove redundant args from cache_free

    We don't use @tsk argument of io_req_cache_free(), remove it.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/6a28b4a58ee0aaf0db98e2179b9c9f06f9b0cca1.1628536684.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Aug 10, 2021
  6. io_uring: cache __io_free_req()'d requests

    Don't kfree requests in __io_free_req() but put them back into the
    internal request cache. That makes allocations more sustainable and will
    be used for refcounting optimisations.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/9f4950fbe7771c8d41799366d0a3a08ac3040236.1628536684.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Aug 10, 2021
  7. io_uring: move io_fallback_req_func()

    Move io_fallback_req_func() to kill yet another forward declaration.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/d0a8f9d9a0057ed761d6237167d51c9378798d2d.1628536684.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Aug 10, 2021
  8. io_uring: optimise putting task struct

    We cache all the references to task + tctx, so if io_put_task() is
    called by the corresponding task itself, we can save on atomics and
    return the refs right back into the cache.
    
    It's beneficial for all inline completions, and also iopolling, when
    polling and submissions are done by the same task, including
    SQPOLL|IOPOLL.
    
    Note: io_uring_cancel_generic() can return refs to the cache as well,
    so those should be flushed in the loop for tctx_inflight() to work
    right.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/6fe9646b3cb70e46aca1f58426776e368c8926b3.1628471125.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Aug 10, 2021
  9. io_uring: drop exec checks from io_req_task_submit

    In case of on-exec io_uring cancellations, tasks already wait for all
    submitted requests to get completed/cancelled, so we don't need to check
    for ->in_execve separately.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/be8707049f10df9d20ca03dc4ca3316239b5e8e0.1628471125.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Aug 10, 2021
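
The idea behind item 8 above ("io_uring: optimise putting task struct") can be illustrated with a small user-space model. It is a sketch under assumptions, not the io_uring implementation. The names struct task_model and put_task_refs() are hypothetical, and a C11 atomic_int stands in for whatever shared counter the kernel uses. The point shown is only the fast path: when the task dropping the references is the one that owns the cache, the references go back into a plain integer and no atomic operation is needed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the per-task reference state. */
struct task_model {
    atomic_int inflight; /* shared counter, may be touched by any thread */
    int cached_refs;     /* private to the owning task, no atomics needed */
};

/*
 * Drop @nr references held on @task. @current_task is the task doing
 * the put.
 */
void put_task_refs(struct task_model *task,
                   struct task_model *current_task, int nr)
{
    assert(nr > 0);

    if (task == current_task) {
        /* Same task: hand the refs straight back to its private cache. */
        task->cached_refs += nr;
    } else {
        /* Different task: fall back to the atomic path. */
        atomic_fetch_sub(&task->inflight, nr);
    }
}
```

In this model the cached references stay accounted for in the shared counter. That mirrors the commit's note that cached references must be flushed before a check such as tctx_inflight() can give the right answer.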