Commits

Qu-Wenruo/btrf…

Commits on Nov 28, 2021

  1. btrfs: temporarily disable RAID56

    !!! DON'T MERGE THIS COMMIT !!!
    
    There are still some bugs buried deep inside the RAID56 code, which is
    not yet compatible with bio split at btrfs_map_bio() time. Disable it
    for now.
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    adam900710 authored and intel-lab-lkp committed Nov 28, 2021
    6cab817
  2. btrfs: remove btrfs_bio_ctrl::len_to_stripe_boundary

    Since we can split bio at btrfs_map_bio() time, there is no need to do
    bio split for stripe boundaries at submit_extent_page() time.
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    adam900710 authored and intel-lab-lkp committed Nov 28, 2021
    51a7650
  3. btrfs: remove bio split operations in btrfs_submit_direct()

    Since btrfs_map_bio() will handle the split, there is no need to do the
    split in btrfs_submit_direct() anymore.
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    adam900710 authored and intel-lab-lkp committed Nov 28, 2021
    675923a
  4. btrfs: allow btrfs_map_bio() to split bio according to chunk stripe boundaries
    
    With the new btrfs_bio_split() helper, we are able to split bio
    according to chunk stripe boundaries at btrfs_map_bio() time.
    
    However, because bios are still split at submit_extent_page() time, this
    ability is not yet utilized.
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    adam900710 authored and intel-lab-lkp committed Nov 28, 2021
    63f3924
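
As a rough illustration of what splitting "according to chunk stripe boundaries" means, the userspace sketch below cuts a logical byte range into pieces that never cross a stripe boundary. It is not the kernel code: the 64 KiB stripe length, the assumption that the chunk starts at logical offset 0, and every `sketch_*` name are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stripe length for this sketch (64 KiB); hypothetical constant. */
#define SKETCH_STRIPE_LEN (64ULL * 1024)

struct sketch_range {
	uint64_t logical;	/* start of the piece */
	uint64_t len;		/* length of the piece in bytes */
};

/* Bytes from @logical up to the end of the stripe that contains it. */
uint64_t sketch_len_to_stripe_end(uint64_t logical)
{
	return SKETCH_STRIPE_LEN - (logical % SKETCH_STRIPE_LEN);
}

/*
 * Cut [logical, logical + len) into pieces that never cross a stripe
 * boundary. Returns the number of pieces written to @out (at most @max).
 */
size_t sketch_split_by_stripe(uint64_t logical, uint64_t len,
			      struct sketch_range *out, size_t max)
{
	size_t n = 0;

	while (len > 0 && n < max) {
		uint64_t cur = sketch_len_to_stripe_end(logical);

		if (cur > len)
			cur = len;
		out[n].logical = logical;
		out[n].len = cur;
		n++;
		logical += cur;
		len -= cur;
	}
	return n;
}
```

With these assumptions, a 72 KiB range starting at 60 KiB is cut into three pieces of 4 KiB, 64 KiB and 4 KiB.
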
  5. btrfs: make end_bio_extent_*_writepage() to handle split bio properly

    There are 3 call sites involved:
    
    - end_bio_extent_writepage()
      For data writeback
    
    - end_bio_subpage_eb_writepage()
      For subpage metadata writeback
    
    - end_bio_extent_buffer_writepage()
      For regular metadata writeback
    
    All those functions share the same modification:
    
    - Remove the ASSERT() for non-cloned bios
    - Use bio_for_each_segment()
      Which can handle both unsplit and split bios properly.
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    adam900710 authored and intel-lab-lkp committed Nov 28, 2021
    4f56f0c
  6. btrfs: make end_bio_extent_readpage() to handle split bio properly

    This involves the following modifications:
    
    - Use bio_for_each_segment() instead of bio_for_each_segment_all()
      bio_for_each_segment_all() will iterate all bvecs, even if they are
      not referred by current bi_iter.
    
      *_all() variant can only be used if the bio is never split.
    
      Change it to bio_for_each_segment() call so we won't have endio called
      on the same range by both split and parent bios.
    
    - Make check_data_csum() to take bbio->offset_to_original into
      consideration
      Since btrfs bio can be split now, split/original bio can all start
      with some offset to the original logical bytenr.
    
      Take btrfs_bio::offset_to_original into consideration to get correct
      checksum offset.
    
    - Remove the BIO_CLONED ASSERT() in submit_read_repair()
    
    For the metadata path there is no change, as it only relies on the file
    offset and doesn't care about btrfs_bio at all.
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    adam900710 authored and intel-lab-lkp committed Nov 28, 2021
    a18fd79
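
The difference between walking all bvecs and walking only the range described by the iterator can be modelled in userspace. The sketch below is a simplified stand-in for the kernel's bio and bvec_iter machinery, not the real macros; all `sketch_*` types and functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_bvec {
	unsigned int len;	/* bytes in this segment */
};

struct sketch_iter {
	size_t idx;		/* first bvec this bio covers */
	unsigned int done;	/* bytes already consumed in vecs[idx] */
	unsigned int size;	/* bytes this bio still covers */
};

struct sketch_bio {
	const struct sketch_bvec *vecs;	/* shared between split and parent */
	size_t nr_vecs;
	struct sketch_iter iter;
};

/* Like the *_all() variant: walks every bvec and ignores the iterator. */
unsigned int sketch_bytes_all(const struct sketch_bio *bio)
{
	unsigned int total = 0;

	for (size_t i = 0; i < bio->nr_vecs; i++)
		total += bio->vecs[i].len;
	return total;
}

/* Like the iterator-based variant: walks only the range given by iter. */
unsigned int sketch_bytes_iter(const struct sketch_bio *bio)
{
	struct sketch_iter it = bio->iter;	/* work on a copy */
	unsigned int total = 0;

	while (it.size > 0 && it.idx < bio->nr_vecs) {
		unsigned int seg = bio->vecs[it.idx].len - it.done;

		if (seg > it.size)
			seg = it.size;
		total += seg;
		it.size -= seg;
		it.done = 0;
		it.idx++;
	}
	return total;
}
```

If a split bio and its parent share one bvec array, the "all" walk visits the whole array for both of them, while the iterator walk visits each byte exactly once across the two bios.
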
  7. btrfs: save bio::bi_iter into btrfs_bio::iter before submitting

    Since block layer will advance bio::bi_iter, at endio time we can no
    longer rely on bio::bi_iter for split bio.
    
    But for the incoming btrfs_bio split at btrfs_map_bio() time, we have to
    ensure endio function is only executed for the split range, not the
    whole original bio.
    
    Thus this patch will introduce a new helper, btrfs_bio_save_iter(), to
    save bi_iter into btrfs_bio::iter.
    
    The following call sites need this helper call:
    
    - btrfs_submit_compressed_read()
      For compressed reads. Compressed writes do not need it, as they rely
      on ordered extents.
    
    - raid56_parity_write()
    - raid56_parity_recovery()
      For RAID56.
    
    - submit_stripe_bio()
      For all other cases.
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    adam900710 authored and intel-lab-lkp committed Nov 28, 2021
    69899e5
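
The reason for taking a snapshot before submission can be shown with a small userspace model. This is only a sketch: the structures and the `sketch_*` functions are hypothetical and merely imitate a lower layer that advances the live iterator while it processes the I/O.

```c
#include <assert.h>

struct sketch_iter {
	unsigned long sector;	/* current start sector */
	unsigned int size;	/* remaining bytes */
};

struct sketch_bbio {
	struct sketch_iter bi_iter;	/* live iterator, advanced during I/O */
	struct sketch_iter saved;	/* snapshot taken before submission */
};

/* Take the snapshot; must be called before handing the bio down. */
void sketch_save_iter(struct sketch_bbio *b)
{
	b->saved = b->bi_iter;
}

/* Stand-in for the lower layer consuming the whole bio. */
void sketch_lower_layer_complete(struct sketch_bbio *b)
{
	b->bi_iter.sector += b->bi_iter.size >> 9;	/* 512-byte sectors */
	b->bi_iter.size = 0;
}

/* At endio time, the range this bio was responsible for. */
unsigned int sketch_endio_len(const struct sketch_bbio *b)
{
	return b->saved.size;
}
```

After completion the live iterator reports zero remaining bytes, so only the saved copy still describes the range that the endio handler has to process.
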
  8. btrfs: introduce btrfs_bio_split() helper

    This new function will handle the split of a btrfs bio, to co-operate
    with the incoming chunk mapping time bio split.
    
    This patch will introduce the following new members:
    
    - btrfs_bio::offset_to_original
      Since btrfs_bio::csum is still storing the checksum for the original
      logical bytenr, we need to know the offset between current advanced
      bio and the original logical bytenr.
    
      Thus here we need such new member.
      And the new member will fit into the existing hole between
      btrfs_bio::mirror_num and btrfs_bio::device, it should not increase
      the memory usage of btrfs_bio.
    
    - btrfs_bio::parent and btrfs_bio::orig_endio
      To record where the parent bio is and the original endio function.
    
    - btrfs_bio::is_split_bio
      To distinguish bio created by btrfs_bio_split() and
      btrfs_bio_clone*().
    
      For cloned bio, they still have their csum pointed to correct memory,
      while split bio must rely on its parent bbio to grab csum pointer.
    
    - split_bio_endio()
      Just to call the original endio function then call bio_endio() on
      the original bio.
      This will ensure the original bio is freed after all cloned bio.
    
    Currently there is no other caller utilizing above new members/functions
    yet.
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    adam900710 authored and intel-lab-lkp committed Nov 28, 2021
    b68d55c
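
The roles of offset_to_original and of the parent pointer can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the kernel implementation: the explicit `pending` counter stands in for whatever completion accounting the real code relies on, the split piece is assumed to take the front of the range, and all `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_bbio {
	struct sketch_bbio *parent;	/* NULL for the original bio */
	int pending;			/* original only: itself + live split children */
	int completed;
	uint32_t offset_to_original;	/* bytes from the original logical start */
	const uint8_t *csum;		/* only valid on the original bio */
};

void sketch_init(struct sketch_bbio *orig, const uint8_t *csum)
{
	orig->parent = NULL;
	orig->pending = 1;
	orig->completed = 0;
	orig->offset_to_original = 0;
	orig->csum = csum;
}

/* Split the first @bytes off @orig into @child (assumed front split). */
void sketch_split(struct sketch_bbio *orig, struct sketch_bbio *child,
		  uint32_t bytes)
{
	child->parent = orig;
	child->pending = 0;
	child->completed = 0;
	child->offset_to_original = orig->offset_to_original;
	child->csum = NULL;	/* must go through the parent for csums */
	orig->offset_to_original += bytes;
	orig->pending++;
}

static void sketch_put(struct sketch_bbio *orig)
{
	if (--orig->pending == 0)
		orig->completed = 1;
}

/* Completion: a child finishes itself, then drops its hold on the parent. */
void sketch_endio(struct sketch_bbio *b)
{
	if (b->parent) {
		b->completed = 1;
		sketch_put(b->parent);
	} else {
		sketch_put(b);
	}
}

/* Checksum slot for the first sector covered by @b. */
const uint8_t *sketch_csum_for(const struct sketch_bbio *b,
			       uint32_t sectorsize, uint32_t csum_size)
{
	const struct sketch_bbio *root = b->parent ? b->parent : b;

	return root->csum + (size_t)(b->offset_to_original / sectorsize) * csum_size;
}
```

In this model the original bio only reaches the completed state once both it and every split child have finished, and each bio finds its checksums by offsetting into the original's csum array.
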
  9. btrfs: move btrfs_bio_wq_end_io() calls into submit_stripe_bio()

    This is a preparation patch for the incoming chunk mapping layer bio
    split.
    
    Function btrfs_bio_wq_end_io() is going to remap bio::bi_private and
    bio::bi_end_io so that the real endio function will be executed in a
    workqueue.
    
    The problem is, remapped bio::bi_private will be a newly allocated
    memory, and after the original endio executed, the memory will be freed.
    
    This will not work well with split bio.
    
    So this patch will move all btrfs_bio_wq_end_io() call into one helper
    function, btrfs_bio_final_endio_remap(), and call that helper in
    submit_stripe_bio().
    
    This refactor also unified all data bio behaviors.
    
    Before this patch, compressed bio no matter if read or write, will
    always be delayed using workqueue.
    
    However all data write operations are already delayed using ordered
    extent, and all metadata write doesn't need any delayed execution.
    
    Thus this patch will make compressed bios follow the same data
    read/write behavior.
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    adam900710 authored and intel-lab-lkp committed Nov 28, 2021
    ee9ce07
  10. btrfs: refactor btrfs_map_bio()

    Currently in btrfs_map_bio() we call __btrfs_map_block(), then using the
    returned bioc to submit real stripes.
    
    This is fine if we're only going to handle one bio at a time.
    
    For the incoming bio split at btrfs_map_bio() time, we want to handle
    several different bios, thus we introduce a new helper,
    submit_one_mapped_range(), to handle the submission part, making it much
    easier to make it work in a loop.
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    adam900710 authored and intel-lab-lkp committed Nov 28, 2021
    29c3b1c
  11. btrfs: update a stale comment on btrfs_submit_bio_hook()

    This function is renamed to btrfs_submit_data_bio(), update the comment
    and add extra reason why it doesn't completely follow the same rule in
    btrfs_submit_data_bio().
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    adam900710 authored and intel-lab-lkp committed Nov 28, 2021
    e354a15

Commits on Nov 16, 2021

  1. Fixup merge-to-merge conflict in lzo_compress_pages

    Signed-off-by: David Sterba <dsterba@suse.com>
    kdave committed Nov 16, 2021
    279373d
  2. b712218
  3. Merge branch 'for-next-current-v5.15-20211116' into for-next-20211116

    # Conflicts:
    #	fs/btrfs/lzo.c
    kdave committed Nov 16, 2021
    85b7c01
  4. f0d739a
  5. d872c62
  6. e78b180
  7. ed93ad7
  8. btrfs: allow device add if balance is paused

    Currently paused balance precludes adding a device since they are both
    considered exclusive ops and we can have at most 1 running at a time.
    This is problematic in case a filesystem encounters an ENOSPC situation
    while balance is running, in this case the only thing the user can do
    is mount the fs with "skip_balance" which pauses balance and delete some
    data to free up space for balance. However, it should be possible to add
    a new device when balance is paused.
    
    Fix this by allowing device add to proceed when balance is paused.
    
    Signed-off-by: Nikolay Borisov <nborisov@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    lorddoskias authored and kdave committed Nov 16, 2021
    a6effc0
  9. btrfs: make device add compatible with paused balance in btrfs_exclop_start_try_lock
    
    This is needed to enable device add to work in cases when a file system
    has been mounted with 'skip_balance' mount option.
    
    Signed-off-by: Nikolay Borisov <nborisov@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    lorddoskias authored and kdave committed Nov 16, 2021
    68375f1
  10. btrfs: introduce BTRFS_EXCLOP_BALANCE_PAUSED exclusive state

    The current set of exclusive operation states is not sufficient to handle
    all practical use cases. In particular, there is a need to be able to add
    a device to a filesystem that has a paused balance. Currently there is no
    way to distinguish between a running and a paused balance. Fix this by
    introducing BTRFS_EXCLOP_BALANCE_PAUSED, which is going to be set on 2
    occasions:
    
    1. When a filesystem is mounted with skip_balance and there is an
       unfinished balance, it will now be put into the BALANCE_PAUSED state
       instead of simply the BALANCE state.
    
    2. When a running balance is paused.
    
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Nikolay Borisov <nborisov@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    lorddoskias authored and kdave committed Nov 16, 2021
    aa7a001
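
The effect of a separate paused state can be illustrated with a tiny state model. This is a userspace sketch only: the enum values and `sketch_*` functions are hypothetical, and the real code additionally needs locking that is left out here.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_exclop {
	SKETCH_EXCLOP_NONE,
	SKETCH_EXCLOP_BALANCE,
	SKETCH_EXCLOP_BALANCE_PAUSED,
	SKETCH_EXCLOP_DEV_ADD,
};

struct sketch_fs {
	enum sketch_exclop exclusive_operation;
};

/* Start an exclusive operation only if nothing else is recorded. */
bool sketch_exclop_start(struct sketch_fs *fs, enum sketch_exclop op)
{
	if (fs->exclusive_operation != SKETCH_EXCLOP_NONE)
		return false;
	fs->exclusive_operation = op;
	return true;
}

/* Pausing a running balance moves it to the distinct paused state. */
void sketch_balance_pause(struct sketch_fs *fs)
{
	if (fs->exclusive_operation == SKETCH_EXCLOP_BALANCE)
		fs->exclusive_operation = SKETCH_EXCLOP_BALANCE_PAUSED;
}

/* Device add is acceptable when idle or when balance is merely paused. */
bool sketch_dev_add_allowed(const struct sketch_fs *fs)
{
	return fs->exclusive_operation == SKETCH_EXCLOP_NONE ||
	       fs->exclusive_operation == SKETCH_EXCLOP_BALANCE_PAUSED;
}
```

Without the extra state, a paused balance would look exactly like a running one and the device add check would have nothing to tell them apart.
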
  11. btrfs: change root to fs_info for btrfs_reserve_metadata_bytes

    We used to need the root for btrfs_reserve_metadata_bytes to check the
    orphan cleanup state, but we no longer need that, we simply need the
    fs_info.  Change btrfs_reserve_metadata_bytes() to use the fs_info, and
    change both btrfs_block_rsv_refill() and btrfs_block_rsv_add() to do the
    same as they simply call btrfs_reserve_metadata_bytes() and then
    manipulate the block_rsv that is being used.
    
    Reviewed-by: Nikolay Borisov <nborisov@suse.com>
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    josefbacik authored and kdave committed Nov 16, 2021
    11d02f3
  12. btrfs: get rid of root->orphan_cleanup_state

    Now that we don't care about the stage of the orphan_cleanup_state,
    simply replace it with a bit on ->state to make sure we don't call the
    orphan cleanup every time we wander into this root.
    
    Reviewed-by: Nikolay Borisov <nborisov@suse.com>
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    josefbacik authored and kdave committed Nov 16, 2021
    be4b570
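
Replacing a multi-stage state field with a single "already done" bit can be sketched like this. The code is a userspace illustration: the bit number, the structure and the non-atomic `sketch_test_and_set_bit()` are hypothetical stand-ins, whereas the kernel would use its atomic bit operations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit number inside the state word. */
#define SKETCH_ROOT_ORPHAN_CLEANUP 0

struct sketch_root {
	unsigned long state;	/* bit flags */
	int cleanups_run;	/* counts how often the real work ran */
};

/* Non-atomic stand-in: sets the bit and returns its previous value. */
bool sketch_test_and_set_bit(int nr, unsigned long *addr)
{
	unsigned long mask = 1UL << nr;
	bool old = (*addr & mask) != 0;

	*addr |= mask;
	return old;
}

/* Runs the cleanup at most once per root. */
void sketch_orphan_cleanup(struct sketch_root *root)
{
	if (sketch_test_and_set_bit(SKETCH_ROOT_ORPHAN_CLEANUP, &root->state))
		return;
	root->cleanups_run++;
}
```

Calling the cleanup repeatedly on the same root performs the work only the first time.
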
  13. btrfs: remove global rsv stealing logic for orphan cleanup

    This is very old code before we were stealing from the global reserve
    during evict.  We have proper ways to steal from the global reserve
    while we're evicting, so rip out this code as it's no longer necessary.
    
    Reviewed-by: Nikolay Borisov <nborisov@suse.com>
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    josefbacik authored and kdave committed Nov 16, 2021
    22b461a
  14. btrfs: make BTRFS_RESERVE_FLUSH_EVICT use the global rsv stealing code

    I forgot to convert this over when I introduced the global reserve
    stealing code to the space flushing code.  Evict was simply trying to
    make its reservation and then if it failed it would steal from the
    global rsv, which is racy because it's outside of the normal ticketing
    code.
    
    Fix this by setting ticket->steal if we are BTRFS_RESERVE_FLUSH_EVICT,
    and then make the priority flushing path do the steal for us.
    
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    josefbacik authored and kdave committed Nov 16, 2021
    5a92f59
  15. btrfs: check ticket->steal in steal_from_global_block_rsv

    We're going to use this helper in the priority flushing loop, move this
    check into the helper to simplify the logic.
    
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    josefbacik authored and kdave committed Nov 16, 2021
    8ec9702
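
Moving the ticket->steal check into the helper means callers no longer have to test the flag themselves. The following userspace sketch shows that shape. It is deliberately simplified: the structures and `sketch_*` names are hypothetical, and the real helper applies further conditions on the global reserve that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_rsv {
	uint64_t reserved;	/* bytes currently held by the global reserve */
};

struct sketch_ticket {
	uint64_t bytes;		/* bytes still needed; 0 means granted */
	bool steal;		/* may this ticket take from the global reserve? */
};

/*
 * Try to satisfy @ticket from @global. The steal permission is checked
 * inside the helper, so every caller gets the same behaviour.
 */
bool sketch_steal_from_global_rsv(struct sketch_rsv *global,
				  struct sketch_ticket *ticket)
{
	if (!ticket->steal)
		return false;
	if (global->reserved < ticket->bytes)
		return false;

	global->reserved -= ticket->bytes;
	ticket->bytes = 0;
	return true;
}
```

A ticket without the steal flag leaves the reserve untouched, while a ticket with the flag is granted as long as the reserve holds enough bytes.
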
  16. btrfs: check for priority ticket granting before flushing

    Since we're dropping locks before we enter the priority flushing loops
    we could have had our ticket granted before we got the space_info->lock.
    So add this check to avoid doing some extra flushing in the priority
    flushing cases.
    
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    josefbacik authored and kdave committed Nov 16, 2021
    8a96b88
  17. btrfs: handle priority ticket failures in their respective helpers

    Currently the error case for the priority tickets is handled where we
    deal with all of the tickets, priority and non-priority.  This is ok in
    general, but it makes for some awkward locking.  We take and drop the
    space_info->lock back to back because of these different types of
    tickets.
    
    Rework the code to handle priority ticket failures in their respective
    helpers.  This allows us to be less wonky with our space_info->lock
    usage, and means that the main handler simply has to check
    ticket->error, as the ticket is guaranteed to be off any list and
    completely handled by the time it exits one of the handlers.
    
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    josefbacik authored and kdave committed Nov 16, 2021
    9f270c7
  18. btrfs: deprecate BTRFS_IOC_BALANCE ioctl

    The v2 balance ioctl was introduced more than 9 years ago. Users of
    the old v1 ioctl should have migrated to it long ago. It's time we
    deprecate it and eventually remove it.
    
    The only known user is in btrfs-progs that tries v1 as a fallback in
    case v2 is not supported. This is not necessary anymore.
    
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Reviewed-by: Anand Jain <anand.jain@oracle.com>
    Signed-off-by: Nikolay Borisov <nborisov@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    lorddoskias authored and kdave committed Nov 16, 2021
    6c405b2
  19. btrfs: make 1-bit bit-fields of scrub_page unsigned int

    The bitfields have_csum and io_error are currently signed which is not
    recommended as the representation is an implementation defined
    behaviour. Fix this by making the bit-fields unsigned ints.
    
    Fixes: 2c36395 ("btrfs: scrub: remove the anonymous structure from scrub_page")
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Colin Ian King authored and kdave committed Nov 16, 2021
    d08e38b
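
The point about 1-bit bit-fields can be demonstrated in plain C. The sketch below uses hypothetical structure names rather than the real scrub_page layout. Only the unsigned case has a guaranteed result. What the signed variant returns depends on the compiler, and it is commonly -1 rather than 1.

```c
#include <assert.h>

/* Unsigned 1-bit fields can hold exactly 0 and 1. */
struct sketch_flags_unsigned {
	unsigned int have_csum:1;
	unsigned int io_error:1;
};

/* Whether a plain int bit-field is signed is implementation-defined. */
struct sketch_flags_signed {
	int have_csum:1;
};

/* Always returns 1: the stored value is representable. */
int sketch_read_unsigned(void)
{
	struct sketch_flags_unsigned f = { .have_csum = 1, .io_error = 0 };

	return f.have_csum;
}

/*
 * Stores 1 into a possibly signed 1-bit field and reads it back. The
 * result is not portable, so a test such as "flag == 1" may fail.
 */
int sketch_read_signed(int value)
{
	struct sketch_flags_signed f;

	f.have_csum = value;
	return f.have_csum;
}
```

Code that compares such a flag against 1 is therefore only reliable with the unsigned declaration.
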
  20. btrfs: check-integrity: fix a warning on write caching disabled disk

    When a disk has write caching disabled, we skip submission of a bio with
    flush and sync requests before writing the superblock, since it's not
    needed. However when the integrity checker is enabled, this results in
    reports that there are metadata blocks referred by a superblock that
    were not properly flushed. So, only when the integrity checker is enabled,
    don't skip the bio submission. This keeps things simple, since the checker
    is a debug tool and not meant for use in non-debug builds.
    
    fstests/btrfs/220 triggers a check-integrity warning like the following
    when CONFIG_BTRFS_FS_CHECK_INTEGRITY=y and the disk has WCE=0.
    
      btrfs: attempt to write superblock which references block M @5242880 (sdb2/5242880/0) which is not flushed out of disk's write cache (block flush_gen=1, dev->flush_gen=0)!
      ------------[ cut here ]------------
      WARNING: CPU: 28 PID: 843680 at fs/btrfs/check-integrity.c:2196 btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs]
      CPU: 28 PID: 843680 Comm: umount Not tainted 5.15.0-0.rc5.39.el8.x86_64 #1
      Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
      RIP: 0010:btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs]
      RSP: 0018:ffffb642afb47940 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
      RDX: 00000000ffffffff RSI: ffff8b722fc97d00 RDI: ffff8b722fc97d00
      RBP: ffff8b5601c00000 R08: 0000000000000000 R09: c0000000ffff7fff
      R10: 0000000000000001 R11: ffffb642afb476f8 R12: ffffffffffffffff
      R13: ffffb642afb47974 R14: ffff8b5499254c00 R15: 0000000000000003
      FS:  00007f00a06d4080(0000) GS:ffff8b722fc80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fff5cff5ff0 CR3: 00000001c0c2a006 CR4: 00000000001706e0
      Call Trace:
       btrfsic_process_written_block+0x2f7/0x850 [btrfs]
       __btrfsic_submit_bio.part.19+0x310/0x330 [btrfs]
       ? bio_associate_blkg_from_css+0xa4/0x2c0
       btrfsic_submit_bio+0x18/0x30 [btrfs]
       write_dev_supers+0x81/0x2a0 [btrfs]
       ? find_get_pages_range_tag+0x219/0x280
       ? pagevec_lookup_range_tag+0x24/0x30
       ? __filemap_fdatawait_range+0x6d/0xf0
       ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
       ? find_first_extent_bit+0x9b/0x160 [btrfs]
       ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
       write_all_supers+0x1b3/0xa70 [btrfs]
       ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
       btrfs_commit_transaction+0x59d/0xac0 [btrfs]
       close_ctree+0x11d/0x339 [btrfs]
       generic_shutdown_super+0x71/0x110
       kill_anon_super+0x14/0x30
       btrfs_kill_super+0x12/0x20 [btrfs]
       deactivate_locked_super+0x31/0x70
       cleanup_mnt+0xb8/0x140
       task_work_run+0x6d/0xb0
       exit_to_user_mode_prepare+0x1f0/0x200
       syscall_exit_to_user_mode+0x12/0x30
       do_syscall_64+0x46/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f009f711dfb
      RSP: 002b:00007fff5cff7928 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
      RAX: 0000000000000000 RBX: 000055b68c6c9970 RCX: 00007f009f711dfb
      RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b68c6c9b50
      RBP: 0000000000000000 R08: 000055b68c6ca900 R09: 00007f009f795580
      R10: 0000000000000000 R11: 0000000000000246 R12: 000055b68c6c9b50
      R13: 00007f00a04bf184 R14: 0000000000000000 R15: 00000000ffffffff
      ---[ end trace 2c4b82abcef9eec4 ]---
      S-65536(sdb2/65536/1)
       -->
      M-1064960(sdb2/1064960/1)
    
    Reviewed-by: Filipe Manana <fdmanana@gmail.com>
    Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    wangyugui-e16 authored and kdave committed Nov 16, 2021
    a91cf0f
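
The decision described in this commit message boils down to a small predicate. The sketch below expresses it as a runtime check with a hypothetical function name. Treating the integrity checker as a runtime flag is an assumption made for the example, since in the kernel it is controlled by the CONFIG_BTRFS_FS_CHECK_INTEGRITY build option.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Should the flush bio be submitted before writing the superblock?
 * - With the integrity checker enabled: always, so the checker sees it.
 * - Otherwise: only if the disk actually has a volatile write cache.
 */
bool sketch_need_flush_bio(bool write_cache_enabled,
			   bool integrity_check_enabled)
{
	if (integrity_check_enabled)
		return true;
	return write_cache_enabled;
}
```
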
  21. btrfs: silence lockdep when reading chunk tree during mount

    Often some test cases like btrfs/161 trigger lockdep splats that complain
    about possible unsafe lock scenario due to the fact that during mount,
    when reading the chunk tree we end up calling blkdev_get_by_path() while
    holding a read lock on a leaf of the chunk tree. That produces a lockdep
    splat like the following:
    
    [ 3653.683975] ======================================================
    [ 3653.685148] WARNING: possible circular locking dependency detected
    [ 3653.686301] 5.15.0-rc7-btrfs-next-103 #1 Not tainted
    [ 3653.687239] ------------------------------------------------------
    [ 3653.688400] mount/447465 is trying to acquire lock:
    [ 3653.689320] ffff8c6b0c76e528 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_get_by_dev.part.0+0xe7/0x320
    [ 3653.691054]
                   but task is already holding lock:
    [ 3653.692155] ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs]
    [ 3653.693978]
                   which lock already depends on the new lock.
    
    [ 3653.695510]
                   the existing dependency chain (in reverse order) is:
    [ 3653.696915]
                   -> #3 (btrfs-chunk-00){++++}-{3:3}:
    [ 3653.698053]        down_read_nested+0x4b/0x140
    [ 3653.698893]        __btrfs_tree_read_lock+0x24/0x110 [btrfs]
    [ 3653.699988]        btrfs_read_lock_root_node+0x31/0x40 [btrfs]
    [ 3653.701205]        btrfs_search_slot+0x537/0xc00 [btrfs]
    [ 3653.702234]        btrfs_insert_empty_items+0x32/0x70 [btrfs]
    [ 3653.703332]        btrfs_init_new_device+0x563/0x15b0 [btrfs]
    [ 3653.704439]        btrfs_ioctl+0x2110/0x3530 [btrfs]
    [ 3653.705405]        __x64_sys_ioctl+0x83/0xb0
    [ 3653.706215]        do_syscall_64+0x3b/0xc0
    [ 3653.706990]        entry_SYSCALL_64_after_hwframe+0x44/0xae
    [ 3653.708040]
                   -> #2 (sb_internal#2){.+.+}-{0:0}:
    [ 3653.708994]        lock_release+0x13d/0x4a0
    [ 3653.709533]        up_write+0x18/0x160
    [ 3653.710017]        btrfs_sync_file+0x3f3/0x5b0 [btrfs]
    [ 3653.710699]        __loop_update_dio+0xbd/0x170 [loop]
    [ 3653.711360]        lo_ioctl+0x3b1/0x8a0 [loop]
    [ 3653.711929]        block_ioctl+0x48/0x50
    [ 3653.712442]        __x64_sys_ioctl+0x83/0xb0
    [ 3653.712991]        do_syscall_64+0x3b/0xc0
    [ 3653.713519]        entry_SYSCALL_64_after_hwframe+0x44/0xae
    [ 3653.714233]
                   -> #1 (&lo->lo_mutex){+.+.}-{3:3}:
    [ 3653.715026]        __mutex_lock+0x92/0x900
    [ 3653.715648]        lo_open+0x28/0x60 [loop]
    [ 3653.716275]        blkdev_get_whole+0x28/0x90
    [ 3653.716867]        blkdev_get_by_dev.part.0+0x142/0x320
    [ 3653.717537]        blkdev_open+0x5e/0xa0
    [ 3653.718043]        do_dentry_open+0x163/0x390
    [ 3653.718604]        path_openat+0x3f0/0xa80
    [ 3653.719128]        do_filp_open+0xa9/0x150
    [ 3653.719652]        do_sys_openat2+0x97/0x160
    [ 3653.720197]        __x64_sys_openat+0x54/0x90
    [ 3653.720766]        do_syscall_64+0x3b/0xc0
    [ 3653.721285]        entry_SYSCALL_64_after_hwframe+0x44/0xae
    [ 3653.721986]
                   -> #0 (&disk->open_mutex){+.+.}-{3:3}:
    [ 3653.722775]        __lock_acquire+0x130e/0x2210
    [ 3653.723348]        lock_acquire+0xd7/0x310
    [ 3653.723867]        __mutex_lock+0x92/0x900
    [ 3653.724394]        blkdev_get_by_dev.part.0+0xe7/0x320
    [ 3653.725041]        blkdev_get_by_path+0xb8/0xd0
    [ 3653.725614]        btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
    [ 3653.726332]        open_fs_devices+0xd7/0x2c0 [btrfs]
    [ 3653.726999]        btrfs_read_chunk_tree+0x3ad/0x870 [btrfs]
    [ 3653.727739]        open_ctree+0xb8e/0x17bf [btrfs]
    [ 3653.728384]        btrfs_mount_root.cold+0x12/0xde [btrfs]
    [ 3653.729130]        legacy_get_tree+0x30/0x50
    [ 3653.729676]        vfs_get_tree+0x28/0xc0
    [ 3653.730192]        vfs_kern_mount.part.0+0x71/0xb0
    [ 3653.730800]        btrfs_mount+0x11d/0x3a0 [btrfs]
    [ 3653.731427]        legacy_get_tree+0x30/0x50
    [ 3653.731970]        vfs_get_tree+0x28/0xc0
    [ 3653.732486]        path_mount+0x2d4/0xbe0
    [ 3653.732997]        __x64_sys_mount+0x103/0x140
    [ 3653.733560]        do_syscall_64+0x3b/0xc0
    [ 3653.734080]        entry_SYSCALL_64_after_hwframe+0x44/0xae
    [ 3653.734782]
                   other info that might help us debug this:
    
    [ 3653.735784] Chain exists of:
                     &disk->open_mutex --> sb_internal#2 --> btrfs-chunk-00
    
    [ 3653.737123]  Possible unsafe locking scenario:
    
    [ 3653.737865]        CPU0                    CPU1
    [ 3653.738435]        ----                    ----
    [ 3653.739007]   lock(btrfs-chunk-00);
    [ 3653.739449]                                lock(sb_internal#2);
    [ 3653.740193]                                lock(btrfs-chunk-00);
    [ 3653.740955]   lock(&disk->open_mutex);
    [ 3653.741431]
                    *** DEADLOCK ***
    
    [ 3653.742176] 3 locks held by mount/447465:
    [ 3653.742739]  #0: ffff8c6acf85c0e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xd5/0x3b0
    [ 3653.744114]  #1: ffffffffc0b28f70 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x59/0x870 [btrfs]
    [ 3653.745563]  #2: ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs]
    [ 3653.747066]
                   stack backtrace:
    [ 3653.747723] CPU: 4 PID: 447465 Comm: mount Not tainted 5.15.0-rc7-btrfs-next-103 #1
    [ 3653.748873] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
    [ 3653.750592] Call Trace:
    [ 3653.750967]  dump_stack_lvl+0x57/0x72
    [ 3653.751526]  check_noncircular+0xf3/0x110
    [ 3653.752136]  ? stack_trace_save+0x4b/0x70
    [ 3653.752748]  __lock_acquire+0x130e/0x2210
    [ 3653.753356]  lock_acquire+0xd7/0x310
    [ 3653.753898]  ? blkdev_get_by_dev.part.0+0xe7/0x320
    [ 3653.754596]  ? lock_is_held_type+0xe8/0x140
    [ 3653.755125]  ? blkdev_get_by_dev.part.0+0xe7/0x320
    [ 3653.755729]  ? blkdev_get_by_dev.part.0+0xe7/0x320
    [ 3653.756338]  __mutex_lock+0x92/0x900
    [ 3653.756794]  ? blkdev_get_by_dev.part.0+0xe7/0x320
    [ 3653.757400]  ? do_raw_spin_unlock+0x4b/0xa0
    [ 3653.757930]  ? _raw_spin_unlock+0x29/0x40
    [ 3653.758437]  ? bd_prepare_to_claim+0x129/0x150
    [ 3653.758999]  ? trace_module_get+0x2b/0xd0
    [ 3653.759508]  ? try_module_get.part.0+0x50/0x80
    [ 3653.760072]  blkdev_get_by_dev.part.0+0xe7/0x320
    [ 3653.760661]  ? devcgroup_check_permission+0xc1/0x1f0
    [ 3653.761288]  blkdev_get_by_path+0xb8/0xd0
    [ 3653.761797]  btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
    [ 3653.762454]  open_fs_devices+0xd7/0x2c0 [btrfs]
    [ 3653.763055]  ? clone_fs_devices+0x8f/0x170 [btrfs]
    [ 3653.763689]  btrfs_read_chunk_tree+0x3ad/0x870 [btrfs]
    [ 3653.764370]  ? kvm_sched_clock_read+0x14/0x40
    [ 3653.764922]  open_ctree+0xb8e/0x17bf [btrfs]
    [ 3653.765493]  ? super_setup_bdi_name+0x79/0xd0
    [ 3653.766043]  btrfs_mount_root.cold+0x12/0xde [btrfs]
    [ 3653.766780]  ? rcu_read_lock_sched_held+0x3f/0x80
    [ 3653.767488]  ? kfree+0x1f2/0x3c0
    [ 3653.767979]  legacy_get_tree+0x30/0x50
    [ 3653.768548]  vfs_get_tree+0x28/0xc0
    [ 3653.769076]  vfs_kern_mount.part.0+0x71/0xb0
    [ 3653.769718]  btrfs_mount+0x11d/0x3a0 [btrfs]
    [ 3653.770381]  ? rcu_read_lock_sched_held+0x3f/0x80
    [ 3653.771086]  ? kfree+0x1f2/0x3c0
    [ 3653.771574]  legacy_get_tree+0x30/0x50
    [ 3653.772136]  vfs_get_tree+0x28/0xc0
    [ 3653.772673]  path_mount+0x2d4/0xbe0
    [ 3653.773201]  __x64_sys_mount+0x103/0x140
    [ 3653.773793]  do_syscall_64+0x3b/0xc0
    [ 3653.774333]  entry_SYSCALL_64_after_hwframe+0x44/0xae
    [ 3653.775094] RIP: 0033:0x7f648bc45aaa
    
    This happens because btrfs_read_chunk_tree(), which is called only during
    mount, ends up acquiring the open_mutex of a block device while holding a
    read lock on a leaf of the chunk tree, while other paths need to acquire
    other locks before locking extent buffers of the chunk tree.
    
    Since at mount time when we call btrfs_read_chunk_tree() we know that
    we don't have other tasks running in parallel and modifying the chunk
    tree, we can simply skip locking of chunk tree extent buffers. So do
    that and move the assertion that checks the fs is not yet mounted to the
    top block of btrfs_read_chunk_tree(), with a comment before doing it.
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    fdmanana authored and kdave committed Nov 16, 2021
    4d9380e
  22. btrfs: fix memory ordering between normal and ordered work functions

    Ordered work functions aren't guaranteed to be handled by the same thread
    which executed the normal work functions. Execution between normal and
    ordered functions is synchronized only via the WORK_DONE_BIT;
    unfortunately, the bitops used don't guarantee any ordering whatsoever.
    
    This manifested as seemingly inexplicable crashes on ARM64, where
    async_chunk::inode is seen as non-NULL in async_cow_submit, which causes
    submit_compressed_extents to be called, and the crash occurs because
    async_chunk::inode suddenly became NULL. The call trace was similar to:
    
        pc : submit_compressed_extents+0x38/0x3d0
        lr : async_cow_submit+0x50/0xd0
        sp : ffff800015d4bc20
    
        <registers omitted for brevity>
    
        Call trace:
         submit_compressed_extents+0x38/0x3d0
         async_cow_submit+0x50/0xd0
         run_ordered_work+0xc8/0x280
         btrfs_work_helper+0x98/0x250
         process_one_work+0x1f0/0x4ac
         worker_thread+0x188/0x504
         kthread+0x110/0x114
         ret_from_fork+0x10/0x18
    
    Fix this by adding respective barrier calls which ensure that all
    accesses preceding setting of WORK_DONE_BIT are strictly ordered before
    setting the flag. At the same time add a read barrier after reading of
    WORK_DONE_BIT in run_ordered_work which ensures all subsequent loads
    would be strictly ordered after reading the bit. This in turn ensures
    that all accesses before WORK_DONE_BIT are strictly ordered before any
    access that can occur in ordered_func.
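
    The ordering requirement described above can be illustrated with a
    userspace C11 analogue. This is a sketch under stated assumptions, not
    the kernel patch: it uses release/acquire atomics instead of the
    kernel's barrier calls, and struct work_item, finish_normal_work() and
    try_run_ordered() are hypothetical names introduced for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a work item; not the btrfs structure. */
struct work_item {
    int payload;       /* plain data written by the normal work function */
    atomic_bool done;  /* plays the role of WORK_DONE_BIT */
};

/* Normal work side: write the data first, then publish the flag. */
static void finish_normal_work(struct work_item *w, int value)
{
    w->payload = value;
    /* Release: the store above is ordered before the flag becomes visible. */
    atomic_store_explicit(&w->done, true, memory_order_release);
}

/* Ordered side: consume the data only after observing the flag. */
static bool try_run_ordered(struct work_item *w, int *out)
{
    /* Acquire: the load below cannot be reordered before this flag read. */
    if (!atomic_load_explicit(&w->done, memory_order_acquire))
        return false;
    *out = w->payload;
    return true;
}
```

    If both operations were relaxed instead, a weakly ordered CPU would be
    free to make the flag visible before the payload, which is the kind of
    reordering the commit message describes.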
    
    Reported-by: Chris Murphy <lists@colorremedies.com>
    Fixes: 08a9ff3 ("btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue")
    CC: stable@vger.kernel.org # 4.4+
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=2011928
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Tested-by: Chris Murphy <chris@colorremedies.com>
    Signed-off-by: Nikolay Borisov <nborisov@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    lorddoskias authored and kdave committed Nov 16, 2021
    45da9c1
  23. btrfs: fix a out-of-bound access in copy_compressed_data_to_page()

    [BUG]
    The following script can cause btrfs to crash:
    
      $ mount -o compress-force=lzo $DEV /mnt
      $ dd if=/dev/urandom of=/mnt/foo bs=4k count=1
      $ sync
    
    The call trace looks like this:
    
      general protection fault, probably for non-canonical address 0xe04b37fccce3b000: 0000 [#1] PREEMPT SMP NOPTI
      CPU: 5 PID: 164 Comm: kworker/u20:3 Not tainted 5.15.0-rc7-custom+ #4
      Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
      RIP: 0010:__memcpy+0x12/0x20
      Call Trace:
       lzo_compress_pages+0x236/0x540 [btrfs]
       btrfs_compress_pages+0xaa/0xf0 [btrfs]
       compress_file_range+0x431/0x8e0 [btrfs]
       async_cow_start+0x12/0x30 [btrfs]
       btrfs_work_helper+0xf6/0x3e0 [btrfs]
       process_one_work+0x294/0x5d0
       worker_thread+0x55/0x3c0
       kthread+0x140/0x170
       ret_from_fork+0x22/0x30
      ---[ end trace 63c3c0f131e61982 ]---
    
    [CAUSE]
    In lzo_compress_pages(), parameter @out_pages is not only an output
    parameter (for the number of compressed pages), but also an input
    parameter, as the upper limit of compressed pages we can utilize.
    
    In commit d408880 ("btrfs: subpage: make lzo_compress_pages()
    compatible"), the refactoring doesn't take @out_pages as an input, thus
    completely ignoring the limit.
    
    In the compress-force case, we can hit incompressible data whose
    compressed size goes beyond the page limit, which causes the above
    crash.
    
    [FIX]
    Save @out_pages as @max_nr_page, and pass it to lzo_compress_pages(),
    and check if we're beyond the limit before accessing the pages.
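
    The check described in [FIX] can be sketched in userspace C as follows.
    This is a minimal illustration, not the btrfs code: the helper
    copy_to_pages_bounded(), the SKETCH_PAGE_SIZE constant, and the choice
    of -E2BIG as the error value are assumptions made for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/*
 * Copy len bytes into out_pages[], starting at byte offset *cur_out.
 * Returns 0 on success, or -E2BIG if the copy would need a page at or
 * beyond max_nr_page. Hypothetical helper, not the btrfs function.
 */
static int copy_to_pages_bounded(unsigned char **out_pages, size_t max_nr_page,
                                 size_t *cur_out, const unsigned char *src,
                                 size_t len)
{
    while (len > 0) {
        size_t page_idx = *cur_out / SKETCH_PAGE_SIZE;
        size_t page_off = *cur_out % SKETCH_PAGE_SIZE;
        size_t chunk = SKETCH_PAGE_SIZE - page_off;

        /* Check the limit before touching the page, not after. */
        if (page_idx >= max_nr_page)
            return -E2BIG;
        if (chunk > len)
            chunk = len;
        memcpy(out_pages[page_idx] + page_off, src, chunk);
        src += chunk;
        len -= chunk;
        *cur_out += chunk;
    }
    return 0;
}
```

    With the limit passed in explicitly, output that grows past the
    allotted pages makes the helper return an error instead of writing
    through an out-of-range page pointer.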
    
    Note: this also fixes crash on 32bit architectures that was suspected to
    be caused by merge of btrfs patches to 5.16-rc1. Reported in
    https://lore.kernel.org/all/20211104115001.GU20319@twin.jikos.cz/ .
    
    Reported-by: Omar Sandoval <osandov@fb.com>
    Fixes: d408880 ("btrfs: subpage: make lzo_compress_pages() compatible")
    Reviewed-by: Omar Sandoval <osandov@fb.com>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    [ add note ]
    Signed-off-by: David Sterba <dsterba@suse.com>
    adam900710 authored and kdave committed Nov 16, 2021
    6f019c0
  24. btrfs: fix a out-of-boundary access for copy_compressed_data_to_page()

    [BUG]
    The following script can cause btrfs to crash:
    
     mount -o compress-force=lzo $DEV /mnt
     dd if=/dev/urandom of=/mnt/foo bs=4k count=1
     sync
    
    The call trace looks like this:
    
     general protection fault, probably for non-canonical address 0xe04b37fccce3b000: 0000 [#1] PREEMPT SMP NOPTI
     CPU: 5 PID: 164 Comm: kworker/u20:3 Not tainted 5.15.0-rc7-custom+ #4
     Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
     RIP: 0010:__memcpy+0x12/0x20
     Call Trace:
      lzo_compress_pages+0x236/0x540 [btrfs]
      btrfs_compress_pages+0xaa/0xf0 [btrfs]
      compress_file_range+0x431/0x8e0 [btrfs]
      async_cow_start+0x12/0x30 [btrfs]
      btrfs_work_helper+0xf6/0x3e0 [btrfs]
      process_one_work+0x294/0x5d0
      worker_thread+0x55/0x3c0
      kthread+0x140/0x170
      ret_from_fork+0x22/0x30
     ---[ end trace 63c3c0f131e61982 ]---
    
    [CAUSE]
    In lzo_compress_pages(), parameter @out_pages is not only an output
    parameter (for the number of compressed pages), but also an input
    parameter, as the upper limit of compressed pages we can utilize.
    
    In commit d408880 ("btrfs: subpage: make lzo_compress_pages()
    compatible"), the refactor doesn't take @out_pages as an input, thus
    completely ignoring the limit.
    
    In the compress-force case, we can hit incompressible data whose
    compressed size goes beyond the page limit, which causes the above crash.
    
    [FIX]
    Save @out_pages as @max_nr_page, and pass it to lzo_compress_pages(),
    and check if we're beyond the limit before accessing the pages.
    
    Reported-by: Omar Sandoval <osandov@fb.com>
    Fixes: d408880 ("btrfs: subpage: make lzo_compress_pages() compatible")
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Reviewed-by: Omar Sandoval <osandov@fb.com>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    adam900710 authored and kdave committed Nov 16, 2021
    bf9cda0