Skip to content
Permalink
Anand-Jain/btr…

Commits on Mar 29, 2016

  1. btrfs: check device for critical errors and mark failed

    Write and Flush errors are considered as critical errors,
    upon which the device will be brought offline and marked as
    failed. Write and Flush errors are identified using device
    error statistics.
    
    Signed-off-by: Anand Jain <anand.jain@oracle.com>
    
    btrfs: check for failed device and hot replace
    
    This patch creates casualty_kthread to check for the failed
    devices, and triggers device replace.
    
    Signed-off-by: Anand Jain <anand.jain@oracle.com>
    asj authored and fengguang committed Mar 29, 2016
  2. btrfs: introduce helper functions to perform hot replace

    Hot replace / auto replace is important volume manager feature
    and is critical to the data center operations, so that the degraded
    volume can be brought back to a healthy state at the earliest and
    without manual intervention.
    
    This modifies the existing replace code to suite the need of auto
    replace, in the long run I hope both the codes to be merged.
    
    Signed-off-by: Anand Jain <anand.jain@oracle.com>
    asj authored and fengguang committed Mar 29, 2016
  3. btrfs: provide framework to get and put a spare device

    This adds functions to get and put a spare device from the list.
    So that hot repace code can pick a spare device when needed.
    
    Signed-off-by: Anand Jain <anand.jain@oracle.com>
    asj authored and fengguang committed Mar 29, 2016
  4. btrfs: support btrfs dev scan for spare device

    When the user or system calls the BTRFS_IOC_SCAN_DEV,
    ioctl this patch will make sure it is added to the device
    list and set it as spare.
    
    This operation will be same when BTRFS_IOC_DEVICES_READY
    as well since BTRFS_IOC_DEVICES_READY ioctl has been doing
    that by legacy.
    
    Signed-off-by: Anand Jain <anand.jain@oracle.com>
    asj authored and fengguang committed Mar 29, 2016
  5. btrfs: add check not to mount a spare device

    Spare devices can be scanned but shouldn't be mountable.
    
    Signed-off-by: Anand Jain <anand.jain@oracle.com>
    asj authored and fengguang committed Mar 29, 2016
  6. btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV

    Add BTRFS_FEATURE_INCOMPAT_SPARE_DEV (400) flag to identify
    a spare device.
    
    Along with this it checks in the mount context that a spare
    device will fail to mount.  As spare devices aren't mountable.
    
    Signed-off-by: Anand Jain <anand.jain@oracle.com>
    asj authored and fengguang committed Mar 29, 2016
  7. btrfs: introduce device dynamic state transition to offline or failed

    Need device forced offline/failed feature for the following reasons,
    1) a. it can be reported that device has failed when it does
       b. close the device when it goes offline so that blocklayer can
          cleanup
    2) identify the candidate for the auto replace
    3) avoid further commit error reported against the failing device and
    4) a device in the multi device btrfs may go offline from the system
       (but as of now in in some system config btrfs gets unmounted in this
        context, which is not a correct behavior)
    
    Signed-off-by: Anand Jain <anand.jain@oracle.com>
    asj authored and fengguang committed Mar 29, 2016
  8. btrfs: Cleanup num_tolerated_disk_barrier_failures

    As we use per-chunk degradable check, now the global
    num_tolerated_disk_barrier_failures is of no use. So cleanup it.
    
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    
    [Btrfs: resolve conflict to apply 'btrfs: Cleanup num_tolerated_disk_barrier_failures']
    Signed-off-by: Anand Jain <anand.jain@oracle.com>
    Qu Wenruo authored and fengguang committed Mar 29, 2016
  9. btrfs: Allow barrier_all_devices to do per-chunk device check

    The last user of num_tolerated_disk_barrier_failures is
    barrier_all_devices(). But it's can be easily changed to new per-chunk
    degradable check framework.
    
    Now btrfs_device will have two extra members, representing send/wait
    error, set at write_dev_flush() time. And then check it in a similar but
    more accurate behavior than old code.
    
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Qu Wenruo authored and fengguang committed Mar 29, 2016
  10. btrfs: Do per-chunk degraded check for remount

    Just the same for mount time check, use new btrfs_check_degraded() to do
    per chunk check.
    
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    
    Btrfs: use btrfs_error instead of btrfs_err during remount
    
    Signed-off-by: Anand Jain <anand.jain@oracle.com>
    Qu Wenruo authored and fengguang committed Mar 29, 2016
  11. btrfs: Do per-chunk check for mount time check

    Now use the btrfs_check_degraded() to do mount time degraded check.
    
    With this patch, now we can mount with the following case:
     # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
     # wipefs -a /dev/sdc
     # mount /dev/sdb /mnt/btrfs -o degraded
     As the single data chunk is only in sdb, so it's OK to mount as degraded,
     as missing one device is OK for RAID1.
    
    But still fail with the following case as expected:
     # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
     # wipefs -a /dev/sdb
     # mount /dev/sdc /mnt/btrfs -o degraded
     As the data chunk is only in sdb, so it's not OK to mount it as degraded.
    
    Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
    Reported-by: Anand Jain <anand.jain@oracle.com>
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    
    [Btrfs: use btrfs_error instead of btrfs_err during mount]
    Signed-off-by: Anand Jain <anand.jain@oracle.com>
    Qu Wenruo authored and fengguang committed Mar 29, 2016
  12. btrfs: Introduce a new function to check if all chunks a OK for degra…

    …ded mount
    
    Introduce a new function, btrfs_check_degradable(), to judge if all chunks
    in btrfs is OK for degraded mount.
    
    It provides the new basis for accurate btrfs mount/remount and even
    runtime degraded mount check other than old one-size-fit-all method.
    
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Qu Wenruo authored and fengguang committed Mar 29, 2016

Commits on Mar 14, 2016

  1. btrfs: Fix misspellings in comments.

    Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    adambuchbinder authored and kdave committed Mar 14, 2016
  2. btrfs: Print Warning only if ENOSPC_DEBUG is enabled

    Dont print warning for ENOSPC error unless ENOSPC_DEBUG is enabled. Use
    btrfs_debug if it is enabled.
    
    Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
    [ preserve the WARN_ON ]
    Signed-off-by: David Sterba <dsterba@suse.com>
    Ashish Samant authored and kdave committed Mar 14, 2016

Commits on Mar 11, 2016

  1. btrfs: scrub: silence an uninitialized variable warning

    It's basically harmless if "ref_level" isn't initialized since it's only
    used for an error message, but it causes a static checker warning.
    
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    error27 authored and kdave committed Mar 11, 2016
  2. btrfs: move btrfs_compression_type to compression.h

    So that its better organized.
    
    Signed-off-by: Anand Jain <anand.jain@oracle.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    asj authored and kdave committed Mar 11, 2016
  3. btrfs: rename btrfs_print_info to btrfs_print_mod_info

    So that it indicates what it does.
    
    Signed-off-by: Anand Jain <anand.jain@oracle.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    asj authored and kdave committed Mar 11, 2016
  4. Btrfs: Show a warning message if one of objectid reaches its highest …

    …value
    
    It's better to show a warning message for the exceptional case
    that one of objectid (in most case, inode number) reaches its
    highest value. For example, if inode cache is off and this event
    happens, we can't create any file even if there are not so many files.
    This message ease detecting such problem.
    
    Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Satoru Takeuchi authored and kdave committed Mar 11, 2016
  5. Documentation: btrfs: remove usage specific information

    The document in the kernel sources is yet another palce where the
    documentation would need to be updated, while it is not the primary
    source. We actively maintain the wiki pages.
    
    Signed-off-by: David Sterba <dsterba@suse.com>
    kdave committed Mar 11, 2016
  6. btrfs: use kbasename in btrfsic_mount

    This is more readable.
    
    Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Reviewed-by Andy Shevchenko <andy.shevchenko@gmail.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Villemoes authored and kdave committed Mar 11, 2016

Commits on Mar 1, 2016

  1. Btrfs: do not collect ordered extents when logging that inode exists

    When logging that an inode exists, for example as part of a directory
    fsync operation, we were collecting any ordered extents for the inode but
    we ended up doing nothing with them except tagging them as processed, by
    setting the flag BTRFS_ORDERED_LOGGED on them, which prevented a
    subsequent fsync of that inode (using the LOG_INODE_ALL mode) from
    collecting and processing them. This created a time window where a second
    fsync against the inode, using the fast path, ended up not logging the
    checksums for the new extents but it logged the extents since they were
    part of the list of modified extents. This happened because the ordered
    extents were not collected and checksums were not yet added to the csum
    tree - the ordered extents have not gone through btrfs_finish_ordered_io()
    yet (which is where we add them to the csum tree by calling
    inode.c:add_pending_csums()).
    
    So fix this by not collecting an inode's ordered extents if we are logging
    it with the LOG_INODE_EXISTS mode.
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Chris Mason <clm@fb.com>
    fdmanana authored and masoncl committed Mar 1, 2016
  2. Btrfs: fix race when checking if we can skip fsync'ing an inode

    If we're about to do a fast fsync for an inode and btrfs_inode_in_log()
    returns false, it's possible that we had an ordered extent in progress
    (btrfs_finish_ordered_io() not run yet) when we noticed that the inode's
    last_trans field was not greater than the id of the last committed
    transaction, but shortly after, before we checked if there were any
    ongoing ordered extents, the ordered extent had just completed and
    removed itself from the inode's ordered tree, in which case we end up not
    logging the inode, losing some data if a power failure or crash happens
    after the fsync handler returns and before the transaction is committed.
    
    Fix this by checking first if there are any ongoing ordered extents
    before comparing the inode's last_trans with the id of the last committed
    transaction - when it completes, an ordered extent always updates the
    inode's last_trans before it removes itself from the inode's ordered
    tree (at btrfs_finish_ordered_io()).
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Chris Mason <clm@fb.com>
    fdmanana authored and masoncl committed Mar 1, 2016
  3. Btrfs: fix listxattrs not listing all xattrs packed in the same item

    In the listxattrs handler, we were not listing all the xattrs that are
    packed in the same btree item, which happens when multiple xattrs have
    a name that when crc32c hashed produce the same checksum value.
    
    Fix this by processing them all.
    
    The following test case for xfstests reproduces the issue:
    
      seq=`basename $0`
      seqres=$RESULT_DIR/$seq
      echo "QA output created by $seq"
      tmp=/tmp/$$
      status=1	# failure is the default!
      trap "_cleanup; exit \$status" 0 1 2 3 15
    
      _cleanup()
      {
          cd /
          rm -f $tmp.*
      }
    
      # get standard environment, filters and checks
      . ./common/rc
      . ./common/filter
      . ./common/attr
    
      # real QA test starts here
      _supported_fs generic
      _supported_os Linux
      _require_scratch
      _require_attrs
    
      rm -f $seqres.full
    
      _scratch_mkfs >>$seqres.full 2>&1
      _scratch_mount
    
      # Create our test file with a few xattrs. The first 3 xattrs have a name
      # that when given as input to a crc32c function result in the same checksum.
      # This made btrfs list only one of the xattrs through listxattrs system call
      # (because it packs xattrs with the same name checksum into the same btree
      # item).
      touch $SCRATCH_MNT/testfile
      $SETFATTR_PROG -n user.foobar -v 123 $SCRATCH_MNT/testfile
      $SETFATTR_PROG -n user.WvG1c1Td -v qwerty $SCRATCH_MNT/testfile
      $SETFATTR_PROG -n user.J3__T_Km3dVsW_ -v hello $SCRATCH_MNT/testfile
      $SETFATTR_PROG -n user.something -v pizza $SCRATCH_MNT/testfile
      $SETFATTR_PROG -n user.ping -v pong $SCRATCH_MNT/testfile
    
      # Now call getfattr with --dump, which calls the listxattrs system call.
      # It should list all the xattrs we have set before.
      $GETFATTR_PROG --absolute-names --dump $SCRATCH_MNT/testfile | _filter_scratch
    
      status=0
      exit
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Chris Mason <clm@fb.com>
    fdmanana authored and masoncl committed Mar 1, 2016
  4. Btrfs: fix deadlock between direct IO reads and buffered writes

    While running a test with a mix of buffered IO and direct IO against
    the same files I hit a deadlock reported by the following trace:
    
    [11642.140352] INFO: task kworker/u32:3:15282 blocked for more than 120 seconds.
    [11642.142452]       Not tainted 4.4.0-rc6-btrfs-next-21+ #1
    [11642.143982] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [11642.146332] kworker/u32:3   D ffff880230ef7988 [11642.147737] systemd-journald[571]: Sent WATCHDOG=1 notification.
    [11642.149771]     0 15282      2 0x00000000
    [11642.151205] Workqueue: btrfs-flush_delalloc btrfs_flush_delalloc_helper [btrfs]
    [11642.154074]  ffff880230ef7988 0000000000000246 0000000000014ec0 ffff88023ec94ec0
    [11642.156722]  ffff880233fe8f80 ffff880230ef8000 ffff88023ec94ec0 7fffffffffffffff
    [11642.159205]  0000000000000002 ffffffff8147b7f9 ffff880230ef79a0 ffffffff8147b541
    [11642.161403] Call Trace:
    [11642.162129]  [<ffffffff8147b7f9>] ? bit_wait+0x2f/0x2f
    [11642.163396]  [<ffffffff8147b541>] schedule+0x82/0x9a
    [11642.164871]  [<ffffffff8147e7fe>] schedule_timeout+0x43/0x109
    [11642.167020]  [<ffffffff8147b7f9>] ? bit_wait+0x2f/0x2f
    [11642.167931]  [<ffffffff8108afd1>] ? trace_hardirqs_on_caller+0x17b/0x197
    [11642.182320]  [<ffffffff8108affa>] ? trace_hardirqs_on+0xd/0xf
    [11642.183762]  [<ffffffff810b079b>] ? timekeeping_get_ns+0xe/0x33
    [11642.185308]  [<ffffffff810b0f61>] ? ktime_get+0x41/0x52
    [11642.186782]  [<ffffffff8147ac08>] io_schedule_timeout+0xa0/0x102
    [11642.188217]  [<ffffffff8147ac08>] ? io_schedule_timeout+0xa0/0x102
    [11642.189626]  [<ffffffff8147b814>] bit_wait_io+0x1b/0x39
    [11642.190803]  [<ffffffff8147bb21>] __wait_on_bit_lock+0x4c/0x90
    [11642.192158]  [<ffffffff8111829f>] __lock_page+0x66/0x68
    [11642.193379]  [<ffffffff81082f29>] ? autoremove_wake_function+0x3a/0x3a
    [11642.194831]  [<ffffffffa0450ddd>] lock_page+0x31/0x34 [btrfs]
    [11642.197068]  [<ffffffffa0454e3b>] extent_write_cache_pages.isra.19.constprop.35+0x1af/0x2f4 [btrfs]
    [11642.199188]  [<ffffffffa0455373>] extent_writepages+0x4b/0x5c [btrfs]
    [11642.200723]  [<ffffffffa043c913>] ? btrfs_writepage_start_hook+0xce/0xce [btrfs]
    [11642.202465]  [<ffffffffa043aa82>] btrfs_writepages+0x28/0x2a [btrfs]
    [11642.203836]  [<ffffffff811236bc>] do_writepages+0x23/0x2c
    [11642.205624]  [<ffffffff811198c9>] __filemap_fdatawrite_range+0x5a/0x61
    [11642.207057]  [<ffffffff81119946>] filemap_fdatawrite_range+0x13/0x15
    [11642.208529]  [<ffffffffa044f87e>] btrfs_start_ordered_extent+0xd0/0x1a1 [btrfs]
    [11642.210375]  [<ffffffffa0462613>] ? btrfs_scrubparity_helper+0x140/0x33a [btrfs]
    [11642.212132]  [<ffffffffa044f974>] btrfs_run_ordered_extent_work+0x25/0x34 [btrfs]
    [11642.213837]  [<ffffffffa046262f>] btrfs_scrubparity_helper+0x15c/0x33a [btrfs]
    [11642.215457]  [<ffffffffa046293b>] btrfs_flush_delalloc_helper+0xe/0x10 [btrfs]
    [11642.217095]  [<ffffffff8106483e>] process_one_work+0x256/0x48b
    [11642.218324]  [<ffffffff81064f20>] worker_thread+0x1f5/0x2a7
    [11642.219466]  [<ffffffff81064d2b>] ? rescuer_thread+0x289/0x289
    [11642.220801]  [<ffffffff8106a500>] kthread+0xd4/0xdc
    [11642.222032]  [<ffffffff8106a42c>] ? kthread_parkme+0x24/0x24
    [11642.223190]  [<ffffffff8147fdef>] ret_from_fork+0x3f/0x70
    [11642.224394]  [<ffffffff8106a42c>] ? kthread_parkme+0x24/0x24
    [11642.226295] 2 locks held by kworker/u32:3/15282:
    [11642.227273]  #0:  ("%s-%s""btrfs", name){++++.+}, at: [<ffffffff8106474d>] process_one_work+0x165/0x48b
    [11642.229412]  #1:  ((&work->normal_work)){+.+.+.}, at: [<ffffffff8106474d>] process_one_work+0x165/0x48b
    [11642.231414] INFO: task kworker/u32:8:15289 blocked for more than 120 seconds.
    [11642.232872]       Not tainted 4.4.0-rc6-btrfs-next-21+ #1
    [11642.234109] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [11642.235776] kworker/u32:8   D ffff88020de5f848     0 15289      2 0x00000000
    [11642.237412] Workqueue: writeback wb_workfn (flush-btrfs-481)
    [11642.238670]  ffff88020de5f848 0000000000000246 0000000000014ec0 ffff88023ed54ec0
    [11642.240475]  ffff88021b1ece40 ffff88020de60000 ffff88023ed54ec0 7fffffffffffffff
    [11642.242154]  0000000000000002 ffffffff8147b7f9 ffff88020de5f860 ffffffff8147b541
    [11642.243715] Call Trace:
    [11642.244390]  [<ffffffff8147b7f9>] ? bit_wait+0x2f/0x2f
    [11642.245432]  [<ffffffff8147b541>] schedule+0x82/0x9a
    [11642.246392]  [<ffffffff8147e7fe>] schedule_timeout+0x43/0x109
    [11642.247479]  [<ffffffff8147b7f9>] ? bit_wait+0x2f/0x2f
    [11642.248551]  [<ffffffff8108afd1>] ? trace_hardirqs_on_caller+0x17b/0x197
    [11642.249968]  [<ffffffff8108affa>] ? trace_hardirqs_on+0xd/0xf
    [11642.251043]  [<ffffffff810b079b>] ? timekeeping_get_ns+0xe/0x33
    [11642.252202]  [<ffffffff810b0f61>] ? ktime_get+0x41/0x52
    [11642.253210]  [<ffffffff8147ac08>] io_schedule_timeout+0xa0/0x102
    [11642.254307]  [<ffffffff8147ac08>] ? io_schedule_timeout+0xa0/0x102
    [11642.256118]  [<ffffffff8147b814>] bit_wait_io+0x1b/0x39
    [11642.257131]  [<ffffffff8147bb21>] __wait_on_bit_lock+0x4c/0x90
    [11642.258200]  [<ffffffff8111829f>] __lock_page+0x66/0x68
    [11642.259168]  [<ffffffff81082f29>] ? autoremove_wake_function+0x3a/0x3a
    [11642.260516]  [<ffffffffa0450ddd>] lock_page+0x31/0x34 [btrfs]
    [11642.261841]  [<ffffffffa0454e3b>] extent_write_cache_pages.isra.19.constprop.35+0x1af/0x2f4 [btrfs]
    [11642.263531]  [<ffffffffa0455373>] extent_writepages+0x4b/0x5c [btrfs]
    [11642.264747]  [<ffffffffa043c913>] ? btrfs_writepage_start_hook+0xce/0xce [btrfs]
    [11642.266148]  [<ffffffffa043aa82>] btrfs_writepages+0x28/0x2a [btrfs]
    [11642.267264]  [<ffffffff811236bc>] do_writepages+0x23/0x2c
    [11642.268280]  [<ffffffff81192a2b>] __writeback_single_inode+0xda/0x5ba
    [11642.269407]  [<ffffffff811939f0>] writeback_sb_inodes+0x27b/0x43d
    [11642.270476]  [<ffffffff81193c28>] __writeback_inodes_wb+0x76/0xae
    [11642.271547]  [<ffffffff81193ea6>] wb_writeback+0x19e/0x41c
    [11642.272588]  [<ffffffff81194821>] wb_workfn+0x201/0x341
    [11642.273523]  [<ffffffff81194821>] ? wb_workfn+0x201/0x341
    [11642.274479]  [<ffffffff8106483e>] process_one_work+0x256/0x48b
    [11642.275497]  [<ffffffff81064f20>] worker_thread+0x1f5/0x2a7
    [11642.276518]  [<ffffffff81064d2b>] ? rescuer_thread+0x289/0x289
    [11642.277520]  [<ffffffff81064d2b>] ? rescuer_thread+0x289/0x289
    [11642.278517]  [<ffffffff8106a500>] kthread+0xd4/0xdc
    [11642.279371]  [<ffffffff8106a42c>] ? kthread_parkme+0x24/0x24
    [11642.280468]  [<ffffffff8147fdef>] ret_from_fork+0x3f/0x70
    [11642.281607]  [<ffffffff8106a42c>] ? kthread_parkme+0x24/0x24
    [11642.282604] 3 locks held by kworker/u32:8/15289:
    [11642.283423]  #0:  ("writeback"){++++.+}, at: [<ffffffff8106474d>] process_one_work+0x165/0x48b
    [11642.285629]  #1:  ((&(&wb->dwork)->work)){+.+.+.}, at: [<ffffffff8106474d>] process_one_work+0x165/0x48b
    [11642.287538]  #2:  (&type->s_umount_key#37){+++++.}, at: [<ffffffff81171217>] trylock_super+0x1b/0x4b
    [11642.289423] INFO: task fdm-stress:26848 blocked for more than 120 seconds.
    [11642.290547]       Not tainted 4.4.0-rc6-btrfs-next-21+ #1
    [11642.291453] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [11642.292864] fdm-stress      D ffff88022c107c20     0 26848  26591 0x00000000
    [11642.294118]  ffff88022c107c20 000000038108affa 0000000000014ec0 ffff88023ed54ec0
    [11642.295602]  ffff88013ab1ca40 ffff88022c108000 ffff8800b2fc19d0 00000000000e0fff
    [11642.297098]  ffff8800b2fc19b0 ffff88022c107c88 ffff88022c107c38 ffffffff8147b541
    [11642.298433] Call Trace:
    [11642.298896]  [<ffffffff8147b541>] schedule+0x82/0x9a
    [11642.299738]  [<ffffffffa045225d>] lock_extent_bits+0xfe/0x1a3 [btrfs]
    [11642.300833]  [<ffffffff81082eef>] ? add_wait_queue_exclusive+0x44/0x44
    [11642.301943]  [<ffffffffa0447516>] lock_and_cleanup_extent_if_need+0x68/0x18e [btrfs]
    [11642.303270]  [<ffffffffa04485ba>] __btrfs_buffered_write+0x238/0x4c1 [btrfs]
    [11642.304552]  [<ffffffffa044b50a>] ? btrfs_file_write_iter+0x17c/0x408 [btrfs]
    [11642.305782]  [<ffffffffa044b682>] btrfs_file_write_iter+0x2f4/0x408 [btrfs]
    [11642.306878]  [<ffffffff8116e298>] __vfs_write+0x7c/0xa5
    [11642.307729]  [<ffffffff8116e7d1>] vfs_write+0x9d/0xe8
    [11642.308602]  [<ffffffff8116efbb>] SyS_write+0x50/0x7e
    [11642.309410]  [<ffffffff8147fa97>] entry_SYSCALL_64_fastpath+0x12/0x6b
    [11642.310403] 3 locks held by fdm-stress/26848:
    [11642.311108]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff811877e8>] __fdget_pos+0x3a/0x40
    [11642.312578]  #1:  (sb_writers#11){.+.+.+}, at: [<ffffffff811706ee>] __sb_start_write+0x5f/0xb0
    [11642.314170]  #2:  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa044b401>] btrfs_file_write_iter+0x73/0x408 [btrfs]
    [11642.316796] INFO: task fdm-stress:26849 blocked for more than 120 seconds.
    [11642.317842]       Not tainted 4.4.0-rc6-btrfs-next-21+ #1
    [11642.318691] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [11642.319959] fdm-stress      D ffff8801964ffa68     0 26849  26591 0x00000000
    [11642.321312]  ffff8801964ffa68 00ff8801e9975f80 0000000000014ec0 ffff88023ed94ec0
    [11642.322555]  ffff8800b00b4840 ffff880196500000 ffff8801e9975f20 0000000000000002
    [11642.323715]  ffff8801e9975f18 ffff8800b00b4840 ffff8801964ffa80 ffffffff8147b541
    [11642.325096] Call Trace:
    [11642.325532]  [<ffffffff8147b541>] schedule+0x82/0x9a
    [11642.326303]  [<ffffffff8147e7fe>] schedule_timeout+0x43/0x109
    [11642.327180]  [<ffffffff8108ae40>] ? mark_held_locks+0x5e/0x74
    [11642.328114]  [<ffffffff8147f30e>] ? _raw_spin_unlock_irq+0x2c/0x4a
    [11642.329051]  [<ffffffff8108afd1>] ? trace_hardirqs_on_caller+0x17b/0x197
    [11642.330053]  [<ffffffff8147bceb>] __wait_for_common+0x109/0x147
    [11642.330952]  [<ffffffff8147bceb>] ? __wait_for_common+0x109/0x147
    [11642.331869]  [<ffffffff8147e7bb>] ? usleep_range+0x4a/0x4a
    [11642.332925]  [<ffffffff81074075>] ? wake_up_q+0x47/0x47
    [11642.333736]  [<ffffffff8147bd4d>] wait_for_completion+0x24/0x26
    [11642.334672]  [<ffffffffa044f5ce>] btrfs_wait_ordered_extents+0x1c8/0x217 [btrfs]
    [11642.335858]  [<ffffffffa0465b5a>] btrfs_mksubvol+0x224/0x45d [btrfs]
    [11642.336854]  [<ffffffff81082eef>] ? add_wait_queue_exclusive+0x44/0x44
    [11642.337820]  [<ffffffffa0465edb>] btrfs_ioctl_snap_create_transid+0x148/0x17a [btrfs]
    [11642.339026]  [<ffffffffa046603b>] btrfs_ioctl_snap_create_v2+0xc7/0x110 [btrfs]
    [11642.340214]  [<ffffffffa0468582>] btrfs_ioctl+0x590/0x27bd [btrfs]
    [11642.341123]  [<ffffffff8147dc00>] ? mutex_unlock+0xe/0x10
    [11642.341934]  [<ffffffffa00fa6e9>] ? ext4_file_write_iter+0x2a3/0x36f [ext4]
    [11642.342936]  [<ffffffff8108895d>] ? __lock_is_held+0x3c/0x57
    [11642.343772]  [<ffffffff81186a1d>] ? rcu_read_unlock+0x3e/0x5d
    [11642.344673]  [<ffffffff8117dc95>] do_vfs_ioctl+0x458/0x4dc
    [11642.346024]  [<ffffffff81186bbe>] ? __fget_light+0x62/0x71
    [11642.346873]  [<ffffffff8117dd70>] SyS_ioctl+0x57/0x79
    [11642.347720]  [<ffffffff8147fa97>] entry_SYSCALL_64_fastpath+0x12/0x6b
    [11642.350222] 4 locks held by fdm-stress/26849:
    [11642.350898]  #0:  (sb_writers#11){.+.+.+}, at: [<ffffffff811706ee>] __sb_start_write+0x5f/0xb0
    [11642.352375]  #1:  (&type->i_mutex_dir_key#4/1){+.+.+.}, at: [<ffffffffa0465981>] btrfs_mksubvol+0x4b/0x45d [btrfs]
    [11642.354072]  #2:  (&fs_info->subvol_sem){++++..}, at: [<ffffffffa0465a2a>] btrfs_mksubvol+0xf4/0x45d [btrfs]
    [11642.355647]  #3:  (&root->ordered_extent_mutex){+.+...}, at: [<ffffffffa044f456>] btrfs_wait_ordered_extents+0x50/0x217 [btrfs]
    [11642.357516] INFO: task fdm-stress:26850 blocked for more than 120 seconds.
    [11642.358508]       Not tainted 4.4.0-rc6-btrfs-next-21+ #1
    [11642.359376] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [11642.368625] fdm-stress      D ffff88021f167688     0 26850  26591 0x00000000
    [11642.369716]  ffff88021f167688 0000000000000001 0000000000014ec0 ffff88023edd4ec0
    [11642.370950]  ffff880128a98680 ffff88021f168000 ffff88023edd4ec0 7fffffffffffffff
    [11642.372210]  0000000000000002 ffffffff8147b7f9 ffff88021f1676a0 ffffffff8147b541
    [11642.373430] Call Trace:
    [11642.373853]  [<ffffffff8147b7f9>] ? bit_wait+0x2f/0x2f
    [11642.374623]  [<ffffffff8147b541>] schedule+0x82/0x9a
    [11642.375948]  [<ffffffff8147e7fe>] schedule_timeout+0x43/0x109
    [11642.376862]  [<ffffffff8147b7f9>] ? bit_wait+0x2f/0x2f
    [11642.377637]  [<ffffffff8108afd1>] ? trace_hardirqs_on_caller+0x17b/0x197
    [11642.378610]  [<ffffffff8108affa>] ? trace_hardirqs_on+0xd/0xf
    [11642.379457]  [<ffffffff810b079b>] ? timekeeping_get_ns+0xe/0x33
    [11642.380366]  [<ffffffff810b0f61>] ? ktime_get+0x41/0x52
    [11642.381353]  [<ffffffff8147ac08>] io_schedule_timeout+0xa0/0x102
    [11642.382255]  [<ffffffff8147ac08>] ? io_schedule_timeout+0xa0/0x102
    [11642.383162]  [<ffffffff8147b814>] bit_wait_io+0x1b/0x39
    [11642.383945]  [<ffffffff8147bb21>] __wait_on_bit_lock+0x4c/0x90
    [11642.384875]  [<ffffffff8111829f>] __lock_page+0x66/0x68
    [11642.385749]  [<ffffffff81082f29>] ? autoremove_wake_function+0x3a/0x3a
    [11642.386721]  [<ffffffffa0450ddd>] lock_page+0x31/0x34 [btrfs]
    [11642.387596]  [<ffffffffa0454e3b>] extent_write_cache_pages.isra.19.constprop.35+0x1af/0x2f4 [btrfs]
    [11642.389030]  [<ffffffffa0455373>] extent_writepages+0x4b/0x5c [btrfs]
    [11642.389973]  [<ffffffff810a25ad>] ? rcu_read_lock_sched_held+0x61/0x69
    [11642.390939]  [<ffffffffa043c913>] ? btrfs_writepage_start_hook+0xce/0xce [btrfs]
    [11642.392271]  [<ffffffffa0451c32>] ? __clear_extent_bit+0x26e/0x2c0 [btrfs]
    [11642.393305]  [<ffffffffa043aa82>] btrfs_writepages+0x28/0x2a [btrfs]
    [11642.394239]  [<ffffffff811236bc>] do_writepages+0x23/0x2c
    [11642.395045]  [<ffffffff811198c9>] __filemap_fdatawrite_range+0x5a/0x61
    [11642.395991]  [<ffffffff81119946>] filemap_fdatawrite_range+0x13/0x15
    [11642.397144]  [<ffffffffa044f87e>] btrfs_start_ordered_extent+0xd0/0x1a1 [btrfs]
    [11642.398392]  [<ffffffffa0452094>] ? clear_extent_bit+0x17/0x19 [btrfs]
    [11642.399363]  [<ffffffffa0445945>] btrfs_get_blocks_direct+0x12b/0x61c [btrfs]
    [11642.400445]  [<ffffffff8119f7a1>] ? dio_bio_add_page+0x3d/0x54
    [11642.401309]  [<ffffffff8119fa93>] ? submit_page_section+0x7b/0x111
    [11642.402213]  [<ffffffff811a0258>] do_blockdev_direct_IO+0x685/0xc24
    [11642.403139]  [<ffffffffa044581a>] ? btrfs_page_exists_in_range+0x1a1/0x1a1 [btrfs]
    [11642.404360]  [<ffffffffa043d267>] ? btrfs_get_extent_fiemap+0x1c0/0x1c0 [btrfs]
    [11642.406187]  [<ffffffff811a0828>] __blockdev_direct_IO+0x31/0x33
    [11642.407070]  [<ffffffff811a0828>] ? __blockdev_direct_IO+0x31/0x33
    [11642.407990]  [<ffffffffa043d267>] ? btrfs_get_extent_fiemap+0x1c0/0x1c0 [btrfs]
    [11642.409192]  [<ffffffffa043b4ca>] btrfs_direct_IO+0x1c7/0x27e [btrfs]
    [11642.410146]  [<ffffffffa043d267>] ? btrfs_get_extent_fiemap+0x1c0/0x1c0 [btrfs]
    [11642.411291]  [<ffffffff81119a2c>] generic_file_read_iter+0x89/0x4e1
    [11642.412263]  [<ffffffff8108ac05>] ? mark_lock+0x24/0x201
    [11642.413057]  [<ffffffff8116e1f8>] __vfs_read+0x79/0x9d
    [11642.413897]  [<ffffffff8116e6f1>] vfs_read+0x8f/0xd2
    [11642.414708]  [<ffffffff8116ef3d>] SyS_read+0x50/0x7e
    [11642.415573]  [<ffffffff8147fa97>] entry_SYSCALL_64_fastpath+0x12/0x6b
    [11642.416572] 1 lock held by fdm-stress/26850:
    [11642.417345]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff811877e8>] __fdget_pos+0x3a/0x40
    [11642.418703] INFO: task fdm-stress:26851 blocked for more than 120 seconds.
    [11642.419698]       Not tainted 4.4.0-rc6-btrfs-next-21+ #1
    [11642.420612] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [11642.421807] fdm-stress      D ffff880196483d28     0 26851  26591 0x00000000
    [11642.422878]  ffff880196483d28 00ff8801c8f60740 0000000000014ec0 ffff88023ed94ec0
    [11642.424149]  ffff8801c8f60740 ffff880196484000 0000000000000246 ffff8801c8f60740
    [11642.425374]  ffff8801bb711840 ffff8801bb711878 ffff880196483d40 ffffffff8147b541
    [11642.426591] Call Trace:
    [11642.427013]  [<ffffffff8147b541>] schedule+0x82/0x9a
    [11642.427856]  [<ffffffff8147b6d5>] schedule_preempt_disabled+0x18/0x24
    [11642.428852]  [<ffffffff8147c23a>] mutex_lock_nested+0x1d7/0x3b4
    [11642.429743]  [<ffffffffa044f456>] ? btrfs_wait_ordered_extents+0x50/0x217 [btrfs]
    [11642.430911]  [<ffffffffa044f456>] btrfs_wait_ordered_extents+0x50/0x217 [btrfs]
    [11642.432102]  [<ffffffffa044f674>] ? btrfs_wait_ordered_roots+0x57/0x191 [btrfs]
    [11642.433259]  [<ffffffffa044f456>] ? btrfs_wait_ordered_extents+0x50/0x217 [btrfs]
    [11642.434431]  [<ffffffffa044f6ea>] btrfs_wait_ordered_roots+0xcd/0x191 [btrfs]
    [11642.436079]  [<ffffffffa0410cab>] btrfs_sync_fs+0xe0/0x1ad [btrfs]
    [11642.437009]  [<ffffffff81197900>] ? SyS_tee+0x23c/0x23c
    [11642.437860]  [<ffffffff81197920>] sync_fs_one_sb+0x20/0x22
    [11642.438723]  [<ffffffff81171435>] iterate_supers+0x75/0xc2
    [11642.439597]  [<ffffffff81197d00>] sys_sync+0x52/0x80
    [11642.440454]  [<ffffffff8147fa97>] entry_SYSCALL_64_fastpath+0x12/0x6b
    [11642.441533] 3 locks held by fdm-stress/26851:
    [11642.442370]  #0:  (&type->s_umount_key#37){+++++.}, at: [<ffffffff8117141f>] iterate_supers+0x5f/0xc2
    [11642.444043]  #1:  (&fs_info->ordered_operations_mutex){+.+...}, at: [<ffffffffa044f661>] btrfs_wait_ordered_roots+0x44/0x191 [btrfs]
    [11642.446010]  #2:  (&root->ordered_extent_mutex){+.+...}, at: [<ffffffffa044f456>] btrfs_wait_ordered_extents+0x50/0x217 [btrfs]
    
    This happened because under specific timings the path for direct IO reads
    can deadlock with concurrent buffered writes. The diagram below shows how
    this happens for an example file that has the following layout:
    
         [  extent A  ]  [  extent B  ]  [ ....
         0K              4K              8K
    
         CPU 1                                               CPU 2                             CPU 3
    
    DIO read against range
     [0K, 8K[ starts
    
    btrfs_direct_IO()
      --> calls btrfs_get_blocks_direct()
          which finds the extent map for the
          extent A and leaves the range
          [0K, 4K[ locked in the inode's
          io tree
    
                                                       buffered write against
                                                       range [4K, 8K[ starts
    
                                                       __btrfs_buffered_write()
                                                         --> dirties page at 4K
    
                                                                                         a user space
                                                                                         task calls sync
                                                                                         for e.g or
                                                                                         writepages() is
                                                                                         invoked by mm
    
                                                                                         writepages()
                                                                                           run_delalloc_range()
                                                                                             cow_file_range()
                                                                                               --> ordered extent X
                                                                                                   for the buffered
                                                                                                   write is created
                                                                                                   and
                                                                                                   writeback starts
    
      --> calls btrfs_get_blocks_direct()
          again, without submitting first
          a bio for reading extent A, and
          finds the extent map for extent B
    
      --> calls lock_extent_direct()
    
          --> locks range [4K, 8K[
          --> finds ordered extent X
              covering range [4K, 8K[
          --> unlocks range [4K, 8K[
    
                                                      buffered write against
                                                      range [0K, 8K[ starts
    
                                                      __btrfs_buffered_write()
                                                        prepare_pages()
                                                          --> locks pages with
                                                              offsets 0 and 4K
                                                        lock_and_cleanup_extent_if_need()
                                                          --> blocks attempting to
                                                              lock range [0K, 8K[ in
                                                              the inode's io tree,
                                                              because the range [0, 4K[
                                                              is already locked by the
                                                              direct IO task at CPU 1
    
          --> calls
              btrfs_start_ordered_extent(oe X)
    
              btrfs_start_ordered_extent(oe X)
    
                --> At this point writeback for ordered
                    extent X has not finished yet
    
                filemap_fdatawrite_range()
                  btrfs_writepages()
                    extent_writepages()
                      extent_write_cache_pages()
                        --> finds page with offset 0
                            with the writeback tag
                            (and not dirty)
                        --> tries to lock it
                             --> deadlock, task at CPU 2
                                 has the page locked and
                                 is blocked on the io range
                                 [0, 4K[ that was locked
                                 earlier by this task
    
    So fix this by falling back to a buffered read in the direct IO read path
    when an ordered extent for a buffered write is found.
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
    Signed-off-by: Chris Mason <clm@fb.com>
    fdmanana authored and masoncl committed Mar 1, 2016
  5. Btrfs: fix extent_same allowing destination offset beyond i_size

    When using the same file as the source and destination for a dedup
    (extent_same ioctl) operation we were allowing it to dedup to a
    destination offset beyond the file's size, which doesn't make sense and
    it's not allowed for the case where the source and destination files are
    not the same file. This made de deduplication operation successful only
    when the source range corresponded to a hole, a prealloc extent or an
    extent with all bytes having a value of 0x00. This was also leaving a
    file hole (between i_size and destination offset) without the
    corresponding file extent items, which can be reproduced with the
    following steps for example:
    
      $ mkfs.btrfs -f /dev/sdi
      $ mount /dev/sdi /mnt/sdi
    
      $ xfs_io -f -c "pwrite -S 0xab 304457 404990" /mnt/sdi/foobar
      wrote 404990/404990 bytes at offset 304457
      395 KiB, 99 ops; 0.0000 sec (31.150 MiB/sec and 7984.5149 ops/sec)
    
      $ /git/hub/duperemove/btrfs-extent-same 24576 /mnt/sdi/foobar 28672 /mnt/sdi/foobar 929792
      Deduping 2 total files
      (28672, 24576): /mnt/sdi/foobar
      (929792, 24576): /mnt/sdi/foobar
      1 files asked to be deduped
      i: 0, status: 0, bytes_deduped: 24576
      24576 total bytes deduped in this operation
    
      $ umount /mnt/sdi
      $ btrfsck /dev/sdi
      Checking filesystem on /dev/sdi
      UUID: 98c528aa-0833-427d-9403-b98032ffbf9d
      checking extents
      checking free space cache
      checking fs roots
      root 5 inode 257 errors 100, file extent discount
      Found file extent holes:
              start: 712704, len: 217088
      found 540673 bytes used err is 1
      total csum bytes: 400
      total tree bytes: 131072
      total fs tree bytes: 32768
      total extent tree bytes: 16384
      btree space waste bytes: 123675
      file data blocks allocated: 671744
        referenced 671744
      btrfs-progs v4.2.3
    
    So fix this by not allowing the destination to go beyond the file's size,
    just as we do for the same where the source and destination files are not
    the same.
    
    A test for xfstests follows.
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Chris Mason <clm@fb.com>
    fdmanana authored and masoncl committed Mar 1, 2016
  6. Btrfs: fix file loss on log replay after renaming a file and fsync

    We have two cases where we end up deleting a file at log replay time
    when we should not. For this to happen the file must have been renamed
    and a directory inode must have been fsynced/logged.
    
    Two examples that exercise these two cases are listed below.
    
      Case 1)
    
      $ mkfs.btrfs -f /dev/sdb
      $ mount /dev/sdb /mnt
      $ mkdir -p /mnt/a/b
      $ mkdir /mnt/c
      $ touch /mnt/a/b/foo
      $ sync
      $ mv /mnt/a/b/foo /mnt/c/
      # Create file bar just to make sure the fsync on directory a/ does
      # something and it's not a no-op.
      $ touch /mnt/a/bar
      $ xfs_io -c "fsync" /mnt/a
      < power fail / crash >
    
      The next time the filesystem is mounted, the log replay procedure
      deletes file foo.
    
      Case 2)
    
      $ mkfs.btrfs -f /dev/sdb
      $ mount /dev/sdb /mnt
      $ mkdir /mnt/a
      $ mkdir /mnt/b
      $ mkdir /mnt/c
      $ touch /mnt/a/foo
      $ ln /mnt/a/foo /mnt/b/foo_link
      $ touch /mnt/b/bar
      $ sync
      $ unlink /mnt/b/foo_link
      $ mv /mnt/b/bar /mnt/c/
      $ xfs_io -c "fsync" /mnt/a/foo
      < power fail / crash >
    
      The next time the filesystem is mounted, the log replay procedure
      deletes file bar.
    
    The reason why the files are deleted is because when we log inodes
    other then the fsync target inode, we ignore their last_unlink_trans
    value and leave the log without enough information to later replay the
    rename operations. So we need to look at the last_unlink_trans values
    and fallback to a transaction commit if they are greater than the
    id of the last committed transaction.
    
    So fix this by looking at the last_unlink_trans values and fallback to
    transaction commits when needed. Also, when logging other inodes (for
    case 1 we logged descendants of the fsync target inode while for case 2
    we logged ascendants) we need to care about concurrent tasks updating
    the last_unlink_trans of inodes we are logging (which was already an
    existing problem in check_parent_dirs_for_sync()). Since we can not
    acquire their inode mutex (vfs' struct inode ->i_mutex), as that causes
    deadlocks with other concurrent operations that acquire the i_mutex of
    2 inodes (other fsyncs or renames for example), we need to serialize on
    the log_mutex of the inode we are logging. A task setting a new value for
    an inode's last_unlink_trans must acquire the inode's log_mutex and it
    must do this update before doing the actual unlink operation (which is
    already the case except when deleting a snapshot). Conversely the task
    logging the inode must first log the inode and then check the inode's
    last_unlink_trans value while holding its log_mutex, as if its value is
    not greater then the id of the last committed transaction it means it
    logged a safe state of the inode's items, while if its value is not
    smaller then the id of the last committed transaction it means the inode
    state it has logged might not be safe (the concurrent task might have
    just updated last_unlink_trans but hasn't done yet the unlink operation)
    and therefore a transaction commit must be done.
    
    Test cases for xfstests follow in separate patches.
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Chris Mason <clm@fb.com>
    fdmanana authored and masoncl committed Mar 1, 2016
  7. Btrfs: fix unreplayable log after snapshot delete + parent dir fsync

    If we delete a snapshot, fsync its parent directory and crash/power fail
    before the next transaction commit, on the next mount when we attempt to
    replay the log tree of the root containing the parent directory we will
    fail and prevent the filesystem from mounting, which is solvable by wiping
    out the log trees with the btrfs-zero-log tool but very inconvenient as
    we will lose any data and metadata fsynced before the parent directory
    was fsynced.
    
    For example:
    
      $ mkfs.btrfs -f /dev/sdc
      $ mount /dev/sdc /mnt
      $ mkdir /mnt/testdir
      $ btrfs subvolume snapshot /mnt /mnt/testdir/snap
      $ btrfs subvolume delete /mnt/testdir/snap
      $ xfs_io -c "fsync" /mnt/testdir
      < crash / power failure and reboot >
      $ mount /dev/sdc /mnt
      mount: mount(2) failed: No such file or directory
    
    And in dmesg/syslog we get the following message and trace:
    
    [192066.361162] BTRFS info (device dm-0): failed to delete reference to snap, inode 257 parent 257
    [192066.363010] ------------[ cut here ]------------
    [192066.365268] WARNING: CPU: 4 PID: 5130 at fs/btrfs/inode.c:3986 __btrfs_unlink_inode+0x17a/0x354 [btrfs]()
    [192066.367250] BTRFS: Transaction aborted (error -2)
    [192066.368401] Modules linked in: btrfs dm_flakey dm_mod ppdev sha256_generic xor raid6_pq hmac drbg ansi_cprng aesni_intel acpi_cpufreq tpm_tis aes_x86_64 tpm ablk_helper evdev cryptd sg parport_pc i2c_piix4 psmouse lrw parport i2c_core pcspkr gf128mul processor serio_raw glue_helper button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
    [192066.377154] CPU: 4 PID: 5130 Comm: mount Tainted: G        W       4.4.0-rc6-btrfs-next-20+ #1
    [192066.378875] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
    [192066.380889]  0000000000000000 ffff880143923670 ffffffff81257570 ffff8801439236b8
    [192066.382561]  ffff8801439236a8 ffffffff8104ec07 ffffffffa039dc2c 00000000fffffffe
    [192066.384191]  ffff8801ed31d000 ffff8801b9fc9c88 ffff8801086875e0 ffff880143923710
    [192066.385827] Call Trace:
    [192066.386373]  [<ffffffff81257570>] dump_stack+0x4e/0x79
    [192066.387387]  [<ffffffff8104ec07>] warn_slowpath_common+0x99/0xb2
    [192066.388429]  [<ffffffffa039dc2c>] ? __btrfs_unlink_inode+0x17a/0x354 [btrfs]
    [192066.389236]  [<ffffffff8104ec68>] warn_slowpath_fmt+0x48/0x50
    [192066.389884]  [<ffffffffa039dc2c>] __btrfs_unlink_inode+0x17a/0x354 [btrfs]
    [192066.390621]  [<ffffffff81184b55>] ? iput+0xb0/0x266
    [192066.391200]  [<ffffffffa039ea25>] btrfs_unlink_inode+0x1c/0x3d [btrfs]
    [192066.391930]  [<ffffffffa03ca623>] check_item_in_log+0x1fe/0x29b [btrfs]
    [192066.392715]  [<ffffffffa03ca827>] replay_dir_deletes+0x167/0x1cf [btrfs]
    [192066.393510]  [<ffffffffa03cccc7>] replay_one_buffer+0x417/0x570 [btrfs]
    [192066.394241]  [<ffffffffa03ca164>] walk_up_log_tree+0x10e/0x1dc [btrfs]
    [192066.394958]  [<ffffffffa03cac72>] walk_log_tree+0xa5/0x190 [btrfs]
    [192066.395628]  [<ffffffffa03ce8b8>] btrfs_recover_log_trees+0x239/0x32c [btrfs]
    [192066.396790]  [<ffffffffa03cc8b0>] ? replay_one_extent+0x50a/0x50a [btrfs]
    [192066.397891]  [<ffffffffa0394041>] open_ctree+0x1d8b/0x2167 [btrfs]
    [192066.398897]  [<ffffffffa03706e1>] btrfs_mount+0x5ef/0x729 [btrfs]
    [192066.399823]  [<ffffffff8108ad98>] ? trace_hardirqs_on+0xd/0xf
    [192066.400739]  [<ffffffff8108959b>] ? lockdep_init_map+0xb9/0x1b3
    [192066.401700]  [<ffffffff811714b9>] mount_fs+0x67/0x131
    [192066.402482]  [<ffffffff81188560>] vfs_kern_mount+0x6c/0xde
    [192066.403930]  [<ffffffffa03702bd>] btrfs_mount+0x1cb/0x729 [btrfs]
    [192066.404831]  [<ffffffff8108ad98>] ? trace_hardirqs_on+0xd/0xf
    [192066.405726]  [<ffffffff8108959b>] ? lockdep_init_map+0xb9/0x1b3
    [192066.406621]  [<ffffffff811714b9>] mount_fs+0x67/0x131
    [192066.407401]  [<ffffffff81188560>] vfs_kern_mount+0x6c/0xde
    [192066.408247]  [<ffffffff8118ae36>] do_mount+0x893/0x9d2
    [192066.409047]  [<ffffffff8113009b>] ? strndup_user+0x3f/0x8c
    [192066.409842]  [<ffffffff8118b187>] SyS_mount+0x75/0xa1
    [192066.410621]  [<ffffffff8147e517>] entry_SYSCALL_64_fastpath+0x12/0x6b
    [192066.411572] ---[ end trace 2de42126c1e0a0f0 ]---
    [192066.412344] BTRFS: error (device dm-0) in __btrfs_unlink_inode:3986: errno=-2 No such entry
    [192066.413748] BTRFS: error (device dm-0) in btrfs_replay_log:2464: errno=-2 No such entry (Failed to recover log tree)
    [192066.415458] BTRFS error (device dm-0): cleaner transaction attach returned -30
    [192066.444613] BTRFS: open_ctree failed
    
    This happens because when we are replaying the log and processing the
    directory entry pointing to the snapshot in the subvolume tree, we treat
    its btrfs_dir_item item as having a location with a key type matching
    BTRFS_INODE_ITEM_KEY, which is wrong because the type matches
    BTRFS_ROOT_ITEM_KEY and therefore must be processed differently, as the
    object id refers to a root number and not to an inode in the root
    containing the parent directory.
    
    So fix this by triggering a transaction commit if an fsync against the
    parent directory is requested after deleting a snapshot. This is the
    simplest approach for a rare use case. Some alternative that avoids the
    transaction commit would require more code to explicitly delete the
    snapshot at log replay time (factoring out common code from ioctl.c:
    btrfs_ioctl_snap_destroy()), special care at fsync time to remove the
    log tree of the snapshot's root from the log root of the root of tree
    roots, amongst other steps.
    
    A test case for xfstests that triggers the issue follows.
    
      seq=`basename $0`
      seqres=$RESULT_DIR/$seq
      echo "QA output created by $seq"
      tmp=/tmp/$$
      status=1	# failure is the default!
      trap "_cleanup; exit \$status" 0 1 2 3 15
    
      _cleanup()
      {
          _cleanup_flakey
          cd /
          rm -f $tmp.*
      }
    
      # get standard environment, filters and checks
      . ./common/rc
      . ./common/filter
      . ./common/dmflakey
    
      # real QA test starts here
      _need_to_be_root
      _supported_fs btrfs
      _supported_os Linux
      _require_scratch
      _require_dm_target flakey
      _require_metadata_journaling $SCRATCH_DEV
    
      rm -f $seqres.full
    
      _scratch_mkfs >>$seqres.full 2>&1
      _init_flakey
      _mount_flakey
    
      # Create a snapshot at the root of our filesystem (mount point path), delete it,
      # fsync the mount point path, crash and mount to replay the log. This should
      # succeed and after the filesystem is mounted the snapshot should not be visible
      # anymore.
      _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap1
      _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap1
      $XFS_IO_PROG -c "fsync" $SCRATCH_MNT
      _flakey_drop_and_remount
      [ -e $SCRATCH_MNT/snap1 ] && \
          echo "Snapshot snap1 still exists after log replay"
    
      # Similar scenario as above, but this time the snapshot is created inside a
      # directory and not directly under the root (mount point path).
      mkdir $SCRATCH_MNT/testdir
      _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/testdir/snap2
      _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/testdir/snap2
      $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir
      _flakey_drop_and_remount
      [ -e $SCRATCH_MNT/testdir/snap2 ] && \
          echo "Snapshot snap2 still exists after log replay"
    
      _unmount_flakey
    
      echo "Silence is golden"
      status=0
      exit
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Tested-by: Liu Bo <bo.li.liu@oracle.com>
    Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
    Signed-off-by: Chris Mason <clm@fb.com>
    fdmanana authored and masoncl committed Mar 1, 2016
  8. Merge tag 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/gi…

    …t/kdave/linux into for-linus-4.6
    
    Btrfs patchsets for 4.6
    masoncl committed Mar 1, 2016

Commits on Feb 28, 2016

  1. Linux 4.5-rc6

    torvalds committed Feb 28, 2016
  2. Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/tip/tip
    
    Pull perf fixes from Thomas Gleixner:
     "A rather largish series of 12 patches addressing a maze of race
      conditions in the perf core code from Peter Zijlstra"
    
    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      perf: Robustify task_function_call()
      perf: Fix scaling vs. perf_install_in_context()
      perf: Fix scaling vs. perf_event_enable()
      perf: Fix scaling vs. perf_event_enable_on_exec()
      perf: Fix ctx time tracking by introducing EVENT_TIME
      perf: Cure event->pending_disable race
      perf: Fix race between event install and jump_labels
      perf: Fix cloning
      perf: Only update context time when active
      perf: Allow perf_release() with !event->ctx
      perf: Do not double free
      perf: Close install vs. exit race
    torvalds committed Feb 28, 2016
  3. Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/l…

    …inux/kernel/git/tip/tip
    
    Pull x86 fixes from Thomas Gleixner:
     "This update contains:
    
       - Hopefully the last ASM CLAC fixups
    
       - A fix for the Quark family related to the IMR lock which makes
         kexec work again
    
       - A off-by-one fix in the MPX code.  Ironic, isn't it?
    
       - A fix for X86_PAE which addresses once more an unsigned long vs
         phys_addr_t hickup"
    
    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/mpx: Fix off-by-one comparison with nr_registers
      x86/mm: Fix slow_virt_to_phys() for X86_PAE again
      x86/entry/compat: Add missing CLAC to entry_INT80_32
      x86/entry/32: Add an ASM_CLAC to entry_SYSENTER_32
      x86/platform/intel/quark: Change the kernel's IMR lock bit to false
    torvalds committed Feb 28, 2016
  4. Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm…

    …/linux/kernel/git/tip/tip
    
    Pull scheduler fixlet from Thomas Gleixner:
     "A trivial printk typo fix"
    
    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      sched/deadline: Fix trivial typo in printk() message
    torvalds committed Feb 28, 2016
  5. Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/l…

    …inux/kernel/git/tip/tip
    
    Pull irq fixes from Thomas Gleixner:
     "Four small fixes for irqchip drivers:
    
       - Add missing low level irq handler initialization on mxs, so
         interrupts can acutally be delivered
    
       - Add a missing barrier to the GIC driver
    
       - Two fixes for the GIC-V3-ITS driver, addressing a double EOI write
         and a cache flush beyond the actual region"
    
    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      irqchip/gic-v3: Add missing barrier to 32bit version of gic_read_iar()
      irqchip/mxs: Add missing set_handle_irq()
      irqchip/gicv3-its: Avoid cache flush beyond ITS_BASERn memory size
      irqchip/gic-v3-its: Fix double ICC_EOIR write for LPI in EOImode==1
    torvalds committed Feb 28, 2016
  6. Merge tag 'staging-4.5-rc6' of git://git.kernel.org/pub/scm/linux/ker…

    …nel/git/gregkh/staging
    
    Pull staging/android fix from Greg KH:
     "Here is one patch, for the android binder driver, to resolve a
      reported problem.  Turns out it has been around for a while (since
      3.15), so it is good to finally get it resolved.
    
      It has been in linux-next for a while with no reported issues"
    
    * tag 'staging-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
      drivers: android: correct the size of struct binder_uintptr_t for BC_DEAD_BINDER_DONE
    torvalds committed Feb 28, 2016
  7. Merge tag 'usb-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/…

    …git/gregkh/usb
    
    Pull USB fixes from Greg KH:
     "Here are a few USB fixes for 4.5-rc6
    
      They fix a reported bug for some USB 3 devices by reverting the recent
      patch, a MAINTAINERS change for some drivers, some new device ids, and
      of course, the usual bunch of USB gadget driver fixes.
    
      All have been in linux-next for a while with no reported issues"
    
    * tag 'usb-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
      MAINTAINERS: drop OMAP USB and MUSB maintainership
      usb: musb: fix DMA for host mode
      usb: phy: msm: Trigger USB state detection work in DRD mode
      usb: gadget: net2280: fix endpoint max packet for super speed connections
      usb: gadget: gadgetfs: unregister gadget only if it got successfully registered
      usb: gadget: remove driver from pending list on probe error
      Revert "usb: hub: do not clear BOS field during reset device"
      usb: chipidea: fix return value check in ci_hdrc_pci_probe()
      usb: chipidea: error on overflow for port_test_write
      USB: option: add "4G LTE usb-modem U901"
      USB: cp210x: add IDs for GE B650V3 and B850V3 boards
      USB: option: add support for SIM7100E
      usb: musb: Fix DMA desired mode for Mentor DMA engine
      usb: gadget: fsl_qe_udc: fix IS_ERR_VALUE usage
      usb: dwc2: USB_DWC2 should depend on HAS_DMA
      usb: dwc2: host: fix the data toggle error in full speed descriptor dma
      usb: dwc2: host: fix logical omissions in dwc2_process_non_isoc_desc
      usb: dwc3: Fix assignment of EP transfer resources
      usb: dwc2: Add extra delay when forcing dr_mode
    torvalds committed Feb 28, 2016
Older
You can’t perform that action at this time.