Ian-Kent/vfs-c…

Commits on Nov 11, 2021

  1. xfs: make sure link path does not go away at access

    When following a trailing symlink in rcu-walk mode it's possible to
    succeed in getting the ->get_link() method pointer but for the link
    path string to be deallocated while it's being used.
    
    Utilize the rcu mechanism to mitigate this risk.
    
    Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
    Signed-off-by: Ian Kent <raven@themaw.net>
    raven-au authored and intel-lab-lkp committed Nov 11, 2021
  2. vfs: check dentry is still valid in get_link()

    When following a trailing symlink in rcu-walk mode it's possible for
    the dentry to become invalid between the last dentry seq lock check
    and getting the link (e.g. an unlink) leading to a backtrace similar
    to this:
    
    crash> bt
    PID: 10964  TASK: ffff951c8aa92f80  CPU: 3   COMMAND: "TaniumCX"
    …
     #7 [ffffae44d0a6fbe0] page_fault at ffffffff8d6010fe
        [exception RIP: unknown or invalid address]
        RIP: 0000000000000000  RSP: ffffae44d0a6fc90  RFLAGS: 00010246
        RAX: ffffffff8da3cc80  RBX: ffffae44d0a6fd30  RCX: 0000000000000000
        RDX: ffffae44d0a6fd98  RSI: ffff951aa9af3008  RDI: 0000000000000000
        RBP: 0000000000000000   R8: ffffae44d0a6fb94   R9: 0000000000000000
        R10: ffff951c95d8c318  R11: 0000000000080000  R12: ffffae44d0a6fd98
        R13: ffff951aa9af3008  R14: ffff951c8c9eb840  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
     #8 [ffffae44d0a6fc90] trailing_symlink at ffffffff8cf24e61
     #9 [ffffae44d0a6fcc8] path_lookupat at ffffffff8cf261d1
    #10 [ffffae44d0a6fd28] filename_lookup at ffffffff8cf2a700
    #11 [ffffae44d0a6fe40] vfs_statx at ffffffff8cf1dbc4
    #12 [ffffae44d0a6fe98] __do_sys_newstat at ffffffff8cf1e1f9
    #13 [ffffae44d0a6ff38] do_syscall_64 at ffffffff8cc0420b
    
    Most of the time this is not a problem because the inode is unchanged
    while the rcu read lock is held.
    
    But xfs can re-use inodes which can result in the inode ->get_link()
    method becoming invalid (or NULL).
    
    This case needs to be checked for in fs/namei.c:get_link() and if
    detected the walk re-started.
    
    Signed-off-by: Ian Kent <raven@themaw.net>
    raven-au authored and intel-lab-lkp committed Nov 11, 2021
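
The re-check described in "vfs: check dentry is still valid in get_link()" can be sketched in userspace as follows. This is a minimal, single-threaded illustration of the control flow only: `toy_inode`, `toy_dentry` and `toy_follow_link()` are hypothetical stand-ins, not the kernel's structures or the actual patch, and the plain integer `seq` merely imitates a sequence counter. Returning -ECHILD mirrors the convention by which rcu-walk asks the caller to retry the lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's inode and dentry (not the real layouts). */
struct toy_inode {
    const char *(*get_link)(const struct toy_inode *inode);
    const char *target;
};

struct toy_dentry {
    unsigned seq;              /* imitates a sequence count bumped on every change */
    struct toy_inode *inode;
};

/* A trivial ->get_link() implementation that returns the stored target. */
static const char *toy_get_link(const struct toy_inode *inode)
{
    return inode->target;
}

/*
 * Follow a symlink during a lockless walk.
 * seq_seen is the sequence value sampled at the last validity check.
 * Returns 0 and stores the link body in *out, or -ECHILD when the dentry
 * changed (or the method pointer vanished) and the walk must be restarted.
 */
static int toy_follow_link(const struct toy_dentry *dentry, unsigned seq_seen,
                           const char **out)
{
    const char *(*get)(const struct toy_inode *);

    assert(dentry != NULL && out != NULL);

    /* Sample the method pointer once, then validate before calling it. */
    get = dentry->inode->get_link;
    if (dentry->seq != seq_seen || get == NULL)
        return -ECHILD;

    *out = get(dentry->inode);
    return 0;
}
```

The point of the sketch is the ordering: the method pointer is read first, and it is only called after both the sequence check and the NULL check pass, so a reused inode causes a restart instead of a jump through a NULL pointer.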

Commits on Oct 30, 2021

  1. xfs: use swap() to make code cleaner

    Use swap() in order to make code cleaner. Issue found by coccinelle.
    
    Reported-by: Zeal Robot <zealci@zte.com.cn>
    Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Changcheng Deng authored and Darrick J. Wong committed Oct 30, 2021
  2. xfs: Remove duplicated include in xfs_super

    Fix the following checkincludes.pl warning:
    ./fs/xfs/xfs_super.c: xfs_btree.h is included more than once.
    
    The include is in line 15. Remove the duplicate here.
    
    Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Wan Jiabing authored and Darrick J. Wong committed Oct 30, 2021
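
The kind of cleanup made by "xfs: use swap() to make code cleaner" can be illustrated with a small userspace sketch. The `swap()` macro below is an approximation written for this example (it relies on the GNU C `__typeof__` extension), and `order_open_coded()` / `order_with_swap()` are hypothetical helpers; none of this is the code the patch touched.

```c
#include <assert.h>

/* Userspace approximation of a swap() helper; needs GNU C __typeof__. */
#define swap(a, b) \
    do { __typeof__(a) tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Before: the exchange is open-coded with an explicit temporary. */
static void order_open_coded(int *lo, int *hi)
{
    if (*lo > *hi) {
        int tmp = *lo;

        *lo = *hi;
        *hi = tmp;
    }
}

/* After: the same behaviour, with the temporary hidden inside swap(). */
static void order_with_swap(int *lo, int *hi)
{
    if (*lo > *hi)
        swap(*lo, *hi);
}
```

Both helpers leave the smaller value in `*lo`; the second one simply removes three lines of boilerplate, which is the pattern coccinelle flags.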

Commits on Oct 22, 2021

  1. xfs: punch out data fork delalloc blocks on COW writeback failure

    If writeback I/O to a COW extent fails, the COW fork blocks are
    punched out and the data fork blocks left alone. It is possible for
    COW fork blocks to overlap non-shared data fork blocks (due to
    cowextsz hint prealloc), however, and writeback unconditionally maps
    to the COW fork whenever blocks exist at the corresponding offset of
    the page undergoing writeback. This means it's quite possible for a
    COW fork extent to overlap delalloc data fork blocks, writeback to
    convert and map to the COW fork blocks, writeback to fail, and
    finally for ioend completion to cancel the COW fork blocks and leave
    stale data fork delalloc blocks around in the inode. The blocks are
    effectively stale because writeback failure also discards dirty page
    state.
    
    If this occurs, it is likely to trigger assert failures, free space
    accounting corruption and failures in unrelated file operations. For
    example, a subsequent reflink attempt of the affected file to a new
    target file will trip over the stale delalloc in the source file and
    fail. Several of these issues are occasionally reproduced by
    generic/648, but are reproducible on demand with the right sequence
    of operations and timely I/O error injection.
    
    To fix this problem, update the ioend failure path to also punch out
    underlying data fork delalloc blocks on I/O error. This is analogous
    to the writeback submission failure path in xfs_discard_page() where
    we might fail to map data fork delalloc blocks and consistent with
    the successful COW writeback completion path, which is responsible
    for unmapping from the data fork and remapping in COW fork blocks.
    
    Fixes: 787eb48 ("xfs: fix and streamline error handling in xfs_end_io")
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Brian Foster authored and Darrick J. Wong committed Oct 22, 2021
  2. xfs: remove unused parameter from refcount code

    The owner info parameter is always NULL, so get rid of the parameter.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Darrick J. Wong committed Oct 22, 2021
  3. xfs: reduce the size of struct xfs_extent_free_item

    We only use EFIs to free metadata blocks -- not regular data/attr fork
    extents.  Remove all the fields that we never use, for a net reduction
    of 16 bytes.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Darrick J. Wong committed Oct 22, 2021
  4. xfs: rename xfs_bmap_add_free to xfs_free_extent_later

    xfs_bmap_add_free isn't a block mapping function; it schedules deferred
    freeing operations for a later point in a compound transaction chain.
    While it's primarily used by bunmapi, its use has expanded beyond that.
    Move it to xfs_alloc.c and rename the function since it's now general
    freeing functionality.  Bring the slab cache bits in line with the
    way we handle the other intent items.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Darrick J. Wong committed Oct 22, 2021
  5. xfs: create slab caches for frequently-used deferred items

    Create slab caches for the high-level structures that coordinate
    deferred intent items, since they're used fairly heavily.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Darrick J. Wong committed Oct 22, 2021
  6. xfs: compact deferred intent item structures

    Rearrange these structs to reduce the amount of unused padding bytes.
    This saves eight bytes for each of the three structs changed here, which
    means they're now all (rmap/bmap are 64 bytes, refc is 32 bytes) even
    powers of two.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Darrick J. Wong committed Oct 22, 2021
  7. xfs: rename _zone variables to _cache

    Now that we've gotten rid of the kmem_zone_t typedef, rename the
    variables to _cache since that's what they are.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Darrick J. Wong committed Oct 22, 2021
  8. xfs: remove kmem_zone typedef

    Remove these typedefs by referencing kmem_cache directly.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Darrick J. Wong committed Oct 22, 2021
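
The effect described in "xfs: compact deferred intent item structures" comes from field ordering alone. The sketch below uses two hypothetical structs (`item_padded` and `item_packed`, not the actual XFS intent items) with identical fields in a different order. On a typical LP64 target the reordered version is eight bytes smaller, but exact sizes depend on the ABI, so the code only asserts that reordering never makes the struct larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical item: 32-bit fields interleaved with 64-bit ones. */
struct item_padded {
    void     *next;   /* pointer-sized */
    uint32_t  flags;  /* typically followed by 4 bytes of padding */
    uint64_t  start;  /* 8-byte aligned on most 64-bit ABIs */
    uint32_t  len;    /* typically followed by 4 bytes of tail padding */
};

/* Same fields, with the two 32-bit members placed next to each other. */
struct item_packed {
    void     *next;
    uint64_t  start;
    uint32_t  flags;
    uint32_t  len;
};

/* Reordering fields must never grow the structure. */
_Static_assert(sizeof(struct item_packed) <= sizeof(struct item_padded),
               "reordered struct should not be larger");

/* Bytes saved by the reordering on the current target. */
static size_t padding_saved(void)
{
    return sizeof(struct item_padded) - sizeof(struct item_packed);
}
```

For slab-allocated objects, landing on a power-of-two size means objects tile a page exactly, which is why the commit calls out the 64- and 32-byte results.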

Commits on Oct 19, 2021

  1. xfs: use separate btree cursor cache for each btree type

    Now that we have the infrastructure to track the max possible height of
    each btree type, we can create a separate slab cache for cursors of each
    type of btree.  For smaller indices like the free space btrees, this
    means that we can pack more cursors into a slab page, improving slab
    utilization.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  2. xfs: compute absolute maximum nlevels for each btree type

    Add code for all five btree types so that we can compute the absolute
    maximum possible btree height for each btree type.  This is a setup for
    the next patch, which makes every btree type have its own cursor cache.
    
    The functions are exported so that we can have xfs_db report the
    absolute maximum btree heights for each btree type, rather than making
    everyone run their own ad-hoc computations.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  3. xfs: kill XFS_BTREE_MAXLEVELS

    Nobody uses this symbol anymore, so kill it.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  4. xfs: compute the maximum height of the rmap btree when reflink enabled

    Instead of assuming that the hardcoded XFS_BTREE_MAXLEVELS value is big
    enough to handle the maximally tall rmap btree when all blocks are in
    use and maximally shared, let's compute the maximum height assuming the
    rmapbt consumes as many blocks as possible.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  5. xfs: clean up xfs_btree_{calc_size,compute_maxlevels}

    During review of the next patch, Dave remarked that he found these two
    btree geometry calculation functions lacking in documentation and that
    they performed more work than was really necessary.
    
    These functions take the same parameters and have nearly the same logic;
    the only real difference is in the return values.  Reword the function
    comment to make it clearer what each function does, and move them to be
    adjacent to reinforce their relation.
    
    Clean up both of them to stop opencoding the howmany functions, stop
    using the uint typedefs, and make them both support computations for
    more than 2^32 leaf records, since we're going to need all of the above
    for files with large data forks and large rmap btrees.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  6. xfs: compute maximum AG btree height for critical reservation calculation
    
    Compute the actual maximum AG btree height for deciding if a per-AG
    block reservation is critically low.  This only affects the sanity check
    condition, since we /generally/ will trigger on the 10% threshold.  This
    is a long-winded way of saying that we're removing one more usage of
    XFS_BTREE_MAXLEVELS.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  7. xfs: rename m_ag_maxlevels to m_allocbt_maxlevels

    Years ago when XFS was thought to be much more simple, we introduced
    m_ag_maxlevels to specify the maximum btree height of per-AG btrees for
    a given filesystem mount.  Then we observed that inode btrees don't
    actually have the same height and split that off; and now we have rmap
    and refcount btrees with much different geometries and separate
    maxlevels variables.
    
    The 'ag' part of the name doesn't make much sense anymore, so rename
    this to m_alloc_maxlevels to reinforce that this is the maximum height
    of the *free space* btrees.  This sets us up for the next patch, which
    will add a variable to track the maximum height of all AG btrees.
    
    (Also take the opportunity to improve adjacent comments and fix minor
    style problems.)
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  8. xfs: dynamically allocate cursors based on maxlevels

    To support future btree code, we need to be able to size btree cursors
    dynamically for very large btrees.  Switch the maxlevels computation to
    use the precomputed values in the superblock, and create cursors that
    can handle a certain height.  For now, we retain the btree cursor cache
    that can handle up to 9-level btrees, though a subsequent patch
    introduces separate caches for each btree type, where each cache's
    objects will be exactly tall enough to handle the specific btree type.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  9. xfs: encode the max btree height in the cursor

    Encode the maximum btree height in the cursor, since we're soon going to
    allow smaller cursors for AG btrees and larger cursors for file btrees.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  10. xfs: refactor btree cursor allocation function

    Refactor btree allocation to a common helper.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  11. xfs: rearrange xfs_btree_cur fields for better packing

    Reduce the size of the btree cursor structure some more by rearranging
    fields to eliminate unused space.  While we're at it, fix the ragged
    indentation and a spelling error.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  12. xfs: prepare xfs_btree_cur for dynamic cursor heights

    Split out the btree level information into a separate struct and put it
    at the end of the cursor structure as a VLA.  Files with huge data forks
    (and in the future, the realtime rmap btree) will require the ability to
    support many more levels than a per-AG btree cursor, which means that
    we're going to create per-btree type cursor caches to conserve memory
    for the more common case.
    
    Note that a subsequent patch actually introduces dynamic cursor heights.
    This one merely rearranges the structure to prepare for that.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  13. xfs: dynamically allocate btree scrub context structure

    Reorganize struct xchk_btree so that we can dynamically size the context
    structure to fit the type of btree cursor that we have.  This will
    enable us to use memory more efficiently once we start adding very tall
    btree types.  Right-size the lastkey array to match the number of *node*
    levels in the tree so that we stop wasting space.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  14. xfs: don't track firstrec/firstkey separately in xchk_btree

    The btree scrubbing code checks that the records (or keys) that it finds
    in a btree block are all in order by calling the btree cursor's
    ->recs_inorder function.  This of course makes no sense for the first
    item in the block, so we switch that off with a separate variable in
    struct xchk_btree.
    
    Christoph helped me figure out that the variable is unnecessary, since
    we just accessed bc_ptrs[level] and can compare that against zero.  Use
    that, and save ourselves some memory space.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  15. xfs: reduce the size of nr_ops for refcount btree cursors

    We're never going to run more than 4 billion btree operations on a
    refcount cursor, so shrink the field to an unsigned int to reduce the
    structure size.  Fix whitespace alignment too.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  16. xfs: remove xfs_btree_cur.bc_blocklog

    This field isn't used by anyone, so get rid of it.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  17. xfs: fix incorrect decoding in xchk_btree_cur_fsbno

    During review of subsequent patches, Dave and I noticed that this
    function doesn't work quite right -- accessing cur->bc_ino depends on
    the ROOT_IN_INODE flag, not LONG_PTRS.  Fix that and the parentheses
    issue.  While we're at it, remove the piece that accesses cur->bc_ag,
    because block 0 of an AG is never part of a btree.
    
    Note: This changes the btree scrubber tracepoints behavior -- if the
    cursor has no buffer for a certain level, it will always report
    NULLFSBLOCK.  It is assumed that anyone tracing the online fsck code
    will also be tracing xchk_start/xchk_done or otherwise be aware of what
    exactly is being scrubbed.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Darrick J. Wong committed Oct 19, 2021
  18. xfs: fix perag reference leak on iteration race with growfs

    The for_each_perag*() set of macros are hacky in that some (i.e.
    those based on sb_agcount) rely on the assumption that perag
    iteration terminates naturally with a NULL perag at the specified
    end_agno. Others allow for the final AG to have a valid perag and
    require the calling function to clean up any potential leftover
    xfs_perag reference on termination of the loop.
    
    Aside from providing a subtly inconsistent interface, the former
    variant is racy with growfs because growfs can create discoverable
    post-EOFS perags before the final superblock update that completes
    the grow operation and increases sb_agcount. This leads to the
    following assert failure (reproduced by xfs/104) in the perag free
    path during unmount:
    
     XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/libxfs/xfs_ag.c, line: 195
    
    This occurs because one of the many for_each_perag() loops in the
    code that is expected to terminate with a NULL pag (and thus has no
    post-loop xfs_perag_put() check) raced with a growfs and found a
    non-NULL post-EOFS perag, but terminated naturally based on the
    end_agno check without releasing the post-EOFS perag.
    
    Rework the iteration logic to lift the agno check from the main for
    loop conditional to the iteration helper function. The for loop now
    purely terminates on a NULL pag and xfs_perag_next() avoids taking a
    reference to any perag beyond end_agno in the first place.
    
    Fixes: f250eed ("xfs: make for_each_perag... a first class citizen")
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Brian Foster authored and Darrick J. Wong committed Oct 19, 2021
  19. xfs: terminate perag iteration reliably on agcount

    The for_each_perag_from() iteration macro relies on sb_agcount to
    process every perag currently within EOFS from a given starting
    point. It's perfectly valid to have perag structures beyond
    sb_agcount, however, such as if a growfs is in progress. If a perag
    loop happens to race with growfs in this manner, it will actually
    attempt to process the post-EOFS perag where ->pag_agno ==
    sb_agcount. This is reproduced by xfs/104 and manifests as the
    following assert failure in superblock write verifier context:
    
     XFS: Assertion failed: agno < mp->m_sb.sb_agcount, file: fs/xfs/libxfs/xfs_types.c, line: 22
    
    Update the corresponding macro to only process perags that are
    within the current sb_agcount.
    
    Fixes: 58d43a7 ("xfs: pass perags around in fsmap data dev functions")
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Brian Foster authored and Darrick J. Wong committed Oct 19, 2021
  20. xfs: rename the next_agno perag iteration variable

    Rename the next_agno variable to be consistent across the several
    iteration macros and shorten line length.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Brian Foster authored and Darrick J. Wong committed Oct 19, 2021
  21. xfs: fold perag loop iteration logic into helper function

    Fold the loop iteration logic into a helper in preparation for
    further fixups. No functional change in this patch.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Brian Foster authored and Darrick J. Wong committed Oct 19, 2021
  22. xfs: replace snprintf in show functions with sysfs_emit

    coccicheck complains about the use of snprintf() in sysfs show functions.
    
    Fix the coccicheck warning:
    WARNING: use scnprintf or sprintf.
    
    Using sysfs_emit instead of scnprintf or sprintf makes more sense.
    
    Signed-off-by: Qing Wang <wangqing@vivo.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Qing Wang authored and Darrick J. Wong committed Oct 19, 2021
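
The btree cursor series in this group combines two ideas: computing the worst-case height of a btree from its geometry, and sizing the cursor's per-level array to that height. A minimal userspace sketch of both is given below. All names (`toy_btree_maxlevels()`, `toy_cursor`, `toy_cursor_alloc()`) are hypothetical, the geometry parameters are simplified to "records per leaf" and "pointers per node", and `calloc()` stands in for a slab cache.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round-up division on 64-bit values, written to avoid overflow. */
static uint64_t howmany_64(uint64_t x, uint64_t y)
{
    return x / y + (x % y != 0);
}

/*
 * Height of a btree that stores `records` leaf records, given how many
 * records fit in a leaf block and how many pointers fit in a node block.
 * An empty tree is still counted as one level (a lone root block).
 */
static unsigned toy_btree_maxlevels(uint64_t records, unsigned leaf_recs,
                                    unsigned node_ptrs)
{
    uint64_t blocks;
    unsigned height = 1;

    assert(leaf_recs > 0 && node_ptrs > 1);

    blocks = howmany_64(records, leaf_recs);   /* leaf blocks needed */
    while (blocks > 1) {
        blocks = howmany_64(blocks, node_ptrs); /* blocks one level up */
        height++;
    }
    return height;
}

/* Per-level cursor state, split out so it can live in a trailing array. */
struct toy_level {
    void     *buf;
    unsigned  ptr;
};

struct toy_cursor {
    unsigned         maxlevels;  /* capacity of levels[] */
    unsigned         nlevels;    /* levels currently in use */
    struct toy_level levels[];   /* flexible array member, sized at allocation */
};

/* Allocate a zeroed cursor with room for exactly `maxlevels` levels. */
static struct toy_cursor *toy_cursor_alloc(unsigned maxlevels)
{
    struct toy_cursor *cur;

    cur = calloc(1, sizeof(*cur) + maxlevels * sizeof(cur->levels[0]));
    if (cur)
        cur->maxlevels = maxlevels;
    return cur;
}
```

Because each btree type has its own worst-case height, a cache of cursors sized for a short tree holds smaller objects than one sized for a tall tree, which is the memory saving the per-type cursor caches are after.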

Commits on Oct 14, 2021

  1. xfs: remove the xfs_dqblk_t typedef

    Remove the few leftover instances of the xfs_dqblk_t typedef.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Oct 14, 2021