Brian-Foster/x…

Commits on Apr 12, 2021

  1. xfs: set aside allocation btree blocks from block reservation

    The blocks used for allocation btrees (bnobt and countbt) are
    technically considered free space. This is because as free space is
    used, allocbt blocks are removed and naturally become available for
    traditional allocation. However, this means that a significant
    portion of free space may consist of in-use btree blocks if free
    space is severely fragmented.
    
    On large filesystems with large perag reservations, this can lead to
    a rare but nasty condition where a significant amount of physical
    free space is available, but the majority of actual usable blocks
    consist of in-use allocbt blocks. We have a record of a (~12TB, 32
    AG) filesystem with multiple AGs in a state with ~2.5GB or so free
    blocks tracked across ~300 total allocbt blocks, but effectively at
    100% full because the free space is entirely consumed by
    refcountbt perag reservation.
    
    Such a large perag reservation is by design on large filesystems.
    The problem is that because the free space is so fragmented, this AG
    contributes the 300 or so allocbt blocks to the global counters as
    free space. If this pattern repeats across enough AGs, the
    filesystem lands in a state where global block reservation can
    outrun physical block availability. For example, a streaming
    buffered write on the affected filesystem continues to allow delayed
    allocation beyond the point where writeback starts to fail due to
    physical block allocation failures. The expected behavior is for the
    delalloc block reservation to fail gracefully with -ENOSPC before
    physical block allocation failure is a possibility.
    
    To address this problem, introduce an in-core counter to track the
    sum of all allocbt blocks in use by the filesystem. Use the new
    counter to set these blocks aside at reservation time and thus
    ensure they cannot be reserved until truly available. Since this is
    only necessary when perag reservations are active and the counter
    requires a read of each AGF to fully populate, only enforce on perag
    res enabled filesystems. This allows initialization of the counter
    at ->pagf_init time because the perag reservation init code reads
    each AGF at mount time.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Brian Foster authored and intel-lab-lkp committed Apr 12, 2021
  2. xfs: set a mount flag when perag reservation is active

    perag reservation is enabled at mount time on a per AG basis. The
    upcoming in-core allocation btree accounting mechanism needs to know
    when reservation is enabled and that all perag AGF contexts are
    initialized. As a preparation step, set a flag in the mount
    structure and unconditionally initialize the pagf on all mounts
    where at least one reservation is active.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Brian Foster authored and intel-lab-lkp committed Apr 12, 2021
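
The reservation-time set-aside described in the Apr 12 commits above can be sketched as a small user-space model. This is an illustration only, not the kernel implementation: `struct model_mount`, `model_reserve_blocks()`, and every field name are hypothetical, and the model ignores the per-AG structures and counter machinery the real code would use.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the global counters; not the kernel's structures. */
struct model_mount {
	int64_t free_blocks;      /* global free count; still includes in-use allocbt blocks */
	int64_t allocbt_blocks;   /* in-core sum of allocbt blocks in use across all AGs */
	bool    perag_res_active; /* true when at least one perag reservation is active */
};

/*
 * Reserve 'want' blocks. When perag reservations are active, allocbt blocks
 * are set aside so they cannot be reserved as if they were truly free.
 * Returns 0 on success, -EINVAL for a negative request, or -ENOSPC if the
 * request cannot be satisfied.
 */
static int model_reserve_blocks(struct model_mount *mp, int64_t want)
{
	int64_t set_aside = mp->perag_res_active ? mp->allocbt_blocks : 0;

	if (want < 0)
		return -EINVAL;
	if (mp->free_blocks - set_aside < want)
		return -ENOSPC;

	mp->free_blocks -= want;
	return 0;
}
```

In this model, with 1000 free blocks of which 300 are allocbt blocks, a request for 701 blocks fails with -ENOSPC while a request for 700 succeeds. Without the set-aside, the larger request would be granted even though part of it is backed only by in-use btree blocks.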

Commits on Apr 9, 2021

  1. xfs: drop unnecessary setfilesize helper

    xfs_setfilesize() is the only remaining caller of the internal
    __xfs_setfilesize() helper. Fold them into a single function.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Brian Foster authored and Darrick J. Wong committed Apr 9, 2021
  2. xfs: drop unused ioend private merge and setfilesize code

    XFS no longer attaches anything to ioend->io_private. Remove the
    unnecessary ->io_private merging code. This removes the only remaining
    user of xfs_setfilesize_ioend() so remove that function as well.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Brian Foster authored and Darrick J. Wong committed Apr 9, 2021
  3. xfs: open code ioend needs workqueue helper

    Open code xfs_ioend_needs_workqueue() into the only remaining
    caller.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Brian Foster authored and Darrick J. Wong committed Apr 9, 2021
  4. xfs: drop submit side trans alloc for append ioends

    Per-inode ioend completion batching has a log reservation deadlock
    vector between preallocated append transactions and transactions
    that are acquired at completion time for other purposes (i.e.,
    unwritten extent conversion or COW fork remaps). For example, if the
    ioend completion workqueue task executes on a batch of ioends that
    are sorted such that an append ioend sits at the tail, it's possible
    for the outstanding append transaction reservation to block
    allocation of transactions required to process preceding ioends in
    the list.
    
    Append ioend completion is historically the common path for on-disk
    inode size updates. While file extending writes may have completed
    sometime earlier, the on-disk inode size is only updated after
    successful writeback completion. These transactions are preallocated
    serially from writeback context to mitigate concurrency and
    associated log reservation pressure across completions processed by
    multi-threaded workqueue tasks.
    
    However, now that delalloc blocks unconditionally map to unwritten
    extents at physical block allocation time, size updates via append
    ioends are relatively rare. This means that inode size updates most
    commonly occur as part of the preexisting completion time
    transaction to convert unwritten extents. As a result, there is no
    longer a strong need to preallocate size update transactions.
    
    Remove the preallocation of inode size update transactions to avoid
    the ioend completion processing log reservation deadlock. Instead,
    continue to send all potential size extending ioends to workqueue
    context for completion and allocate the transaction from that
    context. This ensures that no outstanding log reservation is owned
    by the ioend completion worker task when it begins to process
    ioends.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Brian Foster authored and Darrick J. Wong committed Apr 9, 2021
  5. xfs: fix return of uninitialized value in variable error

    A previous commit removed a call to xfs_attr3_leaf_read that
    assigned an error return code to variable error. We now have
    a few early error return paths to label 'out' that return
    error if error is set; however error now is uninitialized
    so potentially garbage is being returned.  Fix this by setting
    error to zero to restore the original behaviour where error
    was zero at the label 'restart'.
    
    Addresses-Coverity: ("Uninitialized scalar variable")
    Fixes: 07120f1 ("xfs: Add xfs_has_attr and subroutines")
    Signed-off-by: Colin Ian King <colin.king@canonical.com>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Colin Ian King authored and Darrick J. Wong committed Apr 9, 2021
  6. xfs: get rid of the ip parameter to xchk_setup_*

    Now that the scrub context stores a pointer to the file that was used to
    invoke the scrub call, the struct xfs_inode pointer that we passed to
    all the setup functions is no longer necessary.  This is only ever used
    if the caller wants us to scrub the metadata of the open file.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Darrick J. Wong committed Apr 9, 2021
  7. xfs: fix scrub and remount-ro protection when running scrub

    While running a new fstest that races a readonly remount with scrub
    running in repair mode, I observed the kernel tripping over debugging
    assertions in the log quiesce code that were checking that the CIL was
    empty.  When the sysadmin runs scrub in repair mode, the scrub code
    allocates real transactions (with reservations) to change things, but
    doesn't increment the superblock writers count to block a readonly
    remount attempt while it is running.
    
    We don't require the userspace caller to have a writable file descriptor
    to run repairs, so we have to call mnt_want_write_file to obtain freeze
    protection and increment the writers count.  It's ok to remove the call
    to sb_start_write for the dry-run case because commit 8321ddb
    removed the behavior where scrub and fsfreeze fight over the buffer LRU.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Darrick J. Wong committed Apr 9, 2021
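
The bug class addressed by "xfs: fix return of uninitialized value in variable error" in the Apr 9 list above is a common C pattern, sketched below. The function `model_has_attr()` and its parameters are hypothetical and do not mirror the real XFS attr code. The sketch only shows why `error` has to be initialized before any early jump to a shared exit label.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lookup with early exits through a shared label. */
static int model_has_attr(bool has_attr_fork, bool name_found)
{
	int error = 0;	/* without this initializer, the early exit below returns garbage */

	if (!has_attr_fork)
		goto out;	/* early exit taken before anything assigns to error */

	if (!name_found)
		error = -ENOENT;
out:
	return error;
}
```

If the initializer is dropped, the first `goto out` returns whatever happened to be on the stack, which is the kind of defect a static analyzer reports as an uninitialized scalar variable.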

Commits on Apr 7, 2021

  1. xfs: move the check for post-EOF mappings into xfs_can_free_eofblocks

    Fix the weird split of responsibilities between xfs_can_free_eofblocks
    and xfs_free_eofblocks by moving the chunk of code that looks for any
    actual post-EOF space mappings from the second function into the first.
    
    This clears the way for deferred inode inactivation to be able to decide
    if an inode needs inactivation work before committing the released inode
    to the inactivation code paths (vs. marking it for reclaim).
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Darrick J. Wong committed Apr 7, 2021
  2. xfs: move the xfs_can_free_eofblocks call under the IOLOCK

    In xfs_inode_free_eofblocks, move the xfs_can_free_eofblocks call
    further down in the function to the point where we have taken the
    IOLOCK.  This is preparation for the next patch, where we will need that
    lock (or equivalent) so that we can check if there are any post-eof
    blocks to clean out.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Darrick J. Wong committed Apr 7, 2021
  3. xfs: precalculate default inode attribute offset

    Default attr fork offset is based on inode size, so is a fixed
    geometry parameter of the inode. Move it to the xfs_ino_geometry
    structure and stop calculating it on every call to
    xfs_default_attroffset().
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Tested-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Dave Chinner authored and Darrick J. Wong committed Apr 7, 2021
  4. xfs: default attr fork size does not handle device inodes

    Device inodes have a non-default data fork size of 8 bytes
    as checked/enforced by xfs_repair. xfs_default_attroffset() doesn't
    handle this, so let's do a minor refactor so it does.
    
    Fixes: e6a688c ("xfs: initialise attr fork on inode create")
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Tested-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Dave Chinner authored and Darrick J. Wong committed Apr 7, 2021
  5. xfs: inode fork allocation depends on XFS_IFEXTENT flag

    Due to confusion on when the XFS_IFEXTENT needs to be set, the
    changes in e6a688c ("xfs: initialise attr fork on inode
    create") failed to set the flag when initialising the empty
    attribute fork at inode creation. Set this flag the same way
    xfs_bmap_add_attrfork() does after attr fork allocation.
    
    Fixes: e6a688c ("xfs: initialise attr fork on inode create")
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Tested-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Dave Chinner authored and Darrick J. Wong committed Apr 7, 2021
  6. xfs: eager inode attr fork init needs attr feature awareness

    The pitfalls of regression testing on a machine without realising
    that selinux was disabled. Only set the attr fork during inode
    allocation if the attr feature bits are already set on the
    superblock.
    
    Fixes: e6a688c ("xfs: initialise attr fork on inode create")
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Tested-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Dave Chinner authored and Darrick J. Wong committed Apr 7, 2021
  7. xfs: scrub: Disable check for unoptimized data fork bmbt node

    xchk_btree_check_minrecs() checks if the contents of the immediate child of a
    bmbt root block can fit within the root block. This check could fail on inodes
    with an attr fork since xfs_bmap_add_attrfork_btree() used to demote the
    current root node of the data fork as the child of a newly allocated root node
    if it found that the size of "struct xfs_btree_block" along with the space
    required for records exceeded that of space available in the data fork.
    
    xfs_bmap_add_attrfork_btree() should have used "struct xfs_bmdr_block" instead
    of "struct xfs_btree_block" for the above mentioned space requirement
    calculation. This commit disables the check for unoptimized (in terms of
    disk space usage) data fork bmbt trees since there could be filesystems
    in use that already have such a layout.
    
    Suggested-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    chandanr authored and Darrick J. Wong committed Apr 7, 2021
  8. xfs: Use struct xfs_bmdr_block instead of struct xfs_btree_block to c…

    …alculate root node size
    
    The incore data fork of an inode stores the bmap btree root node as 'struct
    xfs_btree_block'. However, the ondisk version of the inode stores the bmap
    btree root node as a 'struct xfs_bmdr_block'.
    
    xfs_bmap_add_attrfork_btree() checks if the btree root node fits inside the
    data fork of the inode. However, it incorrectly uses 'struct xfs_btree_block'
    to compute the size of the bmap btree root node. Since size of 'struct
    xfs_btree_block' is larger than that of 'struct xfs_bmdr_block',
    xfs_bmap_add_attrfork_btree() could end up unnecessarily demoting the current
    root node as the child of newly allocated root node.
    
    This commit optimizes space usage by modifying xfs_bmap_add_attrfork_btree()
    to use 'struct xfs_bmdr_block' to check if the bmap btree root node fits
    inside the data fork of the inode.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    chandanr authored and Darrick J. Wong committed Apr 7, 2021
  9. xfs: deprecate BMV_IF_NO_DMAPI_READ flag

    Use of the flag has had no effect since kernel commit 288699f
    ("xfs: drop dmapi hooks"), which removed all dmapi related code, so
    deprecate it.
    
    Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Anthony Iliopoulos authored and Darrick J. Wong committed Apr 7, 2021
  10. xfs: merge _xfs_dic2xflags into xfs_ip2xflags

    Merge _xfs_dic2xflags into its only caller.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
  11. xfs: move the di_crtime field to struct xfs_inode

    Move the crtime field from struct xfs_icdinode into struct xfs_inode and
    remove the now entirely unused struct xfs_icdinode.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
  12. xfs: move the di_flags2 field to struct xfs_inode

    In preparation of removing the historic icinode struct, move the flags2
    field into the containing xfs_inode structure.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
  13. xfs: move the di_flags field to struct xfs_inode

    In preparation of removing the historic icinode struct, move the flags
    field into the containing xfs_inode structure.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
  14. xfs: move the di_forkoff field to struct xfs_inode

    In preparation of removing the historic icinode struct, move the
    forkoff field into the containing xfs_inode structure.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
  15. xfs: use a union for i_cowextsize and i_flushiter

    The i_cowextsize field is only used for v3 inodes, and the i_flushiter
    field is only used for v1/v2 inodes.  Use a union to pack the inode a
    little better after adding a few missing guards around their usage.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
  16. xfs: use XFS_B_TO_FSB in xfs_ioctl_setattr

    Clean up xfs_ioctl_setattr a bit by using XFS_B_TO_FSB.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
  17. xfs: cleanup xfs_fill_fsxattr

    Add a local xfs_mount variable, and use the XFS_FSB_TO_B helper.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
  18. xfs: move the di_flushiter field to struct xfs_inode

    In preparation of removing the historic icinode struct, move the
    flushiter field into the containing xfs_inode structure.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
  19. xfs: move the di_cowextsize field to struct xfs_inode

    In preparation of removing the historic icinode struct, move the
    cowextsize field into the containing xfs_inode structure.  Also
    switch to use the xfs_extlen_t instead of a uint32_t.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
  20. xfs: move the di_extsize field to struct xfs_inode

    In preparation of removing the historic icinode struct, move the extsize
    field into the containing xfs_inode structure.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
  21. xfs: move the di_nblocks field to struct xfs_inode

    In preparation of removing the historic icinode struct, move the nblocks
    field into the containing xfs_inode structure.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
  22. xfs: move the di_size field to struct xfs_inode

    In preparation of removing the historic icinode struct, move the on-disk
    size field into the containing xfs_inode structure.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
  23. xfs: move the di_projid field to struct xfs_inode

    In preparation of removing the historic icinode struct, move the projid
    field into the containing xfs_inode structure.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
  24. xfs: don't clear the "dinode core" in xfs_inode_alloc

    The xfs_icdinode structure just contains a random mix of inode fields,
    which are all read from the on-disk inode and mostly not looked at
    before reading the inode or initializing a new inode cluster.  The
    only exceptions are the forkoff and blocks fields, which are used
    in sanity checks for freshly allocated inodes.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
  25. xfs: remove the di_dmevmask and di_dmstate fields from struct xfs_icd…

    …inode
    
    The legacy DMAPI fields were never set by upstream Linux XFS, and have no
    way to be read using the kernel APIs.  So instead of bloating the in-core
    inode for them just copy them from the on-disk inode into the log when
    logging the inode.  The only caveat is that we need to make sure to zero
    the fields for newly read or deleted inodes, which is solved using a new
    flag in the inode.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
  26. xfs: remove the unused xfs_icdinode_has_bigtime helper

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Christoph Hellwig authored and Darrick J. Wong committed Apr 7, 2021
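
The packing idea from "xfs: use a union for i_cowextsize and i_flushiter" in the Apr 7 list above can be sketched as follows. This is a simplified model under stated assumptions: `struct model_inode`, its field names and widths, and the accessor functions are hypothetical, not the layout of the real `struct xfs_inode`. The guards stand in for the "missing guards" the commit message mentions: each member is read only for the inode versions that use it.

```c
#include <stdint.h>

/* Hypothetical, simplified inode; field names and widths are assumptions. */
struct model_inode {
	uint8_t version;		/* 1, 2 or 3 */
	union {
		uint32_t cowextsize;	/* meaningful only when version == 3 */
		uint16_t flushiter;	/* meaningful only when version is 1 or 2 */
	};
};

/* Guarded accessors: never read the union member the version does not use. */
static uint32_t model_get_cowextsize(const struct model_inode *ip)
{
	return ip->version == 3 ? ip->cowextsize : 0;
}

static uint16_t model_get_flushiter(const struct model_inode *ip)
{
	return ip->version < 3 ? ip->flushiter : 0;
}
```

Because the two fields are never valid at the same time, sharing storage saves space without losing information, provided every access is guarded by the version check.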