tag: v2.6.39-rc3
Commits on Apr 12, 2011
  1. @torvalds

    Linux 2.6.39-rc3

    torvalds authored
Commits on Apr 11, 2011
  1. @torvalds

    Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

    torvalds authored
    * 'for-linus' of git://oss.sgi.com/xfs/xfs:
      xfs: use proper interfaces for on-stack plugging
      xfs: fix xfs_debug warnings
      xfs: fix variable set but not used warnings
      xfs: convert log tail checking to a warning
      xfs: catch bad block numbers freeing extents.
      xfs: push the AIL from memory reclaim and periodic sync
      xfs: clean up code layout in xfs_trans_ail.c
      xfs: convert the xfsaild threads to a workqueue
      xfs: introduce background inode reclaim work
      xfs: convert ENOSPC inode flushing to use new syncd workqueue
      xfs: introduce a xfssyncd workqueue
      xfs: fix extent format buffer allocation size
      xfs: fix unreferenced var error in xfs_buf.c
    
    Also, applied patch from Tony Luck that fixes ia64:
      xfs_destroy_workqueues() should not be tagged with __exit
    in the branch before merging.
  2. @torvalds

    xfs_destroy_workqueues() should not be tagged with __exit

    Luck, Tony authored torvalds committed
    ia64 throws away .exit sections for the built-in CONFIG case, so routines
    that are used in other circumstances should not be tagged as __exit.
    
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Alex Elder <aelder@sgi.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. @torvalds

    Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel…

    torvalds authored
    …/git/tytso/ext4
    
    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
      ext4: fix data corruption regression by reverting commit 6de9843
      ext4: Allow indirect-block file to grow the file size to max file size
      ext4: allow an active handle to be started when freezing
      ext4: sync the directory inode in ext4_sync_parent()
      ext4: init timer earlier to avoid a kernel panic in __save_error_info
      jbd2: fix potential memory leak on transaction commit
      ext4: fix a double free in ext4_register_li_request
      ext4: fix credits computing for indirect mapped files
      ext4: remove unnecessary [cm]time update of quota file
      jbd2: move bdget out of critical section
  4. @torvalds

    Merge branch 'for-2.6.39' of git://linux-nfs.org/~bfields/linux

    torvalds authored
    * 'for-2.6.39' of git://linux-nfs.org/~bfields/linux:
      nfsd4: fix oops on lock failure
      nfsd: fix auth_domain reference leak on nlm operations
  5. @torvalds

    Merge branch 'spi/merge' of git://git.secretlab.ca/git/linux-2.6

    torvalds authored
    * 'spi/merge' of git://git.secretlab.ca/git/linux-2.6:
      dt/fsldma: fix build warning caused by of_platform_device changes
      spi: Fix race condition in stop_queue()
      gpio/pch_gpio: Fix output value of pch_gpio_direction_output()
      gpio/ml_ioh_gpio: Fix output value of ioh_gpio_direction_output()
      gpio/pca953x: fix error handling path in probe() call
  6. @torvalds

    pci: fix PCI bus allocation alignment handling

    torvalds authored
    In commit 13583b1 ("PCI: refactor io size calculation code") Ram
    had a thinko in the refactorization of the code: the end result used the
    variable 'align' for the bus alignment, but the original code used
    'min_align'.
    
    Since then, another use of that 'align' variable got introduced by
    commit c8adf9a ("PCI: pre-allocate additional resources to devices
    only after successful allocation of essential resources.")
    
    Fix both of those uses to use 'min_align' as they should.
    
    Daniel Hellstrom <daniel@gaisler.com>
    Acked-by: Ram Pai <linuxram@us.ibm.com>
    Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. @torvalds

    Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

    torvalds authored
    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
      net: Add support for SMSC LAN9530, LAN9730 and LAN89530
      mlx4_en: Restoring RX buffer pointer in case of failure
      mlx4: Sensing link type at device initialization
      ipv4: Fix "Set rt->rt_iif more sanely on output routes."
      MAINTAINERS: add entry for Xen network backend
      be2net: Fix suspend/resume operation
      be2net: Rename some struct members for clarity
      pppoe: drop PPPOX_ZOMBIEs in pppoe_flush_dev
      dsa/mv88e6131: add support for mv88e6085 switch
      ipv6: Enable RFS sk_rxhash tracking for ipv6 sockets (v2)
      be2net: Fix a potential crash during shutdown.
      bna: Fix for handling firmware heartbeat failure
      can: mcp251x: Allow pass IRQ flags through platform data.
      smsc911x: fix mac_lock acquisition before calling smsc911x_mac_read
      iwlwifi: accept EEPROM version 0x423 for iwl6000
      rt2x00: fix cancelling uninitialized work
      rtlwifi: Fix some warnings/bugs
      p54usb: IDs for two new devices
      wl12xx: fix potential buffer overflow in testmode nvs push
      zd1211rw: reset rx idle timer from tasklet
      ...
  8. dt/fsldma: fix build warning caused by of_platform_device changes

    Ira W. Snyder authored Grant Likely committed
    Commit 0000612, "dt/powerpc:
    Eliminate users of of_platform_{,un}register_driver" forgot to convert
    the type of structure passed into platform_device_register() when it
    was converted from of_platform_device_register. Fix it.
    
    Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
    Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
  9. @tytso

    ext4: fix data corruption regression by reverting commit 6de9843

    tytso authored
    Revert commit 6de9843, since it
    caused a data corruption regression with BitTorrent downloads.  Thanks
    to Damien for discovering and bisecting to find the problem commit.
    
    https://bugzilla.kernel.org/show_bug.cgi?id=32972
    
    Reported-by: Damien Grassart <damien@grassart.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  10. @tytso

    ext4: Allow indirect-block file to grow the file size to max file size

    Kazuya Mio authored tytso committed
    We can create a 4402345721856-byte file with indirect block mapping.
    However, if we grow an indirect-block file to that size with ftruncate(),
    we see an ext4 warning. The following patch fixes this problem.
    
    How to reproduce:
    # dd if=/dev/zero of=/mnt/mp1/hoge bs=1 count=0 seek=4402345721856
    0+0 records in
    0+0 records out
    0 bytes (0 B) copied, 0.000221428 s, 0.0 kB/s
    # tail -n 1 /var/log/messages
    Nov 25 15:10:27 test kernel: EXT4-fs warning (device sda8): ext4_block_to_path:345: block 1074791436 > max in inode 12
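
The size used in the reproduction can be checked by hand. The following is a minimal sketch, assuming 4 KiB filesystem blocks and 4-byte block pointers (neither is stated in the commit message); the function names are hypothetical and not taken from ext4. Under those assumptions, the classic scheme of 12 direct pointers plus single, double and triple indirect blocks addresses 1074791436 blocks, the same number that appears in the warning, which gives 4402345721856 bytes.

```c
#include <stdint.h>

/*
 * Number of data blocks addressable through 12 direct pointers plus
 * single, double and triple indirect blocks.
 * Assumption: block pointers are 4 bytes wide, so one block holds
 * block_size / 4 pointers.
 */
static uint64_t indirect_max_blocks(uint64_t block_size)
{
    uint64_t ptrs = block_size / 4;

    return 12 + ptrs + ptrs * ptrs + ptrs * ptrs * ptrs;
}

/* Largest file size in bytes that such a mapping can describe. */
static uint64_t indirect_max_bytes(uint64_t block_size)
{
    return indirect_max_blocks(block_size) * block_size;
}
```

With a 4096-byte block size, indirect_max_blocks() yields 1074791436 and indirect_max_bytes() yields 4402345721856, matching the seek value passed to dd above.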
    
    Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  11. @YANGYongqiang @tytso

    ext4: allow an active handle to be started when freezing

    YANGYongqiang authored tytso committed
    ext4_journal_start_sb() should not prevent an active handle from being
    started due to s_frozen.  Otherwise a deadlock can easily occur, as in
    the situation below.
    
    ================================================
         freeze         |       truncate
    ================================================
                        |  ext4_ext_truncate()
        freeze_super()  |   starts a handle
        sets s_frozen   |
                        |  ext4_ext_truncate()
                        |  holds i_data_sem
      ext4_freeze()     |
      waits for updates |
                        |  ext4_free_blocks()
                        |  calls dquot_free_block()
                        |
                        |  dquot_free_blocks()
                        |  calls ext4_dirty_inode()
                        |
                        |  ext4_dirty_inode()
                        |  tries to start an active
                        |  handle
                        |
                        |  blocks due to s_frozen
    ================================================
    
    Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Reported-by: Amir Goldstein <amir73il@users.sf.net>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Andreas Dilger <adilger@dilger.ca>
  12. @tytso

    ext4: sync the directory inode in ext4_sync_parent()

    Curt Wohlgemuth authored tytso committed
    ext4 has taken the stance that, in the absence of a journal,
    when an fsync/fdatasync of an inode is done, the parent
    directory should be sync'ed if this inode entry is new.
    ext4_sync_parent(), which implements this, does indeed sync
    the dirent pages for parent directories, but it does not
    sync the directory *inode*.  This patch fixes this.
    
    Also now return error status from ext4_sync_parent().
    
    I tested this using a power fail test, which panics a
    machine running a file server getting requests from a
    client.  Without this patch, on about every other test run,
    the server is missing many, many files that had been synced.
    With this patch, on > 6 runs, I see zero files being lost.
    
    Google-Bug-Id: 4179519
    Signed-off-by: Curt Wohlgemuth <curtw@google.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  13. @davem330

    net: Add support for SMSC LAN9530, LAN9730 and LAN89530

    Steve Glendinning authored davem330 committed
    This patch adds support for SMSC's LAN9530, LAN9730 and LAN89530 USB
    ethernet controllers to the existing smsc95xx driver by adding
    their new USB VID/PID pairs.
    
    Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
Commits on Apr 10, 2011
  1. @torvalds

    Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel…

    torvalds authored
    …/git/tiwai/sound-2.6
    
    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
      ALSA: hda - Don't query connections for widgets have no connections
      ALSA: HDA: Fix single internal mic on ALC275 (Sony Vaio VPCSB1C5E)
      ALSA: hda - HDMI: Fix MCP7x audio infoframe checksums
      ALSA: usb-audio: define another USB ID for a buggy USB MIDI cable
      ALSA: HDA: Fix dock mic for Lenovo X220-tablet
      ASoC: format_register_str: Don't clip register values
      ASoC: PXA: Fix oops in __pxa2xx_pcm_prepare
      ASoC: zylonite: set .codec_dai_name in initializer
  2. nfsd4: fix oops on lock failure

    J. Bruce Fields authored
    Lock stateid's can have access_bmap 0 if they were only partially
    initialized (due to a failed lock request); handle that case in
    free_generic_stateid.
    
    ------------[ cut here ]------------
    kernel BUG at fs/nfsd/nfs4state.c:380!
    invalid opcode: 0000 [#1] SMP
    last sysfs file: /sys/kernel/mm/ksm/run
    Modules linked in: nfs fscache md4 nls_utf8 cifs ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat bridge stp llc nfsd lockd nfs_acl auth_rpcgss sunrpc ipv6 ppdev parport_pc parport pcnet32 mii pcspkr microcode i2c_piix4 BusLogic floppy [last unloaded: mperf]
    
    Pid: 1468, comm: nfsd Not tainted 2.6.38+ #120 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
    EIP: 0060:[<e24f180d>] EFLAGS: 00010297 CPU: 0
    EIP is at nfs4_access_to_omode+0x1c/0x29 [nfsd]
    EAX: ffffffff EBX: dd758120 ECX: 00000000 EDX: 00000004
    ESI: dd758120 EDI: ddfe657c EBP: dd54dde0 ESP: dd54dde0
     DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    Process nfsd (pid: 1468, ti=dd54c000 task=ddc92580 task.ti=dd54c000)
    Stack:
     dd54ddf0 e24f19ca 00000000 ddfe6560 dd54de08 e24f1a5d dd758130 deee3a20
     ddfe6560 31270000 dd54df1c e24f52fd 0000000f dd758090 e2505dd0 0be304cf
     dbb51d68 0000000e ddfe657c ddcd8020 dd758130 dd758128 dd7580d8 dd54de68
    Call Trace:
     [<e24f19ca>] free_generic_stateid+0x1c/0x3e [nfsd]
     [<e24f1a5d>] release_lockowner+0x71/0x8a [nfsd]
     [<e24f52fd>] nfsd4_lock+0x617/0x66c [nfsd]
     [<e24e57b6>] ? nfsd_setuser+0x199/0x1bb [nfsd]
     [<e24e056c>] ? nfsd_setuser_and_check_port+0x65/0x81 [nfsd]
     [<c07a0052>] ? _cond_resched+0x8/0x1c
     [<c04ca61f>] ? slab_pre_alloc_hook.clone.33+0x23/0x27
     [<c04cac01>] ? kmem_cache_alloc+0x1a/0xd2
     [<c04835a0>] ? __call_rcu+0xd7/0xdd
     [<e24e0dfb>] ? fh_verify+0x401/0x452 [nfsd]
     [<e24f0b61>] ? nfsd4_encode_operation+0x52/0x117 [nfsd]
     [<e24ea0d7>] ? nfsd4_putfh+0x33/0x3b [nfsd]
     [<e24f4ce6>] ? nfsd4_delegreturn+0xd4/0xd4 [nfsd]
     [<e24ea2c9>] nfsd4_proc_compound+0x1ea/0x33e [nfsd]
     [<e24de6ee>] nfsd_dispatch+0xd1/0x1a5 [nfsd]
     [<e1d6e1c7>] svc_process_common+0x282/0x46f [sunrpc]
     [<e1d6e578>] svc_process+0xdc/0xfa [sunrpc]
     [<e24de0fa>] nfsd+0xd6/0x115 [nfsd]
     [<e24de024>] ? nfsd_shutdown+0x24/0x24 [nfsd]
     [<c0454322>] kthread+0x62/0x67
     [<c04542c0>] ? kthread_worker_fn+0x114/0x114
     [<c07a6ebe>] kernel_thread_helper+0x6/0x10
    Code: eb 05 b8 00 00 27 4f 8d 65 f4 5b 5e 5f 5d c3 83 e0 03 55 83 f8 02 89 e5 74 17 83 f8 03 74 05 48 75 09 eb 09 b8 02 00 00 00 eb 0b <0f> 0b 31 c0 eb 05 b8 01 00 00 00 5d c3 55 89 e5 57 56 89 d6 8d
    EIP: [<e24f180d>] nfs4_access_to_omode+0x1c/0x29 [nfsd] SS:ESP 0068:dd54dde0
    ---[ end trace 2b0bf6c6557cb284 ]---
    
    The trace route is:
    
     -> nfsd4_lock()
       -> if (lock->lk_is_new) {
         -> alloc_init_lock_stateid()
    
            3739: stp->st_access_bmap = 0;
    
       ->if (status && lock->lk_is_new && lock_sop)
         -> release_lockowner()
          -> free_generic_stateid()
           -> nfs4_access_bmap_to_omode()
              -> nfs4_access_to_omode()
    
            380: BUG();   *****
    
    This problem was introduced by 0997b17.
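
The shape of the fix can be sketched in user space. This is a simplified model, not the nfsd code: the structure, constants and function names are hypothetical, and the access bitmap is reduced to a plain share-access value. It shows only the idea described above, namely that a stateid whose access bitmap is still 0 holds no file access, so the release path should skip the access-to-open-mode conversion instead of reaching the BUG().

```c
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical share-access values for this sketch. */
#define DEMO_ACCESS_READ  1u
#define DEMO_ACCESS_WRITE 2u
#define DEMO_ACCESS_BOTH  3u

/* Hypothetical, heavily reduced stand-in for a lock stateid. */
struct demo_stateid {
    unsigned int access_bmap; /* 0 means "never fully initialized" */
};

/* Map a share-access value to an open mode, or -1 if it is invalid. */
static int demo_access_to_omode(unsigned int access)
{
    switch (access & DEMO_ACCESS_BOTH) {
    case DEMO_ACCESS_READ:
        return O_RDONLY;
    case DEMO_ACCESS_WRITE:
        return O_WRONLY;
    case DEMO_ACCESS_BOTH:
        return O_RDWR;
    }
    return -1; /* the kernel code in the trace hits BUG() here */
}

/*
 * Release path: only look up the open mode when some access was
 * actually recorded. Returns true if an access reference would be
 * dropped, and stores the open mode in *omode in that case.
 */
static bool demo_release_access(const struct demo_stateid *stp, int *omode)
{
    if (stp->access_bmap == 0)
        return false; /* partially initialized: nothing to release */

    *omode = demo_access_to_omode(stp->access_bmap);
    return true;
}
```

A stateid left at access_bmap 0 by a failed lock request now takes the early return and never reaches the invalid-access branch.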
    
    Reported-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
    Tested-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Commits on Apr 9, 2011
  1. @torvalds

    Merge git://git.infradead.org/mtd-2.6

    torvalds authored
    * git://git.infradead.org/mtd-2.6:
      mtd: atmel_nand: use CPU I/O when buffer is in vmalloc(ed) region
      mtd: atmel_nand: modify test case for using DMA operations
      mtd: atmel_nand: fix support for CPUs that do not support DMA access
      mtd: atmel_nand: trivial: change DMA usage information trace
      mtd: mtdswap: fix printk format warning
  2. @tiwai
  3. @tiwai
Commits on Apr 8, 2011
  1. @torvalds

    Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/n…

    torvalds authored
    …fs-2.6
    
    * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
      NFS: Change initial mount authflavor only when server returns NFS4ERR_WRONGSEC
      NFS: Fix a signed vs. unsigned secinfo bug
      Revert "net/sunrpc: Use static const char arrays"
  2. @torvalds

    signal.c: fix erroneous syscall kernel-doc

    Randy Dunlap authored torvalds committed
    Fix erroneous syscall kernel-doc comments in kernel/signal.c.
    
    Reported-by: Matt Fleming <matt@console-pimps.org>
    Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. @torvalds

    Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6

    torvalds authored
    * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
      [S390] compile fix for latest binutils
      [S390] cio: prevent purging of CCW devices in the online state
      [S390] qdio: fix init sequence
      [S390] Fix parameter passing for smp_switch_to_cpu()
      [S390] oprofile s390: prevent stack corruption
  4. @torvalds

    Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel…

    torvalds authored
    …/git/jack/linux-fs-2.6
    
    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
      quota: Don't write quota info in dquot_commit()
      ext3: Fix writepage credits computation for ordered mode
  5. xfs: use proper interfaces for on-stack plugging

    Christoph Hellwig authored Alex Elder committed
    Add proper blk_start_plug/blk_finish_plug pairs for the two places where
    we issue buffer I/O, and remove the blk_flush_plug in xfs_buf_lock and
    xfs_buf_iowait, given that context switches already flush the per-process
    plugging lists.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Alex Elder <aelder@sgi.com>
  6. xfs: fix xfs_debug warnings

    Christoph Hellwig authored Alex Elder committed
    For a CONFIG_XFS_DEBUG=n build gcc complains about statements with no
    effect in xfs_debug:
    
    fs/xfs/quota/xfs_qm_syscalls.c: In function 'xfs_qm_scall_trunc_qfiles':
    fs/xfs/quota/xfs_qm_syscalls.c:291:3: warning: statement with no effect
    
    The reason for that is that the various new xfs message functions have a
    return value which is never used, and in case of the non-debug build
    xfs_debug the macro evaluates to a plain 0 which produces the above
    warnings.  This can be fixed by turning xfs_debug into an inline function
    instead of a macro, but in addition to that I've also changed all the
    message helpers to return void as we never use their return values.
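
The difference between the two approaches can be illustrated outside the kernel. The sketch below uses hypothetical names (demo_debug, DEMO_DEBUG) rather than the real XFS helpers. A non-debug macro such as `#define demo_debug(fmt, ...) (0)` leaves a bare `0;` statement at every call site, which is what triggers "statement with no effect". An inline function returning void gives the compiler a real call that it can discard without complaint.

```c
#include <stdarg.h>
#include <stdio.h>

#ifdef DEMO_DEBUG
/* Debug build: format the message and print it to stderr. */
static inline void demo_debug(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}
#else
/*
 * Non-debug build: an empty inline function instead of a macro that
 * expands to 0. Call sites remain valid statements, and the void
 * return type makes it explicit that no result is ever used.
 */
static inline void demo_debug(const char *fmt, ...)
{
    (void)fmt;
}
#endif
```

One consequence of the function form is that arguments are still evaluated in the non-debug build, so a call such as demo_debug("n=%d\n", ++n) increments n either way.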
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Alex Elder <aelder@sgi.com>
  7. xfs: fix variable set but not used warnings

    Christoph Hellwig authored Alex Elder committed
    GCC 4.6 now warns about variables set but not used.  Fix the trivially
    fixable warnings of this sort.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Alex Elder <aelder@sgi.com>
  8. @davem330

    mlx4_en: Restoring RX buffer pointer in case of failure

    Yevgeny Petrilin authored davem330 committed
    If this is not done, a second attempt to open the RX ring would cause
    memory corruption.
    
    Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
    Signed-off-by: David S. Miller <davem@davemloft.net>
  9. @davem330

    mlx4: Sensing link type at device initialization

    Yevgeny Petrilin authored davem330 committed
    When bringing the port up, perform a SENSE_PORT command to check which
    physical link type (IB or Ethernet) the physical port is connected to.
    If there is no valid link partner, the port will come up as its
    supported default.
    
    Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
    Signed-off-by: David S. Miller <davem@davemloft.net>
  10. @dchinner

    xfs: convert log tail checking to a warning

    Dave Chinner authored dchinner committed
    On the Power platform, the log tail debug checks fire excessively
    causing the system to panic early in testing. The debug checks are
    known to be racy, though on x86_64 there is no evidence that they
    trigger at all.
    
    We want to keep the checks active on debug systems to alert us to
    problems with log space accounting, but we need to reduce the impact
    of a racy check on testing on the Power platform.
    
    As a result, convert the ASSERT conditions to warnings, and
    allow them to fire only once per filesystem mount. This will prevent
    false positives from interfering with testing, whilst still
    providing us with the indication that there may be a problem with log
    space accounting should that occur.
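
A warn-once-per-mount check can be sketched as follows. This is an illustration only: the structure, field and function names are hypothetical and do not reflect how XFS stores the flag. The point is that the failed check no longer stops the system; it reports once per mounted filesystem and then stays quiet.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-mount state for this sketch. */
struct demo_mount {
    const char *name;
    bool tail_warned; /* set after the first warning on this mount */
};

/*
 * Replace an assertion with a one-shot warning. Returns true only when
 * a warning was actually emitted by this call.
 */
static bool demo_check_log_tail(struct demo_mount *mp, bool tail_ok)
{
    if (tail_ok || mp->tail_warned)
        return false;

    mp->tail_warned = true;
    fprintf(stderr,
            "%s: log tail check failed, possible log space accounting problem\n",
            mp->name);
    return true;
}
```

Because the flag lives in the per-mount structure, a second filesystem still gets its own first warning.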
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Alex Elder <aelder@sgi.com>
  11. @dchinner

    xfs: catch bad block numbers freeing extents.

    Dave Chinner authored dchinner committed
    A fuzzed filesystem crashed a kernel when freeing an extent with a
    block number beyond the end of the filesystem. Convert all the debug
    asserts in xfs_free_extent() to active checks so that we catch bad
    extents and return that the filesystem is corrupted rather than
    crashing.
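
The change from debug-only asserts to active checks follows a common pattern, sketched here with hypothetical names and a made-up error code; the real checks in xfs_free_extent() are more detailed. Instead of asserting that the extent lies inside the filesystem, the function validates the range in every build and returns an error the caller can propagate.

```c
#include <stdint.h>

/* Hypothetical error code standing in for "filesystem is corrupted". */
#define DEMO_ECORRUPTED 1

/* Hypothetical, reduced view of a filesystem for this sketch. */
struct demo_fs {
    uint64_t total_blocks;
};

/*
 * Validate an extent before freeing it. Returns 0 if the extent lies
 * entirely inside the filesystem, DEMO_ECORRUPTED otherwise. The
 * comparison is written so that start + len cannot overflow.
 */
static int demo_free_extent(const struct demo_fs *fs, uint64_t start,
                            uint64_t len)
{
    if (len == 0 || start >= fs->total_blocks ||
        len > fs->total_blocks - start)
        return DEMO_ECORRUPTED;

    /* ... the actual free operation would happen here ... */
    return 0;
}
```

An extent that starts beyond the last block, as in the fuzzed image described above, is rejected with an error rather than being processed.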
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Alex Elder <aelder@sgi.com>
  12. @dchinner

    xfs: push the AIL from memory reclaim and periodic sync

    Dave Chinner authored dchinner committed
    When we are short on memory, we want to expedite the cleaning of
    dirty objects.  Hence when we run short on memory, we need to kick
    the AIL flushing into action to clean as many dirty objects as
    quickly as possible.  To implement this, sample the lsn of the log
    item at the head of the AIL and use that as the push target for the
    AIL flush.
    
    Further, we keep items in the AIL that are dirty that are not
    tracked any other way, so we can get objects sitting in the AIL that
    don't get written back until the AIL is pushed. Hence to get the
    filesystem to the idle state, we might need to push the AIL to flush
    out any remaining dirty objects sitting in the AIL. This requires
    the same push mechanism as the reclaim push.
    
    This patch also renames xfs_trans_ail_tail() to xfs_ail_min_lsn() to
    match the new xfs_ail_max_lsn() function introduced in this patch.
    Similarly for xfs_trans_ail_push -> xfs_ail_push.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Alex Elder <aelder@sgi.com>
  13. @dchinner

    xfs: clean up code layout in xfs_trans_ail.c

    Dave Chinner authored dchinner committed
    This patch rearranges the location of functions in xfs_trans_ail.c
    to remove the need for forward declarations of those functions in
    preparation for adding new functions without the need for forward
    declarations.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Alex Elder <aelder@sgi.com>
  14. @dchinner

    xfs: convert the xfsaild threads to a workqueue

    Dave Chinner authored dchinner committed
    Similar to the xfssyncd, the per-filesystem xfsaild threads can be
    converted to a global workqueue and run periodically by delayed
    works. This makes sense for the AIL pushing because it uses
    variable timeouts depending on the work that needs to be done.
    
    By removing the xfsaild, we simplify the AIL pushing code and
    remove the need to spread the code to implement the threading
    and pushing across multiple files.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Alex Elder <aelder@sgi.com>
  15. @dchinner

    xfs: introduce background inode reclaim work

    Dave Chinner authored dchinner committed
    Background inode reclaim needs to run more frequently than the XFS
    syncd work is run, as 30s is too long between optimal reclaim runs.
    Add a new periodic work item to the xfs syncd workqueue to run a
    fast, non-blocking inode reclaim scan.
    
    Background inode reclaim is kicked by the act of marking inodes for
    reclaim.  When an AG is first marked as having reclaimable inodes,
    the background reclaim work is kicked. It will continue to run
    periodically until it detects that there are no more reclaimable
    inodes. It will be kicked again when the first inode is queued for
    reclaim.
    
    To ensure shrinker based inode reclaim throttles to the inode
    cleaning and reclaim rate but still reclaims inodes efficiently, make
    it kick the background inode reclaim so that when we are low on
    memory we are trying to reclaim inodes as efficiently as possible.
    This kick should not be necessary, but it will protect against
    failures to kick the background reclaim when inodes are first dirtied.
    
    To provide the rate throttling, make the shrinker pass do
    synchronous inode reclaim so that it blocks on inodes under IO. This
    means that the shrinker will reclaim inodes rather than just
    skipping over them, but it does not adversely affect the rate of
    reclaim because most dirty inodes are already under IO due to the
    background reclaim work the shrinker kicked.
    
    These two modifications solve one of the two OOM killer invocations
    Chris Mason reported recently when running a stress testing script.
    The particular workload trigger for the OOM killer invocation is
    where there are more threads than CPUs all unlinking files in an
    extremely memory constrained environment. Unlike other solutions,
    this one does not have a performance impact when
    memory is not constrained or the number of concurrent threads
    operating is <= the number of CPUs.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Alex Elder <aelder@sgi.com>
  16. @dchinner

    xfs: convert ENOSPC inode flushing to use new syncd workqueue

    Dave Chinner authored dchinner committed
    One of the problems with the current inode flush at ENOSPC is that we
    queue a flush per ENOSPC event, regardless of how many are already
    queued. This can result in hundreds of queued flushes, most of
    which simply burn CPU scanning and do no real work. This simply slows
    down allocation at ENOSPC.
    
    We really only need one active flush at a time, and we can easily
    implement that via the new xfs_syncd_wq. All we need to do is queue
    a flush if one is not already active, then block waiting for the
    currently active flush to complete. The result is that we only ever
    have a single ENOSPC inode flush active at a time and this greatly
    reduces the overhead of ENOSPC processing.
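
The "queue only if not already active" rule can be modelled in user space with an atomic flag. This is a sketch under stated assumptions: the structure and function names are hypothetical, the worker is a plain function call rather than a workqueue item, and the blocking wait for completion is left out. It shows only how repeated requests collapse into a single pending flush.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flusher state for this sketch. */
struct demo_flusher {
    atomic_bool pending; /* true while a flush is queued or running */
    unsigned int runs;   /* number of flushes actually performed */
};

/*
 * Request a flush. Returns true if this call queued a new flush and
 * false if one was already pending, in which case the caller would
 * simply wait for the existing one.
 */
static bool demo_request_flush(struct demo_flusher *f)
{
    return !atomic_exchange(&f->pending, true);
}

/* Worker side: perform the flush, if one is pending, and clear the flag. */
static void demo_run_flush(struct demo_flusher *f)
{
    if (!atomic_load(&f->pending))
        return;

    f->runs++; /* stand-in for the real inode flush */
    atomic_store(&f->pending, false);
}
```

Three back-to-back requests followed by one worker pass result in a single flush; the next request after that queues a new one.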
    
    On my 2p test machine, this results in tests exercising ENOSPC
    conditions running significantly faster - 042 halves execution time,
    083 drops from 60s to 5s, etc - while not introducing test
    regressions.
    
    This allows us to remove the old xfssyncd threads and infrastructure
    as they are no longer used.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Alex Elder <aelder@sgi.com>