Commits on Apr 27, 2010
  1. @torvalds

    Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/jmorris/security-testing-2.6
    
    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
      keys: don't need to use RCU in keyring_read() as semaphore is held
    torvalds committed Apr 27, 2010
  2. @torvalds

    Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux

    * 'for-2.6.34' of git://linux-nfs.org/~bfields/linux:
      nfsd4: bug in read_buf
    torvalds committed Apr 27, 2010
  3. @torvalds

    keys: the request_key() syscall should link an existing key to the de…

    …st keyring
    
    The request_key() system call and request_key_and_link() should make a
    link from an existing key to the destination keyring (if supplied), not
    just from a new key to the destination keyring.
    
    This can be tested by:
    
    	ring=`keyctl newring fred @s`
    	keyctl request2 user debug:a a
    	keyctl request user debug:a $ring
    	keyctl list $ring
    
    If it says:
    
    	keyring is empty
    
    then it didn't work.  If it shows something like:
    
    	1 key in keyring:
    	1070462727: --alswrv     0     0 user: debug:a
    
    then it did.
    
    The request_key() system call is meant to recursively search all your
    keyrings for the key you desire and, optionally, if it doesn't exist, call
    out to userspace to create one for you.
    
    If request_key() finds or creates a key, it should, optionally, create a link
    to that key from the destination keyring specified.
    
    Therefore, if, after a successful call to request_key() with a destination
    keyring specified, you see the destination keyring empty, the code didn't work
    correctly.
    
    If you see the found key in the keyring, then it did - which is what the patch
    is required for.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Cc: James Morris <jmorris@namei.org>
    Cc: <stable@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    David Howells committed with torvalds Apr 27, 2010
  4. @mzyngier @torvalds

    gpio: fix pca953x set_type 'scheduling while atomic' bug

    Bill Gatliff reported the following bug when using the irq_chip facility
    of the pca953x driver on a PPC platform:
    
    BUG: scheduling while atomic: insmod/1530/0x00000002
    
    He traced it back to an i2c transaction in pca953x_irq_set_type(), which
    can be called with interrupts disabled (from __setup_irq()).  As the i2c
    controller can sleep while sending a message, this qualifies as a bad
    idea.
    
    This patch moves the i2c transaction to pca953x_irq_bus_sync_unlock(),
    where it is actually safe to send an i2c message.
    
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Marc Zyngier <maz@misterjones.org>
    Reported-by: Bill Gatliff <bgat@billgatliff.com>
    Cc: Eric Miao <eric.y.miao@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    mzyngier committed with torvalds Apr 27, 2010
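
The deferral described in the pca953x commit above can be sketched in user-space C. This is a minimal model, not the driver's code: `expander_sketch`, `set_type_sketch`, `bus_sync_unlock_sketch` and `slow_bus_write` are hypothetical names, and the `atomic` flag merely stands in for "interrupts disabled". The idea shown is that the set-type step only touches cached masks, while the slow bus transfer happens later in a context where sleeping is allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a GPIO expander behind a bus that may sleep. */
struct expander_sketch {
	unsigned int trig_raise;	/* cached rising-edge mask */
	unsigned int trig_fall;		/* cached falling-edge mask */
	bool dirty;			/* cached state not yet written out */
	bool atomic;			/* true while sleeping is forbidden */
	int bus_writes;			/* number of simulated bus transfers */
};

/* Stand-in for an i2c transfer: it must never run in atomic context. */
static void slow_bus_write(struct expander_sketch *chip)
{
	assert(!chip->atomic);
	chip->bus_writes++;
}

/* Safe in atomic context: only updates the cached masks. */
static void set_type_sketch(struct expander_sketch *chip, unsigned int line,
			    bool rising, bool falling)
{
	unsigned int mask;

	assert(chip != NULL && line < 32);
	mask = 1u << line;

	if (rising)
		chip->trig_raise |= mask;
	else
		chip->trig_raise &= ~mask;

	if (falling)
		chip->trig_fall |= mask;
	else
		chip->trig_fall &= ~mask;

	chip->dirty = true;
}

/* Called later, where sleeping is allowed: flush the cached state once. */
static void bus_sync_unlock_sketch(struct expander_sketch *chip)
{
	assert(chip != NULL);
	if (!chip->dirty)
		return;
	slow_bus_write(chip);
	chip->dirty = false;
}
```

In this model, calling `set_type_sketch()` while `atomic` is set performs no bus transfer; the single transfer happens on the next `bus_sync_unlock_sketch()` call.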
  5. @torvalds

    procfs: fix tid fdinfo

    Correct the file_operations struct in fdinfo entry of tid_base_stuff[].
    
    Presently /proc/*/task/*/fdinfo contains symlinks to opened files like
    /proc/*/fd/.
    
    Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Miklos Szeredi <mszeredi@suse.cz>
    Cc: Alexey Dobriyan <adobriyan@gmail.com>
    Cc: <stable@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Jerome Marchand committed with torvalds Apr 27, 2010
  6. @PeterHuewe @torvalds

    arch/avr32: fix build failure caused by wrong prototype

    This patch fixes a build failure introduced by 1d83931 ("avr32: use
    generic ptrace_resume code") which had the static keyword as a leftover.
    
      arch/avr32/kernel/ptrace.c:32: error: static declaration of `user_enable_single_step' follows non-static declaration
      include/linux/ptrace.h:268: error: previous declaration of `user_enable_single_step' was here
    
    References:
    [1]http://kisskb.ellerman.id.au/kisskb/buildresult/2448162/
    
    Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
    Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    PeterHuewe committed with torvalds Apr 27, 2010
  7. keys: don't need to use RCU in keyring_read() as semaphore is held

    keyring_read() doesn't need to use rcu_dereference() to access the keyring
    payload as the caller holds the key semaphore to prevent modifications
    from happening whilst the data is read out.
    
    This should solve the following warning:
    
    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    security/keys/keyring.c:204 invoked rcu_dereference_check() without protection!
    
    other info that might help us debug this:
    
    rcu_scheduler_active = 1, debug_locks = 0
    1 lock held by keyctl/2144:
     #0:  (&key->sem){+++++.}, at: [<ffffffff81177f7c>] keyctl_read_key+0x9c/0xcf
    
    stack backtrace:
    Pid: 2144, comm: keyctl Not tainted 2.6.34-rc2-cachefs #113
    Call Trace:
     [<ffffffff8105121f>] lockdep_rcu_dereference+0xaa/0xb2
     [<ffffffff811762d5>] keyring_read+0x4d/0xe7
     [<ffffffff81177f8c>] keyctl_read_key+0xac/0xcf
     [<ffffffff811788d4>] sys_keyctl+0x75/0xb9
     [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: James Morris <jmorris@namei.org>
    David Howells committed with James Morris Apr 27, 2010
  8. @torvalds

    Remove redundant check for CONFIG_MMU

    The checks for CONFIG_MMU at this location are duplicated as all the code is
    located inside a #ifndef CONFIG_MMU block. So the first conditional block will
    always be included while the second never will.
    
    Signed-off-by: Christoph Egger <siccegge@stud.informatik.uni-erlangen.de>
    Signed-off-by: David Howells <dhowells@redhat.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Christoph Egger committed with torvalds Apr 26, 2010
  9. @torvalds

    Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus

    * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
      squashfs: fix potential buffer over-run on 4K block file systems
      squashfs: add missing buffer free
      squashfs: fix warn_on when root inode is corrupted
      squashfs: fix locking bug in zlib wrapper
    torvalds committed Apr 27, 2010
  10. @torvalds

    Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

    * 'for-linus' of git://oss.sgi.com/xfs/xfs:
      xfs: more swap extent fixes for dynamic fork offsets
    torvalds committed Apr 27, 2010
  11. @torvalds

    Merge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/l…

    …inux/kernel/git/tmlind/linux-omap-2.6
    
    * 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6: (39 commits)
      omap: delete unused bootloader tag variables
      omap: Devkit8000: Remove unused pins
      omap: Devkit8000: Change position of init calls
      omap: Devkit8000: Remove unnecessary include file
      omap: Devkit8000: Fix typo in pin name
      omap: Devkit8000: Add missing package selection
      omap: Devkit8000: Fix typo in supplies
      n8x0_defconfig: remove CONFIG_NILFS2_FS override
      omap: board-sdp-flash.c: Fix typos in debug output
      omap4: Fix McBSP4 base address
      omap: rx51_defconfig: Remove CONFIG_SYSFS_DEPRECATED*=y options
      omap: rx51_defconfig: Remove duplicate phonet
      omap: fix a gpmc nand problem
      AM3517: initialize i2c subsystem after mux subsystem
      omap: remove one of the define of INT_34XX_BENCH_MPU_EMUL
      omap: fix the compile error if CONFIG_MTD_NAND_OMAP2 is notenabled
      OMAP4: Clocks: Change SPI Instance Names
      omap: Devkit8000: Fix wrong usb port on Devkit8000
      OMAP4: Fix for CONTROL register Base
      OMAP4-HSMMC: FIX for MMC5 Controller IRQ Base
      ...
    torvalds committed Apr 27, 2010
  12. @rikvanriel @torvalds

    mmap: check ->vm_ops before dereferencing

    Check whether the VMA has a vm_ops before calling close, just
    like we check vm_ops before calling open a few dozen lines
    higher up in the function.
    
    Signed-off-by: Rik van Riel <riel@redhat.com>
    Reported-by: Dan Carpenter <error27@gmail.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rikvanriel committed with torvalds Apr 26, 2010
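
The guard described in the mmap commit above can be illustrated with a small stand-alone sketch. The structures below are hypothetical stand-ins that keep only the fields needed here; they are not the kernel's definitions, and `vma_close_checked` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

struct vma_sketch;

/* Hypothetical stand-in for an operations table with optional hooks. */
struct vm_ops_sketch {
	void (*open)(struct vma_sketch *vma);
	void (*close)(struct vma_sketch *vma);
};

/* Hypothetical stand-in for a memory area; vm_ops may be NULL. */
struct vma_sketch {
	const struct vm_ops_sketch *vm_ops;
	int close_calls;	/* bookkeeping for the demonstration */
};

/* Call ->close only when both the ops table and the hook exist. */
static void vma_close_checked(struct vma_sketch *vma)
{
	assert(vma != NULL);
	if (vma->vm_ops && vma->vm_ops->close)
		vma->vm_ops->close(vma);
}

/* Example hook that records that it ran. */
static void count_close(struct vma_sketch *vma)
{
	vma->close_calls++;
}
```

An area without an ops table is skipped silently; one whose table provides a `close` hook has it called exactly once per `vma_close_checked()` call.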
  13. @torvalds

    Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
      crypto: authenc - Add EINPROGRESS check
    torvalds committed Apr 27, 2010
  14. @torvalds

    Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/airlied/drm-2.6
    
    * 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
      drm/radeon: Fix sparc regression in r300_scratch()
      drm: make sure vblank interrupts are disabled at DPMS time
      drm/radeon/kms/evergreen: No EnableYUV table
      drm/radeon: 9800 SE has only one quadpipe
      drm/radeon/kms: don't print error for legal crtcs.
      drm/radeon/kms/evergreen: fix LUT setup
    torvalds committed Apr 27, 2010
Commits on Apr 26, 2010
  1. @davem330

    drm/radeon: Fix sparc regression in r300_scratch()

    Commit b4fe945 ("drm/radeon: Fix
    memory allocation failures in the preKMS command stream checking.")
    added a regression in that it completely tossed the get_unaligned()
    done by r300_scratch() which we added in commit
    958a6f8 ("drm: radeon: Fix unaligned
    access in r300_scratch().").
    
    Put it back.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Acked-by: Matt Turner <mattst88@gmail.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>
    davem330 committed with Dave Airlie Apr 26, 2010
  2. @jbarnes993

    drm: make sure vblank interrupts are disabled at DPMS time

    When we call drm_vblank_off() at DPMS off time (to wake any clients so
    they don't hang) we need to make sure interrupts are actually disabled.
    If drm_vblank_off() gets called before the vblank usage timer expires,
    it'll prevent the timer from disabling interrupts since it also clears
    the vblank_enabled flag for the pipe.
    
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Dave Airlie <airlied@redhat.com>
    jbarnes993 committed with Dave Airlie Mar 26, 2010
  3. @neilbrown

    nfsd4: bug in read_buf

    When read_buf is called to move over to the next page in the pagelist
    of an NFSv4 request, it sets argp->end to essentially a random
    number, certainly not an address within the page which argp->p now
    points to.  So subsequent calls to READ_BUF will think there is much
    more than a page of spare space (the cast to u32 ensures an unsigned
    comparison) so we can expect to fall off the end of the second
    page.
    
    We never encountered this in testing because typically the only
    operations which use more than two pages are write-like operations,
    which have their own decoding logic.  Something like a getattr after a
    write may cross a page boundary, but it would be very unusual for it to
    cross another boundary after that.
    
    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    neilbrown committed with J. Bruce Fields Apr 20, 2010
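
The effect of the unsigned comparison mentioned in the nfsd4 commit above can be shown with plain offsets. This is a sketch under assumptions: `room_check_sketch` is a hypothetical helper that models the space test as `nbytes <= (u32)(end - p)`, using byte offsets instead of pointers, and the stale `end` value in the usage note is just one example of a wrong value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns nonzero when nbytes appears to fit between p_off and end_off.
 * If end_off is stale and lies before p_off, the difference is negative
 * and the cast to uint32_t turns it into a very large value, so the
 * check passes even though no space is actually available.
 */
static int room_check_sketch(ptrdiff_t p_off, ptrdiff_t end_off,
			     uint32_t nbytes)
{
	assert(p_off >= 0 && end_off >= 0);
	return nbytes <= (uint32_t)(end_off - p_off);
}
```

With a correct `end_off` the check rejects a 16-byte read when only 6 bytes remain; with `end_off` left behind `p_off` the same read is accepted, which is how decoding can run off the end of the page.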
  4. xfs: more swap extent fixes for dynamic fork offsets

    A new xfsqa test (226) with a prototype xfs_fsr change to try to
    handle dynamic fork offsets better triggers an assertion failure
    where the inode data fork is in btree format, yet there is room in
    the inode for it to be in extent format. The two inodes look like:
    
    before: ino 0x101 (target), num_extents 11, Max in-fork extents 6, broot size 40, fork offset 96
    before: ino 0x115 (temp),  num_extents 5, Max in-fork extents 3, broot size 40, fork offset 56
    after: ino 0x101 (target), num_extents 5, Max in-fork extents 6, broot size 40, fork offset 96
    after: ino 0x115 (temp), num_extents 11, Max in-fork extents 3, broot size 40, fork offset 56
    
    Basically the target inode ends up with 5 extents in btree format,
    but it had space for 6 extents in extent format, so ends up
    incorrect. Notably here the broot size is the same, and that is
    where the kernel code is going wrong - the btree root will fit, so
    it lets the swap go ahead.
    
    The check should not allow the swap to take place if the number of
    extents while in btree format is less than the number of extents
    that can fit in the inode in extent format. Adding that check will
    prevent this swap and corruption from occurring.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Dave Chinner committed with Alex Elder Apr 20, 2010
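
The extra check described in the xfs commit above can be written as a small predicate. This is a sketch that follows the commit message's wording literally ("less than"); how the real code treats the case where the two counts are equal is not stated in the message, so that boundary is an assumption here. `btree_swap_allowed_sketch` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * A fork that will be in btree format after the swap must not hold
 * fewer extents than would fit in the inode in extent format.
 *
 * num_extents:          extents the fork will hold after the swap
 * max_in_fork_extents:  extents that fit in the inode in extent format
 */
static bool btree_swap_allowed_sketch(unsigned int num_extents,
				      unsigned int max_in_fork_extents)
{
	assert(max_in_fork_extents > 0);
	return !(num_extents < max_in_fork_extents);
}
```

With the figures quoted in the commit, the target inode would end up with 5 extents where 6 fit in extent format, so the predicate rejects the swap; the temp inode's 11 extents against a limit of 3 pass.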
  5. @herbertx

    crypto: authenc - Add EINPROGRESS check

    When Steffen originally wrote the authenc async hash patch, he
    correctly had EINPROGRESS checks in place so that we did not invoke
    the original completion handler with it.
    
    Unfortunately I told him to remove it before the patch was applied.
    
    As only MAY_BACKLOG request completion handlers are required to
    handle EINPROGRESS completions, those checks are really needed.
    
    This patch restores them.
    
    Reported-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    herbertx committed Apr 26, 2010
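
The kind of check the authenc commit above restores can be sketched as follows. This is an illustration only: `request_sketch`, `inner_step_done` and `record_completion` are hypothetical names, not the crypto API. The point shown is that an intermediate handler returns early on `-EINPROGRESS` instead of forwarding it to the original completion handler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef void (*completion_fn)(void *data, int err);

/* Hypothetical request carrying the original caller's completion handler. */
struct request_sketch {
	completion_fn complete;
	void *data;
};

/* Intermediate handler: swallow -EINPROGRESS, forward everything else. */
static void inner_step_done(struct request_sketch *req, int err)
{
	assert(req != NULL && req->complete != NULL);
	if (err == -EINPROGRESS)
		return;		/* progress notification, not a final result */
	req->complete(req->data, err);
}

/* Demonstration handler that records how often it was invoked. */
struct call_log {
	int calls;
	int last_err;
};

static void record_completion(void *data, int err)
{
	struct call_log *log = data;

	log->calls++;
	log->last_err = err;
}
```

Passing `-EINPROGRESS` to `inner_step_done()` leaves the log untouched; a final status such as 0 reaches `record_completion()` exactly once.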
Commits on Apr 25, 2010
  1. @torvalds

    Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
      ipv6: Fix inet6_csk_bind_conflict()
      e100: Fix the TX workqueue race
    torvalds committed Apr 25, 2010
  2. @davem330

    ipv6: Fix inet6_csk_bind_conflict()

    Commit fda48a0 (tcp: bind() fix when many ports are bound)
    introduced a bug in the IPv6 part.
    We should not call ipv6_addr_any(inet6_rcv_saddr(sk2)) but
    ipv6_addr_any(inet6_rcv_saddr(sk)) because sk2 can be IPV4, while sk is
    IPV6.
    
    Reported-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Tested-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Eric Dumazet committed with davem330 Apr 25, 2010
  3. @torvalds

    Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/tytso/ext4
    
    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
      ext4: Issue the discard operation *before* releasing the blocks to be reused
      ext4: Fix buffer head leaks after calls to ext4_get_inode_loc()
      ext4: Fix possible lost inode write in no journal mode
    torvalds committed Apr 25, 2010
  4. @davem330

    e100: Fix the TX workqueue race

    Nothing stops the workqueue being left to run in parallel with close or a
    few other operations. This causes double unmaps and the like.
    
    See kerneloops.org #1041230 for an example
    
    Signed-off-by: Alan Cox <alan@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Alan Cox committed with davem330 Apr 25, 2010
  5. @plougher

    squashfs: fix potential buffer over-run on 4K block file systems

    Sizing the buffer based on block size is incorrect, leading
    to a potential buffer over-run on 4K block size file systems
    (because the metadata block size is always 8K).  This bug
    doesn't seem to have triggered because 4K block size file systems
    are not default, and also because metadata blocks after
    compression tend to be less than 4K.
    
    Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
    plougher committed Apr 23, 2010
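
The sizing problem described in the squashfs over-run commit above can be illustrated with one line of arithmetic. This is a sketch of one possible remedy, not necessarily what the patch does: it assumes a buffer that must hold either a data block or a metadata block, and `METADATA_SIZE_SKETCH` and `buffer_size_sketch` are hypothetical names. The 8K figure comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Metadata blocks are always 8K according to the commit message. */
#define METADATA_SIZE_SKETCH ((size_t)8192)

/*
 * Size the buffer for the larger of the two block kinds, so that a
 * 4K data block size no longer yields a buffer too small for metadata.
 */
static size_t buffer_size_sketch(size_t block_size)
{
	assert(block_size > 0);
	return block_size > METADATA_SIZE_SKETCH ? block_size
						 : METADATA_SIZE_SKETCH;
}
```

For a 4K block size the sketch returns 8192 rather than 4096; for larger block sizes such as 128K it returns the block size unchanged.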
  6. @plougher

    squashfs: add missing buffer free

    Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
    plougher committed Apr 22, 2010
  7. @plougher

    squashfs: fix warn_on when root inode is corrupted

    Fix warn_on triggered by mounting a fsfuzzer corrupted file system, where
    the root inode has been corrupted.
    
    Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
    Reported-by: Steve Grubb <sgrubb@redhat.com>
    plougher committed Apr 16, 2010
Commits on Apr 24, 2010
  1. @torvalds

    Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git…

    …/davej/cpufreq
    
    * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
      [CPUFREQ] use max load in conservative governor
      [CPUFREQ] fix a lockdep warning
    torvalds committed Apr 24, 2010
  2. @torvalds

    Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits)
      gianfar: Fix potential oops during OF address translation
      fsl_pq_mdio: Fix kernel oops during OF address translation
      tcp: bind() fix when many ports are bound
      rdma: potential ERR_PTR dereference
      rtnetlink: potential ERR_PTR dereference
      net: ipv6 bind to device issue
      ipv6: allow to send packet after receiving ICMPv6 Too Big message with MTU field less than IPV6_MIN_MTU
      drivers/net/usb: Add new driver ipheth
      cxgb3: fix linkup issue
      X25 fix dead unaccepted sockets
      KS8851: NULL pointer dereference if list is empty
      net: 3c574_cs fix stats.tx_bytes counter
      xfrm6: ensure to use the same dev when building a bundle
      can: Fix possible NULL pointer dereference in ems_usb.c
      net: Fix an RCU warning in dev_pick_tx()
      ipv6: Fix tcp_v6_send_response transport header setting.
      bridge: add a missing ntohs()
      8139too: Fix a typo in the function name.
      mac80211: pass HT changes to driver when off channel
      mac80211: remove bogus TX agg state assignment
      ...
    torvalds committed Apr 24, 2010
  3. @torvalds

    Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/jbarnes/pci-2.6
    
    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
      PCI: Ensure we re-enable devices on resume
      x86/PCI: parse additional host bridge window resource types
      PCI: revert broken device warning
      PCI aerdrv: use correct bit defines and add 2ms delay to aer_root_reset
      x86/PCI: ignore Consumer/Producer bit in ACPI window descriptions
    torvalds committed Apr 24, 2010
  4. @torvalds

    Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/mjg59/platform-drivers-x86
    
    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86:
      eeepc-laptop: add missing sparse_keymap_free
      eeepc-wmi: Build fix
      asus: don't modify bluetooth/wlan on boot
      dell-wmi: Fix memory leak
      eeepc-wmi: add backlight support
      eeepc-wmi: use a platform device as parent device of all sub-devices
      eeepc-wmi: add an eeepc_wmi context structure
    torvalds committed Apr 24, 2010
  5. @plougher @torvalds

    initramfs: handle unrecognised decompressor when unpacking

    The unpack routine fails to handle the decompress_method() returning
    unrecognised decompressor (compress_name == NULL).  This results in the
    routine looping, eventually oopsing on an out of bounds memory access.
    
    Note this bug is usually hidden, only triggering on trailing junk after
    one or more correct compressed blocks.  The case of the compressed archive
    being complete junk is (by accident?) caught by the if (state != Reset)
    check because state is initialised to Start, but not updated due to the
    decompressor not having been called.  Obviously if the junk is trailing a
    correctly decompressed buffer, state == Reset from the previous call to
    the decompressor.
    
    Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
    Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    plougher committed with torvalds Apr 23, 2010
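
The failure mode in the initramfs commit above, trailing junk after a correct compressed block, can be modelled with a toy format. Everything here is hypothetical: the format (a `'Z'` byte, a length byte n, then n payload bytes), `method_name_sketch` and `unpack_sketch` are inventions for illustration. The sketch shows the loop stopping with an error when no decompressor is recognised, rather than continuing.

```c
#include <assert.h>
#include <stddef.h>

/* Toy detector: returns a method name, or NULL when data is unrecognised. */
static const char *method_name_sketch(const unsigned char *buf, size_t len)
{
	if (len >= 2 && buf[0] == 'Z')
		return "toy";
	return NULL;
}

/*
 * Walk consecutive blocks. Returns 0 when the whole buffer was consumed
 * and -1 on unrecognised or truncated data; *blocks counts the blocks
 * that were handled before stopping.
 */
static int unpack_sketch(const unsigned char *buf, size_t len, size_t *blocks)
{
	assert(buf != NULL && blocks != NULL);
	*blocks = 0;

	while (len > 0) {
		size_t used;

		if (method_name_sketch(buf, len) == NULL)
			return -1;	/* stop instead of looping on junk */

		used = 2 + (size_t)buf[1];
		if (used > len)
			return -1;	/* block claims more data than exists */

		buf += used;
		len -= used;
		(*blocks)++;
	}
	return 0;
}
```

A buffer holding two well-formed blocks is consumed fully; a buffer with one good block followed by junk bytes yields an error after one block.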
  6. @error27 @torvalds

    ksm: check for ERR_PTR from follow_page()

    The follow_page() function can potentially return -EFAULT so I added
    checks for this.
    
    Also I silenced an uninitialized variable warning on my version of gcc
    (version 4.3.2).
    
    Signed-off-by: Dan Carpenter <error27@gmail.com>
    Acked-by: Rik van Riel <riel@redhat.com>
    Acked-by: Izik Eidus <ieidus@redhat.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    error27 committed with torvalds Apr 23, 2010
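
The error-pointer convention behind the ksm commit above can be reproduced in user space. This is a sketch modelled on the kernel's ERR_PTR idea, with hypothetical lower-case helper names and a made-up `lookup_page_sketch()` standing in for a function that may return a valid pointer, NULL, or an encoded error such as -EFAULT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Error codes are encoded in the top 4095 values of the address range. */
#define MAX_ERRNO_SKETCH 4095

static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

static int page_store;	/* dummy object to hand out as a valid result */

/* Hypothetical lookup: mode < 0 fails, mode == 0 finds nothing. */
static void *lookup_page_sketch(int mode)
{
	if (mode < 0)
		return err_ptr(-EFAULT);
	if (mode == 0)
		return NULL;
	return &page_store;
}

/* A result is usable only if it is neither NULL nor an encoded error. */
static int page_usable(const void *page)
{
	return page != NULL && !is_err(page);
}
```

A caller that tests only for NULL would treat the encoded -EFAULT as a valid pointer; `page_usable()` rejects both cases.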
  7. @torvalds

    VMware Balloon driver

    This is a standalone version of the VMware Balloon driver.  Ballooning is a
    technique that allows the hypervisor to dynamically limit the amount of
    memory available to the guest (with guest cooperation).  In the overcommit
    scenario, when the hypervisor detects that it needs to shuffle some memory,
    it instructs the driver to allocate a certain number of pages, and the
    underlying memory gets returned to the hypervisor.  Later the hypervisor
    may return memory to the guest by reattaching memory to the pageframes and
    instructing the driver to "deflate" the balloon.
    
    We are submitting a standalone driver because KVM maintainer (Avi Kivity)
    expressed opinion (rightly) that our transport does not fit well into
    virtqueue paradigm and thus it does not make much sense to integrate with
    virtio.
    
    There were also some concerns whether current ballooning technique is the
    right thing.  If there appears a better framework to achieve this we are
    prepared to evaluate and switch to using it, but in the meantime we'd like
    to get this driver upstream.
    
    We want to get the driver accepted in distributions so that users do not
    have to deal with an out-of-tree module and many distributions have
    "upstream first" requirement.
    
    The driver has been shipping for a number of years and users running on
    VMware platform will have it installed as part of VMware Tools even if it
    will not come from a distribution, thus there should not be additional
    risk in pulling the driver into mainline.  The driver will only activate
    if host is VMware so everyone else should not be affected at all.
    
    Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
    Cc: Avi Kivity <avi@redhat.com>
    Cc: Jeremy Fitzhardinge <jeremy@goop.org>
    Cc: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Dmitry Torokhov committed with torvalds Apr 23, 2010
  8. @antonblanchard @torvalds

    fs/block_dev.c: fix performance regression in O_DIRECT|O_SYNC writes …

    …to block devices
    
    We are seeing a large regression in database performance on recent
    kernels.  The database opens a block device with O_DIRECT|O_SYNC and a
    number of threads write to different regions of the file at the same time.
    
    A simple test case is below.  I haven't defined DEVICE since getting it
    wrong will destroy your data :) On a 3 disk LVM with a 64k chunk size we
    see about 17MB/sec and only a few threads in IO wait:
    
    procs  -----io---- -system-- -----cpu------
     r  b     bi    bo   in   cs us sy id wa st
     0  3      0 16170  656 2259  0  0 86 14  0
     0  2      0 16704  695 2408  0  0 92  8  0
     0  2      0 17308  744 2653  0  0 86 14  0
     0  2      0 17933  759 2777  0  0 89 10  0
    
    Most threads are blocking in vfs_fsync_range, which has:
    
            mutex_lock(&mapping->host->i_mutex);
            err = fop->fsync(file, dentry, datasync);
            if (!ret)
                    ret = err;
            mutex_unlock(&mapping->host->i_mutex);
    
    commit 148f948 (vfs: Introduce new
    helpers for syncing after writing to O_SYNC file or IS_SYNC inode) offers
    some explanation of what is going on:
    
        Use these new helpers for syncing from generic VFS functions. This makes
        O_SYNC writes to block devices acquire i_mutex for syncing. If we really
        care about this, we can make block_fsync() drop the i_mutex and reacquire
        it before it returns.
    
    Thanks Jan for such a good commit message!  As well as dropping i_mutex,
    Christoph suggests we should remove the call to sync_blockdev():
    
    > sync_blockdev is an overcomplicated alias for filemap_write_and_wait on
    > the block device inode, which is exactly what we did just before calling
    > into ->fsync
    
    The patch below incorporates both suggestions. With it the testcase improves
    from 17MB/sec to 68MB/sec:
    
    procs  -----io---- -system-- -----cpu------
     r  b     bi    bo   in   cs us sy id wa st
     0  7      0 65536 1000 3878  0  0 70 30  0
     0 34      0 69632 1016 3921  0  1 46 53  0
     0 57      0 69632 1000 3921  0  0 55 45  0
     0 53      0 69640  754 4111  0  0 81 19  0
    
    Testcase:
    
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <pthread.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    
    #define NR_THREADS 64
    #define BUFSIZE (64 * 1024)
    
    #define DEVICE "/dev/mapper/XXXXXX"
    
    #define ALIGN(VAL, SIZE) (((VAL)+(SIZE)-1) & ~((SIZE)-1))
    
    static int fd;
    
    static void *doit(void *arg)
    {
    	unsigned long offset = (long)arg;
    	char *b, *buf;
    
    	b = malloc(BUFSIZE + 1024);
    	buf = (char *)ALIGN((unsigned long)b, 1024);
    	memset(buf, 0, BUFSIZE);
    
    	while (1)
    		pwrite(fd, buf, BUFSIZE, offset);
    }
    
    int main(int argc, char *argv[])
    {
    	int flags = O_RDWR|O_DIRECT;
    	int i;
    	unsigned long offset = 0;
    
    	if (argc > 1 && !strcmp(argv[1], "O_SYNC"))
    		flags |= O_SYNC;
    
    	fd = open(DEVICE, flags);
    	if (fd == -1) {
    		perror("open");
    		exit(1);
    	}
    
    	for (i = 0; i < NR_THREADS-1; i++) {
    		pthread_t tid;
    		pthread_create(&tid, NULL, doit, (void *)offset);
    		offset += BUFSIZE;
    	}
    	doit((void *)offset);
    
    	return 0;
    }
    
    Signed-off-by: Anton Blanchard <anton@samba.org>
    Acked-by: Jan Kara <jack@suse.cz>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Jens Axboe <jens.axboe@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    antonblanchard committed with torvalds Apr 23, 2010
  9. @torvalds

    lib/vsprintf.c: add missing EXPORT_SYMBOL(simple_strtoll)

    Add a missing EXPORT_SYMBOL.
    
    I must be the first person that wants to use this function :-)
    
    Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Hans Verkuil committed with torvalds Apr 23, 2010