Branch: kitkat
Commits on Feb 19, 2014
  1. Squashed update of BFQ-v6r1 to BFQ-v6r2

    arco authored and chil360 committed Jun 18, 2013
  2. Squashed update of BFQ-v6 to BFQ-v6r1

    arco authored and chil360 committed Jun 13, 2013
    - Fix use-after-free of queues in __bfq_bfqq_expire(). It may happen that
      a call to bfq_del_bfqq_busy() puts the last reference taken on a queue
      and frees it. Subsequent accesses to that same queue would result in a
      use-after-free. Make sure that a queue that has just been deleted
      from busy is not touched again.
    - Use the uninitialized_var() macro when needed. It may happen that a
      variable is initialized in a function that is called by the function
      that defined it. Use the uninitialized_var() macro in these cases.
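The use-after-free described above can be illustrated with a generic refcounting sketch (all names here are hypothetical illustrations, not the actual BFQ code): a helper that may drop the last reference reports whether the object survived, and the caller only dereferences the object when it did.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted queue; not the real struct bfq_queue. */
struct queue {
    int refcnt;
    int busy;
};

/* Drop one reference; free on last put. Returns 1 if the object was
 * freed, so the caller knows it must not be touched again. */
static int queue_put(struct queue *q)
{
    if (--q->refcnt == 0) {
        free(q);
        return 1;
    }
    return 0;
}

/* Modeled after the fix: clear the busy flag only if the queue
 * survived the put, avoiding a dereference of a freed object. */
static int del_busy(struct queue *q)
{
    int freed = queue_put(q);

    if (!freed)
        q->busy = 0;
    return freed;
}
```

The essential point is that the put helper's return value carries the "was it freed?" fact back to the caller, so no code path touches the queue after its last reference is gone.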
  3. mm: change initial readahead window size calculation

    Lee Susman authored and chil360 committed Apr 8, 2013
    Change the logic which determines the initial readahead window size
    such that for small requests (one page) the initial window size
    will be x4 the size of the original request, regardless of the
    VM_MAX_READAHEAD value. This prevents the rapid ramp-up
    that could otherwise be caused by increasing VM_MAX_READAHEAD.
    Change-Id: I93d59c515d7e6c6d62348790980ff7bd4f434997
    Signed-off-by: Lee Susman <>
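A minimal sketch of the policy described above, in units of pages (the function name and clamping detail are assumptions for illustration, not the patch's actual code): a one-page request gets a fixed 4-page initial window regardless of the configured maximum, while larger requests still ramp up subject to the maximum.

```c
/* Sketch of the described readahead policy, in units of pages.
 * ra_pages models the maximum readahead window (VM_MAX_READAHEAD
 * expressed in pages). */
static unsigned long init_ra_size(unsigned long req_size,
                                  unsigned long ra_pages)
{
    unsigned long newsize;

    /* Small (one-page) request: initial window is 4x the request,
     * independent of the configured maximum. */
    if (req_size == 1)
        return 4;

    /* Otherwise ramp up 4x, clamped to the maximum. */
    newsize = req_size * 4;
    return newsize < ra_pages ? newsize : ra_pages;
}
```

Under this sketch, raising the maximum readahead no longer inflates the initial window chosen for single-page reads.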
  4. mm: pass readahead info down to the i/o scheduler

    Lee Susman authored and chil360 committed May 5, 2013
    Some i/o schedulers (e.g. row-iosched, cfq-iosched) deploy an idling
    algorithm in order to be better synced with the readahead algorithm.
    Idling is a prediction algorithm for incoming read requests.
    In this patch we mark pages which are part of a readahead window, by
    setting a newly introduced flag. With this flag, the i/o scheduler can
    identify a request which is associated with a readahead page. This
    enables the i/o scheduler's idling mechanism to stay in sync with the
    readahead mechanism and, in turn, can increase read throughput.
    Change-Id: I0654f23315b6d19d71bcc9cc029c6b281a44b196
    Signed-off-by: Lee Susman <>
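The marking described above can be sketched as a bit in the page's flags word (the struct, flag, and helper names below are hypothetical models, not the patch's actual identifiers; real kernels keep PG_* bits in page->flags):

```c
/* Hypothetical model of a page and its flags word. */
struct page_model {
    unsigned long flags;
};

#define PG_MODEL_READAHEAD (1UL << 0)   /* assumed flag bit */

/* Readahead code marks each page it places in the window. */
static void mark_page_readahead(struct page_model *p)
{
    p->flags |= PG_MODEL_READAHEAD;
}

/* The i/o scheduler tests the flag to recognize a request built from
 * readahead pages and keep its idling decision in sync. */
static int page_is_readahead(const struct page_model *p)
{
    return (p->flags & PG_MODEL_READAHEAD) != 0;
}
```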
  5. mm: Optimized SLUB memory allocator

    Christopher83 authored and chil360 committed Mar 12, 2013
  6. mm: Lowered swappiness 60->45

    Christopher83 authored and chil360 committed Feb 24, 2013
  7. arm: Added NEON compilation flag to VFP module

    Christopher83 authored and chil360 committed Feb 21, 2013
  8. writeback: fix race that cause writeback hung

    biger410 authored and chil360 committed Sep 11, 2013
    There is a race between marking an inode dirty and the writeback
    thread; see the following scenario.  In this case, the writeback
    thread will not run even though there is dirty_io.
    __mark_inode_dirty()                                          bdi_writeback_workfn()
    	...                                                       	...
    	if (bdi_cap_writeback_dirty(bdi)) {
    	    <<< assume wb has dirty_io, so wakeup_bdi is false.
    	    <<< the following inode_dirty also have wakeup_bdi false.
    	    if (!wb_has_dirty_io(&bdi->wb))
    		    wakeup_bdi = true;
    	                                                            <<< assume last dirty_io is removed here.
    	                                                            pages_written = wb_do_writeback(wb);
    	                                                            <<< work_list empty and wb has no dirty_io,
    	                                                            <<< delayed_work will not be queued.
    	                                                            if (!list_empty(&bdi->work_list) ||
    	                                                                (wb_has_dirty_io(wb) && dirty_writeback_interval))
    	                                                                queue_delayed_work(bdi_wq, &wb->dwork,
    	                                                                    msecs_to_jiffies(dirty_writeback_interval * 10));
    	inode->dirtied_when = jiffies;
    	<<< new dirty_io is added.
    	list_move(&inode->i_wb_list, &bdi->wb.b_dirty);
    	<<< though there is dirty_io, but wakeup_bdi is false,
    	<<< so writeback thread will not be waked up and
    	<<< the new dirty_io will not be flushed.
    	if (wakeup_bdi)
    Writeback will not run until new flush work is queued.  This may cause
    a lot of dirty pages to stay in memory for a long time.
    Signed-off-by: Junxiao Bi <>
    Reviewed-by: Jan Kara <>
    Cc: Fengguang Wu <>
    Signed-off-by: Andrew Morton <>
    Signed-off-by: Linus Torvalds <>
    Signed-off-by: Francisco Franco <>
    Change-Id: I973fcba5381881a003a035ffff48f64348660079
  9. ext4: speed up truncate/unlink by not using bforget() unless needed

    Andrey Sidorov authored and chil360 committed Sep 19, 2012
    Do not iterate over data blocks scanning for bh's to forget, as they
    never exist. This improves the time taken by the unlink / truncate
    syscalls.
    Tested by continuously truncating file that is being written by dd.
    Another test is rm -rf of linux tree while tar unpacks it. With
    ordered data mode condition unlikely(!tbh) was always met in
    ext4_free_blocks. With journal data mode, tbh was found only a few
    times, so optimisation is also possible there.
    Unlinking fallocated 60G file after doing sync && echo 3 >
    /proc/sys/vm/drop_caches && time rm --help
    X86 before (linux 3.6-rc4):
    real    0m2.710s
    user    0m0.000s
    sys     0m1.530s
    X86 after:
    real    0m0.644s
    user    0m0.003s
    sys     0m0.060s
    MIPS before (linux 2.6.37):
    real    0m 4.93s
    user    0m 0.00s
    sys     0m 4.61s
    MIPS after:
    real    0m 0.16s
    user    0m 0.00s
    sys     0m 0.06s
    Signed-off-by: "Theodore Ts'o" <>
    Signed-off-by: Andrey Sidorov <>
    Signed-off-by: franciscofranco <>
    Change-Id: Ie78a945cb82b3892eaf88701f2dc3b7726104fb5
  10. jbd2: optimize jbd2_journal_force_commit

    mrg666 authored and chil360 committed Dec 23, 2013
    The current implementation of jbd2_journal_force_commit() is
    suboptimal because it can result in empty and useless commits. But
    callers just want to force and wait for any unfinished commits. We
    already have jbd2_journal_force_commit_nested(), which does exactly
    what we want, except that it is only used when we are guaranteed not
    to hold the journal transaction open.
    Signed-off-by: Dmitry Monakhov <>
    Signed-off-by: "Theodore Ts'o" <>
    Change-Id: I5c041a1898838e880714a913b5a915f105a8dfb9
  11. fs: vfat: reduce the worst case latencies

    xiaogang authored and chil360 committed May 28, 2013
    When a block partition is mounted with the FAT file system
    and the MS_DIRSYNC option is used, some file system operations
    like create and rename sleep in the caller's context until
    all the metadata have been committed to non-volatile memory.
    Since this is a blocking call in the user context,
    the WRITE_SYNC option must be used instead of WRITE
    (an async operation), which incurs inherent latencies while
    flushing the metadata corresponding to directory entries.
    Change-Id: I41c514889873a39d564271db0a421e6c66e5ae33
    Signed-off-by: xiaogang <>
  12. fs/sync: Make sync() satisfy many requests with one invocation

    chil360 committed Feb 19, 2014
    Dave Jones reported RCU stalls, overly long hrtimer interrupts, and
    amazingly long NMI handlers from a trinity-induced workload involving
    lots of concurrent sync() calls.
    There are any number of things that one might do to make sync() behave
    better under high levels of contention, but it is also the case that
    multiple concurrent sync() system calls can be satisfied by a single
    sys_sync() invocation.
    Given that this situation is reminiscent of rcu_barrier(), this commit
    applies the rcu_barrier() approach to sys_sync().  This approach uses
    a global mutex and a sequence counter.  The mutex is held across the
    sync() operation, which eliminates contention between concurrent sync()
    operations.  The counter is incremented at the beginning and end of
    each sync() operation, so that it is odd while a sync() operation is in
    progress and even otherwise, just like sequence locks.
    The code that used to be in sys_sync() is now in do_sync(), and sys_sync()
    now handles the concurrency.  The sys_sync() function first takes a
    snapshot of the counter, then acquires the mutex, and then takes another
    snapshot of the counter.  If the values of the two snapshots indicate that
    a full do_sync() executed during the mutex acquisition, the sys_sync()
    function releases the mutex and returns ("Our work is done!").  Otherwise,
    sys_sync() increments the counter, invokes do_sync(), and increments
    the counter again.
    This approach allows a single call to do_sync() to satisfy an arbitrarily
    large number of sync() system calls, which should eliminate issues due
    to large numbers of concurrent invocations of the sync() system call.
    Changes since v1:
    o	Add a pair of memory barriers to keep the increments from
    	bleeding into the do_sync() code.  (The failure probability
    	is insanely low, but when you have several hundred million
    	devices running Linux, you can expect several hundred instances
    	of one-in-a-million failures.)
    o	Actually CC some people who have experience in this area.
    Reported-by: Dave Jones <>
    Signed-off-by: Paul E. McKenney <>
    Cc: Alexander Viro <>
    Cc: Christoph Hellwig <>
    Cc: Jan Kara <>
    Cc: Curt Wohlgemuth <>
    Cc: Jens Axboe <>
    Signed-off-by: Paul Reioux <>
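The counter-and-mutex scheme described above can be modeled in user space (a sketch under stated assumptions, not the kernel patch itself; names and the exact comparison are illustrative): the counter is odd while a sync is in progress, and a caller whose pre-wait snapshot shows that a complete do_sync() started and finished while it waited for the mutex returns without doing any work.

```c
#include <pthread.h>

static pthread_mutex_t sync_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long sync_seq;     /* odd while a sync is running */
static unsigned long do_sync_runs; /* counts real do_sync() invocations */

/* Stand-in for the real do_sync() work. */
static void do_sync_model(void)
{
    do_sync_runs++;
}

void sync_model(void)
{
    unsigned long snap = sync_seq;  /* snapshot before waiting */

    pthread_mutex_lock(&sync_lock);
    /* Round an odd snapshot up to even: a sync already in progress at
     * snapshot time does not count, because it may have started before
     * our caller's data was dirtied. If the counter then advanced by
     * at least 2, a complete sync ran while we waited for the mutex. */
    if (sync_seq - (snap + (snap & 1)) >= 2) {
        pthread_mutex_unlock(&sync_lock);
        return;                     /* our work is done */
    }
    sync_seq++;                     /* now odd: sync in progress */
    do_sync_model();
    sync_seq++;                     /* even again: sync complete */
    pthread_mutex_unlock(&sync_lock);
}
```

With no contention, every call performs a real sync; under heavy contention, most waiters observe that the counter advanced far enough and return immediately, so one do_sync() satisfies many callers.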
  13. writeback: Fix occasional slow sync(1)

    jankara authored and chil360 committed Jul 12, 2013
    In case when system contains no dirty pages, wakeup_flusher_threads()
    will submit WB_SYNC_NONE writeback for 0 pages so wb_writeback() exits
    immediately without doing anything. Thus sync(1) will write all the
    dirty inodes from a WB_SYNC_ALL writeback pass which is slow.
    Fix the problem by using get_nr_dirty_pages() in
    wakeup_flusher_threads() instead of calculating number of dirty pages
    manually. That function also takes number of dirty inodes into account.
    Reported-by: Paul Taysom <>
    Signed-off-by: Jan Kara <>
    Signed-off-by: Cristoforo Cataldo <>
  14. hrtimer: Introduce effective timer slack

    kiryl authored and chil360 committed Oct 11, 2011
    task_get_effective_timer_slack() returns the timer slack value to be
    used to configure per-task timers. It can be equal to or higher than
    the task's own timer slack value.
    For now task_get_effective_timer_slack() returns timer_slack_ns of the
    task. Timer slack cgroup controller will implement a bit more
    sophisticated logic.
    Signed-off-by: Kirill A. Shutemov <>
    Signed-off-by: Cristoforo Cataldo <>
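Per the description, the helper currently reduces to a simple accessor (a sketch against a modeled task struct; the real kernel's struct task_struct differs):

```c
/* Minimal model of the relevant task field. */
struct task_model {
    unsigned long timer_slack_ns;
};

/* For now the effective slack is just the task's own value; a timer
 * slack cgroup controller could later return a higher value instead. */
unsigned long task_get_effective_timer_slack(struct task_model *tsk)
{
    return tsk->timer_slack_ns;
}
```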
  15. fs/dyn_sync_cntrl: check dyn fsync control's active prior to performing fsync ops

    chil360 committed Feb 19, 2014
    Signed-off-by: Andrew Bartholomew <>
  16. fs: Asynchronous I/O latency to a solid-state disk greatly increased

    kleikamp authored and chil360 committed Jun 26, 2012
    Asynchronous I/O latency to a solid-state disk greatly increased
    between the 2.6.32 and 3.0 kernels. By removing the plug from
    do_io_submit(), we observed a 34% improvement in the I/O latency.
    Unfortunately, at this level, we don't know if the request is to
    a rotating disk or not.
Commits on Feb 18, 2014
  1. ARM: only allow kernel mode neon with AEABI

    chil360 committed Feb 18, 2014
    This prevents the linker erroring with:
    arm-linux-ld: error: arch/arm/lib/xor-neon.o uses VFP instructions, whereas arch/arm/lib/built-in.o does not
    arm-linux-ld: failed to merge target specific data of file arch/arm/lib/xor-neon.o
    This happens because the non-NEON files are marked as containing FPA
    data/instructions (even though they do not) and are mixed with files
    which contain VFP, an incompatible floating point format.
    Signed-off-by: Russell King <>
  2. ARM: Perform the creation of procfs node for VFP later

    chil360 committed Feb 18, 2014
    The creation of procfs node for VFP bounce reporting failed
    when placed in early init, so perform this creation later.
  3. ARM: move VFP init to an earlier boot stage

    chil360 committed Feb 18, 2014
    In order to use the NEON unit in the kernel, we should
    initialize it a bit earlier in the boot process so that NEON users
    that like to do a quick benchmark at load time (like the
    xor_blocks or RAID-6 code) find the NEON/VFP unit already
    initialized. Replaced late_initcall() with core_initcall().
    Signed-off-by: Ard Biesheuvel <>
    Acked-by: Nicolas Pitre <>
  4. ARM: 7835/2: fix modular build of xor_blocks() with NEON enabled

    ardbiesheuvel authored and chil360 committed Jan 28, 2014
    Commit 0195659 introduced a NEON accelerated version of the xor_blocks()
    function, but it needs the changes in this patch to allow it to be built
    as a module rather than statically into the kernel.
    This patch creates a separate module, xor-neon.ko, which exports the
    NEON inner xor_blocks() functions depended upon by the regular xor.ko
    when it is built with CONFIG_KERNEL_MODE_NEON=y.
    Reported-by: Josh Boyer <>
    Signed-off-by: Ard Biesheuvel <>
    Signed-off-by: Russell King <>
  5. ARM: crypto: add NEON accelerated XOR implementation

    ardbiesheuvel authored and chil360 committed May 17, 2013
    Add a source file xor-neon.c (which is really just the reference
    C implementation passed through the GCC vectorizer) and hook it
    up to the XOR framework.
    Signed-off-by: Ard Biesheuvel <>
    Acked-by: Nicolas Pitre <>
  6. ARM: add support for kernel mode NEON

    chil360 committed Feb 18, 2014
    In order to safely support the use of NEON instructions in
    kernel mode, some precautions need to be taken:
    - the userland context that may be present in the registers (even
      if the NEON/VFP is currently disabled) must be stored under the
      correct task (which may not be 'current' in the UP case),
    - to avoid having to keep track of additional vfpstates for the
      kernel side, disallow the use of NEON in interrupt context
      and run with preemption disabled,
    - after use, re-enable preemption and re-enable the lazy restore
      machinery by disabling the NEON/VFP unit.
    This patch adds the functions kernel_neon_begin() and
    kernel_neon_end() which take care of the above. It also adds
    the Kconfig symbol KERNEL_MODE_NEON to enable it.
    Signed-off-by: Ard Biesheuvel <>
    Acked-by: Nicolas Pitre <>
  7. arm/crypto: Add optimized AES and SHA1 routines

    chil360 committed Feb 18, 2014
    Add assembler versions of AES and SHA1 for ARM platforms.  This has provided
    up to a 50% improvement in IPsec/TCP throughput for tunnels using AES128/SHA1.
    Platform   CPU Speed    Endian   Before (bps)   After (bps)   Improvement
    IXP425      533 MHz      big     11217042        15566294        ~38%
    KS8695      166 MHz     little    3828549         5795373        ~51%
    Signed-off-by: David McCullough <>
Commits on Feb 16, 2014
  1. Enable OndemandX

    chil360 committed Feb 16, 2014
  2. Add more gcc optimizations

    chil360 committed Feb 16, 2014
  3. Remove SavagedZen

    chil360 committed Feb 16, 2014
  4. Add Smartass2

    chil360 committed Feb 16, 2014
  5. Add SavagedZen

    chil360 committed Feb 16, 2014