Commits on Aug 12, 2021

  1. ARC: mm: introduce _PAGE_TABLE to explicitly link pgd,pud,pmd entries

    The ARCv3 hardware walker expects Table Descriptors to have b'11 in the LSB
    bits to continue moving to the next level.
    
    This commit adds that (to the ARCv2 code) and ensures that it works in the
    software-walked regime.
    
    The pte entries still need tagging, but that is not possible in ARCv2
    since the LSB 2 bits are currently used.
    
    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
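    A minimal sketch of the tagging scheme described above, with made-up names
    and addresses (this is illustrative, not the actual ARC kernel API): a
    table entry linking to the next level carries b'11 in its two LSBs, and a
    walker checks that tag before descending.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical _PAGE_TABLE tag: b'11 in the two LSBs marks a pgd/pud/pmd
     * entry as a link to a next-level table, per the ARCv3 walker's rule. */
    #define _PAGE_TABLE 0x3UL

    static uintptr_t mk_table_entry(uintptr_t next_level_pa)
    {
        return next_level_pa | _PAGE_TABLE;    /* tag the link */
    }

    static int is_table_entry(uintptr_t entry)
    {
        return (entry & 0x3UL) == _PAGE_TABLE; /* walker's "continue" test */
    }

    int main(void)
    {
        uintptr_t pmd = mk_table_entry(0x10000000);
        assert(is_table_entry(pmd));
        assert((pmd & ~0x3UL) == 0x10000000);  /* address still recoverable */
        return 0;
    }
    ```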
  2. ARC: mm: vmalloc sync from kernel to user table to update PMD ...

    ... not PGD
    
    vmalloc() sets up the kernel page table (starting from @swapper_pg_dir).
    But when vmalloc area is accessed in context of a user task, say opening
    terminal in n_tty_open(), the user page tables need to be synced from
    kernel page tables so that TLB entry is created in "user context".
    
    The old code was doing this incorrectly, as it was updating the user pgd
    entry (first level itself) to point to the kernel pud table (2nd level),
    effectively yanking away the entire user space translation in favor of the
    kernel one.
    
    The correct way to do this is to ONLY update a user space pgd/pud/pmd entry
    if it is not populated already. This ensures that only the missing leaf
    pmd entry gets updated to point to the relevant kernel pte table.
    
    From a code change pov, we are changing the pattern:
    
    	p4d = p4d_offset(pgd, address);
    	p4d_k = p4d_offset(pgd_k, address);
    	if (!p4d_present(*p4d_k))
    		goto bad_area;
    	set_p4d(p4d, *p4d_k);
    
    with
    	p4d = p4d_offset(pgd, address);
    	p4d_k = p4d_offset(pgd_k, address);
    	if (p4d_none(*p4d_k))
    		goto bad_area;
    	if (!p4d_present(*p4d))
    		set_p4d(p4d, *p4d_k);
    
    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
  3. ARC: mm: support 4 levels of page tables

    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
  4. ARC: mm: support 3 levels of page tables

    The ARCv2 MMU is software walked and Linux implements 2 levels of paging: pgd/pte.
    Forthcoming hw will have multiple levels, so this change preps the mm code
    for same. It is also fun to try multiple levels even on soft-walked code to
    ensure the generic mm code is robust enough to handle it.
    
    overview
    ________
    
    2 levels {pgd, pte} : pmd is folded but pmd_* macros are valid and operate on pgd
    3 levels {pgd, pmd, pte}:
      - pud is folded and pud_* macros point to pgd
      - pmd_* macros operate on actual pmd
    
    code changes
    ____________
    
    1. #include <asm-generic/pgtable-nopud.h>
    
    2. Define CONFIG_PGTABLE_LEVELS 3
    
    3a. Define PMD_SHIFT, PMD_SIZE, PMD_MASK, pmd_t
    3b. Define pmd_val() which actually deals with pmd
        (pmd_offset(), pmd_index() are provided by generic code)
    3c. pmd_alloc_one()/pmd_free() also provided by generic code
        (pmd_populate/pmd_free already exist)
    
    4. Define pud_none(), pud_bad() macros based on generic pud_val() which
       internally pertains to pgd now.
    4b. define pud_populate() to just setup pgd
    
    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
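    The level split above can be sketched with the usual shift/index math. The
    shift values below are made up for illustration and are not ARC's actual
    configuration; the index helpers mirror what the generic pgd/pmd/pte
    macros compute.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical 3-level split: 12-bit pages, pmd covers 2 MB, pgd 16 MB. */
    #define PAGE_SHIFT   12
    #define PMD_SHIFT    21
    #define PGDIR_SHIFT  24

    #define PTRS_PER_PTE (1 << (PMD_SHIFT - PAGE_SHIFT))   /* 512 */
    #define PTRS_PER_PMD (1 << (PGDIR_SHIFT - PMD_SHIFT))  /* 8 */

    static unsigned pgd_index(uintptr_t a) { return a >> PGDIR_SHIFT; }
    static unsigned pmd_index(uintptr_t a) { return (a >> PMD_SHIFT) & (PTRS_PER_PMD - 1); }
    static unsigned pte_index(uintptr_t a) { return (a >> PAGE_SHIFT) & (PTRS_PER_PTE - 1); }

    int main(void)
    {
        uintptr_t va = 0x12345678;
        assert(pgd_index(va) == 0x12);                /* top 8 bits */
        assert(pmd_index(va) == ((va >> 21) & 0x7));  /* middle 3 bits */
        assert(pte_index(va) == ((va >> 12) & 0x1ff));/* low 9 bits of the VPN */
        return 0;
    }
    ```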
  5. ARC: mm: hack to allow 2 level build with 4 level code

    PMD_SHIFT is mapped to PUD_SHIFT or PGD_SHIFT by asm-generic/pgtable-*
    but only for !__ASSEMBLY__
    
    tlbex.S asm code has PTRS_PER_PTE which uses PMD_SHIFT, hence barfs
    for CONFIG_PGTABLE_LEVELS={2,3} and works for 4.
    
    So add a workaround local to tlbex.S - the proper fix is to change
    asm-generic/pgtable-* headers to expose the defines for __ASSEMBLY__ too
    
    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
  6. ARC: mm: disintegrate pgtable.h into levels and flags

     - pgtable-bits-arcv2.h (MMU specific page table flags)
     - pgtable-levels.h (paging levels)
    
    No functional changes, but this paves the way for easy addition of new MMU
    code with different bits, levels, etc.
    
    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
  7. ARC: mm: disintegrate mmu.h (arcv2 bits out)

    non functional change
    
    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
  8. ARC: mm: move MMU specific bits out of entry code ...

    ... to avoid polluting shared entry code (across three ISA variants)
    with ISA/MMU specific code.
    
    Cc: Jose Abreu <joabreu@synopsys.com>
    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
  9. ARC: mm: move MMU specific bits out of ASID allocator

    And while at it, rewrite commentary on ASID allocator
    
    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
  10. ARC: mm: non-functional code cleanup ahead of 3 levels

    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
  11. ARC: mm: switch to asm-generic/pgalloc.h

    With the previous patch the ARC pgalloc functions are the same as the
    generic ones, hence switch to those.
    
    Suggested-by: Mike Rapoport <rppt@kernel.org>
    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
  12. ARC: mm: switch pgtable_t back to struct page *

    So far ARC's pgtable_t has not been struct page based, to avoid the extra
    page_address() calls involved. However the differences are down to
    noise and get in the way of using generic code, hence this patch.
    
    Suggested-by: Mike Rapoport <rppt@kernel.org>
    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
  13. ARC: mm: pmd_populate* to use the canonical set_pmd (and drop pmd_set)

    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
  14. ARC: ioremap: use more commonly used PAGE_KERNEL based uncached flag

    and remove the one off uncached definition for ARC
    
    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
  15. ARC: mm: Enable STRICT_MM_TYPECHECKS

    In the past I've refrained from doing this (at least 2 times) due to the
    slight code bloat from the ABI implications of pte_t etc. becoming structs.
    
    Per the ARC ABI, functions return a struct via memory and not through
    register r0, even if the struct would fit in register(s):
    
     - caller allocates space on stack and passes the address as first arg
       (r0), shifting rest of args by one
    
     - callee creates return struct in memory (referenced via r0)
    
    This time around the code actually shrunk slightly (due to subtle
    inlining heuristic effects), but it is still slightly inefficient due to
    return values passed through memory. That however seems like a small
    cost compared to the maintenance burden, given the impending new MMU
    support for page walking etc.
    
    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
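    The STRICT_MM_TYPECHECKS idea follows the generic kernel pattern: wrap the
    raw value in a one-member struct so that mixing up, say, a pte and a pgd
    becomes a compile error rather than a silent bug. A minimal sketch (not
    the ARC code itself):

    ```c
    #include <assert.h>

    /* One-member struct wrapper, as in the kernel's STRICT_MM_TYPECHECKS. */
    typedef struct { unsigned long pte; } pte_t;

    #define pte_val(x)  ((x).pte)
    #define __pte(x)    ((pte_t){ (x) })

    /* Returned by value: per the ARC ABI this struct travels via memory,
     * with the caller passing a hidden result pointer in r0. */
    static pte_t pte_set_flags(pte_t p, unsigned long flags)
    {
        return __pte(pte_val(p) | flags);
    }

    int main(void)
    {
        pte_t p = __pte(0x1000);
        p = pte_set_flags(p, 0x3);
        assert(pte_val(p) == 0x1003);
        return 0;
    }
    ```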
  16. ARC: mm: Fixes to allow STRICT_MM_TYPECHECKS

    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
  17. ARC: mm: move mmu/cache externs out to setup.h

    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
  18. ARC: mm: remove tlb paranoid code

    This was used back in the ARC700 days when the ASID allocator was fragile.
    It has not been needed in the last 5 years.
    
    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021
  19. ARC: mm: use SCRATCH_DATA0 register for caching pgdir in ARCv2 only

    The MMU SCRATCH_DATA0 register is intended to cache the task pgd. However
    in the ARC700 SMP port it has to be repurposed for reentrant interrupt
    handling, while the UP port doesn't. We currently handle both usecases
    with a fabricated define, which has the usual issues of dependency
    nesting and ugliness.
    
    So clean this up: for ARC700 don't use it to cache the pgd (even in UP)
    and do the opposite for ARCv2.
    
    And while here, switch to canonical pgd_offset().
    
    Signed-off-by: Vineet Gupta <vgupta@kernel.org>
    vineetgarc authored and intel-lab-lkp committed Aug 12, 2021

Commits on Aug 6, 2021

  1. ARC: retire MMUv1 and MMUv2 support

    These were present in ancient ARC700 cores which don't seem to be
    active in the field anymore.
    
    Removal helps clean up the code and removes the hack for the
    MMU_VER to MMU_V[3-4] conversion.
    
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    vineetgarc committed Aug 6, 2021
  2. ARC: retire ARC750 support

    There's no known/active customer using them with the latest kernels anyway.
    
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    vineetgarc committed Aug 6, 2021

Commits on Aug 5, 2021

  1. ARC: atomic_cmpxchg/atomic_xchg: implement relaxed variants

    And move them out of cmpxchg.h to canonical atomic.h
    
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    vineetgarc committed Aug 5, 2021
  2. ARC: cmpxchg/xchg: implement relaxed variants (LLSC config only)

    It only makes sense to do this for the LLSC config
    
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    vineetgarc committed Aug 5, 2021
  3. ARC: cmpxchg/xchg: rewrite as macros to make type safe

    Existing code forces/assumes args to type "long", which won't work in the
    LP64 regime, so prepare the code for that.
    
    Interestingly, this should be a non-functional change, but I do see
    some codegen changes:
    
    | bloat-o-meter vmlinux-cmpxchg-A vmlinux-cmpxchg-B
    | add/remove: 0/0 grow/shrink: 17/12 up/down: 218/-150 (68)
    |
    | Function                                     old     new   delta
    | rwsem_optimistic_spin                        518     550     +32
    | rwsem_down_write_slowpath                   1244    1274     +30
    | __do_sys_perf_event_open                    2576    2600     +24
    | down_read                                    192     200      +8
    | __down_read                                  192     200      +8
    ...
    | task_work_run                                168     148     -20
    | dma_fence_chain_walk.part                    760     736     -24
    | __genradix_ptr_alloc                         674     646     -28
    
    Total: Before=6187409, After=6187477, chg +0.00%
    
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    vineetgarc committed Aug 5, 2021
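    The type-safe macro pattern this commit moves to can be sketched with
    `__typeof__` and a GCC statement expression (the name `my_xchg` is
    illustrative, and the body here is not atomic; the real implementation
    uses LLSC or a spinlock):

    ```c
    #include <assert.h>

    /* __typeof__ keeps the operand's own width instead of forcing
     * everything through "long", which matters under LP64. */
    #define my_xchg(ptr, val)                              \
    ({                                                     \
        __typeof__(*(ptr)) _old = *(ptr);                  \
        *(ptr) = (val);   /* real code would be atomic */  \
        _old;                                              \
    })

    int main(void)
    {
        int x = 5;
        int old = my_xchg(&x, 7);
        assert(old == 5 && x == 7);
        return 0;
    }
    ```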
  4. ARC: xchg: !LLSC: remove UP micro-optimization/hack

    It gets in the way of cleaning things up and is a maintenance
    pain in the neck!
    
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    vineetgarc committed Aug 5, 2021
  5. ARC: bitops: fls/ffs to take int (vs long) per asm-generic defines

    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    vineetgarc committed Aug 5, 2021
  6. ARC: switch to generic bitops

     - !LLSC now only needs a single spinlock for atomics and bitops
    
     - Some codegen changes (slight bloat) with generic bitops
    
       1. code increase due to the LD-check-atomic paradigm vs. unconditional
          atomic (which dirties the cache line even if the bit is set already).
          So despite the increase, generic is the right thing to do.
    
       2. code decrease (but use of costlier instructions such as DIV vs.
          shift-based math) due to signed arithmetic.
          This needs to be revisited separately.
    
         arc:
         static inline int test_bit(unsigned int nr, const volatile unsigned long *addr)
                                    ^^^^^^^^^^^^
         generic:
         static inline int test_bit(int nr, const volatile unsigned long *addr)
                                    ^^^
    
    Link: https://lore.kernel.org/r/20180830135749.GA13005@arm.com
    Signed-off-by: Will Deacon <will@kernel.org>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    [vgupta: wrote patch based on Will's poc, analysed codegen diffs]
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    wildea01 authored and vineetgarc committed Aug 5, 2021
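    A worked example of why the signed `int nr` prototype can cost a DIV:
    signed division truncates toward zero, while an arithmetic right shift
    floors, so `nr / 32` cannot be lowered to a bare shift for signed `nr`
    (on gcc/clang, where right-shifting a negative value is an arithmetic
    shift).

    ```c
    #include <assert.h>

    int main(void)
    {
        int nr = -33;
        assert(nr / 32 == -1);   /* C division: truncates toward zero */
        assert(nr >> 5 == -2);   /* arithmetic shift: floors */
        /* For unsigned operands the two agree, so the old ARC prototype
         * (unsigned int nr) let the compiler emit a plain shift. */
        unsigned un = 65;
        assert(un / 32 == un >> 5);
        return 0;
    }
    ```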
  7. ARC: atomics: implement relaxed variants

    The current ARC fetch/return atomics provide fully ordered semantics
    only with 2 full barriers around the operation.
    
    Instead, implement them as relaxed variants without any barriers and
    rely on generic code to generate the fully-ordered, acquire and release
    variants by adding the appropriate full barriers.
    
    This helps elide some extra barriers in case of acquire/release/relaxed
    calls.
    
    bloat-o-meter for the hsdk defconfig shows codegen improvements, although
    the numbers below are inflated due to unrelated inlining heuristic changes:
    
    | bloat-o-meter vmlinux-643babe34fd7-non-relaxed vmlinux-45aa05cb44d7-relaxed
    | add/remove: 2/5 grow/shrink: 42/1222 up/down: 4158/-14312 (-10154)
    | Function                                     old     new   delta
    | ..
    | sys_renameat                                 462     476     +14
    | ip_mc_inc_group                              424     436     +12
    | do_read_cache_page                          1882    1894     +12
    | ..
    | refcount_dec_and_mutex_lock                  254     250      -4
    | refcount_dec_and_lock_irqsave                258     254      -4
    | refcount_dec_and_lock                        254     250      -4
    | ..
    | tcp_v6_route_req                             246     238      -8
    | tcp_v4_destroy_sock                          286     278      -8
    | tcp_twsk_unique                              352     344      -8
    
    Link: https://lore.kernel.org/r/20180830144344.GW24142@hirez.programming.kicks-ass.net
    Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    vineetgarc committed Aug 5, 2021
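    The pattern can be sketched in portable C11 atomics (names are
    hypothetical; the kernel's generic layer does this with smp_mb() around
    the arch's `_relaxed` op):

    ```c
    #include <assert.h>
    #include <stdatomic.h>

    /* Arch supplies only the barrier-free relaxed op ... */
    static int arch_atomic_add_return_relaxed(atomic_int *v, int i)
    {
        return atomic_fetch_add_explicit(v, i, memory_order_relaxed) + i;
    }

    /* ... and generic code brackets it with full barriers to build the
     * fully-ordered variant, so acquire/release/relaxed callers don't pay
     * for barriers they don't need. */
    static int atomic_add_return(atomic_int *v, int i)
    {
        atomic_thread_fence(memory_order_seq_cst);   /* like smp_mb() */
        int ret = arch_atomic_add_return_relaxed(v, i);
        atomic_thread_fence(memory_order_seq_cst);   /* like smp_mb() */
        return ret;
    }

    int main(void)
    {
        atomic_int v = 10;
        assert(atomic_add_return(&v, 5) == 15);
        return 0;
    }
    ```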
  8. ARC: atomic64: LLSC: elide unused atomic_{and,or,xor,andnot}_return

    This is a non-functional change since those wrappers are not
    used in kernel sources at all.
    
    Link: http://lists.infradead.org/pipermail/linux-snps-arc/2018-August/004246.html
    Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    vineetgarc committed Aug 5, 2021
  9. ARC: atomic: !LLSC: use int data type consistently

    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    vineetgarc committed Aug 5, 2021
  10. ARC: atomic: !LLSC: remove hack in atomic_set() for UP

    !LLSC atomics use a spinlock (SMP) or irq-disable (UP) to implement
    critical regions. UP atomic_set() however was "cheating" by not doing
    any of that and still being functional.
    
    Remove this anomaly (primarily as cleanup for future code improvements)
    given that this config is not worth the hassle of special-case code.
    
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    vineetgarc committed Aug 5, 2021
  11. ARC: atomics: disintegrate header

    Non functional change, to ease future addition/removal
    
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    vineetgarc committed Aug 5, 2021

Commits on Aug 2, 2021

  1. Linux 5.14-rc4

    torvalds committed Aug 2, 2021

Commits on Aug 1, 2021

  1. Merge tag 'perf-tools-fixes-for-v5.14-2021-08-01' of git://git.kernel…

    ….org/pub/scm/linux/kernel/git/acme/linux
    
    Pull perf tools fixes from Arnaldo Carvalho de Melo:
    
     - Revert "perf map: Fix dso->nsinfo refcounting", this makes 'perf top'
       abort, uncovering a design flaw on how namespace information is kept.
       The fix for that is more than we can do right now, leave it for the
       next merge window.
    
     - Split --dump-raw-trace by AUX records for ARM's CoreSight, fixing up
       the decoding of some records.
    
     - Fix PMU alias matching.
    
    Thanks to James Clark and John Garry for these fixes.
    
    * tag 'perf-tools-fixes-for-v5.14-2021-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
      Revert "perf map: Fix dso->nsinfo refcounting"
      perf pmu: Fix alias matching
      perf cs-etm: Split --dump-raw-trace by AUX records
    torvalds committed Aug 1, 2021
  2. Merge tag 'powerpc-5.14-4' of git://git.kernel.org/pub/scm/linux/kern…

    …el/git/powerpc/linux
    
    Pull powerpc fixes from Michael Ellerman:
    
     - Don't use r30 in VDSO code, to avoid breaking existing Go lang
       programs.
    
     - Change an export symbol to allow non-GPL modules to use spinlocks
       again.
    
    Thanks to Paul Menzel, and Srikar Dronamraju.
    
    * tag 'powerpc-5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
      powerpc/vdso: Don't use r30 to avoid breaking Go lang
      powerpc/pseries: Fix regression while building external modules
    torvalds committed Aug 1, 2021