Vineet-Gupta/A…
Commits on Aug 12, 2021
-
ARC: mm: introduce _PAGE_TABLE to explicitly link pgd,pud,pmd entries
ARCv3 hardware walker expects Table Descriptors to have b'11 in LSB bits to continue moving to next level. This commits adds that (to ARCv2 code) and ensures that it works in software walked regime. The pte entries stil need tagging, but that is not possible in ARCv2 since the LSB 2 bits are currently used. Signed-off-by: Vineet Gupta <vgupta@kernel.org>
-
ARC: mm: vmalloc sync from kernel to user table to update PMD ...
... not PGD vmalloc() sets up the kernel page table (starting from @swapper_pg_dir). But when vmalloc area is accessed in context of a user task, say opening terminal in n_tty_open(), the user page tables need to be synced from kernel page tables so that TLB entry is created in "user context". The old code was doing this incorrectly, as it was updating the user pgd entry (first level itself) to point to kernel pud table (2nd level), effectively yanking away the entire user space translation with kernel one. The correct way to do this is to ONLY update a user space pgd/pud/pmd entry if it is not popluated already. This ensures that only the missing leaf pmd entry gets updated to point to relevant kernel pte table. From code change pov, we are chaging the pattern: p4d = p4d_offset(pgd, address); p4d_k = p4d_offset(pgd_k, address); if (!p4d_present(*p4d_k)) goto bad_area; set_p4d(p4d, *p4d_k); with p4d = p4d_offset(pgd, address); p4d_k = p4d_offset(pgd_k, address); if (p4d_none(*p4d_k)) goto bad_area; if (!p4d_present(*p4d)) set_p4d(p4d, *p4d_k); Signed-off-by: Vineet Gupta <vgupta@kernel.org>
-
ARC: mm: support 4 levels of page tables
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
-
ARC: mm: support 3 levels of page tables
ARCv2 MMU is software walked and Linux implements 2 levels of paging: pgd/pte. Forthcoming hw will have multiple levels, so this change preps mm code for same. It is also fun to try multi levels even on soft-walked code to ensure generic mm code is robust to handle. overview ________ 2 levels {pgd, pte} : pmd is folded but pmd_* macros are valid and operate on pgd 3 levels {pgd, pmd, pte}: - pud is folded and pud_* macros point to pgd - pmd_* macros operate on actual pmd code changes ____________ 1. #include <asm-generic/pgtable-nopud.h> 2. Define CONFIG_PGTABLE_LEVELS 3 3a. Define PMD_SHIFT, PMD_SIZE, PMD_MASK, pmd_t 3b. Define pmd_val() which actually deals with pmd (pmd_offset(), pmd_index() are provided by generic code) 3c. pmd_alloc_one()/pmd_free() also provided by generic code (pmd_populate/pmd_free already exist) 4. Define pud_none(), pud_bad() macros based on generic pud_val() which internally pertains to pgd now. 4b. define pud_populate() to just setup pgd Signed-off-by: Vineet Gupta <vgupta@kernel.org> -
ARC: mm: hack to allow 2 level build with 4 level code
PMD_SHIFT is mapped to PUD_SHIFT or PGD_SHIFT by asm-generic/pgtable-* but only for !__ASSEMBLY__ tlbex.S asm code has PTRS_PER_PTE which uses PMD_SHIFT hence barfs for CONFIG_PGTABLE_LEVEL={2,3} and works for 4. So add a workaround local to tlbex.S - the proper fix is to change asm-generic/pgtable-* headers to expose the defines for __ASSEMBLY__ too Signed-off-by: Vineet Gupta <vgupta@kernel.org> -
ARC: mm: disintegrate pgtable.h into levels and flags
- pgtable-bits-arcv2.h (MMU specific page table flags) - pgtable-levels.h (paging levels) No functional changes, but paves way for easy addition of new MMU code with different bits and levels etc Signed-off-by: Vineet Gupta <vgupta@kernel.org>
-
ARC: mm: disintegrate mmu.h (arcv2 bits out)
non functional change Signed-off-by: Vineet Gupta <vgupta@kernel.org>
-
ARC: mm: move MMU specific bits out of entry code ...
... to avoid polluting shared entry code (across three ISA variants) with ISA/MMU specific code. Cc: Jose Abreu <joabreu@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
-
ARC: mm: move MMU specific bits out of ASID allocator
And while at it, rewrite commentary on ASID allocator Signed-off-by: Vineet Gupta <vgupta@kernel.org>
-
ARC: mm: non-functional code cleanup ahead of 3 levels
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
-
ARC: mm: switch to asm-generic/pgalloc.h
With previous patch ARC pgalloc functions are same as generic, hence switch to that. Suggested-by: Mike Rapoport <rppt@kernel.org> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
-
ARC: mm: switch pgtable_t back to struct page *
So far ARC pgtable_t has not been struct page based to avoid extra page_address() calls involved. However the differences are down to noise and get in the way of using generic code, hence this patch. Suggested-by: Mike Rapoport <rppt@kernel.org> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
-
ARC: mm: pmd_populate* to use the canonical set_pmd (and drop pmd_set)
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
-
ARC: ioremap: use more commonly used PAGE_KERNEL based uncached flag
and remove the one off uncached definition for ARC Signed-off-by: Vineet Gupta <vgupta@kernel.org>
-
ARC: mm: Enable STRICT_MM_TYPECHECKS
In the past I've refrained from doing this (at least 2 times) due to the slight code bloat due to ABI implications of pte_t etc becoming struct Per ARC ABI, functions return struct via memory and not through register r0, even if the struct would fit in register(s) - caller allocates space on stack and passes the address as first arg (r0), shifting rest of args by one - callee creates return struct in memory (referenced via r0) This time around the code actually shrunk slightly (due to subtle inlining heuristic effects), but still slightly inefficient due to return values passed through memory. That however seems like a small cost compared to maintenance burden given the impending new mmu support for page walk etc Signed-off-by: Vineet Gupta <vgupta@kernel.org>
-
ARC: mm: Fixes to allow STRICT_MM_TYPECHECKS
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
-
ARC: mm: move mmu/cache externs out to setup.h
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
-
ARC: mm: remove tlb paranoid code
This was used back in arc700 days when ASID allocator was fragile. Not needed in last 5 years Signed-off-by: Vineet Gupta <vgupta@kernel.org>
-
ARC: mm: use SCRATCH_DATA0 register for caching pgdir in ARCv2 only
MMU SCRATCH_DATA0 register is intended to cache task pgd. However in ARC700 SMP port, it has to be repurposed for reentrant interrupt handling, while UP port doesn't. We currently ahandle boe usecases using a fabricated which has usual issues of dependency nesting and ugliness. So clean this up: for ARC700 don't use to cache pgd (even in UP) and do the opposite for ARCv2. And while here, switch to canonical pgd_offset(). Signed-off-by: Vineet Gupta <vgupta@kernel.org>
Commits on Aug 6, 2021
-
ARC: retire MMUv1 and MMUv2 support
These were present in ancient ARC700 cores which don't seem to be active in field anymore. Removal helps cleanup code and remove the hack for MMU_VER to MMU_V[3-4] conversion Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-
There's no known/active customer using them with latest kernels anyways. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Commits on Aug 5, 2021
-
ARC: atomic_cmpxchg/atomic_xchg: implement relaxed variants
And move them out of cmpxchg.h to canonical atomic.h Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-
ARC: cmpxchg/xchg: implement relaxed variants (LLSC config only)
It only makes sense to do this for the LLSC config Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-
ARC: cmpxchg/xchg: rewrite as macros to make type safe
Existing code forces/assume args to type "long" which won't work in LP64 regime, so prepare code for that Interestingly this should be a non functional change but I do see some codegen changes | bloat-o-meter vmlinux-cmpxchg-A vmlinux-cmpxchg-B | add/remove: 0/0 grow/shrink: 17/12 up/down: 218/-150 (68) | | Function old new delta | rwsem_optimistic_spin 518 550 +32 | rwsem_down_write_slowpath 1244 1274 +30 | __do_sys_perf_event_open 2576 2600 +24 | down_read 192 200 +8 | __down_read 192 200 +8 ... | task_work_run 168 148 -20 | dma_fence_chain_walk.part 760 736 -24 | __genradix_ptr_alloc 674 646 -28 Total: Before=6187409, After=6187477, chg +0.00% Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-
ARC: xchg: !LLSC: remove UP micro-optimization/hack
It gets in the way of cleaning things up and is a maintenance pain-in-neck ! Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-
ARC: bitops: fls/ffs to take int (vs long) per asm-generic defines
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-
- !LLSC now only needs a single spinlock for atomics and bitops - Some codegen changes (slight bloat) with generic bitops 1. code increase due to LD-check-atomic paradigm vs. unconditonal atomic (but dirty'ing the cache line even if set already). So despite increase, generic is right thing to do. 2. code decrease (but use of costlier instructions such as DIV vs. shifts based math) due to signed arithmetic. This needs to be revisited seperately. arc: static inline int test_bit(unsigned int nr, const volatile unsigned long *addr) ^^^^^^^^^^^^ generic: static inline int test_bit(int nr, const volatile unsigned long *addr) ^^^ Link: https://lore.kernel.org/r/20180830135749.GA13005@arm.com Signed-off-by: Will Deacon <will@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> [vgupta: wrote patch based on Will's poc, analysed codegen diffs] Signed-off-by: Vineet Gupta <vgupta@synopsys.com> -
ARC: atomics: implement relaxed variants
The current ARC fetch/return atomics provide fully ordered semantics only with 2 full barriers around the operation. Instead implement them as relaxed variants without any barriers and rely on generic code to generate the fully-ordered, acquire and release varaints by adding the appropriate full barriers. This helps elide some extra barriers in case of acquire/release/relaxed calls. bloat-o-meter for hsdk defconfig shows codegen improvements, although numbers below inflated due to unrelated inlining heuristic changes | bloat-o-meter vmlinux-643babe34fd7-non-relaxed vmlinux-45aa05cb44d7-relaxed | add/remove: 2/5 grow/shrink: 42/1222 up/down: 4158/-14312 (-10154) | Function old new delta | .. | sys_renameat 462 476 +14 | ip_mc_inc_group 424 436 +12 | do_read_cache_page 1882 1894 +12 | .. | refcount_dec_and_mutex_lock 254 250 -4 | refcount_dec_and_lock_irqsave 258 254 -4 | refcount_dec_and_lock 254 250 -4 | .. | tcp_v6_route_req 246 238 -8 | tcp_v4_destroy_sock 286 278 -8 | tcp_twsk_unique 352 344 -8 Link: https://lore.kernel.org/r/20180830144344.GW24142@hirez.programming.kicks-ass.net Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-
ARC: atomic64: LLSC: elide unused atomic_{and,or,xor,andnot}_return
This is a non-functional change since those wrappers are not used in kernel sources at all. Link: http://lists.infradead.org/pipermail/linux-snps-arc/2018-August/004246.html Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-
ARC: atomic: !LLSC: use int data type consistently
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-
ARC: atomic: !LLSC: remove hack in atomic_set() for for UP
!LLSC atomics use spinlock (SMP) or irq-disable (UP) to implement criticla regions. UP atomic_set() however was "cheating" by not doing any of that so and still being functional. Remove this anomaly (primarily as cleanup for future code improvements) given that this config is not worth hassle of special case code. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-
ARC: atomics: disintegrate header
Non functional change, to ease future addition/removal Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Commits on Aug 2, 2021
Commits on Aug 1, 2021
-
Merge tag 'perf-tools-fixes-for-v5.14-2021-08-01' of git://git.kernel…
….org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Revert "perf map: Fix dso->nsinfo refcounting", this makes 'perf top' abort, uncovering a design flaw on how namespace information is kept. The fix for that is more than we can do right now, leave it for the next merge window. - Split --dump-raw-trace by AUX records for ARM's CoreSight, fixing up the decoding of some records. - Fix PMU alias matching. Thanks to James Clark and John Garry for these fixes. * tag 'perf-tools-fixes-for-v5.14-2021-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: Revert "perf map: Fix dso->nsinfo refcounting" perf pmu: Fix alias matching perf cs-etm: Split --dump-raw-trace by AUX records
-
Merge tag 'powerpc-5.14-4' of git://git.kernel.org/pub/scm/linux/kern…
…el/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Don't use r30 in VDSO code, to avoid breaking existing Go lang programs. - Change an export symbol to allow non-GPL modules to use spinlocks again. Thanks to Paul Menzel, and Srikar Dronamraju. * tag 'powerpc-5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/vdso: Don't use r30 to avoid breaking Go lang powerpc/pseries: Fix regression while building external modules