
Commits on Oct 20, 2021

  1. mm/mempolicy: wire up syscall set_mempolicy_home_node

    Cc: Ben Widawsky <ben.widawsky@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Feng Tang <feng.tang@intel.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Huang Ying <ying.huang@intel.com>
    Cc: linux-api@vger.kernel.org
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    kvaneesh authored and intel-lab-lkp committed Oct 20, 2021
  2. mm/mempolicy: add set_mempolicy_home_node syscall

    This syscall can be used to set a home node for the MPOL_BIND and
    MPOL_PREFERRED_MANY memory policies. Users should use this syscall
    after setting up a memory policy for the specified range, as shown
    below.
    
    mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp,
    	    new_nodes->size + 1, 0);
    sys_set_mempolicy_home_node((unsigned long)p, nr_pages * page_size,
    				  home_node, 0);
    
    The syscall allows specifying a home node/preferred node from which the
    kernel will fulfill memory allocation requests first.
    
    For an address range with the MPOL_BIND memory policy, if the nodemask
    specifies more than one node, page allocations will come from the node
    in the nodemask with sufficient free memory that is closest to the
    home/preferred node.
    
    For MPOL_PREFERRED_MANY, if the nodemask specifies more than one node,
    page allocation will come from the node in the nodemask with sufficient
    free memory that is closest to the home/preferred node. If there is not
    enough memory on any of the nodes specified in the nodemask, the
    allocation will be attempted from the NUMA node closest to the home
    node.
    
    This lets applications hint at a preferred memory allocation node while
    falling back to _only_ a set of nodes if the memory is not available on
    the preferred node. The fallback allocation is attempted from the node
    nearest to the preferred node.
    
    This gives applications control over the NUMA nodes used for memory
    allocation and avoids the default fallback to slow-memory NUMA nodes.
    For example, on a system with DRAM on NUMA nodes 1, 2 and 3 and slow
    memory on nodes 10, 11 and 12:
    
     new_nodes = numa_bitmask_alloc(nr_nodes);
    
     numa_bitmask_setbit(new_nodes, 1);
     numa_bitmask_setbit(new_nodes, 2);
     numa_bitmask_setbit(new_nodes, 3);
    
     p = mmap(NULL, nr_pages * page_size, protflag, mapflag, -1, 0);
     mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp,  new_nodes->size + 1, 0);
    
     sys_set_mempolicy_home_node(p, nr_pages * page_size, 2, 0);
    
    This will allocate from nodes closer to node 2 and will make sure the
    kernel only allocates from nodes 1, 2 and 3. Memory will not be
    allocated from the slow-memory nodes 10, 11 and 12.
    
    MPOL_PREFERRED_MANY, on the other hand, will first try to allocate from
    the node closest to node 2 out of nodes 1, 2 and 3. If those nodes
    don't have enough memory, the kernel will allocate from whichever of
    the slow-memory nodes 10, 11 and 12 is closest to node 2.
    
    Cc: Ben Widawsky <ben.widawsky@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Feng Tang <feng.tang@intel.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Huang Ying <ying.huang@intel.com>
    Cc: linux-api@vger.kernel.org
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    kvaneesh authored and intel-lab-lkp committed Oct 20, 2021
  3. mm/mempolicy: use policy_node helper with MPOL_PREFERRED_MANY

    A followup patch will enable setting a home node with the
    MPOL_PREFERRED_MANY memory policy. To facilitate that, switch to using
    the policy_node helper. There is no functional change in this patch.
    
    Cc: Ben Widawsky <ben.widawsky@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Feng Tang <feng.tang@intel.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Huang Ying <ying.huang@intel.com>
    Cc: linux-api@vger.kernel.org
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    kvaneesh authored and intel-lab-lkp committed Oct 20, 2021

Commits on Oct 13, 2021

  1. Merge branches 'for-next/kexec', 'for-next/kselftest', 'for-next/misc', 'for-next/mm', 'for-next/mte', 'for-next/perf', 'for-next/pfn-valid' and 'for-next/scs' into for-next/core

    willdeacon committed Oct 13, 2021

Commits on Oct 12, 2021

  1. acpi/arm64: fix next_platform_timer() section mismatch error

    Fix modpost Section mismatch error in next_platform_timer().
    
      [...]
      WARNING: modpost: vmlinux.o(.text.unlikely+0x26e60): Section mismatch in reference from the function next_platform_timer() to the variable .init.data:acpi_gtdt_desc
      The function next_platform_timer() references
      the variable __initdata acpi_gtdt_desc.
      This is often because next_platform_timer lacks a __initdata
      annotation or the annotation of acpi_gtdt_desc is wrong.
    
      WARNING: modpost: vmlinux.o(.text.unlikely+0x26e64): Section mismatch in reference from the function next_platform_timer() to the variable .init.data:acpi_gtdt_desc
      The function next_platform_timer() references
      the variable __initdata acpi_gtdt_desc.
      This is often because next_platform_timer lacks a __initdata
      annotation or the annotation of acpi_gtdt_desc is wrong.
    
      ERROR: modpost: Section mismatches detected.
      Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them.
      make[1]: *** [scripts/Makefile.modpost:59: vmlinux.symvers] Error 1
      make[1]: *** Deleting file 'vmlinux.symvers'
      make: *** [Makefile:1176: vmlinux] Error 2
      [...]
    
    Fixes: a712c3e ("acpi/arm64: Add memory-mapped timer support in GTDT driver")
    Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
    Acked-by: Hanjun Guo <guohanjun@huawei.com>
    Link: https://lore.kernel.org/r/20210823092526.2407526-1-liu.yun@linux.dev
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    JackieLiu1 authored and ctmarinas committed Oct 12, 2021
  2. arm64: ftrace: use function_nocfi for _mcount as well

    Commit 800618f ("arm64: ftrace: use function_nocfi for ftrace_call")
    only fixed the address of ftrace_call, but the address of _mcount needs
    to be fixed as well. Use function_nocfi() to get the actual address of
    the _mcount function: with CONFIG_CFI_CLANG, the compiler replaces
    function pointers with jump table addresses, which breaks dynamic
    ftrace as the address of _mcount is replaced with the address of
    _mcount.cfi_jt.
    
    With mainline this won't be a problem, since
    CONFIG_DYNAMIC_FTRACE_WITH_REGS=y by default with Clang >= 10 (which
    supports -fpatchable-function-entry) while CFI requires Clang 12, but
    for consistency we should add function_nocfi() for _mcount as well.
    
    Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
    Acked-by: Mark Rutland <mark.rutland@arm.com>
    Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
    Link: https://lore.kernel.org/r/20211011125059.3378646-1-sumit.garg@linaro.org
    Signed-off-by: Will Deacon <will@kernel.org>
    b49020 authored and willdeacon committed Oct 12, 2021
  3. arm64: asm: setup.h: export common variables

    When building the kernel with sparse enabled 'C=1' the following
    warnings can be seen:
    
    arch/arm64/kernel/setup.c:58:13: warning: symbol '__fdt_pointer' was not declared. Should it be static?
    arch/arm64/kernel/setup.c:84:25: warning: symbol 'boot_args' was not declared. Should it be static?
    
    Rework so the variables are exported, since these two variables are
    created and used in setup.c and also used in head.S.
    
    Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
    Link: https://lore.kernel.org/r/20211007195601.677474-1-anders.roxell@linaro.org
    Signed-off-by: Will Deacon <will@kernel.org>
    roxell authored and willdeacon committed Oct 12, 2021

Commits on Oct 11, 2021

  1. arm64/hugetlb: fix CMA gigantic page order for non-4K PAGE_SIZE

    For non-4K PAGE_SIZE configs, the largest gigantic huge page size is
    CONT_PMD_SHIFT order. On arm64 with 64K PAGE_SIZE, the gigantic page is
    16G. Therefore, one should be able to specify 'hugetlb_cma=16G' on the
    kernel command line so that one gigantic page can be allocated from CMA.
    However, when adding such an option the following message is produced:
    
    hugetlb_cma: cma area should be at least 8796093022208 MiB
    
    This is because the calculation for non-4K gigantic page order is
    incorrect in the arm64 specific routine arm64_hugetlb_cma_reserve().
    
    Fixes: abb7962 ("arm64/hugetlb: Reserve CMA areas for gigantic pages on 16K and 64K configs")
    Cc: <stable@vger.kernel.org> # 5.9.x
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Link: https://lore.kernel.org/r/20211005202529.213812-1-mike.kravetz@oracle.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    mjkravetz authored and ctmarinas committed Oct 11, 2021

Commits on Oct 7, 2021

  1. kasan: Extend KASAN mode kernel parameter

    Architectures supported by KASAN_HW_TAGS can provide an asymmetric mode
    of execution. On MTE-enabled arm64 hardware, for example, this
    corresponds to the asymmetric tag checking mode of execution. In
    particular, when such a mode is present, the CPU triggers a fault on a
    tag mismatch during a load operation and asynchronously updates a
    register when a tag mismatch is detected during a store operation.
    
    Extend the KASAN HW execution mode kernel command line parameter to
    support asymmetric mode.
    
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Andrey Konovalov <andreyknvl@gmail.com>
    Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
    Link: https://lore.kernel.org/r/20211006154751.4463-6-vincenzo.frascino@arm.com
    Signed-off-by: Will Deacon <will@kernel.org>
    fvincenzo authored and willdeacon committed Oct 7, 2021
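With this extension, the HW_TAGS mode is presumably selected on the kernel command line along these lines (the value spelling follows later mainline documentation; treat it as an assumption for this patch state):

```
kasan=on kasan.mode=sync    # synchronous tag checking (default)
kasan=on kasan.mode=async   # asynchronous tag checking
kasan=on kasan.mode=asymm   # asymmetric: sync on loads, async on stores
```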
  2. arm64: mte: Add asymmetric mode support

    MTE provides an asymmetric mode for detecting tag exceptions. In
    particular, when such a mode is present, the CPU triggers a fault
    on a tag mismatch during a load operation and asynchronously updates
    a register when a tag mismatch is detected during a store operation.
    
    Add support for MTE asymmetric mode.
    
    Note: If the CPU does not support MTE asymmetric mode the kernel falls
    back on synchronous mode which is the default for kasan=on.
    
    Cc: Will Deacon <will@kernel.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Andrey Konovalov <andreyknvl@gmail.com>
    Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
    Link: https://lore.kernel.org/r/20211006154751.4463-5-vincenzo.frascino@arm.com
    Signed-off-by: Will Deacon <will@kernel.org>
    fvincenzo authored and willdeacon committed Oct 7, 2021
  3. arm64: mte: CPU feature detection for Asymm MTE

    Add the cpufeature entries to detect the presence of Asymmetric MTE.
    
    Note: The tag checking mode is initialized via cpu_enable_mte() ->
    kasan_init_hw_tags(), hence to enable it we require asymmetric mode to
    be present at least on the boot CPU. If the boot CPU does not have it,
    it is fine for late CPUs to have it as long as the feature is not
    enabled (ARM64_CPUCAP_BOOT_CPU_FEATURE).
    
    Cc: Will Deacon <will@kernel.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
    Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
    Link: https://lore.kernel.org/r/20211006154751.4463-4-vincenzo.frascino@arm.com
    Signed-off-by: Will Deacon <will@kernel.org>
    fvincenzo authored and willdeacon committed Oct 7, 2021
  4. arm64: mte: Bitfield definitions for Asymm MTE

    Add Asymmetric Memory Tagging Extension bitfield definitions.
    
    Cc: Will Deacon <will@kernel.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20211006154751.4463-3-vincenzo.frascino@arm.com
    Signed-off-by: Will Deacon <will@kernel.org>
    fvincenzo authored and willdeacon committed Oct 7, 2021
  5. kasan: Remove duplicate of kasan_flag_async

    After merging async mode for KASAN_HW_TAGS a duplicate of the
    kasan_flag_async flag was left erroneously inside the code.
    
    Remove the duplicate.
    
    Note: This change does not bring functional changes to the code
    base.
    
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Marco Elver <elver@google.com>
    Cc: Evgenii Stepanov <eugenis@google.com>
    Cc: Andrey Konovalov <andreyknvl@gmail.com>
    Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
    Link: https://lore.kernel.org/r/20211006154751.4463-2-vincenzo.frascino@arm.com
    Signed-off-by: Will Deacon <will@kernel.org>
    fvincenzo authored and willdeacon committed Oct 7, 2021
  6. selftests: arm64: Add coverage of ptrace flags for SVE VL inheritance

    Add a test that covers enabling and disabling of SVE vector length
    inheritance via the ptrace interface.
    
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20211005123537.976795-1-broonie@kernel.org
    Signed-off-by: Will Deacon <will@kernel.org>
    broonie authored and willdeacon committed Oct 7, 2021

Commits on Oct 4, 2021

  1. drivers/perf: Improve build test coverage

    Improve build test coverage by allowing some drivers to build under
    COMPILE_TEST where possible.
    
    Some notes:
    - Mostly, a dependency on CONFIG_ACPI is not really required just for
      building (and is left untouched), but it is required for TX2, which
      uses ACPI functions that have no stubs
    - XGENE required a 64b dependency as it relies on some unsigned long
      perf struct fields being 64b
    - I don't see why TX2 requires NUMA to build, but left it untouched
    - Added an explicit dependency on GENERIC_MSI_IRQ_DOMAIN for
      ARM_SMMU_V3_PMU, which is required for platform MSI functions
    
    Signed-off-by: John Garry <john.garry@huawei.com>
    Link: https://lore.kernel.org/r/1633085326-156653-3-git-send-email-john.garry@huawei.com
    Signed-off-by: Will Deacon <will@kernel.org>
    johnpgarry authored and willdeacon committed Oct 4, 2021
  2. drivers/perf: thunderx2_pmu: Change data in size tx2_uncore_event_update()

    An LSL of 32 requires a wider-than-32b value to hold the result.
    However, in tx2_uncore_event_update(), 1UL << 32 currently only works
    because unsigned long is 64b on a 64b system.
    
    If we want to compile-test for a 32b system, we need unsigned long
    long, whose minimum size is 64b.
    
    Signed-off-by: John Garry <john.garry@huawei.com>
    Link: https://lore.kernel.org/r/1633085326-156653-2-git-send-email-john.garry@huawei.com
    Signed-off-by: Will Deacon <will@kernel.org>
    johnpgarry authored and willdeacon committed Oct 4, 2021
  3. drivers/perf: hisi: Fix PA PMU counter offset

    The PA PMU counter offset was correct in [1] and the driver had already
    been verified. In a later version we changed the register offsets to
    lower-case characters, consistent with the existing drivers, and since
    that was meant to be no functional change we did no further testing.
    However, a typo was introduced when modifying the PA PMU counter offset
    by mistake, so fix it.
    
    [1] https://www.spinics.net/lists/arm-kernel/msg865263.html
    
    Cc: Will Deacon <will@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: John Garry <john.garry@huawei.com>
    Cc: Qi Liu <liuqi115@huawei.com>
    Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
    Link: https://lore.kernel.org/r/20210928123022.23467-1-zhangshaokun@hisilicon.com
    Signed-off-by: Will Deacon <will@kernel.org>
    zhangshk authored and willdeacon committed Oct 4, 2021

Commits on Oct 1, 2021

  1. arm64/mm: drop HAVE_ARCH_PFN_VALID

    CONFIG_SPARSEMEM_VMEMMAP is now the only available memory model on arm64
    platforms and free_unused_memmap() would just return without creating any
    holes in the memmap mapping.  There is no need for any special handling in
    pfn_valid() and HAVE_ARCH_PFN_VALID can just be dropped.  This also moves
    the pfn upper bits sanity check into generic pfn_valid().
    
    [rppt: rebased on v5.15-rc3]
    
    Link: https://lkml.kernel.org/r/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Acked-by: Mike Rapoport <rppt@linux.ibm.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Mike Rapoport <rppt@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Link: https://lore.kernel.org/r/20210930013039.11260-3-rppt@kernel.org
    Signed-off-by: Will Deacon <will@kernel.org>
    Anshuman Khandual authored and willdeacon committed Oct 1, 2021
  2. dma-mapping: remove bogus test for pfn_valid from dma_map_resource

    dma_map_resource() uses pfn_valid() to ensure the range is not RAM.
    However, pfn_valid() only checks for availability of the memory map for a
    PFN but it does not ensure that the PFN is actually backed by RAM.
    
    As dma_map_resource() is the only method in DMA mapping APIs that has this
    check, simply drop the pfn_valid() test from dma_map_resource().
    
    Link: https://lore.kernel.org/all/20210824173741.GC623@arm.com/
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Acked-by: David Hildenbrand <david@redhat.com>
    Link: https://lore.kernel.org/r/20210930013039.11260-2-rppt@kernel.org
    Signed-off-by: Will Deacon <will@kernel.org>
    rppt authored and willdeacon committed Oct 1, 2021
  3. arm64: trans_pgd: remove trans_pgd_map_page()

    The intent of trans_pgd_map_page() was to map a contiguous range of VA
    memory to the memory being relocated during kexec. However, since we
    now use the linear map instead of a contiguous range, this function is
    not needed.
    
    Suggested-by: Pingfan Liu <kernelfans@gmail.com>
    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20210930143113.1502553-16-pasha.tatashin@soleen.com
    Signed-off-by: Will Deacon <will@kernel.org>
    soleen authored and willdeacon committed Oct 1, 2021
  4. arm64: kexec: remove cpu-reset.h

    This header contains only cpu_soft_restart(), which is never used
    directly anymore. So, remove this header and rename the helper to
    cpu_soft_restart().
    
    Suggested-by: James Morse <james.morse@arm.com>
    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20210930143113.1502553-15-pasha.tatashin@soleen.com
    Signed-off-by: Will Deacon <will@kernel.org>
    soleen authored and willdeacon committed Oct 1, 2021
  5. arm64: kexec: remove the pre-kexec PoC maintenance

    Now that kexec does its relocations with the MMU enabled, we no longer
    need to clean the relocation data to the PoC.
    
    Suggested-by: James Morse <james.morse@arm.com>
    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20210930143113.1502553-14-pasha.tatashin@soleen.com
    Signed-off-by: Will Deacon <will@kernel.org>
    soleen authored and willdeacon committed Oct 1, 2021
  6. arm64: kexec: keep MMU enabled during kexec relocation

    Now that we have linear map page tables configured, keep the MMU
    enabled to allow faster relocation of segments to their final
    destination.
    
    Cavium ThunderX2:
    Kernel Image size: 38M Initramfs size: 46M Total relocation size: 84M
    MMU-disabled:
    relocation	7.489539915s
    MMU-enabled:
    relocation	0.03946095s
    
    Broadcom Stingray:
    For a moderately sized kernel + initramfs (25M), the relocation took
    0.382s; with the MMU enabled it now takes only 0.019s, a 20x
    improvement.
    
    The time is proportional to the size of the relocation, so a larger
    initramfs (e.g. 100M) could take over a second.
    
    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Tested-by: Pingfan Liu <piliu@redhat.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20210930143113.1502553-13-pasha.tatashin@soleen.com
    Signed-off-by: Will Deacon <will@kernel.org>
    soleen authored and willdeacon committed Oct 1, 2021
  7. arm64: kexec: install a copy of the linear-map

    To perform the kexec relocation with the MMU enabled, we need a copy
    of the linear map.
    
    Create one, and install it from the relocation code. This has to be
    done from assembly code, as it will be idmapped with TTBR0. The kernel
    runs in TTBR1, so it can't use the break-before-make sequence on the
    mapping it is executing from.
    
    This makes no difference yet, as the relocation code runs with the MMU
    disabled.
    
    Suggested-by: James Morse <james.morse@arm.com>
    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20210930143113.1502553-12-pasha.tatashin@soleen.com
    Signed-off-by: Will Deacon <will@kernel.org>
    soleen authored and willdeacon committed Oct 1, 2021
  8. arm64: kexec: use ld script for relocation function

    Currently, the relocation code declares start and end variables, which
    are used to compute its size.
    
    A better way to do this is to use an ld script and put the relocation
    function in its own section.
    
    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20210930143113.1502553-11-pasha.tatashin@soleen.com
    Signed-off-by: Will Deacon <will@kernel.org>
    soleen authored and willdeacon committed Oct 1, 2021
  9. arm64: kexec: relocate in EL1 mode

    Since we are going to keep the MMU enabled during relocation, we need
    to stay in EL1 mode throughout the relocation.
    
    Keep EL1 enabled, and switch to EL2 only before entering the new world.
    
    Suggested-by: James Morse <james.morse@arm.com>
    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20210930143113.1502553-10-pasha.tatashin@soleen.com
    Signed-off-by: Will Deacon <will@kernel.org>
    soleen authored and willdeacon committed Oct 1, 2021
  10. arm64: kexec: configure EL2 vectors for kexec

    If we have an EL2 mode without VHE, the EL2 vectors are needed in order
    to switch to EL2 and jump to the new world with hypervisor privileges.
    
    In preparation for MMU-enabled relocation, configure our EL2 table now.
    
    Kexec uses #HVC_SOFT_RESTART to branch to the new world, so extend the
    el1_sync vector that is provided by trans_pgd_copy_el2_vectors() to
    support this case.
    
    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20210930143113.1502553-9-pasha.tatashin@soleen.com
    Signed-off-by: Will Deacon <will@kernel.org>
    soleen authored and willdeacon committed Oct 1, 2021
  11. arm64: kexec: pass kimage as the only argument to relocation function

    Currently, kexec relocation function (arm64_relocate_new_kernel) accepts
    the following arguments:
    
    head:		start of array that contains relocation information.
    entry:		entry point for new kernel or purgatory.
    dtb_mem:	first and only argument to entry.
    
    The number of arguments cannot easily be expanded, because this
    function is also called from HVC_SOFT_RESTART, which preserves only
    three arguments. Also, arm64_relocate_new_kernel is written in assembly
    and called without a stack, so there is no place to move extra
    arguments into free registers.
    
    Soon, we will need to pass more arguments: once we enable MMU we
    will need to pass information about page tables.
    
    Pass kimage to arm64_relocate_new_kernel, and teach it to get the
    required fields from kimage.
    
    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20210930143113.1502553-8-pasha.tatashin@soleen.com
    Signed-off-by: Will Deacon <will@kernel.org>
    soleen authored and willdeacon committed Oct 1, 2021
  12. arm64: kexec: Use dcache ops macros instead of open-coding

    kexec does dcache maintenance when it re-writes all memory. Our
    dcache_by_line_op macro depends on reading the sanitised DminLine from
    memory. Kexec may have overwritten this, so it open-codes the sequence.
    
    dcache_by_line_op is a whole set of macros; it uses dcache_line_size,
    which uses read_ctr for the sanitised DminLine. Reading the DminLine is
    the first thing dcache_by_line_op does.
    
    Rename dcache_by_line_op to dcache_by_myline_op and take DminLine as an
    argument. Kexec can now use the slightly smaller macro.
    
    This makes up-coming changes to the dcache maintenance easier on
    the eye.
    
    Code generated by the existing callers is unchanged.
    
    Suggested-by: James Morse <james.morse@arm.com>
    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20210930143113.1502553-7-pasha.tatashin@soleen.com
    Signed-off-by: Will Deacon <will@kernel.org>
    soleen authored and willdeacon committed Oct 1, 2021
  13. arm64: kexec: skip relocation code for inplace kexec

    In the case of kdump, or when segments are already in place, the
    relocation is not needed, so the setup of the relocation function and
    the call to it can be skipped.
    
    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Suggested-by: James Morse <james.morse@arm.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20210930143113.1502553-6-pasha.tatashin@soleen.com
    Signed-off-by: Will Deacon <will@kernel.org>
    soleen authored and willdeacon committed Oct 1, 2021
  14. arm64: kexec: flush image and lists during kexec load time

    Currently, during kexec load we copy the relocation function and flush
    it. However, we can also flush the kexec relocation buffers and, if the
    new kernel image is already in place (i.e. a crash kernel), we can also
    flush the new kernel image itself.
    
    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20210930143113.1502553-5-pasha.tatashin@soleen.com
    Signed-off-by: Will Deacon <will@kernel.org>
    soleen authored and willdeacon committed Oct 1, 2021
  15. arm64: hibernate: abstract ttbr0 setup function

    Currently, only hibernate sets a custom ttbr0 with a safe idmapped
    function. Kexec is also going to use this functionality when its
    relocation code is idmapped.
    
    Move the setup sequence to a dedicated cpu_install_ttbr0() for custom
    ttbr0.
    
    Suggested-by: James Morse <james.morse@arm.com>
    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20210930143113.1502553-4-pasha.tatashin@soleen.com
    Signed-off-by: Will Deacon <will@kernel.org>
    soleen authored and willdeacon committed Oct 1, 2021
  16. arm64: trans_pgd: hibernate: Add trans_pgd_copy_el2_vectors

    Users of trans_pgd may also need a copy of the vector table, because it
    may be overwritten whenever the linear map can be overwritten.
    
    Move setup of EL2 vectors from hibernate to trans_pgd, so it can be
    later shared with kexec as well.
    
    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20210930143113.1502553-3-pasha.tatashin@soleen.com
    Signed-off-by: Will Deacon <will@kernel.org>
    soleen authored and willdeacon committed Oct 1, 2021
  17. arm64: kernel: add helper for booted at EL2 and not VHE

    Replace places that contain logic like this:
    	is_hyp_mode_available() && !is_kernel_in_hyp_mode()
    
    with a dedicated boolean function is_hyp_nvhe(). This will be needed
    later in kexec in order to switch back to EL2 sooner.
    
    Suggested-by: James Morse <james.morse@arm.com>
    Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Link: https://lore.kernel.org/r/20210930143113.1502553-2-pasha.tatashin@soleen.com
    Signed-off-by: Will Deacon <will@kernel.org>
    soleen authored and willdeacon committed Oct 1, 2021