Merge branch 'akpm-current/current'
sfrothwell committed Feb 17, 2022
2 parents 1907fc2 + ab94bd5 commit 479e353
Showing 280 changed files with 6,565 additions and 4,665 deletions.
5 changes: 5 additions & 0 deletions Documentation/admin-guide/cgroup-v2.rst
@@ -1301,6 +1301,11 @@ PAGE_SIZE multiple when read back.
Amount of memory used to cache filesystem data,
including tmpfs and shared memory.

kernel (npn)
Amount of total kernel memory, including
(kernel_stack, pagetables, percpu, vmalloc, slab) in
addition to other kernel memory use cases.

kernel_stack
Amount of memory allocated to kernel stacks.

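A quick way to see the new ``kernel`` entry next to the consumers it aggregates (an editor's sketch, not part of the patch; it assumes cgroup v2 is mounted at /sys/fs/cgroup, a group named ``example`` exists, and the per-consumer key names mirror the list above)::

    # aggregate kernel memory for the cgroup
    grep '^kernel ' /sys/fs/cgroup/example/memory.stat

    # the individual consumers named in the description above
    grep -E '^(kernel_stack|pagetables|percpu|slab|vmalloc) ' /sys/fs/cgroup/example/memory.stat
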
10 changes: 7 additions & 3 deletions Documentation/admin-guide/kdump/kdump.rst
@@ -146,9 +146,9 @@ System kernel config options
CONFIG_SYSFS=y

Note that "sysfs file system support" might not appear in the "Pseudo
filesystems" menu if "Configure standard kernel features (for small
systems)" is not enabled in "General Setup." In this case, check the
.config file itself to ensure that sysfs is turned on, as follows::
filesystems" menu if "Configure standard kernel features (expert users)"
is not enabled in "General Setup." In this case, check the .config file
itself to ensure that sysfs is turned on, as follows::

grep 'CONFIG_SYSFS' .config

@@ -533,6 +533,10 @@ the following command::

cp /proc/vmcore <dump-file>

or use scp to write out the dump file between hosts on a network, e.g::

scp /proc/vmcore remote_username@remote_ip:<dump-file>

You can also use makedumpfile utility to write out the dump file
with specified options to filter out unwanted contents, e.g::

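The makedumpfile example itself lies outside the lines shown above; purely for orientation, a commonly used invocation looks like the following (an editor's illustration with assumed flag values, not text from the patch)::

    # -c compresses the dump data; -d 31 drops pages that are normally
    # not needed for analysis (zero, cache, user and free pages)
    makedumpfile -c -d 31 /proc/vmcore <dump-file>
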
3 changes: 2 additions & 1 deletion Documentation/admin-guide/kernel-parameters.txt
@@ -1649,7 +1649,7 @@
[KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
enabled.
Allows heavy hugetlb users to free up some more
memory (6 * PAGE_SIZE for each 2MB hugetlb page).
memory (7 * PAGE_SIZE for each 2MB hugetlb page).
Format: { on | off (default) }

on: enable the feature
@@ -3765,6 +3765,7 @@
bit 3: print locks info if CONFIG_LOCKDEP is on
bit 4: print ftrace buffer
bit 5: print all printk messages in buffer
bit 6: print all CPUs backtrace (if available in the arch)

panic_on_taint= Bitmask for conditionally calling panic() in add_taint()
Format: <hex>[,nousertaint]
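For orientation, the new bit 6 combines with the existing bits as an ordinary mask; a hedged runtime example (assuming procfs is mounted at /proc)::

    # 0x41 = bit 0 (print all tasks info) | bit 6 (print all CPUs backtrace)
    echo 0x41 > /proc/sys/kernel/panic_print
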
24 changes: 14 additions & 10 deletions Documentation/admin-guide/mm/damon/usage.rst
@@ -108,19 +108,23 @@ In such cases, users can explicitly set the initial monitoring target regions
as they want, by writing proper values to the ``init_regions`` file. Each line
of the input should represent one region in below form.::

<target id> <start address> <end address>
<target idx> <start address> <end address>

The ``target id`` should already in ``target_ids`` file, and the regions should
be passed in address order. For example, below commands will set a couple of
address ranges, ``1-100`` and ``100-200`` as the initial monitoring target
region of process 42, and another couple of address ranges, ``20-40`` and
``50-100`` as that of process 4242.::
The ``target idx`` should be the index of the target in ``target_ids`` file,
starting from ``0``, and the regions should be passed in address order. For
example, below commands will set a couple of address ranges, ``1-100`` and
``100-200`` as the initial monitoring target region of pid 42, which is the
first one (index ``0``) in ``target_ids``, and another couple of address
ranges, ``20-40`` and ``50-100`` as that of pid 4242, which is the second one
(index ``1``) in ``target_ids``.::

# cd <debugfs>/damon
# echo "42 1 100
42 100 200
4242 20 40
4242 50 100" > init_regions
# cat target_ids
42 4242
# echo "0 1 100
0 100 200
1 20 40
1 50 100" > init_regions

Note that this sets the initial monitoring target regions only. In case of
virtual memory monitoring, DAMON will automatically update the boundary of the
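To confirm the index-based format took effect, the file can be read back (a sketch assuming the debugfs layout used above; the exact output formatting may differ)::

    # cat init_regions
    0 1 100
    0 100 200
    1 20 40
    1 50 100
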
22 changes: 19 additions & 3 deletions Documentation/admin-guide/mm/zswap.rst
@@ -130,9 +130,25 @@ attribute, e.g.::
echo 1 > /sys/module/zswap/parameters/same_filled_pages_enabled

When zswap same-filled page identification is disabled at runtime, it will stop
checking for the same-value filled pages during store operation. However, the
existing pages which are marked as same-value filled pages remain stored
unchanged in zswap until they are either loaded or invalidated.
checking for the same-value filled pages during store operation.
In other words, every page will be then considered non-same-value filled.
However, the existing pages which are marked as same-value filled pages remain
stored unchanged in zswap until they are either loaded or invalidated.

In some circumstances it might be advantageous to make use of just the zswap
ability to efficiently store same-filled pages without enabling the whole
compressed page storage.
In this case the handling of non-same-value pages by zswap (enabled by default)
can be disabled by setting the ``non_same_filled_pages_enabled`` attribute
to 0, e.g. ``zswap.non_same_filled_pages_enabled=0``.
It can also be enabled and disabled at runtime using the sysfs
``non_same_filled_pages_enabled`` attribute, e.g.::

echo 1 > /sys/module/zswap/parameters/non_same_filled_pages_enabled

Disabling both ``zswap.same_filled_pages_enabled`` and
``zswap.non_same_filled_pages_enabled`` effectively disables accepting any new
pages by zswap.

To prevent zswap from shrinking pool when zswap is full and there's a high
pressure on swap (this will result in flipping pages in and out zswap pool
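Putting the two knobs described above together on the kernel command line (an editor's illustration, not text from the patch)::

    # accept no new pages into zswap at all
    zswap.same_filled_pages_enabled=0 zswap.non_same_filled_pages_enabled=0

    # keep only the same-filled page handling, skip compressed storage
    zswap.same_filled_pages_enabled=1 zswap.non_same_filled_pages_enabled=0
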
33 changes: 24 additions & 9 deletions Documentation/admin-guide/sysctl/kernel.rst
@@ -595,22 +595,35 @@ Documentation/admin-guide/kernel-parameters.rst).
numa_balancing
==============

Enables/disables automatic page fault based NUMA memory
balancing. Memory is moved automatically to nodes
that access it often.
Enables/disables and configures automatic page fault based NUMA memory
balancing. Memory is moved automatically to nodes that access it often.
The value to set can be the result of ORing the following:

Enables/disables automatic NUMA memory balancing. On NUMA machines, there
is a performance penalty if remote memory is accessed by a CPU. When this
feature is enabled the kernel samples what task thread is accessing memory
by periodically unmapping pages and later trapping a page fault. At the
time of the page fault, it is determined if the data being accessed should
be migrated to a local memory node.
= =================================
0 NUMA_BALANCING_DISABLED
1 NUMA_BALANCING_NORMAL
2 NUMA_BALANCING_MEMORY_TIERING
= =================================

Or NUMA_BALANCING_NORMAL to optimize page placement among different
NUMA nodes to reduce remote accessing. On NUMA machines, there is a
performance penalty if remote memory is accessed by a CPU. When this
feature is enabled the kernel samples what task thread is accessing
memory by periodically unmapping pages and later trapping a page
fault. At the time of the page fault, it is determined if the data
being accessed should be migrated to a local memory node.

The unmapping of pages and trapping faults incur additional overhead that
ideally is offset by improved memory locality but there is no universal
guarantee. If the target workload is already bound to NUMA nodes then this
feature should be disabled.

Or NUMA_BALANCING_MEMORY_TIERING to optimize page placement among
different types of memory (represented as different NUMA nodes) to
place the hot pages in the fast memory. This is implemented based on
unmapping and page fault too.


oops_all_cpu_backtrace
======================

@@ -751,6 +764,8 @@ bit 1 print system memory info
bit 2 print timer info
bit 3 print locks info if ``CONFIG_LOCKDEP`` is on
bit 4 print ftrace buffer
bit 5 print all printk messages in buffer
bit 6 print all CPUs backtrace (if available in the arch)
===== ============================================

So for example to print tasks and memory info on panic, user can::
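Since the knob is now a bitmask, a minimal usage example with the values from the table above (standard procfs path assumed; not part of the patch)::

    # NUMA_BALANCING_NORMAL only
    echo 1 > /proc/sys/kernel/numa_balancing

    # NUMA_BALANCING_NORMAL | NUMA_BALANCING_MEMORY_TIERING
    echo 3 > /proc/sys/kernel/numa_balancing
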
17 changes: 11 additions & 6 deletions Documentation/dev-tools/kasan.rst
@@ -30,7 +30,7 @@ Software tag-based KASAN mode is only supported in Clang.

The hardware KASAN mode (#3) relies on hardware to perform the checks but
still requires a compiler version that supports memory tagging instructions.
This mode is supported in GCC 10+ and Clang 11+.
This mode is supported in GCC 10+ and Clang 12+.

Both software KASAN modes work with SLUB and SLAB memory allocators,
while the hardware tag-based KASAN currently only supports SLUB.
@@ -206,6 +206,9 @@ additional boot parameters that allow disabling KASAN or controlling features:
Asymmetric mode: a bad access is detected synchronously on reads and
asynchronously on writes.

- ``kasan.vmalloc=off`` or ``=on`` disables or enables tagging of vmalloc
allocations (default: ``on``).

- ``kasan.stacktrace=off`` or ``=on`` disables or enables alloc and free stack
traces collection (default: ``on``).

@@ -279,8 +282,8 @@ Software tag-based KASAN uses 0xFF as a match-all pointer tag (accesses through
pointers with the 0xFF pointer tag are not checked). The value 0xFE is currently
reserved to tag freed memory regions.

Software tag-based KASAN currently only supports tagging of slab and page_alloc
memory.
Software tag-based KASAN currently only supports tagging of slab, page_alloc,
and vmalloc memory.

Hardware tag-based KASAN
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -303,8 +306,8 @@ Hardware tag-based KASAN uses 0xFF as a match-all pointer tag (accesses through
pointers with the 0xFF pointer tag are not checked). The value 0xFE is currently
reserved to tag freed memory regions.

Hardware tag-based KASAN currently only supports tagging of slab and page_alloc
memory.
Hardware tag-based KASAN currently only supports tagging of slab, page_alloc,
and VM_ALLOC-based vmalloc memory.

If the hardware does not support MTE (pre ARMv8.5), hardware tag-based KASAN
will not be enabled. In this case, all KASAN boot parameters are ignored.
@@ -319,6 +322,8 @@ checking gets disabled.
Shadow memory
-------------

The contents of this section are only applicable to software KASAN modes.

The kernel maps memory in several different parts of the address space.
The range of kernel virtual addresses is large: there is not enough real
memory to support a real shadow region for every address that could be
@@ -349,7 +354,7 @@ CONFIG_KASAN_VMALLOC

With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
cost of greater memory usage. Currently, this is supported on x86,
riscv, s390, and powerpc.
arm64, riscv, s390, and powerpc.

This works by hooking into vmalloc and vmap and dynamically
allocating real shadow memory to back the mappings.
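Placing the new ``kasan.vmalloc`` switch alongside the existing boot parameters, a command line for hardware tag-based KASAN might look like this (an illustrative combination by the editor, not one prescribed by the patch)::

    kasan.mode=sync kasan.vmalloc=off kasan.stacktrace=on
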
2 changes: 1 addition & 1 deletion Documentation/vm/balance.rst
@@ -6,7 +6,7 @@ Memory Balancing

Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>

Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
Memory balancing is needed for !__GFP_HIGH and !__GFP_KSWAPD_RECLAIM as
well as for non __GFP_IO allocations.

The first reason why a caller may avoid reclaim is that the caller can not
29 changes: 24 additions & 5 deletions Documentation/vm/page_owner.rst
@@ -89,22 +89,41 @@ Usage

Page allocated via order XXX, ...
PFN XXX ...
// Detailed stack
// Detailed stack

Page allocated via order XXX, ...
PFN XXX ...
// Detailed stack
// Detailed stack

The ``page_owner_sort`` tool ignores ``PFN`` rows, puts the remaining rows
in buf, uses regexp to extract the page order value, counts the times
and pages of buf, and finally sorts them according to the times.
and pages of buf, and finally sorts them according to the parameter(s).

See the result about who allocated each page
in the ``sorted_page_owner.txt``. General output:

XXX times, XXX pages:
Page allocated via order XXX, ...
// Detailed stack
// Detailed stack

By default, ``page_owner_sort`` is sorted according to the times of buf.
If you want to sort by the pages nums of buf, use the ``-m`` parameter.
If you want to sort by the page nums of buf, use the ``-m`` parameter.
The detailed parameters are:

fundamental function:

Sort:
-a Sort by memory allocation time.
-m Sort by total memory.
-p Sort by pid.
-r Sort by memory release time.
-s Sort by stack trace.
-t Sort by times (default).

additional function:

Cull:
-c Cull by comparing stacktrace instead of total block.

Filter:
-f Filter out the information of blocks whose memory has not been released.
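As a usage sketch for the options listed above (file names are placeholders in the style of the examples earlier in this document; exact option combinations may vary)::

    # sort by total memory instead of by times
    ./page_owner_sort -m page_owner_full.txt sorted_page_owner.txt

    # cull by stack trace and apply the -f filter described above
    ./page_owner_sort -c -f page_owner_full.txt sorted_page_owner.txt
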
4 changes: 1 addition & 3 deletions arch/arm/Kconfig
@@ -37,6 +37,7 @@ config ARM
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_USE_MEMTEST
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
select ARCH_WANT_GENERAL_HUGETLB
select ARCH_WANT_IPC_PARSE_VERSION
select ARCH_WANT_LD_ORPHAN_WARN
select BINFMT_FLAT_ARGVP_ENVP_ON_STACK
@@ -1514,9 +1515,6 @@ config HW_PERF_EVENTS
def_bool y
depends on ARM_PMU

config ARCH_WANT_GENERAL_HUGETLB
def_bool y

config ARM_MODULE_PLTS
bool "Use PLTs to allow module memory to spill over into vmalloc area"
depends on MODULES
6 changes: 2 additions & 4 deletions arch/arm64/Kconfig
@@ -24,6 +24,7 @@ config ARM64
select ARCH_HAS_DMA_PREP_COHERENT
select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
select ARCH_HAS_FAST_MULTIPLIER
select ARCH_HAS_FILTER_PGPROT
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_GIGANTIC_PAGE
@@ -206,7 +207,7 @@ config ARM64
select IOMMU_DMA if IOMMU_SUPPORT
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN_GENERIC
select KASAN_VMALLOC if KASAN
select MODULES_USE_ELF_RELA
select NEED_DMA_MAP_STATE
select NEED_SG_DMA_LENGTH
@@ -1253,9 +1254,6 @@ config HW_PERF_EVENTS
def_bool y
depends on ARM_PMU

config ARCH_HAS_FILTER_PGPROT
def_bool y

# Supported by clang >= 7.0
config CC_HAVE_SHADOW_CALL_STACK
def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)
6 changes: 6 additions & 0 deletions arch/arm64/include/asm/vmalloc.h
@@ -25,4 +25,10 @@ static inline bool arch_vmap_pmd_supported(pgprot_t prot)

#endif

#define arch_vmap_pgprot_tagged arch_vmap_pgprot_tagged
static inline pgprot_t arch_vmap_pgprot_tagged(pgprot_t prot)
{
return pgprot_tagged(prot);
}

#endif /* _ASM_ARM64_VMALLOC_H */
5 changes: 4 additions & 1 deletion arch/arm64/include/asm/vmap_stack.h
@@ -17,10 +17,13 @@
*/
static inline unsigned long *arch_alloc_vmap_stack(size_t stack_size, int node)
{
void *p;

BUILD_BUG_ON(!IS_ENABLED(CONFIG_VMAP_STACK));

return __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, node,
p = __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, node,
__builtin_return_address(0));
return kasan_reset_tag(p);
}

#endif /* __ASM_VMAP_STACK_H */
5 changes: 3 additions & 2 deletions arch/arm64/kernel/module.c
@@ -58,12 +58,13 @@ void *module_alloc(unsigned long size)
PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));

if (p && (kasan_module_alloc(p, size, gfp_mask) < 0)) {
if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
vfree(p);
return NULL;
}

return p;
/* Memory is intended to be executable, reset the pointer tag. */
return kasan_reset_tag(p);
}

enum aarch64_reloc_op {
3 changes: 0 additions & 3 deletions arch/arm64/kernel/setup.c
@@ -406,9 +406,6 @@ static int __init topology_init(void)
{
int i;

for_each_online_node(i)
register_one_node(i);

for_each_possible_cpu(i) {
struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
cpu->hotpluggable = cpu_can_disable(i);
1 change: 1 addition & 0 deletions arch/arm64/mm/hugetlbpage.c
@@ -347,6 +347,7 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
{
size_t pagesize = 1UL << shift;

entry = pte_mkhuge(entry);
if (pagesize == CONT_PTE_SIZE) {
entry = pte_mkcont(entry);
} else if (pagesize == CONT_PMD_SIZE) {
