Commit c5f1e2d

sumanthkorikkar authored and akpm00 committed
mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers
Patch series "implement "memmap on memory" feature on s390".

This series provides "memmap on memory" support on the s390 platform. "memmap on memory" allows the struct pages array to be allocated from the hotplugged memory range instead of allocating it from main system memory.

s390 currently preallocates the struct pages array for all potentially possible memory, which ensures memory onlining always succeeds, but at the cost of significant memory consumption from the available system memory during boot time. In certain extreme configurations, this could lead to ipl failure.

"memmap on memory" ensures the struct pages array is populated from the self-contained hotplugged memory range instead of depleting the available system memory, and this could eliminate ipl failure on the s390 platform.

On other platforms, the system might go OOM when the physically hotplugged memory depletes the available memory before it is onlined. Hence, the "memmap on memory" feature was introduced as described in commit a08a2ae ("mm,memory_hotplug: allocate memmap from the added memory range").

Unlike other architectures, s390 memory blocks are not physically accessible until they are online. To make a block physically accessible, two new memory notifiers, MEM_PREPARE_ONLINE / MEM_FINISH_OFFLINE, are added. They let the hypervisor be informed that the memory should be made physically accessible. This allows for "memmap on memory" initialization during the memory hotplug onlining phase, which is performed before calling the MEM_GOING_ONLINE notifier.

Patch 1 introduces MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers to prepare the transition of memory to and from a physically accessible state. New mhp_flag MHP_OFFLINE_INACCESSIBLE is introduced to ensure altmap cannot be written when adding memory - before it is set online. This enhancement is crucial for implementing the "memmap on memory" feature for s390 in a subsequent patch.

Patch 2 allocates vmemmap pages from the self-contained memory range for s390. It allocates the memory map (struct pages array) from the hotplugged memory range, rather than using system memory, by passing altmap to vmemmap functions.

Patch 3 removes unhandled memory notifier types on s390.

Patch 4 implements MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers on s390. The MEM_PREPARE_ONLINE memory notifier makes the memory block physically accessible via the sclp assign command. The notifier ensures self-contained memory maps are accessible, hence enabling "memmap on memory" on s390. The MEM_FINISH_OFFLINE memory notifier shifts the memory block to an inaccessible state via the sclp unassign command.

Patch 5 finally enables MHP_MEMMAP_ON_MEMORY on s390.

This patch (of 5):

Introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers to prepare the transition of memory to and from a physically accessible state. This enhancement is crucial for implementing the "memmap on memory" feature for s390 in a subsequent patch.

Platforms such as x86 can support physical memory hotplug via ACPI. When there is physical memory hotplug, the ACPI event leads to the memory addition with the following callchain:

  acpi_memory_device_add()
    -> acpi_memory_enable_device()
       -> __add_memory()

After this, the hotplugged memory is physically accessible, and altmap support prepared, before the "memmap on memory" initialization in memory_block_online() is called.

On s390, memory hotplug works in a different way. The available hotplug memory has to be defined upfront in the hypervisor, but it is made physically accessible only when the user sets it online via sysfs, currently in the MEM_GOING_ONLINE notifier. This is too late, because "memmap on memory" initialization is performed before calling the MEM_GOING_ONLINE notifier.

During the memory hotplug addition phase, altmap support is prepared. During the memory onlining phase, s390 requires the memory to be made physically accessible first, and only then can the "memmap on memory" initialization process be initiated.

The memory provider will handle the new MEM_PREPARE_ONLINE / MEM_FINISH_OFFLINE notifications and make the memory accessible.

The mhp_flag MHP_OFFLINE_INACCESSIBLE is introduced and is relevant when used along with MHP_MEMMAP_ON_MEMORY, because the altmap cannot be written (e.g., poisoned) when adding memory -- before it is set online. This allows for adding memory with an altmap that is not currently made available by a hypervisor. When onlining that memory, the hypervisor can be instructed to make that memory accessible via the new notifiers, and the onlining phase will not require any memory allocations, which is helpful in low-memory situations.

All architectures ignore unknown memory notifiers. Therefore, the introduction of these new notifiers does not result in any functional modifications across architectures.

Link: https://lkml.kernel.org/r/20240108132747.3238763-1-sumanthk@linux.ibm.com
Link: https://lkml.kernel.org/r/20240108132747.3238763-2-sumanthk@linux.ibm.com
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Suggested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
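As an illustration of how a memory provider might react to the two new events, here is a minimal userspace sketch. It is not the s390 implementation from the later patches: `struct provider_block`, `provider_notifier()` and the `*_SKETCH` return codes are hypothetical stand-ins, and only the event values are taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Event values as defined in include/linux/memory.h by this patch. */
#define MEM_GOING_ONLINE	(1 << 3)
#define MEM_PREPARE_ONLINE	(1 << 6)
#define MEM_FINISH_OFFLINE	(1 << 7)

/* Hypothetical stand-ins for notifier results (not the kernel's values). */
enum { NOTIFY_DONE_SKETCH, NOTIFY_OK_SKETCH, NOTIFY_BAD_SKETCH };

/* Hypothetical provider-side view of one memory block. */
struct provider_block {
	bool accessible;	/* has the hypervisor made it accessible? */
	bool hypervisor_fails;	/* simulate a rejected request */
};

static int provider_notifier(struct provider_block *blk, unsigned long action)
{
	assert(blk);

	switch (action) {
	case MEM_PREPARE_ONLINE:
		/* Ask the hypervisor to make the block accessible. */
		if (blk->hypervisor_fails)
			return NOTIFY_BAD_SKETCH;
		blk->accessible = true;
		return NOTIFY_OK_SKETCH;
	case MEM_FINISH_OFFLINE:
		/* Hand the block back; it is inaccessible afterwards. */
		blk->accessible = false;
		return NOTIFY_OK_SKETCH;
	default:
		/* Events the provider does not care about are ignored. */
		return NOTIFY_DONE_SKETCH;
	}
}
```

The `default` branch reflects the statement that unknown notifier events are ignored, which is why adding the two events does not change behaviour for providers that do not handle them.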
1 parent e755c43 commit c5f1e2d

6 files changed, 65 additions and 6 deletions


drivers/base/memory.c

Lines changed: 22 additions & 1 deletion
@@ -188,6 +188,7 @@ static int memory_block_online(struct memory_block *mem)
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
 	unsigned long nr_vmemmap_pages = 0;
+	struct memory_notify arg;
 	struct zone *zone;
 	int ret;
 
@@ -207,9 +208,19 @@ static int memory_block_online(struct memory_block *mem)
 	if (mem->altmap)
 		nr_vmemmap_pages = mem->altmap->free;
 
+	arg.altmap_start_pfn = start_pfn;
+	arg.altmap_nr_pages = nr_vmemmap_pages;
+	arg.start_pfn = start_pfn + nr_vmemmap_pages;
+	arg.nr_pages = nr_pages - nr_vmemmap_pages;
 	mem_hotplug_begin();
+	ret = memory_notify(MEM_PREPARE_ONLINE, &arg);
+	ret = notifier_to_errno(ret);
+	if (ret)
+		goto out_notifier;
+
 	if (nr_vmemmap_pages) {
-		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
+		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages,
+						zone, mem->altmap->inaccessible);
 		if (ret)
 			goto out;
 	}
@@ -231,7 +242,11 @@ static int memory_block_online(struct memory_block *mem)
 					  nr_vmemmap_pages);
 
 	mem->zone = zone;
+	mem_hotplug_done();
+	return ret;
 out:
+	memory_notify(MEM_FINISH_OFFLINE, &arg);
+out_notifier:
 	mem_hotplug_done();
 	return ret;
 }
@@ -244,6 +259,7 @@ static int memory_block_offline(struct memory_block *mem)
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
 	unsigned long nr_vmemmap_pages = 0;
+	struct memory_notify arg;
 	int ret;
 
 	if (!mem->zone)
@@ -275,6 +291,11 @@ static int memory_block_offline(struct memory_block *mem)
 		mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
 
 	mem->zone = NULL;
+	arg.altmap_start_pfn = start_pfn;
+	arg.altmap_nr_pages = nr_vmemmap_pages;
+	arg.start_pfn = start_pfn + nr_vmemmap_pages;
+	arg.nr_pages = nr_pages - nr_vmemmap_pages;
+	memory_notify(MEM_FINISH_OFFLINE, &arg);
 out:
 	mem_hotplug_done();
 	return ret;
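The error unwinding added to memory_block_online() can be modelled in userspace roughly as follows. This is a simplified sketch, not kernel code: `struct online_sim` and `sim_block_online()` are hypothetical, the notifier and memmap steps are reduced to injected error codes, and the later onlining steps that also jump to `out` are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulation state: injected failures plus observed effects. */
struct online_sim {
	int prepare_err;		/* result of MEM_PREPARE_ONLINE */
	int memmap_init_err;		/* result of the memmap init step */
	unsigned long nr_vmemmap_pages;	/* non-zero when an altmap is used */
	int finish_offline_calls;	/* how often the undo event was sent */
	bool accessible;		/* current state of the block */
};

static int sim_block_online(struct online_sim *s)
{
	int ret;

	/* Step 1: MEM_PREPARE_ONLINE, before anything touches the memory. */
	ret = s->prepare_err;
	if (ret)
		goto out_notifier;	/* nothing to undo yet */
	s->accessible = true;

	/* Step 2: initialize the memmap that lives on the block itself. */
	if (s->nr_vmemmap_pages) {
		assert(s->accessible);	/* the memmap is written here */
		ret = s->memmap_init_err;
		if (ret)
			goto out;
	}
	return 0;
out:
	/* Undo step 1: MEM_FINISH_OFFLINE makes the block inaccessible. */
	s->finish_offline_calls++;
	s->accessible = false;
out_notifier:
	return ret;
}
```

The two labels mirror the patch: a failed MEM_PREPARE_ONLINE skips MEM_FINISH_OFFLINE, while a failure after the block became accessible sends it.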

include/linux/memory.h

Lines changed: 9 additions & 0 deletions
@@ -96,8 +96,17 @@ int set_memory_block_size_order(unsigned int order);
 #define MEM_GOING_ONLINE (1<<3)
 #define MEM_CANCEL_ONLINE (1<<4)
 #define MEM_CANCEL_OFFLINE (1<<5)
+#define MEM_PREPARE_ONLINE (1<<6)
+#define MEM_FINISH_OFFLINE (1<<7)
 
 struct memory_notify {
+	/*
+	 * The altmap_start_pfn and altmap_nr_pages fields are designated for
+	 * specifying the altmap range and are exclusively intended for use in
+	 * MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers.
+	 */
+	unsigned long altmap_start_pfn;
+	unsigned long altmap_nr_pages;
 	unsigned long start_pfn;
 	unsigned long nr_pages;
 	int status_change_nid_normal;
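To make the meaning of the four range fields concrete, the following sketch repeats the arithmetic that memory_block_online() and memory_block_offline() use to fill them. `struct notify_ranges` and `split_block()` are hypothetical names for this illustration; only the field names and the formulas come from the patch.

```c
#include <assert.h>

/* Hypothetical mirror of the range fields in struct memory_notify. */
struct notify_ranges {
	unsigned long altmap_start_pfn;	/* first pfn holding the memmap */
	unsigned long altmap_nr_pages;	/* pages consumed by the memmap */
	unsigned long start_pfn;	/* first pfn usable as normal memory */
	unsigned long nr_pages;		/* pages usable as normal memory */
};

/* Split one memory block into its altmap part and its usable part. */
static struct notify_ranges split_block(unsigned long block_start_pfn,
					unsigned long block_nr_pages,
					unsigned long nr_vmemmap_pages)
{
	struct notify_ranges r;

	assert(nr_vmemmap_pages <= block_nr_pages);

	r.altmap_start_pfn = block_start_pfn;
	r.altmap_nr_pages = nr_vmemmap_pages;
	r.start_pfn = block_start_pfn + nr_vmemmap_pages;
	r.nr_pages = block_nr_pages - nr_vmemmap_pages;
	return r;
}
```

With purely illustrative numbers, a block of 32768 pages starting at pfn 0x100000 with 512 vmemmap pages yields an altmap range of 512 pages at 0x100000 and a usable range of 32256 pages starting at pfn 0x100200. Without an altmap, `nr_vmemmap_pages` is 0 and the usable range covers the whole block.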

include/linux/memory_hotplug.h

Lines changed: 17 additions & 1 deletion
@@ -106,6 +106,22 @@ typedef int __bitwise mhp_t;
  * implies the node id (nid).
  */
 #define MHP_NID_IS_MGID ((__force mhp_t)BIT(2))
+/*
+ * The hotplugged memory is completely inaccessible while the memory is
+ * offline. The memory provider will handle MEM_PREPARE_ONLINE /
+ * MEM_FINISH_OFFLINE notifications and make the memory accessible.
+ *
+ * This flag is only relevant when used along with MHP_MEMMAP_ON_MEMORY,
+ * because the altmap cannot be written (e.g., poisoned) when adding
+ * memory -- before it is set online.
+ *
+ * This allows for adding memory with an altmap that is not currently
+ * made available by a hypervisor. When onlining that memory, the
+ * hypervisor can be instructed to make that memory available, and
+ * the onlining phase will not require any memory allocations, which is
+ * helpful in low-memory situations.
+ */
+#define MHP_OFFLINE_INACCESSIBLE ((__force mhp_t)BIT(3))
 
 /*
  * Extended parameters for memory hotplug:
@@ -154,7 +170,7 @@ extern void adjust_present_page_count(struct page *page,
 				       long nr_pages);
 /* VM interface that may be used by firmware interface */
 extern int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
-				     struct zone *zone);
+				     struct zone *zone, bool mhp_off_inaccessible);
 extern void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages);
 extern int online_pages(unsigned long pfn, unsigned long nr_pages,
 			struct zone *zone, struct memory_group *group);

include/linux/memremap.h

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ struct vmem_altmap {
 	unsigned long free;
 	unsigned long align;
 	unsigned long alloc;
+	bool inaccessible;
 };
 
 /*

mm/memory_hotplug.c

Lines changed: 14 additions & 3 deletions
@@ -1087,7 +1087,7 @@ void adjust_present_page_count(struct page *page, struct memory_group *group,
 }
 
 int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
-			      struct zone *zone)
+			      struct zone *zone, bool mhp_off_inaccessible)
 {
 	unsigned long end_pfn = pfn + nr_pages;
 	int ret, i;
@@ -1096,6 +1096,15 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
 	if (ret)
 		return ret;
 
+	/*
+	 * Memory block is accessible at this stage and hence poison the struct
+	 * pages now. If the memory block is accessible during memory hotplug
+	 * addition phase, then page poisining is already performed in
+	 * sparse_add_section().
+	 */
+	if (mhp_off_inaccessible)
+		page_init_poison(pfn_to_page(pfn), sizeof(struct page) * nr_pages);
+
 	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);
 
 	for (i = 0; i < nr_pages; i++)
@@ -1415,7 +1424,7 @@ static void __ref remove_memory_blocks_and_altmaps(u64 start, u64 size)
 }
 
 static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
-					    u64 start, u64 size)
+					    u64 start, u64 size, mhp_t mhp_flags)
 {
 	unsigned long memblock_size = memory_block_size_bytes();
 	u64 cur_start;
@@ -1431,6 +1440,8 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
 		};
 
 		mhp_altmap.free = memory_block_memmap_on_memory_pages();
+		if (mhp_flags & MHP_OFFLINE_INACCESSIBLE)
+			mhp_altmap.inaccessible = true;
 		params.altmap = kmemdup(&mhp_altmap, sizeof(struct vmem_altmap),
 					GFP_KERNEL);
 		if (!params.altmap) {
@@ -1516,7 +1527,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	 */
 	if ((mhp_flags & MHP_MEMMAP_ON_MEMORY) &&
 	    mhp_supports_memmap_on_memory(memory_block_size_bytes())) {
-		ret = create_altmaps_and_memory_blocks(nid, group, start, size, mhp_flags);
+		ret = create_altmaps_and_memory_blocks(nid, group, start, size);
 		if (ret)
 			goto error;
 	} else {
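The way the new flag reaches the altmap can be summarized with a small userspace model. `struct altmap_sketch` and `plan_altmap()` are hypothetical, and the value used for MHP_MEMMAP_ON_MEMORY is an assumption because that definition is not part of this diff; only the bit for MHP_OFFLINE_INACCESSIBLE is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define BIT(n)				(1u << (n))
#define MHP_MEMMAP_ON_MEMORY		BIT(1)	/* assumed, not in this diff */
#define MHP_OFFLINE_INACCESSIBLE	BIT(3)	/* as added by this patch */

/* Hypothetical outcome of the add phase for one memory block. */
struct altmap_sketch {
	bool present;		/* memmap is placed on the added memory */
	bool inaccessible;	/* memmap must not be written before online */
};

/*
 * Model of the decision in add_memory_resource() and
 * create_altmaps_and_memory_blocks(): an altmap exists only when
 * MHP_MEMMAP_ON_MEMORY is requested and supported, and only then does
 * MHP_OFFLINE_INACCESSIBLE have any effect.
 */
static struct altmap_sketch plan_altmap(unsigned int mhp_flags,
					bool memmap_on_memory_supported)
{
	struct altmap_sketch a = { false, false };

	if ((mhp_flags & MHP_MEMMAP_ON_MEMORY) && memmap_on_memory_supported) {
		a.present = true;
		if (mhp_flags & MHP_OFFLINE_INACCESSIBLE)
			a.inaccessible = true;
	}

	assert(!a.inaccessible || a.present);
	return a;
}
```

Passing MHP_OFFLINE_INACCESSIBLE on its own leaves both fields false in this model, which matches the header comment saying the flag is only relevant together with MHP_MEMMAP_ON_MEMORY.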

mm/sparse.c

Lines changed: 2 additions & 1 deletion
@@ -908,7 +908,8 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 	 * Poison uninitialized struct pages in order to catch invalid flags
 	 * combinations.
 	 */
-	page_init_poison(memmap, sizeof(struct page) * nr_pages);
+	if (!altmap || !altmap->inaccessible)
+		page_init_poison(memmap, sizeof(struct page) * nr_pages);
 
 	ms = __nr_to_section(section_nr);
 	set_section_nid(section_nr, nid);
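Taken together, the checks in sparse_add_section() and mhp_init_memmap_on_memory() move the poisoning of the struct pages from the add phase to the online phase for inaccessible altmaps. The sketch below models only that decision; `struct altmap_model` and the three helpers are hypothetical, and it assumes the online-phase check runs only for blocks that have an altmap, as in memory_block_online().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of struct vmem_altmap to the new field. */
struct altmap_model {
	bool inaccessible;
};

/* Add phase: same condition as the new check in sparse_add_section(). */
static bool poison_at_add(const struct altmap_model *altmap)
{
	return !altmap || !altmap->inaccessible;
}

/* Online phase: same condition as in mhp_init_memmap_on_memory(). */
static bool poison_at_online(bool mhp_off_inaccessible)
{
	return mhp_off_inaccessible;
}

/* Count how often the struct pages of one block would be poisoned. */
static int poison_count(const struct altmap_model *altmap)
{
	int n = 0;

	if (poison_at_add(altmap))
		n++;
	/* The memmap-on-memory init only runs when an altmap exists. */
	if (altmap && poison_at_online(altmap->inaccessible))
		n++;
	return n;
}
```

In this model every configuration (no altmap, accessible altmap, inaccessible altmap) ends up with exactly one poisoning pass, and the inaccessible case performs it only after the block can be written.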
