Merge tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull SMP updates from Thomas Gleixner:
 "A large update for SMP management:

   - Parallel CPU bringup

     The reason why people are interested in parallel bringup is to
     shorten the (kexec) reboot time of cloud servers to reduce the
     downtime of the VM tenants.

     The current fully serialized bringup does the following per AP:

       1) Prepare callbacks (allocate, initialize, create threads)
       2) Kick the AP alive (e.g. INIT/SIPI on x86)
       3) Wait for the AP to report alive state
       4) Let the AP continue through the atomic bringup
       5) Let the AP run the threaded bringup to full online state

     There are two significant delays:

       #3 The time for an AP to report alive state in start_secondary()
          on x86 has been measured in the range between 350us and 3.5ms
          depending on vendor and CPU type, BIOS microcode size etc.

       #4 The atomic bringup does the microcode update. This has been
          measured to take up to ~8ms on the primary threads depending
          on the microcode patch size to apply.

     On a two-socket SKL server with 56 cores (112 threads), the boot
     CPU on current mainline spends about 800ms busy-waiting for the
     APs to come up and apply microcode. That's more than 80% of the
     whole onlining procedure.

     This can be reduced significantly by splitting the bringup
     mechanism into two parts:

       1) Run the prepare callbacks and kick the AP alive for each AP
          which needs to be brought up.

          The APs wake up, do their firmware initialization and run the
          low level kernel startup code including microcode loading in
          parallel up to the first synchronization point. (#1 and #2
          above)

       2) Run the rest of the bringup code strictly serialized per CPU
          (#3 - #5 above) as it's done today.

          Parallelizing that stage of the CPU bringup might be possible
          in theory, but it's questionable whether the required surgery
          would be justified for a pretty small gain.
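
     Schematically, the split flow looks like the sketch below. This is
     a simplified illustration with made-up helper names
     (kick_ap_alive_sketch(), bringup_ap_serialized_sketch()), not the
     actual kernel implementation:

       /*
        * Illustrative sketch only: the two helpers are hypothetical
        * stand-ins for the real prepare/kick and synchronization code.
        */
       static void bringup_secondary_cpus(void)
       {
               unsigned int cpu;

               /*
                * Part 1: prepare and kick every AP. The APs then run
                * firmware init, low-level startup and microcode
                * loading in parallel up to the first sync point.
                */
               for_each_present_cpu(cpu)
                       kick_ap_alive_sketch(cpu);

               /*
                * Part 2: finish each CPU strictly serialized, exactly
                * as the fully serial bringup does today.
                */
               for_each_present_cpu(cpu)
                       bringup_ap_serialized_sketch(cpu);
       }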

     If the system is large enough, the first AP is already waiting at
     the first synchronization point by the time the boot CPU has
     finished waking up the last AP. That reduces the AP bringup time
     on that SKL from ~800ms to ~80ms, i.e. by roughly a factor of 10.

     The actual gain varies wildly depending on the system, CPU,
     microcode patch size and other factors. There are some
     opportunities to reduce the overhead further, but that needs some
     deep surgery in the x86 CPU bringup code.

     For now this is only enabled on x86, but the core functionality
     obviously works for all SMP capable architectures.

   - Enhancements for SMP function call tracing so it is possible to
     locate the scheduling and the actual execution points. That makes
     it possible to measure IPI delivery time precisely"
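
For illustration, the tracing enhancements instrument the generic cross-CPU function-call (csd) path. A minimal sketch of the kind of call they cover — the csd_queue_cpu, csd_function_entry and csd_function_exit tracepoint names are taken from the trace,smp commits listed below; the surrounding scaffolding here is made up:

  #include <linux/smp.h>

  static void remote_fn(void *info)
  {
          /*
           * Runs on the target CPU; csd_function_entry fires right
           * before and csd_function_exit right after this function.
           */
  }

  static void trigger_remote_call(void)
  {
          /*
           * csd_queue_cpu records when/where the call is queued, so
           * the delta to csd_function_entry approximates the IPI
           * delivery latency. Assumes CPU 1 is online.
           */
          smp_call_function_single(1, remote_fn, NULL, 1);
  }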

* tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  trace,smp: Add tracepoints for scheduling remotelly called functions
  trace,smp: Add tracepoints around remotelly called functions
  MAINTAINERS: Add CPU HOTPLUG entry
  x86/smpboot: Fix the parallel bringup decision
  x86/realmode: Make stack lock work in trampoline_compat()
  x86/smp: Initialize cpu_primary_thread_mask late
  cpu/hotplug: Fix off by one in cpuhp_bringup_mask()
  x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils
  x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
  x86/smpboot: Support parallel startup of secondary CPUs
  x86/smpboot: Implement a bit spinlock to protect the realmode stack
  x86/apic: Save the APIC virtual base address
  cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
  x86/apic: Provide cpu_primary_thread mask
  x86/smpboot: Enable split CPU startup
  cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
  cpu/hotplug: Reset task stack state in _cpu_up()
  cpu/hotplug: Remove unused state functions
  riscv: Switch to hotplug core state synchronization
  parisc: Switch to hotplug core state synchronization
  ...
torvalds committed Jun 26, 2023
2 parents 7cffdbe + bf5a8c2 commit 9244724
Showing 66 changed files with 1,077 additions and 1,011 deletions.
20 changes: 6 additions & 14 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -818,20 +818,6 @@
Format:
<first_slot>,<last_slot>,<port>,<enum_bit>[,<debug>]

cpu0_hotplug [X86] Turn on CPU0 hotplug feature when
CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off.
Some features depend on CPU0. Known dependencies are:
1. Resume from suspend/hibernate depends on CPU0.
Suspend/hibernate will fail if CPU0 is offline and you
need to online CPU0 before suspend/hibernate.
2. PIC interrupts also depend on CPU0. CPU0 can't be
removed if a PIC interrupt is detected.
It's said poweroff/reboot may depend on CPU0 on some
machines although I haven't seen such issues so far
after CPU0 is offline on a few tested machines.
If the dependencies are under your control, you can
turn on cpu0_hotplug.

cpuidle.off=1 [CPU_IDLE]
disable the cpuidle sub-system

@@ -852,6 +838,12 @@
on every CPU online, such as boot, and resume from suspend.
Default: 10000

cpuhp.parallel=
[SMP] Enable/disable parallel bringup of secondary CPUs
Format: <bool>
Default is enabled if CONFIG_HOTPLUG_PARALLEL=y. Otherwise
the parameter has no effect.

crash_kexec_post_notifiers
Run kdump after running panic-notifiers and dumping
kmsg. This only for the users who doubt kdump always
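
As a usage note: on a kernel built with CONFIG_HOTPLUG_PARALLEL=y, the parallel bringup documented above can be switched off from the boot command line with:

  cpuhp.parallel=0
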
13 changes: 2 additions & 11 deletions Documentation/core-api/cpu_hotplug.rst
@@ -127,17 +127,8 @@ bring CPU4 back online::
$ echo 1 > /sys/devices/system/cpu/cpu4/online
smpboot: Booting Node 0 Processor 4 APIC 0x1

The CPU is usable again. This should work on all CPUs. CPU0 is often special
and excluded from CPU hotplug. On X86 the kernel option
*CONFIG_BOOTPARAM_HOTPLUG_CPU0* has to be enabled in order to be able to
shutdown CPU0. Alternatively the kernel command option *cpu0_hotplug* can be
used. Some known dependencies of CPU0:

* Resume from hibernate/suspend. Hibernate/suspend will fail if CPU0 is offline.
* PIC interrupts. CPU0 can't be removed if a PIC interrupt is detected.

Please let Fenghua Yu <fenghua.yu@intel.com> know if you find any dependencies
on CPU0.
The CPU is usable again. This should work on all CPUs, but CPU0 is often special
and excluded from CPU hotplug.

The CPU hotplug coordination
============================
12 changes: 12 additions & 0 deletions MAINTAINERS
@@ -5344,6 +5344,18 @@ F: include/linux/sched/cpufreq.h
F: kernel/sched/cpufreq*.c
F: tools/testing/selftests/cpufreq/

CPU HOTPLUG
M: Thomas Gleixner <tglx@linutronix.de>
M: Peter Zijlstra <peterz@infradead.org>
L: linux-kernel@vger.kernel.org
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git smp/core
F: kernel/cpu.c
F: kernel/smpboot.*
F: include/linux/cpu.h
F: include/linux/cpuhotplug.h
F: include/linux/smpboot.h

CPU IDLE TIME MANAGEMENT FRAMEWORK
M: "Rafael J. Wysocki" <rafael@kernel.org>
M: Daniel Lezcano <daniel.lezcano@linaro.org>
23 changes: 23 additions & 0 deletions arch/Kconfig
@@ -34,6 +34,29 @@ config ARCH_HAS_SUBPAGE_FAULTS
config HOTPLUG_SMT
bool

# Selected by HOTPLUG_CORE_SYNC_DEAD or HOTPLUG_CORE_SYNC_FULL
config HOTPLUG_CORE_SYNC
bool

# Basic CPU dead synchronization selected by architecture
config HOTPLUG_CORE_SYNC_DEAD
bool
select HOTPLUG_CORE_SYNC

# Full CPU synchronization with alive state selected by architecture
config HOTPLUG_CORE_SYNC_FULL
bool
select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select HOTPLUG_CORE_SYNC

config HOTPLUG_SPLIT_STARTUP
bool
select HOTPLUG_CORE_SYNC_FULL

config HOTPLUG_PARALLEL
bool
select HOTPLUG_SPLIT_STARTUP

config GENERIC_ENTRY
bool

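
For context, these are opt-in switches: an architecture selects them from its own Kconfig entry. A sketch of what such an entry might look like — illustrative only; the x86 hunk is not part of this excerpt and the exact conditions are assumptions:

  config X86
          ...
          select HOTPLUG_CORE_SYNC_FULL   if SMP
          select HOTPLUG_PARALLEL         if SMP && X86_64
          select HOTPLUG_SPLIT_STARTUP    if SMP && X86_32
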
1 change: 1 addition & 0 deletions arch/arm/Kconfig
@@ -125,6 +125,7 @@ config ARM
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_UID16
select HAVE_VIRT_CPU_ACCOUNTING_GEN
select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select IRQ_FORCED_THREADING
select MODULES_USE_ELF_REL
select NEED_DMA_MAP_STATE
2 changes: 1 addition & 1 deletion arch/arm/include/asm/smp.h
@@ -64,7 +64,7 @@ extern void secondary_startup_arm(void);

extern int __cpu_disable(void);

extern void __cpu_die(unsigned int cpu);
static inline void __cpu_die(unsigned int cpu) { }

extern void arch_send_call_function_single_ipi(int cpu);
extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
18 changes: 7 additions & 11 deletions arch/arm/kernel/smp.c
@@ -288,15 +288,11 @@ int __cpu_disable(void)
}

/*
* called on the thread which is asking for a CPU to be shutdown -
* waits until shutdown has completed, or it is timed out.
* called on the thread which is asking for a CPU to be shutdown after the
* shutdown completed.
*/
void __cpu_die(unsigned int cpu)
void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
{
if (!cpu_wait_death(cpu, 5)) {
pr_err("CPU%u: cpu didn't die\n", cpu);
return;
}
pr_debug("CPU%u: shutdown\n", cpu);

clear_tasks_mm_cpumask(cpu);
@@ -336,11 +332,11 @@ void __noreturn arch_cpu_idle_dead(void)
flush_cache_louis();

/*
* Tell __cpu_die() that this CPU is now safe to dispose of. Once
* this returns, power and/or clocks can be removed at any point
* from this CPU and its cache by platform_cpu_kill().
* Tell cpuhp_bp_sync_dead() that this CPU is now safe to dispose
* of. Once this returns, power and/or clocks can be removed at
* any point from this CPU and its cache by platform_cpu_kill().
*/
(void)cpu_report_death();
cpuhp_ap_report_dead();

/*
* Ensure that the cache lines associated with that completion are
1 change: 1 addition & 0 deletions arch/arm64/Kconfig
@@ -222,6 +222,7 @@ config ARM64
select HAVE_KPROBES
select HAVE_KRETPROBES
select HAVE_GENERIC_VDSO
select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN
2 changes: 1 addition & 1 deletion arch/arm64/include/asm/smp.h
@@ -99,7 +99,7 @@ static inline void arch_send_wakeup_ipi_mask(const struct cpumask *mask)

extern int __cpu_disable(void);

extern void __cpu_die(unsigned int cpu);
static inline void __cpu_die(unsigned int cpu) { }
extern void __noreturn cpu_die(void);
extern void __noreturn cpu_die_early(void);

14 changes: 5 additions & 9 deletions arch/arm64/kernel/smp.c
@@ -332,17 +332,13 @@ static int op_cpu_kill(unsigned int cpu)
}

/*
* called on the thread which is asking for a CPU to be shutdown -
* waits until shutdown has completed, or it is timed out.
* Called on the thread which is asking for a CPU to be shutdown after the
* shutdown completed.
*/
void __cpu_die(unsigned int cpu)
void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
{
int err;

if (!cpu_wait_death(cpu, 5)) {
pr_crit("CPU%u: cpu didn't die\n", cpu);
return;
}
pr_debug("CPU%u: shutdown\n", cpu);

/*
@@ -369,8 +365,8 @@ void __noreturn cpu_die(void)

local_daif_mask();

/* Tell __cpu_die() that this CPU is now safe to dispose of */
(void)cpu_report_death();
/* Tell cpuhp_bp_sync_dead() that this CPU is now safe to dispose of */
cpuhp_ap_report_dead();

/*
* Actually shutdown the CPU. This must never fail. The specific hotplug
1 change: 1 addition & 0 deletions arch/csky/Kconfig
@@ -96,6 +96,7 @@ config CSKY
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_STACKPROTECTOR
select HAVE_SYSCALL_TRACEPOINTS
select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select MAY_HAVE_SPARSE_IRQ
select MODULES_USE_ELF_RELA if MODULES
select OF
2 changes: 1 addition & 1 deletion arch/csky/include/asm/smp.h
@@ -23,7 +23,7 @@ void __init set_send_ipi(void (*func)(const struct cpumask *mask), int irq);

int __cpu_disable(void);

void __cpu_die(unsigned int cpu);
static inline void __cpu_die(unsigned int cpu) { }

#endif /* CONFIG_SMP */

8 changes: 2 additions & 6 deletions arch/csky/kernel/smp.c
@@ -291,20 +291,16 @@ int __cpu_disable(void)
return 0;
}

void __cpu_die(unsigned int cpu)
void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
{
if (!cpu_wait_death(cpu, 5)) {
pr_crit("CPU%u: shutdown failed\n", cpu);
return;
}
pr_notice("CPU%u: shutdown\n", cpu);
}

void __noreturn arch_cpu_idle_dead(void)
{
idle_task_exit();

cpu_report_death();
cpuhp_ap_report_dead();

while (!secondary_stack)
arch_cpu_idle();
1 change: 1 addition & 0 deletions arch/mips/Kconfig
@@ -2287,6 +2287,7 @@ config MIPS_CPS
select MIPS_CM
select MIPS_CPS_PM if HOTPLUG_CPU
select SMP
select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select SYNC_R4K if (CEVT_R4K || CSRC_R4K)
select SYS_SUPPORTS_HOTPLUG_CPU
select SYS_SUPPORTS_SCHED_SMT if CPU_MIPSR6
1 change: 1 addition & 0 deletions arch/mips/cavium-octeon/smp.c
@@ -345,6 +345,7 @@ void play_dead(void)
int cpu = cpu_number_map(cvmx_get_core_num());

idle_task_exit();
cpuhp_ap_report_dead();
octeon_processor_boot = 0xff;
per_cpu(cpu_state, cpu) = CPU_DEAD;

1 change: 1 addition & 0 deletions arch/mips/include/asm/smp-ops.h
@@ -33,6 +33,7 @@ struct plat_smp_ops {
#ifdef CONFIG_HOTPLUG_CPU
int (*cpu_disable)(void);
void (*cpu_die)(unsigned int cpu);
void (*cleanup_dead_cpu)(unsigned cpu);
#endif
#ifdef CONFIG_KEXEC
void (*kexec_nonboot_cpu)(void);
1 change: 1 addition & 0 deletions arch/mips/kernel/smp-bmips.c
@@ -392,6 +392,7 @@ static void bmips_cpu_die(unsigned int cpu)
void __ref play_dead(void)
{
idle_task_exit();
cpuhp_ap_report_dead();

/* flush data cache */
_dma_cache_wback_inv(0, ~0);
14 changes: 5 additions & 9 deletions arch/mips/kernel/smp-cps.c
@@ -503,8 +503,7 @@ void play_dead(void)
}
}

/* This CPU has chosen its way out */
(void)cpu_report_death();
cpuhp_ap_report_dead();

cps_shutdown_this_cpu(cpu_death);

@@ -527,20 +526,16 @@ static void wait_for_sibling_halt(void *ptr_cpu)
} while (!(halted & TCHALT_H));
}

static void cps_cpu_die(unsigned int cpu)
static void cps_cpu_die(unsigned int cpu) { }

static void cps_cleanup_dead_cpu(unsigned cpu)
{
unsigned core = cpu_core(&cpu_data[cpu]);
unsigned int vpe_id = cpu_vpe_id(&cpu_data[cpu]);
ktime_t fail_time;
unsigned stat;
int err;

/* Wait for the cpu to choose its way out */
if (!cpu_wait_death(cpu, 5)) {
pr_err("CPU%u: didn't offline\n", cpu);
return;
}

/*
* Now wait for the CPU to actually offline. Without doing this that
* offlining may race with one or more of:
@@ -624,6 +619,7 @@ static const struct plat_smp_ops cps_smp_ops = {
#ifdef CONFIG_HOTPLUG_CPU
.cpu_disable = cps_cpu_disable,
.cpu_die = cps_cpu_die,
.cleanup_dead_cpu = cps_cleanup_dead_cpu,
#endif
#ifdef CONFIG_KEXEC
.kexec_nonboot_cpu = cps_kexec_nonboot_cpu,
8 changes: 8 additions & 0 deletions arch/mips/kernel/smp.c
@@ -690,6 +690,14 @@ void flush_tlb_one(unsigned long vaddr)
EXPORT_SYMBOL(flush_tlb_page);
EXPORT_SYMBOL(flush_tlb_one);

#ifdef CONFIG_HOTPLUG_CORE_SYNC_DEAD
void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
{
if (mp_ops->cleanup_dead_cpu)
mp_ops->cleanup_dead_cpu(cpu);
}
#endif

#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST

static void tick_broadcast_callee(void *info)
1 change: 1 addition & 0 deletions arch/mips/loongson64/smp.c
@@ -775,6 +775,7 @@ void play_dead(void)
void (*play_dead_at_ckseg1)(int *);

idle_task_exit();
cpuhp_ap_report_dead();

prid_imp = read_c0_prid() & PRID_IMP_MASK;
prid_rev = read_c0_prid() & PRID_REV_MASK;
1 change: 1 addition & 0 deletions arch/parisc/Kconfig
@@ -57,6 +57,7 @@ config PARISC
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_REGS_AND_STACK_ACCESS_API
select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select GENERIC_SCHED_CLOCK
select GENERIC_IRQ_MIGRATION if SMP
select HAVE_UNSTABLE_SCHED_CLOCK if SMP
4 changes: 2 additions & 2 deletions arch/parisc/kernel/process.c
@@ -171,8 +171,8 @@ void __noreturn arch_cpu_idle_dead(void)

local_irq_disable();

/* Tell __cpu_die() that this CPU is now safe to dispose of. */
(void)cpu_report_death();
/* Tell the core that this CPU is now safe to dispose of. */
cpuhp_ap_report_dead();

/* Ensure that the cache lines are written out. */
flush_cache_all_local();
7 changes: 3 additions & 4 deletions arch/parisc/kernel/smp.c
@@ -500,11 +500,10 @@ int __cpu_disable(void)
void __cpu_die(unsigned int cpu)
{
pdc_cpu_rendezvous_lock();
}

if (!cpu_wait_death(cpu, 5)) {
pr_crit("CPU%u: cpu didn't die\n", cpu);
return;
}
void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
{
pr_info("CPU%u: is shutting down\n", cpu);

/* set task's state to interruptible sleep */
1 change: 1 addition & 0 deletions arch/riscv/Kconfig
@@ -123,6 +123,7 @@ config RISCV
select HAVE_RSEQ
select HAVE_STACKPROTECTOR
select HAVE_SYSCALL_TRACEPOINTS
select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN
2 changes: 1 addition & 1 deletion arch/riscv/include/asm/smp.h
@@ -70,7 +70,7 @@ asmlinkage void smp_callin(void);

#if defined CONFIG_HOTPLUG_CPU
int __cpu_disable(void);
void __cpu_die(unsigned int cpu);
static inline void __cpu_die(unsigned int cpu) { }
#endif /* CONFIG_HOTPLUG_CPU */

#else