Skip to content

Conversation

PlaidCat
Copy link
Collaborator

@PlaidCat PlaidCat commented Apr 10, 2025

Update process (This kernel CentOS base for 4.18.0-553)

  • Kernel History Rebuild Process for all src.rpms hosted by RESF
  • Create sig-cloud-8/4.18.0-553.40.1.el8_10 branch
  • Check if any maintained code is included in the new el release.
  • Cherry-pick all code from previous branch into new branch (skipping unneeded code)
    • Fix conflicts as they arise
  • Build and Test

Removed Commits

[rolling release update] Checking if any of the commits from the old rolling release are already present in the new base branch
- Commit 1dad877d4bbb7e310c4c68f177498fee786d414e already present in new base branch: 1dad877d4bbb7e310c4c68f177498fee786d414e locking/atomic: Make test_and_*_bit() ordered on failure

[rolling release update] Removing commits from the new branch
1dad877d4bbb7e310c4c68f177498fee786d414e locking/atomic: Make test_and_*_bit() ordered on failure

Build

/mnt/code/kernel-src-tree-build
  CLEAN   scripts/basic
  CLEAN   .config
[TIMER]{MRPROPER}: 4s
x86_64 architecture detected, copying config
'configs/kernel-4.18.0-x86_64.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-jmaple_sig-cloud-8_4.18.0-553.47.1.el8_10-226faf214012"
Making olddefconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  YACC    scripts/kconfig/zconf.tab.c
  LEX     scripts/kconfig/zconf.lex.c
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf  --olddefconfig Kconfig
#
# configuration written to .config
#
Starting Build
scripts/kconfig/conf  --syncconfig Kconfig
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h

[SNIP]

  LD [M]  sound/xen/snd_xen_front.ko
  LD [M]  virt/lib/irqbypass.ko
[TIMER]{BUILD}: 1873s
Making Modules
  INSTALL arch/x86/crypto/blowfish-x86_64.ko
  INSTALL arch/x86/crypto/camellia-aesni-avx-x86_64.ko

[SNIP]

  INSTALL virt/lib/irqbypass.ko
  DEPMOD  4.18.0-jmaple_sig-cloud-8_4.18.0-553.47.1.el8_10-226faf214012+
[TIMER]{MODULES}: 11s
Making Install
sh ./arch/x86/boot/install.sh 4.18.0-jmaple_sig-cloud-8_4.18.0-553.47.1.el8_10-226faf214012+ arch/x86/boot/bzImage \
        System.map "/boot"
[TIMER]{INSTALL}: 20s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-4.18.0-jmaple_sig-cloud-8_4.18.0-553.47.1.el8_10-226faf214012+ and Index to 0
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 4s
[TIMER]{BUILD}: 1873s
[TIMER]{MODULES}: 11s
[TIMER]{INSTALL}: 20s
[TIMER]{TOTAL} 1913s
Rebooting in 10 seconds

KSelfTests

$ grep '^ok ' 4.18.0-rocky8_10_rebuild-01aef32f4a9b+.kself.log | wc -l
206

$ grep '^ok '  4.18.0-jmaple_sig-cloud-8_4.18.0-553.47.1.el8_10-226faf214012+.keselftest.log | wc -l
206

ciq-sahlberg and others added 5 commits April 9, 2025 15:09
…tead of a two-phase approach

jira roc-2673
commit fbf6449

Instead of setting x86_virt_bits to a possibly-correct value and then
correcting it later, do all the necessary checks before setting it.

At this point, the #VC handler references boot_cpu_data.x86_virt_bits,
and in the previous version, it would be triggered by the CPUIDs between
the point at which it is set to 48 and when it is set to the correct
value.

    Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
    Signed-off-by: Adam Dunlap <acdunlap@google.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Tested-by: Jacob Xu <jacobhxu@google.com>
    Link: https://lore.kernel.org/r/20230912002703.3924521-3-acdunlap@google.com

Signed-off-by: Ronnie Sahlberg <rsahlberg@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira roc-2673
commit 3e32552

c->x86_cache_alignment is initialized from c->x86_clflush_size.
However, commit fbf6449 moved c->x86_clflush_size initialization
to later in boot without moving the c->x86_cache_alignment assignment:

  fbf6449 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")

This presumably left c->x86_cache_alignment set to zero for longer
than it should be.

The result was an oops on 32-bit kernels while accessing a pointer
at 0x20.  The 0x20 came from accessing a structure member at offset
0x10 (buffer->cpumask) from a ZERO_SIZE_PTR=0x10.  kmalloc() can
evidently return ZERO_SIZE_PTR when it's given 0 as its alignment
requirement.

Move the c->x86_cache_alignment initialization to be after
c->x86_clflush_size has an actual value.

    Fixes: fbf6449 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Tested-by: Nathan Chancellor <nathan@kernel.org>
    Link: https://lore.kernel.org/r/20231002220045.1014760-1-dave.hansen@linux.intel.com
    (cherry picked from commit 3e32552)
Signed-off-by: Ronnie Sahlberg <rsahlberg@ciq.com>

Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-2183
bug-fix x86/sev-es: Set x86_virt_bits
commit-author Paolo Bonzini <pbonzini@redhat.com>
commit 9a45819

In commit fbf6449 ("x86/sev-es: Set x86_virt_bits to the correct
value straight away, instead of a two-phase approach"), the initialization
of c->x86_phys_bits was moved after this_cpu->c_early_init(c).  This is
incorrect because early_init_amd() expected to be able to reduce the
value according to the contents of CPUID leaf 0x8000001f.

Fortunately, the bug was negated by init_amd()'s call to early_init_amd(),
which does reduce x86_phys_bits in the end.  However, this is very
late in the boot process and, most notably, the wrong value is used for
x86_phys_bits when setting up MTRRs.

To fix this, call get_cpu_address_sizes() as soon as X86_FEATURE_CPUID is
set/cleared, and c->extended_cpuid_level is retrieved.

Fixes: fbf6449 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
	Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
	Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240131230902.1867092-2-pbonzini%40redhat.com
(cherry picked from commit 9a45819)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
…sizes()

jira LE-2183
bug-fix-prereq x86/sev-es: Set x86_virt_bits
commit-author Borislav Petkov (AMD) <bp@alien8.de>
commit 95bfb35

Drop 'vp_bits_from_cpuid' as it is not really needed.

No functional changes.

	Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240316120706.4352-1-bp@alien8.de
(cherry picked from commit 95bfb35)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-2183
bug-fix x86/sev-es: Set x86_virt_bits
commit-author Dave Hansen <dave.hansen@linux.intel.com>
commit 2a38e4c

tl;dr: CPUs with CPUID.80000008H but without CPUID.01H:EDX[CLFSH]
will end up reporting cache_line_size()==0 and bad things happen.
Fill in a default on those to avoid the problem.

Long Story:

The kernel dies a horrible death if c->x86_cache_alignment (aka.
cache_line_size() is 0.  Normally, this value is populated from
c->x86_clflush_size.

Right now the code is set up to get c->x86_clflush_size from two
places.  First, modern CPUs get it from CPUID.  Old CPUs that don't
have leaf 0x80000008 (or CPUID at all) just get some sane defaults
from the kernel in get_cpu_address_sizes().

The vast majority of CPUs that have leaf 0x80000008 also get
->x86_clflush_size from CPUID.  But there are oddballs.

Intel Quark CPUs[1] and others[2] have leaf 0x80000008 but don't set
CPUID.01H:EDX[CLFSH], so they skip over filling in ->x86_clflush_size:

	cpuid(0x00000001, &tfms, &misc, &junk, &cap0);
	if (cap0 & (1<<19))
		c->x86_clflush_size = ((misc >> 8) & 0xff) * 8;

So they: land in get_cpu_address_sizes() and see that CPUID has level
0x80000008 and jump into the side of the if() that does not fill in
c->x86_clflush_size.  That assigns a 0 to c->x86_cache_alignment, and
hilarity ensues in code like:

        buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
                         GFP_KERNEL);

To fix this, always provide a sane value for ->x86_clflush_size.

Big thanks to Andy Shevchenko for finding and reporting this and also
providing a first pass at a fix. But his fix was only partial and only
worked on the Quark CPUs.  It would not, for instance, have worked on
the QEMU config.

1. https://raw.githubusercontent.com/InstLatx64/InstLatx64/master/GenuineIntel/GenuineIntel0000590_Clanton_03_CPUID.txt
2. You can also get this behavior if you use "-cpu 486,+clzero"
   in QEMU.

[ dhansen: remove 'vp_bits_from_cpuid' reference in changelog
	   because bpetkov brutally murdered it recently. ]

Fixes: fbf6449 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
	Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
	Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
	Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
	Tested-by: Jörn Heusipp <osmanx@heusipp.de>
	Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240516173928.3960193-1-andriy.shevchenko@linux.intel.com/
Link: https://lore.kernel.org/lkml/5e31cad3-ad4d-493e-ab07-724cfbfaba44@heusipp.de/
Link: https://lore.kernel.org/all/20240517200534.8EC5F33E%40davehans-spike.ostc.intel.com
(cherry picked from commit 2a38e4c)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Copy link

@thefossguy-ciq thefossguy-ciq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚤

@jdieter
Copy link

jdieter commented Apr 10, 2025

@bmastbergen, since @PlaidCat is out, do you mind if I go ahead and merge this? I'm working on the dist-git side of things.

@bmastbergen bmastbergen merged commit 4727168 into sig-cloud-8/4.18.0-553.47.1.el8_10 Apr 10, 2025
2 checks passed
@PlaidCat PlaidCat deleted the jmaple_sig-cloud-8/4.18.0-553.47.1.el8_10 branch July 11, 2025 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

5 participants