@PlaidCat PlaidCat commented Nov 3, 2025

Update process (This kernel CentOS base for 5.14.0-570)

  • Kernel History Rebuild Process for all src.rpms hosted by RESF
  • Create sig-cloud-9/5.14.0-570.X.1.el9_6 branch
  • Check if any maintained code is included in the new el release.
  • Cherry-pick all code from previous branch into new branch (skipping unneeded code)
    • Fix conflicts as they arise
  • Build and Test

FIPS Integration

This integrates the FIPS changes into rlc-9/NVR. Previously, both Sig Cloud Next (SCN) and FIPS for 9.6 were maintained in separate rolling releases built on the same base Rocky9_6 code. For 5.14.0-570.55.1, SCN was converted into rlc and FIPS continued as normal. For the update to 5.14.0-570.58.1, we first merge the 5.14.0-570.55.1 FIPS changes into rlc by rebasing the fips branch and then merging it into rlc-9/5.14.0-570.55.1. After that we proceed with normal forward porting, which by default always runs the FIPS modification checks on the base RockyX_y version, as seen in the Forward Port Process section later.

git checkout fips-9-compliant/5.14.0-570.55.1
git rebase rlc-9/5.14.0-570.55.1
git checkout rlc-9/5.14.0-570.55.1
git merge --ff-only fips-9-compliant/5.14.0-570.55.1
python3 rolling-release-update.py --repo ../kernel-src-tree/ --new-base-branch rocky9_6 --old-rolling-branch rlc-9/5.14.0-570.55.1.el9_6 | tee ../RR.RLC+FIPS.$(git -C ../kernel-src-tree describe origin/rocky9_6).log

Removed Commits

None

Forward Port Process

[jmaple@devbox kernel-src-tree-tools]$ python3 rolling-release-update.py --repo ../kernel-src-tree/ --new-base-branch rocky9_6 --old-rolling-branch rlc-9/5.14.0-570.55.1.el9_6 | tee ../RR.RLC+FIPS.$(git -C ../kernel-src-tree describe origin/rocky9_6).log
[rolling release update] Rolling Product:  rlc-9
[rolling release update] Checking out branch:  rlc-9/5.14.0-570.55.1.el9_6
[rolling release update] Gathering all the RESF kernel Tags
[rolling release update] Found 24 RESF kernel tags
[rolling release update] Checking out branch:  rocky9_6
[rolling release update] Gathering all the RESF kernel Tags
[rolling release update] Found 25 RESF kernel tags
[rolling release update] Latest RESF tag sha:  b'e9f8d0801b38'
"e9f8d0801b384e84a02da8cc6128b45001f2ebab Rebuild rocky9_6 with kernel-5.14.0-570.55.1.el9_6"
[rolling release update] Checking for FIPS protected changes between the common tag and HEAD
[rolling release update] Checking for FIPS protected changes
[rolling release update] Getting SHAS e9f8d0801b38..HEAD
[rolling release update] Number of commits to check:  16
[rolling release update] Checking modifications of shas
[rolling release update] Checked 1 of 16 commits
[rolling release update] Checked 2 of 16 commits
[rolling release update] Checked 3 of 16 commits
[rolling release update] Checked 4 of 16 commits
[rolling release update] Checked 5 of 16 commits
[rolling release update] Checked 6 of 16 commits
[rolling release update] Checked 7 of 16 commits
[rolling release update] Checked 8 of 16 commits
[rolling release update] Checked 9 of 16 commits
[rolling release update] Checked 10 of 16 commits
[rolling release update] Checked 11 of 16 commits
[rolling release update] Checked 12 of 16 commits
[rolling release update] Checked 13 of 16 commits
[rolling release update] Checked 14 of 16 commits
[rolling release update] Checked 15 of 16 commits
[rolling release update] Checked 16 of 16 commits
[rolling release update] 0 of 16 commits have FIPS protected changes
[rolling release update] Checking out old rolling branch:  rlc-9/5.14.0-570.55.1.el9_6
[rolling release update] Finding the CIQ Kernel and Associated Upstream commits between the last resf tag and HEAD
[rolling release update] Last RESF tag sha:  b'e9f8d0801b38'
[rolling release update] Total commits in old branch: 23
[rolling release update] Checking out new base branch:  rocky9_6
[rolling release update] Finding the kernel version for the new rolling release
[rolling release update] New Branch to create: rlc-9/5.14.0-570.58.1.el9_6
[rolling release update] Creating new branch: rlc-9/5.14.0-570.58.1.el9_6
[rolling release update] Creating new branch for PR:  jmaple_rlc-9/5.14.0-570.58.1.el9_6
[rolling release update] Creating Map of all new commits from last rolling release fork
[rolling release update] Total commits in new branch: 15
[rolling release update] Checking if any of the commits from the old rolling release are already present in the new base branch
[rolling release update] Found 0 duplicate commits to remove
[rolling release update] Applying 23 remaining commits to the new branch
  [1/23] e11399e56e8d selftests/mm temporary fix of hmm infinite loop
  [2/23] 3ebeff786d7c tools: hv: Enable debug logs for hv_kvp_daemon
  [3/23] ef802c870b83 scsi: storvsc: Increase the timeouts to storvsc_timeout
  [4/23] a0d7a78675ca Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple ranges
  [5/23] e68a5ae66431 hv_netvsc: Use vmbus_sendpacket_mpb_desc() to send VMBus messages
  [6/23] df912f5b3d2e hv_netvsc: Preserve contiguous PFN grouping in the page buffer array
  [7/23] dacda5a7a2dd hv_netvsc: Remove rmsg_pgcnt
  [8/23] 3f4cd04f0c2c Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer()
  [9/23] b9c1c19138a7 SUSE: patch: crypto-ecdh-implement-FIPS-PCT.patch
  [10/23] 4230ae381732 crypto: essiv - Zeroize keys on exit in essiv_aead_setkey()
  [11/23] 8b28da89f222 crypto: jitter - replace LFSR with SHA3-256
  [12/23] a49618acddab crypto: aead,cipher - zeroize key buffer after use
  [13/23] 33eba93dd946 crypto: ecdh - explicitly zeroize private_key
  [14/23] 557b9f10424f crypto: lib/mpi - Fix unexpected pointer access in mpi_ec_init
  [15/23] dc894f46fed3 crypto: Kconfig - Make CRYPTO_FIPS depend on the DRBG being built-in
  [16/23] 95681bd0eb17 random: Restrict extrng registration to init time
  [17/23] a5c1cd6e33d6 crypto: rng - Convert crypto_default_rng_refcnt into an unsigned int
  [18/23] 383cdb0e2067 crypto: rng - Only allow the DRBG to register as "stdrng" in FIPS mode
  [19/23] 39149a036174 crypto: drbg - Align buffers to at least a cache line
  [20/23] e149114976bf crypto: rng - Fix priority inversions due to mutex locks
  [21/23] 5ac150625a6d mm/gup: reintroduce pin_user_pages_fast_only()
  [22/23] 337788ae99e6 crypto: rng - Implement fast per-CPU DRBG instances
  [23/23] a7ad756472fc configs: Ensure FIPS settings defined
[rolling release update] Successfully applied all 23 commits
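
The log above shows the tool deriving the new branch name (`rlc-9/5.14.0-570.58.1.el9_6`) from the kernel version on the new base branch. A minimal sketch of that naming step follows. It assumes the version is read from a tag subject of the form shown in the log ("Rebuild rocky9_6 with kernel-..."). The function `new_rolling_branch` is hypothetical and is not taken from `rolling-release-update.py`, and the 570.58.1 subject string is inferred by analogy with the 570.55.1 one.

```python
import re


def new_rolling_branch(product: str, tag_subject: str) -> str:
    """Build '<product>/<kernel version>' from a RESF rebuild tag subject.

    Hypothetical helper: the real tool may derive the version differently.
    """
    # Subject format as it appears in the log:
    #   "Rebuild rocky9_6 with kernel-5.14.0-570.55.1.el9_6"
    match = re.search(r"kernel-(\S+)$", tag_subject.strip())
    if match is None:
        raise ValueError(f"unrecognized tag subject: {tag_subject!r}")
    return f"{product}/{match.group(1)}"


# Assumed subject for the new base, by analogy with the logged 570.55.1 tag.
branch = new_rolling_branch(
    "rlc-9", "Rebuild rocky9_6 with kernel-5.14.0-570.58.1.el9_6"
)
print(branch)  # rlc-9/5.14.0-570.58.1.el9_6
```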

BUILD

[jmaple@devbox code]$ egrep -B 5 -A 5 "\[TIMER\]|^Starting Build" $(ls -t kbuild* | head -n1)
/mnt/code/kernel-src-tree-build
Running make mrproper...
[TIMER]{MRPROPER}: 6s
x86_64 architecture detected, copying config
'configs/kernel-x86_64-rhel.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-jmaple_rlc-9_5.14.0-570.58.1.el9_6-1165b486bc23"
Making olddefconfig
--
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
#
# configuration written to .config
#
Starting Build
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_64.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
--
  LD [M]  sound/xen/snd_xen_front.ko
  BTF [M] sound/x86/snd-hdmi-lpe-audio.ko
  BTF [M] sound/virtio/virtio_snd.ko
  BTF [M] sound/usb/usx2y/snd-usb-usx2y.ko
  BTF [M] sound/xen/snd_xen_front.ko
[TIMER]{BUILD}: 1712s
Making Modules
  INSTALL /lib/modules/5.14.0-jmaple_rlc-9_5.14.0-570.58.1.el9_6-1165b486bc23+/kernel/arch/x86/crypto/blake2s-x86_64.ko
  INSTALL /lib/modules/5.14.0-jmaple_rlc-9_5.14.0-570.58.1.el9_6-1165b486bc23+/kernel/arch/x86/crypto/blowfish-x86_64.ko
  INSTALL /lib/modules/5.14.0-jmaple_rlc-9_5.14.0-570.58.1.el9_6-1165b486bc23+/kernel/arch/x86/crypto/camellia-aesni-avx-x86_64.ko
  INSTALL /lib/modules/5.14.0-jmaple_rlc-9_5.14.0-570.58.1.el9_6-1165b486bc23+/kernel/arch/x86/crypto/camellia-aesni-avx2.ko
--
  STRIP   /lib/modules/5.14.0-jmaple_rlc-9_5.14.0-570.58.1.el9_6-1165b486bc23+/kernel/sound/xen/snd_xen_front.ko
  SIGN    /lib/modules/5.14.0-jmaple_rlc-9_5.14.0-570.58.1.el9_6-1165b486bc23+/kernel/sound/x86/snd-hdmi-lpe-audio.ko
  SIGN    /lib/modules/5.14.0-jmaple_rlc-9_5.14.0-570.58.1.el9_6-1165b486bc23+/kernel/sound/xen/snd_xen_front.ko
  SIGN    /lib/modules/5.14.0-jmaple_rlc-9_5.14.0-570.58.1.el9_6-1165b486bc23+/kernel/sound/virtio/virtio_snd.ko
  DEPMOD  /lib/modules/5.14.0-jmaple_rlc-9_5.14.0-570.58.1.el9_6-1165b486bc23+
[TIMER]{MODULES}: 8s
Making Install
sh ./arch/x86/boot/install.sh 5.14.0-jmaple_rlc-9_5.14.0-570.58.1.el9_6-1165b486bc23+ \
        arch/x86/boot/bzImage System.map "/boot"
[TIMER]{INSTALL}: 22s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-5.14.0-jmaple_rlc-9_5.14.0-570.58.1.el9_6-1165b486bc23+ and Index to 0
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 6s
[TIMER]{BUILD}: 1712s
[TIMER]{MODULES}: 8s
[TIMER]{INSTALL}: 22s
[TIMER]{TOTAL} 1753s
Rebooting in 10 seconds
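
As a quick cross-check of the timing summary, the per-stage `[TIMER]` values can be summed and compared with the reported total. The sketch below parses the five summary lines copied from the log. The regular expression and the function name `parse_timers` are illustrative assumptions, not part of the build script.

```python
import re

# Matches both "[TIMER]{BUILD}: 1712s" and "[TIMER]{TOTAL} 1753s" (no colon).
TIMER_RE = re.compile(r"^\[TIMER\]\{(\w+)\}:?\s*(\d+)s$")


def parse_timers(lines):
    """Return {stage: seconds} for every [TIMER] line found."""
    timers = {}
    for line in lines:
        m = TIMER_RE.match(line.strip())
        if m:
            timers[m.group(1)] = int(m.group(2))
    return timers


summary = """\
[TIMER]{MRPROPER}: 6s
[TIMER]{BUILD}: 1712s
[TIMER]{MODULES}: 8s
[TIMER]{INSTALL}: 22s
[TIMER]{TOTAL} 1753s
""".splitlines()

timers = parse_timers(summary)
total = timers.pop("TOTAL")
staged = sum(timers.values())
print(staged, total - staged)  # 1748 5
```

The four timed stages add up to 1748s, 5s less than the reported total of 1753s. The log does not say where the remainder goes; time spent in steps without their own timer is one plausible explanation.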

KSelfTests

[jmaple@devbox code]$ ~/workspace/auto_kernel_history_rebuild/Rocky10/rocky10/code/get_kselftest_diff.sh
kselftest.5.14.0-jmaple_sig-cloud-9_5.14.0-570.39.1.el9_6-f6a810230c4c+.log
318
kselftest.5.14.0-jmaple_rlc-9_5.14.0-570.55.1.el9_6-3f4cd04f0c2c+.log
318
kselftest.5.14.0-jmaple_fips-9-compliant_5.14.0-570.55.1.el9_6-516395f+.log
318
kselftest.5.14.0-jmaple_rlc-9_5.14.0-570.58.1.el9_6-1165b486bc23+.log
318
Before: kselftest.5.14.0-jmaple_fips-9-compliant_5.14.0-570.55.1.el9_6-516395f+.log
After: kselftest.5.14.0-jmaple_rlc-9_5.14.0-570.58.1.el9_6-1165b486bc23+.log
Diff:
-ok 7 selftests: timers: raw_skew # SKIP
+ok 7 selftests: timers: raw_skew
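
The contents of `get_kselftest_diff.sh` are not shown here. As a rough illustration of the comparison it reports, the sketch below diffs the TAP result lines of two logs with Python's `difflib`. The function names are hypothetical, and apart from the `raw_skew` line the sample log content is invented for the example.

```python
import difflib


def result_lines(log_text):
    """Keep only TAP result lines ('ok ...' / 'not ok ...')."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if line.strip().startswith(("ok ", "not ok "))
    ]


def kselftest_diff(before, after):
    """Return only the changed result lines, prefixed with '-' or '+'."""
    diff = difflib.unified_diff(
        result_lines(before), result_lines(after), lineterm="", n=0
    )
    return [
        d for d in diff
        if d.startswith(("+", "-")) and not d.startswith(("+++", "---"))
    ]


# Invented sample logs; only the raw_skew line comes from the output above.
before = "ok 1 selftests: example: placeholder\nok 7 selftests: timers: raw_skew # SKIP\n"
after = "ok 1 selftests: example: placeholder\nok 7 selftests: timers: raw_skew\n"

for line in kselftest_diff(before, after):
    print(line)
```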

PlaidCat and others added 23 commits November 3, 2025 12:50
jira SECO-170

In Rocky9 if you run ./run_vmtests.sh -t hmm it will fail and cause an
infinite loop on ASSERTs in FIXTURE_TEARDOWN()
This temporary fix is based on the discussion here
https://patchwork.kernel.org/project/linux-kselftest/patch/26017fe3-5ad7-6946-57db-e5ec48063ceb@suse.cz/#25046055

We will investigate further kselftest updates that will resolve the root
causes of this.

Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3207
feature tools_hv
commit-author Shradha Gupta <shradhagupta@linux.microsoft.com>
commit a9c0b33

Allow the KVP daemon to log the KVP updates triggered in the VM
with a new debug flag(-d).
When the daemon is started with this flag, it logs updates and debug
information in syslog with loglevel LOG_DEBUG. This information comes
in handy for debugging issues where the key-value pairs for certain
pools show mismatch/incorrect values.
The distro-vendors can further consume these changes and modify the
respective service files to redirect the logs to specific files as
needed.

	Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
	Reviewed-by: Naman Jain <namjain@linux.microsoft.com>
	Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/1744715978-8185-1-git-send-email-shradhagupta@linux.microsoft.com
	Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1744715978-8185-1-git-send-email-shradhagupta@linux.microsoft.com>
(cherry picked from commit a9c0b33)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3545
commit-author Dexuan Cui <decui@microsoft.com>
commit b2f9665

Currently storvsc_timeout is only used in storvsc_sdev_configure(), and
5s and 10s are used elsewhere. It turns out that rarely the 5s is not
enough on Azure, so let's use storvsc_timeout everywhere.

In case a timeout happens and storvsc_channel_init() returns an error,
close the VMBus channel so that any host-to-guest messages in the
channel's ringbuffer, which might come late, can be safely ignored.

Add a "const" to storvsc_timeout.

	Cc: stable@kernel.org
	Signed-off-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/1749243459-10419-1-git-send-email-decui@microsoft.com
	Reviewed-by: Long Li <longli@microsoft.com>
	Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b2f9665)
	Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3554
commit-author Michael Kelley <mhklinux@outlook.com>
commit 380b75d

vmbus_sendpacket_mpb_desc() is currently used only by the storvsc driver
and is hardcoded to create a single GPA range. To allow it to also be
used by the netvsc driver to create multiple GPA ranges, no longer
hardcode as having a single GPA range. Allow the calling driver to
specify the rangecount in the supplied descriptor.

Update the storvsc driver to reflect this new approach.

	Cc: <stable@vger.kernel.org> # 6.1.x
	Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/20250513000604.1396-2-mhklinux@outlook.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 380b75d)
	Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3554
commit-author Michael Kelley <mhklinux@outlook.com>
commit 4f98616

netvsc currently uses vmbus_sendpacket_pagebuffer() to send VMBus
messages. This function creates a series of GPA ranges, each of which
contains a single PFN. However, if the rndis header in the VMBus
message crosses a page boundary, the netvsc protocol with the host
requires that both PFNs for the rndis header must be in a single "GPA
range" data structure, which isn't possible with
vmbus_sendpacket_pagebuffer(). As the first step in fixing this, add a
new function netvsc_build_mpb_array() to build a VMBus message with
multiple GPA ranges, each of which may contain multiple PFNs. Use
vmbus_sendpacket_mpb_desc() to send this VMBus message to the host.

There's no functional change since higher levels of netvsc don't
maintain or propagate knowledge of contiguous PFNs. Based on its
input, netvsc_build_mpb_array() still produces a separate GPA range
for each PFN and the behavior is the same as with
vmbus_sendpacket_pagebuffer(). But the groundwork is laid for a
subsequent patch to provide the necessary grouping.

	Cc: <stable@vger.kernel.org> # 6.1.x
	Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/20250513000604.1396-3-mhklinux@outlook.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 4f98616)
	Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3554
commit-author Michael Kelley <mhklinux@outlook.com>
commit 41a6328

Starting with commit dca5161 ("hv_netvsc: Check status in
SEND_RNDIS_PKT completion message") in the 6.3 kernel, the Linux
driver for Hyper-V synthetic networking (netvsc) occasionally reports
"nvsp_rndis_pkt_complete error status: 2".[1] This error indicates
that Hyper-V has rejected a network packet transmit request from the
guest, and the outgoing network packet is dropped. Higher level
network protocols presumably recover and resend the packet so there is
no functional error, but performance is slightly impacted. Commit
dca5161 is not the cause of the error -- it only added reporting
of an error that was already happening without any notice. The error
has presumably been present since the netvsc driver was originally
introduced into Linux.

The root cause of the problem is that the netvsc driver in Linux may
send an incorrectly formatted VMBus message to Hyper-V when
transmitting the network packet. The incorrect formatting occurs when
the rndis header of the VMBus message crosses a page boundary due to
how the Linux skb head memory is aligned. In such a case, two PFNs are
required to describe the location of the rndis header, even though
they are contiguous in guest physical address (GPA) space. Hyper-V
requires that two rndis header PFNs be in a single "GPA range" data
structure, but current netvsc code puts each PFN in its own GPA range,
which Hyper-V rejects as an error.

The incorrect formatting occurs only for larger packets that netvsc
must transmit via a VMBus "GPA Direct" message. There's no problem
when netvsc transmits a smaller packet by copying it into a pre-
allocated send buffer slot because the pre-allocated slots don't have
page crossing issues.

After commit 14ad6ed ("net: allow small head cache usage with
large MAX_SKB_FRAGS values") in the 6.14-rc4 kernel, the error occurs
much more frequently in VMs with 16 or more vCPUs. It may occur every
few seconds, or even more frequently, in an ssh session that outputs a
lot of text. Commit 14ad6ed subtly changes how skb head memory is
allocated, making it much more likely that the rndis header will cross
a page boundary when the vCPU count is 16 or more. The changes in
commit 14ad6ed are perfectly valid -- they just had the side
effect of making the netvsc bug more prominent.

Current code in init_page_array() creates a separate page buffer array
entry for each PFN required to identify the data to be transmitted.
Contiguous PFNs get separate entries in the page buffer array, and any
information about contiguity is lost.

Fix the core issue by having init_page_array() construct the page
buffer array to represent contiguous ranges rather than individual
pages. When these ranges are subsequently passed to
netvsc_build_mpb_array(), it can build GPA ranges that contain
multiple PFNs, as required to avoid the error "nvsp_rndis_pkt_complete
error status: 2". If instead the network packet is sent by copying
into a pre-allocated send buffer slot, the copy proceeds using the
contiguous ranges rather than individual pages, but the result of the
copying is the same. Also fix rndis_filter_send_request() to construct
a contiguous range, since it has its own page buffer array.
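
The grouping idea described above can be modeled in a few lines. This is a simplified Python sketch, not the kernel code: it treats each entry as a bare PFN and collapses runs of consecutive PFNs into `(first_pfn, count)` ranges, whereas the driver's page buffer entries also carry offsets and lengths. The name `group_contiguous` is hypothetical.

```python
def group_contiguous(pfns):
    """Collapse a list of PFNs into (first_pfn, count) ranges.

    Simplified model: real page buffer entries also track offset and length.
    """
    ranges = []
    for pfn in pfns:
        if ranges and ranges[-1][0] + ranges[-1][1] == pfn:
            # pfn directly follows the last range, so extend that range.
            first, count = ranges[-1]
            ranges[-1] = (first, count + 1)
        else:
            # Gap in the PFN sequence: start a new range.
            ranges.append((pfn, 1))
    return ranges


print(group_contiguous([100, 101, 200, 300, 301, 302]))
# [(100, 2), (200, 1), (300, 3)]
```

In this model, an rndis header that crosses a page boundary occupies two consecutive PFNs and therefore lands in a single range, which is the property the commit message says Hyper-V requires.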

This change has a side benefit in CoCo VMs in that netvsc_dma_map()
calls dma_map_single() on each contiguous range instead of on each
page. This results in fewer calls to dma_map_single() but on larger
chunks of memory, which should reduce contention on the swiotlb.

Since the page buffer array now contains one entry for each contiguous
range instead of for each individual page, the number of entries in
the array can be reduced, saving 208 bytes of stack space in
netvsc_xmit() when MAX_SKB_FRAGS has the default value of 17.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=217503

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217503
	Cc: <stable@vger.kernel.org> # 6.1.x
	Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/20250513000604.1396-4-mhklinux@outlook.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 41a6328)
	Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3554
commit-author Michael Kelley <mhklinux@outlook.com>
commit 5bbc644

init_page_array() now always creates a single page buffer array entry
for the rndis message, even if the rndis message crosses a page
boundary. As such, the number of page buffer array entries used for
the rndis message must no longer be tracked -- it is always just 1.
Remove the rmsg_pgcnt field and use "1" where the value is needed.

	Cc: <stable@vger.kernel.org> # 6.1.x
	Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/20250513000604.1396-5-mhklinux@outlook.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 5bbc644)
	Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-3554
commit-author Michael Kelley <mhklinux@outlook.com>
commit 45a442f

With the netvsc driver changed to use vmbus_sendpacket_mpb_desc()
instead of vmbus_sendpacket_pagebuffer(), the latter has no remaining
callers. Remove it.

	Cc: <stable@vger.kernel.org> # 6.1.x
	Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/20250513000604.1396-6-mhklinux@outlook.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 45a442f)
	Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Shreeya Patel <spatel@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Signed-off-by: Jeremy Allison <jallison@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
In essiv_aead_setkey(), use the same logic as crypto_authenc_esn_setkey()
to zeroize keys on exit.

[Sultan: touched up commit message]

Signed-off-by: Jason Rodriguez <jrodriguez@ciq.com>
Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
        Using the kernel crypto API, the SHA3-256 algorithm is used as
        conditioning element to replace the LFSR in the Jitter RNG. All other
        parts of the Jitter RNG are unchanged.

        The application and use of the SHA-3 conditioning operation is identical
        to the user space Jitter RNG 3.4.0 by applying the following concept:

        - the Jitter RNG initializes a SHA-3 state which acts as the "entropy
          pool" when the Jitter RNG is allocated.

        - When a new time delta is obtained, it is inserted into the "entropy
          pool" with a SHA-3 update operation. Note, this operation in most of
          the cases is a simple memcpy() onto the SHA-3 stack.

        - To cause a true SHA-3 operation for each time delta operation, a
          second SHA-3 operation is performed hashing Jitter RNG status
          information. The final message digest is also inserted into the
          "entropy pool" with a SHA-3 update operation. Yet, this data is not
          considered to provide any entropy, but it shall stir the entropy pool.

        - To generate a random number, a SHA-3 final operation is performed to
          calculate a message digest followed by an immediate SHA-3 init to
          re-initialize the "entropy pool". The obtained message digest is one
          block of the Jitter RNG that is returned to the caller.

        Mathematically speaking, the random number generated by the Jitter RNG
        is:

        aux_t = SHA-3(Jitter RNG state data)

        Jitter RNG block = SHA-3(time_i || aux_i || time_(i-1) || aux_(i-1) ||
                                 ... || time_(i-255) || aux_(i-255))

        when assuming that the OSR = 1, i.e. the default value.

        This operation implies that the Jitter RNG has an output-blocksize of
        256 bits instead of the 64 bits of the LFSR-based Jitter RNG that is
        replaced with this patch.
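
The conditioning flow described above can be illustrated with Python's `hashlib.sha3_256`. This is only a structural sketch under stated assumptions: the class name, the 8-byte little-endian encoding of each time delta, and the `status` bytes are invented for illustration, and the inputs are deterministic, so this is not a random number generator.

```python
import hashlib


class JitterPoolSketch:
    """Structural model of the SHA3-256 'entropy pool' conditioning."""

    def __init__(self):
        # SHA-3 state acting as the "entropy pool".
        self.pool = hashlib.sha3_256()

    def insert_time_delta(self, delta: int, status: bytes) -> None:
        # Insert the time delta with a SHA-3 update operation.
        self.pool.update(delta.to_bytes(8, "little"))
        # Second SHA-3 operation over status data; its digest stirs the pool
        # but is not credited with entropy.
        aux = hashlib.sha3_256(status).digest()
        self.pool.update(aux)

    def generate_block(self) -> bytes:
        # SHA-3 final, followed immediately by a re-init of the pool.
        block = self.pool.digest()
        self.pool = hashlib.sha3_256()
        return block


rng = JitterPoolSketch()
for i in range(256):  # 256 time deltas per block, as in the formula (OSR = 1)
    rng.insert_time_delta(i, b"status")  # placeholder inputs, not real timing
block = rng.generate_block()
print(len(block) * 8)  # 256
```

The 32-byte digest corresponds to the 256-bit output block size mentioned above, compared with 64 bits for the LFSR-based version.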

        The patch also replaces the varying number of invocations of the
        conditioning function with one fixed number of invocations. The use
        of the conditioning function is consistent with the userspace Jitter
        RNG library version 3.4.0.

        The code is tested with a system that exhibited the least amount of
        entropy generated by the Jitter RNG: the SiFive Unmatched RISC-V
        system. The measured entropy rate is well above the heuristically
        implied entropy value of 1 bit of entropy per time delta. On all other
        tested systems, the measured entropy rate is even higher by orders
        of magnitude. The measurement was performed using updated tooling
        provided with the user space Jitter RNG library test framework.

        The performance of the Jitter RNG with this patch is about on par
        with the performance of the Jitter RNG without the patch.

        Signed-off-by: Stephan Mueller <smueller@chronox.de>
        Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

            Back-port of commit bb897c5
            Author: Stephan Müller <smueller@chronox.de>
            Date:   Fri Apr 21 08:08:04 2023 +0200

Signed-off-by: Jeremy Allison <jallison@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
    I.G 9.7.B for FIPS 140-3 specifies that variables temporarily holding
    cryptographic information should be zeroized once they are no longer
    needed. Accomplish this by using kfree_sensitive for buffers that
    previously held the private key.

    Signed-off-by: Hailey Mothershead <hailmo@amazon.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

        Back-ported from commit 23e4099
        Author: Hailey Mothershead <hailmo@amazon.com>
        Date:   Mon Apr 15 22:19:15 2024 +0000

Signed-off-by: Jeremy Allison <jallison@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
private_key is overwritten with the key parameter passed in by the
caller (if present), or alternatively a newly generated private key.
However, it is possible that the caller provides a key (or the newly
generated key) which is shorter than the previous key. In that
scenario, some key material from the previous key would not be
overwritten. The easiest solution is to explicitly zeroize the entire
private_key array first.

Note that this patch slightly changes the behavior of this function:
previously, if the ecc_gen_privkey failed, the old private_key would
remain. Now, the private_key is always zeroized. This behavior is
consistent with the case where params.key is set and ecc_is_key_valid
fails.
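
The residue problem is easy to reproduce in miniature. The Python sketch below models `private_key` as a fixed-size `bytearray`. The buffer size, key values, and function names are made up for the example and do not mirror the kernel's data layout.

```python
def set_key_without_zeroize(buf: bytearray, new_key: bytes) -> None:
    """Overwrite only the first len(new_key) bytes of the buffer."""
    buf[:len(new_key)] = new_key


def set_key_with_zeroize(buf: bytearray, new_key: bytes) -> None:
    """Clear the whole buffer first, then copy the new key in."""
    buf[:] = bytes(len(buf))
    buf[:len(new_key)] = new_key


old_key = bytes([0xAA]) * 8      # previous 8-byte key (illustrative)
short_key = b"\x01\x02\x03\x04"  # new key, shorter than the old one

stale = bytearray(old_key)
set_key_without_zeroize(stale, short_key)
print(stale.hex())  # 01020304aaaaaaaa: tail of the old key survives

clean = bytearray(old_key)
set_key_with_zeroize(clean, short_key)
print(clean.hex())  # 0102030400000000
```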

Signed-off-by: Joachim Vandersmissen <git@jvdsn.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
[ Upstream commit ba3c557 ]

When the mpi_ec_ctx structure is initialized, some fields are not
cleared, causing a crash when referencing the field when the
structure was released. Initially, this issue was ignored because
memory for mpi_ec_ctx is allocated with the __GFP_ZERO flag.
For example, this error will be triggered when calculating the
Za value for SM2 separately.

Fixes: d58bb7e ("lib/mpi: Introduce ec implementation to MPI library")
Cc: stable@vger.kernel.org # v6.5
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
When FIPS mode is enabled (via fips=1), there is an absolute need for the
DRBG to be available. This is at odds with the fact that the DRBG can be
built as a module when in FIPS mode, leaving critical RNG functionality at
the whims of userspace.

Userspace could simply rmmod the DRBG module, or not provide it at all and
thus a different stdrng algorithm could be used without anyone noticing.

Additionally, when running a FIPS-enabled userspace, modprobe itself may
perform a getrandom() syscall _before_ loading a given module. As a result,
there's a possible deadlock scenario where the RNG core (crypto/rng.c)
initializes _before_ the DRBG, thereby installing its getrandom() override
without having an stdrng algorithm available. Then, when userspace calls
getrandom() which redirects to the override in crypto/rng.c,
crypto_alloc_rng("stdrng") invokes the UMH (modprobe) to load the DRBG
(which is aliased to stdrng). And *then* that modprobe invocation gets
stuck at getrandom() because there's no stdrng algorithm available!

There are too many risks that come with allowing the DRBG and RNG core to
be modular for FIPS mode. Therefore, make CRYPTO_FIPS require the DRBG to
be built-in, which in turn makes the DRBG require the RNG core to be
built-in. That way, it's guaranteed for these drivers to be built-in when
running in FIPS mode.

Also clean up the CRYPTO_FIPS option name and remove the CRYPTO_ANSI_CPRNG
dependency since it's obsolete for FIPS now.

Signed-off-by: Sultan Alsawaf <sultan@ciq.com>

Signed-off-by: Jonathan Maple <jmaple@ciq.com>
It is technically a risk to permit extrng registration by modules after
kernel init completes. Since there is only one user of the extrng interface
and it is imperative that it is the _only_ registered extrng for FIPS
compliance, restrict the extrng registration interface to only permit
registration during kernel init and only from built-in drivers.

This also eliminates the risks associated with the extrng interface itself
being designed to solely accommodate a single registration, which would
therefore permit the registered extrng to be overridden or even removed by
an unrelated module.

Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
There is no reason this refcount should be a signed int. Convert it to an
unsigned int, thereby also making it less likely to ever overflow.

Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
In FIPS mode, the DRBG must take precedence over all stdrng algorithms.
The only problem standing in the way of this is that a different stdrng
algorithm could get registered and utilized before the DRBG is registered,
and since crypto_alloc_rng() only allocates an stdrng algorithm when
there's no existing allocation, this means that it's possible for the wrong
stdrng algorithm to remain in use indefinitely.

This issue is also often impossible to observe from userspace; an RNG other
than the DRBG could be used somewhere in the kernel and userspace would be
none the wiser.

To ensure this can never happen, only allow stdrng instances from the DRBG
to be registered when running in FIPS mode. This works since the previous
commit forces the DRBG to be built into the kernel when CONFIG_CRYPTO_FIPS
is enabled, so the DRBG's presence is guaranteed when fips_enabled is true.

Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
None of the ciphers used by the DRBG have an alignment requirement; thus,
they all return 0 from .crypto_init, resulting in inconsistent alignment
across all buffers.

Align all buffers to at least a cache line to improve performance. This is
especially useful when multiple DRBG instances are used, since it prevents
false sharing of cache lines between the different instances.

Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Since crypto_devrandom_read_iter() is invoked directly by user tasks and is
accessible by every task in the system, there are glaring priority
inversions on crypto_reseed_rng_lock and crypto_default_rng_lock.

Tasks of arbitrary scheduling priority access crypto_devrandom_read_iter().
When a low-priority task owns one of the mutex locks, higher-priority tasks
waiting on that mutex lock are stalled until the low-priority task is done.

Fix the priority inversions by converting the mutex locks into rt_mutex
locks which have PI support.

Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Like pin_user_pages_fast(), but with the internal-only FOLL_FAST_ONLY flag.

This complements the get_user_pages*() API, which already has
get_user_pages_fast_only().

Note that pin_user_pages_fast_only() used to exist but was removed in
upstream commit edad1bb ("mm/gup: remove pin_user_pages_fast_only()")
due to it not having any users.

Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
When the kernel is booted with fips=1, the RNG exposed to userspace is
hijacked away from the CRNG and redirects to crypto_devrandom_read_iter(),
which utilizes the DRBG.

Notably, crypto_devrandom_read_iter() maintains just two global DRBG
instances _for the entire system_, and the two instances serve separate
request types: one instance for GRND_RANDOM requests (crypto_reseed_rng),
and one instance for non-GRND_RANDOM requests (crypto_default_rng). So in
essence, for requests of a single type, there is just one global RNG for
all CPUs in the entire system, which scales _very_ poorly.

To make matters worse, the temporary buffer used to ferry data between the
DRBG and userspace is woefully small at only 256 bytes, which doesn't do a
good job of maximizing throughput from the DRBG. This results in lost
performance when userspace requests >256 bytes; it is observed that DRBG
throughput improves by 70% on an i9-13900H when the buffer size is
increased to 4096 bytes (one page). Going beyond the size of one page up to
the DRBG maximum request limit of 65536 bytes produces diminishing returns
of only 3% improved throughput in comparison. And going below the size of
one page produces progressively less throughput at each power of 2: there's
a 5% loss going from 4096 bytes to 2048 bytes and a 9% loss going from 2048
bytes to 1024 bytes.
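
To put the quoted percentages on one scale, the short calculation below expresses each buffer size's throughput relative to the 256-byte baseline. It assumes the 3% figure for 65536 bytes is measured against the 4096-byte result, which is how the sentence reads but is not stated outright. No absolute throughput numbers are given in the commit message, so only ratios are computed.

```python
def relative_throughput():
    """Throughput of each buffer size relative to the 256-byte baseline."""
    base_256 = 1.0
    at_4096 = base_256 * 1.70        # +70% going from 256 to 4096 bytes
    at_65536 = at_4096 * 1.03        # assumed: +3% relative to 4096 bytes
    at_2048 = at_4096 * (1 - 0.05)   # 5% loss going from 4096 to 2048 bytes
    at_1024 = at_2048 * (1 - 0.09)   # 9% loss going from 2048 to 1024 bytes
    return {
        256: base_256,
        1024: at_1024,
        2048: at_2048,
        4096: at_4096,
        65536: at_65536,
    }


for size, ratio in relative_throughput().items():
    print(f"{size:>6} bytes: {ratio:.3f}x")
```

Under these assumptions, a 1024-byte buffer would still be roughly 1.47 times the 256-byte baseline, and the step from 4096 to 65536 bytes adds comparatively little, which matches the choice of a page-sized buffer.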

Thus, this implements per-CPU DRBG instances utilizing a page-sized buffer
for each CPU to utilize the DRBG itself more effectively. On top of that,
for non-GRND_RANDOM requests, the DRBG's operations now occur under a local
lock that disables preemption on non-PREEMPT_RT kernels, which not only
keeps each CPU's DRBG instance isolated from another, but also improves
temporal cache locality while the DRBG actively generates a new string of
random bytes.

Prefaulting one user destination page at a time is also employed to prevent
a DRBG instance from getting blocked on page faults, thereby maximizing the
use of the DRBG so that the only bottleneck is the DRBG itself.

Signed-off-by: Sultan Alsawaf <sultan@ciq.com>
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
We want to hard-set the x86_64 FIPS-required configs rather than rely on
default settings in the kernel. Should those defaults ever change without
our knowing, it is not something we would have actively checked.

The configs are a limited set that is expanded when building with
`make olddefconfig`, a common practice in kernel building.

Note: the following had to be added manually since it is normally set by
the RPM build process.
CONFIG_CRYPTO_FIPS_NAME="Rocky Linux 9 Kernel Cryptographic API"

Signed-off-by: Jonathan Maple <jmaple@ciq.com>
@PlaidCat PlaidCat requested a review from a team November 3, 2025 19:14
@PlaidCat PlaidCat self-assigned this Nov 3, 2025
@bmastbergen bmastbergen left a comment

🥌

@PlaidCat PlaidCat requested a review from a team November 3, 2025 20:26
@PlaidCat PlaidCat merged commit 1165b48 into rlc-9/5.14.0-570.58.1.el9_6 Nov 5, 2025
4 checks passed