[LTS 8.8] nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu() #234

pvts-mat · 2025-04-28T23:09:31Z

[LTS 8.8]
CVE-2025-21927
VULN-56024

Problem

https://www.cve.org/CVERecord?id=CVE-2025-21927

In the Linux kernel, the following vulnerability has been resolved:

nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu()

nvme_tcp_recv_pdu() doesn't check the validity of the header length. When header digests are enabled, a target might send a packet with an invalid header length (e.g. 255), causing nvme_tcp_verify_hdgst() to access memory outside the allocated area and cause memory corruptions by overwriting it with the calculated digest.

Fix this by rejecting packets with an unexpected header length.
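The failure mode described above can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the kernel code: the 24-byte header size, the 4-byte digest slot, the position of the header-length byte at offset 2, and the names `digest_write_range`, `write_is_in_bounds` and `recv_pdu_model` are all hypothetical choices made for illustration.

```python
# Toy model of the header-digest overrun. All sizes, offsets and names are
# assumptions made for illustration; none are taken from the kernel source.
import zlib

HDR_LEN = 24                          # assumed size of a rsp/c2h_data/r2t header
DIGEST_LEN = 4                        # assumed size of the header digest
PDU_BUF_LEN = HDR_LEN + DIGEST_LEN    # assumed size of the receive buffer
HLEN_OFFSET = 2                       # assumed offset of the header-length byte


def digest_write_range(hlen):
    """Byte range a digest stored right after the header would occupy."""
    return hlen, hlen + DIGEST_LEN


def write_is_in_bounds(hlen):
    """True if storing the digest at offset hlen stays inside the buffer."""
    _, end = digest_write_range(hlen)
    return end <= PDU_BUF_LEN


def recv_pdu_model(pdu, check_hlen):
    """Classify one PDU as 'ok', 'rejected' or 'out-of-bounds write'."""
    hlen = pdu[HLEN_OFFSET]           # value chosen by the (untrusted) target
    if check_hlen and hlen != HDR_LEN:
        return "rejected"             # patched behavior: refuse the PDU
    if not write_is_in_bounds(hlen):
        return "out-of-bounds write"  # unpatched behavior with a bad hlen
    # zlib.crc32 stands in for the real digest; only the offsets matter here.
    _ = zlib.crc32(bytes(pdu[:hlen]))
    return "ok"


good = bytearray(PDU_BUF_LEN)
good[HLEN_OFFSET] = HDR_LEN
bad = bytearray(PDU_BUF_LEN)
bad[HLEN_OFFSET] = 255

print(recv_pdu_model(good, check_hlen=False))  # ok
print(recv_pdu_model(bad, check_hlen=False))   # out-of-bounds write
print(recv_pdu_model(bad, check_hlen=True))    # rejected
```

With the check disabled, a header length of 255 places the digest at bytes 255..258 of a 28-byte buffer in this model; with the check enabled the packet is refused before any digest is computed.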

Analysis and Solution

Context

NVMe (Non-Volatile Memory Express) is a communication protocol designed for accessing high-speed storage media, particularly solid-state drives (SSDs). NVMe over Fabrics (NVMe-oF) is an extension of the NVMe protocol that allows NVMe commands to be sent over a network fabric, enabling remote access to NVMe storage devices.

The "target" mentioned in the CVE description is the host providing access to its local NVMe device (the server). The host importing the remote NVMe device is called simply the "host", or the "initiator" (the client). The module implementing NVMe-oF over TCP on the target's side is nvmet-tcp; on the initiator's side it is nvme-tcp, the subject of this patch.

Applicability

All the key options related to NVMe-oF, specifically CONFIG_NVME_TCP, which enables the nvme-tcp module, are enabled in ciqlts8_8. Per the .config file created from configs/kernel-x86_64.config:

#
# NVME Support
#
CONFIG_NVME_CORE=m
CONFIG_BLK_DEV_NVME=m
CONFIG_NVME_MULTIPATH=y
CONFIG_NVME_VERBOSE_ERRORS=y
# CONFIG_NVME_HWMON is not set
CONFIG_NVME_FABRICS=m
CONFIG_NVME_RDMA=m
CONFIG_NVME_FC=m
CONFIG_NVME_TCP=m
CONFIG_NVME_TARGET=m
# CONFIG_NVME_TARGET_PASSTHRU is not set
CONFIG_NVME_TARGET_LOOP=m
CONFIG_NVME_TARGET_RDMA=m
CONFIG_NVME_TARGET_FC=m
CONFIG_NVME_TARGET_FCLOOP=m
CONFIG_NVME_TARGET_TCP=m

Solution

The solution in the mainline kernel is provided in the ad95bab commit. It was not backported to any stable kernel older than 6.12.

Naive cherry-picking results in conflicts, because git attempts to bring in functions (nvme_tcp_tls_configured, nvme_tcp_queue_tls) and code branches (the nvme_tcp_c2h_term packet type check in the nvme_tcp_recv_pdu function) that were introduced in more recent versions of the module and are unrelated to the bug fix.

One small change was made to the nvme_tcp_recv_pdu_supported function introduced by the official fix ad95bab: the nvme_tcp_c2h_term case was removed. This keeps the behavior of nvme_tcp_recv_pdu consistent between the scenarios of receiving a packet with a proper header and with an improper one.

Consider the possible behaviors when a packet with a proper header is received:

	Packet type:	X ∈ {c2h_term}	X ∈ {c2h_data, rsp, r2t}	X ∉ {c2h_term, c2h_data, rsp, r2t}
a	Mainline, after patch (`ad95bab`)	nvme_tcp_handle_X	nvme_tcp_handle_X	"unsupported pdu type …", -EINVAL
b	ciqlts8_8, after patch, c2h_term included	"unsupported pdu type …", -EINVAL	nvme_tcp_handle_X	"unsupported pdu type …", -EINVAL
c	ciqlts8_8, after patch, c2h_term excluded	"unsupported pdu type …", -EINVAL	nvme_tcp_handle_X	"unsupported pdu type …", -EINVAL

Then, when a packet with an improper header is received:

	Packet type:	X ∈ {c2h_term}	X ∈ {c2h_data, rsp, r2t}	X ∉ {c2h_term, c2h_data, rsp, r2t}
x	Mainline, after patch (`ad95bab`)	"pdu type %d has unexpected header length", -EPROTO	"pdu type %d has unexpected header length", -EPROTO	"unsupported pdu type …", -EINVAL
y	ciqlts8_8, after patch, c2h_term included	"pdu type %d has unexpected header length", -EPROTO	"pdu type %d has unexpected header length", -EPROTO	"unsupported pdu type …", -EINVAL
z	ciqlts8_8, after patch, c2h_term excluded	"unsupported pdu type …", -EINVAL	"pdu type %d has unexpected header length", -EPROTO	"unsupported pdu type …", -EINVAL

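In mainline (rows a and x), each packet type behaves in one of two ways: it is either handled when the header is proper and rejected with -EPROTO when it is not, or rejected as unsupported with -EINVAL in both cases. The pair b, y breaks this pattern, because c2h_term would be reported as unsupported (-EINVAL) with a proper header but as a supported type with a bad header length (-EPROTO) otherwise. The pair c, z keeps the pattern, so a relates to x as c relates to z, not as b relates to y. The c, z pair was therefore chosen.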
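The two tables above can be reproduced with a small Python model, given here as a sketch rather than a rendering of the kernel code. The variant names, the string outcomes, and the order of checks (supported type first, then header length, then dispatch) are assumptions inferred from the table rows.

```python
# Model of the behavior tables; variant names and outcome strings are
# illustrative assumptions, not kernel identifiers.
LTS_HANDLERS = {"c2h_data", "rsp", "r2t"}      # types ciqlts8_8 dispatches
ALL_FOUR = LTS_HANDLERS | {"c2h_term"}

VARIANTS = {
    # variant: (types accepted by the "supported" check, types with a handler)
    "mainline": (ALL_FOUR, ALL_FOUR),                   # rows a, x
    "lts_c2h_term_included": (ALL_FOUR, LTS_HANDLERS),  # rows b, y
    "lts_c2h_term_excluded": (LTS_HANDLERS, LTS_HANDLERS),  # rows c, z
}


def outcome(variant, pdu_type, header_ok):
    """Return 'handle', '-EPROTO' or '-EINVAL' for one received packet."""
    supported, handled = VARIANTS[variant]
    if pdu_type not in supported:
        return "-EINVAL"      # "unsupported pdu type"
    if not header_ok:
        return "-EPROTO"      # "unexpected header length"
    if pdu_type not in handled:
        return "-EINVAL"      # falls through to the default dispatch case
    return "handle"


def is_consistent(variant, pdu_type):
    """False if a type is reported as unsupported with a proper header
    but as supported-with-bad-length with an improper one."""
    results = {outcome(variant, pdu_type, ok) for ok in (True, False)}
    return results != {"-EINVAL", "-EPROTO"}


for name in VARIANTS:
    print(name, is_consistent(name, "c2h_term"))
# mainline True
# lts_c2h_term_included False
# lts_c2h_term_excluded True
```

Only the variant with c2h_term included reports the same packet type under two different error classes, which is the inconsistency the chosen variant avoids.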

kABI check: passed

DESCR_TARGET=1 DEBUG=1 CVE=CVE-2025-21927 ./ninja.sh _kabi_checked__x86_64--test--ciqlts8_8-CVE-2025-21927

[0/1] 	Check ABI of kernel [ciqlts8_8-CVE-2025-21927]	_kabi_checked__x86_64--test--ciqlts8_8-CVE-2025-21927
++ uname -m
+ python3 /data/src/ctrliq-github/kernel-dist-git-el-8.8/SOURCES/check-kabi -k /data/src/ctrliq-github/kernel-dist-git-el-8.8/SOURCES/Module.kabi_x86_64 -s vms/x86_64--build--ciqlts8_8/build_files/kernel-src-tree-ciqlts8_8-CVE-2025-21927/Module.symvers
kABI check passed
+ touch state/kernels/ciqlts8_8-CVE-2025-21927/x86_64/kabi_checked

Boot test: passed

boot-test.log

Kselftests: passed relative

Methodology

The selftests were source-compiled from the recent ciqlts8_8 branch (commit f10433c). The bpf suite was run from the kernel-selftests-internal package.

The tests were run using an explicit list which omitted certain tests known to give inconsistent results. Details are in the src/run-kselftests.sh script of the rocky-patching project.

Coverage

android, bpf, breakpoints, capabilities, cgroup, core, cpu-hotplug, cpufreq, drivers/net/bonding, drivers/net/team, efivarfs, exec, firmware, fpu, ftrace, futex, gpio, intel_pstate, ipc, kcmp, kexec, kvm, lib, livepatch, membarrier, memfd, memory-hotplug, mount, mqueue, net (except ip_defrag.sh, udpgso_bench.sh, txtimestamp.sh, gro.sh), net/forwarding (except ipip_hier_gre_keys.sh, sch_ets.sh, sch_tbf_ets.sh, sch_tbf_prio.sh, sch_tbf_root.sh, tc_actions.sh), net/mptcp, netfilter (except nft_trans_stress.sh), nsfs, pstore, ptrace, rseq, sgx, sigaltstack, size, splice, static_keys, sync, sysctl, tc-testing, tdx, timens, timers, tpm2, user, vm, x86, zram

Reference

kselftests--mix--ciqlts8_8--run1.log
kselftests--mix--ciqlts8_8--run2.log
kselftests--mix--ciqlts8_8--run3.log

Patch

kselftests--mix--ciqlts8_8-CVE-2025-21927--run1.log
kselftests--mix--ciqlts8_8-CVE-2025-21927--run2.log

Comparison

ktests.xsh diff -d kselftests-*.log

Column    File
--------  ---------------------------------------------------
Status0   kselftests--mix--ciqlts8_8--run1.log
Status1   kselftests--mix--ciqlts8_8--run2.log
Status2   kselftests--mix--ciqlts8_8--run3.log
Status3   kselftests--mix--ciqlts8_8-CVE-2025-21927--run1.log
Status4   kselftests--mix--ciqlts8_8-CVE-2025-21927--run2.log

TestCase                   Status0  Status1  Status2  Status3  Status4  Summary
net/mptcp:simult_flows.sh  fail     pass     pass     pass     pass     diff
net:reuseport_addr_any.sh  fail     pass     fail     fail     fail     diff
net:xfrm_policy.sh         fail     fail     pass     fail     pass     diff

Every test listed above already shows both outcomes (pass and fail) within the reference batch itself, so the differences point to unstable tests rather than to a regression introduced by the patch.
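The reasoning behind this conclusion can be expressed as a short check, sketched here in Python. The statuses are copied from the comparison table; the function name `explained_by_reference` and the subset criterion are assumptions about how to formalize the argument, not part of ktests.xsh.

```python
# Statuses copied from the comparison table: Status0..Status2 are reference
# runs, Status3..Status4 are runs of the patched kernel.
REFERENCE_RUNS = 3

results = {
    "net/mptcp:simult_flows.sh": ["fail", "pass", "pass", "pass", "pass"],
    "net:reuseport_addr_any.sh": ["fail", "pass", "fail", "fail", "fail"],
    "net:xfrm_policy.sh": ["fail", "fail", "pass", "fail", "pass"],
}


def explained_by_reference(statuses, n_ref=REFERENCE_RUNS):
    """True if every outcome seen on the patched kernel was also seen
    in at least one reference run (hypothetical criterion)."""
    reference = set(statuses[:n_ref])
    patched = set(statuses[n_ref:])
    return patched <= reference


unexplained = [test for test, statuses in results.items()
               if not explained_by_reference(statuses)]
print(unexplained)  # []
```

An empty list means no patched-kernel outcome is new relative to the reference runs. A test that passed in all reference runs and failed only on the patched kernel would be reported here.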

Differences highlights

net/mptcp:simult_flows.sh

This is a performance test, nondeterministic by nature.

Example of a successful run:

./ktests.xsh show kselftests--mix--ciqlts8_8--run2.log  --test net/mptcp:simult_flows.sh

# balanced bwidth                                             7409 max 7561       [ OK ]
# balanced bwidth - reverse direction                         7390 max 7561       [ OK ]
# balanced bwidth with unbalanced delay                       7387 max 7561       [ OK ]
# balanced bwidth with unbalanced delay - reverse direction   7391 max 7561       [ OK ]
# unbalanced bwidth                                           3905 max 4005       [ OK ]
# unbalanced bwidth - reverse direction                       3868 max 4005       [ OK ]
# unbalanced bwidth with unbalanced delay                     3894 max 4005       [ OK ]
# unbalanced bwidth with unbalanced delay - reverse direction 3868 max 4005       [ OK ]
# unbalanced bwidth with opposed, unbalanced delay            3859 max 4005       [ OK ]
# unbalanced bwidth with opposed, unbalanced delay - reverse direction3855 max 4005       [ OK ]
ok 1 selftests: net/mptcp: simult_flows.sh

The failed test:

./ktests.xsh show kselftests--mix--ciqlts8_8--run1.log  --test net/mptcp:simult_flows.sh

# balanced bwidth                                             7383 max 7561       [ OK ]
# balanced bwidth - reverse direction                         7390 max 7561       [ OK ]
# balanced bwidth with unbalanced delay                       7399 max 7561       [ OK ]
# balanced bwidth with unbalanced delay - reverse direction   7389 max 7561       [ OK ]
# unbalanced bwidth                                           transfer slower than expected! runtime 4413 ms, expected 4005 ms max 4005        [ fail ]
# client exit code 1, server 0
# 
# netns ns3-0-jEflZx socket stat for 10005:
# State     Recv-Q Send-Q Local Address:Port  Peer Address:Port Process                    
# TIME-WAIT 0      0           10.0.3.3:10005     10.0.1.1:52680 timer:(timewait,59sec,0)
# 	
# TIME-WAIT 0      0           10.0.3.3:10005     10.0.2.1:48335 timer:(timewait,59sec,0)
# 	
# 
# netns ns1-0-jEflZx socket stat for 10005:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
# -rw-------. 1 root root 16777216 Apr 28 07:29 /tmp/tmp.EqL5EPtmLv
# -rw-------. 1 root root 16777216 Apr 28 07:29 /tmp/tmp.FWMds3XXGs
# -rw-------. 1 root root 81920 Apr 28 07:29 /tmp/tmp.dNXMKVvXiU
# -rw-------. 1 root root 81920 Apr 28 07:29 /tmp/tmp.qJ11Vm2cpV
# unbalanced bwidth - reverse direction                       3838 max 4005       [ OK ]
# unbalanced bwidth with unbalanced delay                     3859 max 4005       [ OK ]
# unbalanced bwidth with unbalanced delay - reverse direction 3838 max 4005       [ OK ]
# unbalanced bwidth with opposed, unbalanced delay            3898 max 4005       [ OK ]
# unbalanced bwidth with opposed, unbalanced delay - reverse direction3838 max 4005       [ OK ]
not ok 1 selftests: net/mptcp: simult_flows.sh # exit=1

The transfer took 4413 ms against a maximum of 4005 ms, exceeding the failure threshold by 408 ms.

net:reuseport_addr_any.sh

It's unclear at the moment what the root cause is for the test failing in some runs and not in others. What is known, however, is that when it fails it's always for the same reason:

./ktests.xsh show_groups kselftests*.log  --test net:reuseport_addr_any.sh

kselftests--mix--ciqlts8_8--run1.log:
kselftests--mix--ciqlts8_8--run3.log:
kselftests--mix--ciqlts8_8-CVE-2025-21927--run1.log:
kselftests--mix--ciqlts8_8-CVE-2025-21927--run2.log:
net:reuseport_addr_any.sh:
# UDP IPv4 ... pass
# UDP IPv6 ... pass
# UDP IPv4 mapped to IPv6 ... pass
# TCP IPv4 ... ./reuseport_addr_any: received on an unexpected socket
not ok 1 selftests: net: reuseport_addr_any.sh # exit=1

kselftests--mix--ciqlts8_8--run2.log:
net:reuseport_addr_any.sh:
# UDP IPv4 ... pass
# UDP IPv6 ... pass
# UDP IPv4 mapped to IPv6 ... pass
# TCP IPv4 ... pass
# TCP IPv6 ... pass
# TCP IPv4 mapped to IPv6 ... pass
# DCCP not supported: skipping DCCP tests
# SUCCESS
ok 1 selftests: net: reuseport_addr_any.sh

net:xfrm_policy.sh

When the test fails it's always for the same reason: exceeding the timeout of 300 seconds when inserting the policies in random order.

./ktests.xsh show_groups kselftests*.log  --test net:xfrm_policy.sh

kselftests--mix--ciqlts8_8--run1.log:
kselftests--mix--ciqlts8_8--run2.log:
kselftests--mix--ciqlts8_8-CVE-2025-21927--run1.log:
net:xfrm_policy.sh:
# PASS: policy before exception matches
# PASS: ping to .254 bypassed ipsec tunnel (exceptions)
# PASS: direct policy matches (exceptions)
# PASS: policy matches (exceptions)
# PASS: ping to .254 bypassed ipsec tunnel (exceptions and block policies)
# PASS: direct policy matches (exceptions and block policies)
# PASS: policy matches (exceptions and block policies)
# PASS: ping to .254 bypassed ipsec tunnel (exceptions and block policies after hresh changes)
# PASS: direct policy matches (exceptions and block policies after hresh changes)
# PASS: policy matches (exceptions and block policies after hresh changes)
# PASS: ping to .254 bypassed ipsec tunnel (exceptions and block policies after hthresh change in ns3)
# PASS: direct policy matches (exceptions and block policies after hthresh change in ns3)
# PASS: policy matches (exceptions and block policies after hthresh change in ns3)
# PASS: ping to .254 bypassed ipsec tunnel (exceptions and block policies after htresh change to normal)
# PASS: direct policy matches (exceptions and block policies after htresh change to normal)
# PASS: policy matches (exceptions and block policies after htresh change to normal)
# PASS: policies with repeated htresh change
#
not ok 1 selftests: net: xfrm_policy.sh # TIMEOUT 300 seconds

kselftests--mix--ciqlts8_8--run3.log:
kselftests--mix--ciqlts8_8-CVE-2025-21927--run2.log:
net:xfrm_policy.sh:
# PASS: policy before exception matches
# PASS: ping to .254 bypassed ipsec tunnel (exceptions)
# PASS: direct policy matches (exceptions)
# PASS: policy matches (exceptions)
# PASS: ping to .254 bypassed ipsec tunnel (exceptions and block policies)
# PASS: direct policy matches (exceptions and block policies)
# PASS: policy matches (exceptions and block policies)
# PASS: ping to .254 bypassed ipsec tunnel (exceptions and block policies after hresh changes)
# PASS: direct policy matches (exceptions and block policies after hresh changes)
# PASS: policy matches (exceptions and block policies after hresh changes)
# PASS: ping to .254 bypassed ipsec tunnel (exceptions and block policies after hthresh change in ns3)
# PASS: direct policy matches (exceptions and block policies after hthresh change in ns3)
# PASS: policy matches (exceptions and block policies after hthresh change in ns3)
# PASS: ping to .254 bypassed ipsec tunnel (exceptions and block policies after htresh change to normal)
# PASS: direct policy matches (exceptions and block policies after htresh change to normal)
# PASS: policy matches (exceptions and block policies after htresh change to normal)
# PASS: policies with repeated htresh change
# PASS: policies inserted in random order
ok 1 selftests: net: xfrm_policy.sh

Deeper investigation would be needed to determine whether simply increasing the timeout would stabilize the test.

Specific tests: suspended

An attempt was made to set up an NVMe-oF connection between the VM (initiator) and the physical host (target). A working NVMe target was obtained, as shown by the module's dmesg output:

[6641956.591574] nvmet_tcp: enabling port 1 (192.168.122.1:4420)

However, the connection to the target was refused:

nvme discover -t tcp -a 192.168.122.1 -s 4420

[  469.262269] nvme nvme0: failed to connect socket: -111
Failed to write to /dev/nvme-fabrics: Connection refused

The effort was suspended after some fruitless firewall adjustments. It can be resumed on request.
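For reference, the `-111` in the kernel log line above is a negated Linux errno value. The Python sketch below decodes it; the helper name `decode_connect_failure` is hypothetical, and the table is a hand-written subset of Linux errno numbers, hardcoded so the result does not depend on the platform running the sketch.

```python
import re

# Hand-written subset of Linux errno numbers (generic numbering).
LINUX_ERRNO = {
    110: "ETIMEDOUT",
    111: "ECONNREFUSED",
    113: "EHOSTUNREACH",
}


def decode_connect_failure(line):
    """Extract the errno name from an nvme 'failed to connect socket' line.

    Returns None if the line does not match, 'unknown' if the number is
    not in the table above.
    """
    match = re.search(r"failed to connect socket: -(\d+)", line)
    if match is None:
        return None
    return LINUX_ERRNO.get(int(match.group(1)), "unknown")


log_line = "[  469.262269] nvme nvme0: failed to connect socket: -111"
print(decode_connect_failure(log_line))  # ECONNREFUSED
```

This matches the "Connection refused" message printed by the nvme command line tool, which is consistent with the connection attempt being actively rejected rather than timing out.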

jira VULN-56024
cve CVE-2025-21927
commit-author Maurizio Lombardi <mlombard@redhat.com>
commit ad95bab
upstream-diff Removed `nvme_tcp_c2h_term' case from `nvme_tcp_recv_pdu_supported' for the sake of consistency of `nvme_tcp_recv_pdu''s behavior relative to the upstream version, between the cases of proper and improper header. (What could be considered as "`c2h_term' type support" started with 84e0090 commit, not included in `ciqlts8_8''s history, so `nvme_tcp_recv_pdu_supported' in `ciqlts8_8' shouldn't report the `nvme_tcp_c2h_term' type as supported.)

nvme_tcp_recv_pdu() doesn't check the validity of the header length. When header digests are enabled, a target might send a packet with an invalid header length (e.g. 255), causing nvme_tcp_verify_hdgst() to access memory outside the allocated area and cause memory corruptions by overwriting it with the calculated digest.

Fix this by rejecting packets with an unexpected header length.

Fixes: 3f2304f ("nvme-tcp: add NVMe over TCP host driver")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
(cherry picked from commit ad95bab)
Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>

bmastbergen

🥌

thefossguy-ciq

🚤


pvts-mat changed the title ~~nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu()~~ [LTS 8.8] nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu() Apr 28, 2025

bmastbergen self-requested a review April 29, 2025 12:59

bmastbergen approved these changes Apr 29, 2025


PlaidCat requested review from kerneltoast and thefossguy-ciq April 29, 2025 14:19

thefossguy-ciq approved these changes Apr 29, 2025


pvts-mat mentioned this pull request Apr 30, 2025

[LTS 8.6] nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu() #236

Merged

PlaidCat merged commit 2b353ec into ctrliq:ciqlts8_8 Apr 30, 2025
2 checks passed

pvts-mat mentioned this pull request Apr 30, 2025

[LTS 9.2] nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu() #238

Merged

pvts-mat mentioned this pull request May 12, 2025

[LTS 9.4] nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu() #256

Merged
