Skip to content

Conversation

@roxanan1996
Copy link

@roxanan1996 roxanan1996 commented Oct 27, 2025

DESCRIPTION

Commit "do_change_type(): refuse to operate on unmounted/not ours mounts"
is the CVE fix. This was a clean cherry pick.

Commit "use uniform permission checks for all mount propagation changes"
was included because it has a "Fixes" reference to the previous commit.
This was not a clean cherry-pick, therefore I had to pick up commit:
"move_mount: allow to add a mount into an existing group).

And the former had a dependency "fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)"

NOTE: In case you check the patch diff with bigger context, you may see these
diff in do_change_type(). This is because these commits are missing, but
they are not relevant for this CVE:

  • 78aa08a ("fs: add path_mounted()")
  • 406fea7 ("mount: separate the flags accessed only under namespace_sem")
Screenshot From 2025-10-27 11-10-42

Commits

use uniform permission checks for all mount propagation changes

jira VULN-98610
cve-bf CVE-2025-38498
commit-author Al Viro <viro@zeniv.linux.org.uk>
commit cffd0441872e7f6b1fce5e78fb1c99187a291330
fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)

jira VULN-98610
cve-bf CVE-2025-38498
commit-author Al Viro <viro@zeniv.linux.org.uk>
commit d8cc0362f918d020ca1340d7694f07062dc30f36
move_mount: allow to add a mount into an existing group

jira VULN-98610
cve-bf CVE-2025-38498
commit-author Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
commit 9ffb14ef61bab83fa818736bf3e7e6b6e182e8e2
do_change_type(): refuse to operate on unmounted/not ours mounts

jira VULN-98610
cve CVE-2025-38498
commit-author Al Viro <viro@zeniv.linux.org.uk>
commit 12f147ddd6de7382dad54812e65f3f08d05809fc

TESTING

BUILD

> grep -E -B 5 -A 5 '\[TIMER\]|^Starting Build' /home/rnicolescu/ciq/vms/ciqlts9_4/kernel-build-after.log
  CLEAN   scripts/mod
  CLEAN   scripts/selinux/genheaders
  CLEAN   scripts/selinux/mdp
  CLEAN   scripts
  CLEAN   include/config include/generated arch/x86/include/generated .config .config.old .version Module.symvers certs/signing_key.pem certs/signing_key.x509 certs/x509.genkey
[TIMER]{MRPROPER}: 8s
x86_64 architecture detected, copying config
'configs/kernel-x86_64-rhel.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-rnicolescu_ciqlts9_4-6fc92226e201"
Making olddefconfig
--
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
#
# configuration written to .config
#
Starting Build
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_64.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
--
  BTF [M] sound/usb/snd-usb-audio.ko
  BTF [M] sound/x86/snd-hdmi-lpe-audio.ko
  LD [M]  virt/lib/irqbypass.ko
  BTF [M] sound/xen/snd_xen_front.ko
  BTF [M] virt/lib/irqbypass.ko
[TIMER]{BUILD}: 1412s
Making Modules
  INSTALL /lib/modules/5.14.0-rnicolescu_ciqlts9_4-6fc92226e201+/kernel/arch/x86/crypto/blake2s-x86_64.ko
  INSTALL /lib/modules/5.14.0-rnicolescu_ciqlts9_4-6fc92226e201+/kernel/arch/x86/crypto/blowfish-x86_64.ko
  INSTALL /lib/modules/5.14.0-rnicolescu_ciqlts9_4-6fc92226e201+/kernel/arch/x86/crypto/camellia-aesni-avx-x86_64.ko
  INSTALL /lib/modules/5.14.0-rnicolescu_ciqlts9_4-6fc92226e201+/kernel/arch/x86/crypto/camellia-aesni-avx2.ko
--
  STRIP   /lib/modules/5.14.0-rnicolescu_ciqlts9_4-6fc92226e201+/kernel/sound/usb/misc/snd-ua101.ko
  SIGN    /lib/modules/5.14.0-rnicolescu_ciqlts9_4-6fc92226e201+/kernel/sound/usb/misc/snd-ua101.ko
  STRIP   /lib/modules/5.14.0-rnicolescu_ciqlts9_4-6fc92226e201+/kernel/arch/x86/crypto/blowfish-x86_64.ko
  SIGN    /lib/modules/5.14.0-rnicolescu_ciqlts9_4-6fc92226e201+/kernel/arch/x86/crypto/blowfish-x86_64.ko
  DEPMOD  /lib/modules/5.14.0-rnicolescu_ciqlts9_4-6fc92226e201+
[TIMER]{MODULES}: 9s
Making Install
sh ./arch/x86/boot/install.sh 5.14.0-rnicolescu_ciqlts9_4-6fc92226e201+ \
	arch/x86/boot/bzImage System.map "/boot"
[TIMER]{INSTALL}: 19s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-5.14.0-rnicolescu_ciqlts9_4-6fc92226e201+ and Index to 3
The default is /boot/loader/entries/3894cdebe7764464a6bb5aa6743dc10c-5.14.0-rnicolescu_ciqlts9_4-6fc92226e201+.conf with index 3 and kernel /boot/vmlinuz-5.14.0-rnicolescu_ciqlts9_4-6fc92226e201+
The default is /boot/loader/entries/3894cdebe7764464a6bb5aa6743dc10c-5.14.0-rnicolescu_ciqlts9_4-6fc92226e201+.conf with index 3 and kernel /boot/vmlinuz-5.14.0-rnicolescu_ciqlts9_4-6fc92226e201+
Generating grub configuration file ...
Adding boot menu entry for UEFI Firmware Settings ...
done
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 8s
[TIMER]{BUILD}: 1412s
[TIMER]{MODULES}: 9s
[TIMER]{INSTALL}: 19s
[TIMER]{TOTAL} 1454s
Rebooting in 10 seconds

kernel-build-before.log
kernel-build-after.log

Kselftests

> /home/rnicolescu/ciq/vms/ciqlts9_4/kernel-tools/kselftest-diff.sh /home/rnicolescu/ciq/vms/ciqlts9_4
/home/rnicolescu/ciq/vms/ciqlts9_4/kselftest-before.log
367
/home/rnicolescu/ciq/vms/ciqlts9_4/kselftest-after.log
367
Before: /home/rnicolescu/ciq/vms/ciqlts9_4/kselftest-before.log
After: /home/rnicolescu/ciq/vms/ciqlts9_4/kselftest-after.log
Diff:
No differences found.

kselftest-before.log
kselftest-after.log

Check_kernel_commits including interdiff

> check_kernel_commits.py --repo /home/rnicolescu/ciq/vms/ciqlts9_4/kernel-src-tree-fix --pr_branch {rnicolescu}_ciqlts9_4 --base_branch ciqlts9_4
INTERDIFF Comparing commits cf0f3e37e614723425c0fdfa8470b6c9b16729e5 and 12f147ddd6de7382dad54812e65f3f08d05809fc...
interdiff result:
No diff between cf0f3e37e614723425c0fdfa8470b6c9b16729e5 and 12f147ddd6de7382dad54812e65f3f08d05809fc
INTERDIFF Comparing commits 71ac24bf28111fce77f3bfed7de221fb77426108 and 9ffb14ef61bab83fa818736bf3e7e6b6e182e8e2...
interdiff result:
No diff between 71ac24bf28111fce77f3bfed7de221fb77426108 and 9ffb14ef61bab83fa818736bf3e7e6b6e182e8e2
INTERDIFF Comparing commits 64d6ebe22bd251efd5a375a884d91136d69e05ca and d8cc0362f918d020ca1340d7694f07062dc30f36...
interdiff result:
No diff between 64d6ebe22bd251efd5a375a884d91136d69e05ca and d8cc0362f918d020ca1340d7694f07062dc30f36
INTERDIFF Comparing commits 6fc92226e201c864189d492da4fa50e8463477ae and cffd0441872e7f6b1fce5e78fb1c99187a291330...
interdiff result:
No diff between 6fc92226e201c864189d492da4fa50e8463477ae and cffd0441872e7f6b1fce5e78fb1c99187a291330
All referenced commits exist upstream and have no Fixes: tags.

jira VULN-98610
cve CVE-2025-38498
commit-author Al Viro <viro@zeniv.linux.org.uk>
commit 12f147d

Ensure that propagation settings can only be changed for mounts located
in the caller's mount namespace. This change aligns permission checking
with the rest of mount(2).

	Reviewed-by: Christian Brauner <brauner@kernel.org>
Fixes: 07b2088 ("beginning of the shared-subtree proper")
	Reported-by: "Orlando, Noah" <Noah.Orlando@deshaw.com>
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit 12f147d)
	Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
@roxanan1996 roxanan1996 force-pushed the {rnicolescu}_ciqlts9_4 branch from 5c71cdb to 772b6c5 Compare October 27, 2025 10:15
@roxanan1996 roxanan1996 requested a review from a team October 27, 2025 12:07
@bmastbergen
Copy link
Collaborator

@roxanan1996 in lts-9.2 we cherry-picked 'move_mount: allow to add a mount into an existing group' as a prerequisite to the bugfix. Of course, that commit itself had a bugfix, so it ended up being four commits. See this PR where those four commits were added:

https://github.com/ctrliq/kernel-src-tree/pull/632/commits

They were all clean cherry-picks then which makes the code more consistent with upstream which might help with any future backports to this file. I wonder if we should do the same here?

@roxanan1996
Copy link
Author

roxanan1996 commented Oct 27, 2025

@roxanan1996 in lts-9.2 we cherry-picked 'move_mount: allow to add a mount into an existing group' as a prerequisite to the bugfix. Of course, that commit itself had a bugfix, so it ended up being four commits. See this PR where those four commits were added:

https://github.com/ctrliq/kernel-src-tree/pull/632/commits

They were all clean cherry-picks then which makes the code more consistent with upstream which might help with any future backports to this file. I wonder if we should do the same here?

Idk, to me it looked like new functionality and I thought it's not necessary for a CVE. At Canonical we were stricter with adding new functionality, even if that meant harder backports. If we only care about clean backports and making future backports easier, then, sure, I'll cherry pick those too.

jira VULN-98610
cve-bf CVE-2025-38498
commit-author Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
commit 9ffb14e

Previously a sharing group (shared and master ids pair) can be only
inherited when mount is created via bindmount. This patch adds an
ability to add an existing private mount into an existing sharing group.

With this functionality one can first create the desired mount tree from
only private mounts (without the need to care about undesired mount
propagation or mount creation order implied by sharing group
dependencies), and next then setup any desired mount sharing between
those mounts in tree as needed.

This allows CRIU to restore any set of mount namespaces, mount trees and
sharing group trees for a container.

We have many issues with restoring mounts in CRIU related to sharing
groups and propagation:
- reverse sharing groups vs mount tree order requires complex mounts
  reordering which mostly implies also using some temporary mounts
(please see https://lkml.org/lkml/2021/3/23/569 for more info)

- mount() syscall creates tons of mounts due to propagation
- mount re-parenting due to propagation
- "Mount Trap" due to propagation
- "Non Uniform" propagation, meaning that with different tricks with
  mount order and temporary children-"lock" mounts one can create mount
  trees which can't be restored without those tricks
(see https://www.linuxplumbersconf.org/event/7/contributions/640/)

With this new functionality we can resolve all the problems with
propagation at once.

Link: https://lore.kernel.org/r/20210715100714.120228-1-ptikhomirov@virtuozzo.com
	Cc: Eric W. Biederman <ebiederm@xmission.com>
	Cc: Alexander Viro <viro@zeniv.linux.org.uk>
	Cc: Christian Brauner <christian.brauner@ubuntu.com>
	Cc: Mattias Nissler <mnissler@chromium.org>
	Cc: Aleksa Sarai <cyphar@cyphar.com>
	Cc: Andrei Vagin <avagin@gmail.com>
	Cc: linux-fsdevel@vger.kernel.org
	Cc: linux-api@vger.kernel.org
	Cc: lkml <linux-kernel@vger.kernel.org>
Co-developed-by: Andrei Vagin <avagin@gmail.com>
	Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
	Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
	Signed-off-by: Andrei Vagin <avagin@gmail.com>
	Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
(cherry picked from commit 9ffb14e)
	Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
jira VULN-98610
cve-bf CVE-2025-38498
commit-author Al Viro <viro@zeniv.linux.org.uk>
commit d8cc036

9ffb14e "move_mount: allow to add a mount into an existing group"
breaks assertions on ->mnt_share/->mnt_slave.  For once, the data structures
in question are actually documented.

Documentation/filesystem/sharedsubtree.rst:
        All vfsmounts in a peer group have the same ->mnt_master.  If it is
	non-NULL, they form a contiguous (ordered) segment of slave list.

do_set_group() puts a mount into the same place in propagation graph
as the old one.  As the result, if old mount gets events from somewhere
and is not a pure event sink, new one needs to be placed next to the
old one in the slave list the old one's on.  If it is a pure event
sink, we only need to make sure the new one doesn't end up in the
middle of some peer group.

"move_mount: allow to add a mount into an existing group" ends up putting
the new one in the beginning of list; that's definitely not going to be
in the middle of anything, so that's fine for case when old is not marked
shared.  In case when old one _is_ marked shared (i.e. is not a pure event
sink), that breaks the assumptions of propagation graph iterators.

Put the new mount next to the old one on the list - that does the right thing
in "old is marked shared" case and is just as correct as the current behaviour
if old is not marked shared (kudos to Pavel for pointing that out - my original
suggested fix changed behaviour in the "nor marked" case, which complicated
things for no good reason).

	Reviewed-by: Christian Brauner <brauner@kernel.org>
Fixes: 9ffb14e ("move_mount: allow to add a mount into an existing group")
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit d8cc036)
	Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
jira VULN-98610
cve-bf CVE-2025-38498
commit-author Al Viro <viro@zeniv.linux.org.uk>
commit cffd044

do_change_type() and do_set_group() are operating on different
aspects of the same thing - propagation graph.  The latter
asks for mounts involved to be mounted in namespace(s) the caller
has CAP_SYS_ADMIN for.  The former is a mess - originally it
didn't even check that mount *is* mounted.  That got fixed,
but the resulting check turns out to be too strict for userland -
in effect, we check that mount is in our namespace, having already
checked that we have CAP_SYS_ADMIN there.

What we really need (in both cases) is
	* only touch mounts that are mounted.  That's a must-have
constraint - data corruption happens if it get violated.
	* don't allow to mess with a namespace unless you already
have enough permissions to do so (i.e. CAP_SYS_ADMIN in its userns).

That's an equivalent of what do_set_group() does; let's extract that
into a helper (may_change_propagation()) and use it in both
do_set_group() and do_change_type().

Fixes: 12f147d "do_change_type(): refuse to operate on unmounted/not ours mounts"
	Acked-by: Andrei Vagin <avagin@gmail.com>
	Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
	Tested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
	Reviewed-by: Christian Brauner <brauner@kernel.org>
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit cffd044)
	Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
@roxanan1996
Copy link
Author

@bmastbergen I updated it based on your feedback

@PlaidCat
Copy link
Collaborator

PlaidCat commented Oct 27, 2025

@roxanan1996 in lts-9.2 we cherry-picked 'move_mount: allow to add a mount into an existing group' as a prerequisite to the bugfix. Of course, that commit itself had a bugfix, so it ended up being four commits. See this PR where those four commits were added:
https://github.com/ctrliq/kernel-src-tree/pull/632/commits
They were all clean cherry-picks then which makes the code more consistent with upstream which might help with any future backports to this file. I wonder if we should do the same here?

Idk, to me it looked like new functionality and I thought it's not necessary for a CVE. At Canonical we were stricter with adding new functionality, even if that meant harder backports. If we only care about clean backports and making future backports easier, then, sure, I'll cherry pick those too.

Our general idea was brought up by a former RHEL Kernel Engineer. Basically adding things we care a little bit lees about as long a we don't break the kABI however major refactors are likely to be questionable.

Should there be 4 commits or 2 if we want to be similar to 9.2?

Indeed, we should keep the kernels similar. I wanted to bring this up, but the remediation script kinda breaks the rule of assigning the same CVE to yourself. Or maybe I am using it wrong.

@roxanan1996
Copy link
Author

Hmm, I think I forgot to update the branch...

@roxanan1996 roxanan1996 force-pushed the {rnicolescu}_ciqlts9_4 branch from 772b6c5 to 6fc9222 Compare October 28, 2025 08:53
Copy link
Collaborator

@bmastbergen bmastbergen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥌

Copy link
Collaborator

@PlaidCat PlaidCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks For Pushing the updates

:shipit:

@roxanan1996 roxanan1996 merged commit b32d14b into ciqlts9_4 Oct 29, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants