[lts9_4] CVE-2025-38498 #648
jira VULN-98610
cve CVE-2025-38498
commit-author Al Viro <viro@zeniv.linux.org.uk>
commit 12f147d

Ensure that propagation settings can only be changed for mounts located in the caller's mount namespace. This change aligns permission checking with the rest of mount(2).

Reviewed-by: Christian Brauner <brauner@kernel.org>
Fixes: 07b2088 ("beginning of the shared-subtree proper")
Reported-by: "Orlando, Noah" <Noah.Orlando@deshaw.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit 12f147d)
Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
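The rule this commit adds can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the kernel code: `struct ns_model`, `struct mount_model` and `model_check_ours()` are hypothetical names, a NULL namespace pointer stands for an unmounted mount, and the `-EINVAL` return value is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a mount namespace. */
struct ns_model {
    int id;
};

/* Hypothetical, simplified stand-in for a mount. */
struct mount_model {
    const struct ns_model *ns; /* NULL models "not mounted anywhere" */
};

/*
 * Models the rule from the commit message: propagation settings may only
 * be changed for a mount that is mounted in the caller's own namespace.
 * The -EINVAL return value is an assumption of this sketch.
 */
int model_check_ours(const struct mount_model *m,
                     const struct ns_model *caller_ns)
{
    assert(m != NULL && caller_ns != NULL);
    if (m->ns == NULL)
        return -EINVAL;            /* unmounted */
    if (m->ns != caller_ns)
        return -EINVAL;            /* mounted, but not in our namespace */
    return 0;
}
```

With this rule, a mount in any other namespace is refused, even one the caller administers.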
Branch updated: 5c71cdb to 772b6c5
@roxanan1996 In lts-9.2 we cherry-picked "move_mount: allow to add a mount into an existing group" as a prerequisite to the bugfix. Of course, that commit had a bugfix of its own, so it ended up being four commits. See this PR, where those four commits were added: https://github.com/ctrliq/kernel-src-tree/pull/632/commits
They were all clean cherry-picks, which keeps the code more consistent with upstream and might help with any future backports to this file. I wonder if we should do the same here?
Idk, to me it looked like new functionality, and I didn't think it was necessary for a CVE fix. At Canonical we were stricter about adding new functionality, even if that meant harder backports. If we only care about clean backports and making future backports easier, then sure, I'll cherry-pick those too.
jira VULN-98610
cve-bf CVE-2025-38498
commit-author Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
commit 9ffb14e

Previously a sharing group (shared and master ids pair) can be only inherited when mount is created via bindmount. This patch adds an ability to add an existing private mount into an existing sharing group.

With this functionality one can first create the desired mount tree from only private mounts (without the need to care about undesired mount propagation or mount creation order implied by sharing group dependencies), and next then setup any desired mount sharing between those mounts in tree as needed.

This allows CRIU to restore any set of mount namespaces, mount trees and sharing group trees for a container.

We have many issues with restoring mounts in CRIU related to sharing groups and propagation:
- reverse sharing groups vs mount tree order requires complex mounts reordering which mostly implies also using some temporary mounts (please see https://lkml.org/lkml/2021/3/23/569 for more info)
- mount() syscall creates tons of mounts due to propagation
- mount re-parenting due to propagation
- "Mount Trap" due to propagation
- "Non Uniform" propagation, meaning that with different tricks with mount order and temporary children-"lock" mounts one can create mount trees which can't be restored without those tricks (see https://www.linuxplumbersconf.org/event/7/contributions/640/)

With this new functionality we can resolve all the problems with propagation at once.

Link: https://lore.kernel.org/r/20210715100714.120228-1-ptikhomirov@virtuozzo.com
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Mattias Nissler <mnissler@chromium.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-api@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Co-developed-by: Andrei Vagin <avagin@gmail.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
(cherry picked from commit 9ffb14e)
Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
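A toy model of what "adding an existing private mount into an existing sharing group" means, treating a sharing group as just the (shared id, master id) pair the commit message describes. The names `struct group_model` and `model_set_group()` are hypothetical, and the rule that the source must not itself be private is an assumption of this sketch. In userspace the real operation is requested through move_mount(2) with MOVE_MOUNT_SET_GROUP, and the kernel performs further checks that are omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: a sharing group is the (shared id, master id) pair. */
struct group_model {
    int shared_id; /* 0: not shared, i.e. no peer group */
    int master_id; /* 0: not a slave of any group */
};

/* A mount with neither a peer group nor a master is private. */
static int is_private(const struct group_model *g)
{
    return g->shared_id == 0 && g->master_id == 0;
}

/*
 * Sketch: make `to` join the sharing group of `from` by copying the pair.
 * Assumed preconditions: `to` is private and `from` is not.
 */
int model_set_group(const struct group_model *from, struct group_model *to)
{
    assert(from != NULL && to != NULL);
    if (!is_private(to))
        return -EINVAL;   /* target already belongs to a group */
    if (is_private(from))
        return -EINVAL;   /* nothing to join */
    to->shared_id = from->shared_id;
    to->master_id = from->master_id;
    return 0;
}
```

This is the property CRIU relies on: build the tree from private mounts first, then attach each mount to the group it needs, independent of creation order.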
jira VULN-98610
cve-bf CVE-2025-38498
commit-author Al Viro <viro@zeniv.linux.org.uk>
commit d8cc036

9ffb14e "move_mount: allow to add a mount into an existing group" breaks assertions on ->mnt_share/->mnt_slave. For once, the data structures in question are actually documented.

Documentation/filesystem/sharedsubtree.rst:
    All vfsmounts in a peer group have the same ->mnt_master. If it is non-NULL, they form a contiguous (ordered) segment of slave list.

do_set_group() puts a mount into the same place in propagation graph as the old one. As the result, if old mount gets events from somewhere and is not a pure event sink, new one needs to be placed next to the old one in the slave list the old one's on. If it is a pure event sink, we only need to make sure the new one doesn't end up in the middle of some peer group.

"move_mount: allow to add a mount into an existing group" ends up putting the new one in the beginning of list; that's definitely not going to be in the middle of anything, so that's fine for case when old is not marked shared. In case when old one _is_ marked shared (i.e. is not a pure event sink), that breaks the assumptions of propagation graph iterators.

Put the new mount next to the old one on the list - that does the right thing in "old is marked shared" case and is just as correct as the current behaviour if old is not marked shared (kudos to Pavel for pointing that out - my original suggested fix changed behaviour in the "nor marked" case, which complicated things for no good reason).

Reviewed-by: Christian Brauner <brauner@kernel.org>
Fixes: 9ffb14e ("move_mount: allow to add a mount into an existing group")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit d8cc036)
Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
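The ordering problem can be reproduced with a small list model. This sketch assumes hypothetical names (`struct slave_node`, `insert_at_head()`, `insert_after()`, `peer_groups_contiguous()`) and uses a singly linked list instead of the kernel's `list_head`. A `peer_group` of 0 stands for a pure event sink, and the checker encodes the documented invariant that the members of a peer group form a contiguous segment of the slave list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one entry on a master's slave list. */
struct slave_node {
    int peer_group;          /* 0: no peer group (pure event sink) */
    struct slave_node *next;
};

/* Old behaviour, sketched: put the new mount at the head of the list. */
void insert_at_head(struct slave_node **head, struct slave_node *n)
{
    n->next = *head;
    *head = n;
}

/* Fixed behaviour, sketched: put the new mount right after the old one. */
void insert_after(struct slave_node *old, struct slave_node *n)
{
    assert(old != NULL && n != NULL);
    n->next = old->next;
    old->next = n;
}

/* True if every peer group occupies one contiguous run of the list. */
bool peer_groups_contiguous(const struct slave_node *head)
{
    for (const struct slave_node *a = head; a; a = a->next) {
        if (a->peer_group == 0)
            continue;
        bool left = false;
        for (const struct slave_node *b = a->next; b; b = b->next) {
            if (b->peer_group != a->peer_group)
                left = true;      /* the run that started at a has ended */
            else if (left)
                return false;     /* same group seen again after a gap */
        }
    }
    return true;
}
```

With a list of a sink X followed by peers A and B (group 1), inserting a new group-1 mount at the head separates it from its peers, while inserting it right after A keeps the segment contiguous. If the old mount is a sink (group 0), either placement passes the check, matching the commit's observation that head insertion was only wrong in the "marked shared" case.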
jira VULN-98610
cve-bf CVE-2025-38498
commit-author Al Viro <viro@zeniv.linux.org.uk>
commit cffd044

do_change_type() and do_set_group() are operating on different aspects of the same thing - propagation graph. The latter asks for mounts involved to be mounted in namespace(s) the caller has CAP_SYS_ADMIN for. The former is a mess - originally it didn't even check that mount *is* mounted. That got fixed, but the resulting check turns out to be too strict for userland - in effect, we check that mount is in our namespace, having already checked that we have CAP_SYS_ADMIN there.

What we really need (in both cases) is
    * only touch mounts that are mounted. That's a must-have constraint - data corruption happens if it get violated.
    * don't allow to mess with a namespace unless you already have enough permissions to do so (i.e. CAP_SYS_ADMIN in its userns).

That's an equivalent of what do_set_group() does; let's extract that into a helper (may_change_propagation()) and use it in both do_set_group() and do_change_type().

Fixes: 12f147d "do_change_type(): refuse to operate on unmounted/not ours mounts"
Acked-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Tested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit cffd044)
Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
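The two requirements listed in this commit can be sketched as a single check. This is a userspace model, not the upstream may_change_propagation(): the types and `model_may_change_propagation()` are hypothetical, "CAP_SYS_ADMIN in its userns" is simplified to "the caller's user namespace owns the mount namespace" (the real capability check also walks the user-namespace hierarchy), and the `-EINVAL`/`-EPERM` return values are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a user namespace. */
struct userns_model {
    int id;
};

/* Hypothetical stand-in for a mount namespace and its owning userns. */
struct mntns_model {
    const struct userns_model *owner;
};

/* Hypothetical stand-in for a mount. */
struct mnt_model {
    const struct mntns_model *ns;     /* NULL models "not mounted" */
};

/*
 * Rule 1: only touch mounts that are mounted.
 * Rule 2: the caller must be privileged over the namespace's userns,
 *         simplified here to an identity comparison.
 */
int model_may_change_propagation(const struct mnt_model *m,
                                 const struct userns_model *caller_userns)
{
    assert(m != NULL && caller_userns != NULL);
    if (m->ns == NULL)
        return -EINVAL;
    if (m->ns->owner != caller_userns)
        return -EPERM;
    return 0;
}
```

Unlike the stricter rule from 12f147d, a mount in a different mount namespace owned by the same user namespace passes this check, which is the relaxation the commit message describes for userland.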
@bmastbergen I updated it based on your feedback.
Our general approach was suggested by a former RHEL kernel engineer: basically, add things we care a little bit less about, as long as we don't break the
Should there be 4 commits or 2 if we want to be similar to 9.2?
Indeed, we should keep the kernels similar. I wanted to bring this up, but the remediation script kinda breaks the rule of assigning the same CVE to yourself. Or maybe I am using it wrong.
Hmm, I think I forgot to update the branch...
Branch updated: 772b6c5 to 6fc9222
🥌
Thanks for pushing the updates.

DESCRIPTION
Commit "do_change_type(): refuse to operate on unmounted/not ours mounts"
is the CVE fix. This was a clean cherry pick.
Commit "use uniform permission checks for all mount propagation changes"
was included because it has a "Fixes" reference to the previous commit.
This was not a clean cherry-pick, so I also had to pick up commit
"move_mount: allow to add a mount into an existing group".
That commit in turn has its own fix, "fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)", which is included as well.
NOTE: If you check the patch diff with a larger context, you may see some
differences in do_change_type(). This is because the following commits are
missing, but they are not relevant for this CVE:
Commits
TESTING
BUILD
kernel-build-before.log
kernel-build-after.log
Kselftests
kselftest-before.log
kselftest-after.log
Check_kernel_commits including interdiff