Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kpatch-build error with 5.19 kernel #1277

Closed
liu-song-6 opened this issue Jun 21, 2022 · 27 comments · Fixed by #1289
Closed

kpatch-build error with 5.19 kernel #1277

liu-song-6 opened this issue Jun 21, 2022 · 27 comments · Fixed by #1289

Comments

@liu-song-6
Copy link
Contributor

I hit kpatch-build error like

ERROR: invalid ancestor fs/proc/array.o for fs/proc/array.o.

with 5.19 kernel. Some debugging shows that this is caused by changes in .cmd files, i.e., array.o (in 5.19) instead of fs/proc/array.o (in 5.18). However, I haven't figured out what change in the kernel caused this change in .cmd files.

Something like this commit, fixes the issue. But I am not quite sure that's the right fix. Actually, I wonder whether we should fix it in the kernel.

@liu-song-6
Copy link
Contributor Author

The kernel commit that cause the issue is

commit cd968b97c49214e6557381bddddacbd0e0fb696e
Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   5 weeks ago

    kbuild: make built-in.a rule robust against too long argument error

I guess we should either revert this, or ship some fix in kpatch-build.

@joe-lawrence
Copy link
Contributor

Hi @liu-song-6 : Sorry for the late reply, I saw the same error yesterday, but haven't had a chance to take a closer look. I haven't had to touch the ancestor logic in a few years, so I'll have to spend a some time remembering how it all worked (IIRC there was a caching mechanism or shortcut method of sorts?) Anyway, since upstream has motivation for the change, not just touching things for aesthetic reasons, I think kpatch will have to cope with the shorter .o filenames.

@liu-song-6
Copy link
Contributor Author

Something like this should fix it. Or, maybe we only need to grep for the short name?

@jpoimboe
Copy link
Member

jpoimboe commented Jul 1, 2022

Unfortunately I think that patch won't be precise enough, since there can be duplicate file names across the tree, it might find the wrong one when doing a deep find.

@joe-lawrence
Copy link
Contributor

Unfortunately I think that patch won't be precise enough, since there can be duplicate file names across the tree, it might find the wrong one when doing a deep find.

With basename grepping like @liu-song-6 proposed here, what if the deep find worked it's way back to the root of the source tree, searching along the way? (At the moment I believe it starts at the top.) This way we're not looking for something generic like init.o across the entire tree, but in gradually higher directories (ie, a/b/c/d/, then a/b/c/, a/b/, etc.)

@liu-song-6
Copy link
Contributor Author

Do we have an example patch that may break kpatch-build due to the duplicated name? I tried with some changes in fs/autofs/init.c, it fails for some other reasons.

@joe-lawrence
Copy link
Contributor

Do we have an example patch that may break kpatch-build due to the duplicated name? I tried with some changes in fs/autofs/init.c, it fails for some other reasons.

I don't have one at hand (I only used "init.c" an example of a filename that could feasibly be duplicated). Perhaps another way to look at the problem would be to feed the original vs. new implementation a kernel tree full of .o files and see if any incorrect parent objects are generated.

@liu-song-6
Copy link
Contributor Author

liu-song-6 commented Jul 22, 2022

I found an issue with the original version: we may match aaa.o with bbb-aaa.o. a4287fb fixed this.

But I guess it is still not 100% accurate?

@joe-lawrence
Copy link
Contributor

I ran two experiments for all object files from v5.19-rc7 and a .config similar to rhel-9:

  1. "Old" - The original find_parent_obj() with torvalds/linux@cd968b97c492 reverted > linux-5.19-rc7-revert
  2. "New" - The proposed new find_parent_obj() with torvalds/linux@cd968b97c492 in place > linux-5.19-rc7

A few observations:

  • New runs many more "deep finds"
  • When old and new report "invalid ancestor" they often disagree about the ancestor file
  • New seems to determine more ancestors than old, for example:
    • Old = arch/x86/events/intel/cstate.o -> invalid_ancestor arch/x86/events/intel/cstate.o
    • New = arch/x86/events/intel/cstate.o -> arch/x86/events/intel/intel-cstate.ko
  • However some (not limited to these) are incorrect:
    • (CONFIG_CPU_FREQ=y)
      Old = drivers/cpufreq/cpufreq.o -> vmlinux
      New = drivers/cpufreq/cpufreq.o -> drivers/cpufreq/acpi-cpufreq.ko
    • (CONFIG_PCIEAER=y)
      Old = drivers/pci/pcie/err.o -> vmlinux
      New = drivers/pci/pcie/err.o -> drivers/pci/pcie/aer_inject.ko

See attached test script and results (lmk if I can explain them further). In the end, the new find_parent_obj() seems to run faster and finds more ancestors, however are a few inaccurate results. I'm not sure how to easily test the same for the existing algorithm to compare.

I wonder since we're already wrapping ld with kpatch-ld, if we could somehow dump .o -> .ko / vmlinux mappings in this step? (It's late, so I'm only brainstorming here :)

results.tar.gz

@liu-song-6
Copy link
Contributor Author

liu-song-6 commented Jul 26, 2022

See attached test script and results (lmk if I can explain them further). In the end, the new find_parent_obj() seems to run faster and finds more ancestors, however are a few inaccurate results. I'm not sure how to easily test the same for the existing algorithm to compare.

I haven't figured out how to run the test script. I got many invalid_ancestor:

find-kobj-test drivers/md/md.o
drivers/md/md.o DID_DEEP_FIND=1 invalid_ancestor_drivers/md/md.o_for_drivers/md/md.o

Did I miss something?
Also it seems find_parent_obj() in the test script is not the same as a4287fb.

I wonder since we're already wrapping ld with kpatch-ld, if we could somehow dump .o -> .ko / vmlinux mappings in this step? (It's late, so I'm only brainstorming here :)

Are we using kpatch-ld? I didn't find it.

@joe-lawrence
Copy link
Contributor

I haven't figured out how to run the test script.

Sorry it was hacked together, I had find_parent_obj() copied from the original and then find_parent_obj_new() for the new version and just manually adjusted their calls for each case. Here is a version of the script with the function from a4287fb:
find-kobj-test-new.txt

And I ran it from the top of my kernel build like:
$ find . -name '*.o' -exec ~/kpatch/find-kobj-test-new {} \; | tee linux-5.19-rc7

I got many invalid_ancestor:
find-kobj-test drivers/md/md.o
drivers/md/md.o DID_DEEP_FIND=1 invalid_ancestor_drivers/md/md.o_for_drivers/md/md.o

Right, the new find_parent_obj() finds a few more ancestors while missing ones like this. Does this patch work with this PR?

diff --git a/drivers/md/md.c b/drivers/md/md.c
index c7ecb0bff..1ca189ec2 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -213,6 +213,7 @@ void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
 {
        int ret = 0;
 
+       nop();
        if (rdev && !rdev_need_serial(rdev) &&
            !test_bit(CollisionCheck, &rdev->flags))
                return;

I get the same invalid ancestor error as you pointed out.

I wonder since we're already wrapping ld with kpatch-ld, if we could somehow dump .o -> .ko / vmlinux mappings in this step? (It's late, so I'm only brainstorming here :)
Are we using kpatch-ld? I didn't find it.

Oops, I thought we had a separate wrapper script for the linker, but it's all included in kpatch-build/kpatch-cc. Originally I thought we might be able to catch a single ld invocation for vmlinux and then for modules to which we could snarf for ancestor data, but that is not the case. It might be possible to work backwards from here somehow, but unfortunately it doesn't look straightforward to me.

@liu-song-6
Copy link
Contributor Author

diff --git a/drivers/md/md.c b/drivers/md/md.c
index c7ecb0bff..1ca189ec2 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -213,6 +213,7 @@ void mddev_create_serial_pool(struct mddev *mddev, struct md_rdev *rdev,
 {
        int ret = 0;
 
+       nop();
        if (rdev && !rdev_need_serial(rdev) &&
            !test_bit(CollisionCheck, &rdev->flags))
                return;

I get the same invalid ancestor error as you pointed out.

This happens because we have

git -lw -e " md.o" xxx

where -w does not match with md.o with the space. Is there some grep match to fix this?

@joe-lawrence
Copy link
Contributor

This happens because we have

git -lw -e " md.o" xxx

where -w does not match with md.o with the space. Is there some grep match to fix this?

Ah, I see. Word regexp prevents md.o from matching foomd.o, then the space was added to prevent foo-md.o, but that breaks "/md.o" matching. I can't think of any grep flag that handle all these cases.

@joe-lawrence
Copy link
Contributor

joe-lawrence commented Jul 28, 2022

I hacked this up this morning to try and special-case the new format:
master...joe-lawrence:kpatch:v5.19-built-in-short-paths

Not really tested and not really proud of the hack, but I think these functions could be refactored at some point in the future. Trying to regex or grep flag our way out of this feels very brittle. There's gotta be a better way. (Maybe if we wrap ar too we could snarf all the object files going into built-in.a's?)

edit: that branch may work for vmlinux and foo.o -> foo.ko, but doesn't work for files like crypto/ecdh_helper.o -> crypto/ecdh_generic.ko. I'll need to wrap my brain around filter_parent_obj() first.

@liu-song-6
Copy link
Contributor Author

I think this version of grep_for_parent only works for build-in .o files. AFAICT, it can't cover kernel modules.

@joe-lawrence
Copy link
Contributor

I think this version of grep_for_parent only works for build-in .o files. AFAICT, it can't cover kernel modules.

(Sorry for belated reply, I was on PTO.) Unfortunately I realized that shortly after posting it last week. We should enumerate all the potential cases across kernel releases to come up with a better solution. It feels like whack-a-mole at the moment. In the meantime, liu-song-6@b52499b should be reasonably accurate for you if you can manually sanity check its results?

@liu-song-6
Copy link
Contributor Author

I guess it works for short term. It is not top priority for our production at the moment. But we do plan to use 5.19 as our next production kernel.

@jpoimboe
Copy link
Member

jpoimboe commented Aug 4, 2022

Because the file names in the .cmd files are relative to the current directory, I think there's a more precise way to do this. We can detect whether the filenames are relative to the kernel root (old way) or current directory (new way) and adjust the algorithm accordingly. Let me try to come up with something (time willing).

jpoimboe added a commit to jpoimboe/kpatch that referenced this issue Aug 5, 2022
Rewrite kobj_find() to deal with Linux 5.19, where the .cmd files use
file paths relative to the .cmd file rather than relative to the root of
the kernel tree.

Fixes dynup#1277.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
jpoimboe added a commit to jpoimboe/kpatch that referenced this issue Aug 5, 2022
Rewrite kobj_find() to deal with Linux 5.19, where the .cmd files use
file paths relative to the .cmd file rather than relative to the root of
the kernel tree.

Fixes dynup#1277.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
@jpoimboe
Copy link
Member

jpoimboe commented Aug 5, 2022

I think I've got it working with jpoimboe/kpatch@f108002. I tested with @joe-lawrence 's script on 5.18 and 5.19. Any more testing would be welcome.

jpoimboe added a commit to jpoimboe/kpatch that referenced this issue Aug 5, 2022
Rewrite kobj_find() to deal with Linux 5.19, where the .cmd files use
file paths relative to the .cmd file rather than relative to the root of
the kernel tree.

Fixes dynup#1277.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
@androw
Copy link
Contributor

androw commented Aug 5, 2022

jpoimboe@f108002 failed with

➜  ~ kpatch-build -s ./linux meminfo-string.patch 
Using source directory at /home/androw/linux
Testing patch file(s)
Reading special section data
Building original source
Building patched source
ERROR: kpatch build failed. Check /home/androw/.kpatch/build.log for more details.
➜  ~ tail /home/androw/.kpatch/build.log
  GEN     Module.symvers
checking file fs/proc/meminfo.c
patching file fs/proc/meminfo.c
  SYNC    include/config/auto.conf.cmd
/usr/libexec/kpatch/kpatch-cc gcc: unknown compiler
scripts/Kconfig.include:44: Sorry, this compiler is not supported.
make[2]: *** [scripts/kconfig/Makefile:77: syncconfig] Error 1
make[1]: *** [Makefile:629: syncconfig] Error 2
make: *** [Makefile:731: include/config/auto.conf.cmd] Error 2
make: *** [include/config/auto.conf.cmd] Deleting file 'include/generated/autoconf.h'

liu-song-6@b52499b worked for me

@jpoimboe
Copy link
Member

jpoimboe commented Aug 5, 2022

@androw That looks like something wrong with your setup. I don't see how it could be related to my patch.

Did you have a file at /usr/libexec/kpatch/kpatch-cc?

Maybe something went wrong with your install. Instead try running with /git/path/to/kpatch-build directly from the git tree.

jpoimboe added a commit to jpoimboe/kpatch that referenced this issue Aug 5, 2022
Rewrite kobj_find() to deal with Linux 5.19, where the .cmd files use
object file paths relative to the .cmd file rather than relative to the
root of the kernel tree.

Fixes dynup#1277.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
jpoimboe added a commit to jpoimboe/kpatch that referenced this issue Aug 5, 2022
Rewrite kobj_find() to deal with Linux 5.19, where the .cmd files use
object file paths relative to the .cmd file rather than relative to the
root of the kernel tree.

Fixes dynup#1277.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
@androw
Copy link
Contributor

androw commented Aug 5, 2022

Yes a file is present here.
Maybe something was wrong on my side. Tried again and got the following error:

➜  ~ ./aports/testing/kpatch/src/kpatch-0.9.6/kpatch-build/kpatch-build -s /home/androw/linux /home/androw/test.patch 
Using source directory at /home/androw/linux
Testing patch file(s)
Reading special section data
Building original source
Building patched source
Extracting new and modified ELF sections
realpath: --relative-to=.: No such file or directory
realpath: --relative-to=/home/androw/linux/fs/proc: No such file or directory
realpath: --relative-to=/home/androw/linux/fs/proc/..: No such file or directory
realpath: --relative-to=.: No such file or directory
[...]
realpath: --relative-to=./fs/pstore: No such file or directory
realpath: --relative-to=./tools/objtool: No such file or directory
realpath: --relative-to=./tools/objtool/arch/x86: No such file or directory
ERROR: invalid ancestor fs/proc/meminfo.o for fs/proc/meminfo.o. Check /home/androw/.kpatch/build.log for more details.

@jpoimboe
Copy link
Member

jpoimboe commented Aug 5, 2022

Can you run with the '-d' option and share the output (at least up until the first error)?

@androw
Copy link
Contributor

androw commented Aug 6, 2022

jpoimboe added a commit to jpoimboe/kpatch that referenced this issue Aug 17, 2022
Rewrite kobj_find() to deal with Linux 5.19, where the .cmd files use
object file paths relative to the .cmd file rather than relative to the
root of the kernel tree.

While at it, add several performance enhancements to prevent all
currently known deep finds.

This is all quite fiddly.  But it works.

Fixes dynup#1277.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
@jpoimboe
Copy link
Member

realpath: --relative-to=.: No such file or directory

@androw I wonder if your version of 'realpath' is old? what does realpath --version show?

@androw
Copy link
Contributor

androw commented Aug 19, 2022

➜  ~ realpath 
BusyBox v1.35.0 (2022-08-02 19:43:12 UTC) multi-call binary.

Maybe it would be better with the coreutils one.
Edit: confirmed to work with coreutils.

@jpoimboe
Copy link
Member

Hm, for older and alternative toolchains I guess we need a bash alternative to realpath --relative-to

jpoimboe added a commit to jpoimboe/kpatch that referenced this issue Aug 19, 2022
Rewrite kobj_find() to deal with Linux 5.19, where the .cmd files use
object file paths relative to the .cmd file rather than relative to the
root of the kernel tree.

While at it, add several performance enhancements to prevent all
currently known deep finds.

This is all quite fiddly.  But it works.

Fixes dynup#1277.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
swine added a commit to swine/kpatch that referenced this issue Oct 12, 2022
Rewrite kobj_find() to deal with Linux 5.19, where the .cmd files use
object file paths relative to the .cmd file rather than relative to the
root of the kernel tree.

While at it, add several performance enhancements to prevent all
currently known deep finds.

This is all quite fiddly.  But it works.

Fixes dynup#1277.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>

tweaked quoting to pass 'make check'

Signed-off-by: Pete Swain <swine@google.com>
jpoimboe added a commit to jpoimboe/kpatch that referenced this issue Oct 12, 2022
Rewrite kobj_find() to deal with Linux 5.19, where the .cmd files use
object file paths relative to the .cmd file rather than relative to the
root of the kernel tree.

While at it, add several performance enhancements to prevent all
currently known deep finds.

This is all quite fiddly.  But it works.

Fixes dynup#1277.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
liu-song-6 pushed a commit to liu-song-6/kpatch that referenced this issue Oct 14, 2022
Rewrite kobj_find() to deal with Linux 5.19, where the .cmd files use
object file paths relative to the .cmd file rather than relative to the
root of the kernel tree.

While at it, add several performance enhancements to prevent all
currently known deep finds.

This is all quite fiddly.  But it works.

Fixes dynup#1277.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
liu-song-6 pushed a commit to liu-song-6/kpatch that referenced this issue Oct 28, 2022
Rewrite kobj_find() to deal with Linux 5.19, where the .cmd files use
object file paths relative to the .cmd file rather than relative to the
root of the kernel tree.

While at it, add several performance enhancements to prevent all
currently known deep finds.

This is all quite fiddly.  But it works.

Fixes dynup#1277.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
surajjs95 pushed a commit to surajjs95/kpatch that referenced this issue May 9, 2023
Rewrite kobj_find() to deal with Linux 5.19, where the .cmd files use
object file paths relative to the .cmd file rather than relative to the
root of the kernel tree.

While at it, add several performance enhancements to prevent all
currently known deep finds.

This is all quite fiddly.  But it works.

Fixes dynup#1277.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants