Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Questions about the current solution of correlating static local variables #545

Closed
zhouchengming1 opened this issue Nov 11, 2015 · 17 comments · Fixed by #555
Closed

Questions about the current solution of correlating static local variables #545

zhouchengming1 opened this issue Nov 11, 2015 · 17 comments · Fixed by #555
Labels

Comments

@zhouchengming1
Copy link
Contributor

  1. Why undo the previous correlation in the iteration of base->symbols?
    If xxx.123 ---> xxx.123 (previous correlation by coincidence)
    xxx.256 ---> xxx.256 (previous correlation by coincidence)
    But real xxx.123 ---> xxx.256
    In this case, the code doesn't work. Because when find patched_sym for xxx.123,
    the xxx.256 in patched_object hasn't been de-correlated.
    So I think should undo all previous correlation before the new correlation.

  2. Is the iteration of base-sections wrong?
    old-object | new-object
    func1 | func1
    xxx.123 | xxx.123 (inline)
    func2 | func2
    xxx.256 | xxx.256
    xxx.123 | xxx.123 (inline)

    When find patched_sym for xxx.123, first find xxx.123 in func1 of new-object,
    But then find xxx.256 in func2 of new-object.
    So I think should not iterate the base-sections, when find one, just go out to next symbol.

@jpoimboe
Copy link
Member

Why undo the previous correlation in the iteration of base->symbols?
If xxx.123 ---> xxx.123 (previous correlation by coincidence)
xxx.256 ---> xxx.256 (previous correlation by coincidence)
But real xxx.123 ---> xxx.256

In this case, the code doesn't work. Because when find patched_sym for xxx.123,
the xxx.256 in patched_object hasn't been de-correlated.
So I think should undo all previous correlation before the new correlation.

Ah, right. Instead of uncorrelating and correlating in the same loop, they should be done in two separate loops, so that all of the uncorrelations are done before starting the correlations.

Is the iteration of base-sections wrong?
old-object | new-object
func1 | func1
xxx.123 | xxx.123 (inline)
func2 | func2
xxx.256 | xxx.256
xxx.123 | xxx.123 (inline)

When find patched_sym for xxx.123, first find xxx.123 in func1 of new-object,
But then find xxx.256 in func2 of new-object.
So I think should not iterate the base-sections, when find one, just go out to next symbol.

Ah, right, this is another problem. I need to think about it a little bit. Can you clarify your suggested solution?

This static local correlation has been really hard to get right :-/

@zhouchengming1
Copy link
Contributor Author

@jpoimboe , there is a solution that works fine in the same restrictions.
And I just have implemented it simply, also did some test on it.
I think I'd better send a patch to you later, maybe there are little problems in it.

@jpoimboe
Copy link
Member

@zhouchengming1 do you have a patch which reproduces these errors? I would like to test for this in the integration test suite.

jpoimboe added a commit to jpoimboe/kpatch that referenced this issue Nov 13, 2015
Refine the static local variable handling again.  This builds on a
previous patch by Zhou Chengming.

This fixes the following bugs reported by Zhou:

1.          xxx.123 ---> xxx.123 (previous correlation by coincidence)
            xxx.256 ---> xxx.256 (previous correlation by coincidence)
   But real xxx.123 ---> xxx.256

   In this case, the code doesn't work. Because when find patched_sym for
   xxx.123, the xxx.256 in patched_object hasn't been de-correlated.

2. old-object | new-object
        func1 | func1
      xxx.123 | xxx.123 (inline)
        func2 | func2
      xxx.256 | xxx.256
      xxx.123 | xxx.123 (inline)

   When find patched_sym for xxx.123, first find xxx.123 in func1 of new-object,
   But then find xxx.256 in func2 of new-object.
   So I think should not iterate the base-sections, when find one, just go out to next symbol.

Both of these problems can be fixed by splitting the code up into
multiple passes:

  1. uncorrelate all static locals
  2. correlate all static locals
  3. ensure each static local is referenced by all the same sections in
     both objects
  4. print warning on any new static locals

Fixes: dynup#545
@zhouchengming1
Copy link
Contributor Author

I have a patch, but it's for my module, not for the kernel.

@zhouchengming1
Copy link
Contributor Author

I'd like to test for you, when this git repository has merged your patch.

@jpoimboe
Copy link
Member

@zhouchengming1 It has been merged, please test and let me know if it works for you. Thanks!

@zhouchengming1
Copy link
Contributor Author

I just have tested the new kpatch.
Q1: gcc/kernel version mismatch ?
gccver="(GCC) 4.9.0"
kgccver="(GNU) 4.9.0"

   so I think should just compare the last number "4.9.0".
   Change (cut -d' ' -f2-) to (cut -d' ' -f3-);
   And (cut -d ' ' -f5-) to (cut -d ' ' -f6-).
   Is it a bug or a coding error...

Q2: works for me on correlating static local variables ?
ERROR: cmdline.o: symbol changed sections: .data.var.16226

   Because it's a section symbol.
   When undo the correlations for all the static locals, we forget it , should undo the correlation
   for it too.
   * Should also change name and correlate section-symbols later when found a patched_sym ?

   Additionly, I am wondering when var.17000 changed name to var.16226, then the patched 
   object had two symbols both called var.16226, is this a problem ?

Q3: print any warnings about new variables?
No need "return", or just can print one new variable.

Last:
I just also made a test-patch for the kernel.
I have sent it to you by email.

Thanks.

@jpoimboe
Copy link
Member

I just have tested the new kpatch.
Q1: gcc/kernel version mismatch ?
gccver="(GCC) 4.9.0"
kgccver="(GNU) 4.9.0"

so I think should just compare the last number "4.9.0".
Change (cut -d' ' -f2-) to (cut -d' ' -f3-);
And (cut -d ' ' -f5-) to (cut -d ' ' -f6-).
Is it a bug or a coding error...

This was broken with #548 and will be fixed with #561.

Q2: works for me on correlating static local variables ?
ERROR: cmdline.o: symbol changed sections: .data.var.16226

Because it's a section symbol.
When undo the correlations for all the static locals, we forget it , should undo the correlation
for it too.

The symbol's section should already be uncorrelated here and then re-correlated here.

Is it not working for you?

  • Should also change name and correlate section-symbols later when found a patched_sym ?

    Additionly, I am wondering when var.17000 changed name to var.16226, then the patched
    object had two symbols both called var.16226, is this a problem ?

Yeah, I think perhaps renaming the symbol isn't the best choice. Instead maybe we should store the twin's name in a separate variable which can be referenced in kpatch_create_dynamic_rela_sections().

Q3: print any warnings about new variables?
No need "return", or just can print one new variable.

Why?

I just also made a test-patch for the kernel.
I have sent it to you by email.

Thanks, I'll try it out.

@jpoimboe jpoimboe reopened this Nov 17, 2015
@jpoimboe
Copy link
Member

Ok, so two issues here.

First, I was able to recreate the error:

ERROR: cmdline.o: symbol changed sections: .data.var.18451

with this patch:

Subject: [PATCH] test patch for correlating static locals

Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
---
 fs/proc/cmdline.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/fs/proc/cmdline.c b/fs/proc/cmdline.c
index bd038ef..2a092c8 100644
--- a/fs/proc/cmdline.c
+++ b/fs/proc/cmdline.c
@@ -3,6 +3,14 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>

+
+static int inline new_func(int a)
+{
+   static int var=3;
+   var++;
+   return var+a+3;
+}
+
 static int inline static_func(int a);

 static int cmdline_proc_show(struct seq_file *m, void *v)
@@ -15,7 +23,7 @@ static int cmdline_proc_show(struct seq_file *m, void *v)
            seq_printf(m, "%s\n", "hello");
        }
    }
-   return 0;
+   return new_func(3);
 }

 static int inline static_func(int a)
-- 
1.7.7

There's definitely a bug there, though I'm not sure exactly what the problem is yet.

The second issue:

Before:

Relocation section [12] '.rela.text.cmdline_proc_show' for section [11] '.text.cmdline_proc_show' at offset 0xbea8 contains 11 entries:
  Offset              Type            Value               Addend Name
  0x0000000000000001  X86_64_NONE     000000000000000000      -4 __fentry__
  0x0000000000000009  X86_64_PC32     000000000000000000      -4 saved_command_line
  0x0000000000000019  X86_64_32S      000000000000000000      +8 .rodata.str1.1
  0x0000000000000021  X86_64_PC32     000000000000000000      -4 seq_printf
  0x0000000000000033  X86_64_PC32     000000000000000000      -4 .data.var.18451
  0x0000000000000039  X86_64_PC32     000000000000000000      -4 .data.var.18447
  0x0000000000000042  X86_64_PC32     000000000000000000      -4 .data.var.18451
  0x0000000000000053  X86_64_32S      000000000000000000      +8 .rodata.str1.1
  0x000000000000005c  X86_64_PC32     000000000000000000      -4 .data.var.18447
  0x0000000000000063  X86_64_32S      000000000000000000     +12 .rodata.str1.1
  0x0000000000000068  X86_64_PC32     000000000000000000      -4 seq_printf

After:

Relocation section [ 7] '.rela.text.cmdline_proc_show' for section [ 6] '.text.cmdline_proc_show' at offset 0xbec0 contains 13 entries:
  Offset              Type            Value               Addend Name
  0x0000000000000001  X86_64_NONE     000000000000000000      -4 __fentry__
  0x0000000000000009  X86_64_PC32     000000000000000000      -4 saved_command_line
  0x0000000000000019  X86_64_32S      000000000000000000      +0 .rodata.str1.1
  0x0000000000000021  X86_64_PC32     000000000000000000      -4 seq_printf
  0x000000000000002c  X86_64_PC32     000000000000000000      -4 .data.var.18444
  0x000000000000003c  X86_64_PC32     000000000000000000      -4 .data.var.18444
  0x0000000000000043  X86_64_PC32     000000000000000000      -4 .data.var.18455
  0x0000000000000049  X86_64_PC32     000000000000000000      -4 .data.var.18451
  0x0000000000000052  X86_64_PC32     000000000000000000      -4 .data.var.18455
  0x0000000000000063  X86_64_32S      000000000000000000      +0 .rodata.str1.1
  0x000000000000006c  X86_64_PC32     000000000000000000      -4 .data.var.18451
  0x0000000000000073  X86_64_32S      000000000000000000      +4 .rodata.str1.1
  0x0000000000000078  X86_64_PC32     000000000000000000      -4 seq_printf

In theory the correlation mapping should be:

var.18451 -> var.18455
var.18447 -> var.18451
  (new)   -> var.18444

But the code tried to map it as:

var.18451 -> var.18444
var.18447 -> var.18455
  (new)   -> var.18451

There is no possible algorithm for coming up with this mapping which would always be correct in all scenarios. Therefore I think we will need to add some more restrictions.

@zhouchengming1
Copy link
Contributor Author

Ok, so two issues here.

First, I was able to recreate the error:

ERROR: cmdline.o: symbol changed sections: .data.var.18451
with this patch:

Subject: [PATCH] test patch for correlating static locals

Signed-off-by: Zhou Chengming zhouchengming1@huawei.com

fs/proc/cmdline.c | 10 +++++++++-
1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/fs/proc/cmdline.c b/fs/proc/cmdline.c
index bd038ef..2a092c8 100644
--- a/fs/proc/cmdline.c
+++ b/fs/proc/cmdline.c
@@ -3,6 +3,14 @@
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

+static int inline new_func(int a)
+{

  • static int var=3;
  • var++;
  • return var+a+3;
    +}

static int inline static_func(int a);

static int cmdline_proc_show(struct seq_file *m, void *v)
@@ -15,7 +23,7 @@ static int cmdline_proc_show(struct seq_file *m, void *v)
seq_printf(m, "%s\n", "hello");
}
}

  • return 0;
  • return new_func(3);
    }

static int inline static_func(int a)

1.7.7
There's definitely a bug there, though I'm not sure exactly what the problem is yet.

** .data.var.18451 is the symbol of the section, we undo the correlation of the symbol var.18451 and its section indeed, but no the symbol of the section . **
So the function kpatch_compare_correlated_symbol find that the symbol .data.var.18451 has a twin, but its section don't have, and report this error.
In the process of undo correlation, we should include the symol of the section (.data.var.18451) too.

The second issue:

Before:

Relocation section [12] '.rela.text.cmdline_proc_show' for section [11] '.text.cmdline_proc_show' at offset 0xbea8 contains 11 entries:
Offset Type Value Addend Name
0x0000000000000001 X86_64_NONE 000000000000000000 -4 fentry
0x0000000000000009 X86_64_PC32 000000000000000000 -4 saved_command_line
0x0000000000000019 X86_64_32S 000000000000000000 +8 .rodata.str1.1
0x0000000000000021 X86_64_PC32 000000000000000000 -4 seq_printf
0x0000000000000033 X86_64_PC32 000000000000000000 -4 .data.var.18451
0x0000000000000039 X86_64_PC32 000000000000000000 -4 .data.var.18447
0x0000000000000042 X86_64_PC32 000000000000000000 -4 .data.var.18451
0x0000000000000053 X86_64_32S 000000000000000000 +8 .rodata.str1.1
0x000000000000005c X86_64_PC32 000000000000000000 -4 .data.var.18447
0x0000000000000063 X86_64_32S 000000000000000000 +12 .rodata.str1.1
0x0000000000000068 X86_64_PC32 000000000000000000 -4 seq_printf
After:

Relocation section [ 7] '.rela.text.cmdline_proc_show' for section [ 6] '.text.cmdline_proc_show' at offset 0xbec0 contains 13 entries:
Offset Type Value Addend Name
0x0000000000000001 X86_64_NONE 000000000000000000 -4 fentry
0x0000000000000009 X86_64_PC32 000000000000000000 -4 saved_command_line
0x0000000000000019 X86_64_32S 000000000000000000 +0 .rodata.str1.1
0x0000000000000021 X86_64_PC32 000000000000000000 -4 seq_printf
0x000000000000002c X86_64_PC32 000000000000000000 -4 .data.var.18444
0x000000000000003c X86_64_PC32 000000000000000000 -4 .data.var.18444
0x0000000000000043 X86_64_PC32 000000000000000000 -4 .data.var.18455
0x0000000000000049 X86_64_PC32 000000000000000000 -4 .data.var.18451
0x0000000000000052 X86_64_PC32 000000000000000000 -4 .data.var.18455
0x0000000000000063 X86_64_32S 000000000000000000 +0 .rodata.str1.1
0x000000000000006c X86_64_PC32 000000000000000000 -4 .data.var.18451
0x0000000000000073 X86_64_32S 000000000000000000 +4 .rodata.str1.1
0x0000000000000078 X86_64_PC32 000000000000000000 -4 seq_printf
In theory the correlation mapping should be:

var.18451 -> var.18455
var.18447 -> var.18451
(new) -> var.18444
But the code tried to map it as:

var.18451 -> var.18444
var.18447 -> var.18455
(new) -> var.18451
There is no possible algorithm for coming up with this mapping which would always be correct in all scenarios. Therefore I think we will need to add some more restrictions.

You are right, when I solved the first issue, this second issue caused another error.
cmdline.o: data section .data.var.16219 selected for inclusion

Because the var.16219 was correlated to a wrong static local , so the kpatch-build thought its section data ( initial value) was changed, and reported this error.

Thanks

@jpoimboe
Copy link
Member

Thanks, this was an excellent test case BTW. I'll have to figure out a way to incorporate it into the integration tests. For reference:

From 54d238d4942ffe53f67884a7f0b2fc9dc57236ac Mon Sep 17 00:00:00 2001
From: Zhou Chengming <zhouchengming1@huawei.com>
Date: Tue, 17 Nov 2015 20:59:30 +0800
Subject: [PATCH] test base for correlating static locals


Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
---
 fs/proc/cmdline.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/fs/proc/cmdline.c b/fs/proc/cmdline.c
index cbd82df..bd038ef 100644
--- a/fs/proc/cmdline.c
+++ b/fs/proc/cmdline.c
@@ -3,15 +3,37 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>

+static int inline static_func(int a);
+
 static int cmdline_proc_show(struct seq_file *m, void *v)
 {
    seq_printf(m, "%s\n", saved_command_line);
+   if (v == NULL) {
+       static int var=1;
+       if (var+static_func(var)) {
+           var++;
+           seq_printf(m, "%s\n", "hello");
+       }
+   }
    return 0;
 }

+static int inline static_func(int a)
+{
+   static int var=2;
+   var++;
+   return var+a;
+}
+
 static int cmdline_proc_open(struct inode *inode, struct file *file)
 {
+   if (static_func(2))
    return single_open(file, cmdline_proc_show, NULL);
+   else {
+       static int var=3;
+       var++;
+       return var;
+   }
 }

 static const struct file_operations cmdline_proc_fops = {
-- 
1.7.7
From 2b98a061983e51e7e9a820af4e468ba2a1fc5b7c Mon Sep 17 00:00:00 2001
From: Zhou Chengming <zhouchengming1@huawei.com>
Date: Tue, 17 Nov 2015 21:02:03 +0800
Subject: [PATCH] test patch for correlating static locals

Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
---
 fs/proc/cmdline.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/fs/proc/cmdline.c b/fs/proc/cmdline.c
index bd038ef..2a092c8 100644
--- a/fs/proc/cmdline.c
+++ b/fs/proc/cmdline.c
@@ -3,6 +3,14 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>

+
+static int inline new_func(int a)
+{
+   static int var=3;
+   var++;
+   return var+a+3;
+}
+
 static int inline static_func(int a);

 static int cmdline_proc_show(struct seq_file *m, void *v)
@@ -15,7 +23,7 @@ static int cmdline_proc_show(struct seq_file *m, void *v)
            seq_printf(m, "%s\n", "hello");
        }
    }
-   return 0;
+   return new_func(3);
 }

 static int inline static_func(int a)
-- 
1.7.7

@jpoimboe
Copy link
Member

I propose the following restrictions for static locals:

  1. No removal of any static local references.
  2. If there is more than one static local variable with the same name in the same function, their order of reference must not change. For example, if A.1 is accessed before A.2 in the original function, A.1 must also be accessed before A.2 in the new function.
  3. A new reference to a static local cannot be added to a function unless it's a reference to a new variable with a unique name within the function. This means if there's already a reference to a static local named 'var' in the function, the patch cannot add another reference to that 'var' or another 'var'.

These restrictions are more limiting but will make it much safer IMO. And I think there should usually be ways to work around these restrictions if you get creative with modifying the patch.

So I think I'll need to rewrite the code yet again.

@jpoimboe
Copy link
Member

I've fixed the first issue with #564.

@zhouchengming1
Copy link
Contributor Author

I propose the following restrictions for static locals:

No removal of any static local references.
If there is more than one static local variable with the same name in the same function, their order of reference must not change. For example, if A.1 is accessed before A.2 in the original function, A.1 must also be accessed before A.2 in the new function.

After seeing the relocation table above, I am worried that this restriction is hard to get. Because a little change (no new static local) may change the order of reference, and kpatch-build won't know it.

A new reference to a static local cannot be added to a function unless it's a reference to a new variable with a unique name within the function. This means if there's already a reference to a static local named 'var' in the function, the patch cannot add another reference to that 'var' or another 'var'.
These restrictions are more limiting but will make it much safer IMO. And I think there should usually be ways to work around these restrictions if you get creative with modifying the patch.

So I think I'll need to rewrite the code yet again.

@jpoimboe
Copy link
Member

After seeing the relocation table above, I am worried that this restriction is hard to get. Because a little change (no new static local) may change the order of reference, and kpatch-build won't know it.

Yes, there's no way for kpatch-build to enforce this restriction, and there's a very small chance that it could happen. The best we can do is clearly document this restriction.

The only other option I can think of is to make it even more restrictive: disallow patching of any functions which reference two static local variables of the same name.

jpoimboe added a commit to jpoimboe/kpatch that referenced this issue Sep 11, 2017
Normally, kpatch doesn't complain if you remove or rename a function.
This is a feature, because sometimes you have to rename a function in
order to patch it, if for example it doesn't have an fentry call.  In
the object code, it's treated as a new function.  You could get the same
result by copying/pasting the original function and giving the copy a
new name.  But renaming it makes it much easier to review the patch.

In RHEL 7.4, I tried to rename l2cap_config_rsp() to
l2cap_config_rsp_kpatch(), but it failed with:

  ERROR: l2cap_core.o: reference to static local variable CSWTCH.347 in l2cap_config_rsp was removed

This particular error is an easy fix, because the CSWTCH.* symbols are
read-only and are created by GCC.  So they shouldn't be correlated
anyway.

In the future, we will need a more general fix to allow the removal of
functions which use *any* static local variables.  Either automatically,
or by adding a manual annotation.  This can be handled when we rewrite
the static local variable handling in dynup#545.
@github-actions
Copy link

This issue has been open for 30 days with no activity and no assignee. It will be closed in 7 days unless a comment is added.

@github-actions github-actions bot added the stale label Jul 21, 2023
@github-actions
Copy link

This issue was closed because it was inactive for 7 days after being marked stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants