Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

augeas plugin might crash #3120

Open
markus2330 opened this issue Oct 27, 2019 · 18 comments
Assignees
Labels
Milestone

Comments

@markus2330
Copy link
Contributor

@markus2330 markus2330 commented Oct 27, 2019

When doing kdb ls I got a crash in Augeas:

==14172== Callgrind, a call-graph generating cache profiler
==14172== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
==14172== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==14172== Command: kdb ls /
==14172== 
==14172== For interactive control, run 'callgrind_control -h'.
==14172== brk segment overflow in thread #1: can't grow to 0x4830000
==14172== (see section Limitations in user manual)
==14172== NOTE: further instances of this message will not be shown
free(): double free detected in tcache 2

Sorry, I crashed by the signal SIGABRT
This should not have happened!

Please report the issue at https://issues.libelektra.org/
==14172== 
==14172== Process terminating with default action of signal 6 (SIGABRT): dumping core
==14172==    at 0x4B097BB: raise (raise.c:51)
==14172==    by 0x4AF4534: abort (abort.c:79)
==14172==    by 0x1D631B: catchSignal(int) (main.cpp:110)
==14172==    by 0x4B0983F: ??? (in /lib/x86_64-linux-gnu/libc-2.28.so)
==14172==    by 0x4B097BA: raise (internal-signals.h:84)
==14172==    by 0x4AF4534: abort (abort.c:79)
==14172==    by 0x4B4B507: __libc_message (libc_fatal.c:181)
==14172==    by 0x4B51C19: malloc_printerr (malloc.c:5341)
==14172==    by 0x4B536FC: _int_free (malloc.c:4193)
==14172==    by 0x4B41880: fclose@@GLIBC_2.2.5 (iofclose.c:77)
==14172==    by 0x4F09313: elektraAugeasGet (augeas.c:540)
==14172==    by 0x4908B3C: kdbGet (kdb.c:554)
==14172== 
==14172== Events    : Ir
==14172== Collected : 30693614075
==14172== 
==14172== I   refs:      30,693,614,075
zsh: abort      valgrind --tool=callgrind kdb ls /

The problem was quite obviously introduced by 339d34d which does not return correctly in the error cases anymore.

A code review is needed if other places with return in macros are affected.

@Piankero

This comment has been minimized.

Copy link
Contributor

@Piankero Piankero commented Oct 28, 2019

Could you please give me a reproducible example on how to trigger this error?
I pulled the latest master, installed it (with augeas plugin) and do not experience any problems.

@markus2330

This comment has been minimized.

Copy link
Contributor Author

@markus2330 markus2330 commented Oct 28, 2019

The error obviously only occurs if fopen fails. It could be reproduced by using LD_PRELOAD (e.g. see https://stackoverflow.com/questions/35771395/why-doesnt-ld-preload-trick-catch-open-when-called-by-fopen) with a shared library that fails on the Nth call of fopen. If you really want to do that, I can help you how to do it. It might also help to find other segfaults or unhelpful error messages.

Otherwise, as already written above, I think code review is the more efficient approach to find these problems.

@markus2330 markus2330 added the bug label Nov 11, 2019
@markus2330

This comment has been minimized.

Copy link
Contributor Author

@markus2330 markus2330 commented Nov 11, 2019

Any progress here? I got the same segfault again.

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: Datei oder Verzeichnis nicht gefunden.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fdad623e535 in __GI_abort () at abort.c:79
#2  0x000055e1872b330c in catchSignal (signum=<optimized out>) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/tools/kdb/main.cpp:110
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5  0x00007fdad623e535 in __GI_abort () at abort.c:79
#6  0x00007fdad6295508 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fdad63a028d "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#7  0x00007fdad629bc1a in malloc_printerr (str=str@entry=0x7fdad63a1f58 "free(): double free detected in tcache 2") at malloc.c:5341
#8  0x00007fdad629d6fd in _int_free (av=0x7fdad63d7c40 <main_arena>, p=0x55e18e5b2c30, have_lock=<optimized out>) at malloc.c:4193
#9  0x00007fdad628b881 in _IO_new_fclose (fp=fp@entry=0x55e18e5b2c40) at iofclose.c:77
#10 0x00007fdad5fa3314 in elektraAugeasGet (handle=<optimized out>, returned=0x55e1892bcdd0, parentKey=0x55e1891f61d0)
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/plugins/augeas/augeas.c:540
#11 0x00007fdad659dc7d in elektraGetDoUpdate (parentKey=0x55e1891f61d0, split=0x55e1892b8320) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/libs/elektra/kdb.c:592
#12 kdbGet (handle=<optimized out>, ks=0x55e1891b0390, parentKey=0x55e1891f61d0) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/libs/elektra/kdb.c:1236
#13 0x000055e187243a2a in kdb::KDB::get (this=<optimized out>, returned=..., parentKey=...)
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/bindings/cpp/include/keyset.hpp:592
#14 0x000055e1872612b2 in FindCommand::execute (this=<optimized out>, cl=...) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/tools/kdb/find.cpp:33
#15 0x000055e18724194d in main (argc=<optimized out>, argv=0x7ffc15dd7bc8) at /usr/include/c++/8/bits/unique_ptr.h:342
@Piankero

This comment has been minimized.

Copy link
Contributor

@Piankero Piankero commented Nov 11, 2019

I will add it higher in my priority

@Piankero Piankero referenced this issue Nov 13, 2019
0 of 16 tasks complete
@markus2330 markus2330 added this to the 0.9.1 milestone Nov 15, 2019
@markus2330 markus2330 added the urgent label Nov 15, 2019
@markus2330

This comment has been minimized.

Copy link
Contributor Author

@markus2330 markus2330 commented Nov 15, 2019

Any progress here? It is only about adding a few return statements and such crashes are really severe.

@Piankero

This comment has been minimized.

Copy link
Contributor

@Piankero Piankero commented Nov 15, 2019

I will Look at it in sunday/monday morning...

@markus2330

This comment has been minimized.

Copy link
Contributor Author

@markus2330 markus2330 commented Nov 15, 2019

Thank you. The faster the better as this (and #3197) are release blockers and lcdproc is waiting for a release.

@Piankero

This comment has been minimized.

Copy link
Contributor

@Piankero Piankero commented Nov 17, 2019

If you really want to do that, I can help you how to do it. It might also help to find other segfaults or unhelpful error messages.

I fear that you need to help me because I cannot reproduce this error. Do you have a mounted yml file which has some problems in it or why exactly is the augeas plugin called?

@markus2330

This comment has been minimized.

Copy link
Contributor Author

@markus2330 markus2330 commented Nov 17, 2019

Please look at the code. It is easy to see in the code what was done wrongly but such error is difficult to reproduce in all its forms.

Without looking at the code, we can never get Elektra as stable as it was before introducing the new errors. Plenty of regressions were introduced in #2686, this cannot be fixed without understanding the code. return also exists in Java, so the semantics should be clear to you.

@Piankero

This comment has been minimized.

Copy link
Contributor

@Piankero Piankero commented Nov 17, 2019

Please look at the code.

I did it more than once ... The whole stacktrace tells me that when I try to close the file descriptor, there is a double "free()" call (free(): double free detected in tcache 2). Though I absolutely cannot relate this to anything I have changed.

It is easy to see in the code what was done wrongly

No it is absolutely not for me because this is related to errors in memory management. Also your sentence is partly insulting as you assume I am too dumb to find the problem because it is "easy"...

return also exists in Java, so the semantics should be clear to you.

Yes but not memory management or this kind of file opening/closing mechanism and as far as I understand this error it is related to this

@markus2330

This comment has been minimized.

Copy link
Contributor Author

@markus2330 markus2330 commented Nov 17, 2019

Also your sentence is partly insulting as you assume I am too dumb to find the problem because it is "easy"...

This is your interpretation. "easy" was about that the code changes should not change any semantics, so it is enough to check if semantics were changed. What I said (and still say) is that you should look at the code without any assumptions of free/malloc/memory.

So forget everything about the stacktrace. Only look at 339d34d and about where return has been before and where it is missing now. One more hint: Macros are simple textual replacements, returning from them means to return from wherever the macro is used.

@Piankero

This comment has been minimized.

Copy link
Contributor

@Piankero Piankero commented Nov 18, 2019

So forget everything about the stacktrace. Only look at 339d34d and about where return has been before and where it is missing now. One more hint: Macros are simple textual replacements, returning from them means to return from wherever the macro is used.

Ok thank you, this was the information I was missing. I was on the wrong path before.
I opened a PR here: #3244

@markus2330

This comment has been minimized.

Copy link
Contributor Author

@markus2330 markus2330 commented Nov 18, 2019

Was this the only occurrence of a macro that had a return in the whole error changes?

@Piankero

This comment has been minimized.

Copy link
Contributor

@Piankero Piankero commented Nov 18, 2019

Btw, augeas was the only case which I remember that had this sort of macro replacement. I replaced all three occurrences.

I will take an extra look this evening in the old revision to check for that

@markus2330

This comment has been minimized.

Copy link
Contributor Author

@markus2330 markus2330 commented Nov 18, 2019

Thank you!

@Piankero

This comment has been minimized.

Copy link
Contributor

@Piankero Piankero commented Nov 18, 2019

I have looked through the error settings in v0.8.26 and as far as I can tell it did not occur somewhere else than in the augeas section.

So I think after we merge the PR this can be closed

@markus2330

This comment has been minimized.

Copy link
Contributor Author

@markus2330 markus2330 commented Nov 18, 2019

Thank you for looking! So mounting and augeas were the only regressions.

@Piankero

This comment has been minimized.

Copy link
Contributor

@Piankero Piankero commented Nov 20, 2019

I think we may close this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.