ast_coredumper: Increase reliability #446

Merged

@gtjoseph
Member

gtjoseph commented Nov 13, 2023

Instead of searching for the asterisk binary and the modules in the
filesystem, we now get their locations, along with libdir, from
the coredump itself...

For the binary, we can use gdb -c <coredump> ... "info proc exe".
gdb can print this even without having the executable and symbols.
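
A minimal sketch of this step, not the code from the PR: the helper names `parse_proc_exe` and `exe_from_core` are hypothetical, and the parser assumes gdb reports the executable on a line of the form `exe = '<path>'`.

```shell
# Extract the path from an "info proc exe" line.
# Assumed gdb output format: exe = '/usr/sbin/asterisk'
parse_proc_exe() {
    sed -n "s/^exe = '\(.*\)'\$/\1/p" | head -n 1
}

# Hypothetical wrapper: ask gdb for the executable recorded in a coredump.
# $1 = path to the coredump. No binary or symbols are passed to gdb.
exe_from_core() {
    gdb -batch -c "$1" -ex "info proc exe" 2>/dev/null | parse_proc_exe
}
```

Calling `exe_from_core core` would then print whatever path the coredump recorded, or nothing if gdb could not report one.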

Once we have the binary, we can get the location of the modules with
gdb ... "print ast_config_AST_MODULE_DIR".

If there was no result then either it's not an asterisk coredump
or there were no symbols loaded. Either way, it's not usable.
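
The module directory lookup could be sketched roughly as follows. The helper names are hypothetical, and the parser assumes gdb prints the variable as `$1 = <address> "<path>"`. An empty result corresponds to the "not usable" case described above.

```shell
# Pull the quoted string out of gdb's "print" output.
# Assumed format: $1 = 0x5583... "/usr/lib/asterisk/modules"
parse_gdb_string() {
    sed -n 's/^\$1 = [^"]*"\([^"]*\)"$/\1/p' | head -n 1
}

# Hypothetical wrapper: $1 = asterisk binary, $2 = coredump.
# Prints the module directory, or nothing if the symbol is unavailable.
moddir_from_core() {
    gdb -batch -c "$2" -ex "print ast_config_AST_MODULE_DIR" "$1" 2>/dev/null |
        parse_gdb_string
}
```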

For libdir, we now run "strings" on the note0 section of the
coredump (which has the shared library -> memory address xref) and
search for "libasteriskssl|libasteriskpj", then take the dirname.
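
As a rough illustration of the search-and-dirname step only: the function below reads the `strings` output on stdin and deliberately leaves out how the note0 section is extracted from the coredump. The name `libdir_from_strings` is hypothetical.

```shell
# Read candidate strings on stdin, find the first path that mentions one of
# the asterisk support libraries, and print its directory.
libdir_from_strings() {
    lib=$(grep -E 'libasteriskssl|libasteriskpj' | head -n 1)
    # No match means we cannot determine libdir from this input.
    [ -n "$lib" ] || return 1
    dirname "$lib"
}
```

Fed a list containing `/usr/lib/libasteriskssl.so.1`, this prints `/usr/lib`, which matches the `LIBDIR` value seen in the transcripts further down.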

Since we're now getting everything from the coredump, it has to be
correct as long as we're not crossing namespace boundaries like
running asterisk in a docker container but trying to run
ast_coredumper from the host using a shared file system (which you
shouldn't be doing).

There is still a case for using --asterisk-bin and/or --libdir: If
you've updated asterisk since the coredump was taken, the binary,
libraries and modules won't match the coredump which will render it
useless. If you can restore or rebuild the original files that
match the coredump and place them in a temporary directory, you can
use --asterisk-bin, --libdir, and a new --moddir option to point to
them and they'll be correctly captured in a tarball created
with --tarball-coredumps. If you also use --tarball-config, you can
use a new --etcdir option to point to what normally would be the
/etc/asterisk directory.
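
To make the override case concrete, here is a sketch that only prints the command line it would run. The restore directory layout (`sbin/`, `lib/`, `etc/`) is purely hypothetical, and the `--option=value` spelling is an assumption; check the script's own help text before relying on it.

```shell
# Build (but do not run) an ast_coredumper command that points at restored
# files. $1 = hypothetical directory holding the original files,
# $2 = coredump to process.
coredumper_cmd() {
    restore=$1
    core=$2
    printf '%s\n' "/var/lib/asterisk/scripts/ast_coredumper --asterisk-bin=$restore/sbin/asterisk --libdir=$restore/lib --moddir=$restore/lib/asterisk/modules --etcdir=$restore/etc/asterisk --tarball-coredumps --tarball-config $core"
}
```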

Also addressed various "shellcheck" issues.

Resolves: #445

@gtjoseph
Member Author

cherry-pick-to: 18
cherry-pick-to: 20
cherry-pick-to: 21

@gtjoseph
Member Author

Found a few Debian/Ubuntu-related things. Need to fix.

@InterLinked1
Contributor

Found a few Debian/Ubuntu-related things. Need to fix.

This is with the current core dumper, not with your change, but somehow it seems to fail on Debian sometimes, even with the standard paths:

root@debian:/usr/src/asterisk-21.0.0# ls -la core
-rw------- 1 root root 180887552 Nov 16 16:49 core
root@debian:/usr/src/asterisk-21.0.0# /var/lib/asterisk/scripts/ast_coredumper core
No coredumps found
root@debian:/usr/src/asterisk-21.0.0# ls -la core
-rw------- 1 root root 180887552 Nov 16 16:49 core

@gtjoseph
Member Author

@InterLinked1 Try this version and see if it resolves your issues in various environments.

@@ -411,7 +458,7 @@ find_pid() {
 # Now that we have the pids, let's get the command and
 # its args. We'll add them to an array indexed by pid.
 declare -a candidates
-while read LINE ; do
+while read -r LINE ; do
Contributor

Could potentially avoid the regex with:

while read -r pid prog args ; do

Member Author

I did it this way so that if, for some reason, the result didn't match the regex, we'd skip that line and continue. Otherwise we couldn't be sure that we got a valid line.
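
The trade-off discussed here can be sketched as below. This is not the loop from `find_pid()`: the function name, the pattern, and the output format are all assumptions made for illustration. The point is that a line is validated as a whole before any field is trusted, whereas `read -r pid prog args` would accept any input and simply split it.

```shell
# Read "<pid> <program> [args]" lines on stdin and print "pid:command".
# Lines that do not match the expected shape are skipped, not split blindly.
list_candidates() {
    while read -r LINE ; do
        # Validate the whole line first; anything else is ignored.
        printf '%s\n' "$LINE" | grep -Eq '^[0-9]+ +[^ ]+' || continue
        pid=${LINE%% *}      # everything before the first space
        rest=${LINE#* }      # everything after the first space
        printf '%s:%s\n' "$pid" "$rest"
    done
}
```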

@InterLinked1
Contributor

InterLinked1 commented Nov 29, 2023

@InterLinked1 Try this version and see if it resolves your issues in various environments.

Still no bueno:

root@debian:/usr/src/asterisk-21.0.0# ls -la core
-rw------- 1 root root 226508800 Nov 29 12:49 core
root@debian:/usr/src/asterisk-21.0.0# /var/lib/asterisk/scripts/ast_coredumper core
No coredumps found

Ignore me, I forgot to run make install to move the scripts.

Works much better now:

root@debian:/usr/src/asterisk-21.0.0# /var/lib/asterisk/scripts/ast_coredumper core
readlink: missing operand
Try 'readlink --help' for more information.
Examining core
    Does appear to be an asterisk coredump
    Coredump indicates executable 'asterisk'
    Searching for asterisk module directory
    Found asterisk module directory '/usr/lib/asterisk/modules'
Processing /usr/src/asterisk-21.0.0/core
    ASTBIN: asterisk
    MODDIR: /usr/lib/asterisk/modules
    ETCDIR: /etc/asterisk
    LIBDIR: /usr/lib
    Renaming /usr/src/asterisk-21.0.0/core to /usr/src/asterisk-21.0.0/core-asterisk-2023-11-29T17-49-18Z
    Creating /tmp/core-asterisk-2023-11-29T17-49-18Z-thread1.txt
    Creating /tmp/core-asterisk-2023-11-29T17-49-18Z-brief.txt
    Creating /tmp/core-asterisk-2023-11-29T17-49-18Z-full.txt
    Creating /tmp/core-asterisk-2023-11-29T17-49-18Z-locks.txt
    Creating /tmp/core-asterisk-2023-11-29T17-49-18Z-info.txt

@gtjoseph
Member Author

root@debian:/usr/src/asterisk-21.0.0# /var/lib/asterisk/scripts/ast_coredumper core
readlink: missing operand
Try 'readlink --help' for more information.
Examining core
    Does appear to be an asterisk coredump
    Coredump indicates executable 'asterisk'
    Searching for asterisk module directory
    Found asterisk module directory '/usr/lib/asterisk/modules'
Processing /usr/src/asterisk-21.0.0/core
    ASTBIN: asterisk
    MODDIR: /usr/lib/asterisk/modules
    ETCDIR: /etc/asterisk
    LIBDIR: /usr/lib
    Renaming /usr/src/asterisk-21.0.0/core to /usr/src/asterisk-21.0.0/core-asterisk-2023-11-29T17-49-18Z
    Creating /tmp/core-asterisk-2023-11-29T17-49-18Z-thread1.txt
    Creating /tmp/core-asterisk-2023-11-29T17-49-18Z-brief.txt
    Creating /tmp/core-asterisk-2023-11-29T17-49-18Z-full.txt
    Creating /tmp/core-asterisk-2023-11-29T17-49-18Z-locks.txt
    Creating /tmp/core-asterisk-2023-11-29T17-49-18Z-info.txt

The readlink: missing operand should have been suppressed since you specified a core file on the command line. Just pushed up a fix for that.

@jcolp
Member

jcolp commented Dec 4, 2023

@InterLinked1 Is this now in a working state for you?

@InterLinked1
Contributor

@InterLinked1 Is this now in a working state for you?

Sorry - yup, it is, and that other spurious warning is gone now, too.
I'm not sure whether there was a recent regression, but ast_coredumper had stopped working on all of my machines (it had mostly worked in the past). With this PR, though, all is well again.

@gtjoseph gtjoseph added the cherry-pick-test Trigger dry run of cherry-picks label Dec 6, 2023
@github-actions github-actions bot added cherry-pick-testing-in-progress Cherry-Pick tests in progress cherry-pick-checks-passed Cherry-Pick checks passed cherry-pick-gates-failed Cherry-Pick gates failed and removed cherry-pick-test Trigger dry run of cherry-picks cherry-pick-testing-in-progress Cherry-Pick tests in progress labels Dec 6, 2023
@InterLinked1
Contributor

One more thing I just noticed now:

root@ess:/usr/src/asterisk-21.0.0# /var/lib/asterisk/scripts/ast_coredumper --RUNNING
Found a single asterisk instance running as process 848952
Dumping running asterisk process to /tmp/core-asterisk-running-2023-12-06T16-39-34Z
Dump is complete.
Couldn't get module directory from coredump!

However, the backtrace seems to have been extracted successfully, so whatever happened didn't prevent it from working.

github-actions bot commented Dec 6, 2023

Successfully merged to branch master and cherry-picked to ["18","20","21"]

@gtjoseph gtjoseph deleted the master-coredumper-fix branch March 20, 2024 12:28
Successfully merging this pull request may close these issues.

[bug]: ast_coredumper isn't figuring out file locations properly in all cases