Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add integration test with a static go binary is scoped, and uses tls transport #994

Closed
jrcheli opened this issue Jun 9, 2022 · 6 comments · Fixed by #1000
Closed

Add integration test with a static go binary is scoped, and uses tls transport #994

jrcheli opened this issue Jun 9, 2022 · 6 comments · Fixed by #1000
Assignees
Milestone

Comments

@jrcheli
Copy link
Contributor

jrcheli commented Jun 9, 2022

(belongs in transport test, maybe?)

This is related to the finding in #992.

@jrcheli jrcheli changed the title Add integration test where a static go binary is scoped, and uses tls Add integration test with a static go binary is scoped, and uses tls transport Jun 9, 2022
@jrcheli
Copy link
Contributor Author

jrcheli commented Jun 13, 2022

I added a test case to the existing transport test suite.
To "test the test", I did the following:

edited contrib/Makefile to delete the "no-threads" build flag for openssl 
make clean all test
cd test/integration
make transport

This failed as expected, since the "no-threads" build flag is required.
Then I did a git checkout contrib/Makefile to effectively add the no-threads build flag back in, rebuilt, and reran the transport integration test.

The test did not pass. Um, what?. 😧

@jrcheli
Copy link
Contributor Author

jrcheli commented Jun 14, 2022

I was able to capture a backtrace of the unexpected crash.

cd test/integration
make transport-shell
/usr/local/scope/scope-test
apt update
apt install gdb
gdb --args scope run -c tls://127.0.0.1:10090 -- /opt/test/bin/plainClientStatic
run

This is the output:

(gdb) run
Starting program: /usr/local/scope/bin/scope run -c tls://127.0.0.1:10090 -- /opt/test/bin/plainClientStatic
process 637 is executing new program: /tmp/libscope-1.1.0/ldscopedyn
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
process 637 is executing new program: /tmp/libscope-1.1.0/ldscopedyn
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
process 637 is executing new program: /opt/appscope/bin/linux/x86_64/scope
[New LWP 641]
[New LWP 642]
[New LWP 643]
[LWP 643 exited]
[LWP 642 exited]
[LWP 641 exited]
process 637 is executing new program: /root/.scope/ldscope
process 637 is executing new program: /tmp/libscope-1.1.0/ldscopedyn
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6269700 (LWP 644)]
[New Thread 0x8dd9b0 (LWP 645)]
[New Thread 0x8dd9b0 (LWP 646)]
[New Thread 0x8dd9b0 (LWP 647)]
Response status: 200 OK
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
[New Thread 0xc000038090 (LWP 648)]
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rel="author" href="mailto:webmasters@gnu.org" />

Thread 9 "ldscopedyn" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xc000039c90 (LWP 648)]
0x00007ffff6f20cbb in CONF_parse_list () from /tmp/libscope-1.1.0/libscope.so
(gdb) bt
#0  0x00007ffff6f20cbb in CONF_parse_list () from /tmp/libscope-1.1.0/libscope.so
#1  0x00007ffff6eb1bee in SSL_CTX_set_ciphersuites () from /tmp/libscope-1.1.0/libscope.so
#2  0x00007ffff6ebd334 in SSL_CTX_new_ex () from /tmp/libscope-1.1.0/libscope.so
#3  0x00007ffff6defabe in establishTlsSession (trans=0x7ffff6d0ce70) at src/transport.c:395
#4  0x00007ffff6df071a in checkPendingSocketStatus (trans=0x7ffff6d0ce70) at src/transport.c:755
#5  0x00007ffff6df10f0 in transportConnect (trans=0x7ffff6d0ce70) at src/transport.c:1030
#6  0x00007ffff6df709d in ctlConnect (ctl=0x7ffff667c1d0, who=CFG_CTL) at src/ctl.c:1150
#7  0x00007ffff6dd67fe in doConnection () at src/report.c:3410
#8  0x00007ffff6dd68e7 in doEvent () at src/report.c:3445
#9  0x00007ffff6da92c2 in reportPeriodicStuff () at src/wrap.c:897
#10 0x00007ffff6da9601 in handleExit () at src/wrap.c:991
#11 0x00007ffff6e027a0 in c_exit (stackaddr=0xc000045f60 "") at src/wrap_go.c:1949
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

@jrcheli
Copy link
Contributor Author

jrcheli commented Jun 14, 2022

Digging in some more, I think I understand what the problem is. CONF_parse_list uses isspace(). The code for isspace() was built using libc headers, and we're trying to run with our internal libc that has an incompatible implementation of isspace() in the musl headers.

When I look at the disassembly of CONF_parse_list, I can see it doing bitwise operations for the implementation of isspace():

  1f9cb6:       e8 85 5c e8 ff          callq  7f940 <__ctype_b_loc@plt>
  1f9cbb:       48 8b 00                mov    (%rax),%rax
  1f9cbe:       66 90                   xchg   %ax,%ax
  1f9cc0:       f6 44 58 01 20          testb  $0x20,0x1(%rax,%rbx,2)
  1f9cc5:       74 b9                   je     1f9c80 <CONF_parse_list+0x30>

This is consistent with the glibc type.h include file on my machine:

# define isspace(c) __isctype((c), _ISspace)

where _ISspace comes from a bit flag implementation (from same glibc type.h include file)

enum
{
  _ISupper = _ISbit (0),    /* UPPERCASE.  */
  _ISlower = _ISbit (1),    /* lowercase.  */
  _ISalpha = _ISbit (2),    /* Alphabetic.  */
  _ISdigit = _ISbit (3),    /* Numeric.  */
  _ISxdigit = _ISbit (4),   /* Hexadecimal numeric.  */
  _ISspace = _ISbit (5),    /* Whitespace.  */
  _ISprint = _ISbit (6),    /* Printing.  */
  _ISgraph = _ISbit (7),    /* Graphical.  */
  _ISblank = _ISbit (8),    /* Blank (usually SPC and TAB).  */
  _IScntrl = _ISbit (9),    /* Control character.  */
  _ISpunct = _ISbit (10),   /* Punctuation.  */
  _ISalnum = _ISbit (11)    /* Alphanumeric.  */
};

This stack overflow link was helpful when trying to look at this the first time...
https://stackoverflow.com/questions/37702434/ctype-b-loc-what-is-its-purpose

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Changing gears to look at our the type.h of our internal (musl) libc, isspace() has a completely different implementation that has nothing to do with bitwise comparisons:

static __inline int __isspace(int _c)
{
    return _c == ' ' || (unsigned)_c-'\t' < 5;
}
#define isspace(a) __isspace(a)

@jrcheli
Copy link
Contributor Author

jrcheli commented Jun 15, 2022

So what to do about this? After consulting with @michalbiesek, we understand that it should be reasonable to just switch from using __ctype_b_loc to using scopelibc___ctype_b_loc. Originally, I didn't think that musl was binary compatible in this way, but musl does provide a __ctype_b_loc that will work if we switch to use it. bminor/musl@9372655
So cool.

So the solution here is to switch from using glibc's __ctype_b_loc to instead use our own musl's version of the same.

@jrcheli
Copy link
Contributor Author

jrcheli commented Jun 16, 2022

The last commit here is to fix the test execution of the new transport test case on ARM.
We don't yet support go on ARM processors, so when I added the execution of plainClientStatic (a go app) to the transport test, the transport test started failing when run on ARM. The test showed a real problem, that we were trying to run go applications under ldscope on the ARM architecture when we shouldn't try to do that.

This last commit “just runs” go applications if we’re running on ARM (without running them under ldscope).

@jrcheli jrcheli self-assigned this Jun 17, 2022
@jrcheli jrcheli added this to the Next Minor (1.1.0) milestone Jun 17, 2022
@jrcheli
Copy link
Contributor Author

jrcheli commented Jun 17, 2022

@abetones - from a docs point of view... there are two issues to consider here:

  1. We saw a crash of a static go application because we should have been using a function from our internal libc but were accidentally using the function from glibc. There is no need to mention this in the docs, imho, because we're just adding internal libc in 1.1.0, so there isn't a chance that someone has encountered this before and since we've fixed it, nobody will ever encounter it.
  2. The last commit is something that could be called out in the changelog and/or in requirements, at your discretion. We've already said in the requirements "AppScope cannot:" ... "Instrument Go executables on ARM." This was true and still is.

While working on this ticket, we realized that if someone tries to scope Go on ARM before 1.1.0, we'd try to scope it and fail to be able to do so. As a result, the go app wouldn't get executed and scope would return an error code.
The fix we added was to make it so that if someone tries to scope Go on ARM with 1.1.0 or after, we'll make sure that the go app is executed. It won't be instrumented, (it will be as if it was run without scope), but it will be executed now.

TLDR for 2); we're handling a case of unsupported applications (Go on ARM) more elegantly than before.

abetones pushed a commit that referenced this issue Jun 21, 2022
jrcheli pushed a commit that referenced this issue Jun 22, 2022
* add #994 fix to Changelog

* almost all the edits

* updating to talk about cloud properly

* fix double Cribls

* fix a fix that was wrong

* add missing word

* tweak a sentence

* corrections per reviews

* clarify mute section

Co-authored-by: Abe Raher <abe@cribl.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant