test_pt_ls fails with Dyninst master #123

Closed
ssunny7 opened this issue Jul 14, 2016 · 20 comments

@ssunny7
Contributor

ssunny7 commented Jul 14, 2016

Both the create and rewriter variants of test_pt_ls fail with Dyninst's master branch. The testsuite was also built from its master branch.

Output from running the tests using the -log and -verbose options:

Commencing test(s) ...
Thu Jul 14 16:21:33 CDT 2016
Linux bigking.cs.wisc.edu 2.6.32-573.7.1.el6.x86_64 #1 SMP Thu Sep 10 13:42:16 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
TESTDIR=/afs/cs.wisc.edu/p/paradyn/development/ssunny/dyninst/testsuite-install/bin/testsuite
[Tests with none]

Enabling DyninstAPI parsing debug
ParseThat.C[79]:  resolved parseThat to /u/s/s/ssunny/dev-home/dyninst/dyninst-install/bin/parseThat
ParseThat.C[79]:  resolved parseThat to /u/s/s/ssunny/dev-home/dyninst/dyninst-install/bin/parseThat
ParseThat.C[304]:  parseThat: /u/s/s/ssunny/dev-home/dyninst/dyninst-install/bin/parseThat
ParseThat.C[343]:  about to issue command: 
        '/u/s/s/ssunny/dev-home/dyninst/dyninst-install/bin/parseThat -i 1 -p 1 -v 7 --summary -t 300 -T 0 -o ./binaries/test_pt_ls_output1 /bin/ls / 1>./binaries/test_pt_ls_stdout1 2>./binaries/test_pt_ls_stderr1'
ParseThat.C[352]:  parseThat cmd failed with code 1

Following is the relevant portion of the output from the debug parsing log for this test:

ParseThat: /u/s/s/ssunny/dev-home/dyninst/dyninst-code/dyninstAPI/src/codegen-x86.C:1207: static bool insnCodeGen::modifyData(Dyninst::Address, NS_x86::instruction&, codeGen&): Assertion `!"Couldn't decode opcode of already known instruction!\n" failed.

Let me know if you need any other information.

@ssunny7 ssunny7 changed the title from "test_pt_ls fails Dyninst master" to "test_pt_ls fails with Dyninst master" Jul 14, 2016
@jdetter
Contributor

jdetter commented Jul 15, 2016

This was fixed by a patch in v9.2.0_patches. Could you try merging v9.2.0_patches into your branch and rerunning the testsuite?

@jdetter jdetter mentioned this issue Jul 15, 2016
@jdetter jdetter closed this as completed Jul 15, 2016
@ssunny7 ssunny7 reopened this Jul 18, 2016
@ssunny7
Contributor Author

ssunny7 commented Jul 18, 2016

I now get a bunch of test failures, all of which say Unhandled instruction syscall ECX. The full log from runTests is attached. I'm reopening this issue; let me know if I've missed anything.

runtests.txt

@jdetter
Contributor

jdetter commented Jul 18, 2016

@ssunny7 was this after merging with v9.2_patches?

@ssunny7
Contributor Author

ssunny7 commented Jul 18, 2016

@jdetter Yes.

@jdetter
Contributor

jdetter commented Jul 18, 2016

@ssunny7 Is this vanilla master or have you made changes? I just recloned master on Fedora23 and I am passing all tests. I am rebuilding on the CSL now to see if it could be an environment issue.

@ssunny7
Contributor Author

ssunny7 commented Jul 18, 2016

@jdetter I cleaned up my code directory a little, and I no longer get all those errors. test_pt_ls, however, still fails in create mode. The bin folder of my dyninst installation is included in PATH.

The log file test_pt_ls_stderr1 in the binaries folder of the testsuite installation shows 72 lines each saying Warning: mix of recursive and guarded snippets @ 0x21e6890, picking guarded.

@jdetter
Contributor

jdetter commented Jul 18, 2016

@ssunny7 The test_pt_ls failure is a known testsuite issue on x86_64. #62 still has not been resolved.

@jdetter jdetter closed this as completed Jul 18, 2016
@wrwilliams
Member

pc_tls is not the same as test_pt_ls...

@wrwilliams wrwilliams reopened this Jul 18, 2016
@jdetter
Contributor

jdetter commented Jul 18, 2016

@wrwilliams is test_pt_ls supposed to pass?

@wrwilliams
Member

Assuming parseThat is found (which it is), yes.

@jdetter
Contributor

jdetter commented Jul 18, 2016

Oh okay, sorry for the confusion!! I will look at this today.

Looks like somehow the mutatee fork is failing:

00121 "" Creating new BPatch object.
00124 "" Success.
00221 "" Requesting notification of mutatee exit.
00224 "" Success.
00321 "" Requestion notification of mutator fork.
00324 "" Success.
00421 "" Forking mutatee process.
00423 "Failure in BPatch::processCreate()" Error encountered.
Analysis complete.

@jdetter jdetter added this to the Release 9.2.1 milestone Jul 18, 2016
@jdetter
Contributor

jdetter commented Jul 21, 2016

During startup we attempt to place a breakpoint at main. To populate main_function_, we call findFuncsByAll which checks for both pretty and mangled versions of main. It looks like we aren't able to find main as a pretty or mangled name. Then I checked the symbol table for /bin/ls on Fedora 23:

[detter@localhost exec-info]$ readelf -s /bin/ls | grep main
    28: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND textdomain@GLIBC_2.2.5 (3)
    33: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND bindtextdomain@GLIBC_2.2.5 (3)
    57: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5 (3)

@wrwilliams What should we do here? /bin/ls has no main and that's what is causing the failure.

@cuviper
Contributor

cuviper commented Jul 21, 2016

Of course main exists - it's just not exposed in the stripped symbol table.

This is supposed to be solved by image::findMain(), which by heuristic tries to guess main as the first parameter to the first call from the ELF start address, presumably calling __libc_start_main. See image.C up to line ~554 with: mainAddress = get_immediate_operand(&preCall);

I recall this failed once because it was only attempted for executables (ET_EXEC), not shared objects (ET_DYN), but in Fedora many executables are built as PIE, which is also ET_DYN. That's why it now also checks for the interpreter name. That's commit 44205db.

As a first pass, you could install coreutils-debuginfo to get your symbols back, and make sure everything else is working. Then remove debuginfo to get back to main searching...
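
To make the ET_EXEC / ET_DYN distinction above concrete, here is a minimal sketch of how one might tell a PIE executable apart from an ordinary shared object. It is not Dyninst's code: the function name is hypothetical, it handles only little-endian ELF64, and it assumes that "checks for the interpreter name" corresponds to looking for a PT_INTERP program header.

```python
import struct

ET_EXEC, ET_DYN = 2, 3   # e_type values from the ELF specification
PT_INTERP = 3            # program header type naming the dynamic loader


def looks_like_executable(data: bytes) -> bool:
    """Return True for ET_EXEC, or for ET_DYN that carries PT_INTERP (PIE).

    Hypothetical helper; only little-endian ELF64 images are handled.
    """
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    e_type = struct.unpack_from("<H", data, 16)[0]
    if e_type == ET_EXEC:
        return True
    if e_type != ET_DYN:
        return False
    # ET_DYN: walk the program headers looking for an interpreter entry.
    e_phoff = struct.unpack_from("<Q", data, 32)[0]
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        p_type = struct.unpack_from("<I", data, e_phoff + i * e_phentsize)[0]
        if p_type == PT_INTERP:
            return True
    return False
```

Calling it as `looks_like_executable(open("/bin/ls", "rb").read())` should report a PIE build of ls as an executable even though its header says ET_DYN.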

@jdetter
Contributor

jdetter commented Jul 21, 2016

@cuviper after running sudo dnf debuginfo-install coreutils, dyninst is able to find main as a pretty name and the test passes normally.

The other thing that I thought was weird is that nm couldn't find any symbols without passing the -D flag because all of the symbols were dynamic. Like you said, that's probably because /bin/ls is built as a PIE.

Thanks for the hints Josh, hopefully I can get this figured out tomorrow.

@cuviper
Contributor

cuviper commented Jul 22, 2016

I had a nagging feeling this was still familiar, and I found out why. That heuristic is only looking for an immediate operand for the main address, but it was reported a while ago that PIE sometimes has an LEA based on RIP instead. https://lists.cs.wisc.edu/archive/dyninst-api/2014/msg00294.shtml

It appears this is now the case for F24's /bin/ls too:

$ readelf -h /bin/ls | grep Entry
  Entry point address:               0x5ab0
$ objdump -d /bin/ls | grep 5ab0: -A10
    5ab0:       31 ed                   xor    %ebp,%ebp
    5ab2:       49 89 d1                mov    %rdx,%r9
    5ab5:       5e                      pop    %rsi
    5ab6:       48 89 e2                mov    %rsp,%rdx
    5ab9:       48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
    5abd:       50                      push   %rax
    5abe:       54                      push   %rsp
    5abf:       4c 8d 05 4a f5 00 00    lea    0xf54a(%rip),%r8        # 15010 <_obstack_memory_used@@Base+0x13d0>
    5ac6:       48 8d 0d d3 f4 00 00    lea    0xf4d3(%rip),%rcx        # 14fa0 <_obstack_memory_used@@Base+0x1360>
    5acd:       48 8d 3d cc de ff ff    lea    -0x2134(%rip),%rdi        # 39a0 <_init@@Base+0x3e0>
    5ad4:       e8 c7 dc ff ff          callq  37a0 <_init@@Base+0x1e0>

I don't think it was like this when I first fixed image::findMain for PIE, but now it seems we really should deal with it. My suggestion back then was to use dataflowAPI from the call to identify the value of %rdi, like PPC already does for r8.

Note that objdump easily figures out that this resolves to 39a0, which is indeed main.

$ eu-addr2line --pretty-print -f -e /bin/ls 39a0
main at ../src/ls.c:1249
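
For reference, the arithmetic objdump does here is simple: a RIP-relative operand is the signed 32-bit displacement added to the address of the next instruction. The sketch below decodes only the 7-byte `REX.W 8D /r` form seen in the listing above. The function name is made up for illustration, and this is a hand-rolled check rather than the dataflowAPI approach suggested for the real fix.

```python
import struct


def rip_relative_lea_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Resolve `lea disp32(%rip), %reg` (7-byte REX.W form) to its target.

    Hypothetical helper: it rejects anything that is not exactly this encoding.
    """
    if len(insn_bytes) != 7:
        raise ValueError("expected a 7-byte instruction")
    rex, opcode, modrm = insn_bytes[0], insn_bytes[1], insn_bytes[2]
    if rex & 0xF8 != 0x48 or opcode != 0x8D:
        raise ValueError("not a REX.W LEA")
    if modrm & 0xC7 != 0x05:  # mod=00, r/m=101 selects RIP + disp32
        raise ValueError("operand is not RIP-relative")
    disp = struct.unpack_from("<i", insn_bytes, 3)[0]  # signed little-endian
    # RIP already points at the next instruction when the operand is evaluated.
    return insn_addr + len(insn_bytes) + disp


# Bytes copied from the instruction at 5acd in the listing above.
target = rip_relative_lea_target(0x5ACD, bytes.fromhex("488d3dccdeffff"))
print(hex(target))
```

Applied to the instruction at 5acd this gives 0x39a0, matching the address objdump annotates for %rdi.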

@jdetter
Contributor

jdetter commented Jul 22, 2016

@cuviper I had just figured this out and I was typing up an explanation but you beat me to it =)

I think if we use dataflowAPI here then it will be more resilient to compiler changes in the future. Would you like to implement this or do you want me to give it a try?

@cuviper
Contributor

cuviper commented Jul 22, 2016

Please go for it! :)

@cuviper
Contributor

cuviper commented Jul 22, 2016

Note that 32-bit will have to look for the last stack value as the argument we need.

It also appears that it's not necessarily the first call anymore!

$ echo 'int main() { return 0; }' | gcc -pie -m32 -x c -
$ entry=$(readelf -h a.out | grep Entry | cut -dx -f2)
$ objdump -d a.out | grep $entry: -A20 -m1
 450:   31 ed                   xor    %ebp,%ebp
 452:   5e                      pop    %esi
 453:   89 e1                   mov    %esp,%ecx
 455:   83 e4 f0                and    $0xfffffff0,%esp
 458:   50                      push   %eax
 459:   54                      push   %esp
 45a:   52                      push   %edx
 45b:   e8 22 00 00 00          call   482 <_start+0x32>
 460:   81 c3 a0 1b 00 00       add    $0x1ba0,%ebx
 466:   8d 83 30 e6 ff ff       lea    -0x19d0(%ebx),%eax
 46c:   50                      push   %eax
 46d:   8d 83 d0 e5 ff ff       lea    -0x1a30(%ebx),%eax
 473:   50                      push   %eax
 474:   51                      push   %ecx
 475:   56                      push   %esi
 476:   ff b3 f4 ff ff ff       pushl  -0xc(%ebx)
 47c:   e8 af ff ff ff          call   430 <__libc_start_main@plt>
 481:   f4                      hlt
 482:   8b 1c 24                mov    (%esp),%ebx
 485:   c3                      ret
 486:   66 90                   xchg   %ax,%ax

I guess that tiny function at 482 is how it gets the ip for ip-relative addressing in 32-bit mode.
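
As a worked check of the 32-bit listing above, the sketch below replays the address arithmetic by hand: the thunk at 482 loads its own return address into %ebx, the `add` turns that into a base register, and the later operands are offsets from it. All numbers come from the listing; the helper name is hypothetical, and reading the final `pushl` as "a slot that holds main's address" is an assumption based on the last value pushed before a call being the first argument under the 32-bit calling convention.

```python
MASK32 = 0xFFFFFFFF


def base_after_thunk(call_addr: int, call_len: int, add_imm: int) -> int:
    """Value of %ebx after `call <thunk>` followed by `add $imm,%ebx`.

    The thunk does `mov (%esp),%ebx; ret`, so %ebx receives the return
    address, i.e. the address of the instruction right after the call.
    """
    return_addr = call_addr + call_len
    return (return_addr + add_imm) & MASK32


# call at 45b is 5 bytes long; the add at 460 uses immediate 0x1ba0.
ebx = base_after_thunk(0x45B, 5, 0x1BA0)

# The two lea results that are pushed before argv and argc.
lea_19d0 = (ebx - 0x19D0) & MASK32
lea_1a30 = (ebx - 0x1A30) & MASK32

# pushl -0xc(%ebx) pushes the *contents* of this slot, not the address
# itself, so a static analysis has to read memory here (assumed to be main).
main_slot = (ebx - 0xC) & MASK32

print(hex(ebx), hex(lea_19d0), hex(lea_1a30), hex(main_slot))
```

With the numbers from the listing, %ebx comes out as 0x2000 and the slot read by the final pushl is at 0x1ff4, so a 32-bit version of the heuristic would need to track %ebx through the thunk and then load from that slot.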

@jdetter
Contributor

jdetter commented Jul 22, 2016

Ok, thanks Josh for all of the help!! =)

@jdetter
Contributor

jdetter commented Aug 22, 2016

This issue has been fixed on Fedora 23 and Ubuntu. There is a new issue on RHEL 6 that causes this test to fail for a different reason.

@jdetter jdetter closed this as completed Aug 22, 2016