Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved call resolution, testing and convert analyses to use IR iterator #204

Merged
merged 20 commits into from
Jun 11, 2024

Conversation

yousifpatti
Copy link
Contributor

@yousifpatti yousifpatti commented Apr 16, 2024

Currently passing all cases mentioned by Liam in #196 and #43 as well

  • Fix up some to dos
  • Code cleanup

@yousifpatti yousifpatti linked an issue Apr 16, 2024 that may be closed by this pull request
@yousifpatti yousifpatti changed the title Improved call resolution and testing Improved call resolution, testing and convert analyses to use IR iterator Apr 16, 2024
@yousifpatti
Copy link
Contributor Author

Hi Liam, can you have an initial run and let me know if the calls are resolved alright? I have tested the most common examples (IndirectCallsTests in the test folder) and they all pass @l-kent

@yousifpatti yousifpatti marked this pull request as ready for review June 4, 2024 00:08
@yousifpatti
Copy link
Contributor Author

I have put the loop issue in #213 so it can be investigated after the merge

@l-kent
Copy link
Contributor

l-kent commented Jun 6, 2024

Adding the memory regions into the IR doesn't appear to work correctly at present.

MemoryAssigns are not handled properly, with only the rhs being updated to use a new region - this should not happen (and #188 will prevent a MemoryAssign having a different Memory on the lhs and rhs).

Identifying regions seems like oversimplifies things at present. In the test basicassign_gamma0/clang for instance, there is the following section in main:

000002c5: R8 := 0x11000
000002cc: R8 := pad:64[mem[R8 + 0x34, el]:u32]
000002d1: R9 := 0x11000
000002d9: mem := mem with [R9 + 0x38, el]:u32 <- 31:0[R8]

0x11034 is the location of 32-bit global variable 'secret'.
0x11038 is the location of 32-bit global variable 'x'.

The region analysis collapses these accesses into a single region named 'abort', seemingly named after the pointer located at at 0x11018 that points to the external abort C library function. In such a simple case, where there is no ambiguity in the accesses, the analysis surely should be able to distinguish between these as regions?

@l-kent
Copy link
Contributor

l-kent commented Jun 6, 2024

It's also necessary to update the initialised memory sections when splitting mem into separate regions, and this doesn't happen.

@l-kent
Copy link
Contributor

l-kent commented Jun 6, 2024

What I would like to do is look at merging the commits up to 4632b95 now, and move the commits since then into a new branch since it's clear they're going to require much more work. Does this make sense as an approach?

@l-kent
Copy link
Contributor

l-kent commented Jun 6, 2024

evaluateExpressionWithSSA really needs to be implemented for all Exprs, it's quite limited at the moment

@l-kent
Copy link
Contributor

l-kent commented Jun 6, 2024

Whenever the analysis resolves an indirect call, it also seems to incorrectly 'resolve' the procedure's return statement as a call to whatever the indirect call that it has resolved is.

@l-kent
Copy link
Contributor

l-kent commented Jun 6, 2024

Test cases that still fail to resolve indirect calls:

  • functionpointer/clang_O2
  • functionpointer/gcc_O2
  • jumptable/clang
  • jumptable/clang_no_plt_no_pic
  • jumptable/clang_pic
  • jumptable2/clang
  • jumptable2/clang_no_plt_no_pic
  • jumptable2/clang_pic
  • jumptable2/gcc_O2

As always, jumptable3 and switch2 are not possible to resolve correctly due to BAP's limitations so I haven't listed them. I have not yet checked that all others are resolved correctly.

@l-kent
Copy link
Contributor

l-kent commented Jun 6, 2024

I want to fix the bug where returns get incorrectly resolved as indirect calls, then I'm happy to merge up to 4632b95

@l-kent
Copy link
Contributor

l-kent commented Jun 6, 2024

Since all the commits here are also in the yousif-regions-into-ir branch, I want to drop the last few commits from this branch that aren't ready to merge yet. Is that ok @yousifpatti?

@l-kent
Copy link
Contributor

l-kent commented Jun 7, 2024

The bug with returns being resolved incorrectly as indirect calls can be easily fixed by just not trying to resolve calls to R30, but it indicates a broader problem with the points-to analysis over-approximating stack accesses that it should be able to differentiate between.

The following is from the indirect_call/clang test. I've annotated it with the values for relevant variables given by the points-to analysis result at relevant points.

00000352:
00000356: R31 := R31 - 0x30
R31 == (stack_24(R31 + 40), stack_13(R31 + 16))
0000035c: #5 := R31 + 0x20
#5 == (stack_24(R31 + 40), stack_13(R31 + 16))
00000362: mem := mem with [#5, el]:u64 <- R29
00000368: mem := mem with [#5 + 8, el]:u64 <- R30
0000036e: R29 := R31 + 0x20
R29 == (stack_24(R31 + 40), stack_13(R31 + 16))
00000373: R8 := 0
0000037b: mem := mem with [R31 + 0xC, el]:u32 <- 31:0[R8]
00000382: mem := mem with [R29 - 4, el]:u32 <- 0
00000387: R8 := 0
0000038d: R8 := R8 + 0x754
00000395: mem := mem with [R31 + 0x10, el]:u64 <- R8
0000039a: R0 := 0
000003a0: R0 := R0 + 0x7DF
000003a5: R30 := 0x7A0
000003a7: call @printf with return %000003a9

000003a9:
000003ae: R8 := mem[R31 + 0x10, el]:u64
R8 == (stack_24(R31 + 40), stack_13(R31 + 16))
000003b3: R30 := 0x7A8
000003b6: call R8 with return %000003b8

000003b8:
000003bd: R0 := pad:64[mem[R31 + 0xC, el]:u32]
R0 == stack_6(R31 + 12)
000003c3: #6 := R31 + 0x20
#6 == (stack_24(R31 + 40), stack_13(R31 + 16))
000003c8: R29 := mem[#6, el]:u64
R29 == (stack_24(R31 + 40), stack_13(R31 + 16))
000003cd: R30 := mem[#6 + 8, el]:u64
R30 == (stack_24(R31 + 40), stack_13(R31 + 16))
000003d3: R31 := R31 + 0x30
000003d8: call R30 with noreturn

According to the analysis, stack_13 contains a pointer to 'greet', stack_24's contents is empty/unknown, and stack_6 contains the value 0bv32.

The analysis can't distinguish between accesses to stack_13 and stack_24, even though there shouldn't be any ambiguity. The analysis only resolves the indirect call to greet correctly at 000003b6 because it doesn't know what stack_24 points to and ignores it as a result.

I also expect that the region contents not taking the assignment point into account could pose problems in the future. Memory values can be reassigned, so a program that writes different values to the same location at multiple points would likely significantly break the analysis.

@l-kent l-kent mentioned this pull request Jun 7, 2024
@l-kent
Copy link
Contributor

l-kent commented Jun 10, 2024

@l-kent l-kent merged commit 762e65e into main Jun 11, 2024
2 checks passed
@l-kent l-kent deleted the yousif-ssa-steensgaard2 branch July 16, 2024 00:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect Indirect Call Resolution (Using points-to)
2 participants