Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function summaries #227

Merged
merged 18 commits into from
Aug 19, 2024
Merged

Function summaries #227

merged 18 commits into from
Aug 19, 2024

Conversation

b-paul
Copy link
Contributor

@b-paul b-paul commented Jul 17, 2024

Generates ensures clauses about the gammas of output variables in procedures using a taint analysis (ish). As an example

int c; // L(c) = true

int f(int a, int b) {
    if (c) {
        return a;
    } else {
        return b;
    }
}

we know that the return value of this function could be affected by a or b, so we say ensures Gamma_out == Gamma_a || Gamma_out == Gama_b || Gamma_out == true.

This analysis does not currently scale to run on the cntlm-new binary. This is probably because of the time complexity of the IDE solver.

@ailrst
Copy link
Contributor

ailrst commented Jul 17, 2024

To clarify the approximation in the taint analysis, this runs the taint analysis separately for each each input variable being tainted, and treats taint as a fixed gamma value, so that taint-flow is security-level information-flow. It over-approximates towards high (where?), so for each variable the ensures condition ensures either low, equal to the input variables which taint it = possibly high.

@b-paul
Copy link
Contributor Author

b-paul commented Jul 17, 2024

Yeah, the initial implementation would perform a taint analysis for each input/global variable, where taint is the variable's gamma at the start of the procedure. Doing this over every variable tells us precisely which input gammas affect the gamma of each output variable. The current implementation does this more efficiently though, instead working over a powerset lattice to determine the set of input variables which taint an output variable, in a single analysis.

I'm not 100% sure what you mean by over-approximates to high, are you referring to how the results can miss cases where a gamma is definitely high depending on the input (For example int f(int x, int y) { return x + y; } and int f(int x, int y) { return c ? x : y; } generate the same ensures clause, however the first one could be saying ensures Gamma_out == Gamma_x && Gamma_y)?

@b-paul
Copy link
Contributor Author

b-paul commented Jul 18, 2024

This change improves the performance of variable dependency analysis by breaking the call graph into its strongly connected components using Tarjan's algorithm, and analyzing those individually. We get a reverse topological sort of the components from Tarjan's algorithm, which allows us to analyze our procedures in an order such that we can get "summaries" of the analysis for all procedures called by a particular procedure, which significantly reduces duplicated work. (Unfortunately the cntlm-new binary has a really big strongly connected component in it which makes this analysis still infeasible to run on the whole binary)

@b-paul
Copy link
Contributor Author

b-paul commented Jul 19, 2024

(Credit to @ j-tobler) Ensures clauses might be able to be rewritten as $(\bigwedge_{v \in \text{taint}(out)} \Gamma_v) \Rightarrow \Gamma_{out}$, the gamma of the output variable is lower than the join of the gammas of the tainters.

Comment on lines 252 to 254
if (y != xn) {
for succ <- outdep(n) do propagate(y, succ)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yousifpatti didn't you find that doing this caused some bugs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This caused the results of constantprop analysis to not be propagated properly. The effects may not be noticeable because evaluate expression may return TOP and you deal with it the same way. The best way to check is by checking the dot file of constant prop to see if values are being propagated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I remember that made the analyses resolve some indirect calls incorrectly. It would be good if we could figure out how to make this optimisation work but it should be reverted for now.

Comment on lines 736 to 747
/*
Logger.info("[!] Running Taint Analysis")
val taintResults = TaintAnalysis(IRProgram, specGlobalAddresses, constPropResult,
IRProgram.procedures.foldLeft(Map[CFGPosition, Set[analysis.Taintable]]()) {
(m, p) => m + (p -> Set(analysis.UnknownMemory()))
}
).analyze()

config.analysisResultsPath.foreach(s =>
writeToFile(printAnalysisResults(IRProgram, taintResults), s"${s}_taint$iteration.txt")
)
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to add a flag for this if we don't want it enabled by default, or is it not yet working?

Copy link
Contributor Author

@b-paul b-paul Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There wasn't any particular reason for commenting this out apart from just not wanting it to run so adding a flag should be fine, would you like that? These taint results were kinda just there for testing and might not mean much

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the taint analysis gets run as part of generating the function summaries I don't think it makes sense to have it as a separate thing, so this bit could probably just be removed actually?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually yeah these results won't be used for anything you're right, this can be removed

Comment on lines 88 to 93
// TODO should the stack/link/frame pointers propagate taint?
val variables: Set[analysis.Taintable] = 0.to(28).map { n =>
Register(s"R$n", 64)
}.toSet ++ specGlobals.map { g =>
analysis.GlobalVariable(dsl.mem, BitVecLiteral(g.address, 64), g.size, g.name)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only registers R0-R18 and R30 are allowed under the AArch64 standard to change values as the result of a procedure call. This means that there isn't a need to create summaries for the gamma values for R19-R29 and R31 - we already have those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's helpful, thanks! I was unaware

runTests(incorrectPrograms, incorrectPath, "incorrect", false, false)
runTests(correctPrograms, correctPath, "correct", true, false, false)
runTests(incorrectPrograms, incorrectPath, "incorrect", false, false, false)
runTests(correctWithAnalysisPrograms, correctWithAnalysisPath, "correct", true, false, true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and the BAP one) should use "correct-analysis" since that's the name of the folder the tests are in.

@l-kent
Copy link
Contributor

l-kent commented Jul 22, 2024

There seems to be room for optimisation here in relation to which variables this generates summaries for. In some of the tests, it will generate summaries for global variables even when the procedure does not modify memory at all, or only modifies memory because it calls rely (which just says that it does not modify memory).

@l-kent
Copy link
Contributor

l-kent commented Jul 22, 2024

I'm a bit confused about why it even generates summaries for the globals x and y in the get_two procedure in the function_summary test. get_two doesn't at all access those globals, or even memory at all, so why does the taint analysis say they're tainted?

@@ -931,6 +981,8 @@ object RunUtils {
result.reachingDefs,
ctx.program
)
Logger.info("[!] Generating Function Summaries")
modified = modified | IRTransform.generateFunctionSummaries(ctx, ctx.program, result.constPropResult, result.varDepsSummaries)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to update modified here, at least at the moment. The procedure summaries don't have any impact (and can't have any impact) on any of the other analyses right now so there isn't any need to do another iteration.

Comment on lines +141 to +145
val tainters = relevantVars.map {
v => (v, Set())
}.toMap ++ getTainters(procedure, variables ++ rnaResults(procedure.begin) + UnknownMemory()).filter { (variable, taints) =>
relevantVars.contains(variable)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment explaining what this represents and how the sets of tainters for each variable have been derived would help.

@l-kent
Copy link
Contributor

l-kent commented Jul 22, 2024

The summaries for function_summary3 seem incorrect. For f we get:

ensures gamma_load32(Gamma_mem, $x_addr);
ensures Gamma_R1;
ensures Gamma_R0;

and for g we get:

ensures (gamma_load32(Gamma_mem, $x_addr) || (gamma_load32(Gamma_mem, $x_addr) == old(gamma_load32(Gamma_mem, $x_addr))));
ensures ((Gamma_R0 || (Gamma_R0 == old(Gamma_R1))) || (Gamma_R0 == old(Gamma_R0)));

Neither of these procedures modify memory at all, so I'm unclear on why summaries for the global variable x are generated.

For f, how is it that R1 and R0 are ensured to be low? Shouldn't R1 and R0 both have security levels that are dependent on the input security level of R0?

These issues may be related to this example using a non-returning call which is not really properly supported at present.

@b-paul
Copy link
Contributor Author

b-paul commented Jul 22, 2024

I'm a bit confused about why it even generates summaries for the globals x and y in the get_two procedure in the function_summary test. get_two doesn't at all access those globals, or even memory at all, so why does the taint analysis say they're tainted?

You're right in that get_two shouldn't need to say anything about x and y. The reason it mentions those is because x and y are initially marked as tainting themselves, so at the end of the procedure they still taint themselves. It shouldn't be too hard to add a check for modifying memory (hopefully) though

For f, how is it that R1 and R0 are ensured to be low? Shouldn't R1 and R0 both have security levels that are dependent on the input security level of R0?

That is concerning, it certainly shouldn't be doing that 😬 I'm confused by how that even passed the test to be perfectly honest... My first guess was that it is something to do with the summaries generated by the strongly connected components, or something like that

I'll hopefully find time to work on this soon

@l-kent
Copy link
Contributor

l-kent commented Jul 22, 2024

The test passes because the 'assume false' (generated by the non-returning call to g() in f()) allows for any post condition of f() to be verified. That's part of what I mean by non-returning calls not being properly supported at the moment.

The reason it mentions those is because x and y are initially marked as tainting themselves, so at the end of the procedure they still taint themselves. It shouldn't be too hard to add a check for modifying memory (hopefully) though

Yes, it seems important to handle the case where a variable is not actually modified at all. It is probably difficult to do this robustly for now but at the very least it should be trivial to check whether memory is modified at all, and the memory region analysis should allow this to be improved in the future.

@l-kent
Copy link
Contributor

l-kent commented Aug 12, 2024

I'm just going to clean this up and merge it in, putting it behind a flag for now.

@l-kent l-kent merged commit ffd54a9 into main Aug 19, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants