-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Function summaries #227
Function summaries #227
Conversation
To clarify the approximation in the taint analysis, this runs the taint analysis separately for each each input variable being tainted, and treats taint as a fixed gamma value, so that taint-flow is security-level information-flow. It over-approximates towards high (where?), so for each variable the ensures condition ensures either low, equal to the input variables which taint it = possibly high. |
Yeah, the initial implementation would perform a taint analysis for each input/global variable, where taint is the variable's gamma at the start of the procedure. Doing this over every variable tells us precisely which input gammas affect the gamma of each output variable. The current implementation does this more efficiently though, instead working over a powerset lattice to determine the set of input variables which taint an output variable, in a single analysis. I'm not 100% sure what you mean by over-approximates to high, are you referring to how the results can miss cases where a gamma is definitely high depending on the input (For example |
This change improves the performance of variable dependency analysis by breaking the call graph into its strongly connected components using Tarjan's algorithm, and analyzing those individually. We get a reverse topological sort of the components from Tarjan's algorithm, which allows us to analyze our procedures in an order such that we can get "summaries" of the analysis for all procedures called by a particular procedure, which significantly reduces duplicated work. (Unfortunately the cntlm-new binary has a really big strongly connected component in it which makes this analysis still infeasible to run on the whole binary) |
(Credit to @ j-tobler) Ensures clauses might be able to be rewritten as |
if (y != xn) { | ||
for succ <- outdep(n) do propagate(y, succ) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yousifpatti didn't you find that doing this caused some bugs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This caused the results of constantprop analysis to not be propagated properly. The effects may not be noticeable because evaluate expression may return TOP and you deal with it the same way. The best way to check is by checking the dot file of constant prop to see if values are being propagated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I remember that made the analyses resolve some indirect calls incorrectly. It would be good if we could figure out how to make this optimisation work but it should be reverted for now.
src/main/scala/util/RunUtils.scala
Outdated
/* | ||
Logger.info("[!] Running Taint Analysis") | ||
val taintResults = TaintAnalysis(IRProgram, specGlobalAddresses, constPropResult, | ||
IRProgram.procedures.foldLeft(Map[CFGPosition, Set[analysis.Taintable]]()) { | ||
(m, p) => m + (p -> Set(analysis.UnknownMemory())) | ||
} | ||
).analyze() | ||
|
||
config.analysisResultsPath.foreach(s => | ||
writeToFile(printAnalysisResults(IRProgram, taintResults), s"${s}_taint$iteration.txt") | ||
) | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to add a flag for this if we don't want it enabled by default, or is it not yet working?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There wasn't any particular reason for commenting this out apart from just not wanting it to run so adding a flag should be fine, would you like that? These taint results were kinda just there for testing and might not mean much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the taint analysis gets run as part of generating the function summaries I don't think it makes sense to have it as a separate thing, so this bit could probably just be removed actually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually yeah these results won't be used for anything you're right, this can be removed
// TODO should the stack/link/frame pointers propagate taint? | ||
val variables: Set[analysis.Taintable] = 0.to(28).map { n => | ||
Register(s"R$n", 64) | ||
}.toSet ++ specGlobals.map { g => | ||
analysis.GlobalVariable(dsl.mem, BitVecLiteral(g.address, 64), g.size, g.name) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only registers R0-R18 and R30 are allowed under the AArch64 standard to change values as the result of a procedure call. This means that there isn't a need to create summaries for the gamma values for R19-R29 and R31 - we already have those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's helpful, thanks! I was unaware
src/test/scala/SystemTests.scala
Outdated
runTests(incorrectPrograms, incorrectPath, "incorrect", false, false) | ||
runTests(correctPrograms, correctPath, "correct", true, false, false) | ||
runTests(incorrectPrograms, incorrectPath, "incorrect", false, false, false) | ||
runTests(correctWithAnalysisPrograms, correctWithAnalysisPath, "correct", true, false, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (and the BAP one) should use "correct-analysis" since that's the name of the folder the tests are in.
There seems to be room for optimisation here in relation to which variables this generates summaries for. In some of the tests, it will generate summaries for global variables even when the procedure does not modify memory at all, or only modifies memory because it calls rely (which just says that it does not modify memory). |
I'm a bit confused about why it even generates summaries for the globals x and y in the get_two procedure in the function_summary test. get_two doesn't at all access those globals, or even memory at all, so why does the taint analysis say they're tainted? |
src/main/scala/util/RunUtils.scala
Outdated
@@ -931,6 +981,8 @@ object RunUtils { | |||
result.reachingDefs, | |||
ctx.program | |||
) | |||
Logger.info("[!] Generating Function Summaries") | |||
modified = modified | IRTransform.generateFunctionSummaries(ctx, ctx.program, result.constPropResult, result.varDepsSummaries) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense to update modified
here, at least at the moment. The procedure summaries don't have any impact (and can't have any impact) on any of the other analyses right now so there isn't any need to do another iteration.
val tainters = relevantVars.map { | ||
v => (v, Set()) | ||
}.toMap ++ getTainters(procedure, variables ++ rnaResults(procedure.begin) + UnknownMemory()).filter { (variable, taints) => | ||
relevantVars.contains(variable) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment explaining what this represents and how the sets of tainters for each variable have been derived would help.
The summaries for function_summary3 seem incorrect. For f we get:
and for g we get:
Neither of these procedures modify memory at all, so I'm unclear on why summaries for the global variable x are generated. For f, how is it that R1 and R0 are ensured to be low? Shouldn't R1 and R0 both have security levels that are dependent on the input security level of R0? These issues may be related to this example using a non-returning call which is not really properly supported at present. |
You're right in that get_two shouldn't need to say anything about x and y. The reason it mentions those is because x and y are initially marked as tainting themselves, so at the end of the procedure they still taint themselves. It shouldn't be too hard to add a check for modifying memory (hopefully) though
That is concerning, it certainly shouldn't be doing that 😬 I'm confused by how that even passed the test to be perfectly honest... My first guess was that it is something to do with the summaries generated by the strongly connected components, or something like that I'll hopefully find time to work on this soon |
The test passes because the 'assume false' (generated by the non-returning call to g() in f()) allows for any post condition of f() to be verified. That's part of what I mean by non-returning calls not being properly supported at the moment.
Yes, it seems important to handle the case where a variable is not actually modified at all. It is probably difficult to do this robustly for now but at the very least it should be trivial to check whether memory is modified at all, and the memory region analysis should allow this to be improved in the future. |
I'm just going to clean this up and merge it in, putting it behind a flag for now. |
# Conflicts: # src/test/scala/SystemTests.scala
Generates ensures clauses about the gammas of output variables in procedures using a taint analysis (ish). As an example
we know that the return value of this function could be affected by
a
orb
, so we sayensures Gamma_out == Gamma_a || Gamma_out == Gama_b || Gamma_out == true
.This analysis does not currently scale to run on the
cntlm-new
binary. This is probably because of the time complexity of the IDE solver.