Microblogging: From an obfuscated function to a synthesized LLVM IR #1078
Comments
Here is a project that also implements Triton -> LLVM: https://github.com/fvrmatteo/TritonASTLLVMIRTranslator
As a side note, you can do the CMake a bit nicer since "recent" LLVM versions (10.0 or something): https://github.com/LLVMParty/LLVMCMakeTemplate/blob/master/cmake/LLVM.cmake
The repository is using Hunter, but this is totally optional. Simply use:
include(LLVM.cmake)
target_link_libraries(myTarget PRIVATE
    LLVM-Wrapper
)
This is the 'idiomatic' CMake and it also works nicely on Windows.
And the reverse transformation too (LLVM-IR -> TritonAST), which I found really useful to obtain more optimised ASTs to be propagated during the exploration or simplified ASTs to be fed to the SMT solver to crunch some opaque predicates!
@fvrmatteo, let's do that then :). @mrexodia thanks mate 👍
It could be interesting to have an API that takes an already existing LLVM module as input and lifts the expression into a function with a user-provided name. It can help merge multiple liftings easily; otherwise you need a potentially expensive cross-module copy of functions, which could be a performance bottleneck in some cases.
I'm on it mamène. But the "with a user-provided name" is interesting. I will add it! Cheers mate!
We also renamed some files:
* src/libtriton/ast/bitwuzla/tritonToBitwuzlaAst.cpp -> src/libtriton/ast/bitwuzla/tritonToBitwuzla.cpp
* src/libtriton/ast/z3/tritonToZ3Ast.cpp -> src/libtriton/ast/z3/tritonToZ3.cpp
* src/libtriton/ast/z3/z3ToTritonAst.cpp -> src/libtriton/ast/z3/z3ToTriton.cpp
* src/libtriton/includes/triton/tritonToBitwuzlaAst.hpp -> src/libtriton/includes/triton/tritonToBitwuzla.hpp
* src/libtriton/includes/triton/tritonToZ3Ast.hpp -> src/libtriton/includes/triton/tritonToZ3.hpp
* src/libtriton/includes/triton/z3ToTritonAst.hpp -> src/libtriton/includes/triton/z3ToTriton.hpp
As well as classes:
* TritonToZ3Ast -> TritonToZ3
* Z3ToTritonAst -> Z3ToTriton
* TritonToBitwuzlaAst -> TritonToBitwuzla
@fvrmatteo, @mrexodia, @aguinet, here we are:
/* Init Triton */
triton::API ctx(triton::arch::ARCH_X86_64);
auto actx = ctx.getAstContext();
/* Triton to LLVM (node is an existing triton::ast::SharedAbstractNode) */
llvm::LLVMContext llvmContext;
triton::ast::TritonToLLVM ttllvm(llvmContext);
auto llvmModule = ttllvm.convert(node); /* llvm::Module */
/* LLVM to Triton */
triton::ast::LLVMToTriton llvmtt(actx);
auto newNode = llvmtt.convert(llvmModule.get()); /* Triton AST */
Both convert methods take an optional function name:
//! LLVM to Triton
TRITON_EXPORT triton::ast::SharedAbstractNode convert(llvm::Module* llvmModule, const std::string& fname="__triton");
//! Triton to LLVM
TRITON_EXPORT std::shared_ptr<llvm::Module> convert(const triton::ast::SharedAbstractNode& node, const std::string& fname="__triton");
Code here
Nice djo :)
Is there a particular reason why lifting to an LLVM module is limited to a single node (if I'm understanding this correctly)? This is quite limiting. It can make sense in the given example, but with the function provided above as a test case, for instance, why not be able to lift a list of nodes to an LLVM module instead?
Introduction
Software is getting more and more complex to analyze: binaries are bigger and better protected than they were years ago. Tools must follow this trend and adapt, providing features that deal with binaries as smoothly as possible. This is why I started the Triton project 7 years ago. It is like a Swiss Army knife, with one more feature today: the ability to lift the Triton AST to LLVM IR.
Why LLVM IR?
LLVM is a compiler infrastructure that relies on its own IR [00] and provides many tools and features for code optimization. Code optimizations are useful for deobfuscating parts of binary code and thus breaking some software protections [01]. Several tools already exist to lift binary code to LLVM IR [02, 03, 04, 05, 06, 07, 08, 09].
Unlike most binary analysis tools, Triton works on a dynamic paradigm: it represents the data flow of an execution in its own structured representation and provides some optimizations on it. These optimizations are possible because we can extract concrete information from the execution. For example, we can extract runtime values to simplify the path predicate built by the symbolic engine (useful when attacking virtualization-based protections [01]). As a second example, last week we introduced another optimization that synthesizes obfuscated expressions and thus breaks MBA (mixed Boolean-arithmetic) obfuscation [10]. However, optimizations are always hard to develop and are a real academic field of their own. So what could be better than benefiting from everything the LLVM community has already done in this area? We can thus combine our optimizations from a dynamic paradigm with compiler optimizations!
Another point is that once we have simplified obfuscated code, in some scenarios it can be useful to translate the Triton AST back to binary code in order to rebuild an unprotected binary. This is today's topic.
All these arguments led us to provide new features (commit: aa1dbb5).
Lifting engines
New classes are born: LiftingEngine, LiftingToLLVM, LiftingToPython, LiftingToSMT and TritonToLLVM. Lifting the Triton AST to Python and SMT files already existed, but was refactored into new classes. The new features are the classes LiftingToLLVM and TritonToLLVM.
TritonToLLVM converts a triton::ast::SharedAbstractNode to an llvm::Module and can be used as a standalone class. This class does not alter your current analysis state. For example, in C++ you have something like this:
Then, once you have the llvm::Module, feel free to use the power of the LLVM back-end. The class LiftingToLLVM allows us to stream the llvm::Module into a std::ostream. For example, in Python you may have something like below. All the symbolic variables involved in the ecx expression are passed as arguments to an LLVM IR function, so that ecx = __triton(a, b).
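To make the last point concrete, here is a minimal sketch in plain Python of how the symbolic variables of an expression can become the arguments of an LLVM IR function. It does not use Triton or LLVM and is not how Triton implements the lifting: the Var, Const and BinOp classes and the lift helper are hypothetical names chosen for illustration, the 32-bit width is an assumption, and only the default function name __triton comes from the text above.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    name: str            # symbolic variable, e.g. "a"


@dataclass(frozen=True)
class Const:
    value: int           # constant operand


@dataclass(frozen=True)
class BinOp:
    op: str              # LLVM opcode name, e.g. "add" or "mul"
    lhs: object
    rhs: object


def collect_vars(node, seen=None):
    """Return the variable names of the tree in first-use order."""
    seen = [] if seen is None else seen
    if isinstance(node, Var):
        if node.name not in seen:
            seen.append(node.name)
    elif isinstance(node, BinOp):
        collect_vars(node.lhs, seen)
        collect_vars(node.rhs, seen)
    return seen


def lift(node, fname="__triton", bits=32):
    """Print the expression tree as the text of one LLVM IR function."""
    ty = f"i{bits}"
    body = []
    fresh = count()

    def emit(n):
        if isinstance(n, Var):
            return f"%{n.name}"                   # refers to a function argument
        if isinstance(n, Const):
            return str(n.value % (1 << bits))     # wrap to the chosen bit width
        lhs, rhs = emit(n.lhs), emit(n.rhs)
        tmp = f"%t{next(fresh)}"                  # fresh named temporary
        body.append(f"  {tmp} = {n.op} {ty} {lhs}, {rhs}")
        return tmp

    result = emit(node)
    args = ", ".join(f"{ty} %{v}" for v in collect_vars(node))
    lines = [f"define {ty} @{fname}({args}) {{", "entry:", *body,
             f"  ret {ty} {result}", "}"]
    return "\n".join(lines)


# Stand-in for the ecx expression: ecx = (a + b) * 2
a, b = Var("a"), Var("b")
ir = lift(BinOp("mul", BinOp("add", a, b), Const(2)))
print(ir)
```

The printed function takes %a and %b as parameters and returns the value of the expression, which is the shape described by ecx = __triton(a, b). Passing another fname gives the function a user-chosen name.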
Concrete example
Let's consider an obfuscated function that takes 2 arguments that the user can control. After a reverse engineering phase, we know that the function hides the computation on those two arguments using MBA. So we:
* ... AST_OPTIMIZATIONS mode to perform classical AST optimizations during the runtime
* ... eax)
The script is the following:
After synthesizing the data flow, we know that the computation is ((a + ((a * b) * b)) + 0x1), and the lifting to LLVM IR is the following:
Now we are free to compile this function and inject the opcodes into IDA to replace the obfuscated function with a clearer version. Note that at first glance this may look like a trivial simplification, but without our synthesis implementation the obfuscated computation is the following:
In this case, LLVM optimizations are not useful to improve the output, since we fully synthesized it, but at least we can re-inject opcodes into IDA and continue reverse engineering the target. However, sometimes LLVM optimizations are beneficial [01] =).
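For readers unfamiliar with MBA, the following plain-Python sketch shows the kind of rewrite that synthesis undoes. It is only an illustration: the mba_variant function is a hypothetical, hand-made obfuscation built from the classic identity x + y = (x ^ y) + 2 * (x & y), not the obfuscated computation of the target function, and the 32-bit mask is an assumption based on the use of eax.

```python
import random

MASK = 0xFFFFFFFF  # assumption: 32-bit arithmetic, as with eax


def synthesized(a, b):
    """The synthesized computation ((a + ((a * b) * b)) + 0x1), modulo 2**32."""
    return (a + a * b * b + 1) & MASK


def mba_variant(a, b):
    """Same computation with one addition hidden behind an MBA identity."""
    t = (a * b * b) & MASK
    # x + y == (x ^ y) + 2 * (x & y) holds for all non-negative integers
    return ((a ^ t) + 2 * (a & t) + 1) & MASK


# Random testing gives evidence of equivalence, not a proof.
rng = random.Random(0)
samples = [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(10_000)]
all_equal = all(synthesized(a, b) == mba_variant(a, b) for a, b in samples)
print(all_equal)
```

Both functions compute the same value on every input, but the second one is harder to read, and real MBA obfuscators stack many such rewrites on top of each other.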
References
[00] https://llvm.org/docs/LangRef.html
[01] https://github.com/JonathanSalwan/Tigress_protection
[02] https://github.com/lifting-bits/mcsema
[03] https://github.com/avast/retdec
[04] https://github.com/GaloisInc/reopt
[05] https://github.com/revng/revng
[06] https://github.com/cojocar/bin2llvm
[07] https://github.com/zneak/fcd
[08] https://github.com/draperlaboratory/fracture
[09] https://github.com/pgoodman/libbeauty
[10] https://tel.archives-ouvertes.fr/tel-01623849/document