
Extract engines into external libs #226

Closed
JonathanSalwan opened this issue Dec 7, 2015 · 7 comments
@JonathanSalwan (Owner)

This would make it possible to plug in any DBI engine.

External libs

  • Symbolic and Solver Engines (merged into one lib)
  • Taint Engine
  • SMT2-Lib (including bindings)

Pin core

  • Snapshot (because it's tied to Pin's internals)
  • Python bindings
@JonathanSalwan JonathanSalwan self-assigned this Dec 7, 2015
@JonathanSalwan JonathanSalwan added this to the v0.3 milestone Dec 7, 2015
@JonathanSalwan (Owner, Author)

After some brainstorming, the split will look like this:

External lib

  • libtriton.{so/dll/dylib} (32 and 64-bits)

The libtriton includes:

  • Symbolic engine
  • Taint engine
  • smt2-lib
  • solver interface
  • API (C++)
  • Python bindings plugged onto the C++ API

libtriton's namespaces

Basically, the library will look like this:

namespace triton
namespace triton::api
namespace triton::engines
namespace triton::engines::dynamic::taint
namespace triton::engines::dynamic::symbolic
namespace triton::engines::dynamic::symbolic::optimization
namespace triton::semantics::x86 (includes 32 and 64-bits)
namespace triton::interface
namespace triton::interface::solvers
namespace triton::interface::solvers::z3
namespace triton::smt2lib
namespace triton::bindings
namespace triton::bindings::python

Triton-Pin

  • The current pintool but lightened

The pintool will extract the information used by libtriton. Basically, a structure must be filled with the appropriate information, like the number and type of operands, opcodes, memory accesses, etc. Then, this structure will be sent to libtriton, which will perform the same processing that is already present in the current source code.
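To make the idea concrete, the structure a tracer would fill and hand over to libtriton might look like the sketch below. All names here are hypothetical illustrations of the description above, not Triton's actual API:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class InstructionInfo:
    """Hypothetical structure a tracer (Pin, QEMU, ...) fills for libtriton."""
    address: int                # program point of the instruction
    opcodes: bytes              # raw instruction bytes
    operands: List[str] = field(default_factory=list)  # operand kinds: "reg", "mem", "imm"
    memory_accesses: List[Tuple[int, int, int]] = field(default_factory=list)  # (addr, size, value)

# The tracer fills one structure per executed instruction, e.g. `xor ah, 0x99`:
info = InstructionInfo(address=0x400017, opcodes=b"\x80\xf4\x99", operands=["reg", "imm"])
info.memory_accesses.append((0x66666666, 4, 0x31323334))
```

The library side would then consume `InstructionInfo` objects without knowing which tracer produced them.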

Why this big step forward?

In several cases, we have to perform analysis on a trace. This trace may come from anywhere (not necessarily Pin only). For example, it may come from a database (offline analysis), from another DBI engine like DynamoRIO, or from emulators like QEMU, Bochs and Medusa. Extracting all Triton features into an external lib (independent of any trace extractor) allows the user to plug Triton in anywhere. This is also a good way to support more architectures, like ARM.

@gitttt commented Dec 22, 2015

Sounds great. I do not understand the last sentence though. Could you explain in a few words what needs to be done in order to add ARM support? Trace recording plus the translation of the trace into a format which libtriton understands? Or will Triton be able to do the ARM translation?

@JonathanSalwan (Owner, Author)

Hey,

Trace recording plus the translation of the trace into a format which libtriton understands?

Yes.

Or will Triton be able to do the ARM translation?

Nope.

In fact, the new design allows us to plug in any architecture. Currently, we will release only the x86 semantics, but you can easily add the ARM semantics if you want.

The new design looks like this:

$ tree src
src
├── api
│   └── api.cpp
├── arch
│   ├── architecture.cpp
│   ├── bitsVector.cpp
│   ├── immediateOperand.cpp
│   ├── instruction.cpp
│   ├── memoryOperand.cpp
│   ├── operandWrapper.cpp
│   ├── registerOperand.cpp
│   └── x86
│       ├── x8664Cpu.cpp
│       ├── x86Cpu.cpp
│       ├── x86Semantics.cpp
│       └── x86Specifications.cpp
├── bindings
│   └── python
│       ├── init.cpp
│       ├── modules
│       │   ├── smt2libCallbacks.cpp
│       │   └── tritonCallbacks.cpp
│       ├── namespaces
│       │   ├── initArchNamespace.cpp
│       │   ├── initCpuSizeNamespace.cpp
│       │   ├── initOperandNamespace.cpp
│       │   ├── initRegNamespace.cpp
│       │   ├── initSmtAstNodeNamespace.cpp
│       │   └── initX86OpcodesNamespace.cpp
│       ├── objects
│       │   ├── PyBitvector.cpp
│       │   ├── PyImmediate.cpp
│       │   ├── PyInstruction.cpp
│       │   ├── PyMemory.cpp
│       │   ├── PyRegister.cpp
│       │   └── PySmtAstNode.cpp
│       ├── pyXFunctions.cpp
│       └── utils.cpp
├── ctx
│   └── context.cpp
├── engines
│   ├── symbolic
│   │   ├── symbolicEngine.cpp
│   │   ├── symbolicExpression.cpp
│   │   └── symbolicVariable.cpp
│   └── taint
│       └── taintEngine.cpp
├── includes
│   ├── *.hpp
├── os
│   └── unix
│       ├── syscallNumberToString.cpp
│       └── syscalls.cpp
└── smt2lib
    └── smt2lib.cpp

As you can see, there is an x86 directory which contains all the information about the architecture's specification. The file x86Semantics.cpp contains the x86 semantics expressed in the SMT2-Lib format. Example with the XOR instruction:

namespace triton {
  namespace arch {
    namespace x86 {
      namespace semantics {

        [...]

        void xor_s(triton::arch::Instruction& inst) {
          auto dst = inst.operands[0];
          auto src = inst.operands[1];
          auto op1 = api.buildSymbolicOperand(dst);
          auto op2 = api.buildSymbolicOperand(src);

          auto node = smt2lib::bvxor(op1, op2);

          auto expr = api.createSymbolicExpression(inst, node, dst);

          api.taintUnion(expr, dst, src);

          clearFlag(inst, ID_TMP_CF, "Clears carry flag");
          clearFlag(inst, ID_TMP_OF, "Clears overflow flag");
          pf(inst, expr, dst);
          sf(inst, expr, dst);
          zf(inst, expr, dst);
        }

        [...]

      }; /* semantics namespace */
    }; /* x86 namespace */
  }; /* arch namespace */
}; /* triton namespace */
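For readers unfamiliar with the x86 flag updates that `clearFlag`, `pf`, `sf` and `zf` perform above, here is the same semantics written concretely in standalone Python. This is only an illustration of what the symbolic expressions encode, not Triton code:

```python
def xor8(dst, src):
    """Concrete 8-bit XOR with the x86 flag semantics sketched in xor_s above."""
    res = (dst ^ src) & 0xFF
    flags = {
        "CF": 0,                                          # XOR always clears carry...
        "OF": 0,                                          # ...and overflow
        "PF": 1 if bin(res).count("1") % 2 == 0 else 0,   # parity of the low byte
        "SF": res >> 7,                                   # sign = most significant bit
        "ZF": 1 if res == 0 else 0,                       # zero flag
    }
    return res, flags

# Like `xor ah, 0x99` from the trace below, with ah = 0xAA:
res, flags = xor8(0xAA, 0x99)
```

The symbolic version builds the same computation as an SMT2-Lib AST (`bvxor`) instead of evaluating it on concrete values.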

If you want to add the ARM semantics, you have to keep the same structure as x86 (once v0.3 is released, I will write a blog post about that).

Then, as you said, with the new design you can plug in any tracer to extract semantics (online, offline or whatever).

An offline example:

import sys
from    triton import *


trace = [
    (0x400000, "\x48\x8b\x05\xb8\x13\x00\x00"), # mov        rax, QWORD PTR [rip+0x13b8]
    (0x400007, "\x48\x8d\x34\xc3"),             # lea        rsi, [rbx+rax*8]
    (0x40000b, "\x67\x48\x8D\x74\xC3\x0A"),     # lea        rsi, [ebx+eax*8+0xa]
    (0x400011, "\x66\x0F\xD7\xD1"),             # pmovmskb   edx, xmm1
    (0x400015, "\x89\xd0"),                     # mov        eax, edx
    (0x400017, "\x80\xf4\x99"),                 # xor        ah, 0x99
]


if __name__ == '__main__':

    # Set the architecture
    setArchitecture(ARCH.X86_64)

    for (addr, opcodes) in trace:
        # Build an instruction
        inst = Instruction()

        # Setup opcodes
        inst.setOpcodes(opcodes)

        # Setup Address
        inst.setAddress(addr)

        # optional - Update the register state at each program point
        inst.updateContext(Register(REG.RAX, 0x4444444455555555))
        inst.updateContext(Register(REG.RBX, 0x1111111122222222))

        # optional - Add memory access <addr, size, content> at each program point
        inst.updateContext(Memory(0x66666666, 4, 0x31323334))

        # Process everything. Builds IR and updates engines.
        processing(inst)

    sys.exit(0)

As you can imagine, the new design allows you to use any tracer like:

  • Pin / DynamoRIO
  • Unicorn / QEMU / Bochs
  • IDA's debugger(s)
  • A database, a file or whatever...

I'm doing my best to release v0.3 as soon as possible. I hope it will be available in January 2016.

Cheers,

@gitttt commented Dec 23, 2015

Thank you for your answer and for taking the time to add a more detailed explanation. At the bottom of the last code snippet your comment states "Builds IR". What do you mean by that? Given the new upcoming design, what do you think of the idea of supporting IR traces, i.e., a trace already translated to e.g. REIL or VEX? Do you see any particular challenges or disadvantages? An obvious advantage would be support for ARM and other platforms with a one-time effort.

@JonathanSalwan (Owner, Author)

What do you mean by that?

The SMT2-Lib representation.

i.e., a trace already translated to REIL or VEX? Do you see any particular challenges or disadvantages?

More disadvantages than challenges. I'm not really convinced by REIL, VEX, BAP or the like. Intermediate representations are useful for generic analyses. In the case of Triton, what we want is to represent the control flow symbolically, solve constraints and perform simplifications.

In order to do that, the best representation is the SMT2-Lib.

Why SMT2-Lib? Because it's an international initiative aimed at facilitating research and development in Satisfiability Modulo Theories, which means that we can use any SAT/SMT solver which supports this format.
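To show what the format looks like, here is a tiny standard SMT-LIB v2 query (built as a plain string, so it is independent of Triton) asking a solver to invert the XOR from the earlier example. The variable name `x` is illustrative:

```python
# Which 8-bit value x satisfies x xor 0x99 == 0x33? Any SMT2-compatible
# solver (Z3, CVC4, ...) can answer this query as-is.
query = "\n".join([
    "(declare-fun x () (_ BitVec 8))",
    "(assert (= (bvxor x #x99) #x33))",
    "(check-sat)",
    "(get-model)",
])
print(query)
```

Feeding this text to Z3 would yield `sat` with a model assigning `x = #xaa`.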

If we use another IR (REIL, VEX or BAP), the translation chain looks like this:

ASM -> IR-1 (REIL, vex or BAP) -> IR-2 (SMT2-Lib)

Obviously, this is less efficient than:

ASM -> IR-2 (SMT2-Lib)

An obvious advantage would be the support of arm and other platforms with a one-time effort.

It's already a one-time effort (ASM -> SMT2-Lib), just like x86 -> VEX or ARM -> VEX.

You should consider SMT2-Lib as just another IR, and like every IR, there is no magic: we must convert each instruction's semantics into the targeted representation :).

@JonathanSalwan (Owner, Author)

Hey folks,

The v0.3 is still in progress (almost done); the x86 semantics and the Pin tracer tool still remain to be developed. The Triton library will be separated from the Pin tool.

The new design looks like this:

libTriton v0.3

@JonathanSalwan (Owner, Author)

The development of v0.3 is now available on a new branch. Doxygen documentation is also available.
