
Extract engines into external libs #226

Closed
JonathanSalwan opened this issue Dec 7, 2015 · 7 comments
@JonathanSalwan (Owner)

This would make it possible to plug in any DBI engine.

External libs

  • Symbolic and Solver Engines (merged into one lib)
  • Taint Engine
  • SMT2-Lib (including bindings)

Pin core

  • Snapshot (because it's tied to Pin's internals)
  • Python bindings
@JonathanSalwan JonathanSalwan self-assigned this Dec 7, 2015
@JonathanSalwan JonathanSalwan added this to the v0.3 milestone Dec 7, 2015
@JonathanSalwan (Owner, Author)

After some brainstorming, the split will look like this:

External lib

  • libtriton.{so/dll/dylib} (32 and 64-bits)

The libtriton includes:

  • Symbolic engine
  • Taint engine
  • smt2-lib
  • solver interface
  • API (C++)
  • Python bindings plugged onto the C++ API

libtriton's namespaces

Basically, the library will look like this:

namespace triton
namespace triton::api
namespace triton::engines
namespace triton::engines::dynamic::taint
namespace triton::engines::dynamic::symbolic
namespace triton::engines::dynamic::symbolic::optimization
namespace triton::semantics::x86 (includes 32 and 64-bits)
namespace triton::interface
namespace triton::interface::solvers
namespace triton::interface::solvers::z3
namespace triton::smt2lib
namespace triton::bindings
namespace triton::bindings::python

Triton-Pin

  • The current pintool but lightened

The pintool will extract the information used by libtriton. Basically, a structure must be filled with the appropriate information, like the number and type of operands, opcodes, memory accesses, etc. Then, this structure will be sent to libtriton, which will perform the same processing that is already present in the current source code.
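To make the idea concrete, the structure a tracer would fill and hand over to libtriton might look like the sketch below. All names here are hypothetical illustrations of the description above, not Triton's actual API:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class InstructionInfo:
    """Hypothetical structure a tracer (Pin, QEMU, ...) fills for libtriton."""
    address: int                # program point of the instruction
    opcodes: bytes              # raw instruction bytes
    operands: List[str] = field(default_factory=list)  # operand kinds: "reg", "mem", "imm"
    memory_accesses: List[Tuple[int, int, int]] = field(default_factory=list)  # (addr, size, value)

# The tracer fills one structure per executed instruction, e.g. `xor ah, 0x99`:
info = InstructionInfo(address=0x400017, opcodes=b"\x80\xf4\x99", operands=["reg", "imm"])
info.memory_accesses.append((0x66666666, 4, 0x31323334))
```

The library side would then consume `InstructionInfo` objects without knowing which tracer produced them.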

Why this big step forward?

In several cases, we have to perform analysis on a trace. This trace may come from anywhere (not necessarily Pin only). For example, it may come from a database (offline analysis), from another DBI engine like DynamoRIO, or from emulators like QEMU, Bochs and Medusa. Extracting all Triton features into an external lib (independent of any trace extractor) allows the user to plug Triton in anywhere. This is also a good way to support more architectures, like ARM.

@gitttt commented Dec 22, 2015

Sounds great. I do not understand the last sentence though. Could you explain in a few words what needs to be done in order to add ARM support? Trace recording plus the translation of the trace into a format which libtriton understands? Or will Triton be able to do the ARM translation?

@JonathanSalwan (Owner, Author)

Hey,

Trace recording plus the translation of the trace into a format which libtriton understands?

Yes.

Or will Triton be able to do the ARM translation?

Nope.

In fact, the new design allows us to plug in any architecture. Currently, we will release only the x86 semantics, but you can easily add the ARM semantics if you want.

The new design looks like this:

$ tree src
src
├── api
│   └── api.cpp
├── arch
│   ├── architecture.cpp
│   ├── bitsVector.cpp
│   ├── immediateOperand.cpp
│   ├── instruction.cpp
│   ├── memoryOperand.cpp
│   ├── operandWrapper.cpp
│   ├── registerOperand.cpp
│   └── x86
│       ├── x8664Cpu.cpp
│       ├── x86Cpu.cpp
│       ├── x86Semantics.cpp
│       └── x86Specifications.cpp
├── bindings
│   └── python
│       ├── init.cpp
│       ├── modules
│       │   ├── smt2libCallbacks.cpp
│       │   └── tritonCallbacks.cpp
│       ├── namespaces
│       │   ├── initArchNamespace.cpp
│       │   ├── initCpuSizeNamespace.cpp
│       │   ├── initOperandNamespace.cpp
│       │   ├── initRegNamespace.cpp
│       │   ├── initSmtAstNodeNamespace.cpp
│       │   └── initX86OpcodesNamespace.cpp
│       ├── objects
│       │   ├── PyBitvector.cpp
│       │   ├── PyImmediate.cpp
│       │   ├── PyInstruction.cpp
│       │   ├── PyMemory.cpp
│       │   ├── PyRegister.cpp
│       │   └── PySmtAstNode.cpp
│       ├── pyXFunctions.cpp
│       └── utils.cpp
├── ctx
│   └── context.cpp
├── engines
│   ├── symbolic
│   │   ├── symbolicEngine.cpp
│   │   ├── symbolicExpression.cpp
│   │   └── symbolicVariable.cpp
│   └── taint
│       └── taintEngine.cpp
├── includes
│   ├── *.hpp
├── os
│   └── unix
│       ├── syscallNumberToString.cpp
│       └── syscalls.cpp
└── smt2lib
    └── smt2lib.cpp

As you can see, there is an x86 directory which contains all the information about the architecture's specification. The file x86Semantics.cpp contains the x86 semantics expressed in the SMT2-Lib format. Example with the XOR instruction:

namespace triton {
  namespace arch {
    namespace x86 {
      namespace semantics {

        [...]

        void xor_s(triton::arch::Instruction& inst) {
          auto dst = inst.operands[0];
          auto src = inst.operands[1];
          auto op1 = api.buildSymbolicOperand(dst);
          auto op2 = api.buildSymbolicOperand(src);

          auto node = smt2lib::bvxor(op1, op2);

          auto expr = api.createSymbolicExpression(inst, node, dst);

          api.taintUnion(expr, dst, src);

          clearFlag(inst, ID_TMP_CF, "Clears carry flag");
          clearFlag(inst, ID_TMP_OF, "Clears overflow flag");
          pf(inst, expr, dst);
          sf(inst, expr, dst);
          zf(inst, expr, dst);
        }

        [...]

      }; /* semantics namespace */
    }; /* x86 namespace */
  }; /* arch namespace */
}; /* triton namespace */
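For readers unfamiliar with the x86 flag updates that `clearFlag`, `pf`, `sf` and `zf` perform above, here is the same semantics written concretely in standalone Python. This is only an illustration of what the symbolic expressions encode, not Triton code:

```python
def xor8(dst, src):
    """Concrete 8-bit XOR with the x86 flag semantics sketched in xor_s above."""
    res = (dst ^ src) & 0xFF
    flags = {
        "CF": 0,                                          # XOR always clears carry...
        "OF": 0,                                          # ...and overflow
        "PF": 1 if bin(res).count("1") % 2 == 0 else 0,   # parity of the low byte
        "SF": res >> 7,                                   # sign = most significant bit
        "ZF": 1 if res == 0 else 0,                       # zero flag
    }
    return res, flags

# Like `xor ah, 0x99` from the trace below, with ah = 0xAA:
res, flags = xor8(0xAA, 0x99)
```

The symbolic version builds the same computation as an SMT2-Lib AST (`bvxor`) instead of evaluating it on concrete values.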

If you want to add the ARM semantics, you have to keep the same structure as x86 (once v0.3 is released, I will write a blog post about that).

Then, as you said, with the new design you can plug in any tracer to extract semantics (online, offline or whatever).

An offline example:

import sys
from    triton import *


trace = [
    (0x400000, "\x48\x8b\x05\xb8\x13\x00\x00"), # mov        rax, QWORD PTR [rip+0x13b8]
    (0x400007, "\x48\x8d\x34\xc3"),             # lea        rsi, [rbx+rax*8]
    (0x40000b, "\x67\x48\x8D\x74\xC3\x0A"),     # lea        rsi, [ebx+eax*8+0xa]
    (0x400011, "\x66\x0F\xD7\xD1"),             # pmovmskb   edx, xmm1
    (0x400015, "\x89\xd0"),                     # mov        eax, edx
    (0x400017, "\x80\xf4\x99"),                 # xor        ah, 0x99
]


if __name__ == '__main__':

    # Set the architecture
    setArchitecture(ARCH.X86_64)

    for (addr, opcodes) in trace:
        # Build an instruction
        inst = Instruction()

        # Setup opcodes
        inst.setOpcodes(opcodes)

        # Setup Address
        inst.setAddress(addr)

        # optional - Update the register state at each program point
        inst.updateContext(Register(REG.RAX, 0x4444444455555555))
        inst.updateContext(Register(REG.RBX, 0x1111111122222222))

        # optional - Add memory access <addr, size, content> at each program point
        inst.updateContext(Memory(0x66666666, 4, 0x31323334))

        # Process everything. Builds IR and updates engines.
        processing(inst)

    sys.exit(0)

As you can imagine, the new design allows you to use any tracer like:

  • Pin / DynamoRIO
  • Unicorn / QEMU / Bochs
  • IDA's debugger(s)
  • A database, a file or whatever...

I'm doing my best to release v0.3 as soon as possible. I hope it will be available in January 2016.

Cheers,

@gitttt commented Dec 23, 2015

Thank you for your answer and for taking the time to add a more detailed explanation. At the bottom of the last code snippet your comment states "Builds IR". What do you mean by that? Given the new upcoming design, what do you think of the idea of supporting IR traces, i.e., a trace already translated to e.g. REIL or VEX? Do you see any particular challenges or disadvantages? An obvious advantage would be support for ARM and other platforms with a one-time effort.

@JonathanSalwan (Owner, Author)

What do you mean by that?

The SMT2-Lib representation.

i.e., a trace already translated to REIL or VEX? Do you see any particular challenges or disadvantages?

More disadvantages than challenges. I'm not really convinced by REIL, VEX, BAP or the like. Intermediate representations are useful for generic analyses. In the case of Triton, what we want is to represent the control flow symbolically, solve constraints and perform simplifications.

In order to do that, the best representation is the SMT2-Lib.

Why SMT2-Lib? Because it's an international initiative aimed at facilitating research and development in Satisfiability Modulo Theories, which means that we can use any SAT/SMT solver which supports this format.
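To show what the format looks like, here is a tiny standard SMT-LIB v2 query (built as a plain string, so it is independent of Triton) asking a solver to invert the XOR from the earlier example. The variable name `x` is illustrative:

```python
# Which 8-bit value x satisfies x xor 0x99 == 0x33? Any SMT2-compatible
# solver (Z3, CVC4, ...) can answer this query as-is.
query = "\n".join([
    "(declare-fun x () (_ BitVec 8))",
    "(assert (= (bvxor x #x99) #x33))",
    "(check-sat)",
    "(get-model)",
])
print(query)
```

Feeding this text to Z3 would yield `sat` with a model assigning `x = #xaa`.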

If we use another IR (REIL, VEX or BAP), the translation chain looks like this:

ASM -> IR-1 (REIL, vex or BAP) -> IR-2 (SMT2-Lib)

Obviously, this is less efficient than:

ASM -> IR-2 (SMT2-Lib)

An obvious advantage would be the support of arm and other platforms with a one-time effort.

It's already a one-time effort (ASM -> SMT2-Lib), just like x86 -> VEX or ARM -> VEX.

You should consider SMT2-Lib as just another IR, and like every IR, there is no magic: we must convert each instruction's semantics into the targeted representation :).

@JonathanSalwan (Owner, Author)

Hey folks,

The v0.3 is still in progress (almost done); the x86 semantics and the Pin tracer tool still remain to be developed. The Triton library will be separated from the Pin tool.

The new design looks like this:

libTriton v0.3

@JonathanSalwan (Owner, Author)

The development of v0.3 is now available on a new branch. Doxygen documentation is also available.
