Questions about the project #12

Closed
sabastiaan opened this issue Jul 29, 2020 · 10 comments
Labels
question Further information is requested

Comments

@sabastiaan

Hi there, I couldn't really find a proper avenue to ask these questions, so sorry for the inconvenience.

How does this project compare to for example XED in terms of completeness and accuracy?

In the readme you included the following:

Resilient - tested against internal fuzzers and the famous mishegos tool.

Mishegos, while great, gives little insight into its coverage. Did you not encounter any wrongly disassembled instructions while fuzzing? If not, how are you sure that you did not miss critical instructions? And if you did encounter errors, how should we view its resilience in light of those results?

@vlutas
Collaborator

vlutas commented Jul 29, 2020

Hello,

In terms of completeness, our aim is to support all valid x86/x64 instructions, which (except for possible bugs, which you can never exclude fully) we do - including the latest AMX instruction set.
As far as accuracy goes, our philosophy is simple: if the CPU would execute a given sequence of bytes, bddisasm should be able to decode it correctly; conversely, if the CPU would trigger an encoding exception, bddisasm should not decode that instruction. From this perspective, bddisasm should be similar to XED: both support every instruction, both provide the correct output, and both provide as much information about the instruction as possible.

Mishegos is a great tool for identifying issues which you would not normally see in regular code. Two very simple examples: the REX prefix is ignored if it does not immediately precede the opcode byte, and segment prefixes (except for fs and gs) are ignored in 64-bit mode. You would not see these kinds of instructions in normal code; you would only end up with them by fuzzing or by generating them manually.
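
The two prefix rules mentioned above can be illustrated with a small, self-contained sketch. This is not bddisasm code: the type and function names (prefix_info, scan_prefixes_64) are made up for illustration, and the model is deliberately simplified (64-bit mode only, no VEX/EVEX/XOP, no instruction length limit).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of scanning the prefix bytes of one instruction in 64-bit mode.
   Hypothetical type for this sketch, not part of bddisasm. */
typedef struct {
    uint8_t rex;      /* effective REX byte, or 0 if none takes effect */
    uint8_t segment;  /* effective segment override (0x64 = FS, 0x65 = GS), or 0 */
    size_t  length;   /* number of prefix bytes consumed */
} prefix_info;

static int is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x26: case 0x2E: case 0x36: case 0x3E: /* ES, CS, SS, DS */
    case 0x64: case 0x65:                       /* FS, GS */
    case 0x66: case 0x67:                       /* operand size, address size */
    case 0xF0: case 0xF2: case 0xF3:            /* LOCK, REPNE, REP */
        return 1;
    default:
        return 0;
    }
}

prefix_info scan_prefixes_64(const uint8_t *code, size_t size)
{
    prefix_info info = { 0, 0, 0 };
    size_t i = 0;

    assert(code != NULL || size == 0);

    for (; i < size; i++) {
        uint8_t b = code[i];

        if (is_legacy_prefix(b)) {
            /* A REX seen earlier no longer precedes the opcode: drop it. */
            info.rex = 0;
            /* Only FS and GS overrides are honored in 64-bit mode. */
            if (b == 0x64 || b == 0x65)
                info.segment = b;
        } else if ((b & 0xF0) == 0x40) {
            /* REX prefix (0x40..0x4F); it counts only if nothing but the
               opcode follows it. */
            info.rex = b;
        } else {
            break; /* first non-prefix byte: the opcode starts here */
        }
    }

    info.length = i;
    return info;
}
```

For the byte sequence 48 66 89 C8, this sketch reports no effective REX, because the 0x66 prefix sits between the REX byte and the opcode. For 2E 65 8B 00, it reports only the GS override.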

Did you not encounter any wrongly disassembled instructions while fuzzing

The fact that it was tested against fuzzers, Mishegos and other internal tools makes it resilient. Bugs were obviously found and fixed (and there's always the possibility that bugs still exist - it's software, after all); testing and fuzzing are part of our development process.

We are continuously making sure that bddisasm is properly tested - both manually and automatically (as you can probably see in https://github.com/bitdefender/bddisasm/tree/master/bddisasm_test). We are also glad to implement any other new ideas you and the community might have, to improve bddisasm and make it as appealing as possible to others!

@sabastiaan
Author

sabastiaan commented Jul 29, 2020

Hello,

Thank you for your comment.
I reworked sandsifter to use bddisasm. I will run this for a few nights and report back my findings. Some preliminary results:

0F1A34ADEC000000
ICLASS: NOP   CATEGORY: WIDENOP   EXTENSION: BASE  IFORM: NOP_GPRv_MEMv_0F1A   ISA_SET: PPRO
SHORT: nop esp, dword ptr [rbp*4+0xec]
main.c

#include <stdio.h>
#include "bddisasm/bddisasm.h"  /* assumed include path; adjust to your bddisasm setup */

int main(void)
{
    INSTRUX ix;
    /* Bytes reported by sandsifter; XED decodes them as a wide NOP. */
    unsigned char ins[8] = { 0x0f, 0x1a, 0x34, 0xad, 0xec, 0x00, 0x00, 0x00 };
    NDSTATUS status;

    status = NdDecodeEx(&ix, ins, sizeof(ins), ND_CODE_64, ND_DATA_64);
    if (!ND_SUCCESS(status))
    {
        printf("Decoding failed with error 0x%08x!\n", status);
        return -1;
    }

    printf("success!\n");
    return 0;
}


./a.out 
Decoding failed with error 0x80000005!

@vlutas
Collaborator

vlutas commented Jul 29, 2020

That is known (and normal), because bddisasm always decodes instructions as if MPX is on (decoding the same instruction with mpx option in Xed will yield the same result as bddisasm - invalid opcode).
We have plans to handle these special MPX cases, and they will be fixed in a subsequent commit.
Later edit: This applies to basically all instructions mapped onto the wide NOP space - bddisasm will "optimistically" try to decode as if all extensions are enabled (MPX, CET, CLDEMOTE).

@vlutas
Collaborator

vlutas commented Jul 30, 2020

Starting with the latest commit (ed564db), by using the NdDecodeWithContext API, the caller can specify whether to decode MPX/CET/CLDEMOTE instructions or NOPs. The default behavior remains the current one - bddisasm will optimistically decode MPX/CET/CLDEMOTE instructions instead of NOPs, but now you can disable this by using ND_FEAT_NONE in ND_CONTEXT when calling NdDecodeWithContext.
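
To make the two behaviors concrete, here is a toy model of feature-gated decoding for the 0F 1A opcode. It does not call bddisasm and does not reproduce its logic: the names (FEAT_NONE, FEAT_MPX, decode_kind, classify_0f1a) are hypothetical, and the rule that a ModRM.reg value above 3 is invalid under MPX is my assumption, based on MPX defining only four bound registers (bnd0-bnd3). With the real library, the choice is made through ND_CONTEXT and NdDecodeWithContext, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits for this sketch; these are NOT bddisasm's
   ND_FEAT_* values. */
enum { FEAT_NONE = 0x00, FEAT_MPX = 0x01 };

typedef enum { DECODE_WIDE_NOP, DECODE_MPX, DECODE_INVALID } decode_kind;

/* Extract the reg field (bits 5..3) of a ModRM byte. */
static uint8_t modrm_reg(uint8_t modrm)
{
    return (uint8_t)((modrm >> 3) & 7);
}

/* Classify an instruction that starts with 0F 1A, given the enabled features. */
decode_kind classify_0f1a(const uint8_t *insn, size_t size, unsigned features)
{
    assert(insn != NULL && size >= 3);
    assert(insn[0] == 0x0F && insn[1] == 0x1A);

    /* Without MPX, the opcode stays in the wide NOP space. */
    if (!(features & FEAT_MPX))
        return DECODE_WIDE_NOP;

    /* Assumption: only bnd0-bnd3 exist, so reg values 4..7 cannot name a
       bound register and the encoding is rejected. */
    if (modrm_reg(insn[2]) > 3)
        return DECODE_INVALID;

    return DECODE_MPX;
}
```

For the bytes from this thread (0F 1A 34 AD EC 00 00 00) the ModRM byte is 0x34, so the reg field is 6. The model returns DECODE_INVALID with FEAT_MPX and DECODE_WIDE_NOP with FEAT_NONE, which matches the two results reported above.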

@sabastiaan
Author

Thanks for your work, that helped a lot!

Does ND_CONTEXT also allow for filtering on CPU modes and extensions? I saw them being defined in bddisasm.h but couldn't see how to use them.

With this, most of the sandsifter output is now either a SIGILL caused by invalid operands (but it can still recover what the instruction's length should be) or a SIGTRAP. On closer inspection, though, there seems to be a bug somewhere, since no trap signals are generated when I run those findings in a patched binary.

@vlutas
Collaborator

vlutas commented Jul 30, 2020

The ND_CONTEXT only allows you to filter based on mode (16, 32 or 64 bit), preferred vendor, and features that are mapped onto the NOP space (MPX, CET, CLDEMOTE - the reason for this is that these instructions are NOPs unless the feature is enabled).
However, if you need additional validity checks, you can certainly do them yourself:

  • checking if the instruction is valid in any given mode can be done using the INSTRUX.ValidModes field - for example, if you wish to check if the instruction ix is valid in ring 3, you would simply do: if (ix.ValidModes.Ring3)
  • checking for extensions can be done by inspecting the INSTRUX.IsaSet or INSTRUX.Category - for example, to check if an instruction ix belongs to the AVX512F set, you would do: if (ix.IsaSet == ND_SET_AVX512F)
  • checking if an instruction is supported on a given CPU is also possible, because the INSTRUX structure contains the CPUID leaf that must be inspected to determine support.
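
As a rough illustration of the last point, the sketch below tests a single CPUID feature bit. It is self-contained and does not use bddisasm: cpuid_flag is a hypothetical stand-in for the CPUID information carried by INSTRUX, and the caller is assumed to have already executed CPUID for the given leaf and subleaf and stored EAX, EBX, ECX and EDX in that order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one CPUID feature bit; a stand-in for the
   CPUID information in INSTRUX, not the library's actual type. */
typedef struct {
    uint32_t leaf;     /* value loaded into EAX before CPUID */
    uint32_t subleaf;  /* value loaded into ECX before CPUID */
    uint8_t  reg;      /* output register: 0 = EAX, 1 = EBX, 2 = ECX, 3 = EDX */
    uint8_t  bit;      /* bit index inside that register, 0..31 */
} cpuid_flag;

/* regs holds the CPUID output for flag.leaf / flag.subleaf, in the order
   EAX, EBX, ECX, EDX. Returns 1 if the feature bit is set, 0 otherwise. */
int cpuid_flag_is_set(cpuid_flag flag, const uint32_t regs[4])
{
    assert(flag.reg < 4 && flag.bit < 32);
    return (int)((regs[flag.reg] >> flag.bit) & 1u);
}
```

For example, AVX512F is reported in CPUID leaf 7, subleaf 0, EBX bit 16, so the descriptor would be { 7, 0, 1, 16 }. Executing CPUID itself is compiler specific and is left out of the sketch.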

Thanks again for reaching out! If there are any more questions, we are glad to address them.

@sabastiaan
Author

Some minor things I came across while trying to understand the project a bit more:
For building the disasmtool_lix tool, what is the proper way to deal with rapidjson? I installed it into /usr/local/include. However, the current CMake script won't pick it up. I did get it to work by using a custom RapidJSONConfig.cmake, but I feel there might be a better way.

Is there a difference between nd_decode and nd_decode_ex in pydism, since these are identical?

Are there differences feature-wise between disasmtool and disasmtool_lix?

@ianichitei
Contributor

ianichitei commented Aug 3, 2020

For building the disasmtool_lix tool, what is the proper way to deal with rapidjson? I installed it into /usr/local/include however with the current CMake script it won't pick this up. I did get it to work with using a custom RapidJSONConfig.cmake, but feel there might be a better way.

I build and install rapidjson locally exactly like it is done for GitHub actions (see Build dependencies). On my machine, this installs the headers in /usr/local/include, and copies RapidJSON.pc to /usr/local/lib/pkgconfig/.

@ianichitei ianichitei added the question Further information is requested label Aug 5, 2020
@vlutas
Collaborator

vlutas commented Aug 5, 2020

Is there a difference between nd_decode and nd_decode_ex in pydism, since these are identical?

No, there is no functional difference.

Are there differences feature-wise between disasmtool and disasmtool_lix?

Right now, disasmtool is more advanced compared to disasmtool_lix, but we plan to align the features (or perhaps even create a single tool) to eliminate differences.

@vlutas
Collaborator

vlutas commented Aug 6, 2020

@sabastiaan, we have a public Slack - you can join here: https://kvm-vmi.herokuapp.com. There is a public bddisasm channel on that Slack, so we'd be glad if you'd join and post your questions and feedback there.

In the meantime, I will close this, as there is no outstanding issue attached to this for the moment.

@vlutas vlutas closed this as completed Aug 6, 2020