integrating byteweight into bap. #22

ivg · 2014-11-26T14:22:27Z

So, we have byteweight merged into bap, but there are a few issues we need to discuss. We should understand that currently it is mostly not a part of bap, but more of a demo application. That's not bad, but it is not enough.

What we should do next is split it into library and application parts, so that we can grab some neat stuff from byteweight and use it inside bap itself. Also, we need to make a plugin out of byteweight. But before doing this we should figure out what kind of service it provides. Currently BAP has only one service, bap.image, which provides facilities to load and parse binary files. So it is time to add a new service.

Now we should figure out the interface of the service. In fact, we need two interfaces: one for the backend (i.e., the service provider) and one for the frontend (the service itself); cf. elf_backend and image. So, let's start with the frontend. Two variants came to my mind: something like a function start identifier (FSI) or a function boundaries identifier (FBI). Currently, only dwarf can provide the latter. But since dwarf can't be relied on in real conditions, we can forget about it. We also have elf itself, which can provide some useful information even for a stripped binary. But afaik it can also provide only function starts (correct me if I'm wrong, but all we can rely on is the dynsym table coupled with the relocation table, and they give us only starting locations). So my idea is that, instead of starting with FBI and then downcasting it to FSI, we should start with the latter.

Another question is symbol names. I think that function boundaries and function names are orthogonal ideas and shouldn't be mixed. It would be better to have a separate service that resolves names.

So, back to FSI. What this service can actually provide is a predicate over a binary that marks certain addresses as function starts, which gives us image -> addr seq or mem -> arch -> addr seq. The problem with these interfaces is that they don't grant any access to file metainformation, so we can't implement providers that rely on it (like dwarf or elf). That means the FSI backend should work at a lower level, directly with the file, so we arrive at Bigstring.t -> arch -> addr seq. Also, having in mind other possible backend implementations, such as one based on llvm code, we can make it even a little more low-level: Bigstring.t -> arch -> addr -> bool.

So, I'm eager to hear from others. Everyone is welcome.
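
To make the two proposed shapes concrete, here is a minimal sketch of how the predicate form (Bigstring.t -> arch -> addr -> bool) could sit behind a frontend that yields the sequence form. Everything in it is an assumption made for illustration: the arch, addr and bigstring types are simplified stand-ins rather than BAP's own, addr is treated as a plain file offset, the names Fsi_backend, Prologue_backend and Fsi are hypothetical, and the toy backend only checks for the common x86 prologue bytes 55 89 e5 (push %ebp; mov %esp,%ebp). It is not how byteweight classifies.

```ocaml
(* Simplified stand-ins for illustration; BAP's real types are richer. *)
type arch = X86_32 | X86_64 | ARM
type addr = int (* assumption: an address is just a file offset here *)
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Backend interface: the low-level predicate form from the proposal. *)
module type Fsi_backend = sig
  val is_start : bigstring -> arch -> addr -> bool
end

(* Toy backend (hypothetical): flags the x86 prologue bytes 55 89 e5. *)
module Prologue_backend : Fsi_backend = struct
  let is_start data arch off =
    match arch with
    | X86_32 ->
      let byte i = Bigarray.Array1.get data i in
      off >= 0
      && off + 2 < Bigarray.Array1.dim data
      && byte off = '\x55'
      && byte (off + 1) = '\x89'
      && byte (off + 2) = '\xe5'
    | X86_64 | ARM -> false
end

(* Frontend: turns any backend predicate into a lazy sequence of starts. *)
module Fsi (B : Fsi_backend) = struct
  let starts data arch : addr Seq.t =
    let n = Bigarray.Array1.dim data in
    let rec from i () =
      if i >= n then Seq.Nil
      else if B.is_start data arch i then Seq.Cons (i, from (i + 1))
      else from (i + 1) ()
    in
    from 0
end

(* Helper for experiments: copy an OCaml string into a bigstring. *)
let bigstring_of_string s =
  let b =
    Bigarray.Array1.create Bigarray.char Bigarray.c_layout (String.length s)
  in
  String.iteri (fun i c -> Bigarray.Array1.set b i c) s;
  b
```

With this split, a backend only answers "is this offset a function start?", while the frontend decides how to enumerate candidates, so a dwarf-, elf- or byteweight-based provider could be swapped in without touching callers.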

dbrumley · 2014-11-26T15:15:37Z

I agree with Ivan. In addition, we should think about how, architecture-wise, we want to split the training and classification parts. Anyone using byteweight will probably want to do their own training, test accuracy (against symbols), and so on. Once they have a trie, they will want to use it. So one question is: which trie does the library load? Do we make the user specify one, or is there a default? These are questions you two should resolve as well.

ivg · 2014-11-26T15:27:27Z

train is a program that we already have. It can be called to obtain signatures (it is not yet added to oasis, so it won't build automatically, see #23, but we can assume that it is already added).
Currently, byteweight comes as an executable; it doesn't provide a library-level interface. We decided to move in small steps: first make it work as it is, and then split it into parts, factoring out anything useful. For example, I'm tempted to move the trie implementation into Bap_types.
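
For readers unfamiliar with the data structure, the following is a minimal sketch of a byte-keyed trie with longest-prefix lookup, the kind of thing that could be factored out into a shared library. It is an illustration only, not byteweight's actual implementation: the names trie, insert and longest_match are hypothetical, and the stored float is just a placeholder for whatever a signature would carry.

```ocaml
module CMap = Map.Make (Char)

(* A node optionally holds a value and maps the next byte to a subtrie. *)
type 'a trie = { value : 'a option; children : 'a trie CMap.t }

let empty = { value = None; children = CMap.empty }

(* [insert key v t] returns a trie in which byte string [key] maps to [v]. *)
let insert key v t =
  let rec go i t =
    if i = String.length key then { t with value = Some v }
    else
      let c = key.[i] in
      let child =
        match CMap.find_opt c t.children with
        | Some child -> child
        | None -> empty
      in
      { t with children = CMap.add c (go (i + 1) child) t.children }
  in
  go 0 t

(* [longest_match t s] returns the value stored under the longest key
   that is a prefix of [s], if any. *)
let longest_match t s =
  let rec go t i best =
    let best = match t.value with Some _ as v -> v | None -> best in
    if i >= String.length s then best
    else
      match CMap.find_opt s.[i] t.children with
      | None -> best
      | Some child -> go child (i + 1) best
  in
  go t 0 None
```

For example, after inserting the keys "\x55" and "\x55\x89\xe5", looking up a byte string that starts with 55 89 e5 returns the value stored under the longer key, while one that starts with 55 90 falls back to the shorter one.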

tiffanyb · 2014-11-26T15:37:42Z

I agree that we should split bw into an application and libraries. As for
customized signature files, I propose that we support them in the application
but not in the library. This is because, as one of the libraries in BAP, it
would only use the signature that BAP generates. In this case one can consider
BAP as a user of bw. Similarly for training, I think we should regard it as an
application as well.

ivg added the enhancement and help wanted labels Nov 26, 2014

ivg added a commit that referenced this issue Nov 26, 2014

Merge pull request #21 from tiffanyb/tables-review

24d9768

merged, but create #22

ivg mentioned this issue Nov 26, 2014

Add ByteWeight #21

Merged

ivg closed this as completed Feb 27, 2015

DukMastaaa mentioned this issue Jul 8, 2022

Correct LD2 instruction UQ-PAC/bap#13

Merged
