integrating byteweight into bap. #22
Comments
I agree with Ivan. In addition, we should think about how, architecture-wise, we want to split the training and classification parts. Anyone using byteweight will probably want to do their own training, test accuracy (against symbols), and so on. Once they have a trie, they will want to use it. So one question is: in the library, which trie do we load? Do we make someone specify one, or is there a default? These are questions you two should resolve as well.
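To make the training/classification split and the "default trie" question concrete, here is a minimal OCaml sketch. Everything in it is an assumption made for illustration: the module names `Trie`, `Train` and `Classify`, the `depth` and `threshold` parameters, and the simple positive-ratio score are hypothetical and are not byteweight's actual code or API. The classifier takes an optional `?trie` argument, which is one possible answer to "specify or default".

```ocaml
(* Toy sketch only: module names, functions and the scoring rule are
   assumptions for illustration, not byteweight's actual code. *)

module Trie = struct
  (* Each node counts how often the byte prefix leading to it was seen
     at a function start (pos) and elsewhere (neg). *)
  type t = {
    mutable pos : int;
    mutable neg : int;
    kids : (char, t) Hashtbl.t;
  }

  let create () = { pos = 0; neg = 0; kids = Hashtbl.create 8 }
end

module Train = struct
  (* Record one labelled sample: at most [depth] leading bytes of [bytes]. *)
  let add ?(depth = 4) (root : Trie.t) (bytes : string) ~(is_start : bool) : unit =
    let n = min depth (String.length bytes) in
    let rec go (node : Trie.t) i =
      if i < n then begin
        let child =
          match Hashtbl.find_opt node.Trie.kids bytes.[i] with
          | Some c -> c
          | None ->
            let c = Trie.create () in
            Hashtbl.add node.Trie.kids bytes.[i] c;
            c
        in
        if is_start then child.Trie.pos <- child.Trie.pos + 1
        else child.Trie.neg <- child.Trie.neg + 1;
        go child (i + 1)
      end
    in
    go root 0
end

module Classify = struct
  (* Hypothetical built-in default; here it is simply an empty trie. *)
  let default : Trie.t = Trie.create ()

  (* Score = pos / (pos + neg) at the deepest node matching [bytes];
     0.0 if not even the first byte was seen during training. *)
  let score ?(trie = default) (bytes : string) : float =
    let rec go (node : Trie.t) i best =
      if i >= String.length bytes then best
      else
        match Hashtbl.find_opt node.Trie.kids bytes.[i] with
        | None -> best
        | Some c ->
          let total = c.Trie.pos + c.Trie.neg in
          go c (i + 1) (float_of_int c.Trie.pos /. float_of_int total)
    in
    go trie 0 0.0

  let is_start ?trie ?(threshold = 0.5) (bytes : string) : bool =
    score ?trie bytes >= threshold
end

let () =
  let trie = Trie.create () in
  Train.add trie "\x55\x89\xe5" ~is_start:true;
  Train.add trie "\x55\x90\x90" ~is_start:false;
  (* Classify with the user-trained trie, then with the (empty) default. *)
  Printf.printf "%b\n" (Classify.is_start ~trie "\x55\x89\xe5");
  Printf.printf "%b\n" (Classify.is_start "\x55\x89\xe5")
```

With this shape, the training code can live in the application while the library only needs `Trie` and `Classify`; whether a default trie should ship with the library is still the open question raised above.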
I agree that we should split bw into an application and libraries. In terms of
So, we have `byteweight` merged into `bap`, but there are a few issues we need to discuss. We should understand that it is currently not so much a part of bap as a demo application. That's not bad, but it is not enough.

What we should do next is split it into library and application parts, so that we can grab some neat stuff from byteweight and use it inside `bap` itself. We also need to make a plugin of `byteweight`. But before doing this, we should figure out what kind of service it provides. Currently BAP has only one service, named `bap.image`, which provides facilities to load and parse binary files. So it is time to add a new service.

Now we should figure out the interface of the service. In fact, we need two interfaces: one for the backend (i.e., the service provider) and one for the frontend (the service itself) (cf. elf_backend and image).

So, let's start with the frontend. Two variants come to mind: a function start identifier (FSI) or a function boundaries identifier (FBI). Currently, only DWARF can provide the latter. But since DWARF can't be used in real conditions, we can forget about it. We also have ELF itself, which can provide some useful information even for a stripped binary. But afaik it can also provide only function starts (correct me if I'm wrong, but all we can rely on is the dynsym table coupled with the relocation table, and they give us only starting locations). So my idea is that, instead of starting with `FBI` and then downcasting it to `FSI`, we should start with the latter.

Another question is symbol names. I think that function boundaries and function names are orthogonal ideas and shouldn't be mixed. It would be better to have a separate service that resolves names.

So, back to FSI. What this service can actually provide is a predicate over a binary that marks certain addresses as function starts. That gives us `image -> addr seq` or `mem -> arch -> addr seq`. The problem with these interfaces is that they don't grant any access to file meta-information, so we can't implement any providers that rely on it (like DWARF or ELF). That means that the FSI backend should work at a lower level, directly with the file, so we come up with `Bigstring.t -> arch -> addr seq`. Also, having in mind some other possible backend implementations, such as one based on LLVM code, we can make it even a little more low-level: `Bigstring.t -> arch -> addr -> bool`.

So, I'm eager to hear from others. Everyone is welcome.
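As a rough illustration of how the two proposed shapes relate, the sketch below derives the sequence-returning frontend from the lower-level predicate backend. It uses stand-in types so that it runs with the standard library only: `string` replaces `Bigstring.t`, `addr` is a plain byte offset, and `arch` is a made-up variant. The prologue matcher is a toy backend invented for this example, not byteweight's algorithm, and none of these names are BAP's real API.

```ocaml
(* Hypothetical stand-ins so the sketch is self-contained;
   BAP's real arch, addr and Bigstring.t types differ. *)
type arch = X86 | X86_64 | ARM
type addr = int        (* byte offset into the buffer *)
type data = string     (* stand-in for Bigstring.t *)

(* Low-level backend shape: data -> arch -> addr -> bool *)
type backend = data -> arch -> addr -> bool

(* Lazy sequence of the integers lo, lo+1, ..., hi-1. *)
let rec range lo hi : int Seq.t = fun () ->
  if lo >= hi then Seq.Nil else Seq.Cons (lo, range (lo + 1) hi)

(* Frontend shape: data -> arch -> addr seq, built on top of any backend
   by testing every offset in the buffer. *)
let starts (is_start : backend) (data : data) (arch : arch) : addr Seq.t =
  Seq.filter (is_start data arch) (range 0 (String.length data))

(* True if [pat] occurs in [data] at offset [pos]. *)
let has_prefix data pos pat =
  let n = String.length pat in
  pos >= 0 && pos + n <= String.length data && String.sub data pos n = pat

(* Toy backend: match a classic frame-pointer prologue.
   x86:    push ebp; mov ebp, esp  = 55 89 e5
   x86-64: push rbp; mov rbp, rsp  = 55 48 89 e5 *)
let prologue_backend : backend = fun data arch addr ->
  match arch with
  | X86 -> has_prefix data addr "\x55\x89\xe5"
  | X86_64 -> has_prefix data addr "\x55\x48\x89\xe5"
  | ARM -> false

let () =
  (* nop; prologue; ret; prologue; ret *)
  let code = "\x90\x55\x89\xe5\xc3\x55\x89\xe5\xc3" in
  starts prologue_backend code X86
  |> Seq.iter (fun a -> Printf.printf "function start at offset %d\n" a)
```

One consequence of the predicate form is that the frontend decides which addresses to probe, while a backend that already knows the starts (for example from a symbol table) would have to answer membership queries rather than enumerate them directly.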