Bytecode offsets and stack offsets in Soot #787

Open
sohah opened this issue Oct 4, 2017 · 11 comments
Labels: enhancement, good first issue

@sohah

sohah commented Oct 4, 2017

Hello,

We are interested in trying to instrument the Java Symbolic Pathfinder model checker in order to support static symbolic execution of code regions (Veritesting). To be able to do this, we need to identify regions of code that can be statically executed and to translate the static regions. We were hoping to use Soot to do this; Shimple is an SSA form that is straightforward to translate into static symbolic execution regions.

However, to do so, we need to precisely map locations in Shimple to the bytecode addresses of the corresponding instructions, and to map identifiers in Shimple to stack offsets and object field references. The instruction locations will be used to "trap" the SPF execution to start static symbolic execution. At these trap points, we need to be able to transfer information in and out of the stack locations and field references that are used by the static code region.

Is it possible to use "as-is" or modify Soot in order to provide this information? We are interested either in the information from the original code, or, if it is easier, rewritten bytecode from the Soot Shimple form. In previous versions of Soot, we could grab the bytecode instructions, but it looks like the current version using the ASM front end no longer has support for this.

Once again, the things we need specifically are:

  • the Java bytecode offsets corresponding to Shimple branching statements
  • the stack offsets for local variables referenced in Shimple
  • the field numbers associated with field references in Shimple

Thank you very much for your time!

mbenz89 assigned mbenz89 and StevenArzt and unassigned mbenz89 Oct 6, 2017
@anddann
Member

anddann commented Oct 10, 2017

Hi @sohah,
your question is interesting.
Unfortunately, I can't give you an exact answer to your question.

  • Class files are parsed to Jimple/Shimple by classes in the package soot.asm (SootClassBuilder, AsmMethodSource, ...) using ASM 5.0.
  • Soot then runs various body transformations on the intermediate representation (e.g., via PackManager, BodyPackManager).
    In these transformations, some bytecode instructions are split up into several Jimple/Shimple statements (e.g., by LocalSplitter).
    Thus, one bytecode instruction does not necessarily correspond to one Jimple/Shimple statement.

Anyway, it is worth taking a look at the ASM 5.0 API to check whether it provides bytecode offsets for branching statements, or stack offsets in general.

@ericbodden
Member

Hello. I also believe that Soot's ASM frontend currently does not preserve bytecode offsets as Tags, as the old Coffi frontend used to do. However, I am quite confident that this would be easy to add. Obtaining the offsets through ASM should be rather simple. If you were to provide a pull request to extend Soot that way, we would be happy to accept it.

Note that in a first step you would have to generate the appropriate Tags on the Jimple IR, I believe, and then in a second step you would need to carry them over when Shimple is created from Jimple.
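
The two steps described above can be pictured with a toy model. This is only a sketch under stated assumptions: `TagCarryDemo`, `OffsetTag`, `Stmt`, `fromBytecode`, and `rewrite` are hypothetical stand-ins written for illustration, not Soot classes, and the offset 7 is an arbitrary example value.

```java
import java.util.ArrayList;
import java.util.List;

class TagCarryDemo {
    // Hypothetical stand-in for a bytecode offset tag (not Soot's class).
    static final class OffsetTag {
        final int offset;
        OffsetTag(int offset) { this.offset = offset; }
    }

    // Hypothetical stand-in for an IR statement that can host tags.
    static final class Stmt {
        final String text;
        final List<OffsetTag> tags = new ArrayList<>();
        Stmt(String text) { this.text = text; }
    }

    // Step 1: the frontend attaches the offset when it creates the statement.
    static Stmt fromBytecode(String text, int offset) {
        Stmt s = new Stmt(text);
        s.tags.add(new OffsetTag(offset));
        return s;
    }

    // Step 2: a pass that rebuilds a statement copies the tags across,
    // so the offset survives the conversion to the next IR.
    static Stmt rewrite(Stmt original, String newText) {
        Stmt s = new Stmt(newText);
        s.tags.addAll(original.tags);
        return s;
    }

    public static void main(String[] args) {
        Stmt jimpleLike = fromBytecode("i = i + 1", 7); // 7 is arbitrary
        Stmt ssaLike = rewrite(jimpleLike, "i_1 = i_0 + 1");
        System.out.println(ssaLike.text + " @ " + ssaLike.tags.get(0).offset);
    }
}
```

The point of the sketch is only that the second step is a copy: if the pass that builds the SSA statement forgets to copy the tags, the offset is silently lost.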

@jgarci40

jgarci40 commented Oct 1, 2018

I'd like to know around what point in time the frontend or default frontend switched from Coffi to ASM. I have some software that relied on extracting bytecode offset tags for Android apps, which no longer seem to be working on later versions of Soot. I think the change of frontend may be the culprit. I am considering the possibility of implementing the above enhancement suggested by Eric.

mbenz89 added the good first issue label Nov 12, 2018
@vaibhavbsharma

I've also run into a need for the bytecode offsets that Jimple statements correspond to, and I am close to making them available for Jimple statements via the BytecodeOffsetTag. Even though this issue is almost two years old, would it still be valuable to have this fix available in Soot?

@mbenz89
Contributor

mbenz89 commented Jun 28, 2019

Of course! We would be happy to accept a pull request for the feature!

@vaibhavbsharma

That's great news! I would like to contribute a fix for this feature. Unfortunately, I could not find an ASM API to extract the bytecode offset from ASM. Instead, I have made a small change to ASM itself that allows bytecode offsets to be assigned to ASM's instructions. Later, when Soot creates instructions in Jimple, it uses the bytecode offsets in ASM's instructions to construct BytecodeOffsetTag objects.

Since my changes in Soot depend on changes in ASM, should I first be trying to submit a pull request to ASM? Soot's pom.xml suggests that Soot uses version 5.2 of ASM. To make use of changes to ASM, would we also need to upgrade Soot's usage of ASM to 7.1 (ASM's latest version)?

@vaibhavbsharma

I've made these changes in ASM 5.2 as well as in Soot. Since the changes in Soot depend on the changes in ASM, I think they would benefit from a review from you folks. Please find my ASM changes here and my corresponding changes in Soot here. As a result of these changes, I can see Soot attaching BytecodeOffsetTag tags with correct offsets to Jimple IR statements when the keep_offset option is turned on. Let me know what you folks think of these changes.

@mbenz89
Contributor

mbenz89 commented Aug 8, 2019

We are not restricted to ASM version 5.2. Indeed, we are planning to upgrade to the latest version with the merge of our Java9 feature branch.
Your changes to Soot look fine. Regarding the changes to ASM, please contact the authors to get your work merged.

I'm happy to merge your changes to Soot as soon as we have an answer from the ASM people.

@vaibhavbsharma

Thanks for the quick response. It is great that Soot is moving to the latest ASM version. I need a little time to port my ASM patch to its master branch. I will reply back once that is done.

@dheeraj135

I have migrated @vaibhavbsharma's ASM changes to the latest version of ASM (HEAD of the master branch). Please find the changes here. I tested these changes on small Java programs, and the resulting offsets are identical to the values reported by Coffi. I am not very familiar with Soot and ASM, so it would be really great if someone here could verify these changes. Please note that you will have to apply @vaibhavbsharma's changes to Soot (here) to test my changes.

@BarrensZeppelin

Is the difference between bytecode index and regular index that the bytecode index takes the size of instructions into account?
I.e., in the bodies

ICONST_0
POP

and

ILOAD 0
POP

the POP instruction has the same index in the instruction list, but in the latter body its bytecode index is higher due to ILOAD needing an additional byte for the variable index?

Is it possible to retrieve the (regular) index of the bytecode instruction that generated a Jimple instruction?
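
As a rough illustration of the distinction being asked about: the bytecode offset of an instruction is the sum of the encoded sizes of the instructions before it, while its list index only counts them. The sketch below is a toy model, not ASM or Soot code; `OffsetDemo`, `Insn`, and `byteOffsets` are hypothetical names, and the sizes are supplied by hand. It assumes the second body uses the general two-byte `iload <index>` encoding; with the one-byte `iload_0` short form, POP would sit at offset 1 in both bodies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OffsetDemo {
    // Toy instruction: a mnemonic plus its encoded size in bytes.
    static final class Insn {
        final String mnemonic;
        final int sizeInBytes;
        Insn(String mnemonic, int sizeInBytes) {
            this.mnemonic = mnemonic;
            this.sizeInBytes = sizeInBytes;
        }
    }

    // Byte offset of each instruction: the running sum of the sizes
    // of all instructions that precede it in the body.
    static List<Integer> byteOffsets(List<Insn> body) {
        List<Integer> offsets = new ArrayList<>();
        int pc = 0;
        for (Insn insn : body) {
            offsets.add(pc);
            pc += insn.sizeInBytes;
        }
        return offsets;
    }

    public static void main(String[] args) {
        List<Insn> first = Arrays.asList(new Insn("iconst_0", 1), new Insn("pop", 1));
        // Assumption: general two-byte form (opcode + index), not iload_0.
        List<Insn> second = Arrays.asList(new Insn("iload 0", 2), new Insn("pop", 1));
        System.out.println(byteOffsets(first));  // prints [0, 1]
        System.out.println(byteOffsets(second)); // prints [0, 2]
    }
}
```

In both bodies POP has list index 1; only its byte offset differs (1 versus 2 under the stated assumption).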
