Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Split out as new variable" support for stack variables #2573

Open
PsychoPast opened this issue Dec 20, 2020 · 17 comments
Open

"Split out as new variable" support for stack variables #2573

PsychoPast opened this issue Dec 20, 2020 · 17 comments
Assignees
Labels
Feature: Decompiler Status: Triage Information is being gathered

Comments

@PsychoPast
Copy link

Describe the solution you'd like
A clear and concise description of what you want to happen.
Right now, this feature only works with registers. (it works well!) When it comes to stack vars, the option isn't even there. It's a very useful feature when it comes to use different names and types for the same var.

@larkwiot
Copy link

This feature is desperately needed

@Danil6969
Copy link

Danil6969 commented Sep 2, 2021

You can see how reassignment works for uniques (those which labeled as HASH) and registers if you look at disassembly. Like assign HASH:39f5e6c302b:4 = uVar100. Also Ghidra does this automatically e.g. when the variable goes out of scope. Internally it just sees that previous definition of this varnode is now expired so it uses a new one instead with new high level (decompiler side) type and name. But stack frame on the other side is single and global within function meaning it is shared among code labels, blocks etc so nothing in it can just "expire". You may notice this too if you commit change for any stack local and open stack frame from disassembly. Just notice there is currently only one copy of it supported for the whole function otherwise it wouldn't be trivial at least to maintain all of these copies. Does this take place in other cases other than "spill and reload" idiom used by compilers to temporarily store something on stack that doesn't fit because of register pressure? If it really makes sense then how to edit such dynamic stack frame(s)?

@Danil6969
Copy link

Danil6969 commented Sep 2, 2021

I've found the correct usage for such feature, it's described here #1510. For now it would be better to just copy by parts the decompilation to some separate text file before you change the stack frame and after you remap locals for other part of function. Maybe it's possible to make more tables for stack frame but currently we don't even know how to handle such copies in stack frame editor.

@Wall-AF
Copy link

Wall-AF commented Sep 12, 2021

Is it possible to add a Block ID and Depth Number when assigning stack locations as types and variables? For example, in an if/while/... block, user declarations are scoped within that block, effectively creating a stack hierarchy.

We also need to be able to create our own block as I've discovered stack reuse in monolithic functions, efectively in the same block!

@Danil6969
Copy link

Danil6969 commented Sep 13, 2021

Is it possible to add a Block ID and Depth Number when assigning stack locations as types and variables? For example, in an if/while/... block, user declarations are scoped within that block, effectively creating a stack hierarchy.

We also need to be able to create our own block as I've discovered stack reuse in monolithic functions, efectively in the same block!

We don't have such concept as "block id" as we reassign each register or unique by just one address instead. The reason is in the complexity of operating on these data and user input if necessary (one address is easier than bunch of ids and depths). Exactly the same thing would apply to the stack frames if multiple copies are allowed. Also not sure what stack hierarchy would look like.
And we don't even need to create anything (i.e. custom blocks) but address based reassigns.

@Wall-AF
Copy link

Wall-AF commented Sep 16, 2021

Is it possible to add a Block ID and Depth Number when assigning stack locations as types and variables? For example, in an if/while/... block, user declarations are scoped within that block, effectively creating a stack hierarchy.
We also need to be able to create our own block as I've discovered stack reuse in monolithic functions, efectively in the same block!

We don't have such concept as "block id" as we reassign each register or unique by just one address instead. The reason is in the complexity of operating on these data and user input if necessary (one address is easier than bunch of ids and depths). Exactly the same thing would apply to the stack frames if multiple copies are allowed. Also not sure what stack hierarchy would look like.
And we don't even need to create anything (i.e. custom blocks) but address based reassigns.

Okay, how about using two addresses (I'm assuming here that the address used is the one associated with either the register or stack address space). The second would be assocated with the code address space and it would be the first address in the current decompilation unit where it was used. You could then be able to split out as needed.

I've not delved far into the class ghidra.app.plugin.core.decompile.actions.IsolateVariableAction, but I see that current stack based varnodes seem to have the same mergeGroup for all instances at a particular code address which means the option Split Out As New Variable is unavailabe. Don't know if maybe a different test should be use for stack addreses or something more complex!

@Wall-AF
Copy link

Wall-AF commented Sep 5, 2022

Will this be placed on anyone's radar soon???

@ryanmkurtz ryanmkurtz added the Status: Triage Information is being gathered label Sep 7, 2022
@Swyter
Copy link

Swyter commented Jan 1, 2023

Sorry for the spam, but I was looking into it and I can't believe I have to do some custom union types just to clean up the decompilation, especially when working with custom layered structs this messes things up the more information you add.

@bommijn
Copy link

bommijn commented May 3, 2023

Any news on this ? @Swyter How do you use your workaround.

@Wall-AF
Copy link

Wall-AF commented May 3, 2023

I've successfully updated my assembly to change the offset for the instructions in the area I wish to isolate of the form BP + x (where x is the value of the current stack offset) to BP + y (where y is a new offset, larger than any current stack use) and also modifying the SUB SP,xx to add the number of additional bytes used to ensure stack analysis still works.

@Wall-AF
Copy link

Wall-AF commented Jul 5, 2023

I've successfully updated my assembly to change the offset for the instructions in the area I wish to isolate of the form BP + x (where x is the value of the current stack offset) to BP + y (where y is a new offset, larger than any current stack use) and also modifying the SUB SP,xx to add the number of additional bytes used to ensure stack analysis still works.

This only works when the change required fits into the size of x/y/xx (for a byte, you cannot change the value if you need a local variable ('-'ve offset) more than 127 bytes away). Changing the size of the offset isn't an option as this requires a longer instruction!

@Wall-AF
Copy link

Wall-AF commented Jul 8, 2023

Hi @caheckman, please don't take this the wrong way, but as this has been around for a while now, is there a window for this to be looked into soon?

@Wall-AF
Copy link

Wall-AF commented Jul 11, 2023

@caheckman I'm getting more and more functions I need to decompile that are making use of the same stack reference for different variable types (especially the creation and destruction of what amounts to temporary C++ objects) within C++ code blocks like within if/for/while statements. This is making the implementation of this feature imperative in order to continue the project. Please, can you give me a glimmer of hope??? I appreciate you seem to be alone in these request types, and I would be willing to assist if there were some decent documentation of how the decompiler works! Thanks for you hard work.

@Danil6969
Copy link

Danil6969 commented Jul 11, 2023

Have you tried using unions of structs which in turn represent locals for each frame separately? Like union frames_FUN_xxxxxxxx { struct frame_0; struct frame_1; ... } and struct frame_n { local_off0_t; local_off1_t; ... }

@Wall-AF
Copy link

Wall-AF commented Jul 12, 2023

@Danil6969 thanks for this great suggestion. It works well for different structures using the exact same location, however, and here's the rub, most of the datatypes are overlapping stack locations! To combat that, you end up with deeply nested union/struct combo's, a bit impractical!

@Wall-AF
Copy link

Wall-AF commented Sep 3, 2023

@caheckman as OO languages allow the creation/destruction of local (hence stack) objects within code blocks (ie within an if statement or a for loop or even a for loops index), this facility would help significantly. Is there a simple mechanism whereby code block structures could be identified and used to house stack copies that only get copied when a user indicates the necessity?

@Wall-AF
Copy link

Wall-AF commented Sep 13, 2023

If the current process is defined as:

  1. A Function has a Stack
  2. A Stack has a set/list of Stack Addresses
  3. A Stack Address has some Attributes (datatype, name, etc.)

Could'nt we add another layer of indirection such that it would be:

  1. A Function has a Stack
  2. A Stack has a set/list of Stack Addresses
  3. A Stack Address has a map of Attributes applying to Function Code Addresses

The Stack Analyser would be able to determine the first Function Code Address each particular Stack Address was used by prespecified datatypes and once all datatypes have been identified, flow analysis could fill the Function Code Addreses that use the same datatype. Also, where Classes are defined, their constructors and destructors could identify the first and last Function Code Addresses to which they apply. Finally a User Facility would allow the analyst to specify the first and last Function Code Addresses for other datatypes, defined within the function, that would trigger the flow analiser to fill in the missing Function Code Addresses.

Unfortunately I don't have enough detailed knowledge of Ghidra's internals to complete this myself!

This would also assist in solving #5769 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature: Decompiler Status: Triage Information is being gathered
Projects
None yet
Development

No branches or pull requests

8 participants