Replies: 5 comments
-
How are you getting the Pcode? In particular, are you getting it from the decompiler (via a [...]

That being said, depending on the architecture and compiler there can be some subtleties when passing structures to functions, particularly if they are passed in registers. There is some discussion in #4195 and #4167 that might be relevant.
-
I'm working with the [...]

My code is like the following:

# Imports needed when this runs as a Ghidra Python (Jython) script
from ghidra.app.decompiler import DecompileOptions, DecompInterface
from ghidra.util.task import ConsoleTaskMonitor
from ghidra.program.model.pcode import Varnode, VarnodeAST, PcodeOpAST, HighSymbol, HighLocal

def getCallerInfo(func, caller, options = DecompileOptions(), ifc = DecompInterface()):
    print("function: '%s'" % caller.name)
    target_addr = func.entryPoint
    ifc.setOptions(options)
    ifc.openProgram(currentProgram)
    monitor = ConsoleTaskMonitor()
    res = ifc.decompileFunction(caller, 60, monitor)
    high_func = res.getHighFunction()
    if high_func:
        opiter = high_func.getPcodeOps()
        while opiter.hasNext():
            op = opiter.next()
            mnemonic = str(op.getMnemonic())
            if mnemonic == "CALL":
                inputs = op.getInputs()
                addr = inputs[0].getAddress()
                args = inputs[1:]  # List of VarnodeAST types
                if addr == target_addr:
                    source_addr = op.getSeqnum().getTarget()
                    print("Call to {} at {} has {} arguments: {}".format(addr, source_addr, len(args), args))
                    # WHAT SHOULD I DO HERE?
                    # below is what I have done
                    for pos, arg in enumerate(args):
                        refined = get_vars_from_varnode(caller, arg)
                        print("found variable '%s' for arg%d" % (refined, pos))

where

def get_stack_var_from_varnode(func, varnode):
    if type(varnode) not in [Varnode, VarnodeAST]:
        raise Exception("Invalid value. Expected `Varnode` or `VarnodeAST`, got {}.".format(type(varnode)))
    bitness_masks = {
        '16': 0xffff,
        '32': 0xffffffff,
        '64': 0xffffffffffffffff,
    }
    try:
        addr_size = currentProgram.getMetadata()['Address Size']
        bitmask = bitness_masks[addr_size]
    except KeyError:
        raise Exception("Unsupported bitness: {}. Add a bit mask for this target.".format(addr_size))
    local_variables = func.getAllVariables()
    vndef = varnode.getDef()
    if vndef:
        vndef_inputs = vndef.getInputs()
        for defop_input in vndef_inputs:
            defop_input_offset = defop_input.getAddress().getOffset() & bitmask
            for lv in local_variables:
                unsigned_lv_offset = lv.getMinAddress().getUnsignedOffset() & bitmask
                if unsigned_lv_offset == defop_input_offset:
                    return lv
        # If we get here, varnode is likely a "acStack##" variable.
        # Note: get_high_function is not defined in this post.
        hf = get_high_function(func)
        lsm = hf.getLocalSymbolMap()
        for vndef_input in vndef_inputs:
            defop_input_offset = vndef_input.getAddress().getOffset() & bitmask
            for symbol in lsm.getSymbols():
                if symbol.isParameter():
                    continue
                if defop_input_offset == symbol.getStorage().getFirstVarnode().getOffset() & bitmask:
                    return symbol
    # unable to resolve stack variable for given varnode
    return None

def get_vars_from_varnode(func, node, variables=None):
    if type(node) not in [PcodeOpAST, VarnodeAST]:
        raise Exception("Invalid value passed. Got {}.".format(type(node)))
    # create `variables` list on first call. Do not make `variables` default to [].
    if variables is None:
        variables = []
    # We must use `getDef()` on VarnodeASTs
    if type(node) == VarnodeAST:
        # For `get_stack_var_from_varnode` see:
        # https://github.com/HackOvert/GhidraSnippets
        # Ctrl-F for "get_stack_var_from_varnode"
        var = get_stack_var_from_varnode(func, node)
        if var and type(var) != HighSymbol:
            variables.append(var)
        node = node.getDef()
        if node:
            variables = get_vars_from_varnode(func, node, variables)
    # We must call `getInputs()` on PcodeOpASTs
    elif type(node) == PcodeOpAST:
        nodes = list(node.getInputs())
        for node in nodes:
            if type(node.getHigh()) == HighLocal:
                variables.append(node.getHigh())
            else:
                variables = get_vars_from_varnode(func, node, variables)
    return variables

The log that I obtain is the following [...]
(this doesn't perfectly match my example because the function has two arguments: the first is a number, the second a struct)
-
Ok, with that explanation I think I have a better understanding of what you are asking. The short answer is that it's a bit involved, not always possible, and depends on the architecture, the compiler, any analysis done to the program (both automatic and manual), and how general you want your solution to be.

For the global variable case, if you call [...]

However, for the case of passing a structure as a local variable, there are a number of complications. You asked specifically about the stack, but structures can also be passed in registers, so I'll mention both cases.

The first complication is that you might need to look at more than one function. If you're lucky, the fields of the structure will have been initialized to constants in the body of the function in question. But they could also be set using complicated expressions involving global variables and parameters to the function (for example). In general you will run into the limits of static analysis.

The second is whether or not the function signature has been defined using the structure definition. For example, suppose you have a function foo which takes one parameter: a 16-byte structure consisting of 2 longs. Using gcc on x64 Linux, calls to foo will pass the structure in two registers, RDI and RSI. By default, Ghidra will probably think that foo has two arguments. If you correct the signature of foo manually, then Ghidra will see that there is one argument spread out over two registers. In this case, it will invent a 16-byte varnode in the unique space and use PIECE ops (in the [...]

The third is that some calling conventions actually have pretty complicated rules concerning how small structures are passed in registers, which Ghidra (at the moment) doesn't handle automatically. In these cases you'd have to manually correct the functions to use "Custom Storage". This is discussed more in the issues I linked above.
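To make the two-register case a bit more concrete, here is a minimal sketch in plain Python (not Ghidra API code) of what a PIECE op computes: it concatenates two values, with the first input as the most significant part. The `piece` helper and the register values are made up for illustration, and the sketch assumes a little-endian layout in which the first field of the struct ends up in the low-order bytes.

```python
import struct


def piece(most_significant, least_significant, low_size_bytes):
    """Model of p-code PIECE: concatenate two values, first input is the high part."""
    return (most_significant << (8 * low_size_bytes)) | least_significant


# Hypothetical register contents at a call to foo (illustrative values only).
rdi = 0x1111111111111111  # assumed to carry the first long of the struct
rsi = 0x2222222222222222  # assumed to carry the second long of the struct

# The first field lives in the low-order bytes, so RSI is the high half.
combined = piece(rsi, rdi, 8)

# Lay the 16-byte value out as little-endian memory and read the fields back.
raw = struct.pack("<QQ", combined & 0xFFFFFFFFFFFFFFFF, combined >> 64)
first, second = struct.unpack("<QQ", raw)

print("combined = 0x%032x" % combined)
print("first = 0x%x, second = 0x%x" % (first, second))
```

When reading high pcode, the practical consequence is that a single 16-byte argument varnode may have to be split back into its PIECE inputs before the individual fields can be traced.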
Finally, if the structure or a pointer to the structure is passed on the stack, you then have to deal with stack analysis. From analyzing the pcode for the call instruction and tracing back inputs/looking at [...]

Determining what could be written to those stack locations before the function call is a more complicated analysis. You'd have to deal with whether or not Ghidra knows the correct type of the local variable, which writes to that stack location could be the last write to that location before the call site in question, and the possibility of aliasing if pointers are involved.

For your specific case, you might try experimenting in the GUI to see if you can figure out a way to associate writes to the stack with parameters to functions in straightforward cases. If you are lucky, each local variable will have an [...]

Here are some useful snippets of code for experimenting in Python: [...]
If the local variables you see in the listing differ from what you see in the decompiler, you can use the "Commit Local Names" action in the decompiler to save the local variable names to the listing, then apply types in the listing. This can be a bit confusing, but basically the decompiler is free to use its best guesses for things like types or number of parameters until the user or certain analyzers have saved information to the Ghidra database. Once they're saved, the decompiler and the listing should agree.

If you haven't already, it's probably worth reading the "Program Annotations Affecting the Decompiler" section of the Ghidra help, since things you do can change the high pcode you get. For the same reason, you should also ensure that the params/return are correct and have been committed to the database for any function you care about (and ideally all functions) via the "Commit Params/Return" decompiler action or the "Decompiler Parameter ID" analyzer.

As an alternative, for the easy cases where the stack is written to immediately before the function call, you might consider scripting the emulator (see EmuX86DeobfuscateExampleScript.java for an example). For each call site, walk back to a call instruction or the beginning of the basic block, initialize the stack pointer to something reasonable, cross your fingers, emulate up to the function call, and examine what values the emulator wrote to memory.

I seem to have written a novel. Hopefully this is at least somewhat helpful.
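The "walk back, emulate, inspect the writes" idea can be illustrated with a toy model in plain Python. This is only a sketch of the control flow: it does not use Ghidra's emulator, and the instruction tuples, mnemonics, and helper names below are hypothetical stand-ins for real instructions.

```python
# Toy instruction stream: (address, mnemonic, operands).
# A made-up representation for illustration, not Ghidra's Instruction class.
INSTRUCTIONS = [
    (0x1000, "call",  ("setup",)),
    (0x1004, "mov",   ("r0", 7)),      # r0 = 7
    (0x1008, "store", ("r0", 0x10)),   # [sp + 0x10] = r0
    (0x100c, "mov",   ("r1", 3)),      # r1 = 3
    (0x1010, "store", ("r1", 0x14)),   # [sp + 0x14] = r1
    (0x1014, "mov",   ("r1", 9)),      # r1 = 9
    (0x1018, "store", ("r1", 0x14)),   # overwrites [sp + 0x14]
    (0x101c, "call",  ("do_something",)),
]


def window_before_call(instructions, call_addr):
    """Instructions between the previous call (or the block start) and call_addr."""
    idx = next(i for i, ins in enumerate(instructions) if ins[0] == call_addr)
    start = idx
    while start > 0 and instructions[start - 1][1] != "call":
        start -= 1
    return instructions[start:idx]


def emulate_stack_writes(window):
    """Run the window on a tiny register machine; return {stack_offset: last value}."""
    regs, stack = {}, {}
    for _addr, mnemonic, ops in window:
        if mnemonic == "mov":
            reg, value = ops
            regs[reg] = value
        elif mnemonic == "store":
            reg, offset = ops
            # None means the register was set outside the emulated window.
            stack[offset] = regs.get(reg)
    return stack


window = window_before_call(INSTRUCTIONS, 0x101c)
writes = emulate_stack_writes(window)
print(sorted(writes.items()))  # [(16, 7), (20, 9)]
```

Only the last write to each offset survives, which is the property you want for the easy cases. Anything set before the window, or through a pointer alias, shows up as missing or wrong, so the result needs a sanity check.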
-
Thank you very much for this extended response, I really appreciate it. I'll try to experiment with your suggestions; for now I'm using the information from the listing (the [...]

import re
from collections import deque
from ghidra.program.model.listing import CodeUnitFormat, CodeUnitFormatOptions

# set the formatting output for the listing
# so that we can extract information from it
codeUnitFormat = CodeUnitFormat(
    CodeUnitFormatOptions(
        CodeUnitFormatOptions.ShowBlockName.ALWAYS,
        CodeUnitFormatOptions.ShowNamespace.ALWAYS,
        "",
        True,
        True,
        True,
        True,
        True,
        True,
        True)
)

def getSymbolFromAnnotation(annotation):
    """The label referenced from an instruction is something like
    prefix_address+offset
    """
    match = re.match(r"(.+?) (r.+)=>(.+?):(.+?),\[sp,", annotation)
    if not match:
        print("annotation failed: %s" % annotation)
        return None
    label = match.group(4)
    # labels that carry a "+offset" suffix are not resolved
    if "+" in label:
        return None
    return getSymbol(label, None)

# this is the function as above, you can jump to the internal loop
def getCallerInfo(func, caller, options = DecompileOptions(), ifc = DecompInterface()):
    ...
    if high_func:
        opiter = high_func.getPcodeOps()
        while opiter.hasNext():
            op = opiter.next()
            mnemonic = str(op.getMnemonic())
            if mnemonic == "CALL":
                inputs = op.getInputs()
                # we are going to save the arguments of the requested call,
                # but we are not interested in the address, which is inputs[0]
                # of the PcodeOp
                calling_args = [0] * (len(inputs) - 1)
                addr = inputs[0].getAddress()
                args = inputs[1:]  # List of VarnodeAST types
                if addr == target_addr:
                    source_addr = op.getSeqnum().getTarget()
                    print("Call to {} at {} has {} arguments: {}".format(addr, source_addr, len(args), args))
                    for pos, arg in enumerate(args):
                        # var = arg.getHigh()
                        # print "var", var, var.getSymbol(), var.getDataType()
                        # print "lsm", lsm.findLocal(arg.getAddress(), None)
                        if pos != 0:
                            print("initial arg%d: %s" % (pos, arg))
                            refined = get_vars_from_varnode(caller, arg)
                            if len(refined) > 0:
                                refined = refined[0]
                                print("found variable '%s' for arg%d" % (refined, pos))
                                # here we are going to create an ordered list with all the references to the given variable
                                # that happen before the call and return only the last one, which hopefully is the one
                                # setting the value.
                                # Unfortunately this is a struct, so the variable points to the start address of the struct;
                                # if you want a specific field you have to add its offset
                                offset_field = refined.getStackOffset() + refined.getDataType().getComponent(4).getOffset()
                                # print "offset_field", offset_field
                                # ref_mgr is not defined in this post; presumably currentProgram.getReferenceManager()
                                refs = sorted([(_.getFromAddress().getOffset(), _)
                                               for _ in ref_mgr.getReferencesTo(refined)
                                               if _.getFromAddress() < source_addr
                                               and _.getStackOffset() == offset_field],
                                              key = lambda _ : _[0])[-1]
                                instr = getInstructionAt(refs[1].getFromAddress())
                                annotation = codeUnitFormat.getRepresentationString(instr)
                                from_annotation = getSymbolFromAnnotation(annotation)
                                print("symbol from annotations %s" % from_annotation)
                                output = getDataAt(from_annotation.getAddress()) if from_annotation else None
                                calling_args[pos] = output

It seems to work: via a table I then show the list of locations with function and arguments; in my case I'm interested in a field of the struct. (By the way, an extra question: is it possible via Ghidra scripting to create a table that docks to the GUI?)

Before that I also tried to extract the variable from the decompiled [...]

def getCLine(c_markup, address):
    """Try to find the line in the C code for the given address"""
    # c_markup is a ClangTokenGroup
    queue = deque()
    queue.append(c_markup)
    while queue:
        tmp = queue.pop()
        if tmp.getMinAddress() == address and tmp.getMaxAddress() == address:
            return tmp
        filtered = [(n, _) for n, _ in enumerate(list(tmp))
                    if _.getMinAddress() is not None and _.getMinAddress() <= address and _.getMaxAddress() >= address]
        for index, node in filtered:
            queue.append(tmp.Child(index))
    # no node matches the address exactly
    return None

but unfortunately the address I obtain doesn't give me the address corresponding to the right line in the [...]
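The "take the last reference before the call" step in the script above fails with an IndexError when no reference matches, because it indexes `[-1]` on an empty list. A minimal sketch of the same selection with that case handled is below. It is plain Python: `StackRef` and `last_ref_before` are hypothetical stand-ins, not Ghidra's Reference type, and the addresses are invented.

```python
from collections import namedtuple

# Minimal stand-in for a stack reference: the address it comes from and the
# stack offset it touches. Hypothetical type, not Ghidra's Reference.
StackRef = namedtuple("StackRef", ["from_addr", "stack_offset"])


def last_ref_before(refs, call_addr, field_offset):
    """Return the highest-addressed reference to field_offset below call_addr, or None."""
    candidates = [r for r in refs
                  if r.from_addr < call_addr and r.stack_offset == field_offset]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.from_addr)


refs = [
    StackRef(0x2000, -0x20),
    StackRef(0x2008, -0x1c),
    StackRef(0x2010, -0x1c),
    StackRef(0x2030, -0x1c),  # located after the call, so it is ignored
]
call_site = 0x2020

print(last_ref_before(refs, call_site, -0x1c))
print(last_ref_before(refs, call_site, -0x8))
```

Keep in mind that "highest address before the call" only approximates "last write executed before the call". With branches or loops between the write and the call site, address order and execution order can differ.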
-
@gipi maybe you are running into the problem described in this blog.
-
Suppose you have the following code [...]

and I want to analyze it and obtain a list (for example) containing the calling address of the do_something() function and the corresponding string contained in the uri component of the struct passed as argument.

For a global struct with a fixed address this is pretty simple: you use getDataAt() and you have the Data, but for a local argument it seems pretty difficult.

What's the easiest way to accomplish this? In my experiments I'm able to find the local variable linked to the argument of the function, and I tried to analyze the Pcode, but it seems pretty difficult to extract the actual value if the operation involves moving data from a register to the stack (for example in my code, you probably have uri as a mov rX, 3 and then store rX, [sp, #uri]).