# Ghidra Snippets Kotlin

*Ghidra Snippets Kotlin* is a collection of Kotlin examples showing how to work with [Ghidra](https://ghidra-sre.org/) APIs. Three primary APIs are covered here: the [Flat Program API][1], the [Flat Decompiler API][2], and everything else ("Complex API"). The Flat APIs are 'simple' versions of the full-fledged Complex Ghidra API. They are a great starting point for anyone looking to develop Ghidra modules and/or scripts.

At some point, however, you need to reach outside of the Flat APIs to do really interesting things. Everything here is just a mashup to get the job done. There's more than one way to accomplish each task, so make sure you ask yourself if these snippets are really what you need before copy/pasta.

In [1]:
fun type(variable: Any) = variable::class.simpleName

## Latest Release & API Docs
* Looking for the latest release of Ghidra? [Download it here][0].
* Looking for the latest API Docs? In Ghidra's UI, select **Help** -> **Ghidra API Help** to unpack the docs and view them.
* Just want a quick online API reference? [Ghidra.re](https://ghidra.re/) hosts an online version of the [API docs](https://ghidra.re/ghidra_docs/api/index.html) (may not be up to date).

## Contributing

Feel free to submit pull requests to master on this repo with any modifications you see fit. Most of these snippets are meant to be verbose, but if you know of a better way to do something, please share it via a pull request or by submitting an issue. Thanks!

## Working with Projects
A Ghidra *Project* (class [GhidraProject][4]) contains a logical set of program binaries related to a reverse engineering effort. Projects can be shared (collaborative) or non-shared (private). The snippets in this section deal with bulk import and analysis, locating project files on disk, and more.

### Get the name and location on disk of the current project
If you're looking to automate analysis using headless scripts, you'll likely have to deal with project management. This includes adding binaries to existing projects, cleaning up old projects, or perhaps syncing analysis to shared projects.

In [2]:
val state = getState()
val project = state.getProject()
val program = state.getCurrentProgram()
val locator = project.getProjectData().getProjectLocator()
println("type(state):           %s".format(type(state)))
println("type(project):         %s".format(type(project)))
println("type(program):         %s".format(type(program)))
println("type(locator):         %s".format(type(locator)))
println("Project Name:          %s".format(locator.getName()))
println("Files in this project: %s".format(project.getProjectData().getFileCount()))
println("Is a remote project:   %s".format(locator.isTransient()))
println("Project location:      %s".format(locator.getLocation()))
println("Project directory:     %s".format(locator.getProjectDir()))
println("Lock file:             %s".format(locator.getProjectLockFile()))
println("Marker file:           %s".format(locator.getMarkerFile()))
println("Project URL:           %s".format(locator.getURL()))

type(state):           GhidraState
type(project):         DefaultProject
type(program):         ProgramDB
type(locator):         ProjectLocator
Project Name:          KotlinScript
Files in this project: 10
Is a remote project:   false
Project location:      /home/sciver/Work/ghidraKotlinScript
Project directory:     /home/sciver/Work/ghidraKotlinScript/KotlinScript.rep
Lock file:             /home/sciver/Work/ghidraKotlinScript/KotlinScript.lock
Marker file:           /home/sciver/Work/ghidraKotlinScript/KotlinScript.gpr
Project URL:           ghidra:/home/sciver/Work/ghidraKotlinScript/KotlinScript


### List all programs in the current project
A Ghidra project is a logical collection of binaries that relate to a specific RE effort. This might be a single executable with multiple shared objects, or multiple executables with numerous third-party libraries, kernel modules, and drivers.

In [3]:
val state = getState()
val project = state.getProject()
val locator = project.getProjectData().getProjectLocator()
val projectMgr = project.getProjectManager()
val activeProject = projectMgr.getActiveProject()
val projectData = activeProject.getProjectData()
val rootFolder = projectData.getRootFolder()

println("type(state):           %s".format(type(state)))
println("type(project):         %s".format(type(project)))
println("type(projectMgr):      %s".format(type(projectMgr)))
println("type(activeProject):   %s".format(type(activeProject)))
println("type(projectData):     %s".format(type(projectData)))
println("type(rootFolder):      %s".format(type(rootFolder)))

val projectName = locator.getName()
val fileCount = projectData.getFileCount()
// getFiles() returns only the files directly in this folder; files in subfolders are not included
val files = rootFolder.getFiles()

println("The project '%s' has %s files in it:".format(projectName, fileCount))
for(file in files) {
	println("\t%s".format(file))
}

type(state):           GhidraState
type(project):         DefaultProject
type(projectMgr):      GhidraProjectManager
type(activeProject):   DefaultProject
type(projectData):     ProjectFileManager
type(rootFolder):      RootGhidraFolder
The project 'KotlinScript' has 10 files in it:


## Working with Programs
A *Program* is a binary component within a Ghidra Project. The snippets in this section deal with gathering information about the Programs within a Project.

### List the current program name and location on disk

In [4]:
val state = getState()
val currentProgram = state.getCurrentProgram()
val name = currentProgram.getName()
val location = currentProgram.getExecutablePath()
println("The currently loaded program is: '%s'".format(name))
println("Its location on disk is: '%s'".format(location))

The currently loaded program is: 'control.elf'
Its location on disk is: '/home/sciver/Work/ghidraKotlinScript/demo/bin/debug/control.elf'


### List the name and size of program sections

In [5]:
val blocks = currentProgram.getMemory().getBlocks()
for(block in blocks) {
    println("Name: %s, Size: %s".format(block.getName(), block.getSize()))
}

Name: segment_2.1, Size: 792
Name: .interp, Size: 28
Name: .note.gnu.property, Size: 32
Name: .note.gnu.build-id, Size: 36
Name: .note.ABI-tag, Size: 32
Name: .gnu.hash, Size: 36
Name: .dynsym, Size: 192
Name: .dynstr, Size: 137
Name: .gnu.version, Size: 16
Name: .gnu.version_r, Size: 32
Name: .rela.dyn, Size: 192
Name: .rela.plt, Size: 48
Name: .init, Size: 27
Name: .plt, Size: 48
Name: .plt.got, Size: 16
Name: .plt.sec, Size: 32
Name: .text, Size: 517
Name: .fini, Size: 13
Name: .rodata, Size: 25
Name: .eh_frame_hdr, Size: 68
Name: .eh_frame, Size: 264
Name: .init_array, Size: 8
Name: .fini_array, Size: 8
Name: .dynamic, Size: 496
Name: .got, Size: 80
Name: .data, Size: 16
Name: .bss, Size: 8
Name: EXTERNAL, Size: 56
Name: .comment, Size: 43
Name: .debug_abbrev, Size: 221
Name: .debug_aranges, Size: 48
Name: .debug_info, Size: 783
Name: .debug_line, Size: 332
Name: .debug_str, Size: 692
Name: .shstrtab, Size: 346
Name: .strtab, Size: 537
Name: .symtab, Size: 1704
Name: _elfSectionHea

## Working with Functions
A *Function* is a subroutine within a Program. The snippets in this section deal with gathering information about Functions and modifying them within a Program.

### Enumerate all functions printing their name and address
There are at least two ways to do this. The output is the same for each method.

In [6]:
// Method 1:
var func = getFirstFunction()
while(func != null) {
    println("Function: ${func.name} @ 0x${func.entryPoint}")
    func = getFunctionAfter(func)
}
// Method 2:
val fm = currentProgram.getFunctionManager()
val funcs = fm.getFunctions(true)
funcs?.forEach {
    println("Function: ${it?.name} @ 0x${it?.entryPoint}")
}

Function: _init @ 0x00101000
Function: FUN_00101020 @ 0x00101020
Function: __cxa_finalize @ 0x00101050
Function: puts @ 0x00101060
Function: printf @ 0x00101070
Function: _start @ 0x00101080
Function: deregister_tm_clones @ 0x001010b0
Function: register_tm_clones @ 0x001010e0
Function: __do_global_dtors_aux @ 0x00101120
Function: frame_dummy @ 0x00101160
Function: main @ 0x00101169
Function: __libc_csu_init @ 0x00101210
Function: __libc_csu_fini @ 0x00101280
Function: _fini @ 0x00101288
Function: _ITM_deregisterTMCloneTable @ 0x00105000
Function: puts @ 0x00105008
Function: printf @ 0x00105010
Function: __libc_start_main @ 0x00105018
Function: __gmon_start__ @ 0x00105020
Function: _ITM_registerTMCloneTable @ 0x00105028
Function: __cxa_finalize @ 0x00105030
Function: _init @ 0x00101000
Function: FUN_00101020 @ 0x00101020
Function: __cxa_finalize @ 0x00101050
Function: puts @ 0x00101060
Function: printf @ 0x00101070
Function: _start @ 0x00101080
Function: deregister_tm_clones @ 0x001010b

### Get a function name by address
This snippet will help you correlate addresses with associated function names.

In [7]:
// helper function to get a Ghidra Address type
fun getAddress(offset:Long) = currentProgram.getAddressFactory().getDefaultAddressSpace().getAddress(offset)

//get a FunctionManager reference for the current program
val functionManager = currentProgram.getFunctionManager()

//  getFunctionAt() only works with function entryPoint addresses!
//  returns `null` if address is not the address of the first
//  instruction in a defined function. Consider using
//  getFunctionContaining() method instead.
var addr = getAddress(0x101140)
val funcName = functionManager.getFunctionAt(addr)?.getName()
println(funcName)

// check if a specific address resides in a function
addr = getAddress(0x101149)
println(functionManager.isInFunction(addr))

// get the function an address belongs to, returns `null` if the address
//  is not part of a defined function.
addr = getAddress(0x101149)
println(functionManager.getFunctionContaining(addr))

null
true
__do_global_dtors_aux


### Get a function address by name
Have a name for a function but want its entry point address? This will help you do that. Just remember that two or more functions can share the same name (due to function overloading), so Ghidra returns a list of functions that you should iterate over.

In [8]:
// Note that multiple functions can share the same name, so Ghidra's API
// returns a list of `Function` types. Just keep this in mind.
val name = "main"
val funcs = getGlobalFunctions(name)
println("Found ${funcs.size} function(s) with the name '$name'")
funcs.forEach {
    println("${it.name} is located at 0x${it.entryPoint}")
}

Found 1 function(s) with the name 'main'
main is located at 0x00101169


### Get cross references to a function
Ghidra makes it easy to find all cross references to a function using `getReferencesTo`. To use this, you'll just need the function's entry address, which can be acquired using the `getEntryPoint` method on a function object (the `entryPoint` property in Kotlin). Let's take a look at an example where we find all cross references to functions named "main".

In [9]:
val fm = currentProgram.getFunctionManager()
val funcs = fm.getFunctions(true)
funcs.filter {
    it.name == "main"
}.forEach {
    println("\nFound '${it.name}' @ 0x${it.entryPoint}")
    val entryPoint = it.entryPoint
    val references = getReferencesTo(entryPoint)
    references.forEach { 
        println(it)
    }
}


Found 'main' @ 0x00101169
From: Entry Point To: 00101169 Type: EXTERNAL Op: -1 DEFAULT
From: 00102048 To: 00101169 Type: INDIRECTION Op: 0 ANALYSIS
From: 001020f0 To: 00101169 Type: DATA Op: 0 ANALYSIS
From: 001010a1 To: 00101169 Type: DATA Op: 1 ANALYSIS


### Analyzing function call arguments
Set `TARGET_ADDR` to the address of a call instruction. The snippet prints the inputs of the p-code operations at that address, which for a call are the call target followed by the call arguments. Thanks to [gipi](https://github.com/gipi) for [suggesting](https://github.com/HackOvert/GhidraSnippets/issues/4) this much cleaner way to obtain function call arguments than previously listed!

In [10]:
import ghidra.app.decompiler.DecompileOptions
import ghidra.app.decompiler.DecompInterface
import ghidra.util.task.ConsoleTaskMonitor

// Disassembly shows: 00434f6c    CALL    FUN_00433ff0
// Decompiler shows:  uVar1 = FUN_00433ff0(param_1,param_2,param_3);
val TARGET_ADDR = toAddr(0x10114d)

val options = DecompileOptions()
val monitor = ConsoleTaskMonitor()
val ifc = DecompInterface()
ifc.setOptions(options)
ifc.openProgram(currentProgram)

val func = getFunctionContaining(TARGET_ADDR)
val res = ifc.decompileFunction(func, 60, monitor)
val high_func = res.getHighFunction()
val pcodeops = high_func.getPcodeOps(TARGET_ADDR)
pcodeops.forEach { 
    println(it.getInputs().joinToString())
}

### Analyzing function call arguments at cross references
In this snippet we locate cross references to a target function (`TARGET_FUNC`) and show how we can analyze the arguments passed to each call. This can be helpful in analyzing malware or potentially vulnerable functions. For malware analysis, this may help "decrypt" strings; in vulnerability research, it may help locate functions that are vulnerable if called with an incorrect value. The specific analysis performed on the arguments of a called target function is up to you. This snippet will allow you to add your own analysis as you see fit.

In [11]:
import ghidra.app.decompiler.DecompileOptions
import ghidra.app.decompiler.DecompInterface
import ghidra.util.task.ConsoleTaskMonitor

// helper function to get a Ghidra Address type
fun getAddress(offset:Long) = currentProgram.getAddressFactory().getDefaultAddressSpace().getAddress(offset)

val TARGET_FUNC = "a"
var targetAddr = getAddress(0)
// Step 1. Get functions that call the target function ('callers')
val funcs = getGlobalFunctions(TARGET_FUNC)
val callers = funcs.filter {
    it.name == TARGET_FUNC
}.flatMap {
    println("\nFound ${it.name} @ 0x${it.entryPoint}")
    targetAddr = it.entryPoint
    val references = getReferencesTo(targetAddr)
    references.map {
        getFunctionContaining(it.fromAddress)
    }
}.filterNotNull()

// Step 2. Decompile all callers and find PCODE CALL operations leading to `targetAddr`
val options = DecompileOptions()
val monitor = ConsoleTaskMonitor()
val ifc = DecompInterface()
ifc.setOptions(options)
ifc.openProgram(currentProgram)

callers.forEach {
    val res = ifc.decompileFunction(it, 60, monitor)
    val high_func = res.highFunction
    high_func?.let {
        it.pcodeOps.forEach { 
            val mnemonic = it.mnemonic
            if (mnemonic == "CALL") {
                val inputs = it.inputs
                val addr = inputs.first().address
                val args = inputs.slice(1 until inputs.size)
                println("$args")
                if (addr == targetAddr) {
                    println("Call to ${addr} at ${it.seqnum.target} has ${args.size} arguments: ${args}")
                }
            }
        }
    }
}

### Rename functions based on strings
In some cases strings will either hint at or give the name of a function when symbols aren't available. In these cases, we can rename functions based on these strings to help perform reverse engineering. This becomes daunting when there are hundreds or thousands of these cases, making the process of copy, rename, paste a soul-crushing task. The power of tools like Ghidra is that you can script this. Take for example this decompiled function found in a binary:
```
undefined4 FUN_00056b58(void)
{
	undefined4 in_r3;

	register_function(1, "core_Init_Database", FUN_00058294);
	register_function(1, "core_Clear_Database", FUN_00058374);
	register_function(1, "core_Auth_Database", FUN_00058584);
	register_function(1, "core_Add_User_Database", FUN_00058650);
	
	// ... hundreds more in this function and in others ...
	
	return in_r3;
}
```

In this function, we have calls to a `register_function` which correlates an event string with a handler function.  We want to rename these functions so that `FUN_00058294` becomes `core_Init_Database` and so on.  Below is code that performs this task for every `register_function` in the target binary.

In [None]:
import ghidra.app.decompiler.DecompileOptions
import ghidra.app.decompiler.DecompInterface
import ghidra.util.task.ConsoleTaskMonitor
import ghidra.app.util.bin.BinaryReader
import ghidra.app.util.bin.RandomAccessMutableByteProvider
import java.io.File
import ghidra.program.database.ProgramDB

// helper function to get a Ghidra Address type
fun getAddress(offset:Long) = currentProgram.getAddressFactory().getDefaultAddressSpace().getAddress(offset)

fun runTransaction(msg:String, transaction:() -> Unit) {
    val id = currentProgram.startTransaction(msg)
    var commit = false
    try {
        transaction()
        commit = true
    } finally {
        currentProgram.endTransaction(id, commit) // always end; roll back if the body threw
    }
}

val f = File(currentProgram.executablePath)
val mem = currentProgram.getMemory()
val provider = RandomAccessMutableByteProvider(f)
val reader = BinaryReader(provider, true)

fun readString(offset:Long) = reader.readAsciiString(offset - currentProgram.imageBase.offset)

val options = DecompileOptions()
val monitor = ConsoleTaskMonitor()
val ifc = DecompInterface()
ifc.setOptions(options)
ifc.openProgram(currentProgram)

val functionManager = currentProgram.getFunctionManager()
val funcs = functionManager.getFunctions(true)
val registerFunction = getGlobalFunctions("register_function").first()

val entryPoint = registerFunction.entryPoint
val references = getReferencesTo(entryPoint)
// val id = currentProgram.startTransaction("modify")
runTransaction("modify") {
    references.map {
        functionManager.getFunctionContaining(it.fromAddress)
    }.filterNotNull().toSet().forEach {
        val res = ifc.decompileFunction(it, 60, monitor)
        val high_func = res.highFunction
        high_func?.let {
            it.pcodeOps.forEach { 
                it?.run { 
                    if (mnemonic == "CALL") {
                        val callTarget = it.getInput(0)
                        if (callTarget.address == entryPoint) {
                            val coreName = it.getInput(1)
                            val coreFunc = it.getInput(2)
                            val coreNameAddr = when(coreName.def.mnemonic) {
                                "COPY" -> coreName.def.getInput(0).offset
                                "PTRSUB" -> coreName.def.getInput(1).offset
                                else -> 0L
                            }
                            val coreString = readString(coreNameAddr)
    //                         createAsciiString(toAddr(coreNameAddr))
                            println(coreString)
                            val coreFuncAddr = toAddr(coreFunc.def.getInput(1).offset)
                            println(coreFuncAddr)
                            val coreFuncObj = functionManager.getFunctionAt(coreFuncAddr)
                            val trans:()->Unit = {
                                coreFuncObj.setName(coreString, ghidra.program.model.symbol.SourceType.DEFAULT)
                            }
                            trans()
    //                         println(type(transaction.))
    //                         runTransaction("modify Name", trans) 
                        }
                    }
                }
            }
        }
    }
}
// currentProgram.endTransaction(id, true)

## Working with Instructions

### Print all instructions in a select function
Just like `objdump` or a `disas` command in GDB, Ghidra provides a way to dump instructions if you need to. You might do this to generate input for another application, or for documenting issues found during analysis. Whatever your use case might be, you can easily acquire the address, opcodes, and instruction text for a target function, or specific addresses.

In [13]:
val listing = currentProgram.getListing()
val mainFunc = getGlobalFunctions("main")[0] // assume there's only 1 function named 'main'
val addrSet = mainFunc.body
val codeUnits = listing.getCodeUnits(addrSet, true) // true means 'forward'

codeUnits.forEach {
    println("0x%s : %-16s %s".format(it.address, it.bytes.map {"%02x".format(it)}.joinToString(""), it.toString()))
}
// for codeUnit in codeUnits:
// 	print("0x{} : {:16} {}".format(codeUnit.getAddress(), hexlify(codeUnit.getBytes()), codeUnit.toString()))

0x00101169 : f30f1efa         ENDBR64
0x0010116d : 55               PUSH RBP
0x0010116e : 4889e5           MOV RBP,RSP
0x00101171 : 4883ec10         SUB RSP,0x10
0x00101175 : c745fc00000000   MOV dword ptr [RBP + -0x4],0x0
0x0010117c : eb23             JMP 0x001011a1
0x0010117e : 837dfc05         CMP dword ptr [RBP + -0x4],0x5
0x00101182 : 7418             JZ 0x0010119c
0x00101184 : 8b45fc           MOV EAX,dword ptr [RBP + -0x4]
0x00101187 : 89c6             MOV ESI,EAX
0x00101189 : 488d3d740e0000   LEA RDI,[0x102004]
0x00101190 : b800000000       MOV EAX,0x0
0x00101195 : e8d6feffff       CALL 0x00101070
0x0010119a : eb01             JMP 0x0010119d
0x0010119c : 90               NOP
0x0010119d : 8345fc01         ADD dword ptr [RBP + -0x4],0x1
0x001011a1 : 837dfc09         CMP dword ptr [RBP + -0x4],0x9
0x001011a5 : 7ed7             JLE 0x0010117e
0x001011a7 : eb22             JMP 0x001011cb
0x001011a9 : 837dfc05         CMP dword ptr [RBP + -0x4],0x5
0x001011ad : 7502             JNZ 0
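
The hex column in the listing above comes from formatting each signed `Byte` with `%02x`. As a minimal sketch outside of Ghidra, the same formatting can be written with the transform lambda of `joinToString`, which avoids building an intermediate list. The helper name `toHex` and the sample bytes are assumptions made for illustration only.

```kotlin
// Hypothetical helper: render a byte array as lowercase hex, two digits per byte.
// "%02x" applied to a negative Byte prints its unsigned two's-complement value (e.g. -13 -> f3).
fun toHex(bytes: ByteArray): String =
    bytes.joinToString("") { "%02x".format(it) }

fun main() {
    // Sample bytes copied from the first row of the listing above (ENDBR64).
    val endbr64 = byteArrayOf(0xf3.toByte(), 0x0f, 0x1e, 0xfa.toByte())
    println(toHex(endbr64)) // prints f30f1efa
}
```

In a Ghidra script you would pass `codeUnit.bytes` to the helper instead of a hand-written array.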

### Find all calls and jumps to a register
This snippet shows how to find all instructions in an x86 program that are calls or jumps to a register. This information can be useful when attempting to track down a crash by researching code flow in hard-to-debug targets. `getRegister` returns `null` when the operand at the specified index is not a register. We use `startsWith("J")` to account for all jump variants. This is not architecture agnostic and a little goofy, but it gets the job done.

In [14]:
val listing = currentProgram.getListing()
val func = getFirstFunction()
val entryPoint = func.getEntryPoint()
val instructions = listing.getInstructions(entryPoint, true)

instructions.filter {
    (it.mnemonicString.startsWith("CALL") || it.mnemonicString.startsWith("J"))
}.filter {
    it.getRegister(0) != null
}.forEach {
    println("0x${it.address} : ${it}")
}

0x00101014 : CALL RAX
0x001010cf : JMP RAX
0x00101110 : JMP RAX


### Count all mnemonics in a binary
While recently preparing to teach some introductory x86, I wanted to know the most used mnemonics appearing in a given application to make sure I covered them. This is insanely easy to do in Binary Ninja, but a bit more involved in Ghidra. Essentially, we collect the mnemonic of every instruction in the binary, then group identical mnemonics and count each group.

This requires getting an `InstructionDB` and using the `getMnemonicString` method to determine the mnemonic of the native assembly instruction. At the end of this snippet, we copy/pasta code from StackOverflow to sort our collected data without really thinking about how it works and we call it a day. All joking aside, this is a pretty neat way to prioritize which instructions you should focus on learning if you're learning a new architecture and don't know where to begin.


In [15]:
val func = getFirstFunction()
var inst = getInstructionAt((func.getEntryPoint()))
// val instructions = mutableMapOf<String, Long>().withDefault { 0 }
// while(inst != null) {
//     val mnmonic = inst.mnemonicString
//     instructions.put(mnmonic, instructions.getValue(mnmonic) + 1)
//     inst = inst.getNext()
// }
val instList = mutableListOf<String>()
while(inst != null) {
    val mnemonic = inst.mnemonicString
    instList += mnemonic
    inst = inst.next
}
instList.groupBy { it }.mapValues { it.value.size }.toList().sortedByDescending { it.second }

[(MOV, 24), (JMP, 16), (LEA, 14), (ENDBR64, 13), (PUSH, 13), (CALL, 11), (JZ, 10), (CMP, 10), (RET, 9), (POP, 8), (SUB, 7), (ADD, 6), (NOP, 5), (TEST, 3), (SAR, 3), (JNZ, 3), (XOR, 2), (AND, 1), (HLT, 1), (SHR, 1), (JLE, 1), (JG, 1), (LEAVE, 1)]
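
The counting step above does not depend on Ghidra, so it can be tried on its own. The sketch below uses `groupingBy { it }.eachCount()`, a standard-library alternative to `groupBy` plus `mapValues` that counts without building a list per mnemonic. The function name `countMnemonics` and the sample list are assumptions for illustration; in a script the list would be filled from `inst.mnemonicString` as shown above.

```kotlin
// Hypothetical sample data standing in for the mnemonics collected from a program.
val sampleMnemonics = listOf("MOV", "PUSH", "MOV", "CALL", "MOV", "PUSH")

// Count how often each mnemonic occurs and sort by count, most frequent first.
fun countMnemonics(mnemonics: List<String>): List<Pair<String, Int>> =
    mnemonics.groupingBy { it }   // group equal strings together
        .eachCount()              // Map<String, Int> of occurrences per mnemonic
        .toList()
        .sortedByDescending { it.second }

fun main() {
    println(countMnemonics(sampleMnemonics)) // prints [(MOV, 3), (PUSH, 2), (CALL, 1)]
}
```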

## Tokenize a function

Tokenize decompiled functions with the `ClangNode` interface.

In [16]:
import ghidra.app.decompiler.DecompileOptions
import ghidra.app.decompiler.DecompInterface
import ghidra.util.task.ConsoleTaskMonitor
import ghidra.app.decompiler.*
import kotlinx.serialization.*
import kotlinx.serialization.json.*

val options = DecompileOptions()
val monitor = ConsoleTaskMonitor()
val ifc = DecompInterface()
ifc.setOptions(options)
ifc.openProgram(currentProgram)

val functionManager = currentProgram.getFunctionManager()
val funcs = functionManager.getFunctions(true)

enum class TokenType {
    BREAK, COMMENT_TOKEN, FIELD_TOKEN, FUNC_NAME_TOKEN, FUNC_PROTO, FUNCTION, LABEL_TOKEN,
    OP_TOKEN, RETURN_TYPE, STATEMENT, SYNTAX_TOKEN, TOKEN, TOKEN_GROUP, TYPE_TOKEN, VARIABLE_DECL, VARIABLE_TOKEN
}

@Serializable
data class Token(
    val type: TokenType,
    val token: String?,
    val child: List<Token>?
) {
    fun flatten():List<String> {
        if (token != null) {
            return listOf(token)
        } else {
            return child?.flatMap { it.flatten() } ?: listOf<String>()
        }
//         return token ?: child?.map {it.flatten()}?.filterNotNull()?.joinToString()
    }
}


fun ClangNode.expand():Token? {
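    // `when` takes the first matching branch. ClangToken and ClangTokenGroup are base classes,
    // so the ClangTypeToken, ClangVariableToken, and ClangVariableDecl branches listed after them
    // never match. Move those above their base classes to get the more specific types reported.
    val type = when(this) {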
        is ClangBreak -> { TokenType.BREAK; return null }
        is ClangCommentToken -> { TokenType.COMMENT_TOKEN; return null }
        is ClangFieldToken -> { TokenType.FIELD_TOKEN}
        is ClangFuncNameToken -> { TokenType.FUNC_NAME_TOKEN }
        is ClangFuncProto -> { TokenType.FUNC_PROTO }
        is ClangFunction -> { TokenType.FUNCTION }
        is ClangLabelToken -> { TokenType.LABEL_TOKEN }
        is ClangOpToken -> { TokenType.OP_TOKEN }
        is ClangReturnType -> { TokenType.RETURN_TYPE }
        is ClangStatement -> { TokenType.STATEMENT }
        is ClangSyntaxToken -> {TokenType.SYNTAX_TOKEN }
        is ClangToken -> { TokenType.TOKEN }
        is ClangTokenGroup -> { TokenType.TOKEN_GROUP}
        is ClangTypeToken -> { TokenType.TYPE_TOKEN }
        is ClangVariableDecl -> { TokenType.VARIABLE_DECL }
        is ClangVariableToken -> { TokenType.VARIABLE_TOKEN }
        else -> {TokenType.TOKEN}
    }
    if (numChildren() == 0) {
        return if(!toString().isEmpty() && !toString().isBlank()) Token(type, toString(), null) else null
    } else {
        return Token(
            type,
            null, 
            (0 until numChildren()).map {
                Child(it).expand()
            }.filterNotNull()
        )
    }
}
funcs.forEach { 
    val res = ifc.decompileFunction(it, 60, monitor)
    val markUp = res.cCodeMarkup
    val tokens = markUp.expand()
    val jsonToken = Json.encodeToString(tokens)
    println(tokens?.flatten())
    println()
    println(jsonToken)
    println()
}

[int, _init, (, EVP_PKEY_CTX, *, ctx, ), {, int, iVar1, ;, iVar1, =, __gmon_start__, (, ), ;, return, iVar1, ;, }]

{"type":"FUNCTION","token":null,"child":[{"type":"FUNC_PROTO","token":null,"child":[{"type":"RETURN_TYPE","token":null,"child":[{"type":"TOKEN","token":"int","child":null}]},{"type":"FUNC_NAME_TOKEN","token":"_init","child":null},{"type":"SYNTAX_TOKEN","token":"(","child":null},{"type":"TOKEN_GROUP","token":null,"child":[{"type":"TOKEN","token":"EVP_PKEY_CTX","child":null},{"type":"OP_TOKEN","token":"*","child":null},{"type":"TOKEN","token":"ctx","child":null}]},{"type":"SYNTAX_TOKEN","token":")","child":null}]},{"type":"SYNTAX_TOKEN","token":"{","child":null},{"type":"TOKEN_GROUP","token":null,"child":[{"type":"TOKEN","token":"int","child":null},{"type":"TOKEN","token":"iVar1","child":null}]},{"type":"SYNTAX_TOKEN","token":";","child":null},{"type":"TOKEN_GROUP","token":null,"child":[{"type":"STATEMENT","token":null,"child":[{"type":"TOKEN","token":"iVar1","child":null},


{"type":"FUNCTION","token":null,"child":[{"type":"FUNC_PROTO","token":null,"child":[{"type":"RETURN_TYPE","token":null,"child":[{"type":"TOKEN","token":"int","child":null}]},{"type":"FUNC_NAME_TOKEN","token":"main","child":null},{"type":"SYNTAX_TOKEN","token":"(","child":null},{"type":"SYNTAX_TOKEN","token":"void","child":null},{"type":"SYNTAX_TOKEN","token":")","child":null}]},{"type":"SYNTAX_TOKEN","token":"{","child":null},{"type":"TOKEN_GROUP","token":null,"child":[{"type":"TOKEN","token":"int","child":null},{"type":"TOKEN","token":"i","child":null}]},{"type":"SYNTAX_TOKEN","token":";","child":null},{"type":"TOKEN_GROUP","token":null,"child":[{"type":"TOKEN_GROUP","token":null,"child":[{"type":"TOKEN_GROUP","token":null,"child":[{"type":"OP_TOKEN","token":"for","child":null},{"type":"SYNTAX_TOKEN","token":"(","child":null},{"type":"STATEMENT","token":null,"child":[{"type":"TOKEN","token":"i","child":null},{"type":"OP_TOKEN","token":"=","child":null},{"type":"TOKEN","token":"0","ch

{"type":"FUNCTION","token":null,"child":[{"type":"FUNC_PROTO","token":null,"child":[{"type":"RETURN_TYPE","token":null,"child":[{"type":"TOKEN","token":"int","child":null}]},{"type":"FUNC_NAME_TOKEN","token":"printf","child":null},{"type":"SYNTAX_TOKEN","token":"(","child":null},{"type":"TOKEN_GROUP","token":null,"child":[{"type":"TOKEN","token":"char","child":null},{"type":"OP_TOKEN","token":"*","child":null},{"type":"TOKEN","token":"__format","child":null}]},{"type":"SYNTAX_TOKEN","token":",","child":null},{"type":"SYNTAX_TOKEN","token":"...","child":null},{"type":"SYNTAX_TOKEN","token":")","child":null}]},{"type":"SYNTAX_TOKEN","token":"{","child":null},{"type":"TOKEN_GROUP","token":null,"child":[{"type":"STATEMENT","token":null,"child":[{"type":"OP_TOKEN","token":"halt_baddata","child":null},{"type":"SYNTAX_TOKEN","token":"(","child":null},{"type":"SYNTAX_TOKEN","token":")","child":null}]},{"type":"SYNTAX_TOKEN","token":";","child":null}]},{"type":"SYNTAX_TOKEN","token":"}","child"

## Working with Variables

### Get a stack variable from a Varnode or VarnodeAST
When working with refined PCode you'll almost exclusively be dealing with `VarnodeAST` or `PcodeOpAST` objects. Correlating these objects to stack variables is not something exposed by the Ghidra API (as far as I can tell in v9.2.2). This leads to a complex mess of taking a varnode and comparing it to the decompiler's stack variable symbols for a given function. It's not intuitive, and quite frankly, it's been the most confusing and complex thing I've done with the Ghidra API to date.

This function works when you're passing a Varnode/AST of a simple variable, say something like this:


```c
memset(local_88,0,0x10);
```

If you want to know that the first argument (`local_88`) is named "local_88" and get its size, you can use this function. If that first parameter includes nested `PcodeOpAST`s, however, say something like `(char *)local_a8`, this function will not work because the variable is "wrapped" inside of a `CAST` operation. In order to work, this needs to be paired with `get_vars_from_varnode` (ctrl-f to find its definition) to unwrap this onion, isolate the variable VarnodeAST, and then pass it here. There's an example of doing that in this document under the heading "Get stack variables from a PcodeOpAST".

In [None]:
TODO()
// def get_stack_var_from_varnode(func, varnode):
//     if type(varnode) not in [Varnode, VarnodeAST]:
//         raise Exception("Invalid value. Expected `Varnode` or `VarnodeAST`, got {}.".format(type(varnode)))
    
//     bitness_masks = {
//         '16': 0xffff,
//         '32': 0xffffffff,
//         '64': 0xffffffffffffffff,
//     }

//     try:
//       addr_size = currentProgram.getMetadata()['Address Size']
//       bitmask = bitness_masks[addr_size]
//     except KeyError:
//       raise Exception("Unsupported bitness: {}. Add a bit mask for this target.".format(addr_size))

//     local_variables = func.getAllVariables()
//     vndef = varnode.getDef()
//     if vndef:
//         vndef_inputs = vndef.getInputs()
//         for defop_input in vndef_inputs:
//             defop_input_offset = defop_input.getAddress().getOffset() & bitmask
//             for lv in local_variables:
//                 unsigned_lv_offset = lv.getMinAddress().getUnsignedOffset() & bitmask
//                 if unsigned_lv_offset == defop_input_offset:
//                     return lv
        
//         # If we get here, varnode is likely a "acStack##" variable.
//         hf = get_high_function(func)
//         lsm = hf.getLocalSymbolMap()
//         for vndef_input in vndef_inputs:
//             defop_input_offset = vndef_input.getAddress().getOffset() & bitmask
//             for symbol in lsm.getSymbols():
//                 if symbol.isParameter(): 
//                     continue
//                 if defop_input_offset == symbol.getStorage().getFirstVarnode().getOffset() & bitmask:
//                     return symbol

//     # unable to resolve stack variable for given varnode
//     return None
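
The bitmask in the Python above is there, presumably, because the two sides of the comparison can report the same stack slot with different sign or width conventions (a signed offset on one side, an unsigned one on the other). Here is a minimal, Ghidra-free sketch of that idea. `maskForAddressSize` and `sameStackSlot` are hypothetical names used for illustration, not Ghidra API.

```kotlin
// Hypothetical helper, not part of the Ghidra API: build a mask that keeps
// only the low `bits` bits of an offset (16, 32 or 64 in the table above).
fun maskForAddressSize(bits: Int): Long {
    require(bits in 1..64) { "Unsupported bitness: $bits" }
    // `1L shl 64` wraps around in Kotlin, so 64 bits needs its own case.
    return if (bits == 64) -1L else (1L shl bits) - 1
}

// Two offsets name the same stack slot if they agree after masking.
fun sameStackSlot(a: Long, b: Long, bits: Int): Boolean {
    val mask = maskForAddressSize(bits)
    return (a and mask) == (b and mask)
}

// -0x88 as a signed offset and 0xffffff78 as a 32-bit unsigned offset
// refer to the same slot once both are masked to 32 bits.
val matches32 = sameStackSlot(-0x88L, 0xffffff78L, 32)   // true
val differs64 = sameStackSlot(-0x88L, 0xffffff78L, 64)   // false, the upper bits differ
```

Without the mask, a direct `==` between the two raw values would fail for every negative stack offset on a 32-bit target.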


### Get stack variables from a PcodeOpAST
If you took a look at the code under the section "Get a stack variable from a Varnode or VarnodeAST", you'll probably be asking why that code works for something like: `memset(local_88,0,0x10);` but it fails for `strchr((char *)local_a8,10);`. The reason is that `local_88` is a `VarnodeAST` while `(char *)local_a8` is a `PcodeOpAST`. In other words, the `local_a8` varnode is "wrapped" inside of a `PcodeOpAST` and you can't associate it to any kind of meaningful value without first "unwrapping" it. Of course, a wrapped `VarnodeAST` could be wrapped in numerous `CAST` operations and `INT_ADD` operations, etc.  So how do we handle this? Recursion. * shudder *. 

Fair warning, recursion is my computer science nemesis. If you look at this code and think "this is odd" - you're probably right!

That being said, let's slap something like `(char *)local_a8` into this function and see if we can get the associated variable name(s) out of it. It's perfectly fine to pass a `PcodeOpAST` with multiple variables (e.g. `(long)iVar2 + local_a0`) into this function. It will just return a list of all variables contained in that `PcodeOpAST`.


In [None]:
TODO()
// def get_vars_from_varnode(func, node, variables=None):
//     if type(node) not in [PcodeOpAST, VarnodeAST]:
//         raise Exception("Invalid value passed. Got {}.".format(type(node)))

//     # create `variables` list on first call. Do not make `variables` default to [].
//     if variables == None:
//         variables = []

//     # We must use `getDef()` on VarnodeASTs
//     if type(node) == VarnodeAST:
//         # For `get_stack_var_from_varnode` see:
//         # https://github.com/HackOvert/GhidraSnippets 
//         # Ctrl-F for "get_stack_var_from_varnode"
//         var = get_stack_var_from_varnode(func, node)
//         if var and type(var) != HighSymbol:
//             variables.append(var.getName())
//         node = node.getDef()
//         if node:
//             variables = get_vars_from_varnode(func, node, variables)

//     # We must call `getInputs()` on PcodeOpASTs
//     elif type(node) == PcodeOpAST:
//         nodes = list(node.getInputs())
//         for node in nodes:
//             if type(node.getHigh()) == HighLocal:
//                 variables.append(node.getHigh().getName())
//             else:
//                 variables = get_vars_from_varnode(func, node, variables)

//     return variables
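
The shape of that recursion is easier to see without the Ghidra types in the way. The sketch below uses toy stand-ins (`Node`, `Leaf`, `Op` and `collectVariables` are all hypothetical, not Ghidra classes) and only models the "unwrap operations until you hit named varnodes" part; it does not resolve stack variables.

```kotlin
// Toy stand-ins for Ghidra's types; none of these are Ghidra classes.
sealed class Node
// A varnode; `name` is non-null when it maps to a variable.
data class Leaf(val name: String?) : Node()
// An operation such as CAST or INT_ADD wrapping other nodes.
data class Op(val mnemonic: String, val inputs: List<Node>) : Node()

// Walk the expression tree depth-first and collect every named leaf.
fun collectVariables(node: Node, found: MutableList<String> = mutableListOf()): List<String> {
    when (node) {
        is Leaf -> node.name?.let { found += it }
        is Op -> node.inputs.forEach { collectVariables(it, found) }
    }
    return found
}

// (char *)local_a8, modelled as CAST(local_a8)
val castExample = collectVariables(Op("CAST", listOf(Leaf("local_a8"))))

// (long)iVar2 + local_a0, modelled as INT_ADD(CAST(iVar2), local_a0)
val addExample = collectVariables(
    Op("INT_ADD", listOf(Op("CAST", listOf(Leaf("iVar2"))), Leaf("local_a0")))
)
```

Unlike the Python version, the default `mutableListOf()` argument is safe here: Kotlin evaluates default arguments on every call, so each top-level call starts with a fresh list.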

## Working with Basic Blocks
Basic Blocks are collections of contiguous non-branching instructions within Functions. They are joined by conditional and unconditional branches, revealing valuable information about the code flow of a program and its functions. This section deals with examples working with Basic Block models.

In [19]:
import ghidra.program.model.block.BasicBlockModel
import ghidra.util.task.ConsoleTaskMonitor

val FUNC_NAME="main"
val blockModel = BasicBlockModel(currentProgram)
val monitor = ConsoleTaskMonitor()
val func = getGlobalFunctions(FUNC_NAME)[0]

println("Basic block details for function '${FUNC_NAME}':")
val blocks = blockModel.getCodeBlocksContaining(func.body, monitor)

//  print first block
println("\t[*] ${func.entryPoint} ")
blocks.forEach {
    val dest = it.getDestinations(monitor)
    while(dest.hasNext()) {
        val dbb = dest.next()
//         For some odd reason `getCodeBlocksContaining()` and `.next()`
//         return the root basic block after CALL instructions (x86). To filter
//         these out, we use `getFunctionAt()` which returns `null` if the address
//         is not the entry point of a function. See:
//         https://github.com/NationalSecurityAgency/ghidra/issues/855
        if(getFunctionAt(dbb.getDestinationAddress()) == null)
            println("\t[*] ${dbb} ")
    }
}

Basic block details for function 'main':
	[*] 00101169 
	[*] 0010117c -> 001011a1 
	[*] 00101182 -> 0010119c 
	[*] 00101182 -> 00101184 
	[*] 0010119a -> 0010119d 
	[*] 0010119c -> 0010119d 
	[*] 0010119d -> 001011a1 
	[*] 001011a5 -> 0010117e 
	[*] 001011a5 -> 001011a7 
	[*] 001011a7 -> 001011cb 
	[*] 001011ad -> 001011b1 
	[*] 001011ad -> 001011af 
	[*] 001011af -> 001011cb 
	[*] 001011c7 -> 001011cb 
	[*] 001011cf -> 001011a9 
	[*] 001011cf -> 001011d1 
	[*] 001011d5 -> 001011df 
	[*] 001011d5 -> 001011d7 
	[*] 001011db -> 001011ed 
	[*] 001011db -> 001011dd 
	[*] 001011dd -> 001011fb 
	[*] 001011eb -> 00101208 
	[*] 001011f9 -> 00101208 
	[*] 00101207 -> 00101208 


## Working with the Decompiler

### Print decompiled code for a specific function
Ghidra's decompiler is an exceptional resource. In certain cases you might want to extract the decompiler output for a list of functions. Here's an easy way to gather that information.  Just add your own file I/O code to dump everything to individual files for analysis.

In [20]:
import ghidra.util.task.ConsoleTaskMonitor
import ghidra.app.decompiler.*

val options = DecompileOptions()
val monitor = ConsoleTaskMonitor()
val ifc = DecompInterface()
ifc.setOptions(options)
ifc.openProgram(currentProgram)

val func = getGlobalFunctions("main")[0]
val results = ifc.decompileFunction(func, 0, ConsoleTaskMonitor())
println(results.getDecompiledFunction().getC())


int main(void)

{
  int i;
  
  for (i = 0; i < 10; i = i + 1) {
    if (i != 5) {
      printf("%d\n",(ulong)(uint)i);
    }
  }
  while (0 < i) {
    if (i != 5) {
      printf("%d\n",(ulong)(uint)i);
      i = i + -1;
    }
  }
  if (i == 0) {
    puts("zero");
  }
  else if (i == 1) {
    puts("one");
  }
  else {
    puts("default");
  }
  return 0;
}
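
For the "add your own file I/O" part, here is one possible sketch. `writeDecompiled` is a hypothetical helper, and the `.c` extension and the character whitelist are arbitrary choices made for this example.

```kotlin
import java.io.File

// Hypothetical helper: write one function's decompiled C text to
// `<outDir>/<function name>.c`, replacing characters that are awkward
// in file names (C++ operators, template brackets, path separators).
fun writeDecompiled(outDir: File, functionName: String, cCode: String): File {
    outDir.mkdirs()
    val safeName = functionName.replace(Regex("[^A-Za-z0-9_.-]"), "_")
    val target = File(outDir, "$safeName.c")
    target.writeText(cCode)
    return target
}
```

In the cell above it could be called as `writeDecompiled(File("decompiled"), func.name, results.getDecompiledFunction().getC())`, where the `decompiled` directory name is just a placeholder. Note that two functions whose names differ only in replaced characters would overwrite each other.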




### Getting variable information from the decompiler
Ghidra's decompiler performs a lot of analysis in order to recover variable information.  If you're interested in getting this information you'll need to use the decompiler interface to get a high function and its symbols.  Once you have this data, you can enumerate the symbols and retrieve information about variables in the target function.  Let's take a look at an example decompiled function:


```c++
undefined8 func(int param_1,int param_2)
{
  long in_FS_OFFSET;
  uint auStack88 [8];
  undefined4 auStack56 [10];
  long local_10;
  
  local_10 = *(long *)(in_FS_OFFSET + 0x28);
  auStack56[param_1] = 1;
  printf("%d\n",(ulong)auStack88[param_2]);
  if (local_10 != *(long *)(in_FS_OFFSET + 0x28)) {
    __stack_chk_fail();
  }
  return 0;
}
```

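The decompiled function above shows two stack variables that seem interesting to us: `auStack88` and `auStack56`. Let's get that information programmatically. Note that the cell below decompiles `main` from the sample binary, so its output lists the symbols of `main` instead of the function shown above.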

In [21]:
import ghidra.util.task.ConsoleTaskMonitor
import ghidra.app.decompiler.*

val options = DecompileOptions()
val monitor = ConsoleTaskMonitor()
val ifc = DecompInterface()
ifc.setOptions(options)
ifc.openProgram(currentProgram)

val func = getGlobalFunctions("main")[0]
val results = ifc.decompileFunction(func, 0, ConsoleTaskMonitor())
val highFunc = results.getHighFunction()
val localSymbolMap = highFunc.localSymbolMap
val symbols = localSymbolMap.symbols

symbols.withIndex().forEach { 
    println("\nSymbol ${it.index + 1}:")
    it.value.run {
        println("  name:         ${name}")
        println("  dataType:     ${dataType}")
        println("  getPCAddress: 0x${pcAddress}")
        println("  size:         ${size}")
        println("  storage:      ${storage}")
        println("  parameter:    ${isParameter}")
        println("  readOnly:     ${isReadOnly}")
        println("  typeLocked:   ${isTypeLocked}")
        println("  nameLocked:   ${isNameLocked}")
        println("  slot:         ${this.categoryIndex}")
    }
    
//     println("  parameter:    ${it.value.parameter}")
//     println("  readOnly:     ${it.value.readOnly}")
//     println("  typeLocked:   ${it.value.typeLocked}")
//     println("  nameLocked:   ${it.value.nameLocked}")
//     println("  slot:         ${it.value.slot}")
}


Symbol 1:
  name:         i
  dataType:     int
  getPCAddress: 0xnull
  size:         4
  storage:      Stack[-0xc]:4
  parameter:    false
  readOnly:     false
  typeLocked:   true
  nameLocked:   true
  slot:         -1


## Working with Comments

### Get all Automatic comments for a function

Ghidra adds "automatic comments" (light gray in color) in the EOL field. Here's how you can access those comments.

In [22]:
import ghidra.app.util.DisplayableEol

val listing = currentProgram.listing
val func = getGlobalFunctions("frame_dummy")[0]
val addrSet = func.body
val codeUnits = listing.getCodeUnits(addrSet, true)

codeUnits.forEach {
    val deol = DisplayableEol(it, true, true, true, true, 5, true, true)
    if (deol.hasAutomatic()) {
        val ac = deol.getAutomaticComment()
        println(ac::class.simpleName)
        println(ac.joinToString())
        println(ac[0])
    }
}

Array
undefined register_tm_clones()
undefined register_tm_clones()


### Get specific comment types for all functions

Ghidra supports 5 unique comment types users can add to their projects. This snippet shows you how to print all comments by type. It is a slightly modified version of what user `u/securisec` posted in the Ghidra subreddit, `r/ghidra`. Thanks!


In [23]:
val functionManager = currentProgram.getFunctionManager()
val listing = currentProgram.getListing()
val funcs = functionManager.getFunctions(true) // True means iterate forward

// val commentTypesMap = mapOf (
//     0 to "EOL", 
//     1 to "PRE", 
//     2 to "POST",
//     3 to "PLATE",
//     4 to "REPEATABLE",
// )
val commentTypes = listOf("EOL", "PRE", "POST", "PLATE", "REPEATABLE")

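funcs.forEach { func ->
    val codeUnits = listing.getCodeUnits(func.body, true)
    codeUnits.forEach { codeUnit ->
        // Comment type constants run from 0 (EOL) to 4 (REPEATABLE),
        // so the list index doubles as the comment type.
        commentTypes.forEachIndexed { index, typeName ->
            val comment = codeUnit.getComment(index)
            // Most code units have no comment of a given type; skip those.
            if (comment != null) {
                println("[${func.name}: ${codeUnit.address}] ${typeName}: ${comment}")
            }
        }
    }
}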

## Working with PCode

### Emulating a function
Instruction emulation is an extremely powerful technique to assist in static code analysis. Rarely, however, do we have the full memory context in which dynamic code executes. So while emulation can bring an element of 'dynamic' analysis to static RE, it's typically plagued with problems of unknown memory state. For simple code this might be no problem. For object oriented code this can be a major difficulty. Either way, some element of emulation can help tremendously in speeding up analysis. Ghidra uses its internal intermediate representation (PCode) to define what instructions do. Emulation is the process of tracking these changes in a cumulative state. Unfortunately Ghidra doesn't provide fancy GUI controls around the emulator (yet), but it's fully scriptable. Ghidra v9.1 added improvements to the `EmulatorHelper` class, which is really quite amazing. Here's a simple example of what you can do with it.

In [24]:
import ghidra.app.emulator.EmulatorHelper
import ghidra.program.model.symbol.SymbolUtilities

fun getSymbolAddress(symbolName:String) = SymbolUtilities.getLabelOrFunctionSymbol(currentProgram, symbolName, null)?.address

val paramRegisterList = currentProgram.programContext.registers

In [25]:
val CONTROLLED_RETURN_OFFSET = 0

// Identify function to be emulated
val mainFunctionEntry = getSymbolAddress("main")

//  Establish emulation helper, please check out the API docs
//  for `EmulatorHelper` - there's a lot of helpful things
//  to help make architecture agnostic emulator tools.
val emuHelper = EmulatorHelper(currentProgram)

// Set controlled return location so we can identify return from emulated function
val controlledReturnAddress = toAddr(CONTROLLED_RETURN_OFFSET)

// Set initial RIP
val mainFunctionEntryLong = mainFunctionEntry!!.offset
emuHelper.writeRegister(emuHelper.pcRegister, mainFunctionEntryLong)

// For x86_64 `registers` contains 872 registers! You probably don't
// want to print all of these. Just be aware, and print what you need.
// To see all supported registers, just print `registers`.
// We won't use this, it's just here to show you how to query
// valid registers for your target architecture.
val registers = paramRegisterList

//  Here's a list of all the registers we want printed after each
//  instruction. Modify this as you see fit, based on your architecture.
val regFilter = listOf(
    "RIP", "RAX", "RBX", "RCX", "RDX", "RSI", "RDI", 
    "RSP", "RBP", "rflags"
)

//  Setup your desired starting state. By default, all registers
//  and memory will be 0. This may or may not be acceptable for
//  you. So please be aware.
emuHelper.writeRegister("RAX", 0x20)
emuHelper.writeRegister("RSP", 0x000000002FFF0000)
emuHelper.writeRegister("RBP", 0x000000002FFF0000)

//  There are a couple of ways to write memory, use `writeMemoryValue` if you want
//  to set a small typed value (e.g. uint64). Use `writeMemory` if you're mapping in
//  a lot of memory (e.g. from a debugger memory dump). Note that the two methods
//  lay bytes out differently in memory, see the example output.
emuHelper.writeMemoryValue(toAddr(0x000000000008C000), 4, 0x99AABBCC)  // stored as cc bb aa 99 here (little endian, see output)
emuHelper.writeMemory(toAddr(0x00000000000CF000), ubyteArrayOf(0x99u, 0xaau, 0xbbu, 0xccu).toByteArray())  // bytes stored in the order given

// You can verify writes worked, or just read memory at select points
// during emulation. Here's a couple of examples:
val mem1 = emuHelper.readMemory(toAddr(0x000000000008C000), 4)
val mem2 = emuHelper.readMemory(toAddr(0x00000000000CF000), 4)

println("Memory at 0x000000000008C000: ${mem1.joinToString("") { "%02x".format(it) }}")
println("Memory at 0x00000000000CF000: ${mem2.joinToString("") { "%02x".format(it) }}")

println("Emulation starting at 0x${mainFunctionEntry}")

while(monitor.isCancelled() == false) {
    // Check the current address in the program counter, if it's
    // zero (our `CONTROLLED_RETURN_OFFSET` value) stop emulation.
    // Set this to whatever end target you want.
    val executionAddress = emuHelper.getExecutionAddress()
    if(executionAddress == controlledReturnAddress) {
        println("Emulation complete.")
        break
    }
    
    // Print current instruction and the registers we care about
    println("Address: 0x${executionAddress} (${getInstructionAt(executionAddress)})")
    regFilter.forEach {
        println("  ${it} = 0x%016x".format(emuHelper.readRegister(it)))
    }
    // single step emulation
    val success = emuHelper.step(monitor)
    if (success == false) {
        val lastError = emuHelper.getLastError()
        printerr("Emulation Error: '${lastError}'")
        break
    }
}

// Cleanup resources and release hold on currentProgram
emuHelper.dispose()

Memory at 0x000000000008C000: ccbbaa99
Memory at 0x00000000000CF000: 99aabbcc
Emulation starting at 0x00101169
Address: 0x00101169 (ENDBR64)
  RIP = 0x0000000000101169
  RAX = 0x0000000000000020
  RBX = 0x0000000000000000
  RCX = 0x0000000000000000
  RDX = 0x0000000000000000
  RSI = 0x0000000000000000
  RDI = 0x0000000000000000
  RSP = 0x000000002fff0000
  RBP = 0x000000002fff0000
  rflags = 0x0000000000000000
Address: 0x0010116d (PUSH RBP)
  RIP = 0x000000000010116d
  RAX = 0x0000000000000020
  RBX = 0x0000000000000000
  RCX = 0x0000000000000000
  RDX = 0x0000000000000000
  RSI = 0x0000000000000000
  RDI = 0x0000000000000000
  RSP = 0x000000002fff0000
  RBP = 0x000000002fff0000
  rflags = 0x0000000000000000
Address: 0x0010116e (MOV RBP,RSP)
  RIP = 0x000000000010116e
  RAX = 0x0000000000000020
  RBX = 0x0000000000000000
  RCX = 0x0000000000000000
  RDX = 0x0000000000000000
  RSI = 0x0000000000000000
  RDI = 0x0000000000000000
  RSP = 0x000000002ffefff8
  RBP = 0x000000002fff0000
  rfl
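
The two memory dumps at the top of that output differ only in byte order. Here is a Ghidra-free sketch of the same effect using `java.nio.ByteBuffer`. `encode32` is a hypothetical helper, and the reading that `writeMemoryValue` used little-endian order for this x86_64 target is inferred from the output above, not from the API docs.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Encode the low 4 bytes of `value` in the requested byte order and
// render them as hex, the same way the memory dump above is printed.
fun encode32(value: Long, order: ByteOrder): String =
    ByteBuffer.allocate(4).order(order).putInt(value.toInt()).array()
        .joinToString("") { "%02x".format(it) }

// Matches the dump at 0x8C000 (written with `writeMemoryValue`).
val littleEndian = encode32(0x99AABBCC, ByteOrder.LITTLE_ENDIAN)   // "ccbbaa99"
// Matches the dump at 0xCF000 (raw bytes written with `writeMemory`).
val bigEndian = encode32(0x99AABBCC, ByteOrder.BIG_ENDIAN)         // "99aabbcc"
```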

### Dumping Raw PCode
PCode exists in two primary forms you as a user should consider, "raw" and "refined".  In documentation both forms are simply referred to as "PCode" making it confusing to talk about - so I distinguish between the forms using raw and refined. Just know these are not universally accepted terms. 

So raw PCode is the first pass, and the form that's displayed in the "Listing" pane inside the Ghidra UI.  It's extremely verbose and explicit. This is the form you want to use when emulating, if you're writing a symbolic executor, or anything of the sort.  If you want details from the decompiler passes, you want to analyze refined PCode, not this stuff!  So what does it look like and how do you access it? Let's take a look.

In [26]:
getGlobalFunctions("main")[0]::class

class ghidra.program.database.function.FunctionDB

In [27]:
import ghidra.program.database.function.FunctionDB
fun dumpRawPcode(func: FunctionDB) {
    val funcBody = func.body
    val listing = currentProgram.getListing()
    val ops = listing.getInstructions(funcBody, true)
    ops.forEach {
        val rawPcode = it.pcode
        println("$it")
        rawPcode.forEach {
            println("  ${it}")
        }
    }
}
dumpRawPcode(getGlobalFunctions("main")[0] as FunctionDB)

ENDBR64
PUSH RBP
  (unique, 0xed00, 8) COPY (register, 0x28, 8)
  (register, 0x20, 8) INT_SUB (register, 0x20, 8) , (const, 0x8, 8)
   ---  STORE (const, 0x1b1, 8) , (register, 0x20, 8) , (unique, 0xed00, 8)
MOV RBP,RSP
  (register, 0x28, 8) COPY (register, 0x20, 8)
SUB RSP,0x10
  (register, 0x200, 1) INT_LESS (register, 0x20, 8) , (const, 0x10, 8)
  (register, 0x20b, 1) INT_SBORROW (register, 0x20, 8) , (const, 0x10, 8)
  (register, 0x20, 8) INT_SUB (register, 0x20, 8) , (const, 0x10, 8)
  (register, 0x207, 1) INT_SLESS (register, 0x20, 8) , (const, 0x0, 8)
  (register, 0x206, 1) INT_EQUAL (register, 0x20, 8) , (const, 0x0, 8)
  (unique, 0x13180, 8) INT_AND (register, 0x20, 8) , (const, 0xff, 8)
  (unique, 0x13200, 1) POPCOUNT (unique, 0x13180, 8)
  (unique, 0x13280, 1) INT_AND (unique, 0x13200, 1) , (const, 0x1, 1)
  (register, 0x202, 1) INT_EQUAL (unique, 0x13280, 1) , (const, 0x0, 1)
MOV dword ptr [RBP + -0x4],0x0
  (unique, 0x3100, 8) INT_ADD (register, 0x28, 8) , (const, 0xffffff

  (unique, 0x13280, 1) INT_AND (unique, 0x13200, 1) , (const, 0x1, 1)
  (register, 0x202, 1) INT_EQUAL (unique, 0x13280, 1) , (const, 0x0, 1)
CMP dword ptr [RBP + -0x4],0x0
  (unique, 0x3100, 8) INT_ADD (register, 0x28, 8) , (const, 0xfffffffffffffffc, 8)
  (unique, 0xbf80, 4) LOAD (const, 0x1b1, 4) , (unique, 0x3100, 8)
  (register, 0x200, 1) INT_LESS (unique, 0xbf80, 4) , (const, 0x0, 4)
  (unique, 0xbf80, 4) LOAD (const, 0x1b1, 4) , (unique, 0x3100, 8)
  (register, 0x20b, 1) INT_SBORROW (unique, 0xbf80, 4) , (const, 0x0, 4)
  (unique, 0xbf80, 4) LOAD (const, 0x1b1, 4) , (unique, 0x3100, 8)
  (unique, 0x29300, 4) INT_SUB (unique, 0xbf80, 4) , (const, 0x0, 4)
  (register, 0x207, 1) INT_SLESS (unique, 0x29300, 4) , (const, 0x0, 4)
  (register, 0x206, 1) INT_EQUAL (unique, 0x29300, 4) , (const, 0x0, 4)
  (unique, 0x13180, 4) INT_AND (unique, 0x29300, 4) , (const, 0xff, 4)
  (unique, 0x13200, 1) POPCOUNT (unique, 0x13180, 4)
  (unique, 0x13280, 1) INT_AND (unique, 0x13200, 1) , (const, 0
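
To see why raw PCode is the form you want for emulation, it helps to replay a few ops by hand. The sketch below is a toy interpreter for the three ops listed under `PUSH RBP` above. `Vn` and `ToyState` are hypothetical types, not Ghidra classes, and the register offsets (`0x20` for RSP, `0x28` for RBP) are read off the listing above.

```kotlin
// Toy model, not Ghidra classes: a varnode is (space, offset, size).
data class Vn(val space: String, val offset: Long, val size: Int)

class ToyState {
    val values = mutableMapOf<Vn, Long>()   // registers and unique temporaries
    val ram = mutableMapOf<Long, Long>()    // address -> stored value (sizes ignored)

    // Constants evaluate to their offset; unset varnodes read as 0.
    fun read(v: Vn): Long = if (v.space == "const") v.offset else values[v] ?: 0L

    fun copy(out: Vn, a: Vn) { values[out] = read(a) }
    fun intSub(out: Vn, a: Vn, b: Vn) { values[out] = read(a) - read(b) }
    // The real STORE also takes an address space id as its first input; omitted here.
    fun store(addr: Vn, value: Vn) { ram[read(addr)] = read(value) }
}

val RSP = Vn("register", 0x20L, 8)
val RBP = Vn("register", 0x28L, 8)
val savedRbp = Vn("unique", 0xed00L, 8)

// Replay PUSH RBP from the same starting state used in the emulation example.
val afterPush = ToyState().apply {
    values[RSP] = 0x2fff0000L
    values[RBP] = 0x2fff0000L
    copy(savedRbp, RBP)                          // (unique) COPY RBP
    intSub(RSP, RSP, Vn("const", 0x8L, 8))       // RSP = RSP - 8
    store(RSP, savedRbp)                         // [RSP] = saved RBP
}
// afterPush.read(RSP) is 0x2ffefff8, the RSP value the emulator printed after PUSH RBP.
```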

### Dumping Refined PCode
PCode exists in two primary forms you as a user should consider, "raw" and "refined".  In documentation both forms are simply referred to as "PCode" making it confusing to talk about - so I distinguish between the forms using raw and refined. Just know these are not universally accepted terms. 

So refined PCode is heavily processed. It highly relates to the output you see in the decompiler, and if you're interested in making use of the Ghidra decompiler passes, this is the form of PCode you'll want to analyze. There are many interesting aspects of refined PCode we do not cover here, including `unique` values and name spaces. Just know that what might appear to be simple has a lot of analysis backing it, and digging into these refined PCode elements is worth your time.

In [28]:
import ghidra.util.task.ConsoleTaskMonitor
import ghidra.app.decompiler.*
import ghidra.program.database.function.FunctionDB
import ghidra.program.model.pcode.HighFunction

fun FunctionDB.getHighFunction():HighFunction {
    val options = DecompileOptions()
    val monitor = ConsoleTaskMonitor()
    val ifc = DecompInterface()
    ifc.setOptions(options)
    ifc.openProgram(currentProgram)
    // Setting a simplification style will strip useful `indirect` information.
    // This cell sets "normalize"; remove the next line unless you know why you need it.
    ifc.setSimplificationStyle("normalize") 
    val res = ifc.decompileFunction(this, 60, monitor)
    val high = res.getHighFunction()
    return high
}

fun FunctionDB.dumpRefinedPcode() {
    val highFunc = getHighFunction()
    val opIter = highFunc.pcodeOps
    opIter.forEach {
//         print(it::class)
        println("0x${it.seqnum.target}: ${it}")
    }
}

(getGlobalFunctions("main")[0] as FunctionDB).dumpRefinedPcode()

0x00101175: (stack, 0xfffffffffffffff4, 4) COPY (const, 0x0, 4)
0x0010117c:  ---  BRANCH (ram, 0x1011a1, 1)
0x0010117e: (register, 0x206, 1) INT_EQUAL (stack, 0xfffffffffffffff4, 4) , (const, 0x5, 4)
0x00101182:  ---  CBRANCH (ram, 0x10119c, 1) , (register, 0x206, 1)
0x00101187: (register, 0x30, 8) INT_ZEXT (stack, 0xfffffffffffffff4, 4)
0x00101195:  ---  CALL (ram, 0x101070, 8) , (const, 0x102004, 8) , (register, 0x30, 8)
0x0010119a:  ---  BRANCH (ram, 0x10119d, 1)
0x0010119d: (unique, 0xbf80, 4) INT_ADD (stack, 0xfffffffffffffff4, 4) , (const, 0x1, 4)
0x001011a1: (stack, 0xfffffffffffffff4, 4) MULTIEQUAL (stack, 0xfffffffffffffff4, 4) , (unique, 0xbf80, 4)
0x001011a5: (unique, 0xd100, 1) INT_SLESS (stack, 0xfffffffffffffff4, 4) , (const, 0xa, 4)
0x001011a5:  ---  CBRANCH (ram, 0x10117e, 1) , (unique, 0xd100, 1)
0x001011a9: (register, 0x206, 1) INT_EQUAL (stack, 0xfffffffffffffff4, 4) , (const, 0x5, 4)
0x001011ad:  ---  CBRANCH (ram, 0x1011b1, 1) , (register, 0x206, 1)
0x001011b4: (re

## Tokenize instructions

In [29]:
currentProgram.getExecutablePath()
val basicBlockModel = BasicBlockModel(currentProgram, true)
val listing = currentProgram.getListing()
val functions = currentProgram.getFunctionManager().getFunctions(true)
functions.filter { !it.isThunk() }.forEach {
    println("Function: ${it.name}")
    val basicBlocks = basicBlockModel.getCodeBlocksContaining(it.body, monitor)
    basicBlocks.forEach { 
        val insts = listing.getInstructions(it, true)
        val tokens = insts.map {
            val list = mutableListOf<String>(it.getMnemonicString())
            val num = it.getNumOperands()
            for(i in (0 until num)) {
                list += it.getDefaultOperandRepresentation(i)
            }
            list
        }
        println(tokens)
    }
    println()
}


Function: _init
[[ENDBR64], [SUB, RSP, 0x8], [MOV, RAX, qword ptr [0x00103fe8]], [TEST, RAX, RAX], [JZ, 0x00101016]]
[[CALL, RAX]]
[[ADD, RSP, 0x8], [RET]]

Function: FUN_00101020
[[PUSH, qword ptr [0x00103fb8]], [JMP, qword ptr [0x00103fc0]]]

Function: _start
[[ENDBR64], [XOR, EBP, EBP], [MOV, R9, RDX], [POP, RSI], [MOV, RDX, RSP], [AND, RSP, -0x10], [PUSH, RAX], [PUSH, RSP], [LEA, R8, [0x101280]], [LEA, RCX, [0x101210]], [LEA, RDI, [0x101169]], [CALL, qword ptr [0x00103fe0]], [HLT]]

Function: deregister_tm_clones
[[LEA, RDI, [0x104010]], [LEA, RAX, [0x104010]], [CMP, RAX, RDI], [JZ, 0x001010d8]]
[[MOV, RAX, qword ptr [0x00103fd8]], [TEST, RAX, RAX], [JZ, 0x001010d8]]
[[JMP, RAX]]
[[RET]]

Function: register_tm_clones
[[LEA, RDI, [0x104010]], [LEA, RSI, [0x104010]], [SUB, RSI, RDI], [MOV, RAX, RSI], [SHR, RSI, 0x3f], [SAR, RAX, 0x3], [ADD, RSI, RAX], [SAR, RSI, 0x1], [JZ, 0x00101118]]
[[MOV, RAX, qword ptr [0x00103ff0]], [TEST, RAX, RAX], [JZ, 0x00101118]]
[[JMP, RAX]]
[[RET]]

Func

## Find Main

In [30]:
import ghidra.app.decompiler.DecompileOptions
import ghidra.app.decompiler.DecompInterface
import ghidra.util.task.ConsoleTaskMonitor

val entry = getGlobalFunctions("entry")[0]
val options = DecompileOptions()
val monitor = ConsoleTaskMonitor()
val ifc = DecompInterface()
ifc.setOptions(options)
ifc.openProgram(currentProgram)

// val func = getFunctionContaining(TARGET_ADDR)
val res = ifc.decompileFunction(entry, 60, monitor)
val high_func = res.getHighFunction()
val pcodeops = high_func.getPcodeOps()
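fun runTransaction(msg:String, transaction:() -> Unit) {
    val id = currentProgram.startTransaction(msg)
    var commit = false
    try {
        transaction()
        commit = true
    } finally {
        // Always close the transaction; roll back if `transaction` threw.
        currentProgram.endTransaction(id, commit)
    }
}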
pcodeops.forEach { 
    if(it.mnemonic == "CALL") {
        println(it)
        val mainAddress = toAddr(it.getInput(1).def.getInput(1).offset)
        val mainFunc = getFunctionAt(mainAddress)
        runTransaction("Find main") {
            mainFunc.setName("main", ghidra.program.model.symbol.SourceType.USER_DEFINED)
        }
        println(mainFunc)
    }
}

 ---  CALL (ram, 0x105018, 8) , (unique, 0x10000021, 8) , (stack, 0x0, 8) , (register, 0x20, 8) , (unique, 0x10000029, 8) , (unique, 0x10000031, 8) , (register, 0x10, 8) , (register, 0x20, 8)
main


## Rename Local Variable

In [32]:
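fun runTransaction(msg:String, transaction:() -> Unit) {
    val id = currentProgram.startTransaction(msg)
    var commit = false
    try {
        transaction()
        commit = true
    } finally {
        // Always close the transaction; roll back if `transaction` threw.
        currentProgram.endTransaction(id, commit)
    }
}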
val OLD_NAME = "local_c"
val NEW_NAME = "i"
val func = getGlobalFunctions("main")[0]
val localVars = func.localVariables
localVars.filter {
    it.name == OLD_NAME
}.forEach {
    runTransaction("rename variable") {
        it.setName(NEW_NAME, ghidra.program.model.symbol.SourceType.USER_DEFINED)
    }
}

In [53]:
import ghidra.program.model.data.*;
import ghidra.app.services.DataTypeManagerService;

// val dataTypeManager = currentProgram.dataTypeManager
// dataTypeManager
val service = state.tool.getService(DataTypeManagerService::class.java)
// val dataType = service.getDataType("int")
val dataTypeManagers = service.dataTypeManagers
dataTypeManagers.forEach {
//     it.allDataTypes.forEach {
//         println(it)
//     }
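    // Note: `getDataType(String)` expects a full data type path such as "/int";
    // passing the bare name "int" is likely why every manager returns null below.
    val dataType = it.getDataType("int")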
//     if(dataType != null) {
        println(dataType)
//     }
}
// dataTypeManager.allDataTypes.forEach { 
//     println(it.name)
// }

// val dataType = dataTypeManager.getDataType("int")

null
null
null


In [47]:
dataTypeManagers

[ghidra.program.model.data.BuiltInDataTypeManager@edab4f9, FileDataTypeManager - generic_clib_64, ghidra.program.database.data.ProgramDataTypeManager@7b419075]

In [1]:
import ghidra.app.decompiler.DecompileOptions
import ghidra.app.decompiler.DecompInterface
import ghidra.util.task.ConsoleTaskMonitor
import ghidra.app.decompiler.*
import kotlinx.serialization.*
import kotlinx.serialization.json.*

val options = DecompileOptions()
val monitor = ConsoleTaskMonitor()
val ifc = DecompInterface()
ifc.setOptions(options)
ifc.openProgram(currentProgram)

val functionManager = currentProgram.getFunctionManager()
val funcs = functionManager.getFunctions(true)

enum class TokenType {
    BREAK, COMMENT_TOKEN, FIELD_TOKEN, FUNC_NAME_TOKEN, FUNC_PROTO, FUNCTION, LABEL_TOKEN,
    OP_TOKEN, RETURN_TYPE, STATEMENT, SYNTAX_TOKEN, TOKEN, TOKEN_GROUP, TYPE_TOKEN, VARIABLE_DECL, VARIABLE_TOKEN
}

@Serializable
data class Token(
    val type: TokenType,
    val token: String?,
    val child: List<Token>?
) {
    fun flatten():List<String> {
        if (token != null) {
            return listOf(token)
        } else {
            return child?.flatMap { it.flatten() } ?: listOf<String>()
        }
//         return token ?: child?.map {it.flatten()}?.filterNotNull()?.joinToString()
    }
}


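fun ClangNode.expand():Token? {
    // Branches are checked top to bottom, so a subclass listed after its parent
    // (e.g. `ClangTypeToken` after `ClangToken`, or `ClangVariableDecl` after
    // `ClangTokenGroup`) is never reached. The output below reflects that ordering.
    val type = when(this) {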
        is ClangBreak -> { TokenType.BREAK; return null }
        is ClangCommentToken -> { TokenType.COMMENT_TOKEN; return null }
        is ClangFieldToken -> { TokenType.FIELD_TOKEN}
        is ClangFuncNameToken -> { TokenType.FUNC_NAME_TOKEN }
        is ClangFuncProto -> { TokenType.FUNC_PROTO }
        is ClangFunction -> { TokenType.FUNCTION }
        is ClangLabelToken -> { TokenType.LABEL_TOKEN }
        is ClangOpToken -> { TokenType.OP_TOKEN }
        is ClangReturnType -> { TokenType.RETURN_TYPE }
        is ClangStatement -> { TokenType.STATEMENT }
        is ClangSyntaxToken -> {TokenType.SYNTAX_TOKEN }
        is ClangToken -> { TokenType.TOKEN }
        is ClangTokenGroup -> { TokenType.TOKEN_GROUP}
        is ClangTypeToken -> { TokenType.TYPE_TOKEN }
        is ClangVariableDecl -> { TokenType.VARIABLE_DECL }
        is ClangVariableToken -> { TokenType.VARIABLE_TOKEN }
        else -> {TokenType.TOKEN}
    }
    if (numChildren() == 0) {
        return if (toString().isNotBlank()) Token(type, toString(), null) else null
    } else {
        return Token(
            type,
            null, 
            (0 until numChildren()).map {
                Child(it).expand()
            }.filterNotNull()
        )
    }
}
funcs.forEach { 
    val res = ifc.decompileFunction(it, 60, monitor)
    val markUp = res.cCodeMarkup
    val tokens = markUp.expand()
    val jsonToken = Json.encodeToString(tokens)
    println(tokens?.flatten())
    println()
    println(jsonToken)
    println()
}

[int, _init, (, EVP_PKEY_CTX, *, ctx, ), {, int, iVar1, ;, iVar1, =, __gmon_start__, (, ), ;, return, iVar1, ;, }]

{"type":"FUNCTION","token":null,"child":[{"type":"FUNC_PROTO","token":null,"child":[{"type":"RETURN_TYPE","token":null,"child":[{"type":"TOKEN","token":"int","child":null}]},{"type":"FUNC_NAME_TOKEN","token":"_init","child":null},{"type":"SYNTAX_TOKEN","token":"(","child":null},{"type":"TOKEN_GROUP","token":null,"child":[{"type":"TOKEN","token":"EVP_PKEY_CTX","child":null},{"type":"OP_TOKEN","token":"*","child":null},{"type":"TOKEN","token":"ctx","child":null}]},{"type":"SYNTAX_TOKEN","token":")","child":null}]},{"type":"SYNTAX_TOKEN","token":"{","child":null},{"type":"TOKEN_GROUP","token":null,"child":[{"type":"TOKEN","token":"int","child":null},{"type":"TOKEN","token":"iVar1","child":null}]},{"type":"SYNTAX_TOKEN","token":";","child":null},{"type":"TOKEN_GROUP","token":null,"child":[{"type":"STATEMENT","token":null,"child":[{"type":"TOKEN","token":"iVar1","child":null},

{"type":"FUNCTION","token":null,"child":[{"type":"FUNC_PROTO","token":null,"child":[{"type":"RETURN_TYPE","token":null,"child":[{"type":"TOKEN","token":"void","child":null}]},{"type":"FUNC_NAME_TOKEN","token":"register_tm_clones","child":null},{"type":"SYNTAX_TOKEN","token":"(","child":null},{"type":"SYNTAX_TOKEN","token":"void","child":null},{"type":"SYNTAX_TOKEN","token":")","child":null}]},{"type":"SYNTAX_TOKEN","token":"{","child":null},{"type":"TOKEN_GROUP","token":null,"child":[{"type":"STATEMENT","token":null,"child":[{"type":"OP_TOKEN","token":"return","child":null}]},{"type":"SYNTAX_TOKEN","token":";","child":null}]},{"type":"SYNTAX_TOKEN","token":"}","child":null}]}

[void, __do_global_dtors_aux, (, void, ), {, if, (, completed.8061, !=, '\0', ), {, return, ;, }, __cxa_finalize, (, __dso_handle, ), ;, deregister_tm_clones, (, ), ;, completed.8061, =, 1, ;, return, ;, }]

{"type":"FUNCTION","token":null,"child":[{"type":"FUNC_PROTO","token":null,"child":[{"type":"RETURN_TYPE",

{"type":"FUNCTION","token":null,"child":[{"type":"FUNC_PROTO","token":null,"child":[{"type":"RETURN_TYPE","token":null,"child":[{"type":"TOKEN","token":"int","child":null}]},{"type":"FUNC_NAME_TOKEN","token":"main","child":null},{"type":"SYNTAX_TOKEN","token":"(","child":null},{"type":"SYNTAX_TOKEN","token":"void","child":null},{"type":"SYNTAX_TOKEN","token":")","child":null}]},{"type":"SYNTAX_TOKEN","token":"{","child":null},{"type":"TOKEN_GROUP","token":null,"child":[{"type":"STATEMENT","token":null,"child":[{"type":"FUNC_NAME_TOKEN","token":"register_function","child":null},{"type":"SYNTAX_TOKEN","token":"(","child":null},{"type":"TOKEN","token":"\"create\"","child":null},{"type":"OP_TOKEN","token":",","child":null},{"type":"TOKEN","token":"create","child":null},{"type":"SYNTAX_TOKEN","token":")","child":null}]},{"type":"SYNTAX_TOKEN","token":";","child":null},{"type":"STATEMENT","token":null,"child":[{"type":"FUNC_NAME_TOKEN","token":"register_function","child":null},{"type":"SYNTA


[void, __libc_start_main, (, void, ), {, halt_baddata, (, ), ;, }]

{"type":"FUNCTION","token":null,"child":[{"type":"FUNC_PROTO","token":null,"child":[{"type":"RETURN_TYPE","token":null,"child":[{"type":"TOKEN","token":"void","child":null}]},{"type":"FUNC_NAME_TOKEN","token":"__libc_start_main","child":null},{"type":"SYNTAX_TOKEN","token":"(","child":null},{"type":"SYNTAX_TOKEN","token":"void","child":null},{"type":"SYNTAX_TOKEN","token":")","child":null}]},{"type":"SYNTAX_TOKEN","token":"{","child":null},{"type":"TOKEN_GROUP","token":null,"child":[{"type":"STATEMENT","token":null,"child":[{"type":"OP_TOKEN","token":"halt_baddata","child":null},{"type":"SYNTAX_TOKEN","token":"(","child":null},{"type":"SYNTAX_TOKEN","token":")","child":null}]},{"type":"SYNTAX_TOKEN","token":";","child":null}]},{"type":"SYNTAX_TOKEN","token":"}","child":null}]}

[void, __gmon_start__, (, void, ), {, halt_baddata, (, ), ;, }]

{"type":"FUNCTION","token":null,"child":[{"type":"FUNC_PROTO","token":null,"c

In [37]:
import ghidra.app.decompiler.component.DecompilerUtils
val token = (currentLocation as ghidra.app.decompiler.DecompilerLocation).token

DecompilerUtils.getForwardSlice(token.pcodeOp.getInput(1))

[(unique, 0x1000003b, 8), (const, 0x10113c, 8)]

(unique, 0x1000003b, 8)