<div class="bci-header">
    
<div class="bci-header-image">
  <img src="../images/bcilogo.svg"/>
    </div>
<div class="bci-header-text">
    <div class="bci-header-class"> Ghidra Automations </div>
    <div class="bci-header-sub"> Mnemonic Frequency </div>
    <div class="bci-header-author">Dr. Kayla Afanador</div>
    
<br><br><br>

<div class="markdown-box">
<div class="markdown-text">

<div id="outline" class="outline">   
Notebook Outline 
</div>
<ol>
    <li><a href="#feat">Introduction to the Feature</a></li>
    <li><a href="#extract">Feature Extraction</a></li>
    <li><a href="#vis">Feature Visualization</a></li>
</ol>

    
 

<div class="markdown-box">
<div class="markdown-text">

<div id="feat">
<h1> Introduction to the Feature</h1>
    </div>

A mnemonic string is the short, symbolic name of the operation that an assembly instruction performs. In the x86 architecture, each instruction has a corresponding mnemonic string that identifies its operation.

For example, in x86 assembly, the instruction: 
    
```nasm
mov eax, 0
```
<br>   
moves the value 0 into the EAX register. The mnemonic string for this instruction is "mov". Ghidra reports mnemonics in upper case, so the same instruction is counted as "MOV" in the extraction output.

Other examples of mnemonic strings for x86 assembly instructions include:
<ul>
    <li>add</li>
    <li>jmp</li>
    <li>call</li>
    <li>ret</li>
</ul>
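
To make the idea concrete, the mnemonic of a textual instruction can be approximated as its first whitespace-delimited token. The sketch below is only an illustration: the helper name `mnemonic_of` is hypothetical, and it assumes plain instruction text with no label or prefix in front of the mnemonic.

```python
def mnemonic_of(instruction_text):
    """Return the first whitespace-delimited token, lower-cased.

    Hypothetical helper for illustration; assumes the text starts
    with the mnemonic (no label, no instruction prefix).
    """
    tokens = instruction_text.split()
    return tokens[0].lower() if tokens else ""


# Print the mnemonic of a few example instructions
for line in ["mov eax, 0", "add eax, 1", "call printf", "ret"]:
    print(mnemonic_of(line))
```

Inside Ghidra no text parsing is needed; the extraction section uses `instruction.getMnemonicString()` instead.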

<div class="markdown-box">
<div class="markdown-text">

<div id="vuln">
<h2> Mnemonic Frequency and Vulnerabilities </h2>
    </div>

Mnemonic frequency can correlate with certain classes of vulnerability. Some buffer overflows, for example, arise in code that copies large amounts of data into memory without proper bounds checking. In such cases, the mnemonics that perform the copy (e.g. "mov" or the string-copy instruction "movs") may appear more often in vulnerable code than in non-vulnerable code. Library routines such as memcpy are functions rather than mnemonics, so they show up in the listing as "call" instructions.

Similarly, heavy use of control-transfer instructions such as "jmp" or "call" may hint at exposure to control-flow hijacking, where an attacker redirects the execution flow of a program to their own advantage.
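
Raw counts grow with program size, so comparing two binaries is easier with relative frequencies. The following is a minimal sketch under stated assumptions: the helper name `relative_frequency` is hypothetical, and `sample_counts` holds made-up illustrative values, not measurements from a real program.

```python
def relative_frequency(counts):
    """Convert raw mnemonic counts into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {mnemonic: count / total for mnemonic, count in counts.items()}


# Illustrative counts only (assumption, not real data)
sample_counts = {"MOV": 6, "CALL": 2, "JMP": 1, "RET": 1}
freqs = relative_frequency(sample_counts)

# Combined share of control-transfer mnemonics
control_flow_share = sum(freqs.get(m, 0.0) for m in ("CALL", "JMP", "RET"))

print(freqs)
print(control_flow_share)
```

A share computed this way is a feature to feed into further analysis, not evidence of a vulnerability by itself.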

<div class="markdown-box">
<div class="markdown-text">    

<div id="extract">
<h1> Feature Extraction</h1>
    </div>   

In [None]:
# GOAL: get frequency of each mnemonic in the program

import ghidra.program.model.listing

# Create an empty dictionary to store the frequency of each mnemonic
# mnemonic as key and frequency as value.
from collections import defaultdict
mnemonic_count = defaultdict(int)

# Get the listing of the current program
listing = currentProgram.getListing()

# Get an iterator over every instruction in the program;
# True requests forward (ascending address) order
instructions = listing.getInstructions(True)

# Iterate over the instructions in the program
for instruction in instructions:
    # Get the mnemonic string
    mnemonic = instruction.getMnemonicString()
    # add to dict
    mnemonic_count[mnemonic] += 1
    
# Convert to a plain dict so the output can be pasted into another notebook
print(dict(mnemonic_count))

In [None]:
# Counts printed by the Ghidra script above, pasted here so the plots can be built outside Ghidra
mnemonic_count = {'ENDBR64': 8, 'SUB': 6, 'MOV': 31, 'TEST': 3, 'JZ': 6, 'CALL': 11, 'ADD': 5, 'RET': 12, 'PUSH': 16, 'JMP': 10, 'XOR': 2, 'POP': 10, 'AND': 1, 'HLT': 1, 'CMP': 3, 'SHR': 1, 'SAR': 3, 'JNZ': 2, 'LEA': 8, 'NOP': 3, 'LEAVE': 1}
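
Before plotting, it can help to rank the mnemonics by count. The sketch below is one possible way to do this with the standard library's `collections.Counter`; the variable names `ranked` and `top_three` are illustrative, and the dictionary is repeated so the block runs on its own.

```python
from collections import Counter

# Counts copied from the notebook cell above
mnemonic_count = {'ENDBR64': 8, 'SUB': 6, 'MOV': 31, 'TEST': 3, 'JZ': 6,
                  'CALL': 11, 'ADD': 5, 'RET': 12, 'PUSH': 16, 'JMP': 10,
                  'XOR': 2, 'POP': 10, 'AND': 1, 'HLT': 1, 'CMP': 3,
                  'SHR': 1, 'SAR': 3, 'JNZ': 2, 'LEA': 8, 'NOP': 3,
                  'LEAVE': 1}

ranked = Counter(mnemonic_count)

# most_common(n) returns the n highest counts as (mnemonic, count) pairs
top_three = ranked.most_common(3)

total = sum(ranked.values())
for mnemonic, count in top_three:
    # Show each count together with its share of all instructions
    print(f"{mnemonic}: {count} ({count / total:.1%})")
```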

<div class="markdown-box">
<div class="markdown-text">
    
<div id="vis">
<h1> Feature Visualization</h1>
    </div>
Use the pandas and seaborn libraries to plot the mnemonic counts, first as a bar chart and then as a bubble chart:
    

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# Create a dataframe from the mnemonic_count dictionary
df = pd.DataFrame.from_dict(mnemonic_count, orient='index', columns=['count'])

In [None]:
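# Use seaborn to create a bar chart of the mnemonic count
# Set the style before creating the figure so that it takes effect
sns.set_style("white")
fig, ax = plt.subplots(figsize=(12, 6))
g = sns.barplot(x=df.index, y=df['count'], palette='mako', ax=ax)
plt.xlabel('Mnemonic', size=15)
plt.ylabel('Frequency', size=15)
g.set_title('Mnemonic Frequency', size=20)
# Enlarge all tick labels, but rotate only the mnemonic labels on the x-axis
ax.tick_params(labelsize=12)
ax.tick_params(axis='x', rotation=90)
sns.despine()
# Save before plt.show(), which may release the figure in some backends
g.figure.savefig("mnemonicFreq.png")
plt.show()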

In [None]:
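# Bubble chart: marker size scales with the mnemonic count
# Set the style before creating the figure so that it takes effect
sns.set_style("white")
fig, ax = plt.subplots(figsize=(12, 6))
g = sns.scatterplot(x=df.index, y=df['count'], size=df['count'], sizes=(50, 200), ax=ax)
plt.xlabel('Mnemonic', size=15)
plt.ylabel('Frequency', size=15)
g.set_title('Mnemonic Frequency', size=20)
# Enlarge all tick labels, but rotate only the mnemonic labels on the x-axis
ax.tick_params(labelsize=12)
ax.tick_params(axis='x', rotation=90)
sns.despine()
# Save before plt.show(), which may release the figure in some backends
g.figure.savefig("mnemonicFreqBUBBLE.png")
plt.show()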

----