<div class="bci-header">
    
<div class="bci-header-image">
  <img src="images/bcilogo.svg"/>
    </div>
<div class="bci-header-text">
    <div class="bci-header-class"> Ghidra Automations </div>
    <div class="bci-header-sub"> Identifying Vulnerable Functions (CWE-676)</div>
    <div class="bci-header-author">Dr. Kayla Afanador</div>
    
<br><br><br>
    
</div></div>

<div class="markdown-box">
<div class="markdown-text">

<div id="outline" class="outline">   
Notebook Outline 
</div>
<ol>
    <li><a href="#intro"> Potentially Vulnerable Functions</a></li>
    <li><a href="#extract">Feature Extraction</a></li>
    <li><a href="#vis">Feature Visualization</a></li>
</ol>

<div class="markdown-box">
<div class="markdown-text">

<div id="intro">
<h1> Potentially Vulnerable Functions </h1>
    </div>

Write a Ghidra script to identify vulnerabilities related to <a href="https://cwe.mitre.org/data/definitions/676.html">CWE-676</a>, Use of Potentially Dangerous Function. 
   
    
<section class="quotes">
<blockquote class="overlayed">
The product invokes a potentially dangerous function that could introduce a vulnerability if it is used incorrectly, but the function can also be used safely. 
<cite><a href="https://cwe.mitre.org/data/definitions/676.html">CWE-676</a></cite>
</blockquote>
</section>
    

<br><br>
Banned and known-vulnerable C functions:

<ul>
    <li> <a href="https://docs.microsoft.com/en-us/previous-versions/bb288454(v=msdn.10)?redirectedfrom=MSDN">MSDN List</a> </li>
    <li> <a href="https://github.com/intel/safestringlib/wiki/SDL-List-of-Banned-Functions">Banned List</a> </li>
</ul>

</div>

<div class="markdown-box">
<div class="markdown-text">

<div id="extract">
<h1> Feature Extraction</h1>
    </div>

Our goal is to identify calls to potentially vulnerable functions. 
    
<ol>
    <li> Create a list of known vulnerable functions (e.g. <code>known_vulnerable[]</code>) </li> 
    <li> Get a list of all functions in the binary (e.g. <code>all_functions[]</code>)</li> 
    <li> Check whether any function from Step 2 appears in the list created in Step 1 (<code>known_vulnerable[]</code>). If so:
    <ol> 
        <li> Identify every function that <i>calls</i> the <code>known_vulnerable</code> function </li> 
    </ol> 
    </li>
</ol>
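
The steps above can be sketched in Python as follows. This is a minimal sketch, not the reference solution: the names `KNOWN_VULNERABLE`, `find_vulnerable_calls`, and `callers_from_ghidra` are hypothetical, the banned list is a small illustrative subset, and the demo call graph is hand-made so the logic can be checked outside Ghidra. The Ghidra-specific helper assumes the script runs inside Ghidra, where `currentProgram` and `monitor` are provided.

```python
# Step 1: a small, illustrative subset of banned functions (not a complete list).
KNOWN_VULNERABLE = {"gets", "strcpy", "strcat", "sprintf"}


def find_vulnerable_calls(callers_by_function, known_vulnerable=KNOWN_VULNERABLE):
    """Return sorted (caller, vulnerable_function) pairs.

    callers_by_function maps each function name in the binary (Step 2)
    to the names of the functions that call it.
    """
    findings = []
    for name, callers in callers_by_function.items():
        if name in known_vulnerable:      # Step 3: is it on the banned list?
            for caller in callers:        # Step 3.1: record every caller
                findings.append((caller, name))
    return sorted(findings)


def callers_from_ghidra(program, monitor):
    """Build the mapping from a Ghidra program (only usable inside Ghidra)."""
    mapping = {}
    for func in program.getFunctionManager().getFunctions(True):
        callers = func.getCallingFunctions(monitor)
        mapping[func.getName()] = [caller.getName() for caller in callers]
    return mapping


# Hand-made call graph standing in for a real binary.
demo_graph = {
    "main": [],
    "printf": ["main"],
    "strcpy": ["main", "copy_input"],
}
for caller, callee in find_vulnerable_calls(demo_graph):
    print("%s calls %s" % (caller, callee))
```

Inside Ghidra, the same check would be driven by `find_vulnerable_calls(callers_from_ghidra(currentProgram, monitor))`.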

<div class="exercise">

<div class="exercise-title">
<h2>Exercise: Writing the Script </h2>
</div>


<div class="exercise-body">
    
<div class="exercise-body-subhead">
    <br/>Overview<br/> 
</div>
    
In this exercise we'll write a script to identify calls to potentially vulnerable functions. 

<b> Note: If you're unsure where to start, check out the Ghidra_API video, Section 2 </b>
    
<div class="exercise-body-subhead">
    <br>Procedure<br>
</div>

<ol class="indent-list">
    <li> Begin with the code outline and procedure that we previously discussed </li>
    <li> Complete each of the #TODOs </li> 
    </ol> 

<div class="markdown-box">
<div class="markdown-text">

    
<h3>Student Work</h3>


<div class="markdown-box">
<div class="markdown-text">

<div id="headless">
<h2> Ghidra Headless </h2>
    </div>


Source: 
    
```bash
<GhidraInstallDir>/support/analyzeHeadlessREADME.html
```
<br>

<P>
Users initiate Headless operation using the <code>analyzeHeadless</code> shell script. 
    
Resource: https://github.com/NationalSecurityAgency/ghidra/blob/master/Ghidra/RuntimeScripts/Common/support/analyzeHeadlessREADME.html

In [None]:
!cat ~/ghidra/support/analyzeHeadlessREADME.html

<div class="markdown-box">
<div class="markdown-text">
<details>
<summary>Ghidra Headless Command Examples</summary>
<br><br>
    
In general, you'll run the scripts using something similar to the following: 
    
```bash
/GHIDRA/support/analyzeHeadless projectlocation/ TestProject -import myexes/ -analysisTimeoutPerFile num_seconds -deleteProject -scriptPath wheretosearchforscripts -postScript yourCustom.py -scriptlog my_log.log
```
---

<br>
Import a binary /binaries/binary1.exe to a local Ghidra Project named Project1. Analysis is on by default.

```bash
analyzeHeadless /Users/user/ghidra/projects Project1 -import /binaries/binary1.exe
```
---

<br>
Import all *.exe binaries from a local folder to a local Ghidra project named Project1, suppressing analysis.

```bash
analyzeHeadless /Users/user/ghidra/projects Project1 -import /Users/user/sourceFiles/*.exe -noanalysis
```
---

<br>   
Recursively run scripts and analysis over all the binaries in the folder folderTwo of the existing project named Project2.

```bash
analyzeHeadless /Users/user/ghidra/Projects Project2/folderTwo -scriptPath /user/scripts -preScript FixupPreScript.java -process -recursive
```
---
    
<br>
Create a new project, import and analyze a file, then delete the project when done.

```bash
analyzeHeadless /Users/user/ghidra/projects ANewProject -import /binaries/binary2.exe -deleteProject
```
---

<br>
Set a timeout value, in seconds, for analysis (analysis will abort if it takes longer than the set timeout value).

```bash
analyzeHeadless /Users/user/ghidra/projects MyProject -import /binaries/binary2.exe -analysisTimeoutPerFile 100
```
---


<div class="markdown-box">
<div class="markdown-text">
<details>
<summary>Ghidra Headless Actions</summary>
<br><br>
    
When other parameters are specified, the following types of actions may be performed:

<ul>
  <LI><a href="#import">Import</a> a single file or directory of executable(s) (recursively or 
  non-recursively).</LI>
  <LI>Process a single file or directory of executable(s) already present in 
  an existing project (recursively or non-recursively).</LI>
  <LI>Run any number of non-GUI Ghidra pre-processing scripts on each 
  executable.</LI>
  <LI>Turn analysis on or off for each executable.</LI>
  <LI>Run any number of non-GUI Ghidra <a href="#postScript">post-processing scripts</a> on each 
  executable.</LI>
  <LI>Write to a <a href="#log">log</a> with information about each file processed; 
  <a href="#scriptLog">separated logging</a> is available for scripts.</LI>
  <LI>Keep or <a href="#deleteProject">delete</a> a created project.</LI>
  <LI>Save any changes made to the project/file, or operate in a read-only 
  manner in <a href="#import"><code>-import</code></a> or 
  <a href="#process"><code>-process</code></a> modes.</LI>
  <LI>Use pre- and/or post-processing scripts to dictate program disposition. For 
  example, scripts can dictate whether further processing (i.e., analysis or other scripts) should 
  be aborted and whether the current file should be deleted after all processing is complete.</LI>
    </ul> 
    

<div class="markdown-box">
<div class="markdown-text">
<details>
<summary>Ghidra Headless Command Line Options/Parameters</summary>
<br><br>

<PRE>
    analyzeHeadless <a href="#projLocation">&lt;project_location&gt;</a> <a href="#projName">&lt;project_name&gt;[/&lt;folder_path&gt;]</a> | ghidra://&lt;server&gt;[:&lt;port&gt;]/&lt;repository_name&gt;[/&lt;folder_path&gt;]
        [[<a href="#import">-import [&lt;directory&gt;|&lt;file&gt;]+</a>] | [-process [&lt;project_file&gt;]]]
        [-preScript &lt;ScriptName&gt;&nbsp;[&lt;arg&gt;]*]
        [<a href="#postScript">-postScript &lt;ScriptName&gt;&nbsp;[&lt;arg&gt;]*</a>]
        [<a href="#scriptPath">-scriptPath &quot;&lt;path1&gt;[;&lt;path2&gt;...]&quot;</a>]
        [-propertiesPath &quot;&lt;path1&gt;[;&lt;path2&gt;...]&quot;]
        [<a href="#scriptLog">-scriptlog &lt;path to script log file&gt;</a>]
        [-log &lt;path to log file&gt;]
        [-overwrite]
        [<a href="#recursive">-recursive</a>]
        [-readOnly]
        [<a href="#deleteProject">-deleteProject</a>]
        [-noanalysis]
        [-processor &lt;languageID&gt;]
        [-cspec &lt;compilerSpecID&gt;]
        [<a href="#timeout">-analysisTimeoutPerFile &lt;timeout in seconds&gt;</a>]
        [-keystore &lt;KeystorePath&gt;]
        [-connect [&lt;userID&gt;]]
        [-p]
        [-commit [&quot;&lt;comment&gt;&quot;]]
        [-okToDelete]
        [-max-cpu &lt;max cpu cores to use&gt;]
        [-loader &lt;desired loader name&gt;]
</PRE>

<ol>
    <LI>
    <a name="projLocation"><code>&lt;project_location&gt;</code></a><br>The directory 
    that either contains an existing Ghidra project (in -import or -process mode) or will contain a 
    newly created project (in -import mode for a local project).
    <br>
    <i><b>You must specify either a project location and project name, or a Ghidra Server repository URL.</b></i>
    </LI>
    <br><br>
    <LI>
    <a name="projName"><code>&lt;project_name&gt;[/&lt;folder_path&gt;]</code></a><br>
    The name of either an existing project (in <code>-import</code> or 
    <code>-process</code> mode) or new project (in <code>-import</code> mode) 
    to be created in the above directory. If the optional folder path is included, imports will be 
    rooted under this project folder. In <code>-import</code> mode with 
    <code>-recursive</code> enabled, any folders in the folder path that do not already 
    exist in the project will be created (even if nested).
    <br>
    <i><b>You must specify either a project location and project name, or a Ghidra Server repository URL.</b></i>
    </LI>
    <br><br>
    <LI>
    <a name="import"><code>-import [&lt;directory&gt;|&lt;file&gt;]+</code></a><br>
    <i>Note: <code>-import</code> and <code>-process</code> cannot both be 
    present in the parameters list.</i>
    <br>
    Specifies one or more executables (or directories of executables) to import. When importing a 
    directory, a folder with the same name will be created in the Ghidra project. When using the 
    <code>-recursive</code> parameter, each executable that is found in a recursive 
    search through the given directory will be stored in the project in the same relative location 
    (i.e., any directories found under the import directory will also be created in the project).
    <br>
    Operating system-specific wildcard characters can be used when importing files and/or directories. 
    Please see the Wildcards section for more details.
    <br>
    When importing multiple executables/directories in the same session, use one of the following 
    methods:
    <ul>
    <LI>List multiple directories and/or executables after the <code>-import</code> 
    option, separated by a space.</LI>
  <code>-import /Users/myDir/peFiles /Users/myDir/otherFiles/test.exe</code>
    <LI>Repeat the <code>-import</code> option multiple times (each use of 
    <code>-import</code> may be separated by other parameters) to import from more 
    than one directory or file source.</LI>
  <code>-import /Users/myDir/peFiles -recursive -import /Users/myDir/otherFiles/test.exe</code>
    </ul>
    </LI>
    <br><br>
    <li>
    <a name="recursive"><code>-recursive</code></a><br>
    If present, enables recursive descent into directories and project sub-folders when a directory/
    folder has been specified in <code>-import</code> or <code>-process</code> 
    modes.
    </li>
    <br><br>
    <LI>
    <a name="postScript"><code>-postScript &lt;ScriptName.ext&gt;&nbsp;[&lt;arg&gt;]*</code></a><br>
    Identifies the name of a script that will execute after analysis, and an optional list 
    of arguments to pass to the script. The script name must include its file extension (e.g., 
    <code>MyScript.java</code>).
    <br>
    <B><I>This parameter expects the script name only; do not include the path to the script.</I></B> The
    Headless Analyzer searches specific default locations for the named script, but additional script 
    director(ies) may also be specified (see the <a href="#scriptPath"><code>-scriptPath</code>
    </a> argument for more information).
    <br>
    This option must be repeated to specify additional scripts. See the <a href="#scripting">Scripting</a> 
    section for a description of advanced scripting capabilities.
    </LI>
    <br><br>
    <LI>
    <a name="scriptPath"><code>-scriptPath &quot;&lt;path1&gt;[;&lt;path2&gt;...]&quot;</code></a>
    <br>Specifies the search path(s) for scripts, including secondary scripts (a script invoked from 
    another script). A path may start with <code>&#36;GHIDRA_HOME</code>, which corresponds 
    to the Ghidra installation directory, or <code>&#36;USER_HOME</code>, which corresponds 
    to the user's home directory. On Unix systems, these home variables must be escaped using a 
    &apos;<code>\</code>&apos; (backslash) character.
    <br>
    Unix: <code>-scriptPath &quot;\$GHIDRA_HOME/Ghidra/Features/Base/ghidra_scripts;/myscripts&quot;</code>
    </LI>

The <code>scriptPath</code> parameter is optional. If it is not present, the 
Headless Analyzer will search the following paths for the specified script(s):
<br>
  <ul>
    <LI><code>$USER_HOME/ghidra_scripts</code></LI>
    <LI>All <code>ghidra_scripts</code> subdirectories that exist in the Ghidra distribution</LI>
    </ul>
    <br><br>
    <LI>
    <a name="scriptLog"><code>-scriptlog &lt;path to script log file&gt;</code></a><br>
    Sets the location of the file that stores logging information from pre- and post-scripts. If a 
    path to a script log file is not set, script logs are written to <code>script.log</code> 
    in the user directory, by default.
    </LI>
    <br><br> 
    <LI>
    <a name="deleteProject"><code>-deleteProject</code></a><br>
    If present, the Ghidra project will be deleted after scripts and/or analysis have completed 
    (only applies if the project has been created in the current session with 
    <a href="#import"><code>-import</code></a>; existing projects are never deleted).
    This project delete option is assumed when the <code>-readOnly</code> option is specified 
    for import operations which create a new project.
    </LI>
    <br><br>
    <LI>
    <a name="timeout"><code>-analysisTimeoutPerFile &lt;timeout in seconds&gt;</code></a><br>
    Sets a timeout value (in seconds) for analysis. If analysis on a file exceeds the specified time, 
    analysis is interrupted and processing continues as scheduled (i.e., to the 
    <a href="#postScript"><code>-postScript</code></a> stage, if specified). Results 
    from individual analyzers that have completed processing prior to timeout will still be saved 
    with the program. Post-scripts can be used to detect that analysis has timed out (in Headless 
    processing ONLY) by calling the <code>getHeadlessAnalysisTimeoutStatus()</code> method. 
    </LI>
</ol>


<div class="exercise">

<div class="exercise-title">
<h2>Exercise: Headless </h2>
</div>


<div class="exercise-body">
    
<div class="exercise-body-subhead">
    <br/>Overview<br/> 
</div>
    
In this exercise we'll use Ghidra in "headless" mode. 
    
    
<div class="exercise-body-subhead">
    <br>Procedure<br>
</div>

<ol class="indent-list">
    <li> Execute the cells in the student work section to verify that you can use Ghidra in headless mode (from Jupyter) </li> 
    <li> Assumptions: </li> 
        <ul> 
            <li> your script is named: <code>vulnerable_functions.py</code> </li> 
            <li> your script is saved at: <code>~/Desktop/jupyter/ghidra_headless/ </code>  </li> 
        </ul>
    </ol> 

<div class="markdown-box">
<div class="markdown-text">

    
<h3>Student Work</h3>


In [None]:
ghidraHeadlessScipts = "~/Desktop/jupyter/ghidra_headless/"

In [None]:
executables_path = ghidraHeadlessScipts + 'executables'
ghidraHeadless_path = '~/ghidra/support/analyzeHeadless'
tempProject_path = ghidraHeadlessScipts
pythonScript_name = 'vulnerable_functions.py'  # -postScript expects the script name only; -scriptPath supplies the directory
analysisTimeout = 30
log_name = 'my_log.log'

In [None]:
!{ghidraHeadless_path} {tempProject_path} TempProject -import {executables_path} -analysisTimeoutPerFile {analysisTimeout} -deleteProject -scriptPath {tempProject_path} -postScript {pythonScript_name} -scriptlog {log_name}
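
The same command can also be assembled in Python before it is run, which makes each argument easy to inspect. This is a minimal sketch, assuming the directory layout from the cells above; `build_headless_command` is a hypothetical helper, and the command is only printed here, not executed.

```python
import os


def build_headless_command(headless, project_dir, project_name, import_dir,
                           script_dir, script_name, timeout_s, script_log):
    """Return the analyzeHeadless argument list for one batch import."""
    expand = os.path.expanduser  # turn a leading "~" into the home directory
    return [
        expand(headless),
        expand(project_dir), project_name,
        "-import", expand(import_dir),
        "-analysisTimeoutPerFile", str(timeout_s),
        "-deleteProject",
        "-scriptPath", expand(script_dir),
        "-postScript", script_name,  # script name only, no directory
        "-scriptlog", script_log,
    ]


base = "~/Desktop/jupyter/ghidra_headless/"
command = build_headless_command(
    headless="~/ghidra/support/analyzeHeadless",
    project_dir=base,
    project_name="TempProject",
    import_dir=base + "executables",
    script_dir=base,
    script_name="vulnerable_functions.py",
    timeout_s=30,
    script_log="my_log.log",
)
print(" ".join(command))
```

Passing this list to `subprocess.run(command)` would launch the analysis without relying on shell expansion of `~`.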

<div class="markdown-box">
<div class="markdown-text">

<div id="headless">
<h2> Preparing for Batch Analysis </h2>
    </div>

As we prepare for batch analysis, we need to consider how the data will be processed. Let's assume we run our script against 100 binaries, and each of those binaries has a function called main. We may end up with a list of potentially vulnerable functions and no way to distinguish which program they're in. 
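
To see the problem concretely, consider two binaries that each contain a `main` calling `strcpy`. The sketch below uses made-up program names and findings (all hypothetical) to show how merged results lose their origin, and how tagging each row with the program name keeps them apart.

```python
# Hypothetical per-binary results: (caller, dangerous_function) pairs.
results = {
    "server.exe": [("main", "strcpy")],
    "client.exe": [("main", "strcpy"), ("parse", "sprintf")],
}

# Naive merge: the two ("main", "strcpy") rows are indistinguishable.
merged = [pair for pairs in results.values() for pair in pairs]
print(merged.count(("main", "strcpy")))  # 2 identical rows

# Tagged merge: each row records the program it came from.
tagged = sorted(
    (program, caller, callee)
    for program, pairs in results.items()
    for caller, callee in pairs
)
for row in tagged:
    print(row)
```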

<div class="exercise">

<div class="exercise-title">
<h2>Exercise: Adding Metadata </h2>
</div>

<div class="exercise-body">
    
<div class="exercise-body-subhead">
    <br/>Overview<br/> 
</div>
    
In this exercise we'll extract metadata about each binary.
    
<div class="exercise-body-subhead">
    <br>Procedure<br>
</div>

<ol class="indent-list">
    <li> Open ghidra_basics.py (script manager > examples > ghidra_basics > basic editor)</li>
    <li> Copy the section with program info </li>
    <li> Modify our existing script to extract the following: </li>
    <ul> 
        <li> Program Name </li> 
        <li> Creation Date </li> 
        <li> MD5 </li> 
    </ul> 
    <li> Write all data to a CSV for processing</li>
</ol>
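
One way to approach the last step is sketched below. It is a minimal sketch under stated assumptions: `program_metadata` and `write_rows` are hypothetical helper names, the column layout is illustrative, and the demo values are made up so the CSV logic can be run with plain Python 3 outside Ghidra. Inside Ghidra, `program_metadata(currentProgram)` would supply the real values.

```python
import csv
import io

FIELDS = ["program", "created", "md5", "caller", "callee"]


def program_metadata(program):
    """Collect metadata from a Ghidra Program (only usable inside Ghidra)."""
    return {
        "program": program.getName(),
        "created": str(program.getCreationDate()),
        "md5": program.getExecutableMD5(),
    }


def write_rows(out_file, metadata, findings, write_header=True):
    """Write one CSV row per (caller, callee) finding, tagged with metadata."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    for caller, callee in findings:
        row = dict(metadata, caller=caller, callee=callee)
        writer.writerow(row)


# Demo with made-up values; a real run would write to a file on disk.
demo_meta = {"program": "server.exe", "created": "2021-01-01", "md5": "0" * 32}
buffer = io.StringIO()
write_rows(buffer, demo_meta, [("main", "strcpy")])
print(buffer.getvalue())
```

In a batch run the post-script executes once per binary, so the script would typically open the CSV in append mode and write the header only when the file is new.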

<div class="markdown-box">
<div class="markdown-text">

    
<h3>Student Work</h3>
