Merge branch 'develop'

gaasedelen · Apr 23, 2020 · 710b13f
2 parents 675cc87 + 69a595a
commit 710b13f
Showing 74 changed files with 22,197 additions and 3,168 deletions.
diff --git a/LICENSE b/LICENSE
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2017-2018 Markus Gaasedelen
+Copyright (c) 2017-2020 Markus Gaasedelen
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

diff --git a/README.md b/README.md
@@ -5,14 +5,15 @@
 
 ## Overview
 
-Lighthouse is a code coverage plugin for [IDA Pro](https://www.hex-rays.com/products/ida/), and [Binary Ninja](https://binary.ninja/). The plugin makes use of interactive disassemblers to map, explore, and visualize externally collected code coverage data when symbols or source may not be available for a given binary.
+Lighthouse is a powerful code coverage plugin for [IDA Pro](https://www.hex-rays.com/products/ida/) and [Binary Ninja](https://binary.ninja/). As an extension of the leading disassemblers, this plugin enables one to interactively explore code coverage data in new and innovative ways when symbols or source may not be available for a given binary.
 
 This plugin is labeled only as a prototype & code resource for the community. 
 
 Special thanks to [@0vercl0k](https://twitter.com/0vercl0k) for the inspiration.
 
 ## Releases
 
+* v0.9 -- Python 3 support, custom coverage formats, coverage cross-refs, theming subsystem, much more.
 * v0.8 -- Binary Ninja support, HTML coverage reports, consistent styling, many tweaks, bugfixes.
 * v0.7 -- Frida, C++ demangling, context menu, function prefixing, tweaks, bugfixes.
 * v0.6 -- Intel pintool, cyclomatic complexity, batch load, bugfixes.
@@ -22,113 +23,111 @@ Special thanks to [@0vercl0k](https://twitter.com/0vercl0k) for the inspiration.
 * v0.2 -- Multifile support, performance improvements, bugfixes.
 * v0.1 -- Initial release
 
-# IDA Pro Installation
+# Installation
 
-Lighthouse is a cross-platform (Windows, macOS, Linux) python plugin, supporting IDA Pro 6.8 and newer.
+Lighthouse is a cross-platform (Windows, macOS, Linux) Python 2/3 plugin. It has no third-party dependencies, making the code both portable and easy to install.
 
-- Copy the contents of the `plugin` folder to the IDA plugins folder
-    - On Windows, the folder is at `C:\Program Files (x86)\IDA 6.8\plugins`
-    - On macOS, the folder is at `/Applications/IDA\ Pro\ 6.8/idaq.app/Contents/MacOS/plugins`
-    - On Linux, the folder may be at `/opt/IDA/plugins/`
+1. From your disassembler's Python console, run the following command to find its plugin directory:
+   - **IDA Pro**: `os.path.join(idaapi.get_user_idadir(), "plugins")`
+   - **Binary Ninja**: `binaryninja.user_plugin_path()`
 
-It has been primarily developed and tested on Windows, so that is where we expect the best experience.
-
-# Binary Ninja Installation (Experimental)
-
-At this time, support for Binary Ninja is considered experimental. Please feel free to report any bugs that you encounter.
-
-You can install Lighthouse & PyQt5 for Binary Ninja by following the instructions below.
-
-## Windows Installation
-
-1. Install PyQt5 from a Windows command prompt with the following command:
-
-```
-pip install --target="%appdata%\Binary Ninja\plugins\Lib\site-packages" python-qt5
-```
-
-2. Copy the contents of the `/plugin/` folder in this repo to your Binary Ninja [plugins folder](https://docs.binary.ninja/guide/plugins/index.html#using-plugins).
-
-## Linux Installation
-
-1. Install PyQt5 from a Linux shell with the following command:
-
-```
-sudo apt install python-pyqt5
-```
-
-2. Copy the contents of the `/plugin/` folder in this repo to your Binary Ninja [plugins folder](https://docs.binary.ninja/guide/plugins/index.html#using-plugins).
-
-## macOS Installation
-
-¯\\\_(ツ)\_/¯
+2. Copy the contents of this repository's `/plugin/` folder to the listed directory.
+3. Restart your disassembler.
 
 # Usage
 
-Lighthouse loads automatically when a database is opened, installing a handful of menu entries into the disassembler.
+Once properly installed, there will be a few new menu entries available in the disassembler. These are the entry points for a user to load coverage data and start using Lighthouse.
 
 <p align="center">
 <img alt="Lighthouse Menu Entries" src="screenshots/open.gif"/>
 </p>
 
-These are the entry points for a user to load and view coverage data.
+Lighthouse is able to load a few different 'flavors' of coverage data. To generate coverage data that can be loaded into Lighthouse, please look at the [README](https://github.com/gaasedelen/lighthouse/tree/master/coverage) in the coverage directory of this repository.
 
 ## Coverage Painting
 
-Lighthouse 'paints' the active coverage data across the three major IDA views as applicable. Specifically, the Disassembly, Graph, and Pseudocode views.
+While Lighthouse is in use, it will 'paint' the active coverage data across all of the code viewers available in the disassembler. Specifically, this will apply to your linear disassembly, graph, and decompiler windows.
 
 <p align="center">
 <img alt="Lighthouse Coverage Painting" src="screenshots/painting.png"/>
 </p>
 
-In Binary Ninja, only the Disassembly and Graph views are supported.
+In Binary Ninja, only the linear disassembly, graph, and IL views are supported. Support for painting decompiler output in Binary Ninja will be added to Lighthouse in the *near future* as the feature stabilizes.
 
-## Coverage Overview
+# Coverage Overview
 
-The Coverage Overview is a dockable widget that provides a function level view of the active coverage data for the database.
+The Coverage Overview is a dockable widget that will open up once coverage has been loaded into Lighthouse. 
 
 <p align="center">
 <img alt="Lighthouse Coverage Overview" src="screenshots/overview.png"/>
 </p>
 
-This table can be sorted by column, and entries can be double clicked to jump to their corresponding disassembly.
+This interactive widget provides a function level view of the loaded coverage data. It also houses a number of tools to manage loaded data and drive more advanced forms of coverage analysis. 
 
 ## Context Menu
 
-Right clicking the table in the Coverage Overview will produce a context menu with a few basic amenities.
+Right clicking the table in the Coverage Overview will produce a context menu with a few basic amenities to extract information from the table, or manipulate the database as part of your reverse engineering process.
 
 <p align="center">
 <img alt="Lighthouse Context Menu" src="screenshots/context_menu.gif"/>
 </p>
 
-These actions can be used to quickly manipulate or interact with entries in the table.
+If there are any other actions that you think might be useful to add to this context menu, please file an issue and they will be considered for a future release of Lighthouse.
+
+## Coverage ComboBox
+
+Loaded coverage data and user constructed compositions can be selected or deleted through the coverage combobox.
+
+<p align="center">
+<img alt="Lighthouse Coverage ComboBox" src="screenshots/combobox.gif"/>
+</p>
+
+## HTML Coverage Report
+
+Lighthouse can generate a rudimentary HTML coverage report of the active coverage. 
+A sample report can be seen [here](https://rawgit.com/gaasedelen/lighthouse/master/testcase/report.html).
 
-## Coverage Composition
+<p align="center">
+<img alt="Lighthouse HTML Report" src="screenshots/html_report.gif"/>
+</p>
+
+# Coverage Shell
 
-Building relationships between multiple sets of coverage data often distills deeper meaning than their individual parts. The shell at the bottom of the [Coverage Overview](#coverage-overview) provides an interactive means of constructing these relationships.
+At the bottom of the coverage overview window is the coverage shell. This shell can be used to perform logic-based operations that combine or manipulate the loaded coverage sets.
 
 <p align="center">
 <img alt="Lighthouse Coverage Composition" src="screenshots/shell.gif"/>
 </p>
 
-Pressing `enter` on the shell will evaluate and save a user constructed composition.
+This feature is extremely useful in exploring the relationships of program execution across multiple runs. In other words, the shell can be used to 'diff' execution between coverage sets and extract a deeper meaning that is otherwise obscured within the noise of their individual parts.
 
 ## Composition Syntax
 
 Coverage composition, or _Composing_ as demonstrated above is achieved through a simple expression grammar and 'shorthand' coverage symbols (A to Z) on the composing shell. 
 
 ### Grammar Tokens
 * Logical Operators: `|, &, ^, -`
-* Coverage Symbol: `A, B, C, ..., Z`
-* Coverage Range: `A,C`, `Q,Z`, ...
+* Coverage Symbol: `A, B, C, ..., Z, *`
 * Parenthesis: `(...)`
 
 ### Example Compositions
-* `A & B`
-* `(A & B) | C`
-* `(C & (A - B)) | (F,H & Q)`
 
-The evaluation of the composition may occur right to left, parenthesis are suggested for potentially ambiguous expressions.
+1. Executed code that is *shared* between coverage `A` and coverage `B`:
+```
+A & B
+```
+
+2. Executed code that is *unique* only to coverage `A`:
+```
+A - B
+```
+
+3. Executed code that is *unique* to `A` or `B`, but not `C`:
+```
+(A | B) - C
+```
+
+Expressions can be of arbitrary length or complexity, but the evaluation of the composition may occur right to left, so parentheses are suggested for potentially ambiguous expressions.
 
 ## Hot Shell
 
@@ -142,7 +141,7 @@ The hot shell serves as a natural gateway into the unguided exploration of compo
 
 ## Search
 
-Using the shell, one can search and filter the functions listed in the coverage table by prefixing their query with `/`.
+Using the shell, you can search and filter the functions listed in the coverage table by prefixing your query with `/`.
 
 <p align="center">
 <img alt="Lighthouse Search" src="screenshots/search.gif"/>
@@ -158,60 +157,27 @@ Entering an address or function name into the shell can be used to jump to corre
 <img alt="Lighthouse Jump" src="screenshots/jump.gif"/>
 </p>
 
-## Coverage ComboBox
+# Coverage Cross-references (Xref)
 
-Loaded coverage data and user constructed compositions can be selected or deleted through the coverage combobox.
+While using Lighthouse, you can right click any basic block (or instruction) and use the 'Coverage Xref' action to see which coverage sets executed the selected block. Double clicking any of the listed entries will instantly switch to that coverage set.
 
 <p align="center">
-<img alt="Lighthouse Coverage ComboBox" src="screenshots/combobox.gif"/>
+<img alt="Lighthouse Xref" src="screenshots/xref.gif"/>
 </p>
 
-## HTML Coverage Report
+This pairs well with the 'Coverage Batch' feature, which allows you to quickly load and aggregate thousands of coverage files into Lighthouse. Cross-referencing a block and selecting a 'set' will load the 'guilty' set from disk as a new coverage set for you to explore separate from the batch.
 
-Lighthouse can generate a rudimentary HTML coverage report of the active coverage. 
-A sample report can be seen [here](https://rawgit.com/gaasedelen/lighthouse/master/testcase/report.html).
+# Themes
+
+Lighthouse ships with two default themes -- a 'light' theme, and a 'dark' one. Depending on the colors currently used by your disassembler, Lighthouse will attempt to select the theme that seems most appropriate.
 
 <p align="center">
-<img alt="Lighthouse HTML Report" src="screenshots/html_report.gif"/>
+<img alt="Lighthouse Themes" src="screenshots/themes.png"/>
 </p>
 
-# Collecting Coverage
-
-Before using Lighthouse, one will need to collect code coverage data for their target binary / application.
-
-The examples below demonstrate how one can use [DynamoRIO](http://www.dynamorio.org), [Intel Pin](https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool) or [Frida](https://www.frida.re) to collect Lighthouse compatible coverage against a target. The `.log` files produced by these instrumentation tools can be loaded directly into Lighthouse.
-
-## DynamoRIO
+The theme files are stored as simple JSON on disk and are highly configurable. If you are not happy with the default themes or colors, you can create your own themes and simply drop them in the user theme directory.
 
-Code coverage data can be collected via DynamoRIO's [drcov](http://dynamorio.org/docs/page_drcov.html) code coverage module. 
-
-Example usage:
-
-```
-..\DynamoRIO-Windows-7.0.0-RC1\bin64\drrun.exe -t drcov -- boombox.exe
-```
-
-## Intel Pin
-
-Using a [custom pintool](coverage/pin) contributed by [Agustin Gianni](https://twitter.com/agustingianni), the Intel Pin DBI can also be used to collect coverage data.
-
-Example usage:
-
-```
-pin.exe -t CodeCoverage64.dll -- boombox.exe
-```
-
-For convenience, binaries for the Windows pintool can be found on the [releases](https://github.com/gaasedelen/lighthouse/releases) page. macOS and Linux users need to compile the pintool themselves following the [instructions](coverage/pin#compilation) included with the pintool for their respective platforms.
-
-## Frida (Experimental)
-
-Lighthouse offers limited support for Frida based code coverage via a custom [instrumentation script](coverage/frida) contributed by [yrp](https://twitter.com/yrp604). 
-
-Example usage:
-
-```
-sudo python frida-drcov.py bb-bench
-```
+Lighthouse will remember your theme preference for future loads and uses.
 
 # Future Work
 
@@ -223,11 +189,11 @@ Time and motivation permitting, future work may include:
 * Coverage & profiling treemaps
 * ~~Additional coverage sources, trace formats, etc~~
 * Improved pseudocode painting
-* Lighthouse console access, headless usage
-* Custom themes
-* Python 3 support
+* ~~Lighthouse console access~~, headless usage
+* ~~Custom themes~~
+* ~~Python 3 support~~
 
-I welcome external contributions, issues, and feature requests.
+I welcome external contributions, issues, and feature requests. Please make any pull requests to the `develop` branch of this repository if you would like them to be considered for a future release.
 
 # Authors
 

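The composition operators documented in the README hunk above (`|`, `&`, `^`, `-`) read like ordinary set algebra. The following is a minimal sketch of the three example compositions, assuming each coverage set can be modeled as a Python `set` of executed block addresses. The addresses and variable names are hypothetical, and this is not Lighthouse's own implementation.

```python
# Hypothetical coverage sets: each one is a set of executed basic block addresses.
A = {0x1000, 0x1010, 0x1020}
B = {0x1010, 0x1020, 0x1030}
C = {0x1020, 0x1040}

# 1. Executed code that is shared between A and B.
shared = A & B

# 2. Executed code that is unique to A (not seen in B).
only_a = A - B

# 3. Executed code seen in A or B, but not in C.
a_or_b_not_c = (A | B) - C

print(sorted(hex(addr) for addr in a_or_b_not_c))
```

Explicit parentheses, as in `(A | B) - C`, keep the intended grouping unambiguous regardless of evaluation order.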
diff --git a/coverage/README.md b/coverage/README.md
@@ -0,0 +1,79 @@
+# Collecting Coverage
+
+Before using Lighthouse, one will need to collect code coverage data for their target binary / application.
+
+The examples below demonstrate how one can use [DynamoRIO](http://www.dynamorio.org), [Intel Pin](https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool) or [Frida](https://www.frida.re) to collect Lighthouse compatible coverage against a target. The `.log` files produced by these instrumentation tools can be loaded directly into Lighthouse.
+
+## DynamoRIO
+
+Code coverage data can be collected via DynamoRIO's [drcov](http://dynamorio.org/docs/page_drcov.html) code coverage module. 
+
+Example usage:
+
+```
+..\DynamoRIO-Windows-7.0.0-RC1\bin64\drrun.exe -t drcov -- boombox.exe
+```
+
+## Intel Pin
+
+Using a [custom pintool](coverage/pin) contributed by [Agustin Gianni](https://twitter.com/agustingianni), the Intel Pin DBI can also be used to collect coverage data.
+
+Example usage:
+
+```
+pin.exe -t CodeCoverage64.dll -- boombox.exe
+```
+
+For convenience, binaries for the Windows pintool can be found on the [releases](https://github.com/gaasedelen/lighthouse/releases) page. macOS and Linux users need to compile the pintool themselves following the [instructions](coverage/pin#compilation) included with the pintool for their respective platforms.
+
+## Frida (Experimental)
+
+Lighthouse offers limited support for Frida based code coverage via a custom [instrumentation script](coverage/frida) contributed by [yrp](https://twitter.com/yrp604). 
+
+Example usage:
+
+```
+sudo python frida-drcov.py bb-bench
+```
+
+# Other Coverage Formats
+
+Lighthouse is flexible about the coverage or 'trace' file formats it can load. Below is an outline of the human-readable text formats that are arguably the easiest to output from a custom tracer.
+
+## Module + Offset (modoff)
+
+A 'Module+Offset' coverage file / trace is a highly recommended coverage format due to its simplicity and readability:
+
+```
+boombox+3a06
+boombox+3a09
+boombox+3a0f
+boombox+3a15
+...
+```
+
+Each line of the trace represents an executed instruction or basic block in the instrumented program. The line *must* name an executed module, e.g. `boombox.exe`, followed by the offset of the executed address relative to the imagebase.
+
+It is okay for hits from other modules (say, `kernel32.dll`) to exist in the trace. Lighthouse will not load coverage for them.
+
+## Address Trace (Instruction, or Basic Block)
+
+Lighthouse can also consume an 'absolute address' style trace, which is perhaps the most primitive coverage format:
+
+```
+0x14000419c
+0x1400041a0
+0x1400045dc
+0x1400045e1
+0x1400045e2
+...
+```
+
+Note that these address traces can be either instruction addresses, or basic block addresses -- it does not matter. The main caveat is that addresses in the trace *must* match the address space within the disassembler database. 
+
+If an address cannot be mapped into a function in the disassembler database, Lighthouse will simply discard it.
+
+## Custom Trace Formats
+
+If you are set on using a completely custom coverage format, you can try subclassing Lighthouse's `CoverageFile` parser interface. Once complete, simply drop your parser into the `parsers` folder.
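As a rough illustration of how simple the two text formats described in this new file are to consume, here is a minimal standalone sketch of a reader for each. The function names are hypothetical and this does not use Lighthouse's `CoverageFile` interface. The exact-match module comparison is an assumption made for brevity; Lighthouse's own matching rules may differ.

```python
def parse_modoff(lines, module):
    """Return offsets (as ints) from 'module+offset' lines that name `module`."""
    offsets = []
    for line in lines:
        line = line.strip()
        if "+" not in line:
            continue  # skip blank or malformed lines
        name, _, offset = line.rpartition("+")
        if name != module:
            continue  # hits from other modules are ignored
        try:
            offsets.append(int(offset, 16))
        except ValueError:
            continue  # offset was not valid hex
    return offsets


def parse_addresses(lines):
    """Return absolute addresses (as ints) from a one-address-per-line trace."""
    addresses = []
    for line in lines:
        try:
            addresses.append(int(line.strip(), 16))
        except ValueError:
            continue  # skip anything that is not a hex address
    return addresses


modoff_trace = ["boombox+3a06", "kernel32.dll+10", "boombox+3a09"]
address_trace = ["0x14000419c", "0x1400041a0"]

print(parse_modoff(modoff_trace, "boombox"))
print(parse_addresses(address_trace))
```

Both helpers silently drop lines they cannot interpret, mirroring the documented behavior of discarding hits that cannot be mapped.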
+
diff --git a/coverage/frida/frida-drcov.py b/coverage/frida/frida-drcov.py
@@ -227,12 +227,12 @@ def create_header(mods):
 
     header_modules = '\n'.join(entries)
 
-    return header + header_modules + '\n'
+    return ("%s%s\n" % (header, header_modules)).encode("utf-8")
 
 # take the recv'd basic blocks, finish the header, and append the coverage
 def create_coverage(data):
-    bb_header = 'BB Table: %d bbs\n' % len(data)
-    return bb_header + ''.join(data)
+    bb_header = b'BB Table: %d bbs\n' % len(data)
+    return bb_header + b''.join(data)
 
 def on_message(msg, data):
     #print(msg)
@@ -323,7 +323,7 @@ def main():
     script.on('message', on_message)
     script.load()
 
-    print('[*] Now collecting info, control-D to terminate....')
+    print('[*] Now collecting info, control-C or control-D to terminate....')
 
     sys.stdin.read()
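The `frida-drcov.py` hunks above switch the header and basic block table from text to bytes. A minimal sketch of why that matters on Python 3, assuming the received block entries are already `bytes` (the 8-byte placeholder entries below are hypothetical):

```python
# Hypothetical stand-ins for the raw basic block entries received from the target.
blocks = [b"\x00" * 8, b"\x01" * 8]

# bytes %-formatting (Python 3.5+) keeps the header the same type as the entries.
header = b"BB Table: %d bbs\n" % len(blocks)
body = header + b"".join(blocks)

# Building the header as text and appending bytes fails on Python 3.
try:
    "BB Table: %d bbs\n" % len(blocks) + b"".join(blocks)
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(body[:16], mixed_ok)
```

On Python 2, `b'...'` literals are plain `str`, so the bytes form should behave the same under both interpreters.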