ELF Snapshotting and Fuzzing #102

Closed
Kasimir123 wants to merge 7 commits


Kasimir123

This PR adds support for ELF files to WTF. For full usage instructions and a demo, check out the README in the linux_mode directory.

Added files:

  • raw2dmp
    • Program that converts the raw dump from qemu into a mem.dmp that can be read by WTF.
  • scripts
    • fuzzbkpt.py
      • Creates a class used to set the breakpoint in our executable file.
      • Sets up memory so that it can be dumped once the breakpoint is hit.
    • kernel.py
      • Allows us to access structures using gdb.
      • Lets us read from and write to memory.
    • qemu.py
      • Sets up the cpu command so that we can create the regs.json and symbol-store.json files after the dump is performed.
    • utils.py
      • Utility that lets us update the json files.
  • setup.sh
    • Sets up the environment for taking a snapshot of an elf file.
    • Installs dependencies for qemu and the python scripts.
    • Clones and builds the debug version of qemu for x86_64.
    • Makes the raw2dmp executable.
  • snapshot
    • bpkt.py
      • File used to set the breakpoint in our executable.
      • Needs to be updated with a file name and an address to break on.
    • gdb_client.sh
      • Connects to the remote gdb server and runs bpkt.py.
    • gdb_server.sh
      • Starts up the qemu image inside of gdb.
      • Can be used whenever you need to access the image.
    • move_to_fuzzer.sh
      • Converts the raw file to mem.dmp.
      • Creates a directory in targets and sets it up for fuzzing with required directories.
      • Moves mem.dmp, regs.json, and symbol-store.json into states.
      • Moves over the recompile_wtf script.
    • qemu_file_upload.sh
      • Moves a file from the local machine into the qemu image so that it can be used for snapshotting.
    • recompile_wtf.sh
      • Compiles the elf version of WTF and moves it into the target directory.
  • vars
    • Contains all of the variables needed by the other scripts.
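
To illustrate the kind of helper that utils.py is described as providing, here is a minimal sketch of merging new values into a JSON file such as regs.json. The function name update_json and the example keys are hypothetical and are not taken from the PR.

```python
import json
from pathlib import Path


def update_json(path, updates):
    """Merge `updates` into the JSON object stored at `path` (hypothetical helper)."""
    path = Path(path)
    # Start from the existing content when the file is already present.
    data = json.loads(path.read_text()) if path.exists() else {}
    data.update(updates)
    path.write_text(json.dumps(data, indent=2))
    return data
```

Calling update_json("regs.json", {"rip": "0x401000"}) would create the file if needed and leave any keys that are already present untouched.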

Modified Files:

  • utils.cc
    • Modified the file to check for a preprocessor definition to change between the normal WTF build and the elf WTF build.
    • Tested switching between the two builds. You need to run make clean and remove the CMake cache files when switching, but both versions build from the same code base.

@0vercl0k (Owner)

Woot this looks awesome, thank you for sending this in 🙏🏽🎊!

I need to finish investigating another issue that came earlier but then I will get on reviewing this 🙂.

Cheers

@0vercl0k (Owner) left a comment

Okay, I took a first stab at this, and to me the main areas that need addressing are the following:

  • Let's find out what's going on w/ the segment registers on Linux. Once that is understood, there should be no need for the ifdef, which also means we don't need a specific script to build wtf in this magic configuration.
  • Let's try to merge the gdb/qemu logic into a single Python file. The way it is split is a little bit confusing (to me) and I think it'd be clearer if we could have one script (or two if necessary) instead.
  • It'd be great if we could generate the dump automatically once the breakpoint is hit. It also means we can transparently transform the raw dump into a 'dmp' file directly from Python and get rid of raw2dmp, its Makefile, and the bits that build it in the various scripts.

Also, let me know if you are interested in working on those; otherwise, I am happy to push changes that address the above directly on your branch :)

@@ -31,7 +31,7 @@
*.out
*.app

-src/wtf/fuzzer_*
+# src/wtf/fuzzer_*

I think we should revert this change. The ignore rule lets users keep their own fuzzers in the tree and is a cheap safeguard against publishing them by mistake.

targets/

I don't think this one is relevant either.


Seg_t *Segments[] = {&CpuState.Es, &CpuState.Fs, &CpuState.Cs,
&CpuState.Gs, &CpuState.Ss, &CpuState.Ds};
#if ELF_COMPILATION != 1

Okay, I think we shouldn't need this. Either there is a bug in my sanitization below, or there is something weird going on with Linux's segment registers. Either way, it's something that should probably be fixed somewhere else.

Do you have more details on this? For example, which register fails the sanitization and what is its value? Happy to investigate.
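
One way to gather the details requested above is to print the segment registers recorded in the snapshot's regs.json. The sketch below assumes that regs.json stores each segment register as a nested object under its lowercase name (es, cs, ss, ds, fs, gs); that layout and the function name segment_summary are assumptions for illustration, so adjust them to the actual file.

```python
SEGMENTS = ("es", "cs", "ss", "ds", "fs", "gs")


def segment_summary(regs):
    """Return one line per segment register found in a regs.json-style dict."""
    lines = []
    for name in SEGMENTS:
        seg = regs.get(name)
        if not isinstance(seg, dict):
            continue  # register missing or stored in an unexpected shape
        # Print every field that is present, without assuming specific names.
        fields = ", ".join(f"{key}={seg[key]}" for key in sorted(seg))
        lines.append(f"{name}: {fields}")
    return lines
```

Loading the file with json.load and printing the returned lines gives a quick side-by-side view of the values that the sanitization code would see.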

@@ -0,0 +1,72 @@
#include "../../src/libs/kdmp-parser/src/lib/kdmp-parser-structs.h"

All right, I think it might be better to do this directly in the Python script that generates the memory dump. Based on the README, this is currently done manually via a connection to the QEMU monitor, but is this something that could be automated once the breakpoint is hit?

That would make it transparent for the user, and it also means we don't need to figure out CMake files or a Makefile for this.
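
As a rough sketch of the automation suggested here, the dump could be requested from Python as soon as the breakpoint fires. The example assumes that QEMU's pmemsave monitor command is reachable through gdb's monitor passthrough in this setup; the function name dump_on_hit and its parameters are hypothetical. The gdb.execute-like callable is passed in so the logic can be exercised outside of gdb.

```python
def dump_on_hit(execute, ram_size, raw_path):
    """Ask QEMU, via gdb's `monitor` passthrough, to write guest RAM to a file.

    `execute` is expected to behave like gdb.execute; it is injected so the
    function can be tested without a running gdb session.
    """
    # pmemsave <addr> <size> <file> is a QEMU monitor command that saves
    # physical memory. Reaching it through `monitor` is an assumption here.
    command = f"monitor pmemsave 0 {ram_size:#x} {raw_path}"
    execute(command)
    return command
```

Inside gdb, this could be called with gdb.execute from a gdb.Breakpoint subclass's stop() method or from a handler connected to gdb.events.stop. Which of the two is safe for issuing monitor commands would need checking against the gdb documentation.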

@@ -0,0 +1 @@
raw2dmp

This change should be gone once raw2dmp's logic is merged into the Python script.

@@ -0,0 +1,28 @@
# imports

Same for this file, let's merge it into a single file where all the gdb / dump logic is implemented.

Comment on lines +17 to +19
# compile raw2dmp
cd ../../raw2dmp
make

This could be removed if raw2dmp's logic is included in a Python script.

Comment on lines +8 to +10
# convert the raw dump to mem.dmp
../raw2dmp/raw2dmp raw

That part should hopefully disappear

Comment on lines +11 to +29
# sets the target folder for fuzzing
TARGET_FOLDER=${WTF}/targets/$1

# creates the target folder
mkdir ${TARGET_FOLDER}

# create the required directories for wtf
mkdir ${TARGET_FOLDER}/crashes
mkdir ${TARGET_FOLDER}/inputs
mkdir ${TARGET_FOLDER}/outputs
mkdir ${TARGET_FOLDER}/state

# move created files into the target folder
mv mem.dmp ${TARGET_FOLDER}/state/
mv regs.json ${TARGET_FOLDER}/state/
mv symbol-store.json ${TARGET_FOLDER}/state/

# move recompilation script to target folder
cp recompile_wtf.sh ${TARGET_FOLDER}/

Again, I think some of that logic should be moved to the dump stage. It is one less step for the user to remember to run, and it looks fairly easy to automate.

Also, removing the ifdef in wtf means that it doesn't need to be recompiled, so we shouldn't need that part either.
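
For reference, the directory setup in the shell snippet above is straightforward to express in Python, which is one way it could be folded into the dump stage. This is a minimal sketch: the function name setup_target and its parameters are hypothetical, while the directory and file names are the ones used by the script under review.

```python
import shutil
from pathlib import Path

SUBDIRS = ("crashes", "inputs", "outputs", "state")
STATE_FILES = ("mem.dmp", "regs.json", "symbol-store.json")


def setup_target(targets_root, name, snapshot_dir):
    """Create targets_root/name with wtf's directories and move the state files in."""
    target = Path(targets_root) / name
    for sub in SUBDIRS:
        # exist_ok makes the call safe to repeat, unlike a bare mkdir.
        (target / sub).mkdir(parents=True, exist_ok=True)
    for filename in STATE_FILES:
        source = Path(snapshot_dir) / filename
        shutil.move(str(source), str(target / "state" / filename))
    return target
```

Unlike the shell version, this does not fail when the target directory already exists, which may or may not be the desired behavior.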

@@ -0,0 +1,17 @@
#!/bin/bash

I think that if we find out what's going on w/ the registers, we shouldn't need another build script for wtf and should be able to get rid of this one.

jasocrow added a commit to jasocrow/wtf-linux-snapshot that referenced this pull request Jan 24, 2024
This is based on Kasimir123's pull request at
0vercl0k#102 plus some scripts in snapchange for
automatically setting up a Linux VM target.

The following improvements have been made as compared to Kasimir123's original
pull request:

* Fixed bug when calling mlockall, allowing us to remove page touching code
* Code requires no custom #ifdefs in wtf
* Linux snapshots work w/fuzzing via KVM. Kasimir123's code had some issues with
  gathering segment registers, and our updates fix these issues, allowing for
  KVM support
* Kasimir123's code injects shellcode into the target process by overwriting
  code, but never restored the original code. We now restore the original code
* Snapshotting is more streamlined, only taking a few manual steps once
  everything is configured
* Some improvements from 0vercl0k's suggestions from ELF Snapshotting and
  Fuzzing 0vercl0k#102, like implementing raw2dmp in Python
* Support for setting breakpoints on symbols in ELF targets plus use of symbols
  in fuzz harnesses
* IDA script for generating coverage breakpoints list so that targets can be
  fuzzed with KVM
* Target VM can run with HW acceleration enabled, Kasimir123's scripts for
  running the VM and taking a snapshot only worked with SW emulation
* Works with recent Linux kernel versions
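
The save-and-restore behavior mentioned in the shellcode item above can be sketched as a context manager. This is an illustration of the general pattern and not the code from the referenced commit; the name patched_code is hypothetical, and the read/write callables are injected so the example runs without gdb.

```python
from contextlib import contextmanager


@contextmanager
def patched_code(read_memory, write_memory, address, shellcode):
    """Temporarily overwrite code at `address`, restoring the original bytes on exit."""
    # Save exactly as many bytes as the shellcode will overwrite.
    original = bytes(read_memory(address, len(shellcode)))
    write_memory(address, shellcode)
    try:
        yield original
    finally:
        # Runs even if the body raises, so the target is never left patched.
        write_memory(address, original)
```

Inside gdb, the read_memory and write_memory methods of gdb.selected_inferior() have matching signatures and could be passed in directly.
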
@0vercl0k

Closing this as superseded by #192

@0vercl0k closed this Feb 19, 2024