Development Plan

Todo List

While we have successfully migrated StochFuzz to a new system design, there are still many areas where StochFuzz can be improved.

  • NEW SYSTEM DESIGN (daemon), which separates AFL and StochFuzz and makes advanced fuzzing possible.
  • In the release version, remove unnecessary z_log calls (e.g., z_debug and z_trace).
  • Support probabilistic disassembly.
  • Mark .text section non-writable.
  • Support C++ exceptions (via pushing the original ret_addr onto the stack).
  • When a CP_RETADDR is found, support updating the other CP_RETADDRs from the same callee.
  • Use use-def analysis on the EFLAGS register to avoid unnecessary context switching.
  • Support pre-disassembly (linear disassembly) -- this does not seem to be a good idea.
  • Support jrcxz and jecxz instructions.
  • It may be a good idea to additionally hook SIGILL caused by mis-patched instructions. In that design, exiting the program with a specific status code (in the SIGSEGV handler) is a better approach than raising SIGILL. It also avoids recursive signal handling.
  • Support retaddr patching when pdisasm is enabled (check the retaddr's probability) -- this seems impossible. Note that we cannot guarantee that control flow returns from the callee even if the return address is visited.
  • A better frontend for passing arguments.
  • Use runtime arguments to set different modes, instead of makefile.
  • Use simple linear disassembly to check the existence of inlined data.
  • Read the PLT to get library functions' names, and support a white-list for library functions.
  • Correctly handle timeout from AFL.
  • Use shared memory for .text section, to avoid the expensive patch commands.
  • Support self-correction procedure (delta debugging).
  • Support non-return analysis on UCFG, with the help of the white-list for library functions.
  • Support the on-the-fly probability recalculation.
  • Add a new flag/option to enable early instrumentation for fork server (i.e., before the entrypoint of binary).
  • Enable periodic checking (for coverage feedback) to detect false positives that do not lead to crashes.
  • Add trailing invalid instructions to basic blocks terminated by bad decoding.
  • Add a license.
  • Do not use a global sys_config, but put the options into each object.
  • The current TP_EMIT is only compatible with fuzzers compiled with AFL_MAP_SIZE = (1 << 16). We need to change the underlying implementation of TP_EMIT so that it automatically adapts to AFL_MAP_SIZE.
  • Fix the bugs in rewriting PIE binaries, and support them.
  • Place ENDBR64 instruction before the AFL trampoline. The phantom program will crash otherwise.
  • Support binaries compiled with gcc ASAN (clang would inline ASAN functions).
  • Use g_hash_table_iter_init instead of g_hash_table_get_keys.
  • Apply AddrDict wherever possible.
  • Apply Iter wherever possible.
  • Support other disassembly backends, for the initial disassembly (e.g., XDA).
  • Calculate entropy to check the existence of inlined data (ADVANCED).
  • Remove legacy code (e.g., the function of building bridges by Rewriter is no longer needed).
  • Instead of patching a fixed invalid instruction (0x2f), randomly choose an invalid instruction to patch. More details can be found here.
  • Automatically scale the number of executions triggering checking runs (based on the result of previous checking run).
  • Set the default log level as WARN (note that we need to update make test and make benchmark).
  • Use a general method to add segments in the given ELF instead of using the simple PT_NOTE trick.
  • Fix the failing GitHub Actions on Ubuntu 20.04 (the root cause is currently unknown).
  • Add more stress tests for rewriting PIE binaries.
  • Support binaries compiled with MSAN.
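For the TP_EMIT item, a minimal sketch of a map-size-aware edge index is shown below. It assumes the classic AFL edge formula (`cur_location ^ prev_location`) and a map size supplied at runtime, for instance through the `AFL_MAP_SIZE` environment variable; `parse_map_size` and `edge_index` are hypothetical helpers, not StochFuzz's actual TP_EMIT code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* The map size TP_EMIT currently assumes. */
#define DEFAULT_MAP_SIZE (1u << 16)

/* Parse a map size, e.g. the result of getenv("AFL_MAP_SIZE") (assumed source).
 * Falls back to the default when the value is missing, malformed, or not a
 * power of two, because the edge index below is computed with a mask. */
uint32_t parse_map_size(const char *s) {
    if (s == NULL) return DEFAULT_MAP_SIZE;
    char *end = NULL;
    unsigned long v = strtoul(s, &end, 10);
    if (end == s || *end != '\0' || v == 0 || v > (1ul << 30) || (v & (v - 1)) != 0)
        return DEFAULT_MAP_SIZE;
    return (uint32_t)v;
}

/* AFL-style edge index (cur ^ prev), masked to the runtime map size
 * instead of the compile-time (1 << 16). */
uint32_t edge_index(uint32_t cur_loc, uint32_t prev_loc, uint32_t map_size) {
    assert(map_size != 0 && (map_size & (map_size - 1)) == 0);
    return (cur_loc ^ prev_loc) & (map_size - 1);
}
```

Requiring a power of two keeps the index computation to a single AND, which matters in an inlined trampoline; a modulo would accept any size at a higher per-edge cost.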
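For the item on randomizing the patched invalid instruction, one possible shape is sketched below. It assumes x86-64 and draws from one-byte opcodes that raise #UD in 64-bit mode (0x2f, `das`, is one of them). The opcode table, `pick_invalid_opcode`, and the xorshift generator are illustrative choices, not the project's design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One-byte opcodes that raise #UD (delivered as SIGILL on Linux) in 64-bit mode. */
static const uint8_t kInvalidOpcodes64[] = {
    0x06, 0x07, 0x0E, 0x16, 0x17, 0x1E, 0x1F, /* push/pop es, push cs, push/pop ss, push/pop ds */
    0x27, 0x2F, 0x37, 0x3F,                   /* daa, das, aaa, aas */
    0x60, 0x61,                               /* pusha, popa */
    0xCE, 0xD4, 0xD5,                         /* into, aam, aad */
};
#define N_INVALID_OPCODES (sizeof(kInvalidOpcodes64) / sizeof(kInvalidOpcodes64[0]))

/* Marsaglia xorshift32; the state must be non-zero and stays non-zero. */
static uint32_t xorshift32(uint32_t *state) {
    assert(*state != 0);
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Pick the byte to patch at a suspected data/instruction boundary. */
uint8_t pick_invalid_opcode(uint32_t *state) {
    return kInvalidOpcodes64[xorshift32(state) % N_INVALID_OPCODES];
}

bool is_invalid_opcode(uint8_t byte) {
    for (size_t i = 0; i < N_INVALID_OPCODES; i++)
        if (kInvalidOpcodes64[i] == byte) return true;
    return false;
}
```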
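The SIGILL item suggests exiting with a dedicated status code from the signal handler instead of raising SIGILL. A minimal POSIX sketch of that idea follows; `PATCH_FAULT_STATUS`, `install_fault_handlers`, and `exited_with_patch_fault` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical exit status reserved for faults caused by mis-patched code. */
#define PATCH_FAULT_STATUS 0x6f

static void on_patch_fault(int sig) {
    (void)sig;
    /* _exit is async-signal-safe and terminates the process immediately,
     * without raising another signal from inside the handler. */
    _exit(PATCH_FAULT_STATUS);
}

/* Route both SIGSEGV and SIGILL to the same exit-with-status handler. */
int install_fault_handlers(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_patch_fault;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;
    if (sigaction(SIGILL, &sa, NULL) != 0) return -1;
    return 0;
}

/* Fuzzer-side check on a status obtained from waitpid(). */
int exited_with_patch_fault(int wstatus) {
    return WIFEXITED(wstatus) && WEXITSTATUS(wstatus) == PATCH_FAULT_STATUS;
}
```

Because the handler calls `_exit` directly, the parent only needs to inspect the exit status to recognize a mis-patch, and no signal is re-raised from within a handler.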

Challenges

We also face some challenges that may cause trouble or make StochFuzz harder to use. We are trying to resolve them.

  • The fixed LOOKUP_TABLE_ADDR is mixed with other, randomized addresses, which may cause bugs in PIE binaries.
  • The glibc code contains some overlapping instructions (e.g., instructions with the LOCK prefix), which may cause trouble for the patcher and pdisasm.

There are some other challenges introduced by the new system design.

  • The input file may be modified by a previous crashed execution, which makes the next execution incorrect. In practice, however, this seems acceptable: fuzzing is a highly repetitive procedure, so incorrect feedback is corrected automatically and quickly.
  • The timeout needs to be set separately for AFL and StochFuzz, which may be a minor inconvenience for users.
  • The auto-scaled timeout of AFL may cause incorrect error diagnosis (the dd_status may be invalid), so it is highly recommended to specify a timeout (>= 1000ms or >= AFL_HANG_TMOUT if set) for AFL by -t option, to disable the feature of auto-scaled timeout.
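The timeout recommendation above can be captured in a small helper. This is a hypothetical sketch: it takes the larger of 1000 ms and `AFL_HANG_TMOUT` (when set and numeric), so both bounds are satisfied, and the result would be passed to AFL through `-t`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: a fixed timeout (ms) for AFL's -t option that is at
 * least 1000 ms and at least AFL_HANG_TMOUT when that value is set. */
unsigned long recommended_afl_timeout_ms(const char *hang_tmout) {
    unsigned long timeout = 1000;
    if (hang_tmout != NULL) {
        char *end = NULL;
        unsigned long v = strtoul(hang_tmout, &end, 10);
        /* Ignore non-numeric values; otherwise keep the larger bound. */
        if (end != hang_tmout && *end == '\0' && v > timeout) timeout = v;
    }
    return timeout;
}
```

For example, `recommended_afl_timeout_ms(getenv("AFL_HANG_TMOUT"))` yields the value to pass as `-t <ms>`.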

Note that in the old design, we could fully control AFL, so we could create a new input file for each execution, use the same timeout, or disable the auto-scaled timeout to avoid the aforementioned challenges.

Pending Development Decisions

Currently, there are many steps we are hesitant to take, and we need to evaluate them carefully. If you have any suggestions, please let us know. We welcome any discussion about improving StochFuzz.

  • Currently, we use a lookup table to translate indirect calls/jumps on the fly. We are not sure whether it is necessary, because simply patching a jump instruction at the target address may also work well. Note that a large lookup table may increase the cache miss rate and the overhead of process forking.
  • For now, to support the advanced strategy, we maintain a retaddr mapping and perform an O(log n) online binary search to find the original retaddr when unwinding the stack. It may be better to maintain a retaddr lookup table that supports O(1) lookups. However, such a table would greatly increase memory usage, as well as the cache miss rate and the overhead of process forking.
  • Hook more signals to collect address information for better error diagnosis, which, on the other hand, may conflict with signal handlers set by the subject program.
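To make the trade-off in the first item concrete, here is a minimal sketch of a dense lookup table for indirect call/jump targets. The layout (one 8-byte slot per byte of the original `.text`) and all names are assumptions for illustration. It shows why translation is O(1) while the table grows with the code size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical dense table: one slot per byte of the original .text holding
 * the translated (shadow) address, or 0 if the target is not translated. */
typedef struct {
    uint64_t text_base;
    uint64_t text_size;
    uint64_t *slots; /* 8 bytes per byte of code */
} IndirectLookupTable;

int table_init(IndirectLookupTable *t, uint64_t base, uint64_t size) {
    t->text_base = base;
    t->text_size = size;
    t->slots = calloc(size, sizeof(uint64_t));
    return t->slots != NULL ? 0 : -1;
}

void table_set(IndirectLookupTable *t, uint64_t orig, uint64_t shadow) {
    assert(orig - t->text_base < t->text_size);
    t->slots[orig - t->text_base] = shadow;
}

/* O(1) translation; the unsigned subtraction wraps for orig < text_base,
 * so a single bounds check rejects addresses on both sides of .text. */
uint64_t table_translate(const IndirectLookupTable *t, uint64_t orig) {
    uint64_t off = orig - t->text_base;
    return off < t->text_size ? t->slots[off] : 0;
}

void table_free(IndirectLookupTable *t) {
    free(t->slots);
    t->slots = NULL;
}
```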
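The retaddr mapping from the second item can likewise be sketched as a sorted array searched with binary search. The entry layout and `lookup_orig_retaddr` are hypothetical. The sketch shows the O(log n) lookup over a table with one entry per rewritten return address, rather than one slot per byte of code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: a return address in the rewritten (shadow) code and
 * the original return address it corresponds to. */
typedef struct {
    uint64_t shadow_retaddr;
    uint64_t orig_retaddr;
} RetAddrEntry;

/* Binary search over entries sorted by shadow_retaddr: O(log n) per lookup.
 * Returns 0 when the address is not a known shadow return address. */
uint64_t lookup_orig_retaddr(const RetAddrEntry *entries, size_t n, uint64_t shadow) {
    assert(entries != NULL || n == 0);
    size_t lo = 0, hi = n; /* search in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (entries[mid].shadow_retaddr < shadow)
            lo = mid + 1;
        else if (entries[mid].shadow_retaddr > shadow)
            hi = mid;
        else
            return entries[mid].orig_retaddr;
    }
    return 0;
}
```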