Vladimir Panteleev edited this page Sep 26, 2019 · 64 revisions

What is DustMite?

DustMite is a tool for minimizing D source code. It was inspired by Tigris Delta and a thread on digitalmars.D.learn. Reducing C++ code also works quite well via --split *.{c,cpp,h,hpp}:d.

DustMite will parse the source code into a simple hierarchy, and attempt to shrink it by deleting fragments iteratively, as long as the result satisfies a user-specified condition.

The author's DConf 2014 presentation, "Reducing D Bugs", covers DustMite (an introduction and thorough tutorial) in the first half.

How to install DustMite?

Building DustMite from source is very simple; see Building DustMite for instructions. DustMite is now also included with DMD as a compiled binary.

What can it be used for?

  1. For compiler developers: reducing compiler bug test cases.
  2. Finding the source of ambiguous or misleading compiler error messages (e.g. errors with the file/line information pointing inside Phobos)
  3. Alternative unit test code coverage (DustMite can remove all code that does not affect the execution of your unit tests - see below).
  4. Similarly, if you have complete test coverage, it can be used for reducing the source tree to a minimal tree which includes support for only enabled unittests. This can be used to create a version of a program or library with a test-defined subset of features.
  5. The --obfuscate option can obfuscate your code's identifiers.
  6. It can be easily adapted to work with other languages and file formats.

How is it better than Tigris Delta?

  • Easy to use (takes only two arguments, no need to fiddle with levels, catches common user mistakes)
  • Extra features
  • Native Windows support
  • Readable output (comments and indentation are preserved)
  • Native support for multiple files (accepts a path to an entire directory for input)
  • Written for D
  • Written in D
  • Not written in Perl
  • Can recognize constructs such as try/catch, function invariants (in/out/body)

How to use it?

  1. Formulate a condition command, which should exit with a status code of 0 when DustMite is on the right track, and anything else otherwise.
    • Example: dmd test.d 2>&1 | grep -F "Assertion failed" (make sure you didn't do set -o pipefail)
    • Your command will be run from inside the testing directory. You should use relative paths to the files that are being reduced.
    • It is recommended that your test command doesn't print anything to either stdout or stderr, since any output will break the progress indicator.
    • For non-trivial tests, you may want to place the commands in a shell script.
    • You can find some useful test script snippets here.
  2. Copy all the files that dustmite is to minimize to a new directory.
    • DustMite can minimize single files as well. name.ext will be treated like name/name.ext.
  3. If you'd like to test your condition command at this point, don't forget to clean up temporary files afterwards.
    DustMite will try to reduce all files from the specified directory, however at the moment it will not go further than simply trying to remove non-.d files.
  4. Run: dustmite path/to/directory test-command
    • You may safely terminate DustMite at any point. The current-best results will be in path/to/directory.reduced.
  5. After running out of nodes to try to remove, dustmite will exit. The reduced tree will be in path/to/directory.reduced.

Tutorial

This tutorial is merely a more wordy / less inclusive version of the above, with examples. See the above section for a more technical reference; this section is only meant to make DustMite more easily approachable.

Example: internal compiler error while compiling your program.

For our example, let's look at a single-file project that causes an ICE (internal compiler error) when we attempt to compile it. Many real-life cases consist of multi-module projects; the procedure is generally the same, except you'll probably be using a build tool instead of calling the compiler directly.

We can see the files (well, file) that comprise our project, and the problem we are trying to reduce in the following example shell transcript:

~/myproject $ ls
main.d
~/myproject $ dmd main.d
Internal error: e2ir.c 683

Step 1: Formulate a test command

We need to formulate a command that DustMite will use. This command needs to test for the presence of the problem in a variation of our project's code base. The exit code of the command determines the result: a zero exit code communicates that the problem still exists, while a non-zero exit code indicates that the problem does not occur in the given codebase.

For our example, we will be running the compiler, redirecting its output, and grepping it for the presence of our particular error message. grep will return a status of 0 if the sought string is encountered in its input, which is the redirected compiler output.

~/myproject $ dmd main.d 2>&1 | grep -F "Internal error"
Internal error: e2ir.c 683
~/myproject $ echo $?
0

The entire test command for our example is: dmd main.d 2>&1 | grep -F "Internal error".

Note that DustMite will first chdir to a temporary, test directory, so any paths in your test command should be relative to your codebase's root (as seen above). It's OK if the test command leaves behind temporary/intermediary files.

If your test command gets unwieldy, feel free to put it in a shell script, and pass the path to the script to DustMite. Since the test script isn't part of the minimized codebase, you shouldn't put it in the same directory as your source files. Consider placing it in the parent directory - the same directory containing myproject. Then, the test command you'll need to pass to DustMite would be ../my-test-script.sh (.. because it will be run from inside the temporary test directory).

You can find some example test scripts for common tasks here.
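
The wrapper script described above could be built around a helper like the following. This is a minimal sketch, not one of the scripts from the linked page; the function name reproduces is made up for illustration, and the pattern and compiler command are passed in as arguments.

```shell
#!/bin/sh
# Sketch of a helper for ../my-test-script.sh (hypothetical, for illustration).
# reproduces PATTERN CMD...: runs CMD and succeeds only if its combined
# stdout/stderr contains PATTERN as a fixed string.
reproduces() {
    pattern=$1
    shift
    # grep -q prints nothing, so DustMite's progress indicator stays intact.
    # Without pipefail, the pipeline's status is grep's status.
    "$@" 2>&1 | grep -qF -- "$pattern"
}

# The last command of the script decides its exit status, for example:
#   reproduces "Internal error" dmd main.d
```

Because the helper's status is the script's status, DustMite sees 0 only while the message is still present in the compiler output.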

Step 2: Prepare the file set to minimize

We should create a copy of our codebase for DustMite to work with. DustMite doesn't know what to do with intermediate files, binaries, VCS files, and other files that are irrelevant to reproducing our problem. They will only slow down the reduction process, so the copy should contain only the bare minimum of files required to reproduce the problem.

You don't need to perform this step if your project directory is already "clean", and has no unnecessary files.

~/myproject $ find . -name '*.o' -delete
# -- OR --
~/myproject $ cd ..
~ $ mkdir myproject-clean
~ $ cp myproject/*.d myproject-clean/

Step 3: Run DustMite

It's time to call DustMite, and tell it where the files to reduce are, and which command it should use to check if it's on the right track. Those would be the first and second parameters to dustmite, respectively.

~ $ dustmite myproject-clean 'dmd main.d 2>&1 | grep -F "Internal error"'
None => Yes
############### ITERATION 0 ################
( ... )
Done in 1 sec and 728 ms; reduced version is in myproject-clean.reduced
~ $ ls myproject-clean.reduced/
main.d

As the output tells us, a minimal version of our codebase (which still exhibits the problem tested for by our test command) can be found in input-directory.reduced (here, myproject-clean.reduced).

Troubleshooting

Initial test fails

The command you specified returned a non-zero status for the input data set. DustMite can't know which reductions are helpful if it can't start with a known good state.

Try running DustMite with the --no-redirect switch to see the output of the failing test command. This might reveal a useful error message which could indicate the source of the problem.

Reduced to empty set

As above - the command you specified returned a zero status even for an empty directory. Common causes are commands which always return zero, or commands which incorrectly use absolute file names.
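
One way to catch this before starting a run is to try the test command in an empty directory yourself. The sketch below is an illustration only: the function name passes_in_empty_dir is made up, and it assumes mktemp -d is available on your system.

```shell
#!/bin/sh
# passes_in_empty_dir CMD: runs the shell command string CMD inside a fresh,
# empty directory and returns its exit status. A zero status means the
# command would also accept a fully reduced (empty) input.
passes_in_empty_dir() {
    empty=$(mktemp -d) || return 2
    ( cd "$empty" && sh -c "$1" ) >/dev/null 2>&1
    status=$?
    rm -rf "$empty"
    return "$status"
}

# Example check (the command is the one from the tutorial):
#   if passes_in_empty_dir 'dmd main.d 2>&1 | grep -F "Internal error"'; then
#       echo "test command succeeds even without any input files" >&2
#   fi
```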

Result directory already exists

The directory <inputpath>.reduced exists, probably from a previous DustMite run. DustMite will exit to avoid overwriting a previous run's results. Delete the directory to start a fresh run.

Error while attempting to ..., retrying

DustMite couldn't clean up a temporary directory. This can happen on Windows if the directory or one of its files is opened by another process, e.g. a file manager or an antivirus program. DustMite will automatically retry in a second; if the problem doesn't go away, try using Process Explorer to find the process keeping an open handle to the specified file / directory.

"Permission denied" on POSIX systems

DustMite does not preserve file permissions. If the codebase you are trying to reduce needs to contain executable files, you need to add a chmod to your test script/command.

Note that DustMite currently does not support reducing the contents of any file types that require executable permissions (e.g. shell scripts), so there should rarely be a reason to include executable files in the reduced codebase. If the file in question is the test script you've written for DustMite, consider placing it directly outside the directory, and calling it using ../my-test-script.sh.
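
If executable files do have to stay in the reduced codebase, the chmod step can go at the top of the test script. A minimal sketch, in which the function name restore_exec and the file name in the usage comment are hypothetical:

```shell
#!/bin/sh
# restore_exec FILE...: re-add the executable bit to each listed file.
# Files that no longer exist (e.g. removed during reduction) are skipped.
restore_exec() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            chmod +x "$f"
        fi
    done
    return 0
}

# At the top of a test script, before the actual check:
#   restore_exec tools/generate.sh
```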

Hangs or error message pop-ups during the reduction process

Sometimes, DustMite may create a program that results in a crash or infinite loop during compilation or program execution. See the Useful test scripts page for information on how to deal with such situations.
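
For the infinite-loop case, one possible guard is a time limit around the command under test. The sketch below assumes GNU coreutils' timeout is installed and is not necessarily what the Useful test scripts page recommends; the function name fails_within is made up for illustration, and it does not address error pop-ups.

```shell
#!/bin/sh
# fails_within SECS CMD...: succeeds only if CMD exits with a non-zero status
# on its own before SECS seconds pass. Hitting the limit (status 124 from
# GNU timeout) counts as "problem not reproduced", so variants that hang
# are rejected instead of stalling the reduction.
fails_within() {
    secs=$1
    shift
    timeout "$secs" "$@" >/dev/null 2>&1
    status=$?
    [ "$status" -ne 0 ] && [ "$status" -ne 124 ]
}

# Example: accept a variant only if ./main crashes within 10 seconds:
#   fails_within 10 ./main
```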

How does it work?

A DustMite run has the following general outline:

  1. Parse options and validate input.
  2. Load all files into memory, and parse .d files into a hierarchical data structure.
    • The data structure is a tree, where each node has a head, a list of children, and a tail.
    • head and tail are strings – slices over the original file.
    • The top-most level of the hierarchy represents files. The second level represents top-level constructs in each file, etc. Files don't have a head or tail, only the filename and children.
    • When traversing the tree in the order head - children - tail, all of the slices will cover the entire file, without any gaps or overlapping.
    • The basic idea of parsing D source code is simple: } and ; are block-terminators.
  3. Optimize the tree. Currently, this is done by rearranging all children into a binary tree.
  4. "Test" the input data without any modifications. If the test command fails (exits with a non-zero status), abort.
    • "Testing" means saving the current hierarchy to a temporary directory (path/to/directory.test), chdir-ing to it, and running the user-specified command.
  5. Iteratively attempt to remove subtrees of the data. If the test command succeeds after removing a subtree, the subtree is removed permanently and the process continues.
  6. When DustMite can't find a node to remove that doesn't cause the test command to fail, it considers the reduction complete.
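
The test-and-remove loop in steps 4 to 6 can be illustrated with a deliberately simplified sketch. It is not DustMite's implementation: it works on the lines of a single file instead of subtrees of a parsed hierarchy, and it edits the file in place rather than writing to a separate test directory. The function name reduce_lines is made up for illustration.

```shell
#!/bin/sh
# reduce_lines FILE CMD: greedy, line-granular reduction of FILE.
# CMD is a shell command string that exits 0 while the problem still occurs.
reduce_lines() {
    file=$1
    cmd=$2
    # Step 4: the unmodified input must satisfy the condition.
    sh -c "$cmd" >/dev/null 2>&1 || { echo "Initial test fails" >&2; return 1; }
    progress=1
    while [ "$progress" -eq 1 ]; do
        progress=0
        i=$(grep -c '' "$file")      # number of lines currently in FILE
        # Walk from the last line to the first, so that deleting line i
        # never shifts the lines that are still to be tried.
        while [ "$i" -ge 1 ]; do
            cp "$file" "$file.bak"
            sed "${i}d" "$file.bak" > "$file"
            if sh -c "$cmd" >/dev/null 2>&1; then
                progress=1               # step 5: keep the removal
            else
                cp "$file.bak" "$file"   # condition broken: undo
            fi
            i=$((i - 1))
        done
    done                             # step 6: a full pass removed nothing
    rm -f "$file.bak"
}
```

For example, reduce_lines notes.txt 'grep -q keep notes.txt' leaves only the lines without which grep no longer finds "keep".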

Advanced usage

Windows is slow

Because process creation is comparatively much slower on Windows, a minimization will take a lot more time on Windows than on Unix-based operating systems. If possible, use a Unix-based OS to minimize large code bases.

Intermediate results

You can preview intermediate results by peeking inside path/to/directory.reduced. You should have no problems on *nix, but on Windows entering the directory may cause DustMite to pause, since it may not be able to clean up the directory when overwriting the files. DustMite will keep retrying automatically, and will resume once you leave the directory.

You can watch DustMite's progress in real time using a shell script such as the following:

#!/bin/sh
watch -cn 0.1 "zsh -c 'ls -al $1.reduced/**/*.d ; colordiff -ru $1.reduced $1.test'"

If src contains the directory with the code being reduced and src.test / src.reduced are the test/result directories, invoke the script as dustmite-watch src.

A more elaborate version of this script, which also works with DustMite's -j option, is available here.

Command-line options

DustMite has a few command-line options. Run dustmite --help to see them.

Useful test scripts

You can find several test scripts for common tasks (e.g. timeouts, detecting specific segfaults) on this page.

Minimizing the standard library

A fully-minimized test case for a compiler bug shouldn't depend on the standard library. There are a few ways to minimize Phobos along with the rest of your code.

One way is to rename the entire Phobos package:

  1. Copy the std directory to the input directory
  2. Rename it to mystd
  3. Search and replace std. with mystd. (in both Phobos and your code)
    • Example command: find . -name '*.d' | xargs perl -pi -e 's/\bstd([\.)])/mystd$1/g'

Note that the compiler hard-codes some names in Phobos, e.g. for std.math internals.

Alternatively, after copying Phobos to your project, remove the standard location from the compiler search path. You will also need to explicitly compile and link your local version of the Phobos sources together with your code - otherwise, the linker will use the version of the code from the pre-compiled static library (phobos.lib / libphobos.a). If you build your test case with a build tool, it should take care of this - but watch out for default package exclusion options (e.g. with rdmd, you'll need to add --include std).

Selective minimization

Selective minimization may be useful if you don't want to remove certain blocks from the input, which would otherwise satisfy your test condition.
For example, you may not want DustMite to remove unittest blocks if you're testing for unit test coverage.

dustmite has a --noremove option, which takes a regular expression. DustMite will not remove nodes whose head or tail is covered by any of the specified regular expressions.

--noremove also applies to file names. File names are the files' paths relative to the root of the test directory, and use forward slashes as directory separators on all platforms.

You can also surround code that is not to be removed with the magic words DustMiteNoRemoveStart and DustMiteNoRemoveStop. Note that if you place them in comments, you won't be able to use --strip-comments.

If you need DustMite not to remove parts of the code that actually get executed at runtime, you can use DMD's -cov option in combination with DustMite's --coverage option to instruct DustMite not to remove covered lines.

Alternatively, you may:

  • test the presence of the desired blocks in your test script using a combination of grep / wc -l
  • place unittests/etc. in a separate file outside of the input file set, then copy or concatenate it before testing
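
The grep / wc -l check from the first alternative could look like the sketch below. It is a rough textual count (a comment mentioning "unittest" is counted too), and the function name keeps_unittests is made up for illustration.

```shell
#!/bin/sh
# keeps_unittests MIN FILE...: succeeds if the listed files together still
# contain at least MIN lines mentioning "unittest". Missing files simply
# contribute nothing to the count.
keeps_unittests() {
    min=$1
    shift
    # $(( ... )) strips the padding some wc implementations print.
    count=$(( $(cat "$@" 2>/dev/null | grep -F 'unittest' | wc -l) ))
    [ "$count" -ge "$min" ]
}

# In a test script, combined with the actual condition:
#   keeps_unittests 3 main.d && dmd -unittest main.d ...
```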

Ordered minimization

Combine several dustmite calls with selective minimization (e.g. different --noremove parameters) to control the order of minimization. Example:

# Reduce everything outside the gtkd package
dustmite testdir ../testscript --noremove "^gtkd/.*\.d$"
# Now reduce everything else
mv testdir.reduced testdir_pass2
dustmite testdir_pass2 ../testscript
# Final result is now in testdir_pass2.reduced

Obfuscation

DustMite can obfuscate your program if you pass the --obfuscate switch. In this mode, DustMite will collect a list of words in the program, and attempt to substitute each in turn with incremental or randomly-generated ones.

By default, DustMite will generate substitutions in lexicographical order. If you need to preserve identifier lengths (e.g. when reducing linker problems), use the --keep-length switch.

Custom parsers

If you'd like to add support for a custom language or file format, see the Entity structure and the loadFile function from the dsplit module.