Frequently Asked Questions

David Roman edited this page Feb 29, 2024 · 1 revision

Why do I keep getting import errors/warnings?

This most commonly occurs when a newer version of binwalk is installed over an older version with an incompatible API (e.g., pre-v2.0). To avoid such issues, uninstall any existing binwalk installations before installing the latest version:

$ sudo python setup.py uninstall

How do I plot entropy graphs without an X server?

The pyqtgraph module that binwalk uses to plot graphs requires an X server; this can be problematic if running binwalk in an automated fashion, particularly on a headless server.

The X virtual framebuffer (xvfb) can be used to generate entropy plots without running a graphical display; the xvfb-run wrapper script provided by most Linux distros is particularly handy:

$ xvfb-run binwalk --entropy --save firmware.bin
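
For automated runs, the same command can be wrapped in a small script. The following is a minimal sketch, not part of binwalk itself: the function names `entropy_command` and `run_entropy_scan` are hypothetical, and the flags are copied from the command shown above.

```python
import shutil
import subprocess


def entropy_command(firmware_path, use_xvfb=True):
    """Build the argument list for a headless entropy scan."""
    # Flags are taken verbatim from the xvfb-run example in this FAQ.
    cmd = ["binwalk", "--entropy", "--save", firmware_path]
    if use_xvfb:
        # xvfb-run starts a temporary virtual X server for the wrapped command.
        cmd = ["xvfb-run"] + cmd
    return cmd


def run_entropy_scan(firmware_path):
    """Run the scan, failing early if a required tool is not on PATH."""
    cmd = entropy_command(firmware_path)
    for tool in ("xvfb-run", "binwalk"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} was not found on PATH")
    # check=True turns a non-zero exit status into CalledProcessError.
    return subprocess.run(cmd, check=True)
```

Building the command as a list rather than a shell string avoids quoting problems when firmware file names contain spaces.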

Why does binwalk's signature scan report false positive results?

Binwalk does a pretty good job of analyzing potential file signatures and filtering out obvious false positives, but it is not perfect.

Some signatures are more difficult to validate than others, and binwalk will always err on the side of caution; that is, it prefers to report a potential false positive, which you can then independently validate or invalidate, over suppressing a questionable result that may be valid.

Binwalk found an XYZ file, but the XYZ utility can't process it.

This is commonly caused by a false positive, but not necessarily.

Utilities for extracting or reading certain file types may be improperly implemented, or may simply not support some features of the file type; just because your utility can't handle the file does not necessarily mean that it is a false positive.

For example, Zip files found in many firmware images won't extract properly with normal unzip utilities (they often report that the Zip file is missing an end of central directory structure). Java's jar utility, however, will extract these files just fine.
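
The "missing end of central directory" symptom can be reproduced in a few lines. This is only an illustrative sketch using Python's standard zipfile module as a stand-in for a strict unzip utility; the helper names are hypothetical, and it does not attempt to show how jar copes with such files.

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # end of central directory record


def has_end_of_central_directory(data):
    """Return True if an EOCD signature appears anywhere in the data."""
    return data.rfind(EOCD_SIGNATURE) != -1


def make_sample_zip():
    """Build a tiny, valid Zip archive in memory for the demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello from firmware")
    return buf.getvalue()


def can_open(data):
    """Report whether Python's zipfile module accepts the data."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            zf.namelist()
        return True
    except zipfile.BadZipFile:
        return False


complete = make_sample_zip()
# Simulate an embedded Zip whose trailing EOCD record is absent.
truncated = complete[: complete.rfind(EOCD_SIGNATURE)]

print(can_open(complete), can_open(truncated))
```

The truncated archive still begins with a valid local file header, so a signature scan would reasonably report it as a Zip file even though a strict reader rejects it.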

Exercising some common sense can usually help determine if binwalk or your utilities are to blame:

  • Did binwalk report a file size? If so, is it a reasonable size, or does it seem too large or too small?
  • Did binwalk report any version information for the file? If so, is it a valid version number for the XYZ file type?
  • Did binwalk report a file name or other string data from the file? If so, are the strings readable or gibberish?

If the additional data provided by binwalk points to it being a valid file, you might want to check your utilities.
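
Two of the checks above lend themselves to simple automation. The sketch below is a rough heuristic under stated assumptions: the function names and the 0.9 readability threshold are arbitrary choices for illustration, not anything binwalk provides. Version checks are omitted because they depend on the specific file type.

```python
import string

# Byte values that count as printable ASCII (includes common whitespace).
PRINTABLE = set(string.printable.encode("ascii"))


def size_is_plausible(offset, reported_size, image_size):
    """A reported size should be positive and fit inside the scanned image."""
    return reported_size > 0 and offset + reported_size <= image_size


def string_is_readable(raw, threshold=0.9):
    """Treat a byte string as readable if most bytes are printable ASCII."""
    if not raw:
        return False
    printable = sum(1 for b in raw if b in PRINTABLE)
    return printable / len(raw) >= threshold


# A 1024-byte file reported at offset 512 of a 4096-byte image fits.
print(size_is_plausible(512, 1024, 4096))
# A plain file name reads as text; random bytes do not.
print(string_is_readable(b"rootfs.cramfs"))
```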

Binwalk doesn't recognize XYZ file type.

First, check the magic files to see if binwalk has a signature for the XYZ file type. If not, you may write your own, or submit the file type for inclusion in binwalk (please provide as much information as possible regarding the file type, as well as a sample file if possible).

If binwalk has a signature for the XYZ file type and that signature is included in the scan, is it being flagged as invalid? Run binwalk with the -I option to show all invalid results:

$ binwalk -I firmware.bin

If the XYZ file type is supported but is either not detected or incorrectly flagged as invalid, please submit the issue and include a copy of the file in question (or a link to download the file).

Binwalk doesn't recognize XYZ file if only scanning the first few bytes.

This is true of certain files, such as tarballs, whose signature is located at some non-zero offset. The magic bytes for tarball files, for example, are located 257 bytes into the actual tarball file, so the scan must extend past offset 257 of a tarball file before binwalk can properly identify it.
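
The effect of the non-zero offset can be seen with a small check. This is a minimal sketch, not binwalk's own signature logic; it assumes the "ustar" magic string used by the ustar tar format, and the helper names are hypothetical.

```python
import io
import tarfile

TAR_MAGIC_OFFSET = 257
TAR_MAGIC = b"ustar"


def looks_like_tar(data):
    """Check for the tar magic at its fixed offset in the first header block."""
    end = TAR_MAGIC_OFFSET + len(TAR_MAGIC)
    return data[TAR_MAGIC_OFFSET:end] == TAR_MAGIC


def make_sample_tar():
    """Build a one-file ustar archive in memory for the demonstration."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        payload = b"hello"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


sample = make_sample_tar()
# The full archive is recognized; its first 16 bytes alone are not.
print(looks_like_tar(sample), looks_like_tar(sample[:16]))
```

Because the magic string itself is five bytes long, the check only succeeds once the data covers the bytes starting at offset 257.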

Does binwalk work on Windows?

Native Windows support is currently under development. Core functionality has been tested successfully on Windows 7 using the latest code from the master branch.

Why are some extracted files larger than expected?

This is a common source of confusion, and this concern has been raised in issues #153 and #367. The bottom line is that this is intentional and expected behavior and shouldn't be a cause for concern. For those interested in the details, keep reading...

Let's address the general problem of data carving first, as that seems to be the root of confusion regarding this question. It is important to distinguish between "extraction" and "carving" in the context of binwalk.

Carving data out of a file is simply running dd over a selected portion of the file. The carved data requires no additional manipulation. JPEG files are a good example; once a JPEG is carved out of a firmware image (or any other file for that matter), you can open the JPEG image in an image viewer/editor without any additional manipulation of the JPEG data.

Extracting data, on the other hand, requires first carving some selected data out of the firmware image, and then manipulating that carved data in order to present it in a useful manner to the user. File systems are a good example here; one can easily dd the raw bytes of a CramFS image out of a firmware file, but it's much more useful to actually extract the file system's contents (ELF files, config files, HTML, JPEG, etc.) and display them to the user. This requires carving the CramFS data out of the original file and then running some extraction utility to process the CramFS image and extract its contents.
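
The distinction can be sketched in a few lines of Python. This is an illustration only, not binwalk's implementation: a zlib stream stands in for the CramFS image (the standard library cannot unpack CramFS), and the function names are hypothetical.

```python
import zlib


def carve(image, offset, size):
    """Copy size bytes starting at offset; equivalent in spirit to a dd call."""
    return image[offset:offset + size]


def extract_zlib(image, offset, size):
    """Carve the region, then post-process it into something directly usable."""
    carved = carve(image, offset, size)
    return zlib.decompress(carved)


payload = b"config=1\n" * 20
compressed = zlib.compress(payload)
# A stand-in firmware image: padding, then the compressed blob, then padding.
image = b"\xff" * 64 + compressed + b"\xff" * 64

raw = carve(image, 64, len(compressed))              # still compressed bytes
usable = extract_zlib(image, 64, len(compressed))    # original payload
```

Here `carve` returns bytes exactly as they appear in the image, while `extract_zlib` adds the extra processing step that turns them into content a user can read.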

Binwalk is primarily a signature analysis and extraction utility, specifically focused on firmware. Can it perform general data carving? Yes, but that's really only there as a necessary means to an end. Binwalk's --carve command line option can be useful, especially if you want to examine some data for which you have a signature but no extraction utility, but you have to be aware of its limitations. Can you cram four people into a Porsche 911 and drive cross country? Sure, but there are probably better vehicles you can use for that purpose. There are many data carving utilities out there that pre-date binwalk and will do a much better job at carving out files like JPEGs, Word documents, etc., if that's what you want to do.

Why mention all this? Because automated data carving is difficult. Many common file types have no size field in their header, nor do they have any kind of end-of-file marker. Zlib compressed files are a good example of this. So what to do about it?

One option would be to say, "Well, I see that there is a zlib signature at offset 0, and a JPEG signature at offset 512, so the zlib data must only be 512 bytes long." But what if the JPEG signature is a false positive, and the zlib data is actually 1024 bytes long? Now you get nothing because your carved zlib data is truncated and can't be properly decompressed, and the JPEG image doesn't even exist because it was a false positive. Since there is always a chance for false positive signature results, binwalk does not take this approach to extraction.
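
The failure mode described above is easy to reproduce. In this hedged sketch the "false positive" is simulated by picking an offset in the middle of a zlib stream; the variable and function names are illustrative, and the offset is an arbitrary assumption.

```python
import zlib


def try_decompress(data):
    """Return the decompressed bytes, or None if the stream is unusable."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        return None


payload = b"firmware configuration data\n" * 64
blob = zlib.compress(payload)
image = blob + b"\x00" * 32

# Pretend a bogus signature was reported halfway through the zlib stream
# and that carving stopped there.
false_positive_offset = len(blob) // 2
truncated = image[:false_positive_offset]

print(try_decompress(truncated) is None)
```

Cutting the stream at the bogus signature leaves data that cannot be decompressed at all, whereas the complete stream decompresses cleanly.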

A second option would be to say, "OK, well let's write a JPEG analyzer that can examine the suspected JPEG data and validate that it actually loads as a real JPEG image; that will prevent false positives". Binwalk can, and does, do this for some selected file types, but it would be very time-consuming to do that for every single signature that binwalk supports. It would also make adding signatures very difficult and time-consuming.

To address this, binwalk simply says, "If we know the size of the data that we're carving, then only carve that size. Otherwise, take all the data up to the end of the file and let the extraction utility deal with it." While this is inefficient in terms of disk space, that is its only real drawback. Most extraction utilities don't care about trailing data, false positive signatures don't prevent real data from being extracted, and we don't have to write code to support every single signature, making adding signatures as simple as editing a text file. Additionally, there are workarounds to address disk space usage, such as the --size and --rm command line options.
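
The policy quoted above can be modeled as follows. This is a simplified sketch, not binwalk's actual code: the `carve` helper is hypothetical, and Python's zlib decompressor object plays the role of an extraction utility that tolerates trailing data.

```python
import zlib


def carve(image, offset, size=None):
    """Carve size bytes if the size is known, otherwise everything to EOF."""
    if size is None:
        return image[offset:]
    return image[offset:offset + size]


def decompress_ignoring_trailing_data(data):
    """Decompress one zlib stream and report how many trailing bytes remain."""
    decompressor = zlib.decompressobj()
    output = decompressor.decompress(data)
    output += decompressor.flush()
    # Bytes after the end of the stream are left untouched in unused_data.
    return output, len(decompressor.unused_data)


payload = b"firmware configuration data\n" * 64
blob = zlib.compress(payload)
trailing = b"\xff" * 100
image = b"\x00" * 16 + blob + trailing

# The zlib header carries no size field, so carve through to the end.
carved = carve(image, 16)
output, leftover = decompress_ignoring_trailing_data(carved)
```

The carved file is 100 bytes larger than the real zlib stream, yet the payload is recovered intact; the only cost is the extra disk space.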

So if you carve data with binwalk you should expect that any trailing data will be included unless binwalk knows how to determine the length of the data you asked binwalk to carve out (this is usually built into a signature rule, or done via a plugin). Generally, this won't prevent you from using/examining/processing the carved data, but the carved file size may be larger than you anticipated. This is expected and intentional behavior.