Signature File Format

E:V:A edited this page Nov 26, 2018 · 15 revisions

The Basics

Binwalk's signature file format is based on the libmagic file format and is mostly compatible with signatures created for the UNIX file utility. This makes creating, customizing and sharing signatures very easy.

To understand the basic format of a signature, let's create a new signature for a fictitious firmware header. The header structure is:

struct header
{
   char magic[4];        //Magic bytes are: 'SIG0'
   char description[12];
   int32_t header_size;
   int32_t image_size;
   int32_t creation_date;
};

The resulting magic signature for this header format looks like:

0    string    SIG0     SIG0 firmware header,
>4   string    x        description: "%s",
>16  lelong    x        header size: %d,
>20  lelong    x        size: %d,
>24  ledate    x        date: %s

Most practical signatures are not much more complex than this.
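
To make the mapping between the header structure and the signature offsets concrete, here is a minimal Python sketch that reads the same fields with the standard struct module. It is not part of binwalk; the function name parse_sig0_header and the sample values are made up for illustration, and little-endian byte order is assumed because the signature uses lelong and ledate.

```python
import struct

# Layout from the header structure above: 4-byte magic, 12-byte description,
# then three 32-bit integers. Little endian is an assumption taken from the
# lelong/ledate types used in the signature.
HEADER_FORMAT = "<4s12siii"

def parse_sig0_header(data):
    """Return the header fields as a dict, or None if this is not a SIG0 header."""
    if len(data) < struct.calcsize(HEADER_FORMAT):
        return None
    magic, description, header_size, image_size, creation_date = struct.unpack_from(
        HEADER_FORMAT, data, 0
    )
    if magic != b"SIG0":          # offset 0: the magic bytes
        return None
    return {
        # offset 4: description, cut at the first NULL byte
        "description": description.split(b"\x00", 1)[0].decode("ascii", "replace"),
        "header_size": header_size,      # offset 16
        "image_size": image_size,        # offset 20
        "creation_date": creation_date,  # offset 24, UNIX timestamp
    }

# Hypothetical sample header built only for this example.
sample = struct.pack(HEADER_FORMAT, b"SIG0", b"demo", 28, 4096, 1543190400)
fields = parse_sig0_header(sample)
print(fields)
```

The offsets 0, 4, 16, 20 and 24 used in the signature fall directly out of the field sizes in the structure.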

There are four columns for each line:

  • The first column is the data offset.
  • The second column is the data type.
  • The third column is the condition field (x is a wildcard matching anything).
  • The fourth column is the optional text and data formatting to display.

The first line of any signature contains the actual "magic bytes" which uniquely identify that signature (the string SIG0 in the above example).

Note the use of the indent level character (>) on all except the first line.

All comments begin with the pound sign #.
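
The four-column layout described above can be sketched as a small Python line splitter. This is only an illustration of the format, not binwalk's own parser: the function name parse_signature_line is hypothetical, and the sketch ignores details such as escaped spaces inside string conditions and trailing inline comments.

```python
def parse_signature_line(line):
    """Split one signature line into its columns; return None for blanks and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # The first three columns are whitespace separated; the fourth (optional)
    # column is the remaining text and may itself contain spaces.
    parts = line.split(None, 3)
    if len(parts) < 3:
        raise ValueError("expected at least offset, type and condition: %r" % line)
    offset_field, data_type, condition = parts[:3]
    display = parts[3] if len(parts) > 3 else ""
    # Each leading '>' raises the indentation level by one.
    offset = offset_field.lstrip(">")
    level = len(offset_field) - len(offset)
    return {
        "level": level,
        "offset": offset,
        "type": data_type,
        "condition": condition,
        "format": display,
    }

parsed = parse_signature_line(">16  lelong    x        header size: %d,")
print(parsed)
```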

Supported Data Types

Binwalk signature files support the following data types:

Data Type    Description
byte         1-byte (8-bit) integer
short        2-byte integer
long         4-byte integer
quad         8-byte integer
date         4-byte UNIX date field
string       Arbitrary sequence of bytes
regex        A regular expression to be matched

All integer data types (byte, short, long, quad, date) support the following endianness prefixes:

Prefix    Example    Description
be        belong     Big endian
le        lelong     Little endian

If no endianness prefix is provided, big endian is assumed. Best practice dictates that the endianness should be explicitly specified.

All integer data types (byte, short, long, quad, date) also support the following signedness prefixes:

Prefix    Example    Description
u         ubelong    Unsigned

If no signedness prefix is specified, the value is assumed to be signed.
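
The effect of the endianness and signedness prefixes is easiest to see by decoding the same bytes several ways. The Python sketch below is an illustration only: the decode helper and the SIZES table are hypothetical names, and the prefix order (u first, then be or le) is assumed from the ubelong example above.

```python
# Sizes in bytes of the integer data types listed above.
SIZES = {"byte": 1, "short": 2, "long": 4, "quad": 8, "date": 4}

def decode(type_name, data, offset=0):
    """Decode an integer type such as 'lelong' or 'ubelong' from data."""
    signed = not type_name.startswith("u")   # no 'u' prefix means signed
    name = type_name if signed else type_name[1:]
    byteorder = "big"                         # default when no prefix is given
    if name.startswith("le"):
        byteorder, name = "little", name[2:]
    elif name.startswith("be"):
        name = name[2:]
    size = SIZES[name]
    return int.from_bytes(data[offset:offset + size], byteorder, signed=signed)

raw = bytes([0xFF, 0xFF, 0xFF, 0xFE])
for type_name in ("belong", "ubelong", "lelong", "ulelong"):
    print(type_name, decode(type_name, raw))
```

The same four bytes produce four different values, which is why specifying the prefix explicitly is worthwhile.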

Comparison Operators

The following comparison operators are supported when evaluating the condition field:

Condition    Example    Description
=            =0x1234    True if the value from the file equals the specified value
!            !0x1234    True if the value from the file does not equal the specified value
<            <0x1234    True if the value from the file is less than the specified value
>            >0x1234    True if the value from the file is greater than the specified value
&            &0x1234    True if the value from the file is not zero when ANDed with the specified value
^            ^0x1234    True if the value from the file is not zero when XORed with the specified value

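If no comparison operator is specified, = is assumed. The following signature uses the > operator so that the size is only displayed when it is greater than zero: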

# SIG0 firmware signature
0    string    SIG0     SIG0 firmware header,
>4   string    x        description: "%s",
>16  lelong    x        header size: %d,
>20  lelong    >0       size: %d, # This line is only processed if the value at offset 20 is greater than 0
>24  ledate    x        date: %s
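
The operator semantics in the table above can be sketched in a few lines of Python. This models integer conditions only and is not binwalk's implementation; check_condition and COMPARATORS are hypothetical names.

```python
import operator

# One function per comparison operator from the table above.
COMPARATORS = {
    "=": operator.eq,
    "!": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "&": lambda file_value, sig_value: (file_value & sig_value) != 0,
    "^": lambda file_value, sig_value: (file_value ^ sig_value) != 0,
}

def check_condition(condition, file_value):
    """Evaluate an integer condition field such as '>0' or '0x1234'."""
    if condition == "x":               # wildcard: matches anything
        return True
    if condition[0] in COMPARATORS:
        op, literal = condition[0], condition[1:]
    else:
        op, literal = "=", condition   # no operator given: '=' is assumed
    # int(..., 0) accepts both decimal and 0x-prefixed hexadecimal literals.
    return COMPARATORS[op](file_value, int(literal, 0))

print(check_condition(">0", 4096))
print(check_condition(">0", 0))
```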

Conditional Statements

The greater than sign > at the beginning of a signature line indicates that line's indentation level. Lines with higher indentation levels (more > characters) are only processed if the comparison from the preceding line evaluated to True. This allows the creation of basic conditional if statements inside the signature:

# SIG0 firmware signature
0    string    SIG0     SIG0 firmware header,
>4   byte      !0
>>4  string    x        description: "%s", # This line is only processed if the byte at offset 4 is not 0
>16  lelong    x        header size: %d,
>20  lelong    x        size: %d,
>24  ledate    x        date: %s
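
One way to model the indentation rule is to track the deepest level that is currently allowed to run. The Python sketch below is a simplified model under stated assumptions, not binwalk's actual code: lines_to_process is a hypothetical name, and each line's comparison result is supplied up front as a boolean rather than computed from file data.

```python
def lines_to_process(lines):
    """
    lines: list of (level, matched) tuples in signature order, where level is
    the number of leading '>' characters and matched is the comparison result.
    Returns the indexes of the lines that would be processed.
    """
    processed = []
    max_level = 0                      # deepest level currently allowed
    for index, (level, matched) in enumerate(lines):
        if level > max_level:
            continue                   # the governing shallower line did not match
        processed.append(index)
        if matched:
            max_level = level + 1      # allow lines one level deeper
        elif level == 0:
            break                      # magic bytes did not match: stop
        else:
            max_level = level          # block deeper lines, keep siblings

    return processed

# The six lines of the signature above, for a file whose byte at offset 4 is 0:
# the '>4 byte !0' test fails, so the '>>4' description line is skipped.
example = [(0, True), (1, False), (2, True), (1, True), (1, True), (1, True)]
print(lines_to_process(example))
```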

Arithmetic Operators

Various arithmetic expressions can be applied to either the offset field or the data type field:

Expression    Example        Description
&             belong&0xFF    Bitwise AND
|             belong|0xFF    Bitwise OR
^             belong^0xFF    Bitwise XOR
<<            belong<<4      Logical left shift
>>            belong>>4      Logical right shift
**            belong**4      Exponent
+             belong+4       Addition
-             belong-4       Subtraction
*             belong*4       Multiplication
/             belong/4       Division

A simple practical example of this is the BSD 2.x filesystem, which specifies its size in kilobytes; thus, to display the size in bytes, the size field must be multiplied by 1024 before being displayed:

# BSD 2.x file system image; used in RetroBSD for PIC32
0        string        FS\x3C\x3C       BSD 2.x filesystem,
>8       lelong        x                size: %d kilobytes,
>8       lelong*1024   x                size: %d bytes,
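
For comparison, the lelong*1024 expression corresponds to the following Python sketch, which reads the little-endian value at offset 8 and scales it. The function name bsd_fs_size_bytes and the sample data are hypothetical and exist only for this example.

```python
import struct

def bsd_fs_size_bytes(data):
    """Return the file system size in bytes, or None if the magic does not match."""
    if len(data) < 12 or data[:4] != b"FS\x3C\x3C":
        return None
    # Equivalent of '>8 lelong': signed 32-bit little-endian value at offset 8.
    (size_kilobytes,) = struct.unpack_from("<i", data, 8)
    # Equivalent of 'lelong*1024': convert kilobytes to bytes before display.
    return size_kilobytes * 1024

# Made-up header: magic, four padding bytes, then a size of 16 kilobytes.
sample = b"FS\x3C\x3C" + b"\x00" * 4 + struct.pack("<i", 16)
print("size: %d bytes" % bsd_fs_size_bytes(sample))
```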

Indirect Offsets

For file types with variable size fields, values inside the file itself may be used to specify the offset:

Syntax    Description
(4.l)     The offset is a little-endian long value, located 4 bytes into the file
(4.L)     The offset is a big-endian long value, located 4 bytes into the file
(4.s)     The offset is a little-endian short value, located 4 bytes into the file
(4.S)     The offset is a big-endian short value, located 4 bytes into the file
(4.b)     The offset is a single byte value, located 4 bytes into the file
(4.B)     The offset is a single byte value, located 4 bytes into the file

A simple practical example is the Microsoft PE file format, in which offset 60 holds a 4-byte little-endian pointer to the PE header:

0          string    MZ    Microsoft
>(60.l)    string    PE    portable executable # The PE header starts with "PE"
>(60.l)    string    !PE   MS-DOS executable   # If no PE header, it must be MS-DOS

Arithmetic operators can also be applied to indirect offsets:

0          string    MZ    Microsoft
>(60.l)    string    PE    portable executable
>(60.l)    string    !PE   MS-DOS executable
>(60.l+4)  lelong    x     0x%X # Print out the four byte value at PE header + 4
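
An indirect offset is a pointer dereference: read a value at a fixed position, then use it as the offset for the actual test. The Python sketch below illustrates this for the MZ/PE signature. The names resolve_indirect and classify_mz are hypothetical, and treating the pointer as unsigned is an assumption made for this example.

```python
import struct

# struct formats for each indirect offset suffix; unsigned is an assumption.
INDIRECT_FORMATS = {"l": "<I", "L": ">I", "s": "<H", "S": ">H", "b": "B", "B": "B"}

def resolve_indirect(data, base, kind, adjust=0):
    """Resolve an offset such as (60.l) or (60.l+4) to an absolute file offset."""
    (pointer,) = struct.unpack_from(INDIRECT_FORMATS[kind], data, base)
    return pointer + adjust

def classify_mz(data):
    """Mirror the MZ signature above: PE header present or plain MS-DOS."""
    if len(data) < 64 or data[:2] != b"MZ":
        return None
    pe_offset = resolve_indirect(data, 60, "l")
    if data[pe_offset:pe_offset + 2] == b"PE":
        return "Microsoft portable executable"
    return "Microsoft MS-DOS executable"

# Made-up file: pointer at offset 60 refers to a PE header at offset 128.
pe_file = bytearray(256)
pe_file[0:2] = b"MZ"
struct.pack_into("<I", pe_file, 60, 128)
pe_file[128:132] = b"PE\x00\x00"
print(classify_mz(bytes(pe_file)))
```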

Tagging Metadata

Binwalk supports the use of special "tags" which give signatures additional control over the scan process.

All tags are enclosed in braces {}. Tag keywords which require arguments should be followed by a colon : and the required argument. Tag arguments can be hardcoded values (e.g., {size:14}), or format strings (e.g., {size:%d}).

Currently supported tags are:

Keyword     Argument Type    Description
adjust      int              Adjusts the reported signature offset by n bytes
invalid     N/A              Marks a signature as invalid
jump        int              Tells binwalk to jump to the specified offset and resume scanning
many        N/A              Tells binwalk to only display the first hit to a signature, even if the hits do not directly follow each other
name        str              Specifies the name of the file (used during extraction)
location    int              Specifies an expected offset where the signature should be found in a file
size        int              Specifies the size of the file (used during extraction)
string      N/A              Truncates any strings on the current line to strlen bytes
strlen      int              Specifies the size of a string (used with string)
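
To illustrate the brace syntax, the following Python sketch separates tags from the text that would be shown to the user. It assumes the format strings have already been filled in (for example, {size:%d} has become {size:4096}). The names TAG_PATTERN and split_tags are hypothetical, and this is not how binwalk itself is implemented.

```python
import re

# A tag is {keyword} or {keyword:argument}.
TAG_PATTERN = re.compile(r"\{(\w+)(?::([^}]*))?\}")

def split_tags(text):
    """Return (visible_text, tags) for one formatted signature result."""
    tags = {}
    for match in TAG_PATTERN.finditer(text):
        keyword, argument = match.group(1), match.group(2)
        # Tags without an argument, such as {invalid}, are stored as True.
        tags[keyword] = True if argument is None else argument
    visible = TAG_PATTERN.sub("", text).strip()
    return visible, tags

visible, tags = split_tags("size: 4096 {size:4096}{jump:4096}")
print(visible)
print(tags)
```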

Tag a Signature Result as Invalid

The most common tag is invalid, which can be used to build false positive detection directly into any magic signature:

0    string    SIG0     SIG0 firmware header,
>4   string    x        description: "%s",
>16  lelong    x        header size: %d,
>20  lelong    <1       {invalid} # Firmware size shouldn't be 0 bytes or less
>20  lelong    x        size: %d,
>24  ledate    x        date: %s

All other tags are ignored if an invalid tag is encountered while applying a signature to a block of data.

Specify a Signature's Data Size

Tags that take no arguments, such as invalid, are straightforward to use, and tags that accept arguments are nearly as simple. The most commonly used tags that require arguments are size and jump. The size tag specifies how much data to extract from a file, and the jump tag specifies a relative offset from which binwalk should continue scanning.

For example, let's say we have a file system named FooBar, and in the FooBar file system header there is a size field that says how big the file system is:

0    string    FooBar\x00\x00   FooBar filesystem,
>8   lelong    x                size: %d

During extraction, we would only want to extract size bytes of data from the input file. We can tell binwalk about the size field using the size tag:

0    string    FooBar\x00\x00   FooBar filesystem,
>8   lelong    x                size: %d
>8   lelong    x                {size:%d}

Likewise, we probably don't want binwalk to waste its time scanning all the data inside the FooBar file system, so we can tell binwalk to jump ahead by size bytes:

0    string    FooBar\x00\x00   FooBar filesystem,
>8   lelong    x                size: %d
>8   lelong    x                {size:%d}
>8   lelong    x                {jump:%d}

The size and jump tags will not be displayed to the end user, but only used internally by binwalk.
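
Conceptually, the two tags tell a scanner how much data to carve out and where to resume. The Python sketch below shows that idea for the fictitious FooBar header. It is a simplified model and not binwalk's scanning loop: scan_foobar and the sample data are hypothetical, and the size field is assumed to cover the whole file system including its header.

```python
import struct

MAGIC = b"FooBar\x00\x00"

def scan_foobar(data):
    """Find a FooBar header; return (extracted_bytes, resume_offset) or None."""
    hit = data.find(MAGIC)
    if hit < 0 or len(data) < hit + 12:
        return None
    # '>8 lelong': the size field sits 8 bytes after the start of the header.
    (size,) = struct.unpack_from("<i", data, hit + 8)
    if size < 1:
        return None                        # guard added for this sketch
    extracted = data[hit:hit + size]       # effect of {size:%d}
    resume_offset = hit + size             # effect of {jump:%d}
    return extracted, resume_offset

# Made-up input: 4 junk bytes, a 16-byte file system, then trailing data.
filesystem = MAGIC + struct.pack("<i", 16) + b"DATA"
data = b"\xff" * 4 + filesystem + b"tail"
result = scan_foobar(data)
print(result)
```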

Specifying the Length of Non-NULL-Terminated Strings

The strlen and string tags are used when a string is not NULL-terminated and its length is stored separately from the string itself.

A simple practical example is the ZIP archive header, which includes the name of the archived file:

0       string      PK              ZIP archive,
>26     leshort     x               length of file name: %d bytes,
>26     leshort     x               {strlen:%d} # The strlen tag must come before the string tag
>30     string      x               file name: {string}%s # Only strlen bytes of this string will be printed
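
The same length-then-string lookup can be reproduced in Python against a real ZIP archive built in memory with the standard zipfile module. The function name first_zip_entry_name is hypothetical. The length is read as unsigned here, whereas the signature above uses the signed leshort type; the two agree for any realistic file name length.

```python
import io
import struct
import zipfile

def first_zip_entry_name(data):
    """Read the file name from the ZIP header at the start of data."""
    if len(data) < 30 or data[:2] != b"PK":
        return None
    # Offset 26: 2-byte little-endian length of the file name ({strlen:%d}).
    (name_length,) = struct.unpack_from("<H", data, 26)
    # Offset 30: the name itself, not NULL-terminated, so slice by length.
    return data[30:30 + name_length].decode("utf-8", "replace")

# Build a small archive in memory so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", b"hi")

name = first_zip_entry_name(buffer.getvalue())
print("file name:", name)
```

Without the length, a reader would run past the end of the name into the file data that follows it, which is the problem the strlen and string tags solve.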