Dion Mendel edited this page Jun 25, 2023 · 24 revisions

Primitive Types

BinData supports the common primitive types that are used when working with binary data. Namely:

  • length based strings
  • zero terminated strings
  • byte based integers - signed or unsigned, big or little endian and of any size
  • bit based integers - signed or unsigned, big or little endian integers of any size
  • floating point numbers - single or double precision floats in either big or little endian

Primitives may be manipulated individually, but it is more common to work with them as part of a record.

Examples of individual usage:

int16 = BinData::Int16be.new(941)
int16.to_binary_s #=> "\003\255"

fl = BinData::FloatBe.read("\100\055\370\124") #=> 2.71828174591064
fl.num_bytes #=> 4

fl * int16 #=> 2557.90320057996
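
The byte strings in the example above can be cross-checked without BinData. The following is a minimal sketch in plain Ruby, using Array#pack and String#unpack1 from the core library. The helper names int16be_bytes and float_be_value are made up for illustration and are not part of BinData.

```ruby
# Plain Ruby cross-check of the byte strings above (no BinData required).

# "s>" packs a signed 16 bit big endian integer, the layout of Int16be.
def int16be_bytes(n)
  [n].pack("s>")
end

# "g" unpacks a single precision big endian float, the layout of FloatBe.
def float_be_value(bytes)
  bytes.unpack1("g")
end

p int16be_bytes(941).bytes                 # 941 is 0x03AD, so [3, 173]
p float_be_value("\100\055\370\124".b)     # close to Math::E
```

The octal escapes "\003\255" used earlier are the same two bytes, 0x03 and 0xAD.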

Common Parameters

There are several parameters that are common to all primitives.

:initial_value

This contains the initial value that the primitive will contain after initialization. This is useful for setting default values.

obj = BinData::String.new(initial_value: "hello ")
obj + "world" #=> "hello world"

obj.assign("good-bye ")
obj + "world" #=> "good-bye world"

:value

The primitive will always contain this value. Reading or assigning will not change the value. This parameter is used to define constants or dependent fields.

pi = BinData::FloatLe.new(value: Math::PI)
pi.assign(3)
puts pi #=> 3.14159265358979


class IntList < BinData::Record
  uint8 :len, value: -> { data.length }
  array :data, type: :uint32be
end

list = IntList.new([42, 59, 71])
list.len #=> 3

:assert

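When reading or assigning, the primitive will raise a ValidityError if the value does not match the value of this parameter. The parameter may be a literal value or a lambda that returns true when the value is acceptable.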

obj = BinData::String.new(assert: -> { /aaa/ =~ value })
obj.read("baaa!") #=> "baaa!"
obj.read("bbb") #=> raises ValidityError

obj = BinData::String.new(assert: "foo")
obj.read("foo") #=> "foo"
obj.assign("bar") #=> raises ValidityError

:asserted_value

A combination of :assert and :value. Used as a shortcut when both :assert and :value have the same value. The following are logically equivalent.

obj = BinData::Uint32be.new(assert: 42, value: 42)
obj = BinData::Uint32be.new(asserted_value: 42)

Numerics

There are three kinds of numeric types that are supported by BinData: byte based, bit based and floating point.

Byte based integers

These are the common integers that are used in most low level programming languages (C, C++, Java etc). These integers can be signed or unsigned. The endian must be specified so that the conversion is independent of architecture. The bit size of these integers must be a multiple of 8. Examples of byte based integers are:

uint16be : unsigned 16 bit big endian integer

int8 : signed 8 bit integer

int32le : signed 32 bit little endian integer

uint40be : unsigned 40 bit big endian integer

The be | le suffix may be omitted if the endian keyword is in use.
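
As a rough guide to what these types look like on the wire, here is a minimal sketch in plain Ruby. It uses Array#pack directives for the sizes that pack supports and builds the 40 bit value by hand. The names PACK_DIRECTIVES, encode, encode_uint40be and decode_uint40be are assumptions for this illustration, not BinData API.

```ruby
# Array#pack directives that match three of the types listed above.
PACK_DIRECTIVES = {
  uint16be: "S>",  # unsigned 16 bit, big endian
  int8:     "c",   # signed 8 bit (a single byte has no endianness)
  int32le:  "l<",  # signed 32 bit, little endian
}.freeze

def encode(type, n)
  [n].pack(PACK_DIRECTIVES.fetch(type))
end

# pack has no 40 bit directive, so emit the five bytes by hand,
# most significant byte first.
def encode_uint40be(n)
  4.downto(0).map { |i| (n >> (8 * i)) & 0xff }.pack("C*")
end

# Fold the bytes back into an integer, most significant byte first.
def decode_uint40be(bytes)
  bytes.each_byte.reduce(0) { |acc, b| (acc << 8) | b }
end

p encode(:uint16be, 258).bytes             # [1, 2]
p encode(:int32le, -2).bytes               # [254, 255, 255, 255]
p encode_uint40be(0x0102030405).bytes      # [1, 2, 3, 4, 5]
```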

Bit based integers

These integers are used to define bitfields in records. Bitfields default to unsigned and big endian, but signed and little endian may be specified explicitly. Little endian bitfields are rare, but do occur in older file formats (e.g. the file allocation table for FAT12 filesystems is stored as an array of 12-bit little endian integers).

An array of bit based integers will be packed according to their endian.

In a record, adjacent bitfields will be packed according to their endian. All other fields are byte-aligned.
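
To make the FAT12 case concrete, here is a minimal sketch in plain Ruby of the FAT12 packing, in which every three bytes hold two 12-bit entries. It is intended to show the layout only and is not BinData's implementation. The name decode_fat12 is hypothetical.

```ruby
# Decode packed 12 bit little endian entries. Every 3 bytes hold 2 entries:
#   entry 0 = low 8 bits in byte 0, high 4 bits in the low nibble of byte 1
#   entry 1 = low 4 bits in the high nibble of byte 1, high 8 bits in byte 2
def decode_fat12(bytes)
  unless (bytes.bytesize % 3).zero?
    raise ArgumentError, "expected a multiple of 3 bytes"
  end

  bytes.bytes.each_slice(3).flat_map do |b0, b1, b2|
    [b0 | ((b1 & 0x0f) << 8), (b1 >> 4) | (b2 << 4)]
  end
end

p decode_fat12("\xF0\xFF\xFF\x03\x40\x00".b)   # [4080, 4095, 3, 4]
```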

Examples of bit based integers are:

bit1 : 1 bit big endian integer (may be used as boolean, see below.)

bit4le : 4 bit little endian integer

sbit4le : 4 bit signed little endian integer

bit32 : 32 bit big endian integer

sbit32 : 32 bit signed big endian integer

The difference between byte and bit based integers of the same number of bits (e.g. uint8 vs bit8) is one of alignment.

This example is packed as 3 bytes:

class A < BinData::Record
  bit4  :a
  uint8 :b
  bit4  :c
end

Data is stored as: AAAA0000 BBBBBBBB CCCC0000

Whereas this example is packed into only 2 bytes:

class B < BinData::Record
  bit4 :a
  bit8 :b
  bit4 :c
end

Data is stored as: AAAABBBB BBBBCCCC
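
The two layouts can be reproduced by hand. The following minimal sketch in plain Ruby builds the same byte patterns with shifts and masks. pack_a and pack_b are hypothetical helpers named after the two records above and do not use BinData.

```ruby
# Record A: bit4, uint8, bit4. The uint8 is byte aligned, so each bit4
# occupies the top nibble of its own byte and the low nibble is padding.
def pack_a(a, b, c)
  [(a & 0xf) << 4, b & 0xff, (c & 0xf) << 4].pack("C*")
end

# Record B: bit4, bit8, bit4. All three are bitfields, so they are packed
# back to back into 16 bits, most significant bit first.
def pack_b(a, b, c)
  word = ((a & 0xf) << 12) | ((b & 0xff) << 4) | (c & 0xf)
  [word].pack("n")
end

p pack_a(0xA, 0xBC, 0xD).unpack1("H*")   # "a0bcd0"
p pack_b(0xA, 0xBC, 0xD).unpack1("H*")   # "abcd"
```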

The number of bits in a bit based integer can be declared dynamically with the :nbits parameter. Bit based integers exist for all four combinations of signed and endian (bit, sbit, bit_le, sbit_le).

class Rectangle < BinData::Record
  bit5 :bit_length
  sbit :xmin, nbits: :bit_length
  sbit :xmax, nbits: :bit_length
  sbit :ymin, nbits: :bit_length
  sbit :ymax, nbits: :bit_length
end
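
To show what the Rectangle declaration implies for the byte stream, here is a minimal sketch in plain Ruby that decodes the same shape of data by hand. It assumes big endian bit order (most significant bit first) and two's complement for the signed fields. decode_rectangle and to_signed are hypothetical names, not BinData API.

```ruby
# Interpret an unsigned nbits-wide integer as two's complement.
def to_signed(value, nbits)
  return 0 if nbits.zero?

  sign_bit = 1 << (nbits - 1)
  value >= sign_bit ? value - (1 << nbits) : value
end

# Layout: a 5 bit length, then four signed fields of that many bits each,
# packed back to back with no padding between them.
def decode_rectangle(bytes)
  bits = bytes.unpack1("B*")               # string of "0" and "1", MSB first
  nbits = bits[0, 5].to_i(2)
  raise ArgumentError, "not enough data" if bits.length < 5 + 4 * nbits

  result = { bit_length: nbits }
  %i[xmin xmax ymin ymax].each_with_index do |name, i|
    raw = bits[5 + i * nbits, nbits].to_i(2)
    result[name] = to_signed(raw, nbits)
  end
  result
end

# 00100 1111 0011 1000 0111 (+ 3 padding bits) is 0x27 0x9C 0x38
p decode_rectangle([0x27, 0x9C, 0x38].pack("C*"))
```

With this input the length field is 4, so each coordinate is a 4 bit signed value in the range -8 to 7.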

Bit1 acting as boolean

Bit1 can be assigned boolean values as a convenience. The resultant value will be either 0 or 1, regardless of whether the assigned value was an integer or boolean.

bit = BinData::Bit1.new

bit.assign(true)
bit.value #=> 1

bit.assign(false)
bit.value #=> 0

Floating point numbers

BinData supports 32 and 64 bit floating point numbers, in both big and little endian format. These types are:

float_le : single precision 32 bit little endian float

float_be : single precision 32 bit big endian float

double_le : double precision 64 bit little endian float

double_be : double precision 64 bit big endian float

The _be | _le suffix may be omitted if the endian keyword is in use.

Example

Here is an example declaration for an Internet Protocol network packet.

class IP_PDU < BinData::Record
  endian :big

  bit4   :version, value: 4
  bit4   :header_length
  uint8  :tos
  uint16 :total_length
  uint16 :ident
  bit3   :flags
  bit13  :frag_offset
  uint8  :ttl
  uint8  :protocol
  uint16 :checksum
  uint32 :src_addr
  uint32 :dest_addr
  string :options, read_length: :options_length_in_bytes
  string :data, read_length: -> { total_length - header_length_in_bytes }

  def header_length_in_bytes
    header_length * 4
  end

  def options_length_in_bytes
    header_length_in_bytes - 20
  end
end

Three of the fields have parameters.

  • The version field always has the value 4, as per the standard.
  • The options field is read as a raw string, but not processed.
  • The data field contains the payload of the packet. Its length is calculated as the total length of the packet minus the length of the header.
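
For comparison, the same layout can be read with String#unpack from the core library. This is a minimal sketch and not a replacement for the record above. IP_FIXED_HEADER and parse_ip_header are hypothetical names, and the input is assumed to be a well formed packet.

```ruby
# Directives: C = uint8, n = uint16 big endian, N = uint32 big endian.
IP_FIXED_HEADER = "CCnnnCCnNN" # the 20 bytes that precede any options

def parse_ip_header(packet)
  ver_ihl, tos, total_length, ident, flags_frag,
    ttl, protocol, checksum, src_addr, dest_addr = packet.unpack(IP_FIXED_HEADER)

  header_length = ver_ihl & 0x0f       # bit4 :header_length, in 32 bit words
  header_bytes  = header_length * 4    # same as header_length_in_bytes
  {
    version:       ver_ihl >> 4,       # bit4 :version
    header_length: header_length,
    tos:           tos,
    total_length:  total_length,
    ident:         ident,
    flags:         flags_frag >> 13,   # bit3 :flags
    frag_offset:   flags_frag & 0x1fff, # bit13 :frag_offset
    ttl:           ttl,
    protocol:      protocol,
    checksum:      checksum,
    src_addr:      src_addr,
    dest_addr:     dest_addr,
    options:       packet.byteslice(20, header_bytes - 20),
    data:          packet.byteslice(header_bytes, total_length - header_bytes),
  }
end
```

The manual shifts and masks for version, header_length, flags and frag_offset are what the bit4, bit3 and bit13 declarations express declaratively in the record.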

Strings

BinData supports two types of strings: explicitly sized and zero terminated. Strings are treated internally as a sequence of 8-bit bytes. BinData supports string encodings. See the FAQ for details.

Sized Strings

Sized strings may have a set length (in bytes). If an assigned value is shorter than this length, it will be padded to this length. If no length is set, the length is taken to be the length of the assigned value.

There are several parameters that are specific to sized strings.

:length

The fixed length of the string. If a shorter string is set, it will be padded to this length. Longer strings will be truncated.

obj = BinData::String.new(length: 6)
obj.read("abcdefghij")
obj #=> "abcdef"

obj = BinData::String.new(length: 6)
obj.assign("abcd")
obj #=> "abcd\000\000"

obj = BinData::String.new(length: 6)
obj.assign("abcdefghij")
obj #=> "abcdef"

:read_length

The length in bytes to use when reading a value. This is used in the case where a string is read and then written with a possibly different length.

obj = BinData::String.new(read_length: 5)
obj.read("abcdefghij")
obj #=> "abcde"
obj.assign("abc")
obj.to_binary_s #=> "abc"

:read_length is also needed to prevent ambiguity when declaring a String with both value and length.

The following is ambiguous. Does it read 2 or 3 bytes? Does it write 2 or 3 bytes?

obj = BinData::String.new(value: "abc", length: 2)

Using :read_length prevents the ambiguity. It reads 2, but writes 3 bytes.

obj = BinData::String.new(value: "abc", read_length: 2)

:pad_front or :pad_left

Boolean, default false. Signifies that the padding occurs at the front of the string rather than the end.

obj = BinData::String.new(length: 6, pad_front: true)
obj.assign("abcd")
obj.snapshot #=> "\000\000abcd"

:pad_byte

Defaults to "\0". The character to use when padding a string to a set length. Valid values are Integers and Strings of one byte. Multi byte padding is not supported.

obj = BinData::String.new(length: 6, pad_byte: 'A')
obj.assign("abcd")
obj.snapshot #=> "abcdAA"
obj.to_binary_s #=> "abcdAA"

:trim_padding

Boolean, default false. If set, the value of this string will have all pad_bytes trimmed from the end of the string. The value will not be trimmed when writing.

obj = BinData::String.new(length: 6, trim_padding: true)
obj.assign("abcd")
obj.snapshot #=> "abcd"
obj.to_binary_s #=> "abcd\000\000"
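
The padding rules described in this section can be summarised in a few lines of plain Ruby. This is a minimal sketch of the behaviour as described above, not BinData's implementation. The helper names fit and trim are hypothetical, and pad_byte is assumed to be a one byte string.

```ruby
# Fit value to exactly `length` bytes: truncate if too long, pad if too short.
# pad_front: true puts the padding before the value instead of after it.
def fit(value, length, pad_byte: "\0", pad_front: false)
  bytes = value.b.byteslice(0, length)
  padding = pad_byte.b * (length - bytes.bytesize)
  pad_front ? padding + bytes : bytes + padding
end

# Remove trailing pad bytes, as :trim_padding does for the visible value.
def trim(value, pad_byte: "\0")
  pad = pad_byte.getbyte(0)
  value.b.bytes.reverse.drop_while { |b| b == pad }.reverse.pack("C*")
end

p fit("abcdefghij", 6)                 # "abcdef"
p fit("abcd", 6)                       # "abcd\x00\x00"
p fit("abcd", 6, pad_front: true)      # "\x00\x00abcd"
p fit("abcd", 6, pad_byte: "A")        # "abcdAA"
p trim(fit("abcd", 6))                 # "abcd"
```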

Zero Terminated Strings

These strings are modeled on the C style of string - a sequence of bytes terminated by a null ("\0") byte.

obj = BinData::Stringz.new
obj.read("abcd\000efgh")
obj #=> "abcd"
obj.num_bytes #=> 5
obj.to_binary_s #=> "abcd\000"
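
The same format is available in core Ruby through the "Z*" pack directive, which can be handy for checking expected bytes. A minimal sketch follows. read_stringz and write_stringz are hypothetical helper names, and treating a missing terminator as "consume everything" is an assumption of this sketch rather than documented BinData behaviour.

```ruby
# Read a zero terminated string from the front of `bytes`.
# Returns the value (without the terminator) and the number of bytes consumed.
def read_stringz(bytes)
  value = bytes.b.unpack1("Z*")
  # +1 for the terminator, unless the input ended before one was found
  consumed = [value.bytesize + 1, bytes.bytesize].min
  [value, consumed]
end

# "Z*" appends a single null byte when packing.
def write_stringz(value)
  [value].pack("Z*")
end

p read_stringz("abcd\000efgh")     # ["abcd", 5]
p write_stringz("abcd").bytes      # [97, 98, 99, 100, 0]
```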

User Defined Primitive Types

Most user defined types will be Records but occasionally we'd like to create a custom primitive type.

Let us revisit the Pascal String example.

class PascalString < BinData::Record
  uint8  :len,  value: -> { data.length }
  string :data, read_length: :len
end

We'd like to make PascalString a user defined type that behaves like a BinData::BasePrimitive object so we can use :initial_value etc. Here's an example usage of what we'd like:

class Favourites < BinData::Record
  pascal_string :language, initial_value: "ruby"
  pascal_string :os,       initial_value: "linux"
end

f = Favourites.new
f.os = "freebsd"
f.to_binary_s #=> "\004ruby\007freebsd"

We create this type of custom string by inheriting from BinData::Primitive (instead of BinData::Record) and implementing the #get and #set methods.

class PascalString < BinData::Primitive
  uint8  :len,  value: -> { data.length }
  string :data, read_length: :len

  def get;   self.data; end
  def set(v) self.data = v; end
end
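
The wire format that this primitive produces is a length byte followed by the data. As a cross-check of the "\004ruby\007freebsd" output shown earlier, here is a minimal sketch in plain Ruby. encode_pascal_string and decode_pascal_string are hypothetical helpers and do not use BinData.

```ruby
# One length byte followed by that many bytes of data.
def encode_pascal_string(value)
  data = value.b
  raise ArgumentError, "longer than 255 bytes" if data.bytesize > 255

  [data.bytesize].pack("C") + data
end

# Returns the decoded string and the unread remainder of the input.
# Assumes `bytes` is not empty.
def decode_pascal_string(bytes)
  len = bytes.getbyte(0)
  [bytes.byteslice(1, len), bytes.byteslice(1 + len, bytes.bytesize)]
end

encoded = encode_pascal_string("ruby") + encode_pascal_string("freebsd")
p encoded.bytes.first(5)               # [4, 114, 117, 98, 121]
p decode_pascal_string(encoded).first  # "ruby"
```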

A user defined primitive type has both an internal (binary structure) and an external (ruby interface) representation. The internal representation is encapsulated and inaccessible from the external ruby interface.

Consider a LispBool type that uses :t for true and nil for false. The binary representation is a signed byte with value 1 for true and -1 for false.

class LispBool < BinData::Primitive
  int8 :val

  def get
    case self.val
    when 1
      :t
    when -1
      nil
    else
      nil  # unknown value, default to false
    end
  end

  def set(v)
    case v
    when :t
      self.val = 1
    when nil
      self.val = -1
    else
      self.val = -1 # unknown value, default to false
    end
  end
end

b = LispBool.new

b.assign(:t)
b.to_binary_s #=> "\001"

b.read("\xff")
b.snapshot #=> nil

#read and #write use the internal representation. #assign and #snapshot use the external representation. Mixing them up will lead to undefined behaviour.

b = LispBool.new
b.assign(1) #=> undefined.  Don't do this.

Advanced User Defined Primitive Types

Sometimes a user defined primitive type cannot easily be defined declaratively. In this case you should inherit from BinData::BasePrimitive and implement the following three methods:

#value_to_binary_string(value)

Takes a ruby value (String, Numeric etc) and converts it to the appropriate binary string representation.

#read_and_return_value(io)

Reads a number of bytes from io and returns a ruby object that represents these bytes.

#sensible_default()

The ruby value that a cleared object should return.

If you wish to access parameters from inside these methods, you can use eval_parameter(key).

Here is an example of a big integer implementation.

# A custom big integer format.  Binary format is:
#   1 byte  : 0 for positive, non zero for negative
#   x bytes : Little endian stream of 7 bit bytes representing the
#             positive form of the integer.  The upper bit of each byte
#             is set when there are more bytes in the stream.
class BigInteger < BinData::BasePrimitive

  def value_to_binary_string(value)
    negative = (value < 0) ? 1 : 0
    value = value.abs
    bytes = [negative]
    loop do
      seven_bit_byte = value & 0x7f
      value >>= 7
      has_more = value.nonzero? ? 0x80 : 0
      byte = has_more | seven_bit_byte
      bytes.push(byte)

      break if has_more.zero?
    end

    bytes.collect { |b| b.chr }.join
  end

  def read_and_return_value(io)
    negative = read_uint8(io).nonzero?
    value = 0
    bit_shift = 0
    loop do
      byte = read_uint8(io)
      has_more = byte & 0x80
      seven_bit_byte = byte & 0x7f
      value |= seven_bit_byte << bit_shift
      bit_shift += 7

      break if has_more.zero?
    end

    negative ? -value : value
  end

  def sensible_default
    0
  end

  def read_uint8(io)
    io.readbytes(1).unpack1("C")
  end
end