Craig Sapp edited this page Mar 16, 2015 · 2 revisions

Introduction

The binasc command-line program works much like a hex editor: it converts binary files into hexadecimal digits in an ASCII file. By default the output text contains comment lines showing the printable characters represented by the hex bytes. The binasc program can also convert the ASCII hex codes back into a binary file. In addition to hex bytes, the program can compile ASCII characters, 4- and 8-byte floating-point numbers, and 2-, 3- and 4-byte integers into binary content. The program can also generate Variable Length Values from integers for compiling standard MIDI files. Multi-byte floats and ints can be specified as little-endian or big-endian.

Synopsis

binasc [-a | -b | -c output.bin ] input [ > output.txt ]
cat input.bin | binasc [-a|-b] [ > output.txt ]
cat input.txt | binasc -c output.bin

option     meaning
-a         Display only the printable ASCII characters contained in the binary input file (no hex bytes).
-b         Display only the hex bytes contained in the binary input file (no printable ASCII characters).
-c file    The input file contains hex bytes (or other byte formats described below), which are compiled into binary data stored in file.
--mod #    Set the number of hex bytes displayed on each line. The default is 25 hex bytes.
--wrap #   Set the line length when the -a option is used. The default is 75 characters.
--midi     Parse binary data as a standard MIDI file.
-h         View help for the program.

1. Listing hex bytes and extracting ASCII-character content

The binasc program can convert a file into an ASCII list of hexadecimal numbers that represent each byte in the input file as well as display any printable ASCII characters associated with the hexadecimal numbers. The default style for the output is shown below: each line of hexadecimal bytes is followed by a comment line starting with a semi-colon (;) which displays the ASCII character representation for the byte if it is printable.

binasc input > output.txt

 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 02 00 03 00 01 00 00 00 ac 
;    E  L  F                                                                

 8c 04 08 34 00 00 00 68 5e 00 00 00 00 00 00 34 00 20 00 05 00 28 00 16 00 
;          4           h  ^                    4                 (          

 15 00 06 00 00 00 34 00 00 00 34 80 04 08 34 80 04 08 a0 00 00 00 a0 00 00 
;                   4           4           4                               

 00 05 00 00 00 04 00 00 00 03 00 00 00 d4 00 00 00 d4 80 04 08 d4 80 04 08 
;                                                                           

 13 00 00 00 13 00 00 00 04 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 00 
;                                                                           

 80 04 08 00 80 04 08 78 5a 00 00 78 5a 00 00 05 00 00 00 00 10 00 00 01 00 
;                      x  Z        x  Z                                     

 00 00 78 5a 00 00 78 ea 04 08 78 ea 04 08 2c 02 00 00 38 03 00 00 06 00 00 
;       x  Z        x           x           ,           8                   

 00 00 10 00 00 02 00 00 00 04 5c 00 00 04 ec 04 08 04 ec 04 08 a0 00 00 00 
;                               \                                           

 a0 00 00 00 06 00 00 00 04 00 00 00 2f 6c 69 62 2f 6c 64 2d 6c 69 6e 75 78 
;                                     /  l  i  b  /  l  d  -  l  i  n  u  x 

 2e 73 6f 2e 32 00 00 25 00 00 00 38 00 00 00 00 00 00 00 0d 00 00 00 20 00 
; .  s  o  .  2        %           8                                        
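
The default layout can be approximated in a few lines of Python. This is only an illustrative sketch, not the program's source: the function name `dump` is hypothetical, and the spacing rules are assumptions inferred from the sample output above.

```python
def dump(data: bytes, per_line: int = 25) -> list[str]:
    """Return hex lines, each followed by a ';' comment line of printable characters."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        # Hex line: a leading space, two-digit hex bytes, a trailing space.
        lines.append(" " + " ".join(f"{b:02x}" for b in chunk) + " ")
        # Comment line: each printable character sits under the second hex digit
        # of its byte; non-printable bytes (and spaces) are left blank.
        chars = [chr(b) if 0x21 <= b <= 0x7E else " " for b in chunk]
        lines.append("; " + "  ".join(chars) + " ")
    return lines


if __name__ == "__main__":
    print("\n".join(dump(b"\x7fELF\x01\x01\x01\x00")))
```

The `per_line` parameter plays the role of the 25-bytes-per-line default shown in the output above.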

The two main viewing options are -a and -b. The -a option suppresses the hex bytes and shows only printable ASCII characters. Printable characters are separated by a space when one or more intermediate bytes are not printable (or when the printable character is itself a space). The -a option functions similarly to the strings command-line program available on most Unix systems, and is a good way to search for text in a binary file. Here is the printable-character-only output for the same file as in the default style shown above:

binasc -a input

ELF 4 h^ 4 ( 4 4 4 xZ xZ xZ x x , 8 \ /lib/ld-linux.so.2 % 8 # / 5 ! % , "
& 7 $ 6 ) 1 + 0 - 2 3 4 ( ' * . ) p ? ` h E 1 K " ] L " n \ " | " L h U < i
( < > ( 8 @ ( = D > K > e , v 0 , ) E . l I l 3 y E | Q i a C \ | ' | ! !
__gmon_start__ libg++.so.2.7.2 _DYNAMIC _GLOBAL_OFFSET_TABLE_ _init _fini
__builtin_vec_new __builtin_delete __builtin_new __builtin_vec_delete
__ls__7ostreamPCc __ctype_b __ctype_tolower write__7ostreamPCci
get__7istreamRc _vt.3ios _vt.7ostream.3ios __ls__7ostreami cerr exit
__strtod_internal __ls__7ostreamc cout strchr strcmp atexit
libstdc++.so.2.7.2 __11fstreambasei _vt.7istream.3ios _vt.8ifstream.3ios
__11fstreambaseiPCcii open__11fstreambasePCcii _vt.8iostream.3ios
_vt.7fstream.3ios close__11fstreambase _._7fstream _._8ifstream
getline__7istreamPcic read__7istreamPci hex__FR3ios __ls__7ostreaml
endl__FR7ostream libm.so.6 libc.so.6 __libc_init_first bsearch qsort
__strtol_internal strcpy strncpy strtok _environ __environ environ _start
_etext _edata __bss_start _end 1 0 @ h | - ! ( ' , * + ) $ . / % # " & U S
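
The extraction rule described above (keep printable characters, put a single space wherever unprintable bytes or spaces intervene) can be sketched in Python as follows. The function name `printable_text` is hypothetical, and the wrapping done by `textwrap` is an assumption that may differ in detail from how binasc breaks lines.

```python
import re
import textwrap


def printable_text(data: bytes, width: int = 75) -> str:
    """Keep printable ASCII, collapse everything else into single spaces, then wrap."""
    # Runs of bytes outside '!'..'~' (control bytes, spaces, high bytes) become one space.
    text = re.sub(rb"[^\x21-\x7e]+", b" ", data).decode("ascii").strip()
    return textwrap.fill(text, width=width,
                         break_long_words=False, break_on_hyphens=False)


if __name__ == "__main__":
    print(printable_text(b"\x7fELF\x01\x01\x014\x00\x00h^\x00"))
```

The `width` argument corresponds to the line length of 75 characters used by default for this style of output.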

The width of each text line can be controlled with the --wrap option. For example, here is the same text wrapped into 40 columns instead of the default of 75 columns:

binasc -a --wrap 40 input

ELF 4 h^ 4 ( 4 4 4 xZ xZ xZ x x , 8 \
/lib/ld-linux.so.2 % 8 # / 5 ! % , " & 7
$ 6 ) 1 + 0 - 2 3 4 ( ' * . ) p ? ` h E
1 K " ] L " n \ " | " L h U < i ( < > (
8 @ ( = D > K > e , v 0 , ) E . l I l 3
y E | Q i a C \ | ' | ! !
__gmon_start__ libg++.so.2.7.2 _DYNAMIC
_GLOBAL_OFFSET_TABLE_ _init _fini
__builtin_vec_new __builtin_delete
__builtin_new __builtin_vec_delete
__ls__7ostreamPCc __ctype_b
__ctype_tolower write__7ostreamPCci
get__7istreamRc _vt.3ios
_vt.7ostream.3ios __ls__7ostreami cerr
exit __strtod_internal __ls__7ostreamc
cout strchr strcmp atexit
libstdc++.so.2.7.2 __11fstreambasei
_vt.7istream.3ios _vt.8ifstream.3ios
__11fstreambaseiPCcii
open__11fstreambasePCcii
_vt.8iostream.3ios _vt.7fstream.3ios
close__11fstreambase _._7fstream
_._8ifstream getline__7istreamPcic
read__7istreamPci hex__FR3ios
__ls__7ostreaml endl__FR7ostream
libm.so.6 libc.so.6 __libc_init_first
bsearch qsort __strtol_internal strcpy
strncpy strtok _environ __environ
environ _start _etext _edata __bss_start
_end 1 0 @ h | - ! ( ' , * + ) $ . / % #
" & U S

Alternatively, the -b option produces only the hex byte code for each byte in the file (similar to the BSD hexdump utility). Unlike the od command, bytes are not grouped into two-byte words when displayed as hexadecimal numbers (grouping would switch the order of the bytes in the output display on little-endian computers). Here is example output of the -b option for the same file as in the previous examples:

7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 02 00 03 00 01 00 00 00 ac 
8c 04 08 34 00 00 00 68 5e 00 00 00 00 00 00 34 00 20 00 05 00 28 00 16 00 
15 00 06 00 00 00 34 00 00 00 34 80 04 08 34 80 04 08 a0 00 00 00 a0 00 00 
00 05 00 00 00 04 00 00 00 03 00 00 00 d4 00 00 00 d4 80 04 08 d4 80 04 08 
13 00 00 00 13 00 00 00 04 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 00 
80 04 08 00 80 04 08 78 5a 00 00 78 5a 00 00 05 00 00 00 00 10 00 00 01 00 
00 00 78 5a 00 00 78 ea 04 08 78 ea 04 08 2c 02 00 00 38 03 00 00 06 00 00 
00 00 10 00 00 02 00 00 00 04 5c 00 00 04 ec 04 08 04 ec 04 08 a0 00 00 00 
a0 00 00 00 06 00 00 00 04 00 00 00 2f 6c 69 62 2f 6c 64 2d 6c 69 6e 75 78 
2e 73 6f 2e 32 00 00 25 00 00 00 38 00 00 00 00 00 00 00 0d 00 00 00 20 00 
00 00 15 00 00 00 00 00 00 00 07 00 00 00 0b 00 00 00 23 00 00 00 01 00 00 
00 1d 00 00 00 14 00 00 00 16 00 00 00 0c 00 00 00 00 00 00 00 2f 00 00 00 
0e 00 00 00 00 00 00 00 00 00 00 00 35 00 00 00 19 00 00 00 21 00 00 00 1f 

2. Compiling files from hex byte codes

The binasc program can convert a file containing hex bytes back into actual bytes by using the -c option. When using the -c option, you must specify an output file after the option flag. An example use of the -c option:

binasc input.txt -c output

The input file can be formatted in a manner similar to the default output of binasc, where comment lines start with a semi-colon. The output of binasc when using the -b option can also be compiled back into the original file contents. Additionally, there are several ways to insert binary bytes into the output content, as described in the sections below.

Note that you can reverse the process of the binasc program to recover the original file content (unless the -a option was used):

   binasc file1 > file2
   binasc file2 -c file3
   ; file1 and file3 should be the same

   binasc -b file1 > file2
   binasc file2 -c file3
   ; file1 and file3 should be the same

   binasc -a file1 > file2
   binasc file2 -c file3         ; this results in an error

See the examples page for example files to compile with the -c option.
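
The round trip shown above can be mimicked in Python to make the idea concrete. This is a minimal sketch under stated assumptions: `to_hex_text` and `from_hex_text` are hypothetical helper names, and the parser only understands plain hex byte tokens and ; comments, not the special codes described in the next section.

```python
def to_hex_text(data: bytes, per_line: int = 25) -> str:
    """Render bytes as lines of two-digit hex values (similar in spirit to -b output)."""
    rows = (data[i:i + per_line] for i in range(0, len(data), per_line))
    return "\n".join(" ".join(f"{b:02x}" for b in row) for row in rows)


def from_hex_text(text: str) -> bytes:
    """Parse hex byte tokens, ignoring ';' comments (hex tokens only)."""
    out = bytearray()
    for line in text.splitlines():
        code = line.split(";", 1)[0]      # drop a trailing or whole-line comment
        out.extend(int(token, 16) for token in code.split())
    return bytes(out)


if __name__ == "__main__":
    original = bytes(range(256))
    # file1 and file3 should be the same
    print(from_hex_text(to_hex_text(original)) == original)
```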

3. Special codes recognized when compiling a binary file

Besides hex bytes, instructions for inserting binary bytes into the compiled output file include plain characters, multi-byte integers, and floating-point numbers. Additional methods allow for convenient creation of standard MIDI files from a hand-edited text file. Example parsable tokens:

token        interpretation when compiling
0a           The hexadecimal number 0a (decimal value 10), which will be converted to a single byte in the output (as text, this byte would be interpreted as the newline character).
a            The hexadecimal number 0a without the leading 0.
'18          The decimal number 18, which will be converted into a single byte (equivalent to the hex byte 12). Do not try to indicate values greater than '255 in a single-byte decimal number.
2'18         The decimal value 18 stored in a big-endian two-byte group. Equivalent to "00 12" or "'0 '18".
2u'18        The decimal value 18 stored in a little-endian two-byte group. Equivalent to "12 00" or "'18 '0".
3'18         Equivalent to "00 00 12".
3u'18        Equivalent to "12 00 00".
4'18         Equivalent to "00 00 00 12".
4u'18        Equivalent to "12 00 00 00".
'-5          Negative 5 (decimal) stored in one byte, using two's complement for negative numbers. Equivalent to the hex byte FB.
0101,0010    The binary number 0101,0010 (or 52 hex), which will be converted into a single byte in the output. The most significant bit is always the leftmost bit.
0,0          Equivalent to 0000,0000.
00000000     Equivalent to 0000,0000. The comma is optional if the binary number has at least three digits (which distinguishes it from a hex byte).
0,1          Equivalent to 0000,0001.
001          Equivalent to 0000,0001.
,01          Invalid: binary numbers cannot start with a comma (this may change in the future).
10           The hexadecimal number 10, not the decimal number 10 or the binary number 10.
9            The hexadecimal number 9, which coincidentally is equivalent to the decimal number 9.
1            The hexadecimal number 1, which coincidentally is equivalent to the decimal number 1 and the binary number 1.
v128         A Variable Length Value used to store delta times in standard MIDI files. v128 is equivalent to 81 00.
p0.5         MIDI pitch-bend data bytes representing 50% above the default pitch (which is typically a half-step if the range of the pitch bend is set to a whole step). Equivalent to the hex bytes 7f 5f.
t120         A MIDI tempo meta message tempo value. This will expand to a 3-byte integer representing the duration of a quarter note in microseconds.

3.1 Comments

A semi-colon (;) marks the beginning of a comment which extends to the end of a line. A space (or tab) character must precede the semi-colon when the comment follows a number on a line.

The number/hash sign (#) is an equivalent comment character. This character can be used instead of ; for comments, or can be dedicated to C preprocessor directives for applying text substitutions before compiling.

#define SEQ 03 04 05

00 01 02 SEQ
SEQ SEQ

Running the above code through the C preprocessor gives:

# 1 "input.txt"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "input.txt"


00 01 02 03 04 05
03 04 05 03 04 05

Example use of the C preprocessor when compiling a file:

cpp input.txt | binasc -c output.bin

A more advanced example that can define the substitution text for SEQ externally to the file:

#ifndef SEQ 
#define SEQ 03 04 05
#endif

00 01 02 SEQ
SEQ SEQ

cpp -DSEQ="ff ee dd" input.txt

# 1 "input.txt"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "input.txt"




00 01 02 ff ee dd
ff ee dd ff ee dd

3.2 Hexadecimal numbers

Hexadecimal numbers specify one byte and must contain no more than 2 digits in the range from 00 to ff (0 to 255 decimal, or -128 to 127 as signed decimal values). The letter digits A-F can be either upper case or lower case. Examples of valid hexadecimal numbers:

7f 45 4c 46 1 1 1 0 0 
8c 04 08 34 0 0 0 8 e 
15 00 06 10 0 0 4 0 0 

3.3 Binary numbers

Binary numbers can be specified by plain numbers of three or more digits, or by numbers containing (but not starting with) a comma. A binary number is allowed to have up to 8 digits (bits), since a binary number represents one byte in the output file. An optional comma is expected to split the number into two equal parts with 4 bits on each side of the comma. If there are fewer than 4 digits on either side of the comma, zeros will be inferred to the left of the given digits for each half (nibble) of the byte.

For example, "0010" is the binary number equal to the decimal number "2". The binary number "0010" can also be represented equivalently as "0,0010" and "0000,0010". Note that "10" is the hexadecimal number equal to the decimal number "16" and is not the binary number equal to the decimal number "2".
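
The padding rules above can be illustrated with a short Python sketch. The function name `parse_binary_token` is hypothetical, and the error handling is an assumption; only the accepted forms follow the rules and token examples given in this document.

```python
def parse_binary_token(token: str) -> int:
    """Convert a binary token such as '0101,0010', '0,1' or '001' to a byte value."""
    if token.startswith(","):
        raise ValueError("a binary number cannot start with a comma")
    if "," in token:
        high, low = token.split(",", 1)
        if len(high) > 4 or len(low) > 4:
            raise ValueError("at most 4 bits on each side of the comma")
        # Each half (nibble) is padded with zeros on the left.
        bits = high.rjust(4, "0") + low.rjust(4, "0")
    else:
        if not 3 <= len(token) <= 8:
            raise ValueError("without a comma, use 3 to 8 binary digits")
        bits = token.rjust(8, "0")
    return int(bits, 2)  # raises ValueError for digits other than 0 and 1


if __name__ == "__main__":
    for tok in ("0101,0010", "0,1", "001", "0010"):
        print(tok, format(parse_binary_token(tok), "02x"))
```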

3.4 Decimal numbers (ints and floats)

Decimal numbers, unlike hexadecimal or binary numbers, can fill slots of 1-4 bytes for integers, and 4 or 8 bytes for floating-point decimal numbers. Decimal numbers may also be either positive or negative unlike the hexadecimal or binary number input to binasc compiling. A decimal number starts with a quote character (') followed by the number with no intervening space. There are two qualifications which can be given just before the quote (in either order):

  • A number in the range from 1 to 4 specifies how many bytes the integer decimal number is stored in. Floating-point numbers can be stored in either 4 or 8 bytes. The default size for floating-point numbers is 4 bytes if no size prefix is given.
  • The symbol "u" can be given before the quote character in a decimal number to indicate the order in which the bytes of the number are placed in the file. No letter "u" means that the most significant byte is written first (big-endian), while including the prefix letter "u" writes the bytes in reverse order, with the least significant byte first (little-endian). For example, the decimal number 1234 can be represented by the two-byte hexadecimal number 04d2. In big-endian storage the 04 byte is written first, then the d2 byte. In little-endian storage the d2 byte is written first, then the 04 byte:

decimal   hex    big endian   little endian
1234      04d2   04 d2        d2 04
                 2'1234       2u'1234

When a byte size is not specified before the quote character, the default is 1 for integers. In that case, valid decimal numbers are in the range from 0 to 255 if unsigned, or -128 to 127 if signed; that is, one-byte decimal numbers can range from -128 to 255, and whoever reads the byte later has to know which representation (signed or unsigned) was intended. If you specify a byte size of 1, you can give any integer value, but it will be truncated to fit into one byte. The maximum unsigned decimal integer that can fill 4 bytes is 4294967295 (hexadecimal ff ff ff ff).

More examples of decimal numbers:

token       decimal #   hex
'0          0           00
'255        255         ff
'256        0           00 (truncated)
2'256       256         01 00 (not truncated)
4'44100     44100       00 00 ac 44 (big-endian)
4u'44100    44100       44 ac 00 00 (little-endian)
4u'453      453         c5 01 00 00
u4'453      453         c5 01 00 00 (u4' is equivalent to 4u')
2'-5        -5          ff fb
3'500000    500000      07 a1 20
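
The integer rows above can be reproduced with Python's built-in `int.to_bytes`. This is a sketch of the storage rules only; the function name `encode_int` and its parameters are hypothetical stand-ins for the size prefix and the "u" flag.

```python
def encode_int(value: int, size: int = 1, little_endian: bool = False) -> bytes:
    """Store a decimal integer in `size` bytes, truncating bits that do not fit."""
    if not 1 <= size <= 4:
        raise ValueError("integer tokens use 1 to 4 bytes")
    # Masking keeps the low-order bytes and yields two's complement for negatives.
    masked = value & ((1 << (8 * size)) - 1)
    return masked.to_bytes(size, "little" if little_endian else "big")


if __name__ == "__main__":
    print(encode_int(44100, 4).hex(" "))                    # like 4'44100
    print(encode_int(453, 4, little_endian=True).hex(" "))  # like 4u'453
    print(encode_int(-5, 2).hex(" "))                       # like 2'-5
```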

If a decimal number includes a period character (.) it is assumed to be a floating-point number. Floating-point numbers can be either 4 or 8 bytes.

token        decimal   hex
'3.1415      3.1415    40 49 0e 56
4'3.1415     3.1415    40 49 0e 56
u'3.1415     3.1415    56 0e 49 40
8'3.1415     3.1415    40 09 21 ca c0 83 12 6f
8u'3.1415    3.1415    6f 12 83 c0 ca 21 09 40

invalid examples   reason
123                Does not start with a quote character.
'256               Exceeds the storage space of one byte (use a multi-byte indication). In this case, '256 is equivalent to 1'256, which will truncate to 1'0, or 00 hex.
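
The floating-point bytes listed earlier in this section match IEEE 754 single and double precision, so they can be reproduced with Python's `struct` module. The function name `encode_float` is hypothetical, and treating the format as IEEE 754 is an assumption based on the listed hex values rather than on a statement in this page.

```python
import struct


def encode_float(value: float, size: int = 4, little_endian: bool = False) -> bytes:
    """Pack a number as a 4-byte or 8-byte IEEE 754 float."""
    if size not in (4, 8):
        raise ValueError("floating-point tokens use 4 or 8 bytes")
    order = "<" if little_endian else ">"
    return struct.pack(order + ("f" if size == 4 else "d"), value)


if __name__ == "__main__":
    print(encode_float(3.1415).hex(" "))                         # like '3.1415
    print(encode_float(3.1415, 8, little_endian=True).hex(" "))  # like 8u'3.1415
```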

3.5 ASCII characters

To insert literal ASCII characters into compiled output, precede each character with a plus (+). Each character is a separate token. For example to place the characters "cat" into a file, the tokenization would be "+c +a +t".

3.6 Variable Length Values

Variable-length values are used to store delta times in standard MIDI files. They are a form of compression that lets small 4-byte integers be represented by a single byte. To create a VLV, the bits of a 4-byte integer are grouped into 7-bit pieces. Any most-significant groups containing only zeros are dropped (the least-significant group is always kept). The remaining groups are placed into separate bytes, with the most significant bit of each byte acting as a continuation bit. If the continuation bit is "1", at least one more byte belonging to the VLV follows the current byte in the file. If the continuation bit is "0", the current byte is the last byte in the VLV.

To indicate a variable-length value in the input file for compiling with binasc, prefix a decimal number with the letter v, such as v100, which will be translated into 64 hex. Variable-length values can only store integers of up to 4 bytes. The resulting VLV will be between 1 and 5 bytes long.

Here are more examples of VLVs:

VLV        byte expansion
v0         00
v127       7f
v128       81 00
v123456    87 c4 40
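
The grouping procedure described above can be written out as a short Python function. This is a sketch of the encoding steps, not the program's own code; the name `encode_vlv` is hypothetical.

```python
def encode_vlv(value: int) -> bytes:
    """Encode an unsigned integer of up to 4 bytes as a MIDI variable-length value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 4 bytes")
    groups = [value & 0x7F]              # least-significant group, always kept
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()                     # most-significant group is written first
    # Set the continuation bit on every byte except the last.
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


if __name__ == "__main__":
    for n in (0, 127, 128, 123456):
        print(f"v{n}", encode_vlv(n).hex(" "))
```

For 123456 the 7-bit groups are 07, 44 and 40 hex; setting the continuation bit on the first two gives 87 c4 40.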

3.7 MIDI pitch-bend data bytes

MIDI pitch-bend data bytes contain a 14-bit integer which is split into two 7-bit values stored with the least-significant byte coming first (little-endian). The minimum value 0 is represented by the two bytes 00 00 and the maximum value is represented by the two bytes 7f 7f. The middle of the range is 00 40.

In the input file used to compile a file with the binasc program, use the letter p followed (without space) by a floating-point number in the range from -1.0 to +1.0. The plus sign is optional for positive values, as is any leading zero. Values outside of the valid range will be truncated to the maximum or minimum value.

Below are example conversions of pitch-bend tokens into hexadecimal values. The cents column shows the deviation in cents from the standard pitch, assuming the default depth of the pitch bend is a whole tone (which it usually is). Under that assumption, cents = 200 * value.

pitch bend token   hex bytes   cents
p0                 00 40       0
p1 or p+1          7f 7f       200 (wholetone)
p-1                00 00       -200
p0.5 or p.5        7f 5f       100 (semitone)
p-.25              00 30       -50 (quartertone)
p-0.3333           55 2a       -66.67
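
One way to map a bend value onto the two data bytes is sketched below in Python. The function name `encode_pitch_bend` is hypothetical, and the scaling formula is an assumption chosen because it reproduces the p0, p1, p-1, p0.5 and p-0.3333 rows above; it is not confirmed to be the formula binasc itself uses.

```python
def encode_pitch_bend(value: float) -> bytes:
    """Map a bend in [-1.0, +1.0] to two 7-bit data bytes, least-significant first."""
    value = max(-1.0, min(1.0, value))           # clamp out-of-range input
    # Assumed scaling: -1.0 -> 0, 0.0 -> 8192, +1.0 -> 16383.
    number = int(8191.5 * (value + 1.0) + 0.5)
    return bytes([number & 0x7F, (number >> 7) & 0x7F])


if __name__ == "__main__":
    for bend in (0.0, 1.0, -1.0, 0.5, -0.3333):
        print(f"p{bend:g}", encode_pitch_bend(bend).hex(" "))
```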

3.8 MIDI tempo meta message data bytes

Tempo in a standard MIDI file is given as a three-byte integer representing the duration of a quarter note in microseconds. For example a tempo of 60 beats per minute has one beat per second, and each second is a million microseconds, so the tempo 60BPM is represented in a MIDI file as 1000000.

To indicate a tempo in the input data for the -c compile process, prefix the letter t to a floating-point value.

tempo token   decimal form   hex bytes
t60           3'1000000      0f 42 40
t120          3'500000       07 a1 20
t40           3'1500000      16 e3 60
t144          3'416667       06 5b 9b
t63           3'952381       0e 88 3d
t132.45       3'453001       06 e9 89

Tempo is given in meta message 51 hex, so here is an example of a full event in a MIDI file using the t marker for tempo, followed by the bytes it compiles to:

v0 ff 51 03 t120
00 ff 51 03 07 a1 20
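
The tempo conversion is simple arithmetic: 60,000,000 microseconds per minute divided by the beats per minute. The Python sketch below uses a hypothetical function name, `encode_tempo`, and assumes rounding to the nearest microsecond, which agrees with the table rows above (for example, t144 gives 416667).

```python
def encode_tempo(bpm: float) -> bytes:
    """Convert beats per minute to a 3-byte, big-endian microseconds-per-quarter value."""
    if bpm <= 0:
        raise ValueError("tempo must be positive")
    microseconds = int(60_000_000 / bpm + 0.5)   # round to the nearest microsecond
    return microseconds.to_bytes(3, "big")       # raises OverflowError if too slow


if __name__ == "__main__":
    # Delta time 0, meta message ff 51, length 03, then the tempo bytes.
    event = b"\x00\xff\x51\x03" + encode_tempo(120)
    print(event.hex(" "))
```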

4. Examples

Example files for compiling with the -c option that demonstrate various methods of representing bytes as described above can be found on the examples page. Examples can be downloaded via Mercurial (if installed on your computer) with the command:

hg clone https://wiki.binasc.googlecode.com/hg binasc-wiki

The example compiled files and their companion ASCII files are found in binasc-wiki/files/examples.

5. Downloads

Compiled versions of binasc are available for Linux, OS X and Windows on the Download page.

Source code can be viewed online here. To download the source code, click on the zip link on that source-code browse page. The source code can also be downloaded using the Mercurial repository system (if you have it installed on your computer):

hg clone https://code.google.com/p/binasc

The source code should be easy to compile on Linux or OS X by typing:

cd binasc; make

To copy the program to /usr/bin type:

make install

To verify that the program is available from the command-line:

which binasc

This command should reply with the path to binasc:

/usr/bin/binasc