rickyclarkson / binary4j

Binary Parsing Combinators for Java

Binary Reading and Writing Combinators for Java
-----------------------------------------------

Binary4J is a combinator library for reading and writing arbitrary file and stream formats.

Why does it exist?
------------------

In my work I needed to parse and generate some proprietary binary formats, which were completely undocumented. I had some C code that generated them and some Java that parsed them. I found both the C and the Java tricky to understand, but persevered and hacked together valid parsers and generators.

I thought that I should try not to add a 3rd implementation that's difficult to understand, and I should try not to repeat the format in both the parsing and generating code. So I came up with binary4j. Here's the main principle:

If you have a `Format<T>` (which can read and write `T`s), you can combine it with a `Format<U>` to form a `Format2<T, U>`. Obviously combining a few gets unwieldy and lacks information about what each type parameter means, so you can describe how to get from a `T` and a `U` to a `V`, and back again, to create a `Format<V>`. There are some less abstract examples of this in the next section.

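As a rough illustration of the principle above, here is a minimal, self-contained sketch. It is not binary4j's actual code: `SketchFormat`, `SketchFormat2`, the `write`/`read` method names, the three-argument `map` and the `Point` class are all hypothetical names chosen for this example, and it assumes Java 8 or later. It shows two int formats being paired and then mapped to and from a single value type.

```java
import java.nio.ByteBuffer;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical stand-in for Format<T>: something that can both write and read a T. */
interface SketchFormat<T> {
    void write(ByteBuffer out, T value);

    T read(ByteBuffer in);

    /** Pairs this format with another one, giving a format for a T followed by a U. */
    default <U> SketchFormat2<T, U> andThen(SketchFormat<U> next) {
        return new SketchFormat2<>(this, next);
    }
}

/** Hypothetical stand-in for Format2<T, U>. */
final class SketchFormat2<T, U> {
    private final SketchFormat<T> first;
    private final SketchFormat<U> second;

    SketchFormat2(SketchFormat<T> first, SketchFormat<U> second) {
        this.first = first;
        this.second = second;
    }

    /** Builds a SketchFormat<V> from conversions in both directions between (T, U) and V. */
    <V> SketchFormat<V> map(BiFunction<T, U, V> combine,
                            Function<V, T> toFirst,
                            Function<V, U> toSecond) {
        return new SketchFormat<V>() {
            @Override
            public void write(ByteBuffer out, V value) {
                first.write(out, toFirst.apply(value));
                second.write(out, toSecond.apply(value));
            }

            @Override
            public V read(ByteBuffer in) {
                T t = first.read(in);
                U u = second.read(in);
                return combine.apply(t, u);
            }
        };
    }
}

final class PrincipleDemo {
    /** A format for a single 4-byte int. */
    static final SketchFormat<Integer> INT = new SketchFormat<Integer>() {
        @Override
        public void write(ByteBuffer out, Integer value) {
            out.putInt(value);
        }

        @Override
        public Integer read(ByteBuffer in) {
            return in.getInt();
        }
    };

    /** The V in this example: a value made of two ints. */
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Two int formats combined, then mapped to and from Point. */
    static final SketchFormat<Point> POINT =
            INT.andThen(INT).map(Point::new, p -> p.x, p -> p.y);

    /** Writes a point to a buffer and reads it back. */
    static Point roundTrip(Point point) {
        ByteBuffer buffer = ByteBuffer.allocate(2 * Integer.BYTES);
        POINT.write(buffer, point);
        buffer.flip();
        return POINT.read(buffer);
    }

    public static void main(String[] args) {
        Point copy = roundTrip(new Point(3, 4));
        System.out.println(copy.x + "," + copy.y);
    }
}
```
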
How do I use it?
----------------

As an example, we'll consider a Person class. A person has a String name, String address and Date dateOfBirth.
A Date has an int year, int month and int day. `Format.string` and `Format.integer` are provided in Binary4J. We can combine these to produce a `Format<Person>`. First let's tackle `Format<Date>`.

`Format.integer.andThen(Format.integer)` returns a `Format2<Integer, Integer>`. But Date has 3 ints.
`Format.integer.andThen(Format.integer).andThen(Format.integer)` returns a `Format3<Integer, Integer, Integer>`. Handy, but not quite a `Format<Date>`. To make a `Format<Date>` from it we need to tell Binary4J about a way of converting between 3 Integers and Dates, in both directions (consider that a Format can read and write). Luckily a Format3 has a map method that takes an XFunction3, which describes these conversions.

    Format<Date> dateFormat = Format.integer.andThen(Format.integer).andThen(Format.integer).map(Date.xFunction);

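The README does not show what `Date.xFunction` looks like. The following is only a guess at its shape, written as a self-contained sketch: `SketchXFunction3`, `SketchTuple3`, `SimpleDate` and the `apply`/`unapply` method names on the function are hypothetical and may not match binary4j's real `XFunction3`. The point is that one object carries both directions of the conversion.

```java
/** Hypothetical three-element tuple; the README mentions a Tuple3 but does not show its API. */
final class SketchTuple3<A, B, C> {
    final A first;
    final B second;
    final C third;

    SketchTuple3(A first, B second, C third) {
        this.first = first;
        this.second = second;
        this.third = third;
    }
}

/** Hypothetical two-way function between three values and one result. */
interface SketchXFunction3<A, B, C, R> {
    R apply(A a, B b, C c);

    SketchTuple3<A, B, C> unapply(R result);
}

/** A date with the three int fields described above (not java.util.Date). */
final class SimpleDate {
    final int year;
    final int month;
    final int day;

    SimpleDate(int year, int month, int day) {
        this.year = year;
        this.month = month;
        this.day = day;
    }

    /** Builds a SimpleDate from three Integers and takes one apart again. */
    static final SketchXFunction3<Integer, Integer, Integer, SimpleDate> xFunction =
            new SketchXFunction3<Integer, Integer, Integer, SimpleDate>() {
                @Override
                public SimpleDate apply(Integer y, Integer m, Integer d) {
                    return new SimpleDate(y, m, d);
                }

                @Override
                public SketchTuple3<Integer, Integer, Integer> unapply(SimpleDate date) {
                    return new SketchTuple3<Integer, Integer, Integer>(
                            date.year, date.month, date.day);
                }
            };

    public static void main(String[] args) {
        SimpleDate date = xFunction.apply(2008, 11, 20);
        SketchTuple3<Integer, Integer, Integer> parts = xFunction.unapply(date);
        System.out.println(parts.first + "-" + parts.second + "-" + parts.third);
    }
}
```
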
So similarly we can combine this with `Format.string` to create a `Format<Person>`:

    Format<Person> personFormat = Format.string.andThen(Format.string).andThen(dateFormat).map(Person.xFunction);

Then to get a ByteBuffer containing that data, we can say: `ByteBuffer personData = personFormat.apply(somePerson);`
To read a ByteBuffer into a Person we can say: `Person person = personFormat.unapply(buffer);`

For this particular case, it's possible that the actual number of lines has increased between the 'traditional' solution and this one.

A more realistic example, perhaps, is a block of bytes preceded by its length as an int.

    Format2<Integer, byte[]> lengthEncodedBytes = Format.integer.bind(Format.byteArray);

Unfortunately to use lengthEncodedBytes, we have to pass in the length separately from the array, so typical uses would look like:

    ByteBuffer buffer = lengthEncodedBytes.apply(array.length, array);

Clearly it would be better if we could make this a `Format<byte[]>`. Luckily we can, by providing an XFunction2 describing the conversion from a byte[] to an Integer-byte[] pair, and vice-versa.

    Format<byte[]> lengthEncodedBytes = Format.integer.bind(Format.byteArray).map(someXFunction2);

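One way to read the conversion described above: going from a byte[] to the pair means computing `array.length`, and going back means dropping the length and keeping the bytes. The sketch below shows the resulting layout using plain `java.nio.ByteBuffer` rather than binary4j. The class and method names are hypothetical, and the 4-byte big-endian length (ByteBuffer's default byte order) is an assumption, since the README does not say how `Format.integer` encodes an int.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hand-written equivalent of a length-prefixed byte block, for illustration only. */
final class LengthPrefixedSketch {
    /** Writes a 4-byte length followed by the bytes themselves. */
    static ByteBuffer write(byte[] data) {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES + data.length);
        buffer.putInt(data.length); // the length comes from the array, so callers never pass it
        buffer.put(data);
        buffer.flip();              // make the buffer ready for reading
        return buffer;
    }

    /** Reads the length, then exactly that many bytes. */
    static byte[] read(ByteBuffer buffer) {
        int length = buffer.getInt();
        byte[] data = new byte[length];
        buffer.get(data);
        return data;                // the length is dropped; only the bytes are returned
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3};
        byte[] copy = read(write(original));
        System.out.println(Arrays.equals(original, copy));
    }
}
```
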
What I especially like about this approach is that each part of it is simple, at whatever scale you look. It took a LOT of work to make the types readable, so the next section explains how Java could have helped but didn't:

How Java Made This Hard
-----------------------

It's a shame that I 'need' to have Format2, Format3, etc. It's a shame I 'need' Tuple3 and Tuple4. It's a shame I 'need' Function2, Function3, Function4.

But, lacking tuple support in the language, `Format<Pair<Pair<X, Y>, Z>>` is considerably harder to read (and write) than `Format3<X, Y, Z>`. A better language, or a future Java, might allow `Format<(X, Y, Z)>`.

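To make the readability point concrete, here is a small sketch comparing a nested pair with a flat three-element tuple. `SketchPair`, `SketchTriple` and `NestingDemo` are hypothetical classes written for this example, not binary4j's own `Pair` or `Tuple3`.

```java
/** Hypothetical two-element tuple. */
final class SketchPair<A, B> {
    final A first;
    final B second;

    SketchPair(A first, B second) {
        this.first = first;
        this.second = second;
    }
}

/** Hypothetical flat three-element tuple. */
final class SketchTriple<A, B, C> {
    final A first;
    final B second;
    final C third;

    SketchTriple(A first, B second, C third) {
        this.first = first;
        this.second = second;
        this.third = third;
    }
}

final class NestingDemo {
    /** With nested pairs, the path to the middle element depends on how the pairs were nested. */
    static <X, Y, Z> Y middleOfNested(SketchPair<SketchPair<X, Y>, Z> value) {
        return value.first.second;
    }

    /** With a flat tuple, the middle element is simply the second field. */
    static <X, Y, Z> Y middleOfFlat(SketchTriple<X, Y, Z> value) {
        return value.second;
    }

    public static void main(String[] args) {
        SketchPair<SketchPair<Integer, Integer>, Integer> nested =
                new SketchPair<SketchPair<Integer, Integer>, Integer>(
                        new SketchPair<Integer, Integer>(2008, 11), 20);
        SketchTriple<Integer, Integer, Integer> flat =
                new SketchTriple<Integer, Integer, Integer>(2008, 11, 20);
        System.out.println(middleOfNested(nested) + " " + middleOfFlat(flat));
    }
}
```
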
If Java had type inference, then sometimes `Format<Pair<Pair<X, Y>, Z>>` would have been fine, because it would not have actually appeared in user code, e.g.:

    var threeInts = Format.integer.andThen(Format.integer).andThen(Format.integer);
    var dateFormat = threeInts.map(Date.xFunction);

As it is, a user of binary4j is 'punished' for introducing an explaining variable like threeInts above, though binary4j takes great effort to minimise that (using Format3 instead of `Format<Pair<Pair..>>`).

If Java had support for closures, then the xFunction implementations could have been much, much simpler. In fact, XFunction might not even exist, as it just represents a tuple of `(X => Y, Y => X)`.

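Java 8 later added lambda expressions, so the idea of an XFunction as a pair of functions can be sketched directly. This assumes Java 8 or newer; `TwoWay` and `TwoWayDemo` are hypothetical names for this example and are not part of binary4j.

```java
import java.util.function.Function;

/** Hypothetical two-way function: literally a pair of (X => Y, Y => X). */
final class TwoWay<X, Y> {
    final Function<X, Y> forward;
    final Function<Y, X> backward;

    TwoWay(Function<X, Y> forward, Function<Y, X> backward) {
        this.forward = forward;
        this.backward = backward;
    }
}

final class TwoWayDemo {
    /** Converts between an Integer and its decimal text, using two lambdas. */
    static final TwoWay<Integer, String> INT_AS_TEXT =
            new TwoWay<Integer, String>(i -> Integer.toString(i), s -> Integer.parseInt(s));

    public static void main(String[] args) {
        String text = INT_AS_TEXT.forward.apply(42);
        int back = INT_AS_TEXT.backward.apply(text);
        System.out.println(text + " " + back);
    }
}
```
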
If you are using binary4j, please let me know, and I'll do what I can to help you. Other than that, it will evolve as and when I use it, which might be daily or never.