rickyclarkson / binary4j
Ricky (author)
Sun Nov 23 05:18:00 -0800 2008
Binary Reading and Writing Combinators for Java
-----------------------------------------------

Binary4J is a combinator library for reading and writing arbitrary file and stream formats.

Why does it exist?
------------------

In my work I needed to parse and generate some proprietary binary formats, which were completely undocumented. I had some C code that generated them and some Java that parsed them. I found both the C and the Java tricky to understand, but persevered and hacked together valid parsers and generators.

I didn't want to add a third implementation that was just as difficult to understand, and I didn't want to repeat the format in both the parsing and the generating code. So I came up with binary4j. Here's the main principle:

If you have a `Format<T>` (which can read and write `T`s), you can combine it with a `Format<U>` to form a `Format2<T, U>`. Combining more than a few formats this way quickly gets unwieldy, and the type parameters say nothing about what each value means, so you can also describe how to get from a `T` and a `U` to a `V`, and back again, to create a `Format<V>`. There are some less abstract examples of this in the next section.
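
The principle can be illustrated with a minimal, self-contained sketch. This is not binary4j's source: the internals of `Format`, `Format2` and `XFunction2`, the `Pair` and `Range` classes, the `to`/`from` method names and the 4-byte big-endian encoding behind `Format.integer` are all assumptions made for illustration. Only the names `andThen`, `map`, `apply` (write) and `unapply` (read) follow this README.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical sketch of the principle, NOT binary4j's actual classes or signatures.
final class Format<T> {
    final BiConsumer<T, ByteArrayOutputStream> writer; // appends the bytes for a T
    final Function<ByteBuffer, T> reader;              // reads a T, advancing the buffer

    Format(BiConsumer<T, ByteArrayOutputStream> writer, Function<ByteBuffer, T> reader) {
        this.writer = writer;
        this.reader = reader;
    }

    // Write: encode a T into a new buffer positioned at its first byte.
    ByteBuffer apply(T value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writer.accept(value, out);
        return ByteBuffer.wrap(out.toByteArray());
    }

    // Read: decode a T from the buffer's current position.
    T unapply(ByteBuffer buffer) {
        return reader.apply(buffer);
    }

    // A Format<T> combined with a Format<U> gives a Format2<T, U>.
    <U> Format2<T, U> andThen(Format<U> next) {
        return new Format2<>(this, next);
    }

    // Assumed encoding for this sketch: a 4-byte big-endian int.
    static final Format<Integer> integer = new Format<>(
        (i, out) -> out.write(ByteBuffer.allocate(4).putInt(i).array(), 0, 4),
        in -> in.getInt());
}

final class Pair<A, B> {
    final A first;
    final B second;
    Pair(A first, B second) { this.first = first; this.second = second; }
}

// Describes how to get from an A and a B to a V (reading), and back again (writing).
interface XFunction2<A, B, V> {
    V to(A a, B b);
    Pair<A, B> from(V v);
}

final class Format2<A, B> {
    final Format<A> first;
    final Format<B> second;
    Format2(Format<A> first, Format<B> second) { this.first = first; this.second = second; }

    // Turn the two-part format into a Format<V> using the two-way conversion.
    <V> Format<V> map(XFunction2<A, B, V> x) {
        return new Format<>(
            (v, out) -> {
                Pair<A, B> parts = x.from(v);
                first.writer.accept(parts.first, out);
                second.writer.accept(parts.second, out);
            },
            in -> x.to(first.reader.apply(in), second.reader.apply(in)));
    }
}

// Hypothetical example type built from two ints.
final class Range {
    final int low, high;
    Range(int low, int high) { this.low = low; this.high = high; }

    static final XFunction2<Integer, Integer, Range> xFunction = new XFunction2<Integer, Integer, Range>() {
        public Range to(Integer low, Integer high) { return new Range(low, high); }
        public Pair<Integer, Integer> from(Range r) { return new Pair<>(r.low, r.high); }
    };
}
```

Here `Format.integer.andThen(Format.integer)` plays the role of the `Format2<T, U>`, and `map(Range.xFunction)` supplies both directions of the conversion, producing a `Format<Range>` that writes 8 bytes.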

How do I use it?
----------------

As an example, we'll consider a `Person` class. A person has a `String` name, a `String` address and a `Date` `dateOfBirth`. A `Date` has an int year, an int month and an int day. `Format.string` and `Format.integer` are provided by Binary4J, and we can combine them to produce a `Format<Person>`. First let's tackle `Format<Date>`.

`Format.integer.andThen(Format.integer)` returns a `Format2<Integer, Integer>`, but `Date` has three ints. `Format.integer.andThen(Format.integer).andThen(Format.integer)` returns a `Format3<Integer, Integer, Integer>`. Handy, but not quite a `Format<Date>`. To make a `Format<Date>` from it we need to tell Binary4J how to convert between three `Integer`s and a `Date` in both directions, since a `Format` can both read and write. Luckily, `Format3` has a `map` method that takes an `XFunction3`, which describes these conversions:

    Format<Date> dateFormat = Format.integer.andThen(Format.integer).andThen(Format.integer).map(Date.xFunction);
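
A self-contained sketch of how the `dateFormat` line above could type-check and run. It is a hypothetical model, not binary4j itself: the `Tuple3` holder, the `to`/`from` methods of `XFunction3`, the way `Format2.andThen` builds a `Format3`, and the 4-byte big-endian int encoding are all assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical model of the README's Date example; NOT binary4j's real implementation.
final class Format<T> {
    final BiConsumer<T, ByteArrayOutputStream> writer; // appends the bytes for a T
    final Function<ByteBuffer, T> reader;              // reads a T, advancing the buffer

    Format(BiConsumer<T, ByteArrayOutputStream> writer, Function<ByteBuffer, T> reader) {
        this.writer = writer;
        this.reader = reader;
    }

    ByteBuffer apply(T value) { // write
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writer.accept(value, out);
        return ByteBuffer.wrap(out.toByteArray());
    }

    T unapply(ByteBuffer buffer) { return reader.apply(buffer); } // read

    <U> Format2<T, U> andThen(Format<U> next) { return new Format2<>(this, next); }

    // Assumed encoding: a 4-byte big-endian int.
    static final Format<Integer> integer = new Format<>(
        (i, out) -> out.write(ByteBuffer.allocate(4).putInt(i).array(), 0, 4),
        in -> in.getInt());
}

final class Format2<A, B> {
    final Format<A> first;
    final Format<B> second;
    Format2(Format<A> first, Format<B> second) { this.first = first; this.second = second; }

    // Appending a third format gives a Format3 rather than a nested pair.
    <C> Format3<A, B, C> andThen(Format<C> third) { return new Format3<>(first, second, third); }
}

final class Tuple3<A, B, C> {
    final A first;
    final B second;
    final C third;
    Tuple3(A first, B second, C third) { this.first = first; this.second = second; this.third = third; }
}

// Two-way conversion between three parts and a V.
interface XFunction3<A, B, C, V> {
    V to(A a, B b, C c);        // used when reading
    Tuple3<A, B, C> from(V v);  // used when writing
}

final class Format3<A, B, C> {
    final Format<A> a;
    final Format<B> b;
    final Format<C> c;
    Format3(Format<A> a, Format<B> b, Format<C> c) { this.a = a; this.b = b; this.c = c; }

    <V> Format<V> map(XFunction3<A, B, C, V> x) {
        return new Format<>(
            (v, out) -> {
                Tuple3<A, B, C> t = x.from(v);
                a.writer.accept(t.first, out);
                b.writer.accept(t.second, out);
                c.writer.accept(t.third, out);
            },
            in -> x.to(a.reader.apply(in), b.reader.apply(in), c.reader.apply(in))); // args evaluate left to right
    }
}

final class Date {
    final int year, month, day;
    Date(int year, int month, int day) { this.year = year; this.month = month; this.day = day; }

    static final XFunction3<Integer, Integer, Integer, Date> xFunction =
        new XFunction3<Integer, Integer, Integer, Date>() {
            public Date to(Integer y, Integer m, Integer d) { return new Date(y, m, d); }
            public Tuple3<Integer, Integer, Integer> from(Date d) { return new Tuple3<>(d.year, d.month, d.day); }
        };
}
```

With these definitions, `Format.integer.andThen(Format.integer).andThen(Format.integer).map(Date.xFunction)` is a `Format<Date>` that writes 12 bytes: year, month, then day.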

Similarly, we can combine `dateFormat` with `Format.string` to create a `Format<Person>`:

    Format<Person> personFormat = Format.string.andThen(Format.string).andThen(dateFormat).map(Person.xFunction);

To get a `ByteBuffer` containing a person's data, we can say:

    ByteBuffer personData = personFormat.apply(somePerson);

To read a `ByteBuffer` back into a `Person`, we can say:

    Person person = personFormat.unapply(buffer);
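
The `Person` example can be exercised end to end with a self-contained, hypothetical model. Everything below is an illustrative assumption rather than binary4j's implementation, in particular the encoding chosen for `Format.string` (a 4-byte byte count followed by UTF-8 bytes); the real library may encode strings differently.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical model of the Person example; NOT binary4j's real implementation.
final class Format<T> {
    final BiConsumer<T, ByteArrayOutputStream> writer; // appends the bytes for a T
    final Function<ByteBuffer, T> reader;              // reads a T, advancing the buffer
    Format(BiConsumer<T, ByteArrayOutputStream> writer, Function<ByteBuffer, T> reader) {
        this.writer = writer;
        this.reader = reader;
    }
    ByteBuffer apply(T value) { // write
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writer.accept(value, out);
        return ByteBuffer.wrap(out.toByteArray());
    }
    T unapply(ByteBuffer buffer) { return reader.apply(buffer); } // read
    <U> Format2<T, U> andThen(Format<U> next) { return new Format2<>(this, next); }

    static void putInt(int i, ByteArrayOutputStream out) { // 4-byte big-endian int
        out.write(ByteBuffer.allocate(4).putInt(i).array(), 0, 4);
    }
    // Assumed encodings: integer = 4-byte int; string = 4-byte byte count, then UTF-8 bytes.
    static final Format<Integer> integer = new Format<>((i, out) -> putInt(i, out), in -> in.getInt());
    static final Format<String> string = new Format<>(
        (s, out) -> {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            putInt(bytes.length, out);
            out.write(bytes, 0, bytes.length);
        },
        in -> {
            byte[] bytes = new byte[in.getInt()];
            in.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        });
}

final class Format2<A, B> {
    final Format<A> first;
    final Format<B> second;
    Format2(Format<A> first, Format<B> second) { this.first = first; this.second = second; }
    <C> Format3<A, B, C> andThen(Format<C> third) { return new Format3<>(first, second, third); }
}

final class Tuple3<A, B, C> {
    final A first;
    final B second;
    final C third;
    Tuple3(A first, B second, C third) { this.first = first; this.second = second; this.third = third; }
}

interface XFunction3<A, B, C, V> {
    V to(A a, B b, C c);        // used when reading
    Tuple3<A, B, C> from(V v);  // used when writing
}

final class Format3<A, B, C> {
    final Format<A> a;
    final Format<B> b;
    final Format<C> c;
    Format3(Format<A> a, Format<B> b, Format<C> c) { this.a = a; this.b = b; this.c = c; }
    <V> Format<V> map(XFunction3<A, B, C, V> x) {
        return new Format<>(
            (v, out) -> {
                Tuple3<A, B, C> t = x.from(v);
                a.writer.accept(t.first, out);
                b.writer.accept(t.second, out);
                c.writer.accept(t.third, out);
            },
            in -> x.to(a.reader.apply(in), b.reader.apply(in), c.reader.apply(in))); // reads left to right
    }
}

final class Date {
    final int year, month, day;
    Date(int year, int month, int day) { this.year = year; this.month = month; this.day = day; }
    static final XFunction3<Integer, Integer, Integer, Date> xFunction = new XFunction3<Integer, Integer, Integer, Date>() {
        public Date to(Integer y, Integer m, Integer d) { return new Date(y, m, d); }
        public Tuple3<Integer, Integer, Integer> from(Date d) { return new Tuple3<>(d.year, d.month, d.day); }
    };
}

final class Person {
    final String name, address;
    final Date dateOfBirth;
    Person(String name, String address, Date dateOfBirth) {
        this.name = name; this.address = address; this.dateOfBirth = dateOfBirth;
    }
    static final XFunction3<String, String, Date, Person> xFunction = new XFunction3<String, String, Date, Person>() {
        public Person to(String n, String a, Date d) { return new Person(n, a, d); }
        public Tuple3<String, String, Date> from(Person p) { return new Tuple3<>(p.name, p.address, p.dateOfBirth); }
    };
}
```

In this sketch, `Format.string.andThen(Format.string).andThen(dateFormat).map(Person.xFunction)` has exactly one format per `Person` field (name, address, date of birth), and `unapply` reads them back in the same order that `apply` wrote them.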

In this particular case, the combinator version may well take more lines than the 'traditional' hand-written parser and generator.

A more realistic example, perhaps, is a block of bytes preceded by its length as an int:

    Format2<Integer, byte[]> lengthEncodedBytes = Format.integer.bind(Format.byteArray);

Unfortunately, to use `lengthEncodedBytes` we have to pass in the length separately from the array, so typical uses look like:

    ByteBuffer buffer = lengthEncodedBytes.apply(array.length, array);

It would clearly be better to make this a `Format<byte[]>`. Luckily we can, by providing an `XFunction2` describing the conversion from a `byte[]` to an `Integer`-`byte[]` pair, and vice versa:

    Format<byte[]> lengthEncodedBytes = Format.integer.bind(Format.byteArray).map(someXFunction2);
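
A hedged sketch of how `bind` and `map` could fit together for the length-prefixed block. The assumption here is that `bind` takes a function from the first value to the format of the second, so `Format.byteArray` is modelled as a `Function<Integer, Format<byte[]>>` returning a format for exactly that many bytes. `LengthEncoded.dropLength` is one hypothetical `XFunction2` of the kind the text calls `someXFunction2`; none of these internals are taken from binary4j.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical sketch of bind and map; NOT binary4j's real implementation.
final class Format<T> {
    final BiConsumer<T, ByteArrayOutputStream> writer; // appends the bytes for a T
    final Function<ByteBuffer, T> reader;              // reads a T, advancing the buffer
    Format(BiConsumer<T, ByteArrayOutputStream> writer, Function<ByteBuffer, T> reader) {
        this.writer = writer;
        this.reader = reader;
    }
    ByteBuffer apply(T value) { // write
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writer.accept(value, out);
        return ByteBuffer.wrap(out.toByteArray());
    }
    T unapply(ByteBuffer buffer) { return reader.apply(buffer); } // read

    // bind: the format used for the second value is chosen from the first value.
    <U> Format2<T, U> bind(Function<T, Format<U>> next) { return new Format2<>(this, next); }

    // Assumed encoding: a 4-byte big-endian int.
    static final Format<Integer> integer = new Format<>(
        (i, out) -> out.write(ByteBuffer.allocate(4).putInt(i).array(), 0, 4),
        in -> in.getInt());

    // Assumed shape: given a length, a format for exactly that many raw bytes.
    static final Function<Integer, Format<byte[]>> byteArray = length -> new Format<>(
        (bytes, out) -> {
            if (bytes.length != length) throw new IllegalArgumentException("length mismatch");
            out.write(bytes, 0, bytes.length);
        },
        in -> {
            byte[] bytes = new byte[length];
            in.get(bytes);
            return bytes;
        });
}

final class Pair<A, B> {
    final A first;
    final B second;
    Pair(A first, B second) { this.first = first; this.second = second; }
}

interface XFunction2<A, B, V> {
    V to(A a, B b);        // used when reading
    Pair<A, B> from(V v);  // used when writing
}

final class Format2<A, B> {
    final Format<A> first;
    final Function<A, Format<B>> second;
    Format2(Format<A> first, Function<A, Format<B>> second) { this.first = first; this.second = second; }

    void write(A a, B b, ByteArrayOutputStream out) {
        first.writer.accept(a, out);
        second.apply(a).writer.accept(b, out); // format for b depends on a
    }

    Pair<A, B> read(ByteBuffer in) {
        A a = first.reader.apply(in);
        return new Pair<>(a, second.apply(a).reader.apply(in));
    }

    ByteBuffer apply(A a, B b) { // e.g. apply(array.length, array)
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(a, b, out);
        return ByteBuffer.wrap(out.toByteArray());
    }

    Pair<A, B> unapply(ByteBuffer buffer) { return read(buffer); }

    <V> Format<V> map(XFunction2<A, B, V> x) {
        return new Format<>(
            (v, out) -> { Pair<A, B> p = x.from(v); write(p.first, p.second, out); },
            in -> { Pair<A, B> p = read(in); return x.to(p.first, p.second); });
    }
}

final class LengthEncoded {
    // Hypothetical XFunction2: writing derives the length from the array;
    // reading drops the length, since the array already knows its size.
    static final XFunction2<Integer, byte[], byte[]> dropLength = new XFunction2<Integer, byte[], byte[]>() {
        public byte[] to(Integer length, byte[] bytes) { return bytes; }
        public Pair<Integer, byte[]> from(byte[] bytes) { return new Pair<>(bytes.length, bytes); }
    };
}
```

`Format.integer.bind(Format.byteArray)` gives the two-argument form used as `apply(array.length, array)`, and mapping it with `dropLength` yields a `Format<byte[]>` that computes the length itself.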

What I especially like about this approach is that each part of it is simple, at whatever scale you look. It took a lot of work to make the types readable, so the next section explains how Java could have helped but didn't.

How Java Made This Hard
-----------------------

It's a shame that I 'need' `Format2`, `Format3`, etc. It's a shame I 'need' `Tuple3` and `Tuple4`. It's a shame I 'need' `Function2`, `Function3` and `Function4`.

But, lacking tuple support in the language, `Format<Pair<Pair<X, Y>, Z>>` is considerably harder to read (and write) than `Format3<X, Y, Z>`. A better language, or a future Java, might allow `Format<(X, Y, Z)>`.

If Java had local variable type inference, then `Format<Pair<Pair<X, Y>, Z>>` would sometimes have been fine, because it would never actually appear in user code, e.g.:

    var threeInts = Format.integer.andThen(Format.integer).andThen(Format.integer);
    var dateFormat = threeInts.map(Date.xFunction);

As it is, a user of binary4j is 'punished' for introducing an explaining variable like `threeInts` above, because its full type has to be written out, though binary4j goes to great lengths to minimise that (by using `Format3` instead of `Format<Pair<Pair<X, Y>, Z>>`).

If Java had support for closures, then the `xFunction` implementations could have been much, much simpler. In fact, `XFunction` might not even exist, as it just represents a tuple of `(X => Y, Y => X)`.
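
Java 8 later added lambdas, and with them an `XFunction` can be little more than the pair of functions described above. In this hypothetical sketch (these `XFunction`, `Tuple3` and `Date` classes are stand-ins, not binary4j's), the `Date` conversion is written as one lambda per direction:

```java
import java.util.function.Function;

// Hypothetical sketch, NOT binary4j's classes: an XFunction is just two functions.
final class XFunction<X, Y> {
    final Function<X, Y> to;    // X => Y, used when reading
    final Function<Y, X> from;  // Y => X, used when writing
    XFunction(Function<X, Y> to, Function<Y, X> from) { this.to = to; this.from = from; }
}

final class Tuple3<A, B, C> {
    final A first;
    final B second;
    final C third;
    Tuple3(A first, B second, C third) { this.first = first; this.second = second; this.third = third; }
}

final class Date {
    final int year, month, day;
    Date(int year, int month, int day) { this.year = year; this.month = month; this.day = day; }

    // The whole conversion: one lambda per direction.
    static final XFunction<Tuple3<Integer, Integer, Integer>, Date> xFunction =
        new XFunction<Tuple3<Integer, Integer, Integer>, Date>(
            t -> new Date(t.first, t.second, t.third),
            d -> new Tuple3<>(d.year, d.month, d.day));
}
```

Compared with an anonymous class implementing two methods, each direction here is a single expression.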

If you are using binary4j, please let me know, and I'll do what I can to help you. Other than that, it will evolve as and when I use it, which might be daily or never.
