x12pp is a CLI pretty-printer for X12 EDI files.
X12 is an arcane format consisting of a fixed-length header followed by a series of segments, each separated by a segment terminator character.
These segments are generally not separated by newlines, so extracting a range of lines from a file or taking a peek at the start using the usual Unix toolbox becomes unnecessarily painful.
Of course, you could split the lines using sed -e 's/~/~\n/g' and get on with your day, but:
- although the ~ is the traditional and most widely-used segment terminator, it's not required: each X12 file specifies its own terminators as part of the header.
- using sed or perl would mean I wouldn't have a chance to explore fast stream processing in Rust.
So here we are.
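As a rough illustration of the first point, here is a minimal sketch of reading the delimiters from the fixed-length header. It assumes the standard 106-byte ISA layout, where the element separator is byte 3, the component separator is byte 104, and the segment terminator is byte 105. The names `Delimiters` and `read_delimiters` are hypothetical and are not taken from the x12pp source.

```rust
use std::io::{self, Read};

/// Length in bytes of the fixed-width ISA interchange header.
const ISA_LEN: usize = 106;

/// Delimiters declared by an X12 interchange header (hypothetical type).
#[derive(Debug, PartialEq)]
struct Delimiters {
    element: u8,
    component: u8,
    segment: u8,
}

/// Read exactly one ISA header from `input` and return the delimiters it declares.
fn read_delimiters<R: Read>(input: &mut R) -> io::Result<Delimiters> {
    let mut header = [0u8; ISA_LEN];
    input.read_exact(&mut header)?;
    if &header[..3] != b"ISA" {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "input does not start with an ISA header",
        ));
    }
    Ok(Delimiters {
        element: header[3],
        component: header[104],
        segment: header[105],
    })
}

fn main() -> io::Result<()> {
    // Synthetic, space-padded header for demonstration only; a real ISA
    // segment carries sender, receiver, date and control-number fields.
    let mut header = vec![b' '; ISA_LEN];
    header[..3].copy_from_slice(b"ISA");
    header[3] = b'*';
    header[104] = b':';
    header[105] = b'~';

    let delims = read_delimiters(&mut header.as_slice())?;
    println!("element separator:   {:?}", delims.element as char);
    println!("component separator: {:?}", delims.component as char);
    println!("segment terminator:  {:?}", delims.segment as char);
    Ok(())
}
```

Because the header is fixed-length, the terminator is known before any segment has to be parsed, which a hard-coded sed expression cannot take advantage of.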
$ brew tap clarkema/nomad
$ brew install x12pp
$ cargo install x12pp
x12pp is written in Rust, so you'll need an up-to-date Rust installation in
order to build it from source. The result is a statically-compiled binary at
target/release/x12pp, which you can copy wherever you need.
$ git clone https://github.com/clarkema/x12pp
$ cd x12pp
$ cargo build --release
$ ./target/release/x12pp --version
$ x12pp < FILE > NEWFILE
$ x12pp FILE -o NEWFILE
# Strip newlines out instead with:
$ x12pp --uglify FILE
See the manpage or --help for more.
All tests were performed on an Intel Core i9-7940X, using a 1.3G X12 test file
located on a RAM disk. In each case, shell redirection was used to
pipe the file through the test command and into /dev/null in order to get
as close as possible to measuring pure processing time. For example:
$ time sed -e 's/~/~\n/g' < test-file > /dev/null
|GNU sed 4.7||
||✗||✓ but slower||✗||8.5s|
- 'SIGPIPE' refers to whether a command can return a partial result without having to process the entire input. One of the motivations for x12pp was to be able to run x12pp < FILE | head -n 100 without having to plough through a multi-gigabyte file.
- Of course, you could write a Perl script that correctly reads the segment terminator before processing the rest of the file.
- Perl produces the correct output with input data that is already wrapped, but much more slowly: around 24 seconds compared to 8.5.
- See https://github.com/notpeter/edicat for edicat
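
To make the 'SIGPIPE' and pre-wrapped-input notes concrete, here is a minimal sketch of a streaming wrapper that stops quietly when the reader goes away. It is not x12pp's implementation: it works a byte at a time for clarity, not speed, and it hard-codes `~` as the terminator, where a real tool would read it from the header. It also assumes the terminator is not itself a newline or carriage return. The function name `wrap_segments` is hypothetical. Rust programs ignore SIGPIPE by default, so a write to a closed pipe returns an `io::ErrorKind::BrokenPipe` error that can be treated as a normal exit.

```rust
use std::io::{self, BufReader, BufWriter, Read, Write};

/// Copy `input` to `output`, writing a newline after each `terminator` byte.
/// Existing CR and LF bytes are dropped, so input that is already wrapped
/// is not double-spaced. Assumes `terminator` is neither CR nor LF.
fn wrap_segments<R: Read, W: Write>(input: R, mut output: W, terminator: u8) -> io::Result<()> {
    for byte in input.bytes() {
        let b = byte?;
        match b {
            b'\n' | b'\r' => {} // discard any existing line breaks
            _ => {
                output.write_all(&[b])?;
                if b == terminator {
                    output.write_all(b"\n")?;
                }
            }
        }
    }
    output.flush()
}

fn main() {
    let stdin = io::stdin();
    let stdout = io::stdout();
    let result = wrap_segments(
        BufReader::new(stdin.lock()),
        BufWriter::new(stdout.lock()),
        b'~', // assumption: hard-coded here; the header should supply this
    );
    match result {
        Ok(()) => {}
        // The downstream reader (for example head) has closed the pipe:
        // stop without reading the rest of the input.
        Err(e) if e.kind() == io::ErrorKind::BrokenPipe => {}
        Err(e) => {
            eprintln!("error: {}", e);
            std::process::exit(1);
        }
    }
}
```

Since every write error is propagated with `?`, the loop ends at the first failed write after `head` exits. It does not go on to read the remainder of a multi-gigabyte file.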