v2.0.0 Release: Named Fields

@jondegenhardt jondegenhardt released this 11 Jul 04:11
v2.0.0
4295ba5

To download and unpack prebuilt binaries:

$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v2.0.0/tsv-utils-v2.0.0_linux-x86_64_ldc2.tar.gz | tar xz

$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v2.0.0/tsv-utils-v2.0.0_osx-x86_64_ldc2.tar.gz | tar xz

Installation instructions are in the ReleasePackageReadme.txt file in the release package.

To be notified of new releases:

GitHub supports notification of new releases. Click the "Watch" button on the repository page and select "Releases Only".

Release 2.0.0 Changes: Named Field Support

Release 2.0.0 adds named field support to all tools in the tsv-utils toolkit. This is a significant usability improvement.

Named fields can be used with any file or data stream that has a header line. They are enabled by the -H (--header) option. Field numbers can still be used, just as in prior versions of the toolkit. Glob-style wildcards are supported, and escapes can be used to specify field names containing special characters.

Details are available in the Field Syntax section of the Tools Reference manual.

Examples - Assume a file with the header fields:

 1    test_name
 2    run
 3    elapsed_time
 4    user_time
 5    system_time
 6    max_memory
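
A numbered listing like the one above can be produced for any file with standard Unix tools. The following is a minimal sketch, not part of tsv-utils: the file name `data.tsv` and the single data row are hypothetical sample values, and the header is assumed to be tab-separated on the first line.

```shell
# Hypothetical sample file with the six header fields used in these notes.
printf 'test_name\trun\telapsed_time\tuser_time\tsystem_time\tmax_memory\n' > data.tsv
printf 'test1\t1\t120\t95\t18\t2048\n' >> data.tsv   # made-up data row

# Print one header field per line, prefixed with its field number.
head -n 1 data.tsv | tr '\t' '\n' | nl
```

This makes it easy to check which field number a name corresponds to before mixing names and numbers in a command.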

Commands like the following can be used:

$ # Select individual fields, like 'cut'
$ tsv-select data.tsv -H -f user_time            # Field  4
$ tsv-select data.tsv -H -f test_name,user_time  # Fields 1,4
$ tsv-select data.tsv -H -f '*_time'             # Fields 3,4,5

$ # Filter lines using numeric comparisons against individual fields
$ tsv-filter data.tsv -H --lt elapsed_time:100
$ tsv-filter data.tsv -H --gt elapsed_time:100 --lt system_time:20

$ # Statistical summaries
$ tsv-summarize data.tsv -H --median elapsed_time
$ tsv-summarize data.tsv -H --median '*_time'
$ tsv-summarize data.tsv -H --group-by test_name --median '*_time'

$ # Uniq'ing on a field
$ tsv-uniq data.tsv -H -f test_name

$ # Joins - Assume another file 'test_info.tsv' with 'test_name' and
$ # 'expected_time' fields. A join can be performed using column names.
$ tsv-join -H -f test_info.tsv data.tsv --key-fields test_name --append-fields expected_time
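
To see why `'*_time'` resolves to fields 3,4,5 in the examples above, the matching can be imitated in plain shell. This is only an illustration of glob-style matching against header names; it is not how tsv-utils implements it, and the variable names are hypothetical.

```shell
# Illustration only: emulate glob-style matching of header names in plain shell.
header='test_name run elapsed_time user_time system_time max_memory'
pattern='*_time'

i=0
matches=''
for name in $header; do
    i=$((i + 1))
    case "$name" in
        # Unquoted $pattern is treated as a glob pattern by 'case'.
        $pattern) matches="${matches:+$matches,}$i" ;;
    esac
done

echo "$matches"   # prints 3,4,5
```

The pattern is quoted on the command line (`'*_time'`) so that the shell passes it through unchanged instead of expanding it against file names in the current directory.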

See the reference docs or online help for details on specific tools. There is also documentation in the Tools Overview section of the main project README file.

Named field support addresses enhancement request #25. It was implemented via PRs #284 through #300.

Other Changes

  • Prebuilt binaries have been updated to use the latest LDC compiler (ldc-1.22.0).