
@jondegenhardt released this Jun 14, 2019

One change:

  • Fixes incorrect comma use in the dub.json file. This is needed for planned changes in dub, and also for dlang CI pipelines.

There are no changes to any of the tools.

To download and unpack the prebuilt binaries:

$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.4.2/tsv-utils-v1.4.2_linux-x86_64_ldc2.tar.gz | tar xz

$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.4.2/tsv-utils-v1.4.2_osx-x86_64_ldc2.tar.gz | tar xz

@jondegenhardt released this Apr 7, 2019

This release contains one new feature and several performance improvements:

  • tsv-uniq --number - Line numbering grouped by key (new feature). The key is either the whole line or a subset of fields. Each unique key gets its own set of line numbers. See the tsv-uniq reference for details.
  • Improved I/O read performance. This was achieved by using a buffered version of std.stdio.File.byLine. Especially effective for narrow files. Tools using byLine (most of the tools) typically see a 10-40% performance gain, depending on tool and type of file (measured on OS X). Implementation documentation: tsv_utils.common.utils.bufferedByLine.
  • Updated compiler to LDC 1.15.0 for prebuilt binaries (frontend/druntime/phobos 2.085.1). This includes an update to LLVM 8.0 and a couple of improvements to memory allocation and garbage collection. The memory and GC improvements sped up several of the tools, especially tools like tsv-join that allocate large amounts of memory.
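The grouped numbering described for tsv-uniq --number can be illustrated with a small Python sketch. It is not the tool's D implementation: `number_by_key` is a hypothetical helper that pairs each line with its running count for its key, using 1-based field indexes as the tsv-utils field options do. It makes no claim about where the real tool places the number in its output; see the tsv-uniq reference for that.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Sequence, Tuple


def number_by_key(lines: Iterable[str],
                  key_fields: Optional[Sequence[int]] = None,
                  delim: str = "\t") -> List[Tuple[int, str]]:
    """Pair each line with its 1-based position among lines sharing its key.

    key_fields holds 1-based field indexes; None means the whole line is the key.
    (Hypothetical helper for illustration only.)
    """
    counts = defaultdict(int)  # key -> number of lines seen so far with that key
    result = []
    for line in lines:
        if key_fields is None:
            key = line
        else:
            fields = line.split(delim)
            key = tuple(fields[i - 1] for i in key_fields)
        counts[key] += 1  # each unique key gets its own numbering sequence
        result.append((counts[key], line))
    return result


rows = ["a\t1", "b\t2", "a\t3", "a\t1"]
print(number_by_key(rows, key_fields=[1]))
# [(1, 'a\t1'), (1, 'b\t2'), (2, 'a\t3'), (3, 'a\t1')]
print(number_by_key(rows))
# [(1, 'a\t1'), (1, 'b\t2'), (1, 'a\t3'), (2, 'a\t1')]
```

Keying on field 1 groups all the `a` lines into one sequence; keying on the whole line only groups exact duplicates.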

To download and unpack the prebuilt binaries:

$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.4.1/tsv-utils-v1.4.1_linux-x86_64_ldc2.tar.gz | tar xz

$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.4.1/tsv-utils-v1.4.1_osx-x86_64_ldc2.tar.gz | tar xz
Apr 7, 2019
v1.4.0 release

@jondegenhardt released this Nov 12, 2018

This release modifies tsv-sample random value printing so that most values are printed in decimal notation, without exponents. This makes the output easier to process with GNU sort: sorting numbers with exponents requires "general numeric" order (option 'g'), which is much slower than "numeric" order (option 'n'). See Shuffling large files on the Tips and Tricks page for more info.
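A minimal Python sketch of the formatting difference, assuming ten digits after the decimal point (an illustrative choice, not necessarily tsv-sample's precision); `format_random_value` is a hypothetical helper. Python's default float repr switches to exponent form for small values, which GNU sort's numeric (`-n`) order does not understand; fixed-point output avoids that.

```python
def format_random_value(value: float, digits: int = 10) -> str:
    """Format a value in [0, 1) in fixed-point notation, never with an exponent.

    digits=10 is an assumption for illustration, not tsv-sample's actual precision.
    """
    return f"{value:.{digits}f}"


values = [0.5, 0.0000123, 0.25]
print([repr(v) for v in values])  # ['0.5', '1.23e-05', '0.25']  (exponent form)
formatted = [format_random_value(v) for v in values]
print(formatted)                  # ['0.5000000000', '0.0000123000', '0.2500000000']

# For values in [0, 1) written with the same number of digits, the order of
# the formatted strings matches the numeric order of the values.
assert sorted(formatted) == [format_random_value(v) for v in sorted(values)]
```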

To download and unpack the prebuilt binaries:

$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.3.2/tsv-utils-v1.3.2_linux-x86_64_ldc2.tar.gz | tar xz

$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.3.2/tsv-utils-v1.3.2_osx-x86_64_ldc2.tar.gz | tar xz

@jondegenhardt released this Nov 11, 2018

In this release:

  • tsv-sample: Adds the full line as a key option for distinct sampling. This completes the work that has been done on sampling over the last few point releases. tsv-sample now supports a fair set of sampling modes. Performance is also good, in keeping with the tradition of the other tsv-utils tools.
  • Prebuilt binaries have been updated to use the latest LDC compiler (1.12.0). This is a significant performance boost to regex search in tsv-filter. Unfortunately csv2tsv is a little slower.
  • The build system now supports using LDC's LTO compiled druntime and phobos libraries (those shipped with the compiler). This eliminates the need to download the druntime and phobos source code at build time. This is more convenient and supports package managers better.
  • Code-level documentation now renders well in the dpldocs documentation system. See the tsv-utils code documentation for the result.
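Distinct sampling keeps or drops all lines that share a key together. The Python sketch below assumes the key is hashed, mapped into [0, 1), and compared against the sampling probability. CRC32 from `zlib` is a stand-in chosen for reproducibility, not the hash tsv-sample uses, and `distinct_sample` is a hypothetical name. Passing no key fields uses the full line as the key, which is the case added in this release.

```python
import zlib
from typing import Iterable, Iterator, Optional, Sequence


def distinct_sample(lines: Iterable[str], prob: float,
                    key_fields: Optional[Sequence[int]] = None,
                    delim: str = "\t") -> Iterator[str]:
    """Keep each line whose key hashes below prob; every line with the same
    key gets the same decision. key_fields=None uses the full line as key.
    (Hypothetical helper for illustration only.)
    """
    for line in lines:
        if key_fields is None:
            key = line
        else:
            fields = line.split(delim)
            key = delim.join(fields[i - 1] for i in key_fields)  # 1-based fields
        # Deterministic hash mapped to [0, 1); CRC32 is an illustrative stand-in.
        bucket = zlib.crc32(key.encode("utf-8")) / 2**32
        if bucket < prob:
            yield line
```

Because the decision depends only on the key, duplicate lines are either all kept or all dropped, which is what distinguishes this from per-line Bernoulli sampling.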

To download and unpack the prebuilt binaries:

$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.3.1/tsv-utils-v1.3.1_linux-x86_64_ldc2.tar.gz | tar xz

$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.3.1/tsv-utils-v1.3.1_osx-x86_64_ldc2.tar.gz | tar xz
Nov 11, 2018
Release 1.3.0

@jondegenhardt released this Oct 21, 2018

This release adds several new sampling algorithms that improve runtime performance and memory utilization for a number of sampling use cases. There are no new forms of sampling, just additional algorithms. The new algorithms:

  • A skip sampling implementation of Bernoulli sampling.
  • An implementation of reservoir sampling "Algorithm R" used for unweighted random sampling.
  • A line order randomization algorithm based on array shuffling.
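The three algorithms named above can be sketched in Python as follows. These are textbook versions written for illustration, not tsv-sample's D code, and the helper names are hypothetical. Skip sampling replaces one random trial per line with a geometrically distributed gap to the next selected line. Algorithm R keeps a fixed-size reservoir. Line order randomization reads all lines and shuffles the array.

```python
import math
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def bernoulli_skip_sample(items: Iterable[T], prob: float,
                          rng: random.Random) -> Iterator[T]:
    """Bernoulli sampling via skip counts: draw the number of items to skip
    before the next selected one, instead of one random trial per item."""
    if prob >= 1.0:
        yield from items
        return
    if prob <= 0.0:
        return
    log_q = math.log(1.0 - prob)

    def next_skip() -> int:
        u = 1.0 - rng.random()            # u in (0, 1], so log(u) is finite
        return int(math.log(u) / log_q)   # geometric: P(skip >= k) = (1 - prob)^k

    skip = next_skip()
    for item in items:
        if skip == 0:
            yield item
            skip = next_skip()
        else:
            skip -= 1


def reservoir_sample(items: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Algorithm R: keep the first k items; afterwards item i (0-based)
    replaces a random reservoir slot with probability k / (i + 1)."""
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


def shuffle_lines(items: Iterable[T], rng: random.Random) -> List[T]:
    """Line order randomization: read all items into an array, then shuffle it."""
    lines = list(items)
    rng.shuffle(lines)  # in-place shuffle of the array
    return lines
```

A seeded `random.Random` makes runs reproducible; for example, `list(bernoulli_skip_sample(range(1000), 0.1, random.Random(42)))` selects roughly 100 of the 1000 inputs while consuming far fewer random numbers than one trial per line.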

Formal performance benchmarks have not been run. However, tests run on Mac OS as part of development show favorable results relative to other available tools, including GNU shuf.

To download and unpack the prebuilt binaries:

$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.2.3/tsv-utils-v1.2.3_linux-x86_64_ldc2.tar.gz | tar xz

$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.2.3/tsv-utils-v1.2.3_osx-x86_64_ldc2.tar.gz | tar xz

@jondegenhardt released this Oct 7, 2018

This release adds new capabilities and performance improvements to tsv-sample. Documentation was also updated to improve clarity. Key changes:

  • New feature: Simple random sampling with replacement - All lines from input sources are read in, then lines are repeatedly selected at random and written out. Lines can be output multiple times. The process continues until the specified number of samples has been written. Invoke using the -r|--replace and -n|--num NUM options.
  • New feature: Random value printing - A new feature was added for generating random values for all input lines. In the default case it shows the values used for Bernoulli sampling trials. It can also be used with 'distinct' sampling to show the sampling bucket a line is placed in based on the key-fields specified. This feature is invoked with the --gen-random-inorder option. A related feature, --print-random, was updated so that it is now supported by all applicable sampling modes.
  • Line order randomization performance improvements: One of the basic tsv-sample use cases is line order randomization. The case where all input lines are being permuted was rewritten and is now quite a bit faster and uses less memory. This applies to both weighted and unweighted sampling. (The case where a subsample is taken via the -n|--num option uses reservoir sampling and was already fast.)
  • Command line option change - The option for specifying the probability used for Bernoulli sampling was changed from -r|--rate to -p|--prob. This was done to create a more consistent set of option names for new features and features that may be added in the future.
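Sampling with replacement, as described for `-r|--replace` with `-n|--num`, can be sketched in Python. This illustrates the technique under the assumption that all lines are held in memory and each output line is an independent uniform draw; it is not tsv-sample's code, and `sample_with_replacement` is a hypothetical name.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_with_replacement(lines: Sequence[T], num: int,
                            rng: random.Random) -> List[T]:
    """Simple random sampling with replacement: each of the num draws picks
    uniformly from all input lines, so a line can appear more than once.
    (Hypothetical helper for illustration only.)
    """
    if not lines:
        return []
    return [lines[rng.randrange(len(lines))] for _ in range(num)]


rng = random.Random(2019)  # seeded for reproducible output
print(sample_with_replacement(["a", "b", "c"], 5, rng))
```

Unlike reservoir sampling, the output size is not bounded by the input size: asking for more samples than there are lines simply repeats lines.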

To download and unpack the prebuilt binaries:

$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.2.2/tsv-utils-v1.2.2_linux-x86_64_ldc2.tar.gz | tar xz

$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.2.2/tsv-utils-v1.2.2_osx-x86_64_ldc2.tar.gz | tar xz

@jondegenhardt released this Aug 3, 2018

This release adds features for tsv-utils automated tests. There are no changes to any of the tools.

The new testing features support alternate correct outputs for different compiler/library versions. The main case is changes to error message text, which in some cases includes text from the Phobos library.

Alternate test outputs were added for a planned change to Phobos in an upcoming release. This was bundled into a tagged release to support the D language project tester where tsv-utils is used.

To download and unpack the prebuilt binaries:

$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.2.1/tsv-utils-v1.2.1_linux-x86_64_ldc2.tar.gz | tar xz

$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.2.1/tsv-utils-v1.2.1_osx-x86_64_ldc2.tar.gz | tar xz

@jondegenhardt released this Jul 16, 2018

This release changes the repository name from eBay/tsv-utils-dlang to eBay/tsv-utils. This better reflects the functionality provided by the TSV Utilities. There are no other changes. Please report any issues found with the name change on the Issues page.
