terse

Outputs randomly sampled lines from an input stream or file. Uses a simple reservoir sampling algorithm, so the input is processed in linear time. Suitable for streams, since each line is read only once. Retains the relative order of lines.

Usage example

> seq 1000000 | terse -n 5
349893
539678
576919
738393
758023

Performance

Comparison against shuf -n on real data: a 5.1 GB nginx log with 17451712 lines.

root@logger:~# ls -lh /var/log/remote/nginx/2023_02_02_18.log
-rw-r----- 1 root logs 5.1G Feb  2 18:59 /var/log/remote/nginx/2023_02_02_18.log
root@logger:~# wc -l /var/log/remote/nginx/2023_02_02_18.log
17451712 /var/log/remote/nginx/2023_02_02_18.log
root@logger:~# time terse -i /var/log/remote/nginx/2023_02_02_18.log -n 25 > /dev/null

real    0m2.656s
user    0m1.315s
sys     0m1.372s
root@logger:~# time shuf -n 25 /var/log/remote/nginx/2023_02_02_18.log > /dev/null

real    0m22.784s
user    0m21.059s
sys     0m1.703s

It processes on the order of tens of millions of lines per second on a modern computer. In practice, I/O is more likely to be the bottleneck in such sampling than the application itself.

Installation

Binaries

Pre-built binaries are available here.

Build from source

Alternatively, you may install terse from source. Run the following within the source directory:

make install

Docker

A Docker image is available as well. Here is an example of running terse in a pipeline with Docker:

seq 5 | docker run -i --rm yarmak/terse

Synopsis

> terse -h
Usage:

terse [OPTION]...

Options:
  -buffered
    	buffer control (default true)
  -i string
    	use input file instead of stdin
  -n int
    	number of lines to sample (default 25)
  -o string
    	use output file instead of stdout
  -seed value
    	use fixed random seed (default is a value from CSPRNG)
  -version
    	show program version and exit
  -z	line delimiter is NUL, not newline
