Releases: brentp/gargs
small fix: run-time in log output
v0.3.8
- report run-time in log output
compression
0.3.7
- compress temporary files with gzip (BestSpeed)
- default to bash instead of sh if it exists and SHELL is not specified
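The bash default can still be overridden with --shell; a minimal sketch, where files.txt is an illustrative input list:
    # run each command under sh instead of the default bash
    cat files.txt | gargs --shell sh 'wc -l {}'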
v0.3.6
0.3.6
- output gargs version in help.
- restore --ordered (-o) to keep order of output same as input.
  This will cache up to 3*processes outputs while waiting for the slowest job to finish.
  This means that if the user requested 10 processes (-p 10), then there could
  be up to 30 finished jobs waiting for a slow job to finish. If these are in
  memory, they are guaranteed to take <= 1MB (+ go's overhead). If they are larger
  than 1MB, then their data will be on disk.
  This is implemented carefully so that the performance penalty will be small
  unless there are a few extremely long-running process outliers.
- set $PROCESS_I environment variable for each line (or batch of lines).
- read GARGS_PROCESS_BUFFER to let the user set the size of data buffered before a tempfile is used.
- read GARGS_WAIT_MULTIPLIER to determine how many finished processes will wait for a single slow process; higher values improve concurrency at the expense of memory.
- better cleanup of tmp files in case of process halt.
--log and default to continue on error.
0.3.5
- Flush Stdout every 2 seconds.
- Nice String() output for *Command that shows time to run, error, etc.
- Colorized errors
- Fix error/exit-code tracking when a tmpfile is used.
- remove --continue-on-error (-c) and make that the default. Introduce --stop-on-error (-s).
- add --log argument where each command is logged. If successful, it is prefixed with '#'; if not, it is printed as-is. If the entire execution ends successfully, the last line will be '# SUCCESS'; otherwise it will show, e.g., '# FAILED 3 commands'. The failed commands are easily grep'ed from the log with "grep -v ^# $log"
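A typical run with --log, plus the grep above, might look like this (samples.txt and do_one_sample are illustrative placeholders):
    # log every command; failed commands are written without the leading '#'
    cat samples.txt | gargs -p 8 --log run.log 'do_one_sample {}'
    # list only the failed commands so they can be inspected or re-run
    grep -v '^#' run.log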
usability improvements
As of this version, gargs will not read everything into memory as before. It will read up to 1MB; if it does not get an EOF by then, it starts using a temp-file. This reduces memory usage.
It also fixes --nlines to be quite useful e.g.:
cat regions.txt | gargs -p 20 -n 10 "bcftools view some.bam {}"
will send 10 regions to each process to amortize the cost of loading the index into memory. Note that the place-holder {}
is specified only once.
It also defaults --sep to "\s+" if --nlines is not specified.
Finally, it adds a --retry argument that takes an integer that indicates the number of times a failed process should be retried. This is nice for transient network errors.
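For example, a flaky network fetch could be retried a couple of times per failed process (urls.txt and the retry count are illustrative):
    # retry each failed command up to 2 times before reporting it as an error
    cat urls.txt | gargs -p 5 --retry 2 'curl -sS -O {}'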
cleanup
- add --dry-run (-d) to print (but not run) the commands, for debugging. Usage is now:
usage: gargs [--procs PROCS] [--nlines NLINES] [--sep SEP] [--shell SHELL] [--verbose] [--continue-on-error] [--ordered] [--dry-run] COMMAND
positional arguments:
command command to execute
options:
--procs PROCS, -p PROCS
number of processes to use [default: 1]
--nlines NLINES, -n NLINES
number of lines to consume for each command. -s and -n are mutually exclusive. [default: 1]
--sep SEP, -s SEP regular expression to split the line with, to fill multiple template spots; default is not to split. -s and -n are mutually exclusive.
--shell SHELL shell to use [default: bash]
--verbose, -v print commands to stderr before they are executed.
--continue-on-error, -c
report errors but don't stop the entire execution (which is the default).
--ordered, -o keep output in order of input; default is to output in order of return which greatly improves parallelization.
--dry-run, -d print (but do not run) the commands (for debugging)
--help, -h display this help and exit
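A sketch of the new flag in action (names.txt is an illustrative input file):
    # print the generated commands without executing them
    cat names.txt | gargs --dry-run 'echo hello {}'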
positional arguments
v0.2.0
- allow specifying multiple arguments with {0}, {1}, {2}
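A sketch combining the numbered placeholders with the --sep splitting documented above (intervals.txt is illustrative):
    # fill {0}, {1}, {2} from the whitespace-separated fields of each line
    cat intervals.txt | gargs --sep "\s+" -p 4 'echo chrom:{0} start:{1} end:{2}'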