Multitool - Command Line Reference

multitool [param] [param] ...

The first tap must be a source and the last tap must be a sink.
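The tap-ordering rule can be checked before a job is launched. The sketch below is illustrative Python, not part of Multitool. It assumes each param is written as `name=value`; the helpers `param_name` and `check_tap_order` and the file paths are hypothetical.

```python
def param_name(param):
    """Return the part of a param before '=', e.g. 'source' for 'source=in.txt'."""
    return param.split("=", 1)[0]


def check_tap_order(params):
    """Raise ValueError unless the first tap is a source and the last is a sink."""
    # Taps are the bare 'source' and 'sink' params; 'source.delim' etc. are modifiers.
    taps = [param_name(p) for p in params if param_name(p) in ("source", "sink")]
    if not taps or taps[0] != "source":
        raise ValueError("first tap must be a source")
    if taps[-1] != "sink":
        raise ValueError("last tap must be a sink")


# Hypothetical params (assumed name=value syntax); the paths are placeholders.
params = ["source=input.txt", "source.hasheader=true", "select=error", "sink=output"]
check_tap_order(params)
command = ["multitool", *params]
print(" ".join(command))
```

Filtering on the bare names `source` and `sink` keeps modifiers such as `source.hasheader` from being mistaken for taps.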

options:
  -h|--help                show this help text
  --markdown               generate help text as GitHub Flavored Markdown
  --dot=filename           write a plan DOT file, then exit

taps:
  source                   a URL to the input data
  source.name              name of this source, required if there is more than one
  source.skipheader        set true if the first line should be skipped
  source.hasheader         set true if the first line should be used for field names
  source.delim             delimiter used to separate fields
  source.seqfile           read from a sequence file instead of text; specify N fields, or 'true'
  sink                     a URL to the output path
  sink.select              fields to sink
  sink.replace             set true if output should be overwritten
  sink.compress            compression: enable, disable, or default
  sink.writeheader         set true to write field names as the first line
  sink.delim               delimiter used to separate fields
  sink.seqfile             write to a sequence file instead of text; writeheader, delim, and compress are ignored

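One reading of `source.hasheader`, `source.skipheader`, and `source.delim` can be mimicked in plain Python on in-memory lines. This sketch does not call Multitool. The function `read_source` is hypothetical, and naming fields by position when no header is used is an assumption.

```python
def read_source(lines, delim="\t", hasheader=False, skipheader=False):
    """Split text lines into (field_names, rows), mimicking the source.* options.

    Assumption: without a header, fields are named by position (0, 1, ...).
    """
    rows = [line.split(delim) for line in lines]
    names = None
    if hasheader:
        # The first line supplies the field names and is not treated as data.
        names, rows = rows[0], rows[1:]
    elif skipheader:
        # The first line is dropped; fields stay positional.
        rows = rows[1:]
    if names is None:
        names = list(range(len(rows[0]))) if rows else []
    return names, rows


lines = ["name,plays", "muse,10", "beck,7"]
names, rows = read_source(lines, delim=",", hasheader=True)
print(names)  # ['name', 'plays']
print(rows)   # [['muse', '10'], ['beck', '7']]
```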
operations:
  reject                   regex, matches are discarded. all fields are matched unless args is specified
  reject.args              fields to match against
  select                   regex, matches are kept. all fields are matched unless args is specified
  select.args              fields to match against
  cut                      split the first field, and return the given fields, or all fields. 0 for first, -1 for last
  cut.delim                regex delimiter, default: '\t' (TAB)
  parse                    parse the first field with the given regex
  parse.groups             regex groups, comma delimited
  retain                   narrow the stream to the given fields. 0 for first, -1 for last
  discard                  narrow the stream by removing the given fields. 0 for first, -1 for last
  pgen                     parse the first field with the given regex, return as new tuples
  replace                  regex to apply; matches are replaced
  replace.replace          replacement string
  replace.replaceAll       true if pattern should be applied more than once
  group                    what fields to group/sort on, grouped fields are sorted
  group.secondary          fields to secondary sort on
  group.secondary.reverse  set true to reverse secondary sort
  join                     what fields to join and group on, grouped fields are sorted
  join.lhs                 source name of the lhs of the join
  join.lhs.group           lhs fields to group on, default FIRST
  join.rhs                 source name of the rhs of the join
  join.rhs.group           rhs fields to group on, default FIRST
  join.joiner              join type: inner, outer, left, right
  join.name                branch name
  concat                   join the given fields, will join ALL by default
  concat.delim             delimiter, default: '\t' (TAB)
  gen                      split the first field, and return the given result fields as new tuples
  gen.delim                regex delimiter, default: '\t' (TAB)
  count                    count the number of values in the grouping
  sum                      sum the values in the grouping
  expr                     use Java expression as function, e.g. $0.toLowerCase()
  expr.args                the fields to use as arguments
  sexpr                    use Java expression as filter, e.g. $0 != null
  sexpr.args               the fields to use as arguments
  debug                    print tuples to stdout of task JVM
  debug.prefix             a value to distinguish which branch debug output is coming from
  filename                 include the filename from which the current value was found
  filename.append          append the filename to the record
  filename.only            only return the filename
  unique                   return the first value in each grouping
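
The `select`, `reject`, and `cut` descriptions can be mimicked on in-memory lines to show the field positions (0 for first, -1 for last). This is a rough Python sketch of the described behavior, not Multitool code. The function names mirror the operations for readability only, and treating a match anywhere in the line as a match is an assumption.

```python
import re


def select(lines, pattern):
    """Keep lines matching the regex (assumed: a match anywhere in the line counts)."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


def reject(lines, pattern):
    """Discard lines matching the regex; the complement of select."""
    rx = re.compile(pattern)
    return [line for line in lines if not rx.search(line)]


def cut(lines, fields, delim="\t"):
    """Split each line on the regex delimiter and return the given field positions.

    Positions follow the reference: 0 is the first field, -1 is the last.
    """
    rx = re.compile(delim)
    return [[rx.split(line)[i] for i in fields] for line in lines]


lines = ["GET\t/index.html\t200", "GET\t/missing\t404", "POST\t/form\t200"]
kept = select(lines, r"^GET")
print(cut(kept, [0, -1]))  # [['GET', '200'], ['GET', '404']]
print(reject(lines, r"404$"))
```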

This release is licensed under the Apache Software License 2.0.
