ignore-me edited this page Sep 9, 2014 · 2 revisions

cgrep

cgrep is a wrapper around cmr-grep that provides an interface familiar to users of grep.

The general form of a cgrep command is:

cgrep [-flags] "<pattern>" "<fileglob>"

Find rock in any of the files in mountain.

cgrep "rock" "/mnt/gv0/user/hive/warehouse/mountain/*"
{"name":"rock", "type":"rock"}

Find rock in any of the files in mountain and output to rocks.out.

cgrep "rock" "/mnt/gv0/user/hive/warehouse/mountain/*" > rocks.out

Additionally, ignore case:

cgrep -i "rock" "/mnt/gv0/user/hive/warehouse/mountain/*"
{"name":"rock", "type":"rock"}
{"name":"Rockman", "type":"robot"}

supported flags

  -i, --ignore-case         ignore case distinctions
  -E, --extended-regexp     PATTERN is an extended regular expression (ERE)
  -e, --regexp PATTERN      use PATTERN for matching
  -P, --perl-regexp         PATTERN is a Perl regular expression
  -v, --invert-match        select non-matching lines
  -c, --count               print only a count of matching lines per FILE
  -o, --only-matching       show only the part of a line matching PATTERN
  -n, --line-number         print line number with output lines
  -q, --quiet               suppress all normal output
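
Because cgrep is described as mirroring grep's interface, the flags above can be tried out on a small local file with plain grep before running anything against the cluster. The sketch below assumes that cgrep's flags behave like their grep namesakes, which this page does not spell out in detail. The sample file and the variable names are made up for illustration.

```shell
# Hypothetical sample data, mirroring the records shown in the examples above.
sample=$(mktemp)
printf '%s\n' \
  '{"name":"rock", "type":"rock"}' \
  '{"name":"Rockman", "type":"robot"}' > "$sample"

# -i: ignore case, so both records match.
matches_i=$(grep -i "rock" "$sample")

# -c: print only the number of matching lines (case-sensitive here).
count=$(grep -c "rock" "$sample")

# -v: select the lines that do NOT match.
inverted=$(grep -v "rock" "$sample")

# -o: print only the part of each line that matches the pattern.
only=$(grep -o '"type":"[a-z]*"' "$sample")

echo "$matches_i"
echo "$count"
echo "$inverted"
echo "$only"
rm -f "$sample"
```

This runs plain grep on a local temporary file, not cgrep, so it only illustrates what each flag is expected to select.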

cmr-grep

cmr-grep is the client that cgrep is built on top of. It provides mostly the same functionality through a flag-based interface, which makes it slightly more verbose.

The general form of a cmr-grep command is:

cmr-grep --input "<input>" --pattern "<pattern>" -o "<output>" [--flags "<flags>"]

Find rock in any of the files in the mountain.

cmr-grep --pattern "rock" --input "/mnt/gv0/user/hive/warehouse/mountain/pdate=2014-01-01/ptime=0/*" --stdout
{"name":"rock", "type":"rock"}

Find rock in any of the files in the mountain and output to rocks.out.

cmr-grep --pattern "rock" --input "/mnt/gv0/user/hive/warehouse/mountain/*" --stdout > rocks.out

Additionally, ignore case:

cmr-grep --flags "-i" --pattern "rock" --input "/mnt/gv0/user/hive/warehouse/mountain/*" --stdout
{"name":"rock", "type":"rock"}
{"name":"Rockman", "type":"robot"}

cget

cget is a wrapper around cmr specifically for querying JSON-formatted time-series data. It provides a simplified interface to a subset of the functionality that cmr provides. cget requires some specific configuration in order to function correctly. See Configuration for more information.

The general form of a cget command is:

cget --select "<fields>" --from "<table>" --filter "<filters>" --between "<start-date> and <end-date>"

Find out how many bananas were purchased each day during January of 2014.

cget --select "day" --from "purchases" --filter "+type:banana" --between "2014-01-01 and 2014-02-01"
2014-01-01  38
2014-01-02  42
2014-01-03  16
2014-01-04  54
... etc ...
2014-01-31  25

Find out how many bananas or oranges were purchased on the first of January 2014.

cget --select "day,type" --from "purchases" --filter "+type:banana,+type:orange" --between "2014-01-01 and 2014-01-02"
2014-01-01  banana  38
2014-01-01  orange  22

cmr

cmr is the client that cget is built on top of. It provides an interface for performing map-reduce jobs. cmr is packaged with a general JSON mapper and a reducer, which are used to implement cget.

The general form of a cmr command is:

cmr --input "<input>" --mapper "<mapper>" --reducer "<reducer>" -o "<output>"

Find out how many bananas were purchased each day during January of 2014.

cmr --input "/mnt/gv0/user/hive/warehouse/purchases/day=2014-01-*/*" --mapper "cmr_map_json day _1 +type:banana" --reducer "cmr_reduce s" --stdout
2014-01-01  38
2014-01-02  42
2014-01-03  16
2014-01-04  54
... etc ...
2014-01-31  25
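
To make the map and reduce steps concrete, the per-day banana count can be pictured as a local pipeline. This is only a sketch of the general map-reduce idea, assuming the job amounts to emitting a 1 for every matching record and summing those values per day. It does not use cmr, cmr_map_json, or cmr_reduce, and it says nothing about how those tools are implemented. The sample records and the function names map_step and reduce_step are hypothetical.

```shell
# Hypothetical purchase records, one JSON object per line.
records=$(mktemp)
printf '%s\n' \
  '{"day":"2014-01-01","type":"banana"}' \
  '{"day":"2014-01-01","type":"orange"}' \
  '{"day":"2014-01-01","type":"banana"}' \
  '{"day":"2014-01-02","type":"banana"}' > "$records"

# Map: keep banana records and emit "<day> 1" for each of them.
map_step() {
  grep '"type":"banana"' "$1" |
    sed -E 's/.*"day":"([^"]*)".*/\1 1/'
}

# Reduce: sum the emitted values per day and print "<day><TAB><total>".
reduce_step() {
  awk '{ total[$1] += $2 } END { for (day in total) print day "\t" total[day] }' |
    sort
}

result=$(map_step "$records" | reduce_step)
echo "$result"
rm -f "$records"
```

With the four sample records, the pipeline prints one line per day with the number of banana purchases, in the same day-then-count layout as the output above.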

Find out how many bananas or oranges were purchased on the first of January 2014.

cmr --input "/mnt/gv0/user/hive/warehouse/purchases/day=2014-01-01/*" --mapper "cmr_map_json day _1 +type:banana +type:orange" --reducer "cmr_reduce s" --stdout
2014-01-01  banana  38
2014-01-01  orange  22