
JSON is an excellent data interchange format and is rapidly becoming the preferred format for Web APIs. Thus far, most of the tools to process it are very limited. Yet, when working in Javascript, JSON is fluid and natural.

Why can't command-line Javascript be easy?

Underscore-CLI can be a simple pretty printer:

cat data.json | underscore print

Or it can form the backbone of a rich, full-powered Javascript command-line, inspired by "perl -pe", and doing for structured data what sed, awk, and grep do for text.

cat example-data/earthporn.json | underscore extract 'data.children' | underscore pluck data | underscore pluck title

See Real World Examples for the output and more examples.

Underscore-CLI is:

  • FLEXIBLE - THE "swiss-army-knife" tool for processing JSON data - can be used as a simple pretty-printer, or as a full-powered Javascript command-line
  • POWERFUL - Exposes the full power and functionality of underscore.js (plus underscore.string, json:select, and CoffeeScript)
  • SIMPLE - Makes it simple to write JS one-liners similar to using "perl -pe"
  • CHAINED - Multiple command invocations can be chained together to create a data processing pipeline
  • MULTI-FORMAT - Rich support for input / output formats - pretty-printing, strict JSON, etc. See Data Formats
  • DOCUMENTED - Excellent command-line documentation with multiple examples for every command

A Bit More Explanation ...

Underscore-CLI is built on Node.js, which is less than a 4M download and very easy to install. Node.js is rapidly gaining mindshare as a tool for writing scalable services in Javascript.

Unfortunately, out-of-the-box, Node.js is pretty horrible as a command-line tool. This is what it takes to simply echo stdin:

cat foo.json | node -e '
  var data = "";
  process.stdin.on("data", function (d) {
    data = data + d;
  });
  process.stdin.on("end", function () {
    // put all your code here
    console.log(data);
  });
'
Ugly. Underscore-CLI handles all the verbose boilerplate, making it easy to do simple data manipulations:

echo '[1, 2, 3, 4]' | underscore process 'map(data, function (value) { return value+1 })'

If you are used to seeing "_.map", note that because we aren't worried about keeping the global namespace clean, many useful functions (including all of underscore.js) are exposed as globals.
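To make the "exposed as globals" idea concrete, here is a minimal sketch of how an expression string could be evaluated with 'data' and a helper function visible as globals. It uses Node's built-in vm module. The names processData and helpers are hypothetical, and the tiny map helper merely stands in for underscore.js. Treat it as an illustration under those assumptions, not as Underscore-CLI's actual implementation.

```javascript
const vm = require("vm");

// Hypothetical stand-in for the underscore.js functions exposed as globals.
const helpers = {
  map: (list, fn) => list.map((value, key) => fn(value, key, list)),
};

// Hypothetical helper: evaluate `expr` with `data` and the helpers as globals.
function processData(expr, data) {
  const sandbox = Object.assign({ data }, helpers);
  return vm.runInNewContext(expr, sandbox);
}

const out = processData(
  "map(data, function (value) { return value+1 })",
  [1, 2, 3, 4]
);
console.log(out); // [ 2, 3, 4, 5 ]
```

Because the expression runs in its own context, nothing it assigns leaks into the host process, which is one reason a cluttered "global" namespace is harmless here.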

Of course 'mapping' a function to a dataset is super common, so as a shortcut, it's exposed as a first-class command, and the expression you provide is auto-wrapped in "function (value, key, list) { return ... }".

echo '[1, 2, 3, 4]' | underscore map 'value+1'
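The auto-wrapping described above can be sketched in a few lines of plain JavaScript. The helper name wrapExpression is hypothetical, and building the function with the Function constructor is an assumption made for brevity. The tool itself may do this differently.

```javascript
// Hypothetical helper: wrap a bare expression in
// "function (value, key, list) { return ... }".
function wrapExpression(expr) {
  return new Function("value", "key", "list", "return (" + expr + ");");
}

const increment = wrapExpression("value+1");
const result = [1, 2, 3, 4].map(increment);
console.log(result); // [ 2, 3, 4, 5 ]
```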

Also, while you can pipe data in, if the data is just a string like the example above, there's a shortcut for that too:

underscore -d '[1, 2, 3, 4]' map 'value+1'

Or if it's stored in a file, and you want to write the output to another file:

underscore -i data.json map 'value+1' -o output.json

Here's what it takes to increment the patch version number (the third component) for an NPM package (straight from our Makefile):

underscore -i package.json process 'vv=data.version.split("."),vv[2]++,data.version=vv.join("."),data' -o package.json

Installing Underscore-CLI

Installing Node (command-line javascript)

Installing Node is easy. It's only a 4M download:

Download Node

Alternatively, if you do homebrew, you can:

brew install node

For more details on what node is, see this StackOverflow thread


npm install -g underscore-cli
underscore help



If you run the tool without any arguments, this is what prints out:

  underscore <command> [--in <filename>|--data <JSON>|--nodata] [--infmt <format>] [--out <filename>] [--outfmt <format>] [--quiet] [--strict] [--text] [--coffee] [--js]


  Commands:

  help [command]      Print more detailed help and examples for a specific command
  type                Print the type of the input data: {object, array, number, string, boolean, null, undefined}
  print               Output the data without any transformations. Can be used to pretty-print JSON data.
  run <exp>           Runs arbitrary JS code. Use for CLI Javascripting.
  process <exp>       Run arbitrary JS against the input data.  Expression Args: (data)
  extract <field>     Extract a field from the input data.  Also supports field1.field2.field3
  map <exp>           Map each value from a list/object through a transformation expression whose arguments are (value, key, list).
  reduce <exp>        Boil a list down to a single value by successively combining each element with a running total.  Expression args: (total, value, key, list)
  reduceRight <exp>   Right-associative version of reduce. ie, 1 + (2 + (3 + 4)). Expression args: (total, value, key, list)
  select <jselexp>    Run a 'JSON Selector' query against the input data. See
  find <exp>          Return the first value for which the expression returns a truthy value.  Expression args: (value, key, list)
  filter <exp>        Return an array of all values that make the expression true.  Expression args: (value, key, list)
  reject <exp>        Return an array of all values that make the expression false.  Expression args: (value, key, list)
  flatten             Flattens a nested array (the nesting can be to any depth). If you pass '--shallow', the array will only be flattened a single level.
  pluck <key>         Extract a single property from a list of objects
  keys                Retrieve all the names of an object's properties.
  values              Retrieve all the values of an object's properties.
  extend <object>     Override properties in the input data.
  defaults <object>   Fill in missing properties in the input data.
  any <exp>           Return 'true' if any of the values in the input make the expression true.  Expression args: (value, key, list)
  all <exp>           Return 'true' if all values in the input make the expression true.  Expression args: (value, key, list)
  isObject            Return 'true' if the input data is an object with named properties
  isArray             Return 'true' if the input data is an array
  isString            Return 'true' if the input data is a string
  isNumber            Return 'true' if the input data is a number
  isBoolean           Return 'true' if the input data is a boolean, ie {true, false}
  isNull              Return 'true' if the input data is the 'null' value
  isUndefined         Return 'true' if the input data is undefined
  template <filename> Process an underscore template and print the results. See 'help template'


  Options:

  -h, --help            output usage information
  -V, --version         output the version number
  -i, --in <filename>   The data file to load.  If not specified, defaults to stdin.
  --infmt <format>      The format of the input data. See 'help formats'
  -o, --out <filename>  The output file.  If not specified, defaults to stdout.
  --outfmt <format>     The format of the output data. See 'help formats'
  -d, --data <JSON>     Input data provided in lieu of a filename
  -n, --nodata          Input data is 'undefined'
  -q, --quiet           Suppress normal output.  'console.log' will still trigger output.
  --strict              Use strict JSON parsing instead of more lax 'eval' syntax.  To avoid security concerns, use this with ANY data from an external source.
  --text                Parse data as text instead of JSON. Sets input and output formats to 'text'
  --coffee              Interpret expression as CoffeeScript. See
  --js                  Interpret expression as JavaScript. (default is "auto")


  Examples:

  underscore map --data '[1, 2, 3, 4]' 'value+1'
  # [2, 3, 4, 5]

  underscore map --data '{"a": [1, 4], "b": [2, 8]}' '_.max(value)'
  # [4, 8]

  echo '{"foo":1, "bar":2}' | underscore map -q 'console.log("key = ", key)'
  # "key = foo\nkey = bar"

  underscore pluck --data "[{name : 'moe', age : 40}, {name : 'larry', age : 50}, {name : 'curly', age : 60}]" name
  # ["moe", "larry", "curly"]

  underscore keys --data '{name : "larry", age : 50}'
  # ["name", "age"]

  underscore reduce --data '[1, 2, 3, 4]' 'total+value'
  # 10
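For readers who think in plain JavaScript, the sketch below mirrors three of the examples above without the CLI. It uses only built-in array and object methods as stand-ins for the underscore.js functions, so it shows the shape of each operation rather than the tool's own code. The variable names are illustrative.

```javascript
// map over an object's values, like: underscore map '_.max(value)'
const maxima = Object.values({ a: [1, 4], b: [2, 8] }).map((value) =>
  Math.max(...value)
);

// pluck a single property, like: underscore pluck name
const stooges = [
  { name: "moe", age: 40 },
  { name: "larry", age: 50 },
  { name: "curly", age: 60 },
];
const names = stooges.map((stooge) => stooge.name);

// reduce to a running total, like: underscore reduce 'total+value'
const sum = [1, 2, 3, 4].reduce((total, value) => total + value);

console.log(maxima, names, sum);
```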

Data Formats


Output dense JSON using JSON.stringify

{"num":9,"str1":"Hello World","str2":"Hello World","object0":{},"object1":{"a":1,"b":2},"object2":{"3":3,"a":1,"b":2,"prop1":1,"prop2":2},"array0":[],"array1":[1,2,3,4],"array2":[1,2,null,null,null,6],"array3":[1,2,3,3],"date1":"2012-06-28T22:02:25.993Z","date2":"2012-06-28T22:02:25.993Z","err1":{},"err2":{"3":3,"prop1":1,"prop2":2},"regex1":{},"regex2":{"3":3,"prop1":1,"prop2":2},"null1":null,"deep":{"a":[{"longstr":"nuhaosenthuasoenthuasoenthuasoenthuasoenthuasnoethuasnoethuasonethuasnoethusanoethiasnoethuasonethuasoenhuasnoethuasnoethuasonethusanoethusnaoethuasnoethuiasnoeidaosneutdhaoesntuhaoesnthuasonehuasnoethuaosentuhasoenthuaosnethuasoenthuasoenthuasoentuhasnoethuasnoehuasnoethuasnoethuasonethuasnotehuasnotehuasnoethuasonetu","b":{"c":{}}}],"g":{"longstr":"nuhaosenthuasoenthuasoenthuasoenthuasoenthuasnoethuasnoethuasonethuasnoethusanoethiasnoethuasonethuasoenhuasnoethuasnoethuasonethusanoethusnaoethuasnoethuiasnoeidaosneutdhaoesntuhaoesnthuasonehuasnoethuaosentuhasoenthuaosnethuasoenthuasoenthuasoentuhasnoethuasnoehuasnoethuasnoethuasonethuasnotehuasnotehuasnoethuasonetu"}}}


Output strictly correct, human-readable JSON w/ smart whitespace

{
  "num": 9,
  "str1": "Hello World",
  "str2": "Hello World",
  "object0": { },
  "object1": { "a": 1, "b": 2 },
  "object2": { "3": 3, "a": 1, "b": 2, "prop1": 1, "prop2": 2 },
  "array0": [ ],
  "array1": [1, 2, 3, 4],
  "array2": [1, 2, null, null, null, 6],
  "array3": [1, 2, 3, 3, "prop1": 1, "prop2": 2],
  "date1": "2012-06-28T22:02:25.993Z",
  "date2": "2012-06-28T22:02:25.993Z",
  "err1": { },
  "err2": { "3": 3, "prop1": 1, "prop2": 2 },
  "regex1": { },
  "regex2": { "3": 3, "prop1": 1, "prop2": 2 },
  "null1": null,
  "deep": {
    "a": [
      {
        "longstr": "nuhaosenthuasoenthuasoenthuasoenthuasoenthuasnoethuasnoethuasonethuasnoethusanoethiasnoethuasonethuasoenhuasnoethuasnoethuasonethusanoethusnaoethuasnoethuiasnoeidaosneutdhaoesntuhaoesnthuasonehuasnoethuaosentuhasoenthuaosnethuasoenthuasoenthuasoentuhasnoethuasnoehuasnoethuasnoethuasonethuasnotehuasnotehuasnoethuasonetu",
        "b": { "c": { } }
      }
    ],
    "g": {
      "longstr": "nuhaosenthuasoenthuasoenthuasoenthuasoenthuasnoethuasnoethuasonethuasnoethusanoethiasnoethuasonethuasoenhuasnoethuasnoethuasonethusanoethusnaoethuasnoethuiasnoeidaosneutdhaoesntuhaoesnthuasonehuasnoethuaosentuhasoenthuaosnethuasoenthuasoenthuasoentuhasnoethuasnoehuasnoethuasnoethuasonethuasnotehuasnotehuasnoethuasonetu"
    }
  }
}


Output lax JSON (output is valid JS object syntax, but not strict JSON).

{
  num: 9,
  str1: 'Hello World',
  str2: 'Hello World',
  object0: { },
  object1: { a: 1, b: 2 },
  object2: { '3': 3, a: 1, b: 2, prop1: 1, prop2: 2 },
  array0: [ ],
  array1: [1, 2, 3, 4],
  array2: [1, 2, null, undefined, , 6],
  array3: [1, 2, 3, 3, prop1: 1, prop2: 2],
  date1: "2012-06-28T22:02:25.993Z",
  date2: "2012-06-28T22:02:25.993Z",
  err1: { },
  err2: { '3': 3, prop1: 1, prop2: 2 },
  regex1: { },
  regex2: { '3': 3, prop1: 1, prop2: 2 },
  null1: null,
  undef1: undefined,
  deep: {
    a: [
      {
        longstr: 'nuhaosenthuasoenthuasoenthuasoenthuasoenthuasnoethuasnoethuasonethuasnoethusanoethiasnoethuasonethuasoenhuasnoethuasnoethuasonethusanoethusnaoethuasnoethuiasnoeidaosneutdhaoesntuhaoesnthuasonehuasnoethuaosentuhasoenthuaosnethuasoenthuasoenthuasoentuhasnoethuasnoehuasnoethuasnoethuasonethuasnotehuasnotehuasnoethuasonetu',
        b: { c: { } }
      }
    ],
    g: {
      longstr: 'nuhaosenthuasoenthuasoenthuasoenthuasoenthuasnoethuasnoethuasonethuasnoethusanoethiasnoethuasonethuasoenhuasnoethuasnoethuasonethusanoethusnaoethuasnoethuiasnoeidaosneutdhaoesntuhaoesnthuasonehuasnoethuaosentuhasoenthuaosnethuasoenthuasoenthuasoentuhasnoethuasnoehuasnoethuasnoethuasonethuasnotehuasnotehuasnoethuasonetu'
    }
  }
}


Uses Node's 'util.inspect' to print the output

{ num: 9,
  str1: 'Hello World',
  str2: 'Hello World',
  object0: {},
  object1: { a: 1, b: 2 },
  object2: { '3': 3, a: 1, b: 2, prop1: 1, prop2: 2 },
  array0: [],
  array1: [ 1, 2, 3, 4 ],
  array2: [ 1, 2, null, undefined, , 6 ],
  array3: [ 1, 2, 3, 3, prop1: 1, prop2: 2 ],
  date1: Thu, 28 Jun 2012 22:02:25 GMT,
  date2: { Thu, 28 Jun 2012 22:02:25 GMT '3': 3, prop1: 1, prop2: 2 },
  err1: [Error: my err msg],
  err2: { [Error: my err msg] '3': 3, prop1: 1, prop2: 2 },
  regex1: /^78/,
  regex2: { /^78/ '3': 3, prop1: 1, prop2: 2 },
  fn1: [Function],
  fn2: [Function: fn_name],
  fn3: { [Function: fn_name] '3': 3, prop1: 1, prop2: 2 },
  null1: null,
  undef1: undefined,
  deep:
   { a: 
      [ { longstr: 'nuhaosenthuasoenthuasoenthuasoenthuasoenthuasnoethuasnoethuasonethuasnoethusanoethiasnoethuasonethuasoenhuasnoethuasnoethuasonethusanoethusnaoethuasnoethuiasnoeidaosneutdhaoesntuhaoesnthuasonehuasnoethuaosentuhasoenthuaosnethuasoenthuasoenthuasoentuhasnoethuasnoehuasnoethuasnoethuasonethuasnotehuasnotehuasnoethuasonetu',
          b: { c: {} } } ],
     g: { longstr: 'nuhaosenthuasoenthuasoenthuasoenthuasoenthuasnoethuasnoethuasonethuasnoethusanoethiasnoethuasonethuasoenhuasnoethuasnoethuasonethusanoethusnaoethuasnoethuiasnoeidaosneutdhaoesntuhaoesnthuasonehuasnoethuaosentuhasoenthuaosnethuasoenthuasoenthuasoentuhasnoethuasnoehuasnoethuasnoethuasonethuasnotehuasnotehuasnoethuasonetu' } } }


If data is a string, it is printed directly without quotes. If data is an array, elements are separated by newlines. Objects and arrays-within-arrays are JSON formatted into a single line.

{"num":9,"str1":"Hello World","str2":"Hello World","object0":{},"object1":{"a":1,"b":2},"object2":{"3":3,"a":1,"b":2,"prop1":1,"prop2":2},"array0":[],"array1":[1,2,3,4],"array2":[1,2,null,null,null,6],"array3":[1,2,3,3],"date1":"2012-06-28T22:02:25.993Z","date2":"2012-06-28T22:02:25.993Z","err1":{},"err2":{"3":3,"prop1":1,"prop2":2},"regex1":{},"regex2":{"3":3,"prop1":1,"prop2":2},"null1":null,"deep":{"a":[{"longstr":"nuhaosenthuasoenthuasoenthuasoenthuasoenthuasnoethuasnoethuasonethuasnoethusanoethiasnoethuasonethuasoenhuasnoethuasnoethuasonethusanoethusnaoethuasnoethuiasnoeidaosneutdhaoesntuhaoesnthuasonehuasnoethuaosentuhasoenthuaosnethuasoenthuasoenthuasoentuhasnoethuasnoehuasnoethuasnoethuasonethuasnotehuasnotehuasnoethuasonetu","b":{"c":{}}}],"g":{"longstr":"nuhaosenthuasoenthuasoenthuasoenthuasoenthuasnoethuasnoethuasonethuasnoethusanoethiasnoethuasonethuasoenhuasnoethuasnoethuasonethusanoethusnaoethuasnoethuiasnoeidaosneutdhaoesntuhaoesnthuasonehuasnoethuaosentuhasoenthuaosnethuasoenthuasoenthuasoentuhasnoethuasnoehuasnoethuasnoethuasonethuasnotehuasnotehuasnoethuasonetu"}}}
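The rule for this format is short enough to sketch directly. The function name formatText is hypothetical, and the code is written from the description above rather than from the tool's source, so edge cases (for example, how string elements inside an array are quoted) are assumptions.

```javascript
// Hypothetical helper implementing the described rule:
//   string        -> printed as-is, no quotes
//   array         -> one element per line
//   anything else -> single-line JSON
function formatText(data) {
  if (typeof data === "string") {
    return data;
  }
  if (Array.isArray(data)) {
    return data
      .map((el) => (typeof el === "string" ? el : JSON.stringify(el)))
      .join("\n");
  }
  return JSON.stringify(data);
}

console.log(formatText(["moe", 1, [2, 3], { b: 1 }]));
```

One line per element is what makes this format pleasant to pipe into grep, sort, or wc.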

Real World Examples

Playing with data from a webservice

Let's play with a real data source. For convenience (and consistent test results), an abbreviated version of this data is stored in example-data/earthporn.json.

First of all, note how raw unformatted JSON is really hard to parse with your eyes ...

lse,"title":"Eating breakfast in the Norwegian woods! Captured with my phone [2448x3264] ","num_comments":70,"score":960
tml":null,"selftext":"","likes":null,"saved":false,"id":"rwgmb","clicked":false,"title":"The Rugged Beauty of Zion NP Ut
ah at Sunrise [OC] (1924x2579)","num_comments":5,"score":72,"approved_by":null,"over_18":false,"hidden":false,"thumbnail
false,"title":"Falls and island near Valdez, AK on a rainy day [4200 x 3000]","num_comments":10,"score":573,"approved_by

As I've already mentioned, it would be trivial to pretty print the data with 'underscore print'. However, if we are just trying to get a sense of the structure of the data, we can do one better:

TODO: working on a 'summarize' command -- INSERT_THAT_HERE (2012-05-04)

Now, let's say that we want a list of all the image titles; using a json:select query, this is downright trivial:

cat example-data/earthporn.json | underscore select .title

Which prints:

[ 'Fjaðrárgljúfur canyon, Iceland [OC] [683x1024]',
  'New town, Edinburgh, Scotland [4320 x 3240]',
  'Sunrise in Bryce Canyon, UT [1120x700] [OC]',
  'Kariega Game Reserve, South Africa [3584x2688]',
  'Valle de la Luna, Chile [OS] [1024x683]',
  'Frosted trees after a snowstorm in Laax, Switzerland [OC] [1072x712]' ]

If we want to grep the results, 'text' is a better format choice:

cat example-data/earthporn.json | underscore select .title --outfmt text

Fjaðrárgljúfur canyon, Iceland [OC] [683x1024]
New town, Edinburgh, Scotland [4320 x 3240]
Sunrise in Bryce Canyon, UT [1120x700] [OC]
Kariega Game Reserve, South Africa [3584x2688]
Valle de la Luna, Chile [OS] [1024x683]
Frosted trees after a snowstorm in Laax, Switzerland [OC] [1072x712]
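If you are curious what a selector like '.title' amounts to, the sketch below walks a parsed JSON value and collects every property with a given name, at any depth. The function collectKey and the sample data are hypothetical. This approximates only the simplest kind of json:select query and is not how the selector engine is implemented.

```javascript
// Hypothetical helper: gather the values of every property named `key`,
// in document order, anywhere inside `node`.
function collectKey(node, key, out = []) {
  if (Array.isArray(node)) {
    for (const item of node) collectKey(item, key, out);
  } else if (node !== null && typeof node === "object") {
    for (const [k, v] of Object.entries(node)) {
      if (k === key) out.push(v);
      collectKey(v, key, out);
    }
  }
  return out;
}

// Made-up data shaped like the listing used in this section.
const sample = {
  data: {
    children: [
      { data: { title: "First image", score: 10 } },
      { data: { title: "Second image", score: 20 } },
    ],
  },
};

const titles = collectKey(sample, "title");
console.log(titles.join("\n"));
```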

Let's create code-style names for those images using the 'camelize' function from underscore.string.

cat earthporn.json | underscore select '.data .title' | underscore map 'camelize(value.replace(/\[.*\]/g,"")).replace(/[^a-zA-Z]/g,"")' --outfmt text

Which prints ...


Try doing THAT with any other CLI one-liner!

Version-bump in package.json

This one is straight out of our own Makefile:

underscore -i package.json process 'vv=data.version.split("."); vv[2]++; data.version=vv.join("."); data;' -o package.json
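Spelled out as an ordinary function, the expression does the following. The name bumpPatch is hypothetical. The body is the same logic as the one-liner, just with line breaks.

```javascript
// Hypothetical helper: the one-liner's logic as a regular function.
function bumpPatch(data) {
  const vv = data.version.split("."); // "1.2.3" -> ["1", "2", "3"]
  vv[2]++;                            // "3" is coerced to a number, then incremented
  data.version = vv.join(".");
  return data;                        // return the mutated object, like the trailing 'data'
}

const bumped = bumpPatch({ name: "demo", version: "1.2.3" });
console.log(bumped.version); // 1.2.4
```

Note that the third component is bumped, and that the mutated object has to be returned explicitly so the whole package.json is written back out.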

Getting a greppable list of URLs fetched during the load of a website

This is one I did at work the other day. Chrome --> Dev Console (CMD-OPT-J) --> Network Tab --> (right click context menu) --> Save All as HAR. I have no idea why it's called a "HAR" file, but it's pure JSON data ... pretty verbose stuff, but I just want the urls ...

cat site.har | underscore select '.url' --outfmt text | grep mydomain > urls.txt

Well, I'd also like to ack through the contents of all those files. Best to get a local snapshot of it all:

cat urls.txt | while read line; do curl "$line" > "$(echo "$line" | perl -pe 's/https?:\/\/([^?]*)[?]?.*/$1/')"; done
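The substitution in that one-liner is meant to turn each URL into a relative file path by dropping the scheme and any query string. Here is the same mapping as a small JavaScript sketch. The function name urlToLocalPath and the sample URLs are made up for illustration.

```javascript
// Hypothetical helper: strip "http://" or "https://" and everything
// from the first "?" onward, leaving host + path.
function urlToLocalPath(url) {
  return url.replace(/https?:\/\/([^?]*)[?]?.*/, "$1");
}

const localPath = urlToLocalPath("https://mydomain.com/js/app.js?v=3");
console.log(localPath); // mydomain.com/js/app.js
```

Since the result contains slashes, the matching directories need to exist before anything can be written to that path.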

And I'm off to the races analyzing the behavior and load ordering of a complex production site that dynamically loads (literally) hundreds of individual resources off the network. Sure, I could have viewed all that stuff inside Chrome, but I wanted a local directory-structured snapshot that I could serve on a local Nginx instance by adding entries in /etc/hosts that mapped the production domains to my local machine. Now I can run the exact production site locally, make changes, and see what they would do.

Look at for a more comprehensive list of examples.

Polish: 1001 Little Conveniences

Templates as first class NPM modules - ie, real stack traces

When using the 'template' command, we go to great lengths to provide a fully debuggable experience. We have a custom version of the template compilation code (templates are compiled to JS and then evaluated) that ensures a 1:1 mapping between line numbers in the original *.template file and line numbers in the generated JS code. This code is then loaded as if it were a real Node.js module (literally, using a require() statement). This means that should anything go wrong, the resulting stack traces and syntax exceptions will have correct line numbers from the original template file.

Expressions auto-return the last value

This one is a bit CoffeeScript inspired. When we parse command-line expressions for commands like 'map', they are evaluated as NodeScript objects. This allows us to retrieve the last value in the expression. In a previous version we wrapped expressions in function boilerplate; however this blocked the use of semicolons within an expression. With first class Script objects, we can evaluate multiple semicolon delimited expressions and still capture the value from the last expression evaluated. Thus, all of the following expressions will return "10".

underscore run '5 + 5'
underscore run 'x=5; y=5; x+y;'
underscore run 'x=5, y=5, x+y;'

This even works to find the last evaluated value inside conditional branches (these also return 10):

underscore run 'x=5; if (x > 0) { 10; } else { 0; }'           # last value is 10
underscore run 'x=5; if (x > 0) { y=5; } else { y=-99; } x+y;' # last value is 'x+y' 

In general, the principle here is that you shouldn't have to think too hard because the code should just return what you intuitively expect.
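You can see the same "last value wins" behavior with Node's built-in vm module, which returns the value of the last statement it evaluated. The sketch below assumes that module as a stand-in for the Script objects mentioned above. The helper name evalLast is hypothetical.

```javascript
const vm = require("vm");

// Hypothetical helper: run `expr` in a fresh context and return the value
// of the last statement evaluated.
function evalLast(expr, sandbox = {}) {
  return new vm.Script(expr).runInNewContext(sandbox);
}

const values = [
  evalLast("5 + 5"),
  evalLast("x=5; y=5; x+y;"),
  evalLast("x=5; if (x > 0) { 10; } else { 0; }"),
];
console.log(values); // [ 10, 10, 10 ]
```

No "return" keyword appears anywhere, and semicolons are fine, which is exactly what wrapping the expression in function boilerplate could not offer.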

Autodetection of CoffeeScript

If you type a CoffeeScript expression and forget to use the '--coffee' flag, Underscore-CLI will first attempt to parse it as JavaScript, and if that fails, parse it as CoffeeScript.

However, a warning is emitted:

"Warning: Parsing user expression 'foo?.bar?.baz' as CoffeeScript.  Use '--coffee' to be more explicit."

Why do we print a warning? Unfortunately, there are a number of language features that are ambiguous between JS and Coffee, i.e., expressions that are valid in both languages but with different meanings. For example:

test ? 10 : 20;  // JS: if test is true, then 10, else 20
test ? 10 : 20;  // Coffee: if test is true, then test, else {10: 20}.  Tragic.
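The "try JavaScript first, then fall back" decision can be sketched without any CoffeeScript dependency by checking only whether the expression compiles as JavaScript. The helper name detectLanguage is hypothetical, and using vm.Script for the syntax check is an assumption. The real tool would go on to hand the expression to the CoffeeScript compiler.

```javascript
const vm = require("vm");

// Hypothetical helper: report which parser an expression would be handed to.
function detectLanguage(expr) {
  try {
    new vm.Script(expr); // compiles only; nothing is executed
    return "js";
  } catch (err) {
    if (err.name === "SyntaxError") return "coffee";
    throw err;
  }
}

console.log(detectLanguage("value + 1"));      // js
console.log(detectLanguage("(x) -> x + 1"));   // coffee
console.log(detectLanguage("test ? 10 : 20")); // js
```

The last line shows the ambiguity in action: the expression compiles as JavaScript, so the fallback never gets a chance to apply the CoffeeScript meaning. Also note that current Node versions accept optional chaining such as 'foo?.bar?.baz' as plain JavaScript, so that expression would no longer reach the fallback under this sketch.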

Smart auto-consumption of STDIN

TBI - as of this version, if there is no data, we will block for reading STDIN. We should only do this if the user expression refers to the well-known 'data' variable. This would unify the 'process' and 'run' commands.

Smart auto-detection of return value

TBI - as of this version, the last evaluated expression value is always returned. However, sometimes, you want to mutate the existing data instead of returning a new value. This should be easy. If the expression does something like 'data.key = value', then the return value should be 'data'. Today, you have to write 'data.key = value; data'. I want that last part to be implicit, but only if you mutate the data variable. And there should be a command-line flag "--retval={expr,data,auto}", with 'auto' being the default.

Efficient stream processing for set-oriented commands like 'map'

TBI - as of this version, all commands slurp the entire input stream and parse it before doing any data manipulation. This works fine for the vast majority of scenarios, but if you actually had a 30GB JSON file, it would be a bit clunky. For set-oriented commands like 'map', a smarter core engine plus a smarter JSON parser could enable stream-oriented processing where data processing occurs continuously as the input is read and streamed to the output without ever needing to store the entire dataset in memory at once. This feature requires a custom JSON-parser and some serious fancy, but I'll get to it eventually. If you have any performance-sensitive use-cases, post an issue on Github, and I'd be glad to work with you.


Alternatives

  • jsonpipe - Python focused, w/ a featureset centered around a single scenario
  • jshon - Has a lot of functions, but very terse
  • json-command - very limited
  • TickTick - Bash focused JSON manipulation. Interesting w/ heavy Bash integration. Complements this tool.
  • json - Similar idea.
  • jsawk - Similar idea. Uses a custom JS environment. Good technical documentation.
  • jsonpath - this is not a CLI tool. It's a runtime JS library.
  • json:select() - this is not a CLI tool. CSS-like selectors for JSON. Very interesting idea.... now available as an Underscore-CLI command.

Please add a Github issue if I've missed any.
