jqawk is an awk-inspired programming language for wrangling JSON.
In these examples, gh.json
contains the response from https://api.github.com/users/alligator/repos
# Count the repos
jqawk "{ count++ } END { print count }" gh.json
# Find the most used language
jqawk "$.language != null { langs[$.language]++ } END { for (k, v in langs) print v, k }" gh.json | sort
# Find repos with open issues
jqawk "$.open_issues_count > 0 { print $.name, $.open_issues_count }" gh.json
# Find the year the most repos were created
jqawk "{ years[$.created_at.split('-')[0]]++ } END { for (k, v in years) print v, k }" gh.json | sort -n | tail -1
# Find the biggest repo
jqawk "$.size > max.size { max = $ } END { print max.name }" gh.json
# Remove all but the id and name properties, writing the result to stdout
jqawk -o - "{ $ = $.pluck('id', 'name') }" gh.json
While you can write full programs in jqawk, it's most useful for one-liners or in a pipeline.
Here's an example command-line:
$ jqawk "$.hours > 15 { print $.name }" emp.json
If the content of emp.json
is:
[
{ "name": "Beth", "rate": 4, "hours": 0 },
{ "name": "Dan", "rate": 3.75, "hours": 0 },
{ "name": "Kathy", "rate": 4, "hours": 10 },
{ "name": "Mark", "rate": 5, "hours": 20 },
{ "name": "Mary", "rate": 5.50, "hours": 22 },
{ "name": "Susie", "rate": 4.25, "hours": 18 }
]
This will print
Mark
Mary
Susie
The first argument is the program, the second is the JSON file. JSON can be read from stdin too.
The program is a series of rules, which are made up of a pattern and a body.
The pattern $.hours > 15
matches every item in the input where the hours
property is > 15.
The body { print $.name }
prints the name
property.
If you know awk, this should be familiar.
If the input is an array, each rule will run for each item in the array, with $
set to that item. If the input is any other type, each rule is run once with $
set to the input.
Here's a more complex program that calculates the pay (rate times hours) of each person:
BEGIN {
print 'Pay'
print '----------------'
}
$.hours > 0 {
printf("%-8s %f\n", $.name, $.rate * $.hours)
total += $.rate * $.hours
}
END {
print '----------------'
print'Total ', total
}
BEGIN
and END
are special rules that run before and after the input is processed.
A more complex program like this is better placed in a file. It can be run with the -f
flag:
$ jqawk -f prog.jqawk file.json
Pay
----------------
Kathy 40
Mark 100
Mary 121
Susie 76.5
----------------
Total 337.5
If the input value you want isn't at the root of the JSON, you can use the -r
flag to select it.
With this JSON:
{
"status": "success",
"result": [
{ "name": "alligator" },
{ "name": "someone else" }
]
}
This command selects the result
array as the root value:
$ jqawk -r "$.result" "{ print $.name }" file.json
alligator
someone else
The argument to -r
can be any valid expression.
jqawk provides if
, else
, for
, and while
control-flow statements:
BEGIN {
columns = ['name', 'rate', 'hours']
rows = []
}
{
if ($.name != 'Dan') {
rows.push([$.name, $.rate, $.hours])
}
}
END {
# print column headers
for (col in columns) {
printf('%-10s ', col)
}
# print divider
printf('\n')
i = 0
while (i < 32) {
printf('-')
i++
}
printf('\n')
# print rows
for (row in rows) {
printf('%-10s %-10f %-10f\n', row[0], row[1], row[2])
}
# print footer
for (i = 0; i < 32; i++) {
printf('-')
}
printf('\n')
}
$ jqawk -f prog.jqawk emp.json
name rate hours
--------------------------------
Beth 4 0
Kathy 4 10
Mark 5 20
Mary 5.5 22
Susie 4.25 18
--------------------------------
Regex literals can be matched against with the ~
operator:
$ jqawk "$.name ~ /^M/ { print $.name }" emp.json
Mark
Mary
Use !~
to invert the match.
Functions can be defined at the same level as rules:
function pay(rate, hours) {
return rate * hours
}
{
print $.name, pay($.rate, $.hours)
}
$ jqawk -f prog.jqawk emp.json
Beth 0
Dan 0
Kathy 40
Mark 100
Mary 121
Susie 76.5
If the input is an array, jqawk sets two variables in a pattern rule:
$
is the current item in the array$index
is the array index of that item
If the input is not an array jqawk sets one variable:
$
is the input value
You can pass jqawk multiple JSON files.
The BEGIN
and END
rules are run before and after processing all files.
Other rules are run for every file.
The $file
variable is set to the current file name.
Modifying $
will modify the JSON. The -o
flag directs the JSON output to a file, or -
for stdout:
$ cat example.json
[
{ "name": "alligator" },
{ "name": "someone else" }
]
$ jqawk -o - "{ $.name_length = $.name.length() }" example.json
[
{
"name": "alligator",
"name_length": 9
},
{
"name": "someone else",
"name_length": 12
}
]
A rough jqawk language reference.
! unary not
&& logical and
|| logical or
== equal
!= not equal
< less
<= less or equal
> greater
>= greater or equal
is is of type. value type names are:
string, bool, number, array, object, function, regex, unknown
+ add
- subtract
/ divide
* multiply
% modulo
++ pre/postfix increment
-- pre/postfix decrement
~ regex match
!~ regex not match
= assign
a[x] index
a[-x] index backwards from the end of the array
a.x property
123 number
"hello" string
'hello' string
true bool
false bool
[1, 2] array
{ a: 1, b: '2' } object
/hello/ regex
null null
pattern { body } runs when pattern is truthy
BEGIN { body } runs before the input is processed
END { body} runs after the input is processed
If the input is an array, each pattern rule runs for each item in the array.
If the input is not an array, each pattern rule runs once.
print <expression>, <expression>, ...
print <expression>s, separated by spaces
return <expression>
return from the current function, optionally returning the value of <expression>
break
exit the enclosing for or while loop
continue
immediately begin the next iteration of the enclosing loop
next
immediately exit the rule and process no further rules for the current item
exit
immediately exit the program
if (<expression>) <body> else <elsebody>
execute <body> if <expression> is truthy. If the value is falsy and the else
is present, execute <elsebody>
while (<expression>) <body>
execute the body until <expression> is falsy
for (<preexpression>; <checkexpression>; <postexpression>) <body>
execute <preexpression>, then execute { <body>; <postexpression> } until
<checkexpression> is falsy
for (<identifier> in <expression>) <body>
execute <body> for each item in <expression> with <identifier> set to it's
value.
if <expression> is an array, <identifier> is each item.
if it's an object, <identifier> is each key.
if it's a string, <identifier> is each character.
for (<identifier>, <indexidentifier> in <expression>) <body>
same as above, with <indexindeitifer> set to the index of the item for arrays
or the value for objects
printf(format_string, args...)
printf. supports %s and %f format codes, and width specifiers
json(arg)
return arg converted to a pretty-printed JSON string
num(arg)
convert arg to a number (64-bit float). only works on strings
string.length()
return the length
string.upper()
return an uppercase copy of the string
string.lower()
return a lowercase copy of the string
string.split(separator)
split the string into substrings on a separator
object.length()
return the number of keys
object.pluck(k1,
k2, ...) return a shallow copy of the object containing only the given keys
array.length()
return the length
array.push(value)
push value to the end of the array
array.pop()
remove and return the last value in the array
array.popfirst()
remove and return the first value in the array
array.contains(value)
return true if value is in the array
array.sort()
return a sorted copy of the array. if the array contains mixed types, all
values are converted to strings before sorting