aq

jq for Apache Arrow — query and transform Parquet, Arrow IPC, CSV, and NDJSON files using jq-style expressions.

aq [OPTIONS] [EXPR] [FILE]...

Each row in the file is treated as a JSON object. The expression runs once per row, just as jq runs once per line of NDJSON input.

Install

cargo install --git https://github.com/Anaethelion/aq

Or from a local clone:

cargo install --path .

Examples

# Print all rows as a table
aq data.parquet

# Extract a single field
aq '.first_name' data.parquet

# Filter rows
aq 'select(.salary > 50000)' data.parquet

# Project fields
aq '{name: .first_name, salary}' data.parquet

# Pipe to jq
aq -o ndjson data.parquet | jq '.first_name'

# Inspect schema
aq --schema data.parquet

# Read from stdin
cat data.arrow | aq 'select(.age > 30)'

# Count matching rows (--slurp collects all rows into an array first)
aq --slurp '[.[] | select(.salary > 50000)] | length' data.parquet

# Round-trip via Arrow IPC
aq -o arrow data.parquet | aq 'select(.still_hired)'

Options

Flag	Description
`-f, --format <FORMAT>`	Force input format: `arrow`, `parquet`, `csv`, `json`
`-o, --output <FORMAT>`	Output format: `table`, `ndjson`, `json`, `csv`, `arrow` (default: `table` on TTY, `ndjson` when piped)
`--schema`	Print schema and exit
`-s, --slurp`	Collect all rows into a JSON array before filtering — enables aggregates
`--no-header`	Suppress headers in table/csv output
`-c, --compact`	Compact JSON output

Expression syntax

Expressions use jq syntax. By default, the expression runs once per row (streaming mode).

# Per-row (streaming, default)
aq '.name' data.parquet                          # field access
aq 'select(.age > 30)' data.parquet              # filter
aq '{name, age}' data.parquet                    # projection
aq '.salary * 1.1' data.parquet                  # transform

# Aggregate (requires --slurp / -s)
aq -s 'length' data.parquet                      # row count
aq -s '[.[] | select(.age > 30)] | length' data.parquet
aq -s '[.[].salary] | add / length' data.parquet  # average salary
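
To make the difference between the two modes concrete, here is a rough Python model of the semantics described above. It only illustrates the idea and says nothing about how aq is implemented; the sample rows and the helper names `run_streaming` and `run_slurp` are made up for this sketch.

```python
# Hypothetical sample rows standing in for the contents of data.parquet.
rows = [
    {"name": "Ada", "age": 36, "salary": 60000},
    {"name": "Bob", "age": 25, "salary": 40000},
    {"name": "Cy", "age": 41, "salary": 50000},
]

def run_streaming(expr, rows):
    """Default mode: the expression sees one row at a time."""
    return [expr(row) for row in rows]

def run_slurp(expr, rows):
    """--slurp mode: the expression sees the whole array of rows once."""
    return expr(rows)

# Rough equivalent of: aq '.name' data.parquet
names = run_streaming(lambda row: row["name"], rows)

# Rough equivalent of: aq -s '[.[] | select(.age > 30)] | length' data.parquet
over_30 = run_slurp(lambda rs: len([r for r in rs if r["age"] > 30]), rows)

# Rough equivalent of: aq -s '[.[].salary] | add / length' data.parquet
average = run_slurp(lambda rs: sum(r["salary"] for r in rs) / len(rs), rows)

print(names, over_30, average)
```

An aggregate such as a count or an average needs every row at once, which is why it only makes sense in the slurp model.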

Supported formats

Format	Extensions	Notes
Parquet	`.parquet`
Arrow IPC	`.arrow`	File and stream formats
CSV	`.csv`	Schema inferred from header
NDJSON	`.json`, `.ndjson`	One JSON object per line

Format is auto-detected from the file extension or magic bytes. Use -f to override.
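
As an illustration of extension-plus-magic-bytes detection, the Python sketch below checks the extension first and falls back to the leading bytes. The function name `detect_format` and the order of the checks are assumptions for this example, not aq's actual logic. The magic values themselves are standard: Parquet files begin with `PAR1`, and Arrow IPC files begin with `ARROW1`.

```python
from pathlib import Path

# Extension table copied from the "Supported formats" section.
EXTENSIONS = {
    ".parquet": "parquet",
    ".arrow": "arrow",
    ".csv": "csv",
    ".json": "json",
    ".ndjson": "json",
}

def detect_format(filename, head=b""):
    """Guess the input format from the extension, then from the first bytes.

    Returns None when nothing matches, the case where -f would be needed.
    """
    ext = Path(filename).suffix.lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext]
    if head.startswith(b"PAR1"):    # Parquet magic number
        return "parquet"
    if head.startswith(b"ARROW1"):  # Arrow IPC file-format magic number
        return "arrow"
    return None

print(detect_format("data.parquet"))
print(detect_format("-", b"ARROW1\x00\x00"))
```

Text formats such as CSV and NDJSON have no magic number, so in this sketch they can only be recognised by extension.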

Limitations

Arrow → jq type mapping

Every Arrow column is converted to JSON before jq sees it:

Arrow type	jq representation
`Int8/16/32/64`, `UInt8/16/32/64`	number
`Float16/32/64`	number
`Decimal128/256`	number or string (precision-dependent)
`Date32/64`, `Timestamp`, `Time*`, `Duration`, `Interval`	integer (epoch in column units)
`Utf8`, `LargeUtf8`	string
`Binary`, `LargeBinary`	base64 string
`Boolean`	boolean
`List<T>`, `LargeList<T>`, `FixedSizeList<T>`	array
`Struct`	object
`Map<K,V>`	array of `{key, value}` objects
`Dictionary<K,V>`	decoded to the dictionary value type
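
A consumer reading aq's NDJSON output has to undo some of these conversions itself. The Python sketch below shows one way to do that for three rows of the table. The helper names and the sample values are hypothetical, and it assumes the standard base64 alphabet and timestamps counted from the Unix epoch in UTC, neither of which is confirmed above.

```python
import base64
from datetime import datetime, timedelta, timezone

# Ticks per second for each column unit (unit labels chosen for this sketch).
TICKS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}

def decode_binary(value):
    """Binary / LargeBinary columns arrive as base64 strings."""
    return base64.b64decode(value)

def map_to_dict(entries):
    """Map<K,V> columns arrive as an array of {key, value} objects."""
    return {entry["key"]: entry["value"] for entry in entries}

def timestamp_to_datetime(value, unit):
    """Timestamp columns arrive as an integer in the column's own unit."""
    per_second = TICKS_PER_SECOND[unit]
    seconds, remainder = divmod(value, per_second)
    # Integer arithmetic avoids float rounding; sub-microsecond digits are dropped.
    micros = remainder * 1_000_000 // per_second
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(microseconds=micros)

print(decode_binary("aGVsbG8="))                          # b'hello'
print(map_to_dict([{"key": "env", "value": "prod"}]))     # {'env': 'prod'}
print(timestamp_to_datetime(1_700_000_000_000, "ms"))     # 2023-11-14 22:13:20+00:00
```

The unit is not carried in the JSON value, so it has to be read from `aq --schema` beforehand.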

jq → Arrow type mapping (`-o arrow`)

When writing Arrow output, types are re-inferred from jq output values. Only a subset of Arrow types can be produced:

jq output	Arrow type
integer	`Int64`
float	`Float64`
boolean	`Boolean`
string	`Utf8`
uniform integer array	`List<Int64>`
uniform float array	`List<Float64>`
uniform boolean array	`List<Boolean>`
uniform string array	`List<Utf8>`
mixed-type array	`List<Utf8>` (elements serialized)
object	`Utf8` (serialized as JSON string)
all-null column	`Utf8`

Types with no JSON equivalent (timestamps, decimals, binary…) and nested structs cannot be round-tripped through -o arrow: they arrive as integers, strings, or objects and leave as Int64 or Utf8.
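
The table above can be read as a small inference function. The Python sketch below encodes those rows directly. It is a reading aid rather than aq's code: the function names are hypothetical, and the treatment of empty arrays and nested arrays (both mapped to `List<Utf8>`) is an assumption, since the table does not cover them.

```python
SCALAR_TYPES = {"Int64", "Float64", "Boolean", "Utf8"}

def infer_arrow_type(value):
    """Return the Arrow type name the table assigns to one JSON value."""
    if value is None:
        return None                 # a null carries no type information
    if isinstance(value, bool):     # test bool first: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Int64"
    if isinstance(value, float):
        return "Float64"
    if isinstance(value, (str, dict)):
        return "Utf8"               # objects are serialized as JSON strings
    if isinstance(value, list):
        element_types = {infer_arrow_type(v) for v in value} - {None}
        if len(element_types) == 1:
            inner = next(iter(element_types))
            if inner in SCALAR_TYPES:
                return f"List<{inner}>"
        return "List<Utf8>"         # mixed (or, by assumption, empty/nested)
    raise TypeError(f"not a JSON value: {value!r}")

def infer_column_type(values):
    """Column-level rule for the two cases the table defines."""
    types = {infer_arrow_type(v) for v in values} - {None}
    if not types:
        return "Utf8"               # all-null column
    if len(types) == 1:
        return types.pop()
    raise ValueError("mixed column types are not covered by the table")

print(infer_arrow_type([1, 2, 3]))
print(infer_arrow_type({"a": 1}))
print(infer_column_type([None, None]))
```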

Precision loss

Large integers: jq uses IEEE 754 float64 internally, so Int64 values beyond ±2^53 (~9 × 10^15) lose precision in expressions. Arithmetic and equality checks on such values may silently give wrong results.
Integer width: Int8/16/32 and UInt8/16/32 widen to Int64 on -o arrow output.
Float width: Float16/32 widen to Float64.
UInt64: Values above 2^53 lose precision when passed through jq.
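
The 2^53 boundary is a property of IEEE 754 float64 and can be checked in any language. The short Python sketch below models the rounding described above; the helper name `survives_float64` is made up for this example.

```python
LIMIT = 2**53  # up to here, every integer has an exact float64 representation

def survives_float64(n):
    """True if integer n is unchanged after a round trip through float64."""
    return int(float(n)) == n

print(survives_float64(LIMIT))             # True
print(survives_float64(LIMIT + 1))         # False
print(float(LIMIT + 1) == float(LIMIT))    # True: two distinct IDs collide
```

One way to guard against this is to run such a check on the minimum and maximum of an Int64 or UInt64 column before relying on equality or arithmetic in an expression.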

Column ordering in Arrow output

serde_json stores object keys in alphabetical order, so a projection such as {name, age} yields the columns age, name in the Arrow output: sorted alphabetically, not in expression order.
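
The effect is easy to reproduce outside aq. The Python sketch below imitates alphabetical key storage with `sort_keys=True` and shows a consumer-side way to restore a preferred column order. The `reorder` helper and the sample row are hypothetical, and the reordering step is a suggestion rather than an aq feature.

```python
import json

# Keys in expression order, as written in the projection {name, age}.
projected = {"name": "Ada", "age": 36}

# Imitate a store that keeps keys sorted alphabetically.
stored = json.loads(json.dumps(projected, sort_keys=True))

print(list(projected))  # ['name', 'age']
print(list(stored))     # ['age', 'name']

def reorder(row, columns):
    """Rebuild a row with its keys in the requested column order."""
    return {column: row[column] for column in columns}

print(list(reorder(stored, ["name", "age"])))  # ['name', 'age']
```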

jaq vs jq compatibility

aq uses jaq rather than jq. Most everyday programs work, but some features are absent or differ:

Not supported: $ENV, env, input/inputs, $__loc__, modulemeta
debug: output format differs from jq
path builtins: path(), getpath, setpath, delpaths may behave differently
?// (destructuring alternative operator): semantics may differ from jq
@format strings: @base64, @uri, @csv, @tsv, @html, @json are supported; @base64d requires valid padding
Multiple files: use aq 'expr' a.json b.json instead of jq -n '[inputs]'
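
For the @base64d padding requirement, one option is to normalise the data before it reaches aq. The Python sketch below pads a base64 string to a multiple of four characters; the helper name `pad_base64` is hypothetical, and the sketch assumes the input is otherwise valid standard base64.

```python
import base64

def pad_base64(text):
    """Append '=' until the length is a multiple of 4 (no-op if already padded)."""
    return text + "=" * (-len(text) % 4)

unpadded = "aGVsbG8"            # "hello" with its trailing '=' stripped
padded = pad_base64(unpadded)

print(padded)                    # aGVsbG8=
print(base64.b64decode(padded))  # b'hello'
```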

License

Apache-2.0
