What follows is a list of text-based file formats with command line tools for manipulating each (with a focus on Linux).
- DSV
- XML, HTML
- JSON
- YAML, TOML
- INI
- Configuration files
- Bonus round: CLIs for single-file databases
- License
- Disclosure
Delimiter-separated values, including CSV, TSV, etc.
Awk is a POSIX-standard command line tool for processing this sort of data.
Name and link | Description |
---|---|
cut | Select portions of each line of files. Can work with delimiter-separated fields. See man 1 cut on your system (GNU, FreeBSD). |
join | Join lines of two files on a common field. See man 1 join on your system (GNU, FreeBSD). |
paste | Merge corresponding lines of one or more files. See man 1 paste on your system (GNU, FreeBSD). |
sort | Sort data by key fields. See man 1 sort on your system (GNU, FreeBSD). |
uniq | Find or remove repeated lines. See man 1 uniq on your system (GNU, FreeBSD). |
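To make the idea behind join concrete, here is a minimal Python sketch of an inner join of two delimiter-separated inputs on their first field. It only illustrates the technique and is not a reimplementation of join(1): the function name `join_on_first_field` is hypothetical, and unlike join(1) it does not need input sorted on the join field, because it indexes the second input in a dictionary.

```python
def join_on_first_field(left_lines, right_lines, sep=","):
    """Inner-join two iterables of delimited lines on their first field (hypothetical helper)."""
    # Index the right-hand input by key; a key may occur on several lines.
    index = {}
    for line in right_lines:
        key, *rest = line.rstrip("\n").split(sep)
        index.setdefault(key, []).append(rest)
    joined = []
    for line in left_lines:
        key, *rest = line.rstrip("\n").split(sep)
        # One output line per matching pair: the key, then the left fields, then the right fields.
        for other in index.get(key, []):
            joined.append(sep.join([key, *rest, *other]))
    return joined


if __name__ == "__main__":
    users = ["1,alice", "2,bob", "3,carol"]
    emails = ["1,alice@example.com", "3,carol@example.com"]
    for row in join_on_first_field(users, emails):
        print(row)
```

The field order in the output (join field first, then the remaining fields of each input) mirrors join(1)'s default output order.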
Name and link | Description |
---|---|
GNU datamash | Perform statistical operations on text input. |
Miller | Like sed, awk, cut, join and sort for name-indexed data such as CSV and tabular JSON. |
tab | A non-Turing-complete programming language for data processing. An alternative to Awk. |
xsv | Index, slice, analyze, split and join CSV files. |
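A common job for tools like GNU datamash and Miller is to group records by one field and aggregate another. The Python sketch below shows that operation on name-indexed CSV (columns addressed by header name, as in Miller) using only the standard library. `group_sum` is a hypothetical name, and the sketch assumes well-formed input with numeric values in the aggregated column.

```python
import csv
import io
from collections import defaultdict


def group_sum(csv_text, key, value):
    """Sum the `value` column for each distinct `key` column value (hypothetical helper)."""
    totals = defaultdict(float)
    # DictReader uses the header row to name the fields of every record.
    for record in csv.DictReader(io.StringIO(csv_text)):
        totals[record[key]] += float(record[value])
    # Sort by key so the output order is deterministic.
    return sorted(totals.items())


if __name__ == "__main__":
    data = "fruit,count\napples,3\npears,2\napples,4\n"
    for fruit, total in group_sum(data, "fruit", "count"):
        print(f"{fruit},{total:g}")
```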
Name | Programming language and database engine | Features | Usage link | License |
---|---|---|---|---|
csvkit | Python, SQLite 3 | Use header row for column names, custom input and output encoding, custom input field separator, custom output field separator, custom output formatting, CSV JOINs, Python module. Excel and JSON to CSV. CSV to JSON. SQL queries for CSV. | Usage | MIT |
q | Python, SQLite 3 | Use header row for column names, custom input and output encoding, gzipped input, custom input field separator (string literal), custom output field separator, custom output formatting, table JOINs, Python module. | Usage | GNU GPL 3 |
sqawk | C, SQLite 3 | Use header row for column names, column name aliases, can skip lines until a regexp matches, custom input field separator (string literal, per-file), keep SQLite file, show generated SQL, table JOINs. | Usage | ? |
Sqawk | Tcl, SQLite 3 | Use header row for column names, custom input field separator (regexp, per-file), custom input record delimiter (regexp, per-file), custom table names, custom output field separator, custom output record separator, merge selected columns into one, ASCII/Unicode table output, CSV input and output, JSON output, Tcl output, table JOINs. | Usage | MIT |
Squawk | Python, custom SQL interpreter | Access log and CSV input, JSON and CSV output, Python code generation. | — | Three-clause BSD |
termsql | Python, SQLite 3 | Use header rows for column names, custom field separator (regexp), custom record separator (string literal), lines as columns, skip a given number of lines at the beginning and at the end, merge selected columns into one, HTML, CSV, SQL and Tcl output. | Manual | MIT |
textql | Go, SQLite 3 | Use header rows for column names, keep SQLite file, custom input field separator (string literal). | Usage | MIT |
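Most of the utilities in the table above work the same way: they load delimited text into an SQLite table, take column names from the header row, and run SQL against it. A minimal Python sketch of that approach with the standard `csv` and `sqlite3` modules follows. `query_csv` is a hypothetical name; all columns are left untyped, so values are stored as text, and the table name is trusted rather than escaped.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql, table="t"):
    """Load CSV text into an in-memory SQLite table and run one query (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first row supplies the column names
    conn = sqlite3.connect(":memory:")
    try:
        # Quote column names, doubling any embedded double quotes.
        cols = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        placeholders = ", ".join("?" for _ in header)
        # The remaining rows are inserted as-is; every value is a string.
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    data = "name,age\nalice,30\nbob,25\n"
    print(query_csv(data, "SELECT name FROM t WHERE CAST(age AS INTEGER) > 26"))
```

Because the values are text, numeric comparisons need an explicit `CAST`; the real tools differ in how (and whether) they infer column types.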
Name and link | Description |
---|---|
pup | A jq-inspired tool for filtering HTML pages with CSS selectors. |
Saxon | Scrape XML and HTML data using XPath. Documentation |
tq | Retrieve content from HTML using CSS selectors. |
xml2 | Convert XML and HTML to and from flat, greppable lists of "path=value" statements. |
XMLStarlet | A set of command line tools to transform, query, validate and edit XML documents. |
See also: Grep and Sed Equivalent for XML Command Line Processing on Stack Overflow.
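xml2 makes XML greppable by printing one "path=value" line per piece of data. The Python sketch below flattens XML in a similar spirit with `xml.etree.ElementTree`. It does not reproduce xml2's exact output: `flatten_xml` is a hypothetical name, and the sketch ignores tail text and namespaces and writes attributes as `@name` path segments.

```python
import xml.etree.ElementTree as ET


def flatten_xml(xml_text):
    """Return "path=value" lines for the attributes and text of every element (hypothetical helper)."""
    lines = []

    def walk(elem, path):
        path = f"{path}/{elem.tag}"
        for name, value in elem.attrib.items():
            lines.append(f"{path}/@{name}={value}")
        text = (elem.text or "").strip()
        if text:
            lines.append(f"{path}={text}")
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return lines


if __name__ == "__main__":
    doc = '<feed><entry id="1"><title>Hello</title></entry></feed>'
    print("\n".join(flatten_xml(doc)))
```

Each output line carries its full path, so a plain `grep` for `/feed/entry/title=` finds every title.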
Name and link | Description |
---|---|
jsonaxe | A JSON processor, similar to jq, with an expressive Python-based DSL. |
jo | Create JSON objects from the shell. |
jq | A command line tool that implements a functional DSL for creating and manipulating JSON. It can convert JSON to other formats. |
jshon | Create and manipulate JSON using getopt-style command-line options. |
json | Similar to jq, written in JavaScript. |
json2 | Convert JSON to and from flat, greppable lists of "path=value" statements. Modeled after xml2. |
json-table | Transform JSON data structures into tables of columns and rows for processing in the shell. |
validjson | Command line tool to validate or pretty-print JSON data. |
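json2 applies the xml2 idea to JSON: one "path=value" line per scalar. The following Python sketch shows the flattening step with the standard `json` module. `flatten_json` is a hypothetical name, and its path syntax (object keys and array indices joined by `/`) is an assumption that may differ from json2's actual output; empty objects and arrays produce no lines.

```python
import json


def flatten_json(value, path=""):
    """Yield "path=value" lines, one per scalar, for a parsed JSON value (hypothetical helper)."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten_json(child, f"{path}/{key}")
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from flatten_json(child, f"{path}/{i}")
    else:
        # Re-encode scalars as JSON so strings, numbers, booleans and null stay distinguishable.
        yield f"{path or '/'}={json.dumps(value)}"


if __name__ == "__main__":
    doc = json.loads('{"name": "x", "tags": ["a", "b"], "ok": true}')
    for line in flatten_json(doc):
        print(line)
```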
Name and link | Description |
---|---|
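jq | Use together with a format converter like Remarshal: convert YAML or TOML to JSON, process it with jq, and convert the result back if needed. |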
validyaml | Command line tool to validate or pretty-print YAML data. |
Name and link | Platform | License | Description |
---|---|---|---|
IniFile (DOS version) | Windows (x86, x86-64), MS-DOS | Closed-source freeware | Can set and remove properties in INI files. Can retrieve properties as a list of batch file set commands to set the corresponding variables. Changes files in place. |
crudini | Any with Python 2.x | GNU GPLv2 | Can set and remove properties in INI files. Can retrieve properties as shell script commands to set the corresponding variables. Can output updated INI data or change files in place. |
initool | Windows, Linux, FreeBSD | MIT | Can set and remove properties in INI files and check for their existence. Outputs updated INI data. |
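Setting a property, the core operation of the tools above, can be sketched with Python's standard `configparser` module. `set_ini_property` is a hypothetical name. Unlike a careful INI editor, this sketch drops comments and normalizes formatting when it writes the file back, and it does not treat the special `DEFAULT` section.

```python
import configparser
import io


def set_ini_property(ini_text, section, key, value):
    """Return INI text with section.key set to value, creating the section if needed (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case; ConfigParser lowercases keys by default
    parser.read_string(ini_text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, key, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(set_ini_property("[server]\nport = 80\n", "server", "port", "8080"), end="")
```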
- Augeas — can extract data from and modify a number of file formats. However, not all formats are equally well supported by Augeas, and for some formats only a limited subset of all valid files can be parsed.
- Elektra — can manipulate configuration files. For application-specific configuration files it has problems similar to those of Augeas (it uses the same lenses), but it has better support for generic formats such as JSON or INI.
Name | Description | File format |
---|---|---|
GNU Recutils | "[A] set of tools and libraries to access human-editable, plain text databases called recfiles." | Text-based, roughly "key: value" |
SDB | "[A] simple string key/value database based on djb's cdb disk storage and supports JSON and arrays introspection." | Binary |
sqlite3(1) | "[A] simple command-line utility [...] that allows the user to manually enter and execute SQL statements against an SQLite database." | Binary |
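The recfile format used by GNU Recutils is simple enough to parse by hand, which is part of why it stays human-editable. The Python sketch below assumes a simplified subset of the format: records separated by blank lines, "Name: value" fields, `#` comment lines, and lines starting with `+` continuing the previous field's value on a new line. It does not handle record descriptors (`%rec` and friends) or other continuation rules, and `parse_recfile` is a hypothetical name. Records are lists of pairs because a field name may repeat within a record.

```python
def parse_recfile(text):
    """Parse simplified recfile text into a list of records of (name, value) pairs (hypothetical helper)."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = []
        elif line.startswith("+") and current:
            # Continuation: append the rest of the line to the previous value, after a newline.
            name, value = current[-1]
            rest = line[1:]
            if rest.startswith(" "):
                rest = rest[1:]
            current[-1] = (name, value + "\n" + rest)
        else:
            name, _, value = line.partition(":")
            current.append((name.strip(), value.strip()))
    if current:
        records.append(current)
    return records


if __name__ == "__main__":
    sample = "Name: Ada\nLang: English\n\n# comment\nName: Alan\nNote: first line\n+ second line\n"
    for record in parse_recfile(sample):
        print(record)
```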
The contents of this document are licensed under the Creative Commons Attribution 4.0 International License. By contributing you agree to release your contribution under this license.
Sqawk, Remarshal and initool were written by the curator of this document.