Comparison
Tier: Intermediate
A short, honest comparison of qsv against neighboring tools. The deep numbers live in docs/BENCHMARKS.md and the live dashboard at qsv.dathere.com/benchmarks. All performance claims on this page link to those sources rather than embedding numbers that will go stale.
Note
See also the original xsv 0.13.0 stats compared with qsv 1.0.0 stats wiki page for a side-by-side example of the stats-output expansion.
xsv is the original Rust CSV tool from BurntSushi. qsv is a maintained, multithreaded fork that adds many commands and features. xsv has been on minimal-maintenance status since ~2019.
| | xsv 0.13 | qsv 20+ |
|---|---|---|
| Commands | ~13 | 70+ |
| Multithreaded | No | Many commands (🚀 / 🏎️) |
| Polars-powered SQL | No | sqlp, joinp, pivotp, scoresql |
| JSON Schema validation | No | validate (with custom keywords) |
| Geocoding | No | geocode |
| HTTP fetching | No | fetch, fetchpost with caching |
| AI / LLM integration | No | describegpt, MCP server, Cowork plugin |
| Embedded scripting DSL | No | Luau and Python |
| External-* commands | No | extsort, extdedup |
| Apache DataSketches | No | --cardinality-method approx, --quantile-method approx, frequency --sketch-method frequent_items |
| stats output columns | 12 (default), 14 (--everything) | 37 (default), 47 (--everything); see legacy wiki page |
| Ongoing Development | Archived | Active |
Migration path: install qsvlite — it's the xsv-compatible subset, with the same flags and command set. Or install full qsv for everything.
csvkit is a Python CSV toolkit (csvstack, csvgrep, csvjoin, csvstat, …) with long history.
Speed: qsv outperforms csvkit by ~10× on typical workloads (compiled Rust + multithreading vs Python). See docs/BENCHMARKS.md for the methodology.
Surface area:
- csvkit has tighter integration with the Python ecosystem (pip-installable, extensible in Python).
- qsv has more commands (geocoding, fetch, validate with custom keywords, describegpt, Polars SQL, …).
- csvkit is one project; qsv is the engine plus an ecosystem (qsv pro, MCP server, Cowork plugin, qsv-recipes, qsv-lookup-tables, DataPusher+).
When to pick which:
- Inside a Python project where you already use pandas — csvkit might fit better.
- For shell pipelines, CI gates, or large files — qsv wins decisively.
- The two can coexist; many users run csvkit's csvstat, then pipe the results into a downstream qsv command.
Miller is fast and shape-agnostic (written in Go since version 6, originally in C): it handles CSV, TSV, JSON, JSONL, DKVP, PPRINT, NIDX. qsv is CSV-specialized with deeper stats and validation.
Speed: comparable for streaming row ops. qsv pulls ahead on aggregations, joins, and stats due to multithreading.
Where Miller shines:
- DKVP and nested-JSON inputs.
- Compact DSL that does row transformations and filtering in one expression.
- Long-standing maturity.
Where qsv shines:
- 48-metric stats with guaranteed type inference.
- JSON Schema validation at 780k rows/sec.
- Polars-powered SQL and asof joins.
- Integrated AI workflows.
- MCP server / Cowork plugin / qsv pro ecosystem.
Both are excellent. Many shell power-users keep both installed and reach for whichever is faster for the task at hand.
DuckDB is an embedded analytical SQL database. Its read_csv table function is highly optimized.
Different jobs:
- DuckDB excels at multi-CSV SQL analytics with a full query optimizer and OLAP execution engine.
- qsv excels at the pre-DB cleaning, profiling, validation, and enrichment layer.
The recommended pattern: use qsv for cleaning + Parquet conversion, then DuckDB for analytics:
qsv stats --stats-jsonl raw.csv
qsv schema --polars raw.csv
qsv to parquet outdir/ raw.csv
duckdb -c "SELECT * FROM read_parquet('outdir/raw.parquet') WHERE ..."

qsv also integrates with DuckDB directly:
- qsv sqlp runs Polars SQL; it is the closest spiritual cousin to duckdb -c "..." for CSVs.
- qsv scoresql --duckdb uses DuckDB's planner to score a query before running it.
- qsv describegpt with QSV_DUCKDB_PATH uses DuckDB for SQL-RAG.
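The recommended qsv-then-DuckDB pattern above can also be scripted. The following is a minimal Python sketch, not part of qsv: `plan_pipeline` is a hypothetical helper that only builds the four command lines shown earlier. It assumes, as that snippet implies, that `qsv to parquet` names its output after the input file's stem. The elided `WHERE ...` clause from the original is left out of the generated query.

```python
import shlex
from pathlib import PurePosixPath


def plan_pipeline(csv_path: str, outdir: str) -> list[list[str]]:
    """Build (but do not run) the qsv -> Parquet -> DuckDB commands.

    Hypothetical helper. Assumption: the Parquet file is named after the
    CSV's stem, as in the raw.csv -> outdir/raw.parquet example.
    """
    parquet = PurePosixPath(outdir) / (PurePosixPath(csv_path).stem + ".parquet")
    query = f"SELECT * FROM read_parquet('{parquet}')"
    return [
        ["qsv", "stats", "--stats-jsonl", csv_path],  # profile the raw file
        ["qsv", "schema", "--polars", csv_path],      # infer a Polars schema
        ["qsv", "to", "parquet", outdir, csv_path],   # convert to Parquet
        ["duckdb", "-c", query],                      # hand off to DuckDB
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in plan_pipeline("raw.csv", "outdir/"):
        print(shlex.join(cmd))
```

Each inner list can be passed to `subprocess.run(cmd, check=True)` to execute the step and stop at the first failure.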
pandas is the Python data-analysis workhorse. qsv complements pandas; it doesn't replace it.
- pandas is in-memory, Python-native, and excels at ad-hoc analysis with charts and ML.
- qsv is streaming (mostly), shell-native, and excels at fast profiling, validation, transformations on multi-GB files, and pre-DB cleaning.
For a 2.7M-row CSV, qsv stats runs in well under a second; pd.read_csv(...).describe() typically takes 10+ seconds. For inline charts or train_test_split, pandas / scikit-learn are obviously the right tools.
Use qsv from notebooks via subprocess:
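import subprocess
import pandas as pd

# Run qsv stats, then load the data.stats.csv file that qsv leaves next to data.csv.
subprocess.run(['qsv', 'stats', '--everything', '--stats-jsonl', 'data.csv'], check=True)
stats_df = pd.read_csv('data.stats.csv')

See Integrations → Python notebooks.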
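A slightly more defensive version of the notebook call might look like the sketch below. `stats_command` and `run_qsv_stats` are hypothetical wrappers, not a qsv API. They use only the flags from the snippet above and return `None` when the binary is not on `PATH`, so a notebook can fall back to pandas.

```python
import shutil
import subprocess
from typing import Optional


def stats_command(csv_path: str, everything: bool = True, qsv_bin: str = "qsv") -> list[str]:
    """Return the qsv stats command line used in the notebook snippet."""
    cmd = [qsv_bin, "stats"]
    if everything:
        cmd.append("--everything")
    cmd += ["--stats-jsonl", csv_path]
    return cmd


def run_qsv_stats(csv_path: str, qsv_bin: str = "qsv") -> Optional[str]:
    """Run qsv stats and return its stdout (CSV text).

    Returns None when qsv_bin cannot be found on PATH.
    """
    if shutil.which(qsv_bin) is None:
        return None
    result = subprocess.run(
        stats_command(csv_path, qsv_bin=qsv_bin),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,  # keep the stats out of the notebook's stdout
        text=True,
    )
    return result.stdout
```

The returned text can be loaded with `pd.read_csv(io.StringIO(text))` when pandas is available.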
VisiData is a terminal UI for tabular data exploration, closer to qsv lens than to the rest of qsv. Both are excellent in their niche.
- VisiData is interactive (TUI). Sift, filter, pivot, sort visually.
- qsv lens is interactive too, via csvlens.
- The rest of qsv is non-interactive: fast batch operations from the shell.
Use VisiData for exploratory analysis; use qsv for the rest of the pipeline.
awk and sed are general-purpose text tools. They don't understand CSV quoting — embedded commas, multi-line quoted fields, and escaped quotes will trip them up.
Use qsv for any CSV operation. Use awk / sed for plain-text logs and configuration files.
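To make the quoting problem concrete, here is a small Python sketch; the sample record is made up for illustration. Splitting on commas, which is roughly what `awk -F,` does, miscounts rows and fields. A CSV-aware parser handles the embedded comma, the escaped quotes, and the multi-line field.

```python
import csv
import io

# Made-up sample: one header row and one record whose fields contain
# an embedded comma, escaped quotes, and a line break inside quotes.
raw = 'name,note\n"Smith, Jane","said ""hi""\nand left"\n'

# Naive approach: split every physical line on commas.
naive = [line.split(",") for line in raw.splitlines()]

# CSV-aware approach: let the parser honor the quoting rules.
parsed = list(csv.reader(io.StringIO(raw)))

print(len(naive), "naive rows:", naive)
print(len(parsed), "parsed rows:", parsed)
```

The naive split sees three rows and breaks the name field in two, while the parser returns the header plus a single two-field record.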
| You want to… | Reach for |
|---|---|
| Profile / validate / clean a CSV | qsv |
| Multi-file SQL analytics | DuckDB + qsv-prepared Parquet |
| Notebook-driven exploratory ML | pandas + qsv subprocess for heavy lifting |
| Shape-agnostic stream processing (JSON, DKVP, …) | Miller |
| Interactive TUI exploration | qsv lens or VisiData |
| Drag-and-drop GUI exploration | qsv pro |
| AI-assisted analysis | qsv MCP Server + Claude Cowork Plugin |
| xsv-compatible drop-in | qsvlite |
- docs/BENCHMARKS.md (canonical reference)
- qsv.dathere.com/benchmarks (live dashboard)
- xsv 0.13.0 stats vs qsv 1.0.0 stats (legacy wiki page)
- Why qsv? (the affirmative case)
- Binary Variants (qsvlite for xsv migrants)
- Integrations (pairing qsv with adjacent tools)
qsv — GitHub · Releases · Discussions · qsv pro · Try it online · Benchmarks · datHere · DeepWiki · Dual-licensed MIT / Unlicense