Why qsv
Tier: Beginner
You already have awk, pandas, csvkit, miller, xsv, duckdb, Excel, and Polars. Why pick up another CSV tool? Here's the short pitch.
qsv is fast enough that you stop noticing it. A few headline numbers, all measured against real public datasets:
- 48 statistical measures for every column of a 2.7M-row CSV in under a second (benchmark). With an index, faster still.
- Index a 15 GB / 28M-row NYC 311 dataset in ~14 seconds. After that, count, sample, and slice are instantaneous.
- Validate a 1M-row CSV against a JSON Schema 2020-12 spec at up to 780,000 rows/sec (see docs/Validate.md).
- Geocode 360,000 records per second against a local Geonames mirror (see docs/help/geocode.md).
- Diff two 1M × 9-column CSVs in under 600 ms.
For full benchmarks and reproduction instructions, see docs/BENCHMARKS.md and qsv.dathere.com/benchmarks.
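The index-then-query pattern from the list above can be sketched as follows. This is a minimal illustration, not a benchmark: the four-row cities.csv is made-up sample data, and the qsv steps are skipped if qsv is not on your PATH.

```shell
# Hypothetical sample data; any CSV is handled the same way.
workdir=$(mktemp -d)
csv="$workdir/cities.csv"
cat > "$csv" <<'EOF'
City,Region,Population
Springfield,IL,114394
Portland,OR,652503
Austin,TX,961855
Boise,ID,235684
EOF

# Plain-shell row count (header excluded), for comparison with qsv count.
rows=$(( $(wc -l < "$csv") - 1 ))
echo "rows: $rows"

if command -v qsv >/dev/null 2>&1; then
  qsv index "$csv"                     # writes cities.csv.idx next to the file
  qsv count "$csv"                     # row count, answered from the index
  qsv slice --start 1 --len 2 "$csv"   # two rows starting at zero-based row 1
  qsv sample 2 "$csv"                  # two randomly chosen rows
else
  echo "qsv not found on PATH; skipping the qsv steps" >&2
fi
```

On a file this small the index makes no visible difference; the payoff shows up on multi-gigabyte inputs like the NYC 311 dataset mentioned above.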
Tip
Why it matters: when your tooling is sub-second, you experiment more, you check assumptions more often, and you ship cleaner data. The first thing most new users say is "wait, that's how fast stats runs?"
qsv follows the Unix philosophy: 70+ single-purpose commands that compose via stdin/stdout. The example from Getting Started chains five commands to find the top 10 US cities by population:
qsv search --select Country '^us$' wcp.csv \
| qsv sort --select Population --numeric --reverse \
| qsv slice --len 10 \
| qsv select 'AccentCity,Region,Population' \
| qsv table

No SQL parser, no in-memory DataFrame, no schema declaration. Every step streams.
When you need more than streaming row ops, qsv has it built in:
- SQL on CSV / Parquet / Arrow / JSONL: sqlp runs Polars SQL (PostgreSQL dialect) and can process files larger than RAM. See SQL & Polars.
- joinp for asof, non-equi, and outer joins: Polars-powered, multithreaded, larger-than-RAM. See Joins & Set Ops.
- Two embedded DSLs: Luau (version 0.720, with BEGIN/MAIN/END blocks and lookup tables) and Python (f-string expressions per row). See Scripting (Luau / Python).
- MiniJinja templating for report generation (template) and HTTP POST bodies (fetchpost). See HTTP & Web.
- JSON Schema 2020-12 validation with custom keywords (currency, dynamicEnum, uniqueCombinedWith). See Validation & Schema.
- Geocoding against an updatable local Geonames mirror, with no network calls at runtime. See Geospatial.
- HTTP fetching with HTTP/2 flow control, RFC RateLimit-aware throttling, Redis/disk caching, and jaq JSON extraction. See HTTP & Web.
- AI-driven data dictionaries: describegpt produces neuro-symbolic descriptions and SQL RAG sessions against any OpenAI-compatible LLM (including local Ollama / Jan / LM Studio). See AI & Documentation.
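As one concrete case from the list above, here is a rough sketch of JSON Schema validation. It assumes that qsv schema writes the inferred schema next to the input as orders.csv.schema.json; the orders.csv file is made-up sample data, and the custom keywords mentioned above are not exercised here.

```shell
# Hypothetical sample data for illustration only.
workdir=$(mktemp -d)
csv="$workdir/orders.csv"
cat > "$csv" <<'EOF'
order_id,amount,currency
1,19.99,USD
2,5.00,EUR
3,42.50,USD
EOF

# Assumed output location of the inferred schema (input path + ".schema.json").
schema="$csv.schema.json"

if command -v qsv >/dev/null 2>&1; then
  qsv schema "$csv"               # infer a JSON Schema from the data
  qsv validate "$csv" "$schema"   # check every row against that schema
else
  echo "qsv not found on PATH; skipping the qsv steps" >&2
fi
```

In practice you would hand-edit the inferred schema to tighten constraints before using it as a gate in a pipeline.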
You don't have to load anything into a database. Files are the table. But if you want SQL, point sqlp at one or many CSVs (or Parquet, or JSONL) and write a query — Polars handles the rest.
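A minimal sketch of querying a file directly with sqlp, assuming the table name is the file stem (cities for cities.csv). The data and the query are illustrative only, and sqlp requires a qsv build with Polars support.

```shell
# Hypothetical sample data for illustration only.
workdir=$(mktemp -d)
csv="$workdir/cities.csv"
cat > "$csv" <<'EOF'
City,Region,Population
Springfield,IL,114394
Chicago,IL,2746388
Austin,TX,961855
Houston,TX,2304580
EOF

# Total population per region, largest first.
query='SELECT Region, SUM(Population) AS total FROM cities GROUP BY Region ORDER BY total DESC'

if command -v qsv >/dev/null 2>&1; then
  qsv sqlp "$csv" "$query"   # result is written to stdout as CSV
else
  echo "qsv not found on PATH; skipping the qsv steps" >&2
fi
```

Because the result is plain CSV on stdout, it can be piped into other qsv commands such as qsv table.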
qsv stats doesn't sample-and-guess. For every column it produces:
- Guaranteed data type inference (Null / String / Float / Integer / Date / DateTime / Boolean)
- Sum, mean, stddev, variance, min, max, range, geometric/harmonic mean
- Cardinality, mode/antimode (with weights), sortiness
- Median, quartiles, percentiles, MAD, IQR, skewness, kurtosis
- Plus extended outlier, robust, and bivariate stats via moarstats (55 more measures)
See docs/STATS_DEFINITIONS.md for the full list and Aggregation & Statistics for usage.
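To see these measures for yourself, a minimal sketch is shown below. The sample file is made up, and --everything is used on the assumption that you want all of the optional statistics rather than the faster default set.

```shell
# Hypothetical sample data for illustration only.
workdir=$(mktemp -d)
csv="$workdir/readings.csv"
cat > "$csv" <<'EOF'
sensor,reading,ok
a,1.5,true
b,2.25,true
c,10.0,false
d,3.75,true
EOF

# Number of data columns, taken from the header row with plain shell tools.
cols=$(head -n 1 "$csv" | tr ',' '\n' | wc -l)
echo "columns: $cols"

if command -v qsv >/dev/null 2>&1; then
  # One output row per input column; qsv table aligns it for reading.
  qsv stats --everything "$csv" | qsv table
else
  echo "qsv not found on PATH; skipping the qsv steps" >&2
fi
```

The output has one row per column of the input, so a three-column file yields three rows of statistics.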
qsv is the engine inside several adjacent projects:
- qsv pro: desktop GUI where you drop in a spreadsheet and explore it interactively. See qsv pro Spotlight.
- qsv MCP server: turns qsv into a tool for any Model Context Protocol client (Claude Desktop, Claude Code, …).
- Claude Cowork plugin: 15 data-science skills + 3 agents for Claude.
- DataPusher+: a CKAN extension powered by qsvdp, the slim variant.
- qsv-recipes: community Luau scripts.
- qsv-cookiecutter: project scaffolding.
See Integrations for the full picture.
| Variant | Size | Best for |
|---|---|---|
| qsv | full | Day-to-day workstation use |
| qsvlite | ~16 % | xsv migrants, minimal install, low-resource environments |
| qsvdp | ~16 % | DataPusher+ / CKAN data pipelines |
| qsvmcp | smaller | MCP server deployments |
See Binary Variants for the full feature matrix.
Short version:
- qsv vs xsv: qsv is a fork of xsv with vastly more commands, multithreading, indexing, and active development. If you like xsv, install qsvlite and you have a drop-in replacement.
- qsv vs csvkit: qsv is roughly 10×–14× faster on real workloads and has more commands. csvkit is Python, easy to extend in Python; qsv is Rust, easy to script in shell.
- qsv vs miller: overlap is significant; miller is more general (TSV/JSON/DKVP shapes), qsv is more CSV-specialized with deeper stats/validation. Use whichever you reach for first.
- qsv vs DuckDB CSV reader: they complement each other. The qsv to parquet command followed by DuckDB is a great pipeline. qsv specializes in pre-DB cleaning, profiling, validation, and enrichment.
- qsv vs pandas: pandas is in-memory and Python-native. qsv is streaming, shell-native, and faster for big-CSV profiling. The two coexist in many notebooks (see Integrations).
Full comparison: Comparison vs others.
# Run qsv online — no install required:
open https://qsv.dathere.com

Or install it locally and follow Getting Started.
- Getting Started: first commands, hands-on
- Command Reference: every command
- docs/PERFORMANCE.md: the canonical performance tuning reference
- docs/BENCHMARKS.md: benchmark methodology
- qsv.dathere.com/benchmarks: live benchmark dashboard
- docs/FEATURES.md: feature flag matrix
- External Resources: talks, blog series, podcasts
qsv — GitHub · Releases · Discussions · qsv pro · Try it online · Benchmarks · datHere · DeepWiki · Dual-licensed MIT / Unlicense