# AI and Documentation
Tier: Intermediate
Commands covered: describegpt, color, pro
Per-command flag reference lives in `/docs/help/`. This page is the workflow layer: when to reach for each command and how they compose.
The flagship here is `describegpt`, a neuro-symbolic data dictionary generator and SQL-RAG chat assistant. "Neuro-symbolic" means the heavy lifting (column types, ranges, cardinalities) is done deterministically by qsv's stats and frequency caches; the LLM only fills in human-friendly labels and descriptions. The result is hallucination-resistant data documentation, often produced with a local LLM (Ollama, Jan, or LM Studio).
For a deep dive, see `docs/Describegpt.md` and the output gallery (Markdown, JSON, TOON, Spanish, Mandarin, and SQL-RAG examples).
`color` is the colorized-table cousin of `table`. `pro` bridges to the qsv pro desktop app.
| If you want to… | Use | Notes |
|---|---|---|
| Generate a data dictionary for a CSV | `describegpt --dictionary` | Outputs deterministic stats + LLM-written labels |
| Generate description + tags + dictionary | `describegpt --all` | The full "FAIRify" mode |
| Ask a natural-language question about a CSV | `describegpt --prompt "..."` | SQL-RAG sub-mode kicks in when needed |
| Get multilingual descriptions (Spanish, Mandarin, …) | `describegpt --lang ...` | LLM-driven; quality varies by model |
| Use a local LLM (Ollama / Jan / LM Studio) | `describegpt -u http://localhost:11434/v1 --model ...` | Recommended for sensitive data; the URL shown is Ollama's default, Jan and LM Studio listen on other ports |
| Pretty colorized table for the terminal | `color` | Auto-detects light/dark theme, fits to terminal width |
| Open a file in csvlens via qsv pro | `qsv pro lens` | Requires qsv pro running; Windows only |
| Import a file into qsv pro's Workflow | `qsv pro workflow` | Requires qsv pro running |
## describegpt

`describegpt` calls any OpenAI-compatible LLM endpoint with a configurable, MiniJinja-templated prompt. The prompt is fed pre-computed summary statistics and frequency distributions from qsv's `stats` / `frequency` caches; that's the "symbolic" half. The LLM generates only the natural-language labels and descriptions; that's the "neuro" half. The result is documentation grounded in deterministic, reproducible statistics, so it doesn't hallucinate column names or invent ranges.
`describegpt` works in two modes:

- Dictionary mode (`--dictionary`, `--description`, `--tags`, `--all`): generates documentation for the whole dataset.
- Chat / RAG mode (`--prompt "..."`): answers a natural-language question. If the answer needs more than stats + frequency, qsv enters SQL-RAG sub-mode: it generates a SQL query, runs it against the data (DuckDB if `QSV_DUCKDB_PATH` is set, Polars SQL otherwise), and returns the deterministic answer.
Example: full documentation via a hosted API

```bash
qsv describegpt data.csv --api-key "$OPENAI_API_KEY" --all
# Writes data.dictionary.json (or similar) and prints a Markdown summary
```

Example: data dictionary with a local LLM (Ollama)

```bash
qsv describegpt NYC_311_SR_2010-2020-sample-1M.csv \
  -u http://localhost:11434/v1 \
  --model deepseek-r1:14b \
  --dictionary
```

Example: a question answered from the cached stats

```bash
qsv describegpt NYC_311_SR_2010-2020-sample-1M.csv \
  --prompt "What is the most common complaint type?"
```

The LLM consults the frequency table (already cached) and answers without writing any SQL.
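To document a whole folder of CSVs with a local model, the local-LLM command above can be wrapped in a loop. This is a minimal sketch, not part of qsv: `document_csv_dir` is a hypothetical helper, it assumes `describegpt` writes its Markdown output to stdout (as the redirect examples on this page do), and it defaults to Ollama's endpoint.

```bash
# Hypothetical helper (not part of qsv): write a Markdown data dictionary next
# to every CSV in a directory, using a local OpenAI-compatible LLM endpoint.
# Assumes describegpt prints its output to stdout, as the redirect examples do.
document_csv_dir() {
  dir=$1
  model=$2
  base_url=${3:-http://localhost:11434/v1}   # Ollama's default endpoint
  for f in "$dir"/*.csv; do
    [ -e "$f" ] || continue                  # glob matched no files
    out="${f%.csv}.dictionary.md"
    if qsv describegpt "$f" -u "$base_url" --model "$model" --dictionary > "$out"; then
      echo "wrote $out"
    else
      echo "failed: $f" >&2
    fi
  done
}

# Usage: document_csv_dir ./data deepseek-r1:14b
```

Running the files one at a time keeps a single failure (for example, a model timeout) from aborting the whole batch; failures are reported on stderr.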
Example: a question that triggers SQL-RAG

```bash
qsv describegpt NYC_311_SR_2010-2020-sample-1M.csv \
  --prompt "What are the top 10 complaint types by community board and borough by year?"
```

Stats alone can't answer this (it needs a GROUP BY over complaint type, community board, borough, and year), so qsv:

- Builds a small random sample as additional LLM context.
- Asks the LLM to write a SQL query that answers the question.
- Runs the query with DuckDB (if `QSV_DUCKDB_PATH` is set) or Polars SQL.
- Returns the actual result, not a guess.

See `docs/describegpt/nyc311-describegpt-prompt.md` for a worked example with the resulting CSV.
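Which engine runs the generated SQL depends on a single documented rule, so it can be checked before a long session. The sketch below is a hypothetical preflight helper, not part of qsv; it assumes `QSV_DUCKDB_PATH` points at a DuckDB executable, and the executable check is our own addition rather than something qsv is known to do.

```bash
# Hypothetical preflight check (not part of qsv): report which engine
# describegpt's SQL-RAG sub-mode will use, per the documented rule:
# DuckDB when QSV_DUCKDB_PATH is set, Polars SQL otherwise.
sql_rag_engine() {
  if [ -n "${QSV_DUCKDB_PATH:-}" ]; then
    # Our own sanity check; how qsv handles a bad path is not covered here.
    if [ ! -x "$QSV_DUCKDB_PATH" ]; then
      echo "warning: QSV_DUCKDB_PATH ($QSV_DUCKDB_PATH) is not executable" >&2
    fi
    echo "duckdb"
  else
    echo "polars"
  fi
}

# Usage: sql_rag_engine   # prints "duckdb" or "polars"
```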
The Allegheny property sales session shows three rounds of refinement against the dataset. The final query produces a most-expensive-listings CSV — deterministic, reproducible.
Example: multilingual documentation

```bash
qsv describegpt NYC_311.csv --all --lang es > nyc311-describegpt-spanish.md
qsv describegpt NYC_311.csv --all --lang zh > nyc311-describegpt-mandarin.md
```

(See `docs/describegpt/` for actual Spanish and Mandarin outputs.)
Example: constrain tags to an approved vocabulary

```bash
qsv describegpt NYC_311.csv \
  --tags \
  --tag-vocab data-tag-vocabulary.csv > nyc311-tags.md
```

`--tag-vocab` constrains the LLM to choose from a list of approved tags, which is useful for CKAN harmonization.
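A vocabulary file for the tag example can be created with a heredoc. This is a sketch under an assumption: the two-column `tag,description` layout and the tags themselves are illustrative, so confirm the expected format in `/docs/help/describegpt.md` before relying on it.

```bash
# Hypothetical vocabulary: the tag,description layout and every tag below are
# illustrative assumptions, not an official CKAN or qsv vocabulary.
cat > data-tag-vocabulary.csv <<'EOF'
tag,description
public-safety,"Police, fire, and emergency response data"
transportation,"Streets, transit, traffic, and parking"
housing,"Housing, buildings, and code enforcement"
environment,"Air, water, noise, and sanitation"
EOF
```

Descriptions that contain commas are quoted so the file stays valid CSV; a short description per tag gives the LLM more to go on than the bare tag name.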
`describegpt` emits Markdown (default), JSON, and TOON (Token-Oriented Object Notation, a compact JSON encoding designed for LLM prompts). See `docs/describegpt/` for examples of each.
The default prompt templates live in `resources/describegpt_defaults.toml`. Copy the file, edit it to fit your organization's documentation style, and pass it with `describegpt --prompt-file my-prompts.toml ...`.
See also: /docs/help/describegpt.md, docs/Describegpt.md, Output gallery, Stats Cache & Caching, SQL & Polars, Claude Cowork Plugin, MCP Server, Cookbook → Stats → Insights.
## color

`color` is `table` with colors: the same elastic-tab alignment, plus color-coded data types (string vs. number vs. date), truncation to fit the terminal width, and light/dark theme auto-detection. It loads the entire CSV into memory, so pair it with `slice` or `sample` for large files.

The `polars` feature lets `color` also display Arrow, Avro, Parquet, JSON arrays, and JSONL.
Example: colorized top-10 US cities by population

```bash
qsv search --select Country '^us$' wcp.csv \
  | qsv sort --select Population --numeric --reverse \
  | qsv slice --len 10 \
  | qsv color
```

Example: force colors when piping (or running in CI)

```bash
# The variable must prefix the `color` command, whose output is the one being piped
qsv stats wcp.csv | QSV_FORCE_COLOR=1 qsv color | less -R
```

Example: override terminal theme detection

```bash
QSV_THEME=DARK qsv color wcp.csv
```

Example: browse a Parquet file (polars feature)

```bash
qsv to parquet outdir/ wcp.csv
qsv color outdir/wcp.parquet
```

See also: `/docs/help/color.md`, `table` (uncolored alternative), `lens` (interactive viewer), Environment Variables (`QSV_FORCE_COLOR`, `QSV_THEME`, `QSV_TERMWIDTH`).
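For CI jobs, one option is to export the color variables once so every qsv command in the job sees them, instead of prefixing individual commands. The sketch below is a hypothetical helper, not part of qsv; it assumes the CI system sets `CI` (GitHub Actions and GitLab CI both set `CI=true`), and pinning the theme to `DARK` is an arbitrary choice.

```bash
# Hypothetical helper (not part of qsv): CI runners have no interactive
# terminal, so force color output and pin the theme instead of detecting it.
configure_qsv_color() {
  if [ -n "${CI:-}" ]; then
    export QSV_FORCE_COLOR=1
    export QSV_THEME=DARK
  fi
}

# Usage:
#   configure_qsv_color
#   qsv stats wcp.csv | qsv color
```

Because the variables are exported, they reach every command in a pipeline, which avoids the trap of a `VAR=value` prefix applying only to the command it precedes.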
## pro

`pro` bridges to the qsv pro API; qsv pro must be running on the same machine. It has two subcommands:

- `lens`: opens a CSV in csvlens inside an Alacritty window (Windows only).
- `workflow`: imports a file into qsv pro's Workflow panel.
Example: send a CSV to qsv pro's Workflow

```bash
qsv pro workflow data.csv
```

Example: open a CSV in csvlens via qsv pro (Windows)

```bash
qsv pro lens data.csv
```

The `workflow` subcommand accepts CSV, TSV, SSV, TAB, XLSX, XLS, XLSB, XLSM, and ODS files; auto-conversion happens inside qsv pro.
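To import a whole folder, files can be filtered against the accepted-extension list before calling `qsv pro workflow`. This is a sketch, not part of qsv: `import_to_workflow` is a hypothetical helper, it sends one file per call, and the case-insensitive extension match is our own choice (whether qsv pro accepts upper-case extensions is not confirmed here).

```bash
# Hypothetical helper (not part of qsv): send every file in a directory whose
# extension is on the Workflow list to `qsv pro workflow`, one file at a time.
# Assumes qsv pro is running on this machine.
import_to_workflow() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    # Lowercase the extension so DATA.XLSX and data.xlsx are treated alike
    ext=$(printf '%s' "${f##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
      csv|tsv|ssv|tab|xlsx|xls|xlsb|xlsm|ods)
        qsv pro workflow "$f" ;;
      *)
        echo "skipping unsupported file: $f" >&2 ;;
    esac
  done
}

# Usage: import_to_workflow ./incoming
```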
For everything qsv pro offers beyond this command, see qsv pro Spotlight and qsvpro.dathere.com.
See also: /docs/help/pro.md, qsv pro Spotlight, lens — qsv's built-in interactive viewer, Integrations.
## Related

- Command Reference (index)
- Claude Cowork Plugin: qsv as 15 skills + 3 agents for Claude Code
- MCP Server: qsv as a Model Context Protocol server
- qsv pro Spotlight: desktop GUI companion
- `docs/Describegpt.md`
- SQL & Polars: `describegpt` SQL-RAG runs through `sqlp` / DuckDB
- Stats Cache & Caching: what powers `describegpt`'s symbolic half
- Cookbook → Stats → Insights
- External Resources: "Have we achieved ACI?" blog series