AI and Documentation

Tier: Intermediate
Commands covered: describegpt, color, pro

Per-command flag reference lives in /docs/help/. This page is the workflow layer — when to reach for each command and how they compose.

The flagship here is describegpt — a neuro-symbolic data dictionary generator and SQL-RAG chat assistant. "Neuro-symbolic" means the heavy lifting (column types, ranges, cardinalities) is done deterministically by qsv's stats and frequency caches; the LLM only fills in human-friendly labels and descriptions. The result: hallucination-resistant data documentation, often produced against a local LLM (Ollama / Jan / LM Studio).

For the deep-dive, see docs/Describegpt.md and the output gallery (Markdown, JSON, TOON, Spanish, Mandarin, and SQL-RAG examples).

color is the colorized-table cousin of table. pro bridges to the qsv pro desktop app.

Quick decision table

| If you want to… | Use | Notes |
| --- | --- | --- |
| Generate a data dictionary for a CSV | `describegpt --dictionary` | Outputs deterministic stats + LLM-written labels |
| Generate description + tags + dictionary | `describegpt --all` | The full "FAIRify" mode |
| Ask a natural-language question about a CSV | `describegpt --prompt "..."` | SQL-RAG sub-mode kicks in when needed |
| Get multilingual descriptions (Spanish, Mandarin, …) | `describegpt --lang ...` | LLM-driven; quality varies by model |
| Use a local LLM (Ollama / Jan / LM Studio) | `describegpt -u http://localhost:11434/v1 --model ...` | Recommended for sensitive data |
| Pretty colorized table for the terminal | `color` | Auto-detects light/dark theme, fits to terminal width |
| Open a file in csvlens via qsv pro | `qsv pro lens` | Requires qsv pro running |
| Import a file into qsv pro's Workflow | `qsv pro workflow` | Requires qsv pro running |

`describegpt`

Calls any OpenAI-compatible LLM endpoint with a configurable MiniJinja-templated prompt. The prompt is fed pre-computed summary statistics and frequency distributions from qsv's stats / frequency caches: that's the "symbolic" half. The LLM generates only the natural-language labels and descriptions: that's the "neuro" half. Result: column names, types, and ranges come from deterministic, reproducible statistics, so the LLM is never asked to supply them and has no room to invent them.

Two modes

Dictionary mode (--dictionary, --description, --tags, --all): generates documentation for the whole dataset.
Chat / RAG mode (--prompt "..."): answers a natural-language question. If the answer needs more than stats + frequency, qsv enters SQL-RAG sub-mode: it generates a SQL query, runs it against the data (DuckDB if QSV_DUCKDB_PATH is set, Polars SQL otherwise), and returns the deterministic answer.
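
The engine rule for SQL-RAG sub-mode can be checked before a chat session. The sketch below only mirrors the rule as stated above (DuckDB when QSV_DUCKDB_PATH is set, Polars SQL otherwise); it does not query qsv itself, and `sql_rag_engine` is a hypothetical helper name, not a qsv command.

```shell
# Report which SQL engine the documented rule would select.
# Assumption: a non-empty QSV_DUCKDB_PATH counts as "set".
sql_rag_engine() {
  if [ -n "${QSV_DUCKDB_PATH:-}" ]; then
    echo "duckdb"
  else
    echo "polars"
  fi
}
```

Calling `sql_rag_engine` in the shell where you plan to run `qsv describegpt --prompt "..."` shows which branch applies to that environment.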

Example: full data dictionary against OpenAI's default model

qsv describegpt data.csv --api-key "$OPENAI_API_KEY" --all
# Writes data.dictionary.json (or similar) and prints a Markdown summary

Example: against a local Ollama instance (no API key, no data leaves your machine)

qsv describegpt NYC_311_SR_2010-2020-sample-1M.csv \
  -u http://localhost:11434/v1 \
  --model deepseek-r1:14b \
  --dictionary
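
Before pointing describegpt at a local endpoint, it can help to confirm that the server is answering. This is a minimal sketch, assuming curl is installed and that the endpoint serves the OpenAI-style `/models` route under its base URL; `llm_endpoint_ok` is a hypothetical helper name, not part of qsv.

```shell
# Return 0 if an OpenAI-compatible endpoint answers on <base-url>/models,
# non-zero otherwise (server down, wrong port, curl missing, timeout).
llm_endpoint_ok() {
  base_url="${1:-http://localhost:11434/v1}"
  # ${base_url%/} strips one trailing slash so the path is not doubled.
  curl --fail --silent --max-time 5 "${base_url%/}/models" >/dev/null 2>&1
}

# Usage (not run here):
#   llm_endpoint_ok http://localhost:11434/v1 || echo "local LLM is not reachable" >&2
```

A check like this only confirms that the server responds; it does not confirm that the model named in `--model` has been pulled.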

Example: chat — ask a question the stats alone can answer

qsv describegpt NYC_311_SR_2010-2020-sample-1M.csv \
  --prompt "What is the most common complaint type?"

The LLM consults the frequency table (already cached) and answers without writing any SQL.

Example: chat with SQL-RAG sub-mode

qsv describegpt NYC_311_SR_2010-2020-sample-1M.csv \
  --prompt "What are the top 10 complaint types by community board and borough by year?"

Stats alone can't answer this (it needs a group-by across community board, borough, and year), so qsv:

1. Builds a small random sample as additional LLM context.
2. Asks the LLM to write a SQL query that answers the question.
3. Runs the query with DuckDB (if QSV_DUCKDB_PATH is set) or Polars SQL.
4. Returns the actual result, not a guess.

See docs/describegpt/nyc311-describegpt-prompt.md for a worked example with the resulting CSV.
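
To make the steps above concrete, here is a rough by-hand approximation using qsv's own `sample` and `sqlp` commands. It is a sketch under stated assumptions, not what describegpt runs internally: the SQL is hand-written rather than LLM-generated, it is simplified to drop the year and community board dimensions, the column names "Complaint Type" and "Borough" are assumed to exist in the file, and `sql_rag_by_hand` is a hypothetical helper name.

```shell
# Approximate the SQL-RAG steps manually for one CSV file.
sql_rag_by_hand() {
  file="$1"
  if [ ! -f "$file" ]; then
    echo "sql_rag_by_hand: no such file: $file" >&2
    return 2
  fi

  # Step 1: a small random sample, the kind of extra context an LLM would see.
  qsv sample 5 "$file" >&2 || return 1

  # Step 2: the query an LLM might write (hand-written here; column names are assumptions).
  # _t_1 is the sqlp alias for the first input file.
  sql='SELECT "Complaint Type", "Borough", COUNT(*) AS n
       FROM _t_1
       GROUP BY "Complaint Type", "Borough"
       ORDER BY n DESC
       LIMIT 10'

  # Steps 3 and 4: run the query with Polars SQL and print the actual result.
  qsv sqlp "$file" "$sql"
}
```

The point of the design is visible even in this reduced form: the answer comes from executing a query against the data, so a wrong query produces a wrong but inspectable result rather than an invented one.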

Example: iterative SQL-RAG session refinement

The Allegheny property sales session shows three rounds of refinement against the dataset. The final query produces a most-expensive-listings CSV — deterministic, reproducible.

Example: multilingual output

qsv describegpt NYC_311.csv --all --lang es > nyc311-describegpt-spanish.md
qsv describegpt NYC_311.csv --all --lang zh > nyc311-describegpt-mandarin.md

(See docs/describegpt/ for actual Spanish and Mandarin outputs.)

Example: controlled tag vocabulary (avoid LLM tag drift)

qsv describegpt NYC_311.csv \
  --tags \
  --tag-vocab data-tag-vocabulary.csv > nyc311-tags.md

--tag-vocab constrains the LLM to choose from a list of approved tags — useful for CKAN harmonization.

Output formats

describegpt emits Markdown (default), JSON, and TOON (Token-Oriented Object Notation, a compact JSON-like encoding designed for LLM prompts). See docs/describegpt/ for examples of each.

Configurable prompts

The default prompt templates are in resources/describegpt_defaults.toml. Copy that file, edit the copy to fit your organization's documentation style, and pass it with describegpt --prompt-file my-prompts.toml ...

See also: /docs/help/describegpt.md, docs/Describegpt.md, Output gallery, Stats Cache & Caching, SQL & Polars, Claude Cowork Plugin, MCP Server, Cookbook → Stats → Insights.

`color`

table with colors. Same elastic-tab alignment, but with color-coded data types (string vs number vs date), terminal-fit truncation, and theme auto-detection (light vs dark). Loads the entire CSV into memory — pair with slice or sample for large files.
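
Because color loads its whole input into memory, a small wrapper that trims the file first can be convenient for large CSVs. This is a minimal sketch; `qsv_preview` is a hypothetical helper name and the default of 20 rows is an arbitrary choice.

```shell
# Show the first N rows of a CSV as a colorized table (default: 20 rows).
qsv_preview() {
  file="$1"
  rows="${2:-20}"
  if [ ! -f "$file" ]; then
    echo "qsv_preview: no such file: $file" >&2
    return 2
  fi
  # slice keeps only the first $rows records, so color never sees the full file.
  qsv slice --len "$rows" "$file" | qsv color
  # For a random preview instead of the head of the file, one alternative is:
  #   qsv sample "$rows" "$file" | qsv color
}
```

For example, `qsv_preview wcp.csv 10` would render the first ten rows of wcp.csv.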

The polars feature lets color also display Arrow, Avro, Parquet, JSON arrays, and JSONL.

Example: colorized top-10 cities by population

qsv search --select Country '^us$' wcp.csv \
  | qsv sort --select Population --numeric --reverse \
  | qsv slice --len 10 \
  | qsv color

Example: force colors when piping (or running in CI)

QSV_FORCE_COLOR=1 qsv stats wcp.csv | qsv color | less -R

Example: override terminal theme detection

QSV_THEME=DARK qsv color wcp.csv

Example: browse a Parquet file (polars feature)

qsv to parquet outdir/ wcp.csv
qsv color outdir/wcp.parquet

See also: /docs/help/color.md, table — uncolored alternative, lens — interactive viewer, Environment Variables — QSV_FORCE_COLOR, QSV_THEME, QSV_TERMWIDTH.

`pro`

Bridges to the qsv pro API. qsv pro must be running on the same machine. Two subcommands:

lens — opens a CSV in csvlens inside an Alacritty window (Windows only).
workflow — imports a file into qsv pro's Workflow panel.

Example: send a CSV to qsv pro's Workflow

qsv pro workflow data.csv

Example: open a CSV in csvlens via qsv pro (Windows)

qsv pro lens data.csv

The Workflow subcommand accepts CSV, TSV, SSV, TAB, XLSX, XLS, XLSB, XLSM, ODS — auto-conversion happens inside qsv pro.

For everything qsv pro offers beyond this command, see qsv pro Spotlight and qsvpro.dathere.com.

See also: /docs/help/pro.md, qsv pro Spotlight, lens — qsv's built-in interactive viewer, Integrations.
