
AI and Documentation

Joel Natividad edited this page May 13, 2026 · 12 revisions

AI & Documentation

Tier: Intermediate
Commands covered: describegpt, color, pro

Per-command flag reference lives in /docs/help/. This page is the workflow layer — when to reach for each command and how they compose.

The flagship here is describegpt — a neuro-symbolic data dictionary generator and SQL-RAG chat assistant. "Neuro-symbolic" means the heavy lifting (column types, ranges, cardinalities) is done deterministically by qsv's stats and frequency caches; the LLM only fills in human-friendly labels and descriptions. The result: hallucination-resistant data documentation, often produced against a local LLM (Ollama / Jan / LM Studio).

For the deep-dive, see docs/Describegpt.md and the output gallery (Markdown, JSON, TOON, Spanish, Mandarin, and SQL-RAG examples).

color is the colorized-table cousin of table. pro bridges to the qsv pro desktop app.

Quick decision table

| If you want to… | Use | Notes |
|---|---|---|
| Generate a data dictionary for a CSV | describegpt --dictionary | Outputs deterministic stats + LLM-written labels |
| Generate description + tags + dictionary | describegpt --all | The full "FAIRify" mode |
| Ask a natural-language question about a CSV | describegpt --prompt "..." | SQL-RAG sub-mode kicks in when needed |
| Get multilingual descriptions (Spanish, Mandarin, …) | describegpt --lang ... | LLM-driven; quality varies by model |
| Use a local LLM (Ollama / Jan / LM Studio) | describegpt -u http://localhost:11434/v1 --model ... | Recommended for sensitive data |
| Pretty colorized table for the terminal | color | Auto-detects light/dark theme, fits to terminal width |
| Open a file in csvlens via qsv pro | qsv pro lens | Requires qsv pro running |
| Import a file into qsv pro's Workflow | qsv pro workflow | Requires qsv pro running |

describegpt

Calls any OpenAI-compatible LLM endpoint with a configurable MiniJinja-templated prompt. The prompt is fed pre-computed summary statistics and frequency distributions from qsv's stats / frequency caches; that is the "symbolic" half. The LLM generates only the natural-language labels and descriptions; that is the "neuro" half. Result: documentation whose column names, types, and ranges come from deterministic, reproducible computation, so the LLM has no opportunity to hallucinate column names or invent ranges.
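To get a feel for the symbolic half, you can run the underlying commands yourself. The sketch below is illustrative only: demo.csv is a made-up file, and the options describegpt passes to stats and frequency internally may differ from the ones shown here.

```shell
# Hypothetical three-row file, created only for this illustration.
printf 'name,score\nann,10\nbob,20\nann,30\n' > demo.csv

if command -v qsv >/dev/null 2>&1; then
  # Per-column types, ranges, and cardinality, computed deterministically.
  qsv stats demo.csv --everything
  # Most frequent values per column, capped at five values per column.
  qsv frequency demo.csv --limit 5
else
  echo "qsv not found on PATH; skipping" >&2
fi
```

Running these twice on the same file yields the same numbers, which is what makes the statistical part of the dictionary reproducible.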

Two modes

  • Dictionary mode (--dictionary, --description, --tags, --all): generates documentation for the whole dataset.
  • Chat / RAG mode (--prompt "..."): answers a natural-language question. If the answer needs more than stats + frequency, qsv enters SQL-RAG sub-mode: it generates a SQL query, runs it against the data (DuckDB if QSV_DUCKDB_PATH is set, Polars SQL otherwise), and returns the deterministic answer.
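As a rough sketch of switching the SQL engine for a single call, the hypothetical helper below sets QSV_DUCKDB_PATH only for that invocation. It assumes the variable holds the path to a DuckDB binary; the function name and the example path in the comment are placeholders, not part of qsv.

```shell
# Hypothetical helper: ask one question with DuckDB as the SQL-RAG engine.
#   $1 = path to the duckdb binary (assumption; adjust to your install)
#   $2 = CSV file
#   $3 = natural-language question
ask_with_duckdb() {
  QSV_DUCKDB_PATH="$1" qsv describegpt "$2" --prompt "$3"
}

# Example call (not executed here):
#   ask_with_duckdb /usr/local/bin/duckdb data.csv "Which year has the most rows?"
```

Leaving the variable unset and calling describegpt directly falls back to Polars SQL, as described above.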

Example: full data dictionary against OpenAI's default model

qsv describegpt data.csv --api-key "$OPENAI_API_KEY" --all
# Writes data.dictionary.json (or similar) and prints a Markdown summary

Example: against a local Ollama instance (no API key, no data leaves your machine)

qsv describegpt NYC_311_SR_2010-2020-sample-1M.csv \
  -u http://localhost:11434/v1 \
  --model deepseek-r1:14b \
  --dictionary
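Before a long run against a local model, it can help to confirm the endpoint is actually listening. The sketch below assumes the server exposes the OpenAI-style models listing under the same base URL (Ollama's OpenAI-compatible API does, as far as we know; other servers may differ).

```shell
BASE_URL="http://localhost:11434/v1"   # same base URL as the example above

# -s: quiet, -f: treat HTTP errors as failures, --max-time: do not hang.
if curl -sf --max-time 3 "$BASE_URL/models" >/dev/null 2>&1; then
  STATUS="reachable"
else
  STATUS="unreachable"
fi
echo "LLM endpoint $BASE_URL is $STATUS"
```

If the endpoint is unreachable, start the local server and make sure the model named in --model has been pulled before retrying.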

Example: chat — ask a question the stats alone can answer

qsv describegpt NYC_311_SR_2010-2020-sample-1M.csv \
  --prompt "What is the most common complaint type?"

The LLM consults the frequency table (already cached) and answers without writing any SQL.

Example: chat with SQL-RAG sub-mode

qsv describegpt NYC_311_SR_2010-2020-sample-1M.csv \
  --prompt "What are the top 10 complaint types by community board and borough by year?"

Stats alone can't answer this (it needs a group-by across community board, borough, and year), so qsv:

  1. Builds a small random sample as additional LLM context.
  2. Asks the LLM to write a SQL query that answers the question.
  3. Runs the query with DuckDB (if QSV_DUCKDB_PATH is set) or Polars SQL.
  4. Returns the actual result, not a guess. See docs/describegpt/nyc311-describegpt-prompt.md for a worked example with the resulting CSV.
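The steps above can be approximated by hand to see what each one contributes. This is a sketch under stated assumptions, not what describegpt runs internally: sr.csv and its columns are invented, and the SQL is merely the kind of query an LLM might write for a question like the one above.

```shell
# Hypothetical miniature of a 311-style file, for illustration only.
cat > sr.csv <<'EOF'
borough,complaint_type,yr
BRONX,Noise,2019
BRONX,Noise,2019
BRONX,Heating,2019
QUEENS,Noise,2020
EOF

# Step 2 stand-in: a hand-written query in place of the LLM-generated one.
# sqlp exposes the input file as a table named after its file stem (sr).
SQL='SELECT borough, yr, complaint_type, COUNT(*) AS n FROM sr GROUP BY borough, yr, complaint_type ORDER BY n DESC'

if command -v qsv >/dev/null 2>&1; then
  qsv sample 2 sr.csv        # step 1: small random sample as extra context
  qsv sqlp sr.csv "$SQL"     # step 3: run the query with Polars SQL
else
  echo "qsv not found on PATH; skipping" >&2
fi
```

Because the final answer comes from executing the query, only the SQL text depends on the LLM; the numbers themselves are computed from the data.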

Example: iterative SQL-RAG session refinement

The Allegheny property sales session shows three rounds of refinement against the dataset. The final query produces a most-expensive-listings CSV — deterministic, reproducible.

Example: multilingual output

qsv describegpt NYC_311.csv --all --lang es > nyc311-describegpt-spanish.md
qsv describegpt NYC_311.csv --all --lang zh > nyc311-describegpt-mandarin.md

(See docs/describegpt/ for actual Spanish and Mandarin outputs.)

Example: controlled tag vocabulary (avoid LLM tag drift)

qsv describegpt NYC_311.csv \
  --tags \
  --tag-vocab data-tag-vocabulary.csv > nyc311-tags.md

--tag-vocab constrains the LLM to choose from a list of approved tags — useful for CKAN harmonization.
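One way to confirm that a run stayed inside the vocabulary is a plain set-difference check. The sketch below assumes you have already extracted the approved tags and the generated tags into one-tag-per-line text files; both file names and their contents are hypothetical, and describegpt's actual output has to be parsed into that shape first.

```shell
# Hypothetical inputs, one tag per line.
printf 'transportation\nhousing\npublic-safety\n' > approved-tags.txt
printf 'housing\nnoise\n' > generated-tags.txt

# List generated tags that are NOT in the approved list.
#   -v invert match, -x match whole lines, -F fixed strings, -f patterns from file
# grep exits non-zero when nothing is off-vocabulary, hence the "|| true".
grep -vxFf approved-tags.txt generated-tags.txt > off-vocab.txt || true
cat off-vocab.txt
```

An empty off-vocab.txt means every generated tag came from the approved list.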

Output formats

describegpt emits Markdown (default), JSON, and TOON (Token-Oriented Object Notation, a compact encoding of JSON data designed for LLM prompts). See docs/describegpt/ for examples of each.

Configurable prompts

The default prompt templates are in resources/describegpt_defaults.toml. To fit your organization's documentation style, copy that file, edit the copy, and pass it with --prompt-file: describegpt --prompt-file my-prompts.toml ...

See also: /docs/help/describegpt.md, docs/Describegpt.md, Output gallery, Stats Cache & Caching, SQL & Polars, Claude Cowork Plugin, MCP Server, Cookbook → Stats → Insights.

color

table with colors. Same elastic-tab alignment, but with color-coded data types (string vs number vs date), terminal-fit truncation, and theme auto-detection (light vs dark). Loads the entire CSV into memory — pair with slice or sample for large files.

The polars feature lets color also display Arrow, Avro, Parquet, JSON arrays, and JSONL.
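Since color holds its whole input in memory, a common pattern is to cut the file down first. A minimal sketch, where big.csv is a tiny stand-in for a genuinely large file and the row counts are arbitrary:

```shell
# Stand-in for a large file, created only for this illustration.
printf 'id,value\n1,a\n2,b\n3,c\n' > big.csv

if command -v qsv >/dev/null 2>&1; then
  qsv slice --len 2 big.csv | qsv color   # only the first 2 rows
  qsv sample 2 big.csv | qsv color        # 2 randomly chosen rows
else
  echo "qsv not found on PATH; skipping" >&2
fi
```

slice gives a stable preview of the top of the file, while sample gives a more representative look at the data.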

Example: colorized top-10 cities by population

qsv search --select Country '^us$' wcp.csv \
  | qsv sort --select Population --numeric --reverse \
  | qsv slice --len 10 \
  | qsv color

Example: force colors when piping (or running in CI)

QSV_FORCE_COLOR=1 qsv stats wcp.csv | qsv color | less -R

Example: override terminal theme detection

QSV_THEME=DARK qsv color wcp.csv

Example: browse a Parquet file (polars feature)

qsv to parquet outdir/ wcp.csv
qsv color outdir/wcp.parquet

See also: /docs/help/color.md, table (uncolored alternative), lens (interactive viewer), Environment Variables (QSV_FORCE_COLOR, QSV_THEME, QSV_TERMWIDTH).

pro

Bridges to the qsv pro API. qsv pro must be running on the same machine. Two subcommands:

  • lens — opens a CSV in csvlens inside an Alacritty window (Windows only).
  • workflow — imports a file into qsv pro's Workflow panel.

Example: send a CSV to qsv pro's Workflow

qsv pro workflow data.csv

Example: open a CSV in csvlens via qsv pro (Windows)

qsv pro lens data.csv

The workflow subcommand accepts CSV, TSV, SSV, TAB, XLSX, XLS, XLSB, XLSM, and ODS files; auto-conversion happens inside qsv pro.

For everything qsv pro offers beyond this command, see qsv pro Spotlight and qsvpro.dathere.com.

See also: /docs/help/pro.md, qsv pro Spotlight, lens — qsv's built-in interactive viewer, Integrations.

