Transform and Reshape

Transform & Reshape

Tier: Intermediate Commands covered: apply, applydp, behead, cat, datefmt, edit, enum, explode, fill, fixlengths, fmt, implode, input, pseudo, rename, replace, reverse, safenames, sort, sortcheck, transpose

Note

Per-command flag reference lives in /docs/help/. This page is the workflow layer — when to reach for each command and how they compose.

These are the pre-analysis commands. Most data pipelines start by cleaning column names, fixing encoding, dropping bad rows, normalizing dates, and sorting — before any stats, joins, or validation.

Quick decision table

If you want to…	Use	Notes
Normalize encoding / skip preamble / strip BOM	`input`	Should be the first command in most pipelines
Make column names DB- or CKAN-safe	`safenames`	Lowercase, snake_case, dedupe, length cap
Rename specific columns	`rename`	Pair-wise or all-at-once
Drop columns the cheap way	`select '!cols'`	See Selection & Inspection → select
Run NLP / dates / currency / sentiment transformations	`apply`	The Swiss-army knife
Slim version of `apply` for DataPusher+	`applydp`	`qsvdp` only
Format dates to ISO 8601	`datefmt`	19 input formats auto-detected
Regex replace across fields	`replace`	Multithreaded with an index
Pseudonymize a column (stable mapping)	`pseudo`	Same input value always maps to the same ID
Forward-fill missing values	`fill`	With `--groupby` to scope to groups
Add row numbers / UUIDs / a constant column	`enum`	Supports uuid4 and uuid7
Split a multi-valued column into rows	`explode`	E.g., `"red\|blue\|green"` → 3 rows
Collapse rows by key, joining a column	`implode`	The inverse of `explode`
Reshape: long ↔ wide	`transpose`	`--long N` flips wide to long
Concatenate CSVs (rows or columns)	`cat`	Supports directory & `.infile-list` input
Repair ragged rows	`fixlengths`	Pads short rows; truncates long ones
Strip the header row	`behead`	For piping headerless data
Change delimiter / line endings / quoting	`fmt`	RFC 4180 by default; supports ASCII RS/US delimited
Edit a single cell at row × column	`edit`	Surgical fix
Sort by one or more columns	`sort`	Lex / natural / numeric / random / unique
Verify a file is already sorted	`sortcheck`	Cheap precondition for `dedup --sorted`
Reverse row order	`reverse`	Constant memory with an index

`input`

The "normalize my CSV before anything else" command. Handles non-UTF-8 encoding, BOM, preamble lines, custom quoting, custom escape characters, and trims whitespace.

Example: a legacy export with 3 preamble lines and ISO-8859-1 encoding

# Step 1: transcode to UTF-8 (input itself is lossy by default)
iconv -f ISO-8859-1 -t UTF-8 legacy.csv -o legacy-utf8.csv

# Step 2: skip the preamble, trim whitespace
qsv input --skip-lines 3 --trim-headers --trim-fields legacy-utf8.csv > clean.csv

Example: a CSV with # comment lines and ragged whitespace

qsv input --comment '#' --trim-fields data.csv > clean.csv

Example: auto-detect and skip a preamble

qsv input --auto-skip exports/2024-q4.csv > q4.csv

--auto-skip sniffs the file for preamble records — useful for inconsistent vendor exports.

See also: /docs/help/input.md, sniff, Troubleshooting → UTF-8.

`safenames`

Make column headers safe for PostgreSQL, CKAN, BigQuery, and most other databases: lowercase, snake_case, deduplicate, cap at 60 bytes. Built for DataPusher+ but generally useful.

Example: clean headers from a NYC Open Data export

qsv safenames "311_Service_Requests.csv" > clean_headers.csv

Header "Complaint Type" becomes complaint_type; "Incident Address" becomes incident_address; "_id" is reserved by CKAN and gets renamed to reserved__id.

Example: audit how many unsafe headers exist (without rewriting)

qsv safenames --mode V exports/messy.csv
# stderr: 6 header/s
# 1 duplicate/s: "c1:2"
# 4 unsafe header/s: ["12_col", "Col with Embedded Spaces", "", "Column!@Invalid+Chars"]

Example: JSON audit for CI/CD gates

qsv safenames --mode J exports/messy.csv | jq '.unsafe_headers | length'

See also: /docs/help/safenames.md, Recipe: CKAN Integration, applydp.

`rename`

Bulk rename columns. Takes a positional argument that is the new comma-separated header list (which replaces all headers), or --pairwise old1,new1,old2,new2 for targeted renames.

Example: rename two specific columns

qsv rename --pairwise 'AccentCity,City,Population,Pop' wcp.csv > renamed.csv

Example: replace all column names with generic _col_1, _col_2, …

qsv rename _all_generic data.csv

Example: add headers to a headerless CSV

qsv rename 'id,timestamp,user,action' --no-headers events.csv

See also: /docs/help/rename.md, safenames, select.

`apply`

The big toolbox: run a chain of operations (trim, upper, lower, len, replace, mtrim, regex_replace, …) against one or more columns. Has NLP submodules for sentiment, profanity, similarity, eudex phonetic hashing, language detection, and name-gender inference. Has a dateadd and datesub for date arithmetic. Has a geocode submodule that complements the dedicated geocode command.

Example: clean a name column — trim, title-case, drop parentheticals

qsv apply operations trim,titlecase 'Full Name' contacts.csv \
  | qsv apply operations mtrim 'Full Name' --comparand '()' > clean_contacts.csv

Example: sentiment analysis on NYC 311 complaint descriptors

qsv apply sentiment 'Descriptor' \
  --new-column sentiment_score \
  NYC_311_SR_2010-2020-sample-1M.csv > with_sentiment.csv

Example: phonetic hash for fuzzy name matching

qsv apply eudex 'Last Name' --new-column phonetic_key voters.csv > voters_keyed.csv

Example: censor profanity in a feedback column

qsv apply operations censor 'Feedback' feedback.csv > clean_feedback.csv

Example: name-gender inference for demographic reporting (with disclaimers — see genai-disclaimer)

qsv apply gender 'First Name' --new-column inferred_gender voters.csv

See also: /docs/help/apply.md, applydp — slimmed-down for CKAN, luau, py.

`applydp`

apply without the NLP/AI features — purpose-built for DataPusher+. Available only in the qsvdp binary variant. Use this when you need DataPusher+-compatible operations (trim, upper, lower, replace, mtrim, cast, calc) in a tiny binary.

Example: typical CKAN cleanup pipeline

qsvdp applydp operations trim,lower 'Email' contacts.csv \
  | qsvdp applydp operations cast 'Amount' --comparand integer > clean.csv

See also: /docs/help/applydp.md, Binary Variants, Recipe: CKAN Integration.

`datefmt`

Format dates. Recognizes 19 input formats automatically — no need to specify the input format.

Example: normalize NYC 311 "Created Date" to ISO 8601

qsv datefmt 'Created Date' NYC_311_SR_2010-2020-sample-1M.csv > iso.csv

Example: extract year-month into a new column for grouping

qsv datefmt 'Created Date' \
  --formatstr '%Y-%m' \
  --new-column year_month \
  NYC_311_SR_2010-2020-sample-1M.csv > with_ym.csv

Example: format all columns ending in _date (case-insensitive)

qsv datefmt '/(?i)_date$/' transactions.csv > transactions_iso.csv

Example: ISO week number for weekly reporting

qsv datefmt 'Created Date' --formatstr '%V' --new-column iso_week NYC_311.csv

See also: /docs/help/datefmt.md, chrono strftime specifiers, Recipe: Date Enrichment.

`replace`

Regex find-and-replace. Multithreaded with an index. Applies to each field individually.

Example: redact emails from a public release

qsv replace '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' '<EMAIL>' raw.csv > public.csv

Example: normalize phone formats

qsv replace '\((\d{3})\)\s*(\d{3})-(\d{4})' '$1-$2-$3' contacts.csv > normalized.csv

Example: case-insensitive cleanup of "N/A" sentinels into empty cells

qsv replace -i '^(N/A|NA|null|none|-)$' '' messy.csv > cleaned.csv

See also: /docs/help/replace.md, search, apply operations regex_replace.

`pseudo`

Deterministic pseudonymization: the same input value always gets the same identifier, even across rows.

Example: replace customer names with stable IDs for analytics export

qsv pseudo 'Customer Name' \
  --start 1000 --increment 1 \
  --formatstr 'CUST-{}' \
  transactions.csv > pseudonymized.csv

Mary → CUST-1000, John → CUST-1001, Mary again → CUST-1000 (same ID).

Example: pseudonymize multiple columns

qsv pseudo 'Patient ID' --formatstr 'PT-{}' raw.csv \
  | qsv pseudo 'Provider' --formatstr 'PRV-{}' > deidentified.csv

For irreversible (one-way) hashing, see blake3.

See also: /docs/help/pseudo.md, enum, blake3, Cookbook → Clean & Normalize.

`fill`

Fill empty values. Forward-fill (default), first-value-fill, backfill, or constant default. Optional --groupby scopes the fill within group boundaries.

Example: forward-fill a sparse "Region" column inside each "Country" group

qsv fill --groupby Country Region wcp.csv > filled.csv

Example: replace all empty Population cells with 0

qsv fill --default 0 Population wcp.csv > zerofilled.csv

Example: backfill missing dates from the first valid value

qsv fill --backfill 'Open Date' events.csv > backfilled.csv

See also: /docs/help/fill.md, apply, Recipe: Clean & Normalize.

`enum`

Add a column of incremental numbers, UUIDs, or a constant value. Also can copy an existing column.

Example: add a UUID v7 (time-ordered) to NYC 311 rows for downstream joins

qsv enum --uuid7 --new-column row_id NYC_311_SR_2010-2020-sample-1M.csv > with_id.csv

Example: tag every row with a constant "import_batch" identifier

qsv enum --constant 'batch-2024-12-01' --new-column batch raw.csv > tagged.csv

Example: duplicate a column for a downstream tool that wants both raw and normalized

qsv enum --copy 'Email' --new-column email_raw \
  contacts.csv \
  | qsv apply operations lower 'Email' > normalized.csv

See also: /docs/help/enum.md, pseudo, blake3.

`explode`

Split a column with multi-valued cells into multiple rows.

Example: a CKAN dataset's "Tags" field is |-delimited

# tags column: "energy|renewable|solar"
qsv explode Tags '|' datasets.csv > one_tag_per_row.csv

After exploding, you can run frequency Tags to see the tag distribution.

Example: explode after splitting on commas (note: needs careful quoting)

qsv explode 'Skills' ',' resumes.csv > one_skill_per_row.csv

See also: /docs/help/explode.md, implode — the inverse, frequency.

`implode`

Inverse of explode: collapse rows sharing a key into a single row, joining a value column with a separator. Drops all other columns by default. Use --sorted if input is already key-sorted to stream with bounded memory.

Example: rebuild a comma-separated tags column from exploded rows

qsv implode -k dataset_id -v Tag ',' exploded.csv > recombined.csv

Example: aggregate after group-wise stats

# Get a per-Country list of cities in wcp
qsv implode --sorted -k Country -v AccentCity '; ' wcp_sorted.csv > cities_by_country.csv

See also: /docs/help/implode.md, explode, sort (often a prerequisite).

`transpose`

Swap rows and columns. Use --multipass for large datasets that don't fit in memory. Use --long N to convert wide-format data into long-format with column-name identifiers.

Example: convert a quarterly pivot (Region × Quarter columns) into long format

qsv transpose --long 'Region' quarterly_pivot.csv > long.csv

Example: rotate a small wcp slice for inspection

qsv slice --len 1 wcp.csv | qsv transpose

See also: /docs/help/transpose.md, pivotp — the Polars-powered alternative.

`cat`

Concatenate CSVs by row or by column. Supports a directory or an .infile-list as input.

Example: concatenate 12 monthly NYC 311 exports

qsv cat rows nyc311-2024-*.csv -o nyc311-2024.csv

Example: same, but tag each row with its source filename

qsv cat rowskey --group fname --group-name source_file \
  nyc311-2024-*.csv -o nyc311-2024-tagged.csv

Example: a side-by-side join via column-wise concatenation (when row order matches)

# leftside.csv has Country,City; rightside.csv has Country,Population
qsv cat columns leftside.csv rightside.csv -o side_by_side.csv

Example: concat every CSV in a directory

qsv cat rows exports/2024/ -o full_year.csv

See also: /docs/help/cat.md, join, sqlp (use SQL UNION ALL if you want filtering+concat in one step).

`fixlengths`

Pad short rows with empty fields or truncate long ones. Use when a CSV is ragged.

Example: a legacy ERP export with inconsistent row widths

qsv fixlengths legacy-erp.csv > fixed.csv

By default it pads up to the widest row. Use --length N to force a specific width.

See also: /docs/help/fixlengths.md, input.

`fmt`

Change output delimiter, line endings, or quoting style. Use to write TSV / SSV / Excel-friendly CSV / ASCII-delimited output.

Example: TSV output

qsv fmt --out-delimiter '\t' data.csv > data.tsv

Example: Excel-friendly CSV (CRLF endings + quote everything)

qsv fmt --crlf --quote-always data.csv > excel-friendly.csv

Example: ASCII RS/US delimited output (for tools that can't handle commas in fields)

qsv fmt --ascii data.csv > data.ascii

See also: /docs/help/fmt.md, Conversion & I/O.

`behead`

Drop the header row. Useful before piping into commands that expect headerless input or before concatenating multiple files that share headers.

Example: combine month-by-month exports, keeping only the first header

qsv headers --just-names exports/2024-01.csv > combined.csv
for f in exports/2024-*.csv; do
  qsv behead "$f" >> combined.csv
done

(Prefer qsv cat rows exports/2024-*.csv — it does the same thing in one step. Use behead when you need more custom control.)

See also: /docs/help/behead.md, cat, headers.

`edit`

Change a single cell at the given row and column. Surgical for those "fix one bad value" moments.

Example: correct a typo in a known cell

# Set row 0 (first data row), "color" column, to "green"
qsv edit items.csv color 0 green > fixed.csv

color can be the column name or 0-based index. If it's a number, it's interpreted as an index.

See also: /docs/help/edit.md, replace, apply.

`sort`

Sort by one or more columns: lexicographically, numerically, naturally (item1, item2, item10), with reverse and unique modes, or randomly.

Example: top-10 cities by population (used in Getting Started)

qsv search --select Country '^us$' wcp.csv \
  | qsv sort --select Population --numeric --reverse \
  | qsv slice --len 10

Example: natural sort on parcel keys like BBL-1-100-12

qsv sort --select BBL --natural parcels.csv > sorted_parcels.csv

Example: random shuffle with a seed (reproducible)

qsv sort --random --seed 42 wcp.csv > shuffled.csv

Example: sort then dedup in a single pipeline

qsv sort --select Email contacts.csv | qsv dedup --select Email --sorted > unique_contacts.csv

For files > RAM, use extsort.

See also: /docs/help/sort.md, sortcheck, extsort.

`sortcheck`

Verify a CSV is sorted (cheap precondition for dedup --sorted and the --sorted flag on implode).

Example: gate dedup --sorted on a precondition

qsv sortcheck --numeric file.csv && qsv dedup --numeric --sorted file.csv

Example: emit JSON sort stats for CI

qsv sortcheck --json --select id file.csv

See also: /docs/help/sortcheck.md, sort, dedup.

`reverse`

Reverse row order. With an index, runs in constant memory. Unlike sort --reverse, this preserves the order of rows with the same key (it just flips the file).

Example: reverse a date-sorted log to put newest events first

qsv index events.csv
qsv reverse events.csv > events_newest_first.csv

See also: /docs/help/reverse.md, sort --reverse, index.

Transform and Reshape

Transform & Reshape

Quick decision table

input

safenames

rename

apply

applydp

datefmt

replace

pseudo

fill

enum

explode

implode

transpose

cat

fixlengths

fmt

behead

edit

sort

sortcheck

reverse

See also

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Get Started

`input`

`safenames`

`rename`

`apply`

`applydp`

`datefmt`

`replace`

`pseudo`

`fill`

`enum`

`explode`

`implode`

`transpose`

`cat`

`fixlengths`

`fmt`

`behead`

`edit`

`sort`

`sortcheck`

`reverse`