v1.1.0 - Major Feature Release: 23 New Data Processing Commands
Major Feature Release
This release adds 23 new data processing commands across three phases, along with major improvements to schema generation, statistics performance, and database ingestion.
New Commands
Phase 1 - Fundamental Data Processing (7 commands):
- count - Count rows with DuckDB optimization
- table - Pretty-print data as an aligned table
- head - Extract the first N rows
- tail - Extract the last N rows
- enum - Add row numbers, UUIDs, or constants
- reverse - Reverse row order
- fixlengths - Normalize field counts
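To make the row-level behavior of several Phase 1 commands concrete, here is a minimal pure-Python sketch over CSV rows. It is an illustration, not the tool's implementation. The function names are hypothetical, and two behaviors are assumptions: that `fixlengths` pads short rows with empty strings and truncates long rows to the header length, and that `enum` numbers rows starting at 1.

```python
import csv
import io
from collections import deque
from itertools import islice


def head(rows, n=10):
    """Return the first n rows without reading past them."""
    return list(islice(rows, n))


def tail(rows, n=10):
    """Return the last n rows; a bounded deque keeps memory at O(n)."""
    return list(deque(rows, maxlen=n))


def reverse(rows):
    """Return the rows in reverse order (needs the whole input in memory)."""
    return list(reversed(list(rows)))


def enum_rows(rows, start=1):
    """Prepend a row number to each row (hypothetical 1-based numbering)."""
    return [[str(i)] + row for i, row in enumerate(rows, start)]


def fixlengths(rows, length):
    """Pad short rows with empty strings and truncate long rows to `length` fields."""
    return [(row + [""] * length)[:length] for row in rows]


data = "a,b,c\n1,2\n3,4,5,6\n7,8,9\n"
header, *body = list(csv.reader(io.StringIO(data)))
fixed = fixlengths(body, len(header))
print(fixed)                # [['1', '2', ''], ['3', '4', '5'], ['7', '8', '9']]
print(tail(fixed, 1))       # [['7', '8', '9']]
print(enum_rows(fixed)[0])  # ['1', '1', '2', '']
```

The sketch uses a deque for `tail` so that a single streaming pass never holds more than N rows. `reverse` has no such shortcut, because it cannot emit anything until the last row has been read.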
Phase 2 - Data Cleaning & Transformation (9 commands):
- sort - Sort by columns with numeric/descending options
- sample - Random sampling (fixed count or percentage)
- search - Regex-based search and filtering
- dedup - Remove duplicates with key-field options
- fill - Fill empty/null values with strategies
- rename - Rename fields by mapping or regex
- explode - Split columns by separator
- replace - String replacement with regex support
- cat - Concatenate files by rows or columns
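The cleaning steps above compose naturally into a pipeline. The following is a hedged pure-Python sketch of three of them over dict rows. The helper names and parameters are hypothetical, and several behaviors are assumptions rather than documented behavior of the commands:

- `dedup` keeps the first occurrence per key.
- The "forward" fill strategy copies the last non-empty value downward.
- `explode` emits one row per separated value.

```python
def dedup(rows, keys=None):
    """Keep the first row for each distinct key; compare whole rows if keys is None."""
    seen = set()
    out = []
    for row in rows:
        k = tuple(row.get(f) for f in keys) if keys else tuple(sorted(row.items()))
        if k not in seen:
            seen.add(k)
            out.append(row)
    return out


def fill(rows, field, strategy="constant", value=""):
    """Fill empty or None values in `field`.

    'constant' writes `value`; 'forward' copies the last non-empty value seen above.
    """
    last = None
    out = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not mutated
        current = row.get(field)
        if current in (None, ""):
            if strategy == "forward":
                if last is not None:
                    row[field] = last
            else:
                row[field] = value
        else:
            last = current
        out.append(row)
    return out


def explode(rows, field, sep=","):
    """Emit one output row per `sep`-separated value of `field`."""
    return [{**row, field: part} for row in rows for part in str(row[field]).split(sep)]


rows = [
    {"id": "1", "tags": "a,b", "city": "Oslo"},
    {"id": "1", "tags": "a,b", "city": "Oslo"},  # exact duplicate
    {"id": "2", "tags": "c", "city": ""},        # missing city
]
cleaned = explode(fill(dedup(rows), "city", strategy="forward"), "tags")
for row in cleaned:
    print(row)
```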
Phase 3 - Advanced Data Processing (7 commands):
- join - Relational joins (inner, left, right, full outer)
- diff - Compare files and show differences
- exclude - Remove rows based on keys
- transpose - Swap rows and columns
- sniff - Detect file properties
- slice - Extract rows by range or index
- fmt - Reformat CSV with formatting options
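The four join types listed for `join` differ only in which unmatched rows they keep. The following minimal hash-join sketch, which also includes a one-line transpose, shows that difference. It assumes dict rows and a single key field. The functions are illustrative, not the command's actual code or options.

```python
def join(left, right, key, how="inner"):
    """Hash join two lists of dict rows on `key`.

    how: "inner", "left", "right", or "full" (full outer). Fields missing on the
    unmatched side are filled with None; on overlapping non-key fields the right
    row's value wins.
    """
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unsupported join type: {how!r}")
    index = {}  # build side: key value -> positions of matching right rows
    for i, rrow in enumerate(right):
        index.setdefault(rrow[key], []).append(i)
    left_fields = {f for row in left for f in row}
    right_fields = {f for row in right for f in row}
    out, matched = [], set()
    for lrow in left:  # probe side
        hits = index.get(lrow[key], [])
        for i in hits:
            out.append({**lrow, **right[i]})
            matched.add(i)
        if not hits and how in ("left", "full"):
            out.append({**dict.fromkeys(right_fields), **lrow})
    if how in ("right", "full"):
        for i, rrow in enumerate(right):
            if i not in matched:
                out.append({**dict.fromkeys(left_fields), **rrow})
    return out


def transpose(rows):
    """Swap rows and columns of a rectangular list-of-lists table."""
    return [list(col) for col in zip(*rows)]


people = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 1, "total": 10}, {"id": 3, "total": 5}]
for how in ("inner", "left", "right", "full"):
    print(how, join(people, orders, "id", how))
print(transpose([["a", "b"], ["1", "2"]]))  # [['a', '1'], ['b', '2']]
```

Building the index on one side keeps the join at roughly O(len(left) + len(right)) rather than comparing every pair. Note that `zip` silently truncates ragged input, which is why `transpose` is documented here as requiring a rectangular table.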
Performance Improvements
- Stats Command: 10-100x faster with the DuckDB engine for CSV, JSONL, JSON, and Parquet files
- DuckDB Integration: Automatic engine selection for optimal performance
- Batch Operations: Improved performance for large datasets with write_bulk()
Schema Improvements
- Format Exports: Support for JSON Schema, Avro, Parquet, and Cerberus formats
- Full Output Support: Text, JSON, and YAML output formats now work correctly
- AI Documentation: AI-powered field descriptions now work, with provider selection
- Record Counting: Statistics now include record counts in schema output
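As a rough picture of what JSON Schema export with record counting involves, here is a hedged stdlib sketch. It infers a JSON Schema type per field from sample records and counts records in the same pass. The helper names are hypothetical. The release notes do not say how the tool merges mixed types or where it places the count, so this sketch simply lists every observed type and returns the count alongside the schema.

```python
import json

# Order matters: bool must be checked before int because bool subclasses int.
_TYPE_NAMES = [
    (bool, "boolean"),
    (int, "integer"),
    (float, "number"),
    (str, "string"),
    (list, "array"),
    (dict, "object"),
    (type(None), "null"),
]


def json_type(value):
    """Map a Python value to a JSON Schema type name."""
    for py_type, name in _TYPE_NAMES:
        if isinstance(value, py_type):
            return name
    return "string"  # fallback for anything else (e.g. dates)


def infer_schema(records):
    """Return (schema, record_count) for an iterable of dict records."""
    types = {}
    count = 0
    for rec in records:
        count += 1
        for field, value in rec.items():
            types.setdefault(field, set()).add(json_type(value))
    properties = {}
    for field, seen in types.items():
        names = sorted(seen)
        properties[field] = {"type": names[0] if len(names) == 1 else names}
    schema = {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": properties,
    }
    return schema, count


records = [
    {"id": 1, "name": "Ann", "score": 9.5},
    {"id": 2, "name": None, "score": 7},
]
schema, n = infer_schema(records)
print(n)
print(json.dumps(schema, indent=2))
```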
Database Ingestion
- MySQL Support: Automatic table creation, upserts, and batch operations
- SQLite Support: File and in-memory databases with PRAGMA optimizations
- Improved Performance: Better support for PostgreSQL, DuckDB, MongoDB, and Elasticsearch
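The ingestion bullets mention auto-created tables, upserts, batch operations, and SQLite PRAGMA tuning. The following stdlib-only sketch against an in-memory SQLite database shows how those pieces fit together. It is a hedged illustration with several assumptions:

- `ingest` and `batched` are hypothetical helpers.
- The table is created untyped from the first record's fields.
- The two PRAGMA settings are common speed-oriented choices, not necessarily the ones this release applies.

`ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer. MySQL would express the same upsert with `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3
from itertools import chain, islice


def batched(iterable, size):
    """Yield lists of up to `size` items (itertools.batched needs Python 3.12+)."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def ingest(conn, table, records, key, batch_size=1000):
    """Auto-create `table` from the first record's fields, then upsert in batches.

    Fields that appear only in later records are ignored in this sketch.
    """
    records = iter(records)
    first = next(records, None)
    if first is None:
        return 0
    fields = list(first)
    cols = ", ".join(f'"{f}"' for f in fields)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols}, PRIMARY KEY ("{key}"))')
    placeholders = ", ".join("?" for _ in fields)
    updates = ", ".join(f'"{f}" = excluded."{f}"' for f in fields if f != key)
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    sql = (f'INSERT INTO "{table}" ({cols}) VALUES ({placeholders}) '
           f'ON CONFLICT ("{key}") {action}')
    total = 0
    with conn:  # one transaction: commit on success, roll back on error
        for chunk in batched(chain([first], records), batch_size):
            conn.executemany(sql, [tuple(r.get(f) for f in fields) for r in chunk])
            total += len(chunk)
    return total


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA journal_mode = MEMORY")  # keep the rollback journal in RAM
conn.execute("PRAGMA synchronous = OFF")      # skip fsync; trades durability for speed
rows = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 1, "name": "Ann B."}]
n = ingest(conn, "people", rows, key="id", batch_size=2)
result = conn.execute('SELECT id, name FROM "people" ORDER BY id').fetchall()
print(n, result)  # 3 [(1, 'Ann B.'), (2, 'Bob')]
```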
Migration & Deprecations
- Iterabledata Migration: All commands now use the external iterabledata library
- Resource Management: Improved cleanup with try/finally blocks
- Deprecated: Local IterableData and DataWriter classes (use open_iterable() instead)
- Deprecated: scheme command (use schema --format cerberus instead)
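The try/finally cleanup mentioned under Resource Management ensures that a source is closed even when processing fails partway through. Here is a minimal sketch of the pattern. `TrackedSource` is a hypothetical stand-in for any object that needs an explicit `close()`. Whether the objects returned by open_iterable() are used exactly this way is an assumption.

```python
class TrackedSource:
    """Hypothetical data source that must be closed explicitly."""

    def __init__(self, rows):
        self._rows = rows
        self.closed = False

    def __iter__(self):
        return iter(self._rows)

    def close(self):
        self.closed = True


def count_with_id(source):
    """Count rows that have a non-None 'id'; close the source even if this raises."""
    try:
        return sum(1 for row in source if row["id"] is not None)
    finally:
        source.close()  # runs on normal return and on exception


good = TrackedSource([{"id": 1}, {"id": None}, {"id": 3}])
print(count_with_id(good), good.closed)  # 2 True

bad = TrackedSource([{"id": 1}, {}])  # second row lacks 'id' -> KeyError
try:
    count_with_id(bad)
except KeyError:
    pass
print(bad.closed)  # True: cleanup still happened
```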
Bug Fixes
- Fixed resource leaks in statistics, textproc, and ingester commands
- Fixed schema command output format options being ignored
- Fixed schema command AI documentation not working
- Fixed missing record counting in schema output
Full Changelog: v1.0.18...v1.1.0