Release v1.1.0 - Major Feature Release: 23 New Data Processing Commands · datacoon/undatum

🎉 Major Feature Release

This release adds 23 new data processing commands across three phases, along with major improvements to schema generation, statistics performance, and database ingestion.

✨ New Commands

Phase 1 - Fundamental Data Processing (7 commands):

count - Count rows with DuckDB optimization
table - Pretty-print data as aligned table
head - Extract first N rows
tail - Extract last N rows
enum - Add row numbers, UUIDs, or constants
reverse - Reverse row order
fixlengths - Normalize field counts
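To make the last two ideas concrete, here is a minimal plain-Python sketch of what normalizing field counts and adding row numbers mean for ragged rows. It is not undatum's implementation: the function names `fixlengths` and `enumerate_rows`, their parameters, and the choice to pad to the longest row are all assumptions made for illustration.

```python
from itertools import count


def fixlengths(rows, length=None, fill=""):
    """Pad or truncate every row to the same number of fields."""
    rows = list(rows)
    # Assumption for this sketch: with no explicit length, use the longest row.
    target = length if length is not None else max((len(r) for r in rows), default=0)
    # A negative pad count yields an empty list, so long rows are only truncated.
    return [list(r[:target]) + [fill] * (target - len(r)) for r in rows]


def enumerate_rows(rows, start=1):
    """Prepend a running row number to each row."""
    return [[n, *row] for n, row in zip(count(start), rows)]


ragged = [["a", "b", "c"], ["d"], ["e", "f"]]
fixed = fixlengths(ragged)
numbered = enumerate_rows(fixed)
```

Passing an explicit `length` truncates longer rows instead of padding to the widest one.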

Phase 2 - Data Cleaning & Transformation (9 commands):

sort - Sort by columns with numeric/descending options
sample - Random sampling (fixed count or percentage)
search - Regex-based search and filtering
dedup - Remove duplicates with key-field options
fill - Fill empty/null values with strategies
rename - Rename fields by mapping or regex
explode - Split columns by separator
replace - String replacement with regex support
cat - Concatenate files by rows or columns
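As an illustration of deduplication with key fields, the sketch below keeps the first record seen for each key combination. It is a plain-Python approximation, not undatum's code: the `dedup` function, its `keys` parameter, and the keep-first policy are assumptions.

```python
def dedup(records, keys=None):
    """Return records with duplicates removed, keeping the first occurrence.

    keys: field names that define identity; None means compare all fields.
    Field values must be hashable in this sketch.
    """
    seen = set()
    unique = []
    for rec in records:
        fields = keys if keys is not None else sorted(rec)
        marker = tuple((k, rec.get(k)) for k in fields)
        if marker not in seen:
            seen.add(marker)
            unique.append(rec)
    return unique


people = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "a@example.com"},
    {"id": 1, "email": "a@example.com"},
]
by_all_fields = dedup(people)
by_email = dedup(people, keys=["email"])
```

Comparing all fields removes only exact repeats, while restricting the comparison to `email` collapses every record that shares an address.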

Phase 3 - Advanced Data Processing (7 commands):

join - Relational joins (inner, left, right, full outer)
diff - Compare files and show differences
exclude - Remove rows based on keys
transpose - Swap rows and columns
sniff - Detect file properties
slice - Extract rows by range or index
fmt - Reformat CSV with formatting options
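The four join types differ only in which unmatched rows survive. The following sketch shows that difference with a small hash join over lists of dictionaries. It is a conceptual illustration, not the join command's implementation: the `join` function, the `how` values, and the decision to leave missing fields absent (rather than filling them with nulls) are assumptions.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dicts on a shared key field.

    how: "inner", "left", "right", or "full".
    """
    # Index the right side by key so each left row is matched in O(1).
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)

    out = []
    matched = set()
    for l in left:
        matches = index.get(l[key], [])
        if matches:
            matched.add(l[key])
            # On a field-name clash, the left value wins in this sketch.
            out.extend({**r, **l} for r in matches)
        elif how in ("left", "full"):
            out.append(dict(l))  # unmatched left row

    if how in ("right", "full"):
        for k, rows in index.items():
            if k not in matched:
                out.extend(dict(r) for r in rows)  # unmatched right rows
    return out


users = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
cities = [{"id": 2, "city": "x"}, {"id": 3, "city": "y"}]
inner = join(users, cities, "id")
full = join(users, cities, "id", how="full")
```

With this data, the inner join keeps only `id` 2, while the full outer join also keeps `id` 1 from the left side and `id` 3 from the right.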

🚀 Performance Improvements

Stats Command: 10-100x faster with DuckDB engine for CSV, JSONL, JSON, and Parquet files
DuckDB Integration: Automatic engine selection for optimal performance
Batch Operations: Improved performance with write_bulk() for large datasets

📋 Schema Improvements

Format Exports: Support for JSON Schema, Avro, Parquet, and Cerberus formats
Full Output Support: Text, JSON, and YAML output formats now work correctly
AI Documentation: Working AI-powered field descriptions with provider selection
Record Counting: Statistics now include record counts in schema output

🗄️ Database Ingestion

MySQL Support: Auto-create table, upsert, and batch operations
SQLite Support: File and in-memory databases with PRAGMA optimizations
Improved Performance: Better support for PostgreSQL, DuckDB, MongoDB, and Elasticsearch
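The combination of auto-created tables, upserts, and batched writes can be sketched with Python's standard `sqlite3` module. This is a rough illustration of the pattern, not undatum's ingester: the `ingest` function, its parameters, and the specific PRAGMA settings are assumptions, and the release notes do not say which PRAGMAs are actually applied. The sketch also assumes every record has the same fields and at least one non-key field.

```python
import sqlite3


def ingest(conn, table, records, key, batch_size=1000):
    """Create the table if needed, then upsert records in batches."""
    records = list(records)
    if not records:
        return 0
    columns = list(records[0])
    col_defs = ", ".join(
        f'"{c}" PRIMARY KEY' if c == key else f'"{c}"' for c in columns
    )
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')

    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    updates = ", ".join(f'"{c}" = excluded."{c}"' for c in columns if c != key)
    # Upsert: insert new keys, overwrite the other fields of existing keys.
    sql = (
        f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders}) '
        f'ON CONFLICT("{key}") DO UPDATE SET {updates}'
    )
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        conn.executemany(sql, [tuple(r[c] for c in columns) for r in batch])
    conn.commit()
    return len(records)


conn = sqlite3.connect(":memory:")  # in-memory database
# Example bulk-load settings (assumed): trade durability for write speed.
conn.execute("PRAGMA synchronous = OFF")
conn.execute("PRAGMA journal_mode = MEMORY")

ingest(conn, "people", [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}], key="id")
ingest(conn, "people", [{"id": 2, "name": "B"}], key="id")
rows = conn.execute('SELECT id, name FROM people ORDER BY id').fetchall()
```

The second call updates the existing row for `id` 2 instead of failing on the primary key, which is the behavior an upsert is meant to provide.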

🔄 Migration & Deprecations

Iterabledata Migration: All commands now use the external iterabledata library
Resource Management: Improved cleanup with try/finally blocks
Deprecated: Local IterableData and DataWriter classes (use open_iterable() instead)
Deprecated: scheme command (use schema --format cerberus instead)
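The try/finally cleanup mentioned above follows a simple pattern: the source is closed whether processing succeeds or raises. The sketch below shows that pattern with a hypothetical `Source` class standing in for a data source handle. It does not use the iterabledata API, and `count_lines` and `fail_after` exist only to demonstrate the failure path.

```python
import io


class Source:
    """Hypothetical stand-in for a data source handle (not the iterabledata API)."""

    def __init__(self, text):
        self._buffer = io.StringIO(text)
        self.closed = False

    def __iter__(self):
        return iter(self._buffer)

    def close(self):
        self._buffer.close()
        self.closed = True


def count_lines(source, fail_after=None):
    """Count lines, always closing the source, even when processing fails."""
    n = 0
    try:
        for _ in source:
            n += 1
            if fail_after is not None and n >= fail_after:
                raise ValueError("simulated processing error")
    finally:
        source.close()  # runs on success and on error alike
    return n


ok = Source("a\nb\nc\n")
total = count_lines(ok)

broken = Source("a\nb\nc\n")
try:
    count_lines(broken, fail_after=2)
except ValueError:
    pass
```

After both calls, `ok.closed` and `broken.closed` are true, so the handle is released on the error path as well.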

🐛 Bug Fixes

Fixed resource leaks in statistics, textproc, and ingester commands
Fixed schema command output format options being ignored
Fixed schema command AI documentation not working
Fixed missing record counting in schema output

Full Changelog: v1.0.18...v1.1.0
