xsv has proven to be an indispensable part of our data integration work. We've used it in several projects, large and small, and it has made data wrangling much easier. We've used a bevy of tools before, both open-source and commercial (mainly OpenRefine, Trifacta née Data Wrangler, a library of Python scripts, even sponsoring an open-source project), and nothing approached the speed and convenience of xsv.
In early 2021, there were several pending PRs we were interested in, as well as features we wanted to contribute ourselves.
As it happens, there was a maintainership discussion on GitHub at the time, where @BurntSushi, xsv's original author (of ripgrep fame and a prolific Rust contributor), suggested that "if folks want to carry on in a fork, that might be the best path forward. I might request giving the project a different name though, because I do at least intend to at some point breath life back into xsv."
And that's how qsv came to be. Itch scratched.
Q stands for Quick (written in Rust), Queries (with joins and regular expressions!), Querl (sounds like curl), and Quartiles (check out stats!). qsv can handle large Quantities of data (most of its commands do not need to load the entire CSV into memory and can deal with very large files) and Quickly improve data Quality, leading to a Quantum increase in your productivity!
Also, my middle name is Queaño... 😁
We've worked with a lot of jurisdictions on their open data efforts using CKAN, and we've seen how big a problem data wrangling is. We see qsv as an integral part of our data pipelines, one that will dramatically lower the barrier to publishing high-quality data.
From screening for PII, to slicing data into manageable, logical partitions, to geocoding, to prepping and normalizing data from various IoT vendors, to automatically computing data dictionaries, qsv can be the "data wrangling duct tape" that allows these jurisdictions to compose robust data pipelines.
We've also worked with the private sector, and we see the same issues...