asterisk-labs/cozip

cozip


Open a ZIP like a table. Still a ZIP, now queryable.

cozip glues a Parquet manifest onto an ordinary ZIP and drops a tiny fixed index at byte 0 that points to it. Fetch the index, fetch the manifest, query it locally, then range-request just the bytes you actually want. A 20 GB archive becomes a queryable dataset in two reads.
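The last step above, range-requesting only the bytes you want, can be sketched with the standard library alone. This is a hedged illustration, not cozip's implementation: `range_header` and `fetch_range` are hypothetical helper names, and the offsets and lengths you would pass in come from the index and manifest as defined in SPEC.md, which is not reproduced here.

```python
import urllib.request


def range_header(offset: int, length: int) -> dict:
    """Build an HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def fetch_range(url: str, offset: int, length: int) -> bytes:
    """Fetch one byte range. Assumes the server honours Range requests."""
    req = urllib.request.Request(url, headers=range_header(offset, length))
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the server returned only the slice asked for.
        if resp.status != 206:
            raise RuntimeError(f"server ignored Range, status {resp.status}")
        return resp.read()
```

Reading one file out of a large remote archive then costs a single call to `fetch_range(url, offset, length)` with values looked up in the manifest, rather than a full download.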

[Figure: how cozip works]

It works because nothing about the ZIP changes. unzip works. zipfile.ZipFile works. Your OS preview pane works. The manifest is just the first entry, and any conforming ZIP reader walks right past it.
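As a rough illustration of why an extra first entry is harmless, the sketch below builds a plain ZIP in memory with Python's standard `zipfile` module and reads it back. It does not use cozip and does not reproduce the byte-0 index. The entry name `manifest.parquet` and all file contents are placeholders assumed for the example.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Hypothetical manifest entry, written first. Name and bytes are placeholders.
    zf.writestr("manifest.parquet", b"placeholder manifest bytes")
    zf.writestr("tile_001.tif", b"fake tile 1")
    zf.writestr("tile_002.tif", b"fake tile 2")

buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    # A standard reader lists every entry and can extract any of them by name.
    names = zf.namelist()
    tile = zf.read("tile_001.tif")

print(names)
print(tile)
```

A reader that knows nothing about the manifest simply sees one more file in the listing and can ignore it.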

Example

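import cozip
import pyarrow as pa
import pyarrow.compute as pc  # import explicitly; `pa.compute` is not guaranteed to be loaded

# One row per file. Columns other than path and name ride along into the manifest.
table = pa.table({
    "path":  ["local/tile_001.tif", "local/tile_002.tif", "local/tile_003.tif"],
    "name":  ["tile_001.tif", "tile_002.tif", "tile_003.tif"],
    "split": ["train", "val", "train"],
    "label": ["cloud", "water", "forest"],
})
cozip.write("dataset.zip", table)

# Read the manifest from the remote archive, then filter it locally.
manifest = cozip.read("https://example.com/dataset.zip")
train = manifest.filter(pc.equal(manifest["split"], "train"))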

path says where each file lives on disk. name is how it shows up inside the archive. Every other column rides along into the manifest and becomes queryable on read. R and Julia expose the same API; see their READMEs.

Bindings

| Language | Install | Docs |
| --- | --- | --- |
| Python | pip install cozip | python/ |
| R | install.packages("cozip", repos = "https://asterisk-labs.r-universe.dev") | r/ |
| Julia | Pkg.Registry.add("https://github.com/asterisk-labs/AsteriskRegistry"); Pkg.add("Cozip") | julia/ |

Every binding wraps the same C core. A cozip written by R reads byte for byte identically in Julia, in Python, in C.

Spec

See SPEC.md. The format is short and stable. Any conforming reader handles any conforming writer.

License

MIT.


Made with ♥ by

Asterisk Labs
