GitHub - asterisk-labs/cozip

Open a ZIP like a table. Still a ZIP, now queryable.

cozip glues a Parquet manifest onto an ordinary ZIP and drops a tiny fixed index at byte 0 that points to it. Fetch the index, fetch the manifest, query it locally, then range-request just the bytes you actually want. A 20 GB archive becomes a queryable dataset in two reads.
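The "range-request just the bytes you want" step is plain HTTP. Here is a minimal sketch using only the standard library. The helper names `range_header` and `fetch_range` are illustrative and not part of cozip, and the offsets you pass would come from the index and manifest as laid out in SPEC.md, which is not reproduced here.

```python
import urllib.request


def range_header(offset: int, length: int) -> str:
    """Build an HTTP Range header value for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={offset}-{offset + length - 1}"


def fetch_range(url: str, offset: int, length: int) -> bytes:
    """Fetch one byte range from a server that supports range requests."""
    request = urllib.request.Request(
        url, headers={"Range": range_header(offset, length)}
    )
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the server honored the Range header.
        if response.status != 206:
            raise RuntimeError(
                f"server did not honor the Range header (status {response.status})"
            )
        return response.read()
```

Any static host that honors `Range` headers should work with this pattern. A server that ignores the header returns the whole file with status 200, which is why the sketch checks for 206.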

It works because nothing about the ZIP changes. unzip works. zipfile.ZipFile works. Your OS preview pane works. The manifest is just the first entry, and any conforming ZIP reader walks right past it.
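To see why an extra leading entry is harmless to ordinary tools, here is a small sketch with the standard `zipfile` module. The archive below is a stand-in built in memory, not real cozip output: the entry name `manifest.parquet` and its contents are hypothetical, and the byte-0 index is not reproduced.

```python
import io
import zipfile

# Build a stand-in archive in memory. The first entry mimics a manifest;
# its name and contents are hypothetical, not what cozip actually writes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("manifest.parquet", b"stand-in manifest bytes")
    zf.writestr("tile_001.tif", b"tile one")
    zf.writestr("tile_002.tif", b"tile two")

# A plain ZIP reader lists every entry and reads the ones it cares about.
buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    tile = zf.read("tile_001.tif")

print(names)
print(tile)
```

The reader needs no knowledge of the first entry to extract the others. It is listed like any other file and can simply be skipped.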

Example

import cozip
import pyarrow as pa
# Import the compute submodule explicitly; a bare `import pyarrow`
# may not expose `pa.compute`.
import pyarrow.compute as pc

# One row per file to pack.
table = pa.table({
    "path":  ["local/tile_001.tif", "local/tile_002.tif", "local/tile_003.tif"],
    "name":  ["tile_001.tif", "tile_002.tif", "tile_003.tif"],
    "split": ["train", "val", "train"],
    "label": ["cloud", "water", "forest"],
})
cozip.write("dataset.zip", table)

# Read the manifest of a remote archive, then filter it locally.
manifest = cozip.read("https://example.com/dataset.zip")
train = manifest.filter(pc.equal(manifest["split"], "train"))

`path` says where each file lives on disk. `name` is how it shows up inside the archive. Every other column rides along into the manifest and becomes queryable on read. R and Julia have the same API; see their READMEs.

Bindings

| Language | Install | Docs |
| --- | --- | --- |
| Python | `pip install cozip` | python/ |
| R | `install.packages("cozip", repos = "https://asterisk-labs.r-universe.dev")` | r/ |
| Julia | `Pkg.Registry.add("https://github.com/asterisk-labs/AsteriskRegistry"); Pkg.add("Cozip")` | julia/ |

Every binding wraps the same C core. A cozip written by R reads byte for byte identically in Julia, in Python, in C.

Spec

See SPEC.md. The format is short and stable. Any conforming reader handles any conforming writer.

License

MIT.

