GitHub - extua/wacksy: An experimental library for reading and writing WACZ files

What needs to go into a WACZ file, according to the example in the spec:

archive
└── data.warc.gz
datapackage.json
datapackage-digest.json
indexes
└── index.cdx.gz
pages
└── pages.jsonl
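As a quick check on that layout, the sketch below lists the paths from the spec's example and reports which ones are missing from a given set of zip entries. `EXAMPLE_LAYOUT` and `missing_entries` are hypothetical names and are not part of wacksy. The list mirrors the example only, not the spec's actual required and optional rules. A real WACZ may, for instance, hold several WARCs under `archive/`.

```rust
/// Paths taken from the example layout in the WACZ spec.
/// This mirrors the example, not the spec's full required/optional rules.
const EXAMPLE_LAYOUT: [&str; 5] = [
    "archive/data.warc.gz",
    "datapackage.json",
    "datapackage-digest.json",
    "indexes/index.cdx.gz",
    "pages/pages.jsonl",
];

/// Return every path from the example layout that is absent from `entries`.
fn missing_entries(entries: &[&str]) -> Vec<&'static str> {
    EXAMPLE_LAYOUT
        .iter()
        .copied()
        .filter(|path| !entries.contains(path))
        .collect()
}

fn main() {
    // Hypothetical zip entry list with two files missing.
    let entries = ["archive/data.warc.gz", "datapackage.json", "pages/pages.jsonl"];
    println!("{:?}", missing_entries(&entries));
}
```

Running `main` prints the two absent paths, `datapackage-digest.json` and `indexes/index.cdx.gz`.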

Operations chart

Broadly, the steps are: read the WARC file, create an index, then create a datapackage (in that order), and finally convert everything to bytes and zip it up.
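Creating an index requires knowing where each WARC record starts and how long it is. The sketch below uses only the standard library to collect those values, under one stated assumption: the WARC is *uncompressed*. For a `data.warc.gz`, each record is usually its own gzip member, and index offsets refer to the compressed bytes, which this sketch does not handle. `IndexEntry`, `find` and `scan_warc` are hypothetical names and are not wacksy's API.

```rust
/// One hypothetical index entry: which record, where it starts, how long it is.
#[derive(Debug, PartialEq)]
struct IndexEntry {
    target_uri: Option<String>,
    date: Option<String>,
    offset: usize,
    length: usize,
}

/// Find the first occurrence of `needle` in `haystack` at or after `start`.
fn find(haystack: &[u8], needle: &[u8], start: usize) -> Option<usize> {
    haystack[start..]
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|p| p + start)
}

/// Walk an *uncompressed* WARC buffer record by record (assumption: no gzip).
fn scan_warc(data: &[u8]) -> Result<Vec<IndexEntry>, String> {
    let mut entries = Vec::new();
    let mut offset = 0;
    while offset < data.len() {
        // The header block ends at the first blank line (CRLF CRLF).
        let header_end = find(data, b"\r\n\r\n", offset).ok_or("unterminated WARC header")?;
        let header_text =
            std::str::from_utf8(&data[offset..header_end]).map_err(|e| e.to_string())?;
        let mut lines = header_text.split("\r\n");
        if !lines.next().unwrap_or("").starts_with("WARC/") {
            return Err(format!("no WARC version line at offset {offset}"));
        }
        let (mut content_length, mut target_uri, mut date) = (None, None, None);
        for line in lines {
            if let Some((name, value)) = line.split_once(':') {
                let value = value.trim().to_string();
                // WARC field names are case-insensitive.
                match name.trim().to_ascii_lowercase().as_str() {
                    "content-length" => content_length = value.parse::<usize>().ok(),
                    "warc-target-uri" => target_uri = Some(value),
                    "warc-date" => date = Some(value),
                    _ => {}
                }
            }
        }
        let content_length = content_length.ok_or("missing or invalid Content-Length")?;
        // The content block follows the blank line; two CRLFs close the record.
        let block_end = header_end + 4 + content_length;
        if block_end > data.len() || !data[block_end..].starts_with(b"\r\n\r\n") {
            return Err(format!("truncated or malformed record at offset {offset}"));
        }
        let record_end = block_end + 4;
        entries.push(IndexEntry { target_uri, date, offset, length: record_end - offset });
        offset = record_end;
    }
    Ok(entries)
}

fn main() {
    let warc = b"WARC/1.1\r\nWARC-Type: resource\r\nWARC-Target-URI: http://example.com/\r\nWARC-Date: 2024-01-01T00:00:00Z\r\nContent-Length: 5\r\n\r\nhello\r\n\r\n";
    for entry in scan_warc(warc).unwrap() {
        println!("{entry:?}");
    }
}
```

A real index line would also carry a SURT-formatted URL and a timestamp derived from `WARC-Date`. Those transformations are left out here so the example stays focused on the offset and length bookkeeping.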

flowchart
    A@{ shape: lean-r, label: "WARC file"}
    B@{ shape: rect, label: "Create index" }
    C@{ shape: rect, label: "Create datapackage" }
    D@{ shape: rect, label: "Create datapackage digest" }
    E1@{ shape: lean-l, label: "Convert index to bytes" }
    E2@{ shape: lean-l, label: "Convert to bytes" }
    F@{ shape: lean-l, label: "Zip up the files" }
    G@{ shape: lean-r, label: "WACZ file"}
    A --> index
    subgraph index
    B --> E1
    end
    index --> datapackage
    subgraph datapackage
    C --> E2 --> D --> E2
    style index stroke-dasharray: 5 5
    end
    A --> F
    index --> F
    datapackage --> F --> G
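The chart's ordering matters because the datapackage digest is computed over the datapackage *bytes*. The datapackage must therefore be serialised before it can be digested. The sketch below assembles the zip entries in that order. It rests on several assumptions:

* `Entry`, `json_string` and `build_entries` are hypothetical names.
* The JSON fields (`resources` entries with `path`, `hash` and `bytes`, and a digest file with `path` and `hash`) follow my reading of the WACZ spec. Other top-level datapackage metadata is omitted.
* The hash function is passed in because the Rust standard library has no SHA-256. In practice you would plug in a crate such as `sha2`, which the README does not mention.
* Pages are left out, as they are in the chart, and the zipping step itself is not shown.

```rust
/// A file destined for the WACZ zip: its path inside the archive and its bytes.
struct Entry {
    path: String,
    bytes: Vec<u8>,
}

/// Minimal JSON string encoder (quotes, backslashes, control characters).
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Follow the chart: WARC and index bytes, then datapackage, then its digest.
fn build_entries(warc: Vec<u8>, index: Vec<u8>, sha256_hex: impl Fn(&[u8]) -> String) -> Vec<Entry> {
    let mut entries = vec![
        Entry { path: "archive/data.warc.gz".into(), bytes: warc },
        Entry { path: "indexes/index.cdx.gz".into(), bytes: index },
    ];
    // One datapackage resource per file, recording its hash and size.
    let resources: Vec<String> = entries
        .iter()
        .map(|e| {
            format!(
                "{{\"path\":{},\"hash\":{},\"bytes\":{}}}",
                json_string(&e.path),
                json_string(&format!("sha256:{}", sha256_hex(&e.bytes))),
                e.bytes.len()
            )
        })
        .collect();
    // Convert the datapackage to bytes first: the digest is taken over these bytes.
    let datapackage = format!("{{\"resources\":[{}]}}", resources.join(",")).into_bytes();
    let digest = format!(
        "{{\"path\":\"datapackage.json\",\"hash\":{}}}",
        json_string(&format!("sha256:{}", sha256_hex(&datapackage)))
    );
    entries.push(Entry { path: "datapackage.json".into(), bytes: datapackage });
    entries.push(Entry { path: "datapackage-digest.json".into(), bytes: digest.into_bytes() });
    entries
}

fn main() {
    // Placeholder for illustration only: this is NOT SHA-256.
    let fake_hash = |b: &[u8]| format!("{:016x}", b.len());
    for entry in build_entries(b"warc bytes".to_vec(), b"index bytes".to_vec(), fake_hash) {
        println!("{} ({} bytes)", entry.path, entry.bytes.len());
    }
}
```

Passing the hasher in as a closure keeps the ordering logic independent of any particular crypto crate.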
