An experimental library for reading and writing WACZ files

extua/wacksy

What needs to go into a WACZ file, according to the example in the spec:

archive
└── data.warc.gz
datapackage.json
datapackage-digest.json
indexes
└── index.cdx.gz
pages
└── pages.jsonl
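
As a rough illustration of the layout above, the following Python sketch checks whether a zip archive contains each of the listed paths. It is not part of this library: the names `EXPECTED_MEMBERS` and `missing_members` are hypothetical, and the check only looks at member names, not at their contents.

```python
import io
import zipfile

# Member paths copied from the layout listed above.
EXPECTED_MEMBERS = [
    "archive/data.warc.gz",
    "datapackage.json",
    "datapackage-digest.json",
    "indexes/index.cdx.gz",
    "pages/pages.jsonl",
]


def missing_members(wacz_bytes: bytes) -> list[str]:
    """Return the expected paths that are absent from the zip archive."""
    with zipfile.ZipFile(io.BytesIO(wacz_bytes)) as zf:
        present = set(zf.namelist())
    return [name for name in EXPECTED_MEMBERS if name not in present]
```

An empty list means every path from the example layout is present; anything else names the files still to be written.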

Operations chart

Broadly, what needs to be done: read the WARC file, create an index and then a datapackage (in that order), convert everything to bytes, and zip it all up.

flowchart
    A@{ shape: lean-r, label: "WARC file"}
    B@{ shape: rect, label: "Create index" }
    C@{ shape: rect, label: "Create datapackage" }
    D@{ shape: rect, label: "Create datapackage digest" }
    E1@{ shape: lean-l, label: "Convert index to bytes" }
    E2@{ shape: lean-l, label: "Convert to bytes" }
    F@{ shape: lean-l, label: "Zip up the files" }
    G@{ shape: lean-r, label: "WACZ file"}
    A --> index
    subgraph index
    B --> E1
    end
    index --> datapackage
    subgraph datapackage
    C --> E2 --> D --> E2
    style index stroke-dasharray: 5 5
    end
    A --> F
    index --> F
    datapackage --> F --> G
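
The order of operations in the chart can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not this library's implementation: `build_wacz` and `sha256_tag` are hypothetical names, the index is taken as already-built bytes (creating it requires a WARC parser, which is out of scope here), and the JSON field names (`profile`, `resources`, `name`, `path`, `hash`, `bytes`) reflect one reading of the WACZ spec and should be checked against it.

```python
import hashlib
import io
import json
import zipfile


def sha256_tag(data: bytes) -> str:
    """Hash bytes and prefix the algorithm name (format assumed from the spec)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_wacz(warc_gz: bytes, index_gz: bytes) -> bytes:
    """Assemble WACZ bytes from WARC bytes and pre-built index bytes."""
    files = {
        "archive/data.warc.gz": warc_gz,
        "indexes/index.cdx.gz": index_gz,
    }

    # Create the datapackage: one resource entry per file gathered so far.
    datapackage = {
        "profile": "data-package",
        "resources": [
            {
                "name": path.rsplit("/", 1)[-1],
                "path": path,
                "hash": sha256_tag(data),
                "bytes": len(data),
            }
            for path, data in files.items()
        ],
    }
    # Convert to bytes first: the digest is a hash of these exact bytes.
    datapackage_bytes = json.dumps(datapackage, indent=2).encode("utf-8")

    # Create the datapackage digest, then convert it to bytes as well.
    digest = {"path": "datapackage.json", "hash": sha256_tag(datapackage_bytes)}
    digest_bytes = json.dumps(digest, indent=2).encode("utf-8")

    files["datapackage.json"] = datapackage_bytes
    files["datapackage-digest.json"] = digest_bytes

    # Zip up the files. ZIP_STORED skips recompression, since the WARC and
    # the index are already gzipped.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as zf:
        for path, data in files.items():
            zf.writestr(path, data)
    return buffer.getvalue()
```

The datapackage has to be serialized before the digest is computed, which is why the chart loops through "Convert to bytes" twice: once for the datapackage and once for its digest.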