Release v0.0.1 · bodleian/wacksy

As of this release, the WACZ writer and indexer can produce (almost) everything needed to turn a WARC file into a fully spec-compliant WACZ file.
The last thing missing was the pages.jsonl file, which is now produced when reading through the WARC file as part of the indexer.
I want to avoid reading through the WARC twice to produce two files, so I have wrapped everything into one indexer. Again, there's probably a better way of doing this.
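
To illustrate the single-pass idea, here is a minimal sketch that fills both outputs in one loop. The `Record`, `IndexEntry` and `PageEntry` types and the `index_records` function are hypothetical names made up for this example, not the crate's real API, and the "a page is any HTML record" rule is an assumption.

```rust
/// Hypothetical, simplified view of one WARC record (not the crate's real type).
struct Record {
    url: String,
    timestamp: String,
    content_type: Option<String>,
    title: Option<String>,
}

/// One index entry: which resource was captured, and when.
#[derive(Debug, PartialEq)]
struct IndexEntry {
    url: String,
    timestamp: String,
}

/// One pages.jsonl entry.
#[derive(Debug, PartialEq)]
struct PageEntry {
    url: String,
    title: Option<String>,
}

/// Walk the records once, filling both outputs in the same loop.
fn index_records<I>(records: I) -> (Vec<IndexEntry>, Vec<PageEntry>)
where
    I: IntoIterator<Item = Record>,
{
    let mut index = Vec::new();
    let mut pages = Vec::new();
    for record in records {
        // Every record gets an index entry.
        index.push(IndexEntry {
            url: record.url.clone(),
            timestamp: record.timestamp,
        });
        // Assumption: a record counts as a page when its content type is HTML.
        let is_html = record
            .content_type
            .as_deref()
            .map_or(false, |ct| ct.starts_with("text/html"));
        if is_html {
            pages.push(PageEntry {
                url: record.url,
                title: record.title,
            });
        }
    }
    (index, pages)
}
```

Because the function takes any `IntoIterator`, the records can be streamed straight from the reader without first collecting them in memory.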

The other happy change in this release is removing code duplication from the WARC reader for gzipped and non-gzipped files.
This is the first time I've tried using type generics in Rust; the code is messy, but it works.
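
As a rough sketch of how generics remove this kind of duplication: one function that is generic over `std::io::BufRead` can serve both a plain reader and a decompressing one. The `count_records` function is a hypothetical example rather than the crate's code, and treating every line that starts with `WARC/1.` as the start of a record is a deliberate simplification.

```rust
use std::io::{self, BufRead};

/// Count WARC records in any buffered reader.
/// Simplification: a record is assumed to start at each line beginning "WARC/1.".
fn count_records<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut count = 0;
    let mut line = Vec::new();
    loop {
        line.clear();
        // read_until returns Ok(0) once the input is exhausted.
        if reader.read_until(b'\n', &mut line)? == 0 {
            break;
        }
        // Compare bytes rather than strings, since payloads may not be UTF-8.
        if line.starts_with(b"WARC/1.") {
            count += 1;
        }
    }
    Ok(count)
}
```

An uncompressed file can be passed as `BufReader::new(file)`, and a gzipped one as a `BufReader` wrapped around whichever gzip decoder is in use, so only the construction of the reader differs between the two cases.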

Added

(indexer) use type generics to eliminate code duplication when iterating through records. This finally gets rid of an awkward situation where I was having to maintain two separate iterators.
add pages indexer to WACZ writer, with a struct for page records; this is the main thing in this release.

Fixed

add newline to page records, needed for pages.jsonl format, closes #37, nice and easy change
(indexer) skip serialising null fields in page record
(datapackage) pass cdxj_index_bytes through to the datapackage

Other

Lots more little documentation/readme changes and additions. Code refactoring, etc.

(indexer) use core instead of standard libraries for error formatting
add serde features to dependencies, update cargofile
(datapackage) move compose_datapackage into datapackage implementation
(datapackage) DataPackageResource::new now returns a result/error rather than panicking
(indexer) use httparse to parse http status code from response and remove the happily redundant cut_http_headers_from_record function
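
For readers unfamiliar with the pattern, here is a minimal sketch of a constructor that returns a `Result` instead of panicking, with the error formatted through `core::fmt`. The `Resource` and `ResourceError` types and the validation rules are hypothetical stand-ins for illustration; the real `DataPackageResource::new` signature and error type may well differ.

```rust
use core::fmt;

/// Hypothetical error type; the crate's real error enum may differ.
#[derive(Debug, PartialEq)]
enum ResourceError {
    EmptyPath,
    EmptyContent,
}

impl fmt::Display for ResourceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::EmptyPath => write!(f, "resource path is empty"),
            Self::EmptyContent => write!(f, "resource has no content"),
        }
    }
}

impl std::error::Error for ResourceError {}

/// Hypothetical stand-in for a datapackage resource.
#[derive(Debug)]
struct Resource {
    path: String,
    bytes: usize,
}

impl Resource {
    /// Validate the inputs and report problems to the caller instead of panicking.
    fn new(path: &str, content: &[u8]) -> Result<Self, ResourceError> {
        if path.is_empty() {
            return Err(ResourceError::EmptyPath);
        }
        if content.is_empty() {
            return Err(ResourceError::EmptyContent);
        }
        Ok(Self {
            path: path.to_string(),
            bytes: content.len(),
        })
    }
}
```

The caller can then decide what to do with a bad input, for example by propagating the error with `?`, rather than having the whole program abort.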
