This repository has been archived by the owner on Apr 25, 2021. It is now read-only.

Replace warcio #10

Open
PromyLOPh opened this issue Dec 2, 2018 · 2 comments

Comments

@PromyLOPh
Owner

PromyLOPh commented Dec 2, 2018

warcio’s API is not exactly pretty, and it’s easy to mess things up: there are no plausibility checks and no validation. We want:

  • A clean API that separates WARC records from their payloads (warcio mixes WARC and HTTP)
  • Relaxed parsing (read broken files)
  • Strict, spec-based validation before writing (it should be impossible to write records that violate the specs)
  • read(write(record)) should be the identity function, which makes testing easier (see webrecorder/warcio#57, “read(write(record)) != record”)
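To make the wishlist concrete, here is a minimal Python sketch of what such an API might look like. It is not warcio’s API and not a spec-complete WARC implementation: `WarcRecord`, `validate`, `write_record`, `read_record`, and `ValidationError` are hypothetical names, and validation covers only a few simplified rules (the mandatory `WARC-Type`, `WARC-Record-ID`, and `WARC-Date` headers, with `Content-Length` always derived from the payload). The record holds only WARC headers plus an opaque payload, leaving HTTP parsing to a separate layer; writing validates strictly, reading is deliberately lenient, and the validation rules are chosen so that `read_record(write_record(r)) == r`.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical sketch: these names are NOT warcio's API.
MANDATORY = ("WARC-Type", "WARC-Record-ID", "WARC-Date")  # Content-Length is derived on write


class ValidationError(ValueError):
    """Raised when a record would violate the (simplified) rules below."""


@dataclass
class WarcRecord:
    headers: list[tuple[str, str]]  # WARC headers only, in order; no HTTP parsing here
    payload: bytes = b""            # opaque record block; an HTTP layer would sit on top


def validate(record: WarcRecord) -> None:
    """Strict check run before every write."""
    names = [k.lower() for k, _ in record.headers]
    for required in MANDATORY:
        if required.lower() not in names:
            raise ValidationError(f"missing mandatory header {required}")
    for k, v in record.headers:
        if not k or any(c in k for c in ": \t\r\n"):
            raise ValidationError(f"invalid header name {k!r}")
        if k.lower() == "content-length":
            raise ValidationError("Content-Length is computed from the payload")
        # No line breaks and no surrounding whitespace, so the reader's strip()
        # cannot change the value (needed for read(write(r)) == r).
        if "\r" in v or "\n" in v or v != v.strip():
            raise ValidationError(f"invalid value for {k}: {v!r}")


def write_record(record: WarcRecord) -> bytes:
    validate(record)  # refuse to serialize anything that fails validation
    lines = ["WARC/1.1", *(f"{k}: {v}" for k, v in record.headers)]
    lines.append(f"Content-Length: {len(record.payload)}")
    head = "\r\n".join(lines).encode("utf-8") + b"\r\n\r\n"
    return head + record.payload + b"\r\n\r\n"


def read_record(data: bytes) -> WarcRecord:
    """Relaxed parser: tolerates bare LF, junk lines, missing or short data."""
    # The header block ends at the first blank line, CRLF or bare-LF style.
    end, sep_len = -1, 0
    for sep in (b"\r\n\r\n", b"\n\n"):
        i = data.find(sep)
        if i != -1 and (end == -1 or i < end):
            end, sep_len = i, len(sep)
    if end == -1:
        raise ValueError("no end of header block found")
    body = data[end + sep_len:]
    text = data[:end].decode("utf-8", errors="replace")
    lines = text.replace("\r\n", "\n").split("\n")
    headers: list[tuple[str, str]] = []
    length = None
    for line in lines[1:]:  # lines[0] is the version line; not enforced here
        if ":" not in line:
            continue  # skip garbage lines instead of failing
        name, value = line.split(":", 1)
        name, value = name.strip(), value.strip()
        if name.lower() == "content-length" and value.isdigit():
            length = int(value)
        else:
            headers.append((name, value))
    if length is None:
        # Heuristic: without Content-Length, drop trailing record separators.
        return WarcRecord(headers, body.rstrip(b"\r\n"))
    return WarcRecord(headers, body[:length])  # truncated input -> shorter payload
```

Headers are kept as an ordered list of pairs rather than a dict, so field order and repeated fields (e.g. several `WARC-Concurrent-To` headers) survive the round trip. The version line is not modeled, so a WARC/1.0 input would be rewritten as WARC/1.1; a fuller design would carry it in the record as well.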
PromyLOPh added this to the 2.0 milestone Dec 2, 2018
@ikreymer

ikreymer commented Dec 3, 2018

Hi, I just wanted to say, as the creator of warcio, that we'd definitely welcome contributions and improvements to the library. warcio evolved from, and was refactored out of, a larger library (pywb), and some components were added later. It was originally optimized for stream-based reading (and later writing). It is certainly not perfect, and we would welcome suggestions for improvements.
I would say we share many of the goals you've mentioned. Reading and fixing partially broken files is also a key goal, and there is currently a PR to add more (optional) validation. The library has evolved over the years to meet specific needs, but of course our resources are limited. If you have suggestions for improvements or would like to submit PRs, we'd be happy to hear them.

@PromyLOPh
Owner Author

You’re right, we share a common goal, and I should be reporting issues like the last item on my list to warcio directly (which I just did). I’m very glad I can use warcio right now, and I have (almost) no issues with its functionality or reliability. I’m not sure how to approach “fixing” my main issue, the API, though; it seems like it would require major refactoring, and I don’t have the resources for that, let alone (as the title suggests) to replace warcio. So consider the list above a personal wishlist.

PromyLOPh removed this from the 2.0 milestone Jul 8, 2019