Vendor warc parsing logic #3

Merged
merged 40 commits into from
Sep 9, 2019

Conversation

alecmocatta
Member

This should fix cargo publish

sbeckeriv and others added 30 commits July 27, 2015 19:28
Copied from another project I started. Moving just the nom parser out
plethora is not working. I need a dang sample file
I am seeing a more and more on irc but no solution yet
Read test input as bytes instead of strings
Not all bodies are strings. We need to support images. Not working yet
Most likely they shouldn't.
Move some things around. Smaller file still fails
Used wget to download and then parsed it.
Need more tests and docs, and to fix a few issues
* Remove a couple unused imports that Rust 1.8.0 complains about.

* Add an explicit unwrap() call to each call to write!() to silence unused-result warnings in Rust 1.8.0

* Replace feature(test) and the test crate (which error out on the Rust 1.8.0 stable channel) with cfg(test), which now seems to be the recommended approach.
The current crate does not match the docs. It is returning a hash
instead of a record.

Locked nom to the version I was using.
Clean up for 1.8.0
Set license in cargo
Added change log
Rustfmt
Remove 2 lines. Don't need the file if we have the license.
Remove cfg directive in the test
… record. (#3)

* Add rustfmt backup files and vim swapfiles to gitignore.

* Run rustfmt on test source code.

* Add a test for incomplete files.

* Update the logic for returning IResult::Incomplete to handle partial buffers.  The input slice may fall in the middle of a buffer if, for example, this parser is used to parse chunks of a multi-gig file streamed over the network or from disk.  Also, [the WARC spec](http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf) says that the WARC-Truncated header is for use by the archiver, and even if set, Content-Length should include the actual truncated size rather than its original size.
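
The partial-buffer rule described in the last bullet can be sketched roughly as follows. This is a minimal illustration that assumes the Content-Length value has already been parsed out of the record headers. It does not use nom, and the names `BodyParse` and `take_body` are hypothetical, not taken from this crate.

```rust
/// Outcome of trying to read one record body from a byte buffer.
/// (Hypothetical type for illustration; the real parser returns nom's IResult.)
#[derive(Debug, PartialEq)]
pub enum BodyParse<'a> {
    /// The whole body was present: the body plus whatever input follows it.
    Done { body: &'a [u8], rest: &'a [u8] },
    /// The buffer ended before the body did: how many more bytes are needed.
    Incomplete { needed: usize },
}

/// Split `content_length` bytes off the front of `input`.
///
/// If the buffer is shorter than the declared length, report Incomplete
/// instead of an error, so the caller can read more data and retry.
pub fn take_body(input: &[u8], content_length: usize) -> BodyParse<'_> {
    if input.len() < content_length {
        BodyParse::Incomplete {
            needed: content_length - input.len(),
        }
    } else {
        let (body, rest) = input.split_at(content_length);
        BodyParse::Done { body, rest }
    }
}
```

A caller streaming a large file would keep the unconsumed bytes, append the next chunk, and call the parser again whenever it sees the incomplete case.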
No harm seems to be taken.
* Streaming parse

Working heavily off of the examples [copied and deleted] I got some code
to compile. I now need to test said code.

* Consumer has an incomplete loop that it never breaks out of

* Debugging info

Remove record from inner code. Not sure if the shadowing was a loop.

* Move to memory producer

* Broken code

To get help on.

* Refactor a bit

Remove the e0303 problem
I am not excited about the solution.
Tests are passing. I need to figure out the state concepts better.

* Work-ish

Has lots of debugging statements. Doesn't currently handle incomplete
records

* First cut at streaming

It's not pretty and it's not really complete but it is something.
I need to think about a useful api. What would I really want.

* Remove debug

* An attempt at iterators

I am not ashamed to say I don't truly understand how iterators are
implemented in Rust. I hope by messing many up I will figure it out.

* A working object called warc streamer

* rustfmt
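
For readers following the iterator commits above, this is roughly what implementing `Iterator` over a byte buffer looks like in Rust. It is a toy sketch and not the crate's streamer: the framing (a decimal length, a newline, then that many body bytes) is an assumption chosen to keep the example short and is not the WARC format, and `RecordStream` is a hypothetical name.

```rust
/// Iterates over length-prefixed records in a byte buffer.
/// Toy framing (an assumption, not WARC): "<decimal length>\n<body bytes>".
pub struct RecordStream<'a> {
    remaining: &'a [u8],
}

impl<'a> RecordStream<'a> {
    pub fn new(input: &'a [u8]) -> Self {
        RecordStream { remaining: input }
    }
}

impl<'a> Iterator for RecordStream<'a> {
    type Item = &'a [u8];

    fn next(&mut self) -> Option<Self::Item> {
        // Copy the slice reference out so borrows carry the full lifetime 'a.
        let data = self.remaining;

        // Find the newline that ends the length prefix; stop if there is none.
        let newline = data.iter().position(|&b| b == b'\n')?;
        let len: usize = std::str::from_utf8(&data[..newline]).ok()?.parse().ok()?;

        let start = newline + 1;
        let end = start.checked_add(len)?;
        if end > data.len() {
            // Incomplete record: stop iterating and leave the state untouched.
            return None;
        }

        // Advance past this record and yield its body.
        self.remaining = &data[end..];
        Some(&data[start..end])
    }
}
```

The only state the iterator carries is the unconsumed slice. Each call to `next` either advances past one whole record or returns `None`, which is what lets callers use `for` loops and adapters such as `collect`.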
Simple upgrade to nom 2.0
Remove warnings in tests
sbeckeriv and others added 10 commits November 26, 2016 09:39
Upgrade nom to 2.0
Updates the requirements on [url](https://github.com/servo/rust-url) to permit the latest version.
- [Release notes](https://github.com/servo/rust-url/releases)
- [Commits](servo/rust-url@v1.7.0...v2.1.0)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
* first build on 4, no test build yet

* Get initial non-producer tests passing

* it once again parses a plethora

* Remove Consumer/Producer dependent portions of API

* Get actual latest version
@alecmocatta alecmocatta merged commit e9777bd into master Sep 9, 2019
@alecmocatta alecmocatta deleted the warc-parser branch September 9, 2019 13:00
alecmocatta added a commit that referenced this pull request Sep 10, 2019
spaces rather than hyphens in wip label
6 participants