Added async implementation of CSV reader #199
Closed
Motivation
I was shopping for a crate that had a solid CSV parser and this one popped up. I'm looking to use it in a project that needs to parse large CSV files in WebAssembly.
That poses a problem because the standard `io::Read` and `io::Write` imply blocking I/O, which is a no-go on WASM due to its JavaScript interop. Another constraint is that these are large CSV files. I don't want to hold an entire file in memory, so reading it all into a buffer and then parsing won't work very well. Ideally, I want to read in chunks and emit a stream of records as I go, so I'm really only left with async I/O (AFAIK).
I read up on issue #141, and it looks like the main reason it didn't move forward was that the Rust async ecosystem was still in flux? I think Rust async is heading toward stability, and while standard `AsyncRead` and `AsyncWrite` traits haven't been stabilized, I wonder if this crate might be able to use patterns from other async libraries to implement an AsyncReader in the meantime.
AsyncReader
I've been able to use async-compression in my project, so I went ahead and modeled an AsyncReader on the `Stream<Item = io::Result<Bytes>>` pattern in that crate, and it appears to work.
I basically started from the synchronous `Reader` struct and tried to change as little as possible about the interface. These new structures and dependencies are gated behind an additive "async" feature, so they shouldn't interfere with any downstream crates.
You can test it by running the following command:
cargo test --features async
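To illustrate the "read in chunks, emit records as I go" idea without pulling in any async dependencies, here is a minimal, std-only sketch. It is not the code in this PR: `ChunkedRecordSplitter`, `push_chunk`, `finish`, and `parse_line` are hypothetical names, and the sketch deliberately ignores CSV quoting and escaping, which the real parser handles. It only shows how a partial record can be carried over from one chunk to the next, which is the bookkeeping an async reader over a stream of byte chunks would need.

```rust
/// Hypothetical helper (not part of this crate): accumulates byte chunks and
/// yields complete newline-terminated records as soon as they are available.
/// Assumption: no quoted fields, so a quoted field containing '\n' or ','
/// would be split incorrectly.
#[derive(Default)]
pub struct ChunkedRecordSplitter {
    // Bytes of a record whose terminating newline has not arrived yet.
    pending: Vec<u8>,
}

impl ChunkedRecordSplitter {
    pub fn new() -> Self {
        Self::default()
    }

    /// Feed one chunk and return every record completed by it.
    pub fn push_chunk(&mut self, chunk: &[u8]) -> Vec<Vec<String>> {
        let mut records = Vec::new();
        for &byte in chunk {
            if byte == b'\n' {
                // A record boundary: hand off the buffered bytes and reset.
                let line = std::mem::take(&mut self.pending);
                records.push(parse_line(&line));
            } else {
                self.pending.push(byte);
            }
        }
        records
    }

    /// Call at end of input to flush a final record that lacks a trailing newline.
    pub fn finish(&mut self) -> Option<Vec<String>> {
        if self.pending.is_empty() {
            None
        } else {
            let line = std::mem::take(&mut self.pending);
            Some(parse_line(&line))
        }
    }
}

/// Split one line into fields on commas, dropping a trailing '\r' (CRLF input).
fn parse_line(line: &[u8]) -> Vec<String> {
    let line = if line.last() == Some(&b'\r') {
        &line[..line.len() - 1]
    } else {
        line
    };
    let text = String::from_utf8_lossy(line);
    text.split(',').map(str::to_owned).collect()
}
```

In an async setting, each item pulled from the byte stream would be passed to `push_chunk`, and `finish` would be called once the stream ends, so only the current partial record is ever buffered.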
Questions
Alternatives
Closing Thoughts
This is far from a finished impl. I basically wanted to throw something together to get your feedback on whether and how this should continue. I haven't changed any of the docs or done any in-depth testing of this feature beyond simple sanity checks. It's just a proof of concept so I can verify that something like this will work for my purposes.
If you do decide to move forward with this idea, I'm willing to help write the docs, examples, and tests for this feature.
Also, there's no rush on this, so just let me know what you think when you get time.