Resumable Parsing #48
Does this work?

```csharp
IEnumerable<Token> ParseStream(TextReader stream)
{
    var result = this.Parser.Parse(stream);
    while (result.Success)
    {
        yield return result.Value;
        result = this.Parser.Parse(stream);
    }
}
```

After the call to
Yep, that works. Thank you!
Sorry for reopening this. I'm still new to this library and I'm trying to use it for a class assignment. This outputs 6:

```csharp
private static void Main(string[] args)
{
    var parser = Parser.Digit;
    var input = "121528";
    using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(input)))
    using (var reader = new StreamReader(stream))
    {
        parser.Parse(reader);
        Console.WriteLine(stream.Position);
    }
}
```

Shouldn't this output

Edit: for reference, I'm reading input from a file for the assignment.
Ah, I think you're right. Because data is pulled from the stream in chunks, the parser potentially (usually) reads ahead of where it actually ends up. I guess the fix would be to rewind the stream (after checking
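For what it's worth, here is a minimal, self-contained sketch of the rewinding idea, assuming the stream is seekable and that the number of bytes the parser actually consumed (`consumedBytes` below, hard-coded for the demo) is known. This is not Pidgin's implementation, just an illustration of the read-ahead problem and the fix.

```csharp
using System;
using System.IO;
using System.Text;

static class RewindSketch
{
    static void Main()
    {
        var bytes = Encoding.UTF8.GetBytes("121528");
        using var stream = new MemoryStream(bytes);
        using var reader = new StreamReader(stream);

        // Simulate the parser consuming a single digit: the StreamReader
        // buffers the whole input, so stream.Position jumps to 6 even
        // though only one character was logically consumed.
        reader.Read();
        long consumedBytes = 1; // hypothetical: what the parser really ate

        if (stream.CanSeek)
        {
            stream.Seek(consumedBytes, SeekOrigin.Begin);
            reader.DiscardBufferedData(); // the reader's buffer is now stale
        }

        Console.WriteLine(stream.Position); // 1
    }
}
```

Note that with multi-byte encodings the byte count differs from the character count, which is part of what makes a general fix fiddly.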
Would it be preferable to return the total count of tokens consumed

Edit: On second thoughts you can already pull the
I think returning an

```csharp
IEnumerable<T> GetUnconsumedTokens(TToken[] buffer, int startIndex)
{
    for (int i = startIndex; i < buffer.Length; i++)
    {
        // You might want to remove the item at some point for memory purposes too, this is just an example
        yield return buffer[i];
    }
    // read from input here, this depends on the input type of course
}
```

This could be a method on
Ah yeah I quite like that idea. "Here are the tokens which I pulled from the input but didn't actually eat", ie, the remainder of the

It might be more convenient to have

I've been messing with this a bit and I agree that

Something that comes to mind is that the

I've been kicking around the idea of replacing

When you get to (eg) render an error message, you can concretise the

Thoughts?

Code here: 4141044

Regarding leftover tokens, I'm thinking it makes sense to make it part of the protocol between the parser and the
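To make the discussion concrete, here is a hypothetical sketch of what such a protocol could look like: the token stream exposes a way for the parser to hand back tokens it buffered but never consumed, and those tokens are replayed before any new input is read. All names here are illustrative, not Pidgin's actual API.

```csharp
using System;

// Illustrative protocol, not Pidgin's real interface.
interface IResumableTokenStream<TToken>
{
    int Read(TToken[] buffer);                      // fill the buffer, return count read
    void Return(ReadOnlyMemory<TToken> leftovers);  // hand back unconsumed tokens
}

class StringTokenStream : IResumableTokenStream<char>
{
    private readonly string _input;
    private int _pos;
    private ReadOnlyMemory<char> _pushedBack = ReadOnlyMemory<char>.Empty;

    public StringTokenStream(string input) => _input = input;

    public int Read(char[] buffer)
    {
        // Serve pushed-back tokens before reading any new input.
        if (!_pushedBack.IsEmpty)
        {
            var n = Math.Min(buffer.Length, _pushedBack.Length);
            _pushedBack.Span.Slice(0, n).CopyTo(buffer);
            _pushedBack = _pushedBack.Slice(n);
            return n;
        }
        var count = Math.Min(buffer.Length, _input.Length - _pos);
        _input.AsSpan(_pos, count).CopyTo(buffer);
        _pos += count;
        return count;
    }

    public void Return(ReadOnlyMemory<char> leftovers) => _pushedBack = leftovers;
}

static class ProtocolDemo
{
    static void Main()
    {
        var stream = new StringTokenStream("abcd");
        var buf = new char[3];
        stream.Read(buf);                   // reads "abc" into the buffer
        stream.Return(buf.AsMemory(1, 2));  // parser only consumed 'a'; give back "bc"
        var next = new char[3];
        var n = stream.Read(next);          // replays "bc" before touching new input
        Console.WriteLine(new string(next, 0, n)); // bc
    }
}
```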
- drop support for everything below netstandard21
- add MaybeNullWhen and fix other nullability warns
- make Real into a property. Fixes #78
- change ITokenStream to use span
- replace PosCalculator stuff with IConfiguration
- configurable array pool
- remove unneeded refs
- fast, vectorised version of ComputeSourcePos in the happy path
- didnt need this after all
- add test for cached _lastSourcePos
- remove comment
- fix method name
- fix bug in ComputeSourcePos
- remove vectorised version, it was not working properly
- Change ExpectedCollector to a class, use an object pool, change TryParse's argument to an ICollection
- ObjectPool is marginally faster when you use the base class instead of the interface
- Revert "ObjectPool is marginally faster when you use the base class instead of the interface" (this reverts commit 323afbb) and revert "Change ExpectedCollector to a class, use an object pool, change TryParse's argument to an ICollection" (this reverts commit 64d23fd)
- replace ExpectedCollector with PooledList. also implement IList and IDisposable, and add tests
- PooledList rent from ArrayPool lazily
- fix test
- change Tokens to an ImmutableArray
- Replace SourcePos with SourcePosDelta. This allows users to work with positions within partially-already-consumed files
- add ResumableTokenStream. fixes #48
- add test
- rename OnParserEnd
- drop support for netstandard2.1. 5.0 all the way baybayy
- Fix warnings
- publish token streams
- Publish custom parser APIs
- fix test
- package updates
- add test for RepeatString
- unpublish InternalError. for now we'll let parsers set error info but not read it. let's see how that goes
- unpublish Dispose
- xmldoc
- New version of SkipWhitespaes which performs much better on long sequences of space characters (quite common in source code)
- oops, was meant to pin this
I would love to be able to resume parsing once I finish parsing something. For example, it would be nice to do something like this:
This would allow us to parse from an input stream as needed, without needing to store all the output tokens in memory. For example, we could process a 2GB+ file and write the output into another file without using very much memory.
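As a rough illustration of that goal (using a trivial stand-in parser rather than Pidgin's API, since the resumable API is exactly what's being requested here), tokens can be produced lazily with an iterator and written out one at a time, so memory use stays constant regardless of input size:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class StreamingSketch
{
    // Stand-in for a resumable parser: consumes one digit per call.
    static bool TryParseDigit(TextReader reader, out char digit)
    {
        var c = reader.Read();
        digit = (char)c;
        return c >= '0' && c <= '9';
    }

    // Lazily yields one token at a time; nothing is accumulated in memory.
    static IEnumerable<char> ParseStream(TextReader reader)
    {
        while (TryParseDigit(reader, out var d))
        {
            yield return d;
        }
    }

    static void Main()
    {
        // StringReader/StringWriter stand in for large input/output files.
        using var input = new StringReader("121528");
        using var output = new StringWriter();
        foreach (var token in ParseStream(input))
        {
            output.Write(token); // each token is written out, then discarded
        }
        Console.WriteLine(output.ToString()); // 121528
    }
}
```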
Sprache has an equivalent to this, but does not support parsing from a stream:
One possible approach is to keep a reference to the `ParseState<T>` inside the object that `Parse` returns.
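A hypothetical sketch of that approach, with illustrative names (these are not Pidgin's real `ParseState<T>` or result types): the result object retains the live state, so a later parse can pick up exactly where the previous one stopped.

```csharp
using System;
using System.IO;

// Illustrative stand-in, not Pidgin's real type.
class ParseState<TToken>
{
    public TextReader Input { get; }
    public ParseState(TextReader input) => Input = input;
}

class ParseResult<T>
{
    public T Value { get; }
    public ParseState<char> State { get; } // retained so a later parse can resume

    public ParseResult(T value, ParseState<char> state)
    {
        Value = value;
        State = state;
    }

    // Resume with a stand-in "one character" parser on the saved state.
    public ParseResult<char> ParseNextChar()
        => new ParseResult<char>((char)State.Input.Read(), State);
}

static class ResumeDemo
{
    static void Main()
    {
        var state = new ParseState<char>(new StringReader("ab"));
        var first = new ParseResult<char>((char)state.Input.Read(), state);
        var second = first.ParseNextChar(); // picks up where the first parse stopped
        Console.WriteLine($"{first.Value}{second.Value}"); // ab
    }
}
```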