Position information #16

Closed
hinderer opened this Issue May 22, 2012 · 7 comments

4 participants

@hinderer

Would it be possible for the library to report, for each parsed entity, the position at which it was found in the input stream?

It would be helpful to get positions in terms of lines and columns, and perhaps also as an offset from the beginning of the input stream.

This report is a follow-up to a discussion with @snoyberg.

@snoyberg

Just to elaborate a bit: the use case for this would be in parsers like xml-conduit and tagstream-conduit. It would be very convenient to be able to give users line/column information about invalid input. Frankly, I'm not familiar enough with attoparsec internals to know if this is possible, but it would be great if we could turn on "debugging" or "verbose" mode and get that information, and for those wanting to keep the highest possible performance, keep it off.

@bos
Owner
bos commented May 22, 2012

It would be possible to add this information, but at the cost of performance. Basically, my stock answer to this question is "if you need line/column numbers or fancy errors, use Parsec".

@bos bos closed this May 22, 2012
@snoyberg

I had one other idea for an implementation that wouldn't involve any changes to attoparsec, and I wanted to get your input. Suppose that in a wrapper package like attoparsec-conduit, we counted how many lines exist in each chunk of data sent to attoparsec.

If we get back a Fail result, its leftover input should be a suffix of the most recently passed-in chunk of data (can you confirm?). Therefore, we could get the position of the failure, or fairly close to it, by adding the number of lines from the previous chunks to the number of lines in the current chunk before the Fail.

I can test this out myself later, I was just curious if you saw an immediate reason why this wouldn't work.
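
A minimal sketch of the chunk-counting idea above, for illustration only. It deliberately does not call attoparsec: the leftover input is passed in as a plain argument, and it assumes (as the comment asks to have confirmed) that the leftover is a suffix of the current chunk. The names `Position`, `advance`, and `failurePosition` are hypothetical and are not part of attoparsec or attoparsec-conduit.

```haskell
import qualified Data.ByteString.Char8 as B

-- | 1-based line and column. Hypothetical type, for illustration only.
data Position = Position { posLine :: !Int, posCol :: !Int }
  deriving (Eq, Show)

startPosition :: Position
startPosition = Position 1 1

-- | Advance a position over bytes that the parser has consumed.
-- A newline starts a new line; any other byte moves one column right.
advance :: Position -> B.ByteString -> Position
advance = B.foldl' step
  where
    step (Position l _) '\n' = Position (l + 1) 1
    step (Position l c) _    = Position l (c + 1)

-- | Estimate where a failure happened, given the position at the start of
-- the current chunk, the chunk itself, and the leftover input reported
-- with the failure. ASSUMPTION: the leftover is a suffix of the chunk.
-- If it is longer than the chunk, nothing is counted as consumed.
failurePosition :: Position -> B.ByteString -> B.ByteString -> Position
failurePosition chunkStart chunk leftover =
  advance chunkStart (B.take consumed chunk)
  where
    consumed = B.length chunk - B.length leftover
```

For a successfully parsed chunk, the wrapper would call `advance` on the whole chunk and carry the resulting position forward to the next one. Columns here count bytes, so multi-byte encodings would need separate handling.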

@sol
sol commented Jun 10, 2012

@snoyberg Maybe somewhat related: I implemented attoparsec's API (Data.Attoparsec.Text only) in terms of Parsec. That way I can write parsers that can be compiled against attoparsec and Parsec. I use this mostly for debugging attoparsec parsers. But maybe you could just re-parse with Parsec on parse errors.

Of course, this requires you to insert try at alternatives that consume input, to make Parsec happy. I didn't have any issues with that. If you have good test coverage, a missing try should be pointed out by failing tests when you run your parser with Parsec.
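
To illustrate the point about `try`, here is a small Parsec-only sketch. It is not taken from attoparsec-parsec, and the parser names `noTry` and `withTry` are made up for this example. attoparsec always backtracks when an alternative fails, whereas Parsec commits to an alternative once it has consumed input, unless that alternative is wrapped in `try`.

```haskell
import Text.Parsec (ParseError, errorPos, parse, sourceColumn, sourceLine,
                    string, try, (<|>))
import Text.Parsec.String (Parser)

-- Without try: on input "for", string "foo" consumes "fo" before failing,
-- so Parsec never attempts the second alternative.
noTry :: Parser String
noTry = string "foo" <|> string "for"

-- With try: the first alternative is rewound on failure, so the second
-- alternative gets to run from the original input position.
withTry :: Parser String
withTry = try (string "foo") <|> string "for"

-- Parsec errors carry a source position, which is the kind of information
-- requested in this issue.
describeError :: ParseError -> String
describeError err =
  "line " ++ show (sourceLine pos) ++ ", column " ++ show (sourceColumn pos)
  where
    pos = errorPos err
```

Running `parse noTry "" "for"` yields a `Left` with a parse error, while `parse withTry "" "for"` succeeds; a test suite run against the Parsec build would surface a missing `try` in the same way.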

Code is here: https://github.com/sol/attoparsec-parsec

@snoyberg

Thanks for the pointer, but I don't think it will work in general for our use cases. Besides the complexity involved in getting a dual compilation setup going, we'd have to cache the entirety of the input to make this work. In specific cases like reading from a file, we could just repeat the action, but when reading from an HTTP connection, we wouldn't want to make the request twice.

@snoyberg

OK, I was able to add support at the attoparsec-conduit level:

https://github.com/snoyberg/conduit/blob/36a38766be9e9bc32047eb65ff8ae61df1206d27/attoparsec-conduit/Data/Conduit/Attoparsec.hs

I tested this out with xml-conduit, and I am now able to get nice position information on parse errors.
