Would it be possible for the library to report, for each parsed entity, the position at which it was found in the input stream?
It would be helpful to get positions in terms of lines and columns but perhaps also in terms of offset from the beginning of the input stream.
This report is a follow-up to a discussion with @snoyberg.
Just to elaborate a bit: the use case for this would be in parsers like xml-conduit and tagstream-conduit. It would be very convenient to be able to give users line/column information about invalid input. Frankly, I'm not familiar enough with attoparsec internals to know if this is possible, but it would be great if we could turn on "debugging" or "verbose" mode and get that information, and for those wanting to keep the highest possible performance, keep it off.
It would be possible to add this information, but at the cost of performance. Basically, my stock answer to this question is "if you need line/column numbers or fancy errors, use Parsec".
I had one other idea for an implementation that wouldn't involve any changes to attoparsec, and I wanted to get your input. Suppose that in a wrapper package like attoparsec-conduit, we counted how many lines exist in each chunk of data sent to attoparsec.
If we get back a Fail result, the leftover input it carries should be a suffix of the most recently passed in chunk of data (can you confirm?). Therefore, we could get the position of the failure, or something fairly close to it, by adding the number of lines in the previous chunks to the number of lines in the current chunk before the point of failure.
I can test this out myself later; I was just curious whether you saw an immediate reason why this wouldn't work.
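To make the idea above concrete, here is a minimal sketch of the bookkeeping such a wrapper could do. It is not attoparsec-conduit's actual code: the names `Position`, `startPos`, `advance`, and `failurePosition` are hypothetical, and it relies on the unconfirmed assumption from the comment above that the leftover input of a Fail is a suffix of the most recent chunk.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical position record: 1-based line and column, 0-based byte offset.
data Position = Position
  { posLine   :: !Int
  , posCol    :: !Int
  , posOffset :: !Int
  } deriving (Eq, Show)

-- Position at the very start of the input stream.
startPos :: Position
startPos = Position 1 1 0

-- Advance a position over bytes that the parser has consumed.
-- A newline starts a new line and resets the column to 1.
advance :: Position -> B.ByteString -> Position
advance = B.foldl' step
  where
    step (Position l c o) ch
      | ch == '\n' = Position (l + 1) 1 (o + 1)
      | otherwise  = Position l (c + 1) (o + 1)

-- Estimate where a failure occurred.
--   before   : position after all previously consumed chunks
--   chunk    : the chunk most recently fed to the parser
--   leftover : unconsumed input reported with the failure
-- ASSUMPTION: leftover is a suffix of chunk. If it is longer than the
-- chunk, B.take receives a negative count and yields an empty string,
-- so the estimate falls back to 'before'.
failurePosition :: Position -> B.ByteString -> B.ByteString -> Position
failurePosition before chunk leftover =
  advance before (B.take (B.length chunk - B.length leftover) chunk)
```

For instance, if the earlier chunk was `"<a>\n<b>\n"` and the parser then fails on the chunk `"<c>\n<d <"` with leftover `" <"`, calling `failurePosition (advance startPos earlier) chunk leftover` gives line 4, column 3, offset 14. Only the running `Position` has to be kept between chunks, so the input itself never needs to be cached.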
@snoyberg Maybe somewhat related: I implemented attoparsec's API (Data.Attoparsec.Text only) in terms of Parsec. That way I can write parsers that can be compiled against attoparsec and Parsec. I use this mostly for debugging attoparsec parsers. But maybe you could just re-parse with Parsec on parse errors.
Of course, this requires you to insert try at alternatives that consume input, to make Parsec happy. I didn't have any issues with that. If you have good test coverage, a missing try should be pointed out by failing tests when you run your parser with Parsec.
Code is here: https://github.com/sol/attoparsec-parsec
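As an illustration of the point about try, here is a small self-contained Parsec sketch. It is not taken from the linked package, and the parser names `withoutTry` and `withTry` are made up for this example. In Parsec, once the first alternative has consumed input and then failed, the second alternative is not attempted unless the first is wrapped in try. attoparsec backtracks on failure, so the same grammar behaves differently there.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Both alternatives start with 'l'. On the input "lambda", string "let"
-- consumes the 'l' before failing, so Parsec commits to the first branch
-- and never tries the second one.
withoutTry :: Parser String
withoutTry = string "let" <|> string "lambda"

-- Wrapping the first branch in try rewinds the input when it fails,
-- so the second alternative gets a chance to run.
withTry :: Parser String
withTry = try (string "let") <|> string "lambda"

-- Helper: did the parser accept the whole prefix it was asked to parse?
accepts :: Parser String -> String -> Bool
accepts p input = either (const False) (const True) (parse p "" input)
```

With these definitions, `accepts withoutTry "lambda"` is False, while `accepts withTry "lambda"` is True. A test suite that exercises both alternatives would expose the missing try in the same way.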
Thanks for the pointer, but I don't think it will work in general for our use cases. Besides the complexity involved in getting a dual compilation setup going, we'd have to cache the entirety of the input to make this work. In specific cases like reading from a file, we could just repeat the action, but when reading from an HTTP connection, we wouldn't want to make the request twice.
OK, I was able to add support at the attoparsec-conduit level:
I tested this out with xml-conduit, and I am now able to get nice position information on parse errors.