-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New parser! #154
New parser! #154
Conversation
Thanks for this Mike! I haven't had a look at the code yet, but these sound like pretty good improvements. Unfortunately, I won't have much time to check it over the next week. I'll try to get back to you. I'm not sure what the modules in |
No rush, it might be good to get a few people to test this with their grammars to make sure this hasn't broken anything. I can't make head or tail of apps/family. The original commit for some of it is here but doesnt really give any details. I can't find anything which says what it might be for. |
Hey @mrob95, sorry I haven't been reviewing this recently. I've noticed the CI build for this issue is failing for Python 2.7 (doc tests), even though GitHub isn't displaying the CI status. I am trying to fix that problem at the moment. I'm pretty sure the Python 3.x builds would also be failing if they included the doc tests. I wouldn't worry about the I should be more active on here in the next week to get some issues/PRs sorted out. |
I'll take a look - the tests seem to be passing on my machine and I haven't been seeing the CI builds until now Edit: Okay, easy fixes. The failing 2.7 tests were (obviously) a Unicode issue - lark always outputs Unicode strings so the assertions need to reflect this. The failing doctest was due to a misunderstanding - I had assumed that sentences would be parsed into a sequence of individual literals each one word, but it seems like they should actually just be one literal. Makes sense. |
Thanks for the fixes. I'll try to get the doctests working with Python 3.x today. I was wondering if the new parser's grammar could be modified to allow quoted strings. This would start to fix #138. |
It already allows e.g. "hello 'world'" -> Literal(u"hello 'world'"), if we needed to produce a different class than that would be easy to do. Do you know what the QuotedLiteral needs to look like? |
While the parser does accept quotes, it doesn't do exactly what we need for Sequence([
Literal(['testing']),
QuotedLiteral(['quoted text'])
]) The class QuotedLiteral(Literal):
def __init__(self, text, name=None, value=None, default=None):
Literal.__init__(self, text, name, value, default)
self._words = [text] # treat all words as one word. I would change the parser myself, but you're already familiar with lark. #138 is low-priority, so never mind if this is too complicated. I can handle the engine tasks after |
Hmm, it is complicated slightly by the fact that we have to deal with words containing apostrophes as well as potentially quoted words, so there is a lot of potential for ambiguity. Would it not be simpler to just parse everything as sequences of individual literals? Seems like this would fix the bug without anything having to be quoted or special cased. Edit, reading that thread again I think I had things upside down. I think the best way to implement this in a clean way is to do it using a different character to |
Using sequences of individual literals should fix the problem actually, unless I'm missing something. That is effectively what Vocola does by quoting grammar words. It would also simplify the changes to the engine compilers. |
I think it may be best for me to do that as a separate PR in any case, it seems like it might need a few rounds of testing and iteration and the commit history for this is already a mess ^^ |
Yep, fair enough. It's low-priority anyway. Speaking of the commit history, did you want to combine/squash some commits by rebasing before merging? BTW I've had a look at Anyway, t4ngo never finished implementing it. I think it should be removed from dragonfly2 entirely. |
Yeah my bad I have been meaning to clean it up. Every time you think you are getting the hang of git it finds a way to kick you in the nuts - some combination of committing from multiple different profiles and rebasing confused it and led to duplicate commits. Regarding apps.family I agree that if it isn't fully implemented and nobody is using it then we should get rid of it. The idea of having dedicated grammar files is potentially a good one, but I think it would be easier to start fresh than try to figure out a half baked implementation. |
No worries, thanks for cleaning that up. Yep, Git can be tricky to use at times. Agreed. I'll remove that in a separate PR for future reference. I'd also like to remove the Windows dialog-related classes under So this is ready for merging now? |
Work on the command family app was started by Christo Butcher, Dragonfly's original author, but was never finished. It is being removed prior to merging the new parser (PR #154), which is incompatible with parser code in family/loader_parser.py.
Yes, I think this is ready to merge. I think it is quite likely that someone will find an edge case which worked with the old parser but isn't handled by this one, but it should be simple to fix those as they come up. |
Okay then, I'll merge this now. Yes, issues like that can be fixed as they come up. Thanks again for putting this together! 👍 |
This has been bugging me all week so I finally took another look at it. Turns out I was using the slow, forgiving lark parser (Earley) rather than the fast, extremely uncooperative parser (LALR).
Headlines:
This is still a work in progress, although I think I have caught at least most of the potential edge cases. The only thing remaining that it doesn't like in caster is
"((meth|method)|(meth|method)s)"
inide_shared.py
. I don't think this should actually be valid syntax, but am willing to be persuaded.As I said, LALR is quite unforgiving and the error messages are not brilliant so modify this at your peril :-). I tried a lot of things which I thought should have worked but didn't.
TODO:
Proper error messagesmore tests?Questions:
apps/family/loader_parser.py
? It appears to rely on the old parser but I have no idea what any of that stuff does