zero width negative lookahead #23

Closed
richardhundt opened this issue Apr 30, 2021 · 8 comments

@richardhundt

richardhundt commented Apr 30, 2021

Could we have a way to do a zero width negative lookahead?

The first thing I tried to do was define a set of reserved words, and then define identifiers as "words but not keywords".

In terms of syntax you could use unary minus, as it binds tightly. Something like:

keyword = lit("def", "class", "in", "out")
ident = -keyword & reg(r'[a-zA-Z_][a-zA-Z0-9_]*')

EDIT: I realise that I could do it using just regexes, but then word boundaries get a bit clunky, as I'd need to embed the whitespace handling in the regex, which isn't ideal.
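For comparison, the regex-only approach mentioned in the edit could look roughly like the sketch below. It uses Python's standard `re` module rather than Parsita, and the names `KEYWORDS`, `IDENT`, and `is_identifier` are made up for illustration. The `\b` inside the lookahead is the word-boundary handling referred to above; whitespace skipping is not shown.

```python
import re

# Hypothetical keyword list, taken from the example above.
KEYWORDS = ("def", "class", "in", "out")

# (?!...) is a zero-width negative lookahead: it consumes no input and
# fails the match if a keyword followed by a word boundary starts here.
IDENT = re.compile(
    r"(?!(?:%s)\b)[a-zA-Z_][a-zA-Z0-9_]*" % "|".join(map(re.escape, KEYWORDS))
)


def is_identifier(word: str) -> bool:
    """Return True if the whole string is an identifier that is not a keyword."""
    return IDENT.fullmatch(word) is not None
```

Without the `\b`, words that merely start with a keyword, such as `index` or `classy`, would be rejected as well.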

@drhagen
Owner

drhagen commented May 1, 2021

I started to write a long post about how I need to implement what Scala parser combinators call flatMap or into, but then I realized that the predicate parser, pred, does exactly what you want.

from parsita import *

keywords = {"def", "class", "in", "out"}

class NonkeywordParsers(TextParsers):
    variable = pred(reg(r'[a-zA-Z_][a-zA-Z0-9_]*'), lambda x: x not in keywords, 'nonkeyword')

assert NonkeywordParsers.variable.parse('foo') == Success('foo')
NonkeywordParsers.variable.parse('class').or_die()
# parsita.state.ParseError: Expected nonkeyword but found 'class'
# Line 1, character 1

# class
# ^

I probably still need to implement flatMap, which I'll likely call the transformation parser or >=, but it is not needed for this case.

@richardhundt
Author

Thanks. I'm trying to generalize it though, so that I can condition on a parser instance. I'd expect the following to reject 'class' as well, but it quietly succeeds:

from parsita import *

class NonkeywordParsers(TextParsers):
    keyword = lit("class")
    variable = (keyword & failure("Unexpected keyword")) | reg(r'[a-zA-Z_][a-zA-Z0-9_]*')

assert NonkeywordParsers.variable.parse('foo') == Success('foo')
# Expected this to raise, but it succeeds instead
NonkeywordParsers.variable.parse('class').or_die()

I find that a little surprising.

@richardhundt
Author

richardhundt commented May 1, 2021

What does work is this:

from parsita import *

keywords = {"def", "class", "in", "out"}

class NonkeywordParsers(TextParsers):
    keyword = lit("def", "class", "in", "out")
    variable = pred(reg(r'[a-zA-Z_][a-zA-Z0-9_]*'), lambda x: not isinstance(NonkeywordParsers.keyword.parse(x), Success), 'nonkeyword')

assert NonkeywordParsers.variable.parse('foo') == Success('foo')
NonkeywordParsers.variable.parse('class').or_die()

It's a bit ugly though, referencing the keyword parser statically like that.

@drhagen
Owner

drhagen commented May 1, 2021

> I find that a little surprising.

In this case, (keyword & failure("Unexpected keyword")) fails as you expected, but because it is followed by |, the parser then tries the next alternative, reg(r'[a-zA-Z_][a-zA-Z0-9_]*'), which succeeds.

@richardhundt
Author

richardhundt commented May 1, 2021

so failure doesn't raise an exception?

EDIT: I get it, so failure is just a signal to the parser that the match failed, but not a parse error... hmm... I want to fail hard there

@drhagen
Owner

drhagen commented May 1, 2021

> so failure doesn't raise an exception?

No, it is a Parser that always parses into a Failure.
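The behaviour described in the last few comments can be reproduced in a toy model that does not use Parsita at all. In the sketch below, `Result`, `literal`, `regex`, `always_fail`, `sequence`, and `choice` are hypothetical stand-ins written for illustration; they are not Parsita's classes and ignore whitespace handling. The point is only that a failure is an ordinary return value, so an ordered choice simply moves on to the next alternative.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    ok: bool
    value: Optional[str] = None
    rest: str = ""
    message: str = ""


ToyParser = Callable[[str], Result]


def literal(text: str) -> ToyParser:
    """Match an exact prefix of the input."""
    def parse(s: str) -> Result:
        if s.startswith(text):
            return Result(True, text, s[len(text):])
        return Result(False, message=f"expected {text!r}")
    return parse


def regex(pattern: str) -> ToyParser:
    """Match a regular expression at the start of the input."""
    compiled = re.compile(pattern)
    def parse(s: str) -> Result:
        m = compiled.match(s)
        if m:
            return Result(True, m.group(), s[m.end():])
        return Result(False, message=f"expected /{pattern}/")
    return parse


def always_fail(message: str) -> ToyParser:
    """A parser that returns a failed Result; it never raises."""
    return lambda s: Result(False, message=message)


def sequence(first: ToyParser, second: ToyParser) -> ToyParser:
    """Run first, then second on the remaining input; keep second's result."""
    def parse(s: str) -> Result:
        r1 = first(s)
        return second(r1.rest) if r1.ok else r1
    return parse


def choice(first: ToyParser, second: ToyParser) -> ToyParser:
    """Ordered choice: if first fails, retry second from the same position."""
    def parse(s: str) -> Result:
        r1 = first(s)
        return r1 if r1.ok else second(s)
    return parse


variable = choice(
    sequence(literal("class"), always_fail("Unexpected keyword")),
    regex(r"[a-zA-Z_][a-zA-Z0-9_]*"),
)

result = variable("class")  # the first branch fails, the second one matches
```

Here `result.ok` is true and `result.value` is `"class"`: the "Unexpected keyword" failure is discarded as soon as the second alternative succeeds.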

@drhagen
Owner

drhagen commented May 1, 2021

> I want to fail hard there

Parsita does not really have a concept of "This is not just a failure, but getting here is a catastrophic failure that should cause all alternatives to be ignored".

I guess you could do this if you wanted to blow up the parser:

from parsita import *

keywords = {"def", "class", "in", "out"}

def abort(message):
    raise ParseError(message)

class NonkeywordParsers(TextParsers):
    keyword = lit("class")
    variable = (keyword > (lambda x: abort('Unexpected keyword'))) | reg(r'[a-zA-Z_][a-zA-Z0-9_]*')

assert NonkeywordParsers.variable.parse('foo') == Success('foo')
NonkeywordParsers.variable.parse('class').or_die()

@richardhundt
Author

richardhundt commented May 1, 2021

I see now why you want flatMap... however, with >> you can ignore the useless capture:

from typing import Generic

from parsita import *
from parsita.state import Input, Output, Convert, Reader, Continue, Backtrack


class NegationParser(Generic[Input, Output], Parser[Input, Output]):
    def __init__(self, parser: Parser[Input, Output]):
        super().__init__()
        self.parser = parser

    def consume(self, reader: Reader[Input]):
        status = self.parser.consume(reader)
        if isinstance(status, Backtrack):
            return Continue(reader, None)
        else:
            return Backtrack(reader, lambda: self.name_or_nothing())

    def name_or_nothing(self):
        return 'not ' + self.parser.name_or_nothing()

    def __repr__(self):
        return '-' + self.name_or_nothing()


def neg(parser):
    return NegationParser(parser)

class NonkeywordParsers(TextParsers):
    keyword = lit("def", "class", "in", "out")
    variable = neg(keyword) >> reg(r'[a-zA-Z_][a-zA-Z0-9_]*')

assert NonkeywordParsers.variable.parse('foo') == Success('foo')
NonkeywordParsers.variable.parse('class').or_die()

Alternatively, in the SequentialParser where you append status.value to the output array (line 407), can't you check for None?

EDIT: hmm, None checks won't work, because the sequence wants a list. Never mind.
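As a rough illustration of what "zero width" means for the negation idea above, here is a toy model in plain Python. None of the names (`literal`, `regex`, `not_followed_by`, `keep_right`) come from Parsita; they are assumptions made for this sketch, and whitespace handling is left out. A toy parser takes the text and a position and returns `(value, new_position)`, or `None` on failure.

```python
import re
from typing import Callable, Optional, Tuple

ParseFn = Callable[[str, int], Optional[Tuple[object, int]]]


def literal(*options: str) -> ParseFn:
    """Match the first option that is a prefix of the text at pos."""
    def parse(text: str, pos: int):
        for option in options:
            if text.startswith(option, pos):
                return (option, pos + len(option))
        return None
    return parse


def regex(pattern: str) -> ParseFn:
    """Match a regular expression starting exactly at pos."""
    compiled = re.compile(pattern)
    def parse(text: str, pos: int):
        m = compiled.match(text, pos)
        return (m.group(), m.end()) if m else None
    return parse


def not_followed_by(parser: ParseFn) -> ParseFn:
    """Negative lookahead: succeed without consuming if parser fails here."""
    def parse(text: str, pos: int):
        if parser(text, pos) is None:
            return (None, pos)  # success, position unchanged (zero width)
        return None             # the inner parser matched, so this fails
    return parse


def keep_right(first: ParseFn, second: ParseFn) -> ParseFn:
    """Run both parsers in order and keep only the second value."""
    def parse(text: str, pos: int):
        r1 = first(text, pos)
        if r1 is None:
            return None
        return second(text, r1[1])
    return parse


identifier = regex(r"[a-zA-Z_][a-zA-Z0-9_]*")

# Keyword with a word boundary, so "index" is not mistaken for "in".
keyword = regex(r"(?:def|class|in|out)\b")
variable = keep_right(not_followed_by(keyword), identifier)

# Prefix-only keyword: in this toy model it also rejects "index".
naive_keyword = literal("def", "class", "in", "out")
naive_variable = keep_right(not_followed_by(naive_keyword), identifier)
```

In this toy model a prefix-only keyword parser makes the lookahead reject any word that merely begins with a keyword. Whether the same applies to a real keyword parser depends on how it treats word boundaries, so it may be worth testing inputs such as `index` or `classy`.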
