zero width negative lookahead #23

richardhundt · 2021-04-30T17:33:52Z

Could we have a way to do a zero width negative lookahead?

The first thing I tried to do was define a set of reserved words, and then define identifiers as "words but not keywords".

In terms of syntax you could use unary minus, as it binds tightly. Something like:

keyword = lit("def", "class", "in", "out")
ident = -keyword & reg(r'[a-zA-Z_][a-zA-Z0-9_]*')

EDIT: I realise that I could do it using just regexes, but then word boundaries get a bit clunky, as I'd need to embed the whitespace handling in the regex, which isn't ideal.
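For comparison, the regex-only route from the EDIT can be sketched with Python's `re` negative-lookahead syntax `(?!...)`. The `\b` after the keyword group handles word boundaries. Without it, `define` or `input` would be rejected because they start with a keyword. This is a standalone sketch that uses only the standard library. `is_identifier` is a hypothetical helper, and the sketch does not deal with surrounding whitespace.

```python
import re

# Keyword list taken from the thread; \b stops "define" or "input" from
# being rejected just because they begin with a keyword.
KEYWORDS = ["def", "class", "in", "out"]
IDENT = re.compile(
    r"(?!(?:" + "|".join(map(re.escape, KEYWORDS)) + r")\b)[a-zA-Z_][a-zA-Z0-9_]*"
)


def is_identifier(text: str) -> bool:
    """Hypothetical helper: True if all of `text` is a non-keyword identifier."""
    return IDENT.fullmatch(text) is not None


print(is_identifier("foo"))     # True
print(is_identifier("class"))   # False
print(is_identifier("define"))  # True: "def" is only a prefix
print(is_identifier("in"))      # False
```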


drhagen · 2021-05-01T10:20:29Z

I started to write a long post about how I need to implement what Scala parser combinators call flatMap (or into), but then I realized that the predicate parser, pred, does exactly what you want.

from parsita import *

keywords = {"def", "class", "in", "out"}

class NonkeywordParsers(TextParsers):
    variable = pred(reg(r'[a-zA-Z_][a-zA-Z0-9_]*'), lambda x: x not in keywords, 'nonkeyword')

assert NonkeywordParsers.variable.parse('foo') == Success('foo')
NonkeywordParsers.variable.parse('class').or_die()
# parsita.state.ParseError: Expected nonkeyword but found 'class'
# Line 1, character 1

# class
# ^
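The `pred` semantics can be made concrete outside parsita with a toy model in plain Python. The model is an assumption for illustration, not parsita's API. A parser is a function from `(text, pos)` to `(value, new_pos)`, or to `None` on failure. `reg`, `pred` and `parse_all` are hypothetical helpers. Because the predicate sees the whole matched word, `define` is accepted even though it starts with `def`.

```python
import re
from typing import Callable, Optional, Tuple

# Toy model (not parsita): a parser maps (text, pos) to (value, new_pos),
# or to None when it fails.
Result = Optional[Tuple[object, int]]
ParserFn = Callable[[str, int], Result]


def reg(pattern: str) -> ParserFn:
    """Match a regex at the current position."""
    compiled = re.compile(pattern)

    def parse(text: str, pos: int) -> Result:
        m = compiled.match(text, pos)
        return (m.group(), m.end()) if m else None

    return parse


def pred(parser: ParserFn, predicate: Callable[[object], bool]) -> ParserFn:
    """Run `parser`, then fail unless its value satisfies `predicate`."""

    def parse(text: str, pos: int) -> Result:
        result = parser(text, pos)
        if result is None or not predicate(result[0]):
            return None
        return result

    return parse


def parse_all(parser: ParserFn, text: str):
    """Return the value if `parser` consumes all of `text`, else None."""
    result = parser(text, 0)
    if result is not None and result[1] == len(text):
        return result[0]
    return None


keywords = {"def", "class", "in", "out"}
variable = pred(reg(r"[a-zA-Z_][a-zA-Z0-9_]*"), lambda x: x not in keywords)

print(parse_all(variable, "foo"))     # foo
print(parse_all(variable, "class"))   # None
print(parse_all(variable, "define"))  # define
```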

I will probably still implement flatMap, which I'll likely call the transformation parser (>=), but it is not needed for this case.
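The flatMap idea can be sketched in the same kind of toy model. These are hypothetical helpers, not parsita's eventual `>=` or any other real API. The idea is that the second parser is chosen from the value the first one produced. This sketch shows two uses. The first is pred-style filtering. The second is a parse that depends on an earlier value: a digit that says how many characters follow.

```python
import re
from typing import Callable, Optional, Tuple

# Toy model (not parsita): a parser maps (text, pos) to (value, new_pos) or None.
Result = Optional[Tuple[object, int]]
ParserFn = Callable[[str, int], Result]


def reg(pattern: str) -> ParserFn:
    compiled = re.compile(pattern)

    def parse(text: str, pos: int) -> Result:
        m = compiled.match(text, pos)
        return (m.group(), m.end()) if m else None

    return parse


def success(value) -> ParserFn:
    """Always succeed with `value`, consuming nothing."""
    return lambda text, pos: (value, pos)


def failure() -> ParserFn:
    """Always fail."""
    return lambda text, pos: None


def flat_map(parser: ParserFn, make_next: Callable[[object], ParserFn]) -> ParserFn:
    """Run `parser`, build the next parser from its value, then run that one."""

    def parse(text: str, pos: int) -> Result:
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return make_next(value)(text, new_pos)

    return parse


keywords = {"def", "class", "in", "out"}

# Filtering, as pred does: reject the word if it is a keyword.
variable = flat_map(
    reg(r"[a-zA-Z_][a-zA-Z0-9_]*"),
    lambda word: failure() if word in keywords else success(word),
)

# Dependent parse: a digit says how many characters follow.
counted = flat_map(reg(r"[0-9]"), lambda n: reg(".{%d}" % int(n)))

print(variable("foo", 0))    # ('foo', 3)
print(variable("class", 0))  # None
print(counted("3abcde", 0))  # ('abc', 4)
```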

richardhundt · 2021-05-01T13:47:55Z

Thanks. I'm trying to generalize it, though, so that I can condition on a parser instance. I'd expect the following to work as well, but it quietly succeeds:

from parsita import *

class NonkeywordParsers(TextParsers):
    keyword = lit("class")
    variable = (keyword & failure("Unexpected keyword")) | reg(r'[a-zA-Z_][a-zA-Z0-9_]*')

assert NonkeywordParsers.variable.parse('foo') == Success('foo')
NonkeywordParsers.variable.parse('class').or_die()  # quietly succeeds instead of raising

I find that a little surprising.

richardhundt · 2021-05-01T13:52:48Z

What does work is this:

from parsita import *

class NonkeywordParsers(TextParsers):
    keyword = lit("def", "class", "in", "out")
    # Accept a word only if the keyword parser does not match the whole word
    variable = pred(
        reg(r'[a-zA-Z_][a-zA-Z0-9_]*'),
        lambda x: not isinstance(NonkeywordParsers.keyword.parse(x), Success),
        'nonkeyword',
    )

assert NonkeywordParsers.variable.parse('foo') == Success('foo')
NonkeywordParsers.variable.parse('class').or_die()

It's a bit ugly, though, referencing the keyword parser statically like that.

drhagen · 2021-05-01T13:53:16Z

> I find that a little surprising.

In this case, (keyword & failure("Unexpected keyword")) fails as you expected, but because it is followed by |, the parser then tries the next alternative, reg(r'[a-zA-Z_][a-zA-Z0-9_]*'), which succeeds.

richardhundt · 2021-05-01T13:53:47Z

So failure doesn't raise an exception?

EDIT: I get it: failure is just a signal to the parser that the match failed, not a parse error. Hmm... I want to fail hard there.

drhagen · 2021-05-01T13:54:40Z

> so failure doesn't raise an exception?

No, it is a Parser that always parses into a Failure.
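The reason the `failure` version quietly succeeds can be reproduced in a toy model. The `lit`, `seq`, `alt` and `failure` helpers below are hypothetical stand-ins, not parsita's implementation. A failure is just a return value (`None` here), so the alternation moves on to the next branch, and that branch accepts `class` as an ordinary word.

```python
import re
from typing import Callable, Optional, Tuple

# Toy model (not parsita): a parser maps (text, pos) to (value, new_pos) or None.
Result = Optional[Tuple[object, int]]
ParserFn = Callable[[str, int], Result]


def lit(*words: str) -> ParserFn:
    """Match the first of `words` found at the current position."""
    def parse(text: str, pos: int) -> Result:
        for word in words:
            if text.startswith(word, pos):
                return (word, pos + len(word))
        return None
    return parse


def reg(pattern: str) -> ParserFn:
    compiled = re.compile(pattern)
    def parse(text: str, pos: int) -> Result:
        m = compiled.match(text, pos)
        return (m.group(), m.end()) if m else None
    return parse


def failure() -> ParserFn:
    """A parser that always returns a failure; it never raises."""
    return lambda text, pos: None


def seq(*parsers: ParserFn) -> ParserFn:
    """Run parsers in order; fail if any of them fails."""
    def parse(text: str, pos: int) -> Result:
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return (values, pos)
    return parse


def alt(*parsers: ParserFn) -> ParserFn:
    """Return the first alternative that succeeds, backtracking on failure."""
    def parse(text: str, pos: int) -> Result:
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


keyword = lit("class")
variable = alt(seq(keyword, failure()), reg(r"[a-zA-Z_][a-zA-Z0-9_]*"))

print(variable("foo", 0))    # ('foo', 3)
print(variable("class", 0))  # ('class', 5): first branch failed, second matched
```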

drhagen · 2021-05-01T14:06:22Z

> I want to fail hard there

Parsita does not really have a concept of "This is not just a failure, but getting here is a catastrophic failure that should cause all alternatives to be ignored".

I guess you could do this if you wanted to blow up the parser:

from parsita import *

def abort(message):
    # Raising escapes the alternation entirely, unlike returning a Failure
    raise ParseError(message)

class NonkeywordParsers(TextParsers):
    keyword = lit("class")
    variable = (keyword > (lambda x: abort('Unexpected keyword'))) | reg(r'[a-zA-Z_][a-zA-Z0-9_]*')

assert NonkeywordParsers.variable.parse('foo') == Success('foo')
NonkeywordParsers.variable.parse('class').or_die()
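The exception-based escape hatch can be shown in the same toy model. The helpers are hypothetical, not parsita's, and `convert` plays the role that `>` plays above. Unlike a returned failure, a raised exception is not caught by `alt`, so it aborts the whole parse. The keyword regex here uses `\b` so that words like `classy` do not trigger the abort. That boundary is an addition made for this sketch.

```python
import re
from typing import Callable, Optional, Tuple

# Toy model (not parsita): a parser maps (text, pos) to (value, new_pos) or None.
Result = Optional[Tuple[object, int]]
ParserFn = Callable[[str, int], Result]


class ParseAbort(Exception):
    """Hypothetical 'hard' failure that alternatives do not catch."""


def reg(pattern: str) -> ParserFn:
    compiled = re.compile(pattern)
    def parse(text: str, pos: int) -> Result:
        m = compiled.match(text, pos)
        return (m.group(), m.end()) if m else None
    return parse


def convert(parser: ParserFn, fn: Callable[[object], object]) -> ParserFn:
    """Apply `fn` to the value of a successful parse."""
    def parse(text: str, pos: int) -> Result:
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return (fn(value), new_pos)
    return parse


def alt(*parsers: ParserFn) -> ParserFn:
    """First successful alternative; exceptions propagate untouched."""
    def parse(text: str, pos: int) -> Result:
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def abort(message: str):
    raise ParseAbort(message)


keyword = reg(r"(?:def|class|in|out)\b")
variable = alt(
    convert(keyword, lambda _: abort("Unexpected keyword")),
    reg(r"[a-zA-Z_][a-zA-Z0-9_]*"),
)

print(variable("classy", 0))  # ('classy', 6)
try:
    variable("class", 0)
except ParseAbort as e:
    print("aborted:", e)      # aborted: Unexpected keyword
```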

richardhundt · 2021-05-01T14:59:39Z

I see now why you want flatMap. However, with >> you can discard the useless capture:

from typing import Generic

from parsita import *
from parsita.state import Input, Output, Reader, Continue, Backtrack


class NegationParser(Generic[Input, Output], Parser[Input, Output]):
    # Zero-width negative lookahead: succeeds without consuming input
    # only when the wrapped parser fails at the current position
    def __init__(self, parser: Parser[Input, Output]):
        super().__init__()
        self.parser = parser

    def consume(self, reader: Reader[Input]):
        status = self.parser.consume(reader)
        if isinstance(status, Backtrack):
            # Wrapped parser failed: succeed, returning the original reader
            return Continue(reader, None)
        else:
            # Wrapped parser matched: fail at the original position
            return Backtrack(reader, lambda: self.name_or_nothing())

    def name_or_nothing(self):
        return 'not ' + self.parser.name_or_nothing()

    def __repr__(self):
        return '-' + self.name_or_nothing()


def neg(parser):
    return NegationParser(parser)

class NonkeywordParsers(TextParsers):
    keyword = lit("def", "class", "in", "out")
    # >> discards the None produced by neg(...) and keeps the identifier
    variable = neg(keyword) >> reg(r'[a-zA-Z_][a-zA-Z0-9_]*')

assert NonkeywordParsers.variable.parse('foo') == Success('foo')
NonkeywordParsers.variable.parse('class').or_die()

Alternatively, in the SequentialParser, where you append status.value to the output list (line 407), couldn't you check for None?

EDIT: Hmm, None checks won't work, because the sequence wants a list. Never mind.
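The zero-width behaviour of `NegationParser` can also be checked in a toy model. The `lit`, `reg`, `neg` and `discard_left` helpers are hypothetical, and `discard_left` stands in for `>>`. The sketch shows one pitfall. A keyword parser built from plain literals matches prefixes, so in this model `neg` rejects `input` because it starts with `in`. Adding a word boundary to the keyword parser avoids this. Whether parsita's `lit` behaves the same way is worth checking rather than assuming.

```python
import re
from typing import Callable, Optional, Tuple

# Toy model (not parsita): a parser maps (text, pos) to (value, new_pos) or None.
Result = Optional[Tuple[object, int]]
ParserFn = Callable[[str, int], Result]


def lit(*words: str) -> ParserFn:
    def parse(text: str, pos: int) -> Result:
        for word in words:
            if text.startswith(word, pos):
                return (word, pos + len(word))
        return None
    return parse


def reg(pattern: str) -> ParserFn:
    compiled = re.compile(pattern)
    def parse(text: str, pos: int) -> Result:
        m = compiled.match(text, pos)
        return (m.group(), m.end()) if m else None
    return parse


def neg(parser: ParserFn) -> ParserFn:
    """Zero-width negative lookahead: succeed without consuming iff `parser` fails."""
    def parse(text: str, pos: int) -> Result:
        if parser(text, pos) is None:
            return (None, pos)  # position unchanged
        return None
    return parse


def discard_left(first: ParserFn, second: ParserFn) -> ParserFn:
    """Run both in sequence and keep only the second value."""
    def parse(text: str, pos: int) -> Result:
        result = first(text, pos)
        if result is None:
            return None
        return second(text, result[1])
    return parse


def parse_all(parser: ParserFn, text: str):
    result = parser(text, 0)
    return result[0] if result is not None and result[1] == len(text) else None


word = reg(r"[a-zA-Z_][a-zA-Z0-9_]*")
naive = discard_left(neg(lit("def", "class", "in", "out")), word)
bounded = discard_left(neg(reg(r"(?:def|class|in|out)\b")), word)

print(parse_all(naive, "foo"), parse_all(naive, "class"))     # foo None
print(parse_all(naive, "input"))                              # None: prefix "in" matched
print(parse_all(bounded, "input"), parse_all(bounded, "in"))  # input None
```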

drhagen mentioned this issue in "Add falible conversion parser" #24 (closed), May 1, 2021

richardhundt closed this as completed May 1, 2021
