notFollowedBy is a no-op for attoparsec and yoctoparsec, say #85

tscholak · 2021-02-20T14:13:04Z

Hi,
I've been studying this definition of the notFollowedBy combinator for attoparsec:

Line 456 in 9b86500

notFollowedBy p = optional p >>= maybe (pure ()) (unexpected . show)

I am now fairly convinced that this is a no-op, but I'd like to ask for some feedback for my reasoning from someone who knows the subject matter better than myself. To this end, please consider the following reduction steps:

  optional p >>= maybe (pure ()) (fail . show) -- `unexpected` is `fail`
~ optional p >>= maybe (pure ()) (const empty) -- `fail` should behave as `empty` here
~ Just <$> p <|> pure Nothing >>= maybe (pure ()) (const empty) -- inlining the definition of `optional`
~ (p >>= const empty) <|> pure () -- distributing the alternative
~ (p *> empty) <|> pure () -- using a simple fact about `const` and `>>=`
~ pure () -- if `p` succeeds, we've got `empty <|> pure ()` which reduces to `pure ()`. if `p` fails, we directly get `pure ()`.

which does nothing when used like this: p <* pure (). pure () always succeeds and consumes nothing. so this is equivalent to p.

Can someone please comment on this reasoning?

Thanks :)

The text was updated successfully, but these errors were encountered:

nwf · 2021-02-22T12:26:46Z

An easy way to check your reasoning is simply to perform substitution and see what happens on a test case. In this case, if I, at GHCi...

> :set -XOverloadedStrings
> import Control.Applicative
> import Data.Attoparsec.ByteString
> optional p = Just <$> p <|> pure Nothing

> notFollowedBy p = optional p >>= maybe (pure ()) (fail . show)
> parse (notFollowedBy $ word8 0x41) "A"
Fail "" [] "Failed reading: 65"

> notFollowedBy p = Just <$> p <|> pure Nothing >>= maybe (pure ()) (const empty)
> parse (notFollowedBy $ word8 0x41) "A"
Fail "" [] "Failed reading: empty"

> notFollowedBy p = (p >>= const empty) <|> pure ()
> parse (notFollowedBy $ word8 0x41) "A"
Done "A" ()

We can therefore see that the step you are labeling "distributing the alternative" changes behavior (as does the replacement of fail, but that's less interesting). Pondering this for a while, we see that this is because that rewrite has changed the outermost operator from >>= to <|>. That is, prior to "distribution", the parser commits to whether p has succeeded and then reacts, whereas, after "distribution", the parser is entertaining the possibility that p >>= const empty might backtrack (and it does, regardless of p's outcome).

tscholak · 2021-02-22T13:30:38Z

Hi @nwf, thanks for this great response!

I've followed your steps on my end as well, and I understand that there is no law that would justify my "distribution" rewrite. To the contrary, this substitution is in general wrong, as illustrated by your example.

Still, I'm a bit puzzled, because I've been also working with yoctoparsec, https://hackage.haskell.org/package/yoctoparsec, for which the above definition of notFollowedBy does not work.

Please consider:

λ> import Control.Applicative
λ> import Control.Monad (replicateM)
λ> import Control.Monad.Yoctoparsec
λ> notFollowedBy p = optional p >>= maybe (pure ()) (const empty)
λ> char x = mfilter (== x) token
λ> parseString @[] (replicateM 2 token <* notFollowedBy (char 'b')) "aab"
[("aa","b")]

Here I'm using the [] monad for backtracking. Each element of the returned list is a successful parsing alternative. fst is the parsing result, and snd is the remainder. If notFollowedBy worked as it does for attoparsec, I would have expected the result to be []. For comparison:

λ> import Control.Applicative
λ> import Control.Monad (replicateM)
λ> import Data.Attoparsec.Text
λ> notFollowedBy p = optional p >>= maybe (pure ()) (const empty)
λ> parse (replicateM 2 anyChar <* notFollowedBy (char 'b')) "aab"
λ> Fail "" [] "Failed reading: empty"

It seems that, in yoctoparsec's case, the above definition of notFollowedBy is indeed a no-op.

nwf · 2021-02-22T17:18:45Z

Ah ha. Apparently this ultimately, as noted by https://wiki.haskell.org/MonadPlus, comes down to "The precise set of rules that MonadPlus should obey is not agreed upon.", a truly unfortunate situation. Therein, you will see that [] obeys left distribution, and so yes, your reasoning is correct and this definition of notFollowedBy is a no-op. Attoparsec's Parser does not and so it works.

Given this state of affairs, it's not immediately clear to me that a general notFollowedBy is possible.

tscholak · 2021-02-23T21:50:31Z

Hi @nwf, yes! That's exactly what's going on! Thanks for pointing this out.

I went back to the basics of parsing, backtracking, and logic programming, and I think I've found the right solution for yoctoparsec:

The logict package (https://hackage.haskell.org/package/logict-0.7.1.0) defines a combinator lnot (https://hackage.haskell.org/package/logict-0.7.1.0/docs/Control-Monad-Logic-Class.html#v:lnot) that does exactly what we need here:

If m succeeds with at least one value, lnot m fails. If m fails, then lnot m succeeds with the value ().

In order to use this combinator with yoctoparsec, I needed to define an instance for MonadLogic for FreeT f b, where b is the underlying backtracking monad, e.g. b ~ Logic or []. I've settled on the following definition for now:

instance (Functor f, MonadLogic b) => MonadLogic (FreeT f b) where
  -- msplit :: FreeT f b a -> FreeT f b (Maybe (a, FreeT f b a))
  msplit (FreeT b) = FreeT $ do
    r <- msplit b
    case r of
      Nothing -> pure . Pure $ Nothing
      Just (val, b') ->
        case val of
          Pure a -> pure . Pure $ Just (a, FreeT b')
          Free w -> pure . Free $ fmap msplit w

With this, I can now do:

λ> import Control.Applicative
λ> import Control.Monad (replicateM)
λ> import Control.Monad.Yoctoparsec
λ> import Control.Monad.Logic
λ> notFollowedBy p = lnot p 
λ> char x = mfilter (== x) token
λ> parseString @[] (replicateM 2 token <* notFollowedBy (char 'b')) "aab"
[]
λ> parseString @[] (replicateM 2 token <* notFollowedBy (char 'b')) "aac"
[("aa","")]

I'm quite happy with this. What do you think?

tscholak · 2021-02-24T14:28:03Z

Update: The above implementation of MonadLogic (FreeT f b) is incorrect, because it throws away b' in the Free w case. Here's a better one:

instance (Applicative f, MonadLogic b) => MonadLogic (FreeT f b) where
  -- msplit :: FreeT f b a -> FreeT f b (Maybe (a, FreeT f b a))
  msplit (FreeT b) = FreeT $ go b []
    where
      go b ws = do
        r <- msplit b
        case r of
          Nothing -> pure $ case ws of
            [] -> Pure Nothing
            (w : ws) ->
              let go' fas [] = fas
                  go' fas (w : ws) = go' (liftA2 (:) w fas) ws
               in Free $ fmap (msplit . asum) (go' (fmap pure w) ws)
          Just (val, b') ->
            case val of
              Pure a -> pure . Pure $ Just (a, FreeT b')
              Free w -> go b' (w : ws)

With this:

λ> parseString @[] (lnot $ char 'b' <|> char 'c') "aca"
[((),"ca")]
λ> parseString @[] (lnot $ char 'b' <|> char 'c') "bca"
[]
λ> parseString @[] (lnot $ char 'b' <|> char 'c') "cca"
[]
λ> parseString @[] (lnot $ char 'b' <|> char 'c') "dca"
[((),"ca")]

Edit: I just realize that lnot consumes input. Not sure what can be done about that.

tscholak mentioned this issue Feb 21, 2021

Move notFollowedBy to LookAheadParsing? #76

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

notFollowedBy is a no-op for attoparsec and yoctoparsec, say #85

notFollowedBy is a no-op for attoparsec and yoctoparsec, say #85

tscholak commented Feb 20, 2021 •

edited

nwf commented Feb 22, 2021

tscholak commented Feb 22, 2021

nwf commented Feb 22, 2021

tscholak commented Feb 23, 2021 •

edited

tscholak commented Feb 24, 2021 •

edited

notFollowedBy is a no-op for attoparsec and yoctoparsec, say #85

notFollowedBy is a no-op for attoparsec and yoctoparsec, say #85

Comments

tscholak commented Feb 20, 2021 • edited

nwf commented Feb 22, 2021

tscholak commented Feb 22, 2021

nwf commented Feb 22, 2021

tscholak commented Feb 23, 2021 • edited

tscholak commented Feb 24, 2021 • edited

tscholak commented Feb 20, 2021 •

edited

tscholak commented Feb 23, 2021 •

edited

tscholak commented Feb 24, 2021 •

edited