Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

notFollowedBy is a no-op for attoparsec and yoctoparsec, say #85

Open
tscholak opened this issue Feb 20, 2021 · 5 comments
Open

notFollowedBy is a no-op for attoparsec and yoctoparsec, say #85

tscholak opened this issue Feb 20, 2021 · 5 comments

Comments

@tscholak
Copy link

tscholak commented Feb 20, 2021

Hi,
I've been studying this definition of the notFollowedBy combinator for attoparsec:

notFollowedBy p = optional p >>= maybe (pure ()) (unexpected . show)

I am now fairly convinced that this is a no-op, but I'd like to ask for some feedback for my reasoning from someone who knows the subject matter better than myself. To this end, please consider the following reduction steps:

  optional p >>= maybe (pure ()) (fail . show) -- `unexpected` is `fail`
~ optional p >>= maybe (pure ()) (const empty) -- `fail` should behave as `empty` here
~ Just <$> p <|> pure Nothing >>= maybe (pure ()) (const empty) -- inlining the definition of `optional`
~ (p >>= const empty) <|> pure () -- distributing the alternative
~ (p *> empty) <|> pure () -- using a simple fact about `const` and `>>=`
~ pure () -- if `p` succeeds, we've got `empty <|> pure ()` which reduces to `pure ()`. if `p` fails, we directly get `pure ()`.

which does nothing when used like this: p <* pure (). pure () always succeeds and consumes nothing. so this is equivalent to p.

Can someone please comment on this reasoning?

Thanks :)

@nwf
Copy link
Collaborator

nwf commented Feb 22, 2021

An easy way to check your reasoning is simply to perform substitution and see what happens on a test case. In this case, if I, at GHCi...

> :set -XOverloadedStrings
> import Control.Applicative
> import Data.Attoparsec.ByteString
> optional p = Just <$> p <|> pure Nothing

> notFollowedBy p = optional p >>= maybe (pure ()) (fail . show)
> parse (notFollowedBy $ word8 0x41) "A"
Fail "" [] "Failed reading: 65"

> notFollowedBy p = Just <$> p <|> pure Nothing >>= maybe (pure ()) (const empty)
> parse (notFollowedBy $ word8 0x41) "A"
Fail "" [] "Failed reading: empty"

> notFollowedBy p = (p >>= const empty) <|> pure ()
> parse (notFollowedBy $ word8 0x41) "A"
Done "A" ()

We can therefore see that the step you are labeling "distributing the alternative" changes behavior (as does the replacement of fail, but that's less interesting). Pondering this for a while, we see that this is because that rewrite has changed the outermost operator from >>= to <|>. That is, prior to "distribution", the parser commits to whether p has succeeded and then reacts, whereas, after "distribution", the parser is entertaining the possibility that p >>= const empty might backtrack (and it does, regardless of p's outcome).

@tscholak
Copy link
Author

Hi @nwf, thanks for this great response!

I've followed your steps on my end as well, and I understand that there is no law that would justify my "distribution" rewrite. To the contrary, this substitution is in general wrong, as illustrated by your example.

Still, I'm a bit puzzled, because I've been also working with yoctoparsec, https://hackage.haskell.org/package/yoctoparsec, for which the above definition of notFollowedBy does not work.

Please consider:

λ> import Control.Applicative
λ> import Control.Monad (replicateM)
λ> import Control.Monad.Yoctoparsec
λ> notFollowedBy p = optional p >>= maybe (pure ()) (const empty)
λ> char x = mfilter (== x) token
λ> parseString @[] (replicateM 2 token <* notFollowedBy (char 'b')) "aab"
[("aa","b")]

Here I'm using the [] monad for backtracking. Each element of the returned list is a successful parsing alternative. fst is the parsing result, and snd is the remainder. If notFollowedBy worked as it does for attoparsec, I would have expected the result to be []. For comparison:

λ> import Control.Applicative
λ> import Control.Monad (replicateM)
λ> import Data.Attoparsec.Text
λ> notFollowedBy p = optional p >>= maybe (pure ()) (const empty)
λ> parse (replicateM 2 anyChar <* notFollowedBy (char 'b')) "aab"
λ> Fail "" [] "Failed reading: empty"

It seems that, in yoctoparsec's case, the above definition of notFollowedBy is indeed a no-op.

@nwf
Copy link
Collaborator

nwf commented Feb 22, 2021

Ah ha. Apparently this ultimately, as noted by https://wiki.haskell.org/MonadPlus, comes down to "The precise set of rules that MonadPlus should obey is not agreed upon.", a truly unfortunate situation. Therein, you will see that [] obeys left distribution, and so yes, your reasoning is correct and this definition of notFollowedBy is a no-op. Attoparsec's Parser does not and so it works.

Given this state of affairs, it's not immediately clear to me that a general notFollowedBy is possible.

@tscholak
Copy link
Author

tscholak commented Feb 23, 2021

Hi @nwf, yes! That's exactly what's going on! Thanks for pointing this out.

I went back to the basics of parsing, backtracking, and logic programming, and I think I've found the right solution for yoctoparsec:

The logict package (https://hackage.haskell.org/package/logict-0.7.1.0) defines a combinator lnot (https://hackage.haskell.org/package/logict-0.7.1.0/docs/Control-Monad-Logic-Class.html#v:lnot) that does exactly what we need here:

If m succeeds with at least one value, lnot m fails. If m fails, then lnot m succeeds with the value ().

In order to use this combinator with yoctoparsec, I needed to define an instance for MonadLogic for FreeT f b, where b is the underlying backtracking monad, e.g. b ~ Logic or []. I've settled on the following definition for now:

instance (Functor f, MonadLogic b) => MonadLogic (FreeT f b) where
  -- msplit :: FreeT f b a -> FreeT f b (Maybe (a, FreeT f b a))
  msplit (FreeT b) = FreeT $ do
    r <- msplit b
    case r of
      Nothing -> pure . Pure $ Nothing
      Just (val, b') ->
        case val of
          Pure a -> pure . Pure $ Just (a, FreeT b')
          Free w -> pure . Free $ fmap msplit w

With this, I can now do:

λ> import Control.Applicative
λ> import Control.Monad (replicateM)
λ> import Control.Monad.Yoctoparsec
λ> import Control.Monad.Logic
λ> notFollowedBy p = lnot p 
λ> char x = mfilter (== x) token
λ> parseString @[] (replicateM 2 token <* notFollowedBy (char 'b')) "aab"
[]
λ> parseString @[] (replicateM 2 token <* notFollowedBy (char 'b')) "aac"
[("aa","")]

I'm quite happy with this. What do you think?

@tscholak
Copy link
Author

tscholak commented Feb 24, 2021

Update: The above implementation of MonadLogic (FreeT f b) is incorrect, because it throws away b' in the Free w case. Here's a better one:

instance (Applicative f, MonadLogic b) => MonadLogic (FreeT f b) where
  -- msplit :: FreeT f b a -> FreeT f b (Maybe (a, FreeT f b a))
  msplit (FreeT b) = FreeT $ go b []
    where
      go b ws = do
        r <- msplit b
        case r of
          Nothing -> pure $ case ws of
            [] -> Pure Nothing
            (w : ws) ->
              let go' fas [] = fas
                  go' fas (w : ws) = go' (liftA2 (:) w fas) ws
               in Free $ fmap (msplit . asum) (go' (fmap pure w) ws)
          Just (val, b') ->
            case val of
              Pure a -> pure . Pure $ Just (a, FreeT b')
              Free w -> go b' (w : ws)

With this:

λ> parseString @[] (lnot $ char 'b' <|> char 'c') "aca"
[((),"ca")]
λ> parseString @[] (lnot $ char 'b' <|> char 'c') "bca"
[]
λ> parseString @[] (lnot $ char 'b' <|> char 'c') "cca"
[]
λ> parseString @[] (lnot $ char 'b' <|> char 'c') "dca"
[((),"ca")]

Edit: I just realize that lnot consumes input. Not sure what can be done about that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants