Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow DFAs with ambiguous actions [RFC] #49

Merged
merged 7 commits into from
Nov 2, 2020

Conversation

jakobnissen
Copy link
Member

@jakobnissen jakobnissen commented May 26, 2020

See issue #48 . With this PR, attempting to create a DFA with ambiguous actions will result in an error.

Before:

machine = let
   A = re"XY"
   B = re"XZ"
   A.actions[:enter] = [:entering_A]
   Automa.compile(A | B)
end

Here, the machine, when seeing an X, won't know if it has to execute :entering_A or not. Previously, all possible actions were taken, even mututally exclusive ones, which could result in unexpected and self-contradicting behaviour, like exiting the same regex multiple times, or entering a regex and never exiting it (even at EOF).
With this commit, the above code raises an error:
ERROR: Ambiguous NFA: Byte 0x58 can lead to actions nothing or [:entering_A]

Note that you can still have ambiguous DFAs if the actions are not ambiguous, for example this code will still work:

machine = let
   A = re"XY"
   B = re"XZ"
   A.actions[:exit] = [:exiting_A]
   Automa.compile(A | B)
end

While the machine don't know if it is entering A or B, that doesn't matter because they lead to the same actions. And when exiting A, the machine knows for certain that it's not exiting B.

Edit: This change will cause multiple tests to fail, because several tests include ambiguous DFAs. I think the tests should be changed, but want comments for this.

@jakobnissen
Copy link
Member Author

jakobnissen commented May 26, 2020

Counterexample: This machine is ambiguous, but not caught by my PR:
Edit: Fixed in the latest commit. See below.

machine = let
   a = re"[a-z]+"
   a.actions[:exit] = [:one]
   b = re"[a-z][a-z0-9]+"
   b.actions[:exit] = [:two]
   Automa.compile(a | b)
end

It might be possible to catch these by changing the accumulate_actions function, but I can't spend time on that right now. I'll come back to it some other day

@jakobnissen
Copy link
Member Author

Alright, all edge cases I could think of has now been covered.

The ambiguity check can be skipped by passing unambiguous=false to either Automa.compile (where it's a keyword), or Automa.nfa2dfa (where it's a positional argument). However, hopefully, future developers will take a hard, long look at their parser if Automa yells at them that its actions can't be resolved, rather than just disabling the check.

Conflicting preconditions should resolve ambiguities, but I've not been able to actually construct conflicting preconditions, so I haven't been able to test it in Test19.jl.

This is ready to merge.

@kescobo
Copy link
Member

kescobo commented May 29, 2020

This is awesome, thanks for putting in the work! Sad to lose 100% coverage - maybe @bicycle1885 has ideas for how to test?

@jakobnissen
Copy link
Member Author

I can't access the codecov report @kescobo. If I get access, I'll fix it myself.

@jakobnissen
Copy link
Member Author

jakobnissen commented Sep 10, 2020

Turns out it's possible to create a loop of epsilon edges:

import Automa
const re = Automa.RegExp
Automa.re2nfa(re.rep(re.rep(re.any()) * re.opt(re.parse("A"))))

This leads to an infinite loop, so that needs to be fixed.

@jakobnissen
Copy link
Member Author

Infinite loop fixed, but that exposed a bug (or well, suboptimal behaviour) in the nfa2dfa function: #56

@TransGirlCodes TransGirlCodes merged commit 7c119fb into BioJulia:master Nov 2, 2020
CiaranOMara added a commit to BioJulia/XAM.jl that referenced this pull request Mar 26, 2021
CiaranOMara added a commit to BioJulia/XAM.jl that referenced this pull request Mar 26, 2021
CiaranOMara added a commit to BioJulia/XAM.jl that referenced this pull request Mar 26, 2021
CiaranOMara added a commit to BioJulia/XAM.jl that referenced this pull request Mar 26, 2021
CiaranOMara added a commit to BioJulia/XAM.jl that referenced this pull request Mar 26, 2021
CiaranOMara added a commit to BioJulia/XAM.jl that referenced this pull request Apr 1, 2021
CiaranOMara added a commit to BioJulia/XAM.jl that referenced this pull request Apr 1, 2021
CiaranOMara added a commit to BioJulia/XAM.jl that referenced this pull request Apr 1, 2021
Defines an unambiguous state machine based on BioJulia/Automa.jl#49.
CiaranOMara added a commit to BioJulia/XAM.jl that referenced this pull request Apr 1, 2021
Defines an unambiguous state machine.
BioJulia/Automa.jl#49.
@jakobnissen jakobnissen deleted the fixamb branch April 30, 2022 10:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants