Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the compiler produce minimal DFAs #91

Open
V0ldek opened this issue Jan 28, 2023 · 3 comments
Open

Make the compiler produce minimal DFAs #91

V0ldek opened this issue Jan 28, 2023 · 3 comments
Labels
acceptance: go ahead Reviewed, implementation can start area: performance Performance improvements help wanted External contributions welcome type: feature New feature or request
Milestone

Comments

@V0ldek
Copy link
Member

V0ldek commented Jan 28, 2023

Is your feature request related to a problem? Please describe.
After #7 the compiler no longer necessarily outputs a minimal DFA. It's still a correct DFA, just not minimal.

Describe the solution you'd like
The algorithm in query::automaton::minimizer needs to be rewritten to minimize the automaton.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Example query that produces a non-minimal automaton: $..a.*.*..b.*.*. In the resulting automaton below states 3 and 4 are equivalent.

garph

@V0ldek V0ldek added type: feature New feature or request help wanted External contributions welcome area: performance Performance improvements mod: compiler labels Jan 28, 2023
@github-actions github-actions bot added the acceptance: triage Waiting for owner's input label Jan 28, 2023
@github-actions
Copy link

Tagging @V0ldek for notifications

@V0ldek V0ldek added this to the Future milestone Jan 28, 2023
@github-actions github-actions bot added acceptance: go ahead Reviewed, implementation can start and removed acceptance: triage Waiting for owner's input labels Jan 28, 2023
@ZwerdDPU
Copy link
Contributor

ZwerdDPU commented Apr 8, 2023

Looking into this issue. So far:

  • I have created a unit test with this subject in mind, though it appears some of the other tests should regress once this change is made. Available for review at https://github.com/ZwerdDPU/rsonpath/tree/minimizer: rsonpath-lib::query::automaton::minimizer::tests::minimization_deduplicates_trivial_test.
    That test is intentionally failing for now.

  • This is the tree the validation will check which includes the notated minimzation of combining states 3&4, with corresponding node ids offset, for the purpose of making validating the result easier:
    #91 postminimization

It seems to me that the minimzation will only take place within checkpoint-created boundaries. Does this seem correct?

I intend to look further into this issue in the coming weeks.
Let me know what you think.

@V0ldek
Copy link
Member Author

V0ldek commented Apr 9, 2023

Current checkpoint-based code is basically a hack to produce "somewhat-minimal" automata. In particular it can be proven that without wildcards the checkpoints are enough for the result to be truly minimal (Lemma 5.1. in https://v0ldek.com/masters).

The property is that once you reach a checkpoint you can forget all the other states, which means that a checkpoint will always be a unique state in any resulting DFA. So minimizing between them should be enough.

BTW, you can make your life testing this easier with https://paperman.name/semigroup/. It's a tool for automata properties, and allows you to minimize them. If you run rsonpath -c -v <query> it will actually print the DFA in a format that can be directly accepted by the tool.

For example:

just r -c '$..a.*.*..b.*.*' -v

will output as one of the logs:

[rsonpath_lib::query::automaton] NFA: s0.a -> s0, s1;
s0.b -> s0;
s0.X -> s0;
s1.a -> s2;
s1.b -> s2;
s1.X -> s2;
s2.a -> s3;
s2.b -> s3;
s2.X -> s3;
s3.b -> s3, s4;
s3.a -> s3;
s3.X -> s3;
s4.a -> s5;
s4.b -> s5;
s4.X -> s5;
s5.a -> s6;
s5.b -> s6;
s5.X -> s6;

You can copy-paste it into semigroup to get this nice graph:

image

(By convention we mark the wildcard as capital X, since the tool only accepts letters)

So if you come up with an interesting case to test you can easily find the minimal automaton to verify the result.

@V0ldek V0ldek pinned this issue Apr 9, 2023
ZwerdDPU pushed a commit to ZwerdDPU/rsonpath that referenced this issue Apr 10, 2023
@V0ldek V0ldek unpinned this issue Jun 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acceptance: go ahead Reviewed, implementation can start area: performance Performance improvements help wanted External contributions welcome type: feature New feature or request
Projects
Development

No branches or pull requests

2 participants