Merge 22679cb into 187f2b9
eliotwrobson authored Sep 3, 2023
2 parents 187f2b9 + 22679cb commit 4b2a457
Showing 10 changed files with 457 additions and 20 deletions.
37 changes: 37 additions & 0 deletions .github/RELEASE_CHECKLIST.md
@@ -0,0 +1,37 @@
# Automata Release Checklist

If you are an Automata collaborator, this is a checklist you can follow to
properly publish a release to GitHub and PyPI.

- [ ] Before Release
- [ ] Run tests and coverage report (`coverage run -m nose2`)
- [ ] All tests pass
- [ ] Code coverage is over 90% for all files
- [ ] Update Migration Guide with details on breaking API changes and upgrade path
- [ ] Update README and Migration Guide with latest major release (e.g. v7)
- [ ] Write release notes for new release
- [ ] Check copyright line break in README (there should be two spaces after
the *Copyright <year> Caleb Evans* line; sometimes these can get removed
while editing the README, depending on your editor's settings)
- [ ] Check copyright year (the end year in the range should always be the
current year)
- [ ] Release
- [ ] Merge `develop` into `main`
- [ ] `git checkout main`
- [ ] `git pull`
- [ ] `git merge develop`
- [ ] Commit version bump in `pyproject.toml`
- [ ] Commit message must be `Prepare v<new_version> release`, e.g. `Prepare v8.0.0 release`
- [ ] Tag commit with new release number
- [ ] Tag name must be v-prefixed, followed by the semantic version, e.g.
`v8.0.0`
- [ ] Push new commit and tag with `git push && git push --tags`
- [ ] Post-Release
- [ ] Check [package page on PyPI](https://pypi.org/project/automata-lib/) to
ensure that new release is public
- [ ] Post new GitHub Release with release notes
- [ ] Rebase `develop` on top of latest `main`
- `git checkout develop`
- `git pull`
- `git rebase main`
- `git push`
27 changes: 27 additions & 0 deletions .github/workflows/draft-pdf.yml
@@ -0,0 +1,27 @@
on:
push:
paths: ["joss/**"]
pull_request:
paths: ["joss/**"]

jobs:
paper:
runs-on: ubuntu-latest
name: Paper Draft
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Build draft PDF
uses: openjournals/openjournals-draft-action@master
with:
journal: joss
# This should be the path to the paper within your repo.
paper-path: joss/paper.md
- name: Upload
uses: actions/upload-artifact@v1
with:
name: paper
# This is the output path where Pandoc will write the compiled
# PDF. Note, this should be the same directory as the input
# paper.md
path: joss/paper.pdf
6 changes: 4 additions & 2 deletions .github/workflows/publish.yml
@@ -12,6 +12,10 @@ on:
jobs:
publish:
runs-on: ubuntu-latest
environment: release
# Required for PyPI Trusted Publishers feature
permissions:
id-token: write

steps:
- uses: actions/checkout@v3
@@ -40,11 +44,9 @@ jobs:
- name: Publish distribution to Test PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
password: ${{ secrets.TEST_PYPI_API_TOKEN }}
repository_url: https://test.pypi.org/legacy/

- name: Publish distribution to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
password: ${{ secrets.PYPI_API_TOKEN }}
repository_url: https://upload.pypi.org/legacy/
61 changes: 47 additions & 14 deletions automata/fa/dfa.py
@@ -224,7 +224,7 @@ def __len__(self) -> int:
"""Returns the cardinality of the language represented by the DFA."""
return self.cardinality()

def to_partial(self) -> Self:
def to_partial(self, *, retain_names: bool = False, minify: bool = True) -> Self:
"""
Turns a DFA (complete or not) into a partial DFA.
Removes dead states and trap states (except the initial state)
@@ -242,6 +242,18 @@ def to_partial(self) -> Self:
new_states = live_states & non_trap_states
new_states.add(self.initial_state)

if minify:
# No need to alter transitions here, since unused entries in
# that dict are removed automatically by the minify call
return self.__class__._minify(
reachable_states=new_states,
input_symbols=self.input_symbols,
transitions=self.transitions,
initial_state=self.initial_state,
reachable_final_states=self.final_states & new_states,
retain_names=retain_names,
)

return self.__class__(
states=new_states,
input_symbols=self.input_symbols,
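The pruning step above combines a forward reachability pass from the initial state with a backward pass from the accepting states. It can be sketched with plain dicts and sets; the toy DFA and the `reachable` helper below are hypothetical illustrations, not the library's own code:

```python
from collections import defaultdict, deque

# Hypothetical toy DFA: 'trap' is a non-accepting sink that
# to_partial-style pruning should remove.
transitions = {
    'q0': {'0': 'q1', '1': 'trap'},
    'q1': {'0': 'q1', '1': 'q1'},
    'trap': {'0': 'trap', '1': 'trap'},
}
initial_state, final_states = 'q0', {'q1'}

def reachable(starts, edges):
    """Return all states reachable from `starts` via `edges` (BFS)."""
    seen, queue = set(starts), deque(starts)
    while queue:
        state = queue.popleft()
        for nxt in edges[state]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

forward = {s: set(row.values()) for s, row in transitions.items()}
backward = defaultdict(set)
for src, targets in forward.items():
    for dst in targets:
        backward[dst].add(src)

live_states = reachable({initial_state}, forward)    # reachable from the start
non_trap_states = reachable(final_states, backward)  # can still reach acceptance
new_states = (live_states & non_trap_states) | {initial_state}
```

Here `new_states` comes out as `{'q0', 'q1'}`: the `trap` sink is dropped, while the initial state is always retained, mirroring the `new_states.add(self.initial_state)` line in the diff.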
@@ -441,11 +453,25 @@ def minify(self, retain_names: bool = False) -> Self:
If False, new states will be named 0, ..., n-1.
"""

# Compute reachable states and final states
bfs_states = self.__class__._bfs_states(
self.initial_state, lambda state: iter(self.transitions[state].items())
)
reachable_states = set(bfs_states)
if self.allow_partial:
# In the case of a partial DFA, we want to try to condense
# possible trap states before the main minify operation.
graph = self._get_digraph()
live_states = nx.descendants(graph, self.initial_state) | {
self.initial_state
}
non_trap_states = set(self.final_states).union(
*(nx.ancestors(graph, state) for state in self.final_states)
)
reachable_states = live_states & non_trap_states
reachable_states.add(self.initial_state)
else:
# Compute reachable states and final states
bfs_states = self.__class__._bfs_states(
self.initial_state, lambda state: iter(self.transitions[state].items())
)
reachable_states = set(bfs_states)

reachable_final_states = self.final_states & reachable_states

return self.__class__._minify(
@@ -490,7 +516,11 @@ def _minify(
for start_state, path in transitions.items():
if start_state in reachable_states:
for symbol, end_state in path.items():
transition_back_map[symbol][end_state].append(start_state)
symbol_dict = transition_back_map[symbol]
# If statement needed here to ignore transitions into removed
# states when minifying a partial DFA.
if end_state in symbol_dict:
symbol_dict[end_state].append(start_state)

origin_dicts = tuple(transition_back_map.values())
processing = {final_states_id}
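The guarded construction of `transition_back_map` can be exercised in isolation with toy data (hypothetical values standing in for what `_minify` receives). The guard is what makes the same routine safe for partial DFAs: transitions into pruned states are simply skipped instead of raising a `KeyError`:

```python
# Hypothetical inputs: 'dead' was pruned earlier, so it is absent from
# reachable_states even though a transition still points at it.
input_symbols = {'a', 'b'}
reachable_states = {'q0', 'q1'}
transitions = {
    'q0': {'a': 'q1', 'b': 'dead'},
    'q1': {'a': 'q0', 'b': 'q1'},
}

# symbol -> end_state -> list of start states (a reverse transition index)
transition_back_map = {
    symbol: {state: [] for state in reachable_states}
    for symbol in input_symbols
}
for start_state, path in transitions.items():
    if start_state in reachable_states:
        for symbol, end_state in path.items():
            symbol_dict = transition_back_map[symbol]
            if end_state in symbol_dict:  # skip transitions into pruned states
                symbol_dict[end_state].append(start_state)
```

After this runs, `transition_back_map['b']` maps `'q1'` back to `['q1']` and leaves `'q0'` empty; the `'q0' --b--> 'dead'` edge is ignored.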
@@ -534,16 +564,19 @@ def _minify(
new_states = frozenset(back_map.values())
new_initial_state = back_map[initial_state]
new_final_states = frozenset(back_map[acc] for acc in reachable_final_states)
new_transitions = {
name: {
letter: back_map[transitions[next(iter(eq))][letter]]
for letter in transitions[next(iter(eq))].keys()
new_transitions = {}

for name, eq in eq_class_name_pairs:
eq_class_rep = next(iter(eq))
inner_transition_dict_old = transitions[eq_class_rep]
new_transitions[name] = {
letter: back_map[inner_transition_dict_old[letter]]
for letter in inner_transition_dict_old.keys()
if inner_transition_dict_old[letter] in reachable_states
}
for name, eq in eq_class_name_pairs
}

allow_partial = any(
len(lookup) != len(input_symbols) for lookup in transitions.values()
len(lookup) != len(input_symbols) for lookup in new_transitions.values()
)
return cls(
states=new_states,
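The final `allow_partial` check in the hunk above reduces to asking whether any state's transition row covers fewer symbols than the alphabet. A minimal sketch with toy data (not the library's own):

```python
input_symbols = {'0', '1'}
# Hypothetical minified transitions: 'q0' has no '1' entry.
new_transitions = {
    'q0': {'0': 'q1'},
    'q1': {'0': 'q1', '1': 'q1'},
}

# Mirrors the diff: the DFA is partial iff some state's row is
# missing at least one input symbol.
allow_partial = any(
    len(row) != len(input_symbols) for row in new_transitions.values()
)
```

Here `allow_partial` is `True`. Note that the updated diff computes this over `new_transitions` rather than the old `transitions`, so transitions dropped during minification count as missing.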
5 changes: 3 additions & 2 deletions docs/fa/class-dfa.md
@@ -90,10 +90,11 @@ else:
dfa.copy() # returns deep copy of dfa
```

## DFA.to_partial(self)
## DFA.to_partial(self, retain_names = False, minify = True)

Creates an equivalent partial DFA with all unnecessary transitions removed. If the DFA is
already partial, this simply returns a copy. If `minify` is `True`, the resulting DFA is
also minified; `retain_names` controls whether the original state names are kept during
minification.

```python
dfa.to_partial() # returns an equivalent partial DFA
Binary file added joss/finite_language_dfa.png
170 changes: 170 additions & 0 deletions joss/paper.bib
@@ -0,0 +1,170 @@
@InProceedings{Sutner03,
author={Sutner, Klaus},
editor={Champarnaud, Jean-Marc and Maurel, Denis},
title={automata, a Hybrid System for Computational Automata Theory},
booktitle={Implementation and Application of Automata},
year={2003},
publisher={Springer Berlin Heidelberg},
address={Berlin, Heidelberg},
pages={221--227},
abstract={We present a system that performs computations on finite state machines, syntactic semigroups, and one-dimensional cellular automata.},
isbn={978-3-540-44977-5},
url={https://doi.org/10.1007/3-540-44977-9_21},
doi={10.1007/3-540-44977-9_21},
}

@Misc{brics,
author = {Anders M\o{}ller},
title = {dk.brics.automaton -- Finite-State Automata
and Regular Expressions for {Java}},
url = {http://www.brics.dk/automaton/},
year = 2021
}

@article{AlmeidaMR10,
author = {Marco Almeida and
Nelma Moreira and
Rog{\'{e}}rio Reis},
title = {Testing the Equivalence of Regular Languages},
journal = {Journal of Automata, Languages and Combinatorics},
volume = {15},
number = {1/2},
pages = {7--25},
year = {2010},
url = {https://doi.org/10.25596/jalc-2010-007},
doi = {10.25596/jalc-2010-007},
timestamp = {Mon, 11 May 2020 22:57:06 +0200},
biburl = {https://dblp.org/rec/journals/jalc/AlmeidaMR10.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}

@inbook{mihov_schulz_2019,
place={Cambridge},
series={Cambridge Tracts in Theoretical Computer Science},
title={The Minimal Deterministic Finite-State Automaton for a Finite Language},
DOI={10.1017/9781108756945.011},
booktitle={Finite-State Techniques: Automata, Transducers and Bimachines},
publisher={Cambridge University Press},
author={Mihov, Stoyan and Schulz, Klaus U.},
year={2019},
pages={253--278},
collection={Cambridge Tracts in Theoretical Computer Science}
}

@book{AhoSU86,
author = {Aho, Alfred V. and Lam, Monica S. and Sethi, Ravi and Ullman, Jeffrey D.},
title = {Compilers: Principles, Techniques, and Tools (2nd Edition)},
year = {2006},
isbn = {0321486811},
publisher = {Addison-Wesley Longman Publishing Co., Inc.},
address = {USA},
pages = {152-155}
}

@incollection{Hopcroft71,
title = {AN n log n ALGORITHM FOR MINIMIZING STATES IN A FINITE AUTOMATON},
editor = {Zvi Kohavi and Azaria Paz},
booktitle = {Theory of Machines and Computations},
publisher = {Academic Press},
pages = {189-196},
year = {1971},
isbn = {978-0-12-417750-5},
doi = {10.1016/B978-0-12-417750-5.50022-1},
url = {https://www.sciencedirect.com/science/article/pii/B9780124177505500221},
author = {John Hopcroft},
}


@INPROCEEDINGS{Erickson23,
author = {Jeff Erickson and Jason Xia and Eliot Wong Robson and Tue Do and Aidan Tzur Glickman and Zhuofan Jia and Eric Jin and Jiwon Lee and Patrick Lin and Steven Pan and Samuel Ruggerio and Tomoko Sakurayama and Andrew Yin and Yael Gertner and Brad Solomon},
title = {Auto-graded Scaffolding Exercises For Theoretical Computer Science},
booktitle = {2023 ASEE Annual Conference \& Exposition},
year = {2023},
month = {June},
address = {Baltimore, Maryland},
publisher = {ASEE Conferences},
url = {https://peer.asee.org/42347}
}

@misc{Johnson_2010,
title={Nick’s Blog},
url={http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levenshtein-Automata},
author={Johnson, Nick},
year={2010},
month={Jul}
}

@article{Knuth77,
author = {Knuth, Donald E. and Morris, Jr., James H. and Pratt, Vaughan R.},
title = {Fast Pattern Matching in Strings},
journal = {SIAM Journal on Computing},
volume = {6},
number = {2},
pages = {323-350},
year = {1977},
doi = {10.1137/0206024},
URL = {https://doi.org/10.1137/0206024},
eprint = {https://doi.org/10.1137/0206024}
}

@book{Hopcroft06,
author = {Hopcroft, John E. and Motwani, Rajeev and Ullman, Jeffrey D.},
title = {Introduction to Automata Theory, Languages, and Computation (3rd Edition)},
year = {2006},
isbn = {0321455363},
publisher = {Addison-Wesley Longman Publishing Co., Inc.},
address = {USA}
}

@book{Sipser12,
series = {Introduction to the {Theory} of {Computation}},
title = {Introduction to the {Theory} of {Computation}},
isbn = {978-1-133-18781-3},
url = {https://books.google.com/books?id=4J1ZMAEACAAJ},
publisher = {Cengage Learning},
author = {Sipser, M.},
year = {2012},
lccn = {2012938665},
pages = {45-47}
}

@article{Marschall11,
title = {Construction of minimal deterministic finite automata from biological motifs},
volume = {412},
issn = {0304-3975},
url = {https://www.sciencedirect.com/science/article/pii/S0304397510006948},
doi = {10.1016/j.tcs.2010.12.003},
abstract = {Deterministic finite automata (DFAs) are constructed for various purposes in computational biology. Little attention, however, has been given to the efficient construction of minimal DFAs. In this article, we define simple non-deterministic finite automata (NFAs) and prove that the standard subset construction transforms NFAs of this type into minimal DFAs. Furthermore, we show how simple NFAs can be constructed from two types of pattern popular in bioinformatics, namely (sets of) generalized strings and (generalized) strings with a Hamming neighborhood.},
number = {8},
journal = {Theoretical Computer Science},
author = {Marschall, Tobias},
year = {2011},
keywords = {Consensus string, Deterministic finite automaton, Generalized string, Minimization, Motif},
pages = {922--930},
}

@article{Knuutila01,
title = {Re-describing an algorithm by {Hopcroft}},
volume = {250},
issn = {0304-3975},
url = {https://www.sciencedirect.com/science/article/pii/S0304397599001504},
doi = {10.1016/S0304-3975(99)00150-4},
abstract = {J. Hopcroft introduced already in 1970 an O(nlogn)-time algorithm for minimizing a finite deterministic automaton of n states. Although the existence of the algorithm is widely known, its theoretical justification, correctness and running time analysis are not. We give here a tutorial reconstruction of Hopcroft's algorithm focusing on a firm theoretical basis, clear correctness proofs and a well-founded computational analysis. Our analysis reveals that if the size of the input alphabet m is not fixed, then Hopcroft's original algorithm does not run in time O(mnlogn) as is commonly believed in the literature. The O(mnlogn) holds, however, for the variation presented later by D. Gries and for a new variant given in this article. We also propose a new efficient routine for refining the equivalence classes constructed in the algorithm and suggest a computationally sound heuristics as an enhancement.},
number = {1},
journal = {Theoretical Computer Science},
author = {Knuutila, Timo},
year = {2001},
keywords = {Algorithms, Finite automata, Minimization},
pages = {333--363},
}

@ARTICLE{Xu16,
author={Xu, Chengcheng and Chen, Shuhui and Su, Jinshu and Yiu, S. M. and Hui, Lucas C. K.},
journal={IEEE Communications Surveys \& Tutorials},
title={A Survey on Regular Expression Matching for Deep Packet Inspection: Applications, Algorithms, and Hardware Platforms},
year={2016},
volume={18},
number={4},
pages={2991-3029},
doi={10.1109/COMST.2016.2566669}
}
