This is a Python 3 project that uses nondeterministic finite automata (NFAs) to parse regular expressions. It was completed for a third-year software development graph theory assignment.
- main.py: Main Python file, run this to run the program
- nfa.py: Class representing an NFA (not specific to this project, could be used elsewhere)
- nfa_utils: Has functions for joining NFAs in various ways (e.g. concatenation, union, ...) and the recursive function that builds an NFA from a given regular expression string. Does most of the hard work of the project.
- tests.py: Has test cases for various functions. If all these tests pass, it's a good indication that the project is functioning properly.
- research.txt: Contains the research I did to help me with the project, including references, and also just some of my thoughts about how I was planning to tackle certain problems.
NFAs are represented as objects, in a way that is as close as possible to their mathematical definition (see nfa.py).
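As a rough illustration of what "close to the mathematical definition" can look like, here is a minimal sketch of an NFA as the usual 5-tuple (states, alphabet, transition function, start state, accepting states). The class layout, the field names and the EPSILON marker are assumptions made for this example; they are not copied from nfa.py.

```python
from dataclasses import dataclass

EPSILON = None  # assumed marker for an epsilon (empty-string) transition


@dataclass
class NFA:
    states: set        # Q
    alphabet: set      # Sigma
    transitions: dict  # delta: (state, symbol) -> set of next states
    start: object      # q0
    accepting: set     # F

    def _closure(self, states):
        """Return every state reachable from `states` using only epsilon moves."""
        todo, seen = list(states), set(states)
        while todo:
            state = todo.pop()
            for nxt in self.transitions.get((state, EPSILON), set()):
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        return seen

    def accepts(self, text):
        """Track the set of all states the NFA could be in after each character."""
        current = self._closure({self.start})
        for ch in text:
            moved = set()
            for state in current:
                moved |= self.transitions.get((state, ch), set())
            current = self._closure(moved)
        return bool(current & self.accepting)


# Hand-built NFA for the regex a.b* (an "a" followed by any number of "b"s)
nfa = NFA(
    states={0, 1, 2},
    alphabet={"a", "b"},
    transitions={(0, "a"): {1}, (1, EPSILON): {2}, (2, "b"): {2}},
    start=0,
    accepting={2},
)
```

With this sketch, nfa.accepts("abb") is true and nfa.accepts("b") is false.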
NFAs are built recursively using Thompson's construction.
The regular expression is never converted to postfix notation; the algorithm operates directly on infix notation. No explicit stack is used either; instead, the algorithm relies on the call stack.
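The idea can be sketched roughly as follows. This is not the code from nfa_utils; it is a simplified illustration that assumes concatenation is always written with an explicit dot, splits the infix string on its lowest-precedence operator, and lets the recursion (the call stack) do the rest. All names here (build, wrap, accepts, EPS) are made up for the example.

```python
from itertools import count

EPS = None      # assumed label for epsilon transitions
_ids = count()  # source of fresh state numbers

# A fragment is (start_state, accept_state, edges); each edge is (src, label, dst).


def symbol(ch):
    start, accept = next(_ids), next(_ids)
    return start, accept, [(start, ch, accept)]


def concat(x, y):
    # Link the accept state of x to the start state of y.
    return x[0], y[1], x[2] + y[2] + [(x[1], EPS, y[0])]


def union(x, y):
    start, accept = next(_ids), next(_ids)
    links = [(start, EPS, x[0]), (start, EPS, y[0]),
             (x[1], EPS, accept), (y[1], EPS, accept)]
    return start, accept, x[2] + y[2] + links


def wrap(frag, skip, loop):
    """New start/accept around frag; skip allows zero matches, loop allows repeats."""
    start, accept = next(_ids), next(_ids)
    edges = frag[2] + [(start, EPS, frag[0]), (frag[1], EPS, accept)]
    if skip:
        edges.append((start, EPS, accept))
    if loop:
        edges.append((frag[1], EPS, frag[0]))
    return start, accept, edges


UNARY = {"*": (True, True), "+": (False, True), "?": (True, False)}


def build(regex):
    """Recursively build a fragment from an infix regex with explicit dots."""
    if "|" in regex:  # lowest precedence, so it is split first
        left, right = regex.split("|", 1)
        return union(build(left), build(right))
    if "." in regex:
        left, right = regex.split(".", 1)
        return concat(build(left), build(right))
    if len(regex) > 1 and regex[-1] in UNARY:
        skip, loop = UNARY[regex[-1]]
        return wrap(build(regex[:-1]), skip, loop)
    if len(regex) == 1:
        return symbol(regex)
    raise ValueError(f"cannot build an NFA from {regex!r}")


def accepts(frag, text):
    """Simulate the fragment by tracking the set of reachable states."""
    start, accept, edges = frag

    def closure(states):
        todo, seen = list(states), set(states)
        while todo:
            q = todo.pop()
            for src, label, dst in edges:
                if src == q and label is EPS and dst not in seen:
                    seen.add(dst)
                    todo.append(dst)
        return seen

    current = closure({start})
    for ch in text:
        current = closure({dst for src, label, dst in edges
                           if src in current and label == ch})
    return accept in current
```

For example, accepts(build("c?.l.o.u.d"), "loud") is true under these assumptions. Implicit concatenation (writing python instead of p.y.t.h.o.n) is deliberately left out to keep the sketch short.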
You need Python 3 to run this project. Clone the repo and run python3 main.py from inside.
You may want to run python main.py instead, depending on your setup.
Follow the instructions provided when you run the program:
Set the regular expression by typing regex=(regex here)
Example: regex=011010
You can give input text to test against that regex by just typing it in by itself.
Also, you can type exit to exit the program.
When you create an NFA using regex=..., a breakdown/visualization of the recursive algorithm will be printed as the NFA is built.
When you give the program an input string, it reports whether that string is accepted or rejected by the currently set regex.
It also reports how long each NFA took to build and how long each acceptance check took.
Concatenation operator: . (dot character)
Union operator: | (pipe character)
Kleene star: * (asterisk)
One-or-more-of operator: + (plus)
Zero-or-one-of operator: ? (question mark)
Order: ? + * . |
Read left to right: '?' is applied first (highest precedence) and '|' is applied last (lowest precedence).
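To make the effect of this ordering visible, the small sketch below adds the parentheses that the precedence rules imply. The group function is hypothetical and not part of the project. It assumes concatenation is written with an explicit dot, and it treats a trailing ?, + or * as applying to everything before it once the binary operators have been split off, which is a simplification.

```python
def group(regex):
    """Return regex with the grouping implied by operator precedence made explicit."""
    for op in "|.":  # lowest precedence first: '|' is split before '.'
        if op in regex:
            left, right = regex.split(op, 1)
            return f"({group(left)}{op}{group(right)})"
    if len(regex) > 1 and regex[-1] in "*+?":
        # Unary operators bind tightest, so they end up innermost.
        return f"({group(regex[:-1])}{regex[-1]})"
    return regex


print(group("a|b.c*"))  # prints (a|(b.(c*)))
```

The star binds to c alone, the dot joins b to that group, and the pipe is applied last.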
Alphabet characters: anything that isn't recognized as a special character.
Internally, there are no preset alphabet characters. They are based on whatever input you give it.
p.y.t.h.o.n or python - Matches only the word "python"
python|java|C# - Matches any one of "python", "java" or "C#"
o+k then or o*ok then - Matches "ok then", "ooook then", "ooooooook then", etc...
c?loud - Matches "cloud" and "loud"
H?A?h?a?*!*|H?E?h?e?*!* - Accepts a wide range of laughs, including "Ha", "heh", "Haha", and "AAAAAAAAAAHAHAHAHAHA!!"
Run this command from within the project directory:
python3 -m unittest tests.TestNFA
You may want to run python -m unittest tests.TestNFA instead, depending on your setup.
Converting the regular expression to postfix notation first is probably a better idea, but I had more fun figuring it out on my own.
The main drawback of my approach seems to be that adding support for parentheses would be very messy and difficult.