hregex

Yet another toy regular expression engine, implemented in Haskell.

Build

$ stack build

Features

Supported syntax:

? The question mark indicates zero or one occurrences of the preceding element. For example, colou?r matches both "color" and "colour".

* The asterisk indicates zero or more occurrences of the preceding element. For example, ab*c matches "ac", "abc", "abbc", "abbbc", and so on.

+ The plus sign indicates one or more occurrences of the preceding element. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".

[] A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z].

[^] Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than "a", "b", or "c". [^a-z] matches any single character that is not a lowercase letter from "a" to "z". Likewise, literal characters and ranges can be mixed.

() Defines a marked subexpression, also called a block or capturing group. The string matched within the parentheses can be recalled later. For example, (m|f)ood matches "mood" and "food".
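To make the operators above concrete, here is a minimal sketch of a backtracking matcher over a small regex syntax tree. The `Regex` type, `step`, `matches`, and `lits` are hypothetical names chosen for illustration; they are not hregex's actual types or API, and hregex itself compiles to automata rather than backtracking.

```haskell
-- Hypothetical syntax tree for illustration; not hregex's real data type.
data Regex
  = Lit Char          -- a literal character
  | OneOf [Char]      -- [abc], with ranges already expanded
  | NoneOf [Char]     -- [^abc]
  | Seq Regex Regex   -- concatenation
  | Alt Regex Regex   -- r|s
  | Opt Regex         -- r?
  | Star Regex        -- r*
  | Plus Regex        -- r+

-- Every suffix that can remain after the regex matches a prefix of the input.
step :: Regex -> String -> [String]
step (Lit c) (x:xs) | c == x = [xs]
step (Lit _) _ = []
step (OneOf cs) (x:xs) | x `elem` cs = [xs]
step (OneOf _) _ = []
step (NoneOf cs) (x:xs) | x `notElem` cs = [xs]
step (NoneOf _) _ = []
step (Seq a b) s = concatMap (step b) (step a s)
step (Alt a b) s = step a s ++ step b s
step (Opt r) s = s : step r s
-- Only recurse when r consumed input, so the search always terminates.
step (Star r) s =
  s : [ s'' | s' <- step r s, length s' < length s, s'' <- step (Star r) s' ]
step (Plus r) s = step (Seq r (Star r)) s

-- The whole input must be consumed.
matches :: Regex -> String -> Bool
matches r s = any null (step r s)

-- Helper: a non-empty string of literals.
lits :: String -> Regex
lits = foldr1 Seq . map Lit

colour, abStarC, abPlusC, mixed :: Regex
colour  = lits "colo" `Seq` Opt (Lit 'u') `Seq` Lit 'r'  -- colou?r
abStarC = Lit 'a' `Seq` Star (Lit 'b') `Seq` Lit 'c'     -- ab*c
abPlusC = Lit 'a' `Seq` Plus (Lit 'b') `Seq` Lit 'c'     -- ab+c
mixed   = OneOf (['a'..'c'] ++ ['x'..'z'])               -- [a-cx-z]
```

With these definitions, `matches colour` accepts both "color" and "colour", `matches abStarC "ac"` holds, and `matches abPlusC "ac"` does not.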

Automata

NFA-test:

*Regex.RegexParser> runRegexParser nfa "(m|f)ood"
[(states            :fromList [0,1,2,3,4,5,6,7,8]
input chars       :fromList "dfmo"
transitions       :fromList [Edge 1 'm' 2,Edge 3 'f' 4,Edge 5 'o' 6,Edge 6 'o' 7,Edge 7 'd' 8,Epsilon 0 1,Epsilon 0 3,Epsilon 2 5,Epsilon 4 5]
initial state     :0
acceptable states :fromList [8],"")]

*Regex.RegexParser> runRegexParser nfa "[a-bc]"
[(states            :fromList [0,1,2,3,4,5,6,7,8,9]
input chars       :fromList "abc"
transitions       :fromList [Edge 1 'a' 2,Edge 4 'b' 5,Edge 6 'c' 7,Epsilon 0 1,Epsilon 0 3,Epsilon 2 9,Epsilon 3 4,Epsilon 3 6,Epsilon 5 8,Epsilon 7 8,Epsilon 8 9]
initial state     :0
acceptable states :fromList [9],"")]

*Regex.RegexParser> runRegexParser nfa "[^a-zA-Z]"
[(states            :fromList [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]
input chars       :fromList "0123456789"
transitions       :fromList [Edge 1 '0' 2,Edge 4 '1' 5,Edge 7 '2' 8,Edge 10 '3' 11,Edge 13 '4' 14,Edge 16 '5' 17,Edge 19 '6' 20,Edge 22 '7' 23,Edge 25 '8' 26,Edge 27 '9' 28,Epsilon 0 1,Epsilon 0 3,Epsilon 2 37,Epsilon 3 4,Epsilon 3 6,Epsilon 5 36,Epsilon 6 7,Epsilon 6 9,Epsilon 8 35,Epsilon 9 10,Epsilon 9 12,Epsilon 11 34,Epsilon 12 13,Epsilon 12 15,Epsilon 14 33,Epsilon 15 16,Epsilon 15 18,Epsilon 17 32,Epsilon 18 19,Epsilon 18 21,Epsilon 20 31,Epsilon 21 22,Epsilon 21 24,Epsilon 23 30,Epsilon 24 25,Epsilon 24 27,Epsilon 26 29,Epsilon 28 29,Epsilon 29 30,Epsilon 30 31,Epsilon 31 32,Epsilon 32 33,Epsilon 33 34,Epsilon 34 35,Epsilon 35 36,Epsilon 36 37]
initial state     :0
acceptable states :fromList [37],"")]
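As a rough illustration of how such an NFA can be run, the sketch below simulates the (m|f)ood automaton printed above by tracking the set of states that are reachable at each step. The `Transition` type mirrors the `Edge` and `Epsilon` constructors in the output, but its exact definition, along with the names `closure`, `move`, and `accepts`, is an assumption and not taken from hregex.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Assumed shape of a transition, mirroring the printed output.
data Transition = Edge Int Char Int | Epsilon Int Int

-- Transitions of the NFA for (m|f)ood, copied from the output above.
transitions :: [Transition]
transitions =
  [ Edge 1 'm' 2, Edge 3 'f' 4, Edge 5 'o' 6, Edge 6 'o' 7, Edge 7 'd' 8
  , Epsilon 0 1, Epsilon 0 3, Epsilon 2 5, Epsilon 4 5 ]

-- Add every state reachable through epsilon transitions alone.
closure :: Set Int -> Set Int
closure ss
  | next `Set.isSubsetOf` ss = ss
  | otherwise                = closure (Set.union ss next)
  where
    next = Set.fromList [ t | Epsilon f t <- transitions, f `Set.member` ss ]

-- Consume one character from every current state, then close over epsilons.
move :: Char -> Set Int -> Set Int
move c ss =
  closure (Set.fromList [ t | Edge f x t <- transitions, x == c, f `Set.member` ss ])

-- Start in state 0 and accept if state 8 is reachable once the input is consumed.
accepts :: String -> Bool
accepts input = 8 `Set.member` foldl (flip move) (closure (Set.singleton 0)) input
```

Here `accepts "mood"` and `accepts "food"` hold, while `accepts "mod"` does not, because no 'd' edge leaves state 6.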

DFA-test:

*Regex.RegexParser> runRegexParser dfa "(m|f)ood"
[(states            :fromList [fromList [0,1,3],fromList [2,5],fromList [4,5],fromList [6],fromList [7],fromList [8]]
input chars       :fromList "dfmo"
transitions       :fromList [Edge (fromList [0,1,3]) 'f' (fromList [4,5]),Edge (fromList [0,1,3]) 'm' (fromList [2,5]),Edge (fromList [2,5]) 'o' (fromList [6]),Edge (fromList [4,5]) 'o' (fromList [6]),Edge (fromList [6]) 'o' (fromList [7]),Edge (fromList [7]) 'd' (fromList [8])]
initial state     :fromList [0,1,3]
acceptable states :fromList [fromList [8]],"")]

*Regex.RegexParser> runRegexParser dfa "[a-bc]"
[(states            :fromList [fromList [0,1,3,4,6],fromList [2,9],fromList [5,8,9],fromList [7,8,9]]
input chars       :fromList "abc"
transitions       :fromList [Edge (fromList [0,1,3,4,6]) 'a' (fromList [2,9]),Edge (fromList [0,1,3,4,6]) 'b' (fromList [5,8,9]),Edge (fromList [0,1,3,4,6]) 'c' (fromList [7,8,9])]
initial state     :fromList [0,1,3,4,6]
acceptable states :fromList [fromList [2,9],fromList [5,8,9],fromList [7,8,9]],"")]

*Regex.RegexParser> runRegexParser dfa "[^a-zA-Z]"
[(states            :fromList [fromList [0,1,3,4,6,7,9,10,12,13,15,16,18,19,21,22,24,25,27],fromList [2,37],fromList [5,36,37],fromList [8,35,36,37],fromList [11,34,35,36,37],fromList [14,33,34,35,36,37],fromList [17,32,33,34,35,36,37],fromList [20,31,32,33,34,35,36,37],fromList [23,30,31,32,33,34,35,36,37],fromList [26,29,30,31,32,33,34,35,36,37],fromList [28,29,30,31,32,33,34,35,36,37]]
input chars       :fromList "0123456789"
transitions       :fromList [Edge (fromList [0,1,3,4,6,7,9,10,12,13,15,16,18,19,21,22,24,25,27]) '0' (fromList [2,37]),Edge (fromList [0,1,3,4,6,7,9,10,12,13,15,16,18,19,21,22,24,25,27]) '1' (fromList [5,36,37]),Edge (fromList [0,1,3,4,6,7,9,10,12,13,15,16,18,19,21,22,24,25,27]) '2' (fromList [8,35,36,37]),Edge (fromList [0,1,3,4,6,7,9,10,12,13,15,16,18,19,21,22,24,25,27]) '3' (fromList [11,34,35,36,37]),Edge (fromList [0,1,3,4,6,7,9,10,12,13,15,16,18,19,21,22,24,25,27]) '4' (fromList [14,33,34,35,36,37]),Edge (fromList [0,1,3,4,6,7,9,10,12,13,15,16,18,19,21,22,24,25,27]) '5' (fromList [17,32,33,34,35,36,37]),Edge (fromList [0,1,3,4,6,7,9,10,12,13,15,16,18,19,21,22,24,25,27]) '6' (fromList [20,31,32,33,34,35,36,37]),Edge (fromList [0,1,3,4,6,7,9,10,12,13,15,16,18,19,21,22,24,25,27]) '7' (fromList [23,30,31,32,33,34,35,36,37]),Edge (fromList [0,1,3,4,6,7,9,10,12,13,15,16,18,19,21,22,24,25,27]) '8' (fromList [26,29,30,31,32,33,34,35,36,37]),Edge (fromList [0,1,3,4,6,7,9,10,12,13,15,16,18,19,21,22,24,25,27]) '9' (fromList [28,29,30,31,32,33,34,35,36,37])]
initial state     :fromList [0,1,3,4,6,7,9,10,12,13,15,16,18,19,21,22,24,25,27]
acceptable states :fromList [fromList [2,37],fromList [5,36,37],fromList [8,35,36,37],fromList [11,34,35,36,37],fromList [14,33,34,35,36,37],fromList [17,32,33,34,35,36,37],fromList [20,31,32,33,34,35,36,37],fromList [23,30,31,32,33,34,35,36,37],fromList [26,29,30,31,32,33,34,35,36,37],fromList [28,29,30,31,32,33,34,35,36,37]],"")]
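The DFA states above are sets of NFA states, which is what the classic subset (powerset) construction produces. The following is a minimal sketch of that construction applied to the (m|f)ood NFA; whether hregex implements it this way is an assumption, and the `Transition` type and function names are illustrative only. Like the output above, the sketch leaves out the empty "dead" state.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Assumed shape of a transition, mirroring the printed output.
data Transition = Edge Int Char Int | Epsilon Int Int

-- NFA for (m|f)ood, copied from the NFA-test output.
nfa :: [Transition]
nfa =
  [ Edge 1 'm' 2, Edge 3 'f' 4, Edge 5 'o' 6, Edge 6 'o' 7, Edge 7 'd' 8
  , Epsilon 0 1, Epsilon 0 3, Epsilon 2 5, Epsilon 4 5 ]

-- Add every state reachable through epsilon transitions alone.
closure :: Set Int -> Set Int
closure ss
  | next `Set.isSubsetOf` ss = ss
  | otherwise                = closure (Set.union ss next)
  where
    next = Set.fromList [ t | Epsilon f t <- nfa, f `Set.member` ss ]

-- One DFA transition: follow character c from every NFA state in the set.
move :: Char -> Set Int -> Set Int
move c ss =
  closure (Set.fromList [ t | Edge f x t <- nfa, x == c, f `Set.member` ss ])

-- Worklist exploration of all reachable, non-empty state sets.
dfaStates :: Set (Set Int)
dfaStates = go Set.empty [closure (Set.singleton 0)]
  where
    alphabet = "dfmo"
    go seen [] = seen
    go seen (s:rest)
      | s `Set.member` seen = go seen rest
      | otherwise =
          let succs = [ t | c <- alphabet, let t = move c s, not (Set.null t) ]
          in go (Set.insert s seen) (succs ++ rest)

-- A DFA state accepts if it contains the NFA's accepting state 8.
accepting :: Set (Set Int)
accepting = Set.filter (Set.member 8) dfaStates
```

For this NFA, `dfaStates` contains the same six sets as the "states" line of the first DFA-test, and `accepting` contains only the set holding state 8.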

TODO

Most important

Improve the speed!

{m,n} Matches the preceding element at least m and not more than n times. For example, a{3,5} matches only "aaa", "aaaa", and "aaaaa". This is not found in a few older instances of regexes.
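One possible way to add bounded repetition without touching the automata is to desugar r{m,n} into m mandatory copies of r followed by n - m optional copies. The sketch below shows that idea on a tiny hypothetical syntax tree; the `Regex` type and `repeatBetween` are illustrative names, not part of hregex, and the code assumes 0 <= m <= n.

```haskell
-- Hypothetical syntax tree for illustration; not hregex's real data type.
data Regex
  = Empty             -- matches the empty string
  | Lit Char
  | Seq Regex Regex
  | Opt Regex         -- r?
  deriving (Eq, Show)

-- r{m,n}: m mandatory copies, then (n - m) optional copies.
-- Assumes 0 <= m <= n.
repeatBetween :: Int -> Int -> Regex -> Regex
repeatBetween m n r =
  foldr Seq Empty (replicate m r ++ replicate (n - m) (Opt r))

-- Every suffix that can remain after the regex matches a prefix of the input.
step :: Regex -> String -> [String]
step Empty s = [s]
step (Lit c) (x:xs) | c == x = [xs]
step (Lit _) _ = []
step (Seq a b) s = concatMap (step b) (step a s)
step (Opt r) s = s : step r s

-- The whole input must be consumed.
matches :: Regex -> String -> Bool
matches r s = any null (step r s)
```

With `repeatBetween 3 5 (Lit 'a')`, `matches` accepts "aaa", "aaaa", and "aaaaa" and rejects "aa" and "aaaaaa". The expansion grows linearly with n, which may matter given the speed item above.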

$ Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.
