Emacs regex syntax is similar to PCRE, so we could use the fancy-regex crate (which implements a backtracking engine) once #84 is fixed. However, there are still several differences that would need to be handled.
meta characters
Emacs regex meta characters are escaped the opposite way from most regex flavors. For example, ( and ) are literal parens, while \( and \) delimit a capture group. Likewise, | is literal and \| is alternation. This is easy enough to fix by pre-processing the regex.
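As a rough illustration of that pre-processing step, here is a minimal sketch. The function name `emacs_to_pcre` is hypothetical, it only covers the parens and alternation mentioned above, and it assumes bracket expressions ([...]) need no special treatment, which is not true in every case.

```rust
/// Hypothetical helper: swap the escaping of `(`, `)` and `|` so that an
/// Emacs-style pattern reads like a PCRE-style one.
/// Assumption: bracket expressions and other Emacs-only constructs are
/// not handled here and are passed through unchanged.
fn emacs_to_pcre(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        match c {
            '\\' => match chars.next() {
                // Escaped in Emacs means special: emit the bare meta character.
                Some(m @ ('(' | ')' | '|')) => out.push(m),
                // Any other escape sequence is copied as is.
                Some(other) => {
                    out.push('\\');
                    out.push(other);
                }
                // A trailing backslash is kept unchanged.
                None => out.push('\\'),
            },
            // Bare in Emacs means literal: escape it for the target engine.
            '(' | ')' | '|' => {
                out.push('\\');
                out.push(c);
            }
            _ => out.push(c),
        }
    }
    out
}
```

With this sketch, the Emacs pattern \(a\|b\) would be rewritten to (a|b), and a literal f(x) would become f\(x\).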
syntax aware matches
Several of the regex patterns match on the syntax definition of characters.
\w: word character
\sC: character whose syntax class is C
"Word" and "symbol" are defined by the major mode's syntax table. You could transform these into general character classes ([...]) for the Rust regex engine.
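One way that transformation might look, as a sketch only: the `SyntaxTable` type and its fields below are made up for illustration and do not reflect how a real syntax table is stored. The idea is simply to turn the set of characters a mode treats as word constituents into a [...] class string.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a major mode's syntax table.
struct SyntaxTable {
    word: BTreeSet<char>,
    symbol: BTreeSet<char>,
}

/// Build a bracket expression matching exactly the given characters.
/// Assumption: the set is non-empty (an empty `[]` is not a valid class).
fn char_class(chars: &BTreeSet<char>) -> String {
    let mut class = String::from("[");
    for &c in chars {
        // Escape characters that are special inside a bracket expression.
        if matches!(c, '\\' | '[' | ']' | '^' | '-') {
            class.push('\\');
        }
        class.push(c);
    }
    class.push(']');
    class
}

/// Replacement text for `\w` under the given table.
fn word_class(table: &SyntaxTable) -> String {
    char_class(&table.word)
}

/// Replacement text for word-or-symbol constituents under the given table.
fn symbol_class(table: &SyntaxTable) -> String {
    let all: BTreeSet<char> = table.word.union(&table.symbol).copied().collect();
    char_class(&all)
}
```

A real table would cover far more characters, so emitting ranges instead of individual characters would likely be needed to keep the generated pattern small.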
There is also the special construct \=, which matches at point. To handle this you could split the buffer into two parts: before point and after point. Then match each half separately.
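A sketch of how the split could be set up, under the assumption (not stated above) that the pattern is also split at \= so each half of the pattern can be matched against its half of the buffer. Both function names are hypothetical, and `point` is assumed to be a byte offset on a character boundary.

```rust
/// Hypothetical helper: split a pattern at the first unescaped `\=`.
/// Returns `None` if the pattern does not contain `\=`.
fn split_pattern_at_point(pattern: &str) -> Option<(&str, &str)> {
    let bytes = pattern.as_bytes();
    let mut i = 0;
    while i + 1 < bytes.len() {
        if bytes[i] == b'\\' {
            if bytes[i + 1] == b'=' {
                return Some((&pattern[..i], &pattern[i + 2..]));
            }
            // Skip the escaped character so `\\=` is not mistaken for `\=`.
            i += 2;
        } else {
            i += 1;
        }
    }
    None
}

/// Split the buffer text into the part before point and the part after it.
fn split_buffer_at_point(text: &str, point: usize) -> (&str, &str) {
    text.split_at(point)
}
```

The first pattern half would then need to match anchored at the end of the text before point, and the second half anchored at the start of the text after point.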
boundaries
Emacs defines regex constructs for the boundaries of words and symbols.
\<: beginning of word
\>: end of word
\_<: beginning of symbol
\_>: end of symbol
These will need to be implemented with look-arounds. They can't be built into the regex engine itself, because the definitions of word and symbol can change per major mode.
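A sketch of the look-around approach, assuming the current mode's word (or symbol) constituents are already available as a bracket expression such as [a-z] and that the target engine supports look-ahead and look-behind. The function names are hypothetical.

```rust
/// Replacement for `\<` (or `\_<`): the previous character is not a
/// constituent and the next character is.
fn boundary_start(class: &str) -> String {
    format!("(?<!{class})(?={class})")
}

/// Replacement for `\>` (or `\_>`): the previous character is a
/// constituent and the next character is not.
fn boundary_end(class: &str) -> String {
    format!("(?<={class})(?!{class})")
}
```

Because the class is passed in as a string, the same two functions would serve words and symbols, and the pattern can be regenerated whenever the major mode changes.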
Buffer Gap
Most performance-oriented regex libraries expect to operate on contiguous data. However, a gap buffer will have a gap of garbage data somewhere in the buffer. This becomes a problem when the span of the regex search crosses the gap. The simplest solution is to move the gap outside the range of the search. This could cause performance issues if the lines are really long, since moving the gap means copying the text between its old and new positions. We also have to consider how to match multiline regex; I'm not sure of a good way to handle that. Here are some notes from the remacs project.
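To make the "move the gap out of the way" idea concrete, here is a minimal sketch. The `GapBuffer` type and its methods are illustrative only and are not the buffer implementation used by this project. The gap is moved only when the requested range straddles it.

```rust
/// Illustrative byte-based gap buffer: the gap occupies the physical
/// range `gap_start..gap_end` inside `data`.
struct GapBuffer {
    data: Vec<u8>,
    gap_start: usize,
    gap_end: usize,
}

impl GapBuffer {
    /// Create a buffer holding `text`, with a gap of `gap` bytes at the end.
    fn new(text: &str, gap: usize) -> Self {
        let mut data = text.as_bytes().to_vec();
        let gap_start = data.len();
        data.resize(gap_start + gap, 0);
        Self { data, gap_start, gap_end: gap_start + gap }
    }

    /// Length of the text, not counting the gap.
    fn len(&self) -> usize {
        self.data.len() - (self.gap_end - self.gap_start)
    }

    /// Move the gap so that it starts at logical position `pos`.
    fn move_gap(&mut self, pos: usize) {
        assert!(pos <= self.len());
        let gap_len = self.gap_end - self.gap_start;
        if pos < self.gap_start {
            // Shift the text in pos..gap_start right, to the end of the gap.
            self.data.copy_within(pos..self.gap_start, pos + gap_len);
        } else if pos > self.gap_start {
            // Shift the text just after the gap left, into the gap's start.
            let n = pos - self.gap_start;
            self.data.copy_within(self.gap_end..self.gap_end + n, self.gap_start);
        }
        self.gap_start = pos;
        self.gap_end = pos + gap_len;
    }

    /// Return the logical range `start..end` as one contiguous slice,
    /// moving the gap only if the range crosses it.
    fn contiguous(&mut self, start: usize, end: usize) -> &[u8] {
        assert!(start <= end && end <= self.len());
        if start < self.gap_start && end > self.gap_start {
            self.move_gap(end);
        }
        if end <= self.gap_start {
            &self.data[start..end]
        } else {
            let gap_len = self.gap_end - self.gap_start;
            &self.data[start + gap_len..end + gap_len]
        }
    }
}
```

The cost of `move_gap` is proportional to the distance the gap travels, which is where the long-line concern above comes from. A search would call `contiguous` for its range and hand the resulting slice to the regex engine.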