Revamp finding splittable places in /i full node
Commits 3ae8ec4 and
cc1ed63 didn't actually work.

Tests in pat_advanced.t would have failed, except that intervening
optimizations in the regex engine meant the tests no longer exercised
what they originally did.

I believe that this finally gets it right for non-/l.

The problem is that when an EXACTFish node becomes full, you don't want to
split across a multi-char fold.  To use a fairly familiar example, we
can't split between 'ss', as that sequence matches a LATIN SMALL LETTER
SHARP S, and the way the regex engine currently works, it can't see
beyond the current node; it would see one or the other 's' but not the
sequence.  So the code backs off one character and checks if it can
split there.  If not, it repeats until it finds such a place or gets to
the beginning.  If the entire node is all 's'es, for example, there's no
good place to split.  So it gives up and takes all of them.

One thing I hadn't realized before is that when there are three-character
folds, you can't split not only if the current position is the beginning
of the three, but also if it is the second of the three.
khwilliamson committed Nov 16, 2019
1 parent 42d7c91 commit 4e4df05
Showing 2 changed files with 237 additions and 159 deletions.