Commit
Revamp finding splittable places in /i full node
Commits 3ae8ec4 and cc1ed63 didn't actually work. Tests in pat_advanced.t would have failed, except that optimizations made to the regex engine in the meantime meant the tests were no longer testing what they originally did. I believe that this finally gets it right for non-/l.

The problem is that when an EXACTFish node becomes full, you don't want to split across a multi-char fold. To use a fairly familiar example, we can't split between the two characters of 'ss', as that sequence matches a LATIN SMALL LETTER SHARP S, and the way the regex engine currently works, it can't see beyond the current node; it would see one or the other 's' but not the sequence.

So the code backs off one character and checks if it can split there. If not, it repeats until it finds such a place or gets to the beginning. If the node consists entirely of 's' characters, for example, there's no good place to split, so it gives up and takes all of them.

One thing I hadn't realized before is that with three-character folds, you can't split when the current position is the first of the three, and you also can't when it is the second of the three.
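The back-off described above can be modelled roughly as follows. This is only an illustrative Python sketch, not the code in the regex compiler: the names MULTI_CHAR_FOLDS, straddles_fold and find_split are hypothetical, the fold list is a small hand-picked subset of the Unicode multi-character folds, and the text is assumed to be already case-folded.

```python
# Hypothetical, illustrative subset of multi-character fold sequences
# (e.g. 'ss' for LATIN SMALL LETTER SHARP S, 'ffi' for the ffi ligature).
MULTI_CHAR_FOLDS = ("ss", "ff", "fi", "fl", "ffi", "ffl", "st")


def straddles_fold(text, pos, folds=MULTI_CHAR_FOLDS):
    """Return True if splitting into text[:pos] and text[pos:] cuts a fold."""
    for fold in folds:
        n = len(fold)
        # A fold starting at `start` is cut when start < pos < start + n.
        # For a three-character fold this covers both interior positions.
        for start in range(max(0, pos - n + 1), pos):
            if text.startswith(fold, start):
                return True
    return False


def find_split(text, max_len, folds=MULTI_CHAR_FOLDS):
    """Return how many leading characters of `text` go into the full node."""
    limit = min(max_len, len(text))
    if limit == len(text):
        return limit  # everything fits; nothing to split
    # Back off one character at a time from the full position.
    for pos in range(limit, 0, -1):
        if not straddles_fold(text, pos, folds):
            return pos
    # Reached the beginning without finding a safe place: give up
    # and take everything that fits.
    return limit
```

In this model, find_split("abcssdef", 4) backs off from position 4 to 3 so that 'ss' stays together, and find_split("ssss", 3) finds no safe position and falls back to taking the first 3 characters.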