Skip to content

Commit

Permalink
add explanatory comment
Browse files Browse the repository at this point in the history
  • Loading branch information
dmsnell committed Nov 27, 2018
1 parent fc58901 commit 910ba2d
Showing 1 changed file with 39 additions and 0 deletions.
39 changes: 39 additions & 0 deletions packages/block-serialization-default-parser/src/index.js
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,45 @@ let document;
let offset;
let output;
let stack;

/**
* Matches block comment delimiters
*
* While most of this pattern is straightforward the attribute parsing
* incorporates a tricks to make sure we don't choke on specific input
*
* - since JavaScript has no possessive quantifier or atomic grouping
* we are emulating it with a trick
*
* we want a possessive quantifier or atomic group to prevent backtracking
* on the `}`s should we fail to match the remainder of the pattern
*
* we can emulate this with a positive lookahead and back reference
* (a++)*c === ((?=(a+))\1)*c
*
* let's examine an example:
* - /(a+)*c/.test('aaaaaaaaaaaaad') fails after over 49,000 steps
* - /(a++)*c/.test('aaaaaaaaaaaaad') fails after 85 steps
* - /(?>a+)*c/.test('aaaaaaaaaaaaad') fails after 126 steps
*
* this is because the possessive `++` and the atomic group `(?>)`
* tell the engine that all those `a`s belong together as a single group
* and so it won't split it up when stepping backwards to try and match
*
* if we use /((?=(a+))\1)*c/ then we get the same behavior as the atomic group
* or possessive and prevent the backtracking because the `a+` is matched but
* not captured. thus, we find the long string of `a`s and remember it, then
* reference it as a whole unit inside our pattern
*
* @cite http://instanceof.me/post/52245507631/regex-emulate-atomic-grouping-with-lookahead
* @cite http://blog.stevenlevithan.com/archives/mimic-atomic-groups
* @cite https://javascript.info/regexp-infinite-backtracking-problem
*
* once browsers reliably support atomic grouping or possessive
* quantifiers natively we should remove this trick and simplify
*
* @type RegExp
*/
const tokenizer = /<!--\s+(\/)?wp:([a-z][a-z0-9_-]*\/)?([a-z][a-z0-9_-]*)\s+({(?:(?=([^}]+|}+(?=})|(?!}\s+\/?-->)[^])*)\5|[^]*?)}\s+)?(\/)?-->/g;

function Block( blockName, attrs, innerBlocks, innerHTML, innerContent ) {
Expand Down

0 comments on commit 910ba2d

Please sign in to comment.