Parser: Propose new hand-coded parser #8083

dmsnell · 2018-07-20T13:57:20Z

For some time we've needed a more performant PHP parser for the first
stage of parsing the post_content document.

See #1681 (early exploration)
See #8044 (parser performance issue)
See #1775 (parser performance, fixed in php-pegjs)

I'm proposing this implementation of the spec parser as an alternative
to the auto-generated parser from the PEG definition.

Updates

This now also includes a JavaScript version of the parser, whose performance is also quite good.
The files have been moved into the /packages directory. I still need some help understanding where it all belongs and how to make the package work.

This provides a setup fixture for #6831 wherein we are testing alternate
parser implementations - https://comparator-yizlfvqafz.now.sh

Distinctives

- designed as a basic recursive-descent parser, but it doesn't recurse on the call stack; it recurses via a trampoline
- moves linearly through the document in one pass
- relies on RegExp for tokenization
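To make "recursion via trampoline" concrete, here is a minimal JavaScript sketch. It is not the package's code: the delimiter RegExp is a hypothetical simplification that only recognizes openers, closers, and void delimiters without attributes. Nesting lives in an explicit `stack` array, and a single `proceed()` step is called in a loop until it reports that no tokens remain, so deeply nested input never deepens the JavaScript call stack.

```javascript
// Hypothetical, simplified delimiter pattern (no attributes, no namespaces):
// group 1 = closer slash, group 2 = block name, group 3 = void slash.
const TOKEN = /<!--\s+(\/)?wp:([a-z][a-z0-9_-]*)\s+(\/)?-->/g;

function parseSketch(doc) {
  const output = []; // finished top-level blocks
  const stack = []; // open blocks still waiting for their closer
  TOKEN.lastIndex = 0; // the RegExp cursor only moves forward: one linear pass

  // Attach a finished block to its parent, or to the output at the top level.
  function emit(block) {
    if (stack.length > 0) {
      stack[stack.length - 1].innerBlocks.push(block);
    } else {
      output.push(block);
    }
  }

  // One step of the parse: consume a single token, report whether to continue.
  function proceed() {
    const match = TOKEN.exec(doc);
    if (match === null) {
      return false; // no more tokens (this sketch simply drops unclosed blocks)
    }
    const [, closer, name, isVoid] = match;
    if (closer) {
      const block = stack.pop();
      if (!block || block.blockName !== name) {
        throw new Error(`unexpected closer for ${name}`);
      }
      emit(block);
    } else if (isVoid) {
      emit({ blockName: name, innerBlocks: [] });
    } else {
      stack.push({ blockName: name, innerBlocks: [] });
    }
    return true;
  }

  // The trampoline: each step returns to this loop instead of recursing.
  while (proceed()) {
    // keep stepping
  }
  return output;
}
```

Because depth is tracked in data rather than in stack frames, a document with tens of thousands of nested blocks parses without a stack overflow.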

Note: I expect us to discover implementation bugs during the initial rollout of this parser. We have run it through our document library and unit tests, but real posts surely contain more complicated constructions. We can deal with these as they come, but we should expect them.

Todo

- nested blocks include the nested content in their innerHTML; this needs to go away
- create test fixture - https://comparator-yizlfvqafz.now.sh
- figure out where to save this file
- phpunit tests

Benchmark

For posterity's sake, I ran the merged parser through the parser comparator and compared it against the auto-generated spec parser. Here are the results from my laptop:

| Document | Spec (ms) | Default (ms) | Speedup | Spec (MB) | Default (MB) | Default/Spec memory |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| demo-post.html | 29.58 | 0.23 | 130 | 38.56 | 16.43 | 43% |
| early-adopting-the-future.html | 263.83 | 1.01 | 262 | 36.84 | 17.10 | 46% |
| moby-dick-parsed.html | 5012.13 | 11.55 | 434 | 75.41 | 25.18 | 33% |
| pygmalian-raw-html.html | 330.35 | 0.24 | 1366 | 116.72 | 16.90 | 14% |
| redesigning-chrome-desktop.html | 211.42 | 1.22 | 173 | 37.22 | 16.51 | 44% |
| shortcode-shortcomings.html | 71.28 | 0.36 | 198 | 34.07 | 16.98 | 50% |
| web-at-maximum-fps.html | 161.35 | 0.87 | 186 | 33.12 | 16.32 | 49% |
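As a reading aid, the sketch below shows how the two derived columns appear to relate to the raw measurements: Speedup looks like spec time divided by default time, and the last column like default memory as a percentage of spec memory. That reading is inferred from the numbers, not stated by the comparator. Because the published means are rounded, recomputed speedups can drift slightly from the table (29.58 / 0.23 is about 129, not 130).

```javascript
// Derived benchmark columns, as inferred from the table (an assumption).
function derive({ specMs, defaultMs, specMb, defaultMb }) {
  return {
    speedup: Math.round(specMs / defaultMs), // how many times faster the default parser ran
    memoryPercent: Math.round((defaultMb / specMb) * 100), // default memory as % of spec memory
  };
}

const mobyDick = derive({ specMs: 5012.13, defaultMs: 11.55, specMb: 75.41, defaultMb: 25.18 });
console.log(mobyDick.speedup, mobyDick.memoryPercent); // 434 33
```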

The tests were done on my late-2013 rMBP (quad-core, 2.6 GHz). According to Intel Power Gadget, the CPU ran at 3.6 GHz the entire time. Each document was parsed by each parser at least 47 times; runs were performed in random order, and each run parsed the document between one and five times in a row (chosen at random) before returning its results. Runtime and memory use were measured inside a runner script running in Docker, as described in the parser comparator.
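The run schedule described above can be sketched as follows. This illustrates the methodology only and is not the comparator's runner script: `buildSchedule`, `timeRun`, and the injectable `rng` parameter are hypothetical names, timing uses Node's `process.hrtime.bigint()`, and memory measurement is left out.

```javascript
// Build a randomized run order: every (document, parser) pair appears
// `minRuns` times, and each run parses the document 1 to 5 times in a row.
function buildSchedule(documents, parsers, minRuns, rng = Math.random) {
  const runs = [];
  for (const doc of documents) {
    for (const parser of parsers) {
      for (let i = 0; i < minRuns; i++) {
        runs.push({ doc, parser, reps: 1 + Math.floor(rng() * 5) });
      }
    }
  }
  // Fisher-Yates shuffle so the runs execute in random order.
  for (let i = runs.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [runs[i], runs[j]] = [runs[j], runs[i]];
  }
  return runs;
}

// Time one run with Node's monotonic clock; returns mean milliseconds per parse.
function timeRun(parse, input, reps) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < reps; i++) {
    parse(input);
  }
  return Number(process.hrtime.bigint() - start) / 1e6 / reps;
}
```

Shuffling the whole schedule, rather than running each parser in a block, keeps warm-up and thermal effects from systematically favoring whichever parser runs first.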

dmsnell · 2018-07-20T13:59:15Z

I'm pretty sure that the next steps from here involve pondering the data structure of the stack. We have enough working knowledge now to know what we need to track and how we can pop that from the stack to the output.
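One way to picture "what we need to track" is a stack frame per open block. The sketch below is hypothetical: the field names are loosely modeled on the `leading_html_start` field that appears later in review, and `makeFrame` and `popFrame` are illustrative helpers, not the parser's API. Each frame remembers where its opener sat and where the last consumed piece ended, so popping it can slice the block's own HTML out of the document and hand the finished block to its parent or to the output.

```javascript
// A hypothetical frame for one open block on the parser's stack.
function makeFrame(blockName, tokenStart, tokenLength, leadingHtmlStart = null) {
  return {
    block: { blockName, innerHTML: '', innerBlocks: [] },
    tokenStart, // offset where the opening delimiter begins
    tokenLength, // length of the opening delimiter
    prevOffset: tokenStart + tokenLength, // end of the last thing consumed inside this block
    leadingHtmlStart, // start of freeform HTML before a top-level block, or null
  };
}

// Pop the top frame when its closer is found, then hand the finished block to
// its parent (or to the output). Returns the offset just past the closer.
function popFrame(doc, stack, output, closerStart, closerLength) {
  const frame = stack.pop();
  frame.block.innerHTML += doc.slice(frame.prevOffset, closerStart);
  const end = closerStart + closerLength;

  if (stack.length > 0) {
    const parent = stack[stack.length - 1];
    // The parent keeps only its own HTML, not the child's markup.
    parent.block.innerHTML += doc.slice(parent.prevOffset, frame.tokenStart);
    parent.block.innerBlocks.push(frame.block);
    parent.prevOffset = end;
  } else {
    if (frame.leadingHtmlStart !== null) {
      // Freeform HTML that preceded this top-level block.
      output.push({
        blockName: null,
        innerHTML: doc.slice(frame.leadingHtmlStart, frame.tokenStart),
        innerBlocks: [],
      });
    }
    output.push(frame.block);
  }
  return end;
}
```

With offsets tracked this way, a parent's innerHTML excludes its children's markup.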

Done

pento

Noice! Let's get this in sooner rather than later, so we can make inroads on the things depending on having a faster parser. 🙂

I've left some comments, here are a few random notes that have occurred to me, as well:

- It feels a little weird to be putting the PHP parser on NPM, but we don't really use Packagist at all, sooo... 🤷‍♂️ Let's stick with NPM for now, we can potentially explore doing Packagist/composer things later.
- phpcs.xml.dist needs to be updated to scan the new PHP code. I mentioned a couple of coding standards issues in the comments, but PHPCS should pick up the rest.
- Combined with switching the parser in gutenberg_parse_blocks(), phpunit/class-parsing-test.php should be updated to use gutenberg_parse_blocks(), rather than Gutenberg_PEG_Parser.

With this performance improvement, it seems like we could change do_blocks() to parse the content, instead of using the dynamic blocks regex.

pento · 2018-08-26T01:34:58Z

packages/block-serialization-default-parser/README.md

@@ -0,0 +1,107 @@
+# Block Serialization Default Parser
+
+This library contains the default block serialization parser implementations for


You'll need to remove the manual line breaks from the README: we use the Jetpack Markdown parser, which adds a <br /> for single line breaks.

this makes me want to cry since it's something I love about markdown, and it's consistent across every other markdown parser I've used.

The implication of the “one or more consecutive lines of text” rule is that Markdown supports “hard-wrapped” text paragraphs. This differs significantly from most other text-to-HTML formatters (including Movable Type’s “Convert Line Breaks” option) which translate every line break character in a paragraph into a <br /> tag.

When you do want to insert a break tag using Markdown, you end a line with two or more spaces, then type return.

Yes, this takes a tad more effort to create a <br />, but a simplistic “every line break is a <br />” rule wouldn’t work for Markdown. Markdown’s email-style blockquoting and multi-paragraph list items work best — and look better — when you format them with hard breaks.
https://daringfireball.net/projects/markdown/syntax#p

nonetheless, I have destroyed my markdown to make it happy in ee72314cc

😢

pento · 2018-08-26T01:42:57Z

packages/block-serialization-default-parser/parser.php

@@ -0,0 +1,260 @@
+<?php
+
+function bsdp_parse($document ) {


Instead of adding a new _parse() function, can gutenberg_parse_blocks() be updated to use the new parser? We can add a filter in there for easier switching between classes: e.g., existing filters in Core that filter a class name: wp_rest_server_class, customize_dynamic_setting_class.

block_parser_class works for me.

see related comment response below.

I'm having some trouble understanding what you wrote, @pento. I hope we create a filter to select the parsing function, but won't that depend somewhat on having unique names for each possible parse function?

also, are wp_rest_server_class and customize_dynamic_setting_class in any way related here? are you suggesting we create a class interface for the block parser class?

in lib/blocks.php I had originally envisioned something like this…

$parser = apply_filters( 'block_parser_class', 'bsdp_parse' );
call_user_func( $parser, $post_content );

I guess you are recommending this instead?

$parser_class = apply_filters( 'block_parser_class', 'bsdp' );
$parser = new $parser_class();
$parser->parse( $post_content );

experimented in 064efa58d but I haven't tested it yet

for what it's worth I'd be more comfortable getting this parser in first before making the parser system pluggable just because of the scope of the changes

pento · 2018-08-26T01:48:23Z

packages/block-serialization-default-parser/parser.php

+    static $parser;
+
+    if ( ! isset( $parser ) ) {
+        $parser = new BSDP_Parser();


I'm not wild about the BSDP_ prefix. I get why it's there, but perhaps it could be a little more descriptive?

Agreed. Block_Parser()?

mainly this is there to prevent namespace collisions. my hope is that a few PRs after this we'll have a filter choose the parser and obviously if we create two or more Block_Parser() classes we'll run into conflicts.

any thoughts on that? even with an encapsulating class we run into some issues here because I don't think we can create a class within a class. otherwise, I think the only way around it is actual namespacing, which isn't supported on older PHP versions…

Realistically, is there going to be a completely new parser appear between now and 5.0? It seems like this parser is going to be the one that will go into Core.

If that's the case, we should just use a generic name. WP_Block_Parser will fit into the WordPress naming scheme.

pento · 2018-08-26T01:48:50Z

packages/block-serialization-default-parser/parser.php

+
+        switch ( $token_type ) {
+            case 'no-more-tokens':
+                # if not in a block then flush output


Need to use // for single inline comments.

double-slashed it in ee72314cc

pento · 2018-08-26T01:49:38Z

packages/block-serialization-default-parser/parser.php

+                    return false;
+                }
+
+                # Otherwise we have a problem


Block inline comments should be in the form:

/*
 * blah
 *
 * - foo
 * - bar
 */

exploded comments in ee72314cc

pento · 2018-08-26T02:03:12Z

packages/block-serialization-default-parser/README.md

+# Block Serialization Default Parser
+
+This library contains the default block serialization parser implementations for
+WordPress documents. It provides native PHP and Javascript parsers that implement


s/Javascript/JavaScript/ 🙂

substituted in ee72314cc

gziolo · 2018-08-27T10:04:54Z

packages/block-serialization-default-parser/package.json

@@ -0,0 +1,25 @@
+{
+  "name": "@wordpress/block-serialization-default-parser",
+  "version": "1.0.0",


I would put 1.0.0-rc.0 or something like that to allow Lerna to do its job: it always bumps the version, so otherwise it would try to do a 1.0.1 release.

campaigned for release in 8c7e42c

gziolo · 2018-08-27T10:06:18Z

webpack.config.js

@@ -88,6 +88,7 @@ const gutenbergPackages = [
 	'autop',
 	'blob',
 	'blocks',
+	'block-serialization-default-parser',
 	'block-serialization-spec-parser',


Can we stop bundling the other one if we don't use it in Gutenberg anymore?

a good question. I don't want to kill the PEG parser since that maintains the spec in a way no hand-written implementation can.

in my comparator PRs I'm trying to move towards a system that will automatically run the implementations against the specification in something like a CI job so that we can have our formal specification without worrying about the implementation diverging (for example, if someone makes a change to the implementation without changing the spec first)

that is, I think we want to keep the spec-parser wherever we need it. mainly I think we want to strip it from the default load of Gutenberg, but as for whether we build it, what do you think?

The package with transpiled code is going to be there anyway. It's really up to you and how you want to use it. If you are fine with referencing it as a regular npm package then you don't need it. If you want to consume it as part of e2e test or something which requires all Gutenberg build files then you can leave it as is. I just wanted to raise the awareness.

thanks - this is mainly just out of my expertise at this point. if you are willing to make a decision on it or can tell me what we should do then that would help me out.

it seems like several people want these parser tests to be written with jest and somehow in the normal suite - I don't know what that means here for this decision

gziolo · 2018-08-27T10:07:21Z

lib/client-assets.php

@@ -369,6 +376,7 @@ function gutenberg_register_scripts_and_styles() {
 		array(
 			'wp-autop',
 			'wp-blob',
+			'wp-block-serialization-default-parser',
 			'wp-block-serialization-spec-parser',


I think we no longer need to list wp-block-serialization-spec-parser as a dependency. In addition, we should stop registering it, too.

agreed on this one but I wasn't entirely sure how we wanted this to work…

do we want Gutenberg to automatically replace the spec parser with the "default" one at boot through a filter or do we want the "default" to be the default?

I want the auto-generated parser to be available still, especially for things like diagnostics and exploration.

As commented above, it all depends on the way you want to use it. I don't have any strong opinions about it. We should just ensure we don't ship unused code to the end users.

Do we have a decision here?

I left the spec parser registered but un-enqueued it in 66455b4

gziolo · 2018-08-27T10:09:36Z

lib/blocks.php

+	 *
+	 * @param string $parser_class Name of block parser class
+	 */
+	$parser_class = apply_filters( 'block_parser_class', 'BDSP_Parser' );


We should document it in the extensibility docs. Probably, the main document would be the best fit: https://github.com/WordPress/gutenberg/blob/master/docs/extensibility.md.

documented in 8c7e42c

This still reads BDSP. :)

another great catch - fixed in 66455b4

gziolo · 2018-08-27T10:11:24Z

packages/blocks/src/api/parser.js

@@ -378,6 +378,6 @@ const createParse = ( parseImplementation ) =>
 *
 * @return {Array} Block list.
 */
-export const parseWithGrammar = createParse( grammarParse );
+export const parseWithGrammar = createParse( defaultParse );


Should we offer a filter for JS implementation, too?

yes but I wasn't sure if this PR was the right one for it. that is, filtering out the PHP side seemed somewhat straightforward while filtering the JS side seemed more complicated since we have to take into account things like loading the parser bundles and making sure they are available before the editor loads

do you think we need to do it all here in this PR?

It's totally fine as its own PR, I just wanted to ensure we tackle both PHP and JS side of things.
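For the JS side, one possible shape is a filter that returns the parse function to use, mirroring `block_parser_class` on the PHP side. The sketch below is hypothetical throughout: it uses a tiny self-contained filter registry instead of `@wordpress/hooks`, and the hook name `blocks.parserImplementation` is invented for illustration.

```javascript
// Tiny stand-in for a hooks library (hypothetical, for illustration only).
const registry = new Map();

function addFilter(hookName, callback) {
  const callbacks = registry.get(hookName) || [];
  callbacks.push(callback);
  registry.set(hookName, callbacks);
}

function applyFilters(hookName, value) {
  return (registry.get(hookName) || []).reduce((current, callback) => callback(current), value);
}

// Stand-ins for real parser implementations.
const defaultParse = (document) => [{ blockName: null, innerHTML: document }];
const emptyParse = () => [];

// Resolve the implementation at call time so a plugin can swap it.
function parse(document) {
  const implementation = applyFilters('blocks.parserImplementation', defaultParse);
  return implementation(document);
}

console.log(parse('<p>hi</p>').length); // 1
addFilter('blocks.parserImplementation', () => emptyParse);
console.log(parse('<p>hi</p>').length); // 0
```

Resolving at call time assumes the chosen parser is already loaded, which is exactly the bundle-loading concern raised above; a real implementation would need to guarantee that before the editor parses anything.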

gziolo · 2018-08-27T13:31:50Z

docs/extensibility/parser.md

+    return 'EmptyParser';
+}
+
+add_filter( 'block_parser_class', select_empty_parser, 10, 1 );


I think we provide the name of the function as a string in other examples to ensure it works with PHP 5.2. We might also want to prefix the function name with the plugin name:

add_filter( 'block_parser_class', 'my_plugin_select_empty_parser', 10, 1 );

good catch! I never meant to leave out the string - just neglected it - updated in 96ecfb8

gziolo · 2018-08-27T13:32:47Z

8c7e42c looks great, I left one comment which is a tiny thing that affects only PHP 5.2...

mcsf · 2018-08-28T09:48:03Z

packages/block-serialization-default-parser/parser.php

+}
+
+function bdsp_select_parser( $prev_parse_class ) {
+    return 'BSDP_Parser';


There's a typo at BSDP. Anyway, given that the apply_filters call in gutenberg_parse_blocks defaults to 'BDSP_Parser', we should remove this bit.

good catch! I removed the function in 9c85a60

mcsf · 2018-08-28T18:31:31Z

I'm getting a tokenization bug while testing with a personal post. Digging…

mcsf · 2018-08-28T18:33:41Z

packages/block-serialization-default-parser/src/index.js

+	const namespace = namespaceMatch || 'core/';
+	const name = namespace + nameMatch;
+	const hasAttrs = !! attrsMatch;
+	const attrs = hasAttrs ? JSON.parse( attrsMatch ) : null;


I know there's a performance hit with try, but we should play it safe with JSON.parse, or generally speaking make sure we can inform the user of bad input and recover (e.g. isolate bad blocks) as best as possible. Thoughts, @dmsnell?

There's no longer the famed V8 deoptimization with try / catch

https://github.com/petkaantonov/bluebird/wiki/Optimization-killers#2-unsupported-syntax
v8/v8@9aac80f

added the try in 9c85a60 but left it out of the PHP, since in PHP json_decode already returns null on a failed parse
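A minimal sketch of the JS guard, using a hypothetical helper name `parseAttrs`: malformed attribute JSON yields `null` instead of throwing, loosely mirroring PHP's `json_decode` returning `null` on failure. Whether to return `null`, log the input, or mark the block invalid is a policy choice this sketch does not settle.

```javascript
// Parse the attribute JSON captured from a block delimiter without throwing.
// Returns null when there are no attributes or the JSON is malformed.
function parseAttrs(attrsMatch) {
  if (!attrsMatch) {
    return null;
  }
  try {
    return JSON.parse(attrsMatch);
  } catch (error) {
    return null; // e.g. a SyntaxError from an over-greedy tokenizer match
  }
}

console.log(parseAttrs('{"ref":313}').ref); // 313
console.log(parseAttrs('{"ref":313} /-->')); // null
```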

mcsf · 2018-08-29T10:04:25Z

@dmsnell: I've pushed a failing test for the parser. The gist of it is that I think the tokenizer is too greedy when looking for the end of an attributes group ({"some":"json"}). Thus, a document with two self-closing attribute-equipped blocks, not necessarily consecutive, breaks the parser:

<!-- wp:block {"ref":313} /-->
<!-- wp:block {"ref":482} /-->

This makes the parser throw a syntax error in the JSON.parse call:

SyntaxError: Unexpected token / in JSON at position 19

We should guarantee handling of any bad JSON here, but that's not the real issue. The issue is in the tokenizer, as the following fragment was returned as a match for attrsMatch:

{\"ref\":313} /--><!-- wp:block {\"ref\":482}

Note that, in contrast, the following input is correctly parsed:

<!-- wp:block {"ref":313} -->
<!-- /wp:block -->
<!-- wp:block {"ref":482} /-->

I used the following debugger patch:

diff --git a/packages/block-serialization-default-parser/src/index.js b/packages/block-serialization-default-parser/src/index.js
index 9c1983f22..007edd2b5 100644
--- a/packages/block-serialization-default-parser/src/index.js
+++ b/packages/block-serialization-default-parser/src/index.js
@@ -172,7 +172,7 @@ function nextToken() {
 	const namespace = namespaceMatch || 'core/';
 	const name = namespace + nameMatch;
 	const hasAttrs = !! attrsMatch;
-	const attrs = hasAttrs ? JSON.parse( attrsMatch ) : null;
+	const attrs = hasAttrs ? safeParse( attrsMatch ) : null;
 
 	// This state isn't allowed
 	// This is an error
@@ -192,6 +192,17 @@ function nextToken() {
 	return [ 'block-opener', name, attrs, startedAt, length ];
 }
 
+function safeParse( json ) {
+	let r;
+	try {
+		r = JSON.parse( json );
+	} catch ( e ) {
+		console.error( `Input of length ${ json.length }`, json );
+		throw e;
+	}
+	return r;
+}
+
 function addFreeform( rawLength ) {
 	const length = rawLength ? rawLength : document.length - offset;

dmsnell · 2018-08-29T18:41:20Z

the tokenizer is too greedy when looking for the end of an attributes group

excellent find @mcsf! you are right - I let in a greedy match when I had no reason to! that's been taken out by the addition of the ? to make the (?!-->). group un-greedy as it should be. I'm embarrassed that I let it in but so glad you found it and added the failing tests!

un-greedy modifier added in 9c85a60

also I rebased the branch
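The greedy-versus-lazy difference can be reproduced in isolation. The two patterns below are hypothetical simplifications written for this illustration, not the parser's actual tokenizer; they differ only in `+` versus `+?` on the attribute body. The input puts both delimiters on one line, as in the fragment quoted in the bug report, because `.` in these patterns does not match newlines.

```javascript
// Hypothetical simplified delimiter patterns; only the attribute quantifier differs.
const greedy = /<!--\s+wp:([a-z][a-z0-9_-]*)\s+(\{(?:(?!\}\s+-->).)+\}\s+)?(\/)?-->/g;
const lazy = /<!--\s+wp:([a-z][a-z0-9_-]*)\s+(\{(?:(?!\}\s+-->).)+?\}\s+)?(\/)?-->/g;

const input = '<!-- wp:block {"ref":313} /--><!-- wp:block {"ref":482} /-->';

// Collect the trimmed attribute text of every delimiter the pattern matches.
const attrsOf = (pattern) => [...input.matchAll(pattern)].map((m) => (m[2] || '').trim());

// Greedy: one match whose "attributes" run from the first { to the last },
// swallowing the first block's /--> and the second block's opener.
console.log(attrsOf(greedy));

// Lazy: two matches, each attribute string is valid JSON.
console.log(attrsOf(lazy));
```

The lookahead only forbids consuming a `}` that is immediately followed by whitespace and `-->`. A void delimiter ends in `/-->`, so the greedy body is free to run past it to the last `}` in the string.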

mcsf · 2018-09-11T14:53:24Z

Concerning the requiring of the PHP implementation, #9791 needs investigating.

aduth · 2018-09-17T15:23:51Z

Potential regression noted at #9968

Resolves #9968

It was noted that a classic block preceding a void block would disappear in the editor, while if that same classic block preceded the long-form non-void representation of an empty block then things would load as expected. This behavior was determined to originate in the new default parser in #8083, and the bug was that with void blocks we weren't sending any preceding HTML soup/freeform content into the output list.

In this patch I've duplicated some code from the block-closing function of the parser to spit out this content when a void block is at the top level of the document. This bug did not appear when void blocks are nested because it's the parent block that eats HTML soup. In the case of the top-level void, however, we were immediately pushing that void block to the output list and neglecting the freeform HTML.

I've added a few tests to verify and demonstrate this behavior. Actually, since I wasn't sure what was wrong, I wrote the tests first to try and understand the behaviors and bugs. There are a few tests that are thus not entirely essential but worthwhile to have in here.


mcsf

I hadn't realized this before — as my primary testing interface was the WP API (gist), through which everything is serialized into the same shape — but I now fear that we're not providing a consistent interface with the parser in its current state.

See my inline comments. Consumers of gutenberg_parse_blocks may make mistakes because of these discrepancies, and I fear they may already have: #10041.

cc @dmsnell

mcsf · 2018-09-19T20:57:54Z

packages/block-serialization-default-parser/parser.php

+
+        if ( isset( $stack_top->leading_html_start ) ) {
+            $this->output[] = array(
+                'attrs' => array(),


(copy-pasting a comment that I added in the more recent #9984) In this same file I'm seeing conflicting shapes for attrs:

'attrs' => array(),        // here
'attrs' => new stdClass(), // in `add_freeform`

good call - I know there are some lingering inconsistencies too around null vs. {} in the spec grammar. a good follow-up PR that's been on my TODO list

mcsf · 2018-09-19T21:27:27Z

packages/block-serialization-default-parser/parser.php

+	 * @since 3.8.0
+	 * @var WP_Block_Parser_Block[]
+	 */
+    public $output;


I'm concerned about this promise that $output is an array of WP_Block_Parser_Block, since freeform fragments are added as [associative] arrays and not class instances.

we can definitely consider wiping the output clean of its classes - I didn't at first because it seemed benign to retain them, but if we sacrifice a little performance we can json_decode( json_encode( $output ) ) and clear it up

@jorgefilipecosta mentioned implementing an ArrayObject interface in our classes so that one can traverse our parser output natively, rather than doing the JSON dance. What do you think?

That's a good question. It means more divide between the PHP and JS versions of the parser. What's the JSON dance? Wouldn't having ArrayObject be somewhat superfluous?

// this already works with arrays and objects!
$blocks = parse( $document );
$blocks = array_map( $my_transformer, $blocks );

we probably want to fix the bug as a separate thing from adding interfaces. I'm skeptical of the value of the latter if the former is resolved.

By JSON dance I meant json_decode( json_encode( $output ) ), sorry for not being clear.

we probably want to fix the bug as a separate thing from adding interfaces

So this is the actual issue: #10047. It's not the traversal (looking at your array_map example) but rather accessing properties of a block, which can either mean accessing properties of an array or of an object.

Classes offer some advantages: we can publish abstract classes that contain the fields plugins can safely access, and other parsers can extend these general classes. Simple arrays don't offer these guarantees.

But now we have a problem: some plugins depend on simple arrays. Even if this bug was already caught, I'm not sure we can change the API to use classes.

So I think our options are to revert and use arrays, or to move forward and change our API to use classes. In the second case, to stay backward-compatible with existing code that uses the array syntax, I think our only solution is ArrayObject. It would let us temporarily return something that behaves like an object for new code and like an array for old code; in that case, we would add deprecation messages saying we now return objects.

It's not the traversal (looking at your array_map example) but rather accessing properties of a block, which can either mean accessing properties of an array or of an object.

to me this is just evidence that the work to make all attribute reporting consistent is necessary. some attributes are null, some are objects

By JSON dance I meant json_decode( json_encode( $output ) ), sorry for not being clear.

that would be in the parser and wouldn't have to be manually performed. in fact, the classes are only there for performance, so we can test the change of storing everything in plain old objects vs. converting at the end. if it's a degradation then we can simply remove the classes if we want to preserve the simpler interface.


Resolves #10041
Resolves #10047

A few inconsistencies have remained in the grammar specification concerning freeform blocks and blocks without attributes in the block delimiters. Freeform blocks were returned without block names, and blocks without attributes returned `null` instead of an empty set of attributes. Further, the default parser implementation (from #8083) was returning an array of block objects instead of an array of generic arrays. This resulted in mismatches in PHP between accessing properties with `$block[ 'attrs' ]` syntax and `$block->attrs` syntax.

In this patch I've updated the specification to remove all of the type ambiguity and have updated the default parser to match it. After this patch every block should be accessible as a normal array in PHP and have all properties: `blockName`, `attrs`, `innerBlocks`, and `innerHTML`. If no attributes are specified then `attrs` will be an empty set (in JavaScript `{}` and in PHP `array()`).
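The target shape can be illustrated in JS as a post-processing step. This sketch only shows the shape; the patch itself changed the grammar and the default parser rather than adding a normalization pass. It assumes freeform chunks use `null` for `blockName`, and `normalizeBlock` is a hypothetical helper.

```javascript
// Coerce a parsed block into the uniform shape: every property present,
// attrs always an object, and inner blocks normalized recursively.
function normalizeBlock(block) {
  return {
    blockName: block.blockName ?? null, // null for freeform content (assumption)
    attrs: block.attrs ?? {}, // never null: an empty set when no attributes were given
    innerBlocks: (block.innerBlocks ?? []).map(normalizeBlock),
    innerHTML: block.innerHTML ?? '',
  };
}

const freeform = normalizeBlock({ innerHTML: '<p>Hi</p>' });
console.log(freeform.blockName, Object.keys(freeform.attrs).length); // null 0
```

With one shape everywhere, consumers can read `attrs` without first checking whether it is `null`, which is the class of mismatch behind the issues this patch resolves.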

There are numerous needs to process posts and block content from its structured form without demanding that plugin authors implement their own parsing systems. Since the new default parser was implemented in #8083, the server-side parse is now fast enough to consider doing full parses of our documents, and with that comes the idea that we can filter block content from the parser itself.

In this patch I'm exploring an API to allow extending the parser's behavior by post-processing blocks as they enter the parser's output array. This new filter gives the ability to transform all of the block's properties as they finish parsing. In the case of inner blocks, the filter runs as the inner blocks have finished their own nesting. In the case of top-level blocks, the filter runs after all inner content has finished parsing.

One use case is in #8760, where we want to replace the HTML parts of blocks while preserving other structure. Another use case could be removing specific inner blocks or content based on the current user requesting a post. This filter exposes a kind of visitor pattern for the nested parse.

> **THIS IS AN INCOMPLETE PATCH DO NOT MERGE**
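The ordering described here (inner blocks filtered once their own nesting is done, top-level blocks after all of their inner content) is a post-order traversal. The sketch below applies a hypothetical `filter` callback over an already-parsed tree, rather than inside the parser as the patch proposes; the callback signature and the "return null to drop a block" convention are assumptions.

```javascript
// Apply `filter` to every block in post-order: children first, then parent.
// Returning null from the filter removes that block (an assumed convention).
function filterBlocks(blocks, filter) {
  const result = [];
  for (const block of blocks) {
    const withFilteredChildren = {
      ...block,
      innerBlocks: filterBlocks(block.innerBlocks, filter),
    };
    const filtered = filter(withFilteredChildren);
    if (filtered !== null) {
      result.push(filtered);
    }
  }
  return result;
}
```

For example, a filter that returns `null` for a members-only block removes it, together with everything nested inside it, before the parent block sees its children.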


Previously we have been using a simplified parse to grab dynamic blocks and replace them with their rendered content. Since #8083 we've had a fast default parser which removes the need for a simplified parse here. In this patch we're replacing the existing simplified parser in `do_blocks` with the new default parser. This will open up new opportunities for working with nested blocks on the server.

Since the introduction of the default parser in #8083 we have had a subtle bug in the parsing, which failed when empty attributes were specified in a block's comment delimiter (`{}`). The absence of attributes was fine, but _empty_ attributes were a failure. This is due to using `+?` in the RegExp tokenizer instead of `*?` (which allows for no inner content in the JSON string). This patch updates the quantifier to restore functionality and fix the bug. This didn't appear in practice because we don't intentionally set `{}` as the attributes (the serializer drops it altogether), and our tests didn't catch it for similar reasons.
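The quantifier difference can be seen with two hypothetical simplified patterns (written for this illustration, not copied from the tokenizer) that differ only in `+?` versus `*?` on the attribute body.

```javascript
// Hypothetical simplified delimiter patterns; only the attribute quantifier differs.
const plusLazy = /<!--\s+wp:([a-z][a-z0-9_-]*)\s+(\{(?:(?!\}\s+-->).)+?\}\s+)?(\/)?-->/;
const starLazy = /<!--\s+wp:([a-z][a-z0-9_-]*)\s+(\{(?:(?!\}\s+-->).)*?\}\s+)?(\/)?-->/;

const emptyAttrs = '<!-- wp:block {} /-->';

// `+?` needs at least one character between the braces, so `{}` cannot form the
// attribute group, and without it the rest of the pattern cannot match either.
console.log(plusLazy.exec(emptyAttrs)); // null

// `*?` allows an empty body, so `{}` is captured and parses to an empty object.
const match = starLazy.exec(emptyAttrs);
console.log(Object.keys(JSON.parse(match[2])).length); // 0
```

Omitting the attributes entirely still works with either pattern, because the whole attribute group is optional; only the literal `{}` trips the `+?` version.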

dmsnell added [Type] Enhancement, [Status] In Progress, and [Feature] Parsing labels Jul 20, 2018

dmsnell requested review from mcsf, pento, mtias and aduth July 20, 2018 13:57

dmsnell mentioned this pull request Jul 20, 2018

Parser: Build system to compare alternative parser implementations #6831

Closed

dmsnell changed the title ~~Parser: Propose new hand-coded PHP parser~~ Parser: Propose new hand-coded parser Jul 21, 2018

dmsnell force-pushed the parser/rd-trampoline-php branch from dd4409a to 4191994 July 23, 2018 07:21

mcsf mentioned this pull request Jul 27, 2018

Overview of Short-term Parsing Enhancements #8244

Closed

dmsnell mentioned this pull request Aug 9, 2018

Return inner HTML before and after inner blocks when parsing and fix … #8760

Closed

dmsnell force-pushed the parser/rd-trampoline-php branch 3 times, most recently from 478b27a to 24977fc August 24, 2018 18:30

pento reviewed Aug 26, 2018

dmsnell force-pushed the parser/rd-trampoline-php branch from e246b11 to 7cf7971 August 26, 2018 19:27

dmsnell mentioned this pull request Aug 26, 2018

WIP: Add dynamic block #6170

Closed

gziolo reviewed Aug 27, 2018

dmsnell force-pushed the parser/rd-trampoline-php branch from 6f4be14 to 07ffe45 August 27, 2018 12:55

gziolo reviewed Aug 27, 2018

mcsf reviewed Aug 28, 2018

mcsf force-pushed the parser/rd-trampoline-php branch from a2dae1e to c154286 August 29, 2018 10:05

dmsnell force-pushed the parser/rd-trampoline-php branch from c154286 to 138614d August 29, 2018 17:50

aduth mentioned this pull request Sep 11, 2018

Build Tooling: Include block serialization default parser in plugin #9799

Merged

jrmd mentioned this pull request Sep 17, 2018

Parsing Issues with classic editor and php rendered blocks. #9968

Closed

dmsnell mentioned this pull request Sep 18, 2018

Parser (Fix): Output freeform content before void blocks #9984

Merged

mcsf mentioned this pull request Sep 18, 2018

gutenberg_parse_blocks( $post->post_content ) takes 135 ms to execute #7337

Closed

mcsf reviewed Sep 19, 2018

raquelmsmith mentioned this pull request Sep 20, 2018

Fatal error: Uncaught Error: Cannot use object of type WP_Block_Parser_Block as array #10047

Closed

dmsnell mentioned this pull request Sep 22, 2018

Parser: Normalize data types and fix default implementation #10107

Merged

dmsnell mentioned this pull request Sep 22, 2018

Block API: Add pre_render and post_render block filters #10108

Closed

mtias mentioned this pull request Oct 8, 2018

Gutenberg still parsing HTML with regular expressions #5967

Closed

This was referenced Oct 10, 2018

Parser: Replace dynamic-block regex in do_blocks #10463

Closed

Parser: Tabs or spaces? This never should have gotten in #10379

Merged

mcsf mentioned this pull request Oct 12, 2018

Parser: synchronous → asynchronous execution #7970

Closed

dmsnell mentioned this pull request Nov 9, 2018

Parser: Allow empty attributes in default parsers #11690

Closed

mcsf mentioned this pull request Nov 19, 2018

Out of memory in parser #3799

Closed

mcsf mentioned this pull request May 30, 2023

Consider removing "Block Grammar" documentation page #51067

Closed

dmsnell commented Jul 20, 2018 • edited

dmsnell commented Jul 20, 2018 • edited

pento left a comment

dmsnell Aug 26, 2018 • edited

dmsnell Aug 26, 2018 • edited

gziolo commented Aug 27, 2018

mcsf commented Aug 28, 2018

mcsf commented Aug 29, 2018

dmsnell commented Aug 29, 2018

mcsf commented Sep 11, 2018

aduth commented Sep 17, 2018

mcsf left a comment

mcsf Sep 20, 2018 • edited

jorgefilipecosta Sep 20, 2018 • edited
