Parser: Propose new hand-coded parser #8083

Merged
merged 43 commits into from Sep 6, 2018

@dmsnell
Contributor

@dmsnell dmsnell commented Jul 20, 2018

For some time we've needed a more performant PHP parser for the first
stage of parsing the post_content document.

See #1681 (early exploration)
See #8044 (parser performance issue)
See #1775 (parser performance, fixed in php-pegjs)

I'm proposing this implementation of the spec parser as an alternative
to the auto-generated parser from the PEG definition.

Updates

  • This now also includes a copy of the parser in JS whose performance is also quite good.
  • The files have been moved into the /packages directory - I still need some help understanding where it all belongs and how to make the package work

This provides a setup fixture for #6831 wherein we are testing alternate
parser implementations - https://comparator-yizlfvqafz.now.sh

Distinctives

  • designed as a basic recursive-descent
  • but doesn't recurse on the call-stack, recurses via trampoline
  • moves linearly through document in one pass
  • relies on RegExp for tokenization
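The trampoline idea in the list above can be sketched roughly as follows. This is a minimal illustration, not the parser from this PR: the token pattern is a simplified stand-in that only recognizes attribute-free openers and closers, and `parseNested`, `proceed`, and `closeTop` are names assumed for the example. Nesting is tracked on an explicit array rather than the call stack, and a plain `while` loop drives one step per token.

```javascript
// Hypothetical sketch: recursive-descent structure driven by a trampoline loop.
function parseNested( doc ) {
	// Simplified token pattern (an assumption): "<!-- wp:name -->" or "<!-- /wp:name -->".
	const tokenPattern = /<!--\s+(\/)?wp:([a-z][a-z0-9_-]*)\s+-->/g;
	const output = [];
	const stack = []; // explicit stack of open blocks, replaces call-stack recursion

	// Pop the innermost open block and attach it to its parent (or the output).
	function closeTop() {
		const block = stack.pop();
		const parent = stack[ stack.length - 1 ];
		( parent ? parent.innerBlocks : output ).push( block );
	}

	// Handle exactly one token; return false when the document is exhausted.
	function proceed() {
		const match = tokenPattern.exec( doc ); // global flag: resumes at lastIndex
		if ( ! match ) {
			// No more tokens: close anything still open so nothing is dropped.
			while ( stack.length ) {
				closeTop();
			}
			return false;
		}
		const isCloser = !! match[ 1 ];
		const name = match[ 2 ];
		if ( ! isCloser ) {
			stack.push( { name, innerBlocks: [] } );
		} else if ( stack.length && stack[ stack.length - 1 ].name === name ) {
			closeTop();
		}
		// Stray closers are ignored in this sketch.
		return true;
	}

	// The trampoline: the loop, not nested function calls, drives the parse.
	while ( proceed() ) {
		// one token per iteration, single linear pass
	}
	return output;
}
```

Because each step returns to the loop before the next one starts, the call stack stays flat no matter how deeply blocks are nested.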

Note: I expect us to discover implementation bugs during the initial rollout of this parser. We have run it through our document library and unit tests, but real posts surely contain more complicated constructions. We can deal with these as they come, but we should expect them.

Todo

  • nested blocks include the nested content in their innerHTML
    this needs to go away
  • create test fixture - https://comparator-yizlfvqafz.now.sh
  • figure out where to save this file
  • phpunit tests

Benchmark

For posterity's sake I ran the merged parser through the parser comparator and compared it against the auto-generated spec parser. Here are the results from my laptop

                                  Runtime (ms)                  Memory (MB)
File                                Spec  Default  Speedup      Spec  Default  Default/Spec
demo-post.html                     29.58     0.23     130x     38.56    16.43           43%
early-adopting-the-future.html    263.83     1.01     262x     36.84    17.10           46%
moby-dick-parsed.html            5012.13    11.55     434x     75.41    25.18           33%
pygmalian-raw-html.html           330.35     0.24    1366x    116.72    16.90           14%
redesigning-chrome-desktop.html   211.42     1.22     173x     37.22    16.51           44%
shortcode-shortcomings.html        71.28     0.36     198x     34.07    16.98           50%
web-at-maximum-fps.html           161.35     0.87     186x     33.12    16.32           49%

The tests were done on my late 2013 rMBP quad core 2.6 GHz laptop. According to the Intel Power Gadget the CPU was running at 3.6 GHz the entire time. Each document was parsed with each parser at least 47 times. The runs were executed in random order, and each run parsed the document a randomly chosen one to five times in a row before returning the results. Runtime and memory use were measured inside a runner script running in Docker as described in the parser comparator.
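One possible reading of that run schedule is sketched below. This is not the comparator's actual runner: `buildSchedule`, its parameters, and the shape of each run record are assumptions made for illustration. It only shows how at least 47 runs per document and parser, a random run order, and one to five parses per run could fit together.

```javascript
// Hypothetical sketch of the randomized benchmark schedule described above.
// `rng` is injectable so the schedule can be made reproducible if desired.
function buildSchedule( documents, parsers, minRuns = 47, rng = Math.random ) {
	const runs = [];
	for ( const doc of documents ) {
		for ( const parser of parsers ) {
			for ( let i = 0; i < minRuns; i++ ) {
				// Each run parses the document between one and five times in a row.
				const repeats = 1 + Math.floor( rng() * 5 );
				runs.push( { doc, parser, repeats } );
			}
		}
	}
	// Fisher-Yates shuffle so that runs execute in random order.
	for ( let i = runs.length - 1; i > 0; i-- ) {
		const j = Math.floor( rng() * ( i + 1 ) );
		[ runs[ i ], runs[ j ] ] = [ runs[ j ], runs[ i ] ];
	}
	return runs;
}
```

Interleaving the runs this way spreads warm-up and caching effects across both parsers instead of favoring whichever one happens to run last.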

@dmsnell
Contributor Author

@dmsnell dmsnell commented Jul 20, 2018

I'm pretty sure that the next steps from here involve pondering the data structure of the stack. We have enough working knowledge now to know what we need to track and how we can pop that from the stack to the output.

Done

@dmsnell dmsnell changed the title from "Parser: Propose new hand-coded PHP parser" to "Parser: Propose new hand-coded parser" Jul 21, 2018
@dmsnell dmsnell force-pushed the parser/rd-trampoline-php branch from dd4409a to 4191994 Jul 23, 2018
@mcsf mcsf mentioned this pull request Jul 27, 2018
5 of 11 tasks complete
@dmsnell dmsnell force-pushed the parser/rd-trampoline-php branch 3 times, most recently from 478b27a to 24977fc Aug 23, 2018
Member

@pento pento left a comment

Noice! Let's get this in sooner rather than later, so we can make inroads on the things depending on having a faster parser. 🙂

I've left some comments, here are a few random notes that have occurred to me, as well:

  • It feels a little weird to be putting the PHP parser on NPM, but we don't really use Packagist at all, sooo... 🤷‍♂️ Let's stick with NPM for now, we can potentially explore doing Packagist/composer things later.
  • phpcs.xml.dist needs to be updated to scan the new PHP code. I mentioned a couple of coding standards issues in the comments, but PHPCS should pick up the rest.
  • Combined with switching the parser in gutenberg_parse_blocks(), phpunit/class-parsing-test.php should be updated to use gutenberg_parse_blocks(), rather than Gutenberg_PEG_Parser.

With this performance improvement, it seems like we could change do_blocks() to parse the content, instead of using the dynamic blocks regex.

@@ -0,0 +1,107 @@
# Block Serialization Default Parser

This library contains the default block serialization parser implementations for

@pento

pento Aug 26, 2018
Member

You'll need to remove the manual line breaks from the README: we use the Jetpack Markdown parser, which adds a <br/> for single line breaks.

@dmsnell

dmsnell Aug 26, 2018
Author Contributor

this makes me want to cry since it's something I love about markdown and consistent among every other markdown parser I've used.

The implication of the “one or more consecutive lines of text” rule is that Markdown supports “hard-wrapped” text paragraphs. This differs significantly from most other text-to-HTML formatters (including Movable Type’s “Convert Line Breaks” option) which translate every line break character in a paragraph into a <br /> tag.

When you do want to insert a <br /> break tag using Markdown, you end a line with two or more spaces, then type return.

Yes, this takes a tad more effort to create a <br />, but a simplistic “every line break is a <br />” rule wouldn’t work for Markdown. Markdown’s email-style blockquoting and multi-paragraph list items work best — and look better — when you format them with hard breaks.
https://daringfireball.net/projects/markdown/syntax#p

nonetheless, I have destroyed my markdown to make it happy in ee72314

😢

@@ -0,0 +1,260 @@
<?php

function bsdp_parse($document ) {

@pento

pento Aug 26, 2018
Member

Instead of adding a new _parse() function, can gutenberg_parse_blocks() be updated to use the new parser? We can add a filter in there for easier switching between classes: eg, existing filters in Core that filter a Class name: wp_rest_server_class, customize_dynamic_setting_class.

block_parser_class works for me.

@dmsnell

dmsnell Aug 26, 2018
Author Contributor

see related comment response below.

I'm having some trouble understanding what you wrote @pento. I hope we create a filter to select the parsing function, but won't that depend somewhat on having unique names for each possible parse function?

also, are wp_rest_server_class and customize_dynamic_setting_class in any way related here? are you suggesting we create a class interface for the block parser class?

in lib/block.php I had originally envisioned something like this…

$parser = apply_filters( 'block_parser_class', 'bsdp_parse' );
call_user_func( $parser, $post_content );

I guess you are recommending this instead?

$parser_class = apply_filters( 'block_parser_class', 'bsdp' );
$parser = new $parser_class();
$parser->parse( $post_content );

@dmsnell

dmsnell Aug 26, 2018
Author Contributor

experimented in 064efa5 but I haven't tested it yet

for what it's worth I'd be more comfortable getting this parser in first before making the parser system pluggable just because of the scope of the changes

static $parser;

if ( ! isset( $parser ) ) {
$parser = new BSDP_Parser();

@pento

pento Aug 26, 2018
Member

I'm not wild about the BSDP_ prefix. I get why it's there, but perhaps it could be a little more descriptive?

@mtias

mtias Aug 26, 2018
Contributor

Agreed. Block_Parser()?

@dmsnell

dmsnell Aug 26, 2018
Author Contributor

mainly this is there to prevent namespace collisions. my hope is that a few PRs after this we'll have a filter choose the parser and obviously if we create two or more Block_Parser() classes we'll run into conflicts.

any thoughts on that? even with an encapsulating class we run into some issues here because I don't think we can create a class within a class. the only way around it otherwise, I think, is actual namespacing, which isn't supported on older PHP versions…

@pento

pento Sep 3, 2018
Member

Realistically, is a completely new parser going to appear between now and 5.0? It seems like this parser is going to be the one that will go into Core.

If that's the case, we should just use a generic name. WP_Block_Parser will fit into the WordPress naming scheme.


switch ( $token_type ) {
case 'no-more-tokens':
# if not in a block then flush output

@pento

pento Aug 26, 2018
Member

Need to use // for single inline comments.

@dmsnell

dmsnell Aug 26, 2018
Author Contributor

double-slashed it in ee72314

return false;
}

# Otherwise we have a problem

@pento

pento Aug 26, 2018
Member

Block inline comments should be in the form:

/*
 * blah
 *
 * - foo
 * - bar
 */

@dmsnell

dmsnell Aug 26, 2018
Author Contributor

exploded comments in ee72314

# Block Serialization Default Parser

This library contains the default block serialization parser implementations for
WordPress documents. It provides native PHP and Javascript parsers that implement

@pento

pento Aug 26, 2018
Member

s/Javascript/JavaScript/ 🙂

@dmsnell

dmsnell Aug 26, 2018
Author Contributor

substituted in ee72314

@dmsnell dmsnell force-pushed the parser/rd-trampoline-php branch from e246b11 to 7cf7971 Aug 26, 2018
@dmsnell dmsnell mentioned this pull request Aug 26, 2018
@@ -0,0 +1,25 @@
{
"name": "@wordpress/block-serialization-default-parser",
"version": "1.0.0",

@gziolo

gziolo Aug 27, 2018
Member

I would put 1.0.0-rc.0 or something like that to allow Lerna to do its job - it always bumps version so it would try to do 1.0.1 release otherwise ...

@dmsnell

dmsnell Aug 27, 2018
Author Contributor

campaigned for release in 8c7e42c

@@ -88,6 +88,7 @@ const gutenbergPackages = [
'autop',
'blob',
'blocks',
'block-serialization-default-parser',
'block-serialization-spec-parser',

@gziolo

gziolo Aug 27, 2018
Member

Can we stop bundling the other one if we don't use it in Gutenberg anymore?

@dmsnell

dmsnell Aug 27, 2018
Author Contributor

a good question. I don't want to kill the PEG parser since that maintains the spec in a way no hand-written implementation can.

in my comparator PRs I'm trying to move towards a system that will automatically run the implementations against the specification in something like a CI job so that we can have our formal specification without worrying about the implementation diverging (for example, if someone makes a change to the implementation without changing the spec first)

that is, I think we want to keep the spec-parser wherever we need it. mainly I think we want to strip it from the default load of Gutenberg, but as for whether we still build it, what do you think?

@gziolo

gziolo Aug 27, 2018
Member

The package with transpiled code is going to be there anyway. It's really up to you and how you want to use it. If you are fine with referencing it as a regular npm package then you don't need it. If you want to consume it as part of e2e test or something which requires all Gutenberg build files then you can leave it as is. I just wanted to raise the awareness.

@dmsnell

dmsnell Aug 27, 2018
Author Contributor

thanks - this is mainly just out of my expertise at this point. if you are willing to make a decision on it or can tell me what we should do then that would help me out.

it seems like several people want these parser tests to be written with jest and somehow in the normal suite - I don't know what that means here for this decision

@@ -369,6 +376,7 @@ function gutenberg_register_scripts_and_styles() {
array(
'wp-autop',
'wp-blob',
'wp-block-serialization-default-parser',
'wp-block-serialization-spec-parser',

@gziolo

gziolo Aug 27, 2018
Member

I think we no longer need to list wp-block-serialization-spec-parser as a dependency. In addition, we should stop registering it, too.

@dmsnell

dmsnell Aug 27, 2018
Author Contributor

agreed on this one but I wasn't entirely sure how we wanted this to work…

do we want Gutenberg to automatically replace the spec parser with the "default" one at boot through a filter or do we want the "default" to be the default?

I want the auto-generated parser to be available still, especially for things like diagnostics and exploration.

@gziolo

gziolo Aug 27, 2018
Member

As commented above, it all depends on the way you want to use it. I don't have any strong opinions about it. We should just ensure we don't ship unused code to the end users.

@mcsf

mcsf Aug 30, 2018
Contributor

Do we have a decision here?

@dmsnell

dmsnell Aug 30, 2018
Author Contributor

I left the spec parser registered but un-enqueued it in 66455b4

*
* @param string $parser_class Name of block parser class
*/
$parser_class = apply_filters( 'block_parser_class', 'BDSP_Parser' );

@gziolo

gziolo Aug 27, 2018
Member

We should document it in the extensibility docs. Probably, the main document would be the best fit: https://github.com/WordPress/gutenberg/blob/master/docs/extensibility.md.

@dmsnell

dmsnell Aug 27, 2018
Author Contributor

documented in 8c7e42c

@mcsf

mcsf Aug 30, 2018
Contributor

This still reads BDSP. :)

@dmsnell

dmsnell Aug 30, 2018
Author Contributor

another great catch - fixed in 66455b4

@@ -378,6 +378,6 @@ const createParse = ( parseImplementation ) =>
*
* @return {Array} Block list.
*/
-export const parseWithGrammar = createParse( grammarParse );
+export const parseWithGrammar = createParse( defaultParse );

@gziolo

gziolo Aug 27, 2018
Member

Should we offer a filter for JS implementation, too?

@dmsnell

dmsnell Aug 27, 2018
Author Contributor

yes but I wasn't sure if this PR was the right one for it. that is, filtering out the PHP side seemed somewhat straightforward while filtering the JS side seemed more complicated since we have to take into account things like loading the parser bundles and making sure they are available before the editor loads

do you think we need to do it all here in this PR?

@gziolo

gziolo Aug 27, 2018
Member

It's totally fine as its own PR, I just wanted to ensure we tackle both PHP and JS side of things.
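For discussion purposes, a JS-side counterpart to the PHP `block_parser_class` filter might look something like the sketch below. Everything here is hypothetical: `registerFilter` and `runFilters` are tiny stand-ins written for this example (not the `@wordpress/hooks` API), the hook name `hypothetical.blockParser` is invented, and `defaultParse` is a placeholder rather than the real parser. It also sidesteps the bundle-loading question raised above.

```javascript
// Hypothetical minimal filter registry, for illustration only.
const registeredFilters = {};

function registerFilter( hookName, callback ) {
	registeredFilters[ hookName ] = registeredFilters[ hookName ] || [];
	registeredFilters[ hookName ].push( callback );
}

function runFilters( hookName, value ) {
	const callbacks = registeredFilters[ hookName ] || [];
	// Each callback receives the current value and returns a replacement.
	return callbacks.reduce( ( current, callback ) => callback( current ), value );
}

// Placeholder parser: wraps the whole document in a single freeform entry.
const defaultParse = ( doc ) => [
	{ blockName: null, attrs: {}, innerBlocks: [], innerHTML: doc },
];

// Look the parser up at call time so filters added later still apply.
function parseWithSelectedParser( doc ) {
	const parse = runFilters( 'hypothetical.blockParser', defaultParse );
	return parse( doc );
}
```

A plugin could then call `registerFilter( 'hypothetical.blockParser', () => myParse )` to swap implementations, mirroring the `EmptyParser` example on the PHP side.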

@dmsnell dmsnell force-pushed the parser/rd-trampoline-php branch from 6f4be14 to 07ffe45 Aug 27, 2018
return 'EmptyParser';
}
add_filter( 'block_parser_class', select_empty_parser, 10, 1 );

@gziolo

gziolo Aug 27, 2018
Member

I think we provide the name of the function as a string in other examples to ensure it works with PHP 5.2. We might also want to prefix the function name with the plugin name:

add_filter( 'block_parser_class', 'my_plugin_select_empty_parser', 10, 1 );

@dmsnell

dmsnell Aug 27, 2018
Author Contributor

good catch! I never meant to leave out the string - just neglected it - updated in 96ecfb8

@gziolo
Member

@gziolo gziolo commented Aug 27, 2018

8c7e42c looks great, I left one comment which is a tiny thing that affects only PHP 5.2...

}

function bdsp_select_parser( $prev_parse_class ) {
return 'BSDP_Parser';

@mcsf

mcsf Aug 28, 2018
Contributor

There's a typo at BSDP. Anyway, given that the apply_filters call in gutenberg_parse_blocks defaults to 'BDSP_Parser', we should remove this bit.

@dmsnell

dmsnell Aug 29, 2018
Author Contributor

good catch! I removed the function in 9c85a60

@mcsf
Contributor

@mcsf mcsf commented Aug 28, 2018

I'm getting a tokenization bug while testing with a personal post. Digging…

const namespace = namespaceMatch || 'core/';
const name = namespace + nameMatch;
const hasAttrs = !! attrsMatch;
const attrs = hasAttrs ? JSON.parse( attrsMatch ) : null;

@mcsf

mcsf Aug 28, 2018
Contributor

I know there's a performance hit with try, but we should play it safe with JSON.parse, or generally speaking make sure we can inform the user of bad input and recover (e.g. isolate bad blocks) as best as possible. Thoughts, @dmsnell?

@aduth

aduth Aug 28, 2018
Member

@dmsnell

dmsnell Aug 29, 2018
Author Contributor

added the try in 9c85a60 but left it out of the PHP since in PHP it already returns null on a failed parse
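A guarded version of the attribute parse could look roughly like this. It is a sketch under assumptions rather than the code from that commit: `parseAttrs` is a name made up here, and returning `null` on bad input is chosen only to mirror the PHP behavior mentioned above.

```javascript
// Hypothetical helper: parse a block's JSON attributes without letting
// malformed input throw out of the tokenizer.
function parseAttrs( attrsMatch ) {
	if ( ! attrsMatch ) {
		return null; // the delimiter carried no attributes
	}
	try {
		return JSON.parse( attrsMatch );
	} catch ( e ) {
		// Assumption: mirror PHP's json_decode, which yields null on failure.
		return null;
	}
}
```

For example, `parseAttrs( '{"ref":313}' )` yields an object whose `ref` is `313`, while `parseAttrs( '{"ref":313} /-->' )` yields `null` instead of throwing a `SyntaxError`.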

@mcsf
Contributor

@mcsf mcsf commented Aug 29, 2018

@dmsnell: I've pushed a failing test for the parser. The gist of it is that I think the tokenizer is too greedy when looking for the end of an attributes group ({"some":"json"}). Thus, a document with two self-closing attribute-equipped blocks, not necessarily consecutive, breaks the parser:

<!-- wp:block {"ref":313} /-->
<!-- wp:block {"ref":482} /-->

This makes the parser throw a syntax error in the JSON.parse call:

SyntaxError: Unexpected token / in JSON at position 19

We should guarantee handling of any bad JSON here, but that's not the real issue. The issue is in the tokenizer, as the following fragment was returned as a match for attrsMatch:

{\"ref\":313} /--><!-- wp:block {\"ref\":482}

Note that, in contrast, the following input is correctly parsed:

<!-- wp:block {"ref":313} -->
<!-- /wp:block -->
<!-- wp:block {"ref":482} /-->

I used the following debugger patch:

diff --git a/packages/block-serialization-default-parser/src/index.js b/packages/block-serialization-default-parser/src/index.js
index 9c1983f22..007edd2b5 100644
--- a/packages/block-serialization-default-parser/src/index.js
+++ b/packages/block-serialization-default-parser/src/index.js
@@ -172,7 +172,7 @@ function nextToken() {
 	const namespace = namespaceMatch || 'core/';
 	const name = namespace + nameMatch;
 	const hasAttrs = !! attrsMatch;
-	const attrs = hasAttrs ? JSON.parse( attrsMatch ) : null;
+	const attrs = hasAttrs ? safeParse( attrsMatch ) : null;
 
 	// This state isn't allowed
 	// This is an error
@@ -192,6 +192,17 @@ function nextToken() {
 	return [ 'block-opener', name, attrs, startedAt, length ];
 }
 
+function safeParse( json ) {
+	let r;
+	try {
+		r = JSON.parse( json );
+	} catch ( e ) {
+		console.error( `Input of length ${ json.length }`, json );
+		throw e;
+	}
+	return r;
+}
+
 function addFreeform( rawLength ) {
 	const length = rawLength ? rawLength : document.length - offset;
@mcsf mcsf force-pushed the parser/rd-trampoline-php branch from a2dae1e to c154286 Aug 29, 2018
@dmsnell dmsnell force-pushed the parser/rd-trampoline-php branch from c154286 to 138614d Aug 29, 2018
@dmsnell
Contributor Author

@dmsnell dmsnell commented Aug 29, 2018

the tokenizer is too greedy when looking for the end of an attributes group

excellent find @mcsf! you are right - I let in a greedy match when I had no reason to! that's been taken out by the addition of the ? to make the (?!-->). group un-greedy as it should be. I'm embarrassed that I let it in but so glad you found it and added the failing tests!

un-greedy modifier added in 9c85a60

also I rebased the branch
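The effect of the greedy quantifier can be reproduced with a pair of deliberately simplified patterns. These are not the tokenizer's real expressions (which also handle namespaces, closers, and a lookahead); `greedy` and `lazy` below are assumptions that differ only in the `?` after the `+` inside the attributes group.

```javascript
// Two self-closing blocks with attributes, as in the failing test case.
const doc = '<!-- wp:block {"ref":313} /-->\n<!-- wp:block {"ref":482} /-->';

// Simplified stand-in patterns (assumptions, not the parser's real regex).
const greedy = /<!--\s+wp:([a-z][a-z0-9_-]*)\s+(\{[\s\S]+\}\s+)?(\/)?-->/;
const lazy = /<!--\s+wp:([a-z][a-z0-9_-]*)\s+(\{[\s\S]+?\}\s+)?(\/)?-->/;

// Greedy: the attributes group runs on to the LAST "}" in the document,
// swallowing the first block's "/-->" and the second block's opener.
const greedyAttrs = greedy.exec( doc )[ 2 ].trim();

// Lazy: the group stops at the first "}" that is followed by the delimiter end.
const lazyAttrs = lazy.exec( doc )[ 2 ].trim();

console.log( greedyAttrs ); // spans both blocks, not valid JSON
console.log( lazyAttrs ); // {"ref":313}
```

With the greedy pattern the captured text is the same kind of fragment reported earlier in the thread, so `JSON.parse` throws on it; with the lazy pattern each block's attributes are captured separately.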

@mcsf
Contributor

@mcsf mcsf commented Sep 11, 2018

Thanks, everyone, for seeing this through!

@mcsf
Contributor

@mcsf mcsf commented Sep 11, 2018

Concerning the requiring of the PHP implementation, #9791 needs investigating.

@aduth
Member

@aduth aduth commented Sep 17, 2018

Potential regression noted at #9968

dmsnell added a commit that referenced this pull request Sep 18, 2018
Resolves #9968

It was noted that a classic block preceding a void block would
disappear in the editor while if that same classic block preceded
the long-form non-void representation of an empty block then things
would load as expected.

This behavior was determined to originate in the new default parser
in #8083 and the bug was that with void blocks we weren't sending
any preceding HTML soup/freeform content into the output list.

In this patch I've duplicated some code from the block-closing
function of the parser to spit out this content when a void block
is at the top-level of the document.

This bug did not appear when void blocks are nested because it's
the parent block that eats HTML soup. In the case of the top-level
void however we were immediately pushing that void block to the
output list and neglecting the freeform HTML.

I've added a few tests to verify and demonstrate this behavior.
Actually, since I wasn't sure what was wrong I wrote the tests first
to try and understand the behaviors and bugs. There are a few tests
that are thus not entirely essential but worthwhile to have in here.
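The behavior this fix restores can be illustrated with a much-reduced sketch. It is not the patched parser: `parseTopLevel` handles only attribute-free void blocks at the top level, and the `{ type, ... }` output shape is invented for the example. The point is the order of operations: any freeform HTML sitting before a void block is pushed to the output before the void block itself.

```javascript
// Hypothetical sketch: emit preceding freeform content before each
// top-level void block instead of dropping it.
function parseTopLevel( doc ) {
	// Simplified pattern (an assumption): "<!-- wp:name /-->" without attributes.
	const voidPattern = /<!--\s+wp:([a-z][a-z0-9_-]*)\s+\/-->/g;
	const output = [];
	let offset = 0;
	let match;

	while ( ( match = voidPattern.exec( doc ) ) !== null ) {
		// Flush any HTML between the previous token and this void block.
		if ( match.index > offset ) {
			output.push( { type: 'freeform', html: doc.slice( offset, match.index ) } );
		}
		output.push( { type: 'void', name: match[ 1 ] } );
		offset = match.index + match[ 0 ].length;
	}

	// Trailing content after the last token is freeform as well.
	if ( offset < doc.length ) {
		output.push( { type: 'freeform', html: doc.slice( offset ) } );
	}
	return output;
}
```

Skipping the flush step inside the loop reproduces the reported symptom: the void block appears in the output but the classic content before it is lost.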
@dmsnell dmsnell mentioned this pull request Sep 18, 2018
4 of 4 tasks complete
dmsnell added a commit that referenced this pull request Sep 18, 2018
* Parser (Fix): Output freeform content before void blocks

Resolves #9968

Contributor

@mcsf mcsf left a comment

I hadn't realized this before — as my primary testing interface was the WP API (gist), through which everything is serialized into the same shape — but I now fear that we're not providing a consistent interface with the parser in its current state.

See my inline comments. Consumers of gutenberg_parse_blocks may make mistakes because of these discrepancies, and I fear they may already have: #10041.

cc @dmsnell


if ( isset( $stack_top->leading_html_start ) ) {
$this->output[] = array(
'attrs' => array(),

@mcsf

mcsf Sep 19, 2018
Contributor

(copy-pasting a comment that I added in the more recent #9984) In this same file I'm seeing conflicting shapes for attrs:

'attrs' => array(), // here
'attrs' => new stdClass(), // in `add_freeform`

@dmsnell

dmsnell Sep 19, 2018
Author Contributor

good call - I know there are some lingering inconsistencies too around null vs. {} in the spec grammar. a good follow-up PR that's been on my TODO list

* @since 3.8.0
* @var WP_Block_Parser_Block[]
*/
public $output;

@mcsf

mcsf Sep 19, 2018
Contributor

I'm concerned about this promise that $output is an array of WP_Block_Parser_Block, since freeform fragments are added as [associative] arrays and not class instances.

@dmsnell

dmsnell Sep 19, 2018
Author Contributor

we can definitely consider wiping the output clean of its classes - I didn't at first because it seemed benign to retain them, but if we sacrifice a little performance we can json_decode( json_encode( $output ) ) and clear it up

@mcsf

mcsf Sep 20, 2018
Contributor

@jorgefilipecosta mentioned implementing an ArrayObject interface in our classes so that one can traverse our parser output natively, rather than doing the JSON dance. What do you think?

@dmsnell

dmsnell Sep 20, 2018
Author Contributor

That's a good question. It means more divide between the PHP and JS versions of the parser. What's the JSON dance? Wouldn't having ArrayObject be somewhat superfluous?

// this already works with arrays and objects!
$blocks = parse( $document );
$blocks = array_map( $my_transformer, $blocks );

we probably want to fix the bug as a separate thing from adding interfaces. I'm skeptical of the value of the latter if the former is resolved.

@mcsf

mcsf Sep 20, 2018
Contributor

By JSON dance I meant json_decode( json_encode( $output ) ), sorry for not being clear.

we probably want to fix the bug as a separate thing from adding interfaces

So this is the actual issue: #10047. It's not the traversal (looking at your array_map example) but rather accessing properties of a block, which can either mean accessing properties of an array or of an object.

@jorgefilipecosta

jorgefilipecosta Sep 20, 2018
Member

Classes offer some advantages: we can publish abstract classes that contain the fields plugins can safely access, and other parsers can extend these general classes. Simple arrays don't offer these guarantees.

But now we have a problem: some plugins already depend on simple arrays, so even though this bug was caught I'm not sure we can change the API to use classes.

So I think our options are to revert and use arrays, or to move forward and change our API to use classes. In the second case, to stay back-compatible with existing implementations that use the array syntax, I think our only solution is ArrayObject. It allows us to temporarily return something that behaves like a class for new implementations and like an array for old ones; in this case we would add deprecation messages saying we now return objects.

@dmsnell

dmsnell Sep 20, 2018
Author Contributor

It's not the traversal (looking at your array_map example) but rather accessing properties of a block, which can either mean accessing properties of an array or of an object.

to me this is just evidence that the work to make all attribute reporting consistent is necessary. some attributes are null, some are objects

By JSON dance I meant json_decode( json_encode( $output ) ), sorry for not being clear.

that would be in the parser and wouldn't have to be manually performed. in fact, the classes are only even there for performance, so we can test the change of storing everything in plain old objects vs. converting at the end. if it's a degradation then we can simply remove the classes if we want to preserve the simpler interface.

mcsf pushed a commit that referenced this pull request Sep 21, 2018
* Parser (Fix): Output freeform content before void blocks

Resolves #9968

dmsnell added a commit that referenced this pull request Sep 22, 2018
Resolves #10041
Resolves #10047

A few inconsistencies have remained in the grammar specification
concerning freeform blocks and blocks without attributes in the
block delimiters. Freeform blocks were returned without block
names and blocks without attributes returned `null` instead of
an empty set of attributes.

Further, the default parser implementation (from #8083) was
returning an array of block objects instead of an array of
generic arrays. This resulted in mismatches in PHP of accessing
properties with `$block[ 'attrs' ]` syntax vs `$block->attrs`
syntax.

In this patch I've updated the specification to remove all of
the type ambiguity and have updated the default parser to match
it. After this patch every block should be accessible as a normal
array in PHP and have all properties: `blockName`, `attrs`,
`innerBlocks`, and `innerHTML`. If no attributes are specified
then `attrs` will be an empty set (in JavaScript `{}` and in
PHP `array()`).
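On the JavaScript side, the consistent shape described in this message could be expressed as a small normalizer. This is only a sketch: `normalizeBlock` is a hypothetical helper, and using `null` as the `blockName` of a freeform block is an assumption made for the example, since the message does not say what name freeform blocks receive.

```javascript
// Hypothetical sketch: make every block carry the same four properties.
function normalizeBlock( block ) {
	return {
		// Assumption: a block without a name (freeform) is reported as null.
		blockName: block.blockName !== undefined ? block.blockName : null,
		// Missing or null attributes become an empty set, never null.
		attrs: block.attrs || {},
		innerBlocks: ( block.innerBlocks || [] ).map( normalizeBlock ),
		innerHTML: block.innerHTML || '',
	};
}
```

With every block in this form, consumers can read `block.attrs` and `block.innerBlocks` without first checking for `null` or for a missing property.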
dmsnell added a commit that referenced this pull request Sep 22, 2018
There are numerous needs to process posts and block content from its
structured form without demanding that plugin authors implement their
own parsing systems.

Since the new default parser was implemented in #8083, the server-side
parse is now fast enough to consider doing full parses of our documents,
and with that comes the idea that we can filter block content from the
parser itself.

In this patch I'm exploring an API to allow extending the parser's
behavior by post-processing blocks as they enter the parser's output
array. This new filter gives the ability to transform all of the block's
properties as they finish parsing.

In the case of inner blocks the filter runs as the inner blocks have
finished their own nesting. In the case of top-level blocks the filter
runs after all inner content has finished parsing.

One use case is in #8760 where we want to replace the HTML parts of
blocks while preserving other structure. Another use case could be
removing specific inner blocks or content based on the current user
requesting a post.

This filter exposes a kind of visitor pattern for the nested parse.
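
To make the ordering concrete, here is a minimal JavaScript sketch of the visiting order described above. It walks an already-parsed tree rather than hooking into the parser, and `filterBlocks` is a hypothetical name, not the API proposed in this patch. Inner blocks are filtered first, and the enclosing block is filtered once its inner blocks are done.

```javascript
// Hypothetical sketch: apply `filter` to every block, inner blocks first.
function filterBlocks(blocks, filter) {
  return blocks.map((block) => {
    // Finish the nested blocks before the block that contains them.
    const innerBlocks = filterBlocks(block.innerBlocks || [], filter);
    return filter({ ...block, innerBlocks });
  });
}

const visited = [];
const parsed = [
  {
    blockName: 'core/columns',
    attrs: {},
    innerHTML: '',
    innerBlocks: [
      { blockName: 'core/paragraph', attrs: {}, innerHTML: ' <p>a</p> ', innerBlocks: [] },
    ],
  },
];

// Example filter: record the visit order and trim each block's HTML.
const filtered = filterBlocks(parsed, (block) => {
  visited.push(block.blockName);
  return { ...block, innerHTML: block.innerHTML.trim() };
});
// visited is [ 'core/paragraph', 'core/columns' ]
```

Because the filter receives a block whose inner blocks are already processed, it could also drop or replace specific inner blocks before returning.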

> **THIS IS AN INCOMPLETE PATCH DO NOT MERGE**
mcsf added a commit that referenced this pull request Oct 2, 2018
Resolves #10041
Resolves #10047

A few inconsistencies have remained in the grammar specification
concerning freeform blocks and blocks without attributes in the
block delimiters. Freeform blocks were returned without block
names and blocks without attributes returned `null` instead of
an empty set of attributes.

Further, the default parser implementation (from #8083) was
returning an array of block objects instead of an array of
generic arrays. This resulted in mismatches in PHP of accessing
properties with `$block[ 'attrs' ]` syntax vs `$block->attrs`
syntax.

In this patch I've updated the specification to remove all of
the type ambiguity and have updated the default parser to match
it. After this patch every block should be accessible as a normal
array in PHP and have all properties: `blockName`, `attrs`,
`innerBlocks`, and `innerHTML`. If no attributes are specified
then `attrs` will be an empty set (in JavaScript `{}` and in
PHP `array()`).
mcsf added a commit that referenced this pull request Oct 5, 2018
Resolves #10041
Resolves #10047

A few inconsistencies have remained in the grammar specification
concerning freeform blocks and blocks without attributes in the
block delimiters. Freeform blocks were returned without block
names and blocks without attributes returned `null` instead of
an empty set of attributes.

Further, the default parser implementation (from #8083) was
returning an array of block objects instead of an array of
generic arrays. This resulted in mismatches in PHP of accessing
properties with `$block[ 'attrs' ]` syntax vs `$block->attrs`
syntax.

In this patch I've updated the specification to remove all of
the type ambiguity and have updated the default parser to match
it. After this patch every block should be accessible as a normal
array in PHP and have all properties: `blockName`, `attrs`,
`innerBlocks`, and `innerHTML`. If no attributes are specified
then `attrs` will be an empty set (in JavaScript `{}` and in
PHP `array()`).
dmsnell added a commit that referenced this pull request Oct 6, 2018
* Parser: Normalize data types and fix default implementation

Resolves #10041
Resolves #10047

A few inconsistencies have remained in the grammar specification
concerning freeform blocks and blocks without attributes in the
block delimiters. Freeform blocks were returned without block
names and blocks without attributes returned `null` instead of
an empty set of attributes.

Further, the default parser implementation (from #8083) was
returning an array of block objects instead of an array of
generic arrays. This resulted in mismatches in PHP of accessing
properties with `$block[ 'attrs' ]` syntax vs `$block->attrs`
syntax.

In this patch I've updated the specification to remove all of
the type ambiguity and have updated the default parser to match
it. After this patch every block should be accessible as a normal
array in PHP and have all properties: `blockName`, `attrs`,
`innerBlocks`, and `innerHTML`. If no attributes are specified
then `attrs` will be an empty set (in JavaScript `{}` and in
PHP `array()`).
dmsnell added a commit that referenced this pull request Oct 10, 2018
Previously we have been using a simplified parse to grab dynamic
blocks and replace them with their rendered content.

Since #8083 we've had a fast default parser which removes the need
for a simplified parse here.

In this patch we're replacing the existing simplified parser in
`do_blocks` with the new default parser. This will open up new
opportunities for working with nested blocks on the server.
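
The general idea can be sketched in JavaScript as follows. This is not the PHP `do_blocks` implementation. `renderBlocks` and the `renderers` map are hypothetical names, and the sketch only handles top-level blocks. Blocks with a registered renderer are replaced by their rendered output, and all others pass through as their saved HTML.

```javascript
// Hypothetical sketch: replace dynamic blocks with rendered content.
// `renderers` is a Map from block name to a function of the attributes.
function renderBlocks(blocks, renderers) {
  return blocks
    .map((block) => {
      const render = renderers.get(block.blockName);
      // Dynamic blocks are rendered; static blocks keep their HTML.
      return render ? render(block.attrs) : block.innerHTML;
    })
    .join('');
}

const renderers = new Map([
  ['example/latest', (attrs) => `<ul data-count="${attrs.count}"></ul>`],
]);

const html = renderBlocks(
  [
    { blockName: 'core/paragraph', attrs: {}, innerBlocks: [], innerHTML: '<p>Intro</p>' },
    { blockName: 'example/latest', attrs: { count: 3 }, innerBlocks: [], innerHTML: '' },
  ],
  renderers
);
// html is '<p>Intro</p><ul data-count="3"></ul>'
```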
dmsnell added a commit that referenced this pull request Nov 9, 2018
Since the introduction of the default parser in #8083 we have had a
subtle bug in the parsing which failed when empty attributes (`{}`)
were specified in a block's comment delimiter.

The absence of attributes was fine but _empty_ attributes were a
failure. This is due to using `+?` in the RegExp tokenizer instead of
using `*?` (which allows for no inner content in the JSON string).

This patch updates the quantifier to restore functionality and fix the
bug. This didn't appear in practice because we don't intentionally set
`{}` as the attributes (the serializer drops it altogether) and our
tests didn't catch it for similar reasons.
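
The difference between the two quantifiers can be reproduced with a small JavaScript sketch. The patterns below are simplified stand-ins written for this example, not the tokenizer's actual expression, and `readAttrs` is a hypothetical helper.

```javascript
// Simplified stand-ins for the tokenizer pattern (an assumption for this
// sketch). They differ only in the quantifier inside the JSON group.
const withPlus = /<!--\s+wp:([a-z][a-z0-9\/-]*)\s+({(?:(?!}\s+-->)[^])+?}\s+)?-->/;
const withStar = /<!--\s+wp:([a-z][a-z0-9\/-]*)\s+({(?:(?!}\s+-->)[^])*?}\s+)?-->/;

// Hypothetical helper: returns the parsed attributes, an empty object
// when no attributes are given, or undefined when the delimiter is not
// recognized at all.
function readAttrs(pattern, delimiter) {
  const match = pattern.exec(delimiter);
  if (match === null) {
    return undefined;
  }
  return match[2] ? JSON.parse(match[2]) : {};
}

// Absent attributes: both patterns recognize the delimiter.
readAttrs(withPlus, '<!-- wp:paragraph -->'); // {}
readAttrs(withStar, '<!-- wp:paragraph -->'); // {}

// Empty attributes: `+?` demands at least one character between the
// braces, so the delimiter is not recognized; `*?` allows none.
readAttrs(withPlus, '<!-- wp:paragraph {} -->'); // undefined
readAttrs(withStar, '<!-- wp:paragraph {} -->'); // {}
```

In this sketch the `+?` version cannot fall back to skipping the optional group either, because the literal `{}` still sits between the block name and the closing `-->`.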
@dmsnell dmsnell mentioned this pull request Nov 9, 2018
4 of 4 tasks complete
@mcsf mcsf mentioned this pull request Nov 19, 2018
8 participants