WP_HTML_Block: keep innerHTML as a string #2

mcsf · 2016-12-20T23:25:07Z

HTML blocks are essentially unidentified, unsupported, or simply DIY blocks. As such, we don't want to parse their contents. Instead, we want to preserve the contents verbatim.

<!-- wp:html -->
<div class="custom-stuff">
  <canvas></canvas>
  <p>Look, Ma, canvas!</p>
</div>
<!-- /wp -->

yields

{
  "type": "WP_Block",
  "blockType": "html",
  "innerHTML": "\n<div class=\"custom-stuff\">\n  <canvas></canvas>\n  <p>Look, Ma, canvas!</p>\n</div>\n"
}
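
To make the intended behaviour concrete, here is a minimal sketch of keeping the inner markup as one verbatim string. It is not the parser in this PR: `extractHtmlBlocks` and its regular expression are hypothetical, and it assumes the `<!-- wp:html -->` … `<!-- /wp -->` delimiters appear exactly as in the example above.

```javascript
// Hypothetical helper (not the PR's grammar): collect every wp:html block
// from post content without parsing what is inside it.
function extractHtmlBlocks(postContent) {
  const pattern = /<!-- wp:html -->([\s\S]*?)<!-- \/wp -->/g;
  const blocks = [];
  let match;
  while ((match = pattern.exec(postContent)) !== null) {
    // match[1] is the untouched text between the two delimiters.
    blocks.push({ type: 'WP_Block', blockType: 'html', innerHTML: match[1] });
  }
  return blocks;
}

const samplePost = [
  '<!-- wp:html -->',
  '<div class="custom-stuff">',
  '  <canvas></canvas>',
  '  <p>Look, Ma, canvas!</p>',
  '</div>',
  '<!-- /wp -->',
].join('\n');

const htmlBlocks = extractHtmlBlocks(samplePost);
console.log(JSON.stringify(htmlBlocks[0], null, 2));
```

The point of the sketch is only that `innerHTML` round-trips byte for byte, including the leading and trailing newlines.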

dmsnell · 2016-12-20T23:48:47Z

@mcsf I'm not sure how I feel about this approach which ends up doing more application-level parsing. It makes me nervous about feature creep and bloat, though I'm not diametrically opposed to it.

The parser (or maybe we should more aptly call it a lexer at this point) already handles html blocks and provides an output that's easy to process.

{
	type: 'block',
	blockType: 'html',
	value: '…'
}

mtias · 2016-12-21T15:53:12Z

@dmsnell isn't the current parser trying to break the html block into children, etc?

      {
        "type": "HTML_Tag",
        "name": "div",
        "attrs": {
          "class": "custom-stuff"
        },
        "startText": "<div class=\"custom-stuff\">",
        "endText": "</canvas>",
        "children": [
          {
            "type": "Text",
            "value": "\n  "
          },
          {
            "type": "HTML_Tag_Open",
            "name": "canvas",
            "attrs": {},
            "text": "<canvas>"
          }
        ]
      }

I was thinking we would parse HTML blocks as a single string value.

dmsnell · 2016-12-21T16:41:29Z

@mtias yeah we're parsing inside of the blocks and allowing for nested blocks. In Slack @mcsf and I discussed something I did with the Simplenote parser which I plan to add here, which is a rawContent property on each node containing the fully-contained string inside the entity. This would provide for us both a nested parse and quick access to the plain-text string.
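
A rough sketch of what a rawContent property could look like, for illustration only. It assumes each node records the `start` and `end` offsets of its inner content in the source string; those two fields and the `attachRawContent` helper are hypothetical and are not taken from this parser or from the Simplenote one.

```javascript
// Hypothetical: node.start / node.end are the [start, end) offsets of the
// node's inner content within the original source string.
function attachRawContent(node, source) {
  const withRaw = { ...node, rawContent: source.slice(node.start, node.end) };
  if (Array.isArray(node.children)) {
    withRaw.children = node.children.map((child) => attachRawContent(child, source));
  }
  return withRaw;
}

const markup = '<div><p>Hi</p></div>';
const parsedTree = {
  type: 'HTML_Tag', name: 'div', start: 5, end: 14,
  children: [
    {
      type: 'HTML_Tag', name: 'p', start: 8, end: 10,
      children: [{ type: 'Text', value: 'Hi', start: 8, end: 10 }],
    },
  ],
};

const annotated = attachRawContent(parsedTree, markup);
console.log(annotated.rawContent);             // <p>Hi</p>
console.log(annotated.children[0].rawContent); // Hi
```

With something along these lines a consumer could walk `children` when it wants structure and read `rawContent` when it only wants the string.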

nb · 2016-12-21T17:22:12Z

Whether we pre-parse it depends very much on what we plan to do with the HTML fragment. Is it used as the ultimate source of truth? Is it used as a cache for the rendered component? Or sometimes one, sometimes the other? Do we need to parse it back to structured data? What should the experience of a developer writing a block be?

dmsnell · 2016-12-21T19:58:21Z

@nb good question. personally I like to consider the stored post content as the serialized form of the data, which happens to use a bulky and displayable syntax - HTML.

at this point I wouldn't see much reason why we couldn't also enforce a certain wellformedness in that HTML to make things smooth with the experience. the developer's job would be to guarantee that nothing is funny about the HTML he or she chooses to serialize.

still, I think that having the structure is more valuable than having the raw text. if I were writing some block I would prefer to check something like block.children.containsBlock( 'caption' ) than to do more parsing client-side. in other words, I would hope that by the time this hits the "editor" of the block then the HTML would vanish and only the data would be leftover
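
As an illustration of the kind of structural check meant here, a minimal sketch follows. `containsBlock` is hypothetical; it is written as a plain function over a children array rather than a method, and it assumes block nodes carry `type: 'WP_Block'` and a `blockType` as in the JSON earlier in this thread.

```javascript
// Hypothetical helper: does any node in this subtree represent a block of
// the given type? Searches nested children recursively.
function containsBlock(children, blockType) {
  return children.some(
    (node) =>
      (node.type === 'WP_Block' && node.blockType === blockType) ||
      containsBlock(node.children || [], blockType)
  );
}

// Made-up tree for an image block that wraps a caption block.
const imageBlock = {
  type: 'WP_Block',
  blockType: 'image',
  children: [
    { type: 'Text', value: '\n' },
    {
      type: 'HTML_Tag',
      name: 'figure',
      children: [{ type: 'WP_Block', blockType: 'caption', children: [] }],
    },
  ],
};

console.log(containsBlock(imageBlock.children, 'caption')); // true
console.log(containsBlock(imageBlock.children, 'gallery')); // false
```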

nb · 2016-12-22T09:10:55Z

> at this point I wouldn't see much reason why we couldn't also enforce a certain wellformedness

What happens if somebody edits the HTML by hand?

> still, I think that having the structure is more valuable than having the raw text. if I were writing some block I would prefer to check something like block.children.containsBlock( 'caption' )

Does this happen on the server-side?

> personally I like to consider the stored post content as the serialized form of the data

This decision has huge implications on both developer and user flows. If the HTML fragment is not the ultimate source of truth what happens if a user in a legacy editor changes it? Do we reject their changes, do we warn them, do we disallow editing in a block-enabled editor?

mcsf · 2016-12-29T09:10:54Z

> Does this happen on the server-side?

When would it have to? If all the changes to a post are persisted by way of generating HTML from the blocks node tree and saving that HTML to post_content, the front-end would work as it always has (at least with static blocks). What other scenario would require server-side intervention? (The whole point of the block-aware editor is that it is the one responsible for the HTML, thus client-side.)
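
For illustration, generating HTML from the blocks node tree could look roughly like the sketch below. `serializeNode` is hypothetical; it borrows the `innerHTML`, `startText`, `endText`, and `children` field names from the samples earlier in this thread and assumes the `<!-- wp:TYPE -->` … `<!-- /wp -->` delimiters from the PR description.

```javascript
// Hypothetical serializer: turn a node tree back into the HTML string that
// would be saved to post_content.
function serializeNode(node) {
  const inner = () => (node.children || []).map(serializeNode).join('');
  switch (node.type) {
    case 'Text':
      return node.value;
    case 'HTML_Tag':
      return node.startText + inner() + node.endText;
    case 'WP_Block': {
      // Blocks that kept their contents as a string are emitted verbatim.
      const body = typeof node.innerHTML === 'string' ? node.innerHTML : inner();
      return `<!-- wp:${node.blockType} -->${body}<!-- /wp -->`;
    }
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
}

const rawHtmlBlock = { type: 'WP_Block', blockType: 'html', innerHTML: '\n<p>hi</p>\n' };
console.log(serializeNode(rawHtmlBlock));
```

Everything in this sketch runs in the editor, which matches the point that static blocks need no server-side step.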

> This decision has huge implications on both developer and user flows. If the HTML fragment is not the ultimate source of truth what happens if a user in a legacy editor changes it? Do we reject their changes, do we warn them, do we disallow editing in a block-enabled editor?

That was the main decision from the start: the HTML definitely is the source of truth, for all sorts of compatibility (back-, forward-) reasons, to properly and minimally degrade.

nb · 2016-12-29T14:31:27Z

> That was the main decision from the start: the HTML definitely is the source of truth, for all sorts of compatibility (back-, forward-) reasons, to properly and minimally degrade.

Here’s a challenge – is there a way to retain most of the user properties without sacrificing developer experience?

If we think of a component as data → view, with data as the source of truth, then making the view the source of truth in this case is an odd choice. Now, a block (component) has a few more moving parts:

data → view
view → data – here things get super interesting. Historically, parsing HTML has been a tough job. Even if we assume (wrongly) that it will be well-formed, we all saw how developers’ aversion to XML and love of simpler formats like JSON changed the API/config landscapes overnight.

For me, a developer writing the code for a new block, writing a view → data mapper seems a lot of hard and error-prone work involving too much skill and thinking. Especially if I am writing something more complicated than blockquote, for example a contact form. Even with the blockquote+author example, I have to consider so many options – the user adding some HTML before or after the block markup within the borders of the block HTML comment, the user changing the author span to a div, changing everything altogether, while preserving the text. Is it my responsibility to cover all of the cases? Should I try to recover the data out of any markup?
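
To show how brittle such a mapper can be, here is a deliberately strict sketch. The blockquote+author markup it expects, the `author` class, and the `parseQuote` name are all assumptions made up for this illustration; nothing in this thread fixes that markup.

```javascript
// Hypothetical view → data mapper. Assumed canonical markup:
//   <blockquote><p>TEXT</p><span class="author">NAME</span></blockquote>
// Returns null whenever the markup deviates from that shape.
function parseQuote(html) {
  const pattern =
    /^\s*<blockquote>\s*<p>([^<]*)<\/p>\s*<span class="author">([^<]*)<\/span>\s*<\/blockquote>\s*$/;
  const match = pattern.exec(html);
  return match ? { text: match[1], author: match[2] } : null;
}

const canonical =
  '<blockquote><p>Less is more.</p><span class="author">Mies</span></blockquote>';
const handEdited =
  '<blockquote><p>Less is more.</p><div class="author">Mies</div></blockquote>';

console.log(parseQuote(canonical));  // { text: 'Less is more.', author: 'Mies' }
console.log(parseQuote(handEdited)); // null
```

Changing the span to a div is enough to lose the data, so every tolerated variation would need its own handling in the mapper.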

dmsnell · 2017-04-07T10:51:49Z

Closing since this is old and I'm not sure it's relevant anymore… feel free to reopen

WP_HTML_Block: keep innerHTML as a string

904b147

HTML blocks are essentially unidentified, unsupported, or simply DIY blocks. As such, we don't want to parse their contents. Instead, we want to preserve the contents verbatim.

mcsf requested a review from dmsnell December 20, 2016 23:25

dmsnell closed this Apr 7, 2017

dmsnell deleted the add/wp-html-block branch April 7, 2017 10:51
