WP_HTML_Block: keep innerHTML as a string #2

Closed
mcsf wants to merge 1 commit

Conversation

mcsf
Member

@mcsf mcsf commented Dec 20, 2016

HTML blocks are essentially unidentified, unsupported, or simply DIY blocks. As such, we don't want to parse their contents. Instead, we want to preserve the contents verbatim.

<!-- wp:html -->
<div class="custom-stuff">
  <canvas></canvas>
  <p>Look, Ma, canvas!</p>
</div>
<!-- /wp -->

yields

{
  "type": "WP_Block",
  "blockType": "html",
  "innerHTML": "\n<div class=\"custom-stuff\">\n  <canvas></canvas>\n  <p>Look, Ma, canvas!</p>\n</div>\n"
}
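
As a rough illustration of the behaviour described above, the sketch below pulls `wp:html` blocks out of post content and keeps their contents as an untouched string. The function name `parseHtmlBlocks` and the regular expression are assumptions made for this example; the actual change in this PR lives in the parser grammar and may look quite different.

```javascript
// Hypothetical helper (not the PR's grammar): extract wp:html blocks
// and keep their inner HTML verbatim instead of parsing it into nodes.
function parseHtmlBlocks(postContent) {
  // Delimiters follow the example in the description above.
  const pattern = /<!-- wp:html -->([\s\S]*?)<!-- \/wp -->/g;
  const blocks = [];
  let match;
  while ((match = pattern.exec(postContent)) !== null) {
    blocks.push({
      type: "WP_Block",
      blockType: "html",
      innerHTML: match[1], // kept as a string, never broken into children
    });
  }
  return blocks;
}

const post = [
  "<!-- wp:html -->",
  '<div class="custom-stuff">',
  "  <canvas></canvas>",
  "  <p>Look, Ma, canvas!</p>",
  "</div>",
  "<!-- /wp -->",
].join("\n");

console.log(JSON.stringify(parseHtmlBlocks(post), null, 2));
```

Because the contents are never tokenized, malformed or unusual markup inside the block cannot trip up the parser; it is simply carried through.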

@mcsf mcsf requested a review from dmsnell December 20, 2016 23:25
@dmsnell
Member

dmsnell commented Dec 20, 2016

@mcsf I'm not sure how I feel about this approach which ends up doing more application-level parsing. It makes me nervous about feature creep and bloat, though I'm not diametrically opposed to it.

The parser (or maybe we should more aptly call it a lexer at this point) already handles html blocks and provides an output that's easy to process.

{
	type: 'block',
	blockType: 'html',
	value: '…'
}

@mtias
Member

mtias commented Dec 21, 2016

@dmsnell isn't the current parser trying to break the html block into children, etc?

      {
        "type": "HTML_Tag",
        "name": "div",
        "attrs": {
          "class": "custom-stuff"
        },
        "startText": "<div class=\"custom-stuff\">",
        "endText": "</canvas>",
        "children": [
          {
            "type": "Text",
            "value": "\n  "
          },
          {
            "type": "HTML_Tag_Open",
            "name": "canvas",
            "attrs": {},
            "text": "<canvas>"
          }
        ]
      }

I was thinking we would parse HTML blocks as a single string value.

@dmsnell
Member

dmsnell commented Dec 21, 2016

@mtias yeah we're parsing inside of the blocks and allowing for nested blocks. In Slack @mcsf and I discussed something I did with the Simplenote parser which I plan to add here, which is a rawContent property on each node containing the fully-contained string inside the entity. This would provide for us both a nested parse and quick access to the plain-text string.
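
One way to picture the `rawContent` idea is sketched below. It assumes, purely for illustration, that each parsed node records the `start` and `end` offsets of its inner content in the source string; those field names and the `attachRawContent` helper are hypothetical and are not taken from this parser or from the Simplenote one.

```javascript
// Hypothetical sketch: give every node both its nested children and a
// rawContent string sliced directly from the original source.
// Assumption: nodes carry start/end offsets of their inner content.
function attachRawContent(source, node) {
  return {
    ...node,
    rawContent: source.slice(node.start, node.end),
    children: (node.children || []).map((child) =>
      attachRawContent(source, child)
    ),
  };
}

const source = '<div class="custom-stuff"><p>Look, Ma, canvas!</p></div>';

// Offsets are computed from the source rather than hard-coded.
const tree = {
  type: "HTML_Tag",
  name: "div",
  start: source.indexOf(">") + 1,
  end: source.lastIndexOf("</div>"),
  children: [
    {
      type: "HTML_Tag",
      name: "p",
      start: source.indexOf("<p>") + 3,
      end: source.indexOf("</p>"),
      children: [],
    },
  ],
};

const annotated = attachRawContent(source, tree);
console.log(annotated.rawContent);
console.log(annotated.children[0].rawContent);
```

A consumer that only wants the string reads `rawContent`; one that wants structure walks `children`.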

@nb
Member

nb commented Dec 21, 2016

Whether we pre-parse it depends very much on what the plan is for the HTML fragment. Is it used as the ultimate source of truth? Is it used as a cache for the rendered component? Or sometimes one, sometimes the other? Do we need to parse it back to structured data? What should the experience of a developer writing a block be?

@dmsnell
Member

dmsnell commented Dec 21, 2016

@nb good question. personally I like to consider the stored post content as the serialized form of the data, which happens to use a bulky and displayable syntax - HTML.

at this point I wouldn't see much reason why we couldn't also enforce a certain wellformedness in that HTML to make things smooth with the experience. the developer's job would be to guarantee that nothing is funny about the HTML he or she chooses to serialize.

still, I think that having the structure is more valuable than having the raw text. if I were writing some block I would prefer to check something like block.children.containsBlock( 'caption' ) than to do more parsing client-side. in other words, I would hope that by the time this hits the "editor" of the block, the HTML would vanish and only the data would be left over
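
To make the structured-access idea concrete, here is a minimal sketch of what a `containsBlock` check could look like. It is written as a standalone function rather than a method on `children`, and the node shape (`type`, `blockType`, `children`) is borrowed from the JSON examples earlier in the thread; no such helper exists in the parser as far as this discussion shows.

```javascript
// Hypothetical helper: report whether any node in the list, at any
// depth, is a block of the given type.
function containsBlock(children, blockType) {
  return children.some(
    (node) =>
      (node.type === "WP_Block" && node.blockType === blockType) ||
      containsBlock(node.children || [], blockType)
  );
}

// Illustrative tree: an image block that nests a caption block.
const imageBlock = {
  type: "WP_Block",
  blockType: "image",
  children: [
    { type: "Text", value: "\n" },
    {
      type: "WP_Block",
      blockType: "caption",
      children: [{ type: "Text", value: "A caption" }],
    },
  ],
};

console.log(containsBlock(imageBlock.children, "caption"));
console.log(containsBlock(imageBlock.children, "gallery"));
```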

@nb
Member

nb commented Dec 22, 2016

> at this point I wouldn't see much reason why we couldn't also enforce a certain wellformedness

What happens if somebody edits the HTML by hand?

> still, I think that having the structure is more valuable than having the raw text. if I were writing some block I would prefer to check something like block.children.containsBlock( 'caption' )

Does this happen on the server-side?

> personally I like to consider the stored post content as the serialized form of the data

This decision has huge implications on both developer and user flows. If the HTML fragment is not the ultimate source of truth what happens if a user in a legacy editor changes it? Do we reject their changes, do we warn them, do we disallow editing in a block-enabled editor?

@mcsf
Member Author

mcsf commented Dec 29, 2016

> Does this happen on the server-side?

When would it have to? If all the changes to a post are persisted by way of generating HTML from the blocks node tree and saving that HTML to post_content, the front-end would work as it always has (at least with static blocks). What other scenario would require server-side intervention? (The whole point of the block-aware editor is that it is the one responsible for the HTML, thus client-side.)
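
The client-side flow described here (blocks tree in, HTML out, saved to post_content) could be sketched roughly as follows. The function names are hypothetical, and the sketch assumes blocks carry their contents as an `innerHTML` string and use the `<!-- wp:TYPE -->` / `<!-- /wp -->` delimiters from the example at the top of this PR.

```javascript
// Hypothetical serializer: turn a list of blocks back into the HTML
// string that would be saved to post_content.
function serializeBlock(block) {
  return `<!-- wp:${block.blockType} -->${block.innerHTML}<!-- /wp -->`;
}

function serializePost(blocks) {
  return blocks.map(serializeBlock).join("\n");
}

const editedBlocks = [
  { type: "WP_Block", blockType: "html", innerHTML: "\n<p>hi</p>\n" },
  { type: "WP_Block", blockType: "html", innerHTML: "\n<canvas></canvas>\n" },
];

console.log(serializePost(editedBlocks));
```

Under these assumptions the server only stores and prints the resulting string, which is why static blocks would need no server-side parsing.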

> This decision has huge implications on both developer and user flows. If the HTML fragment is not the ultimate source of truth what happens if a user in a legacy editor changes it? Do we reject their changes, do we warn them, do we disallow editing in a block-enabled editor?

That was the main decision from the start: the HTML definitely is the source of truth, for all sorts of compatibility (back-, forward-) reasons, to properly and minimally degrade.

@nb
Member

nb commented Dec 29, 2016

> That was the main decision from the start: the HTML definitely is the source of truth, for all sorts of compatibility (back-, forward-) reasons, to properly and minimally degrade.

Here’s a challenge – is there a way to retain most of the user properties without sacrificing developer experience?

If we think of a component as data → view, with data as the source of truth, making view the source of truth in this case is an odd choice. Now, a block (component) has a few more moving parts:

  • data → view
  • view → data – here things get super interesting. Historically, parsing HTML has been a tough job. Even if we assume (wrongly) that it will be well-formed, we all saw how developers’ aversion to XML and love of simpler formats like JSON changed the API/config landscapes overnight.

For me, a developer writing the code for a new block, writing a view → data mapper seems like a lot of hard and error-prone work that demands too much skill and thought, especially if I am writing something more complicated than blockquote, for example a contact form. Even with the blockquote+author example, I have to consider so many options: the user adding some HTML before or after the block markup within the borders of the block HTML comment, the user changing the author span to a div, or changing everything altogether while preserving the text. Is it my responsibility to cover all of the cases? Should I try to recover the data out of any markup?
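
The fragility described above can be seen in even a small view → data mapper. The sketch below is hypothetical: the markup shape (a `p` for the text and a `span class="author"` for the author) and the function name `quoteFromHtml` are assumptions chosen for illustration, not a format proposed in this thread.

```javascript
// Hypothetical view -> data mapper for a blockquote+author block.
// It only understands one exact markup shape and gives up otherwise.
function quoteFromHtml(innerHTML) {
  const match = innerHTML.match(
    /^\s*<blockquote>\s*<p>([\s\S]*?)<\/p>\s*<span class="author">([\s\S]*?)<\/span>\s*<\/blockquote>\s*$/
  );
  if (!match) {
    return null; // markup no longer matches; the data cannot be recovered
  }
  return { text: match[1], author: match[2] };
}

const expected =
  '<blockquote><p>Hello</p><span class="author">A. Writer</span></blockquote>';
const spanToDiv =
  '<blockquote><p>Hello</p><div class="author">A. Writer</div></blockquote>';
const extraMarkup = "<p>intro</p>" + expected;

console.log(quoteFromHtml(expected));
console.log(quoteFromHtml(spanToDiv));
console.log(quoteFromHtml(extraMarkup));
```

Only the first input maps back to data; the two hand-edited variants return `null`. Covering them would mean adding more patterns or a real HTML parser, which is the extra work the comment is worried about.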

@dmsnell
Member

dmsnell commented Apr 7, 2017

Closing since this is old and I'm not sure it's relevant anymore… feel free to reopen

@dmsnell dmsnell closed this Apr 7, 2017
@dmsnell dmsnell deleted the add/wp-html-block branch April 7, 2017 10:51