Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Codeblock upcast converter prevents from loading markers inside codeblock #9402

Closed
scofalik opened this issue Apr 1, 2021 · 1 comment · Fixed by #9440
Closed

Codeblock upcast converter prevents from loading markers inside codeblock #9402

scofalik opened this issue Apr 1, 2021 · 1 comment · Fixed by #9440
Assignees
Labels
package:code-block type:bug This issue reports a buggy (incorrect) behavior.

Comments

@scofalik
Copy link
Contributor

scofalik commented Apr 1, 2021

📝 Provide detailed reproduction steps (if any)

  1. Open an editor with comments or track changes feature.
  2. Create a code block.
  3. Create a comment or suggestion inside the code block.
  4. Save & load the editor data (e.g. call editor.setData( editor.getData() ) ).

✔️ Expected result

Markers are retained.

❌ Actual result

Markers are gone.

📃 Other details

Additionally, it is impossible to upcast any "general" attributes that are set on <code> element /cc @Mgsy.

Both problem happen because on upcast, converter for <pre> takes care of everything and it consumes <code> element and does not fire conversion for its children. It is different than in image, where there is a converter for <figure> but it fires conversion for <img> and <img> itself has its own converter.

Additionally, the code block content is also converted in a custom way, to remove HTML tags. This is another blocker that prevents proper marker upcasting.

AFAIU (not tested), HTML coming from outside of the editor, which does not have <pre> will not be properly upcasted as well.

The solution here is to:

  • Provide converter for <code> that will mostly do what converter for <pre> is doing at the moment. Converter for <pre> should actually do almost nothing - see converter for <figure> in image.
  • Do not provide special conversion for text. If you want to "remove HTML", simply do that on model level with Schema. Actually, AFAICS, this is already done like that so I don't know why we have additional cleaning in the upcast converter. Maybe it is not needed?
@scofalik scofalik added type:bug This issue reports a buggy (incorrect) behavior. package:code-block labels Apr 1, 2021
@Mgsy Mgsy added domain:dx This issue reports a developer experience problem or possible improvement. squad:dx and removed domain:dx This issue reports a developer experience problem or possible improvement. squad:dx labels Apr 1, 2021
@scofalik
Copy link
Contributor Author

scofalik commented Apr 1, 2021

I did some quick checks in regards to this issue as it becomes PITA for me. For tl;dr skip to the bolded sentence at the end.

Below, I describe two ideas, from which the final idea emerges. I leave the original ideas for entertainment and learning purposes, and also because this whole subject needs a little more research and manual tests (and coding), so I think it is good to have the whole thought process written down.


The following snippet enables "back" the default text conversion (and conversion of other children, such as markers):

This fixes my problems (i.e. markers now can be set in the middle of the code block) and also makes the code simpler.

Setting the following data works correctly (there are no attributes on text):

editor.setData( '<pre><code class="language-plaintext">Foo<strong>bar</strong>baz<a href="#">xyz</a></code></pre>' );

However, this scenario from tests fail:

editor.setData( '<pre><code><p>Foo</p>\n<p>Bar</p></code></pre>' );

Because paragraphs break <codeBlock> element. So, if this is a real case we have a problem and we need to be smarter here. Surely, we cannot simply omit all the tags. This won't work correctly for markers and maybe other features that we introduce in the future (HTML comment?)

First idea is to try converting item-by-item in a different context (where everything's allowed) and then get text data and markers from it. Or, even better, get all the items that are allowed in the <codeBlock> element (more future-proof). Here's a quick snippet:

const holder = conversionApi.writer.createElement( '$clipboardHolder' );
conversionApi.convertChildren( viewChild, holder );

Array.from( conversionApi.writer.createRangeIn( holder ).getItems() ).forEach( item => {
    if ( item.is( '$textProxy' ) || item.is( 'element', '$marker' ) ) {
        writer.insert( item, codeBlock, 'end' );
    }
} );

With that snippet, the following worked as expected:

editor.setData( '<pre><code class="language-plaintext"><p>A<suggestion-start name="deletion:e7fed60586612bd21533e2cb5b8bcd4c8:u1"></suggestion-start>B</p>C<suggestion-end name="deletion:e7fed60586612bd21533e2cb5b8bcd4c8:u1"></suggestion-end>D</code></pre>' )

Three more notes here.

First, as I wrote, using schema might be more future-proof.

Second, if there is something like:

<pre><code><allowed-in-code>Foo</allowed-in-code>Bar</code></pre>

It won't work correctly (<allowed-in-code> will be inserted but then Foo text will be inserted too).

Third, there might be actually attributes on text which are allowed everywhere, even in code block (real case). So some smart attributes stripping would be needed.

The other idea was to set codeBlock as a limit object in schema. With this setting, elements and markers inside <code> worked correctly too (with just using conversionApi.convertChildren( viewChild, codeBlock );) but other problem appeared - double enter at the end of the code block would convert the whole code block into paragraphs instead of escaping it. I am not sure why this happened and what other problems may be caused by setting isLimit: true for codeBlock. This needs further research, although at least it would prevent splitting code blocks by not-allowed elements without introducing new code.

But then I decided to get the best of both worlds.

Instead of using $clipboardHolder as the conversion context, let's use the codeBlock element. But, wait, we did that and <p> splits it... Well, let's convert the children before adding the codeBlock to the "result" (before safeInsert()). This way the code block element will be the root of its tree and it will not be possible to split it. It will work like limit element.

So, we still have a proper conversion, based on default converters and on schema. All elements and attributes that are not allowed should be removed. Most importantly, allowed elements and attributes (on text too) should be left. The final solution at this point is:

-               // HTML elements are invalid content for `<code>`.
-               // Read only text nodes.
-               const textData = [ ...editingView.createRangeIn( viewChild ) ]
-                       .filter( current => current.type === 'text' )
-                       .map( ( { item } ) => item.data )
-                       .join( '' );
-               const fragment = rawSnippetTextToModelDocumentFragment( writer, textData );
-
-               writer.append( fragment, codeBlock );
+               conversionApi.convertChildren( viewChild, codeBlock );

Of course, this will not work correctly in such edge case:

<X><pre><code><Y>Foo</Y></code></pre></X>

If <Y> is allowed in <code> but not allowed in <X> (<Y> will not be removed). I think this is quite an edge case. If we want to care about that, I guess we need to research "idea 2" -- setting is limit on code element.

Last two notes:

  1. I removed rawSnippetTextToModelDocumentFragment() which is mostly responsible for changing \n to <softBreak> element. But we can post-process it after conversion, it will be relatively simple.
  2. This post only concerns converting children of <code> element - please remember that there is another problem - conversion of the <code> element itself. It should have its own converter and it should be possible to convert its attributes (e.g. markers are sometimes represented as attributes and they are removed now).

@maxbarnas maxbarnas assigned maxbarnas and unassigned maxbarnas Apr 8, 2021
scofalik added a commit that referenced this issue Apr 13, 2021
Fix (code-block): Markers created in/on code block element are now preserved after the document is loaded. Closes #9402.
@scofalik scofalik added this to the iteration 42 milestone Apr 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package:code-block type:bug This issue reports a buggy (incorrect) behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants