Content initializes incorrectly with specific init HTML #1580

Mgsy · 2019-02-27T13:52:25Z

Is this a bug report or feature request? (choose one)

🐞 Bug report

💻 Version of CKEditor

Latest master.

📋 Steps to reproduce

Prepare the HTML document with the following content:

<div id="editor">
        <blockquote>
           <ul>
              <li>list Item</li>
           </ul>
        </blockquote>
        <p>paragraph</p>
</div>

Initialize the editor.

✅ Expected result

The paragraph is outside the blockquote.

❎ Actual result

The paragraph is inside the blockquote.

📃 Other details that might be useful

JSFiddle - https://jsfiddle.net/7n2oaurg/

It loads properly if the init HTML is wrapped within div:

<div id="editor">
        <blockquote>
           <div>
              <ul>
                 <li>list Item</li>
              </ul>
           </div>
        </blockquote>
        <p>paragraph</p>
</div>

The text was updated successfully, but these errors were encountered:

scofalik · 2019-02-27T13:55:53Z

Can you check if this is a regression?

Mgsy · 2019-02-27T14:05:01Z

It's a regression and it doesn't occur in v11.1.0.

Reinmar · 2019-02-27T15:30:12Z

Simpler scenario:

<blockquote>
    foo
</blockquote>
bar

Or:

<blockquote>
    foo
</blockquote>
<p>bar</p>

In both cases "bar" gets merged into the blockquote.

This is a regression introduced in v11.2.0. You can reproduce it on the current docs website:

scofalik · 2019-03-20T16:05:09Z

Quick summary: the original problem does not occur anymore. It was connected with lists converter and (incorrect) data.modelRange set in that converter. This was already fixed.

There is still a problem with scenarios added by @Reinmar. Without debugging, I am 90% sure it is because wrong data.modelRange in autoparagraphing (I think I was already signaling this problem when I was working on lists converter). We need to take care that data.modelRange is set properly in autoparagraph converter and we should check the default converters too.

BTW.:

<blockquote>
    <div>Foo</div>
</blockquote>
<p>Bar</p>

Produces incorrect editor data too (this is the other "mode" of autoparagraphing - changing unrecognized elements into s).

scofalik · 2019-03-20T16:05:22Z

I think we can close #1581 as DUP.

scofalik · 2019-03-22T10:19:52Z

Below are the results of my research on this subject. As I said earlier, the problem is with data.modelCursor during upcasting. data.modelRange is easier to manage. Below, when I write data I mean the object with modelCursor and modelRange properties.

First of all, some facts:

In engine functions, data is generated and returned by _convertItem and _convertChildren. The use is generating some starting values and passing around values received from converters.
We do not have strict rules on how modelCursor and modelRange should be exactly set, other than:
- modelCursor - next converted item should be inserted there,
- modelRange - here is the converted element (might have been split by its children).
Some converters are using modelCursor though in a way which assumes that modelCursor is set in a certain way. This is the default converter (from helpers) and also the autoparagraph converter.

I've made some quick research, I took <a><c></c></a> and made six scenarios checking where modelCursor should be placed for b and c converters, assuming that:

b splits / does not split a,
c splits / does not split b,
c splits / does not split a.

The thing is straightforward: an average converter should always set modelCursor in the original parent, after the converted element. If the original parent was split by the converted element, modelCursor should be at the beginning of the split part.

This is also intuitive but having such a straight rule also hinders some flexibility. It was also implemented like this... kind of. The implementation is not straightforward, functions are scattered between multiple files and you need to think about converters people might add. But most importantly, generating modelCursor in the current implementation assumes that all other converters follow the same rule.

Enter auto-paragraphing.

Auto-paragraphing uses modelCursor to keep adding auto-paragraphed content to the same paragraph. It looks like this:

<blockquote>
    foo <strong>bar</strong>
</blockquote>

When foo is converted, it cannot be inserted into <blockQuote> so <paragraph> is created and foo is inserted there. Then bar is converted. We don't want to end up with two paragraphs, we want both those text nodes to be inserted into one paragraph:

<blockQuote>
    <paragraph>
        foo <$text bold=true>bar</$text>
    </paragraph>
</blockQuote>

(Instead of <paragraph>foo</paragraph><paragraph><$text bold=true>bar</$text></paragraph>.)

To achieve this, auto-paragraphing sets modelCursor inside the new <paragraph>. But this is against the rules! The rule was that modelCursor should be set in the parent of the converted children, which is <blockQuote>, not <paragraph>.

Because of that, this part of code in the default upcast converter fails:

const childrenResult = conversionApi.convertChildren( data.viewItem, conversionApi.writer.createPositionAt( modelElement, 0 ) );

data.modelRange = new ModelRange(
	conversionApi.writer.createPositionBefore( modelElement ),
	conversionApi.writer.createPositionAfter( childrenResult.modelCursor.parent )
);

if ( splitResult.cursorParent ) {
	data.modelCursor = conversionApi.writer.createPositionAt( splitResult.cursorParent, 0 );
} else {
	data.modelCursor = data.modelRange.end;
}

As you can see it relies on the fact that modelCursor should be correctly placed:

conversionApi.writer.createPositionAfter( childrenResult.modelCursor.parent )

So we have a problem because we cannot have both solutions together implemented as they are.

My first idea was to simply use LivePosition to follow where the modelCursor should be set, instead of relying on what is returned by convertChildren(). Unfortunately, it appeared to me that LivePositions cannot be used during upcast because the conversion takes place in a detached model.DocumentFragment.

So my other idea was to store "offset from the end of the parent" and the parent node and simulate LivePosition this way. Unfortunately, it also failed because sometimes a child of the converted element breaks the parent of the converted element:

<blockquote>
    <p>
        foo
        <img src="abc.jpg" />
        bar
    </p>
</blockQuote>

In this scenario, I cannot simply remember <blockQuote> when <paragraph> children are converted, because it gets split and the reference is no longer valid. There's a test for that in the image plugin that failed after I implemented this one.

So, mid-conclusion is this:

I need something that updates automatically if changes are done in a different place of code (different converter). Something like bookmark element would do, but it is stiiiinky, or
I need some more data coming from other converters along with modelCursor and modelRange, or
I need to rely on modelCursor.

At the moment I fixed the issue in the auto-paragraphing. I set modelCursor according to the rules. In order to convert all texts into one paragraph, I store paragraphs created during auto-paragraphing and if a text is converted next to such a paragraph, instead of creating a paragraph, I use the existing one. It seems to work.

However, I am not sure about this solution. I don't like it that I have such a strict rule on modelCursor setting, even though it is correct in 99% of cases. It gives away the flexibility. So, for me, this is an open discussion at this point.

cc @pjasiun @Reinmar @mlewand

scofalik · 2019-03-22T12:08:20Z

After a lengthy discussion with @pjasiun here's what we decided on. I think we agreed on a solution with the best advantages/drawbacks ratio :).

We like the flexibility of data.modelCursor and don't want to make it fixed by a rule.
We agree that we are missing data in upcasting. In an element converter, after its children are converted, we don't get any data if and how the element was split. We can only use heuristics and/or rely on data.modelRange or data.modelCursor. Exact data would be helpful.

The first idea was to introduce data.splitParentCopy = splitResult.cursorParent || null;. This would be easy to introduce. A converter would set it whenever conversionApi.splitToAllowedParent() is used and the parent could read it to set modelCursor and modelRange properly. This idea had two disadvantages:

you need to remember to set it (and it could be automated as it is always the same value),
there's no information if the parent was split multiple times (foo<img />bar<img />baz).

So instead we decided to integrate splitting more tightly in upcasting and UpcastDispatcher:

The information about split element and copies will be stored when you use conversionApi.splitToAllowedParent().
We will introduce conversionApi.getSplitParts( element ) which will return the original element and all the split parts.
The information will be cleared after conversion is done.

This way there's less to set in data object and we have access to all the split parts.

At first, I thought that we won't need the "middle" split parts and we will always use the last one but then I remembered the problems with data.modelRange. I remembered that I proposed that data.modelRange could be an array with multiple ranges in case if the original element was split. That would have helped us to more precisely operate on the converted and split element. With conversionApi.getSplitParts() this problem will be also solved.

scofalik · 2019-03-22T12:27:59Z

Related: https://github.com/ckeditor/ckeditor5-engine/issues/1672, https://github.com/ckeditor/ckeditor5-engine/issues/1662

Feature: Introduced `UpcastConversionApi#getSplitParts`. Also, provided a way to set upcast conversion helper fired for every view element. Closes ckeditor/ckeditor5#1580. Closes ckeditor/ckeditor5#1581.

Mgsy added type:bug This issue reports a buggy (incorrect) behavior. status:confirmed labels Feb 27, 2019

scofalik self-assigned this Feb 27, 2019

Mgsy mentioned this issue Feb 27, 2019

Content is incorrect after pasting specific HTML #1581

Closed

Reinmar added this to the iteration 23 milestone Feb 27, 2019

Reinmar added the must-have label Mar 7, 2019

pjasiun closed this as completed in ckeditor/ckeditor5-engine#1704 Mar 26, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Content initializes incorrectly with specific init HTML #1580

Content initializes incorrectly with specific init HTML #1580

Mgsy commented Feb 27, 2019 •

edited

scofalik commented Feb 27, 2019

Mgsy commented Feb 27, 2019

Reinmar commented Feb 27, 2019

scofalik commented Mar 20, 2019

scofalik commented Mar 20, 2019

scofalik commented Mar 22, 2019 •

edited

scofalik commented Mar 22, 2019 •

edited

scofalik commented Mar 22, 2019

Content initializes incorrectly with specific init HTML #1580

Content initializes incorrectly with specific init HTML #1580

Comments

Mgsy commented Feb 27, 2019 • edited

Is this a bug report or feature request? (choose one)

💻 Version of CKEditor

📋 Steps to reproduce

✅ Expected result

❎ Actual result

📃 Other details that might be useful

scofalik commented Feb 27, 2019

Mgsy commented Feb 27, 2019

Reinmar commented Feb 27, 2019

scofalik commented Mar 20, 2019

scofalik commented Mar 20, 2019

scofalik commented Mar 22, 2019 • edited

scofalik commented Mar 22, 2019 • edited

scofalik commented Mar 22, 2019

Mgsy commented Feb 27, 2019 •

edited

scofalik commented Mar 22, 2019 •

edited

scofalik commented Mar 22, 2019 •

edited