Autoparagraphing strips all text attributes #3294

Reinmar · 2016-12-19T16:34:15Z

This test won't pass currently:

	it( 'should autoparagraph text with attributes', () => {
		doc.schema.allow( { name: '$inline', attributes: [ 'bold' ] } );
		buildViewConverter().for( editor.data.viewToModel )
			.fromElement( 'b' )
			.toAttribute( 'bold', true );

		const modelFragment = editor.data.parse( 'foo<b>bar</b>bom' );

		expect( stringifyModel( modelFragment ) )
			.to.equal( '<paragraph>foo<$text bold="true">bar</$text>bom</paragraph>' );
	} );

The reason is that the converter for the <b> element will receive a <paragraph> created automatically for the text inside it. Then, most likely, it will find that the paragraph can't have the bold attribute and abort.

Perhaps a reasonable solution would be if the callback created by buildViewConverter used something like https://github.com/ckeditor/ckeditor5-core/blob/master/src/command/helpers/getschemavalidranges.js (actually – there's a ticket to move this method to the engine – https://github.com/ckeditor/ckeditor5-core/issues/14 – so it can be exactly this function) and applied the attribute to every item in the ranges. It may be a reasonable solution considering cases like pasting HTML such as:

<strong>
  <p>x<img>y</p>
</strong>

Thanks to that deep "application" all text nodes inside the paragraph would get the bold attribute. I emphasised "text nodes" because the image's attribute should be handled by its own converter.


scofalik · 2017-04-13T11:15:05Z

> Perhaps a reasonable solution would be if the callback created by buildViewConverter used something like https://github.com/ckeditor/ckeditor5-core/blob/master/src/command/helpers/getschemavalidranges.js

I am not convinced. If converter gets input that it should convert attribute A on element E it should do exactly this. Not only text nodes may have attributes after all. I know that attribute A will be probably valid only on certain elements but this sounds like asking for trouble. It looks like too broad and deep solution for this problem.

> <strong>
>  <p>x<img>y</p>
> </strong>
> Thanks to that deep "application" all text nodes inside the paragraph would get the bold attribute. I emphasised "text nodes" because the image's attribute should be handled by its own converter.

And now you ask converter to be able to convert incorrect model (or rather deal with the situation of incorrect HTML)?

My opinion is that converter should do exactly what it is asked to do and it should get correct model to convert. Everything else should be fixed before.

Edit:
I'd rather make a fixer that checks whether inline element (<strong>) contains container element (<p>) and if so, remove inline element and make it a parent of every child element in container element. Do this deeply as long as structure is incorrect. This should preferably happen on view, during conversion from DOM (or as a callback in appropriate place, I am not sure exactly where)

<strong>
  <div>
    abc
   <p>
      foo
      <img />
      bar
    </p>
    xyz
  </div>
</strong>

<div>
  <strong>abc</strong>
  <p>
    <strong>foo</strong>
    <strong><img /></strong> <!-- bold will not be applied if not allowed in schema -->
    <strong>bar</strong>
  </p>
  <strong>xyz</strong>
</div>
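
A minimal sketch of the fixer described above, written against a hypothetical plain-object tree (a node is a string or { name, children }) rather than the real view API. The INLINE and CONTAINERS lists, and the function names, are assumptions made for illustration only.

```javascript
// Hypothetical plain-object tree, not the CKEditor 5 view API:
// a node is either a string (text) or { name, children }.
const INLINE = new Set( [ 'strong', 'b', 'em', 'i' ] ); // assumed list
const CONTAINERS = new Set( [ 'div', 'p' ] ); // assumed list

const isContainer = node => typeof node == 'object' && CONTAINERS.has( node.name );

// Re-applies the inline element around every non-container descendant.
function wrapChildren( children, inlineName ) {
	return children.map( child => {
		if ( isContainer( child ) ) {
			return { name: child.name, children: wrapChildren( child.children, inlineName ) };
		}

		return { name: inlineName, children: [ child ] };
	} );
}

// Returns an array of nodes that replaces `node`. Children are fixed first,
// so nested inline elements are pushed down from the inside out.
function fixInlineAroundContainer( node ) {
	if ( typeof node == 'string' ) {
		return [ node ];
	}

	const children = node.children.flatMap( child => fixInlineAroundContainer( child ) );

	if ( INLINE.has( node.name ) && children.some( isContainer ) ) {
		return wrapChildren( children, node.name );
	}

	return [ { name: node.name, children } ];
}

// Serializer used only to inspect the result.
function stringify( node ) {
	if ( typeof node == 'string' ) {
		return node;
	}

	return `<${ node.name }>${ node.children.map( stringify ).join( '' ) }</${ node.name }>`;
}
```

Whether <img> keeps the bold wrapper is left to later conversion, as in the example above; the sketch only moves the inline element below the containers.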

I think it's reasonable that when taking external DOM, we should strive to generate view structure similar to view structure that would be generated by our model->view conversion.

Anyway, this is not related to original issue. We should create separate issue for this, but I am not sure where - whether this is more connected with DataController or clipboard?

Edit2:
This could be done on the viewCleanup event of ViewConversionDispatcher if a better place is not found.

scofalik · 2017-04-13T13:03:30Z

Back to the issue - kinda...

I have a new idea for autoparagraphing. It seems conceptually wrong that the auto-paragraph is added exactly in place of autoparagraphed text, as its direct parent. It feels like we should first check where the paragraph should be placed.

Here is proposed solution:

1. On text conversion, check if the text could be allowed in given context if it was in paragraph (same as now).
2. If so, convert the text, but don't add a new paragraph.
3. (Optionally) If last item in data.context is a model.Element mark it.
4. On each element conversion, after element is converted (so we have data.output) perform* autoparagraphing of its children - find all text nodes and put them in paragraphs (btw. note that we can get rid of merging autoparagraphs).
5. On each document fragment conversion, after it is converted, perform autoparagraphing of its children.

* This has to be done conditionally, of course. If we were marking elements in step 3, autoparagraph only if the element was marked, so we won't, for example, put <paragraph> in <heading>. If we weren't marking, check if text is allowed directly in this element. If not, autoparagraph. Both approaches have their merits, but we should be fine without step 3.
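
A rough sketch of the post-conversion steps (the variant without marking), using a hypothetical plain-object model instead of the engine's classes. The ALLOWS_TEXT set stands in for a schema check and the function name is made up for this example. The sketch also skips the check that a paragraph is allowed in the parent.

```javascript
// Hypothetical plain-object model, not the CKEditor 5 engine API:
// a text node is { text, attributes? }, an element is { name, children }.
const isText = node => typeof node.text == 'string';

// Assumed stand-in for the schema: elements that may contain text directly.
const ALLOWS_TEXT = new Set( [ 'paragraph', 'heading1' ] );

// After an element or a document fragment has been converted, wrap each run
// of adjacent text nodes in one paragraph, unless text is allowed directly
// in the parent. Attributes stay on the text nodes untouched.
function autoparagraphChildren( parent ) {
	if ( ALLOWS_TEXT.has( parent.name ) ) {
		return parent;
	}

	const children = [];
	let run = null;

	for ( const child of parent.children ) {
		if ( isText( child ) ) {
			if ( !run ) {
				run = { name: 'paragraph', children: [] };
				children.push( run );
			}

			run.children.push( child );
		} else {
			// Any element ends the current run of text nodes.
			run = null;
			children.push( child );
		}
	}

	return { name: parent.name, children };
}
```

Because a whole run of text nodes lands in a single paragraph, foo<b>bar</b>bom would end up as one paragraph with the bold attribute kept on "bar", and there is nothing to merge afterwards.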

WDYT?

scofalik · 2017-04-13T14:30:04Z

It seems that the above solution would also solve https://github.com/ckeditor/ckeditor5-paragraph/issues/11. However, we would have to somehow recognize which elements can be autoparagraphed (besides text). If we know that this kind of element can be autoparagraphed, we can treat it like a text node (in steps 1 and 2) and then also autoparagraph it in steps 4 and 5.

Reinmar · 2017-04-13T15:17:23Z

> I have a new idea for autoparagraphing. It seems conceptually wrong that the auto-paragraph is added exactly in place of autoparagraphed text, as its direct parent. It feels like we should first check where the paragraph should be placed.

Well, it may help if I explain the story behind this. At first, I was only thinking about what to do with text whose parent cannot be converted. So, e.g. someone pastes <h1>x</h1> but we don't support h1. Somehow we had the idea to focus on handling this in the text. And you're of course right that it didn't make much sense and this issue is a result of that wrong decision.

However, during the course of action, I realised that we can't actually focus on the text anyway because <h1>x<img>y</h1> would result in two paragraphs and perhaps some image between them.

So, I added the second mechanism which focuses on block elements. Currently, I think that it handles most of the job when pasting because, usually, you have the content somehow wrapped within some blocks.

Do we need the first algorithm then? Perhaps not really. The situations which the disallowed blocks handler wouldn't catch are:

When someone pastes <p>x</p>y – but for that insertContent() can autoparagraph too... and, besides, text is allowed in $clipboardHolder so I don't know if this code is actually used in this case anyway.
When someone pastes y – but again – text is allowed in $clipboardHolder, so this isn't needed here. And insertContent() can, if needed, autoparagraph.
When someone loads <p>x</p>y or just y into the editor by setData() – but this means loading incorrect data which wasn't created with the editor. So, we could ignore this situation for now.
When someone loads empty data to the editor.

So, correct me if I'm wrong, but perhaps all this is just unnecessary?

scofalik · 2017-04-13T15:24:20Z

Spoiler alert, since you commented: I have a feeling that multiple issues connected with handling incorrect data exist because there wasn't any global idea of what the process should look like; instead there are just fixes for single problems. I am commenting here and will also comment in other issues, but after that we need a talk about all of this :).

scofalik · 2017-04-13T15:32:02Z

☝️

Don't be sad, you should best know that it's rarely possible to cover all the bases with the first implementation :).

scofalik · 2017-04-14T08:31:44Z

> Do we need the first algorithm then? Perhaps not really.

I can check if this helps in the current implementation, because I also have a feeling that the first (current) algorithm might not be needed for now.

But current implementation seems wrong anyway, because unrecognized elements are treated as <p> for all of their children.

BTW. Now that I think of it, we actually don't need to use such a complicated algorithm. Maybe we can skip steps 1, 2 and 3, improving steps 4 and 5?

scofalik · 2017-04-14T09:28:44Z

I've checked it. As expected, setData() does not work correctly. Pasting was checked for:

<div> - not registered, paragraph-like element:

<div>a<b>b</b>c</div>
<div>a<b>b</b>c</div>

which resulted in two <paragraph> and correct attributes.

<section> - not registered, not-paragraph-like element:

<section>a<b>b</b>c</section>
<section>a<b>b</b>c</section>

Which resulted in one <paragraph> and proper attributes.

Reinmar · 2017-04-15T10:43:38Z

> I've checked it. As expected, setData() does not work correctly.

I don't understand. Below you described exactly what would be an expected result for me. So it does or doesn't work correctly?

scofalik · 2017-04-15T16:59:59Z

That was for insertContent() (pasting) - not setData.

Reinmar · 2017-04-18T09:36:16Z

> That was for insertContent() (pasting) - not setData.

You wrote it was for setData() yourself :P

Either way – how the div or section elements are handled can be configured. We can make section work like a div if we see that <section> is often used without block content inside (which I doubt).

Still, I feel I don't understand your comment.

Reinmar · 2017-04-18T09:48:49Z

> But current implementation seems wrong anyway, because unrecognized elements are treated as <p> for all of their children.

I'm not sure that you know how the algorithm works now. It is based on a list of paragraph-like elements. If an element doesn't have its own converter and it is one of the paragraph-like elements, then it's converted to a paragraph. However, if it has block content inside (other paragraph-like elements), then it's skipped (that's to handle nested structures – e.g. nested lists).

So, the algorithm picks the lowest paragraph-like elements and handles them. I don't understand how you'd like to improve it.

scofalik · 2017-04-19T15:15:49Z

> You wrote it was for setData() yourself :P

I wrote:

> I've checked it. As expected, setData() does not work correctly. Pasting was checked for:

And I meant that setData is incorrect AND here are results for pasting (which are correct). Should have made it in two paragraphs :P.

I used <section> on purpose to use an existing HTML tag (which is not recognized in any way by editor now). I know we can configure it :).

> I'm not sure that you know how the algorithm works now.

I have a feeling that I do, but OTOH there are multiple places that touch autoparagraphing.

What I mean is that when you have:

<paragraphLike>
    foo
    <img />
    bar
</paragraphLike>

Now the converter will recognise that this is a paragraph-like element and will convert all of its children in paragraph context, right? It creates a paragraph element in the model and wants to jam everything into it. The question is whether this is a correct approach; maybe it would be more correct to only convert text nodes in paragraph context but not other elements.

Anyway I don't want to upgrade it just for sake of upgrading. I mentioned that because I wanted to change it more drastically, however I realised that it may not need such a big change.

BTW:

> So, the algorithm picks the lowest paragraph-like elements and handles them. I don't understand how you'd like to improve it.

How will this be converted?

<div>foo<div>bar</div>xyz</div>

(I haven't checked it but maybe you know the answer off the top of your head :)).

Reinmar · 2017-04-19T17:24:34Z

> The question is whether this is a correct approach, maybe more correct would be to only convert text nodes in paragraph context but not other elements

Why? Right now it makes a lot of sense that any block which is similar to paragraph but which cannot be handled tries to become a paragraph.

If this cannot be pasted:

<div>a<img>b</div>

Let's try pasting this:

<p>a<img>b</p>

Then we go deeper and e.g. the image feature may handle the situation where <img> is not allowed in <paragraph>. But this is a next step and should be independent of the previous decision.

Reinmar · 2017-04-19T17:30:57Z

> How will this be converted?
>
> <div>foo<div>bar</div>xyz</div>

It's similar to:

		it( 'pastes ul>li>h2+h3+p as h2+h3+p when heading feature is present', () => {
			return VirtualTestEditor.create( {
					plugins: [ Paragraph, Clipboard, HeadingEngine ]
				} )
				.then( newEditor => {
					const editor = newEditor;
					const doc = editor.document;
					const clipboard = editor.plugins.get( 'clipboard/clipboard' );

					setModelData( doc, '<paragraph>[]</paragraph>' );

					clipboard.fire( 'inputTransformation', {
						content: parseView( '<ul><li>x</li><li><h2>foo</h2><h3>bar</h3><p>bom</p></li><li>x</li></ul>' )
					} );

					expect( getModelData( doc ) ).to.equal(
						'<paragraph>x</paragraph>' +
						'<heading1>foo</heading1><heading2>bar</heading2><paragraph>bom</paragraph>' +
						'<paragraph>x[]</paragraph>'
					);
				} );
		} );

So, the outer paragraph-like elements are ignored if hasParagraphLikeContent() returns true. It scans the entire contents of an element for other paragraph-like ones. This way, only the deepest ones are handled.
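
To illustrate the "only the deepest ones are handled" rule, here is a small sketch on a hypothetical plain-object tree (a node is a string or { name, children }). The hasParagraphLikeContent() name comes from the comment above, but this body, the PARAGRAPH_LIKE list and findDeepestParagraphLike() are assumptions, not the actual implementation.

```javascript
// Hypothetical plain-object view tree: a node is a string (text) or
// { name, children }. The list of names below is an assumption.
const PARAGRAPH_LIKE = new Set( [ 'div', 'li', 'p', 'h1', 'h2', 'h3' ] );

const isParagraphLike = node => typeof node == 'object' && PARAGRAPH_LIKE.has( node.name );

// True if any descendant (at any depth) is paragraph-like.
function hasParagraphLikeContent( element ) {
	return element.children.some( child =>
		typeof child == 'object' &&
		( isParagraphLike( child ) || hasParagraphLikeContent( child ) )
	);
}

// Collects the paragraph-like elements that would become paragraphs:
// those without any paragraph-like descendants.
function findDeepestParagraphLike( node, found = [] ) {
	if ( typeof node == 'string' ) {
		return found;
	}

	if ( isParagraphLike( node ) && !hasParagraphLikeContent( node ) ) {
		found.push( node );
	} else {
		node.children.forEach( child => findDeepestParagraphLike( child, found ) );
	}

	return found;
}
```

For <div>foo<div>bar</div>xyz</div> this picks only the inner <div>, so the "foo" and "xyz" text nodes are left for some other mechanism to deal with.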

Reinmar · 2017-04-19T17:36:27Z

> It's similar to:

No, it's not. I guess you meant the "foo" and "xyz" text nodes.

So, I guess first we'll have foo<p>bar</p>xyz and then... there's a chance that text autoparagraphing will handle this and turn it into: <p>foo</p><p>bar</p><p>xyz</p>.

OK, I can see that it's kinda like this:

		it( 'pastes ul>li>ul>li+li', () => {
			return VirtualTestEditor.create( {
					plugins: [ Paragraph, Clipboard ]
				} )
				.then( newEditor => {
					const editor = newEditor;
					const doc = editor.document;
					const clipboard = editor.plugins.get( 'clipboard/clipboard' );

					setModelData( doc, '<paragraph>[]</paragraph>' );

					clipboard.fire( 'inputTransformation', {
						content: parseView( '<ul><li>a<ul><li>b</li><li>c</li></ul></li></ul>' )
					} );

					expect( getModelData( doc ) ).to.equal(
						'<paragraph>a</paragraph>' +
						'<paragraph>b</paragraph>' +
						'<paragraph>c[]</paragraph>'
					);
				} );
		} );

The "a" is in a paragraph, but that's because we're pasting there. If we're talking about the conversion itself then we have two tests like this:

			it( 'should convert ul>li>p,text', () => {
				const modelFragment = editor.data.parse( '<ul><li><p>a</p>b</li></ul>' );

				expect( stringifyModel( modelFragment ) )
					.to.equal( '<paragraph>a</paragraph><paragraph>b</paragraph>' );
			} );

			// "b" is not autoparagraphed because clipboard holder allows text nodes.
			// There's a similar integration test for what happens when pasting in paragraph-integration.js.
			it( 'should convert ul>li>p,text (in clipboard holder)', () => {
				const modelFragment = editor.data.parse( '<ul><li><p>a</p>b</li></ul>', '$clipboardHolder' );

				expect( stringifyModel( modelFragment ) )
					.to.equal( '<paragraph>a</paragraph>b' );
			} );

So, I'm brilliant :D And I guess this works due to text autoparagraphing. So, it may not be that unnecessary. Still, we need to find a better way to perform that.

Fix: Content autoparagraphing has been improved. "Inline" view elements (converted to attributes or elements) will be now correctly handled and autoparagraphed. Closes #10. Closes #11.

Reinmar closed this as completed in ckeditor/ckeditor5-paragraph#23 May 3, 2017

Reinmar assigned scofalik May 3, 2017

mlewand transferred this issue from ckeditor/ckeditor5-paragraph Oct 9, 2019

mlewand added this to the iteration 10 milestone Oct 9, 2019

mlewand added type:bug This issue reports a buggy (incorrect) behavior. package:paragraph labels Oct 9, 2019

scofalik mentioned this issue Oct 11, 2019

Improved autoparagraphing ckeditor/ckeditor5-paragraph#23

Merged

Reinmar mentioned this issue Oct 11, 2019

Introduced simple autoparagraphing ckeditor/ckeditor5-paragraph#12

Merged

ansorensen mentioned this issue May 19, 2020

Enabling target attribute on anchors doesn't work on first anchor. #7242

Closed
