Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autoparagraphing strips all text attributes #3294

Closed
Reinmar opened this issue Dec 19, 2016 · 16 comments · Fixed by ckeditor/ckeditor5-paragraph#23
Closed

Autoparagraphing strips all text attributes #3294

Reinmar opened this issue Dec 19, 2016 · 16 comments · Fixed by ckeditor/ckeditor5-paragraph#23
Assignees
Labels
package:paragraph type:bug This issue reports a buggy (incorrect) behavior.
Milestone

Comments

@Reinmar
Copy link
Member

Reinmar commented Dec 19, 2016

This test won't pass currently:

	it( 'should autoparagraph text with attributes', () => {
		doc.schema.allow( { name: '$inline', attributes: [ 'bold' ] } );
		buildViewConverter().for( editor.data.viewToModel )
			.fromElement( 'b' )
			.toAttribute( 'bold', true );

		const modelFragment = editor.data.parse( 'foo<b>bar</b>bom' );

		expect( stringifyModel( modelFragment ) )
			.to.equal( '<paragraph>foo<$text bold="true">bar</$text>bom</paragraph>' );
	} );

The reason is that the converter for the <b> element will receive a <paragraph> created automatically for the text inside it. Then, most likely, it will check that that paragraph can't have bold attribute and abort.

Perhaps a reasonable solution would be if the callback created by buildViewConverter used something like https://github.com/ckeditor/ckeditor5-core/blob/master/src/command/helpers/getschemavalidranges.js (actually – there's a ticket to move this method to the engine – https://github.com/ckeditor/ckeditor5-core/issues/14 – so it can be exactly this function) and apply the attribute to every item in the ranges. It may be a reasonable solution taken cases like pasting HTML such as:

<strong>
  <p>x<img>y</p>
</strong>

Thanks to that deep "application" all text nodes inside the paragraph would get the bold attribute. I emphasised "text nodes" because the image's attribute should be handled by its own converter.

@scofalik
Copy link
Contributor

scofalik commented Apr 13, 2017

Perhaps a reasonable solution would be if the callback created by buildViewConverter used something like https://github.com/ckeditor/ckeditor5-core/blob/master/src/command/helpers/getschemavalidranges.js

I am not convinced. If converter gets input that it should convert attribute A on element E it should do exactly this. Not only text nodes may have attributes after all. I know that attribute A will be probably valid only on certain elements but this sounds like asking for trouble. It looks like too broad and deep solution for this problem.

<strong>
 <p>x<img>y</p>
</strong>

Thanks to that deep "application" all text nodes inside the paragraph would get the bold attribute. I emphasised "text nodes" because the image's attribute should be handled by its own converter.

And now you ask converter to be able to convert incorrect model (or rather deal with the situation of incorrect HTML)?

My opinion is that converter should do exactly what it is asked to do and it should get correct model to convert. Everything else should be fixed before.

Edit:
I'd rather make a fixer that checks whether inline element (<strong>) contains container element (<p>) and if so, remove inline element and make it a parent of every child element in container element. Do this deeply as long as structure is incorrect. This should preferably happen on view, during conversion from DOM (or as a callback in appropriate place, I am not sure exactly where)

<strong>
  <div>
    abc
   <p>
      foo
      <img />
      bar
    </p>
    xyz
  </div>
</strong>
<div>
  <strong>abc</strong>
  <p>
    <strong>foo</strong>
    <strong><img /></strong> <!-- bold will not be applied if not allowed in schema !-->
    <strong>bar</strong>
  </p>
  <strong>xyz</strong>
</div>

I think it's reasonable that when taking external DOM, we should strive to generate view structure similar to view structure that would be generated by our model->view conversion.

Anyway, this is not related to original issue. We should create separate issue for this, but I am not sure where - whether this is more connected with DataController or clipboard?

Edit2:
This could be done on viewCleanup event of ViewConversionDispatcher if a better place will not be find.

@scofalik
Copy link
Contributor

scofalik commented Apr 13, 2017

Back to the issue - kinda...

I have a new idea for autoparagraphing. It seems conceptually wrong that the auto-paragraph is added exactly in place of autoparagraphed text, as its direct parent. It feels like we should first check where the paragraph should be placed.

Here is proposed solution:

  1. On text conversion, check if the text could be allowed in given context if it was in paragraph (same as now).
  2. If so, convert the text, but don't add a new paragraph.
  3. (Optionally) If last item in data.context is a model.Element mark it.
  4. On each element conversion, after element is converted (so we have data.output) perform* autoparagraphing of its children - find all text nodes and put them in paragraphs (btw. note that we can get rid of merging autoparagraphs).
  5. On each document fragment conversion, after it is converted, perform autoparagraphing of its children.

* This has to be done conditionally, of course. If we were marking elements on step 3, autoparagraph only if element was marked, so we won't, for example, put <paragraph> in <heading>. If we weren't marking, check if text is allowed in directly in this element. If not, autoparagraph. Both approaches have their merits, but we should be fine without step 3.

WDYT?

@scofalik
Copy link
Contributor

It seems that above solution would also solve https://github.com/ckeditor/ckeditor5-paragraph/issues/11. However we would have to somehow recognize which elements can be autoparagraphed (besides text). If we know that this kind of can be autoparagraphed we can treat it like a text node (in steps 1. and 2.) and then also autoparagraph in step 4. and 5.

@Reinmar
Copy link
Member Author

Reinmar commented Apr 13, 2017

I have a new idea for autoparagraphing. It seems conceptually wrong that the auto-paragraph is added exactly in place of autoparagraphed text, as its direct parent. It feels like we should first check where the paragraph should be placed.

Well, it may help if I explain the story behind this. First, I've been only thinking on what to do with a text which parent cannot be converted. So, e.g. someone pastes <h1>x</h1> but we don't support h1, so, somehow we had the idea to focus on handling this in the text. And you're of course right that it didn't make much sense and this issue is result of that wrong decision.

However, during the course of action, I realised that we can't actually focus on the text anyway because <h1>x<img>y</h1> would result in two paragraphs and perhaps some image between them.

So, I added the second mechanism which focuses on block elements. Currently, I think that it handles most of the job when pasting because, usually, you have the content somehow wrapped within some blocks.

Do we need the first algorithm then? Perhaps not really. The situations which the disallowed blocks handler wouldn't catch are:

  • When someone pastes <p>x</p>y – but for that insertContent() can autoparagraph too... and, besides, text is allowed in $clipboardHolder so I don't know if this code is actually used in this case anyway.
  • When someone pastes y – but again – text is allowed in $clipboardHolder, so this isn't needed here. And insertContent() can, if needed, autoparagraph.
  • When someone loads <p>x</p>y or just y into the editor by setData() – but this means loading incorrect data which wasn't created with the editor. So, we could ignore this situation for now.
  • When someone loads empty data to the editor.

So, correct me if I'm wrong, but perhaps all this is just unnecessary?

@scofalik
Copy link
Contributor

scofalik commented Apr 13, 2017

Spoiler alert, since you commented: I have a feeling that multiple issues connected with handling incorrect data exists because there wasn't any global idea how the process should look like, instead there are just fixes for single problems. I am commenting here and will also comment in other issues but after that we need a talk about all of this :).

@scofalik
Copy link
Contributor

scofalik commented Apr 13, 2017

☝️

Don't be sad, you should best know that it's rarely possible to cover all the bases with the first implementation :).

@scofalik
Copy link
Contributor

Do we need the first algorithm then? Perhaps not really.

I can check if this helps in current implementation, because I also have a feeling that the first (current) algorithm might be not needed for now.

But current implementation seems wrong anyway, because unrecognized elements are treated as <p> for all of their children.

BTW. Now as I think of it, actually, we don't need to use such complicated algorithm. Maybe we can skip step 1, 2 and 3, improving steps 4 and 5?

@scofalik
Copy link
Contributor

scofalik commented Apr 14, 2017

I've checked it. As expected, setData() does not work correctly. Pasting was checked for:

<div> - not registered, paragraph-like element:

<div>a<b>b</b>c</div>
<div>a<b>b</b>c</div>

which resulted in two <paragraph> and correct attributes.

<section> - not registered, not-paragraph-like element:

<section>a<b>b</b>c</section>
<section>a<b>b</b>c</section>

Which resulted in one <paragraph> and proper attributes.

@Reinmar
Copy link
Member Author

Reinmar commented Apr 15, 2017

I've checked it. As expected, setData() does not work correctly.

I don't understand. Below you described exactly what would be an expected result for me. So it does or doesn't work correctly?

@scofalik
Copy link
Contributor

That was for insertContent() (pasting) - not setData.

@Reinmar
Copy link
Member Author

Reinmar commented Apr 18, 2017

That was for insertContent() (pasting) - not setData.

You wrote it was for setData() yourself :P

Both ways – how the div or section elements are handled can be configured. We can make section works like a div if we see that <section> is often used without block content inside (which I doubt).

Still, I feel I don't understand your comment.

@Reinmar
Copy link
Member Author

Reinmar commented Apr 18, 2017

But current implementation seems wrong anyway, because unrecognized elements are treated as

for all of their children.

I'm not sure that you know how the algorithm works now. It bases on a list of paragraph-like elements. If element doesn't have its own converter and it is one of the paragraph-like elements, then it's converted to a pargraph. However, if it has block content inside (other paragraph-like elements), then it's skipped (that's to handle nested structures – e.g. nested lists).

So, the algorithm picks lowest paragraph-like elements and handle them. I don't understand how you'd like to improve it.

@scofalik
Copy link
Contributor

scofalik commented Apr 19, 2017

You wrote it was for setData() yourself :P

I wrote:

I've checked it. As expected, setData() does not work correctly. Pasting was checked for:

And I meant that setData is incorrect AND here are results for pasting (which are correct). Should have made it in two paragraphs :P.

I used <section> on purpose to use an existing HTML tag (which is not recognized in any way by editor now). I know we can configure it :).

I'm not sure that you know how the algorithm works now.

I have a feeling that I do, but OTOH there are multiple places that touches autoparagraphing.

What I mean is that when you have:

<paragraphLike>
    foo
    <img />
    bar
</paragraphLike>

Now the converter will recognise that this is a paragraph-like element and will convert all of it's children in paragraph context, right? It creates paragraph element in model and want to jam everything into it. The question is whether this is a correct approach, maybe more correct would be to only convert text nodes in paragraph context but not other elements.

Anyway I don't want to upgrade it just for sake of upgrading. I mentioned that because I wanted to change it more drastically, however I realised that it may not need such a big change.

BTW:

So, the algorithm picks lowest paragraph-like elements and handle them. I don't understand how you'd like to improve it.

How this will be converted?

<div>foo<div>bar</div>xyz</div>

(I haven't checked it but maybe you know the answer from top of your head :)).

@Reinmar
Copy link
Member Author

Reinmar commented Apr 19, 2017

The question is whether this is a correct approach, maybe more correct would be to only convert text nodes in paragraph context but not other elements

Why? Right now it makes a lot of sense that any block which is similar to paragraph but which cannot be handled tries to become a paragraph.

If this cannot be pasted:

<div>a<img>b</div>

Let's try pasting this:

<p>a<img>b</p>

Then we go deeper and e.g. the image feature may handle the situation where <img> is not allowed in <paragraph>. But this is a next step and should be independent of the previous decision.

@Reinmar
Copy link
Member Author

Reinmar commented Apr 19, 2017

How this will be converted?

<div>foo<div>bar</div>xyz</div>

It's similar to:

		it( 'pastes ul>li>h2+h3+p as h2+h3+p when heading feature is present', () => {
			return VirtualTestEditor.create( {
					plugins: [ Paragraph, Clipboard, HeadingEngine ]
				} )
				.then( newEditor => {
					const editor = newEditor;
					const doc = editor.document;
					const clipboard = editor.plugins.get( 'clipboard/clipboard' );

					setModelData( doc, '<paragraph>[]</paragraph>' );

					clipboard.fire( 'inputTransformation', {
						content: parseView( '<ul><li>x</li><li><h2>foo</h2><h3>bar</h3><p>bom</p></li><li>x</li></ul>' )
					} );

					expect( getModelData( doc ) ).to.equal(
						'<paragraph>x</paragraph>' +
						'<heading1>foo</heading1><heading2>bar</heading2><paragraph>bom</paragraph>' +
						'<paragraph>x[]</paragraph>'
					);
				} );
		} );

So, the outer paragraph-like elements are ignored if hasParagraphLikeContent() returns true. It scans the entire contents of an element for other paragraph-like ones. This way, only the deepest ones are handled.

@Reinmar
Copy link
Member Author

Reinmar commented Apr 19, 2017

It's similar to:

No, it's not. I guess you meant the "foo" and "xyz" text nodes.

So, I guess first we'll have foo<p>bar</p>xyz and then... there's a chance that autoping text will handle this and turn it into: <p>foo</p><p>bar</p><p>xyz</p>.

OK, I can see that it's kinda like this:

		it( 'pastes ul>li>ul>li+li', () => {
			return VirtualTestEditor.create( {
					plugins: [ Paragraph, Clipboard ]
				} )
				.then( newEditor => {
					const editor = newEditor;
					const doc = editor.document;
					const clipboard = editor.plugins.get( 'clipboard/clipboard' );

					setModelData( doc, '<paragraph>[]</paragraph>' );

					clipboard.fire( 'inputTransformation', {
						content: parseView( '<ul><li>a<ul><li>b</li><li>c</li></ul></li></ul>' )
					} );

					expect( getModelData( doc ) ).to.equal(
						'<paragraph>a</paragraph>' +
						'<paragraph>b</paragraph>' +
						'<paragraph>c[]</paragraph>'
					);
				} );
		} );

The "a" is in a paragraph, but that's because we're pasting there. If we're talking about the conversion itself then we have two tests like this:

			it( 'should convert ul>li>p,text', () => {
				const modelFragment = editor.data.parse( '<ul><li><p>a</p>b</li></ul>' );

				expect( stringifyModel( modelFragment ) )
					.to.equal( '<paragraph>a</paragraph><paragraph>b</paragraph>' );
			} );

			// "b" is not autoparagraphed because clipboard holder allows text nodes.
			// There's a similar integrational test what's going to happen when pasting in paragraph-integration.js.
			it( 'should convert ul>li>p,text (in clipboard holder)', () => {
				const modelFragment = editor.data.parse( '<ul><li><p>a</p>b</li></ul>', '$clipboardHolder' );

				expect( stringifyModel( modelFragment ) )
					.to.equal( '<paragraph>a</paragraph>b' );
			} );

So, I'm brilliant :D And I guess this works due to autoping text. So, it may not be that unnecessary. Although, we still need to find a better way to perform that.

Reinmar referenced this issue in ckeditor/ckeditor5-paragraph May 3, 2017
Fix: Content autoparagraphing has been improved. "Inline" view elements (converted to attributes or elements) will be now correctly handled and autoparagraphed. Closes #10. Closes #11.
@mlewand mlewand transferred this issue from ckeditor/ckeditor5-paragraph Oct 9, 2019
@mlewand mlewand added this to the iteration 10 milestone Oct 9, 2019
@mlewand mlewand added type:bug This issue reports a buggy (incorrect) behavior. package:paragraph labels Oct 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package:paragraph type:bug This issue reports a buggy (incorrect) behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants