Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CKEditor 5 Mangles Table Structure and Makes Tables Inaccessible / Support for Tables with Multi-Level Headers / Allow <tbody> <th> in any column #14911

Open
jameslpetersen opened this issue Aug 31, 2023 · 14 comments · May be fixed by #15366
Labels
domain:accessibility This issue reports an accessibility problem. domain:v4-compatibility This issue reports a CKEditor 4 feature/option that's missing in CKEditor 5. package:table squad:core Issue to be handled by the Core team. support:2 An issue reported by a commercially licensed client. type:bug This issue reports a buggy (incorrect) behavior.

Comments

@jameslpetersen
Copy link

📝 Provide detailed reproduction steps (if any)

  1. Paste the HTML from tables 1, 2, 3, and 4 (below) into source view of the reference implementation of CKEditor.
  2. Switch out of source view and back into it.

Table 1

This table has a rowgroup header with a colspan.

<table>
    <thead>
    <tr>
        <th scope="col">Col Header 1</th>
        <th scope="col">Col Header 2</th>
        <th scope="col">Col Header 3</th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <th scope="rowgroup" colspan="3">Rowgroup Header 1</th>
    </tr>
    <tr>
        <th scope="row">Row Header 1</th>
        <td>Data 1,1</td>
        <td>Data 2,1</td>
    </tr>
    </tbody>
</table>

Table 2

This table has two rowgroup headers that have colspans.

<table>
    <thead>
    <tr>
        <th scope="col">Col Header 1</th>
        <th scope="col">Col Header 2</th>
        <th scope="col">Col Header 3</th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <th scope="rowgroup" colspan="3">Rowgroup Header 1</th>
    </tr>
    <tr>
        <th scope="row">Row Header 1</th>
        <td>Data 1,1</td>
        <td>Data 2,1</td>
    </tr>
    </tbody>
    <tbody>
    <tr>
        <th scope="rowgroup" colspan="3">Rowgroup Header 2</th>
    </tr>
    <tr>
        <th scope="row">Row Header 2</th>
        <td>Data 1,2</td>
        <td>Data 2,2</td>
    </tr>
    <tr>
        <th scope="row">Row Header 3</th>
        <td>Data 1,3</td>
        <td>Data 2,3</td>
    </tr>
    </tbody>
</table>

Table 3

Similar to previous examples, but we've removed the colspan on the rowgroup and replaced with empty td elements. We've also added some row headers to the second column and a tbody block that doesn't have any th with a scope set to rowgroup.

<table>
    <thead>
    <tr>
        <th scope="col">Col Header 1</th>
        <th scope="col">Col Header 2</th>
        <th scope="col">Col Header 3</th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <th scope="rowgroup">Rowgroup Header 1</th>
        <td></td>
        <td></td>
    </tr>
    <tr>
        <th scope="row">Row Header 1</th>
        <th scope="row">Row Subheader 1</th>
        <td>Data 1,1</td>
    </tr>
    </tbody>
    <tbody>
    <tr>
        <th scope="row">Row Header 2</th>
        <th scope="row">Row Subheader 2</th>
        <td>Data 1,2</td>
    </tr>
    <tr>
        <th scope="row">Row Header 3</th>
        <th scope="row">Row Subheader 3</th>
        <td>Data 1,3</td>
    </tr>
    </tbody>
    <tbody>
    <tr>
        <th scope="rowgroup">Rowgroup Header 2</th>
        <td></td>
        <td></td>
    </tr>
    <tr>
        <th scope="row">Row Header 4</th>
        <th scope="row">Row Subheader 4</th>
        <td>Data 1,4</td>
    </tr>
    </tbody>
</table>

Table 4

This table is an example from the HTML spec.

<table>
    <colgroup> <col>
    <colgroup> <col> <col> <col>
    <thead>
    <tr> <th> <th>2008 <th>2007 <th>2006
    <tbody>
    <tr> <th scope=rowgroup> Research and development
        <td> $ 1,109 <td> $ 782 <td> $ 712
    <tr> <th scope=row> Percentage of net sales
        <td> 3.4% <td> 3.3% <td> 3.7%
    <tbody>
    <tr> <th scope=rowgroup> Selling, general, and administrative
        <td> $ 3,761 <td> $ 2,963 <td> $ 2,433
    <tr> <th scope=row> Percentage of net sales
        <td> 11.6% <td> 12.3% <td> 12.6%
</table>

✔️ Expected result

CKEditor does not alter the table markup.

❌ Actual result

Table 1

CKEditor has moved the parent row for "Rowgroup Header 1" into the thead element, which makes the table inaccessible to screen readers.

<table>
    <thead>
    <tr>
        <th scope="col">
            Col Header 1
        </th>
        <th scope="col">
            Col Header 2
        </th>
        <th scope="col">
            Col Header 3
        </th>
    </tr>
    <tr>
        <th colspan="3" scope="rowgroup">
            Rowgroup Header 1
        </th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <th scope="row">
            Row Header 1
        </th>
        <td>
            Data 1,1
        </td>
        <td>
            Data 2,1
        </td>
    </tr>
    </tbody>
</table>

Table 2

CKEditor has moved all the parent rows for rowgroup headers into the thead element. It has also put all the rows after the thead block into one tbody block. This makes the table inaccessible to screen readers and is a loss of data, since we can no longer reconstruct the original table due to loss of the tbody blocks.

<table>
    <thead>
    <tr>
        <th scope="col">
            Col Header 1
        </th>
        <th scope="col">
            Col Header 2
        </th>
        <th scope="col">
            Col Header 3
        </th>
    </tr>
    <tr>
        <th colspan="3" scope="rowgroup">
            Rowgroup Header 1
        </th>
    </tr>
    <tr>
        <th colspan="3" scope="rowgroup">
            Rowgroup Header 2
        </th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <th scope="row">
            Row Header 1
        </th>
        <td>
            Data 1,1
        </td>
        <td>
            Data 2,1
        </td>
    </tr>
    <tr>
        <th scope="row">
            Row Header 2
        </th>
        <td>
            Data 1,2
        </td>
        <td>
            Data 2,2
        </td>
    </tr>
    <tr>
        <th scope="row">
            Row Header 3
        </th>
        <td>
            Data 1,3
        </td>
        <td>
            Data 2,3
        </td>
    </tr>
    </tbody>
</table>

Table 3

CKEditor has not moved all the parent rows for rowgroup headers into the thead element, but has put all the rows after the thead block into one tbody block. This causes data loss, since we can no longer reconstruct the original table due to loss of the tbody blocks, and makes the table inaccessible to screen readers. CKEditor has also turned row headers in the second column to td elements with a scope attribute, which is invalid, and causes further accessibility issues.

<table>
    <thead>
    <tr>
        <th scope="col">
            Col Header 1
        </th>
        <th scope="col">
            Col Header 2
        </th>
        <th scope="col">
            Col Header 3
        </th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <th scope="rowgroup">
            Rowgroup Header 1
        </th>
        <td>
            &nbsp;
        </td>
        <td>
            &nbsp;
        </td>
    </tr>
    <tr>
        <th scope="row">
            Row Header 1
        </th>
        <td scope="row">
            Row Subheader 1
        </td>
        <td>
            Data 1,1
        </td>
    </tr>
    <tr>
        <th scope="row">
            Row Header 2
        </th>
        <td scope="row">
            Row Subheader 2
        </td>
        <td>
            Data 1,2
        </td>
    </tr>
    <tr>
        <th scope="row">
            Row Header 3
        </th>
        <td scope="row">
            Row Subheader 3
        </td>
        <td>
            Data 1,3
        </td>
    </tr>
    <tr>
        <th scope="rowgroup">
            Rowgroup Header 2
        </th>
        <td>
            &nbsp;
        </td>
        <td>
            &nbsp;
        </td>
    </tr>
    <tr>
        <th scope="row">
            Row Header 4
        </th>
        <td scope="row">
            Row Subheader 4
        </td>
        <td>
            Data 1,4
        </td>
    </tr>
    </tbody>
</table>

Table 4

CKEditor has added inline styles to col elements (strange), put all the rows after the thead block into one tbody block (again, potential data loss and makes the table inaccessible to screen readers). I'm not sure what the practical implications of this next quirk are, but I want to note it. CKEditor has added a non-breaking space character to an empty th tag. I don't think non-breaking spaces are ASCII whitespace. This means that the header cell is non-empty and user agents (including screen readers) should add it to the header list for the cells beneath it, according to the spec.

<table class="ck-table-resized">
    <colgroup><col style="width:25%;"><col style="width:25%;"><col style="width:25%;"><col style="width:25%;"></colgroup><colgroup><col><col><col></colgroup>
    <thead>
    <tr>
        <th>
            &nbsp;
        </th>
        <th>
            2008
        </th>
        <th>
            2007
        </th>
        <th>
            2006
        </th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <th scope="rowgroup">
            Research and development
        </th>
        <td>
            $ 1,109
        </td>
        <td>
            $ 782
        </td>
        <td>
            $ 712
        </td>
    </tr>
    <tr>
        <th scope="row">
            Percentage of net sales
        </th>
        <td>
            3.4%
        </td>
        <td>
            3.3%
        </td>
        <td>
            3.7%
        </td>
    </tr>
    <tr>
        <th scope="rowgroup">
            Selling, general, and administrative
        </th>
        <td>
            $ 3,761
        </td>
        <td>
            $ 2,963
        </td>
        <td>
            $ 2,433
        </td>
    </tr>
    <tr>
        <th scope="row">
            Percentage of net sales
        </th>
        <td>
            11.6%
        </td>
        <td>
            12.3%
        </td>
        <td>
            12.6%
        </td>
    </tr>
    </tbody>
</table>

❓ Possible solution

Add some kind of data attribute that tells CKEditor to not parse the table. Or just follow the table parsing algorithm from the HTML5 spec.

📃 Other details

  • Browser: Any
  • OS: Any
  • First affected CKEditor version: Unknown, but this is happening in the reference implementation.
  • Installed CKEditor plugins: None

If you'd like to see this fixed sooner, add a 👍 reaction to this post.

@jameslpetersen jameslpetersen added the type:bug This issue reports a buggy (incorrect) behavior. label Aug 31, 2023
@Witoso Witoso added the domain:accessibility This issue reports an accessibility problem. label Sep 11, 2023
@Witoso
Copy link
Member

Witoso commented Sep 11, 2023

Thank you for the report, we will analyze it. One comment, data change is not data loss, as far as I can see, we don't have in the examples the data loss where the data vanishes. We need to standardize the incoming HTML to a common format to make some features work, like colgroup for column resize.

@jameslpetersen
Copy link
Author

Thanks for the response, Witek.

My main reason for mentioning data loss is that this bug seems to merge multiple tbody tags into a single tbody tag. Depending on the source code of the original table, it may be impossible to determine the original number and/or location of those tbody tags. And if the table has one or more th tags whose scope attribute equals rowgroup, then the tbody tags convey semantic information about the table to users of screen readers. So, the bug may erase semantic data about a table that is necessary for screen reader users to properly interpret the table.

For example, given the following table, the loss of the second tbody tag prevents you from determining how many of the final rows were in the first tbody tag (or even if there was a second tbody tag). And the loss of the second tbody tag changes the meaning of the table for screen reader users.

<table>
    <thead>
    <tr>
        <th scope="col">Col 1</th>
        <th scope="col">Col 2</th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <th scope="rowgroup">Rowgroup Header</th>
        <td></td>
    </tr>
    <tr>
        <th scope="row">Row Header 1</th>
        <td>Data 1</td>
    </tr>
    <tr>
        <th scope="row">Row Header 2</th>
        <td>Data 2</td>
    </tr>
    </tbody>
    <tbody>
    <tr>
        <th scope="row">Row Header 3</th>
        <td>Data 3</td>
    </tr>
    <tr>
        <th scope="row">Row Header 4</th>
        <td>Data 4</td>
    </tr>
    </tbody>
</table>

@wimleers
Copy link

wimleers commented Nov 2, 2023

Tracking this on drupal.org at https://www.drupal.org/project/drupal/issues/3384400.

@ericgsmith
Copy link

ericgsmith commented Nov 16, 2023

We have a client with over 1000 complex data tables that would all suffer what I would consider data loss as a result of this bug(s) / lack of full support for HTML tables.

To borrow the quote on the related Drupal.org issue from @phil-s

I'd agree that some HTML markup could be classed as metadata for most intents and purposes, but table markup is so integral to the semantic meaning of that content that I don't think you can consider it as anything other than "data".

There are a couple different issues detailed here, while investigating I wanted to explore a potential avenue specifically for the issue of tbody elements being condensed into 1.

I've added a draft PR with a PoC - the idea is to introduce a rowGroup attribute to the tableRow model.

I've added a manual test with the markup from the table 3 example - while it gives me the expected output for tbody elements, I haven't looked into the td / th issue.

Its draft (no doc changes or automated tests at this stage) but I'm very keen to get feedback on the approach before I go further as I'm not experienced with the CKEditor 5 framework / ecosystem so there's probably more elegant approaches.

For td / th conversion - there are a lot more moving parts here, I was thinking if a role attribute of heading || data could be added to the tableCell model - then when upcasting the original HTML can dictate if this is a heading or not. There seem to be lots of moving parts here that would be impacted, but I think the current logic of th being based on an entire heading row / heading column is problematic.

As it stands this issue is a blocker for use to adopt CKEditor5 so keen to help find solutions where we can.

Side note - your contribution docs are awesome - thank you for those!

@Witoso
Copy link
Member

Witoso commented Nov 17, 2023

@niegowski could you take a look when you have time?

@chrisgross
Copy link

I'd just like to add that while the examples provided are about CKEditor mishandling existing markup, it seems to me the CKEditor doesn't even create its own tables with the basic markup required for accessibility. For example, if a table has two headers in two different directions, like a header row as well as a header column, accessibility guidelines require the "scope" attribute in order to properly determine the direction of the headers, but CKEditor does not do this: https://www.w3.org/WAI/tutorials/tables/two-headers

@jameswilson
Copy link

jameswilson commented Feb 13, 2024

The table bugs here are really critical. I’m responding to an RFP for a city government website proposal, and one question completely disqualifies Drupal 10 with CKE5 as inconformant:

To comply with the U.S. Access Board Web-based Intranet and Internet Information and Applications (1194.22) provisions, the solution provides the following:

  • The solution allows markup to be used to associate data cells and header cells for data tables that have two or more logical levels of row or column headers.

If you copy/paste the table example from Curriculum for Web Content Accessibility Guidelines 1.0 (created in year 2000!) into the CKEditor 5 demo page (via the Source mode button) it is broken.

It appears that several things are happening:

  • multiple <tbody> sections are collapsed into a single one.
  • any <tr> in <tbody> consisting of only <th> tags (no <td> tags) are considered by CKEditor to be a header, and thus pulled out of order up into the <thead> section.
Screen Shot 2024-02-13 at 6 17 55 PM Screen Shot 2024-02-13 at 6 18 05 PM

@jameswilson
Copy link

jameswilson commented Feb 14, 2024

Regarding this one in the previous comment:

any <tr> in <tbody> consisting of only <th> tags (no <td> tags) are considered by CKEditor to be a header, and thus pulled out of order up into the <thead> section.

It is worth clarifying that this has not been called out as a distinct issue from the "rowgroup" bug mentioned in the issue summary above. They're related for sure, but this issue where rows from table body get pulled up to thead happens with a row containing only th tags (one or more) and NO td tags, irrespective of whether "rowgroup" attr is present or not.

@Witoso Witoso added package:table squad:core Issue to be handled by the Core team. labels Feb 29, 2024
@martynawierzbicka martynawierzbicka added the support:2 An issue reported by a commercially licensed client. label Feb 29, 2024
@dliwustl
Copy link

I'm late to this conversation, but wanted to chime in that we have a vendor who recently upgraded to CKEditor 5, and this has mangled all our tables as well. On a separate note, I wish CKEditor 5 wouldn't wrap the table in an auto-padded element, since this prevents us from setting our tables to 100% width.

Thank you for the report, we will analyze it. One comment, data change is not data loss, as far as I can see, we don't have in the examples the data loss where the data vanishes. We need to standardize the incoming HTML to a common format to make some features work, like colgroup for column resize.

Data change can absolutely be data loss. If someone shuffled all the primary IDs in your SQL table, would you not consider that data loss?

@wimleers
Copy link

wimleers commented Mar 1, 2024

@dliwustl

Data change can absolutely be data loss. If someone shuffled all the primary IDs in your SQL table, would you not consider that data loss?

I'm gonna remember this quote! 😄

@Witoso
Copy link
Member

Witoso commented Mar 1, 2024

🍎 👉 🍊 😊

On a separate note, I wish CKEditor 5 wouldn't wrap the table in an auto-padded element, since this prevents us from setting our tables to 100% width.

Just to make sure, you mean:

.ck-content .table {
    margin: 0.9em auto;
    display: table;
}

AFAIK, this could be easily configurable via changing CSS, and table.tableProperties.defaultProperties

@Witoso
Copy link
Member

Witoso commented Mar 1, 2024

JFYI, on the ck/14911-investigation branch there's a prepared manual test with all provided examples and live editors.

@dliwustl
Copy link

dliwustl commented Mar 1, 2024

On a separate note, I wish CKEditor 5 wouldn't wrap the table in an auto-padded element, since this prevents us from setting our tables to 100% width.

Just to make sure, you mean:

.ck-content .table {
    margin: 0.9em auto;
    display: table;
}

AFAIK, this could be easily configurable via changing CSS, and table.tableProperties.defaultProperties

Thanks for the reply! This is related to a vendor's implementation, so I'll pass this along. I had found and suggested editing that style before, but the link is very helpful. 😊

Might I ask one more related question (and then I promise to stop hijacking)? In the vendor's implementation, the table selection handle still appears when the editor element is in read-only mode. It appears when the page first loads, then disappears if you select the table and unselect it. Is that working as intended? The selection handle is useful in edit mode, but not in read-only mode.

@Witoso Witoso changed the title CKEditor 5 Mangles Table Structure and Makes Tables Inaccessible CKEditor 5 Mangles Table Structure and Makes Tables Inaccessible / Support for Tables with Multi-Level Headers / Allow <tbody> <th> in any column Mar 5, 2024
@Witoso Witoso added the domain:v4-compatibility This issue reports a CKEditor 4 feature/option that's missing in CKEditor 5. label Mar 5, 2024
@A-Kun
Copy link

A-Kun commented Apr 17, 2024

I'd just like to add that while the examples provided are about CKEditor mishandling existing markup, it seems to me the CKEditor doesn't even create its own tables with the basic markup required for accessibility. For example, if a table has two headers in two different directions, like a header row as well as a header column, accessibility guidelines require the "scope" attribute in order to properly determine the direction of the headers, but CKEditor does not do this: https://www.w3.org/WAI/tutorials/tables/two-headers

There's a separate issue (opened in 2018!) about this: #3175

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:accessibility This issue reports an accessibility problem. domain:v4-compatibility This issue reports a CKEditor 4 feature/option that's missing in CKEditor 5. package:table squad:core Issue to be handled by the Core team. support:2 An issue reported by a commercially licensed client. type:bug This issue reports a buggy (incorrect) behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants