HtmlDeserializer prepends leading space when pasting indented block HTML #60

Description

@MalaRuze

Summary

Pasting HTML into a BlockEditor (via the default withPaste plugin) prepends a stray leading space to the inserted text whenever the source HTML is indented/pretty-printed. HtmlDeserializer collapses a leading newline + indentation into a single space but never trims whitespace at block boundaries, so <p>\n Some text\n</p> deserializes to " Some text " instead of "Some text".

Environment

Reproduction

A BlockEditor created via createEditor wires withPaste by default (createEditorWithEssentials.tsx), which routes text/html clipboard data through HtmlDeserializer. Real-world clipboards from rendered/pretty-printed pages carry indented markup:

import { HtmlDeserializer } from '@contember/bindx-editor'

const deserializer = new HtmlDeserializer(children => ({ type: 'paragraph', children }), [])
const doc = new DOMParser().parseFromString('<p>\n\t\tSome text\n</p>', 'text/html')
const result = deserializer.deserializeBlocks(Array.from(doc.body.childNodes), {})
// result's text leaf === " Some text " — note the leading (and trailing) space

The failing test asserts that the pasted text does not start or end with a space and that interior whitespace is still collapsed (<p>\n\tfoo\n\t\tbar\n</p> → "foo bar").

Expected behavior

Per CSS white-space: normal, runs of whitespace collapse to a single space and leading/trailing whitespace is trimmed at block boundaries. Pasting <p>\n Some text\n</p> should insert Some text, matching what the browser visually renders for that markup.

Actual behavior

The deserialized text leaf is " Some text ". In the editor this surfaces as a space inserted before (and after) every pasted block, which the user must delete by hand. Reported by a real end user pasting into a rich-text field: "in ~100% of cases a space is written at the first position and then the text."

Suspected root cause

packages/bindx-editor/src/plugins/behaviour/paste/HtmlDeserializer.ts, deserializeTextNode (line 112):

private deserializeTextNode(node: Node, cumulativeTextAttrs: TextAttrs): Descendant[] | null {
    if (node.nodeType === Node.TEXT_NODE) {
        const text = node.textContent ?? ''
        return [{ ...cumulativeTextAttrs, text: text.replace(/[ \t]*(?:\r?\n[ \t]*)+/g, ' ') }]
    }
    ...

The regex collapses each newline, together with its surrounding indentation, into a single space. At a block edge, a leading \n\t\t (or a trailing \n) therefore becomes a leading or trailing " " that is never trimmed. processNodeListPaste already drops whitespace-only text nodes (isWhiteSpace), but it does not trim the leading/trailing whitespace of a text node that also carries content.
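To make the effect concrete without a DOM, here is a minimal sketch that applies the same pattern to plain strings. The helper name collapseNewlines is ours for illustration only; the regex is copied verbatim from the snippet above.

```typescript
// Same pattern as in deserializeTextNode; `collapseNewlines` is a
// hypothetical helper name used only for this illustration.
const collapseNewlines = (text: string): string =>
	text.replace(/[ \t]*(?:\r?\n[ \t]*)+/g, ' ')

// Text content of <p>\n\t\tSome text\n</p>
const blockEdge = collapseNewlines('\n\t\tSome text\n')

// Text content of <p>\n\tfoo\n\t\tbar\n</p>
const interior = collapseNewlines('\n\tfoo\n\t\tbar\n')

console.log(JSON.stringify(blockEdge)) // " Some text "
console.log(JSON.stringify(interior)) // " foo bar "
```

The interior newline between foo and bar collapses correctly; only the edge runs leave a stray space behind.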

Suggested fix

Trim whitespace at block boundaries in addition to collapsing it. For example, after assembling the block-level texts/elements in processNodeListPaste, strip the leading whitespace from the first text leaf and the trailing whitespace from the last, or normalize per block in a way that mirrors white-space: normal. Care is needed to preserve a single interior space between inline siblings (foo <b>bar</b>), so trimming is best applied at the block edge rather than unconditionally per text node. We defer the exact placement to the maintainers, who know the intended inline/block boundary semantics.
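As a rough sketch of the block-edge idea, and not a proposed patch: the TextLeaf type and the trimBlockEdges helper below are hypothetical and assume the block's children are already a flat list of text leaves. Nested inline elements and leaves that end up empty would need extra handling in the real deserializer.

```typescript
// Hypothetical leaf shape for illustration; the editor's real text
// nodes may carry other marks.
type TextLeaf = { text: string; bold?: boolean }

// Trim only the outer edges of a block: leading whitespace of the first
// leaf and trailing whitespace of the last. Interior leaves are untouched,
// so a single space between inline siblings survives.
function trimBlockEdges(leaves: TextLeaf[]): TextLeaf[] {
	const result = leaves.map(leaf => ({ ...leaf }))
	const first = result[0]
	const last = result[result.length - 1]
	if (first) {
		first.text = first.text.replace(/^[ \t\r\n]+/, '')
	}
	if (last) {
		last.text = last.text.replace(/[ \t\r\n]+$/, '')
	}
	return result
}

// Leaves as they might look after collapsing "\n\tfoo <b>bar</b>\n"
const trimmed = trimBlockEdges([
	{ text: ' foo ' },
	{ text: 'bar ', bold: true },
])
console.log(JSON.stringify(trimmed))
// [{"text":"foo "},{"text":"bar","bold":true}]
```

The space between foo and the bold bar is kept because it is not at the edge of the block, which is the behaviour the per-text-node approach would break.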

Workaround shipped downstream

We applied a temporary workaround in our project, marked TODO [BindX] (<this-issue-url>): <description>. The workaround wraps editor.insertData after the essentials plugins to strip the stray leading/trailing space from pasted block text; we will remove it once this issue is resolved.
