Removal of specialized HTML literal handling? #2946

@ashleysommer

Possible easy solution for #2935 and #2945

We forked html5lib into html5lib-modern because no newer library provides the same XML-based HTML-tokenizing functionality that html5lib does. There is no alternative to move to.

Beautifulsoup4 is the logical replacement, but it includes html5lib in its dependency tree, so switching to it would defeat the whole point.

But what if we just dropped that feature entirely? Why does RDFLib even want to be able to tokenize HTML Literals? The feature was added for a reason, but do we need to keep it?

Can we simply drop that feature, and treat HTML the same as any other string literal, and remove html5lib from our dependencies entirely?
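To make the proposal concrete, here is a minimal sketch of what "treat HTML the same as any other string literal" could look like. It is not RDFLib's actual code: `to_python_value` is a hypothetical stand-in for the datatype-to-Python conversion step, and only the two datatype IRIs are real.

```python
from typing import Optional, Union

# Real datatype IRIs; everything else in this sketch is hypothetical.
RDF_HTML = "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def to_python_value(lexical: str, datatype: Optional[str]) -> Union[int, str]:
    """Hypothetical converter, NOT RDFLib's API.

    Datatypes with a known conversion are converted; everything else,
    including rdf:HTML, keeps its lexical form as a plain str.
    """
    if datatype == XSD_INTEGER:
        return int(lexical)
    # No rdf:HTML branch: the markup is never tokenized or parsed,
    # so no HTML parser (html5lib or otherwise) is needed.
    return lexical


html_value = to_python_value("<p>Hello</p>", RDF_HTML)
int_value = to_python_value("42", XSD_INTEGER)
```

With this shape, an `rdf:HTML` literal still carries its datatype, but its value is just the original string, so comparing two HTML literals would come down to comparing their lexical forms.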
