This repository was archived by the owner on Nov 16, 2025. It is now read-only.

Drain after DOCTYPE results in bad tokenization #75

@dgreensp

Description

If I just tokenize the string "<!DOCTYPE html>", the doctype token comes out with the name "htmltml" instead of "html". It seems to be a problem with the system of draining and undoing the buffer.
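The following is a minimal, hypothetical TypeScript model of how a drain-and-undo buffer could produce "htmltml". It is not the tokenizer's actual code. `Input`, `mark`, `undo`, `doctypeName` and the `rollBackToken` flag are invented for illustration. The rewind point (just after the first name character) is an assumption, chosen because it reproduces the reported output. In this model, rewinding the input cursor makes the tokenizer re-read "tml". The characters already appended to the token's name are not rolled back, so they appear twice.

```typescript
// Hypothetical model of the failure mode, not the tokenizer's real code.
// The input cursor can be marked and rewound, but the token being built
// lives outside the buffer, so rewinding the cursor alone does not undo
// characters that were already appended to the token.
class Input {
  private pos = 0;
  private saved = 0;
  constructor(private readonly text: string) {}
  mark(): void {
    this.saved = this.pos;
  }
  undo(): void {
    this.pos = this.saved;
  }
  next(): string | undefined {
    return this.pos < this.text.length ? this.text[this.pos++] : undefined;
  }
}

// Reads the doctype name that follows "<!DOCTYPE " in `source`.
// One rewind is simulated when ">" is first seen, standing in for the
// "drain, then undo" sequence (an assumption about where it happens).
function doctypeName(source: string, rollBackToken: boolean): string {
  const input = new Input(source.slice("<!DOCTYPE ".length));
  let name = "";
  let nameAtMark = "";
  let rewound = false;
  for (;;) {
    const ch = input.next();
    if (ch === undefined) break;
    if (ch === ">") {
      if (rewound) break;
      rewound = true;
      input.undo(); // rewind the cursor to the mark
      if (rollBackToken) name = nameAtMark; // also restore the token state
      continue;
    }
    name += ch;
    if (name.length === 1) {
      // Mark right after the first name character, snapshotting the token too.
      input.mark();
      nameAtMark = name;
    }
  }
  return name;
}

console.log(doctypeName("<!DOCTYPE html>", false)); // prints htmltml
console.log(doctypeName("<!DOCTYPE html>", true)); // prints html
```

If this guess about the mechanism is right, the rule would be something like this: any state built from characters past the mark (here, the token name) has to be restored along with the cursor. Otherwise the buffer must never be undone past characters already committed to a token.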

Is there some rule that's typically followed in the parser to avoid cases like this, but which is violated for doctypes? Knowing that would give me more confidence in the implementation. I actually haven't been playing with the tokenizer for very long; this is one of the first inputs I tried.

Thanks!

Metadata

Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
