This repository was archived by the owner on Nov 16, 2025. It is now read-only.
If I tokenize just the string "<!DOCTYPE html>", the doctype token comes out with a name of "htmltml" instead of the expected "html". The problem seems to lie in the system for draining and undoing the buffer.
Is there some rule that the parser typically follows to avoid cases like this, but which is violated for doctypes? Knowing that would give me more confidence in the implementation. I haven't been playing with the tokenizer for very long; this is one of the first inputs I tried.
Thanks!