Make the tokenizer able to be run separately #41
Comments
I am very much interested in this as well. I am using the tree builder with a custom (MediaWiki) tokenizer in a prototype I am currently working on. For now I have modified the parser to take a tokenizer as an argument to parse(), but this is just a quick hack that ignores tokenizer modes completely. Right now I am mostly working on the tokenizer, but I might be able to put some effort into the interface later.
Oh, sweet. I'm integrating the latest draft's tokenizer changes, which flatten the modes out into separate states. That should help move this along.
@aredridel: Has that work been completed? I'm thinking about updating a project to the newest version of html5 and getting rid of the state transitions in my code.
It has not, sadly! It's not trivial to get 100% accurate, since the HTML5 algorithms assume a document tree, and if you're not parsing fully, you don't get one. I need to rework it for some of the latest spec changes, and that should simplify things, since they've flattened some of the parser logic down into tokenizer states.
I'm using just the tokenizer at the moment. I was able to pull it out, but it would be nice if it were explicitly separable.
It's hard with the revision of the spec I was originally targeting, since the parser state feeds back into the tokenizer. In the latest revision, much if not all of this is flattened out, so migrating toward the current parsing spec should make the separation easier.
Sounds great. Thanks for writing this package, btw; it seems to be one of a kind in the JS world.
I'm kinda surprised at how one of a kind it is, but it's needed!
Do this while minimally involving the parser in state transitions between tokenizing modes.
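For illustration, here is a minimal TypeScript sketch of one way to keep the parser out of the tokenizer's state transitions, assuming a standalone tokenizer that picks its next state from the start-tag name alone instead of asking the tree builder. The names `stateAfterStartTag`, `tokenize`, `TokenizerState`, and `Token` are hypothetical and not part of the html5 package's API. The tag-to-state mapping mirrors the switches the HTML spec's tree construction stage makes for HTML elements (title/textarea to RCDATA; style, xmp, iframe, noembed, noframes, and noscript with scripting enabled to RAWTEXT; script to script data; plaintext to PLAINTEXT). Foreign content (an SVG `<title>` does not switch), attributes, comments, character references, and the script-escape states are deliberately left out; those are exactly the cases where the parser feedback described in the comments still matters.

```typescript
// Tokenizer states the tree builder can switch the tokenizer into
// (names follow the HTML spec; this type is hypothetical).
type TokenizerState = "data" | "rcdata" | "rawtext" | "scriptData" | "plaintext";

type Token =
  | { type: "startTag"; name: string }
  | { type: "endTag"; name: string }
  | { type: "text"; data: string };

// Hypothetical helper: the state a standalone tokenizer enters after emitting
// an HTML start tag, mirroring the tree builder's switches for HTML elements.
// Ignores foreign content (SVG/MathML), where e.g. <title> does not switch.
function stateAfterStartTag(tagName: string, scriptingEnabled = true): TokenizerState {
  switch (tagName.toLowerCase()) {
    case "title":
    case "textarea":
      return "rcdata";
    case "style":
    case "xmp":
    case "iframe":
    case "noembed":
    case "noframes":
      return "rawtext";
    case "noscript":
      return scriptingEnabled ? "rawtext" : "data";
    case "script":
      return "scriptData";
    case "plaintext":
      return "plaintext";
    default:
      return "data";
  }
}

// Deliberately simplified tokenizer: tag names only (attributes are skipped),
// no comments, doctypes, character references, or script-escape states.
function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  // Sticky flag: only match a tag starting exactly at lastIndex.
  const tagRe = new RegExp("<(\\/?)([a-zA-Z][a-zA-Z0-9-]*)[^>]*>", "y");
  let pos = 0;
  let text = "";
  const flush = (): void => {
    if (text !== "") {
      tokens.push({ type: "text", data: text });
      text = "";
    }
  };

  while (pos < input.length) {
    tagRe.lastIndex = pos;
    const m = tagRe.exec(input);
    if (m === null) {
      text += input.charAt(pos++); // not a tag: ordinary character data
      continue;
    }
    flush();
    pos = tagRe.lastIndex;
    const name = m[2].toLowerCase();
    if (m[1] === "/") {
      tokens.push({ type: "endTag", name });
      continue;
    }
    tokens.push({ type: "startTag", name });
    const state = stateAfterStartTag(name);
    if (state === "data") continue;
    // PLAINTEXT never ends; RCDATA/RAWTEXT/script data end at "</name"
    // (a simplification of the spec's "appropriate end tag" check).
    const endRe = new RegExp("</" + name, "gi");
    endRe.lastIndex = pos;
    const found = state === "plaintext" ? null : endRe.exec(input);
    const stop = found === null ? input.length : found.index;
    text = input.slice(pos, stop);
    flush();
    pos = stop;
  }
  flush();
  return tokens;
}
```

With the switch in place, `tokenize("<p>a<script>if (a<b) x();</script></p>")` yields the script body as a single text token instead of misreading `<b` as a start tag. Getting this exact in every case still needs the tree builder's insertion mode and current node, which is why full separation is non-trivial.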