Make the tokenizer able to be run separately #41

aredridel · 2011-11-14T02:43:03Z

Do this while involving the parser as little as possible in state transitions between tokenizing modes.

gwicke · 2011-11-28T17:29:09Z

I am very much interested in this as well. I am using the tree builder with a custom (MediaWiki) tokenizer in a prototype I am working on currently. For now I modified the parser to take a tokenizer as an argument in parse(), but this is just a quick hack that ignores tokenizer modes completely.
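
For illustration, here is a minimal sketch of what "take a tokenizer as an argument in parse()" could look like. Every name in it (`TreeBuilder`, `processToken`, `parse`, `toyTokenizer`, the token shapes) is hypothetical and not taken from the html5 package; like the quick hack described above, it ignores tokenizer modes entirely.

```javascript
// Hypothetical sketch: a tree builder that accepts any tokenizer object.
// None of these names come from the html5 package.
class TreeBuilder {
  constructor() {
    this.root = { name: "#document", children: [] };
    this.stack = [this.root]; // stack of open elements
  }

  processToken(token) {
    const current = this.stack[this.stack.length - 1];
    if (token.type === "StartTag") {
      const node = { name: token.name, children: [] };
      current.children.push(node);
      this.stack.push(node);
    } else if (token.type === "EndTag") {
      // Never pop the document node itself.
      if (this.stack.length > 1) this.stack.pop();
    } else if (token.type === "Characters") {
      current.children.push({ name: "#text", data: token.data });
    }
  }
}

// parse() takes the tokenizer as an argument. The only requirement is a
// tokenize(source) method that yields token objects.
function parse(source, tokenizer) {
  const builder = new TreeBuilder();
  for (const token of tokenizer.tokenize(source)) {
    builder.processToken(token);
  }
  return builder.root;
}

// A toy stand-in for a custom tokenizer: attribute-less tags and text only.
const toyTokenizer = {
  *tokenize(source) {
    const pattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)>|([^<]+)/g;
    let match;
    while ((match = pattern.exec(source)) !== null) {
      if (match[3] !== undefined) {
        yield { type: "Characters", data: match[3] };
      } else {
        yield {
          type: match[1] ? "EndTag" : "StartTag",
          name: match[2].toLowerCase(),
        };
      }
    }
  },
};

const tree = parse("<p>hi <b>there</b></p>", toyTokenizer);
```

Because the tree builder only depends on the shape of the tokens, any tokenizer that yields compatible objects can be swapped in.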

Right now I am mostly working on the tokenizer, but might be able to put some effort into the interface later.

aredridel · 2011-11-29T03:57:59Z

Oh, sweet. I'm integrating the latest draft's tokenizer changes, which flatten the modes out into separate states. That should help move this along.

papandreou · 2012-04-27T12:06:38Z

@aredridel: Has that work been completed? I'm thinking about updating a project to the newest version of html5 and getting rid of the state transitions in my code.

aredridel · 2012-04-27T12:52:21Z

It has not, sadly! It's not trivial to get 100% accurate, since the HTML5 algorithms assume a document tree, and if you're not parsing fully, you don't get one. I need to rework for some of the latest spec changes, and that should simplify things, since they've flattened some of the parser down into tokenizer states.

dgreensp · 2013-04-17T01:56:23Z

I'm using just the tokenizer at the moment. I was able to pull it out, but it would be nice if it was explicitly separable.

aredridel · 2013-04-17T13:15:04Z

It's hard with the revision of the spec I was originally targeting, since the parser state feeds back into the tokenizer. With the latest revision, much if not all of this is flattened out. As I migrate toward the current parsing spec, it'll smooth that out.
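
To make the feedback problem concrete, here is a rough sketch of why a tokenizer run on its own can produce different tokens than one driven by a parser. It is a simplified illustration, not the html5 package's code: `Tokenizer`, `textOnlyUntil`, `tokenizeAll`, and the token shapes are all hypothetical, and the real spec states (RCDATA, RAWTEXT, script data) are collapsed into a single "text-only" mode.

```javascript
// Hypothetical sketch of parser-to-tokenizer feedback; not the html5 package's code.
// Elements whose content the tree builder asks the tokenizer to treat as text.
const TEXT_ONLY = new Set(["script", "style", "textarea", "title"]);

class Tokenizer {
  constructor(source) {
    this.source = source;
    this.pos = 0;
    this.textOnlyUntil = null; // set by the consumer to switch modes
  }

  next() {
    if (this.pos >= this.source.length) return null;
    const rest = this.source.slice(this.pos);

    if (this.textOnlyUntil !== null) {
      // Text-only mode: everything up to the matching end tag is characters.
      const closeTag = "</" + this.textOnlyUntil + ">";
      const end = rest.toLowerCase().indexOf(closeTag);
      const data = end === -1 ? rest : rest.slice(0, end);
      this.textOnlyUntil = null;
      if (data.length > 0) {
        this.pos += data.length;
        return { type: "Characters", data };
      }
      // Empty content: fall through and read the end tag in normal mode.
    }

    // Normal mode: attribute-less tags, or text up to the next "<".
    const tag = /^<(\/?)([a-zA-Z][a-zA-Z0-9]*)>/.exec(rest);
    if (tag) {
      this.pos += tag[0].length;
      return { type: tag[1] ? "EndTag" : "StartTag", name: tag[2].toLowerCase() };
    }
    const lt = rest.indexOf("<", 1);
    const data = lt === -1 ? rest : rest.slice(0, lt);
    this.pos += data.length;
    return { type: "Characters", data };
  }
}

// Drives the tokenizer. With feedback enabled, this plays the parser's role
// and switches the tokenizer's mode after certain start tags.
function tokenizeAll(source, feedback) {
  const tokenizer = new Tokenizer(source);
  const tokens = [];
  let token;
  while ((token = tokenizer.next()) !== null) {
    tokens.push(token);
    if (feedback && token.type === "StartTag" && TEXT_ONLY.has(token.name)) {
      tokenizer.textOnlyUntil = token.name;
    }
  }
  return tokens;
}

const withFeedback = tokenizeAll("<textarea><b>x</textarea>", true);
const withoutFeedback = tokenizeAll("<textarea><b>x</textarea>", false);
```

With feedback, `<b>x` comes out as a single character token inside the textarea; without it, `<b>` is wrongly emitted as a start tag. That difference is what makes a cleanly separable tokenizer hard as long as the mode switches live in the parser.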

dgreensp · 2013-04-17T16:35:47Z

Sounds great. Thanks for writing this package, btw, it seems to be one of a kind in the JS world.

aredridel · 2013-04-18T00:25:13Z

I'm kinda surprised at how one of a kind it is, but it's needed!
