Investigate using Fizzle for selection in HTML Blobs #18

Open
5 tasks done
cognifloyd opened this issue Jul 16, 2013 · 10 comments

Comments

@cognifloyd

In HTML Blobs (typically subBlobs of Fluid Blobs when using the TemplateBuilder), I'm going to need a way to select elements in the syntax tree.

Fizzle is the standard for FlowQuery (which is what I'm using), but it lacks some of the convenience syntax that is so common in CSS. The most important pieces are:

  • #id
  • .class

I guess fizzle properties would map onto tags or elements, but I don't know how that works. I also need to be able to select Blobs based on their package and file path.

Here are some specific things to look into:

  • Investigate extending Fizzle Parser to add tag#id.class syntax (semantics should depend on the syntax, as this won't make sense everywhere fizzle is used)
  • Investigate how fizzle interacts with a path like: Cognfiire.EmptyBoilerplate:Resources/Private/FooBar.txt
  • Investigate how Fizzle is embedded in FlowQuery, especially how it is used in the filter() operation. (Fizzle is just an argument, right?)
  • push documentation change for Flow to remove the propertyNameFilter reference to foo.bar.baz
  • Look at TemplaVoila to see how it selects a particular path in the file. Does it use XPath?

I think I'll have to use XPath, but if someone wants to write their own CSS Selector, I'll support that using symfony\CssSelector. For anything a builder writes to a file, unless it is user provided, it will be XPath. Hopefully XPath isn't hard to generate in JavaScript because this selector will be something that will go back and forth between the UI and the backend.
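To illustrate the idea of accepting a CSS selector from the user but storing XPath, here is a minimal Python sketch. It is not how symfony\CssSelector works internally; `simple_css_to_xpath` is a hypothetical helper that only understands a single compound selector of the form tag#id.class, and the class test uses the common `contains(concat(...))` idiom.

```python
import re

# Hypothetical helper: handles only one compound selector such as
# "div#main.foo" (tag, id and classes are each optional).
_SIMPLE = re.compile(
    r"^(?P<tag>[A-Za-z][A-Za-z0-9_-]*|\*)?"
    r"(?P<id>#[A-Za-z0-9_-]+)?"
    r"(?P<classes>(?:\.[A-Za-z0-9_-]+)*)$"
)


def simple_css_to_xpath(selector: str) -> str:
    """Translate tag#id.class into an XPath 1.0 expression."""
    match = _SIMPLE.match(selector)
    if not selector or match is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    # No tag given means "any element".
    xpath = "//" + (match.group("tag") or "*")
    if match.group("id"):
        xpath += "[@id='%s']" % match.group("id")[1:]
    # Each class becomes a whitespace-delimited token test on @class.
    for name in filter(None, match.group("classes").split(".")):
        xpath += (
            "[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]" % name
        )
    return xpath


print(simple_css_to_xpath("div#main.foo"))
```

Anything beyond this shape (combinators, attribute selectors, pseudo-classes) is rejected with a ValueError rather than guessed at.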

@cognifloyd

OK. So FlowQuery uses Fizzle in the Filter operation, as I expected.

It passes the argument string into fizzle, and gets an array (a syntax tree) back.

Fizzle already supports the #ObjectIdentifier notation, where an ObjectIdentifier is defined as [0-9a-zA-Z_-]+. That could work in place of [id=blah]. I'll have to make a new filter that extends \TYPO3\Flow\Eel\FlowQuery\Operations\Object\FilterOperation and redefines matchesIdentifierFilter(). That method currently expects the identifier to be a UUID and uses it to select the matching object in the collection. The new version will need to look through the HTML syntax tree (probably DOMElements) and find the element that has the given identifier.

See TYPO3.Neos:TypoScript\FlowQueryOperations\FilterOperation for inspiration.
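The lookup that a redefined matchesIdentifierFilter() would need can be sketched in Python with the standard library's minidom. This is only an illustration of the tree walk, not the Flow API: `find_by_id` and the sample markup are made up for the example, and the real filter would operate on PHP DOMElements.

```python
from xml.dom import minidom


def find_by_id(node, identifier):
    """Depth-first search for the first element whose id attribute matches."""
    for child in node.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue  # skip text, comments, etc.
        if child.getAttribute("id") == identifier:
            return child
        found = find_by_id(child, identifier)
        if found is not None:
            return found
    return None


doc = minidom.parseString('<div><p id="intro">Hi</p><p id="body">Text</p></div>')
element = find_by_id(doc, "body")
print(element.toxml() if element is not None else "no match")
```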

@cognifloyd

I'll probably use matchesPropertyNameFilter() to filter the tag name, I think. The only issue is, I might want to get a namespaced tag (to select a fluid tag like <f:render/>, for example), and that means I'd need a colon.

So, I might have to extend ObjectIdentifier to be [0-9a-zA-Z_-]+(':'[0-9a-zA-Z_-]+)?

Speaking of the colon, in BlobQuery, I'm going to want to filter based on PackageName:path/path/path, which means I also need a /.
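A quick way to check what such extended identifiers would accept is to try them as ordinary regular expressions. The two patterns below are assumptions written in Python's `re` syntax rather than Fizzle's PEG grammar: the first allows one optional namespace prefix of any length, the second is a guess at a PackageName:path form that also permits `.` and `/`.

```python
import re

# Assumed pattern: tag name with an optional namespace prefix (f:render).
NAMESPACED_TAG = re.compile(r"[0-9a-zA-Z_-]+(?::[0-9a-zA-Z_-]+)?")

# Assumed pattern: PackageName:path/path/path, allowing '.' and '/'.
PACKAGE_PATH = re.compile(r"[0-9a-zA-Z_.-]+:[0-9a-zA-Z_./-]+")

for candidate in ("div", "f:render", "Resources/Private"):
    print(candidate, bool(NAMESPACED_TAG.fullmatch(candidate)))

path = "Cognfiire.EmptyBoilerplate:Resources/Private/FooBar.txt"
print(path, bool(PACKAGE_PATH.fullmatch(path)))
```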

@cognifloyd

In the attribute Filter [ foo = 'example' ], foo can be a property like foo.bar.baz which does not make sense in HTML. Attributes should all be at the same level. So, I'll want to override getPropertyPath() to just return the whole property...

Then again, could I use the propertyNameFilter to get the class definition? So div.foo would be seen as a property, but I would provide the semantic meaning that div is a tagname and foo is a class. Would that work?

It is also possible to use symfony/CssSelector instead of Fizzle, but I'd really like to avoid adding another dependency. Hopefully fizzle will work.

@cognifloyd

So the propertyNameFilter expects an identifier which is defined as: [a-zA-Z_] [a-zA-Z0-9_]*

And the only place that it looks for a property path is in the attribute filter, but it doesn't match a period . in the Identifier, so I don't see how it will ever match a path unless there's some string magic somewhere that converts a period into an underscore and back again before checking for a property path.
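The mismatch is easy to reproduce outside the parser. The sketch below uses the identifier pattern quoted above in Python's `re` module, assuming the spaces in the grammar's regex are not significant; it is not the Fizzle parser itself.

```python
import re

# Identifier as quoted from the grammar, with the whitespace dropped.
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

# A plain property name matches in full.
print(IDENTIFIER.fullmatch("foo") is not None)

# A dotted path does not: matching stops at the first period.
print(IDENTIFIER.fullmatch("foo.bar.baz") is not None)
print(IDENTIFIER.match("foo.bar.baz").group())
```

The last line shows that only the leading segment is consumed, which is consistent with the parser never seeing a full property path.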

@cognifloyd

The docs[1] say that 'foo.bar.baz' would be a valid property name, but the parser grammar doesn't accept periods in property names[2].

Plus, there are no unit tests that include a period in the property name[3], even though there are some method stubs in filter() that seem to expect that a propertyName can include periods.

[1] http://docs.typo3.org/neos/TYPO3NeosDocumentation/IntegratorGuide/EelFlowQuery.html#property-name-filters

[2] The grammar expects an Identifier
https://git.typo3.org/Packages/TYPO3.Eel.git/blob/HEAD:/Resources/Private/Grammar/Fizzle.peg.inc#l52
and an identifier is defined here (line 41):
https://git.typo3.org/Packages/TYPO3.Eel.git/blob/HEAD:/Resources/Private/Grammar/AbstractParser.peg.inc#l41
which matches "/ [a-zA-Z_] [a-zA-Z0-9_]* /" <-- There is no period in this regex

[3] https://git.typo3.org/Packages/TYPO3.Eel.git/blob/HEAD:/Tests/Unit/FlowQuery/FizzleParserTest.php#l68
Note that propertyNameFilterIsMatched() (line 68) asserts that two things don't match:
\TYPO3\Foo
TYPO3.Foo:Bar
but it does not verify that anything does match. I think that means we need to match 'foo', 'foo.bar', and 'foo.bar.baz', as the documentation suggests. It would also be very nice (for me, anyway) to match the two examples that the test says don't match, because I need to filter based on packageName:path in one instance and on tag#identity.class in another.
So, that would suggest that I need to add ':.#' to the matched characters for propertyNameFilter.
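As a rough check of what adding ':.#' would and would not buy, here is the widened character class tried in Python. The pattern is an assumption about how the change might look, not the actual grammar; the first character is left unchanged.

```python
import re

# Assumed widening: ':', '.' and '#' are allowed after the first character.
WIDENED = re.compile(r"[a-zA-Z_][a-zA-Z0-9_:.#]*")

candidates = [
    "foo",
    "foo.bar.baz",
    "TYPO3.Foo:Bar",
    "div#main.content",
    r"\TYPO3\Foo",
    "Cognfiire.EmptyBoilerplate:Resources/Private/FooBar.txt",
]
for candidate in candidates:
    print(candidate, bool(WIDENED.fullmatch(candidate)))
```

With only those three characters added, the backslash-prefixed class name and the full package path still fail, since neither `\` nor `/` is in the class.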

@cognifloyd

So, I would have to do some major voodoo with Fizzle to get it to understand tag#id.class.

Maybe I should just bite the bullet and include symfony\CssSelector, then implement a new filter() operation that is only used for HTML. It would take the CSS selector and pass it on to DomCrawler to get to the right spot in the file.

@cognifloyd

symfony\DomCrawler isn't really the best tool for generating HTML. It's designed to retrieve and navigate it, so that you can submit forms, but I would have to build a bunch of stuff around it to make it work the way I need it to (read and write html files, as well as the html in fluid files).

Other options include:

My requirements include:

  • I need to be able to watch every command (with an aspect probably) and write them as FlowQuery commands in a YAML file.
  • The UI will send some kind of selector to the backend that gets run as FlowQuery in PHP, but stored as Eel in the YAML file. The user will be inserting elements at a given location (or removing them), so I've got to have a robust system.
  • It's got to be repeatable (to facilitate the future migrations service). That might mean that I select an element in the old doc, update the doc to see where the element moved, and generate a new selector. I would prefer using things like ID or class to select things (as they aren't as fragile and prone to causing the TemplaVoila-style remapping hell), but I want to support selecting any element.
  • I don't care if it uses XPath or CssSelectors, but it's got to be robust, and I've got to be able to sniff it, store it, and repeat it.

@cognifloyd

TemplaVoila suffers from NIH syndrome. There are elements of CSS selectors (like #id and .class), but it uses a custom [number] annotation that is unique to TemplaVoila, as well as the keywords INNER and OUTER to decide whether or not to include a matched tag. I really don't want to go down the same path as TemplaVoila, and contorting Fizzle to select HTML elements would do exactly that. No, I will use an external library, and I will use standard XPath and/or CSS Selectors.

The question is, is there an equivalent to INNER/OUTER in CSS or in XPath?

@cognifloyd

CSS Selectors can select elements but not the contents of those elements. The closest we get to selecting the contents of an element is E::first-line, E::first-letter, E::before, and E::after.

I will want something like before and after, but I think I need even more power.

So, to map TV concepts onto XPath:

  • /foo/text() selects only the text nodes in element foo (not comments or elements or anything else)
  • /foo/element() selects all of the element nodes in element foo (not comments, text, etc)
  • /foo/comment() selects all of the comment nodes in element foo (not elements, text, etc)
  • /foo selects the entire node including the element tag of foo. This is a lot like OUTER in TV.
  • /foo/node() selects all nodes (text, comments, elements, etc) within the foo node, so it works like INNER in TV

See the specs
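The INNER/OUTER distinction can be seen concretely with Python's standard minidom, which exposes the same node kinds. This is a sketch of the semantics only, not an XPath engine: each variable is annotated with the expression from the list above that it mimics, and the sample markup is made up. Note that `element()` is an XPath 2.0 kind test; in XPath 1.0 the equivalent is `/foo/*`.

```python
from xml.dom import minidom

doc = minidom.parseString("<foo>hello <b>world</b><!-- note --></foo>")
foo = doc.documentElement

# /foo: the element including its own tag (OUTER in TemplaVoila terms).
outer = foo.toxml()

# /foo/node(): every child node, without the enclosing tag (INNER).
inner = "".join(child.toxml() for child in foo.childNodes)

# /foo/text(), /foo/element(), /foo/comment(): one node kind each.
texts = [c.data for c in foo.childNodes if c.nodeType == c.TEXT_NODE]
elements = [c.tagName for c in foo.childNodes if c.nodeType == c.ELEMENT_NODE]
comments = [c.data for c in foo.childNodes if c.nodeType == c.COMMENT_NODE]

print(outer)
print(inner)
print(texts, elements, comments)
```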

@cognifloyd

Just to follow up. I investigated the various parsers mentioned earlier and QueryPath is the best for what I need.

It's faster than SimpleHTMLDOM[1], is designed for editing (unlike DomCrawler[2]), and is more actively maintained than PhpQuery. In 2010 it was also faster than PhpQuery at write operations[3].

Also, support for HTML5 (and especially HTML5 fragments) is underway in QueryPath 3.x, so it really is the best choice.

That means that, for the most part, CSS Selectors are the way to go for selecting elements in the docs. I can use XPath if needed, and maybe someone will add an XQuery operation at some point, but for now, CSS Selectors through QueryPath are what I'm going to use.

[1] https://groups.google.com/forum/#!topic/support-querypath/DEQIsoZW_pU
[2] http://symfony.com/doc/current/components/dom_crawler.html (see the first note at the top of the doc)
[3] http://web.archive.org/web/20100815061227/http://www.tagbytag.org/articles/phpquery-vs-querypath
