[feature] conditional range index configs based on an attribute value #1233
This adds the possibility to add an attribute condition on complex range index config elements:
<create qname="tei:note">
    <condition attribute="type" value="text_type" />
    <field name="text_type" type="xs:string" case="no"></field>
</create>
Will only store the text of the
tei:note[@type="text_type"][.="some text type"]
will be rewritten to
tei:note[range:field("text_type", "eq", "some text type")]
I tried to add as little complexity as possible to keep the performance impact to a minimum, especially when no conditions are defined. If anyone has suggestions for benchmarks I could run, let me know...
Background: In a project I have to deal with data like
<note type="language">
    <term type="language">nicht ermittelt</term>
</note>
<note type="script">
    <term type="script"/>
    <note type="writing_direction">nicht ermittelt</note>
    <note type="ink"/>
    <note type="hand_desc">
        <p/>
    </note>
    <note type="structural_markers">
        <p/>
    </note>
</note>
And I want to build a faceted search feature with the contents of the note element as keys. Using conditional range index fields based on the note's type and retrieving them with range:index-keys-for-field gives better performance compared to defining separate index fields on the type and the text values and then using distinct-values() on them...
Very cool and thought-provoking! I can imagine this
One question: This feature applies only to "new" range indexes - i.e., those wrapped in
Yes, it applies to the "new" range index.
You're right, this could definitely also be useful for the Lucene index. At first glance there seems to be a lot of similar code, so this shouldn't be too hard to implement. It is probably even easier because we don't have to deal with query rewriting. Conditions other than "an attribute equals" could also be possible in the future... but let's get this merged first
I do have some ideas for refining the code, for example making ComplexRangeIndexConfigCondition abstract so that different kinds of conditions can be implemented as subclasses. I also have a proof of concept ready for defining conditions with an arbitrary XPath expression (performance might be an issue though)...
But before I do any more work on this, I'd like to wait for a preliminary comment on whether this condition feature would be considered for merging at all, maybe even already in 3.0...
@olvidalo For the Algolia Index plugin I am creating for @ttasovac I have an outstanding task to support simple predicates in the path expressions which define the Indexable objects. I wonder if we could perhaps both align on the same syntax generally and abstract this into a base class for indexes.
If we switched to supporting some simple predicate syntax, instead of your:
<create qname="tei:note">
    <condition attribute="type" value="text_type" />
    <field name="text_type" type="xs:string" case="no"/>
</create>
We would have something like:
<create qname="tei:note[@type eq 'text_type']">
    <field name="text_type" type="xs:string" case="no"/>
</create>
What do you think?
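To make the comparison concrete, here is a minimal Java sketch of how a qname with one simple attribute predicate could be split back into the same three pieces the condition element carries (element qname, attribute name, expected value). It is only an illustration: the class name SimplePredicateSketch, the regular expression, and the decision to accept just `@attr eq 'value'` or `@attr = 'value'` are assumptions made for this example, not code from this PR or from the Algolia plugin.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for a parsed qname plus at most one attribute-equality predicate. */
final class SimplePredicateSketch {
    // group 1: qname, group 2: attribute name, group 3: quote character, group 4: value
    private static final Pattern SYNTAX = Pattern.compile(
            "\\s*([\\w.-]+(?::[\\w.-]+)?)\\s*"
            + "(?:\\[\\s*@([\\w.:-]+)(?:\\s+eq\\s+|\\s*=\\s*)(['\"])(.*?)\\3\\s*\\])?\\s*");

    final String qname;
    final String attribute; // null when no predicate is present
    final String value;     // null when no predicate is present

    private SimplePredicateSketch(String qname, String attribute, String value) {
        this.qname = qname;
        this.attribute = attribute;
        this.value = value;
    }

    /** Accepts "prefix:name" or "prefix:name[@attr eq 'value']"; rejects anything else. */
    static SimplePredicateSketch parse(String expression) {
        Matcher m = SYNTAX.matcher(expression);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported index path: " + expression);
        }
        return new SimplePredicateSketch(m.group(1), m.group(2), m.group(4));
    }
}
```

With this sketch, `SimplePredicateSketch.parse("tei:note[@type eq 'text_type']")` yields the qname `tei:note`, the attribute `type`, and the value `text_type`, so both syntaxes could feed the same attribute check. Anything richer than one equality predicate is rejected rather than guessed at.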
@adamretter Agreeing on a syntax sounds like a good idea. I agree that the simple predicate syntax you are proposing looks nicer than my
For example, I'm working on supporting arbitrary XPath expressions in addition to a simple attribute condition. I am using eXist's XQuery engine to evaluate them for the specified element. Keeping performance in mind, though, I do want to keep both kinds of conditions implemented separately, as an attribute condition is way cheaper to evaluate than an XPath expression. So my preliminary plan was to use the condition tag but also allow "xpath" as an attribute, so that when parsing I can discriminate between attribute and XPath conditions and instantiate different classes with different evaluation code depending on which attributes are specified.
But then again I'm also intrigued by the simplicity of your syntax, I'm just unsure about the complexity it adds. I'm still open to the idea and not at all zeroed in on my solution, but maybe we can discuss this a little...
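The plan described above (one condition tag, with the implementation class chosen by the attributes it carries) could look roughly like the following Java sketch. All class names here are hypothetical and are not the PR's ComplexRangeIndexConfigCondition code. The sketch also evaluates XPath with the JDK's javax.xml.xpath on a DOM element as a stand-in, whereas the actual proposal uses eXist's XQuery engine. The DomSketch helper exists only to build test elements.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical abstract condition; subclasses differ only in how they test an element. */
abstract class ConditionSketch {
    abstract boolean matches(Element candidate);

    /** Chooses the subclass from the attributes present on the condition tag. */
    static ConditionSketch parse(Element conditionTag) {
        if (conditionTag.hasAttribute("xpath")) {
            return new XPathConditionSketch(conditionTag.getAttribute("xpath"));
        }
        if (conditionTag.hasAttribute("attribute")) {
            return new AttributeConditionSketch(
                    conditionTag.getAttribute("attribute"),
                    conditionTag.getAttribute("value"));
        }
        throw new IllegalArgumentException("condition needs 'attribute' or 'xpath'");
    }
}

/** Cheap path: a plain attribute comparison, no expression engine involved. */
final class AttributeConditionSketch extends ConditionSketch {
    private final String name;
    private final String value;

    AttributeConditionSketch(String name, String value) {
        this.name = name;
        this.value = value;
    }

    @Override
    boolean matches(Element candidate) {
        return candidate.hasAttribute(name) && value.equals(candidate.getAttribute(name));
    }
}

/** Expensive path: the expression is compiled once and evaluated per element. */
final class XPathConditionSketch extends ConditionSketch {
    private final XPathExpression compiled;

    XPathConditionSketch(String expression) {
        try {
            this.compiled = XPathFactory.newInstance().newXPath().compile(expression);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    @Override
    boolean matches(Element candidate) {
        try {
            return (Boolean) compiled.evaluate(candidate, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            return false; // treat evaluation errors as "condition not met"
        }
    }
}

/** Test helper only: builds a root element with the given name/value attribute pairs. */
final class DomSketch {
    static Element element(String name, String... attributePairs) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element element = doc.createElement(name);
            for (int i = 0; i + 1 < attributePairs.length; i += 2) {
                element.setAttribute(attributePairs[i], attributePairs[i + 1]);
            }
            doc.appendChild(element);
            return element;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Keeping the two subclasses separate means a configuration that only uses attribute conditions never touches the expression engine, which is the performance concern raised above.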
Btw, I did add also functions
Here is an example of the configuration syntax
I'm ready to discuss and implement a common rule syntax that can be shared between indexes.
Jan 30, 2017
Just a late thanks for the merge, I'm very happy this got in in time for 3.0!
Meanwhile I've been working on a couple of extensions to the attribute condition (while maintaining syntax compatibility), also adding some features from @shabanovd's proposal. PR is coming up...