Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[nocommit] Introduce ExpressionFacets along with a demo. #12184

Closed
wants to merge 1 commit into from

Conversation

gsmiller
Copy link
Contributor

@gsmiller gsmiller commented Mar 5, 2023

This is meant only as a demo/example. It is incomplete and untested. It's just to start discussion around a possible feature.

This is meant only as a demo/example. It is incomplete and untested.
It's just to start discussion around a possible feature.
@gsmiller gsmiller closed this Mar 5, 2023
@gsmiller
Copy link
Contributor Author

gsmiller commented Mar 5, 2023

Immediately closing. Only want to use this draft PR as an example.

FacetsCollector facetsCollector)
throws IOException, ParseException {
// Compute component aggregations:
for (FieldAggregation fa : aggregations) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stefanvodita here's where I think you're proposing we do multiple aggregations in one pass right? (which I'm not doing in this example, but it could be done and would probably be a bit more efficient)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s right. Seeing your demo makes me thing of a few other features that would be nice to have:

  1. Have a global cache of previously computed aggregations. This supports cases where multiple expressions use the same aggregations. Your bindings is already a global cache. New aggregation bindings could be added to it instead of aggregationBindings. Maybe the user can pass a flag to request for aggregations to be saved to the aggregation cache.
  2. Compute aggregations on different match-sets. For example, let’s say in the ranking expression I only want to aggregate populations for cities whose name starts with “S” and only aggregate distances for cities that are overseas with respect to their country’s capital. Some ways I can imagine this being supported are: a) By using a filter query, like other faceting implementations do. b) By introducing a ternary operator in the expressions or in the values bound to the expressions. c) Each aggregation object references a match-set to be computed against.
  3. Make ExpressionFacets recursive. I’m not sure if this is a good idea, but it makes sense to me conceptually. Right now, variables that show up in the expression have to be pre-computed and bound. What if the variables were treated as sub-expressions, computed recursively as needed, and added to a cache (as per point 1)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve worked a bit with the example code to see how these ideas could play out. I’ll go through them in the same order.

  1. As anticipated, it’s straight-forward to use the Bindings object as a global cache of aggregations. A strong objection to this is that the Bindings object would now contain values keyed by docid and keyed by ordinal. I don’t think this is a problem. When a variable is referenced in an expression it’s not ambiguous whether it is keyed by docid or ordinal. Another reason to keep field bindings separate from aggregation bindings is that aggregations make sense within the context of a particular query.
  2. The user can define DVSs that have a built-in condition. With this approach, it’s the user’s responsibility to provide conditional DVSs that make sense. Another approach is to use conditions in the expressions - the ternary operator is supported.
  3. I’m dropping this idea. Having considered it some more, I don’t see any benefit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @stefanvodita. On your points above:

  1. It sounds like you're talking about re-using aggregations across fields, not just dims? I think it's fairly common for users to "pack" all their faceting dimensions in a single index field. In this case, the aggregations will be re-used across all the dims the user wants to facet on. If the user is spreading their facet dims across multiple index fields though, you're right that any common aggregations would need to be re-computed. This feels like a little bit of an unusual case though, and I'd rather not design for it up-front to be honest. I'm nervous about putting these ordinal-level bindings into a "general" bindings instance since we have no control over the names that have already been bound, so there could be collisions (although maybe unlikely?). By keeping them isolated to an internal binding instance, we have full control over the namespace. Maybe we could simplify a bit initially and look at this optimization in a follow-up issue if we think it's important for users?
  2. I'm a little confused by this use-case. If the user wants to restrict the match set used for faceting, wouldn't they want to restrict it in the same way across all aggregations and the final expression? That's easy enough and is modeled by FacetCountsWithFilterQuery, which we could use here. I'm not really clear on why a user would want to restrict the match set differently for different aggregations?
  3. Not sure I understand this either, but it sounds like you're unconvinced as well? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I didn't fully get this part. What I had in mind was preserving aggregations across different faceting calls. For example, val_distance_sum could be used in 2 different expressions we want to compute. We don’t want to recompute val_distance_sum each time. In any case, you’re right that we shouldn’t complicate things too much at this stage.
  2. I’ve mixed up 2 different things, which made things more confusing. The first one, which you address, is restricting the match-set for the expression computation. This can be done with a filter query. The second one is about having conditionals in the expression. This is already supported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @stefanvodita! On your first point:

If the user has "packed" all their faceting dimensions into a single index field (which is sort of the default behavior here), then the expression (along with all aggregation values) would be computed a single time when the Facets instance is instantiated. If the user is then making separate calls to get faceting information for different dimensions on this same facets instance (e.g., multiple calls to getTopChildren for different dimensions or dims + paths), then we're not re-computing anything. That's all I was getting at. Where we would be recomputing is if there are different index fields storing the faceting information. This is because we need a separate Facets instance for each index field, which is where some duplicate computation could come into play I think. So my point is just that I suspect most common use-cases would leverage a single index field, with a single Facets instance, so there wouldn't re duplicate computation.

@gsmiller
Copy link
Contributor Author

gsmiller commented Mar 6, 2023

@stefanvodita I created #12190 to track this further. I propose we carry the discussion forward over there to see if we can come up with a design we like, and solicit community feedback, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants