Support lookups at Query Time #2329
Comments
would be worth mentioning the stuff in |
I will be working on adding an introspection endpoint over HTTP that allows users to introspect lookups at run time. |
@b-slim what kind of introspection? |
@drcrallen it will depend on what the lookup implementation exposes, for instance a prefix search. |
@b-slim cool, just FYI, at this point I'm planning on:
and other than that hoping you can drive for some of the other features you've been working on. I THINK only the cluster wide config stuff is a blocker for your items. |
Support for cluster-wide config: #1576 |
Snapshot lookups on node shutdown: #2517 |
Are there plans for evolving lookups to help with slowly changing dimensions? QTL is useful if product id X is always a product called "Rubber Duck". But if the product name changes to "Rubber Ducky" on a specific date, it would be nice to specify that the query time lookup should use the date of the "fact table" record when doing the lookup. Just in case that didn't make sense:
datasource: product_sales
lookup: product information
so when grouping sales by product name, I'd expect:
|
I have not heard of that in the plans, and here's why: there is a distinction between an attribute of an event and an attribute of something that is not an event (an attribute of an attribute, like a name for an ID). What QTL is designed to handle, and what the immediate future of QTL will handle (from our side at MMX; community discussion and contribution beyond our plans are always welcomed and encouraged), is attributes of attributes: cases where an attribute describes an event and is itself immutable, while attributes of that attribute may be mutable. So you might have an attribute of a product being End Of Life, but suddenly it is doing well, so you extend the End Of Life, or something. The End Of Life is an attribute of the product ID, NOT of the events related to the product, and is therefore a candidate for inclusion in QTL workflows in the current and immediate future incarnations. What you are describing are first-class immutable attributes of events, which are not currently supported. What you CAN do is have an immutable attribute of the event be product_name_id, stamp it on the event, and then have QTL look up the human-friendly name. As always, such items are open to discussion. |
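To make the product_name_id suggestion concrete, here is a minimal sketch in plain Java. It is not Druid code: the class, the method, and the ID values "n1" and "n2" are all made up for illustration. It assumes each sale is stamped at ingestion with the ID of the name that was current at the time, so the query-time lookup only has to map that immutable ID to a human-friendly string.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Druid.
class ProductNameLookupSketch {
    // An event carries only immutable attributes, including the stamped product_name_id.
    static final class Sale {
        final String productNameId;
        final long quantity;

        Sale(String productNameId, long quantity) {
            this.productNameId = productNameId;
            this.quantity = quantity;
        }
    }

    // Group sales by the name resolved at query time from the stamped ID.
    static Map<String, Long> salesByName(List<Sale> sales, Map<String, String> nameById) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Sale sale : sales) {
            // Assumption for this sketch: unknown IDs fall back to the raw ID.
            String name = nameById.getOrDefault(sale.productNameId, sale.productNameId);
            totals.merge(name, sale.quantity, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, String> nameById = new HashMap<>();
        nameById.put("n1", "Rubber Duck");   // name in effect before the rename
        nameById.put("n2", "Rubber Ducky");  // name in effect after the rename

        List<Sale> sales = new ArrayList<>();
        sales.add(new Sale("n1", 3));
        sales.add(new Sale("n1", 2));
        sales.add(new Sale("n2", 4));

        System.out.println(salesByName(sales, nameById));
    }
}
```

Because the ID on each event never changes, sales made before and after the rename stay grouped under the name that applied when they happened, while the strings behind "n1" and "n2" can still be edited in the lookup.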
This issue has been marked as stale due to 280 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions. |
This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time. |
Druid currently has no real "join" support. There is a need for "query-time lookups" that do a dimension-style join at query time.
This effort can be broken down into 3 phases
Introduce an interface that can be leveraged for "QTL" that functionally works in the query flow. This interface should be something that can be extended and it should enable re-write optimizations that can allow the broker (or other node) to re-write the lookup so that it is not pushed all the way down if it does not need to be pushed all the way down. There is a PR out for this here: https://github.com/druid-io/druid/pull/2291/files
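As a rough illustration of the re-write idea, the sketch below is plain Java and is not the interface from the PR. The `Lookup` interface and `applyAfterAggregation` method are hypothetical names. It assumes an additive aggregate such as a sum: the lower nodes group by the raw key, and the broker applies the lookup once per key and re-merges keys that map to the same value, instead of pushing the lookup down to every row.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the real interface lives in the PR linked above.
class LookupRewriteSketch {
    // Assumed minimal contract: map a dimension value to its replacement.
    interface Lookup {
        String apply(String key);
    }

    // Apply the lookup to per-key totals that were already aggregated elsewhere,
    // then re-merge keys that map to the same looked-up value.
    static Map<String, Long> applyAfterAggregation(Map<String, Long> totalsByKey, Lookup lookup) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : totalsByKey.entrySet()) {
            result.merge(lookup.apply(entry.getKey()), entry.getValue(), Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> totalsByKey = new LinkedHashMap<>();
        totalsByKey.put("a", 1L);
        totalsByKey.put("b", 2L);
        totalsByKey.put("c", 3L);

        // "a" and "b" collapse into one group after the lookup.
        Lookup lookup = key -> key.equals("c") ? "Y" : "X";
        System.out.println(applyAfterAggregation(totalsByKey, lookup));
    }
}
```

This only holds for aggregates that can be merged after the fact; a filter on the looked-up value, for example, would need a different re-write.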
The interface in question is an interface inside of Druid and in order to facilitate administrator management of implementations, it will require a "manager" class that can have implementations registered and removed. The initial implementation of this would provide just the functionality and the ability to register lookups via static configuration in `runtime.properties`.
We need a method of completely centralizing the configuration of lookups. This would be an endpoint on the coordinator that registers new lookups in the system. It would then be the coordinator's job to ensure that all nodes that need the lookups have them.
There are a couple of options here, but the simplest would be to just replicate all lookups to all query-processing nodes. An optimization on that option could be to form some sort of connection between lookups and the data sources they operate on and have the coordinator assign lookups according to which data sources are served by a given query-processing node.
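The data-source-aware assignment could look roughly like the following. This is a sketch in plain Java, not coordinator code: the class, the method, and the node and lookup names are hypothetical. It assumes the coordinator knows which data sources each lookup operates on and which data sources each query-processing node serves.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of coordinator-side lookup assignment.
class LookupAssignmentSketch {
    // A node receives a lookup only if it serves at least one of the lookup's data sources.
    static Map<String, Set<String>> assignLookups(
            Map<String, Set<String>> dataSourcesByLookup,
            Map<String, Set<String>> dataSourcesByNode) {
        Map<String, Set<String>> lookupsByNode = new TreeMap<>();
        for (Map.Entry<String, Set<String>> node : dataSourcesByNode.entrySet()) {
            Set<String> assigned = new TreeSet<>();
            for (Map.Entry<String, Set<String>> lookup : dataSourcesByLookup.entrySet()) {
                if (!Collections.disjoint(lookup.getValue(), node.getValue())) {
                    assigned.add(lookup.getKey());
                }
            }
            lookupsByNode.put(node.getKey(), assigned);
        }
        return lookupsByNode;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> dataSourcesByLookup = new HashMap<>();
        dataSourcesByLookup.put("product_names", new HashSet<>(Arrays.asList("product_sales")));
        dataSourcesByLookup.put("country_names", new HashSet<>(Arrays.asList("page_views")));

        Map<String, Set<String>> dataSourcesByNode = new HashMap<>();
        dataSourcesByNode.put("node-1", new HashSet<>(Arrays.asList("product_sales")));
        dataSourcesByNode.put("node-2", new HashSet<>(Arrays.asList("product_sales", "page_views")));

        System.out.println(assignLookups(dataSourcesByLookup, dataSourcesByNode));
    }
}
```

The simpler replicate-everything option is the special case where every node is assigned every lookup regardless of what it serves.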
We need implementations of the lookup interface introduced in step 1. There is an implementation of the interface done in the `namespace-lookup` module that introduces its own set of interfaces for things that generate `Map` objects and then it will build a lookup on top of those maps. There are other implementations that can also be done which do not force communication through a `Map` object but just implement the Druid interface directly.
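For illustration, a "manager" with a `Map`-backed implementation registered in it might be sketched as below. This is plain Java under stated assumptions, not the Druid interface or the `namespace-lookup` module: `Lookup`, `MapLookup`, `LookupManager`, and their methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; these types are stand-ins, not Druid classes.
class LookupManagerSketch {
    // Assumed minimal lookup contract.
    interface Lookup {
        String apply(String key);
    }

    // A lookup built on top of a Map snapshot.
    static final class MapLookup implements Lookup {
        private final Map<String, String> map;

        MapLookup(Map<String, String> map) {
            // Defensive copy so later changes to the caller's map do not leak in.
            this.map = new HashMap<>(map);
        }

        @Override
        public String apply(String key) {
            return map.get(key); // null when the key is unknown
        }
    }

    // Registry that lets implementations be registered and removed by name.
    static final class LookupManager {
        private final ConcurrentHashMap<String, Lookup> lookups = new ConcurrentHashMap<>();

        // Returns false if a lookup with this name is already registered.
        boolean register(String name, Lookup lookup) {
            return lookups.putIfAbsent(name, lookup) == null;
        }

        // Returns true if a lookup was actually removed.
        boolean remove(String name) {
            return lookups.remove(name) != null;
        }

        // Returns the registered lookup, or null if there is none.
        Lookup get(String name) {
            return lookups.get(name);
        }
    }

    public static void main(String[] args) {
        Map<String, String> names = new HashMap<>();
        names.put("X", "Rubber Duck");

        LookupManager manager = new LookupManager();
        manager.register("product_names", new MapLookup(names));
        System.out.println(manager.get("product_names").apply("X"));
    }
}
```

An implementation that skips the `Map` entirely would simply provide its own `apply`, for example by querying an external store, and register itself the same way.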