Support lookups at Query Time #2329

Closed
cheddar opened this issue Jan 25, 2016 · 12 comments

@cheddar
Contributor

cheddar commented Jan 25, 2016

Druid currently has no real "join" support. There is a need for "query-time lookups" that do a dimension-style join at query time.

This effort can be broken down into 3 phases:

  1. Introduce an interface for "QTL" that works within the query flow. The interface should be extensible, and it should enable rewrite optimizations that let the broker (or another node) rewrite the lookup so that it is not pushed all the way down when it does not need to be. There is a PR out for this here: https://github.com/druid-io/druid/pull/2291/files

    The interface lives inside Druid. To let administrators manage implementations, it will require a "manager" class with which implementations can be registered and removed. The initial implementation would provide just this functionality, plus the ability to register lookups via static configuration in runtime.properties.

  2. We need a method of completely centralizing the configuration of lookups. This would be an endpoint on the coordinator that registers new lookups in the system. It would then be the coordinator's job to ensure that all nodes that need the lookups have them.

    There are a couple of options here, but the simplest would be to replicate all lookups to all query-processing nodes. An optimization on that option would be to associate lookups with the data sources they operate on and have the coordinator assign lookups according to which data sources a given query-processing node serves.

  3. We need implementations of the lookup interface introduced in step 1. The namespace-lookup module already contains one: it introduces its own set of interfaces for things that generate Map objects and then builds a lookup on top of those maps. Other implementations are possible that do not force communication through a Map object and instead implement the Druid interface directly.
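
As a rough illustration of phases 1 and 3, here is a minimal sketch in plain Python rather than Druid's actual Java code. Every name in it (`Lookup`, `MapLookup`, `FunctionLookup`, `LookupManager`, `apply`, `register`, `remove`) is hypothetical and chosen only for this example; it is not the interface from the PR linked above. It shows one lookup built on top of a map, one that implements the interface directly, and a manager that lets implementations be registered and removed.

```python
from typing import Callable, Dict, Mapping, Optional, Protocol


class Lookup(Protocol):
    """Hypothetical lookup interface: map a dimension value to another value."""

    def apply(self, key: str) -> Optional[str]:
        ...


class MapLookup:
    """Lookup built on top of an in-memory map (suited to "small" maps)."""

    def __init__(self, mapping: Mapping[str, str]) -> None:
        # Copy so later changes to the caller's mapping do not leak in.
        self._mapping: Dict[str, str] = dict(mapping)

    def apply(self, key: str) -> Optional[str]:
        return self._mapping.get(key)


class FunctionLookup:
    """Lookup that implements the interface directly, with no map in between."""

    def __init__(self, fn: Callable[[str], Optional[str]]) -> None:
        self._fn = fn

    def apply(self, key: str) -> Optional[str]:
        return self._fn(key)


class LookupManager:
    """Hypothetical registry where lookups can be registered and removed by name."""

    def __init__(self) -> None:
        self._lookups: Dict[str, Lookup] = {}

    def register(self, name: str, lookup: Lookup) -> None:
        self._lookups[name] = lookup

    def remove(self, name: str) -> bool:
        # Returns True only if a lookup with that name was registered.
        return self._lookups.pop(name, None) is not None

    def get(self, name: str) -> Optional[Lookup]:
        return self._lookups.get(name)
```

A query-processing node would then resolve a lookup by name at query time, for example `manager.get("names").apply("a")`. Loading the initial registrations from static configuration is left out of the sketch.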

@drcrallen
Contributor

It would be worth mentioning that the stuff in namespace-lookup is intended for use with "small" Map objects.

@b-slim
Contributor

b-slim commented Feb 24, 2016

PRs #1576 and #2517 address point 2.

@b-slim
Contributor

b-slim commented Feb 24, 2016

I will be working on adding an HTTP introspection endpoint that allows users to introspect lookups at run time.

@drcrallen
Contributor

@b-slim what kind of introspection?

@b-slim
Contributor

b-slim commented Feb 24, 2016

@drcrallen it will depend on what the lookup impl exposes, for instance maybe a prefix search.

@drcrallen
Contributor

@b-slim cool, just FYI, at this point I'm planning on:

  1. seeing cluster wide config over the finish line
  2. making sure the extensions get updated properly
  3. adding more docs for QTL

and other than that hoping you can drive some of the other features you've been working on. I THINK only the cluster wide config stuff is a blocker for your items.

@drcrallen
Contributor

Support for cluster-wide config: #1576

@drcrallen
Contributor

Snapshot lookups on node shutdown: #2517

@milimetric

milimetric commented May 17, 2016

Are there plans for evolving lookups to help with slowly changing dimensions? QTL is useful if product id X is always a product called "Rubber Duck". But if the product name changes to "Rubber Ducky" on a specific date, it would be nice to specify that the query time lookup should use the date of the "fact table" record when doing the lookup. Just in case that didn't make sense:

datasource: product_sales

    {id: X, ts: 2013, sales: 123}
    {id: X, ts: 2014, sales: 234}
    {id: X, ts: 2015, sales: 345}

lookup: product information

    {ids: [X], ts: 2013, name: "Rubber Duck"}
    {ids: [X], ts: 2015, name: "Rubber Ducky"}

so when grouping sales by product name, I'd expect:

"Rubber Duck": (2013 + 2014 sales) 357 in sales
"Rubber Ducky": (only 2015 sales) 345 in sales
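
To make the expected semantics concrete, here is a minimal sketch in plain Python, outside of Druid. It is not a Druid API; the helper names `name_as_of` and `sales_by_name` are hypothetical, and the lookup is simplified to a per-id list of (effective-from year, name) pairs, which is an assumption about how the `ids`/`ts` records above would be stored.

```python
from bisect import bisect_right
from collections import defaultdict

# Fact rows from the product_sales example.
sales = [
    {"id": "X", "ts": 2013, "sales": 123},
    {"id": "X", "ts": 2014, "sales": 234},
    {"id": "X", "ts": 2015, "sales": 345},
]

# Per product id: (effective-from ts, name) pairs, sorted by ts.
product_names = {
    "X": [(2013, "Rubber Duck"), (2015, "Rubber Ducky")],
}


def name_as_of(product_id, ts):
    """Return the name that was in effect for product_id at time ts, or None."""
    versions = product_names.get(product_id, [])
    starts = [start for start, _ in versions]
    # Index of the last version whose effective-from ts is <= the row's ts.
    idx = bisect_right(starts, ts) - 1
    return versions[idx][1] if idx >= 0 else None


def sales_by_name(rows):
    """Group sales by the product name in effect at each row's timestamp."""
    totals = defaultdict(int)
    for row in rows:
        totals[name_as_of(row["id"], row["ts"])] += row["sales"]
    return dict(totals)


print(sales_by_name(sales))  # {'Rubber Duck': 357, 'Rubber Ducky': 345}
```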

@drcrallen
Contributor

I have not heard that in the plans and here's why:

There is a distinction between an attribute of an event, and an attribute of something that is not an event (an attribute of an attribute .... like a name for an ID).

What QTL is designed to handle, and what the immediate future of QTL will handle, is attributes of attributes (that is the plan from our side at MMX; community discussion and contribution beyond our plans are always welcomed and encouraged). This is the case where you have an attribute which describes an event, and that attribute is immutable. Attributes of those attributes may be mutable, though. So you might have an attribute of a product being End Of Life, but suddenly it is doing well, so you extend the End Of Life, or something. The End Of Life is an attribute of the product ID, NOT of the events related to the product, and is therefore a candidate for inclusion in QTL workflows in the current and immediate future incarnations.

What you are describing are first-class immutable attributes of events, and those are not currently supported. What you CAN do is have an immutable attribute of the event be product_name_id and have that be stamped with the event; QTL can then look up the human-friendly name.
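
A minimal sketch of that workaround, again in plain Python rather than Druid itself. The `product_name_id` values (`n1`, `n2`), the `name_lookup` map, and the `sales_by_product_name` helper are hypothetical and only illustrate the idea: the event carries an immutable name id, and the lookup turns that id into a human-friendly name at query time. Falling back to the raw id when the lookup has no entry is an assumption made for this example.

```python
from collections import defaultdict

# Each event is stamped with the id of the name that applied when it occurred.
events = [
    {"id": "X", "ts": 2013, "product_name_id": "n1", "sales": 123},
    {"id": "X", "ts": 2014, "product_name_id": "n1", "sales": 234},
    {"id": "X", "ts": 2015, "product_name_id": "n2", "sales": 345},
]

# Query-time lookup from the immutable name id to a human-friendly name.
name_lookup = {"n1": "Rubber Duck", "n2": "Rubber Ducky"}


def sales_by_product_name(rows, lookup):
    """Group sales by the looked-up name of each row's product_name_id."""
    totals = defaultdict(int)
    for row in rows:
        name_id = row["product_name_id"]
        # Assumption: keep the raw id when the lookup has no entry for it.
        totals[lookup.get(name_id, name_id)] += row["sales"]
    return dict(totals)


print(sales_by_product_name(events, name_lookup))
# {'Rubber Duck': 357, 'Rubber Ducky': 345}
```

Because the id is fixed on the event, no time-aware join is needed at query time; only the display name behind each id is resolved by the lookup.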

As always, such items are open to discussion.

@stale

stale bot commented Jun 21, 2019

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Jun 21, 2019
@stale

stale bot commented Jul 5, 2019

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

@stale stale bot closed this as completed Jul 5, 2019