
Join and subquery support #5664

Open
siddharthteotia opened this issue Jul 7, 2020 · 9 comments

siddharthteotia (Contributor) commented Jul 7, 2020

There are two standard data-modeling approaches in analytical databases.

Star Schema

This is the de-facto standard for modeling data in data warehouses to efficiently run OLAP (analytical / BI) style queries. There is a single fact table (containing measures / numeric data, e.g. sales) surrounded by one or more dimension tables (e.g. product). There is no relationship between the dimension tables themselves; the only join path is between the fact table and a dimension table. In other words, you get a [1 : many] join, since a single record in a dimension table can be associated with multiple records in the fact table.

One key thing to note about modeling data this way is that the dimension tables are denormalized (which again reiterates that there is no relationship between two dimension tables).
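To make the 1:many shape concrete, here is a minimal sketch in Python, using hypothetical `sales` (fact) and `product` (dimension) tables; all table and column names are illustrative, not from Pinot.

```python
# Hypothetical star-schema tables: a "sales" fact table and a
# denormalized "product" dimension table.
sales = [  # fact rows: (product_id, amount)
    (1, 100.0), (1, 250.0), (2, 75.0),
]
product = [  # dimension rows: (product_id, name, category)
    (1, "widget", "tools"), (2, "gizmo", "toys"),
]

# The only join path is fact -> dimension: each dimension row can
# match many fact rows (a 1:many join on product_id).
dim_by_id = {pid: (name, cat) for pid, name, cat in product}
joined = [(dim_by_id[pid][0], dim_by_id[pid][1], amt) for pid, amt in sales]
```

Note that each of the three fact rows survives the join, while dimension rows are reused: exactly the 1:many relationship described above.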

Snowflake Schema

It is similar to the star schema in that there is a fact table and dimension tables. The key difference is that at least some of the dimension tables are normalized (leading to more dimension tables). This also establishes relationships between the dimension tables themselves.

The potential problem with this schema is the normalization itself, which forces joins even for simple queries. Moreover, writing a join query is considerably more complex against a snowflake schema than against a star schema. This is the main reason the star schema is generally the preferred choice in the OLAP world. Managing the schema (i.e. the collection of tables) is also more complicated in this case.
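The extra join hop that normalization introduces can be sketched as follows: the illustrative `product` dimension from the star example is split so the category text lives in its own `category` table (all names hypothetical), and reaching the category name now requires an additional join.

```python
# Snowflake variant of the earlier star-schema sketch: the category text
# is normalized out of "product" into a separate "category" table.
sales = [(1, 100.0), (2, 75.0)]                  # fact: (product_id, amount)
product = [(1, "widget", 10), (2, "gizmo", 20)]  # dim: (product_id, name, category_id)
category = [(10, "tools"), (20, "toys")]         # dim: (category_id, category_name)

# A query that needs the category name now takes two hops:
# fact -> product -> category.
cat_by_id = {cid: cname for cid, cname in category}
prod_by_id = {pid: (name, cat_by_id[cid]) for pid, name, cid in product}
joined = [(prod_by_id[pid][0], prod_by_id[pid][1], amt) for pid, amt in sales]
```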

There are a few other ways to model data as well, but the above two are the standard ones. The modeling step is critical, since it will dictate (to some extent) which kinds of joins we support, the complexity of such queries, and the complexity at the user end to write them.

Standard Distributed Join Techniques

Now, regardless of what we do above, there are a couple of ways to implement joins in distributed query engines like Pinot/Presto/Kusto/Spark, and broadcast join is one of them.

Broadcast join is a common way to execute a standard star-schema join, where we join a large fact table with smaller dimension table(s). The smaller table is broadcast to each server, which then executes a local in-memory join (potentially a hash join, using the dimension table as the build side and the fact table as the probe side). The reducer/aggregator layer does the final processing. In the Spark community, this is commonly referred to as a map-side join.
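The broadcast step can be sketched in a few lines of Python. This is only an illustration of the technique, not Pinot code: `fact_partitions` stands in for fact segments already spread across servers, and the small dimension table is handed (broadcast) to every server's local hash join.

```python
# Minimal broadcast hash join sketch (all names illustrative).
dim = [(1, "widget"), (2, "gizmo")]          # small dimension table
fact_partitions = [                          # fact rows spread over servers
    [(1, 100.0), (2, 75.0)],                 # server 0's local segment
    [(1, 250.0)],                            # server 1's local segment
]

def local_hash_join(fact_segment, broadcast_dim):
    # Build side: hash table over the broadcast dimension table.
    build = dict(broadcast_dim)
    # Probe side: stream the local fact segment through the hash table.
    return [(build[k], v) for k, v in fact_segment if k in build]

# Every server receives the same broadcast copy of `dim`; the
# reducer/aggregator layer then combines the partial results.
partial_results = [local_hash_join(seg, dim) for seg in fact_partitions]
result = [row for part in partial_results for row in part]
```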

Co-located join for partitioned tables

Another way is to do a co-located join for partitioned tables. Let's say we are joining tables T1 and T2 on the join key column K. If both tables are partitioned on the join key with the same uniform partition function, then each node already has all the data locally to execute its side of the join, without any data movement (as is involved in a broadcast join).

There are more known ways (e.g. shuffle) to do a distributed join, with varying degrees of data movement, complexity, etc. However, I feel the modeling approach we adopt, along with any restrictions, should be the first thing to get clarity on, and this will also be determined (to some extent) by what our users currently expect from Pinot in terms of joins. I feel we should start with limited support for star-schema joins (where we restrict the number of dimension tables) as a reasonable starting point. We can also look at dimension-to-dimension joins, but we need to be more careful when modeling the data for such a scenario.

I will start creating a proposal-cum-design document and share it with the community. Meanwhile, we can use this issue for discussion on requirements, etc.

dharakk (Contributor) commented Oct 28, 2020

Design document for a star-schema join involving a fact table and small dimension tables: https://docs.google.com/document/d/1InWmxbRqwcqIakzvoEWHLxtX4XR9H5L01256EbAUHV8/edit?usp=sharing

This design requires the dimension tables to be available on each host, hence the restriction on size. Future work will lift this restriction by implementing a broadcast join operator, which will enable larger, partitioned dimension tables and help solve more generic join use cases.

@siddharthteotia siddharthteotia changed the title Join support Join and subquery support Jul 12, 2021
siddharthteotia (Contributor, Author) commented:

At LinkedIn, we are exploring a combined design for subqueries and distributed joins, since there may be overlap in design/architecture and it is good to consider both together at the design stage.

cc @amrishlal @jackjlli

Once we have some preliminary thoughts / an initial design, we will share it with the community in the next couple of weeks/months to iterate on and collaborate further.

atris (Contributor) commented Jul 13, 2021

Great to see this. I am excited about this feature, but I don't see a strict need for subqueries to be designed in conjunction with joins. IMO subqueries are simpler and have independent use cases (e.g. nested aggregates).

I have been working on a subquery-only proposal that I plan to share today. Please share your input and feedback on it, and let's ensure it's aligned with your overall plan for join support.

siddharthteotia (Contributor, Author) commented:

@atris, subqueries can broadly be divided into two categories:

  • an inner query (and there could be multiple of these) referring to the same table as the outer query
  • an inner query (and there could be multiple of these) referring to different tables, not necessarily the same as the outer query

Since the overall query might touch more than one table and might require us to consider data movement / exchange operators (as in the case of joins), we think it is better to consider both during design. This is why we want to approach them together in the design phase, to prevent a situation where we design and implement one of the two features and later find ourselves retrofitting the other.
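The two categories above can be illustrated with a toy Python evaluation (all table names and values are hypothetical): the same-table case reduces to a scalar computed over the outer table, while the different-table case is effectively a semi-join, which is why it raises the same data-movement questions as a distributed join.

```python
# Toy illustration of the two subquery categories.
orders = [(1, 100), (2, 50), (3, 200)]  # (customer_id, amount)
vip = [(1,), (3,)]                      # a different table of VIP customer ids

# Category 1: inner query over the SAME table as the outer query, e.g.
#   SELECT * FROM orders WHERE amount > (SELECT AVG(amount) FROM orders)
avg_amount = sum(a for _, a in orders) / len(orders)
above_avg = [row for row in orders if row[1] > avg_amount]

# Category 2: inner query over a DIFFERENT table, e.g.
#   SELECT * FROM orders WHERE customer_id IN (SELECT customer_id FROM vip)
# This is effectively a semi-join between the two tables.
vip_ids = {cid for (cid,) in vip}
vip_orders = [row for row in orders if row[0] in vip_ids]
```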

atris (Contributor) commented Jul 13, 2021

If the outer query is being executed on a broker node, on top of the result of the underlying topmost Combine operator, would this still be a problem?

siddharthteotia (Contributor, Author) commented:

I don't completely follow. Can you give an example? Also, are you approaching this from the angle of making subqueries work for a particular scenario?

MrNeocore commented:

Any update on this topic?

mcvsubbu (Contributor) commented Nov 1, 2021

@kishoreg is anyone working on this?

kishoreg (Member) commented Nov 1, 2021

We are thinking about it. We will fix the query-spi layer first so that the implementation is less invasive.
