-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
This task is part of feature #18249 illustrating a join ranking system that prioritizes joins based on their own properties and those of their inputs.
Interesting Properties
Let’s highlight several useful properties that can influence join ranking:
- Partitioning on filter columns: If the input table is partitioned by the query’s filter column(s), filtering can be performed more efficiently.
- Partitioning on join columns: Partitioning by join key(s) enables non-overlapping partitions or streams, facilitating parallel join execution.
- Sorting on join columns: If the input is sorted on join key(s), merge join becomes a viable and efficient option.
- Sorting on filter columns: Sorting by filter column(s) can accelerate filtering operations.
- High selectivity (filtered input): When a table is heavily filtered, less data needs to be read and joined, improving performance.
- Many-to-one (m:1) join relationships: These reduce the risk of output size explosion.
- Small input size: Smaller inputs typically lead to faster join execution.
Some of these properties may carry more weight than others. The following table illustrates how we might assign relative importance to each.
Ranking the Joins
Using properties, weights and selection criteria in Table 1, let us compute join ranks for Figure 2 in previous section
Figure 2 (repeated): Join Graph Annotated with Join Ranked
Join Rank Calculation for L–O:
- O is partitioned on the filter column: 1 × 1 = 1
- L is sorted on the join column: 0.5 × 1 = 0.5
- O is sorted on the join column: 0.5 × 1 = 0.5
- O is filtered on some date: 1 × 1 = 1
- L–O has a many-to-one (m:1) relationship: 1 × 1 = 1
Total join rank for L–O = 1 + 0.5 + 0.5 + 1 + 1 = 4
Using the same approach, we can compute the ranks for the remaining joins.
Re-ranking after each round
After a join is performed, certain input properties may be lost, requiring join ranks to be recomputed. As shown in Figure 12, the join rank between O and C drops to 2 after O is joined with L, whereas it was previously 4.
Figure 12: Level-1 Partial Plan (1)
Join Ranking Summary
As shown, a join ranking system can vary based on:
- The properties it considers
- The weights assigned to those properties
- The criteria used for selecting or prioritizing joins
This suggests we can design a flexible join ranking framework—one that allows customization of properties, weights, and selection logic.