Skip to content

Join Ranking #18257

@NGA-TRAN

Description

@NGA-TRAN

This task is part of feature #18249 illustrating a join ranking system that prioritizes joins based on their own properties and those of their inputs.

Interesting Properties

Let’s highlight several useful properties that can influence join ranking:

  • Partitioning on filter columns: If the input table is partitioned by the query’s filter column(s), filtering can be performed more efficiently.
  • Partitioning on join columns: Partitioning by join key(s) enables non-overlapping partitions or streams, facilitating parallel join execution.
  • Sorting on join columns: If the input is sorted on join key(s), merge join becomes a viable and efficient option.
  • Sorting on filter columns: Sorting by filter column(s) can accelerate filtering operations.
  • High selectivity (filtered input): When a table is heavily filtered, less data needs to be read and joined, improving performance.
  • Many-to-one (m:1) join relationships: These reduce the risk of output size explosion.
  • Small input size: Smaller inputs typically lead to faster join execution.

Some of these properties may carry more weight than others. The following table illustrates how we might assign relative importance to each.

Image

Ranking the Joins

Using properties, weights and selection criteria in Table 1, let us compute join ranks for Figure 2 in previous section

Image

Figure 2 (repeated): Join Graph Annotated with Join Ranked

Join Rank Calculation for L–O:

  • O is partitioned on the filter column: 1 × 1 = 1
  • L is sorted on the join column: 0.5 × 1 = 0.5
  • O is sorted on the join column: 0.5 × 1 = 0.5
  • O is filtered on some date: 1 × 1 = 1
  • L–O has a many-to-one (m:1) relationship: 1 × 1 = 1

Total join rank for L–O = 1 + 0.5 + 0.5 + 1 + 1 = 4

Using the same approach, we can compute the ranks for the remaining joins.

Re-ranking after each round

After a join is performed, certain input properties may be lost, requiring join ranks to be recomputed. As shown in Figure 12, the join rank between O and C drops to 2 after O is joined with L, whereas it was previously 4.

Image

Figure 12: Level-1 Partial Plan (1)

Join Ranking Summary

As shown, a join ranking system can vary based on:

  • The properties it considers
  • The weights assigned to those properties
  • The criteria used for selecting or prioritizing joins

This suggests we can design a flexible join ranking framework—one that allows customization of properties, weights, and selection logic.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions