
Test for deadlock #67

Closed
Stolb27 wants to merge 3 commits into deadlock_recover_fts from ADBDEV-785

Conversation

@Stolb27
Collaborator

@Stolb27 Stolb27 commented Apr 24, 2020

@Stolb27 Stolb27 requested review from darthunix and leskin-in April 24, 2020 00:46
@Stolb27 Stolb27 requested a review from darthunix April 24, 2020 02:22
@Stolb27
Collaborator Author

Stolb27 commented Apr 29, 2020

Request for test cancelled.

@Stolb27 Stolb27 closed this Apr 29, 2020
RekGRpth pushed a commit that referenced this pull request Dec 3, 2025
Fix Orca cost model to prefer hashing smaller tables

Previously, Orca could produce bad hash join plans that hashed a much bigger
table. This happened because Orca's cost model charges a cost per column used
in the join conditions, and this cost was smaller when tuples were hashed than
when they were fed from the outer child. That doesn't make sense: given enough
join conditions it could make Orca hash the bigger table, no matter how much
bigger that table is.

To make sure this never happens, increase the cost per join column for the
inner child so that it is bigger than for the outer child (the same
relationship the per-byte cost already has).

Additionally, Orca increased the cost per join column for the outer child when
spilling was predicted, which doesn't make sense either, since spilling adds
no extra hashing. The Postgres planner only imposes an additional per-byte (or
rather per-page) cost when a hash join spills, so Orca should use the same
per-join-column cost in both the spilling and non-spilling cases.
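
To make the intended relationship concrete, here is a minimal sketch with
made-up constants and names (PER_BYTE_*, PER_COL_*, child_cost and join_cost
are all hypothetical, not Orca's actual cost model or calibration values). It
only demonstrates the two properties described above: the per-join-column cost
is higher for the inner (hashed) child than for the outer child, and spilling
adds only a per-byte penalty:

```
# Sketch only: hypothetical constants and names, not Orca's real cost model.
# Demonstrates the relationship the patch establishes: both the per-byte and
# the per-join-column cost are higher for the inner (hashed) child than for
# the outer (probing) child, and spilling only adds a per-byte penalty.

PER_BYTE_OUTER, PER_BYTE_INNER = 1.0, 2.0
PER_COL_OUTER, PER_COL_INNER = 0.5, 1.0   # per-join-column: inner > outer
PER_BYTE_SPILL = 0.5                      # extra cost only when spilling


def child_cost(rows, width, join_cols, inner, spills=False):
    per_byte = PER_BYTE_INNER if inner else PER_BYTE_OUTER
    per_col = PER_COL_INNER if inner else PER_COL_OUTER
    cost = rows * (width * per_byte + join_cols * per_col)
    if spills:
        # Spilling keeps the same per-join-column cost and only adds a
        # per-byte (per-page style) penalty, as the Postgres planner does.
        cost += rows * width * PER_BYTE_SPILL
    return cost


def join_cost(outer_rows, inner_rows, width, join_cols):
    return (child_cost(inner_rows, width, join_cols, inner=True)
            + child_cost(outer_rows, width, join_cols, inner=False))


# Hashing the smaller side is cheaper no matter how many join columns exist.
for join_cols in (1, 5, 50):
    hash_small = join_cost(1_000_000, 1_000, 24, join_cols)
    hash_big = join_cost(1_000, 1_000_000, 24, join_cols)
    assert hash_small < hash_big
```

With the old relationship (a smaller per-join-column cost on the inner side),
a large enough number of join conditions could flip this comparison and make
hashing the bigger input look cheaper, which is exactly the bad plan shape
described above.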

A lot of tests are affected by this change, but for most of them only the
costs change. For some, hash joins are reordered, swapping the inner and outer
children, since Orca previously hashed the bigger child in some cases. In the
case of LOJNullRejectingZeroPlacePredicates.mdp this actually restores the old
plan specified in the comment. Also add a new regress test.

One common change in some tests is replacing a Hash Semi Join with a regular
Hash Join + Sort + GroupAggregate. There is only a Left Semi Join, so swapping
the inner and outer children is impossible for semi joins. With the new costs
it is slightly cheaper to convert the Hash Semi Join into a regular Hash Join
so that the children can be swapped. The opposite conversion also takes place
where a GroupAggregate was previously used.
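
For background on why this conversion is legal at all, here is a plain-Python
illustration (semi_join, join_then_dedup and the sample rows are hypothetical,
not optimizer code): a semi join emits each outer row that has at least one
match, which is the same result as a regular join followed by deduplication of
the outer rows, so the optimizer is free to pick the Hash Join +
GroupAggregate shape and then hash whichever side is smaller.

```
# Illustration only (not optimizer code): a semi join is equivalent to a
# regular join followed by deduplication of the outer side, assuming the
# outer rows are distinct.

def semi_join(outer, inner, key):
    inner_keys = {key(r) for r in inner}
    return [o for o in outer if key(o) in inner_keys]


def join_then_dedup(outer, inner, key):
    joined = [o for o in outer for i in inner if key(o) == key(i)]
    # The "Sort + GroupAggregate" step: collapse the duplicates created when
    # an outer row matches several inner rows.
    out, seen = [], set()
    for o in joined:
        if o not in seen:
            seen.add(o)
            out.append(o)
    return out


outer = [("a", 1), ("b", 2), ("c", 3)]
inner = [("x", 1), ("y", 1), ("z", 3)]
key = lambda row: row[1]
assert semi_join(outer, inner, key) == join_then_dedup(outer, inner, key)
```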

Another common change is that HashJoin(table1, Broadcast(table2)) gets
replaced with HashJoin(Redistribute(table1), Redistribute(table2)), adding
another slice. This happens because the cost of hashing is now slightly
bigger, so Orca prefers to split the hashing of table2 across all segments
instead of having every segment hash all of its rows, as happens with
Broadcast.
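
A rough back-of-the-envelope comparison of the hash-build work in the two
shapes (made-up numbers, not Orca's formulas):

```
# Made-up numbers, not Orca's cost formulas. With Broadcast every segment
# receives and hashes all rows of table2; with Redistribute the rows of
# table2 are split across segments, so the hash-build work is done once in
# total, at the price of moving table1 through an extra Motion (slice).

segments = 16
table2_rows = 1_000_000

hashed_rows_broadcast = table2_rows * segments  # every segment hashes everything
hashed_rows_redistribute = table2_rows          # each segment hashes ~1/16th

print(hashed_rows_broadcast, hashed_rows_redistribute)  # 16000000 1000000
```

Once the per-row hashing cost grows, the saved hash-build work outweighs the
extra Redistribute Motion on table1, so Orca flips to the second shape.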

Below are some notable changes in minidump files:

- ExtractPredicateFromDisjWithComputedColumns.mdp

This patch changes the join order from ((cust, sales), datedim) to ((datedim,
sales), cust). All three tables are identical from Orca's point of view: they
are all empty and all table scans are 24 bytes wide, so there is no reason for
Orca to prefer one join order over another, since they all have the same cost.

- HAWQ-TPCH-Stat-Derivation.mdp

The only change is that the children of the 3rd Hash Join in the plan are
swapped, the one joining lineitem_ao_column_none_level0 with
HashJoin(partsupp_ao_column_none_level0, part_ao_column_none_level0).
lineitem_ao_column_none_level0 is predicted to have approximately 22 billion
rows and the hash join approximately 10 billion rows, so making the hash join
the inner child is the right choice here, since the smaller relation is
hashed.

- Nested-Setops-2.mdp

Same here. dept and emp were swapped in two different places. dept contains
1 row and emp contains 10001 rows, so it is better if dept is hashed. A
Redistribute Motion was also replaced with a Broadcast Motion in both cases.

- TPCH-Q5.mdp

Probably the best improvement out of these plans. The previous plan had this
join order:

```
-> Hash Join (6,000,000 rows)
   -> Hash Join (300,000,000 rows)
      -> lineitem (1,500,000,000 rows)
      -> Hash Join (500,000 rows)
         -> supplier (2,500,000 rows)
         -> Hash Join (5 rows)
            -> nation (25 rows)
            -> region (1 row)
   -> Hash Join (100,000,000 rows)
      -> customer (40,000,000 rows)
      -> orders (100,000,000 rows)
```

which hashes 100 million rows twice (first orders, then its hash join with
customer). The new plan has no such issue:

```
-> Hash Join (6,000,000 rows)
   -> Hash Join (170,000,000 rows)
      -> lineitem (1,500,000,000 rows)
      -> Hash Join (20,000,000 rows)
         -> orders (100,000,000 rows)
         -> Hash Join (7,000,000 rows)
            -> customer (40,000,000 rows)
            -> Hash Join (5 rows)
               -> nation (25 rows)
               -> region (1 row)
   -> supplier (2,500,000 rows)
```

This plan hashes only around 30 million rows in total, much better than 200
million.
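
The rough totals can be tallied directly from the two trees above, assuming
the second-listed child of each Hash Join is the one that gets hashed:

```
# Rows fed into hash tables (the inner child of each Hash Join), taken from
# the row estimates in the two plans above.

old_plan_hashed = [
    100_000_000,  # HashJoin(customer, orders), built for the top join
    100_000_000,  # orders, built for the join with customer
    500_000,      # HashJoin(supplier, HashJoin(nation, region))
    5,            # HashJoin(nation, region)
    1,            # region
]
new_plan_hashed = [
    2_500_000,    # supplier, built for the top join
    20_000_000,   # HashJoin(orders, ...), built for the join with lineitem
    7_000_000,    # HashJoin(customer, ...), built for the join with orders
    5,            # HashJoin(nation, region)
    1,            # region
]

print(sum(old_plan_hashed))  # 200500006, roughly 200 million
print(sum(new_plan_hashed))  # 29500006, roughly 30 million
```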

Ticket: ADBDEV-8413
@Stolb27 Stolb27 deleted the ADBDEV-785 branch February 12, 2026 09:08