Closed

Stolb27 commented Apr 24, 2020
darthunix reviewed Apr 24, 2020
darthunix reviewed Apr 24, 2020
darthunix approved these changes Apr 27, 2020

Author (Collaborator) commented:
Request for test cancelled.
RekGRpth pushed a commit that referenced this pull request on Dec 3, 2025:
Fix Orca cost model to prefer hashing smaller tables
Previously it was possible for Orca to produce bad hash join plans that hashed the much bigger table. This happened because Orca's cost model charges a cost for the columns used in the join conditions, and this cost was smaller when tuples were hashed (the inner child) than when tuples were fed from the outer child. This doesn't really make sense, since with enough join conditions it could make Orca hash the bigger table, no matter how much bigger that table is. To make sure this never happens, increase the cost per join column for the inner child so that it is bigger than for the outer child (just as the per-byte cost already is).
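As a rough illustration of the problem, here is a toy sketch with made-up constants (not Orca's actual cost model): when the per-join-column cost is lower on the inner child than on the outer child, that term can dominate the per-byte cost once there are enough join conditions and flip the choice towards hashing the bigger table.
```python
# Toy cost sketch, NOT Orca's actual cost model; every constant here is made up.
# Cost of one side of a hash join: rows * (per_byte * width + per_col * join_cols).
def side_cost(rows, width, join_cols, per_byte, per_col):
    return rows * (per_byte * width + per_col * join_cols)

def plan_cost(outer, inner, join_cols, outer_per_col, inner_per_col):
    # The per-byte cost is already higher for the inner (hashed) child (2.0 vs 1.0),
    # which on its own steers the model towards hashing the smaller table.
    return (side_cost(*outer, join_cols, per_byte=1.0, per_col=outer_per_col)
            + side_cost(*inner, join_cols, per_byte=2.0, per_col=inner_per_col))

big, small = (1_000_000, 8), (100_000, 8)  # hypothetical (rows, width) pairs

# Old behaviour: the per-join-column cost is *lower* on the inner side (1.0 < 5.0).
# With 1 join column hashing the small table still wins, but with 10 join columns
# the per-column term dominates and hashing the big table looks cheaper.
for join_cols in (1, 10):
    hash_small = plan_cost(big, small, join_cols, outer_per_col=5.0, inner_per_col=1.0)
    hash_big   = plan_cost(small, big, join_cols, outer_per_col=5.0, inner_per_col=1.0)
    print(f"{join_cols} join col(s): hash small = {hash_small:.0f}, hash big = {hash_big:.0f}")

# The fix: make the inner per-join-column cost larger than the outer one (6.0 > 5.0),
# so that (for equal widths) the bigger table is never the cheaper side to hash.
fixed_small = plan_cost(big, small, 10, outer_per_col=5.0, inner_per_col=6.0)
fixed_big   = plan_cost(small, big, 10, outer_per_col=5.0, inner_per_col=6.0)
print(f"fixed, 10 join cols: hash small = {fixed_small:.0f}, hash big = {fixed_big:.0f}")
```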
Additionally, Orca increased the cost per join column for the outer child when spilling was predicted, which doesn't make sense either, since spilling does not add any extra hashing. The Postgres planner only imposes an additional per-byte (or rather per-page) cost when a hash join spills, so Orca should use the same per-join-column cost for both the spilling and non-spilling cases.
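A similarly hedged sketch of the spilling argument (again made-up constants, not the real Orca or Postgres formulas): spilling should only add a per-byte/per-page I/O term, because each tuple is hashed the same number of times whether or not the hash table spills.
```python
# Toy sketch with made-up constants, not the real Orca or Postgres cost functions.
def child_cost(rows, width, join_cols, spills,
               per_byte=2.0, per_col=6.0, spill_io_per_byte=4.0):
    cost = rows * (per_byte * width + per_col * join_cols)
    if spills:
        # Spilling writes tuples to disk and reads them back, so it should only
        # add a per-byte (per-page) I/O term; each tuple is still hashed exactly
        # once, so the per-join-column term stays the same.
        cost += rows * width * spill_io_per_byte
    return cost

print(child_cost(1_000_000, 8, 3, spills=False))  # 34,000,000
print(child_cost(1_000_000, 8, 3, spills=True))   # 66,000,000: only the I/O part grew
```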
A lot of tests are affected by this change, but for most of them only the costs change. For some, hash joins are reordered, swapping the inner and outer children, since Orca previously hashed the bigger child in some cases. In the case of LOJNullRejectingZeroPlacePredicates.mdp this actually restores the old plan specified in the comment. A new regress test is also added.
One common change in some tests is replacing a Hash Semi Join with a regular Hash Join plus Sort + GroupAggregate. There is only a Left Semi Join, so swapping the inner and outer children is impossible for semi joins. This means it is slightly cheaper in some cases to convert a Hash Semi Join into a regular Hash Join in order to be able to swap the children. The opposite conversion also takes place where a GroupAggregate was previously used.
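A toy illustration of why the conversion can pay off (hypothetical row counts and per-row costs, not Orca's actual numbers): if the semi join would be forced to hash the much bigger input, hashing the smaller input in a regular Hash Join and de-duplicating afterwards can still be cheaper.
```python
# Toy comparison with made-up constants, not Orca's costing. If a semi join is
# forced to hash its (bigger) inner input, a regular hash join that hashes the
# smaller input plus a de-duplication step (standing in for Sort + GroupAggregate)
# can come out cheaper overall.
HASH_COST_PER_ROW  = 3.0   # hypothetical cost to build one hash-table entry
DEDUP_COST_PER_ROW = 1.0   # hypothetical cost to sort/aggregate one joined row

big_rows, small_rows, joined_rows = 10_000_000, 50_000, 60_000

semi_join_cost  = HASH_COST_PER_ROW * big_rows                # must hash the big child
join_plus_dedup = HASH_COST_PER_ROW * small_rows + DEDUP_COST_PER_ROW * joined_rows
print(semi_join_cost, join_plus_dedup)  # ~30,000,000 vs ~210,000: the conversion wins here
```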
Another common change is that HashJoin(table1, Broadcast(table2)) gets replaced with HashJoin(Redistribute(table1), Redistribute(table2)), adding another slice. This happens because the cost of hashing is now slightly higher, so Orca prefers to split the hashing of table2 across all segments instead of having every segment hash all the rows, as it would with Broadcast.
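A back-of-the-envelope comparison of the total hashing volume under the two distribution choices (the segment count and row count below are hypothetical):
```python
# Back-of-the-envelope hashing volume, not Orca's actual costing; the segment
# count and row count are made up.
segments    = 16
table2_rows = 5_000_000   # hypothetical inner (hashed) table

broadcast_hashed    = table2_rows * segments  # every segment builds a full hash table
redistribute_hashed = table2_rows             # rows are split, each hashed on one segment
print(broadcast_hashed, redistribute_hashed)  # 80,000,000 vs 5,000,000
# The price of Redistribute is an extra Motion (another slice) for table1, but with
# a slightly higher per-row hashing cost the split build wins more often.
```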
Below are some notable changes in minidump files:
- ExtractPredicateFromDisjWithComputedColumns.mdp
This patch changes the join order from ((cust, sales), datedim) to ((datedim, sales), cust). All three tables are identical from Orca's point of view: they are all empty and all table scans are 24 bytes wide, so there is no reason for Orca to prefer one join order over another, since they all have the same cost.
- HAWQ-TPCH-Stat-Derivation.mdp
The only change in the plan is swapping the children of the 3rd Hash Join, the one joining lineitem_ao_column_none_level0 with HashJoin(partsupp_ao_column_none_level0, part_ao_column_none_level0).
lineitem_ao_column_none_level0 is predicted to have approximately 22 billion
rows and the hash join is predicted to have approximately 10 billion rows, so
making the hash join the inner child is good in this case, since the smaller
relation is hashed.
- Nested-Setops-2.mdp
Same here. Two swaps were performed between dept and emp in two different places. dept contains 1 row and emp contains 10001 rows, so it's better if dept is hashed. A Redistribute Motion was also replaced with a Broadcast Motion in both cases.
- TPCH-Q5.mdp
Probably the best improvement out of these plans. The previous plan had this
join order:
```
-> Hash Join (6,000,000 rows)
-> Hash Join (300,000,000 rows)
-> lineitem (1,500,000,000 rows)
-> Hash Join (500,000 rows)
-> supplier (2,500,000 rows)
-> Hash Join (5 rows)
-> nation (25 rows)
-> region (1 row)
-> Hash Join (100,000,000 rows)
-> customer (40,000,000 rows)
-> orders (100,000,000 rows)
```
which hashes 100 million rows twice (first orders, then its hash join with customer). The new plan has no such issue:
```
-> Hash Join (6,000,000 rows)
-> Hash Join (170,000,000 rows)
-> lineitem (1,500,000,000 rows)
-> Hash Join (20,000,000 rows)
-> orders (100,000,000 rows)
-> Hash Join (7,000,000 rows)
-> customer (40,000,000 rows)
-> Hash Join (5 rows)
-> nation (25 rows)
-> region (1 row)
-> supplier (2,500,000 rows)
```
This plan only hashes around 30 million rows in total, much better than 200
million.
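For reference, a quick tally of the row estimates of every hashed (inner) child in the two plans above reproduces those totals (a throwaway check, not part of the patch):
```python
# Summing the row estimates of every hashed (inner) child in the two plans above.
old_plan_hashed = [500_000, 5, 1, 100_000_000, 100_000_000]
#                  supplier subtree, nation/region join, region, customer/orders join, orders
new_plan_hashed = [20_000_000, 7_000_000, 5, 1, 2_500_000]
#                  orders subtree, customer subtree, nation/region join, region, supplier
print(sum(old_plan_hashed))  # 200,500,006 -- roughly 200 million rows hashed
print(sum(new_plan_hashed))  # 29,500,006  -- roughly 30 million rows hashed
```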
Ticket: ADBDEV-8413
According to https://github.com/greenplum-db/gpdb/pull/9985#issuecomment-618426312