Fix distinctness/sort enforcing #1683

max-hoffman · 2023-03-31T19:56:43Z

The original bug: dolthub/dolt#5651 duplicates a RIGHT_SEMI_LOOKUP_JOIN row because we were applying DISTINCT to the full right-side row rather than to only the join attributes.
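A simplified sketch of the failure mode follows. It assumes a toy `Row` type with an integer join key and a payload column; these names are hypothetical and this is not go-mysql-server's actual join iterator. A right semi lookup join walks the right side, deduplicates it, and looks each surviving row up in a left-side index. If it deduplicates on the full right row, two right rows that share a join key but differ elsewhere both trigger the same lookup, so the matching left row is emitted twice.

```go
package main

import "fmt"

// Row is a hypothetical row type: a join key plus a payload column.
type Row struct {
	Key     int
	Payload string
}

// rightSemiLookupJoin returns the left rows that have at least one match on
// the right. It iterates the right side, skips duplicates, and looks each
// remaining key up in leftIdx, which maps a key to the left rows with that key.
// dedupOnFullRow selects the buggy (true) or corrected (false) behavior.
func rightSemiLookupJoin(leftIdx map[int][]Row, right []Row, dedupOnFullRow bool) []Row {
	seenKeys := map[int]bool{}
	seenRows := map[Row]bool{}
	var out []Row
	for _, r := range right {
		if dedupOnFullRow {
			// Buggy variant: right rows with the same key but different
			// payloads are all "distinct", so each one repeats the lookup.
			if seenRows[r] {
				continue
			}
			seenRows[r] = true
		} else {
			// Corrected variant: deduplicate on the join attribute only.
			if seenKeys[r.Key] {
				continue
			}
			seenKeys[r.Key] = true
		}
		out = append(out, leftIdx[r.Key]...)
	}
	return out
}

func main() {
	left := map[int][]Row{1: {{Key: 1, Payload: "ab-1"}}}
	right := []Row{{Key: 1, Payload: "xy-a"}, {Key: 1, Payload: "xy-b"}}
	fmt.Println(len(rightSemiLookupJoin(left, right, true)))  // 2: left row duplicated
	fmt.Println(len(rightSemiLookupJoin(left, right, false))) // 1
}
```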

This PR adds some more tests around ordering and sort enforcing in the memo.

The overview is that DISTINCT is weird because it sits somewhere between a property of a relational expression and a property of a relational group. It is an enforcer: we can implement it as an ORDERED_DISTINCT, or drop it altogether, when child nodes provide supportive sort orders. We could imagine bifurcating the memo, partitioning each expression group into buckets by sort order, and costing each expression by the cardinality of its children plus any sort enforcers it conditionally requires. More work is needed to understand how PG and CRDB handle this in general.

zachmu

Mostly looks good; just some comments on naming.

zachmu · 2023-04-03T22:57:45Z

enginetest/join_op_tests.go

@@ -134,14 +134,14 @@ var joinCostTests = []struct {
 			// queries that test subquery hoisting
 			{
 				// case 1: condition uses columns from both sides
-				Query: "/*case1*/ select * from ab where exists (select * from xy where ab.a = xy.x + 3)",
+				Query: "/*+case1*/ select * from ab where exists (select * from xy where ab.a = xy.x + 3)",


What's the point of adding the + to these comments?

`/*` breaks Go's test regex matching; `/*+` doesn't.

zachmu · 2023-04-03T23:12:55Z

sql/analyzer/memo.go

+			}
+			relCost += dCost
+		} else {
+			n.setDistinct(noDistinctOp)


Is this a no-op?

Currently, no. I was flip-flopping on whether the zero value should mean "unknown" or "no distinct".
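One common Go convention for this choice is sketched below with mostly hypothetical names: only `noDistinctOp` and `setDistinct` appear in the diff, and `relBase` is reduced here to a single field. Making the zero value of the enum mean "unknown" ensures an uninitialized field can never be mistaken for a deliberate "no distinct" decision. Under this ordering, `setDistinct(noDistinctOp)` is not a no-op.

```go
package main

import "fmt"

// distinctOp records how (or whether) a relation enforces DISTINCT.
// Only noDistinctOp appears in the PR; the other constants are hypothetical.
type distinctOp uint8

const (
	// unknownDistinctOp is the zero value, so a freshly constructed struct
	// reports "not yet decided" instead of silently claiming "no distinct".
	unknownDistinctOp distinctOp = iota
	noDistinctOp
	hashDistinctOp
	orderedDistinctOp
)

// relBase is a stripped-down stand-in for the memo's relBase struct.
type relBase struct {
	distinct distinctOp
}

func (r *relBase) setDistinct(d distinctOp) { r.distinct = d }

func main() {
	var r relBase
	fmt.Println(r.distinct == unknownDistinctOp) // true
	r.setDistinct(noDistinctOp)
	fmt.Println(r.distinct == noDistinctOp) // true
}
```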

zachmu · 2023-04-03T23:15:07Z

sql/analyzer/memo.go

@@ -548,6 +654,8 @@ type relBase struct {
 	c float64
 	// cnt is this relations output row count
 	cnt float64
+	// distinct indicates a relExpr should be checked for sort enforcement


You mean distinctness?

zachmu · 2023-04-03T23:16:53Z

sql/analyzer/memo.go

+	case *mergeJoin:
+		var ret sql.Schema
+		for _, e := range r.innerScan.idx.Expressions() {
+			parts := strings.Split(e, ".")


This is fraught: column names can in fact include a `.`.

Not much you can do about it right now, but add a note.
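A sketch of the hazard and one partial mitigation, assuming index expressions are qualified strings like `table.column` as in the diff (the helper names are hypothetical). `strings.Split` produces three parts for a column literally named `my.col`. Splitting only on the first dot with `strings.Cut` (Go 1.18+) preserves such column names, but it still assumes the table name contains no dot, so it is a stopgap rather than a fix. Carrying table and column as separate fields would avoid string parsing entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// splitQualifiedNaive mirrors the strings.Split approach: it returns every
// dot-separated part, so a column named "my.col" is split in two.
func splitQualifiedNaive(expr string) []string {
	return strings.Split(expr, ".")
}

// splitQualified splits only on the first dot. It assumes the table name
// never contains a dot; the column name may.
func splitQualified(expr string) (table, column string, ok bool) {
	return strings.Cut(expr, ".")
}

func main() {
	fmt.Println(len(splitQualifiedNaive("xy.my.col"))) // 3
	t, c, ok := splitQualified("xy.my.col")
	fmt.Println(t, c, ok) // xy my.col true
}
```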

zachmu · 2023-04-03T23:18:32Z

sql/analyzer/memo.go

+// sortedOnDistinct returns true if a relation's inputs are sorted on the
+// full output schema. The OrderedDistinct operator can be used in this
+// case.
+func sortedOnDistinct(rel relExpr) bool {


Should be `sortedInputs`, no? None of this logic has anything to do with distinct.
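Whatever the function ends up being called, the property it needs to establish can be sketched independently of the memo. The following is a sufficient (not necessary) condition, assuming columns are identified by plain names and using a hypothetical function name: if rows are sorted lexicographically by `sortCols`, rows that agree on `outCols` are contiguous whenever the first `len(outCols)` sort columns are exactly the output columns, in any order.

```go
package main

import "fmt"

// sortCoversOutput reports whether rows sorted by sortCols are guaranteed to
// place duplicate output rows next to each other. Sufficient condition: the
// first len(outCols) sort columns are exactly the output columns, in any order.
func sortCoversOutput(sortCols, outCols []string) bool {
	if len(sortCols) < len(outCols) {
		// Rows that tie on every sort column can still differ on an
		// unsorted output column, so duplicates may be interleaved.
		return false
	}
	prefix := make(map[string]bool, len(outCols))
	for _, c := range sortCols[:len(outCols)] {
		prefix[c] = true
	}
	for _, c := range outCols {
		if !prefix[c] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(sortCoversOutput([]string{"b", "a", "c"}, []string{"a", "b"})) // true
	fmt.Println(sortCoversOutput([]string{"a", "b", "c"}, []string{"a", "c"})) // false
	fmt.Println(sortCoversOutput([]string{"a"}, []string{"a", "b"}))           // false
}
```

The second case is the subtle one: sorting by `(a, b, c)` does not make duplicates of `(a, c)` adjacent, because rows with different `b` values can sit between them.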

zachmu · 2023-04-03T23:20:05Z

sql/analyzer/memo.go

+
+func sortedColsForRel(rel relExpr) sql.Schema {
+	switch r := rel.(type) {
+	case *tableScan:


Is this check sufficient? We don't actually enforce that tables return their results in PK order, right? I think we do have an index sub-interface that supports declaring that, maybe OrderedIndex?

We could maybe add an attribute to the PK schema, or an OrderedTable interface, to indicate sorting. We have a lot of tests that would break if Dolt PKs suddenly stopped being ordered. In practice, all kinds of crazy corruption happens when entries in the primary index get out of order.
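A minimal sketch of the opt-in approach follows, using a hypothetical `OrderedTable` interface and toy table types (none of these are existing go-mysql-server APIs). The planner treats a table scan as sorted on the primary key only when the table explicitly declares it, rather than assuming it for every scan.

```go
package main

import "fmt"

// Table is a stand-in for a storage table; only the name matters here.
type Table interface {
	Name() string
}

// OrderedTable is a hypothetical optional interface a table implements to
// promise that a full scan returns rows in primary-key order.
type OrderedTable interface {
	Table
	ReturnsRowsInPKOrder() bool
}

// heapTable makes no ordering promise.
type heapTable struct{ name string }

func (t heapTable) Name() string { return t.name }

// pkOrderedTable opts in to the ordering guarantee.
type pkOrderedTable struct{ name string }

func (t pkOrderedTable) Name() string { return t.name }

func (t pkOrderedTable) ReturnsRowsInPKOrder() bool { return true }

// scanIsSortedOnPK relies on PK order only when the table opts in via a
// type assertion, instead of assuming every table scan is sorted.
func scanIsSortedOnPK(t Table) bool {
	ot, ok := t.(OrderedTable)
	return ok && ot.ReturnsRowsInPKOrder()
}

func main() {
	fmt.Println(scanIsSortedOnPK(heapTable{"ab"}))      // false
	fmt.Println(scanIsSortedOnPK(pkOrderedTable{"xy"})) // true
}
```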

max-hoffman added 7 commits March 29, 2023 08:51

Fix partial join hints · c457064
merge main · dab5253
Add table aliases · 94dbeed
more join hints · 976a2a3
Fixup distinctness/sort enforcing · fc54459
edits · bbe3824
edits · 9d5ee9c

max-hoffman requested a review from zachmu March 31, 2023 20:31

max-hoffman assigned zachmu Mar 31, 2023

zachmu approved these changes Apr 3, 2023


Base automatically changed from max/anti-join-hints to main April 4, 2023 01:10

merge main · 278e5cc

max-hoffman merged commit 3b15f9f into main Apr 4, 2023

max-hoffman deleted the max/right-semi-join-distinct branch April 4, 2023 15:42
