
Move index costing into join planning phase #2191

Merged: 7 commits, Dec 14, 2023

@max-hoffman (Contributor) commented Dec 7, 2023

Put index costing inside join planning, so that in the future join planning will have better cardinalities (statistics) for join ordering. Most of the changes are a refactor of how we express index lookups in the memo. I tried to make as few changes as possible to join planning; the goal is to set me up for rewriting cardinality checks with stats objects. It didn't go as cleanly as I wanted: I ended up shifting a lot of join plans back to lookup plans because HASH_JOIN was beating LOOKUP_JOIN in several key places.
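As a rough illustration of why costing index scans inside the memo matters, here is a minimal sketch assuming a simplified memo in which a group holds interchangeable physical alternatives (a full table scan and an index scan) and optimization keeps the cheapest one as `best`. The type names, cost factors, and cost formula are hypothetical and are not the project's memo types; the point is only that once the index scan is an alternative inside the group, its cost is visible during planning instead of being decided afterwards.

```go
package main

import "fmt"

// relExpr is a hypothetical stand-in for one physical alternative in a
// memo group, such as a full table scan or an index scan.
type relExpr struct {
	name string
	cost float64
}

// exprGroup is a hypothetical memo group: every alternative produces the
// same logical result, and costing keeps the cheapest one as best.
type exprGroup struct {
	alternatives []relExpr
	best         *relExpr
}

// optimize costs every alternative and records the cheapest one.
func (g *exprGroup) optimize() {
	g.best = nil
	for i := range g.alternatives {
		if g.best == nil || g.alternatives[i].cost < g.best.cost {
			g.best = &g.alternatives[i]
		}
	}
}

// scanCosts returns illustrative costs for a full scan and an index scan,
// given a table row count and the fraction of rows the index filter keeps.
func scanCosts(rows, selectivity float64) (tableScan, indexScan float64) {
	const seqIO, randIO = 1.0, 4.0 // hypothetical per-row cost factors
	return rows * seqIO, rows * selectivity * randIO
}

func main() {
	tsCost, idxCost := scanCosts(10000, 0.01)
	g := &exprGroup{alternatives: []relExpr{{"TableScan", tsCost}, {"IndexScan", idxCost}}}
	g.optimize()
	fmt.Println(g.best.name)
}
```

With a selective filter the index scan wins; with a filter that keeps half the table, the same comparison picks the table scan.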

One downside of the current PR is that it converts a sysbench MERGE_JOIN into a LOOKUP_JOIN. I would prefer fixing this in the next PR when I do a bigger costing overhaul.

Also includes a variety of fixes for join hinting, correctness, etc.

At some point we also appear to have fixed this:
#1893

@max-hoffman max-hoffman marked this pull request as ready for review December 8, 2023 22:51
@max-hoffman max-hoffman changed the title Index cost refactor Move index costing into join planning phase Dec 8, 2023
@nicktobey (Contributor) left a comment


It looks like memo.Lookup and memo.BuildLookup are now unused and can be deleted.

return grp
}

func (m *Memo) MemoizeConcatLookupJoin(grp, left, right *ExprGroup, op plan.JoinType, filter []sql.Expression, lookups []*IndexScan) *ExprGroup {

I'm not clear on what a "Concat Lookup Join" is and why this isn't just "MemoizeLookupJoin". I suspect it's a single node that does multiple lookups on multiple indexes, but I'm not 100% sure. Can you leave a docstring?
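One reading of a concat lookup join, matching the reviewer's guess, is a single operator that serves a join filter such as `r.x = l.a OR r.y = l.b` by probing one index per disjunct and concatenating the matches. The sketch below assumes that reading; the types, the index layout, and the choice to deduplicate rows found by more than one lookup are illustrative and are not taken from the PR.

```go
package main

import "fmt"

// row is a hypothetical right-side row identified by its primary key.
type row struct {
	pk   int
	x, y int
}

// index is a hypothetical secondary index: key value -> matching rows.
type index map[int][]row

// concatLookup probes each index with its own key and concatenates the
// matches. In this sketch a row matched by more than one lookup is emitted
// only once, so an OR filter does not produce duplicate join rows.
func concatLookup(lookups []index, keys []int) []row {
	seen := map[int]bool{}
	var out []row
	for i, idx := range lookups {
		for _, r := range idx[keys[i]] {
			if !seen[r.pk] {
				seen[r.pk] = true
				out = append(out, r)
			}
		}
	}
	return out
}

func main() {
	rows := []row{{1, 10, 20}, {2, 10, 30}, {3, 40, 20}}
	byX, byY := index{}, index{}
	for _, r := range rows {
		byX[r.x] = append(byX[r.x], r)
		byY[r.y] = append(byY[r.y], r)
	}
	// One left row with a=10, b=20, joined on "r.x = a OR r.y = b".
	matches := concatLookup([]index{byX, byY}, []int{10, 20})
	fmt.Println(len(matches))
}
```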

sql/memo/memo.go Outdated
@@ -213,6 +257,21 @@ func (m *Memo) MemoizeProject(grp, child *ExprGroup, projections []sql.Expressio
return grp
}

func (m *Memo) MemoizeIta(grp *ExprGroup, ita *plan.IndexedTableAccess, alias string, index *Index) *ExprGroup {

What does ITA stand for? Can you add a docstring?

sql/memo/memo.go Outdated
@@ -297,6 +356,7 @@ func (m *Memo) optimizeMemoGroup(grp *ExprGroup) error {
n = n.Next()
}

grp.fixEnforcers()

Why is this call needed? A comment would help.

func (e *ExprGroup) fixEnforcers() {
switch n := e.Best.(type) {
case *MergeJoin:
// todo: no ITA children that aren't the same index as sorting index

These comments are unclear. Can you elaborate?

}
return result, nil
}

// fixEnforcers edits the children of a new best plan to account

This docstring is confusing. Can you maybe give an example?
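An example of the kind the reviewer asks for, under an assumed reading of the `MergeJoin` case and its TODO: a merge join needs each input ordered by that side's merge index, so a child whose best plan is an `IndexedTableAccess` on a different index conflicts with the join. This hypothetical sketch replaces such a child with a plain table scan of the same table; the plan type, the index names, and the replacement rule are assumptions, not the PR's implementation.

```go
package main

import "fmt"

// plan is a hypothetical simplified physical plan node.
type plan struct {
	kind     string // "MergeJoin", "IndexedTableAccess", or "TableScan"
	table    string
	index    string // index read by an IndexedTableAccess, if any
	children []*plan
}

// fixMergeJoinChildren sketches one reading of fixEnforcers for a merge join:
// each input must be read in the order of its merge index, so a child that
// reads through any other index is swapped for a plain table scan of the same
// table. This sketch assumes the join applies its own ordered index scan on top.
func fixMergeJoinChildren(mj *plan, mergeIndexes []string) {
	if mj.kind != "MergeJoin" {
		return
	}
	for i, c := range mj.children {
		if c.kind == "IndexedTableAccess" && c.index != mergeIndexes[i] {
			mj.children[i] = &plan{kind: "TableScan", table: c.table}
		}
	}
}

func main() {
	mj := &plan{kind: "MergeJoin", children: []*plan{
		{kind: "IndexedTableAccess", table: "xy", index: "xy.x"},
		{kind: "IndexedTableAccess", table: "uv", index: "uv.v"},
	}}
	// The join merges on xy.x and uv.u, so only the uv child conflicts.
	fixMergeJoinChildren(mj, []string{"xy.x", "uv.u"})
	fmt.Println(mj.children[0].kind, mj.children[1].kind)
}
```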

}

// Update best to a DFS path to a tablescan
func (e *ExprGroup) fixItaConflict() {

I understand why you may want this to be a separate method from findTableScanPath, but I think it needs a docstring that explains how this fixes ITA conflicts (and maybe what an ITA conflict is).
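The comment above `fixItaConflict` ("Update best to a DFS path to a tablescan") suggests a depth-first search for a plan that avoids indexed table access. A minimal sketch of that search, assuming a simplified memo where each group has alternatives and each alternative has child groups; `group`, `expr`, and `useTableScanPath` are hypothetical names, not the PR's code.

```go
package main

import "fmt"

// group is a hypothetical memo group: interchangeable alternatives plus the
// currently chosen best one.
type group struct {
	alts []*expr
	best *expr
}

// expr is a hypothetical memo expression whose inputs are other groups.
type expr struct {
	name     string
	isITA    bool // reads its table through an IndexedTableAccess
	children []*group
}

// useTableScanPath searches depth-first for an alternative whose whole subtree
// avoids IndexedTableAccess, pointing best at it in every group on the path.
// It reports whether such a path exists. Child groups updated on a branch that
// later fails keep their new best, which is still a valid plan for them.
func useTableScanPath(g *group) bool {
	for _, e := range g.alts {
		if e.isITA {
			continue
		}
		ok := true
		for _, c := range e.children {
			if !useTableScanPath(c) {
				ok = false
				break
			}
		}
		if ok {
			g.best = e
			return true
		}
	}
	return false
}

func main() {
	ita := &expr{name: "IndexedTableAccess(xy.x)", isITA: true}
	scan := &expr{name: "TableScan(xy)"}
	leaf := &group{alts: []*expr{ita, scan}, best: ita}
	filter := &expr{name: "Filter", children: []*group{leaf}}
	root := &group{alts: []*expr{filter}, best: filter}

	found := useTableScanPath(root)
	fmt.Println(found, leaf.best.name)
}
```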

}

// create ranges, lookup, ITA for best indexScan
// TODO pass up FALSE filter information

This TODO is vague.

var retFilters []sql.Expression
if !iat.PreciseMatch() {
// cannot drop any filters
//itaGrp = m.MemoizeIta(nil, ret, aliasName, idx)

Remove this commented out code?
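The `if !iat.PreciseMatch()` branch ("cannot drop any filters") follows a common rule for index lookups, sketched here under assumptions: when the lookup ranges are exactly equivalent to the filters that produced them, those filters can be removed from the plan; otherwise every filter must still run above the lookup. The `filter` type and its `usedByLookup` flag are hypothetical.

```go
package main

import "fmt"

// filter is a hypothetical predicate; usedByLookup marks filters that were
// turned into the index lookup's ranges.
type filter struct {
	expr         string
	usedByLookup bool
}

// remainingFilters returns the filters that still have to be evaluated above
// the index lookup. If the lookup is not a precise match, its ranges may
// return extra rows, so nothing can be dropped; if it is precise, the filters
// that built the ranges are already enforced by the lookup.
func remainingFilters(filters []filter, preciseMatch bool) []filter {
	if !preciseMatch {
		return filters
	}
	var ret []filter
	for _, f := range filters {
		if !f.usedByLookup {
			ret = append(ret, f)
		}
	}
	return ret
}

func main() {
	fs := []filter{{"x = 1", true}, {"y LIKE '%a'", false}}
	fmt.Println(len(remainingFilters(fs, false)), len(remainingFilters(fs, true)))
}
```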

return l*(cpuCostFactor+randIOCostFactor) - r*seqIOCostFactor - l*seqIOCostFactor, nil
}
if l*r*sel < l {
// 1 - (total rows - covered rows / total rows)

Can you make this comment clearer?
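Read literally, the comment `1 - (total rows - covered rows / total rows)` binds the division first. If the intended quantity is the fraction of rows covered, explicit grouping makes the simplification visible (here $T$ is total rows and $C$ is covered rows; both symbols are introduced only for this note):

```latex
1 - \frac{T - C}{T} = \frac{T - (T - C)}{T} = \frac{C}{T}
\qquad\text{whereas the literal reading gives}\qquad
1 - \left(T - \frac{C}{T}\right) = 1 - T + \frac{C}{T}
```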

if isInjectiveLookup(lookup.Index, n.JoinBase, lookup.Table.Expressions(), lookup.Table.NullMask()) {
sel = 0
} else {
sel = lookupJoinSelectivity(lookup) * optimisticJoinSel

We could probably pull the isInjectiveLookup check (and the multiplication by optimisticJoinSel) into lookupJoinSelectivity. This pattern repeats everywhere lookupJoinSelectivity is called.
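A sketch of the suggested refactor, with hypothetical simplified types: the injectivity check and the `optimisticJoinSel` scaling move into `lookupJoinSelectivity`, so each call site becomes a single call. The `lookup` struct and the value of `optimisticJoinSel` are placeholders; in the real code the injective flag would come from calling `isInjectiveLookup` as in the excerpt above.

```go
package main

import "fmt"

// optimisticJoinSel is a hypothetical value; the excerpt does not show the
// real constant.
const optimisticJoinSel = 0.1

// lookup is a hypothetical simplified lookup description.
type lookup struct {
	injective bool    // stands in for the isInjectiveLookup(...) result
	rawSel    float64 // stands in for the per-lookup selectivity estimate
}

// lookupJoinSelectivity owns the whole pattern that callers currently repeat:
// an injective lookup gets selectivity 0, and any other lookup gets its raw
// estimate scaled by optimisticJoinSel.
func lookupJoinSelectivity(l lookup) float64 {
	if l.injective {
		return 0
	}
	return l.rawSel * optimisticJoinSel
}

func main() {
	// A call site shrinks from an if/else to one call.
	sel := lookupJoinSelectivity(lookup{injective: false, rawSel: 0.5})
	fmt.Println(sel)
}
```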

@max-hoffman max-hoffman merged commit 5b03152 into main Dec 14, 2023
7 checks passed
@max-hoffman max-hoffman deleted the max/index-cost-refactor branch December 14, 2023 19:25
2 participants