I have been playing around with various examples and found a lot of cases where the index scoring does not work out well w.r.t. choosing the index with the most restrictive spans.
The scoring is based on how many columns we can constrain, but this is not a very good criterion. If the range on the first column is not restricted to exact values, the constraints on the other columns don't add much; e.g. /1/4/4-/100/4/5 is only marginally better than /1-/100 (both contain the full /2-/99 range), whereas an index that yields just /4/4-/4/5 would be a much better choice, even though it constrains only two columns.
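To make the example concrete, here is a small sketch that compares the "number of constrained columns" criterion against the number of rows each span would actually read. Everything in it is a hypothetical stand-in, not the real planner code: a synthetic table where column a ranges over 1..100 and columns b and c over 1..10, indexes named (a,b,c), (a) and (b,c), both span endpoints treated as inclusive key prefixes, and `naiveScore` as a simplified version of the column-count score.

```go
package main

import "fmt"

// Span is a hypothetical index span: the row positions of the index's
// columns (in index order) plus start and end keys. For this illustration
// both endpoints are treated as inclusive key prefixes.
type Span struct {
	Name       string
	Cols       []int
	Start, End []int
}

// comparePrefix lexicographically compares the first len(key) index columns
// of row against key, returning -1, 0 or 1.
func comparePrefix(row []int, cols, key []int) int {
	for i, k := range key {
		v := row[cols[i]]
		if v < k {
			return -1
		}
		if v > k {
			return 1
		}
	}
	return 0
}

// contains reports whether row falls between the span's start and end keys.
func (s Span) contains(row []int) bool {
	return comparePrefix(row, s.Cols, s.Start) >= 0 &&
		comparePrefix(row, s.Cols, s.End) <= 0
}

// naiveScore is a simplified "number of constrained columns" criterion.
func (s Span) naiveScore() int {
	if len(s.Start) > len(s.End) {
		return len(s.Start)
	}
	return len(s.End)
}

// countRows scans the synthetic table (a in 1..100, b and c in 1..10) and
// counts the rows the span would read.
func countRows(s Span) int {
	n := 0
	for a := 1; a <= 100; a++ {
		for b := 1; b <= 10; b++ {
			for c := 1; c <= 10; c++ {
				if s.contains([]int{a, b, c}) {
					n++
				}
			}
		}
	}
	return n
}

// examples returns the three spans from the discussion above.
func examples() []Span {
	return []Span{
		{"/1/4/4-/100/4/5 on (a,b,c)", []int{0, 1, 2}, []int{1, 4, 4}, []int{100, 4, 5}},
		{"/1-/100 on (a)", []int{0}, []int{1}, []int{100}},
		{"/4/4-/4/5 on (b,c)", []int{1, 2}, []int{4, 4}, []int{4, 5}},
	}
}

func main() {
	for _, s := range examples() {
		fmt.Printf("%-28s columns=%d rows=%d\n", s.Name, s.naiveScore(), countRows(s))
	}
}
```

The column count ranks /1/4/4-/100/4/5 first (3 columns), yet on this synthetic data it reads 9902 of 10000 rows, barely fewer than /1-/100 (10000), while /4/4-/4/5 scores only 2 columns but reads 200 rows.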
I think a better strategy would be to generate the actual spans and score based on them. We can estimate a cost for each span based on the "spread" between the start and end keys, in particular on how many leading columns they "agree" on (i.e. how many columns the span fixes to a single value).