[TRAFODION-2537] Add file_desc to salted secondary indexes indexes_desc #1012

DaveBirdsall · 2017-03-15T20:50:01Z

The problem was a query on a large table with a salted index chose a serial plan when a parallel plan on the salted index would have been superior.

The cause was a bit of missing logic. Function createNAFileSets (optimizer/NATable.cpp) relies on the presence of a files descriptor in the indexes descriptor to deduce that an index is salted. However, the code that generates these descriptors, Generator::createVirtualTableDesc (generator/Generator.cpp), only creates a files descriptor for the clustering key or primary key indexes descriptor. So, the fix was to add logic to Generator::createVirtualTableDesc to create a files descriptor when a secondary index is salted.
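To make the producer/consumer mismatch concrete, here is a minimal sketch of the idea. It is not the Trafodion code: the structs, field names, and the `fixApplied` switch are hypothetical stand-ins, and only secondary indexes are modeled. The producer function plays the role of Generator::createVirtualTableDesc and the consumer plays the role of createNAFileSets.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, simplified stand-ins for the real descriptor structures.
// None of these names are taken from the Trafodion source.
struct FilesDesc {
    int numPartitions;
};

struct IndexDesc {
    std::string name;
    std::unique_ptr<FilesDesc> files;  // null when no files descriptor was generated
};

// Producer side (the role Generator::createVirtualTableDesc plays in the PR).
// Before the fix a secondary index never received a files descriptor;
// after the fix a salted secondary index does.
IndexDesc makeSecondaryIndexDesc(const std::string& name, bool isSalted,
                                 int tablePartitions, bool fixApplied) {
    assert(tablePartitions > 0);
    IndexDesc desc;
    desc.name = name;
    if (fixApplied && isSalted) {
        // SALT LIKE TABLE: the index uses the same partitioning as the table.
        desc.files.reset(new FilesDesc{tablePartitions});
    }
    return desc;
}

// Consumer side (the role createNAFileSets plays in the PR): salting is
// inferred purely from the presence of a files descriptor.
bool looksSalted(const IndexDesc& desc) {
    return desc.files != nullptr;
}
```

In this model, a salted secondary index built with `fixApplied` set to false is not recognized as salted by `looksSalted`, which corresponds to the case where the optimizer had no partitioning information and fell back to a serial plan.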

Note that the only form of salting supported on indexes today is SALT LIKE TABLE. If in the future we add support for salting an index on a different set of columns and a different set of partitions than the table, many additional changes will be needed.

Two regression tests show changes in plan as a result of this change. In hive/EXPECTED017, index scans that formerly were serial are now parallel. In seabase/EXPECTED010, a table scan is now an index scan, though it is still serial. (Note: The one plan change for seabase/EXPECTED010 is near the bottom.)

See the JIRA for a discussion of the performance implications of this change.

Traf-Jenkins · 2017-03-15T20:51:14Z

Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/1659/

selvaganesang · 2017-03-15T23:28:11Z

Nice change to get the boost in performance. Is it necessary to do a full index scan to get the parallel index scan plan?

Traf-Jenkins · 2017-03-16T00:20:39Z

Test Passed. https://jenkins.esgyn.com/job/Check-PR-master/1659/

sureshsubbiah · 2017-03-16T01:33:12Z

Good fix Dave. I likely caused this when salted indexes were initially added. Have been concerned about poor plans for index scans, but never thought of looking in the generator.

Could you please take a look at https://issues.apache.org/jira/browse/TRAFODION-2512? I am curious to see if this change solves the problem in 2512 too.

DaveBirdsall · 2017-03-16T22:44:05Z

@selvaganesang, I just tried a query with a key predicate on a leading index column and got a parallel plan. So we do get parallel plans on index scans even if we aren't doing full scans. I tried one with an equality predicate (expected result row count of only 4) and it chose a serial index scan plan. I think that's a good choice in my example as it did a unique access on each region.

DaveBirdsall · 2017-03-16T22:55:31Z

@sureshsubbiah, this fix might help JIRA TRAFODION-2512 in that it might pick a parallel index scan. However, there's a known issue with RangeSpecs and MDAM costing code: When RangeSpecs are present, the Optimizer chooses to traverse all columns in the key of the index if MDAM is chosen, when in this example, it could just do MDAM on the first column. For a discussion of that issue, see https://issues.apache.org/jira/browse/TRAFODION-1641. I'll play with the example in 2512 and see how it behaves.

DaveBirdsall · 2017-03-16T23:36:55Z

@sureshsubbiah, I tried an example like the one in JIRA TRAFODION-2512. I used a table 1/10th of the size of the one in the case because I was on a workstation rather than a cluster. I have good news and bad news. The good news is we indeed now choose a parallel plan with an index scan, and the index scan uses MDAM. The bad news is (as I feared above) the MDAM access traverses all the columns, when it could stop at column B. So, it might be an improvement but we could do much better once the second issue from JIRA TRAFODION-1641 is properly fixed.

DaveBirdsall · 2017-03-16T23:37:19Z

I will merge this tomorrow morning if there are no negative comments.

DaveBirdsall · 2017-03-16T23:41:23Z

@sureshsubbiah, I executed the statement SS using the parallel index scan on my workstation; it executed in 9 seconds elapsed. I then reprepared it after setting CQD HIDE_INDEXES 'OFF' so that I'd get the old plan (table access). It executed in 23 seconds. So with this particular data set, the query is improved with this fix, even though the use of MDAM is not optimal.

[TRAFODION-2537] Add file_desc to salted secondary indexes indexes_desc

e5e1356

asfgit merged commit e5e1356 into apache:master Mar 17, 2017
