This repository has been archived by the owner on Jun 7, 2021. It is now read-only.

[TRAFODION-2537] Add file_desc to salted secondary indexes indexes_desc #1012

Merged
merged 1 commit into from Mar 17, 2017


DaveBirdsall
Contributor

@DaveBirdsall DaveBirdsall commented Mar 15, 2017

The problem was that a query on a large table with a salted index chose a serial plan when a parallel plan on the salted index would have been superior.

The cause was a bit of missing logic. Function createNAFileSets (optimizer/NATable.cpp) relies on the presence of a files descriptor in the indexes descriptor to deduce that an index is salted. The code that generates these descriptors, Generator::createVirtualTableDesc (generator/Generator.cpp), however, creates a files descriptor only for the clustering key or primary key indexes descriptor. So the fix was to add logic to Generator::createVirtualTableDesc to create a files descriptor when a secondary index is salted.
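
For readers unfamiliar with the two code paths, here is a minimal sketch of the producer/consumer mismatch described above. It is not the Trafodion code: `FilesDesc`, `IndexDesc`, `makeIndexDesc`, `deducedSalted`, and the `withFix` flag are all hypothetical, simplified stand-ins chosen for illustration. The only behavior taken from the description is that salting is deduced from the presence of a files descriptor, and that before the fix only the clustering key index received one.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a files descriptor (not the real Trafodion struct).
struct FilesDesc {
    int numPartitions;
};

// Hypothetical stand-in for an indexes descriptor.
struct IndexDesc {
    std::string name;
    bool isClusteringKey;
    bool isSalted;                     // i.e. created with SALT LIKE TABLE
    std::unique_ptr<FilesDesc> files;  // null when no files descriptor was generated
};

// Consumer side (the role createNAFileSets plays in the description):
// salting is deduced only from the presence of a files descriptor.
bool deducedSalted(const IndexDesc& d) {
    return d.files != nullptr;
}

// Producer side (the role Generator::createVirtualTableDesc plays).
// withFix = false models the old behavior: only the clustering key
// index gets a files descriptor. withFix = true also attaches one to
// a salted secondary index.
IndexDesc makeIndexDesc(const std::string& name, bool isClusteringKey,
                        bool isSalted, int tablePartitions, bool withFix) {
    IndexDesc d{name, isClusteringKey, isSalted, nullptr};
    const bool needsFiles = isClusteringKey || (withFix && isSalted);
    if (needsFiles) {
        // SALT LIKE TABLE: the index shares the table's partitioning.
        d.files = std::make_unique<FilesDesc>(FilesDesc{tablePartitions});
    }
    return d;
}
```

In this sketch, a salted secondary index built with `withFix == false` has no files descriptor, so `deducedSalted` returns false and the consumer would treat the index as unpartitioned. With `withFix == true` the descriptor is present and the salting is visible to the consumer.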

Note that the only form of salting supported on indexes today is SALT LIKE TABLE. If in the future we add support for salting an index on a different set of columns and a different set of partitions than the table, many additional changes will be needed.

Two regression tests show changes in plan as a result of this change. In hive/EXPECTED017, index scans that formerly were serial are now parallel. In seabase/EXPECTED010, a table scan is now an index scan, though it is still serial. (Note: The one plan change for seabase/EXPECTED010 is near the bottom.)

See the JIRA for a discussion of the performance implications of this change.

@Traf-Jenkins

Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/1659/

@selvaganesang
Contributor

Nice change to get the boost in performance. Is it necessary to do a full index scan to get the parallel index scan plan?

@sureshsubbiah
Contributor

Good fix Dave. I likely caused this when salted indexes were initially added. Have been concerned about poor plans for index scans, but never thought of looking in the generator.

Could you please take a look at https://issues.apache.org/jira/browse/TRAFODION-2512? I am curious to see if this change solves the problem in 2512 too.

@DaveBirdsall
Contributor Author

@selvaganesang, I just tried a query with a key predicate on a leading index column and got a parallel plan. So we do get parallel plans on index scans even if we aren't doing full scans. I tried one with an equality predicate (expected result row count of only 4) and it chose a serial index scan plan. I think that's a good choice in my example as it did a unique access on each region.

@DaveBirdsall
Contributor Author

@sureshsubbiah, this fix might help JIRA TRAFODION-2512 in that it might pick a parallel index scan. However, there's a known issue with RangeSpecs and the MDAM costing code: when RangeSpecs are present and MDAM is chosen, the Optimizer chooses to traverse all columns in the key of the index, even though in this example it could just do MDAM on the first column. For a discussion of that issue, see https://issues.apache.org/jira/browse/TRAFODION-1641. I'll play with the example in 2512 and see how it behaves.

@DaveBirdsall
Contributor Author

@sureshsubbiah, I tried an example like the one in JIRA TRAFODION-2512. I used a table 1/10th of the size of the one in the case because I was on a workstation rather than a cluster. I have good news and bad news. The good news is we indeed now choose a parallel plan with an index scan, and the index scan uses MDAM. The bad news is (as I feared above) the MDAM access traverses all the columns, when it could stop at column B. So, it might be an improvement but we could do much better once the second issue from JIRA TRAFODION-1641 is properly fixed.

@DaveBirdsall
Contributor Author

I will merge this tomorrow morning if there are no negative comments.

@DaveBirdsall
Contributor Author

@sureshsubbiah, I executed the statement SS using the parallel index scan on my workstation; it executed in 9 seconds elapsed. I then reprepared it after setting CQD HIDE_INDEXES 'OFF' so that I'd get the old plan (table access). It executed in 23 seconds. So with this particular data set, the query is improved with this fix, even though the use of MDAM is not optimal.

@asfgit asfgit merged commit e5e1356 into apache:master Mar 17, 2017