Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHOENIX-933 Local index support to Phoenix #1

Closed
wants to merge 10 commits into from
Closed

Conversation

chrajeshbabu
Copy link
Contributor

No description provided.

@chrajeshbabu chrajeshbabu changed the title Local index PHOENIX-933 Local index support to Phoenix Jul 7, 2014
+ " INCLUDE (long_col1, long_col2)";
String ddl = null;
if(localIndex){
ddl = "CREATE INDEX " + INDEX_TABLE_NAME + " ON " + DATA_TABLE_FULL_NAME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be CREATE LOCAL INDEX?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow missed this. Corrected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the if and the else branch match. If that's expected/correct, can you get rid of the if statement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual code. Earlier by mistake I missed to add LOCAL in if branch so if else branches are same.
if (localIndex) {
ddl = "CREATE LOCAL INDEX " + INDEX_TABLE_NAME + " ON " + DATA_TABLE_FULL_NAME + " (date_col)";
} else {
ddl = "CREATE INDEX " + INDEX_TABLE_NAME + " ON " + DATA_TABLE_FULL_NAME + " (date_col)";
}

Now I have corrected it.

@JamesRTaylor
Copy link
Contributor

Thanks for the pull, @chrajeshbabu. Nice work! Looks like you need to rebase as there are merge conflicts currently.

@@ -100,7 +109,7 @@ public static void serializeIntoScan(Scan scan, int thresholdBytes, int limit, L
}
}

public static OrderedResultIterator deserializeFromScan(Scan scan, RegionScanner s) {
public static OrderedResultIterator deserializeFromScan(Scan scan, RegionScanner s, int offset) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a bit more you need to do to handle ORDER BY correctly. It'd be for the case in which a data column was referenced in the ORDER BY while the index table is being used to satisfy the query. Not here, but in OrderedResultIterator. In getResultIterator, in the beginning of the for loop, you'll need to wrap the result in the same way if there are dataColumns (i.e. call IndexUtil.wrapResultUsingOffset)

        for (Tuple result = delegate.next(); result != null; result = delegate.next()) {
            int pos = 0;
            ImmutableBytesWritable[] sortKeys = new ImmutableBytesWritable[numSortKeys];
            for (Expression expression : expressions) {
                final ImmutableBytesWritable sortKey = new ImmutableBytesWritable();
                boolean evaluated = expression.evaluate(result, sortKey);
                // set the sort key that failed to get evaluated with null
                sortKeys[pos++] = evaluated && sortKey.getLength() > 0 ? sortKey : null;
            }
            queueEntries.add(new ResultEntry(sortKeys, result));
        }

Without this, the table data column expressions that aren't in the local index will fail to evaluate. You should add a test for this too.

I think the cleanest way to handle this is by wrapping the ResultIterator passed into the OrderedResultIterator. Just create a new ResultIterator that delegates to the original one and does the wrapping necessary. Then you won't need to change OrderedResultIterator at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, nix this, as the innerScanner you pass to OrderedResultIterator already does all of this. You likely already have a test for the ORDER BY case I mentioned, but if not, please add one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I will verify James.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bq. There's a bit more you need to do to handle ORDER BY correctly. It'd be for the case in which a data column was referenced in the ORDER BY while the index table is being used to satisfy the query.
This is working fine James. I have added test case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent. Would you mind updating this pull request? I think it's ready to get committed, no?

@JamesRTaylor
Copy link
Contributor

Excellent job, @chrajeshbabu! A few minor issues, but overall this is great! Looking forward to seeing how it performs!

@chrajeshbabu
Copy link
Contributor Author

Thanks for review @JamesRTaylor. I have resolved the conflicts and handled all the comments locally. I will submit it once I verify OrderedResultIterator scenario.

assertFalse(rs.next());
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code looks familiar. If it matches original (I think from QueryIT), can you move it into the base test class instead of copy/paste?

@@ -54,7 +55,8 @@ protected SkipRangeParallelIteratorRegionSplitter(StatementContext context, Tabl

public List<HRegionLocation> filterRegions(List<HRegionLocation> allTableRegions, final ScanRanges ranges) {
Iterable<HRegionLocation> regions;
if (ranges == ScanRanges.EVERYTHING) {
if (ranges == ScanRanges.EVERYTHING
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary and will cause the skip scan not to be used (which will be horrible for point queries). It should work fine to do the skip scan for each region (much better than doing a full region scan for every region).

Can you remove it and add a test that does a point lookup with multiple values (for example, a IN clause with multiple values for an indexed column)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with the change skip scan will be used James. The change is required because the key ranges generated by compiler won't be in the local index regions key range because local index rows have prefixed region start key extra. Without the change mostly no region will be selected for scanning.

QueryIT#testSimpleInListStatement is the test case verifies the same.
Here is the explain query result.

CLIENT PARALLEL 4-WAY SKIP SCAN ON 2 KEYS OVER _LOCAL_IDX_ATABLE [-32768,2] - [-32768,4]
SERVER FILTER BY FIRST KEY ONLY AND ORGANIZATION_ID = '00D300000000XHP'
CLIENT MERGE SORT

@JamesRTaylor
Copy link
Contributor

Related to the revert of the ScanRanges change, you'll need to make this change to prevent the SkipRangeParallelIteratorRegionSplitter from being used (as you always need to scan all regions for a local index). Cleanest might be to just implement a simple ParallelIteratorRegionSplitter for use when a local index is used that just returns all regions:

public class ParallelIteratorRegionSplitterFactory {

    public static ParallelIteratorRegionSplitter getSplitter(StatementContext context, TableRef table, HintNode hintNode) throws SQLException {
        if (!isLocalIndex && context.getScanRanges().useSkipScanFilter()) {
            return SkipRangeParallelIteratorRegionSplitter.getInstance(context, table, hintNode);
        }
        return DefaultParallelIteratorRegionSplitter.getInstance(context, table, hintNode);
    }
}

@chrajeshbabu
Copy link
Contributor Author

bq. Cleanest might be to just implement a simple ParallelIteratorRegionSplitter for use when a local index is used that just returns all regions:
I will add new ParallelIteratorRegionSplitter for local index and remove the unnecessary changes in SkipRangeParallelIteratorRegionSplitter/DefaultParallelIteratorRegionSplitter.

Then I will submit another pull request.

Thanks @JamesRTaylor

@JamesRTaylor
Copy link
Contributor

Sounds good, @chrajeshbabu. Thanks!

@asfgit asfgit closed this Aug 30, 2014
@asfgit asfgit deleted the local-index branch August 30, 2014 15:55
tkhurana added a commit to tkhurana/phoenix that referenced this pull request Jan 15, 2023
tkhurana added a commit to tkhurana/phoenix that referenced this pull request May 25, 2023
tkhurana added a commit that referenced this pull request Jul 28, 2023
#1632)

* PHOENIX-6897 Filters on unverified index rows return wrong result (#1597)

* PHOENIX-6897 Filters on unverified index rows return wrong result

* Fixed checkstyle and missing license warnings

* Addressed review comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants