-
Notifications
You must be signed in to change notification settings - Fork 1k
PHOENIX-5932 View Index rebuild results in surplus rows from other vi… #797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -60,6 +60,7 @@ | |
| import org.apache.hadoop.hbase.client.Scan; | ||
| import org.apache.hadoop.hbase.coprocessor.ObserverContext; | ||
| import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment; | ||
| import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter; | ||
| import org.apache.hadoop.hbase.io.ImmutableBytesWritable; | ||
| import org.apache.hadoop.hbase.ipc.RpcControllerFactory; | ||
| import org.apache.hadoop.hbase.ipc.controller.InterRegionServerIndexRpcControllerFactory; | ||
|
|
@@ -87,6 +88,7 @@ | |
| import org.apache.phoenix.expression.aggregator.Aggregator; | ||
| import org.apache.phoenix.expression.aggregator.Aggregators; | ||
| import org.apache.phoenix.expression.aggregator.ServerAggregators; | ||
| import org.apache.phoenix.filter.AllVersionsIndexRebuildFilter; | ||
| import org.apache.phoenix.hbase.index.Indexer; | ||
| import org.apache.phoenix.hbase.index.ValueGetter; | ||
| import org.apache.phoenix.hbase.index.covered.update.ColumnReference; | ||
|
|
@@ -1081,7 +1083,15 @@ private RegionScanner rebuildIndices(final RegionScanner innerScanner, final Reg | |
| rawScan.setRaw(true); | ||
| rawScan.setMaxVersions(); | ||
| rawScan.getFamilyMap().clear(); | ||
| rawScan.setFilter(null); | ||
| // For rebuilds we use count (*) as query for regular tables which ends up setting the FKOF on scan | ||
| // This filter doesn't give us all columns and skips to the next row as soon as it finds 1 col | ||
| // For rebuilds we need all columns and all versions | ||
| if (scan.getFilter() instanceof FirstKeyOnlyFilter) { | ||
| rawScan.setFilter(null); | ||
| } else if (scan.getFilter() != null) { | ||
| // Override the filter so that we get all versions | ||
| rawScan.setFilter(new AllVersionsIndexRebuildFilter(scan.getFilter())); | ||
| } | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to do any special filter logic down at line 1099 in the else block if the Scan was raw in the first place?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. AFAIK we get raw scan here in case of old design and partial rebuild (Correct me if i'm wrong here @kadirozde ). I didn't want to mess with the old design and hence only made the changes for new.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, we get raw scan only for the old design partial rebuilds (i.e., auto-rebuilds). |
||
| rawScan.setCacheBlocks(false); | ||
| for (byte[] family : scan.getFamilyMap().keySet()) { | ||
| rawScan.addFamily(family); | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,45 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| package org.apache.phoenix.filter; | ||
|
|
||
| import java.io.IOException; | ||
|
|
||
| import org.apache.hadoop.hbase.Cell; | ||
| import org.apache.hadoop.hbase.filter.Filter; | ||
|
|
||
| /** | ||
| * This filter overrides the behavior of delegate so that we do not jump to the next | ||
| * column as soon as we find a value for a column but rather include all versions which is | ||
| * needed for rebuilds. | ||
| */ | ||
| public class AllVersionsIndexRebuildFilter extends DelegateFilter { | ||
|
|
||
| public AllVersionsIndexRebuildFilter(Filter originalFilter) { | ||
| super(originalFilter); | ||
| } | ||
|
|
||
| @Override | ||
| public ReturnCode filterKeyValue(Cell v) throws IOException { | ||
| ReturnCode delegateCode = super.filterKeyValue(v); | ||
| if (delegateCode == ReturnCode.INCLUDE_AND_NEXT_COL) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you add a comment here why we convert it to INCLUDE? Why are we not happy with NEXT_COL?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @gokceni NEXT_COL skips this column and goes to the next one. What we want to do is when the underlying filter says yes to a column, we want to say yes too, but instead of jumping to the next col since we got a value, we want to get all versions.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I agree, comment in the code would be helpful here. @abhishek-chouhan |
||
| return ReturnCode.INCLUDE; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is simulating the effects of a FirstKeyOnlyFilter? Comment would be good |
||
| } else { | ||
| return delegateCode; | ||
| } | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,100 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| package org.apache.phoenix.filter; | ||
|
|
||
| import java.io.IOException; | ||
| import java.util.List; | ||
|
|
||
| import org.apache.hadoop.hbase.Cell; | ||
| import org.apache.hadoop.hbase.KeyValue; | ||
| import org.apache.hadoop.hbase.filter.Filter; | ||
| import org.apache.hadoop.hbase.filter.FilterBase; | ||
|
|
||
| public class DelegateFilter extends FilterBase { | ||
|
|
||
| protected Filter delegate = null; | ||
|
|
||
| public DelegateFilter(Filter delegate) { | ||
| this.delegate = delegate; | ||
| } | ||
|
|
||
| @Override | ||
| public void reset() throws IOException { | ||
| delegate.reset(); | ||
| } | ||
|
|
||
| @Override | ||
| public boolean filterRowKey(byte[] buffer, int offset, int length) throws IOException { | ||
| return delegate.filterRowKey(buffer, offset, length); | ||
| } | ||
|
|
||
| @Override | ||
| public boolean filterAllRemaining() throws IOException { | ||
| return delegate.filterAllRemaining(); | ||
| } | ||
|
|
||
| @Override | ||
| public ReturnCode filterKeyValue(Cell v) throws IOException { | ||
| return delegate.filterKeyValue(v); | ||
| } | ||
|
|
||
| @Override | ||
| public Cell transformCell(Cell v) throws IOException { | ||
| return delegate.transformCell(v); | ||
| } | ||
|
|
||
| @Override | ||
| public KeyValue transform(KeyValue currentKV) throws IOException { | ||
| return delegate.transform(currentKV); | ||
| } | ||
|
|
||
| @Override | ||
| public void filterRowCells(List<Cell> kvs) throws IOException { | ||
| delegate.filterRowCells(kvs); | ||
| } | ||
|
|
||
| @Override | ||
| public boolean hasFilterRow() { | ||
| return delegate.hasFilterRow(); | ||
| } | ||
|
|
||
| @Override | ||
| public boolean filterRow() throws IOException { | ||
| return delegate.filterRow(); | ||
| } | ||
|
|
||
| @Override | ||
| public KeyValue getNextKeyHint(KeyValue currentKV) throws IOException { | ||
| return delegate.getNextKeyHint(currentKV); | ||
| } | ||
|
|
||
| @Override | ||
| public Cell getNextCellHint(Cell currentKV) throws IOException { | ||
| return delegate.getNextCellHint(currentKV); | ||
| } | ||
|
|
||
| @Override | ||
| public boolean isFamilyEssential(byte[] name) throws IOException { | ||
| return delegate.isFamilyEssential(name); | ||
| } | ||
|
|
||
| @Override | ||
| public byte[] toByteArray() throws IOException { | ||
| return delegate.toByteArray(); | ||
| } | ||
| } |
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment to explain why FirstKeyOnlyFilter is a special case. If the rebuild index scan is explicitly asking for only the first keyvalue, why do we avoid using the AllVersions filter which also only gives the first keyvalue?
And if there is a reason not to use the FirstKeyOnlyFilter, are we still OK with using the AllVersionsIndexRebuildFilter if the Scan's filter it will delegate to is a composite filter which contains a FirstKeyOnlyFilter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allversions filter does not only give the first key value, its purpose is to make sure all versions of a column are returned(when matched by underlying supplied filter), instead of just one. Usually the filters used in normal queries(which also end up being used for rebuild since we use select count(*)) returns only 1 version of a column, in rebuild we want to return all versions hence this.