PHOENIX-5932 View Index rebuild results in surplus rows from other vi… #797
abhishek-chouhan wants to merge 1 commit into apache:4.x
1c8bb6a to f781099
| int numDeletes = 0; | ||
| for (Result result = scanner.next(); result != null; result = scanner.next()) { | ||
| for (Cell cell : result.rawCells()) { | ||
| if (KeyValue.Type.codeToType(cell.getTypeByte()) == KeyValue.Type.Put) { |
Note for when you port to master: in HBase 2.x Cell has a getType() method that returns the Cell.Type enum. We should avoid using KeyValue wherever possible because it's IA.Private.
| } | ||
| @Test | ||
| public void testUpdatableViewIndex2() throws Exception { |
More descriptive name, please.
Tried a more descriptive name in the latest commit :)
| Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES); | ||
| try (Connection conn = DriverManager.getConnection(getUrl(), props)) { | ||
| // Create Table and Views |
It would be good to have a comment explaining what about the schemas is important to the test (I assume that they're filtering on a non-PK, non-indexed column?).
Done. Yes, the point of having two tests is to exercise the two different filters that end up being used. One test defines its view on a non-leading part of the PK; the other defines its view on a non-PK column.
| rs.next(); | ||
| assertEquals(2, rs.getInt(1)); | ||
| try (org.apache.hadoop.hbase.client.Connection hcon = | ||
| ConnectionFactory.createConnection(config)) { |
You can factor out the cell counting into its own helper function to avoid duplication between the 2 tests. (TestUtil.getRawCellCount may also be useful if you can extend it to also keep track of what Cell types are scanned.)
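A minimal sketch of what such a shared helper might look like. To stay self-contained it uses stand-in types (`RawCell`, `CellType`) instead of the HBase `Cell` API, and the name `countByType` is hypothetical; it is not the existing `TestUtil.getRawCellCount`.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class RawCellCounter {
    // Stand-in for the HBase cell type; only the values a test is likely to check.
    enum CellType { PUT, DELETE, DELETE_COLUMN, DELETE_FAMILY }

    // Stand-in for a cell returned by a raw scan (hypothetical, not the HBase Cell).
    static final class RawCell {
        final String row;
        final CellType type;
        RawCell(String row, CellType type) { this.row = row; this.type = type; }
    }

    // Counts cells per type in a single pass so both tests can share the logic.
    static Map<CellType, Integer> countByType(Iterable<RawCell> cells) {
        Map<CellType, Integer> counts = new EnumMap<>(CellType.class);
        for (RawCell cell : cells) {
            counts.merge(cell.type, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<RawCell> scanned = Arrays.asList(
            new RawCell("r1", CellType.PUT),
            new RawCell("r1", CellType.PUT),
            new RawCell("r2", CellType.DELETE_FAMILY));
        Map<CellType, Integer> counts = countByType(scanned);
        System.out.println("puts=" + counts.getOrDefault(CellType.PUT, 0)
            + " deleteFamilies=" + counts.getOrDefault(CellType.DELETE_FAMILY, 0));
    }
}
```

In the real tests the loop would presumably iterate over `result.rawCells()` from a raw scan, and each test would assert on the returned counts instead of keeping its own `numPuts`/`numDeletes` counters.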
| rawScan.setMaxVersions(); | ||
| rawScan.getFamilyMap().clear(); | ||
| rawScan.setFilter(null); | ||
| if (scan.getFilter() instanceof FirstKeyOnlyFilter) { |
Please add a comment to explain why FirstKeyOnlyFilter is a special case. If the rebuild index scan is explicitly asking for only the first keyvalue, why do we avoid using the AllVersions filter which also only gives the first keyvalue?
And if there is a reason not to use the FirstKeyOnlyFilter, are we still OK with using the AllVersionsIndexRebuildFilter if the Scan's filter it will delegate to is a composite filter which contains a FirstKeyOnlyFilter?
The AllVersions filter does not give only the first key value. Its purpose is to make sure that all versions of a column are returned (when matched by the underlying supplied filter) instead of just one. The filters used in normal queries (which also end up being used for rebuild, since we use select count(*)) usually return only one version of a column; in rebuild we want to return all versions, hence this filter.
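To illustrate the difference between one version per column and all versions, here is a toy scan loop. It is only a sketch under simplifying assumptions: `ReturnCode`, `VCell`, and `scan` are stand-ins written for this example, not HBase or Phoenix classes, and the real region scanner does considerably more.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class VersionScanSketch {
    // Simplified stand-ins for the HBase filter return codes.
    enum ReturnCode { INCLUDE, INCLUDE_AND_NEXT_COL, SKIP }

    // Stand-in cell: a column name plus a version timestamp.
    static final class VCell {
        final String column;
        final long ts;
        VCell(String column, long ts) { this.column = column; this.ts = ts; }
        @Override public String toString() { return column + "@" + ts; }
    }

    // Toy scan loop: cells are assumed sorted by column, newest version first.
    static List<VCell> scan(List<VCell> cells, Function<VCell, ReturnCode> filter) {
        List<VCell> out = new ArrayList<>();
        String skipColumn = null; // column whose remaining versions are skipped
        for (VCell cell : cells) {
            if (cell.column.equals(skipColumn)) {
                continue;
            }
            ReturnCode code = filter.apply(cell);
            if (code == ReturnCode.INCLUDE || code == ReturnCode.INCLUDE_AND_NEXT_COL) {
                out.add(cell);
            }
            if (code == ReturnCode.INCLUDE_AND_NEXT_COL) {
                skipColumn = cell.column;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<VCell> cells = Arrays.asList(
            new VCell("a", 3), new VCell("a", 2), new VCell("a", 1), new VCell("b", 5));
        // Query-style filter: one version per column.
        System.out.println(scan(cells, c -> ReturnCode.INCLUDE_AND_NEXT_COL));
        // Rebuild-style filter: every version of a matching column.
        System.out.println(scan(cells, c -> ReturnCode.INCLUDE));
    }
}
```

With the query-style filter the toy scan keeps only the newest cell of each column; with the rebuild-style filter it keeps every version.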
| rawScan.setFilter(null); | ||
| } else if (scan.getFilter() != null) { | ||
| rawScan.setFilter(new AllVersionsIndexRebuildFilter(scan.getFilter())); | ||
| } |
Do we need to do any special filter logic down at line 1099 in the else block if the Scan was raw in the first place?
AFAIK we get a raw scan here in the case of the old design and partial rebuild (correct me if I'm wrong here, @kadirozde). I didn't want to mess with the old design and hence only made the changes for the new one.
Yes, we get a raw scan only for the old design partial rebuilds (i.e., auto-rebuilds).
| public ReturnCode filterKeyValue(Cell v) throws IOException { | ||
| ReturnCode delegateCode = super.filterKeyValue(v); | ||
| if (delegateCode == ReturnCode.INCLUDE_AND_NEXT_COL) { | ||
| return ReturnCode.INCLUDE; |
This is simulating the effects of a FirstKeyOnlyFilter? A comment would be good.
| } | ||
| @Test | ||
| public void testUpdatableViewIndex() throws Exception { |
Please move this and the other test to IndexToolForNonTxGlobalIndexIT.
f781099 to c238805
| @Override | ||
| public ReturnCode filterKeyValue(Cell v) throws IOException { | ||
| ReturnCode delegateCode = super.filterKeyValue(v); | ||
| if (delegateCode == ReturnCode.INCLUDE_AND_NEXT_COL) { |
Could you add a comment here explaining why we convert it to INCLUDE? Why are we not happy with NEXT_COL?
@gokceni NEXT_COL skips this column and goes to the next one. When the underlying filter says yes to a column, we want to say yes too, but instead of jumping to the next column once we have a value, we want to get all versions.
Yes, I agree, a comment in the code would be helpful here. @abhishek-chouhan
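As a rough illustration of the kind of in-code comment being requested, the conversion could be documented along these lines. This is a self-contained sketch with stand-in types (`ReturnCode`, `CellFilter`, `AllVersionsWrapper` are hypothetical names for this example), not the actual `AllVersionsIndexRebuildFilter`; passing every other return code through unchanged is an assumption, since the diff only shows the `INCLUDE_AND_NEXT_COL` branch.

```java
public class AllVersionsWrapperSketch {
    // Simplified stand-ins for the HBase filter return codes.
    enum ReturnCode { INCLUDE, INCLUDE_AND_NEXT_COL, SKIP, NEXT_COL }

    // Stand-in for a per-cell filter decision (hypothetical interface).
    interface CellFilter {
        ReturnCode filterCell(String column);
    }

    // Wraps a delegate filter and widens its answer so that all versions are kept.
    static final class AllVersionsWrapper implements CellFilter {
        private final CellFilter delegate;

        AllVersionsWrapper(CellFilter delegate) {
            this.delegate = delegate;
        }

        @Override
        public ReturnCode filterCell(String column) {
            ReturnCode delegateCode = delegate.filterCell(column);
            // The delegate accepted this cell but asked to jump to the next column,
            // which would drop the older versions of this column. The rebuild needs
            // every version, so keep the cell and stay on the same column.
            if (delegateCode == ReturnCode.INCLUDE_AND_NEXT_COL) {
                return ReturnCode.INCLUDE;
            }
            // Assumption: all other decisions are passed through unchanged.
            return delegateCode;
        }
    }

    public static void main(String[] args) {
        CellFilter queryFilter = column -> ReturnCode.INCLUDE_AND_NEXT_COL;
        CellFilter rebuildFilter = new AllVersionsWrapper(queryFilter);
        System.out.println(rebuildFilter.filterCell("a"));
    }
}
```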
…ew indexes