Skip to content

PHOENIX-5932 View Index rebuild results in surplus rows from other vi…#797

Closed
abhishek-chouhan wants to merge 1 commit intoapache:4.xfrom
abhishek-chouhan:PHOENIX-5932
Closed

PHOENIX-5932 View Index rebuild results in surplus rows from other vi…#797
abhishek-chouhan wants to merge 1 commit intoapache:4.xfrom
abhishek-chouhan:PHOENIX-5932

Conversation

@abhishek-chouhan
Copy link
Copy Markdown
Contributor

…ew indexes

int numDeletes = 0;
for (Result result = scanner.next(); result != null; result = scanner.next()) {
for (Cell cell : result.rawCells()) {
if (KeyValue.Type.codeToType(cell.getTypeByte()) == KeyValue.Type.Put) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for when you port to master: in HBase 2.x Cell has a getType() method that returns the Cell.Type enum. We should avoid using KeyValue wherever possible because it's IA.Private.

}

@Test
public void testUpdatableViewIndex2() throws Exception {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More descriptive name, please.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried a more descriptive name in the latest commit :)

Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);

try (Connection conn = DriverManager.getConnection(getUrl(), props)) {
// Create Table and Views
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to have a comment explaining what about the schemas is important to the test (I assume that they're filtering on a non-PK, non-indexed column?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Yes, the point of 2 tests is to test out two different filters that end up being used. One test has view on a non-leading part of pk, other one has view on a non pk column.

rs.next();
assertEquals(2, rs.getInt(1));
try (org.apache.hadoop.hbase.client.Connection hcon =
ConnectionFactory.createConnection(config)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can factor out the cell counting into its own helper function to avoid duplication between the 2 tests. (TestUtil.getRawCellCount may also be useful if you can extend it to also keep track of what Cell types are scanned.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

rawScan.setMaxVersions();
rawScan.getFamilyMap().clear();
rawScan.setFilter(null);
if (scan.getFilter() instanceof FirstKeyOnlyFilter) {
Copy link
Copy Markdown
Contributor

@gjacoby126 gjacoby126 Jun 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment to explain why FirstKeyOnlyFilter is a special case. If the rebuild index scan is explicitly asking for only the first keyvalue, why do we avoid using the AllVersions filter which also only gives the first keyvalue?

And if there is a reason not to use the FirstKeyOnlyFilter, are we still OK with using the AllVersionsIndexRebuildFilter if the Scan's filter it will delegate to is a composite filter which contains a FirstKeyOnlyFilter?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allversions filter does not only give the first key value, its purpose is to make sure all versions of a column are returned(when matched by underlying supplied filter), instead of just one. Usually the filters used in normal queries(which also end up being used for rebuild since we use select count(*)) returns only 1 version of a column, in rebuild we want to return all versions hence this.

rawScan.setFilter(null);
} else if (scan.getFilter() != null) {
rawScan.setFilter(new AllVersionsIndexRebuildFilter(scan.getFilter()));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do any special filter logic down at line 1099 in the else block if the Scan was raw in the first place?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK we get raw scan here in case of old design and partial rebuild (Correct me if i'm wrong here @kadirozde ). I didn't want to mess with the old design and hence only made the changes for new.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we get raw scan only for the old design partial rebuilds (i.e., auto-rebuilds).

public ReturnCode filterKeyValue(Cell v) throws IOException {
ReturnCode delegateCode = super.filterKeyValue(v);
if (delegateCode == ReturnCode.INCLUDE_AND_NEXT_COL) {
return ReturnCode.INCLUDE;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is simulating the effects of a FirstKeyOnlyFilter? Comment would be good

}

@Test
public void testUpdatableViewIndex() throws Exception {
Copy link
Copy Markdown
Contributor

@swaroopak swaroopak Jun 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this and other test to IndexToolForNonTxGlobalIndexIT

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Override
public ReturnCode filterKeyValue(Cell v) throws IOException {
ReturnCode delegateCode = super.filterKeyValue(v);
if (delegateCode == ReturnCode.INCLUDE_AND_NEXT_COL) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment here why we convert it to INCLUDE? Why are we not happy with NEXT_COL?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gokceni NEXT_COL skips this column and goes to the next one. What we want to do is when the underlying filter says yes to a column, we want to say yes too, but instead of jumping to the next col since we got a value, we want to get all versions.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree, comment in the code would be helpful here. @abhishek-chouhan

Copy link
Copy Markdown
Contributor

@gjacoby126 gjacoby126 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown
Contributor

@kadirozde kadirozde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants