PHOENIX-5658 IndexTool to verify index rows inline #672

kadirozde · 2020-01-07T08:42:39Z

IndexTool uses the UngroupedAggregateRegionObserver and IndexRegionObserver coprocessors to build an index table on the server side. UngroupedAggregateRegionObserver scans the regions of a data table, reconstructs the put mutations for each scanned row, and applies them to the data table with the REPLAY_INDEX_REBUILD_WRITES option. IndexRegionObserver then uses these mutations only to rebuild index rows. This PR adds an inline verification feature to UngroupedAggregateRegionObserver, which reads these index rows and verifies their content to make sure they are built correctly.

gjacoby126

Looks good, mostly nits.

priyankporwal

LGTM. Thanks @kadirozde!

gjacoby126 · 2020-01-10T18:47:46Z

phoenix-core/src/it/java/org/apache/phoenix/end2end/IndexToolIT.java

+                } else if (Bytes.compareTo(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength(),
+                        IndexTool.ERROR_MESSAGE_BYTES, 0, IndexTool.ERROR_MESSAGE_BYTES.length) == 0) {
+                    errorMessageCell = cell;
+


nit: extra line

gjacoby126 · 2020-01-10T18:50:55Z

phoenix-core/src/it/java/org/apache/phoenix/end2end/IndexToolIT.java

+                    null, -1, true, false);
+            // The index tool output table should report that there is a missing index row
+            Cell cell = getErrorMessageFromIndexToolOutputTable(conn, dataTableFullName, "_IDX_" + dataTableFullName);
+            byte[] expectedValueBytes = Bytes.toBytes("Missing index rows - Expected: 1 Actual: 0");


Ideally there would be separate columns for expected rows and actual rows that the test could check, but that can be future work.

I have a test for the column value mismatch in the following test for the only-verify option. It is almost impossible to generate missing columns or wrong column values with the verify option, since index rows are corrected during the rebuild.

gjacoby126 · 2020-01-10T18:52:11Z

phoenix-core/src/it/java/org/apache/phoenix/end2end/IndexToolIT.java

+            IndexRegionObserver.setIgnoreIndexRebuildForTesting(false);
+            Admin admin = conn.unwrap(PhoenixConnection.class).getQueryServices().getAdmin();
+            TableName indexToolOutputTable = TableName.valueOf(IndexTool.OUTPUT_TABLE_NAME_BYTES);
+            admin.disableTable(indexToolOutputTable);


disabling / dropping takes time, and this test suite uses its own cluster -- any reason we need to drop?

Having an empty output table to start with simplifies the tests.

gjacoby126 · 2020-01-10T18:55:38Z

phoenix-core/src/it/java/org/apache/phoenix/end2end/IndexToolIT.java

        if (directApi) {
            args.add("-direct");
        }
+        if (verify) {


what if I pass in verify=true, onlyVerify=true? Shouldn't there be an exception?

I can add an assert here.

After thinking about this more, I should not add an assert here, as it is acceptable for the implementation to set both at this moment.

gjacoby126 · 2020-01-10T19:06:08Z

phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java

+                                           String errorMsg) {
+        try (Table hTable = ServerUtil.ConnectionFactory.getConnection(ServerUtil.ConnectionType.INDEX_WRITER_CONNECTION,
+                env).getTable(TableName.valueOf(IndexTool.OUTPUT_TABLE_NAME))) {
+            byte[] rowKey = new byte[Long.BYTES];


Shouldn't the row key also have the table being rebuilt? What if rebuilds are going on simultaneously on two different tables? Unlikely they'd be exact same ms, but possible.

Also, if the row key is just a timestamp, you'll hotspot (though hopefully this table will be low volume)

Yes, it is possible but very unlikely to get the same timestamp, especially when IndexTool runs are not automated. I intentionally chose a very simple row key to make the table human readable and easy to query using the HBase shell. There will be one table row for each mismatched row. The expected volume will be low. I suggest not optimizing this for very unlikely scenarios.

I thought about this again. Because of the "only-verify" option, where the tool reports every mismatched row instead of just one, I need to make the row key unique. So, I am thinking about including the data row key in the row key for the output table.

gjacoby126 · 2020-01-10T19:19:34Z

phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java

+                byte[] qualifier = CellUtil.cloneQualifier(expectedCell);
+                Cell actualCell = indexRow.getColumnLatestCell(family, qualifier);
+                if (actualCell == null) {
+                    exceptionMessage = "Index verify failed - Missing cell " + indexHTable.getName();


nit: some extract methods around here would make this easier to read. maybe one function for cell not found and another for checking all the columns?
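A minimal sketch of what that extraction could look like, not the code in this PR. To stay self-contained it models an index row as a plain `Map` from qualifier to value instead of HBase `Cell` objects. The class name, method names, and the second error message are hypothetical; only the "Missing cell" message text is taken from the snippet above.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration: rows are modeled as qualifier -> value maps.
final class IndexRowVerifierSketch {
    private IndexRowVerifierSketch() {}

    // First check: is any expected qualifier absent from the actual index row?
    static Optional<String> findMissingCell(Map<String, byte[]> expected,
                                            Map<String, byte[]> actual) {
        for (String qualifier : expected.keySet()) {
            if (!actual.containsKey(qualifier)) {
                return Optional.of(qualifier);
            }
        }
        return Optional.empty();
    }

    // Second check: does any cell present in both rows carry a different value?
    static Optional<String> findMismatchedCell(Map<String, byte[]> expected,
                                               Map<String, byte[]> actual) {
        for (Map.Entry<String, byte[]> entry : expected.entrySet()) {
            byte[] actualValue = actual.get(entry.getKey());
            if (actualValue != null && !Arrays.equals(entry.getValue(), actualValue)) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    // Returns an error message for the first problem found, or empty if the row verifies.
    static Optional<String> verifyRow(String indexTableName,
                                      Map<String, byte[]> expected,
                                      Map<String, byte[]> actual) {
        if (findMissingCell(expected, actual).isPresent()) {
            return Optional.of("Index verify failed - Missing cell " + indexTableName);
        }
        if (findMismatchedCell(expected, actual).isPresent()) {
            // Assumed wording; the real message in the PR may differ.
            return Optional.of("Index verify failed - Not matching cell value " + indexTableName);
        }
        return Optional.empty();
    }
}
```

With the two checks split out, the calling loop only has to handle one `Optional` result per row.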

gjacoby126 · 2020-01-10T19:25:24Z

phoenix-core/src/main/java/org/apache/phoenix/mapreduce/PhoenixServerBuildIndexInputFormat.java

                scan.setTimeRange(0, scn);
                scan.setAttribute(BaseScannerRegionObserver.INDEX_REBUILD_PAGING, TRUE_BYTES);
+                if (getVerifyIndex(configuration)) {
+                    scan.setAttribute(BaseScannerRegionObserver.INDEX_REBUILD_VERIFY, TRUE_BYTES);


what happens if both are set?

Verify wins

gjacoby126 · 2020-01-10T19:26:01Z

phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexTool.java

-
+            if (verify) {
+                PhoenixConfigurationUtil.setVerifyIndex(configuration, true);
+            }


what if both are set?

If both "verify" and "only-verify" are set, then the only-verify option is ignored, as the help text for the command says: "To verify every data table row has the corresponding index row with the correct content (without building the index table). If the verify option is set then the only-verify option will be ignored".
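The precedence described in this thread can be sketched as a small pure function. This is an illustration only: the class, enum, and method names are hypothetical and do not appear in IndexTool.

```java
// Hypothetical illustration of the option precedence discussed above.
final class VerifyOptionSketch {
    private VerifyOptionSketch() {}

    enum Mode { BUILD_ONLY, BUILD_AND_VERIFY, ONLY_VERIFY }

    // "verify" takes precedence: when it is set, "only-verify" is ignored.
    static Mode resolve(boolean verify, boolean onlyVerify) {
        if (verify) {
            return Mode.BUILD_AND_VERIFY;
        }
        if (onlyVerify) {
            return Mode.ONLY_VERIFY;
        }
        return Mode.BUILD_ONLY;
    }
}
```

Under this reading, passing both flags behaves the same as passing verify alone, which is why no exception or assert is needed.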

gjacoby126 · 2020-01-10T19:27:07Z

phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexTool.java

+        }
+        HTableDescriptor tableDescriptor = new
+                HTableDescriptor(TableName.valueOf(OUTPUT_TABLE_NAME));
+        tableDescriptor.setValue("DISABLE_TABLE_SOR", "true");


Not a property that open source HBase or Phoenix recognizes.

I will remove it in the open source versions.

priyankporwal · 2020-01-10T23:17:20Z

phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java

+                int length = Long.BYTES + dataRowKey.length;
+                rowKey = new byte[length];
+                Bytes.putLong(rowKey, 0, scan.getTimeRange().getMax());
+                for (int i = Long.BYTES, j = 0; i < length; i++, j++) {


Nit: System.arraycopy() might perform better than a byte-by-byte copy. I assume it's the Java equivalent of memcpy.

See also Bytes.putBytes

Both suggestions work. I will go with Bytes.putBytes as it makes the code more readable.
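For reference, a minimal sketch of the row key layout discussed here (an 8-byte timestamp followed by the data row key), written with only the JDK so it runs without HBase on the classpath. The class and method names are hypothetical. It uses `ByteBuffer` and `System.arraycopy` in place of `Bytes.putLong` and `Bytes.putBytes`; `ByteBuffer` writes big-endian by default, which is the same byte order HBase's `Bytes.putLong` uses.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the output table row key layout.
final class OutputRowKeySketch {
    private OutputRowKeySketch() {}

    // Row key = 8-byte big-endian timestamp, then the data table row key.
    static byte[] buildRowKey(long scanMaxTimestamp, byte[] dataRowKey) {
        byte[] rowKey = new byte[Long.BYTES + dataRowKey.length];
        // Write the timestamp into the first Long.BYTES bytes.
        ByteBuffer.wrap(rowKey).putLong(scanMaxTimestamp);
        // Bulk-copy the data row key instead of looping byte by byte.
        System.arraycopy(dataRowKey, 0, rowKey, Long.BYTES, dataRowKey.length);
        return rowKey;
    }
}
```

Appending the data row key is what makes the key unique per mismatched row within a single run.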

gjacoby126

+1, thanks @kadirozde

kadirozde requested a review from gjacoby126 January 7, 2020 08:42

gjacoby126 requested changes Jan 7, 2020

swaroopak reviewed Jan 7, 2020

phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java (outdated, resolved)

priyankporwal reviewed Jan 7, 2020

swaroopak reviewed Jan 7, 2020

phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java (outdated, resolved)

priyankporwal reviewed Jan 9, 2020

phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexTool.java (outdated, resolved)

priyankporwal reviewed Jan 9, 2020

phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java (outdated, resolved)

priyankporwal reviewed Jan 9, 2020

phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java (outdated, resolved)

priyankporwal approved these changes Jan 10, 2020

gjacoby126 requested changes Jan 10, 2020

priyankporwal reviewed Jan 10, 2020

gjacoby126 approved these changes Jan 11, 2020

kadirozde added 3 commits January 10, 2020 20:04

PHOENIX-5658 IndexTool to verify index rows inline (db18aeb)

PHOENIX-5666 IndexRegionObserver incorrectly updates PostIndexUpdateFailure metric (b886dbe)

PHOENIX-5658 IndexTool to verify index rows inline (addendum - restructured the code) (ea06693)

kadirozde closed this Jan 13, 2020

kadirozde deleted the 5658 branch January 13, 2020 20:31

5 participants