PHOENIX-5190 Implement TaskRegionObserver for Index rebuild #457

gokceni · 2019-03-07T18:34:39Z

No description provided.

gokceni · 2019-03-07T18:35:02Z

@gjacoby126 @kadirozde @abhishek-chouhan

gjacoby126 · 2019-03-12T18:10:09Z

@gokceni - what JIRA does this correspond to?

gokceni · 2019-03-12T21:14:23Z

@gjacoby126 PHOENIX-5190 is the JIRA.

kadirozde

@gokceni, you have done a great job in extending the existing system task framework and implementing the new use case. I have one comment on your code for removing the INDEX_NAME column.

Your changes got me thinking about improving the design further as follows:

Map the task type to the class name of the corresponding task handler. This can be implemented as an enum.
Have one thread that goes through the system task table, maps the task type to its class, creates an instance of it with the task attributes, and finally calls the run method of the instance.
Task handling classes should not know anything about the system task table or about getting a Phoenix connection to access its records.

I think this will simplify adding new tasks and will make the code more readable.
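The three points above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `TaskHandler`, `TaskRecord`, the two handler classes, and the attribute keys are hypothetical names, and the scan of the system task table is replaced by an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class TaskDispatchSketch {

    /** Hypothetical handler contract: a handler sees only its attributes, never the task table. */
    interface TaskHandler {
        String run(Map<String, String> attributes) throws Exception;
    }

    /** Hypothetical stand-in for the index rebuild work. */
    static class IndexRebuildHandler implements TaskHandler {
        @Override
        public String run(Map<String, String> attributes) {
            return "rebuilt " + attributes.get("IndexName");
        }
    }

    /** Hypothetical stand-in for a second task type. */
    static class DropChildViewsHandler implements TaskHandler {
        @Override
        public String run(Map<String, String> attributes) {
            return "dropped views of " + attributes.get("TableName");
        }
    }

    /** Maps each task type to the class that handles it. */
    enum TaskType {
        DROP_CHILD_VIEWS(DropChildViewsHandler.class),
        INDEX_REBUILD(IndexRebuildHandler.class);

        private final Class<? extends TaskHandler> handlerClass;

        TaskType(Class<? extends TaskHandler> handlerClass) {
            this.handlerClass = handlerClass;
        }

        /** Creates a fresh handler instance through reflection. */
        TaskHandler newHandler() throws ReflectiveOperationException {
            return handlerClass.getDeclaredConstructor().newInstance();
        }
    }

    /** One task row, reduced to what the dispatcher needs (assumed shape). */
    static class TaskRecord {
        final TaskType type;
        final Map<String, String> attributes;

        TaskRecord(TaskType type, Map<String, String> attributes) {
            this.type = type;
            this.attributes = attributes;
        }
    }

    /** The single dispatching loop: look up the handler class, instantiate it, run it. */
    static List<String> dispatch(List<TaskRecord> records) throws Exception {
        List<String> results = new ArrayList<>();
        for (TaskRecord record : records) {
            results.add(record.type.newHandler().run(record.attributes));
        }
        return results;
    }
}
```

With this shape, adding a task type would mean adding one enum constant and one handler class; the dispatching loop itself would not change.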

kadirozde · 2019-03-14T00:45:33Z

phoenix-core/src/main/java/org/apache/phoenix/coprocessor/TaskRegionObserver.java

+                    PhoenixDatabaseMetaData.TENANT_ID + ", " +
+                    PhoenixDatabaseMetaData.TABLE_SCHEM + ", " +
+                    PhoenixDatabaseMetaData.TABLE_NAME + ", " +
+                    PhoenixDatabaseMetaData.INDEX_NAME + ", " +


INDEX_NAME should be a task-specific attribute. I do not think it should be part of the generic task attributes. Either the name of the index table should be stored in the TABLE_NAME column, or it should be encoded in the DATA column. I suggest removing INDEX_NAME from the DDL.

gjacoby126 · 2019-03-14T18:20:12Z

phoenix-core/src/it/java/org/apache/phoenix/end2end/IndexRebuildTaskIT.java

+
+            task.run();
+
+            String viewIndexTableName = "_IDX_" + baseTable;


nit: should use the appropriate SchemaUtil method to construct the view index table name.

gjacoby126 · 2019-03-14T18:26:45Z

phoenix-core/src/it/java/org/apache/phoenix/end2end/IndexRebuildTaskIT.java

+            admin.disableTable(tableName);
+            admin.truncateTable(tableName, false);
+
+            data = "{GusWorkId:abc}";


nit: consider using a constant

gjacoby126 · 2019-03-14T18:30:42Z

phoenix-core/src/it/java/org/apache/phoenix/end2end/IndexRebuildTaskIT.java

+            assertTrue( count == numOfRuns);
+
+            // Check task status and other column values.
+            ResultSet rs = conn.createStatement().executeQuery("SELECT * " +


Seems like this would be useful to have as a utility function somewhere. In general it's good to break up long functions into helper functions with descriptive names when there's a discrete unit of work.

gjacoby126 · 2019-03-14T18:45:19Z

phoenix-core/src/main/java/org/apache/phoenix/coprocessor/TaskRegionObserver.java

+        mutateSystemTaskTable(conn, stmt, accessCheckEnabled);
+    }
+
+    public static void addTask(PhoenixConnection conn, TaskType taskType, String tenantId, String schemaName,


Seems like there should be a separation of concerns here -- persisting a Task to a Phoenix table and reading it back is distinct from scheduling a task. I'd suggest breaking some of these Task classes into their own files and either fully encapsulate their persistence logic in the Tasks (the Data Access Object pattern) or have standalone persistence classes that handle the persistence (the Repository pattern).

Either way would improve the readability and reduce the coupling between the scheduling logic and the persistence logic.

Also agree with @kadirozde 's ideas to use reflection and JSON parsing to reduce the amount of boilerplate necessary to hook in each new task type for this framework.

@gjacoby126 This is not a new method; it was already in this file. I moved it to this class. I am not seeing these patterns used for other System tables. I will add a Tasks class, though.
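To make the Repository-pattern suggestion in this thread concrete, here is a minimal sketch. Every name in it (`Task`, `TaskRepository`, `InMemoryTaskRepository`, `TaskScheduler`) is hypothetical, statuses are plain strings for brevity, and an in-memory map stands in for the Phoenix table. The point is only that the scheduler talks to an interface and never issues SQL itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TaskRepositorySketch {

    /** Plain task value; the fields are an assumed subset of the real columns. */
    static final class Task {
        final String taskType;
        final String tableName;
        String status;

        Task(String taskType, String tableName, String status) {
            this.taskType = taskType;
            this.tableName = tableName;
            this.status = status;
        }
    }

    /** Persistence contract; a real implementation would wrap the UPSERT and SELECT statements. */
    interface TaskRepository {
        void save(Task task);
        List<Task> findByStatus(String status);
    }

    /** In-memory stand-in, keyed by task type and table name. */
    static class InMemoryTaskRepository implements TaskRepository {
        private final Map<String, Task> rows = new LinkedHashMap<>();

        @Override
        public void save(Task task) {
            rows.put(task.taskType + "|" + task.tableName, task);
        }

        @Override
        public List<Task> findByStatus(String status) {
            List<Task> matches = new ArrayList<>();
            for (Task task : rows.values()) {
                if (task.status.equals(status)) {
                    matches.add(task);
                }
            }
            return matches;
        }
    }

    /** Scheduling logic depends only on the repository interface. */
    static class TaskScheduler {
        private final TaskRepository repository;

        TaskScheduler(TaskRepository repository) {
            this.repository = repository;
        }

        /** Runs every CREATED task and returns how many were run. */
        int runPending() {
            int ran = 0;
            for (Task task : repository.findByStatus("CREATED")) {
                task.status = "COMPLETED"; // the actual task work would happen before this
                repository.save(task);
                ran++;
            }
            return ran;
        }
    }
}
```

A side benefit of this split is that the scheduler can be unit tested against the in-memory repository without a running cluster.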

gjacoby126 · 2019-03-14T18:50:18Z

phoenix-core/src/main/java/org/apache/phoenix/coprocessor/TaskRegionObserver.java

+
+        public static String[] getArgValues(boolean directApi, boolean useSnapshot, String schemaName,
+                String dataTable, String indxTable, String tenantId) {
+            final List<String> args = Lists.newArrayList();


This is leaking IndexTool internals into this task. Would be better to provide an abstraction so you can call the IndexTool directly.

gjacoby126 · 2019-03-14T18:52:17Z

phoenix-core/src/main/java/org/apache/phoenix/coprocessor/TaskRegionObserver.java

+                        tableName = taskRecord.getTableName();
+                        indexName = taskRecord.getIndexName();
+                        final String[] cmdArgs =
+                                getArgValues(true, false, schemaName, tableName, indexName, tenantId);


As I mentioned above, it would be better to have an API in the IndexTool that could be called directly in an object-oriented fashion. The IndexTool's CLI should exist only so that it can be invoked from a real shell command line; here we don't need it.
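One possible shape for such a direct API is a typed request object instead of a `String[]` of command-line arguments. The sketch below is an assumption for illustration only: `IndexRebuildRequest`, `Builder`, and `IndexRebuilder` are hypothetical names and are not part of the IndexTool. The fields simply mirror the parameters of `getArgValues` in the diff above.

```java
class IndexRebuildApiSketch {

    /** Immutable description of one rebuild; fields mirror the getArgValues parameters. */
    static final class IndexRebuildRequest {
        final String schemaName;
        final String dataTable;
        final String indexTable;
        final String tenantId;
        final boolean directApi;
        final boolean useSnapshot;

        private IndexRebuildRequest(Builder builder) {
            this.schemaName = builder.schemaName;
            this.dataTable = builder.dataTable;
            this.indexTable = builder.indexTable;
            this.tenantId = builder.tenantId;
            this.directApi = builder.directApi;
            this.useSnapshot = builder.useSnapshot;
        }
    }

    /** Builder that validates the required names once, up front. */
    static final class Builder {
        private final String dataTable;
        private final String indexTable;
        private String schemaName;
        private String tenantId;
        private boolean directApi;
        private boolean useSnapshot;

        Builder(String dataTable, String indexTable) {
            if (dataTable == null || dataTable.isEmpty()
                    || indexTable == null || indexTable.isEmpty()) {
                throw new IllegalArgumentException("data table and index table are required");
            }
            this.dataTable = dataTable;
            this.indexTable = indexTable;
        }

        Builder schemaName(String schemaName) { this.schemaName = schemaName; return this; }
        Builder tenantId(String tenantId) { this.tenantId = tenantId; return this; }
        Builder directApi(boolean directApi) { this.directApi = directApi; return this; }
        Builder useSnapshot(boolean useSnapshot) { this.useSnapshot = useSnapshot; return this; }

        IndexRebuildRequest build() {
            return new IndexRebuildRequest(this);
        }
    }

    /** Hypothetical entry point that a task could call without building CLI arguments. */
    interface IndexRebuilder {
        boolean rebuild(IndexRebuildRequest request) throws Exception;
    }
}
```

Under this design a command-line front end would parse its arguments into the same request object, so the task and the shell path would share one code path after parsing.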

gjacoby126 · 2019-03-14T18:53:46Z

phoenix-core/src/main/java/org/apache/phoenix/coprocessor/TaskRegionObserver.java

+                        JsonParser jsonParser = new JsonParser();
+                        JsonObject jsonObject = jsonParser.parse(data).getAsJsonObject();
+                        if (jsonObject.has("DisableBefore")) {
+                            String disableBefore = jsonObject.get("DisableBefore").toString();


This seems like logic that should be in the IndexTool itself.

gjacoby126 · 2019-03-14T18:55:38Z

phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixDatabaseMetaData.java

+    public static final String TASK_STATUS = "TASK_STATUS";
+    public static final String TASK_END_TS = "TASK_END_TS";
+    public static final String PRIORITY = "PRIORITY";
+    public static final String DATA = "DATA";


Maybe make the constant named "TASK_DATA_COLUMN" or something along those lines to indicate what it's for?

(and same with PRIORITY)

Will rename as TASK_DATA. The others don't have _COLUMN at the end

gjacoby126 · 2019-03-14T18:57:15Z

phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java

+
+            } catch (TableAlreadyExistsException e) {
+                long currentServerSideTableTimeStamp = e.getTable().getTimeStamp();
+                if (currentServerSideTableTimeStamp <= MetaDataProtocol.MIN_SYSTEM_TABLE_TIMESTAMP_5_1_0) {


Keep in mind this change will also likely go into 4.15, so the upgrade will need to happen for either 4.15 or 5.1

gokceni · 2019-03-14T21:09:16Z

@kadirozde (and @gjacoby126) for this comment:
"Have one thread that goes through the system task table and maps the task type to its class, creates an instance of it with task attributes and finally calls the run method of the instance."

If we do it like this, don't we tie both tasks to a single thread?
From the discussion we had, I thought we said each task type would have its own thread. Right now, there are 2. If we keep adding different task types, they will all run on 1 thread. Also, we might want to run them at different intervals (some tasks might need to run frequently, others might not). What do you think?

kadirozde · 2019-03-14T21:32:44Z

@gokceni, having one main thread responsible for scanning the system task table allows us to separate the system table management concerns from the individual task handling ones. Initially, having one thread will be fine, as we will not have many tasks. I think that when we need more threads, and need to schedule tasks based on their priorities, this can be handled within the system task framework without changing the task handling classes, provided the concerns are separated.

gjacoby126 · 2019-03-22T22:43:24Z

phoenix-core/src/it/java/org/apache/phoenix/end2end/IndexRebuildTaskIT.java

+            DropTableWithViewsIT.assertTaskColumns(conn, PTable.TaskStatus.COMPLETED.toString(), PTable.TaskType.INDEX_REBUILD,
+                    "{\"IndexName\":\""+ indexName +"\",\"TaskDetails\":\"SUCCESS\"}");
+
+            // See that index is rebuild and confirm index has rows


nit: typo (should be "index is rebuilt")

gjacoby126 · 2019-03-22T22:47:53Z

phoenix-core/src/main/java/org/apache/phoenix/coprocessor/TaskRegionObserver.java

+                        initMethod.invoke(obj, env, timeMaxInterval);
+
+                        // Change task status to STARTED
+                        Task.addTask(connForTask, taskRecord.getTaskType(), taskRecord.getTenantId(), taskRecord.getSchemaName(),


If one server runs TaskRegionObserver, and it runs this line to set a Task as STARTED, and then immediately afterward the server dies, will the next server that runs TaskRegionObserver pick up the orphaned task and actually run it? Since you only screen out FAILED and COMPLETE above, I think the answer's yes, but making sure.

Yes, we exclude the COMPLETED and FAILED states and pick up anything else.
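The selection rule discussed in this thread can be sketched as below. It is a simplified illustration, not the query used in the PR: `TaskRow` and `selectRunnable` are hypothetical names, and `CREATED` is an assumed initial state (only STARTED, COMPLETED, and FAILED are named in this conversation).

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class OrphanedTaskSketch {

    /** CREATED is an assumption; the other three states appear in the discussion. */
    enum TaskStatus { CREATED, STARTED, COMPLETED, FAILED }

    /** States that are never picked up again. */
    static final Set<TaskStatus> TERMINAL =
            EnumSet.of(TaskStatus.COMPLETED, TaskStatus.FAILED);

    /** Hypothetical row: a task name plus its current status. */
    static final class TaskRow {
        final String name;
        final TaskStatus status;

        TaskRow(String name, TaskStatus status) {
            this.name = name;
            this.status = status;
        }
    }

    /** Returns the names of all tasks that are not in a terminal state. */
    static List<String> selectRunnable(List<TaskRow> rows) {
        List<String> runnable = new ArrayList<>();
        for (TaskRow row : rows) {
            if (!TERMINAL.contains(row.status)) {
                runnable.add(row.name);
            }
        }
        return runnable;
    }
}
```

Because a task left in STARTED by a dead server is selected again, a handler written against this rule presumably has to tolerate being run more than once for the same task.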

gjacoby126 · 2019-03-22T23:04:27Z

phoenix-core/src/main/java/org/apache/phoenix/schema/task/Task.java

+                    " ) VALUES(?,?,?,?,?,?,?,?)");
+            stmt = setValuesToAddTaskPS(stmt, taskType, tenantId, schemaName, tableName);
+            if (taskStatus != null) {
+                stmt.setString(5, taskStatus);


Any reason this logic shouldn't be in setValuesToAddTaskPS? (You could have an overload that took just the original 4 params that calls the full 8 param function)

sure, will do
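The overload suggested here could look roughly like the sketch below. Only the method name, the first four parameters, and the fact that the task status is bound at position 5 come from the diff; the meaning and types of positions 6 to 8 (`data`, `priority`, `endTs`), the use of `String` for the task type, and the `setStringOrNull` helper are assumptions made for illustration.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.sql.Types;

class TaskStatementSketch {

    /** Short form: binds the four original values and leaves the optional ones null. */
    static PreparedStatement setValuesToAddTaskPS(PreparedStatement stmt, String taskType,
            String tenantId, String schemaName, String tableName) throws SQLException {
        return setValuesToAddTaskPS(stmt, taskType, tenantId, schemaName, tableName,
                null, null, null, null);
    }

    /** Full form: the only place that knows the parameter positions. */
    static PreparedStatement setValuesToAddTaskPS(PreparedStatement stmt, String taskType,
            String tenantId, String schemaName, String tableName, String taskStatus,
            String data, Integer priority, Timestamp endTs) throws SQLException {
        stmt.setString(1, taskType);
        setStringOrNull(stmt, 2, tenantId);
        setStringOrNull(stmt, 3, schemaName);
        stmt.setString(4, tableName);
        setStringOrNull(stmt, 5, taskStatus);   // position 5 is taken from the diff
        setStringOrNull(stmt, 6, data);         // positions 6 to 8 are assumed
        if (priority != null) {
            stmt.setInt(7, priority);
        } else {
            stmt.setNull(7, Types.INTEGER);
        }
        if (endTs != null) {
            stmt.setTimestamp(8, endTs);
        } else {
            stmt.setNull(8, Types.TIMESTAMP);
        }
        return stmt;
    }

    /** Binds a string, or an SQL NULL when the value is absent. */
    private static void setStringOrNull(PreparedStatement stmt, int index, String value)
            throws SQLException {
        if (value != null) {
            stmt.setString(index, value);
        } else {
            stmt.setNull(index, Types.VARCHAR);
        }
    }
}
```

Keeping all the binding in the full form means the caller no longer needs its own `if (taskStatus != null)` branch.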

gjacoby126

+1. Thanks @gokceni !

kadirozde

LGTM. +1

gjacoby126 · 2019-04-10T20:13:25Z

Closing because this was pushed via git command line

gokceni changed the title ~~Implement TaskRegionObserver for Index rebuild~~ PHOENIX-5190 Implement TaskRegionObserver for Index rebuild Mar 13, 2019

kadirozde reviewed Mar 14, 2019

gjacoby126 requested changes Mar 14, 2019

gokceni force-pushed the gokcen branch 2 times, most recently from fb432f7 to 4a4764f Compare March 22, 2019 20:18

gjacoby126 reviewed Mar 22, 2019

gokceni force-pushed the gokcen branch from 4a4764f to 0aba2cb Compare March 25, 2019 21:55

gjacoby126 approved these changes Mar 25, 2019

gokceni force-pushed the gokcen branch 4 times, most recently from 9f7eb9a to b84af26 Compare April 2, 2019 21:30

kadirozde approved these changes Apr 2, 2019

gokceni force-pushed the gokcen branch 3 times, most recently from f6516fe to bc57c47 Compare April 3, 2019 18:21

Implement TaskRegionObserver for Index rebuild

194e8ec

gokceni force-pushed the gokcen branch from bc57c47 to 194e8ec Compare April 3, 2019 18:29

gjacoby126 closed this Apr 10, 2019
