-
Notifications
You must be signed in to change notification settings - Fork 994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHOENIX-5190 Implement TaskRegionObserver for Index rebuild #457
Conversation
@gokceni - what JIRA does this correspond to? |
@gjacoby126 PHOENIX-5190 is the JIRA. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gokceni, you have done a great job in extending the existing system task framework and implementing the new use case. I have one comment on your code for removing the INDEX_NAME column.
Your changes got me thinking about improving the design further as follows:
- Map the task type to the class name of the corresponding task handler. This can be implemented as enum.
- Have one thread that goes through the system task table and maps the task type to its class, creates an instance of it with task attributes and finally calls the run method of the instance.
- Task handling classes should not know anything about the system task table or getting a phoenix connection to access its records.
I think this will simply adding new tasks and will make the code more readable.
PhoenixDatabaseMetaData.TENANT_ID + ", " + | ||
PhoenixDatabaseMetaData.TABLE_SCHEM + ", " + | ||
PhoenixDatabaseMetaData.TABLE_NAME + ", " + | ||
PhoenixDatabaseMetaData.INDEX_NAME + ", " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
INDEX_NAME should be a task specific attribute. I do not think it should be part of the generic task attributes. Either the name of the index table should be stored in the TABLE_NAME column or encoded in the DATA column. I suggest removing INDEX_NAME from from the DDL
|
||
task.run(); | ||
|
||
String viewIndexTableName = "_IDX_" + baseTable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should use the appropriate SchemaUtil method to construct the view index table name.
admin.disableTable(tableName); | ||
admin.truncateTable(tableName, false); | ||
|
||
data = "{GusWorkId:abc}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider using a constant
assertTrue( count == numOfRuns); | ||
|
||
// Check task status and other column values. | ||
ResultSet rs = conn.createStatement().executeQuery("SELECT * " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this would be useful to have as a utility function somewhere. In general it's good to break up long functions into helper functions with descriptive names when there's a discrete unit of work.
mutateSystemTaskTable(conn, stmt, accessCheckEnabled); | ||
} | ||
|
||
public static void addTask(PhoenixConnection conn, TaskType taskType, String tenantId, String schemaName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like there should be a separation of concerns here -- persisting a Task to a Phoenix table and reading it back is distinct from scheduling a task. I'd suggest breaking some of these Task classes into their own files and either fully encapsulate their persistence logic in the Tasks (the Data Access Object pattern) or have standalone persistence classes that handle the persistence (the Repository pattern).
Either way would improve the readability and reduce the coupling between the scheduling logic and the persistence logic.
Also agree with @kadirozde 's ideas to use reflection and JSON parsing to reduce the amount of boilerplate necessary to hook in each new task type for this framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gjacoby126 This is not a new method, it was already in this file. I moved it to this class. I am not seeing the patterns for other System tables. I will add a Tasks class tho.
|
||
public static String[] getArgValues(boolean directApi, boolean useSnapshot, String schemaName, | ||
String dataTable, String indxTable, String tenantId) { | ||
final List<String> args = Lists.newArrayList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is leaking IndexTool internals into this task. Would be better to provide an abstraction so you can call the IndexTool directly.
tableName = taskRecord.getTableName(); | ||
indexName = taskRecord.getIndexName(); | ||
final String[] cmdArgs = | ||
getArgValues(true, false, schemaName, tableName, indexName, tenantId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned above, it would be better to have an API in the IndexTool that could be called directly in an object-oriented fashion. The IndexTool should only have a CLI so that it can be called from a real shell command line, but here we don't need such things.
JsonParser jsonParser = new JsonParser(); | ||
JsonObject jsonObject = jsonParser.parse(data).getAsJsonObject(); | ||
if (jsonObject.has("DisableBefore")) { | ||
String disableBefore = jsonObject.get("DisableBefore").toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like logic that should be in the IndexTool itself.
public static final String TASK_STATUS = "TASK_STATUS"; | ||
public static final String TASK_END_TS = "TASK_END_TS"; | ||
public static final String PRIORITY = "PRIORITY"; | ||
public static final String DATA = "DATA"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make the constant named "TASK_DATA_COLUMN" or something along those lines to indicate what it's for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and same with PRIORITY)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will rename as TASK_DATA. The others don't have _COLUMN at the end
|
||
} catch (TableAlreadyExistsException e) { | ||
long currentServerSideTableTimeStamp = e.getTable().getTimeStamp(); | ||
if (currentServerSideTableTimeStamp <= MetaDataProtocol.MIN_SYSTEM_TABLE_TIMESTAMP_5_1_0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep in mind this change will also likely go into 4.15, so the upgrade will need to happen for either 4.15 or 5.1
@kadirozde (and @gjacoby126) for this comment: If we do it like this, don't we tie both tasks to a single thread? |
@gokceni, regarding having one main thread responsible form scanning the system table allows us to separate the system table management concerns from the individual task handing ones. Initially having one thread will be fine as we will not have many tasks. I think, when we need more threads and scheduling task based on their priorities, this would be handled within the system task framework without changing task handling classes if the concerns are separated. |
fb432f7
to
4a4764f
Compare
DropTableWithViewsIT.assertTaskColumns(conn, PTable.TaskStatus.COMPLETED.toString(), PTable.TaskType.INDEX_REBUILD, | ||
"{\"IndexName\":\""+ indexName +"\",\"TaskDetails\":\"SUCCESS\"}"); | ||
|
||
// See that index is rebuild and confirm index has rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo (should be "index is rebuilt")
initMethod.invoke(obj, env, timeMaxInterval); | ||
|
||
// Change task status to STARTED | ||
Task.addTask(connForTask, taskRecord.getTaskType(), taskRecord.getTenantId(), taskRecord.getSchemaName(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If one server runs TaskRegionObserver, and it runs this line to set a Task as STARTED, and then immediately afterward the server dies, will the next server that runs TaskRegionObserver pick up the orphaned task and actually run it? Since you only screen out FAILED and COMPLETE above, I think the answer's yes, but making sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we exclude Completed and Failed states and get anything else
" ) VALUES(?,?,?,?,?,?,?,?)"); | ||
stmt = setValuesToAddTaskPS(stmt, taskType, tenantId, schemaName, tableName); | ||
if (taskStatus != null) { | ||
stmt.setString(5, taskStatus); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason this logic shouldn't be in setValuesToAddTaskPS? (You could have an overload that took just the original 4 params that calls the full 8 param function)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Thanks @gokceni !
9f7eb9a
to
b84af26
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. +1
f6516fe
to
bc57c47
Compare
Closing because this was pushed via git command line |
No description provided.