-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-16785: Ensure replication actions are idempotent if any series of events are applied again. #194
Conversation
7fbd353
to
c173a5d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought additionally during repl load , we were going to update the parent object with the event id every time we apply a event
* This can result in a "DROP IF NEWER THAN" kind of semantic | ||
*/ | ||
public ReplicationSpec getReplicationSpec() { | ||
if (replicationSpec == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have a null check here specifically ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The null check is to handle the non-repl-load case where they directly dereference the replSpec returned. If remove null check here, then need to add one in the caller. But, I prefer to keep it here.
* This can result in a "CREATE IF NEWER THAN" kind of semantic | ||
*/ | ||
public ReplicationSpec getReplicationSpec() { | ||
if (replicationSpec == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why null check here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
@@ -37,27 +37,18 @@ | |||
throw new SemanticException("Database name cannot be null for a table load"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
contains an unsed import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will remove it.
actualDbName + "." + actualTblName, partSpec); | ||
Task<DDLWork> truncatePtnTask = | ||
TaskFactory.get( | ||
TruncateTableDesc truncateTableDesc = new TruncateTableDesc(actualDbName + "." + actualTblName, partSpec, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think code width moves beyond 100 characters, may want to format it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will format it.
"Added rename table task : {}:{}->{}", renameTableTask.getId(), oldName, newName | ||
); | ||
new DDLWork(readEntitySet, writeEntitySet, renameTableDesc), context.hiveConf); | ||
context.log.debug("Added rename table task : {}:{}->{}", renameTableTask.getId(), oldName, newName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think code width moves beyond 100 characters, may want to format it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will format it.
@@ -786,7 +786,7 @@ private void analyzeDatabaseLoad(String dbName, FileSystem fs, FileStatus dir) | |||
|
|||
Task<? extends Serializable> dbRootTask = null; | |||
if (existEmptyDb(dbName)) { | |||
AlterDatabaseDesc alterDbDesc = new AlterDatabaseDesc(dbName, dbObj.getParameters()); | |||
AlterDatabaseDesc alterDbDesc = new AlterDatabaseDesc(dbName, dbObj.getParameters(), null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldnt the alter database desc include the new replication spec param definition as well. the same should be true if the database has to be created via CreateDatabaseDesc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is bootstrap load flow. Need not check anything. Just overwrite it.
@@ -818,6 +803,20 @@ private static void createReplImportTasks( | |||
throw new SemanticException(ErrorMsg.DATABASE_NOT_EXISTS.getMsg(tblDesc.getDatabaseName())); | |||
} | |||
} | |||
|
|||
if (table != null) { | |||
if (!replicationSpec.allowReplacementInto(table)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public static boolean allowReplacement(String currReplState, String replacementReplState)
the above method seems to be not useful, why not rather have a definition like
public boolean allowReplacement(Map<String, String> forObjectWithParams)
which internally gets the corresponding state using getLastReplicatedStateFromParameters(Map<String, String> m) followed by getting the current state using this.getReplicationState()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Will make this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, just noticed, the caller pass different values for 2nd argument. So, will discuss with Sushanth and try to minimise the number of methods.
} | ||
} else { | ||
// If table doesn't exist, allow creating a new one only if the database state is older than the update. | ||
if ((parentDb != null) && (!replicationSpec.allowReplacementInto(parentDb))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems very improbable that the parentDB could be null ?
also the more i am looking at code the call replicationSpec.allowReplacementInto(parentDb) leads to negative equality, rather than this why not have replicationSpec.isOlderThan(). this way the condition above and other places will not have to do a negative of the conditional check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parentDB can be null if a createDbTask is in the head of this table import tasks. In this case, we shouldn't fail and safe to assume that we go through bootstrap load.
Some of these implementations are in the base code which has allowReplacementInto and allowEventReplacementInto methods one check for current_state_id and other checks against event_id. Probably they are there to support replv1. So, will need to find appropriate names for both the above methods Please suggest if you have any good names else we will keep the same. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my suggestion was replicationSpec.isOlderThan() as in the comment above, not sure if i missed something there,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have 2 sets of methods. One is allowReplacementInto which checks the object's ID against current_state_id. There is another one allowEventReplcacementInto which checks it against event_id. Both are different. isOlderThan cannot replace both methods. So, need to find 2 appropriate methods. May be isOlderThan and isEventOlderThan, but that doesn't seem good names. So, just want to know if you have any better suggestions for these names.
@@ -2741,7 +2741,7 @@ private void analyzeAlterTableRenamePart(ASTNode ast, String tblName, | |||
partSpecs.add(oldPartSpec); | |||
partSpecs.add(newPartSpec); | |||
addTablePartsOutputs(tab, partSpecs, WriteEntity.WriteType.DDL_EXCLUSIVE); | |||
RenamePartitionDesc renamePartitionDesc = new RenamePartitionDesc(tblName, oldPartSpec, newPartSpec); | |||
RenamePartitionDesc renamePartitionDesc = new RenamePartitionDesc(tblName, oldPartSpec, newPartSpec, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like on the dump side we are not passing in the event id via the replication spec, but on the load side we use the event id from the global dump metadata file, while i think this will work for incremental replication, This does not seem right , given the fact that we did specific work where in we get the event id + db object on the primary warehouse during bootstrap, we should propagate the same id to make sure we are not associating different event ids on the load side. I am not sure if that is already happening, given the construction of objects on the dump side. Looks like the state id being an implicit part of the "Desc" Classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In bootstrap load case, we just load table by table where we set the current_state_id to each objects we operate. But, the id we used is same for all objects. However, the bootstrap problem is not in the scope of this JIRA. Please create a JIRA for this issue.
Also, the DESC classes are used for both real load flow and normal DDL operations too. For normal operations, we shouldn't check the event ID. Hence passing null.
If I didn't understand your comment correctly, please revert to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please create relevant jira's that you think are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was wrong about this point earlier. We set the proper current_state_id in ReplicationSpec for each object. Also, we propagate the same ID in the load side as well. So, there isn't any issue.
} | ||
|
||
@Test | ||
public void testIncrementalRepeatEventOnMissingObject() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to separate partitioned vs unpartitioned in separate tests. I also thought you were going to move to the TestReplicationScenariosAcrossInstances
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this test I don't think appropriate to separate for partitioned and unpartitioned. Just care about the sequence of operations. Also, I need multiple tables to verify the change.
About moving to TestReplicationScenariosAcrossInstances, I think, the idea is deferred due to cmroot file system issues and we don't have any plan to fix that now. Please create a JIRA and will plan it in future.
I thought additionally during repl load , we were going to update the parent object with the event id every time we apply a event For this comment, we will do it as part of another JIRA BUG-81396 (No apache jira yet) which focus on failure case handling. Updating object state after event makes sense for failure handling only. So, appropriate to handle there. |
5a14643
to
2a427c1
Compare
23fa8d0
to
ad1841a
Compare
…f events are applied again.
…dded few debug logs
Already committed to master. |
Idempotent behaviour for all events during incremental load.