Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-16785: Ensure replication actions are idempotent if any series of events are applied again. #194

Closed
wants to merge 2 commits into from

Conversation

sankarh
Copy link
Contributor

@sankarh sankarh commented Jun 12, 2017

Idempotent behaviour for all events during incremental load.

@sankarh sankarh force-pushed the HIVE-16785 branch 3 times, most recently from 7fbd353 to c173a5d Compare June 16, 2017 08:22
Copy link
Contributor

@anishek anishek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought additionally during repl load , we were going to update the parent object with the event id every time we apply a event

* This can result in a "DROP IF NEWER THAN" kind of semantic
*/
public ReplicationSpec getReplicationSpec() {
if (replicationSpec == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have a null check here specifically ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The null check is to handle the non-repl-load case where they directly dereference the replSpec returned. If remove null check here, then need to add one in the caller. But, I prefer to keep it here.

* This can result in a "CREATE IF NEWER THAN" kind of semantic
*/
public ReplicationSpec getReplicationSpec() {
if (replicationSpec == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why null check here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@@ -37,27 +37,18 @@
throw new SemanticException("Database name cannot be null for a table load");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contains an unsed import

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will remove it.

actualDbName + "." + actualTblName, partSpec);
Task<DDLWork> truncatePtnTask =
TaskFactory.get(
TruncateTableDesc truncateTableDesc = new TruncateTableDesc(actualDbName + "." + actualTblName, partSpec,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think code width moves beyond 100 characters, may want to format it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will format it.

"Added rename table task : {}:{}->{}", renameTableTask.getId(), oldName, newName
);
new DDLWork(readEntitySet, writeEntitySet, renameTableDesc), context.hiveConf);
context.log.debug("Added rename table task : {}:{}->{}", renameTableTask.getId(), oldName, newName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think code width moves beyond 100 characters, may want to format it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will format it.

@@ -786,7 +786,7 @@ private void analyzeDatabaseLoad(String dbName, FileSystem fs, FileStatus dir)

Task<? extends Serializable> dbRootTask = null;
if (existEmptyDb(dbName)) {
AlterDatabaseDesc alterDbDesc = new AlterDatabaseDesc(dbName, dbObj.getParameters());
AlterDatabaseDesc alterDbDesc = new AlterDatabaseDesc(dbName, dbObj.getParameters(), null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldnt the alter database desc include the new replication spec param definition as well. the same should be true if the database has to be created via CreateDatabaseDesc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is bootstrap load flow. Need not check anything. Just overwrite it.

@@ -818,6 +803,20 @@ private static void createReplImportTasks(
throw new SemanticException(ErrorMsg.DATABASE_NOT_EXISTS.getMsg(tblDesc.getDatabaseName()));
}
}

if (table != null) {
if (!replicationSpec.allowReplacementInto(table)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public static boolean allowReplacement(String currReplState, String replacementReplState)
the above method seems to be not useful, why not rather have a definition like
public boolean allowReplacement(Map<String, String> forObjectWithParams)
which internally gets the corresponding state using getLastReplicatedStateFromParameters(Map<String, String> m) followed by getting the current state using this.getReplicationState()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Will make this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, just noticed, the caller pass different values for 2nd argument. So, will discuss with Sushanth and try to minimise the number of methods.

}
} else {
// If table doesn't exist, allow creating a new one only if the database state is older than the update.
if ((parentDb != null) && (!replicationSpec.allowReplacementInto(parentDb))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems very improbable that the parentDB could be null ?

also the more i am looking at code the call replicationSpec.allowReplacementInto(parentDb) leads to negative equality, rather than this why not have replicationSpec.isOlderThan(). this way the condition above and other places will not have to do a negative of the conditional check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parentDB can be null if a createDbTask is in the head of this table import tasks. In this case, we shouldn't fail and safe to assume that we go through bootstrap load.

Some of these implementations are in the base code which has allowReplacementInto and allowEventReplacementInto methods one check for current_state_id and other checks against event_id. Probably they are there to support replv1. So, will need to find appropriate names for both the above methods Please suggest if you have any good names else we will keep the same. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my suggestion was replicationSpec.isOlderThan() as in the comment above, not sure if i missed something there,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have 2 sets of methods. One is allowReplacementInto which checks the object's ID against current_state_id. There is another one allowEventReplcacementInto which checks it against event_id. Both are different. isOlderThan cannot replace both methods. So, need to find 2 appropriate methods. May be isOlderThan and isEventOlderThan, but that doesn't seem good names. So, just want to know if you have any better suggestions for these names.

@@ -2741,7 +2741,7 @@ private void analyzeAlterTableRenamePart(ASTNode ast, String tblName,
partSpecs.add(oldPartSpec);
partSpecs.add(newPartSpec);
addTablePartsOutputs(tab, partSpecs, WriteEntity.WriteType.DDL_EXCLUSIVE);
RenamePartitionDesc renamePartitionDesc = new RenamePartitionDesc(tblName, oldPartSpec, newPartSpec);
RenamePartitionDesc renamePartitionDesc = new RenamePartitionDesc(tblName, oldPartSpec, newPartSpec, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like on the dump side we are not passing in the event id via the replication spec, but on the load side we use the event id from the global dump metadata file, while i think this will work for incremental replication, This does not seem right , given the fact that we did specific work where in we get the event id + db object on the primary warehouse during bootstrap, we should propagate the same id to make sure we are not associating different event ids on the load side. I am not sure if that is already happening, given the construction of objects on the dump side. Looks like the state id being an implicit part of the "Desc" Classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In bootstrap load case, we just load table by table where we set the current_state_id to each objects we operate. But, the id we used is same for all objects. However, the bootstrap problem is not in the scope of this JIRA. Please create a JIRA for this issue.

Also, the DESC classes are used for both real load flow and normal DDL operations too. For normal operations, we shouldn't check the event ID. Hence passing null.

If I didn't understand your comment correctly, please revert to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please create relevant jira's that you think are needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was wrong about this point earlier. We set the proper current_state_id in ReplicationSpec for each object. Also, we propagate the same ID in the load side as well. So, there isn't any issue.

}

@Test
public void testIncrementalRepeatEventOnMissingObject() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to separate partitioned vs unpartitioned in separate tests. I also thought you were going to move to the TestReplicationScenariosAcrossInstances

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this test I don't think appropriate to separate for partitioned and unpartitioned. Just care about the sequence of operations. Also, I need multiple tables to verify the change.

About moving to TestReplicationScenariosAcrossInstances, I think, the idea is deferred due to cmroot file system issues and we don't have any plan to fix that now. Please create a JIRA and will plan it in future.

@sankarh
Copy link
Contributor Author

sankarh commented Jun 16, 2017

I thought additionally during repl load , we were going to update the parent object with the event id every time we apply a event

For this comment, we will do it as part of another JIRA BUG-81396 (No apache jira yet) which focus on failure case handling. Updating object state after event makes sense for failure handling only. So, appropriate to handle there.

@sankarh sankarh force-pushed the HIVE-16785 branch 8 times, most recently from 5a14643 to 2a427c1 Compare June 22, 2017 13:19
@sankarh sankarh force-pushed the HIVE-16785 branch 2 times, most recently from 23fa8d0 to ad1841a Compare June 28, 2017 07:46
@sankarh
Copy link
Contributor Author

sankarh commented Jun 29, 2017

Already committed to master.

@sankarh sankarh closed this Jun 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants