New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-20517 : Creation of staging directory and Move operation is taking time in S3 #430
Conversation
75c506d
to
b2aa892
Compare
@@ -489,6 +489,11 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal | |||
+ "used in conjunction with 'hive.repl.dump.metadata.only' set to false. if 'hive.repl.dump.metadata.only' \n" | |||
+ " is set to true then this config parameter has no effect as external table meta data is flushed \n" | |||
+ " always by default."), | |||
REPL_ENABLE_MOVE_OPTIMIZATION("hive.repl.enable.move.optimization", false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, highlight the side effect of setting this flag to true in HDFS system where incremental repl may lead to partial data read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -489,6 +489,11 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal | |||
+ "used in conjunction with 'hive.repl.dump.metadata.only' set to false. if 'hive.repl.dump.metadata.only' \n" | |||
+ " is set to true then this config parameter has no effect as external table meta data is flushed \n" | |||
+ " always by default."), | |||
REPL_ENABLE_MOVE_OPTIMIZATION("hive.repl.enable.move.optimization", false, | |||
"If its set to true, replcopy task copy the source files directly to the destination path. \n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall rephrase the statement as "If its set to true, REPL LOAD copies data files directly to the target table/partition location instead of copying to staging directory first and then move to target location. This optimize the REPL LOAD on object data stores such as S3 or WASB where creating a directory and move files are costly operations."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
run("CREATE TABLE " + tableNamePart + " (fld int) partitioned by (part int) STORED AS TEXTFILE", driver); | ||
|
||
run("insert into " + tableNameNoPart + " values (1) ", driver); | ||
run("insert into " + tableNameNoPart + " values (2) ", driver); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall just limit to 1 row.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for partitioned table first one goes in add partition flow and second one in insert flow
|
||
run("insert into " + tableNameNoPart + " values (1) ", driver); | ||
run("insert into " + tableNameNoPart + " values (2) ", driver); | ||
verifySetup("SELECT fld from " + tableNameNoPart , new String[]{ "1" , "2" }, driver); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
verifySetup can be removed as they are generic scenarios tested by many other test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
verifyRun("SELECT fld from " + tableNamePart + " where part = 10" , new String[]{ "1" , "2"}, driverMirror); | ||
verifyRun("SELECT fld from " + tableNamePart + " where part = 11" , new String[]{ "3" }, driverMirror); | ||
verifyRun("SELECT fld from " + tableNameNoPart , new String[]{ "1" , "2" }, driverMirror); | ||
verifyRun("SELECT count(*) from " + tableNamePart , new String[]{ "3"}, driverMirror); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall remove all count(*) queries as they are already verified with values query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
count(*) for check if stats are updated or not
@@ -80,6 +84,8 @@ | |||
public static final String FUNCTIONS_ROOT_DIR_NAME = "_functions"; | |||
public static final String CONSTRAINTS_ROOT_DIR_NAME = "_constraints"; | |||
|
|||
private static final List<String> CLOUD_SCHEME_PREFIXES = Arrays.asList("s3a", "wasb"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok to hardcode this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thats the only way now
withinContext.replicationSpec.setIsReplace(true); | ||
// In case of ACID operations, same directory may have many other sub directory for different write id stmt id | ||
// combination. So we can not set isreplace to true. | ||
withinContext.replicationSpec.setIsReplace(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But REPL LOAD copies to sub-directory directly not base dir and so replace flag to true make sense in case of retries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a check to remove stale files during copy , if the copy is done from cm path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this change for replace flag can be reverted right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no ..for acid tables the files need not be replaced
@@ -49,6 +49,8 @@ | |||
// If set to false, it'll behave as a traditional CopyTask. | |||
protected boolean readSrcAsFilesList = false; | |||
|
|||
protected boolean deleteDestIfExist = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it protected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is used by replcopy task
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then why do we have public methods to expose this? Can be either private member + public methods or just protected if accessed by extended classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to private
|
||
if (replicationSpec.isInReplicationScope() && | ||
x.getCtx().getConf().getBoolean(REPL_ENABLE_MOVE_OPTIMIZATION.varname, false)) { | ||
lft = LoadFileType.IGNORE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we skip move task instead of IGNORE type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that can be done only if its a add partition ..for insert operation move is required ..anyways thats handled in a separate patch
Path destPath; | ||
if (replicationSpec.isInReplicationScope() && | ||
x.getCtx().getConf().getBoolean(REPL_ENABLE_MOVE_OPTIMIZATION.varname, false)) { | ||
loadFileType = LoadFileType.IGNORE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we skip move task instead of IGNORE type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move task is required for statistics
a57ce7b
to
f3cf0bd
Compare
@@ -489,6 +489,12 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal | |||
+ "used in conjunction with 'hive.repl.dump.metadata.only' set to false. if 'hive.repl.dump.metadata.only' \n" | |||
+ " is set to true then this config parameter has no effect as external table meta data is flushed \n" | |||
+ " always by default."), | |||
REPL_ENABLE_MOVE_OPTIMIZATION("hive.repl.enable.move.optimization", false, | |||
"If its set to true, REPL LOAD copies data files directly to the target table/partition location \n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall remove all \n in the message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some of the parameters are there with \n ?
@@ -1321,6 +1322,106 @@ public Boolean apply(@Nullable CallerArguments args) { | |||
.verifyResult(replicatedDbName + ".testFunctionOne"); | |||
} | |||
|
|||
@Test | |||
public void testMoveOptimizationBootstrapReplLoadRetryAfterFailureFor() throws Throwable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall remove "For" from test name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
}; | ||
|
||
InjectableBehaviourObjectStore.setAddNotificationModifier(callerVerifier); | ||
replica.loadFailure(replicadb, tuple.dumpLocation, withConfigs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put in try catch block and call InjectableBehaviourObjectStore.resetAddNotificationModifier in finally block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
replica.load(replicadb, tuple.dumpLocation, withConfigs); | ||
|
||
replica.run("use " + replicadb) | ||
.run("select country from " + tbl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be select place from tbl where country='india' to we know if the data file is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
try { | ||
Hive hiveDb = Hive.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use getHive() of Task class to consider the Task conf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
withinContext.replicationSpec.setIsReplace(true); | ||
// In case of ACID operations, same directory may have many other sub directory for different write id stmt id | ||
// combination. So we can not set isreplace to true. | ||
withinContext.replicationSpec.setIsReplace(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this change for replace flag can be reverted right?
OVERWRITE_EXISTING | ||
OVERWRITE_EXISTING, | ||
/** | ||
* No need to rename the file, used in case of replication to s3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"No need to move the file".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -49,6 +49,8 @@ | |||
// If set to false, it'll behave as a traditional CopyTask. | |||
protected boolean readSrcAsFilesList = false; | |||
|
|||
protected boolean deleteDestIfExist = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then why do we have public methods to expose this? Can be either private member + public methods or just protected if accessed by extended classes.
Boolean success = addNotificationEventModifier.apply(entry); | ||
if ((success != null) && !success) { | ||
throw new MetaException("InjectableBehaviourObjectStore: Invalid addNotificationEvent operation on DB: " | ||
+ entry.getDbName() + " table: " + entry.getTableName() + " event : " + entry.getEventType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DB and table name need not be set always. shall use String.valueOf(entry.getDbName()) as it return "null" string if null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
printing null is fine i think
if ((success != null) && !success) { | ||
throw new MetaException("InjectableBehaviourObjectStore: Invalid addNotificationEvent operation on DB: " | ||
+ entry.getDbName() + " table: " + entry.getTableName() + " event : " + entry.getEventType()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to call super.addNotificationEvent(entry); if modifier method returns true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
f3cf0bd
to
9bffb7c
Compare
…ng time in S3 : review comment fix
9bffb7c
to
7af4eab
Compare
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
No description provided.