-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-16813: Incremental REPL LOAD should load the events in the same sequence as it is dumped. #192
Conversation
|
||
import org.apache.hadoop.fs.FileStatus; | ||
import org.apache.hadoop.fs.Path; | ||
import org.apache.hadoop.hive.ql.parse.repl.load.EventDumpDirComparator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the import is not required
setGetNextNotificationBehaviour(null); | ||
} | ||
|
||
public static com.google.common.base.Function<NotificationEventResponse,NotificationEventResponse> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The get methods are just there to compliment the set methods and may be needed in future. It can be removed as well.
run("INSERT INTO TABLE " + dbName + ".unptned values('" + Integer.toString(i) + "')"); | ||
} | ||
|
||
verifySetup("SELECT count(a) from " + dbName + ".unptned", new String[] {"5"}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
verify of basic sql commands not required, adding this only one place and not others please review the whole test when removing these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove only in the new test added. The proper cleanup in all the other tests can be done when we move it to TestReplicationScenariosAcrossInstances with another JIRA else it will be duplicate effort.
@@ -0,0 +1,30 @@ | |||
package org.apache.hadoop.hive.ql.parse.repl.load; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need license here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
dirList[i] = new FileStatus(5, true, 1, 64, 100, | ||
new Path("hdfs://tmp/"+Integer.toString(i-1))); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setup seems complicated :)
int size = 30;
FileStatus[] dirList = new FileStatus[size];
for(int i = 0; i<size;i++){
dirList[i] = new FileStatus(5, true, 1, 64, 100,
new Path("hdfs://tmp/"+Integer.toString(size-i)));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried to have values in sequence 1, 0, 3, 2, 5, 4.
Your method should work as well.
advanceDumpDir(); | ||
String cmd = "REPL DUMP " + dbName + " FROM " + replDumpId; | ||
if (metaStoreClient.getCurrentNotificationEventId().getEventId() < 150) { | ||
cmd = cmd + " TO 150"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why we need the if clause ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't specify the TO clause, then internally we set it to currentNotificationEventId from metastore. If the currentNotificationEventId is less than 100, then test will fail. This can happen randomly. So, just having this if clause to be safe.
@@ -589,6 +590,72 @@ public void testIncrementalAdds() throws IOException { | |||
} | |||
|
|||
@Test | |||
public void testIncrementalLoadWithVariableLengthEventId() throws IOException, TException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
truncate(event id 99) followed by 1 insert(event id 100) would be sufficient ?
there are multiple events and iterator is from 90 might not required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, if the number of events for TRUNCATE changes in future, then this test may not hold good. So, better to have several events to ensure it doesn't fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the test is doing insert followed by truncate such that truncate gets a id more than 99. So it depends on number of events that insert generates, given minimum it should be 1 event atleast, starting at 99 for a insert will be good and truncate will move to events above 99. so 1 insert + 1 truncate should be enough, since truncate can have as many events associated with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work now as INSERT generates 3 events ALTER+INSERT+ALTER. So, if we set 99 as start ID and have one insert, then insert will cross the boundary of 100. If we set start as 98, then it will be problem if in future we reduce the number of events for insert. So, I feel, it is better to keep 5 inserts for safer side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on what alter(99) does. If it is just stats change (most probably for insert), then the impact won't be visible as truncate was the last operation of data files and hence select will return 0 records.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well if we cant detect the affect of alter(99) then might be better to write the test more tightly coupled with the number of events that insert generates ( 3 here ) . as number of events per insert might be more than 3 say about 7, with first 5 doing things around stat and as such then we face the same problem as with my previous example, but with the code below. one way to make it more robust might be to, in the behavior injection for event id we explicitly check for the truncate event and give it and id of 100 with all else being an increment from 1, ofcourse assuming truncate emits one event else we figure out all events for truncate to be 100+counter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think, we will increase the number of events for INSERT in future. But, still I like the idea of setting TRUNCATE event ID as 100 and others starts from 1. Will try and see if any issues with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think. this test will have other issues as well. If we set to 1 (anything less than lastReplId of bootstrap), then, it is possible that old events gets applied twice due to filter logic and cause inconsistency. Need to see, if this test case can be removed as it doesn't make sense in future. However, I add unit test to verify the sort order of event IDs with new comparator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, the change worked. The new patch is submitted. Pls review.
a3bcf09
to
83050f7
Compare
…sequence as it is dumped.
Already committed to master. |
No description provided.