Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-16813: Incremental REPL LOAD should load the events in the same sequence as it is dumped. #192

Closed
wants to merge 2 commits into from

Conversation

sankarh
Copy link
Contributor

@sankarh sankarh commented Jun 5, 2017

No description provided.


import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.parse.repl.load.EventDumpDirComparator;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the import is not required

setGetNextNotificationBehaviour(null);
}

public static com.google.common.base.Function<NotificationEventResponse,NotificationEventResponse>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get methods are just there to compliment the set methods and may be needed in future. It can be removed as well.

run("INSERT INTO TABLE " + dbName + ".unptned values('" + Integer.toString(i) + "')");
}

verifySetup("SELECT count(a) from " + dbName + ".unptned", new String[] {"5"});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verify of basic sql commands not required, adding this only one place and not others please review the whole test when removing these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove only in the new test added. The proper cleanup in all the other tests can be done when we move it to TestReplicationScenariosAcrossInstances with another JIRA else it will be duplicate effort.

@@ -0,0 +1,30 @@
package org.apache.hadoop.hive.ql.parse.repl.load;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need license here too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

dirList[i] = new FileStatus(5, true, 1, 64, 100,
new Path("hdfs://tmp/"+Integer.toString(i-1)));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setup seems complicated :)

int size = 30;
FileStatus[] dirList = new FileStatus[size];

for(int i = 0; i<size;i++){
  dirList[i] = new FileStatus(5, true, 1, 64, 100,
      new Path("hdfs://tmp/"+Integer.toString(size-i)));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried to have values in sequence 1, 0, 3, 2, 5, 4.
Your method should work as well.

advanceDumpDir();
String cmd = "REPL DUMP " + dbName + " FROM " + replDumpId;
if (metaStoreClient.getCurrentNotificationEventId().getEventId() < 150) {
cmd = cmd + " TO 150";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why we need the if clause ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't specify the TO clause, then internally we set it to currentNotificationEventId from metastore. If the currentNotificationEventId is less than 100, then test will fail. This can happen randomly. So, just having this if clause to be safe.

@@ -589,6 +590,72 @@ public void testIncrementalAdds() throws IOException {
}

@Test
public void testIncrementalLoadWithVariableLengthEventId() throws IOException, TException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

truncate(event id 99) followed by 1 insert(event id 100) would be sufficient ?

there are multiple events and iterator is from 90 might not required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, if the number of events for TRUNCATE changes in future, then this test may not hold good. So, better to have several events to ensure it doesn't fail

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the test is doing insert followed by truncate such that truncate gets a id more than 99. So it depends on number of events that insert generates, given minimum it should be 1 event atleast, starting at 99 for a insert will be good and truncate will move to events above 99. so 1 insert + 1 truncate should be enough, since truncate can have as many events associated with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work now as INSERT generates 3 events ALTER+INSERT+ALTER. So, if we set 99 as start ID and have one insert, then insert will cross the boundary of 100. If we set start as 98, then it will be problem if in future we reduce the number of events for insert. So, I feel, it is better to keep 5 inserts for safer side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on what alter(99) does. If it is just stats change (most probably for insert), then the impact won't be visible as truncate was the last operation of data files and hence select will return 0 records.

Copy link
Contributor

@anishek anishek Jun 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well if we cant detect the affect of alter(99) then might be better to write the test more tightly coupled with the number of events that insert generates ( 3 here ) . as number of events per insert might be more than 3 say about 7, with first 5 doing things around stat and as such then we face the same problem as with my previous example, but with the code below. one way to make it more robust might be to, in the behavior injection for event id we explicitly check for the truncate event and give it and id of 100 with all else being an increment from 1, ofcourse assuming truncate emits one event else we figure out all events for truncate to be 100+counter

Copy link
Contributor Author

@sankarh sankarh Jun 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think, we will increase the number of events for INSERT in future. But, still I like the idea of setting TRUNCATE event ID as 100 and others starts from 1. Will try and see if any issues with this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think. this test will have other issues as well. If we set to 1 (anything less than lastReplId of bootstrap), then, it is possible that old events gets applied twice due to filter logic and cause inconsistency. Need to see, if this test case can be removed as it doesn't make sense in future. However, I add unit test to verify the sort order of event IDs with new comparator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, the change worked. The new patch is submitted. Pls review.

@sankarh sankarh force-pushed the HIVE-16813 branch 5 times, most recently from a3bcf09 to 83050f7 Compare June 8, 2017 19:47
@sankarh
Copy link
Contributor Author

sankarh commented Jun 9, 2017

Already committed to master.

@sankarh sankarh closed this Jun 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants