Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support create and replace transactions in Catalog #362

Merged
merged 1 commit into from Aug 26, 2019

Conversation

@aokolnychyi
Copy link
Contributor

commented Aug 7, 2019

This PR adds support for create and replace transactions in Catalog and resolves #261.

* @return a {@link Transaction} to create the table
* @throws AlreadyExistsException if the table already exists
*/
Transaction newCreateTableTransaction(

This comment has been minimized.

Copy link
@aokolnychyi

aokolnychyi Aug 7, 2019

Author Contributor

I am a bit concerned about the number of methods we will have in Catalog. So, I did not include overloaded helper methods as we have for create table statements. Let me know what other people think.

This comment has been minimized.

Copy link
@rdblue

rdblue Aug 7, 2019

Contributor

I'd rather keep them in, but it should be fine this way. We can always add them later.

This comment has been minimized.

Copy link
@aokolnychyi

aokolnychyi Aug 15, 2019

Author Contributor

I'll add them

@aokolnychyi aokolnychyi force-pushed the aokolnychyi:catalog-create-replace-txn branch 2 times, most recently from b903714 to 95ac3e5 Aug 24, 2019
@aokolnychyi

This comment has been minimized.

Copy link
Contributor Author

commented Aug 24, 2019

@rdblue ready for another review round

@Override
public void commit(TableMetadata base, TableMetadata metadata) {
doCommit(base, metadata);
requestRefresh();

This comment has been minimized.

Copy link
@rdblue

rdblue Aug 24, 2019

Contributor

Being able to ensure this is called is an unexpected benefit!

if (base == null && !orCreate) {
throw new NoSuchTableException("Table doesn't exist");
} else if (base == null) {
commitCreateTransaction();

This comment has been minimized.

Copy link
@rdblue

rdblue Aug 24, 2019

Contributor

I don't think this is correct because commitCreateTransaction won't succeed if the table is created concurrently, but create or replace should succeed.

This comment has been minimized.

Copy link
@aokolnychyi

aokolnychyi Aug 24, 2019

Author Contributor

Should we use replaceMetadata to get the number of retries if base is null?

This comment has been minimized.

Copy link
@aokolnychyi

aokolnychyi Aug 24, 2019

Author Contributor

If base is null, I think we can either use replaceMetadata instead of base and keep the existing try-catch or we can actually call commitCreateTransaction wrapped in another try first. If we get AlreadyExistsException, we can call the second one, which will replace the table.

This comment has been minimized.

Copy link
@rdblue

rdblue Aug 24, 2019

Contributor

Isn't base going to be reinitialized correctly just before the commit? I think the right place to do this is in retry.

This comment has been minimized.

Copy link
@aokolnychyi

aokolnychyi Aug 25, 2019

Author Contributor

We need to know how many retries to use, etc. We cannot configure Tasks as base is null.

Tasks.foreach(ops)
     .retry(base.propertyAsInt(COMMIT_NUM_RETRIES, COMMIT_NUM_RETRIES_DEFAULT))
     .exponentialBackoff(
         base.propertyAsInt(COMMIT_MIN_RETRY_WAIT_MS, COMMIT_MIN_RETRY_WAIT_MS_DEFAULT),
         base.propertyAsInt(COMMIT_MAX_RETRY_WAIT_MS, COMMIT_MAX_RETRY_WAIT_MS_DEFAULT),
         base.propertyAsInt(COMMIT_TOTAL_RETRY_TIME_MS, COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT),
         2.0 /* exponential */)
     .onlyRetryOn(CommitFailedException.class)
     .run(underlyingOps -> {
       ...
     }

This comment has been minimized.

Copy link
@rdblue

rdblue Aug 25, 2019

Contributor

In that case, I think we should use the settings from the current metadata instead of base.

This comment has been minimized.

Copy link
@aokolnychyi

aokolnychyi Aug 25, 2019

Author Contributor

Done

Assert.assertTrue("Table should be created", catalog.tableExists(TABLE_IDENTIFIER));

// expect the transaction to fail
txn.commitTransaction();

This comment has been minimized.

Copy link
@rdblue

rdblue Aug 24, 2019

Contributor

I prefer to use TestHelpers.assertThrows for cases like this instead of the ExpectedException rule because I think what is happening is more clear in the test and so that you can make more assertions after the exception is thrown to validate that the conflicting table has not changed.

This comment has been minimized.

Copy link
@rdblue

rdblue Aug 24, 2019

Contributor

This is minor, so don't worry about it if it would be a lot of work to convert.

This comment has been minimized.

Copy link
@aokolnychyi

aokolnychyi Aug 24, 2019

Author Contributor

Let me consume the test jar for iceberg-api in iceberg-hive then.

This comment has been minimized.

Copy link
@aokolnychyi

aokolnychyi Aug 24, 2019

Author Contributor

Wait, what is the difference between AssertHelpers and TestHelpers?

This comment has been minimized.

Copy link
@rdblue

rdblue Aug 24, 2019

Contributor

AssertHelpers is the one in core, so probably use that one.

Before we were producing test Jars to share code, we copied this in a few places. Should probably clean that up.

.set("prop", "value")
.commit();
// expect the transaction to fail
txn.commitTransaction();

This comment has been minimized.

Copy link
@rdblue

rdblue Aug 24, 2019

Contributor

Shouldn't the exception be thrown immediately when the transaction is created?


@Test
public void testCreateOrReplaceTableTxnTableNotExists() {
Transaction txn = catalog.newReplaceTableTransaction(TABLE_IDENTIFIER, SCHEMA, SPEC, true);

This comment has been minimized.

Copy link
@rdblue

rdblue Aug 24, 2019

Contributor

Minor: this should assert that the table doesn't exist to ensure it is testing the right behavior.

This comment has been minimized.

Copy link
@aokolnychyi

aokolnychyi Aug 25, 2019

Author Contributor

Added the check above

@aokolnychyi aokolnychyi force-pushed the aokolnychyi:catalog-create-replace-txn branch from 95ac3e5 to b340a29 Aug 25, 2019
@@ -184,6 +183,9 @@ public void commit(TableMetadata base, TableMetadata metadata) {
});
}
threw = false;
} catch (org.apache.hadoop.hive.metastore.api.AlreadyExistsException e) {

This comment has been minimized.

Copy link
@aokolnychyi

aokolnychyi Aug 25, 2019

Author Contributor

This part requires attention. This change was needed to always throw AlreadyExistsException instead of RuntimeException when tables are created concurrently.

@aokolnychyi aokolnychyi force-pushed the aokolnychyi:catalog-create-replace-txn branch from b340a29 to 122528a Aug 26, 2019
@@ -66,6 +67,35 @@ public int currentVersion() {
return version;
}

@Override
public TableMetadata refresh() {

This comment has been minimized.

Copy link
@rdblue

rdblue Aug 26, 2019

Contributor

FYI @moulimukherjee and @aokolnychyi: Mouli just added docs for implementing your own BaseMetastoreTableOperations. We'll want to update those for this change to override doRefresh at some point, depending on when this makes it in.

@rdblue

This comment has been minimized.

Copy link
Contributor

commented Aug 26, 2019

+1

Thanks for fixing this, @aokolnychyi!

@rdblue rdblue merged commit d7c7d0f into apache:master Aug 26, 2019
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@rdblue rdblue added this to the Java 0.1.0 Release milestone Aug 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.