Core: Add REST API for committing changes against multiple tables #7569

Merged
merged 1 commit into from Jun 19, 2023

Contributor

@nastra nastra commented May 9, 2023

The main purpose of this PR is to extract the REST-specific changes out of #6948

@github-actions github-actions bot added the core label May 9, 2023
@nastra nastra force-pushed the rest-spec-for-multi-table-diff branch from ee59679 to 143bf62 Compare May 9, 2023 14:15
@nastra nastra marked this pull request as draft May 9, 2023 14:17
@nastra nastra force-pushed the rest-spec-for-multi-table-diff branch 3 times, most recently from fa5dcc7 to 353277d Compare May 10, 2023 07:24
@@ -1756,6 +1871,30 @@ components:
items:
$ref: '#/components/schemas/TableUpdate'

CommitTransactionTableRequest:
Contributor Author

CommitTableRequest does already exist in this file unfortunately, hence naming it CommitTransactionTableRequest

Contributor

I think we could refactor to work around this. Could we just use a list of CommitTableRequest?

Contributor Author

the existing CommitTableRequest doesn't have an identifier field unfortunately

@nastra nastra force-pushed the rest-spec-for-multi-table-diff branch from 353277d to 8c17db8 Compare May 10, 2023 08:30
@nastra nastra marked this pull request as ready for review May 10, 2023 08:31
$ref: '#/components/responses/UnauthorizedResponse'
403:
$ref: '#/components/responses/ForbiddenResponse'
404:
Contributor

Does this make sense with multiple tables? The route exists. Maybe this is a bad request?

Contributor Author

@nastra nastra May 11, 2023

The only case where I thought it might make sense to throw a 404 is when the server gets changes for a table that doesn't exist (anymore). Alternatively we could handle this case via a general CommitFailedException, but CommitFailedException is retryable and I don't think we'd want to retry in this particular case

Contributor

I'm undecided. On one hand, that would already work and we wouldn't need to find a response code for "concurrent delete" that avoids the retry. On the other, the route doesn't represent that table.

I think I'm inclined to go with what you have here right now. It makes sense and we can always deprecate its use later.
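
To make the trade-off above concrete, here is a minimal sketch of client-side retry logic that retries a retryable commit failure but lets a missing-table error propagate immediately. The exception types `CommitFailed` and `NoSuchTable` and the helper `commitWithRetry` are hypothetical stand-ins for illustration; they are not the classes in this PR.

```java
// Hypothetical exception types for illustration; not Iceberg's own classes.
final class CommitFailed extends RuntimeException {
  CommitFailed(String message) {
    super(message);
  }
}

final class NoSuchTable extends RuntimeException {
  NoSuchTable(String message) {
    super(message);
  }
}

final class CommitRetry {
  private CommitRetry() {}

  // Runs the commit attempt, retrying only when it fails with CommitFailed.
  // Returns the number of attempts that were made.
  static int commitWithRetry(Runnable attempt, int maxAttempts) {
    for (int n = 1; ; n++) {
      try {
        attempt.run();
        return n;
      } catch (CommitFailed e) {
        // Retryable conflict: give up only once the attempt budget is used.
        if (n >= maxAttempts) {
          throw e;
        }
      }
      // NoSuchTable is deliberately not caught, so a table that was dropped
      // concurrently fails the commit on the first attempt.
    }
  }
}
```

Under this assumption, reporting a dropped table as a 404-style error avoids pointless retries, which is the behavior the comment above argues for.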

@nastra nastra force-pushed the rest-spec-for-multi-table-diff branch 3 times, most recently from b3fcfd3 to 6a7f07a Compare May 16, 2023 16:23
@nastra nastra requested a review from rdblue May 16, 2023 16:23
public interface TableCommit {
TableIdentifier identifier();

TableMetadata base();
Contributor

As I noted above, I think it would be better to use requirements and updates instead of mixing the updates with a TableMetadata.

I think there is a drawback to that approach, though. Any catalog that commits using metadata location rather than requirements and updates would be harder to update for multi-table transactions because those require the base metadata and the new metadata.

Overall, I think I would still prefer the requirements and updates. There are a couple of reasons:

  1. Implementations of TableOperations almost always refresh immediately before attempting a commit anyway. Loading table metadata should not be a performance problem.
  2. Some operations don't refresh table metadata before attempting to commit because they do not retry. For those cases, we have to address that they are not normally retried and use generic retry logic. That's what requirements/updates already do.

For a concrete example of point 2 above, consider UpdateSchema that does not retry. If the table changes concurrently, then the schema update fails. That's because the operation doesn't validate that it can still apply the schema changes. However, the REST protocol does provide a way to validate a schema update can still be applied, by sending an assert-last-assigned-field-id requirement.

If we were to try to perform a schema update using a multi-table commit method, that method doesn't know what update is happening and will always retry. To make that retry safe, we should use the requirements.
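
As a rough illustration of why requirements make a generic retry safe, the sketch below re-validates a requirement against freshly loaded table state before each attempt. All types here (`TableState`, `Requirement`, `AssertLastAssignedFieldId`, `RequirementCheck`) are simplified, hypothetical stand-ins and not the classes in the REST layer.

```java
import java.util.List;

// Hypothetical, simplified view of table metadata for illustration only.
final class TableState {
  final int lastAssignedFieldId;

  TableState(int lastAssignedFieldId) {
    this.lastAssignedFieldId = lastAssignedFieldId;
  }
}

interface Requirement {
  // Throws if the current table state no longer satisfies the assumption
  // the update was built on.
  void validate(TableState current);
}

// Modeled loosely on the assert-last-assigned-field-id requirement.
final class AssertLastAssignedFieldId implements Requirement {
  private final int expected;

  AssertLastAssignedFieldId(int expected) {
    this.expected = expected;
  }

  @Override
  public void validate(TableState current) {
    if (current.lastAssignedFieldId != expected) {
      throw new IllegalStateException(
          "Requirement failed: last assigned field id changed, expected "
              + expected
              + " but found "
              + current.lastAssignedFieldId);
    }
  }
}

final class RequirementCheck {
  private RequirementCheck() {}

  // Intended to run before every attempt, including retries: updates are
  // only applied if all requirements hold against the refreshed state.
  static void validateAll(List<Requirement> requirements, TableState current) {
    for (Requirement requirement : requirements) {
      requirement.validate(current);
    }
  }
}
```

With a check like this, a commit method that does not know which operation it is carrying can still retry without silently applying a schema change on top of a concurrent one.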

Contributor Author

I completely agree with you on just using requirements and updates in TableCommit due to the points you mentioned. The only difficulty is that requirements are currently at the rest-layer and we'd have to refactor them out to make them generally available/usable. I didn't want to overcomplicate this PR with such a refactoring and decided to keep the base metadata in the class for now, which we use to build the requirements from

Contributor

Looks like this is the primary blocker, so I recommend we get started on the refactor and get some of the other pieces of this PR (like REST updates and request objects) in.

* not do any other conflict detection. Therefore, it does not guarantee true transactional
* atomicity, which is left to the implementation details of a REST server.
*/
public static void commitTransaction(Catalog catalog, CommitTransactionRequest request) {
Contributor

I think that this method should check whether the underlying catalog implements commitTransaction. If it does, then it should delegate to the catalog. If not, it should throw an exception.

The CatalogHandlers act as a reference implementation for the REST protocol, so I think it would be problematic if we included a method that took an atomic multi-table commit and implemented it as a sequence of individual commits.
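
A minimal sketch of the suggested delegate-or-throw behavior follows. The capability interface `SupportsAtomicMultiTableCommit`, the simplified `Catalog` type, the string-based commits, and the two example catalogs are assumptions made for illustration; the real method signatures are not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified catalog type for illustration only.
interface Catalog {}

// Assumed capability interface: a catalog that can commit changes to
// several tables atomically.
interface SupportsAtomicMultiTableCommit {
  void commitTransaction(List<String> tableCommits);
}

final class Handlers {
  private Handlers() {}

  // Delegates when the catalog supports atomic multi-table commits and
  // fails otherwise, instead of falling back to sequential commits.
  static void commitTransaction(Catalog catalog, List<String> tableCommits) {
    if (catalog instanceof SupportsAtomicMultiTableCommit) {
      ((SupportsAtomicMultiTableCommit) catalog).commitTransaction(tableCommits);
    } else {
      throw new UnsupportedOperationException(
          "Catalog does not support atomic multi-table commits: "
              + catalog.getClass().getName());
    }
  }
}

// Example catalog with the capability; it only records what it was given.
final class AtomicExampleCatalog implements Catalog, SupportsAtomicMultiTableCommit {
  final List<String> committed = new ArrayList<>();

  @Override
  public void commitTransaction(List<String> tableCommits) {
    committed.addAll(tableCommits);
  }
}

// Example catalog without the capability.
final class PlainExampleCatalog implements Catalog {}
```

The point of throwing rather than looping over single-table commits is that a reference implementation would otherwise advertise atomicity it cannot provide.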

}

@Value.Immutable
interface CommitTableRequest {
Contributor

@rdblue rdblue May 21, 2023

It seems confusing to have a CommitTableRequest and an UpdateTableRequest that are basically the same thing but with or without the identifier. Then there is also the naming problem in the spec. Those issues are a bit of a red flag that we need to address duplication.

I think there are two options. First, we could reuse the UpdateTableRequest directly and pull the identifier to a higher level, like this:

    CommitTransactionRequest:
      type: array
      items:
        $ref: '#/components/schemas/TransactionCommit'

    TransactionCommit:
      type: object
      required:
        - identifier
        - request
      properties:
        identifier:
          $ref: '#/components/schemas/TableIdentifier'
        request:
          $ref: '#/components/schemas/CommitTableRequest'

    CommitTableRequest:
      type: object
      required:
        - requirements
        - updates
      properties:
        requirements:
          type: array
          items:
            $ref: '#/components/schemas/TableRequirement'
        updates:
          type: array
          items:
            $ref: '#/components/schemas/TableUpdate'

That's okay, but still fairly awkward. The next option is to reuse the existing schema directly and just add an optional identifier field:

    CommitTransactionRequest:
      type: array
      items:
        description: Each table commit request must provide an `identifier`
        $ref: '#/components/schemas/CommitTableRequest'

    CommitTableRequest:
      type: object
      required:
        - requirements
        - updates
      properties:
        identifier:
          description: Table identifier to update; must be present for CommitTransactionRequest
          $ref: '#/components/schemas/TableIdentifier'
        requirements:
          type: array
          items:
            $ref: '#/components/schemas/TableRequirement'
        updates:
          type: array
          items:
            $ref: '#/components/schemas/TableUpdate'

I prefer the second option. It's not a problem to add an optional field and ensure it is set when the object is used in a transaction. We'd also want to check that the identifier is either not set or is set to the same table in a normal table update.
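
The two checks described for the second option could look roughly like the sketch below. `TableRequest` and `RequestValidation` are hypothetical, and table identifiers are reduced to plain strings to keep the example self-contained.

```java
import java.util.List;

// Hypothetical, simplified request type; only the optional identifier is modeled.
final class TableRequest {
  final String identifier; // null when the optional field is not set

  TableRequest(String identifier) {
    this.identifier = identifier;
  }
}

final class RequestValidation {
  private RequestValidation() {}

  // In a transaction, every table commit request must name its table.
  static void validateForTransaction(List<TableRequest> requests) {
    for (TableRequest request : requests) {
      if (request.identifier == null) {
        throw new IllegalArgumentException(
            "Table identifier is required for each request in a transaction");
      }
    }
  }

  // In a normal table update, the identifier is either absent or must match
  // the table addressed by the route.
  static void validateForTableUpdate(String routeTable, TableRequest request) {
    if (request.identifier != null && !request.identifier.equals(routeTable)) {
      throw new IllegalArgumentException(
          "Identifier " + request.identifier + " does not match table " + routeTable);
    }
  }
}
```

Keeping both checks next to the shared request type is one way to reuse a single schema while still rejecting ambiguous requests.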

for (TableCommit commit : commits) {
UpdateTableRequest.Builder updateTableBuilder = UpdateTableRequest.builderFor(commit.base());
commit.changes().forEach(updateTableBuilder::update);
UpdateTableRequest updateTableRequest = updateTableBuilder.build();
Contributor

Looks like we may need to refactor how we produce requirements. This would not work for non-REST catalogs.

Contributor Author

yes I agree, we'd have to pull them out of the REST layer and make them work for non-REST catalogs

(BaseTransaction.TransactionTable) transaction.table();

// this performs validations and makes temporary commits that are in-memory
commit(txTable.operations(), updateTableRequest);
Contributor

I can see why you'd want to use a transaction to get the bulk of the work done before committing, but I don't think this actually works because you're committing the transactions sequentially below.

I think to test we need to pass the commitTransaction call through and implement it either for an InMemoryCatalog or JDBC.
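
To show what an atomic in-memory implementation might need to do differently from sequential commits, here is a small sketch that validates every table first and only then applies the changes, all under one lock. The class `InMemoryVersions` and its version-counter model are hypothetical simplifications, not the InMemoryCatalog mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store that tracks one version counter per table.
final class InMemoryVersions {
  private final Map<String, Integer> versions = new HashMap<>();

  synchronized void create(String table) {
    versions.put(table, 0);
  }

  synchronized int version(String table) {
    return versions.get(table);
  }

  // expected maps each table to the version the writer based its changes on.
  // Either every table advances by one version or none does.
  synchronized void commitAll(Map<String, Integer> expected) {
    // Phase 1: check every table before modifying any of them.
    for (Map.Entry<String, Integer> entry : expected.entrySet()) {
      Integer current = versions.get(entry.getKey());
      if (current == null || !current.equals(entry.getValue())) {
        throw new IllegalStateException("Conflict on table " + entry.getKey());
      }
    }
    // Phase 2: all checks passed, so apply every change.
    for (String table : expected.keySet()) {
      versions.merge(table, 1, Integer::sum);
    }
  }
}
```

Committing the tables one after another would instead leave earlier tables updated when a later one conflicts, which is the problem pointed out in the comment above.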

Contributor

@rdblue rdblue left a comment

Great progress, @nastra! I think we can get this in fairly quickly. Thanks for getting this ready!

@nastra nastra force-pushed the rest-spec-for-multi-table-diff branch 2 times, most recently from ae56b38 to 794744c Compare May 24, 2023 18:41
Contributor

@rdblue rdblue left a comment

@nastra, can you move the REST spec updates and request objects (with tests) to a different PR? I think those are about ready to go and this is currently blocked by producing requirements and refactoring that code. That way we can get some of this in rather than waiting.

Contributor Author

nastra commented May 30, 2023

@rdblue I've opened #7741 that contains the REST-specific changes and #7750 that contains the UpdateRequirement refactorings.

this.requirements = requirements;
this.updates = updates;
}

- UpdateTableRequest(
+ public UpdateTableRequest(
Contributor Author

this is required to be public so that it can be used in RESTSessionCatalog.
It might be also worth adding a Builder to UpdateTableRequest so that we don't have to make it public here.
In this case we have two options:

Contributor

Maybe we should have a create method like TableCommit? Or one that accepts a TableCommit and returns the correct request? That would be simpler than a builder.

Contributor Author

sure that would also work. I'll make the respective changes in the next PR

Contributor Author

added in #7867

@nastra nastra force-pushed the rest-spec-for-multi-table-diff branch from f3c9e0c to 22ec8a8 Compare June 17, 2023 04:58

List<MetadataUpdate> updates();

static TableCommit create(TableIdentifier identifier, TableMetadata base, TableMetadata updated) {
Contributor

Javadoc?

Contributor Author

added in #7867

Contributor

@rdblue rdblue left a comment

Nothing major that would block this, but I made a couple of comments. Thanks, @nastra! Good to have transactions moving forward.

@rdblue rdblue merged commit d8f2daf into apache:master Jun 19, 2023
41 checks passed
@nastra nastra deleted the rest-spec-for-multi-table-diff branch June 19, 2023 18:29

3 participants