Skip to content

Conversation

@adnanhemani
Copy link
Contributor

@adnanhemani adnanhemani commented Dec 3, 2025

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • [N/A] 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

This removes the *CommitTableEvent events altogether as they have no explicit use cases now that all APIs have been instrumented. As a result, this will also solve the phantom events noticed in #2990. To ensure that there is an event emitted that shows that a change to the table was ultimately made, this change introduces an additional UpdateTableEvent emission for each of the update table requests within this transaction.

ML thread: https://lists.apache.org/thread/pnkd20prz69vd5bho1b0mqofcbyg69rg

Closes #2990.

@adnanhemani
Copy link
Contributor Author

cc @dimas-b @adutra

@adnanhemani adnanhemani changed the title Ahemani/remove commit table events Remove *CommitTableEvent, Add *UpdateTableEvent to Transactions Dec 3, 2025
@adnanhemani adnanhemani marked this pull request as draft December 3, 2025 03:00
@adnanhemani adnanhemani marked this pull request as ready for review December 3, 2025 04:22
req.identifier().name(),
req,
(LoadTableResponse) resp.getEntity()));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the right behavior for multi-table transactions. Emitting AfterUpdateTableEvent only after the entire transaction succeeds ensures the correct semantics and avoids exposing partial state. Thanks for making this change.

If there’s interest in finer-grained events for multi-table transactions (e.g., per-table events that fire even before the full transaction commits), we could explore that as a follow-up discussion.

Here is the related dev ML discussion: https://lists.apache.org/thread/5og0qjo8l9rf0kytqjg4gn7d9r81gf79.

Can we add tests for this new logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the test!

@adnanhemani
Copy link
Contributor Author

adnanhemani commented Dec 5, 2025

I've made the LoadTableResponse optional in the AfterUpdateTableEvent to support this use case. I will attach a new issue to make this not optional again (and therefore, support surfacing data from within the IcebergCatalogAdapter to Events): #3209.

Comment on lines +128 to +129
BEFORE_COMMIT_TABLE, // REMOVED FROM SOURCE CODE
AFTER_COMMIT_TABLE, // REMOVED FROM SOURCE CODE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove them as there is no usage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that will change the ordinal numbers of the enum and so I'm avoiding doing that. Given Events are still in "preview" we have the ability to do so without any prior notice - but just being careful. Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a valid concern. I'm OK to remove it as this is preview feature, but it requires notice. Another option is to adding code field to explicitly set a number for the type. This is more robust so that it prevents any developer adding an event in the middle and accidentally break downstreams. It isn't a blocker for this PR though. Can you file an issue for this?

Something like:

public enum EventType {
    AFTER_UPDATE(101),
    AFTER_DELETE(102);

    private final int code;

    EventType(int code) {
        this.code = code;
    }

    public int code() {
        return code;
    }
}

Adding a constructor will prevent accidental ordinal changes and make future evolution safer. Given the downstream relies on the event type, I think it's reasonable to add a explicit code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, sounds good! I will open the issue!

flyrain
flyrain previously approved these changes Dec 5, 2025
@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Dec 5, 2025
req.identifier().namespace(),
req.identifier().name(),
req,
null));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar events emitted from updateTable have a value for the last parameter (LoadTableResponse), but here it is null.

It that parameter useful to the consumer when it is not available in all situations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented about this earlier: #3195 (comment)

We should get back to the state where this is not null soon. But for that, we will have to solve the larger issue in #3209. I think it is fine to keep this code for now and get back to working on #3209 (hopefully in the near future) to fix this null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for missing the earlier comment 🤦

If our intention is to provide the same set of parameters (non-null) to events of the same type in all situations, then given that this PR deviates from that goal, I believe we ought to mark this on the javadoc of AfterUpdateTableEvent at least.

A CHANGELOG entry is not required, IMHO, since it's developer-level change at this stage.

That said, the plan SGTM 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal preference would be to remove LoadTableResponse from AfterUpdateTableEvent in this PR (in all cases) and re-add when we can provide it in all contexts. I think it would provide a more coherent migration path from the consumer's POV.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal preference would be to remove LoadTableResponse from AfterUpdateTableEvent in this PR (in all cases) and re-add when we can provide it in all contexts.

This is fair too. I was trying to reduce the amount of changes to the event structure itself and use the fact that this is null (only in the case of transactions, to be clear) to place urgency in trying to fix #3209. I'll make the change for the Javadoc - let me know if removing the LoadTableResponse is a blocking concern for you. Else, I'd prefer to keep this approach the way it is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx - LGTM 👍

@flyrain flyrain merged commit 49fb9ac into apache:main Dec 9, 2025
15 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Dec 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PolarisEventListener may produce events that relate to phantom data

3 participants