Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DefaultEventPublicationRegistry markCompleted method's REQUIRES_NEW transaction propagation means events are marked completed before handler transactions are committed. #485

Open
masch712 opened this issue Jan 30, 2024 · 4 comments
Assignees
Labels
meta: waiting for feedback Waiting for feedback of the original reporter

Comments

@masch712
Copy link

@Transactional(propagation = Propagation.REQUIRES_NEW)
public void markCompleted(Object event, PublicationTargetIdentifier targetIdentifier) {

The markCompleted method of DefaultEventPublicationRegistry is invoked in a MethodInterceptor defined in CompletionRegisteringAdvisor, which intercepts the event handler method, invokes it, then calls the EventPublicationRegistry's markCompleted method to mark the eventPublication as complete.

Because of the REQUIRES_NEW transaction propagation type, markCompleted runs with a completely distinct database connection, and independent transaction from the event handler's transaction. This means there's a chance the event handler's transaction can commit, but the markCompleted transaction does not, which leaves the eventPublication appearing incomplete in the database even though it was completely successfully handled.

I propose that markCompleted should have the REQUIRED transaction propagation type. I believe this should cause markCompleted to commit the eventPublication as completed in the same transaction as the event handler.

Let me know what you think, I can submit a PR.

@odrotbohm
Copy link
Member

odrotbohm commented Jan 30, 2024

Can you provide a reproducer for the behavior you assume? I don't think what you describe is correct as the completing method interceptor is ordered before the standard transaction interceptor decorating the transactional event listener. In other words: only if the listener has completed successfully and the transaction wrapping the listener has completed, the event publication will be marked completed, too.

We need to use Propagation.REQUIRES_NEW as we support non-asynchronous, transactional event listeners, too. Those are invoked in the transaction cleanup phase of the original transactions, and a simple REQUIRED would use the already committed transaction that's in an undefined state at that time. REQUIRES_NEW makes sure we get a new transaction that could potentially roll back in error cases property.

EDIT: I think I misread your original comment, as the title is slightly misleading:

This means there's a chance the event handler's transaction can commit, but the markCompleted transaction does not, which leaves the eventPublication appearing incomplete in the database even though it was completely successfully handled.

I originally thought you were trying to point out that the publication might be marked completed, although the listener failed. You obviously did not do that. That said, the second argument still holds true, and we generally live with the possibility of a publication staying uncompleted in rare error scenarios. In other words: we implement “at least once” semantics and expect the listeners to either perform idempotent work (i.e., being fine with being invoked more than once for a single event).

@odrotbohm odrotbohm self-assigned this Jan 30, 2024
@odrotbohm odrotbohm added the meta: waiting for feedback Waiting for feedback of the original reporter label Jan 30, 2024
@masch712
Copy link
Author

we implement “at least once” semantics and expect the listeners to either perform idempotent work (i.e., being fine with being invoked more than once for a single event).

Thanks, this is helpful.
For what it's worth, I think there's an opportunity to make the JpaEventPublicationRepository concurrency-safe using database locking mechanisms. I'll post a PR if we find the need for such a feature. But with the proliferation of "at-least one" messaging in the software industry, maybe it'd be prudent for me and my colleagues to just write idempotent code :-)

I did make a simple repo that demonstrates this github issue, if you're interested:
https://github.com/masch712/modulith-event-demo/blob/27f324866f23ad131e1a2472137df3459e4707e9/src/test/java/com/athenahealth/collector/colsubs/modulitheventdemo/ModulithEventDemoApplicationTests.java#L164-L214

@odrotbohm
Copy link
Member

We're currently investigating options to improve on the use case. I just wanted to describe how things work as they do right now and why that is.

I don't think I follow on the concurrency-safe aspect of JpaEventPublicationRepository. It's thread-safe as is, we expect event instances to be unique (at least in their serialized form) and them being consumed in the application instance they were produced in in the first place. For replays, that's a different story, as one would clearly need to select a single app instance to process those. Or use something like SchedLock to let the instances use a distributed lock.

The test case looks rather convoluted and isn't as easy to follow. Are you aware of the Scenario API? It allows writing up interaction scenarios without having to manually deal with latches and other aspects of asynchronicity. In standard @SpringBootTests, support for it can be activated through @EnableScenarios.

@masch712
Copy link
Author

I was not aware of the Scenario API, very cool! I'm not sure that could help with my particular test though, since I'm testing the Modulith internals here. Sorry for the complexity of that test, I realize it's not particularly readable. I'll try to explain what it's doing. The most important bullets are the last two.

  • First, it writes a row to the foo_entity table. Foo is just a dummy entity.
  • Then, it sets up a partial mock (via Mockito's Spy functionality) on the eventPublicationRegistry.
    • The hypothesis I'm testing in this JUnit test is that my @ApplicationModuleListener-annotated method's transaction commits before the markCompleted transaction starts.
    • So I mock markCompleted to pause and wait for a latch before it can proceed. While markComplete is paused, we can run run our test assertion.
  • We kick off a thread that publishes the event.
    • I thought it was important to do this in a child thread so that it runs a distinct transaction from the main test thread, in case the main test thread was running in its own transaction. This might have been overkill.
  • We wait for the event-publishing thread to run and complete
    • Our Async, @ApplicationModuleListener-annotated event handler might not be done yet at the point that the event-publishing thread completes.
    • When markCompleted begins, we know the event handler has completed; so we await the markCompletedBeginLatch: markCompletedBeginLatch.await();
  • Query the database and see that the handler committed its transaction, and foo_entity.handled_at is set in the database
  • Call eventPublicationRepository.findIncompletePublications() and see the event is still incomplete.

I don't think I follow on the concurrency-safe aspect of JpaEventPublicationRepository.

I'm specifically concerned with replays here, but I've also found that on publishEvent, the event_publication row is committed as an incomplete event to the database before the event handler begins. So imagine a distributed environment, where a healthy application instance is healthily publishing events and handling them; we scale out and start up a new application instance; if we try to "replay" incomplete events on startup, we're going to retrieve some of these brand new events actively being handled by the healthy instance.

I could see SchedLock being useful in a distributed system that's not using a single ACID database for its event_publication ledger, but in cases using relational databases and event_publication is in the same database, I think a pessimistic locking technique is probably appropriate here.

But again, if "at least once" is the final decision for the event handling paradigm in Modulith, then event_publicatoin locking isn't necessary at all, and you can close this ticket.

Thanks for your time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta: waiting for feedback Waiting for feedback of the original reporter
Projects
None yet
Development

No branches or pull requests

2 participants