Handle command failures within process managers #79

bamorim · 2017-09-20T20:29:17Z

How can we make our process react when things go wrong?

Before I start, I'd like to say that I'm posting this because of the idea of distributed sagas. You can learn more about them in the videos below:
Distributed Sagas: A Protocol for Coordinating Microservices - Caitie McCaffrey - JOTB17
Using sagas to maintain data consistency in a microservice architecture by Chris Richardson

Let's say we are modeling a Travel process which needs to get reservations for a hotel, a car and a flight.
Each aggregate (hotel, car and flight) has a ReserveX{} command and a CancelXReservation{} command, and emits XReserved and ReservationXCanceled events. However, if a reservation is not possible (for example, no seats on the flight), the aggregate returns something like {:error, :no_available_seats}.

Now let's think about the TravelProcessManager. It listens to TravelCreated events (on the travel aggregate) and, when one occurs, dispatches three commands: ReserveHotel, ReserveCar and ReserveFlight.

In the happy path, it waits for the XReserved events. However, whenever it receives an {:error, :no_available_seats} it should cancel all other reservations that may have been made.

How can we handle that?
One idea is to make aggregates emit events for failures as well.
Maybe that is not the responsibility of a process manager and I should implement it myself, creating a lib that implements the protocol directly. (It's basically a write-ahead log and a rollback procedure using some DAG stuff.) Anyway, I think it might be something nice to add to the commanded lib.

I'd like to know what you guys think about it,
Cheers,
Bernardo

slashdotdash · 2017-09-20T20:48:36Z

Using domain events to model failures is how you'd solve the problem currently.

Instead of your aggregate returning an {:error, :no_available_seats} error, it should raise a domain event such as FlightReservationFailed and could provide the failure as a field (e.g. reason: :no_seats_available). Your travel process manager can then subscribe to both success and failure events and handle each appropriately. It would dispatch cancellation commands in the case of any step failure.
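
As a rough illustration of the process manager side of this, here is a minimal sketch in plain Elixir with no Commanded dependency. The struct names, the `travel_id` field and the `TravelCompensation` module are all hypothetical; only the idea of reacting to success and failure events with cancellation commands comes from the discussion above. The `handle/2` and `apply/2` names merely mimic the usual process manager shape.

```elixir
# Hypothetical events and commands; names and fields are assumptions.
defmodule HotelReserved, do: defstruct([:travel_id])
defmodule CarReserved, do: defstruct([:travel_id])
defmodule FlightReservationFailed, do: defstruct([:travel_id, :reason])
defmodule CancelHotel, do: defstruct([:travel_id])
defmodule CancelCar, do: defstruct([:travel_id])

defmodule TravelCompensation do
  # Tracks which steps have been reserved and whether any step has failed.
  defstruct reserved: [], failed?: false

  # A reservation that succeeds after a failure is cancelled straight away.
  def handle(%__MODULE__{failed?: true}, %HotelReserved{travel_id: id}),
    do: [%CancelHotel{travel_id: id}]

  def handle(%__MODULE__{failed?: true}, %CarReserved{travel_id: id}),
    do: [%CancelCar{travel_id: id}]

  # On failure, cancel every reservation made so far.
  def handle(%__MODULE__{reserved: reserved}, %FlightReservationFailed{travel_id: id}),
    do: Enum.map(reserved, &cancel(&1, id))

  def handle(%__MODULE__{}, _event), do: []

  # State transitions driven by the same events.
  def apply(%__MODULE__{} = state, %HotelReserved{}),
    do: %{state | reserved: [:hotel | state.reserved]}

  def apply(%__MODULE__{} = state, %CarReserved{}),
    do: %{state | reserved: [:car | state.reserved]}

  def apply(%__MODULE__{} = state, %FlightReservationFailed{}),
    do: %{state | failed?: true}

  defp cancel(:hotel, id), do: %CancelHotel{travel_id: id}
  defp cancel(:car, id), do: %CancelCar{travel_id: id}
end
```

Tracking `failed?` covers the case where the failure arrives before the other reservations have completed, so late successes are still compensated.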

A benefit to using domain events for errors is they provide useful auditing and analytics. The business might find it convenient to report on these events in the future to answer pertinent questions (e.g. "how often do we fail to reserve seats on flights for airline X?").

There's a pending issue to add a retry/error handling mechanism to event handlers and process managers (#20). This feature would allow your aggregate to return error tuples, or raise an exception, and provide an extension point in your process manager to handle errors on a case-by-case basis. However, it has not been implemented yet.

Does that answer your question?

bamorim · 2017-09-20T22:50:55Z

Yes, that seems like a good idea. So the point is to return these events, but in cases where this is a CreateX command it should not create a valid aggregate, right? By returning an event, we assume it worked and that we can fold over the events to get the state of the aggregate, which actually shouldn't exist. How would you work in that situation?
Just allow the aggregate to "exist", but in an "invalid" state?
Also, if domain errors are just events, how can I know that a command worked after dispatching it? And what is the point of returning {:error, reason}? Which kinds of errors do you think I should return in the tuple and which do you think I should return in events?

slashdotdash · 2017-09-21T10:24:34Z

It's ok to have an aggregate in such a state; think of it as an unfulfilled reservation request rather than an "error state". You can use Commanded's aggregate lifespan feature to shut down these aggregates after an error event if you are concerned about them running indefinitely.
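
For illustration only, a lifespan module along these lines might look like the sketch below. It is written as a plain module so it compiles on its own. The callback names (`after_event/1`, `after_command/1`, `after_error/1`) and the `:stop` / `:infinity` return values are assumptions modelled on Commanded's `AggregateLifespan` behaviour, which has changed between versions, so check the documentation for the version you use. The event structs are hypothetical.

```elixir
# Hypothetical events; names and fields are assumptions.
defmodule FlightReserved, do: defstruct([:flight_id])
defmodule FlightReservationFailed, do: defstruct([:flight_id, :reason])

defmodule ReservationLifespanSketch do
  # Assumption: the lifespan module is asked after each event, command and
  # error how long the aggregate process should stay alive.

  # Stop the aggregate process once a failure event has been recorded.
  def after_event(%FlightReservationFailed{}), do: :stop

  # Keep the process alive after any other event.
  def after_event(_event), do: :infinity

  # Commands on their own do not change the lifespan in this sketch.
  def after_command(_command), do: :infinity

  # Stop after an error tuple as well.
  def after_error(_error), do: :stop
end
```

Stopping the process does not delete anything: the aggregate's events stay in the event store and its state can be rebuilt from them if another command arrives.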

> how can I know that a command worked after dispatching it?

You receive an :ok response from a successful command dispatch, or an {:error, reason} when it fails. However, by using failure domain events the command will successfully dispatch (returning :ok). You will need an event handler to handle these events and notify the end user as appropriate (alert email, in-app notification, use Phoenix channels to push the failure to the user's browser).

The general pattern with long running processes, such as your reservation, is to notify the user that their request was accepted, start processing the request in the background, and inform the user that they will be notified upon success/failure. One approach is to have a read model projection for the reservation that is updated from the domain events. For example, if you're building a web app, after a successful submission you redirect the user to a page that polls for, or subscribes the user to, updates of their reservation status.
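
A read model of this kind can be sketched as a fold over the domain events. The example below is plain Elixir and deliberately simplified: it only tracks the flight step, keeps rows in a map instead of a database, and every struct, field and module name is hypothetical.

```elixir
# Hypothetical events; names and fields are assumptions.
defmodule TravelCreated, do: defstruct([:travel_id])
defmodule FlightReserved, do: defstruct([:travel_id])
defmodule FlightReservationFailed, do: defstruct([:travel_id, :reason])

defmodule ReservationStatusProjection do
  # Builds a map of travel_id => status row from an ordered list of events.
  def project(events), do: Enum.reduce(events, %{}, &apply_event/2)

  # The request was accepted; nothing has been confirmed yet.
  defp apply_event(%TravelCreated{travel_id: id}, rows),
    do: Map.put(rows, id, %{status: :pending, reason: nil})

  defp apply_event(%FlightReserved{travel_id: id}, rows),
    do: Map.put(rows, id, %{status: :confirmed, reason: nil})

  # The failure event carries the reason shown to the user.
  defp apply_event(%FlightReservationFailed{travel_id: id, reason: reason}, rows),
    do: Map.put(rows, id, %{status: :failed, reason: reason})

  # Events the projection does not care about leave the rows untouched.
  defp apply_event(_other, rows), do: rows
end
```

The status page would then read the row for the user's `travel_id` and render `:pending`, `:confirmed` or `:failed` accordingly.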

> what is the point of returning {:error, reason}? Which kinds of errors do you think I should return in the tuple and which do you think I should return in events?

As a rule of thumb, use domain events for failures of commands dispatched by a process manager. You can use {:error, reason} tuples elsewhere and to guard against bugs (e.g. attempting to reserve a flight for a date in the past).
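
To make the split concrete, here is a hedged sketch of an aggregate that uses both styles. It is plain Elixir rather than a real Commanded aggregate: `execute/3` takes the current date as an extra argument so the function stays pure (Commanded's own `execute` takes two arguments), and the struct names, fields and the `:departure_in_past` reason are assumptions.

```elixir
# Hypothetical command and events; names and fields are assumptions.
defmodule ReserveFlight, do: defstruct([:flight_id, :departs_on])
defmodule FlightReserved, do: defstruct([:flight_id])
defmodule FlightReservationFailed, do: defstruct([:flight_id, :reason])

defmodule Flight do
  defstruct flight_id: nil, seats_left: 0

  def execute(%__MODULE__{} = flight, %ReserveFlight{} = cmd, %Date{} = today) do
    cond do
      # Guard against a bug in the caller: reject with an error tuple.
      Date.compare(cmd.departs_on, today) == :lt ->
        {:error, :departure_in_past}

      # Expected business failure: record it as a domain event.
      flight.seats_left <= 0 ->
        %FlightReservationFailed{flight_id: cmd.flight_id, reason: :no_seats_available}

      true ->
        %FlightReserved{flight_id: cmd.flight_id}
    end
  end

  # Only a successful reservation changes the aggregate's state.
  def apply(%__MODULE__{} = flight, %FlightReserved{}),
    do: %{flight | seats_left: flight.seats_left - 1}

  def apply(%__MODULE__{} = flight, %FlightReservationFailed{}), do: flight
end
```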

slashdotdash · 2017-10-18T09:41:14Z

@bamorim You can now handle errors in your process managers using the new feature described in #93.

Here's your example travel process manager responding to seat failure by cancelling the hotel and car reservations:

defmodule TravelProcessManager do
  use Commanded.ProcessManagers.ProcessManager,
    name: "TravelProcessManager",
    router: TravelRouter

  def error({:error, :no_available_seats}, _failed_command, _pending_commands, context) do
    {:continue, [%CancelHotel{...}, %CancelCar{...}], context}
  end
end

Now you can choose to model failures as errors (e.g. {:error, :no_available_seats}) or domain events (e.g. FlightReservationFailed).

bamorim · 2017-10-18T09:57:17Z

@slashdotdash that is amazing. I'm looking forward to using commanded more seriously here at work. Thank you for your great work. :D


slashdotdash changed the title ~~Handle command failures withing process managers~~ Handle command failures within process managers Sep 22, 2017

slashdotdash closed this as completed Oct 18, 2017
