Handle command failures within process managers #79

Closed
bamorim opened this issue Sep 20, 2017 · 5 comments

bamorim commented Sep 20, 2017

How can we make our process react when things go wrong?

Before I start, I should say that this question was prompted by the idea of distributed sagas, which is covered in the talks below:
Distributed Sagas: A Protocol for Coordinating Microservices - Caitie McCaffrey - JOTB17
Using sagas to maintain data consistency in a microservice architecture by Chris Richardson

Let's say we are modeling a Travel process that needs to reserve a hotel, a car, and a flight.
Each aggregate (hotel, car, and flight) has commands along the lines of ReserveX{} and CancelXReservation{}, and events XReserved and ReservationXCanceled. However, if the reservation is not possible (for example, no seats left on the flight), the aggregate returns something like {:error, :no_available_seats}.

Now consider the TravelProcessManager. It listens for TravelCreated events (from the travel aggregate) and, when one arrives, dispatches three commands: ReserveHotel, ReserveCar, and ReserveFlight.

On the happy path, it waits for the XReserved events. However, whenever it receives an {:error, :no_available_seats}, it should cancel any other reservations that may already have been made.

How can we handle that?
One idea is to make aggregates emit events for failures as well.
Maybe that is not the responsibility of a process manager, and I should implement it myself and create a library that implements the protocol directly. (It's basically a write-ahead log and a rollback procedure using a DAG.) Either way, I think it may be something nice to add to the commanded lib.

I'd like to know what you guys think about it,
Cheers,
Bernardo

slashdotdash commented Sep 20, 2017

Using domain events to model failures is how you'd solve the problem currently.

Instead of your aggregate returning an {:error, :no_available_seats} error, it should raise a domain event such as FlightReservationFailed and could provide the failure as a field (e.g. reason: :no_seats_available). Your travel process manager can then subscribe to both success and failure events and handle each appropriately. It would dispatch cancellation commands in the case of any step failure.
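As a rough illustration of that flow, here is a minimal sketch in plain Elixir, deliberately written without the Commanded macros so it runs on its own. The module name `TravelSketch`, the struct names, and their fields are all assumptions made for this example; only the `apply`/`handle` naming mirrors the process manager conventions. The process manager records which reservations succeeded and, when it sees a failure event, returns the matching cancellation commands.

```elixir
defmodule TravelSketch do
  # Hypothetical events and commands; names and fields are assumptions.
  defmodule HotelReserved, do: defstruct([:travel_id])
  defmodule CarReserved, do: defstruct([:travel_id])
  defmodule FlightReservationFailed, do: defstruct([:travel_id, :reason])
  defmodule CancelHotelReservation, do: defstruct([:travel_id])
  defmodule CancelCarReservation, do: defstruct([:travel_id])

  # Process manager state: the reservations that have succeeded so far.
  defstruct travel_id: nil, reserved: []

  # State mutators: remember each successful step so it can be undone later.
  def apply(%__MODULE__{} = pm, %HotelReserved{}),
    do: %{pm | reserved: [:hotel | pm.reserved]}

  def apply(%__MODULE__{} = pm, %CarReserved{}),
    do: %{pm | reserved: [:car | pm.reserved]}

  def apply(%__MODULE__{} = pm, _event), do: pm

  # On a failure event, return one cancellation command per completed step,
  # in the order the steps were completed.
  def handle(%__MODULE__{travel_id: id, reserved: reserved}, %FlightReservationFailed{}) do
    for step <- Enum.reverse(reserved) do
      case step do
        :hotel -> %CancelHotelReservation{travel_id: id}
        :car -> %CancelCarReservation{travel_id: id}
      end
    end
  end

  # Any other event produces no commands.
  def handle(%__MODULE__{}, _event), do: []
end
```

In a real Commanded process manager the returned commands would be dispatched through the router; here they are simply returned as a list so the logic can be checked in isolation.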

A benefit to using domain events for errors is they provide useful auditing and analytics. The business might find it convenient to report on these events in the future to answer pertinent questions (e.g. "how often do we fail to reserve seats on flights for airline X?").

There's a pending issue to add a retry/error handling mechanism to event handlers and process managers (#20). This feature would allow your aggregate to return error tuples, or raise an exception, and provide an extension point in your process manager to handle errors on a case-by-case basis. However, it has not been implemented yet.

Does that answer your question?

bamorim commented Sep 20, 2017

Yes, that seems like a good idea. So the point is to return these events, but when the command is a CreateX command it should not create a valid aggregate, right? By returning an event, we assume the command worked and that we can fold over the events to get the state of the aggregate, which actually shouldn't exist. How would you handle that situation?
Would you just allow the aggregate to "exist", but in an "invalid" state?
Also, if domain errors are just events, how can I know that a command worked after dispatching it? And what is the point of returning {:error, reason}? Which kinds of errors do you think I should return in the tuple, and which in events?

@slashdotdash

It's OK to have an aggregate in such a state; think of it as an unfulfilled reservation request rather than an "error state". You can use Commanded's aggregate lifespan feature to shut down these aggregates after an error event if you are concerned about them running indefinitely.

> how can I know that a command worked after dispatching it?

You receive an :ok response from a successful command dispatch, or an {:error, reason} when it fails. However, by using failure domain events the command will successfully dispatch (returning :ok). You will need an event handler to handle these events and notify the end user as appropriate (alert email, in-app notification, use Phoenix channels to push the failure to the user's browser).

The general pattern with long-running processes, such as your reservation, is to notify the user that their request was accepted, start processing the request in the background, and inform the user that they will be notified upon success or failure. One approach is to have a read model projection for the reservation that is updated from the domain events. For example, if you're building a web app, after a successful submission you redirect the user to a page that polls for, or subscribes to, updates of their reservation status.
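To make the read model idea concrete, here is a small plain-Elixir sketch, assuming a hypothetical in-memory map keyed by `travel_id` (a real projection would normally write to a database). The module, event structs, and status values are illustrative assumptions, not part of Commanded.

```elixir
defmodule ReservationStatusSketch do
  # Hypothetical events; names and fields are assumptions.
  defmodule FlightReserved, do: defstruct([:travel_id])
  defmodule FlightReservationFailed, do: defstruct([:travel_id, :reason])

  # Fold one domain event into the read model (a map of travel_id => status).
  def project(read_model, %FlightReserved{travel_id: id}),
    do: Map.put(read_model, id, :confirmed)

  def project(read_model, %FlightReservationFailed{travel_id: id, reason: reason}),
    do: Map.put(read_model, id, {:failed, reason})

  # Events the projection does not care about leave the read model unchanged.
  def project(read_model, _event), do: read_model

  # What a status page would query; requests with no outcome yet are pending.
  def status(read_model, travel_id), do: Map.get(read_model, travel_id, :pending)
end
```

A status page could call `status/2` on each poll and show "pending", "confirmed", or the failure reason accordingly.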

> what is the point of returning {:error, reason}? Which kinds of errors do you think I should return in the tuple, and which in events?

As a rule of thumb, use domain events for failures of commands dispatched by a process manager. You can use {:error, reason} tuples elsewhere and to guard against bugs (e.g. attempting to reserve a flight for a date in the past).
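One way to picture this rule of thumb is the plain-Elixir sketch below. It is only an illustration under stated assumptions: `FlightSketch`, its structs, the `seats_left` field, and the extra `today` argument (passed in so the function stays deterministic) are all hypothetical and do not come from Commanded. A date in the past is treated as a caller bug and rejected with an error tuple, while running out of seats is a business outcome recorded as an event.

```elixir
defmodule FlightSketch do
  # Hypothetical command and events; names and fields are assumptions.
  defmodule ReserveFlight, do: defstruct([:flight_id, :date])
  defmodule FlightReserved, do: defstruct([:flight_id])
  defmodule FlightReservationFailed, do: defstruct([:flight_id, :reason])

  # Aggregate state for this sketch.
  defstruct flight_id: nil, seats_left: 0

  def execute(%__MODULE__{} = flight, %ReserveFlight{date: date}, today) do
    cond do
      # Invalid input that should never reach the aggregate: error tuple.
      Date.compare(date, today) == :lt ->
        {:error, :date_in_past}

      # Expected business failure: record it as a domain event.
      flight.seats_left == 0 ->
        %FlightReservationFailed{flight_id: flight.flight_id, reason: :no_available_seats}

      # Success: emit the usual event.
      true ->
        %FlightReserved{flight_id: flight.flight_id}
    end
  end
end
```

With this split, a process manager only ever needs to react to events, and error tuples stay reserved for requests that should be rejected outright.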

@slashdotdash slashdotdash changed the title Handle command failures withing process managers Handle command failures within process managers Sep 22, 2017
slashdotdash commented Oct 18, 2017

@bamorim You can now handle errors in your process managers using the new feature described in #93.

Here's your example travel process manager responding to seat failure by cancelling the hotel and car reservations:

defmodule TravelProcessManager do
  use Commanded.ProcessManagers.ProcessManager,
    name: "TravelProcessManager",
    router: TravelRouter

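  # Invoked when a command dispatched by this process manager fails.
  # Continue by dispatching the compensating cancellation commands instead.
  def error({:error, :no_available_seats}, _failed_command, _pending_commands, context) do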
    {:continue, [%CancelHotel{...}, %CancelCar{...}], context}
  end
end

Now you can choose to model failures as errors (e.g. {:error, :no_available_seats}) or domain events (e.g. FlightReservationFailed).

bamorim commented Oct 18, 2017 via email
