SqlSnapshotStore with auto-initialization stops if DB is temporarily inaccessible #3870

Open
balcko opened this issue Jul 29, 2019 · 5 comments

Comments

@balcko commented Jul 29, 2019

Hi,
we are using Akka.NET version 1.3.12 in production with clustering and SQL Server persistence.
The application is hosted as a Windows service on Windows Server, targeting net461.

Even after the fix for akkadotnet/Akka.Persistence.SqlServer#104, we still hit an issue where persistent actors could not start after planned DB maintenance had ended. The only remedy was to restart the actor systems across the whole cluster via pbm.
In the logs we found the following errors related to the snapshot store:

  • Circuit Breaker is open; calls are failing fast
  • Error during snapshot store initialization

I went through the SqlServerSnapshotStore code and found two issues that probably caused the actor to stop:

  1. _breaker.WithCircuitBreaker(() => DeleteAsync(saveSnapshotFailure.Metadata));

If the DB becomes unavailable and snapshot saves fail often enough for the circuit breaker to open, this line throws immediately, before the returned task is even awaited. The exception escapes the message handler and the actor restarts. Issue 2 follows from that restart:

  2. If the auto-initialize setting is on (our case), then after the actor restarts, the parent SqlSnapshotStore starts initialization. Since the DB is still unavailable, initialization fails and as a result the actor is permanently stopped on the line

Now the whole actor system is broken, since no new persistent actors can start.
Possible fixes:
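Issue 1 comes down to a general C# detail: a Task-returning method that throws before it has produced a Task fails at the call site, so even a fire-and-forget call throws inside the message handler, whereas an async method captures the same exception in the Task it returns. The following is a minimal sketch under assumptions: OpenBreakerSketch and FireAndForgetLogged are hypothetical names, not Akka.NET APIs, and the sketch only reproduces the two failure shapes, not the library's internals. It also includes a small wrapper that logs either kind of failure instead of letting it escape.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-ins for a breaker that is already open. These are NOT
// Akka.NET's CircuitBreaker; they only reproduce the two failure shapes.
public static class OpenBreakerSketch
{
    // Non-async: the exception is thrown at the call site,
    // before any Task exists for the caller to observe.
    public static Task<T> ThrowsSynchronously<T>(Func<Task<T>> body) =>
        throw new InvalidOperationException("Circuit Breaker is open; calls are failing fast");

    // async: the same exception is stored in the returned (faulted) Task,
    // so a caller that does not await it is not interrupted.
#pragma warning disable CS1998 // async method without await, on purpose
    public static async Task<T> ReturnsFaultedTask<T>(Func<Task<T>> body)
    {
        throw new InvalidOperationException("Circuit Breaker is open; calls are failing fast");
    }
#pragma warning restore CS1998

    // Hypothetical best-effort wrapper: runs a call (e.g. a rollback delete)
    // and logs a failure of either shape instead of rethrowing it.
    public static void FireAndForgetLogged(Func<Task> call, Action<Exception> log)
    {
        Task task;
        try
        {
            task = call();
        }
        catch (Exception ex)
        {
            log(ex); // synchronous throw: handled here, not by the supervisor
            return;
        }
        // Asynchronous failure: observe and log it once the task faults.
        task.ContinueWith(t => log(t.Exception.GetBaseException()),
                          TaskContinuationOptions.OnlyOnFaulted);
    }
}

public static class Program
{
    public static void Main()
    {
        Func<Task<int>> delete = () => Task.FromResult(0); // never actually invoked

        try
        {
            OpenBreakerSketch.ThrowsSynchronously(delete);
            Console.WriteLine("sync: returned a task");
        }
        catch (InvalidOperationException)
        {
            // Inside an actor's message handler, this path triggers a restart.
            Console.WriteLine("sync: threw at the call site");
        }

        var task = OpenBreakerSketch.ReturnsFaultedTask(delete);
        Console.WriteLine($"async: no throw, task faulted = {task.IsFaulted}");

        OpenBreakerSketch.FireAndForgetLogged(
            () => OpenBreakerSketch.ThrowsSynchronously(delete),
            ex => Console.WriteLine($"logged and swallowed: {ex.Message}"));
    }
}
```

Running it reports the synchronous throw, then a faulted task with no exception at the call site, then the wrapped failure being logged rather than rethrown.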

  1. The exception should definitely not restart the actor (perhaps just log and swallow it). Another approach would be to skip the "rollback" delete after a failed save altogether: it will probably fail anyway, and a missed snapshot save should not matter. Snapshots should only be used to speed up replays, so as long as the events were persisted successfully, a missed snapshot does not affect event sourcing.

  2. The actor should not stop if auto-initialization fails; the error should be logged and initialization retried after some delay.
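The retry proposed in fix 2 could look roughly like the following minimal sketch. It assumes initialization can be expressed as a Func<Task>; InitRetrySketch, InitializeWithBackoffAsync and its parameters are hypothetical and not part of Akka.Persistence.SqlServer. Each failed attempt is logged, and the next attempt waits with capped exponential back-off instead of stopping.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of Akka.Persistence: retries an initialization
// step with capped exponential back-off and reports failures instead of throwing.
public static class InitRetrySketch
{
    public static async Task<bool> InitializeWithBackoffAsync(
        Func<Task> initialize,
        int maxAttempts,
        TimeSpan initialDelay,
        TimeSpan maxDelay,
        Action<int, Exception> logFailure)
    {
        var delay = initialDelay;
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                await initialize();
                return true; // initialized; stop retrying
            }
            catch (Exception ex)
            {
                logFailure(attempt, ex); // log, but do not stop the caller
                if (attempt == maxAttempts) break;
                await Task.Delay(delay);
                // Double the wait for the next attempt, never beyond maxDelay.
                delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
            }
        }
        return false; // still failing; the caller decides what to do next
    }
}

public static class Program
{
    public static async Task Main()
    {
        var calls = 0;
        // Simulated schema initialization that fails while the DB is "down".
        Func<Task> flakyInit = () =>
        {
            calls++;
            if (calls < 3) throw new TimeoutException("DB unavailable");
            return Task.CompletedTask;
        };

        var ok = await InitRetrySketch.InitializeWithBackoffAsync(
            flakyInit,
            maxAttempts: 5,
            initialDelay: TimeSpan.FromMilliseconds(10),
            maxDelay: TimeSpan.FromMilliseconds(100),
            logFailure: (n, ex) => Console.WriteLine($"init attempt {n} failed: {ex.Message}"));

        Console.WriteLine($"initialized = {ok} after {calls} calls");
    }
}
```

With the simulated DB, the first two attempts are logged as failures and the third succeeds, printing "initialized = True after 3 calls". Inside an actor, awaiting Task.Delay in a handler is not ideal; a more actor-friendly variant would keep the attempt count and delay as actor state and schedule a retry message to itself via the actor system's scheduler, with the same back-off arithmetic.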

@ondrejpialek (Contributor) commented Aug 1, 2019

This is a duplicate of #3665, but you may close the original ticket as there is more info here (or I can copy over the details from the other).

@Aaronontheweb (Member) commented Aug 1, 2019

Thanks! I'll be looking into this.

@Aaronontheweb (Member) commented Aug 1, 2019

@ondrejpialek @balcko so the issue with the CircuitBreaker described in solution 1 was already addressed and fixed in Akka.NET v1.3.13: #3754 - I'd strongly recommend upgrading to 1.3.14 to get the latest fixes there.

I can work on the back-off issue with auto-initialize = on, as that fix has merit on its own, but an immediate fix for the circuit breaker problem is already available.

@ondrejpialek (Contributor) commented Aug 2, 2019

We turned off auto-initialize on staging and production, so we are no longer affected by that problem (which I guess made the issue more prevalent?).

We did not encounter these issues since turning that off, but seeing the CircuitBreaker is still susceptible we will update soon :) Thanks!

@balcko (Author) commented Aug 2, 2019

Great, thanks, we will update to the latest version.
Regarding auto-initialization, it is not that critical, but it would be nice to have in the future. Thanks.

3 participants