Concurrent writers for Azure Blob Storage #110

Open · sgryt opened this issue Mar 23, 2016 · 5 comments

sgryt commented Mar 23, 2016

First off, let me start by noting that, yes, this suggestion is seemingly at odds with the lock-free design goal, but please bear with me a bit before discarding the idea entirely.

The thing is that, for some use cases, the Single Writer limitation hinders the scalability of the design: The writer process may not be able to keep up with the rate of updates in the system.

One such case is a not-entirely-fictional multi-tenant infrastructure SaaS system operated by Grean. We see a high enough frequency of temporally concurrent updates that a single writer cannot keep up, but at the same time the per-event-stream contention frequency is quite low (though not ignorable).

This has the interesting side-effect of making the Single Writer behave a lot like a global mutex in the system, which leads me back to the opening comment of this issue: For this use case, the system as-is does not really exhibit lock-free behaviour anyway.

Also, after having this system in production for about 6 months, we must conclude that operating a Single Writer process in Azure is not entirely predictable.
Following the same principles as described here, we have found that it is not entirely uncommon for the Azure Scheduler to either skip a ping entirely, or ping too early. That leads to periods of time where the Single Writer process is down, and commands pile up in the queue accordingly - making it even harder for the writer to keep up afterwards.

So, we are currently testing a prototype version of a concurrent writers implementation with a CQRS-style service. The implementation relies on ETags (for consistent reads during processing of individual commands), on short-lived blob leases (15 seconds) for locking, and on a 'self-healing' command processing implementation (so commands can be safely retried, even in the event of partially succeeding updates to the underlying storage).
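
For concreteness, the two storage primitives this leans on could look roughly like this with the Microsoft.WindowsAzure.Storage client - a minimal sketch, not the actual prototype; the connection string, container name and blob layout are all illustrative:

```fsharp
open System
open Microsoft.WindowsAzure.Storage
open Microsoft.WindowsAzure.Storage.Blob

let connectionString = "UseDevelopmentStorage=true"  // placeholder
let container =
    CloudStorageAccount.Parse(connectionString)
        .CreateCloudBlobClient()
        .GetContainerReference("events")             // illustrative layout

// Read a stream and remember its ETag for later conflict detection.
let readWithETag (streamId : string) =
    let blob = container.GetBlockBlobReference streamId
    let text = blob.DownloadText()
    text, blob.Properties.ETag

// Write under a 15-second lease (the shortest Azure allows); the lease
// expires on its own, so a crashed writer cannot hold the lock forever.
let appendUnderLease (streamId : string) (etag : string) (content : string) =
    let blob = container.GetBlockBlobReference streamId
    let leaseId = blob.AcquireLease(Nullable(TimeSpan.FromSeconds 15.), null)
    try
        let cond = AccessCondition.GenerateLeaseCondition leaseId
        cond.IfMatchETag <- etag  // fail if the blob changed since we read it
        blob.UploadText(content, accessCondition = cond)
    finally
        blob.ReleaseLease(AccessCondition.GenerateLeaseCondition leaseId)
```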

If the real-world tests deem it fit for production, we are considering contributing this feature to AtomEventStore, as a separate add-on along the lines of the AzureBlob add-on. It would, however, be an F# implementation - which hopefully won't make anyone frown upon it.

I'm submitting this issue now in the hope of getting feedback from other users of AtomEventStore on such a potential feature - particularly for critical review of the concept, but also to hear from others who may find it useful.

ploeh commented Mar 27, 2016

Are you using a single writer for all writes?

If so, I could imagine that scalability is impacted in a multi-tenant system.

Could you instead set up a writer per tenant? I'd imagine that each tenant's event store is independent of other tenants' event stores, so that shouldn't affect consistency.

Furthermore, even within a single tenant, if you have Aggregate Roots (in the DDD sense), these might be independent as well. If they are, couldn't you further assign a single writer per Aggregate Root?

At that point, you'd basically have yourself a nice, poor man's implementation of the Actor pattern.
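
Within a single process, that per-resource single writer can be sketched with F#'s built-in MailboxProcessor - a hypothetical shape, not anything AtomEventStore ships:

```fsharp
open System.Collections.Concurrent

type WriteCommand = { StreamId : string; Event : string }

// One mailbox per stream id: commands for the same stream are processed
// strictly one at a time, while different streams proceed concurrently.
let writers = ConcurrentDictionary<string, MailboxProcessor<WriteCommand>>()

let writerFor (appendToStorage : WriteCommand -> unit) (streamId : string) =
    writers.GetOrAdd(streamId, fun _ ->
        MailboxProcessor.Start(fun inbox ->
            let rec loop () = async {
                let! cmd = inbox.Receive()
                appendToStorage cmd  // only this mailbox ever touches the stream
                return! loop () }
            loop ()))

// Usage:
// (writerFor append "stream-42").Post { StreamId = "stream-42"; Event = "..." }
```

Of course, this only serializes writes within one process; the hard part, as the rest of the thread shows, is doing the same across deployed writer processes.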

sgryt commented Mar 27, 2016

Thanks for taking the time to pose these questions - it's much appreciated :)

Let me try to elaborate a bit:

Yes, ATM there is only one writer. We have considered ways of sharding the data wrt writers, but it's not obvious if we can do that.

The tenants' stores are not isolated - the core domain revolves around B2B collaboration, and being a tenant only means that the organization in question has more features available than non-tenant organizations (and some billing differences as well). In turn, that makes the writes (potentially) touch data across tenant boundaries. Essentially, tenant orgs can grant and revoke other organizations' access to their applications - including other tenants'. It would be quite a lot more tricky to model this if we had to impose single-writer restrictions in the domain model.

Having multiple single-writers sounds appealing, though. But it also sounds like an awful lot of complexity at the deployment layer - I must admit that I can't even think of a way to make this dynamic (when we have a new tenant to onboard, or an existing one to remove), given how Azure WebJobs are deployed.
But perhaps you have some insights on that part which would make this easy to integrate with GitHub-to-Azure deployment?

ploeh commented Mar 27, 2016

Not knowing your domain, I'm not sure how much value I can provide...

It's clear that having a universally single writer doesn't scale. Once you have enough write operations to keep the single writer active 100 % of the time, it can't scale beyond that.

As far as I can tell, the only way to address that issue is to have multiple concurrent writers.

Does your suggestion with optimistic locking address that problem differently? If you only have a single resource (a single event stream), how does that address the issue? Wouldn't you only have more failed write attempts, leading to worse throughput?

True, optimistic locking can work when writes are spread over multiple resources, but if you have multiple resources, couldn't you (again) have a single writer associated with each resource?

sgryt commented Mar 28, 2016

A thinking mind partaking in a discussion is almost always providing value, IME - and it certainly is the case here :)

I know I've been a bit short on the context... I thought that too many specifics here might just confuse the picture (trying to avoid dragging readers through too many internals). I hope that's correct, but feel free to ask for further details, and I'll try to present the relevant executive extract.

Regarding the number of resources: We have an event stream per aggregate, and the expected probability of concurrently executing updates on any given event stream is quite low (but non-zero). That is the primary reason we believe we can increase throughput substantially by having concurrent writers. But as you say, the same (and perhaps even better) throughput could be achieved if we could uniquely partition the resources to different writers. I'm just at a loss wrt how to actually go about implementing that, without some sort of intricate coordination mechanism between the writers (which, gut-feeling-wise, sounds like just another global locking problem to solve)?

In any case, sticking to single writer(s) - sharded or not - will likely not address the problems we are experiencing with the Azure Scheduler flakiness, or the less-than-simple deployment procedure for setting up a scheduled web job. It would be great to get the simplicity of always-on web jobs (just a git push away from source code to running jobs), along with the (potentially) increased throughput, even if it requires a bit more infrastructure-related code to handle the locking. So I guess I'm trying to ask if you can spot any underlying flaw in the reasoning that would make AtomEventStore entirely unfit for a concurrent writers scenario?

About locking schemes: Perhaps I'm rusty on the nomenclature, but I thought that I had described something along the lines of a pessimistic locking scheme? These are the 'phases' of the command processing we are using (probably fairly standard, modulo some retry policies and lease renewals; a code sketch follows the list):

  • Validate the command, typically by reading 1..N resources and checking that the command is valid for the given state of the system. For each resource read (on which the validation most likely relies, as it was read as part of the validation phase), remember the ETag
  • If validation succeeds, calculate which resources (event streams) to update
  • Take a (short-lived) lease on the blobs for each event stream. If any of the to-update resources were read during the validation phase (again, thus most likely partaking in the validation decision), make sure the ETag has not changed - otherwise, release any leases already taken and retry the command entirely
  • Update the event streams (save new events to storage)
  • Release the leases again
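
A minimal sketch of that flow, with every collaborator passed in as a function (all names hypothetical; retry policies and lease renewal elided):

```fsharp
// validate      : cmd -> Result<Map<string, string>, string>  (resource -> ETag read)
// planUpdates   : cmd -> string list                          (streams to update)
// tryLease      : string -> 'lease                            (15 s blob lease)
// etagUnchanged : string -> string -> bool
// applyEvents   : cmd -> string -> unit
// release       : 'lease -> unit
let rec processCommand validate planUpdates tryLease etagUnchanged applyEvents release cmd =
    match validate cmd with
    | Error why -> Error why                          // 1. command rejected
    | Ok (readEtags : Map<string, string>) ->
        let targets = planUpdates cmd                 // 2. streams to update
        let leases  = targets |> List.map tryLease    // 3. take short-lived leases
        let conflict =
            targets |> List.exists (fun t ->
                readEtags.ContainsKey t && not (etagUnchanged t readEtags.[t]))
        if conflict then
            leases |> List.iter release               //    conflict: drop leases ...
            processCommand validate planUpdates tryLease etagUnchanged
                           applyEvents release cmd    //    ... and retry the command
        else
            targets |> List.iter (applyEvents cmd)    // 4. append the new events
            leases |> List.iter release               // 5. release the leases
            Ok ()
```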

I should note that the short-lived lease period (15 seconds is the current minimum allowed by Azure) should make the probability of a deadlock very low - as Azure guarantees lock expiry. That may fail, of course, but we are willing to take the risk on that part.

Perhaps this is not exactly a classic pessimistic locking scheme, as we are not locking at read time, but it still seems more like pessimistic locking than its optimistic cousin?

ploeh commented Apr 12, 2016

There are a couple of different issues here, as far as I can tell.

Constancy

I'm not surprised to hear that when you run a scheduled web job, it'd occasionally miss a beat. In my opinion, it's a good thing if an application can handle messages occasionally piling up, but I can see that it'd be annoying if it happens too often.

One thing I should point out is that when I originally described how to fake a continuously Polling Consumer with scheduled tasks, I described the most cost-effective solution. This particular configuration is useful because you can run scheduled web jobs even on the free tier.

If you're running a web application in a paid tier, on the other hand, you now have access to continuous web jobs. By default, continuous web jobs run in parallel, but apparently, you can also configure them to run as singletons. I haven't tried this myself, but thought it worthwhile to point out, as it may address the constancy issue.
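
For reference, the singleton switch appears to be a settings.job file deployed next to the continuous WebJob's binaries:

```json
{ "is_singleton": true }
```

With that in place, the WebJob host should run the job on a single instance only, even when the site is scaled out.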

Capacity

Even if you can solve the constancy problem, you may still have so many writes that a single writer can't keep up.

How to partition a system into concurrent, but non-overlapping, writers depends on various factors, but this quickly seems to point in the direction of supervisor trees. It might be worthwhile, then, to look into Akka.NET, but again I can't claim experience with it, although I hear good things about it.

Additionally, if you don't mind the vendor lock-in, you could also look into that other Actor-based offering from Azure: Azure Service Fabric. Usual caveat: I don't really know what I'm talking about here.

Optimistic locking

My problem with any optimistic locking scheme is that I can't think of a way to implement it without round-tripping some sort of tag (timestamp, ETag, sequential ID) in the entire operation. When reads are separated from writes, you'll need a tag associated with your reads, and then when you write, you'll pass that tag to the persistent resource, so that concurrency conflicts can be identified.

This means that business logic is infected with persistence concerns.

As I write this, I wonder if this particular concern can be solved with the Reader monad, passing the tag around as the read-only state. I'll need to think more about this, but did you have a solution to this in mind?
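
A speculative sketch of that idea: wrap the read state in a tagged value, map the pure domain logic over it, and let only the infrastructure see the tag (the tag becomes read-only context in the Reader/functor sense):

```fsharp
// The tag travels with the value, but the domain function never sees it.
type Tagged<'a> = { Value : 'a; ETag : string }

let map (f : 'a -> 'b) (t : Tagged<'a>) = { Value = f t.Value; ETag = t.ETag }

// Pure domain logic: state in, new events out, no persistence concerns.
let decide (events : string list) : string list =
    if List.isEmpty events then [ "Initialized" ] else [ "Updated" ]  // hypothetical rule

// Infrastructure round trip: read with tag, map the domain over it,
// write conditionally on the tag it kept to itself all along.
let roundTrip (read : unit -> Tagged<string list>)
              (writeIfMatch : string -> string list -> unit) =
    let outcome = read () |> map decide
    writeIfMatch outcome.ETag outcome.Value
```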
