Allow (middleware) components to define internal actors #6538
Another issue in components-contrib that could benefit from internal actors: dapr/components-contrib#2635 |
Why not extend this to all component types and also to other parts of the runtime? For example, making the entire Resiliency feature distributed (global counts for circuits, retries) is a very valuable capability that could benefit from this (it may be that internal actors are abstracted well enough to be used already by other parts of the runtime). |
I think that's a good idea for things like circuits (maybe less for retries since that would require sharing the entire request's state). The downside of this approach of relying on actors (and why I am limiting it to middleware components for now) is that it requires having an actor runtime set up. So, a compatible actor state store, a placement service running, etc. Perhaps there could be a separate issue for other places where the actor runtime could be leveraged more? |
This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions. |
I am very interested in this issue to enable more internal actors.
I have investigated this and it seems that the work that's already been done for Workflow covers most of what's needed. The relevant interface is defined in dapr/pkg/actors/internal_actor.go, lines 34 to 44 at commit e57a91f.
The only "problem" is that to expose this to components in components-contrib, we have another issue with circular dependencies. The biggest problem in the interface above is the type shown at lines 76 to 96 in e57a91f.
Some of these methods are internal and don't need to be exposed to an internal actor, but even after simplifying the interface, we have a lot of types that need to be exposed (#6892). The most obvious solution to this problem would be to move the code to dapr/kit, but that would mean essentially moving the entire actors package (or at least most of its types). What are your thoughts @dapr/maintainers-dapr? |
Moving to /kit will cause governance issues. Moving to a new repository to host the actors code with the same governance as dapr/dapr is doable, but we need to be entirely convinced first about why this can't be done via interface/package refactoring in this repository. |
An external repo, be it kit or a new one, would also be a bit of a nightmare from a management standpoint. A package refactoring would not work due to the circular dependency. An interface would not be enough since we have all the *Request and *Response structs and, more crucially, other shared types. I'm not sure what the right answer is. |
I am not in favor of adding a circular dependency between other building blocks and actors for a few reasons.
Regarding OAuth and rate limiting, there are ways to share state without actors and we should favor those first. I will share here in this issue some references on how Redis, Memcached and similar stacks can be used. The abstraction here might simply be an optional reference to a state store that a middleware component can add. Regarding workflows, bringing logic to components-contrib seems odd too. Cleaning up workflow history can perfectly fit in the runtime. |
The benefits of using actors are:
I don't believe bringing this logic to components-contrib was something that was proposed. The case above was about using an internal actor for cleanups, but wasn't about using it in components-contrib. |
I agree that actors has advantages but we can solve the problems on each scenario without using actors. As I said, it is dangerous circular dependency that will cause more issues than it will solve. I am not in favor of this direction for the project. |
Looking forward hearing your suggestions. We are currently blocked by this on things like dapr/components-contrib#2967 (the component isn't really usable today in some scenarios) |
These are solved problems in the industry today and don't use actors. I am doing a simple Google search and rate limiting is a common scenario for it. It makes total sense to have a Redis-specific implementation for a rate limiter: https://redis.io/commands/incr/#pattern-rate-limiter If someone finds a way to write one with SQL, that would not be as common, so I don't think we need an interface. Just let the middleware be specific per stack. I will do more searching and find more about OAuth. Again, it is a solved problem in the industry - we are not the first ones to solve this. |
Yes, that's what as a contrib maintainer I do not want, because it will lead to others making (or asking for) implementations for N more components. The burden of maintaining that will be on us. |
Well, the industry has moved towards that direction already. I doubt there will be many more. SQL is not widely used for rate limiting. I doubt you will have more than a couple of stateful rate limiters. Having a native Redis implementation benefits the entire community and is a common implementation each company has to do. It seems a perfect fit for a specific component. As a contrib maintainer, I vote to have it. (Are we tracking this specific proposal somewhere? PR? Issue?) |
Regarding dapr/components-contrib#2967 - it seems to be solvable by using the state store interface directly, using the TTL feature to auto-delete old keys. Then, add an in-memory cache on top, respecting the TTL value (@JoshVanL added this recently), and you have the tokens saved. If you want to improve reuse, there is an opportunity to create a generic cached state store implementation that can take any state store and add a cache layer on top with a cache replacement policy (LRU, FIFO, etc.), with the advantage of being easily adopted with existing Dapr APIs. |
Maybe, but it's not just SQL. What about Memcached? CosmosDB? etcd? You can see where I'm going and where the community will go. Assuming we do go down that road, it also solves the rate-limiter issue, but we'll have the OAuth2 middleware too, and possibly other scenarios in the future. I really do not believe the additional overhead on maintainers and contributors is worth the effort just because leveraging an existing feature of the Dapr runtime seems odd to you.
That requires users to provide their own state store, so we need to ask the runtime to give us the initialized component. You still have the "circular dependency". PS: this issue was opened on June 18 and I really wish we had heard those concerns before. There was some conversation already. We did delay fixing dapr/components-contrib#2967 because we were waiting for this, and it is impacting users... |
There are some discussions on #6935, which is aiming to give distributed rate-limiting capabilities (rather than the current per-process rate-limiting). The thinking there is that having the rate-limit state be handled directly by Dapr would be beneficial, as it simplifies the user configuration as well as removing the need to add support for multiple stores. Internal actors feel like they would be a good candidate for this scenario (assuming they were made available). |
Specifically for the rate-limiting scenario, using Redis is the fastest and most efficient option. While using internal Actors is a convenient option, it appears to have more hops to get to the Actor to get/set token bucket and that is potentially done frequently across multiple Pods. Using the built-in features of Redis for this specific use-case makes sense IMHO. |
@sujitdmello I am afraid that any implementation in a component (such as the rate limiter middleware) that is specific to a service/store (such as just Redis) would not work. The design needs to leverage existing components to prevent having to re-implement support for various state stores again. Actors still feel like the most appropriate solution to me. They already interface with most state stores, have built-in concurrency control that is more reliable than using etags (optimistic concurrency), and have built-in routines for caching in-memory. Additionally, each Dapr runtime already comes with an actor state store when configured. If we had to reference components without using actors, we'd need to worry about referencing external components, initialization order, etc. |
So in summary my understanding is:
How does that sound @artursouza @yaron2 @mukundansundar @daixiang0? I agree with the technologies you pointed out @artursouza, but I also think that this doesn't have to be mutually exclusive at all. So I'd like us to make both happen! |
I don't think that implementing rate limiting with Actors is the right thing for the project. It creates a circular dependency that will be hard to debug in runtime. At this point, I am not in favor of this change. |
@artursouza can you please go into more detail why this is harder to debug than say debugging a Redis implementation? I'd like to especially understand where the circular dependency comes into play in making this unreasonably difficult. |
In addition, actors are not designed for high availability and can experience downtime when rebalancing, causing downtime for rate limiting. |
This is factually incorrect. Yes, actors can go down at any time. For a rate-limiter, losing the state that is stored in-memory is generally not an issue, as it would just reset the rate-limiting counter. For other situations where state matters, state can be persisted on the actor store. But an actor can always be invoked, even during rebalancing, on some host. This is a behavior we have tests (E2E and integration) for.
Given the problem that we are facing, which is impacting actual Dapr users that have been waiting for months, what would your recommended alternative be? Even using state stores (not actors) does add a circular dependency, if nothing else because of the component registry. |
Note that distributed rate limiters do not offer exact counting generally, so some inaccuracies are acceptable with all rate limiters - e.g. some events not getting counted, or getting counted too late. Of course poor latency or a complete loss of the current rate limiting bucket values would not be ok. |
I am repeating myself here. I already proposed, though it was perhaps ignored, to implement this by exposing the state store interface to middlewares: a horizontal dependency in contrib. Users have asked for this for Redis, which could have been implemented already if there is an actual rush. "No is temporary, yes is forever." |
Essentially what we need for a distributed rate limiter is an atomic incrementer. We can use etags to achieve that. If we can do that we'll get the following benefits:
|
When you think about this, this also has a dependency on the runtime. Yes, the component's code lives in contrib, but the registry and initialization live in the runtime. The runtime must be the one initializing components because it needs to fetch secrets from the Operator. Unless you are suggesting that each component initializes other components. This would be horizontal indeed, but then we'd need to support the metadata for those components into the "parent" components. For example, a middleware would have to add fields for all the fields supported by the state store it wants to use. This is a mess from the perspective of the user.
The most pressing issue is actually OAuth2, which today doesn't work. Either you are limited to 1 instance of your app (keeping the token in-memory), or it stores the token in a cookie, and then it can't work with Azure AD because the tokens don't fit in a cookie. |
I am personally interested in saving users compute resources and money in various clouds or even their own self hosted deployments. For this reason as I mentioned it is preferable to reuse an existing database server / container that is already deployed / provisioned. Hope that makes sense. A separate Redis deployment as a rate limiter may appeal to some users. I agree - but I personally am very turned off by this world where we make everyone deploy the whole kitchen sink for tiny features from the respective services. |
Yes. For that reason, we can expose the state store API, which is a much simpler solution and easier to debug and operate than actors. That is my proposal to solve this problem. |
If we can achieve rate limiting like this (#6538 (comment)) it will further reduce consumption of resources as both actor hosts and placement are known to consume memory as scale increases. In addition you could use rate limiters and skip installing placement altogether |
Please, let's not get carried away too much with rate-limiting. The problem is more generic. I mentioned OAuth2 is the biggest issue right now. I have also explained above why I don't think state stores are the solution: something needs to initialize the component, so it's either a circular dependency or a hell of a messy job: #6538 (comment). Additionally, actors can keep data in-memory while they're running, so we avoid I/O with the state store on every call. |
Let's say we can find a solution using state stores (like etags and such), wouldn't we still need a reference from runtime with an initialized state store? I think there is still a circular dependency no matter what. |
The middleware (in contrib) would need to accept a state.StateStore interface that also lives in contrib. |
But the state store object (which complies with that interface) is instantiated by the runtime. So you have your circular dependency there too? My understanding is that @artursouza doesn't want any circular dependency, so we don't really have options. He defined circular dependencies as existing at runtime too.
Then you'd need to figure a way in the metadata of the middleware to know which state store you're referencing. Using actors:
I don't see much difference there, except that with actors we can keep data in-memory and avoid (expensive) I/O on every call... That said, using state stores would still be acceptable in the case you're describing above (although I like them much less because they always require I/O and are a lot slower for things like OAuth tokens). But this doesn't allow getting away from the "circular dependency" Artur is concerned with. |
I get what you are saying. Let me see if I can write this up so everybody understands :) |
That is a much-reduced circular dependency and we can eventually refactor that into contrib 100%. Having OAuth or the rate limiter depend on actors is a much bigger pill to swallow, with ops concerns. We can implement a cache layer too and avoid a read for each request on OAuth. Also, the rate limiter can be eventually consistent, which allows caching there too. A combination of a token bucket with async reads and writes to the state store can be really fast, with a quick time to market for users. |
In the runtime, during component initialization, we somehow need to get a reference to the state store to be used. The name can be obtained from the Middleware metadata. Then we either init that state store, or load the existing reference to it and pass it into the constructor for the middleware (see dapr/cmd/daprd/components/middleware_http_oauth2.go, lines 28 to 34 at commit 770a047).
And in components-contrib we'd update the interface to take the state store reference. This would become:

```go
// Middleware is the interface for a middleware.
type Middleware interface {
	GetHandler(ctx context.Context, statestore state.Store, metadata Metadata) (func(next http.Handler) http.Handler, error)
}
```

For a strong reference to the state store name to be used, we should add this to the middleware metadata:

```go
type Metadata struct {
	metadata.Base `json:",inline"`
	Statestore    string `json:"statestore"`
}
```

Does that sound right? That would not change anything about imports in either contrib or runtime. |
The circular dependency is a secondary win/concern. In addition to what I described above, we should consider that a lot of users use Redis and other NoSQL DBs for rate limiting/session tokens because of their performance. As we know, Redis as an actor state store is deprecated and not recommended, which will further limit users from using a state store that's fit for purpose for this scenario. So for me, the main benefits are:
|
Another possibility this enables (not saying we have to go down that route): I'm a bit worried about the overhead of the Dapr envelope for all of these operations. |
For sure, we can create granular interfaces for internal use and as these grow to critical mass we can even look at these as candidates for state store API expansion |
@berndverst your design has some valid thoughts, although:
However, I still think this is a much worse idea than using actors. It requires re-implementing caching, worrying about when the cache needs to be consistent or not, does more I/O with storage, and OCC with etags means that operations will often need to be retried (see what's going on with reminders that fail after a certain load due to too many etag conflicts). This is gonna hurt us badly down the road like it's hurting us with reminders, and in the best-case scenario requests to the state store will be retried many times (adding lots of latency). Why reinvent the wheel when we already have it done, and have it done well (and it is getting better)?
As mentioned before many times, the choice of actor state store is really irrelevant for rate-limiting, as with actors the state would be kept in-memory and there'd be no operation on the state store at all... Heck, in this specific example (rate-limiting), assuming you don't use actors for anything else, you could even initialize the actor runtime with the in-memory component as actor state store, as it doesn't matter at all. Actors aren't just for storage; they have business logic too. In the case of rate-limiting, we wouldn't just store the state in the actor: we'd send the caller IP and ask the actor to tell us if the call is authorized. This allows performing the rate-limiting with 1 network call only instead of 2 (read and set). So this should offer half the latency for rate limiting (assuming calls to the persistent store have the same latency as calls to actors within the cluster). This also means the same actor can be re-used, for example for a rate-limiter binding too! Lastly, while actors are a bit more complex, if we use the actors APIs directly (no SDK) it's not too bad. And if we are scared of using actors ourselves, what message are we sending to our users?? |
Actors, with their single-threaded nature, can become a pretty big bottleneck even when saving state in-memory, in terms of the waited I/O calls. However, this doesn't concern me so much, for the same reason I'm not concerned too much about the I/O operations to the database - both incrementing and reading can be done asynchronously and outside the path of the main call, as we've concluded that counting can both exceed the threshold and be updated in an eventually consistent way without real-time coordination between all instances. More so, if there's an I/O bottleneck waiting for entry into an actor, that might create a noisy-neighbor problem for other functionality of the Dapr sidecar hosting that actor instance. The only selling point for me for using actors is the in-memory serving, so now it's a question of tradeoffs - does this feature justify all the disadvantages I mentioned above, also considering we can, with little work, still implement in-memory serving at the middleware component level? Re: the message to users, to me it's more about sending the message that we do our best to avoid the Law of the Instrument and use the right tool for the right job, not a certain tool simply because it can also do that job. |
Your suggestion of using etags is pretty much limited by the single-threaded nature too. If you need to read from the state store, get the etag, and then save with the etag, you're still limited to one concurrent transaction. If there were more than one concurrent transaction, you'd have etag conflicts, which means the read-write loop needs to be repeated until it succeeds - like we do with reminders (and the performance of reminders is just abysmal).
Surely there's a trade-off here with how "eventual" the consistency is. With a limit of X rps and N replicas, you may have bursts where you're allowing N * X requests, which may or may not be acceptable. Ideally this would have strong consistency, albeit with the etag conflict issues.
I'm not following this. Isn't the queue per-actor?
I'm actually convinced actors are the right tool for this job. I have explained why multiple times, and it's the most generic and universal solution for the problems we have today and in the future. Bernd almost convinced me that state stores were a viable options until I realized how they have issues with etag conflicts, how they require re-implementing caches everywhere, and how they involve making trade-off between performance and consistency. Most critically, I really worry using OCC (etags) we will box ourselves with having to deal with the same issues we have with reminders today. |
Handling eTags is common when implementing distributed systems. The problem with reminders is that we are trying to persist a list on a limited set of keys, not because we are handling eTags. I think we have plenty of data to reach a decision among maintainers of runtime. We don't need to have unanimous agreement. People have different views on architecture principles and that is OK. I vote (binding vote) for using state store interface to implement OAuth and rate limiter middlewares. |
I definitely agree @artursouza that the issue with reminders is not etags but the underlying data structure we chose. I would like a little more time to examine this issue.
Yes, but the main difference is that the Dapr sidecar is not blocking connections due to single-threaded execution, as that happens in the DB instead.
It is, but an actor instance is always a representation of a resource (for example, a URL path for a given app ID). If the rate limiting representation of actors means there will be concurrent hits on the same resource (actor ID) this will lead to somewhat of an undesirable way of using actors. |
That is true, but it's also true that etags are a big part of the problem too. I have been working on actors a lot lately and I can confirm this is the case. For example, the reason why the Workflow perf test fails 100% of times with 3 replicas of the app (and frequently with 2 replicas too) is because of etag conflicts while storing reminders. Essentially, 2 instances try to store reminders at the same time and they are stuck in a loop where they fetch data with one etag, and by the time they go write it, the etag has changed and the write fails. This is the typical problem with "distributed systems" that rely on OCC. The solution is to either fail (causing errors to be cascaded) or retry, possibly for an infinite amount of time, which increases latency significantly.
There can be workarounds for this. I have some ideas (that will be in a separate proposal), such as being able to run an actor concurrently if we know that it doesn't need single-threaded execution (like in the case of tokens - but it will need to be single-threaded for rate-limiting).
This impacts components-contrib too, probably more than runtime (for the benefits), and contrib maintainers should be involved in the decision as well. |
Note this is about extending the capabilities of existing components, not creating a new component class.
Internal actors have been added in Dapr 1.10 to enable workflows. They allow the runtime itself to act as actor host for some actor types.
It would be helpful to extend the ability to register internal actor types to components, so that components can host actors when needed to perform certain tasks.
The most immediate need for this is to enable the Rate Limiter middleware to perform rate-limiting across multiple instances of the same service. Right now, the state is kept in-memory, so if the service is scaled horizontally, each instance has to manage its own rate-limiting.
From an implementation standpoint, this could be enabled by exposing the method actors.RegisterInternalActor to components, for example by adding it to the context that's passed to the components' Init method.