Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow (middleware) components to define internal actors #6538

Open
ItalyPaleAle opened this issue Jun 18, 2023 · 49 comments
Open

Allow (middleware) components to define internal actors #6538

ItalyPaleAle opened this issue Jun 18, 2023 · 49 comments
Labels
Milestone

Comments

@ItalyPaleAle
Copy link
Contributor

ItalyPaleAle commented Jun 18, 2023

Note this is about extending the capabilities of existing components, not creating a new component class.

Internal actors have been added in Dapr 1.10 to enable workflows. They allow the runtime itself to act as actor host for some actor types.

It would be helpful to extend the ability to register internal actor types to components, so that components can host actors when needed to perform certain tasks.

The most immediate need for this is to enable the Rate Limiter middleware to perform rate-limiting across multiple instances of the same service. Right now, the state is kept in-memory, so if the service is scaled horizontally, each instance has to manage its own rate-limiting.

From an implementation standpoint, this could be enabled by exposing the method actors.RegisterInternalActor to components, for example adding it to the context that’s passed to the components’ Init method

EDIT: Updated the proposal to indicate this is meant for middleware components only. At least at this stage, it may not make much sense to allow this for other kinds of components.

@ItalyPaleAle
Copy link
Contributor Author

Another issue in components-contrib that could benefit from internal actors: dapr/components-contrib#2635

@ItalyPaleAle ItalyPaleAle changed the title Allow components to define internal actors Allow (middleware) components to define internal actors Jul 3, 2023
@yaron2
Copy link
Member

yaron2 commented Jul 5, 2023

Why not extend this to all component types and also to other parts of the runtime? For example, making the entire Resiliency feature distributed (global counts for circuits, retries) is a very lucrative feature that could benefit from this (It may be that internal actors are abstracted well enough to be used already by other parts of the runtime).

@ItalyPaleAle
Copy link
Contributor Author

Why not extend this to all component types and also to other parts of the runtime? For example, making the entire Resiliency feature distributed (global counts for circuits, retries) is a very lucrative feature that could benefit from this (It may be that internal actors are abstracted well enough to be used already by other parts of the runtime).

I think that's a good idea for things like circuits (maybe less for retries since that would require sharing the entire request's state).

The downside of this approach of relying on actors (and why I am limiting it to middleware components for now) is that it requires having an actor runtime set up. So, a compatible actor state store, a placement service running, etc.

Perhaps there could be a separate issue for other places where the actor runtime could be leveraged more?

@dapr-bot
Copy link
Collaborator

dapr-bot commented Sep 3, 2023

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@dapr-bot dapr-bot added the stale Issues and PRs without response label Sep 3, 2023
@ItalyPaleAle ItalyPaleAle added this to the v1.13 milestone Sep 5, 2023
@ItalyPaleAle ItalyPaleAle removed the stale Issues and PRs without response label Sep 5, 2023
@ItalyPaleAle
Copy link
Contributor Author

ItalyPaleAle commented Sep 6, 2023

I am very interested in this issue to enable more internal actors.

  • Components, especially middleware, can greatly benefit from them
    • Examples include rate limiting, OAuth2
  • In the runtime, we can use them for other background tasks too, so they are coordinated even when multiple apps are running. Some examples could include:

I have investigated this and it seems that the work that's already been done for Workflow covers most of what's needed.

The InternalActor interface is all that a struct needs to implement:

// InternalActor represents the interface for invoking an "internal" actor (one which is built into daprd directly).
type InternalActor interface {
SetActorRuntime(actorsRuntime Actors)
InvokeMethod(ctx context.Context, actorID string, methodName string, data []byte) (any, error)
DeactivateActor(ctx context.Context, actorID string) error
// InvokeReminder invokes reminder logic for an internal actor.
// Note that the DecodeInternalActorReminderData function should be used to decode the [data] parameter.
InvokeReminder(ctx context.Context, actorID string, reminderName string, data []byte, dueTime string, period string) error
InvokeTimer(ctx context.Context, actorID string, timerName string, params []byte) error
}

The only "problem" is that to expose this to components in components-contrib, we have another issue with circular dependencies.

The biggest problem in the interface above is SetActorRuntime(actorsRuntime Actors) which involves setting the entire actor runtime's interface, so the internal actor can perform operations.

The Actors interface contains links to a lot of objects that are defined in dapr/dapr:

// Actors allow calling into virtual actors as well as actor state management.
//
//nolint:interfacebloat
type Actors interface {
io.Closer
Call(ctx context.Context, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error)
Init(context.Context) error
GetState(ctx context.Context, req *GetStateRequest) (*StateResponse, error)
GetBulkState(ctx context.Context, req *GetBulkStateRequest) (BulkStateResponse, error)
TransactionalStateOperation(ctx context.Context, req *TransactionalRequest) error
GetReminder(ctx context.Context, req *GetReminderRequest) (*internal.Reminder, error)
CreateReminder(ctx context.Context, req *CreateReminderRequest) error
DeleteReminder(ctx context.Context, req *DeleteReminderRequest) error
RenameReminder(ctx context.Context, req *RenameReminderRequest) error
CreateTimer(ctx context.Context, req *CreateTimerRequest) error
DeleteTimer(ctx context.Context, req *DeleteTimerRequest) error
IsActorHosted(ctx context.Context, req *ActorHostedRequest) bool
GetActiveActorsCount(ctx context.Context) []*runtimev1pb.ActiveActorsCount
RegisterInternalActor(ctx context.Context, actorType string, actor InternalActor) error
}

Some of these methods are internal and don't need to be exposed to an internal actor, but even simplifying the interface, we have a lot of types that need to be exposed. (#6892)

The most obvious solution to this problem would be to move the code to dapr/kit, but that would mean essentially moving the entire actors (or at least most of the types, such as CreateReminderRequest etc) runtime to dapr/kit, with possible maintenance issues in the future. A big issue is also that Call requires the invokev1 package.

What are your thoughts @dapr/maintainers-dapr ?

@yaron2
Copy link
Member

yaron2 commented Sep 7, 2023

Moving to /kit will cause governance issues. Moving to a new repository to host the actors code with the same governance as dapr/dapr is doable, but we need to be entirely convinced first about why this can't be done via interface/package refactoring in this repository.

@ItalyPaleAle
Copy link
Contributor Author

An external repo repo, be it kit or a new one, would also be a bit of a nightmare from a management standpoint.

A package refactoring would not work due to the circular dependency.

An interface would not be enough since we have all the *Request and *Response structs, but more crucially we have the invokev1.InvokeMethodRequest and invokev1.InvokeMethodResponse types.

I'm not sure what the right answer is.

@artursouza
Copy link
Member

I am not in favor of adding a circular dependency between other building blocks and actors for a few reasons.

  1. Actors is a special building block that is built on top of others, at the runtime. The interface might solve a circular dependency at compilation time but it still creates a circular dependency at runtime - which is way worse. Imagine trying to apply a middleware to an actors API, now it becomes a circular dependency.

Regarding OAuth and rate limiting, there are ways to share state without actors and we should favor those first. I will share here in this issue some references on how Redis, Memcached and similar stack can be used. The abstraction here might be simply an optional reference to a state store that a middleware component can add.

Regarding workflows, bringing logic to components contrib seems odd too. Cleaning up workflow history can perfectly fit in runtime.

@ItalyPaleAle
Copy link
Contributor Author

The benefits of using actors are:

  • It includes support for storing state without having to explicitly add another component. For example, if we were to allow any state store, we'd need users to create a new component specifically for that, and then a way to reference that.
  • It allows keeping state in-memory, which avoids the need to make requests to persistent storage every time (which are slower). Of course, this state can be lost at any time, but in some cases that is acceptable.
  • It offers the ability to use reminders to perform garbage collection, both in the state store and in external services as needed.
  • It offers concurrency control without the need to mess with etags
  • It's a one-stop solution that would fix a lot of problems we have today and in the future, and doesn't require one-off implementations that will need to be maintained by contrib maintainers in the future
  • It allows us to dogfood actors more, which is probably something we need to do

Regarding workflows, bringing logic to components contrib seems odd too. Cleaning up workflow history can perfectly fit in runtime.

I don't believe bringing this logic to components-contrib was something that was proposed. The case above was about using an internal actor for cleanups, but wasn't about using it in components-contrib.

@artursouza
Copy link
Member

The benefits of using actors are:

  • It includes support for storing state without having to explicitly add another component. For example, if we were to allow any state store, we'd need users to create a new component specifically for that, and then a way to reference that.
  • It allows keeping state in-memory, which avoids the need to make requests to persistent storage every time (which are slower). Of course, this state can be lost at any time, but in some cases that is acceptable.
  • It offers the ability to use reminders to perform garbage collection, both in the state store and in external services as needed.
  • It offers concurrency control without the need to mess with etags
  • It's a one-stop solution that would fix a lot of problems we have today and in the future, and doesn't require one-off implementations that will need to be maintained by contrib maintainers in the future
  • It allows us to dogfood actors more, which is probably something we need to do

Regarding workflows, bringing logic to components contrib seems odd too. Cleaning up workflow history can perfectly fit in runtime.

I don't believe bringing this logic to components-contrib was something that was proposed. The case above was about using an internal actor for cleanups, but wasn't about using it in components-contrib.

I agree that actors has advantages but we can solve the problems on each scenario without using actors. As I said, it is dangerous circular dependency that will cause more issues than it will solve. I am not in favor of this direction for the project.

@ItalyPaleAle
Copy link
Contributor Author

Looking forward hearing your suggestions. We are currently blocked by this on things like dapr/components-contrib#2967 (the component isn't really usable today in some scenarios)

@artursouza
Copy link
Member

artursouza commented Oct 23, 2023

These are solved problems in the industry today and don't use actors. I am doing a simple Google search and rate limiting is a common scenario for it. It makes total sense to have a Redis-specific implementation for rate limiter: https://redis.io/commands/incr/#pattern-rate-limiter

If someone finds a way to write one with SQL, that would not be as common, so I don't think we need an interface. Just let the middleware be specific per stack.

I will be doing more Google searching and find more about OAuth. Again, it is a solved problem in the industry - we are not the first ones to solve this.

@ItalyPaleAle
Copy link
Contributor Author

It makes total sense to have a Redis-specific implementation for rate limiter

Yes, that's what as a contrib maintainer I do not want, because it will lead to others making (or asking for) implementations for N more components. The burden of maintaining that will be on us.

@artursouza
Copy link
Member

It makes total sense to have a Redis-specific implementation for rate limiter

Yes, that's what as a contrib maintainer I do not want, because it will lead to others making (or asking for) implementations for N more components. The burden of maintaining that will be on us.

Well, the industry has moved towards that direction already. I doubt there will be many more. SQL is not widely used for rate limiting. I doubt you will have more than a couple of stateful rate limiters. Having a native Redis implementation benefits the entire community and is a common implementation each company has to do. It seems a perfect fit for a specific component. As a contrib maintainer, I vote to have it. (Are we tracking this specific proposal somewhere? PR? Issue?)

@artursouza
Copy link
Member

Regarding, dapr/components-contrib#2967 - it seems to be solvable by using the state store interface directly, using the TTL feature to auto-delete old keys. Then, add an in-memory cache on top, respecting the ttl value (@JoshVanL added this recently) and you have the tokens saved.

If you want to improve reuse, there is an opportunity to create a generic cached state store implementation that can take any state store and add a cache layer on top with a cache replace policy (LRU, FIFO, etc). With the advantage of being easily adopted with existing Dapr APIs.

@ItalyPaleAle
Copy link
Contributor Author

ItalyPaleAle commented Oct 23, 2023

SQL is not widely used for rate limiting. I doubt you will have more than a couple of stateful rate limiters.

Maybe, but it's not just SQL. What about Memcached? CosmosDB? etcd? You can see where I'm going and where the community will go.

Assuming we do go down that road, it also solves the rate-limiter issue, but we'll have the OAuth2 middleware too, and possibly other scenarios in the future.

I really do not believe the additional overhead on maintainers and contributors is worth the effort just because leveraging an existing feature of the Dapr runtime seems odd to you.

Regarding, dapr/components-contrib#2967 - it seems to be solvable by using the state store interface directly, using the TTL feature to auto-delete old keys

That requires users to provide their own state store, so we need to ask the runtime to give us the initialized component. You still have the "circular dependency".

PS: this issue was opened on June 18 and I really wish we had heard those concerns before. There was some conversation already. We did delay fixing dapr/components-contrib#2967 because we were waiting for this, and it is impacting users...

@stuartleeks
Copy link
Contributor

There are some discussions on #6935 which is aiming to give distributed rate-limited capabilities (rather than the current per-process rate-limiting).

The thinking for that is that having the rate-limit state be handle directly by Dapr would be beneficial as it simplifies the user configuration as well as removing the need to add support for multiple stores.

Internal actors feel like they would be a good candidate for this scenario (assuming they were made available)

@sujitdmello
Copy link

Specifically for the rate-limiting scenario, using Redis is the fastest and most efficient option. While using internal Actors is a convenient option, it appears to have more hops to get to the Actor to get/set token bucket and that is potentially done frequently across multiple Pods. Using the built-in features of Redis for this specific use-case makes sense IMHO.

@ItalyPaleAle
Copy link
Contributor Author

@sujitdmello i am afraid that any implementation in a component (such as the rate limiter middleware) that is specific to a service/store (such as just redis) would not work. The design needs to leverage existing components to prevent having to re-implement support for various state stores again.

actors still feel the most appropriate solution to me. They already interface with most state stores, have built-in concurrency control that is more reliable than using etags (opportunistic concurrency), and have built-in routines for caching in-memory.

additionally each dapr runtime already comes with an actor state store when configured. If we had to reference components without using actors, we’d need to worry about referencing external components, initialization orders, etc.

@berndverst
Copy link
Member

berndverst commented Nov 9, 2023

So in summary my understanding is:

  • The component interfaces could be updated such that they receive an optional reference to the actor runtime (an easy change).
  • In runtime you can pass nil for this reference, and only pass in the actor runtime for the components that really require it (e.g. generic rate limiter component). This is how runtime maintainers can be certain that components do not arbitrarily start relying on actors.
  • Independently, there can still be native Redis rate limiter and other component implementations. None of this work is preventing any of that, and I will support such efforts.

How does that sound @artursouza @yaron2 @mukundansundar @daixiang0?

I agree with the technologies you pointed our @artursouza, but I also think that this doesn't have have to be mutually exclusive at all. So I'd like us to make both happen!

@artursouza
Copy link
Member

I don't think that implementing rate limiting with Actors is the right thing for the project. It creates a circular dependency that will be hard to debug in runtime. At this point, I am not in favor of this change.

@berndverst
Copy link
Member

I don't think that implementing rate limiting with Actors is the right thing for the project. It creates a circular dependency that will be hard to debug in runtime. At this point, I am not in favor of this change.

@artursouza can you please go into more detail why this is harder to debug than say debugging a Redis implementation?

I'd like to especially understand where the circular dependency comes into play in making this unreasonably difficult.

@artursouza
Copy link
Member

In addition, actors are not designed for high availability and can be downtime when rebalancing. Causing downtime to rate limiting.

@ItalyPaleAle
Copy link
Contributor Author

In addition, actors are not designed for high availability and can be downtime when rebalancing. Causing downtime to rate limiting.

This is factually incorrect.

Yes, actors can go down at any time. For a rate-limiter, losing the state that is stored in-memory is generally not an issue, as it would just reset the rate-limiting counter. For other situations where state matters, state can be persisted on the actor store.

But an actor can always be invoked, even during rebalancing, on some host. This is a behavior we have tests (E2E and integration) for.

At this point, I am not in favor of this change.

Given the problem that we are facing, which is impacting actual Dapr users that have been waiting for months, what would your recommended alternative be? Even using state stores (not actors) does add a circular dependency, if nothing else because of the component registry.

@berndverst
Copy link
Member

In addition, actors are not designed for high availability and can be downtime when rebalancing. Causing downtime to rate limiting.

Note that distributed rate limiters do not offer exact counting generally, so some inaccuracies are acceptable with all rate limiters - e.g. some events not getting counted, or getting counted too late. Of course poor latency or a complete loss of the current rate limiting bucket values would not be ok.

@artursouza
Copy link
Member

I am repeating myself here. I proposed already, maybe ignored, to implement it via exposing the state store interface to middlewares. A horizontal dependency in contrib.

Users have asked this for redis, which would have been implemented already if there is an actual rush.

"No is temporary, yes is forever."

@yaron2
Copy link
Member

yaron2 commented Nov 9, 2023

Essentially what we need for a distributed rate limiter is an atomic incrementer. We can use etags to achieve that. If we can do that we'll get the following benefits:

  1. Contrib depends on contrib only and we avoid a circular dependency
  2. Users have more options for state stores
  3. User experience is somewhat better as users do not have to wonder why they need an actor state store for rate limiters when not using actors
  4. Debugging would be easier, as etags don't carry the same amount of issues that actors might bring

@ItalyPaleAle
Copy link
Contributor Author

I proposed already, maybe ignored, to implement it via exposing the state store interface to middlewares. A horizontal dependency in contrib.

When you think about this, this also has a dependency on the runtime.

Yes, the component's code lives in contrib, but the registry and initialization live in the runtime. The runtime must be the one initializing components because it needs to fetch secrets from the Operator.

Unless you are suggesting that each component initializes other components. This would be horizontal indeed, but then we'd need to support the metadata for those components into the "parent" components. For example, a middleware would have to add fields for all the fields supported by the state store it wants to use. This is a mess from the perspective of the user.

Users have asked this for redis, which would have been implemented already if there is an actual rush.

The most pressing issue is actually OAuth2, which today doesn't work. Either you are limited to 1 instance of your app (keeping token in-memory), or it sotres in a cookie and it can't work with Azure AD because the tokens don't fit in a cookie.

@berndverst
Copy link
Member

Users have asked this for redis, which would have been implemented already if there is an actual rush.

"No is temporary, yes is forever."

I am personally interested in saving users compute resources and money in various clouds or even their own self hosted deployments. For this reason as I mentioned it is preferable to reuse an existing database server / container that is already deployed / provisioned. Hope that makes sense.

A separate Redis deployment as a rate limiter may appeal to some users. I agree - but I personally am very turned off by this world where we make everyone deploy the whole kitchen sink for tiny features from the respective services.

@artursouza
Copy link
Member

Users have asked this for redis, which would have been implemented already if there is an actual rush.

"No is temporary, yes is forever."

I am personally interested in saving users compute resources and money in various clouds or even their own self hosted deployments. For this reason as I mentioned it is preferable to reuse an existing database server / container that is already deployed / provisioned. Hope that makes sense.

A separate Redis deployment as a rate limiter may appeal to some users. I agree - but I personally am very turned off by this world where we make everyone deploy the whole kitchen sink for tiny features from the respective services.

Yes. For that reason, we can expose the state store API / which is a much simpler solution and easier to debug and operate than actors. Which is my proposal to solve this problem.

@yaron2
Copy link
Member

yaron2 commented Nov 9, 2023

If we can achieve rate limiting like this (#6538 (comment)) it will further reduce consumption of resources as both actor hosts and placement are known to consume memory as scale increases. In addition you could use rate limiters and skip installing placement altogether

@ItalyPaleAle
Copy link
Contributor Author

ItalyPaleAle commented Nov 9, 2023

Please, let's not get carried away too much with rate-limiting. The problem is more generic. I mentioned OAuth2 is the biggest issue right now.

I have also explained above why I don't think state stores are the solution. something needs to initialize the component, so it's either a circular dependency or a hell of a messy job: #6538 (comment)

Additionally, actors can keep data in-memory while they're running, so we avoid I/O with the state store on every call

@berndverst
Copy link
Member

Users have asked this for redis, which would have been implemented already if there is an actual rush.

"No is temporary, yes is forever."

I am personally interested in saving users compute resources and money in various clouds or even their own self hosted deployments. For this reason as I mentioned it is preferable to reuse an existing database server / container that is already deployed / provisioned. Hope that makes sense.
A separate Redis deployment as a rate limiter may appeal to some users. I agree - but I personally am very turned off by this world where we make everyone deploy the whole kitchen sink for tiny features from the respective services.

Yes. For that reason, we can expose the state store API / which is a much simpler solution and easier to debug and operate than actors. Which is my proposal to solve this problem.

Let's say we can find a solution using state stores (like etags and such), wouldn't we still need a reference from runtime with an initialized state store? I think there is still a circular dependency no matter what.

@yaron2
Copy link
Member

yaron2 commented Nov 9, 2023

The middleware (in contrib) would need to accept a state.StateStore interface that also lives in contrib.

@ItalyPaleAle
Copy link
Contributor Author

ItalyPaleAle commented Nov 9, 2023

The middleware (in contrib) would need to accept a state.StateStore interface that also lives in contrib.

But the state store object (which complies with that interface) is instantiated by the runtime. So you have your circular dependency there too? My understanding is that @artursouza doesn't want any circular dependency, so we don't really have options.

He defined circular dependencies at runtime too:

Actors is a special building block that is built on top of others, at the runtime. The interface might solve a circular dependency at compilation time but it still creates a circular dependency at runtime - which is way worse.

Then you'd need to figure a way in the metadata of the middleware to know which state store you're referencing.

Using actors:

  • We could still have an interface in contrib. This interface is for an abstraction on top of actors (that doesn't depend on other internals in the runtime)
  • The runtime passes a concrete object that implements that interface

I don't see much differences there, except that with actors we can keep data in-memory and avoid (expensive) I/O on every call...

That said, using state stores would still be acceptable in the case you're describing above (although I like them much less because that always requires I/O and it's a lot slower for things like OAuth tokens). But this doesn't allow getting away of the "circular dependency" Artur is concerned with.

@berndverst
Copy link
Member

The middleware (in contrib) would need to accept a state.StateStore interface that also lives in contrib.

I get what you are saying. Let me see if I can write this up so everybody understands :)

@artursouza
Copy link
Member

The middleware (in contrib) would need to accept a state.StateStore interface that also lives in contrib.

But the state store object (which complies with that interface) is instantiated by the runtime. So you have your circular dependency there too? My understanding is that @artursouza doesn't want any circular dependency, so we don't really have options.

Then you'd need to figure a way in the metadata of the middleware to know which state store you're referencing.

Using actors:

  • We could still have an interface in contrib. This interface is for an abstraction on top of actors (that doesn't depend on other internals in the runtime)

  • The runtime passes a concrete object that implements that interface

I don't see much differences there, except that with actors we can keep data in-memory and avoid (expensive) I/O on every call...

That said, using state stores would still be acceptable in the case you're describing above (although I like them much less because that always requires I/O and it's a lot slower for things like OAuth tokens). But this doesn't allow getting away of the "circular dependency" Artur is concerned with.

That is a much reduced circular dependency and we can eventually refactor that into contrib 100%. Having OAuth or rate limiter to depend on actors is a much bigger pill to swallow, with ops concerns.

We can implement a cache layer too and avoid read for each request on oauth. Also, rate limiter can be eventual consistent, which allows caching there too. A combination of token bucket with async reads and writes for state store can be really fast with a quick time to market for users.

@berndverst
Copy link
Member

berndverst commented Nov 9, 2023

In runtime during the component initialization we somehow need to get a reference to the state store to be used. The name can be obtained from the Middleware metadata. Then we either init that state store, or load the existing reference of it and pass it into the constructor for the middleware.

func init() {
httpMiddlewareLoader.DefaultRegistry.RegisterComponent(func(log logger.Logger) httpMiddlewareLoader.FactoryMethod {
return func(metadata middleware.Metadata) (httpMiddleware.Middleware, error) {
return oauth2.NewOAuth2Middleware(log).GetHandler(context.TODO(), metadata)
}
}, "oauth2")
}

return oauth2.NewOAuth2Middleware(log).GetHandler(context.TODO(), statestore, metadata)

statestore would be the reference here.

And in components-contrib we'd update the interface to take the state store reference:

https://github.com/dapr/components-contrib/blob/8fe74b15aa5c89c730076bfa735fa57f7e4d0058/middleware/middleware.go#L21-L24

This would become

// Middleware is the interface for a middleware.
type Middleware interface {
	GetHandler(ctx context.Context, statestore state.Store, metadata Metadata) (func(next http.Handler) http.Handler, error)
}

For a strong reference to the state store name to be used we should add this to the middleware metadata.

https://github.com/dapr/components-contrib/blob/8fe74b15aa5c89c730076bfa735fa57f7e4d0058/middleware/metadata.go#L19-L21

type Metadata struct {
	metadata.Base `json:",inline"`
        Statestore string `json:"statestore"`
}

Does that sound right?

That would not change anything about imports in either contrib or runtime.

@yaron2
Copy link
Member

yaron2 commented Nov 9, 2023

The circular dependency is a secondary win/concern.

In addition to what I described above, we should consider a lot of users use Redis and other NoSQL DBs for rate limiting/session tokens because of their performance. As we know, Redis as an actor state store is deprecated and not recommended, which will further limit users of using a state store that's fit for purpose for this scenario.

So for me, the main benefits are:

  1. Reliablity and better troubleshooting. Actors bring way more complexity than etags
  2. Reduced resource consumption and footprint
  3. Architectural decoupling from actors and it's subsystem
  4. Larger choice of fit for purpose state stores for users

@berndverst
Copy link
Member

The circular dependency is a secondary win/concern.

In addition to what I described above, we should consider a lot of users use Redis and other NoSQL DBs for rate limiting/session tokens because of their performance. As we know, Redis as an actor state store is deprecated and not recommended, which will further limit users of using a state store that's fit for purpose for this scenario.

So for me, the main benefits are:

  1. Reliablity and better troubleshooting. Actors bring way more complexity than etags
  2. Reduced resource consumption and footprint
  3. Architectural decoupling from actors and it's subsystem
  4. Larger choice of fit for purpose state stores for users

Another possibility this enables (not saying we have to go down that route):
We could expand the state store interface to include efficient methods that can be easily implemented by every state store. Something like Increment, Get and Set which do not use the Dapr envelope in use today.

I'm a bit worried about the overhead of the Dapr envelope for all of these operations.

@yaron2
Copy link
Member

yaron2 commented Nov 10, 2023

The circular dependency is a secondary win/concern.
In addition to what I described above, we should consider a lot of users use Redis and other NoSQL DBs for rate limiting/session tokens because of their performance. As we know, Redis as an actor state store is deprecated and not recommended, which will further limit users of using a state store that's fit for purpose for this scenario.
So for me, the main benefits are:

  1. Reliablity and better troubleshooting. Actors bring way more complexity than etags
  2. Reduced resource consumption and footprint
  3. Architectural decoupling from actors and it's subsystem
  4. Larger choice of fit for purpose state stores for users

Another possibility this enables (not saying we have to go down that route): We could expand the state store interface to include efficient methods that can be easily implemented by every state store. Something like Increment, Get and Set which do not use the Dapr envelope in use today.

I'm a bit worried about the overhead of the Dapr envelope for all of these operations.

For sure, we can create granular interfaces for internal use and as these grow to critical mass we can even look at these as candidates for state store API expansion

@ItalyPaleAle
Copy link
Contributor Author

ItalyPaleAle commented Nov 10, 2023

@berndverst your design has some valid thoughts, although:

  • Rather than adding a property to the constructor you may want to add it as a property to the Metadata struct (alongside Name and Properties). This way we can extend it to pass other kinds of components in the future, and it is also simpler to change as an API (you know, the usual dance with contrib, runtime, and stuff)
  • We need to solve the problem of the initialization order. The runtime needs to be aware of when a component depends on another component and initialize them in order (sort of what we do for referencing secrets in secret stores).

However, I still think this is a very much sub-par idea than using actors. This requires re-implementing caching, worrying about when the cache needs to be consistent or not, does more I/O with storage, and OCC with etags means that operations more often than not need to be retried (see what's going on with reminders that fail after a certain load due to too many etag conflicts). This is gonna hurt us badly down the road like it's hurting us with reminders, and in the best case scenario requests to state store will be retried many times (adding lots of latency).

Why re-implementing the wheel when we already have it done, and have it done well (and is getting better)?

In addition to what I described above, we should consider a lot of users use Redis and other NoSQL DBs for rate limiting/session tokens because of their performance. As we know, Redis as an actor state store is deprecated and not recommended, which will further limit users of using a state store that's fit for purpose for this scenario.

As mentioned before many times, the choice of actor state store is really irrelevant for rate-limiting, as with actors the state would be kept in-memory and there'd be no operation on the state store at all...

Heck, in this specific example (rate-limiting), assuming you don't use actors for anything else, you could even initialize the actor runtime with the in-memory component as actor state store, as it doesn't matter at all.

Actors aren't just for storage but they have business logic too. In the case of rate-limiting, we wouldn't just store the state in the actor: we'd send the caller IP and ask the actor to tell us if the call is authorized.

This allows performing the rate-limiting with 1 network call only instead of 2 (read and set). So this should offer half the latency for rate limiting (assuming calls to persistent store have the same latency as calls to actors within the cluster)

This also means the same actor can be re-used, for example for a rate-limiter binding too!

Lastly, while actors a bit more complex, if we use the actors APIs directly (no SDK) it's not too bad. And if we are scared of using actors ourselves, what message are sending to our users??

@yaron2
Copy link
Member

yaron2 commented Nov 10, 2023

Actors with their single threaded nature can become a pretty big bottleneck even when saving state in-memory in terms of the waited I/O calls. However, this doesn't concern me so much for the same reason i'm not concerned too much about the I/O operations to the database - both incrementing and reading can be made asynchronously and outside the path of the main call, as we've concluded that counting can both exceed the threshold and be updated in an eventually consistent way without real time coordination between all instances. More so, if there's an I/O bottleneck waiting for entry into an actor,that might create a noisy neighber problem for other functionality of the Dapr sidecar hosting that actor instance.

The only selling point for me for using actors is the in-memory serving, so now its a question of tradeoffs - does this feature justify all the disadvantages I mentioned above, also considering we can with little work still implement in-memory serving at the middleware component level.

Re: message to users, to me its more about sending the message that we do our best to avoid The Law of the Instrument and use the right tool for the right job, and not simply because a certain tool can also do that job.

@ItalyPaleAle
Copy link
Contributor Author

Actors with their single threaded nature can become a pretty big bottleneck even when saving state in-memory in terms of the waited I/O calls

Your suggestion of using etags is pretty much limited to single-threaded nature too. If you need to read from the state store, get the etag, and then save with the etag, you're still limited to one concurrent transactions.

If there were more than one concurrent transaction, you'd have an etag conflicts, which means the read-write loop needs to be repeated until it succeeds - like we do with reminders (and the performance of reminders is just abysmal).

as we've concluded that counting can both exceed the threshold and be updated in an eventually consistent way without real time coordination between all instances.

Surely there's a trade-off here with how "eventual" the consistency is. With limit of X rps and N replicas, you may have bursts where you're allowing N * X requests, which may or may not be acceptable. Ideally this would have strong consistency, with the etag conflict issues.

if there's an I/O bottleneck waiting for entry into an actor,that might create a noisy neighber problem for other functionality of the Dapr sidecar hosting that actor instance.

I'm not following this. Isn't the queue per-actor?

use the right tool for the right job, and not simply because a certain tool can also do that job.

I'm actually convinced actors are the right tool for this job. I have explained why multiple times, and it's the most generic and universal solution for the problems we have today and in the future.

Bernd almost convinced me that state stores were a viable options until I realized how they have issues with etag conflicts, how they require re-implementing caches everywhere, and how they involve making trade-off between performance and consistency.

Most critically, I really worry using OCC (etags) we will box ourselves with having to deal with the same issues we have with reminders today.

@artursouza
Copy link
Member

artursouza commented Nov 10, 2023

Handling eTags is common when implementing distributed systems. The problem with reminders is that we are trying to persist a list on a limited set of keys, not because we are handling eTags.

I think we have plenty of data to reach a decision among maintainers of runtime. We don't need to have unanimous agreement. People have different views on architecture principles and that is OK.

I vote (binding vote) for using state store interface to implement OAuth and rate limiter middlewares.

@yaron2
Copy link
Member

yaron2 commented Nov 10, 2023

I definitely agree @artursouza that the isue with reminders is not etag but with the underlying data structure we chose.

I would like a little more time to examine this issue.

@ItalyPaleAle:

Your suggestion of using etags is pretty much limited to single-threaded nature too

Yes, but the main difference is that a Dapr sidecar is not blocking connections due to the single threaded execution as that happens in the DB.

I'm not following this. Isn't the queue per-actor?

It is, but an actor instance is always a representation of a resource (for example, a URL path for a given app ID). If the rate limiting representation of actors means there will be concurrent hits on the same resource (actor ID) this will lead to somewhat of an undesirable way of using actors.

@ItalyPaleAle
Copy link
Contributor Author

The problem with reminders is that we are trying to persist a list on a limited set of keys, not because we are handling eTags.

That is true, but it's also true that etags are a big part of the problem too. I have been working on actors a lot lately and I can confirm this is the case. For example, the reason why the Workflow perf test fails 100% of times with 3 replicas of the app (and frequently with 2 replicas too) is because of etag conflicts while storing reminders. Essentially, 2 instances try to store reminders at the same time and they are stuck in a loop where they fetch data with one etag, and by the time they go write it, the etag has changed and the write fails. This is the typical problem with "distributed systems" that rely on OCC. The solution is to either fail (causing errors to be cascaded) or retry, possibly for an infinite amount of time, which increases latency significantly.

Yes, but the main difference is that a Dapr sidecar is not blocking connections due to the single threaded execution as that happens in the DB.

There can be workarounds to this. I have some ideas (that will be on a separate proposal) such as being able to run an actor concurrently if we know that it doesn't need single-threaded execution (like in the case of tokens - but it will need to be single-threaded for rate-limiting)

I think we have plenty of data to reach a decision among maintainers of runtime

This impacts components-contrib too, probably more than runtime (for the benefits), and contrib maintainers should be involved in the decision as well.

@yaron2 yaron2 modified the milestones: v1.13, v1.14 Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants