Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Enable Pub/Sub CloudEvent filtering for subscribers #2582

Closed
artursouza opened this issue Dec 15, 2020 · 38 comments · Fixed by #3406
Closed

[Proposal] Enable Pub/Sub CloudEvent filtering for subscribers #2582

artursouza opened this issue Dec 15, 2020 · 38 comments · Fixed by #3406
Assignees
Labels
area/runtime/pubsub kind/feature New feature request P1 size/XL 4 weeks of work (should this be an epic?) stale Issues and PRs without response
Milestone

Comments

@artursouza
Copy link
Member

artursouza commented Dec 15, 2020

In what area(s)?

/area runtime

/area operator
/area placement
/area docs
/area test-and-release

Describe the feature

Subscribers should be able to filter based on the CloudEvent subject and other properties. See https://github.com/cloudevents/spec

Release Note

RELEASE NOTE: ADD Support for CloudEvent filtering in PubSub's subscriber

@artursouza artursouza added area/runtime/pubsub kind/feature New feature request size/XL 4 weeks of work (should this be an epic?) P2 labels Dec 15, 2020
@msfussell msfussell changed the title Allow PubSub filtering for subscriber Enable Pub/Sub CloudEvent filtering for subscribers Feb 6, 2021
@msfussell msfussell removed the P2 label Feb 6, 2021
@msfussell msfussell changed the title Enable Pub/Sub CloudEvent filtering for subscribers [Proposal] Enable Pub/Sub CloudEvent filtering for subscribers Feb 6, 2021
@skyao
Copy link
Member

skyao commented Feb 22, 2021

Any more information about this proposal? What the plan?

@artursouza
Copy link
Member Author

We will start the design for this in March to implement in v1.2. The proposal will be presented here for community commenting.

@artursouza artursouza added this to the v1.2 milestone Mar 2, 2021
@jigargandhi
Copy link
Member

I would like to work on this if unassigned

@artursouza
Copy link
Member Author

artursouza commented Mar 9, 2021

I think it is a good issue to work on. PubSub payload is json cloud events, which makes me think that JsonPath matching is a good fit as the query language to filter. The filter should probably be defined as a response from the app to the subscribe callback. The SDKs need a corresponding issue too to add this support.

I will assign this to you, @jigargandhi

WDYT?

@jigargandhi
Copy link
Member

I agree that

The filter should probably be defined as a response from the app to the subscribe callback. The SDKs need a corresponding issue too to add this support.

When you mentioned JsonPath filtering are you thinking of looking into the data of the message and then filter based on whether the JsonPath is satisfied or not.

I have a different opinion though:
We should not look into data rather use the following fields

  • Type, a mandatory field this includes type of the message and apps might want to subscribe to certain types of messages which they can handle
  • Source, this is a mandatory attribute in cloud events spec and is a URI, Apps might subscribe certain prefix paths of producers.
  • Subject this is an optional field and can be used to do string matching.

Makes sense?

@pkedy
Copy link
Member

pkedy commented Mar 15, 2021

@jigargandhi I like your thoughts above because decoding data could be expensive. However, I believe that developers will likely want to filter based on the contents of data. As an optimization, the filtering code can "lazy decode" data (as determined by datacontenttype) only if it is used in the filter expression (possibly determined by strings.Contains(.data.) or RegEx during initialization).

I think JsonPath is a good option as a filtering mechanism but I also wanted to point out CEL. It would feel more like programming a predicate statement in a familiar programming language vs. writing a potentially unfamiliar expression syntax. It's used in a few projects like Istio and Firebase for this purpose. Also, I'm confident this will come up, but as much as I like WebAssembly, it is overkill for this use case. Message transformation would be a different story though ;)

As @artursouza mentioned, another consideration is where this filtering mechanism lives and where the filter expressions are configured. I think the filtering code would live in dapr/dapr (not dapr/components-contrib). Maybe that part is obvious. :) Where expressions live is more ambiguous. Having them returned by the subscribe call would work but what if that should be a responsibility of the operator or be configuration driven? My sense is that expressions are configuration in disguise. Unfortunately, it is probably not a good idea to use the component's metadata for configuration because they can be shared by multiple applications. A component would have to be created for each application if they had different filter rules. This would work but is not an ideal (or scalable?) solution.

In the 6 month plan, there is mention of a configuration component (no issue for this yet) that would apply at the application scope. This would be the ideal place for the developer to configure filtering rules but also subscriptions as well (since that is really configuration also). I propose waiting to implement this until after the configuration component exists. Maybe this is rationale for prioritizing it sooner than 6 months ;) I'd hate to see a short term solution implemented only to have it deprecated shortly after.

If we want to proceed with this now, there are two approaches above that could work. @jigargandhi can you think of any others?

Lastly, I was curious if others share the following opinion. I believe developers should prefer filtering based on type and other cloud event fields and avoid filtering based on specific topic names (I can elaborate on why if needed). This thinking could influence the design as the configuration of topics and filters may not be directly coupled. For Dapr, this might be over engineering the solution, but curious if others think about this the same way.

Overall, my feeling is that this feature needs to be simple both in implementation and usage. Unless users really want more stream processing-like features, anything "complex" should be handled by the application.

@artursouza
Copy link
Member Author

artursouza commented Mar 15, 2021

I agree to consider CEL instead of JsonPath.

Regarding the 'data' attribute. In CloudEvent spec, data attribute is a JsonObject, so filtering based on data is the same as filtering by any other field in the cloud event - we might be able to lazy load this (like @pkedy mentioned) but not 100% sure at this point.

In general terms, I agree to almost everything @pkedy mentioned, except the "waiting" part. This feature should not wait for the app config feature (IMO) because we have enough people waiting for the filter feature. Second, I don't think the /dapr/subscribe callback will go away anytime soon. Some developers might prefer it over the app config.

@pkedy
Copy link
Member

pkedy commented Mar 15, 2021

Having looked at this a bit more I think the easiest solution is to add an optional Filter field to the Subscription struct. Decoupling the concept of topic and filtering/processing gets very complicated very fast ;)

@rynowak
Copy link
Contributor

rynowak commented Mar 15, 2021

Lastly, I was curious if others share the following opinion. I believe developers should prefer filtering based on type and other cloud event fields and avoid filtering based on specific topic names (I can elaborate on why if needed). This thinking could influence the design as the configuration of topics and filters may not be directly coupled. For Dapr, this might be over engineering the solution, but curious if others think about this the same way.

Personally I've seen this kind of topic come up over and over again during my work in the web framework space. Typical places where this would come up with be things like versioning, or A/B testing - basically anywhere you want to apply routing rules based on policy.

Like a cloud event, HTTP requests are also separated into envelope and payload. It's always better for everyone if you can do your filtering based on the envelope (headers or URL in HTTP) or other fields of the cloud event besides data in cloud events.

It's typically better for the app developer to make metadata part of the metadata rather than a business object.

It's better for the runtime because it's more efficient, metadata is optimized for this kind of thing.

It's typically just overall simpler if you avoid building a rule engine that can inspect arbitrary content (consider non-JSON data payloads).


Clarifying after an offline chat with @pkedy - I'm not saying don't, I'm just saying that if you can steer users in the direction of relying on metadata that typically has a better result.

@pkedy
Copy link
Member

pkedy commented Mar 15, 2021

In the case with pubsub you might think of it as three layers:

  1. topic / broker
  2. envelope (cloud event wrapper)
  3. payload (cloud event data field)

What I was trying suggest was that you should try to leverage envelope/cloud event fields primarily, the payload only if needed, but don't use topic/broker level information for filtering. I think @rynowak and I are aligned on that. :)

@halspang halspang modified the milestones: v1.2, v1.3 May 19, 2021
@artursouza artursouza added the P1 label May 21, 2021
@greenie-msft greenie-msft added this to Backlog in Dapr Roadmap Jun 3, 2021
@greenie-msft greenie-msft moved this from Backlog to Planned (Committed) in Dapr Roadmap Jun 3, 2021
@greenie-msft greenie-msft moved this from In Design (Discussion) to Planned (Committed) in Dapr Roadmap Jun 10, 2021
@artursouza artursouza moved this from Planned (Committed) to In Progress (Development) in Dapr Roadmap Jun 22, 2021
@github-actions
Copy link

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Issues and PRs without response label Jun 24, 2021
@daixiang0
Copy link
Member

@artursouza please help add label to remove stale.

@dapr-bot dapr-bot removed the stale Issues and PRs without response label Jun 30, 2021
@georgestevens99
Copy link

georgestevens99 commented Jul 21, 2021

I find this feature very useful and am looking forward to using it some day. FYI I too like the 2nd of the 2 most recent code examples.

However the terminology used in the yml and code examples seems to have run off the rails. When I look at the general usage of pubsub in the industry I see the following primary terms being used:

  • Events
  • EventTypes
  • Publishers
  • Subscribers
  • Event Handlers, i.e. the name(s) of subscriber's code that processes an Event, or in this case an Event's individual EventType. Developers have been using this term for decades, and in more places than pubsub. Almost every dev with more that a few years of experience knows what an EventHandler is and that they typically have names.

Given that, what the heck is a "route"? Where did that come from? And it is not just in this filtering proposal, but in the original Event Subscriber component as well. With the term "route" used, now everyone who uses pubsub has to figure out, or experiment, or read the docs, or ask questions about what a "route" is and what about the event handlers, typically expecting to find this commonly used term in the "simple mental model" of a Dapr pubsub that the Subscription component offers. And this takes more time and causes more work, detracting from one of the key Dapr value propositions of reducing the time it takes to develop code.

Please do 3 things:

  1. Use the term EventHandlerName (or something like that) in both this new filter feature and in an upgraded version of the Subscribe component. Discard the term route, although that may well be how an Event is directed to an EventHandler is implemented, it most certainly does not belong in the conceptual model provided by a Subscription component.

  2. Update the PubSub building block high level concept description to add the term EventHandler. Note that the term route does not appear there (at least as far as I can see). This way the concepts introduced in each building block description will drive the terms used in the customer interfaces of the dapr components that implement the functionality of the building block.

  3. Institute a policy and a practice that all components "shall" use only terms that are in the building block description, and if necessary update the building block description to include key non-implementation concepts. And review each "customer facing" Dapr item (proposal, component, document, sample) to make sure it remains true to the terminology of the Dapr Building Blocks involved. In short the Building Blocks are the ultimate source of truth about the high level concepts and terms of Dapr. Thus, review every change to make sure that continues for the life of Dapr.

The reason I am so enthusiastic about this is because one of Dapr's greatest strengths (enabling polygot systems) is also one of its greatest weakness. There is always a strong tendency for entropy to take over and rapid decay of the structure and usability of a code base to set in. It has truly amazed me to see how fast it can set in and make things much harder to use. The potential for rapid code and feature rot is even greater with a polygot group of devs using Dapr in their systems, and also being some of the ones that develop the Dapr code and features. With such a polygot group there is a super wide variety of concepts, points of view, and terms the polygot developers are familiar with. And if Dapr goes down the route of implementing features using a bunch of lower level concepts unique to a narrow range of technologies, others using some other kinds of terms, and still others using vastly different ones, Dapr will degenerate to a hard to use, harder to learn logically unrelated mismash of capabilties.

The only thing that can prevent this inevitable slide down hill under the above curcumstances is to doing the work necessary keep all components and documents focused on the simple mental models and their terms presented in the Dapr Building Blocks. It is these mental models which must be the guiding lights for all Dapr features, terms, and components. Why? Because it is these widely used, fairly high level, and general concepts that tie together the disparate points of view of the polygot Dapr customers and developers to form a conceptual common ground. This then allows them to rather quickly grok what a well designed component does and how to use it to satisfy their needs.

Thanks!

@georgestevens99
Copy link

@artursouza I get your confused emoji, I think. Thanks.

Please apply all that I said about "routes" above, to the term "path". Instead of "path" I'd like to see "eventHandler". In other words the name of the method that gets invoked as a result of the "path". I think that is what "path" means, and please correct me if I am wrong.

The idea I am trying to promote is at the customer interface level (component), stay focused on the high level concepts of pubsub rather than how this particular pubsub is implemented (with routes and paths). Hide the implementation details from the customer level so they are dealing with a widely known conceptual pubsub model as I sketched at the beginning of my previous post.

And if I have misunderstood the meaning of "routes" and "paths" in your examples, that that also should tell you something about the ease of quickly understanding your pubsub event types and how to use them.

Thanks.

@orizohar
Copy link

This issue also requires a docs PR

@pkedy
Copy link
Member

pkedy commented Aug 30, 2021

@orizohar I have draft content for the Docs PR but it includes examples from the SDKs which don't have this feature added yet. For that reason I'm holding off on the Docs PR.

@dapr-bot
Copy link
Collaborator

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@dapr-bot dapr-bot added stale Issues and PRs without response and removed stale Issues and PRs without response labels Sep 29, 2021
@theperm
Copy link

theperm commented Oct 11, 2021

@georgestevens99 I too have a solid background EDA, but i didn't see it from your perspective. Since the subscriber is being configured I think the term route and path are in the context of the Web API that handle the messages. If you think of it like this instead of "Routes" use "HandledByRoute/s" and a route having a (URL) Path makes more sense. I suspect if your thinking in terms of the web API side it feels natural, given the actual handler is decoupled from messaging so all it exposes is the route. Coming at if from the messaging side you loose the familiar terms (at the component config) but then you could argue messaging don't care who subscribes, so why care how and what terms they use to configure the subscriptions? Food for thought.

The component is the "middleware" here wiring up the two sides - I actually think something like "HandledByRoute" would resonate with both sides of the fence.

@georgestevens99
Copy link

georgestevens99 commented Oct 11, 2021 via email

@dapr-bot
Copy link
Collaborator

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@dapr-bot dapr-bot added stale Issues and PRs without response and removed stale Issues and PRs without response labels Nov 10, 2021
@dapr-bot
Copy link
Collaborator

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@dapr-bot dapr-bot added stale Issues and PRs without response and removed stale Issues and PRs without response labels Dec 10, 2021
@dapr-bot
Copy link
Collaborator

dapr-bot commented Jan 9, 2022

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

@dapr-bot dapr-bot added the stale Issues and PRs without response label Jan 9, 2022
@dapr-bot
Copy link
Collaborator

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue, help wanted or triaged/resolved. Thank you for your contributions.

@greenie-msft greenie-msft moved this from Done to Released in Dapr Roadmap Jan 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/runtime/pubsub kind/feature New feature request P1 size/XL 4 weeks of work (should this be an epic?) stale Issues and PRs without response
Projects
Development

Successfully merging a pull request may close this issue.