Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance cloudevent matching in event states #153

Closed
wants to merge 7 commits into from

Conversation

tsurdilo
Copy link
Collaborator

  • Aligns referencing events in event states with how referencing functions in actions is done
  • Adds addition properties to event definitions which are properties of CloudEvent format
  • Adds ability to do event matching in event states based on the CloudEvent payload (data property)
  • Renames "functionref" to "functionRef" to align with "eventRef" (same with its "refname" property)

@cathyhongzhang
Copy link
Collaborator

Maybe I misunderstand the change. I don't see a need to change the "expression". In many applications, an action is triggered by an"and/or" combination of events. We need to support this: "Boolean expression which consists of one or more Event operands and the Boolean operators". Existing spec defines all the event attributes in the trigger event itself (if needed, we can add more attributes to the event definition itself for more granular definition of events), then in the eventState and eventAction or any other places that need to reference the event, we just reference the event name/ID.

@tsurdilo
Copy link
Collaborator Author

tsurdilo commented Jan 30, 2020

@cathyhongzhang In your case then there is no need to define events at all in the workflow definition as the expression can be evaluated against the incoming events (which it has to anyways).

using an expression is not a good idea because there is no way that you can use an expression for something that a complex event processing engine should be responsible for.

There is no way that our workflow definition can express "Trigger actions on event A followed by event B within 4 seconds or event C within 2.5 seconds apart"...

The events that a serverless workflow consumes are orchestration events
already processes by a CEP engine, AWS has this already for example called CloudWatch where you can define complex event processing rules, and that engine is responsible to then to trigger a single event when the complex condition is met. That is the event that then the event state in the aws step functions workflow consumes.

So in CloudWatch or another complex event processing service you should define your complex rules which upon the condition stated above just triggers a single event lets call it "X", and all our workflow definition is concerned about is to say that actions should trigger on existence of "X" event.

Hope this makes sense.

I am curious how you guys do this in FunctionGraph where you also have an event-expression field but then the documentation says its just a name of a defined event object. Probably the same as what I define here. Let me know.

Screen Shot 2020-01-30 at 1 48 16 PM

https://support.huaweicloud.com/en-us/devg-functiongraph/functiongraph_02_0330.html

@cathyhongzhang
Copy link
Collaborator

I don't think we can or should have a requirement : if you want to use the workflow spec, you must have a "Cloud Watch" like event processing rule engine to filter or generate another set of "orchestration events". In all our customer cases, the actions are triggered by one CloudEvent or a simple "and/or/not" combination of the CloudEvents. I agree that as more applications start to use the workflow spec, there will be cases that need complex event rules and theoretically the rules can become very complex. Here is my thought: we can state in the spec that if the event rules are complicated, it is expected that the user has a middleware entity (like CLoud Watch) that will handle the complex event processing rule and generate a new CLoudEvent. This new CloudEvent will then be consumed by the workflow engine. But in majority cases, the original event producer can directly send the events to the workflow engine. The boolean expression of events works for both the majority cases and those few complex cases.

@tsurdilo
Copy link
Collaborator Author

tsurdilo commented Jan 30, 2020

@cathyhongzhang yes the way we deal with events in this PR offers both. The way to consume all events from an event source with a certain type and other params, with additional check of the event payload.

In cases where an event state needs complex interactions between events this is better suited for complex event processing services to define those rules. From the workflow perspective then it is the same thing -- consuming an event from an event source where that source may be the CEP service.

For us to define an expression to give users some sort of "chance" to define complex events rules is counter-intuitive tho and we should not use it imo.

@tsurdilo
Copy link
Collaborator Author

I will update the text to mention what you stated.

@cathyhongzhang
Copy link
Collaborator

I am afraid that I can not agree with some of your points. Most, if not all, event Bridge/Grid/Router will do filtering based on CloudEvents params and event payloads, but will not do boolean combination of different events. So our workflow does not need to duplicate the work of filtering/checking on CLoudEvents params and/or payload. What we need to provide is the missing functionality of boolean combination of events.

@tsurdilo
Copy link
Collaborator Author

updated

@cathyhongzhang
Copy link
Collaborator

The boolean expression is not complex event rules. They are just boolean combination of event names/IDs, which is needed in many usage scenarios.

@tsurdilo
Copy link
Collaborator Author

tsurdilo commented Jan 30, 2020

@cathyhongzhang I think even the smallest of expressions you might want to define like

Event "A" and event "B" includes temporal reasoning which already is a complex rule.
Event "A" or event "B" - again same thing..what is the duration, what is the type of our event streaming mode, do we have any "sliding time windows" during which we define the existence of one event but not the other....
workflows are not responsible for this

What we care about is existence of events of certain types and sources etc which trigger actions, or resume the workflow execution (correlation token scenarios)

@tsurdilo
Copy link
Collaborator Author

tsurdilo commented Jan 30, 2020

@cathyhongzhang regarding the payload to explain the reasoning behind the addition:
Since our event state is not a switch state (we had these discussions iirc before), we can express currently:

"If an "Order" event happens perform "Automatic Provisioning Actions""

but we cannot at the same time express:

"If an "Oder" event happens with cost > $1000 perform "Manager Approval" actions". If it's less than 1000 perform "Automatic Provisioning Actions"

The decision what actions to perform is orchestration logic and should be able to be handled by the workflows which are responsible for orchestration logic.

The way we have to handle it now is pass the event payload to functions and let them decide what to do (manager approval or not), and functions are responsible for business logic not orchestration logic.

If we were not to include this, we would have to define two events, lets say a "NormalOrder" event and a "HighOrder" event ..and that goes all the way back to the start of our conversation that there is some service that handles event processing for serverless workflows, but I think for cases where there is no temporal / time-based reasoning involved (unions, intersections, etc based on time constraints) we should offer the ability to reason over the payload which is static.

@tsurdilo
Copy link
Collaborator Author

tsurdilo commented Jan 30, 2020

@salaboy @manuelstein @mswiderski - what do you guys think? I think this is maybe good discussion to include more people as events are a big part of the specification.

@cathyhongzhang
Copy link
Collaborator

There is no temporal reasoning with "Event A and event B". It just means both events need to happen before the action is triggered. Same for "Event A or Event B", which just means etiher event could trigger the action. There are scenarios that need all the events to happen or just need one of them to happen. If the user needs temporal or other complicated relationship between these events, they need to have a separate event rule engine.

@cathyhongzhang
Copy link
Collaborator

Payload varies a lot among different event sources/producers. I don't feel it is a good idea to bring the payload filtering into the event expression since it can make it very complicated. It is better to leave it to the middleware "event rule engine". Also as I mentioned before many existing middleware or event bridge/grid/route already support paylaod filtering and there is no need for the workflow to repeat this.

@tsurdilo
Copy link
Collaborator Author

tsurdilo commented Jan 31, 2020

@cathyhongzhang

(1) Our Event task defines a parameter:
timeout | Time period to wait for incoming events which match the expression
So from the start we define presence of events in terms of time in that state - "A and B within $timeout$" and we use time for making orchestration decisions, as if an event does not happen within this timeout period, actions are not performed.

(2) If we allow users to say "Event A and event B" with an expression you run into immediate problems of community wanting to specify "A before B" , "B before A", "A and B at same time" which you cannot use a boolean expression for.
Do we allow "A and A"?
How do we deal with data integrity with "A and B and C" for example when we merge all events data into the state data input (overwrite)? Using boolean expressions quickly will become an implementation nightmare as well as non-portable.
If in same event state we specify multiple eventsActions, :
a) "If event A"
b) "If event A and B"
c) "If event A and B and C"
what actions do you execute in the case incoming events are like "AABAABC"?
is it a), b), a), c)
or a), a) nothing, a), c) or any of other possibilities?

Our spec has to be precise:
"If an instance of event A -> run these actions" "If an instance of event B -> run these actions"...that is portable. How these events A, B are produced, or if one cloud provider has CEP engine, but another has an event bridge, we do not care.

(3) Payload varies a lot among different event sources/producers -- thats why we enforce the CloudEvents format which lifts this ambiguity and payload is the data element which can be of type "object" (JavaScript) or "string" as per that specification

(4) Also as I mentioned before many existing middleware or event bridge/grid/route already support paylaod filtering and there is no need for the workflow to repeat this.
What? In our specification we already define that event payload is merged with the state data input (with filter), as such we already define that we filter and reason over it (choice state for example) .....

If in our event state we reason over it as provided in this pr, or after the event payload gets merged into the state input data is reasoned over by the next Choice state, it is the same thing!

I think it would be very useful if you could describe how "event-expression" works in FunctionGraph. The description indicates to point to a name of a defined event, which is exact same thing as what we are trying to do here.

@cathyhongzhang
Copy link
Collaborator

The timeout is outside the expression. It is a timeout for that state, which denotes that the workflow will transit from this event state to next state if the event expression is not satisfied within that timeout period (A side note on this: We can update this part to allow the event state to trasit to different next states when expression is satified vs timeout, wdyt?)

I don't feel adding timing logic into the event expression, such as "event A before event B or event B happening X min after event A , etc.", is a good idea since it can become very complicated.

BTW, the CloudEvents WG is discussing subsciprion/discovery API including filtering for CloudEvents and how an event pub/sub engine/middleware should work with CloudEvents producer on information passing and filtering. So I think we can probably leave this filtering of "external" CloudEvents to the pub/sub middleware or subscription API. As to filtering of information defined in the spec (here the information could be event, or function executuion result, or internal state parameter), I am seeing it as info internal to the workflow.

I am not sure how latest Function Graph works. But the first release only supports the simple case, one event, with future extension to support event-expression (and/or/not combination of events ).

I think we need more people's input/opinion on this. I think this will be a good topic for our first weekly meeting discussion.

@tsurdilo
Copy link
Collaborator Author

tsurdilo commented Feb 4, 2020

@cathyhongzhang
The timeout is outside the expression. It is a timeout for that state, which denotes that the workflow will transit from this event state to next state if the event expression is not satisfied

Nope...it is currently part of the "eventsActions" array so the state can have multiple timeouts each associated with the event expression. (and adding even more confusion along with the expression).
Look at the spec.md please first before making these statements because we are seriously wasting time here instead of making progress.

@cathyhongzhang
Copy link
Collaborator

Can I request you not to make negative "waste time" comment? Negative comment is bad for community cooperation.

Of course I read the spec before making the statement. You misunderstand my statement! If you read carefully, you can see what I said is "outside the expression", which does not mean "outside of eventActions". The timeout is not associated with individual event inside the expression, as shown in your example. The timeout is associated with the expression as a whole, which means that the workflow will transit from this event state to next state if the whole event expression is not satisfied within that timeout period.

@tsurdilo
Copy link
Collaborator Author

tsurdilo commented Feb 4, 2020

@cathyhongzhang
I am not sure how latest Function Graph works. But the first release only supports the simple case, one event, with future extension to support event-expression (and/or/not combination of events ).

And that's exactly how we should tackle this as well. Start small and well-defined and then in later versions and per community request add things as needed.

The current way we describe this is way too confusing and you run a high risk that it's not portable / vendor neutral at this point in time.

I would suggest going with these changes in addition to making the timeout property be on the state level as a fallback-all option.

@tsurdilo
Copy link
Collaborator Author

tsurdilo commented Feb 4, 2020

The timeout is associated with the expression as a whole, which means that the workflow will transit from this event state to next state if the whole event expression is not satisfied within that timeout period.

so you have
"eventExpression": "event.name = 'A'"
"timeOut": "PT2S"

and I have
"eventTimeExpression: "event.name = 'A' and seconds <= 2" or something...what is the difference
2 properties instead of 1 to define event existence in terms of time.

@cathyhongzhang
Copy link
Collaborator

Compared with some new additions, I think the "and/or/not" is small and well defined. Let's get more input from other people in our weekly meeting. We need more people to comment on many other parts of our spec. It is not good to make this spec a one or two people spec.

@cathyhongzhang
Copy link
Collaborator

For your single event expression example, there is not much difference, but for an expression consisting of combination of events, there is a difference. For example, if the expression is "event A and event B" , timeout is 60s, it means the workflow will only execute the action in this event state if both eventA and eventB happen within 60s (it does no matter eventA happens 2s or 20s ... before/after eventB).

@tsurdilo
Copy link
Collaborator Author

tsurdilo commented Feb 4, 2020

@cathyhongzhang sorry I will just add one more thing, maybe discuss it at the meeting please:
from your example "event A and event B":
A = Normal Temparature Event
B = High Temperature Event
timeout 60s

it does matter in which order they arrive:

if A comes before B we might want to run the "call the doctor" actions
if B comes before A we might want to run the "go outside and play cause fever is over" actions

boolean expression is not optimal for this imo.

serverless workflows need to define single higher-level/orchestration events such as "Fever Started" or "Fever ended" in this case.

@salaboy
Copy link
Contributor

salaboy commented Feb 4, 2020

@tsurdilo @cathyhongzhang very interesting conversation, I tend to agree with @tsurdilo because we come from the same background. In general, I would be interested in seeing some very simple bindings with Cloud Events, so basically define that the workflow can receive and emit events that are relevant for the application (not workflow events).

If you go to all the expressions handling and event correlation it gets quite difficult to handle and there are already other specs that are trying to solve these expression problems.
When I look at the definition of the issue by @tsurdilo it is quite clear what the intention is, when I think about @cathyhongzhang suggestions about adding complex expressions then it becomes fuzzy and it opens the door for way more complex requirements that I don't think we should target in this spec.

@salaboy
Copy link
Contributor

salaboy commented Feb 4, 2020

/LGTM

@cathyhongzhang
Copy link
Collaborator

@tsurdilo w.r.t the following:
if A comes before B we might want to run the "call the doctor" actions
if B comes before A we might want to run the "go outside and play cause fever is over" actions

As you can see, if we bring the sequence or timing into the expression, it could become very complicated. So my point is not to bring inter-event sequence or timing into the boolean expression. The boolean expression is just a simple and/or/not combination of events (without any inter-event timing/sequence) which as a whole will trigger some action. If some application involves complicated inter-event timing/sequence as the action trigger, then a CloudEvent bridge/grid/router should be used, as we agree before.

@cathyhongzhang
Copy link
Collaborator

@salaboy Yes, I agree with you that we should not add complex event expression. I am not suggesting adding complex event expression. With so many comments going forward and backward, I understand it could be confusing to you. Let's discuss this in the weekly meeting for a concensus before merging this.

@tsurdilo
Copy link
Collaborator Author

tsurdilo commented Feb 4, 2020

@cathyhongzhang I think the confusion comes from what you mention
like simple and/or/not combination etc. - could you please provide a real-world example it would really help.

@cathyhongzhang
Copy link
Collaborator

There seems to be no difference w.r.t. giving explicit control to the user whether using a boolean expression or using multiple flow diagrams. At high level, we can divide the use cases into three categories: 1. the use case does not involve any inter-event relationship, i.e. just one event 2. the use case involves event union and/or intersection relationship but no inter-event temporal relationship. 3. the use case involves inter-event temporal relationship. Case 1 is simple. For case 2, isn't it better for the user to just specify the use case via a simple event expression in one workflow diagram instead of asking the user to specify the use case via multiple workflow diagrams? As to case 3, since case 3 can become very complicated, we can leave it outside the workflow engine implementation. Either way has its pros and cons. It all comes to where is the balance point. I am with you about starting simple and adding more support based on feedback from both workflow users and vendors. We have spent enough time on this. Let's discuss this and reach a conclusion in tomorrow's meeting.

@salaboy
Copy link
Contributor

salaboy commented Feb 11, 2020

@mswiderski that definitely brings clarity on why this is difficult. So my gut feeling is to make sure that we don't drag that complexity right now. If we say, let's start with OR, we open the door to support NOT and AND later on. Starting with OR makes sense to me, but we need to be clear that this is what we are willing to support, for now, to avoid people asking for AND and NOT.

@mswiderski
Copy link

@cathyhongzhang I don't think this use case is valid

  1. the use case involves event union and/or intersection relationship but no inter-event temporal relationship

as soon as you define that workflow should be triggered when two events are received they must be time tracked so to say.

@salaboy I agree here, let's start with just OR use case to keep it simple for now and see how this evolves and what needs will arise.

@duglin
Copy link
Collaborator

duglin commented Feb 12, 2020

If there's consensus that other operands beyond OR would head down a path that people want to avoid, would it make sense to think of it as a list of "events" (meaning an array where any event in the list triggers it) instead of an "expression" ?

@cathyhongzhang
Copy link
Collaborator

cathyhongzhang commented Feb 12, 2020

@mswiderski , there are use cases that an action is triggered when multiple events happen before a time point but they do not care whether one event happens before/after another event (no inter-event temporal relationship). Use cases that are of the map/reduce pattern fall into this (the reduce part).

@tsurdilo
Copy link
Collaborator Author

@cathyhongzhang I think might be confusing facts with events. Facts do not have a time constraint, events always do.
Multiple facts lets say like "Applicant age is over 18", "Applicant address is in Germany" can be reasoned over to deduct an event such as "Application Granted"
But events, like "Application Submitted" and "Application Granted" always have a temporal (time) relationship.

This is I think what we have been trying to convey here from the beginning.

@cathyhongzhang
Copy link
Collaborator

@tsurdilo You seem to misinterprete the college app example I gave. In this example, multiple events refer to the following: eventA--math/science teacher's recommendation is received, eventB--language teacher's recommendation is received, eventC--SAT score is received. It does not matter whether one event is received before/after another event. As long as they are received before the deadline, an action will be triggered upon receiving all.

@cathyhongzhang
Copy link
Collaborator

cathyhongzhang commented Feb 12, 2020

Maybe we can start with support of "just or events expression" and "just and events expression", but not any combination?

@tsurdilo
Copy link
Collaborator Author

I think we have several people here clearly stating the pitfalls of using expressions to begin with and their agreement not to use them in this particular case. Or am I missing something?

This pr is the starting point so we can then improve to support more stuff based on discussions.

@tsurdilo
Copy link
Collaborator Author

i think still you have then a temporal relationship of
two events that must be between "Application Started" and "Deadline Passed"

@cathyhongzhang
Copy link
Collaborator

@tsurdilo You are talking about different things. What I am talking about is that there is no inter-event temporal relationship between THESE three events in the exmaple I gave, and the user can just use one expression instead of three flow diagrams to specify it.

@mswiderski
Copy link

@cathyhongzhang you're right that there is no inter-event relationship (or it does not have to be) but as soon you need to wait for more than one event how do you know if these events should belong to same workflow instance? This is what @duglin has already brought up earlier in the discussion.
In addition as soon as first event arrives then the workflow needs to consume that event and as such start new instance and wait for the other one and then somehow correlate these events where time round should be in most of the cases assigned as well. And this is the main issue - time assigned to give up waiting and correlation is a must to make this usable at all.

Having single expression to cover all is just too generic to address at first thus suggestion to start with OR only makes the most sense in my opinion.

@cathyhongzhang
Copy link
Collaborator

@mswiderski, If I understand you correctly, you are concerned about how the workflow engine knows whether these events should belong to same workflow instance or not. Good question!

To handle this, we need the correlationToken and the timeout defined in current spec. Let's use the college application use case as an example, i.e. some action should be triggered when eventA, eventB, and eventC are all received before some application deadline. Suppose the user defines appplicant's ID as the correlationToken and the deadline as Dec 31. The following is one possible way to do it: when the workflow engine receives any of the three events, it will check whether there is already a workflow instance that is associated with the correlationToken carried in the event msg. If not, a new workflow instance will be created to handle this event and this new workflow instance is associated with this new correlationToken, and a timer will be started. If yes, the event will be delivered to the existing workflow instance. If the timer times out before the workflow engine receives all three events, the workflow engine will transit to the end state or another state specified by the user. If the last event is received before timeout, the workflow engine should trigger the action. Duplicate events should be discarded (we can discuss how to indentify a duplicate event if needed). Different use cases will most probably have different correlationToken. Each user knows its use case the best and it is up to the user to choose a proper correlationToken for its use case. It is also up to the user to define the event source and type.

If we merge this PR which takes out the "and" expression of events (it also takes out the "or" expression), how will the user specify this: "eventA and eventB and eventC"? According to Tihomir's idea, the user can specify this expression using several flow diagrams, which is more complicated to the user than a simple "and" boolean expression. Furthermore, the main issue " time assigned to give up waiting and correlation", which you are concerned about, does not seem to go away and still needs to be handled by the work engine implementation. The duplicate event issue does not go away either.

@tsurdilo
Copy link
Collaborator Author

tsurdilo commented Feb 13, 2020

According to Tihomir's idea, the user can specify this expression using several flow diagrams
@cathyhongzhang idk where anyone said several workflow diagrams? I said control flow logic and showed and provided examples of "and" being two connected event states.

The diagram I drew with multiple start states, its a visualization for "or", its 1 diagram not 2 or several.

Please let's not confuse people by saying some stuff that was never mentioned.

So by taking this pr we really do not lose anything we didnt already have + we get rid of the expression string which I think by now everyone agrees to remove but yourself, and then we start working from there to improve and add the rest of stuff we need.

@tsurdilo
Copy link
Collaborator Author

@cathyhongzhang we will work on improving the event state to include all the possible cases, but we have to have a starting point and I think starting with reference to defined events is a good start so we can improve and implement all the other options you mention. This is not a final change by any means and please create issues as soon as this is done so we make sure its high priority.

@cathyhongzhang
Copy link
Collaborator

cathyhongzhang commented Feb 13, 2020

@tsurdilo Sorry that I misunderstand you. Do you mean that you will combine those diagrams you drew, each with its own start state, into one diagram? Then the combined diagram would involve multiple event states (multiple start states) and maybe switch states, right? That is also much more complicated to the user than a simple "and" boolean expression. I may misunderstand you again. Could you please illustrate how you combine them by showing what the 1 diagram looks like for the example use case (i.e. some action "func" should be triggered when eventA, eventB, and eventC are all received before some application deadline)?

@cathyhongzhang
Copy link
Collaborator

@duglin Yes, I am with you. There are multiple ways to express the "event or": as a list of "events", using "or" operator, using "|", etc. Let's discuss what is the best way to specify this in the workflow meeting. We need to choose a way that can be consistently extended in the future to cover other possible cases.

@duglin
Copy link
Collaborator

duglin commented Feb 13, 2020

the advantage of an array is that there's no parsing, no expression language and people won't naturally think.... hey, let's also support this other syntax option

@cathyhongzhang
Copy link
Collaborator

@tsurdilo I am not disagreeing with modifying the event expression to make it simple for workflow engine implementation. What I do not agree is the way your PR modifies it. I think we should allow people to discuss, clarify, and resolve all the questions and confusions so that everyone is on the same page. It is better to get input from more people in our next week's meeting about this design.

@cathyhongzhang
Copy link
Collaborator

@duglin Yes, I see your point:-)

@tsurdilo
Copy link
Collaborator Author

After all the comments in this PR I think I have a pretty good idea on how the event state needs to be changed in order to support "and" and "or" without using string expression.

Thank you all for the valuable comments, I will close this PR and create a new one tomorrow.

@tsurdilo tsurdilo closed this Feb 14, 2020
@tsurdilo tsurdilo mentioned this pull request Feb 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants