'Fast' callbacks can be executed concurrently #957
Hello @mediawolf, the intent is to not block the thread used for processing the service response(s) while the notification is processed by the application (e.g. while it updates the UI).
Thanks @AlinMoldovean. So basically a receiver has to re-order messages again if chronological order is important. BTW, having the documentation XMLs included in the OPCFoundation.NetStandard.Opc.Ua NuGet package would be great.
Could we have an option to control the invocation behavior (synchronous vs. asynchronous) for these callbacks? Asynchronous behavior would be opt-in, while the synchronous one gives more control to the receiver. For instance, a concurrent queue plus asynchronous processing on the receiver side would preserve the chronological order of incoming notifications without blocking the processing of service responses.
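A minimal sketch of what that receiver-side pattern could look like, with hypothetical names; it assumes the library invokes the callback synchronously (in arrival order), so the enqueue order matches the chronological order:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical consumer-side helper: the fast callback only enqueues (cheap,
// non-blocking for the transport thread); a single worker task drains the
// queue, so notifications are processed one at a time, in FIFO order.
public sealed class OrderedNotificationPump<T> : IDisposable
{
    private readonly BlockingCollection<T> m_queue = new BlockingCollection<T>();
    private readonly Task m_worker;

    public OrderedNotificationPump(Action<T> handler)
    {
        m_worker = Task.Run(() =>
        {
            // GetConsumingEnumerable blocks until items arrive and
            // completes when CompleteAdding has been called.
            foreach (T item in m_queue.GetConsumingEnumerable())
            {
                handler(item); // sequential: one item at a time
            }
        });
    }

    // Called from the (fast) callback; returns immediately.
    public void Enqueue(T item) => m_queue.Add(item);

    public void Dispose()
    {
        m_queue.CompleteAdding();
        m_worker.Wait(); // drain remaining items before returning
        m_queue.Dispose();
    }
}
```

This keeps the service-response thread unblocked while the receiver still sees a strictly ordered stream.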
I would like to second this request. While I believe I (mostly) understand the reasoning behind the current design, it makes it very likely that you'll end up with sporadic out-of-order updates: in fact you will, unless you re-implement the re-ordering. To back this up: our processing in the callback is minimal; the only thing we do is dispatch it to the right thread. But still, we witness out-of-order processing, so I'm pretty confident that experiencing out-of-order processing is just a question of time. It doesn't seem to me that this was the original intention (after all, why else would the messages be sorted by sequence number in the cache?). I conclude that either the callback should be processed synchronously (probably with a warning that you shouldn't block the thread for too long, best to dispatch quickly), or the reordering logic should be removed. Since ordering is not exactly trivial (also consider timeouts, shutdown, connection loss...), I'd very much prefer the reordering to stay in the library. @mregen
(*) The consumer can simply wrap the delegate:
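The original snippet was not preserved; a plausible sketch of wrapping the delegate, with hypothetical names: if the library invoked the callback synchronously, a consumer who prefers fire-and-forget dispatch could wrap its own handler like this.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical illustration of "wrapping the delegate": the wrapped delegate
// returns immediately while the real handler runs on the thread pool,
// restoring the asynchronous behavior for consumers who want it.
public static class DelegateWrapper
{
    public static Action<T> FireAndForget<T>(Action<T> inner)
        => item => Task.Run(() => inner(item));
}
```

With this, opting in to concurrency becomes a one-line decision on the consumer side instead of a library default.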
Also note: I know it's good practice not to call external code (callbacks, event handlers) while holding a lock, because it might cause deadlocks that are hard for the consumer to understand. However, there's a tradeoff here: how can we avoid doing so while still guaranteeing ordering?
Hi @BrunoJuchli, sorry for not getting back earlier, busy days! We only briefly discussed this issue last week in our dev sync, and I'm not deeply familiar with that piece of code. From what I heard, such an effort has already been tried, but it is very difficult because it cannot be guaranteed that updates are received in order over the network. The main intention of SaveMessageInCache is to handle republish requests properly; the fast callback is just a quick notifier to provide early access for a consumer of monitored items, and it is out of order by design. Backward compatibility is important here, so one option I would suggest is to build a layer on top of the callback which stores and forwards monitored items in order. Or, another idea: would it be possible to put the ordered items in a 'concurrent queue' which is used to notify, even via Task.Run? Then a receiver always gets to see the whole list of unprocessed items in order. Do you think that's an option?
@mregen I will have to familiarize myself a bit more closely with the low-level "details" of OPC UA to propose a well-thought-through plan/solution. I'm now reading the online reference about subscriptions and will see where it takes me.
I'm trying to distill the relevant parts of the spec here: 5.13.1.1 Subscription Model - Description
@mregen Regarding implementing the re-ordering on top of the fast callback: the sequence number is not known to the handler of the callback, so the existing signature does not suffice for re-ordering.

What would be possible is to add a second callback that also receives the sequence number. Also, since subscriptions can be moved from one session to another, that's a problem which I expect would have to be re-solved as well. IMHO a more sensible alternative, instead of solving the problem "on top" of the callback, is to fix the ordering inside the library.
@BrunoJuchli

```csharp
if (SynchronousFastCallback)
    OnMessageReceived(...);
else
    Task.Run(() => OnMessageReceived(...));
```

Anyway, I believe it's something that should be done, especially when it can be done at the library level rather than pushed to the application level, where developers have less experience with the intricacies of OPC UA.
I've ended up here after experiencing a few bad scenarios in production, running a system with a large number of nodes and fast sampling times (~1000 nodes, sampling rates between 50 ms and 1 s). We've experienced out-of-order notifications and unexplained slowdowns.
Both issues are related to the fact that the subscription code runs Task.Run indiscriminately. Are there any thoughts on solving this? (Or does the SDK provide another way to subscribe?)
In our system we have, due to limitations of one of the involved components (a PLC), multiple variables for specifying a single state. So for us, "ordering only per monitored item" is a no-go. On a more general note: concurrency is difficult. Generally one should only use concurrency when the benefit is worth the increased difficulty. Also: if the library did not perform concurrent dispatching of subscription changes, your problem wouldn't exist in the first place...
Having a library which runs sequentially over IO is problematic in my eyes, especially under high load. If you need synchronization between items, nothing will help you besides running local logic (on a sync hook) or disabling concurrency altogether. A simple solution here would be to just push all items into an ActionBlock and let us (the clients) set its concurrency level; that would solve your issue as well. But I do have to say: concurrency (LIMITED!) is a blessing, but ordered notification items are a requirement in most such infrastructures, unless you're talking about a pub/sub library (which is not the case here). In any case, ...
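A rough sketch of that ActionBlock idea, with hypothetical names (the library itself doesn't expose this): the client picks the concurrency level, where `MaxDegreeOfParallelism = 1` gives strictly ordered, sequential processing and larger values trade ordering for throughput.

```csharp
using System;
using System.Threading.Tasks.Dataflow; // NuGet package: System.Threading.Tasks.Dataflow

// Hypothetical dispatcher: all notifications are posted into one ActionBlock
// whose degree of parallelism is chosen by the client. With a value of 1 the
// block processes items one at a time, in the order they were posted.
public static class NotificationDispatcher
{
    public static ActionBlock<T> Create<T>(Action<T> handler, int maxDegreeOfParallelism)
        => new ActionBlock<T>(handler, new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism
        });
}
```

Usage would be along the lines of `var block = NotificationDispatcher.Create<NotificationMessage>(Process, 1);` followed by `block.Post(message);` from the callback.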
I fully agree that an application which handles IO sequentially is a performance nightmare under high load. But there's nothing stopping you from introducing the concurrency yourself: if the current implementation were changed to invoke the handler synchronously, and you instead added the Task.Run in your own handler, the resulting performance and behavior would be exactly the same. So I really don't understand why it's problematic for a library to perform IO synchronously in such a scenario.
I think we're looking at it as different types of clients. We're using it as a fully fledged protocol to control an industrial printing machine, including:
So my needs are indeed a bit different. In any case, as we both said, the simplest solution here is either synchronous invocation or a simple ActionBlock (both take ~3 lines of code change). If we need extra parallelism on top of that, we'll handle it in our wrappers. So back to step 1: is anyone taking this, or would you prefer that we open a PR?
We experience the same thing: out-of-order notifications and unexplained slowdowns.
Are there any suggestions on how to make something like what @mediawolf mentioned configurable without possibly breaking the Subscription DataContract? The easiest thing to do is to add a property to Subscription that controls the behavior, but that would require adding another DataMember. That's a non-breaking change as long as strict schema validity is not enforced. Would this be acceptable?
(#1493) Pertaining to the discussion started in issue #957, this pull request adds the capability for Subscription to invoke its MonitoredItem and FastDataChange callbacks sequentially, to maintain proper ordering. The key goal is to avoid out-of-order monitored items. To this effect, the SequentialPublishing capability is added to Subscription.

Resolves #957

### SequentialPublishing

A toggle which forces Subscription to raise callbacks only in sequential order by the SequenceNumber of the incoming message. It limits the number of tasks that can process publish responses to 1 and enforces a strict +1 sequence number requirement before releasing a message.

#### Backwards compatibility

The new property has been added as a DataMember at the end, to maintain backwards compatibility with the DataContract where it may be in use. The relevant copy constructor has also been updated. The feature is disabled by default, so the existing behavior remains the default. Users must explicitly set the property on the Subscription object to enable the new behavior.

#### Leveraging the existing "time machine" in SaveMessageInCache

There is already a mechanism for re-ordering messages that arrived in the wrong order. This change makes use of that existing "time machine", which sorts messages into chronological order (by sequence number), to only pull off messages with a proper +1 sequence number each time.

#### KeepAlive messages advancing the sequence number

KeepAlive messages do not contain notifications and do not enter the branch in the code where messages are pulled out, so they will not interrupt the sequential flow. Since the next message in the sequence with data will "re-use" that sequence number, the sequence can be expected to hold.

#### Delayed messages

When sequential publishing is enabled and a message is genuinely missing from the sequence, it will "hold up" the following messages until it either arrives or is pulled out of the incoming messages list for being too old. In either case, it is then considered "processed" for sequencing purposes, and the rest of the messages may proceed. The automatic republish request mechanism is also leveraged here for that purpose.

#### SequenceNumber roll-over after the maximum value

As specified in the OPC UA spec, at a 1 ms publishing interval this would take almost 50 days of uptime; at any "reasonable" publishing interval it takes even longer. The matter is also not addressed anywhere in the existing code that predates this change (though some consideration can be seen in places, such as placing messages at the end of the incoming message linked list if no better place was found).

#### Locking on m_cache to get the semaphore

I considered making a separate locking object for the semaphore management, but decided it is not needed. The message worker would need to obtain the m_cache lock later anyway, and momentarily obtaining it is not harmful to the overall flow. Only under serious contention between new PublishResponses coming in and workers attempting to work would this cause any problems.

#### Callbacks that "take a long time"

This change will naturally suffer from callbacks that take a long time to invoke. In our use case, each callback message is passed into an ActionBlock for handling, so our callbacks are fast, but other clients may invoke much more code in these callbacks. With sequential publishing this capacity becomes a potential bottleneck. Sequential publishing should be used for its intended purpose, with callbacks that do not hold up their calling thread for long. To ensure properly sequential callbacks, they cannot just be started in order; they have to be processed fully in order.

#### Changing this property while the Subscription is active

There is limited support for on-the-fly changes to this property, and that is not the intended use case. It is meant to be set once and ideally never changed while messages are being processed. If users need certain items properly sequenced while other items may be delivered out of order (or outdated messages ignored), they can define two separate Subscription objects, just as they would if they needed different publishing intervals.

#### Why a semaphore?

At first I considered using ActionBlock from TPL Dataflow and limiting its concurrency to the number of desired message workers, but I opted not to add a dependency on Dataflow only for this purpose (referencing all of Dataflow just for ActionBlock's automatic concurrency limiting is wasteful). I also considered using a limited-concurrency TaskScheduler implementation and queueing the message workers on that instead of the normal ThreadPool via Task.Run, but that is similarly more complex than it needs to be.

#### Why async Task? What was wrong with void?

Even though the worker runs on a ThreadPool task anyway, it would still occupy that task if it used the synchronous Wait() method, which is a blocking call. By using WaitAsync, the ThreadPool thread goes "back" to the pool until the semaphore is released.

### Disclaimer: I am not an OPC UA expert

Please do not assume I have "done all my homework" with regard to these changes. I have attempted to learn as much as I could in the time I was working on this change, but I fully expect to have missed a few spots and specifics of the OPC UA protocol. I have done some testing in our application to verify that the out-of-order problem is solved, and that no significant delay is introduced by the additional work done to facilitate this.
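The core mechanism described above (a single-permit semaphore acquired with WaitAsync, plus a strict lastSequence + 1 check before releasing a message) can be sketched roughly as follows. This is an illustrative simplification with hypothetical names, not the library's actual code:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Simplified sketch of the PR's approach: one worker at a time (SemaphoreSlim
// with a single permit), and messages are only released to the callback when
// they continue the sequence with exactly +1; earlier arrivals are buffered.
public sealed class SequentialDispatcher
{
    private readonly SemaphoreSlim m_semaphore = new SemaphoreSlim(1, 1);
    private readonly SortedDictionary<uint, string> m_pending = new SortedDictionary<uint, string>();
    private readonly Action<uint, string> m_callback;
    private uint m_lastSequence; // last sequence number fully processed

    public SequentialDispatcher(uint initialSequence, Action<uint, string> callback)
    {
        m_lastSequence = initialSequence;
        m_callback = callback;
    }

    public async Task OnMessageReceivedAsync(uint sequenceNumber, string payload)
    {
        // WaitAsync instead of Wait(): the thread returns to the pool
        // while waiting, instead of blocking on the semaphore.
        await m_semaphore.WaitAsync().ConfigureAwait(false);
        try
        {
            m_pending[sequenceNumber] = payload;

            // Release only messages that continue the sequence with exactly +1;
            // anything else stays buffered until the gap is filled.
            while (m_pending.TryGetValue(m_lastSequence + 1, out string next))
            {
                m_pending.Remove(m_lastSequence + 1);
                m_lastSequence++;
                m_callback(m_lastSequence, next); // processed fully, in order
            }
        }
        finally
        {
            m_semaphore.Release();
        }
    }
}
```

Note this sketch omits the republish/timeout handling the PR describes for genuinely missing messages; a gap here would buffer forever.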
TL;DR: Note that after the fix in #1493, the sporadic out-of-order behavior is still the default. One can opt in to the fix on a per-subscription basis by setting SequentialPublishing. @AvenDonn Thanks for the fix! @mregen
@BrunoJuchli, the flag is mentioned prominently in the post, but I've added clarification. @mregen, any news from the unit test? |
@AvenDonn |
Hi @BrunoJuchli and @AvenDonn, thanks for driving this. The tests are stable now, and once we know there are no regressions, setting it as the default can be considered. Please provide feedback if you hit any issues with the new release!
Hey @mregen, good to know. Did you change anything with the tests? |
The test does a positive/negative check with a delayed fast data change callback. Without the sequential flag the sequence gets out of order after a few seconds; with it, it does not. But it's limited to 10 seconds to keep the overall test runtime down.
(checked on commit 2219e02)
I use the Subscription.FastDataChangeCallback and Subscription.FastEventCallback callbacks for monitoring items and events, with Subscription.PublishingInterval = 1000. If the callback handler cannot keep up with the pace (e.g. when stepping through in a debugger, or when the handler logic cannot execute fast enough), other invocations can run concurrently, leading to unpredictable processing order.
According to the implementation, Subscription.SaveMessageInCache orders NotificationMessages by their sequence number under the cache lock. It then schedules message processing (OnMessageReceived) via Task.Run, which means several scheduled invocations of OnMessageReceived can execute concurrently.
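A minimal standalone illustration of why this scheduling pattern can reorder, using hypothetical names (this is not the library code): each message handler is queued independently with Task.Run, so the thread pool may run handlers concurrently and complete them in any order, regardless of how carefully the messages were sorted beforehand.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Each "message" is handed to Task.Run independently, mirroring the pattern
// described above; the pool gives no ordering guarantee across those tasks.
public static class TaskRunDispatchDemo
{
    public static int[] Dispatch(int[] sequenceNumbers)
    {
        var processed = new ConcurrentQueue<int>();
        var tasks = new Task[sequenceNumbers.Length];
        for (int i = 0; i < sequenceNumbers.Length; i++)
        {
            int n = sequenceNumbers[i];
            tasks[i] = Task.Run(() =>
            {
                Thread.Sleep(n % 3); // simulate variable handler cost
                processed.Enqueue(n); // completion order, not submission order
            });
        }
        Task.WaitAll(tasks);
        return processed.ToArray();
    }
}
```

Every item is processed exactly once, but the observed order can differ from the submission order on any run, which is precisely the out-of-order symptom reported in this issue.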
I can do another re-ordering in the callback using the available message sequence number, but first I would like to understand whether it was designed to work this way.