Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization: Cache nonmatching filter results #166

Open
plundeen opened this issue Mar 25, 2021 · 4 comments
Open

Optimization: Cache nonmatching filter results #166

plundeen opened this issue Mar 25, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@plundeen
Copy link

If I am reading it correctly, it looks like you could further optimize the Topic pattern match evaluation in MQTT Server.lvlib:_Subscription.lvclass:Evaluate Match.vi. As it stands, if a given topic name matches the subscription pattern, it is cached so the pattern matching/wildcard evaluation does not be be performed on each subsequent sample. But if the name does not match, the result is not cached, so subsequent samples of that topic have to be reevaluated every time.

Perhaps there are some nuances that I am overlooking that preclude subsequent samples of a non-matching topic. Feel free to close if I have misinterpreted. But I think it would save some processing overhead if the evaluation was done for each unique topic name exactly once, regardless of whether it matched or not. So, the cache variant attribute table would end up having an entry for every topic, with the corresponding value the result of the evaluation.

@plundeen
Copy link
Author

Also, there's a lingering #TODO comment that I think has been addressed and can be removed here.

@francois-normandin
Copy link
Member

@plundeen I think this is worth further investigation as it might reduce significantly the burden on the server when there is a high frequency stream of messages being published. I see it having a large potential for filtering high-frequency messages, but even more so for QoS 0 messages...

Of course, what you suggest could be used in a DoS attack that could result in a huge memory footprint if attackers were to systematically bombard the server with ever-changing topics being published... but at this point, I think one would have bigger problems and should use TLS to not allow just anyone to connect in the first place.

I'll definitely consider this, and probably come up with a way for the subscription loops to share with the server (owner of subscriptions) a mean to stop the distribution upstream instead of having every subscription manage the same "not a match" list. Most subscriptions would be in the situation of having a tiny "matched" list and a huge (and nearly identical to other subscriptions) "not-a-match" list.

@francois-normandin francois-normandin added the enhancement New feature or request label Mar 27, 2021
@plundeen
Copy link
Author

Of course, what you suggest could be used in a DoS attack that could result in a huge memory footprint if attackers were to systematically bombard the server with ever-changing topics being published... but at this point, I think one would have bigger problems and should use TLS to not allow just anyone to connect in the first place.

I completely agree.

I'll definitely consider this, and probably come up with a way for the subscription loops to share with the server (owner of subscriptions) a mean to stop the distribution upstream instead of having every subscription manage the same "not a match" list. Most subscriptions would be in the situation of having a tiny "matched" list and a huge (and nearly identical to other subscriptions) "not-a-match" list.

The simplest implementation would probably be to have a single variant lookup table wrapped in a DVR that is shared by all loops. That does, however, introduce resource contention as they fight for the mutex, and could result in an increase in context switching. The actual time the mutex is held should be pretty brief, thanks to the variant attribute red-black tree performance, at least.

Thanks for taking the time to consider this, @francois-normandin .

@francois-normandin
Copy link
Member

Thanks. Most appreciated feedback.

If this were an application-specific project implementation, I would definitely consider the variant-wrapped DVR as "the" top-of-the-list contender. The subscription mechanism needs to stay independent in all possible use cases and this means that there is the possibility of multiple concurrent servers running in the same app, all in need of staying absolutely independent of one another (including their respective subscription processes). It does not mean the DVR approach is out of the question... after all, the DVR reference can be passed down in a by-value fashion from a server to all the subscriptions. But such an arrangement could result in tight coupling of the server class with the subscription class.

In the context of this project, I want to maintain as much decoupling as possible (at least in the base implementation) and leave it to specific overrides the care to introduce coupling when it serves the application. To that effect, I'm already considering extracting the subscription mechanism from the MQTT-specific context and make that a separate open source project. That way, the MQTT server could use appropriate hooks to inject a DVR-based variant attribute table as you suggest. All in all, what I'm saying is that I believe the specific implementation strategy should be a DVR-based attribute table, but that it should be implemented through dependency inversion to make it compliant with SOLID principles. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants