New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ARTEMIS-1890] Any-word wildcard fix #2113

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
7 participants
@jostbg
Contributor

jostbg commented May 26, 2018

This patch fixes the reported bug, improves code readability and also improves the performance of the matches function by storing immutable values acquired via method calls in final local variables outside the loop and by avoiding the creation of short-lived temporary objects in the inner loop. Also comments are added for better understanding of the different check steps.

@jostbg

This comment has been minimized.

Show comment
Hide comment
@jostbg

jostbg May 26, 2018

Contributor

@clebertsuconic It would be great if this PR or an equivalent fix could make it into 2.6.1 as this is a blocker for our particular use case.

Contributor

jostbg commented May 26, 2018

@clebertsuconic It would be great if this PR or an equivalent fix could make it into 2.6.1 as this is a blocker for our particular use case.

@clebertsuconic

This comment has been minimized.

Show comment
Hide comment
@clebertsuconic

clebertsuconic May 26, 2018

Contributor

If it’s pure bug fix it can be included.

Let me look after the holiday. And I will cherry pick accordingly.

Contributor

clebertsuconic commented May 26, 2018

If it’s pure bug fix it can be included.

Let me look after the holiday. And I will cherry pick accordingly.

@franz1981

This comment has been minimized.

Show comment
Hide comment
@franz1981

franz1981 May 26, 2018

Contributor

It seems to me like a bug fix + some nice code gardening 👍
@jostbg did you check if using String internally instead of SimpleString wouldn't make things better too?
I've spent quite a lot of time with @michaelandrepearce to improve SimpleString but String
benefit from some nice perf tricks done by the JVM and Java 9 allow them to be compact too (byte[] backed) making their footprint smaller than SimpleString too :)

Contributor

franz1981 commented May 26, 2018

It seems to me like a bug fix + some nice code gardening 👍
@jostbg did you check if using String internally instead of SimpleString wouldn't make things better too?
I've spent quite a lot of time with @michaelandrepearce to improve SimpleString but String
benefit from some nice perf tricks done by the JVM and Java 9 allow them to be compact too (byte[] backed) making their footprint smaller than SimpleString too :)

@michaelandrepearce

This comment has been minimized.

Show comment
Hide comment
@michaelandrepearce

michaelandrepearce May 26, 2018

Contributor

Im going to question the test case on this.

Address
a.b.d

Should NOT match
a.b.#.d

As the match expects a word between b and d with a seperator of "." either side which a.b.d does not

Also another note if this is just a bug fix lets leave out the other changes such as perf improvements these could bring other unexpetced outcomes. A bug fix should be minimum changes. The other improvments are welcome but should be for a 2.7.0 where it get some better bedding in time and also should be tested they do improve perf, with perf test.

Contributor

michaelandrepearce commented May 26, 2018

Im going to question the test case on this.

Address
a.b.d

Should NOT match
a.b.#.d

As the match expects a word between b and d with a seperator of "." either side which a.b.d does not

Also another note if this is just a bug fix lets leave out the other changes such as perf improvements these could bring other unexpetced outcomes. A bug fix should be minimum changes. The other improvments are welcome but should be for a 2.7.0 where it get some better bedding in time and also should be tested they do improve perf, with perf test.

@michaelandrepearce

This comment has been minimized.

Show comment
Hide comment
@michaelandrepearce

michaelandrepearce May 26, 2018

Contributor

Im actually challenging if this is a bug, and if it should be fixed. Per my comment above.

Contributor

michaelandrepearce commented May 26, 2018

Im actually challenging if this is a bug, and if it should be fixed. Per my comment above.

@jostbg

This comment has been minimized.

Show comment
Hide comment
@jostbg

jostbg May 26, 2018

Contributor

@michaelandrepearce Currently there seems to be no possibility to match a.b.<zero-or-more-words>d
None of these variations a.b.#d, a.b#.d, a.b#d match a.b.d with the existing implementation. Which pattern from your view should be used to match a.b.c.d and a.b.d at the same time?
Also based on your opinion "a.b.c.#" should consequently not match a.b.c but it currently does.

Contributor

jostbg commented May 26, 2018

@michaelandrepearce Currently there seems to be no possibility to match a.b.<zero-or-more-words>d
None of these variations a.b.#d, a.b#.d, a.b#d match a.b.d with the existing implementation. Which pattern from your view should be used to match a.b.c.d and a.b.d at the same time?
Also based on your opinion "a.b.c.#" should consequently not match a.b.c but it currently does.

@jostbg

This comment has been minimized.

Show comment
Hide comment
@jostbg

jostbg May 26, 2018

Contributor

@franz1981 getting rid of all SimpleString instances would involve changing the signature of the getAddressParts which felt like a too big change for now. I think getting rid of temporary new SimpleString instances in the inner loop of the matches function already is a significant improvement to help reducing gc pressure.

Contributor

jostbg commented May 26, 2018

@franz1981 getting rid of all SimpleString instances would involve changing the signature of the getAddressParts which felt like a too big change for now. I think getting rid of temporary new SimpleString instances in the inner loop of the matches function already is a significant improvement to help reducing gc pressure.

@michaelandrepearce

This comment has been minimized.

Show comment
Hide comment
@michaelandrepearce

michaelandrepearce May 26, 2018

Contributor

@jostbg i would have to make two matchers. The point more is this isnt a recent feature its worked like this for sometime, and people will be reliant on this.

As such i think it warrents more a discussion and not rushed into a bug fix release.

My personal pref is we should really switch to regex support, for this stuff. Its more powerful and is very well understood.

Contributor

michaelandrepearce commented May 26, 2018

@jostbg i would have to make two matchers. The point more is this isnt a recent feature its worked like this for sometime, and people will be reliant on this.

As such i think it warrents more a discussion and not rushed into a bug fix release.

My personal pref is we should really switch to regex support, for this stuff. Its more powerful and is very well understood.

@jostbg

This comment has been minimized.

Show comment
Hide comment
@jostbg

jostbg May 26, 2018

Contributor

@michaelandrepearce The documentation states that a # matches zero or more words. It doesn't say anything about different behavior of # depending on its location in the expression. The current implementation however does not behave according the documentation, so either the documentation is wrong or the implementation has a bug. I argue that it is a bug and I doubt any user actually expects that # in the middle of an expression behaves like a one-or-more-words wildcard. Esp. after reading the docs. This is a very surprising behavior.

While I agree that having regular expressions would be nice. It should be an optional matching style as this can impact performance.

Contributor

jostbg commented May 26, 2018

@michaelandrepearce The documentation states that a # matches zero or more words. It doesn't say anything about different behavior of # depending on its location in the expression. The current implementation however does not behave according the documentation, so either the documentation is wrong or the implementation has a bug. I argue that it is a bug and I doubt any user actually expects that # in the middle of an expression behaves like a one-or-more-words wildcard. Esp. after reading the docs. This is a very surprising behavior.

While I agree that having regular expressions would be nice. It should be an optional matching style as this can impact performance.

@michaelandrepearce

This comment has been minimized.

Show comment
Hide comment
@michaelandrepearce

michaelandrepearce May 27, 2018

Contributor

@jostbg for the same reason one could argue the docs are simply wrong and dont represent actual behaviour and simply need correcting. Its for this reason i think it needs a discuss thread and not something in bug fix release as it will be a change of behaviour.

Contributor

michaelandrepearce commented May 27, 2018

@jostbg for the same reason one could argue the docs are simply wrong and dont represent actual behaviour and simply need correcting. Its for this reason i think it needs a discuss thread and not something in bug fix release as it will be a change of behaviour.

@jostbg

This comment has been minimized.

Show comment
Hide comment
@jostbg

jostbg May 27, 2018

Contributor

@michaelandrepearce I understand your concern that's why I mentioned the docs may be wrong. Looking at https://access.redhat.com/documentation/en-us/red_hat_enterprise_mrg/3/html/messaging_installation_and_configuration_guide/routing_key_wildcard_examples it states that "a.e" would be routed to "a.#.e" which is what I am after.

Contributor

jostbg commented May 27, 2018

@michaelandrepearce I understand your concern that's why I mentioned the docs may be wrong. Looking at https://access.redhat.com/documentation/en-us/red_hat_enterprise_mrg/3/html/messaging_installation_and_configuration_guide/routing_key_wildcard_examples it states that "a.e" would be routed to "a.#.e" which is what I am after.

@jostbg jostbg closed this May 27, 2018

@jostbg jostbg reopened this May 27, 2018

@michaelandrepearce

This comment has been minimized.

Show comment
Hide comment
@michaelandrepearce

michaelandrepearce May 27, 2018

Contributor

So RH docs shouldnt influence the upstream/apache.

As i said im not against this change i just beleive its not suitable for bug fix release, as it will be a change in current behaviour, so think it needs a small discuss thread and should be in next feature release, with release notes highlighting this change.

as a community we need to keep bug fixes releases small and to the point to get them out (so things recently changed in last release that cause upgrade issues, data loss, or cvc) , else what will occur is same as last time and the bug fix release just turns into a feature release, which has more exposure to new introduced bugs

Contributor

michaelandrepearce commented May 27, 2018

So RH docs shouldnt influence the upstream/apache.

As i said im not against this change i just beleive its not suitable for bug fix release, as it will be a change in current behaviour, so think it needs a small discuss thread and should be in next feature release, with release notes highlighting this change.

as a community we need to keep bug fixes releases small and to the point to get them out (so things recently changed in last release that cause upgrade issues, data loss, or cvc) , else what will occur is same as last time and the bug fix release just turns into a feature release, which has more exposure to new introduced bugs

@andytaylor

This comment has been minimized.

Show comment
Hide comment
@andytaylor

andytaylor May 27, 2018

Contributor

+1 for not changing the current behaviour

Contributor

andytaylor commented May 27, 2018

+1 for not changing the current behaviour

@franz1981

This comment has been minimized.

Show comment
Hide comment
@franz1981

franz1981 May 27, 2018

Contributor

+1 for me as well

Contributor

franz1981 commented May 27, 2018

+1 for me as well

@clebertsuconic

This comment has been minimized.

Show comment
Hide comment
@clebertsuconic

clebertsuconic May 27, 2018

Contributor

I’m not sure if this should be included in 2.7.0 or not yet. But definitely not 2.6.1.

Contributor

clebertsuconic commented May 27, 2018

I’m not sure if this should be included in 2.7.0 or not yet. But definitely not 2.6.1.

@gemmellr

This comment has been minimized.

Show comment
Hide comment
@gemmellr

gemmellr May 29, 2018

Member

So RH docs shouldnt influence the upstream/apache.

On this specifically, the docs pointed to are for an implementation of the AMQP 0-x style topic binding syntax the Artemis docs indicate its wildcard syntax is similar to, it seems highly appropriate information to reference and consider. There is no upstream-downstream relationship between the two in this case.

More generally, if the expected Artemis behaviour was understood to be the same as when using the AMQP 0-x syntax it is documented as 'similar to' then the bug report seems valid. I expect testing a comparable scenario with either of the Apache Qpid brokers (of which one is actually upstream from the linked MRG broker behaviour) implementing AMQP 0-x would give the behaviour being sought. If it isnt understood to give the same behaviour, then it could simply be decided that the Artemis docs are loose/inaccurate to its implementation and theyd just need updating to better explain how # wildcarding actually operates.

I agree that fixing this might give a subtle change in behaviour not expected by some, and so even considering it as a bug would probably keep it for a 2.7.0 release personally as a result. Certainly theres no need to rush it into 2.6.1, other version numbers are available and can be used.

(I haven't looked at the code, just commenting on the behaviour)

Member

gemmellr commented May 29, 2018

So RH docs shouldnt influence the upstream/apache.

On this specifically, the docs pointed to are for an implementation of the AMQP 0-x style topic binding syntax the Artemis docs indicate its wildcard syntax is similar to, it seems highly appropriate information to reference and consider. There is no upstream-downstream relationship between the two in this case.

More generally, if the expected Artemis behaviour was understood to be the same as when using the AMQP 0-x syntax it is documented as 'similar to' then the bug report seems valid. I expect testing a comparable scenario with either of the Apache Qpid brokers (of which one is actually upstream from the linked MRG broker behaviour) implementing AMQP 0-x would give the behaviour being sought. If it isnt understood to give the same behaviour, then it could simply be decided that the Artemis docs are loose/inaccurate to its implementation and theyd just need updating to better explain how # wildcarding actually operates.

I agree that fixing this might give a subtle change in behaviour not expected by some, and so even considering it as a bug would probably keep it for a 2.7.0 release personally as a result. Certainly theres no need to rush it into 2.6.1, other version numbers are available and can be used.

(I haven't looked at the code, just commenting on the behaviour)

@jostbg

This comment has been minimized.

Show comment
Hide comment
@jostbg

jostbg May 30, 2018

Contributor

Interestingly the regex-based wildcard matcher implemented in org.apache.activemq.artemis.core.settings.impl.Match.java and used by HierarchicalObjectRepository behaves like the change I suggest with my PR and actually matches a.#.c and a.c as expected according the documentation.

import org.apache.activemq.artemis.core.config.WildcardConfiguration;
import org.apache.activemq.artemis.core.settings.impl.Match;
import org.junit.*;
import java.util.regex.Pattern;

public class MatchTest {

   @Test
   public void testAnyWordWildcard() {
      Pattern p = new Match<String>("a.#.c", null, new WildcardConfiguration()).getPattern();
      Assert.assertTrue(p.matcher("a.c").matches());
      Assert.assertTrue(p.matcher("a.b.d.c").matches());
      Assert.assertFalse(p.matcher("a.c.d").matches());
   }
}

I guess this means specifying a filter expression in AddressSettings currently can match different addresses than specifying the same expression in a subscription.

Contributor

jostbg commented May 30, 2018

Interestingly the regex-based wildcard matcher implemented in org.apache.activemq.artemis.core.settings.impl.Match.java and used by HierarchicalObjectRepository behaves like the change I suggest with my PR and actually matches a.#.c and a.c as expected according the documentation.

import org.apache.activemq.artemis.core.config.WildcardConfiguration;
import org.apache.activemq.artemis.core.settings.impl.Match;
import org.junit.*;
import java.util.regex.Pattern;

public class MatchTest {

   @Test
   public void testAnyWordWildcard() {
      Pattern p = new Match<String>("a.#.c", null, new WildcardConfiguration()).getPattern();
      Assert.assertTrue(p.matcher("a.c").matches());
      Assert.assertTrue(p.matcher("a.b.d.c").matches());
      Assert.assertFalse(p.matcher("a.c.d").matches());
   }
}

I guess this means specifying a filter expression in AddressSettings currently can match different addresses than specifying the same expression in a subscription.

@clebertsuconic

This comment has been minimized.

Show comment
Hide comment
@clebertsuconic

clebertsuconic May 30, 2018

Contributor

@jostbg that's a good point actually

Contributor

clebertsuconic commented May 30, 2018

@jostbg that's a good point actually

@clebertsuconic

This comment has been minimized.

Show comment
Hide comment
@clebertsuconic

clebertsuconic May 30, 2018

Contributor

@jostbg can you add some pointers on your use case.. why this change was needed?

Contributor

clebertsuconic commented May 30, 2018

@jostbg can you add some pointers on your use case.. why this change was needed?

@clebertsuconic

This comment has been minimized.

Show comment
Hide comment
@clebertsuconic

clebertsuconic May 31, 2018

Contributor

@mtaylor @jbertram why did we implement it by hand instead of using the wildcard there?

@jostbg an updated idea on the usecase would help as well.

thanks

Contributor

clebertsuconic commented May 31, 2018

@mtaylor @jbertram why did we implement it by hand instead of using the wildcard there?

@jostbg an updated idea on the usecase would help as well.

thanks

@jostbg

This comment has been minimized.

Show comment
Hide comment
@jostbg

jostbg May 31, 2018

Contributor

@clebertsuconic We are using Artemis as a fast event/message bus for AMQP communication between (micro-)services developed and operated by different organizational units.
Each service has it's own LDAP managed login credentials. We allow all services with LDAP Ids below certain organizational units (base DNs) to access the broker.
Thus we do not manually grant access to the broker on a per-service/per-application basis.

The functional requirement we are faced with is that a service shall be able to send multicast messages while maintaining control over which other services are able to read those messages.
Since there is no finite set of services that are accessing the broker and services shall be able to specify adressees on a per-message base we cannot work with pre-configured ACL-secured addresses/queues.

In ActiveMQ 5.x we could maybe setup something involving a MessageAuthorizationPolicy.isAllowedToConsume but this feature is not yet available in Artemis.

So we came up with a system that involves topic based wildcard routing, auto-created addresses/queues and a custom Artemis plugin that performs ACLs checks on address/queue access.

It works as follows:
A service is allowed to send multicast messages to other services by publishing to addresses following the naming convention topics.<someTopicName>.for.<serviceId1>.<serviceId2>... where serviceId1, serviceId2 etc are the LDAP authenticated usernames of the services that are allowed to read the message.
Each service that can access the broker has the right to subscribe to addresses following the naming convention topics.<someTopicName>.for.#.<myAuthenticatedUsername>.# to receive messages addressed to it where myAuthenticatedUsername is it's own LDAP id.

This means, if a serviceA wants to send information about a newly created contract to serviceB and serviceC it publishes a message to the address topics.contracts.for.serviceB.serviceC. serviceB subscribes to topics.contracts.for.#.serviceB.# to receive all contracts related messages addressed to it. If serviceB then wants to send invoice information to serviceA and serviceB it would publishes a message to topics.invoices.for.serviceA.serviceC.

The only missing piece for us atm is that # must always be interpreted as zero-or-more words as stated in the documentation otherwise services will not be able to receive messages in all cases.

For performance/resources reasons and simplicity of scaling/operations/maintenance we do not want to introduce another more sophisticated external routing component like camel as the setup I described fulfills our requirements.

Contributor

jostbg commented May 31, 2018

@clebertsuconic We are using Artemis as a fast event/message bus for AMQP communication between (micro-)services developed and operated by different organizational units.
Each service has it's own LDAP managed login credentials. We allow all services with LDAP Ids below certain organizational units (base DNs) to access the broker.
Thus we do not manually grant access to the broker on a per-service/per-application basis.

The functional requirement we are faced with is that a service shall be able to send multicast messages while maintaining control over which other services are able to read those messages.
Since there is no finite set of services that are accessing the broker and services shall be able to specify adressees on a per-message base we cannot work with pre-configured ACL-secured addresses/queues.

In ActiveMQ 5.x we could maybe setup something involving a MessageAuthorizationPolicy.isAllowedToConsume but this feature is not yet available in Artemis.

So we came up with a system that involves topic based wildcard routing, auto-created addresses/queues and a custom Artemis plugin that performs ACLs checks on address/queue access.

It works as follows:
A service is allowed to send multicast messages to other services by publishing to addresses following the naming convention topics.<someTopicName>.for.<serviceId1>.<serviceId2>... where serviceId1, serviceId2 etc are the LDAP authenticated usernames of the services that are allowed to read the message.
Each service that can access the broker has the right to subscribe to addresses following the naming convention topics.<someTopicName>.for.#.<myAuthenticatedUsername>.# to receive messages addressed to it where myAuthenticatedUsername is it's own LDAP id.

This means, if a serviceA wants to send information about a newly created contract to serviceB and serviceC it publishes a message to the address topics.contracts.for.serviceB.serviceC. serviceB subscribes to topics.contracts.for.#.serviceB.# to receive all contracts related messages addressed to it. If serviceB then wants to send invoice information to serviceA and serviceB it would publishes a message to topics.invoices.for.serviceA.serviceC.

The only missing piece for us atm is that # must always be interpreted as zero-or-more words as stated in the documentation otherwise services will not be able to receive messages in all cases.

For performance/resources reasons and simplicity of scaling/operations/maintenance we do not want to introduce another more sophisticated external routing component like camel as the setup I described fulfills our requirements.

@clebertsuconic

This comment has been minimized.

Show comment
Hide comment
@clebertsuconic

clebertsuconic May 31, 2018

Contributor

I take this as a nice improvement.. but it's not a 100% bug fix.. so I will merge this.. but keep it out of 2.6.1.

Contributor

clebertsuconic commented May 31, 2018

I take this as a nice improvement.. but it's not a 100% bug fix.. so I will merge this.. but keep it out of 2.6.1.

@asfgit asfgit closed this in 9318083 May 31, 2018

@mtaylor

This comment has been minimized.

Show comment
Hide comment
@mtaylor

mtaylor Jun 1, 2018

Contributor

I see it's already merged. This is an enhancement not a bug. But, +1 on the change and it landing in 2.7.0 there's a clear use case for it.

I think the important thing here is review the documentation, as both interpretations from @michaelandrepearce and @jostbg are reasonable.

Contributor

mtaylor commented Jun 1, 2018

I see it's already merged. This is an enhancement not a bug. But, +1 on the change and it landing in 2.7.0 there's a clear use case for it.

I think the important thing here is review the documentation, as both interpretations from @michaelandrepearce and @jostbg are reasonable.

@jostbg jostbg deleted the jostbg:ARTEMIS-1890 branch Jun 4, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment