Sync multiple namespaces at the same time #23
Comments
Could we make it Source({name:"mongo", namespace:["database.cheese","database.bacon",...]}) ? They can then be passed with a constant namespace into the pipeline. |
For the transformer to be aware of the namespace, it might make sense to have the nodes pass message.Msg and store the namespace on the message, rather than having the nodes pass documents directly. |
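To illustrate the idea of carrying the namespace on the message, here is a minimal Go sketch. The `Msg` type, its fields, the `Transform` type, and `RenameNamespace` are all hypothetical stand-ins; the real message.Msg layout is not shown in this thread.

```go
package main

// Msg is a hypothetical stand-in for transporter's message.Msg.
// The real type's fields are not shown in this thread.
type Msg struct {
	Namespace string                 // where the document came from, e.g. "database.cheese"
	Data      map[string]interface{} // the document itself
}

// Transform is a hypothetical node function. It receives the whole message,
// so it can read or rewrite the namespace as well as the document.
type Transform func(*Msg) (*Msg, error)

// RenameNamespace returns a Transform that rewrites one namespace to another
// and leaves every other message untouched.
func RenameNamespace(from, to string) Transform {
	return func(m *Msg) (*Msg, error) {
		if m.Namespace == from {
			m.Namespace = to
		}
		return m, nil
	}
}
```

With something like this, a sink would no longer need a constant namespace: it could read `Namespace` from each message to decide where to write.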
What happens to dropped messages? Maybe pass a metadata object of reference information? |
This seems like it's not really possible until there's a higher level |
This would be an amazing feature to have. |
With this change we can now start to address this issue and I think a regex is the best option here. The initial change will be to add a My initial thought is to add a func (m *Pipe) Listen(namespace string, fn func(*message.Msg) (*message.Msg, error)) error Then we can apply a matching pattern inside the Thoughts? |
I like it all, apart from changing the signature of the Listen func. The adaptor should already have the namespace; see the mongo adaptor https://github.com/compose/transporter/blob/master/pkg/adaptor/mongodb.go#L453. The way we're doing this should work across all adaptors, but using mongo as a specific example, the cat func will need to iterate over the affected collections. |
If the multiple collections are defined only as a regexp then it could lead to gnarly namespace sources, e.g. I want all foo's collections and from bar/stow: [foo/.*|bar.stow], which looks less configgy and more codey. Would it be simpler to expose as a comma-separated list (with the option to add, say, a /regexp/ format later)? |
A comma separated list seems very inefficient and inflexible because if the purpose is to be able to sync multiple namespaces easily, any additions of a collection would not get picked up until you change the transporter config which IMO is not the desired result. |
How do you propose to change the regexp without changing the config or the pipeline defining js? As I read this, you specify the multiple collections on the Source node either in line or in config, incoming messages which match the specification get tagged with the canonical namespace they came from, this then can be altered by transformers where required, and passed to a destination adaptor for writing using the namespace to control where. Am I missing anything? |
@nstott I agree any adaptor acting as a source will need to know and act on the namespace it's configured for but it doesn't change the need for changing the
then during a message is received on the |
@codepope I'll do my best to describe the specific use case I believe we should support. A Source is setup as follows,
|
You've given a great example there... Regexps are always surprising when not explicit, and there you've got a regexp which would also match alphadbeta.collA and alphadbetonetwothree.collC. The regexp should of course be "alphadbet\..*". My suggestion was that the namespace is a comma list with regexps denoted by /..../, so your example would simply be namespace:"/alphadbet\..*/", while "alphadbet.collA,alphadbet.collB" is also valid, as would be "alphadbet.collA,/alphadbet.coll[B-Z]/,alphadbet.otherColl". That way, the principle of least surprise works (plain text matching by default) while the powerful option (regexp matching) is explicitly available. It also dodges the potentially breaking change where every currently defined namespace becomes ambiguously interpretable.
As I wrote this I was also reminded that backslashes would need backslashing too... |
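The comma list with /regexp/ entries could be parsed roughly as follows. This is only a sketch of the suggestion, not anything transporter ships: `Matcher` and `ParseNamespaces` are hypothetical names, and anchoring each regexp to the whole namespace is an assumption made here to keep matching unsurprising.

```go
package main

import (
	"regexp"
	"strings"
)

// Matcher reports whether a namespace such as "alphadbet.collA" is selected.
type Matcher func(ns string) bool

// ParseNamespaces is a hypothetical helper for the proposed format.
// Entries wrapped in slashes are compiled as regexps (anchored to the whole
// namespace, an assumption of this sketch); all other entries match literally.
// Caveat: splitting on commas means a regexp entry cannot itself contain one.
func ParseNamespaces(spec string) (Matcher, error) {
	var literals []string
	var patterns []*regexp.Regexp
	for _, entry := range strings.Split(spec, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		if len(entry) >= 2 && strings.HasPrefix(entry, "/") && strings.HasSuffix(entry, "/") {
			re, err := regexp.Compile("^(?:" + entry[1:len(entry)-1] + ")$")
			if err != nil {
				return nil, err
			}
			patterns = append(patterns, re)
			continue
		}
		literals = append(literals, entry)
	}
	return func(ns string) bool {
		for _, l := range literals {
			if l == ns {
				return true
			}
		}
		for _, re := range patterns {
			if re.MatchString(ns) {
				return true
			}
		}
		return false
	}, nil
}
```

Existing single-namespace configs keep their meaning under this scheme, since a plain entry is never interpreted as a pattern. In JavaScript config the backslash in a pattern like `/alphadbet\..*/` would still need doubling inside a string literal.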
For now, we're going to implement a single string with some restrictions, in that the regex portion only applies to the 2nd half of the namespace. This limits the adaptor to only having to work with a single "database". It's very likely we will need to expand on this in the future, but to limit the scope of the initial implementation I'd like to just go with the single string. |
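A rough Go sketch of that restriction: split the namespace on the first dot, treat the first half as a literal database name, and compile the second half as a regexp. `SplitNamespace` is a hypothetical name, and anchoring the pattern to the whole collection name is an assumption of this sketch rather than something stated in the thread.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// SplitNamespace is a hypothetical helper. It splits "database.pattern" on the
// first dot, returning the literal database name and the second half compiled
// as a regexp that must match an entire collection name.
func SplitNamespace(ns string) (string, *regexp.Regexp, error) {
	parts := strings.SplitN(ns, ".", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", nil, fmt.Errorf("malformed namespace %q, expected database.collection", ns)
	}
	re, err := regexp.Compile("^(?:" + parts[1] + ")$")
	if err != nil {
		return "", nil, fmt.Errorf("invalid collection pattern in %q: %v", ns, err)
	}
	return parts[0], re, nil
}
```

Under these assumptions, "database..*" selects every collection in "database" (the first dot is the separator, ".*" is the pattern), while "database.cheese" still selects only "cheese".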
close via #101 |
It should be possible to sync more than one namespace with the pipeline.
I can think of a few ways this can work, but in general, I favour the idea of allowing regex / wildcard matches on a namespace. i.e. something like
Source({name: "mongo", namespace: "database.*"})
This will cause problems on the sink, as it is expecting a constant namespace. As well, transformers will need to be aware of the message's namespace.