This repository has been archived by the owner on Oct 17, 2023. It is now read-only.

Sync multiple namespaces at the same time #23

Closed
nstott opened this issue Dec 22, 2014 · 15 comments

Comments

@nstott
Contributor

nstott commented Dec 22, 2014

It should be possible to sync more than one namespace with the pipeline.
I can think of a few ways this can work, but in general I favour allowing regex / wildcard matches on a namespace, i.e. something like Source({name: "mongo", namespace: "database.*"})
This will cause problems on the sink, as it is expecting a constant namespace. As well, transformers will need to be aware of the message's namespace.

@codepope
Contributor

Could we make it Source({name:"mongo", namespace:["database.cheese","database.bacon",...]}) ? They can then be passed with a constant namespace into the pipeline.

@nstott
Contributor Author

nstott commented Dec 22, 2014

For transformers being aware of the namespace, it might make sense to have the nodes pass message.Msg, and store the namespace on the message, rather than the nodes passing straight documents.

@codepope
Contributor

What happens to dropped messages?
Do we want people playing with the command type?
Feels like lots of power but could turn into a mis-held chainsaw unless we surround with sanity checks.

Maybe pass a metadata object of reference information?

@mrkurt
Contributor

mrkurt commented Dec 22, 2014

This seems like it's not really possible until there's a higher level msg to operate on, and that concept of mapping data in the save method.

@shividhar

This would be an amazing feature to have.

@jipperinbham
Contributor

With this change we can now start to address this issue and I think a regex is the best option here.

The initial change will be to add a Namespace field to message.Msg and to update func NewMsg(op OpType, data interface{}) *Msg to also accept a namespace string. The namespace will be sent into transformer functions as well, under the field name ns.
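A minimal sketch of that change, for illustration only. The OpType and Msg definitions here are pared-down stand-ins, not the real types from transporter's message package, which carry more fields; only the extra namespace argument and Namespace field come from the proposal above.

```go
package main

import "fmt"

// OpType is a simplified stand-in for transporter's operation type
// (an assumption for this sketch, not the real definition).
type OpType int

const (
	Insert OpType = iota
	Update
	Delete
)

// Msg is a pared-down stand-in for message.Msg. The Namespace field is the
// proposed addition; the other fields are reduced to the bare minimum.
type Msg struct {
	Op        OpType
	Data      interface{}
	Namespace string // e.g. "database.cheese"
}

// NewMsg mirrors the proposed signature: the existing op and data arguments
// plus the namespace the document came from.
func NewMsg(op OpType, data interface{}, namespace string) *Msg {
	return &Msg{Op: op, Data: data, Namespace: namespace}
}

func main() {
	m := NewMsg(Insert, map[string]interface{}{"name": "brie"}, "database.cheese")
	fmt.Println(m.Namespace)
}
```

Carrying the namespace on the message means every node downstream of the source can see where a document came from without any extra lookups.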

My initial thought is to add a string namespace parameter to the Listen function in Pipe such that it would look like so:

func (m *Pipe) Listen(namespace string, fn func(*message.Msg) (*message.Msg, error)) error

Then we can apply a matching pattern inside the Listen function before calling the fn passed in.
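Roughly, the matching step inside Listen could look like the sketch below. Pipe and Msg are simplified stand-ins (the real Pipe has more channels and stop handling), and treating the namespace argument as a Go regexp is an assumption of this sketch, not a settled design.

```go
package main

import (
	"fmt"
	"regexp"
)

// Msg is a simplified stand-in for message.Msg with the proposed Namespace field.
type Msg struct {
	Namespace string
	Data      interface{}
}

// Pipe is a simplified stand-in: just an inbound channel of messages.
type Pipe struct {
	In chan *Msg
}

// Listen reads from In until the channel is closed and calls fn only for
// messages whose namespace matches the given pattern. Non-matching messages
// are skipped, so the caller's fn never has to filter for itself.
func (p *Pipe) Listen(namespace string, fn func(*Msg) (*Msg, error)) error {
	re, err := regexp.Compile(namespace)
	if err != nil {
		return err
	}
	for msg := range p.In {
		if !re.MatchString(msg.Namespace) {
			continue
		}
		if _, err := fn(msg); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	p := &Pipe{In: make(chan *Msg, 3)}
	for _, ns := range []string{"database.bas", "database.baz", "other.bas"} {
		p.In <- &Msg{Namespace: ns}
	}
	close(p.In)

	err := p.Listen(`^database\.ba[sz]$`, func(m *Msg) (*Msg, error) {
		fmt.Println("processing", m.Namespace)
		return m, nil
	})
	if err != nil {
		fmt.Println("listen failed:", err)
	}
}
```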

Thoughts?

@nstott
Contributor Author

nstott commented Jul 13, 2015

I like it all, apart from changing the signature of the Listen func. The adaptor should already have the namespace; see the mongo adaptor: https://github.com/compose/transporter/blob/master/pkg/adaptor/mongodb.go#L453

The way we're doing this should work across all adaptors, but using mongo as a specific example:
the tail func will need to understand the wildcard, since we query on it here:
https://github.com/compose/transporter/blob/master/pkg/adaptor/mongodb.go#L341

and the cat func will need to iterate over the affected collections.

@codepope
Contributor

If the multiple collections are defined only as a regexp then it could lead to gnarly namespace sources... eg I want all foo's collections and from bar/stow... [foo/.*|bar.stow] which looks less configgy and more codey. Would it be simpler to expose as a comma-sep list? (with the option to add say a /regexp/ format later)

@jipperinbham
Contributor

A comma-separated list seems inefficient and inflexible. If the purpose is to be able to sync multiple namespaces easily, a newly added collection would not get picked up until you change the transporter config, which IMO is not the desired result.

@codepope
Contributor

How do you propose to change the regexp without changing the config or the pipeline defining js? As I read this, you specify the multiple collections on the Source node either in line or in config, incoming messages which match the specification get tagged with the canonical namespace they came from, this then can be altered by transformers where required, and passed to a destination adaptor for writing using the namespace to control where. Am I missing anything?

@jipperinbham
Contributor

@nstott I agree any adaptor acting as a source will need to know and act on the namespace it's configured for but it doesn't change the need for changing the Listen func. Here's a scenario, Source({name: "mongo", namespace: "database.*"}) so it will first cat every collection and then process all tail ops for any collection in database. If you have 2 save calls,

pipeline.save({name:"localmongo", namespace: "database.bas"})
pipeline.save({name:"localmongo", namespace: "database.baz"})

then when a message is received on the In channel in Pipe we could perform a check before calling the fn provided to Listen, effectively adding the logic between https://github.com/compose/transporter/blob/master/pkg/pipe/pipe.go#L86 and https://github.com/compose/transporter/blob/master/pkg/pipe/pipe.go#L88. Doing so would allow "Sink" adaptors to not contain any logic pertaining to whether they should process the message or not.

@jipperinbham
Contributor

@codepope I'll do my best to describe the specific use case I believe we should support.

A Source is set up as follows: Source({name: "mongo", namespace: "alphadbet.*", tail: true}), with a save as .save({name:"localmongo", namespace: "alphadbet.*"}). When transporter starts up, the source has the following collections:

  • collA
  • collB

The user adds another collection, collC, to alphadbet. This new collection should be automatically picked up by the Source while performing the tail operations, and collC will be synced without any changes to, or stopping of, transporter.

@codepope
Contributor

You've given a great example there... Regexps are always surprising when not explicit, and there you've got a regexp which would also match alphadbeta.collA and alphadbetonetwothree.collC, because the unescaped . matches any character. The regexp should of course be "alphadbet\..*".

My suggestion was that the namespace is a comma list with regexps denoted by /..../ so your example would simply be namespace:"/alphadbet\..*/" while "alphadbet.collA,alphadbet.collB" is also valid, as would be "alphadbet.collA,/alphadbet\.coll[B-Z]/,alphadbet.otherColl".

That way, the principle of least surprise works (plain text matching by default) while the powerful option (regexp matching) is explicitly available. It also dodges the potentially breaking change* where every currently defined namespace becomes ambiguously interpretable.

* dependent on adaptor implementation

As I wrote this I was also reminded that backslashes would need backslashing too...
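To make the comma-list idea concrete, here is a rough sketch of how such a namespace spec might be parsed. The function names (parseNamespaces, matchesAny) are hypothetical, and anchoring each /regexp/ entry so that it must match the whole namespace is an assumption of this sketch, not something decided in this thread.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matcher reports whether a namespace is selected by one entry of the spec.
type matcher func(ns string) bool

// parseNamespaces splits a comma-separated spec into matchers. Entries
// wrapped in /.../ are compiled as regexps (anchored to the full namespace);
// everything else is compared as plain text.
// Caveat: the naive split on "," breaks regexps that contain a comma,
// such as /coll[0-9]{1,3}/.
func parseNamespaces(spec string) ([]matcher, error) {
	var ms []matcher
	for _, part := range strings.Split(spec, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		if len(part) >= 2 && strings.HasPrefix(part, "/") && strings.HasSuffix(part, "/") {
			re, err := regexp.Compile("^(?:" + part[1:len(part)-1] + ")$")
			if err != nil {
				return nil, err
			}
			ms = append(ms, re.MatchString)
			continue
		}
		literal := part
		ms = append(ms, func(ns string) bool { return ns == literal })
	}
	return ms, nil
}

// matchesAny reports whether any entry of the spec selects the namespace.
func matchesAny(ms []matcher, ns string) bool {
	for _, m := range ms {
		if m(ns) {
			return true
		}
	}
	return false
}

func main() {
	ms, err := parseNamespaces(`alphadbet.collA,/alphadbet\.coll[B-Z]/,alphadbet.otherColl`)
	if err != nil {
		fmt.Println("bad spec:", err)
		return
	}
	for _, ns := range []string{"alphadbet.collA", "alphadbet.collC", "alphadbetXcollA"} {
		fmt.Println(ns, matchesAny(ms, ns))
	}
}
```

Plain entries never go through the regexp engine here, so an existing namespace such as "alphadbet.collA" keeps its exact-match meaning. The Go raw string in main sidesteps the double-backslash problem; in a JS config string the backslash would need escaping.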

@jipperinbham jipperinbham added this to the v0.1.0 milestone Jul 14, 2015
@jipperinbham jipperinbham self-assigned this Jul 14, 2015
@jipperinbham
Contributor

For now, we're going to implement a single string with a restriction: the regex portion only applies to the second half of the namespace. This limits each adaptor to only having to work with a single "database".

It's very likely we will need to expand on this in the future but to limit the scope of the initial implementation I'd like to just go with the single string.
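As a rough illustration of the single-string restriction, the sketch below splits the namespace on the first dot, treats the first half as a literal database name and the second half as a regexp over collection names. The helper names (splitNamespace, matchingCollections) and the whole-name anchoring are assumptions made for this example; the change that actually landed may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// splitNamespace splits "database.pattern" on the first dot. The database
// half is returned as-is; the pattern half is compiled as a regexp that must
// match an entire collection name.
func splitNamespace(ns string) (string, *regexp.Regexp, error) {
	parts := strings.SplitN(ns, ".", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", nil, fmt.Errorf("malformed namespace %q, expected database.pattern", ns)
	}
	re, err := regexp.Compile("^(?:" + parts[1] + ")$")
	if err != nil {
		return "", nil, err
	}
	return parts[0], re, nil
}

// matchingCollections returns the collections selected by the pattern, in
// their original order. A cat-style pass could iterate over this result.
func matchingCollections(all []string, re *regexp.Regexp) []string {
	var out []string
	for _, c := range all {
		if re.MatchString(c) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	db, re, err := splitNamespace("alphadbet.coll[A-C]")
	if err != nil {
		fmt.Println("bad namespace:", err)
		return
	}
	fmt.Println(db, matchingCollections([]string{"collA", "collB", "collD", "other"}, re))
}
```

Because the check is a pattern match rather than a fixed list, a collection created later (collC, say) would be selected without a config change. Under this particular split, the pattern half of "alphadbet.*" is a bare *, which Go's regexp package rejects, so the match-everything form would be "alphadbet..*".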

@jipperinbham
Contributor

close via #101


5 participants