Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expanded triplestore interface and pluggable backends #38

Closed
pbnjay opened this issue Jun 29, 2014 · 15 comments
Closed

expanded triplestore interface and pluggable backends #38

pbnjay opened this issue Jun 29, 2014 · 15 comments

Comments

@pbnjay
Copy link
Contributor

pbnjay commented Jun 29, 2014

I'd suggest a slightly expanded TripleStore interface which would let you move a lot of the backend-specific code out of db/init.go db/load.go and db/open.go. It would make the integration points a little more obvious for new backends.

Here's an Initial list of methods based on the above files:

Open(dbpath string, opts graph.OptionsDict) // aka NewTripleStore
CreateSchema() // aka CreateNewMongoGraph etc
BeginBulkLoad() (chan *graph.Triple)

I'd also suggest a central registry / factory of backends to make the above files cleaner and allow new modules to use them transparently.

@pbnjay
Copy link
Contributor Author

pbnjay commented Jul 17, 2014

I have a local branch that I think covers this fairly well. View diff here: https://github.com/pbnjay/cayley/compare/triplestore_registry

It implements a TripleStore registry using a TripleStoreGetter (which returns a TripleStore instance by name), and a TripleStoreInit func (which creates a TripleStore schema if necessary). TripleStoreInit can't be an interface method since some backends may require init before returning an instance.

I also added an optional BulkLoad interface (skipped if not implemented). It returns a boolean success, so if it fails then it can revert to the standard load mechanism.

I'm waiting on lawyers for the CLA... will submit a PR once they give the OK.

@barakmich
Copy link
Member

Looks good to me -- I'll give it a run as well, but on first review, seems good. Add the appropriate A+C when the lawyers come back?

@kortschak
Copy link
Contributor

A few comments on this before you send, so please wait for me to be at a real keyboard.

@kortschak
Copy link
Contributor

I can't comment on that branch, so I'll do so here.

The func BulkLoad() looks like it should return an error rather than a bool.

Where you call graph.RegisterTripleStore(), e.g. here, you can leave out the func type conversions since the func parameters are the unamed types and so assignable to the named type.

I know that variadic arguments can be used to mimic optional arguments, but it's really not a good thing to do just to save typing since it drops a whole class of programmer errors. RegisterTripleStore() here should take a TripleStoreInit and call it if not nil. To explain, what is the appropriate behaviour for someone calling RegisterTripleStore("foo", fnG, fnIA, fnIB)? I would like that to be caught at compile time. Can we find another name for TripleStoreGetter and s/getter/get/g s/initer/init/g here?

I have others, but I'll wait until the PR comes. Generally LGTM.

@pbnjay
Copy link
Contributor Author

pbnjay commented Jul 18, 2014

The func
https://github.com/pbnjay/cayley/compare/triplestore_registry#diff-601d5567ed46234cbec334b24b11cefbR27
BulkLoad() looks like it should return an error rather than a bool. So
the interface definition

If we do that we'd need a named error for "can't bulk load now", which
would probably mean moving the definition into the graph package to make it
more discoverable. Easy to do.

Where you call graph.RegisterTripleStore(), e.g. here
https://github.com/pbnjay/cayley/compare/triplestore_registry#diff-fe20db982f8dbc7b057cd2ca4d6cf7edR344,
you can leave out the func type conversions since the func parameters are
the unamed types and so assignable to the named type.

Sure thing. I thought I had an error trying this but it may not have
matched the final code I committed. Will try again.

I know that variadic arguments can be used to mimic optional arguments,
but it's really not a good thing to do just to save typing since it drops a
whole class of programmer errors. RegisterTripleStore() here
https://github.com/pbnjay/cayley/compare/triplestore_registry#diff-601d5567ed46234cbec334b24b11cefbR27
should take a TripleStoreInit and call it if not nil. To explain, what is
the appropriate behaviour for someone calling RegisterTripleStore("foo",
fnG, fnIA, fnIB)? I would like that to be caught at compile time.

I would assume anyone calling this function understands what's going on and
won't call it that way. In which case they shouldn't be surprised. I think
one could make a similar argument about what "nil" means in your version of
the argument list? But your suggestion is probably better future-proof if
other methods are added here. No prob.

Can we find another name for TripleStoreGetter and s/getter/get/g s/initer
/init/g here?

The names were by design, to describe that the "Getter" can be called
multiple times, even with the same arguments, to retrieve multiple distinct
instances. Whereas "Init" should only be called once with the same
arguments. Does that make sense or do you still suggest the rename?

@kortschak
Copy link
Contributor

If we do that we'd need a named error for "can't bulk load now", which
would probably mean moving the definition into the graph package to
make it more discoverable. Easy to do.

It really only needs to be non-nil. This can be extended later to cover
temporary errors in the same fashion as is done in net:

http://golang.org/pkg/net/#Error

Can we find another name for TripleStoreGetter and s/getter/get/g
s/initer
/init/g here?

The names were by design, to describe that the "Getter" can be called
multiple times, even with the same arguments, to retrieve multiple
distinct
instances. Whereas "Init" should only be called once with the same
arguments. Does that make sense or do you still suggest the rename?

Yes, I understand the reason behind the distinction though I feel that
the two could be merged into an InitTripleStore func with an additional
boolean parameter, but that is an issue for later. I just think the
names are too long and *Getter is not a name I have ever seen anywhere
else except as an interface (in flag). What troubles me is that I can't
think of another word that covers all the things the function does -
this suggests to me that it does too much.

@pbnjay
Copy link
Contributor Author

pbnjay commented Jul 18, 2014

It really only needs to be non-nil. This can be extended later to cover
temporary errors in the same fashion as is done in net:

http://golang.org/pkg/net/#Error

We need the temporary error to cover the existing MongoDB Init+Load
semantics. So it's the same difference, either an Error interface specific
to the BulkLoad (or all of cayley's use cases) or a named error which just
means "continue, not a fatal error". I prefer named errors similar to many
of the std packages since they are easier to describe/think about (see
database/sql or io packages for plenty of examples).

Yes, I understand the reason behind the distinction though I feel that

the two could be merged into an InitTripleStore func with an additional
boolean parameter, but that is an issue for later. I just think the
names are too long and *Getter is not a name I have ever seen anywhere
else except as an interface (in flag). What troubles me is that I can't
think of another word that covers all the things the function does -
this suggests to me that it does too much.

If we went similar to the net/http HandlerFunc we'd have NewFunc and
InitFunc but I feel like TripleStore should be in there since there are
other interfaces in graph. Or maybe this is showing us that triplestore
should be its own package like iterator?

@kortschak
Copy link
Contributor

We need the temporary error to cover the existing MongoDB Init+Load
semantics. So it's the same difference, either an Error interface
specific to the BulkLoad (or all of cayley's use cases) or a named
error which just means "continue, not a fatal error". I prefer named
errors similar to many of the std packages since they are easier to
describe/think about (see database/sql or io packages for plenty
of examples).

I [think] an Error interface that covers all use cases is preferable from an
extensibility perspective. There are error propagation issues that I
want to address later that would benefit from this.

If we went similar to the net/http HandlerFunc we'd have NewFunc
and InitFunc but I feel like TripleStore should be in there since
there are other interfaces in graph. Or maybe this is showing us that
triplestore should be its own package like iterator?

Can we use NewStoreFunc and InitStoreFunc for now with the possibility
that all these things will move into a triplestore package?

@pbnjay
Copy link
Contributor Author

pbnjay commented Jul 18, 2014

I an Error interface that covers all use cases is preferable from an
extensibility perspective. There are error propagation issues that I
want to address later that would benefit from this.

I don't know about conflating all the use cases into a single interface. A
configuration error is wholly different from an io error or an invalid
query or an empty result. Treating them all as the same thing with
different facets feels wrong.

These are trivial nits, we can just mix the two concepts. Named errors can
always have the underlying value changed to a future interface type. That
would allow for both brevity and extensibility.

If we went similar to the net/http HandlerFunc we'd have NewFunc
and InitFunc but I feel like TripleStore should be in there since
there are other interfaces in graph. Or maybe this is showing us that
triplestore should be its own package like iterator?

Can we use NewStoreFunc and InitStoreFunc for now with the possibility
that all these things will move into a triplestore package?

Sounds good

@kortschak
Copy link
Contributor

On Thu, 2014-07-17 at 19:57 -0700, Jeremy Jay wrote:

I don't know about conflating all the use cases into a single
interface. A configuration error is wholly different from an io error
or an invalid query or an empty result. Treating them all as the same
thing with different facets feels wrong.

These are trivial nits, we can just mix the two concepts. Named errors
can always have the underlying value changed to a future interface
type. That would allow for both brevity and extensibility.

Yes exactly, the functions all return type error, so what is under the
hood can vary.

My comment above was too strong - I will rephrase that I would prefer
some errors be able to contain contextual information.

As far as using named errors though it it potentially dangerous to
publish these if they subsequently become a more complex type, since
error equality tests may be broken if context details are retained
within the error value. But we can deal with these issues later.

Just to note here what I am thinking with the error propagation, the
github.com/juju/errgo and github/juju/errors packages have error
handling that would suit the kinds of issues that are likely to arise in
cayley. Both are LGPLv3, so, @barakmich is that acceptable? It would
mean that cayley is effectively LGPLv3 as far as I can see.

@barakmich
Copy link
Member

The quick answer is that LGPL isn't going to work, so we'll have to make do -- see http://www.apache.org/legal/resolved.html

@kortschak
Copy link
Contributor

That's a pity - though note below. Note that that is a preclusion on incorporation of LGPL into products that would want to retain an Apache2.0 license.

The reason I am interested in this package is that it handles error propagation across goroutine boundaries (normal error propagation is only back up a stack). After rummaging, I have found Rog's original package which is under a simplified BSD license, here: http://godoc.org/launchpad.net/errgo/errors and http://godoc.org/launchpad.net/errgo/v2/errors. This should be OK as far a licensing goes.

@barakmich
Copy link
Member

Yeah, having dealt with the license bit for a while it is a bit of a pity. BSD is great though, so have at it!

pbnjay added a commit to pbnjay/cayley that referenced this issue Jul 18, 2014
pbnjay added a commit to pbnjay/cayley that referenced this issue Jul 18, 2014
@barakmich
Copy link
Member

Fixed in the merge.

@kortschak
Copy link
Contributor

Update on the license. Roger Peppe has added a LICENCE file and move the error handling package to github under gopkg.in. I've been waiting for this since previously the license was only specified on the launchpad package front page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants