Implement Fuseki metadata store #106

Merged
c-w merged 3 commits into master from fuseki-store on Jul 18, 2018

3 participants
@c-w
Owner

commented Jun 26, 2018

This pull request implements a metadata cache based on Apache Jena Fuseki to supplement the existing SleepyCat and SQLite implementations.

Fuseki can be run as a service separate from Gutenberg (e.g. via Docker), which makes setting up the library much easier: there is no more need to install bsddb3! This means that going forward we can make bsddb3 an optional dependency. Additionally, Fuseki can run on a different machine from Gutenberg, which enables use cases where multiple users want to share a single metadata cache.
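Because Fuseki is reached purely over HTTP, "running it on a separate machine" just means pointing the client at a different dataset URL. A minimal sketch of deriving the SPARQL query and update endpoints from a dataset URL, assuming Fuseki's default port 3030 and the conventional `/query` and `/update` services that a dataset started with `--update` exposes. The helper name `fuseki_endpoints` and the host name are hypothetical, not part of Gutenberg's API.

```python
from urllib.parse import urlsplit, urlunsplit


def fuseki_endpoints(dataset_url):
    """Return (query_url, update_url) for a Fuseki dataset URL.

    Hypothetical helper: assumes the dataset exposes the conventional
    '/query' and '/update' services (as when Fuseki runs with '--update').
    """
    parts = urlsplit(dataset_url)
    if parts.scheme not in ('http', 'https') or not parts.netloc:
        raise ValueError('expected an http(s) URL, got %r' % dataset_url)
    base_path = parts.path.rstrip('/')
    if not base_path:
        raise ValueError('URL must include a dataset path such as /ds')

    def with_path(suffix):
        # Drop any query string/fragment and append the service name.
        return urlunsplit((parts.scheme, parts.netloc, base_path + suffix, '', ''))

    return with_path('/query'), with_path('/update')


# Example: a Fuseki container on another machine, default port 3030.
query_url, update_url = fuseki_endpoints('http://fuseki.example.org:3030/ds/')
print(query_url)   # http://fuseki.example.org:3030/ds/query
print(update_url)  # http://fuseki.example.org:3030/ds/update
```

Several Gutenberg installations could then share one cache simply by being configured with the same dataset URL.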

@c-w c-w requested review from sethwoodworth, MasterOdin and hugovk Jun 26, 2018

@hugovk
Collaborator

left a comment

I've not tested the code out locally, so just a few minor comments.

README.rst
Apache Jena Fuseki
------------------

As an alternative to the BSD-DB backend, this package can also leverage `Apache Jena Fuseki <https://jena.apache.org/documentation/fuseki2/>`_

@hugovk

hugovk Jun 26, 2018

Collaborator

Use plain language: use "use" rather than "leverage".

@c-w

c-w Jun 26, 2018

Author Owner

Done; amended the commit.

try:
    self.graph.query('DELETE WHERE { ?s ?p ?o . }')
except ResultException:
    # this is often just a false positive since jena fuseki does not

@hugovk

hugovk Jun 26, 2018

Collaborator

"jena fuseki" -> "Jena Fuseki"

@c-w

c-w Jun 26, 2018

Author Owner

Done; amended the commit.
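For context on the `DELETE WHERE` excerpt above: in SPARQL terms that string is an *update*, not a query, and a successful update does not return a SPARQL result set, which is plausibly why running it through a query call can surface a spurious `ResultException`. A minimal sketch, using only the standard library, of how such an update is sent under the SPARQL 1.1 Protocol ("update via POST directly", media type `application/sparql-update`). The endpoint URL and the helper `build_sparql_update` are illustrative assumptions, not the code in this PR.

```python
from urllib.request import Request

# Removes every triple in the default graph.
CLEAR_ALL = 'DELETE WHERE { ?s ?p ?o . }'


def build_sparql_update(update_url, update=CLEAR_ALL):
    """Build (but do not send) a SPARQL 1.1 Update request.

    The update string is the raw POST body with Content-Type
    application/sparql-update; the server answers with an HTTP status
    code rather than a result set.
    """
    return Request(
        update_url,
        data=update.encode('utf-8'),
        headers={'Content-Type': 'application/sparql-update'},
        method='POST',
    )


req = build_sparql_update('http://localhost:3030/ds/update')
print(req.get_method())                # POST
print(req.get_header('Content-type'))  # application/sparql-update
# Actually sending it needs a running server: urllib.request.urlopen(req)
```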

@@ -1,2 +1,3 @@
coverage
flake8
nose

@hugovk

hugovk Jun 26, 2018

Collaborator

Good to add this here.

This is really a separate issue, but we should consider switching away from nose. From November 2015:

Nose has been in maintenance mode for the past several years and will likely
cease without a new person/team to take over maintainership. New projects
should consider using Nose2 <https://github.com/nose-devs/nose2>, py.test <http://pytest.org/>, or just plain unittest/unittest2.

https://nose.readthedocs.io/en/latest/#note-to-users

Besides, we agreed that Nose was going to be in maintenance mode, Nose2 was the way forward, and that was part of the reason I took over maintainership at all. Personally, I wasn't ever agreeing to help make Nose live forever--it was more of a fix critical bugs the best I could with the time that I had available. There's some serious deficiencies in the Nose code base that can only be fixed with a lot of TLC, and no one on the current team really has the energy to commit to it.

That is not a knock on anyone... Nose has been around a long time, has lived through several changes in unit testing mentality, and across a number of versions of Python. It's legacy and with that comes the cruft of organic growth. It's just way more than I can deal with alone.

nose-devs/nose@0f40fa9#commitcomment-14224696

@c-w

c-w Jun 26, 2018

Author Owner

Done in 015107b.

@MasterOdin

MasterOdin Jun 26, 2018

Collaborator

We should probably consider moving away from nose/nose2 entirely, to something like pytest. From the nose2 docs:

However, given the current climate, with much more interest accruing around pytest, nose2 is prioritizing bugfixes and maintenance ahead of new feature development.

It also has an "alpha" classifier attached to it.

Though that should probably be done in a different PR.

@c-w c-w force-pushed the fuseki-store branch 4 times, most recently from 4fd4f67 to c2d4a01 Jun 26, 2018

@c-w c-w force-pushed the fuseki-store branch from c2d4a01 to 9f80480 Jun 26, 2018

@c-w

Owner Author

commented Jun 26, 2018

@hugovk @MasterOdin @sethwoodworth Could someone take a look at the PR and let me know if you have any objections to the change? Thanks in advance!


ADD shiro.ini /jena-fuseki/shiro.ini

CMD ["/jena-fuseki/fuseki-server", "--loc=/fuseki", "--update", "/ds"]

@MasterOdin

MasterOdin Jun 29, 2018

Collaborator

As I don't know anything about Fuseki: what would happen if someone used the default shiro.ini file that comes with the base Docker image?

@c-w

c-w Jul 13, 2018

Author Owner

I was under the impression that the SPARQLUpdateStore didn't support authentication, but I must have stumbled across some pretty old docs. I've enabled pass-through for auth in aa7fcff.
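Fuseki's access control is configured through Apache Shiro (the shiro.ini file above), and a common setup protects endpoints with HTTP Basic authentication, so "pass-through for auth" amounts to forwarding the user's credentials with each request. A minimal standard-library sketch of attaching a Basic `Authorization` header (RFC 7617); the helper name, URL and credentials are hypothetical and do not show how Gutenberg or rdflib forward credentials internally.

```python
import base64
from urllib.request import Request


def with_basic_auth(request, user, password):
    """Attach an HTTP Basic Authorization header (RFC 7617) to a request.

    Hypothetical helper illustrating credential pass-through.
    """
    token = base64.b64encode(('%s:%s' % (user, password)).encode('utf-8'))
    request.add_header('Authorization', 'Basic ' + token.decode('ascii'))
    return request


req = with_basic_auth(Request('https://localhost:3030/ds/update'), 'admin', 's3cret')
header = req.get_header('Authorization')
```

Basic credentials are only base64-encoded, not encrypted, which is one more reason to put a shared Fuseki instance behind https://.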

    return FusekiMetadataCache(cache_location, cache_url)
except InvalidCacheException:
    logging.debug('Unable to create cache based on Apache Jena Fuseki. '
                  'Next trying BSD-DB implementation.')

@MasterOdin

MasterOdin Jun 29, 2018

Collaborator

Given that there's no obvious way to turn on debug-level logging, this message seems rather pointless. Also, if a user is seriously trying to use Fuseki and it isn't working, they would probably want a warning, though it might be most appropriate to actually raise the InvalidCacheException. They went to the trouble of setting an environment variable, after all.

@c-w

c-w Jul 13, 2018

Author Owner

Good idea. If the environment variable is set, we now throw the exception when the cache can't be instantiated.
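The agreed behaviour can be sketched as follows: an explicit opt-in via an environment variable turns a Fuseki failure into an error the user sees, while the unconfigured case quietly uses another backend. This is a minimal sketch under stated assumptions: the variable name `GUTENBERG_FUSEKI_URL`, the constructors `make_fuseki_cache`/`make_fallback_cache` and the tuple return values are hypothetical stand-ins, not Gutenberg's actual code.

```python
import logging
import os


class InvalidCacheException(Exception):
    """Stand-in for the exception named in the excerpt above."""


def make_fuseki_cache(cache_url):
    # Hypothetical constructor: only http(s) URLs can be served.
    if not cache_url.startswith(('http://', 'https://')):
        raise InvalidCacheException('unsupported cache URL: %r' % cache_url)
    return ('fuseki', cache_url)


def make_fallback_cache():
    # Hypothetical stand-in for the BSD-DB / SQLite implementations.
    return ('fallback', None)


def load_cache(env=None):
    """Pick a metadata cache, honouring an explicit Fuseki opt-in.

    If the (hypothetical) GUTENBERG_FUSEKI_URL variable is set, the user
    explicitly asked for Fuseki, so InvalidCacheException propagates
    instead of being logged at debug level and silently falling back.
    """
    env = os.environ if env is None else env
    cache_url = env.get('GUTENBERG_FUSEKI_URL')
    if cache_url:
        return make_fuseki_cache(cache_url)  # errors reach the caller
    logging.debug('Fuseki not configured; using the fallback cache.')
    return make_fallback_cache()
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.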

@@ -201,6 +204,72 @@ def _check_can_be_instantiated(cls):
        del db


class FusekiMetadataCache(MetadataCache):
    _CACHE_URL_PREFIX = 'http://'

@MasterOdin

MasterOdin Jun 29, 2018

Collaborator

What happens if their URL is behind https:// (as the Fuseki docs recommend for production servers)?

@c-w

c-w Jul 13, 2018

Author Owner

See above: I was misled by some old docs. Added https:// to the whitelist.
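Extending the single `_CACHE_URL_PREFIX = 'http://'` check from the excerpt above to a whitelist is a small change, since `str.startswith` accepts a tuple of prefixes. A minimal sketch, assuming a module-level tuple named `_CACHE_URL_PREFIXES` and a local stand-in for `InvalidCacheException`; both names are illustrative rather than the merged code.

```python
class InvalidCacheException(Exception):
    """Stand-in for Gutenberg's exception of the same name."""


# Accepted URL schemes (illustrative name for the whitelist).
_CACHE_URL_PREFIXES = ('http://', 'https://')


def check_cache_url(cache_url):
    """Return cache_url, or raise InvalidCacheException for other schemes."""
    # URL schemes are case-insensitive, so compare on a lowercased copy;
    # str.startswith checks every prefix in the tuple in one call.
    if not cache_url or not cache_url.lower().startswith(_CACHE_URL_PREFIXES):
        raise InvalidCacheException(
            'Fuseki cache URL must start with one of %s, got %r'
            % (', '.join(_CACHE_URL_PREFIXES), cache_url))
    return cache_url
```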

@c-w c-w force-pushed the fuseki-store branch from 38b14b0 to aa7fcff Jul 13, 2018

@c-w

Owner Author

commented Jul 13, 2018

@MasterOdin Addressed your comments. Are you okay for this to be merged now or do you have any further questions/concerns?

@c-w c-w merged commit ed7a03a into master Jul 18, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

@c-w c-w deleted the fuseki-store branch Jul 18, 2018
