Conversation
hugovk
left a comment
There was a problem hiding this comment.
I've not tested the code out locally, so just a few minor comments.
README.rst
Outdated
| Apache Jena Fuseki | ||
| ------------------ | ||
|
|
||
| As an alternative to the BSD-DB backend, this package can also leverage `Apache Jena Fuseki <https://jena.apache.org/documentation/fuseki2/>`_ |
There was a problem hiding this comment.
Use plain language: use "use" rather than "leverage".
gutenberg/acquire/metadata.py
Outdated
| try: | ||
| self.graph.query('DELETE WHERE { ?s ?p ?o . }') | ||
| except ResultException: | ||
| # this is often just a false positive since jena fuseki does not |
There was a problem hiding this comment.
"jena fuseki" -> "Jena Fuseki"
requirements-dev.pip
Outdated
| @@ -1,2 +1,3 @@ | |||
| coverage | |||
| flake8 | |||
| nose | |||
There was a problem hiding this comment.
Good to add this here.
This in really another issue, but we should consider switching away from nose. From November 2015:
Nose has been in maintenance mode for the past several years and will likely
cease without a new person/team to take over maintainership. New projects
should consider usingNose2 <https://github.com/nose-devs/nose2>,py.test <http://pytest.org/>, or just plain unittest/unittest2.
https://nose.readthedocs.io/en/latest/#note-to-users
Besides, we agreed that Nose was going to be in maintenance mode, Nose2 was the way forward, and that was part of the reason I took over maintainership at all. Personally, I wasn't ever agreeing to help make Nose live forever--it was more of a fix critical bugs the best I could with the time that I had available. There's some serious deficiencies in the Nose code base that can only be fixed with a lot of TLC, and no one on the current team really has the energy to commit to it.
That is not a knock on anyone... Nose has been around a long time, has lived through several changes in unit testing mentality, and across a number of versions of Python. It's legacy and with that comes the cruft of organic growth. It's just way more than I can deal with alone.
There was a problem hiding this comment.
Should probably consider moving completely away from nose/nose2 and to something like pytest. From the nose2 doc:
However, given the current climate, with much more interest accruing around pytest, nose2 is prioritizing bugfixes and maintenance ahead of new feature development.
It also has a "alpha" classifier attached to it.
Though that should probably be done in a different PR.
4fd4f67 to
c2d4a01
Compare
|
@hugovk @MasterOdin @sethwoodworth Could someone take a look at the PR and let me know if you have any objections to the change? Thanks in advance! |
fuseki/Dockerfile
Outdated
|
|
||
| ADD shiro.ini /jena-fuseki/shiro.ini | ||
|
|
||
| CMD ["/jena-fuseki/fuseki-server", "--loc=/fuseki", "--update", "/ds"] |
There was a problem hiding this comment.
as I don't know anything about fuseki, what would happen if someone used the default shiro.ini file that comes with the base docker image?
gutenberg/acquire/metadata.py
Outdated
| return FusekiMetadataCache(cache_location, cache_url) | ||
| except InvalidCacheException: | ||
| logging.debug('Unable to create cache based on Apache Jena Fuseki. ' | ||
| 'Next trying BSD-DB implementation.') |
There was a problem hiding this comment.
Given that there's no obvious way to turn on debug level for logging, this seems kind of pointless to even have, and also, if a user is seriously trying to use fuseki and it's not working, said user would probably want a warning, though it might be most appropriate to actually throw the InvalidCacheException. They went to the trouble to set an environment variable after all.
There was a problem hiding this comment.
Good idea. If the environment variable was set, we not throw the exception if the cache can't be instantiated.
gutenberg/acquire/metadata.py
Outdated
|
|
||
|
|
||
| class FusekiMetadataCache(MetadataCache): | ||
| _CACHE_URL_PREFIX = 'http://' |
There was a problem hiding this comment.
What happens if their url is behind https:// (as recommended in their docs for production servers)?
There was a problem hiding this comment.
See above: I was misled by some old docs. Added https to the white-list.
|
@MasterOdin Addressed your comments. Are you okay for this to be merged now or do you have any further questions/concerns? |
This pull request implements a metadata cache implementation based on Apache Jena Fuseki to supplement the existing SleepyCat and SQLite implementations.
Fuseki can be run as a separate service to Gutenberg, e.g. via Docker, which makes setup of the library much easier: no more need to install bsddb3! This means that going forward we can move bsddb3 into an optional dependency. Additionally, Fuseki can be run on a separate machine from Gutenberg so it enables use-cases where multiple users may want to share a single metadata cache.