Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache manager refactor #40

Merged
merged 28 commits into from Feb 6, 2016

Conversation

Projects
None yet
2 participants
@c-w
Copy link
Owner

commented Feb 1, 2016

@cpeel: I did a bit of refactoring on your metadata cache manager code in order to gain an understanding of it. I'd like to get some feedback on whether I understood everything correctly before I merge.

The gist of this refactor:

  • Create sqlite/bsddb subclasses of MetadataCache to avoid store-type checking in methods
  • Create testing MetadataCache to remove testing-only code in the api.
  • Fall back to sqlite cache when the user doesn't have bsddb installed.

c-w added some commits Feb 1, 2016

Turn _iter_metadata_triples into a class-method
This will allow for easier instantiation of test cache managers.
Prefer from contextlib import X over contextlib.X
This lets us make lines shorter!
Make number of metadata stores explicit
Really there are only two supported stores:
- Sleepycat/BDB
- Sqlite
Take an explicit dependency on rdflib-sqlite
The sqlite-based MetadataCache should work on all systems (given that
sqlite is in the Python standard library). Thus we can use this cache as
a fall-back in case we don't manage to use Sleepycat (e.g. on Py3/Win).
Fall-back to SQLite cache when there is no BSD-DB
Also:
- remove explicit dependency on bsddb3 under py2

There are some issues installing BSD-DB on Windows (see #26, #31) so
this fix should make gutenberg work on Windows Python 2 again.
On Windows Python 3, we can't use the standard-library bsddb module and
we can't fall-back to bsddb3 so we use the slow SQLite cache instead.

def set_metadata_cache_manager(cache_manager):

This comment has been minimized.

Copy link
@cpeel

cpeel Feb 2, 2016

Contributor

The set_metadata_cache() function is not just for testing, but rather an integral part of my branch. Without a way to explicitly set a cache, you can't control where the cache is stored on the filesystem, or in the new code, control which of the two backend types you want to use. This level of control is vital when using this as part of a larger service.

For an example of what I'm trying to achieve with this module, see:
https://github.com/cpeel/openlibrary/blob/095df4b3be522eef218813a0106ee2ce7934f120/openlibrary/plugins/gutenberg/pgebook.py
and
https://github.com/cpeel/openlibrary/blob/73533a3e011074e2ffbfe416f241a6aa14fb5b6e/scripts/dev-instance/populate_gutenberg_cache.py

This comment has been minimized.

Copy link
@c-w

c-w Feb 5, 2016

Author Owner

Fixed in 8df3e24.

@cpeel

This comment has been minimized.

Copy link
Contributor

commented on requirements.pip in 9feaeae Feb 2, 2016

I'm a bit leery of having a version-less github repository as a core requirement. It was one thing for testing purposes, but quite another to say that whatever anyone commits to the rdflib-sqlalchemy github repo is good enough and won't ever break this module. This also means that this package now explicitly depends on SQLAlchemy as well, which might be overkill for some people (although it doesn't bother me any).

This comment has been minimized.

Copy link
Owner Author

replied Feb 3, 2016

Fixed in d887023.

@cpeel

This comment has been minimized.

Copy link
Contributor

commented Feb 2, 2016

Overall I like your refactor -- very clean. My one big concern is the removal of set_metadata_cache[_manager]() as that function enables one of the core requirements of what I was trying to achieve -- the ability to control where the cache ends up on the filesystem and what type of cache it is.

@@ -23,9 +23,5 @@ class CacheAlreadyExists(Error):
pass


class CacheNotRemovable(Error):

This comment has been minimized.

Copy link
@cpeel

cpeel Feb 2, 2016

Contributor

Built-in python errors (almost) all end in Exception, Error, or Warning (https://docs.python.org/2/library/exceptions.html). See also PEP8: https://www.python.org/dev/peps/pep-0008/#exception-names

(I should have raised this comment against the commit when the exceptions.py file went in, as they're all this way.)

This comment has been minimized.

Copy link
@c-w

c-w Feb 5, 2016

Author Owner

Fixed in 8921ee1.

c-w added some commits Feb 5, 2016

Bring back set_metadata_cache
This reverts commit 7c8f7c1 and adds some more documentation on how to
use the method.
@c-w

This comment has been minimized.

Copy link
Owner Author

commented Feb 5, 2016

  • Added exception suffixes in 8921ee1.
  • Froze dependency on sqlalchemy-rdflib in d887023.
  • Brought back set_metadata_cache in 8df3e24.
@cpeel

This comment has been minimized.

Copy link
Contributor

commented on README.rst in 8df3e24 Feb 6, 2016

s/backen/backend/

@cpeel

This comment has been minimized.

Copy link
Contributor

commented Feb 6, 2016

Ship it! :)

@c-w

This comment has been minimized.

Copy link
Owner Author

commented Feb 6, 2016

           (`-,-,
           ('(_,( )
            _   `_'
         __|_|__|_|_
       _|___________|__
      |o o o o o o o o/   
     ~'`~'`~'`~'`~'`~'`~

@c-w c-w closed this Feb 6, 2016

@c-w c-w reopened this Feb 6, 2016

c-w added a commit that referenced this pull request Feb 6, 2016

@c-w c-w merged commit c7343e9 into master Feb 6, 2016

0 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
continuous-integration/travis-ci/push The Travis CI build is in progress
Details

@c-w c-w deleted the cache-manager-refactor branch Feb 6, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.