
Refactor redis cache #430

Merged
merged 10 commits, Sep 26, 2019

Conversation

hardbyte
Collaborator

@hardbyte hardbyte commented Sep 21, 2019

This PR refactors entityservice.cache to make it easier to understand and extend. As with many refactorings, it looks scarier than it is. I've also tried to clean up the imports to be more explicit about which functions come from the cache module.

One of the few (minor) changes is at the start of a run the current number of comparisons is set to zero, instead of waiting for the first chunk of similarities to be completed:

progress_cache.save_current_progress(comparisons=0, run_id=run_id)
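The call above could be implemented along these lines; the key format, the function body, and the stub client are illustrative assumptions, not the project's actual code.

```python
def save_current_progress(comparisons, run_id, redis_client):
    # Illustrative sketch (the 'progress-{run_id}' key format is an
    # assumption): INCRBY creates the key at the given value on first
    # use, so calling with comparisons=0 at the start of a run makes
    # the progress key exist immediately, and later chunks can add
    # their comparison counts atomically.
    key = 'progress-{}'.format(run_id)
    return redis_client.incrby(key, comparisons)
```

Calling it with `comparisons=0` before any similarity chunks complete means progress reads never hit a missing key.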

In preparation for some future use of the cache, the PR also extracts a new function, assert_valid_run, without changing behaviour.

sentinel = Sentinel([(redis_host, 26379)], password=redis_pass, socket_timeout=5)
if read_only:
logger.debug("Looking up read only redis slave using sentinel protocol")
r = sentinel.slave_for('mymaster')
Collaborator
Since everything else is configurable, why not the sentinel master name?

Collaborator Author
Good point. I'll create another PR, as it isn't directly related to this refactoring. #436
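That follow-up could look something like the sketch below; the setting name `REDIS_SENTINEL_MASTER_NAME` and the fallback behaviour are assumptions for illustration.

```python
def sentinel_master_name(config):
    # Hypothetical config attribute; fall back to the previously
    # hardcoded 'mymaster' so existing deployments keep working.
    return getattr(config, 'REDIS_SENTINEL_MASTER_NAME', 'mymaster')
```

The connection code would then call `sentinel.slave_for(sentinel_master_name(config))` for read-only connections, or `master_for` for writes.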

Get a connection to the master redis service.

"""
logger.debug("Connecting to redis")
Collaborator

This debug message could also show the redis server address and port.

Collaborator Author

Sure, done.

def set_deserialized_filter(dp_id, python_filters):
if len(python_filters) <= config.MAX_CACHE_SIZE:
logger.debug("Pickling filters and storing in redis")
key = 'clk-pkl-{}'.format(dp_id)
Collaborator

You rely on the implicit assumption that dp_id will be unique throughout the server's life. This might be true now, but could change in the future. Better to adopt a more defensive strategy here.

Collaborator Author

Fair comment, this could instead be stored using the clk's upload token.

With our default config this entire encodings.py cache isn't actually used, so I'd prefer a good offense as the best defense and remove the whole thing.
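If the cache were kept, a defensive option along the lines suggested would be to key the pickled filters by the upload token rather than dp_id. Everything below (the signature, the key prefix, the injected client) is an illustrative assumption, not the project's implementation.

```python
import pickle


def set_deserialized_filter(upload_token, python_filters, redis_client,
                            max_cache_size):
    # Sketch: same pattern as the code under review, but keyed by the
    # upload token, which is unique per upload, so a reused dp_id can
    # never cause two datasets to collide on the same cache key.
    if len(python_filters) <= max_cache_size:
        key = 'clk-pkl-{}'.format(upload_token)
        redis_client.set(key, pickle.dumps(python_filters))
        return key
    return None
```

Returning the key (or None when the data is too large to cache) makes the size cutoff observable to callers and easy to test.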
